09:15
<sideshowbarker>

zcorpan: About the validator-nu/Mozilla HTML parser: It seems like either we must not have any good tests for reporting of invalid character references in html5lib-tests, or else we do but we’re just not running them in CI anywhere for the validator-nu/Mozilla HTML parser code. Otherwise I would have caught https://github.com/validator/htmlparser/issues/82 long ago.

A few years back, I did try to set up the htmlparser repo CI to run the html5lib-tests suite, but it was failing due to known issues with it not conforming to encoding handling. So in https://github.com/validator/htmlparser/pull/48 I had spent a bunch of time trying to patch it before I ended up realizing that Henri had long before that already started a branch to fix it properly (but subsequently ended up abandoning it).

But I guess what I should do at this point is to try to set up CI for it again — but have it ignore all the failing encoding tests from the html5lib-tests suite.

09:24
<sideshowbarker>
Maybe that’s what the Mozilla CI is actually already doing, I dunno. But if it is, then I think that CI must also not be running the error-reporting tests for invalid character references. Otherwise it would have caught the https://github.com/validator/htmlparser/issues/82 regression.
09:26
<sideshowbarker>

zcorpan: And the context for me mentioning this now is https://github.com/validator/htmlparser/issues/82 — which I fixed today in https://github.com/validator/htmlparser/pull/83.

But that needs review before merging into the main branch, and I guess it might be a while before Henri is able to review it. And I don’t know who else from Mozilla is currently able to review HTML parser patches

But anyway, after looking at the issue and the code and testing it, I realized it’s a pretty serious regression in the expected error-reporting behavior for invalid named character references. I think it’s probably causing no error to be reported for most “normal” cases of invalid character references (not just the pathological cases like the &ThisAmpersandShouldBeDeemedAmbiguous; case reported in that issue) — and it’s been that way for nearly 2 years now, since https://github.com/validator/htmlparser/commit/44427c0df60ec899c52b149bba03e1abffad2f5e.

All that said, I guess this doesn’t affect Firefox behavior at all beyond the error flagging that’s shown in View Source. And for the HTML checker for now, I’m going ahead and temporarily switching back to having the HTML checker build use a feature branch rather than the main branch (until the fix can be merged). So it’s not really urgent.

But as I mentioned above, it has made me realize that we really should consider having CI set up somewhere to properly run all the right html5lib-tests — before any further changes get committed to the sources.

13:12
<zcorpan>
sideshowbarker: thanks for the heads up. Seems like it can wait for Henri, but ideally we ought to have another reviewer :)
13:49
<zcorpan>
annevk or Domenic : https://github.com/whatwg/html/pull/8008 seems ready. Do either of you want to do an editorial pass? Thx