WHATWG on 2022-01-07

11:56	<annevk>	emilio: so :active/:hover use the flat tree, but :has() uses the node tree? Is using the flat tree for selector matching not really expensive?
12:26	<emilio>	That is right. :has() uses the node tree like all other selector combinators, why would it use the flat tree?
12:26	<emilio>	Not that it would be impossible to do but it'd be weird
12:28	<annevk>	emilio: yeah never mind, I guess it all makes sense. And :has can't really use the flat tree as that'd break encapsulation, I think.
12:30	<emilio>	Right
15:48	<Domenic>	Thinking of trying to tackle https://github.com/whatwg/dom/issues/849 again, or at least make progress ... is there any easy way to know what element names the HTML parser accepts? Or do I have to walk through various parser states?
15:49	<Domenic>	I guess looking at https://html.spec.whatwg.org/#tag-open-state + tag name state is not too bad...
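
A rough sketch of what walking the tag open state and the tag name state yields, written as a simplified Python model rather than the spec algorithm. `scan_tag_name` and `TERMINATORS` are hypothetical names introduced only for illustration. The sketch assumes newlines were already normalized (so CR never reaches it) and ignores end tags, comments, and attributes.

```python
import string

# Characters that end the tag name in the tag name state.
TERMINATORS = {"\t", "\n", "\f", " ", "/", ">"}

def scan_tag_name(after_lt: str):
    """Return the tag name built from the text following '<',
    or None if that text does not begin a start tag."""
    # Tag open state: only an ASCII alpha starts a start tag name.
    if not after_lt or after_lt[0] not in string.ascii_letters:
        return None
    name = []
    for ch in after_lt:
        if ch in TERMINATORS:
            break
        if ch == "\0":
            # NULL is a parse error and is replaced with U+FFFD.
            name.append("\ufffd")
        elif ch in string.ascii_uppercase:
            # ASCII upper alpha is lowercased.
            name.append(ch.lower())
        else:
            # Anything else is appended as-is.
            name.append(ch)
    # Simplification: end of input just ends the name here; the real
    # tokenizer reports an eof-in-tag error instead of emitting a tag.
    return "".join(name)

print(scan_tag_name("a$b c>"))  # the name stops at the space
print(scan_tag_name("$a>"))     # no tag: '$' is not ASCII alpha
```

Under this model the first character is the only tightly restricted one; after it, nearly any character is appended to the name.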
16:14	<Sam Sneddon [:gsnedders]>	can someone who understands how event dispatch is specified comment on https://bugs.webkit.org/show_bug.cgi?id=234730? because I've utterly confused myself now.
16:27	<annevk>	Domenic: can element names contain `>` today? That seems problematic
16:27	<annevk>	Domenic: I doubt we want to allow CR
16:27	<Domenic>	> is excluded from LenientElementNameStartChar and LenientElementNameChar in my sketch
16:28	<Domenic>	I think CR is probably disallowed by the parser but as preprocessing, so I didn't see it when reading. Good catch.
16:28	<Domenic>	Although what about entities hmm
16:29	<Domenic>	Entities don't work in tag names, huh
16:30	<annevk>	Oh I missed `>` there
16:30	<annevk>	Yeah entities only work inside attributes or between tags
16:30	<annevk>	(Context is https://github.com/whatwg/dom/issues/849 fwiw.)
16:31	<annevk>	The other thing I wonder about is whether we should only add leniency for the HTML namespace
16:32	<annevk>	But maybe it doesn't matter so much as you can already create trees that cannot serialize as XML so simplicity ought to win
16:33	<Domenic>	I cannot find what in the spec disallows the parser from creating elements with CR
16:33	<Domenic>	But browsers do not allow it http://software.hixie.ch/utilities/js/live-dom-viewer/?saved=9940
16:34	<Domenic>	I think https://html.spec.whatwg.org/#tag-name-state is missing CR
16:34	<Andreu Botella (he/they)>	CR is handled in the preprocessing stage
16:34	<annevk>	Before the tokenization stage, the input stream must be preprocessed by normalizing newlines.
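
The preprocessing step quoted above can be sketched in a few lines of Python. This is a minimal illustration of newline normalization (CRLF pairs become LF, then any remaining CR becomes LF), not browser code; `normalize_newlines` is a name chosen here for the example.

```python
def normalize_newlines(text: str) -> str:
    """Replace every CR LF pair with LF, then every remaining CR with LF."""
    return text.replace("\r\n", "\n").replace("\r", "\n")

# A raw CR inside what looks like a tag name never reaches the tokenizer:
# it has already become LF, which ends the tag name.
print(repr(normalize_newlines("<di\rv>")))
```

Because this runs before tokenization, a literal CR in the source can only reach later stages by another route, for example via a character reference.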
16:34	<Domenic>	I couldn't find that Ctrl+Fing for "CARRIAGE RETURN", and lots of parser steps actually look for CR...
16:35	<Domenic>	OK, so why does CR appear explicitly in places like https://html.spec.whatwg.org/#the-initial-insertion-mode
16:35	<annevk>	Domenic: I think that might be due to an entity reference?
16:36	<Domenic>	Seems plausible
16:37	<annevk>	Yeah, it's a conformance error, but it will get through
16:37	<Domenic>	Yeah because the tokenizer converts them then returns to the state it was in previously
16:37	<annevk>	No idea why that was not normalized as well...
16:37	<Domenic>	OK, updating whatwg/dom thread to exclude CR, and it looks like there are no spec bugs around CR
16:38	<annevk>	I guess it wasn't normalized because you can also get there through JS and guarding all entry points would be somewhat pointless overhead
16:39	<annevk>	(Not that specific point, but as an attribute value, say.)
16:39	<Domenic>	Well I think the idea is if you use a character reference for CR (`&#13;` or similar) in certain places then you actually should end up with a CR in the resulting parsed data
16:39	<Domenic>	And so e.g. if you do that in early parts of the document then the initial insertion mode state will actually see the CR and ignore it, not normalize it
16:40	<annevk>	Right, though then the question is why `` doesn't work (but JS equivalents do)
16:40	<Domenic>	https://html.spec.whatwg.org/#parsing-main-incolgroup is a better example where it inserts the CR instead of ignoring it.
16:41	<Domenic>	Hmmm
16:41	<annevk>	Finding logic in the parser might not be the best use of our time 🙂
16:42	<Domenic>	Yeah OK good point
16:42	<annevk>	For strictly split on : it might be worth clarifying you'd split on the first or concatenate return values 1...N
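
To make the ambiguity concrete, here is a small Python sketch of the two readings of "strictly split on :" for a name with more than one colon. `split_on_first_colon` is a hypothetical helper, not the DOM algorithm.

```python
def split_on_first_colon(qualified_name: str):
    """Split into (prefix, local_name) at the first ':' only.
    Returns (None, qualified_name) when there is no colon."""
    prefix, sep, local_name = qualified_name.partition(":")
    if not sep:
        return None, qualified_name
    return prefix, local_name

name = "a:b:c"

# Reading 1: split on the first colon.
print(split_on_first_colon(name))

# Reading 2: split on every colon, then rejoin items 1...N as the local name.
parts = name.split(":")
print((parts[0], ":".join(parts[1:])))
```

Both readings agree once items 1...N are concatenated with the colons restored; they differ only if the extra parts are dropped or joined without the separator, which is why the wording is worth pinning down.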
16:45	<annevk>	For Prefix there might still be some edge cases I suspect due to XML 4th/5th edition divide, where browsers didn't uniformly stick with the 4th (not entirely sure if some updated the parser, but not the corresponding DOM methods)
16:47	<Domenic>	Yeah I wonder about tests, I wonder if we can apply https://randomascii.wordpress.com/2014/01/27/theres-only-four-billion-floatsso-test-them-all/ to this
16:47	<Domenic>	Probably not :)
16:48	<annevk>	For PCENChar I think the banning of noncharacters is a bit dumb and removing that would simplify the production a lot
16:48	<Domenic>	Yeah I don't have strong feelings there, happy to take a new suggestion.
16:54	<annevk>	Oh I see, that came from XML and we'll preserve some of that through NameStartChar. I guess I'd consider simplifying that as well to C0 and above or even A0 and above (like URL code points), but I'm not sure how much we want to go for
16:55	<Domenic>	On the one hand, it's pretty separable. On the other hand, maybe we should do this all at once, since it's hard to get momentum for these sorts of things.
16:56	<Domenic>	Oh, or you mean just making LenientNameStartChar even more lenient
16:56	<annevk>	Well all of them I suppose. Less range checks ftw
16:57	<annevk>	Nobody has ever proven the value of segmenting Unicode in such a way to my knowledge and most things work fine without it
16:58	<Domenic>	Unpaired surrogates?
16:59	<annevk>	Hmm that's a good point, you included them but does that actually work?
17:00	<annevk>	Oh wait, URLs do consider noncharacters non-conforming, but they do work. Surrogates cannot work there however.
17:06	<Domenic>	I think they work http://software.hixie.ch/utilities/js/live-dom-viewer/?saved=9941
17:06	<annevk>	Domenic: as for testing, given Unicode is 2^21 if I'm not mistaken that might actually be feasible? Element-creation is a bit more expensive than floats though I suppose 🙂
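
For scale, a quick Python check of the numbers involved. The code space fits in 21 bits, but the number of code points is a little over half of 2^21. The constant names are chosen for this example.

```python
import sys

CODE_POINTS = sys.maxunicode + 1          # U+0000 through U+10FFFF
SURROGATES = 0xDFFF - 0xD800 + 1          # U+D800 through U+DFFF
SCALAR_VALUES = CODE_POINTS - SURROGATES  # code points minus surrogates

print(CODE_POINTS)                  # total code points
print(SCALAR_VALUES)                # excluding surrogates
print((0x10FFFF).bit_length())      # bits needed for the largest code point
print(2 ** 21)                      # upper bound implied by 21 bits
```

An exhaustive test would therefore need on the order of a million element creations per position in the name, rather than the four billion cases in the floats article.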
17:06	<Domenic>	Yeah that's my worry.
17:07	<Domenic>	IIRC we already have a cloneNode test that can cause timeouts just when creating + cloning one instance of every existing/historical HTML tag name
17:07	<Domenic>	C++ unit tests in browsers could probably be exhaustive though
17:09	<annevk>	We could have a manual test for us to verify things on the side and for when computers get fast
17:10	<annevk>	It does seem that surrogates are fair game, hurray
17:15	<Domenic>	I guess the question is whether we want to make it easier to create non-serializable DOMs via DOM APIs. I think that's slightly bad? So maybe sticking with the union of current DOM API values + HTML parser values would be better than just allowing the DOM APIs to be maximally free.
17:28	<annevk>	Domenic: that would somewhat argue for branching on the HTML/SVG/MathML namespaces which is a bit odd
17:29	<annevk>	But I guess it still works if the approach is minimal set of steps starting from the status quo, or some such
17:29	<Domenic>	Well, within reason, I guess :)
17:29	<Domenic>	I'll add a comment with your approach
17:30	<annevk>	Thanks, I guess I mainly want to hear from someone that these range checks are negligible, since otherwise we might as well improve that while we're there
17:35	<Domenic>	In your version what is the justification for excluding tab, LF, CR, FF, space, /, >, and NULL? If we are no longer concerned about serializability seems like they could be allowed...
17:42	<annevk>	Domenic: I think it would all still work in the HTML parser when serialized? It's mainly XML that's affected for worse
17:42	<Domenic>	Oh right
17:42	<Domenic>	Not sure what I was thinking
17:42	<Domenic>	OK I am more in favor of your proposal now
17:50	<Domenic>	OK no I see what I was saying. Consider the following element local name: "$a". This is currently disallowed by createElement(). And the parser cannot parse it. But if we allow a larger set for createElement(), then the resulting serialization is something the parser cannot parse.
17:51	<Domenic>	Like ideally createElement() would only accept ASCII alpha for the first character; that would guarantee it only ever creates elements which serialize in a parseable way. But we already accept, for whatever reason, NameStartChar. The conclusion then is we should probably not expand beyond NameStartChar.
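
The comparison between the two first-character rules can be sketched as follows. The ranges are the NameStartChar production from XML 1.0 (Fifth Edition); the helper names are hypothetical, and this models only the first-character check, not all of createElement() validation.

```python
# NameStartChar ranges from XML 1.0 (Fifth Edition), as inclusive code points.
NAME_START_RANGES = [
    (0x3A, 0x3A),      # ":"
    (0x41, 0x5A),      # A-Z
    (0x5F, 0x5F),      # "_"
    (0x61, 0x7A),      # a-z
    (0xC0, 0xD6), (0xD8, 0xF6), (0xF8, 0x2FF),
    (0x370, 0x37D), (0x37F, 0x1FFF),
    (0x200C, 0x200D), (0x2070, 0x218F), (0x2C00, 0x2FEF),
    (0x3001, 0xD7FF), (0xF900, 0xFDCF), (0xFDF0, 0xFFFD),
    (0x10000, 0xEFFFF),
]

def is_name_start_char(ch: str) -> bool:
    """True if ch matches the XML NameStartChar production."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in NAME_START_RANGES)

def is_ascii_alpha(ch: str) -> bool:
    """True if ch could start a tag name in the HTML tokenizer."""
    return "a" <= ch <= "z" or "A" <= ch <= "Z"

for name in ["div", "$a", "\u00e9l\u00e9ment", "_x"]:
    first = name[0]
    print(name, is_name_start_char(first), is_ascii_alpha(first))
```

"$a" fails both checks, matching the example above. Names starting with "_" or "é" pass NameStartChar but not ASCII alpha, so they are already cases that the DOM API accepts and the HTML parser would not read back as elements.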
18:54	<ntim>	Domenic: Hi! I'm curious about how <popup> anchoring works, and how/if it affects the containing block of the element. <popup> elements are designed to be in the top layer, which forces the containing block to be the viewport, so that's what I'm wondering about.
18:54	<Domenic>	ntim: I don't know much about <popup>; mfreed is the person to ask there.
18:55	ntim	hopes it's not yet another special positioning algorithm that isn't CSS describable
19:04	<Alan Stearns>	As far as I know, anchoring of popups is still under discussion. This has some history, and there’s a current proposal linked at the end: https://github.com/w3ctag/design-reviews/issues/599