WHATWG on 2026-02-04

00:00	<Andreu Botella>	sure, as long as when publishing a new spec someone makes sure that it's not wrongly marked as current work
04:58	<nektro>	got https://github.com/nektro/zig-whatwg-url to pass 100% of `urltestdata.json` and `IdnaTestV2.json` (w/ base included this time)
11:45	<foolip>	What's up with https://www.w3.org/TR/xml/#NT-NameStartChar claiming that names in XML can start with ":"?
11:48	<foolip>	https://www.w3.org/TR/xml-names/#orphans seems to touch on this, but is non-normative...
11:56	<sideshowbarker>	Namespaces
11:56	<sideshowbarker>	it’s superseded by Namespaces, right?
11:56	<sideshowbarker>	so I guess in a non-Namespace aware XML implementation, it would be allowed
11:57	<sideshowbarker>	but there are no such implementations, I guess
11:57	<sideshowbarker>	there would be no point in any non-Namespace-aware XML implementation
11:57	<foolip>	Hmm, seems like https://www.w3.org/TR/xml-names/ doesn't monkeypatch this in the way I'd expect and not normatively, but fine, it's namespaces.
11:57	<foolip>	As long as ":" is the only special case it's all good.
11:58	<foolip>	I thought I might be reading the wrong thing entirely.
11:58	<sideshowbarker>	yeah, have fun trying to implement from those specs to begin with, I guess
11:58	<sideshowbarker>	I mean, they are not rigorous specs, by our current standards for such
11:59	<sideshowbarker>	I guess the productions are rigorous enough
12:02	<Noam Rosenthal>	`:` aside, this seems to be relatively close to `ID_Start` and `ID_Continue` in unicode (https://www.unicode.org/reports/tr31/#D1) (*TIL) We can probably use those for valid PI targets in HTML as well, which would exclude current uses of bogus comment such as `lit$$`
13:23	<sideshowbarker>	hsivonen: friendly bump on https://github.com/validator/htmlparser/pull/113
13:29	<hsivonen>	I happened to be already reviewing it, but I'm a bit confused. Should I be looking at some spec PR that's not landed, yet?
13:33	<eemeli>	XML's `NameStartChar` and `NameChar` have not been updated to match Unicode changes, so e.g. the invisible Arabic Letter Mark U+061C counts as a valid first (and potentially only) character of an XML name.
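To make eemeli's point concrete, here is a minimal sketch of a name check built from the `NameStartChar` and `NameChar` productions in XML 1.0 (Fifth Edition). The function names `is_name_start_char` and `is_xml_name` are hypothetical, chosen for illustration only; the ranges are transcribed from the productions and should be checked against the spec before being relied on.

```python
# Code point ranges from the XML 1.0 (Fifth Edition) NameStartChar production.
NAME_START_RANGES = [
    (0x3A, 0x3A),  # ":" (the case foolip asked about)
    (0x41, 0x5A),  # A-Z
    (0x5F, 0x5F),  # "_"
    (0x61, 0x7A),  # a-z
    (0xC0, 0xD6), (0xD8, 0xF6), (0xF8, 0x2FF),
    (0x370, 0x37D), (0x37F, 0x1FFF),
    (0x200C, 0x200D), (0x2070, 0x218F), (0x2C00, 0x2FEF),
    (0x3001, 0xD7FF), (0xF900, 0xFDCF), (0xFDF0, 0xFFFD),
    (0x10000, 0xEFFFF),
]

# Additional code points allowed by NameChar after the first character.
NAME_EXTRA_RANGES = [
    (0x2D, 0x2E),  # "-" and "."
    (0x30, 0x39),  # 0-9
    (0xB7, 0xB7),
    (0x300, 0x36F),
    (0x203F, 0x2040),
]


def _in_ranges(ch: str, ranges) -> bool:
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in ranges)


def is_name_start_char(ch: str) -> bool:
    """True if ch matches the NameStartChar production."""
    return _in_ranges(ch, NAME_START_RANGES)


def is_xml_name(s: str) -> bool:
    """True if s matches Name ::= NameStartChar (NameChar)*."""
    if not s or not is_name_start_char(s[0]):
        return False
    return all(
        is_name_start_char(ch) or _in_ranges(ch, NAME_EXTRA_RANGES)
        for ch in s[1:]
    )
```

Under these ranges, `is_xml_name(":")` and `is_xml_name("\u061c")` both hold, because U+061C falls inside the broad `[#x37F-#x1FFF]` range, while a name starting with a digit or `-` is rejected.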
13:40	<hsivonen>	For XML tooling compat, one for sure needs to stay within the XML rules as they were in 1998. I'm skeptical of mixing UAX 31 into HTML conformance requirements. Rules like https://html.spec.whatwg.org/#custom-data-attribute have worked well enough so far. I suppose the validator could complain about going outside UAX 31, but UAX 31 checks would be bad for browser DOM perf.
13:40	<Noam Rosenthal>	Agreed, I think we can remain with a simple ascii subset of this.
13:44	<sideshowbarker>	No, that is intended to functionally match the spec requirements as I understand them. But if it’s too much of a deviation, such that it’s not mapping back to the spec clearly enough, then I guess I need to revisit it. Or if you’re saying that as currently written, it’s not even functionally matching the spec requirements, then I guess I definitely need to go back and try again.
13:46	<hsivonen>	sideshowbarker: At least it looks different from the spec. I haven't worked out if it's actually functionally equivalent, but I'll try to think about that more now that I know that I'm not supposed to be looking at an in-flight spec PR.
13:49	<sideshowbarker>	Will also go back in right now and make a re-try at seeing if I can make it strictly follow the spec algorithm. To be honest, in the first crack I took at it, I was focused just on making it work for what I needed in the context of the HTML checker. Wasn’t remembering that it’d need to work in, you know, the more-important context of being in a browser engine.
13:50	<sideshowbarker>	For me it’s always a bit of a challenge working at another level of remove, the way that’s necessary for working on the parser source in Java
13:53	<sideshowbarker>	hsivonen: well, and to put it in other terms, I was also writing against getting it to pass the tests. And it does pass the tests
13:56	<hsivonen>	sideshowbarker: Does the validator need to know in the streaming SAX mode whether a target that a browser would clone an "option" into exists earlier in the doc?
14:08	<sideshowbarker>	No. Because it’s using SAXStreamer, right? We don’t use SAXTreeBuilder for the checker. It builds no tree at all. So, never clones any content itself. Never fires any events for the cloning. It only ever knows about the source elements, and not at all about the browser-would-clone-this content.
14:10	<hsivonen>	sideshowbarker: So there's no validator hard requirement to track `selectedcontent` in the tree builder?
14:18	<sideshowbarker>	Right. No such requirement. So the selectedContentPointer etc. stuff gets called — but it’s all a no-op. But it’s there because of the inheritance from TreeBuilder. I mean, the only alternative would be to just have it in the code of the actual-tree-building-implementation subclasses where it’s not a no-op.
14:19	<sideshowbarker>	I guess I could try to see if I could make it be that way, if you think that’d be better.
14:19	<foolip>	zcorpan hsivonen I'm curious if you have revised thoughts on > vs ?> for both conformance and serialization? https://github.com/whatwg/html/pull/12118#issuecomment-3844500491
14:20	<foolip>	At first I was convinced about the argument that a stray > could be flagged if we require ?>, but then I couldn't write an example that made sense...
14:21	<zcorpan>	foolip: `<?xml-stylesheet href="data:text/css,a>b{}"?>`
14:26	<foolip>	That's still a syntax error because the attribute value doesn't have a closing ", right?
14:27	<hsivonen>	A syntax error on another layer, right?
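The two layers in this exchange can be sketched as follows. This is only an illustration under stated assumptions, not the tokenizer algorithm from the HTML spec or from the PR: `split_pi` is a hypothetical helper that ends the PI at the first occurrence of a given terminator and splits target from data at the first space.

```python
def split_pi(source: str, terminator: str) -> tuple[str, str]:
    """Split a PI-like construct into (target, data).

    Hypothetical helper: the construct must start with "<?" and ends at
    the first occurrence of `terminator` (either ">" or "?>").
    """
    if not source.startswith("<?"):
        raise ValueError("not a processing instruction")
    end = source.find(terminator, 2)
    if end == -1:
        raise ValueError("unterminated processing instruction")
    body = source[2:end]
    # Target is everything up to the first space; the rest is data.
    target, _, data = body.partition(" ")
    return target, data


example = '<?xml-stylesheet href="data:text/css,a>b{}"?>'

# Ending at the first ">" cuts the data inside the quoted value.
target_gt, data_gt = split_pi(example, ">")

# Ending at "?>" keeps the whole pseudo-attribute intact.
target_qgt, data_qgt = split_pi(example, "?>")
```

With `>` as the terminator the data comes out as `href="data:text/css,a`, which has an unclosed quote, so the error only shows up when the pseudo-attributes are parsed. With `?>` the data is `href="data:text/css,a>b{}"`.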
14:29	<foolip>	Yes, true with the plan I suggested. If we want it to be an error in the first layer, we could tokenize pseudo-attributes in the main parser.
14:31	<foolip>	Do we have any way to signal parse errors that would happen in the data setter? That's a question regardless of conformance, since you can get > in data.
14:34	<hsivonen>	We can have > without a preceding question mark be a HTML-layer conformance error even if pseudo attributes are parsed on the DOM layer. As for the DOM setters, I believe we already have plenty of ways to cause problematic serializations using DOM setters.
14:34	<zcorpan>	Right, e.g. --> in comment data is allowed
14:40	<foolip>	If I did tokenize pseudo-attributes in the main parser and did the parse errors in all the right places, would that tip the balance in favor of making just > the recommended syntax? My concern in a nutshell is that with ?> it looks like > in data should work but still it doesn't.
14:44	<zcorpan>	foolip: What would happen when dynamically changing .data? Still need DOM-level parsing also?
14:44	<hsivonen>	We should not put the pseudo-attribute parsing in the HTML parser.
14:45	<hsivonen>	foolip: Would you change how Blink and WebKit serialize?
14:46	<hsivonen>	sideshowbarker: I posted non-exhaustive review comments. I suspect that removing the tree builder tracking of `selectedcontent` will result in coming closer to the spec generally.
14:46	<sideshowbarker>	Ok, I’ll take a look there
14:48	<hsivonen>	(I need to step away from the chat now, unfortunately.)
15:30	<foolip>	I'd be happy to change it in Blink if the decision is to make <?foo> the more canonical HTML syntax, with whatever parsing and conformance changes required, yes.
15:31	<foolip>	I have to admit this is probably the right stance, because there's no way we can put the parsing in the XML parser for PIs.
15:32	<zcorpan>	I'm ok with optional ? if the serializer omits it
15:41	<Noam Rosenthal>	I think this is equivalent to the stray slash in `<img />`, it should be tolerated but not be canonical
16:29	<foolip>	zcorpan: do you think we could define conformance rules for PI data to say that they have to follow certain rules, so that this becomes more like validating an attribute value than a parse error produced by the tokenizer?
18:58	<sideshowbarker>	What would be an example of such a rule?
19:41	<zcorpan>	foolip: sure, can say that the data must follow the syntax for pseudo attributes
19:45	<zcorpan>	foolip: is the idea that we allow authors to use any target names, and we just check for compat before minting new standardized PIs?
20:51	<Noam Rosenthal>	Yea pretty much, though we can maybe make some guarantee that we won't use `-` in minted ones, so that userland can feel confident they can use them (kind of like custom elements)
21:35	<foolip>	zcorpan: I think the choices are requiring pseudo-attribute syntax in data just for known PIs, or we just say that all PIs, including future PIs and custom PIs (with hyphens?) will use pseudo-attributes. I think the latter is probably fine in practice, because you can just put whatever in an attribute value if you want that.
21:36	<foolip>	I think we'd do the attribute parsing for any PI, so it would make the most sense to also have the conformance requirement for any PI.