WHATWG on 2026-01-30

05:20	<sideshowbarker>	https://x.com/filpizlo/status/2017074082249048347
07:14	<zcorpan>	Commented. Thanks
07:19	<sideshowbarker>	zcorpan: We were talking on the WHATNOT call about the “Why not just use PIs instead of minting a new exclamation point syntax?” question for the marker thing, and Noam mentioned you had expressed some rationale for why. What was that? Or could you give link(s) to your comment(s) on that?
07:20	<sideshowbarker>	IMHO Henri makes a pretty strong case for why we should just use PIs — unless there’s some really compelling reason not to.
07:33	<Noam Rosenthal>	FWIW I was convinced especially by the arguments that extensions might be incompat; I think we can have something like ProcessingInstruction.prototype.marker that points to the actual clean marker data, not even a subclass, and can respond to mutations of the PI's data. This allows zero modifications to DOM and makes marker inspection and manipulation agnostic of whether it's XML or HTML
07:45	<zcorpan>	sideshowbarker: I proposed it in https://github.com/whatwg/html/issues/11542#issuecomment-3759955087 The main benefit I think is less confusion about the end of the "tag" in the syntax. In XML PIs end with ?> but in the HTML parser it ends with >. But the compat with XML tooling and DOM traversal scripts is more important, so I think now we should use PIs and live with mutability. We can trim a trailing "?" in the parser so polyglot is still possible.
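A minimal sketch of the parsing difference zcorpan describes, assuming the HTML-side rule ends the construct at the first `>` and trims a single trailing `?`. The function name `split_html_pi` is hypothetical, and nothing here is settled parser behavior.

```python
def split_html_pi(markup: str) -> tuple:
    """Split '<?target data>' or '<?target data?>' into (target, data).

    Assumptions for illustration only: the construct ends at the first '>',
    and one trailing '?' is trimmed so XML-style '?>' input gives the same
    result as HTML-style '>' input.
    """
    if not markup.startswith("<?"):
        raise ValueError("not a processing instruction")
    end = markup.find(">")
    if end == -1:
        raise ValueError("unterminated processing instruction")
    body = markup[2:end]
    if body.endswith("?"):
        body = body[:-1]  # trim the trailing '?' so polyglot markup works
    parts = body.split(None, 1)  # target is everything before the first whitespace
    if not parts:
        raise ValueError("missing target")
    target = parts[0]
    data = parts[1] if len(parts) > 1 else ""
    return target, data
```

Under these assumptions `<?marker foo?>` and `<?marker foo>` split identically. Data containing a `>` would be cut short, which is the end-of-"tag" confusion mentioned above.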
07:51	<sideshowbarker>	Ah I see: If we're going to use a new node type instead of ProcessingInstruction, there's no need to use the syntax for processing instructions. So yeah, objectively it’d make sense if we did have a very good reason to use a new node type. Which takes it back more to just being about the question of whether we need to use a new node type. And yeah the “the compat with XML tooling and DOM traversal scripts” argument seems to weigh things more heavily toward not doing that.
07:54	<Noam Rosenthal>	Note that `ProcessingInstruction`'s "target" is immutable, so at least the marker type can remain immutable and mutating the data only affects the name and whatever attributes future usages of this process
08:03	<zcorpan>	Noam Rosenthal: ah yes. Simplest would be to not have pseudo-attributes but just take the data and treat it as the name. If we want fail-open future compat: don't have pseudo-attributes for now, but let the name be the part in the data before any whitespace.
08:04	<Noam Rosenthal>	Yes I think the latter. There are already emerging cases for pseudo-attributes, e.g. for using this for CSS custom highlights
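The fail-open option (name is the part of the data before any whitespace) could look roughly like the following. `marker_name` is a hypothetical helper, not a proposed API.

```python
def marker_name(data: str) -> str:
    """Return the part of a PI's data before any whitespace.

    Anything after the first whitespace is ignored for now, so that
    pseudo-attributes added later do not change the name older code sees.
    """
    parts = data.split(None, 1)
    return parts[0] if parts else ""
```

With this rule, data of `mything` and `mything more="stuff"` both yield the name `mything`.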
08:05	<Noam Rosenthal>	I wonder if it's important to have it as a property of a PI rather than a subclass. HTML elements are subclasses despite not being known to XML
08:19	<zcorpan>	Hmm yeah, subclass seems like it should be compatible enough. I think it would only break checks like `node.constructor.name` but probably scripts usually use `nodeType`
08:58	<annevk>	Do we need a subclass? If we actually define how the pseudo-attributes are parsed in a way that exactly matches what we do for style sheets we could just repurpose all of it.
09:00	<annevk>	We might have some flexibility to start allowing unquoted values in there or names without values. Not sure.
09:01	<Noam Rosenthal>	annevk: Probably for the case where a plain XML PI is adopted in an HTML doc
09:02	<Noam Rosenthal>	But not 100% certain
09:03	<annevk>	I'm not sure I understand. I think Henri's idea is that we essentially have PIs with as much structure as they have today and then the structured data is created lazily on top, perhaps exposed through an API. And the syntax for the structured data follows what we already implement for the xml-stylesheet PI.
09:05	<Noam Rosenthal>	Yes, the question is whether the marker-specific API is exposed for things that are not a marker target. Currently, though, it's just a name
09:06	<annevk>	I think we just make all PIs possible markers and all of them support some kind of attribute API. So you can do <?highlight> and such.
09:06	<annevk>	And then highlightPI.setAttribute('type', 'syntax-error') or some such.
09:06	<Noam Rosenthal>	That's also OK, though the parser needs to know which ones to parse, to avoid web compat issues
09:07	<Noam Rosenthal>	Ie parse unknown ones as bogus comments
09:07	<annevk>	Have we already identified we can't parse <? as PIs universally?
09:07	<Noam Rosenthal>	We identified that there is use of them in the wild as bogus comments
09:08	<Noam Rosenthal>	Not a whole lot but some target names are in use
09:08	<annevk>	Do you think we can blocklist?
09:08	<Noam Rosenthal>	At the very least extending it to the full range of names is something we can do gradually
09:09	<Noam Rosenthal>	As in first just parse an allowlist and continue from there with a separate compat rollout
09:09	<annevk>	If we just ban xml/xml-stylesheet/php/?
09:10	<Noam Rosenthal>	Probably a few more common ones in the HA IIRC
09:10	<annevk>	It seems like we should be able to get some data on that and then just forever let those names be bogus comments.
09:11	<Noam Rosenthal>	I am not opposed to that. In the end it's the same feature with different compat strategies.
09:14	<annevk>	I think figuring that out first will give us a bit more room when we get to concrete features that will use these PIs. So doing the parser changes to enable PIs in HTML and figuring out the syntax xml-stylesheet PIs use in a bit more detail.
09:31	<Noam Rosenthal>	Btw another feature I can envision for these markers/PIs is a directive for the parser to buffer instead of stream, to help authors avoid rendering at intermediate points... But that needs some incubation
10:33	<Noam Rosenthal>	https://docs.google.com/spreadsheets/d/1VZDB3BA-G5VZpHdpYAEczVB1u2iEkN2YC-6dZdNuyBU/edit?gid=0#gid=0 xml, php, ra-page, xpacket, if, echo, import appear in more than 0.001% of main document responses in January 2026. Only <?xml and <?php appear in more than 0.1% of pages. Up to us where we want to draw the line
10:49	<Noam Rosenthal>	<?xpacket?> is usually inside svg for embedding XMP info about the SVG. I don't think anyone relies on these becoming comments in particular
10:50	<Noam Rosenthal>	<?ra-page?> is in Japanese government websites but I don't think it's queried client-side. <?if and <?echo are leaks from PHP
10:51	<smaug>	When implementation uses separate processes per site, what is the current global supposed to be in focus() when site A calls siteBWindow.focus(). I assume it is null, given that current realm is "running execution context", or am I missing something. (This is related to the focus-without-user-activation)
10:52	<smaug>	Though, even currently, similar question applies to window.close() and its use of incumbent global.
11:03	<Noam Rosenthal>	Isn't it a `SecurityError`?
11:03	<Noam Rosenthal>	It's defined as a cross-origin property on WindowProxy
11:03	<smaug>	I mean, you can call siteBWindow.focus()
11:04	<smaug>	right
11:06	<smaug>	But does something actually define how this is supposed to work? I mean other than things like this. I could very well miss something here.
11:07	<Noam Rosenthal>	https://html.spec.whatwg.org/multipage/nav-history-apis.html#integration-with-idl:crossoriginproperties-(-o-) ?
11:08	<Noam Rosenthal>	oh this means that it is callable I guess
11:09	<smaug>	window.close() works fine for cross site
11:14	<Noam Rosenthal>	right, my mistake. seems like in blink it queries the incumbent https://source.chromium.org/chromium/chromium/src/+/main:third_party/blink/renderer/core/frame/dom_window.cc;l=704?q=dom_window.c&ss=chromium%2Fchromium%2Fsrc
11:15	<smaug>	What might incumbent be in cross-site case? And why ? 🙂
11:16	<smaug>	(but incumbent global thingie has the backup stack, so current global might be even different from it)
12:55	<annevk>	Noam Rosenthal: I would draw a very conservative line as none of those names seem particularly important. And I think I would include xml-stylesheet because it currently doesn't work and would enable injection attacks if it started working.
13:22	<Noam Rosenthal>	Yea xml, xml-stylesheet and maybe php seem sufficient
13:28	<annevk>	Noam Rosenthal: with conservative I meant including all the things you spot, but I think we're willing to try it either way. 😊
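A sketch of the blocklist approach, using the more conservative line (every target spotted in the January 2026 data plus xml-stylesheet). The constant and function names are hypothetical, and case-insensitive matching is an assumption that was not discussed.

```python
# Targets that would stay bogus comments forever under the conservative option.
# The list comes from the names mentioned in the discussion; where to draw the
# line is still open.
BOGUS_COMMENT_TARGETS = frozenset(
    {"xml", "xml-stylesheet", "php", "ra-page", "xpacket", "if", "echo", "import"}
)


def parses_as_pi(target: str) -> bool:
    """Return True if '<?target ...>' would become a ProcessingInstruction.

    Assumption: targets are compared case-insensitively.
    """
    return target.lower() not in BOGUS_COMMENT_TARGETS
```

The allowlist rollout mentioned earlier would be the same check inverted: a set of known targets that parse as PIs, with everything else staying a bogus comment.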
13:58	<foolip>	annevk: good point about `<?xml-stylesheet?>`. I definitely don't think we should make those work in HTML, and maybe keeping them as bogus comments is the way. If parsed into PIs, then we'd have to think about what they do if moved into XML documents...
14:00	<foolip>	On pseudo-attributes, I think this is a decision we actually need to make up front, because if we want to use the syntax `<?start mything?>`, `mything` looks an awful lot like a boolean attribute, and I think it would be super weird to later have the mix `<?start mything more="stuff"?>` where you can't reverse the order.
14:00	<sideshowbarker>	I see the `<?ra-page?>` thing is apparently from http://developer.symmetric.jp/roundabout/faq/000877.html — some product named Roundabout that seems to have stopped being marketed more than 10 years ago. So all that `<?ra-page?>` is legacy stuff; there’s never going to ever end up being more of it.
14:04	<annevk>	foolip: agreed, as I said above that would be a good first step here: "So doing the parser changes to enable PIs in HTML and figuring out the syntax xml-stylesheet PIs use in a bit more detail."
14:04	<foolip>	Yes, that's what I want to try now.
14:05	<foolip>	Unfortunately, if we want to follow the existing <? syntax, I think the more verbose `<?start name="mything"?>` is necessary.
14:05	<annevk>	foolip: as a baseline, yes. But I think we're still XML-compatible if we extend it.
14:07	<annevk>	One wrinkle might be whether we want to support `<?xml-stylesheet href=foo.css?>` in XML, but I think that would be okay.
14:08	<foolip>	If we go with an attributes model, then we'd obviously serialize it into something that works in XML even if HTML doesn't require quotes and whatnot.
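One way to picture the serialization side: whatever the HTML parser accepts, the output always uses quoted pseudo-attributes so it still works in XML. This is a sketch under that assumption; `serialize_pseudo_attributes` is a hypothetical name and the escaping choices are illustrative.

```python
def serialize_pseudo_attributes(attrs: dict) -> str:
    """Serialize name/value pairs as XML-compatible pseudo-attributes.

    Values are always double-quoted. '&', '<', '>' and '"' are escaped;
    escaping '>' also keeps '?>' from appearing inside the PI data.
    """
    def escape(value: str) -> str:
        return (
            value.replace("&", "&amp;")
            .replace("<", "&lt;")
            .replace(">", "&gt;")
            .replace('"', "&quot;")
        )

    return " ".join(f'{name}="{escape(value)}"' for name, value in attrs.items())
```

So an attribute that HTML might accept unquoted as `href=foo.css` would be written back out as `href="foo.css"`.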
14:09	<annevk>	That makes sense, but there's also the question at which layer this micro parser lives. If xml-stylesheet parsing is part of the XML parser it doesn't matter, but I suspect it's higher level?
14:10	<foolip>	Good question, I wonder if XML parser APIs expose it or if it's built on top...
14:10	<annevk>	I suspect it's on top and if so, we'd want to make that shared code somehow.
14:11	<annevk>	(The reason I suspect it's on top is because PIs don't have any knowledge about this currently.)
14:12	<foolip>	Which spec defined <?xml-stylesheet?>?
14:13	<foolip>	https://www.w3.org/TR/xml-stylesheet/
14:15	<annevk>	Yeah, in combination with https://drafts.csswg.org/cssom/#requirements-on-user-agents-implementing-the-xml-stylesheet-processing-instruction which I think I wrote (at least part of) at some point.
14:16	<foolip>	So in spec it's layered on top, and in Chromium as well, it parses the data string: https://source.chromium.org/chromium/chromium/src/+/main:third_party/blink/renderer/core/dom/processing_instruction.cc;l=114-189;drc=089ae32bf2adeb77bd0afd067c006ce02707f548
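As a rough illustration of a micro parser layered on top of the PI's data string, here is a strict sketch that accepts only quoted `name="value"` pairs, in the spirit of the xml-stylesheet syntax. It is not a port of the Chromium code or of the spec text; `parse_pseudo_attributes` is a hypothetical name, and numeric character references are deliberately left unhandled.

```python
import re
from xml.sax.saxutils import unescape

# name = "value" or name = 'value', with optional surrounding whitespace.
_PSEUDO_ATTR = re.compile(r"""\s*([^\s=]+)\s*=\s*(?:"([^"]*)"|'([^']*)')""")


def parse_pseudo_attributes(data: str):
    """Parse PI data into a dict, or return None if the data is malformed.

    Assumptions: unquoted values and value-less names are rejected, the first
    occurrence of a duplicated name wins, and only the five predefined
    entities are unescaped.
    """
    attrs = {}
    pos = 0
    while pos < len(data):
        if data[pos:].strip() == "":
            break  # only trailing whitespace left
        match = _PSEUDO_ATTR.match(data, pos)
        if match is None:
            return None
        name, double_quoted, single_quoted = match.groups()
        value = double_quoted if double_quoted is not None else single_quoted
        attrs.setdefault(name, unescape(value, {"&quot;": '"', "&apos;": "'"}))
        pos = match.end()
    return attrs
```

Under this baseline `href=foo.css` and a bare `mything` both fail to parse, which is where the questions about unquoted values and boolean-looking names come in.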
14:16	<annevk>	I was initially fairly skeptical, but I kinda like reusing PIs now. Glad hsivonen chimed in.
14:21	<annevk>	I think what we could change is that we do the "data parse" as part of allocating the PI node so we can be sure everyone gets the same thing. And then we have to decide if "data parse" is identical across HTML and XML and if we want to support syntax beyond what xml-stylesheet allows. "data serialize" should be identical across HTML and XML in any event and be compatible with xml-stylesheet.
14:22	<annevk>	Though it might be a bit more involved still as currently PIs pretty much roundtrip as-is so maybe "data serialize" is just "return input".
14:23	<annevk>	Anyway, fun project.
14:30	<Tim van der Lippe>	👋
16:10	<opavliuk>	Hey folks! Could someone help verify our WPT URL test implementation, or point me to the right person to talk to about it? For context, we’re developing an industry-grade, pure-Python, open-source implementation of the WHATWG URL specification, and we’d like to confirm that our tests align with the official ones. Any help would be greatly appreciated!
17:06	<annevk>	opavliuk: what's the question?
18:14	<opavliuk>	annevk thank you for getting back to me. We selected the following test data files from https://github.com/web-platform-tests/wpt/tree/master/url/resources: IdnaTestV2.json IdnaTestV2-removed.json percent-encoding.json setters_tests.json toascii.json urltestdata.json So my first question is: are these test data sets sufficient to verify conformance of a standalone Python (i.e., non-JavaScript, non-browser) implementation of the WHATWG URL specification?
18:20	<annevk>	I think they should be. setters_tests are somewhat specific to the JS API for URLs. Not sure if you want to imitate that, but you're welcome to of course.
18:21	<annevk>	If you're doing percent-encoding as an API, note https://github.com/whatwg/url/pull/896 as well which reduces the surface area a bit.
18:23	<annevk>	To add, these tests have also been used to prove an independent C++ implementation that's now used in Node.js. And there's implementations in several other languages as well, though I haven't kept track.
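For a non-browser implementation, driving urltestdata.json might look something like this. It assumes the file is a JSON array in which plain strings are comments and objects carry `input`, `base`, and either `failure: true` or expected fields such as `href`. The `parse` callable is a stand-in for whatever the implementation under test exposes, and only `href` is compared here.

```python
import json


def load_entries(path: str) -> list:
    """Load a WPT URL test data file (assumed to be a JSON array)."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def run_urltestdata(entries: list, parse) -> list:
    """Return (input, base, reason) tuples for entries that do not match.

    `parse(input, base)` is hypothetical: it should return the serialized
    href, or raise ValueError when parsing fails.
    """
    mismatches = []
    for entry in entries:
        if isinstance(entry, str):
            continue  # comment line in the data file
        url_input, base = entry["input"], entry.get("base")
        try:
            href = parse(url_input, base)
        except ValueError:
            if not entry.get("failure"):
                mismatches.append((url_input, base, "unexpected failure"))
            continue
        if entry.get("failure"):
            mismatches.append((url_input, base, "expected failure"))
        elif href != entry["href"]:
            mismatches.append((url_input, base, "href mismatch"))
    return mismatches
```

A fuller harness would also compare the other expected fields (protocol, host, pathname and so on) for entries that are not marked as failures.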
18:25	<opavliuk>	Great, thank you! If I understand you correctly, you’re saying that using setters_tests is optional unless we intend to closely imitate the JS API. Our intention is to satisfy the specification at the API level, while also providing a Pythonic interface alongside it.
18:29	<opavliuk>	Regarding the independent C++ implementation used in Node.js: I assume you’re referring to https://github.com/ada-url/ada If so, yes, we use it as a reference implementation 👍 The library also provides Python bindings; however, when running our test suite against them, I noticed several failures related to IDNA. That was one of the reasons we decided to start our own project, btw
18:34	<opavliuk>	One more question, if you don’t mind. Since we’re getting close to release, I wanted to double-check that our test coverage is complete. For that reason, I looked at the browser test results at https://wpt.fyi/results/url?label=experimental&label=master&aligned unfortunately, I’m not able to reliably map the tests listed in the dashboard to the test cases we generate from the JSON files. In some cases the numbers are higher, in others lower, which suggests I’m approaching this incorrectly. I’d appreciate your advice on whether it makes sense to try mapping these at all, and if so, what the correct approach would be?
18:44	<opavliuk>	my goal is simply to verify that our overall number of tests is at least in the right range, since it differs significantly across implementations. I’m essentially looking for a reliable source of truth.