00:30
<akaster>
Is there any appetite to revisit which origins are supposed to be exposed via document.ancestorOrigins? We've got an open PR to implement it in Ladybird, but I'm not sure whether https://github.com/whatwg/html/issues/1918 reached any consensus on what it should be doing. Chrome and Safari seem to implement it, while Firefox does not.
00:31
<akaster>
er.. location.ancestorOrigins
00:36
<sideshowbarker>

akaster: One thing to consider is: Somebody from the Ladybird project who’s familiar with that Ladybird PR could get it on the agenda for the next WHATNOT meeting/call by adding a comment to https://github.com/whatwg/html/issues/10471.

That next call is on July 18th at 9am US/West.

But if nobody else from Ladybird can be on the call, I can read up on the PR and could be on myself to talk about it.

00:41
<akaster>
Hmm. Sure. I've been meaning to figure out how to get us more involved in the standards processes anyway. I think we have quite a few open WHATWG-related issues lying around that might be worth aggregating into a list in our own issue tracker as well.
01:35
<sideshowbarker>
akaster: If you want, I can make time to help with either or both of those (figuring out how to get more involved in standards processes, and aggregating the relevant issues). And I’d be happy to do it. I’m anyway looking for more ways I could contribute to the project — and it’d be a good fit for me, since I’m already pretty involved in the WHATWG. (And would also give me another reason to procrastinate on debugging https://github.com/LadybirdBrowser/ladybird/issues/75 😆)
07:10
<annevk>
akaster: I don't think there's an update on ancestorOrigins, but I think bz's concern around it exposing too much information still holds. There's talk about a header-based version of that feature that does a lot better in terms of information exposure: https://github.com/w3c/webappsec-fetch-metadata/issues/56
08:10
<lynko>
Hi, I was contemplating HTML entities and I realized that there are two entities, &nLt; and &nGt;, whose UTF-8 expansions are longer than their ASCII representations (six bytes versus five bytes). Only these entities have this property. I realized this while writing an HTML parser in C, using constant string views to parse a document with no allocations and no copies. It seems to me that, in UTF-8, these entities and the mandate to replace U+0000 with U+FFFD are the only things preventing me from decoding inline HTML text in-place by mutating the buffer. I punt the replacement of nul bytes to the user, but because of &nLt; and &nGt;, inline text is 20% longer in the worst case after expansion. Am I crazy, or is this a serious limitation? HTML is so close to being parseable in-place. I don't want to jump to the conclusion that these entities should be deprecated, but there would be a benefit. I'm tempted to ignore &nLt; and &nGt; just for this reason!
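The byte counts behind that observation can be checked with a minimal C sketch. The helper name `expansion_growth` and the two constants are hypothetical, introduced only for illustration; the code points are the ones the named character reference table assigns to these two entities (U+226A U+20D2 and U+226B U+20D2).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* &nLt; expands to U+226A U+20D2 and &nGt; to U+226B U+20D2.
 * Each code point takes three bytes in UTF-8, so six bytes in total. */
static const char NLT_UTF8[] = "\xE2\x89\xAA\xE2\x83\x92";
static const char NGT_UTF8[] = "\xE2\x89\xAB\xE2\x83\x92";

/* Hypothetical helper: number of bytes an in-place decoder would gain
 * (positive) or free up (negative) when replacing `reference`, as written
 * in the source, with its UTF-8 `expansion`. */
static long expansion_growth(const char *reference, const char *expansion)
{
    assert(reference != NULL && expansion != NULL);
    return (long)strlen(expansion) - (long)strlen(reference);
}
```

A positive result means the decoded text no longer fits in the space the reference occupied, which is what rules out a purely in-place rewrite for these two names; a reference such as `&amp;` shrinks by four bytes instead.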
08:34
<annevk>
Domenic: if you have a couple minutes could you look at https://github.com/whatwg/mimesniff/pull/192? I'd like to land it
08:36
<annevk>
lynko: don't you have the same problem with decoding bytes to text? E.g., 0xFF has to become U+FFFD too.
08:39
<lynko>
lynko: don't you have the same problem with decoding bytes to text? E.g., 0xFF has to become U+FFFD too.
Sure, but that's malformed input, and I pass it on just like a nul byte (it's up to the user to decide if conversion is necessary). ≪⃒ and ≫⃒ make it a problem for correct input too.
08:42
<annevk>
How do you avoid allocations for creating nodes and such?
08:45
<lynko>
There's one flat inout array for nodes. If the parser runs out of room, it stops writing but still returns the number of nodes that were parsed. If your buffer was already big enough, then it happens in one pass and doesn't trigger a reallocation, and even if it does, you know exactly how much room you need. The parser also always produces a valid hierarchy even if it gets cut off partway through.
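One way to read that contract is the `snprintf` convention: write as many nodes as fit, but return the count the whole input needs. The C sketch below is a toy under that assumption, not lynko's actual API. `scan_nodes`, `struct node`, and `enum node_kind` are hypothetical names, and the scanner only splits tags from text without building any hierarchy.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum node_kind { NODE_TEXT, NODE_TAG };

/* A node is only a view into the caller's input buffer; nothing is copied. */
struct node {
    enum node_kind kind;
    const char *start;
    size_t length;
};

/* Hypothetical toy scanner: splits `input` into tag and text nodes.
 * Writes at most `capacity` nodes to `out`, but always returns the number
 * of nodes the whole input contains, so the caller can size a retry. */
size_t scan_nodes(const char *input, size_t len,
                  struct node *out, size_t capacity)
{
    size_t count = 0, pos = 0;
    assert(input != NULL || len == 0);
    assert(out != NULL || capacity == 0);

    while (pos < len) {
        size_t begin = pos;
        enum node_kind kind = NODE_TEXT;
        const char *close = NULL;

        if (input[pos] == '<')
            close = memchr(input + pos, '>', len - pos);

        if (close != NULL) {
            /* A tag runs from '<' through the matching '>'. */
            kind = NODE_TAG;
            pos = (size_t)(close - input) + 1;
        } else {
            /* A text run ends at the next '<' or at end of input.
             * Searching from pos + 1 guarantees forward progress even
             * for a stray '<' that is never closed. */
            const char *next = NULL;
            if (pos + 1 < len)
                next = memchr(input + pos + 1, '<', len - pos - 1);
            pos = next != NULL ? (size_t)(next - input) : len;
        }

        if (count < capacity) {
            out[count].kind = kind;
            out[count].start = input + begin;
            out[count].length = pos - begin;
        }
        count++; /* keep counting after the array is full */
    }
    return count;
}
```

A caller whose array was too small compares the return value with its capacity, grows the array once to exactly that size, and calls again; when the array was large enough, the first pass is the only pass.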
08:49
<annevk>
I see, but that also means you have to keep the entire input file in memory as well, right? For a markup-heavy document I wonder if that's still going to be beneficial. But it's interesting for sure.
08:50
<annevk>
If anyone, hsivonen might have some thoughts about this, but I'm not sure if he's around.
08:52
<lynko>
It's kind of a specific use case, but I personally don't anticipate the need to parse files larger than I can store in memory. I do have some ideas about extending the API for streaming purposes with similar in-place properties... but it's beside the point
08:52
<hsivonen>
I haven't previously noticed this property of the entity names, but I have noticed that replacing U+0000 with U+FFFD is rather unfortunate from the UTF-8 perspective.
08:52
<lynko>
The answer is wilful violation :)
08:54
<hsivonen>
lynko: Does modifying the buffer in place really help? That is, don't you need an API that can report the content of a text node as multiple API chunks anyway? Once you have that, referring to static memory that contains the entity expansion is workable.
08:55
<hsivonen>
lynko: Can your tree builder side usefully hold onto text nodes that point to source data? Won't that mean retaining the buffer space for all the tags at presumably high cost?
08:57
<hsivonen>
lynko: also, when entities resolve to shorter output, won't you have quadratic memmoves that defeat the benefit of avoiding copies?
08:59
<lynko>
lynko: Does modifying the buffer in place really help? That is, don't you need an API that can report the content of a text node as multiple API chunks anyway? Once you have that, referring to static memory that contains the entity expansion is workable.
If all I really want is decoded inline text, then I just want to compute that text. It's always a bonus not to have to find a new place to put it.
09:02
<lynko>
lynko: Can your tree builder side usefully hold onto text nodes that point to source data? Won't that mean retaining the buffer space for all the tags at presumably high cost?
Tags are 75% of a document in the worst case. I'll always have to store a tag's type (and attributes) somehow, so I just use a single string view and don't decode attributes, except for parsing validation. I'm not sure exactly how the tradeoff works out, but tag content is usually a much smaller portion... I'm guessing 10%. Another benefit besides memory locality is that this approach preserves tags' case, although I don't store views of the closing tag
09:03
<lynko>
lynko: also, when entities resolve to shorter output, won't you have quadratic memmoves that defeat the benefit of avoiding copies?
This can be done in O(n) where n is the length of the text, and it was already O(n). Also it only matters for documents with extreme amounts of entity references
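The linear-time claim follows from using two cursors instead of one `memmove` per reference. Below is a minimal C sketch under stated assumptions: `decode_in_place` and the five-entry `ENTITIES` table are hypothetical, only exact semicolon-terminated matches are recognised, and anything else (including `&nLt;` and `&nGt;`, whose expansions would not fit) is left untouched. That last choice is a deliberate departure from the HTML Standard.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct entity {
    const char *reference; /* text as written in the source, e.g. "&amp;" */
    const char *expansion; /* UTF-8 bytes it decodes to */
};

/* Illustrative subset only; the real table has over two thousand names.
 * Every expansion here is no longer than its reference. */
static const struct entity ENTITIES[] = {
    { "&amp;",  "&" },
    { "&lt;",   "<" },
    { "&gt;",   ">" },
    { "&quot;", "\"" },
    { "&nbsp;", "\xC2\xA0" },
};

/* Decodes the references above inside buf[0..len) and returns the new
 * length. Each input byte is read once and each output byte written once,
 * so the work is linear in len for a fixed-size table. */
size_t decode_in_place(char *buf, size_t len)
{
    size_t read = 0, write = 0;
    assert(buf != NULL || len == 0);

    while (read < len) {
        int matched = 0;
        if (buf[read] == '&') {
            size_t i;
            for (i = 0; i < sizeof ENTITIES / sizeof ENTITIES[0]; i++) {
                size_t ref_len = strlen(ENTITIES[i].reference);
                size_t exp_len = strlen(ENTITIES[i].expansion);
                if (ref_len <= len - read &&
                    memcmp(buf + read, ENTITIES[i].reference, ref_len) == 0) {
                    /* exp_len <= ref_len, so the write cursor never
                     * overtakes the read cursor. */
                    memcpy(buf + write, ENTITIES[i].expansion, exp_len);
                    write += exp_len;
                    read += ref_len;
                    matched = 1;
                    break;
                }
            }
        }
        if (!matched)
            buf[write++] = buf[read++]; /* ordinary byte: shift it down */
    }
    return write;
}
```

The function does not NUL-terminate the result; the returned length is the caller's new view size, which fits the string-view approach described earlier in the conversation.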
09:05
<hsivonen>
I see. I'm looking forward to seeing the results with the willful violation. I don't expect us to un-spec the two entities that have been there for a long time, despite them being niche, though. At least not without a use counter.
09:07
<lynko>
I see. I'm looking forward to seeing the results with the willful violation. I don't expect us to un-spec the two entities that have been there for a long time, despite them being niche, though. At least not without a use counter.
No no, I'm not suggesting U+FFFD should be changed. Too much momentum. And I wouldn't transmit HTML with nul bytes in it. This is for internal use, where I can optimize for correct input and retain control codes.
09:08
<annevk>
Also note that parsers in browser engines store local names (of most elements and attributes) as atomized strings. It'd be interesting to see the memory and performance differences though.
09:14
<lynko>
Also note that parsers in browser engines store local names (of most elements and attributes) as atomized strings. It'd be interesting to see the memory and performance differences though.
I doubt I could make a fair comparison. Browser engines have a lot more concerns than me, so they must have to make compromises of some sort. But you've given me a lot to think about. Short-string optimization and generating a new text storage buffer alongside the node array seems worth exploring.
09:15
<annevk>
Domenic: thanks, follow-up request: https://github.com/web-platform-tests/wpt/pull/47002
09:29
<lynko>
One thing to notice is that if the input isn't held in memory, all significant text has to be copied no matter what. Hopefully a document is at least 90% significant. Standard tags only have to be scanned once to get their type. In-place parsing is zero-copy, entity expansion is low-copy (guaranteed to be fewer copies than just copying all the significant text all the time), and entity expansion could be in-place except for the aforementioned entities. I think this is a legitimate use case for HTML that could broaden its applicability.
09:43
<lynko>
...Another thing to notice, which is a point against me, is that real-world documents are often heavily indented, easily over 50% insignificant whitespace...
09:47
<nicolo-ribaudo>

Hey, I'm going to propose some changes to how indirect eval and new Function work (https://github.com/tc39/ecma262/issues/3160, https://github.com/tc39/ecma262/pull/3374), so their behaviour only depends on their arguments and not on their caller.
Re-aligning setTimeout's behaviour with eval's behaviour will then need some changes to step 8.4.7 of https://html.spec.whatwg.org/#timer-initialisation-steps, aligning the HTML spec with what Chrome (and I think Safari) did when I tested it.

If anybody has opinions about it, please leave a comment on that GitHub issue :)

11:01
<akaster>
akaster: I don't think there's an update on ancestorOrigins, but I think bz's concern around it exposing too much information still holds. There's talk about a header-based version of that feature that does a lot better in terms of information exposure: https://github.com/w3c/webappsec-fetch-metadata/issues/56
Hm. In that case, should I retract the request to discuss it at WHATNOT this Thursday? It sounds like not implementing it, which is what Firefox does, is probably the 'best' for user privacy until the other issues are worked out.
11:08
<annevk>
akaster: seems fine to bring up to see if anyone is interested in working on it
11:13
<annevk>
nicolo-ribaudo: you don't actually say what Safari does
11:25
<nicolo-ribaudo>
I think Safari uses the base URL of the entrypoint of the module graph for eval/new Function, and the base URL of the realm/document for setTimeout. It's the only way I can explain the behaviour I see in https://github.com/nicolo-ribaudo/function-dynamic-scoping?tab=readme-ov-file#notes
11:39
<annevk>
Oh wow.
16:12
<annevk>
Seems to be tracked in https://github.com/whatwg/html/issues/10478 btw