00:30 | <akaster> | Is there any appetite to revisit what origins are supposed to be exposed via document.ancestorOrigins? We've got an open PR to implement it in Ladybird, but I'm not sure whether there was any consensus as to what it should be doing from https://github.com/whatwg/html/issues/1918. Chrome and Safari seem to implement it, while Firefox does not |
00:31 | <akaster> | er.. location.ancestorOrigins |
00:36 | <sideshowbarker> | akaster: One thing to consider is: Somebody from the Ladybird project who’s familiar with that Ladybird PR could get it on the agenda for the next WHATNOT meeting/call by adding a comment to https://github.com/whatwg/html/issues/10471. That next call is on July 18th at 9am US/West. But if nobody else from Ladybird can be on the call, I can read up on the PR and could be on myself to talk about it. |
00:41 | <akaster> | Hmm. Sure. I've been meaning to figure out how to get us more involved in the standards processes anyway. I think we have quite a few open WHATWG-related issues lying around that might be worth aggregating into a list in our own issue tracker as well. |
01:35 | <sideshowbarker> | akaster: If you want, I can make time to help with either or both of those (figuring out how to get more involved in standards processes, and aggregating the relevant issues). And I’d be happy to do it. I’m anyway looking for more ways I could contribute to the project — and it’d be a good fit for me, since I’m already pretty involved in the WHATWG. (And would also give me another reason to procrastinate on debugging https://github.com/LadybirdBrowser/ladybird/issues/75 😆) |
07:10 | <annevk> | akaster: I don't think there's an update on ancestorOrigins , but I think bz's concern around it exposing too much information still holds. There's talk about a header-based version of that feature that does a lot better in terms of information exposure: https://github.com/w3c/webappsec-fetch-metadata/issues/56 |
08:10 | <lynko> | Hi, I was contemplating HTML entities and I realized that there are two entities, &nLt; (≪⃒) and &nGt; (≫⃒), whose UTF-8 expansions are longer than their ASCII representations (six bytes versus five bytes). Only these entities have this property. I realized this while writing an HTML parser in C, using constant string views to parse a document with no allocations and no copies. It seems to me that, in UTF-8, these entities and the mandate to replace U+0000 with U+FFFD are the only things preventing me from decoding inline HTML text in-place by mutating the buffer. I punt the replacement of nul bytes to the user, but because of &nLt; and &nGt;, inline text is 20% longer in the worst case after expansion. Am I crazy, or is this a serious limitation? HTML is so close to being parseable in-place. I don't want to jump to the conclusion that these entities should be deprecated, but there would be a benefit. I'm tempted to ignore &nLt; and &nGt; just for this reason! |
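The byte-length claim above can be checked mechanically. The following is a minimal sketch, not lynko's code: it assumes Python's standard `html.entities.html5` table mirrors the HTML named character references, and `growing_entities` is a hypothetical helper name introduced here for illustration.

```python
from html.entities import html5


def growing_entities():
    """Return (name, source_bytes, expanded_bytes) for every named reference
    whose UTF-8 expansion is longer than the '&name' text it replaces."""
    result = []
    for name, expansion in html5.items():
        # Keys in html5 already carry the trailing ';' where it is required,
        # so the source text is just '&' followed by the key.
        source_len = len("&" + name)
        expanded_len = len(expansion.encode("utf-8"))
        if expanded_len > source_len:
            result.append((name, source_len, expanded_len))
    return sorted(result)


for name, src, out in growing_entities():
    print(f"&{name} {src} bytes -> {out} bytes")
```

Any name this prints is one that would block a purely in-place UTF-8 expansion, since the output would overrun the reference it replaces.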
08:34 | <annevk> | Domenic: if you have a couple minutes could you look at https://github.com/whatwg/mimesniff/pull/192? I'd like to land it |
08:36 | <annevk> | lynko: don't you have the same problem with decoding bytes to text? E.g., 0xFF has to become U+FFFD too. |
08:42 | <annevk> | How do you avoid allocations for creating nodes and such? |
08:45 | <lynko> | There's one flat inout array for nodes, if the parser runs out of room it stops writing but still returns the number of nodes that were parsed. If your buffer was already big enough then it happens in one pass and doesn't trigger a reallocation, and even if it does, you know exactly how much room you need. The parser also always produces a valid hierarchy even if it gets cut off partway through |
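The caller-supplied node array described above resembles the familiar "write what fits, report what was needed" pattern. A minimal sketch under that assumption follows; `parse_into` and the list of tag names standing in for parsed nodes are hypothetical, not lynko's actual API.

```python
def parse_into(tokens, nodes):
    """Write at most len(nodes) entries into the caller's array and return
    how many entries the whole input needs (assumed behaviour)."""
    needed = 0
    for tok in tokens:
        if needed < len(nodes):
            nodes[needed] = tok  # room left: store the node
        needed += 1              # always keep counting, even when full
    return needed


# Stand-in for a token stream; a real parser would scan the source buffer.
tokens = ["html", "head", "body", "p", "#text"]

nodes = [None] * 2                    # deliberately too small
needed = parse_into(tokens, nodes)
if needed > len(nodes):
    nodes = [None] * needed           # one reallocation of exactly the right size
    needed = parse_into(tokens, nodes)
```

With a buffer that is already large enough, the first call is the only pass and nothing is reallocated.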
08:49 | <annevk> | I see, but that also means you have to keep the entire input file in memory as well, right? For a markup-heavy document I wonder if that's still going to be beneficial. But it's interesting for sure. |
08:50 | <annevk> | If anyone, hsivonen might have some thoughts about this, but not sure if he's around. |
08:52 | <lynko> | It's kind of a specific use case, but I personally don't anticipate the need to parse files larger than I can store in memory. I do have some ideas about extending the API for streaming purposes with similar in-place properties... but it's beside the point |
08:52 | <hsivonen> | I haven't previously noticed this property of the entity names, but I have noticed that replacing U+0000 with U+FFFD is rather unfortunate from the UTF-8 perspective. |
08:52 | <lynko> | The answer is wilful violation :) |
08:54 | <hsivonen> | lynko: Does modifying the buffer in place really help? That is, don't you need an API that can report the content of a text node as multiple API chunks anyway? Once you have that, referring to static memory that contains the entity expansion is workable. |
08:55 | <hsivonen> | lynko: Can your tree builder side usefully hold onto text nodes that point to source data? Won't that mean retaining the buffer space for all the tags at presumably high cost? |
08:57 | <hsivonen> | lynko: also, when entities resolve to shorter output, won't you have quadratic memmoves that defeat the benefit of avoiding copies? |
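One common way to sidestep the quadratic-memmove concern is a single pass with separate read and write cursors, which works whenever no expansion is longer than the reference it replaces. The sketch below illustrates that general technique only; it is not the parser discussed here, `decode_in_place` is a hypothetical name, and the three-entry table is a tiny illustrative subset.

```python
# Illustrative subset; every expansion here is shorter than its reference.
ENTITIES = {b"&amp;": b"&", b"&lt;": b"<", b"&gt;": b">"}


def decode_in_place(buf: bytearray) -> int:
    """Expand known references inside buf and return the decoded length.
    The write cursor never passes the read cursor, so each byte moves at
    most once and no memmove of the tail is needed."""
    read = write = 0
    n = len(buf)
    while read < n:
        if buf[read] == 0x26:  # '&'
            # Look for ';' in a short window (assumed bound for this subset).
            end = buf.find(b";", read, min(n, read + 8))
            if end != -1:
                expansion = ENTITIES.get(bytes(buf[read:end + 1]))
                if expansion is not None:
                    # Same-length slice assignment: the buffer is not resized.
                    buf[write:write + len(expansion)] = expansion
                    write += len(expansion)
                    read = end + 1
                    continue
        buf[write] = buf[read]  # ordinary byte: slide it down
        write += 1
        read += 1
    return write


buf = bytearray(b"a &lt; b &amp;&amp; c &gt; d &bogus; e")
length = decode_in_place(buf)
print(bytes(buf[:length]))
```

A reference whose expansion is longer than its source text would make the write cursor overtake the read cursor, which is exactly where this scheme stops being safe.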
09:05 | <hsivonen> | I see. I'm looking forward to seeing the results with the willful violation. I don't expect us to un-spec the two entities that have been there for a long time, despite them being niche, though. At least not without a use counter. |
09:08 | <annevk> | Also note that parsers in browser engines store local names (of most elements and attributes) as atomized strings. It'd be interesting to see the memory and performance differences though. |
09:15 | <annevk> | Domenic: thanks, follow-up request: https://github.com/web-platform-tests/wpt/pull/47002 |
09:29 | <lynko> | One thing to notice is that if the input isn't held in memory, all significant text has to be copied no matter what. Hopefully a document is at least 90% significant. Standard tags only have to be scanned once to get their type. In place parsing is zero-copy, entity expansion is low-copy (guaranteed to be fewer copies than just copying all the significant text all the time), and entity expansion could be in-place except for the aforementioned entities. I think this is a legitimate use case for HTML that could broaden its applicability. |
09:43 | <lynko> | ...Another thing to notice, which is a point against me, is that real-world documents are often heavily indented, easily over 50% insignificant whitespace... |
09:47 | <nicolo-ribaudo> | Hey, I'm going to propose some changes to how indirect If anybody has opinions about it, please leave a comment on that GitHub issue :) |
11:08 | <annevk> | akaster: seems fine to bring up to see if anyone is interested in working on it |
11:13 | <annevk> | nicolo-ribaudo: you don't actually say what Safari does |
11:25 | <nicolo-ribaudo> | I think Safari uses the base URL of the entrypoint of the module graph for eval / new Function, and the base URL of the realm/document for setTimeout. It's the only way I can explain the behaviour I see in https://github.com/nicolo-ribaudo/function-dynamic-scoping?tab=readme-ov-file#notes |
11:39 | <annevk> | Oh wow. |
16:12 | <annevk> | Seems to be tracked in https://github.com/whatwg/html/issues/10478 btw |