04:20
<devsrealmguy>
Hi guys, is this room appropriate to ask questions regarding the HTML spec?
04:21
<devsrealmguy>
I'll ask for now, but feel free to delete it if it's not allowed, so here is my question:
04:29
<devsrealmguy>
In the HTML parsing section, specifically the "Named character reference state", we have the following info:
> Consume the maximum number of characters possible, where the consumed characters are one of the identifiers in the first column of the named character references table. Append each character to the temporary buffer when it's consumed.
>
> If there is a match:
> If the character reference was consumed as part of an attribute, and the last character matched is not a U+003B SEMICOLON character (;), and the next input character is either a U+003D EQUALS SIGN character (=) or an ASCII alphanumeric, then, for historical reasons, flush code points consumed as a character reference and switch to the return state.
>
> Otherwise:
> If the last character matched is not a U+003B SEMICOLON character (;), then this is a missing-semicolon-after-character-reference parse error. Set the temporary buffer to the empty string. Append one or two characters corresponding to the character reference name (as given by the second column of the named character references table) to the temporary buffer. Flush code points consumed as a character reference. Switch to the return state.
04:33
<devsrealmguy>
At what point do I decide to stop consuming? It's not clear. I know I can look ahead and stop matching when I encounter a semicolon, but the spec doesn't say that.
04:35
<sideshowbarker>
devsrealmguy: I think you basically stop consuming when there’s no match
04:35
<sideshowbarker>
no substring match
04:38
<sideshowbarker>
so if you get &h, keep consuming, &he, keep consuming, &hel, &hell, &helli, still consuming
04:39
<sideshowbarker>
because those all are substring matches of valid character references
04:40
<sideshowbarker>
but if you hit &helliq, stop consuming - because that is not a substring match of any valid character reference
04:41
<devsrealmguy>
Thanks for replying, I appreciate it a lot. I'm on mobile, so it's really hard to type, and my sight is really poor, so sorry if my grammar is wrong. And yes, that is exactly what I'm doing. What if you have &not and &notin?
04:44
<devsrealmguy>
> but if you hit `&helliq`, stop consuming - because that is not a substring match of any valid character reference

Brilliant, I was looking to stop consuming when I encounter a semicolon, but you just nailed it.
04:54
<devsrealmguy>
Never mind the &notit and &notin question. Got it, thanks 😊
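The longest-match rule sideshowbarker describes can be sketched in Python. Here "substring match" means the consumed text is still a prefix of some name in the table. The matcher keeps consuming while that holds and remembers the last complete name it passed, so `&notin;` matches `&notin;`, while `&notit;` falls back to `&not` and leaves `it;` unconsumed. This is a minimal sketch, assuming a hypothetical six-entry subset of the named character references table (`TABLE`) and a hypothetical helper, `match_named_reference`; a real tokenizer would more likely walk a trie built from the full table than scan every name.

```python
from typing import Optional, Tuple

# Hypothetical subset of the named character references table
# (first column -> second column). The real table is far larger.
TABLE = {
    "&amp": "&",
    "&amp;": "&",
    "&hellip;": "\u2026",
    "&not": "\u00ac",
    "&not;": "\u00ac",
    "&notin;": "\u2209",
}


def match_named_reference(text: str, start: int) -> Tuple[Optional[str], int]:
    """Longest-match lookup starting at the '&' at text[start].

    Returns (matched name, index just past it), or (None, start) if no
    name in TABLE matches. Characters after the match are not consumed.
    """
    best_name: Optional[str] = None
    best_end = start
    end = start + 1
    while end <= len(text):
        consumed = text[start:end]
        # Stop once the consumed text can no longer grow into any name.
        if not any(name.startswith(consumed) for name in TABLE):
            break
        # Remember the longest complete name seen so far.
        if consumed in TABLE:
            best_name, best_end = consumed, end
        end += 1
    return best_name, best_end


print(match_named_reference("&helliq", 0))  # (None, 0)
print(match_named_reference("&notin;", 0))  # ('&notin;', 7)
print(match_named_reference("&notit;", 0))  # ('&not', 4): "it;" is left over
```

Because `&not` appears in the table without a semicolon, `&notin` with no trailing semicolon also resolves to `&not`, leaving `in` as plain text.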
07:29
<Noam Rosenthal>
morning annevk, I think https://github.com/whatwg/fetch/pull/1311 is reviewable again, and I have two other pending ones (preload & controller), when you get the chance to look
14:48
<devsrealmguy>
I think I'm done covering the tokenization states; the only one that is a bit difficult is the "Named character reference state". Just to confirm I didn't mess things up, this part says:
1) If the last character matched is not a U+003B SEMICOLON character (;), then this is a missing-semicolon-after-character-reference parse error.
2) Set the temporary buffer to the empty string. Append one or two characters corresponding to the character reference name (as given by the second column of the named character references table) to the temporary buffer.
An example of point 1 is "James &amp his brother went for lunch". The &amp is clearly missing a semicolon, so is point 2 saying to correct it to &amp; (with the semicolon), or to replace it with the code point U+0026, since that is what the second column of the table gives? I was curious enough to check how other libraries do it: some convert the code point to a character, some append the semicolon, and many ignore it. I can see why it trips people up. Where should I sail to?
14:57
<Ms2ger 💉💉>
Step 1 is just "the author screwed up" - you should ignore the step completely unless you surface parse errors somehow
14:59
<Ms2ger 💉💉>
&amp without the semicolon is in the table, so you just parsed an "&"
15:00
<Ms2ger 💉💉>
Did that help or just confuse more? :)
15:00
<devsrealmguy>
You mean, I should add the "&" to the temporary buffer?
15:02
<Ms2ger 💉💉>
Yes, it seems like that's the result
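Ms2ger's reading can be sketched as the "If there is a match" branch quoted earlier. Point 1 only records a parse error, and point 2 replaces the whole temporary buffer (`&amp`) with the second-column value, so the output is a bare "&" and no semicolon is added. This is a minimal sketch, not code from any library: `resolve_named_match`, `is_ascii_alphanumeric`, the four-entry `TABLE`, and returning parse errors as a list are all hypothetical, and the name is assumed to have already been matched.

```python
from typing import List, Optional, Tuple

# Hypothetical subset of the named character references table.
TABLE = {"&amp": "&", "&amp;": "&", "&not": "\u00ac", "&not;": "\u00ac"}


def is_ascii_alphanumeric(ch: str) -> bool:
    return ch.isascii() and ch.isalnum()


def resolve_named_match(
    name: str, next_char: Optional[str], in_attribute: bool
) -> Tuple[str, List[str]]:
    """Apply the 'If there is a match' steps to an already-matched name.

    next_char is the next input character (None at end of input).
    Returns the code points to flush and any parse errors raised.
    """
    errors: List[str] = []
    # Historical quirk: in attribute values such as href="?a=1&amp=2",
    # an unterminated name followed by '=' or an ASCII alphanumeric is
    # flushed as the literal text that was consumed.
    if (
        in_attribute
        and not name.endswith(";")
        and next_char is not None
        and (next_char == "=" or is_ascii_alphanumeric(next_char))
    ):
        return name, errors
    # Point 1: only a parse error; it does not change the output.
    if not name.endswith(";"):
        errors.append("missing-semicolon-after-character-reference")
    # Point 2: empty the temporary buffer, then append the second-column value.
    return TABLE[name], errors


print(resolve_named_match("&amp", " ", False))
# ('&', ['missing-semicolon-after-character-reference'])
print(resolve_named_match("&amp", "=", True))  # ('&amp', [])
```

The returned string is what gets flushed: appended to the current attribute's value when inside an attribute, and emitted as character tokens otherwise.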
15:02
Ms2ger 💉💉
was unaware of this particular ugly corner of the parser
15:03
<devsrealmguy>
Okay, thanks. I spent more time on just that state than on all the other states combined 😂
17:02
<Domenic>
At some point we should blog about the fact that we've added/are adding a bunch of new standards: Web IDL, Test Utils, WebSockets, and File System
17:02
<Domenic>
I guess I will try to draft something quick
19:28
<Domenic>
Posted https://blog.whatwg.org/new-living-standards-2021 and tweeted https://twitter.com/WHATWG/status/1466126880344616966
22:38
<Mattias Buelens>
I seem to remember there was a tool that lists all Web IDL interfaces, where they are defined, and where they are used in other specifications. Does anyone know what it was called?
22:38
<mek>
https://dontcallmedom.github.io/webidlpedia/index.html ?
22:39
<Mattias Buelens>
Yep, that's it. Thanks! 😁
22:40
<Andreu Botella (he/they)>
There's also https://github.com/w3c/webref/tree/main/ed/idl