00:16
<Domenic>
I'm OK to skip.
05:36
<Domenic>
Domenic: do the changes in https://github.com/w3c/csswg-drafts/issues/10550#issuecomment-2328247135 work for you? Any preferred text for referencing this kind of serialization from HTML?
Yes, they work! I think I suggested some reference text in the last sentence of https://github.com/w3c/csswg-drafts/issues/10550#issuecomment-2292850688 . (Although I used <a href="..."> instead of <span data-x="..."> for clarity since they don't know Wattsi over there.)
06:13
<annevk>
I'm okay to skip as well.
06:16
<annevk>
hsivonen: Adam Rice: I'd appreciate your thoughts on https://github.com/whatwg/encoding/pull/335. I'm convinced we should do GB18030-2022, but not entirely sure whether it should impact GBK or not. It seems about as incompatible for either encoding so maybe keeping them aligned is better? That would require updating the underlying table and adding 18 encoder mappings. The alternative would be not touching the table and having 18 encoder/decoder mappings solely for gb18030.
08:35
<hsivonen>
It seems bad to tweak legacy encodings, and I'm not sure how much this matters in practice. I'm quite skeptical of there being real interest these days in GB18030-the-encoding as opposed to GB18030-the-repertoire with UTF-8. Making WHATWG gbk diverge from Windows 936 does not seem nice in principle, but I could see an argument why changing WHATWG gbk might make sense in practice if any of this matters in practice. Do you know if Microsoft intends to change Windows code pages 936 and 54936?
08:40
<annevk>
hsivonen: I know ICU doesn't plan to make changes to GBK. I doubt Windows would make changes. The main reason to change GBK as well is because it already inherits most of gb18030 in the Encoding Standard. E.g., GBK uses the gb18030 decoder. Anyway, we could decide to safeguard GBK from these additional changes and defend it with test coverage, I'm just not sure it's worth it. Especially as it's mainly PUA code points.
08:40
<annevk>
And it already regressed in Chromium and WebKit for about a year without anybody complaining...
08:41
<hsivonen>
Do you mean Chromium has implemented the gb18030 label as gb18030-2022?
08:43
<annevk>
hsivonen: some variant of gb18030-2022, yes. I think the one they currently ship is not the Unicode recommendation as they just copied the code from WebKit without much discussion. And it has impacted their implementation of GBK as well (because of the Encoding Standard sharing a lot of logic between GBK and gb18030), just like it has in WebKit...
08:43
<Ms2ger>
🙈
08:45
<hsivonen>
I'm already sad that our EUC-KR isn't an exact PUA match for Windows code page 949, but my sadness about this isn't really a good technical argument.
08:47
<hsivonen>
If gbk in both WebKit and Chromium has diverged from Windows code page 936 for a year towards less PUA, I guess it's a rather convincing sign of it being Web-compatible. What happens if you try to encode the relevant PUA code points?
08:48
<annevk>
Yeah, I can't say I'm thrilled about how all this went down with some flip-flopping along the way as to how exactly GB18030-2022 is to be implemented coupled with unforeseen side effects for GBK. But it is what it is.
08:49
<annevk>
hsivonen: that currently fails. https://github.com/web-platform-tests/wpt/pull/48240 demonstrates this. The reason is that the additional encoder code points are guarded by a gb18030 check. It's like the worst solution one can imagine.
08:52
<annevk>

hsivonen: given the three lists in https://github.com/whatwg/encoding/issues/312#issuecomment-2354764499 my idea for the best solution given today's knowledge is as follows:

  1. We encode the 18 bidirectional mappings (the first set) directly in the gb18030 index.
  2. We encode the 18 from Unicode mappings (the second set) as an additional encoder step shared by GBK and gb18030.
  3. We leave the third set alone as the gb18030 decoder (and GBK decoder for that matter) already handle those as-is.
08:55
<hsivonen>

Does 'as-is' in item 3 mean doing the Unicode recommendation of decoding the 4-byte sequences (as gb18030) to non-PUA (and errors in gbk)?
08:57
<annevk>
hsivonen: the GBK decoder is the gb18030 decoder. They are currently completely identical. And those sequences already map to non-PUA in the gb18030 decoder as currently specified. (This third set doesn't actually constitute a change with respect to GB18030-2005. This is a bit confusing and did result in redundant code being written for WebKit. I mentioned it to Ken Lunde, but I'm not sure much is going to be done about it.)
08:59
<hsivonen>
Ah, right, sorry about my confusion about the gbk decoder. If the third set stays as-is in the index, will the first set end up taking precedence over the third set in the gb18030 encoder due to the existing specified search order?
09:00
<annevk>
hsivonen: if you want even more evidence, see how https://github.com/WebKit/WebKit/commit/068d177a29ad44c82c95dd204a27d4841979c301 removed those mappings without needing corresponding changes in tests.
09:01
<annevk>
hsivonen: yeah it'll take precedence because we look at the two-byte table (index gb18030) before resorting to index gb18030 ranges
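A minimal sketch of that lookup order, assuming toy stand-ins for the real tables. The names `EXTRA_ENCODER_MAP`, `INDEX_GB18030`, `INDEX_GB18030_RANGES` and `encode_code_point` are hypothetical, the single entries are illustrative, the placement of the shared encoder-only step before the index lookup is an assumption, and the special cases of the real ranges lookup are omitted:

```python
# Toy tables; the real indexes are far larger. All entries are illustrative assumptions.
EXTRA_ENCODER_MAP = {0xE78D: bytes([0xA6, 0xD9])}  # "second set": encoder-only, shared by GBK and gb18030
INDEX_GB18030 = {7182: 0xFE10}                     # "first set": pointer -> code point (two-byte table)
INDEX_GB18030_RANGES = [(0, 0x0080)]               # (pointer offset, first code point of the range)


def ranges_pointer(code_point):
    # Pick the last range whose first code point is <= code_point.
    pointer_offset, cp_offset = max(
        (entry for entry in INDEX_GB18030_RANGES if entry[1] <= code_point),
        key=lambda entry: entry[1],
    )
    return pointer_offset + code_point - cp_offset


def encode_code_point(code_point, is_gbk=False):
    """Return the encoded bytes, or None where the encoder would report an error."""
    if code_point < 0x80:
        return bytes([code_point])
    if code_point == 0xE5E5:
        return None
    if is_gbk and code_point == 0x20AC:
        return b"\x80"
    # Shared additional encoder step (assumed to run before the index lookup).
    if code_point in EXTRA_ENCODER_MAP:
        return EXTRA_ENCODER_MAP[code_point]
    # The two-byte table is consulted first, so it wins over the ranges.
    for pointer, mapped in INDEX_GB18030.items():
        if mapped == code_point:
            lead = pointer // 190 + 0x81
            trail = pointer % 190
            offset = 0x40 if trail < 0x3F else 0x41
            return bytes([lead, trail + offset])
    if is_gbk:
        return None  # GBK never produces four-byte sequences
    # Only now fall back to the four-byte ranges.
    pointer = ranges_pointer(code_point)
    byte1, pointer = divmod(pointer, 10 * 126 * 10)
    byte2, pointer = divmod(pointer, 10 * 126)
    byte3, byte4 = divmod(pointer, 10)
    return bytes([byte1 + 0x81, byte2 + 0x30, byte3 + 0x81, byte4 + 0x30])
```

In this sketch U+FE10 is also reachable through the ranges table, yet `encode_code_point(0xFE10)` returns the two-byte form because the index is searched first.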
09:03
<hsivonen>
OK. I guess it makes sense to do this, although I don't like that this will result in more caveats for me to mention every time I happen to talk about the relationship between the WHATWG encodings and Windows code pages. (But so far, it's only been about talking about them: no real-world evidence of a practical problem either way.)
09:04
<annevk>
hsivonen: maybe just memorize a URL for when people are eager for the cliff notes? :-)
09:05
<annevk>
Okay, I'll see about preparing a new PR.
09:05
<annevk>
Thanks for talking it through!
09:05
<hsivonen>
Thanks!
09:06
<hsivonen>
The URL to memorize will be https://docs.rs/encoding_rs/ . (See e.g. https://docs.rs/encoding_rs/latest/encoding_rs/static.GBK.html )
10:26
<smaug>
hmm, CloseWatchers kind of work in documents which aren't bound to a browsing context? I guess that is just a mistake (or I'm missing some check somewhere)
10:26
<smaug>
Fine to me
15:14
<zcorpan>
Domenic: ^
15:25
<Dominic Farolino>
OK who is up to date on sequential focus navigation knowledge? I'm reading https://html.spec.whatwg.org/C#sequential-navigation-search-algorithm and I think I've convinced myself that in the algorithm, there is no semantic difference between the "selection mechanism=sequential" and "selection mechanism=DOM" cases. Where am I off? Both seem to get the first (for example) sequentially focusable area in starting point's Document, after starting point, no?
16:22
<Dominic Farolino>
(I've asked this question more concretely in https://github.com/whatwg/html/pull/10632#pullrequestreview-2313203811)
16:23
<annevk>
hsivonen: Adam Rice: https://github.com/whatwg/encoding/pull/336 is ready for review now. I also filed all the implementer bugs and verified the test coverage with a WebKit patch.
16:40
<annevk>
hsivonen: FWIW, the visualization script runs as part of the build: https://github.com/whatwg/encoding/blob/main/Makefile
16:40
<hsivonen>
OK.
17:13
<Panos Astithas>
Thanks all, the meeting is now canceled.