05:12
<MikeSmith>
annevk: at https://encoding.spec.whatwg.org/#ref-for-ascii-code-point the Encoding spec says this:
05:12
<MikeSmith>
> These are “ASCII-incompatible” encodings and other than ISO-2022-JP, UTF-16BE, and UTF-16LE, which are unfortunately required due to deployed content, they are not supported.
05:13
<MikeSmith>
...while at https://html.spec.whatwg.org/multipage/infrastructure.html#ascii-compatible-encoding the HTML spec says this:
05:13
<MikeSmith>
> An ASCII-compatible encoding is any encoding that is not a UTF-16 encoding. [ENCODING]
05:26
<MikeSmith>
..
05:27
<MikeSmith>
So per that HTML spec language, ISO-2022-JP is an ASCII-compatible encoding
06:20
<MikeSmith>
hmm, ISO-2022-JP is actually an ASCII-compatible encoding, isn’t it?
06:21
<MikeSmith>
so that “other than ISO-2022-JP” part of the language in the Encoding spec should be dropped, shouldn’t it?
07:45
<annevk>
MikeSmith: it’s not due to the escapes
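[Note: a minimal sketch of the "escapes" point, assuming Python's built-in `iso2022_jp` codec is close enough to the Encoding Standard's ISO-2022-JP for illustration; the two are not guaranteed to agree byte for byte. After an escape sequence, bytes in the ASCII range no longer stand for ASCII code points.]

```python
# Illustration only: Python's "iso2022_jp" codec is used as a stand-in for
# the Encoding Standard's ISO-2022-JP (an assumption, not a guarantee).
text = "日本語"
encoded = text.encode("iso2022_jp")

# Every byte of the encoded form is in the ASCII range (below 0x80)...
all_ascii_range = all(b < 0x80 for b in encoded)

# ...but reading those bytes as ASCII does not give the original text back,
# because the bytes between the escape sequences stand for JIS X 0208
# characters rather than ASCII code points.
misread = encoded.decode("ascii")

print(all_ascii_range)
print(misread == text)
print(encoded.decode("iso2022_jp") == text)
```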
07:45
<annevk>
MikeSmith: we should prolly harmonize that language in HTML though
07:51
<MikeSmith>
annevk: Yeah I am asking because we have some code in the validator.nu parser that does a “is ASCII-compatible” check
07:51
<MikeSmith>
but I think it’s based on old spec language
07:52
<MikeSmith>
I think the current equivalent in the HTML spec is just the explicit checks for UTF-16
07:53
<MikeSmith>
anyway, I need to check that the code is actually doing something that current spec requires (and not something that it used to require and doesn’t now)
08:40
<annevk>
MikeSmith: HTML should just drop ASCII-compatible at this point; not sure why we kept it when we added UTF-16 encoding as a thing
08:42
<annevk>
MikeSmith: and I guess HTML's "UTF-16 encoding" could move to Encoding, but would like to land the refactoring PR for Encoding first
09:22
<MikeSmith>
annevk: refactoring PR is the “Rename Encoding's "streams" to "I/O queues"” PR?
09:23
<annevk>
MikeSmith: yeah
09:23
<MikeSmith>
k
09:24
<MikeSmith>
annevk: by the way, the specific check in the validator.nu code that I’m wondering about does this:
09:24
<annevk>
andreubotella: btw, realized that Domenic is out this week so do you want to wait or potentially do some tidying up later?
09:25
<MikeSmith>
> The encoding “foo” is not an ASCII superset and, therefore, cannot be used in an internal encoding declaration. Continuing the sniffing algorithm
09:25
<MikeSmith>
..in the meta-scan code
09:26
<MikeSmith>
actually, it’s doing the same thing for the fully-parsed case too
09:27
<MikeSmith>
> Internal encoding declaration specified “foo”, which is not an ASCII superset. Not changing the encoding.
09:29
<MikeSmith>
since there’s earlier code that explicitly checks for UTF-16, then as far as I can see, that “not an ASCII superset. Not changing the encoding” would only get reached if the encoding is ISO-2022-JP and if ISO-2022-JP is considered to not be an ASCII superset
09:31
<MikeSmith>
ah
09:31
<MikeSmith>
that is this:
09:31
<MikeSmith>
> If the encoding that is already being used to interpret the input stream is a UTF-16 encoding, then set the confidence to certain and return. The new encoding is ignored
09:32
<MikeSmith>
...except that the spec says to do that ignore-and-return only for UTF-16 encodings explicitly (not for “not an ASCII superset” encodings)
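[Note: a rough sketch of the check quoted above, covering only that one step. The function name and the set are hypothetical, and treating "UTF-16 encoding" as exactly UTF-16BE or UTF-16LE is an assumption about the HTML spec's definition. It is not the validator.nu code.]

```python
# Assumption: a "UTF-16 encoding" is UTF-16BE or UTF-16LE.
UTF16_ENCODINGS = frozenset({"UTF-16BE", "UTF-16LE"})


def new_encoding_is_ignored(current_encoding: str) -> bool:
    """Hypothetical helper for the quoted step only.

    Returns True when the encoding already in use is a UTF-16 encoding,
    i.e. the case where the declared encoding is ignored and the
    confidence becomes certain. There is no separate "not an ASCII
    superset" test here.
    """
    return current_encoding in UTF16_ENCODINGS


print(new_encoding_is_ignored("UTF-16LE"))
print(new_encoding_is_ignored("ISO-2022-JP"))
```

Under this reading, ISO-2022-JP as the current encoding would not trigger the early return, which is the mismatch with the “not an ASCII superset” branch described above.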
09:34
<annevk>
Yeah, note that the specification has seen some refactoring already
09:38
<annevk>
MikeSmith: https://github.com/whatwg/html/commit/a73180679a40fce96b8e8fb6dfa5815a9bce30eb is probably of interest
09:41
MikeSmith
looks
09:41
<MikeSmith>
annevk: ah yeah that’s it
09:41
<MikeSmith>
2015
09:42
<MikeSmith>
I am kind of surprised how far out of conformance the validator.nu Java code is with the spec
09:43
<MikeSmith>
I mean specifically the encodings-handling code
09:44
<MikeSmith>
since it’s used for Firefox too, I would expect that’d necessarily mean that Firefox was also way out of conformance with the spec as far as encodings handling
09:45
<annevk>
MikeSmith: is it actually non-compliant though? Only checking for UTF-16 seems correct
09:46
<MikeSmith>
that is just one place I have found where the Java code is non-conforming
09:46
<annevk>
To stress the point a bit, the Encoding Standard's definition of ASCII-incompatible is completely non-normative
09:46
<MikeSmith>
OK
09:47
annevk
wonders if the big OK represents an annoyed MikeSmith 😊
09:47
<MikeSmith>
no, no — not annoyed at all
09:48
<MikeSmith>
anyway, another place that the Java code does not match the spec is that it implements the Charset Alias Matching thing rather than just trim-leading-trailing whitespace
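[Note: a simplified sketch of why the two approaches differ. The label table is a tiny hypothetical subset, `strict_lookup` assumes the label is trimmed of ASCII whitespace and compared ASCII case-insensitively, and `loose_lookup` is a deliberately reduced stand-in for loose alias matching (it only ignores punctuation and case), not the full Charset Alias Matching rules.]

```python
import re
from typing import Optional

# Hypothetical, heavily abridged label table for illustration.
LABELS = {"utf-8": "UTF-8", "utf8": "UTF-8", "iso-2022-jp": "ISO-2022-JP"}

ASCII_WHITESPACE = "\t\n\f\r "


def strict_lookup(label: str) -> Optional[str]:
    # Trim leading/trailing ASCII whitespace, then compare lowercased.
    return LABELS.get(label.strip(ASCII_WHITESPACE).lower())


def loose_key(label: str) -> str:
    # Simplified loose matching: drop everything except ASCII letters
    # and digits, then lowercase.
    return re.sub(r"[^A-Za-z0-9]", "", label).lower()


LOOSE_LABELS = {loose_key(k): v for k, v in LABELS.items()}


def loose_lookup(label: str) -> Optional[str]:
    return LOOSE_LABELS.get(loose_key(label))


print(strict_lookup("  UTF-8\n"))
print(strict_lookup("utf_8"))
print(loose_lookup("utf_8"))
```

With this table, a label such as `utf_8` is rejected by the trim-only lookup but accepted by the loose one, so a parser using loose matching would accept labels that a trim-only parser rejects.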
09:49
<MikeSmith>
so I am kind of beginning to suspect that this is a part of the Java source that doesn’t actually get used in the Firefox code
09:50
<annevk>
oh yeah, that's bad
09:50
<annevk>
Pretty sure that's not in Firefox indeed
09:50
<annevk>
I wonder how many more times I will see Charset Alias Matching referenced in my life
09:50
<MikeSmith>
yeah, I think Henri must have separate C++ source for this
09:50
<MikeSmith>
haha
09:51
<MikeSmith>
more than you would like, I’m sure
09:52
<MikeSmith>
oh, actually I already know one specific place where the Firefox code does something very different from the Java code: the "replacement" encoding name/label
09:52
<MikeSmith>
there is zero code in the Java sources for dealing with the "replacement" encoding
09:52
<MikeSmith>
...yet Firefox handles it per-spec
09:53
<annevk>
R.I.P. He rid the web from Charset Alias Matching. OK chap.
09:57
<MikeSmith>
hahah
10:05
<andreubotella>
annevk: oh, I didn't know that. Let's merge now and fix later, then
10:07
<annevk>
andreubotella: sounds good, doing a final round of nits now
10:07
<andreubotella>
👍
12:08
<noamr>
annevk: hi, I've updated https://github.com/whatwg/html/pull/5574 to account for same-origin concerns as discussed.
20:28
<EveryOS>
Today I posted to the wicg discourse 1400 words worth of the most stupid, unrealistic idea I've ever had. At least it has not been deleted, so that's a plus...