#whatwg on 2020-09-08

05:12	<MikeSmith>	annevk: at https://encoding.spec.whatwg.org/#ref-for-ascii-code-point the Encoding spec says this:
05:12	<MikeSmith>	> These are “ASCII-incompatible” encodings and other than ISO-2022-JP, UTF-16BE, and UTF-16LE, which are unfortunately required due to deployed content, they are not supported.
05:13	<MikeSmith>	...while at https://html.spec.whatwg.org/multipage/infrastructure.html#ascii-compatible-encoding the HTML spec says this:
05:13	<MikeSmith>	> An ASCII-compatible encoding is any encoding that is not a UTF-16 encoding. [ENCODING]
05:26	<MikeSmith>	..
05:27	<MikeSmith>	So per that HTML spec language, ISO-2022-JP is an ASCII-compatible encoding
06:20	<MikeSmith>	hmm, ISO-2022-JP is actually an ASCII-compatible encoding, isn’t it?
06:21	<MikeSmith>	so that “other than ISO-2022-JP” part of the language in the Encoding spec should be dropped, shouldn’t it?
07:45	<annevk>	MikeSmith: it’s not ASCII-compatible, because of the escapes
07:45	<annevk>	MikeSmith: we should probably harmonize that language in HTML though
07:51	<MikeSmith>	annevk: Yeah I am asking because we have some code in the validator.nu parser that does an “is ASCII-compatible” check
07:51	<MikeSmith>	but I think it’s based on old spec language
07:52	<MikeSmith>	I think the current equivalent in the HTML spec is just the explicit checks for UTF-16
07:53	<MikeSmith>	anyway, I need to check that the code is actually doing something that current spec requires (and not something that it used to require and doesn’t now)
08:40	<annevk>	MikeSmith: HTML should just drop ASCII-compatible at this point; not sure why we kept it when we added UTF-16 encoding as a thing
08:42	<annevk>	MikeSmith: and I guess HTML's "UTF-16 encoding" could move to Encoding, but would like to land the refactoring PR for Encoding first
09:22	<MikeSmith>	annevk: refactoring PR is the “Rename Encoding's "streams" to "I/O queues"” PR?
09:23	<annevk>	MikeSmith: yeah
09:23	<MikeSmith>	k
09:24	<MikeSmith>	annevk: by the way, the specific check in the validator.nu code that I’m wondering about does this:
09:24	<annevk>	andreubotella: btw, realized that Domenic is out this week so do you want to wait or potentially do some tidying up later?
09:25	<MikeSmith>	> The encoding “foo” is not an ASCII superset and, therefore, cannot be used in an internal encoding declaration. Continuing the sniffing algorithm
09:25	<MikeSmith>	...in the meta-scan code
09:26	<MikeSmith>	actually, it’s doing the same thing for the fully-parsed case too
09:27	<MikeSmith>	> Internal encoding declaration specified “foo”, which is not an ASCII superset. Not changing the encoding.
09:29	<MikeSmith>	since there’s earlier code that explicitly checks for UTF-16, then as far as I can see, that “not an ASCII superset. Not changing the encoding” would only get reached if the encoding is ISO-2022-JP and if ISO-2022-JP is considered to not be an ASCII superset
09:31	<MikeSmith>	ah
09:31	<MikeSmith>	that is this:
09:31	<MikeSmith>	> If the encoding that is already being used to interpret the input stream is a UTF-16 encoding, then set the confidence to certain and return. The new encoding is ignored
09:32	<MikeSmith>	...except that the spec says to do that ignore-and-return only for UTF-16 encodings explicitly (not for “not an ASCII superset” encodings)
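
[Editor's sketch, not part of the log] The step MikeSmith quotes at 09:31 can be illustrated as below. This is a minimal sketch, not the validator.nu code or the spec's full algorithm: the function name, the return tuple, and the string-valued confidence are assumptions made for illustration. It takes "a UTF-16 encoding" to mean UTF-16BE or UTF-16LE, and it shows only the quoted early return, which keys on the current encoding being UTF-16 rather than on any "ASCII superset" test.

```python
# Hypothetical names throughout; only the UTF-16 early return is taken from the quoted spec text.
UTF_16_ENCODINGS = {"UTF-16BE", "UTF-16LE"}


def is_utf16_encoding(name: str) -> bool:
    """True if the encoding name is one of the two UTF-16 encodings."""
    return name.upper() in UTF_16_ENCODINGS


def first_step_of_change_the_encoding(current: str, new: str, confidence: str):
    """Return (encoding, confidence, done) after the quoted first step.

    If the current encoding is a UTF-16 encoding, the new encoding is
    ignored, confidence becomes "certain", and processing stops (done=True).
    Otherwise nothing changes here and later steps (not sketched) would run.
    """
    if is_utf16_encoding(current):
        return current, "certain", True
    return current, confidence, False
```

Under this sketch, a document currently decoded as ISO-2022-JP does not take the early return, which matches the observation that the spec checks only for UTF-16 at this point.
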
09:34	<annevk>	Yeah, note that the specification has seen some refactoring already
09:38	<annevk>	MikeSmith: https://github.com/whatwg/html/commit/a73180679a40fce96b8e8fb6dfa5815a9bce30eb is probably of interest
09:41	MikeSmith	looks
09:41	<MikeSmith>	annevk: ah yeah that’s it
09:41	<MikeSmith>	2015
09:42	<MikeSmith>	I am kind of surprised how far out of conformance the validator.nu Java code is with the spec
09:43	<MikeSmith>	I mean specifically the encodings-handling code
09:44	<MikeSmith>	since it’s used for Firefox too, I would expect that’d necessarily mean that Firefox was also way out of conformance with the spec as far as encodings handling goes
09:45	<annevk>	MikeSmith: is it actually non-compliant though? Only checking for UTF-16 seems correct
09:46	<MikeSmith>	that is just one place I have found where the Java code is non-conforming
09:46	<annevk>	To stress the point a bit, the Encoding Standard's definition of ASCII-incompatible is completely non-normative
09:46	<MikeSmith>	OK
09:47	annevk	wonders if the big OK represents an annoyed MikeSmith 😊
09:47	<MikeSmith>	no, no — not annoyed at all
09:48	<MikeSmith>	anyway, another place that the Java code does not match the spec is that it implements the Charset Alias Matching thing rather than just trim-leading-trailing whitespace
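
[Editor's sketch, not part of the log] The "trim leading and trailing whitespace" label lookup that MikeSmith contrasts with Charset Alias Matching can be sketched roughly as follows. The label table is a tiny illustrative subset chosen for this example, and the function names are hypothetical; it is not the Java or Firefox code. The point is that only surrounding ASCII whitespace is removed and the comparison is ASCII case-insensitive, so looser spellings that alias matching would accept are rejected.

```python
# ASCII whitespace: tab, line feed, form feed, carriage return, space.
ASCII_WHITESPACE = "\t\n\x0c\r "

# Illustrative subset of labels only (assumption: the real table is far larger).
LABELS = {
    "utf-8": "UTF-8",
    "utf8": "UTF-8",
    "latin1": "windows-1252",
    "iso-2022-jp": "ISO-2022-JP",
}


def ascii_lower(s: str) -> str:
    """Lowercase only A-Z, leaving every non-ASCII character untouched."""
    return "".join(chr(ord(c) + 32) if "A" <= c <= "Z" else c for c in s)


def get_an_encoding(label: str):
    """Return the encoding name for a label, or None on failure."""
    trimmed = label.strip(ASCII_WHITESPACE)
    return LABELS.get(ascii_lower(trimmed))
```

For example, `get_an_encoding("  UTF-8\n")` yields `"UTF-8"`, while `get_an_encoding("utf_8")` and `get_an_encoding("u t f 8")` yield `None`, since neither spelling is in the table once only the outer whitespace is trimmed.
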
09:49	<MikeSmith>	so I am kind of beginning to suspect that this is a part of the Java source that doesn’t actually get used in the Firefox code
09:50	<annevk>	oh yeah, that's bad
09:50	<annevk>	Pretty sure that's not in Firefox indeed
09:50	<annevk>	I wonder how many more times I will see Charset Alias Matching referenced in my life
09:50	<MikeSmith>	yeah, I think Henri must have separate C++ source for this
09:50	<MikeSmith>	haha
09:51	<MikeSmith>	more than you would like, I’m sure
09:52	<MikeSmith>	oh, actually I already know one specific place where the Firefox code does something very different from the Java code: the "replacement" encoding name/label
09:52	<MikeSmith>	there is zero code in the Java sources for dealing with the "replacement" encoding
09:52	<MikeSmith>	...yet Firefox handles it per-spec
09:53	<annevk>	R.I.P. He rid the web of Charset Alias Matching. OK chap.
09:57	<MikeSmith>	hahah
10:05	<andreubotella>	annevk: oh, I didn't know that. Let's merge now and fix later, then
10:07	<annevk>	andreubotella: sounds good, doing a final round of nits now
10:07	<andreubotella>	👍
12:08	<noamr>	annevk: hi, I've updated https://github.com/whatwg/html/pull/5574 to account for same-origin concerns as discussed.
20:28	<EveryOS>	Today I posted to the wicg discourse 1400 words worth of the most stupid, unrealistic idea I've ever had. At least it has not been deleted, so that's a plus...