#whatwg on 2014-06-15

18:46	<krit>	annevk: won’t make it before noon tomorrow (personal reasons)
19:50	<Hixie>	got it down to 8.5s user+sys to do all the tests and parse the html spec and reserialise it
22:58	<gsnedders>	which of the encodings defined in Encoding are not ASCII-supersets?
22:59	<caitp>	ebcdic
23:00	<gsnedders>	caitp: is not in Encoding
23:01	<caitp>	i know
23:01	<gsnedders>	then it by definition is not an encoding defined in Encoding which is not an ASCII-superset
23:01	<caitp>	anyways, it would be anything which doesn't have the "if it's less than 0x80, return it"
23:01	<caitp>	clause
23:02	<caitp>	with the exception of the utf16 stuff
23:02	<gsnedders>	and possibly some of the SBCSes, as at least ibm866 isn't
23:02	<zewt>	i don't recall there being any at all, ascii-compatibility is pretty fundamental
23:02	<caitp>	utf16be isn't really ascii-compatible
23:02	<caitp>	on a little endian system
23:03	<gsnedders>	no variant of UTF-16 is an ASCII-superset
23:03	<zewt>	it's not a multibyte encoding at all, double-byte encodings are a different world entirely
23:03	<caitp>	well, they are sort of
23:04	<caitp>	if the low byte is the first byte read, and you're skipping a byte for each character, and the code points are all below 0x80
23:04	<zewt>	oh yeah this http://krijnhoetmer.nl/irc-logs/whatwg/20111215#l-1034
23:05	<zewt>	hope was to get ibm866 dropped, no idea if anyone actually tried
23:06	<zewt>	caitp: "skipping a byte for each character" if you have to skip every other byte then ... that's not a superset of ASCII. heh
23:06	<caitp>	it is for the first character you read ;)
23:07	<zewt>	as ascii supports streams which are longer than one byte long, that's also not a superset of ASCII :0
23:07	<zewt>	)
23:07	<caitp>	ascii is a text encoding and has no concepts of streams
23:08	<caitp>	a single utf16 character can look like a null-terminated ascii string
23:08	<zewt>	not sure what this has to do with the fact that UTF-16 is in no possible conceivable contrived way a superset of ASCII, heh
23:09	<caitp>	it is, because unicode is a superset of ascii, codepoints 0x00-0x7F, followed by latin1 extensions to ascii, followed by the rest of the basic multilingual plane
23:09	<zewt>	encodings that are streams of 8-bit units (ascii, utf-8, sjis, most of them) are typically treated as separate concepts to ones that are streams of 16-bit units (utf-16, ucs-2) or 32-bit (ucs-4)
23:09	<zewt>	... utf-16 is not a superset of ASCII. sorry, this is too silly a conversation for me to bother with
23:10	<caitp>	unicode is a superset of ASCII, and if you look at patterns of bytes, it's possible that you can't tell the difference between certain single-character UTF16 strings, and certain null-terminated ASCII strings
23:11	<zewt>	no. an encoding which is a superset of ASCII is one where the same string of codepoints ("hello"), encoded with both encodings, results in the same block of data.
23:13	<caitp>	nonsense, we're in agreement that utf16 by definition contains codepoints represented by a minimum of 16 bits, but that does not mean that codepoints between 0x0000 and 0x0080 aren't supersets of ascii, and can't look identical to certain ascii strings
23:13	<caitp>	obviously that depends on arch and doesn't include multi-character strings, but that's irrelevant
23:16	<zewt>	you seem to have a deep misunderstanding of what "superset of ascii" means; it does not mean "every sequence of bytes that is valid ASCII is also valid UTF-16", it means "every sequence of bytes that is valid ASCII has the same interpretation in UTF-16", which is obviously false
23:16	<zewt>	anyhow, going to do something else now :)
23:17	<caitp>	that's one definition of superset, but when you get down to patterns of bits, it's not the case
23:17	<caitp>	but regardless I agree it's not a super important discussion to have
23:17	<caitp>	nobody cares about utf16 =)
23:18	<gsnedders>	Plenty of people care about UTF-16 and it's used plenty
23:19	<caitp>	it's not really used in any serious capacity for interchange of data
23:20	<gsnedders>	Plenty of CJK sites use it
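
Editor's note: the two positions argued above can be checked mechanically. The following is a minimal sketch, not anything posted in the channel. It assumes Python's built-in codecs as stand-ins for the encodings under discussion, and the helper name `encodes_like_ascii` is hypothetical. It applies zewt's definition from 23:11 (the same string of code points must produce the same bytes in both encodings) and then reproduces caitp's single-character observation from 23:08.

```python
def encodes_like_ascii(text: str, encoding: str) -> bool:
    """Return True if ASCII-only `text` yields identical bytes in `encoding` and in ASCII."""
    return text.encode(encoding) == text.encode("ascii")


sample = "hello"  # the example string used at 23:11

# zewt's test: compare the encoded bytes of the same code points.
results = {
    enc: encodes_like_ascii(sample, enc)
    for enc in ("utf-8", "utf-16-le", "utf-16-be")
}

# caitp's observation: a single little-endian UTF-16 code unit below 0x80
# is byte-for-byte the same as a one-character NUL-terminated ASCII string.
single_char_le = "A".encode("utf-16-le")
nul_terminated_ascii = "A".encode("ascii") + b"\x00"

print(results)
print(single_char_le == nul_terminated_ascii)
```

Under these assumptions UTF-8 passes the byte-equality test and both UTF-16 variants fail it, while the one-character coincidence caitp describes holds only for the little-endian form and stops holding once the string has a second character.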