18:46
<krit>
annevk: won’t make it before noon tomorrow (personal reasons)
19:50
<Hixie>
got it down to 8.5s user+sys to do all the tests and parse the html spec and reserialise it
22:58
<gsnedders>
which of the encodings defined in Encoding are not ASCII-supersets?
22:59
<caitp>
ebcdic
23:00
<gsnedders>
caitp: is not in Encoding
23:01
<caitp>
i know
23:01
<gsnedders>
then it by definition is not an encoding defined in Encoding which is not an ASCII-superset
23:01
<caitp>
anyways, it would be anything which doesn't have the "if it's less than 0x80, return it"
23:01
<caitp>
clause
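A rough way to probe the "less than 0x80, return it" idea is to decode every byte from 0x00 to 0x7F and compare the result with plain ASCII. The sketch below uses Python's built-in codecs, which are an assumption here: they are not the decoders defined in Encoding and may differ from them for some labels. The helper name `decodes_ascii_unchanged` is hypothetical.

```python
# All 128 ASCII bytes and the text they represent.
ASCII_BYTES = bytes(range(0x80))
ASCII_TEXT = ASCII_BYTES.decode("ascii")


def decodes_ascii_unchanged(codec_name: str) -> bool:
    """Return True if the codec decodes bytes 0x00-0x7F exactly as ASCII does."""
    try:
        return ASCII_BYTES.decode(codec_name) == ASCII_TEXT
    except UnicodeDecodeError:
        # The codec rejects some ASCII byte sequence outright.
        return False


# Python codec names, chosen only for illustration.
for name in ("utf-8", "latin-1", "cp866", "utf-16-le", "utf-16-be"):
    print(name, decodes_ascii_unchanged(name))
```

This only tests a codec's behaviour on ASCII input in isolation; it says nothing about stateful encodings where an earlier escape sequence changes how later bytes are read.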
23:02
<caitp>
with the exception of the utf16 stuff
23:02
<gsnedders>
and possibly some of the SBCSes, as at least ibm866 isn't
23:02
<zewt>
i don't recall there being any at all, ascii-compatibility is pretty fundamental
23:02
<caitp>
utf16be isn't really ascii-compatible
23:02
<caitp>
on a little endian system
23:03
<gsnedders>
no variant of UTF-16 is an ASCII-superset
23:03
<zewt>
it's not a multibyte encoding at all, double-byte encodings are a different world entirely
23:03
<caitp>
well, they are sort of
23:04
<caitp>
if the low byte is the first byte read, and you're skipping a byte for each character, and the code points are all below 0x80
23:04
<zewt>
oh yeah this http://krijnhoetmer.nl/irc-logs/whatwg/20111215#l-1034
23:05
<zewt>
hope was to get ibm866 dropped, no idea if anyone actually tried
23:06
<zewt>
caitp: "skipping a byte for each character" if you have to skip every other byte then ... that's not a superset of ASCII. heh
23:06
<caitp>
it is for the first character you read ;)
23:07
<zewt>
as ascii supports streams which are longer than one byte long, that's also not a superset of ASCII :0
23:07
<zewt>
)
23:07
<caitp>
ascii is a text encoding and has no concepts of streams
23:08
<caitp>
a single utf16 character can look like a null-terminated ascii string
23:08
<zewt>
not sure what this has to do with the fact that UTF-16 is in no possible conceivable contrived way a superset of ASCII, heh
23:09
<caitp>
it is, because unicode is a superset of ascii, codepoints 0x00-0x7F, followed by latin1 extensions to ascii, followed by the rest of the basic multilingual plane
23:09
<zewt>
encodings that are streams of 8-bit units (ascii, utf-8, sjis, most of them) are typically treated as separate concepts to ones that are streams of 16-bit units (utf-16, ucs-2) or 32-bit (ucs-4)
23:09
<zewt>
... utf-16 is not a superset of ASCII. sorry, this is too silly a conversation for me to bother with
23:10
<caitp>
unicode is a superset of ASCII, and if you look at patterns of bytes, it's possible that you can't tell the difference between certain single-character UTF16 strings, and certain null-terminated ASCII strings
23:11
<zewt>
no. an encoding which is a superset of ASCII is one where the same string of codepoints ("hello"), encoded with both encodings, results in the same block of data.
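The definition zewt gives here can be checked directly: encode the same string with ASCII and with another encoding, then compare the bytes. This is a minimal sketch using Python's built-in codecs; the helper name `same_bytes_as_ascii` is made up for illustration.

```python
def same_bytes_as_ascii(text: str, codec_name: str) -> bool:
    """Return True if encoding `text` with the codec gives the same bytes as ASCII."""
    return text.encode(codec_name) == text.encode("ascii")


# UTF-8 encodes ASCII-range code points as the same single bytes.
print(same_bytes_as_ascii("hello", "utf-8"))

# UTF-16 uses two bytes per code unit, so the byte string differs.
print("hello".encode("utf-16-le"))
print(same_bytes_as_ascii("hello", "utf-16-le"))
```

Under this definition UTF-8 qualifies and UTF-16 does not, regardless of byte order.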
23:13
<caitp>
nonsense, we're in agreement that utf16 by definition contains codepoints represented by a minimum of 16 bits, but that does not mean that codepoints between 0x0000 and 0x0080 aren't supersets of ascii, and can't look identical to certain ascii strings
23:13
<caitp>
obviously that depends on arch and doesn't include multi-character strings, but that's irrelevant
23:16
<zewt>
you seem to have a deep misunderstanding of what "superset of ascii" means; it does not mean "every sequence of bytes that is valid ASCII is also valid UTF-16", it means "every sequence of bytes that is valid ASCII *has the same interpretation* in UTF-16", which is obviously false
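Both sides of this exchange can be illustrated with a few bytes. The sketch below, again assuming Python's built-in codecs, shows that a valid ASCII byte sequence decodes to something else as UTF-16, and also shows the single-character coincidence caitp describes.

```python
# Two ASCII bytes: 0x68 ('h') and 0x69 ('i').
ascii_bytes = "hi".encode("ascii")

# Read as little-endian UTF-16, the same two bytes form one code unit, 0x6968.
as_utf16le = ascii_bytes.decode("utf-16-le")
print(len(as_utf16le), hex(ord(as_utf16le)))

# caitp's case: one UTF-16LE character in the ASCII range is the ASCII byte
# followed by a zero byte, which resembles a NUL-terminated one-character string.
single = "A".encode("utf-16-le")
print(single)
```

The byte patterns can coincide for a single character, but the interpretation of a general ASCII byte sequence is not preserved.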
23:16
<zewt>
anyhow, going to do something else now :)
23:17
<caitp>
that's one definition of superset, but when you get down to patterns of bits, it's not the case
23:17
<caitp>
but regardless I agree it's not a super important discussion to have
23:17
<caitp>
nobody cares about utf16 =)
23:18
<gsnedders>
Plenty of people care about UTF-16 and it's used plenty
23:19
<caitp>
it's not really used in any serious capacity for interchange of data
23:20
<gsnedders>
Plenty of CJK sites use it