| 10:00 | <qq[IrcCity]> | hello. where can I find people who care about bugs in processing “text/plain” by major modern browsers? more details here http://www.superstructure.info/browser/compromised/toxic-sniffing.html (the story is only partially written, but enough to see bugs). |
| 10:07 | <annevk> | qq[IrcCity]: "bugs" |
| 10:07 | <annevk> | qq[IrcCity]: see https://mimesniff.spec.whatwg.org/ |
| 10:09 | <annevk> | qq[IrcCity]: if your problem is with BOM being more important than other encoding declarations, that's https://encoding.spec.whatwg.org/ |
| 10:09 | <annevk> | qq[IrcCity]: also not considered a bug |
| 10:15 | <qq[IrcCity]> | annevk: a text wall too high; didn’t find anyting relevant to my case but mentioning of one https://mimesniff.spec.whatwg.org/#no-sniff-flag but without much detail. |
| 10:15 | <qq[IrcCity]> | could you give a link to provisions that are relevant to “text/plain”? |
| 10:16 | <annevk> | qq[IrcCity]: it's very much unclear what "processing bug" you're talking about so I don't know |
| 10:16 | <annevk> | qq[IrcCity]: if it's indeed mostly about encodings, I recommend reading the Encoding Standard |
| 10:16 | <annevk> | qq[IrcCity]: you shouldn't be using non-utf-8 encodings anyway |
| 10:16 | <qq[IrcCity]> | damn. who approved it? |
| 10:16 | <annevk> | qq[IrcCity]: and BOM trumps Content-Type |
| 10:17 | <annevk> | qq[IrcCity]: not sure what you mean |
| 10:17 | <qq[IrcCity]> | which standard-making body declared “you shouldn't be using non-utf-8 encodings anyway”? |
| 10:18 | <annevk> | qq[IrcCity]: WHATWG, and I guess W3C did too since they copied it over |
| 10:20 | <qq[IrcCity]> | annevk: the W3C standard of HTML5 doesn’t contain such rubbish. it only specifies that if the document is seemingly Unicode (isn’t encoded in an octet-oriented code page), then BOM takes precedence. |
| 10:21 | <annevk> | qq[IrcCity]: while out of date and a poor reference, http://www.w3.org/TR/encoding/ most certainly has the same requirement as https://encoding.spec.whatwg.org/ with regards to utf-8 |
| 10:22 | <annevk> | qq[IrcCity]: also, http://www.w3.org/TR/html5/references.html#refsENCODING |
| 10:22 | <annevk> | qq[IrcCity]: anyway, I wouldn't recommend reading W3C copies, they're not what's being implemented |
| 10:24 | <qq[IrcCity]> | http://www.w3.org/TR/encoding/ doesn’t state anything is deprecated, restricted to certain circumstances, or so. all encodings are, theoretically, permitted. |
| 10:25 | <annevk> | qq[IrcCity]: *sigh* |
| 10:25 | <annevk> | qq[IrcCity]: 'Authors must use the utf-8 encoding and must use the ASCII case-insensitive "utf-8" label to identify it.' |
| 10:26 | <qq[IrcCity]> | again, HTML5 is about “text/html”. annevk, do you understand the word “text/plain”? |
| 10:26 | <annevk> | qq[IrcCity]: HTML defines how text/plain works |
| 10:26 | <annevk> | qq[IrcCity]: and text/plain is not a word |
| 10:27 | <qq[IrcCity]> | RoTFL. these are different media types. |
| 10:27 | <annevk> | qq[IrcCity]: sure, but it's rendered using the HTML parser, see "Page load processing model for text files" |
| 10:28 | <annevk> | qq[IrcCity]: in the HTML standard |
| 10:28 | <qq[IrcCity]> | annevk, I read about the parser in HTML5. |
| 10:28 | <annevk> | qq[IrcCity]: irrespective of that, the Encoding standard applies to all MIME types |
| 10:28 | <qq[IrcCity]> | it’s a stage after decoding, not before. |
| 10:29 | <annevk> | qq[IrcCity]: actually, part of the HTML parser handles decoding |
| 10:29 | <qq[IrcCity]> | HTML decoding rules apply only to HTML. |
| 10:30 | <qq[IrcCity]> | and HTML5 makes a special provision about “irrelevant” confidence. |
| 10:30 | <qq[IrcCity]> | when the browser operates in its internal text encoding. |
| 10:31 | <annevk> | I see, well, have a nice day |
| 10:47 | <qq[IrcCity]> | annevk, I think W3C people missed your out-of-context “Authors must use the utf-8” thing in CR-encoding-20140916, and possibly will be amazed by such “novel” ideas of yours as applying BOM sniffing to all media types including binaries ☺ |
| 10:47 | <qq[IrcCity]> | may I quote this chat in mailing lists? |
| 11:32 | <qq[IrcCity]> | given the channel has logs publicly readable at krijnhoetmer.nl, I’ll proceed to quote the conversation without explicit permission. |
| 11:36 | <annevk> | Hah, you're in for a surprise ;-) |
| 11:37 | <annevk> | But yeah, feel free to quote. I tried to help you out, but it seems like you had some set of answers already |
| 11:44 | <qq[IrcCity]> | sure, I’m conservative. my “set of answers” is based on original HTTP/1.1, not diluted by accommodation to idiocy. |
| 11:44 | <annevk> | qq[IrcCity]: please stay civil |
| 11:46 | <qq[IrcCity]> | naïve people here. admins of httpd software were told for 17 years (since RFC 2068 till HTML5) that they must specify Content-Type: text/whichever; charset=actual. many of them ignored it. |
| 11:47 | <qq[IrcCity]> | now you make BOM sniffing mandatory and hope it will be a universal solution ☺ |
| 11:50 | <annevk> | sounds about right |
| 11:52 | <qq[IrcCity]> | and naïve Prof. Dürst who boasts he pioneered reliability of heuristic UTF-8 detection as early as in 1997, but is seemingly unaware how modern browsers “detect” UTF-8 :D |
| 11:56 | <qq[IrcCity]> | I would like to see what Martin J. Dürst does when he eventually learns about the algorithm promulgated by Anne van Kesteren and company, and what he would say to Anne. |
| 11:57 | <annevk> | Pretty sure he knows about it |
| 11:58 | <qq[IrcCity]> | if the stream starts with «\357\273\277», then it’s damn UTF-8, even if the fourth byte is \377 :D |
| 12:18 | <qq[IrcCity]> | one big problem: major browsers ceased to honour values in Content-Type, and not only for text/html. and two narrower cases: 1. broken text/plain (already), |
| 12:18 | <qq[IrcCity]> | 2. such novelties as “the Encoding standard applies to all MIME types” (yet to happen to application/* and so on). |
| 12:34 | <nox> | How is text/plain broken? |
| 12:36 | <qq[IrcCity]> | nox, did you see test cases at http://www.superstructure.info/browser/compromised/toxic-sniffing.html#better ? |
| 12:36 | <qq[IrcCity]> | one should enter \357\273\277 and so on manually, of course. |
| 12:39 | <nox> | I don't understand. |
| 12:39 | <nox> | What spec does that test follow? |
| 12:42 | <qq[IrcCity]> | nox, testing compliance with RFC 2616 was initially in mind. RFC 7231 is somewhat vague about Content-Type, but major browsers defy even its relaxed provisions. |
| 12:42 | <nox> | Where is the accommodation to idiocy? |
| 12:44 | <qq[IrcCity]> | nox: overriding an explicitly served «charset=» with its own guesses. BTW, what is no-sniff-flag? |
| 12:48 | <qq[IrcCity]> | in other words, Google Chrome tells its user that one lamer qq[IrcCity] got «charset=» wrong and that it (Google) knows better what the intended meaning was. |
| 12:49 | <qq[IrcCity]> | even when the text it claims to be UTF-8 has \377 in the fourth octet. |
| 12:49 | <nox> | What is that text? |
| 12:49 | <nox> | The page is lacking a lot of information. What was the actual charset you intended to transmit? |
| 12:51 | <qq[IrcCity]> | nox, there is a simpler test case at http://course.irccity.ru/ya-yu-9-amp.txt but about toxic UTF-16. two minutes, I’m about to make the same for UTF-8. |
| 12:53 | <nox> | qq[IrcCity]: And why would \377 as the fourth octet change things btw? |
| 12:53 | <qq[IrcCity]> | from the point of view of RFC 2616, it’s entirely irrelevant. |
| 12:54 | <qq[IrcCity]> | but it can show madness of BOM sniffing better. |
| 12:55 | <nox> | I see. |
| 12:56 | <nox> | Interestingly, forcing charset to Windows-1251 on that last link doesn't change anything in Safari. |
| 12:57 | <qq[IrcCity]> | http://course.irccity.ru/p-guillemet-yi-ya.txt shows toxic UTF-8. |
| 12:57 | <nox> | qq[IrcCity]: Then again, the question is whether your examples are the majority, or whether actually honouring 'charset' breaks more things. |
| 12:58 | <qq[IrcCity]> | examples are not about majority. they are about predictability. |
| 12:59 | <nox> | The rules you call bugs are predictable. |
| 13:01 | <qq[IrcCity]> | they are predictable only since some guys founded a certain WHATWG and developers agreed to follow its recommendations. they aren’t predictable in the old world where protocols did matter. |
| 13:02 | <nox> | You do realise that the specs are how they are because actually the majority never cared about honouring 'charset', even in the old world? |
| 13:02 | <qq[IrcCity]> | nox, which majority? Russian-speaking sites mostly cared. |
| 13:03 | <qq[IrcCity]> | and nobody can predict what new things WHATWG will invent tomorrow: UTF-8 sniffing for octet-stream or whatever. |
| 13:03 | <nox> | Dürst in a mail linked on the page says "Yes, the iso-8859-1 'default' was invalidated because there were millions and millions of documents for which it would have been wrong, especially in Eastern Europe and Asia." |
| 13:03 | <nox> | Are you talking about something else? |
| 13:03 | <qq[IrcCity]> | nox, about overriding an explicitly served charset, again. |
| 13:04 | <qq[IrcCity]> | not old HTTP default. |
| 13:07 | <nox> | So I guess this actually broke a Russian-speaking site somewhere? |
| 13:07 | <qq[IrcCity]> | what?? |
| 13:07 | <nox> | Not honouring Content-Type. |
| 13:08 | <qq[IrcCity]> | what are you talking about? 90%+ of pages contained charset=, if not in Content-Type, then at worst in HTML <meta>. |
| 13:10 | <qq[IrcCity]> | if I am barred from saying п»їя in Windows-1251 (although all four characters pertain to the codepage), then the protocol is not honoured anymore. not a problem with HTML that may not start… |
| 13:10 | <qq[IrcCity]> | … with arbitrary characters anyway, but a problem for text/plain. |
| 13:12 | <nox> | My question is, is that an actual bug that actually breaks stuff, or is that just a theoretical problem that doesn't break anything in practice? |
| 13:12 | <nox> | You are not barred from saying such a thing in a text/plain document, you are barred from beginning a document with such a thing. |
| 13:12 | <nox> | No? |
| 13:14 | <nox> | Did you look at RFC5987 btw? |
| 13:18 | <nox> | Also, https://tools.ietf.org/html/draft-ietf-httpbis-p2-semantics-26#section-3.1.1.5 |
| 13:19 | <qq[IrcCity]> | nox, how is RFC5987 relevant to the dispute? it is related to our case (RFC 7231) like RFC 1522 (i18n of headers) is related to RFC 1521 (i18n of bodies). |
| 13:20 | <nox> | qq[IrcCity]: Yeah disregard that one, lost myself in tabs. |
| 13:20 | <nox> | Meant to ask whether httpbis had some changes planned in that area, apparently no. |
| 13:21 | <qq[IrcCity]> | nox: so you agree that browsers defy HTTP semantics 3.1.1.5, don’t you? |
| 13:22 | <nox> | I agree, but I'm not sure it's actually important. |
| 13:23 | <qq[IrcCity]> | I’m not aware of anyone complaining about that before me. |
| 13:24 | <qq[IrcCity]> | I made Codepage Explorer just for this case, to show an actual application that can be broken by an unlucky combination of octets. |
| 13:25 | <nox> | It's a nice catch, but I'm afraid you'll just be told that it's not actually a problem in the real world and thus nothing will be changed. |
| 13:27 | <qq[IrcCity]> | not a nice thing for me, since my trust in browsers ended abruptly. this is not the Internet I was accustomed to. |
| 13:28 | <qq[IrcCity]> | nowadays browsers lie to you twice as much as despicable Windows lied to users in the 1990s. |
| 16:17 | <annevk> | Ugh |
| 16:18 | <annevk> | ECMAScript still requires that `var x = new ArrayBuffer(10); postMessage(x, "*", [x]); console.log(x.byteLength)` throws, but no implementation does that |
| 16:36 | <annevk> | Oh, actually, I'm mistaken |
| 17:33 | <Domenic> | Hmm I think it does |
| 17:34 | <Domenic> | Last I checked TC39 still wants to try that. And no implementers want to try that. So awesome. |
| 23:27 | <MikeSmith> | I don't understand how it's possible that I never knew about http://devdocs.io/ before now |
| 23:27 | <MikeSmith> | do other people here know about it already? |
| 23:28 | <MikeSmith> | it seems extremely well done, as far as putting some very good UI/UX around aggregated docs from a bunch of different sources (e.g., MDN, but a ton of other stuff as well) |
| 23:40 | <jgraham> | Oh, I had heard of that but now it has Rust docs |
| 23:40 | <jgraham> | Seems like it could be more convenient than trying to remember where it installs them and start a web server |
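
The disagreement earlier in the log about the octets `\357\273\277\377` (hex `EF BB BF FF`) can be illustrated with a minimal Python sketch. It is only an approximation under stated assumptions: the function names `sniff_bom`, `decode_bom_first`, and `decode_as_declared` are hypothetical, the BOM check is a simplification of the behaviour described in the conversation and not the Encoding Standard's exact algorithm, and Python's `cp1251` codec stands in for Windows-1251.

```python
import codecs

# The four octets from the log, \357\273\277\377 in octal notation.
SAMPLE = b"\xef\xbb\xbf\xff"


def sniff_bom(data: bytes):
    """Return the encoding implied by a leading BOM, or None (simplified)."""
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8"
    if data.startswith(codecs.BOM_UTF16_BE):
        return "utf-16-be"
    if data.startswith(codecs.BOM_UTF16_LE):
        return "utf-16-le"
    return None


def decode_bom_first(data: bytes, declared: str) -> str:
    """Hypothetical 'BOM trumps Content-Type' decoding: the declared charset
    is used only when no BOM is present."""
    bom = sniff_bom(data)
    if bom == "utf-8":
        return data[3:].decode("utf-8", errors="replace")
    if bom is not None:
        return data[2:].decode(bom, errors="replace")
    return data.decode(declared, errors="replace")


def decode_as_declared(data: bytes, declared: str) -> str:
    """Hypothetical 'honour charset=' decoding: the declared charset always wins."""
    return data.decode(declared, errors="replace")


if __name__ == "__main__":
    # Assumed scenario: the server sent "Content-Type: text/plain; charset=windows-1251".
    print(repr(decode_as_declared(SAMPLE, "cp1251")))  # the four Cyrillic-codepage characters
    print(repr(decode_bom_first(SAMPLE, "cp1251")))    # BOM stripped, \xff is invalid UTF-8
```

With the declared charset honoured, the four bytes decode to the string п»їя quoted in the log; with the BOM taking precedence, the first three bytes are consumed as a UTF-8 BOM and the remaining `\377` becomes a single U+FFFD replacement character.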