06:50
<sideshowbarker>
At https://html.spec.whatwg.org/multipage/parsing.html#parse-error-surrogate-in-input-stream I see this note:

> Surrogates can only find their way into the input stream via script APIs such as document.write().

...but I can't find which part of the parsing algorithm actually prevents surrogates from otherwise finding their way into the input stream. At least, nothing in the parsing algorithm explicitly checks for surrogates, so I'm not sure what else I should be looking for...
08:32
<annevk>
sideshowbarker: byte to code point conversion takes care of that (i.e., the Encoding standard)
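
A minimal Python sketch of that point. It assumes Python's built-in UTF-8 codec as a stand-in for the Encoding standard's decode; the two are not the same algorithm, but both refuse to turn bytes into surrogate code points. The helper names are hypothetical.

```python
# Bytes a non-conformant encoder might emit for the lone surrogate U+D800.
SURROGATE_BYTES = b"\xed\xa0\x80"


def decode_bytes(data: bytes, encoding: str = "utf-8") -> str:
    """Decode bytes to code points, replacing malformed input with U+FFFD."""
    return data.decode(encoding, errors="replace")


def has_surrogate(text: str) -> bool:
    """Return True if any code point is in the surrogate range."""
    return any(0xD800 <= ord(ch) <= 0xDFFF for ch in text)


decoded = decode_bytes(b"a" + SURROGATE_BYTES + b"b")

# The malformed bytes come out as U+FFFD, never as a surrogate.
print([f"U+{ord(ch):04X}" for ch in decoded])
print("contains surrogate:", has_surrogate(decoded))
```

A string built in script, by contrast, can hold a lone surrogate directly, which is why the spec note singles out APIs such as document.write().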
08:40
<sideshowbarker>

annevk: thanks, yeah, I subsequently found this at https://html.spec.whatwg.org/multipage/parsing.html#the-input-byte-stream:

> Given a character encoding, the bytes in the input byte stream must be converted to characters for the tokenizer's input stream, by passing the input byte stream and character encoding to decode.

08:48
<sideshowbarker>

annevk: so if I have a style element in a document that’s already been parsed by a conformant HTML parser, and I want to process the CSS stylesheet that element contains, I should not need to pre-process it any further per https://drafts.csswg.org/css-syntax/#input-preprocessing, right?

I mean, because the HTML parser has already done all the newline normalization, U+0000 replacement, and stray-surrogate replacement that the CSS preprocessing algorithm would do. Right?
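
For reference, the CSS input preprocessing step being discussed can be sketched roughly as follows. This is a reading of the css-syntax text, not code from any browser, and the function name is hypothetical: CR LF pairs, lone CR, and FF become LF, and U+0000 and surrogates become U+FFFD.

```python
REPLACEMENT = "\ufffd"


def css_preprocess(text: str) -> str:
    """Sketch of css-syntax input preprocessing on a string of code points."""
    # CR LF pairs first, so a pair collapses to one LF; then lone CR and FF.
    text = text.replace("\r\n", "\n").replace("\r", "\n").replace("\f", "\n")
    # U+0000 and surrogate code points become U+FFFD.
    return "".join(
        REPLACEMENT if ch == "\x00" or 0xD800 <= ord(ch) <= 0xDFFF else ch
        for ch in text
    )


sample = "a\r\nb\rc\fd\x00e\ud800f"
print(repr(css_preprocess(sample)))
```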

10:30
<annevk>
sideshowbarker: I don't think HTML replaces FF by LF. Do CSS parsers actually do that? Seems really weird.
10:31
<annevk>
HTML does handle U+0000, if memory serves, so that step can indeed be skipped
10:32
<sideshowbarker>
yeah I don’t know why CSS needs to be doing something different than calling into “decode” from the Encoding standard, the way that the HTML parsing algorithm does
10:33
<annevk>
sideshowbarker: well this is something they do after they have code points
10:33
<sideshowbarker>
ah
10:33
<annevk>
So it applies to post-parser and script inputs
10:35
<sideshowbarker>
I see. But that takes me back to my question about why it should be necessary to do that at all for the case of style element contents that have already been through an HTML parser
11:22
<annevk>
sideshowbarker: well, the FF to LF translation would not have happened. Why that's necessary? Not sure, I don't recall FF being special in CSS.
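
The FF difference can be illustrated with a small sketch. It assumes that HTML's input stream preprocessing amounts to newline normalization only (CR LF and lone CR become LF); the function name and the sample stylesheet text are made up for illustration.

```python
def html_normalize_newlines(text: str) -> str:
    """Sketch of HTML newline normalization: CR LF and lone CR become LF."""
    return text.replace("\r\n", "\n").replace("\r", "\n")


# Hypothetical style element contents containing a form feed and a CR LF.
style_text = "a { color: red }\fb { color: blue }\r\n"

after_html = html_normalize_newlines(style_text)

# CR is gone, but FF passes through untouched, so the CSS step
# that maps FF to LF still has something to do.
print("CR left:", "\r" in after_html)
print("FF left:", "\f" in after_html)
```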
11:28
<sideshowbarker>
After thinking about it more, I guess what I'm really wondering is whether browsers actually do that CSS preprocessing for style element contents. So I suppose I should look at the browser sources to see.