00:06
<gsnedders>
I can't find anything, at all?
00:06
<gsnedders>
Did this get dropped with pre-processing the input stream?
00:08
<SimonSapin>
gsnedders: in CSS? It’s in "consume an escaped code point"
00:08
<gsnedders>
SimonSapin: HTML.
00:08
<gsnedders>
I thought this would be assumed given #whatwg :P
00:08
<SimonSapin>
character encodings’ decoders never emit them
00:10
<SimonSapin>
http://www.whatwg.org/specs/web-apps/current-work/multipage/tokenization.html#consume-a-character-reference does the rest
00:11
<gsnedders>
SimonSapin: the encoding spec doesn't say that?
00:12
<SimonSapin>
gsnedders: not explicitly, but none of the decoders emit them
00:12
<gsnedders>
UTF-8 appears to be able to?
00:13
<gsnedders>
Unless I'm being blind?
00:14
<gsnedders>
And noncharacteres?
00:14
<gsnedders>
s/noncharacteres/perm. noncharacters/
00:16
<SimonSapin>
"If byte is 0xED, set utf-8 upper boundary to 0x9F. " and the like prevent surrogates
00:16
<gsnedders>
Okay, I'd tried to quickly reason with that in my head but failed. :P
00:17
<SimonSapin>
u'\uD7FF'.encode('utf8') == b'\xed\x9f\xbf'
00:17
<gsnedders>
Well, that's a nice bug.
00:17
<gsnedders>
Given that's never been legal.
00:17
<SimonSapin>
\uD800 would be \xed\xa0\x80
00:18
<gsnedders>
Oh, wait, U+d800 is the first.
00:18
<gsnedders>
Duh.
00:18
<SimonSapin>
which is forbidden by the "boundaries"
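[Note] The "boundaries" argument above can be sketched in Python. This is a minimal illustration, not the Encoding spec's algorithm verbatim: the helper names `second_byte_range`, `second_byte_ok` and `strict_decode_ok` are made up for this example, and only three-byte lead bytes (0xE0..0xEF) are covered.

```python
from typing import Tuple


def second_byte_range(lead: int) -> Tuple[int, int]:
    """Allowed range for the byte following a three-byte UTF-8 lead byte.

    Hypothetical helper: models only the lower/upper boundary adjustment
    discussed in the log, for lead bytes 0xE0..0xEF.
    """
    lower, upper = 0x80, 0xBF  # default continuation-byte range
    if lead == 0xE0:
        lower = 0xA0  # excludes overlong encodings of code points below U+0800
    elif lead == 0xED:
        upper = 0x9F  # excludes U+D800..U+DFFF, i.e. the surrogates
    return lower, upper


def second_byte_ok(lead: int, second: int) -> bool:
    """True if `second` may follow `lead` under the boundaries above."""
    lower, upper = second_byte_range(lead)
    return lower <= second <= upper


def strict_decode_ok(data: bytes) -> bool:
    """True if Python's own strict UTF-8 decoder accepts `data`."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False
```

With these, `second_byte_ok(0xED, 0x9F)` holds (U+D7FF is `ED 9F BF`), while `second_byte_ok(0xED, 0xA0)` does not (U+D800 would be `ED A0 80`). Python's built-in decoder agrees on both byte sequences.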
00:18
<gsnedders>
This is why I cannot read.
00:18
<gsnedders>
read?
00:18
<gsnedders>
reason.
00:18
<gsnedders>
I'm blaming jetlag for my inability to make sense, despite not really being jetlagged at all. :P
00:19
<SimonSapin>
the decoders emit "scalar values" http://www.unicode.org/glossary/#unicode_scalar_value , which exclude surrogates but not other non-characters
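[Note] A quick way to see the "scalar values, but noncharacters still pass" point is a UTF-8 round trip in Python. This is only a sketch using Python's built-in codec as a stand-in for an Encoding-spec decoder; `utf8_round_trips` is a name invented for this example.

```python
def utf8_round_trips(code_point: int) -> bool:
    """True if the code point survives strict UTF-8 encode + decode.

    Python's strict codec refuses surrogates in both directions, but
    treats noncharacters like any other scalar value.
    """
    ch = chr(code_point)
    try:
        return ch.encode("utf-8").decode("utf-8") == ch
    except UnicodeError:
        # Raised for surrogate code points (U+D800..U+DFFF).
        return False
```

Noncharacters such as U+FFFE, U+FDD0 and U+10FFFF round-trip, while U+D800 and U+DFFF do not.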
00:19
<gsnedders>
Right.
00:19
<gsnedders>
So the former handling of permanent non-characters is gone?
00:19
<SimonSapin>
I don’t know
00:20
<SimonSapin>
it’s not in the Encoding layer at least
00:20
<gsnedders>
Doesn't appear to be in HTML either.
00:20
<gsnedders>
Of that former section only U+0000 still exists, by virtue of it now being handled inline in the parser.
03:44
<gsnedders>
So, the input stream is defined in terms of a series of the HTML speeeeeeeeeeeeeeeec`.
03:45
<gsnedders>
Um, okay. Something went wrong there, and I couldn't stop that.
03:48
<gsnedders>
s/of .*$/of what the HTML spec defines a "Unicode codepoint" means — do we really want lone surrogates to be valid?/
03:53
<zewt>
valid where?
03:58
<gsnedders>
A valid HTML document.
04:01
<gsnedders>
Or a valid fragment. Say from innerHTML.
04:02
<Hixie_>
"valid" in what sense?
04:02
<gsnedders>
Produces no parse-errors.
04:03
<gsnedders>
Which is a conformance requirement.
04:03
<zewt>
iirc you can't get lone surrogates in an HTML document, since the charset decoders won't emit them
04:04
<gsnedders>
zewt: Yes, you need to be going from an abstract source, like the innerHTML case from the DOM.
04:05
<zewt>
not sure what you're asking; you can insert them with script, but an actual html document won't generate them
04:09
<zewt>
are you saying div.textContent = "\ud800" should throw an exception or something?
04:09
<gsnedders>
Right, but that doesn't exempt you from conformance, merely from conformance-checkers having to check it :P
04:09
<gsnedders>
No, be non-conforming and trigger a parse error.
04:09
<zewt>
"parse error" where?
04:10
<zewt>
i mean, an error in which parser?
04:14
<gsnedders>
zewt: Most browsers can report parse errors to dev tools
04:14
<gsnedders>
From the fragment case from innerHTML.
04:15
<Hixie_>
gsnedders: i don't think it cares about it in the parser, but there's separate conformance errors about there not being lone surrogates
04:15
<gsnedders>
Where?
04:16
<Hixie_>
a fascinating question
04:16
<gsnedders>
There *used* to be in the clause prohibiting perm. noncharacters
04:18
<Hixie_>
"Text nodes and attribute values must consist of Unicode characters, must not contain U+0000 characters, must not contain permanently undefined Unicode characters (noncharacters), and must not contain control characters other than space characters."
04:18
<Hixie_>
+ "The term Unicode character is used to mean a Unicode scalar value (i.e. any Unicode code point that is not a surrogate code point)."
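[Note] The two quoted requirements can be read as a check over a string. The sketch below is an interpretation, not spec text: `text_violations` and its helpers are hypothetical names, "space characters" is taken to mean tab, LF, FF, CR and space, and "control characters" is assumed to mean the C0 and C1 ranges plus U+007F.

```python
from typing import List, Tuple

# Assumption: HTML "space characters" are tab, LF, FF, CR and space.
SPACE_CHARACTERS = frozenset("\t\n\f\r ")


def is_noncharacter(cp: int) -> bool:
    """U+FDD0..U+FDEF, plus the last two code points of every plane."""
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def is_control(cp: int) -> bool:
    """Assumption: C0 controls, U+007F, and C1 controls."""
    return cp <= 0x1F or 0x7F <= cp <= 0x9F


def text_violations(text: str) -> List[Tuple[int, str]]:
    """Return (index, reason) pairs for characters the quoted rule forbids."""
    problems = []
    for index, ch in enumerate(text):
        cp = ord(ch)
        if 0xD800 <= cp <= 0xDFFF:
            # A Python str stores pairs as one code point, so any
            # surrogate seen here is a lone one (not a scalar value).
            problems.append((index, "lone surrogate"))
        elif cp == 0:
            problems.append((index, "U+0000"))
        elif is_noncharacter(cp):
            problems.append((index, "noncharacter"))
        elif is_control(cp) and ch not in SPACE_CHARACTERS:
            problems.append((index, "control character"))
    return problems
```

Under this reading, `text_violations("a\ud800")` reports a lone surrogate at index 1, which matches the claim that lone surrogates are a conformance error even though the parser does not flag them.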
04:22
<gsnedders>
Where's that?
04:23
<gsnedders>
http://www.whatwg.org/specs/web-apps/current-work/multipage/infrastructure.html#unicode-code-point gives the definition of codepoint differently?
04:23
<Hixie_>
what i pasted doesn't use the term "code point"
04:23
<zewt>
(it does a little)
04:26
<gsnedders>
Blarf!
04:27
<gsnedders>
Hixie_: So it's a conformance requirement yet not a parse error?
04:28
<Hixie_>
zewt: not normatively :-)
04:29
<gsnedders>
Hixie_: Although everything else in that list of what they cannot contain is a parse rror?
04:29
<gsnedders>
*error
04:29
<gsnedders>
AFAIK you removed it on grounds that encodings cannot output lone surrogates.
04:31
<Hixie_>
removed from the parser?
04:32
<gsnedders>
From the preprocessing the input stream clause that forbids null, noncharacters, etc.
09:16
<Ms2ger>
Hm, the box at http://www.whatwg.org/specs/web-apps/current-work/multipage/workers.html#WorkerGlobalScope-partial lists a bug that has been closed for months
09:16
Ms2ger
wonders who maintains that
10:48
<Ms2ger>
Anyone around who understands the parser?
10:50
<Ms2ger>
Maybe zcorpan
11:28
<zcorpan>
Ms2ger: what parser?
11:30
zcorpan
isn't actually here
17:04
<Ms2ger>
zcorpan-not-here: html
17:17
<jgraham>
I love XHTML. I love the way I can get a parser error in gecko loading a page over a bad connection
17:18
<tantek>
nice
17:18
tantek
presumes jgraham means "XML" in general
17:18
<Ms2ger>
jgraham, you know a thing or two about the html parser, right?
17:18
<jgraham>
Well I suppose the other alternative is that the connection isn't the problem and the content is broken
17:19
<Ms2ger>
I can't figure out what "If the stack of open elements has a button element in scope, then run these substeps:" means
17:19
<jgraham>
But the fowc (Flash of Working Content) before the YSOD is... irritating
17:19
<Ms2ger>
"has a button element in scope" links to "The stack of open elements is said to have an element in scope when it has an element in the specific scope consisting of the following element types:"
17:19
<jgraham>
Ms2ger: What do you mean "means"?
17:19
<Ms2ger>
Note no reference to the "button" part
17:20
<Ms2ger>
My stack is [html, body]
17:20
<Ms2ger>
Do I have a button element in scope?
17:22
<jgraham>
Ms2ger: No. The spec is confusing here because there are two different things
17:22
<jgraham>
The element that may or may not be in scope
17:22
<jgraham>
and the set of elements that delimit the scopes
17:23
<jgraham>
Having a button element in scope is not the same as having an element in button scope
17:23
<Ms2ger>
Right
17:23
<Ms2ger>
I suspected that was the case
17:23
<jgraham>
http://www.whatwg.org/specs/web-apps/current-work/multipage/parsing.html#has-an-element-in-the-specific-scope
17:23
<Ms2ger>
But I can't find a clear definition of having a button element in scope
17:24
<jgraham>
"target node" in that algorithm is a html:button element
17:24
<Ms2ger>
(I can find a definition of "having an element in button scope", but I don't understand the definition)
17:24
<jgraham>
But I don't think it actually says that explicitly
17:24
<jgraham>
File a bug
17:25
<Ms2ger>
Filed https://www.w3.org/Bugs/Public/show_bug.cgi?id=23119
17:26
<jgraham>
I think instead of talking about "target node" it should be clear that it is an algorithm taking a list and an element name (or a namespace, name pair; I'm not sure)
17:26
<jgraham>
And say "if node is an /element/ element" or something
17:26
<Ms2ger>
This code has been implemented interoperably at least four times, and it still isn't clear \o/
17:26
<jgraham>
I think this part of the spec changed
17:26
<Ms2ger>
Yeah, it might have
17:27
<jgraham>
But the namespace part was never clear
17:27
<Ms2ger>
Okay, that helps
17:27
Ms2ger
goes back to reviewing this one-line test case
17:28
<jgraham>
In general it would be good if Hixie was explicit for each algorithm what the inputs and outputs are
17:28
<jgraham>
Even if it meant having some special convention to represent it
17:28
<Ms2ger>
(And I've consumed the first token, yay)
17:29
<Ms2ger>
Also, the order of the switch cases in the parser seems to be random
17:31
<jgraham>
ElementInScope :: Stack Element -> LocalName -> [LocalName] -> Bool
17:33
<Ms2ger>
Or ElementInScope :: Stack Element -> (LocalName, NS) -> [(LocalName, NS)] -> Bool?
17:33
<jgraham>
Right
17:33
<jgraham>
(dunno if tuples can be represented like that in Haskell, but this is at best pseudo-Haskell)
17:34
<Ms2ger>
Yeah, that's fine
17:34
Ms2ger
wrote some Haskell this year
17:35
<Ms2ger>
(Oh, the list is even explicitly [(LocalName, NS)] now)
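[Note] The signature discussed above, with (local name, namespace) pairs, might look like this in Python. This is a sketch only: the function and constant names are invented here, elements are modelled as bare (local name, namespace) tuples, and the delimiter lists are written from memory of the spec and may not match it exactly.

```python
HTML = "http://www.w3.org/1999/xhtml"
MATHML = "http://www.w3.org/1998/Math/MathML"
SVG = "http://www.w3.org/2000/svg"

# Element types that delimit the default scope (illustrative list).
DEFAULT_SCOPE = frozenset(
    [(name, HTML) for name in ("applet", "caption", "html", "table",
                               "td", "th", "marquee", "object")]
    + [(name, MATHML) for name in ("mi", "mo", "mn", "ms", "mtext",
                                   "annotation-xml")]
    + [(name, SVG) for name in ("foreignObject", "desc", "title")]
)

# "Button scope" is the default scope plus the HTML button element.
BUTTON_SCOPE = DEFAULT_SCOPE | {("button", HTML)}


def has_element_in_specific_scope(stack, target, delimiters):
    """Walk the stack of open elements from the current node upwards.

    stack: list of (local name, namespace) pairs, bottommost element last.
    target: the (local name, namespace) pair being looked for.
    delimiters: the set of pairs that terminate the search with failure.
    """
    for node in reversed(stack):
        if node == target:
            return True   # match state
        if node in delimiters:
            return False  # failure state
    # Not expected with a real stack, since html is always a delimiter.
    return False
```

For the stack [html, body], `has_element_in_specific_scope(stack, ("button", HTML), DEFAULT_SCOPE)` is False, which answers "do I have a button element in scope?". The difference between "a button element in scope" and "an element in button scope" is just which argument carries `button`: the target in the first case, the delimiter set in the second.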
17:35
jgraham
should probably learn enough to do something useful
17:36
<Ms2ger>
I dunno
18:07
<Ms2ger>
gsnedders, land https://critic.hoppipolla.co.uk/r/193 already :)
19:46
<Yuhong>
https://news.ycombinator.com/item?id=6307219
19:46
<Yuhong>
Someone wishes that conforming HTML required closing tags.