00:00
<SamB>
I tend to think of the schemes that I can't get resources from as URNs. mailto: excepted, for some reason.
00:01
<zewt>
i call them all "URLs", because I don't find the distinction useful, and trying to distinguish them seems like trying to divide things artificially
00:05
<zewt>
i guess things like XML namespaces make it feel particularly silly, where you have strings that look exactly like URLs to HTTP websites, but are really meaningless identifiers that don't point to any resource
00:06
<zewt>
(not that XML namespaces are anything to draw much information from; using URLs as opaque identifiers is a terrible idea)
00:12
<gsnedders>
zewt: I dunno. I've just never got out of the habit. :P
00:13
<SamB>
zewt: well, that's pretty much the ISBN scheme in a nutshell (opaque identifiers, I mean, not so much the terrible idea part)
00:15
SamB
eagerly awaits a proper URL implementation of that scheme
00:22
<Hixie>
SamB: URLs that don't point to a specific resource or software action are kinda pointless imho.
00:22
<Hixie>
SamB: (this correlates well to the total lack of usage of such URLs)
00:23
<SamB>
I think they get used in bibliographic data
00:23
<SamB>
though granted URLs that can be resolved seem preferred
00:48
<esprehn>
the MediaQueryList API is really silly, it reinvents event listeners :/
01:00
<TabAtkins>
gsnedders: Yes, and so that does indeed imply that Futhark is an alphabet. Cool.
01:00
<TabAtkins>
esprehn: Agreed. :/
01:20
<gsnedders>
TabAtkins: As is Futhorc, FWIW. They're *relatively* similar.
01:20
<gsnedders>
TabAtkins: phonetically Old Norse and Old English were mutually comprehensible, at least to some extent
01:21
<gsnedders>
TabAtkins: The alphabets are probably similar enough to be understandable, though not to quite the same extent
01:21
<TabAtkins>
Yeah.
01:25
<gsnedders>
(The differences in verb forms would be especially noticeable in writing, and less so in speech)
01:49
<Hixie>
SamB: find me users that click on links that do nothing, and i'll be impressed :-)
01:49
<Hixie>
SamB: (i've nothing against e.g. isbn: links, since they do do something if you have appropriate software.)
02:21
<gsnedders>
Hixie: a tel: link doesn't really do the same thing as any other link when clicked on, as an example
02:21
<gsnedders>
(On a phone, say.)
02:21
<Hixie>
tel: is like mailto:
02:21
<gsnedders>
(Which you wouldn't know about. :))
02:21
<Hixie>
a simple URL
02:21
<Hixie>
to an action
02:22
<zewt>
tel: and mailto: are just like any other url (they tend to load other apps, but so can ftp or anything else)
02:22
<gsnedders>
I'd argue it's not a location, and hence tel/mailto are different to ftp/http
02:23
<gsnedders>
but, eh, idk really
02:23
<gsnedders>
Well, more "don't care" than "don't know"
02:23
<zewt>
it's not the same as http in that it indicates an action (make a phone call) rather than a resource location, but i think that's completely uninteresting as far as the "what is a url" question goes
02:26
<zewt>
how about we just go with a url is like porn: hard to define, but i know it when i see it
02:36
<SamB>
I imagine tel: would actually have worked on the Windows 95 box I used to use ... if there was a phone plugged into the modem
02:37
<SamB>
anyway I want software that has all the books
02:37
<SamB>
though some books seem to predate ISBNs, so it'd need to support more schemes than just that
02:38
<SamB>
oh, and then there's periodicals
03:33
<Hixie>
zewt: my point is that a url is easy to define, it's whatever has the form foo:bar... like in url.spec.whatwg.org
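Hixie's "foo:bar" definition can be tried out with the WHATWG URL parser, which JavaScript exposes as the global `URL` constructor in browsers and Node.js. A minimal sketch: `describeUrl` is a hypothetical helper, and the phone number and ISBN are made-up placeholders. In these samples, a fetchable-looking string, the action schemes `tel:`/`mailto:`, and opaque identifiers all parse the same way; only the string with no scheme is rejected.

```javascript
// Sketch: classify strings with the WHATWG URL parser (the global `URL`
// constructor in browsers and Node.js). Example values are placeholders.
function describeUrl(input) {
  try {
    const url = new URL(input); // no base URL, so the input must carry its own scheme
    return { scheme: url.protocol.slice(0, -1), path: url.pathname };
  } catch (err) {
    return null; // TypeError: not an absolute URL (no "foo:" prefix)
  }
}

const samples = [
  "http://www.w3.org/2000/svg",  // looks fetchable, but used as an XML namespace name
  "mailto:someone@example.com",  // an action: compose a mail
  "tel:+1-555-0100",             // an action: place a call
  "isbn:0-000-00000-0",          // an opaque identifier; needs a handler to do anything
  "not a url",                   // no scheme, so the parser rejects it
];

for (const s of samples) {
  console.log(s, "->", describeUrl(s));
}
```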
14:05
<smaug____>
Domenic_: what is the status with stream APIs
14:05
<smaug____>
there is still the w3c draft and then there is your draft
14:27
<smaug____>
though, I guess http://anolis.hoppipolla.co.uk/aquarium.py/output?uri=http%3A%2F%2Frawgithub.com%2Fwhatwg%2Fstreams%2Fofficial-lookin%2Findex.html&process_filter=on&process_toc=on&process_xref=on&process_sub=on&process_annotate=on&filter=&annotation=&newline_char=LF&tab_char=SPACE&min_depth=2&max_depth=6&w3c_compat_xref_a_placement=on&parser=lxml.html&serializer=html5lib&output_encoding=ascii is not even a real draft yet
14:29
<smaug____>
I guess the readme just hasn't been converted to a spec format yet
16:21
<vmatva_>
Hi everybody. I'm reading the WHATWG Encoding Standard, and there is an 'encode' algorithm (http://encoding.spec.whatwg.org/#encode). It says "Let output be a code point stream." That seems wrong to me, because the result of encoding is a byte stream, not a code point stream. Am I right, or have I misunderstood something?
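To illustrate vmatva_'s point with UTF-8: an encoder consumes code points and produces bytes, so its output is a byte stream. This is a minimal sketch, not the spec's algorithm text, and `utf8Encode` is a hypothetical name. Lone surrogates are replaced with U+FFFD first, assuming the same behaviour as the platform `TextEncoder`, whose output the sketch should match.

```javascript
// Sketch of a UTF-8 encoder: input is a sequence of code points,
// output is a byte stream (here a Uint8Array).
function utf8Encode(str) {
  const bytes = [];
  for (const ch of str) { // iterates by code point; a lone surrogate comes out on its own
    let cp = ch.codePointAt(0);
    if (cp >= 0xd800 && cp <= 0xdfff) {
      cp = 0xfffd; // UTF-8 cannot encode surrogates; substitute U+FFFD like TextEncoder
    }
    if (cp < 0x80) {
      bytes.push(cp);
    } else if (cp < 0x800) {
      bytes.push(0xc0 | (cp >> 6), 0x80 | (cp & 0x3f));
    } else if (cp < 0x10000) {
      bytes.push(0xe0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3f), 0x80 | (cp & 0x3f));
    } else {
      bytes.push(
        0xf0 | (cp >> 18),
        0x80 | ((cp >> 12) & 0x3f),
        0x80 | ((cp >> 6) & 0x3f),
        0x80 | (cp & 0x3f),
      );
    }
  }
  return Uint8Array.from(bytes);
}

// "a" followed by the euro sign U+20AC: the bytes 61 e2 82 ac
console.log(Array.from(utf8Encode("a\u20ac"), (b) => b.toString(16)));
```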
16:41
<smaug____>
vmatva_: better to ask annevk about that spec
16:47
<smaug____>
MikeSmith: Didn't you start spec'ing Console API ?
17:20
<SamB>
vmatva_: that does sound wrong, yes
19:15
<Domenic_>
smaug____: drafts are merging, using whatwg/streams as the base. Some of the stuff, e.g. Object URLs, may get ported over from the W3C draft, but the core primitives are the ones from whatwg/streams.
19:34
<SimonSapin>
Hixie: http://whatwg.org/C#input-stream "Any character that is a not a Unicode character, i.e. any isolated surrogate, is a parse error. (These can only find their way into the input stream via script APIs such as document.write().)"
19:34
<SimonSapin>
It’s a parse error, but UAs still have to process it, so they can end up with surrogates in any string in the resulting tree?
19:36
<zewt>
well, can you write a surrogate pair with two document.write() calls and have it work?
19:37
<zewt>
(haven't tried but would guess yes)
19:38
<SimonSapin>
zewt: I think yes, but I’m not interested in unpaired surrogates
19:38
<SimonSapin>
s/not/more/, sorry
19:38
<zewt>
i mean, you'd have a lone surrogate, but only temporarily
19:38
<SimonSapin>
what does temporarily mean?
19:39
<zewt>
document.write("\ud800"); document.write("\udc00"); would give you a lone surrogate for the period between the two calls
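zewt's two-write case can be modelled by treating the input stream as a plain JavaScript string that each write appends to (an assumption; real parsers manage an insertion point and buffers). The hypothetical `findLoneSurrogates` helper shows the lone surrogate existing only between the two calls:

```javascript
// Returns the indices of unpaired UTF-16 surrogates in a string.
function findLoneSurrogates(s) {
  const positions = [];
  for (let i = 0; i < s.length; i++) {
    const unit = s.charCodeAt(i);
    if (unit >= 0xd800 && unit <= 0xdbff) {
      const next = s.charCodeAt(i + 1); // NaN past the end of the string
      if (next >= 0xdc00 && next <= 0xdfff) {
        i++; // well-formed pair: skip the low surrogate
        continue;
      }
      positions.push(i); // high surrogate with no low surrogate after it
    } else if (unit >= 0xdc00 && unit <= 0xdfff) {
      positions.push(i); // low surrogate with no high surrogate before it
    }
  }
  return positions;
}

// Hypothetical stand-in for the input stream that document.write() appends to.
let inputStream = "";
inputStream += "\ud800"; // first write
console.log(findLoneSurrogates(inputStream)); // [ 0 ]
inputStream += "\udc00"; // second write completes the pair
console.log(findLoneSurrogates(inputStream)); // []
console.log(inputStream.codePointAt(0).toString(16)); // 10000
```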
19:39
<SimonSapin>
data:text/html,<script>document.write("a\uD800b")</script>
19:40
<SimonSapin>
In firefox, I see D800 in a box, i.e. "missing glyph"
19:40
<zewt>
that's what i'd expect
19:40
<zewt>
chrome gives just "a"
19:40
<SimonSapin>
so the surrogate made its way all the way up to the fonts subsystem
19:41
<zewt>
chrome's behavior is pretty surprising to me
19:44
<zewt>
more specifically: the whole string is in the DOM, but rendering stops (the "b" isn't rendered)
19:45
<zewt>
(which means that writing UTF-16 code units one at a time still works)
19:47
<SimonSapin>
Firefox is fine with it anywhere in the tree
19:47
<SimonSapin>
data:text/html,<script>document.write("<a\uD800b>");document.write(document.body.firstChild.tagName)</script>
19:47
<SimonSapin>
data:text/html,<script>document.write("<a a\uD800b>");document.write(document.body.firstChild.attributes[0].localName)</script>
19:50
<SimonSapin>
For context: in Servo we’re considering having the HTML tokenizer work on UTF-8 input rather than UTF-16. But UTF-8 cannot encode surrogates.
19:50
<zewt>
i think chrome and firefox are doing the same thing during parsing, firefox's renderer is just better at coping with it
19:51
<SimonSapin>
I’m trying to determine if we can get away with decoding surrogates to U+FFFD, or if we’re constrained by web compat
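The U+FFFD option SimonSapin mentions can be sketched as a pre-pass over the UTF-16 string before it reaches a UTF-8-based tokenizer. A minimal sketch assuming the whole string is available at once; `replaceLoneSurrogates` is a hypothetical name. It should give the same result as a round trip through `TextEncoder`/`TextDecoder`, which performs the same substitution:

```javascript
// Replace every unpaired UTF-16 surrogate with U+FFFD; well-formed pairs are kept.
function replaceLoneSurrogates(s) {
  let out = "";
  for (let i = 0; i < s.length; i++) {
    const unit = s.charCodeAt(i);
    if (unit >= 0xd800 && unit <= 0xdbff) {
      const next = s.charCodeAt(i + 1); // NaN past the end of the string
      if (next >= 0xdc00 && next <= 0xdfff) {
        out += s[i] + s[i + 1]; // keep the pair
        i++;
      } else {
        out += "\ufffd"; // high surrogate without a partner
      }
    } else if (unit >= 0xdc00 && unit <= 0xdfff) {
      out += "\ufffd"; // low surrogate without a partner
    } else {
      out += s[i];
    }
  }
  return out;
}

const input = "a\ud800b";
const viaUtf8 = new TextDecoder().decode(new TextEncoder().encode(input));
console.log(replaceLoneSurrogates(input) === viaUtf8); // true
```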
19:51
<SimonSapin>
I have a hard time imagining real content relying on this, but this is the web
19:51
<SimonSapin>
hsivonen: any opinion?
19:51
<zewt>
i couldn't say, but i'd suspect there are people for some reason doing things like (equivalent to) for(i=0;i<s.length;++i) document.write(s.charAt(i));
19:52
<zewt>
or writing blocks of 1024 code units, or anything else that would split a surrogate pair across two writes
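zewt's worry is that a per-call replacement would turn a pair split across two writes, as in the one-character-at-a-time loop above, into two U+FFFD characters. The sketch below contrasts naive per-write replacement with a variant that holds a trailing high surrogate back until the next write. `naiveWrites` and `bufferedWrites` are hypothetical and only model string concatenation, not a real parser.

```javascript
// Replace unpaired surrogates in one complete string with U+FFFD.
function sanitize(s) {
  let out = "";
  for (let i = 0; i < s.length; i++) {
    const unit = s.charCodeAt(i);
    const next = s.charCodeAt(i + 1); // NaN past the end of the string
    if (unit >= 0xd800 && unit <= 0xdbff && next >= 0xdc00 && next <= 0xdfff) {
      out += s[i] + s[i + 1]; // keep a well-formed pair
      i++;
    } else if (unit >= 0xd800 && unit <= 0xdfff) {
      out += "\ufffd"; // unpaired high or low surrogate
    } else {
      out += s[i];
    }
  }
  return out;
}

// Naive: sanitize each write on its own, so a pair split across writes is lost.
function naiveWrites(chunks) {
  return chunks.map(sanitize).join("");
}

// Buffered: carry a trailing high surrogate over to the next write.
function bufferedWrites(chunks) {
  let out = "";
  let pending = "";
  for (const chunk of chunks) {
    let text = pending + chunk;
    pending = "";
    const last = text.charCodeAt(text.length - 1);
    if (last >= 0xd800 && last <= 0xdbff) {
      pending = text.slice(-1); // may be completed by the next write
      text = text.slice(0, -1);
    }
    out += sanitize(text);
  }
  return out + sanitize(pending); // nothing left to pair with: becomes U+FFFD
}

const s = "x\u{1F4A9}y";
// One UTF-16 code unit per write, like zewt's charAt() loop.
const perCodeUnit = Array.from({ length: s.length }, (_, i) => s.charAt(i));
console.log(naiveWrites(perCodeUnit) === "x\ufffd\ufffdy"); // true
console.log(bufferedWrites(perCodeUnit) === s); // true
```

The buffered variant needs a point at which no more writes can arrive; the sketch simply treats the end of the chunk list as that point.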
19:54
<SimonSapin>
so, surrogate pairs in separate d.write calls. Yes, that seems plausible
19:55
<gsnedders>
SimonSapin: Yes, lone surrogates can be introduced from document.write, so it is an issue
19:56
<gsnedders>
I tried before to convince Hixie to change this to forbid lone surrogates, to no avail, fwiw
19:57
<zewt>
trying to think of a real-life case where the above might happen, but it's hard to even think of legitimate uses for document.write itself...
19:58
<SimonSapin>
gsnedders: forbid as in replace them with U+FFFD?
20:00
<gsnedders>
SimonSapin: aye
20:01
<SimonSapin>
I’d be in favor
20:01
<zewt>
it's pretty weird that document.write allows it and (eg) createElement doesn't, since it means document.createElement(otherElement.tagName) can fail
20:01
<zewt>
(at least in Chrome, didn't check the spec for that)
20:01
<gsnedders>
zewt: There are plenty of cases that can fail though
20:01
<Domenic_>
yeah createElement has a lot of restrictions that HTML parsing doesn't
20:01
<Domenic_>
which is IMO quite weird.
20:02
<gsnedders>
It has to match the Name production in XML 1.0 4th Edition in every browser
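The kind of check gsnedders describes can be sketched as a test against the XML 1.0 `Name` production. This sketch transcribes the Fifth Edition's `NameStartChar`/`NameChar` ranges because they form a short list; the edition browsers actually apply in `createElement` may differ, so treat it as illustrative. `isXmlName` is a hypothetical name. Iterating by code point means a lone surrogate, which falls in no range, is rejected, even though the HTML parser accepted the same tag name in SimonSapin's `document.write("<a\uD800b>")` test.

```javascript
// XML 1.0 (Fifth Edition) NameStartChar ranges, as [first, last] code points.
const NAME_START_RANGES = [
  [0x3a, 0x3a], [0x41, 0x5a], [0x5f, 0x5f], [0x61, 0x7a], // ":" A-Z "_" a-z
  [0xc0, 0xd6], [0xd8, 0xf6], [0xf8, 0x2ff], [0x370, 0x37d],
  [0x37f, 0x1fff], [0x200c, 0x200d], [0x2070, 0x218f], [0x2c00, 0x2fef],
  [0x3001, 0xd7ff], [0xf900, 0xfdcf], [0xfdf0, 0xfffd], [0x10000, 0xeffff],
];
// Extra characters allowed after the first one (NameChar minus NameStartChar).
const NAME_CHAR_EXTRA_RANGES = [
  [0x2d, 0x2e], [0x30, 0x39], [0xb7, 0xb7], // "-" "." 0-9 middle dot
  [0x300, 0x36f], [0x203f, 0x2040],
];

const inRanges = (cp, ranges) => ranges.some(([lo, hi]) => cp >= lo && cp <= hi);

function isXmlName(s) {
  let first = true;
  for (const ch of s) { // by code point; a lone surrogate arrives as 0xD800-0xDFFF
    const cp = ch.codePointAt(0);
    const ok = inRanges(cp, NAME_START_RANGES) ||
      (!first && inRanges(cp, NAME_CHAR_EXTRA_RANGES));
    if (!ok) return false;
    first = false;
  }
  return !first; // the empty string is not a Name
}

console.log(isXmlName("svg:rect"), isXmlName("a\ud800b")); // true false
```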
20:03
<zewt>
so you can create another element with the same tag via cloneNode, but not createElement
20:05
<zewt>
weird, but I guess I can't think of any case where it's harmful, and assuming document.write's leniency is web compat, I guess it's not necessarily better to make other APIs extra lenient just to be consistent with that
20:07
<jgraham>
SimonSapin: That sounds like a bad idea (trying to pretend that the input stream is UTF8)
20:07
<SimonSapin>
jgraham: explain?
20:07
<zewt>
i'd be worried that it would result in different layering
20:07
<jgraham>
SimonSapin: Well it isn't so…
20:08
<zewt>
eg. different encodings (and therefore different error cases and representable concepts) at each place in the pipeline
20:08
<jgraham>
SimonSapin: Specifically it seems like the kind of assumption that would be embedded quite deeply in the code
20:08
<jgraham>
Then later we would find web compat issues
20:08
<jgraham>
and then it would be impossible to change without rewriting lots of things
20:11
<SimonSapin>
jgraham: existing implementations could keep using UTF-16 and still decode lone surrogates to U+FFFD
20:12
<gsnedders>
Indeed, the spec used to require this
20:12
<gsnedders>
And it wasn't compat reasons that changed it
20:13
<jgraham>
gsnedders: If the implementations don't do it we have no idea what the compat restrictions are
20:13
<SimonSapin>
gsnedders: what was the reason?
20:13
<gsnedders>
jgraham: Implementations did for a while.
20:13
<jgraham>
gsnedders: Citation needed
20:13
<gsnedders>
SimonSapin: IIRC it was changed accidentally as part of an editorial change, and nobody noticed for about two years
20:13
<gsnedders>
Or at least nobody questioned for that long
20:15
<gsnedders>
(I think html5lib-tests still requires lone surrogates get replaced by U+FFFD!)
20:16
<jgraham>
(also doing UCS2 to UTF8 conversion in every document.write / innerHTML call seems rather performance-suboptimal)
20:16
<jgraham>
(especially for innerHTML which is tragically often in tight loops)
20:17
<zewt>
(UTF-16)
20:17
<gsnedders>
zewt: No, UTF-16+lone-surrogates-passthrough
20:25
<zewt>
firefox still allows pasting lone surrogates into input fields, heh
20:26
<zewt>
looks like chrome does too (with the same rendering issues as elsewhere)
20:28
<zewt>
firefox will copy it out to the clipboard, chrome copies FFFD
20:28
<gsnedders>
Man, running `git annotate` on the spec takes… a while.
20:28
<zewt>
it's one of the slower git things in my experience
20:29
<gsnedders>
Well, right. It's obviously slow. But doing it on a 6MB file is really horrible.
20:30
<zewt>
probably more to do with the number of revisions visible
20:30
<zewt>
or how many revisions it has to go back or something, don't know how it's implemented
20:31
<gsnedders>
Yeah, number of revisions and number of lines.
20:31
<gsnedders>
It's number of lines that really kills it though
20:33
<gsnedders>
(The fact it's almost entirely CPU bound and not IO bound should be telling)
20:33
<vmatva_>
SamB: thank you. I submitted a bug. I just needed some confirmation.
22:04
<SimonSapin>
gsnedders: does html5lib-tests have tests for document.write()?
22:05
<gsnedders>
SimonSapin: No. But it does have tests for the tokenizer given a specific input, which I think is where these things are.
22:05
<gsnedders>
SimonSapin: i.e., the input stream prior to pre-processing
22:06
<SimonSapin>
gsnedders: Right. But normally that’s the output of the character encoding decoders, which never emit surrogates
22:07
<gsnedders>
SimonSapin: Indeed
22:12
<gsnedders>
SimonSapin: There are three justifications for having it there: a) implementations with scripting support have to cope with this case (as lone surrogates can enter through the second entry point to the input stream, i.e. script APIs such as document.write()); b) bugs in the encoding layer leading to it; c) these tests were written before the encoding layer was defined at all
22:12
<SimonSapin>
makes sense
22:14
<gsnedders>
(wrt a, if the impl doesn't have scripting support they can just detect lone surrogates in the test and ignore that test)
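gsnedders' suggestion for implementations without scripting can be sketched as a filter over tokenizer tests. This assumes each test is an object with an `input` string and an optional `doubleEscaped` flag meaning `\uXXXX` escapes must be decoded once more; that reflects the html5lib-tests tokenizer format only as far as can be told here, so verify it against the repository. The helper names and sample entries are hypothetical.

```javascript
// Undo the extra level of \uXXXX escaping used by "doubleEscaped" tests
// (assumed format; see the note above).
function unescapeOnce(s) {
  return s.replace(/\\u([0-9a-fA-F]{4})/g, (_, hex) => String.fromCharCode(parseInt(hex, 16)));
}

// True if the string contains an unpaired UTF-16 surrogate.
function hasLoneSurrogate(s) {
  for (let i = 0; i < s.length; i++) {
    const unit = s.charCodeAt(i);
    if (unit >= 0xd800 && unit <= 0xdbff) {
      const next = s.charCodeAt(i + 1); // NaN past the end of the string
      if (next >= 0xdc00 && next <= 0xdfff) {
        i++; // well-formed pair
        continue;
      }
      return true;
    }
    if (unit >= 0xdc00 && unit <= 0xdfff) return true;
  }
  return false;
}

// For an implementation without scripting support: skip tests whose input
// contains lone surrogates.
function testsWithoutLoneSurrogates(tests) {
  return tests.filter((t) => {
    const input = t.doubleEscaped ? unescapeOnce(t.input) : t.input;
    return !hasLoneSurrogate(input);
  });
}

// Hypothetical test entries in the assumed shape.
const sample = [
  { description: "plain text", input: "abc" },
  { description: "lone surrogate", input: "\\uD800", doubleEscaped: true },
  { description: "valid pair", input: "\\uD83D\\uDCA9", doubleEscaped: true },
];
console.log(testsWithoutLoneSurrogates(sample).map((t) => t.description));
// [ 'plain text', 'valid pair' ]
```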
22:15
<gsnedders>
SimonSapin: Do you, in webencodings, work around issue8271?
22:16
<SimonSapin>
gsnedders: no. The actual codecs are Python’s
22:16
<gsnedders>
SimonSapin: This is what I thought
22:16
<gsnedders>
Hence html5lib-python has to deal with the fact it supports Python versions with broken encodings :)
22:16
<SimonSapin>
python-webencodings is not doing much besides aliases
22:19
<gsnedders>
So it technically doesn't impl the spec :P
22:41
<SimonSapin>
uh, Presto fails at data:text/html,<script>document.write("\uD83D");document.write("\uDCA9")</script>
22:41
<SimonSapin>
(not that it’s very relevant anymore)
22:45
<gsnedders>
What does it do?
22:57
<SimonSapin>
Presto gives two empty rectangles (which seem to be the glyph for "did not find a glyph")
22:59
<zewt>
data:text/html,<script>document.write("\uD83D"); document.write("\uDCA9"); document.write(document.body.innerText.charCodeAt(0).toString(16));</script>