#whatwg on 2014-03-19

00:07	<Hixie>	wtf
00:07	<Hixie>	json encodes astral characters as their UTF-16 surrogates encoded in ASCII???
00:07	<JosephSilber>	I think I found a bug in Chrome: http://codepen.io/JosephSilber/pen/dFgxo/
00:08	<JosephSilber>	Toggling the "absolute" class there doesn't affect the parent's width.
00:08	<Hixie>	crbug.com/new
00:08	<JosephSilber>	Adding the "absolute" class directly in the HTML does work: http://codepen.io/JosephSilber/pen/LiCKA/
00:09	<JosephSilber>	Hixie: Yeah. Just checking here first if I'm misunderstanding expected behavior.
00:09	<Hixie>	i don't understand what's going on in that test
00:10	<Hixie>	there's three files?
00:10	<JosephSilber>	Hixie: there are two flex containers, nested.
00:10	<JosephSilber>	The inner container should collapse to its content's width.
00:11	<SimonSapin>	Hixie: yes. The J in JSON is JavaScript, 16bit strings and all
00:11	<JosephSilber>	Setting one of its children's position to absolute should collpase the container's width.
00:11	<SimonSapin>	at least, that’s if you want to backslash-escape such characters. I think just having them literally in UTF-8 should also work
00:11	<JosephSilber>	Work well in Firefox, and also works well in Chrome when not doing it dynamically.
00:13	<Hixie>	JosephSilber: is http://software.hixie.ch/utilities/js/live-dom-viewer/?saved=2893 equivalent to your test?
00:14	<JosephSilber>	Hixie: yes. Shouldn't .inner's width collpase? Like it does here: http://codepen.io/JosephSilber/pen/LiCKA/
00:14	<Hixie>	JosephSilber: it seems to work to me... when you set the attribute, the element becomes abs pos and the right thing moves under it. am i missing something?
00:15	<JosephSilber>	Hixie: check it out in ff
00:15	<Hixie>	oh i see, the width is on the inner ones, not the outer ones
00:15	<Hixie>	yup, looks like a bug
00:15	<JosephSilber>	k. will report
00:16	<Hixie>	(btw when making a test case you really want to use as little as possible. so e.g. all the styles on the button are extraneous here and should be dropped in the test. having a separate <script> block is extraneous if you can just do it inline. etc.)
00:16	<Hixie>	(i thought the rgba() thing was especially amusing :-) )
00:16	<JosephSilber>	It's scss
00:19	<Hixie>	scss?
00:19	<Hixie>	in a test? :-)
00:22	<JosephSilber>	ha
00:23	<TabAtkins>	JosephSilber: Yeah, that's a bug.
00:24	<JosephSilber>	reporting
00:26	<SamB>	SimonSapin: raw UTF-8, huh? well, as long as we aren't applying "UTF-8" to the UTF-16 ...
00:26	<SamB>	(which is actually called something else)
00:27	<SimonSapin>	wait, what?
00:27	<SimonSapin>	no, it’s not CESU-8
00:29	<JosephSilber>	https://code.google.com/p/chromium/issues/detail?id=353837&thanks=353837&ts=1395188953
01:00	<MikeSmith>	tantek: are all of the rel values in the POSH table of the link-relations page meant to be considered conforming/valid?
01:01	<tantek>	no
01:02	<tantek>	they're just random author extensions, HTML4 style
01:02	<tantek>	basically, they're stuff someone has found in the wild
01:04	<MikeSmith>	ok
01:04	<MikeSmith>	rel=publisher semms to be used quite a lot
01:05	<tantek>	if there's a spec for rel=publisher, and you think it's a useful value, go ahead and add it to http://microformats.org/wiki/existing-rel-values#HTML5_link_type_extensions
01:06	<tantek>	I haven't found a use for it myself so I've ignored it
01:06	<tantek>	I figure it someone cares about it enough, they'll do the minimal step of "registering" it by editing the wiki
01:15	<MikeSmith>	tantek: yeah you're right. I'll wait to see if anybody registers it.
02:24	<Hixie>	wow, a phishing attempt with the subject line "full specification"
02:24	<Hixie>	that's a bit more targetted than i expected!
03:16	<Hixie>	am i missing something, or does the JSON specification not say what the root of the grammar is
03:19	<Hixie>	rfc4627 does, so i guess i'll use that
03:19	<Hixie>	(how many specs does one format need, anyway)
03:24	<estellevw>	Question as to whether something is a bug or feature:
03:24	<estellevw>	when a min and max are both set on <input type="number"> chrome makes the input as narrow as needed for the width of the maximum value -- so when max is not set, the input is much wider. I can see that the #shadow-root <div id=inner-editor> is getting a width set, but I don't see where it is getting that width set.
03:25	<estellevw>	http://codepen.io/estelle/pen/rIlFv/ is the test case
03:25	<estellevw>	FF on the other hand has them all the same width
03:25	<Hixie>	if the question is "can a user agent vary the width of a type=number field based on the allowed range", the answer is yes. it can also vary the colour. it can also vary the width based on the time of day.
03:26	<Hixie>	it can also replace the text field with a button that pops up a dialog that asks the user for the number in roman numerals input via ASL recognised by webcam
03:28	<estellevw>	it isn't expected behavior though in the minds of most developers
03:28	<Hixie>	unfortunately, that is true
03:28	<estellevw>	it forces developers to include css width for input types because default width is no longer rational
03:28	<Hixie>	but most developers seem to forget that HTML is not primarily (let alone exclusively) a visual page description language
03:29	<estellevw>	ok, thanks
03:29	<estellevw>	so, feature, not bug. thanks
03:29	<Hixie>	i mean, there's nothing about HTML that says that the CSS will even be looked at
03:29	<Hixie>	or that the page won't be rendered by speaking it out loud
03:29	<Hixie>	or in braille
03:29	<Hixie>	on a dynamic braille display
03:31	<Hixie>	in other news, wtf, https://www.ietf.org/rfc/rfc4627.txt actually CONTRADICTS the json.org description
03:33	<Krinkle>	In what way?
03:33	<Hixie>	the rfc (and ecma 404) say that whitespace can be before or after tokens, the page says whitespace can only be between tokens.
03:35	<SimonSapin>	yay for competing specs
03:36	<zewt>	not really surprising that a description of a format based on unreadable state diagrams is imprecise
03:36	<Hixie>	the diagrams are actually very precise
03:36	<Hixie>	it's the stuff around them that's confusing
03:36	<zewt>	reminds me of sqlite's documentation: used to be readable, then at some point changed to that style of diagram, which made them completely worthless and unreadable
03:37	<Hixie>	you don't like railroad diagrams? why not?
03:37	<Hixie>	they're really nice and simple to understand
03:37	<zewt>	because they're hard to read
03:37	<Hixie>	if HTML's syntax wasn't such a mess, i'd totally use them to specify everything in HTML too
03:37	<Hixie>	i find them really easy to read
03:37	<Hixie>	you just start on one end and follow the paths
03:38	<zewt>	for documentation, at least (eg. user-facing)
03:38	<zewt>	given http://www.postgresql.org/docs/9.0/static/sql-select.html vs. https://www.sqlite.org/lang_select.html, sqlite's are utterly useless and opaque to me as a user
03:39	<Hixie>	oh man, i couldn't disagree more
03:39	<Hixie>	the postgre one there is the unreadable one
03:39	<zewt>	i can skim and understand postgresql's at a glance; i have to stare and squint at sqlite's
03:40	<Hixie>	the mysql docs use the postgre style too and i have to twease them apart each time to work out what they mean
03:40	<Hixie>	the sqlite one is just a matter of following the line, so much easier for me
03:40	<Hixie>	(the sqlite ones are even better than the json ones, since they have arrows)
03:41	<zewt>	maybe for a spec where i was writing a parser, but as a user writing SQL queries postgres's lets me see the command much more naturally
03:42	<Hixie>	oh hey, look at that. nothing in ecma 404 says that the keys in a json object must be unique, and the RFC only makes uniqueness a SHOULD.
03:42	<Krinkle>	Well, the rfc says "Insignificant whitespace is allowed before or after any of the six structural characters"
03:42	<Hixie>	zewt: i have found the opposite, personally
03:42	<Krinkle>	whereas ecma 404 says before or after any value
03:42	<zewt>	without a parser algorithm, a "must" would be pretty meaningless anyway
03:42	<Hixie>	Krinkle: right, those two agree. but json.org says "between".
03:42	<Krinkle>	so that still leaves a different regarding whitespace before e.g. a non-object as root
03:42	<zewt>	at least as far as parsers go
03:42	<Hixie>	Krinkle: well that too
03:43	<Krinkle>	even those two don't agree imho
03:43	<Krinkle>	e.g. ` "foo" `
03:43	<Hixie>	Krinkle: but the rfc is clear that only object and array are allowed as root
03:43	<Krinkle>	as the sole json packet
03:43	<zewt>	or some rule that says "if there are illegal duplicated keys, parsing fails"
03:43	<Hixie>	Krinkle: 404 and json.org don't actually say what the root of a json file is (!)
03:43	<Hixie>	Krinkle: (i was complaining about that earlier)
03:43	<zewt>	from my use of JSON, any type can be the root; "10" is valid JSON
03:44	<zewt>	(but I know there are parsers that expect the root to be a dictionary)
03:44	<Krinkle>	Yes, any JSON value (which is specified)
03:44	<Krinkle>	I don't think it should have to declare a root, you're encoding or decoding values as JSON values.
03:44	<Krinkle>	that should be sufficient
03:44	<Hixie>	the RFC disagrees
03:44	<zewt>	Krinkle: well, "value" should be the root (using json.org's terminology)
03:45	<zewt>	that should definitely be specified, if json.org is meant to be used as a spec (don't really know if it is)
03:45	Hixie	decides that for his purposes, JSON objects are gonna have to have unique keys
03:45	<Krinkle>	Where does the RFC say that root can only be array or object?
03:45	<Hixie>	section 2 paragraph 2: A JSON text is a serialized object or array.
03:45	<Hixie>	JSON-text = object / array
03:45	<Krinkle>	Right
03:46	<zewt>	seems like the main important thing is defining which key is used if there's a duplicate (first or last, presumably)
03:47	<Hixie>	zewt: given that these specs all agree that parsers can "support a superset" of json... (!)
03:47	<Krinkle>	So does this actually cause a problem in practice? Or just pointing out an oversight? I think all parsers I've seen just treat 'JSON-text = value' that's the easiest
03:47	<zewt>	i've written JSON parsers and I couldn't even say which behavior my parsers use (but they're only used in controlled environments, where it doesn't matter)
03:47	<Hixie>	Krinkle: i'm implementing a parser and have no idea what i'm supposed to be doing, either about whitespace, about the root, or about keys in objects.
03:47	<zewt>	(most likely the last)
03:48	<Hixie>	man, the lack of comments in json is a pain in teh ass
03:48	<Krinkle>	ignore whitespace, use your 'value' argorythem from the root (don't special case the root, just go straight into parsing the value),
03:48	<Krinkle>	what about keys in objects?
03:49	<zewt>	FWIW, both Chrome and Firefox's JSON.parse("10") return 10, so on that one I'd have to say the RFC (from what you've described) is wrong
03:49	<zewt>	(a data point which I'm sure you already know, heh)
03:49	<Hixie>	the "value" thing directly contradicts the RFC, and the others are entirely vague about this, so I'm not convinced about that.
03:49	<Krinkle>	it doesn't contradict it, it just supports a compatible superset.
03:49	<Krinkle>	one that is quite common
03:49	<zewt>	both of those also return 10 for " 10"
03:50	<Krinkle>	and "10" for ' "10" '
03:50	<zewt>	Krinkle: of course, if the real definition of JSON is some mysterious superset of what those "specs" say, they're pretty worthless as specs
03:50	<Krinkle>	common sense and simple/lazy implementation
03:51	<Krinkle>	and I'm sure there's a wide scale of test cases of existing implementations you can plug in to make sure you did it right
03:51	<zewt>	if the specs say whitespace can only lie between tokens and don't allow a newline at the end, that'll break tons of inputs
03:51	<zewt>	(i always output a \n at the end of JSON, so curl output isn't stupid)
03:51	<Hixie>	Krinkle: keys in objects, as in, duplicate keys
03:52	<zewt>	if you have to apply liberal common sense and compare against existing implementations to implement JSON because the JSON specs aren't enough, those JSON specs are broken.
03:52	<Krinkle>	as being the parser, it wouldn't break anything. You'd tolerate more than others if anything, more likely you'd be tolerating what everybody else tolerates.
03:52	<Krinkle>	being the encoder is slightly more difficult indeed.
03:52	<Hixie>	zewt: the json specs being broken is more or less the thesis of my rant tonight, yes.
03:53	<zewt>	JSON.parse('{"a": 1, "a": 2}').a returns 2 in both chrome and firefox, which is also what I'd expect (parse a key, write it to the dictionary, if it happens to already be in the dictionary overwrite it)
03:53	<SimonSapin>	Krinkle: if you need common sense and guessing to fill the holes in a spec, it’s a bad spec
03:54	<Krinkle>	I'm not saying it's a good spec (I think it's better than most specs and a hell of a lot easier to implement as such), just saying it seems a moot point to doubt over. I think it's interesting to talk about, but if you're unsure what to do in the actual encoding/parser writing, I'd know better.
03:55	<Krinkle>	the safest route would be to encode as minimal as possible (no whitespace of any kind, and assuming your implementation program language doesn't support dupe keys, that input isn't a problem).
03:55	<zewt>	hardly moot: if the specs are ambiguous or wrong, then they should either be fixed (if whoever's maintaining the spec is willing to fix them, which RFCs seem to have a poor record of) or replaced
03:57	<Krinkle>	and in the parser, if you encounter a dupe key you can blame the input, garbage in garbage out. throw an error, or silently keep the first or last encounter. Shouldn't matter in practice as I'd consider it invalid input.
03:57	<zewt>	json.org seems more like a description of the file format and not really a spec--it says what the file format looks like, but nothing in precise terms about what to do with it. that in mind, the main error seems to be the whitespace issue
03:57	<Krinkle>	Hm.. the spec doesnt' say keys have to be unique. interesting.
03:57	<zewt>	web specs always have to precisely define how "invalid input" is handled
03:58	<Krinkle>	None of the languages listed support that, so it's obviously an oversight (no ambiguity as what the intent was). That should be fixed indeed.
03:58	<zewt>	json.org doesn't (but it doesn't seem to be attempting to be a real spec, so that's probably not a bug)
04:06	<SimonSapin>	I like CSS Syntax’s approach of having non-normative railroad diagrams to get a idea of what the syntax looks like, and precise normative text for implementers
04:12	<Hixie>	Krinkle: the RFC says "SHOULD", which means it wasn't even an oversight there
04:13	<Hixie>	another bug... looks like there's nothing saying that lone surrogates are illegal
04:13	<Hixie>	(in escapes, i mean)
04:15	<Hixie>	interesting, leading zeros in numbers aren't allowed
04:15	<Hixie>	pity about the lack of trailing commas
04:16	<Hixie>	(in objects or arrays)
04:16	<zewt>	they don't seem to be illegal according to chrome/firefox's implementations (but I expect basically zero non-web implementations will, since if you output to UTF-8...)
04:17	<zewt>	also, whoever's responsible for infecting JSON with UTF-16 needs to be exposed and publically shamed
04:18	<Hixie>	zewt: that's just from its JS heritage, i guess
04:19	<Hixie>	ok. for my purposes, the root can be any value, whitespace is allowed anywhere outside a leaf token, duplicate keys are fatal error invalid, and lone surrogate escapes are fatal error invalid.
04:21	<zewt>	don't know your context, but for general parsing i think duplicate keys shouldn't be a fatal error; take the last seen value
04:23	<zewt>	that seems to be what most implementations land on, intentionally or not (json.loads in Python does the same)
04:23	<Hixie>	that seems like a recipe for a security bug
04:26	<zewt>	only if someone has other behavior (like picking the first-seen value), right?
04:26	<Hixie>	right
04:26	<Hixie>	in particular, if a validator does
04:27	<Hixie>	or a serialiser
04:33	<zewt>	i guess i could see a streaming parser doing something different (but a streaming parser couldn't enforce unique keys anyway) ... minor since JSON is rarely streamed, but worth mentioning i guess
04:41	<Hixie>	huh, no range on numbers, either
04:42	<zewt>	json.loads('9'*100000) gives an exact result in python, heh
06:07	<MikeSmith>	nashorn wtf
08:38	<zcorpan>	Hixie: the last 3 commit emails have an error message
09:32	<annevk>	ooh, maybe the problem is with svn.whatwg.org and not my server
09:35	<jgraham>	/win 4
10:19	<annevk>	Hixie: the JSON thing is being fixed
10:20	<annevk>	Although I wonder what the difference is between http://tools.ietf.org/html/rfc7158 and http://tools.ietf.org/html/rfc7159
10:24	<annevk>	It seems they fixed the date and removed Tim Bray's email address in a <meta> element
10:24	<annevk>	In any event, that RFC matches 404 much closer: http://tools.ietf.org/html/rfc7159#section-2
10:28	<Ms2ger>	As for the SQL definitions in the backscroll: I find neither particularly readable, but then again, I don't know SQL
10:44	<annevk>	Is Jeff basically saying power is for sale? https://twitter.com/jeff_jaffe/status/446072553820278785
10:50	<darobin>	annevk: I don't think it's clear what he's saying
10:51	<darobin>	I think that the problem he's looking at is how much team involvement a given individual member may require
10:51	<darobin>	if it's too high, that would drive the price up
10:52	<darobin>	I'm not sure that's really related to power; I reckon "power" is 1) ill-defined in this case and 2) largely orthogonal
10:53	<darobin>	I wonder if there could be an "Individual College"
10:53	<darobin>	for every N individual members, there is one seat added to the AC
10:53	<darobin>	and individual members elect representatives to those seats
10:53	<darobin>	I'm not sure that would be of any use, though
10:53	<darobin>	maybe I should join that webizen thing
10:54	darobin	sighs
11:11	<MikeSmith>	there should be a thing where, if you pay extra, you're guaranteed nobody from the team will interfere with your work
11:12	<Ms2ger>	Like, Ian Jacobs won't change my specs behind my back?
11:17	<jgraham>	Well we have that
11:17	<jgraham>	Except instead of paying extra you pay less
11:17	<jgraham>	It's called "WHATWG"
11:17	<Ms2ger>	Zing
11:24	MikeSmith	readies drm.spec.whatwg.org for non-interference-guaranteed work at whatwg
11:35	<annevk>	whatwg.org/C is out of date again :-(
12:40	<foolip_>	MikeSmith: what's that spec supposed to be?
12:40	<foolip_>	april fools?
12:42	<MikeSmith>	foolip_: hadn't thought it through yet
13:33	<zcorpan>	jgraham: does wpt-serve support range requests?
13:34	<jgraham>	zcorpan: In theory, yes
13:35	<jgraham>	I don't know if it works more than the testsuite though
13:36	<jgraham>	(that is, there are tests for it but I wouldn't bet my life on the tests or the implementation being correct)
13:37	<zcorpan>	do you know off-hand of such a test?
13:39	<jgraham>	I mean tests in the wptserve testsuite
13:40	<jgraham>	Although I think I implemented it because some test was implementing a half-assed version of Range in PHP
13:49	<jgraham>	All I remember was that it was written by Payman/Joāo
14:00	<zewt>	"HTTP is now defined by 6, not 2 specs" :\|
14:02	<annevk>	If that isn't Progress I don't know what is
14:02	<zewt>	nothing like splitting one thing into seventy to make it "easy" to find stuff
14:03	<jgraham>	6>2?
14:07	<zewt>	last i checked
14:14	<Domenic_>	Hixie: the JSON RFC is basically a fork of 404; I would not use it.
14:20	<Domenic_>	It looks like Jeff Jaffe signed up for twitter just to reply to that tweet?
14:28	<annevk>	Domenic_: that was my impression
14:29	<annevk>	jgraham: that's not how that joke works
14:34	<zcorpan>	Hixie: https://www.w3.org/Bugs/Public/show_bug.cgi?id=24860#c12
15:19	<dglazkov>	good morning, Whatwg!
16:30	<Hixie>	wow there really is no difference between 7158 and 7159. weird.
16:31	<annevk>	Hixie: they just fixed the date
16:32	<Hixie>	"fixed"?
16:33	<Hixie>	there's literally no difference between them, except the second one has errata apparently.
16:33	<Hixie>	so let me get this right.
16:33	<Hixie>	they'll publish an entirely new rfc just to update the date, but they won't publish an entirely new rfc to fix errors in the content?
16:35	<Hixie>	i mean it doesn't even really "fix" the date, since there's still an rfc with the wrong date out there now.
16:35	<annevk>	correct
16:36	<Hixie>	and this new version still doesn't fix the mess around whether values should be unique
16:36	<Hixie>	in fact it makes it even more muddled
16:36	<annevk>	it's up to the implementation, JavaScript's JSON has last wins iirc
16:37	<jgraham>	Gotta love a format specifically designed for interchange where "it's up to the implementation"
16:46	<gsnedders>	The RFC was to fix editorial issues, not to make any changes to the format (and making something defined would be a change).
16:46	<Hixie>	it did change the format
16:46	<Hixie>	quite radically, actually, from the first RFC
16:50	<annevk>	It aligned with ES5 after I and others asked them to do that
16:50	<Hixie>	ES6 just defers to 404
16:50	<Hixie>	which isn't as well-defined as the RFC
16:51	<Hixie>	(e.g. it doesn't define the root value, as we were discussing last night)
16:51	<annevk>	Yeah, Ecma 404 is what ES5 has
16:51	<annevk>	Anything can be root
16:51	<Hixie>	it doesn't say that
16:51	<Hixie>	it actually literally doesn't define the format in the most basic sense
16:51	<Hixie>	as far as i can tell
16:53	<annevk>	Hixie: it says that JSON text is a sequence of code points that conforms to the grammar
16:53	<Hixie>	right
16:53	<Hixie>	and it doesn't give "the grammar"
16:54	<annevk>	"JSON text is a sequence of tokens formed from Unicode code points that conforms to the JSON value grammar"
16:54	<Hixie>	oh, it says "the JSON Value grammar"
16:54	<Hixie>	interesting
16:54	<annevk>	seems clear enough
16:54	<Hixie>	how did i miss that like 15 times
16:54	<Hixie>	weird
16:56	<jgraham>	Possibly because it has some weirdness about "Conforming JSON text" vs "JSON Text"
16:56	<jgraham>	I actually can't tell if they are supposed to be different
16:57	<jgraham>	It looks like maybe "JSON Text" is a superset of "Conforming JSON Text"
16:57	<jgraham>	But "Conforming JSON Text" has to "strictly" match "the JSON grammar", which is undefined
16:58	<jgraham>	("JSON Text" merely has to "conform to" (not "strictly") "The JSON Value grammar")
16:59	<jgraham>	(but it's hard to tell if "conforming" vs "strictly conforming" is suspposed to be a substantive difference)
17:18	<annevk>	It isn't really hard to tell, but you could file some bugs for improvement
17:18	<annevk>	TabAtkins: you around?
17:19	<jgraham>	It's hard to tell in the sense that from the ECMA spec I guenuinely don't know
17:20	<jgraham>	*genuinely
17:28	<gsnedders>	The aim of ECMA 404 was to define grammar, not semantics. Which is odd.
17:29	<annevk>	That's always been the goal of JSON though
17:29	<jgraham>	Not really
17:29	<jgraham>	I mean
17:29	<annevk>	Implementations do things like rounding on the numbers and such too, which isn't really forbidden either
17:30	<annevk>	It was the goal of its creator, unless he changed his mind on the goal midway through
17:30	<annevk>	For a while he didn't even want to define the alphabet in use, until we told him that was a bad idea
17:31	<jgraham>	it clearly does define some semantics
17:31	<jgraham>	It more or less defines how the numbers work
17:31	<zcorpan>	in my json parser, [] is an elephant
17:31	<zcorpan>	a real one
17:32	<jgraham>	You would be hard pushed to argue that 10e17 in JSON could be interpreted as 27 or something
17:32	<jgraham>	Although it doesn't define what + or - means
17:32	<annevk>	jgraham: sure, but if you don't use decimal storage, are you non-conforming?
17:33	<jgraham>	Basically istm that Crockford isn't to be trusted with this kind of thing and that JSON has succeeded in spite of him rather than because of him
17:33	<jgraham>	So saying "well the creator wanted X" doesn't seem like a great argument
17:34	<annevk>	I was talking about goals
17:34	<annevk>	In any event, this doesn't seem like a great use of my time
17:35	<jgraham>	I highly doubt it was his goal to create a format that couldn't actually be used for interchange reliably
17:35	<jgraham>	and if it was his goal seems like one that no one should share
17:35	<jgraham>	So either way it seems quite irrelevant
17:37	<annevk>	I think it actually makes sense. It breaks down a bit with generic parsers. But lots of things will be decided at the application layer anyway.
17:38	<Hixie>	json succeeded for the same reason xml succeeded (and did better than xml because it is a simpler format than xml)
17:38	<Hixie>	the reason is, people have an irrational fear of defining custom core syntaxes
17:39	<jgraham>	No
17:39	<Hixie>	people think that if you define a vocabulary on top of a core syntax, it's better than defining a vocabulary and a core syntax together
17:39	<Hixie>	which confuses me greatly, especially when the formats they use don't really fit the problem space
17:39	<jgraham>	It's because having a simple to work with format that has prewritten, predebugged parsers in a range of langauges is a huge win over custom-everything
17:40	<jgraham>	It means that you don't have to keep learning people's half-baked formats
17:40	<jgraham>	And makes interop simpler
17:40	<Hixie>	yeah instead you have to write custom vocabulary interpreters for half-baked vocabularies that you keep having to learn
17:41	<Hixie>	and interop fails because neither the syntax nor the vocabulary define error handling
17:41	<jgraham>	Which is a much easier problem, it turns out, and one that you would have to solve anyway
17:41	<Hixie>	how is it easier? it's the same.
17:41	<Hixie>	you just change your lexical space from unicode characters to different tokens
17:42	<jgraham>	Not at all. If I want to interop with, say, the github API I just have to use an off-the-shelf json lib that I have used hundreds of times before and write some simple code to extract the data I care about
17:43	<jgraham>	If they had invented GitHub-ON for the purpose I would have to either write a parser or learn their library that I had never used before
17:43	<Hixie>	that fails in the same way that people using "simple code to extract the data" they care about from HTML fails
17:43	<jgraham>	and still write the code to extract the data from the file
17:43	<jgraham>	It actually doesn't
17:43	<jgraham>	That's why the format has been a success
17:44	<jgraham>	In spite of the fact that it's horribly flawed in several ways
17:44	<jgraham>	and the people speccing it have managed to make a complete clusterfuck of something that could have been rather straightforward
17:45	<zcorpan>	i saw somewhere someone was working on a "JSON5" which supported more things like comments and unquoted keys
17:45	<zcorpan>	and trailing commas
18:18	<SamB>	... personally I think JSON+C is exactly the right thing. Except that stupid UTF-16 stuff.
18:18	<SamB>	(But that's what you get for basing it on JS syntax ...)
18:18	<TabAtkins>	annevk: I'm around now.
18:19	<annevk>	TabAtkins: any interest in tackling my Selectors questions?
18:19	<TabAtkins>	Point me to them?
18:19	<annevk>	TabAtkins: emailed www-style
18:19	<TabAtkins>	Ah, kk. I'll respond.
18:19	TabAtkins	hasn't checked his email yet this morning.
18:19	<annevk>	Basically wondering if I'm invoking the correct hooks and what hooks to use for matches()
18:20	<SamB>	Hixie: I think it drastically reduces the number of sharp edge cases that need to be dealt with, or at least localizes them a lot better ...
18:21	<Hixie>	that's jgraham's position too, i think
18:21	<Hixie>	i think it just hides them more
18:21	<Hixie>	which makes them less likely to be handled
18:22	<SamB>	replicating someone else's buggy parser in another language is not most people's idea of fun
18:22	<Hixie>	but replacting someone else's buggy vocabulary interpreter in another language is?
18:22	<Hixie>	replicating
18:23	<SamB>	well, many programs don't need to understand the whole vocabulary
18:23	<Hixie>	that's the same logic that leads to people writing parsers that don't need to handle the whole syntax
18:23	<SamB>	that's often not possible
18:24	<SamB>	what I mean is that if you are handed a data structure, you don't have to look in every nook and cranny; you only need to look in the places relevant to the task at hand
18:24	<Hixie>	assuming those places exist. and are the right type. and aren't out of range. and...
18:25	<SamB>	okay, yes, true
18:26	<SamB>	but given that many of these people aren't going to be doing proper error checking ANYWAY ...
18:26	<Hixie>	in other news, i've just realised that in json, numbers are special in that they're the one token whose end is determined by look-ahead.
18:26	<Hixie>	how annoying.
18:27	<SamB>	what, no lexer?
18:27	<Hixie>	?
18:28	<SamB>	... why is this a problem? Are you writing the lexer by hand?
18:28	<annevk>	Hixie: in what environment are you implementing your own JSON parser?
18:28	<Hixie>	SamB: yeah
18:29	<SamB>	do you not have a *lex you could use?
18:29	<Hixie>	annevk: freepascal. there's lots of existing ones, i just figured it would be fun.
18:29	<annevk>	Hixie: I see
18:29	<annevk>	Hixie: are you adding comment support? :-)
18:29	<SamB>	Hixie: please tell me you're actually implementing JSON+C, yes
18:30	<Hixie>	SamB: i could use a lexer. i happen to chose not to this time. :-)
18:30	<Hixie>	SamB: i'm implementing whatever is needed to parse the tokeniser tests in html5lib's test suite :-)
18:31	<annevk>	Oh my
18:31	<SamB>	so why is it that you're using Object Pascal?
18:31	<gsnedders>	We do touch a fair few bits of edge-cases. :)
18:31	<Hixie>	gsnedders: hehe
18:31	<annevk>	This new version of Anolis is going to be built on primitives you implemented yourself Hixie? :-P
18:31	<Hixie>	SamB: is best language.
18:31	<gsnedders>	:)
18:31	<Hixie>	annevk: yep :-)
18:32	<Hixie>	including my own utf-8 decoder :-)
18:32	<annevk>	Hixie: please make it somewhat clean this time so we can see the source code
18:33	<Hixie>	hah
18:33	<Hixie>	i make no promises
18:33	SamB	wonders why Object Pascal is so little heard of
18:33	<Hixie>	SamB: it was pretty popular on windows for a while (under the name Delphi)
18:33	<SamB>	possibly it has had too many names and too few implementations?
18:34	<Hixie>	but yeah, i dunno why it's not more popular
18:34	<SamB>	Hixie: yes, I know, and it's still used there
18:37	<Hixie>	hahaha, json's silly surrogate escape thing triggered my utf-8 system's "surrogates aren't allowed" assertion
18:38	<gsnedders>	:)
18:38	<SamB>	Hixie: what did you do to cause that?
18:38	<gsnedders>	Yup, we have lone surrogates in the html5lib tokenizer JSON.
18:38	<gsnedders>	They're perfectly allowed in JSON :)
18:38	<Hixie>	uh
18:38	<SamB>	I mean why is this getting into UTF-8
18:38	<SamB>	gsnedders: eww
18:38	<Hixie>	SamB: i use utf-8 as my internal representation
18:39	<Hixie>	gsnedders: huh
18:39	<gsnedders>	SamB: We need to test lone surrogates are handled correctly!
18:39	<gsnedders>	Hixie: huh at wha?
18:39	SamB	goes to read the spec ...
18:39	<Hixie>	gsnedders: how do you get lone surrogates out of the html parser?
18:39	<Hixie>	SamB: the json spec is pretty messed up when it comes to surrogates
18:39	<gsnedders>	Hixie: Out of it? We don't. But we have them in the input stream.
18:39	<Hixie>	gsnedders: ahhh...
18:39	<Hixie>	interesting
18:40	<Hixie>	well, my input stream can't support lone surrogates
18:40	<Hixie>	so i'm probably ok just skipping those tests
18:40	<SamB>	is there a reason why JSON is ECMA 404?
18:40	<Hixie>	i guess i'll turn lone surrogates into FFFD
18:40	<Hixie>	(in the json parser)
18:42	SamB	wants a font where U+FFFD is represented by logo-encoding.svg -- colors and all!
18:42	<Hixie>	hm well that makes unicode escapes into another thing that needs lookahead
18:47	<TabAtkins>	Hixie: Why are you using utf-8 as the internal representation? That's an encoding, it's weird to use that internally. Just use arrays of codepoints.
18:48	<dglazkov>	what should this show? http://jsbin.com/bubot/1/edit
18:48	<gsnedders>	TabAtkins: Why would you use arrays of codepoints? That's massively wasteful, esp. if it's mostly ASCII.
18:48	<TabAtkins>	dglazkov: What do you think it should show?
18:48	<TabAtkins>	gsnedders: Because it's simpler? Or use a unicode string, if your language provides that.
18:49	<dglazkov>	ARIAL in arial, INITIAL in times new roman or whatever UA's initial value is?
18:49	<TabAtkins>	Ah, I missed that the initial value is generally a serif font.
18:51	<TabAtkins>	http://jsbin.com/zugojoxa/1/edit?html,output
18:51	<TabAtkins>	This shows the problem a little more clearly - the two "INITIAL"s should be the same font.
18:51	<TabAtkins>	annevk: Just to make sure - these are hooks you need for .query() and .matches()?
18:52	<dglazkov>	TabAtkins: I wonder what mozilla/ie do here?
18:53	<TabAtkins>	I'm on ChromeOS, so I can't tell.
18:53	<dglazkov>	me too :-\
18:55	<dglazkov>	TabAtkins: but this is a bug, right?
18:55	<TabAtkins>	Yes.
18:56	<SamB>	I'm confused: https://tools.ietf.org/rfcdiff?difftype=--hwdiff&url1=rfc7158&url2=rfc7159
18:56	<Hixie>	TabAtkins: because arrays of codepoints take 8 bytes per character and require that the entire input be copied, rather than the input taking 1 byte per character and the data not needing to be copied?
18:57	<SamB>	it doesn't seem like there were any changes other than the change in RFC number and in the "Obsoletes:" line ...
18:57	<TabAtkins>	Then you have to accept encoding limitations, like the fact that you can't encode a lone surrogate in valid utf-8.
18:57	<TabAtkins>	SamB: Yeah, looks like it.
18:57	<Hixie>	yup
18:57	<Hixie>	i am very happy to accept that limitation :-)
18:57	<SamB>	and the year
18:58	<dglazkov>	TabAtkins: gecko gets it right
18:59	<TabAtkins>	It's probably something to do with our bizarre parsing of 'font'.
18:59	<TabAtkins>	Well, hm, never mind, that still doesn't make sense.
19:00	<Hixie>	woot, my json parser found a bug in my test rather than the other way around
19:00	<dglazkov>	TabAtkins: nah. I found this by code inspection. We just don't do anything sensible there. And I wondered if this was intentional
19:00	<TabAtkins>	(I was wondering if it was parsing as a font named "inherit", but that wouldn't help - it would just do fallback, which should produce the default font.)
19:01	<dglazkov>	TabAtkins: we get right to the point where we need to apply property "initial", and then we just go "weeee" and leave
19:01	<TabAtkins>	Fun.
19:02	<SamB>	ouch! http://timelessrepo.com/json-isnt-a-javascript-subset
19:09	<SamB>	oh fun, if I pass difftype=--help to the rfcdiff page, it outputs plaintext as HTML ...
19:09	<SamB>	I mean as text/html
19:12	<SamB>	oh, and otherwise it produces what looks like it's intended to be XHTML labeled as text/html ...
19:22	<annevk>	TabAtkins: I need hooks for querySelector, query, and matches
19:22	<annevk>	TabAtkins: querySelector and matches both take an absolute selector afaict
19:22	<annevk>	TabAtkins: query takes a relative
19:22	<TabAtkins>	querySelector is an absolute scope-filtered, possibly with a reference set.
19:23	<TabAtkins>	query is relative, definitely with a reference set.
19:23	<TabAtkins>	matches is absolute, definitely with a reference set.
19:24	<TabAtkins>	(The only effect of having a reference set is giving meaning to :scope.)
19:25	<annevk>	Oh, I thought scoping root was for that
19:25	<TabAtkins>	Nope, that's only if you're scoping.
19:25	<TabAtkins>	Sorry for the confusing wording, but :scope got named before anything else.
19:26	<TabAtkins>	And the scoping root is the default reference set, if you don't specify anything else.
19:26	<TabAtkins>	You generally don't want to scope. querySelector() does, but really only because it didn't have relative selectors at the time.
19:26	<TabAtkins>	<style scoped> is the only other thing that uses scoping.
19:26	TabAtkins	is off to lunch for a bit, will answer any further questions in an hour or so.
19:27	<annevk>	Oh okay. So querySelector using a scoping root is fine.
19:27	<annevk>	However, matches should use a reference set so no scoping is done
19:28	<SamB>	relative selectors?
19:30	<SamB>	hmm, well, ECMA 404 doesn't say you can have unpaired surrogates in your JSON
19:30	<annevk>	Okay, I should look at this again tomorrow, thanks for the pointers so far TabAtkins
19:31	<annevk>	TabAtkins: I do find it a bit odd that you have API hooks for selectors separate from the general selector matching (where is the algorithm for that? and why is it not linked from the API hooks section?)
19:33	SamB	wonders what Haskell does if you have surrogates in your Strings masquerading as Chars
19:36	<gsnedders>	Is there any sane way to find a font that contains a given Unicode codepoint on OS X?
19:36	<gsnedders>	Like, I blatantly have one as it manages to font-switch in places for it.
19:36	<gsnedders>	But I can't tell what font it comes from
20:23	<gsnedders>	SamB: It'll allow them, because Char is just an integral type, of range 0–0x10FFFF
20:33	<Hixie>	yeah that's basically what i did too, except that i have an operator overload for assignment that checks for surrogates :-)
20:33	<Hixie>	(and i allow -1 to mean eof)
20:33	<TabAtkins>	SamB: http://dev.w3.org/csswg/selectors/#relative
20:34	<gsnedders>	You need a type system with dependent types to allow only matched surrogates
20:34	<Hixie>	uh no, i want no surrogates :-)
20:35	<gsnedders>	That's easy :P
20:39	<Hixie>	annevk: is there anything in particular i need to do on https://www.w3.org/Bugs/Public/show_bug.cgi?id=24810 or did you reassign to me just so i could look it over? (it looks good)
20:39	<annevk>	Hixie: I assigned it to you so you could remove the bits in HTML
20:40	<annevk>	Hixie: e.g. scripting environment is no longer a thing DOM has now or needs HTML to define so you can remove that
20:40	<annevk>	Hixie: and under microtask checkpoint there's a bit of cleanup you can do
20:40	<Hixie>	aaah right
20:40	<Hixie>	cool
20:40	<Hixie>	thanks
21:22	<SamB>	gsnedders: Char is not quite an integral type, but yeah, I guess it has allow them given the way Enum works ...
21:22	<SamB>	and Bounded
21:26	<gsnedders>	SamB: Okay, it's not an integral type, but it has a 1:1 mapping to one
21:28	<SamB>	back to JSON, RFC 715[89] also doesn't permit unpaired surrogates, but in section 8.2 warns that they have been seen in the wild and that the behaviour of software encountering them is unpredictable
21:28	<SamB>	https://tools.ietf.org/html/rfc7159#section-8.2
21:29	<gsnedders>	Oooh! That's a change!
22:12	<jgraham>	Hmm, that section says that it does permit unpaired surrogates
22:13	<jgraham>	It just says that the behaviour of unpaired surrogates is undefined
22:14	<zewt>	i'd sooner have it defined as outputting FFFE than being a parse error
22:15	<jgraham>	The whole thing is pretty woeful
22:15	<zewt>	so if some API client inputs a string from a user in a UTF-16 env, and the user pastes in a string with a broken surrogate, it doesn't become a server-side error later
22:15	<jgraham>	You are kind of expected to guess how string escapes work
22:16	<zewt>	eg. i'd rather there be no possible invalid user inputs as a string, even if it means some (invalid surrogates) not round-tripping through some paths
22:16	<jgraham>	zewt: AFAICT the spec simply fails to define how strings are interpreted at all
22:17	<zewt>	sure, i'm just saying how i'd prefer it
22:17	<jgraham>	I agree that fatal errors aren't great
22:17	<zewt>	in practice that may be unlikely: most JSON encoders I've used just output UTF-8 for everything and never use \u escapes anyway
22:18	<zewt>	(or in the case of JS, outputs UTF-16 codepoints that get encoded to UTF-8 later)
22:31	<SamB>	gsnedders: well, I mean, the ABNF doesn't rule them out, but there are no semantics given for them in the prose
22:35	<SamB>	what was that April 1 RFC with the, er, disillusioned definitions for keywords?
22:37	<gsnedders>	jgraham: What do you want to do with serializer tests for html5lib, BTW?
22:38	<gsnedders>	jgraham: Given they depend upon so many serialization options, and there are infinitely many valid serializations…
23:04	<jgraham>	gsnedders: I don't think I have a strong opinion. It makes sense to have some tests for html5lib itself
23:04	<jgraham>	I'm not sure that they are worth sharing with other projects though
23:10	<SimonSapin>	SamB: 6919
23:11	<SimonSapin>	jgraham: could you test round-tripping rather than exact serializations?
23:15	<gsnedders>	SimonSapin: Yes, though obviously some tests must test exact serializations
23:15	<Hixie>	does anyone here know anything about MSE?
23:15	<Hixie>	(https://dvcs.w3.org/hg/html-media/raw-file/tip/media-source/media-source.html)
23:16	<gsnedders>	SimonSapin: Also note not everything round-trips
23:16	<SimonSapin>	gsnedders: why, and why?
23:17	<gsnedders>	SimonSapin: e.g., We need to make sure serialization of attribute values doesn't expose XSS bugs in old IE
23:17	<gsnedders>	SimonSapin: and e.g., (XML) <table><tr><td>foo
23:17	<gsnedders>	SimonSapin: (a tree with no tbody)
23:18	<SimonSapin>	gsnedders: but parse+serialize+parse should still be the same as parse, right?
23:19	<gsnedders>	SimonSapin: No. Well, there is obviously a serialization, but some parse errors create odd trees.
23:19	<gsnedders>	SimonSapin: Like foster parenting can cause odd things
23:19	<SimonSapin>	isn’t serialize+parse idempotent?
23:20	<SimonSapin>	(not parse+serialize)
23:20	<gsnedders>	SimonSapin: Given parse+serialize+parse, "<p><table><p>" is hard to handle.
23:20	<gsnedders>	SimonSapin: Because it's serialization isn't at all obvious from the tree it produces.
23:20	<gsnedders>	*its
23:20	<SimonSapin>	I don’t see the problem
23:21	<gsnedders>	At the moment we only try to serialize trees that a conforming input can create. i.e., not ones like that
23:21	<SamB>	hmm, you would certainly think that given a just-parsed document, serialize\|parse would be idempotent
23:21	<SimonSapin>	SamB: yes, that’s what I mean
23:21	<SimonSapin>	is that not the case for HTML?
23:22	<gsnedders>	It is.
23:22	<gsnedders>	It's just the serialize case gets exceptionally hard if you want to make it complete.
23:22	<gsnedders>	Like, for <p><table><p> you have to go from an XML infoset like <p><p/><table/></p> to having the second p appear within the table.
23:23	<gsnedders>	Because you can't serialize that tree as <p><p><table></table></p></p> in HTML.
23:23	<SimonSapin>	gsnedders: do you mean that only testing idempotency it not that useful because you can make a serializer that’s idempotent bug wrong? (e.g always return the empty string)
23:23	<SamB>	SimonSapin: that wouldn't make serialize\|parse idempotent
23:23	<gsnedders>	SimonSapin: No, I mean it's impractically hard to do.
23:24	<gsnedders>	SimonSapin: Because a serializer that's idempotent is really complex to handle cases like (XML) <p><p/><table/></p>
23:25	<gsnedders>	SimonSapin: I played about with taking all parser tests and checking serializerparser
23:25	<SamB>	so what was the goal again?
23:25	<gsnedders>	The goal is to improve the current testing situation of html5lib's serializer. Which current relies on shared tests in html5lib-tests dependent upon serialization choice.
23:33	<gsnedders>	https://gist.github.com/gsnedders/9653913 is what currently fails to roundtrip with html5lib. Some (e.g., the script stuff) are obviously bugs.
23:34	<gsnedders>	<p><b><i><u></p>\n<p>X is gonna be very hard to serialize correctly
23:35	<Hixie>	why?
23:37	<gsnedders>	Well, "<p><b><i><u></u></i></b></p><b><i><u>\n<p>X</u></i></b>" doesn't correspond to the same thing, despite being the obvious serialization of the tree
23:38	<gsnedders>	Or rather, the logic for when you omit the p closing tag is complicated.
23:38	<Hixie>	oh
23:38	<gsnedders>	Okay, maybe that isn't as bad.
23:38	<Hixie>	i would just never omit closing tags
23:38	<Hixie>	:-D
23:38	<gsnedders>	I thought that was the weird case where AAA stuff made it horrible.
23:38	<gsnedders>	idk.
23:39	<gsnedders>	I don't have time for this.
23:39	<gsnedders>	:)
23:39	<gsnedders>	This isn't my dissertation. :)
23:41	<gsnedders>	Hixie: Though that does mean the informative description of when p end tags can be omitted in the spec is wrong :)
23:42	<gsnedders>	Or is Writing HTML Documents normative?
23:42	<gsnedders>	It appears to be normative for documents, authoring tools, and markup generators.
23:42	<gsnedders>	In which case the normative description is wrong :)
23:52	<Hixie>	gsnedders: it's not wrong. it just doesn't handle non-conforming cases since those cases are already non-conforming.
23:53	<Hixie>	gsnedders: if a tool outputs <b><p> then it's bogus regardless of where it closes the </b>
23:53	<gsnedders>	Bah!
23:53	<Hixie>	i'm just sayin'
23:54	<gsnedders>	See, this is what makes serialize\|parse idempotency hard!
23:54	<gsnedders>	Like, sure, yeah, obviously any tree the parser creates can be serialized.
23:54	<Hixie>	should just reguse to serialise anything that's non-conforming
23:54	<Hixie>	refuse
23:54	<gsnedders>	To what degree on non-conformity?
23:54	<gsnedders>	*of
23:54	<Hixie>	human-checked!
23:54	<Hixie>	:-D
23:54	<gsnedders>	:)
23:55	<gsnedders>	Do we allow unknown elements? Because their parse-model could change!
23:55	<Hixie>	(or, document your tool as requiring conforming input, and if it's given non-conforming input, say that the output could be garbage.)
23:55	<gsnedders>	Yeah, that's the sensible approach. :)
23:56	<gsnedders>	I originally did this just because I wanted to see how hard it'd be to guarantee serializer\|\|parser idempotency given a tree from the parser. Because if it was easy then we could trivially get way more tests. :)
23:57	<gsnedders>	There's no schema representing all the content model restrictions, is there?