| 00:07 | <Hixie> | wtf |
| 00:07 | <Hixie> | json encodes astral characters as their UTF-16 surrogates encoded in ASCII??? |
| 00:07 | <JosephSilber> | I think I found a bug in Chrome: http://codepen.io/JosephSilber/pen/dFgxo/ |
| 00:08 | <JosephSilber> | Toggling the "absolute" class there doesn't affect the parent's width. |
| 00:08 | <Hixie> | crbug.com/new |
| 00:08 | <JosephSilber> | Adding the "absolute" class directly in the HTML does work: http://codepen.io/JosephSilber/pen/LiCKA/ |
| 00:09 | <JosephSilber> | Hixie: Yeah. Just checking here first if I'm misunderstanding expected behavior. |
| 00:09 | <Hixie> | i don't understand what's going on in that test |
| 00:10 | <Hixie> | there's three files? |
| 00:10 | <JosephSilber> | Hixie: there are two flex containers, nested. |
| 00:10 | <JosephSilber> | The inner container should collapse to its content's width. |
| 00:11 | <SimonSapin> | Hixie: yes. The J in JSON is JavaScript, 16bit strings and all |
| 00:11 | <JosephSilber> | Setting one of its children's position to absolute should collapse the container's width. |
| 00:11 | <SimonSapin> | at least, that’s if you want to backslash-escape such characters. I think just having them literally in UTF-8 should also work |
| 00:11 | <JosephSilber> | Works well in Firefox, and also works well in Chrome when not doing it dynamically. |
| 00:13 | <Hixie> | JosephSilber: is http://software.hixie.ch/utilities/js/live-dom-viewer/?saved=2893 equivalent to your test? |
| 00:14 | <JosephSilber> | Hixie: yes. Shouldn't .inner's width collapse? Like it does here: http://codepen.io/JosephSilber/pen/LiCKA/ |
| 00:14 | <Hixie> | JosephSilber: it seems to work to me... when you set the attribute, the element becomes abs pos and the right thing moves under it. am i missing something? |
| 00:15 | <JosephSilber> | Hixie: check it out in ff |
| 00:15 | <Hixie> | oh i see, the width is on the inner ones, not the outer ones |
| 00:15 | <Hixie> | yup, looks like a bug |
| 00:15 | <JosephSilber> | k. will report |
| 00:16 | <Hixie> | (btw when making a test case you really want to use as little as possible. so e.g. all the styles on the button are extraneous here and should be dropped in the test. having a separate <script> block is extraneous if you can just do it inline. etc.) |
| 00:16 | <Hixie> | (i thought the rgba() thing was especially amusing :-) ) |
| 00:16 | <JosephSilber> | It's scss |
| 00:19 | <Hixie> | scss? |
| 00:19 | <Hixie> | in a test? :-) |
| 00:22 | <JosephSilber> | ha |
| 00:23 | <TabAtkins> | JosephSilber: Yeah, that's a bug. |
| 00:24 | <JosephSilber> | reporting |
| 00:26 | <SamB> | SimonSapin: raw UTF-8, huh? well, as long as we aren't applying "UTF-8" to the UTF-16 ... |
| 00:26 | <SamB> | (which is actually called something else) |
| 00:27 | <SimonSapin> | wait, what? |
| 00:27 | <SimonSapin> | no, it’s not CESU-8 |
| 00:29 | <JosephSilber> | https://code.google.com/p/chromium/issues/detail?id=353837&thanks=353837&ts=1395188953 |
| 01:00 | <MikeSmith> | tantek: are all of the rel values in the POSH table of the link-relations page meant to be considered conforming/valid? |
| 01:01 | <tantek> | no |
| 01:02 | <tantek> | they're just random author extensions, HTML4 style |
| 01:02 | <tantek> | basically, they're stuff someone has found in the wild |
| 01:04 | <MikeSmith> | ok |
| 01:04 | <MikeSmith> | rel=publisher seems to be used quite a lot |
| 01:05 | <tantek> | if there's a spec for rel=publisher, and you think it's a useful value, go ahead and add it to http://microformats.org/wiki/existing-rel-values#HTML5_link_type_extensions |
| 01:06 | <tantek> | I haven't found a use for it myself so I've ignored it |
| 01:06 | <tantek> | I figure if someone cares about it enough, they'll do the minimal step of "registering" it by editing the wiki |
| 01:15 | <MikeSmith> | tantek: yeah you're right. I'll wait to see if anybody registers it. |
| 02:24 | <Hixie> | wow, a phishing attempt with the subject line "full specification" |
| 02:24 | <Hixie> | that's a bit more targeted than i expected! |
| 03:16 | <Hixie> | am i missing something, or does the JSON specification not say what the root of the grammar is |
| 03:19 | <Hixie> | rfc4627 does, so i guess i'll use that |
| 03:19 | <Hixie> | (how many specs does one format need, anyway) |
| 03:24 | <estellevw> | Question as to whether something is a bug or feature: |
| 03:24 | <estellevw> | when a min and max are both set on <input type="number"> chrome makes the input as narrow as needed for the width of the maximum value -- so when max is not set, the input is much wider. I can see that the #shadow-root <div id=inner-editor> is getting a width set, but I don't see where it is getting that width set. |
| 03:25 | <estellevw> | http://codepen.io/estelle/pen/rIlFv/ is the test case |
| 03:25 | <estellevw> | FF on the other hand has them all the same width |
| 03:25 | <Hixie> | if the question is "can a user agent vary the width of a type=number field based on the allowed range", the answer is yes. it can also vary the colour. it can also vary the width based on the time of day. |
| 03:26 | <Hixie> | it can also replace the text field with a button that pops up a dialog that asks the user for the number in roman numerals input via ASL recognised by webcam |
| 03:28 | <estellevw> | it isn't expected behavior though in the minds of most developers |
| 03:28 | <Hixie> | unfortunately, that is true |
| 03:28 | <estellevw> | it forces developers to include css width for input types because default width is no longer rational |
| 03:28 | <Hixie> | but most developers seem to forget that HTML is not primarily (let alone exclusively) a visual page description language |
| 03:29 | <estellevw> | ok, thanks |
| 03:29 | <estellevw> | so, feature, not bug. thanks |
| 03:29 | <Hixie> | i mean, there's nothing about HTML that says that the CSS will even be looked at |
| 03:29 | <Hixie> | or that the page won't be rendered by speaking it out loud |
| 03:29 | <Hixie> | or in braille |
| 03:29 | <Hixie> | on a dynamic braille display |
| 03:31 | <Hixie> | in other news, wtf, https://www.ietf.org/rfc/rfc4627.txt actually CONTRADICTS the json.org description |
| 03:33 | <Krinkle> | In what way? |
| 03:33 | <Hixie> | the rfc (and ecma 404) say that whitespace can be before or after tokens, the page says whitespace can only be between tokens. |
| 03:35 | <SimonSapin> | yay for competing specs |
| 03:36 | <zewt> | not really surprising that a description of a format based on unreadable state diagrams is imprecise |
| 03:36 | <Hixie> | the diagrams are actually very precise |
| 03:36 | <Hixie> | it's the stuff around them that's confusing |
| 03:36 | <zewt> | reminds me of sqlite's documentation: used to be readable, then at some point changed to that style of diagram, which made them completely worthless and unreadable |
| 03:37 | <Hixie> | you don't like railroad diagrams? why not? |
| 03:37 | <Hixie> | they're really nice and simple to understand |
| 03:37 | <zewt> | because they're hard to read |
| 03:37 | <Hixie> | if HTML's syntax wasn't such a mess, i'd totally use them to specify everything in HTML too |
| 03:37 | <Hixie> | i find them really easy to read |
| 03:37 | <Hixie> | you just start on one end and follow the paths |
| 03:38 | <zewt> | for documentation, at least (eg. user-facing) |
| 03:38 | <zewt> | given http://www.postgresql.org/docs/9.0/static/sql-select.html vs. https://www.sqlite.org/lang_select.html, sqlite's are utterly useless and opaque to me as a user |
| 03:39 | <Hixie> | oh man, i couldn't disagree more |
| 03:39 | <Hixie> | the postgre one there is the unreadable one |
| 03:39 | <zewt> | i can skim and understand postgresql's at a glance; i have to stare and squint at sqlite's |
| 03:40 | <Hixie> | the mysql docs use the postgre style too and i have to twease them apart each time to work out what they mean |
| 03:40 | <Hixie> | the sqlite one is just a matter of following the line, so much easier for me |
| 03:40 | <Hixie> | (the sqlite ones are even better than the json ones, since they have arrows) |
| 03:41 | <zewt> | maybe for a spec where i was writing a parser, but as a user writing SQL queries postgres's lets me see the command much more naturally |
| 03:42 | <Hixie> | oh hey, look at that. nothing in ecma 404 says that the keys in a json object must be unique, and the RFC only makes uniqueness a SHOULD. |
| 03:42 | <Krinkle> | Well, the rfc says "Insignificant whitespace is allowed before or after any of the six structural characters" |
| 03:42 | <Hixie> | zewt: i have found the opposite, personally |
| 03:42 | <Krinkle> | whereas ecma 404 says before or after any value |
| 03:42 | <zewt> | without a parser algorithm, a "must" would be pretty meaningless anyway |
| 03:42 | <Hixie> | Krinkle: right, those two agree. but json.org says "between". |
| 03:42 | <Krinkle> | so that still leaves a difference regarding whitespace before e.g. a non-object as root |
| 03:42 | <zewt> | at least as far as parsers go |
| 03:42 | <Hixie> | Krinkle: well that too |
| 03:43 | <Krinkle> | even those two don't agree imho |
| 03:43 | <Krinkle> | e.g. ` "foo" ` |
| 03:43 | <Hixie> | Krinkle: but the rfc is clear that only object and array are allowed as root |
| 03:43 | <Krinkle> | as the sole json packet |
| 03:43 | <zewt> | or some rule that says "if there are illegal duplicated keys, parsing fails" |
| 03:43 | <Hixie> | Krinkle: 404 and json.org don't actually say what the root of a json file is (!) |
| 03:43 | <Hixie> | Krinkle: (i was complaining about that earlier) |
| 03:43 | <zewt> | from my use of JSON, any type can be the root; "10" is valid JSON |
| 03:44 | <zewt> | (but I know there are parsers that expect the root to be a dictionary) |
| 03:44 | <Krinkle> | Yes, any JSON value (which is specified) |
| 03:44 | <Krinkle> | I don't think it should have to declare a root, you're encoding or decoding values as JSON values. |
| 03:44 | <Krinkle> | that should be sufficient |
| 03:44 | <Hixie> | the RFC disagrees |
| 03:44 | <zewt> | Krinkle: well, "value" should be the root (using json.org's terminology) |
| 03:45 | <zewt> | that should definitely be specified, if json.org is meant to be used as a spec (don't really know if it is) |
| 03:45 | Hixie | decides that for his purposes, JSON objects are gonna have to have unique keys |
| 03:45 | <Krinkle> | Where does the RFC say that root can only be array or object? |
| 03:45 | <Hixie> | section 2 paragraph 2: A JSON text is a serialized object or array. |
| 03:45 | <Hixie> | JSON-text = object / array |
| 03:45 | <Krinkle> | Right |
| 03:46 | <zewt> | seems like the main important thing is defining which key is used if there's a duplicate (first or last, presumably) |
| 03:47 | <Hixie> | zewt: given that these specs all agree that parsers can "support a superset" of json... (!) |
| 03:47 | <Krinkle> | So does this actually cause a problem in practice? Or just pointing out an oversight? I think all parsers I've seen just treat 'JSON-text = value' that's the easiest |
| 03:47 | <zewt> | i've written JSON parsers and I couldn't even say which behavior my parsers use (but they're only used in controlled environments, where it doesn't matter) |
| 03:47 | <Hixie> | Krinkle: i'm implementing a parser and have no idea what i'm supposed to be doing, either about whitespace, about the root, or about keys in objects. |
| 03:47 | <zewt> | (most likely the last) |
| 03:48 | <Hixie> | man, the lack of comments in json is a pain in the ass |
| 03:48 | <Krinkle> | ignore whitespace, use your 'value' algorithm from the root (don't special case the root, just go straight into parsing the value), |
| 03:48 | <Krinkle> | what about keys in objects? |
| 03:49 | <zewt> | FWIW, both Chrome and Firefox's JSON.parse("10") return 10, so on that one I'd have to say the RFC (from what you've described) is wrong |
| 03:49 | <zewt> | (a data point which I'm sure you already know, heh) |
| 03:49 | <Hixie> | the "value" thing directly contradicts the RFC, and the others are entirely vague about this, so I'm not convinced about that. |
| 03:49 | <Krinkle> | it doesn't contradict it, it just supports a compatible superset. |
| 03:49 | <Krinkle> | one that is quite common |
| 03:49 | <zewt> | both of those also return 10 for " 10" |
| 03:50 | <Krinkle> | and "10" for ' "10" ' |
| 03:50 | <zewt> | Krinkle: of course, if the real definition of JSON is some mysterious superset of what those "specs" say, they're pretty worthless as specs |
| 03:50 | <Krinkle> | common sense and simple/lazy implementation |
| 03:51 | <Krinkle> | and I'm sure there's a wide scale of test cases of existing implementations you can plug in to make sure you did it right |
| 03:51 | <zewt> | if the specs say whitespace can only lie between tokens and don't allow a newline at the end, that'll break tons of inputs |
| 03:51 | <zewt> | (i always output a \n at the end of JSON, so curl output isn't stupid) |
| 03:51 | <Hixie> | Krinkle: keys in objects, as in, duplicate keys |
| 03:52 | <zewt> | if you have to apply liberal common sense and compare against existing implementations to implement JSON because the JSON specs aren't enough, those JSON specs are broken. |
| 03:52 | <Krinkle> | as being the parser, it wouldn't break anything. You'd tolerate more than others if anything, more likely you'd be tolerating what everybody else tolerates. |
| 03:52 | <Krinkle> | being the encoder is slightly more difficult indeed. |
| 03:52 | <Hixie> | zewt: the json specs being broken is more or less the thesis of my rant tonight, yes. |
| 03:53 | <zewt> | JSON.parse('{"a": 1, "a": 2}').a returns 2 in both chrome and firefox, which is also what I'd expect (parse a key, write it to the dictionary, if it happens to already be in the dictionary overwrite it) |
| 03:53 | <SimonSapin> | Krinkle: if you need common sense and guessing to fill the holes in a spec, it’s a bad spec |
| 03:54 | <Krinkle> | I'm not saying it's a good spec (I think it's better than most specs and a hell of a lot easier to implement as such), just saying it seems a moot point to doubt over. I think it's interesting to talk about, but if you're unsure what to do in the actual encoding/parser writing, I'd know better. |
| 03:55 | <Krinkle> | the safest route would be to encode as minimal as possible (no whitespace of any kind, and assuming your implementation program language doesn't support dupe keys, that input isn't a problem). |
| 03:55 | <zewt> | hardly moot: if the specs are ambiguous or wrong, then they should either be fixed (if whoever's maintaining the spec is willing to fix them, which RFCs seem to have a poor record of) or replaced |
| 03:57 | <Krinkle> | and in the parser, if you encounter a dupe key you can blame the input, garbage in garbage out. throw an error, or silently keep the first or last encounter. Shouldn't matter in practice as I'd consider it invalid input. |
| 03:57 | <zewt> | json.org seems more like a description of the file format and not really a spec--it says what the file format looks like, but nothing in precise terms about what to *do* with it. that in mind, the main error seems to be the whitespace issue |
| 03:57 | <Krinkle> | Hm.. the spec doesn't say keys have to be unique. interesting. |
| 03:57 | <zewt> | web specs always have to precisely define how "invalid input" is handled |
| 03:58 | <Krinkle> | None of the languages listed support that, so it's obviously an oversight (no ambiguity as what the intent was). That should be fixed indeed. |
| 03:58 | <zewt> | json.org doesn't (but it doesn't seem to be attempting to be a real spec, so that's probably not a bug) |
| 04:06 | <SimonSapin> | I like CSS Syntax’s approach of having non-normative railroad diagrams to get an idea of what the syntax looks like, and precise normative text for implementers |
| 04:12 | <Hixie> | Krinkle: the RFC says "SHOULD", which means it wasn't even an oversight there |
| 04:13 | <Hixie> | another bug... looks like there's nothing saying that lone surrogates are illegal |
| 04:13 | <Hixie> | (in escapes, i mean) |
| 04:15 | <Hixie> | interesting, leading zeros in numbers aren't allowed |
| 04:15 | <Hixie> | pity about the lack of trailing commas |
| 04:16 | <Hixie> | (in objects or arrays) |
| 04:16 | <zewt> | they don't seem to be illegal according to chrome/firefox's implementations (but I expect basically zero non-web implementations will, since if you output to UTF-8...) |
| 04:17 | <zewt> | also, whoever's responsible for infecting JSON with UTF-16 needs to be exposed and publicly shamed |
| 04:18 | <Hixie> | zewt: that's just from its JS heritage, i guess |
| 04:19 | <Hixie> | ok. for my purposes, the root can be any value, whitespace is allowed anywhere outside a leaf token, duplicate keys are fatal error invalid, and lone surrogate escapes are fatal error invalid. |
| 04:21 | <zewt> | don't know your context, but for general parsing i think duplicate keys shouldn't be a fatal error; take the last seen value |
| 04:23 | <zewt> | that seems to be what most implementations land on, intentionally or not (json.loads in Python does the same) |
| 04:23 | <Hixie> | that seems like a recipe for a security bug |
| 04:26 | <zewt> | only if someone has other behavior (like picking the first-seen value), right? |
| 04:26 | <Hixie> | right |
| 04:26 | <Hixie> | in particular, if a validator does |
| 04:27 | <Hixie> | or a serialiser |
| 04:33 | <zewt> | i guess i could see a streaming parser doing something different (but a streaming parser couldn't enforce unique keys anyway) ... minor since JSON is rarely streamed, but worth mentioning i guess |
| 04:41 | <Hixie> | huh, no range on numbers, either |
| 04:42 | <zewt> | json.loads('9'*100000) gives an exact result in python, heh |
| 06:07 | <MikeSmith> | nashorn wtf |
| 08:38 | <zcorpan> | Hixie: the last 3 commit emails have an error message |
| 09:32 | <annevk> | ooh, maybe the problem is with svn.whatwg.org and not my server |
| 09:35 | <jgraham> | /win 4 |
| 10:19 | <annevk> | Hixie: the JSON thing is being fixed |
| 10:20 | <annevk> | Although I wonder what the difference is between http://tools.ietf.org/html/rfc7158 and http://tools.ietf.org/html/rfc7159 |
| 10:24 | <annevk> | It seems they fixed the date and removed Tim Bray's email address in a <meta> element |
| 10:24 | <annevk> | In any event, that RFC matches 404 much closer: http://tools.ietf.org/html/rfc7159#section-2 |
| 10:28 | <Ms2ger> | As for the SQL definitions in the backscroll: I find neither particularly readable, but then again, I don't know SQL |
| 10:44 | <annevk> | Is Jeff basically saying power is for sale? https://twitter.com/jeff_jaffe/status/446072553820278785 |
| 10:50 | <darobin> | annevk: I don't think it's clear what he's saying |
| 10:51 | <darobin> | I think that the problem he's looking at is how much team involvement a given individual member may require |
| 10:51 | <darobin> | if it's too high, that would drive the price up |
| 10:52 | <darobin> | I'm not sure that's really related to power; I reckon "power" is 1) ill-defined in this case and 2) largely orthogonal |
| 10:53 | <darobin> | I wonder if there could be an "Individual College" |
| 10:53 | <darobin> | for every N individual members, there is one seat added to the AC |
| 10:53 | <darobin> | and individual members elect representatives to those seats |
| 10:53 | <darobin> | I'm not sure that would be of any use, though |
| 10:53 | <darobin> | maybe I should join that webizen thing |
| 10:54 | darobin | sighs |
| 11:11 | <MikeSmith> | there should be a thing where, if you pay extra, you're guaranteed nobody from the team will interfere with your work |
| 11:12 | <Ms2ger> | Like, Ian Jacobs won't change my specs behind my back? |
| 11:17 | <jgraham> | Well we have that |
| 11:17 | <jgraham> | Except instead of paying extra you pay less |
| 11:17 | <jgraham> | It's called "WHATWG" |
| 11:17 | <Ms2ger> | Zing |
| 11:24 | MikeSmith | readies drm.spec.whatwg.org for non-interference-guaranteed work at whatwg |
| 11:35 | <annevk> | whatwg.org/C is out of date again :-( |
| 12:40 | <foolip_> | MikeSmith: what's that spec supposed to be? |
| 12:40 | <foolip_> | april fools? |
| 12:42 | <MikeSmith> | foolip_: hadn't thought it through yet |
| 13:33 | <zcorpan> | jgraham: does wpt-serve support range requests? |
| 13:34 | <jgraham> | zcorpan: In theory, yes |
| 13:35 | <jgraham> | I don't know if it works more than the testsuite though |
| 13:36 | <jgraham> | (that is, there are tests for it but I wouldn't bet my life on the tests or the implementation being correct) |
| 13:37 | <zcorpan> | do you know off-hand of such a test? |
| 13:39 | <jgraham> | I mean tests in the wptserve testsuite |
| 13:40 | <jgraham> | Although I think I implemented it because some test was implementing a half-assed version of Range in PHP |
| 13:49 | <jgraham> | All I remember was that it was written by Payman/João |
| 14:00 | <zewt> | "HTTP is now defined by 6, not 2 specs" :| |
| 14:02 | <annevk> | If that isn't Progress I don't know what is |
| 14:02 | <zewt> | nothing like splitting one thing into seventy to make it "easy" to find stuff |
| 14:03 | <jgraham> | 6>2? |
| 14:07 | <zewt> | last i checked |
| 14:14 | <Domenic_> | Hixie: the JSON RFC is basically a fork of 404; I would not use it. |
| 14:20 | <Domenic_> | It looks like Jeff Jaffe signed up for twitter just to reply to that tweet? |
| 14:28 | <annevk> | Domenic_: that was my impression |
| 14:29 | <annevk> | jgraham: that's not how that joke works |
| 14:34 | <zcorpan> | Hixie: https://www.w3.org/Bugs/Public/show_bug.cgi?id=24860#c12 |
| 15:19 | <dglazkov> | good morning, Whatwg! |
| 16:30 | <Hixie> | wow there really is no difference between 7158 and 7159. weird. |
| 16:31 | <annevk> | Hixie: they just fixed the date |
| 16:32 | <Hixie> | "fixed"? |
| 16:33 | <Hixie> | there's literally no difference between them, except the second one has errata apparently. |
| 16:33 | <Hixie> | so let me get this right. |
| 16:33 | <Hixie> | they'll publish an entirely new rfc just to update the date, but they won't publish an entirely new rfc to fix errors in the content? |
| 16:35 | <Hixie> | i mean it doesn't even really "fix" the date, since there's still an rfc with the wrong date out there now. |
| 16:35 | <annevk> | correct |
| 16:36 | <Hixie> | and this new version still doesn't fix the mess around whether values should be unique |
| 16:36 | <Hixie> | in fact it makes it even more muddled |
| 16:36 | <annevk> | it's up to the implementation, JavaScript's JSON has last wins iirc |
| 16:37 | <jgraham> | Gotta love a format specifically designed for interchange where "it's up to the implementation" |
| 16:46 | <gsnedders> | The RFC was to fix editorial issues, not to make any changes to the format (and making something defined would be a change). |
| 16:46 | <Hixie> | it did change the format |
| 16:46 | <Hixie> | quite radically, actually, from the first RFC |
| 16:50 | <annevk> | It aligned with ES5 after I and others asked them to do that |
| 16:50 | <Hixie> | ES6 just defers to 404 |
| 16:50 | <Hixie> | which isn't as well-defined as the RFC |
| 16:51 | <Hixie> | (e.g. it doesn't define the root value, as we were discussing last night) |
| 16:51 | <annevk> | Yeah, Ecma 404 is what ES5 has |
| 16:51 | <annevk> | Anything can be root |
| 16:51 | <Hixie> | it doesn't say that |
| 16:51 | <Hixie> | it actually literally doesn't define the format in the most basic sense |
| 16:51 | <Hixie> | as far as i can tell |
| 16:53 | <annevk> | Hixie: it says that JSON text is a sequence of code points that conforms to the grammar |
| 16:53 | <Hixie> | right |
| 16:53 | <Hixie> | and it doesn't give "the grammar" |
| 16:54 | <annevk> | "JSON text is a sequence of tokens formed from Unicode code points that conforms to the JSON value grammar" |
| 16:54 | <Hixie> | oh, it says "the JSON Value grammar" |
| 16:54 | <Hixie> | interesting |
| 16:54 | <annevk> | seems clear enough |
| 16:54 | <Hixie> | how did i miss that like 15 times |
| 16:54 | <Hixie> | weird |
| 16:56 | <jgraham> | Possibly because it has some weirdness about "Conforming JSON text" vs "JSON Text" |
| 16:56 | <jgraham> | I actually can't tell if they are supposed to be different |
| 16:57 | <jgraham> | It looks like maybe "JSON Text" is a superset of "Conforming JSON Text" |
| 16:57 | <jgraham> | But "Conforming JSON Text" has to "strictly" match "the JSON grammar", which is undefined |
| 16:58 | <jgraham> | ("JSON Text" merely has to "conform to" (not "strictly") "The JSON Value grammar") |
| 16:59 | <jgraham> | (but it's hard to tell if "conforming" vs "strictly conforming" is supposed to be a substantive difference) |
| 17:18 | <annevk> | It isn't really hard to tell, but you could file some bugs for improvement |
| 17:18 | <annevk> | TabAtkins: you around? |
| 17:19 | <jgraham> | It's hard to tell in the sense that from the ECMA spec I guenuinely don't know |
| 17:20 | <jgraham> | *genuinely |
| 17:28 | <gsnedders> | The aim of ECMA 404 was to define grammar, not semantics. Which is odd. |
| 17:29 | <annevk> | That's always been the goal of JSON though |
| 17:29 | <jgraham> | Not really |
| 17:29 | <jgraham> | I mean |
| 17:29 | <annevk> | Implementations do things like rounding on the numbers and such too, which isn't really forbidden either |
| 17:30 | <annevk> | It was the goal of its creator, unless he changed his mind on the goal midway through |
| 17:30 | <annevk> | For a while he didn't even want to define the alphabet in use, until we told him that was a bad idea |
| 17:31 | <jgraham> | it clearly does define some semantics |
| 17:31 | <jgraham> | It more or less defines how the numbers work |
| 17:31 | <zcorpan> | in my json parser, [] is an elephant |
| 17:31 | <zcorpan> | a real one |
| 17:32 | <jgraham> | You would be hard pushed to argue that 10e17 in JSON could be interpreted as 27 or something |
| 17:32 | <jgraham> | Although it doesn't define what + or - means |
| 17:32 | <annevk> | jgraham: sure, but if you don't use decimal storage, are you non-conforming? |
| 17:33 | <jgraham> | Basically istm that Crockford isn't to be trusted with this kind of thing and that JSON has succeeded in spite of him rather than because of him |
| 17:33 | <jgraham> | So saying "well the creator wanted X" doesn't seem like a great argument |
| 17:34 | <annevk> | I was talking about goals |
| 17:34 | <annevk> | In any event, this doesn't seem like a great use of my time |
| 17:35 | <jgraham> | I highly doubt it was his goal to create a format that couldn't actually be used for interchange reliably |
| 17:35 | <jgraham> | and if it was his goal seems like one that no one should share |
| 17:35 | <jgraham> | So either way it seems quite irrelevant |
| 17:37 | <annevk> | I think it actually makes sense. It breaks down a bit with generic parsers. But lots of things will be decided at the application layer anyway. |
| 17:38 | <Hixie> | json succeeded for the same reason xml succeeded (and did better than xml because it is a simpler format than xml) |
| 17:38 | <Hixie> | the reason is, people have an irrational fear of defining custom core syntaxes |
| 17:39 | <jgraham> | No |
| 17:39 | <Hixie> | people think that if you define a vocabulary on top of a core syntax, it's better than defining a vocabulary and a core syntax together |
| 17:39 | <Hixie> | which confuses me greatly, especially when the formats they use don't really fit the problem space |
| 17:39 | <jgraham> | It's because having a simple to work with format that has prewritten, predebugged parsers in a range of languages is a huge win over custom-everything |
| 17:40 | <jgraham> | It means that you don't have to keep learning people's half-baked formats |
| 17:40 | <jgraham> | And makes interop simpler |
| 17:40 | <Hixie> | yeah instead you have to write custom vocabulary interpreters for half-baked vocabularies that you keep having to learn |
| 17:41 | <Hixie> | and interop fails because neither the syntax nor the vocabulary define error handling |
| 17:41 | <jgraham> | Which is a much easier problem, it turns out, and one that you would have to solve anyway |
| 17:41 | <Hixie> | how is it easier? it's the same. |
| 17:41 | <Hixie> | you just change your lexical space from unicode characters to different tokens |
| 17:42 | <jgraham> | Not at all. If I want to interop with, say, the github API I just have to use an off-the-shelf json lib that I have used hundreds of times before and write some simple code to extract the data I care about |
| 17:43 | <jgraham> | If they had invented GitHub-ON for the purpose I would have to either write a parser or learn their library that I had never used before |
| 17:43 | <Hixie> | that fails in the same way that people using "simple code to extract the data" they care about from HTML fails |
| 17:43 | <jgraham> | and still write the code to extract the data from the file |
| 17:43 | <jgraham> | It actually doesn't |
| 17:43 | <jgraham> | That's why the format has been a success |
| 17:44 | <jgraham> | In spite of the fact that it's horribly flawed in several ways |
| 17:44 | <jgraham> | and the people speccing it have managed to make a complete clusterfuck of something that could have been rather straightforward |
| 17:45 | <zcorpan> | i saw somewhere someone was working on a "JSON5" which supported more things like comments and unquoted keys |
| 17:45 | <zcorpan> | and trailing commas |
| 18:18 | <SamB> | ... personally I think JSON+C is exactly the right thing. Except that stupid UTF-16 stuff. |
| 18:18 | <SamB> | (But that's what you get for basing it on JS syntax ...) |
| 18:18 | <TabAtkins> | annevk: I'm around now. |
| 18:19 | <annevk> | TabAtkins: any interest in tackling my Selectors questions? |
| 18:19 | <TabAtkins> | Point me to them? |
| 18:19 | <annevk> | TabAtkins: emailed www-style |
| 18:19 | <TabAtkins> | Ah, kk. I'll respond. |
| 18:19 | TabAtkins | hasn't checked his email yet this morning. |
| 18:19 | <annevk> | Basically wondering if I'm invoking the correct hooks and what hooks to use for matches() |
| 18:20 | <SamB> | Hixie: I think it drastically reduces the number of sharp edge cases that need to be dealt with, or at least localizes them a lot better ... |
| 18:21 | <Hixie> | that's jgraham's position too, i think |
| 18:21 | <Hixie> | i think it just hides them more |
| 18:21 | <Hixie> | which makes them less likely to be handled |
| 18:22 | <SamB> | replicating someone else's buggy parser in another language is not most people's idea of fun |
| 18:22 | <Hixie> | but replacting someone else's buggy vocabulary interpreter in another language is? |
| 18:22 | <Hixie> | replicating |
| 18:23 | <SamB> | well, many programs don't need to understand the whole vocabulary |
| 18:23 | <Hixie> | that's the same logic that leads to people writing parsers that don't need to handle the whole syntax |
| 18:23 | <SamB> | that's often not possible |
| 18:24 | <SamB> | what I mean is that if you are handed a data structure, you don't have to *look* in every nook and cranny; you only need to look in the places relevant to the task at hand |
| 18:24 | <Hixie> | assuming those places exist. and are the right type. and aren't out of range. and... |
| 18:25 | <SamB> | okay, yes, true |
| 18:26 | <SamB> | but given that many of these people aren't going to be doing proper error checking ANYWAY ... |
| 18:26 | <Hixie> | in other news, i've just realised that in json, numbers are special in that they're the one token whose end is determined by look-ahead. |
| 18:26 | <Hixie> | how annoying. |
| 18:27 | <SamB> | what, no lexer? |
| 18:27 | <Hixie> | ? |
| 18:28 | <SamB> | ... why is this a problem? Are you writing the lexer by hand? |
| 18:28 | <annevk> | Hixie: in what environment are you implementing your own JSON parser? |
| 18:28 | <Hixie> | SamB: yeah |
| 18:29 | <SamB> | do you not have a *lex you could use? |
| 18:29 | <Hixie> | annevk: freepascal. there's lots of existing ones, i just figured it would be fun. |
| 18:29 | <annevk> | Hixie: I see |
| 18:29 | <annevk> | Hixie: are you adding comment support? :-) |
| 18:29 | <SamB> | Hixie: please tell me you're actually implementing JSON+C, yes |
| 18:30 | <Hixie> | SamB: i could use a lexer. i happen to choose not to this time. :-) |
| 18:30 | <Hixie> | SamB: i'm implementing whatever is needed to parse the tokeniser tests in html5lib's test suite :-) |
| 18:31 | <annevk> | Oh my |
| 18:31 | <SamB> | so why is it that you're using Object Pascal? |
| 18:31 | <gsnedders> | We do touch a fair few bits of edge-cases. :) |
| 18:31 | <Hixie> | gsnedders: hehe |
| 18:31 | <annevk> | This new version of Anolis is going to be built on primitives you implemented yourself Hixie? :-P |
| 18:31 | <Hixie> | SamB: is best language. |
| 18:31 | <gsnedders> | :) |
| 18:31 | <Hixie> | annevk: yep :-) |
| 18:32 | <Hixie> | including my own utf-8 decoder :-) |
| 18:32 | <annevk> | Hixie: please make it somewhat clean this time so we can see the source code |
| 18:33 | <Hixie> | hah |
| 18:33 | <Hixie> | i make no promises |
| 18:33 | SamB | wonders why Object Pascal is so little heard of |
| 18:33 | <Hixie> | SamB: it was pretty popular on windows for a while (under the name Delphi) |
| 18:33 | <SamB> | possibly it has had too many names and too few implementations? |
| 18:34 | <Hixie> | but yeah, i dunno why it's not more popular |
| 18:34 | <SamB> | Hixie: yes, I know, and it's still used there |
| 18:37 | <Hixie> | hahaha, json's silly surrogate escape thing triggered my utf-8 system's "surrogates aren't allowed" assertion |
| 18:38 | <gsnedders> | :) |
| 18:38 | <SamB> | Hixie: what did you do to cause that? |
| 18:38 | <gsnedders> | Yup, we have lone surrogates in the html5lib tokenizer JSON. |
| 18:38 | <gsnedders> | They're perfectly allowed in JSON :) |
| 18:38 | <Hixie> | uh |
| 18:38 | <SamB> | I mean why is this getting into UTF-8 |
| 18:38 | <SamB> | gsnedders: eww |
| 18:38 | <Hixie> | SamB: i use utf-8 as my internal representation |
| 18:39 | <Hixie> | gsnedders: huh |
| 18:39 | <gsnedders> | SamB: We need to test lone surrogates are handled correctly! |
| 18:39 | <gsnedders> | Hixie: huh at wha? |
| 18:39 | SamB | goes to read the spec ... |
| 18:39 | <Hixie> | gsnedders: how do you get lone surrogates out of the html parser? |
| 18:39 | <Hixie> | SamB: the json spec is pretty messed up when it comes to surrogates |
| 18:39 | <gsnedders> | Hixie: Out of it? We don't. But we have them in the input stream. |
| 18:39 | <Hixie> | gsnedders: ahhh... |
| 18:39 | <Hixie> | interesting |
| 18:40 | <Hixie> | well, my input stream can't support lone surrogates |
| 18:40 | <Hixie> | so i'm probably ok just skipping those tests |
| 18:40 | <SamB> | is there a reason why JSON is ECMA 404? |
| 18:40 | <Hixie> | i guess i'll turn lone surrogates into FFFD |
| 18:40 | <Hixie> | (in the json parser) |
| 18:42 | SamB | wants a font where U+FFFD is represented by logo-encoding.svg -- colors and all! |
| 18:42 | <Hixie> | hm well that makes unicode escapes into another thing that needs lookahead |
| 18:47 | <TabAtkins> | Hixie: Why are you using utf-8 as the internal representation? That's an encoding, it's weird to use that internally. Just use arrays of codepoints. |
| 18:48 | <dglazkov> | what should this show? http://jsbin.com/bubot/1/edit |
| 18:48 | <gsnedders> | TabAtkins: Why would you use arrays of codepoints? That's massively wasteful, esp. if it's mostly ASCII. |
| 18:48 | <TabAtkins> | dglazkov: What do you *think* it should show? |
| 18:48 | <TabAtkins> | gsnedders: Because it's simpler? Or use a unicode string, if your language provides that. |
| 18:49 | <dglazkov> | ARIAL in arial, INITIAL in times new roman or whatever UA's initial value is? |
| 18:49 | <TabAtkins> | Ah, I missed that the initial value is generally a serif font. |
| 18:51 | <TabAtkins> | http://jsbin.com/zugojoxa/1/edit?html,output |
| 18:51 | <TabAtkins> | This shows the problem a little more clearly - the two "INITIAL"s should be the same font. |
| 18:51 | <TabAtkins> | annevk: Just to make sure - these are hooks you need for .query() and .matches()? |
| 18:52 | <dglazkov> | TabAtkins: I wonder what mozilla/ie do here? |
| 18:53 | <TabAtkins> | I'm on ChromeOS, so I can't tell. |
| 18:53 | <dglazkov> | me too :-\ |
| 18:55 | <dglazkov> | TabAtkins: but this is a bug, right? |
| 18:55 | <TabAtkins> | Yes. |
| 18:56 | <SamB> | I'm confused: https://tools.ietf.org/rfcdiff?difftype=--hwdiff&url1=rfc7158&url2=rfc7159 |
| 18:56 | <Hixie> | TabAtkins: because arrays of codepoints take 8 bytes per character and require that the entire input be copied, rather than the input taking 1 byte per character and the data not needing to be copied? |
| 18:57 | <SamB> | it doesn't seem like there were any changes other than the change in RFC number and in the "Obsoletes:" line ... |
| 18:57 | <TabAtkins> | Then you have to accept encoding limitations, like the fact that you can't encode a lone surrogate in valid utf-8. |
| 18:57 | <TabAtkins> | SamB: Yeah, looks like it. |
| 18:57 | <Hixie> | yup |
| 18:57 | <Hixie> | i am very happy to accept that limitation :-) |
| 18:57 | <SamB> | and the year |
| 18:58 | <dglazkov> | TabAtkins: gecko gets it right |
| 18:59 | <TabAtkins> | It's probably something to do with our bizarre parsing of 'font'. |
| 18:59 | <TabAtkins> | Well, hm, never mind, that still doesn't make sense. |
| 19:00 | <Hixie> | woot, my json parser found a bug in my test rather than the other way around |
| 19:00 | <dglazkov> | TabAtkins: nah. I found this by code inspection. We just don't do anything sensible there. And I wondered if this was intentional |
| 19:00 | <TabAtkins> | (I was wondering if it was parsing as a font named "inherit", but that wouldn't help - it would just do fallback, which should produce the default font.) |
| 19:01 | <dglazkov> | TabAtkins: we get right to the point where we need to apply property "initial", and then we just go "weeee" and leave |
| 19:01 | <TabAtkins> | Fun. |
| 19:02 | <SamB> | ouch! http://timelessrepo.com/json-isnt-a-javascript-subset |
| 19:09 | <SamB> | oh fun, if I pass difftype=--help to the rfcdiff page, it outputs plaintext as HTML ... |
| 19:09 | <SamB> | I mean as text/html |
| 19:12 | <SamB> | oh, and otherwise it produces what looks like it's intended to be XHTML labeled as text/html ... |
| 19:22 | <annevk> | TabAtkins: I need hooks for querySelector, query, and matches |
| 19:22 | <annevk> | TabAtkins: querySelector and matches both take an absolute selector afaict |
| 19:22 | <annevk> | TabAtkins: query takes a relative |
| 19:22 | <TabAtkins> | querySelector is an absolute scope-filtered, possibly with a reference set. |
| 19:23 | <TabAtkins> | query is relative, definitely with a reference set. |
| 19:23 | <TabAtkins> | matches is absolute, definitely with a reference set. |
| 19:24 | <TabAtkins> | (The only effect of having a reference set is giving meaning to :scope.) |
| 19:25 | <annevk> | Oh, I thought scoping root was for that |
| 19:25 | <TabAtkins> | Nope, that's only if you're scoping. |
| 19:25 | <TabAtkins> | Sorry for the confusing wording, but :scope got named before anything else. |
| 19:26 | <TabAtkins> | And the scoping root is the default reference set, if you don't specify anything else. |
| 19:26 | <TabAtkins> | You generally don't want to scope. querySelector() does, but really only because it didn't have relative selectors at the time. |
| 19:26 | <TabAtkins> | <style scoped> is the only other thing that uses scoping. |
| 19:26 | TabAtkins | is off to lunch for a bit, will answer any further questions in an hour or so. |
| 19:27 | <annevk> | Oh okay. So querySelector using a scoping root is fine. |
| 19:27 | <annevk> | However, matches should use a reference set so no scoping is done |
| 19:28 | <SamB> | relative selectors? |
| 19:30 | <SamB> | hmm, well, ECMA 404 doesn't say you can have unpaired surrogates in your JSON |
| 19:30 | <annevk> | Okay, I should look at this again tomorrow, thanks for the pointers so far TabAtkins |
| 19:31 | <annevk> | TabAtkins: I do find it a bit odd that you have API hooks for selectors separate from the general selector matching (where is the algorithm for that? and why is it not linked from the API hooks section?) |
| 19:33 | SamB | wonders what Haskell does if you have surrogates in your Strings masquerading as Chars |
| 19:36 | <gsnedders> | Is there any sane way to find a font that contains a given Unicode codepoint on OS X? |
| 19:36 | <gsnedders> | Like, I blatantly have one as it manages to font-switch in places for it. |
| 19:36 | <gsnedders> | But I can't tell what font it comes from |
| 20:23 | <gsnedders> | SamB: It'll allow them, because Char is just an integral type, of range 0–0x10FFFF |
| 20:33 | <Hixie> | yeah that's basically what i did too, except that i have an operator overload for assignment that checks for surrogates :-) |
| 20:33 | <Hixie> | (and i allow -1 to mean eof) |
| 20:33 | <TabAtkins> | SamB: http://dev.w3.org/csswg/selectors/#relative |
| 20:34 | <gsnedders> | You need a type system with dependent types to allow only matched surrogates |
| 20:34 | <Hixie> | uh no, i want no surrogates :-) |
| 20:35 | <gsnedders> | That's easy :P |
| 20:39 | <Hixie> | annevk: is there anything in particular i need to do on https://www.w3.org/Bugs/Public/show_bug.cgi?id=24810 or did you reassign to me just so i could look it over? (it looks good) |
| 20:39 | <annevk> | Hixie: I assigned it to you so you could remove the bits in HTML |
| 20:40 | <annevk> | Hixie: e.g. scripting environment is no longer a thing DOM has now or needs HTML to define so you can remove that |
| 20:40 | <annevk> | Hixie: and under microtask checkpoint there's a bit of cleanup you can do |
| 20:40 | <Hixie> | aaah right |
| 20:40 | <Hixie> | cool |
| 20:40 | <Hixie> | thanks |
| 21:22 | <SamB> | gsnedders: Char is not *quite* an integral type, but yeah, I guess it has to allow them given the way Enum works ... |
| 21:22 | <SamB> | and Bounded |
| 21:26 | <gsnedders> | SamB: Okay, it's not an integral type, but it has a 1:1 mapping to one |
| 21:28 | <SamB> | back to JSON, RFC 715[89] also doesn't permit unpaired surrogates, but in section 8.2 warns that they have been seen in the wild and that the behaviour of software encountering them is unpredictable |
| 21:28 | <SamB> | https://tools.ietf.org/html/rfc7159#section-8.2 |
| 21:29 | <gsnedders> | Oooh! That's a change! |
| 22:12 | <jgraham> | Hmm, that section says that it does permit unpaired surrogates |
| 22:13 | <jgraham> | It just says that the behaviour of unpaired surrogates is undefined |
| 22:14 | <zewt> | i'd sooner have it defined as outputting FFFD than being a parse error |
| 22:15 | <jgraham> | The whole thing is pretty woeful |
| 22:15 | <zewt> | so if some API client inputs a string from a user in a UTF-16 env, and the user pastes in a string with a broken surrogate, it doesn't become a server-side error later |
| 22:15 | <jgraham> | You are kind of expected to guess how string escapes work |
| 22:16 | <zewt> | eg. i'd rather there be no possible invalid user inputs as a string, even if it means some (invalid surrogates) not round-tripping through some paths |
| 22:16 | <jgraham> | zewt: AFAICT the spec simply fails to define how strings are interpreted at all |
| 22:17 | <zewt> | sure, i'm just saying how i'd prefer it |
| 22:17 | <jgraham> | I agree that fatal errors aren't great |
| 22:17 | <zewt> | in practice that may be unlikely: most JSON encoders I've used just output UTF-8 for everything and never use \u escapes anyway |
| 22:18 | <zewt> | (or in the case of JS, outputs UTF-16 code units that get encoded to UTF-8 later) |
| 22:31 | <SamB> | gsnedders: well, I mean, the ABNF doesn't rule them out, but there are no semantics given for them in the prose |
| 22:35 | <SamB> | what was that April 1 RFC with the, er, disillusioned definitions for keywords? |
| 22:37 | <gsnedders> | jgraham: What do you want to do with serializer tests for html5lib, BTW? |
| 22:38 | <gsnedders> | jgraham: Given they depend upon so many serialization options, and there are infinitely many valid serializations… |
| 23:04 | <jgraham> | gsnedders: I don't think I have a strong opinion. It makes sense to have *some* tests for html5lib itself |
| 23:04 | <jgraham> | I'm not sure that they are worth sharing with other projects though |
| 23:10 | <SimonSapin> | SamB: 6919 |
| 23:11 | <SimonSapin> | jgraham: could you test round-tripping rather than exact serializations? |
| 23:15 | <gsnedders> | SimonSapin: Yes, though obviously some tests must test exact serializations |
| 23:15 | <Hixie> | does anyone here know anything about MSE? |
| 23:15 | <Hixie> | (https://dvcs.w3.org/hg/html-media/raw-file/tip/media-source/media-source.html) |
| 23:16 | <gsnedders> | SimonSapin: Also note not everything round-trips |
| 23:16 | <SimonSapin> | gsnedders: why, and why? |
| 23:17 | <gsnedders> | SimonSapin: e.g., We need to make sure serialization of attribute values doesn't expose XSS bugs in old IE |
| 23:17 | <gsnedders> | SimonSapin: and e.g., (XML) <table><tr><td>foo |
| 23:17 | <gsnedders> | SimonSapin: (a tree with no tbody) |
| 23:18 | <SimonSapin> | gsnedders: but parse+serialize+parse should still be the same as parse, right? |
| 23:19 | <gsnedders> | SimonSapin: No. Well, there is obviously *a* serialization, but some parse errors create odd trees. |
| 23:19 | <gsnedders> | SimonSapin: Like foster parenting can cause odd things |
| 23:19 | <SimonSapin> | isn’t serialize+parse idempotent? |
| 23:20 | <SimonSapin> | (*not* parse+serialize) |
| 23:20 | <gsnedders> | SimonSapin: Given parse+serialize+parse, "<p><table><p>" is hard to handle. |
| 23:20 | <gsnedders> | SimonSapin: Because it's serialization isn't at all obvious from the tree it produces. |
| 23:20 | <gsnedders> | *its |
| 23:20 | <SimonSapin> | I don’t see the problem |
| 23:21 | <gsnedders> | At the moment we only try to serialize trees that a conforming input can create. i.e., not ones like that |
| 23:21 | <SamB> | hmm, you would certainly think that given a just-parsed document, serialize|parse would be idempotent |
| 23:21 | <SimonSapin> | SamB: yes, that’s what I mean |
| 23:21 | <SimonSapin> | is that not the case for HTML? |
| 23:22 | <gsnedders> | It is. |
| 23:22 | <gsnedders> | It's just the serialize case gets exceptionally hard if you want to make it complete. |
| 23:22 | <gsnedders> | Like, for <p><table><p> you have to go from an XML infoset like <p><p/><table/></p> to having the second p appear within the table. |
| 23:23 | <gsnedders> | Because you can't serialize that tree as <p><p><table></table></p></p> in HTML. |
| 23:23 | <SimonSapin> | gsnedders: do you mean that only testing idempotency is not that useful because you can make a serializer that’s idempotent but wrong? (e.g. always return the empty string) |
| 23:23 | <SamB> | SimonSapin: that wouldn't make serialize|parse idempotent |
| 23:23 | <gsnedders> | SimonSapin: No, I mean it's impractically hard to do. |
| 23:24 | <gsnedders> | SimonSapin: Because a serializer that's idempotent is really complex to handle cases like (XML) <p><p/><table/></p> |
| 23:25 | <gsnedders> | SimonSapin: I played about with taking all parser tests and checking serializerparser |
| 23:25 | <SamB> | so what was the goal again? |
| 23:25 | <gsnedders> | The goal is to improve the current testing situation of html5lib's serializer. Which currently relies on shared tests in html5lib-tests dependent upon serialization choice. |
| 23:33 | <gsnedders> | https://gist.github.com/gsnedders/9653913 is what currently fails to roundtrip with html5lib. Some (e.g., the script stuff) are obviously bugs. |
| 23:34 | <gsnedders> | <p><b><i><u></p>\n<p>X is gonna be very hard to serialize correctly |
| 23:35 | <Hixie> | why? |
| 23:37 | <gsnedders> | Well, "<p><b><i><u></u></i></b></p><b><i><u>\n<p>X</u></i></b>" doesn't correspond to the same thing, despite being the obvious serialization of the tree |
| 23:38 | <gsnedders> | Or rather, the logic for when you omit the p closing tag is complicated. |
| 23:38 | <Hixie> | oh |
| 23:38 | <gsnedders> | Okay, maybe that isn't as bad. |
| 23:38 | <Hixie> | i would just never omit closing tags |
| 23:38 | <Hixie> | :-D |
| 23:38 | <gsnedders> | I thought that was the weird case where AAA stuff made it horrible. |
| 23:38 | <gsnedders> | idk. |
| 23:39 | <gsnedders> | I don't have time for this. |
| 23:39 | <gsnedders> | :) |
| 23:39 | <gsnedders> | This isn't my dissertation. :) |
| 23:41 | <gsnedders> | Hixie: Though that does mean the informative description of when p end tags can be omitted in the spec is wrong :) |
| 23:42 | <gsnedders> | Or is Writing HTML Documents normative? |
| 23:42 | <gsnedders> | It appears to be normative for documents, authoring tools, and markup generators. |
| 23:42 | <gsnedders> | In which case the normative description is wrong :) |
| 23:52 | <Hixie> | gsnedders: it's not wrong. it just doesn't handle non-conforming cases since those cases are already non-conforming. |
| 23:53 | <Hixie> | gsnedders: if a tool outputs <b><p> then it's bogus regardless of where it closes the </b> |
| 23:53 | <gsnedders> | Bah! |
| 23:53 | <Hixie> | i'm just sayin' |
| 23:54 | <gsnedders> | See, this is what makes serialize|parse idempotency hard! |
| 23:54 | <gsnedders> | Like, sure, yeah, obviously any tree the parser creates *can* be serialized. |
| 23:54 | <Hixie> | should just reguse to serialise anything that's non-conforming |
| 23:54 | <Hixie> | refuse |
| 23:54 | <gsnedders> | To what degree on non-conformity? |
| 23:54 | <gsnedders> | *of |
| 23:54 | <Hixie> | human-checked! |
| 23:54 | <Hixie> | :-D |
| 23:54 | <gsnedders> | :) |
| 23:55 | <gsnedders> | Do we allow unknown elements? Because their parse-model could change! |
| 23:55 | <Hixie> | (or, document your tool as requiring conforming input, and if it's given non-conforming input, say that the output could be garbage.) |
| 23:55 | <gsnedders> | Yeah, that's the sensible approach. :) |
| 23:56 | <gsnedders> | I originally did this just because I wanted to see how hard it'd be to guarantee serializer|parser idempotency given a tree from the parser. Because if it was easy then we could trivially get way more tests. :) |
| 23:57 | <gsnedders> | There's no schema representing all the content model restrictions, is there? |
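
Editor's sketch, relating to the 18:37 to 18:42 exchange about JSON surrogate escapes: the approach Hixie describes (combine a `\uD8xx\uDCxx` escape pair into one astral code point, turn a lone surrogate into U+FFFD, and look ahead after a high surrogate to decide which applies) might look roughly like the Python below. This is not anyone's actual parser. The function names `decode_u_escapes` and `_read_u_escape` are hypothetical, and the sketch assumes it is handed the body of a JSON string in which `\uXXXX` is the only escape form present; it does not handle `\\`, `\n`, and the other escapes.

```python
REPLACEMENT = "\ufffd"  # U+FFFD REPLACEMENT CHARACTER


def _read_u_escape(s, i):
    """Return the 16-bit value of a \\uXXXX escape starting at s[i], or None."""
    if s.startswith("\\u", i) and len(s) >= i + 6:
        digits = s[i + 2:i + 6]
        if all(c in "0123456789abcdefABCDEF" for c in digits):
            return int(digits, 16)
    return None


def decode_u_escapes(s):
    """Decode \\uXXXX escapes in s; all other text is copied through unchanged."""
    out = []
    i = 0
    while i < len(s):
        unit = _read_u_escape(s, i)
        if unit is None:
            # Not a \u escape: copy one character through.
            out.append(s[i])
            i += 1
            continue
        i += 6
        if 0xD800 <= unit <= 0xDBFF:
            # High surrogate: look ahead for a low-surrogate escape.
            low = _read_u_escape(s, i)
            if low is not None and 0xDC00 <= low <= 0xDFFF:
                cp = 0x10000 + ((unit - 0xD800) << 10) + (low - 0xDC00)
                out.append(chr(cp))
                i += 6
            else:
                # Lone high surrogate; the next escape (if any) is
                # left in place and decoded on the next iteration.
                out.append(REPLACEMENT)
        elif 0xDC00 <= unit <= 0xDFFF:
            # Low surrogate with no preceding high surrogate.
            out.append(REPLACEMENT)
        else:
            out.append(chr(unit))
    return "".join(out)
```

The look-ahead is the part Hixie found annoying: after a high surrogate the decoder cannot emit anything until it has inspected the following six characters, and when they turn out not to be a low surrogate they must be left for the next iteration instead of being consumed.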