| 10:11 | <Hixie> | hm |
| 10:11 | <Hixie> | are there really two parse errors for "<!DOCTYPE" but only one for "<!DOCTYPE "? |
| 10:20 | <hsivonen> | Hixie: yes |
| 10:21 | <hsivonen> | Hixie: probably not worth tweaking |
| 10:34 | <hsivonen> | <b><table><td></b><i></table>X |
| 10:34 | <hsivonen> | Why isn't <b> supposed to reopen before X? |
| 11:02 | <Hixie> | isn't it? |
| 11:03 | <Hixie> | oh because the table is in the <b> |
| 11:03 | <Hixie> | and so the X is still in the <b> |
| 11:03 | <Hixie> | the </b> in the above has no effect |
| 11:05 | <virtuelv> | Hixie: How's Bergen? |
| 11:14 | <gsnedders> | http://geoffers.no-ip.com/svn/php-html-5-direct/tests/numbersTest |
| 11:16 | <Hixie> | virtuelv: rainy |
| 11:17 | <Hixie> | gsnedders: does that match the spec or the spec with your proposed changes? |
| 11:17 | <gsnedders> | Hixie: the spec |
| 11:18 | <virtuelv> | Hixie: Norway's been pretty much like that for a couple of weeks now |
| 11:18 | <gsnedders> | Hixie: even when the spec does very odd things (like a list of integers with input "10" outputting [1]) |
| 11:19 | <Hixie> | gsnedders: k |
| 11:19 | <Hixie> | gsnedders: can you include that link in one of your e-mails? (or just mail it directly to me ian⊙hc) I'll try to look at what browsers do with those tests when I update the spec |
| 11:20 | <gsnedders> | Hixie: I'm going to email it shortly |
| 11:20 | <gsnedders> | Hixie: just a few more general issues with the number section, then my review of that is done, and I'll send it off with the final email |
| 11:20 | <hsivonen> | Hixie: ah, I didn't realize the table was in b. I've got a bug then. |
| 11:22 | <virtuelv> | Hixie: re DOMContentLoaded - it'd be useful to have some event when the DOM is loaded and styles are available/applied |
| 11:41 | <hsivonen> | translating the spec to code would be less error-prone if the spec didn't have gotos that create unnatural loops |
| 11:42 | <gsnedders> | hsivonen: heh. I ended up with a do {} while (true); in my implementation of the lists of integers. |
| 11:42 | <gsnedders> | then relying on break and continue statements |
| 11:42 | <hsivonen> | gsnedders: I'm pretty sure do-while is always natural |
| 11:42 | <hsivonen> | (natural in the compiler sense) |
| 11:43 | <gsnedders> | ah. in that sense. |
| 11:43 | <gsnedders> | (of natural) |
| 11:43 | <gsnedders> | PHP likely does something odd with it, though, knowing PHP. |
| 11:44 | <gsnedders> | has anyone apart from zcorpan_ and myself started the spec review, anyway? |
| 11:44 | <hsivonen> | if I had to guess, my guess would be that even PHP created only natural loops for the purpose of compiler optimization |
| 11:44 | <hsivonen> | gsnedders: I'm reviewing the parsing spec as I go |
| 11:45 | <hsivonen> | gsnedders: I don't have much to say about tokenization, but I have posted remarks about tree building |
| 11:47 | <gsnedders> | hsivonen: ah. I just haven't seen that much. |
| 11:48 | <hsivonen> | lost in the flood I guess :-( |
| 11:48 | <gsnedders> | ah, now I see |
| 11:55 | <Hixie> | hsivonen: believe me, the spec doesn't look like what i'd want it to look like if i was doing this from scratch |
| 11:55 | <Hixie> | anyway, time to be a tourist |
| 11:56 | <gsnedders> | Hixie: rarely anything ends up as you'd like it to if you started from scratch :P |
| 12:19 | <hsivonen> | <a><p>X<a>Y</a>Z</p></a> |
| 12:20 | <hsivonen> | Why does the first <a> come off the stack before <p> goes in? |
| 12:20 | <hsivonen> | ooh. does the p get reparented? |
| 12:22 | <hsivonen> | now I'm confused |
| 12:37 | <met_> | http://ajaxian.com/archives/google-gears-roadmap-and-features |
| 12:37 | <hsivonen> | ooh! my code lacks step #10 of the AAA! |
| 12:44 | <Philip`> | gsnedders: In numbersTest: s/dimentions/dimensions/ |
| 12:47 | <gsnedders> | Philip`: fixed |
| 15:15 | <gsnedders> | Jero: you around? |
| 15:15 | <Jero> | yup |
| 15:17 | <gsnedders> | did you start your PHP5 implementation from scratch not knowing that there was a semi-started one before, or some other reason? |
| 15:24 | <gsnedders> | Jero: and I've started on a 1:1 implementation in PHP, which isn't really so relevant in the real world |
| 15:25 | <Jero> | gsnedders: correct, I found out later that there was already an HTML5 parser in PHP |
| 15:25 | <Jero> | gsnedders: but I couldn't access the site (some issues with Trac I believe) |
| 15:26 | <gsnedders> | Jero: it's not so interesting now. a lot of the code written for it is obsolete |
| 15:26 | <gsnedders> | http://php-html5lib.dashslot.net/svn/trunk works, though |
| 15:27 | <Jero> | gsnedders: interesting |
| 15:27 | <Jero> | also, what do you think of my implementation so far? |
| 15:27 | <gsnedders> | I've never had time to really look into it |
| 15:27 | <gsnedders> | (due to school, and now trying to get as much of the spec review done as possible before going away in a week) |
| 15:28 | <gsnedders> | http://geoffers.no-ip.com/svn/php-html-5-direct contains the direct implementation |
| 15:29 | <Jero> | thanks |
| 15:29 | <gsnedders> | it's all very slow, though |
| 15:29 | <Jero> | so is my implementation at the moment :p |
| 15:29 | <gsnedders> | the direct one will be far slower, though |
| 15:30 | <Jero> | yeah, i'm sure |
| 15:30 | <gsnedders> | as the aim is to make absolutely no compromises from the spec |
| 15:30 | <gsnedders> | which in the case of the tokeniser means one character at a time |
| 15:30 | <gsnedders> | *means emitting |
| 15:30 | <Jero> | yeah, that's not a very optimal solution :p |
| 15:31 | <Jero> | but I guess I've only made three or four changes to the entire parsing algorithm compared to the spec |
| 15:33 | <Philip`> | If you want to write a new tokeniser in some language, it could perhaps be helpful to build on my work - that has a direct representation of the spec algorithm, and generates C++ or JS code to execute it, and it ought to be fairly quick to do other languages in the same way |
| 15:35 | <Philip`> | (I need to add some kind of abstraction in the code-generating part - JS was only easy because it's almost entirely identical to C++ except for replacing 'bool' with 'var', and it takes a little bit more effort if you need $s in front of variables) |
| 15:35 | <Philip`> | (but I'll at least try to create a Perl implementation too, to make sure it's sufficiently portable between languages) |
| 15:44 | <gsnedders> | Jero: I may, however, try forking off the direct impl and work on optimising it (as that's far nicer than starting from scratch, as I can just rewrite one method at a time) |
| 15:48 | <Jero> | well, I followed the spec in everything (with three or four exceptions), so that's basically the same as forking off the direct implementation, don't you think? |
| 15:50 | <gsnedders> | Jero: yes |
| 15:51 | <gsnedders> | Jero: it would be interesting to compare the two, though (and optimising it won't take overly long to do) |
| 15:52 | <Jero> | my impl still has a couple of bugs (though most of them are related I think) |
| 15:53 | <Jero> | and I'm a bit behind when it comes to the last 60 or so revisions |
| 15:53 | <gsnedders> | heh. any bugs in the direct impl are either PHP bugs or spec bugs |
| 15:54 | <gsnedders> | and I wouldn't allow any regressions when optimising it |
| 15:55 | <Jero> | gsnedders: you can contribute to the code if you want to in the future |
| 15:55 | <gsnedders> | Jero: I'll probably optimise the tokeniser and then see how the two compare, then decide what to do from there |
| 15:56 | <Jero> | the tokeniser of my implementation you mean? |
| 15:57 | <gsnedders> | the tokeniser of the direct implementation, then compare it to your tokeniser |
| 15:57 | <Jero> | that sounds like a good idea |
| 15:58 | <Jero> | I'll upload the code I have on my PC to the online version of my parser, so you can compare it to the latest and greatest |
| 15:58 | <gsnedders> | heh. it won't be for a while, though |
| 15:58 | <gsnedders> | the tokeniser isn't written in the direct impl yet |
| 15:59 | <Jero> | oh i see :p |
| 15:59 | <gsnedders> | (which I had actually implied earlier) |
| 16:01 | <Jero> | also, don't you think it'd be great to have the HTML5's parsing algorithm being used by the built-in DOMDocument->loadHTML() function in PHP? |
| 16:02 | <Jero> | ATM that function uses the libxml2 HTML parser |
| 16:02 | <gsnedders> | Jero: as if you're ever gonna persuade the PHP devs to implement a draft standard… |
| 16:02 | <Jero> | don't worry, it was just an idea.. |
| 16:03 | <gsnedders> | Jero: it took me many, many, many years to persuade them of a bug in strip_tags(), which they kept writing off as being invalid HTML (as the aim there is to use a basic parser that'll work with valid HTML) despite me citing specific parts of the specification that clearly said otherwise |
| 16:05 | <Jero> | heh |
| 16:06 | <gsnedders> | I bet they didn't have a copy of the SGML spec, and were simply saying what they thought was right. |
| 16:06 | <gsnedders> | (it's actually something that despite being part of the SGML spec is relevant) |
| 16:09 | <Jero> | what was the bug? |
| 16:10 | <gsnedders> | U+003E within quoted attribute values |
| 16:10 | <gsnedders> | it probably breaks if you mix single and double quotes, actually |
| 16:10 | <gsnedders> | e.g., <foo bar="this'> is parsed as a single |foo| element where @bar=this |
| 16:12 | <Jero> | so it closes the value of bar upon seeing the ' character? |
| 16:12 | <gsnedders> | yes |
| 16:12 | <Jero> | that is indeed very weird |
| 16:13 | <Jero> | and what was their argument? |
| 16:14 | <gsnedders> | actually, that does work correctly |
| 16:14 | <gsnedders> | var_dump(strip_tags('<foo bar="this\'>">')); indeed produces string(0) "" |
| 16:14 | <gsnedders> | Jero: for the > bug? that it was invalid HTML. |
| 16:14 | <gsnedders> | Jero: for the latter? I only just thought of it |
| 16:15 | <Jero> | i see |
| 16:15 | <gsnedders> | the former is untrue, as it is completely valid |
| 16:16 | <gsnedders> | [^<&] off the top of my head |
| 16:17 | <Jero> | heh |
| 16:17 | <Jero> | and they still haven't fixed it? |
| 16:17 | <gsnedders> | the former is fixed in 5.2.2, IIRC |
| 16:18 | <gsnedders> | only 5, though |
| 16:18 | <gsnedders> | the same patch would apply against 4.4 fine, but it's unfixed |
| 16:19 | <Jero> | that's stupid |
| 16:20 | <gsnedders> | typical of PHP development, though |
| 16:21 | <Jero> | that's too bad |
| 16:21 | <Philip`> | <foo <bar=<bar> is syntactically valid in HTML5 now - only ["&] (or ['&] or (\s|&)) does anything |
| 16:22 | Philip` | wonders how that will mess up strip_tags |
| 16:22 | <gsnedders> | Jero: http://cvs.php.net/viewvc.cgi/php-src/ext/standard/tests/strings/bug40432.phpt?revision=1.2&view=markup&pathrev=MAIN |
| 16:22 | <Jero> | thanks |
| 16:23 | <gsnedders> | I think I saw it fail in 5.2.3, actually |
| 16:24 | <gsnedders> | Philip`: http://cvs.php.net/viewvc.cgi/php-src/ext/standard/string.c?view=markup — search for php_u_strip_tags |
| 16:24 | <gsnedders> | Philip`: string(0) "" is PHP 5.2.3's output, though |
| 16:29 | <Jero> | gsnedders, i'm off, if you ever need me regarding my HTML5 parser, email me at [censored :)] |
| 16:30 | <gsnedders> | Jero: I'll be around here if you ever want me |
| 16:30 | <Jero> | alrighty, bye |
| 17:26 | <gsnedders> | jgraham: do you really think that those tests would be that hard to get working in another language? the script I use to parse it is in the repos |
| 17:28 | <gsnedders> | jgraham: I didn't want to copy the html5lib test cases format as it would mean I'd need the input data repeated multiple times for each algorithm |
| 17:34 | <Philip`> | gsnedders: It would probably be useful to give more detail on the test format, like how it represents arrays and strings |
| 17:34 | <Philip`> | or just use JSON since that already defines those things and everyone has JSON parsers already :-) |
| 17:35 | <gsnedders> | and have each test as an object with an array of results? |
| 17:39 | <gsnedders> | Philip`: but yeah, the documentation was thrown together very quickly |
| 17:52 | <Philip`> | gsnedders: I was thinking of something like [["Empty string", "", false, false, false, null, "", []], ...], since that's about the same as what you have already but more JSONic, but maybe ["Empty string", "", { "unsigned":false, "signed":false, "real":false, ... }] would be more easily extensible |
| 17:53 | <gsnedders> | Philip`: I was thinking {"":[false,false,false,null,null,[]]} |
| 17:53 | <Philip`> | It'd be nice if JSON allowed you to keep comments |
| 17:54 | <gsnedders> | Philip`: there are only headers for large groups of tests, so I don't feel that much about keeping them |
| 17:56 | <Philip`> | What about XML? <numbertest><!-- Empty string --><input></input><outputs><output algorithm="unsigned"><false/></output><output algorithm="integerlist"><items/></output>... |
| 17:56 | <gsnedders> | that means defining data types and the like |
| 17:56 | <Philip`> | Hmm, maybe the [false,false,...] one is easiest |
| 17:58 | <Philip`> | In any case, it does seem probably easier to use JSON rather than a custom data format when you have arrays and non-ASCII strings, to avoid making every implementor implement another test parser |
| 17:59 | <gsnedders> | that's true |
| 17:59 | <gsnedders> | just lack of comments in JSON is annoying |
| 18:00 | <gsnedders> | around 15 minutes to be completely happy with a JSON version of the test suite… not overly slow… |
| 18:00 | <Philip`> | (JSON is also quite handy when you're running tests in web browsers) |
| 18:01 | <gsnedders> | (It would've been easier if it were possible to get pretty printing of JSON in PHP) |
| 18:01 | <gsnedders> | (as I just hacked my existing parser) |
| 20:14 | <gsnedders> | jgraham: just looking at the PHPUnit compiled version of the tests? |
| 21:18 | <virtuelv_> | is it defined anywhere what the implied DOM should be like when using createHTMLDocument()? |
| 21:19 | <virtuelv_> | (iow: what should the DOM be like given var doc = document.implementation.createHTMLDocument(""); |
| 21:19 | <virtuelv_> | doc.documentElement.innerHTML = "<h1>What</h1>"; |
| 21:19 | <virtuelv_> | alert(doc.documentElement.outerHTML); |
| 21:20 | <virtuelv_> | what should be alerted? |
| 21:21 | <jgraham> | gsnedders: Yeah, for some reason I looked at the PHP version |
| 21:22 | <gsnedders> | jgraham: yeah. that'd be thy impossible to parse. there's now a JSON version of the tests in the repo as well, though |
| 21:22 | <gsnedders> | (but that loses some data, like not distinguishing between ints and floats) |
| 21:24 | <Philip`> | Could you store floats as strings instead of numbers? |
| 21:25 | <gsnedders> | then parse the string? |
| 21:25 | <gsnedders> | hmmm… |
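
The idea floated at the end of the log (store floats as strings in the JSON tests, then parse the string) could look roughly like the sketch below. It is only an illustration: the test data, the position of the "real number" result (`REAL_INDEX`), and the `load_tests` helper are all hypothetical and are not taken from the actual numbersTest files.

```python
import json

# Hypothetical fragment in the {"input": [outputs...]} shape suggested in the log.
# The order of the outputs (unsigned, signed, real) is an assumption.
RAW = '{"": [false, false, false], "10": [10, 10, "10.0"], "-1.5": [false, false, "-1.5"]}'

REAL_INDEX = 2  # assumed position of the "real number" result


def load_tests(raw):
    """Decode the tests, turning string-encoded floats back into floats."""
    decoded = {}
    for input_string, outputs in json.loads(raw).items():
        outputs = list(outputs)
        real = outputs[REAL_INDEX]
        # A float is stored as a string so that a JSON library which merges
        # ints and floats cannot lose the distinction; false stays false.
        if isinstance(real, str):
            outputs[REAL_INDEX] = float(real)
        decoded[input_string] = outputs
    return decoded


tests = load_tests(RAW)
print(tests["10"])  # [10, 10, 10.0]
```

The cost is that every consumer of the test file needs this extra decoding step, which is the trade-off being weighed in the last few lines of the log.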