10:11
<Hixie>
hm
10:11
<Hixie>
are there really two parse errors for "<!DOCTYPE" but only one for "<!DOCTYPE "?
10:20
<hsivonen>
Hixie: yes
10:21
<hsivonen>
Hixie: probably not worth tweaking
10:34
<hsivonen>
<b><table><td></b><i></table>X
10:34
<hsivonen>
Why isn't <b> supposed to reopen before X?
11:02
<Hixie>
isn't it?
11:03
<Hixie>
oh because the table is in the <b>
11:03
<Hixie>
and so the X is still in the <b>
11:03
<Hixie>
the </b> in the above has no effect
11:05
<virtuelv>
Hixie: How's Bergen?
11:14
<gsnedders>
http://geoffers.no-ip.com/svn/php-html-5-direct/tests/numbersTest
11:16
<Hixie>
virtuelv: rainy
11:17
<Hixie>
gsnedders: does that match the spec or the spec with your proposed changes?
11:17
<gsnedders>
Hixie: the spec
11:18
<virtuelv>
Hixie: Norway's been pretty much like that for a couple of weeks now
11:18
<gsnedders>
Hixie: even when the spec does very odd things (like a list of integers with input "10" outputting [1])
11:19
<Hixie>
gsnedders: k
11:19
<Hixie>
gsnedders: can you include that link in one of your e-mails? (or just mail it directly to me ian⊙hc) I'll try to look at what browsers do with those tests when I update the spec
11:20
<gsnedders>
Hixie: I'm going to email it shortly
11:20
<gsnedders>
Hixie: just a few more general issues with the number section, then my review of that is done, and I'll send it off with the final email
11:20
<hsivonen>
Hixie: ah, I didn't realize the table was in b. I've got a bug then.
11:22
<virtuelv>
Hixie: re DOMContentLoaded - it'd be useful to have some event when the DOM is loaded and styles are available/applied
11:41
<hsivonen>
translating the spec to code would be less error-prone if the spec didn't have gotos that create unnatural loops
11:42
<gsnedders>
hsivonen: heh. I ended up with a do {} while (true); in my implementation of the lists of integers.
11:42
<gsnedders>
then relying on break and continue statements
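A minimal sketch of the loop shape gsnedders describes, written in Python rather than PHP: one open-ended loop in which the spec-style "go back to step N" becomes `continue` and "return" becomes `break`. The grammar used here (integers separated by commas or whitespace) and the name `parse_integer_list` are assumptions for illustration; this is not the spec's list-of-integers algorithm.

```python
DIGITS = "0123456789"


def parse_integer_list(text):
    """Parse integers separated by commas/whitespace using one open-ended loop.

    Simplified illustration only; this is NOT the spec's algorithm.
    """
    numbers = []
    position = 0
    while True:  # stands in for PHP's do {} while (true);
        # Step 1: skip separators; running out of input is the only exit.
        while position < len(text) and text[position] in ", \t\n":
            position += 1
        if position >= len(text):
            break  # the "return the list" step
        # Step 2: collect an optional minus sign and a run of digits.
        start = position
        if text[position] == "-":
            position += 1
        while position < len(text) and text[position] in DIGITS:
            position += 1
        token = text[start:position]
        if token in ("", "-"):
            # Not a number: drop one character and "go back to step 1".
            position = start + 1
            continue
        numbers.append(int(token))
    return numbers
```

Because the loop has a single entry point at the top, every jump back to it is a plain back edge, which is the "natural" shape hsivonen refers to.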
11:42
<hsivonen>
gsnedders: I'm pretty sure do-while is always natural
11:42
<hsivonen>
(natural in the compiler sense)
11:43
<gsnedders>
ah. in that sense.
11:43
<gsnedders>
(of natural)
11:43
<gsnedders>
PHP likely does something odd with it, though, knowing PHP.
11:44
<gsnedders>
has anyone apart from zcorpan_ and myself started the spec review, anyway?
11:44
<hsivonen>
if I had to guess, my guess would be that even PHP creates only natural loops for the purpose of compiler optimization
11:44
<hsivonen>
gsnedders: I'm reviewing the parsing spec as I go
11:45
<hsivonen>
gsnedders: I don't have much to say about tokenization, but I have posted remarks about tree building
11:47
<gsnedders>
hsivonen: ah. I just haven't seen that much.
11:48
<hsivonen>
lost in the flood I guess :-(
11:48
<gsnedders>
ah, now I see
11:55
<Hixie>
hsivonen: believe me, the spec doesn't look like what i'd want it to look like if i was doing this from scratch
11:55
<Hixie>
anyway, time to be a tourist
11:56
<gsnedders>
Hixie: rarely anything ends up as you'd like it to if you started from scratch :P
12:19
<hsivonen>
<a><p>X<a>Y</a>Z</p></a>
12:20
<hsivonen>
Why does the first <a> come off the stack before <p> goes in?
12:20
<hsivonen>
ooh. does the p get reparented?
12:22
<hsivonen>
now I'm confused
12:37
<met_>
http://ajaxian.com/archives/google-gears-roadmap-and-features
12:37
<hsivonen>
ooh! my code lacks step #10 of the AAA!
12:44
<Philip`>
gsnedders: In numbersTest: s/dimentions/dimensions/
12:47
<gsnedders>
Philip`: fixed
15:15
<gsnedders>
Jero: you around?
15:15
<Jero>
yup
15:17
<gsnedders>
did you start your PHP5 implementation from scratch not knowing that there was a semi-started one before, or some other reason?
15:24
<gsnedders>
Jero: and I've started on a 1:1 implementation in PHP, which isn't really so relevant in the real world
15:25
<Jero>
gsnedders: correct, I found out later that there was already an HTML5 parser in PHP
15:25
<Jero>
gsnedders: but I couldn't access the site (some issues with Trac I believe)
15:26
<gsnedders>
Jero: it's not so interesting now. a lot of the code written for it is obsolete
15:26
<gsnedders>
http://php-html5lib.dashslot.net/svn/trunk works, though
15:27
<Jero>
gsnedders: interesting
15:27
<Jero>
also, what do you think of my implementation so far?
15:27
<gsnedders>
I've never had time to really look into it
15:27
<gsnedders>
(due to school, and now trying to get as much of the spec review done as possible before going away in a week)
15:28
<gsnedders>
http://geoffers.no-ip.com/svn/php-html-5-direct contains the direct implementation
15:29
<Jero>
thanks
15:29
<gsnedders>
it's all very slow, though
15:29
<Jero>
so is my implementation at the moment :p
15:29
<gsnedders>
the direct one will be far slower, though
15:30
<Jero>
yeah, i'm sure
15:30
<gsnedders>
as the aim is to make absolutely no compromises from the spec
15:30
<gsnedders>
which in the case of the tokeniser means one character at a time
15:30
<gsnedders>
*means emitting
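A rough sketch of the trade-off being discussed, in Python: a literal tokeniser emits one character token per input character, and a separate pass can merge adjacent character tokens without changing what the tokeniser itself does. The token shapes `("Character", c)` and `("Characters", text)` and both function names are hypothetical, not taken from either PHP implementation.

```python
def emit_characters(text):
    """Emit one ("Character", c) token per input character."""
    for c in text:
        yield ("Character", c)


def coalesce(tokens):
    """Merge adjacent character tokens; pass every other token through."""
    pending = []
    for kind, data in tokens:
        if kind == "Character":
            pending.append(data)
            continue
        if pending:
            # Flush the buffered run before the non-character token.
            yield ("Characters", "".join(pending))
            pending = []
        yield (kind, data)
    if pending:
        yield ("Characters", "".join(pending))
```

For example, `list(coalesce(emit_characters("abc")))` yields a single `("Characters", "abc")` token instead of three separate ones.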
15:30
<Jero>
yeah, that's not a very optimal solution :p
15:31
<Jero>
but I guess I've only made three or four changes to the entire parsing algorithm compared to the spec
15:33
<Philip`>
If you want to write a new tokeniser in some language, it could perhaps be helpful to build on my work - that has a direct representation of the spec algorithm, and generates C++ or JS code to execute it, and it ought to be fairly quick to do other languages in the same way
15:35
<Philip`>
(I need to add some kind of abstraction in the code-generating part - JS was only easy because it's almost entirely identical to C++ except for replacing 'bool' with 'var', and it takes a little bit more effort if you need $s in front of variables)
15:35
<Philip`>
(but I'll at least try to create a Perl implementation too, to make sure it's sufficiently portable between languages)
15:44
<gsnedders>
Jero: I may, however, try forking off the direct impl and work on optimising it (as that's far nicer than starting from scratch, as I can just rewrite one method at a time)
15:48
<Jero>
well, I followed the spec in everything (with three or four exceptions), so that's basically the same as forking off the direct implementation, don't you think?
15:50
<gsnedders>
Jero: yes
15:51
<gsnedders>
Jero: it would be interesting to compare the two, though (and optimising it won't take overly long to do)
15:52
<Jero>
my impl still has a couple of bugs (though most of them are related I think)
15:53
<Jero>
and I'm a bit behind when it comes to the last 60 or so revisions
15:53
<gsnedders>
heh. any bugs in the direct impl are either PHP bugs or spec bugs
15:54
<gsnedders>
and I wouldn't allow any regressions when optimising it
15:55
<Jero>
gsnedders: you can contribute to the code if you want to in the future
15:55
<gsnedders>
Jero: I'll probably optimise the tokeniser and then see how the two compare, then decide what to do from there
15:56
<Jero>
the tokeniser of my implementation you mean?
15:57
<gsnedders>
the tokeniser of the direct implementation, then compare it to your tokeniser
15:57
<Jero>
that sounds like a good idea
15:58
<Jero>
I'll upload the code I have on my PC to the online version of my parser, so you can compare it to the latest and greatest
15:58
<gsnedders>
heh. it won't be for a while, though
15:58
<gsnedders>
the tokeniser isn't written in the direct impl yet
15:59
<Jero>
oh i see :p
15:59
<gsnedders>
(which I had actually implied earlier)
16:01
<Jero>
also, don't you think it'd be great to have the HTML5's parsing algorithm being used by the built-in DOMDocument->loadHTML() function in PHP?
16:02
<Jero>
ATM that function uses the libxml2 HTML parser
16:02
<gsnedders>
Jero: as if you're ever gonna persuade the PHP devs to implement a draft standard…
16:02
<Jero>
don't worry, it was just an idea..
16:03
<gsnedders>
Jero: it took me many, many, many years to persuade them of a bug in strip_tags(), which they kept writing off as being invalid HTML (as the aim there is to use a basic parser that'll work with valid HTML) despite me citing specific parts of the specification that clearly said otherwise
16:05
<Jero>
heh
16:06
<gsnedders>
I bet they didn't have a copy of the SGML spec, and were simply saying what they thought was right.
16:06
<gsnedders>
(it's actually something that despite being part of the SGML spec is relevant)
16:09
<Jero>
what was the bug?
16:10
<gsnedders>
U+003E within quoted attribute values
16:10
<gsnedders>
it probably breaks if you mix single and double quotes, actually
16:10
<gsnedders>
e.g., <foo bar="this'> is parsed as a single |foo| element where @bar=this
16:12
<Jero>
so it closes the value of bar upon seeing the ' character?
16:12
<gsnedders>
yes
16:12
<Jero>
that is indeed very weird
16:13
<Jero>
and what was their argument?
16:14
<gsnedders>
actually, that does work correctly
16:14
<gsnedders>
var_dump(strip_tags('<foo bar="this\'>">')); indeed produces string(0) ""
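The behaviour under discussion can be sketched in Python as a tag stripper that remembers which quote character opened an attribute value, so that a `>` or a non-matching quote inside the value does not end the tag. This is a simplified illustration under stated assumptions, not PHP's `strip_tags()` implementation: the name `strip_tags_quote_aware` is hypothetical, and any quote inside a tag is treated as opening an attribute value.

```python
def strip_tags_quote_aware(html):
    """Remove tags, treating '>' inside a quoted attribute value as data."""
    output = []
    in_tag = False
    quote = None  # quote character that opened the current attribute value
    for c in html:
        if not in_tag:
            if c == "<":
                in_tag = True
            else:
                output.append(c)
        elif quote is not None:
            if c == quote:
                quote = None  # only the matching quote closes the value
        elif c in "\"'":
            quote = c
        elif c == ">":
            in_tag = False
    return "".join(output)
```

With the input from the `var_dump` call above, `<foo bar="this'>">`, the sketch returns an empty string, matching the `string(0) ""` result quoted there.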
16:14
<gsnedders>
Jero: for the > bug? that it was invalid HTML.
16:14
<gsnedders>
Jero: for the latter? I only just thought of it
16:15
<Jero>
i see
16:15
<gsnedders>
the former is untrue, as it is completely valid
16:16
<gsnedders>
[^<&] off the top of my head
16:17
<Jero>
heh
16:17
<Jero>
and they still haven't fixed it?
16:17
<gsnedders>
the former is fixed in 5.2.2, IIRC
16:18
<gsnedders>
only 5, though
16:18
<gsnedders>
the same patch would apply against 4.4 fine, but it's unfixed
16:19
<Jero>
that's stupid
16:20
<gsnedders>
typical of PHP development, though
16:21
<Jero>
that's too bad
16:21
<Philip`>
<foo <bar=<bar> is syntactically valid in HTML5 now - only ["&] (or ['&] or (\s|&)) does anything
16:22
Philip`
wonders how that will mess up strip_tags
16:22
<gsnedders>
Jero: http://cvs.php.net/viewvc.cgi/php-src/ext/standard/tests/strings/bug40432.phpt?revision=1.2&view=markup&pathrev=MAIN
16:22
<Jero>
thanks
16:23
<gsnedders>
I think I saw it fail in 5.2.3, actually
16:24
<gsnedders>
Philip`: http://cvs.php.net/viewvc.cgi/php-src/ext/standard/string.c?view=markup — search for php_u_strip_tags
16:24
<gsnedders>
Philip`: string(0) "" is PHP 5.2.3's output, though
16:29
<Jero>
gsnedders, i'm off, if you ever need me regarding my HTML5 parser, email me at [censored :)]
16:30
<gsnedders>
Jero: I'll be around here if you ever want me
16:30
<Jero>
alrighty, bye
17:26
<gsnedders>
jgraham: do you really think that those tests would be that hard to get working in another language? the script I use to parse it is in the repos
17:28
<gsnedders>
jgraham: I didn't want to copy the html5lib test cases format as it would mean I'd need the input data repeated multiple times for each algorithm
17:34
<Philip`>
gsnedders: It would probably be useful to give more detail on the test format, like how it represents arrays and strings
17:34
<Philip`>
or just use JSON since that already defines those things and everyone has JSON parsers already :-)
17:35
<gsnedders>
and have each test as an object with an array of results?
17:39
<gsnedders>
Philip`: but yeah, the documentation was thrown together very quickly
17:52
<Philip`>
gsnedders: I was thinking of something like [["Empty string", "", false, false, false, null, "", []], ...], since that's about the same as what you have already but more JSONic, but maybe ["Empty string", "", { "unsigned":false, "signed":false, "real":false, ... }] would be more easily extensible
17:53
<gsnedders>
Philip`: I was thinking {"":[false,false,false,null,null,[]]}
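A minimal sketch of how a test runner might consume the `{"input": [results...]}` shape gsnedders proposes, in Python. The column order is an assumption: the first three names come from Philip`'s example, and the rest are placeholders because the log does not name those algorithms. `load_tests` and `run_tests` are hypothetical helpers.

```python
import json

# Assumed column order; "column_4" to "column_6" are placeholders for
# algorithms that the log does not name.
COLUMNS = ["unsigned", "signed", "real", "column_4", "column_5", "column_6"]


def load_tests(document):
    """Map each input string to a dict of expected output per column."""
    tests = json.loads(document)
    return {text: dict(zip(COLUMNS, expected)) for text, expected in tests.items()}


def run_tests(tests, algorithms):
    """Return (input, column, expected, actual) for every mismatch."""
    failures = []
    for text, expected in tests.items():
        for name, parse in algorithms.items():
            actual = parse(text)
            want = expected[name]
            # Compare types too: in Python False == 0 and 1 == 1.0.
            if type(actual) is not type(want) or actual != want:
                failures.append((text, name, want, actual))
    return failures
```

Keying the object on the input string keeps each input in the file once, which was the stated reason for not reusing the html5lib test format.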
17:53
<Philip`>
It'd be nice if JSON allowed you to keep comments
17:54
<gsnedders>
Philip`: there are only headers for large groups of tests, so I don't feel that much about keeping them
17:56
<Philip`>
What about XML? <numbertest><!-- Empty string --><input></input><outputs><output algorithm="unsigned"><false/></output><output algorithm="integerlist"><items/></output>...
17:56
<gsnedders>
that means defining data types and the like
17:56
<Philip`>
Hmm, maybe the [false,false,...] one is easiest
17:58
<Philip`>
In any case, it does seem probably easier to use JSON rather than a custom data format when you have arrays and non-ASCII strings, to avoid making every implementor implement another test parser
17:59
<gsnedders>
that's true
17:59
<gsnedders>
just lack of comments in JSON is annoying
18:00
<gsnedders>
around 15 minutes to be completely happy with a JSON version of the test suite… not overly slow…
18:00
<Philip`>
(JSON is also quite handy when you're running tests in web browsers)
18:01
<gsnedders>
(It would've been easier if it were possible to get pretty printing of JSON in PHP)
18:01
<gsnedders>
(as I just hacked my existing parser)
20:14
<gsnedders>
jgraham: just looking at the PHPUnit compiled version of the tests?
21:18
<virtuelv_>
is it defined anywhere what the implied DOM should be like when using createHTMLDocument()?
21:19
<virtuelv_>
(iow: what should the DOM be like given var doc = document.implementation.createHTMLDocument("");
21:19
<virtuelv_>
doc.documentElement.innerHTML = "<h1>What</h1>";
21:19
<virtuelv_>
alert(doc.documentElement.outerHTML);
21:20
<virtuelv_>
what should be alerted?
21:21
<jgraham>
gsnedders: Yeah, for some reason I looked at the PHP version
21:22
<gsnedders>
jgraham: yeah. that'd be nigh impossible to parse. there's now a JSON version of the tests in the repo as well, though
21:22
<gsnedders>
(but that loses some data, like not distinguishing between ints and floats)
21:24
<Philip`>
Could you store floats as strings instead of numbers?
21:25
<gsnedders>
then parse the string?
21:25
<gsnedders>
hmmm…
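Philip`'s suggestion can be sketched as follows, in Python: write floats as strings so that a parser which reads every JSON number into one numeric type (as JavaScript does) can still tell 1.0 from 1. The sketch assumes no expected value is itself a string, so any string found on decoding is taken to be a float; `encode_expected` and `decode_expected` are hypothetical names.

```python
import json


def encode_expected(value):
    """Encode floats as strings so that 1.0 and 1 stay distinct in JSON."""
    if isinstance(value, float):
        return repr(value)
    if isinstance(value, list):
        return [encode_expected(item) for item in value]
    return value


def decode_expected(value):
    """Reverse encode_expected: any string is assumed to be a float."""
    if isinstance(value, str):
        return float(value)
    if isinstance(value, list):
        return [decode_expected(item) for item in value]
    return value
```

Round-tripping `[1, 1.0, False, None]` through `json.dumps` and `json.loads` with these helpers gives back an int, a float, a bool and `None` in that order.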