#whatwg on 2007-07-12

10:11	<Hixie>	hm
10:11	<Hixie>	are there really two parse errors for "<!DOCTYPE" but only one for "<!DOCTYPE "?
10:20	<hsivonen>	Hixie: yes
10:21	<hsivonen>	Hixie: probably not worth tweaking
10:34	<hsivonen>	<b><table><td></b><i></table>X
10:34	<hsivonen>	Why isn't <b> supposed to reopen before X?
11:02	<Hixie>	isn't it?
11:03	<Hixie>	oh because the table is in the <b>
11:03	<Hixie>	and so the X is still in the <b>
11:03	<Hixie>	the </b> in the above has no effect
11:05	<virtuelv>	Hixie: How's Bergen?
11:14	<gsnedders>	http://geoffers.no-ip.com/svn/php-html-5-direct/tests/numbersTest
11:16	<Hixie>	virtuelv: rainy
11:17	<Hixie>	gsnedders: does that match the spec or the spec with your proposed changes?
11:17	<gsnedders>	Hixie: the spec
11:18	<virtuelv>	Hixie: Norway's been pretty much like that for a couple of weeks now
11:18	<gsnedders>	Hixie: even when the spec does very odd things (like a list of integers with input "10" outputting [1])
11:19	<Hixie>	gsnedders: k
11:19	<Hixie>	gsnedders: can you include that link in one of your e-mails? (or just mail it directly to me ian⊙hc) I'll try to look at what browsers do with those tests when I update the spec
11:20	<gsnedders>	Hixie: I'm going to email it shortly
11:20	<gsnedders>	Hixie: just a few more general issues with the number section, then my review of that is done, and I'll send it off with the final email
11:20	<hsivonen>	Hixie: ah, I didn't realize the table was in b. I've got a bug then.
11:22	<virtuelv>	Hixie: re DOMContentLoaded - it'd be useful to have some event when the DOM is loaded and styles are available/applied
11:41	<hsivonen>	translating the spec to code would be less error-prone if the spec didn't have gotos that create unnatural loops
11:42	<gsnedders>	hsivonen: heh. I ended up with a do {} while (true); in my implementation of the lists of integers.
11:42	<gsnedders>	then relying on break and continue statements
11:42	<hsivonen>	gsnedders: I'm pretty sure do-while is always natural
11:42	<hsivonen>	(natural in the compiler sense)
11:43	<gsnedders>	ah. in that sense.
11:43	<gsnedders>	(of natural)
11:43	<gsnedders>	PHP likely does something odd with it, though, knowing PHP.
11:44	<gsnedders>	has anyone apart from zcorpan_ and myself started the spec review, anyway?
11:44	<hsivonen>	if I had to guess, my guess would be that even PHP created only natural loops for the purpose of compiler optimization
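
A minimal sketch of the pattern gsnedders describes above: spec prose of the form "return to step 1" written as a single unconditional loop driven by `break` and `continue`. The parsing rule used here (collect every run of ASCII digits) is an assumption made for illustration; it is not the spec's list-of-integers algorithm, and `parse_integer_list` is a hypothetical name.

```python
DIGITS = "0123456789"


def parse_integer_list(text):
    """Collect every run of ASCII digits in text as an integer (illustrative rule)."""
    numbers = []
    pos = 0
    while True:
        # Step 1: at the end of the input, stop.
        if pos >= len(text):
            break
        # Step 2: skip one character that cannot start a number,
        # then "return to step 1".
        if text[pos] not in DIGITS:
            pos += 1
            continue
        # Step 3: collect a run of digits and record it.
        start = pos
        while pos < len(text) and text[pos] in DIGITS:
            pos += 1
        numbers.append(int(text[start:pos]))
        # Step 4: falling off the end of the body is the jump back to step 1.
    return numbers
```

The loop has one entry point at the top, which is what makes it a natural loop in the compiler sense hsivonen mentions.
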
11:44	<hsivonen>	gsnedders: I'm reviewing the parsing spec as I go
11:45	<hsivonen>	gsnedders: I don't have much to say about tokenization, but I have posted remarks about tree building
11:47	<gsnedders>	hsivonen: ah. I just haven't seen that much.
11:48	<hsivonen>	lost in the flood I guess :-(
11:48	<gsnedders>	ah, now I see
11:55	<Hixie>	hsivonen: believe me, the spec doesn't look like what i'd want it to look like if i was doing this from scratch
11:55	<Hixie>	anyway, time to be a tourist
11:56	<gsnedders>	Hixie: rarely anything ends up as you'd like it to if you started from scratch :P
12:19	<hsivonen>	<a><p>X<a>Y</a>Z</p></a>
12:20	<hsivonen>	Why does the first <a> come off the stack before <p> goes in?
12:20	<hsivonen>	ooh. does the p get reparented?
12:22	<hsivonen>	now I'm confused
12:37	<met_>	http://ajaxian.com/archives/google-gears-roadmap-and-features
12:37	<hsivonen>	ooh! my code lacks step #10 of the AAA!
12:44	<Philip`>	gsnedders: In numbersTest: s/dimentions/dimensions/
12:47	<gsnedders>	Philip`: fixed
15:15	<gsnedders>	Jero: you around?
15:15	<Jero>	yup
15:17	<gsnedders>	did you start your PHP5 implementation from scratch not knowing that there was a semi-started one before, or some other reason?
15:24	<gsnedders>	Jero: and I've started on a 1:1 implementation in PHP, which isn't really so relevant in the real world
15:25	<Jero>	gsnedders: correct, I found out later that there was already an HTML5 parser in PHP
15:25	<Jero>	gsnedders: but I couldn't access the site (some issues with Trac I believe)
15:26	<gsnedders>	Jero: it's not so interesting now. a lot of the code written for it is obsolete
15:26	<gsnedders>	http://php-html5lib.dashslot.net/svn/trunk works, though
15:27	<Jero>	gsnedders: interesting
15:27	<Jero>	also, what do you think of my implementation so far?
15:27	<gsnedders>	I've never had time to really look into it
15:27	<gsnedders>	(due to school, and now trying to get as much of the spec review done as possible before going away in a week)
15:28	<gsnedders>	http://geoffers.no-ip.com/svn/php-html-5-direct contains the direct implementation
15:29	<Jero>	thanks
15:29	<gsnedders>	it's all very slow, though
15:29	<Jero>	so is my implementation at the moment :p
15:29	<gsnedders>	the direct one will be far slower, though
15:30	<Jero>	yeah, i'm sure
15:30	<gsnedders>	as the aim is to make absolutely no compromises from the spec
15:30	<gsnedders>	which is the case of the tokeniser means one character at a time
15:30	<gsnedders>	*means emitting
15:30	<Jero>	yeah, that's not a very optimal solution :p
15:31	<Jero>	but I guess I've only made three or four changes to the entire parsing algorithm compared to the spec
15:33	<Philip`>	If you want to write a new tokeniser in some language, it could perhaps be helpful to build on my work - that has a direct representation of the spec algorithm, and generates C++ or JS code to execute it, and it ought to be fairly quick to do other languages in the same way
15:35	<Philip`>	(I need to add some kind of abstraction in the code-generating part - JS was only easy because it's almost entirely identical to C++ except for replacing 'bool' with 'var', and it takes a little bit more effort if you need $s in front of variables)
15:35	<Philip`>	(but I'll at least try to create a Perl implementation too, to make sure it's sufficiently portable between languages)
15:44	<gsnedders>	Jero: I may, however, try forking off the direct impl and work on optimising it (as that's far nicer than starting from scratch, as I can just rewrite one method at a time)
15:48	<Jero>	well, I followed the spec in everything (with three or four exceptions), so that's basically the same as forking off the direct implementation, don't you think?
15:50	<gsnedders>	Jero: yes
15:51	<gsnedders>	Jero: it would be interesting to compare the two, though (and optimising it won't take overly long to do)
15:52	<Jero>	my impl still has a couple of bugs (though most of them are related I think)
15:53	<Jero>	and I'm a bit behind when it comes to the last 60 or so revisions
15:53	<gsnedders>	heh. any bugs in the direct impl are either PHP bugs or spec bugs
15:54	<gsnedders>	and I wouldn't allow any regressions when optimising it
15:55	<Jero>	gsnedders: you can contribute to the code if you want to in the future
15:55	<gsnedders>	Jero: I'll probably optimise the tokeniser and then see how the two compare, then decide what to do from there
15:56	<Jero>	the tokeniser of my implementation you mean?
15:57	<gsnedders>	the tokeniser of the direct implementation, then compare it to your tokeniser
15:57	<Jero>	that sounds like a good idea
15:58	<Jero>	I'll upload the code I have on my PC to the online version of my parser, so you can compare it to the latest and greatest
15:58	<gsnedders>	heh. it won't be for a while, though
15:58	<gsnedders>	the tokeniser isn't written in the direct impl yet
15:59	<Jero>	oh i see :p
15:59	<gsnedders>	(which I had actually implied earlier)
16:01	<Jero>	also, don't you think it'd be great to have the HTML5's parsing algorithm being used by the built-in DOMDocument->loadHTML() function in PHP?
16:02	<Jero>	ATM that function uses the libxml2 HTML parser
16:02	<gsnedders>	Jero: as if you're ever gonna persuade the PHP devs to implement a draft standard…
16:02	<Jero>	don't worry, it was just an idea..
16:03	<gsnedders>	Jero: it took me many, many, many years to persuade them of a bug in strip_tags(), which they kept writing off as being invalid HTML (as the aim there is to use a basic parser that'll work with valid HTML) despite me citing specific parts of the specification that clearly said otherwise
16:05	<Jero>	heh
16:06	<gsnedders>	I bet they didn't have a copy of the SGML spec, and were simply saying what they thought was right.
16:06	<gsnedders>	(it's actually something that despite being part of the SGML spec is relevant)
16:09	<Jero>	what was the bug?
16:10	<gsnedders>	U+003E within quoted attribute values
16:10	<gsnedders>	it probably breaks if you mix single and double quotes, actually
16:10	<gsnedders>	e.g., <foo bar="this'> is parsed as a single |foo| element where @bar=this
16:12	<Jero>	so it closes the value of bar upon seeing the ' character?
16:12	<gsnedders>	yes
16:12	<Jero>	that is indeed very weird
16:13	<Jero>	and what was their argument?
16:14	<gsnedders>	actually, that does work correctly
16:14	<gsnedders>	var_dump(strip_tags('<foo bar="this\'>">')); indeed produces string(0) ""
16:14	<gsnedders>	Jero: for the > bug? that it was invalid HTML.
16:14	<gsnedders>	Jero: for the latter? I only just thought of it
16:15	<Jero>	i see
16:15	<gsnedders>	the former is untrue, as it is completely valid
16:16	<gsnedders>	[^<&] off the top of my head
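
A minimal sketch of the class of bug under discussion, U+003E inside a quoted attribute value. Both functions are hypothetical illustrations and are not PHP's `strip_tags()` implementation: the first ends a tag at the first `>` it sees, the second ignores `>` while inside a quoted value. The quote-aware version assumes every `'` or `"` inside a tag opens an attribute value, so it is only a rough model.

```python
def strip_tags_naive(text):
    """Drop everything between '<' and the next '>', ignoring quotes."""
    out = []
    in_tag = False
    for ch in text:
        if in_tag:
            if ch == ">":
                in_tag = False
        elif ch == "<":
            in_tag = True
        else:
            out.append(ch)
    return "".join(out)


def strip_tags_quote_aware(text):
    """Like the naive version, but '>' inside a quoted value does not end the tag."""
    out = []
    in_tag = False
    quote = None  # the quote character that opened the current value, if any
    for ch in text:
        if not in_tag:
            if ch == "<":
                in_tag = True
            else:
                out.append(ch)
        elif quote is not None:
            # Inside a quoted value: only the matching quote closes it.
            if ch == quote:
                quote = None
        elif ch in ('"', "'"):
            quote = ch
        elif ch == ">":
            in_tag = False
    return "".join(out)
```

On gsnedders's input `<foo bar="this'>">` the quote-aware version returns the empty string, matching the `string(0) ""` result quoted above, while the naive version leaves `">` behind.
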
16:17	<Jero>	heh
16:17	<Jero>	and they still haven't fixed it?
16:17	<gsnedders>	the former is fixed in 5.2.2, IIRC
16:18	<gsnedders>	only 5, though
16:18	<gsnedders>	the same patch would apply against 4.4 fine, but it's unfixed
16:19	<Jero>	that's stupid
16:20	<gsnedders>	typical of PHP development, though
16:21	<Jero>	that's too bad
16:21	<Philip`>	<foo <bar=<bar> is syntactically valid in HTML5 now - only ["&] (or ['&] or (\s|&)) does anything
16:22	Philip`	wonders how that will mess up strip_tags
16:22	<gsnedders>	Jero: http://cvs.php.net/viewvc.cgi/php-src/ext/standard/tests/strings/bug40432.phpt?revision=1.2&view=markup&pathrev=MAIN
16:22	<Jero>	thanks
16:23	<gsnedders>	I think I saw it fail in 5.2.3, actually
16:24	<gsnedders>	Philip`: http://cvs.php.net/viewvc.cgi/php-src/ext/standard/string.c?view=markup — search for php_u_strip_tags
16:24	<gsnedders>	Philip`: string(0) "" is PHP 5.2.3's output, though
16:29	<Jero>	gsnedders, i'm off, if you ever need me regarding my HTML5 parser, email me at [censored :)]
16:30	<gsnedders>	Jero: I'll be around here if you ever want me
16:30	<Jero>	alrighty, bye
17:26	<gsnedders>	jgraham: do you really think that those tests would be that hard to get working in another language? the script I use to parse it is in the repos
17:28	<gsnedders>	jgraham: I didn't want to copy the html5lib test cases format as it would mean I'd need the input data repeated multiple times for each algorithm
17:34	<Philip`>	gsnedders: It would probably be useful to give more detail on the test format, like how it represents arrays and strings
17:34	<Philip`>	or just use JSON since that already defines those things and everyone has JSON parsers already :-)
17:35	<gsnedders>	and have each test as an object with an array of results?
17:39	<gsnedders>	Philip`: but yeah, the documentation was thrown together very quickly
17:52	<Philip`>	gsnedders: I was thinking of something like [["Empty string", "", false, false, false, null, "", []], ...], since that's about the same as what you have already but more JSONic, but maybe ["Empty string", "", { "unsigned":false, "signed":false, "real":false, ... }] would be more easily extensible
17:53	<gsnedders>	Philip`: I was thinking {"":[false,false,false,null,null,[]]}
17:53	<Philip`>	It'd be nice if JSON allowed you to keep comments
17:54	<gsnedders>	Philip`: there are only headers for large groups of tests, so I don't feel that much about keeping them
17:56	<Philip`>	What about XML? <numbertest><!-- Empty string --><input></input><outputs><output algorithm="unsigned"><false/></output><output algorithm="integerlist"><items/></output>...
17:56	<gsnedders>	that means defining data types and the like
17:56	<Philip`>	Hmm, maybe the [false,false,...] one is easiest
17:58	<Philip`>	In any case, it does seem probably easier to use JSON rather than a custom data format when you have arrays and non-ASCII strings, to avoid making every implementor implement another test parser
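
A minimal sketch of how an implementor might consume the object-keyed JSON shape gsnedders proposes, where each input string maps to an array of expected outputs. The column names and their order are assumptions: only "unsigned", "signed", "real" and "integerlist" are mentioned in the chat, `output_4` and `output_5` are placeholders, and `load_tests` is a hypothetical helper.

```python
import json

# Assumed column order; the real test suite may differ.
ALGORITHMS = ["unsigned", "signed", "real", "output_4", "output_5", "integerlist"]

# The single example entry given in the chat.
SAMPLE = '{"": [false, false, false, null, null, []]}'


def load_tests(text):
    """Map each input string to a dict of expected results keyed by algorithm name."""
    cases = {}
    for input_string, expected in json.loads(text).items():
        if len(expected) != len(ALGORITHMS):
            raise ValueError("unexpected number of results for %r" % input_string)
        cases[input_string] = dict(zip(ALGORITHMS, expected))
    return cases


cases = load_tests(SAMPLE)
```

Because the format is plain JSON, the only per-language work is the mapping from array position to algorithm, which is the point Philip` makes about not writing another test parser.
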
17:59	<gsnedders>	that's true
17:59	<gsnedders>	just lack of comments in JSON is annoying
18:00	<gsnedders>	around 15 minutes to be completely happy with a JSON version of the test suite… not overly slow…
18:00	<Philip`>	(JSON is also quite handy when you're running tests in web browsers)
18:01	<gsnedders>	(It would've been easier if it were possible to get pretty printing of JSON in PHP)
18:01	<gsnedders>	(as I just hacked my existing parser)
20:14	<gsnedders>	jgraham: just looking at the PHPUnit compiled version of the tests?
21:18	<virtuelv_>	is it defined anywhere what the implied DOM should be like when using createHTMLDocument()?
21:19	<virtuelv_>	(iow: what should the DOM be like given var doc = document.implementation.createHTMLDocument("");
21:19	<virtuelv_>	doc.documentElement.innerHTML = "<h1>What</h1>";
21:19	<virtuelv_>	alert(doc.documentElement.outerHTML);
21:20	<virtuelv_>	what should be alerted?
21:21	<jgraham>	gsnedders: Yeah, for some reason I looked at the PHP version
21:22	<gsnedders>	jgraham: yeah. that'd be thy impossible to parse. there's now a JSON version of the tests in the repo as well, though
21:22	<gsnedders>	(but that loses some data, like not distinguishing between ints and floats)
21:24	<Philip`>	Could you store floats as strings instead of numbers?
21:25	<gsnedders>	then parse the string?
21:25	<gsnedders>	hmmm…
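
A minimal sketch of Philip`'s suggestion: write floats as strings so that a JSON reader which has a single number type can still tell `1` from `1.0`, then parse the string when loading. `encode_expected` and `decode_real` are hypothetical helpers, and the sketch assumes the column being decoded is known to hold a number, so any string found there is an encoded float.

```python
import json


def encode_expected(value):
    """Store floats as strings; leave every other value untouched."""
    if isinstance(value, float):
        return repr(value)
    return value


def decode_real(value):
    """Turn a string-encoded float back into a float; pass other values through."""
    if isinstance(value, str):
        return float(value)
    return value


# Round trip: the integer stays a JSON number, the float becomes a string.
encoded = json.dumps([encode_expected(v) for v in [1, 1.0]])
decoded = [decode_real(v) for v in json.loads(encoded)]
```
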