#whatwg on 2008-03-03

02:12	<Hixie>	so my girlfriend looks over my shoulder at the changes i just made, and without knowing i had just added them, points to the three sections i added or renamed today and laughs at me for naming them that
02:12	<Hixie>	maybe i'm having a bad naming day or something
02:23	<Lachy>	Hixie, which section names were they?
02:24	<Hixie>	the "after after body insertion mode", the "after after frameset insertion mode", and the "unexpected end" (with the rules for when you stop parsing "with prejudice")
02:45	<Hixie>	god, i can't wait for gsnedders' preprocessor
03:02	<mpt>	after after?
03:05	<jwalden>	after post-* perhaps if you care enough about the wording
03:14	<Lachy>	does anyone know if Firefox 3 has implemented support for any microformats yet? If so, what have they got, or did they drop that idea?
03:19	<jwalden>	I think it's mostly extensionland
03:31	<Lachy>	yeah, I'm aware of the extensions, but there were announcements around the beginning of last year that FF3 was going to add native support for them
04:14	<jwalden>	I think they never had a developer willing and capable, time-wise, of doing so
04:14	<jwalden>	but I didn't pay too much attention
05:14	<Hixie>	Philip`: yt?
08:15	<zcorpan>	Hixie: re gamespy.com, opera does use quirks mode, apparently; i was probably getting confused myself by all the testing back and forth
08:16	<Hixie>	were the changes i made still good?
08:16	<zcorpan>	i think so
08:17	<zcorpan>	haven't looked at the changelog yet, but i trust that you did what you said you did :)
08:17	<Hixie>	please don't :-)
08:17	<zcorpan>	i have some more doctype feedback coming soon
08:17	<Hixie>	with these changes i expect i've made all kinds of errors
08:17	<Hixie>	cool
08:18	<zcorpan>	i think we need to ignore the trailing "EN" (or whatever it might be) in the FPI
08:19	<zcorpan>	firefox and safari don't work with a number of pages because of that
08:19	<zcorpan>	but opera and ie do
08:21	<zcorpan>	looking at about 60 pages that have something other than //EN at the end (and wouldn't trigger standards mode anyway), 34 look good only in quirks mode, 1 looks good only in standards mode (in opera/firefox), and 1 looks good only in standards mode in opera but the same in quirks or standards in firefox
08:21	<zcorpan>	and the rest looked pretty much the same in either mode
08:30	<Hixie>	if you haven't already, send mail saying how you propose to check for that in the spec
08:31	<zcorpan>	i'm about to send it in a minute. just check that the FPI starts with e.g. "-//w3c//dtd html 3.2//"
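[Editor's sketch] The check zcorpan describes, matching only the start of the FPI so that the trailing language code ("EN" or anything else) is ignored, could look roughly like this. The function name and the one-entry prefix tuple are assumptions for illustration; only the HTML 3.2 prefix comes from the discussion, and this is not the list any browser or the spec uses.

```python
# Hypothetical prefix list: only the HTML 3.2 entry is taken from the log.
QUIRKS_FPI_PREFIXES = ("-//w3c//dtd html 3.2//",)


def fpi_triggers_quirks(public_id):
    """Return True if the public identifier starts with a known prefix.

    The comparison is ASCII case-insensitive and deliberately ignores
    whatever follows the final "//" (for example "EN").
    """
    if public_id is None:  # no public identifier at all
        return False
    return public_id.lower().startswith(QUIRKS_FPI_PREFIXES)
```

With this shape, "-//W3C//DTD HTML 3.2//EN" and "-//W3C//DTD HTML 3.2//FR" are treated the same way, which is the behaviour being proposed.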
08:32	<hsivonen>	Hixie: should I expect large volumes of conformance checker-relevant spec changes in the near term?
08:32	<Hixie>	hsivonen: i'm currently going through the tree construction feedback
08:32	<hsivonen>	Hixie: I take that as a "yes" :-)
08:33	<Hixie>	well, i'm almost done
08:33	<Hixie>	and after that i'm out of things to do again
08:33	<Hixie>	so. maybe not "large" volumes :-)
08:33	<hsivonen>	ok. perhaps then I should just file bugs for the changes manually and not write a script
08:37	<hsivonen>	hmm. looks like I'm going to need the script anyway.
08:41	<Hixie>	script?
08:43	<hsivonen>	Hixie: a script for mining the spec svn log for conformance checker and tools-relevant changes and filing bugs automatically so that I don't miss changes
08:43	<hsivonen>	at this point, doing a vgrep on the spec and parser source isn't such a great idea
08:43	<hsivonen>	or vdiff, rather
08:46	<hsivonen>	hmm. I see a form field called "source" handled in web-apps-tracker, but I don't see a field named that way in the form
08:47	<zcorpan>	hsivonen: another hidden feature? :)
08:47	<Hixie>	anything with "c" (or is it "v"? one or the other) in the "affected" part of the checkin comments will give you the checkins that i think affect you
08:48	<hsivonen>	Hixie: yeah. now I need to figure out how to modify web-apps-tracker to post to Bugzilla
08:48	<Hixie>	heh
08:48	<Hixie>	good luck with that
08:49	<hsivonen>	umm. is there a reason to expect that luck is needed when posting to Bugzilla_
08:49	<hsivonen>	?
08:54	<Hixie>	well your script will need to deal with cookies
08:55	<Hixie>	which always makes things exciting in my experience
08:56	<hsivonen>	isn't the login cookie a constant that can be hard-coded once sniffed from a browser session?
09:04	<Hixie>	not with bugzilla
09:04	<Hixie>	it is ip-address locked, iirc
09:07	<zcorpan>	hsivonen: see http://www.sitepoint.com/forums/showthread.php?p=3739972#post3738863 and onward
09:08	<zcorpan>	Hixie: likewise, perhaps accesskey should be made conforming... :)
09:08	<Hixie>	there's a whole folder on accesskey
09:08	<Hixie>	we need a solution
09:08	<Hixie>	we don't have one
09:08	<Hixie>	at least last i checked :-)
09:09	<zcorpan>	flagging it as an error isn't helping authors, it seems
09:19	<hsivonen>	zcorpan: I'm still trying to shun doctype sniffing on the XML side
09:19	<zcorpan>	hsivonen: yeah
09:20	<hsivonen>	(for the reasons stated in http://hsivonen.iki.fi/doctype/#xml )
09:20	<Hixie>	zcorpan: i agree; i haven't worked out what we should do yet
09:20	<hsivonen>	zcorpan: and the project that became Validator.nu started specifically as a non-DTD validator
09:21	<zcorpan>	hsivonen: don't tell me :)
09:21	<Philip`>	Hixie: Yes
09:21	<hsivonen>	:-)
09:22	<Hixie>	Philip`: no idea what i wanted to ask you anymore, sorry. it was probably about one of the e-mails you sent, in which case the question will be in my e-mail reply now.
09:22	<Philip`>	Oh, okay :-)
09:23	<hsivonen>	zcorpan: anyway, I don't find actionable feedback that doesn't run counter to central design decisions or that doesn't belong on Hixie's plate instead (accesskey)
09:23	<zcorpan>	hsivonen: what about the accept header?
09:26	<hsivonen>	zcorpan: My thinking is that XHTML+MathML+SVG+RDF has a larger feature set than HTML only, so the former should be preferred
09:26	<hsivonen>	besides, people who do conneg usually want everything but IE to see the non-text/html version
09:27	<hsivonen>	moreover, the generic UI has a manual override anyway
09:28	<zcorpan>	makes sense, i guess tommy's case is a bit uncommon in using xhtml but wanting html
09:28	<zcorpan>	hsivonen: can i send GET parameters along with Opera's validate feature now?
09:29	<zcorpan>	(or remember settings some other way?)
09:32	<hsivonen>	zcorpan: what kind of GET parameters? Opera does a POST, doesn't it?
09:32	<zcorpan>	hsivonen: uh, make that GET-like parameters along with a POST request
09:33	<zcorpan>	that is, typing in http://validator.nu/?parser=html in opera:config
09:33	<hsivonen>	zcorpan: they are supported if those fields come before the form field that contains the document
09:33	<hsivonen>	oh
09:33	<hsivonen>	that's not supported
09:33	<Hixie>	me gets an e-mail from someone saying that the acid3 page on wikipedia is unfair to opera because opera can't show its daily progress on nightly builds
09:33	<Hixie>	i wonder if i should point out that it was opera people who _created_ that page...
09:34	<Philip`>	It seems no more unfair to Opera than it is to IE
09:35	<hsivonen>	zcorpan: filed http://bugzilla.validator.nu/show_bug.cgi?id=77
09:35	<Hixie>	Philip`: well they also said it was unfair to IE
09:36	<zcorpan>	hsivonen: thanks
09:41	<Hixie>	hsivonen: so you think <table><p><i></table> should be only one error, not two? (wrong place for the <p>, missing </i>)
09:43	<Hixie>	i guess <table><p><i><tr> should arguably be not fewer errors than <table><p><i></table>
09:43	<Hixie>	and right now it is
09:47	<hsivonen>	Hixie: what have I said?
09:48	<hsivonen>	Hixie: two errors in that case seems reasonable on surface
09:48	<Hixie>	i've made it one error
09:48	<Hixie>	(foster parenting)
09:48	<hsivonen>	Hixie: well, that's sufficient for finding the document as a whole to be in error
09:48	<Hixie>	and made <table><p><tr> be one error too (it used to be two errors)
09:48	<Hixie>	yeah
09:49	<Hixie>	the next commit is this change
09:50	<Hixie>	man, i keep having to look at what the subject line of these mails was to work out what section they're talking about
09:52	<annevk>	hmm, i guess I can duplicate the subject line in the body somehow from now on
09:52	<hsivonen>	Hixie: I'm having trouble parsing that. Do you mean all the relevant bits should be in the email body from now on?
09:53	<Hixie>	pretend that i never look at the subject line
09:53	<Hixie>	i read the e-mail bodies to work out where to file the e-mails, and when i reply to them i just reply to a concatenated stream of bodies
09:54	<Hixie>	so really i never see the subject lines
09:54	<hsivonen>	Hixie: ok
09:54	<Hixie>	thanks :-)
09:54	<annevk>	can't you concatenate the subject too btw?
09:54	<Hixie>	(it's no biggie, though)
09:55	<Hixie>	annevk: i use pine, and pine doesn't do that
09:55	<hsivonen>	(aside: the bugzilla login cookie seems to be a constant)
09:55	<Hixie>	really? not ip-locked?
09:55	<Hixie>	i wonder why i keep getting logged out then
09:55	<hsivonen>	Hixie: I mean constant across requests
09:55	<Hixie>	ah ok
09:55	<annevk>	you can make it ip-locked if you want
09:56	<annevk>	at least, last time I checked that was optional
09:56	<Hixie>	yeah but that check box seems to not work across browsers, or something
09:56	<Hixie>	maybe it's locked to browser+ip or something
09:56	<Hixie>	i dunno
09:56	<gavin_>	the checkbox doesn't remove all IP restrictions, afaik
09:56	<Hixie>	i do know i keep having to log in to the various bugzilla instances i use
09:56	<gavin_>	it only makes them looser
09:56	<Hixie>	ah
09:56	<Hixie>	well it's annoying as hell
09:57	<annevk>	maybe it's better to have a private pc versus public pc option...
10:00	<Philip`>	Maybe they do that since it's easy to make an attachment file that steals people's cookies?
10:01	<Philip`>	but then they could just use HttpOnly cookies instead
10:06	<hsivonen>	Hixie: do I understand correctly that when you say "BOM", you mean U+FEFF. but when the Unicode folks say "BOM", they mean U+FEFF that indeed functions as a BOM?
10:07	<Hixie>	i rarely say "BOM" alone
10:07	<Hixie>	i usually say "U+FEFF BYTE ORDER MARK character" or some such
10:09	<hsivonen>	oh. Martin Dürst was quoting gsnedders, not you
10:09	<hsivonen>	anyway, I think I agree with Martin's point if I understood it correctly
10:09	<zcorpan>	speaking of BOMs, i noted a while ago that ES4 strips/ignores all U+FEFFs
10:10	<zcorpan>	because it was needed for web compat
10:10	<hsivonen>	that is, I think BOM swallowing should be left on the encoding layer except in the case of UTF-8 in which case the HTML5 layer should swallow it
10:11	<hsivonen>	conceptually, that is
10:11	<Hixie>	hsivonen: respond to my e-mail on the subject explaining why it helps users to do that instead of what i proposed :-)
10:11	<hsivonen>	in practice, the BOM sniffing needs to happen on the HTML layer, though...
10:12	<hsivonen>	so one would actually implement "UTF-16" by instantiating a UTF-16BE or a UTF-16LE decoder after swallowing the BOM
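[Editor's sketch] The approach hsivonen outlines, sniffing the BOM first and then instantiating an endianness-specific decoder on the remaining bytes, might be sketched as follows. The function names and the `windows-1252` fallback are assumptions made for this example; they are not taken from the spec or from any parser mentioned in the log.

```python
# Byte order marks and the decoder to instantiate once the BOM is swallowed.
BOMS = (
    (b"\xef\xbb\xbf", "utf-8"),
    (b"\xfe\xff", "utf-16-be"),
    (b"\xff\xfe", "utf-16-le"),
)


def sniff_bom(data):
    """Return (encoding, remaining_bytes); encoding is None if no BOM."""
    for bom, encoding in BOMS:
        if data.startswith(bom):
            return encoding, data[len(bom):]  # swallow the BOM
    return None, data


def decode_with_bom(data, fallback="windows-1252"):
    """Decode bytes, letting a leading BOM pick the decoder.

    The fallback encoding is an assumption for this example only.
    """
    encoding, rest = sniff_bom(data)
    return rest.decode(encoding or fallback)
```

Because the BOM is removed before decoding, the explicit `utf-16-be` / `utf-16-le` decoders never see U+FEFF at the start of the stream, so the question of whether it counts as character data does not arise in this sketch.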
10:12	<Hixie>	x<table> x</table> -- should that render as "xx" like in firefox, or "x x" like in safari?
10:13	<hsivonen>	Hixie: I guess I have to reread what you proposed and test UTF-16BE with initial U+FEFF in browsers
10:13	<annevk>	Firefox seems better
10:13	<annevk>	(because you keep whitespace inside the <table>, which is also what Acid3 requires fwiw...)
10:14	<hsivonen>	after all, it seems to boil down to whether browsers treat an initial U+FEFF in UTF-16BE as non-space character data or not
10:14	<Hixie>	oh there's no doubt that <table> </table> has no foster parenting
10:14	<Hixie>	annevk: (the safari output could be obtained through adoption)
10:14	<annevk>	then i guess I don't care
10:14	Hixie	looks at his data to see if anyone is using utf-16
10:15	<annevk>	<form> parsing is still causing us issues btw
10:15	<hsivonen>	Safari output follows from doing the whitespaceness check on the text node level which kinda makes sense
10:15	<annevk>	nested forms are common enough to warrant special rules it seems... :(
10:15	<Hixie>	annevk: has feedback been sent?
10:16	<annevk>	no, I've no idea what the spec should say
10:16	<annevk>	but probably something that matches WebKit/Firefox
10:17	<Hixie>	i saw UTF-16 explicitly declared on 0.004% of pages
10:18	<Hixie>	annevk: well, i don't even know what the issue is unless i have feedback :-)
10:19	Philip`	wrote a page with some script that DOM-inserts a form into the middle of another form, because he wanted an asynchronous file-upload box in the middle of a normal input form, and it felt quite evil :-(
10:19	<Philip`>	(but I think it works anyway, so that's good enough for me)
10:24	<Hixie>	hmm
10:24	<Hixie>	i wonder if we should do what hsivonen suggests in this e-mail, and basically make the tokeniser only emit strings, not characters
10:25	<Philip`>	How does that work with incremental rendering?
10:25	<Philip`>	(...of pages which are just text)
10:26	<Hixie>	poorly
10:26	<Hixie>	it also works poorly with things like " aaa" which should become " <html><head><head><body>aaa"
10:27	<annevk>	i hope you're still allowed to do incremental rendering?
10:27	<annevk>	by moving parts of text over?
10:28	<zcorpan>	Hixie: it becomes "<html><head><head><body> aaa" in opera, it seems, and i don't think we've run into any trouble because of that
10:30	<Hixie>	yeah, spaces around optional tags are messed up by most browsers
10:30	<Hixie>	i'm trying to fix that
10:30	<annevk>	please don't
10:31	<annevk>	it has already caused us issues
10:31	<Hixie>	i'm not specifying something that screws up the round tripping that badly
10:32	<annevk>	guess we implement html5-delta then :p
10:32	<Hixie>	i don't see why it should break things if we do it right
10:33	<annevk>	it was something about expecting documentElement.firstChild to be <head>
10:33	<zcorpan>	yeah, not ignoring whitespace before head broke pages
10:33	<Hixie>	yeah well opera's parsing of <head> is so fucked up as it is that that wouldn't work anyway :-P
10:33	<annevk>	dude, we fixed that
10:33	<Hixie>	i'll believe that when i see it :-P
10:34	<zcorpan>	what's fucked up?
10:34	<Hixie>	i filed the bug years ago, it was only once i forced the issue with acid3 that i saw any movement there at all
10:34	<Hixie>	i'm not at all convinced that it's been completely fixed
10:34	<zcorpan>	i think we don't get a head for frameset documents, but that's all i know
10:35	<zcorpan>	(i.e. frameset documents without an explicit head)
10:35	<Hixie>	hmm, whatever solution we come up with for "x<table> x</table>" can also work for "x</body> </html> x"
10:36	<zcorpan>	i'd like the latter to be solved by ignoring </body> and </html>, but then i don't care about roundtripping so much
10:36	<zcorpan>	at least not roundtripping of insignificant whitespace
10:36	<annevk>	Hixie, http://my.opera.com/desktopteam/blog/ :)
10:36	<zcorpan>	and placement of comments
10:37	<Hixie>	well, not roundtripping spaces there basically means that you can't put spaces after the </body>.
10:37	<zcorpan>	oh noes :)
10:37	<Hixie>	as in, the syntax is a lie if we say you can have spaces after </body>
10:38	<Hixie>	and i think that's dumb :-)
10:38	<Hixie>	annevk: the last opera build i tried failed to connect to the network half the time, and the one before that crashed on startup :-)
10:38	<Hixie>	not to mention that opera on mac looks ugly as hell :-P
10:47	<Hixie>	jeez this week is going to be insane
10:47	<Hixie>	so many meetings
10:47	<Hixie>	ok bed time
10:47	<Hixie>	nn
11:15	hsivonen	congratulates self for writing the spec log to bugzilla script anyway
11:18	<annevk>	Hixie, just keep trying... anyway, e-mailed the nested forms issue
11:19	<hsivonen>	has anyone tested if TIS-620 needs to become an alias for Windows-874?
11:32	<Philip`>	hsivonen: I see 39 pages (out of 125K) that use charset=tis-620, if that's what you mean
11:37	<hsivonen>	Philip`: I mean: do browsers implement tis-620 as an alias of Windows-874?
11:37	<Philip`>	Ah, okay
11:39	Philip`	sees that the spec diffs don't provide enough context to actually be useful
11:43	<annevk>	how much do you want?
11:45	<Philip`>	Potentially a quite large number of lines
11:45	<Philip`>	which would be inconvenient in the more common cases, which isn't good
11:45	<annevk>	maybe a hidden parameter context ?
11:45	<annevk>	so you could mangle the URI if you need more
11:46	<Philip`>	It'd be nice if instead of "@@ -37190,21 +37190,22 @@ function receiver(e) {" it showed something useful like the most recent parent node id attribute but that's probably hard :-)
11:46	<annevk>	yes
11:47	<annevk>	note that most IDs are auto-generated and those are not shown in web-apps-tracker
11:52	<MikeSmith>	Philip` - please forward that auto-responder message to me at mike⊙wo
11:53	<Philip`>	MikeSmith: Sent
11:53	<MikeSmith>	thanks
11:53	<Philip`>	Hixie: r1305 ("This change does not change the black box behaviour of the spec") does appear to change the behaviour of the spec
11:53	<Philip`>	(unless I've made a mistake)
11:54	<Philip`>	e.g. with the input <!doctype! system""?
11:54	<Philip`>	Expected: [u'ParseError', u'ParseError', u'ParseError', [u'DOCTYPE', u'!', None, u'', False]]
11:54	<Philip`>	Got: [u'ParseError', u'ParseError', u'ParseError', [u'DOCTYPE', u'!', None, u'', True]]
11:54	<Philip`>	where 'Expected' is what my implementation used to give
11:55	<annevk>	are you sure that's not the result of 1306?
11:56	<Philip`>	Oh, good point - it is
11:56	<Philip`>	because I couldn't read the r1305 diff properly, so I was referring to the spec too and got them mixed up :-p
11:56	<Philip`>	Hixie: Please ignore me :-)
12:10	<annevk>	Hixie, 'The "before htmlhtml root element node, which is then added to the stack.' misses a space
12:16	<annevk>	in "before head insertion mode" are the second-to-last and last equal?
12:17	<annevk>	also, the first in the "before head insertion mode" should also be grouped with those
16:34	<gsnedders>	hsivonen: I meant U+FEFF functioning as a BOM
16:35	<hsivonen>	gsnedders: ok. then I think I misunderstood something
16:35	hsivonen	is bitten by the Python variable visibility rules again :-(
16:37	Philip`	thinks the visibility thing is an oddly inelegant part of the language
16:50	gsnedders	thinks the ambiguous ampersand is confusing
16:56	Philip`	unexpectedly realises that "Prince" sounds like "prints", and that that possibly wasn't a coincidence in the software name
16:58	<hsivonen>	annevk: would a stack of form pointers work for the nested form case or is the issue more complex?
16:59	<annevk>	did you see my e-mail?
17:00	<annevk>	you basically want some kind of scoping where </form> can't get through
17:00	<annevk>	and you probably don't want to set the form pointer to null either
17:00	<hsivonen>	annevk: I saw the email. is scope a form pointer stack or something else?
17:00	<annevk>	<form><div></form><input></div></form> the <input> is still associated with the form
17:01	<hsivonen>	eww.
17:01	<annevk>	hsivonen, it's a block level element
17:01	<Philip`>	In <form><div><form><input>..., which form is the input associated with?
17:01	<annevk>	<center>, <blockquote>, <h1> - <h6>, <div>, etc.
17:01	<annevk>	Philip`, the second <form> gets ignored because the form pointer is already associated with something
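[Editor's sketch] The two behaviours annevk describes (a nested <form> start tag is ignored while the form pointer is set, and a </form> cannot "get through" an open block-level element) can be modelled with a toy token walker. This is not the spec's algorithm: the token format, the function name, and the simplified scoping rule (</form> only takes effect when the form is the current node) are all assumptions chosen to reproduce the two examples above.

```python
def associate_inputs(tokens):
    """Return, for each <input>, the id of the form it is associated with.

    tokens is a list of (kind, name) pairs, kind being "start" or "end".
    Forms are numbered 1, 2, ... in the order their start tags are accepted.
    """
    stack = []            # open elements as (name, form_id_or_None)
    form_pointer = None   # id of the form currently pointed at
    next_id = 1
    result = []
    for kind, name in tokens:
        if kind == "start" and name == "form":
            if form_pointer is not None:
                continue  # nested <form> start tag is ignored
            form_pointer = next_id
            stack.append(("form", next_id))
            next_id += 1
        elif kind == "start" and name == "input":
            result.append(form_pointer)  # void element, never pushed
        elif kind == "start":
            stack.append((name, None))
        elif kind == "end" and name == "form":
            # Assumed rule: </form> is ignored unless the form is the
            # current node, so it cannot close a form through a <div>.
            if stack and stack[-1][0] == "form":
                stack.pop()
                form_pointer = None
        elif kind == "end":
            if stack and stack[-1][0] == name:
                stack.pop()
    return result
```

Feeding it the tokens of `<form><div></form><input></div></form>` associates the input with the first form, and `<form><div><form><input>` does the same because the second start tag is dropped.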
17:04	<annevk>	Philip`, are you revising tests for all HTML5 spec changes?
17:05	<hsivonen>	more cases where legacy encoding labels are de facto aliases for newer encodings keep creeping out of the woodwork
17:06	<annevk>	i saw that in #webkit and asked him to mail whatwg⊙wo, i'm glad he did :)
17:06	<Philip`>	annevk: Only for the tokeniser
17:07	<Philip`>	(and I can't guarantee I haven't missed any changes)
17:08	<annevk>	hopefully enough fresh implementations keep coming to sort out all the mistakes...
17:08	<annevk>	or test contributions for that matter
17:09	<hsivonen>	aside: great Python on JVM news: http://fwierzbicki.blogspot.com/2008/02/jythons-future-looking-sunny.html
17:09	<Philip`>	I've been updating the Python html5lib to follow the spec, but gave up on the Ruby one after finding that it already had some non-trivial bugs (plus a trivial bug that hid all the others)
17:10	<Philip`>	Well, at least it had one non-trivial bug
17:10	<Philip`>	and maybe it was actually trivial, but I didn't try looking because I worried it might not be
17:11	<annevk>	did you leave the bug exposed?
17:11	<annevk>	if you did someone else will prolly fix it
17:11	<Philip`>	(this being the <x y="&notit"> case or something like that)
17:11	<Philip`>	annevk: Yes, it'll fail if someone runs the tokeniser tests
17:11	<annevk>	ah, that was annoying to fix on the python side too...
17:12	<annevk>	and the python side was actually hiding some bugs there too, i remember, hmm
17:12	<annevk>	since ruby was a port, i guess that's what went wrong...
17:12	<Philip`>	It seems easy to handle all the entities just with a regexp
17:12	<Philip`>	(plus another regexp for entities in attributes)
17:13	<Philip`>	though that's not so good if your input is a stream rather than a string
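[Editor's sketch] A regexp-based approach of the kind Philip` mentions might look like the following for whole strings. It is deliberately partial and is not html5lib's code: it only handles semicolon-terminated references, skips the legacy semicolon-less names (the `&notit` case), skips the separate attribute rules, and uses Python's `html.entities.name2codepoint` table rather than the spec's entity list.

```python
import re
from html.entities import name2codepoint

# Decimal, hexadecimal, or named reference, each ending in ";".
CHAR_REF = re.compile(
    r"&(?:#([0-9]+)|#[xX]([0-9a-fA-F]+)|([A-Za-z][A-Za-z0-9]*));"
)


def expand_char_refs(text):
    """Replace recognised character references; leave the rest untouched."""
    def replace(match):
        dec, hexa, name = match.groups()
        if dec or hexa:
            codepoint = int(dec) if dec else int(hexa, 16)
            if codepoint > 0x10FFFF:
                return match.group(0)  # out of Unicode range: keep as-is
            return chr(codepoint)
        if name in name2codepoint:
            return chr(name2codepoint[name])
        return match.group(0)  # unknown name: keep as-is
    return CHAR_REF.sub(replace, text)
```

As noted in the log, this only works when the whole input is available as a string; a streaming tokeniser cannot run a regexp over text it has not received yet.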
19:19	<Philip`>	hsivonen_: The obvious question is, what were fmt=0 up to fmt=5?
19:50	<Hixie>	Krzysztof Żelechowski's e-mails read like poetry
19:51	<Hixie>	or haikus
19:51	<Hixie>	e.g.:
19:51	<Hixie>	---
19:51	<Hixie>	I am not sure I understand you correctly
19:51	<Hixie>	but if this introduces the ability
19:51	<Hixie>	to make the user agent
19:51	<Hixie>	report a different URL than the effective target,
19:51	<gsnedders>	I should try sending an email to public-html or whatwg that's a pantoum sometime, just to see if someone notices
19:51	<Hixie>	it is going to be a sweet candy for phishers.
19:51	<Hixie>	(Newer browsers made this effect unavailable to scripts).
19:51	<Hixie>	---
19:51	<gsnedders>	actually, a pantoum is too obvious. too much repetition.
19:51	<gsnedders>	Maybe a sonnet?
20:27	<annevk>	gsnedders, it sets it from missing to the empty string...
20:27	<gsnedders>	annevk: where?
20:27	<annevk>	gsnedders, if you don't see that you're reading too much into it
20:27	<annevk>	gsnedders, I quoted those bits in my e-mail...
20:27	<gsnedders>	I searched the entire spec for "missing"
20:27	<annevk>	missing is not mentioned again
20:28	<annevk>	it does not need to be
20:28	<gsnedders>	annevk: I'm an asshole, per markp's definition. what do you expect? :P
20:29	<gsnedders>	annevk: if it is marked as missing, how does it not need to be marked as not missing?
20:29	annevk	shrugs
20:29	<gsnedders>	From what I can see in the spec, it is marked as missing, and is never marked as _not_ missing
20:30	<gsnedders>	it needs to be set in the "Before DOCTYPE public identifier state" and the "Before DOCTYPE system identifier state"
20:53	<annevk>	"public identifier, and system identifier must be marked as missing (which is a distinct state from the empty string)"
20:54	<annevk>	then it says "Set the DOCTYPE token's system identifier to the empty string"
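[Editor's sketch] annevk's reading, that "missing" is simply a state distinct from the empty string and that setting the identifier to the empty string is what ends it, maps naturally onto `None` versus `""`. The class and function names here are hypothetical; the representation matches the `None` / `u''` values in the tokeniser test output quoted earlier in the log.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DoctypeToken:
    """Toy DOCTYPE token; None stands for "missing"."""
    name: Optional[str] = None
    public_id: Optional[str] = None
    system_id: Optional[str] = None


def begin_system_identifier(token):
    """Model "set the DOCTYPE token's system identifier to the empty string".

    After this call the identifier is no longer missing, even though it
    is still empty; no separate "not missing" marker is needed.
    """
    token.system_id = ""
```

Under this model there is no flag to flip back: assigning any string, including the empty one, is the transition out of the missing state.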
21:19	<Philip`>	Hmm, ICU4J's gb2312 seems to act identically to iso-8859-1
21:19	<Philip`>	(gb2312-1980 works like it should, though)
21:21	<annevk>	i think i'll add window.scroll/scrollTo/scrollBy to CSSOM View
21:22	<Philip`>	Oops, I'm wrong
21:24	<Philip`>	gb2312 does seem to work, but values outside the special range (about 0xA0-0xFF) are treated like in iso-8859-1
21:25	<Philip`>	and gb2312-1980 is something totally different and not ASCII compatible
21:26	<annevk>	after all, it has scroll*
21:28	<Hixie>	ok so i parsed over 6 billion files with the parser change to the insertion mode thing (testing that it never hit a table-related element) and it didn't crash
21:29	<Hixie>	so i figure it's safe
21:31	<gsnedders>	annevk: that to me doesn't mean actually changing the marker
21:33	<Philip`>	Exception in thread "pool-1-thread-1201" java.lang.NoClassDefFoundError: Could not initialize class nu.validator.htmlparser.impl.EncodingInfo
21:33	<Philip`>	Hmm...
21:34	<hsivonen_>	Philip`: did you update ICU4J without updating the parser, too?
21:34	<Philip`>	I'm not sure what version of the parser I'm using
21:34	<hsivonen_>	Philip`: the new ICU4J has a b0rked UTF-7 decoder that crashes the old EncodingInfo
21:34	<Philip`>	so that's quite possible
21:35	<hsivonen_>	or, rather, EncodingInfo crashes the UTF-7 decoder
21:36	<annevk>	gsnedders, there's no marker, there's just states
21:36	<Philip`>	Things work better when I use a more recently compiled version of the parser - thanks :-)
21:37	<gsnedders>	annevk: "marked as missing" — a marker.
21:38	<annevk>	"must be marked as missing (which is a distinct state from the empty string)" -- a state
21:38	<annevk>	...
21:38	annevk	-> back to cssom-view
21:38	<gsnedders>	so the marker is a state. ergh.
21:40	<Philip`>	I see reported encoding errors on 1.5% of pages with reported encodings
21:41	<Philip`>	(counting things like charset=ISO-8559-1 as an error)
21:41	<Philip`>	(but unknown charsets are only about a quarter of the errors)
21:43	<Philip`>	(18% of gb2312 pages have errors)
21:47	<Philip`>	http://www.narkasabasi.com/v2/ <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-9"> <meta http-equiv="content-Type" content="text/html; charset=windows-1254" /> <meta http-equiv="content-type" content-type content="text/html"; charset= "x-mac-turkish">
21:47	<Philip`>	(Urgh - s/ /\n/)
21:47	<Philip`>	Looks like they couldn't quite make their mind up
21:49	<Philip`>	<meta content="http://schemas.microsoft.com/intellisense/ie5"; name="vs_targetSchema" charset="utf-8"> - that's not good if it gets incorrectly interpreted as the page's charset
21:50	jgraham	wonders why dave hodder is asking the same question on public-html-comments that I already answered on the whatwg list
21:50	<Philip`>	http://jellybelly.com/International/Japanese/home.html is an interesting test case
21:50	<Philip`>	(Opera 9.2 fails)
21:51	<Philip`>	(Opera 9.5 fails too, though it gets the layout correct)
21:51	<SadEagle>	Philip`: check out the JS in there, too
21:51	<annevk>	jgraham, I think he's asking for a pointer to that e-mail again, he apparently forgot :)
21:51	<Philip`>	(Safari 3 fails too)
21:52	jgraham	goes to answer again
21:52	<Philip`>	SadEagle: What's unusual about the JS?
21:53	<SadEagle>	nothing unusual --- but it's sniffing for netscape >= 3, ie >= 4
21:54	<Philip`>	Ah
21:55	<Philip`>	That should degrade gracefully in other browsers, so it's not much of a problem :-)
21:58	<annevk>	oh, Acid3 is announced
22:00	Philip`	sees 88582 pages with <meta http-equiv="content-type" content="text/html; charset=...">, 25175 with an HTTP Content-Type: text/html; charset=..., and 171 with <meta charset="...">
22:00	<Philip`>	(out of 130K)
22:01	<annevk>	how many pages with content="" and without a valid value for http-equiv="" ?
22:01	<Philip`>	What's a "valid value"?
22:02	<annevk>	content-type ascii case-insensitive
22:02	<Philip`>	What about all the other http-equiv values?
22:02	<annevk>	where content contains the word charset
22:03	<annevk>	content=""*
22:03	<Philip`>	Hmm, I suppose I could look for that
22:04	<Philip`>	but I won't bother doing that now, since it takes 20 minutes to run
22:04	<annevk>	k
22:21	<SadEagle>	Philip`: LOL, just stumbled on a webpage with 2 encoding headers, neither of which is right
23:15	<jgraham>	Hixie: In the outline algorithm, in the condition "When exiting a sectioning content element, if the stack is not empty"
23:15	<Hixie>	yes?
23:15	<jgraham>	it's not clear to me what "Let current section be the last section in the outline of the current outlinee element.
23:15	<jgraham>	Insert its outline at the end of the current section. (This does not change which section is the last section in the outline.)" means
23:16	<jgraham>	Specifically the last line.
23:16	<Hixie>	oops
23:16	<Hixie>	"its" refers to the sectioning content element being exited
23:16	<Hixie>	let me fix that
23:21	<Hixie>	ok fixed
23:21	<Hixie>	is that clearer?
23:25	<jgraham>	I think that helps. I just need to work through and see if I have a sensible mental model of what's happening
23:25	<Hixie>	k
23:25	<Hixie>	i know you will, but, let me know what i can do to improve it
23:26	<Hixie>	the current algorithm should be way better than what was there before, yet get mostly the same results
23:28	<annevk>	Hixie, you're still going through parser feedback right?
23:28	<Hixie>	yes
23:28	<Hixie>	i'm stalled right now trying to figure out how to handle spaces in <table> x </table>
23:29	<annevk>	flip a coin :)
23:30	<Hixie>	between what and what? i have no options so far :-)
23:30	<annevk>	between " x <table></table>" and "x<table> </table>"
23:30	<Philip`>	annevk: Do you mean the algorithm for handling spaces should involve flipping a coin?
23:31	<annevk>	Philip`, that could be interesting, but requiring specific hardware for HTML 5 might be too much
23:32	jgraham	suggests asking the user to decide
23:33	<Philip`>	annevk: Any implementation is acceptable as long as it acts the same as flipping a coin
23:33	<Hixie>	annevk: oh i've decided it's "x <table> </table>", the question is how to get there.
23:33	<jgraham>	"You have encountered a space inside a table. Would you like to move it outside (Y/n)"
23:34	<Philip`>	There should be a web service that provides a stream of random bits, called Flipr
23:34	<Hixie>	i'm thinking a flag on the table that decides whether spaces are sent out or not
23:34	<Hixie>	that gets set as soon as you send anything out
23:34	<Hixie>	the problem is nested tables in the innerHTML case makes this relatively hard to specify
23:34	<annevk>	so i think the current spec covers it
23:34	<annevk>	because you consume characters until the end
23:34	<annevk>	which makes "x " a character block
23:35	<annevk>	and the space before it gets treated specially
23:35	<Hixie>	the current spec has no concept of "character blocks" :-)
23:35	<annevk>	the append the character stuff
23:35	<Hixie>	but e.g. "<table> x<span></span> </table>" should become "x<span></span> <table> </table>"
23:36	<Hixie>	so it's not that simply
23:36	<Hixie>	simple
23:36	<annevk>	for real?
23:36	annevk	didn't think that trailing space would be placed in front too
23:37	<Hixie>	"<table> foo<span></span> bar</table>" shouldn't display "foobar"
23:37	<Hixie>	it should display "foo bar"
23:37	<Hixie>	that's the bug :-)
23:38	<annevk>	a flag sounds easiest then, yes
23:49	<jgraham>	Hixie: I think I'm still confused about what it means to insert a section "at the end of" another section. Do you mean "as the last child section of" the other section?
23:50	<jgraham>	(so if you have <body><section><h1>foo you end up with a section for the <section> as a child of the section for the <body>
23:50	<Hixie>	in "When entering a heading content element" i used the term "append it to /candidate section/"
23:50	<Hixie>	is that clearer?
23:50	<annevk>	sounds like appendChild()...
23:52	<jgraham>	Yeah, that bit is clearer.
23:52	<Hixie>	ok i'll use that terminology
23:52	Hixie	regens