02:12
<Hixie>
so my girlfriend looks over my shoulder at the changes i just made, and without knowing i had just added them, points to the three sections i added or renamed today and laughs at me for naming them that
02:12
<Hixie>
maybe i'm having a bad naming day or something
02:23
<Lachy>
Hixie, which section names were they?
02:24
<Hixie>
the "after after body insertion mode", the "after after frameset insertion mode", and the "unexpected end" (with the rules for when you stop parsing "with prejudice")
02:45
<Hixie>
god, i can't wait for gsnedders' preprocessor
03:02
<mpt>
after after?
03:05
<jwalden>
after post-* perhaps if you care enough about the wording
03:14
<Lachy>
does anyone know if Firefox 3 has implemented support for any microformats yet? If so, what have they got, or did they drop that idea?
03:19
<jwalden>
I think it's mostly extensionland
03:31
<Lachy>
yeah, I'm aware of the extensions, but there were announcements around the beginning of last year that FF3 was going to add native support for them
04:14
<jwalden>
I think they never had a developer willing and capable, time-wise, of doing so
04:14
<jwalden>
but I didn't pay too much attention
05:14
<Hixie>
Philip`: yt?
08:15
<zcorpan>
Hixie: re gamespy.com, opera does use quirks mode, apparently; i was probably getting confused myself by all the testing back and forth
08:16
<Hixie>
were the changes i made still good?
08:16
<zcorpan>
i think so
08:17
<zcorpan>
haven't looked at the changelog yet, but i trust that you did what you said you did :)
08:17
<Hixie>
please don't :-)
08:17
<zcorpan>
i have some more doctype feedback coming soon
08:17
<Hixie>
with these changes i expect i've made all kinds of errors
08:17
<Hixie>
cool
08:18
<zcorpan>
i think we need to ignore the trailing "EN" (or whatever it might be) in the FPI
08:19
<zcorpan>
firefox and safari don't work with a number of pages because of that
08:19
<zcorpan>
but opera and ie do
08:21
<zcorpan>
looking at about 60 pages that have something other than //EN at the end (and wouldn't trigger standards mode anyway), 34 look good only in quirks mode, 1 looks good only in standards mode (in opera/firefox), and 1 looks good only in standards mode in opera but the same in quirks or standards in firefox
08:21
<zcorpan>
and the rest looked pretty much the same in either mode
08:30
<Hixie>
if you haven't already, send mail saying how you propose to check for that in the spec
08:31
<zcorpan>
i'm about to send it in a minute. just check that the FPI *starts* with e.g. "-//w3c//dtd html 3.2//"
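(Editorial aside: the prefix check zcorpan describes could look roughly like the sketch below. The function name and the one-entry prefix tuple are hypothetical; only the "-//w3c//dtd html 3.2//" prefix comes from the discussion, and a real list would be longer.)

```python
# Hypothetical prefix list; only this one entry is taken from the log.
QUIRKS_FPI_PREFIXES = (
    "-//w3c//dtd html 3.2//",
)


def fpi_triggers_quirks(public_id):
    """Return True if the public identifier starts with a listed prefix.

    The comparison lowercases the identifier and ignores everything after
    the prefix, so a trailing "EN" (or any other language code) is irrelevant.
    """
    if public_id is None:
        # A doctype without a public identifier never matches.
        return False
    return public_id.lower().startswith(QUIRKS_FPI_PREFIXES)
```

With this approach "-//W3C//DTD HTML 3.2//EN" and "-//W3C//DTD HTML 3.2//DE" are treated the same way, which is the point of matching on the start of the FPI only.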
08:32
<hsivonen>
Hixie: should I expect large volumes of conformance checker-relevant spec changes in the near term?
08:32
<Hixie>
hsivonen: i'm currently going through the tree construction feedback
08:32
<hsivonen>
Hixie: I take that as a "yes" :-)
08:33
<Hixie>
well, i'm almost done
08:33
<Hixie>
and after that i'm out of things to do again
08:33
<Hixie>
so. maybe not "large" volumes :-)
08:33
<hsivonen>
ok. perhaps then I should just file bugs for the changes manually and not write a script
08:37
<hsivonen>
hmm. looks like I'm going to need the script anyway.
08:41
<Hixie>
script?
08:43
<hsivonen>
Hixie: a script for mining the spec svn log for conformance checker and tools-relevant changes and filing bugs automatically so that I don't miss changes
08:43
<hsivonen>
at this point, doing a vgrep on the spec and parser source isn't such a great idea
08:43
<hsivonen>
or vdiff, rather
08:46
<hsivonen>
hmm. I see a form field called "source" handled in web-apps-tracker, but I don't see a field named that way in the form
08:47
<zcorpan>
hsivonen: another hidden feature? :)
08:47
<Hixie>
anything with "c" (or is it "v"? one or the other) in the "affected" part of the checkin comments will give you the checkins that i think affect you
08:48
<hsivonen>
Hixie: yeah. now I need to figure out how to modify web-apps-tracker to post to Bugzilla
08:48
<Hixie>
heh
08:48
<Hixie>
good luck with that
08:49
<hsivonen>
umm. is there a reason to expect that luck is needed when posting to Bugzilla_
08:49
<hsivonen>
?
08:54
<Hixie>
well your script will need to deal with cookies
08:55
<Hixie>
which always makes things exciting in my experience
08:56
<hsivonen>
isn't the login cookie a constant that can be hard-coded once sniffed from a browser session?
09:04
<Hixie>
not with bugzilla
09:04
<Hixie>
it is ip-address locked, iirc
09:07
<zcorpan>
hsivonen: see http://www.sitepoint.com/forums/showthread.php?p=3739972#post3738863 and onward
09:08
<zcorpan>
Hixie: likewise, perhaps accesskey should be made conforming... :)
09:08
<Hixie>
there's a whole folder on accesskey
09:08
<Hixie>
we need a solution
09:08
<Hixie>
we don't have one
09:08
<Hixie>
at least last i checked :-)
09:09
<zcorpan>
flagging it as an error isn't helping authors, it seems
09:19
<hsivonen>
zcorpan: I'm still trying to shun doctype sniffing on the XML side
09:19
<zcorpan>
hsivonen: yeah
09:20
<hsivonen>
(for the reasons stated in http://hsivonen.iki.fi/doctype/#xml )
09:20
<Hixie>
zcorpan: i agree; i haven't worked out what we should do yet
09:20
<hsivonen>
zcorpan: and the project that became Validator.nu started specifically as a non-DTD validator
09:21
<zcorpan>
hsivonen: don't tell me :)
09:21
<Philip`>
Hixie: Yes
09:21
<hsivonen>
:-)
09:22
<Hixie>
Philip`: no idea what i wanted to ask you anymore, sorry. it was probably about one of the e-mails you sent, in which case the question will be in my e-mail reply now.
09:22
<Philip`>
Oh, okay :-)
09:23
<hsivonen>
zcorpan: anyway, I don't find actionable feedback that doesn't run counter to central design decisions or that doesn't belong on Hixie's plate instead (accesskey)
09:23
<zcorpan>
hsivonen: what about the accept header?
09:26
<hsivonen>
zcorpan: My thinking is that XHTML+MathML+SVG+RDF has a larger feature set than HTML only, so the former should be preferred
09:26
<hsivonen>
besides, people who do conneg usually want everything but IE to see the non-text/html version
09:27
<hsivonen>
moreover, the generic UI has a manual override anyway
09:28
<zcorpan>
makes sense, i guess tommy's case is a bit uncommon in using xhtml but wanting html
09:28
<zcorpan>
hsivonen: can i send GET parameters along with Opera's validate feature now?
09:29
<zcorpan>
(or remember settings some other way?)
09:32
<hsivonen>
zcorpan: what kind of GET parameters? Opera does a POST, doesn't it?
09:32
<zcorpan>
hsivonen: uh, make that GET-like parameters along with a POST request
09:33
<zcorpan>
that is, typing in http://validator.nu/?parser=html in opera:config
09:33
<hsivonen>
zcorpan: they are supported if those fields come before the form field that contains the document
09:33
<hsivonen>
oh
09:33
<hsivonen>
that's not supported
09:33
<Hixie>
me gets an e-mail from someone saying that the acid3 page on wikipedia is unfair to opera because opera can't show its daily progress on nightly builds
09:33
<Hixie>
i wonder if i should point out that it was opera people who _created_ that page...
09:34
<Philip`>
It seems no more unfair to Opera than it is to IE
09:35
<hsivonen>
zcorpan: filed http://bugzilla.validator.nu/show_bug.cgi?id=77
09:35
<Hixie>
Philip`: well they also said it was unfair to IE
09:36
<zcorpan>
hsivonen: thanks
09:41
<Hixie>
hsivonen: so you think <table><p><i></table> should be only one error, not two? (wrong place for the <p>, missing </i>)
09:43
<Hixie>
i guess <table><p><i><tr> should arguably be not fewer errors than <table><p><i></table>
09:43
<Hixie>
and right now it is
09:47
<hsivonen>
Hixie: what have I said?
09:48
<hsivonen>
Hixie: two errors in that case seems reasonable on surface
09:48
<Hixie>
i've made it one error
09:48
<Hixie>
(foster parenting)
09:48
<hsivonen>
Hixie: well, that's sufficient for finding the document as a whole to be in error
09:48
<Hixie>
and made <table><p><tr> be one error too (it used to be two errors)
09:48
<Hixie>
yeah
09:49
<Hixie>
the next commit is this change
09:50
<Hixie>
man, i keep having to look at what the subject line of these mails was to work out what section they're talking about
09:52
<annevk>
hmm, i guess I can duplicate the subject line in the body somehow from now on
09:52
<hsivonen>
Hixie: I'm having trouble parsing that. Do you mean all the relevant bits should be in the email body from now on?
09:53
<Hixie>
pretend that i never look at the subject line
09:53
<Hixie>
i read the e-mail bodies to work out where to file the e-mails, and when i reply to them i just reply to a concatenated stream of bodies
09:54
<Hixie>
so really i never see the subject lines
09:54
<hsivonen>
Hixie: ok
09:54
<Hixie>
thanks :-)
09:54
<annevk>
can't you concatenate the subject too btw?
09:54
<Hixie>
(it's no biggie, though)
09:55
<Hixie>
annevk: i use pine, and pine doesn't do that
09:55
<hsivonen>
(aside: the bugzilla login cookie seems to be a constant)
09:55
<Hixie>
really? not ip-locked?
09:55
<Hixie>
i wonder why i keep getting logged out then
09:55
<hsivonen>
Hixie: I mean constant across requests
09:55
<Hixie>
ah ok
09:55
<annevk>
you can make it ip-locked if you want
09:56
<annevk>
at least, last time I checked that was optional
09:56
<Hixie>
yeah but that check box seems to not work across browsers, or something
09:56
<Hixie>
maybe it's locked to browser+ip or something
09:56
<Hixie>
i dunno
09:56
<gavin_>
the checkbox doesn't remove all IP restrictions, afaik
09:56
<Hixie>
i do know i keep having to log in to the various bugzilla instances i use
09:56
<gavin_>
it only makes them looser
09:56
<Hixie>
ah
09:56
<Hixie>
well it's annoying as hell
09:57
<annevk>
maybe it's better to have a private pc versus public pc option...
10:00
<Philip`>
Maybe they do that since it's easy to make an attachment file that steals people's cookies?
10:01
<Philip`>
but then they could just use HttpOnly cookies instead
10:06
<hsivonen>
Hixie: do I understand correctly that when you say "BOM", you mean U+FEFF. but when the Unicode folks say "BOM", they mean U+FEFF that indeed functions as a BOM?
10:07
<Hixie>
i rarely say "BOM" alone
10:07
<Hixie>
i usually say "U+FEFF BYTE ORDER MARK character" or some such
10:09
<hsivonen>
oh. Martin Dürst was quoting gsnedders, not you
10:09
<hsivonen>
anyway, I think I agree with Martin's point if I understood it correctly
10:09
<zcorpan>
speaking of BOMs, i noted a while ago that ES4 strips/ignores *all* U+FEFFs
10:10
<zcorpan>
because it was needed for web compat
10:10
<hsivonen>
that is, I think BOM swallowing should be left on the encoding layer except in the case of UTF-8 in which case the HTML5 layer should swallow it
10:11
<hsivonen>
conceptually, that is
10:11
<Hixie>
hsivonen: respond to my e-mail on the subject explaining why it helps users to do that instead of what i proposed :-)
10:11
<hsivonen>
in practice, the BOM sniffing needs to happen on the HTML layer, though...
10:12
<hsivonen>
so one would actually implement "UTF-16" by instantiating a UTF-16BE or a UTF-16LE decoder after swallowing the BOM
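(Editorial aside: a minimal sketch of what hsivonen describes, namely sniffing the BOM first and then instantiating an endian-specific decoder. The function name is hypothetical, and the little-endian fallback for BOM-less input is an assumption of this sketch, not something decided in the discussion.)

```python
import codecs


def decode_utf16(data):
    """Decode bytes labelled "UTF-16" by sniffing and swallowing the BOM."""
    if data.startswith(codecs.BOM_UTF16_BE):
        # Skip the two BOM bytes, then decode the rest as big-endian.
        return data[2:].decode("utf-16-be")
    if data.startswith(codecs.BOM_UTF16_LE):
        # Skip the two BOM bytes, then decode the rest as little-endian.
        return data[2:].decode("utf-16-le")
    # No BOM: assumed default for this sketch only.
    return data.decode("utf-16-le")
```

Note that Python's own "utf-16-be" decoder keeps an initial U+FEFF as a character, which is one possible answer to the open question of how an explicitly labelled UTF-16BE stream with a leading U+FEFF should be treated.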
10:12
<Hixie>
x<table> x</table> -- should that render as "xx" like in firefox, or "x x" like in safari?
10:13
<hsivonen>
Hixie: I guess I have to reread what you proposed and test UTF-16BE with initial U+FEFF in browsers
10:13
<annevk>
Firefox seems better
10:13
<annevk>
(because you keep whitespace inside the <table>, which is also what Acid3 requires fwiw...)
10:14
<hsivonen>
after all, it seems to boil down to whether browsers treat an initial U+FEFF in UTF-16BE as non-space character data or not
10:14
<Hixie>
oh there's no doubt that <table> </table> has no foster parenting
10:14
<Hixie>
annevk: (the safari output could be obtained through adoption)
10:14
<annevk>
then i guess I don't care
10:14
Hixie
looks at his data to see if anyone is using utf-16
10:15
<annevk>
<form> parsing is still causing us issues btw
10:15
<hsivonen>
Safari output follows from doing the whitespaceness check on the text node level which kinda makes sense
10:15
<annevk>
nested forms are common enough to warrant special rules it seems... :(
10:15
<Hixie>
annevk: has feedback been sent?
10:16
<annevk>
no, I've no idea what the spec should say
10:16
<annevk>
but probably something that matches WebKit/Firefox
10:17
<Hixie>
i saw UTF-16 explicitly declared on 0.004% of pages
10:18
<Hixie>
annevk: well, i don't even know what the issue is unless i have feedback :-)
10:19
Philip`
wrote a page with some script that DOM-inserts a form into the middle of another form, because he wanted an asynchronous file-upload box in the middle of a normal input form, and it felt quite evil :-(
10:19
<Philip`>
(but I think it works anyway, so that's good enough for me)
10:24
<Hixie>
hmm
10:24
<Hixie>
i wonder if we should do what hsivonen suggests in this e-mail, and basically make the tokeniser only emit strings, not characters
10:25
<Philip`>
How does that work with incremental rendering?
10:25
<Philip`>
(...of pages which are just text)
10:26
<Hixie>
poorly
10:26
<Hixie>
it also works poorly with things like " aaa" which should become " <html><head><head><body>aaa"
10:27
<annevk>
i hope you're still allowed to do incremental rendering?
10:27
<annevk>
by moving parts of text over?
10:28
<zcorpan>
Hixie: it becomes "<html><head><head><body> aaa" in opera, it seems, and i don't think we've run into any trouble because of that
10:30
<Hixie>
yeah, spaces around optional tags are messed up by most browsers
10:30
<Hixie>
i'm trying to fix that
10:30
<annevk>
please don't
10:31
<annevk>
it has already caused us issues
10:31
<Hixie>
i'm not specifying something that screws up the round tripping that badly
10:32
<annevk>
guess we implement html5-delta then :p
10:32
<Hixie>
i don't see why it should break things if we do it right
10:33
<annevk>
it was something about expecting documentElement.firstChild to be <head>
10:33
<zcorpan>
yeah, not ignoring whitespace before head broke pages
10:33
<Hixie>
yeah well opera's parsing of <head> is so fucked up as it is that that wouldn't work anyway :-P
10:33
<annevk>
dude, we fixed that
10:33
<Hixie>
i'll believe that when i see it :-P
10:34
<zcorpan>
what's fucked up?
10:34
<Hixie>
i filed the bug years ago, it was only once i forced the issue with acid3 that i saw any movement there at all
10:34
<Hixie>
i'm not at all convinced that it's been completely fixed
10:34
<zcorpan>
i think we don't get a head for frameset documents, but that's all i know
10:35
<zcorpan>
(i.e. frameset documents without an explicit head)
10:35
<Hixie>
hmm, whatever solution we come up with for "x<table> x</table>" can also work for "x</body> </html> x"
10:36
<zcorpan>
i'd like the latter to be solved by ignoring </body> and </html>, but then i don't care about roundtripping so much
10:36
<zcorpan>
at least not roundtripping of insignificant whitespace
10:36
<annevk>
Hixie, http://my.opera.com/desktopteam/blog/ :)
10:36
<zcorpan>
and placement of comments
10:37
<Hixie>
well, not roundtripping spaces there basically means that you can't put spaces after the </body>.
10:37
<zcorpan>
oh noes :)
10:37
<Hixie>
as in, the syntax is a lie if we say you can have spaces after </body>
10:38
<Hixie>
and i think that's dumb :-)
10:38
<Hixie>
annevk: the last opera build i tried failed to connect to the network half the time, and the one before that crashed on startup :-)
10:38
<Hixie>
not to mention that opera on mac looks ugly as hell :-P
10:47
<Hixie>
jeez this week is going to be insane
10:47
<Hixie>
so many meetings
10:47
<Hixie>
ok bed time
10:47
<Hixie>
nn
11:15
hsivonen
congratulates self for writing the spec log to bugzilla script anyway
11:18
<annevk>
Hixie, just keep trying... anyway, e-mailed the nested forms issue
11:19
<hsivonen>
has anyone tested if TIS-620 needs to become an alias for Windows-874?
11:32
<Philip`>
hsivonen: I see 39 pages (out of 125K) that use charset=tis-620, if that's what you mean
11:37
<hsivonen>
Philip`: I mean: do browsers implement tis-620 as an alias of Windows-874?
11:37
<Philip`>
Ah, okay
11:39
Philip`
sees that the spec diffs don't provide enough context to actually be useful
11:43
<annevk>
how much do you want?
11:45
<Philip`>
Potentially a quite large number of lines
11:45
<Philip`>
which would be inconvenient in the more common cases, which isn't good
11:45
<annevk>
maybe a hidden parameter context ?
11:45
<annevk>
so you could mangle the URI if you need more
11:46
<Philip`>
It'd be nice if instead of "@@ -37190,21 +37190,22 @@ function receiver(e) {" it showed something useful like the most recent parent node id attribute but that's probably hard :-)
11:46
<annevk>
yes
11:47
<annevk>
note that most IDs are auto-generated and those are not shown in web-apps-tracker
11:52
<MikeSmith>
Philip` - please forward that auto-responder message to me at mike⊙wo
11:53
<Philip`>
MikeSmith: Sent
11:53
<MikeSmith>
thanks
11:53
<Philip`>
Hixie: r1305 ("This change does not change the black box behaviour of the spec") does appear to change the behaviour of the spec
11:53
<Philip`>
(unless I've made a mistake)
11:54
<Philip`>
e.g. with the input <!doctype! system""?
11:54
<Philip`>
Expected: [u'ParseError', u'ParseError', u'ParseError', [u'DOCTYPE', u'!', None, u'', False]]
11:54
<Philip`>
Got: [u'ParseError', u'ParseError', u'ParseError', [u'DOCTYPE', u'!', None, u'', True]]
11:54
<Philip`>
where 'Expected' is what my implementation used to give
11:55
<annevk>
are you sure that's not the result of 1306?
11:56
<Philip`>
Oh, good point - it is
11:56
<Philip`>
because I couldn't read the r1305 diff properly, so I was referring to the spec too and got them mixed up :-p
11:56
<Philip`>
Hixie: Please ignore me :-)
12:10
<annevk>
Hixie, 'The "before htmlhtml root element node, which is then added to the stack.' misses a space
12:16
<annevk>
in "before head insertion mode" are the second-to-last and last equal?
12:17
<annevk>
also, the first in the "before head insertion mode" should also be grouped with those
16:34
<gsnedders>
hsivonen: I meant U+FEFF functioning as a BOM
16:35
<hsivonen>
gsnedders: ok. then I think I misunderstood something
16:35
hsivonen
is bitten by the Python variable visibility rules again :-(
16:37
Philip`
thinks the visibility thing is an oddly inelegant part of the language
16:50
gsnedders
thinks the ambiguous ampersand is confusing
16:56
Philip`
unexpectedly realises that "Prince" sounds like "prints", and that that possibly wasn't a coincidence in the software name
16:58
<hsivonen>
annevk: would a stack of form pointers work for the nested form case or is the issue more complex?
16:59
<annevk>
did you see my e-mail?
17:00
<annevk>
you basically want some kind of scoping where </form> can't get through
17:00
<annevk>
and you probably don't want to set the form pointer to null either
17:00
<hsivonen>
annevk: I saw the email. is scope a form pointer stack or something else?
17:00
<annevk>
<form><div></form><input></div></form> the <input> is still associated with the form
17:01
<hsivonen>
eww.
17:01
<annevk>
hsivonen, it's a block level element
17:01
<Philip`>
In <form><div><form><input>..., which form is the input associated with?
17:01
<annevk>
<center>, <blockquote>, <h1> - <h6>, <div>, etc.
17:01
<annevk>
Philip`, the second <form> gets ignored because the form pointer is already associated with something
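(Editorial aside: a toy model of the form pointer behaviour annevk describes for <form><div><form><input>. It only looks at start tags and deliberately ignores end tags, so it does not cover the </form> scoping question; the function name and the index-based bookkeeping are hypothetical.)

```python
def associate_inputs(start_tags):
    """Walk a list of start tag names and report which form each input joins.

    Returns one entry per <input>: the index of the owning <form> in source
    order (0 for the first one seen), or None if no form pointer was set.
    """
    form_pointer = None
    forms_seen = 0
    owners = []
    for tag in start_tags:
        if tag == "form":
            if form_pointer is None:
                # First form: it becomes the form pointer.
                form_pointer = forms_seen
            # A nested form is ignored, but still counted so that
            # indices keep matching source order.
            forms_seen += 1
        elif tag == "input":
            owners.append(form_pointer)
    return owners
```

For the tag sequence form, div, form, input this yields a single owner, index 0, matching the statement that the second <form> is ignored.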
17:04
<annevk>
Philip`, are you revising tests for all HTML5 spec changes?
17:05
<hsivonen>
more cases where legacy encoding labels are de facto aliases for newer encodings keep creeping out of the woodwork
17:06
<annevk>
i saw that in #webkit and asked him to mail whatwg⊙wo, i'm glad he did :)
17:06
<Philip`>
annevk: Only for the tokeniser
17:07
<Philip`>
(and I can't guarantee I haven't missed any changes)
17:08
<annevk>
hopefully enough fresh implementations keep coming to sort out all the mistakes...
17:08
<annevk>
or test contributions for that matter
17:09
<hsivonen>
aside: great Python on JVM news: http://fwierzbicki.blogspot.com/2008/02/jythons-future-looking-sunny.html
17:09
<Philip`>
I've been updating the Python html5lib to follow the spec, but gave up on the Ruby one after finding that it already had some non-trivial bugs (plus a trivial bug that hid all the others)
17:10
<Philip`>
Well, at least it had one non-trivial bug
17:10
<Philip`>
and maybe it was actually trivial, but I didn't try looking because I worried it might not be
17:11
<annevk>
did you leave the bug exposed?
17:11
<annevk>
if you did someone else will prolly fix it
17:11
<Philip`>
(this being the <x y="&notit"> case or something like that)
17:11
<Philip`>
annevk: Yes, it'll fail if someone runs the tokeniser tests
17:11
<annevk>
ah, that was annoying to fix on the python side too...
17:12
<annevk>
and the python side was actually hiding some bugs there too, i remember, hmm
17:12
<annevk>
since ruby was a port, i guess that's what went wrong...
17:12
<Philip`>
It seems easy to handle all the entities just with a regexp
17:12
<Philip`>
(plus another regexp for entities in attributes)
17:13
<Philip`>
though that's not so good if your input is a stream rather than a string
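(Editorial aside: a rough sketch of the regexp idea for string input. It is not the HTML5 tokeniser algorithm: it only expands references that end in a semicolon, uses Python's html.entities table as a stand-in for the spec's entity list, and skips the separate rules for attributes. The names ENTITY_RE and expand_entities are hypothetical.)

```python
import re
from html.entities import name2codepoint

# Matches &#xHEX; or &#DEC; or &name; (semicolon required in this sketch).
ENTITY_RE = re.compile(r"&(#[xX][0-9a-fA-F]+|#[0-9]+|[A-Za-z][A-Za-z0-9]*);")


def expand_entities(text):
    """Replace recognised character references; leave everything else alone."""

    def replace(match):
        body = match.group(1)
        if body[0] == "#":
            if body[1] in "xX":
                code = int(body[2:], 16)
            else:
                code = int(body[1:])
        else:
            code = name2codepoint.get(body)
            if code is None:
                # Unknown name: keep the original text untouched.
                return match.group(0)
        try:
            return chr(code)
        except (ValueError, OverflowError):
            # Code point out of range: keep the original text untouched.
            return match.group(0)

    return ENTITY_RE.sub(replace, text)
```

As Philip` notes, this needs the whole input as one string; a streaming tokeniser could see a reference split across two chunks.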
19:19
<Philip`>
hsivonen_: The obvious question is, what were fmt=0 up to fmt=5?
19:50
<Hixie>
Krzysztof Żelechowski's e-mails read like poetry
19:51
<Hixie>
or haikus
19:51
<Hixie>
e.g.:
19:51
<Hixie>
---
19:51
<Hixie>
I am not sure I understand you correctly
19:51
<Hixie>
but if this introduces the ability
19:51
<Hixie>
to make the user agent
19:51
<Hixie>
report a different URL than the effective target,
19:51
<gsnedders>
I should try sending an email to public-html or whatwg that's a pantoum sometime, just to see if someone notices
19:51
<Hixie>
it is going to be a sweet candy for phishers.
19:51
<Hixie>
(Newer browsers made this effect unavailable to scripts).
19:51
<Hixie>
---
19:51
<gsnedders>
actually, a pantoum is too obvious. too much repetition.
19:51
<gsnedders>
Maybe a sonnet?
20:27
<annevk>
gsnedders, it sets it from missing to the empty string...
20:27
<gsnedders>
annevk: where?
20:27
<annevk>
gsnedders, if you don't see that you're reading too much into it
20:27
<annevk>
gsnedders, I quoted those bits in my e-mail...
20:27
<gsnedders>
I searched the entire spec for "missing"
20:27
<annevk>
missing is not mentioned again
20:28
<annevk>
it does not need to be
20:28
<gsnedders>
annevk: I'm an asshole, per markp's definition. what do you expect? :P
20:29
<gsnedders>
annevk: if it is marked as missing, how does it not need to be marked as not missing?
20:29
annevk
shrugs
20:29
<gsnedders>
From what I can see in the spec, it is marked as missing, and is never marked as _not_ missing
20:30
<gsnedders>
it needs to be set in the "Before DOCTYPE public identifier state" and the "Before DOCTYPE system identifier state"
20:53
<annevk>
"public identifier, and system identifier must be marked as missing (which is a distinct state from the empty string)"
20:54
<annevk>
then it says "Set the DOCTYPE token's system identifier to the empty string"
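(Editorial aside: one way to picture the distinction annevk is pointing at, with "missing" modelled as None and the empty string as "". The DoctypeToken class is a hypothetical illustration, not the spec's or any library's data structure.)

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DoctypeToken:
    name: str
    # None models "missing", which is a distinct state from "".
    public_id: Optional[str] = None
    system_id: Optional[str] = None


token = DoctypeToken(name="html")
was_missing = token.system_id is None

# "Set the DOCTYPE token's system identifier to the empty string":
# assigning a value is what takes the identifier out of the missing state.
token.system_id = ""
now_missing = token.system_id is None
```

Under this reading there is no separate marker to clear; setting the identifier to any string, including the empty one, replaces the missing state.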
21:19
<Philip`>
Hmm, ICU4J's gb2312 seems to act identically to iso-8859-1
21:19
<Philip`>
(gb2312-1980 works like it should, though)
21:21
<annevk>
i think i'll add window.scroll/scrollTo/scrollBy to CSSOM View
21:22
<Philip`>
Oops, I'm wrong
21:24
<Philip`>
gb2312 does seem to work, but values outside the special range (about 0xA0-0xFF) are treated like in iso-8859-1
21:25
<Philip`>
and gb2312-1980 is something totally different and not ASCII compatible
21:26
<annevk>
after all, it has scroll*
21:28
<Hixie>
ok so i parsed over 6 billion files with the parser change to the insertion mode thing (testing that it never hit a table-related element) and it didn't crash
21:29
<Hixie>
so i figure it's safe
21:31
<gsnedders>
annevk: that to me doesn't mean actually changing the marker
21:33
<Philip`>
Exception in thread "pool-1-thread-1201" java.lang.NoClassDefFoundError: Could not initialize class nu.validator.htmlparser.impl.EncodingInfo
21:33
<Philip`>
Hmm...
21:34
<hsivonen_>
Philip`: did you update ICU4J without updating the parser, too?
21:34
<Philip`>
I'm not sure what version of the parser I'm using
21:34
<hsivonen_>
Philip`: the new ICU4J has a b0rked UTF-7 decoder that crashes the old EncodingInfo
21:34
<Philip`>
so that's quite possible
21:35
<hsivonen_>
or, rather, EncodingInfo crashes the UTF-7 decoder
21:36
<annevk>
gsnedders, there's no marker, there's just states
21:36
<Philip`>
Things work better when I use a more recently compiled version of the parser - thanks :-)
21:37
<gsnedders>
annevk: "marked as missing" — a marker.
21:38
<annevk>
"must be marked as missing (which is a distinct state from the empty string)" -- a state
21:38
<annevk>
...
21:38
annevk
-> back to cssom-view
21:38
<gsnedders>
so the marker is a state. ergh.
21:40
<Philip`>
I see reported encoding errors on 1.5% of pages with reported encodings
21:41
<Philip`>
(counting things like charset=ISO-8559-1 as an error)
21:41
<Philip`>
(but unknown charsets are only about a quarter of the errors)
21:43
<Philip`>
(18% of gb2312 pages have errors)
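(Editorial aside: a small sketch of how an "unknown charset" check like the one Philip` mentions could be done. It uses Python's codec registry as a stand-in, which is an assumption; the labels a browser or ICU4J recognises will differ, and the function name is hypothetical.)

```python
import codecs


def is_known_charset(label):
    """Return True if Python's codec registry recognises the label."""
    try:
        codecs.lookup(label.strip())
    except LookupError:
        # Raised for labels with no registered codec, e.g. typos.
        return False
    return True
```

Here "iso-8859-1" is recognised while the misspelt "ISO-8559-1" is not, which is the kind of declaration counted as an error above.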
21:47
<Philip`>
http://www.narkasabasi.com/v2/ <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-9"> <meta http-equiv="content-Type" content="text/html; charset=windows-1254" /> <meta http-equiv="content-type" content-type content="text/html"; charset= "x-mac-turkish">
21:47
<Philip`>
(Urgh - s/ /\n/)
21:47
<Philip`>
Looks like they couldn't quite make their mind up
21:49
<Philip`>
<meta content="http://schemas.microsoft.com/intellisense/ie5"; name="vs_targetSchema" charset="utf-8"> - that's not good if it gets incorrectly interpreted as the page's charset
21:50
jgraham
wonders why dave hodder is asking the same question on public-html-comments that I already answered on the whatwg list
21:50
<Philip`>
http://jellybelly.com/International/Japanese/home.html is an interesting test case
21:50
<Philip`>
(Opera 9.2 fails)
21:51
<Philip`>
(Opera 9.5 fails too, though it gets the layout correct)
21:51
<SadEagle>
Philip`: check out the JS in there, too
21:51
<annevk>
jgraham, I think he's asking for a pointer to that e-mail again, he apparently forgot :)
21:51
<Philip`>
(Safari 3 fails too)
21:52
jgraham
goes to answer again
21:52
<Philip`>
SadEagle: What's unusual about the JS?
21:53
<SadEagle>
nothing unusual --- but it's sniffing for netscape >= 3, ie >= 4
21:54
<Philip`>
Ah
21:55
<Philip`>
That should degrade gracefully in other browsers, so it's not much of a problem :-)
21:58
<annevk>
oh, Acid3 is announced
22:00
Philip`
sees 88582 pages with <meta http-equiv="content-type" content="text/html; charset=...">, 25175 with an HTTP Content-Type: text/html; charset=..., and 171 with <meta charset="...">
22:00
<Philip`>
(out of 130K)
22:01
<annevk>
how many pages with content="" and without a valid value for http-equiv="" ?
22:01
<Philip`>
What's a "valid value"?
22:02
<annevk>
content-type ascii case-insensitive
22:02
<Philip`>
What about all the other http-equiv values?
22:02
<annevk>
where content contains the word charset
22:03
<annevk>
content=""*
22:03
<Philip`>
Hmm, I suppose I could look for that
22:04
<Philip`>
but I won't bother doing that now, since it takes 20 minutes to run
22:04
<annevk>
k
22:21
<SadEagle>
Philip`: LOL, just stumbled on a webpage with 2 encoding headers, neither of which is right
23:15
<jgraham>
Hixie: In the outline algorithm, in the condition "When exiting a sectioning content element, if the stack is not empty"
23:15
<Hixie>
yes?
23:15
<jgraham>
it's not clear to me what "Let current section be the last section in the outline of the current outlinee element.
23:15
<jgraham>
Insert its outline at the end of the current section. (This does not change which section is the last section in the outline.)" means
23:16
<jgraham>
Specifically the last line.
23:16
<Hixie>
oops
23:16
<Hixie>
"its" refers to the sectioning content element being exited
23:16
<Hixie>
let me fix that
23:21
<Hixie>
ok fixed
23:21
<Hixie>
is that clearer?
23:25
<jgraham>
I think that helps. I just need to work through and see if I have a sensible mental model of what's happening
23:25
<Hixie>
k
23:25
<Hixie>
i know you will, but, let me know what i can do to improve it
23:26
<Hixie>
the current algorithm should be way better than what was there before, yet get mostly the same results
23:28
<annevk>
Hixie, you're still going through parser feedback right?
23:28
<Hixie>
yes
23:28
<Hixie>
i'm stalled right now trying to figure out how to handle spaces in <table> x </table>
23:29
<annevk>
flip a coin :)
23:30
<Hixie>
between what and what? i have no options so far :-)
23:30
<annevk>
between " x <table></table>" and "x<table> </table>"
23:30
<Philip`>
annevk: Do you mean the algorithm for handling spaces should involve flipping a coin?
23:31
<annevk>
Philip`, that could be interesting, but requiring specific hardware for HTML 5 might be too much
23:32
jgraham
suggests asking the user to decide
23:33
<Philip`>
annevk: Any implementation is acceptable as long as it acts the same as flipping a coin
23:33
<Hixie>
annevk: oh i've decided it's "x <table> </table>", the question is how to get there.
23:33
<jgraham>
"You have encountered a space inside a table. Would you like to move it outside (Y/n)"
23:34
<Philip`>
There should be a web service that provides a stream of random bits, called Flipr
23:34
<Hixie>
i'm thinking a flag on the table that decides whether spaces are sent out or not
23:34
<Hixie>
that gets set as soon as you send anything out
23:34
<Hixie>
the problem is nested tables in the innerHTML case makes this relatively hard to specify
23:34
<annevk>
so i think the current spec covers it
23:34
<annevk>
because you consume characters until the end
23:34
<annevk>
which makes "x " a character block
23:35
<annevk>
and the space before it gets treated specially
23:35
<Hixie>
the current spec has no concept of "character blocks" :-)
23:35
<annevk>
the append the character stuff
23:35
<Hixie>
but e.g. "<table> x<span></span> </table>" should become "x<span></span> <table> </table>"
23:36
<Hixie>
so it's not that simply
23:36
<Hixie>
simple
23:36
<annevk>
for real?
23:36
annevk
didn't think that trailing space would be placed in front too
23:37
<Hixie>
"<table> foo<span></span> bar</table>" shouldn't display "foobar"
23:37
<Hixie>
it should display "foo bar"
23:37
<Hixie>
that's the bug :-)
23:38
<annevk>
a flag sounds easiest then, yes
23:49
<jgraham>
Hixie: I think I'm still confused about what it means to insert a section "at the end of" another section. Do you mean "as the last child section of" the other section?
23:50
<jgraham>
(so if you have <body><section><h1>foo you end up with a section for the <section> as a child of the section for the <body>
23:50
<Hixie>
in "When entering a heading content element" i used the term "append it to /candidate section/"
23:50
<Hixie>
is that clearer?
23:50
<annevk>
sounds like appendChild()...
23:52
<jgraham>
Yeah, that bit is clearer.
23:52
<Hixie>
ok i'll use that terminology
23:52
Hixie
regens