02:12 | <Hixie> | so my girlfriend looks over my shoulder at the changes i just made, and without knowing i had just added them, points to the three sections i added or renamed today and laughs at me for naming them that |
02:12 | <Hixie> | maybe i'm having a bad naming day or something |
02:23 | <Lachy> | Hixie, which section names were they? |
02:24 | <Hixie> | the "after after body insertion mode", the "after after frameset insertion mode", and the "unexpected end" (with the rules for when you stop parsing "with prejudice") |
02:45 | <Hixie> | god, i can't wait for gsnedders' preprocessor |
03:02 | <mpt> | after after? |
03:05 | <jwalden> | after post-* perhaps if you care enough about the wording |
03:14 | <Lachy> | does anyone know if Firefox 3 has implemented support for any microformats yet? If so, what have they got, or did they drop that idea? |
03:19 | <jwalden> | I think it's mostly extensionland |
03:31 | <Lachy> | yeah, I'm aware of the extensions, but there were announcements around the beginning of last year that FF3 was going to add native support for them |
04:14 | <jwalden> | I think they never had a developer willing and capable, time-wise, of doing so |
04:14 | <jwalden> | but I didn't pay too much attention |
05:14 | <Hixie> | Philip`: yt? |
08:15 | <zcorpan> | Hixie: re gamespy.com, opera does use quirks mode, apparently; i was probably getting confused myself by all the testing back and forth |
08:16 | <Hixie> | were the changes i made still good? |
08:16 | <zcorpan> | i think so |
08:17 | <zcorpan> | haven't looked at the changelog yet, but i trust that you did what you said you did :) |
08:17 | <Hixie> | please don't :-) |
08:17 | <zcorpan> | i have some more doctype feedback coming soon |
08:17 | <Hixie> | with these changes i expect i've made all kinds of errors |
08:17 | <Hixie> | cool |
08:18 | <zcorpan> | i think we need to ignore the trailing "EN" (or whatever it might be) in the FPI |
08:19 | <zcorpan> | firefox and safari don't work with a number of pages because of that |
08:19 | <zcorpan> | but opera and ie do |
08:21 | <zcorpan> | looking at about 60 pages that have something other than //EN at the end (and wouldn't trigger standards mode anyway), 34 look good only in quirks mode, 1 looks good only in standards mode (in opera/firefox), and 1 looks good only in standards mode in opera but the same in quirks or standards in firefox |
08:21 | <zcorpan> | and the rest looked pretty much the same in either mode |
08:30 | <Hixie> | if you haven't already, send mail saying how you propose to check for that in the spec |
08:31 | <zcorpan> | i'm about to send it in a minute. just check that the FPI *starts* with e.g. "-//w3c//dtd html 3.2//" |
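Editorial sketch of the prefix check zcorpan describes above: compare only the start of the FPI so that a trailing "EN" (or any other language code) is ignored. The function name and the prefix tuple are hypothetical; only the HTML 3.2 prefix comes from the discussion, and a real list would be longer.

```python
# Hypothetical list of quirks-mode FPI prefixes. Only the HTML 3.2 entry is
# taken from the log; everything else about this table is an assumption.
QUIRKY_FPI_PREFIXES = (
    "-//w3c//dtd html 3.2//",
)


def fpi_triggers_quirks(public_id):
    """Return True if the public identifier starts with a known prefix.

    The comparison is case-insensitive and ignores whatever follows the
    prefix, so "//EN", "//FI" or any other trailing text is accepted.
    """
    if public_id is None:  # no public identifier at all
        return False
    # str.startswith accepts a tuple of candidate prefixes.
    return public_id.lower().startswith(QUIRKY_FPI_PREFIXES)
```

With this shape, adding another doctype is a one-line change to the tuple rather than a new comparison.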
08:32 | <hsivonen> | Hixie: should I expect large volumes of conformance checker-relevant spec changes in the near term? |
08:32 | <Hixie> | hsivonen: i'm currently going through the tree construction feedback |
08:32 | <hsivonen> | Hixie: I take that as a "yes" :-) |
08:33 | <Hixie> | well, i'm almost done |
08:33 | <Hixie> | and after that i'm out of things to do again |
08:33 | <Hixie> | so. maybe not "large" volumes :-) |
08:33 | <hsivonen> | ok. perhaps then I should just file bugs for the changes manually and not write a script |
08:37 | <hsivonen> | hmm. looks like I'm going to need the script anyway. |
08:41 | <Hixie> | script? |
08:43 | <hsivonen> | Hixie: a script for mining the spec svn log for conformance checker and tools-relevant changes and filing bugs automatically so that I don't miss changes |
08:43 | <hsivonen> | at this point, doing a vgrep on the spec and parser source isn't such a great idea |
08:43 | <hsivonen> | or vdiff, rather |
08:46 | <hsivonen> | hmm. I see a form field called "source" handled in web-apps-tracker, but I don't see a field named that way in the form |
08:47 | <zcorpan> | hsivonen: another hidden feature? :) |
08:47 | <Hixie> | anything with "c" (or is it "v"? one or the other) in the "affected" part of the checkin comments will give you the checkins that i think affect you |
08:48 | <hsivonen> | Hixie: yeah. now I need to figure out how to modify web-apps-tracker to post to Bugzilla |
08:48 | <Hixie> | heh |
08:48 | <Hixie> | good luck with that |
08:49 | <hsivonen> | umm. is there a reason to expect that luck is needed when posting to Bugzilla_ |
08:49 | <hsivonen> | ? |
08:54 | <Hixie> | well your script will need to deal with cookies |
08:55 | <Hixie> | which always makes things exciting in my experience |
08:56 | <hsivonen> | isn't the login cookie a constant that can be hard-coded once sniffed from a browser session? |
09:04 | <Hixie> | not with bugzilla |
09:04 | <Hixie> | it is ip-address locked, iirc |
09:07 | <zcorpan> | hsivonen: see http://www.sitepoint.com/forums/showthread.php?p=3739972#post3738863 and onward |
09:08 | <zcorpan> | Hixie: likewise, perhaps accesskey should be made conforming... :) |
09:08 | <Hixie> | there's a whole folder on accesskey |
09:08 | <Hixie> | we need a solution |
09:08 | <Hixie> | we don't have one |
09:08 | <Hixie> | at least last i checked :-) |
09:09 | <zcorpan> | flagging it as an error isn't helping authors, it seems |
09:19 | <hsivonen> | zcorpan: I'm still trying to shun doctype sniffing on the XML side |
09:19 | <zcorpan> | hsivonen: yeah |
09:20 | <hsivonen> | (for the reasons stated in http://hsivonen.iki.fi/doctype/#xml ) |
09:20 | <Hixie> | zcorpan: i agree; i haven't worked out what we should do yet |
09:20 | <hsivonen> | zcorpan: and the project that became Validator.nu started specifically as a non-DTD validator |
09:21 | <zcorpan> | hsivonen: don't tell me :) |
09:21 | <Philip`> | Hixie: Yes |
09:21 | <hsivonen> | :-) |
09:22 | <Hixie> | Philip`: no idea what i wanted to ask you anymore, sorry. it was probably about one of the e-mails you sent, in which case the question will be in my e-mail reply now. |
09:22 | <Philip`> | Oh, okay :-) |
09:23 | <hsivonen> | zcorpan: anyway, I don't find actionable feedback that doesn't run counter to central design decisions or that doesn't belong on Hixie's plate instead (accesskey) |
09:23 | <zcorpan> | hsivonen: what about the accept header? |
09:26 | <hsivonen> | zcorpan: My thinking is that XHTML+MathML+SVG+RDF has a larger feature set than HTML only, so the former should be preferred |
09:26 | <hsivonen> | besides, people who do conneg usually want everything but IE to see the non-text/html version |
09:27 | <hsivonen> | moreover, the generic UI has a manual override anyway |
09:28 | <zcorpan> | makes sense, i guess tommy's case is a bit uncommon in using xhtml but wanting html |
09:28 | <zcorpan> | hsivonen: can i send GET parameters along with Opera's validate feature now? |
09:29 | <zcorpan> | (or remember settings some other way?) |
09:32 | <hsivonen> | zcorpan: what kind of GET parameters? Opera does a POST, doesn't it? |
09:32 | <zcorpan> | hsivonen: uh, make that GET-like parameters along with a POST request |
09:33 | <zcorpan> | that is, typing in http://validator.nu/?parser=html in opera:config |
09:33 | <hsivonen> | zcorpan: they are supported if those fields come before the form field that contains the document |
09:33 | <hsivonen> | oh |
09:33 | <hsivonen> | that's not supported |
09:33 | <Hixie> | me gets an e-mail from someone saying that the acid3 page on wikipedia is unfair to opera because opera can't show its daily progress on nightly builds |
09:33 | <Hixie> | i wonder if i should point out that it was opera people who _created_ that page... |
09:34 | <Philip`> | It seems no more unfair to Opera than it is to IE |
09:35 | <hsivonen> | zcorpan: filed http://bugzilla.validator.nu/show_bug.cgi?id=77 |
09:35 | <Hixie> | Philip`: well they also said it was unfair to IE |
09:36 | <zcorpan> | hsivonen: thanks |
09:41 | <Hixie> | hsivonen: so you think <table><p><i></table> should be only one error, not two? (wrong place for the <p>, missing </i>) |
09:43 | <Hixie> | i guess <table><p><i><tr> should arguably be not fewer errors than <table><p><i></table> |
09:43 | <Hixie> | and right now it is |
09:47 | <hsivonen> | Hixie: what have I said? |
09:48 | <hsivonen> | Hixie: two errors in that case seems reasonable on surface |
09:48 | <Hixie> | i've made it one error |
09:48 | <Hixie> | (foster parenting) |
09:48 | <hsivonen> | Hixie: well, that's sufficient for finding the document as a whole to be in error |
09:48 | <Hixie> | and made <table><p><tr> be one error too (it used to be two errors) |
09:48 | <Hixie> | yeah |
09:49 | <Hixie> | the next commit is this change |
09:50 | <Hixie> | man, i keep having to look at what the subject line of these mails was to work out what section they're talking about |
09:52 | <annevk> | hmm, i guess I can duplicate the subject line in the body somehow from now on |
09:52 | <hsivonen> | Hixie: I'm having trouble parsing that. Do you mean all the relevant bits should be in the email body from now on? |
09:53 | <Hixie> | pretend that i never look at the subject line |
09:53 | <Hixie> | i read the e-mail bodies to work out where to file the e-mails, and when i reply to them i just reply to a concatenated stream of bodies |
09:54 | <Hixie> | so really i never see the subject lines |
09:54 | <hsivonen> | Hixie: ok |
09:54 | <Hixie> | thanks :-) |
09:54 | <annevk> | can't you concatenate the subject too btw? |
09:54 | <Hixie> | (it's no biggie, though) |
09:55 | <Hixie> | annevk: i use pine, and pine doesn't do that |
09:55 | <hsivonen> | (aside: the bugzilla login cookie seems to be a constant) |
09:55 | <Hixie> | really? not ip-locked? |
09:55 | <Hixie> | i wonder why i keep getting logged out then |
09:55 | <hsivonen> | Hixie: I mean constant across requests |
09:55 | <Hixie> | ah ok |
09:55 | <annevk> | you can make it ip-locked if you want |
09:56 | <annevk> | at least, last time I checked that was optional |
09:56 | <Hixie> | yeah but that check box seems to not work across browsers, or something |
09:56 | <Hixie> | maybe it's locked to browser+ip or something |
09:56 | <Hixie> | i dunno |
09:56 | <gavin_> | the checkbox doesn't remove all IP restrictions, afaik |
09:56 | <Hixie> | i do know i keep having to log in to the various bugzilla instances i use |
09:56 | <gavin_> | it only makes them looser |
09:56 | <Hixie> | ah |
09:56 | <Hixie> | well it's annoying as hell |
09:57 | <annevk> | maybe it's better to have a private pc versus public pc option... |
10:00 | <Philip`> | Maybe they do that since it's easy to make an attachment file that steals people's cookies? |
10:01 | <Philip`> | but then they could just use HttpOnly cookies instead |
10:06 | <hsivonen> | Hixie: do I understand correctly that when you say "BOM", you mean U+FEFF. but when the Unicode folks say "BOM", they mean U+FEFF that indeed functions as a BOM? |
10:07 | <Hixie> | i rarely say "BOM" alone |
10:07 | <Hixie> | i usually say "U+FEFF BYTE ORDER MARK character" or some such |
10:09 | <hsivonen> | oh. Martin Dürst was quoting gsnedders, not you |
10:09 | <hsivonen> | anyway, I think I agree with Martin's point if I understood it correctly |
10:09 | <zcorpan> | speaking of BOMs, i noted a while ago that ES4 strips/ignores *all* U+FEFFs |
10:10 | <zcorpan> | because it was needed for web compat |
10:10 | <hsivonen> | that is, I think BOM swallowing should be left on the encoding layer except in the case of UTF-8 in which case the HTML5 layer should swallow it |
10:11 | <hsivonen> | conceptually, that is |
10:11 | <Hixie> | hsivonen: respond to my e-mail on the subject explaining why it helps users to do that instead of what i proposed :-) |
10:11 | <hsivonen> | in practice, the BOM sniffing needs to happen on the HTML layer, though... |
10:12 | <hsivonen> | so one would actually implement "UTF-16" by instantiating a UTF-16BE or a UTF-16LE decoder after swallowing the BOM |
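Editorial sketch of the approach hsivonen outlines above: sniff the BOM first, swallow it, then instantiate an endianness-specific decoder. The function name and the `default` fallback for input without a BOM are assumptions made for illustration; the log does not say what should happen in that case.

```python
import codecs


def decode_utf16(data, default="utf-16-le"):
    """Decode bytes labelled "UTF-16" by sniffing and swallowing the BOM.

    `default` is a hypothetical fallback used when no BOM is present.
    """
    if data.startswith(codecs.BOM_UTF16_BE):
        # Swallow the two BOM bytes, then use the big-endian decoder.
        return data[2:].decode("utf-16-be")
    if data.startswith(codecs.BOM_UTF16_LE):
        # Swallow the two BOM bytes, then use the little-endian decoder.
        return data[2:].decode("utf-16-le")
    return data.decode(default)
```

Because the endianness-specific decoders do not strip U+FEFF themselves, only the first one is swallowed here; any later U+FEFF reaches the next layer as ordinary character data.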
10:12 | <Hixie> | x<table> x</table> -- should that render as "xx" like in firefox, or "x x" like in safari? |
10:13 | <hsivonen> | Hixie: I guess I have to reread what you proposed and test UTF-16BE with initial U+FEFF in browsers |
10:13 | <annevk> | Firefox seems better |
10:13 | <annevk> | (because you keep whitespace inside the <table>, which is also what Acid3 requires fwiw...) |
10:14 | <hsivonen> | after all, it seems to boil down to whether browsers treat an initial U+FEFF in UTF-16BE as non-space character data or not |
10:14 | <Hixie> | oh there's no doubt that <table> </table> has no foster parenting |
10:14 | <Hixie> | annevk: (the safari output could be obtained through adoption) |
10:14 | <annevk> | then i guess I don't care |
10:14 | Hixie | looks at his data to see if anyone is using utf-16 |
10:15 | <annevk> | <form> parsing is still causing us issues btw |
10:15 | <hsivonen> | Safari output follows from doing the whitespaceness check on the text node level which kinda makes sense |
10:15 | <annevk> | nested forms are common enough to warrant special rules it seems... :( |
10:15 | <Hixie> | annevk: has feedback been sent? |
10:16 | <annevk> | no, I've no idea what the spec should say |
10:16 | <annevk> | but probably something that matches WebKit/Firefox |
10:17 | <Hixie> | i saw UTF-16 explicitly declared on 0.004% of pages |
10:18 | <Hixie> | annevk: well, i don't even know what the issue is unless i have feedback :-) |
10:19 | Philip` | wrote a page with some script that DOM-inserts a form into the middle of another form, because he wanted an asynchronous file-upload box in the middle of a normal input form, and it felt quite evil :-( |
10:19 | <Philip`> | (but I think it works anyway, so that's good enough for me) |
10:24 | <Hixie> | hmm |
10:24 | <Hixie> | i wonder if we should do what hsivonen suggests in this e-mail, and basically make the tokeniser only emit strings, not characters |
10:25 | <Philip`> | How does that work with incremental rendering? |
10:25 | <Philip`> | (...of pages which are just text) |
10:26 | <Hixie> | poorly |
10:26 | <Hixie> | it also works poorly with things like " aaa" which should become " <html><head><head><body>aaa" |
10:27 | <annevk> | i hope you're still allowed to do incremental rendering? |
10:27 | <annevk> | by moving parts of text over? |
10:28 | <zcorpan> | Hixie: it becomes "<html><head><head><body> aaa" in opera, it seems, and i don't think we've run into any trouble because of that |
10:30 | <Hixie> | yeah, spaces around optional tags are messed up by most browsers |
10:30 | <Hixie> | i'm trying to fix that |
10:30 | <annevk> | please don't |
10:31 | <annevk> | it has already caused us issues |
10:31 | <Hixie> | i'm not specifying something that screws up the round tripping that badly |
10:32 | <annevk> | guess we implement html5-delta then :p |
10:32 | <Hixie> | i don't see why it should break things if we do it right |
10:33 | <annevk> | it was something about expecting documentElement.firstChild to be <head> |
10:33 | <zcorpan> | yeah, not ignoring whitespace before head broke pages |
10:33 | <Hixie> | yeah well opera's parsing of <head> is so fucked up as it is that that wouldn't work anyway :-P |
10:33 | <annevk> | dude, we fixed that |
10:33 | <Hixie> | i'll believe that when i see it :-P |
10:34 | <zcorpan> | what's fucked up? |
10:34 | <Hixie> | i filed the bug years ago, it was only once i forced the issue with acid3 that i saw any movement there at all |
10:34 | <Hixie> | i'm not at all convinced that it's been completely fixed |
10:34 | <zcorpan> | i think we don't get a head for frameset documents, but that's all i know |
10:35 | <zcorpan> | (i.e. frameset documents without an explicit head) |
10:35 | <Hixie> | hmm, whatever solution we come up with for "x<table> x</table>" can also work for "x</body> </html> x" |
10:36 | <zcorpan> | i'd like the latter to be solved by ignoring </body> and </html>, but then i don't care about roundtripping so much |
10:36 | <zcorpan> | at least not roundtripping of insignificant whitespace |
10:36 | <annevk> | Hixie, http://my.opera.com/desktopteam/blog/ :) |
10:36 | <zcorpan> | and placement of comments |
10:37 | <Hixie> | well, not roundtripping spaces there basically means that you can't put spaces after the </body>. |
10:37 | <zcorpan> | oh noes :) |
10:37 | <Hixie> | as in, the syntax is a lie if we say you can have spaces after </body> |
10:38 | <Hixie> | and i think that's dumb :-) |
10:38 | <Hixie> | annevk: the last opera build i tried failed to connect to the network half the time, and the one before that crashed on startup :-) |
10:38 | <Hixie> | not to mention that opera on mac looks ugly as hell :-P |
10:47 | <Hixie> | jeez this week is going to be insane |
10:47 | <Hixie> | so many meetings |
10:47 | <Hixie> | ok bed time |
10:47 | <Hixie> | nn |
11:15 | hsivonen | congratulates self for writing the spec log to bugzilla script anyway |
11:18 | <annevk> | Hixie, just keep trying... anyway, e-mailed the nested forms issue |
11:19 | <hsivonen> | has anyone tested if TIS-620 needs to become an alias for Windows-874? |
11:32 | <Philip`> | hsivonen: I see 39 pages (out of 125K) that use charset=tis-620, if that's what you mean |
11:37 | <hsivonen> | Philip`: I mean: do browsers implement tis-620 as an alias of Windows-874? |
11:37 | <Philip`> | Ah, okay |
11:39 | Philip` | sees that the spec diffs don't provide enough context to actually be useful |
11:43 | <annevk> | how much do you want? |
11:45 | <Philip`> | Potentially a quite large number of lines |
11:45 | <Philip`> | which would be inconvenient in the more common cases, which isn't good |
11:45 | <annevk> | maybe a hidden parameter context ? |
11:45 | <annevk> | so you could mangle the URI if you need more |
11:46 | <Philip`> | It'd be nice if instead of "@@ -37190,21 +37190,22 @@ function receiver(e) {" it showed something useful like the most recent parent node id attribute but that's probably hard :-) |
11:46 | <annevk> | yes |
11:47 | <annevk> | note that most IDs are auto-generated and those are not shown in web-apps-tracker |
11:52 | <MikeSmith> | Philip` - please forward that auto-responder message to me at mike⊙wo |
11:53 | <Philip`> | MikeSmith: Sent |
11:53 | <MikeSmith> | thanks |
11:53 | <Philip`> | Hixie: r1305 ("This change does not change the black box behaviour of the spec") does appear to change the behaviour of the spec |
11:53 | <Philip`> | (unless I've made a mistake) |
11:54 | <Philip`> | e.g. with the input <!doctype! system""? |
11:54 | <Philip`> | Expected: [u'ParseError', u'ParseError', u'ParseError', [u'DOCTYPE', u'!', None, u'', False]] |
11:54 | <Philip`> | Got: [u'ParseError', u'ParseError', u'ParseError', [u'DOCTYPE', u'!', None, u'', True]] |
11:54 | <Philip`> | where 'Expected' is what my implementation used to give |
11:55 | <annevk> | are you sure that's not the result of 1306? |
11:56 | <Philip`> | Oh, good point - it is |
11:56 | <Philip`> | because I couldn't read the r1305 diff properly, so I was referring to the spec too and got them mixed up :-p |
11:56 | <Philip`> | Hixie: Please ignore me :-) |
12:10 | <annevk> | Hixie, 'The "before htmlhtml root element node, which is then added to the stack.' misses a space |
12:16 | <annevk> | in "before head insertion mode" are the second-to-last and last equal? |
12:17 | <annevk> | also, the first in the "before head insertion mode" should also be grouped with those |
16:34 | <gsnedders> | hsivonen: I meant U+FEFF functioning as a BOM |
16:35 | <hsivonen> | gsnedders: ok. then I think I misunderstood something |
16:35 | hsivonen | is bitten by the Python variable visibility rules again :-( |
16:37 | Philip` | thinks the visibility thing is an oddly inelegant part of the language |
16:50 | gsnedders | thinks the ambiguous ampersand is confusing |
16:56 | Philip` | unexpectedly realises that "Prince" sounds like "prints", and that that possibly wasn't a coincidence in the software name |
16:58 | <hsivonen> | annevk: would a stack of form pointers work for the nested form case or is the issue more complex? |
16:59 | <annevk> | did you see my e-mail? |
17:00 | <annevk> | you basically want some kind of scoping where </form> can't get through |
17:00 | <annevk> | and you probably don't want to set the form pointer to null either |
17:00 | <hsivonen> | annevk: I saw the email. is scope a form pointer stack or something else? |
17:00 | <annevk> | <form><div></form><input></div></form> the <input> is still associated with the form |
17:01 | <hsivonen> | eww. |
17:01 | <annevk> | hsivonen, it's a block level element |
17:01 | <Philip`> | In <form><div><form><input>..., which form is the input associated with? |
17:01 | <annevk> | <center>, <blockquote>, <h1> - <h6>, <div>, etc. |
17:01 | <annevk> | Philip`, the second <form> gets ignored because the form pointer is already associated with something |
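Editorial sketch of the start-tag behaviour annevk describes above: while the form pointer is set, a nested `<form>` start tag is ignored, so a later `<input>` belongs to the outer form. The function name and the token format (bare tag names) are hypothetical, and `</form>` handling is deliberately left out because the log treats it as unresolved.

```python
def associate_inputs(start_tags):
    """Return, for each "input" start tag, the index of the form that owns it.

    `start_tags` is a list of tag names in document order. Only start tags
    are modelled; end tags are out of scope for this sketch.
    """
    form_pointer = None  # index of the currently open form, if any
    forms_created = 0
    owners = []
    for tag in start_tags:
        if tag == "form":
            if form_pointer is None:
                form_pointer = forms_created
                forms_created += 1
            # Otherwise the nested <form> start tag is ignored entirely.
        elif tag == "input":
            owners.append(form_pointer)
    return owners
```

For Philip`'s example `<form><div><form><input>`, the call `associate_inputs(["form", "div", "form", "input"])` associates the input with the first form.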
17:04 | <annevk> | Philip`, are you revising tests for all HTML5 spec changes? |
17:05 | <hsivonen> | more cases where legacy encoding labels are de facto aliases for newer encodings keep creeping out of the woodwork |
17:06 | <annevk> | i saw that in #webkit and asked him to mail whatwg⊙wo, i'm glad he did :) |
17:06 | <Philip`> | annevk: Only for the tokeniser |
17:07 | <Philip`> | (and I can't guarantee I haven't missed any changes) |
17:08 | <annevk> | hopefully enough fresh implementations keep coming to sort out all the mistakes... |
17:08 | <annevk> | or test contributions for that matter |
17:09 | <hsivonen> | aside: great Python on JVM news: http://fwierzbicki.blogspot.com/2008/02/jythons-future-looking-sunny.html |
17:09 | <Philip`> | I've been updating the Python html5lib to follow the spec, but gave up on the Ruby one after finding that it already had some non-trivial bugs (plus a trivial bug that hid all the others) |
17:10 | <Philip`> | Well, at least it had one non-trivial bug |
17:10 | <Philip`> | and maybe it was actually trivial, but I didn't try looking because I worried it might not be |
17:11 | <annevk> | did you leave the bug exposed? |
17:11 | <annevk> | if you did someone else will prolly fix it |
17:11 | <Philip`> | (this being the <x y="&notit"> case or something like that) |
17:11 | <Philip`> | annevk: Yes, it'll fail if someone runs the tokeniser tests |
17:11 | <annevk> | ah, that was annoying to fix on the python side too... |
17:12 | <annevk> | and the python side was actually hiding some bugs there too, i remember, hmm |
17:12 | <annevk> | since ruby was a port, i guess that's what went wrong... |
17:12 | <Philip`> | It seems easy to handle all the entities just with a regexp |
17:12 | <Philip`> | (plus another regexp for entities in attributes) |
17:13 | <Philip`> | though that's not so good if your input is a stream rather than a string |
19:19 | <Philip`> | hsivonen_: The obvious question is, what were fmt=0 up to fmt=5? |
19:50 | <Hixie> | Krzysztof Żelechowski's e-mails read like poetry |
19:51 | <Hixie> | or haikus |
19:51 | <Hixie> | e.g.: |
19:51 | <Hixie> | --- |
19:51 | <Hixie> | I am not sure I understand you correctly |
19:51 | <Hixie> | but if this introduces the ability |
19:51 | <Hixie> | to make the user agent |
19:51 | <Hixie> | report a different URL than the effective target, |
19:51 | <gsnedders> | I should try sending an email to public-html or whatwg that's a pantoum sometime, just to see if someone notices |
19:51 | <Hixie> | it is going to be a sweet candy for phishers. |
19:51 | <Hixie> | (Newer browsers made this effect unavailable to scripts). |
19:51 | <Hixie> | --- |
19:51 | <gsnedders> | actually, a pantoum is too obvious. too much repetition. |
19:51 | <gsnedders> | Maybe a sonnet? |
20:27 | <annevk> | gsnedders, it sets it from missing to the empty string... |
20:27 | <gsnedders> | annevk: where? |
20:27 | <annevk> | gsnedders, if you don't see that you're reading too much into it |
20:27 | <annevk> | gsnedders, I quoted those bits in my e-mail... |
20:27 | <gsnedders> | I searched the entire spec for "missing" |
20:27 | <annevk> | missing is not mentioned again |
20:28 | <annevk> | it does not need to be |
20:28 | <gsnedders> | annevk: I'm an asshole, per markp's definition. what do you expect? :P |
20:29 | <gsnedders> | annevk: if it is marked as missing, how does it not need to be marked as not missing? |
20:29 | annevk | shrugs |
20:29 | <gsnedders> | From what I can see in the spec, it is marked as missing, and is never marked as _not_ missing |
20:30 | <gsnedders> | it needs to be set in the "Before DOCTYPE public identifier state" and the "Before DOCTYPE system identifier state" |
20:53 | <annevk> | "public identifier, and system identifier must be marked as missing (which is a distinct state from the empty string)" |
20:54 | <annevk> | then it says "Set the DOCTYPE token's system identifier to the empty string" |
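Editorial sketch of annevk's reading of the spec text quoted above: "missing" is a state of the identifier itself, distinct from the empty string, so assigning the empty string already makes it not missing and no separate marker needs clearing. The class and helper names are hypothetical; using `None` for "missing" mirrors the tokeniser test output Philip` pasted earlier in the log.

```python
class DoctypeToken:
    """Minimal DOCTYPE token; None stands for the "missing" state."""

    def __init__(self, name=None):
        self.name = name
        self.public_id = None  # missing, which is not the same as ""
        self.system_id = None  # missing, which is not the same as ""


def is_missing(identifier):
    """True only for the missing state; the empty string is a real value."""
    return identifier is None


token = DoctypeToken("html")
was_missing = is_missing(token.system_id)

# "Set the DOCTYPE token's system identifier to the empty string":
# the assignment itself moves the identifier out of the missing state.
token.system_id = ""
now_missing = is_missing(token.system_id)
```

Under this model there is nothing separate to "mark as not missing", which is the point of disagreement in the exchange above.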
21:19 | <Philip`> | Hmm, ICU4J's gb2312 seems to act identically to iso-8859-1 |
21:19 | <Philip`> | (gb2312-1980 works like it should, though) |
21:21 | <annevk> | i think i'll add window.scroll/scrollTo/scrollBy to CSSOM View |
21:22 | <Philip`> | Oops, I'm wrong |
21:24 | <Philip`> | gb2312 does seem to work, but values outside the special range (about 0xA0-0xFF) are treated like in iso-8859-1 |
21:25 | <Philip`> | and gb2312-1980 is something totally different and not ASCII compatible |
21:26 | <annevk> | after all, it has scroll* |
21:28 | <Hixie> | ok so i parsed over 6 billion files with the parser change to the insertion mode thing (testing that it never hit a table-related element) and it didn't crash |
21:29 | <Hixie> | so i figure it's safe |
21:31 | <gsnedders> | annevk: that to me doesn't mean actually changing the marker |
21:33 | <Philip`> | Exception in thread "pool-1-thread-1201" java.lang.NoClassDefFoundError: Could not initialize class nu.validator.htmlparser.impl.EncodingInfo |
21:33 | <Philip`> | Hmm... |
21:34 | <hsivonen_> | Philip`: did you update ICU4J without updating the parser, too? |
21:34 | <Philip`> | I'm not sure what version of the parser I'm using |
21:34 | <hsivonen_> | Philip`: the new ICU4J has a b0rked UTF-7 decoder that crashes the old EncodingInfo |
21:34 | <Philip`> | so that's quite possible |
21:35 | <hsivonen_> | or, rather, EncodingInfo crashes the UTF-7 decoder |
21:36 | <annevk> | gsnedders, there's no marker, there's just states |
21:36 | <Philip`> | Things work better when I use a more recently compiled version of the parser - thanks :-) |
21:37 | <gsnedders> | annevk: "marked as missing" — a marker. |
21:38 | <annevk> | "must be marked as missing (which is a distinct state from the empty string)" -- a state |
21:38 | <annevk> | ... |
21:38 | annevk | -> back to cssom-view |
21:38 | <gsnedders> | so the marker is a state. ergh. |
21:40 | <Philip`> | I see reported encoding errors on 1.5% of pages with reported encodings |
21:41 | <Philip`> | (counting things like charset=ISO-8559-1 as an error) |
21:41 | <Philip`> | (but unknown charsets are only about a quarter of the errors) |
21:43 | <Philip`> | (18% of gb2312 pages have errors) |
21:47 | <Philip`> | http://www.narkasabasi.com/v2/ <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-9"> <meta http-equiv="content-Type" content="text/html; charset=windows-1254" /> <meta http-equiv="content-type" content-type content="text/html"; charset= "x-mac-turkish"> |
21:47 | <Philip`> | (Urgh - s/ /\n/) |
21:47 | <Philip`> | Looks like they couldn't quite make their mind up |
21:49 | <Philip`> | <meta content="http://schemas.microsoft.com/intellisense/ie5" name="vs_targetSchema" charset="utf-8"> - that's not good if it gets incorrectly interpreted as the page's charset |
21:50 | jgraham | wonders why dave hodder is asking the same question on public-html-comments that I already answered on the whatwg list |
21:50 | <Philip`> | http://jellybelly.com/International/Japanese/home.html is an interesting test case |
21:50 | <Philip`> | (Opera 9.2 fails) |
21:51 | <Philip`> | (Opera 9.5 fails too, though it gets the layout correct) |
21:51 | <SadEagle> | Philip`: check out the JS in there, too |
21:51 | <annevk> | jgraham, I think he's asking for a pointer to that e-mail again, he apparently forgot :) |
21:51 | <Philip`> | (Safari 3 fails too) |
21:52 | jgraham | goes to answer again |
21:52 | <Philip`> | SadEagle: What's unusual about the JS? |
21:53 | <SadEagle> | nothing unusual --- but it's sniffing for netscape >= 3, ie >= 4 |
21:54 | <Philip`> | Ah |
21:55 | <Philip`> | That should degrade gracefully in other browsers, so it's not much of a problem :-) |
21:58 | <annevk> | oh, Acid3 is announced |
22:00 | Philip` | sees 88582 pages with <meta http-equiv="content-type" content="text/html; charset=...">, 25175 with an HTTP Content-Type: text/html; charset=..., and 171 with <meta charset="..."> |
22:00 | <Philip`> | (out of 130K) |
22:01 | <annevk> | how many pages with content="" and without a valid value for http-equiv="" ? |
22:01 | <Philip`> | What's a "valid value"? |
22:02 | <annevk> | content-type ascii case-insensitive |
22:02 | <Philip`> | What about all the other http-equiv values? |
22:02 | <annevk> | where content contains the word charset |
22:03 | <annevk> | content=""* |
22:03 | <Philip`> | Hmm, I suppose I could look for that |
22:04 | <Philip`> | but I won't bother doing that now, since it takes 20 minutes to run |
22:04 | <annevk> | k |
22:21 | <SadEagle> | Philip`: LOL, just stumbled on a webpage with 2 encoding headers, neither of which is right |
23:15 | <jgraham> | Hixie: In the outline algorithm, in the condition "When exiting a sectioning content element, if the stack is not empty" |
23:15 | <Hixie> | yes? |
23:15 | <jgraham> | it's not clear to me what "Let current section be the last section in the outline of the current outlinee element. |
23:15 | <jgraham> | Insert its outline at the end of the current section. (This does not change which section is the last section in the outline.)" means |
23:16 | <jgraham> | Specifically the last line. |
23:16 | <Hixie> | oops |
23:16 | <Hixie> | "its" refers to the sectioning content element being exited |
23:16 | <Hixie> | let me fix that |
23:21 | <Hixie> | ok fixed |
23:21 | <Hixie> | is that clearer? |
23:25 | <jgraham> | I think that helps. I just need to work through and see if I have a sensible mental model of what's happening |
23:25 | <Hixie> | k |
23:25 | <Hixie> | i know you will, but, let me know what i can do to improve it |
23:26 | <Hixie> | the current algorithm should be way better than what was there before, yet get mostly the same results |
23:28 | <annevk> | Hixie, you're still going through parser feedback right? |
23:28 | <Hixie> | yes |
23:28 | <Hixie> | i'm stalled right now trying to figure out how to handle spaces in <table> x </table> |
23:29 | <annevk> | flip a coin :) |
23:30 | <Hixie> | between what and what? i have no options so far :-) |
23:30 | <annevk> | between " x <table></table>" and "x<table> </table>" |
23:30 | <Philip`> | annevk: Do you mean the algorithm for handling spaces should involve flipping a coin? |
23:31 | <annevk> | Philip`, that could be interesting, but requiring specific hardware for HTML 5 might be too much |
23:32 | jgraham | suggests asking the user to decide |
23:33 | <Philip`> | annevk: Any implementation is acceptable as long as it acts the same as flipping a coin |
23:33 | <Hixie> | annevk: oh i've decided it's "x <table> </table>", the question is how to get there. |
23:33 | <jgraham> | "You have encountered a space inside a table. Would you like to move it outside (Y/n)" |
23:34 | <Philip`> | There should be a web service that provides a stream of random bits, called Flipr |
23:34 | <Hixie> | i'm thinking a flag on the table that decides whether spaces are sent out or not |
23:34 | <Hixie> | that gets set as soon as you send anything out |
23:34 | <Hixie> | the problem is nested tables in the innerHTML case makes this relatively hard to specify |
23:34 | <annevk> | so i think the current spec covers it |
23:34 | <annevk> | because you consume characters until the end |
23:34 | <annevk> | which makes "x " a character block |
23:35 | <annevk> | and the space before it gets treated specially |
23:35 | <Hixie> | the current spec has no concept of "character blocks" :-) |
23:35 | <annevk> | the append the character stuff |
23:35 | <Hixie> | but e.g. "<table> x<span></span> </table>" should become "x<span></span> <table> </table>" |
23:36 | <Hixie> | so it's not that simply |
23:36 | <Hixie> | simple |
23:36 | <annevk> | for real? |
23:36 | annevk | didn't think that trailing space would be placed in front too |
23:37 | <Hixie> | "<table> foo<span></span> bar</table>" shouldn't display "foobar" |
23:37 | <Hixie> | it should display "foo bar" |
23:37 | <Hixie> | that's the bug :-) |
23:38 | <annevk> | a flag sounds easiest then, yes |
23:49 | <jgraham> | Hixie: I think I'm still confused about what it means to insert a section "at the end of" another section. Do you mean "as the last child section of" the other section? |
23:50 | <jgraham> | (so if you have <body><section><h1>foo you end up with a section for the <section> as a child of the section for the <body> |
23:50 | <Hixie> | in "When entering a heading content element" i used the term "append it to /candidate section/" |
23:50 | <Hixie> | is that clearer? |
23:50 | <annevk> | sounds like appendChild()... |
23:52 | <jgraham> | Yeah, that bit is clearer. |
23:52 | <Hixie> | ok i'll use that terminology |
23:52 | Hixie | regens |