00:00
<jgraham>
I'm joking of course...
00:00
<jgraham>
And, I think it is generally obvious what is a document conformance requirement and what is a US requirement
00:01
<jgraham>
I also think the spec makes for a poor reference for authors
00:01
<Hixie>
it certainly makes a poor reference to anyone who isn't technically minded
00:02
<Hixie>
i'm not really skilled enough to write text that is both unambiguous and clear to non-technical people, sadly
00:37
<Hixie>
ok so my experience with JAWS is somewhat poor
00:37
<Hixie>
i can't even get past the first screen of the installer without accessibility problems
00:41
<Hixie>
ok, got it past the first screen by cheating
00:42
<Hixie>
it installed stuff, then said it had to reboot, and then crashed my machine hard.
00:42
<Hixie>
had to hard-power-reset it
01:18
<Hixie>
holy fornicating rabbits, if jaws is the state of the art in speech reading software i'm not _surprised_ that accessibility people are so cranky
01:18
<Hixie>
sweet jesus
02:45
<Dashiva>
I thought m12n was bad enough, now we have a11y?
02:57
<karlUshi>
for a long time it seems http://www.google.com/search?q=a11y
04:49
<Hixie>
wow, the DOM that IE makes for <!DOCTYPE HTML><form> sure is... interesting
05:31
<hsivonen>
Hixie: I'm awake now
05:34
<Hixie>
hsivonen: i replied to mail instead
05:35
<hsivonen>
Hixie: ok
06:05
<Hixie>
nearly done with this doctype thread
06:05
<Hixie>
sheesh
06:10
<Hixie>
yay, finally done with it
06:24
<Hixie>
i wonder why the definition for the HTML innerHTML setter starts with "Otherwise,"
06:42
hsivonen
finds out he followed up to a post that already had a large number of follow-ups that weren't properly threaded. :-(
07:20
<hsivonen>
annevk: my impl passed html5lib/testdata/tokenizer/test1.test on the first try after the new doctype stuff. yay
07:22
<hsivonen>
test2.test revealed one bug. fixed
07:40
<Hixie>
you guys have already got that implemented?
07:40
<Hixie>
sheesh
07:40
<Hixie>
talk about bleeding edge
07:42
<hsivonen>
Hixie: well, I implemented the tokenization part--not yet the tree builder part
07:42
<Hixie>
ah ok
07:44
<zcorpan_>
</><!doctype html> is another interesting case (because the token doesn't reach the tree construction stage). though handled the same as </ foo ><!doctype html> in browsers
07:45
<Hixie>
yeah i
07:45
<Hixie>
er
07:46
<Hixie>
yeah i'm sure there are lots of edge cases that act slightly differently between the spec and browsers
07:46
<Hixie>
those are usually the cases that aren't really important and that the browsers all do differently anyway
07:46
<zcorpan_>
perhaps
07:57
<hsivonen>
people say that authors know HTML 4.01 and, therefore, they want to know the diff to HTML 5, but commentary suggests that usually people don't really know the HTML 4.01 details. witness optional tags
07:58
<Hixie>
i keep saying that
07:58
<Hixie>
people don't believe me though
07:59
<othermaciej>
I love the people staring in gape-jawed horror at the optional tags
08:00
<othermaciej>
and suggesting that XML syntax is a good way to teach HTML
08:00
<othermaciej>
*shudder*
08:01
<zcorpan_>
Hixie: the doctype 1024 bytes thing is firefox, not opera or ie. dunno about safari. reproducible from local disk
08:01
<Hixie>
zcorpan_: yeah, i reproduced it eventually myself too (with your help)
08:01
<karlUshi>
othermaciej: it is. XHTML syntax makes it a lot easier. believe me.
08:01
<karlUshi>
By experience
08:01
<karlUshi>
I have taught both languages
08:02
<Hixie>
zcorpan_: doesn't seem to happen for only spaces, only happens if there's something before the doctype other than whitespace
08:02
<othermaciej>
karlUshi: the problem is if you assume XML syntax actually applies
08:02
<othermaciej>
and start doing things like <div id="placeholder" />
08:02
<othermaciej>
or <script src="foo.js" />
08:03
<Hixie>
btw, the html5 parser spec is starting to get good enough that when people report bugs with it, it's usually the case that the bug is actually in one browser and that most of the other browsers don't have that quirk
08:03
<karlUshi>
but it is not what people do :) this is a geek comment.
08:03
<zcorpan_>
Hixie: you can have whitespace within the doctype, with nothing before the doctype. if the > is not within the first 1024 bytes then firefox gets quirks mode
08:03
<Hixie>
<script src="foo.js" /> is far too common
08:03
<Hixie>
so common we might have to in fact put it in the spec, though i really hope not
08:03
<othermaciej>
<canvas /> is also distressingly common
08:04
othermaciej
apologizes for that one
08:04
<karlUshi>
hehe
08:04
<Hixie>
zcorpan_: i had 1024 x ' ' followed by a doctype and i got standards mode. but a bogus comment then 1024 x ' ' then the doctype, and it's quirks.
08:04
<Hixie>
othermaciej: not so much on the web, only mostly in dashboard widgets, thankfully
08:04
karlUshi
is going to look for his whips in the closets. oh and ropes
08:04
<othermaciej>
karlUshi: teaching people that some tags don't need a close tag seems simpler to me than teaching them that <foo /> is a self-closing tag, but can only be used for a fixed small set of tags
08:04
<karlUshi>
it is not because it depends on the tag
08:05
<othermaciej>
karlUshi: even though the XML spec you are nominally following says it can be used for anything
08:05
<zcorpan_>
Hixie: i don't get standards mode with 1024 x ' ' followed by a doctype in firefox
08:05
<othermaciej>
I agree that XML syntax would be easier to teach if you could actually use XML syntax
08:05
<othermaciej>
but as it is, teaching a well-structured version of HTML syntax seems better
08:05
<karlUshi>
plus things like I was typing yesterday on the channel
08:05
<othermaciej>
(i.e. tell people to close non-empty tags that don't need it to avoid confusion)
08:09
<zcorpan_>
Hixie: or wait, nm, you're right
08:10
<karlUshi>
weeeell, from my *practical* teaching experience, xhtml rules create far less misunderstanding. and what I heard from most teachers is the same. But we might have had different contacts with different people
08:11
<othermaciej>
a lot of people writing XHTML don't understand that the browser is not going to use an XML parser
08:11
<othermaciej>
(possibly most)
08:12
<karlUshi>
here again you see it on the geek side.
08:13
<karlUshi>
it is not relevant for most people to know which parser is used.
08:13
<karlUshi>
but for web developers when reading and maintaining the code it is relevant
08:13
<karlUshi>
hmmm
08:13
<karlUshi>
I have something in my head to publish on HTML viewed from different angles
08:13
<karlUshi>
and people refusing to accept the other camps
08:14
<karlUshi>
it is kind of funny because I do the same kind of arguments on both sides
08:14
<othermaciej>
here's the thing, it's not just that people are unsure
08:14
<othermaciej>
they are deeply factually convinced that XHTML is parsed as XML when served as text/html, or that at least it should be
08:14
<othermaciej>
even prominent w3c working group members are often confused on this point
08:15
<hsivonen>
Hixie: I definitely didn't mean to ask for a 1024 char limit when I asked for doing what Gecko and WebKit do
08:15
<karlUshi>
it is not a problem for web developers :)
08:15
<othermaciej>
when many people actively believe something false, that says to me there is a problem
08:15
<othermaciej>
if web developers choose not to think about it that is one thing, but like I said many have an active belief that is contrary to fact
08:16
<Hixie>
othermaciej: even w3c xhtml2 working group members are often confused on this point! and they invented the spec in question!
08:16
<Hixie>
hsivonen: ah ok
08:16
<Hixie>
hsivonen: good :-)
08:16
<othermaciej>
if I believed my CPU was a PPC chip not an Intel chip, it probably wouldn't matter to me most of the time
08:16
<othermaciej>
but the times that it does, it would seriously mess me up
08:16
<othermaciej>
especially if I refused to accept evidence that it was indeed an Intel chip
08:17
<hsivonen>
othermaciej: :-)
08:17
<karlUshi>
othermaciej: good example. When does it really matter? I'm really curious.
08:17
karlUshi
has been living on Macs for more than 10 years now
08:17
<Hixie>
annevk: yt?
08:18
<othermaciej>
karlUshi: as a developer, I have to care when writing inline assembly code, which is rare but sometimes necessary
08:18
<zcorpan_>
karlUshi: people use things like <span/> and expect it to close itself
08:18
<Hixie>
annevk: so you asked for <!-->--> to be treated like <!--->--&gt;
08:18
<karlUshi>
othermaciej: here again, talking about ubergeeks
08:18
<Hixie>
annevk: but it turns out only IE does that in no-quirks mode
08:18
<othermaciej>
karlUshi: as a user, I need to realize that PPC-only binaries will be running in emulation and so will be slower and will use a lot of memory
08:18
<othermaciej>
karlUshi: well, most people don't have to care what their CPU is
08:18
<Hixie>
annevk: so, do we want to change no-quirks-mode comment parsing, or do we want to introduce a quirks-mode tokeniser difference?
08:18
<karlUshi>
people who deal with xslt and xquery etc. Yes, I agree with you, they will have to know what XML is
08:19
<karlUshi>
not most web developers
08:19
<othermaciej>
karlUshi: but if you call tech support and tell them you have a PowerPC chip instead of "I don't know", you could have problems
08:19
<othermaciej>
it's better not to know than to be convinced of the wrong thing
08:19
<othermaciej>
if you don't know, at least you know that you should look it up if you need to know
08:19
<zcorpan_>
Hixie: the former, imho, unless it breaks pages
08:20
<Hixie>
i wonder how to find out whether it breaks pages
08:20
<Hixie>
hm
08:20
<karlUshi>
Many people can name their machine at best (sometimes the color and the shape) and say when they bought it. The rest is
08:20
<annevk>
am now
08:20
<annevk>
Hixie, I think if IE does it it should be safe enough for other browsers
08:21
<annevk>
<!--> and <!---> btw
08:21
<Hixie>
(yeah)
08:21
<Hixie>
hmm
08:21
<Hixie>
that's a plausible argument i guess
08:21
<annevk>
it is for parsing
08:21
<Hixie>
right then
08:21
<othermaciej>
karlUshi: right, many people don't know what their CPU is, but few are certain that it's something other than what it is
08:21
<annevk>
if this was about DOM methods...
08:22
<Hixie>
we'll see what browser vendors say when they try to implement it! :-)
08:22
<hsivonen>
karlUshi: even if people don't know which CPU they have, they aren't vehemently believing that they have a different CPU than what they actually have
08:24
<karlUshi>
interesting, I hear people from this group asking to be realistic with people, and then sometimes requiring people to have extreme knowledge of the technology... when obviously xhtml syntax rules worked for most people and made them develop beautiful xhtml/css web sites. :)
08:24
<karlUshi>
not. logical. at. all.
08:24
<karlUshi>
anyway
08:24
<karlUshi>
need to move my butt to the train
08:25
hsivonen
wonders how misrepresenting testable facts is being real
08:25
<hsivonen>
(i.e. teaching people that XHTML as text/html gets an XML treatment)
08:26
<annevk>
I wonder what the difference is between learning for which tags you can write <foo/> and for which you always have to write <bar></bar> and learning HTML
08:26
<othermaciej>
because the former is well-formed XML and the latter is tag soup
08:26
<hsivonen>
on the bright side, even if XHTML brings out irrational beliefs it doesn't make people as hostile as syndication feeds :-)
08:27
<annevk>
othermaciej, I see it now, makes perfect sense!
08:29
othermaciej
takes a bow
08:30
<weinigLap>
claps
08:30
<Hixie>
ok this comment thing is going to be a bitch
08:30
<annevk>
http://validator.whatwg.org/ is cool!
08:30
<Hixie>
to get the right parse errors i have to add two new states!
08:30
<Hixie>
annevk: heh, it's been there since forever (though it used to give a 403)
08:30
<Hixie>
but the link seemed useful, indeed
08:38
<MikeSmith>
so what reason do you give to those already indoctrinated in closing-tag-required-on-empty-elements for why <script src="foo.js" /> is not conformant?
08:40
<annevk>
MikeSmith, IE doesn't support that?
08:40
<annevk>
nor do lots of older browsers
08:41
<annevk>
it's not backwards compatible at all
08:41
<Hixie>
MikeSmith: they're writing HTML4. And it's not conforming HTML4.
08:42
<Hixie>
MikeSmith: for the same reason that {script} isn't conforming HTML4. it's just not HTML4's syntax.
08:49
<MikeSmith>
or to point out that you can create a well-formed, valid XHTML1 document that is no HTML4-conformant ... ?
08:50
<MikeSmith>
not HTML4-conformant
08:50
<othermaciej>
it's pretty hard to make a document that's both conforming XHTML1 and conforming HTML4
08:50
<annevk>
it's impossible
08:50
<zcorpan_>
annevk: nope, you can play with PIs
08:51
<zcorpan_>
they end with > in html and ?> in xml
08:51
annevk
wonders how that would solve stuff
08:51
<Hixie>
it is indeed possible, though not at all useful
08:51
<hsivonen>
IIRC, there's one on damowmow
08:51
<Hixie>
indeed
08:52
<Hixie>
http://damowmow.com/playground/html-or-xml.html
08:52
<Hixie>
http://damowmow.com/playground/html-or-xml.xml
08:52
<Hixie>
(same file)
08:52
<Hixie>
technically it's not conforming to either HTML4 or XHTML1, but it validates as both
08:53
<zcorpan_>
Hixie: why is it not conforming?
08:53
<Hixie>
PIs aren't allowed
08:53
<Hixie>
iirc
08:56
<annevk>
Hixie, do you want replies to e-mails where I agree with your response (even if phrased as question)?
08:56
annevk
isn't keen on flooding the list with "Yeah", "Yeah", ...
08:57
<annevk>
Hixie, cool
08:57
<hsivonen>
is a conforming application of SGML allowed to ban PIs?
08:57
hsivonen
guesses no
08:57
hsivonen
assumes HTML 4.01 isn't conforming
08:58
<zcorpan_>
html4 says that authors are discouraged from using sgml features with little support in html uas, or some such
08:58
<zcorpan_>
don't think xhtml bans pis
08:59
<zcorpan_>
except in appendix c
09:01
<Hixie>
annevk: no, don't bother sending mail unless you want me to change the spec
09:01
<Hixie>
(i mean, you can, but i'm not tracking the issues that i've replied to, so it doesn't really do much)
09:02
<annevk>
good
09:04
hsivonen
still does not understand why Java JSON impls don't inherit JSONArray from LinkedList and JSONObject from HashMap
09:05
<othermaciej>
wouldn't you want an array to be an array, instead of a linked list?
09:05
<annevk>
Hixie, thanks for aligning the quirks mode sniffing with the html5lib implementation (whether intentional or not)
09:06
<othermaciej>
also, wouldn't restricting the allowed key/value types for HashMap violate the Liskov Substitution Principle?
09:07
<hsivonen>
othermaciej: given what "array" means in JSON, the obvious mutable Java mapping is an instance of the List interface
09:08
<othermaciej>
hsivonen: I'm assuming an array in JSON is much like a JavaScript Array
09:08
<othermaciej>
which is a sparse array, not a linked list
09:08
othermaciej
is not sure if JSON allows the elision syntax, if not, it's non-sparse
09:09
<hsivonen>
othermaciej: not having a JSONValue common subtype indeed would make generics with JSONArray and JSONObject ugly, but having those take java.lang.Objects with magic restrictions would eliminate annoying boxing code
09:10
<hsivonen>
othermaciej: as far as I can tell, JSONArray is conceptually a java.util.List
09:10
<hsivonen>
othermaciej: doesn't matter if it is backed by LinkedList or ArrayList
09:10
<othermaciej>
well, performance-wise it does
09:10
<othermaciej>
it's almost never good to use a linked list instead of an array
09:11
<hsivonen>
of course, but in API terms both are better than this boxing/unboxing drudgery I have to deal with
09:11
othermaciej
wonders what the difference is between ArrayList and Vector
09:11
<Hixie>
annevk: what did i change?
09:11
<hsivonen>
othermaciej: ArrayList doesn't do synchronized on its own
09:12
<hsivonen>
Vector is part of the overly thread-safe legacy
09:12
<othermaciej>
I see
09:12
<hsivonen>
would be cool to update onvdl to use non-synchronized collections some day
09:13
<annevk>
Hixie, the formatting and moving down the two doctypes for which the systemid has to be missing
09:15
Hixie
looks up the Liskov substitution principle and wonders why it has such a fancy name instead of being called "common sense"
09:15
<Hixie>
annevk: i did it purely for readability reasons, but i'm glad it made things better :-)
09:16
<othermaciej>
Hixie: people show more respect if you use a fancy term instead of saying "that violates common sense"
09:16
<Hixie>
fair point
09:17
<Hixie>
i had the same reaction when i learnt of De Morgan's laws at university
09:18
<hsivonen>
If I some day write my own JSON mapper for Java, JSONString will be a straight java.lang.String, JSONArray will be java.util.List<Object> and JSONObject will be java.util.Map<String, Object>
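A minimal sketch of the mapping hsivonen describes, using only JDK collections. It is hypothetical: no existing JSON library's API is implied, `JsonMappingSketch` is an illustrative name, and numbers as `Double` and booleans as `Boolean` are assumptions (the chat only names strings, arrays and objects). It shows why `equals()` becomes "sane by default": `List.equals` and `Map.equals` already compare contents.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical mapping: JSON string -> String, number -> Double,
// true/false -> Boolean, null -> null, array -> List<Object>,
// object -> Map<String, Object>.
class JsonMappingSketch {

    // Builds the equivalent of {"name": "test", "tags": ["a", "b"], "count": 2}
    public static Map<String, Object> sample() {
        List<Object> tags = new ArrayList<>(); // array-backed rather than linked
        tags.add("a");
        tags.add("b");

        Map<String, Object> obj = new LinkedHashMap<>(); // keeps member order
        obj.put("name", "test");
        obj.put("tags", tags);
        obj.put("count", 2.0); // autoboxed to Double; no JSON wrapper class
        return obj;
    }

    public static void main(String[] args) {
        Map<String, Object> x = sample();
        Map<String, Object> y = sample();
        // Map.equals compares values with equals(), which for the nested
        // List compares elements, so no external JSON equality test is needed.
        System.out.println(x.equals(y));

        // Changing a nested value breaks equality as expected.
        @SuppressWarnings("unchecked")
        List<Object> tags = (List<Object>) y.get("tags");
        tags.set(1, "c");
        System.out.println(x.equals(y));
    }
}
```

Because callers only see the `List` interface, the backing class (`ArrayList` here) can be chosen for performance without changing the API.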
09:18
<Hixie>
i was like "wait, this has a name? i've been doing this since i was 10"
09:18
<annevk>
whatwg.org down?
09:19
<Hixie>
it's having issues
09:20
<annevk>
oh well, there are two repositories now :)
09:21
<annevk>
maybe someone should get ambitious and make the html5.org tracker handle timeouts by switching repository
09:21
<Hixie>
http://www.dreamhoststatus.com/ usually has information about why whatwg.org or hixie.ch are down
09:21
<Hixie>
though not this time
09:24
<othermaciej>
hsivonen: yeah, too bad you can't make the generics quite exactly perfect without introducing a base class for simple values
09:24
<annevk>
it's back up it seems
09:24
<othermaciej>
I guess you have to either box primitive types and strings or make the interface a little looser than Java likes
09:25
<hsivonen>
othermaciej: the latter is so much nicer to program with
09:25
<hsivonen>
othermaciej: also, in this case, it would have made equal() sane by default
09:25
hsivonen
had to write an external JSON equality test
09:25
<othermaciej>
:-(
09:25
<hsivonen>
equals()
09:26
<hsivonen>
anyway, all that code is now done
09:26
<hsivonen>
and tests pass
09:34
<annevk>
Hixie, several changes such as &&, </ in <script>, etc. have only been updated in the parsing section and not in the "writing" equivalent sections
09:35
<Hixie>
oh, crap
09:35
<Hixie>
any idea what the "etc" are?
09:36
<annevk>
the other &... thingies
09:36
annevk
checks for more
09:36
<annevk>
that's it
09:36
<annevk>
although maybe <p>test</body> needs something... dunno
09:55
<Hixie>
good lord
09:55
<Hixie>
i can't work out how to write the requirements on magic cdata comment stuff for authors
09:59
<annevk>
you already did that
10:09
<Hixie>
turns out it's wrong
10:10
<Hixie>
it disallows <!-- <!-- -->
10:11
<annevk>
that makes me wonder what <!-- <!--> --> does
10:11
<Hixie>
it disallows that too
10:12
<Hixie>
both should be allowed
10:12
<annevk>
you can nest them?
10:15
<Hixie>
no
10:15
<Hixie>
<!-- <!--> --> is equivalent to <!- xx--> xxx
10:36
<Hixie>
right, fixed
10:45
<Hixie>
ugh http://bugs.webkit.org/show_bug.cgi?id=12646
10:45
<Hixie>
html5 breaking pages
10:45
<Hixie>
bummer
10:49
<zcorpan_>
is it <h3><h4> vs <h3><a><h4>?
10:49
<Hixie>
yes
10:51
<Hixie>
wait... IE lets you nest them anyway
10:51
<Hixie>
wtf
10:52
<zcorpan_>
indeed
10:52
<annevk>
the rendering doesn't change though
10:52
<annevk>
nested <h1> keep the same font size
10:53
<Hixie>
sure, because they use absolute font sizes
10:53
<annevk>
I suppose that might have been the reason for other browsers to not do the nesting
10:53
<zcorpan_>
<h1><p>
10:53
<Hixie>
we could do that too with the 'rem' units
10:53
<Hixie>
hah
10:53
<Hixie>
IE is weird
10:54
<Hixie>
all browsers do <h1><p> as nesting
10:54
<Hixie>
as does the spec
10:54
<zcorpan_>
ah, ok
10:55
<Hixie>
what should we do? allow header nesting and require the stylesheet to use 'rem' or 'px' units?
10:55
<Hixie>
or special-case <a>?
10:55
<Hixie>
hmm
10:55
<annevk>
<h1><p>x</h1>x :(
10:56
<Hixie>
":-(" ?
10:56
<annevk>
I get red text nodes in IE
10:56
<Hixie>
interoperable for me
10:56
<Hixie>
oh well yes
10:56
<Hixie>
IE goes red at a moment's notice
10:56
<annevk>
heh
10:56
<annevk>
I'd go for nested headers with rem I think...
10:57
<annevk>
(would be nice if rem took <body> into account...)
10:58
<Hixie>
it's sad since every other browser does it the same way
10:58
<hsivonen>
annevk: got tests for the crazy [R]CDATA escape flag?
10:59
<annevk>
yeah
10:59
<annevk>
tests5.dat
10:59
<annevk>
html5lib doesn't pass them though
10:59
<hsivonen>
annevk: ah. thanks. no tokenizer-level tests, though?
11:00
<hsivonen>
annevk: is it ok to write and check in some even if html5lib itself doesn't pass them yet?
11:00
<hsivonen>
annevk: shall I start a new file?
11:00
<annevk>
hsivonen, please make it a separate file
11:00
<hsivonen>
ok
11:00
<annevk>
hsivonen, I suppose some people want to disable it like they did with html5
11:00
<annevk>
euh, test5
11:05
<Hixie>
wouldn't tests that fail be the most useful kind of test?
11:06
<hsivonen>
Hixie: in TDD you want to pass tests--not fail :-)
11:06
<Hixie>
tdd?
11:07
<zcorpan_>
is the .constructor attribute specced anywhere? (can i use it in test cases?)
11:07
<Hixie>
zcorpan_: try ECMA262
11:08
<hsivonen>
Hixie: Test-Driven Development
11:10
<hsivonen>
argh. the % operator in Java is crazy
11:10
<hsivonen>
sane in Python
11:10
<Hixie>
ah
11:10
<Hixie>
well, sure, you want the tests to pass
11:11
<Hixie>
but when you write them they should fail
11:11
<Hixie>
new tests that pass seem pointless :-) (except for regression testing, of course, but that's boring :-P)
11:11
hsivonen
failed due to -1 % 4 resulting in -1 instead of 3
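The bug in a nutshell: Java's `%` is a remainder whose sign follows the dividend, so `-1 % 4` is `-1`, whereas Python's `%` follows the divisor and gives `3`. A minimal sketch of stepping a ring-buffer index backwards safely with `Math.floorMod` (available since Java 8); `ModuloSketch` and `stepBack` are illustrative names.

```java
// Java's % takes the sign of the dividend; Math.floorMod (Java 8+)
// takes the sign of the divisor, matching Python's % operator.
class ModuloSketch {

    // Step an index backwards through a buffer of the given size.
    // Plain (index - 1) % size would yield -1 when index is 0.
    public static int stepBack(int index, int size) {
        return Math.floorMod(index - 1, size);
    }

    public static void main(String[] args) {
        System.out.println(-1 % 4);               // -1
        System.out.println(Math.floorMod(-1, 4)); // 3
        System.out.println(stepBack(0, 4));       // 3
        System.out.println(stepBack(2, 4));       // 1
    }
}
```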
11:12
<zcorpan_>
Hixie: thanks
11:13
Hixie
comes across an e-mail showing yet another difference between IE and all other browsers
11:13
<Hixie>
<ol> <li> xxx </li> yyy </ol>
11:13
<zcorpan_>
yup
11:13
<Hixie>
i guess i'll sleep on it
11:14
<Hixie>
since we don't have a bug report for this one i'm tempted to leave it
11:14
<annevk>
I think we're fine with aligning more with IE
11:14
<zcorpan_>
in wysiwyg editors that use contentEditable or designMode, a nested list takes the form <ul><li>foo</li><ul><li>bar</li></ul></ul>
11:14
<Hixie>
if we don't, we'll end up putting comments and spaces in the wrong place
11:14
<Hixie>
zcorpan_: good times
11:14
hsivonen
makes his first check-in to html5lib
11:14
<zcorpan_>
if you write content with ie the end result is conforming because the </li> is ignored. if you write with other browsers the end result is non-conforming
11:15
<zcorpan_>
people blame other browsers
11:15
<Hixie>
yeah
11:15
<Hixie>
oh?
11:15
<Hixie>
uri to blame-giving?
11:15
<Hixie>
i haven't seen people complain about that
11:15
<Hixie>
that's interesting
11:15
<Hixie>
i'd love to read more
11:15
<zcorpan_>
s/people/wysiwyg tool authors/
11:16
<zcorpan_>
only over IM
11:16
<zcorpan_>
in swedish
11:16
<Hixie>
heh ok
11:16
<Hixie>
well changing these things is very risky and expensive
11:16
<Hixie>
but we'll see
11:16
<Hixie>
first though, i shall sleep
11:16
<Hixie>
nn
11:16
<zcorpan_>
nn
11:16
<hsivonen>
nn
11:18
<annevk>
hmm
11:18
<annevk>
handling <!-- in CDATA requires either rearchitecture or hooking into the tree construction stage
11:19
<hsivonen>
annevk: huh. WFM in the tokenizer
11:19
<annevk>
and I don't have a clear plan for the rearchitecture either
11:19
<annevk>
hsivonen, does your tokenizer have knowledge of tag names?
11:19
<hsivonen>
annevk: it knows about void elements
11:19
<zcorpan_>
"The location attribute of the HTMLDocument interface must return the Location object for that Document object." -- i don't know how to test this
11:22
<annevk>
hsivonen, is your impl online somewhere?
11:31
<hsivonen>
annevk: not yet. I had emailed cvsdude support about it. they just emailed my that I have to ask fantasai, so I emailed her to flip the switch
11:32
<hsivonen>
annevk: If you want it now, I can upload a .zip somewhere
11:32
<hsivonen>
s/my/me/
11:33
<annevk>
I can wait I suppose
11:33
<annevk>
there are some other changes I want to make
11:33
<hsivonen>
creating a dump now
11:34
<rubys>
anybody have plans to fix the tests that hsivonen just checked in?
11:34
<hsivonen>
rubys: fix the tests or fix the impl to pass the tests?
11:34
<annevk>
yes, but I'm not sure how
11:34
<annevk>
(talking about fixing the impl here)
11:35
<rubys>
hsivonen: I'm just trying to figure out whether this will be fixed shortly, or if these tests should be added to the todo list.
11:36
<annevk>
If someone can give me a plan for how to do it I'll fix the Python code
11:36
<annevk>
until that happens it's prolly TODO work
11:36
<hsivonen>
annevk: http://hsivonen.iki.fi/htmlparser-dump.zip
11:36
<hsivonen>
annevk: look in nu.validator.*
11:36
<hsivonen>
annevk: fi.iki.* is legacy
11:37
<hsivonen>
annevk: the impl should be in sync with June 17th spec for everything except entities which is in sync with June 12th
11:41
<hsivonen>
annevk: see the last lines of read() as well as dataState()
11:45
<annevk>
HTML4 errors, nice
11:46
<hsivonen>
annevk: as for the plan, I suggest you do what I'm doing, but you need to figure out which one of Python's data structures is the most efficient one for that kind of low level stuff
11:46
hsivonen
guesses either list or string
11:48
<annevk>
the problem is getting the last set of characters
11:49
<annevk>
getting this to work fast is another issue...
11:49
<annevk>
for instance, since < triggers closetagopen state that will emit a character '<' on its own
11:49
<annevk>
which is only accessible from the tree construction stage
11:50
<hsivonen>
annevk: can't whatever you use to read the next char put the char into a four-slot ring buffer?
11:51
<annevk>
just for the data state?
11:51
<annevk>
that makes some sense
11:51
<hsivonen>
annevk: don't you have an equivalent for my read()?
11:51
<annevk>
I do
11:51
<annevk>
so that would actually make it fairly trivial :)
11:52
<annevk>
I think...
11:54
<hsivonen>
if (contentModelFlag != ContentModelFlag.PCDATA) {
11:54
<hsivonen>
prevFourPtr++;
11:54
<hsivonen>
prevFourPtr %= 4;
11:54
<hsivonen>
prevFour[prevFourPtr] = c;
11:54
<hsivonen>
}
11:54
<hsivonen>
return c;
11:55
<hsivonen>
where the if may be a premature optimization
11:56
<rubys>
in python/ruby you do just as well with push and pop(0)
11:57
<hsivonen>
rubys: ok. I'm not familiar enough with the performance characteristics of python/ruby data structures for this kind of thing
12:15
<annevk>
hsivonen, do you just do that in the dataState?
12:15
annevk
wonders if it covers <!</-- well enough
12:16
<hsivonen>
annevk: didn't Hixie spec it for the data state only?
12:17
<hsivonen>
annevk: read() updates the ring buffer from underneath the states
12:20
<annevk>
ah ok, that clarifies it
12:21
<hsivonen>
it's crazy how long the read() method needs to be in order to get all the right things done
12:27
<annevk>
our read method is on the input stream object
12:29
<hsivonen>
for efficient SAX character data reporting, the tokenizer in Java needs to have access to its buffer
13:10
<annevk>
you actually only need to look at the last three characters, no?
13:21
<zcorpan_>
"The href attribute returns the address of the page represented by the associated Document object, as an absolute IRI reference." -- this means that both percent-encoded and not percent-encoded are ok, right?
13:24
<annevk>
prolly
13:42
<annevk>
reserializing this document gets ugly: <script><!--
13:42
<hsivonen>
annevk: yeah, the current character has already been seen, so it is sufficient to examine the three previous ones
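A self-contained sketch of the four-slot ring buffer idea, in the spirit of hsivonen's `read()` fragment but not his actual code. Assumptions: the escape flag is set when the last four characters are `<!--` and cleared when the last three are `-->`; content model flags and end-tag handling are left out, and `EscapeFlagSketch` is a hypothetical name.

```java
// Simplified sketch: remember the last four characters of CDATA/RCDATA
// content in a ring buffer and use them to maintain the escape flag.
class EscapeFlagSketch {
    private final char[] prevFour = new char[4];
    private int ptr = -1;          // index of the most recent character
    private int count = 0;         // characters seen so far, capped at 4
    private boolean escape = false;

    // Record one character and update the escape flag.
    public void read(char c) {
        ptr = (ptr + 1) % 4;       // ptr never goes negative, so % is safe
        prevFour[ptr] = c;
        if (count < 4) {
            count++;
        }
        if (!escape && lastCharsAre("<!--")) {
            escape = true;
        } else if (escape && lastCharsAre("-->")) {
            escape = false;
        }
    }

    // True if the most recent s.length() characters spell s (length <= 4).
    private boolean lastCharsAre(String s) {
        int n = s.length();
        if (n > count) {
            return false;
        }
        for (int i = 0; i < n; i++) {
            // i = 0 is the oldest of the n characters being compared
            int idx = Math.floorMod(ptr - (n - 1 - i), 4);
            if (prevFour[idx] != s.charAt(i)) {
                return false;
            }
        }
        return true;
    }

    public boolean isEscaped() {
        return escape;
    }

    public static void main(String[] args) {
        EscapeFlagSketch t = new EscapeFlagSketch();
        for (char c : "a <!-- </b> ".toCharArray()) {
            t.read(c);
        }
        System.out.println(t.isEscaped()); // true: "</b>" seen while escaped
        for (char c : "-->".toCharArray()) {
            t.read(c);
        }
        System.out.println(t.isEscaped()); // false
    }
}
```

Only the current character plus the three before it are ever inspected, which is why four slots suffice.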
13:49
<annevk>
hmm
13:49
<annevk>
it's sort of working but I'm hitting weird bugs I can't figure out
14:15
<annevk>
hah, I think I nailed it
14:15
<annevk>
running tests it doesn't seem slower so far
14:16
<annevk>
time to update the tests
14:31
<annevk>
hsivonen, one of your tests has a bug
14:32
<annevk>
hmm
14:33
<hsivonen>
annevk: oh. which one?
14:33
<annevk>
foo<!--></bar><!-->baz</bar>
14:33
<annevk>
<!--> is either a single comment or the start of one
14:34
<annevk>
it's certainly not character data
14:34
<hsivonen>
annevk: why not?
14:34
<annevk>
after the first </bar> is emitted you switch to PCDATA
14:35
<annevk>
we do at least, I suppose that could differ per implementation
14:36
<hsivonen>
oh, right. the latter one
14:36
<annevk>
yeah, the second
14:36
<zcorpan_>
annevk: <!--> can be the start of a comment?
14:36
<annevk>
it also says "end tag surrounded"
14:36
<annevk>
zcorpan_, that was previously the case
14:36
<zcorpan_>
ah
14:37
<annevk>
hsivonen, that should also be changed I suppose
14:37
<hsivonen>
I start to suspect my test harness or impl is broken
14:38
<hsivonen>
brokenness with harness
14:38
hsivonen
blushes
14:39
<hsivonen>
my impl is totally b0rked, too
14:47
<hsivonen>
my harness was so b0rked it isn't even funny
14:49
<annevk>
another advantage of multiple implementations
14:57
<annevk>
our test coverage is pretty good btw
14:57
<annevk>
each change I make triggers at least one error in the testsuite
14:57
<annevk>
I usually add some regression tests while I'm at it
15:04
<hsivonen>
annevk: has the handling of <!--> in PCDATA changed since June 17th?
15:04
<annevk>
yes
15:04
<annevk>
there are two new comment states to handle <!--> and <!---> as near-empty incorrect comments
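A simplified sketch of what the two new states change: after the `<!--` opener, an immediate `>` or `->` ends the comment at once with empty data and a parse error, instead of opening a comment that runs to the next `-->`. This is illustrative Java only; `CommentSketch` is a hypothetical name, it loosely mirrors a "comment start" / "comment start dash" pair of states, and it does not reproduce the spec's handling of `--` inside comments or its EOF rules.

```java
// Simplified comment tokenizer for input that starts with "<!--".
class CommentSketch {

    static final class Comment {
        final String data;
        final int end;            // index just past the consumed input
        final boolean parseError;

        Comment(String data, int end, boolean parseError) {
            this.data = data;
            this.end = end;
            this.parseError = parseError;
        }
    }

    static Comment parse(String s) {
        if (!s.startsWith("<!--")) {
            throw new IllegalArgumentException("input must start with <!--");
        }
        return afterOpener(s, 4);
    }

    private static Comment afterOpener(String s, int i) {
        // "comment start": "<!-->" is an empty, erroneous comment
        if (i < s.length() && s.charAt(i) == '>') {
            return new Comment("", i + 1, true);
        }
        // "comment start dash": "<!--->" is also an empty, erroneous comment
        if (i + 1 < s.length() && s.charAt(i) == '-' && s.charAt(i + 1) == '>') {
            return new Comment("", i + 2, true);
        }
        // Otherwise the comment runs to the next "-->" or to end of input.
        int close = s.indexOf("-->", i);
        if (close < 0) {
            return new Comment(s.substring(i), s.length(), true); // EOF in comment
        }
        return new Comment(s.substring(i, close), close + 3, false);
    }

    public static void main(String[] args) {
        for (String input : new String[] {"<!-->x", "<!--->x", "<!-- a -->x"}) {
            Comment c = parse(input);
            System.out.println("[" + c.data + "] end=" + c.end + " error=" + c.parseError);
        }
    }
}
```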
15:04
<hsivonen>
argh
15:05
<hsivonen>
I guess I have to go diffing the spec again
15:05
<annevk>
you misunderstand, this is a good thing :)
15:05
<hsivonen>
sure
15:06
<hsivonen>
annevk: thanks
15:08
<annevk>
I think I'll implement that now and see what breaks
15:30
<annevk>
hsivonen, I checked in basic <!--> and <!---> tokenizer tests
15:34
<rubys>
annevk: I'm seeing two failures w/python, is that what you are seeing?
15:35
<annevk>
I have three
15:35
<annevk>
I suspect one of them is just something on my side
15:35
<annevk>
There's one failure in escapeFlag, but that's because the testcase has to be fixed
15:35
<rubys>
do a svn up... you might not have t.broyer's fix
15:35
<annevk>
and there's a failure with respect to newlines
15:35
<annevk>
I've no idea how to fix that
15:36
<annevk>
I got his fixes
15:36
<rubys>
when the code stabilizes, I'll take a look at test_newlines
15:40
<rubys>
... which I guess is now (down to one failure)
15:41
<annevk>
yeah, for the moment
15:47
<rubys>
fixed
15:52
<annevk>
cool
15:54
<rubys>
planning to break more stuff? :-) Or is it safe to try to port these changes to Ruby now?
15:56
<annevk>
heh, I think I'll stop doing html5lib for the rest of the day
16:00
<annevk>
things we need to do at some point:
16:00
<annevk>
* fix the new innerHTML stuff (adding a newline for <pre> and <textarea>)
16:00
<annevk>
* fix <isindex>
16:00
<annevk>
* implement almost standards mode
16:00
<annevk>
* <p></body>
16:01
<rubys>
"almost standards mode" doesn't apply to fragments, does it?
16:02
<annevk>
currently quirks mode and almost standards mode don't affect parsing at all
16:02
<annevk>
they're just determined during tree construction
16:02
<rubys>
ah, gotcha
16:04
<rubys>
why did you comment out #import hotshot, hotshot.stats in mockParser?
16:05
<annevk>
oh, that happened by accident I think
16:05
<annevk>
I don't have hotshot here and wanted to play with the other parts
16:05
<rubys>
ok, I'll include that in my next commit.
16:06
<rubys>
I'll move the import down to where it is needed
17:28
<mitsuhiko>
is there lxml support for html5lib around somewhere?
17:28
<mitsuhiko>
i want xpath ;)
17:43
<rubys>
libxml2 is supported in python html5lib
17:57
<mitsuhiko>
rubys: really?
17:57
<mitsuhiko>
i can't find it somehow
18:04
<Philip`>
mitsuhiko: Are you looking in the 0.9 release rather than the SVN version?
18:05
<mitsuhiko>
Philip`: no. svn version
18:11
<Philip`>
mitsuhiko: http://html5lib.googlecode.com/svn/trunk/python/src/treebuilders/__init__.py - looks like it needs etree with implementation=lxml.etree
18:11
<mitsuhiko>
ah
18:12
<Philip`>
(I've not tried using it myself, though)
19:03
<Jero>
does the <dialog> element also apply to comments on a weblog?
19:34
<jgraham>
mitsuhiko: Did you get it working?
19:34
<mitsuhiko>
jgraham: i haven't further tried because i found out that lxml is not exactly what i need
19:35
<mitsuhiko>
basically what i want is a simple tree i push to plugins, which then manipulate it; i pickle it afterwards and render it on request
19:36
<jgraham>
Does lxml not fulfil that use case?
19:36
jgraham
wonders if picking wouldn't work
19:36
<jgraham>
pickling even
19:36
<mitsuhiko>
jgraham: it's out of scope because it's not a pure-python library
19:36
<mitsuhiko>
(unfortunately)
19:37
<jgraham>
Oh, you need to work with the python stdlib?
19:37
<mitsuhiko>
no, just not binary stuff
19:37
<jgraham>
Does that not rule out XPath?
19:37
<mitsuhiko>
jgraham: indeed. that's why i found a different solution
19:38
<jgraham>
What are you using?
19:39
<mitsuhiko>
i have a minimal Node class with a filter function that uses a very simple query "language" that is easily parsable
19:39
<mitsuhiko>
(in theory performance doesn't matter because i put the pickled stuff into the database but i want to keep it simple)
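mitsuhiko doesn't show his code, so the following is only a guess at the shape he describes. It is a pure-Python Node tree with a filter method driven by a tiny, easily parsed query string, and it can be pickled and stored. The class, the method names and the query syntax ("tag", "*", "tag[attr]", "tag[attr=value]") are all hypothetical.

```python
import pickle
import re
from dataclasses import dataclass, field
from typing import Dict, Iterator, List

# Hypothetical query syntax: "tag", "*", "tag[attr]" or "tag[attr=value]".
_QUERY = re.compile(
    r"^(?P<tag>\*|[\w-]+)(?:\[(?P<attr>[\w-]+)(?:=(?P<value>[^\]]*))?\])?$"
)


@dataclass
class Node:
    """A plain-Python tree node: no compiled extensions, pickles with the stdlib."""
    tag: str
    attrs: Dict[str, str] = field(default_factory=dict)
    children: List["Node"] = field(default_factory=list)
    text: str = ""

    def iter(self) -> Iterator["Node"]:
        """Yield this node and all of its descendants in document order."""
        yield self
        for child in self.children:
            yield from child.iter()

    def filter(self, query: str) -> List["Node"]:
        """Return every node in this subtree that matches the query string."""
        match = _QUERY.match(query)
        if match is None:
            raise ValueError(f"unsupported query: {query!r}")
        tag, attr, value = match.group("tag", "attr", "value")
        found = []
        for node in self.iter():
            if tag != "*" and node.tag != tag:
                continue
            if attr is not None and attr not in node.attrs:
                continue
            if value is not None and node.attrs.get(attr) != value:
                continue
            found.append(node)
        return found


if __name__ == "__main__":
    tree = Node("div", children=[
        Node("a", {"href": "/one"}),
        Node("p", children=[Node("a", {"href": "/two", "rel": "nofollow"})]),
    ])
    # A plugin might mark every nofollow link...
    for link in tree.filter("a[rel=nofollow]"):
        link.attrs["class"] = "external"
    # ...and the whole tree can then be pickled for storage and restored later.
    restored = pickle.loads(pickle.dumps(tree))
    print([n.attrs for n in restored.filter("a")])
```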
19:40
<mitsuhiko>
but the html5 library is really nice :D
20:00
<annevk>
Jero, I'd suggest e-mailing the list; that is a good example of either good usage or bad usage
20:01
<annevk>
Jero, personally I think it would be incorrect; <article> is for marking up comments iirc
20:01
<Jero>
annevk: yeah, I saw that too, but <dialog> also seemed like a good candidate. I'll email the list, thanks
20:02
<annevk>
krijnh, pointer?
20:02
<annevk>
hmm
20:21
<annevk>
http://pkarl.com/notebook/if-you-dont-approve-of-html5-then-youre-a-communist/
20:22
<Lfe>
got to love how blogs have improved page titles :)
20:22
<annevk>
http://www.burningbird.net/technology/marathon-20/
20:40
<Jero>
does anyone know anything about the http://php-html5lib.dashslot.net/trac project?
21:11
Philip`
wonders why he always ends up writing far more than he intends to
21:12
<Hixie>
just go back and remove sentences afterwards :-)
21:12
<Hixie>
i often write e-mails twice as long as the ones i eventually send out
21:12
<Hixie>
just have to apply a razor
21:23
<Hixie>
annevk: maybe try "Conformance requirements phrased as algorithms or specific steps may be implemented in any manner, so long as the end result is equivalent."
21:25
<annevk>
christ
21:25
<annevk>
got to love comments like that
21:26
<Hixie>
yeah
21:26
<Hixie>
the text above is what html5 says
21:26
<Hixie>
fwiw
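Hixie's sentence is what lets an implementation replace the literal steps with anything that gives identical results. Below is a minimal sketch of the idea, using a made-up "strip leading and trailing space characters" algorithm. The function names and the SPACE_CHARACTERS set are assumptions for illustration, not text from HTML5. One function follows the steps literally, the other uses Python's str.strip. A conforming implementation could use either, because both return the same string for every input.

```python
# Assumed whitespace set for this illustration only.
SPACE_CHARACTERS = " \t\n\f\r"


def strip_spaces_stepwise(s: str) -> str:
    """Follow the steps literally: advance a start pointer, then retreat an end pointer."""
    start = 0
    while start < len(s) and s[start] in SPACE_CHARACTERS:
        start += 1
    end = len(s)
    while end > start and s[end - 1] in SPACE_CHARACTERS:
        end -= 1
    return s[start:end]


def strip_spaces_builtin(s: str) -> str:
    """A different implementation with the same end result."""
    return s.strip(SPACE_CHARACTERS)


if __name__ == "__main__":
    for sample in ["", "   ", "\tabc\n", " \f x y \r"]:
        assert strip_spaces_stepwise(sample) == strip_spaces_builtin(sample)
    print("equivalent on all samples")
```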
21:26
<annevk>
k
21:26
<Dashiva>
How about just combining them? "MUST follow either the algorithm below or an algorithm giving identical results"
21:27
<gsnedders>
I prefer the current text over that
21:27
<othermaciej>
I think instead of changing the conformance requirements for every algorithm, the text that says equivalent algorithms are ok could be made more explicit
21:28
<annevk>
I think that's what Hixie suggested above
21:28
<Hixie>
Dashiva: there's like a thousand occurrences of the text "must follow these steps:" in html5, i'm not replacing every single one with weasel-wording
21:29
<othermaciej>
I did not see his text above
21:29
<othermaciej>
I guess I should check the logs
21:29
<Hixie>
22:31 < Hixie>|annevk: maybe try "Conformance requirements phrased as algorithms or specific steps may be implemented in any manner, so long as the end result is equivalent."
21:29
<Hixie>
(it's what's in html5)
21:30
<othermaciej>
that sounds pretty good
21:32
<Hixie>
it wouldn't satisfy people who are looking for problems where there are none, sadly
21:37
<othermaciej>
if you wanted to be even more explicit you could say "Wherever this user agent says that a user agent must follow a particular algorithm or sequence of steps, user agents may use any algorithm that has the same results."
21:37
<othermaciej>
just so no one imagines a must / may conflict
21:38
<Dashiva>
That's a good one
21:38
<othermaciej>
s/this user agent/this specification/
21:52
<annevk>
bjoern disagrees...
21:52
<annevk>
I think I'll call it a day
21:52
<annevk>
too boring to worry about now
21:53
<Hixie>
hear hear
22:04
<Hixie>
hey, anyone have a link to that summary of why versioning is bad? i think it was either henri or lachy that wrote it
22:04
<Hixie>
can't find it anywhere
22:07
<gavin>
http://lists.w3.org/Archives/Public/public-html/2007Apr/0858.html , maybe?
22:07
<gavin>
I recall one from hsivonen that I can't find, either
22:09
<Hixie>
yeah that's all i could find too (arguments against IE's plan, rather than versioning in general)
22:09
<gavin>
this is going to bother me now
22:10
<bewest>
Hixie: some people interpret the versioning issue to imply that, since html(5) will have no versioning in the doctype, every feature of html5 is perfect and won't need revising.
22:11
<Hixie>
yeah
22:12
<bewest>
at least, that is the complaint in #web
22:12
<bewest>
among others.
22:14
<hasather>
Hixie: http://lists.w3.org/Archives/Public/public-html/2007JanMar/0433.html?
22:15
<Hixie>
hm yes
22:15
<Hixie>
i think that was it!
22:15
<Hixie>
thanks!
22:16
<gavin>
ah, good
22:17
<hasather>
Hixie: there's also http://lists.w3.org/Archives/Public/public-html/2007Apr/0279
22:18
<Hixie>
i think he sent one to www-tag at some point, too
22:19
<hasather>
found those via zcorpan's del.icio.us :)
22:19
<Hixie>
dbaron's is pretty good too, yeah
22:22
Hixie
sends his mail to www-archive
22:25
<hendry>
are there any specs for spreadsheet type support in html?
22:25
<hendry>
there's mathml, I know, but is that it?
22:25
<Hixie>
define "spreadsheet type support"?
22:26
<Hixie>
http://spreadsheet.google.com/ is all HTML
22:26
<hendry>
functions and stuff I guess
22:26
<hendry>
:)
22:26
<bewest>
and there's an API for the spreadsheets stuff
22:26
<hendry>
though that's a representation?
22:33
<othermaciej>
annevk, Hixie: I think to be really and truly proper, it is a bit nicer to explicitly say "as if" at each site mentioning an algorithm, but a blanket clarification in the conformance section is also fine
22:33
<othermaciej>
bjoern's suggested alternative is just crazy
22:40
<Hixie>
the html5 spec is hard enough to read already without having pointless "as if"s all over the place
22:40
<Hixie>
everyone to whom it actually matters is completely aware that it's "as if" anyway
22:41
gsnedders
just wants to say "as if it's hard to read!"
22:47
<hsivonen>
en-GB-x-hixie-x-valleygirl
22:48
<gsnedders>
as if.
22:53
<Philip`>
It's annoyingly hard to concentrate when there are two simultaneous firework displays outside
23:08
<Hixie>
wow
23:08
<Hixie>
Opera actually supports <![CDATA[ ]]> in text/html
23:08
<Hixie>
it creates an actual #cdata-section
23:08
<Hixie>
i'm glad i added support for #cdata-sections to the live DOM viewer now
23:08
<Hixie>
i really didn't think that code would ever get hit
23:16
<Philip`>
Hixie: It seems to be implemented pretty weirdly - it dislikes having either [ or ] inside the CDATA section, which doesn't make much sense to me...
23:17
<Philip`>
Oh, actually it just dislikes it when the number of [ is not equal to the number of ]
23:23
<jgraham>
Philip`: Where are you with all the fireworks?
23:23
<Philip`>
http://software.hixie.ch/utilities/js/live-dom-viewer/?%3C%21%5BCDATA%5B%20%5B%5B%5D%5D%3E%20%5D%5D%3E - ah, it looks very much like it has a count of how deeply nested it is, and if sees a > after the nesting level has got back to 0 then it exits CDATA mode
23:24
<Philip`>
http://software.hixie.ch/utilities/js/live-dom-viewer/?%3C%21%5BCDATA%5B%5Dxyzw%5Dabcd%3E - and once it's seen the > at nesting-level 0, it deletes the last two characters (regardless of what they were)
23:26
<Philip`>
...which all seems like a totally bogus thing to do
23:27
<Hixie>
it drops everything between the last ] and the > as far as i can tell
23:27
<Hixie>
oh no
23:27
<Hixie>
you're right
23:29
<Hixie>
has to be exactly the string <![CDATA[, too
23:29
<Hixie>
that's so funny
23:29
<Hixie>
i wonder how this can be abused
23:33
<Philip`>
Looks like it parses exactly the same (requiring nested brackets and everything) if it just has <![ but only creates the cdata-section if it's followed by cdata[
23:33
<Hixie>
huh
23:33
<Hixie>
weirder and weirder
23:35
<Philip`>
Oh, no, it does it you have <!] too (then the nesting level starts at -1 so you have to do <!][> to get out of it)
23:36
<Philip`>
+if
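Philip`'s observations about Opera can be restated as a small model. This is a hypothetical reconstruction from the live DOM viewer test cases above, not Opera's code. It assumes the following:

- the [ after <! and the [ after CDATA each raise a bracket depth, so the depth is 2 once <![CDATA[ has been read;
- every [ or ] in the content raises or lowers that depth;
- a > seen at depth 0 ends the section;
- the last two characters before that > are dropped, whatever they are.

Only sections starting with exactly <![CDATA[ are modelled. The function name is made up.

```python
from __future__ import annotations

PREFIX = "<![CDATA["


def scan_opera_style_cdata(markup: str) -> tuple[str, str]:
    """Return (cdata_text, remaining_markup) under the observed bracket-counting rules."""
    if not markup.startswith(PREFIX):
        raise ValueError("expected markup to start with exactly <![CDATA[")
    depth = 2  # one for the '[' after '<!', one for the '[' after 'CDATA'
    for i in range(len(PREFIX), len(markup)):
        ch = markup[i]
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1  # may go negative; a later '[' can bring it back to 0
        elif ch == ">" and depth == 0:
            body = markup[len(PREFIX):i]
            # Observed behaviour: the two characters before '>' are discarded
            # regardless of what they are (normally the closing "]]").
            return body[:-2], markup[i + 1:]
    raise ValueError("unterminated section (end-of-file handling not modelled)")


if __name__ == "__main__":
    print(repr(scan_opera_style_cdata("<![CDATA[foo]]>bar")))
    print(repr(scan_opera_style_cdata("<![CDATA[ [[]]> ]]>")))
    print(repr(scan_opera_style_cdata("<![CDATA[]xyzw]abcd>")))
```

The second and third calls are the two test strings from Philip`'s live DOM viewer links. In the second, the inner > does not end the section because the depth is still 2 at that point.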
23:36
<Hixie>
lol
23:36
<Hixie>
nic8e
23:36
<Hixie>
nice
23:36
<Hixie>
how about <!--, does that still have the counting?
23:46
<Philip`>
jgraham: In Cambridge (where it looks like most of the people are quite happy to have stopped working for the next few months :-) )
23:47
<jgraham>
Snap :)
23:47
<Philip`>
Hixie: I can't see Opera doing anything other than waiting until it sees --> after it's got <!--
23:48
<Philip`>
though it does do the []-counting if you just have <!- followed by anything except -
23:51
<Philip`>
jgraham: Are you somewhere near the centre of Cambridge?
23:52
<jgraham>
Yeah
23:52
<Hixie>
Philip`: interesting
23:53
Philip`
was perfectly located to see the fireworks as long as he opened his window quite wide and let all the rain in
23:54
<jgraham>
My view is usually blocked so I didn't try very hard
23:58
<Hixie>
only 122 more e-mails in my html-parsing-rules folder
23:58
<Hixie>
(not counting the -encoding subfolder, etc)
23:58
<bewest>
oo, that reminds me to send some feedback from a coworker