00:00
<jgraham>
(to be fair it says "sign in with your gmail account" but it doesn't say that you failed to do so)
00:00
<Hixie>
jgraham: feedback conveyed
00:00
<Hixie>
and they agreed, so hopefully it'll be fixed :-)
00:01
<jgraham>
Great :)
00:01
<Hixie>
webben: <header> elements are ways of wrapping multiple <hx> elements into one header, so you can have tag lines, e.g.
00:03
webben
can't understand why a tagline would want to be inside a <hx> element
00:04
<webben>
but that's not quite what I was asking ... is <header><h3>foo</h3></header> okay?
00:04
<jgraham>
webben It turns out that lots of people do that in the real world. It breaks any tool that tries to generate a document outline but they think it's "more semantic"
00:05
<webben>
jgraham, yeah but they're crazy
00:05
<Hixie>
<header><h3>foo</h3></header> is equivalent to <h3>foo</h3> iirc
00:05
<Hixie>
or maybe equivalent to <h1>foo</h1>
00:05
<Hixie>
i forget
00:05
<Hixie>
see the spec :-)
00:05
<jgraham>
Well, maybe. For subheadings what they are trying to do makes a lot of sense
00:06
<webben>
jgraham, ah you're talking about <header><h1>foo</h1><h2>bar</h2></header> not <header><h3>foo</h3></header> when you say "that"?
00:06
<webben>
why not just have a subhead
00:06
<webben>
element
00:07
<Hixie>
because you might have:
00:07
<Hixie>
<header><p>Welcome to...</p><h1>My home!</h1><h2>or what some people might call "my cube"</h2></header>
00:07
<jgraham>
But this is why the spec should be clear about what the use cases behind semantic constructs are so there is some hope people won't break well-intentioned UAs by over-broadening their element use
00:07
<Hixie>
yeah
00:07
<Hixie>
i thought the examples for <header> were clear
00:08
<webben>
Hixie, yeah ... I don't understand the use of <h2> there
00:08
<jgraham>
Although HTML4 did that with <hx> and it didn't help much
00:08
<webben>
http://www.w3.org/TR/html4/struct/global.html#h-7.5.5 was lousy
00:09
<webben>
doesn't even say MUST be used in order
00:09
<webben>
instead "Some people consider skipping heading levels to be bad practice."
00:09
<webben>
what a cop out
00:09
<jgraham>
So, could I trouble someone to invite me to join gmail?
00:09
<webben>
jgraham, sure
00:09
jgraham
is the last person in the universe with no gmail account
00:10
<Hixie>
heh
00:10
<jgraham>
My email address is jg307⊙cau
00:11
<webben>
jgraham, there you go (hopefully)
00:11
<jgraham>
webben: thanks :)
00:12
<webben>
why can't one have a <heading> as a descendant of a <heading> ?
00:13
<webben>
e.g. if you have a long document with a bit-per-page view and an all-in-one view
00:13
<webben>
you might have subsections with headings with taglines
00:15
<jgraham>
Can we go with "because my head would explode trying to work out how to generate an outline for it"? ;)
00:16
<webben>
jgraham, not nearly as much as the author trying to revise programmatically said document for all-in-one viewing
00:17
<webben>
Of course if <heading> was simply one <hX> rather than as many as you want
00:17
<webben>
and if hX are in order
00:17
<webben>
then outlining would be unproblematic
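Here is a minimal sketch of webben's point. It assumes each heading is a single <hN> and that levels are never skipped. Under those assumptions, an outline falls out of a simple stack walk over (level, title) pairs. This is not the spec's outline algorithm, which also has to deal with sectioning elements and <header>. `Section`, `build_outline` and `render` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Section:
    level: int
    title: str
    children: List["Section"] = field(default_factory=list)


def build_outline(headings: List[Tuple[int, str]]) -> List[Section]:
    """Nest (level, title) pairs by heading rank.

    Assumes one <hN> per heading and levels used in order (no skipping).
    """
    root = Section(level=0, title="(document)")
    stack = [root]
    for level, title in headings:
        # Climb back up until the top of the stack outranks this heading.
        while stack[-1].level >= level:
            stack.pop()
        section = Section(level, title)
        stack[-1].children.append(section)
        stack.append(section)
    return root.children


def render(sections: List[Section], depth: int = 0) -> List[str]:
    """Flatten the outline into indented lines, two spaces per level."""
    lines: List[str] = []
    for s in sections:
        lines.append("  " * depth + s.title)
        lines.extend(render(s.children, depth + 1))
    return lines
```

With this naive approach, a tagline marked up as <h2> inside <header> shows up as a nested section. That is exactly the outline breakage jgraham describes.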
00:18
<Hixie>
there's like an exact spec for how to create an outline
00:18
<Hixie>
just implement that
00:18
<Hixie>
and your life will be good
00:20
<webben>
why are headings inside blockquotes part of the TOC?
00:20
<jgraham>
Hixie: I know. But as I recall, it's pretty complicated
00:21
<webben>
ah i see
00:21
<webben>
they aren't
00:21
webben
provides a demonstration.
00:21
<Hixie>
jgraham: yeah, but that shouldn't affect implementing it. he just has to follow the spec. :-)
00:22
<hsivonen>
FYI: http://www.intertwingly.net/blog/2006/12/01/The-White-Pebble#c1165015647
00:22
<webben>
Will authors understand it?
00:22
Kanashii
(n=Kanashii⊙plbion) Quit ()
00:28
<Hixie>
"Breaking XML is too politically incorrect even for the WHATWG."
00:28
<Hixie>
haha
00:28
<Hixie>
nice
00:28
<Hixie>
we could try!
00:28
<Hixie>
XML5!
00:29
<Hixie>
maybe sometime after SVG5!
00:32
<jgraham>
Can we replace all the angle brackets with something more aesthetically pleasing? ;)
00:32
tantek
(n=tantek⊙adspn) Quit (Read error: 54 (Connection reset by peer))
00:32
<Hixie>
i'd love to
00:32
<Hixie>
but backwards compatibility forces us to keep them
00:32
<Hixie>
:-)
00:34
<jgraham>
I've put all the html5 python code I have written up on google code: http://code.google.com/p/html5lib/
00:34
<hsivonen>
Hixie: only the W3C gets to break XML
00:34
<hsivonen>
(I mean XML 1.1)
00:34
<Hixie>
heh
00:34
<hsivonen>
speaking of which
00:35
<jgraham>
(Note: nothing there works)
00:35
<hsivonen>
the spec should require XML 1.0--not "some version"
00:35
<webben>
why?
00:35
<Hixie>
i have no idea what thomas broyer is asking for
00:35
<Hixie>
i hate it when i can't work out what someone wants
00:35
<jgraham>
(but I don't want to end up with 3 different efforts to do the same thing)
00:35
<hsivonen>
webben: XML 1.1 is a huge compatibility problem and PITA
00:36
<hsivonen>
webben: and XHTML5 does not have Cambodian tags
00:36
<hsivonen>
Khmer tags, I should say
00:36
<Hixie>
i'm not requiring XML 1.1 for the same reason that I _am_ defining XHTML at all
00:36
<Hixie>
er 1.0
00:36
<Hixie>
namely, if i require xml 1.0, someone will have to define their own serialisation using 1.1.
00:37
<hsivonen>
good point
00:37
<Hixie>
(but i agree with you in principle)
00:37
<hsivonen>
I will enforce 1.0
00:37
<Hixie>
you can do that, just by being an XML 1.0 Conformant Processor :-)
00:40
webben
is confused ... how can you both not require 1.0 and enforce 1.0 ?
00:43
<hsivonen>
webben: Hixie doesn't require but I do
00:44
<webben>
you mean with your validator?
00:44
<hsivonen>
webben: yes
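A validator-side check of the kind hsivonen describes could be sketched like this. It is an illustration under stated assumptions, not the validator's actual code. `xml_version_error` is a hypothetical name. The input is assumed to be already-decoded text with any byte order mark removed. A document with no XML declaration is treated as XML 1.0, as the XML specifications do.

```python
import re
from typing import Optional

# An XML declaration at the very start of the document, for example
# <?xml version="1.0" encoding="utf-8"?>; either quote style is accepted.
XML_DECL = re.compile(r"""<\?xml\s+version\s*=\s*(["'])([^"']*)\1""")


def xml_version_error(document: str) -> Optional[str]:
    """Return an error message if the document declares an XML version
    other than 1.0, or None if the version is acceptable."""
    match = XML_DECL.match(document)
    if match is None:
        # No declaration (or not at the start): the document is XML 1.0.
        return None
    version = match.group(2)
    if version != "1.0":
        return f"XML version {version!r} is not allowed; only XML 1.0 is conforming."
    return None
```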
00:44
<webben>
can XHTML 1.1 be in XML 1.1?
00:45
<hsivonen>
webben: it wouldn't be conforming, AFAIK
00:45
<hsivonen>
http://hsivonen.iki.fi/validator/html5/?doc=http%3A%2F%2Fhsivonen.iki.fi%2Ftest%2Fxml11.xhtml
00:47
<Hixie>
"IO Error: HTTP resource not retrievable." should probably be "The file you specified could not be downloaded. Are you sure you specified the right address? (You may also [validate the 404 document].)"
00:47
<Hixie>
or something
00:47
<hsivonen>
Hixie: do you see that on the URL I just pasted?
00:47
<Hixie>
no
00:48
<Hixie>
i see it on http://hsivonen.iki.fi/validator/html5/?doc=http%3A%2F%2Fhsivonen.iki.fi%2Ftest%2Fxml10.xhtml
00:48
<Hixie>
which is what i immediately tried :-)
00:49
<hsivonen>
Hixie: suggestion logged
00:49
<hsivonen>
the message comes from the bowels of Apache Commons HTTP Client
00:49
<Hixie>
ah
00:51
<hsivonen>
I should see if it has an IOException subclass with the http status code
00:54
<hsivonen>
oops. it comes from my code after all
00:54
<hsivonen>
if (m.getStatusCode() != 200) {
00:54
<Hixie>
heh
00:54
<hsivonen>
looks like I've been lazy
00:55
<hsivonen>
redirects are transparent to me
00:55
<hsivonen>
err opaque
00:55
<hsivonen>
I don't notice
00:56
hsivonen
gets confused with transparent and opaque if the library hides it
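Hixie's suggestion amounts to turning HTTP failures into human-readable messages instead of a bare "IO Error". The following is a Python sketch under stated assumptions, not hsivonen's Java code. `describe_status` and `fetch` are hypothetical names, and the wording is adapted from Hixie's suggestion. Like the HTTP library hsivonen mentions, `urllib.request` follows redirects by itself. As a result, the status code handled here belongs to the final response.

```python
import urllib.error
import urllib.request
from typing import Optional, Tuple


def describe_status(url: str, status: int) -> Optional[str]:
    """Map an HTTP status to a friendly message, or None for success."""
    if 200 <= status < 300:
        return None
    if status == 404:
        return (f"The file you specified ({url}) could not be found (HTTP 404). "
                "Are you sure you specified the right address?")
    if status in (401, 403):
        return f"The server refused access to {url} (HTTP {status})."
    if status >= 500:
        return f"The server hosting {url} reported an internal error (HTTP {status})."
    return f"The file you specified ({url}) could not be retrieved (HTTP {status})."


def fetch(url: str, timeout: float = 10.0) -> Tuple[Optional[bytes], Optional[str]]:
    """Return (body, None) on success or (None, message) on failure."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read(), None
    except urllib.error.HTTPError as err:  # must come before URLError (its base class)
        return None, describe_status(url, err.code)
    except urllib.error.URLError as err:
        return None, f"Could not connect to fetch {url}: {err.reason}"
```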
01:03
Hixie
tries to get the hang of the results of http://www.hixie.ch/tests/adhoc/dom/level0/window/open/
01:04
<Hixie>
(turn off tabs first)
01:07
<Hixie>
i don't understand what mozilla does
01:07
<Hixie>
on 002
01:08
<Hixie>
wow
01:09
<Hixie>
a window.alert() on safari blocks the entire browser
01:11
<Hixie>
and on IE it blocks UI interaction and JS for that tab
01:11
<Hixie>
and all the tabs that are involved in the test
01:11
<Hixie>
and the chrome for windows involved in the test, even though other tabs on that test are fine!
01:11
<Hixie>
wow, there's proof that the menu bar is per-tab if nothing else
01:18
<Hixie>
man, all the browsers act differently
01:18
<Hixie>
gah
01:18
<Hixie>
bbl
02:24
webben
(n=benjamin⊙9822) Quit ("Leaving")
02:30
whateley
(n=whateley⊙Sesn) Quit (Read error: 110 (Connection timed out))
02:48
tantek
(n=tantek⊙adspn) Quit (Read error: 131 (Connection reset by peer))
03:45
mpt
(n=mpt⊙1dtn) Quit ("This computer has gone to sleep")
05:04
<Lachy>
Hixie, typo in 4.2.2: If the value is null - The error should not [be] reported to the user.
05:54
<Lachy>
I've added several new questions to the FAQ
05:54
<Lachy>
http://blog.whatwg.org/faq/#mime-type
05:54
<Lachy>
http://blog.whatwg.org/faq/#tracking-changes
05:54
<Lachy>
http://blog.whatwg.org/faq/#namespaces
06:01
mpt
(n=mpt⊙1dtn) Quit ("Leaving")
06:42
csarven
(i=nevrasc⊙m1mvc) Quit (Read error: 104 (Connection reset by peer))
06:47
<Lachy>
what the???? "I don't want to use namespaces. I want to use an xmlns attribute. " -- Robert Sayre.
06:47
<Lachy>
I think that's the quote of the day ;-)
09:58
jgraham
(n=jgraham⊙8122) Quit (sterling.freenode.net irc.freenode.net)
09:58
gavin_s
(n=gavin⊙6221) Quit (sterling.freenode.net irc.freenode.net)
10:09
<hsivonen>
Lachy: I think Robert has a good point
11:24
<Lachy>
hsivonen, I don't think so
11:42
rhymes
(n=rhymes⊙h5rti) Quit ()
11:42
Kanashii
(n=Kanashii⊙plbion) Quit ()
12:04
<Lachy>
jgraham's idea of using a different attribute name from xmlns is better. It's similar to what I said here yesterday, but I'd rather avoid requiring authors to remember the full URI
12:08
<Lachy>
I'd just use <svg ns="svg">, where the attribute takes a set of predefined values, such as "svg", "mathml", "xhtml". But in most cases, it would be unnecessary to use it anyway.
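Lachy's proposal replaces full URIs with a fixed set of short tokens. A sketch of how a parser might resolve such an attribute is shown below. The `ns` attribute was only a proposal. `NS_TOKENS` and `resolve_ns` are hypothetical, and the case-insensitive matching is an assumed design choice. The namespace URIs themselves are the standard XHTML, SVG and MathML ones.

```python
from typing import Optional

# Hypothetical ns="" tokens mapped to the namespace URIs authors would
# otherwise have to remember.
NS_TOKENS = {
    "xhtml": "http://www.w3.org/1999/xhtml",
    "svg": "http://www.w3.org/2000/svg",
    "mathml": "http://www.w3.org/1998/Math/MathML",
}


def resolve_ns(token: Optional[str], inherited: str = NS_TOKENS["xhtml"]) -> str:
    """Map an ns attribute value to a namespace URI.

    With no attribute, the element stays in its parent's namespace, which is
    why the attribute would usually be unnecessary. Unknown values are
    rejected, so arbitrary namespaces cannot sneak in.
    """
    if token is None:
        return inherited
    key = token.strip().lower()
    if key not in NS_TOKENS:
        raise ValueError(f"unknown ns value: {token!r}")
    return NS_TOKENS[key]
```

Rejecting unknown values reflects Lachy's later concern that xmlns would suggest "any arbitrary namespace can be used in HTML".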
12:11
<Lachy>
although I still think it's better to reserve such things for use in XHTML. Browsers, especially IE, are much more likely to add support for XHTML, SVG and MathML, before a special html-based math/svg syntax.
13:18
<ROBOd>
good day to all
13:19
<Lachy>
hey ROBOd
13:19
<ROBOd>
Lachy: it seems attractive to use ns instead of xmlns, because it would cause less confusion, because people wouldn't mistake it with XHTML, etc. however... i am suspicious if in the grand scheme of things creating a "fork" of xmlns is that good
13:19
<ROBOd>
it would only give web developers more work in the future
13:20
<ROBOd>
my suggestion would be that no new ns attribute is added
13:20
<Lachy>
I agree and I don't think it is needed
13:20
<ROBOd>
if, and only if, something is to be done in regards to this, add xmlns.
13:21
<ROBOd>
personally i am not yet decided if xmlns should not be in HTML5
13:21
<Lachy>
but, if a namespace syntax is ever added to HTML, I think it should be at least that simple and must definitely not use xmlns
13:21
<ROBOd>
at the moment, i don't see the big gripe, the big need for xmlns in HTML(4|5)
13:22
<ROBOd>
Lachy: yes, it should be *that* simple, but not another attribute
13:22
<Lachy>
are you saying you would rather reuse xmlns for that purpose?
13:22
<ROBOd>
yes
13:22
<Lachy>
which would also mean using the full URIs as well
13:23
<ROBOd>
yep
13:23
<ROBOd>
there's no need to reinvent the wheel, IMHO
13:23
<Lachy>
no, that would only serve to further encourage those with the misconception that HTML can be treated as XML
13:23
<ROBOd>
as i said above, it's true, that happens
13:23
<Lachy>
and it would give the impression that any arbitrary namespace can be used in HTML
13:24
<ROBOd>
but another attribute would just add other troubles
13:24
<Lachy>
but, as Hixie's study showed, many people get the namespace wrong anyway
13:24
<ROBOd>
exactly
13:24
<Lachy>
which is why I don't think any namespaces should be added to HTML either.
13:24
<ROBOd>
and there's no UA with complete xmlns implementation
13:25
<ROBOd>
e.g. Opera had serious problems with xmlns last time i checked
13:25
<Lachy>
but my point is that xmlns is too difficult for the average HTML coder plus the other problems just mentioned
13:25
<Lachy>
doesn't Mozilla fully support xmlns in XML?
13:26
<Lachy>
what's Opera's bug with it?
13:26
<ROBOd>
iirc they have some problems as well
13:26
<ROBOd>
don't know the Mozilla bugs precisely, since I mostly work with Opera
13:26
<ROBOd>
well... for example, Opera with VoiceXML doesn't really care much about the XML namespace
13:27
<ROBOd>
it just detects the tag name, and that's pretty much all
13:28
<ROBOd>
e.g. if one wants to use something else than the default xmlns prefix (vxml)
13:30
<ROBOd>
at the end of that day ... i was pretty much sure XML namespace support was glued (read: not good) :)
13:31
<Lachy>
but those are bugs in the XML implementation, specifically relating to prefixes. There would be no prefixes in HTML, so any use of xmlns couldn't use prefixes and that difference would only cause problems
13:32
<Lachy>
besides, as Hixie has mentioned, Opera has tried to implement namespaces in HTML, but apparently had to back out of it because so many pages relied on MS Office namespaces being completely ignored by non-IE browsers.
13:32
<ROBOd>
the more i think of it, the more i'd recommend Hixie *not* to accept xmlns (or any derivate, for that matter) in html5
13:33
<Lachy>
that's another reason we couldn't reuse xmlns in HTML because MS office has broken it
13:33
<Lachy>
I fully agree!
13:33
<ROBOd>
thing is: use xhtml for svg and for other "advanced" stuff
13:33
<Lachy>
yep
13:33
<raspberry-lemon>
the newbie agrees too, just for the record
13:34
<Lachy>
raspberry-lemon, what's your real name? Have I seen you on the mailing list before?
13:34
rhymes
(n=rhymes⊙h5rti) Quit ()
13:35
<raspberry-lemon>
real name is chris svindseth, but if you've seen me on the mailing list it would be quite the miracle as i only read it sporadically :)
13:35
<ROBOd>
Lachy: i've read Sam's blog post (link posted yesterday here). i now believe he exaggerates with his wish to merge XHTML with HTML.
13:36
<Lachy>
ah, so you've never posted to the list.
13:36
<raspberry-lemon>
no
13:37
<Lachy>
yep, I agree. I think Sam's just taking it too far
13:39
<ROBOd>
gotta go now, bbl
13:39
<Lachy>
ok, cya
15:27
<annevk>
hah
15:28
<citoyen>
oh look, it's awake
15:28
<annevk>
next time I go away for more than 24 hours I'll turn IRC off
15:28
<Lachy>
hi annevk
15:28
<annevk>
hi there
15:28
annevk
just read through the entire backlog...
15:28
annevk
hasn't yet read Sam Ruby's post
15:28
<annevk>
morning citoyen :)
15:28
<Lachy>
annevk, was it worth reading it all?
15:28
<citoyen>
mornin' :) how's the head? :)
15:30
<annevk>
better
15:30
<annevk>
Lachy, no, I skipped major parts
15:31
<annevk>
"HTML is tantalizingly close to well-formed XML." ...
15:32
<Lachy>
hah! :-D
15:32
<citoyen>
*blink*
15:32
<Lachy>
there's been several funny quotes on the list today
15:38
<annevk>
class AtheistParseError(ParseError): ...
15:52
<annevk>
"Breaking XML is too politically incorrect even for the WHATWG." We could try...
15:52
<annevk>
Introduce graceful error handling for XML
15:53
rhymes
(n=rhymes⊙h5rti) Quit ()
16:00
<Lachy>
it's too late for that
16:02
<annevk>
it's already happening
16:03
<annevk>
see feed parsers for instance
16:03
<Lachy>
?
16:03
<annevk>
we better define how it should work...
16:03
<Lachy>
Oh, that's just crap. They should use draconian error handling
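Python's standard library shows the difference being argued about here. `xml.etree.ElementTree` is draconian and stops at the first well-formedness error. `html.parser` keeps going, though it does not apply any defined recovery rules. The sketch below contrasts the two. `BROKEN`, `draconian_parse`, `TagCollector` and `lenient_parse` are illustrative names.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# A feed fragment with an unclosed <b>: not well-formed XML.
BROKEN = "<feed><entry>Unclosed <b>bold</entry></feed>"


def draconian_parse(text):
    """XML-style: any well-formedness error aborts the whole parse."""
    try:
        return ET.fromstring(text)
    except ET.ParseError as err:
        return err


class TagCollector(HTMLParser):
    """HTML-style: tolerates the error and records the start tags it sees."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def lenient_parse(text):
    collector = TagCollector()
    collector.feed(text)
    collector.close()
    return collector.tags
```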
16:03
<annevk>
that doesn't make much sense to me
16:04
<Lachy>
and CMSs should use proper XML tools and ensure they output well-formed feeds
16:04
<annevk>
it seems better for their users to do the non draconian thing
16:04
<annevk>
right...
16:04
<annevk>
those CMSs have been promised for over the past ten years or so
16:04
<Lachy>
IE7 does draconian error handling for feeds, doesn't it?
16:04
<hsivonen>
Lachy: have fun trying to convince Mark P. not to do what he does. :-)
16:04
<annevk>
there's not really such a thing as bugfree software, I think we should try to learn from that
16:05
<annevk>
Lachy, only partially
16:05
<hsivonen>
annevk: TeX. The conclusion is that we should use .dvi for interchange. :-)
16:06
<citoyen>
Let's face it, people fail and tools fail, no matter how much we try. Given that, and that tools are meant to make our lives easier, not more annoying, I think error handling is the way to go.
16:06
<annevk>
hsivonen, I don't get that
16:06
<annevk>
as in, I'm not sure what you're saying :)
16:07
<hsivonen>
annevk: TeX is famous for being the non-trivial piece of software that is free of bugs
16:07
<hsivonen>
TeX outputs .dvi
16:07
<annevk>
oh
16:09
<hsivonen>
grr. I have to update my <t> test cases
16:12
<annevk>
s/t/time/
16:13
<hsivonen>
annevk: won't work
16:13
<hsivonen>
consider <title>
16:14
<annevk>
ok, do it a bit smarter :)
16:14
<annevk>
s/<t /
16:14
<annevk>
s/<t>/
16:14
<annevk>
etc.
16:14
<hsivonen>
yeah
16:18
<annevk>
Hixie, if you have nothing else to work on, consider updating the parsing section a bit more to remove the last couple of red blocks and do the rewrite of the tree construction section...
16:30
<annevk>
http://therealcrisp.xs4all.nl/blog/ "Hell is where browsers come from"
16:34
<hsivonen>
Lachy: wp-comments-post.php is broken
16:34
<hsivonen>
"Error: This file cannot be used on its own."
16:35
<Lachy>
ok, let me see...
16:36
<Lachy>
Does that happen when you try to post a comment?
16:36
<hsivonen>
yos
16:36
<hsivonen>
yes
16:36
<Lachy>
when you're logged in or not?
16:36
<hsivonen>
logged in
16:36
<Lachy>
ok, it worked for me when not logged in
16:37
ROBOd
(n=robod⊙8321) Quit (Read error: 104 (Connection reset by peer))
16:37
<Lachy>
worked for me when logged in too
16:37
<hsivonen>
hmm. interesting
16:37
<hsivonen>
gotta run for dinner
16:38
<ROBOd2>
bon appétit hsivonen
16:38
<hsivonen>
thanks
16:38
<Lachy>
I get that error when I visit http://blog.whatwg.org/wp-comments-post.php directly, rather than posting to it
16:38
<annevk>
isn't it a little early...
16:39
<annevk>
oh, wait, Finland
16:39
<hsivonen>
annevk: board game scheduled after dinner
16:39
<hsivonen>
hence, early dinner
16:39
<hsivonen>
really going now
16:39
<annevk>
bye
16:40
<annevk>
Lachy, you want http://c2.com/cgi/wiki?GeneratorsInPython
16:43
<Lachy>
I see. so we would implement a getChar() function that uses yield and returns the next character in the stream
16:43
<annevk>
I think that's the idea
16:43
<Lachy>
what about when we have to back up a few chars for error handling?
16:44
<annevk>
you store the characters somewhere I suppose
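The generator idea from the c2 wiki page, combined with annevk's "store the characters somewhere", might look like the sketch below. A generator yields one character at a time. A pushback list then lets a state such as the entity state read ahead and hand characters back. `chars`, `CharReader`, `get_char` and `unget` are hypothetical names, not html5lib's API.

```python
from typing import Iterable, Iterator, List, Optional


def chars(source: Iterable[str]) -> Iterator[str]:
    """Yield the input one character at a time, whatever chunks it arrives in."""
    for chunk in source:
        for ch in chunk:
            yield ch


class CharReader:
    """Wraps the generator and keeps a pushback buffer for backing up."""

    def __init__(self, source: Iterable[str]):
        self._gen = chars(source)
        self._pushed: List[str] = []

    def get_char(self) -> Optional[str]:
        """Return the next character, or None at end of input."""
        if self._pushed:
            return self._pushed.pop()
        return next(self._gen, None)

    def unget(self, consumed: List[str]) -> None:
        """Give back characters, passed in the order they were read."""
        # Stored reversed so that pop() hands them out in the original order.
        self._pushed.extend(reversed(consumed))
```

For example, after reading `&`, the entity state might read `amp` ahead. If it decides not to match, it can call `unget` so the tokenizer sees those characters again.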
16:44
<annevk>
hmm
16:44
<Lachy>
ok, need to think about it.
16:51
gsnedders
(n=gsnedder⊙hrbc) Quit ("Don't touch /dev/null")
16:52
<annevk>
hmm yeah
16:52
<annevk>
for states like the entity state
16:53
<Lachy>
it might be easier to implement it as a stream object that handles walking forward and backward through the stream, even if it uses yield internally for some stuff
16:53
<Lachy>
and even supports inserting markup into the stream, which would be needed for document.write() support
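Lachy's stream object could be sketched as a buffer plus a position. Moving backward is then just moving the position. Inserting markup at the current position roughly imitates how document.write() injects text to be tokenized next. The spec's actual insertion-point rules are more involved than this. `InputStream`, `char`, `back`, `insert` and `append` are hypothetical names.

```python
from typing import Optional


class InputStream:
    """A character buffer with a read position that can move both ways."""

    def __init__(self, data: str = ""):
        self._buffer = data
        self._pos = 0

    def char(self) -> Optional[str]:
        """Consume and return the next character, or None at the end."""
        if self._pos >= len(self._buffer):
            return None
        ch = self._buffer[self._pos]
        self._pos += 1
        return ch

    def back(self, count: int = 1) -> None:
        """Step back over characters already consumed."""
        self._pos = max(0, self._pos - count)

    def insert(self, markup: str) -> None:
        """Splice markup in at the current position so it is read next."""
        self._buffer = self._buffer[:self._pos] + markup + self._buffer[self._pos:]

    def append(self, data: str) -> None:
        """Add data that arrives later, e.g. from the network."""
        self._buffer += data
```

Rebuilding the string on every insert is O(n). That is acceptable for a sketch, but a real implementation would probably use a list of chunks.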
16:54
<annevk>
yeah, didn't jgraham have something like that?
16:54
Lachy
will check
16:58
<Lachy>
I think that's what his Tokeniser object does, but not sure. It seems to be structured in a very strange way.
17:05
<annevk>
when I search on google for "live dom viewer" i get your site Lachy ... some copy
17:05
<jgraham>
Lachy: what is strange
17:05
<jgraham>
?
17:06
<jgraham>
Did you see that I started a google project for a python based html5 parser: http://code.google.com/p/html5lib/
17:07
<annevk>
cool
17:07
<annevk>
I'm willing to help out
17:07
<jgraham>
I'm really up for working with other people on this, so I'm quite happy to change the design if it's no good. And I seem to have a bit more python experience, which might help
17:09
<Lachy>
jgraham, write an article about it on the blog
17:09
<Lachy>
let a few more people know about it and ask for more contributors
17:11
<jgraham>
Yeah, that's a good idea. I might set up a wiki page for discussing the design as well
17:11
<Lachy>
Cool, I'm happy with the BSD licence for it
17:11
<annevk>
what does BSD imply?
17:12
<annevk>
what are the restrictions, basically
17:12
<Lachy>
it means that you retain copyright, but anyone is free to do whatever they like with it
17:12
<jgraham>
http://www.opensource.org/licenses/bsd-license.php
17:13
<jgraham>
I think it's about the most liberal license available
17:13
<Lachy>
http://en.wikipedia.org/wiki/BSD
17:13
<jgraham>
But if anyone has any good reasons to change it, I'm listening
17:14
<annevk>
I'd be happy with a license that doesn't require attribution
17:15
<Lachy>
http://en.wikipedia.org/wiki/Public_domain_equivalent_license
17:15
<Lachy>
BSD is near enough to public domain
17:17
<jgraham>
The options in google hosting are BSD, Apache 2.0, Artistic/GPLv2.0, GPL2.0, LGPL, MIT, MPL1.1
17:17
<Lachy>
This is what I usually do for copyright http://lachy.id.au/about/copyright
17:18
<Lachy>
of those, either MIT or BSD are the most permissive
17:19
<jgraham>
Do you think MIT would work better?
17:21
<annevk>
yes
17:21
<jgraham>
OK
17:21
<annevk>
per http://en.wikipedia.org/wiki/MIT_License that doesn't require attribution which may be a problem for some commercial entities
17:21
<jgraham>
OK, it's changed
17:22
<annevk>
if you want you can add annevankesteren⊙gc though I wonder how to deal with such a project
17:24
<jgraham>
I added you as a project owner
17:24
<annevk>
hah
17:24
Lachy
will register a new gmail account and join
17:24
<jgraham>
What do you mean "deal with such a project"? You mean how to actually design the code collaboratively?
17:25
<Lachy>
if only someone hadn't stolen my name! lachlan.hunt at gmail.com is taken :-(
17:25
<jgraham>
Heh. I ended up with jgraham.cantab since almost everything I could think of was gone...
17:26
<annevk>
jgraham, yes
17:27
<annevk>
I took the liberty to add more text to the frontpage
17:27
<jgraham>
Well I think a design document on a wiki would help. I don't know if the whatwg wiki is the right place though
17:27
<Lachy>
oh, no I forgot, I already have lachyhunt at gmail.com :-)
17:28
<jgraham>
Lachy: OK, I added you
17:28
<Lachy>
thanks
17:31
<annevk>
checkout is still going on...
17:31
<annevk>
hmm
17:31
Lachy
is finishing off the blog entry for feed autodiscovery...
17:32
<Lachy>
are there any other issues with "alternate", besides a feed not necessarily being an alternate representation and the MIME type not always being a good indicator of a feed?
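The two problems Lachy lists are visible in a typical autodiscovery heuristic. The heuristic treats `rel="alternate"` plus a feed-like MIME type as a feed. It misses feeds served under other types and says nothing about whether the link really is an alternate view. A sketch using Python's `html.parser` is shown below. `FEED_TYPES`, `FeedLinkFinder` and `find_feeds` are hypothetical names. As an assumption, the sketch also accepts a `rel="feed"` keyword of the kind proposed as an alternative.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple

# MIME types commonly used for feeds. This is only a heuristic: other types
# may also be feeds, and these types do not prove an "alternate" relationship.
FEED_TYPES = {"application/atom+xml", "application/rss+xml"}


class FeedLinkFinder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.feeds: List[Tuple[str, Optional[str]]] = []  # (href, title)

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        a = dict(attrs)
        href = a.get("href")
        if not href:
            return
        rels = (a.get("rel") or "").lower().split()
        mime = (a.get("type") or "").lower().split(";")[0].strip()
        if "feed" in rels or ("alternate" in rels and mime in FEED_TYPES):
            self.feeds.append((href, a.get("title")))


def find_feeds(html: str) -> List[Tuple[str, Optional[str]]]:
    finder = FeedLinkFinder()
    finder.feed(html)
    finder.close()
    return finder.feeds
```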
17:32
<annevk>
you should prolly post on monday
17:32
<Lachy>
why wait? It'll still be there on Monday
17:33
<annevk>
posts tend to get more attention throughout the week
17:33
<annevk>
at least, in my experience
17:34
<Lachy>
yeah, but what difference does it make if it's posted today or tomorrow? It'll still show up in people's feed readers on monday morning
17:34
<annevk>
i've wondered about that myself
17:35
<Lachy>
but I can hold it off for a day if you like, it doesn't matter that much
17:45
<Lachy>
hehe... :-) The latest from elliot...
17:45
<Lachy>
"Secondly, anyone who actually tried to use an SGML parser to handle HTML rapidly hit a wall since most HTML documents were not even close to actually conformant to the SGML spec or the HTML DTD. "
17:46
<Lachy>
now if only he could figure out that the same applies when s/SGML/XML
17:47
<annevk>
hmm, I can't seem to commit
17:48
<annevk>
jgraham, should we use a googlegroups for discussion?
17:50
<jgraham>
annevk: I guess googlegroups might be good. I'd still like a wiki page somewhere to hack out a design. Any ideas where? I could set something up on my desktop but it's unlikely to be very reliable...
17:51
<annevk>
let's use wiki.html5.org
17:51
<Lachy>
jgraham, wiki.whatwg.org
17:51
<annevk>
what Lachy said
17:51
<annevk>
PythonHTML5Lib ?
17:52
<jgraham>
OK, I just didn't want it to seem like an "official" implementation
17:52
<annevk>
let's make that clear in the first paragraph :)
17:52
<jgraham>
OK
18:22
<jgraham>
I've created http://wiki.whatwg.org/wiki/HTML5Lib I'll fill in some more of the details shortly
18:30
<Lachy>
You should use [Category:Implementations] instead so that the list is automatic
18:34
<Lachy>
done http://lachy.id.au/log/2005/12/xhtml-beginners
18:34
<Lachy>
oops, wrong like
18:34
<Lachy>
*link
18:34
<Lachy>
http://wiki.whatwg.org/wiki/Category:Implementations
18:41
Lachy
has had enough of Elliot, the arguments are just going round and round in circles.
18:44
<Lachy>
I'm going to try to not respond to him again, no matter how tempting it gets.
19:22
<jgraham>
http://wiki.whatwg.org/wiki/HTML5Lib now has some description of the tokeniser Please go ahead and rip it to shreds :)
19:39
whateley
(n=whateley⊙Sesn) has left #whatwg
19:41
<annevk>
hmm, seems to come down to yet another mime type debate
19:41
<annevk>
I love those! [pause] Not.
19:41
annevk
reads the wiki
19:41
annevk
just had some food
19:44
jgraham
notices a mistake in the wiki page
19:46
<annevk>
We should use the word Tokenizer
19:46
<annevk>
or HTMLTokenizer
19:46
<annevk>
note the z
19:48
<annevk>
see Google if you don't believe me :)
19:48
<annevk>
jgraham, so how does the tokenizer integrate with the parser?
19:48
<annevk>
parser -> tree construction phase
19:48
<annevk>
the three construction phase directly affects the tokenizer
19:48
<annevk>
s/three/tree ...
19:49
<jgraham>
Tokeniser == english spelling, tokenizer == American spelling, no?
19:49
<annevk>
yes
19:49
<jgraham>
But we can go with "z", I'll just make more typos that way ;)
19:49
<annevk>
"Results 1 - 10 of about 40,100 for tokeniser."
19:49
<annevk>
"Results 1 - 10 of about 1,240,000 for tokenizer. "
19:50
<annevk>
Google also suggested that I search for tokenizer when I tried tokeniser :)
19:50
<jgraham>
annevk: The parser calls getToken every time it wants a token. But it also holds a reference to the tokeniser so it can change the tokeniser state when it needs to. Does it ever do more than change the content model flag?
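The pull design jgraham describes has the parser repeatedly ask for the next token while holding a reference to the tokenizer. That reference lets tree construction flip the content model flag, for example after `<script>`. The sketch below shows the idea with a deliberately tiny tokenizer. It models only two content models instead of the spec's four, handles no attributes, and uses illustrative names (`Tokenizer`, `get_token`, `Parser`, `ContentModel`) rather than html5lib's actual code.

```python
import re
from enum import Enum
from typing import List, Optional, Tuple


class ContentModel(Enum):
    PCDATA = 1  # ordinary markup
    CDATA = 2   # raw text, e.g. inside <script>, until the matching end tag


Token = Tuple[str, str]  # (kind, tag name or text); a deliberately tiny shape

TAG = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)>")


class Tokenizer:
    def __init__(self, text: str):
        self.text = text
        self.pos = 0
        self.content_model = ContentModel.PCDATA
        self.cdata_end = ""  # e.g. "</script>"; set by the parser

    def get_token(self) -> Optional[Token]:
        """Return the next token, or None at end of input."""
        if self.pos >= len(self.text):
            return None
        if self.content_model is ContentModel.CDATA:
            end = self.text.lower().find(self.cdata_end, self.pos)
            if end == -1:
                end = len(self.text)
            if end > self.pos:
                data, self.pos = self.text[self.pos:end], end
                return ("characters", data)
            self.content_model = ContentModel.PCDATA  # reached the end tag
        m = TAG.match(self.text, self.pos)
        if m:
            self.pos = m.end()
            return ("end" if m.group(1) else "start", m.group(2).lower())
        nxt = self.text.find("<", self.pos + 1)
        if nxt == -1:
            nxt = len(self.text)
        data, self.pos = self.text[self.pos:nxt], nxt
        return ("characters", data)


class Parser:
    def __init__(self, text: str):
        self.tokenizer = Tokenizer(text)  # the parser keeps a reference...

    def parse(self) -> List[Token]:
        tokens: List[Token] = []
        while (token := self.tokenizer.get_token()) is not None:
            tokens.append(token)
            if token == ("start", "script"):
                # ...so tree construction can change the content model flag.
                self.tokenizer.content_model = ContentModel.CDATA
                self.tokenizer.cdata_end = "</script>"
        return tokens
```

Without the flag change, the `<b>` inside the script in the test below would come out as a start tag.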
19:52
<annevk>
I don't think so
19:52
<annevk>
but can't we work with functions then in the tokenizer that the parser implements?
19:54
<jgraham>
Could do, I guess. I'm not sure what the benefit is though?
19:54
<annevk>
I think it's cleaner than having temporary token objects...
19:56
annevk
reads through the spec once again
19:58
<jgraham>
Well this way the separation between tokeniser and parser is pretty clean. It also has the nice property of being a very literal implementation of the spec - when it says "create a token" you really do. But I see your point; maybe it adds lots of overhead
20:01
<annevk>
I might have mentioned this already, but it would be nice if the parser was fairly low-level so it can be ported to other languages as well.
20:01
<annevk>
In an easy way
20:03
<annevk>
I think having functions might also make it easier to add markup injection, if ever...
20:04
<jgraham>
document.write in python?!
20:06
<annevk>
well, the architecture should sort of take it into account
20:09
<annevk>
jgraham, why do the base classes inherit from object?
20:10
<jgraham>
Because that makes them "new style" python classes
20:10
<jgraham>
Which have several generally desirable properties compared to old style classes
20:11
<jgraham>
see e.g. http://www.geocities.com/foetsch/python/new_style_classes.htm
20:12
<jgraham>
It's a backwards compat. issue
20:15
<jgraham>
annevk: So in your proposal, what would the interface between the parser and the tokeniser look like? Would you start with the tokeniser and have it call parser.startTagToken(name, attrs) when it made a start tag token? Or something else?
20:15
<annevk>
And what does frozenset give us? What does it seem to imply?
20:16
<annevk>
jgraham, I suppose self.startTagToken() if the parser inherits from it...
20:16
<annevk>
but yeah
20:16
<annevk>
I'm updating the wiki as we chat
20:17
<jgraham>
Also I think document.write would work in my model, you'd have to append the extra markup to the characterQueue (mistakenly called characterStack in the svn code). The treebuilder side of that would be the hard part
20:19
<annevk>
perhaps we should call it "characters"
20:19
<annevk>
hmm
20:19
<annevk>
jgraham, yeah, I guess it would
20:20
<jgraham>
frozenset is just an immutable set. Sets are nice because it's easy to compute unions, etc - useful since there are definitions like "All other elements found while parsing an HTML document" which we need to test against. Also membership tests should be fast (I think).
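What jgraham describes for frozenset can be shown in a few lines. Module-level immutable sets support cheap membership tests and set algebra. An "all other elements" case then becomes whatever is not in any named group. The group names loosely mirror the spec's categories, but the member lists are illustrative subsets, not the spec's complete lists. `category` is a hypothetical helper.

```python
# Illustrative subsets only, not the spec's complete lists.
FORMATTING = frozenset({"a", "b", "em", "font", "i", "strong", "u"})
SCOPING = frozenset({"button", "caption", "html", "table", "td", "th"})
SPECIAL = frozenset({"address", "body", "div", "li", "ol", "p", "ul"})

# Unions of frozensets are frozensets, so derived groups stay immutable too.
KNOWN = FORMATTING | SCOPING | SPECIAL


def category(tag_name: str) -> str:
    """Classify a tag name with fast set membership tests."""
    name = tag_name.lower()
    if name in FORMATTING:
        return "formatting"
    if name in SCOPING:
        return "scoping"
    if name in SPECIAL:
        return "special"
    # "All other elements found while parsing an HTML document"
    return "other"
```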
20:21
<annevk>
is it ok that they are global variables though?
20:23
<annevk>
hmm, I suppose you don't want to pass them around all the time
20:23
<jgraham>
They're only global in the current file
20:23
<annevk>
okay
20:24
<annevk>
that's what I expected
20:25
<annevk>
hmm, I've got referrers from example.com ...
20:25
<jgraham>
I don't understand why the parser would inherit from the tokeniser? I can see that the parser and tokeniser would call each other somehow but I don't see why they'd inherit?
20:25
<jgraham>
heh
20:25
<jgraham>
spammers?
20:26
<annevk>
think so
20:27
<annevk>
hmm, you're right
20:28
<annevk>
so you'd have x = HTMLParser("docRef"); HTMLParser invokes HTMLTokenizer(self, "docRef") and there you go
20:28
<annevk>
would that work?
20:29
gsnedders
(n=gsnedder⊙hrbc) Quit ("Don't touch /dev/null")
20:30
<jgraham>
Yeah. That's basically what I have at the moment. Only I have a "parse" function in the parser which creates the tokeniser.
20:30
<jgraham>
As well as starting parsing obviously
20:35
<annevk>
this is what I just added to the wiki: "There's an HTMLParser class you can invoke with an object. What this object is can be decided later. File object, string, URI, etc. The newly created HTMLParser object then instantiates an HTMLTokenizer with itself as argument and the object. The HTMLTokenizer then does things like parser.emitStartTagToken(name, ...) etc."
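The callback design from the wiki text above can be sketched as follows. The parser constructs the tokenizer with itself as an argument, and the tokenizer pushes tokens by calling `emit*` methods on it. The class and method names follow the wiki text, but the bodies are assumptions. These classes are unrelated to Python's standard `html.parser.HTMLParser`. The tokenizer only understands simple tags with double-quoted attributes, and the parser just records events instead of building a tree.

```python
import re
from typing import Dict, List, Tuple


class HTMLTokenizer:
    """Pushes tokens to the parser by calling its emit* methods."""

    TAG = re.compile(r'<(/?)([A-Za-z][A-Za-z0-9]*)((?:\s+[A-Za-z-]+="[^"]*")*)\s*>')
    ATTR = re.compile(r'([A-Za-z-]+)="([^"]*)"')

    def __init__(self, parser, source: str):
        self.parser = parser
        self.source = source

    def run(self) -> None:
        text, pos = self.source, 0
        while pos < len(text):
            m = self.TAG.match(text, pos)
            if m:
                name = m.group(2).lower()
                if m.group(1):
                    self.parser.emitEndTagToken(name)
                else:
                    attrs: Dict[str, str] = dict(self.ATTR.findall(m.group(3)))
                    self.parser.emitStartTagToken(name, attrs)
                pos = m.end()
            else:
                nxt = text.find("<", pos + 1)
                if nxt == -1:
                    nxt = len(text)
                self.parser.emitCharacters(text[pos:nxt])
                pos = nxt
        self.parser.emitEOF()


class HTMLParser:
    """Creates the tokenizer with itself as the callback target."""

    def __init__(self, source: str):
        self.events: List[Tuple] = []
        self.tokenizer = HTMLTokenizer(self, source)
        self.tokenizer.run()

    def emitStartTagToken(self, name, attrs):
        self.events.append(("start", name, attrs))

    def emitEndTagToken(self, name):
        self.events.append(("end", name))

    def emitCharacters(self, data):
        self.events.append(("chars", data))

    def emitEOF(self):
        self.events.append(("eof",))
```

Compared with the pull design, the tokenizer drives the loop here. No intermediate token objects are created, which is the overhead annevk wanted to avoid.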
20:55
<gsnedders>
what HTML5 parsers are there in existence already?
20:56
<gsnedders>
(and are bug-free enough to use as a reference implementation)
20:56
<annevk>
there are none
20:56
<annevk>
there's a project
20:58
<jgraham>
annevk: I've created a "callback" branch in svn to try your approach.
20:58
<gsnedders>
annevk: right. I knew there were several, but I didn't know how far they were in terms of development
20:59
jgraham
wishes he knew enough computer science to make an informed argument one way or the other
20:59
<annevk>
several, even?
21:01
ROBOd2
(n=robod⊙8321) Quit (Read error: 104 (Connection reset by peer))
21:45
annevk
(n=annevk⊙poc) Quit (Read error: 110 (Connection timed out))
22:03
ROBOd2
(n=robod⊙8321) Quit ("http://www.robodesign.ro")
22:14
annevk
(n=annevk⊙8111) Quit (Read error: 148 (No route to host))