00:00 | <jgraham> | (to be fair it says "sign in with your gmail account" but it doesn't say that you failed to do so) |
00:00 | <Hixie> | jgraham: feedback conveyed |
00:00 | <Hixie> | and they agreed, so hopefully it'll be fixed :-) |
00:01 | <jgraham> | Great :) |
00:01 | <Hixie> | webben: <header> elements are ways of wrapping multiple <hx> elements into one header, so you can have tag lines, e.g. |
00:03 | webben | can't understand why a tagline would want to be inside a <hx> element |
00:04 | <webben> | but that's not quite what I was asking ... is <header><h3>foo</h3></header> okay? |
00:04 | <jgraham> | webben It turns out that lots of people do that in the real world. It breaks any tool that tries to generate a document outline but they think it's "more semantic" |
00:05 | <webben> | jgraham, yeah but they're crazy |
00:05 | <Hixie> | <header><h3>foo</h3></header> is equivalent to <h3>foo</h3> iirc |
00:05 | <Hixie> | or maybe equivalent to <h1>foo</h1> |
00:05 | <Hixie> | i forget |
00:05 | <Hixie> | see the spec :-) |
00:05 | <jgraham> | Well, maybe. For subheadings what they are trying to do makes a lot of sense |
00:06 | <webben> | jgraham, ah you're talking about <header><h1>foo</h1><h2>bar</h2></header> not <header><h3>foo</h3></header> when you say "that"? |
00:06 | <webben> | why not just have a subhead |
00:06 | <webben> | element |
00:07 | <Hixie> | because you might have: |
00:07 | <Hixie> | <header><p>Welcome to...</p><h1>My home!</h1><h2>or what some people might call "my cube"</h2></header> |
00:07 | <jgraham> | But this is why the spec should be clear about what the use cases behind semantic constructs are so there is some hope people won't break well intentioned UAs by over broadening their element use |
00:07 | <Hixie> | yeah |
00:07 | <Hixie> | i thought the examples for <header> were clear |
00:08 | <webben> | Hixie, yeah ... I don't understand the use of <h2> there |
00:08 | <jgraham> | Although HTML4 did that with <hx> and it didn't help much |
00:08 | <webben> | http://www.w3.org/TR/html4/struct/global.html#h-7.5.5 was lousy |
00:09 | <webben> | doesn't even say MUST be used in order |
00:09 | <webben> | instead "Some people consider skipping heading levels to be bad practice." |
00:09 | <webben> | what a cop out |
00:09 | <jgraham> | So, could I trouble someone to invite me to join gmail? |
00:09 | <webben> | jgraham, sure |
00:09 | jgraham | is the last person in the universe with no gmail account |
00:10 | <Hixie> | heh |
00:10 | <jgraham> | My email address is jg307⊙cau |
00:11 | <webben> | jgraham, there you go (hopefully) |
00:11 | <jgraham> | webben: thanks :) |
00:12 | <webben> | why can't one have a <heading> as a descendant of a <heading> ? |
00:13 | <webben> | e.g. if you have a long document with a bit-per-page view and an all-in-one view |
00:13 | <webben> | you might have subsections with headings with taglines |
00:15 | <jgraham> | Can we go with "because my head would explode trying to work out how to generate an outline for it"? ;) |
00:16 | <webben> | jgraham, not nearly as much as the author trying to revise programmatically said document for all-in-one viewing |
00:17 | <webben> | Of course if <heading> was simply one <hX> rather than as many as you want |
00:17 | <webben> | and if hX are in order |
00:17 | <webben> | then outlining would be unproblematic |
00:18 | <Hixie> | there's like an exact spec for how to create an outline |
00:18 | <Hixie> | just implement that |
00:18 | <Hixie> | and your life will be good |
00:20 | <webben> | why are headings inside blockquotes part of the TOC? |
00:20 | <jgraham> | Hixie: I know. But as I recall, it's pretty complicated |
00:21 | <webben> | ah i see |
00:21 | <webben> | they aren't |
00:21 | webben | provides a demonstration. |
00:21 | <Hixie> | jgraham: yeah, but that shouldn't affect implementing it. he just has to follow the spec. :-) |
00:22 | <hsivonen> | FYI: http://www.intertwingly.net/blog/2006/12/01/The-White-Pebble#c1165015647 |
00:22 | <webben> | Will authors understand it? |
00:22 | Kanashii | (n=Kanashii⊙plbion) Quit () |
00:28 | <Hixie> | "Breaking XML is too politically incorrect even for the WHATWG." |
00:28 | <Hixie> | haha |
00:28 | <Hixie> | nice |
00:28 | <Hixie> | we could try! |
00:28 | <Hixie> | XML5! |
00:29 | <Hixie> | maybe sometime after SVG5! |
00:32 | <jgraham> | Can we replace all the angle brackets with something more aesthetically pleasing? ;) |
00:32 | tantek | (n=tantek⊙adspn) Quit (Read error: 54 (Connection reset by peer)) |
00:32 | <Hixie> | i'd love to |
00:32 | <Hixie> | but backwards compatibility forces us to keep them |
00:32 | <Hixie> | :-) |
00:34 | <jgraham> | I've put all the html5 python code I have written up on google code: http://code.google.com/p/html5lib/ |
00:34 | <hsivonen> | Hixie: only the W3C gets to break XML |
00:34 | <hsivonen> | (I mean XML 1.1) |
00:34 | <Hixie> | heh |
00:34 | <hsivonen> | speaking of which |
00:35 | <jgraham> | (Note: nothing there works) |
00:35 | <hsivonen> | the spec should require XML 1.0--not "some version" |
00:35 | <webben> | why? |
00:35 | <Hixie> | i have no idea what thomas broyer is asking for |
00:35 | <Hixie> | i hate it when i can't work out what someone wants |
00:35 | <jgraham> | (but I don't want to end up with 3 different efforts to do the same thing) |
00:35 | <hsivonen> | webben: XML 1.1 is a huge compatibility problem and PITA |
00:36 | <hsivonen> | webben: and XHTML5 does not have Cambodian tags |
00:36 | <hsivonen> | Khmer tags, I should say |
00:36 | <Hixie> | i'm not requiring XML 1.1 for the same reason that I _am_ defining XHTML at all |
00:36 | <Hixie> | er 1.0 |
00:36 | <Hixie> | namely, if i require xml 1.0, someone will have to define their own serialisation using 1.1. |
00:37 | <hsivonen> | good point |
00:37 | <Hixie> | (but i agree with you in principle) |
00:37 | <hsivonen> | I will enforce 1.0 |
00:37 | <Hixie> | you can do that, just by being an XML 1.0 Conformant Processor :-) |
00:40 | webben | is confused ... how can you both not require 1.0 and enforce 1.0 ? |
00:43 | <hsivonen> | webben: Hixie doesn't require but I do |
00:44 | <webben> | you mean with your validator? |
00:44 | <hsivonen> | webben: yes |
00:44 | <webben> | can XHTML 1.1 be in XML 1.1? |
00:45 | <hsivonen> | webben: it wouldn't be conforming, AFAIK |
00:45 | <hsivonen> | http://hsivonen.iki.fi/validator/html5/?doc=http%3A%2F%2Fhsivonen.iki.fi%2Ftest%2Fxml11.xhtml |
00:47 | <Hixie> | "IO Error: HTTP resource not retrievable." should probably be "The file you specified could not be downloaded. Are you sure you specified the right address? (You may also [validate the 404 document].)" |
00:47 | <Hixie> | or something |
00:47 | <hsivonen> | Hixie: do you see that on the URL I just pasted? |
00:47 | <Hixie> | no |
00:48 | <Hixie> | i see it on http://hsivonen.iki.fi/validator/html5/?doc=http%3A%2F%2Fhsivonen.iki.fi%2Ftest%2Fxml10.xhtml |
00:48 | <Hixie> | which is what i immediately tried :-) |
00:49 | <hsivonen> | Hixie: suggestion logged |
00:49 | <hsivonen> | the message comes from the bowels of Apache Commons HTTP Client |
00:49 | <Hixie> | ah |
00:51 | <hsivonen> | I should see if it has an IOException subclass with the http status code |
00:54 | <hsivonen> | oops. it comes from my code after all |
00:54 | <hsivonen> | if (m.getStatusCode() != 200) { |
00:54 | <Hixie> | heh |
00:54 | <hsivonen> | looks like I've been lazy |
00:55 | <hsivonen> | redirects are transparent to me |
00:55 | <hsivonen> | err opaque |
00:55 | <hsivonen> | I don't notice |
00:56 | hsivonen | gets confused with transparent and opaque if the library hides it |
01:03 | Hixie | tries to get the hang of the results of http://www.hixie.ch/tests/adhoc/dom/level0/window/open/ |
01:04 | <Hixie> | (turn off tabs first) |
01:07 | <Hixie> | i don't understand what mozilla does |
01:07 | <Hixie> | on 002 |
01:08 | <Hixie> | wow |
01:09 | <Hixie> | a window.alert() on safari blocks the entire browser |
01:11 | <Hixie> | and on IE it blocks UI interaction and JS for that tab |
01:11 | <Hixie> | and all the tabs that are involved in the test |
01:11 | <Hixie> | and the chrome for windows involved in the test, even though other tabs on that test are fine! |
01:11 | <Hixie> | wow, there's proof that the menu bar is per-tab if nothing else |
01:18 | <Hixie> | man, all the browsers act differently |
01:18 | <Hixie> | gah |
01:18 | <Hixie> | bbl |
02:24 | webben | (n=benjamin⊙9822) Quit ("Leaving") |
02:30 | whateley | (n=whateley⊙Sesn) Quit (Read error: 110 (Connection timed out)) |
02:48 | tantek | (n=tantek⊙adspn) Quit (Read error: 131 (Connection reset by peer)) |
03:45 | mpt | (n=mpt⊙1dtn) Quit ("This computer has gone to sleep") |
05:04 | <Lachy> | Hixie, typo in 4.2.2: If the value is null - The error should not [be] reported to the user. |
05:54 | <Lachy> | I've added several new questions to the FAQ |
05:54 | <Lachy> | http://blog.whatwg.org/faq/#mime-type |
05:54 | <Lachy> | http://blog.whatwg.org/faq/#tracking-changes |
05:54 | <Lachy> | http://blog.whatwg.org/faq/#namespaces |
06:01 | mpt | (n=mpt⊙1dtn) Quit ("Leaving") |
06:42 | csarven | (i=nevrasc⊙m1mvc) Quit (Read error: 104 (Connection reset by peer)) |
06:47 | <Lachy> | what the???? "I don't want to use namespaces. I want to use an xmlns attribute. " -- Robert Sayre. |
06:47 | <Lachy> | I think that's the quote of the day ;-) |
09:58 | jgraham | (n=jgraham⊙8122) Quit (sterling.freenode.net irc.freenode.net) |
09:58 | gavin_s | (n=gavin⊙6221) Quit (sterling.freenode.net irc.freenode.net) |
10:09 | <hsivonen> | Lachy: I think Robert has a good point |
11:24 | <Lachy> | hsivonen, I don't think so |
11:42 | rhymes | (n=rhymes⊙h5rti) Quit () |
11:42 | Kanashii | (n=Kanashii⊙plbion) Quit () |
12:04 | <Lachy> | jgraham's idea of using a different attribute name from xmlns is better. It's similar to what I said here yesterday, but I'd rather avoid requiring authors to remember the full URI |
12:08 | <Lachy> | I'd just use <svg ns="svg">, where the attribute takes a set of predefined values, such as "svg", "mathml", "xhtml". But in most cases, it would be unnecessary to use it anyway. |
12:11 | <Lachy> | although I still think it's better to leave such things for use in XHTML. Browsers, especially IE, are much more likely to add support for XHTML, SVG and MathML, before a special html-based math/svg syntax. |
13:18 | <ROBOd> | good eday to all |
13:19 | <Lachy> | hey ROBOd |
13:19 | <ROBOd> | Lachy: it seems attractive to use ns instead of xmlns, because it would cause less confusion, because people wouldn't mistake it with XHTML, etc. however... i am suspicious if in the grand scheme of things creating a "fork" of xmlns is that good |
13:19 | <ROBOd> | it would only give web developers more work in the future |
13:20 | <ROBOd> | my suggestion would be that no new ns attribute is added |
13:20 | <Lachy> | I agree and I don't think it is needed |
13:20 | <ROBOd> | if, and only if, something is to be done in regards to this, add xmlns. |
13:21 | <ROBOd> | personally i am not yet decided if xmlns should not be in HTML5 |
13:21 | <Lachy> | but, if a namespace syntax is ever added to HTML, I think it should be at least that simple and must definitely not use xmlns |
13:21 | <ROBOd> | at the moment, i don't see the big gripe, the big need for xmlns in HTML(4|5) |
13:22 | <ROBOd> | Lachy: yes, it should be *that* simple, but not another attribute |
13:22 | <Lachy> | are you saying you would rather reuse xmlns for that purpose? |
13:22 | <ROBOd> | yes |
13:22 | <Lachy> | which would also mean using the full URIs as well |
13:23 | <ROBOd> | yep |
13:23 | <ROBOd> | there's no need to reinvent the wheel, IMHO |
13:23 | <Lachy> | no, that would only serve to further encourage those with the misconception that HTML can be treated as XML |
13:23 | <ROBOd> | as i said above, it's true, that happens |
13:23 | <Lachy> | and it would give the impression that any arbitrary namespace can be used in HTML |
13:24 | <ROBOd> | but another attribute would just add other troubles |
13:24 | <Lachy> | but, as Hixie's study showed, many people get the namespace wrong anyway |
13:24 | <ROBOd> | exactly |
13:24 | <Lachy> | which is why I don't think any namespaces should be added to HTML either. |
13:24 | <ROBOd> | and there's no UA with complete xmlns implementation |
13:25 | <ROBOd> | e.g. Opera had serious problems with xmlns last time i checked |
13:25 | <Lachy> | but my point is that xmlns is too difficult for the average HTML coder plus the other problems just mentioned |
13:25 | <Lachy> | doesn't Mozilla fully support xmlns in XML? |
13:26 | <Lachy> | what's Opera's bug with it? |
13:26 | <ROBOd> | iirc they have some problems as well |
13:26 | <ROBOd> | don't know the Mozilla bugs precisely, since I mostly work with Opera |
13:26 | <ROBOd> | well... for example, Opera with VoiceXML doesn't really care much about the XML namespace |
13:27 | <ROBOd> | it just detects the tag name, and that's pretty much all |
13:28 | <ROBOd> | e.g. if one wants to use something else than the default xmlns prefix (vxml) |
13:30 | <ROBOd> | at the end of that day ... i was pretty much sure XML namespace support was glued (read: not good) :) |
13:31 | <Lachy> | but those are bugs in the XML implementation, specifically relating to prefixes. There would be no prefixes in HTML, so any use of xmlns couldn't use prefixes and that difference would only cause problems |
13:32 | <Lachy> | besides, as Hixie has mentioned, Opera has tried to implement namespaces in HTML, but apparently had to back out of it because so many pages relied on MS Office namespaces being completely ignored by non-IE browsers. |
13:32 | <ROBOd> | the more i think of it, the more i'd recommend Hixie *not* to accept xmlns (or any derivate, for that matter) in html5 |
13:33 | <Lachy> | that's another reason we couldn't reuse xmlns in HTML because MS office has broken it |
13:33 | <Lachy> | I fully agree! |
13:33 | <ROBOd> | thing is: use xhtml for svg and for other "advanced" stuff |
13:33 | <Lachy> | yep |
13:33 | <raspberry-lemon> | the newbie agrees too, just for the record |
13:34 | <Lachy> | raspberry-lemon, what's your real name? Have I seen you on the mailing list before? |
13:34 | rhymes | (n=rhymes⊙h5rti) Quit () |
13:35 | <raspberry-lemon> | real name is chris svindseth, but if you've seen me on the mailing list it would be quite the miracle as i only read it sporadically :) |
13:35 | <ROBOd> | Lachy: i've read Sam's blog post (link posted yesterday here). i now believe he exaggerates with his wish to merge XHTML with HTML. |
13:36 | <Lachy> | ah, so you've never posted to the list. |
13:36 | <raspberry-lemon> | no |
13:37 | <Lachy> | yep, I agree. I think Sam's just taking it too far |
13:39 | <ROBOd> | gotta go now, bbl |
13:39 | <Lachy> | ok, cya |
15:27 | <annevk> | hah |
15:28 | <citoyen> | oh look, it's awake |
15:28 | <annevk> | next time I go away for more than 24 hours I'll turn IRC off |
15:28 | <Lachy> | hi annevk |
15:28 | <annevk> | hi there |
15:28 | annevk | just read through the entire backlog... |
15:28 | annevk | hasn't yet read Sam Ruby's post |
15:28 | <annevk> | morning citoyen :) |
15:28 | <Lachy> | annevk, was it worth reading it all? |
15:28 | <citoyen> | mornin' :) how's the head? :) |
15:30 | <annevk> | better |
15:30 | <annevk> | Lachy, no, I skipped major parts |
15:31 | <annevk> | "HTML is tantalizingly close to well-formed XML." ... |
15:32 | <Lachy> | hah! :-D |
15:32 | <citoyen> | *blink* |
15:32 | <Lachy> | there's been several funny quotes on the list today |
15:38 | <annevk> | class AtheistParseError(ParseError): ... |
15:52 | <annevk> | "Breaking XML is too politically incorrect even for the WHATWG." We could try... |
15:52 | <annevk> | Introduce graceful error handling for XML |
15:53 | rhymes | (n=rhymes⊙h5rti) Quit () |
16:00 | <Lachy> | it's too late for that |
16:02 | <annevk> | it's already happening |
16:03 | <annevk> | see feed parsers for instance |
16:03 | <Lachy> | ? |
16:03 | <annevk> | we better define how it should work... |
16:03 | <Lachy> | Oh, that's just crap. They should use draconian error handling |
16:03 | <annevk> | that doesn't make much sense to me |
16:04 | <Lachy> | and CMSs should use proper XML tools and ensure they output well-formed feeds |
16:04 | <annevk> | it seems better for their users to do the non draconian thing |
16:04 | <annevk> | right... |
16:04 | <annevk> | those CMSs have been promised for over the past ten years or so |
16:04 | <Lachy> | IE7 does draconian error handling for feeds, doesn't it? |
16:04 | <hsivonen> | Lachy: have fun trying to convince Mark P. not to do what he does. :-) |
16:04 | <annevk> | there's not really such a thing as bugfree software, I think we should try to learn from that |
16:05 | <annevk> | Lachy, only partially |
16:05 | <hsivonen> | annevk: TeX. The conclusion is that we should use .dvi for interchange. :-) |
16:06 | <citoyen> | Let's face it, people fail and tools fail, no matter how much we try. Given that, and that tools are meant to make our lives easier, not more annoying, I think error handling is the way to go. |
16:06 | <annevk> | hsivonen, I don't get that |
16:06 | <annevk> | as in, I'm not sure what you're saying :) |
16:07 | <hsivonen> | annevk: TeX is famous for being the non-trivial piece of software that is free of bugs |
16:07 | <hsivonen> | TeX outputs .dvi |
16:07 | <annevk> | oh |
16:09 | <hsivonen> | grr. I have to update my <t> test cases |
16:12 | <annevk> | s/t/time/ |
16:13 | <hsivonen> | annevk: won't work |
16:13 | <hsivonen> | consider <title> |
16:14 | <annevk> | ok, do it a bit smarter :) |
16:14 | <annevk> | s/<t / |
16:14 | <annevk> | s/<t>/ |
16:14 | <annevk> | etc. |
16:14 | <hsivonen> | yeah |
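The `<t>` to `<time>` rename discussed above can be sketched with a regular expression that only matches when the tag name ends right after the `t`, so `<title>` is left alone. This is a minimal sketch, not hsivonen's actual test-case tooling; the function name `rename_t_to_time` and the sample markup are assumptions made for illustration.

```python
import re

# Match "<t" or "</t" only when the tag name ends there, i.e. the next
# character is whitespace, "/" or ">". This skips <title>, <table>, <td>, etc.
T_TAG = re.compile(r"<(/?)t(?=[\s/>])")


def rename_t_to_time(markup):
    """Rename <t> elements to <time>, keeping any attributes untouched."""
    return T_TAG.sub(r"<\1time", markup)


sample = '<title>x</title><t datetime="2006-12-02">today</t><t>now</t>'
print(rename_t_to_time(sample))
```

The lookahead covers both of the cases annevk lists (`<t ` and `<t>`) plus the closing tag in a single pattern.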
16:18 | <annevk> | Hixie, if you have nothing else to work, consider updating the parsing section a bit more to remove the last couple of red blocks and do the rewrite of the tree construction section... |
16:30 | <annevk> | http://therealcrisp.xs4all.nl/blog/ "Hell is where browsers come from" |
16:34 | <hsivonen> | Lachy: wp-comments-post.php is broken |
16:34 | <hsivonen> | "Error: This file cannot be used on its own." |
16:35 | <Lachy> | ok, let me see... |
16:36 | <Lachy> | Does that happen when you try to post a comment? |
16:36 | <hsivonen> | yos |
16:36 | <hsivonen> | yes |
16:36 | <Lachy> | when you're logged in or not? |
16:36 | <hsivonen> | logged in |
16:36 | <Lachy> | ok, it worked for me when not logged in |
16:37 | ROBOd | (n=robod⊙8321) Quit (Read error: 104 (Connection reset by peer)) |
16:37 | <Lachy> | worked for me when logged in too |
16:37 | <hsivonen> | hmm. interesting |
16:37 | <hsivonen> | gotta run for dinner |
16:38 | <ROBOd2> | bon appétit hsivonen |
16:38 | <hsivonen> | thanks |
16:38 | <Lachy> | I get that error when I visit http://blog.whatwg.org/wp-comments-post.php directly, rather than posting to it |
16:38 | <annevk> | isn't it a little early... |
16:39 | <annevk> | oh, wait, Finland |
16:39 | <hsivonen> | annevk: board game scheduled after dinner |
16:39 | <hsivonen> | hence, early dinner |
16:39 | <hsivonen> | really going now |
16:39 | <annevk> | bye |
16:40 | <annevk> | Lachy, you want http://c2.com/cgi/wiki?GeneratorsInPython |
16:43 | <Lachy> | I see. so we would implement a getChar() function that uses yield and returns the next character in the stream |
16:43 | <annevk> | I think that's the idea |
16:43 | <Lachy> | what about when we have to back up a few chars for error handling? |
16:44 | <annevk> | you store the characters somewhere I suppose |
16:44 | <annevk> | hmm |
16:44 | <Lachy> | ok, need to think about it. |
16:51 | gsnedders | (n=gsnedder⊙hrbc) Quit ("Don't touch /dev/null") |
16:52 | <annevk> | hmm yeah |
16:52 | <annevk> | for states like the entity state |
16:53 | <Lachy> | it might be easier to implement in it a stream object that handles walking forward and backward through the stream, even if it uses yield internally for some stuff |
16:53 | <Lachy> | and even supports inserting markup into the stream, which would be needed for document.write() support |
16:54 | <annevk> | yeah, didn't jgraham have something like that? |
16:54 | Lachy | will check |
16:58 | <Lachy> | I think that's what his Tokeniser object does, but not sure. It seems to be structured in a very strange way. |
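The stream object Lachy describes (walking forward, backing up a few characters for error handling, and inserting markup for document.write()) might look roughly like the sketch below. It is not the html5lib code; the class name `CharStream` and its methods `get_char`, `unget` and `insert` are hypothetical names chosen for illustration.

```python
class CharStream:
    """Hypothetical character stream with push-back and markup insertion."""

    def __init__(self, text):
        self._source = iter(text)  # unread input, consumed lazily
        self._queue = []           # characters to re-read; the next one is last

    def get_char(self):
        """Return the next character, or None at end of input."""
        if self._queue:
            return self._queue.pop()
        return next(self._source, None)

    def unget(self, chars):
        """Push characters back so they are read again in their original order."""
        self._queue.extend(reversed(chars))

    def insert(self, markup):
        """Inject markup at the current position (document.write()-style)."""
        self.unget(markup)


stream = CharStream("a&bc")
first = stream.get_char()                        # "a"
seen = stream.get_char() + stream.get_char()     # "&b", read ahead speculatively
stream.unget(seen)                               # not an entity after all: back up
print(first, "".join(stream.get_char() for _ in range(3)))
```

Keeping the pushed-back characters in a small list means the underlying iterator never has to rewind, which is the "store the characters somewhere" idea from the conversation.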
17:05 | <annevk> | when I search on google for "live dom viewer" i get your site Lachy ... some copy |
17:05 | <jgraham> | Lachy: what is strange |
17:05 | <jgraham> | ? |
17:06 | <jgraham> | Did you see that I started a google project for a python based html5 parser: http://code.google.com/p/html5lib/ |
17:07 | <annevk> | cool |
17:07 | <annevk> | I'm willing to help out |
17:07 | <jgraham> | I'm really up for working with other people on this, so I'm quite happy to change the design if it's no good. And I seem to have a bit more python experience, which might help |
17:09 | <Lachy> | jgraham, write an article about it on the blog |
17:09 | <Lachy> | let a few more people know about it and ask for more contributors |
17:11 | <jgraham> | Yeah, that's a good idea. I might set up a wiki page for discussing the design as well |
17:11 | <Lachy> | Cool, I'm happy with the BSD licence for it |
17:11 | <annevk> | what does BSD imply? |
17:12 | <annevk> | what are the restrictions, basically |
17:12 | <Lachy> | it means that you retain copyright, but anyone is free to do whatever they like with it |
17:12 | <jgraham> | http://www.opensource.org/licenses/bsd-license.php |
17:13 | <jgraham> | I think it's about the most liberal license available |
17:13 | <Lachy> | http://en.wikipedia.org/wiki/BSD |
17:13 | <jgraham> | But if anyone has any good reasons to change it, I'm listening |
17:14 | <annevk> | I'd be happy with a license that doesn't require attribution |
17:15 | <Lachy> | http://en.wikipedia.org/wiki/Public_domain_equivalent_license |
17:15 | <Lachy> | BSD is near enough to public domain |
17:17 | <jgraham> | The options in google hosting are BSD, Apache 2.0, Artistic/GPLv2.0, GPL2.0, LGPL, MIT, MPL1.1 |
17:17 | <Lachy> | This is what I usually do for copyright http://lachy.id.au/about/copyright |
17:18 | <Lachy> | of those, either MIT or BSD are the most permissive |
17:19 | <jgraham> | Do you think MIT would work better? |
17:21 | <annevk> | yes |
17:21 | <jgraham> | OK |
17:21 | <annevk> | per http://en.wikipedia.org/wiki/MIT_License that doesn't require attribution which may be a problem for some commercial entities |
17:21 | <jgraham> | OK, it's changed |
17:22 | <annevk> | if you want you can add annevankesteren⊙gc though I wonder how to deal with such a project |
17:24 | <jgraham> | I added you as a project owner |
17:24 | <annevk> | hah |
17:24 | Lachy | will register a new gmail account and join |
17:24 | <jgraham> | What do you mean "deal with such a project"? You mean how to actually design the code collaboratively? |
17:25 | <Lachy> | if only someone hadn't stolen my name! lachlan.hunt at gmail.com is taken :-( |
17:25 | <jgraham> | Heh. I ended up with jgraham.cantab since almost everything I could think of was gone... |
17:26 | <annevk> | jgraham, yes |
17:27 | <annevk> | I took the liberty to add more text to the frontpage |
17:27 | <jgraham> | Well I think a design document on a wiki would help. I don't know if the whatwg wiki is the right place though |
17:27 | <Lachy> | oh, no I forgot, I already have lachyhunt at gmail.com :-) |
17:28 | <jgraham> | Lachy: OK, I added you |
17:28 | <Lachy> | thanks |
17:31 | <annevk> | checkout is still going on... |
17:31 | <annevk> | hmm |
17:31 | Lachy | is finishing off the blog entry for feed autodiscovery... |
17:32 | <Lachy> | are there any other issues with "alternate", besides a feed not necessarily being an alternate representation and the MIME type not always being a good indicator of a feed? |
17:32 | <annevk> | you should prolly post on monday |
17:32 | <Lachy> | why wait? It'll still be there on Monday |
17:33 | <annevk> | posts tend to get more attention throughout the week |
17:33 | <annevk> | at least, in my experience |
17:34 | <Lachy> | yeah, but what difference does it make if it's posted today or tomorrow? It'll still show up in people's feed readers on monday morning |
17:34 | <annevk> | i've wondered about that myself |
17:35 | <Lachy> | but I can hold it off for a day if you like, it doesn't matter that much |
17:45 | <Lachy> | hehe... :-) The latest from elliot... |
17:45 | <Lachy> | "Secondly, anyone who actually tried to use an SGML parser to handle HTML rapidly hit a wall since most HTML documents were not even close to actually conformant to the SGML spec or the HTML DTD. " |
17:46 | <Lachy> | now if only he could figure the concept when s/SGML/XML |
17:47 | <annevk> | hmm, I can't seem to commit |
17:48 | <annevk> | jgraham, should we use a googlegroups for discussion? |
17:50 | <jgraham> | annevk: I guess googlegroups might be good. I'd still like a wiki page somewhere to hack out a design. Any ideas where? I could set something up on my desktop but it's unlikely to be very reliable... |
17:51 | <annevk> | lets use wiki.html5.org |
17:51 | <Lachy> | jgraham, wiki.whatwg.org |
17:51 | <annevk> | what Lachy said |
17:51 | <annevk> | PythonHTML5Lib ? |
17:52 | <jgraham> | OK, I just didn't want it to seem like an "official" implementation |
17:52 | <annevk> | lets make that clear in the first paragraph :) |
17:52 | <jgraham> | OK |
18:22 | <jgraham> | I've created http://wiki.whatwg.org/wiki/HTML5Lib I'll fill in some more of the details shortly |
18:30 | <Lachy> | You should use [Category:Implementations] instead so that the list is automatic |
18:34 | <Lachy> | done http://lachy.id.au/log/2005/12/xhtml-beginners |
18:34 | <Lachy> | oops, wrong like |
18:34 | <Lachy> | *link |
18:34 | <Lachy> | http://wiki.whatwg.org/wiki/Category:Implementations |
18:41 | Lachy | has had enough of Elliot, the arguments are just going round and round in circles. |
18:44 | <Lachy> | I'm going to try to not respond to him again, no matter how tempting it gets. |
19:22 | <jgraham> | http://wiki.whatwg.org/wiki/HTML5Lib now has some description of the tokeniser Please go ahead and rip it to shreds :) |
19:39 | whateley | (n=whateley⊙Sesn) has left #whatwg |
19:41 | <annevk> | hmm, seems to come down to yet another mime type debate |
19:41 | <annevk> | I love those! [pause] Not. |
19:41 | annevk | reads the wiki |
19:41 | annevk | just had some food |
19:44 | jgraham | notices a mistake in the wiki page |
19:46 | <annevk> | We should use the word Tokenizer |
19:46 | <annevk> | or HTMLTokenizer |
19:46 | <annevk> | note the z |
19:48 | <annevk> | see Google if you don't believe me :) |
19:48 | <annevk> | jgraham, so how does the tokenizer integrate with the parser? |
19:48 | <annevk> | parser -> tree construction phase |
19:48 | <annevk> | the three construction phase directly affects the tokenizer |
19:48 | <annevk> | s/three/tree ... |
19:49 | <jgraham> | Tokeniser == english spelling, tokenizer == American spelling, no? |
19:49 | <annevk> | yes |
19:49 | <jgraham> | But we can go with "z", I'll just make more typos that way ;) |
19:49 | <annevk> | "Results 1 - 10 of about 40,100 for tokeniser." |
19:49 | <annevk> | "Results 1 - 10 of about 1,240,000 for tokenizer. " |
19:50 | <annevk> | Google also suggested that I search for tokenizer when I tried tokeniser :) |
19:50 | <jgraham> | annevk: The parser calls getToken every time it wants a token. But it also holds a reference to the tokeniser so it can change the tokeniser state when it needs to. Does it ever do more than change the content model flag? |
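The pull model jgraham describes, where the parser asks for one token at a time and keeps a reference to the tokeniser so it can change the content model flag, can be sketched as follows. This is a toy under stated assumptions, not the html5lib tokeniser: it only understands attribute-less tags, the names `Tokenizer`, `get_token`, `content_model` and `parse` are illustrative, and switching to CDATA after a `script` start tag is a simplified stand-in for what the spec's tree construction phase does.

```python
import re

PCDATA, CDATA = "PCDATA", "CDATA"
TAG = re.compile(r"<(/?)(\w+)>")  # toy tag syntax: no attributes


class Tokenizer:
    def __init__(self, text):
        self.text = text
        self.pos = 0
        self.content_model = PCDATA  # the parser may change this between tokens

    def get_token(self):
        """Return the next (type, data) token, or None at end of input."""
        if self.pos >= len(self.text):
            return None
        if self.content_model == CDATA and not self.text.startswith("</", self.pos):
            # In CDATA everything up to the next "</" is plain character data.
            end = self.text.find("</", self.pos)
        else:
            self.content_model = PCDATA
            match = TAG.match(self.text, self.pos)
            if match:
                self.pos = match.end()
                return ("EndTag" if match.group(1) else "StartTag", match.group(2))
            end = self.text.find("<", self.pos + 1)
        if end == -1:
            end = len(self.text)
        data = self.text[self.pos:end]
        self.pos = end
        return ("Characters", data)


def parse(text):
    """Toy parser loop: pull tokens and flip the tokenizer's flag when needed."""
    tokenizer = Tokenizer(text)
    tokens = []
    while True:
        token = tokenizer.get_token()
        if token is None:
            break
        tokens.append(token)
        if token == ("StartTag", "script"):
            tokenizer.content_model = CDATA
    return tokens


for token in parse("<p>hi</p><script>a<b>c</script>"):
    print(token)
```

Without the flag change, the `<b>` inside the script would come out as a start tag rather than as part of the character data.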
19:52 | <annevk> | I don't think so |
19:52 | <annevk> | but can't we work with functions then in the tokenizer that the parser implements? |
19:54 | <jgraham> | Could do, I guess. I'm not sure what the benefit is though? |
19:54 | <annevk> | I think it's cleaner than having temporary token objects... |
19:56 | annevk | reads through the spec once again |
19:58 | <jgraham> | Well this way the separation between tokeniser and parser is pretty clean. It also has the nice property of being a very literal implementation of the spec - when it says "create a token" you really do. But I see your point; maybe it adds lots of overhead |
20:01 | <annevk> | I might have mentioned this already, but it would be nice if the parser was fairly low-level so it can be ported to other languages as well. |
20:01 | <annevk> | In an easy way |
20:03 | <annevk> | I think having functions might also make it easier to add markup injection, if ever... |
20:04 | <jgraham> | document.write in python?! |
20:06 | <annevk> | well, the architecture should sort of take it into account |
20:09 | <annevk> | jgraham, why do the base classes inherit from object? |
20:10 | <jgraham> | Because that makes them "new style" python classes |
20:10 | <jgraham> | Which have several generally desirable properties compared to old style classes |
20:11 | <jgraham> | see e.g. http://www.geocities.com/foetsch/python/new_style_classes.htm |
20:12 | <jgraham> | It's a backwards compat. issue |
20:15 | <jgraham> | annevk: So in your proposal, what would the interface between the parser and the tokeniser look like? Would you start with the tokeniser and have it call parser.startTagToken(name, attrs) when it made a start tag token? Or something else? |
20:15 | <annevk> | And what does frozenset give us? What it seems to imply? |
20:16 | <annevk> | jgraham, I suppose self.startTagToken() if the parser inherits from it... |
20:16 | <annevk> | but yeah |
20:16 | <annevk> | I'm updating the wiki as we chat |
20:17 | <jgraham> | Also I think document.write would work in my model, you'd have to append the extra markup to the characterQueue (mistakenly called characterStack in the svn code). The treebuilder side of that would be the hard part |
20:19 | <annevk> | perhaps we should call it "characters" |
20:19 | <annevk> | hmm |
20:19 | <annevk> | jgraham, yeah, I guess it would |
20:20 | <jgraham> | frozenset is just an immutable set. Sets are nice because it's easy to compute unions, etc - useful since there are definitions like "All other elements found while parsing an HTML document" which we need to test against. Also membership tests should be fast (I think). |
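A short illustration of what jgraham says about frozenset: module-level immutable sets, unions between them, and membership tests. The set names and their contents below are made up for the example and are not the element categories from the spec or from the html5lib source.

```python
# Hypothetical element groups, defined once at module level.
HEADING_ELEMENTS = frozenset(["h1", "h2", "h3", "h4", "h5", "h6"])
VOID_ELEMENTS = frozenset(["br", "hr", "img", "input", "link", "meta"])

# Unions are cheap to express and produce another frozenset.
SPECIAL_ELEMENTS = HEADING_ELEMENTS | VOID_ELEMENTS


def is_special(name):
    """Membership tests on a set do not scan every element."""
    return name in SPECIAL_ELEMENTS


print(is_special("h2"), is_special("p"))

# A frozenset has no add() or remove(), so shared module-level
# definitions cannot be modified by accident.
print(hasattr(SPECIAL_ELEMENTS, "add"))
```

Because the sets are immutable, sharing them as module-level names (the concern raised just after this point in the log) carries little risk.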
20:21 | <annevk> | is it ok that they are global variables though? |
20:23 | <annevk> | hmm, I suppose you don't want to pass them around all the time |
20:23 | <jgraham> | They're only global in the current file |
20:23 | <annevk> | okay |
20:24 | <annevk> | that's what I expected |
20:25 | <annevk> | hmm, I've got referrers from example.com ... |
20:25 | <jgraham> | I don't understand why the parser would inherit from the tokeniser? I can see that the parser and tokeniser would call each other somehow but I don't see why they'd inherit? |
20:25 | <jgraham> | heh |
20:25 | <jgraham> | spammers? |
20:26 | <annevk> | think so |
20:27 | <annevk> | hmm, you're right |
20:28 | <annevk> | so you'd have x = HTMLParser("docRef"); HTMLParser invokes HTMLTokenizer(self, "docRef") and there you go |
20:28 | <annevk> | would that work? |
20:29 | gsnedders | (n=gsnedder⊙hrbc) Quit ("Don't touch /dev/null") |
20:30 | <jgraham> | Yeah. That's basically what I have at the moment. Only I have a "parse" function in the parser which creates the tokeniser. |
20:30 | <jgraham> | As well as starting parsing obviously |
20:35 | <annevk> | this is what I just added to the wiki: "There's an HTMLParser class you can invoke with an object. What this object is can be decided later. File object, string, URI, etc. The newly created HTMLParser object then instantiates an HTMLTokenizer with itself as argument and the object. The HTMLTokenizer then does things like parser.emitStartTagToken(name, ...) etc." |
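The callback design in annevk's wiki text could be sketched like this: HTMLParser creates an HTMLTokenizer with itself as argument, and the tokenizer calls back into the parser for each token. Only `HTMLParser`, `HTMLTokenizer` and `emitStartTagToken` come from the conversation; `emitEndTagToken`, `emitCharacterToken`, `run`, `parse` and the `events` list are assumptions, and the regular expression is a toy that handles attribute-less tags only and silently skips stray `<` characters.

```python
import re


class HTMLTokenizer:
    # Toy grammar: an attribute-less tag, or a run of text without "<".
    TOKEN = re.compile(r"<(/?)(\w+)>|([^<]+)")

    def __init__(self, parser, source):
        self.parser = parser  # the tokenizer pushes tokens into this object
        self.source = source

    def run(self):
        for match in self.TOKEN.finditer(self.source):
            closing, name, text = match.groups()
            if text is not None:
                self.parser.emitCharacterToken(text)
            elif closing:
                self.parser.emitEndTagToken(name)
            else:
                self.parser.emitStartTagToken(name, {})


class HTMLParser:
    def __init__(self, source):
        self.events = []  # stand-in for real tree construction
        self.tokenizer = HTMLTokenizer(self, source)

    def parse(self):
        self.tokenizer.run()
        return self.events

    def emitStartTagToken(self, name, attrs):
        self.events.append(("start", name, attrs))

    def emitEndTagToken(self, name):
        self.events.append(("end", name))

    def emitCharacterToken(self, data):
        self.events.append(("chars", data))


print(HTMLParser("<p>hi</p>").parse())
```

Compared with the pull model, no temporary token objects are created here; the trade-off is that control flow lives in the tokenizer rather than in the parser.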
20:55 | <gsnedders> | what HTML5 parsers are there in existence already? |
20:56 | <gsnedders> | (and are bug-free enough to use as a reference implementation) |
20:56 | <annevk> | there are none |
20:56 | <annevk> | there's a project |
20:58 | <jgraham> | annevk: I've created a "callback" branch in svn to try your approach. |
20:58 | <gsnedders> | annevk: right. I knew there were several, but I didn't know how far they were in terms of development |
20:59 | jgraham | wishes he knew enough computer science to make an informed argument one way or the other |
20:59 | <annevk> | several, even? |
21:01 | ROBOd2 | (n=robod⊙8321) Quit (Read error: 104 (Connection reset by peer)) |
21:45 | annevk | (n=annevk⊙poc) Quit (Read error: 110 (Connection timed out)) |
22:03 | ROBOd2 | (n=robod⊙8321) Quit ("http://www.robodesign.ro") |
22:14 | annevk | (n=annevk⊙8111) Quit (Read error: 148 (No route to host)) |