2007-07-01 [20:09:00.0000] wtf is "next column" in 3.15.11.1.9.1 ?? [20:09:01.0000] i really have to pay more attention when writing up these algorithms [22:36:00.0000] i found tens of thousands of documents last modified in 1990 that claim to have class=MsoNormal [22:36:01.0000] sigh [22:57:00.0000] where? you googled? [22:58:00.0000] wow, the sheer number of pages with screwed-up markup is amazing [22:58:01.0000] duryodhan: part of my research [22:58:02.0000] ohh ok [22:58:03.0000] the sheer number of screwed up markup ... heh [22:59:00.0000] take today ... I am a developer who wants to make a site ... [22:59:01.0000] what should I do? [22:59:02.0000] XHTML5, XHTML 1,2 , HTML4 , HTML Forms. WebForms 1.0 , XForms , use DOM to check forms or use XForms .... [22:59:03.0000] ? [23:00:00.0000] I mean, probably a decade later many of these will become obselete [23:00:01.0000] and someone will look on them as docs with screwed up markup ... [23:00:02.0000] today? just use HTML4. [23:00:03.0000] ok [23:00:04.0000] today is a little .... [23:00:05.0000] XHTML5 isn't done yet, XHTML 1 isn't supported by IE, XHTML2 isn't done yet, XForms doesn't work in browsers [23:01:00.0000] why do we have web forms as well as XForms ? [23:01:01.0000] dunno, ask the xforms guys [23:01:02.0000] it's pretty common for groups of people to try to invent new replacement technologies [23:01:03.0000] (XForms using something like orbeon) [23:01:04.0000] sometimes new technologies take off [23:02:00.0000] (most often they don't) [23:02:01.0000] heh... haven't you guys made web forms ? [23:02:02.0000] web forms is just a fancy name for what html4 does [23:02:03.0000] web forms 2 is just the next revision of html4 forms [23:02:04.0000] it's part of html5 [23:02:05.0000] k [23:02:06.0000] the name "web forms 2" is likely to die a peaceful death [23:02:07.0000] heh, isn't that a whatwg spec ? webforms 2? 
[23:03:00.0000] duryodhan: yes, but it will be integrated into the larger HTML5 spec in the future [23:04:00.0000] assuming the xforms guys don't get in the way and try to push their xforms transitional instead [23:04:01.0000] my point is ... with so many specs ... many docs will still be again with "screwed up markup" in a decade [23:04:02.0000] so we really haven't learnt from our past ... [23:04:03.0000] screwed up markup isn't caused by having many specs [23:04:04.0000] XForms guys are putting webforms 2 as a Xforms-minimal or something ... [23:05:00.0000] they're working on "XForms Transitional", I believe [23:05:01.0000] yeah [23:05:02.0000] so i found over 100000 html files with a last-modified date of 1988. [23:06:00.0000] wtf [23:06:01.0000] well I was looking at developing a way for forms to have digital signatures ... and I still don't know a good way :( [23:06:02.0000] Hixie: time travel. how many from 1987? [23:06:03.0000] there are just so many possibilities .. [23:07:00.0000] jruderman: around the same [23:07:01.0000] that's really weird, I find it hard to believe there are that many servers out there with clocks set so wrongly [23:07:02.0000] jruderman: i even found hundreds from 1662. [23:07:03.0000] that's impressive [23:08:00.0000] yes. [23:08:01.0000] Hixie: 1662? :O [23:08:02.0000] I'm sure the internet archive will be very interested in those historical documents :-) [23:09:00.0000] Found anything from BCE? [23:10:00.0000] Lachy: they were all used for writing "the da vinci code " [23:10:01.0000] my methodology was to look for the first 4 digit number in the last-modified field that was not preceded or suceeded by numbers or a + [23:10:02.0000] so i couldn't find anything BCE [23:10:03.0000] wouldn't surprise me htough [23:10:04.0000] what if it was preceded by a - > [23:10:05.0000] ? [23:11:00.0000] i'd take it but ignore the - or > [23:12:00.0000] I meant to write "what if it was preceded by a '-'?" 
Ignore the '>' [23:12:01.0000] i treated -s like spaces [23:12:02.0000] even more impressive is the thousands of files from dates greater than 2010 [23:12:03.0000] oh, then you might have counted timezone offsets as years [23:13:00.0000] i got a million or more from 2099 [23:13:01.0000] and almost 200,000 from max time_t [23:13:02.0000] so is the conclusion that we can draw from this, that the last modified date is completely useless? [23:15:00.0000] well, I suppose it's still somewhat useful for setting the If-last-modified-since HTTP header [23:16:00.0000] also a lot from 2099 and 2100 (like, over a million and over half a million respectively) [23:18:00.0000] is there any correlation between dates the use the correct format and those that have plausible dates? [23:20:00.0000] my hypothesis would be that servers that are configured to send the right date format are more likely to be configured with more accurate dates, and the others are just broken and unreliable. [23:21:00.0000] i dunno [23:21:01.0000] there were a LOT of different formats [23:21:02.0000] like, thousands [23:21:03.0000] the spec allows three [23:21:04.0000] which maps to about 10 actual formats [23:21:05.0000] which spec? [23:21:06.0000] http [23:21:07.0000] ok [23:21:08.0000] I thought it only allowed one. 
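A rough sketch of the year-extraction method described above: take the first 4-digit run in the Last-Modified value that has no digit or '+' on either side, with '-' treated like a space. The name `extract_year` and the exact regex are assumptions for illustration, not the script actually used for this survey.

```python
import re

# A run of exactly four digits whose neighbours are neither digits nor '+'.
# '-' is deliberately NOT excluded, mirroring "i treated -s like spaces".
YEAR_RE = re.compile(r"(?<![0-9+])[0-9]{4}(?![0-9+])")

def extract_year(last_modified):
    """Return the first plausible 4-digit 'year' in a header value, or None."""
    match = YEAR_RE.search(last_modified)
    return int(match.group(0)) if match else None

if __name__ == "__main__":
    for value in [
        "Sun, 06 Nov 1994 08:49:37 GMT",    # 4-digit year present
        "Sunday, 06-Nov-94 08:49:37 GMT",   # 2-digit year only
        "02-Jul-07 10:00:00 +0100",         # '+' offset is rejected
        "02-Jul-07 10:00:00 -0800",         # '-' offset slips through
    ]:
        print(repr(value), "->", extract_year(value))
```

With this reading of the rule, a "-0800" offset is picked up as a year whenever no real 4-digit year comes before it, which is the failure mode raised above about timezone offsets being counted as years.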
[23:21:09.0000] nope [23:22:00.0000] it defines three formats, all retarded [23:23:00.0000] oh, I see [23:23:01.0000] Sun, 06 Nov 1994 08:49:37 GMT ; RFC 822, updated by RFC 1123 [23:23:02.0000] Sunday, 06-Nov-94 08:49:37 GMT ; RFC 850, obsoleted by RFC 1036 [23:23:03.0000] Sun Nov 6 08:49:37 1994 ; ANSI C's asctime() format [23:24:00.0000] yeah [23:24:01.0000] so the second one, i ignored [23:25:00.0000] http://junkyard.damowmow.com/284 is an older version of this data btw [23:25:01.0000] the first is at least somewhat sensible [23:26:00.0000] the last one is just a really strange order [23:26:01.0000] the junkyard one doesn't exclude the + character [23:27:00.0000] hence all the low numbers [23:27:01.0000] and the peaks at 100s [23:28:00.0000] how do you explain the peaks at dates like 1428? [23:29:00.0000] 1969-70 can be explained because that's the epoch [23:30:00.0000] yeah 2038 can be explained too [23:30:01.0000] yeah, the max 32bit time [23:30:02.0000] #### only means thousands, so it could just be one misconfigured site [23:31:00.0000] 2250 i don't get [23:33:00.0000] so how many does ########## represent (the 2007 value) [23:35:00.0000] I'm surprised there aren't peaks at years like 0030, 0130, 0230, etc. for timezone offsets [23:35:01.0000] # = one order of magnitude [23:36:00.0000] there aren't that many :30 TZ offsets [23:39:00.0000] ok [23:41:00.0000] ########## is in the billions [23:42:00.0000] aargh! It really annoys me how some people conflate making content accessibile with providing fallback to those without the necessary software [23:44:00.0000] the most annoying thing for me in public-html is the way most people jump to a solution rather than determining the problem [23:46:00.0000] yeah, that too. 
I tried getting people to focus on the problem months ago, and it didn't really work then, and it's still not working now [23:47:00.0000] like in the whole headers="" debate, I tried to talk about how we could make tables accessible without needing headers, and basically got accused of ignoring the needs of the accessibility community [23:47:01.0000] yeah [23:48:00.0000] it's ridiculous [23:48:01.0000] although, Henri seemed to get a really good response from Aaron (I believe) that showed significant improvement [23:50:00.0000] this one http://www.w3.org/mid/4680E4F5.6080903⊙mn and http://www.w3.org/mid/4680FE26.9090802⊙mn [23:51:00.0000] yeah [23:51:01.0000] let's hope people go more in that direction [23:53:00.0000] what year was cellpadding="" invented? [23:54:00.0000] i have about 250,000 documents labelled 1990, and about 250,000 documents labelled 1990 that have an element with a cellpadding="" attribute [23:55:00.0000] /me dismisses the 1990 data [23:58:00.0000] the sheer number of different doctypes is insane [23:59:00.0000] Hixie: HTML tables were invented around 1995-96 and published in http://www.ietf.org/rfc/rfc1942.txt [23:59:01.0000] that includes cellpadding [00:00:00.0000] yeah so basically anything before 1995 is statistically insignificant [00:00:01.0000] pity [00:00:02.0000] not surprising though [00:42:00.0000] wow [00:42:01.0000] limited quirks was in the 0%-2% range until 2004, then it jumped to 11%, 13%, 20% in 2006 [00:47:00.0000] what do you mean by limited quirks? 
[01:05:00.0000] Lachy: "almost standards" [01:05:01.0000] ok [01:06:00.0000] I wonder if that's because tools like Dreamweaver started producing reasonable code with transitional DOCTYPEs around that time [01:08:00.0000] actually, dreamweaver was doing that in 2002 when they released Dreamweaver MX [01:16:00.0000] it started around 1999, with xhtml [01:53:00.0000] year over year, the most popular class names are very variable [01:55:00.0000] oh nm [02:04:00.0000] hmm, is dropping in usage [02:04:01.0000] that's encouraging [02:04:02.0000] isn't [02:05:00.0000] my 2000 data is borked [02:05:01.0000] probably skewed by one site or something [06:22:00.0000] /me kind of dislikes it when the spec has exactly the same paragraph repeated in two different places, since his spec<->testcase annotation script uses regexps within paragraphs to identify the right sentence for each test and gets confused by duplicates :-( [06:35:00.0000] http://diveintomark.org/archives/2007/06/30/irony style="" (quote from a t-shirt with red text) [06:43:00.0000] webben_, in my experience, Mozilla bug commentators saying "X is available in an extension, therefore it shouldn't be in the base software". is a common mistake caused by underestimating the difficulty of finding+installing extensions. It's not particularly striking. [06:46:00.0000] (Personally I think longdesc= is a waste of time for a browser to support, but not because there are extensions that support it.) [06:48:00.0000] the fact that ATs already support it is the strongest reason to include it, but in practice, I think it has failed [06:51:00.0000] I thought longdesc seemed like a good case for a microformat [06:52:00.0000] Dashiva: do you mean ? 
[06:53:00.0000] No, more that the use case seems so limited, it would make more sense to let the group actually using it decide how they want it, and keep it out of the main spec [06:53:01.0000] This is orthogonal to fallback/alt content for images, though [06:57:00.0000] oh well, the legal stick of accessibility has been waived again :-/ [06:57:01.0000] why is it that when accessibility advocates can't come up with a rational argument, they always fall back to the legal stick? [07:00:00.0000] Well, maybe they realize there are no carrots available? [07:17:00.0000] If the Web had smell-o-vision, would accessibility advocates fight for longdescs of odors on behalf of those with no sense of smell? [07:19:00.0000] A perfume site that made use of smell-o-vision would probably provide a description of the smell anyway for all users, so they can know what it's like before sampling. [07:20:00.0000] But that's just like now [07:20:01.0000] Everyone is disabled with respect to smelling on the web [07:20:02.0000] Why don't we have accessible smells? [07:25:00.0000] Are there any free non-patent-encumbered media formats for odours? [07:27:00.0000] It would be a pain if you had to use multiple encoders (encodours?) since the common browsers all support different formats :-( [07:36:00.0000] it looks like Sony already has a patent on one form of the technology http://theredactor.blogspot.com/2005/04/birth-of-smellovision.html [07:37:00.0000] http://blog.teledyn.com/node/2286 [07:38:00.0000] of course, it's another case of the US patent office granting another invalid patent for a non-existent invention [07:41:00.0000] you can patent things that haven't been invented? [07:41:01.0000] apparently Sony can [07:41:02.0000] why am i not surprised [07:42:00.0000] I really do not get the whole idea. 
It provides nothing more than does, and has absolutely no browser support [07:43:00.0000] i think it's the case of http://ln.hixie.ch/?start=1108984991&count=1 [07:44:00.0000] having browser support is a problem if that support contradicts the requirements for supporting images, and can't be changed (due to compatibility concerns), whereas a new element like doesn't have that problem [07:45:00.0000] (since could work eventually, while could never work) [07:45:01.0000] (I don't know if the existing support does have that problem in practice, though) [07:46:00.0000] can work, and does work in some UAs already. However, I would like to know how well object fallback works with screen readers when used to embed images [07:46:01.0000] webben_: yt? [08:16:00.0000] Lachy: yt? [08:16:01.0000] yo, is it you who knows all about screen readers and stuff? [08:16:02.0000] I certainly wouldn't say all. :) [08:17:00.0000] well, more than I do :-) [08:17:01.0000] Lachy: do you have a testcase? [08:17:02.0000] I can make one [08:18:00.0000] Lachy: I suspect it's basically dependent on how browsers handle object fallback. [08:19:00.0000] but I can test with VO and Window-Eyes easily enough [08:19:01.0000] /me unfortunately doesn't have a copy of JAWS [08:19:02.0000] make a page with this and let me know if they read it Fallback Content [08:22:00.0000] and can you also see how well they do with the flash and fallback content on this page http://www.3m.com/intl/au/office/TakeCommand/ [08:28:00.0000] Lachy: well, the object alternative in http://www.benjaminhawkeslewis.com/www/test-cases/object-fallback.html doesn't work with latest WebKit + VO [08:33:00.0000] you could try as well [08:33:01.0000] <webben_> Lachy: but the problem is things are embedded [08:34:00.0000] <Lachy> yeah, true [08:34:01.0000] <webben_> that's the difference between fallback and alternative/descriptive content [08:34:02.0000] <webben_> I tried embed alt ... 
that didn't work in VO either [08:34:03.0000] <webben_> it read the picture as "Frame 1" [08:35:00.0000] <Lachy> webkit probably doesn't support <embed alt>. Opera's build in screen reader might, or maybe Opera with Windows Eyes (if possible) [08:36:00.0000] <webben_> Lachy: Opera doesn't work with screen readers ATM. Supposedly that will be fixed in a forthcoming release. [08:36:01.0000] <webben_> I think the 3m site is way too complex to function as a testcase [08:38:00.0000] <webben_> yeah webkit doesn't support embed alt ... feeble. [08:38:01.0000] <Lachy> I just wanted to know if that flash was accessible [08:38:02.0000] <webben_> might try and submit a patch for that [08:38:03.0000] <webben_> Lachy: oh right [08:39:00.0000] <webben_> if we want an example of supposedly accessible flash, there's jkrowling's site [08:39:01.0000] <Lachy> I built that 3m site, but I didn't make the flash [08:39:02.0000] <webben_> ah [08:43:00.0000] <Lachy> Opera Voice will read the object fallback [08:44:00.0000] <Lachy> but it won't read the embed alt [08:46:00.0000] <webben_> which is silly, as flash accessibility won't work from Opera [08:47:00.0000] <webben_> hmm actually maybe it's not that silly [08:47:01.0000] <webben_> if you're embedding video you may still be able to watch an auto-playing video [08:48:00.0000] <Lachy> Opera wouldn't read the fallback for the Flash, only the image [08:55:00.0000] <Lachy> why did JK Rowling make the accessibility enabled version completely separate? The english version looks no different from the accessible english version [08:56:00.0000] <webben_> Lachy: I don't know. There are a lot of things I don't understand about how that site was put together. 
[08:56:01.0000] <webben_> src="http://ln.hixie.ch/resources/images/astrophy/original" is too confusing for IE [08:56:02.0000] <webben_> as is data="http://ln.hixie.ch/resources/images/astrophy/original" [08:57:00.0000] <webben_> it throws up a warning that it wants to download Microsoft HTML Viewer (WTF?) [08:57:01.0000] <webben_> presumably it thinks it's an HTML include [08:59:00.0000] <Philip`> Try <object src="http://ln.hixie.ch/resources/images/astrophy/original" type="image/png">Fallback Content</object> in the DOM viewer in IE - it's great fun, with the kind of 'fun' that only IE can manage [09:00:00.0000] <webben_> funie [09:01:00.0000] <Philip`> (It gives me <OBJECT type=text/html data=data:application/x-oleobject;base64,IGkzJfkDzxGP0ACqAGhvEzwhRE9DVFlQRSBIVE1MIFBVQkxJQyAiLS8vVzNDLy9EVEQgSFRNTCA0LjAgVHJhbnNpdGlvbmFsLy9FTiI+DQo8SFRNTD48SEVBRD4NCjxNRVRBIGh0dHAtZXF1aXY9Q29udGVudC1UeXBlIGNvbnRlbnQ9InRleHQvaHRtbDsgY2hhcnNldD13aW5kb3dzLTEyNTIiPjwvSEVBRD4NCjxCT0RZPg0KPFA+Jm5ic3A7PC9QPjwvQk9EWT48L0hUTUw+DQo= src="http://ln.hixie.ch/resources/images/astrophy/original"> in IE6, and something a bit short [09:01:01.0000] <webben_> hmm I just see #text with IE7 [09:02:00.0000] <Philip`> ...a bit shorter in IE7) [09:02:01.0000] <webben_> I think we need an image with a file extension to test this in IE [09:03:00.0000] <webben_> (IE seems highly reliant on file extensions) [09:04:00.0000] <Philip`> IE only does that x-oleobject for me when type="..." is a recognised MIME type (e.g. 
text/html, image/png, application/xml, not application/xhtml+xml, etc) [09:04:01.0000] <zcorpan> here's such an image: http://simon.html5.org/valid-html5.png [09:04:02.0000] <webben_> thanks [09:05:00.0000] <webben_> now it recognizes the embed as an image [09:05:01.0000] <webben_> still trips up on object though [09:06:00.0000] <Philip`> http://hsivonen.iki.fi/validator/html5/?doc=http%3A%2F%2Fsimon.html5.org%2Fvalid-html5.png - that filename is wrong - it's not valid HTML5 [09:07:00.0000] <zcorpan> -_- [09:09:00.0000] <zcorpan> funny how innerHTML in ie remembers whether an attribute had an = or not [09:10:00.0000] <zcorpan> http://software.hixie.ch/utilities/js/live-dom-viewer/?%3C%21DOCTYPE%20html%3E%0D%0A%3Cp%20foo%3E [09:10:01.0000] <zcorpan> vs http://software.hixie.ch/utilities/js/live-dom-viewer/?%3C%21DOCTYPE%20html%3E%0D%0A%3Cp%20foo%3D%3E [09:11:00.0000] <Philip`> http://software.hixie.ch/utilities/js/live-dom-viewer/?%3C%21DOCTYPE%20html%3E%0D%0A%3Cp%20class%3E%3Cp%20class%3D%3E [09:13:00.0000] <zcorpan> interesting [09:13:01.0000] <Philip`> http://software.hixie.ch/utilities/js/live-dom-viewer/?%3C%21DOCTYPE%20html%3E%0D%0A%3Cp%20class%3E%3Cp%20class%3D%3E%0D%0A%3Cp%20style%3E%3Cp%20style%3D%3E%0D%0A%3Cp%20id%3E%3Cp%20id%3D%3E%0D%0A%3Cp%20foo%3E%3Cp%20foo%3D%3E%0D%0A [09:13:02.0000] <Philip`> so that's at least four different behaviours [09:14:00.0000] <zcorpan> try disabled [09:14:01.0000] <Philip`> Oh, but I only get those four behaviours in IE6, not IE7... 
[09:14:02.0000] <Philip`> Oh, yes I do [09:14:03.0000] <Philip`> /me was stupid and looking at the wrong part of the page [09:15:00.0000] <Philip`> Okay, five different behaviours [09:17:00.0000] <Philip`> http://software.hixie.ch/utilities/js/live-dom-viewer/?%3C%21DOCTYPE%20html%3E%0D%0A%3Cp%20class%3E%3Cp%20class%3D%3E%3Cp%20class%3D%3D%3E%0D%0A%3Cp%20style%3E%3Cp%20style%3D%3E%3Cp%20style%3D%3D%3E%0D%0A%3Cp%20id%3E%3Cp%20id%3D%3E%3Cp%20id%3D%3D%3E%0D%0A%3Cp%20foo%3E%3Cp%20foo%3D%3E%3Cp%20foo%3D%3D%3E%0D%0A%3Cp%20disabled%3E%3Cp%20disabled%3D%3E%3Cp%20disabled%3D%3D%3E%0D%0A [09:18:00.0000] <zcorpan> <p => :) [09:24:00.0000] <zcorpan> <p=> [09:26:00.0000] <zcorpan> wow, that last one is really weird [09:27:00.0000] <Philip`> Isn't that just the same as <foo>? [09:27:01.0000] <zcorpan> oh wait, i was testing in opera [09:28:00.0000] <zcorpan> <p=> is really weird in opera [09:28:01.0000] <zcorpan> yes, in ie it's the same as <foo> [09:29:00.0000] <Philip`> http://software.hixie.ch/utilities/js/live-dom-viewer/?%3C%21DOCTYPE%20HTML%3E%0D%0A%3Cp%3E%3Cp%3D%3Ex%3C/p%3Ex%0D%0A [09:29:01.0000] <Philip`> Opera's innerHTML says "<!DOCTYPE HTML><html><BODY><P><p>x<px</html>" [09:30:00.0000] <Philip`> Oh, but that's exactly the same as for parsing <p><x>x</p>x [09:33:00.0000] <Philip`> http://software.hixie.ch/utilities/js/live-dom-viewer/?%3C%21DOCTYPE%20HTML%3E%0D%0A%3Cp%3E%3C%21%3Ex%3C/p%3Ex%0D%0A innerHTML is peculiar in Opera too [09:34:00.0000] <zcorpan> i guess the weird part is that the = is not in the dom [09:36:00.0000] <Philip`> Ah, okay [09:37:00.0000] <Philip`> Looks like all the attributes on <p= p=p> disappear too [09:37:01.0000] <Philip`> but not on <p== p=p> [09:38:00.0000] <zcorpan> not all attributes [09:39:00.0000] <Philip`> Oh, only the first one [09:39:01.0000] <zcorpan> the first is equivalent to <p="p=p"> [09:39:02.0000] <zcorpan> seems like the tokenizer thinks it's an attribute [09:40:00.0000] <zcorpan> </p=> closes it [09:45:00.0000] <Philip`> 
It's kind of worrying that I can't find one single canvas feature whose tests all pass in all of Firefox, Opera and Safari [09:47:00.0000] <Dashiva> getContext? [09:48:00.0000] <Philip`> "There is only one CanvasRenderingContext2D object per canvas, so calling the getContext() method with the 2d argument a second time must return the same object." - but getContext('2d') === getContext('2d') fails in Opera [09:49:00.0000] <Philip`> (at least 9.2 - I'm hoping some issues will be fixed a bit in 9.5...) [09:49:01.0000] <Philip`> (and if they aren't, I probably ought to start submitting bug reports) [09:52:00.0000] <annevk> You probably should [09:52:01.0000] <annevk> Although we do have fixed lots of <canvas> bugs (per my measure of "lots of") [09:55:00.0000] <Philip`> Are all those fixes going to be in the public 9.5 builds (rather than 10.0 or whatever)? [09:59:00.0000] <annevk> yeah [09:59:01.0000] <annevk> some might be in the Wii version already [09:59:02.0000] <annevk> or in Opera Mini 4 (although I'm not 100% sure how that will function with <canvas>) [10:02:00.0000] <annevk> /me wonders how much we should care about RFC3986 [10:02:01.0000] <annevk> URL5 [10:04:00.0000] <annevk> Lachy, debugging is an important use case for making style= conforming [10:04:01.0000] <Philip`> Ooh, neat, the Opera Mini 4 simulator works [10:04:02.0000] <Lachy> annevk: why? [10:04:03.0000] <Philip`> Too bad it returns BGRA instead of RGBA in getPixelData, so all the automated test results are broken... 
[10:05:00.0000] <annevk> Lachy, for instance your debugging tool might display an error if you use incorrect markup [10:05:01.0000] <annevk> Philip`, whoa, that sounds like a major bug [10:05:02.0000] <Lachy> maybe [10:05:03.0000] <Philip`> (Actually, I don't know if it's getPixelData or getPixel) [10:06:00.0000] <annevk> Philip`, hmm, we did add some support for getPixelData and putImageData [10:08:00.0000] <Philip`> /me tries testing it properly [10:11:00.0000] <Philip`> Oops, getPixelData isn't very good to test, since it's not spelt that way... [10:12:00.0000] <annevk> heh [10:13:00.0000] <annevk> funny that I just copied your sentence for getPixelData but got the other method name right... [10:13:01.0000] <Philip`> Hmm, getPixel looks nicely broken too [10:18:00.0000] <Philip`> http://canvex.lazyilluminati.com/misc/pixel.html [10:19:00.0000] <Philip`> Hmm, now Opera Mini gives me java.lang.IllegalStateException [10:19:01.0000] <annevk> they are in different order here [10:21:00.0000] <Philip`> http://canvex.lazyilluminati.com/misc/operamini.png [10:21:01.0000] <annevk> yeah, get the same here [10:21:02.0000] <annevk> what's the correct order anyway? 
[10:22:00.0000] <annevk> nm [10:22:01.0000] <Philip`> It should say 12, 34, 56, 127 (or 128) in both cases [10:22:02.0000] <Philip`> (Well, the getPixel case should say 12, 34, 56, 0.5) [10:22:03.0000] <Philip`> (and it shouldn't have a load of random junk at the end, I'm guessing) [10:23:00.0000] <Philip`> Firefox gets it wrong and says 6, 17, 28, 127 because it does premultiplied alpha [10:23:01.0000] <annevk> ouch [10:24:00.0000] <Philip`> Also I don't know whether getImageData(...).data is meant to be an actual JS array, or if it's allowed to be another object (like that CanvasPixelArray in Opera) [10:31:00.0000] <Philip`> It's intriguing how Opera Mini 4 fails on http://canvex.lazyilluminati.com/tests/tests/toDataURL.complexcolours.html too - all the transparent colours are correct, but the solid one is missing... [10:32:00.0000] <Philip`> /me goes to fix up his test system to cope with that getImageData bug [10:34:00.0000] <hsivonen> Hixie: does it work for you if I post my comments on the parsing algorithm piecemeal to public-html and say that they are my "detailed review"? [10:44:00.0000] <Philip`> I also like how Opera Mini shows JPEG artifacts on the <canvas> output :-) [10:49:00.0000] <annevk> Philip`, hmm, the solid color missing could indicate that it expects a float for the alpha value [10:49:01.0000] <annevk> /me ponders [10:53:00.0000] <Philip`> Ah, it looks like that is the case [10:53:01.0000] <Philip`> (which is odd since it works fine in Opera 9.2) [10:54:00.0000] <annevk> maybe it was a fix in the CSS parser [10:54:01.0000] <annevk> /me ponders [10:55:00.0000] <Philip`> CSS3 Color says the alpha value is a <number>, and 1 is a <number>, as far as I can see [10:55:01.0000] <annevk> k [10:57:00.0000] <Philip`> /me wishes the Opera Mini simulator had a bigger screen :-p [11:00:00.0000] <annevk> I filed bugs for the getImageData and rgba(... 
1) [11:01:00.0000] <Philip`> /me tries to find other regressions [11:01:01.0000] <annevk> much appreciated :) [11:01:02.0000] <annevk> /me notes that getImageData is not a regression but a pretty serious feature bug... [11:02:00.0000] <hsivonen> what's the purpose of the document.write/noscript arrangement at http://demo.opera-mini.net/fourzerobeta/ ? [11:03:00.0000] <hsivonen> /me notes that the foremost Java applet use case (demoing Opera Mini) uses <applet> [11:03:01.0000] <annevk> I've not much against <applet> [11:05:00.0000] <hsivonen> Hixie: one thing that might make sense for research is counting <applet> vs. <object> with Java traits [11:06:00.0000] <hsivonen> my non-researched hunch is that if one embeds an applet, <applet> makes more sense than fighting the <object> windmills [11:07:00.0000] <annevk> keeping it in makes some sense [11:07:01.0000] <annevk> but I'm not really that interested in writing Java applets so I haven't done much about it [11:07:02.0000] <hsivonen> It is totally unclear to me what problem *not keeping it* solves [11:08:00.0000] <annevk> language simplicity is one argument [11:08:01.0000] <hsivonen> it is a rather weak argument if it needs to be specced anyway and <object> is hairier than <applet> [11:09:00.0000] <annevk> well, in theory it's <object data=applet.java> [11:09:01.0000] <annevk> and that actually works in most browsers just not in IE [11:11:00.0000] <Philip`> Hmph, now I just get java.io.IOException when trying to run all the tests... 
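On the premultiplied-alpha readback mentioned earlier (12, 34, 56 at alpha 127 coming back as 6, 17, 28, 127): the arithmetic can be sketched as below. This assumes 8-bit channels and round-to-nearest, which is consistent with the quoted numbers but is only a guess at what any browser does internally; the function names are made up for the example.

```python
def premultiply(rgba):
    """Scale each colour channel by alpha/255 (8-bit, round to nearest)."""
    r, g, b, a = rgba
    return tuple(round(c * a / 255) for c in (r, g, b)) + (a,)

def unpremultiply(rgba):
    """Undo premultiplication; fully transparent pixels carry no colour."""
    r, g, b, a = rgba
    if a == 0:
        return (0, 0, 0, 0)
    return tuple(min(255, round(c * 255 / a)) for c in (r, g, b)) + (a,)

if __name__ == "__main__":
    stored = premultiply((12, 34, 56, 127))
    print(stored)                  # what a premultiplied backing store holds
    print(unpremultiply(stored))   # what getImageData is expected to report
```

The round trip happens to be exact for this pixel, but in general it is lossy at low alpha, since several distinct colours collapse to the same premultiplied value.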
[11:12:00.0000] <annevk> https://bugs.opera.com/ [11:12:01.0000] <annevk> /me has to go [11:12:02.0000] <Philip`> Ooh, it worked that time [11:14:00.0000] <Philip`> though it doesn't like me pressing the buttons to record manual results :-( [11:24:00.0000] <Hixie> hsivonen: sure [11:31:00.0000] <hsivonen> Hixie: ok [12:00:00.0000] <Philip`> Hmm, I can get Opera Mini to either run the test cases, or to submit a form, but can't get it to ever do both at the same time, and there's no way I'm going to manually copy out the test results... [12:08:00.0000] <Philip`> Aha, looks like Opera Mini doesn't like setInterval at all [12:15:00.0000] <webben_> Lachy: wrt to 3M, AFAICT WIndow-Eyes doesn't even see the Flash object because of the SWFObject insertion; investigation of the same object shows that it isn't exposing any information to MSAA; but captions for video may not use a MSAA interface [12:16:00.0000] <webben_> can't seem to get Window-Eyes to notice the fallback content for object either [12:16:01.0000] <webben_> (though maybe it would if i disabled flash and js etc.) [12:17:00.0000] <webben_> this may of course all be my Window-Eyes incompetence though :) [12:25:00.0000] <Philip`> Opera Mini has a rather odd image that looks kind of like splattered blood on its "Internal server error: Failed to transcode URL" page [12:28:00.0000] <hsivonen> /me tries to shift the kind of signal public-html gets to the whatwg-style direction :-) [12:35:00.0000] <webben_> Lachy: "Providing alternative content in the same page or linking to it." ... That's not as a good a solution as progressive enhancement. 
[12:35:01.0000] <webben_> traditionally alternative sites have a) lagged behind the main page in terms of content and b) not been as "accessible" as they think they are [12:35:02.0000] <webben_> (wrt http://lists.w3.org/Archives/Public/public-html/2007Jul/0027.html) [12:46:00.0000] <Philip`> /me finally gets Opera Mini to sort of almost cooperate [12:47:00.0000] <Philip`> http://canvex.lazyilluminati.com/tests/20070701/tests/results.html - Desktop 9.2 vs Mini 9.5 (with a few missing results because Mini doesn't cooperate enough) [12:49:00.0000] <Philip`> Regressions that I can see: The CSS-colour-outputting code (getPixel, reading fillStyle, etc) writes a load of garbage characters (yay, buffer overflow) instead of the alpha value [12:49:01.0000] <Philip`> Interpolation of gradients with alpha doesn't work - it doesn't draw the gradient at all [12:50:00.0000] <Philip`> ...and I can't find anything else (except the rgba(..., 1) thing, and one case with radial gradients but radial gradients are horribly broken anyway so that doesn't matter) [12:52:00.0000] <Philip`> /me thinks he needs to split up his tests so there's only ~100 on a page at once, because otherwise Opera Mini and WebKit (and Opera to a limited extent) get pretty unhappy [12:55:00.0000] <Philip`> Oh, sorry, looks like that gradient-alpha interpolation thing is just the broken CSS parsing again [13:07:00.0000] <hsivonen> hmm. <pre>\n vs. <pre><!-- -->\n [13:09:00.0000] <hsivonen> grr. they are different 2007-07-02 [17:30:00.0000] <Hixie> hsivonen: you sent four mails right? [17:30:01.0000] <Hixie> about the parser? [17:38:00.0000] <webben> hsivonen: re http://lists.w3.org/Archives/Public/public-html/2007Jul/0064.html and http://lists.w3.org/Archives/Public/public-html/2007Jul/0067.html, a problem with FIrefox 1.04 (of all things) is not really a problem with AT, is it? 
I'm also not sure why we're evaluating old methods of embedding Flash rather than the newest techniques: http://alistapart.com/articles/flashembedcagematch/ . (These methods are continuing to evolve.) [17:39:00.0000] <webben> (Firefox 1.04 didn't have effective screen reader support.) [17:48:00.0000] <Philip`> I was beginning to think that browsers weren't that bad at drawing lines, but then I reached arcTo and my hopes were shattered :-( [00:54:00.0000] <Hixie> sweet [00:54:01.0000] <Hixie> it works [00:55:00.0000] <Hixie> the "table model" algorithms for forming a table totally work and generate actual tables that are what the markup was! [00:55:01.0000] <Hixie> i guess i shouldn't be so surprised but that's pretty damn cool. [00:56:00.0000] <Hixie> tomorrow i can do the scope="" and headers="" stuff [00:56:01.0000] <Hixie> now that i have actual tables [01:08:00.0000] <annevk> what language did you write your model in? [01:13:00.0000] <hsivonen> Hixie: yes, I sent 4 emails. [01:49:00.0000] <annevk> "were many things which WHATwg want to delete, are mentioned (ACRONYM, non-visual rendering of TABLES amongst them.)" [01:50:00.0000] <annevk> FO 4 [01:50:01.0000] <annevk> these people are crazy [03:06:00.0000] <zcorpan_> what's the reason that i like testing parsing more than testing other things? :) [03:07:00.0000] <annevk> parsing is trivial? :p [03:07:01.0000] <zcorpan_> perhaps :) [03:07:02.0000] <zcorpan_> but not really [03:07:03.0000] <annevk> btw, any chance you'll make some script that converts html5lib testdata to browser usable tests? [03:08:00.0000] <annevk> because making html5lib tests is trivial [03:08:01.0000] <zcorpan_> hmm, i could look into that [03:08:02.0000] <annevk> someone from mozilla already did some work [03:08:03.0000] <zcorpan_> pointer? 
[03:08:04.0000] <annevk> I'm afraid I don't have a pointer [03:09:00.0000] <annevk> the only thing you need to do is get the result DOM [03:09:01.0000] <annevk> iterate over it [03:09:02.0000] <zcorpan_> load the tests in an iframe [03:09:03.0000] <annevk> yeah [03:09:04.0000] <annevk> and from the result DOM you make a string and compare that with the #document data [03:09:05.0000] <zcorpan_> ah yep [03:10:00.0000] <zcorpan_> seems pretty straightforward [03:11:00.0000] <zcorpan_> do the existing tests assume that scripting is enabled? or disabled? [03:11:01.0000] <annevk> I think enabled, but scripting support is not there [03:12:00.0000] <zcorpan_> might perhaps be good to split up the tests that assume that scripting is enabled or disabled (or neither) [03:13:00.0000] <annevk> I suppose we could make extensions [03:13:01.0000] <annevk> #data --scripting-disabled [03:13:02.0000] <zcorpan_> perhaps [03:15:00.0000] <annevk> don't know what's best [03:16:00.0000] <zcorpan_> not sure if it's possible to enable/disable scripting on a per test basis [03:17:00.0000] <zcorpan_> my port would probably only work with tests that don't assume that scripting is disabled [03:18:00.0000] <annevk> sure [03:18:01.0000] <annevk> you could have some server-side script that organizes the tests for you or something [03:18:02.0000] <annevk> in that case you can simply make a scripting-enabled and scripting-disabled folder [03:18:03.0000] <annevk> dunno [03:21:00.0000] <annevk> to start it would be nice to have some JS that does DOM -> #document\n... 
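The "DOM -> #document" idea above can be sketched roughly as follows. This is written in Python against `xml.dom.minidom` as a stand-in for a browser DOM, not the JS being asked for, and the output layout ("| " prefix, two spaces per depth, attributes sorted on their own lines) is an assumption about the html5lib tree format rather than a checked reference. It also assumes the DOM is a proper tree.

```python
from xml.dom import minidom, Node

def dump(node, depth=0):
    """Return one output line per descendant of `node`, html5lib-style."""
    indent = "| " + "  " * depth
    lines = []
    for child in node.childNodes:
        if child.nodeType == Node.ELEMENT_NODE:
            lines.append(f"{indent}<{child.tagName}>")
            attrs = child.attributes
            # Attributes go on their own lines, one level deeper, sorted by name.
            for name in sorted(attrs.keys()):
                lines.append(f'{indent}  {name}="{attrs[name].value}"')
            lines.extend(dump(child, depth + 1))
        elif child.nodeType == Node.TEXT_NODE:
            lines.append(f'{indent}"{child.data}"')
        elif child.nodeType == Node.COMMENT_NODE:
            lines.append(f"{indent}<!-- {child.data} -->")
        elif child.nodeType == Node.DOCUMENT_TYPE_NODE:
            lines.append(f"{indent}<!DOCTYPE {child.name}>")
    return lines

def serialize(document):
    """Serialize a whole document, starting with the '#document' marker."""
    return "\n".join(["#document"] + dump(document))

if __name__ == "__main__":
    doc = minidom.parseString(
        '<html><head/><body><p id="a" class="x">hi<!--c--></p></body></html>'
    )
    print(serialize(doc))
```

Comparing a browser against html5lib then reduces to a string comparison between this output and the `#document` section of each test.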
[03:21:01.0000] <zcorpan_> yeah [03:21:02.0000] <annevk> then we can compare browsers to html5lib more easily [03:22:00.0000] <zcorpan_> /me looks at the live dom viewer code [03:23:00.0000] <zcorpan_> hmm, what if the dom isn't a tree [03:24:00.0000] <annevk> Hixie has solved that issue [03:24:01.0000] <annevk> I suppose you just keep a pointer around and check if you haven't encountered the node before [03:24:02.0000] <zcorpan_> yeah, but the html5lib test output format doesn't support non-trees [03:25:00.0000] <annevk> oh [03:25:01.0000] <annevk> elementname - graph [03:25:02.0000] <annevk> just output something that makes it non-conforming [03:38:00.0000] <annevk> (if we have that script it can be integrated with the thing david has) [03:39:00.0000] <jgraham> http://lxr.mozilla.org/seamonkey/source/parser/htmlparser/tests/mochitest/ [03:39:01.0000] <annevk> http://hasather.net/html5/parsetree/ [03:39:02.0000] <annevk> jgraham, cool [03:48:00.0000] <hsivonen> I wonder if sayrer is writing a new HTML parser or changing the old one [03:49:00.0000] <hsivonen> (I wouldn't want to change the old one) [03:49:01.0000] <annevk> dunno, seems like mrbkap is still fixing the old one [03:49:02.0000] <annevk> (from the couple of bug numbers referenced there) [03:50:00.0000] <annevk> they also checked out all the old tests [03:55:00.0000] <annevk> jgraham, btw, how easy is it to extend all treebuilders to support two more variables for DOCTYPE? [04:36:00.0000] <zcorpan_> http://simon.html5.org/temp/dom-tostring.html [04:44:00.0000] <zcorpan_> please tell me if i can improve the function, i think it is doing some unnecessary things (looping too many times or something) [04:44:01.0000] <zcorpan_> or if i got the format wrong [04:50:00.0000] <hsivonen> zcorpan_: well, if you wanted to optimize performance (which isn't really necessary), you could traverse the DOM using an iterative algorithm instead of a recursive one [04:56:00.0000] <zcorpan_> hsivonen: hmm, ok. 
i think i'll leave it as is for now [05:58:00.0000] <zcorpan_> the selectors api naming issue will never end [06:02:00.0000] <annevk> I wonder when HTML 5 will get issues like that [06:02:01.0000] <annevk> hopefully when everything is already shipped in implementations [06:08:00.0000] <zcorpan_> selectorQuery/selectorQueryAll seem to be good names [06:10:00.0000] <annevk> wfm [06:25:00.0000] <Lachy> yeah, I like those names too. It looks like a reasonable compromise between selectElement and cssQuery, but I have a feeling there's going to be someone that doesn't like it [06:27:00.0000] <zcorpan_> of course someone won't like it [08:12:00.0000] <zcorpan> so why doesn't this work? http://simon.html5.org/temp/html5lib-tests/wrapper.html [08:14:00.0000] <annevk> you're not invoking send()? [08:14:01.0000] <annevk> with my XHR2 that would not be necessary but it seems that it will not going to fly :) [08:17:00.0000] <Lachy> how is XHR2 going to know when to implicitly invoke send()? [08:17:01.0000] <annevk> much like Audio(src) [08:17:02.0000] <annevk> anyway, not going to happen [08:18:00.0000] <annevk> new XMLHttpRequest(method, src) is likely the only shortcut that makes some sense [08:18:01.0000] <Philip`> Sounds like about the same as setTimeout(fn, 0) too (i.e. as soon as possible after the current script has finished) [08:18:02.0000] <annevk> yeah [08:23:00.0000] <zcorpan__> hmm, is tests1.txt parsed as xml or something? [08:25:00.0000] <annevk> shouldn't matter for responseText [08:26:00.0000] <zcorpan__> Error: syntax error [08:26:01.0000] <zcorpan__> Source File: file://loca.....l5lib-tests/tests1.txt [08:26:02.0000] <zcorpan__> Line: 1, Column: 1 [08:26:03.0000] <zcorpan__> Source Code: [08:26:04.0000] <zcorpan__> #data [08:27:00.0000] <annevk> oh, in Firefox? 
[08:27:01.0000] <zcorpan__> yeah [08:28:00.0000] <annevk> they have some issues there I believe [10:17:00.0000] <virtuelv> hm, wonder which version of Safari the iPhone is based on [10:18:00.0000] <virtuelv> Row 1 and 4+5 of Acid2 fails [10:19:00.0000] <Philip`> virtuelv: It calls itself "Mozilla/5.0 (iPhone; U; CPU like Mac OS X; en) AppleWebKit/420+ (KHTML, like Gecko) Version/3.0 Mobile/1A543a Safari/419.3" which sounds like a possibly old WebKit [10:21:00.0000] <virtuelv> then this too: http://www.johnmurch.com/2007/07/01/iphone-javascript-and-spec-benchmark/ makes sense [10:22:00.0000] <Philip`> Looks like someone tried running Canvex on an iPhone - I wonder what kind of performance they get... [12:35:00.0000] <jgraham> annevk: The only issue I can think of with adding more doctype stuff to the treebuilders is several tree types don't really support the concept of doctypes [12:36:00.0000] <jgraham> There have been hacked in various ways to support the concept e.g. by having elements with tagnames like "<!doctype>" [12:37:00.0000] <jgraham> But where we hacked we can hack a little more [12:37:01.0000] <jgraham> :) [12:45:00.0000] <hsivonen> jgraham: my plan is not to expose the doctype by default through APIs designed for XML [12:46:00.0000] <jgraham> hsivonen: We need it for running tests. By default it is stripped when a special flag is not set [12:46:01.0000] <hsivonen> annevk: the doctype node stuff is for browsers. it is rather pointless for parsing libraries meant for non-browser use cases [12:47:00.0000] <hsivonen> jgraham: OK. 
I guess I have to support doctypes in SAX (buffered and streaming) [12:47:01.0000] <hsivonen> but supporting them in the DOM just sucks without a backdoor API [12:48:00.0000] <jgraham> (that is, it is stripped where it not supported by the underlying tree) [12:49:00.0000] <hsivonen> if the use case is making a drop-in library for apps that now use an XML parser, exposing the doctype might be more harm than good [12:50:00.0000] <hsivonen> but yeah, supporting doctype exposure in SAX and XOM is reasonable [16:53:00.0000] <Hixie> /me narowly misses flinging his ipod acress his cube [16:53:01.0000] <Hixie> oops [16:57:00.0000] <Hixie> right, done scope="" [16:57:01.0000] <Hixie> now headers="" 2007-07-03 [17:24:00.0000] <rubys> jgraham: ping? [17:27:00.0000] <Hixie> if you have what you think is a tree, in the form of a list A of mappings from one node to a list of nodes all of which are in list A [17:27:01.0000] <Hixie> is there a way short of walking the entire tree to verify that the list is indeed a tree and that there are thus no loops? [17:29:00.0000] <othermaciej_> there probably is, based on what graph properties make the graph a tree [17:29:01.0000] <othermaciej_> to be a tree you need to be not just cycle-free but also have exactly one directed edge pointing to each node (except the root) [17:29:02.0000] <Hixie> i guess i don't mean a tree, i mean a directed graph [17:30:00.0000] <othermaciej_> directed acyclic graph? 
[17:30:01.0000] <Hixie> right [17:30:02.0000] <Hixie> basically a have a list of table cells, each of which can be the header (through headers="") for zero or more other cells, and each of which can have zero or more header cells for itself [17:30:03.0000] <Hixie> but there mustn't be any loops [17:31:00.0000] <othermaciej_> let me look it up in my CLR [17:31:01.0000] <Hixie> i mean i'll do the full walk if there's no quicker way [17:31:02.0000] <Hixie> (memory is no object) [17:31:03.0000] <kingryan> /me thinks that's the only way [17:32:00.0000] <kingryan> you might be able to cache some of it, though [17:32:01.0000] <othermaciej_> I don't even know what you mean by "full walk" [17:32:02.0000] <othermaciej_> you'd have to walk every possible path, not just visit every node once [17:32:03.0000] <othermaciej_> if you are really brute forcing it [17:32:04.0000] <Hixie> yeah [17:32:05.0000] <othermaciej_> you'd have to show all paths through the graph terminate [17:34:00.0000] <othermaciej_> Hixie: iteratively removing nodes with no outgoing edges is one way [17:35:00.0000] <Hixie> ok screw this. i don't HAVE to check that headers="" don't form loops [17:35:01.0000] <othermaciej_> Hixie: you'd want a hashtable from node to nodes it points to, and one the other way [17:35:02.0000] <Hixie> at least not in the first pass [17:36:00.0000] <kingryan> Hixie: you only need to check them if you're going to be walking them (check to avoid inf. 
loops) [17:36:01.0000] <Hixie> yeah [17:36:02.0000] <Philip`> I think you could do a topological sort [17:36:03.0000] <Hixie> which i don't [17:36:04.0000] <Philip`> which'll tell you if it's got any cycles [17:36:05.0000] <Hixie> but i was hoping to be able to see how many pages had that problem [17:36:06.0000] <othermaciej_> Philip`: I'm not sure the obvious topological sort algorithms will terminate in finite time [17:37:00.0000] <othermaciej_> on a graph with cycles [17:37:01.0000] <othermaciej_> since topological sorts are desgined to work on a DAG [17:37:02.0000] <Philip`> You can just do a depth-first search - start with each node being white, mark each one as grey when you recurse into it, mark it as grey when you recurse back out, and if you ever follow an edge into a grey node then there's a cycle [17:38:00.0000] <Philip`> Uh [17:38:01.0000] <Philip`> *mark it as black when you recurse back out [17:38:02.0000] <othermaciej_> that works [17:38:03.0000] <othermaciej_> hmm wait [17:38:04.0000] <Philip`> (You can do some thingy with numbering nodes as you turn them black, to get a topological sort, I think) [17:38:05.0000] <othermaciej_> I'm not sure it works [17:39:00.0000] <othermaciej_> not obvious to me that a cycle couldn't be observable only by visiting a black node [17:42:00.0000] <othermaciej_> DFS can detect cycles by identifying back-edges [17:43:00.0000] <othermaciej_> your algorithm is right [17:44:00.0000] <othermaciej_> I guess that would run in O(E) where E is the number of edges [17:44:01.0000] <othermaciej_> which seems like the best you could do [17:47:00.0000] <Hixie> and it'll work whatever order i do the nodes in, as far as i can tell [17:47:01.0000] <Hixie> which is useful [17:47:02.0000] <Hixie> in my case [17:50:00.0000] <Philip`> I think I can convince myself it's right by saying that if there is a cycle, then when the DFS reaches some node N in that cycle, it will not mark the node as black until either it has reached another grey 
node (and found a cycle) or has searched the whole cycle and got back to N (which is grey, so it finds the cycle) or has reached a black node in the cycle; and there can never be a black node in the cycle, because the cycle will be detected before an [17:50:01.0000] <Philip`> ...before any node in the cycle is marked as black [17:52:00.0000] <Philip`> I guess you have to do something to make sure the DFS covers all the nodes (by repeatedly DFSing from some arbitrary remaining white node, until there are none) [17:53:00.0000] <Hixie> yeah i'm just going to go through every node with at least one outgoing edge (since i have to visit them anyway for unrelated reasons) and if it's white, i do the search [17:53:01.0000] <Philip`> It should be O(V) rather than O(E) because it'll never visit one node more than once [17:54:00.0000] <Philip`> except I'm probably confused and it's O(E) too, so it's more like O(min(V, E)), not that anybody actually cares, since V =~ E anyway for non-crazy graphs [17:55:00.0000] <Hixie> this is where i find out there's only 5 tables on the whole web with a headers="" attribute and therefore it could be O(N^4) and still complete in finite time [17:59:00.0000] <othermaciej_> Philip`: it has to traverse every edge at least once to see the color of the node at the other end [18:00:00.0000] <othermaciej> Philip`: but I guess it's O(V+E) since you need to visit disconnected nodes too [18:00:01.0000] <Philip`> Got to be careful in case you stumble across some gigantic table with hundreds of rows and columns that's been made accessible with (buggy) headers, since that might cause an O(N^4) algorithm to take a second or two [18:01:00.0000] <othermaciej> actually I guess you don't since Hixie's data structure only represents edges [18:01:01.0000] <othermaciej> hundreds could be worse than a second or two with an O(N^4) algorithm [18:01:02.0000] <othermaciej> N^4 gets bad pretty quickly [18:01:03.0000] <Philip`> Oh, whoops, I forgot it'd still have to 
look along all the edges to already-black nodes [18:02:00.0000] <Hixie> yeah N^4 is insanely bad if you've got anything of any kind of size [18:02:01.0000] <Philip`> 100^4 = 10^8 which isn't all that bad if you're just following a few pointers :-) [18:03:00.0000] <Hixie> sadly i have to do a string lookup on every single one of these edges :-) [18:03:01.0000] <Hixie> (of course if it's bad, i'll optimise it more. we'll see) [18:04:00.0000] <Philip`> You could do an O(E) preprocessing step to do all the string lookups per edge, before doing the horribly inefficient but highly optimised O(N^4) cycle-finding algorithm on it :-) [18:04:01.0000] <Hixie> indeed [18:08:00.0000] <othermaciej> DFS isn't that hard to code, doesn't seem like a big deal [18:08:01.0000] <Hixie> indeed [18:08:02.0000] <Hixie> and you'll be glad to know it works [18:08:03.0000] <Hixie> sweet [18:08:04.0000] <othermaciej> nice [18:09:00.0000] <Hixie> it tested my three test tables in 0.244s including compiling the program and parsing the html [18:10:00.0000] <Hixie> and given that it took 0.245s to do the same program with only one empty test file... [18:10:01.0000] <othermaciej> it runs in negative time! [18:11:00.0000] <Hixie> and y'all were worried about it being slow! [18:11:01.0000] <kingryan> O(-N^4) ? [18:15:00.0000] <Philip`> Give it a really big table to test, and see if it returns the answer before you've even started the program [18:37:00.0000] <Philip`> Hmm, just remembered a slower but simpler way to find cycles: use a kind of negated variant of Bellman-Ford, by initialising every node's 'distance' value to 0, then setting v.distance=max(v.distance, 1+u.distance) for each edge (u,v), then repeating num_nodes+1 times, and if any has distance=num_nodes+1 then there's a cycle [18:40:00.0000] <Philip`> ...or is that totally rubbish and wrong? 
I'm not quite sure now [19:02:00.0000] <Hixie> hsivonen: please confirm that since the last time i checked about your parsing e-mails, you have sent only one further message (about <select>) [19:21:00.0000] <Hixie> holy crap, according to this nearly half of all tables with headers="" have a cycle [19:21:01.0000] <Hixie> that seems unlikely [19:22:00.0000] <Hixie> in fact of 60,000 tables with headers="" that i just parsed, only 194 came out without some sort of error [19:22:01.0000] <Hixie> and of those, 177 didn't need headers="" at all because scope="" got the same effect [19:23:00.0000] <Hixie> leaving 17 tables out of 60,000 with headers="" (in just over 100,000,000 documents total) that used headers="" in a non-trivial yet correct way [19:23:01.0000] <Hixie> /me looks at those 17 tables [19:24:00.0000] <Hixie> one of them was the table on http://cgi.ebay.ie/Nokia-6210-unlocked-battery-charger-WARRANTY_W0QQitemZ200124682259QQihZ010QQcategoryZ3312QQcmdZViewItem [19:24:01.0000] <Hixie> and it only uses headers with the empty string as its value [19:24:02.0000] <Hixie> maybe i should exclude those, huh [19:24:03.0000] <Hixie> in fact 9 of these were variants on that ebay page [19:25:00.0000] <othermaciej> would that require assuming no header is a header for that call? [19:25:01.0000] <Hixie> my headers="" algorithm used nothing but headers="" to assign headers to cells [19:25:02.0000] <Hixie> so <th> elements have no effect when headers="" is specified [19:26:00.0000] <othermaciej> what I'm wondering is, whether that is the specified behavior for headers="" [19:26:01.0000] <Hixie> in html4? [19:27:00.0000] <othermaciej> yeah [19:28:00.0000] <Hixie> ok i clearly need to look for tables with only blank headers="", since all but one of these uses of headers="" that different from scope="" are blank headers="" only. 
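The white/grey/black depth-first search that Philip` describes and othermaciej confirms earlier in this log can be sketched roughly as below. This is not Hixie's actual program. The `edges` map (one key per cell that has a headers="" attribute, mapped to the list of cell ids it names) is an assumed representation, and `hasCycle` is a made-up name.

```javascript
// Colours for the depth-first search: unvisited, in progress, finished.
const WHITE = 0, GREY = 1, BLACK = 2;

// edges: Map from node id to an array of node ids it points to.
// Returns true if following edges can ever lead back to a node that
// is still being explored (a back edge), i.e. the graph has a cycle.
function hasCycle(edges) {
  const colour = new Map();

  function visit(node) {
    colour.set(node, GREY);               // entering the node
    for (const next of edges.get(node) || []) {
      const c = colour.get(next) || WHITE;
      if (c === GREY) return true;        // edge into a grey node: cycle
      if (c === WHITE && visit(next)) return true;
      // BLACK nodes were fully explored already and need no second look.
    }
    colour.set(node, BLACK);              // leaving the node
    return false;
  }

  // Only nodes with outgoing edges can be part of a cycle, so starting
  // a search from every still-white key covers the whole graph.
  for (const node of edges.keys()) {
    if ((colour.get(node) || WHITE) === WHITE && visit(node)) return true;
  }
  return false;
}
```

Each edge is examined once, so the run time grows with the number of nodes plus edges. The recursion depth equals the longest chain of headers, which is fine for tables but would want an explicit stack for very deep graphs.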
[19:28:01.0000] <othermaciej> I guess HTML4 is not very clear on it [19:28:02.0000] <Hixie> (http://www.bls.gov/oco/cg/cgs041.htm being that page) [19:30:00.0000] <Hixie> and that page only uses headers="" to associate <th>s with parent <th>S [19:30:01.0000] <Hixie> it doesn't actually do anything to make the table accessible as far as i can tell [19:32:00.0000] <othermaciej> that's a pretty poor record [19:32:01.0000] <Hixie> i'm skeptical of the large number of loops [19:32:02.0000] <Hixie> that seems unlikely [19:32:03.0000] <othermaciej> .3% of usage being error-free seems pretty damn low, even by the already low standards of most HTML features [19:33:00.0000] <othermaciej> that does sound suspicious (the number of loops) [19:33:01.0000] <Hixie> i also scanned longdesc="" in the same survey. i had my script throw out obviously invalid uses of longdesc="", like pointing to a file that the parent <a href=""> points to. [19:34:00.0000] <Hixie> doing a spot check of the pages that came up as "good" uses, one was pointing to the same file, and another was pointing to a file that was the destination of a 301 redirect of a parent <a href=""> [19:54:00.0000] <Hixie> wow, longdesc is a disaster zone far worse than i had imagined [19:57:00.0000] <Hixie> many of these are just pointing to the root of the site! [19:57:01.0000] <Hixie> /me adds another heuristic to look for that [19:57:02.0000] <Hixie> lol, the longdesc="" on http://www.felicieditore.it/ points to http://www.felicieditore.com/, which doesn't exist [20:00:00.0000] <Hixie> http://7mobile.de/shop/select?id=101787&v=010000 is a longdesc disaster in so many ways [20:06:00.0000] <Lachy> Hixie: is it looking so bad for headers and longdesc that you're going to consider leaving them out? 
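The two longdesc="" heuristics mentioned so far (same target as an enclosing <a href="">, and pointing at the root of a site) might look something like this. It is only a guess at the shape of such a check, not the survey script. The function name, argument order, and returned labels are invented here, and the "unparseable" case is an extra assumption that does not come from the discussion.

```javascript
// Returns a short label if the longdesc value looks bogus, else null.
//   longdesc     - raw attribute value
//   pageUrl      - absolute URL of the page containing the image
//   ancestorHref - href of an enclosing <a>, or null if there is none
function longdescProblem(longdesc, pageUrl, ancestorHref) {
  let target;
  try {
    target = new URL(longdesc, pageUrl);   // resolve relative to the page
  } catch (e) {
    return "unparseable";                  // assumption: treat as bogus
  }
  target.hash = "";                        // ignore fragment identifiers

  if (ancestorHref != null) {
    try {
      const link = new URL(ancestorHref, pageUrl);
      link.hash = "";
      if (link.href === target.href) return "same-as-link";
    } catch (e) {
      // An unparseable href tells us nothing about the longdesc.
    }
  }

  // A description that lives at the root of a domain is very unlikely.
  if (target.pathname === "/" && target.search === "") return "site-root";

  return null;
}
```

A check like this only compares URLs. Catching the case where the longdesc target is the destination of a redirect from the link would need actual fetches.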
[20:08:00.0000] <Hixie> i'm going to _consider_ leaving them out just like i'm going to consider leaving them in [20:09:00.0000] <othermaciej> right now it's looking kind of bad for headers even on just a "degrade gracefully in current versions of the #2 screen reader" basis [20:09:01.0000] <Lachy> ok. Maybe you could put them in, and include some algorithm to determine when it should be ignored due to it containing an illogical value [20:09:02.0000] <othermaciej> which I think was the best argument in its favor [20:10:00.0000] <othermaciej> if Hixie's data about how many uses are invalid holds up, anyway [20:10:01.0000] <Hixie> yeah i'm getting a sample of those with cycles to check that [20:15:00.0000] <Hixie> i think it's fair to say that no valid longdesc will ever point to the root of a domain, right? [20:17:00.0000] <Hixie> oh crap, missed dinner. bbl. [21:03:00.0000] <Hixie> ok there's definitely something wrong with the cycle detection [21:14:00.0000] <othermaciej> I think I found a mistake in CSS 2.1 (at least in the November 2006 WD) [21:15:00.0000] <othermaciej> is there any way to see a newer editor's draft so I can check if it is fixed before I report it? [21:15:01.0000] <Hixie> http://www.w3.org/Style/Group/css2-src/cover.html [21:15:02.0000] <Hixie> /me fixes the bug [21:16:00.0000] <Hixie> i was indexing using the wrong variable. duh. [21:16:01.0000] <othermaciej> can you check for me if this is really a mistake before I make an ass of myself [21:16:02.0000] <othermaciej> http://www.w3.org/Style/Group/css2-src/visufx.html says, about overflow, "It affects the clipping of all of the element's content except any descendant elements (and their respective content and descendants) whose containing block is the viewport or an ancestor of the element." [21:16:03.0000] <othermaciej> but obviously that is not supposed to apply to overflow on the viewport itself [21:16:04.0000] <Hixie> what's the error? [21:17:00.0000] <othermaciej> right? 
[21:17:01.0000] <Hixie> right, the viewport is not an element [21:18:00.0000] <othermaciej> ok, maybe just a lack of clarity, not an error [21:18:01.0000] <othermaciej> since if you interpret it that way, it doesn't say anything about how to clip for overflow on the viewport [21:18:02.0000] <Hixie> that sentence doesn't really say anything about anything [21:20:00.0000] <othermaciej> later examples seem to assume it is saying something [21:20:01.0000] <Hixie> yeah, css2.1 is only marginally better than html4 in terms of spec quality [21:27:00.0000] <othermaciej> ok maybe I won't bother with this, even though it was confusing to me, the actual behavior seems to be interoperable [22:02:00.0000] <Hixie> Lachy: yt? [23:26:00.0000] <Hixie> every page i've checked so far that has non-redundant headers="" actually uses them incorrectly. [23:27:00.0000] <Hixie> although maybe we need a heuristic for the top-left cell [23:45:00.0000] <Hixie> ok i finally found a page with a real longdesc="" [23:45:01.0000] <Hixie> http://www.britanniarescue.com/about/strategy/ [23:45:02.0000] <Hixie> http://www.britanniarescue.com/online/longdesc/index.php#BRlogo [23:46:00.0000] <Hixie> the longdesc is inaccurate, and it would be more useful for the information in that file to be in alt="" text anyway [23:59:00.0000] <Hixie> longdesc="mailto:trustee⊙nc" [23:59:01.0000] <Hixie> wtf [00:25:00.0000] <hsivonen> Hixie: confirmed only one additional email [00:28:00.0000] <Hixie> thanks [00:28:01.0000] <Hixie> just making sure none of your mails fall through the cracks when i speed-read the html list... [00:55:00.0000] <hsivonen> Hixie: should I CC you next time? [00:56:00.0000] <Hixie> no, it's ok [00:56:01.0000] <Hixie> just making sure [00:56:02.0000] <hsivonen> ok [00:57:00.0000] <hsivonen> on the face of it, http://www.britanniarescue.com/about/strategy/ seems to have decorative images. why do they bother with longdesc? 
[00:57:01.0000] <Hixie> i just select all mail to html and read it, then select all mail to the next list and read it, etc [00:57:02.0000] <Hixie> i have no idea why they use it [00:57:03.0000] <Hixie> probably because It's The Law [00:58:00.0000] <Hixie> after looking at all this in more detail, i'm starting to suspect that the accessibility advocacy has maybe done more damage than help, sadly [00:59:00.0000] <hsivonen> yeah. in some twisted way it seems to me that by speccing accessibility features we might actually create lawyerbombs :-( [01:20:00.0000] <Lachy> Hey Hixie, I'm here now [01:21:00.0000] <Hixie> hey [01:22:00.0000] <Hixie> i found a workaround around whatever it was i was going to ask you [01:22:01.0000] <Hixie> which i've forgotten now [01:22:02.0000] <Lachy> ok, no worries [01:23:00.0000] <Lachy> /me is off to see the Transforms movie now [01:23:01.0000] <Hixie> aha, the next wave of data is in [01:23:02.0000] <Lachy> *Transformers [01:23:03.0000] <Hixie> /me examines [01:25:00.0000] <Hixie> lol [01:25:01.0000] <Hixie> one of the longdesc=""s points to a file called spacer.txt [01:25:02.0000] <Hixie> i have my doubts about the usefulness of THAT longdesc [01:29:00.0000] <Dashiva> How excellent, an accessible spacer gif [01:29:01.0000] <Hixie> there are 8 times more longdesc=""s that point to the same page as an ancestor <a href=""> than there are longdesc=""s that didn't get caught on any of my "likely to suck" heuristics [01:30:00.0000] <Hixie> and out of 8 million <table>s with a cell with a headers="" attribute, twenty thousand had a cycle in the headers="" [01:30:01.0000] <Hixie> jesus [01:30:02.0000] <Hixie> and over a million had IDs that pointed to elements that weren't cells! 
[01:31:00.0000] <Hixie> ten thousand had overlapping cells [01:32:00.0000] <Hixie> in about four million cases, the headers="" attribute were redundant given the algorithm in the spec for mapping <th>s to <td>s [01:32:01.0000] <Hixie> in about 80,000 cases the headers="" attribute _would_ have been redundant if all the headers used <th> elements instead of <td> [01:32:02.0000] <Hixie> leaving about 2 million cases that might be valid which i'll have to look at [01:35:00.0000] <Hixie> 2 for 2 on broken uses so far [02:19:00.0000] <hsivonen> http://tools.ietf.org/html/draft-walsh-tobin-hrri-00 [02:20:00.0000] <annevk> that's been up for a while now, not? [02:21:00.0000] <annevk> although I don't think they are actually fixing anything [02:21:01.0000] <annevk> they are just widening the range of allowed characters [02:25:00.0000] <hsivonen> annevk: may have been. I dunno. found out today [02:25:01.0000] <zcorpan> a superset of IRI? [02:26:00.0000] <hsivonen> zcorpan: so it seems [02:26:01.0000] <hsivonen> URL5 [02:26:02.0000] <zcorpan> yeah [02:27:00.0000] <annevk> that's what we need, yes [02:27:01.0000] <annevk> that's not what it is :( [02:28:00.0000] <hsivonen> URL, URI, IRI, HRRI, URL5 [02:30:00.0000] <zcorpan> were there not more names somewhere in between? [02:30:01.0000] <annevk> /me learns about ephemeral [02:30:02.0000] <annevk> there's XRI -> HRRI [02:30:03.0000] <annevk> iirc [02:31:00.0000] <annevk> IRIs are not done yet fwiw [02:38:00.0000] <annevk> dropped / not included / omitted / ...? [02:38:01.0000] <annevk> suggestions? [02:40:00.0000] <annevk> excluded? [02:41:00.0000] <zcorpan> 2007-07-01 17:35 Ben 'Cerbera' Millard "absent" might be even better? 
[02:41:01.0000] <zcorpan> 2007-07-01 17:35 Ben 'Cerbera' Millard "not included" can still imply "we decided not to include these" [02:41:02.0000] <zcorpan> 2007-07-01 17:35 Ben 'Cerbera' Millard "absent" just means "not present" [02:42:00.0000] <annevk> cool [03:04:00.0000] <zcorpan> people really think that new features will suffer less from interop problems than existing features [03:05:00.0000] <annevk> it's mostly an academic exercise it seems [03:05:01.0000] <annevk> although not a real interesting one at that [03:42:00.0000] <Hixie> "Is XHTML 5 the successor of XHTML 2? Of course not." seems to beg the question with tr/52/21/ [03:42:01.0000] <Hixie> didn't someone already ask him that? [03:44:00.0000] <Hixie> oh i see henri basically said that already [03:44:01.0000] <annevk> maybe we should have "HTML 5" (language) and HTML and XHTML (syntax) [03:44:02.0000] <annevk> the XHTML syntax for HTML 5 shorthand would be XHTML5 but that would be unofficial [03:44:03.0000] <othermaciej> s/beg the question/invite the question/ [03:45:00.0000] <othermaciej> /me hopes that here at least he can still be gently pedantic [03:45:01.0000] <zcorpan> /me hasn't seen the tr/// constructor before [03:45:02.0000] <othermaciej> it's sed syntax [03:45:03.0000] <othermaciej> (also perl I think) [03:46:00.0000] <othermaciej> same source as s/foo/bar/ [03:50:00.0000] <zcorpan> seems useful :) [03:52:00.0000] <zcorpan> /me also learns that other puncation and parantheses can be used instead of slashes [03:56:00.0000] <annevk> the WHATWG sniffing algorithm doesn't seem to deal with .ico formats, bitmaps, etc. [03:59:00.0000] <zcorpan> http://del.icio.us/url/99931bd7993088a7dc60da0a031732e1 -- "(X)HTML4" [03:59:01.0000] <Hixie> annevk: seems easiest to just ignore the whole issue, frankly. it's not like the spec is called "xhtml5" [03:59:02.0000] <Hixie> annevk: does the spec allow for extra rows to sniff such types? [04:00:00.0000] <krijnh> zcorpan: vpieters? 
:| [04:00:01.0000] <annevk> Hixie, no it says "User agents must ignore any rows for image types that they do not support." [04:00:02.0000] <annevk> which seems to conflict with the warning earlier on [04:00:03.0000] <annevk> I might have mentioned that on the mailing list already [04:00:04.0000] <zcorpan> krijnh: and condor87 [04:01:00.0000] <Hixie> annevk: ah well we'll have to add rows then [04:09:00.0000] <annevk> /me ponders about <picture> [04:10:00.0000] <annevk> it seems such an obvious failure, how can they not see it? [04:13:00.0000] <hsivonen> annevk: indeed [04:14:00.0000] <hsivonen> annevk: Sander Tekelenburg's attempt at making it backwards compatible should show that the nice idea gets out of control quickly when you scratch the surface [04:14:01.0000] <annevk> neither proposal even works in IE7 [04:15:00.0000] <hsivonen> I try to focus on tree building instead spending the whole day replying to the list [04:16:00.0000] <annevk> I think I'll work on some tests for getBoundingClientRect and getClientRects or something [04:16:01.0000] <annevk> lunch first! [04:16:02.0000] <hsivonen> I'm getting more and more convinced that grouping by insertion mode first and by element second makes sense [04:16:03.0000] <annevk> you're keeping insertion modes? [04:17:00.0000] <hsivonen> with fall through for IN_TABLE etc. to IN_BODY and from IN_BODY to IN_HEAD_NOSCRIPT to IN_HEAD [04:17:01.0000] <hsivonen> annevk: no. 
I have just phases [04:17:02.0000] <annevk> oh ok [04:17:03.0000] <annevk> i like your code for the tokenizer quite a bit [04:18:00.0000] <annevk> although the comments are quite verbose [04:18:01.0000] <hsivonen> annevk: it's the spec :-) [04:18:02.0000] <annevk> yeah :) [04:18:03.0000] <hsivonen> too bad that doing the same for tree building is too much work [04:19:00.0000] <annevk> we just need lots of testcases [04:19:01.0000] <annevk> if zcorpan gets a proper browser framework to work for html5lib tests I assume we'll get even more testcases there [04:20:00.0000] <hsivonen> I intend to print my tree builder and the spec and go over them with a highlighter pen to check that everything is there [04:20:01.0000] <annevk> especially since the testformat is quite easy and the output can be generated using tools (assuming html5lib is compliant) [04:21:00.0000] <annevk> not sure yet how to test the formpointer stuff [04:21:01.0000] <annevk> that may require some extension [04:22:00.0000] <hsivonen> annevk: I have been thinking of a sanitizer tree that puts an UUID ID on <form> and form='' on out-of-subtree associated inputs [04:29:00.0000] <Hixie> so has anyone actually defined the problem that <picture> is intended to solve? [04:31:00.0000] <hsivonen> Hixie: implicitly, the problem is that <img> doesn't allow structured fallback--only a plain string [04:31:01.0000] <Hixie> aah [04:32:00.0000] <Hixie> does he elaborate on why <object> and longdesc="" don't handle this well enough? [04:32:01.0000] <Hixie> http://www.grupodignidade.org.br/projetos.php - <img src="img/logo.gif" alt="logo" width="160" height="80" longdesc="http://www.grupodignidade.org.br/img/logo.gif" /> [04:32:02.0000] <Hixie> sigh [04:32:03.0000] <hsivonen> Hixie: for <object>, yes. 
for longdecs, I no longer remember [04:32:04.0000] <Hixie> k [04:33:00.0000] <Hixie> bed time [04:33:01.0000] <Hixie> nn [04:33:02.0000] <hsivonen> nn [04:39:00.0000] <annevk> the table and longdesc study is interesting [04:59:00.0000] <zcorpan> hmm, it's not possible to check what case elements are in the dom in html, is it? except perhaps trying getElementsByTagNameNS or something [05:04:00.0000] <annevk> don't think so [05:04:01.0000] <annevk> unless localName is somehow secured [05:05:00.0000] <zcorpan> given webkit's implementation experience with my suggestion about localName, even that seems to be a dead end [05:07:00.0000] <zcorpan> i'll just have to use toLowerCase() [05:11:00.0000] <zcorpan> http://simon.html5.org/temp/html5lib-tests/wrapper.html -- got something working at least. now i just need to figure out how to parse and test the real files. or perhaps i'll just use another wrapper with some php. that may be simpler, dunno [05:14:00.0000] <zcorpan> the function fails in ie if there's a short bogus comment like <!foo> [05:31:00.0000] <zcorpan> </> results in a "/" element in ie [05:38:00.0000] <zcorpan> same as </foo> really [05:39:00.0000] <zcorpan> stray </x:y> gets dropped [05:52:00.0000] <annevk> dropping </> works just as well [05:57:00.0000] <zcorpan> oh sure. i was surprised that ie didn't drop it [06:46:00.0000] <annevk> lol [06:46:01.0000] <annevk> tr > tbody > td [06:46:02.0000] <annevk> tbody is not implied! [06:59:00.0000] <Philip`> Shouldn't that be "tbody > tr > td"? [06:59:01.0000] <annevk> yeah [07:01:00.0000] <Philip`> Ah [07:43:00.0000] <zcorpan> making progress...: http://simon.html5.org/temp/html5lib-tests/wrapper.html [07:44:00.0000] <zcorpan> now i just need to make the text file into two arrays [07:45:00.0000] <annevk> /me wonders in what kind of fantasyland some people live [07:45:01.0000] <annevk> "I was thinking exactly the opposite, and wondering whether Microsoft might be persuaded to migrate their horrific ?Active-X? 
strings from the opening <object> tag to an nested <param>." [07:46:00.0000] <Philip`> zcorpan: "Security error: attempted to read protected variable" - why doesn't Opera like that? [07:47:00.0000] <zcorpan> Philip`: dunno, works in Kestrel [07:48:00.0000] <Philip`> Oh, okay, maybe it's only a problem with 9.2 [07:49:00.0000] <annevk> evil data: URIs [07:49:01.0000] <hsivonen> annevk: in a world where the value of π is a legislative decision [07:55:00.0000] <zcorpan> any suggestions on how to read the text file with js? [07:56:00.0000] <hsivonen> zcorpan: XHR? [07:56:01.0000] <zcorpan> hsivonen: yeah. although in firefox i got a "syntax error" when trying to read .responseText [07:59:00.0000] <zcorpan> but let's assume that doesn't happen in firefox and i can read the file... how do i then parse it into two arrays? [07:59:01.0000] <zcorpan> my previous attempt with split() was too naïve and didn't really work [08:00:00.0000] <Philip`> Regular expressions? [08:00:01.0000] <Philip`> Whatever the problem, they are always the solution [08:00:02.0000] <annevk> :p [08:00:03.0000] <hsivonen> "now you have two problems" :-) [08:00:04.0000] <annevk> why doesn't split("\n\n") work? [08:02:00.0000] <zcorpan> does that work with multiple lines? [08:02:01.0000] <zcorpan> also, what if a test has e.g. \n\n as data [08:02:02.0000] <zcorpan> or doesn't the syntax allow for that? 
[08:02:03.0000] <annevk> oh right, yes [08:02:04.0000] <zcorpan> i think it does, so long as no test has \n\n as data [08:03:00.0000] <annevk> no \n\n can occur [08:03:01.0000] <zcorpan> ok [08:03:02.0000] <annevk> just split on \n\n#data or something and remove #data from the first line too [08:03:03.0000] <zcorpan> splitting removes automatically [08:06:00.0000] <Philip`> http://wiki.whatwg.org/wiki/Parser_tests#Tree_Construction_Tests doesn't seem to say it has to have blank lines between tests - the only delimiter is "\n#data\n" [08:06:01.0000] <annevk> sure, but the first test doesn't start with \n\n [08:06:02.0000] <annevk> Philip`, except for the first test... [08:06:03.0000] <annevk> also, two newlines is sort of accepted [08:06:04.0000] <Philip`> /^#data$/ [08:06:05.0000] <Philip`> /^#data$/ [08:08:00.0000] <Philip`> Uh [08:08:01.0000] <Philip`> /^#data$/m [08:10:00.0000] <Philip`> (or something like /\n*^#data\n/m if you want to strip newlines, assuming the last test doesn't end with a newline) [08:11:00.0000] <Philip`> /me wonders if anyone has written test cases for test case parsers [08:12:00.0000] <Philip`> though I'm not entirely sure how you'd parse the tests for the test parser [08:12:01.0000] <zcorpan> we need a parsing spec for the test case format [08:12:02.0000] <zcorpan> -_- [08:40:00.0000] <annevk> I tweaked http://wiki.whatwg.org/wiki/Parser_tests#Tree_Construction_Tests a bit to make it more clear what the actual format is [08:41:00.0000] <Philip`> The link at the bottom to the tests should probably be updated [08:42:00.0000] <Philip`> 'a line that says "#errors:"' - probably shouldn't have the colon [08:43:00.0000] <annevk> at some point the format used by http://html5lib.googlecode.com/svn/trunk/testdata/tree-construction/tests4.dat should be added too and the description could use some more whitespace... 
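[editor's sketch] The splitting approach discussed above could look roughly like the following. This is a minimal sketch, not zcorpan's actual wrapper code: the function name `parseTreeConstructionTests` is hypothetical, and it assumes each test consists of a `#data` line, the input, a `#errors` section and a `#document` section, with optional blank lines between tests.

```javascript
// Hypothetical helper (an assumption, not the wrapper's real code): split a
// tree-construction test file into two parallel arrays, one with the test
// inputs and one with the expected tree dumps.
function parseTreeConstructionTests(text) {
  const inputs = [];
  const expected = [];
  // With the m flag, ^ only matches at the start of a line, so "#data"
  // appearing in the middle of a line of test input does not start a new test.
  const chunks = text.split(/^#data\n/m);
  for (const chunk of chunks) {
    // Whatever precedes the first "#data" line is empty; skip it explicitly
    // instead of relying on its position in the array.
    if (chunk.trim() === "") continue;
    const errorsAt = chunk.indexOf("\n#errors\n");
    const documentAt = chunk.indexOf("\n#document\n");
    if (errorsAt === -1 || documentAt === -1) continue; // malformed chunk
    inputs.push(chunk.slice(0, errorsAt));
    // Strip the blank separator lines that may follow the expected tree.
    const tree = chunk.slice(documentAt + "\n#document\n".length);
    expected.push(tree.replace(/\n+$/, ""));
  }
  return { inputs, expected };
}
```

Skipping empty chunks by content rather than by index keeps the test count the same whether or not the engine's split() reports a leading empty string.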
[08:56:00.0000] <zcorpan> yay [08:57:00.0000] <zcorpan> works in Kestrel now [08:58:00.0000] <annevk> zcorpan, sweet [08:58:01.0000] <zcorpan> firefox boils at...: Error: unexpected end of XML source [08:58:02.0000] <zcorpan> Source File: data:text/html,<script><div></script></div><title><p></title><p><p> [08:58:03.0000] <zcorpan> Line: 1, Column: 4 [08:58:04.0000] <zcorpan> Source Code: [08:58:05.0000] <zcorpan> <div> [08:58:06.0000] <annevk> ah [08:59:00.0000] <zcorpan> is that e4x or something? [08:59:01.0000] <Philip`> It works in precisely none of the five browsers I have access to :-( [08:59:02.0000] <annevk> put encodeURIComponent around it [08:59:03.0000] <annevk> maybe that will make it work better (it's also theoretically more correct) [08:59:04.0000] <zcorpan> don't think that's the problem [08:59:05.0000] <zcorpan> it's <script><div></script> in the actual test [09:00:00.0000] <annevk> maybe catch all error events and silence them? [09:01:00.0000] <annevk> iframe.onerror = function ... [09:01:01.0000] <Philip`> That would be parsed as E4X, I believe - it's only in the cases of <!--...--> and <![CDATA[...]]> where you have to use type="text/javascript;e4x=1" [09:02:00.0000] <annevk> iframe.onerror = null [09:02:01.0000] <annevk> or something [09:02:02.0000] <Philip`> (http://developer.mozilla.org/en/docs/E4X) [09:02:03.0000] <zcorpan> annevk: doesn't help [09:02:04.0000] <zcorpan> annevk: don't think JS errors bubble up to the parent document [09:03:00.0000] <annevk> zcorpan, iframe.contentWindow.onerror = null [09:03:01.0000] <zcorpan> annevk: nope [09:04:00.0000] <annevk> does it actually work if you remove that test? [09:05:00.0000] <zcorpan> hmm. no. 
[09:05:01.0000] <annevk> btw, it would be nice if you showed the input data in the result tree as well [09:06:00.0000] <annevk> makes it easier to analyze potential errors [09:06:01.0000] <Philip`> Could change the tests to do <script type="unsupported"> so browsers won't try running them [09:07:00.0000] <annevk> that may work [09:08:00.0000] <zcorpan> or use //<div> instead of <div> [09:08:01.0000] <zcorpan> annevk: done [09:08:02.0000] <annevk> done what? [09:09:00.0000] <zcorpan> showed the input data [09:09:01.0000] <annevk> ah [09:09:02.0000] <annevk> does it matter though that browsers run them? [09:10:00.0000] <zcorpan> no, don't think so [09:10:01.0000] <annevk> zcorpan, btw iframe.contentWindow.onerror = function(foo,bar,baz) { return false } [09:10:02.0000] <annevk> might prevent the error from appearing [09:10:03.0000] <zcorpan> it's some other reason why it doesn't work in firefox [09:10:04.0000] <zcorpan> ok [09:12:00.0000] <zcorpan> xhr only works on the same domain, right [09:12:01.0000] <zcorpan> might need a server side script to include external tests [09:12:02.0000] <annevk> yeah, same-origin [09:15:00.0000] <Philip`> If the external tests were in a format that was valid JS, you could include them with <script src> [09:16:00.0000] <zcorpan> well, they're not. :) [09:16:01.0000] <Philip`> Or if you could change the external tests to be in a format that was valid JS :-) [09:17:00.0000] <zcorpan> seems simpler to write a server-side wrapper for this [09:17:01.0000] <Philip`> but I guess the point of it being external is that it's external and out of your control [09:17:02.0000] <annevk> zcorpan, how about a document.write() version? [09:18:00.0000] <zcorpan> annevk: ? [09:18:01.0000] <annevk> zcorpan, instead of iframe.src = do iframe.contentDocument.open(); iframe.contentDocument.write(testdata); etc. 
[09:18:02.0000] <annevk> that's how the live-dom-viewer works [09:19:00.0000] <zcorpan> ah [09:19:01.0000] <zcorpan> ok [09:21:00.0000] <zcorpan> it doesn't fire a load even then. but i guess i could make it work. what's the benefit? [09:21:01.0000] <annevk> works in IE [09:21:02.0000] <annevk> just copy some of the live-dom-ivewer logic [09:21:03.0000] <annevk> should be doable [09:24:00.0000] <zcorpan> works in firefox with that change [09:25:00.0000] <zcorpan> and opera 9.2 [09:27:00.0000] <zcorpan> ie only wants to load the first test [09:30:00.0000] <annevk> that's an improvement [09:32:00.0000] <zcorpan> "childNodes is null or not an object" [09:32:01.0000] <zcorpan> for (var i = 0; i < node.childNodes.length; i += 1) { [09:34:00.0000] <annevk> hmm [09:34:01.0000] <zcorpan> ah [09:34:02.0000] <zcorpan> contentDocument -> contentWindow.document [09:34:03.0000] <annevk> whoa [09:35:00.0000] <annevk> that's supposed to be equivalent [09:35:01.0000] <Philip`> It's kind of irritating when you're trying to write tests to help interoperability between browsers, but then you can't even write a script to run the tests without hitting non-interoperability issues between every browser... [09:35:02.0000] <zcorpan> now it works in ie [09:35:03.0000] <zcorpan> Philip`: yeah [09:35:04.0000] <zcorpan> but it outputs everything on one line [09:36:00.0000] <zcorpan> \n -> \r\n ? [09:36:01.0000] <annevk> yeah [09:37:00.0000] <zcorpan> YAY! [09:37:01.0000] <zcorpan> :D [09:37:02.0000] <zcorpan> doesn't work in safari though [09:38:00.0000] <annevk> hmm [09:38:01.0000] <annevk> blame mjs :p [09:38:02.0000] <zcorpan> othermaciej: yt? :) [09:39:00.0000] <annevk> IE fails everything because of its fixed <title> [09:41:00.0000] <annevk> zcorpan, the test output numbers don't match the test input numbers [09:41:01.0000] <annevk> zcorpan, it seems that way [09:41:02.0000] <zcorpan> the output numbers is 1 greater right? 
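[editor's sketch] The document.write() variant, combined with the contentDocument to contentWindow.document fix that made IE work, might be written as below. The helper names `frameDocument` and `writeTestIntoFrame` are hypothetical and this is only a sketch under the assumption that the iframe is already attached to the page; it is not the live DOM viewer's or the wrapper's real code.

```javascript
// Hypothetical helper: return the iframe's document, falling back to
// contentWindow.document for browsers that do not expose contentDocument.
function frameDocument(iframe) {
  return iframe.contentDocument ||
    (iframe.contentWindow && iframe.contentWindow.document) ||
    null;
}

// Hypothetical helper: parse a test input in the iframe by writing it into
// the frame's document instead of assigning a data: URI to iframe.src.
function writeTestIntoFrame(iframe, testdata) {
  const doc = frameDocument(iframe);
  if (!doc) {
    throw new Error("iframe has no accessible document");
  }
  doc.open();
  doc.write(testdata);
  doc.close();
  return doc; // the caller can now walk doc.childNodes to dump the tree
}
```

Because no load event is fired in this mode (as noted above), the caller would walk the returned document directly after close() rather than waiting for a handler.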
[09:42:00.0000] <annevk> hmm, IE and Opera seem to be one off [09:42:01.0000] <zcorpan> yeah [09:42:02.0000] <zcorpan> it's correct [09:42:03.0000] <zcorpan> the first test is empty [09:42:04.0000] <zcorpan> .split(/\n*#data\n/m) [09:42:05.0000] <annevk> so why are they one off? [09:43:00.0000] <annevk> IE saying it's 24 and Opera claiming it's 25... [09:43:01.0000] <zcorpan> "foobar".split("foo") // ["", "bar"] [09:44:00.0000] <zcorpan> i guess i could remove the first entry from the array but it seemed simpler to ignore it [09:45:00.0000] <zcorpan> they might do different things with split() [09:47:00.0000] <zcorpan> yep [09:47:01.0000] <zcorpan> javascript:(function(){var arr = "#data\nfoo".split(/\n*#data\n/m); alert(arr.length); })() [09:49:00.0000] <Philip`> (Is it intentional that that will match strings like "foo#data\n"?) [09:49:01.0000] <zcorpan> not really [09:50:00.0000] <Philip`> (That was what the ^ in /\n*^#data\n/m was for :-) ) [09:51:00.0000] <zcorpan> (fixed) [09:53:00.0000] <zcorpan> ok, fixed the number of tests issue [09:56:00.0000] <zcorpan> ie passes test 101 [09:57:00.0000] <annevk> <html><head><title></title><body></body></html> ... [09:58:00.0000] <zcorpan> amazing that i got the format right on the first try. i didn't even look at the documentation [09:58:01.0000] <annevk> hixie designed it [09:59:00.0000] <zcorpan> Hixie: if you could get people use html right on the first try... ;) [09:59:01.0000] <annevk> I'm quite disappointed by the large number of fails [09:59:02.0000] <annevk> Hopefully that will improve in due course by either updating the tests or the spec [10:00:00.0000] <zcorpan> annevk: in which browser? [10:00:01.0000] <annevk> all? [10:00:02.0000] <Philip`> Could you make a table of the results for all browsers, to see which tests don't match any browser's reality? 
[10:01:00.0000] <zcorpan> i guess [10:01:01.0000] <zcorpan> but there are more tests [10:01:02.0000] <zcorpan> i want to figure out how to run those [10:01:03.0000] <zcorpan> first food [10:01:04.0000] <annevk> another for loop around the xhr [10:01:05.0000] <annevk> or just merge everything on the server [10:01:06.0000] <zcorpan> yeah [10:02:00.0000] <annevk> it would be good if you at some point comitted this back to html5lib [10:03:00.0000] <annevk> then we can make the acid-parser test [10:03:01.0000] <zcorpan> perhaps i don't need to do server side magic [10:03:02.0000] <annevk> other things that might be nice: 1) some colors on the result page to make it easier to scan 2) collapsable items on the result page [10:04:00.0000] <annevk> especially the second is useful given the large number of tests that fail :) [10:04:01.0000] <zcorpan> /me makes notes [10:05:00.0000] <annevk> zcorpan, did you "fix" the difference in counting with IE? [10:07:00.0000] <annevk> I'm thinking that it might be useful to include a bunch of <title></title> in a lot of testcases to make the IE results more usable [10:08:00.0000] <Philip`> Could you post-process the results to ignore ones where the only difference is the "| <title>" line? [10:09:00.0000] <Philip`> (or mark as uninteresting, rather than entirely ignore them) [10:10:00.0000] <annevk> that'd be another option [10:10:01.0000] <annevk> prolly better [10:32:00.0000] <rubys> any html5lib developers awake here? 
:-) [10:36:00.0000] <annevk> /me is [10:37:00.0000] <annevk> zcorpan ported html5lib tests to browsers [10:37:01.0000] <annevk> see http://simon.html5.org/temp/html5lib-tests/wrapper.html for tree-construction/tests1 [10:38:00.0000] <rubys> Anne, can you do me a favor and svn update and then run: [10:38:01.0000] <rubys> python parse.py --tree "<p><b><i><u></p><p>X" [10:41:00.0000] <annevk> get two <p> siblings the second containing the same as the first plus "X" as deepest child [10:43:00.0000] <rubys> nevermind, I found my problem (the actual test2 #45 actually has a new line in the middle) [10:43:01.0000] <rubys> sorry to bother you [10:43:02.0000] <annevk> no worries [11:01:00.0000] <annevk> hsivonen, how would this UUID stuff work? [11:02:00.0000] <annevk> hsivonen, what I'm interested in is annotating the test results for tree construction with that information [11:28:00.0000] <met_> http://ydnar.vox.com/library/post/webkit-team-adds-audio-video-support.html [11:35:00.0000] <zcorpan> annevk: i did [11:40:00.0000] <othermaciej> zcorpan: what's the problem? [12:51:00.0000] <zcorpan> othermaciej: http://simon.html5.org/temp/html5lib-tests/wrapper.html doesn't work in safari (for windows). don't know why [12:52:00.0000] <othermaciej> I was hoping it would be obvious but there's a whole lot of script there [12:53:00.0000] <zcorpan> would the web inspector help me debug? how do i activate it on windows? 
[12:53:01.0000] <othermaciej> zcorpan: it's got a "parse error" and a "maximum call stack size exceeded" [12:53:02.0000] <othermaciej> the JavaScript error console (in the debug menu) would tell you that [12:53:03.0000] <zcorpan> don't see a debug menu [12:54:00.0000] <othermaciej> yeah, you have to turn it on with a command-line switch [12:54:01.0000] <othermaciej> google for "safari windows debug menu" [12:54:02.0000] <othermaciej> I don't remember the details at the moment [12:54:03.0000] <billmason> http://rakaz.nl/item/enabling_the_debug_menu_on_safari_for_windows [12:54:04.0000] <zcorpan> ok, will do [12:54:05.0000] <othermaciej> is dom2string going to recurse to a depth of more than 99? [12:54:06.0000] <zcorpan> billmason: cheers [12:54:07.0000] <othermaciej> if so, that's probably the problem [12:55:00.0000] <othermaciej> we should probably relax that stack limit [12:55:01.0000] <zcorpan> it might [12:57:00.0000] <zcorpan> but i don't think that's the problem, it didn't work with one test with the input "Test" either [13:03:00.0000] <zcorpan> is "run" a preserved word? [13:05:00.0000] <hasather> zcorpan: no [13:05:01.0000] <zcorpan> what is the SyntaxError: Parse Error on line 1 in http://simon.html5.org/temp/html5lib-tests/wrapper.html ? [13:16:00.0000] <zcorpan_> works when i have only 1 test in the file [13:16:01.0000] <zcorpan_> 2 tests as well [13:17:00.0000] <hasather> seems to be a problem with the test that looks like this: "<script><div></script></div><title><p></title><p><p>" [13:20:00.0000] <hasather> zcorpan: that seems to be the only test that has unallowed content in a script element [13:22:00.0000] <jgraham> zcorpan_: TestData in http://html5lib.googlecode.com/svn/trunk/python/tests/support.py contains the testcase parser that html5lib uses (you have to pass it a list of the section headings e.g. 
("data", "errors", "document")) [13:22:01.0000] <jgraham> (that was a FYI if you have any more issues with the test format) [13:28:00.0000] <zcorpan_> hasather: ah. yes of course [13:29:00.0000] <zcorpan_> jgraham: thanks [13:31:00.0000] <zcorpan_> othermaciej: seems like the problem is the number of recursions indeed. not sure if i can/will work around that [13:34:00.0000] <othermaciej> zcorpan_: I'm sure your function could easily be rewritten not to be recursive [13:34:01.0000] <zcorpan_> othermaciej: can you do it for me? :) [13:36:00.0000] <othermaciej> zcorpan_: don't have time to actually test, but I can tell you roughly how to do it [13:37:00.0000] <othermaciej> you're effectively doing a preorder tree traversal [13:37:01.0000] <othermaciej> you can do that with a stack, or since you have parent pointers just with a simple loop [13:38:00.0000] <othermaciej> when entering a node, you do the entry processing (print node itself, increment indent) [13:39:00.0000] <othermaciej> then you check if it has children - if so, enter the first child [13:39:01.0000] <zcorpan_> (the live dom viewer has the same problem btw) [13:39:02.0000] <othermaciej> if no children, check for a next sibling - if present, do exit processing for current node and enter the next sibling [13:40:00.0000] <othermaciej> if no next sibling, do exit processing for this node, then continue from the parent as if it had no children (i.e. exit to the parent's next sibling or parent's parent and so forth) [13:40:01.0000] <zcorpan_> ok. 
thanks [13:41:00.0000] <othermaciej> we use this style of tree traversal internal to webcore all the time [13:41:01.0000] <othermaciej> in fact, we have an internal traverseNextNode function that does it [13:41:02.0000] <othermaciej> (although that doesn't visit a node again when exiting, which I think you want) [13:42:00.0000] <zcorpan_> yeah, i want to catch misnested nodes in ie [13:43:00.0000] <zcorpan_> or perhaps that's just a check before you process the children [15:06:00.0000] <zcorpan_> hmm. the question is how to handle misnested nodes. [15:17:00.0000] <Philip`> zcorpan_: Output "FAIL" and then stop? [15:36:00.0000] <othermaciej> /me facepalms at continuing mail from Rob Burns [15:38:00.0000] <zcorpan_> Philip`: yeah... but the recursive algorithm could output the entire tree anyway, which is nicer for debugging [15:38:01.0000] <Philip`> I don't quite see how trying to publish one document after four months counts as "rushing" [15:39:00.0000] <Hixie> <td id="m1" axis="mainMenu" headers="m1" valign="top"> [15:39:01.0000] <Hixie> sigh [15:39:02.0000] <zcorpan_> Hixie: hah [15:40:00.0000] <othermaciej> now that's some compact information [15:40:01.0000] <othermaciej> Hixie: is that the sort of thing causing all the cycles? [15:44:00.0000] <Hixie> it's at least one cause [15:44:01.0000] <Hixie> i'm going to rerun the survey with a special hack to count those sperately [15:47:00.0000] <Hixie> i really have to stop e-mailing public-html [16:04:00.0000] <zcorpan_> annevk: are there tests on things like </p>, <html></p>, <head></p>, etc, in the html5lib tests? [16:05:00.0000] <zcorpan_> public-html starts to get pretty high traffic again [16:16:00.0000] <Hixie> typical longdesc: http://130.83.47.128/masterfiles/descriptions/logo.txt [16:16:01.0000] <webben> typical of what? 
[16:17:00.0000] <Hixie> typical of the longdescs that are actually not completely bogus [16:17:01.0000] <Hixie> (that's from http://130.83.47.128/vv/ss/comments/13.205.en.tud) [16:17:02.0000] <Hixie> (the first one on my list of "interesting" uses) [16:18:00.0000] <webben> not a terrible longdesc I suppose [16:18:01.0000] <webben> distinguishing between alternate text and explaining what the image is [16:18:02.0000] <Hixie> <a href="http://www.google.co.jp/"> [16:18:03.0000] <Hixie> <img src="http://blog2.fc2.com/2/20century/file/Logo_20s.gif" alt="Google" height="75" width="143" longdesc="http://www.google.co.jp/logos.html" /></a> [16:18:04.0000] <webben> shame they didn't explain what the logo actually depicts [16:19:00.0000] <Hixie> /me bangs head against table [16:19:01.0000] <jgraham> zcorpan_: I can't see any tests for those cases (htough I thought anne had checked some in...). If you want to add some I can add you to the html5lib members list [16:20:00.0000] <webben> Hixie: maybe the text is helpful for that one [16:20:01.0000] <webben> /me can't read Japanese [16:20:02.0000] <webben> oh wait, Google can read Japanese [16:20:03.0000] <Philip`> But that logo.txt longdesc is in the wrong language for that page (which I guess could be because the site's developers had no way to actually test longdesc so it fell out of sync with the page contents)... 
[16:20:04.0000] <Hixie> from that en.tud page, lower down: [16:20:05.0000] <Hixie> <img src="/masterfiles/images/blue10x1.gif" alt="[Abstandhalter]" title="[Abstandhalter]" longdesc="/masterfiles/descriptions/abstandhalter.txt"> [16:20:06.0000] <Hixie> guess what the "/masterfiles/descriptions/abstandhalter.txt" file contains [16:20:07.0000] <webben> Philip`: good point [16:23:00.0000] <Hixie> i think i've yet to see an actual useful, value use of longdesc="" in this study [16:24:00.0000] <Hixie> bbl [16:24:01.0000] <webben> Hixie: you should include uses of D-links [16:24:02.0000] <webben> since for a long time D-link was used as a longdesc alternative based on poor support for longdesc [16:26:00.0000] <webben> see also: http://www.w3.org/TR/WCAG10-HTML-TECHS/#long-descriptions [16:26:01.0000] <webben> it would be interesting to know how many links in the wild have a value of D or [D] or similar [16:26:02.0000] <webben> s/value/text content/ [16:28:00.0000] <Philip`> /me wants to rewrite his own rubbish survey tool to be slightly less rubbish, so he can get vaguely interesting numbers about common features [16:29:00.0000] <webben> how many links ... and what they point to, of course [16:29:01.0000] <jgraham> /me wants a google-scale cluster to run a survey on [16:30:00.0000] <jgraham> and a pony, of course [16:31:00.0000] <jgraham> But seriously, Philip`, it would be nice if your survey tool was more widely available. It would be even better if the parser was fast. I wonder if any of the HTML5-parser-in-C projects are going to produce something soon? 
[16:32:00.0000] <Philip`> At least my initial version taught me that SQLite is completely rubbish when you have concurrency - it kept throwing exceptions because the whole database was locked [16:32:01.0000] <Philip`> so I need to rewrite it with MySQL or something [16:34:00.0000] <Philip`> and I think it should do some simple crawling, rather than only looking at a fixed list of URLs, so it can find more stuff to look at [16:35:00.0000] <Philip`> (and a faster parser would definitely be useful :-) ) [16:37:00.0000] <Philip`> (A Java one would probably be as good as a C one) [16:39:00.0000] <bewest> sounds like a bunch of people are interested in some kind of survey tool available to the community [16:40:00.0000] <webben> Here's a good example of longdesc-as-long-alternative: http://www.fhwa.dot.gov/hfl/framework/04.cfm referring to http://www.fhwa.dot.gov/hfl/framework/longdesc.cfm#fig1 [16:40:01.0000] <bewest> purpose would be 2-fold, correct? 1.) survey useage of authoring techniques on the web. 2.) test parsers? [16:41:00.0000] <Philip`> 3.) Confirm whether Hixie's stats are reasonable, or if he's just making up all the numbers :-) [16:42:00.0000] <bewest> I've thought about doing this with ec2 and Alexa's web services [16:42:01.0000] <bewest> eg greptheweb, and MSR [16:42:02.0000] <bewest> alexa has crawled documents in s3 [16:43:00.0000] <bewest> but that costs money [16:44:00.0000] <zcorpan_> jgraham: sure. i might check in this browser port too [16:45:00.0000] <zcorpan_> othermaciej: rewrote the function to not be recursive but still get the same error in safari [16:45:01.0000] <bewest> Philip`: so you already have some kind of survey tool? how does it work? [16:46:00.0000] <Philip`> bewest: Ah, I wasn't aware of those things, though I tend to never consider anything that requires money :-) [16:47:00.0000] <bewest> yeah... 
[16:47:01.0000] <bewest> usually I don't either [16:47:02.0000] <bewest> except that I work at the company that makes those services [16:47:03.0000] <Philip`> It was just something simple for things like http://canvex.lazyilluminati.com/misc/copyright.html and http://canvex.lazyilluminati.com/misc/summary.html [16:48:00.0000] <Philip`> (and a few other things which I can't remember where I put) [16:48:01.0000] <Philip`> where I give it a list of a few thousand URLs (from Yahoo search results for arbitrary terms), and it just downloads them then parses them (with html5lib) and looks for certain stuff [16:49:00.0000] <Philip`> (and sort of does those things in parallel, if you run lots of copies of the program, except most of the processes keep dying because SQLite gets unhappy) [16:50:00.0000] <Philip`> (and then some pages cause quadratic behaviour in html5lib and you have to manually delete them from the database) [16:50:01.0000] <Philip`> (so it's all just horribly hacked together :-p ) [16:51:00.0000] <bewest> heh [16:52:00.0000] <othermaciej> zcorpan_: that's odd [16:52:01.0000] <othermaciej> zcorpan_: pointer? [16:53:00.0000] <zcorpan_> othermaciej: http://simon.html5.org/temp/html5lib-tests/wrapper.html [16:53:01.0000] <Hixie> webben: studying text contents is much harder for various reasons [16:54:00.0000] <webben> of course it's harder [16:54:01.0000] <webben> but given we're talking about what's basically a language for marking up text, such study is pretty critical [16:55:00.0000] <Hixie> be my guest :-) [16:57:00.0000] <othermaciej> zcorpan_: very confusing [16:57:01.0000] <othermaciej> zcorpan_: I'll try debugging it in a while - need to get coffee first [16:57:02.0000] <zcorpan_> othermaciej: ok [16:58:00.0000] <zcorpan_> man, i've really spent all day on this thing [16:59:00.0000] <Hixie> how does it feel to be paid to do this nonsense? 
:-) [16:59:01.0000] <jgraham> zcorpan_: You should now be able to commit to html5lib svn If you're committing tests that html5lib doesn't pass, it's really good to email html5lib-discuss⊙gc so people know there hasn't been a regression 2007-07-04 [17:00:00.0000] <zcorpan_> Hixie: feels great :) [17:00:01.0000] <zcorpan_> jgraham: ok. thanks [17:01:00.0000] <Hixie> hey i guess working for opera also means you get w3c member access [17:01:01.0000] <zcorpan_> yeah [17:01:02.0000] <Hixie> now you can see the crazyness you've previously only been able to imagine [17:02:00.0000] <jgraham> zcorpan_: I think you need to join the html5lib-discuss group to post to it btw. [17:02:01.0000] <Philip`> Are you being paid to work on this at 1am? :-) [17:02:02.0000] <zcorpan_> Philip`: yep :) [17:02:03.0000] <zcorpan_> Philip`: plus, i work from home [17:02:04.0000] <zcorpan_> my work day starts when i want and ends when i want [17:03:00.0000] <Dashiva> h4x [17:03:01.0000] <zcorpan_> which is usually when i wake up and when i go to bed, respectively [17:03:02.0000] <Dashiva> We have core time in Oslo [17:05:00.0000] <zcorpan_> Hixie: i read the pointers in http://ln.hixie.ch/?start=1172653243&count=1 but i haven't looked at other crazyness [17:05:01.0000] <Hixie> btw i'm going to be in oslo (though extremely tired) late next monday and early next tuesday [17:05:02.0000] <Hixie> i'll probably pop by the opera offices [17:06:00.0000] <zcorpan_> /me wonders if anyone will pop by the eskilstuna office [17:07:00.0000] <Dashiva> Just as I take two days off. I'm going to miss the munchkin playing, no doubt. [17:11:00.0000] <zcorpan_> anything interesting on public-html the past 24h? [17:14:00.0000] <Hixie> i just found this interesting tidbit: [17:14:01.0000] <Hixie> Tantek Çelik (Microsoft): We are in the XHTML WG. I am the representative; recently it has become clear that the priorities of the XHTML WG are different from our priorities. 
We would like to see the HTML 4 and XHTML 1.x versions resolved. Most of the folks in the WG are XHTML 2 and that is not a priority for us. [17:14:02.0000] <Hixie> from http://www.w3.org/2004/04/webapps-cdf-ws/minutes-20040601.html [17:14:03.0000] <Hixie> Steven Pemberton (W3C/CWI): If you want that done, you have to do it. [17:17:00.0000] <tantek> Thanks for the memory Hixie :) [17:17:01.0000] <tantek> yes, that workshop is where everything "blew up" as the kids say [17:17:02.0000] <Hixie> indeed [17:18:00.0000] <Hixie> but i didn't realise that steven actually told us to go do html5 [17:18:01.0000] <tantek> he didn't [17:18:02.0000] <tantek> he told you to go do html5, and me to go do microformats [17:18:03.0000] <tantek> he just didn't realize he did ;) [17:18:04.0000] <tantek> and yes, you're welcome for the setup :) [17:19:00.0000] <Hixie> :-) [17:20:00.0000] <tantek> out of that workshop i was more convinced than ever that I had to leave microsoft and pursue microformats wherever there was support for them, knowing that you would have a pretty good handle on the HTML 4.x XHTML 1.x updates. [17:24:00.0000] <tantek> Hixie, it wouldn't be inaccurate for you to even state that Microsoft's representative to that workshop called for work on HTML4 and XHTML1 along a set of requirements remarkably similar to those adopted by WHATWG. [17:24:01.0000] <Hixie> indeed [17:24:02.0000] <tantek> thereby confirming all the conspiracy theorists suspicions that WHATWG is merely doing Microsoft's bidding. 
;) [17:25:00.0000] <Hixie> oh the modern conspiracy theory is that it's google's attempt at getting around the problem that converting adsense to xhtml2 would be too hard [17:25:01.0000] <zcorpan_> LOL [18:23:00.0000] <webben> Hixie: more vaguely sane long descriptions: http://www.tsu.ox.ac.uk/info/report.php [18:24:00.0000] <webben> (although I think they could have made use of data tables) [18:25:00.0000] <webben> another example: http://docs.sun.com/source/817-5763/ [18:26:00.0000] <webben> in general, look through this search: http://www.google.co.uk/search?hl=en&q=%22long+description+for%22 for lots of longdesc examples [18:28:00.0000] <Hixie> my script uses the same source data as that search, basically [18:38:00.0000] <Philip`> /me never knew that IE supports <comment>...</comment> [18:39:00.0000] <Philip`> (Interestingly the text appears to be not in the DOM, but is in the innerHTML view) [20:09:00.0000] <Hixie> heh, i just noticed something about the press release the w3c put out when the charters were announced [20:10:00.0000] <othermaciej> yeah? [20:10:01.0000] <Hixie> it says: [20:10:02.0000] <Hixie> "With the chartering of the XHTML 2 Working Group, W3C will continue its technical work on the language at the same time it considers rebranding the technology to clarify its independence and value in the marketplace." [20:11:00.0000] <othermaciej> hah! [20:12:00.0000] <othermaciej> "dear xhtml2 wg, how is that rebranding coming along?
love, the html wg" [22:37:00.0000] <hsivonen> annevk: I meant that when you've got a form control whose form pointer does not point to an ancestor and that doesn't have a form='' attribute pointing to the same node as the form pointer, generate an id attribute on the node pointed by the form pointer if there isn't an id already and generate a corresponding form='' attribute on the form control [22:37:01.0000] <hsivonen> annevk: this fails if the <form> element already has an id='' attribute and the value of that attribute is a duplicate [22:49:00.0000] <hsivonen> othermaciej: Also I suggested the iterative DOM traversal algorithm to zcorpan, but does IE guarantee that the algorithm terminates? I think it doesn't. [22:53:00.0000] <othermaciej> hsivonen: oh - good point, I'm not sure how it works in the face of a non-tree [22:53:01.0000] <othermaciej> hsivonen: I'm not sure what exactly IE's non-tree DOMs look like [22:55:00.0000] <hsivonen> othermaciej: this is one significant reason why a non-tree DOM sucks [22:58:00.0000] <othermaciej> hsivonen: I have seen a look of shocked realization on the faces of JS library authors when they heard that IE can do that [22:59:00.0000] <othermaciej> "that explains those weird infinite loop bugs!" [22:59:01.0000] <othermaciej> do you actually know what it does though? [22:59:02.0000] <othermaciej> is it just the parent pointer that can be wrong? you could work around that with a stack [23:02:00.0000] <Hixie> see my blog [23:02:01.0000] <Hixie> entries starting with "Tag Soup" iirc [23:02:02.0000] <Hixie> bbl [23:09:00.0000] <hsivonen> othermaciej: not sure. The edges between EM and ADDRESS in the Mac IE 5 DOM with Hixie's case look like the ingredients of an infinite loop: http://hsivonen.iki.fi/soup-dom/ (I can't test IE6 here.) [23:14:00.0000] <othermaciej> good lord, that's insane [23:14:01.0000] <othermaciej> /me blames tantek [23:15:00.0000] <othermaciej> child pointer indicates presence in the childNodes array? 
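[editor's sketch] One way to get a traversal that is neither recursive nor dependent on parent pointers is an explicit stack plus a record of nodes already seen, so it terminates even if the structure is not a real tree. The function name `dom2stringIterative` is hypothetical, the sketch only assumes nodes expose `nodeName` and `childNodes`, and it uses a modern `Set`, which browsers of this era did not have. How IE's non-tree DOMs actually behave is not established in this discussion, so this is a guard against cycles in general, not a description of IE.

```javascript
// Hypothetical sketch: dump a DOM-like structure in preorder without
// recursion and without trusting parentNode.
function dom2stringIterative(root) {
  const lines = [];
  const seen = new Set(); // nodes already printed; guards against cycles
  const stack = [{ node: root, depth: 0 }];
  while (stack.length > 0) {
    const { node, depth } = stack.pop();
    const prefix = "| " + "  ".repeat(depth);
    if (seen.has(node)) {
      // Same node reached twice: flag it and do not descend again.
      lines.push(prefix + "<" + node.nodeName + "> (already visited)");
      continue;
    }
    seen.add(node);
    lines.push(prefix + "<" + node.nodeName + ">");
    const children = node.childNodes || [];
    // Push in reverse so the first child is popped, and printed, first.
    for (let i = children.length - 1; i >= 0; i -= 1) {
      stack.push({ node: children[i], depth: depth + 1 });
    }
  }
  return lines.join("\n");
}
```

Flagging a repeated node instead of stopping keeps the rest of the dump available for debugging, which matches the wish above to see the whole tree even when nodes are misnested.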
[23:16:00.0000] <hsivonen> Philip`_: If you'd like to run surveys with something that runs as native instructions at run time, I suggest figuring out which Java spider framework can easily take a plugged HTML5 parser [23:17:00.0000] <othermaciej> hsivonen: it looks like traversal via firstChild/nextSibling/parentNode would not infinite loop on that, but it would miss some elements [23:17:01.0000] <othermaciej> wait, maybe it wouldn't even iss anything [23:17:02.0000] <hsivonen> Philip`_: the parser needs to get a java.io.InputStream, the value of the HTTP charset (null if absent), a SAX ErrorHandler and a SAX ContentHandler (for extracting links) [23:17:03.0000] <hsivonen> othermaciej: child is firstchild [23:18:00.0000] <hsivonen> othermaciej: IIRC [23:18:01.0000] <othermaciej> it can't be only firstChild, since you can't have multiple firstChilds [23:18:02.0000] <hsivonen> othermaciej: oh. right. can't rememeber anymore what I did [23:20:00.0000] <othermaciej> some nodes would be visited more than once I guess, w/ tree-based traversal [23:21:00.0000] <othermaciej> we have some ex-MacIE folks on our team, I could ask them what they were thinking :-) [23:21:01.0000] <hsivonen> Philip`_: the Internet Archive spider looks promising, but they seem to rely on the JVM running on Linux with a particular thread impl [23:22:00.0000] <hsivonen> Philip`_: btw, I wouldn't run a Java spider that used java.net.URLConnection without socket timeouts [23:22:01.0000] <hsivonen> I have more confidence in Commons HTTP Client [23:23:00.0000] <hsivonen> I haven't checked which HTTP client the Internet Archive spider uses [00:02:00.0000] <Hixie> hm, xmlns="...xhtml" usage has gone up to 20% according to the survey i just did (of several billion html docs) [00:03:00.0000] <Hixie> from about 15% about a year ago [00:07:00.0000] <Hixie> and 41% have no DOCTYPE, down from about 50% at the same time iirc [00:08:00.0000] <Hixie> 19% have the XHTML1 DOCTYPE, 11% have a 4.01 Transitional 
DOCTYPE with no URI [00:09:00.0000] <Hixie> 6% are 4.01 Transitional with URI [00:28:00.0000] <Hixie> and the 0.014% of XHTML usage has gone up to 0.062% [00:29:00.0000] <hsivonen> Hixie: real XHTML? as in a/x+x [00:30:00.0000] <hsivonen> Amazon EC2 was mentioned earlier. any actual experience with using it? [00:39:00.0000] <othermaciej> /me is surprised to hear there's that many sites that give the finger to IE; or is that conditionally served? [00:42:00.0000] <Hixie> hsivonen: yeah [00:42:01.0000] <Hixie> othermaciej: might be conditional, dunno [00:43:00.0000] <hsivonen> Hixie: does Google unify multiple representations of a page if it finds foo with Content-Location, foo.html and foo.xhtml? [00:46:00.0000] <Hixie> duplicate elimination happens before my script gets hold of the data, yes, but i don't know exactly what gets counted as a dupe [00:48:00.0000] <hsivonen> hmm. looks like Google has changed its behavior again and now http://hsivonen.iki.fi/thesis/html5-conformance-checker over .html or .xhtml. IIRC, it returned http://hsivonen.iki.fi/thesis/html5-conformance-checker.xhtml a couple of weeks ago [00:50:00.0000] <hsivonen> s/now/now prefers/ [00:51:00.0000] <Hixie> it probably treats them separately and picks one based on which has the most "relevance" [02:16:00.0000] <hsivonen> http://www.w3.org/mid/886507.69879.qm⊙wmryc [02:19:00.0000] <annevk> http://lists.w3.org/Archives/Public/www-validator/2007Jul/0011.html [02:19:01.0000] <zcorpan_> oh of course. writing your own dtd makes you validate. [02:20:00.0000] <annevk> it's true [02:20:01.0000] <annevk> it's just not very smart [02:21:00.0000] <zcorpan_> might be if you really use validation as qa check, and you don't want to flag files that have 1 error you already know about and have to have around [02:56:00.0000] <Lachy> Hixie, yt? 
[02:59:00.0000] <annevk> zcorpan_, http://simon.html5.org/temp/html5lib-tests/dom2string.js doesn't seem to handle attributes [03:00:00.0000] <zcorpan_> annevk: oops [03:05:00.0000] <zcorpan_> annevk: fixed [03:10:00.0000] <Hixie> Lachy: yo [03:11:00.0000] <Lachy> Hey Hixie, Marcos and I are working on the XBL Primer, and we're trying to come up with a concise description of what a template is. Any suggestions? [03:12:00.0000] <Hixie> it's some markup that will be used to render the bound element, i guess [03:12:01.0000] <Lachy> so far we have "A template is used to control the presentation of a document", but we want to say something about how it reorders content in the DOM, without altering it, using shadow trees, but without using technical terms [03:12:02.0000] <annevk> interesting, Opera returns uppercase attribute names [03:13:00.0000] <zcorpan_> annevk: yeah. [03:13:01.0000] <Hixie> Lachy: good luck [03:13:02.0000] <Lachy> thanks [03:13:03.0000] <Hixie> Lachy: my best attempt is what's in the spec [03:13:04.0000] <Hixie> Lachy: in the note in the definition of <template> [03:14:00.0000] <annevk> "A template defines the building blocks for the subtree of the bounding element." [03:14:01.0000] <Lachy> yeah, that's the problem :-) [03:15:00.0000] <Lachy> hmm. we could try and work something like that into it. [03:16:00.0000] <annevk> just say something and then illustrate it with some "easy" to grasp examples [03:16:01.0000] <Lachy> yeah, that's the idea [03:19:00.0000] <zcorpan_> hm. opera can have cdata nodes in the dom. how should i output those? [03:19:01.0000] <zcorpan_> "<![CDATA[ " + current.nodeValue + " ]]>" ? [03:21:00.0000] <annevk> yeah [03:24:00.0000] <zcorpan_> done [03:30:00.0000] <Hixie> i'm instrumenting my html parser to report how many times it clones nodes in the AAA and inline-reconstruction algorithms [03:30:01.0000] <Hixie> anything else i can instrument while i'm at it? [03:31:00.0000] <Hixie> hsivonen? annevk? jgraham? 
[03:32:00.0000] <annevk> we have some XXX comments about tokenization... [03:33:00.0000] <annevk> specifically which cases in states are the most frequent [03:33:01.0000] <annevk> so you can optimize those cases in some way... [03:34:00.0000] <annevk> other interesting things might be <form> nodes <form> where nodes does not include </form> and then do some browser testing on those more complicated examples from real world pages [03:36:00.0000] <Hixie> eh? [03:37:00.0000] <Hixie> i could emit for each tokeniser state the most common tokens seen, i guess [03:38:00.0000] <Hixie> it would make the parser way slower, but it could work [03:38:01.0000] <annevk> it's probably not very important [03:38:02.0000] <annevk> tree mutation and node duplication are more interesting [03:39:00.0000] <annevk> would be fun to count how often you encounter <canvas> nowadays :) [03:41:00.0000] <Hixie> i've looked at elements in a separate study [03:42:00.0000] <Hixie> canvas didn't appear in the top 200 [03:43:00.0000] <zcorpan_> /me suspects that some <canvas>es are only output with script [03:52:00.0000] <annevk> k [03:52:01.0000] <zcorpan_> hmm. dom core doesn't specify an order for .attributes ... i need to sort them myself [03:53:00.0000] <annevk> I wonder if we have actually sorted them... [03:55:00.0000] <zcorpan_> opera and safari don't seem to sort them. ie seems to sort them alphabetically. firefox alphabetically reversed. [03:55:01.0000] <Hixie> ok i'm going to emit a list of total count of all the tokens [03:56:00.0000] <Hixie> for each kind of token in each insertion mode [03:56:01.0000] <Hixie> anything else? [03:56:02.0000] <Hixie> last chance before i set this off and go to bed... [03:56:03.0000] <annevk> ah, I actually meant characters I think [03:56:04.0000] <annevk> but that may be too expensive [03:56:05.0000] <Hixie> characters? [03:56:06.0000] <annevk> during tokenization [03:56:07.0000] <Hixie> how do you mean? 
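Since DOM Core leaves the order of `.attributes` unspecified, as noted above, a serializer that wants comparable output across browsers has to sort the attributes itself. The following is a rough sketch of that idea using Python's standard `xml.dom.minidom`, not the actual `dom2string.js`; the `dump` function and its indentation scheme are assumptions loosely modelled on the tree-dump format used in the parser tests.

```python
from xml.dom import Node
from xml.dom.minidom import parseString


def dump(node, depth=0):
    """Return the dump lines for `node`, with attributes sorted by name."""
    pad = "| " + "  " * depth
    lines = []
    if node.nodeType == Node.ELEMENT_NODE:
        lines.append(f"{pad}<{node.tagName}>")
        attrs = node.attributes
        # Sorting by name makes the output independent of the DOM's own order.
        for name in sorted(attrs.keys()):
            lines.append(f'{pad}  {name}="{attrs[name].value}"')
        for child in node.childNodes:
            lines.extend(dump(child, depth + 1))
    elif node.nodeType == Node.CDATA_SECTION_NODE:
        # Same shape as the CDATA output proposed in the discussion.
        lines.append(f"{pad}<![CDATA[ {node.nodeValue} ]]>")
    elif node.nodeType == Node.TEXT_NODE:
        lines.append(f'{pad}"{node.nodeValue}"')
    elif node.nodeType == Node.COMMENT_NODE:
        lines.append(f"{pad}<!-- {node.nodeValue} -->")
    return lines


doc = parseString('<p z="1" a="2">hi<b/></p>')
print("\n".join(dump(doc.documentElement)))
```

Sorting at serialization time keeps the comparison logic in one place instead of special-casing each browser's attribute order.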
[03:57:00.0000] <zcorpan_> see how often ">" (with quotes) appears in doctypes or bogus comments [03:57:01.0000] <annevk> so you can optimize a particular tokenization state [03:57:02.0000] <Hixie> oh i thought you wanted to optimise the tree constructor states [03:58:00.0000] <Hixie> zcorpan_: hm [03:58:01.0000] <hsivonen> Hixie: hmm. I guess there might be merit in instrumenting how often IN_BODY code runs with the actual insertion mode being one of the table modes other than caption and cell [03:58:02.0000] <Hixie> annevk: surely for the tokeniser it makes no difference since you'll just do table dispatch [03:58:03.0000] <annevk> IE has this nice <!- .... ">" more comment ... > [03:59:00.0000] <zcorpan_> Hixie: http://lists.whatwg.org/pipermail/whatwg-whatwg.org/2007-June/012078.html [03:59:01.0000] <Hixie> hsivonen: you mean an average of times per page that the inbody state is invoked when the state is not inbody, incell, or incaption? [03:59:02.0000] <hsivonen> Hixie: is it even important to clone DOM nodes instead of using the attributes on the original token and creating a new DOM node using those? [03:59:03.0000] <Hixie> zcorpan_: yeah i'm just trying to work out how to do it [03:59:04.0000] <hsivonen> that is, do you really want to close concurrent attribute changes? [04:00:00.0000] <Hixie> i don't think the dom supports having attributes shared between nodes [04:01:00.0000] <hsivonen> Hixie: yes, the average times the table states actually fall though to in body [04:01:01.0000] <hsivonen> through [04:04:00.0000] <Hixie> ok, i'm logging the actual insertion mode when my inhead, inbody, and intable functions are invoked [04:04:01.0000] <hsivonen> Hixie: since that only happens in non-conforming cases and Java doesn't have goto, I let the code hit some useless branches when the fall-through happens [04:04:02.0000] <Hixie> hopefully they map exactly to the spec [04:06:00.0000] <Hixie> zcorpan_: for DOCTYPEs we don't care, right? 
since what the spec does matches IE anyway? [04:06:01.0000] <hsivonen> (A smart compiler could fix this, but I doubt javac or hotspot are that smart) [04:06:02.0000] <annevk> yeah, DOCTYPEs match IE [04:06:03.0000] <annevk> it's just that IE uses the same mode for bogus comments as they use for DOCTYPEs it seems [04:07:00.0000] <Hixie> i'm gonna bail on working out what characters are most common in each tokeniser mode, on the principle that there are so few states it hardly matters anyway [04:07:01.0000] <zcorpan_> Hixie: not quite. the spec doesn't handle <!doctype ">" > [04:07:02.0000] <annevk> oops [04:07:03.0000] <zcorpan_> Hixie: the spec only matches ie if the > is in an actual FPI or SPI [04:08:00.0000] <hsivonen> Hixie: oh yeah, one more thing for optimization: whether an average stack node is tested for being in a group of element names more than once [04:09:00.0000] <Hixie> well i didn't find any DOCTYPEs with > in their name part, at least not enough to appear on my radar in the scan of doctypes i did earlier this week [04:09:01.0000] <hsivonen> Hixie: that is, whether it makes sense to have a boolean on a stack node that says for example whether the node is a table context sentinel [04:09:02.0000] <zcorpan_> Hixie: ok [04:09:03.0000] <zcorpan_> Hixie: isn't that because > in the name part terminates the doctype? 
:) [04:10:00.0000] <hsivonen> Hixie: or whether a stack node should have a flag for phrasing OR formatting OR div OR address [04:10:01.0000] <Hixie> sorry, i meant " [04:10:02.0000] <zcorpan_> ah [04:10:03.0000] <zcorpan_> ok [04:10:04.0000] <Hixie> hsivonen: so what i did with that is that each well-known tag name has an integer associated with it (like an atom) and for each special feature that the parser cares about i used a bit [04:11:00.0000] <Hixie> i used 24 bits for these flags [04:12:00.0000] <Hixie> so for example all the <hx> elements have the number 0x400008400000 [04:12:01.0000] <hsivonen> Hixie: my strategy is to intern well-known names so that testing against one name is a comparison of memory addresses but still testing if a name is in a group means as many comparisons as names names in group [04:12:02.0000] <Hixie> the leading 0x4 is "element" (as opposed to text node), the 8 is "hx node", and the 4 is "closes <p> elements" [04:13:00.0000] <Hixie> yeah so my parser never compares tag names once they're in the stack [04:13:01.0000] <Hixie> doing string compares was prohibitively expensive [04:13:02.0000] <hsivonen> interesting [04:13:03.0000] <Hixie> i just use the integer that says whether a node is a text node, comment node, doctype, etc, to say what special kind of element it is too [04:14:00.0000] <Hixie> and so everything is always exactly one & and exactly one == [04:15:00.0000] <annevk> and you construct those numbers during tokenization? [04:15:01.0000] <hsivonen> I guess I'll complete the tree builder with my current approach and will leave a tokenizer-assigned bitfield as a later interface-breaking optimization [04:16:00.0000] <Hixie> annevk: whenever i create a node, i create it withe the appropriate constant [04:16:01.0000] <Hixie> the tokeniser doesn't know about these [04:16:02.0000] <Hixie> it emits tokens with tag names [04:16:03.0000] <Hixie> it's only when i create nodes that i use these [04:16:04.0000] <hsivonen> Hixie: ooh. 
so "closes p" is not assigned in the tokenizer after all [04:16:05.0000] <annevk> ok, so the tree construction stage does use string comparison? [04:17:00.0000] <Hixie> yeah, tokens are string-compared [04:17:01.0000] <Hixie> but i think my compiler might be atomising them [04:17:02.0000] <Hixie> so it's not such a big deal [04:19:00.0000] <hsivonen> I'm currently using the generic String.intern(), but I figured how to make a fast interning function with knowledge about the possible names (three-level switch: length, last char, second to last char) [04:19:01.0000] <hsivonen> but typing that is too much work [04:19:02.0000] <hsivonen> so I guess I'll write a small Python program that generates Java code for the interning function at some point [04:20:00.0000] <Hixie> zcorpan_: given that only IE does this, I'm going to assume it's not a big deal. I can investigate it in more detail later maybe. Don't want to hack the parser too much tonight. :-) [04:20:01.0000] <Hixie> beware that the names are unbounded [04:20:02.0000] <Hixie> <fiv> is an element name that is seen in the wild, e.g. 
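The flag scheme Hixie describes above (one integer per node, one bit per property the parser cares about, every check being a single `&` and a single `==`) might look roughly like this. All bit positions, constant names and the contents of the table are invented for illustration; they are not the values from his parser.

```python
# Hypothetical bit assignments; the real parser uses its own layout.
ELEMENT = 1 << 0      # node is an element (as opposed to text, comment, ...)
HEADING = 1 << 1      # one of the <hx> elements
CLOSES_P = 1 << 2     # its start tag closes an open <p>
FORMATTING = 1 << 3   # takes part in the formatting-element machinery

# Assumed table of well-known names; only a handful are listed here.
KNOWN = {
    "h1": ELEMENT | HEADING | CLOSES_P,
    "h2": ELEMENT | HEADING | CLOSES_P,
    "h3": ELEMENT | HEADING | CLOSES_P,
    "div": ELEMENT | CLOSES_P,
    "p": ELEMENT | CLOSES_P,
    "b": ELEMENT | FORMATTING,
}


def flags_for(tag_name):
    """Look the name up once, when the node is created.

    Names that are not well known (for example "fiv") get plain ELEMENT
    rather than the flags of some similar-looking name.
    """
    return KNOWN.get(tag_name, ELEMENT)


def has(flags, mask):
    """True if every bit in `mask` is set: one & and one ==."""
    return (flags & mask) == mask


# The stack of open elements then holds integers, so no string compares
# are needed after node creation.
stack = [flags_for(name) for name in ("div", "h3", "b")]
print(has(stack[1], HEADING), has(stack[2], CLOSES_P))
```

The trade-off against interning is that group tests ("is this any heading?") cost the same as single-name tests, at the price of fixing the set of properties up front.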
[04:20:03.0000] <Hixie> you don't want to treat it as <div> [04:21:00.0000] <Hixie> especially in your case :-) [04:22:00.0000] <hsivonen> Hixie: of if the length is > 2, the prefix needs to be compared, too, to make sure [04:22:01.0000] <hsivonen> Hixie: still better than an intermediate copy to java.lang.String [04:23:00.0000] <hsivonen> Hixie: the idea is to weed out all but one prefix candidate [04:23:01.0000] <Hixie> ah cool [04:28:00.0000] <Hixie> right sleep time [04:28:01.0000] <Hixie> nn [04:28:02.0000] <hsivonen> nn [05:18:00.0000] <zcorpan> the parser test format doesn't distinguish between an "" attrubute and a text node "=" (e.g.: <p "">"="</p>) [05:18:01.0000] <zcorpan> | <p> [05:18:02.0000] <zcorpan> | ""="" [05:18:03.0000] <zcorpan> | ""="" [05:19:00.0000] <annevk> that's not too relevant though [05:19:01.0000] <annevk> but an interesting edge case [05:20:00.0000] <zcorpan> perhaps " in text nodes should be escaped with \? [05:20:01.0000] <annevk> why? [05:21:00.0000] <zcorpan> so you can tell the difference between attributes and text nodes. but perhaps it doesn't matter [05:22:00.0000] <annevk> just don't mix them [05:24:00.0000] <annevk> also, if you make mistakes in your parser at that level you've got bigger issues :) [05:25:00.0000] <zcorpan> which parser? [05:25:01.0000] <annevk> HTML parser? [05:25:02.0000] <zcorpan> ah. yeah. [05:34:00.0000] <Philip`_> hsivonen: I think it might be reasonable to keep the spidering and parsing completely separate, so they could be different languages (depending on what useful tools are available for), just communicating asynchronously through some database (which is probably necessary anyway to support parallelism) [05:47:00.0000] <hsivonen> Philip`_: I've never done wide-scale spidering. however, I would think that sticking stuff in a database in between would slow things significantly compared to the parser reading from the real socked when the spidering happens (possible with e.g. 
Commons HttpClient) [05:49:00.0000] <hsivonen> to me, it seems that the obvious way to implement this is to have a number of worker threads that run both the parser and the HTTP client and request URLs and report results to a centralized thread-safe coordination object [05:49:01.0000] <hsivonen> s/socked/socket/ [05:51:00.0000] <hsivonen> as for tools in different languages, if you can't make everything run on a JVM, communicating through a local socket is more efficient that having an persistence layer in between [05:51:01.0000] <hsivonen> I am assuming here that we don't want to keep copies of the spidered bytes [05:52:00.0000] <Philip`_> It would be useful to allow the thing to run on multiple computers to spread the load out, and then it would need some network communication for coordination instead of just threads [05:53:00.0000] <hsivonen> Philip`_: it might be worth investigating if instead of running a spider we should run on EC2 and read the latest Alexa spireding dump from S3 [05:53:01.0000] <Philip`_> (I'm kind of thinking about multiple computers on a LAN with a fast internet connection, so the network wouldn't be a bottleneck when spreading stuff out) [05:54:00.0000] <hsivonen> I poked around the Amazon docs but I didn't find out if the Alexa dump can be easily read by URL instead of by handle obtained from Alexa search results [05:54:01.0000] <Philip`_> That sounds like a useful thing to investigate [05:55:00.0000] <hsivonen> Philip`_: anyway, you definitely want to keep the JVM up and running with multiple threads reading from sockets instead of invoking it again and again [05:55:01.0000] <hsivonen> I don't know where the other end of those sockets should be [06:00:00.0000] <Philip`_> Perhaps the hardest bit is working out which pages to look at so that the sample is biased sensibly - I assume normal spiders just try to grab as much stuff as possible, which is not useful since they'll spend far too long in a few large sites [06:01:00.0000] <hsivonen> 
yeah, I think in principle we want to look at the Web breadth first, but not just front pages [06:01:01.0000] <Philip`_> and I would expect it's not possible to grab a large enough sample to do something like PageRank to find the interesting pages [06:05:00.0000] <Philip`_> (though maybe it wouldn't be too rubbish to just use the process which the original PageRank is modelling, where you follow random links and have a ~15% chance of getting bored and jumping to some other arbitrary page) [06:07:00.0000] <hsivonen> cool. the IA crawler uses Commons HttpClient [06:18:00.0000] <hsivonen> Philip`_: I encourage you to take a look at http://crawler.archive.org/ [11:27:00.0000] <annevk> http://html5.org/parsing-tests/testrunner.htm [11:30:00.0000] <annevk> lots of browser backing for ignoring </head> [11:31:00.0000] <annevk> but I guess that was already known [11:32:00.0000] <annevk> I suppose next would be some prefs so you can ignore IE <title> insertions [12:21:00.0000] <jgraham> annevk: re: running python on my web server; the short answer is that I can't (that was in response to your message a few days ago) [12:43:00.0000] <annevk> jgraham, are you a registered user? [12:43:01.0000] <annevk> Philip`, zcorpan, you can now filter with http://html5.org/parsing-tests/testrunner.htm as well for IE specific quirks [12:46:00.0000] <annevk> /me wonders what tantek will do next [12:54:00.0000] <annevk> Setting the flag makes a lot more pass in IE and Opera. Mostly because IE messes up both DOCTYPE and inserts <title> and because Opera does not include DOCTYPE at all [12:55:00.0000] <annevk> It also helps some for Firefox which always uppercases the tag name in the DOCTYPE [12:56:00.0000] <jgraham> annevk: Of freenode? No [13:11:00.0000] <zcorpan> annevk: nice! 
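The sampling idea Philip` floated earlier in this exchange (follow random links, with roughly a 15% chance of getting bored and jumping to an arbitrary page) can be sketched on an in-memory link graph. The `links` dictionary, the function name and the seeding are assumptions for illustration; a real spider would fetch pages instead of reading a dictionary.

```python
import random
from collections import Counter


def random_surf(links, steps, jump_prob=0.15, seed=0):
    """Count page visits of a random surfer over `links`.

    `links` maps a page to the list of pages it links to. With probability
    `jump_prob`, or when a page has no known outgoing links, the surfer
    jumps to a uniformly chosen known page.
    """
    rng = random.Random(seed)      # seeded so runs are repeatable
    pages = sorted(links)
    current = rng.choice(pages)
    visits = Counter()
    for _ in range(steps):
        visits[current] += 1
        outgoing = links.get(current, [])
        if not outgoing or rng.random() < jump_prob:
            current = rng.choice(pages)      # got bored, or hit a dead end
        else:
            current = rng.choice(outgoing)   # follow a random link
    return visits


toy_web = {"a": ["b"], "b": ["a", "c"], "c": []}
print(random_surf(toy_web, steps=1000).most_common())
```

Pages that collect many visits are the ones such a walk would sample most often, which is the bias being asked for.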
[13:17:00.0000] <annevk> I fixed some further bugs and I'm going home now [13:18:00.0000] <annevk> I'll commit it tomorrow to one of the open source thingies we have [13:18:01.0000] <zcorpan> ok [13:18:02.0000] <annevk> now someone can write python scripts to iterate over those numbers browsers return... [13:28:00.0000] <Hixie> of the 50 or so sites I found with cycles in the headers="", all but three are government sites [13:42:00.0000] <mpt> How does that compare with the proportion of government sites without cycles in the headers? [13:42:01.0000] <mpt> (Not that I'm interested, it's just the basic "compared to what?" question) [13:54:00.0000] <Hixie> mpt: the fact that it's 50 basically means it's an insignificant number that have cycles [13:58:00.0000] <mpt> ok [13:59:00.0000] <Hixie> http://sixstar.cca.gov.tw/community/pages/01_about_people.php?CommID=1231&ID=1 [13:59:01.0000] <Hixie> it's so hard to argue that that is a valid use of headers="" [13:59:02.0000] <Hixie> sigh [14:00:00.0000] <Hixie> with my proposed heuristic for the top left cell, if they changed that into an actual table it would actually work fine with implied scope=s [14:03:00.0000] <hsivonen> Hixie: btw, shouldn't scope be down, up, right, left (not row/column) [14:04:00.0000] <hsivonen> Hixie: if you have two rows of headers where the upper row applies to the lower row but not vice versa, shouldn't scope be down instead of column? [14:06:00.0000] <hsivonen> An end tag whose tag name is one of: "p", "br" is weird to have in "in head noscript" [14:09:00.0000] <zcorpan_> hsivonen: why? [14:10:00.0000] <Hixie> hsivonen: the values come from html4 [14:10:01.0000] <hsivonen> zcorpan_: other stray end tags get ignored [14:10:02.0000] <hsivonen> Hixie: I know that explicit ones come from there but implicit ones don't have to [14:10:03.0000] <zcorpan_> hsivonen: not </p> or </br> [14:11:00.0000] <hsivonen> zcorpan_: yeah. 
like I said, weird [14:11:01.0000] <Hixie> hsivonen: there's only one implicit one, "auto", and it has no keyword [14:11:02.0000] <zcorpan_> hsivonen: not specific to in noscript in head though [14:14:00.0000] <Hixie> wow, some (very few) of the pages caused the AAA algorithm to create over 1000 clones for one stray end tag [14:16:00.0000] <hsivonen> Hixie: I hope that doesn't count as a reason to redesign the algorithm [14:16:01.0000] <Hixie> no, it's expected really [14:16:02.0000] <hsivonen> Hixie: what Safari does on those pages? what about Firefox or Opera? [14:16:03.0000] <Hixie> no idea, dunno which pages it is [14:17:00.0000] <Hixie> 355 billion invokations of the AAA algorithm resulted in zero clones [14:18:00.0000] <Hixie> 715 thousand invokations resulted in one clone [14:18:01.0000] <Hixie> er sorry [14:18:02.0000] <Hixie> 715 million [14:18:03.0000] <Hixie> 55 million resulted in 2 clones [14:18:04.0000] <Hixie> 10 million, 3 clones [14:18:05.0000] <Hixie> 3 million, 4 clones [14:19:00.0000] <Hixie> 800 thousand, 5 clones [14:19:01.0000] <Hixie> 460000 6 clones [14:19:02.0000] <gsnedders> Hixie: 1 billion == 1 million million or 1 thousand million? [14:19:03.0000] <Hixie> 237000 7 clones [14:19:04.0000] <Hixie> US billion, thousand million, 1e9 [14:20:00.0000] <Hixie> less than 100,000 instances of hte AAA algorithm resulted in 11 clones [14:20:01.0000] <Hixie> i guess i should have gotten the total count [14:20:02.0000] <hsivonen> Hixie: cool. are you going to post this to public-html? [14:20:03.0000] <Hixie> to make this a useful number [14:20:04.0000] <Hixie> in due course [14:21:00.0000] <Philip`> /me finds that writing the HTML5 tokeniser as an OCaml data structure and then printing C++ from it is perhaps slightly crazy, but doesn't seem entirely infeasible (though I've only got about a quarter of two states implemented so far...) 
[14:22:00.0000] <Hixie> wait this can't be right, according to separate data, there were only 900,000,000 invokations of the AAA [14:22:01.0000] <Hixie> oh, wrong number [14:22:02.0000] <Hixie> phew [14:35:00.0000] <hsivonen> Hixie: I forgot to ask you this when you asked about instrumentation but did you record data on stack depth? [14:36:00.0000] <Hixie> yeah but it's biased because my parser bails after 64k elements [14:37:00.0000] <hsivonen> Hixie: what did you find? [14:37:01.0000] <Hixie> http://freechal.com/banilaB8 was one of the worst pages [14:37:02.0000] <Hixie> (that my parser didn't bail on) [14:37:03.0000] <hsivonen> Hixie: so you use a hard limit as well ;-) [14:38:00.0000] <Hixie> well i run out of bits to store the pointer in after 64k [14:38:01.0000] <hsivonen> the pointer? [14:38:02.0000] <Hixie> i have 64 bits to store the length of the text node, the offset of the text node, the pointer to the parent element, and some bits for e.g. if it's a comment node or a text node [14:39:00.0000] <Hixie> and the bit that points to the parent element has to also sit alongside the 24 bits i use for the element flags [14:39:01.0000] <Hixie> anyway [14:40:00.0000] <Hixie> the 50th percentile of the pages my parser didn't bail on had 16 or fewer nodes in its stack at the biggest point [14:40:01.0000] <Hixie> 99th percentile had 40 or less [14:40:02.0000] <Hixie> 100th percentil had 64k [14:40:03.0000] <hsivonen> Hixie: thanks [14:40:04.0000] <Hixie> i can get you more later but i really have to go shower [14:41:00.0000] <hsivonen> /me does new StackNode[64] [14:41:01.0000] <Hixie> heh [14:53:00.0000] <Hixie> incidentally, the reason i used 64k as my limit is that i'm having to balance the number of text nodes with the number of elements [14:53:01.0000] <Hixie> right now my text nodes are 32k max each [14:53:02.0000] <Hixie> i could make them 16k each but have 128k elements, but it turns out that, anecdotally, to process any significantly greater number of 
pages, i'd have to add many many bits [14:53:03.0000] <Hixie> like 4, or 5 [14:54:00.0000] <Hixie> whereas there are many pages with more than 32k characters at once [14:54:01.0000] <Hixie> i suspect that the pathological cases with deep stacks are all cases of bad interactions with AAA [14:57:00.0000] <Philip`> /me wonders why Opera says "XML parsing failed" when loading http://html5.org/parsing-tests/data/tests3.dat [14:58:00.0000] <Philip`> Oh, how odd, it works when I reload... [15:01:00.0000] <zcorpan_> Philip`: because it thinks anything loaded through XHR is XML [15:01:01.0000] <zcorpan_> Philip`: and then remembers that [15:01:02.0000] <Hixie> bbl [15:03:00.0000] <Philip`> zcorpan_: Ah, that seems to make as much sense as could be expected [15:08:00.0000] <hsivonen> do these statements have a significant difference "If the stack of open elements has an element in scope with the same tag name as that of the token, then pop elements from this stack until an element with that tag name has been popped from the stack." and "If the stack of open elements has an element in scope with the same tag name as that of the token, then pop elements from this stack until the stack no longer has an element with the same tag nam [15:09:00.0000] <Hixie> yes [15:09:01.0000] <hsivonen> ok [15:09:02.0000] <Hixie> it differs if the stack has two elements of that name in it [15:09:03.0000] <Hixie> e.g. [15:09:04.0000] <Hixie> <div><div> [15:09:05.0000] <Hixie> however typically the second wording is only used for elements that can't be twice on the stack [15:09:06.0000] <Hixie> in which case it doesn't matter [15:10:00.0000] <hsivonen> Hixie: how do you get two nested <p> elements is scope? [15:10:01.0000] <Hixie> i don't think you can [15:11:00.0000] <hsivonen> Hixie: ok. thanks. I'll send email. Every time you use a different wording for no good reason, I have to stop and think. :-) [15:12:00.0000] <Hixie> thinking is good! 
:-) [15:13:00.0000] <Hixie> bbl 2007-07-05 [18:51:00.0000] <Philip`> Does http://canvex.lazyilluminati.com/misc/imagedata.html crash Opera 9.5? (I can only test via Opera Mini, which just says "Internal server error", which sounds potentially worrying but not very informative) [18:54:00.0000] <othermaciej> does Opera Mini handle events? [18:55:00.0000] <othermaciej> and scripting? [18:56:00.0000] <Philip`> It seems to, as long as you don't use setInterval and don't expect it to wait for distant timeouts [18:57:00.0000] <Philip`> (i.e. it can handle scripting and events and stuff while the page is loading, for some definition of 'loading' that I haven't quite worked out, though then it justs sends a static copy to your phone) [18:57:01.0000] <Philip`> *just [18:59:00.0000] <othermaciej> so script runs at load time but not afterwards? [19:00:00.0000] <Philip`> Yes (as far as I can tell) [19:01:00.0000] <othermaciej> (I'm playing with the Opera Mini simulator) [19:01:01.0000] <Philip`> (since it basically opens the page in Opera on their servers, then at some point it decides it's got enough and transmits a non-interactive compressed snapshot, I think) [19:01:02.0000] <Philip`> (Me too, since my real phone is far too rubbish :-) ) [19:03:00.0000] <Philip`> I got it to run ~100 canvas tests in iframes on a single page, and that (eventually) worked correctly with all the scripting and loading and stuff, but it wouldn't let me correctly press the buttons to submit the test results, so I had to do that via a hard-coded timer :-( [22:11:00.0000] <mpt> "For example, don’t put a 100 x 100 image in a 10 x 10 <image> element." -- unintentionally hilarious iPhone developer docs [22:21:00.0000] <mpt> Ah, interesting: "ensure that width * height * 4 < 8 MB" ... so apparently this <image> element is for some new kind of file that has widths and heights measured in MBm⁻². [22:29:00.0000] <mpt> But hooray for this: "Don’t use JavaScript movie controls to play video on iPhone. 
iPhone supplies its own controls." [00:51:00.0000] <om_out> mpt: width * height * 4 bytes [00:59:00.0000] <hsivonen> Hixie: http://www.w3.org/mid/A0F10D3A-A679-4BB1-8844-684FBFDB94F6⊙if is there a way for the stack have td or th in such a position that generating implied end tags could close the scope (except for the EOF case)? [01:16:00.0000] <annevk> hehe, iPhone docs promote <image> :) [01:16:01.0000] <hsivonen> annevk: URL? [01:16:02.0000] <annevk> http://developer.apple.com/iphone/designingcontent.html [01:17:00.0000] <annevk> click on "Use Standards and Tried-and-True Design Practices" and then search [01:19:00.0000] <othermaciej> I'll report a bug [01:23:00.0000] <hsivonen> annevk: did you try to optimize redundant steps in tree building at all or did you just follow the spec to letter even if it asked you to traverse the stack more than absolutely necessary? [01:24:00.0000] <annevk> there are some small optimizations [01:24:01.0000] <annevk> but not much [01:24:02.0000] <annevk> doesn't really matter a lot in Python I've the feeling [01:25:00.0000] <annevk> well, in the beginning we tried to reduce function calls by using dictionaries instead of token objects and such and that worked pretty well [01:25:01.0000] <hsivonen> annevk: what's your take on the the ability of "generate end tags" to close the scope? [01:25:02.0000] <annevk> but now with the treebuilder abstraction we gained a lot of function calls again :( [01:26:00.0000] <annevk> http://html5lib.googlecode.com/svn/trunk/python/src/html5lib/treebuilders/_base.py search for "generateImpliedEndTags" [01:27:00.0000] <annevk> although I now see it has some XXX comment that we never hit apparently... 
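The difference between the two spec wordings hsivonen asked about earlier ("pop until an element with that tag name has been popped" versus "pop until the stack no longer has an element with that tag name") shows up only when the name occurs twice on the stack, as in Hixie's `<div><div>` example. A minimal sketch, with the stack modelled as a plain list of tag names and the "in scope" check left out as a simplifying assumption:

```python
def pop_until_popped(stack, name):
    """Pop elements until one element named `name` has been popped."""
    while stack:
        if stack.pop() == name:
            break


def pop_until_absent(stack, name):
    """Pop elements until no element named `name` remains on the stack."""
    while name in stack:
        stack.pop()


first = ["html", "body", "div", "div"]
second = list(first)
pop_until_popped(first, "div")
pop_until_absent(second, "div")
print(first)   # the outer div is still open
print(second)  # both divs are gone
```

For a name that can occur only once on the stack, the two functions leave the stack in the same state, which matches the remark that the second wording is typically used only for such elements.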
[01:27:01.0000] <hsivonen> annevk: I was thinking of doing the exact same thing: just popping [01:27:02.0000] <hsivonen> I guess I have to send another email [01:28:00.0000] <annevk> Hixie recently added a bunch of table elements there [01:28:01.0000] <annevk> I'm not sure what that was about [01:29:00.0000] <hsivonen> annevk: I think that was about EOF [01:29:01.0000] <hsivonen> I am not sure that it is a good idea to put them in that part of the spec [01:29:02.0000] <hsivonen> annevk: does Python turn tail recursion into looping? [01:30:00.0000] <annevk> dunno [01:30:01.0000] <annevk> http://html5.org/tools/web-apps-tracker?from=964&to=965 [01:31:00.0000] <annevk> is that for <table><tbody><tr><td><p><tbody> or something? [01:32:00.0000] <annevk> doesn't seem like it, that already works [01:33:00.0000] <hsivonen> the only case where I see those mattering is the EOF case [01:33:01.0000] <annevk> example markup? [01:34:00.0000] <annevk> /me reads http://en.wikipedia.org/wiki/Tail_recursion and understands we might be able to optimize stuff a bit [01:36:00.0000] <annevk> hmm, seems only to matter if it calls itself a lot [01:38:00.0000] <annevk> hsivonen, I don't see how it matters for EOF either [01:38:01.0000] <annevk> hsivonen, you always get a single error and that can't be avoided, because </table> is never implied [01:39:00.0000] <hsivonen> annevk: good point. will you send email or shall I? 
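The "just popping" approach to generating implied end tags mentioned above could be sketched as follows. This is not the html5lib code; the function name, the list-of-names stack and the `exclude` parameter are assumptions, and the set of names is limited to dd, dt, li and p (the draft under discussion also listed some table-related elements, which are left out here).

```python
# Assumed set of names; the draft's own list was longer.
IMPLIED_END_TAG_NAMES = frozenset(["dd", "dt", "li", "p"])


def generate_implied_end_tags(stack, exclude=None):
    """Pop the current node while it is one of the implied-end-tag elements.

    `stack` is the stack of open elements as a list of tag names, with the
    current node last. `exclude` names one element that must not be popped.
    """
    while stack and stack[-1] in IMPLIED_END_TAG_NAMES and stack[-1] != exclude:
        stack.pop()


open_elements = ["html", "body", "ul", "li", "p"]
generate_implied_end_tags(open_elements)
print(open_elements)
```

Because the loop only ever looks at the current node, it stops at the first element outside the set, so it never needs to traverse the rest of the stack.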
[01:40:00.0000] <annevk> you're already going pretty good with your review, you do it ;) [01:41:00.0000] <hsivonen> annevk: ok [01:45:00.0000] <met_> http://www.bluishcoder.co.nz/2007/07/patch-for-video-element-support-in.html [01:47:00.0000] <Hixie> hsivonen: i don't know (re <td>s) [01:48:00.0000] <hsivonen> Hixie: that doesn't sound good ;-) [01:49:00.0000] <Hixie> the table elements were added because it seemed wrong that they not be on the list [01:49:01.0000] <Hixie> i honestly don't know if they'll ever get hit [01:49:02.0000] <Hixie> i want to say no [01:49:03.0000] <Hixie> but i'm not sure how to prove it [01:50:00.0000] <Hixie> i'll be back in about 12 hours [01:50:01.0000] <Hixie> (and possibly briefly in a few minutes) [01:50:02.0000] <hsivonen> Hixie: I'd prefer to pretend that we proved that they never get hit [01:52:00.0000] <annevk> <tbody> gets ignored outside <table>, inside <table> it is handled explicitly in each table phase [01:52:01.0000] <annevk> I wonder if the same goes for <td> and <tr> [01:53:00.0000] <annevk> I'm pretty sure they never get hit either [01:53:01.0000] <annevk> lets test that with the tests we got... [01:54:00.0000] <hsivonen> annevk: tr, td and th start tags are ignored "in body" [01:54:01.0000] <annevk> indeed [01:54:02.0000] <annevk> if I remove "td", "th", "tr" from our generate implied end tags algorithm nothing goes wrong [01:55:00.0000] <annevk> because the table phases already deal with them [01:55:01.0000] <hsivonen> annevk: the end tags seem to fall under "An end tag token not covered by the previous entries", but that seems wrong [01:55:02.0000] <annevk> only "dd", "dt", "li", "p" are important [01:55:03.0000] <annevk> actually, if I remove "p" nothing fails either... [01:55:04.0000] <annevk> /me ponders [01:56:00.0000] <hsivonen> annevk: removing p seem wrong [01:56:01.0000] <hsivonen> hmm. 
perhaps the "An end tag token not covered by the previous entries" [01:56:02.0000] <hsivonen> still does the right thing "in body" for cell ends [01:56:03.0000] <annevk> ah, the problem is that we don't count errors I suppose [01:57:00.0000] <annevk> as removing <li> also "works" [01:57:01.0000] <annevk> they are caught by the alternative algorithm that generates parse errors and therefore still generate the same tree... [01:58:00.0000] <hsivonen> IIRC, in fragment cases some "act as if" consistently produce 0 or 2 errors. I think I may have changed some of those to emit 0 or 1 errors [02:20:00.0000] <annevk> how does "If the stack of open elements has a p element in scope, then generate implied end tags, except for p elements." even make sense? [02:20:01.0000] <annevk> it says that when you encounter </p> [02:21:00.0000] <annevk> however, you will never generate an implied end tag for <dd>, <dt> or <li> or any of the table cells as they can never be between the <p> that is in scope and the current node [02:29:00.0000] <annevk> innerHTML wouldn't change anything for that either [02:40:00.0000] <hsivonen> annevk: excellent point [02:41:00.0000] <hsivonen> annevk: I'll email again. [03:00:00.0000] <hsivonen> should the list of active formatting elements be implemented as an array or as a linked list? [03:01:00.0000] <hsivonen> is it searched much more often than a node is removed from the middle? [03:05:00.0000] <hsivonen> Hixie: was your stat for "invocations of the AAA" exactly this? (that is, is the answer array?) [03:06:00.0000] <hsivonen> oh that counted cloning nodes [03:06:01.0000] <hsivonen> Hixie: did you count changing the size of the list by deleting stuff in the middle? [03:12:00.0000] <hsivonen> annevk: does the algorithm for "in body" "An end tag token not covered by the previous entries" make sense to you? [03:12:01.0000] <hsivonen> step 2.3. makes no sense to me [03:14:00.0000] <annevk> what's 2.3? 
[03:14:01.0000] <hsivonen> Pop all the nodes from the current node up to node, including node, then stop this algorithm. [03:15:00.0000] <hsivonen> First: Initialise node to be the current node (the bottommost node of the stack). [03:15:01.0000] <hsivonen> ok makes sense [03:15:02.0000] <hsivonen> # [03:15:03.0000] <hsivonen> If node has the same tag name as the end tag token, then: [03:15:04.0000] <hsivonen> # [03:15:05.0000] <hsivonen> Generate implied end tags. [03:15:06.0000] <hsivonen> ok, makes sense [03:15:07.0000] <hsivonen> now Pop all the nodes from the current node up to node, including node, then stop this algorithm. [03:15:08.0000] <annevk> oh, I was looking at the wrong algorithm duh [03:16:00.0000] <hsivonen> how could /node/ not already be popped or be the current node? [03:16:01.0000] <hsivonen> shouldn't that be a simple unconditional pop [03:17:00.0000] <hsivonen> umm. not unconditional but pop if the current node is /node/ [03:17:01.0000] <annevk> <foo><bar><baz></foo> [03:18:00.0000] <annevk> would pop <baz> and <bar> and <foo> [03:18:01.0000] <hsivonen> annevk: sorry for being dense, but I don't understand what step 2.3. has to do with it [03:19:00.0000] <hsivonen> annevk: isn't step 4. what causes that? [03:19:01.0000] <hsivonen> actually, step 2.1. makes no sense to me, either [03:19:02.0000] <annevk> indeed [03:20:00.0000] <annevk> I wonder how we managed to implement it :) [03:21:00.0000] <hsivonen> time to send mail again [03:21:01.0000] <annevk> we implemented what was mentioned [03:22:00.0000] <annevk> which doesn't make much sense :( [03:22:01.0000] <zcorpan_> can you provide a markup snippet that highlights the difference? [03:23:00.0000] <hsivonen> zcorpan_: the difference? [03:23:01.0000] <annevk> <foo>...</foo> is the only case that 2.1 covers [03:23:02.0000] <annevk> in which case you don't need to generate implied end tags etc. 
[03:23:03.0000] <annevk> you just need to pop [03:23:04.0000] <zcorpan_> ah [03:23:05.0000] <zcorpan_> indeed [03:23:06.0000] <hsivonen> lunch [03:23:07.0000] <hsivonen> then email [03:53:00.0000] <annevk> I think I'm done with public-html for the day [04:03:00.0000] <hsivonen> annevk: did you my email about the catch-all end tag case, though? did it make sense? [04:12:00.0000] <annevk> yes [04:15:00.0000] <hsivonen> ok. thanks. [04:35:00.0000] <annevk> having said that, I'm not sure the algorithm is correct [04:35:01.0000] <annevk> oh wait [04:36:00.0000] <annevk> hsivonen, it does make sense [04:36:01.0000] <annevk> /me just realized [04:36:02.0000] <annevk> hsivonen, because of step 5 [04:36:03.0000] <annevk> hsivonen, and step 4 [04:36:04.0000] <annevk> hsivonen, they change "node" [04:37:00.0000] <annevk> so say you have <dialog><dd></dialog> [04:37:01.0000] <annevk> you get to 4 [04:38:00.0000] <annevk> node becomes <dialog> [04:38:01.0000] <annevk> </dd> is implied [04:38:02.0000] <annevk> done [04:38:03.0000] <annevk> however, it's questionable whether this is correct given that current UAs don't generate implied end tags in those cases... [04:42:00.0000] <hsivonen> annevk: well, this certainly looks like something that needs another look by Hixie [04:45:00.0000] <annevk> it seems that for <foo> </foo> it doesn't make much sense [04:46:00.0000] <annevk> well, it seems that you can optimize for <foo> </foo> [04:46:01.0000] <annevk> it does make sense in a twisted way [04:48:00.0000] <hsivonen> annevk: looks like you aren't done for the day after all :-/ [05:04:00.0000] <hsivonen> I'd like to try to avoid ad hominems, but I'm intrigued that the insistence on a small improvement with great cost comes from an economist [05:17:00.0000] <annevk> that discussion is just painful [05:27:00.0000] <zcorpan_> authors provide fallback to <object>? 
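The catch-all end tag walk that hsivonen and annevk go back and forth on above can be sketched roughly as follows. This is a minimal illustration, not html5lib's code and not the spec text: the stack holds bare tag names, the `special` set is supplied by the caller (its real contents are not settled here), and the implied-end-tag step is assumed to skip elements with the same name as the end tag so that the matched node is never popped early.

```python
# Elements for which an end tag may be implied (the four annevk calls "important").
IMPLIED = {"dd", "dt", "li", "p"}


def generate_implied_end_tags(stack, exclude=None):
    """Pop the current node while its end tag can be implied."""
    while stack and stack[-1] in IMPLIED and stack[-1] != exclude:
        stack.pop()


def any_other_end_tag(stack, name, special):
    """Handle an end tag `name` against `stack` (a list of tag names, root first).

    Returns the list of parse errors raised; `stack` is modified in place.
    `special` is a hypothetical, caller-supplied set of names that stop the walk.
    """
    errors = []
    # Step 1: start at the current node (bottommost entry) and walk upwards.
    for i in range(len(stack) - 1, -1, -1):
        node = stack[i]
        if node == name:
            # Step 2: close anything whose end tag can be implied ...
            generate_implied_end_tags(stack, exclude=name)
            # ... report an error if other elements are still open above node ...
            if stack[-1] != name:
                errors.append("unexpected-end-tag")
            # ... and pop everything from the current node up to and including node.
            del stack[i:]
            return errors
        if node in special:
            # Step 3: a special element blocks the walk; the token is ignored.
            errors.append("unexpected-end-tag")
            return errors
        # Steps 4 and 5: move to the previous entry and try again (next iteration).
    return errors
```

With a stack of html, body, foo, bar, baz and an end tag for foo, the loop reaches foo on its third iteration and pops baz, bar and foo together, which is the `<foo><bar><baz></foo>` case annevk gives. For a plain `<foo>...</foo>` the match is found on the first iteration and the whole thing reduces to a single pop.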
[05:28:00.0000] <zcorpan_> /me won't join that discussion [05:33:00.0000] <annevk> hsivonen, yeah :-/ [05:33:01.0000] <annevk> these people should join some browser development project and learn about the web a little bit [05:46:00.0000] <zcorpan_> annevk: did you check in the parser-tests thing somewhere? [05:48:00.0000] <annevk> not yet [05:48:01.0000] <annevk> /me was fixing html5lib [05:48:02.0000] <zcorpan_> ok [05:48:03.0000] <annevk> you want it checked in somewhere? [05:49:00.0000] <zcorpan_> would be nice, in case i feel like improving it [05:50:00.0000] <zcorpan_> no rush though [05:54:00.0000] <annevk> it's in the html5 project now [05:54:01.0000] <annevk> including a README that says to modify the tests from html5lib, not the ones included [05:54:02.0000] <annevk> karlUshi, seen http://html5.org/parsing-tests/testrunner.htm already? [05:54:03.0000] <annevk> karlUshi, you might like it [05:55:00.0000] <Philip`> /me wonders if anyone really cares what input like &#4294967366; gets parsed into [05:56:00.0000] <annevk> FFFD [05:56:01.0000] <annevk> U+FFFD [05:56:02.0000] <Philip`> Is it worth having tests for that kind of thing? (Or are there ones already?) [05:56:03.0000] <Philip`> (Firefox gets it wrong and says "F") [05:57:00.0000] <Lachy> I wonder why it does that [05:57:01.0000] <annevk> maybe a limit [05:57:02.0000] <Philip`> (and so does my non-serious not-really-implemented tokeniser) [05:57:03.0000] <annevk> we have tokenizer tests [05:58:00.0000] <Philip`> Probably by doing "int n; ... 
n = n*10 + (next_char - '0')" or something and not caring about overflow [05:58:01.0000] <Lachy> looks like it's a limit of 1 0000 0000 base 16 [05:58:02.0000] <annevk> Opera and IE get it right [05:59:00.0000] <Philip`> FF also parses &#4294967295; into #4294967295; [06:00:00.0000] <annevk> oops [06:00:01.0000] <Philip`> /me doesn't expect this is a likely place for real-world interoperability concerns [06:00:02.0000] <annevk> I suppose that explains how much time reverse engineering costs and that it isn't really worth checking what other browsers do all the time [06:01:00.0000] <hsivonen> if there's anything long about longdesc, it is the email threads [06:01:01.0000] <annevk> :p [06:02:00.0000] <hsivonen> Philip`: that's why you should have an integer overflow guard in your loop that consumes NCRs [06:02:01.0000] <hsivonen> /me has one [06:02:02.0000] <Philip`> I just have a TODO comment stuck in there :-) [06:02:03.0000] <Philip`> and I have another similar comment telling me to implement the non-numeric entity things too [06:02:04.0000] <hsivonen> Philip`: which programming language? [06:03:00.0000] <hsivonen> Philip`: Ocaml? 
[06:03:01.0000] <Philip`> but I'm not particularly interested in making things actually work at the moment [06:03:02.0000] <Philip`> OCaml generating C++ [06:03:03.0000] <hsivonen> cool [06:03:04.0000] <Philip`> (Also OCaml generating .dot files so I can make nice graphs of the tokeniser state transitions) [06:03:05.0000] <annevk> we solved it by having a try statement around the string to int conversion [06:04:00.0000] <hsivonen> if (value < 0) { [06:04:01.0000] <hsivonen> value = 0x110000; // Value above Unicode range but within int [06:04:02.0000] <hsivonen> // range [06:04:03.0000] <hsivonen> } [06:05:00.0000] <Philip`> /me just wants to see what's possible when you have the tokeniser algorithm as a data structure that you can process, instead of being English text or unprocessable program code [06:05:01.0000] <hsivonen> (value is signed) [06:10:00.0000] <annevk> Philip`, will you consider implementing all the other fancy stuff as well? [06:10:01.0000] <annevk> or just tokenizing? [06:14:00.0000] <Philip`> That depends on how impossible the rest of it looks :-) [06:15:00.0000] <annevk> by the time Hixie addresses hsivonen's comments nobody will have to think about it anymore :p [06:15:01.0000] <Philip`> The tokeniser is fairly straightforward, since you can just represent the whole thing as a dozen state variables and some functions that match certain states and have transitions into new states [06:15:02.0000] <annevk> now I think of it, that might make it too boring for some! 
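The overflow guard discussed above can be sketched in Python, where integers never overflow, so the 32-bit wraparound has to be emulated explicitly. The function names are made up for this illustration, and mapping zero and surrogates to U+FFFD is an assumption rather than something stated in the discussion; the only behaviour taken from the log is that an out-of-range reference such as &#4294967366; should come out as U+FFFD.

```python
MAX_CODE_POINT = 0x10FFFF
REPLACEMENT = "\uFFFD"


def decode_decimal_ncr(digits):
    """Decode the digits of a decimal character reference with an overflow guard."""
    value = 0
    for ch in digits:
        if ch not in "0123456789":
            raise ValueError("not a decimal digit: %r" % ch)
        value = value * 10 + (ord(ch) - ord("0"))
        # Guard: once past the Unicode range, saturate instead of growing further.
        if value > MAX_CODE_POINT:
            value = MAX_CODE_POINT + 1
    # Assumption: zero and surrogates are treated as invalid as well.
    if value == 0 or value > MAX_CODE_POINT or 0xD800 <= value <= 0xDFFF:
        return REPLACEMENT
    return chr(value)


def wrapped_32bit(digits):
    """Emulate unguarded accumulation in a 32-bit integer (reduced modulo 2**32)."""
    value = 0
    for ch in digits:
        value = (value * 10 + (ord(ch) - ord("0"))) & 0xFFFFFFFF
    return value
```

Here `wrapped_32bit("4294967366")` comes out as 70, the code point of "F", which is consistent with Philip`'s guess about where the stray "F" comes from, while `decode_decimal_ncr` returns U+FFFD for the same input.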
[06:16:00.0000] <Philip`> (The tree construction looks more complex than that, though I haven't looked at it in any detail) [06:16:01.0000] <annevk> tree construction is actually similar [06:16:02.0000] <annevk> although currently it has this concept called insertion mode which makes it look more complicated [06:16:03.0000] <annevk> you can actually implement it as a bunch of states as well [06:17:00.0000] <annevk> the difference being that you have some other set of variables and pass tokens around instead of characters [06:18:00.0000] <Philip`> Would I be right in thinking the only way the content model flag can change outside the tokeniser is when explicitly emitting a start tag? [06:19:00.0000] <annevk> yeah [06:19:01.0000] <annevk> hsivonen, removing "td", "th" and "tr" from generate implied end tags does indeed not give any parse error differences [06:20:00.0000] <annevk> hsivonen, removing "p", however, gives 45 [06:21:00.0000] <hsivonen> Philip`: it's just that start tags "in body" have a lot of stuff to type [06:22:00.0000] <annevk> /me is amazed at Robert's ability to not understand [06:28:00.0000] <Philip`> /me reaches the bogus comment state, and finds that it totally doesn't match his way of writing the algorithm [06:29:00.0000] <annevk> markup open declaration did? [06:30:00.0000] <annevk> you should be able to implement those as functions I guess; separate from the states [06:30:01.0000] <Philip`> The problem is that it sounds like it needs to look backwards and know what happened before that state was reached [06:31:00.0000] <Philip`> The markup declaration open state is just after the bogus comment state, so I haven't got that far yet :-) [06:33:00.0000] <annevk> don't you have a character queue or something? 
[06:34:00.0000] <annevk> then you just make sure the right chars are on the stack before switching to the state [06:36:00.0000] <hsivonen> Philip`: you may find my impl useful to look at [06:41:00.0000] <annevk> zcorpan_, in case you missed it: http://html5.googlecode.com/svn/trunk/parser-tests/ [06:44:00.0000] <zcorpan_> annevk: saw it, cheers [06:45:00.0000] <Philip`> Oh, I think my confusion comes from e.g. "<?" transitioning to the bogus comment state after consuming the '?', whereas "<!x" transitions before consuming the 'x', and the BCS can't tell the difference [06:46:00.0000] <annevk> doesn't it say "unconsume" somewhere? [06:48:00.0000] <Philip`> Not that I can see [06:48:01.0000] <Philip`> but I can work around it by just moving the consumption around to the right places [06:50:00.0000] <hsivonen> Philip`: I think Hixie cut corners when writing the spec. I had a bug there that the unit tests revealed [06:50:01.0000] <hsivonen> Philip`: basically, you need to start filling the bogus comment buffer before you make the actual state transition [06:53:00.0000] <Philip`> "(If the comment was started by the end of the file (EOF), the token is empty.)" - isn't it also empty if the comment was started by a > character? [06:54:00.0000] <Philip`> Hmm, I'll wait until later to sort out the details and make it actually work properly and pass the tests :-) [06:54:01.0000] <Philip`> (since the current implementation is totally not executable, which makes it hard to test) [07:04:00.0000] <annevk> Philip`, yeah, then it's also empty [07:21:00.0000] <Philip`> http://canvex.lazyilluminati.com/misc/states.png - incomplete and quite possibly with bugs, but it looks kind of interesting [07:26:00.0000] <Philip`> /me should probably skip all the EOF bits since they're not very interesting and they make the diagram too complex [07:28:00.0000] <Lachy> in the whole fallback content thread, has anyone actually given a use case for needing fallback beyond plain text? 
All I've seen are unsupported claims that it's needed. [07:29:00.0000] <hsivonen> Philip`: cool. the diagram makes the transitions look more complex than they actually are [07:30:00.0000] <hsivonen> Philip`: in fact there are only two transitions that break a stack assumption [07:31:00.0000] <Philip`> hsivonen: Is that two when not counting all the reconsume-EOF-in-the-data-state ones? [07:31:01.0000] <hsivonen> Lachy: if you want to get rid of longdesc and move the essay about the Union Jack or the dress of Lord Cornwallis inline [07:32:00.0000] <hsivonen> Philip`: reconsume whatever in data state works as a stack transition [07:32:01.0000] <hsivonen> (see my code :-) [07:32:02.0000] <hsivonen> Philip`: just rewind the stack to the data state [07:32:03.0000] <Philip`> /me will try to finish these bits while still untainted, and then look at the code ;-) [07:33:00.0000] <Lachy> hsivonen: that union jack example isn't particularly significant, since that description is completely inappropriate for how the flag was used. [07:33:01.0000] <Philip`> (I'm not trying to do a practical implementation - mostly I just want pretty pictures and things) [07:33:02.0000] <hsivonen> html5lib and my code are under the MIT license, it's not like looking at AT&T code :-) [07:34:00.0000] <Philip`> I currently just want to represent the algorithm as described in the spec, disregarding the implementation details that everyone else worries about :-) [07:39:00.0000] <MikeSmith> No commit-watchers mail since 28 June ... have there really been no changes, or is the list broken? [07:39:01.0000] <hsivonen> MikeSmith: Hixie is doing research. no changes [07:39:02.0000] <MikeSmith> OK [07:39:03.0000] <MikeSmith> thanks [07:42:00.0000] <rubys> annevk: you there? [07:44:00.0000] <rubys> if you get a chance, can you look into removing from tests/test_parser.py the following line "if testName == "tests5": continue # TODO"? [07:45:00.0000] <hsivonen> ouch. 
the catch all end tag case "in body" has a set of 69 strings to test against... [07:47:00.0000] <hsivonen> perhaps the tokens should come with a clever bitfield after all... instead of just interning [07:49:00.0000] <hsivonen> or a lex sorted array with binary search. or something... [07:54:00.0000] <Philip`> Does Java let you do binary searches for (interned) strings based on something like a pointer, rather than slowly comparing characters? [07:55:00.0000] <Philip`> (I guess that might not be possible since the GC can move things around arbitrarily and won't maintain a consistent ordering, perhaps) [07:57:00.0000] <hsivonen> Philip`: no, you only get to compare memory addresses for equality [07:58:00.0000] <hsivonen> Philip`: however, I could have a hashtable that knew that all values are interned [07:59:00.0000] <hsivonen> for the time being, I'm treating anything that goes beyond interning name and doing "foo" == name || "bar" == name || ... as a premature optimization [08:02:00.0000] <Philip`> /me wishes OCaml had better error reports than simply "Syntax error" [08:12:00.0000] <Philip`> Oh, assuming there's never an EOF doesn't make the state transitions much simpler - there's only about three cases I can see where it makes a difference [08:23:00.0000] <MikeSmith> hsivonen - is it true that currently with html5lib, given an arbitrary HTML document as source that it can construct a DOM from successfully, that DOM can't necessarily be re-serialized as well-formed XML? [08:23:01.0000] <MikeSmith> Or anybody? [08:24:00.0000] <rubys> it is rare, but true [08:24:01.0000] <MikeSmith> (I realize html5lib is not hsivonen's implementation...) [08:24:02.0000] <MikeSmith> rubys - OK [08:25:00.0000] <rubys> it is possible to have entity or attribute names that aren't simple names, it is possible for comments to have two consecutive dashes in them, it is possible for strings to contain form feeds or other values that are illegal in XML. 
[08:25:01.0000] <MikeSmith> ah [08:26:00.0000] <Philip`> When I tried serialising a random collection of web pages as XML, a significant number (uh, I can't remember how much, but maybe 20% or so) became ill-formed XML [08:26:01.0000] <rubys> other things (like matching up open and close tags) are taken care of by html5lib, and so are the overwhelming majority of common errors. [08:26:02.0000] <rubys> 20% surprises me. [08:26:03.0000] <rubys> are these public pages? Can you share an example? [08:28:00.0000] <MikeSmith> but hsivonen's implementation (backend of his conformance checker), by its nature, is inherently capable of producing well-formed XML? [08:28:01.0000] <MikeSmith> is that true? [08:28:02.0000] <MikeSmith> I would think it'd need to be since he has XML tools in the toolchain for it [08:29:00.0000] <MikeSmith> or maybe not [08:29:01.0000] <Philip`> I never looked at the examples in any detail, so I'm not sure what the issues were, though I remember a few were just because of <!----------> [08:29:02.0000] <Philip`> http://www.toyota.com/ is an interesting one [08:29:03.0000] <Philip`> since it has <spacer type"block" width="1" height="1"></spacer> which gets parsed as an attribute with a " in its name [08:30:00.0000] <Philip`> http://krijnhoetmer.nl/irc-logs/whatwg/20070507#l-581 - hmm, apparently it was 25% [08:30:01.0000] <Philip`> (just using the top thousand Yahoo search results for some boring word, if I remember correctly) [08:31:00.0000] <rubys> html5lib has a sanitizer that removes unsafe or unknown markup. Our goal is to make that bullet proof. 
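The failure modes rubys lists (names that are not XML names, comments containing two consecutive dashes, characters that XML does not allow) can each be checked before serialising. The helpers below are a rough sketch with made-up names, not part of html5lib, and the name pattern is a deliberately simplified ASCII-only approximation of the XML 1.0 Name production.

```python
import re

# Simplified, ASCII-only approximation of an XML 1.0 Name (an assumption made
# for brevity; the real production also admits many non-ASCII characters).
XML_NAME = re.compile(r"[A-Za-z_:][A-Za-z0-9_:.\-]*\Z")

# Anything outside the XML 1.0 Char production:
# #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
ILLEGAL_XML_CHAR = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def is_xml_name(name):
    """True if `name` could be written out as an element or attribute name."""
    return XML_NAME.match(name) is not None


def comment_is_serializable(data):
    """XML comments may not contain '--' and may not end with '-'."""
    return "--" not in data and not data.endswith("-")


def has_illegal_xml_chars(text):
    """True if `text` contains a character XML 1.0 forbids, e.g. a form feed."""
    return ILLEGAL_XML_CHAR.search(text) is not None
```

An attribute name with a quote in it fails `is_xml_name`, a run of dashes fails `comment_is_serializable`, and a form feed trips `has_illegal_xml_chars`. What to do on failure (drop, rename, escape) is the lossy part and is left open here.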
[08:31:01.0000] <Philip`> I don't know how many of those issues were just caused by the html5lib toxml() being not very good [08:32:00.0000] <Philip`> (Also I think some of the issues might have been that I didn't handle character encoding properly) [08:34:00.0000] <rubys> If you are interested in producing XML, I would recommend the dom treebuilder [08:38:00.0000] <Philip`> When I was looking at those things before, I was mostly interested in analysing real HTML documents and just avoiding the slowness of repeatedly parsing with html5lib by caching them in a nicer serialised format, but it seems XML isn't very suitable for that :-( [08:38:01.0000] <rubys> what type of analysis? [08:40:00.0000] <Philip`> Mainly looking for common usage of certain elements/attributes, like in http://canvex.lazyilluminati.com/misc/copyright.html and http://canvex.lazyilluminati.com/misc/summary.html [08:40:01.0000] <rubys> your requirements are terribly unique, and I would like to work towards making a bullet proof conversion (possibly lossy in cases like spaces in attribute names) possible, and would appreciate test cases towards that end. [08:40:02.0000] <Philip`> (and theoretically any other statistics on HTML documents, except I got distracted before getting around to scaling the system up to work on a reasonable sample) [08:41:00.0000] <Philip`> ((for quite small values of 'reasonable')) [08:45:00.0000] <annevk> his requirements are very relevant for the work the HTML WG and WHATWG are doing (fwiw) [08:45:01.0000] <annevk> although they should be met by having a fast html5lib [08:45:02.0000] <Philip`> I expect I'll get back to this analysis thing at some point, and I'll see if I can extract the cases that cause problems (since I expect it would be nice to be able to use standard XML tools on random documents safely, without having to stick an HTML frontend onto them) [08:46:00.0000] <rubys> a fast html5ib ... 
which ultimately means a port to C [08:46:01.0000] <rubys> annevk: can you scroll back and see my question about tests5? [08:46:02.0000] <annevk> yeah, saw that [08:47:00.0000] <annevk> thought they already worked [08:47:01.0000] <annevk> /me poners [08:47:02.0000] <annevk> /me ponders* [08:47:03.0000] <rubys> that test passes, except for error checks, which you just enabled. [08:47:04.0000] <rubys> no error is produced on EOF [08:47:05.0000] <Philip`> I'm trying to write the easy part of the parsing algorithm in a language-agnostic manner, so it'll be nice if that works out :-) [08:49:00.0000] <annevk> there should be no error either [08:49:01.0000] <annevk> seems like a simple mistake in the test [08:51:00.0000] <rubys> if the tests were changed, then 'next if test_name == "tests5" # TODO' can be removed from ruby/tests/test_parser.rb too [08:52:00.0000] <annevk> yeah, did all that a few minutes ago [08:53:00.0000] <rubys> 'all that'? You changed the ruby test? [08:53:01.0000] <annevk> oh, ruby [08:53:02.0000] <annevk> sorry [08:53:03.0000] <annevk> I haven't played with ruby at all [08:55:00.0000] <rubys> I'd work on a C port, but only if we had more people who were interested in maintaining the code. This business of multiple people making changes to the Python code and Sam ports the changes won't scale much further. [08:57:00.0000] <annevk> if we have a C version we can just make Python and Ruby bindings, no? [08:58:00.0000] <rubys> that could certainly be done [08:58:01.0000] <Philip`> It's nice to have pure Python/Ruby/etc versions when people are unable/unwilling to compile and install C modules [08:59:00.0000] <annevk> can't you make some .pyc version people can just use? 
[08:59:01.0000] <annevk> /me isn't really up to speed with C > Python mappings and how to work with them [09:00:00.0000] <Philip`> (hence things like XML::Sax::PurePerl) [09:02:00.0000] <Philip`> I think you probably need a .dll (or .so or whatever) if you want to use a C library in Python, and that will be specific to a certain processor architecture and OS and maybe other system libraries, which is a pain when people can't compile easily [09:02:01.0000] <annevk> hmm, fair enough [09:03:00.0000] <rubys> on the other hand, 99.99% of the people would choose to use a C binding to their favorite language over a native binding. [09:04:00.0000] <annevk> http://lists.w3.org/Archives/Public/www-archive/2007Jul/0010.html ... [09:05:00.0000] <annevk> rubys, people who care one bit about performance, indeed [09:06:00.0000] <annevk> also, C bindings to an HTML5 parser should just be included by default in Python, Ruby, Java, etc. [09:06:01.0000] <annevk> well, maybe not Java [09:07:00.0000] <Philip`> Perl too :-) [09:07:01.0000] <rubys> I'd also love to see the C parser actually used by products like Opera and/or Firefox. [09:07:02.0000] <rubys> they could have their own treebuilders, of course; but the parser could be the same. [09:09:00.0000] <Philip`> /me wishes he could remember how to compute transitive closures (in a functional language) [09:10:00.0000] <annevk> from what I heard from WebKit and Firefox architecture that might be quite tricky [09:11:00.0000] <rubys> I'm not familiar with WebKit, but I have taken a peek at Firefox. Don't see why it would be tricky (I know, I know, famous last words...) [09:12:00.0000] <annevk> /me needs /ignore for e-mail clients [09:13:00.0000] <annevk> rubys, maybe it's possible, they have done it for the XML parser after all... [09:15:00.0000] <rubys> exactly... there is a part in the logic where you take in an input stream and produce a custom DOM implementation. 
Obviously, the input stream and DOM may vary from product to product, as would the tokenizer/parser error handing, but the logic could be pluggable. [09:16:00.0000] <rubys> Imagine how nice it would be if Safari, Firefox, and Opera used the SAME tokenizer/parser? [09:16:01.0000] <annevk> hmm, no parsing bugs to exploit! [09:16:02.0000] <Philip`> They'd probably all use slightly different versions with different bug fixes, so it wouldn't be entirely perfect [09:17:00.0000] <rubys> perfect? No. But a dramatic improvement over today. [09:18:00.0000] <rubys> And each vendor is going to have to invest some work effort towards html5 compliance. This should reduce the work for everybody. [09:25:00.0000] <Philip`> Are vendors planning to replace their existing HTML parser with a shiny new HTML5 one, or are they planning to just receive lots of bug reports and make lots of small fixes until they pass most of the tests, or are they not planning anything yet? [09:31:00.0000] <annevk> I think WebKit is planning on fixing bugs [09:31:01.0000] <annevk> they're pretty close for most cases anyway [09:31:02.0000] <annevk> dunno about other browsers [09:37:00.0000] <Philip`> Hmm, the state transition graph gets a bit big when I split out all the different content models [09:53:00.0000] <Philip`> http://canvex.lazyilluminati.com/misc/states2.png [09:54:00.0000] <annevk> ouch [09:54:01.0000] <annevk> "HTML tokenizing. More trivial than it looks." [09:57:00.0000] <Philip`> I think that's overestimating the possible transitions a little, since it assumes that whenever a tag token (either start or end) is emitted it could end up in any of the four content models [09:58:00.0000] <Philip`> At least there's the nice DataState PLAINTEXT black hole at the bottom [09:58:01.0000] <annevk> :) [10:09:00.0000] <annevk> In the Live DOM Viewer in Internet Explorer the <!> sequence causes the DOM view to turn almost blank... 
[10:30:00.0000] <Philip`> It looks like my state transition thing agrees with the spec's comments about "This can only happen if the content model flag is set to the PCDATA state" etc, except for the bogus comment state where you have to do lots of slightly convoluted thinking to work out that it's correct [10:30:01.0000] <Philip`> though, should the (non-bogus) comment states state that they can only happen when PCDATA, or is that obvious when left unstated? [10:39:00.0000] <Philip`> (I suppose it should also be obvious that the only state you can be in with PLAINTEXT is the data state) [10:40:00.0000] <annevk> I'm not sure why the other cases actually state it, to be honest [10:40:01.0000] <annevk> It makes it just more confusing for the cases where it's not [10:46:00.0000] <zcorpan_> annevk: it's because comments where the leading "!--" and trailing "--" don't fit, you can't read .nodeValue in ie [10:46:01.0000] <zcorpan_> annevk: i solved that by using a try/catch in dom2string [10:47:00.0000] <zcorpan_> annevk: and emitting "<!-- -->" if reading .nodeValue fails [10:47:01.0000] <annevk> k [10:48:00.0000] <Philip`> Ooh, neat, the W3C validator says <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd"><title></title><table datapagesize=cheese><tr><td></table> is valid [10:48:01.0000] <annevk> hehe [10:48:02.0000] <zcorpan_> would be cool if the live dom viewer had an option to show the dom using dom2string_recursive [10:50:00.0000] <zcorpan_> Hixie: yt? 
[10:54:00.0000] <annevk> zcorpan_, the real feature would be to make a mashup of http://james.html5.org/parsetree.html and your script [10:54:01.0000] <annevk> zcorpan_, maybe just for the text input box [12:14:00.0000] <Philip`> The tokeniser is much easier when I don't worry about actually implementing it, since I can just add a command like AppendHyphenToCommentToken and use it without caring about what it does [12:15:00.0000] <Philip`> but I guess it'll all catch up with me when I do get around to the implementation bit :-( [12:19:00.0000] <zcorpan_> Philip`: you're writing pseudo-code? :) [12:22:00.0000] <Philip`> Yes :-) [12:23:00.0000] <Philip`> (in a form that can be transformed into real code) [12:23:01.0000] <Philip`> (but that just moves some of the work into the code that does the transformation) [12:24:00.0000] <Philip`> (but it's a good excuse to learn OCaml anyway) [12:42:00.0000] <Philip`> http://canvex.lazyilluminati.com/misc/states3.png - now with added doctype states, so I think it's got everything (and probably more bugs than before) [12:43:00.0000] <Philip`> Oops, that's still got the EOF transitions... [12:44:00.0000] <Philip`> Now it doesn't, so it's a bit prettier [12:49:00.0000] <Philip`> Actually, I should probably tell it about parse errors too, so I can see if it's much simpler for conforming content [12:50:00.0000] <zcorpan_> seems the algorithm in https://bugzilla.mozilla.org/attachment.cgi?id=188040 only has one flaw, which is before step 1: match the value against the list of color keywords [12:52:00.0000] <annevk> zcorpan_, nice interop mess [12:53:00.0000] <zcorpan_> now i'll just see which keywords are supported, and if that differs from the keywords supported in css [12:59:00.0000] <Philip`> http://canvex.lazyilluminati.com/misc/states4.png - hmm, it does look much cleaner when you don't allow parse errors [13:03:00.0000] <zcorpan_> wow. ie supports lightgrey but not lightgray. 
quite the opposite to all other gr(a|e)ys [13:04:00.0000] <zcorpan_> Philip`: you can't get into the bogus states if you don't allow parse errors, right? [13:04:01.0000] <Philip`> http://en.wikipedia.org/wiki/HTML_colors says lightgrey too [13:07:00.0000] <zcorpan_> could there be other keywords supported that aren't listed in css3-color ? [13:07:01.0000] <Philip`> zcorpan_: Yep - there's nothing leading into those states in the diagram, but I didn't bother stripping them out [13:07:02.0000] <zcorpan_> Philip`: ok [13:08:00.0000] <Philip`> zcorpan_: I believe I looked in IE's .exe for colour names, and it didn't have any that weren't the standard set which CSS3 and every other browser includes [13:08:01.0000] <zcorpan_> Philip`: ok. thanks [13:09:00.0000] <Philip`> Oh, that was IE3 [13:10:00.0000] <Philip`> but I don't think they've changed it since then [13:10:01.0000] <Philip`> since they just copied it from NN2 [13:11:00.0000] <Dashiva> Philip`: What if you colored the transition arrows depending on whether the transition requires a parse error or not? [13:11:01.0000] <annevk> might be interesting to test DarkSeaGreen [13:11:02.0000] <annevk> whether IE has the X11 or .Net impl [13:11:03.0000] <annevk> /me got that from the wikipedia page [13:12:00.0000] <zcorpan_> annevk: darkseagreen is in css3-color [13:12:01.0000] <Philip`> Dashiva: That sounds worth doing [13:12:02.0000] <zcorpan_> ah [13:12:03.0000] <Philip`> though what about transitions that can be both parse errors and not? [13:13:00.0000] <Dashiva> a third color, or both? 
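The observation that nothing leads into the bogus states once parse errors are disallowed is a reachability question over the transition graph, and the colouring idea falls out of the same data. A small sketch follows, assuming a hand-picked and heavily simplified subset of transitions (not the spec's full table, and not Philip`'s OCaml code), each tagged with whether taking it involves a parse error.

```python
from collections import deque

# (source state, target state, requires a parse error): illustrative subset only.
TRANSITIONS = [
    ("data", "tag open", False),
    ("tag open", "tag name", False),
    ("tag open", "markup declaration open", False),
    ("tag open", "bogus comment", True),
    ("tag name", "data", False),
    ("markup declaration open", "comment", False),
    ("markup declaration open", "bogus comment", True),
    ("comment", "data", False),
    ("bogus comment", "data", False),
]


def reachable(start, transitions, allow_parse_errors):
    """Breadth-first search for every state reachable from `start`."""
    graph = {}
    for source, target, is_error in transitions:
        if allow_parse_errors or not is_error:
            graph.setdefault(source, []).append(target)
    seen = {start}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        for nxt in graph.get(state, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def to_dot(transitions):
    """Emit Graphviz DOT, drawing parse-error transitions in red."""
    lines = ["digraph tokenizer {"]
    for source, target, is_error in transitions:
        color = "red" if is_error else "black"
        lines.append('  "%s" -> "%s" [color=%s];' % (source, target, color))
    lines.append("}")
    return "\n".join(lines)
```

Calling `reachable("data", TRANSITIONS, allow_parse_errors=False)` leaves out the bogus comment state, while allowing parse errors brings it back in. A transition that can happen both with and without a parse error would simply appear twice in the list, which matches the "just draw two arrows" approach.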
[13:14:00.0000] <Philip`> Hmm, I'll just draw two arrows, because then I won't have to change my code :-) [13:14:01.0000] <annevk> some more arrows wouldn't hurt [13:14:02.0000] <annevk> it's not always clear what the direction is :) [13:15:00.0000] <Dashiva> Maybe put an arrowhead on the middle of the arrow too [13:16:00.0000] <Philip`> Hmph, colour PNGs are huge [13:17:00.0000] <zcorpan_> annevk: ie uses x11 [13:17:01.0000] <Philip`> http://canvex.lazyilluminati.com/misc/states5.png [13:20:00.0000] <Philip`> Hmm, I don't think I can make Graphviz draw arrow heads except at the end [13:22:00.0000] <annevk> zcorpan_, so how do you test which color is used? some color picker? [13:23:00.0000] <zcorpan_> annevk: .bgcolor returns the rgb color [13:24:00.0000] <zcorpan_> er, .bgColor [13:24:01.0000] <annevk> cool, automated testing [13:26:00.0000] <zcorpan_> http://simon.html5.org/test/html/parsing/color-attributes/keywords/ [13:28:00.0000] <zcorpan_> i haven't sent anything to the list about color attributes yet, have i [13:30:00.0000] <annevk> prolly not: http://www.google.com/search?q=inurl:whatwg-whatwg+color [13:31:00.0000] <Philip`> /me wonders if he could automatically generate tests to cover all the possible state transitions [13:31:01.0000] <annevk> in http://simon.html5.org/test/html/parsing/color-attributes/ you can change Opera to none too [13:31:02.0000] <annevk> Philip`, that'd be most useful [13:32:00.0000] <annevk> Philip`, format: http://html5lib.googlecode.com/svn/trunk/testdata/tokenizer/ pretty please :) [13:32:01.0000] <zcorpan_> annevk: ah. cool. [13:33:00.0000] <annevk> Philip`, or maybe in the tree construction format... [13:33:01.0000] <annevk> Philip`, that would prolly be useful too especially for testing browsers [13:34:00.0000] <Philip`> The tree construction format probably wouldn't work too well when I don't have a tree constructor, unless I'm missing some point... 
[13:35:00.0000] <annevk> ah, if you want to debug your own code, then no [13:36:00.0000] <Philip`> Ah, okay - I think it would be nice to have something I could use for just tokeniser tests [13:36:01.0000] <annevk> then use the funky json format :) [13:37:00.0000] <annevk> I wonder if that can be used in some meaningfull way on browsers too... prolly not [13:37:01.0000] <Philip`> though I don't know how to cope with the issue that the tree construction stage can affect the tokeniser's content model, when there's no tree construction stage [13:37:02.0000] <annevk> see escapeFlag.test and contentModelFlags.test [13:37:03.0000] <Philip`> Incidentally, "content model flag" is a confusing name since most flags don't have four states... [13:38:00.0000] <Philip`> Oh, right - that looks useful :-) [13:41:00.0000] <Philip`> Shouldn't the test format include attributes on end tags, since the tokeniser is meant to emit them? [13:42:00.0000] <annevk> the tokeniser doesn't emit them [13:43:00.0000] <annevk> Hixie, those stats on AAA are useful! thanks [13:43:01.0000] <Philip`> "Start and end tag tokens have a tag name and a list of attributes, each of which has a name and a value." "When an end tag token is emitted with attributes, that is a parse error." - it sounds like they are emitted [13:44:00.0000] <annevk> oh, ok [13:44:01.0000] <Hixie> annevk: which ones? [13:44:02.0000] <annevk> Hixie, the ones you pasted in IRC earlier; how many times duplication is hit etc. 
[13:45:00.0000] <annevk> although I'd love to see more detail :) [13:45:01.0000] <Hixie> ah yes [13:45:02.0000] <Hixie> i'll be posting more in due course [14:12:00.0000] <annevk> jgraham, I've been thinking about removing all the classes in html5parser.py [14:12:01.0000] <annevk> having said that, it hasn't been more than thinking [14:13:00.0000] <annevk> I'm not sure if we would actually gain anything from removing them and moving to a bunch of if/else statements as opposed to dictionary-based method invocations [14:13:01.0000] <annevk> what we have now might actually be faster [14:13:02.0000] <rubys> why remove them then? [14:13:03.0000] <jgraham> annevk: I would imagine what we have now is faster [14:14:00.0000] <jgraham> (although I would need metrics to be sure, of course) [14:14:01.0000] <jgraham> I think the time would be better spent on Chtml5lib [14:14:02.0000] <annevk> prolly [14:15:00.0000] <rubys> If I did the port, who would contribute to it? [14:15:01.0000] <jgraham> rubys: I guess it would be one way for me to finally learn C :) [14:16:00.0000] <rubys> I took a look at it, and porting it to C++ would probably take about a week. To C would be another week. [14:16:01.0000] <annevk> if I learn how to work with C on Ubuntu (besides learning to work with C in general) I would probably contribute [14:16:02.0000] <jgraham> (which is a way of saying I would love to contribute fixes but I don't feel confident in designing it) [14:16:03.0000] <annevk> not sure how much time I would invest on the python version afterwards [14:16:04.0000] <rubys> I would simply port the existing design. After it is working, it could be optimized, refactored, etc. [14:17:00.0000] <bewest> in that case why not profile the python version and move slow parts to C? [14:17:01.0000] <annevk> hmm, how are we going to handle <noscript>? [14:18:00.0000] <annevk> bewest, how is that better? 
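For readers unfamiliar with the trade-off annevk raises above, here is a minimal sketch of the two dispatch styles: a dictionary of bound methods versus a chain of if/else tests. Every name in it (DictDispatchPhase, process_start_tag, the handler names) is hypothetical and not taken from html5parser.py. It only shows that the two styles are interchangeable in behaviour; as jgraham says, deciding which is faster would need real measurements.

```python
# Hypothetical miniature of the two dispatch styles discussed in the log.
# None of these names are taken from html5lib's html5parser.py.


class DictDispatchPhase:
    """Chooses a start-tag handler by looking the tag name up in a dict."""

    def __init__(self):
        # Map tag names to bound methods once, at construction time.
        self.handlers = {
            "html": self.start_html,
            "head": self.start_head,
            "body": self.start_body,
        }

    def process_start_tag(self, name):
        # A single dictionary lookup, however many handlers are registered.
        handler = self.handlers.get(name, self.start_other)
        return handler(name)

    def start_html(self, name):
        return "start_html"

    def start_head(self, name):
        return "start_head"

    def start_body(self, name):
        return "start_body"

    def start_other(self, name):
        return "start_other:" + name


def if_else_process_start_tag(name):
    """The same decisions written as a chain of comparisons."""
    if name == "html":
        return "start_html"
    elif name == "head":
        return "start_head"
    elif name == "body":
        return "start_body"
    else:
        return "start_other:" + name


if __name__ == "__main__":
    phase = DictDispatchPhase()
    for tag in ["html", "head", "body", "p"]:
        print(tag, phase.process_start_tag(tag), if_else_process_start_tag(tag))
```

The dictionary version keeps each handler as a separate method, which is roughly the class-based shape annevk describes; the if/else version folds everything into one function.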
[14:18:01.0000] <jgraham> That sounds great to me; I simply don't have enough C experience to know how best to implement things that are currently e.g. lists in python in C [14:18:02.0000] <annevk> we can prolly steal some ideas from Hixie's and hsivonen's impl [14:18:03.0000] <bewest> annevk: maybe it's not :/ [14:18:04.0000] <rubys> C++ has a standard library. Going to C next would mean reimplementing those concepts. [14:18:05.0000] <jgraham> bewest: It's not like there's one slow bit, it's the overhead of doing things many times [14:18:06.0000] <bewest> yeah [14:18:07.0000] <jgraham> e.g. many function calls [14:19:00.0000] <Philip`> I'd be interested to see if my C++ tokeniser implementation could actually work in practice [14:19:01.0000] <jgraham> Philip`: the O'Caml one? [14:19:02.0000] <annevk> rubys, if we're going to do it C might be better if we get more detailed control over things like the inputstream [14:19:03.0000] <Philip`> jgraham: Yes [14:19:04.0000] <annevk> Question: scripting is enabled or disabled? [14:19:05.0000] <annevk> we don't have any tests for <noscript> atm... [14:20:00.0000] <Philip`> (The C++-generating part is totally broken now, but http://canvex.lazyilluminati.com/misc/states5.png is generated from exactly the same data as the C++ tokeniser would be) [14:23:00.0000] <annevk> I'll assume that scripting is enabled for now [14:23:01.0000] <annevk> I suppose at some point we can provide a switch and enable/disable tests conditionally [14:25:00.0000] <Philip`> Could the test format be made to handle scripts modifying the input stream? 
[14:27:00.0000] <Philip`> You couldn't really expect parsers to all have script interpreters, but you could define that the tests can have <script>document.write("<p>")</script> (for some arbitrary JSON-encoded string) and the test harness can push those strings back into the input stream, to make sure the parser copes properly [14:32:00.0000] <annevk> at least for tree construction that's feasible [14:32:01.0000] <annevk> I was thinking of maybe offering #document-scripting-disabled at some point which provides an alternate tree and prolly also #errors-scripting-disabled [14:33:00.0000] <Hixie> just so everyone is aware and doesn't wonder if i died or something, i'm going to be on vacation for 3 weeks starting sunday [14:33:01.0000] <gsnedders> I'll make sure to ask if you've died. [14:34:00.0000] <hasather> Hixie: have fun :) [14:34:01.0000] <Hixie> i'll try! :-) [14:34:02.0000] <gsnedders> more seriously, where are you going? [14:34:03.0000] <Hixie> europe, east coast, various places around there [14:35:00.0000] <Hixie> apparently spending a lot of time in layovers at schipol [14:35:01.0000] <Hixie> which doesn't bode well for my luggage [14:35:02.0000] <annevk> yeah, it does that to you [14:36:00.0000] <gsnedders> I'm probably not getting of of the UK this summer [14:36:01.0000] <jgraham> gsnedders: Me neither (although I have been to various conferences abroad) [14:37:00.0000] <gsnedders> I'm going off down to Cambridge, but that's it. Probably going to Paris with my sister + her husband over the October holidays, though [14:38:00.0000] <jgraham> I assure you that Cambridge is lovely in every way. As long as you don't like hills. Or even slight rises. [14:38:01.0000] <jgraham> And, preferably, have a thing for tourists and punt touts [14:38:02.0000] <gsnedders> my grandmother lives in Cambridge, I've been plenty of times. Doesn't seem that hilly to someone from Scotland, though. 
[14:39:00.0000] <gsnedders> I should try actually punting again… [14:39:01.0000] <jgraham> It's really not that hilly. That's why you can't like hills if you want to like Cambridge [14:39:02.0000] <jgraham> /me wants to move away just to get some hills [14:39:03.0000] <gsnedders> jgraham: come here! [14:40:00.0000] <gsnedders> [Fife] [14:41:00.0000] <jgraham> Fife would be nice. How are the employment opportunities though?... [14:42:00.0000] <gsnedders> No idea. I'm too young to know such things :) [14:42:01.0000] <jgraham> And I, sadly, am almost old enough to have to care :( [14:43:00.0000] <gsnedders> /me goes back to showing how young he is by looking up university entrance requirements [14:43:01.0000] <Dashiva> I feel old now [14:50:00.0000] <Philip`> You have to put up with all the students in Cambridge too :-p [14:51:00.0000] <gsnedders> hmmm… AAAAB at the min. for Higher entrance into Oxford [14:51:01.0000] <gsnedders> /me marks English as the B [14:51:02.0000] <Philip`> though I suppose they're usually outnumbered by tourists [14:51:03.0000] <gsnedders> Philip`: the terms aren't overly long at Cam/Oxf [14:52:00.0000] <Philip`> 3 * 8 weeks, with three months off for the summer vacation :-) [14:52:01.0000] <gsnedders> Philip`: which gives plenty of time for tourists to rule supreme :) [14:52:02.0000] <gsnedders> (I couldn't myself remember whether it was 8v10 or 10v12) [14:53:00.0000] <Philip`> It's nice during the exam term when they stop all the tourists coming into the colleges [14:54:00.0000] <gsnedders> I don't think I've ever been there at the time, due to school [14:54:01.0000] <Philip`> (Er, but I have no idea how many colleges do that) [14:54:02.0000] <gsnedders> (and nowadays I have exams at the same time) [14:54:03.0000] <gsnedders> Philip`: all do, IIRC [14:57:00.0000] <jgraham> Philip`: the quantity tourists+students is roughly conserved over the whole year [14:58:00.0000] <Hixie> cute, this http://triin.net/2006/06/12/Coding_practices_of_web_pages 
page refers to my 2005-12 study [14:59:00.0000] <Hixie> wow, the numbers he gets are very similar to the numbers i got in that study [14:59:01.0000] <Hixie> ncie [14:59:02.0000] <Hixie> nice [14:59:03.0000] <Hixie> (comparing http://code.google.com/webstats/2005-12/pages.html to http://triin.net/2006/06/12/HTML) [15:00:00.0000] <Hixie> even the oddities are present in both studies [15:00:01.0000] <Hixie> that's awesome [15:11:00.0000] <hsivonen> MikeSmith: my Java impl has configurable XML 1.0 compat [15:13:00.0000] <hsivonen> MikeSmith: for various features you can choose to be conforming to HTML5 (and potentially violate XML 1.0), not to violate XML 1.0 by treating violations as fatal errors or not violate XML 1.0 by being non-conforming to HTML 5 and making infoset-altering coercions [15:15:00.0000] <hsivonen> rubys: it might be a good idea to do an independent implementation in C. I believe Mike Day has already started one. I chose to do an independent implementation in Java using only test cases from html5lib in order to make a library that makes the most of Java instead of trying to map Pythonic stuff to Java [15:23:00.0000] <MikeSmith> hsivonen - thanks for the info [15:29:00.0000] <hsivonen> MikeSmith: to elaborate a bit: the SAX interface makes it possible for me to violate the interface contract in a way that exposes all of HTML5 in a way that may violate XML 1.0. The XOM interface, by design, won't allow it. When using a DOM impl meant for XML, some of the violation may not pass, either. [15:30:00.0000] <hsivonen> MikeSmith: so the non-XML stuff will be available through SAX (which I'm treating as the native interface) and custom DOM impls if someone cares to make one [15:34:00.0000] <rubys> hsivonen: the Ruby implementation is meant to make the most of Ruby, and diverges in a number of significant ways. [15:34:01.0000] <rubys> I did use the Python implementation as a starting point, but only as that, and only because it saved me some time. 
[15:35:00.0000] <hsivonen> rubys: ok. anyway, I suggest pinging Mike Day to avoid duplicating what he has already been doing [15:36:00.0000] <rubys> that's why I've been advocating putting implementations into one place (html5lib)... so as to minimize the "search time" it takes to find out the actual current state of an implementation. [15:37:00.0000] <rubys> what is the license, for example, of Mike's work? [15:38:00.0000] <hsivonen> rubys: the reason why I put the Java impl in a different repo is to keep it together with the rest of the conformance checker which in turn is there in order to keep it together with the schema project [15:38:01.0000] <hsivonen> rubys: MIT/expat, IIRC [15:39:00.0000] <hsivonen> rubys: MIT/expat seems to be the convention for HTML5 parsers :-) [15:39:01.0000] <rubys> ... eventually it will likely no longer be "the" (as in "the only") Java implementation. :-) [15:40:00.0000] <hsivonen> rubys: do you mean because of the repo choice or in general? [15:42:00.0000] <rubys> the two parsers that are in html5 have essentially zero required dependencies, and very few optional dependencies. I'd like to see a similar effort in PHP, Java, C#, and C. [15:43:00.0000] <hsivonen> rubys: my Java impl depends on a couple of my utility classes and ICU4J [15:43:01.0000] <hsivonen> rubys: putting the utility classes in one jar with the parser is not a big deal [15:43:02.0000] <rubys> i tried downloading it once. that was not the impression I got. But perhaps I was wrong. 
[15:44:00.0000] <hsivonen> rubys: making ICU4J optional for reduced correctness is not a big deal, either [15:44:01.0000] <hsivonen> rubys: do you mean you downloaded the parser that I'm currently working on or the conformance checker way back when you mentioned it in your blog comments [15:45:00.0000] <rubys> way back when [15:45:01.0000] <hsivonen> rubys: when my parser implementation is in a state where it can actually be used, I intend to offer a binary jar that doesn't require you to run the whole conformance checker build [15:45:02.0000] <hsivonen> (and the conformance checker build is now much easier, too) [15:46:00.0000] <hsivonen> rubys: the parser I'm now writing is not the prototype parser you saw way back when [15:46:01.0000] <rubys> Cool. Is there a single place where implementations can be found? [15:47:00.0000] <rubys> If not, can we make such a list on http://wiki.whatwg.org/wiki/ ? [15:47:01.0000] <hsivonen> rubys: dunno if the WHATWG wiki is up to date [15:47:02.0000] <hsivonen> rubys: in any case, I suggest that we link to each other whenever someone makes something runnable in a new language [15:48:00.0000] <hsivonen> (my tree builder is not runnable just yet) [15:48:01.0000] <rubys> How about this: I'll update html5lib to point to http://wiki.whatwg.org/wiki/Implementations [15:48:02.0000] <hsivonen> makes sense [15:48:03.0000] <hsivonen> svn co http://svn.versiondude.net/whattf/htmlparser/trunk/ htmlparser [15:48:04.0000] <hsivonen> in case you are interested [15:49:00.0000] <hsivonen> depends on the util module in the same repo, ICU4J and Java5 [15:50:00.0000] <rubys> are there any tests? 
[15:51:00.0000] <hsivonen> rubys: you need to check out html5lib separately to get test data [15:51:01.0000] <hsivonen> rubys: there are test harnesses for running html5lib encoding tests and tokenization tests [15:51:02.0000] <hsivonen> (tree builder harness will follow in due course) [15:52:00.0000] <rubys> this does look like the type of parser I was describing [15:52:01.0000] <rubys> Love the README. (Seriously) [15:52:02.0000] <hsivonen> good [15:53:00.0000] <hsivonen> well, it isn't runnable, yet [15:54:00.0000] <hsivonen> /me checks in a more positive README [15:54:01.0000] <rubys> I wasn't being sarcastic, I was serious. I prefer truth in labeling over marketing any day. [15:55:00.0000] <Philip`> /me 's tokeniser now works correctly on HTML documents that do not have any <, > or & in them [15:56:00.0000] <Philip`> (Well, it doesn't handle non-ASCII characters properly either) [15:56:01.0000] <rubys> /me congratulates Philip` [15:58:00.0000] <othermaciej> Philip`: is your tokenizer "cat"? [15:59:00.0000] <Philip`> (Do the html5lib tokeniser tests intentionally omit the end-of-file token?) [15:59:01.0000] <hsivonen> Philip`: so it seems [16:01:00.0000] <jgraham> Philip`: End of file token is implied by "no more tokens". Is there some reason to make it explicit? [16:01:01.0000] <hsivonen> rubys: I'm committed to providing buffered (correct) SAX, true streaming (potentially incorrect in non-conforming cases) SAX, DOM and XOM support. dom4j support should come for free with DOM support. JDOM support should be easy once those are done. True streaming StAX is intentionally not a goal. Tree-buffered StAX will be possible but not my personal interest. [16:02:00.0000] <Philip`> othermaciej: It's about as useful as cat at the moment :-) [16:03:00.0000] <rubys> hsivonen: my only remaining concern is that it is a single person project. Communities tend to outlive individuals. [16:03:01.0000] <hsivonen> rubys: I welcome contributions under the same license. 
[16:03:02.0000] <Philip`> jgraham: I guess not, assuming there's no way to generate more tokens after the end-of-file token (which I think is true, but not entirely obvious) [16:03:03.0000] <rubys> but that's not a today concern, you've already addressed my bigger concerns. [16:04:00.0000] <Philip`> I just need to fix my token-stream-serialiser to omit the end-of-file one... [16:04:01.0000] <hsivonen> rubys: also, it seems to me that it is a good idea to have something that runs before building a community [16:05:00.0000] <rubys> hsivonen: I've have rather mixed experience with that: http://search.yahoo.com/search?p=%22good+ideas+and+bad+code+build+communities%2C+the+other+three+combinations+do+not%22 [16:05:01.0000] <rubys> the best counter example I know of is Xalan. [16:05:02.0000] <rubys> Great code. Smart - very smart - developers. No community. [16:06:00.0000] <hsivonen> rubys: basically, my problem is that I don't know how to make the kind of commitments that I need to make in order to get paid for this and factor in the uncertainty (at this point) expectations of community [16:08:00.0000] <rubys> what commitments do you think html5lib has behind it? To my eyes, it has the exact right mix of good ideas and bad code (<smirk>) to be successful. [16:09:00.0000] <hsivonen> rubys: I've tried to be open about my plans, but I haven't published design docs, because I don't know if anyone would care to read them. I'd be happy to answer any questions on my design, though. [16:09:01.0000] <jgraham> All my bad code makes it successful? Excellent! [16:09:02.0000] <jgraham> (obviously rubys, anne and tbroyer are responsible for the good code) [16:10:00.0000] <hsivonen> rubys: as far as I can tell, html5lib is not on a paid basis in general [16:22:00.0000] <Philip`> /me can tokenise start tags now, which is handy [16:22:01.0000] <hsivonen> rubys: I added some info to the wiki [16:42:00.0000] <Philip`> This should, in theory, now do everything except doctypes... 
[16:43:00.0000] <Philip`> /me tries to set it up to run actual tests [16:43:01.0000] <hsivonen> /me reads Robert Burns' replies to jgraham [16:45:00.0000] <Philip`> /me wonders if there's an easy way to parse JSON in C++ [16:46:00.0000] <Philip`> Actually, that's kind of stupid, I'll just write a test wrapper in a proper language... [16:46:01.0000] <hsivonen> Philip`: are the libs linked to from json.org unsatisfactory? [16:47:00.0000] <Philip`> I guess that'd work, but downloading and installing requires too much effort [16:48:00.0000] <Philip`> (and then reading the documentation to work out how to use them) [16:48:01.0000] <Philip`> (and then actually writing the code to use them, in C++) [16:57:00.0000] <Philip`> Hmm, how are the ParseErrors in the tokeniser tests meant to work? They look like tokens, but it's not obvious where you add them so they don't conflict with all the other code that's fiddling with tokens... [16:58:00.0000] <hsivonen> Philip`: the tokenization spec is very clear about the sequence of parse errors relative to emitted tokens [16:58:01.0000] <hsivonen> Philip`: basically, you treat errors as an extra type of token [16:58:02.0000] <hsivonen> that the tokenizer emits 2007-07-06 [17:00:00.0000] <Philip`> Oh, I guess I just need to fix my concept of 'current token' so it's not simply the most recent token on the stack [17:00:01.0000] <hsivonen> Philip`: stack? [17:01:00.0000] <Philip`> Well, append-only stack [17:01:01.0000] <Philip`> so, er, I guess it's more like a list [17:03:00.0000] <hsivonen> Philip`: do you mean your test harness builds a list of tokens? 
[17:03:01.0000] <hsivonen> I was just thinking that there's no stack in the tokenizer [17:05:00.0000] <Philip`> The tokeniser itself builds a list of tokens (and then prints them all out at the end) [17:05:01.0000] <Philip`> (though I can change it to not do that, because it only ever needs a single current token and a bit of cheating to merge character tokens) [17:19:00.0000] <Philip`> Ooh, it sort of almost works, some of the time [17:22:00.0000] <Hixie> interesting, i never considered parse errors as a token type [17:22:01.0000] <Hixie> i just treat them as an out-of-band callback called during parse (my parser is synchronous, it returns a complete document once the parsing is done) [17:25:00.0000] <hsivonen> Hixie: I treat both errors and tokens as callbacks [17:25:01.0000] <Hixie> right [17:25:02.0000] <hsivonen> they are on different interfaces but the handler that generates JSON implements both [17:27:00.0000] <Philip`> Now I pass all of test1.dat and test2.dat except for about half of them which are just bits I haven't quite implemented yet [17:29:00.0000] <Philip`> /me needs to write a Perl one after this [17:35:00.0000] <Philip`> (Actually, I probably don't, since there'd be no point at all) [17:35:01.0000] <webben> why not? [17:37:00.0000] <Philip`> Because a tokeniser by itself isn't very useful :-) [18:30:00.0000] <Philip`> http://canvex.lazyilluminati.com/misc/states6.png - now with fewer bugs than before, since implementation seems to pass most of the tests now [18:31:00.0000] <Philip`> *the implementation [18:32:00.0000] <nickshanks> yay squiggly lines [18:33:00.0000] <Lachy> Philip`: what's the difference between red and black lines? 
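The parse-error discussion above describes two designs: hsivonen treats errors as an extra kind of token that the tokeniser emits in sequence, while Hixie reports them through an out-of-band callback. A minimal sketch of how one handler can serve both, and of the character-token merging Philip` mentions, follows. The class and function names are hypothetical, and the output shape (a "ParseError" string interleaved with ["Character", text] lists) is only loosely modelled on the html5lib tokeniser tests, so treat it as an assumption rather than the real format.

```python
# Hypothetical sketch: one handler that receives tokens and parse errors
# through separate callbacks but records them in a single ordered list.


class RecordingHandler:
    def __init__(self):
        self.output = []

    def token(self, kind, data):
        """Callback for ordinary tokens."""
        last = self.output[-1] if self.output else None
        # Merge runs of character tokens into one, so "a" then "b"
        # is recorded as a single ["Character", "ab"] entry.
        if kind == "Character" and isinstance(last, list) and last[0] == "Character":
            last[1] += data
        else:
            self.output.append([kind, data])

    def parse_error(self):
        """Callback for parse errors, recorded in order with the tokens."""
        self.output.append("ParseError")


def replay(events, handler):
    """Feed a fixed list of events to a handler, standing in for a tokeniser."""
    for event in events:
        if event[0] == "error":
            handler.parse_error()
        else:
            handler.token(event[0], event[1])
    return handler.output


if __name__ == "__main__":
    events = [("Character", "a"), ("Character", "b"), ("error",), ("Character", "c")]
    print(replay(events, RecordingHandler()))
```

Because the error sits between the two character runs, they are not merged across it, which preserves the ordering of errors relative to tokens that hsivonen points to in the spec.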
[18:33:01.0000] <Philip`> (Oh, I segfault on <x y="&">, which can't be good) [18:34:00.0000] <nickshanks> an especially squiggly red one going from CommentEndDash to Data [18:34:01.0000] <Philip`> Lachy: Red is transitions that are parse errors, black is transitions that probably aren't [18:34:02.0000] <Lachy> ok [18:34:03.0000] <Philip`> ("probably" because of the parse-error-unless-it's-a-permitted-slash thing, which the graph treats as not-an-error) [18:35:00.0000] <Hixie> that's awesome [18:35:01.0000] <Hixie> why not have a red line and a black line when you have the permitted slash thing? [18:35:02.0000] <Hixie> you do that elsewhere [18:35:03.0000] <Hixie> yay, bogus doctype only has red arrows leading to it [18:35:04.0000] <Hixie> same with bogus comment, yay [18:36:00.0000] <Hixie> you really should use another colour for the EOF transitions [18:36:01.0000] <Hixie> in fact maybe we should have an EOF state [18:36:02.0000] <Hixie> instead of having EOF go back to the data state [18:37:00.0000] <Philip`> I can't easily have both because I only generate one arrow per transition from the original algorithm, and then delete all duplicates, so it only ends up with red+black when there are two separate transitions between the same states [18:37:01.0000] <Hixie> ah ok [18:37:02.0000] <Hixie> didn't realise it came from actual code [18:38:00.0000] <Hixie> that graph is awesome [18:38:01.0000] <Philip`> It's not entirely actual code - the algorithm is represented as data in OCaml, and I can generate that graph or a C++ implementation from that data [18:38:02.0000] <Hixie> it shows that there are really three basic ideas [18:38:03.0000] <Hixie> aah [18:38:04.0000] <Hixie> cool [18:39:00.0000] <rubys> is there any reason why you couldn't generate a, say, Python or Ruby implementation from that data? 
[18:42:00.0000] <Philip`> http://canvex.lazyilluminati.com/misc/states7.png - unless I did something wrong, that has blue lines for every transition that cannot occur if EOF is never consumed [18:42:01.0000] <Philip`> (i.e. all the transitions that are (at least partially) caused by EOF) [18:43:00.0000] <Hixie> can you try it with a separate state for EOF? or is that more effort than it's worth? [18:43:01.0000] <Hixie> it'd be cool to have the arrows go down to another state for EOF, it would look less cluttered i'd think [18:43:02.0000] <Hixie> just an idea, don't worry about it if it's more work than a few seconds :-) [18:43:03.0000] <Philip`> rubys: I don't think there is any reason why that wouldn't work [18:43:04.0000] <Hixie> this is really cool [18:45:00.0000] <Philip`> I've still had to manually write a few hundred lines of C++ (which would need to be ported to other languages), mostly for the entity parsing (since that's too boring to do in a more generic way), but then it generates a thousand lines of state-machine code automatically [18:45:01.0000] <rubys> I don't know O'Caml, but this sounds like a wonderful excuse to learn. Will you be publishing your source at some point? [18:45:02.0000] <Philip`> I didn't know it either, so I'm using it as exactly the same excuse ;-) [18:46:00.0000] <Philip`> I'll try to upload what I've done soonish [18:46:01.0000] <KevinMarks> looks like it could be used to generate code coverage testcases too [18:47:00.0000] <Philip`> I'm sure there must be a way to add in a new EOF state in about three lines of code, but I'm also sure they'll take a few minutes to work out... 
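The approach Philip` describes above (state transitions held as data, one arrow per transition, duplicates deleted, red for parse errors, blue for EOF-triggered transitions) can be sketched in a few lines. This is not his OCaml code: the Transition record, the handful of sample transitions, and the to_dot function are all assumptions made for illustration, and the sample covers only a toy subset of states. The output is Graphviz DOT text, the format used by the tool he names later in the log.

```python
from collections import namedtuple

# Hypothetical record for one tokeniser transition; not Philip`'s actual data.
Transition = namedtuple("Transition", "source target parse_error eof")

# Illustrative toy subset only, with one deliberate duplicate.
TRANSITIONS = [
    Transition("Data", "TagOpen", False, False),
    Transition("TagOpen", "TagName", False, False),
    Transition("TagOpen", "Data", True, False),
    Transition("TagOpen", "Data", True, False),   # duplicate, removed below
    Transition("TagName", "Data", False, False),
    Transition("TagName", "Data", True, True),    # caused by EOF
]


def edge_colour(transition):
    """Blue for EOF-triggered, red for other parse errors, black otherwise."""
    if transition.eof:
        return "blue"
    if transition.parse_error:
        return "red"
    return "black"


def to_dot(transitions):
    """Render the transitions as Graphviz DOT, one arrow per distinct edge."""
    # A set removes duplicates; two arrows between the same pair of states
    # survive only when their colours differ.
    edges = sorted({(t.source, t.target, edge_colour(t)) for t in transitions})
    lines = ["digraph tokeniser {"]
    for source, target, colour in edges:
        lines.append('  "%s" -> "%s" [color=%s];' % (source, target, colour))
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_dot(TRANSITIONS))
```

Saving the printed text to a file and running it through Graphviz's dot program should draw the graph. In this toy data, TagName to Data ends up with both a black and a blue arrow, which mirrors the "two separate transitions between the same states" case Philip` mentions.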
[18:55:00.0000] <Philip`> http://canvex.lazyilluminati.com/misc/states8.png [18:55:01.0000] <Philip`> (Hmm, it took fourteen lines) [18:56:00.0000] <Hixie> sweet [18:57:00.0000] <Hixie> that's totally awesome [18:58:00.0000] <Philip`> Now I just need to make it able to generate the spec text from the algorithm ;-) [18:58:01.0000] <Hixie> hah [19:00:00.0000] <Philip`> /me wonders if people have experience of how much more time it takes to implement tree construction compared to tokenisation [19:00:01.0000] <Hixie> about twice as long to write, about three times as long to debug, iirc [19:00:02.0000] <Hixie> but it's not especially hard [19:00:03.0000] <Hixie> just tedious [19:07:00.0000] <rubys> why is there a blue arrow from data to data? [19:11:00.0000] <Philip`> Because I modified the algorithm so any case which is triggered by EOF and causes a transition into the Data state, was changed to transition into the EOFData state [19:11:01.0000] <Philip`> but the relevant part inside the Data state bit of the algorithm doesn't transition into the Data state [19:12:00.0000] <Philip`> (because I didn't bother writing in the "stay in the same state" bits explicitly) [19:12:01.0000] <Philip`> so that could be considered a bug in my old-algorithm-to-new-algorithm transformation code, but it'd require too much effort to fix :-) [19:18:00.0000] <Philip`> Hmm, it's far too easy to get exponential growth in these things [19:20:00.0000] <Philip`> http://canvex.lazyilluminati.com/misc/states9.png - I'm not sure why it's gone quite that bad [19:21:00.0000] <Hixie> holy crap what the hell is that [19:21:01.0000] <Hixie> states * pcdata etc? 
[19:22:00.0000] <Philip`> Yes [19:23:00.0000] <Philip`> I suppose it's unhappy because lots of states emit start/end tag tokens when they see EOF, and the tokeniser can't tell what the tree constructor is going to do to the content model flag when that happens, so I assume it could end up being set to anything, which causes unpleasant growth [19:23:01.0000] <Philip`> ("I assume" = "I tell the code to assume") [19:26:00.0000] <Philip`> Looks like that is the case - http://canvex.lazyilluminati.com/misc/states10.png is far better without the EOFs [19:27:00.0000] <Lachy> Philip`: what are you using to create those flow charts? [19:28:00.0000] <Philip`> Graphviz [19:30:00.0000] <Philip`> (It does tend to collapse into a mass of unreadable squiggles when you get past a certain size, and I always tend to use it on things that approach that size, but I've not heard of anything else that does the same kind of thing) [19:31:00.0000] <Philip`> (Uh, "same kind of thing" = drawing graphs, not collapsing into squiggles) [19:46:00.0000] <minerale> What is whatwg ? [19:47:00.0000] <wildcfo> u mean this channel? [19:47:01.0000] <minerale> The website needs an 'about' link [19:47:02.0000] <minerale> I just saw the site in a slashdot sig, went there and was not sure how it related to silverlight [19:50:00.0000] <minerale> is it some kind of social front end to w3c's html specifications? 
[19:51:00.0000] <Hixie> minerale: see http://blog.whatwg.org/faq/#whattf [19:52:00.0000] <Hixie> minerale: we're basically the renegade group that started html5 [19:57:00.0000] <Philip`> My entirely unoptimised C++ tokeniser (which no longer segfaults) takes about 0.4 seconds for the HTML5 spec, which doesn't seem too bad [19:59:00.0000] <Philip`> (It's certainly a bit useless, because it just computes all the tokens and then memory-leaks them away) [20:12:00.0000] <Hixie> Philip`: yeah tokenising is easy [20:13:00.0000] <Hixie> Philip`: the tree construction is definitely the more expensive part [00:46:00.0000] <Hixie> http://html5.googlecode.com/svn/trunk/data/ [00:46:01.0000] <Hixie> enjoy [00:47:00.0000] <Hixie> (hsivonen, jgraham, Philip`, anyone else writing an HTML5 parser ^) [01:28:00.0000] <hsivonen> Hixie: thank you [01:32:00.0000] <Hixie> hm [01:32:01.0000] <Hixie> so i have data on attributes-per-element and suchlike [01:32:02.0000] <Hixie> but i don't know exactly what you want to know [01:33:00.0000] <hsivonen> cumulative percentages of x% had 0 attributes, y% had <=1 attributes, z% had <=2 attributes, etc. [01:33:01.0000] <Hixie> hm [01:36:00.0000] <Hixie> hm [01:36:01.0000] <Hixie> if 10 elements had 0 attributes [01:36:02.0000] <Hixie> and i know there were 20 elements [01:36:03.0000] <Hixie> and 5 elements had 1 attribute [01:36:04.0000] <hsivonen> I meant element instances, not element names, btw [01:37:00.0000] <Hixie> that means 75% had <= 1 [01:37:01.0000] <Hixie> right? 
[01:37:02.0000] <Hixie> yeah [01:37:03.0000] <Hixie> i know [01:37:04.0000] <hsivonen> yes [01:37:05.0000] <Hixie> so i just need to add numbers until i get to one where i don't know the number [02:12:00.0000] <Hixie> hsivonen: ok, see http://html5.googlecode.com/svn/trunk/data/misc.txt [02:16:00.0000] <hsivonen> Hixie: thank you [02:17:00.0000] <Hixie> my pleasure [02:17:01.0000] <hsivonen> elements with <= 0 attributes: 33.5% is lower than I would have guessed [02:18:00.0000] <Hixie> most documents consist primarily of <td>s with bgcolors, <font>s, and such like [02:19:00.0000] <hsivonen> I had guessed that 3 attributes is the common case. not such a bad guess [02:20:00.0000] <hsivonen> I readjust my guess to 5 [02:20:01.0000] <Hixie> amusingly, the more documents i scan, the greater the portion that is XHTML [02:21:00.0000] <Hixie> in a sample of several dozen billion documents, it was about 0.2%, vs 0.02% for a sample of only a few billion (smaller sample being biased towards western pages with higher page rank) [02:22:00.0000] <Hixie> (0.2% is vs 97.5% for text/html) [02:29:00.0000] <hsivonen> Hixie: testing whether the stack has "table" in table scope is the same as checking whether there's a "table" on the stack at all, right? [02:30:00.0000] <zcorpan_> Hixie: did you find anything with <! ">" > ? [02:30:01.0000] <Hixie> zcorpan_: didn't have a chance to look into that yet [02:30:02.0000] <zcorpan_> ok [02:30:03.0000] <Hixie> zcorpan_: but the fact that only IE does it makes me think it's not a big deal [02:30:04.0000] <Hixie> hsivonen: um [02:31:00.0000] <Hixie> hsivonen: yeah, i guess so [02:31:01.0000] <hsivonen> Hixie: ok. thanks [02:31:02.0000] <Hixie> does the spec ever ask that? [02:33:00.0000] <hsivonen> Hixie: yes [02:33:01.0000] <hsivonen> Hixie: I sent email [02:33:02.0000] <Hixie> k [02:35:00.0000] <othermaciej> Hixie: so obviously XHTML lowers your pagerank! [02:35:01.0000] <othermaciej> evil google conspiracy! 
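The cumulative figures hsivonen asked for earlier (the share of element instances with at most n attributes) come from the running-sum arithmetic Hixie works through: with 20 elements in total, 10 having 0 attributes and 5 having 1, 75% have at most 1. A minimal sketch of that calculation follows; the function name and the use of None for an unknown count are assumptions for illustration, not anything taken from Hixie's tooling.

```python
def cumulative_percentages(counts, total):
    """Return (n, percent of elements with <= n attributes) pairs.

    counts[n] is the number of element instances with exactly n attributes.
    A None entry marks an unknown count; accumulation stops there, matching
    "add numbers until i get to one where i don't know the number".
    """
    running = 0
    result = []
    for n, count in enumerate(counts):
        if count is None:
            break
        running += count
        result.append((n, 100.0 * running / total))
    return result


if __name__ == "__main__":
    # Hixie's example: 20 elements, 10 with no attributes, 5 with one.
    for n, percent in cumulative_percentages([10, 5], 20):
        print("elements with <= %d attributes: %.1f%%" % (n, percent))
```

With Hixie's example numbers this gives 50.0% for at most 0 attributes and 75.0% for at most 1.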
[02:35:02.0000] <Hixie> othermaciej: lol [02:35:03.0000] <Hixie> i wouldn't be surprised if that was actually true [02:36:00.0000] <Hixie> i don't think google really supports xhtml [02:36:01.0000] <Hixie> we probably treat it as text/html and get all confused or something [02:37:00.0000] <zcorpan_> like mobiles? [02:37:01.0000] <zcorpan_> :) [02:43:00.0000] <Hixie> yeah, probably [02:53:00.0000] <hsivonen> Hixie: according to markp and Matt Cutts on rubys' blog, the XHTML non-support is changing [02:58:00.0000] <hsivonen> hmm. actually, neither of them said anything about Google parsing XHTML right... [03:01:00.0000] <othermaciej> I wonder how much of the nominal html on the web is mobile-targeted (and therefore not really parsed as xhtml) [03:29:00.0000] <zcorpan_> othermaciej: btw, did you debug why dom2string hit a "Maximum call stack size exceeded" error in webkit? [03:31:00.0000] <othermaciej> zcorpan_: haven't had time so far [03:31:01.0000] <othermaciej> zcorpan_: can you remind me of the relevant URL? [03:31:02.0000] <othermaciej> I can try it now [03:34:00.0000] <zcorpan_> othermaciej: http://simon.html5.org/temp/html5lib-tests/ [03:38:00.0000] <othermaciej> zcorpan_: thanks [03:42:00.0000] <gsnedders> does any HTML5 document meet the nesting requirements once parsed? [03:43:00.0000] <zcorpan_> gsnedders: what nesting requirement? [03:44:00.0000] <gsnedders> zcorpan_: things like <div>test<p>test</p></div> [03:44:01.0000] <hsivonen> gsnedders: yes. [03:44:02.0000] <gsnedders> like, the content model (I say remembering the name) [03:44:03.0000] <zcorpan_> gsnedders: oh. no. [03:44:04.0000] <hsivonen> gsnedders: no [03:45:00.0000] <zcorpan_> gsnedders: any stream of characters results in a tree. 
but it might not conform to the content model rules [03:46:00.0000] <gsnedders> I'm just thinking about how plausible it'd be to take arbitrary input and output (machine-checkable) conformant HTML5 [03:48:00.0000] <hsivonen> gsnedders: you'd probably need methods similar to what John Cowan's TagSoup uses [03:48:01.0000] <hsivonen> gsnedders: the HTML5 parsing algorithm itself specifically is not about doing that [03:48:02.0000] <gsnedders> hsivonen: I know. I was just wondering how much it does do in itself. [03:48:03.0000] <othermaciej> even things that parse without parse errors could result in a non-conforming document [03:49:00.0000] <zcorpan_> gsnedders: it builds a tree [03:49:01.0000] <hsivonen> gsnedders: it makes sure that tables don't have intervening cruft and it moves stuff between head and body to head [03:49:02.0000] <gsnedders> My issue was really as to how close to being conforming the output of it was, and whether what it did change made those sections conforming [03:50:00.0000] <othermaciej> I wonder if all the machine-checkable conformance criteria are practically machine-fixable [03:50:01.0000] <gsnedders> othermaciej: invalid dates won't be. [03:50:02.0000] <gsnedders> (short of dropping them) [03:51:00.0000] <hsivonen> othermaciej: everything that is machine-checkable is machine-fixable to the point that the machine checker doesn't know the difference (but the result can be totally bogus) [03:51:01.0000] <zcorpan_> <title>s in body are still moved to head too, right? [03:51:02.0000] <othermaciej> you could use an ultra-lenient best-guess date parser [03:51:03.0000] <hsivonen> case in point: filling alt attributes with junk [03:51:04.0000] <gsnedders> othermaciej: even that will have limitations. 
[03:51:05.0000] <hsivonen> or copying src to longdesc to please an accessibility checker [03:52:00.0000] <othermaciej> whether an alt attribute is junk isn't machine-checkable, really [03:52:01.0000] <hsivonen> othermaciej: that's my point [03:52:02.0000] <othermaciej> it might be that an image is a picture of the text "asdfjkl; i hate conformance checking" [03:52:03.0000] <othermaciej> and so that would be totally valid alt text [03:53:00.0000] <hsivonen> so anything that is non-conforming in a machine-checkable way can be replaced with stuff that is semantically junk but that is syntactically ok [03:53:01.0000] <othermaciej> I guess it depends on how you want to fix things [03:54:00.0000] <othermaciej> attribute values that would be discarded don't really matter as much as violating content models, in a way [03:54:01.0000] <othermaciej> because in the latter case, there might not be a conforming document that looks and acts the same (at least, without rewriting in-page scripts) [03:55:00.0000] <hsivonen> gsnedders: in many cases, you can "fix" content models by wrapping consecutive inline nodes in a single p node [04:20:00.0000] <zcorpan_> perhaps <foo => should be a parse error (since it doesn't do what the spec says in any of ie, safari, opera, firefox) [04:25:00.0000] <othermaciej> what does the spec say to do? [04:26:00.0000] <zcorpan_> create an attribute with the name = [04:26:01.0000] <zcorpan_> | <foo> [04:26:02.0000] <zcorpan_> |   =="" [04:27:00.0000] <zcorpan_> opera, moz, safari drop the attribute. ie creates an attribute with the empty string as the name [04:27:01.0000] <othermaciej> the spec behavior is extremely weird then [04:28:00.0000] <zcorpan_> not really weird. 
but doesn't match any browser and it's not a parse error [04:38:00.0000] <hsivonen> zcorpan_: email time :-) [04:38:01.0000] <zcorpan_> emailed [04:46:00.0000] <othermaciej> zcorpan_: what blows the JS stack on that page is the runner/process mutual recursion [04:47:00.0000] <othermaciej> zcorpan_: not sure offhand why they call each other but perhaps it could be a loop instead [04:48:00.0000] <othermaciej> zcorpan_: at some point we will fix the stack limit, it should probably be higher than it is [04:48:01.0000] <zcorpan_> othermaciej: ah. so it's not the dom2string that is the problem [04:49:00.0000] <othermaciej> zcorpan_: well, it might have been a problem before, but those recurse deeply enough by themselves to exceed the limit [04:58:00.0000] <zcorpan_> othermaciej: works when i rewrote it to be a loop [04:59:00.0000] <othermaciej> zcorpan_: cool [04:59:01.0000] <othermaciej> zcorpan_: thanks for the workaround [05:59:00.0000] <zcorpan_> committed workaround to http://html5.googlecode.com/svn/trunk/parser-tests/ [06:16:00.0000] <Philip`> The OCaml preprocessor makes my brain hurt [06:17:00.0000] <hsivonen> trying to think whether the tree building spec asks implementors to do useless stuff takes time... [06:20:00.0000] <Philip`> Oh, nice, the camlp4 documentation has an example that does precisely what I'm trying to do, which means I don't have to understand anything and can just copy-and-paste it in [06:45:00.0000] <zcorpan_> hmm. getElementsByClassName doesn't take a string as argument. it did before, didn't it? [06:55:00.0000] <Lachy> zcorpan_: gEBCN() has gone through various iterations including a space separated string, varargs and array of strings. [06:56:00.0000] <zcorpan_> yeah. i thought it was either string or array. appears it is array only [07:02:00.0000] <zcorpan_> firefox has implemented it as either string or array [07:02:01.0000] <zcorpan_> it seems [07:03:00.0000] <Lachy> which version of FF supports it? 
[07:04:00.0000] <zcorpan_> 3 [07:04:01.0000] <zcorpan_> or actually, it only supports array when it has 1 item [07:05:00.0000] <Lachy> hopefully that can be fixed before FF3 ships [07:05:01.0000] <zcorpan_> it uses space-separated string [07:05:02.0000] <zcorpan_> that seems to be more practical anyway to me [07:06:00.0000] <Lachy> yeah, in some ways it is, but even with an array, it's not hard to do gEBCN(["foo"]); [07:08:00.0000] <Lachy> the array helps when you're programmatically creating a collection of class names, but the space separated string would probably be better optimised for the majority of cases [07:09:00.0000] <zcorpan_> classList is an array right [07:09:01.0000] <zcorpan_> or can be passed to gEBCN [07:09:02.0000] <Lachy> it probably is [07:10:00.0000] <Lachy> it's a DOMTokenList [07:10:01.0000] <Lachy> http://www.whatwg.org/specs/web-apps/current-work/#domtokenlist0 [07:11:00.0000] <zcorpan_> does that fit the definition of "array" wrt what gEBCN can take as argument? [07:11:01.0000] <Lachy> whether or not a DOMTokenList can be passed to gEBCN would depend on the language binding [07:11:02.0000] <zcorpan_> for ECMAScript [07:13:00.0000] <Lachy> ideally, it should be possible to pass a DOMTokenList in all languages. I suggest you send mail about the issue [07:13:01.0000] <othermaciej> it can be passed, just not clear if the result will be useful [07:13:02.0000] <othermaciej> unless the toString conversion is defined [07:13:03.0000] <othermaciej> to do something good [07:13:04.0000] <othermaciej> which probably it should be [07:14:00.0000] <Lachy> the toString should probably return a space separated list of tokens as a string. [07:15:00.0000] <Lachy> but I don't think toString is relevant, given the current definition of gEBCN accepting an array [07:15:01.0000] <zcorpan_> othermaciej: why does toString matter? 
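The string-versus-array question above comes down to normalising the argument into a set of class tokens before matching. The following is a language-neutral sketch written in Python rather than a DOM binding; `normalize_class_names` and `matches` are hypothetical helper names, and nothing here is taken from the spec text or from any browser implementation.

```python
def normalize_class_names(arg):
    """Turn either accepted argument shape into a set of class tokens.

    A string is treated as a space-separated list of class names;
    any other iterable is treated as one class name per item.
    """
    if isinstance(arg, str):
        return set(arg.split())
    return set(arg)


def matches(element_classes, wanted):
    """True if the element carries every wanted class, in any order."""
    return normalize_class_names(wanted) <= normalize_class_names(element_classes)


# Both call styles select the same elements.
print(matches("foo bar baz", "bar foo"))
print(matches("foo bar baz", ["foo", "bar"]))
```

Under this assumption a token-list object would work in either form, as long as it can be iterated or converted to a space-separated string.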
[07:15:02.0000] <othermaciej> I thought it took a string, sorry [07:15:03.0000] <othermaciej> defining it to take an array is weird [07:15:04.0000] <zcorpan_> why [07:15:05.0000] <othermaciej> it should at the very least accept a string also [07:16:00.0000] <othermaciej> it is true that you can do the varargs thing with an array and it also lets you build up a pre-made array [07:16:01.0000] <othermaciej> but it makes the common case more awkward [07:16:02.0000] <othermaciej> and it requires creating a wasteful temporary object for the common case [07:16:03.0000] <Lachy> yeah [07:17:00.0000] <othermaciej> and you can use .apply() to pass an array of arguments to a varargs function in JS [07:18:00.0000] <zcorpan_> perhaps the spec should be changed to only take a space separated string as argument. and defined DOMTokenList.toString to be useful [07:19:00.0000] <Lachy> it could probably be defined to accept either a space separated string, varargs, array or a DOMTokenList. [07:20:00.0000] <Lachy> I think that would be possible to define, using the IDL described in the latest DOM Language Bindings draft [08:49:00.0000] <Philip`> Ooh, looks like each tokeniser state can only ever be entered with one (or zero) type of current token [08:50:00.0000] <Philip`> which is nice because it means I can just cast the current-token pointer without any safety checks, since it's guaranteed to be the right type [13:00:00.0000] <gsnedders> FWIW: http://geoffers.no-ip.com/svn/php-html-5-direct — (Barely started) direct implementation of HTML 5's algorithms [13:03:00.0000] <Philip`> /me wonders why the list of whitespace characters differs from the list used in the tokeniser [13:06:00.0000] <gsnedders> Philip`: where is it different? It's the same, just in a different order. [13:06:01.0000] <Philip`> The tokeniser doesn't do U+000D [13:06:02.0000] <gsnedders> "U+000D CARRIAGE RETURN (CR) characters, and U+000A LINE FEED (LF) characters, are treated specially. 
Any CR characters that are followed by LF characters must be removed, and any CR characters not followed by LF characters must be converted to LF characters. Thus, newlines in HTML DOMs are represented by LF characters, and there are never any CR characters in the input to the tokenisation stage." [13:06:03.0000] <gsnedders> (Input Stream) [13:07:00.0000] <gsnedders> whereas within an attribute a CR could occur through a entity [13:09:00.0000] <hsivonen> gsnedders: the entity case maps to LF as well now [13:10:00.0000] <gsnedders> hsivonen: that was actually changed? ah. I guess the other parts exist to accommodate XHTML5, then? [13:10:01.0000] <gsnedders> actually, XML changes CR as well [13:11:00.0000] <gsnedders> "The only way to get a #xD character to match this production is to use a character reference in an entity value literal." — so you can get it through an entity in XML [13:12:00.0000] <Philip`> Does any of that conversion apply if you do document.write("<b\r>") ? [13:12:01.0000] <gsnedders> Philip`: it goes through the input stream, so yes [13:14:00.0000] <Philip`> Ah, okay [13:23:00.0000] <hsivonen> gsnedders: CR is now in the same table as the Windows-1252 NCRS [13:23:01.0000] <hsivonen> NCRs [13:23:02.0000] <hsivonen> gsnedders: if you put an NCR for CR in XML, you get a CR in the infoset/DOM [13:24:00.0000] <gsnedders> hsivonen: ah. I didn't notice it when I implemented that separately a few days ago (though I did just copy/paste the table and create code automagically). that's what I thought about XML, though. 
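The input stream rule gsnedders quotes above (a CR followed by LF is removed, a CR not followed by LF becomes LF) can be sketched in a few lines. This is a minimal illustration of that one rule only; the function name is an assumption, and the other preprocessing steps (such as handling of NULL characters) are left out.

```python
def normalize_newlines(text):
    """Apply the quoted CR/LF rule so no CR reaches the tokeniser.

    Step 1 drops each CR that is followed by LF (CRLF -> LF).
    Step 2 converts every remaining (lone) CR to LF.
    """
    return text.replace("\r\n", "\n").replace("\r", "\n")


# A tag split across a Windows line ending still has a plain LF after "<p".
print(repr(normalize_newlines('<p\r\ntitle="x">')))
```

With this step in front of the tokeniser, input such as "<x\r>" reaches it as "<x\n>", so the LF is handled as ordinary whitespace after the tag name.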
[13:24:01.0000] <gsnedders> trying to remember what specs say when so tired probably isn't sensible :) [13:25:00.0000] <hsivonen> /me notes that the fragment case does bad things to control flow [14:21:00.0000] <Philip`> /me wishes he could find a nice way to output C code from OCaml without just sticking lots of strings together, and without using 25K-line libraries with far too many dependencies [14:42:00.0000] <Philip`> /me wonders what would be the easiest way to prove the tokeniser terminates (assuming the character stream is finite) [14:42:01.0000] <Philip`> (I don't doubt that it does, but I like having a computer agree with me...) [14:43:00.0000] <hsivonen> oh you are actually proving stuff :-) [14:43:01.0000] <hsivonen> I just trust the html5lib tests :-) [14:46:00.0000] <zcorpan_> what is a conforming test case? don't we need conformance requirements for test cases? [14:46:01.0000] <Dashiva> Can you show the input position is steadily increasing? [14:46:02.0000] <Philip`> Since I've got the tokeniser in this format, I thought I might as well try proving various forms of correctness, to make sure I don't forget all the logic stuff I learnt at university :-) [14:47:00.0000] <Philip`> Dashiva: No, since it doesn't always steadily increase - some states don't always consume a character [14:47:01.0000] <Dashiva> But then you could take those states and how they're always part of a series of states increasing it [14:49:00.0000] <Philip`> I think that'd probably work - I don't know if it can be done automatically, but I guess it shouldn't be hard to manually define a (partial) ordering of states and check that (input_position, state) is always increasing [14:51:00.0000] <hsivonen> Philip`: do you use a read()/unread() model? [14:51:01.0000] <hsivonen> Philip`: can you prove that there are never two consecutive unreads without a read in between? 
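The read()/unread() question above can be checked mechanically at run time, short of a proof: a stream wrapper that refuses a second unread() without an intervening read() can never move back more than one character. The class below is a hypothetical sketch, not code from any of the parsers discussed; for simplicity it treats end of input as something that cannot be unread.

```python
class CharStream:
    """Character stream with a one-step pushback that is enforced."""

    def __init__(self, text):
        self._text = text
        self._pos = 0
        self._can_unread = False  # True only directly after a successful read()

    def read(self):
        """Return the next character, or None at end of input."""
        if self._pos >= len(self._text):
            self._can_unread = False
            return None
        ch = self._text[self._pos]
        self._pos += 1
        self._can_unread = True
        return ch

    def unread(self):
        """Push back the last character; a second consecutive call is an error."""
        if not self._can_unread:
            raise RuntimeError("unread() without an intervening read()")
        self._pos -= 1
        self._can_unread = False


stream = CharStream("ab")
stream.read()
stream.unread()
print(stream.read())
```

If a tokeniser built on such a stream never raises across a test suite, that is evidence (not proof) for the invariant; the ordering argument on (input_position, state) would still be needed to show termination.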
[14:52:00.0000] <hsivonen> that would at least prove it isn't going backwards [14:53:00.0000] <Philip`> Is "unread" where the spec says "reconsume the character in the something state"? [14:53:01.0000] <hsivonen> I've made quite a few optimizations by just looking hard at the tree building algorithm without proving anything... [14:53:02.0000] <hsivonen> Philip`: yes [14:53:03.0000] <hsivonen> Philip`: I call unread() before such transitions [14:55:00.0000] <Philip`> hsivonen: Okay - I've done about the same, with UnconsumeCharacter/ConsumeCharacter [14:56:00.0000] <hsivonen> Philip`: btw, what's your character datatype? a UTF-8 code unit? UTF-16 code unit? UTF-32 code unit? [14:57:00.0000] <Philip`> (I've tried to do as literal a translation of the spec text as possible, but 'unconsume' maps onto the state->state [where 'state' means the whole tokeniser state, not just the explicit ones in the spec] transition model much better than 'reconsume in some other state') [14:58:00.0000] <Philip`> The C++ implementation just uses a wchar_t, which is 2 or 4 bytes, but it ought to be relatively easy to change that to something better if I had any idea of what would work well [15:01:00.0000] <Philip`> I'd like to be able to just start with the original correct algorithm, and then have code that optimises it into a less naive structure, and then output that (as C++ or whatever else you want), though currently I've got none of the optimisation bit :-) [15:04:00.0000] <Philip`> (...and then if the spec changes, it'd all work nicely and easily since the optimisation things would just apply themselves to a different algorithm and produce a new correct tokeniser) [15:04:01.0000] <Philip`> (I expect this is all far more complex than necessary, but it's fun anyway) [15:45:00.0000] <Hixie> hsivonen: i haven't checked, but re </table>, what about: <table><td><ol><li></table> ? 
[15:45:01.0000] <Hixie> vs <table><td><p></table> [15:45:02.0000] <Hixie> and ignoring the missing <tr>s, oops [15:47:00.0000] <hsivonen> Hixie: well, yeah. I guess we want the errors there after all. my point was that <ol> gets one error anyway when it goes on the stack [16:04:00.0000] <Hixie> hsivonen: not in that case, you're in a cell there [16:16:00.0000] <hsivonen> Hixie: ooh. good point. [16:16:01.0000] <hsivonen> Hixie: except then you aren't IN_TABLE [16:18:00.0000] <hsivonen> well, I implemented the spec now [16:20:00.0000] <Hixie> doesn't in-cell defer to in-table in that case? [16:21:00.0000] <hsivonen> no. it closes the cell first [16:24:00.0000] <Hixie> ah [16:24:01.0000] <Hixie> hm [16:24:02.0000] <Hixie> well i'll look at it in detail at some point [16:24:03.0000] <Hixie> :-) 2007-07-07 [17:09:00.0000] <Hixie> gotta love some e-mails [17:09:01.0000] <Hixie> "i wonder why there's still no a special 'key' attribute for every form field implemented." [18:17:00.0000] <zcorpan_> why is there discussion about selectors api on the whatwg list? [18:19:00.0000] <othermaciej> sidetrack [18:55:00.0000] <Philip`> If html5lib throws an exception when parsing, does that count as a bug? [19:43:00.0000] <Philip`> Hmm, I get quite different results in html5lib vs my tokeniser when there's a \r [19:43:01.0000] <Philip`> (Maybe it's doing some kind of translation to \r at some point?) [19:53:00.0000] <Hixie> what results do you get? 
[19:54:00.0000] <Hixie> (you should get a character token U+000A) [20:00:00.0000] <Philip`> With input like "<x\r>", html5lib gives a start tag called "x", whereas I get one called "x\r" [20:02:00.0000] <Philip`> I suppose that doesn't matter since the input stream preprocessing bit says "there are never any CR characters in the input to the tokenisation stage" [20:03:00.0000] <Philip`> (but I don't have any input stream preprocessing - I'm just pushing characters straight into the tokeniser) [20:14:00.0000] <Hixie> ah [20:14:01.0000] <Hixie> you'll want some input stream preprocessing [20:14:02.0000] <Hixie> it's part of the parser spec [20:14:03.0000] <Hixie> ensures there are no NULLs, CRs, etc [20:15:00.0000] <Hixie> it's important because <p\r\ntitle="..."> ...is common in HTML, and you want to not treat that as something other than a <p> tag! [20:15:01.0000] <Hixie> bbiab [23:06:00.0000] <Hixie> hey did we ever hear back on "Steven: I believe that XHTML2 is more backwards compatible than HTML5, and I plan to make a document comparing them to demonstrate it. [23:06:01.0000] <Hixie> "? 
[23:06:02.0000] <Hixie> If there are real problems I want to fix them [23:08:00.0000] <othermaciej> I haven't heard anything [23:13:00.0000] <Hixie> going through this discussion, i actually came across another argument (from Mark Birbeck) for why we _shouldn't_ call it XHTML1.5 [23:13:01.0000] <Hixie> XHTML 1.5 implies it was evolved from XHTML 1.1 [23:13:02.0000] <Hixie> which it isn't (at all), it's evolved from HTML5 [23:14:00.0000] <Hixie> not that it's a big deal, just thought of it [01:20:00.0000] <billyjack> HTMLX [01:21:00.0000] <MikeSmith> HTML5X [02:04:00.0000] <Dashiva> Hixie: I guess it's HTML 1.0.5 then :) [02:25:00.0000] <jgraham> Philip`: If html5lib throws an exception it's almost certainly a bug [02:43:00.0000] <webben> Hixie: I don't know if you follow Amaya development, but they've just implemented http://www.ietf.org/internet-drafts/draft-wilde-text-fragment-07.txt ... which may suggest it would be worth a second look wrt video fragment addressing. [02:44:00.0000] <webben> here's the test the mailing list suggests: http://www.ietf.org/internet-drafts/draft-wilde-text-fragment-07.txt#char=20090,20109 [02:46:00.0000] <Hixie> good to know their focussing on their most critical bugs [02:46:01.0000] <Hixie> they're, even [02:47:00.0000] <Hixie> i don't really have a problem with using fragment identifiers to seek into video, per se, i just don't think it really fits the JS API [02:47:01.0000] <Hixie> i guess we could define the fragment identifier to set the default value of 'start' or whatever that attribute is called [02:47:02.0000] <Hixie> when i last looked at them it was before we had the (more) complex set of attributes we do now [02:57:00.0000] <MikeSmith> Hixie, I know the discussion about naming of the XML serialization is really just a distraction, but I think the idea of trying to avoid ambiguity maybe has some merit .. HTMLX or HTML5X seem possibly appropriate: unambiguous, short and simple, new (and exciting!) 
and much more in the WHATWG spirit than adopting a legacy name with baggage already attached to it [02:58:00.0000] <mikeday> HTML is a legacy name with a fair bit of baggage attached to it :) [03:00:00.0000] <MikeSmith> mikeday - true, but a bit of a different thing, isn't it ... [03:00:01.0000] <mikeday> at least XHTML5 gives the W3C three more versions of XHTML to play with themselves :) [03:00:02.0000] <MikeSmith> heh [03:02:00.0000] <MikeSmith> Anyway, I think that we want the HTML name and that there is no point in using anything but it is clear ... but IMHO we don't really want the XHTML name for the XML serialization and would really gain nothing by using it [03:03:00.0000] <mikeday> seems reasonable, but most people do think of XHTML as being "HTML serialised as XML" [03:03:01.0000] <Hixie> MikeSmith: if it's an issue at all, i agree with DanC that the spec that should change name is XHTML2 (since it isn't really "HTML") [03:03:02.0000] <Hixie> MikeSmith: but i'm not convinced there's an issue, i'm happy for both groups to co-exist [03:04:00.0000] <Hixie> MikeSmith: "XHTML" is what the XML serialisation of HTML is called, it's been that way for years (since 1999 at least) [03:04:01.0000] <MikeSmith> Hixie - the fact that something has been a certain way for years is not a convincing argument for keeping it that way :) [03:05:00.0000] <Hixie> for names, it is [03:05:01.0000] <MikeSmith> I think you better than anybody else can probably understand that well :) [03:05:02.0000] <Hixie> how so? [03:07:00.0000] <webben> Hixie: It's an open source project, so it's not necessarily th [03:07:01.0000] <webben> the case that it was implemented by the core team. [03:07:02.0000] <Hixie> webben: true [03:08:00.0000] <MikeSmith> Hixie - Well, because your work on HTML5 has been in many ways a clear break with the past of XHTML, a reassessment of what the real problems are and a different approach to fixing them [03:09:00.0000] <mikeday> a clear break... 
<br clear="all">, presumably. [03:09:01.0000] <Hixie> Mike: i don't understand how that means we should change names. If anything, the whole point of keeping backwards compatibility suggests we should not change the names willy nilly. [03:11:00.0000] <MikeSmith> I don't think it'd be willy nilly at all and backwards compatibility with respect to supporting existing content on the Web is something very different than backwards compatibility in naming [03:11:01.0000] <MikeSmith> we could not break anything by renaming the XML serialization [03:12:00.0000] <MikeSmith> we would break a whole lot of stuff by not supporting backward compatibility with existing content [03:12:01.0000] <webben> or little bits of stuff, in the cases where backwards compatibility is already broken [03:12:02.0000] <webben> Can't this issue get escalated to TBL already? [03:13:00.0000] <webben> I think XHTML5 is a crazy choice, but this issue does seem to be taking up too much time. [03:13:01.0000] <mikeday> I like the sound of XHTML II Turbo Championship Edition [03:13:02.0000] <MikeSmith> It could be decided by Tim but why should it be? [03:13:03.0000] <webben> MikeSmith: as i understand it, that's W3C process. [03:13:04.0000] <MikeSmith> if a whole new name is chosen we don't need to ask for anybody's permission [03:14:00.0000] <webben> MikeSmith: Yes, but that doesn't seem likely at this point. [03:14:01.0000] <MikeSmith> and if a whole new name is chosen, we don't need to continue unproductive discussions with others about it [03:15:00.0000] <MikeSmith> webben - nothing is sacred, nothing is written in stone at this point, our options are wide open [03:15:01.0000] <MikeSmith> with regard to this name [03:15:02.0000] <webben> MikeSmith: I mean if the WHATWG community is not already persuaded this is a bad idea, it's unlikely more arguments will persuade them. The issues involved aren't technical. 
[03:16:00.0000] <Hixie> there really is no issue here [03:16:01.0000] <webben> Hixie: If there was no issue, then it wouldn't be an issue to change the name. [03:16:02.0000] <webben> It would just be changed. [03:16:03.0000] <Hixie> the name is "HTML5" [03:16:04.0000] <webben> obviously there are issues here [03:17:00.0000] <Hixie> we don't have a spec called "XHTML5" [03:17:01.0000] <MikeSmith> Hixie - true that. but within the spec the term is used [03:18:00.0000] <webben> Hixie: Then it would be no issue to disassociate HTML5 from XHTML5. [03:18:01.0000] <Hixie> the only mention of "XHTML5" in the spec is in relation to a conformance class [03:19:00.0000] <Hixie> webben: you can't disassociate them. they're associated in the mind of millions of people, with thousands of blog posts and e-mails already referring to the XML HTML5 variant by that name. [03:19:01.0000] <mikeday> millions of people? seriously? [03:19:02.0000] <Hixie> well, "millions" might be overstating the case [03:19:03.0000] <webben> Hixie: The WHATWG community appears to think that it would be feasible for XHTML2 to change its official name. You could come up with an official name for the XML serialization. [03:19:04.0000] <mikeday> /me grins [03:19:05.0000] <Hixie> tens of thousands, certainly [03:19:06.0000] <mikeday> thousands, quite likely [03:19:07.0000] <mikeday> hundreds, almost definitely [03:20:00.0000] <hsivonen> we could remove the string "XHTML5" from the spec [03:20:01.0000] <webben> Let's not pretend it's impossible for either group to change their name. [03:20:02.0000] <Hixie> webben: i really don't care if xhtml2 changes its name. i don't think it's an issue at all. [03:20:03.0000] <mikeday> at least one or two people on the planet know the difference between HTML and XHTML and XML :) [03:20:04.0000] <webben> Hixie: What do you mean by "an issue"? 
[03:20:05.0000] <hsivonen> but putting a name for the XML serialization that doesn't begin with "XHTML" makes no sense [03:20:06.0000] <Hixie> webben: a problem. a topic worth consideration. something that will cause harm and that needs changing. [03:21:00.0000] <MikeSmith> Hixie - but that doesn't mean it needs to stay that way, and it's not a convincing argument for keeping it that way ... if we use a different name, by a month from now, a few hundred or thousands will have blogged about that [03:21:01.0000] <webben> Hixie: Ah okay, I mean it's an issue of disagreement. [03:21:02.0000] <MikeSmith> hsivonen - why? [03:21:03.0000] <Hixie> MikeSmith: even if we use a different name, people will still refer to "xhtml", and the version of the language is "5" (as in html5), so they'll call it "xhtml5". [03:21:04.0000] <hsivonen> MikeSmith: because the XML formulation of HTML is known as "XHTML" [03:22:00.0000] <Hixie> MikeSmith: it's just a natural association [03:22:01.0000] <webben> I don't see any evidence that the majority would not use a different name for XHTML2 or XHTML5. [03:22:02.0000] <hsivonen> MikeSmith: the XHTML2 WG is confusing things by calling something that isn't an XML formulation of HTML "XHTML" for their marketing purposes [03:23:00.0000] <hsivonen> MikeSmith: but by and large, the "XHTML" people out there care about is the XML formulation of HTML 4 [03:23:01.0000] <webben> Which concedes the general principle that this is an issue. [03:23:02.0000] <webben> the whoever are "confusing things" [03:23:03.0000] <Hixie> webben: the majority of people will never even hear about xhtml2 [03:23:04.0000] <webben> confusion is an issue (something that causes harm) [03:23:05.0000] <webben> Hixie: Who is "people"? [03:23:06.0000] <Hixie> web authors [03:24:00.0000] <Hixie> people who will hear about xhtml [03:24:01.0000] <Hixie> people who write html documents [03:24:02.0000] <webben> Hixie: The majority of web authors haven't heard about XHTML. 
[03:24:03.0000] <webben> Those who have heard of XHTML are quite likely to have heard about XHTML2. [03:24:04.0000] <Hixie> i disagree [03:25:00.0000] <hsivonen> webben: I don't believe that statement [03:25:01.0000] <webben> Hixie: random blog commenters author HTML. [03:25:02.0000] <Hixie> "-//W3C//DTD XHTML 1.0 Transitional//EN" is the most common DOCTYPE [03:25:03.0000] <MikeSmith> hsivonen - in the past the XML formulation of HTML has been called XHTML, but there is no fundamental reason why it needs to remain that way [03:25:04.0000] <webben> Hixie: yes, usually created by a CMS. [03:25:05.0000] <MikeSmith> and "XHTML" now means more than just "XML formulation of HTML" [03:25:06.0000] <webben> indeed [03:25:07.0000] <MikeSmith> implies more [03:25:08.0000] <webben> it means tag soup [03:25:09.0000] <Hixie> webben: well, in any case, if they haven't heard about xhtml at all, then there's no problem at all. [03:26:00.0000] <hsivonen> webben: more to the point, of the people who *think* they are using XHTML, virtually 100% purport to use XHTML 1.0 or 1.1 [03:26:01.0000] <Hixie> webben: since they can hardly get confused about version numbers without hearing about it in the first place [03:26:02.0000] <webben> hsivonen: Well of course. They shouldn't be using XHTML2 yet. [03:26:03.0000] <MikeSmith> XHTML means HTML is valid against a certain doctype [03:26:04.0000] <webben> hsivonen: and if you mean purports, all that reflects is CMS choices. [03:26:05.0000] <hsivonen> MikeSmith: what does it mean? any spec published by the XHTML2 WG regardless of the nature of the language specified? [03:26:06.0000] <MikeSmith> or aims to be [03:26:07.0000] <webben> e.g. 
WordPress [03:26:08.0000] <webben> *if you mean by purports the doctype [03:27:00.0000] <MikeSmith> hsivonen - I think we need to quit caring what it means and choose a different name [03:27:01.0000] <hsivonen> webben: by purports, I mean the Appendix C delusion [03:27:02.0000] <Hixie> MikeSmith: i don't understand why you think there's a problem with the name. [03:27:03.0000] <MikeSmith> XHTML is a net liability to us as a name [03:27:04.0000] <webben> hsivonen: Yes precisely. it hardly argues XHTML is a good name for an XML serialization. [03:27:05.0000] <MikeSmith> Hixie - because of its legacy and its connotations [03:28:00.0000] <hsivonen> webben: I think that argument is more persuasive than any argument involving a claim by the XHTML2 WG [03:28:01.0000] <Hixie> well that's a new argument, i have to say [03:28:02.0000] <Hixie> haven't heard that one before [03:28:03.0000] <MikeSmith> and the fact that using it is creating contentious, distracting, unproductive debate [03:28:04.0000] <webben> I'm pretty sure I've made it before in this channel. [03:28:05.0000] <hsivonen> "XHTML" is much more tainted by the infamous Appendix C than associated with XHTML2 [03:29:00.0000] <Hixie> MikeSmith: i don't think changing the name will affect that, though, i think the solution to that is to do what the html5 spec does, and basically relegate xhtml to a bastard stepchild status. [03:29:01.0000] <Hixie> MikeSmith: there's a reason the spec is just called "html5" and not "html5 and xhtml5" or "(x)html5" or anything [03:29:02.0000] <MikeSmith> there are much more important battles to fight; creating one by staying wedded to that name is a fight that isn't worth fighting [03:30:00.0000] <MikeSmith> Hixie - I know there's a reason, and I understand it to be a very good reason [03:30:01.0000] <webben> I'd prefer that we didn't think of it in terms of battles. 
[03:30:02.0000] <Hixie> i'm totally with you that we shouldn't fight this battle [03:30:03.0000] <MikeSmith> hsivonen - "tainted" is a apt word [03:30:04.0000] <Hixie> in fact, i've every intention of just walking away from it [03:30:05.0000] <Hixie> and ignoring it [03:31:00.0000] <MikeSmith> Hixie - fine for you to make that choice ... unfortunately, that's really not an option for the rest of us [03:31:01.0000] <MikeSmith> ignoring a problem does not make it go away, sadly [03:31:02.0000] <Hixie> why not? [03:31:03.0000] <Hixie> there's no problem [03:31:04.0000] <Hixie> that's my point [03:31:05.0000] <hsivonen> Hixie: the problem is that common concepts need short names [03:32:00.0000] <Hixie> so xhtml has a bad rep. big deal. xhtml is dead, long live html. [03:32:01.0000] <webben> Hixie: I think that to persuade people there is no problem, you'd have to not ignore the "problem" that people think there is a problem. [03:32:02.0000] <hsivonen> (except in the SGML circles where the most common concepts have the longest names) [03:32:03.0000] <Hixie> webben: heh [03:32:04.0000] <MikeSmith> hsivonen - heh. lol about that SGML naming comment [03:33:00.0000] <webben> XHTML doesn't have a bad rep except among a tiny minority of web authors. [03:34:00.0000] <Hixie> xhtml having a bad rep is the only reason MikeSmith gave for changing its name [03:34:01.0000] <webben> Most XHTML users are quite happy to serve Appendix C style faux-XHTML. [03:34:02.0000] <Hixie> so if xhtml doesn't have a bad rep, then great, we don't have to change the name [03:34:03.0000] <MikeSmith> Hixie - No, I didn't say the bad rep was the only reason [03:34:04.0000] <webben> It's not that it has a bad rep, it's that it means something different. [03:34:05.0000] <webben> (which is why it has a bad rep among that tiny minority) [03:35:00.0000] <Hixie> MikeSmith: i didn't say you said it was the only reason. i said it was the only reason you gave. 
[03:35:01.0000] <Hixie> webben: means something different than what? [03:35:02.0000] <MikeSmith> I said it's ambiguous and it implies things more than just an XML serialization of HTML [03:35:03.0000] <hsivonen> webben: we want those authors to migrate to HTML5 as text/html, not XHTML5 with wrong Content-Type [03:35:04.0000] <MikeSmith> it implies validity instead of just well-formedness [03:35:05.0000] <Hixie> hsivonen: don't worry, they can't migrate to xhtml5 with wrong mime type. xhtml5 is defined in terms of mime types. :-D [03:35:06.0000] <MikeSmith> among other things [03:35:07.0000] <Hixie> uh [03:35:08.0000] <Hixie> what? [03:36:00.0000] <Hixie> html4, xhtml1, html5, and xhtml5 have exactly the same concepts of conformance [03:36:01.0000] <MikeSmith> A set of authoring practices [03:36:02.0000] <webben> hsivonen: Well sure. That's another good reason not to have XHTML5 as a name. [03:36:03.0000] <webben> hsivonen: It makes the intention much clearer. [03:36:04.0000] <MikeSmith> xhtml1 is a language with a schema [03:36:05.0000] <webben> XHTML1 as She is Wrote isn't. [03:37:00.0000] <MikeSmith> xhtml in fact has rules that are a superset of XML conformance and well-formedness rules [03:37:01.0000] <MikeSmith> such as, requiring that document instances must contain a doctype [03:37:02.0000] <Hixie> MikeSmith: xhtml1 and html4 both have a set of conformance rules that you must obey in order to be conforming. [03:38:00.0000] <webben> MikeSmith: XHTML doesn't require that; XHTML 1.0 and 1.1 do. [03:38:01.0000] <webben> AFAIK. 
[03:38:02.0000] <Hixie> MikeSmith: just like the html5 and xhtml5 languages [03:38:03.0000] <MikeSmith> Hixie - yeah, and they are two different sets of rules [03:38:04.0000] <Hixie> MikeSmith: such as having a doctype in the case of xhtml1.x, and such as putting valid URIs in href="" attributes [03:38:05.0000] <Hixie> MikeSmith: they have four different sets of rules [03:39:00.0000] <Hixie> MikeSmith: heck even xhtml1.0 and 1.1 have different rules. [03:39:01.0000] <webben> MikeSmith: see e.g. http://www.w3.org/TR/xhtml1/#well-formed [03:39:02.0000] <Hixie> wouldn't be much point having different versions if they were identical! [03:39:03.0000] <hsivonen> Hixie: you are presupposing that there is a point! [03:40:00.0000] <Hixie> hsivonen: granted! [03:40:01.0000] <webben> hsivonen: Ruby. That /is/ a point. [03:40:02.0000] <webben> http://www.w3.org/TR/xhtml11/changes.html#a_changes [03:40:03.0000] <webben> Talking of which, has WHATWG specced Ruby in HTML yet? [03:40:04.0000] <Hixie> webben: no, it's on my pile of things to do [03:40:05.0000] <webben> cool :) [03:41:00.0000] <hsivonen> webben: the joke is that Ruby only sort of works in IE which wants it as text/html [03:41:01.0000] <webben> hsivonen: I was going to say, should be easy to include given IE's support in text/html. [03:41:02.0000] <webben> Amaya supports Ruby too. [03:42:00.0000] <Hixie> "easy" isn't the word i would use [03:42:01.0000] <krijnh> Too? Does Amaya support anything else? :) [03:42:02.0000] <Hixie> anne did some research on the error handling rules we'll have to add [03:42:03.0000] <webben> There's also: https://addons.mozilla.org/en-US/firefox/addon/1935 [03:42:04.0000] <hsivonen> webben: I forget the standard quatifier: in set {Trident, Presto, Gecko, WebKit} [03:42:05.0000] <Hixie> but i haven't checked if they're enough [03:42:06.0000] <webben> hsivonen: Mozilla developers seem to believe that having extensions support things counts as support. 
[03:43:00.0000] <hsivonen> webben: Mozilla developers or random Bugzilla commentators? [03:43:01.0000] <webben> hsivonen: If the core devs don't disagree, doesn't seem much difference. [03:43:02.0000] <MikeSmith> The fact that the XML serialization of HTML has a quite different sets of rules from the XHTML 1.x/2 language would seem to me to suggest that it would be good to have a clearly different name for it, that unambiguously disassociates it from the extra rules of XHTML [03:43:03.0000] <webben> it's a status quo [03:44:00.0000] <MikeSmith> webben - true on that comment about extensions being counted as support [03:44:01.0000] <Hixie> MikeSmith: the XML serialisation of HTML5 has very similar rules to XHTML1.0 [03:44:02.0000] <Hixie> MikeSmith: and both of those have very different rules to XHTML2 [03:45:00.0000] <webben> but different semantics [03:45:01.0000] <webben> (as with HTML) [03:45:02.0000] <hsivonen> webben: I hadn't noticed that regarding layout feature support. true for UI features, though [03:45:03.0000] <MikeSmith> having third-party extensions to your application, ones that you can blow off any responsibility at all for, is not a way to ensure that users have the best user experience of your application [03:45:04.0000] <webben> My point is less about what is good or bad practice, as that assessment of "what is supported" must include extensions. [03:47:00.0000] <webben> There are major advantages to having extensions, because they provide a testing ground for UIs for things like microformats. [03:49:00.0000] <webben> Even if they do also create an incentive not to add features to core. 
[03:49:01.0000] <MikeSmith> Of course there are big advantages, but having a powerful plug-in architecture does not mean that you should require users to install extensions in order to have access to key features [03:49:02.0000] <webben> MikeSmith: agreed [03:50:00.0000] <hsivonen> for practical purposes, *Gecko* doesn't support Ruby even if there's a fringe way to hack support into Firefox [03:51:00.0000] <MikeSmith> hsivonen - is that true of the mozilla2 codebase also? [03:52:00.0000] <Hixie> what mozilla2 codebase? [03:52:01.0000] <hsivonen> MikeSmith: dunno. I'm thinking the Firefox 2/3 timeframe [03:53:00.0000] <hsivonen> (I don't even know if the Mozilla2 major refactorings have gone forward yet) [03:53:01.0000] <MikeSmith> Hixie - I guess I assumed there is already exising mozilla2 code, from reading Brendan's roadmap stuff [04:03:00.0000] <webben> hsivonen: Why is what a particular rendering engine supports significant? [04:03:01.0000] <webben> surely what's important is: what can end-users readily do with some markup [04:03:02.0000] <webben> if it's implemented in Amaya only, they can't really do much, because switching to Amaya is weird [04:03:03.0000] <Hixie> MikeSmith: as far as i know there's only one codebase [04:03:04.0000] <webben> but installing a Fx extension is a piece of cake [04:04:00.0000] <webben> the rendering engine is only one component of the end-user experience that the Mozilla ecoystem offers [04:05:00.0000] <webben> sorry not, why is it significant, why is it the only significant thing i should say [04:05:01.0000] <webben> obviously core Gecko support would be preferable. [04:06:00.0000] <MikeSmith> Hixie - OK. I thought Brendan had said they'd try to be shipping Firefox 4 some time in 2008 based on the new/refactored moz2 codebase (including integrating Tamarin in) ... seems it would be difficult for them to ship 2008 if the work hasn't actually started yet [04:06:01.0000] <Hixie> MikeSmith: indeed. 
[04:07:00.0000] <webben> interesting, according to https://bugzilla.mozilla.org/show_bug.cgi?id=33339, IE's Ruby-in-HTML implementation does conform roughly to a working draft of Ruby that allowed that. [04:09:00.0000] <MikeSmith> /me wonders what the timeframe is for browsers based on the QT port of Webkit might be [04:11:00.0000] <MikeSmith> I think (hope) having a Qt-based WebKit is going to be a huge boost for getting better browsers onto more mobile devices [04:11:01.0000] <webben> is it? [04:12:00.0000] <webben> what do current devices that might use a QT-based webkit use now? [04:15:00.0000] <MikeSmith> webben - well, they do have one very good option already, which is Opera Mobile [04:15:01.0000] <MikeSmith> for the Qtopia platform [04:15:02.0000] <webben> What would webkit add to that? [04:16:00.0000] <webben> (not saying it would be a bad thing, just wondering what the big deal would be) [04:16:01.0000] <MikeSmith> webben - choice? [04:16:02.0000] <webben> ah okay [04:18:00.0000] <MikeSmith> and more competition, and thus more incentive to add more features and improve performance ... [04:19:00.0000] <MikeSmith> and make users and device manufacturers and mobile operators realize we don't need to keep shipping handsets with the sucky browsers that many of them now have [04:19:01.0000] <MikeSmith> /me is running out of battery and will have to drop off soon [04:21:00.0000] <MikeSmith> hey this is cool (to me at least) - [04:21:01.0000] <MikeSmith> [[ [04:21:02.0000] <MikeSmith> <CIA-5> ap * r24088 /trunk/ (12 files in 6 dirs): (log message trimmed) [04:21:03.0000] <MikeSmith> <CIA-5> Reviewed by Maciej. [04:21:04.0000] <MikeSmith> <CIA-5> http://bugs.webkit.org/show_bug.cgi?id=14525 [04:21:05.0000] <MikeSmith> <CIA-5> Support exslt:node-set() [04:21:06.0000] <MikeSmith> <CIA-5> Test: fast/xsl/exslt-node-set.xml [04:21:07.0000] <MikeSmith> <CIA-5> * xml/XSLTExtensions.cpp: Added. 
[04:21:08.0000] <MikeSmith> <CIA-5> (WebCore::exsltNodeSetFunction): A copy of exslt:node-set() implementation [04:21:09.0000] <MikeSmith> ]] [04:55:00.0000] <Philip`> jgraham: Okay - I assumed that was probably the case, and added a bug report [04:55:01.0000] <Philip`> /me needs to fix his OCaml tokeniser so it can help in generating test cases [05:23:00.0000] <zcorpan_> the forum is getting spam again :( [05:43:00.0000] <Philip`> Hooray, the OCaml one works [05:44:00.0000] <Philip`> ...though I don't have any way to test non-PCDATA bits yet [05:57:00.0000] <Philip`> How should "<!---x-->" get tokenised? [05:57:01.0000] <Philip`> html5lib gives a comment containing "x", I get one containing "-x" [06:01:00.0000] <Philip`> "Comment start dash state -> Anything else -> Append a U+002D HYPHEN-MINUS (-) character and the input character to the comment token's data." - looks like html5lib is missing that bit [06:07:00.0000] <zcorpan_> file a bug [06:09:00.0000] <Philip`> I was just trying to work out whose bug it was, since I don't trust my own code at all :-) [06:10:00.0000] <Philip`> but it seems to be theirs, so that's alright [06:27:00.0000] <jgraham> Philip`: Thanks. It seems I was unsubscribed from the bug tracker email [06:29:00.0000] <jgraham> /me wishes the people on public-html would spend their time doing something productive toward the spec or, alternatively, get a hobby that doesn't involve sending me email. 
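Editorial sketch of the "<!---x-->" case discussed above. Only the "comment start dash state" rule is taken from the text Philip` quotes; the other states and their names are a simplified assumption for illustration, parse errors and EOF are ignored, and `tokenise_comment` is a hypothetical helper, not html5lib's API.

```python
def tokenise_comment(text):
    """Return the comment data for a source string such as "<!---x-->"."""
    assert text.startswith("<!--")
    data = []
    state = "comment start"
    for ch in text[4:]:
        if state == "comment start":
            if ch == "-":
                state = "comment start dash"
            elif ch == ">":
                break  # "<!-->": emit an empty comment
            else:
                data.append(ch)
                state = "comment"
        elif state == "comment start dash":
            if ch == "-":
                state = "comment end"
            elif ch == ">":
                break
            else:
                # The rule quoted in the log: append "-" AND the input character.
                data.append("-" + ch)
                state = "comment"
        elif state == "comment":
            if ch == "-":
                state = "comment end dash"
            else:
                data.append(ch)
        elif state == "comment end dash":
            if ch == "-":
                state = "comment end"
            else:
                data.append("-" + ch)
                state = "comment"
        elif state == "comment end":
            if ch == ">":
                break  # comment is complete
            elif ch == "-":
                data.append("-")
            else:
                data.append("--" + ch)
                state = "comment"
    return "".join(data)
```

Under these assumptions `tokenise_comment("<!---x-->")` gives "-x", the result Philip` reports for his tokeniser, while dropping the hyphen in the "comment start dash" branch would give "x".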
[07:04:00.0000] <krijnh> /me agrees with jgraham [07:04:01.0000] <krijnh> 636 onread mails :/ [07:04:02.0000] <krijnh> *unread [07:15:00.0000] <jgraham> Philip`: I think I've fixed both your bugs [07:20:00.0000] <Philip`> Unless I did something wrong again, http://canvex.lazyilluminati.com/misc/testcov_html5lib.png should show how much of the tokeniser is covered by the test1.dat, test2.dat from html5lib [07:21:00.0000] <Philip`> (Each line is one of the algorithm steps from the spec, and black indicates that it's covered by a test) [07:22:00.0000] <Philip`> (There are a few points where it's definitely wrong, since I haven't implemented named-entity tokenisation yet) [07:23:00.0000] <Philip`> ((and some of the states with underscores are not really in the spec, they're just slight tweaks to make it fit the state-machine model better)) [07:25:00.0000] <Philip`> Conclusion: more tests needed :-) [07:45:00.0000] <gsnedders> the spec references "Unicode character class Z". What is it? I can see nothing about character classes in the unicode spec… [07:45:01.0000] <Philip`> http://canvex.lazyilluminati.com/misc/testcov_html5.png - frequency of transitions when parsing the HTML5 spec (with line width proportional to log(freq)) [07:48:00.0000] <Philip`> gsnedders: Looks like it's http://unicode.org/Public/UNIDATA/UCD.html#General_Category_Values [07:48:01.0000] <Philip`> ("Zs: Separator, Space") [07:49:00.0000] <Philip`> (as the third field in http://unicode.org/Public/UNIDATA/UnicodeData.txt) [07:52:00.0000] <Philip`> (Maybe HTML5 should say "...characters that are in the Unicode General Category Zs." instead?) [07:52:01.0000] <gsnedders> "character class" seems to be defined in the first appendix as being a BNF structure (looking it up in the index) [08:22:00.0000] <Philip`> jgraham: All my tests seem to work in html5lib now - thanks :-) [11:03:00.0000] <virtuelv> Hm. 
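Editorial sketch for the "Unicode character class Z" question above: the General Category values Philip` links to can be inspected with Python's standard `unicodedata` module. Reading "class Z" as every category whose name starts with "Z" (Zs, Zl, Zp) is an assumption here; Philip` suggests the spec may really mean Zs only. The helper names are hypothetical.

```python
import unicodedata


def in_class_z(ch):
    """True if ch is in any separator category (Zs, Zl or Zp)."""
    return unicodedata.category(ch).startswith("Z")


def is_space_separator(ch):
    """True only for General Category Zs ("Separator, Space")."""
    return unicodedata.category(ch) == "Zs"
```

With this reading, U+0020 SPACE and U+00A0 NO-BREAK SPACE are Zs, U+2028 LINE SEPARATOR is Zl, and a tab is a control character (Cc) that falls outside class Z altogether.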
[11:03:01.0000] <virtuelv> <section><header><h1>...</h1></header><nav></nav></section> produces a rather broken DOM in Firefox [11:12:00.0000] <Philip`> You could write XHTML5 and then it'd give the right DOM :-) [11:14:00.0000] <virtuelv> Heh. As if [11:15:00.0000] <zcorpan_> as if what? [11:15:01.0000] <zcorpan_> You could write XHTML5 or it'd give the right DOM? [11:16:00.0000] <virtuelv> as if I'll ever serve something with application/* [11:16:01.0000] <zcorpan_> so the former :) [11:16:02.0000] <Philip`> You could write HTML5, and then just use some translation layer on the server to provide XHTML5 to legacy user agents for backward compatibility [11:16:03.0000] <virtuelv> well, in one word, no. [11:16:04.0000] <Philip`> though then you'd need something else for legacy legacy UAs like IE7 [11:16:05.0000] <virtuelv> I'd rather give Firefox users unstyled content [11:18:00.0000] <zcorpan_> header+h1 [11:18:01.0000] <zcorpan_> :P [14:01:00.0000] <hsivonen> when the list of formatting elements is cleared up to a marker, does the rest of the algorithm guarantee that there is a marker? [14:26:00.0000] <Philip`> I can [I think] guarantee that the current token in the tokeniser is always the correct type (e.g. 
it's a tag token if you're appending to the tag token name, and it's a tag token with attributes if you're appending to the attribute name, etc), but I know nothing at all about the tree construction [14:27:00.0000] <Philip`> but I'd really like to look at that at some point and see if it's similarly straightforward to prove things about it, since it's nice when it works :-) [14:27:01.0000] <hsivonen> ok [14:27:02.0000] <hsivonen> yeah, it would certainly be nice if the algorithm was proven correct [14:28:00.0000] <hsivonen> and then if my optimizations are correct :-) [14:29:00.0000] <hsivonen> in many places, as written, the algorithm involves searching the stack twice or thrice per token even though searching once and remembering the stack slot found would be enough as the operations in between don't alter the slot [14:30:00.0000] <Philip`> The main difficulty I have is in convincing myself that my proof method is correct [14:30:01.0000] <Philip`> I guess I should find some way to make the prover output easily-verifiable conditions, so people can check the proof was valid 2007-07-08 [18:42:00.0000] <Philip`> /me finds another html5lib bug [18:43:00.0000] <Philip`> and I now have tests for almost every state transition in the tokeniser, only missing the ones that require non-PCDATA (which I don't handle yet) [19:07:00.0000] <Philip`> http://canvex.lazyilluminati.com/misc/test3.test [19:07:01.0000] <Philip`> hsivonen: I see a couple of types of test failure in your tokeniser [19:08:00.0000] <Philip`> (for "<!doctype! ?>" (too few parse errors) and "<z/0 >" (it misses the attribute), and variations of those) [19:14:00.0000] <Philip`> ((It wasn't intentional for my tests to use "<!doctype!" and "<z/0" so much - that's just what fell out of the sorting function)) [02:30:00.0000] <hsivonen> Philip`: thank you. I fixed bugs exposed by your test cases. 
One test case failure is a bug in your test cases, though: "<z/0 0" that should give 3 errors: non-permitted slash, EOF in attribute name and duplicate attribute "0". [02:30:01.0000] <hsivonen> Philip`: are you planning on contributing your tests to html5lib? [03:20:00.0000] <Hixie> http://lists.w3.org/Archives/Member/w3c-html-cg/2007JulSep/0013.html is interesting [03:24:00.0000] <hsivonen> indeed [03:24:01.0000] <Hixie> the "kitchen sink" threads are also interesting [03:25:00.0000] <Hixie> in that it seems a lot of people on that mailing list don't really understand what's going on [03:25:01.0000] <Hixie> oh well [03:29:00.0000] <hsivonen> Hixie: do you mean vigorously lobbying for a detail that is already in the spec? [03:31:00.0000] <hsivonen> someone really needs to write a primer on diminishing returns, externalities and network effects for the WG, but I'm pretty sure that if someone did, (s)he'd be slammed for not being a real economist [03:44:00.0000] <Hixie> hsivonen: i meant like being worried that XBL2 points to HTML5 and that therefore the security thing might not be defined, etc... missing the whole point that i had to write the security thing anyway, it didn't matter which spec i put it in [03:44:01.0000] <Hixie> anyway [03:44:02.0000] <Hixie> bed time [03:44:03.0000] <Hixie> probably will be online very spottily for the next three weeks [03:53:00.0000] <hsivonen> nn [03:54:00.0000] <hsivonen> Hixie: oh you referred to public-appformats [04:35:00.0000] <Philip`> hsivonen: Oh, whoops, I haven't done anything about duplicate attributes [04:35:01.0000] <Philip`> (I guess html5lib hasn't either, since it passed that test) [04:39:00.0000] <Philip`> Looks a bit irritating how it says to drop the attribute value before you've actually got an attribute value at all... 
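Editorial sketch of the duplicate-attribute problem raised above: the duplicate is detected when the attribute name is complete, before any value exists, so something has to remember to discard the value that follows. One possible approach, assumed here and not taken from any of the implementations discussed, is to hold the attribute as "pending" and only commit it once its value is finished. `TagToken` and its methods are hypothetical names.

```python
class TagToken:
    def __init__(self, name):
        self.name = name
        self.attributes = {}
        self.errors = []
        self._pending_name = None
        self._pending_value = []
        self._drop_pending = False

    def finish_attribute_name(self, name):
        """Called when the tokeniser leaves the attribute name state."""
        self._pending_name = name
        self._pending_value = []
        # A duplicate is a parse error; remember to discard its value later.
        self._drop_pending = name in self.attributes
        if self._drop_pending:
            self.errors.append("duplicate attribute: " + name)

    def append_to_value(self, text):
        self._pending_value.append(text)

    def finish_attribute(self):
        """Commit the pending attribute unless it was flagged as a duplicate."""
        if self._pending_name is not None and not self._drop_pending:
            self.attributes[self._pending_name] = "".join(self._pending_value)
        self._pending_name = None
```

In this sketch the first occurrence of a name wins, and a later duplicate contributes one error and nothing else.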
[04:40:00.0000] <Philip`> /me tries to think of a nice way to handle that [04:41:00.0000] <hsivonen> Philip`: I have a boolean flag [04:42:00.0000] <hsivonen> Philip`: and I defer the actual addition of an attribute til the value is complete or know not to exist [04:42:01.0000] <hsivonen> Philip`: which was also the source of one class of test case failures you found [04:48:00.0000] <Philip`> Adding another state variable makes other things more complex (like when verifying you never have to alter an attribute unless there actually is an attribute), so it'd be nice to avoid that if possible [04:53:00.0000] <Philip`> (Well, it's fine to add a state variable into the C++/etc implementation, but preferably not into the conceptual model of the algorithm) [05:04:00.0000] <jgraham> Philip`: So do you want html5lib commit access (hint, hint)? [05:10:00.0000] <hsivonen> would someone like to volunteer to check an email about bikeshedding, belling the cat and economics 101 for suitability of sending to public-html? in particular, checking whether it is offensively patronizing? [05:11:00.0000] <Philip`> jgraham: Oops, I forgot to respond to hsivonen's second comment - I expect it would be good to add these tests to html5lib (which is why I named the file test3.test already :-) ) [05:12:00.0000] <Philip`> at least once I've fixed the bugs, and added manually-written tests for the other bugs I have in my code [05:12:01.0000] <jgraham> Philip`: I went ahead and gave you commit access whether you wanted it or not :) [05:12:02.0000] <Philip`> jgraham: I just saw that - thanks :-) [05:28:00.0000] <Philip`> /me wonders if it matters that his test cases don't have good descriptions [05:29:00.0000] <jgraham> Philip`: I think I've fixed issue 50. 
Your testcases would be most welcome now so I can have some confidence that I did the right thing [05:29:01.0000] <jgraham> And also because I promised them in the commit log :) [05:30:00.0000] <Philip`> Just trying to fix the duplicate-attribute issue, which hopefully won't take long :-) [05:30:01.0000] <jgraham> Descriptions are good but probably not essential - the treebuilder tests are all description free, for example [05:30:02.0000] <jgraham> But if you can add them, please do :) [05:31:00.0000] <Philip`> I have no idea what most of my test cases are doing, so I don't know how to usefully describe them [05:32:00.0000] <Philip`> Maybe I could convince the test-generating program to work out why it's choosing those particular ones, but that seems like more effort than would be worthwhile [05:33:00.0000] <jgraham> Philip`: I guess if you keep all auto-generated tests to their own file it's fine [05:33:01.0000] <jgraham> /me notices he changed something and forgot to run the treewalkers tests [05:34:00.0000] <hsivonen> hmm. I guess I just send the message at the risk of offending some people [05:35:00.0000] <jgraham> hsivonen: Go for it. At least then we'll get email about how offended people are rather than whether or not Anne should include all his optional tags [05:36:00.0000] <jgraham> which is probably the most boring thread ever [05:36:01.0000] <jgraham> ;) [05:38:00.0000] <hsivonen> jgraham: sent [05:41:00.0000] <hsivonen> enjoy: http://lists.w3.org/Archives/Public/public-html/2007Jul/0507.html [05:56:00.0000] <hsivonen> was I too offensive? [05:57:00.0000] <Philip`> jgraham: Committed the new tests now [05:58:00.0000] <jgraham> Philip`: Cool [05:58:01.0000] <Philip`> including one with duplicate attribute values, which html5lib fails [05:58:02.0000] <Philip`> (hsivonen's implementation passes all those tests now) [05:59:00.0000] <jgraham> hsivonen: No, I don't think so. 
Possibly a little terse, but if people read the links they should get the idea (I'm just reading the joel on software one which I don't think I've seen before) [06:12:00.0000] <zcorpan_> /me likes hsivonen's terse style [06:13:00.0000] <Philip`> Hmm, the html5lib Ruby tokeniser doesn't seem entirely happy with EOFs [06:13:01.0000] <Philip`> (resulting in various things like <"undefined method `+' for :EOF:Symbol">) [06:51:00.0000] <jgraham> Philip`: All your tests seem to pass now [07:14:00.0000] <Philip`> jgraham: That must mean more tests are needed ;-) [08:28:00.0000] <Philip`> More tests says: html5lib doesn't lowercase tag/attribute names [09:04:00.0000] <jgraham> Philip`: We lowercase them at the tree construction stage (because Sam reuses the tokenizer in situations where case is important) [09:06:00.0000] <Philip`> Ah, okay [09:06:01.0000] <hsivonen> jgraham: what are those situations? [09:07:00.0000] <Philip`> How would it best to test that tokenisers do implement what the spec says (with lowercasing names), while accepting that html5lib doesn't do that at that point? [09:08:00.0000] <hsivonen> jgraham: out of curiosity, why didn't you parametrize this in the tokenizer? [09:08:01.0000] <Philip`> (And does html5lib work correctly when you do <a a=1 A=2>?) [09:09:00.0000] <hsivonen> (I think I've been a bit naïve with the way I handle lower casing per spec instead of having a readCaseFolded() method) [09:57:00.0000] <Philip`> hsivonen: Your entity overflow code doesn't quite work - with input like &#x100000041; the value overflows from 0x10000000 to 0x00000000 and it's never negative so it never hits the overflow-handler [09:58:00.0000] <Philip`> /me will upload tests for that at some point [09:59:00.0000] <hsivonen> Philip`: ouch. 
good point [09:59:01.0000] <gsnedders> hsivonen: Sam uses it for parsing XML [09:59:02.0000] <gsnedders> hsivonen: (the XML having failed to be processed by an XML parser) [10:03:00.0000] <hsivonen> Philip`: fix (I think) checked in [10:03:01.0000] <hsivonen> Philip`: thanks [10:09:00.0000] <Philip`> hsivonen: Seems to work perfectly now [10:11:00.0000] <hsivonen> looks like my code is bad enough to generate community interest after all :-) [10:12:00.0000] <Philip`> I'm always interested in breaking things ;-) [10:14:00.0000] <Philip`> I'd be interested in trying to generate stack-using code like yours, and seeing how that works in comparison with switch-statements or gotos [10:15:00.0000] <Philip`> though I'm not sure how much automatic transformation I can do to extract stackiness, and I'm too lazy to do that manually [10:16:00.0000] <hsivonen> Philip`: my expectation is that the code won't be stack-based once server HotSpot has done its thing [10:16:01.0000] <Philip`> though that reminds me that I need to collect a set of documents for performance testing... [10:16:02.0000] <hsivonen> if the expectation is incorrect, running a byte code-level optimizer would make sense [10:19:00.0000] <hsivonen> Philip`: either way, it seems to me that it is easier to convert the kind of code I have written to unconditional jumps than it would be for a switch (which, OTOH, would guarantee no worse that conditional jumps) [10:21:00.0000] <Philip`> It seems quite possible that HotSpot could give better performance for your code than for switch-based code, if it's doing lots of inlining and tail-call optimisation - it'd be interesting to see how well it works in practice [10:21:01.0000] <hsivonen> Philip`: so whether the use of methods vs. 
one huge switch makes sense depends on what HotSpot really does [10:21:02.0000] <Philip`> That makes sense [10:22:00.0000] <Philip`> Unfortunately C++ doesn't have the advantage of dynamic compilation, so I guess things will act totally differently there [10:22:01.0000] <Philip`> but fortunately it doesn't have dynamic compilation, so you can usually have some idea of what the compiler's actually going to do to your code :-) [10:23:00.0000] <hsivonen> Philip`: the problem is that testing which approach really performs better on HotSpot is a non-trivial task. Which is why I went with an unverified educated (hopefully :-) guess [10:23:01.0000] <hsivonen> Philip`: yeah, I'd bet on switch in the C++ case [10:24:00.0000] <Philip`> That's why I'd like to be able generate different implementation approaches from the same source data, which is still non-trivial but involves much less typing :-) [10:25:00.0000] <hsivonen> besides, for Gecko-like threading (lack thereof), on would want to have a switch-based parser with states broken down even further so that each state reads at most one character [10:25:01.0000] <hsivonen> this way, the state variable would effectively store the current continuation [10:26:00.0000] <hsivonen> and the tokenizer could be interrupted after any input character [10:26:01.0000] <hsivonen> s/on would/one would/ [10:27:00.0000] <Philip`> What happens when you need to look ahead by ~6 characters at once? [10:27:01.0000] <hsivonen> Philip`: I don't. 
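Editorial sketch of the idea hsivonen describes above: if every state consumes at most one character, the state variable acts as the stored continuation and the tokeniser can stop after any input character and resume later. This is a toy with three made-up states and no error handling, not the HTML5 tokeniser; `ResumableTokenizer` is a hypothetical name.

```python
class ResumableTokenizer:
    def __init__(self):
        self.state = "data"   # the "continuation": where to resume
        self.buffer = []
        self.tokens = []

    def feed(self, chunk):
        """Process whatever input is available; may be called repeatedly."""
        for ch in chunk:
            self.step(ch)

    def step(self, ch):
        """Consume exactly one character, then return to the caller."""
        if self.state == "data":
            if ch == "<":
                self._flush("chars")
                self.state = "tag open"
            else:
                self.buffer.append(ch)
        elif self.state == "tag open":
            self.buffer.append(ch)
            self.state = "tag name"
        elif self.state == "tag name":
            if ch == ">":
                self._flush("tag")
                self.state = "data"
            else:
                self.buffer.append(ch)

    def end(self):
        """Signal end of input; flush any trailing text."""
        if self.state == "data":
            self._flush("chars")

    def _flush(self, kind):
        if self.buffer:
            self.tokens.append((kind, "".join(self.buffer)))
            self.buffer = []
```

Feeding "hi<p>there" in one call or in arbitrary pieces such as "hi<", "p", ">the", "re" produces the same token list, because nothing but `state` and `buffer` has to survive between calls.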
[10:27:02.0000] <hsivonen> Philip`: my max lookahead is one read()/unread() [10:28:00.0000] <hsivonen> Philip`: otherwise, I buffer pessimistically and look back [10:28:01.0000] <hsivonen> Philip`: when I start consuming a doctype, I start building a bogus comment in parallel just in case [10:30:00.0000] <hsivonen> I should learn how to dump native code disassemblies from HotSpot some day [10:32:00.0000] <Philip`> Ah, okay, so if you had "<!docty>"(network latency) then it would emit the token before running out of characters [10:32:01.0000] <hsivonen> yes [11:46:00.0000] <hsivonen> I wonder what the Wikipedia article on "HTML 5" means when it says "Elements no longer compatible with HTML 4 – a, hr, strong" [11:49:00.0000] <othermaciej> [NEEDS CITATION] [11:50:00.0000] <Philip`> Looks like they just chose a random selection of points from html4-differences [11:50:01.0000] <Philip`> and in that case, particularly the points under "These elements have new meanings in HTML 5 which are incompatible with HTML 4" [11:52:00.0000] <Philip`> (Not entirely sure what the point is in duplicating that data badly, when there's an external link to html4-differences) [12:13:00.0000] <zcorpan_> /me edited the wiki page: "Elements with redefined meaning which are not compatible with HTML 4 – a, hr, strong" [12:42:00.0000] <jgraham> hsivonen: re: why case handling wasn't paramterized in the tokenizer: I don't know. I think Sam just picked a solution that did what he wanted. Is there a good reason to prefer a different approach [12:42:01.0000] <jgraham> ? 
[12:43:00.0000] <jgraham> Philip`: I'll change the html5lib test harness to do the same thing with attribute names as the treebuilder [12:48:00.0000] <hsivonen> jgraham: reasons are running tokenizer-level tests and eliminating duplicate attributes [12:50:00.0000] <hsivonen> jgraham: going forward if we integrate SVG, a flag you can toggle in mid-tokenization might become useful [12:50:01.0000] <jgraham> hsivonen: That's a good point [12:50:02.0000] <jgraham> OK, I think I will change it to work with a flag [12:51:00.0000] <hsivonen> perhaps in the future I move case folding to one place behind a flag so that case folding writes back to the read buffer [12:52:00.0000] <hsivonen> this way I could avoid name copying whenever a name doesn't cross the read buffer boundary [12:54:00.0000] <hsivonen> the fun part would be that then one could make a legitimate claim that lower case is faster :-) [13:09:00.0000] <zcorpan_> /me likes <xmp> and wonders why it was deprecated way back when [13:10:00.0000] <zcorpan_> even pretending html to be sgml it's just an ordinary rcdata element, isn't it? [13:10:01.0000] <Philip`> Maybe because they couldn't work out a good way to show authors an example of how to write <xmp>...</xmp>, without the example closing itself half-way through? 
[13:10:02.0000] <hsivonen> zcorpan_: there's a subject for a fun public-html thread [13:10:03.0000] <Philip`> Opera's <xmp> parsing is very broken, unfortunately :-( [13:10:04.0000] <Philip`> I think it was actually mentioned on public-html some months ago [13:10:05.0000] <zcorpan_> Philip`: &lt;/xmp> [13:11:00.0000] <Philip`> zcorpan_: That won't work, except in Opera [13:11:01.0000] <Philip`> since it ought to just show the text "&lt;/xmp>" [13:11:02.0000] <zcorpan_> oh, it's a cdata element even [13:14:00.0000] <Philip`> http://software.hixie.ch/utilities/js/live-dom-viewer/?a%3Cscript%3E%3C/script/%3Ea%0Aa%3Cstyle%3E%3C/style/%3Ea%0Aa%3Cxmp%3E%3C/xmp/%3Ea - that's rather odd in Firefox [13:17:00.0000] <zcorpan_> Philip`: file a bug? :) [13:19:00.0000] <Philip`> No need - it'll all be perfect once they've just implemented HTML5 ;-) [13:26:00.0000] <zcorpan_> http://software.hixie.ch/utilities/js/live-dom-viewer/?%3Ca/x%3E%3Ca/x/%3E%3Ca/x/x%3E%3Ca/x%20%3E%3Ca/x%20x%3E [13:35:00.0000] <zcorpan_> <xmp><!--</xmp>--></xmp> ;) [14:50:00.0000] <Philip`> Looks like my OCaml implementation isn't very good - the C++ one is 150 times faster... [15:00:00.0000] <Philip`> Oh, right, that's because I'm making it read from stdin one character at a time [15:10:00.0000] <Philip`> Aha - the OCaml one is now only four times slower than the C++ one, for tokenising the HTML5 spec [16:25:00.0000] <Philip`> http://canvex.lazyilluminati.com/svn/tokeniser/ is the current version of [not quite all of] my code 2007-07-09 [17:12:00.0000] <Philip`> rubys: Since you said you were interested a few days ago: http://canvex.lazyilluminati.com/svn/tokeniser/ has my current, uh, meta-tokeniser(?) 
code [17:13:00.0000] <Philip`> (It can run the tokeniser in OCaml, and can create one in C++) [17:13:01.0000] <Philip`> ((Both are just missing non-numeric entity support since that wasn't very interesting and I haven't added it yet)) [17:16:00.0000] <rubys> ((you like to talk parenthetically, don't you?)) [17:17:00.0000] <rubys> I've downloaded it and run it. [17:17:01.0000] <Philip`> (It's a bad habit of mine :-( ) [17:19:00.0000] <othermaciej> too much lisp coding? [17:19:01.0000] <rubys> make_cpp looks fairly small; by implication make_py or make_rb would be too. Of course, an equivalent to tokeniser.cpp would also be necessary. [17:22:00.0000] <Philip`> The entity handling seems to require the most work in the language-specific code - it would be nice if that could be done more generically, like the rest of the state machine, but I've not really looked into that [17:53:00.0000] <Philip`> /me tries to work out what modifications to the generated C++ code would help efficiency easily [18:11:00.0000] <Philip`> http://canvex.lazyilluminati.com/misc/statestats.txt [18:11:01.0000] <Philip`> There's nearly as much double-quoted attribute value as there is plain text [18:15:00.0000] <Philip`> (Single-quoted is twice as common as unquoted, and double-quoted is twenty-five times more common than single-quoted) [18:15:01.0000] <Philip`> I guess the people who like XML syntax should be happy that unquoted values are so uncommon [18:19:00.0000] <Philip`> Assuming nobody has multiple doctypes, I found 1621 in 2522 pages, which is fortunately about the same as the 41%-with-no-doctype that Hixie reported [19:12:00.0000] <othermaciej> that's uncommon? [00:50:00.0000] <zcorpan_> why does HTMLCollection.namedItem() check for .name on some elements as opposed to any html element? [00:52:00.0000] <zcorpan_> is it that we want HTMLDocument.commands only look at id and not name? [00:53:00.0000] <zcorpan_> i'd rather it was more consistent with the other collection attributes... 
hmm [00:56:00.0000] <zcorpan_> oh. nevermind. e.g. <table>.rows doesn't look at name attributes [02:03:00.0000] <zcorpan_> is [[Get]] e.g. this?: forms[0] [04:29:00.0000] <Dashiva> zcorpan_: That's the most common use, at least [04:31:00.0000] <zcorpan_> Dashiva: ok [04:36:00.0000] <Dashiva> (on a lower level it's an internal property access, overridden to look for something other than the exact property; the details aren't too relevant, though) [07:35:00.0000] <zcorpan_> should window.frames look for svg foreignObjects? [07:40:00.0000] <zcorpan_> i.e., do foreignObjects create a nested browsing context? [07:47:00.0000] <zcorpan_> i.e. external foreign objects [08:03:00.0000] <hsivonen> losts and lots of continue and break in the parser... [08:03:01.0000] <hsivonen> just a step away from goto programming... 2007-07-10 [01:21:00.0000] <zcorpan_> hmm, wonder if "the title element" is right [01:24:00.0000] <hsivonen> zcorpan_: why? [01:25:00.0000] <zcorpan_> it seems too restrictive relative to what browsers implement [01:25:01.0000] <zcorpan_> and only opera seems to reflect document changes [01:26:00.0000] <zcorpan_> e.g., if you move the title element from head to body, does it stop being the title element? [01:27:00.0000] <zcorpan_> if you insert a new title before the title element, which is "the title element"? [01:34:00.0000] <hsivonen> zcorpan_: seems like an opportunity for test-based detailed review. :-) [01:36:00.0000] <zcorpan_> hsivonen: indeed :) [01:36:01.0000] <zcorpan_> /me has 10 tests already [01:39:00.0000] <othermaciej> 10 tests for what? [01:39:01.0000] <zcorpan_> document.title [01:39:02.0000] <zcorpan_> getting [01:45:00.0000] <hsivonen> I have no idea how to make a sane streaming approximatin of the AAA [01:45:01.0000] <hsivonen> will leave it undone for now [01:46:00.0000] <zcorpan_> that isn't drocanian? 
[01:46:01.0000] <hsivonen> zcorpan_: no, non-conforming and non-draconian [01:47:00.0000] <hsivonen> zcorpan_: halting is draconian [01:47:01.0000] <hsivonen> zcorpan_: sorry. I misread you [01:47:02.0000] <hsivonen> zcorpan_: yes, an approximation that isn't draconian [01:47:03.0000] <hsivonen> (and is thereby non-conforming) [01:48:00.0000] <zcorpan_> ok [02:04:00.0000] <zcorpan_> wow [02:05:00.0000] <zcorpan_> safari for windows treats .xht files as text/html [02:40:00.0000] <zcorpan_> where is SVGDocument.title defined? [02:46:00.0000] <hsivonen> zcorpan_: Section 5.17 in SVG 1.1 [02:51:00.0000] <zcorpan_> thanks [03:35:00.0000] <zcorpan_> /me notes that http://www.whatwg.org/specs/web-forms/current-work/#form-submission mixes block and inline content [05:02:00.0000] <Xiven-> hi, hixie just wanted me to pass on the message to anyone at Opera, that he's currently in the Oslo office canteen, so come along and say hi [06:47:00.0000] <zcorpan_> Document.renameNode isn't implemented anywhere, is it [06:51:00.0000] <hsivonen> hmm. I wonder what renameNode does to implemented interfaces. [06:51:01.0000] <hsivonen> /me looks up DOM3 [06:52:00.0000] <hsivonen> oh. it returns a new object [07:05:00.0000] <zcorpan_> what are the use-cases of renameNode? [07:05:01.0000] <zcorpan_> seems like something that could be dropped in DOM5 Core... ;) [07:08:00.0000] <hsivonen> zcorpan_: suppose you have a script that walks an XHTML2 document tree and converts it to XHTML5. you'd renameNode separator to hr [07:08:01.0000] <zcorpan_> wouldn't you use XSLT for that? 
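The separator-to-hr conversion hsivonen describes can be sketched on the server side with Python's xml.dom.minidom. This is only an illustration under stated assumptions: rename_element is a hypothetical helper, not the DOM3 renameNode method itself, and it ignores namespaces. Like the DOM3 behaviour noted above, it hands back a new object instead of mutating the old one.

```python
from xml.dom import minidom


def rename_element(old, new_name):
    """Replace `old` with a new element named `new_name` and return the new node.

    Hypothetical stand-in for DOM3 renameNode; namespaces are ignored.
    """
    doc = old.ownerDocument
    new = doc.createElement(new_name)
    # Copy every attribute from the old element to the new one.
    for i in range(old.attributes.length):
        attr = old.attributes.item(i)
        new.setAttribute(attr.name, attr.value)
    # Move (not copy) the children; appendChild detaches them from `old`.
    while old.firstChild is not None:
        new.appendChild(old.firstChild)
    old.parentNode.replaceChild(new, old)
    return new


doc = minidom.parseString(
    "<body><p>one</p><separator class='x'/><p>two</p></body>"
)
# Take a snapshot of the matches first, since the tree changes as we go.
for node in list(doc.getElementsByTagName("separator")):
    rename_element(node, "hr")
result = doc.documentElement.toxml()
```

Any reference still held to the old separator node points at a detached element afterwards, which is one reason a caller would need the returned node.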
[07:09:00.0000] <hsivonen> zcorpan_: not according to http://hixie.ch/advocacy/xslt [07:10:00.0000] <zcorpan_> ok [07:10:01.0000] <zcorpan_> the bigger problem though is serving xhtml2 over the wire in the first place [07:10:02.0000] <hsivonen> sure [07:11:00.0000] <hsivonen> I don't know but I guess that renameNode exists to have full API-provided mutability [07:11:01.0000] <Philip`> You could be using DOM methods on the server side to transform something into XHTML5 [07:11:02.0000] <hsivonen> I'd expect to see more use case for renameNode on server-side Java apps than in scripts running in browsers [07:11:03.0000] <hsivonen> cases [07:12:00.0000] <zcorpan_> yeah [07:12:01.0000] <zcorpan_> for client side i can only see the use-case of writing obscure test cases [07:13:00.0000] <hsivonen> on the face of it, it looks like a "nice to have", "for completeness" feature [07:13:01.0000] <zcorpan_> so it could have been useful to me today if it was implemented anywhere [07:13:02.0000] <zcorpan_> yeah [07:23:00.0000] <billyjack> fwiw, I expect we are going to see a lot more developers making use of client-side XSLT now that browser support has improved significantly [07:23:01.0000] <MikeSmith> for example, Webkit adding support for the node-set() function [07:24:00.0000] <MikeSmith> it is painful to try to get much done on XSLT 1.0 without node-set() [07:24:01.0000] <MikeSmith> s/done on/done itn/ [07:24:02.0000] <MikeSmith> done in [07:24:03.0000] <MikeSmith> in in in [07:41:00.0000] <Philip`> When counting tag frequency, it's a good idea to not parse PDF files [08:14:00.0000] <gsnedders> Philip`: what makes you think that? 
:P [08:16:00.0000] <gsnedders> now, back to writing a 1:1 implementation of the common microsyntaxes [08:17:00.0000] <Philip`> The number of tag names which are ~150 characters long and only ever appear once and are full of non-ASCII characters does suggest that some filtering out of non-HTML files would help give slightly cleaner results :-) [08:18:00.0000] <hsivonen> the internationalization bomb is about to go off regarding ratios... [08:18:01.0000] <gsnedders> hsivonen: I saw. We seem to be managing fine with how UAs currently work, though… [08:19:00.0000] <Philip`> I still get a relatively large number of tags named scr"+"ipt - I'm not sure if that suggests it would be worth using a whole HTML parser when trying to do surveys of tag frequency, just to find and exclude the non-PCDATA bits [08:20:00.0000] <hsivonen> gsnedders: what do you mean? are there shipped implementations of the ratio algorithm except mine inside the conformance checker? [08:20:01.0000] <gsnedders> hsivonen: I mean for the other places where numbers are used (@height and @width will be the most widespread) [08:21:00.0000] <hsivonen> gsnedders: those are attributes--not element content [08:22:00.0000] <gsnedders> hsivonen: that's true, but how much of a difference should it really make? [08:22:01.0000] <hsivonen> gsnedders: it makes *all* the difference as far as internationazation politics go [08:22:02.0000] <hsivonen> gotta go [08:23:00.0000] <Philip`> Do 'international' people actually use number characters other than 0-9? [08:23:01.0000] <Philip`> I don't know how to find out, but at least http://www.aljazeera.net/ and http://divyabhaskar.co.in/ appear to just have normal digits everywhere [08:26:00.0000] <Philip`> ...and I believe I remember a Hebrew-speaking person say they never use the Hebrew number system [08:44:00.0000] <Dashiva> Philip`: Which context? [08:48:00.0000] <Philip`> Dashiva: Which context for what? 
[08:48:01.0000] <Dashiva> Never using hebrew number system [08:51:00.0000] <Philip`> Ah - I can't really remember now, since it was a long time ago and I wasn't paying that much attention at the time :-p [08:52:00.0000] <Philip`> I think it was originally in the context of i18ning a game's UI, but I can't remember what context their comment was in [08:53:00.0000] <Philip`> (I wanted to support duodecimal Quenya numbers too, but I eventually realised that was just stupid) [09:05:00.0000] <MikeSmith> Philip` - Kanji for numbers are used quite commonly in Japanese and Chinese text [09:05:01.0000] <MikeSmith> particuarly when it's written vertically [09:05:02.0000] <MikeSmith> but also in e-mail [09:05:03.0000] <MikeSmith> and other contexts I'm probably forgetting about [09:06:00.0000] <MikeSmith> addresses [09:06:01.0000] <MikeSmith> postal addresses [09:41:00.0000] <Dashiva> MikeSmith: Calculations and time seems to be pretty dominated by roman numerals, though [13:05:00.0000] <zcorpan_> hsivonen: opera's fosterparenting doesn't seem generic handling in the css... (using divs with display:table etc doesn't result in the same thing) [13:06:00.0000] <zcorpan_> s/seem/seem to be/ [13:08:00.0000] <zcorpan_> hsivonen: opera seems to use an anonymous caption element around the misplaced content [13:09:00.0000] <zcorpan_> (which is similar to what ie does -- only difference is ie actually adds the anonymous element to the dom) [13:09:01.0000] <hsivonen> zcorpan_: ok. interesting [13:10:00.0000] <zcorpan_> e.g. try caption-side:bottom [13:11:00.0000] <zcorpan_> or * { display:block } [13:14:00.0000] <hsivonen> I wonder why hyatt and Hixie didn't put a caption there. not worth the trouble? [13:15:00.0000] <zcorpan_> instead of moving things around in the dom? [13:20:00.0000] <hsivonen> no, in addition to moving to a different place [13:22:00.0000] <zcorpan_> what's the benefit? 
[13:23:00.0000] <hsivonen> zcorpan_: matching IE [13:24:00.0000] <hsivonen> zcorpan_: probably not worth it [13:24:01.0000] <zcorpan_> it does affect rendering in some cases, but firefox and safari seem to not need it [13:25:00.0000] <zcorpan_> and if you were to add a real "caption" element to the dom, you might break other things [13:25:01.0000] <zcorpan_> ie adds an element with the empty string as tag name [13:26:00.0000] <zcorpan_> how do you select that with css? :) [13:27:00.0000] <hsivonen> oh. crazy. [13:28:00.0000] <zcorpan_> in any case, if we were to use the caption approach, then we wouldn't need to move things around in the dom [13:29:00.0000] <othermaciej> empty string as tag name? for what? [13:30:00.0000] <zcorpan_> othermaciej: parent element of text node "foo" in <table><tr><td>x</tr>foo<tr><td>y</table> [13:31:00.0000] <othermaciej> awesome [13:33:00.0000] <zcorpan_> come to think of it, opera probably uses the same anonymous element approach with misnested block in inline when the inline styling gets applied to text nodes even though the inline element isn't an ancestor to the text node [16:57:00.0000] <MikeSmith> Dashiva - about usage in Japan of kanji numerals vs roman numerals: Yeah, true that calculations are dominated by roman numerals [16:57:01.0000] <MikeSmith> but not true about time [16:58:00.0000] <MikeSmith> at least in writing about time in the body of a message or document or whatever [16:58:01.0000] <MikeSmith> when it's quite common to write stuff like 三時半 [16:59:00.0000] <MikeSmith> =3:30 [16:59:01.0000] <MikeSmith> or 一時間半 [16:59:02.0000] <MikeSmith> to talk about something that takes 1.5 hours 2007-07-11 [03:06:00.0000] <hsivonen> http://www.w3.org/2005/Incubator/emotion/XGR-emotion-20070710/ [03:15:00.0000] <mpt> :-D [03:18:00.0000] <mpt> or should I say <?xml version="1.0" encoding="ISO-8859-1"?><eml><emma:interpretation meaning="hilarious">:-D</emma:interpretation></eml> [03:52:00.0000] <virtuelv> hsivonen: yuck 
[03:53:00.0000] <virtuelv> "Look at the sky. There is <emphasislevel="strong">not a single </emphasis> cloud. " [07:29:00.0000] <gsnedders> where does the convention of preceding attributes with @ and surrounding element names with | come from? [07:38:00.0000] <Philip`> I've always assumed the @ comes via XPath [07:42:00.0000] <BenWard> The @ is definitely an XPath-ism. [07:47:00.0000] <gsnedders> That's what I assumed, [07:47:01.0000] <gsnedders> But the vertical bar, I have little idea. [07:55:00.0000] <zcorpan_> the vertical bar is a convention for code in general, i think [13:40:00.0000] <gsnedders> what does SCS stand for in the spec? [13:41:00.0000] <Philip`> Self-contained section, I think [13:41:01.0000] <gsnedders> ah [14:35:00.0000] <gsnedders> anyone know of any mail clients that can cope with 100k+ locally stored emails without being overly slow? [14:43:00.0000] <jgraham> gsnedders: I have about 20k in Thunderbird using IMAP [14:44:00.0000] <jgraham> (actually it's more than 20k but less than 100k) [14:46:00.0000] <gsnedders> I had no issues with Mail.app till around 100k, so how something copes with 20k is thy-irrelevant [14:46:01.0000] <jgraham> Fair enough [15:02:00.0000] <Hixie> zcorpan: yt? 
[15:03:00.0000] <Hixie> zcorpan: re the [[Get]] stuff, the spec is a mess regarding that, it's a known issue, i'm waiting for the ES4 DOM Binding spec to be done before fixing it [15:03:01.0000] <Hixie> also note that unless you warn me about them i'm likely to miss e-mails you send to public-html [15:08:00.0000] <zcorpan> Hixie: ok [15:09:00.0000] <zcorpan> Hixie: i think there might be a wiki page that lists all detailed reviews or something [15:14:00.0000] <hsivonen> Hixie: zcorpan and I have easily seachable subject lines [15:14:01.0000] <hsivonen> searchable even [15:16:00.0000] <Hixie> k [15:16:01.0000] <zcorpan> "review of" [15:17:00.0000] <Hixie> http://www.w3.org/Search/Mail/Public/advanced_search?keywords=&hdr-1-name=subject&hdr-1-query=review+of&hdr-2-name=from&hdr-2-query=&hdr-3-name=message-id&hdr-3-query=&index-grp=Public__FULL&index-type=t&type-index=public-html&resultsperpage=100&sortby=date-asc [15:17:01.0000] <Hixie> that doesn't really work well [15:18:00.0000] <zcorpan> ok -- http://www.w3.org/Search/Mail/Public/advanced_search?keywords=&hdr-1-name=subject&hdr-1-query=%22detailed+review%22&hdr-2-name=from&hdr-2-query=&hdr-3-name=message-id&hdr-3-query=&index-grp=Public__FULL&index-type=t&type-index=public-html&resultsperpage=100&sortby=date-asc [15:19:00.0000] <hsivonen> Hixie: http://www.w3.org/Search/Mail/Public/advanced_search?keywords=&hdr-1-name=subject&hdr-1-query=%28detailed+review+of&hdr-2-name=from&hdr-2-query=&hdr-3-name=message-id&hdr-3-query=&index-grp=Public__FULL&index-type=t&type-index=public-html&resultsperpage=100&sortby=date-asc [15:19:01.0000] <Hixie> that's still going to end up catching a lot of whining e-mail replies to your comments :-) [15:20:00.0000] <zcorpan> haven't seen whining replies to detailed reviews [15:20:01.0000] <jgraham> /me has noticed far fewer replies to messages with substantial review comments [15:20:02.0000] <hsivonen> /me doesn't consider heycam, liorean or mjs whining [15:20:03.0000] <Hixie> none yet 
[15:20:04.0000] <hsivonen> jgraham is right [15:21:00.0000] <Hixie> but i'm sure by 9 months from now when i finally get to public-html mail, there'll've been some [15:31:00.0000] <Hixie> hsivonen: re foster parenting of spaces, known issue; i plan to address it by having a flag which triggers once you start foster parenting, after which spaces get foster parented [15:35:00.0000] <zcorpan> Hixie: comments too? [15:36:00.0000] <hsivonen> Hixie: so you are taking the Gecko route instead of the WebKit trunk route? [15:37:00.0000] <Hixie> zcorpan: dunno [15:37:01.0000] <hsivonen> Hixie: Gecko foster parents them, too. [15:37:02.0000] <Hixie> hsivonen: i don't know which is which, but if gecko does what i described and not webkit, then yes, though that is mostly accidental [15:37:03.0000] <Hixie> i haven't looked at it in detail [15:37:04.0000] <hsivonen> Hixie: WebKit trunk does what I said on the list [15:38:00.0000] <Hixie> btw forget what i said about not seeing the comments to public-html [15:38:01.0000] <Hixie> turns out i'm not missing them [15:38:02.0000] <Hixie> i seem to have no trouble recognising your e-mails from the flood of nonsense [15:38:03.0000] <Hixie> that goes for zcorpan's and gsnedders' e-mails too [15:38:04.0000] <hsivonen> :-) [15:45:00.0000] <gsnedders> yay! mine aren't nonsense! [15:45:01.0000] <gsnedders> :) 2007-07-12 [03:11:00.0000] <Hixie> hm [03:11:01.0000] <Hixie> are there really two parse errors for "<!DOCTYPE" but only one for "<!DOCTYPE "? [03:20:00.0000] <hsivonen> Hixie: yes [03:21:00.0000] <hsivonen> Hixie: probably not worth tweaking [03:34:00.0000] <hsivonen> <b><table><td></b><i></table>X [03:34:01.0000] <hsivonen> Why isn't <b> supposed to reopen before X? [04:02:00.0000] <Hixie> isn't it? [04:03:00.0000] <Hixie> oh because the table is in the <b> [04:03:01.0000] <Hixie> and so the X is still in the <b> [04:03:02.0000] <Hixie> the </b> in the above has no effect [04:05:00.0000] <virtuelv> Hixie: How's Bergen? 
[04:14:00.0000] <gsnedders> http://geoffers.no-ip.com/svn/php-html-5-direct/tests/numbersTest [04:16:00.0000] <Hixie> virtuelv: rainy [04:17:00.0000] <Hixie> gsnedders: does that match the spec or the spec with your proposed changes? [04:17:01.0000] <gsnedders> Hixie: the spec [04:18:00.0000] <virtuelv> Hixie: Norway's been pretty much like that for a couple of weeks now [04:18:01.0000] <gsnedders> Hixie: even when the spec does very odd things (like a list of integers with input "10" outputting [1]) [04:19:00.0000] <Hixie> gsnedders: k [04:19:01.0000] <Hixie> gsnedders: can you include that link in one of your e-mails? (or just mail it directly to me ian⊙hc) I'll try to look at what browsers do with those tests when I update the spec [04:20:00.0000] <gsnedders> Hixie: I'm going to email it shortly [04:20:01.0000] <gsnedders> Hixie: just a few more general issues with the number section, then my review of that is done, and I'll send it off with the final email [04:20:02.0000] <hsivonen> Hixie: ah, I didn't realized the table was in b. I've got a bug then. [04:22:00.0000] <virtuelv> Hixie: re DOMContentLoaded - it'd be useful to have some event when the DOM is loaded and styles are available/applied [04:41:00.0000] <hsivonen> translating the spec to code would be less error-prone if the spec didn't have gotos that create unnatural loops [04:42:00.0000] <gsnedders> hsivonen: heh. I ended up with a do {} while (true); in my implementation of the lists of integers. [04:42:01.0000] <gsnedders> then relying on break and continue statements [04:42:02.0000] <hsivonen> gsnedders: I'm pretty sure do-while is always natural [04:42:03.0000] <hsivonen> (natural in the compiler sense) [04:43:00.0000] <gsnedders> ah. in that sense. [04:43:01.0000] <gsnedders> (of natural) [04:43:02.0000] <gsnedders> PHP likely does something odd with it, though, knowing PHP. [04:44:00.0000] <gsnedders> has anyone apart from zcorpan_ and myself started the spec review, anyway? 
[04:44:01.0000] <hsivonen> if I had to guess, my guess would be that even PHP created only natural loops for the purpose of compiler optimization [04:44:02.0000] <hsivonen> gsnedders: I'm reviewing the parsing spec as I go [04:45:00.0000] <hsivonen> gsnedders: I don't have much to say about tokenization, but I have posted remark about tree building [04:47:00.0000] <gsnedders> hsivonen: ah. I just haven't seen that much. [04:48:00.0000] <hsivonen> lost in the flood I guess :-( [04:48:01.0000] <gsnedders> ah, now I see [04:55:00.0000] <Hixie> hsivonen: believe me, the spec doesn't look like what i'd want it to look like if i was doing this from scratch [04:55:01.0000] <Hixie> anyway, time to be a tourist [04:56:00.0000] <gsnedders> Hixie: rarely anything ends up as you'd like it to if you started from scratch :P [05:19:00.0000] <hsivonen> <a><p>X<a>Y</a>Z</p></a> [05:20:00.0000] <hsivonen> Why does the first <a> come off the stack before <p> goes in? [05:20:01.0000] <hsivonen> ooh. does the p get reparented? [05:22:00.0000] <hsivonen> now I'm confused [05:37:00.0000] <met_> http://ajaxian.com/archives/google-gears-roadmap-and-features [05:37:01.0000] <hsivonen> ooh! my code lacks step #10 of the AAA! [05:44:00.0000] <Philip`> gsnedders: In numbersTest: s/dimentions/dimensions/ [05:47:00.0000] <gsnedders> Philip`: fixed [08:15:00.0000] <gsnedders> Jero: you around? [08:15:01.0000] <Jero> yup [08:17:00.0000] <gsnedders> did you start your PHP5 implementation from scratch not knowing that there was a semi-started one before, or some other reason? [08:24:00.0000] <gsnedders> Jero: and I've started on a 1:1 implementation in PHP, which isn't really so relevant in the real world [08:25:00.0000] <Jero> gsnedders: correct, I found out later that there was already an HTML5 parser in PHP [08:25:01.0000] <Jero> gsnedders: but I could access the site (some issues with Trac I believe) [08:26:00.0000] <gsnedders> Jero: it's not so interesting now. 
a lot of the code written for it is obsolete [08:26:01.0000] <gsnedders> http://php-html5lib.dashslot.net/svn/trunk works, though [08:27:00.0000] <Jero> gsnedders: interesting [08:27:01.0000] <Jero> also, what do you think of my implementation so far? [08:27:02.0000] <gsnedders> I've never had time to really look into it [08:27:03.0000] <gsnedders> (due to school, and now trying to get as much of the spec review done as possible before going away in a week) [08:28:00.0000] <gsnedders> http://geoffers.no-ip.com/svn/php-html-5-direct contains the direct implementation [08:29:00.0000] <Jero> thanks [08:29:01.0000] <gsnedders> it's all very slow, though [08:29:02.0000] <Jero> so is my implementation at the moment :p [08:29:03.0000] <gsnedders> the direct one will be far slower, though [08:30:00.0000] <Jero> yeah, i'm sure [08:30:01.0000] <gsnedders> as the aim is to make absolutely no compromises from the spec [08:30:02.0000] <gsnedders> which is the case of the tokeniser means one character at a time [08:30:03.0000] <gsnedders> *means emitting [08:30:04.0000] <Jero> yeah, that's not a very optimal solution :p [08:31:00.0000] <Jero> but I guess I've only made three or four changes to the entire parsing algorithm compared to the spec [08:33:00.0000] <Philip`> If you want to write a new tokeniser in some language, it could perhaps be helpful to build on my work - that has a direct representation of the spec algorithm, and generates C++ or JS code to execute it, and it ought to be fairly quick to do other languages in the same way [08:35:00.0000] <Philip`> (I need to add some kind of abstraction in the code-generating part - JS was only easy because it's almost entirely identical to C++ except for replacing 'bool' with 'var', and it takes a little bit more effort if you needs $s in front of variables) [08:35:01.0000] <Philip`> (but I'll at least try to create a Perl implementation too, to make sure it's sufficiently portable between languages) [08:44:00.0000] <gsnedders> 
Jero: I may, however, try forking off the direct impl and work on optimising it (as that's far nicer than starting from scratch, as I can just rewrite one method at a time) [08:48:00.0000] <Jero> well, I followed the spec in everything (with three or four exceptions), so that's basically the same as forking off the direct implementation, don't you think? [08:50:00.0000] <gsnedders> Jero: yes [08:51:00.0000] <gsnedders> Jero: it would be interesting to compare the two, though (and optimising it won't take overly long to do) [08:52:00.0000] <Jero> my impl still has a couple of bugs (though most of them are related I think) [08:53:00.0000] <Jero> and I'm a bit behind when it comes to the last 60 or so revisions [08:53:01.0000] <gsnedders> heh. any bugs in the direct impl are either PHP bugs or spec bugs [08:54:00.0000] <gsnedders> and I wouldn't allow any regressions when optimising it [08:55:00.0000] <Jero> gsnedders: you can contribute to the code if you want to in the future [08:55:01.0000] <gsnedders> Jero: I'll probably optimise the tokeniser and then see how the two compare, then decide what to do from there [08:56:00.0000] <Jero> the tokeniser of my implementation you mean? [08:57:00.0000] <gsnedders> the tokeniser of the direct implementation, then compare it to your tokeniser [08:57:01.0000] <Jero> that sounds like a good idea [08:58:00.0000] <Jero> I'll upload the code I have on my PC to the online version of my parser, so you can compare it to the latest and greatest [08:58:01.0000] <gsnedders> heh. it won't be for a while, though [08:58:02.0000] <gsnedders> the tokeniser isn't written in the direct impl yet [08:59:00.0000] <Jero> oh i see :p [08:59:01.0000] <gsnedders> (which I had actually implied earlier) [09:01:00.0000] <Jero> also, don't you think it'd be great to have the HTML5's parsing algorithm being used by the built-in DOMDocument->loadHTML() function in PHP? 
[09:02:00.0000] <Jero> ATM that function uses the libxml2 HTML parser [09:02:01.0000] <gsnedders> Jero: as if you're ever gonna persuade the PHP devs to implement a draft standard… [09:02:02.0000] <Jero> don't worry, it was just an idea.. [09:03:00.0000] <gsnedders> Jero: it took me many, many, many years to persuade them of a bug in strip_tags(), which they kept writing off as being invalid HTML (as the aim there is to use a basic parser that'll work with valid HTML) despite me citing specific parts of the specification that clearly said otherwise [09:05:00.0000] <Jero> heh [09:06:00.0000] <gsnedders> I bet they didn't have a copy of the SGML spec, and were simply saying what they thought was right. [09:06:01.0000] <gsnedders> (it's actually something that despite being part of the SGML spec is relevant) [09:09:00.0000] <Jero> what was the bug? [09:10:00.0000] <gsnedders> U+003E within quoted attribute values [09:10:01.0000] <gsnedders> it probably breaks if you mix single and double quotes, actually [09:10:02.0000] <gsnedders> e.g., <foo bar="this'> is parsed as a single |foo| element where @bar=this [09:12:00.0000] <Jero> so it closes the value of bar upon seeing the ' character? [09:12:01.0000] <gsnedders> yes [09:12:02.0000] <Jero> that is indeed very weird [09:13:00.0000] <Jero> and what was their argument? [09:14:00.0000] <gsnedders> actually, that does work correctly [09:14:01.0000] <gsnedders> var_dump(strip_tags('<foo bar="this\'>">')); indeed produces string(0) "" [09:14:02.0000] <gsnedders> Jero: for the > bug? that it was invalid HTML. [09:14:03.0000] <gsnedders> Jero: for the latter? I only just thought of it [09:15:00.0000] <Jero> i see [09:15:01.0000] <gsnedders> the former is untrue, as it is completely valid [09:16:00.0000] <gsnedders> [^<&] off the top of my head [09:17:00.0000] <Jero> heh [09:17:01.0000] <Jero> and they still haven't fixed it?
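The U+003E problem gsnedders mentions comes down to whether a tag stripper tracks quote state. The following is a minimal Python sketch of a quote-aware stripper, written for illustration only: it is not PHP's strip_tags() implementation, the function name is reused here purely for readability, and it makes no attempt to handle comments, entities, or a stray "<" in text.

```python
def strip_tags(text):
    """Drop everything from '<' to the tag's closing '>'.

    A '>' inside a quoted attribute value is treated as data, so it does
    not end the tag. Illustrative sketch only, not PHP's algorithm.
    """
    out = []
    in_tag = False
    quote = None  # quote character that opened the current attribute value
    for ch in text:
        if not in_tag:
            if ch == "<":
                in_tag = True
            else:
                out.append(ch)
        elif quote is not None:
            # Inside a quoted value: only the matching quote ends it.
            if ch == quote:
                quote = None
        elif ch in ('"', "'"):
            quote = ch
        elif ch == ">":
            in_tag = False
    return "".join(out)


plain = strip_tags('<a title="x > y">link</a>')
mixed = strip_tags('<foo bar="this\'>">')
```

Because only the quote character that opened a value can close it, the mixed-quote input from the log is consumed as a single tag, which agrees with the empty string gsnedders reports from PHP.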
[09:17:02.0000] <gsnedders> the former is fixed in 5.2.2, IIRC [09:18:00.0000] <gsnedders> only 5, though [09:18:01.0000] <gsnedders> the same patch would apply against 4.4 fine, but it's unfixed [09:19:00.0000] <Jero> that's stupid [09:20:00.0000] <gsnedders> typical of PHP development, though [09:21:00.0000] <Jero> that's too bad [09:21:01.0000] <Philip`> <foo <bar=<bar> is syntactically valid in HTML5 now - only ["&] (or ['&] or (\s|&)) does anything [09:22:00.0000] <Philip`> /me wonders how that will mess up strip_tags [09:22:01.0000] <gsnedders> Jero: http://cvs.php.net/viewvc.cgi/php-src/ext/standard/tests/strings/bug40432.phpt?revision=1.2&view=markup&pathrev=MAIN [09:22:02.0000] <Jero> thanks [09:23:00.0000] <gsnedders> I think I saw it fail in 5.2.3, actually [09:24:00.0000] <gsnedders> Philip`: http://cvs.php.net/viewvc.cgi/php-src/ext/standard/string.c?view=markup — search for php_u_strip_tags [09:24:01.0000] <gsnedders> Philip`: string(0) "" is PHP 5.2.3's output, though [09:29:00.0000] <Jero> gsnedders, i'm off, if you ever need me regarding my HTML5 parser, email me at [censored :)] [09:30:00.0000] <gsnedders> Jero: I'll be around here if you ever want me [09:30:01.0000] <Jero> alrighty, bye [10:26:00.0000] <gsnedders> jgraham: do you really think that those tests would be that hard to get working in another language? the script I use to parse it is in the repos [10:28:00.0000] <gsnedders> jgraham: I didn't want to copy the html5lib test cases format as it would mean I'd need the input data repeated multiple times for each algorithm [10:34:00.0000] <Philip`> gsnedders: It would probably be useful to give more detail on the test format, like how it represents arrays and strings [10:34:01.0000] <Philip`> or just use JSON since that already defines those things and everyone has JSON parsers already :-) [10:35:00.0000] <gsnedders> and have each test as an object with an array of results? 
[10:39:00.0000] <gsnedders> Philip`: but yeah, the documentation was thrown together very quickly [10:52:00.0000] <Philip`> gsnedders: I was thinking of something like [["Empty string", "", false, false, false, null, "", []], ...], since that's about the same as what you have already but more JSONic, but maybe ["Empty string", "", { "unsigned":false, "signed":false, "real":false, ... }] would be more easily extensible [10:53:00.0000] <gsnedders> Philip`: I was thinking {"":[false,false,false,null,null,[]]} [10:53:01.0000] <Philip`> It'd be nice if JSON allowed you to keep comments [10:54:00.0000] <gsnedders> Philip`: there are only headers for large groups of tests, so I don't feel that much about keeping them [10:56:00.0000] <Philip`> What about XML? <numbertest><!-- Empty string --><input></input><outputs><output algorithm="unsigned"><false/></output><output algorithm="integerlist"><items/></output>... [10:56:01.0000] <gsnedders> that means defining data types and the like [10:56:02.0000] <Philip`> Hmm, maybe the [false,false,...] one is easiest [10:58:00.0000] <Philip`> In any case, it does seem probably easier to use JSON rather than a custom data format when you have arrays and non-ASCII strings, to avoid making every implementor implement another test parser [10:59:00.0000] <gsnedders> that's true [10:59:01.0000] <gsnedders> just lack of comments in JSON is annoying [11:00:00.0000] <gsnedders> around 15 minutes to be completely happy with a JSON version of the test suite… not overly slow… [11:00:01.0000] <Philip`> (JSON is also quite handy when you're running tests in web browsers) [11:01:00.0000] <gsnedders> (It would've been easier if it were possible to get pretty printing of JSON in PHP) [11:01:01.0000] <gsnedders> (as I just hacked my existing parser) [13:14:00.0000] <gsnedders> jgraham: just looking at the PHPUnit compiled version of the tests? 
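The keyed JSON layout gsnedders suggests, where each input string maps to an array of per-algorithm results, could be consumed roughly as follows. This is a sketch under assumptions: the column order is not stated in the log, only the names "unsigned", "signed" and "real" appear there, and "alg4" to "alg6" are hypothetical placeholders for the remaining columns.

```python
import json

# Assumed column order. Only the first three names come from the log;
# "alg4", "alg5" and "alg6" are placeholders, not real algorithm names.
ALGORITHMS = ["unsigned", "signed", "real", "alg4", "alg5", "alg6"]


def load_tests(text, algorithms=ALGORITHMS):
    """Turn {"input": [result, ...]} into {"input": {algorithm: result}}."""
    tests = {}
    for source, results in json.loads(text).items():
        if len(results) != len(algorithms):
            raise ValueError("wrong number of results for %r" % source)
        tests[source] = dict(zip(algorithms, results))
    return tests


sample = '{"": [false, false, false, null, null, []]}'
tests = load_tests(sample)
```

Naming the columns once in the harness keeps the data file compact while still letting a test runner report which algorithm failed, at the cost of the file no longer being self-describing the way Philip`'s object-per-test variant would be.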
[14:18:00.0000] <virtuelv_> is it defined anywhere what the implied DOM should be like when using createHTMLDocument()? [14:19:00.0000] <virtuelv_> (iow: what should the DOM be like given var doc = document.implementation.createHTMLDocument(""); [14:19:01.0000] <virtuelv_> doc.documentElement.innerHTML = "<h1>What</h1>"; [14:19:02.0000] <virtuelv_> alert(doc.documentElement.outerHTML); [14:20:00.0000] <virtuelv_> what should be alerted? [14:21:00.0000] <jgraham> gsnedders: Yeah, for some reason I looked at the PHP version [14:22:00.0000] <gsnedders> jgraham: yeah. that'd be thy impossible to parse. there's now a JSON version of the tests in the repo as well, though [14:22:01.0000] <gsnedders> (but that loses some data, like not distinguishing between ints and floats) [14:24:00.0000] <Philip`> Could you store floats as strings instead of numbers? [14:25:00.0000] <gsnedders> then parse the string? [14:25:01.0000] <gsnedders> hmmm… 2007-07-13 [03:18:00.0000] <gsnedders> hsivonen: nitpick about the "About the Validation Service" page: "fantasai and me" — me should be myself [03:23:00.0000] <hsivonen> gsnedders: are you sure? suppose "by" was used with a third person, it would be "him" or "her", right? [03:24:00.0000] <gsnedders> hsivonen: I'm sure. I have no idea what the rule is, but me is most certainly wrong [03:25:00.0000] <gsnedders> /me is now looking up OXM [03:27:00.0000] <hsivonen> gsnedders: http://www.writing911.com/database/idx/8/040/Common-Grammar-Questions/article/Me-Myself-and-I-Whats-the-Difference.html [03:27:01.0000] <hsivonen> "If it works with him, use me. If the word follows a preposition (words like by, after, as, at, for, in, like, with, etc.), choose me." [03:28:00.0000] <gsnedders> hsivonen: myself is reflexive. [03:28:01.0000] <gsnedders> hsivonen: you wrote it yourself. [03:29:00.0000] <gsnedders> hsivonen: [It was] written by fantasai and me. [03:29:01.0000] <gsnedders> hsivonen: it's the same person doing the action. 
[03:29:02.0000] <gsnedders> hsivonen: thus, you use the reflexive [03:31:00.0000] <hsivonen> gsnedders: unfortunately, I'm not trusting your advice on this without a grammatical reference work that specifically calls for "by myself" [03:31:01.0000] <hsivonen> because, my grammar sense says "me", the above reference says "me" and googlefight says "me": http://googlefight.com/index.php?lang=en_GB&word1=%22by+me%22&word2=%22by+myself%22 [03:32:00.0000] <gsnedders> hsivonen: I sadly can't find anything in the OSM about it when to use both (though that'll mainly be due to it being grammatical, and not stylistic) [03:33:00.0000] <gsnedders> hsivonen: "by me" is fine in some contexts, though. in more contexts, probably, which is why there are more google hits for it [03:34:00.0000] <hsivonen> I tried to look this up in the Chicago Manual of Style, but I couldn't figure the right name for the concept to find it in the index [03:41:00.0000] <gsnedders> hsivonen: and I've just forgotten the quote of what the OED said [03:41:01.0000] <gsnedders> /me runs back downstairs [03:43:00.0000] <hsivonen> http://googlefight.com/index.php?lang=en_GB&word1=%22written+by+me%22&word2=%22written+by+myself%22 [03:44:00.0000] <gsnedders> 3. Subsituted for me on a object govered by a proposition [03:44:01.0000] <gsnedders> 5. As a direct or indirect object, in acc. or inf. const., or dependant on a proposition [03:44:02.0000] <gsnedders> hsivonen: two OED definitions of myself [03:45:00.0000] <gsnedders> hsivonen: in both cases, use of me is archaic [03:46:00.0000] <gsnedders> s/proposition/preposition [03:47:00.0000] <hsivonen> hmm. 
I still doubt that "me" was wrong in the agent case [03:47:01.0000] <hsivonen> As far as I can tell, "by myself" means something different as in the Céline Dion song [03:48:00.0000] <gsnedders> "Written by myself" is the same as the colloquial "Written by me" [03:48:01.0000] <gsnedders> (though the OED notes the latter as archaic) [03:49:00.0000] <othermaciej> that's not how I would interpret "written by myself" [03:50:00.0000] <othermaciej> I think most native English speakers (at least US English) would interpret that as "written by me alone, without any help or collaboration from anyone else" [03:50:01.0000] <hsivonen> othermaciej: I read "by myself" as "alone" as in the above-mentioned Céline Dion song [03:51:00.0000] <gsnedders> I as a native English speaker would agree in that case with the third definition in the OED of myself [03:52:00.0000] <hsivonen> gsnedders: given what othermaciej said, I think I rather risk being colloquial than give the impression that I acted alone [03:53:00.0000] <gsnedders> hsivonen: I already get that impression from me [03:53:01.0000] <gsnedders> (well, you and fantasai) [03:55:00.0000] <othermaciej> what's the sentence you're trying to adjust? [03:55:01.0000] <hsivonen> gsnedders: I'm sorry if I'm wrong and thick here, but I still find the reference I cited more plausible [03:56:00.0000] <hsivonen> othermaciej: "Written by fantasai and me." 
[03:56:01.0000] <othermaciej> that sounds grammatically right to me [03:56:02.0000] <hsivonen> (not that I'd go against the OED, but I am not convinced the OED means this case) [04:00:00.0000] <othermaciej> the object of a preposition should almost always be "me", not "myself" [04:03:00.0000] <othermaciej> http://dictionary.reference.com/browse/myself has some notes [04:17:00.0000] <hsivonen> does IE7 put the attribute foo="bar" in the DOM in this case: http://software.hixie.ch/utilities/js/live-dom-viewer/?%3C/br%20foo%3Dbar%3E [04:17:01.0000] <hsivonen> Opera does [04:18:00.0000] <hsivonen> even with doctype http://software.hixie.ch/utilities/js/live-dom-viewer/?%3C%21DOCTYPE%20html%3E%0D%0A%3C/br%20foo%3Dbar%3E 2007-07-14 [00:42:00.0000] <Lachy> Hi all [02:17:00.0000] <hendry> hsivonen: i've heard several ppl lately talking about exposing APIs on mobile devices [02:18:00.0000] <hendry> i wondered if you had any thoughts on that, or know someone who is working on that [02:19:00.0000] <hsivonen> hendry: my main thought is that I'd like to see Nokia, Opera and Apple standardize what they are doing so that we don't need a layer of abstraction libraries on the JS side [02:19:01.0000] <hendry> i heard opera were doing something in their namespace [02:20:00.0000] <hsivonen> hendry: I haven't looked into this really, so I don't know if they are already aligning their APIs with each other [02:20:01.0000] <hendry> i didn't manage to track down the relevant document yet though [02:22:00.0000] <hendry> i imagine this 'layer of abstraction libraries' might be difficult to implement on a range of devices [02:22:01.0000] <hendry> and i was thinking if UAs might connect the JS caps with typical mobile JVMs [02:23:00.0000] <hsivonen> hendry: why would be difficult? (I don't do mobile stuff, so forgive my cluelessness) [02:24:00.0000] <hendry> i haven't seen anyone do it. 
i am just thinking of some sort of strategy of getting 'APIs exposed' [02:24:01.0000] <hsivonen> hendry: I didn't mean abstracting random "mobile UAs" but abstracting iPhone, S60 Browser and Opera for Mobile [02:25:00.0000] <hendry> sure, they will be most of the market and the ones that are reasonable [02:25:01.0000] <hsivonen> I have no idea whether Minimo should be on the list. that is, I have no idea if Minimo tries to expose some non-desktop features of the host device to JS [02:25:02.0000] <hendry> however i do wonder about the future of the 'S60 Browser' [02:26:00.0000] <hsivonen> hendry: how so? [02:26:01.0000] <hendry> well my ex-flatmate andrei has left Nokia :) [02:26:02.0000] <hsivonen> oh [02:26:03.0000] <hendry> so I am wondering who is maintaining it for one and implementing new features [02:26:04.0000] <hendry> the community is small ;) [02:28:00.0000] <hendry> about minimo. do you think ever get some market share on mobile devices? [02:29:00.0000] <hendry> it seems webkit has all the mindshare on mobiles atm [02:29:01.0000] <hsivonen> hendry: I guess that depends on how much work attention Minimo gets and whether "Gecko 2" deCOMtamination takes the RAM footprint down. [02:30:00.0000] <hendry> hehe, there has been talk of that for many years now :) [02:33:00.0000] <hsivonen> w00t! I get the right trees on tests2.dat (after fixing a bunch of test cases :-) [03:32:00.0000] <Dashiva> "both the image and the rich description thereof should not be an either/or proposition, but a "let the user decide how to expose rich content" question..." [03:32:01.0000] <Dashiva> I hope I'm not the only one thinking "we use images because they provide something more than not having images" [03:33:00.0000] <webben> Dashiva: That depends on the user. [03:33:01.0000] <webben> User-centric design has to recognize that. 
:) [03:34:00.0000] <webben> You could equally say we provide a text alternative to an image because it provides something more (for some users) than having images. [03:36:00.0000] <webben> Dashiva: what were you quoting there? [03:36:01.0000] <Dashiva> A recent list post [03:37:00.0000] <Dashiva> /me ponders the possibility of pushing accessibility to get more images as alternate content on pure-text pages, to be more accessible for people who don't like reading so much [03:39:00.0000] <hsivonen> Dashiva: I think Gregory's thinking on this point doesn't match very well the way seeing authors think of their use of images [03:40:00.0000] <webben> Dashiva: That's quite right. Images are more accessible to certain people in certain situations. [03:40:01.0000] <webben> Dashiva: That's why WCAG has had (admittedly weak) provisions about including image alternatives. [03:43:00.0000] <webben> Dashiva: see e.g. the introduction to WCAG 1: http://www.w3.org/TR/WAI-WEBCONTENT/ [03:43:01.0000] <webben> and Checkpoint 14: http://www.w3.org/TR/WAI-WEBCONTENT/#gl-facilitate-comprehension [03:45:00.0000] <Dashiva> That's to supplement, like having a graph sidebar. It's not equivalent content, which seems to be the rage these days [03:46:00.0000] <hsivonen> webben: I think it is extremely rare for authors to hide some of their text when they provide an illustration. hence, as Dashiva said, supplementary--not alternative [03:46:01.0000] <webben> Dashiva: The text refers you back to Guideline 1. Read it a bit more carefully. [03:46:02.0000] <hsivonen> webben: the thing is, if an image is supplementary, does it need an alternative? [03:47:00.0000] <webben> "Providing non-text equivalents (e.g., pictures, videos, and pre-recorded audio) of text is also beneficial to some users, especially nonreaders or people who have difficulty reading." [03:47:01.0000] <webben> hsivonen: That depends on what exactly it's supplementing. [03:48:00.0000] <webben> e.g. 
a chart might supplement a text with /data/ in graphical form. [03:48:01.0000] <hsivonen> webben: is there a way for people to indicate that they are non-readers or have difficulty of reading so that the UA could withdraw some text from them? [03:48:02.0000] <webben> and hence need an alternate presentation of that data [03:49:00.0000] <webben> hsivonen: I think to a large measure communicating via images is kind of a default for common UAs/ [03:50:00.0000] <hsivonen> webben: my point is that if you can see, it is generally up to the user to treat supplementary content as alternative and skim over it. should it be the UA's and author's responsibility to suppress some content in some cases based on a pref that indicates that the user has a trouble comprehending text? [03:51:00.0000] <webben> hsivonen: Possibly, yes. Actually getting UAs to incorporate such customization would probably be an uphill struggle though. [03:52:00.0000] <webben> hsivonen: I think what's happening in practice atm is the web bifurcates into websites that are friendly to those with learning disabilities and those that aren't. [03:53:00.0000] <Dashiva> Heh, so image equivalents get a pri 3 checkpoint refering to an orphaned sentence in guideline 1. Oh well, no surprise. [03:53:01.0000] <webben> e.g. http://simple.wikipedia.org/ and http://www.peepo.co.uk/ [03:54:00.0000] <hsivonen> webben: one of the problems with expecting the author to mark some parts of content suppressible from people with learning disabilities is that it would require authors to understand the needs of people with learning disabilities [03:55:00.0000] <webben> hsivonen: Yes. People trying to communicate to an audience need to have some understanding of the needs of that audience. Failing that, they need guidelines to follow. (That's the basic premise of WCAG.) 
[03:55:01.0000] <hsivonen> webben: another problem is that people who are capable of writing prose that is fine for cognitively capable native readers of a given language don't want their literary expression curtailed [03:55:02.0000] <webben> hsivonen: Curtailed by what? [03:56:00.0000] <hsivonen> webben: by having to cater to people with limited vocabulary (foreigners) or learning disabilities [03:56:01.0000] <webben> hsivonen: In practice, people who aren't providing a commercial or public service don't seem to be "curtailed". [03:57:00.0000] <webben> I should probably say in theory, given how little business and government seems to be touched by accessibility legislation anyhow. [03:57:01.0000] <hsivonen> webben: for example, the Time magazine overdoes its thing with the thesaurus, but if you wanted Time to work for people with limited vocabulary, you'd effectively ask them to change their writing style [03:57:02.0000] <webben> hsivonen: Or have two writing styles. [03:58:00.0000] <webben> It wouldn't /necessarily/ be a commercial disaster, since their reach could increase to cover people with learning disabilities (a massive population) and people who have only basic english (an even bigger population). [03:58:01.0000] <hsivonen> webben: do you mean writing article once in a supposedly witty way and then another time with all the intertextualism, implications, rare synonyms and idiomatic English eliminated? [03:59:00.0000] <webben> hsivonen: Yeah, have a look at the simple wikipedia example I gave. [04:00:00.0000] <hsivonen> webben: I think solutions that require you to author your content twice aren't gonna fly *in general* [04:01:00.0000] <webben> hsivonen: Do they have to fly *in general*? [04:01:01.0000] <hsivonen> webben: I think I don't have a learning disability, but I've never understood what the peepo site is supposed to be about [04:01:02.0000] <webben> hsivonen: It's a series of fun web resources for people with learning disabilities + children. 
[04:02:00.0000] <webben> it doesn't have a single unifying theme beyond that. [04:02:01.0000] <webben> e.g. games + videos [04:02:02.0000] <hsivonen> webben: no, if it is just a shadow site of wikipedia with plain hyperlinks in between. But if we're talking about building article alternatives into the markup, yes, the use case catered for should be a general one [04:03:00.0000] <hsivonen> webben: Peepo gives me 401 when I try to follow a link [04:04:00.0000] <webben> hsivonen: It gives me a request for authentication. I suspect he's gone and broken something temporarily. [04:04:01.0000] <webben> hsivonen: A use case can be general without everyone choosing to use it. [04:04:02.0000] <webben> hsivonen: e.g. PHP contains loads of functions. Not every PHP program uses every or even most of them. [04:04:03.0000] <webben> but together they provide an immense power [04:05:00.0000] <hsivonen> I would guess that "translations" within a language to a different cognitive audience would suck even more than normal traslations [04:05:01.0000] <hsivonen> translations [04:05:02.0000] <webben> hsivonen: Why would you guess that? [04:05:03.0000] <webben> It seems non-obvious to me. [04:05:04.0000] <hsivonen> webben: for example, the Web supposedly has a feature that lets me indicate a preference of Finnish over English [04:06:00.0000] <webben> indeed [04:06:01.0000] <webben> but problems with that have more to do with UA + server failures to implement HTTP properly than anything else. [04:06:02.0000] <hsivonen> webben: however, in practice, when sites do have translations, *in practice* the English one is up-to-date. the other languages may or may not be [04:07:00.0000] <webben> hsivonen: That can be a big problem with alternate content that is not in the same document. 
[04:07:01.0000] <hsivonen> webben: hence, I tell my browser that I prefer English, because I prefer up-to-date foreign language that I can read over sucky out-of-date translations any time [04:07:02.0000] <webben> hsivonen: It was a big problem with text-only "accessible" versions of sites. [04:08:00.0000] <hsivonen> webben: so when I *do* switch between languages, I want to do it with explicit links so I can see what my bogometer says instead of the existence of translations being hidden from me [04:08:01.0000] <webben> hsivonen: That's purely a matter of UA interface. [04:09:00.0000] <webben> I don't think it's at all necessary to have every site come up with it's own interface for language switching. [04:09:01.0000] <hsivonen> webben: no, it isn't. when you do content negotiation, the UA doesn't know what else was available [04:10:00.0000] <hsivonen> anyway, I seriously don't trust automatic alternative selection [04:11:00.0000] <hsivonen> webben: the big problem with alternatives for cognitively different audiences is that you really need a human editor [04:11:01.0000] <webben> hsivonen: That depends on what UAs send. [04:12:00.0000] <webben> For example, a UA could send a HEAD request without language preference and get back a list of choices. [04:12:01.0000] <webben> And use that to construct a clientside menu. [04:12:02.0000] <hsivonen> webben: "text only" browsers like Lynx show the same text content and screen readers read the same text content [04:13:00.0000] <hsivonen> webben: I think you're now in the territory where fixing the failings of a feature balloon to such complication that we should concede that the feature is broken [04:14:00.0000] <webben> "feature balloon"? [04:14:01.0000] <webben> I don't think UAs ever took conneg terribly seriously. [04:14:02.0000] <hsivonen> the attempts to fix the feature cause the solution as a whole to balloon [04:14:03.0000] <webben> hsivonen: It's not an attempt to fix it. 
[04:14:04.0000] <hsivonen> so the initially simple idea becomes very complex [04:15:00.0000] <webben> it's just using the conneg drafts [04:15:01.0000] <webben> http://www.ietf.org/rfc/rfc2296.txt [04:16:00.0000] <webben> Features cannot break. They can be useful or non-useful. Only implementations can be broken. [04:16:01.0000] <webben> (Non-useful features probably aren't features.) [04:16:02.0000] <hsivonen> webben: that's what I'm talking about. first there was a simple idea: the browser tells a server what languages the user groks, the server sends matching content. [04:16:03.0000] <hsivonen> webben: then, people figured out that it isn't enough [04:17:00.0000] <hsivonen> webben: they try to patch and extend the feature [04:17:01.0000] <hsivonen> webben: the feature becomes so complex that users can't configure it and implementors don't bother to implement [04:17:02.0000] <webben> hsivonen: Well strictly the failure of sites like the ones with Finnish translations is simply that they don't set their q values properly. [04:17:03.0000] <webben> (the q values should be set to indicate the Finnish content is of poor quality) [04:18:00.0000] <hsivonen> webben: whereas everyone understands plain links "in English", "en français", "suomeksi" [04:18:01.0000] <hsivonen> webben: now you are again saying that the implementations just aren't good enough [04:18:02.0000] <hsivonen> webben: when the reality is that the feature just isn't as simple as it first looks [04:18:03.0000] <webben> Those aren't at all contradictory. [04:19:00.0000] <webben> The feature may not be simple (I don't think the conneg draft is immensely simple, except from an end-user perspective). [04:19:01.0000] <webben> and also implementations aren't good enough. [04:19:02.0000] <hsivonen> webben: my point is that if the feature fails most of the time in practice, practice wins over theory [04:19:03.0000] <webben> Features can't fail. [04:20:00.0000] <webben> It's a contradiction in terms. 
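[Editorial aside] The q-value point webben makes above can be sketched roughly like this: the client ranks languages in `Accept-Language`, the site declares a quality for each variant it has, and the server picks the variant with the best combined score. This is a simplified illustration only. It is loosely in the spirit of the negotiation drafts but is not the RFC 2296 algorithm, it matches language tags exactly (no `en-GB` to `en` prefix matching), and the function names and the per-variant quality numbers are hypothetical.

```python
def parse_accept_language(header):
    """Parse an Accept-Language value into {language-range: q}."""
    prefs = {}
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        if not parts[0]:
            continue
        q = 1.0  # q defaults to 1 when omitted
        for param in parts[1:]:
            if param.startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0  # treat a malformed q as "not acceptable"
        prefs[parts[0].lower()] = q
    return prefs

def choose_variant(header, variants):
    """Pick a language from variants, a dict of language tag -> site-declared quality."""
    prefs = parse_accept_language(header)

    def score(lang):
        client_q = prefs.get(lang, prefs.get("*", 0.0))
        return client_q * variants[lang]

    best = max(variants, key=score)
    return best if score(best) > 0 else None

if __name__ == "__main__":
    # A reader who prefers Finnish, on a site that admits its Finnish text is stale.
    print(choose_variant("fi, en;q=0.8", {"fi": 0.5, "en": 1.0}))
```

With an honest quality of 0.5 on the Finnish variant, the English page wins even for a reader who lists Finnish first, which is the outcome hsivonen currently gets by hand by telling his browser he prefers English.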
[04:20:01.0000] <webben> Unimplemented features certainly can't fail. Since they haven't even existed in practice. [04:21:00.0000] <hsivonen> if a feature is designed to give me content that I'd prefer with little effort on my part, the feature is a failure if most of the time I don't get what I wanted and spent effort finding that out [04:22:00.0000] <webben> hsivonen: If I design a bridge and no-one builds it, the bridge can't be said to be a failure since you cannot walk across it. [04:22:01.0000] <hsivonen> by "fail" I mean that all the trouble that went into the feature hasn't resulted in a net gain in satisfaction [04:23:00.0000] <webben> The /design/ might (or might not) be a failure; but that can't actually be deduced from the fact that you can't walk across the bridge. [04:23:01.0000] <webben> (For instance, it might be the government ran out of bridge-building funds, or that there was a better design. There are multiple possibilities.) [04:25:00.0000] <webben> As far as I can tell, the only advantage of "plain links" (which are so often not plain at all, but weird Flash widgets in practice) is that it doesn't require UAs or servers to implement anything. [04:26:00.0000] <webben> (advantage in terms of the design, I mean, not how things are actually done) [04:27:00.0000] <webben> Much web development seems to be about replacing simple interfaces like links with complicated ones like <object> and <video>. [04:27:01.0000] <webben> The advantage of successfully shifting such interface into the UA is that it slightly discourages authors from confusing users with such interfaces. [04:28:00.0000] <webben> (Because the user only has to deal with one way of doing things.) [04:29:00.0000] <webben> hsivonen: Another problem with "plain links" as a solution is that developers who struggle to create single versions that cater to multiple audiences often can't even manage to provide a simple link to the supposedly "accessible" version. 
[04:30:00.0000] <webben> It's truly amazing how many text-only sites remain effectively invisible to the target users for that reason. [04:30:01.0000] <webben> It's a similar problem with skip links. People tend to hide the skip links, so that mobility impaired and screen magnifier users never see them. [04:32:00.0000] <hsivonen> webben: since TTY display or speech rendering can be programmatically derived from the primary content, the problem is very different from an editor taking a Time magazine article and tranforming it for those who lack the reading comprehension capabilities required to read the usual Time magazine stuff [04:32:01.0000] <webben> hsivonen: I didn't say it wasn't different, did I? [04:33:00.0000] <webben> hsivonen: Note however that such alternate renderings still need editorial input for alternative text. [04:33:01.0000] <webben> (I mean TTY/speech renderings) [04:33:02.0000] <hsivonen> webben: you did't. however, translation is closer to the problem than "text only" links that you were using as an example [04:34:00.0000] <webben> hsivonen: The issues of how you produce alternate sets of content, and the issue of how the user navigates/chooses between them, seem to me to be quite distinct. [04:34:01.0000] <hsivonen> webben: providing textual alternetives for images is very different from potentially rephasing every sentence [04:35:00.0000] <webben> Like I said, I didn't say it wasn't different. But your formulation of the difference seemed to make a false distinction between programmatic and editorial derivation of the alternate content. [04:36:00.0000] <webben> Both forms of alternate content need editorial input. Simple text alternatives to buttons and link text need very little editorial input. 
[04:36:01.0000] <hsivonen> the distinction makes a difference when you can place the programmatic derivation on the client but you have to place the editorial derivation on the server [04:36:02.0000] <webben> Long descriptions, translations, symbolic represents, signed interpretations, basic english, etc need much more. [04:37:00.0000] <webben> hsivonen: Can you elaborate? I'm a bit confused about what you're getting at. [04:38:00.0000] <hsivonen> webben: with e.g. alt, you put a little something in the main content. with Basic English, you rewrite the whole content. [04:38:01.0000] <hsivonen> supposed you have an URI for an article [04:38:02.0000] <webben> Yes that's what I mean with less vs more editorial input. [04:38:03.0000] <hsivonen> enabling that URI to be read by a text-to-speech software is easier and less disruptive than cramming a Basic English rewrite in there [04:39:00.0000] <webben> hsivonen: Probably easier. Not sure what you mean by "less disruptive"? [04:40:00.0000] <hsivonen> less distruptive for the authoring process and for non-Basic English use [04:40:01.0000] <webben> hsivonen: example where it might not be easier would be an article about artwork for example, where you might need to provide long descriptions, or an article with loads and loads of charts. [04:40:02.0000] <webben> hsivonen: I don't think either is disruptive to "non-Basic English use". [04:41:00.0000] <hsivonen> to make it absurd by induction: would you cram all translations in one delivery file? [04:41:01.0000] <webben> hsivonen: Well that's the argument for things like longdesc, src, href, and conneg. [04:41:02.0000] <webben> (part of the argument) [04:42:00.0000] <webben> e.g. link href for RSS version [04:42:01.0000] <webben> (can also be used for alternate language versions etc) [04:43:00.0000] <webben> hsivonen: I think whether these things make sense as one file or many probably is ultimately a question of efficient use of bandwidth vs. 
trying to ensure information doesn't get los due to linkrot. [04:43:01.0000] <webben> plus also whether authoring systems require one to update multiple files or just one. [04:43:02.0000] <webben> e.g. RSS is easy with WordPress, it updates automatically [04:44:00.0000] <webben> translations are easy in Java, as translation editors present all the translations with one interface [04:45:00.0000] <hsivonen> webben: do you mean translating Java UIs? [04:45:01.0000] <webben> yeah [04:45:02.0000] <webben> we have a similar system for localization in the template management system we at work; all translations are presented together [04:46:00.0000] <webben> so you see the German French English etc for "Hello World" all at once [04:46:01.0000] <webben> *we use [04:46:02.0000] <webben> although ultimately those translations get pushed into files that are published separately of course [04:47:00.0000] <hsivonen> anyway, my point is that from an authoring perspective, addressing the needs of blind users entails augmenting the page with textual alternatives whereas addressing the needs of those who can't read full-featured English involves a situation similar to translations in general [04:48:00.0000] <webben> hsivonen: That's true for recasting into basic english. [04:49:00.0000] <webben> not so true for providing image alternatives [04:49:01.0000] <webben> which tends to be handled via embedding [04:51:00.0000] <webben> e.g. lot of pages describe something but also show a diagram or picture of it [04:52:00.0000] <webben> equally, in terms of longdesc, you are actually pointing users out to another resource [04:52:01.0000] <webben> which is not that different to <link>ing to RSS or another language [04:59:00.0000] <webben> I think providing a consistent, explicit facility for providing and navigating equivalent content is more important that the question of whether such content is at the same URI. 
(That's an important question in its own right, but I'm not /as/ worried about it.) [06:37:00.0000] <zcorpan_> Would it be possible to add functionality so that the drawImage call in HTML5 canvas could take an SVGSVGElement as the first parameter? [06:38:00.0000] <zcorpan_> and getting an svg from the HTMLImageElement [09:56:00.0000] <Philip`> zcorpan_: I would have thought that'd work most easily if you could just do <img src="whatever.svg">, and used the normal drawImage function [09:57:00.0000] <Philip`> Or do browser people really not like doing SVG in <img>, so it's worth special-casing for SVG images? [09:59:00.0000] <webben> It would be interesting to try and spec how svg text and longdesc is supposed to work alongside alt and longdesc. [10:25:00.0000] <zcorpan_> Philip`: it would be nice to be able to pass an SVGSVGElement node as parameter instead of having to use svg via HTMLImageElement [10:26:00.0000] <zcorpan_> Philip`: both are apparently pretty straight-forward to implement (given that svg as HTMLImageElement already works) [10:26:01.0000] <Philip`> Might a generic drawElement(...) be able to handle SVG elements like that, as well as handling drawing of e.g. boxes full of text? [10:31:00.0000] <zcorpan_> dunno [13:54:00.0000] <zcorpan_> ftp doesn't work for me today :| 2007-07-15 [01:02:00.0000] <duryodhan> for the color chooser, maybe one could define the color ends like#ff0 and #fff , and all shades in between will be shown for the user to pick, instead of writing a list of all colors [01:02:01.0000] <duryodhan> not sure how usefull that will be though... 
:) [06:39:00.0000] <Philip`> I get a parse error for "unrecognised entity name" on 60% of pages, which seems quite surprising [07:08:00.0000] <Philip`> Oh, right, the unrecognised entities were almost all in attributes, which sounds like <a href="a?b&c">, so that's not so surprising [07:10:00.0000] <Philip`> 60% of pages have that error, 30% have no parse errors at all, 10% have duplicate attributes (odd?), 7% have non-permitted slashes, etc [07:24:00.0000] <zcorpan_> Philip`: what are the duplicate attributes? [07:26:00.0000] <zcorpan_> things like <meta name=keywords content=the best of the best>? [07:47:00.0000] <aas> anybody here? [07:49:00.0000] <zcorpan_> yep [07:53:00.0000] <aas> i have a little proble with XHTML MP DOCTYPE declaration [07:53:01.0000] <zcorpan_> ok [07:56:00.0000] <aas> while i'm creating XHTML MP document in servlet through XSLT transformation it's connection to w3c.org and verifing content, and then throw FileNotFound exception on http://www.wapforum.org/DTD/xhtml-mobile10-model-1.mod file [07:56:01.0000] <aas> hmmm. i hope my english seems not very bad for you [07:57:00.0000] <aas> so, what should i do for preventing this error? [07:57:01.0000] <zcorpan_> don't use a doctype [07:57:02.0000] <aas> ) [07:58:00.0000] <aas> of course, it will solve my problem, but it's incorrect a little a think [07:58:01.0000] <aas> *i [07:58:02.0000] <zcorpan_> why is it incorrect? [07:59:00.0000] <aas> i read it's necessary to include doctype in document [07:59:01.0000] <zcorpan_> according to whom? [07:59:02.0000] <aas> wait a sec [08:01:00.0000] <aas> hmmm... [08:03:00.0000] <aas> then another question: a you sure that all phones can process wap2.0 page without doctype element? [08:04:00.0000] <zcorpan_> pretty sure [08:04:01.0000] <zcorpan_> http://simon.html5.org/articles/mobile-results [08:05:00.0000] <aas> are you writing wap sites? 
[08:05:01.0000] <zcorpan_> no [08:06:00.0000] <zcorpan_> according to my testing, most phones treat your (carefully crafted) page (regardless of what it is) pretty much like desktop browsers treat html [08:06:01.0000] <aas> he, very interesting article, thanks ) [08:07:00.0000] <zcorpan_> even if it validates as something fancy and is served as application/xhtml+xml [08:07:01.0000] <zcorpan_> it gets tag soup treatment by most phones [08:07:02.0000] <zcorpan_> (sadly) [08:08:00.0000] <aas> hmm. did you use emulators? [08:09:00.0000] <zcorpan_> i think a few emulators were tested too [08:09:01.0000] <zcorpan_> the s60 webkit emulator iirc [08:09:02.0000] <zcorpan_> (the result of which was confirmed on a real device) [08:12:00.0000] <aas> how do you think, does mobile phone browsers look at doctype at all? [08:12:01.0000] <zcorpan_> not sure. probably not [08:13:00.0000] <zcorpan_> opera might look at it [08:14:00.0000] <zcorpan_> if you serve the same markup to desktop browsers, and want to avoid quirks mode, then you may want to use some doctype [08:14:01.0000] <zcorpan_> (if you serve it as text/html) [08:15:00.0000] <aas> problem is that i already wrote code for converting web pages... useless now [08:16:00.0000] <Philip`> zcorpan_: http://www.sh3bwah.com/ does <img border="0" ... border=0>, http://www.sapo.pt/ does <a class="subcat" ... class="url">, http://www.nate.com/ does <input name="news_radio" ... name="NS" ...>, http://www.sanook.com/ does <a ... target="_blank" ... target="_blank" ...> [08:17:00.0000] <zcorpan_> Philip`: any dups for style? [08:22:00.0000] <aas> zcorpan_ thanks a lot! it was very useful for me [08:22:01.0000] <Philip`> zcorpan_: http://www.uume.com/ [08:22:02.0000] <aas> sorry for my english [08:23:00.0000] <Philip`> <li ... 
style="display:none" style="background:none" ...> [08:24:00.0000] <Philip`> (I don't automatically collect data about which attributes were duplicated, so I'm just checking the list of pages through the W3C Validator to get more details) [08:24:01.0000] <zcorpan_> Philip`: ok [08:24:02.0000] <zcorpan_> Philip`: dups for style is nasty in ie [08:25:00.0000] <zcorpan_> Philip`: if we were to support what ie does with dups for style in html5, it would require hacks in the tokenizer :( [08:28:00.0000] <Philip`> <b style="color:red; /*" style="font-size:2em; */"> [08:28:01.0000] <Philip`> Looks like you can't just concatenate the attribute values :-( [08:28:02.0000] <Philip`> I'll try adding some instrumentation to see how common each specific duplicated attribute is [08:29:00.0000] <zcorpan_> indeed, you need to parse each attribute separately and then combine them [08:30:00.0000] <Philip`> <b style="color:red" style="color:green"> [08:30:01.0000] <zcorpan_> pretty much as if it was <style> b { color:red; /*</style> <style> b { font-size:2em; */</style> [08:31:00.0000] <Philip`> Have to reverse the order, it seems [08:31:01.0000] <zcorpan_> ah yep, that too [08:31:02.0000] <zcorpan_> as if it was <style> b { font-size:2em; */</style> <style> b { color:red; /*</style> [08:32:00.0000] <Philip`> <b style="color:red" style="color:green !important"> [08:45:00.0000] <zcorpan_> it is unclear to me what should happen here: http://software.hixie.ch/utilities/js/live-dom-viewer/?%3C%21DOCTYPE%20html%3E%0D%0A%3Cinput%3E%3Cscript%3Edocument.body.firstChild.tabIndex%20%3D%20%22x%22%3B%20w%28document.body.firstChild.getAttribute%28%22tabindex%22%29%29%3C/script%3E [08:46:00.0000] <zcorpan_> "On setting, the given value must be converted to a string representing the number as a valid integer in base ten and then that string must be used as the new content attribute value." [09:16:00.0000] <zcorpan_> ah. 
it's defined in http://dev.w3.org/cvsweb/~checkout~/2006/webapi/Binding4DOM/Overview.html?content-type=text/html;%20charset=utf-8#es-long [09:17:00.0000] <zcorpan_> very much needed spec that one [09:17:01.0000] <zcorpan_> /me thinks [09:44:00.0000] <zcorpan_> wonder why <col> isn't allowed as a child of <table> [09:44:01.0000] <Philip`> zcorpan_: I see duplicate @style on 10 pages (out of ~480) [09:44:02.0000] <Philip`> ( http://www.uume.com http://www.megaupload.com http://www.goo.ne.jp http://www.whenu.com http://www.alibaba.com http://www.naver.com http://www.sohu.com http://www.51.com http://www.domaintools.com http://www.6rb.com ) [09:45:00.0000] <zcorpan_> Philip`: wow, that's a lot [09:49:00.0000] <zcorpan_> the first two don't seem to break anything by not doing what ie does [09:49:01.0000] <zcorpan_> /me checks the others [09:50:00.0000] <Philip`> style="" style="" [09:50:01.0000] <Philip`> style="width:300px;" style="ime-mode: active;" [09:50:02.0000] <Philip`> style="padding:0px 0px;" style="font-size:10px;" [09:50:03.0000] <Philip`> style="float:left;width:40%;height:30px;" style="padding:0 0 4px 10px;" [09:50:04.0000] <Philip`> style="WIDTH: 317px" name=query tabindex=2 onfocus="return setTextBox(event, 0);" style="BACKGROUND-POSITION: left 50%; BACKGROUND-IMAGE: url(http://wstatic.naver.com/www/images3/cursor.gif); BACKGROUND-REPEAT: no-repeat" [09:50:05.0000] <Philip`> style=" font-weight:normal; text-decoration:underline;color:#0000cc;font-size:12px;margin-left:240px;" style="behavior: url(#default#homepage)" [09:50:06.0000] <Philip`> style="FONT-SIZE: 12px" style="color:#ff0000" [09:50:07.0000] <Philip`> style="line-height: 1.0em;" style="vertical-align: top" [09:50:08.0000] <Philip`> style="font-family: Tahoma; font-size: 8pt; border: 1 solid #808080" style="font-family: Tahoma; font-size: 8pt" [09:51:00.0000] <Philip`> (Those are the pairs I see on one element) [09:51:01.0000] <Philip`> (for the ten sites) [09:51:02.0000] <zcorpan_> Philip`: 
thanks [09:54:00.0000] <Philip`> Three of those pages have style="behaviour:url(#default#homepage)", which seems unusual [09:54:01.0000] <Philip`> and the links with those styles also have onClick="this.style.behavior='url(#default#homepage)';this.setHomePage('http://www.naver.com');" [09:54:02.0000] <Philip`> (The one with duplicate style attributes, of which one was that #default#homepage thing, also had the onclick on the same element) [09:55:00.0000] <zcorpan_> that's an ieism we probably don't need to copy [11:03:00.0000] <jgraham> hsivonen: I just found a comment in the html5lib code for the processCharacter method in the InBody phase: [11:03:01.0000] <jgraham> # XXX The specification says to do this for every character at the [11:03:02.0000] <jgraham> # moment, but apparently that doesn't match the real world so we don't [11:03:03.0000] <jgraham> # do it for space characters. [11:03:04.0000] <jgraham> self.tree.reconstructActiveFormattingElements() [11:03:05.0000] <jgraham> So it looks like we don't reconstruct afe's for space elements by design [12:03:00.0000] <hsivonen> jgraham: hmm. testing suggests that real world requires what the spec says--not what html5lib does (Re: afe) [12:04:00.0000] <jgraham> Interesting. 
[12:05:00.0000] <jgraham> At least we should raise this with Hixie [12:06:00.0000] <jgraham> and ask Anne if he knows why html5lib is different here [12:06:01.0000] <jgraham> /me goes to re-enable current-spec compliant behaviour [12:08:00.0000] <hsivonen> http://software.hixie.ch/utilities/js/live-dom-viewer/?%3C%21DOCTYPE%20html%3E%3Cp%3E%3Cb%3E%3Ci%3E%3Cu%3E%3C/p%3E%20%3Cp%3EX [12:09:00.0000] <hsivonen> Firefox 2.0, WebKit trunk, Safari 2.0.4, Opera 9.20 and, I'm told, IE7 all put the space node as a descendant of formatting elements as opposed to putting it as a child of body [12:11:00.0000] <hsivonen> jgraham: shouldn't this be raised with Anne first and with Hixie/the list only if Anne has a good reason to diverge from the spec and browser behavior? [12:11:01.0000] <jgraham> Yeah, that was expressed in reverse order [12:16:00.0000] <hsivonen> jgraham: email sent to Anne with you in the CC [12:21:00.0000] <jgraham> hsivonen: Got it. [13:54:00.0000] <jgraham> Philip` / hsivonen: Going back to the tokenizer tests for non-BMP entities; these (if I am not mistaken) assume the tokenizer produces UTF-16 output. [13:55:00.0000] <jgraham> Is that actually required anywhere? The DOM has to be UTF-16 but the HTML5 spec doesn't actually require that you construct a DOM per-se [13:55:01.0000] <Philip`> jgraham: They assume it produces UTF-32 output and then the JSON serialisation encodes those values as surrogate pairs [13:56:00.0000] <hsivonen> jgraham: no, JSON assumes UTF-16 code unit escapes [13:56:01.0000] <hsivonen> jgraham: if simplejson produces UTF-32 escapes, it is borked [13:58:00.0000] <jgraham> OK I think I understand. 2007-07-16 [03:23:00.0000] <reinis> /me looks at the html5 working draft and thinks of how nicely it would fit in a mediawiki [03:39:00.0000] <hsivonen> could someone load http://software.hixie.ch/utilities/js/live-dom-viewer/?%3C%21DOCTYPE%20html%3E%3Ctitle%3E%3C%21--%26amp%3B--%3E%3C/title%3E in IE and say what the value of the text node is? 
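A minimal sketch of the surrogate-pair point from the non-BMP tokenizer test discussion above (Philip`: the values get encoded as surrogate pairs; hsivonen: JSON escapes are UTF-16 code units). The helper name `utf16_escape` and the sample character are illustrative assumptions, not taken from the html5lib tests.

```python
import json


def utf16_escape(code_point: int) -> str:
    """Return the JSON \\uXXXX escape(s) for one Unicode code point.

    Hypothetical helper for illustration only.
    """
    if code_point < 0x10000:
        # BMP code points fit in a single UTF-16 code unit.
        return "\\u%04x" % code_point
    # Non-BMP code points are split into a high and a low surrogate.
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)
    low = 0xDC00 + (offset & 0x3FF)
    return "\\u%04x\\u%04x" % (high, low)


char = "\U0001D504"  # one code point outside the BMP (sample chosen arbitrarily)
escaped = utf16_escape(ord(char))

# Python's json module (default ensure_ascii=True) escapes the same way.
serialized = json.dumps(char)
roundtrip = json.loads(serialized)

print(escaped)         # two \uXXXX escapes, one per UTF-16 code unit
print(serialized)      # the same escapes wrapped in double quotes
print(len(roundtrip))  # decoding yields a single character again
```

So a tokenizer can emit whole code points internally while the JSON form of the expected output still contains two escapes per non-BMP character.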
[03:40:00.0000] <hsivonen> (html5lib test cases disagree with the spec again) [03:40:01.0000] <hsivonen> (the test agrees with Gecko and WebKit. the spec agrees with Opera) [03:51:00.0000] <hsivonen> I was told on another channel that IE7 is too weird for the test to work [04:00:00.0000] <krijnh> hsivonen: there isn't a text node [04:00:01.0000] <krijnh> #comment: CTYPE ht [04:04:00.0000] <hsivonen> krijnh: thanks. [04:05:00.0000] <krijnh> I figured you already knew :) [04:06:00.0000] <hsivonen> http://hsivonen.iki.fi/test/entity-in-escaped-title.html is suitable for testing IE. I was told on another channel that IE agrees with Firefox and Safari on what the title string is [04:07:00.0000] <krijnh> Yep [04:07:01.0000] <krijnh> Only Opera shows <!--&--> [04:07:02.0000] <krijnh> IE6 en 7 show <!--&amp;--> [04:07:03.0000] <hsivonen> I reported this as a spec bug [04:08:00.0000] <krijnh> Yeah, we need more mails on the list :)) [04:09:00.0000] <hsivonen> krijnh: hopefully mine was useful :-/ [04:10:00.0000] <krijnh> I'm 423 hopefully useful mails behind [04:16:00.0000] <krijnh> hsivonen: You're working on this fulltime, right? [04:17:00.0000] <hsivonen> krijnh: yes. [05:37:00.0000] <hsivonen> http://www.w3.org/mid/11ff5c20707141026s7bcd4882t19e7c490528c4ca⊙mgc [07:46:00.0000] <krijnh> hsivonen: what accept header does your validator use? [07:47:00.0000] <krijnh> 'IO Error: HTTP resource not retrievable.' [08:14:00.0000] <duryodhan> hey I am a noob to whatwg, am interesting web forms 2.0 and what it has to offer etc. , where should I start reading? (really don't think the actual spec would be a good place) [08:14:01.0000] <duryodhan> The wiki is stunningly lacking of content [08:23:00.0000] <krijnh> duryodhan: http://dev.opera.com/articles/view/improve-your-forms-using-html5/ or something? 
[08:23:01.0000] <krijnh> And I think the spec is a good place :) [08:23:02.0000] <Philip`> duryodhan: http://dev.w3.org/cvsweb/~checkout~/html5/html4-differences/Overview.html?content-type=text/html briefly mentions some of the new bits from WF2 [08:25:00.0000] <Philip`> (but only very briefly, so that's probably not very interesting) [08:31:00.0000] <zcorpan_> http://www.seedit.info/ mac editor with (x)html5 support [08:32:00.0000] <zcorpan_> "the Check Server Document Tool now is using a (X)HTML5 conformance online checking tool." [08:32:01.0000] <zcorpan_> /me ponders [08:32:02.0000] <zcorpan_> hsivonen: anything you know about? [08:53:00.0000] <duryodhan> thanks all ( was out for sometime_ [08:56:00.0000] <hsivonen> zcorpan_: no, not something I know about [08:57:00.0000] <zcorpan_> duryodhan: http://www.whatwg.org/specs/web-forms/current-work/#introduction0 [08:58:00.0000] <zcorpan_> hsivonen: seems it can check uploaded documents [08:59:00.0000] <hsivonen> zcorpan_: so the editor uploads the doc somewhere? where does it upload? [09:01:00.0000] <zcorpan_> hsivonen: to your own server. you first upload the document with its ftp client then you can check conformance. i think. [09:01:01.0000] <hsivonen> zcorpan_: ok [09:04:00.0000] <hsivonen> zcorpan_: I guess I should consider this as an indicator that there'd be demand for a Web service interface [09:05:00.0000] <zcorpan_> hsivonen: yep [09:06:00.0000] <zcorpan_> it seems xhtml5 pages are saved with the .html file extension by default. but the save as drop down includes .xhtml [09:09:00.0000] <zcorpan_> or default is extensionless apparently [09:11:00.0000] <hsivonen> anyway, pretty cool that the service is getting usage. thanks for letting me know [09:11:01.0000] <duryodhan> http://www.whatwg.org/specs/web-forms/current-work/ isn't loading for me :( ... 
any other place where I could get it (maybe the multisection page) [09:13:00.0000] <Philip`> duryodhan: There isn't a multipage version of the WF2 spec [09:13:01.0000] <Philip`> The normal one seems to load fine for me, though [09:13:02.0000] <duryodhan> damn .. I was wondering why I couldn't find it in google [09:13:03.0000] <duryodhan> whats the actual page name it is index.html or php ? [09:14:00.0000] <zcorpan_> duryodhan: http://dev.w3.org/cvsweb/~checkout~/html5/web-forms-2/Overview.html?content-type=text/html;%20charset=utf-8#introduction0 [09:14:01.0000] <duryodhan> http://www.whatwg.org/specs/web-forms/current-work/index.html is giving me page not found (your site's page not found , not some firefox error couldn't connect etc. ) [09:15:00.0000] <duryodhan> zcorpan_: thanks [09:15:01.0000] <zcorpan_> duryodhan: it's "index" [09:15:02.0000] <zcorpan_> duryodhan: no extension [09:15:03.0000] <duryodhan> yeah that worked for me again thanks zcorpan_ [09:16:00.0000] <zcorpan_> np [09:16:01.0000] <duryodhan> hmm wonder why didn't work for me .. you might wanna check that out ... [09:47:00.0000] <duryodhan> A free-form text field, nominally free of line breaks. whats that supposed to mean? In name only free of line breaks? [09:49:00.0000] <zcorpan_> a text field in which you can't enter line breaks (as opposed to <textarea>, in which you can enter line breaks) [09:55:00.0000] <duryodhan> yeah that I realised ... but why the nominally ... [09:55:01.0000] <duryodhan> "in name only" free of line breaks ... that doesn't make sense much .. [09:56:00.0000] <duryodhan> maybe it is cos the POST data can be sent for a line input with new lines in it ... [10:02:00.0000] <Philip`> <input value="a&#10;b"> [10:03:00.0000] <Philip`> might be relevant there [10:04:00.0000] <zcorpan_> i guess nothing is done to actually prevent line breaks [10:04:01.0000] <zcorpan_> except the UAs user interface [10:24:00.0000] <hsivonen> hmm. 
DOM Level 3 added stuff to the Java interfaces [10:24:01.0000] <zcorpan_> http://simon.html5.org/test/html/dom/reflecting/ 42 tests [10:24:02.0000] <hsivonen> I guess it isn't pretty when you try to load Level 2 impl classes with a JDK that ships Level 3 interfaces... [10:25:00.0000] <hsivonen> I think I'm gonna stick the form pointer into Level 3 user data [10:56:00.0000] <BenWard> ftp://bicyclenail-lm.london.corp.yahoo.com [10:57:00.0000] <BenWard> SOrry [10:57:01.0000] <BenWard> Wrong channel :( [15:08:00.0000] <hsivonen> krijnh: the validator uses various Accept headers depending on what it is retrieving and in what mode [15:10:00.0000] <hsivonen> krijnh: the types that can be in the accept header are application/relax-ng-compact-syntax, text/html; q=0.9, application/xhtml+xml, application/xml, image/svg+xml, application/docbook+xml, text/xml; q=0.3, */*; q=0.1 [15:11:00.0000] <hsivonen> krijnh: they aren't all there all the time. which ones are at what time hopefully happens pretty much as you'd expect 2007-07-17 [01:55:00.0000] <krijnh> hsivonen: yt? [01:56:00.0000] <hsivonen> krijnh: yes [01:57:00.0000] <krijnh> Perhaps my server is misconfigured [01:57:01.0000] <krijnh> Wrt your accept header [01:57:02.0000] <krijnh> http://hsivonen.iki.fi/validator/html5/?doc=http%3A%2F%2Fkrijnhoetmer.nl%2Fstuff%2Ftests%2Ftable-caption-margin [01:57:03.0000] <krijnh> http://hsivonen.iki.fi/validator/html5/?doc=http%3A%2F%2Fkrijnhoetmer.nl%2Fstuff%2Ftests%2Ftable-caption-margin.html works [01:58:00.0000] <hsivonen> krijnh: in that case, the accept header should have a/x+x, a/x and t/h in it [01:58:01.0000] <hsivonen> preferring a/x+x [01:58:02.0000] <krijnh> Hmm, then why does Apache send a 406 header [01:59:00.0000] <hsivonen> of course, it is possible that I have a bug and the validator sends something else [02:02:00.0000] <krijnh> Or a bug in Apache [02:02:01.0000] <hsivonen> krijnh: found it [02:03:00.0000] <krijnh> Fixable? 
[02:03:01.0000] <hsivonen> krijnh: your apache thinks the type is application/x-httpd-php and that's not on my Accept list [02:03:02.0000] <krijnh> Ah [02:03:03.0000] <krijnh> .html does get parsed by php [02:04:00.0000] <krijnh> Any idea how I can fix that? [02:05:00.0000] <hsivonen> krijnh: I'm not sure, but I think the way you are associating files with an Apache handler is the legacy (content-type) way. I would guess that using the new handler association wouldn't confuse multiviews [02:06:00.0000] <hsivonen> that is, AddHandler instead of AddType might fix it [02:06:01.0000] <hsivonen> for the php association [02:06:02.0000] <krijnh> AddType application/x-httpd-php .php .html is what I use, yeah [02:06:03.0000] <krijnh> I'll look it up, thanks [02:07:00.0000] <krijnh> I wonder if that works for Apache 1.3 as well [02:08:00.0000] <hsivonen> I don't know [02:12:00.0000] <krijnh> :p that doesn't really work [02:20:00.0000] <krijnh> Yay, fixed [02:32:00.0000] <hsivonen> krijnh: what worked? 
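A rough sketch of why the extensionless URL could produce a 406, assuming the diagnosis above: the server labels the only available variant `application/x-httpd-php`, and that type matches nothing in the validator's Accept header. The header string below is a guess reconstructed from the "a/x+x, a/x and t/h" description, and the matching logic is a simplification, not Apache's actual MultiViews algorithm.

```python
def parse_accept(header):
    """Parse an Accept header into a list of (media_range, q) pairs."""
    ranges = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        q = 1.0  # default quality when no q parameter is given
        for param in fields[1:]:
            if param.startswith("q="):
                q = float(param[2:])
        ranges.append((fields[0], q))
    return ranges


def acceptable(media_type, header):
    """Return True if media_type matches any range with q > 0."""
    type_, subtype = media_type.split("/")
    for media_range, q in parse_accept(header):
        r_type, r_sub = media_range.split("/")
        if q > 0 and r_type in ("*", type_) and r_sub in ("*", subtype):
            return True
    return False


def status_for(media_type, header):
    """200 if the variant's type is acceptable, otherwise 406."""
    return 200 if acceptable(media_type, header) else 406


# Assumed header: the three types named in the log, with no */* fallback.
ACCEPT = "application/xhtml+xml, application/xml, text/html; q=0.9"

print(status_for("application/x-httpd-php", ACCEPT))  # variant typed as PHP
print(status_for("text/html", ACCEPT))                # variant typed as HTML
```

Under these assumptions, requesting the `.html` URL directly would skip negotiation, which fits the observation that it works; associating PHP through a handler rather than a content type would leave the negotiated type as `text/html`.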
[02:32:01.0000] <krijnh> hsivonen: The ugly approach [02:33:00.0000] <krijnh> Just made a dir with an index.html in it [02:33:01.0000] <krijnh> Almost the same :) [02:33:02.0000] <krijnh> This can't be fixed for Apache 1.3 with the AddHandler way [04:15:00.0000] <hendry> what's going on with whatwg and a storage layer, ala google gears [07:47:00.0000] <zcorpan_> hsivonen: http://www.rightwebpage.com/content/view/35/42/ "2) HTML Validation against W3C DTDs and WHATWG (X)HTML5 pre-release schemas;" [07:57:00.0000] <met_> last 2 comments http://blog.whatwg.org/wsg-html5-presentation#comment-6388 http://blog.whatwg.org/proposing-uri-templates-for-webforms-20#comment-6387 content looks on topic, but the home URL looks like spam [08:02:00.0000] <zcorpan_> then it's spam [14:50:00.0000] <hsivonen> https://garage.maemo.org/forum/forum.php?forum_id=1171 [15:11:00.0000] <jgraham> hsivonen: I lose track but have Nokia not developed or shipped browsers based on all of Gecko and Presto and Webkit at various times? [15:12:00.0000] <bewest> the 770 ships with opera [15:12:01.0000] <bewest> I don't know the history of maemo previous to the 770 [15:12:02.0000] <bewest> is this mozilla based browser something other than minimo? [15:13:00.0000] <bewest> must be... I suppose if it was minimo they would have said so [15:14:00.0000] <hsivonen> jgraham: yes. 2007-07-18 [07:46:00.0000] <zcorpan_> shouldn't the final wcag samurai be published by now? [09:36:00.0000] <Lachy> this has just been published! 
http://www.w3.org/TR/xbl-primer/ [09:43:00.0000] <zcorpan_> Lachy: s/XmlHttpRequest/XMLHttpRequest/ [09:44:00.0000] <zcorpan_> Lachy: s/how you can to simplify/how you can simplify/ [09:44:01.0000] <Lachy> ah, you can blame Marcos for that :-) [09:44:02.0000] <zcorpan_> seems there are lots of typos [09:45:00.0000] <Lachy> oh well, that's probably cause I didn't get a chance to proof read the parts I didn't write [09:46:00.0000] <Philip`> The image in Figure 1 gets rescaled quite uglily [09:46:01.0000] <Lachy> yeah, I just noticed [09:47:00.0000] <Lachy> can you send mail to public-appformats with all the errors and either Marcos or I will fix them later this week [09:50:00.0000] <Lachy> I've fixed those 3 mistakes and will check them in shortly [10:12:00.0000] <zcorpan_> Lachy: shouldn't’t [10:12:01.0000] <zcorpan_> in 1.8 Templates [10:14:00.0000] <Lachy> fixed [10:14:01.0000] <Lachy> good night, cya tomorrow (send mail for any others) [10:15:00.0000] <zcorpan_> nn [12:37:00.0000] <jgraham> Philip`: Am I correct in thinking that your tokenizer can only dump tokens to stdout? [12:38:00.0000] <jgraham> /me is interested in writing python bindings and plugging it in to html5lib [12:39:00.0000] <jgraham> Where by "writing" I mean "using e.g. SWIG to generate" [12:47:00.0000] <Philip`> jgraham: The token stream is handled via an interface class, where the default implementation dumps JSON but it can be switched (e.g. by '--stats' on the command line) to a different implementation that counts tags/etc instead [12:48:00.0000] <Philip`> and so it should be possible to add another token-stream implementation that pushes the tokens into Python [12:49:00.0000] <Philip`> (It does the same kind of thing for the input stream too, though the current implementation is kind of rubbish since it doesn't handle non-ASCII correctly) [12:49:01.0000] <Philip`> ((since it just reads bytes from stdin and pretends they're characters)) [12:49:02.0000] <jgraham> Ah, OK. 
I'd just seen the bit in emitToken that looks like std::cout << printBuf [12:50:00.0000] <Philip`> That's just in the TestTokenStream class, which is the JSON-printing one [12:50:01.0000] <Philip`> (NullTokenStream is kind of useless and does nothing, and StatsTokenStream does the counting) [12:51:00.0000] <jgraham> /me should learn to read more context [12:53:00.0000] <Philip`> Would it be useful if the tokeniser could emit one token at a time, so you could pull data from it rather than having it run uninterruptibly until EOF? [12:58:00.0000] <jgraham> Philip`: Yes [15:05:00.0000] <gsnedders> /me hits parse error in one of karl's emails and aborts processing [15:10:00.0000] <zcorpan> gsnedders: pointer? [15:10:01.0000] <gsnedders> zcorpan: by parse error I mean English mistake [15:11:00.0000] <zcorpan> gsnedders: yes; pointer to the email? [15:11:01.0000] <gsnedders> zcorpan: I won't be able to find it quickly now [15:11:02.0000] <zcorpan> k 2007-07-19 [01:42:00.0000] <Lachy> hey, I'm working on my presentation (which I'll be presenting on 3 Aug) and I'm trying to list and compare the benefits of using HTML vs. XHTML and explain when each is appropriate. [01:42:01.0000] <Lachy> Any ideas about the benefits and use cases for XHTML? [01:44:00.0000] <hsivonen> Lachy: first, I'm assuming you mean XHTML 1.0: [01:44:01.0000] <hsivonen> Lachy: embedding SVG and MathML [01:44:02.0000] <Lachy> no, XHTML5 [01:44:03.0000] <hsivonen> ooh. 
[01:45:00.0000] <Lachy> the presentation is called Developing with HTML5 [01:45:01.0000] <hsivonen> well, anyway, embedding SVG and MathML in it [01:45:02.0000] <Lachy> yeah, got that one already (that's the only one I had) [01:45:03.0000] <hsivonen> embedding it into XSLT transformations (for producing HTML5 or XHTML5) [01:45:04.0000] <Lachy> also XSLT, but I'm not sure if that's really a benefit ;-) [01:45:05.0000] <Lachy> I suppose, some people might want to do that [01:46:00.0000] <hsivonen> using the XML representation as the internal representation in non-browser apps [01:46:01.0000] <Lachy> like in a CMS? [01:46:02.0000] <hsivonen> that's not syntax but using streaming or tree representations [01:46:03.0000] <hsivonen> yes [01:47:00.0000] <hsivonen> or in a conformance checker :-) [01:47:01.0000] <Lachy> yeah, but most developers are building a conformance checker [01:47:02.0000] <Lachy> *aren't [01:48:00.0000] <hsivonen> well, any app that wants to do non-browser things with HTML5 and wants to do so in a robust way could use an XML pipeline with an HTML5 parser on input and an HTML5 serializer on output [01:48:01.0000] <hsivonen> which means there's no XML *syntax* involved but the APIs / data models are [01:48:02.0000] <Lachy> I'll see if I can turn that into some kind of flow chart, showing the authoring in (X)HTML, storing as XHTML, and serialising as HTML5 to the client [01:59:00.0000] <jgraham> Yeah, for something like Genshi the XML serialisation will work better [02:00:00.0000] <jgraham> http://genshi.edgewall.org/ (Python templating language that uses SAX-like streams internally and uses a subset of XInclude to process fragments) [02:04:00.0000] <hsivonen> Lachy: after all, all the XML toolchain stuff is pretty cool. it's just that the server-to-browser step doesn't work in IE and the ingestion step to the pipeline is brittle if using a real Draconian XML parser [03:14:00.0000] <rabies> Morgen. 
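A toy sketch of the pipeline hsivonen describes above (HTML parser on input, an XML-style data model in the middle, HTML serializer on output, with no XML syntax involved). It assumes Python's standard `html.parser` as a stand-in; that module is not an HTML5 parser and does not imply missing end tags, so the input here is kept well-formed. `TreeBuilder`, `VOID`, `pipeline` and the `class="note"` processing step are all hypothetical names chosen for illustration.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Small, incomplete set of void elements; enough for this sketch.
VOID = {"br", "img", "input", "hr", "meta", "link"}


class TreeBuilder(HTMLParser):
    """Build an ElementTree from tag-soup-free HTML (stand-in parser)."""

    def __init__(self):
        super().__init__()
        self.root = ET.Element("root")  # synthetic container element
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        attrib = {name: value or "" for name, value in attrs}
        element = ET.SubElement(self.stack[-1], tag, attrib)
        if tag not in VOID:
            self.stack.append(element)

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open element, if there is one.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        parent = self.stack[-1]
        if len(parent):
            # Text after a child element is stored as that child's tail.
            last = parent[-1]
            last.tail = (last.tail or "") + data
        else:
            parent.text = (parent.text or "") + data


def pipeline(markup):
    """Parse HTML, process it as a tree, and serialize it back as HTML."""
    builder = TreeBuilder()
    builder.feed(markup)
    builder.close()
    # Processing step on the tree model (arbitrary example transformation).
    for paragraph in builder.root.iter("p"):
        paragraph.set("class", "note")
    # Serialize each top-level node with the HTML output method.
    return "".join(
        ET.tostring(child, encoding="unicode", method="html")
        for child in builder.root
    )


print(pipeline("<p>Hello<br>world</p><p>Second</p>"))
```

The middle step only ever touches the tree, which is the point made in the log: the APIs and data models are XML-flavoured even though both ends speak HTML.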
[09:53:00.0000] <Navarr> Pardon for the probably very odd question; but what is the point in HTML5 being made if its less strict? Isn't the point of standards to make them more strict to follow a specific set of rules so that browsers read and display them correctly on a universal level? [09:54:00.0000] <gavin_> depends on what you mean by "more strict" [09:54:01.0000] <gavin_> it's important that the UA requirements are completely and consistently defined [09:55:00.0000] <gavin_> the authoring requirements don't need to be "strict" to achieve interoperability [09:56:00.0000] <Navarr> but with the advancing work of XHTML to combine into XML to have further freedom in what a user does, why continue HTML? (these are very basic questions, being asked by a 16 year old interested in the web) [09:56:01.0000] <gavin_> I'm not sure I understand your assertion [09:57:00.0000] <Navarr> well, XHTML 2.0 is being worked on by the W3C, allowing the combination of HTML (written using XML schema?) to combine with other types of XML (SVG, MathML,ect.) why continue HTML? [09:58:00.0000] <gavin_> the goal of the XHTML2.0 working group isn't to "combine HTML with other types of XML", as far as I know [09:59:00.0000] <gavin_> their goal is to rewrite HTML, and they're doing it in a way that is incompatible with the web [09:59:01.0000] <Navarr> ah, i see. [09:59:02.0000] <Navarr> Thank you very much, I'm just kind of curious. [10:01:00.0000] <Philip`> About 99.95% of the web uses HTML instead of XHTML, so work on HTML is more relevant than XHTML for the majority of users and authors [10:03:00.0000] <Navarr> Isn't it also true that most of the web uses wrongly formatted HTML? [10:04:00.0000] <gavin_> yes [10:04:01.0000] <Philip`> http://triin.net/2006/06/12/HTML suggests about 97.5% of pages are invalid [10:04:02.0000] <Navarr> wow. [10:05:00.0000] <Navarr> So, what will effectively be the difference between HTML5 and XHTML2? 
[10:05:01.0000] <Philip`> There are basic syntax errors on about half of pages, before even looking at whether they're using real HTML elements and using them correctly [10:05:02.0000] <gavin_> http://blog.whatwg.org/faq/ will probably answer a lot of your questions [10:06:00.0000] <Navarr> thank you ^^;;; [10:06:01.0000] <gavin_> though it's perhaps a little bit biased :) [10:07:00.0000] <Navarr> yea [10:07:01.0000] <Navarr> i see it has been worked on to try to make it less biased [10:16:00.0000] <Navarr> thank you for that [15:49:00.0000] <annevk> hmm, 1848 new e-mails [15:49:01.0000] <zcorpan> annevk: wb [15:51:00.0000] <annevk> anything new? [15:53:00.0000] <zcorpan> annevk: i updated http://html5.googlecode.com/svn/trunk/parser-tests/ 2007-07-20 [18:48:00.0000] <Lachy> Woo Hoo! The selectors api naming debate continues!!! :-) [18:53:00.0000] <hober> *groan* [19:03:00.0000] <othermaciej> Lachy: the bike shed needs to be *green*, dammit [19:03:01.0000] <othermaciej> I will stand for nothing less [19:22:00.0000] <Lachy> I don't have a problem with either selectorQuery, querySelector or matchSelector [01:46:00.0000] <jgraham> Lachy: What's the distinction between "DOM Processing" and "XML Processing"? Is DOM Processing just supposed to be normalising the content to use XMLisms where the XML way of doing something is different to the HTML5 way [01:46:01.0000] <jgraham> ? [01:47:00.0000] <jgraham> Also, it's not obvious to me that storage will always be as XHTML per-se (one could choose to store content in its original unaltered form or as XHTML-in-Atom or any other convenient format) [01:48:00.0000] <jgraham> (but in general I guess XHTML makes some sense) [01:55:00.0000] <hsivonen> Lachy: yes, the flowchart is reasonable. 
However, it is a matter of taste if you want to say "DOM" or something more abstract like "infoset processing" or somesuch [01:55:01.0000] <Lachy> ok, thanks [02:00:00.0000] <hsivonen> Lachy: the good thing with saying "DOM" is that the audience knows what it is. The bad thing is that in practice it sucks the most if you consider all the competing options you could plug there. [02:01:00.0000] <Lachy> I'm just getting sent a more professional looking version of it, which has changed a bit. I'll upload it now [02:04:00.0000] <Lachy> http://lachy.id.au/temp/CMS1.png [02:05:00.0000] <hsivonen> looks good [02:06:00.0000] <Lachy> Marcos created it for me [02:06:01.0000] <annevk> doesn't take into account editing etc. [02:06:02.0000] <Lachy> annevk, what do you mean? [02:08:00.0000] <Lachy> it doesn't have to be a completely accurate system flow chart, it's just an illustration to show how HTML and XHTML can be used together for use in my presentation [02:09:00.0000] <annevk> k [02:10:00.0000] <karlUshi> Lachy: you could also show that from the XML DB, you could serialize in many forms. Atom, PDF, HTML etc. [02:12:00.0000] <Lachy> yeah, I guess I could add a generic "Other Formats" output [02:13:00.0000] <karlUshi> which is one of the possible benefits of having an XML format somewhere. More tools available for this kind of conversion [02:15:00.0000] <annevk> just running HTML->DOM->XSLT works too [02:18:00.0000] <karlUshi> annevk: yes indeed if you develop your own tools [02:18:01.0000] <karlUshi> hmm time to move [02:19:00.0000] <annevk> most XSLT thingies have HTML support [02:25:00.0000] <hsivonen> If anyone wants to prove a point, I suggest hooking up the parser I've been writing to SAXON, so then other people would no longer have to develop their own tools. :-) [02:25:01.0000] <hsivonen> annevk: do you mean serialization support? [02:25:02.0000] <annevk> no, as input [02:26:00.0000] <hsivonen> annevk: oh. What XSLT thingies come with HTML support out of the box? 
[02:26:01.0000] <Lachy> new version http://lachy.id.au/temp/CMS1.png [02:27:00.0000] <annevk> /me thought several; might be mistaken [02:37:00.0000] <zcorpan> annevk: could you update http://html5.org/parsing-tests/testrunner.htm from trunk? [03:30:00.0000] <hsivonen> the whatwg blog spammers are getting more sophisticated. now the Polish pharmacy spammer something that actually relates to the post. [04:15:00.0000] <hendry> hsivonen: are you using akismet? [04:15:01.0000] <hendry> hsivonen: are you using akismet (wordpress spam filtering service)? [04:16:00.0000] <hsivonen> hendry: the whatwg blog is using it [04:17:00.0000] <hsivonen> hendry: I don't know how much spam it catches, but lately a number of spams have made it through the filter [04:17:01.0000] <hsivonen> hendry: generally German or Polish SEO attempts [04:18:00.0000] <hendry> i guess there is no moderation? i think you must have moderation too [04:23:00.0000] <hsivonen> hendry: Lachy and I are the moderators [04:56:00.0000] <annevk> zcorpan, updated [05:14:00.0000] <hsivonen> interesting bad UTF-8: http://photomatt.net/2007/07/13/on-php/ [05:15:00.0000] <hsivonen> near the string "Reluctant Acceptance" there are byte sequences 0xE2 0x80. I wonder how those came to be. 2007-07-21 [08:09:00.0000] <zcorpan> http://www.mdibb.co.uk/2007/07/20/html5s-networking-api-let-the-lunacy-begin/ [08:28:00.0000] <Philip`> Did anyone point out http://developers.slashdot.org/article.pl?sid=07/07/20/1226235 already? [08:34:00.0000] <hsivonen> Philip`: you are "Excors" wielding the clue stick, right? [08:36:00.0000] <Philip`> I'm not certain that I have a properly functioning clue stick, but that is me anyway [08:39:00.0000] <zcorpan> Philip`: you can use Ruby in html just as much as you can in xhtml. or right now it only works in html because only ie has implemented it and it doesn't support xhtml... 
:) [08:46:00.0000] <Philip`> zcorpan: Ah, so I don't have a non-broken clue stick :-) [08:48:00.0000] <zcorpan> also, <br /> == <br>> not <br>/ ;) [08:53:00.0000] <Philip`> Oops [08:54:00.0000] <Philip`> I don't suppose anyone really cares about the details, though :-p [08:54:01.0000] <zcorpan> indeed [08:54:02.0000] <zcorpan> sgml is irrelevant :) [08:54:03.0000] <zcorpan> /me finds that section 3 is pretty thin on UA reqs [08:55:00.0000] <zcorpan> for already-implemented stuff anyway [08:59:00.0000] <Philip`> It would be easier to remember the <br/> thing if I actually had any tool that processed HTML in that way, other than putting <h1/></h1> into the W3C Validator with the Show Outline mode :-) [09:01:00.0000] <zcorpan> the validator used to be able to show the parsed tree [09:02:00.0000] <hasather> zcorpan: I think that feature was removed, and I think it'll come back [09:03:00.0000] <zcorpan> ok [09:10:00.0000] <zcorpan> http://html5.org/parsing-tests/testrunner.htm now works in safari [09:30:00.0000] <zcorpan> /me compares the results in the four browsers side by side. haven't found anything interesting yet [09:37:00.0000] <zcorpan> why is there a space before each line in ie? [09:38:00.0000] <zcorpan> e.g. Test 8 of 21 in data/tests3.dat fails because of that [09:40:00.0000] <zcorpan> #document in Test 11 of 21 in data/tests3.dat is wrong [09:41:00.0000] <zcorpan> the last line should say: y" instead of: | y" [09:46:00.0000] <zcorpan> seems that is already fixed in trunk 2007-07-23 [14:53:00.0000] <hendry> /me wonders if you can hook up gmail status to twitter 2007-07-24 [02:26:00.0000] <krijnh> *sigh* [02:26:01.0000] <krijnh> "Ajax is a combination of PHP XML JAVASCRIPT." [02:28:00.0000] <zcorpan> ah, of course! [02:28:01.0000] <krijnh> :) [02:29:00.0000] <krijnh> And then you say, "no it isn't, read the article JJG wrote" [02:30:00.0000] <krijnh> And then he says "no, it's definitely a combination of PHP JAVA XML."
[02:30:01.0000] <krijnh> Oh well [02:30:02.0000] <krijnh> We need an Ajax object in HTML5 :) [02:30:03.0000] <krijnh> Then we get the cool kids on our hands as well [02:31:00.0000] <zcorpan> XMLHttpRequest was originally in the html5 spec [02:31:01.0000] <krijnh> That's not Ajax [02:31:02.0000] <krijnh> Ajax is PHP XML JS [02:31:03.0000] <krijnh> :) [02:31:04.0000] <zcorpan> not Java? :) [02:31:05.0000] <krijnh> With HTML you can't make db connections, that's why you need JS [02:31:06.0000] <krijnh> Err, Ajax [02:32:00.0000] <krijnh> :P [02:34:00.0000] <krijnh> Damn, we get referrer spam on /irc-logs/ :/ [02:37:00.0000] <krijnh> "because people know you can use strict html in ajax, they automatically also use php" [02:37:01.0000] <krijnh> :D [03:34:00.0000] <met_> http://arstechnica.com/journals/linux.ars/2007/07/23/the-unforking-of-kdes-khtml-and-webkit [07:37:00.0000] <annevk> /me is back in Oslo [07:39:00.0000] <zcorpan> /me will be in linköping tomorrow and the day after that [07:39:01.0000] <annevk> /me wonders when zcorpan will visit Oslo [07:39:02.0000] <annevk> or Utrecht :) [07:40:00.0000] <zcorpan> how long are you staying in oslo? [07:43:00.0000] <annevk> I go back on 18 August [07:46:00.0000] <zcorpan> ok. I don't know yet whether I can work at opera after the summer [07:46:01.0000] <annevk> would you move to Norway then? [07:47:00.0000] <zcorpan> maybe [07:47:01.0000] <annevk> k [12:26:00.0000] <zcorpan> not many changes to css21 [12:35:00.0000] <zcorpan> overlapping cells is a no-brainer [12:35:01.0000] <zcorpan> all browsers render them the same [12:35:02.0000] <zcorpan> yet the css21 spec doesn't want to define it [15:38:00.0000] <zcorpan> heh, pressing the back button in firefox while an xml page is loading flashes the YSOD 2007-07-25 [23:19:00.0000] <met_> Some problems with mailing lists? 
Archives stopped working http://lists.whatwg.org/pipermail/whatwg-whatwg.org/ [06:23:00.0000] <gavin_> 2 [06:39:00.0000] <annevk> http://lists.whatwg.org/pipermail/ ... [06:42:00.0000] <met_> annevk i see this, have even found http://lists.whatwg.org/listinfo.cgi/whatwg-whatwg.org but seems there is no public archive [06:43:00.0000] <met_> old links to the archive do not work http://lists.whatwg.org/pipermail/whatwg-whatwg.org/2007-July/012165.html (which mean many links on my web 8-) [06:43:01.0000] <annevk> this is not the first time [06:43:02.0000] <met_> si it is only some problem with update or configuration? [06:44:00.0000] <annevk> http://listserver.dreamhost.com/pipermail/whatwg-whatwg.org/ used to work at some point in the past [11:59:00.0000] <kinocon> hi [11:59:01.0000] <kinocon> whatwg is what? [12:00:00.0000] <bhnmkl> hi? [12:01:00.0000] <bhnmkl> bunjallo suxs a little [12:29:00.0000] <bhnmkl> HI??????????????????????????? 2007-07-26 [19:50:00.0000] <yod> sweet sweet sweet sweet [21:17:00.0000] <Lachy> hey, anyone there? [21:17:01.0000] <Lachy> I'm looking for some use cases for datagrid. Something that I can talk about and describe in my presentation. Any suggestions? [21:21:00.0000] <Lachy> would a database application (like PHPMyAdmin) be a good use case for datagrid? [21:46:00.0000] <othermaciej> Lachy: look at anything that has a tree control [21:47:00.0000] <othermaciej> or something that's like a list box but needs multiple columns [21:50:00.0000] <Lachy> I thought of doing a web based mail client, since that can be represented as either a list or a tree in threaded view [08:24:00.0000] <zcorpan> Lachy: perhaps we should do s/15 years or more/year 2022 or later/ in the faq [08:25:00.0000] <Lachy> ok [08:25:01.0000] <annevk> "This CNN.com feature is optimized for Adobe Flash Player version 8 or higher. 
You are currently using Flash Player 0" [08:34:00.0000] <Lachy> zcorpan: how about this: "It is estimated that HTML5 will reach a W3C recommendation in the year 2022 or later. This is approximately 18-20 years of development, since beginning in mid-2004." [08:35:00.0000] <zcorpan> sounds good [08:35:01.0000] <Lachy> done [08:37:00.0000] <zcorpan> "This will approximately 18-20..." it says [08:37:01.0000] <zcorpan> "is" or "will be"? :) [08:38:00.0000] <met_> zcorpan: aby reason for changing the delay? [08:38:01.0000] <zcorpan> met_: 2007+15 is 2022? [08:39:00.0000] <met_> wasn't whatwg establish in 2004? [08:39:01.0000] <zcorpan> yes, but the 15 years estimate referred to from 2007 [08:40:00.0000] <met_> oh I see http://lists.w3.org/Archives/Public/www-archive/2006Nov/0000.html [08:41:00.0000] <met_> there it is, ok [08:41:01.0000] <Philip`> What is the point of W3C Recommendation status? It's not like we don't recommend people use HTML5 before 2022... [08:41:02.0000] <zcorpan> Philip`: perhaps it's just a frequently asked question? [08:41:03.0000] <zcorpan> :) [08:41:04.0000] <met_> Philip`: couldn't it be necceserry for IE team? [08:42:00.0000] <met_> or will they adapt some HTML5 specification which is still "iin process"? [08:42:01.0000] <zcorpan> probably [08:43:00.0000] <met_> zcorpan: there wasn't any answer from IE-team, was it? [08:43:01.0000] <met_> /me cannot find any [08:44:00.0000] <Philip`> Maybe the answer to that question should say that it's expected to be a Candidate Recommendation (or whatever status actually means something useful) in n years where n < 15, rather than scaring people away with talking about time periods that are as long as the whole life of the web [08:45:00.0000] <met_> Philip`: +1 [08:47:00.0000] <annevk> It should definitely say that HTML 5 is already being implemented / used / etc. 
otherwise it will just hopelessly confuse people [08:47:01.0000] <annevk> It is in fact one of the more common complaints [08:48:00.0000] <met_> e.g. many people thing that CSS2 was finished in 1998 (they do not count CSS2.1 as finishing CSS2), so if to tell HTML5 (not HTML5.1 will be winished in 15 or 18 years), people are scared [08:52:00.0000] <zcorpan> does it matter if people are scared at this point? [08:52:01.0000] <met_> yes 8-))) [08:52:02.0000] <zcorpan> i mean, even though i have advocated html5, it's not really ready to be used on a wide scale yet [08:53:00.0000] <met_> zcorpan: Does it matter if people believe more in HTML5, XHTML2 or even Microtoft Silverlight? I think yes, it does. [08:54:00.0000] <zcorpan> dunno [08:54:01.0000] <Philip`> Maybe it matters if people writing web applications see that HTML5 is decades away and so it will be totally irrelevant by the time it's finished, and so they choose a different platform to develop for [08:54:02.0000] <met_> even when I said HTML5 will be here in aproximatelly 3 years (which is what HTML WG has in charter and probably won't be fullfiled), people are scared - SO LOOOONG? [08:55:00.0000] <met_> And about 15 years, they said exactly what said Philip` - the whole web is so old, will be there same web after 15 years? [08:56:00.0000] <met_> or did you (anybody) 15 years ago expect web to same the as the web we now? the answer is no. [08:58:00.0000] <met_> maybe in the FAQ should be somethin that HTML5 will be implemented after (dunno) 4+5 years, and will be closed (?) 
after 18 years, the word "finished" people often understund as "will be start using" [08:58:01.0000] <annevk> zcorpan, yes, people should be enthusiastic about it [08:59:00.0000] <annevk> zcorpan, popular technologies win, not those that are good and unknown (based on historical precedents) [08:59:01.0000] <zcorpan> ok [09:02:00.0000] <Philip`> HTML5 is dragging along all of HTML's history with it, so it can't actually be good technology, but at least it has a chance of stealing HTML4/XHTML1's popularity and succeeding that way :-) [09:03:00.0000] <met_> zcorpan: and becouse HTML5 is (according to Hixie) Open process, people speak about it, and do not excuse any detail, beacause they are about any stupid and in open process thay can (different fro HTML4 or CSS) [09:03:01.0000] <met_> *becouse thay care about any stupid detail [09:04:00.0000] <annevk> Philip`, yeah, not sure if I was referring to HTML 5 as good... [09:04:01.0000] <annevk> obviously, technologies that are not good and unknown also lose [09:19:00.0000] <annevk> The FAQ also still talks about "Web Applications 1.0" [09:20:00.0000] <annevk> I would also suggest to remove Web Controls from the FAQ or maybe give it its own question and explain it's dead until XBL is adopted [09:21:00.0000] <annevk> "conformance checker" should point to http://validator.whatwg.org/ [09:45:00.0000] <Lachy> I think that FAQ entry does a reasonably good job of explaining that authors don't have to wait for the spec to be a REC before they can start using it. 
It even gives a couple of examples of features that will be implemented and usable relatively soon [09:46:00.0000] <Lachy> but if someone wants to revise it and email me their suggestions, I can update it [15:45:00.0000] <Philip`> It seems slightly odd to say that messages on #whatwg are "HTML5 Working Groups IRC discussion" [15:47:00.0000] <gavin_> I noticed that too [15:47:01.0000] <gavins> also [ While these may be seen as simply innocuous, "private" comments, the fact that they are publicly recorded and associated to the HTML-WG should be of concern. ] [15:50:00.0000] <zcorpan> gotta love when people dig up irc discussions [15:54:00.0000] <gavin> the discussion about smellovision was clearly tongue-in-cheek, I don't understand why people are taking such offense to it [15:55:00.0000] <gavin> it seems to me like it's being blown way out of proportion [15:57:00.0000] <Philip`> It was also more like half a dozen comments than an actual discussion [15:59:00.0000] <Philip`> krijnh: By the way, http://krijnhoetmer.nl/irc-logs/ seems to be getting significantly slower than it used to be - is it reading through the entire chat and search history every time it generates that page or something? (and would it be feasible to make it speedier somehow?) [16:00:00.0000] <Lachy> woah! http://lists.w3.org/Archives/Public/w3c-wai-ig/2007JulSep/0010.html [16:02:00.0000] <zcorpan> yeah, JERK! [16:02:01.0000] <Philip`> Isn't the whole point with disabilities that humans are not equal? 
(but should be given equal opportunities regardless of that (where 'should' is obviously limited by the cost of doing that)) [16:05:00.0000] <Lachy> I'm just going to ignore them, it's not worth responding to such insults [16:06:00.0000] <Lachy> oh, he wrote a formal complaint to public-html [16:07:00.0000] <Philip`> Most of the comments are on http://lists.w3.org/Archives/Public/w3c-wai-ig/2007JulSep/ [16:32:00.0000] <zcorpan> /me -> bed 2007-07-27 [03:07:00.0000] <krijnh> Philip`: Yeah, I'll fix it tomorrow, gonna split it up in /irc-logs/whatwg/2007, /irc-logs/whatwg/200706, et cetera [03:07:01.0000] <krijnh> On /irc-logs/ I'll only show the current month of the channels [03:38:00.0000] <annevk> Hixie, why are the WHATWG archives private now? That should probably be undone [03:38:01.0000] <annevk> s/probably// [04:36:00.0000] <met_> http://the.taoofmac.com/space/blog/2007/07/26/2341 [04:38:00.0000] <annevk> yet at least one graph library has been created with <canvas> [04:40:00.0000] <met_> annevk: yes, ii remember [04:41:00.0000] <zcorpan> why can't i find and reply to my [whatwg] Color attributes message to whatwg? :| [04:41:01.0000] <annevk> you have deleted it? [04:42:00.0000] <met_> tihs http://solutoire.com/plotr/ [04:43:00.0000] <annevk> I believe text drawing will be considered for some future version and maybe line styles too. Given the current number of interop issues however I would hope those are being fixed first [04:45:00.0000] <zcorpan> if someone here has it in his inbox, please reply to it saying that "transparent" is to be treated as a keyword, meaning transparent for backgrounds and borders, and black for text colors [04:46:00.0000] <annevk> I can resent it to you [04:46:01.0000] <annevk> done [04:52:00.0000] <zcorpan> thanks [05:02:00.0000] <annevk> in theory transparent would not be black btw... [05:02:01.0000] <annevk> seems WHATWG mailing list archives are completely changed... 
[05:02:02.0000] <annevk> [Whatwg] versus [whatwg] [05:06:00.0000] <zcorpan> http://software.hixie.ch/utilities/js/live-dom-viewer/?%3Cstyle%3Ebody%20%7B%20color%3Ared%3B%20%7D%20font%20%7B%20color%3Atransparent%3B%20%7D%3C/style%3E%3Cfont%20color%3Dred%3Ex%3C/font%3E [05:08:00.0000] <annevk> well, CSS3 color:transparent [05:08:01.0000] <annevk> in CSS 2.1 it's not a valid keyword for color iirc [07:16:00.0000] <zcorpan> http://weblogs.mozillazine.org/roc/archives/2007/07/brrrrrr.html 5th paragraph [07:34:00.0000] <annevk> https://bugzilla.mozilla.org/show_bug.cgi?id=371432 [07:34:01.0000] <annevk> I'm not sure I like all the design choices though [07:53:00.0000] <zcorpan> i wonder about <meta http-equiv> and how it's to be processed per html5 [07:53:01.0000] <zcorpan> afaik browsers support things like Refresh as real http headers too [07:53:02.0000] <zcorpan> and other headers work in <meta http-equiv> as if they were real headers [08:41:00.0000] <zcorpan> shouldn't UAs be allowed to opt to not load the resource of an <object type data> if type is something the UA knows is unsupported? [08:43:00.0000] <annevk> I would prefer if it was required one way or the other [12:26:00.0000] <annevk> /me -> home 2007-07-28 [02:26:00.0000] <krijnh> Philip`: Speeded up /irc-logs/ [02:27:00.0000] <krijnh> And on /irc-logs/ now only the current month is listed [02:28:00.0000] <krijnh> (The referrer database was the biggest problem btw) [02:29:00.0000] <krijnh> (Currently 272452 records) [02:32:00.0000] <krijnh> hsivonen: Error: Bad value for attribute required. <-- Isn't <input required> okay? [05:01:00.0000] <krijnh> Ping [05:02:00.0000] <annevk> pong [05:02:01.0000] <krijnh> I dos'ed myself [05:02:02.0000] <krijnh> :] [05:02:03.0000] <krijnh> But apparantly this survived [05:08:00.0000] <Philip`> krijnh: Looks good :-) [05:09:00.0000] <Philip`> ...though is your server clock three minutes slow, or is my clock three minutes fast? 
[05:09:01.0000] <krijnh> No idea [05:09:02.0000] <krijnh> I think mine is slow [05:12:00.0000] <krijnh> Fixed [05:12:01.0000] <krijnh> :) [05:13:00.0000] <Philip`> Okay - thanks :-) [05:14:00.0000] <krijnh> Np [05:16:00.0000] <krijnh> I wonder how http://www.google.nl/search?q=irc+logs is possible [05:16:01.0000] <krijnh> Or http://www.google.com/search?q=irc+logs even [05:17:00.0000] <annevk> links from w3.org? [05:17:01.0000] <annevk> krijnh, btw, I think it makes more sense to show the last 20 days than the current month [05:18:00.0000] <annevk> the current month isn't all that useful, say, next Wednesday [05:18:01.0000] <krijnh> Yeah, would be irritating on the first of August :) [05:19:00.0000] <krijnh> I'm a bad coder, so I'll take the easy route [05:20:00.0000] <krijnh> If we're more than 15 days in August, July will be hidden [05:20:01.0000] <annevk> ugh [05:21:00.0000] <annevk> you don't have times stored in some db or something? [05:21:01.0000] <annevk> oh well [05:21:02.0000] <krijnh> Nope [05:21:03.0000] <krijnh> :) [05:22:00.0000] <krijnh> These are all static files [05:23:00.0000] <annevk> just provide pointers to the last twenty static files :) [05:24:00.0000] <annevk> and provide archive links for each month or each but the current month (for which you got files for, anyway) [05:27:00.0000] <krijnh> There :) [05:28:00.0000] <annevk> hmm, you mean that you managed to mess it up? :D [05:28:01.0000] <Philip`> The front page now appears to showing a quite random selection of dates :-p [05:28:02.0000] <Philip`> +be [05:28:03.0000] <krijnh> annevk: Indeed :P [05:28:04.0000] <krijnh> Note to self; hacking in live files is fun [05:28:05.0000] <annevk> I think it's actually the last twenty files [05:29:00.0000] <annevk> euh, the first [05:29:01.0000] <krijnh> Nah, it's 20 files [05:29:02.0000] <krijnh> And then ordered ;_ [05:30:00.0000] <annevk> not the first twenty files you created, sure? 
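A minimal sketch of the "last twenty static files" idea discussed above, and of the slip that Philip` and annevk are guessing at (taking twenty files and only then ordering them). The function names and the simulated directory listing are hypothetical; the only thing taken from the log is that the files are static and named by date, as in /irc-logs/whatwg/20070729.

```python
import random
from datetime import date, timedelta


def make_log_names(start, days):
    """Build hypothetical static log names, one per day, formatted as YYYYMMDD."""
    return [(start + timedelta(days=i)).strftime("%Y%m%d") for i in range(days)]


def twenty_then_sorted(listing):
    # Suspected bug: keep whichever 20 entries come first in the listing,
    # and only afterwards put those 20 in order.
    return sorted(listing[:20])


def last_twenty(listing):
    # Intended behaviour: order everything first, then keep the newest 20.
    # YYYYMMDD names sort chronologically as plain strings.
    return sorted(listing)[-20:]


if __name__ == "__main__":
    listing = make_log_names(date(2007, 6, 1), 58)  # 1 June .. 28 July 2007
    random.Random(0).shuffle(listing)  # directory order is not guaranteed
    print("buggy :", twenty_then_sorted(listing))
    print("wanted:", last_twenty(listing))
```

With an unordered listing the first function returns a sorted but arbitrary selection of dates, which matches the "quite random selection" symptom; whether that was the actual cause on krijnh's server is not stated in the log.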
[05:30:01.0000] <krijnh> Yeah [05:30:02.0000] <annevk> well, I'm sure sure you're wrong :p [05:30:03.0000] <krijnh> Me too [05:30:04.0000] <krijnh> :) [05:31:00.0000] <Philip`> Does your refe(r)rer log have any search engines other than Google in it? :-) [05:31:01.0000] <krijnh> Philip`: MSN search sometimes [05:32:00.0000] <annevk> my site mostly gets traffic from google and w3.org/html/wg :) [05:32:01.0000] <krijnh> My sites get mostly spidered by Yahoo Slurp [05:32:02.0000] <krijnh> And I never get referrers from them [05:32:03.0000] <annevk> that's prolly the most aggressive spider, yeah [05:32:04.0000] <annevk> /me too [05:34:00.0000] <krijnh> There [05:45:00.0000] <krijnh> And even more there [05:45:01.0000] <krijnh> Does anyone dislike the Referrers list btw? [09:34:00.0000] <krijnh> Ping 2007-07-29 [01:36:00.0000] <Lachy> OMG! It really doesn't help when John responds to a difference of opinion with hostility, and then wonders where the attitude problem comes from :-( [05:23:00.0000] <zcorpan> wow... [05:23:01.0000] <zcorpan> /me is skipping the print thread [05:25:00.0000] <annevk> /me did that too [05:26:00.0000] <annevk> seemed like a waste of time [05:37:00.0000] <krijnh> annevk: what have you done wrong on GoT? [05:38:00.0000] <annevk> nothing lately [05:38:01.0000] <krijnh> And before? [05:39:00.0000] <annevk> before that nothing much either as far as I can tell [05:39:01.0000] <annevk> pointer? [05:40:00.0000] <krijnh> http://gathering.tweakers.net/forum/list_message/28361582#28361582 [05:40:01.0000] <annevk> I think he means that I don't come by there anymore these days [05:40:02.0000] <krijnh> Ah, that could be it too, yeah [05:41:00.0000] <krijnh> I was wondering :) [05:41:01.0000] <krijnh> Is he confusing you with Faruk or something? ;p [07:59:00.0000] <annevk> http://quuz.org/xml5/play [08:06:00.0000] <Philip`> Did something just change with DOCTYPE handling, or am I imagining things? [08:07:00.0000] <annevk> Where?
[08:08:00.0000] <Philip`> http://quuz.org/xml5/play [08:08:01.0000] <annevk> Changed as opposed to what, in that case? [08:09:00.0000] <Philip`> It appeared to be giving an empty document if I had a <!DOCTYPE > anywhere, but now it doesn't do that, but I can't remember if I was just doing something wrong at first :-) [08:09:01.0000] <annevk> a start tag is required, fwiw [08:09:02.0000] <krijnh> annevk: text-align:justifyx ? [08:09:03.0000] <annevk> or empty tag [08:10:00.0000] <annevk> krijnh, ah, I'll drop that line [08:10:01.0000] <krijnh> Perhaps an Opera extension, to fix justify :) [08:11:00.0000] <Philip`> Is it intentional that <!DOCTYPE HTML PUBLIC "foo"'> isn't parsed the same as in HTML5? [08:11:01.0000] <Philip`> (I have no idea how XML would parse that) [08:11:02.0000] <hasather> Philip`: you need a system id [08:12:00.0000] <annevk> doctypes are currently not ending up in the tree [08:12:01.0000] <annevk> you should be able to use the internal subset to add entities and default attribute values though [08:15:00.0000] <annevk> maybe when I have some more time I'll do it through oninput and XMLHttpRequest which should safe everyone from pressing that silly button [08:16:00.0000] <annevk> and maybe also provide some other output serializations [08:24:00.0000] <Philip`> Oh, in XML it's just a parse error when it hits the ' instead of finding a space character [08:24:01.0000] <Philip`> (whereas HTML5 ignores the ' and ends the doctype on the >, and XML5 thinks it's the start of a single-quoted string and carries on reading until it finds a closing single-quote somewhere later) [08:25:00.0000] <annevk> I've yet to enable parse errors; error handling could be changed I suppose to make it more HTML5 like [08:27:00.0000] <Philip`> You could save everyone from pressing that button by just implementing it in client-side JavaScript instead of Python or whatever it is :-) [08:35:00.0000] <annevk> maybe I should reimplement it in ocaml [08:40:00.0000] <Philip`> 
That'd just be crazy :-p [08:45:00.0000] <Philip`> http://quuz.org/xml5/play?source=%3Chtml%3E%26lt%3B%26lt%3B%3C%2Fhtml%3E is weird [08:47:00.0000] <annevk> there are issues with entity handling, yeah, :( [08:47:01.0000] <annevk> I should fix that [08:50:00.0000] <Philip`> http://quuz.org/xml5/play?source=%3Chtml%3E%26amp%3C%2Fhtml%3E - that also looks like issues :-) [08:51:00.0000] <annevk> it shouldn't have emitted the final ; [08:52:00.0000] <zcorpan> "# [16:32] <Philip`> (whereas HTML5 ignores the ' and ends the doctype on the >, and XML5 thinks it's the start of a single-quoted string and carries on reading until it finds a closing single-quote somewhere later) " -- that's not how i read the html5 spec [08:53:00.0000] <zcorpan> <!DOCTYPE HTML PUBLIC "foo"'> is, afaict, parsed as if it was <!DOCTYPE HTML PUBLIC "foo" '>'> [08:55:00.0000] <annevk> yeah [08:58:00.0000] <Philip`> http://quuz.org/xml5/play?source=%7C+%3Ca%3E+%7C - "Service Temporarily Unavailable" [08:59:00.0000] <Philip`> zcorpan: Argh, I'm not sure how I missed that when actually testing it - maybe I didn't bother looking at the test output... [08:59:01.0000] <Philip`> But at least IE does ignore the ' if you do that [09:00:00.0000] <Philip`> (which is possibly what I was thinking of) [09:00:01.0000] <zcorpan> ie is possibly reparsing [09:01:00.0000] <Philip`> http://software.hixie.ch/utilities/js/live-dom-viewer/?a%3C%21DOCTYPE%20%22%22%22%3Eb%22%3Ec%0D%0Ad%3C%21DOCTYPE%20%22%22%20%22%3Ee%22%3Ef [09:02:00.0000] <annevk> Philip`, weird... [09:03:00.0000] <annevk> works fine if I try it directly on the server... [09:03:01.0000] <zcorpan> Philip`: oh. wow. 
[09:04:00.0000] <Philip`> IE never really parses the doctype - it just tries to jump through the quotes and work out where the end of the doctype is, then does string matches through the entire doctype content to decide if it should be quirks mode [09:05:00.0000] <Philip`> (See quirkiness of http://software.hixie.ch/utilities/js/live-dom-viewer/?%3C%21DOCTYPE%20fooDOCTYPE%20NETSCfoo%3E vs http://software.hixie.ch/utilities/js/live-dom-viewer/?%3C%21DOCTYPE%20fooDOCTYPE%20NETSfoo%3E ) [09:12:00.0000] <annevk> Philip`, thanks for the bug reports btw, I'll look into them somewhere next week [09:14:00.0000] <Philip`> (Also http://software.hixie.ch/utilities/js/live-dom-viewer/?%3C%21DOCTYPE%20DTD%20HTML%204%20http%3A//%3E works peculiarly in IE, when you remove a / or change the 4 into a 3 or 5 first) [09:25:00.0000] <zcorpan> can someone read hungarian? http://weblabor.hu/forumok/temak/18561 [09:28:00.0000] <Philip`> http://www.tranexp.com:2000/InterTran?type=url&url=http%3A%2F%2Fweblabor.hu%2Fforumok%2Ftemak%2F18561&text=&from=hun&to=eng can't read it especially well [09:32:00.0000] <zcorpan_> http://quuz.org/xml5/play?source=%3C%21DOCTYPE+a%5B%3C%21ENTITY+a+%22%3Ca%2F%3E%22%3E%5D%3E%26a%3B [09:32:01.0000] <zcorpan_> that was not allowed in xml 1.0 [09:34:00.0000] <Philip`> http://quuz.org/xml5/play?source=%EF%BF%BD - internal server error [09:37:00.0000] <zcorpan_> Philip`: no wonder, it decodes as windows-1252 instead of utf-8... [09:37:01.0000] <zcorpan_> gotta love tools that assume that there is a mapping between language and encoding [09:57:00.0000] <zcorpan_> annevk: the xml declaration is not part of the infoset in xml 1.0 i think [10:18:00.0000] <zcorpan_> annevk: <:> is interesting. 
per xml 1.0 without namespaces it is just an element with the name ":" [10:22:00.0000] <zcorpan_> annevk: http://quuz.org/xml5/play?source=%3Ca%3E%3C%2Fb+x%3D%22%3E%22%3E is different from html5 [10:25:00.0000] <Philip`> http://quuz.org/xml5/play?source=%3Ca%3E%3Cb%3E%0D%0A%3Ca%2F%3E%3Cb%3E%0D%0A%3Ca%2F%2F%3E%3Cb%3E%0D%0A%3Ca%2F%2F%2F%3E%3Cb%3E%0D%0A%3Ca%2F%2F%2F%2F%3E%3Cb%3E seems odd [10:27:00.0000] <zcorpan_> <??> [13:13:00.0000] <gavin> hrm, are the list archives going to be fixed some time soon? [13:16:00.0000] <hsivonen> gavin: I wouldn't expect fixes to the whatwg infrastructure while Hixie is on vacation [13:17:00.0000] <gavin> ok [13:17:01.0000] <gavin> when is he back, again? [13:19:00.0000] <hsivonen> gavin: IIRC, he went away for three weeks, but I've lost track of the starting point of the three weeks [13:20:00.0000] <gavin> ok 2007-07-30 [01:14:00.0000] <hsivonen> I turned comments off on the feed autodiscovery entry on the blog because it attracts spam [03:25:00.0000] <annevk> zcorpan_, I decided not to parse end tags in the same crazy way [03:25:01.0000] <annevk> zcorpan_, but more like how XML handles them [03:26:00.0000] <annevk> zcorpan_, I suppose XML5 will be backwards incompatible with XML 1.0, although in theory we could handle <:> maybe... [03:29:00.0000] <annevk> Philip`, fixed one of your bugs with entity handling (&lt;) [03:34:00.0000] <zcorpan_> annevk: how can it be more like XML when it is drocanian? [03:35:00.0000] <annevk> it's not draconian... [03:36:00.0000] <zcorpan_> 1.0 [03:36:01.0000] <annevk> oh, more like what would be natural handling for end tags given the production for end tags in XML [03:36:02.0000] <zcorpan_> ah. ok [03:38:00.0000] <annevk> zcorpan_, you mentioned something about encoding problems yesterday? 
[03:39:00.0000] <zcorpan_> annevk: that was re http://krijnhoetmer.nl/irc-logs/whatwg/20070729#l-145 [03:39:01.0000] <annevk> oh, too bad [03:40:00.0000] <annevk> I was hoping that would explain the mysterious failure messages on xml5/play [04:01:00.0000] <zcorpan_> why isn't <figure><object data=...>foo</object><legend>...</legend></figure> conforming? [04:07:00.0000] <zcorpan_> or actually... why are block-level children of object even allowed when child of figure [04:09:00.0000] <zcorpan_> aha. a <figure> doesn't always represent a paragraph [05:13:00.0000] <annevk> /me can't figure out why "| |" makes the server act strangely [06:26:00.0000] <zcorpan_> http://www.search-this.com/2007/07/30/html5-tables/ [07:25:00.0000] <annevk> hmm, you can never render tables in one pass [07:27:00.0000] <annevk> also, didn't <col> imply <colgroup> in HTML4 or is that just browsers? [07:28:00.0000] <Philip`> "The COLGROUP element ... Start tag: required, End tag: optional" - sounds like it's not meant to be implied [07:31:00.0000] <zcorpan_> just internet explorer, actually, iirc [07:31:01.0000] <zcorpan_> but that's not relevant, because <tbody> is also implied and it is not required [07:33:00.0000] <annevk> prolly makes sense to make it optional, yes [13:09:00.0000] <zcorpan_> there is a Mobile Web Initiative Test Suites Working Group? [13:10:00.0000] <zcorpan_> isn't developing test suites pretty central for a WG? [13:17:00.0000] <zcorpan_> their tests use <![CDATA[... wonder what mobile browsers that don't use xml parsers do with that [14:14:00.0000] <zcorpan_> how does one check the presence of a getter in JS? [14:15:00.0000] <zcorpan_> i.e. an Element.prototype.__defineGetter__() [14:23:00.0000] <Philip`> __lookupGetter__? [14:27:00.0000] <zcorpan_> ah! cheers [14:31:00.0000] <jgraham> Philip`: Do you have all your <canvas> tests somewhere? 
[14:31:01.0000] <jgraham> /me wants to point out "tests available" as a feature of <canvas> [14:32:00.0000] <Philip`> http://canvex.lazyilluminati.com/tests/tests/ is the most recent online version [14:33:00.0000] <Philip`> (I have a few more I haven't got around to uploading yet) [14:34:00.0000] <Philip`> (Also, that particular page hits two CSS bugs in WebKit which is a bit irritating and I suppose I ought to work around that at some point) [14:35:00.0000] <jgraham> Thanks [14:39:00.0000] <zcorpan_> http://simon.html5.org/sandbox/js/elementtraversal.js [14:42:00.0000] <Philip`> Is it just me, or is <canvas> much more fun to use than SVG? [14:42:01.0000] <Philip`> I don't really know much about SVG since I've not used it much, but I've not used it much because it never looked like any fun :-) [14:43:00.0000] <zcorpan_> dunno, i have played very little with svg and practically nothing with canvas [14:43:01.0000] <jgraham> I know basically nothing about either SVG or <canvas> [14:44:00.0000] <jgraham> But it's pretty clear that <canvas> should be in the HTML 5 spec because it's not going away [14:44:01.0000] <jgraham> and it's a simpler solution to a narrower problem than SVG [14:45:00.0000] <zcorpan_> who can find bugs in my ElementTraversal implementation? :) [14:47:00.0000] <Philip`> The if in "if (e) { while (e) { ... } }" is redundant [14:47:01.0000] <zcorpan_> ha, indeed [14:48:00.0000] <zcorpan_> fixed [14:56:00.0000] <Philip`> zcorpan_: Is that script meant to work if the UA already implements firstElementChild/etc? [14:57:00.0000] <zcorpan_> Philip`: the intent is that if the UA already has implemented it then the script is ignored [15:04:00.0000] <Philip`> Ah, I didn't know FF returned stuff from __lookupGetter__ on native properties [15:04:01.0000] <Philip`> (Does anything else implement __lookupGetter__ at all?) 
[15:04:02.0000] <Philip`> "For the purpose of ElementTraversal, an entity reference node which represents an element must be treated as an element node." - your code doesn't do that [15:05:00.0000] <zcorpan_> indeed [15:05:01.0000] <Philip`> (as far as I can tell) [15:06:00.0000] <zcorpan_> mozilla doesn't put entity reference nodes in the dom though [15:06:01.0000] <zcorpan_> perhaps you can insert them manually [15:12:00.0000] <Philip`> Setting things like e.childElementCount=1 isn't handled the same as other settings of readonly properties [15:12:01.0000] <Philip`> /me doesn't see any more serious issues [15:13:00.0000] <Philip`> (That is, any issues which are more serious; not any more issues which are serious) [15:15:00.0000] <othermaciej> I know a lot about both <canvas> and SVG [15:15:01.0000] <othermaciej> and I think they are both kinda neat [15:15:02.0000] <othermaciej> there is a big overlapping range of use cases, as well as some problems where one or the other is much better suited [15:29:00.0000] <jgraham> othermaciej++ (re: public-html) [15:54:00.0000] <Philip`> Ooh, I never knew Safari did <input type=slider> [15:54:01.0000] <Philip`> Uh [15:54:02.0000] <Philip`> Ooh, I never knew Safari did <input type=range> [15:55:00.0000] <Philip`> but it appears to not handle non-integer steps, which is annoying [16:22:00.0000] <Philip`> Why is &#13; a parse error? It sounds like a fairly legitimate thing to do [16:32:00.0000] <zcorpan_> Philip`: good question [16:33:00.0000] <zcorpan_> wonder how &#10;&#13; is treated in browsers [16:34:00.0000] <zcorpan_> (or is it the other way around?) 
[16:44:00.0000] <zcorpan_> http://software.hixie.ch/utilities/js/live-dom-viewer/?%3C%21DOCTYPE%20html%3E%0D%0A%3Cpre%3Ea%26%2313%3B%26%2310%3Bb [16:49:00.0000] <zcorpan_> http://software.hixie.ch/utilities/js/live-dom-viewer/?%3C%21DOCTYPE%20html%3E%0D%0A%3Cpre%3Ea%26%2313%3B%26%2310%3Bb%3Cscript%3E%0D%0A%20var%20node%20%3D%20document.getElementsByTagName%28%22pre%22%29%5B0%5D.firstChild%3B%0D%0A%20node.data%20%3D%20node.data.replace%28/%5Cr/g%2C%20%22%5C%5Cr%22%29.replace%28/%5Cn/g%2C%20%22%5C%5Cn%22%29%3B%0D%0A%3C/script%3E [16:51:00.0000] <zcorpan_> ie: &#13;&#10; becomes \r. opera: it becomes \r\n but only one linebreak is rendered. safari, firefox: it becomes \n\n (as per html5) [16:51:01.0000] <zcorpan_> (unless i'm misreading the spec) [16:54:00.0000] <PlayPause> !seen nickshanks 2007-07-31 [17:32:00.0000] <Philip`> Hmm, IE seems to support some named entities that Mozilla (and, I think, HTML5) don't [17:32:01.0000] <Philip`> in particular, 8203: zwsp; 8234: lre; 8235: rle; 8236: pdf; 8237: lro; 8238: rlo; 8298: iss; 8299: ass; 8300: iafs; 8301: aafs; 8302: nads; 8303: nods [17:32:02.0000] <Philip`> /me wonders if anyone cares about those [17:39:00.0000] <zcorpan_> Philip`: how did you find those? [17:40:00.0000] <Philip`> http://www.bbc.co.uk/dna/h2g2/A264548 has a list that includes them [17:41:00.0000] <zcorpan_> ok [17:42:00.0000] <zcorpan_> ie seems to be alone in supporting those [17:43:00.0000] <zcorpan_> but i'm not against adding them to html5 [17:45:00.0000] <Philip`> They don't appear to be the most highly documented entity names [17:45:01.0000] <Philip`> (like, that one list is the only place on the whole web) [17:46:00.0000] <zcorpan_> do they appear in your html research sample? 
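A small sketch for checking the entity names Philip` lists above against a known table. It uses Python's `html.entities.name2codepoint`, which holds the HTML 4.01 entity set; it says nothing about what IE, Mozilla or the HTML5 draft of the time accepted. The dictionary of names and code points is copied from the chat; the helper name is made up for illustration.

```python
from html.entities import name2codepoint

# Names and decimal code points exactly as quoted in the chat for IE.
IE_CANDIDATES = {
    "zwsp": 8203, "lre": 8234, "rle": 8235, "pdf": 8236,
    "lro": 8237, "rlo": 8238, "iss": 8298, "ass": 8299,
    "iafs": 8300, "aafs": 8301, "nads": 8302, "nods": 8303,
}


def missing_from_html4(candidates):
    """Return the candidate names that the HTML 4.01 entity table does not define."""
    return sorted(name for name in candidates if name not in name2codepoint)


if __name__ == "__main__":
    for name in missing_from_html4(IE_CANDIDATES):
        code = IE_CANDIDATES[name]
        print("&%s; (U+%04X) is not an HTML 4.01 entity" % (name, code))
```

The neighbouring code points that HTML 4.01 does name (for example `zwnj` at 8204) are present in the same table, so the lookup is a reasonable way to separate the two groups.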
[17:46:01.0000] <zcorpan_> (probably not) [17:52:00.0000] <Philip`> I didn't collect any data about unrecognised entities when I was looking at the ~8K pages [17:53:00.0000] <Philip`> (The tokeniser did actually report unrecognised entities but I just discarded all that data instead of saving it anyhwere) [17:53:01.0000] <Philip`> s/hw/wh/ [17:53:02.0000] <Philip`> In the other ~2.5K pages that I still have a copy of, none of those entities turn up at all [17:56:00.0000] <zcorpan_> yeah, we'd probably need a billion documents research to find the relevance of these entities [17:59:00.0000] <Philip`> http://www.tools.ietf.org/html/draft-duerst-iri-bidi-00 talks about lre, lro, rle, rlo, pdf [18:01:00.0000] <zcorpan_> Philip`: will you post this to the list? [18:01:01.0000] <Philip`> but that looks like it's just a coincidental use of the same names [18:01:02.0000] <Philip`> I will do [18:01:03.0000] <Philip`> (But which list? :-) ) [18:02:00.0000] <zcorpan_> (doesn't matter) [18:02:01.0000] <zcorpan_> i'll cover &#13; btw [18:11:00.0000] <Philip`> Opera 9.2 is missing &REG; [18:48:00.0000] <Philip`> http://software.hixie.ch/utilities/js/live-dom-viewer/?%3Cp%3E%3C/p%3E%3Cp%3E%26ampxyz123%3C/p%3E%0D%0A%3Cscript%3Edocument.getElementsByTagName%28%27p%27%29%5B0%5D.innerHTML%3D%27%26ampxyz123%27%3C/script%3E%0D%0A [18:48:01.0000] <Philip`> in IE [18:50:00.0000] <Philip`> When you set innerHTML, it appears to discard all alphanumeric characters after a "&amp" (or whatever) if there are no non-alphanumeric characters before the end of the string [18:58:00.0000] <Philip`> http://canvex.lazyilluminati.com/misc/entities.html [18:58:01.0000] <Philip`> That appears to reliably crash Firefox trunk [19:03:00.0000] <zcorpan_> /me points at http://simon.html5.org/test/html/parsing/entities/trailing-semicolon/real/ [19:05:00.0000] <Philip`> Aha [19:05:01.0000] <Philip`> But mine makes IE fail more ;-) [19:06:00.0000] <Philip`> (and works in html5lib/etc, though unfortunately all 
the implementations get everything right which is no fun) [19:08:00.0000] <zcorpan_> :) [19:18:00.0000] <Philip`> http://yy28.60.kg/test/read.cgi/maido3/1096370177/l50 - there's someone using &zwsp; [19:18:01.0000] <Philip`> http://www.tasb.com/services/field/staff/index.aspx?print=true - there too [19:18:02.0000] <Philip`> It's lucky that Google doesn't understand these entities, so it lets you search for them by name [19:19:00.0000] <zcorpan_> there you go [19:21:00.0000] <Philip`> Most of the references seem to be in Chinese ASCII-art comment spam (or is that not spam?), largely on 2ch.net [19:25:00.0000] <zcorpan_> it's Shift_JIS-art (*゚ー゚) [19:41:00.0000] <deltab> no zwsp in http://www.w3.org/TR/html4/sgml/entities.html [19:41:01.0000] <deltab> ... but you probably already knew that [19:42:00.0000] <zcorpan_> yeah... all entities in html4 are already in html5 [19:42:01.0000] <zcorpan_> (i hope!) [19:43:00.0000] <Philip`> At least all the ones in Mozilla and Opera are [19:43:01.0000] <Philip`> Only IE is 'special' :-) [03:24:00.0000] <annevk> CRLF was already raised once on whatwg⊙wo [03:24:01.0000] <annevk> g [03:28:00.0000] <hsivonen> implementing a check for NCName makes me wonder if the original XML WG stopped to write down the code for checking it before making it part of the spec... [03:29:00.0000] <annevk> In XML5 it's easy :) [03:44:00.0000] <hsivonen> speaking of NCNames and stuff: are there known test cases that test the detection of various XML spec lawyering NCName subtleties? [03:47:00.0000] <hsivonen> I wonder what back end software http://www.weblogs.com/feedvalidator.html runs [03:48:00.0000] <annevk> /me hopes his comparison between Harry Potter and specs won't be seen as offensive [03:48:01.0000] <hsivonen> annevk: on list or on blog? 
[03:48:02.0000] <annevk> list [03:49:00.0000] <hsivonen> annevk: my guess is it will be seen as offensive [03:50:00.0000] <annevk> bah [03:53:00.0000] <hendry> damn, i had the badware pcpro popup whilst using my bank's website... [03:53:01.0000] <hsivonen> I wrote a regexp for NCName suitable for sticking into a Java String literal and it is 7875 characters long! [03:55:00.0000] <annevk> ouch [03:56:00.0000] <annevk> write your own tokenizer? :) [03:57:00.0000] <hsivonen> annevk: this is part of the feature set for parsing HTML into XML 1.0 infosets [03:58:00.0000] <hsivonen> annevk: no I need to be able to detect non-NCNames either to treat them as fatal or to drop them [03:58:01.0000] <hsivonen> anyway, the regexp is so long that Eclipse refuses to scroll enough to the right to show it all [04:00:00.0000] <hsivonen> using ICU4J for this would be cleaner and might even perform better, but I want to avoid the dependency [04:03:00.0000] <annevk> omg, this quote/unquot has to stop [04:12:00.0000] <hsivonen> at least HTML 5 is much more author-friendly reading than any ISO spec [04:13:00.0000] <hsivonen> (I wonder how much work it takes to obfuscate something like ODF or OOXML to comply with ISO drafting rules.) [04:13:01.0000] <hsivonen> (or might ISO make an exception on those?) [04:13:02.0000] <hsivonen> (not to say that being more author-friendly than an ISO spec is enough) [04:26:00.0000] <annevk> he took it as insult [04:28:00.0000] <hsivonen> annevk: not surprising. if it wasn't meant as one, it was a badly formulated non-insult [04:30:00.0000] <annevk> I guess [06:24:00.0000] <hsivonen> canvas at work: http://westciv.com/xray/ [06:26:00.0000] <annevk> in Opera I always get back <canvas> [06:26:01.0000] <annevk> I wonder if that's a bug [06:26:02.0000] <hsivonen> annevk: documentation says it only works in Gecko and WebKit. not working in Presto and Trident is known [06:27:00.0000] <annevk> of course, but what's the issue? 
[06:27:01.0000] <annevk> and is that an issue in Opera or Firefox? [06:28:00.0000] <hsivonen> I don't know. Based on what you said, I guess the issue is making the canvas event-neutral so that clicks go through it [06:31:00.0000] <annevk> jgraham, simplejson has no attribute "loads" causes tons of errors [06:32:00.0000] <annevk> the rest of the tests are about some silly space after <!DOCTYPE ... [07:00:00.0000] <G0k> hey all [07:04:00.0000] <annevk> hi [07:05:00.0000] <hsivonen> hi [07:14:00.0000] <G0k> so this TCP/UDP connection stuff in HTML5 [07:14:01.0000] <G0k> has anyone actually tried implementing that? [07:14:02.0000] <annevk> I don't think so [07:14:03.0000] <annevk> That section is very much unstable, afaict [07:15:00.0000] <G0k> yeah i mean [07:15:01.0000] <G0k> i guess i'm not really even clear on its...purpose [07:15:02.0000] <G0k> i mean it seems kinda misnamed [07:15:03.0000] <annevk> communicating without the overhead of HTTP [07:16:00.0000] <annevk> and I believe it was also intended to address P2P at some point [07:16:01.0000] <G0k> it's not really a TCPConnection so much as a NewProtocolWeJustInventedThatHappensToUseTCPConnection [07:16:02.0000] <hsivonen> e.g. for games, IM clients, continuous visualizations of data, etc. [07:16:03.0000] <hsivonen> G0k: it isn't a pure TCP connection for security reasons [07:17:00.0000] <G0k> yeah that's surely legitimate, but calling it a TCP connection isn't really accurate [07:17:01.0000] <G0k> i mean there's also no reason that protocol couldn't be made to run over SCTP, for instance [07:18:00.0000] <annevk> you could point it out on the list [07:19:00.0000] <G0k> yeah that would require typing in complete sentences and stuff, it's much easier to complain to you people [07:20:00.0000] <hsivonen> G0k: the connection initialization is coupled with TCP ports isn't it? 
[07:21:00.0000] <annevk> there's also http://dev.w3.org/cvsweb/~checkout~/2006/webapi/network-api/network-api.html?rev=1.3 [07:21:01.0000] <annevk> G0k, hah [07:21:02.0000] <G0k> well yeah i mean the DOM interface is TCP-specific but the protocol itself needn't be [07:23:00.0000] <met_> annevk, you remind me of Don LaFontaine in your spot http://annevankesteren.nl/2007/07/web it's like I hear his common sentence "In a strict world where..." (see http://www.youtube.com/watch?v=ZJMGS7l0wT8 ) [07:25:00.0000] <annevk> /me doesn't have flash atm [07:27:00.0000] <hsivonen> /me didn't know that "Don" is a real one guy [07:28:00.0000] <met_> hsivonen see http://www.youtube.com/watch?v=Wkhdy6bavuk [07:30:00.0000] <hsivonen> met_: just watched it [07:33:00.0000] <met_> Don's rewriting of annevk's spot: In a strict world, where pages using those new features completely break, new hero will rise. HTML5 from producers of Safari, Opera and Firefox. In December 2010 ! [07:36:00.0000] <annevk> with 10% discount if you order now [07:44:00.0000] <Lachy> hey, I'm looking for some example flash sites that provide alternative HTML versions of the site. In particular, ones that link to the HTML-only versions of the site below the flash. Anyone know of any? [07:45:00.0000] <Lachy> or even sites that use video and provide alternative content [07:45:01.0000] <annevk> http://www.google.com/search?q=%22skip+intro%22 ? [07:45:02.0000] <Philip`> http://www.jkrowling.com/ has a text-only version [07:46:00.0000] <Lachy> yeah, I already got jkrowling [07:46:01.0000] <annevk> http://www.google.com/search?q=%22text+only+version%22 [07:47:00.0000] <hsivonen> Lachy: I'd start by looking at recent movie promo sites [07:47:01.0000] <hsivonen> Lachy: with this method, starting from the Apple Trailer page, I found http://www.rushhourmovie.com/ [07:47:02.0000] <Lachy> oh, good idea [07:47:03.0000] <hsivonen> it puts the textual alternative in noscript! 
[07:48:00.0000] <Lachy> oh, and it uses black text on black background when you disable JS [07:49:00.0000] <zcorpan_> all i get is "A flash player upgrade is required to view this website, click here to continue." [07:49:01.0000] <hsivonen> http://www.noendinsightmovie.com/ also has a short writeup in the source [07:49:02.0000] <hsivonen> hmm. perhaps the alternative content is for google--not browser users [07:53:00.0000] <hsivonen> the interesting thing is that even though movie promos most likely aren't trying to take blind users into account, they still bother putting text in there. this suggests that pitching textual alternatives as a SEO method might be attractive to site makers [07:55:00.0000] <hsivonen> hmm. http://cleanishappy.com/ contains a really long text writeup aimed at search engines [07:55:01.0000] <hsivonen> but useless for users who can't browse Flash [07:58:00.0000] <Lachy> wow, that's got to be the longest meta description I've ever seen! [08:04:00.0000] <hsivonen> Lachy: http://del.icio.us/tag/flash%2Bmarketing [08:07:00.0000] <Lachy> thanks [08:37:00.0000] <annevk> /me wonders why SVGSVGElement is not named SVGSvgElement [08:48:00.0000] <annevk> <dfn title="GDO">Garage Door Opener (<abbr>GDO</abbr>)</dfn> ... [08:54:00.0000] <zcorpan_> in typography, is it common to have the whole thing in italics, or just the "Garage Door Opener" part, or perhaps just the "GDO" part? [08:54:01.0000] <annevk> dunno [10:01:00.0000] <gsnedders> only 373 unread emails on public-html having been away [10:19:00.0000] <Philip`> Is this month going to beat May? [10:19:01.0000] <Philip`> It's only 15 behind at the moment [10:23:00.0000] <zcorpan_> we need something controversial [10:24:00.0000] <annevk> that should be trivial [11:11:00.0000] <Lachy> why on earth does Firefox return true for if(document.createNodeIterator); but it returns an NS_ERROR_NOT_IMPLEMENTED whenever I try to invoke it!? [11:12:00.0000] <hasather> Lachy: when are you coming to Norway? 
[11:13:00.0000] <Lachy> don't know yet. [11:13:01.0000] <zcorpan_> /me might come to norway sometime soon also [11:13:02.0000] <hasather> zcorpan_: ahh, cool [11:13:03.0000] <zcorpan_> perhaps next week [11:14:00.0000] <Lachy> cause I want to go to Web Directions at the end of September, and if I go over in August, then I'd be coming back a few weeks later, so it was suggested that I go over in October [11:14:01.0000] <hasather> Lachy: so you're not coming in August? [11:15:00.0000] <Lachy> probably not, but I haven't heard back since I last emailed Annchen [11:15:01.0000] <hasather> ok [11:16:00.0000] <annevk> zcorpan_, you have to come next week :) [11:16:01.0000] <Lachy> is the difference between a NodeIterator and a TreeWalker that the NodeIterator only looks at child nodes, and the TreeWalker looks at all descendants? [11:16:02.0000] <zcorpan_> annevk: when are you leaving again? [11:17:00.0000] <annevk> zcorpan_, I'm going to Spain the 11th of August and will be back briefly the 17th to attend Friday beer :) The 18th I'll go to the Netherlands for some more vacation after which university starts again... [11:17:01.0000] <zcorpan_> alrighty [11:19:00.0000] <Philip`> Lachy: createNodeIterator was added in https://bugzilla.mozilla.org/show_bug.cgi?id=82625 [11:20:00.0000] <Philip`> and it looks like those functions were just added for completeness [11:21:00.0000] <Lachy> Philip`: assuming you mean createTreeWalker was added in that bug, ok. [11:21:01.0000] <Lachy> but it looks like I should be able to use a TreeWalker and nextSibling for my needs anyway [11:26:00.0000] <Philip`> I meant createNodeIterator, or at least the createNodeIterator stub that just returns NS_ERROR_NOT_IMPLEMENTED [11:49:00.0000] <Lachy> Robert Burns' new quoting technique is so much harder to read :-( [11:51:00.0000] <Lachy> in fact, I just can't read it, cause I can't keep track of where I'm up to! 
[11:52:00.0000] <Lachy> I sent him off list mail telling him about RFC 2646 and asking him to comply with the existing convention [11:53:00.0000] <annevk> I gave up on reading that e-mail [11:54:00.0000] <annevk> /me thought Hixie would be back this week [11:56:00.0000] <annevk> given http://krijnhoetmer.nl/irc-logs/whatwg/20070705#l-669 he should be, maybe he died! [12:13:00.0000] <Lachy> I hope he hasn't [12:14:00.0000] <Lachy> I wonder who would know where he is? [12:15:00.0000] <Lachy> something weird is happening with my script. I'm getting a non-existent COL element returned from the tree walker [12:22:00.0000] <Lachy> oh, it's not weird, it's my mistake :-) [14:21:00.0000] <jgraham> Astronomy spam is the best [14:33:00.0000] <hasather> annevk: he's alive: http://twitter.com/Hixie/statuses/178895922 [15:00:00.0000] <Hixie> i have returned [15:02:00.0000] <jgraham> Nice holiday? [15:02:01.0000] <Hixie> certainly it was nice to have a break from public-html [15:02:02.0000] <jgraham> :) [15:03:00.0000] <hasather> Hixie: welcome back [15:04:00.0000] <Hixie> so, anything bad happen while i was away? any servers die? [15:04:01.0000] <Hixie> configurations go mysteriously awry? [15:04:02.0000] <hsivonen> Hixie: there have been complaints about whatwg archives [15:04:03.0000] <hsivonen> Hixie: URIs broke or something [15:04:04.0000] <Hixie> /me looks [15:04:05.0000] <jgraham> The archives are private [15:06:00.0000] <Hixie> fixed [15:06:01.0000] <Hixie> nothing i can do about the uri changes [15:19:00.0000] <hsivonen> hmm. 1300 messages on public-html this month [15:19:01.0000] <hsivonen> July, that is [16:26:00.0000] <Philip`> Three more messages to go...