| 00:00 | <gsnedders> | I probably ought to do the work I'm meant to do for Computing |
| 00:00 | <gsnedders> | I'm really getting quite far behind |
| 00:00 | <gsnedders> | Though de-facto as long as I've done everything I'm meant to by December it doesn't really matter when I do it |
| 00:00 | <BenMillard> | gsnedders, do prioritise things above this e-mail. I might not send for a day or two yet |
| 00:01 | <gsnedders> | BenMillard: I've just been dealing with other low priority email |
| 00:01 | <Lachy> | why? I took my files directly from juicystudio, and didn't modify anything else |
| 00:01 | <gsnedders> | BenMillard: From several months ago :) |
| 00:01 | <BenMillard> | Lachy, you see the class="headers" here? http://juicystudio.com/wcag/tables/noscope.html |
| 00:01 | <jgraham> | Why did someone think it was a good idea for lxml to add a random doctype when parsing html documents? |
| 00:01 | <BenMillard> | sorry, class="header": "<td class="header">12/12/2005</td>" |
| 00:01 | <gsnedders> | If you send me something low priority and don't get a reply within an hour or two, it'll probably take a few weeks or months :) |
| 00:02 | <gsnedders> | jgraham: Because libxml2's HTML support is just a big hack |
| 00:02 | <jgraham> | gsnedders: The problem is that their XML support tries to enforce XML rules |
| 00:02 | <BenMillard> | Lachy, the 3 instances of class="header" are also missing from your complexdatatable.html |
| 00:02 | <jgraham> | Like no : in tag names |
| 00:03 | <jgraham> | s/tag/attribute/ |
| 00:03 | <Lachy> | Looks like they've just been removed from those files |
| 00:03 | <Lachy> | press reload |
| 00:03 | <BenMillard> | Lachy: yes, you're right |
| 00:03 | <BenMillard> | that's annoying |
| 00:03 | <Lachy> | I did at first, but that must have been cached |
| 00:03 | <BenMillard> | they're changing the record whilst I'm replying to it :( |
| 00:03 | <Lachy> | yeah, he's destroying the evidence |
| 00:04 | <BenMillard> | Lachy, OK, so if you saw those attributes were present then those are the values I'll give since they are historically accurate |
| 00:05 | <jgraham> | Those attributes were present for sure. I think I mentioned it in an email |
| 00:05 | <gsnedders> | Why hasn't anyone created a decent Flickr downloader yet? |
| 00:05 | <gsnedders> | That is like, easy to use. |
| 00:05 | <jgraham> | gsnedders: What do you mean decent? |
| 00:05 | <Philip`> | gsnedders: Like, a web browser? |
| 00:05 | <gsnedders> | Philip`: But to download an entire set? |
| 00:05 | <Philip`> | gsnedders: Oh |
| 00:05 | <BenMillard> | Lachy, the absence of class="header" makes our numbers match, so at least I haven't forgotten how to count. :) |
| 00:05 | <gsnedders> | jgraham: Not having a crazily complex UI. Copy and pasting a URL from a browser would work fine. |
| 00:06 | <Philip`> | gsnedders: Write a few (dozen) lines of script to use their API? |
| 00:06 | <Lachy> | ok, so my values are wrong cause they're the new values without the classes |
| 00:06 | <BenMillard> | Lachy, correct |
| 00:06 | <gsnedders> | Philip`: Because I need something that works on my uncle's computer, so I could just write a web API |
| 00:06 | <gsnedders> | *web interface |
| 00:06 | gsnedders | yawns |
| 00:07 | <Lachy> | BenMillard, there are only 18 header cells by my count, not 20 |
| 00:07 | <BenMillard> | Lachy, 12 in the <thead>, agreed? |
| 00:07 | <Lachy> | yes |
| 00:07 | <BenMillard> | Lachy, 2 in the first column of the <tbody>? |
| 00:08 | <Lachy> | plus the 6 for budgeted, actual and forecasted in the column |
| 00:08 | <BenMillard> | Lachy, 6 in column 7 as you say |
| 00:08 | <BenMillard> | Lachy, yep, I've forgotten how to add them up :) |
| 00:08 | <Lachy> | ah, I didn't count those first 2 as headers |
| 00:08 | <BenMillard> | oh wait, 12 + 6 + 2 = 20 |
| 00:08 | <Lachy> | the "Partner Portal" ones? |
| 00:08 | <BenMillard> | Lachy, yeah |
| 00:09 | <BenMillard> | they are associated as being row headers |
| 00:10 | <Lachy> | in complexdatatable.html, they are. But in noscope.html, there's nothing that indicates they are headers |
| 00:10 | <BenMillard> | "<td scope="row" id="row1" rowspan="3">Partner Portal</td>" in http://juicystudio.com/wcag/tables/complexdatatable.html |
| 00:10 | <BenMillard> | Lachy, yeah, later in my e-mail I mention that test 1 is unfair |
| 00:10 | <Lachy> | ok |
| 00:10 | <BenMillard> | and that scope="row" was used in test 3 instead of using headers+id for all the associations |
| 00:10 | <Lachy> | you mean test 2 |
| 00:11 | <BenMillard> | Lachy, oh sorry you're right |
| 00:11 | <BenMillard> | scope="row" is used in addition to headers+id in test 3 |
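The disagreement over 18 versus 20 header cells comes down to which cells are marked up as headers at all. A minimal sketch of one way to count them, assuming a header cell is either a `<th>` or a `<td>` carrying a `scope` attribute. The `HeaderCellCounter` class and the `SAMPLE` markup are hypothetical illustrations, not the contents of the juicystudio test files:

```python
from html.parser import HTMLParser


class HeaderCellCounter(HTMLParser):
    """Count cells marked up as headers: <th>, or <td> with a scope attribute."""

    def __init__(self):
        super().__init__()
        self.th_cells = 0
        self.scoped_td_cells = 0

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag == "th":
            self.th_cells += 1
        elif tag == "td" and any(name == "scope" for name, _ in attrs):
            self.scoped_td_cells += 1


# Hypothetical markup, loosely modelled on the row header quoted in the log.
SAMPLE = """
<table>
  <thead><tr><th id="c1">Project</th><th id="c2">Budgeted</th></tr></thead>
  <tbody>
    <tr>
      <td scope="row" id="row1" rowspan="3">Partner Portal</td>
      <td headers="c2 row1">100</td>
    </tr>
  </tbody>
</table>
"""

counter = HeaderCellCounter()
counter.feed(SAMPLE)
print(counter.th_cells, counter.scoped_td_cells)
```

A plain `<td>` with no `scope` is not counted, which mirrors the point that nothing in noscope.html indicates the "Partner Portal" cells are headers.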
| 00:11 | <Lachy> | oh, what are the final byte counts you used? I should check the percentage given too |
| 00:12 | <BenMillard> | Lachy, 1,704 and 2,625. yes, I have made percentage errors before now :) |
| 00:14 | <Lachy> | I get 54.05% |
| 00:14 | <BenMillard> | Lachy, when talking about test file 3, I say "5 cells use <td scope> and participate in headers+id, duplicating the association." so my e-mail is correct about that aspect, I just got muddled during this review |
| 00:15 | <BenMillard> | Lachy, what is your calculation for that? Maybe I've forgotten percentage increase math... |
| 00:16 | <Lachy> | (2625 - 1704) / 1704 * 100 = 921 / 1704 * 100 = 54.05% |
| 00:16 | <BenMillard> | hmm, that's more complicated than what I did :) |
| 00:16 | <Lachy> | what did you do? |
| 00:17 | <BenMillard> | just now I tried 2,625 / 1,704 = 1.5404 so yours looks right |
| 00:17 | <BenMillard> | maybe I typod my sum first time round |
| 00:17 | <Lachy> | I assumed I needed to find the difference, and then find out what percentage that difference was with the lower value |
| 00:18 | <BenMillard> | Lachy, so me saying 36% more markup was understating the code bloat by quite a bit! thanks for spotting that |
| 00:18 | <Lachy> | our numbers are consistent. Yours (1.5404) means that 2625 is 154% the size of 1704 |
| 00:19 | <Lachy> | whereas mine says it's 54% larger |
| 00:19 | <BenMillard> | Lachy, yeah that's how I interpret it |
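The two calculations above measure different things: one is the ratio of the sizes, the other is the relative increase. A small sketch using the two byte counts quoted in the log. The function and variable names are made up for illustration:

```python
def size_ratio(smaller: int, larger: int) -> float:
    """How many times the size of `smaller` the `larger` file is."""
    return larger / smaller


def percent_increase(smaller: int, larger: int) -> float:
    """How much larger `larger` is than `smaller`, as a percentage."""
    return (larger - smaller) / smaller * 100


smaller_bytes, larger_bytes = 1704, 2625  # byte counts quoted in the log

print(f"ratio: {size_ratio(smaller_bytes, larger_bytes):.4f}")
print(f"increase: {percent_increase(smaller_bytes, larger_bytes):.2f}%")
```

The two results always differ by exactly 100 percentage points once the ratio is written as a percentage, which is why both sets of numbers are consistent.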
| 00:19 | <Philip`> | (Markup-size doesn't seem a very interesting measure when these tables are probably generated by programs from databases, and no human ever needs to look at the markup, and simplicity of implementing the table-generating code seems much more relevant) |
| 00:20 | <BenMillard> | Philip`, I've seen auto-generated headers+id, for sure |
| 00:20 | <Lachy> | Philip`, throwing lots of data at people, regardless of how relevant it is, is a useful technique for winning an argument :-) |
| 00:20 | <BenMillard> | I've also seen typoed headers+id |
| 00:21 | <Lachy> | fyorfty percent of all people know that, Kent |
| 00:21 | <Philip`> | Lachy: Winning an argument is not the aim; the aim is to design the best possible system :-p |
| 00:21 | <BenMillard> | Philip`, it's also worth considering that if the generating code can be radically simpler (such as just using <th> for all headers) that reduces the likelihood of bugs in the table |
| 00:22 | <Lachy> | s/fyorfty/forfty/ (I messed the simpsons quote :-)) |
| 00:23 | <Philip`> | BenMillard: It's good to encourage people to do the simplest thing, but sometimes they just have complex tables, so I thought the issue was how to support the most complex tables (e.g. whether to force them to use <th> instead of <td>) |
| 00:23 | <BenMillard> | Philip`, that's right. So if a table can be supported by plain <th> using a sane association algorithm, that's preferable over the complexity and bloat of headers+id, in my judgement. |
| 00:24 | <BenMillard> | but I can well imagine irregular tables will sometimes be necessary and need headers+id, although even then all the headers could be done as <th> |
| 00:24 | <jgraham> | Hmm BenMillard keeps saying sensible things so I don't have to |
| 00:25 | <Philip`> | BenMillard: Would the headers attribute be supported only on <td>, not <th>? |
| 00:25 | <BenMillard> | Philip`, I haven't studied that in detail yet. would you like me to forward the message to you? |
| 00:26 | <Philip`> | How would it handle something like http://factfinder.census.gov/servlet/QTTable?_bm=n&_lang=en&qr_name=DEC_2000_SF1_U_DP1&ds_name=DEC_2000_SF1_U&geo_id=05000US48487 where the numbers need to be associated with the label in the first column, but the labels in the first column also need to be associated with some random set of other label cells? |
| 00:26 | <jgraham> | Philip`: I think there is likely a use case for @headers on th although no one has actually brought forward a table that needs it (at least recently) |
| 00:27 | <BenMillard> | Philip & jgraham, I call those "hierarchical row headers" although nobody else does :) |
| 00:27 | <Lachy> | "Test file 1 erroneously uses <td> for 10 of the 20 header cells" Which headers make up the 10? I only count 9 |
| 00:27 | <BenMillard> | Lachy, I'll recount |
| 00:27 | <Lachy> | actually, 11 |
| 00:28 | <Lachy> | 3 dates, 2 x Partner Portal, 6 Budgeted/actual/forecast |
| 00:28 | <Philip`> | BenMillard: It might be best to not forward the email, since I have too many other things I ought to be working on instead :-) |
| 00:28 | <BenMillard> | Philip`, sure thing |
| 00:28 | <BenMillard> | Lachy, so we're talking about? http://juicystudio.com/wcag/tables/noscope.html |
| 00:28 | <Lachy> | yes |
| 00:28 | <jgraham> | Philip`: That table looks like it should actually be several smaller tables |
| 00:29 | <BenMillard> | Lachy, I agree with 11. thanks! |
| 00:29 | <Philip`> | jgraham: I don't think splitting it into smaller tables would help with the "One race -> Asian -> Asian Indian" label hierarchy, which is the main problem |
| 00:29 | <BenMillard> | (so this is another case where I understated the error) |
| 00:30 | <jgraham> | Philip`: I think that layout would need @headers on <th> |
| 00:30 | <Hixie> | iirc you can actually do Philip`'s table with some careful use of rowspans, but i forget if i ended up making that work or not (and it's dubious whether that's desirable anyway) |
| 00:30 | <jgraham> | Philip`: Sure but it would have confused me a hell of a lot less |
| 00:30 | <Philip`> | (Also splitting it into smaller tables would make the layout go all ugly, because you want them to all be exactly the same column sizes, and there's no way to enforce that when they're multiple tables) |
| 00:30 | <jgraham> | and I can see it |
| 00:30 | <BenMillard> | Hixie, yes, rowspan works for "hierarchical row header" case...if you've got enough width to present it |
| 00:32 | <Hixie> | Philip`: i'm not sure what the best way to render that table is, but i'm pretty sure that "0. Subject, Race, One Race, Native Hawaiian and Other Pacific Islander, Other Pacific Islander 2; Number" is not the best way to read out that cell |
| 00:32 | <BenMillard> | Philip`, the table uses fixed-width, such as width="385", so you could split it and keep the fixed widths |
| 00:32 | <Philip`> | BenMillard: Then you're making assumptions about how many pixels the user's font uses |
| 00:32 | <Hixie> | Philip`: which is presumably what one would get if we encouraged people to chain headers |
| 00:33 | <Hixie> | Philip`: it should definitely be possible to link columns into having the same widths even in different tables, though css can't do that (and likely won't for some time) so i agree that in this case we shouldn't assume that it is possible |
| 00:33 | <BenMillard> | Hixie, when moving from cell to cell the more sophisticated ATs only announce the headers which have changed |
| 00:33 | <Lachy> | BenMillard, "3 of the cells using <td scope="row"> also use rowspan." - I only see 2 scope="row" in test 3 |
| 00:34 | <Hixie> | BenMillard: well then it would sound exactly like if there weren't chained headers, assuming you're navigating the table linearly |
| 00:34 | <Lachy> | and this assertion of yours is debatable "For scope to work here under HTML4, scope="rowgroup" must be used with the appropriate use of <tbody> around the rows which are being spanned: " |
| 00:34 | <BenMillard> | Lachy, yep, well spotted |
| 00:34 | <Lachy> | the spec is ambiguous though |
| 00:34 | <jgraham> | Hixie: FWIW Al suggested that the common AT setup is to have headers red out on demand |
| 00:34 | <Lachy> | it says row, but technically it's still in 3 rows |
| 00:34 | <BenMillard> | Lachy, does scope="row" apply to multiple rows in HTML4? |
| 00:34 | <jgraham> | s/red/read/ |
| 00:35 | <Lachy> | in fact, it doesn't say one way or the other |
| 00:35 | <Lachy> | it just says "row: The current cell provides header information for the rest of the row that contains it" |
| 00:35 | <Hixie> | jgraham: that would suggest it would render as: "Zero." zero what? crap, what are the headers? "Subject, Race, One Race, Native Hawaiian and Other Pacific Islander, Other Pacific Islander 2; Number" say what now? |
| 00:35 | <Lachy> | so does that mean the rest of the <tr> that contains it, or the rest of the row(s) that it's actually in? |
| 00:36 | <jgraham> | Hixie: I agree in this case it's pretty hard to understand. But I find that table pretty hard to understand so maybe it's just a badly designed table |
| 00:36 | <BenMillard> | Lachy, it seems to think a "row" is different from a "row group" so my reading is that scope="row" applies to exactly one line of cells across the table |
| 00:36 | <Hixie> | jgraham: quite possible |
| 00:37 | <Lachy> | You say "This further exemplifies how difficult the headers+id system is to get right", after you mention errors with scope="" |
| 00:37 | <Hixie> | jgraham: but i think "Zero." zero what? crap, what are the headers? "Other Pacific Islander 2; Number" would be easier to understand. |
| 00:37 | <BenMillard> | Lachy, can we nail down the scope="row" thing first? :) |
| 00:37 | <jgraham> | Lachy: Trying to understand the HTML4 headers spec algorithm is a lost cause |
| 00:37 | <Hixie> | there's an algorithm? |
| 00:37 | <Lachy> | BenMillard, HTML4 is not clear enough to be certain one way or another |
| 00:37 | <Hixie> | i thought there was just some vague handwaving |
| 00:38 | <jgraham> | Hixie: Algorithm is a bit of a strong term |
| 00:38 | <jgraham> | vague handwaving is indeed much closer |
| 00:38 | <BenMillard> | Lachy, it seems to make as much difference between a row and a row group as it does between a column and a column group, though... |
| 00:39 | <BenMillard> | Lachy, indeed, why have a "rowgroup" value if "row" was intended to cover that case? |
| 00:39 | <Lachy> | hmm, perhaps. |
| 00:39 | <jgraham> | Hixie: re: what AT should read out; as I've said before this seems like exactly the sort of question that user testing could help answer |
| 00:39 | <jgraham> | BenMillard: If you care the Table Inspector has a HTML4 mode |
| 00:39 | <BenMillard> | Lachy, I agree that it's debatable, so I guess either interpretation is right. :) |
| 00:39 | <Lachy> | but I don't think it's a particularly strong argument |
| 00:40 | <jgraham> | BenMillard: I wouldn't expect miracles from it though |
| 00:40 | <Lachy> | anyway, with regards to that assertion I quoted above, the evidence you presented immediately before it doesn't support it |
| 00:42 | <BenMillard> | Lachy, I see what you mean |
| 00:43 | <BenMillard> | Lachy, my thinking was that headers+id "missed out" 8 associations in favour of using scope, while headers+id also duplicates the 6 associations which are made by scope |
| 00:44 | <BenMillard> | Lachy, I interpret the gaps and overlapping as authoring mistakes... |
| 00:44 | <Lachy> | if they're consistent, it's not really a mistake. Just redundant |
| 00:44 | <BenMillard> | Lachy, they are consistent, that's true |
| 00:45 | <BenMillard> | Lachy, what sentence would you suggest in place of that one? |
| 00:47 | <Lachy> | I don't know |
| 00:50 | <BenMillard> | Lachy, how about I strike that sentence and change the 1st one in that paragraph to "So, test file 3 uses a weird patchwork of techniques, with mistakes in the use of scope and colspan." |
| 00:51 | <Lachy> | yeah |
| 00:52 | <BenMillard> | Lachy, done. did you find anything else? |
| 00:53 | <BenMillard> | jgraham, thanks for your review, btw. Short but sweet. :) |
| 00:54 | <Hixie> | Lachy: yeah, but to do that we'd have to make a number of variants of that table, and then give each variant to three or four different users, and ask each user to answer questions about the table |
| 00:54 | <Hixie> | Lachy: so if we tried, say, three variants, and had three users, that's nine users to get under a usability study video camera |
| 00:55 | <BenMillard> | Hixie, is that towards jgraham? |
| 00:55 | <Hixie> | um |
| 00:55 | <Hixie> | yes |
| 00:55 | <Hixie> | my bad |
| 00:57 | <BenMillard> | I'll leave sending the mail about tables until tomorrow. I got a snapshot of all 3 tests. |
| 00:57 | <BenMillard> | Philip`, that table is going into my collection under "To Do". |
| 00:57 | <jgraham> | Hixie: Well I'm not sure how many people 9 is compared to the number that, say, Josh works with in a day. Plus given those 9 people they could each look at several different tables so once you had enough people to get data on one type of table, you'd have enough to get data on several |
| 00:58 | <Hixie> | certainly would be great if we could do it |
| 00:59 | <jgraham> | Even without a full test like that one could try a single user with several similar tables and different amounts of verbosity, for example |
| 00:59 | <jgraham> | (one user obviously isn't a very good sample) |
| 01:10 | <BenMillard> | Philip`, I've actually put some notes with it, so it ended up as "USA FactFinder: Demographic Characteristics, 2000" here: http://projectcerbera.com/web/study/2008/collection#tables-government |
| 01:34 | Dashiva | equips vast-browser-wing-conspiracy hat |
| 01:52 | <takkaria> | ah, it's nice when you can mark 81 messages as read safely |
| 02:48 | <takkaria> | http://www.squarefree.com/burningedge/2008/08/29/2008-08-29-trunk-builds/ -- looks like yesterday was a pretty productive day for gecko |
| 02:51 | <jruderman> | that covers changes in the last two weeks, not just yesterday |
| 02:51 | <jruderman> | we only land that much in one day on crazy code freeze days |
| 02:52 | <takkaria> | ah, I thought it had rather a lot on it for a day |
| 02:52 | <takkaria> | still, pretty good going. :) |
| 02:53 | <alyosha> | hi ppl |
| 02:53 | <alyosha> | what do u guys think of IE 8 beta 2's HTML 5 support? |
| 02:54 | <alyosha> | I noticed (and I'm 100% sure I'm not the only one) a regression with unrecognized elements (eg. html 5 sectioning elements and inline elements such as mark) |
| 02:55 | <alyosha> | hopefully they'll fix it b4 final release |
| 03:05 | <takkaria> | have they removed the document.createElement() hack? |
| 03:05 | <alyosha> | yeah, pretty much |
| 03:06 | <alyosha> | but the elements to seem to show up correctly in the DOM tree in IE 8's developer tools |
| 03:07 | <alyosha> | *do |
| 03:13 | <alyosha> | I think it's probably an unintentional bug and they should fix it before final release, but I don't know for sure |
| 03:16 | <alyosha> | and the interesting thing is that the IE7 mode button is disabled on the html 5 doctype |
| 03:17 | <alyosha> | even though IE 7 rendering mode can be hacked to display new elements |
| 03:18 | <alyosha> | hmmm, what do u get with html 5 doctype and <meta http-equiv="X-UA-Compatible" content="IE=7">? |
| 03:22 | <alyosha> | html 5 doctype overrides the meta thingy |
| 03:23 | <alyosha> | IE 7 mode not available for html 5 |
| 03:30 | <alyosha> | actually, they didn't remove the document.createElement() hack. It's just in the CSS, it makes unrecognized elements "UNKNOWN" |
| 03:30 | <alyosha> | just tried disabling script with the hack, and it still works |
| 03:30 | <alyosha> | but the styles just aren't applied |
| 03:34 | <alyosha> | style attributes are applied after the hack, but external stylesheets are not applied |
| 04:17 | <Hixie> | you gotta wonder what a mess their codebase is to get this kind of behaviour |
| 04:18 | <alyosha> | yeah, guess so |
| 04:19 | <alyosha> | IE 7 mode renders fine and shows the stylesheets fine too, but it can only be activated through developer tools or by adding the website to compatibility mode |
| 04:19 | <alyosha> | the meta thing is overridden by the doctype and the button is gone too |
| 04:20 | <alyosha> | gotta love M$, they make sure web designers won't lose their jobs (constantly gotta fix all their problems) |
| 04:20 | <alyosha> | lol |
| 04:22 | <alyosha> | nvm, adding it to compatibility view doesn't work either |
| 04:23 | <alyosha> | does MS have a bug tracker somewhere? |
| 04:24 | <Hixie> | https://connect.microsoft.com/feedback/AdvancedSearch.aspx?SiteID=136&Status=1&FeedbackType=1 i think? |
| 04:25 | <alyosha> | ooh, cool, Microsoft isn't completely submerged in the last decade after all. |
| 04:26 | <Hixie> | if you can get it to work, let me know |
| 04:27 | <alyosha> | sure. but I think most likely we'll have to wait for a fix from MS or do something like <header id="header"> ... #header { /*style here*/ } if they don't fix it |
| 04:31 | <alyosha> | according to this report IE8b1 didn't have this problem, so it's most likely a regression in IE8b2: https://connect.microsoft.com/IE/feedback/ViewFeedback.aspx?FeedbackID=364356 |
| 04:34 | <alyosha> | well, g2g, l8rz |
| 06:20 | <Hixie> | "Ian's approach completely removes HTML conformance checking as a |
| 06:20 | <Hixie> | mechanism to introduce authors to accessibility issues." |
| 06:20 | <Hixie> | -- http://html4all.org/pipermail/list_html4all.org/2008-August/000977.html |
| 06:20 | <Hixie> | well at least they admit that they are trying to use conformance checking for their own purposes |
| 06:21 | <Hixie> | and good to see others on that thread disagreeing with it :-) |
| 07:13 | <hsivonen> | wow. when I fixed bugs in my validation harness, it ran in 4 hours and the output was only 83.4 MB. |
| 07:16 | <hsivonen> | annevk: typo. thanks |
| 07:29 | <Hixie> | hsivonen: heh |
| 07:46 | <hsivonen> | whoa! there are many more 0-error docs than I would have thought |
| 07:49 | <Hixie> | hsivonen: 2? |
| 07:50 | <hsivonen> | Hixie: 4514 |
| 07:50 | <Hixie> | out of a million? |
| 07:50 | <hsivonen> | out of 516875 |
| 07:50 | <hsivonen> | and manual verification shows that it's really so |
| 07:50 | <hsivonen> | however, this ignores the document mode |
| 07:50 | <Hixie> | 0.87% |
| 07:51 | <hsivonen> | so doctypeless files count |
| 07:51 | <Hixie> | does bgcolor in a transitional doc count as pass or fail? |
| 07:51 | <hsivonen> | fail |
| 07:51 | <hsivonen> | this is HTML5 rules |
| 07:51 | <hsivonen> | except for doctype |
| 07:51 | <Hixie> | wow that's not bad then |
| 07:52 | <hsivonen> | note that omitted alt doesn't count as an error |
| 07:52 | <Hixie> | what are we saying, that's horrific. but still. higher than i expected. |
| 07:52 | <hsivonen> | and IRIs on non-UTF-8 pages pass |
| 07:52 | <hsivonen> | no parse errors (doctype errors ignored) is 29% |
| 07:53 | <hsivonen> | which is rather high compared to your old numbers |
| 07:54 | <hsivonen> | but now the results look pretty consistent with what I've seen before in terms of the relative frequencies |
| 07:55 | <Hixie> | i had two numbers, one that counted /> and doctypes as errors and one that didn't |
| 07:55 | <hsivonen> | ah |
| 07:56 | <Hixie> | i forget what my exact numbers were |
| 07:56 | <Hixie> | but one was about 70% and one was about 90% |
| 08:25 | <annevk> | hsivonen, MB or GB? |
| 08:26 | <annevk> | gsnedders, hmm, you didn't do your checkin |
| 08:34 | <hsivonen> | annevk: MB |
| 08:35 | <hsivonen> | annevk: the harness used to have a simple but serious bug |
| 08:35 | <annevk> | but you expected 80GB initially?! |
| 08:35 | <hsivonen> | annevk: 60 GB actually, but that expectation was based on the bug, too |
| 08:35 | <annevk> | ok |
| 08:46 | <Hixie> | hsivonen: very interesting results |
| 08:47 | <Hixie> | hsivonen: these results really argue for consolidating all "attribute [known presentational attribute] not allowed" messages into a single message "This page contains presentational markup. More details... Help on removing presentational markup..." |
| 08:48 | <Hixie> | wow, 7% of pages had an </embed> ? |
| 08:49 | <hsivonen> | so it seems |
| 08:49 | <hsivonen> | crazy |
| 08:49 | <annevk> | lots of people think <embed> needs a closing tag, I once did so too |
| 08:49 | <annevk> | it's not like there was good documentation out there on how it works... |
| 08:51 | <Hixie> | wow, malformed byte sequences aren't that common either |
| 08:51 | <annevk> | "No “p” element in scope but a “p” end tag seen." 9%! |
| 08:52 | <annevk> | madness |
| 08:52 | <Hixie> | that's probably a lot of <p><table></table></p>-type stuff |
| 08:53 | <annevk> | and 5% had "Element “frameset” not allowed in this context. (The parent was element “html”.) Suppressing further errors from this subtree." so many frames still around? |
| 08:53 | <Hixie> | this sample didn't bias for date of creation |
| 08:53 | <Hixie> | so it includes stuff going back many years |
| 08:53 | <Hixie> | there's a lot of old content out there still |
| 08:55 | <Hixie> | sigh i really don't want to reintroduce <script language="">, people typo it so much |
| 08:55 | <Hixie> | and the & issue is a sad one |
| 08:55 | <annevk> | >2% uses <head profile> |
| 08:56 | <Hixie> | iirc there's a lot of pages that have <head profile=""> (blank) |
| 08:56 | <hsivonen> | annevk: wordpress.com gives distinct host names to users |
| 08:56 | <hsivonen> | annevk: livejournal, too |
| 08:56 | <Hixie> | like there are a lot of <a> elements with shape="rect" |
| 08:56 | <hsivonen> | annevk: I was too lazy to deal with those |
| 08:56 | <hsivonen> | annevk: although I did collapse MySpace profiles |
| 08:57 | <Hixie> | hsivonen: i'll give you a domain-separated set of urls next time instead of site-separated |
| 08:58 | <Hixie> | maybe we should make & followed by alphanumerics, followed by =, a non-ambiguous ampersand |
| 08:58 | <Hixie> | that might deal with a bunch of these & errors |
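The proposal above can be sketched as a pattern match: an ampersand, then one or more alphanumerics, then an equals sign, as in an unescaped query string pasted into an attribute. This is only an illustration of the idea as stated in the log, assuming ASCII alphanumerics. The pattern and function names are hypothetical, and this is not the HTML5 tokenizer's character reference algorithm:

```python
import re

# Hypothetical pattern for the proposal: "&", one or more ASCII
# alphanumerics, then "=" (e.g. the "&b=" in "page?a=1&b=2").
URL_PARAM_AMPERSAND = re.compile(r"&[A-Za-z0-9]+=")


def has_url_style_ampersand(text: str) -> bool:
    """Return True if text contains an ampersand that looks like a URL parameter separator."""
    return URL_PARAM_AMPERSAND.search(text) is not None


for sample in ["page?a=1&b=2", "&amp;", "2&2", "x&=y"]:
    print(repr(sample), has_url_style_ampersand(sample))
```

Because the pattern requires at least one alphanumeric between `&` and `=`, a bare `&=` is not matched by this sketch.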
| 08:58 | <annevk> | "Bad value (consolidated) for attribute “lang” from namespace “http://www.w3.org/XML/1998/namespace” on element “html”: Bad language tag: Bad variant subtag." XML sites were included? |
| 08:59 | <hsivonen> | annevk: no |
| 08:59 | <hsivonen> | annevk: the validator sees HTML lang as XML lang internally |
| 08:59 | <takkaria> | Hixie: I think that could be a big win for authoring |
| 08:59 | <hsivonen> | annevk: and these messages weren't fully sanitized for UI consumption |
| 08:59 | <annevk> | Hixie, maybe also allow anything but [a-Z#] |
| 09:00 | <annevk> | to follow it |
| 09:00 | <Hixie> | annevk: ? |
| 09:00 | <annevk> | &" would be conforming |
| 09:00 | <annevk> | and so would 2&2 |
| 09:01 | <Hixie> | the character encoding thing -- we could make <meta charset> allowed if not preceded by any non-ASCII |
| 09:01 | <annevk> | or (&) |
| 09:01 | <Hixie> | annevk: i posit that the problem is just urls in attributes |
| 09:03 | <takkaria> | fwiw I'd prefer the "get a character reference" algorithm not to depend on whether you're in an attribute value state or not |
| 09:03 | <annevk> | I don't see what's wrong with loosening them both up, given that you keep several extension points |
| 09:03 | <annevk> | takkaria, it already does |
| 09:04 | <annevk> | takkaria, and if we are to keep compat with IE, it has to be that way |
| 09:05 | <takkaria> | I mean in this particular case. i.e. if you can paste an unescaped URL into an attribute value you should also be able to conformingly paste it outside an attribute value |
| 09:07 | <annevk> | that wouldn't work well |
| 09:08 | <annevk> | eg, it would go wrong with &= which does different things |
| 09:09 | <takkaria> | mm, that's a point |
| 09:10 | <takkaria> | ah well. it would be nice, though |
| 09:13 | <annevk> | at this point chaals would ask for a pony |
| 09:51 | <annevk> | grmbl, how do you properly configure lxml? |
| 09:52 | <annevk> | unzipped it's 25MB |
| 10:34 | <hsivonen> | Unsupported character encoding name: “iso-utf-8”. Will continue sniffing. |
| 10:34 | <hsivonen> | Unsupported character encoding name: “44-iso-8859-1”. Will continue sniffing. |
| 10:35 | <hsivonen> | crazy ebcdic charset in HTTP: http://web-sniffer.net/?url=http%3A%2F%2Fwww.antalis.fr%2Fsitesweb%2FFO%2Fpages%2Finterne-2-66-2122-rich_text-73228.html&submit=Submit&http=1.1&type=GET&uak=0 |
| 10:36 | <hsivonen> | Unsupported character encoding name: “gb2312,big5,euc-kr”. Will sniff. |
| 10:37 | <hsivonen> | Unsupported character encoding name: “zh-tw”. Will sniff. |
| 10:37 | <hsivonen> | you can't make this stuff up |
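The messages above come from the validator rejecting encoding labels it does not recognise and falling back to sniffing. A rough analogue using Python's own codec registry, which is an assumption for illustration only: its set of known names is not the same as the validator's, and the helper name is made up:

```python
import codecs


def is_known_encoding_label(label: str) -> bool:
    """Return True if Python's codec registry recognises the label."""
    try:
        codecs.lookup(label.strip())
        return True
    except LookupError:
        return False


# Labels quoted in the log, plus two ordinary ones for comparison.
labels = [
    "iso-utf-8",
    "44-iso-8859-1",
    "gb2312,big5,euc-kr",
    "zh-tw",
    "utf-8",
    "iso-8859-1",
]
for label in labels:
    verdict = "ok" if is_known_encoding_label(label) else "unsupported, would sniff"
    print(f"{label!r}: {verdict}")
```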
| 10:54 | <jgraham> | hsivonen: btw, I'm not sure that such a thing as an unbiased sample of webpages exists |
| 10:55 | <hsivonen> | jgraham: sure. I said it was biased. :-) |
| 10:55 | <Philip`> | You can't even know how it's biased, because you can't know what the population is |
| 10:56 | <jgraham> | hsivonen: I know. I just think it's a tautology |
| 10:56 | <hsivonen> | yeah |
| 10:56 | <hsivonen> | and, yet, with different page sets, the same common errors come to the top |
| 10:59 | <jgraham> | In some sense approximately all the pages on the web are autogenerated pages which use the url to determine the content e.g. calendar.example.com/year/month/day with only implementation limits on the value of year |
| 11:00 | <jgraham> | So an unbiased sample of the whole population of http URLs that return 200 would be very misleading |
| 11:01 | <Philip`> | "approximately all" is not a concept that makes sense, where there's an infinite number of pages |
| 11:03 | <hsivonen> | more to the point, the number of pages is countably infinite which should make counting proportions a bit more tractable |
| 11:04 | <hsivonen> | Unsupported character encoding name: “big6”. Will sniff. |
| 11:04 | <gsnedders> | annevk: I asked if you wanted me to do it last night so you could work on it this morning. I got no answer :P |
| 11:04 | <Philip`> | Positive integers are countably infinite too, but it doesn't make sense to ask for an unbiased random sampling of positive integers |
| 11:05 | <gsnedders> | annevk: I took the default-lazy solution |
| 11:05 | <hsivonen> | Philip`: true, but you can say that half of the integers are positive |
| 11:05 | <Philip`> | hsivonen: No you can't :-p |
| 11:06 | <annevk> | gsnedders, I thought the default was yes! |
| 11:06 | <Philip`> | For every positive integer you give me, I'll give you back two negative integers, so there's twice as many :-) |
| 11:06 | <hsivonen> | Philip`: hmm. right. |
| 11:06 | <hsivonen> | now I appear silly and badly educated |
| 11:06 | gsnedders | attempts to cd Documents/Stuff\ I\'m\ Working\ On/spec-gen |
| 11:06 | <annevk> | gsnedders, I would appreciate a bundle of lxml+anolis+html5lib so I can just write the frontend script and don't have to worry about the bundling as I'm really bad at that |
| 11:07 | annevk | tried it this morning and couldn't get the lxml dependency to work |
| 11:07 | <gsnedders> | annevk: I've never tried bundling :) |
| 11:07 | <gsnedders> | annevk: lxml is written in C, which may make it harder |
| 11:07 | <Philip`> | (It does make sense to ask for an unbiased random real number between 0 and 1, even though that's an uncountable set) |
| 11:07 | <annevk> | gsnedders, I think that's the problem, yes |
| 11:07 | <Philip`> | (or at least I think it makes sense) |
| 11:08 | <gsnedders> | But it really does need to be for the sake of being reasonably quick |
| 11:09 | <annevk> | what's the difference between a pleonasm and a tautology? |
| 11:09 | <Hixie> | hsivonen, Philip`: in this particular case the population was itself a (biased, non-random) subset of google's index |
| 11:10 | <annevk> | ah I see, tautology is also used in logic |
| 11:11 | <Hixie> | a tautology is specifically being overly specific in a redundant manner. a pleonasm is just using too many words. as i understand it. |
| 11:12 | <Philip`> | I think the logical meaning of tautology is a statement that's true regardless of the values of any variables in it |
| 11:12 | <annevk> | maybe the Dutch and English pleonasm are different then (in Dutch "round circle" is considered a "pleonasme") |
| 11:12 | <annevk> | Philip`, yeah |
| 11:13 | Philip` | guesses that must include all true statements that don't have any variables |
| 11:14 | <annevk> | "2. Logic. An empty or vacuous statement composed of simpler statements in a fashion that makes it logically true whether the simpler statements are factually true or false; for example, the statement Either it will rain tomorrow or it will not rain tomorrow." |
| 11:16 | <GregHouston> | Logical "proofs" of the existence of God generally fall into the category of tautology. |
| 11:18 | <annevk> | gsnedders, anyway, for you the stuff is running right? can't you just zip that dir? :) |
| 11:19 | <gsnedders> | annevk: Only if you're running OS X/x86 :) |
| 11:19 | <gsnedders> | As of course the compiled C stuff… |
| 11:21 | <annevk> | grmbl |
| 11:24 | <annevk> | so how do I install lxml? |
| 11:24 | <annevk> | running setup.py install fails |
| 11:25 | <gsnedders> | annevk: http://codespeak.net/lxml/installation.html :P |
| 11:26 | <virtuelv> | annevk: sudo apt-get install python-lxml :P |
| 11:27 | <annevk> | hmm |
| 11:27 | annevk | wonders if dreamhost supports that |
| 11:27 | <virtuelv> | they don't |
| 11:28 | <gsnedders> | You need to install it in a custom path |
| 11:28 | <virtuelv> | on slicehost, that stuff is a bit easier, given that you have root |
| 11:28 | <annevk> | "annevk is not in the sudoers file. This incident will be reported." |
| 11:29 | <annevk> | gsnedders, DreamHost doesn't have easy_install |
| 11:32 | <Philip`> | Do they have hard_install? |
| 11:33 | <Hixie> | there appears to be an inverse correlation between how much actual useful research someone has done, and how much they ask people who are doing research to do more |
| 11:33 | <annevk> | -_- |
| 11:34 | gsnedders | is gonna have to install it on (mt) |
| 11:35 | <Philip`> | Hixie: That would be because the people who can do research themselves do it themselves instead of having to ask others :-) |
| 11:35 | <annevk> | grmbl, even if I do apt-get on my local machine it complains about lxml.html not being there :/ |
| 11:35 | <Hixie> | that and they know how much work it is, i imagine |
| 11:35 | <Philip`> | It would be nicer if they said *why* they wanted that research, and what useful information it would be likely to reveal |
| 11:42 | <annevk> | gsnedders, I guess the lxml dependency is pretty big? |
| 11:42 | <gsnedders> | annevk: Yeah. |
| 11:42 | <annevk> | sigh |
| 11:43 | <gsnedders> | annevk: It's the structure used for the tree everywhere |
| 11:43 | <jgraham> | Philip`: It's not clear to me that there are an infinite number of web pages given likely limits on URL length supported by servers |
| 11:44 | <jgraham> | annevk: If you want python to work sensibly on Dreamhost you have to install it yourself under your home directory |
| 11:44 | <jgraham> | Then you install easy_install |
| 11:44 | <jgraham> | Then you do easy_install lxml |
| 11:45 | <jgraham> | Then you just have to remember to change anything like #!/usr/bin/env python to #!/home/annevk/bin/python |
| 11:46 | <jgraham> | Otherwise using any external dependencies seems to be really hard |
| 11:47 | <gsnedders> | Not really |
| 11:48 | <jgraham> | gsnedders: It's getting the paths right so you can import stuff that seemed to be hard |
| 11:48 | <gsnedders> | export PYTHONPATH=${HOME}/packages/lib/python |
| 11:48 | <gsnedders> | export PATH=${HOME}/packages/bin:$PATH |
| 11:48 | <gsnedders> | in .bash_profile |
| 11:48 | <gsnedders> | That's what's used on sp.org |
| 11:48 | <jgraham> | Hmm, I thought I tried that and it didn't work |
| 11:49 | <jgraham> | Anyway setting PYTHONPATH is a bad idea in general |
| 11:49 | <gsnedders> | That's true, but it works ;P |
| 11:50 | <gsnedders> | annevk: See what I just pushed |
| 11:50 | <gsnedders> | i.e., http://hg.gsnedders.com/hgwebdir.cgi/anolis/rev/cf4770338aa0 |
| 11:53 | <virtuelv> | annevk: there is some tutorial for rolling your own python on DH |
| 11:53 | <virtuelv> | http://wiki.dreamhost.com/Python#Building_a_custom_version_of_Python |
| 15:27 | <gsnedders> | Time to go out into town to do something about the /topic |
| 15:28 | <jcranmer> | gsnedders: you're leaving your sense of logic behind? |
| 16:07 | <virtuelv> | gsnedders: I presume you'll put URL in /topic |
| 16:19 | gsnedders | is too impatient to wait in a queue of the length there was |
| 16:20 | <gsnedders> | (i.e., my hair is still the same old colour) |
| 17:42 | <hsivonen> | weird. my Mac had bluescreened (literally) while unattended |
| 17:45 | gsnedders | still has never got a pinkscreen |
| 17:45 | <Lachy> | hsivonen, do you mean a kernel panic? |
| 17:46 | <Lachy> | AFAIK, macs can't get BSODs |
| 17:46 | <gsnedders> | Lachy: They can however get stuck on a blank blue screen |
| 17:46 | <gsnedders> | Lachy: For no apparent reason |
| 17:47 | <Lachy> | I've never seen that |
| 17:48 | gsnedders | tries to decide in what order to post his blog posts |
| 17:48 | <Lachy> | I've had my machines have kernel panics a couple of times, and just freeze with the spinning beachball cursor. |
| 17:48 | <Lachy> | gsnedders, I'd recommend starting with number 1 followed by number 2 |
| 17:49 | <gsnedders> | Lachy: It would make more sense to do them in chronological order, but the earlier one is far more time-consuming to write |
| 17:49 | <Lachy> | ok |
| 17:49 | <Lachy> | I have a number of blog posts I have to finish writing |
| 17:50 | <Lachy> | I suppose I should just post something about IE8 tonight, and then post my other, significantly longer, potentially 3-part series later |
| 17:52 | <gsnedders> | I have eight drafts currently |
| 17:53 | <gsnedders> | One gives a useful answer to <http://krijnhoetmer.nl/irc-logs/whatwg/20080605#l-450> |
| 17:54 | <gsnedders> | The other follows on from that |
| 17:55 | <Lachy> | I'd forgotten I'd even asked that question. I suppose it'll be good to get a better answer than "Stuff" |
| 17:55 | <gsnedders> | It was the first place I could think of that has a public record of me avoiding that question. |
| 17:59 | <gsnedders> | Writing about May last year is rather time-consuming. |
| 18:03 | gsnedders | smacks his old writing |
| 18:04 | <gsnedders> | It uses -ise :( |
| 18:04 | <Lachy> | VMWare ThinApp is absolutely brilliant! Now I can seamlessly run IE6, IE7, and IE8b1 and IE8b2 all within the same copy of Windows XP, which is itself running in VMWare Fusion on OS X. |
| 18:05 | <Lachy> | it basically runs each version of IE, or any other application I like, within its own sandbox |
| 18:06 | <gsnedders> | As long as no sand falls over the edge, I guess that's all right |
| 18:07 | <Lachy> | gsnedders, what is wrong with using -ise? |
| 18:07 | <gsnedders> | Lachy: en-gb-oed prefers -ize :P |
| 18:07 | <Lachy> | what?! |
| 18:07 | <Lachy> | nooO! |
| 18:08 | <Lachy> | -ize is wrong. Stupid American misspelling |
| 18:08 | <gsnedders> | No, it isn't. |
| 18:08 | <Lachy> | yes, it is |
| 18:08 | <Lachy> | I thought en-GB used -ise, just like en-AU |
| 18:08 | <gsnedders> | -ize comes from Greek, and should be used on Greek-derived words |
| 18:09 | <GregHouston> | Am I looking at the right thing? It looks like ThinApp starts around $6000. I have Workstation and it was a little under $200. |
| 18:10 | <gsnedders> | en-gb only uses -ise, en-gb-oed uses -ize for words of Greek origin and -ise for those of French, en-us uses -ize |
| 18:10 | <gsnedders> | "[T]he suffix…, whatever the element to which it is added, is in its origin the Gr[eek] -ιζειν, L[atin] -izāre; and, as the pronunciation is also with z, there is no reason why in English the special French spelling in -iser should be followed, in opposition to that which is at once etymological and phonetic." — the OED |
| 18:11 | <gsnedders> | en-us also overdoes the entire z thing. Analyze is wrong. |
| 18:11 | <Lachy> | hmm, interesting |
| 18:12 | <gsnedders> | en-gb uses -ise too much, en-us uses -ize too much |
| 18:12 | <Lachy> | I still think -ise should be used for *everything* |
| 18:13 | <Lachy> | except for words like prize which are supposed to end in -ize |
| 18:14 | <Lachy> | wiktionary says that it's supposed to be -ise for french-origin words and -ize for greek-origin words. But to do that, I would have to know the origin of each word before I tried to spell it |
| 18:14 | <gsnedders> | me wonders whether he really should add a certain girl on Facebook… |
| 18:17 | <gsnedders> | (She is already convinced that I'm secretly in love with her, which is totally untrue) |
| 18:24 | <GregHouston> | It appears Thin App really is $6k. Application virtualization must be pretty tricky to cost 20 times that of a virtual machine. |
| 18:24 | <GregHouston> | I can't multipy. Make that 30 times. |
| 18:25 | <GregHouston> | Or spell. * multiply |
| 19:08 | <Philip`> | jgraham: You only need a single custom HTTP server that supports arbitrary-length URLs, and then the web can have an infinite number of pages, and I would have thought at least one person would have made such a server |
| 19:08 | <Philip`> | If nobody has, I'll make one, just to prove my point :-p |
| 20:10 | <gsnedders> | Philip`: You have calendars that can be navigated endlessly. There's no need for custom HTTP servers. |
| 20:17 | <Philip`> | gsnedders: But those calendars might have finite URL limitations |
| 20:23 | <Philip`> | (even if it's only limited by the amount of RAM available) |
| 21:58 | <gsnedders> | Philip`: Your webserver that supports arbitrary-length URLs will have the same RAM limitations |
| 22:00 | <Philip`> | gsnedders: No it won't - it won't store the URL in memory |
| 22:00 | <gsnedders> | Philip`: It just returns something for any request? |
| 22:03 | <Philip`> | gsnedders: It could ignore the URL entirely, or it could do some streaming processing of it to calculate a finite output |
| 22:03 | <Philip`> | (I assume HTTP doesn't particularly like you sending the response before you've received the request, so you can't do anything like echo the URL back to the client) |
| 22:04 | <gsnedders> | I don't think RFC2616 actually forbids you from doing so… |
| 22:48 | <Philip`> | gsnedders: Does it never require you receive the whole header so you can detect invalid requests and send an appropriate response? |
| 23:07 | <gsnedders> | Philip`: I don't think so. But it's not the best of specs. |
| 23:07 | <gsnedders> | Philip`: It doesn't require anything specific in the case of invalid requests |
| 23:19 | <annevk> | Hmm, installing Python on DreamHost might be ok, but I can't even get lxml running in Ubuntu... |
| 23:34 | Hixie | battles iPod and Time Machine woes |
| 23:34 | <Hixie> | looks like the USB ports on my cinema display are busted |
| 23:34 | <Hixie> | no idea how THAT happened |
| 23:35 | <Philip`> | Maybe they're jammed full of popcorn and coke |
| 23:43 | <annevk> | nn |
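
The point Philip` makes at 22:00 and 22:03, that a server could do "some streaming processing" of an arbitrarily long URL without holding it in memory, can be sketched roughly as below. This is only an illustration under stated assumptions, not anything that was actually written in the channel: it is not an HTTP server, and the names `summarize_url_stream` and `long_url` are hypothetical. It just shows a URL arriving as chunks being reduced to a finite output (its length and a SHA-256 digest) in constant memory.

```python
import hashlib
from typing import Iterable, Iterator, Tuple


def summarize_url_stream(chunks: Iterable[bytes]) -> Tuple[int, str]:
    """Consume a URL chunk by chunk and return (length, SHA-256 hex digest).

    Hypothetical helper: only the running hash state and a counter are kept,
    so memory use does not grow with the length of the URL.
    """
    hasher = hashlib.sha256()
    length = 0
    for chunk in chunks:
        hasher.update(chunk)   # fold this chunk into the running digest
        length += len(chunk)   # track the total size without storing the data
    return length, hasher.hexdigest()


def long_url(repeats: int) -> Iterator[bytes]:
    """Hypothetical stand-in for a socket: yields a long URL path lazily."""
    yield b"/"
    for _ in range(repeats):
        yield b"a" * 1024      # one 1 KiB piece at a time


if __name__ == "__main__":
    # A URL of about 1 MiB is summarized without ever being held whole.
    print(summarize_url_stream(long_url(1024)))
```

Any other fold over the chunks (a checksum, a counter of path segments) would do equally well; the digest is used here only because `hashlib` supports incremental updates out of the box.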