00:00
<gsnedders>
I probably ought to do the work I'm meant to do for Computing
00:00
<gsnedders>
I'm really getting quite far behind
00:00
<gsnedders>
Though de-facto as long as I've done everything I'm meant to by December it doesn't really matter when I do it
00:00
<BenMillard>
gsnedders, do prioritise things above this e-mail. I might not send for a day or two yet
00:01
<gsnedders>
BenMillard: I've just been dealing with other low priority email
00:01
<Lachy>
why? I took my files directly from juicystudio, and didn't modify anything else
00:01
<gsnedders>
BenMillard: From several months ago :)
00:01
<BenMillard>
Lachy, you see the class="headers" here? http://juicystudio.com/wcag/tables/noscope.html
00:01
<jgraham>
Why did someone think it was a good idea for lxml to add a random doctype when parsing html documents?
00:01
<BenMillard>
sorry, class="header": "<td class="header">12/12/2005</td>"
00:01
<gsnedders>
If you send me something low priority and don't get a reply within an hour or two, it'll probably take a few weeks or months :)
00:02
<gsnedders>
jgraham: Because libxml2's HTML support is just a big hack
00:02
<jgraham>
gsnedders: The problem is that their XML support tries to enforce XML rules
00:02
<BenMillard>
Lachy, the 3 instances of class="header" are also missing from your complexdatatable.html
00:02
<jgraham>
Like no : in tag names
00:03
<jgraham>
s/tag/attribute/
00:03
<Lachy>
Looks like they've just been removed from those files
00:03
<Lachy>
press reload
00:03
<BenMillard>
Lachy: yes, you're right
00:03
<BenMillard>
that's annoying
00:03
<Lachy>
I did at first, but that must have been cached
00:03
<BenMillard>
they're changing the record whilst I'm replying to it :(
00:03
<Lachy>
yeah, he's destroying the evidence
00:04
<BenMillard>
Lachy, OK, so if you saw those attributes were present then those are the values I'll give since they are historically accurate
00:05
<jgraham>
Those attributes were present for sure. I think I mentioned it in an email
00:05
<gsnedders>
Why hasn't anyone created a decent Flickr downloader yet?
00:05
<gsnedders>
That is like, easy to use.
00:05
<jgraham>
gsnedders: What do you mean decent?
00:05
<Philip`>
gsnedders: Like, a web browser?
00:05
<gsnedders>
Philip`: But to download an entire set?
00:05
<Philip`>
gsnedders: Oh
00:05
<BenMillard>
Lachy, the absence of class="header" makes our numbers match, so at least I haven't forgotten how to count. :)
00:05
<gsnedders>
jgraham: Not having a crazily complex UI. Copy and pasting a URL from a browser would work fine.
00:06
<Philip`>
gsnedders: Write a few (dozen) lines of script to use their API?
00:06
<Lachy>
ok, so my values are wrong cause they're the new values without the classes
00:06
<BenMillard>
Lachy, correct
00:06
<gsnedders>
Philip`: Because I need something that works on my uncle's computer, so I could just write a web API
00:06
<gsnedders>
*web interface
00:06
gsnedders
yawns
00:07
<Lachy>
BenMillard, there are only 18 header cells by my count, not 20
00:07
<BenMillard>
Lachy, 12 in the <thead>, agreed?
00:07
<Lachy>
yes
00:07
<BenMillard>
Lachy, 2 in the first column of the <tbody>?
00:08
<Lachy>
plus the 6 for budgeted, actual and forecasted in the column
00:08
<BenMillard>
Lachy, 6 in column 7 as you say
00:08
<BenMillard>
Lachy, yep, I've forgotten how to add them up :)
00:08
<Lachy>
ah, I didn't count those first 2 as headers
00:08
<BenMillard>
oh wait, 12 + 6 + 2 = 20
00:08
<Lachy>
the "Partner Portal" ones?
00:08
<BenMillard>
Lachy, yeah
00:09
<BenMillard>
they are associated as being row headers
00:10
<Lachy>
in complexdatatable.html, they are. But in noscope.html, there's nothing that indicates they are headers
00:10
<BenMillard>
"<td scope="row" id="row1" rowspan="3">Partner Portal</td>" in http://juicystudio.com/wcag/tables/complexdatatable.html
00:10
<BenMillard>
Lachy, yeah, later in my e-mail I mention that test 1 is unfair
00:10
<Lachy>
ok
00:10
<BenMillard>
and that scope="row" was used in test 3 instead of using headers+id for all the associations
00:10
<Lachy>
you mean test 2
00:11
<BenMillard>
Lachy, oh sorry you're right
00:11
<BenMillard>
scope="row" is used in addition to headers+id in test 3
00:11
<Lachy>
oh, what are the final byte counts you used? I should check the percentage given too
00:12
<BenMillard>
Lachy, 1,704 and 2,625. yes, I have made percentage errors before now :)
00:14
<Lachy>
I get 54.05%
00:14
<BenMillard>
Lachy, when talking about test file 3, I say "5 cells use <td scope> and participate in headers+id, duplicating the association." so my e-mail about that aspect is correct, I just got muddled during this review
00:15
<BenMillard>
Lachy, what is your calculation for that? Maybe I've forgotten percentage increase math...
00:16
<Lachy>
(2625 - 1704) / 1704 * 100 = 921 / 1704 * 100 = 54.05%
00:16
<BenMillard>
hmm, that's more complicated than what I did :)
00:16
<Lachy>
what did you do?
00:17
<BenMillard>
just now I tried 2,625 / 1,704 = 1.5404 so yours looks right
00:17
<BenMillard>
maybe I typod my sum first time round
00:17
<Lachy>
I assumed I needed to find the difference, and then find out what percentage that difference was with the lower value
00:18
<BenMillard>
Lachy, so me saying 36% more markup was understating the code bloat by quite a bit! thanks for spotting that
00:18
<Lachy>
our numbers are consistent. Yours (1.5404) means that 2625 is 154% the size of 1704
00:19
<Lachy>
whereas mine says it's 54% larger
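The two figures describe the same pair of byte counts in different ways. A minimal Python sketch, using the 1,704 and 2,625 byte counts quoted in the discussion; the function names are illustrative and not taken from the chat:

```python
def size_ratio(old: int, new: int) -> float:
    """Return new as a multiple of old (about 1.54 means about 154% of the old size)."""
    return new / old


def percent_increase(old: int, new: int) -> float:
    """Return how much larger new is than old, as a percentage of old."""
    return (new - old) / old * 100


old_bytes, new_bytes = 1704, 2625  # byte counts quoted in the discussion

ratio = size_ratio(old_bytes, new_bytes)
increase = percent_increase(old_bytes, new_bytes)

# The ratio expressed as a percentage is always exactly 100 points above the increase.
print(f"ratio: {ratio:.4f}")
print(f"increase: {increase:.2f}%")
```

Dividing the larger count by the smaller gives "what percentage of the original it is"; subtracting first gives "how much larger it is".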
00:19
<BenMillard>
Lachy, yeah that's how I interpret it
00:19
<Philip`>
(Markup-size doesn't seem a very interesting measure when these tables are probably generated by programs from databases, and no human ever needs to look at the markup, and simplicity of implementing the table-generating code seems much more relevant)
00:20
<BenMillard>
Philip`, I've seen auto-generated headers+id, for sure
00:20
<Lachy>
Philip`, throwing lots of data at people, regardless of how relevant it is, is a useful technique for winning an argument :-)
00:20
<BenMillard>
I've also seen typoed headers+id
00:21
<Lachy>
fyorfty percent of all people know that, Kent
00:21
<Philip`>
Lachy: Winning an argument is not the aim; the aim is to design the best possible system :-p
00:21
<BenMillard>
Philip`, it's also worth considering that if the generating code can be radically simpler (such as just using <th> for all headers) that reduces the likelihood of bugs in the table
00:22
<Lachy>
s/fyorfty/forfty/ (I messed the simpsons quote :-))
00:23
<Philip`>
BenMillard: It's good to encourage people to do the simplest thing, but sometimes they just have complex tables, so I thought the issue was how to support the most complex tables (e.g. whether to force them to use <th> instead of <td>)
00:23
<BenMillard>
Philip`, that's right. So if a table can be supported by plain <th> using a sane association algorithm, that's preferable over the complexity and bloat of headers+id, in my judgement.
00:24
<BenMillard>
but I can well imagine irregular tables will sometimes be necessary and need headers+id, although even then all the headers could be done as <th>
00:24
<jgraham>
Hmm BenMillard keeps saying sensible things so I don't have to
00:25
<Philip`>
BenMillard: Would the headers attribute be supported only on <td>, not <th>?
00:25
<BenMillard>
Philip`, I haven't studied that in detail yet. would you like me to forward the message to you?
00:26
<Philip`>
How would it handle something like http://factfinder.census.gov/servlet/QTTable?_bm=n&_lang=en&qr_name=DEC_2000_SF1_U_DP1&ds_name=DEC_2000_SF1_U&geo_id=05000US48487 where the numbers need to be associated with the label in the first column, but the labels in the first column also need to be associated with some random set of other label cells?
00:26
<jgraham>
Philip`: I think there is likely a use case for @headers on th although no one has actually brought forward a table that needs it (at least recently)
00:27
<BenMillard>
Philip & jgraham, I call those "hierarchical row headers" although nobody else does :)
00:27
<Lachy>
"Test file 1 erroneously uses <td> for 10 of the 20 header cells" Which headers make up the 10? I only count 9
00:27
<BenMillard>
Lachy, I'll recount
00:27
<Lachy>
actually, 11
00:28
<Lachy>
3 dates, 2 x Partner Portal, 6 Budgeted/actual/forecast
00:28
<Philip`>
BenMillard: It might be best to not forward the email, since I have too many other things I ought to be working on instead :-)
00:28
<BenMillard>
Philip`, sure thing
00:28
<BenMillard>
Lachy, so we're talking about? http://juicystudio.com/wcag/tables/noscope.html
00:28
<Lachy>
yes
00:28
<jgraham>
Philip`: That table looks like it should actually be several smaller tables
00:29
<BenMillard>
Lachy, I agree with 11. thanks!
00:29
<Philip`>
jgraham: I don't think splitting it into smaller tables would help with the "One race -> Asian -> Asian Indian" label hierarchy, which is the main problem
00:29
<BenMillard>
(so this is another case where I understated the error)
00:30
<jgraham>
Philip`: I think that layout would need @headers on <th>
00:30
<Hixie>
iirc you can actually do Philip`'s table with some careful use of rowspans, but i forget if i ended up making that work or not (and it's dubious whether that's desirable anyway)
00:30
<jgraham>
Philip`: Sure but it would have confused me a hell of a lot less
00:30
<Philip`>
(Also splitting it into smaller tables would make the layout go all ugly, because you want them to all be exactly the same column sizes, and there's no way to enforce that when they're multiple tables)
00:30
<jgraham>
and I can see it
00:30
<BenMillard>
Hixie, yes, rowspan works for "hierarchical row header" case...if you've got enough width to present it
00:32
<Hixie>
Philip`: i'm not sure what the best way to render that table is, but i'm pretty sure that "0. Subject, Race, One Race, Native Hawaiian and Other Pacific Islander, Other Pacific Islander 2; Number" is not the best way to read out that cell
00:32
<BenMillard>
Philip`, the table uses fixed-width, such as width="385", so you could split it and keep the fixed widths
00:32
<Philip`>
BenMillard: Then you're making assumptions about how many pixels the user's font uses
00:32
<Hixie>
Philip`: which is presumably what one would get if we encouraged people to chain headers
00:33
<Hixie>
Philip`: it should definitely be possible to link columns into having the same widths even in different tables, though css can't do that (and likely won't for some time) so i agree that in this case we shouldn't assume that it is possible
00:33
<BenMillard>
Hixie, when moving from cell to cell the more sophisticated ATs only announce the headers which have changed
00:33
<Lachy>
BenMillard, "3 of the cells using <td scope="row"> also use rowspan." - I only see 2 scope="row" in test 3
00:34
<Hixie>
BenMillard: well then it would sound exactly like if there weren't chained headers, assuming you're navigating the table linearly
00:34
<Lachy>
and this assertion of yours is debatable "For scope to work here under HTML4, scope="rowgroup" must be used with the appropriate use of <tbody> around the rows which are being spanned: "
00:34
<BenMillard>
Lachy, yep, well spotted
00:34
<Lachy>
the spec is ambiguous though
00:34
<jgraham>
Hixie: FWIW Al suggested that the common AT setup is to have headers red out on demand
00:34
<Lachy>
it says row, but technically it's still in 3 rows
00:34
<BenMillard>
Lachy, does scope="row" apply to multiple rows in HTML4?
00:34
<jgraham>
s/red/read/
00:35
<Lachy>
in fact, it doesn't say one way or the other
00:35
<Lachy>
it just says "row: The current cell provides header information for the rest of the row that contains it"
00:35
<Hixie>
jgraham: that would suggest it would render as: "Zero." zero what? crap, what are the headers? "Subject, Race, One Race, Native Hawaiian and Other Pacific Islander, Other Pacific Islander 2; Number" say what now?
00:35
<Lachy>
so does that mean the rest of the <tr> that contains it, or the rest of the row(s) that it's actually in?
00:36
<jgraham>
Hixie: I agree in this case it's pretty hard to understand. But I find that table pretty hard to understand so maybe it's just a badly designed table
00:36
<BenMillard>
Lachy, it seems to think a "row" is different from a "row group" so my reading is that scope="row" applies to exactly one line of cells across the table
00:36
<Hixie>
jgraham: quite possible
00:37
<Lachy>
You say "This further exemplifies how difficult the headers+id system is to get right", after you mention errors with scope=""
00:37
<Hixie>
jgraham: but i think "Zero." zero what? crap, what are the headers? "Other Pacific Islander 2; Number" would be easier to understand.
00:37
<BenMillard>
Lachy, can we nail down the scope="row" thing first? :)
00:37
<jgraham>
Lachy: Trying to understand the HTML4 headers spec algorithm is a lost cause
00:37
<Hixie>
there's an algorithm?
00:37
<Lachy>
BenMillard, HTML4 is not clear enough to be certain one way or another
00:37
<Hixie>
i thought there was just some vague handwaving
00:38
<jgraham>
Hixie: Algorithm is a bit of a strong term
00:38
<jgraham>
vague handwaving is indeed much closer
00:38
<BenMillard>
Lachy, it seems to make as much difference between a row and a row group as it does between a column and a column group, though...
00:39
<BenMillard>
Lachy, indeed, why have a "rowgroup" value if "row" was intended to cover that case?
00:39
<Lachy>
hmm, perhaps.
00:39
<jgraham>
Hixie: re: what AT should read out; as I've said before this seems like exactly the sort of question that user testing could help answer
00:39
<jgraham>
BenMillard: If you care the Table Inspector has a HTML4 mode
00:39
<BenMillard>
Lachy, I agree that it's debatable, so I guess either interpretation is right. :)
00:39
<Lachy>
but I don't think it's a particularly strong argument
00:40
<jgraham>
BenMillard: I wouldn't expect miracles from it though
00:40
<Lachy>
anyway, with regards to that assertion I quoted above, the evidence you presented immediately before it doesn't support it
00:42
<BenMillard>
Lachy, I see what you mean
00:43
<BenMillard>
Lachy, my thinking was that headers+id "missed out" 8 associations in favour of using scope, while headers+id also duplicates the 6 associations which are made by scope
00:44
<BenMillard>
Lachy, I interpret the gaps and overlapping as authoring mistakes...
00:44
<Lachy>
if they're consistent, it's not really a mistake. Just redundant
00:44
<BenMillard>
Lachy, they are consistent, that's true
00:45
<BenMillard>
Lachy, what sentence would you suggest in place of that one?
00:47
<Lachy>
I don't know
00:50
<BenMillard>
Lachy, how about I strike that sentence and change the 1st one in that paragraph to "So, test file 3 uses a weird patchwork of techniques, with mistakes in the use of scope and colspan."
00:51
<Lachy>
yeah
00:52
<BenMillard>
Lachy, done. did you find anything else?
00:53
<BenMillard>
jgraham, thanks for your review, btw. Short but sweet. :)
00:54
<Hixie>
Lachy: yeah, but to do that we'd have to make a number of variants of that table, and then give each variant to three or four different users, and ask each user to answer questions about the table
00:54
<Hixie>
Lachy: so if we tried, say, three variants, and had three users, that's nine users to get under a usability study video camera
00:55
<BenMillard>
Hixie, is that towards jgraham?
00:55
<Hixie>
um
00:55
<Hixie>
yes
00:55
<Hixie>
my bad
00:57
<BenMillard>
I'll leave sending the mail about tables until tomorrow. I got a snapshot of all 3 tests.
00:57
<BenMillard>
Philip`, that table is going into my collection under "To Do".
00:57
<jgraham>
Hixie: Well I'm not sure how many people 9 is compared to the number that, say, Josh works with in a day. Plus given those 9 people they could each look at several different tables so once you had enough people to get data on one type of table, you'd have enough to get data on several
00:58
<Hixie>
certainly would be great if we could do it
00:59
<jgraham>
Even without a full test like that one could try a single user with several similar tables and different amounts of verbosity, for example
00:59
<jgraham>
(one user obviously isn't a very good sample)
01:10
<BenMillard>
Philip`, I've actually put some notes with it, so it ended up as "USA FactFinder: Demographic Characteristics, 2000" here: http://projectcerbera.com/web/study/2008/collection#tables-government
01:34
Dashiva
equips vast-browser-wing-conspiracy hat
01:52
<takkaria>
ah, it's nice when you can mark 81 messages as read safely
02:48
<takkaria>
http://www.squarefree.com/burningedge/2008/08/29/2008-08-29-trunk-builds/ -- looks like yesterday was a pretty productive day for gecko
02:51
<jruderman>
that covers changes in the last two weeks, not just yesterday
02:51
<jruderman>
we only land that much in one day on crazy code freeze days
02:52
<takkaria>
ah, I thought it had rather a lot on it for a day
02:52
<takkaria>
still, pretty good going. :)
02:53
<alyosha>
hi ppl
02:53
<alyosha>
what do u guys think of IE 8 beta 2's HTML 5 support?
02:54
<alyosha>
I noticed (and I'm 100% sure I'm not the only one) a regression with unrecognized elements (eg. html 5 sectioning elements and inline elements such as mark)
02:55
<alyosha>
hopefully they'll fix it b4 final release
03:05
<takkaria>
have they removed the document.createElement() hack?
03:05
<alyosha>
yeah, pretty much
03:06
<alyosha>
but the elements to seem to show up correctly in the DOM tree in IE 8's developer tools
03:07
<alyosha>
*do
03:13
<alyosha>
I think it's probably an unintentional bug and they should fix it before final release, but I don't know for sure
03:16
<alyosha>
and the interesting thing is that the IE7 mode button is disabled on the html 5 doctype
03:17
<alyosha>
even though IE 7 rendering mode can be hacked to display new elements
03:18
<alyosha>
hmmm, what do u get with html 5 doctype and <meta http-equiv="X-UA-Compatible" content="IE=7">?
03:22
<alyosha>
html 5 doctype overrides the meta thingy
03:23
<alyosha>
IE 7 mode not available for html 5
03:30
<alyosha>
actually, they didn't remove the document.createElement() hack. It's just in the CSS, it makes unrecognized elements "UNKNOWN"
03:30
<alyosha>
just tried disabling script with the hack, and it still works
03:30
<alyosha>
but the styles just aren't applied
03:34
<alyosha>
style attributes are applied after the hack, but external stylesheets are not applied
04:17
<Hixie>
you gotta wonder what a mess their codebase is to get this kind of behaviour
04:18
<alyosha>
yeah, guess so
04:19
<alyosha>
IE 7 mode renders fine and shows the stylesheets fine too, but it can only be activated through developer tools or by adding the website to compatibility mode
04:19
<alyosha>
the meta thing is overridden by the doctype and the button is gone too
04:20
<alyosha>
gotta love M$, they make sure web designers won't lose their jobs (constantly gotta fix all their problems)
04:20
<alyosha>
lol
04:22
<alyosha>
nvm, adding it to compatibility view doesn't work either
04:23
<alyosha>
does MS have a bug tracker somewhere?
04:24
<Hixie>
https://connect.microsoft.com/feedback/AdvancedSearch.aspx?SiteID=136&Status=1&FeedbackType=1 i think?
04:25
<alyosha>
ooh, cool, Microsoft isn't completely submerged in the last decade after all.
04:26
<Hixie>
if you can get it to work, let me know
04:27
<alyosha>
sure. but I think most likely we'll have to wait for a fix from MS or do something like <header id="header"> ... #header { /*style here*/ } if they don't fix it
04:31
<alyosha>
according to this report IE8b1 didn't have this problem, so it's most likely a regression in IE8b2: https://connect.microsoft.com/IE/feedback/ViewFeedback.aspx?FeedbackID=364356
04:34
<alyosha>
well, g2g, l8rz
06:20
<Hixie>
"Ian's approach completely removes HTML conformance checking as a
06:20
<Hixie>
mechanism to introduce authors to accessibility issues."
06:20
<Hixie>
-- http://html4all.org/pipermail/list_html4all.org/2008-August/000977.html
06:20
<Hixie>
well at least they admit that they are trying to use conformance checking for their own purposes
06:21
<Hixie>
and good to see others on that thread disagreeing with it :-)
07:13
<hsivonen>
wow. when I fixed bugs in my validation harness, it ran in 4 hours and the output was only 83.4 MB.
07:16
<hsivonen>
annevk: typo. thanks
07:29
<Hixie>
hsivonen: heh
07:46
<hsivonen>
whoa! there are many more 0-error docs than I would have thought
07:49
<Hixie>
hsivonen: 2?
07:50
<hsivonen>
Hixie: 4514
07:50
<Hixie>
out of a million?
07:50
<hsivonen>
out of 516875
07:50
<hsivonen>
and manual verification shows that it's really so
07:50
<hsivonen>
however, this ignores the document mode
07:50
<Hixie>
0.87%
07:51
<hsivonen>
so doctypeless files count
07:51
<Hixie>
does bgcolor in a transitional doc count as pass or fail?
07:51
<hsivonen>
fail
07:51
<hsivonen>
this is HTML5 rules
07:51
<hsivonen>
except for doctype
07:51
<Hixie>
wow that's not bad then
07:52
<hsivonen>
note that omitted alt doesn't count as an error
07:52
<Hixie>
what are we saying, that's horrific. but still. higher than i expected.
07:52
<hsivonen>
and IRIs on non-UTF-8 pages pass
07:52
<hsivonen>
no parse errors (doctype errors ignored) is 29%
07:53
<hsivonen>
which is rather high compared to your old numbers
07:54
<hsivonen>
but now the results look pretty consistent with what I've seen before in terms of the relative frequencies
07:55
<Hixie>
i had two numbers, one that counted /> and doctypes as errors and one that didn't
07:55
<hsivonen>
ah
07:56
<Hixie>
i forget what my exact numbers were
07:56
<Hixie>
but one was about 70% and one was about 90%
08:25
<annevk>
hsivonen, MB or GB?
08:26
<annevk>
gsnedders, hmm, you didn't do your checkin
08:34
<hsivonen>
annevk: MB
08:35
<hsivonen>
annevk: the harness used to have a simple but serious bug
08:35
<annevk>
but you expected 80GB initially?!
08:35
<hsivonen>
annevk: 60 GB actually, but that expectation was based on the bug, too
08:35
<annevk>
ok
08:46
<Hixie>
hsivonen: very interesting results
08:47
<Hixie>
hsivonen: these results really argue for consolidating all "attribute [known presentational attribute] not allowed" messages into a single message "This page contains presentational markup. More details... Help on removing presentational markup..."
08:48
<Hixie>
wow, 7% of pages had an </embed> ?
08:49
<hsivonen>
so it seems
08:49
<hsivonen>
crazy
08:49
<annevk>
lots of people think <embed> needs a closing tag, I once did so too
08:49
<annevk>
it's not like there was good documentation out there on how it works...
08:51
<Hixie>
wow, malformed byte sequences aren't that common either
08:51
<annevk>
"No “p” element in scope but a “p” end tag seen." 9%!
08:52
<annevk>
madness
08:52
<Hixie>
that's probably a lot of <p><table></table></p>-type stuff
08:53
<annevk>
and 5% had "Element “frameset” not allowed in this context. (The parent was element “html”.) Suppressing further errors from this subtree." so many frames still around?
08:53
<Hixie>
this sample didn't bias for date of creation
08:53
<Hixie>
so it includes stuff going back many years
08:53
<Hixie>
there's a lot of old content out there still
08:55
<Hixie>
sigh i really don't want to reintroduce <script language="">, people typo it so much
08:55
<Hixie>
and the & issue is a sad one
08:55
<annevk>
>2% uses <head profile>
08:56
<Hixie>
iirc there's a lot of pages that have <head profile=""> (blank)
08:56
<hsivonen>
annevk: wordpress.com gives distinct host names to users
08:56
<hsivonen>
annevk: livejournal, too
08:56
<Hixie>
like there are a lot of <a> elements with shape="rect"
08:56
<hsivonen>
annevk: I was too lazy to deal with those
08:56
<hsivonen>
annevk: although I did collapse MySpace profiles
08:57
<Hixie>
hsivonen: i'll give you a domain-separated set of urls next time instead of site-separated
08:58
<Hixie>
maybe we should make & followed by alphanumerics, followed by =, a non-ambiguous ampersand
08:58
<Hixie>
that might deal with a bunch of these & errors
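As a rough illustration of the pattern being proposed, here is a minimal Python sketch that counts ampersands followed by alphanumerics and then "=". This is only a reading of the idea as stated in the chat, not the wording of any specification; the helper name and the exact character class are assumptions:

```python
import re

# Hypothetical pattern: "&", then one or more ASCII alphanumerics, then "=".
# This mirrors the chat's description, not any spec text.
_URL_PARAM_AMP = re.compile(r"&[A-Za-z0-9]+=")


def count_url_style_ampersands(attr_value: str) -> int:
    """Count unescaped ampersands that look like URL query separators."""
    return len(_URL_PARAM_AMP.findall(attr_value))


# A typical unescaped query string pasted into an href attribute.
print(count_url_style_ampersands("page?a=1&lang=en&qr=2"))

# A real character reference is not followed by "=", so it is not counted.
print(count_url_style_ampersands("caf&eacute;"))
```

Under this reading, the ampersands in the first string would stop being reported as errors, while character references such as `&eacute;` are left to the normal rules.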
08:58
<annevk>
"Bad value (consolidated) for attribute “lang” from namespace “http://www.w3.org/XML/1998/namespace” on element “html”: Bad language tag: Bad variant subtag." XML sites were included?
08:59
<hsivonen>
annevk: no
08:59
<hsivonen>
annevk: the validator sees HTML lang as XML lang internally
08:59
<takkaria>
Hixie: I think that could be a big win for authoring
08:59
<hsivonen>
annevk: and these messages weren't fully sanitized for UI consumption
08:59
<annevk>
Hixie, maybe also allow anything but [a-Z#]
09:00
<annevk>
to follow it
09:00
<Hixie>
annevk: ?
09:00
<annevk>
&" would be conforming
09:00
<annevk>
and so would 2&2
09:01
<Hixie>
the character encoding thing -- we could make <meta charset> allowed if not preceded by any non-ASCII
09:01
<annevk>
or (&)
09:01
<Hixie>
annevk: i posit that the problem is just urls in attributes
09:03
<takkaria>
fwiw I'd prefer the "get a character reference" algorithm not to depend on whether you're in an attribute value state or not
09:03
<annevk>
I don't see what's wrong with loosening them both up, given that you keep several extension points
09:03
<annevk>
takkaria, it already does
09:04
<annevk>
takkaria, and if we are to keep compat with IE, it has to be that way
09:05
<takkaria>
I mean in this particular case. i.e. if you can paste an unescaped URL into an attribute value you should also be able to conformingly paste it outside an attribute value
09:07
<annevk>
that wouldn't work well
09:08
<annevk>
eg, it would go wrong with &AMP= which does different things
09:09
<takkaria>
mm, that's a point
09:10
<takkaria>
ah well. it would be nice, though
09:13
<annevk>
at this point chaals would ask for a pony
09:51
<annevk>
grmbl, how do you properly configure lxml?
09:52
<annevk>
unzipped it's 25MB
10:34
<hsivonen>
Unsupported character encoding name: “iso-utf-8”. Will continue sniffing.
10:34
<hsivonen>
Unsupported character encoding name: “44-iso-8859-1”. Will continue sniffing.
10:35
<hsivonen>
crazy ebcdic charset in HTTP: http://web-sniffer.net/?url=http%3A%2F%2Fwww.antalis.fr%2Fsitesweb%2FFO%2Fpages%2Finterne-2-66-2122-rich_text-73228.html&submit=Submit&http=1.1&type=GET&uak=0
10:36
<hsivonen>
Unsupported character encoding name: “gb2312,big5,euc-kr”. Will sniff.
10:37
<hsivonen>
Unsupported character encoding name: “zh-tw”. Will sniff.
10:37
<hsivonen>
you can't make this stuff up
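For comparison, a quick way to see whether a given label is known to some encoding registry is to ask Python's own. This is a minimal sketch that uses Python's codec registry as a stand-in; the validator's actual lookup and sniffing logic is not shown in the chat and may differ, and the helper name is illustrative:

```python
import codecs


def python_knows_encoding(label: str) -> bool:
    """Return True if Python's codec registry recognises this label."""
    try:
        codecs.lookup(label)
    except LookupError:
        return False
    return True


# A valid label alongside some of the labels quoted from the validator output.
for label in ["utf-8", "iso-utf-8", "zh-tw", "big6"]:
    print(label, python_knows_encoding(label))
```

A label such as "zh-tw" is a language tag rather than an encoding name, which is why a registry lookup cannot resolve it.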
10:54
<jgraham>
hsivonen: btw, I'm not sure that such a thing as an unbiased sample of webpages exists
10:55
<hsivonen>
jgraham: sure. I said it was biased. :-)
10:55
<Philip`>
You can't even know how it's biased, because you can't know what the population is
10:56
<jgraham>
hsivonen: I know. I just think it's a tautology
10:56
<hsivonen>
yeah
10:56
<hsivonen>
and, yet, with different page sets, the same common errors come to the top
10:59
<jgraham>
In some sense approximately all the pages on the web are autogenerated pages which use the url to determine the content e.g. calendar.example.com/year/month/day with only implementation limits on the value of year
11:00
<jgraham>
So an unbiased sample of the whole population of http URLs that return 200 would be very misleading
11:01
<Philip`>
"approximately all" is not a concept that makes sense, where there's an infinite number of pages
11:03
<hsivonen>
more to the point, the number of pages is countably infinite which should make counting proportions a bit more tractable
11:04
<hsivonen>
Unsupported character encoding name: “big6”. Will sniff.
11:04
<gsnedders>
annevk: I asked if you wanted me to do it last night so you could work on it this morning. I got no answer :P
11:04
<Philip`>
Positive integers are countably infinite too, but it doesn't make sense to ask for an unbiased random sampling of positive integers
11:05
<gsnedders>
annevk: I took the default-lazy solution
11:05
<hsivonen>
Philip`: true, but you can say that half of the integers are positive
11:05
<Philip`>
hsivonen: No you can't :-p
11:06
<annevk>
gsnedders, I thought the default was yes!
11:06
<Philip`>
For every positive integer you give me, I'll give you back two negative integers, so there's twice as many :-)
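The pairing argument can be made concrete. A minimal Python sketch of one possible pairing (the specific mapping is an assumption; the chat does not spell one out), where each positive integer n is matched with the two negatives -(2n-1) and -2n:

```python
from itertools import islice
from typing import Iterator, Tuple


def pair_with_two_negatives() -> Iterator[Tuple[int, int, int]]:
    """Yield (n, a, b) where each positive n is matched with two distinct negatives."""
    n = 1
    while True:
        # Every negative integer appears exactly once across all the pairs.
        yield n, -(2 * n - 1), -(2 * n)
        n += 1


first_three = list(islice(pair_with_two_negatives(), 3))
print(first_three)
```

Because an equally valid pairing matches each positive integer with exactly one negative, "what fraction of the integers is positive" has no single answer without choosing how to enumerate them.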
11:06
<hsivonen>
Philip`: hmm. right.
11:06
<hsivonen>
now I appear silly and badly educated
11:06
gsnedders
attempts to cd Documents/Stuff\ I\'m\ Working\ On/spec-gen
11:06
<annevk>
gsnedders, I would appreciate a bundle of lxml+anolis+html5lib so I can just write the frontend script and don't have to worry about the bundling as I'm really bad at that
11:07
annevk
tried it this morning and couldn't get the lxml dependency to work
11:07
<gsnedders>
annevk: I've never tried bundling :)
11:07
<gsnedders>
annevk: lxml is written in C, which may make it harder
11:07
<Philip`>
(It does make sense to ask for an unbiased random real number between 0 and 1, even though that's an uncountable set)
11:07
<annevk>
gsnedders, I think that's the problem, yes
11:07
<Philip`>
(or at least I think it makes sense)
11:08
<gsnedders>
But it really does need to be for the sake of being reasonably quick
11:09
<annevk>
what's the difference between a pleonasm and a tautology?
11:09
<Hixie>
hsivonen, Philip`: in this particular case the population was itself a (biased, non-random) subset of google's index
11:10
<annevk>
ah I see, tautology is also used in logic
11:11
<Hixie>
a tautology is specifically being overly specific in a redundant manner. a pleonasm is just using too many words. as i understand it.
11:12
<Philip`>
I think the logical meaning of tautology is a statement that's true regardless of the values of any variables in it
11:12
<annevk>
maybe the Dutch and English pleonasm are different then (in Dutch "round circle" is considered a "pleonasme")
11:12
<annevk>
Philip`, yeah
11:13
Philip`
guesses that must include all true statements that don't have any variables
11:14
<annevk>
"2. Logic. An empty or vacuous statement composed of simpler statements in a fashion that makes it logically true whether the simpler statements are factually true or false; for example, the statement Either it will rain tomorrow or it will not rain tomorrow."
11:16
<GregHouston>
Logical "proofs" of the existence of God generally fall into the category of a tautology.
11:18
<annevk>
gsnedders, anyway, for you the stuff is running right? can't you just zip that dir? :)
11:19
<gsnedders>
annevk: Only if you're running OS X/x86 :)
11:19
<gsnedders>
As of course the compiled C stuff…
11:21
<annevk>
grmbl
11:24
<annevk>
so how do I install lxml?
11:24
<annevk>
running setup.py install fails
11:25
<gsnedders>
annevk: http://codespeak.net/lxml/installation.html :P
11:26
<virtuelv>
annevk: sudo apt-get install python-lxml :P
11:27
<annevk>
hmm
11:27
annevk
wonders if dreamhost supports that
11:27
<virtuelv>
they don't
11:28
<gsnedders>
You need to install it in a custom path
11:28
<virtuelv>
on slicehost, that stuff is a bit easier, given that you have root
11:28
<annevk>
"annevk is not in the sudoers file. This incident will be reported."
11:29
<annevk>
gsnedders, DreamHost doesn't have easy_install
11:32
<Philip`>
Do they have hard_install?
11:33
<Hixie>
there appears to be an inverse correlation between how much actual useful research someone has done, and how much they ask people who are doing research to do more
11:33
<annevk>
-_-
11:34
gsnedders
is gonna have to install it on (mt)
11:35
<Philip`>
Hixie: That would be because the people who can do research themselves do it themselves instead of having to ask others :-)
11:35
<annevk>
grmbl, even if I do apt-get on my local machine it complains about lxml.html not being there :/
11:35
<Hixie>
that and they know how much work it is, i imagine
11:35
<Philip`>
It would be nicer if they said *why* they wanted that research, and what useful information it would be likely to reveal
11:42
<annevk>
gsnedders, I guess the lxml dependency is pretty big?
11:42
<gsnedders>
annevk: Yeah.
11:42
<annevk>
sigh
11:43
<gsnedders>
annevk: It's the structure used for the tree everywhere
11:43
<jgraham>
Philip`: It's not clear to me that there are an infinite number of web pages given likely limits on URL length supported by servers
11:44
<jgraham>
annevk: If you want python to work sensibly on Dreamhost you have to install it yourself under your home directory
11:44
<jgraham>
Then you install easy_install
11:44
<jgraham>
Then you do easy_install lxml
11:45
<jgraham>
Then you just have to remember to change anything like #!/usr/bin/env python to #!/home/annevk/bin/python
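[Editor's sketch, not part of the log: the shebang change jgraham describes could be scripted roughly as follows. The function name `rewrite_shebang` is hypothetical, and the interpreter path is simply the one quoted in the message above; this assumes the script text is already loaded as a string.]

```python
def rewrite_shebang(script_text, interpreter):
    """Return script_text with its first line replaced by '#!<interpreter>'.

    Text that does not start with a shebang line is returned unchanged.
    (Hypothetical helper for illustration only.)
    """
    if not script_text.startswith("#!"):
        return script_text
    # Split off the first line; 'rest' is everything after the first newline.
    _first_line, newline, rest = script_text.partition("\n")
    return "#!" + interpreter + newline + rest


if __name__ == "__main__":
    original = "#!/usr/bin/env python\nprint('hello')\n"
    print(rewrite_shebang(original, "/home/annevk/bin/python"))
```

[Only the first line is touched, so the rest of the script is left exactly as it was.]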
11:46
<jgraham>
Otherwise using any external dependencies seems to be really hard
11:47
<gsnedders>
Not really
11:48
<jgraham>
gsnedders: It's getting the paths right so you can import stuff that seemed to be hard
11:48
<gsnedders>
export PYTHONPATH=${HOME}/packages/lib/python
11:48
<gsnedders>
export PATH=${HOME}/packages/bin:$PATH
11:48
<gsnedders>
in .bash_profile
11:48
<gsnedders>
That's what's used on sp.org
11:48
<jgraham>
Hmm, I thought I tried that and it didn't work
11:49
<jgraham>
Anyway setting PYTHONPATH is a bad idea in general
11:49
<gsnedders>
That's true, but it works ;P
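[Editor's sketch, not part of the log: one way to check whether a path setup like the one above took effect is to ask Python where it would import a module from. The helper name `locate_module` is hypothetical; `importlib.util.find_spec` is the standard-library call doing the work, and the sketch assumes a top-level module name.]

```python
import importlib.util


def locate_module(name):
    """Return the location a top-level module would be imported from.

    Returns None when the module cannot be found on the current
    import path. (Hypothetical helper for illustration only.)
    """
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    return spec.origin


if __name__ == "__main__":
    # 'lxml' is only found if it is installed somewhere on the import path.
    for name in ("lxml", "json"):
        print(name, "->", locate_module(name))
```

[If the custom directory is being picked up, the reported location for lxml should sit under it rather than under the system-wide site-packages.]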
11:50
<gsnedders>
annevk: See what I just pushed
11:50
<gsnedders>
i.e., http://hg.gsnedders.com/hgwebdir.cgi/anolis/rev/cf4770338aa0
11:53
<virtuelv>
annevk: there is some tutorial for rolling your own python on DH
11:53
<virtuelv>
http://wiki.dreamhost.com/Python#Building_a_custom_version_of_Python
15:27
<gsnedders>
Time to go out into town to do something about the /topic
15:28
<jcranmer>
gsnedders: you're leaving your sense of logic behind?
16:07
<virtuelv>
gsnedders: I presume you'll put URL in /topic
16:19
gsnedders
is too impatient to wait in a queue of the length there was
16:20
<gsnedders>
(i.e., my hair is still the same old colour)
17:42
<hsivonen>
weird. my Mac had bluescreened (literally) while unattended
17:45
gsnedders
still has never got a pinkscreen
17:45
<Lachy>
hsivonen, do you mean a kernel panic?
17:46
<Lachy>
AFAIK, macs can't get BSODs
17:46
<gsnedders>
Lachy: They can however get stuck on a blank blue screen
17:46
<gsnedders>
Lachy: For no apparent reason
17:47
<Lachy>
I've never seen that
17:48
gsnedders
tries to decide in what order to post his blog posts
17:48
<Lachy>
I've had my machines have kernel panics a couple of times, and just freeze with the spinning beachball cursor.
17:48
<Lachy>
gsnedders, I'd recommend starting with number 1 followed by number 2
17:49
<gsnedders>
Lachy: It would make more sense to do them in chronological order, but the earlier one is far more time-consuming to write
17:49
<Lachy>
ok
17:49
<Lachy>
I have a number of blog posts I have to finish writing
17:50
<Lachy>
I suppose I should just post something about IE8 tonight, and then post my other, significantly longer, potentially 3-part series later
17:52
<gsnedders>
I have eight drafts currently
17:53
<gsnedders>
One gives a useful answer to <http://krijnhoetmer.nl/irc-logs/whatwg/20080605#l-450>;
17:54
<gsnedders>
The other follows on from that
17:55
<Lachy>
I'd forgotten I'd even asked that question. I suppose it'll be good to get a better answer than "Stuff"
17:55
<gsnedders>
It was the first place I could think of that has a public record of me avoiding that question.
17:59
<gsnedders>
Writing about May last year is rather time-consuming.
18:03
gsnedders
smacks his old writing
18:04
<gsnedders>
It uses -ise :(
18:04
<Lachy>
VMWare ThinApp is absolutely brilliant! Now I can seamlessly run IE6, IE7, and IE8b1 and IE8b2 all within the same copy of Windows XP, which is itself running in VMWare Fusion on OS X.
18:05
<Lachy>
it basically runs each version of IE, or any other application I like, within its own sandbox
18:06
<gsnedders>
As long as no sand falls over the edge, I guess that's all right
18:07
<Lachy>
gsnedders, what is wrong with using -ise?
18:07
<gsnedders>
Lachy: en-gb-oed prefers -ize :P
18:07
<Lachy>
what?!
18:07
<Lachy>
nooO!
18:08
<Lachy>
-ize is wrong. Stupid American misspelling
18:08
<gsnedders>
No, it isn't.
18:08
<Lachy>
yes, it is
18:08
<Lachy>
I thought en-GB used -ise, just like en-AU
18:08
<gsnedders>
-ize comes from Greek, and should be used on Greek-derived words
18:09
<GregHouston>
Am I looking at the right thing? It looks like Thin App starts around $6000. I have Workstation and it was a little under $200.
18:10
<gsnedders>
en-gb only uses -ise, en-gb-oed uses -ize for words of Greek origin and -ise for those of French, en-us uses -ize
18:10
<gsnedders>
"[T]he suffix…, whatever the element to which it is added, is in its origin the Gr[eek] -ιζειν, L[atin] -izāre; and, as the pronunciation is also with z, there is no reason why in English the special French spelling in -iser should be followed, in opposition to that which is at once etymological and phonetic." — the OED
18:11
<gsnedders>
en-us also overdoes the entire z thing. Analyze is wrong.
18:11
<Lachy>
hmm, interesting
18:12
<gsnedders>
en-gb uses -ise too much, en-us uses -ize too much
18:12
<Lachy>
I still think -ise should be used for *everything*
18:13
<Lachy>
except for words like prize which are supposed to end in -ize
18:14
<Lachy>
wiktionary says that it's supposed to be -ise for french-origin words and -ize for greek-origin words. But to do that, I would have to know the origin of each word before I tried to spell it
18:14
<gsnedders>
me wonders whether he really should add a certain girl on Facebook…
18:17
<gsnedders>
(She is already convinced that I'm secretly in love with her, which is totally untrue)
18:24
<GregHouston>
It appears Thin App really is $6k. Application virtualization must be pretty tricky to cost 20 times that of a virtual machine.
18:24
<GregHouston>
I can't multipy. Make that 30 times.
18:25
<GregHouston>
Or spell. * multiply
19:08
<Philip`>
jgraham: You only need a single custom HTTP server that supports arbitrary-length URLs, and then the web can have an infinite number of pages, and I would have thought at least one person would have made such a server
19:08
<Philip`>
If nobody has, I'll make one, just to prove my point :-p
20:10
<gsnedders>
Philip`: You have calendars that can be navigated endlessly. There's no need for custom HTTP servers.
20:17
<Philip`>
gsnedders: But those calendars might have finite URL limitations
20:23
<Philip`>
(even if it's only limited by the amount of RAM available)
21:58
<gsnedders>
Philip`: Your webserver that supports arbitrary-length URLs will have the same RAM limitations
22:00
<Philip`>
gsnedders: No it won't - it won't store the URL in memory
22:00
<gsnedders>
Philip`: It just returns something for any request?
22:03
<Philip`>
gsnedders: It could ignore the URL entirely, or it could do some streaming processing of it to calculate a finite output
22:03
<Philip`>
(I assume HTTP doesn't particularly like you sending the response before you've received the request, so you can't do anything like echo the URL back to the client)
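[Editor's sketch, not part of the log: the "streaming processing" idea could look roughly like this. The function `digest_request_target` is hypothetical, and it assumes the stream is already positioned at the start of the request target, which ends at the first space. The choice of SHA-256 as the "finite output" is also an assumption; any running computation with bounded state would do.]

```python
import hashlib
import io


def digest_request_target(stream, chunk_size=4096):
    """Hash a request target of any length without holding it in memory.

    Reads fixed-size chunks until the first space byte and returns
    (hex digest, number of bytes hashed). Memory use is bounded by
    chunk_size, not by the length of the URL.
    """
    hasher = hashlib.sha256()
    length = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break  # stream ended before a space was seen
        end = chunk.find(b" ")
        if end != -1:
            # The target stops just before the space; ignore the rest.
            hasher.update(chunk[:end])
            length += end
            break
        hasher.update(chunk)
        length += len(chunk)
    return hasher.hexdigest(), length


if __name__ == "__main__":
    fake_request = io.BytesIO(b"/a" * 10000 + b" HTTP/1.1\r\n")
    digest, length = digest_request_target(fake_request)
    print(length, digest)
```

[The output is a fixed-size digest however long the URL is, so the server never needs to store the whole URL.]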
22:04
<gsnedders>
I don't think RFC2616 actually forbids you from doing so…
22:48
<Philip`>
gsnedders: Does it never require you receive the whole header so you can detect invalid requests and send an appropriate response?
23:07
<gsnedders>
Philip`: I don't think so. But it's not the best of specs.
23:07
<gsnedders>
Philip`: It doesn't require anything in specific in the case of invalid requests
23:19
<annevk>
Hmm, installing Python on DreamHost might be ok, but I can't even get lxml running in Ubuntu...
23:34
Hixie
battles iPod and Time Machine woes
23:34
<Hixie>
looks like the USB ports on my cinema display are busted
23:34
<Hixie>
no idea how THAT happened
23:35
<Philip`>
Maybe they're jammed full of popcorn and coke
23:43
<annevk>
nn