01:19
Philip`
wonders if it'd be interesting to see what strings <acronym> and <abbr> are currently used for, to see if they're just interchangeable in practice, or if it's best to stay far away from the whole topic
01:22
<Ketsuban>
The main problem with asking people to distinguish between <acronym> and <abbr> is that they don't know what the difference is (that is to say, an acronym is read out as the letters, like RSPCA, whereas an abbreviation is said, like NATO).
01:22
<Ketsuban>
I myself advocate keeping <acronym> around and saying people SHOULD use <acronym> for acronyms, but MAY use <abbr> if they don't know the difference.
01:23
<Ketsuban>
This is the friendliest solution for Web developers, but for developers of e.g. screen reading software it's pretty nightmarish.
01:24
<Philip`>
The main problem is that people who say they do know the difference disagree on what the difference is :-)
01:24
<Philip`>
Wikipedia says "The word acronym was coined during the mid-20th century for abbreviations pronounced as words, such as NATO and AIDS."
01:24
<Ketsuban>
Unfortunately that's not modern usage. =P
01:24
<Ketsuban>
I was taught acronyms are read out as letters.
01:26
<Philip`>
That's what I mean about people disagreeing
01:27
<Dashiva>
Is there anyone who disagrees that an acronym is also an abbreviation, regardless of what an acronym is defined to be?
01:27
<Ketsuban>
I don't disagree there.
01:28
<Ketsuban>
So I suppose that's the strongest argument you can give for dropping <acronym> altogether.
01:31
<Ketsuban>
But I think keeping <acronym> around but allowing unconditional use of <abbr> is marginally friendlier to the makers of screen readers etc.
01:34
<Philip`>
Screen readers are only helped if <acronym> is used mostly correctly, and if it's better to trust the markup than to guess, and it doesn't seem clear that that's the case
01:41
<annevk>
Philip`, that be useful, yes
01:41
<annevk>
would be, even
01:42
Philip`
should probably expand his collection of pages so there's a more useful amount of data
01:48
<mpt>
Do any existing screenreaders treat <abbr> differently from <acronym>?
01:48
<Philip`>
Do any treat them differently to <span>?
01:49
<mpt>
That's part of what I'm wondering :-)
02:08
<Philip`>
There's http://philip.html5.org/data/abbr-acronym.txt of quite limited usefulness or quality
02:11
<Hixie>
dunno what the two most common ones are from, but they seem highly pointless
02:11
<Hixie>
... title="CD">CD<...
02:12
<Philip`>
They're almost all http://www.imusic.dk/
02:12
<Hixie>
ah
02:13
<Hixie>
well this is pretty convincing data as far as the elements being pointless goes
02:13
<Hixie>
as in, having both
02:13
<Hixie>
vs having one
02:14
<Philip`>
It looks like it wouldn't be good for a screen reader to try pronouncing <acronym>s as single words
02:14
<Philip`>
not to read out the individual letters
02:14
<Philip`>
which leaves them with zero good options
02:15
<Philip`>
s/not/nor/
02:24
Philip`
happens to see http://friendlybit.com/html/encyclopedia-of-html-elements/ saying "ACRONYM: No need to use this one, abbr is enough. Do we really need to differ between acronyms and abbreviations? What about initialisms and the other types of words?"
02:25
<Hixie>
screen readers are going to read it the same way they read text/plain
02:25
<Hixie>
which is to say, using their dictionary
02:25
<Hixie>
and heuristics
02:26
<jruderman>
<portmanteau>
02:27
<jruderman>
it could be a solution to the lame debate in http://en.wikipedia.org/wiki/Talk:Portmanteau#The_ubiquity_of_portmanteau : users could configure their browsers to display <portmanteau> differently
02:29
<Hixie>
"lame debate" is a redundant descriptor when linking to a URI with wiki/Talk: in it
02:33
<jruderman>
hehe
11:26
<takkaria>
I'm talking on a channel full of geeks who read the comic xkcd
11:26
<takkaria>
and it's amazing just how badly they misunderstand xhtml/html/rendering/css/the lot
11:26
<takkaria>
11:31 <+kremlin> You're suggesting that Firefox should parse XML files as if they were XHTML files, xipietotec?
11:26
<takkaria>
11:31 <+kremlin> The file extensions are different for a reason, you know.
11:27
<takkaria>
makes me wonder what hope the rest of the world has, really
11:27
<annevk>
rest of the world uses HTML :p
11:28
<takkaria>
apparently firefox 2 doesn't support XHTML so it just renders it as HTML
11:28
<Ketsuban>
Part of me thinks Firefox should render XHTML without any default themes at all beyond setting display: for the appropriate elements and styling form elements appropriately.
11:30
<jwalden>
eurgh
11:31
<Ketsuban>
But then I have really insane ideas sometimes. =P
11:33
<jwalden>
meh, we're all mad here
11:36
<takkaria>
now someone's saying that web servers sometimes don't serve all files ending in .html as text/html because sometimes they do content-sniffing
11:36
<takkaria>
sadly, I'm muted on that channel, so I can't join in the debate anymore
11:37
<jwalden>
*web servers* doing content sniffing? sheesh
11:38
<annevk>
in theory it was the idea that web servers would do that
11:38
<annevk>
for <meta http-equiv> for instance
11:40
<jwalden>
no kidding
11:40
<jwalden>
learn something new every day!
12:18
<webben>
takkaria: Firefox2 does support XHTML.
12:19
<webben>
It will render it as HTML only if you serve it as text/html
12:19
<takkaria>
webben: I know that, and now the person who told me that does too
12:20
<webben>
ok
12:20
<webben>
misunderstood what you meant by 'apparently'
12:21
<webben>
takkaria: there's a channel for xkcd readers?
12:23
<takkaria>
webben: kk. irc.xkcd.net
12:24
<takkaria>
or .com, I forget
12:24
<takkaria>
particularly #xkcd-signal; you get muted if you say something someone else has said before
12:24
<takkaria>
every time you do, your mute time gets doubled
12:24
<takkaria>
and every six hours it halves again
12:24
<webben>
hmm interesting
12:24
<webben>
ta
15:21
Lachy
attempted to go skiing today.
15:22
<Lachy>
unbelievably, the rental places didn't have skis with bindings large enough for my boots :-(
15:25
<jgraham_>
Oh so by "attempted" you actually mean "failed"
15:26
<Lachy>
yeah.
15:26
<jgraham_>
(I assumed you just meant you had been and were not very good)
15:27
<Lachy>
I do own my own crappy old straight skis that I bought second hand, but assumed I would be able to rent better skis
15:27
<Lachy>
so I didn't bother taking them
15:27
<Lachy>
I'm going to go buy some new skis tomorrow morning
15:28
jgraham_
has only been skiing on crappy dry slopes and even then not for many years
15:34
<gsnedders>
annevk: do browsers return the last header if you request Content-Type (or something else relevant to the protocol) and there are multiple headers of the type? What if there are occurrences of the header in the trailer of a chunked response?
15:35
gsnedders
wishes he could go skiing more often than once every few years :(
15:39
Philip`
wonders if 'sliding uncontrollably down a dry ski slope and sometimes not falling over' counts as skiing
15:40
<didymos>
Philip`, is there any other way? :)
15:40
gsnedders
thinks not
15:40
<Philip`>
Not in my personal experience :-)
15:40
gsnedders
has never fallen over on a dry ski slope
15:41
<gsnedders>
I've never been on one either, but hey.
15:41
gsnedders
can just go up to Glenshee for the day
15:43
<hsivonen>
Philip`: the newes copy of your dmoz URL that I have downloaded is from July. Do you have a newer URL set available for download?
15:44
<hsivonen>
newest even
15:45
<Philip`>
hsivonen: I don't have a newer one - I've just been using one from before 2007-07-15
15:45
<Philip`>
(and it probably has the broken &amp; bits in it)
15:46
<hsivonen>
Philip`: ok
15:47
<hsivonen>
I have dmoz-unique-pages.txt.gz and dmoz-unique-pages-shuffle.txt.gz that are significantly different in size
15:48
<Philip`>
(If I remember correctly, it just came from http://rdf.dmoz.org/rdf/content.rdf.u8.gz and Perl regexps to extract the links, then sort and uniq)
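A minimal Python sketch of the extract, sort and uniq step described above. The original used Perl regexps that are not shown here, so the pattern, the function name and the sample markup are all assumptions made up for illustration. Decoding entities is included because undecoded &amp; in query strings is the kind of breakage mentioned earlier.

```python
import html
import re

# Assumed, simplified pattern: any double-quoted http(s) URL in the dump.
# The real Perl regexps are not shown in the log.
URL_RE = re.compile(r'"(https?://[^"\s]+)"')

def extract_unique_urls(rdf_text):
    """Return the sorted, de-duplicated URLs found in rdf_text."""
    urls = set()
    for match in URL_RE.finditer(rdf_text):
        # Decode entities such as &amp; so query strings come out intact.
        urls.add(html.unescape(match.group(1)))
    return sorted(urls)

# Made-up sample input, not taken from the real dump.
sample = '''
<ExternalPage about="http://example.org/?a=1&amp;b=2">
<link r:resource="http://example.org/?a=1&amp;b=2"/>
<ExternalPage about="http://example.com/">
'''
print(extract_unique_urls(sample))
```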
15:48
<Philip`>
hsivonen: The uncompressed sizes should be identical
15:48
<hsivonen>
ok.
15:48
<Philip`>
but the shuffling hurts the compression a lot
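A small Python sketch of why two files with identical uncompressed sizes can differ a lot once gzipped. The URL list is synthetic and stands in for the real data set; only the direction of the effect is illustrated, not its actual magnitude.

```python
import gzip
import random

# Made-up URL list standing in for the real data set.
urls = sorted(f"http://www.example{i % 50:02d}.com/page/{i}" for i in range(5000))

shuffled = urls[:]
random.Random(0).shuffle(shuffled)  # fixed seed so the run is repeatable

def gz_size(lines):
    """Size in bytes of the gzip-compressed, newline-joined lines."""
    return len(gzip.compress("\n".join(lines).encode("utf-8")))

sorted_size = gz_size(urls)
shuffled_size = gz_size(shuffled)

# Same lines, same uncompressed size, but sorted neighbours share long
# prefixes, which the compressor can exploit.
print(sorted_size, shuffled_size)
```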
15:48
<hsivonen>
how did you do the shuffling?
15:48
<Philip`>
Ideally I would have done it with 'sort -R'
15:50
<Philip`>
except that didn't actually shuffle things at all when I first tried it, so I just wrote a line of Perl to build an array of [rand(), $uri] pairs, sorted it by the random field and printed it out again, which took a couple of gigabytes of memory and is not necessarily the best method
15:50
<Philip`>
('sort -R' works on one computer I use, but not on another, which is weird)
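The random-key approach described above, sketched in Python rather than the original Perl one-liner (which is not shown in the log). Function names are hypothetical. The second function is a swapped-in alternative, the standard library's in-place shuffle, which skips the sort but still needs the whole list in memory.

```python
import random

def shuffle_by_random_key(items, rng=random):
    """Shuffle by pairing each item with a random key and sorting on it."""
    decorated = [(rng.random(), item) for item in items]
    decorated.sort(key=lambda pair: pair[0])   # sort on the random field only
    return [item for _, item in decorated]

def shuffle_in_place(items, rng=random):
    """Alternative: let the standard library shuffle the list directly."""
    rng.shuffle(items)
    return items

uris = [f"http://example.org/{n}" for n in range(10)]  # made-up URIs
print(shuffle_by_random_key(uris, random.Random(1)))
```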
16:00
<Philip`>
hsivonen: If you're doing stuff with pages, http://canvex.lazyilluminati.com/misc/Test2.java might have some salvageably useful bits though it's full of bad ideas and copied-and-pasted chunks of code
16:03
<Philip`>
(I haven't even changed the original filename which it began evolving from long ago...)
16:07
<hsivonen>
Philip`: thanks
16:18
<hsivonen>
let's see what happens if I feed the first 10000 URLs from shuffle to Validator.nu
16:19
Philip`
wonders how long that will take
16:20
<Philip`>
(Parallelism definitely helps here, and is relatively trivial, which is nice)
16:20
<hsivonen>
this script is so simple that it doesn't have parallelism
16:20
<hsivonen>
I just run a simple python script on my own computer that feeds URIs sequentially to the Validator.nu Web service API
16:21
<Philip`>
How does it handle things like timeouts?
16:21
<hsivonen>
Philip`: it doesn't
16:21
<Philip`>
If it pauses for 30 seconds a few hundred times, it's going to be a bit painful
16:21
<hsivonen>
it's very likely that the setup is too simple
16:22
<hsivonen>
Philip`: Validator.nu itself has timeouts on its outgoing requests
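A rough Python sketch of a sequential driver that adds its own client-side timeout, so one slow request cannot stall the run. This is not the script discussed above: the endpoint URL, the doc and out query parameters, and every function name are assumptions for illustration and should be checked against the service documentation.

```python
import socket
import urllib.error
import urllib.parse
import urllib.request

# Assumed endpoint and parameters; not taken from the log.
SERVICE = "https://validator.nu/"

def build_request_url(page_url):
    """Build the service request URL for one page to check."""
    query = urllib.parse.urlencode({"doc": page_url, "out": "json"})
    return SERVICE + "?" + query

def fetch(request_url, timeout):
    """Fetch one response body, giving up after `timeout` seconds."""
    with urllib.request.urlopen(request_url, timeout=timeout) as response:
        return response.read()

def validate_all(page_urls, fetch_fn=fetch, timeout=30):
    """Check URLs one at a time; store the body or the error per URL."""
    results = {}
    for page_url in page_urls:
        try:
            results[page_url] = fetch_fn(build_request_url(page_url), timeout)
        except (urllib.error.URLError, socket.timeout) as error:
            # Record the failure (timeout, 503, ...) and keep going.
            results[page_url] = error
    return results
```

Passing the fetch function in as a parameter keeps the loop testable without network access, and leaves room to swap in a parallel version later.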
16:22
<Philip`>
At least that's better than being too complex, so it sounds like a good place to start :-)
16:32
<SadEagle>
hmmm, lots of canvas changes
16:33
<Philip`>
and a lack of updated tests for those changes
16:33
<SadEagle>
so I am gonna be lazy for a bit :-)
16:34
<SadEagle>
thank goodness I have a centralized place to change just about all of the +/- inf and NaN handling
16:37
<Philip`>
Unfortunately I don't have a centralised place for that
16:39
Philip`
wonders how to make http://canvex.lazyilluminati.com/tests/tests/* redirect to http://philip.html5.org/tests/canvas/suite/tests/*
16:41
<Philip`>
Oh, with "Redirect" - that was easy
17:17
<hsivonen>
hmm. curiously, the 10000 url script started getting 503 from Validator.nu at some point without Validator.nu crashing
17:18
<hsivonen>
I wonder if mod_jk has some kind of DoS prevention that kicked in
17:18
<hsivonen>
or Apache itself
17:56
<annevk>
wow, image maps are still so widely deployed
18:01
<hsivonen>
hmm. perhaps there are even more than one application/xhtml+xml site in the Alexa globar 500
18:01
<hsivonen>
global
18:01
<annevk>
wow
18:07
<Philip`>
What is the " 1 www.icio.us"?
18:08
<hsivonen>
Philip`: probably a regexping error
18:08
<Philip`>
It would be nice to show the number of pages that contain each error, rather than the total count
18:08
<Philip`>
s/rather than/as well as/
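The two counts being asked for can be sketched in Python as follows. The input shape (a mapping from page to its list of error messages), the function name and the sample messages are all assumptions made up for illustration.

```python
from collections import Counter

def count_errors(errors_by_page):
    """Return (total occurrences, number of pages) per error message."""
    total = Counter()
    pages = Counter()
    for page, messages in errors_by_page.items():
        total.update(messages)        # every occurrence counts
        pages.update(set(messages))   # each page counts at most once
    return total, pages

# Made-up sample data.
sample = {
    "a.example": ["Stray end tag.", "Stray end tag.", "Bad value."],
    "b.example": ["Stray end tag."],
}
total, pages = count_errors(sample)
for message in total:
    print(total[message], pages[message], message)
```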
18:11
<Philip`>
http://my.opera.com/community/forums/topic.dml?id=163885&t=1202062644&page=1#comment2212326 - yay, XML
18:20
<blooberry>
hsivonen: did you find more than one application/xhtml+xml site in the alexa global 500 then? The "perhaps" makes me curious. 8-}
18:21
<blooberry>
I only found one...
18:26
<hsivonen>
blooberry: I rechecked. there's only one.
18:26
<webben>
hsivonen: How are you requesting XHTML?
18:26
<blooberry>
iwiw.hu?
18:26
<hsivonen>
blooberry: yes
18:28
<hsivonen>
webben: Accept: application/xhtml+xml, application/xml; q=0.5, text/html; q=0.9
18:38
<hsivonen>
Philip`: each page counted at most once per error: http://hsivonen.iki.fi/test/moz/alexa500-page-collapsed-counts.txt
21:24
<gavin_>
hsivonen: www.iwiw.hu is sending text/html as far as I can tell...
21:24
<gavin_>
did it just change or something?
21:25
<gavin_>
I've tried a few different UAs, too
21:28
<hsivonen>
gavin_: Page Info says application/xhtml+xml in Minefield nightly
21:30
<gavin_>
ah, interesting
21:30
<Dashiva>
Same in Opera
21:31
<gavin_>
that is what I get for http://iwiw.hu/pages/user/login.jsp
21:32
<hsivonen>
www.iwiw.hu redirects me to http://www.iwiw.hu/pages/user/login.jsp which is application/xhtml+xml to Minefield
21:33
<gavin_>
yeah, I see that too
21:33
<gavin_>
but if I load the login.jsp URL in IE7, it works
21:33
<gavin_>
I thought IE7 barfed on application/xhtml+xml ?
21:33
<hsivonen>
most likely it varies the Content-Type on Accept
21:34
<gavin_>
ah, right
21:34
<webben>
gavin: curl -H 'accept: application/xhtml+xml,text/html;q=0' -v http://www.iwiw.hu/pages/user/login.jsp returns Content-Type: application/xhtml+xml;charset=UTF-8
21:34
<gavin_>
web-sniffer doesn't let me change Accept
21:35
<webben>
as does accept: application/xhtml+xml;q=0,text/html ... fail!
21:35
<webben>
curl -H 'accept: text/html' -v http://www.iwiw.hu/pages/user/login.jsp returns text/html
21:36
<webben>
someone tell them their content negotiation is borked ;)
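For comparison, a simplified Python sketch of q-value-aware negotiation, under which a type sent with q=0 is never chosen. It is not the site's implementation; it ignores wildcards such as */* and treats unlisted types as unacceptable, and the function names are hypothetical.

```python
def parse_accept(header):
    """Return {media_type: q} for an Accept header (no wildcard handling)."""
    prefs = {}
    for item in header.split(","):
        parts = [part.strip() for part in item.split(";")]
        media_type = parts[0].lower()
        if not media_type:
            continue
        q = 1.0  # default quality when no q parameter is given
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        prefs[media_type] = q
    return prefs

def choose_type(header, available):
    """Pick the available type with the highest non-zero q, else None."""
    prefs = parse_accept(header)
    best, best_q = None, 0.0
    for media_type in available:
        q = prefs.get(media_type, 0.0)
        if q > best_q:  # q=0 means "not acceptable", so it never wins
            best, best_q = media_type, q
    return best

TYPES = ["application/xhtml+xml", "text/html"]
print(choose_type("application/xhtml+xml;q=0,text/html", TYPES))
```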
21:50
<Philip`>
hsivonen: About error counts: Thanks
21:51
<Philip`>
Comparing to http://canvex.lazyilluminati.com/survey/2007-07-17/analyse.cgi/index#parse-errors there's a significant difference in the number with unencoded ampersands
21:51
<Philip`>
which I'd assume is due to top-500 sites being more likely to have dynamic pages with query strings needing ampersands, so that sounds quite plausible
22:16
<Hixie>
Lachy: if you have a chance, any way we can set up blog.whatwg.org/faq to be a redirect to the wiki faq?
22:22
<Lachy>
oh, damn. That had been done before, but with the last upgrade, I accidentally deleted the .htaccess.
22:22
<Lachy>
I'll check if I have a backup
22:23
<Lachy>
good, I do. Uploading it now
22:25
<Hixie>
thanks
22:25
<Hixie>
i fixed the link on the front page too
22:25
<Hixie>
(someone complained it was 404)
22:25
<Hixie>
so that shouldn't be a big issue any more
22:30
<Lachy>
all fixed
22:31
<Lachy>
ah, it looks like you removed the link entirely
22:32
<Hixie>
no i mean on www.whatwg.org
22:32
<Hixie>
made it point to the wiki
22:33
<Lachy>
oh, I thought you meant the blog's front page. The link to the FAQ seems to be missing from there
22:33
<Hixie>
odd
22:33
<Hixie>
didn't touch the blog
22:33
<Hixie>
update fallout?
22:34
<Lachy>
oh, maybe I never added the links again, after I moved the faq from the blog to the wiki
22:53
<zcorpan_>
hsivonen: in order to make the grouping feature more useful, perhaps the attribute value should be stripped from the message for errors about attribute values
22:56
<zcorpan_>
(or the grouping feature should just be smarter and do the same as you did with the Alexa result list)
22:59
<zcorpan_>
looking at the bottom of that list shows that unmatched quotes are relatively common
23:00
<zcorpan_>
the very last error also seems harmless
23:01
<zcorpan_>
banning ; in attribute names might also be effective
23:11
<zcorpan_>
hsivonen: though, the grouping feature is intended to help with search-replace fixing rather than fix-the-spec-studying, so what to group might be different
23:15
<zcorpan_>
0022 / 491 Bad value (consolidated) for attribute “width” on element “img”: Zero is not a positive integer.
23:15
<zcorpan_>
i think that's <noscript><img src=tracker.cgi with=0 height=0>
23:15
<zcorpan_>
s/with/width/
23:20
<zcorpan_>
we might also want to allow width/height on <input type=image>
23:24
<zcorpan_>
0013 / 491 Stray “script” start tag.
23:24
<zcorpan_>
when can a script start tag be stray?
23:26
zcorpan_
ponders about <noscript><style scoped>