#whatwg on 2008-02-03

01:19	Philip`	wonders if it'd be interesting to see what strings <acronym> and <abbr> are currently used for, to see if they're just interchangeable in practice, or if it's best to stay far away from the whole topic
01:22	<Ketsuban>	The main problem with asking people to distinguish between <acronym> and <abbr> is that they don't know what the difference is (that is to say, an acronym is read out as the letters, like RSPCA, whereas an abbreviation is said, like NATO).
01:22	<Ketsuban>	I myself advocate keeping <acronym> around and saying people SHOULD use <acronym> for acronyms, but MAY use <abbr> if they don't know the difference.
01:23	<Ketsuban>	This is the friendliest solution for Web developers, but for developers of e.g. screen reading software it's pretty nightmarish.
01:24	<Philip`>	The main problem is that people who say they do know the difference disagree on what the difference is :-)
01:24	<Philip`>	Wikipedia says "The word acronym was coined during the mid-20th century for abbreviations pronounced as words, such as NATO and AIDS."
01:24	<Ketsuban>	Unfortunately that's not modern usage. =P
01:24	<Ketsuban>	I was taught acronyms are read out as letters.
01:26	<Philip`>	That's what I mean about people disagreeing
01:27	<Dashiva>	Is there anyone who disagrees that an acronym is also an abbreviation, regardless of what an acronym is defined to be?
01:27	<Ketsuban>	I don't disagree there.
01:28	<Ketsuban>	So I suppose that's the strongest argument you can give for dropping <acronym> altogether.
01:31	<Ketsuban>	But I think keeping <acronym> around but allowing unconditional use of <abbr> is marginally friendlier to the makers of screen readers etc.
01:34	<Philip`>	Screen readers are only helped if <acronym> is used mostly correctly, and if it's better to trust the markup than to guess, and it doesn't seem clear that that's the case
01:41	<annevk>	Philip`, that be useful, yes
01:41	<annevk>	would be, even
01:42	Philip`	should probably expand his collection of pages so there's a more useful amount of data
01:48	<mpt>	Do any existing screenreaders treat <abbr> differently from <acronym>?
01:48	<Philip`>	Do any treat them differently to <span>?
01:49	<mpt>	That's part of what I'm wondering :-)
02:08	<Philip`>	There's http://philip.html5.org/data/abbr-acronym.txt of quite limited usefulness or quality
02:11	<Hixie>	dunno what the two most common ones are from, but they seem highly pointless
02:11	<Hixie>	... title="CD">CD<...
02:12	<Philip`>	They're almost all http://www.imusic.dk/
02:12	<Hixie>	ah
02:13	<Hixie>	well this is pretty convincing data as far as the elements being pointless goes
02:13	<Hixie>	as in, having both
02:13	<Hixie>	vs having one
02:14	<Philip`>	It looks like it wouldn't be good for a screen reader to try pronouncing <acronym>s as single words
02:14	<Philip`>	not to read out the individual letters
02:14	<Philip`>	which leaves them with zero good options
02:15	<Philip`>	s/not/nor/
02:24	Philip`	happens to see http://friendlybit.com/html/encyclopedia-of-html-elements/ saying "ACRONYM: No need to use this one, abbr is enough. Do we really need to differ between acronyms and abbreviations? What about initialisms and the other types of words?"
02:25	<Hixie>	screen readers are going to read it the same way they read text/plain
02:25	<Hixie>	which is to say, using their dictionary
02:25	<Hixie>	and heuristics
02:26	<jruderman>	<portmanteau>
02:27	<jruderman>	it could be a solution to the lame debate in http://en.wikipedia.org/wiki/Talk:Portmanteau#The_ubiquity_of_portmanteau : users could configure their browsers to display <portmanteau> differently
02:29	<Hixie>	"lame debate" is a redundant descriptor when linking to a URI with wiki/Talk: in it
02:33	<jruderman>	hehe
11:26	<takkaria>	I'm talking on a channel full of geeks who read the comic xkcd
11:26	<takkaria>	and it's amazing just how badly they misunderstand xhtml/html/rendering/css/the lot
11:26	<takkaria>	11:31 <+kremlin> You're suggesting that Firefox should parse XML files as if they were XHTML files, xipietotec?
11:26	<takkaria>	11:31 <+kremlin> The file extensions are different for a reason, you know.
11:27	<takkaria>	makes me wonder what hope the rest of the world has, really
11:27	<annevk>	rest of the world uses HTML :p
11:28	<takkaria>	apparently firefox 2 doesn't support XHTML so it just renders it as HTML
11:28	<Ketsuban>	Part of me thinks Firefox should render XHTML without any default themes at all beyond setting display: for the appropriate elements and styling form elements appropriately.
11:30	<jwalden>	eurgh
11:31	<Ketsuban>	But then I have really insane ideas sometimes. =P
11:33	<jwalden>	meh, we're all mad here
11:36	<takkaria>	now someone's saying that sometimes web servers don't serve all files ending in .html as text/html because sometimes it does content-sniffing
11:36	<takkaria>	sadly, I'm muted on that channel, so I can't join in the debate anymore
11:37	<jwalden>	web servers doing content sniffing? sheesh
11:38	<annevk>	in theory it was the idea that web servers would do that
11:38	<annevk>	for <meta http-equiv> for instance
11:40	<jwalden>	no kidding
11:40	<jwalden>	learn something new every day!
12:18	<webben>	takkaria: Firefox2 does support XHTML.
12:19	<webben>	It will render it as HTML only if you serve it as text/html
12:19	<takkaria>	webben: I know that, and now the person who told me that does too
12:20	<webben>	ok
12:20	<webben>	misunderstood what you meant by 'apparently'
12:21	<webben>	takkaria: there's a channel for xkcd readers?
12:23	<takkaria>	webben: kk. irc.xkcd.net
12:24	<takkaria>	or .com, I forget
12:24	<takkaria>	particularly #xkcd-signal; you get muted if you say something someone else has said before
12:24	<takkaria>	every time you do, your mute time gets doubled
12:24	<takkaria>	and every six hours it halves again
12:24	<webben>	hmm interesting
12:24	<webben>	ta
15:21	Lachy	attempted to go skiing today.
15:22	<Lachy>	unbelievably, the rental places didn't have skis with bindings large enough for my boots :-(
15:25	<jgraham_>	Oh so by "attempted" you actually mean "failed"
15:26	<Lachy>	yeah.
15:26	<jgraham_>	(I assumed you just meant you had been and were not very good)
15:27	<Lachy>	I do own my own crappy old straight skis that I bought second hand, but assumed I would be able to rent better skis
15:27	<Lachy>	so I didn't bother taking them
15:27	<Lachy>	I'm going to go buy some new skis tomorrow morning
15:28	jgraham_	has only been skiing on crappy dry slopes and even then not for many years
15:34	<gsnedders>	annevk: do browsers return the last header if you request Content-Type (or something else relevant to the protocol) and there are multiple headers of the type? What if there are occurrences of the header in the trailer of a chunked response?
15:35	gsnedders	wishes he could go skiing more often than once every few years :(
15:39	Philip`	wonders if 'sliding uncontrollably down a dry ski slope and sometimes not falling over' counts as skiing
15:40	<didymos>	Philip`, is there any other way? :)
15:40	gsnedders	thinks not
15:40	<Philip`>	Not in my personal experience :-)
15:40	gsnedders	has never fallen over on a dry ski slope
15:41	<gsnedders>	I've never been on one either, but hey.
15:41	gsnedders	can just go up to Glenshee for the day
15:43	<hsivonen>	Philip`: the newes copy of your dmoz URL that I have downloaded is from July. Do you have a newer URL set available for download?
15:44	<hsivonen>	newest even
15:45	<Philip`>	hsivonen: I don't have a newer one - I've just been using one from before 2007-07-15
15:45	<Philip`>	(and it probably has the broken & bits in it)
15:46	<hsivonen>	Philip`: ok
15:47	<hsivonen>	I have dmoz-unique-pages.txt.gz and dmoz-unique-pages-shuffle.txt.gz that are significantly different in size
15:48	<Philip`>	(If I remember correctly, it just came from http://rdf.dmoz.org/rdf/content.rdf.u8.gz and Perl regexps to extract the links, then sort and uniq)
15:48	<Philip`>	hsivonen: The uncompressed sizes should be identical
15:48	<hsivonen>	ok.
15:48	<Philip`>	but the shuffling hurts the compression a lot
15:48	<hsivonen>	how did you do the shuffling?
15:48	<Philip`>	Ideally I would have done it with 'sort -R'
15:50	<Philip`>	except that didn't actually shuffle things at all when I first tried it, so I just wrote a line of Perl to read the URIs into an array of [rand(), $uri], then sorted by the random field and printed it out again, which took a couple of gigabytes of memory and is not necessarily the best method
15:50	<Philip`>	('sort -R' works on one computer I use, but not on another, which is weird)
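
The shuffle Philip` describes (pair every URI with a random number, sort on the random field, print the URIs back out) could look roughly like this in Python. This is a sketch of the same idea rather than his actual Perl; the function name `shuffle_lines` and the `seed` parameter are made up for illustration, and like the original it keeps every line in memory.

```python
import random

def shuffle_lines(lines, seed=None):
    """Shuffle lines by attaching a random key to each one and sorting on it."""
    rng = random.Random(seed)  # seed is only here to make runs repeatable
    # Decorate: one (random_key, line) pair per input line, all held in memory.
    decorated = [(rng.random(), line) for line in lines]
    # Sort on the random field only, then strip the key off again.
    decorated.sort(key=lambda pair: pair[0])
    return [line for _, line in decorated]
```

Calling `shuffle_lines(open("dmoz-unique-pages.txt"))` would return the lines in random order. In Python itself `random.shuffle` does the same job in place without the extra keys.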
16:00	<Philip`>	hsivonen: If you're doing stuff with pages, http://canvex.lazyilluminati.com/misc/Test2.java might have some salvageably useful bits though it's full of bad ideas and copied-and-pasted chunks of code
16:03	<Philip`>	(I haven't even changed the original filename which it began evolving from long ago...)
16:07	<hsivonen>	Philip`: thanks
16:18	<hsivonen>	let's see what happens if I feed the first 10000 URLs from shuffle to Validator.nu
16:19	Philip`	wonders how long that will take
16:20	<Philip`>	(Parallelism definitely helps here, and is relatively trivial, which is nice)
16:20	<hsivonen>	this script is so simple that it doesn't have parallelism
16:20	<hsivonen>	I just run a simple python script on my own computer that feeds URIs sequentially to the Validator.nu Web service API
16:21	<Philip`>	How does it handle things like timeouts?
16:21	<hsivonen>	Philip`: it doesn't
16:21	<Philip`>	If it pauses for 30 seconds a few hundred times, it's going to be a bit painful
16:21	<hsivonen>	it's very likely that the setup is too simple
16:22	<hsivonen>	Philip`: Validator.nu itself has timeouts on its outgoing requests
16:22	<Philip`>	At least that's better than being too complex, so it sounds like a good place to start :-)
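
A minimal sketch of the two things Philip` raises, parallelism and surviving timeouts, assuming the per-URI request is wrapped in a caller-supplied function. `check_all`, `check_one` and `workers` are hypothetical names; nothing here is taken from hsivonen's script or from the Validator.nu API.

```python
from concurrent.futures import ThreadPoolExecutor

def check_all(uris, check_one, workers=8):
    """Run check_one(uri) for every URI in parallel and record failures.

    check_one is supplied by the caller (hypothetical here): it should make
    the request with its own timeout and raise an exception on failure.
    """
    def safe_check(uri):
        try:
            return uri, check_one(uri), None
        except Exception as exc:  # timeout, 503, connection reset, ...
            return uri, None, exc

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves input order, so results line up with uris.
        return list(pool.map(safe_check, uris))
```

Each result is a `(uri, result, error)` tuple, so one slow or failing URI neither stalls the run for long nor aborts it. A `check_one` built on `urllib.request.urlopen(url, timeout=30)` would be one way to bound each request.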
16:32	<SadEagle>	hmmm, lots of canvas changes
16:33	<Philip`>	and a lack of updated tests for those changes
16:33	<SadEagle>	so I am gonna be lazy for a bit :-)
16:34	<SadEagle>	thank goodness I have a centralized place to change just about all of the +/- inf and NaN handling
16:37	<Philip`>	Unfortunately I don't have a centralised place for that
16:39	Philip`	wonders how to make http://canvex.lazyilluminati.com/tests/tests/* redirect to http://philip.html5.org/tests/canvas/suite/tests/*
16:41	<Philip`>	Oh, with "Redirect" - that was easy
17:17	<hsivonen>	hmm. curiously, the 10000 url script started getting 503 from Validator.nu at some point without Validator.nu crashing
17:18	<hsivonen>	I wonder if mod_jk has some kind of DoS prevention that kicked in
17:18	<hsivonen>	or Apache itself
17:56	<annevk>	wow, that image maps are still so widely deployed
18:01	<hsivonen>	hmm. perhaps there are even more than one application/xhtml+xml site in the Alexa globar 500
18:01	<hsivonen>	global
18:01	<annevk>	wow
18:07	<Philip`>	What is the " 1 www.icio.us"?
18:08	<hsivonen>	Philip`: probably a regexping error
18:08	<Philip`>	It would be nice to show the number of pages that contain each error, rather than the total count
18:08	<Philip`>	s/rather than/as well as/
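
The difference between the two counts Philip` asks for can be sketched as follows, assuming the results are available as a mapping from page URL to the list of error messages on that page (a made-up data shape, not necessarily what hsivonen's script produces).

```python
from collections import Counter

def count_errors(pages):
    """pages maps a page URL to the list of error messages found on it."""
    total = Counter()     # every occurrence of each message
    per_page = Counter()  # number of pages with at least one occurrence
    for messages in pages.values():
        total.update(messages)
        per_page.update(set(messages))  # a page counts once per message
    return total, per_page
```

A page that repeats the same error fifty times adds fifty to `total` but only one to `per_page`, which is why the per-page figure says more about how widespread an error is.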
18:11	<Philip`>	http://my.opera.com/community/forums/topic.dml?id=163885&t=1202062644&page=1#comment2212326 - yay, XML
18:20	<blooberry>	hsivonen: did you find more than one application/xhtml+xml site in the alexa global 500 then? The "perhaps" makes me curious. 8-}
18:21	<blooberry>	I only found one...
18:26	<hsivonen>	blooberry: I rechecked. there's only one.
18:26	<webben>	hsivonen: How are you requesting XHTML?
18:26	<blooberry>	iwiw.hu?
18:26	<hsivonen>	blooberry: yes
18:28	<hsivonen>	webben: Accept: application/xhtml+xml, application/xml; q=0.5, text/html; q=0.9
18:38	<hsivonen>	Philip`: each page counted at most once per error: http://hsivonen.iki.fi/test/moz/alexa500-page-collapsed-counts.txt
21:24	<gavin_>	hsivonen: www.iwiw.hu is sending text/html as far as I can tell...
21:24	<gavin_>	did it just change or something?
21:25	<gavin_>	I've tried a few different UAs, too
21:28	<hsivonen>	gavin_: Page Info says application/xhtml+xml in Minefield nightly
21:30	<gavin_>	ah, interesting
21:30	<Dashiva>	Same in Opera
21:31	<gavin_>	that is what I get for http://iwiw.hu/pages/user/login.jsp
21:32	<hsivonen>	www.iwiw.hu redirects me to http://www.iwiw.hu/pages/user/login.jsp which is application/xhtml+xml to Minefield
21:33	<gavin_>	yeah, I see that too
21:33	<gavin_>	but if I load the login.jsp URL in IE7, it works
21:33	<gavin_>	I thought IE7 barfed on application/xhtml+xml ?
21:33	<hsivonen>	most likely it varies the Content-Type on Accept
21:34	<gavin_>	ah, right
21:34	<webben>	gavin: curl -H 'accept: application/xhtml+xml,text/html;q=0' -v http://www.iwiw.hu/pages/user/login.jsp returns Content-Type: application/xhtml+xml;charset=UTF-8
21:34	<gavin_>	web-sniffer doesn't let me change Accept
21:35	<webben>	as does accept: application/xhtml+xml;q=0,text/html ... fail!
21:35	<webben>	curl -H 'accept: text/html' -v http://www.iwiw.hu/pages/user/login.jsp returns text/html
21:36	<webben>	someone tell them their content negotiation is borked ;)
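
For comparison, a simplified sketch of what a server honouring the Accept header might do with webben's requests. In HTTP a q value of 0 means "not acceptable", so a type sent with q=0 should never be chosen. The function names are hypothetical, and the sketch ignores media-type parameters and type ranges such as text/* (only */* is handled).

```python
def parse_accept(header):
    """Return {media_type: q} from an Accept header value (simplified)."""
    prefs = {}
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        media_type = parts[0].lower()
        if not media_type:
            continue
        q = 1.0  # q defaults to 1 when omitted
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat an unparsable q as not acceptable
        prefs[media_type] = q
    return prefs

def negotiate(header, offered):
    """Pick the offered type with the highest q; q=0 is never chosen."""
    prefs = parse_accept(header)
    best, best_q = None, 0.0
    for media_type in offered:
        q = prefs.get(media_type, prefs.get("*/*", 0.0))
        if q > best_q:
            best, best_q = media_type, q
    return best
```

With `offered = ["text/html", "application/xhtml+xml"]`, the header `application/xhtml+xml;q=0,text/html` selects text/html here, whereas the site in the log answered with application/xhtml+xml.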
21:50	<Philip`>	hsivonen: About error counts: Thanks
21:51	<Philip`>	Comparing to http://canvex.lazyilluminati.com/survey/2007-07-17/analyse.cgi/index#parse-errors there's a significant difference in the number with unencoded ampersands
21:51	<Philip`>	which I'd assume is due to top-500 sites being more likely to have dynamic pages with query strings needing ampersands, so that sounds quite plausible
22:16	<Hixie>	Lachy: if you have a chance, any way we can set up blog.whatwg.org/faq to be a redirect to the wiki faq?
22:22	<Lachy>	oh, damn. That had been done before, but with the last upgrade, I accidentally deleted the .htaccess.
22:22	<Lachy>	I'll check if I have a backup
22:23	<Lachy>	good, I do. Uploading it now
22:25	<Hixie>	thanks
22:25	<Hixie>	i fixed the link on the front page too
22:25	<Hixie>	(someone complained it was 404)
22:25	<Hixie>	so that shouldn't be a big issue any more
22:30	<Lachy>	all fixed
22:31	<Lachy>	ah, it looks like you removed the link entirely
22:32	<Hixie>	no i mean on www.whatwg.org
22:32	<Hixie>	made it point to the wiki
22:33	<Lachy>	oh, I thought you meant the blog's front page. The link to the FAQ seems to be missing from there
22:33	<Hixie>	odd
22:33	<Hixie>	didn't touch the blog
22:33	<Hixie>	update fallout?
22:34	<Lachy>	oh, maybe I never added the links again, after I moved the faq from the blog to the wiki
22:53	<zcorpan_>	hsivonen: in order to make the grouping feature more useful, perhaps the attribute value should be stripped from the message for errors about attribute values
22:56	<zcorpan_>	(or the grouping feature should just be smarter and do the same as you did with the Alexa result list)
22:59	<zcorpan_>	looking at the bottom of that list shows that unmatched quotes is relatively common
23:00	<zcorpan_>	the very last error also seems harmless
23:01	<zcorpan_>	banning ; in attribute names might also be effective
23:11	<zcorpan_>	hsivonen: though, the grouping feature is intended to help with search-replace fixing rather than fix-the-spec-studying, so what to group might be different
23:15	<zcorpan_>	0022 / 491 Bad value (consolidated) for attribute “width” on element “img”: Zero is not a positive integer.
23:15	<zcorpan_>	i think that's <noscript><img src=tracker.cgi with=0 height=0>
23:15	<zcorpan_>	s/with/width/
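
The consolidation zcorpan_ suggests, stripping the attribute value out of the message so that otherwise identical errors group together, could be sketched like this. The unconsolidated message shape (`Bad value “0” for attribute ...`) is an assumption inferred from the consolidated form quoted above, and `consolidate` and `group_messages` are hypothetical names.

```python
import re
from collections import Counter

# Assumed message shape: Bad value “0” for attribute “width” on element “img”: ...
_BAD_VALUE = re.compile(r"^Bad value “[^”]*” for attribute")

def consolidate(message):
    """Replace the quoted attribute value so similar errors group together."""
    return _BAD_VALUE.sub("Bad value (consolidated) for attribute", message)

def group_messages(messages):
    """Count messages after consolidation."""
    return Counter(consolidate(m) for m in messages)
```

Messages that do not start with the assumed prefix, such as the stray start tag error, pass through unchanged. A value that itself contains a closing curly quote would defeat this simple pattern.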
23:20	<zcorpan_>	we might also want to allow width/height on <input type=image>
23:24	<zcorpan_>	0013 / 491 Stray “script” start tag.
23:24	<zcorpan_>	when can a script start tag be stray?
23:26	zcorpan_	ponders about <noscript><style scoped>