#whatwg on 2008-02-26

00:51	<Hixie>	annevk: cool, thanks
01:40	Hixie	summons zcorpan
01:41	<Hixie>	anyone have an opinion on whether <ol> is appropriate for lists that happen to be ordered as opposed to lists where the order is significant?
01:41	<Hixie>	e.g. should a list of place names, ordered alphabetically, be <ol> or <ul>?
01:42	<tantek>	Hixie, example of each?
01:42	<tantek>	<ol> - since "alphabetically" could mean any number of a different canonical sorting orders given variations in language/culture/punctuation treatment.
01:43	<Hixie>	hm, interesting
01:43	<tantek>	no way to really automatically discern how such ordering was done if in a <ul> and thus if it has significance or not.
01:43	<Hixie>	you used the opposite argument to zcorpan in his feedback in 2006
01:43	<Hixie>	:-)
01:43	<tantek>	I've learned more about alphabetic sorting since then.
01:44	<Hixie>	no i mean he argued the opposite of you just now, in his mail from 2006
01:44	<Hixie>	he says:
01:44	<Hixie>	> I think <ol> is a list where the order is significant to the meaning;
01:44	<Hixie>	> where the order is emphasized. For lists that happen to be ordered but
01:44	<Hixie>	> the order isn't really of a big significance or isn't of higher
01:44	<Hixie>	> significance than the global order of the document, <ol> shouldn't be
01:44	<Hixie>	> used IMHO.
01:44	<Hixie>	> Otherwise people might use <ol> whenever a list happens to be in order, e.g.
01:44	<Hixie>	> an A-Z list...
01:44	<tantek>	One way to think about it is, if the author considers the ordering to be something they did deliberately (whether by hand in the markup), then the way to communicate "deliberate ordering" is by using an <ol>
01:45	<tantek>	(whether by hand in the markup, or by selection of some backend sorting option in a query)
01:45	<Hixie>	so <ul> elements must be in effectively random order?
01:45	<tantek>	not necessarily
01:45	<tantek>	randomness isn't necessary
01:46	<Hixie>	right, i mean effectively random, as opposed to necessarily truly random
01:46	<tantek>	If the author considers the order to be arbitrary (not necessarily random), or doesn't care about the order, then a <ul> makes sense.
01:46	<tantek>	<ol> = author cared about the order. <ul> = author didn't care about the order. That simple. I think.
01:46	<Hixie>	i suppose a list of place names that were arbitrarily ordered alphabetically as opposed to ordered alphabetically for any particular reason would mean <ul> then
01:48	<tantek>	again, that assumes that a universal canonical alphabetic ordering exists that everyone implicitly understands, which is a false assumption.
01:48	<Hixie>	how do you mean?
01:49	<tantek>	alphabetic ordering is not just by characters per unicode code points for example
01:50	<Hixie>	sure. i mean, the ordering could be anything, i was just using "alphabetical" as an arbitrary ordering
01:50	<tantek>	e.g. depending on the context of the data, say it were English names of works of art (like movies or songs)
01:51	<Hixie>	for example the list could be in order of the population size per place, but if that was just an arbitrary order selected by the author simply for presentational reasons, that would be a <ul>
01:51	<tantek>	often a leading word "The" is omitted from the canonical sorting
01:51	<tantek>	there are numerous other such rules. and that's just for English. names of works of art.
01:51	<tantek>	so there are language specific rules. domain specific rules. etc.
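
A minimal Python sketch of tantek's point that "alphabetical" is not a single canonical order: sorting by code point and sorting by a title-style rule that skips a leading "The" give different results. The sample titles and the `title_sort_key` helper are illustrative assumptions, not something taken from the discussion.

```python
def title_sort_key(title: str) -> str:
    """Hypothetical sort key: case-insensitive, ignoring a leading 'The '."""
    lowered = title.casefold()
    if lowered.startswith("the "):
        lowered = lowered[len("the "):]
    return lowered


# Illustrative sample data only.
titles = ["The Matrix", "Alien", "Zodiac", "the Birds"]

# Plain code-point order: uppercase letters sort before lowercase ones.
by_code_point = sorted(titles)

# Title-style order: "The" is skipped and case is ignored.
by_title_rule = sorted(titles, key=title_sort_key)

print(by_code_point)  # ['Alien', 'The Matrix', 'Zodiac', 'the Birds']
print(by_title_rule)  # ['Alien', 'the Birds', 'The Matrix', 'Zodiac']
```

Both results are "alphabetical" in some sense, which is why the markup alone cannot tell a reader which rule the author applied.
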
01:51	<Hixie>	whereas if the list was a list of places, ordered by population size to show the differences in population sizes, <ol> would be correct
01:52	<tantek>	I think for whatever reason, if the author intends an order, then it should be marked up with <ol>
01:52	<tantek>	One thought experiment you could do is, if the browser scrambled the set of <li>s inside a <ul>, how upset would the author be when viewing the page?
01:53	<Hixie>	right
01:53	<tantek>	so a list of places, ordered by name (or population), that would be bad
01:53	<tantek>	but a grocery shopping list, not a problem
01:53	<Hixie>	so if i'm just listing places i've been, and i happen to have put them in order of population size, i don't care if the browser reorders them to be by order of how much rainfall they get per year
01:54	<tantek>	right, if the author didn't take care with the order, then a <ul> - hence my reasoning above
01:54	<tantek>	"for whatever reason, if the author intends an order, then it should be marked up with <ol>"
01:54	<tantek>	otherwise use <ul>
01:54	<Hixie>	but if i have a grocery list in order of importance, or in order of which aisle the stuff is found in, then i'd want <ol>
01:54	<Hixie>	well
01:54	<Hixie>	i'm saying the author might "intend an order" but not care about it at the same time
01:54	<tantek>	right, but I have yet to see such a grocery list ;)
01:55	<tantek>	then intend an order wins out
01:55	<tantek>	the intent is enough to justify an <ol>
01:55	Hixie	often puts grocery lists in order of importance, so that he can make sure to get the important stuff first, since he is limited by volume when cycling from the shop :-)
01:55	<Hixie>	hmm
01:55	<tantek>	the intent doesn't have to be "big" or "significant" - because who decides that?
01:55	<Hixie>	so zcorpan is saying that intent doesn't win out, but that care does
01:56	<Hixie>	i wonder which is more useful
01:56	<tantek>	given that authors are the ones writing the markup, I find that going with respecting/modeling author intent usually produces better results. ;)
01:56	<tantek>	rather than some academic argument about how "significant" the ordering is
01:57	<Hixie>	well, the default rendering of <ul> is bullets, and i'd want bullets if i gave a list of places i'd been to, even if i ordered them (arbitrarily) by population size or lexically or by rainfall
01:57	<Hixie>	so that would argue against what you're saying :-)
01:57	tantek	puts shopping lists into clusters by type of store. two levels of unordered lists.
01:58	<tantek>	the English "i'd want bullets if i gave a list of places i'd been to, even if i ordered them" translates to the CSS: ol.places { list-style:disc }
01:58	<tantek>	because that "want" is presentational
01:58	<Hixie>	sure, but authors aren't going to think that way
01:58	<Hixie>	they'll just use <ul>
01:59	<Hixie>	and pretend that they didn't order the items
01:59	<tantek>	that's fine, then they are saying the order doesn't matter
01:59	<Hixie>	and that they just _happen_ to be in alphabetical order
01:59	<Hixie>	right -- the order doesn't matter, even though there is one intended
01:59	<Hixie>	they don't care about the order, even though they intended one
02:00	<tantek>	no. if they are pretending they didn't order the items, then they didn't order them.
02:00	<tantek>	we can't really assume otherwise.
02:00	<Hixie>	<ul> <li> Apple <li> Banana <li> Cherry </ul>
02:00	<Hixie>	i ordered that list.
02:00	<tantek>	in this case no care = no intent.
02:00	<Hixie>	that i'm using <ul> doesn't change the fact that i did indeed give an order
02:01	<Hixie>	now i might not care if the browser changes the order
02:01	<Hixie>	but i still intended one
02:02	<tantek>	so is the distinction then an intent of default presentational order vs. a semantic order?
02:02	<tantek>	because one could argue your ordering of that list was purely for presentational purposes. that is, you only intended it for presentational purposes, not for anything semantic.
02:03	<Hixie>	right
02:03	<Hixie>	i ordered it just to make it easier to scan
02:04	<tantek>	ok easy enough to modify my previous statement to take that into account:
02:04	<tantek>	"for whatever reason, if the author intends an order with some meaning behind it beyond just the presentation, then it should be marked up with <ol>"
02:12	<Hixie>	ok, i've tried to update the spec to explain this, along with some examples. http://www.whatwg.org/specs/web-apps/current-work/multipage/section-lists0.html
02:13	tantek	clicks and hopes his browser doesn't lock up.
02:14	<tantek>	BTW, really like the zero or more. About darn time.
02:14	<tantek>	do you need to say "change the meaning of the document"?
02:14	<tantek>	wouldn't just "change the meaning." be sufficient ?
02:14	<Hixie>	change the meaning of what?
02:15	<tantek>	per English implied object resolution rules
02:15	<tantek>	the previous clause
02:15	<tantek>	"changing the order"
02:15	<tantek>	also implied object
02:15	<tantek>	see previous clause
02:15	<tantek>	"where the items have been intentionally ordered"
02:15	<tantek>	also implied object
02:15	<tantek>	see previous clause
02:16	<tantek>	"a list of items"
02:16	<tantek>	resolution complete
02:16	<tantek>	no need to bring "the document" into it.
02:16	<Hixie>	hmmm
02:16	<Hixie>	i _would_ like to remove the document from that sentence
02:16	<tantek>	doesn't everyone read/write English like code?
02:17	<tantek>	;)
02:17	<othermaciej>	the way I think of it is <ol> --> numbered list, <ul> --> bulleted list
02:17	<othermaciej>	I would expect that is the typical author's operational understanding as well
02:17	tantek	smacks othermaciej for presentational-major thinking.
02:18	<Hixie>	othermaciej: i agree, but i think the current definition pretty much matches that
02:18	<tantek>	what do bullets sound like?
02:18	<Hixie>	about 100ms of silence
02:18	<othermaciej>	in normal English speech, nothing, they are just separators
02:18	<othermaciej>	to a screen reader, I dunno
02:18	<tantek>	othermaciej, that's why presentational-major thinking = FAIL.
02:19	<tantek>	Hixie, I suggested ditching the "... of the document." clause from both the <ul> and <ol> sections.
02:19	<Hixie>	"the meaning" on its own just doesn't sound right
02:19	<tantek>	Hixie, "sounding right" is irrelevant. It's a spec. The question is, does it parse?
02:19	<othermaciej>	tantek: I don't see how your argument addresses presentational thinking - it might be an argument against ever using bulleted lists, but I wouldn't buy that conclusion
02:19	<Hixie>	i'm leaving "of the document" until I (or someone else) can think of something better to replace it with
02:19	<othermaciej>	anyway, let me give a practical example
02:19	<tantek>	Hixie, empty space.
02:19	<Hixie>	tantek: actually i consider it sounding right to be of paramount importance to me :-)
02:19	<othermaciej>	sometimes, on the webkit blog, I post lists of new features or bug fixes
02:20	<othermaciej>	usually, I use <ul> for this since they are in no particular order, and mentally speaking, bullets are appropriate
02:20	<othermaciej>	on one occasion, I made a point of the fact that these were 10 features of particular interest
02:20	<othermaciej>	and in that case I used <ol>
02:20	<tantek>	Hixie, if you prefer to put a redundant object resolution in that place, that's ok too, and English depends on redundancy for error correction.
02:20	<tantek>	e.g. s/of the document/of the list of items
02:20	<othermaciej>	I think those were both correct choices but I'd be hard-pressed to argue there is a deep semantic difference
02:21	<tantek>	since that "list of items" is what the object resolution demonstrated above
02:21	<tantek>	that should address the "sounding right" concern.
02:21	<Hixie>	othermaciej: well, if you have any suggested improvement to the text of the spec as it was just changed, i'm certainly open to it
02:22	<Hixie>	tantek: i'm not convinced people would understand what it meant if i did that
02:22	<Hixie>	it's not just the list that would change meaning
02:22	<Hixie>	it's the list plus any content referring to that list
02:22	<Hixie>	e.g. see the examples
02:22	<Hixie>	especially for <ol>
02:22	<othermaciej>	to be honest I didn't read the spec, I just wanted to point out that something based on subtle distinctions of how important or essential an order is would not match the mental model of the typical author
02:23	<Hixie>	othermaciej: i'm not sure defining them as being "when you want bullets" or "when you want numbers" would work either, though
02:24	<othermaciej>	I think the distinction between <ol> and <ul> is not intrinsic to the data, it's about whether the author wants to specify and emphasize the order
02:24	<othermaciej>	depending on authorial intent, you could meaningfully do either for the same list in the same context
02:24	<Hixie>	othermaciej: in your example of numbering them to emphasise the number of items, you are not emphasising the order.
02:24	<othermaciej>	for example, "top 10" lists are pretty arbitrary in their order, many could just as easily be "10 things" lists
02:25	<Hixie>	indeed
02:25	<Hixie>	so that's not emphasising the order
02:25	<othermaciej>	fair enough
02:25	<othermaciej>	although if the spec says that in that situation you should use numbers in the text content or <ul> with CSS counter styling, I am not sure it will be followed
02:26	<othermaciej>	(also not sure that it would ever be an improvement to do that)
02:26	<Hixie>	i would say that in those contexts the current definition works better
02:26	<Hixie>	the current definition is that the order matters
02:26	<Hixie>	and if you refer to "number 5", clearly the order matters
02:26	<Hixie>	since changing the order would change what each one was numbered
02:26	<othermaciej>	that's true
02:26	<Hixie>	<ol> is now defined as "The ol element represents a list of items, where the items have been intentionally ordered, such that changing the order would change the meaning of the document."
02:26	<othermaciej>	sometimes the numbers aren't there for order, but to provide referents for external references
02:26	<Hixie>	right
02:27	<othermaciej>	(which might not even be in the same document)
02:27	<othermaciej>	that seems like a good enough definition
02:27	<Hixie>	the current definition just says they've been intentionally ordered, not that they are ordered according to any grand scheme
02:27	<Hixie>	i like it
02:27	<Hixie>	yeah
02:27	<Hixie>	it has withstood your attempt at disproving it :-P
02:28	<othermaciej>	I wasn't trying to disprove it
02:28	<othermaciej>	I hadn't even read it at the time I commented
02:28	<othermaciej>	I was just reading scrollback and thought it sounded a bit abstract
02:28	<Hixie>	i know, i'm just teasing :-)
02:30	<othermaciej>	if I were trying to disprove something, you'd know it :-)
02:30	<Hixie>	hah
03:25	Hixie	adds more examples and their hidden meanings to the spec
04:02	<Hixie>	hm
04:02	<Hixie>	<ol reversed> is an interesting proposal
04:02	<Hixie>	i wonder how the browser vendors would react to it
04:24	Hixie	bcc's 50-odd people
04:25	<Hixie>	ok that was a reply to 100+ e-mails
04:25	<Hixie>	yay
04:26	<Hixie>	what's next... maybe section/p/hr stuff... maybe em/strong stuff...
04:26	<Hixie>	hm
04:27	<tantek>	Hixie, rather than reversed, why not allow a list-increment?
04:27	<tantek>	default: +1
04:28	<tantek>	-1 to achieve reverse
04:29	<Hixie>	reversed does more than just change the step increment
04:29	<Hixie>	it also changes the start value
04:29	<Hixie>	we can add step later if there's a use case for it
04:29	<tantek>	sure but there already is a start attr
04:29	<Hixie>	right but reversed changes the default start value
04:29	<tantek>	the use case is lists that are numbered in large increments
04:29	<tantek>	see patent docs
04:29	<tantek>	which often use increments of 2 or 10
04:29	<Hixie>	they can use value=""
04:29	<tantek>	for paragraphs etc
04:30	<Hixie>	decrementing to 1 is a much more common case
04:30	<tantek>	they can't often use "value" because that requires too much fix-up
04:30	<tantek>	for the same reason we don't ask all numbered lists to use "value"
04:30	<tantek>	on every list item
04:30	<Hixie>	the whole point of large numbers in those cases is that you don't need fixup to add an additional paragraph
04:31	<Hixie>	hence why value="" is the most appropriate
04:31	<Hixie>	(if you really want to use a list in those cases)
04:32	<tantek>	people will just use hacks then to get auto-numbered by +10 lists
04:32	<tantek>	like <li>...</li> then 9 empty <li> etc.
04:32	<Hixie>	i haven't seen that happen
04:32	<tantek>	then CSS to only have every 10th item show up
04:32	<Hixie>	i have seen people use hacks for stepping down to -1, though
04:32	<Hixie>	(namely using scripts to do it)
04:33	<tantek>	yeah, i've seen them, but can't find any examples offhand
04:33	<Hixie>	see the e-mail i just sent whatwg for the research that was done to back up reversed="" -- it wasn't as extensive as with other cases like some of the research i've done, but it was eye-opening nonetheless
04:33	<tantek>	if i do i'll let you know
04:33	<Hixie>	cool, thanks
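
The difference argued above between `reversed` and a step attribute can be sketched in Python. This is a rough model under stated assumptions, not the spec's algorithm: `list_ordinals` and its `step` parameter are hypothetical (the log only floats "list-increment" as an idea), and the only behaviour taken from the discussion is that `reversed` changes the default start value as well as the direction.

```python
def list_ordinals(item_count, reversed_=False, start=None, step=None):
    """Return the number shown for each item of a hypothetical ordered list."""
    if step is None:
        # Default increment is +1; reversed counts down instead.
        step = -1 if reversed_ else 1
    if start is None:
        # reversed also changes the default start value, so the list
        # counts down to 1 without the author supplying the item count.
        start = item_count if reversed_ else 1
    return [start + i * step for i in range(item_count)]


print(list_ordinals(3, reversed_=True))      # [3, 2, 1]
print(list_ordinals(3, step=-1))             # [1, 0, -1]
print(list_ordinals(3, start=10, step=10))   # [10, 20, 30]
```

With a bare step of -1 the author would still have to work out the start value by hand, which is the fix-up that `reversed` avoids.
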
04:35	<Hixie>	lord, microsoft adcenter is trying to hire me
04:35	Hixie	informs them politely that he's happy working for the company that's whipping their asses already
04:37	<tantek>	Hixie, I do also remember running into the need to do reverse lists in the past as well, and having to resort to "value" manually to do it in my own markup.
04:45	<othermaciej>	Hixie: they probably got you from a list of people working for the company that's whipping their asses
04:52	<tantek>	question: is this a valid URL for this channel: irc://irc.freenode.net/whatwg ?
05:04	<takkaria>	tantek: I believe so
05:05	<tantek>	I wasn't able to use that "company that's whipping asses" to find an irc: URL validator.
05:14	<othermaciej>	tantek: there's two different expired drafts floating around
05:14	<othermaciej>	tantek: by the latest one, it should be irc://irc.freenode.net/#whatwg
05:15	<jruderman>	without a # works in more clients
05:15	<othermaciej>	or irc://irc.freenode.net/whatwg,ischannel
05:16	<othermaciej>	(not sure if anything implements either of those rules
05:16	<othermaciej>	)
07:27	<Hixie>	ok here goes
07:27	<Hixie>	63 elements on sections
07:27	<Hixie>	and related subjects
07:27	Hixie	begins replying
07:32	<othermaciej>	63 elements?
07:32	<othermaciej>	or do you mean 63 emails?
07:32	<othermaciej>	hello roc
07:33	<Hixie>	er yes
07:33	<Hixie>	e-mails
07:33	<roc>	hello
07:34	<jruderman>	roc!
07:43	<Hixie>	hey roc
08:28	<virtuelv>	hsivonen: yt?
08:29	<hsivonen>	virtuelv: yes
08:29	<virtuelv>	do you have any plans on releasing the source for your validator?
08:30	<hsivonen>	virtuelv: it's already Free Software with a publicly readable svn repo
08:30	<hsivonen>	virtuelv: I don't have immediate plans to cut version numbered release packages
08:30	<virtuelv>	hsivonen: Hm. I totally missed that
08:31	<hsivonen>	virtuelv: http://about.validator.nu/#src
08:31	<virtuelv>	hsivonen: thanks
08:32	<virtuelv>	Would be helpful if there was a ToC on the about page
08:36	<hsivonen>	virtuelv: ok. I'll add one in the next update of that page that I have in preparation
08:56	<Dashiva>	Dmitry wants to hold votes on his proposals, this could get messy
08:57	<Lachy>	Hixie, the top 10 movie list example contains movies that really shouldn't be in the top 10, and is missing others that should. :-)
08:57	<Lachy>	also, there's no example of using the start attribute given
08:57	<Hixie>	pah
08:57	<Hixie>	(re the movies)
08:57	<Hixie>	and yeah
08:57	<Hixie>	i need to add more examples
08:58	<Hixie>	i just added examples for the things people asked about
08:59	<Dashiva>	Hixie: Maybe you should take a pointer from FORTRAN and start labeling your step sequences, that way it's easy to say which sequence to abort :)
09:00	<Lachy>	you need an example that shows <ol start=100 reversed>, and explain why long reversed lists should ideally include a start attribute to help with incremental rendering
09:11	<Hixie>	Dashiva: maybe
09:11	<Hixie>	Lachy: if you really think so, send mail :-)
09:12	Hixie	is well into a big e-mail on sections by now
09:30	<Lachy>	Hixie, I will later
09:31	<Hixie>	k
09:34	<hendry>	what is the (code) name for the IE7 engine? e.g. Firefox -> Gecko
09:35	<hsivonen>	hendry: Trident
09:35	<hsivonen>	hendry: since IE 4.0
09:37	<hendry>	though aren't there more codenames as the engine in 6 is different to 7, if you know what i mean. i just re-read all that stuff that came up last month.
09:37	<hendry>	i should just re-read :)
09:37	<Hixie>	the code name for the IE engine is trident
09:37	<Hixie>	as far as i know it hasn't got version-specific names
09:38	<madmoose>	Maybe for IE8 they'll rename it "Tridents".
09:39	<Hixie>	hah
09:41	jgraham_	wonders if they'll fulfil nominative-determinism and stop shipping new engines once they reach three
09:42	<jgraham_>	s/new/more/
09:42	<Hixie>	hm?
09:42	<jgraham_>	Trident == Three pronged, no?
09:45	<Hixie>	oh, that name
09:45	<Hixie>	i thought you meant IE8
09:45	<Hixie>	and couldn't work out how 8 would mean 3
09:55	<annevk>	DOM travelsal is complicated :(
09:55	<Hixie>	yeah
09:55	<annevk>	traversal, even
09:55	<Hixie>	you wanna fix it?
09:55	<annevk>	heh
09:56	<Hixie>	it needs an editor
09:56	<Hixie>	with a big hammer and a bag full of nails
09:56	<Hixie>	and a lot of duct tape
09:56	<annevk>	i've got quite a list of specs already
09:57	<annevk>	can you tell if http://tc.labs.opera.com/dom/traversal/002.htm is valid or not?
09:58	<annevk>	opera had some minor failures, but i'm not sure if my minimized test is ok
09:58	<Hixie>	i dunno
09:58	<Hixie>	it took me days to do the acid3 test that does mutations in the handler
09:58	<Hixie>	and i'm still not sure it's correct
09:58	<annevk>	ouch
09:58	<annevk>	that's the test i'm talking about :)
09:58	<Hixie>	:-)
09:59	<Hixie>	someone should write up the traversal spec unambiguously
09:59	<Hixie>	then we'd know
10:01	<jgraham_>	Hixie: You should produce a list of specs that you _don't_ think need an editor, as it would be simpler to refer to that than a list of specs that do
10:01	<jgraham_>	;)
10:04	<annevk>	how did the moz call go?
10:04	<annevk>	cookies?
10:07	<Hixie>	jgraham_: no, i have a list of specs that need an editor
10:07	<Hixie>	jgraham_: it's relatively short
10:08	<Hixie>	jgraham_: (companion specifications on the whatwg wiki)
10:08	<jgraham_>	Hixie: I wasn't quite being serious :)
10:08	<Hixie>	:-)
10:12	Philip`	wondered why his web server had TCP connections with apache2-hixie.hixie.dreamhost.com, before finding that that's just what whatwg.org reverse-resolves into
10:14	<Philip`>	By the way, http://status.whatwg.org/annotate-web-apps.php doesn't look so good
10:17	Philip`	wonders if people who carefully make XHTML sites in PHP do something to disable/redirect warning/error messages like that, or if they just assume they're never going to occur
10:22	<zcorpan>	Philip`: what? php error messages are supposed to be xhtml compatible!! they changed <br> to <br />!1
10:35	<Lachy>	Hixie, yt?
10:36	<Lachy>	Hixie, in the first note in the dl element section, you're missing the word "using"
10:37	<Lachy>	oh, actually, you wrote "using accidentally" instead of "accidentally using"
10:56	<Hixie>	thanks will fix
11:06	<Lachy>	Hixie, why do you refer to <dl> as an association list where it's defined, but then refer to description list everywhere else?
11:11	<Hixie>	carelessness, probably
11:11	<Hixie>	aw man, henri is asking for a rewrite of the outline algorithm
11:11	<Hixie>	and there i was thinking i'd get through all this feedback without having to do any work
11:12	<Hixie>	i guess i'll reply to this tomorrow
11:12	<annevk>	that must be a subset of his feedback then
11:13	<annevk>	lots of parsing questions stuff prolly requires work
11:13	<annevk>	s/stuff//
11:14	<zcorpan>	annevk: it seems you read "this" as "his"
11:14	<annevk>	yeah
11:31	<zcorpan>	the first example on <figure> reminds me of <listing>
11:40	zcorpan	notes that the 3rd example has <p/><img/><p/>
11:41	<annevk>	that reminds me of a frontpage commercial
11:41	<annevk>	http://annevankesteren.nl/2004/02/microsoft-frontpage-and-valid-html
11:42	<annevk>	"</p>That's right. We said FrontPage.</p>"
12:12	<zcorpan>	Hixie: you got a 9 too much, so it would be 5000 people, not 500 :)
12:15	<Philip`>	<pre><code>
12:15	<Philip`>	interface PrimaryCore {
12:15	<Philip`>	seems sort of wrong, since it'll have a blank line at the top
12:15	<Philip`>	(since <pre> just deletes a directly following newline character token, which doesn't happen in this case)
12:16	<Philip`>	Also it says void sendData(in sequence<byte> data); which looks wrong
12:17	<zcorpan>	Philip`: what's wrong about it?
12:17	<Philip`>	Oh, whoops, nothing
12:17	<Philip`>	I forgot it was showing HTML code
12:20	<annevk>	the <aside> example that's also a self-reference contains a too early </pre>
13:20	Philip`	wonders if there's a web server anywhere which sends Content-Encoding: deflate, so he can test his code on it
13:30	<hsivonen>	Philip`: curiously, x-gzip (as opposed to gzip) happens
13:37	<hsivonen>	Philip`: when I implemented gzip support for V.nu, I figured that clients support gzip so well that supporting deflate and x-gzip is not worthwhile.
13:37	<hsivonen>	Philip`: it seems to me that deflate is mostly a dead letter in the spec
13:38	<hsivonen>	Philip`: so my guess is that your software doesn't need to support deflate
14:05	<Philip`>	Hmm, I suppose I might as well keep untested deflate support since it's only two lines of code and it'll let me see how many servers attempt to send deflate responses
14:06	<Philip`>	It looks like only Opera sends accept-encoding:x-gzip
14:07	<Philip`>	(and nothing sends 'compress')
14:08	<SadEagle>	konq does x-gzip as well. But no compress.
14:08	Philip`	adds x-gzip to see what happens
14:09	<Philip`>	SadEagle: Aha, okay
14:09	<Philip`>	Konqueror 3.5.something sends "x-gzip, x-deflate, gzip, deflate"
14:10	<Philip`>	but x-deflate seems a bit peculiar since it e.g. isn't mentioned in RFC2616
14:12	Philip`	wonders how many people handle accept-encoding:gzip,identity;q=0
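
A rough Python sketch of what handling `accept-encoding:gzip,identity;q=0` involves: split the header into codings with their q-values, then notice that `identity` (the unencoded response) has been explicitly refused. The function names are hypothetical, and the parser is deliberately minimal: it assumes well-formed q-values and ignores the `*` wildcard.

```python
def parse_accept_encoding(header):
    """Map each listed content-coding to its q-value (default 1.0)."""
    prefs = {}
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        coding, _, params = part.partition(";")
        q = 1.0
        for param in params.split(";"):
            name, _, value = param.strip().partition("=")
            if name.strip().lower() == "q":
                q = float(value)  # assumes a well-formed number
        prefs[coding.strip().lower()] = q
    return prefs


def identity_allowed(prefs):
    """identity is acceptable unless explicitly given q=0 (wildcards ignored)."""
    return prefs.get("identity", 1.0) != 0.0


prefs = parse_accept_encoding("gzip,identity;q=0")
print(prefs)                    # {'gzip': 1.0, 'identity': 0.0}
print(identity_allowed(prefs))  # False
```
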
14:26	<annevk>	If I have two boxes A and B. A is located at 0,0 and B at 10,10. I can now speak about the distance between the top edge of A and the top edge of B (being 10). Now if B is located at -10,-10 this distance should be -10. Is saying "The distance between the top edge of A and the top edge of B" enough or should I say "downward distance"?
14:26	<annevk>	suggestions?
14:28	<zcorpan>	if it can be negative then it's not a distance
14:28	<Philip`>	"The y coordinate of the top edge of A minus the y coordinate of the top edge of B"?
14:28	<annevk>	i guess that's best, yes
14:29	<Philip`>	Or skip the annoyingly verbose English and say A<sub>T</sub><sub>y</sub> - B<sub>T</sub><sub>y</sub>
14:30	<Philip`>	Actually, skip the annoyingly verbose HTML too and say $A_T_y - B_T_y$
14:30	<annevk>	heh
14:31	<zcorpan>	Hixie: it should be attribute boolean reversed, not attribute long reversed
14:32	<zcorpan>	Hixie: s/is present/is absent/ in the paragraph defining reversed=''
14:50	<Philip`>	Oh, it turns out there's at least one site which sends 'delate' content, and it makes my code die with "unknown compression method"
14:53	<zcorpan>	'delate'?
14:53	<Philip`>	Uh
14:54	<Philip`>	'deflate'
14:56	<Philip`>	Aha, it works when I add a second parameter and do new InflaterInputStream(in, new Inflater(true));
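
The Java fix above passes `true` to `Inflater`, which selects raw DEFLATE without the zlib wrapper; presumably the site in question was sending raw data. The same tolerance can be sketched in Python with the standard `zlib` module. The `inflate` helper is a hypothetical name, and trying the wrapped format first is just one possible order.

```python
import zlib


def inflate(body: bytes) -> bytes:
    """Decode a 'deflate' body that is either zlib-wrapped or raw DEFLATE."""
    try:
        # zlib-wrapped stream: 2-byte header plus Adler-32 trailer.
        return zlib.decompress(body)
    except zlib.error:
        # Raw DEFLATE stream: a negative wbits value means "no header".
        return zlib.decompress(body, -zlib.MAX_WBITS)


payload = b"hello world"

# Build a raw DEFLATE body, as some servers send for 'deflate'.
raw_compressor = zlib.compressobj(wbits=-zlib.MAX_WBITS)
raw_body = raw_compressor.compress(payload) + raw_compressor.flush()

print(inflate(raw_body) == payload)                # True
print(inflate(zlib.compress(payload)) == payload)  # True
```
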
15:04	<Philip`>	In 1024 pages, I see 2 deflate and 141 gzip
15:04	<Philip`>	(when sending accept-encoding:gzip,deflate)
15:17	<Philip`>	In 8192, I see 12 deflate and 1195 gzip
15:21	<Philip`>	In 16384, 16 and 2439
15:21	<Philip`>	deflate is not entirely negligible
15:26	<Philip`>	http://www.toua-u.ac.jp/ - "Video/X-Flv: .flv" - is that meant to do anything?
15:30	<Philip`>	http://www.superexpressgonzalez.com/ - P3P: policyref="http://www.tiendavirtual.ws/w3c/p3p.xml"; CP="NOI DSP COR NID PUB NOR" """"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
15:52	<Philip`>	<!doctype php public "-//w3c//dtd xphp 1.0 transitional//en" "http://www.w3.org/tr/xphp1/dtd/xphp1-transitional.dtd">
15:53	<Philip`>	<!doctype html public "guest house,jersey,haven,guest house,accommodation,jersey,channel islands,saint helier,british ">
15:53	<Philip`>	<!doctype html public "-//w3c//dtd xhtml 1.0 transitional//en" "http://www.w3.org/tr/xhtml1/dtd/xhtml1-transitional.dtd" "http://www.w3.org/tr/html4/loose.dtd">
15:53	<Philip`>	The world is a crazy place :-(
15:54	<annevk>	fortunately we have unambiguous rules to interpret it
15:55	<zcorpan>	i've seen "php" doctypes before
15:55	<Philip`>	http://www.thermaglaze.com/ - <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd" [ <!ATTLIST a target CDATA #IMPLIED> ]> - that's another way to work around the nonconformance of <a target>
15:55	<zcorpan>	people change all their files from .html to .php and do a global s/html/php/
15:56	<zcorpan>	(which suggests there are some <php> elements out there, too)
15:56	<Philip`>	http://www.pervasive.com/ - <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 5.0//EN" />
15:57	<Philip`>	<!doctype html public "//w3c//dtd html 3.2//en"> - hmm, bad copy-and-paste
15:57	<Philip`>	(from Word, presumably)
15:58	<zcorpan>	or search-replace
15:58	<Philip`>	Oh, I suppose that's possible
16:13	<Philip`>	Hmm, IE6 and HTML5 (or non-IE browsers) seem to differ in doctype moding for ~1% of pages
16:15	<Philip`>	The most significant difference is <!doctype html public "-//w3c//dtd html 4.0 transitional//en" "http://www.w3.org/tr/rec-html40/loose.dtd"> (on ~0.5% of pages)
16:17	<zcorpan>	Philip`: do you have pointers to the remaining ~0.5%?
16:23	<Philip`>	276 – <!doctype html public "-//w3c//dtd html 4.0 transitional//en" "http://www.w3.org/tr/rec-html40/loose.dtd">
16:24	<Philip`>	69 – <!doctype html public "-//"aol hometown//html 3.0 transitional//en">
16:24	<Philip`>	45 – <!doctype html public "-//w3c//dtd html 4.0 transitional//en" "http://www.w3.org/tr/html4/loose.dtd">
16:24	<Philip`>	34 – <!doctype html public "-//w3c//dtd xhtml 1.0 transitional//en" system "http://www.w3.org/tr/xhtml1/dtd/xhtml1-transitional.dtd"/>
16:24	<Philip`>	22 – <!doctype html public "-//w3c//dtd html 4.0 transitional//en" "http://www.w3.org/tr/html40/loose.dtd">
16:24	<Philip`>	19 – <!doctype html public "-//w3c//dtd html 4.0 frameset//en" "http://www.w3.org/tr/rec-html40/frameset.dtd">
16:24	<Philip`>	zcorpan: ^ Those are the counts (out of 62592 pages) of the top doctypes that are IE standards and HTML5 quirks
16:24	Philip`	will upload some page with all the details later
16:24	<zcorpan>	Philip`: thanks!
16:25	<Philip`>	(These are all pages from dmoz.org, so there's still an uncorrected bias towards certain sites/domains)
16:26	<Philip`>	(Also, by "HTML5" I mean HTML5 plus http://lists.w3.org/Archives/Public/public-html/2008Jan/0006.html )
16:34	Philip`	wonders how many pages he should download
16:39	<gsnedders>	how many implementations are there of RDF?
16:41	<Philip`>	I get quite a few "Corrupt GZIP trailer" errors
16:45	<Philip`>	<button onlick="..."> doesn't quite do what I want
17:03	<Philip`>	<!doctype html public "-//w3c//dtd xhtml 1.0 transitional//en" "http://www.w3.org/tr/xhtml1/dtd/xhtml1-transitional.dtd" />
17:04	<Philip`>	HTML5 say quirks, Firefox/Opera say limited-quirks
17:04	<Philip`>	*says
17:05	<Philip`>	(IE says limited-quirks too, and I can't be bothered to test Safari)
17:10	<zcorpan>	um... how come safari doesn't do quirks mode in the live dom viewer?
17:11	<Philip`>	How are you testing that?
17:11	<zcorpan>	<style>body{background:f00}</style>
17:13	<zcorpan>	safari seems to not care about garbage after the system identifier
17:14	<zcorpan>	so html5 is in disagreement with browsers about trailing /> in the doctype
17:14	<Philip`>	Safari seems to think "foo<!doctype html>" is non-quirks
17:14	<zcorpan>	mozilla too
17:15	<aroben>	zcorpan: Philip`: what version of Safari are you testing?
17:15	<zcorpan>	probably an old one
17:16	<zcorpan>	3.0.2 (522.13.1)
17:16	<aroben>	ok
17:16	<aroben>	that's good
17:16	<aroben>	WebKit trunk just recently changed to match HTML 5's doctype parsing much more closely
17:16	<aroben>	I'm glad you're testing the behavior of Safari 3
17:16	<Philip`>	I see 88 pages with <!doctype .../> out of 109434
17:16	<zcorpan>	aroben: would be interesting to know if your changes break anything
17:18	<aroben>	zcorpan: yes, it would ;-)
17:18	<aroben>	zcorpan: I'm sure we'll find out eventually
17:18	<zcorpan>	i would also like to know if pages that use something other than //en in the FPI are rendered correctly in firefox/safari
17:19	gsnedders	is confused. how is the billion laughs attack that many? if 0 is 2 chars, then is repeated twice, it becomes four, then that is repeated twice you get eight, etc. only doubling.
17:20	<zcorpan>	gsnedders: double it enough times and it will become a billion pretty quickly :)
17:20	<gsnedders>	but with only 30 it gets to only 120, I think
17:20	<gsnedders>	that can't be right
17:20	gsnedders	is being dumb
17:22	<gsnedders>	doubling starting at two goes up in powers of two.
17:22	<gsnedders>	duh.
17:23	<gsnedders>	which means laugh30 is $2^{30+1}$ characters
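
The arithmetic gsnedders arrives at can be checked with a minimal sketch. It assumes, as in his description, that laugh0 is 2 characters and each later entity repeats the previous one twice; the function name and parameters are hypothetical, and other repeat counts can be passed in to model other variants of the attack.

```python
def expanded_length(level: int, base_chars: int = 2, repeat: int = 2) -> int:
    """Character count of entity laugh<level> once fully expanded.

    Assumes laugh0 holds `base_chars` characters and every further level
    repeats the previous entity `repeat` times (both are assumptions
    taken from the description above).
    """
    return base_chars * repeat ** level


# Doubling from 2 characters over 30 levels gives 2 * 2**30 == 2**(30+1).
print(expanded_length(30))  # 2147483648
```

Growth is exponential in the number of levels, not linear, which is why 30 levels yield about two billion characters rather than 120.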
17:24	<Philip`>	I see 8 pages with <!doctype html> out of 109434
17:31	<annevk>	mine mine mine
17:34	<Philip`>	135 pages have Server "ZX_Spectrum/1997 (Sinclair_BASIC)"
17:35	<SadEagle>	coolness.
17:39	<Philip`>	Oh, they're all subdomains of narod.ru
17:39	<Philip`>	so there's only one web server running on a Spectrum :-(
17:58	<Philip`>	http://philip.html5.org/data/doctypes-2.html
17:58	<Philip`>	(The "Test" thing probably doesn't work in WebKit)
18:00	<gsnedders>	yeah, it just reported non-quirks for all
18:01	<gsnedders>	Philip`: <http://bugs.webkit.org/show_bug.cgi?id=15062>
18:14	<hsivonen>	hrm. Google docs rejects Firefox 3 and tells me to get Firefox 1.5.0.12 or newer
18:15	<svl>	Google does lots of bad sniffing for "Firefox" instead of looking at "Gecko" :(
18:17	<svl>	(And Firefox 3 at present is still "minefield" (I'm pretty certain, though didn't check))
18:19	<aroben>	Philip`: here's a test for what mode the page is in that works in WebKit: http://trac.webkit.org/projects/webkit/browser/trunk/LayoutTests/fast/doctypes/resources/TestDoctype.js?rev=30431
18:29	<gavin>	svl: nightly builds are "Minefield"
18:29	<gavin>	betas are "Firefox"
18:29	<hsivonen>	I tried with a beta, fwiw
18:31	<gsnedders>	svl: Firefox is a registered trademark of the Mozilla Foundation, and its use is massively limited
18:32	<gavin>	"massively limited"? limited to "Firefox", sure :)
18:37	<gsnedders>	gavin: limited to any official build
18:37	<zcorpan>	Philip`: i think the regular expression could have been <!doctype[^>]*>
18:39	<gsnedders>	Philip`: seems to have been merged into trunk by now
18:40	<zcorpan>	<!doctype html public "-//sq//dtd html 2.0 + all extensions//en" "hmpro3.dtd">
18:40	<zcorpan>	hmm
18:43	<gsnedders>	Philip`: only diff in WebKit from HTML 5 is <!doctype html public "-//w3c//dtd html 4.01 transitional//en" ""> is quirks
18:46	<zcorpan>	opera seems to mostly match html5 on commonly used doctypes, but mostly match ie on uncommonly used doctypes
18:47	<zcorpan>	what i want to know is if changing our impl will break pages with the uncommonly used doctypes :)
18:48	<zcorpan>	if they already are broken in safari 3 and firefox, then it might be a good idea to revise html5
18:49	<zcorpan>	(i.e. broken because they get the wrong mode)
19:28	<Philip`>	zcorpan: <!doctype[^>]*> probably would be entirely sensible, given that that's what HTML5 now expects
19:29	Philip`	will re-run it with that
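
A minimal sketch of how zcorpan's pattern could be applied to a downloaded page. The helper name `find_doctypes` is hypothetical, and the case-insensitive matching and lowercasing are assumptions chosen to mirror the lowercase doctypes quoted in the counts above, not a description of Philip`'s actual script.

```python
import re
from typing import List

# zcorpan's suggested pattern: everything from "<!doctype" to the first ">".
DOCTYPE_RE = re.compile(r"<!doctype[^>]*>", re.IGNORECASE)


def find_doctypes(html: str) -> List[str]:
    """Return every doctype-like token in `html`, lowercased for counting."""
    return [match.group(0).lower() for match in DOCTYPE_RE.finditer(html)]


print(find_doctypes('foo<!DOCTYPE html><p>text'))  # ['<!doctype html>']
```

Because `[^>]*` stops at the first `>`, a trailing `/` before the closing bracket is kept as part of the match, so `<!doctype .../>` cases remain distinguishable.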
19:31	Philip`	saw 39 application/xhtml+xml pages out of 100K (when sending the same Accept header as FF3)
19:38	<Philip`>	gsnedders: Do you mean the 'Test' button thing works correctly in WebKit now, so I don't need to change anything (like copying aroben's suggestion)?
19:38	<gsnedders>	Philip`: yeah
19:38	<Philip`>	gsnedders: Okay, thanks, that makes it easy for me :-)
19:39	<annevk>	http://tc.labs.opera.com/html/parsing/doctype/001.htm also tests DOCTYPE handling
19:39	<annevk>	fwiw
19:43	Philip`	wonders if he should make each doctype link to a list of all the pages which were found to use it
20:21	<Hixie>	Philip`: how is <!doctype html public "-//"aol hometown//html 3.0 transitional//en"> triggering HTML5 quirks?
20:24	<Philip`>	Hixie: After the second '"' it's in 'after DOCTYPE public identifier state', and the 'a' puts it into 'bogus DOCTYPE state', which ends up setting the correctness flag to 'incorrect', so it gets treated as quirks
20:28	<Hixie>	Philip`: oh, yeah, good point
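
The single transition Philip` describes can be illustrated with a heavily simplified sketch. It is not the HTML5 tokenizer: it only checks whether the first non-whitespace character after the public identifier's closing quote is something other than a quote or `>`, which is the case that leads to the bogus DOCTYPE state in his explanation. The function name is hypothetical.

```python
import re


def bogus_after_public_id(doctype: str) -> bool:
    """True if an unexpected character follows the public identifier.

    Simplified assumption: after the closing quote of the public
    identifier, only whitespace, another quote (the start of a system
    identifier) or '>' is acceptable.
    """
    start = re.match(r'<!doctype\s+\S+\s+public\s*(["\'])', doctype, re.IGNORECASE)
    if not start:
        return False  # no quoted public identifier to inspect
    quote = start.group(1)
    end = doctype.find(quote, start.end())
    if end == -1:
        return False  # identifier never closed; outside this sketch
    rest = doctype[end + 1:].lstrip(" \t\n\f\r")
    return bool(rest) and rest[0] not in "\"'>"


# The second '"' closes the identifier early, so the 'a' of "aol" is unexpected.
print(bogus_after_public_id(
    '<!doctype html public "-//"aol hometown//html 3.0 transitional//en">'
))  # True
```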
20:30	<Philip`>	Oh, I just realised why my downloader was failing to terminate for one particular site
20:30	<Philip`>	That site is an infinitely long radio stream
20:31	<Hixie>	hah
20:31	<Hixie>	yeah
20:31	<Hixie>	i've come across infinite pages before too
20:31	<Hixie>	that's how i know there's an infinite number of <font> elements on the web
20:32	<gsnedders>	:D
20:44	<Philip`>	Why has my /bin/grep become thousands of times slower than it should be?
20:47	<Philip`>	and only on my XML file of extracted doctypes, not on any other file I've tested?
20:48	<Philip`>	Oh, it goes thousands of times faster again if I set LANG=en_GB without the .UTF-8
20:50	<Philip`>	even though it's perfectly fast on a different computer with LANG=en_GB.UTF-8
20:51	<gsnedders>	computers are odd.
20:55	<Dashiva>	Probably has to do with matching stuff like \w
20:57	<Philip`>	I was only doing "grep processed foo.xml"
21:00	<Dashiva>	Maybe it does some conversion to an internal representation or some other premature optimization anyway?
21:01	<Philip`>	<!DOCTYPE "=""" PUBLIC="PUBLIC" HTML="html" HTTP://WWW.W3.ORG/TR/XHTML1/DTD/XHTML1-STRICT.DTD="http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd" 1.0="1.0" -//W3C//DTD="-//W3C//DTD" STRICT//EN="Strict//EN"
21:01	<Philip`>	="
21:01	<Philip`>	" XHTML="XHTML">
21:01	<Philip`>	Hmm, I can guess how that happened
21:03	<Dashiva>	haha
21:04	<Hixie>	wow, what a mess
21:04	<Dashiva>	Whitespace is our friend, right?
21:13	<Hixie>	hsivonen: re your request for rewriting the outlining algorithm
21:13	<Hixie>	you say
21:13	<Hixie>	> This sweep should probably define what checkpoint data to
21:13	<Hixie>	> store on each element that implements the HTMLHeaderElement interface to allow
21:13	<Hixie>	> local recomputation.
21:14	<Hixie>	can't i just say what i usually say, namely that at any time, the outline must match the results you would get from running the algorithm over the whole tree?
21:14	<Hixie>	i don't want to be the one defining how you are to do optimisations...
21:15	<hsivonen>	Hixie: does stating that actually make the said optimization reasonably implementable?
21:15	<Hixie>	depends what the algorithm is
21:16	<Philip`>	http://philip.html5.org/data/doctypes-2.html#%3C!doctype_html%3E - now with links to pages, at least for the obscurer doctypes
21:17	<hsivonen>	Hixie: I don't see how it would hurt to define the outline in terms of forward sweeps from a hn state
21:17	<Hixie>	oh it wouldn't, and that's just a subset of defining it as part of a forward sweep in general
21:17	<Hixie>	i'm just saying the spec shouldn't define any algorithm twice
21:17	<Hixie>	since the spec shouldn't be optimising
21:18	<Hixie>	or rather, it should be optimising for lack of ambiguity
21:18	<Hixie>	and adding two algorithms defeats that :-)
21:18	<gsnedders>	the current section confuses me
21:18	<Hixie>	hmm... i guess each sectioning element is effectively atomic in the algorithm...
21:18	<Hixie>	that is, anything outside a section can't affect what's inside it
21:18	<Hixie>	and what's inside a section can only affect the relative position of further nested sections...
21:19	<Hixie>	that's useful
21:19	<hsivonen>	Hixie: surely a forward sweep algorithm can be the only definition
21:19	<Hixie>	hsivonen: yes
21:19	<Hixie>	hsivonen: that's what i plan to do
21:19	<hsivonen>	Hixie: excellent. thanks
21:20	<Hixie>	but your e-mail asked for one or two other things too, which i'm not sure i should necessarily do
21:20	<Hixie>	actually i think what i quoted above is a non-issue, and your second requirement might already be met