#whatwg on 2008-02-27

00:16	<Hixie>	ok
00:16	<Hixie>	<blockquote> is clearly not a sectioning element
00:18	<Hixie>	and outlines will not include headers in tables, figures, blockquotes, or datagrids.
01:27	<Hixie>	oh lord, my legacy lives on http://oxine.opera.com/documentation/dom-interface.html
01:42	<kingryan>	long live legacy!
01:43	<Philip`>	whether you want it to or not
02:24	<Hixie>	hsivonen: ok. so. tell me if the new algorithm is better.
02:24	<Hixie>	jgraham_: your input would be especially interesting too, since you implemented the last version :-)
02:46	<annevk>	Hixie, do you have a rough plan of what you're going to edit in the spec?
02:48	<Hixie>	how do you mean?
02:50	<annevk>	If you had some plan which sections of the specification you're going to work on for the next month or so. I guess I'm mostly interested in when the parsing section gets another update.
02:50	<Hixie>	no particular plan.
02:50	<Hixie>	i can start on parsing next if you want.
02:53	<annevk>	Cool. In particular I'm interested in seeing insertion modes being replaced with phases and updates to DOCTYPE sniffing.
02:53	<Hixie>	yeah
02:53	<Hixie>	that scares me
06:04	<takkaria>	Hixie: how do you twittr the svn commits with whatwg? I want to copy it for one of my projects
06:56	<Hixie>	takkaria: send me mail, i'll send you the script
07:05	<takkaria>	Hixie: ta
07:07	<takkaria>	sent
07:13	<Hixie>	ok, sections are updated. e-mail sent.
08:42	<Lachy>	Can anyone explain the logic behind the suggestion to use <dl> for marking up poetry? Even the wiki page with the example seems to have no real explanation
08:47	<virtuelv>	seems bizarre
08:48	<Lachy>	it seems that some people, when determining what kind of elements are "prose content" just look at those that say "prose content" in the category list, and fail to look at the definition of prose content
08:48	<virtuelv>	IMO, poetry is a bad match for examples
08:48	<Lachy>	the hierarchy of categories should be made clearer somehow
08:48	<virtuelv>	there is poetry where whitespace is extremely significant
08:48	<Lachy>	for that, we have <pre>
08:49	<virtuelv>	yes
08:49	<virtuelv>	my point is just that using a poem in examples, or suggesting anything particular for poems is bound to fail and cause misunderstandings
09:39	<Hixie>	prose content is a terrible name
09:39	<Hixie>	i really should find something better
09:41	<Hixie>	also "grouping content" and "text-level semantics" are terrible section names
09:59	<jgraham_>	Hixie: I'll try to look at the heading outline stuff in a bit of detail soon
09:59	jgraham_	is, in general, quite busy at the moment
09:59	<Hixie>	cool
10:00	<Hixie>	would be good to compare the new algorithm with the old one
10:00	<Hixie>	see if the same outlines come out
10:00	<Hixie>	they basically should
10:00	<Hixie>	though iirc there are some minor intentional differences, in edge cases with invalid markup in particular
10:01	<Hixie>	also <address> elements will get associated with anonymous sections now, not their DOM parent node <section> if that is a different section
10:01	<jgraham_>	Also, I'm glad you're still optimistic about FF3 having sane behaviour for unknown elements :)
10:02	<Hixie>	well they keep saying they'll try something
10:02	<Hixie>	they haven't given up yet
10:03	<Hixie>	we _really_ need new names for "prose content", "grouping content", and "text-level semantics"
10:05	<Hixie>	Maybe Flements, Bloments, and Inments.
10:05	<Hixie>	that would at least not confuse people into thinking they meant something else
10:07	<Hixie>	or Flodes, Grements, and Liliments
10:07	<annevk>	alpha beta and gamma
10:08	<Hixie>	the advantage of a made up word is that it fits well in the flow of text
10:08	<Hixie>	"<p> is a flode element"
10:08	<Hixie>	maybe "prose content" should be "flow content"
10:09	<Philip`>	You could have a <p> start flag
10:09	<Lachy>	flow content would work, since that's basically the equivalent in HTML4
10:10	<jgraham_>	I suggest not using flow content for the same reason :)
10:13	<Hixie>	ok flow content it is
10:14	<annevk>	did someone look at how XHTML2 solved this?
10:14	<jgraham_>	Does it exactly match the HTML 4 definition?
10:15	<annevk>	XHTML2 has modules
10:15	<annevk>	"XHTML Text Module"
10:15	<Hixie>	you're assuming xhtml2 solved it
10:16	<annevk>	s/solved/did/
10:16	<Hixie>	well, they don't have the same concepts we do
10:16	<Hixie>	they don't have flow < phrasing < embedded, e.g.
10:39	<hsivonen>	hmm. it would be great to have a script that'd file a v.nu bug every time the spec svn commit message is marked as conformance-checker-relevant
10:40	<annevk>	at some point the tracker should support Atom with categories or something which might make that easy
15:42	<zcorpan>	Hixie: making <td> a sectioning root will make the outline algorithm useless for real-world pages where tables are used for layout
15:43	<zcorpan>	(unless it can be determined which tables are used for layout and let those be ignored from the rule)
15:47	<zcorpan>	(however, i don't know when you'd have sections in <td> unless you were using a layout table, so perhaps <td> simply should not be a sectioning root)
16:03	<zcorpan>	Hixie: while you're renaming stuff, i think "text-level content" makes more sense than "phrasing content"
16:07	<zcorpan>	Hixie: html4 described inline as "text-level", and html4 has "phrase elements" (em, strong, dfn, etc)
16:11	<gsnedders>	Hixie: why do we have "current outlinee" with two "e"s?
16:24	<gsnedders>	Hixie: more nitpicking: steps one and two of creating an outline are wrong: they don't hold it yet
17:06	<annevk>	hmm, ES4 uses a namespace with a version number in it...
17:32	Philip`	sees that 15% of Alexa Top 500 pages are in HTML5 non-quirks mode, compared to 5% of dmoz.org pages
17:33	<Philip`>	(and 47% limited-quirks, vs 23%)
17:33	<gsnedders>	Philip`: and strict v. non-strict?
17:34	<Philip`>	(and 37% quirks, vs 72%)
17:34	<Philip`>	(and I hope those numbers add up right)
17:35	<Camaban>	I imagine dmoz has a lot more 'older' sites
17:36	<Philip`>	gsnedders: I'm not sure how to measure that exactly
17:39	<Philip`>	though if I simply count the top XHTML1.0-Strict and HTML4-Strict and XHTML1.1 doctypes, then that's 14% of Alexa and 4% of dmoz
18:02	<Philip`>	http://philip.html5.org/data/doctypes.html | http://philip.html5.org/data/doctypes-alexa.html
18:05	<Philip`>	http://philip.html5.org/data/doctypes-alexa.html#%3c%21doctype_html_public_%22-%2f%2fw3c%2f%2fdtd_html_4.01%2f%2fen%22_%22http%3a%2f%2fwww.w3.org%2ftr%2fhtml4%2fstrict.dtd%22_%2f%3e - HTML5 disagrees with Firefox and IE, which maybe isn't good
18:06	<Philip`>	(and Opera)
18:11	<zcorpan>	Philip`: thanks for providing links to the pages that have a given doctype
18:12	<Philip`>	zcorpan: I hope it doesn't matter that I skipped links for the more popular ones - they just made the page unreasonably huge
18:12	<Philip`>	(It's currently ~1.5MB, or 150KB with gzip, which is not entirely as small as one might wish)
18:13	<zcorpan>	Philip`: that's fine, i wanted to analyse the less common doctypes :)
18:13	<zcorpan>	since we're already more aligned with html5 than with ie on the commonly used doctypes
19:08	<Hixie>	zcorpan: yes, that was the entire point of making <td> a sectioning root. Also, I couldn't work out what order to make the headers go in, and I figured that in a _conforming_ case of a table with subsections (e.g. a character sheet where one of the cells is a character backstory), you wouldn't actually want them on the main outline.
19:09	<Hixie>	zcorpan: phrasing content can't be text-level content because phrasing content includes non-text things like <img> and <video>.
19:09	<Hixie>	gsnedders: "current outlinee" is correct (it's the thing being outlined)
19:26	<zcorpan>	Hixie: text-level means that it's on the same level as text, not that it is text
19:26	<Hixie>	yeah but i think that's confusing
19:27	<zcorpan>	i think phrasing is more confusing :)
19:27	<Hixie>	though possibly no more confusing than what we have now, indeed
19:36	<zcorpan>	Hixie: did you mean that the point of making <td> section root was to make the algorithm useless for real-world pages?
19:37	<Hixie>	well maybe saying it was the whole point was overstating the case a bit
19:37	<zcorpan>	ok
19:37	<Hixie>	on a compliant page that happens to use headers in tables, you'd not want those headers in the outline
19:37	<Hixie>	just like headers in a figure or in a blockquote, they are like a "subdocument"
19:38	<Hixie>	now, it does mean that outlines don't work in abusive pages, but i'm not shedding any tears over this
19:38	<zcorpan>	implementors that make outlines probably already know how to spot a layout table, so they could easily exclude those tables
19:40	<Hixie>	yeah
19:40	<Hixie>	maybe a quirks mode thing :-)
19:40	<zcorpan>	tables are used for layout in standards mode too in the wild
19:40	<Hixie>	yeah
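The rule Hixie describes above (headings inside a <td>, <figure>, or <blockquote> belong to a "subdocument" and stay out of the main outline) can be approximated in a few lines. This is a rough Python sketch, not the spec's outline algorithm: it only filters headings, it assumes every element is explicitly closed, and the names SECTIONING_ROOTS, OutlineHeadings and main_outline_headings are made up for illustration.

```python
from html.parser import HTMLParser

# Assumption: the sectioning roots named in the discussion above.
SECTIONING_ROOTS = {"blockquote", "figure", "td"}
HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class OutlineHeadings(HTMLParser):
    """Collect headings that are not nested inside a sectioning root."""

    def __init__(self):
        super().__init__()
        self.root_depth = 0   # how many sectioning roots are currently open
        self.current = None   # text chunks of the heading being read, if any
        self.headings = []    # (tag, text) pairs for the main outline

    def handle_starttag(self, tag, attrs):
        if tag in SECTIONING_ROOTS:
            self.root_depth += 1
        elif tag in HEADINGS and self.root_depth == 0:
            self.current = []

    def handle_endtag(self, tag):
        if tag in SECTIONING_ROOTS and self.root_depth > 0:
            self.root_depth -= 1
        elif tag in HEADINGS and self.current is not None:
            self.headings.append((tag, "".join(self.current).strip()))
            self.current = None

    def handle_data(self, data):
        if self.current is not None:
            self.current.append(data)


def main_outline_headings(html):
    """Return the headings that would appear on the main outline."""
    parser = OutlineHeadings()
    parser.feed(html)
    parser.close()
    return parser.headings
```

On a layout-table page nearly every heading sits inside a <td>, so a filter like this returns almost nothing, which is the concern zcorpan raises.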
19:50	<gsnedders>	Hixie: peh. that's confusing.
19:51	<Hixie>	significantly better ideas welcome :-)
19:54	<gsnedders>	"current thing being outlined" :P
20:01	<zcorpan>	Hixie: "authors are encouraged to place at most one top-level heading in each sectioning element" or some such... but perhaps that's not accurate enough or not understandable
20:06	<Hixie>	yeah
20:06	<Hixie>	but then again, that's what the current encouragement basically proposes
20:11	<zcorpan>	that's true
20:42	<gsnedders>	Hixie: what's a character not defined by unicode? from the document conformance section, it seems that not all non-characters are
20:42	<Hixie>	yeah, but some of them are defined to be permanently undefined
20:42	<Hixie>	whereas others are merely not yet defined
20:43	<gsnedders>	oh, wait.
20:43	<gsnedders>	I realise what I'm mistaking.
20:43	<Hixie>	i'm so sorry for anne. he's somehow ended up editing a spec with multiple phone calls per week trying to tell him what the spec should say
20:43	<Hixie>	sure am glad i dumped xmlhttprequest now though :-D
20:44	<gsnedders>	xxFFFE is the only code point that's always a non-character, as FDD0 to FDEF are only exactly that
20:44	<gsnedders>	that's what confused me
20:44	<Hixie>	eh?
20:44	<Hixie>	there are lots of non-character characters
20:44	<Hixie>	FFFF
20:44	<Hixie>	FFFE
20:44	<Hixie>	U+03FFFE
20:44	<Hixie>	etc
20:44	<Hixie>	see the list i just put in the spec, in fact :-)
20:45	<gsnedders>	I thought it was wrong, that's the only point :)
20:45	<Hixie>	ok :-)
20:45	<gsnedders>	but as I said, I'm being silly
20:45	<Hixie>	well i just copied it from xml 1.0
20:45	<Hixie>	so...
20:45	<gsnedders>	I think FFFF is legal
20:46	<Hixie>	it's not
20:46	<gsnedders>	"Noncharacters consist of the values U+nFFFE and
20:46	<gsnedders>	U+nFFFF (where n is from 0 to 10₁₆) and the values U+FDD0..U+FDEF.
20:46	<gsnedders>	"
20:46	<gsnedders>	OK, I'm wrong again (and so is my implementation, then)
20:46	gsnedders	realises his implementation is just too smart for himself
20:46	<gsnedders>	my implementation _is_ right.
20:47	<gsnedders>	`($codepoint & 0xFFFE) === 0xFFFE` does match FFFF
20:48	<Hixie>	or rather, it does match it
20:48	<Hixie>	but yes
20:48	<Hixie>	that's a fine check
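For reference, the check gsnedders quotes from his PHP implementation can be written out together with the U+FDD0..U+FDEF range from the Unicode text he pasted. This is a Python sketch; the function name is_noncharacter is an assumption for illustration and does not come from any implementation mentioned in the log.

```python
def is_noncharacter(codepoint):
    """Return True if codepoint is one of Unicode's 66 noncharacters."""
    if not 0 <= codepoint <= 0x10FFFF:
        raise ValueError("not a Unicode code point")
    # U+nFFFE and U+nFFFF on every plane: the low 16 bits are FFFE or FFFF,
    # so masking off the lowest bit leaves FFFE in both cases.
    if (codepoint & 0xFFFE) == 0xFFFE:
        return True
    # The contiguous block U+FDD0..U+FDEF.
    return 0xFDD0 <= codepoint <= 0xFDEF
```

Because the mask 0xFFFE ignores both the lowest bit and everything above bit 15, the single comparison covers FFFE and FFFF on all 17 planes, including U+FFFF and U+3FFFE from Hixie's list.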
20:48	<gsnedders>	If you aren't too tired to misread it :)
20:48	<gsnedders>	(that's from my PHP impl., FWIW)
20:49	<gsnedders>	Or rather, It only is if you aren't too tired to understand it
20:49	<gsnedders>	ergh.
20:49	<gsnedders>	I can't even do English :)
20:49	gsnedders	gives up, and shuts up
20:54	<Hixie>	knowing when to shut up is a good skill to have :-)
20:54	<gsnedders>	Hixie: I would rather the bit about when to throw a parse error to be clearer. Surrogates _are_ defined by Unicode (they just have no character assignments), for example
20:54	<Hixie>	there are no surrogate characters
20:54	<Hixie>	unless i'm misunderstanding you
20:54	<Hixie>	what is unclear about the spec?
20:55	<gsnedders>	ah. true. if you take it of the actual meaning of character, yeah.
20:55	gsnedders	was thinking of a character as being any code-point, but of course with non-characters that's dumb
20:55	gsnedders	shuts up, again
20:56	<Hixie>	U+.... is the character, not the value as it was in the original byte stream
20:56	<Hixie>	so if you are decoding as UTF-16, you can never end up seeing a U+.... character from the surrogate blocks
20:56	<gsnedders>	yeah, that's true
20:56	gsnedders	needs to wake up
20:56	<Hixie>	and if you do it as UTF-8, and you see one of those characters, it's not actually a surrogate character, it's a non-character
20:56	<gsnedders>	(or, alternatively, just go to bed)
20:56	<Hixie>	but anyway
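Hixie's point about surrogates can be seen with any strict decoder. The Python sketch below uses the standard codecs, whose behaviour is only one possible choice and not necessarily what the spec requires; the helper names are hypothetical.

```python
SURROGATES = range(0xD800, 0xE000)


def decode_utf16_be(data):
    """Decode big-endian UTF-16 bytes and return the resulting code points."""
    return [ord(ch) for ch in data.decode("utf-16-be")]


def utf8_decodes(data):
    """Report whether Python's strict UTF-8 decoder accepts the bytes."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# A UTF-16BE surrogate pair (D83D DE00) that encodes the single character U+1F600.
pair = bytes([0xD8, 0x3D, 0xDE, 0x00])

# The three bytes a naive UTF-8 encoder would produce for the code point U+D800.
encoded_surrogate = bytes([0xED, 0xA0, 0x80])
```

Decoding pair yields one code point outside the surrogate range, so no U+D800..U+DFFF value ever reaches the parser from UTF-16. The byte sequence in encoded_surrogate is rejected outright by Python's UTF-8 decoder rather than being passed through as a character.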
20:57	<Hixie>	where is the part of the spec that says not to put encoding declarations in the file?
20:57	Hixie	can't find it
20:57	gsnedders	didn't know the spec said that
20:57	<Hixie>	apparently people want me to remove it
20:57	<Hixie>	which i'm fine with
20:57	<Hixie>	but i can't find it...
20:58	<gsnedders>	latest revision to the spec could cause some documents to cause a heckuva lot of parse error, me thinks. but people manage to create a heckuva lot of parse errors anyway.
20:59	<Philip`>	Hixie: s/occurances/occurrences/ in a recent edit
20:59	<Hixie>	hate that word
20:59	<Hixie>	thanks
21:00	<kingryan>	Hixie: by "encoding declarations in the file", could people be referring to "<meta http-equiv="content-type" content="text/html; charset=utf-8" />
21:00	<kingryan>	" ?
21:00	<Hixie>	kingryan: yes
21:00	<gsnedders>	on the subject of English, how is the spec both en-gb-x-hixie and en-us-x-hixie at once?
21:00	<Hixie>	gsnedders: it's mostly -us-, i just haven't fixed the declarations yet
21:00	<Hixie>	they're in a different file
21:00	<kingryan>	are people confusing that being allowed with that being required?
21:01	<Hixie>	kingryan: no, i'm pretty sure i once wrote that people should use Content-Type headers instead
21:01	<Hixie>	but i can't find it anymore
21:02	<Hixie>	well, can't find it
21:02	<Hixie>	oh well
21:32	<annevk>	woha, multimousewheel is gone again?
21:32	<annevk>	hmm
21:32	<annevk>	i wonder if they thought everything through, such as mousewheel not firing for certain types of scrolling currently
21:33	<Hixie>	i doubt it
21:33	<Hixie>	who's editing that spec?
21:33	<Hixie>	do they have any browser, qa, and spec review experience?
21:35	<annevk>	dunno
21:35	<annevk>	well, the spec is edited by Andrew Emmons I believe
21:35	<Hixie>	don't know who that is
21:36	<annevk>	from BitFlash
21:36	<Hixie>	never heard of it
21:36	<annevk>	i don't know him either
21:36	<Hixie>	well, we'll see
21:36	<Hixie>	there really should be a way to train spec writers
21:36	<Hixie>	maybe i should write a book or something
21:39	<annevk>	"HOWTO BE A HIXIE"
21:45	<gsnedders>	1) Start a cabal that everyone hates.
21:45	<gsnedders>	2) ???
21:45	<gsnedders>	3) Profit!
21:46	<jgraham>	Spec writing for Dummies
21:47	gsnedders	realises in another virtual desktop he started a LaTeX file containing "\chapter{Evaluation}" hours ago.
21:48	gsnedders	has been doing well at procrastinating today
21:48	<gsnedders>	Philip`: I've just got an email from <mtanalin⊙yr> too
21:49	<jgraham>	gsnedders, Philip`: I get those
21:50	<jgraham>	Hixie: re the text "The outline for a sectioning content element or a sectioning root element consists of a list of one or more potentially nested sections. Each section can have zero or one heading associated with it. The algorithm for the outline also associates each node in the DOM tree with a particular section and potentially a heading."
21:51	<jgraham>	The "sections" referred to here need not be actual <section> elements or anything, I assume. This could be more clear
21:57	<Hixie>	jgraham: clarified.
21:57	<Hixie>	(should be regenned in about 20 seconds)
21:58	gsnedders	needs less coursework and more spec-gen clone
22:07	<annevk>	gsnedders, now you make me go looking
22:12	<Hixie>	does anyone actually use ISO-8859-11 ?
22:13	<annevk>	that's off-topic
22:14	<Hixie>	well i have feedback here saying i should make -11 turn into win874
22:14	<Hixie>	and i have no idea if it matters or not
22:14	<Hixie>	safari doesn't even support -11
22:14	<Hixie>	(it does support 874)
22:16	<annevk>	given that 874 is supported and that other browsers do "support" -11 and that mapping is cheap...
22:19	<dbaron>	Hrm. Hixie's subject lines don't work very well when you have limited horizontal space for subjects. He's always writing "Re: [whatwg] several messages about"
22:20	<annevk>	"Make using a Win1252-specific byte when the document declared as ISO-8859-1 be a parse error." is that really worth it?
22:20	<dbaron>	Hixie, and if you're not aware, you should check mozilla/intl/uconv/src/charsetalias.properties for Mozilla's behavior
22:20	annevk	thinks ISO-8859-1 should be an alias for the former
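The "Win1252-specific" bytes in the rule annevk quotes are presumably those in the range 0x80 to 0x9F, which ISO-8859-1 assigns to C1 control characters and windows-1252 mostly assigns to printable characters such as curly quotes. A hedged Python sketch of how a checker might flag them follows; the function name is hypothetical and the exact range is an assumption, not something stated in the log.

```python
def win1252_specific_offsets(data):
    """Return the offsets of bytes in 0x80..0x9F within a byte string.

    In a document declared as ISO-8859-1 these bytes would be C1 controls,
    which usually means the author really meant windows-1252.
    """
    return [i for i, byte in enumerate(data) if 0x80 <= byte <= 0x9F]
```

Each offset returned would correspond to one parse error under the proposed rule, which is why gsnedders expects some documents to produce a great many of them.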
22:21	<dbaron>	Mozilla seems to treat windows-874 and iso-8859-11 separately
22:22	<dbaron>	Hixie, er, actually, that's not the case
22:22	<dbaron>	but somebody decided to do that particular mapping in C++ instead
22:22	<dbaron>	so we treat iso-8859-11 as windows-874, per bug 127755
22:31	<Hixie>	dbaron: yeah
22:32	<Hixie>	annevk: adding more complications isn't cheap
22:32	<Hixie>	annevk: they add up
22:32	<annevk>	it's a simple hashtable entry
22:33	<annevk>	"iso-8859-1" : "windows-1252", "iso-8859-11" : "windows-874", ...
22:33	<Hixie>	no, it
22:33	<Hixie>	is far more than that
22:33	<Hixie>	not in code
22:33	<Hixie>	but in cost
22:34	<Hixie>	it further antagonises the tag, for instance
22:34	<Hixie>	it makes QA more complex
22:34	<Hixie>	it makes people who try to use ISO-8859-11 wonder why they're getting odd results
22:34	<Hixie>	etc
22:34	<Hixie>	it makes people say the spec is complicated
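The hashtable annevk has in mind would look roughly like this in Python. It is a minimal sketch of the code side only, which is the part Hixie agrees is cheap; the names ALIASES and resolve_label are made up, and the table holds just the two pairs mentioned in the log.

```python
# Declared label -> encoding actually used for decoding.
# Only the two pairs mentioned in the discussion are listed.
ALIASES = {
    "iso-8859-1": "windows-1252",
    "iso-8859-11": "windows-874",
}


def resolve_label(label):
    """Normalise a declared charset label and apply the alias table."""
    key = label.strip().lower()
    return ALIASES.get(key, key)
```

Labels that are not in the table pass through unchanged apart from trimming and lowercasing.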
22:35	<annevk>	ideally this would not be solved at the HTML5 level though
22:35	<annevk>	IANA already has synonyms for charsets
22:35	<annevk>	they could simply make iso-8859-1 a synonym and likewise for its friends
22:36	<Hixie>	well, if you can convince them of that, let me know
22:37	<annevk>	I don't think I'm old enough to deal with what seems to be a political mayhem
22:38	<annevk>	I guess at some point it might be worth trying to fix it...
22:38	<Hixie>	neither am i :-)
22:39	<Hixie>	gsnedders: so do i still need to deal with this doctype feedback or is anne's comment enough?
22:39	<annevk>	http://lists.w3.org/Archives/Public/public-forms/2008Feb/att-0080/2008-02-27.html#topic4
22:41	Philip`	sees no iso-8859-1 in his 125K pages, and 61 windows-874s
22:41	<annevk>	iso-8859-1 or iso-8859-11 ?
22:42	<Philip`>	Uh
22:42	<Philip`>	-11
22:42	<Hixie>	HAHAHAHAHAHAHAHA
22:42	<Hixie>	anne: boy i hope that was a minuting error
22:43	<Philip`>	(except as a substring in http://www.btfonsterteknik.com/ )
22:43	<Hixie>	Philip`: cool, thanks
22:43	<Hixie>	i guess i should do a bigger scan
22:43	<Hixie>	and see what that tells us
22:43	<SadEagle>	Philip`: if it's not any effort to compute, what about 8859-5?
22:44	<annevk>	Hixie, yeah... :)
22:44	<annevk>	Philip`, and while you're at it, how about some stats? :)
22:44	<annevk>	(on charsets in general)
22:45	<Philip`>	SadEagle: That wouldn't be any effort, though it'll take a short while while I send grep through 3GB of HTML again :-)
22:45	<Hixie>	Philip`: you should set up a hadoop cluster for yourself :-)
22:46	<Philip`>	annevk: I suppose I could just collect all the HTTP content-type and meta content-types and summarise that
22:47	<Philip`>	Hixie: I only have about one machine so it wouldn't be a very good cluster, and it'd be fast enough by itself if it wasn't part of someone else's tiny cluster that's running an automated theorem prover several million times and using up all the CPU time :-)
22:47	<annevk>	nice nice
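A summary of the kind Philip` describes could be produced along these lines. This is a Python sketch and not his actual tooling: it assumes the Content-Type values (from HTTP headers or <meta> elements) have already been extracted as strings, and charset_of and tally_charsets are hypothetical names.

```python
import re
from collections import Counter

# Matches charset=VALUE, with optional quotes around VALUE.
CHARSET_RE = re.compile(r"charset\s*=\s*[\"']?([^\s;\"'>]+)", re.IGNORECASE)


def charset_of(content_type):
    """Extract the lowercased charset parameter, or None if there is none."""
    match = CHARSET_RE.search(content_type)
    return match.group(1).lower() if match else None


def tally_charsets(content_types):
    """Count how often each declared charset appears."""
    return Counter(charset_of(value) or "(none)" for value in content_types)
```

Calling tally_charsets(values).most_common() would then list the declared encodings from most to least frequent.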
22:50	<gsnedders>	Hixie: having it clearer would always be nice
22:50	<Hixie>	gsnedders: please reply to anne saying what you would suggest to make it clearer then :-) thanks :-)
22:51	<Philip`>	SadEagle: I see about 7 that claim to be iso-8859-5
22:51	<Hixie>	Philip`: hah
22:52	<Hixie>	i guess i should actually implement these character encoding algorithms and use those to work out the encodings, huh
22:52	<annevk>	to be honest, my initial tokenizer implementation did not make the distinction either, but that was quickly rectified when i did the parsing stuff
22:52	<Hixie>	i'll do that after lunch
22:52	<Hixie>	back later
22:52	<annevk>	i still don't like them being non-deterministic
22:52	annevk	-> the wire