#whatwg on 2008-04-05

00:10	MikeSmith	reads up on eurion constellation and sees that it apparently may have been invented by OMRON
00:15	<Hixie>	data-foo="" on any element; element.dataset[foo] for DOM access is now in the spec (on whatwg.org; not yet checked in)
00:17	<annevk>	the authoring requirement is part of the example
00:18	<annevk>	also, maybe add an explicit note that these are not intended for use by specifications or user agents, etc., only Web authors
00:21	<andersca>	Hixie: hey
00:21	<andersca>	Hixie: got another quick question about the ApplicationCache object
00:23	MikeSmith	reads jwalden posting
00:23	<annevk>	Hixie, also, if there are supposed to be links between the method definitions and the algorithms that hasn't worked out
00:24	<annevk>	Hixie, DOMStringMap has [[Put]] in the IDL but the prose defines [[Set]]
00:25	<Hixie>	annevk: yo
00:25	<Hixie>	er
00:25	<Hixie>	andersca: yo
00:25	<Hixie>	annevk: cool, thanks, going through and fixing now...
00:26	<andersca>	Hixie: about
00:26	<andersca>	void add(in DOMString uri);
00:26	<andersca>	void remove(in DOMString uri);
00:26	<andersca>	Hixie: can I specify relative uris?
00:28	<Hixie>	yeah, that should work
00:28	<andersca>	and they should be relative to the manifest uri?
00:28	<Hixie>	hm, dunno
00:28	<Hixie>	i'd guess they should be relative to the same thing that location.href="uri" would be relative to
00:28	<Hixie>	please do send mail asking for that to be clarified
00:28	<andersca>	will do!
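The open question in the exchange above is which base URI a relative argument to add() or remove() gets resolved against. A minimal Python sketch of the two candidates, using the standard library's urljoin; both URLs are made up for illustration, and nothing here says which base the spec ends up choosing.

```python
from urllib.parse import urljoin

# Hypothetical URLs, chosen only for illustration.
DOCUMENT_URI = "http://example.com/app/index.html"
MANIFEST_URI = "http://example.com/app/cache/app.manifest"


def resolve(relative_uri, base_uri):
    """Resolve a relative URI against the given base URI."""
    return urljoin(base_uri, relative_uri)


# Same relative URI, two different bases, two different cache entries.
against_document = resolve("images/logo.png", DOCUMENT_URI)
against_manifest = resolve("images/logo.png", MANIFEST_URI)
print(against_document)
print(against_manifest)
```

The two results differ whenever the manifest does not live in the same directory as the document, which is why the choice of base needs to be stated.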
00:28	<andersca>	ah, another minor thing
00:29	<andersca>	when there is a failure during the cache update process
00:29	<andersca>	I'm supposed to
00:30	<andersca>	"If this is a cache attempt, then discard cache and abort the update process, optionally alerting the user to the failure."
00:30	<andersca>	Hixie: shouldn't the cache be discarded regardless of whether it's a cache attempt or not?
00:31	<Hixie>	um
00:32	<annevk>	Hixie, also, dataset vs dataSet
00:33	<Hixie>	andersca: not sure
00:33	<Hixie>	andersca: in meeting right now, can't determine answer easily
00:33	<andersca>	Hixie: OK!
00:33	<Hixie>	andersca: please do send mail to the list, and assume what makes the most sense to you :-)
00:34	<andersca>	Hixie: will do
01:16	<Hixie>	hsivonen mentioned wanting the namespace parsing thing to be a flag instead of a separate state, but i have no idea what that would mean
01:16	<Hixie>	i suppose it would mean going back to the old phase vs insertion mode thing
01:17	<MikeSmith>	Hixie - what's the basic use case for DOMStringMap?
01:17	<Hixie>	implementing .dataSet
01:18	<Hixie>	hence why it was added in the same checkin :-)
01:22	<MikeSmith>	:) OK, let me refine that... I see "embedding custom non-visible data in an HTML document for scripting purposes", so I guess I mean to ask, what are some examples of types of custom non-visible data that somebody might want to embed?
01:23	<Philip`>	MikeSmith: http://lists.w3.org/Archives/Public/public-html/2008Mar/0156.html has one example
01:23	<Hixie>	MikeSmith: ah!
01:23	<Hixie>	MikeSmith: well say you're writing an app, like a game, say, and you want to put data into the document to represent state in a way that html doesn't support
01:24	<Hixie>	you need some sort of way to embed that data
01:24	<MikeSmith>	ah
01:24	<MikeSmith>	Hixie - thanks
01:24	<MikeSmith>	also Philip` thanks
01:25	MikeSmith	reads "I don't much care about conformance, but I need some way to attach arbitrary data to elements, and it shouldn't be harder than adding an attribute."
01:32	<annevk>	MikeSmith, see http://www.alistapart.com/articles/customdtd/ for a twisted idea of what people do to work around it now
01:32	<annevk>	MikeSmith, (which actually creates "problems" going forward, because we'd like to use required="" in HTML5)
01:32	MikeSmith	reads
01:34	<annevk>	(in practice people prolly don't bother with a custom DTD)
01:35	<MikeSmith>	annevk - yeah, I doubt even the author of that article would :)
01:36	<MikeSmith>	but I never read that PPK article before .. interesting
01:42	<Hixie>	MikeSmith: pretty much
01:44	<MikeSmith>	OK, I see. - so having a method to embed the data eliminates the need to use custom attributes or some other hack in the markup (and have validators choke on those)
01:44	<Hixie>	right
01:44	<Hixie>	also helps avoid future clashes
01:44	<Hixie>	and makes validators less noisy
01:46	<annevk>	type=search is both in WF2 and WF3-search
01:46	<annevk>	in /issues/
01:46	<Hixie>	yeah the webforms comments are a mess
01:46	<Hixie>	i need to resort them
01:52	<MikeSmith>	what browser implementation support is there for WF2 at this point?
01:53	<MikeSmith>	(I know about Opera support)
01:55	<annevk>	apart from input type=range in WebKit I don't think there is any
01:55	<Hixie>	there are small bits implemented here and there, especially the bits that we just took from existing browsers and specced
01:55	<Hixie>	but yeah, most of it isn't implemented
01:56	<eseidel>	we don't have any WF2 to my knowledge
01:56	<eseidel>	maybe type=range, but I don't remember seeing that.
01:57	<eseidel>	we = webkit
01:59	<annevk>	I think it was done together with type=search
02:00	<annevk>	eseidel, see http://weblogs.mozillazine.org/hyatt/archives/2004_07.html#005928 for instance
02:00	annevk	has no idea whether it's in the current code though
02:02	<eseidel>	ah, yes, range
02:02	<eseidel>	yeah, we do
02:02	<annevk>	anyways, good night
02:03	annevk	has a hard time switching time zones
02:05	<Hixie>	type=range was the first publicly demonstrated WF2 feature
02:05	<Hixie>	it was demonstrated at wwdc 2004
02:06	<Hixie>	by jobs himself
02:06	<Hixie>	a proud moment for the whatwg
02:07	<Philip`>	It was uselessly limited to integers when I last tried it
02:07	<Philip`>	(Well, it was only useless because I was ranging between 1.0 and 2.0)
06:58	<annevk>	Hixie, http://www.w3.org/html/wg/html5/#tag-name is wrong though, it should list 1-6 too
06:59	<annevk>	Hixie, it basically forbids writing <h1>
07:03	<Hixie>	yeah that's what i said :-)
07:04	<annevk>	oh right
07:05	<Hixie>	i don't understand how to do what hsivonen wants
07:06	<annevk>	what was that again? :)
07:09	<annevk>	oh, the insertion mode thingie
07:10	<annevk>	Hixie, well, you can suggest that you can implement the "in namespace" as x separate modes
07:10	<annevk>	one for each mode from where you can enter it
07:15	<Hixie>	it seems like an implementation detail
07:16	<Hixie>	when i finally reply to all this mail
07:17	<Hixie>	it's going to be the biggest e-mail ever in the history of mankind
07:17	<Hixie>	there are 619 e-mails in the folder
07:17	<annevk>	:)
07:36	<Hixie>	http://www.w3.org/2003/entities/2007/w3centities-f.ent, as a URI, embodies everything that is wrong with W3C URI naming policy
07:37	<Hixie>	618!
07:39	<annevk>	618?
07:39	<Hixie>	the entity table in HTML5 is going to be one gigantic table once we're done merging this in, jesus
07:39	<Hixie>	618 e-mails in the folder. i replied to one.
07:40	<annevk>	yeah, like another 2000 entries in that table...
07:40	<Hixie>	at least
07:40	<annevk>	i assume we're going to require ; everywhere?
07:40	<Hixie>	on the new ones, certainly
07:41	<Hixie>	gotta grandfather in the old ones though
07:43	<annevk>	hmm, &comma; etc. indeed don't make much sense
07:43	<annevk>	or &dollar;
07:46	<Hixie>	yeah i mentioned those in the mail
07:54	<hsivonen>	Hixie: "in namespace" leaks to the tokenizer and needs a secondary mode, so it might as well be boolean "in namespace" if ("in namespace") { ... } else { do the insertion mode thing }
07:54	<Lachy>	What the...&quest; That's just totally unnecessary&period;
07:57	<annevk>	it also has doubles, plusmn and PlusMinus
08:00	<Lachy>	annevk, blame mathml for that. HTML had plusmn, MathML added the other two http://www.w3.org/TR/2003/REC-MathML2-20031021/bycodes.html
08:01	<Lachy>	oh, MathML had all those unnecessary ASCII entities too
08:10	<Lachy>	What does "underlying, canonically related, SGML document type" mean? http://www.w3.org/mid/i74pahwdm4.fsf⊙hmae
08:12	<annevk>	I think the idea is that valid HTML documents are SGML-compatible
08:15	<Hixie>	hsivonen: the stack of open elements leaks to the tokeniser too
08:16	<Hixie>	hsivonen: i don't see how what you describe is any better in the spec than just another insertion mode with a flag
08:17	<hsivonen>	Hixie: I think a boolean will be more implementable
08:18	annevk	doesn't quite see how the boolean would work
08:19	<annevk>	you'd need to reset that boolean everywhere insertion modes are changing because of <b><math><mtext></b> and such
08:19	<annevk>	iirc
08:20	<annevk>	(but if it can be done as boolean and that's easier, that seems like an impl detail you could just do yourself)
08:21	<annevk>	I wouldn't expect impl code to match the spec closely as the spec is not written with perf, memory usage, etc. in mind necessarily
10:21	<annevk>	I just noticed that with HTML5 <p> regains its original semantic. That is "<body>This is the first paragraph. <p>This is the second and last paragraph.</body>" is now true again
10:36	<Hixie>	hsivonen: assuming there's no difference to the black box behaviour, that's irrelevant :-)
10:42	<hsivonen>	Hixie: being able to track the correspondence of lines of code to lines of spec is not irrelevant to maintainability
10:44	<Hixie>	i agree
10:45	<Hixie>	but as i am going to be implementing it as a separate state because it'll be far easier to do that than have two different top-level branches, i'm not convinced that your model is the more likely one to be implemented :-)
10:46	<hsivonen>	I'm pretty sure there will be an implementation of my model :-)
10:47	<Hixie>	i meant the one likely to be more implemented
10:47	<Hixie>	we used to have the top-level branch statement
10:47	<Hixie>	people asked that it be removed in favour of more inter-state jumps and more states
10:47	<Hixie>	so...
10:47	<Hixie>	adding the top level again seems bad
10:47	<hsivonen>	Hixie: that was different
10:48	<hsivonen>	Hixie: I didn't like the old way in that case, either
10:48	<hsivonen>	Hixie: here you have one special mode that in some cases falls back on the secondary mode
10:49	<hsivonen>	so the secondary mode might as well stay in its original field and the special thing be guarded by a boolean
10:49	<Hixie>	we had one special mode (trailing end) that in some cases fell back on a secondary mode (in body or in frameset)
10:49	<Hixie>	how is that different?
10:49	<hsivonen>	Hixie: less code
10:49	<annevk>	depends on the impl
10:49	<Hixie>	?
10:50	<Hixie>	what is less code than what?
10:50	<annevk>	I think the trailing end change worsened or will worsen our impl
10:50	<hsivonen>	Hixie: mainly: you can't change a switch argument temporarily while you are in a switch
10:50	<hsivonen>	Hixie: less code than juggling more than one mode enumeration around
10:51	<Hixie>	i don't follow
10:51	<Hixie>	just have each insertion mode in its own function, and have your insertion mode be a function pointer
10:51	<Hixie>	no switch
10:52	<Hixie>	no comparison costs
10:52	<Hixie>	and trivial jumping from insertion mode to insertion mode to "use the rules of" another mode
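The function-pointer approach Hixie describes above could look roughly like this in Python. It is a toy sketch, not spec text or anyone's actual parser: the class, the mode names, and the string tokens are all hypothetical, and the "current insertion mode" is just a reference to a method.

```python
class TreeBuilder:
    """Toy tree builder: each insertion mode is a method, and the current
    mode is a reference to one of them (all names are hypothetical)."""

    def __init__(self):
        self.mode = self.in_body  # plays the role of the function pointer
        self.log = []

    def process_token(self, token):
        # No switch and no comparisons: call whatever mode is current.
        self.mode(token)

    def in_body(self, token):
        if token == "<table>":
            self.log.append("in_body: open table")
            self.mode = self.in_table
        else:
            self.log.append("in_body: handled " + token)

    def in_table(self, token):
        if token == "</table>":
            self.log.append("in_table: close table")
            self.mode = self.in_body
        elif token.startswith("<t"):
            self.log.append("in_table: handled " + token)
        else:
            # "Use the rules of" another mode: call it directly,
            # without changing self.mode.
            self.in_body(token)


builder = TreeBuilder()
for tok in ["<p>", "<table>", "<tr>", "text", "</table>", "<p>"]:
    builder.process_token(tok)
print("\n".join(builder.log))
```

Whether this is faster or slower than a switch depends on the language and runtime, which is the point under dispute in the lines that follow.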
10:52	<hsivonen>	Hixie: without evidence, I claim that my impl is more HotSpot JIT-friendly
10:53	<hsivonen>	Hixie: the JIT cannot inline stuff where the object reference keeps changing
10:53	<Hixie>	possibly, but i'm certainly not basing the way the spec is structured on an argument that consists of the details of the cost of a function call in a particular java VM
10:54	<hsivonen>	also, without evidence, I claim that my impl would be more C compiler optimization-friendly
10:54	<hsivonen>	less runtime stack frames
10:54	<hsivonen>	s/less/fewer/
10:55	<hsivonen>	on a different topic: does the Google Code source browser have a directory tree depth limit?
10:55	<roc>	hsivonen: do you know what polymorphic inline caching is?
10:56	<hsivonen>	roc: I don't, but I can guess. The problem is that I don't know what the JIT optimizations are (hence, "without evidence"), so I'm erring on the side of simpler optimization being in place
10:57	<annevk>	You already have to keep track of the insertion mode for other stuff. Why is it an issue here?
10:57	<Hixie>	roc: hey sweet, the guy who wrote that paper works at google now
10:58	<roc>	Urs Holzle?
10:58	<Hixie>	annevk: he's managed to code around it
10:58	<Hixie>	roc: yeah
10:58	<Hixie>	roc: at least, i think so
10:58	<roc>	yeah well that's par for the course
10:58	<Hixie>	could be another Urs, of course :-)
10:58	<hsivonen>	annevk: the problem is that the "as if" in another mode stuff is more complex in this case, so just falling through in a switch won't work
10:59	<Hixie>	hsivonen: why don't you just have four states?
10:59	<Hixie>	hsivonen: one for each state that you can jump into the namespace state from
10:59	<Hixie>	(otherwise identical states)
10:59	<annevk>	yeah, i suggested something like that earlier
11:00	<hsivonen>	It seems to me that putting the namespace stuff on top and returning early when not doing "as if" is so much simpler
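For comparison, hsivonen's alternative of putting the namespace handling on top, guarded by a boolean, might be sketched as follows. This is only an illustration under assumptions: the state dictionary, the token strings, and the tiny HTML_ELEMENTS set (a stand-in for the spec's list of HTML elements) are invented here, and the real rules for leaving foreign content are more involved.

```python
# Stand-in for the spec's list of HTML elements that get HTML treatment
# even inside foreign content (hypothetical, heavily abbreviated).
HTML_ELEMENTS = {"<p>", "<b>"}


def process_token(state, token):
    """Handle foreign content first, guarded by a boolean, and return early;
    otherwise fall back to the ordinary insertion mode."""
    if state["in_namespace"]:
        if token == "</svg>":
            state["in_namespace"] = False
            return "foreign: leave svg"
        if token not in HTML_ELEMENTS:
            # Fully handled by the foreign-content rules: return early.
            return "foreign: " + token
        # Otherwise fall through and process "as if" in the secondary mode.
    if token == "<svg>":
        state["in_namespace"] = True
        return "in_body: enter svg"
    return state["insertion_mode"] + ": " + token


state = {"in_namespace": False, "insertion_mode": "in_body"}
results = [process_token(state, t) for t in ["<svg>", "<g>", "<p>", "</svg>", "<p>"]]
print(results)
```

The insertion mode field is never overwritten while in foreign content, which is the property hsivonen argues for; annevk's objection about having to reset the boolean wherever modes change is not addressed by this sketch.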
11:00	<annevk>	for the same reason i think i'd like trailing end back and leave splitting up as an impl detail
11:01	<hsivonen>	roc: btw, do you know of documentation of what optimizations HotSpot actually does?
11:01	<hsivonen>	finding information about that outside the source code seems hard
11:01	<hsivonen>	I guess they don't want people to expect anything in particular
11:01	<roc>	I don't know about documentation
11:01	<roc>	I know who to ask :-)
11:02	<roc>	you really don't want to be making assumptions about what optimizations it does
11:02	<hsivonen>	I can't help thinking about what it might do :-)
11:03	<annevk>	(aside, it's unlikely the media queries syntax will allow media="" to be valid because making @media { } work was not liked or something)
11:03	<hsivonen>	for example, earlier Maciej suggested a different structure for the tokenizer and which structure is more performant really depends on what HotSpot really does
11:04	<hsivonen>	that is different from how my code is structured now
11:04	<annevk>	hmm, let's not base the spec on that...
11:04	<annevk>	the proposal on the wiki fits a lot better in the way the parsing section is structured now
11:05	<roc>	if you insist on caring at this level then you also have to think about what the hardware does
11:05	<annevk>	restructuring the parsing section because HotSpot might optimize a direct implementation of that structure better seems like a bit of a stretch
11:05	<roc>	and if you really care then the most important thing is that you make the parser easy to parallelize
11:06	<hsivonen>	annevk: actually, I think making "in namespace" a flag would work in the spec nicely, too
11:07	<hsivonen>	roc: yeah, it's quite possible that certain "optimization" would make cache locality worse
11:08	<hsivonen>	Hixie: does Google Caja use the Validator.nu parser? there was the Google Groups post, but the source tree shows Google's own parser
11:08	<Hixie>	no idea
11:09	<Hixie>	what name do people prefer for elements from the mathml and svg namespaces, as opposed to html elements? "foreign elements", "alien elements", "namespaced elements", or something else?
11:10	<annevk>	foreign
11:11	<annevk>	namespaced is wrong (html is namespaced), alien is weird
11:11	<hsivonen>	ooh! I now see that Google Caja uses the Validator.nu parser but has its own tree builder subclass
11:11	<Hixie>	neat!
11:12	annevk	thought Google Caja was some scripting effort
11:12	<Hixie>	i love w3c's old namespace policy
11:12	<Hixie>	three namespaces are now allowed in text/html
11:12	<annevk>	four
11:12	<Hixie>	they each have a different quasi-random four digit number in them
11:13	<Hixie>	i guess four, yes
11:13	<annevk>	but then your thingie no longer fits :p
11:13	<Hixie>	it's ok
11:13	<Hixie>	the fourth one has a different convention for trailing slashes
11:13	<annevk>	no
11:13	<annevk>	w3.org is broken
11:14	<Hixie>	oh, no, nevermind, the w3c site just redirects it
11:14	<Hixie>	SIGH
11:14	<annevk>	maybe 5 even
11:15	<annevk>	if we touch xml:lang / xml:base
11:15	<Hixie>	valid point
11:15	<annevk>	namespaces are like rabbits
11:15	<Hixie>	i was hoping to drop support for xml:base
11:16	<Hixie>	given the problems it gives us with dynamic pages
11:16	<Hixie>	and just use <base>
11:16	<annevk>	as long as you don't require reloading images on the fly xml:base support should be fine
11:16	<annevk>	and has about the same impact as <base> iirc
11:17	<Hixie>	yeah i guess we have to define this for <base> too
11:18	<Hixie>	oh well
11:18	<annevk>	i think what's important to define is when URI resolution takes place
11:18	<annevk>	for every thingie that takes a relative URI
11:18	<annevk>	cover that and dynamic stuff is covered too
11:18	<Hixie>	yeah
11:19	<Hixie>	that'll be fun
11:19	<Hixie>	yay changing base uris.
11:20	<hsivonen>	if anyone know of Validator.nu parser usage in a project that isn't listed at http://about.validator.nu/htmlparser/ , please let me know
11:31	<Hixie>	i have now specced the _syntax_ of mathml+svg in text/html, if anyone cares
11:35	<annevk>	Hixie, there should probably be more information somewhere on stuff like <a xlink:href=...>
11:36	<annevk>	Hixie, maybe on how namespaces are not explicit in HTML syntax but instead are done through other means
11:36	<Hixie>	yeah, i expect an intro section somewhere will cover that
11:36	<hsivonen>	I find it a bit surprising that even though I offer support for SAX, DOM and XOM, both Abdera and Caja want something else internally
11:37	<annevk>	Hixie, it also doesn't explain for instance that the only foreign containers allowed are <math> and <svg>
11:37	<annevk>	which does impact syntax somewhat because otherwise they'd be "normal elements"
11:39	<Hixie>	annevk: how do you mean?
11:39	<hsivonen>	Hixie: should I see the speccing at current-work?
11:39	<Hixie>	hsivonen: yeah
11:40	<Hixie>	hsivonen: only in the syntax section so far
11:40	<Hixie>	not the parsing
11:40	<annevk>	Hixie, in <p> <g/> </p> the <g/> is an incorrect normal element though in <svg> <g/> </svg> it's a correct foreign element
11:41	<Hixie>	annevk: oh i see what you're saying
11:41	<Hixie>	hm, how to phrase that
11:41	Hixie	checks in the change that makes MathML and SVG legal in HTML
11:41	<annevk>	maybe define foreign element containers or something
11:42	<hsivonen>	Hixie: yeah, I had trouble searching because I didn't realize I needed to search for "foreign el" instead of "math"
11:43	<Hixie>	hsivonen: heh
11:43	<Hixie>	annevk: wait, i don't have to define this. It's already illegal.
11:43	<zcorpan_>	+ GREATER-THAN SIGN (<code title="">]]></code>). Finally, the comment must
11:43	<zcorpan_>	Hixie: s/comment/CDATA block/
11:43	<Hixie>	annevk: oops!
11:43	<Hixie>	er
11:43	<Hixie>	zcorpan_: oops!
11:43	<annevk>	Hixie, because of your new checkin?
11:44	<Hixie>	effectively, yes
11:44	<annevk>	zcorpan_, I suspect http://forums.whatwg.org/viewtopic.php?p=679 of being spam
11:45	<annevk>	Hixie, this stuff definitely needs a solid intro then that pulls it all together
11:46	<zcorpan_>	annevk: hmm yeah
11:46	<Hixie>	annevk: yeah
11:48	zcorpan_	gets a 500 from the forums admin panel
11:48	<annevk>	I get a 500 for http://forums.whatwg.org/login.php
11:49	<annevk>	oh, now it works
11:49	<annevk>	and now it doesn't :)
11:50	<annevk>	hmm, seems the Hixie server park is down again :p
11:50	<Hixie>	acid3 is being hit
11:50	<Hixie>	the server will stop allocating memory when acid3 is hit
11:51	<Hixie>	i hate that this links the tokeniser to the insertion mode and the stack of open elements
11:53	<annevk>	ah yes, I have a total of two pages that break when <![CDATA[ is enabled in text/html, neither seems particularly important, but I guess it's safer not to enable it
11:54	<annevk>	'"if a start tag is emitted with the self-closing flag set, and the token is processed by the tree construction stage without that flag being acknowledged, then there is a parse error"' seems painful btw to implement
11:54	<Hixie>	yeah
11:54	<Hixie>	better ideas welcome
11:55	<annevk>	have the same kind of check as with CDATA
11:55	<annevk>	only also include all HTML void elements
11:55	<Hixie>	i guess that's equivalent, but in practice i'd probably implement it the other way for performance
11:56	<annevk>	how would you check it? adding a check to each start tag thingie?
11:56	<Hixie>	just check the flag when you come out of the tree construction stage
11:57	<annevk>	oh, it'd be a global flag
11:57	<Hixie>	no, just a flag on the token
11:57	<Hixie>	the flag is set when you see a />, it's reset when you self-close
11:57	<Hixie>	check the flag when you are done calling ProcessToken() or whatever
11:58	<annevk>	isn't the self-close happening when you might have lost the original token?
11:58	<Hixie>	why would you lose the original token
11:58	<Hixie>	?
11:59	<Hixie>	token = getToken(); processToken(token); if (token.selfClosingFlag) { parseError(); }
11:59	<annevk>	i suppose that could work
12:00	<annevk>	oh well, never mind, if i can't find out how to make that approach work i'll just do what i suggested above :)
12:00	<Hixie>	:-)
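Hixie's one-liner above (get the token, process it, then check the flag) can be expanded into a small runnable Python sketch. The class and function names are hypothetical, the void-element set is deliberately partial, and foreign elements, which would also acknowledge the flag, are left out.

```python
VOID_ELEMENTS = {"br", "img", "hr", "input", "meta", "link"}  # partial list


class StartTag:
    def __init__(self, name, self_closing=False):
        self.name = name
        # Set by the tokeniser when it sees "/>".
        self.self_closing_flag = self_closing


def process_token(token):
    """Stand-in for the tree construction stage: acknowledge (reset) the
    flag only where self-closing syntax is meaningful."""
    if token.name in VOID_ELEMENTS:
        token.self_closing_flag = False


def parse(tokens):
    errors = []
    for token in tokens:
        process_token(token)
        # Flag still set after tree construction: nothing acknowledged it.
        if token.self_closing_flag:
            errors.append("unacknowledged self-closing flag on <%s/>" % token.name)
    return errors


errors = parse([StartTag("br", True), StartTag("p", True), StartTag("div")])
print(errors)
```

The flag lives on the token rather than in global state, so a single check after the tree construction call covers every code path that might have handled the start tag.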
12:00	<annevk>	html5lib python version is unfortunately too slow to matter in perf land anyway
12:06	<annevk>	I think the sections on MathML and SVG in the content section could use some more attention in due course as well. Indicating authors can use them for inline complex mathematics and inline graphics.
12:07	<annevk>	Similarly to <html xmlns=...> we should probably also indicate that <math xmlns=...> and <svg xmlns=... xmlns:xlink=...> are faith tokens that are not forbidden in the text/html world.
12:07	<Hixie>	with examples and stuff, yeah
12:07	<Hixie>	i think i'll just make those attributes end up in the right namespace
12:07	<Hixie>	and that'll handle that
12:07	<Hixie>	(though we might add some text about it too)
12:07	<Hixie>	(informatively)
12:08	<annevk>	that'd be 6 namespaces
12:08	<Hixie>	good times
12:08	<annevk>	rabbits, i tell you :)
12:08	<Hixie>	we're becoming a real grown up w3c language!
12:09	<annevk>	time for another splinter cell, the NNC (No Namespace Consortium)
12:09	<Hixie>	well that's really basically what we are
12:10	<Hixie>	ok moved the tree construction state up to before the tokeniser
12:12	<zcorpan_>	the xmlns attribute on <html> can't end up in the right namespace
12:13	<zcorpan_>	because people do html[xmlns] and expect it to work, iirc
12:13	<Hixie>	i meant just for the math and svg elements
12:13	<Hixie>	but good to know
12:13	<zcorpan_>	ah
12:13	<Hixie>	what's a better name than "in namespace" for the new insertion mode
12:13	<annevk>	in foreign content
12:13	<Hixie>	"in foreign lands"?
12:13	<annevk>	:)
12:14	<annevk>	i like that
12:14	zcorpan_	too :)
12:18	<hsivonen>	it would be interesting to benchmark the Validator.nu parser with Xerces DOM and Xerces in the DOM mode
12:19	<hsivonen>	to see if the XML is faster meme has substance
12:20	<Philip`>	Perhaps a more useful measure is whether an XML parser written in n man-hours is faster than an HTML parser written with the same amount of effort
12:20	<hsivonen>	Philip`: sure
12:20	<Philip`>	because otherwise you'd find that e.g. HTML parsers used in browsers are faster than their XML parsers, because the HTML parser is used a lot more and has been optimised more heavily
12:21	<hsivonen>	Philip`: but I'm pretty sure that Xerces has way more person hours
12:21	<Philip`>	and so it wouldn't be measuring the differences of the technology itself
12:21	<hsivonen>	Philip`: so if V.nu comes even close, it would mean that XML isn't a real perf win
12:21	<annevk>	(If you start comparing man hours it would also be useful to measure conformances at some point.)
12:22	<hsivonen>	(it seems to me that Xerces is zealously conforming unlike some other XML parser)
12:22	<annevk>	(To see whether it's easier to write a conforming HTML or XML parser.)
12:22	<hsivonen>	s
12:24	<Philip`>	Xerces seems to be about three billion lines of code and I haven't yet worked out what it's all for
12:27	<Hixie>	i hope nobody minds, but i'm making CDATA blocks emit text nodes, not CDATA blocks
12:27	<Philip`>	Hmm, maybe it's mostly for nothing - in a fairly typical .hpp file from Xerces-C, there's 160 lines to declare a class with three pure virtual methods
12:27	<Hixie>	in text/html
12:28	<annevk>	that should be done in text/xml too
12:28	<annevk>	so no
12:34	<hsivonen>	Philip`: part of my point is that XML parsers aren't necessarily small and neat :-)
12:35	<annevk>	yeah, full XML parsers are way more complicated than HTML parsers
12:35	<Hixie>	annevk: actually listing the conditions under which the start tag will be acknowledged is non-trivial
12:35	<Hixie>	er
12:35	<annevk>	it's amazing that propaganda went unclaimed for so long
12:35	<Hixie>	the start tag self-closing flag
12:35	<hsivonen>	Philip`: the reason why I use Ælfred2 even though it was less conforming to start with is that Xerces has layers and layers of abstraction which makes it hard to hack
12:39	<hsivonen>	Hixie: have you already implemented the namespace features in Sawzall?
12:39	<Hixie>	no
12:39	<Hixie>	my sawzall parser is very much behind the spec
12:39	<Hixie>	it still has phases!
12:40	<annevk>	Hixie, if "in foreign lands" and current_node in (<math:mi>, ..., <svg:title>) and tag_name in (html_void_list) or "in foreign lands" and current_node not in (<math:mi>, ..., <svg:title>) or not "in foreign lands" and tag_name in (html_void_list): pass else: fail()
12:40	<hsivonen>	ok. I guess we aren't going to see studies of the algorithm applied to billions of pages just yet
12:41	<Hixie>	annevk: you missed a case
12:41	<annevk>	:(
12:41	<Hixie>	annevk: <p/> in foreign lands is a parse error
12:41	<Hixie>	hsivonen: yeah, i need to get on that
12:41	<Hixie>	hsivonen: probably won't be for a few weeks though
12:42	<annevk>	oh right, the magic list of HTML elements
12:42	<annevk>	I forgot about that
12:42	<Hixie>	so did i
12:42	<Hixie>	until i tried to spec it as you requested
12:42	<Hixie>	and then it got reeeeeallllly complicated
12:42	<annevk>	heh
12:43	hsivonen	is pondering changing the tokenizer to one huge switch
12:43	<Hixie>	hsivonen: as opposed to what?
12:43	<hsivonen>	Hixie: method per state
12:44	<Hixie>	wait so you have a method per state in the tokeniser and you're worried about performance of method per state in the tree construction?!?!
12:44	<hsivonen>	Hixie: these methods are supposed to be inlineable
12:44	<Hixie>	fair enough
12:45	<hsivonen>	there's no recursion, for example
12:45	<hsivonen>	the way I've written it, in theory allows everything to be inline in one pile of jumps
12:45	<hsivonen>	in theory
12:46	<Hixie>	he
12:46	<Hixie>	h
12:46	<Hixie>	tokeniser changes _nearly_ done
12:47	<annevk>	"in foreign lands" and tag_name not in magic_list and tag_name not in html_void_list
12:47	<annevk>	and current_node not in (<math:... ...)
12:48	<annevk>	though maybe I should declare defeat instead of playing tricky boolean games
12:48	<Hixie>	you missed the normal html_void_list case :-)
12:52	jgraham	needs to catch up on the logs but assuming a pile of changes to the parser are about to land...
12:52	<Hixie>	some have already landed
12:52	<jgraham>	I suggest we clean up html5lib to match the spec pre changes and cut a release before adding any of the namespace stuff
12:53	<jgraham>	(or at least cut a release branch)
12:54	<jgraham>	Hixie: As long as no one has tried implementing them in html5lib yet it's all good :)
12:54	<Hixie>	:-)
12:56	<hsivonen>	I should cut a release too, but I have to get the Maven stuff up to date for that
12:56	<annevk>	jgraham, fine with me
12:56	annevk	is working on specs for some time
12:57	annevk	has plenty of choice there...
12:57	<jgraham>	Great.
12:57	jgraham	has to get the 33 currently failing testcases to pass
12:57	<hsivonen>	I guess I should write a perf test harness to get from guessing about perf to actually measuring it...
12:57	annevk	looks at hsivonen
12:58	<hsivonen>	annevk: I'm pretty sure that all my test check-ins were spec-based
12:58	<Hixie>	my perf test harness is pretty awesome, wish i could just hand you that
12:58	<annevk>	hsivonen, joking
12:58	<Hixie>	(it consists of timing one run through "parse the web")
12:58	<hsivonen>	aside: I don't like tests getting split to the html5 Google Code project
12:59	<Hixie>	yes when did that happen -- was that me?
12:59	<hsivonen>	Hixie: can you call into Java from Sawzall? :-)
12:59	<Hixie>	i agree that it's dumb
12:59	<annevk>	that already happened?
12:59	<jgraham>	I need to change our perf test harness to use cProfile rather than Hotshot
12:59	<hsivonen>	Hixie: I don't know how it happened
12:59	<annevk>	who "authorized" that?
12:59	<Hixie>	annevk: some tests are there, i fear it may have been me back when i checked them in in the first place
12:59	jgraham	thinks tbroyer started it
12:59	<Hixie>	i dunno
13:00	<annevk>	Hixie, no, you put them in html5lib
13:00	<Hixie>	hsivonen: i expect you could just use java, google is a big java shop
13:00	<Hixie>	(amongst other languages)
13:00	<Hixie>	annevk: ah ok good
13:00	<Hixie>	well then
13:00	<Hixie>	i say, nuke the ones in html5
13:00	<Hixie>	and put them back in html5lib
13:01	Philip`	doesn't see anything interestingly relevant in http://canvex.lazyilluminati.com/html5lib/log/trunk/testdata
13:07	<annevk>	ah, http://code.google.com/p/html5/source/browse/trunk/tests/tree-construction/
13:08	<Hixie>	as a side-effect of the way the changes are being made, i'm removing the parse error from the use of /> syntax on basefont, bgsound, spacer, wbr, and frame elements
13:08	gsnedders	wonders whether to complain about implied *LWS in RFC2616bis, as currently it is allowed in the middle of CRLF :P
13:08	<annevk>	I think he's just experimenting with things
13:08	<Hixie>	annevk: he asked for all tests to be put there
13:09	<jgraham>	I think he thinks that it's more project neutral or something
13:09	<jgraham>	I'm not sure there was a good technical reason
13:09	<jgraham>	And there's a very good licensing reason not to
13:09	<Hixie>	and the project owner of the "html5" project thinks it is a bad idea
13:09	<Philip`>	It's annoying that it's not possible to update the HTML5 tests without making html5lib look broken because it fails tests it hasn't been updated for yet
13:10	<Hixie>	it _is_ broken in those cases :-)
13:10	<Hixie>	it doesn't just look it :-)
13:10	<hsivonen>	Maven doesn't really live up to its promise when some of the packages are out of date...
13:11	<Philip`>	It wouldn't be broken if e.g. it was intentionally trying to implement pre-namespacey HTML5 while someone wanted to add post-namespacey tests
13:11	<jgraham>	There's a certain amount of tension between the TDD thing of using tests to find regressions and using tests to find spec problems
13:12	<annevk>	e-mailed implementors⊙wo with a note about the location of the tests
13:12	<zcorpan_>	i wonder if xmlns talismans should be allowed in more places if html fragments are allowed in svg or mathml
13:12	<Hixie>	Philip`: i was just teasing :-)
13:13	<Hixie>	zcorpan_: good question
13:13	<Hixie>	zcorpan_: if you want it, send mail or mention it on http://wiki.whatwg.org/wiki/New_Vocabularies_Solution
13:14	zcorpan_	sends mail
13:14	<annevk>	thinking about it, just putting xmlns in the right namespace might work, but what if the value is incorrect?
13:14	annevk	wonders how that works
13:15	<zcorpan_>	annevk: nothing bad happens
13:15	<annevk>	I know
13:15	<annevk>	but what about conformance checkers
13:15	<zcorpan_>	what about them?
13:15	<annevk>	they should maybe flag that
13:16	<zcorpan_>	yes
13:16	<annevk>	where in XML they don't have to
13:16	<zcorpan_>	you mean that the validation layer doesn't see the namespace declarations?
13:16	<annevk>	in theory it shouldn't have to see it
13:16	<Hixie>	i would only allow xmlns="" on elements that cross boundaries, and would flag them if they are wrongly set
13:17	<annevk>	if you solve it at the parsing level that would work, yes
13:17	<Hixie>	right
13:17	<zcorpan_>	perhaps it can be checked in the parser instead of in the validation layer
13:17	zcorpan_	is too slow
13:26	<hsivonen>	zcorpan_: fwiw, currently the V.nu validation layer doesn't see permitted xmlns. the parser checks for it and eats it.
13:26	<zcorpan_>	hsivonen: ok
13:30	<annevk>	jgraham, is http://canvex.lazyilluminati.com/html5lib/changeset?new=trunk%2Ftestdata%401127&old=trunk%2Ftestdata%401123 because I implemented table whitespace handling incorrectly?
13:32	<jgraham>	annevk: IIRC you just missed the necessary methods in the parser classes
13:32	<jgraham>	s/parser/phase/
13:32	<jgraham>	Lachy spotted it
13:32	<annevk>	oh :(
13:32	<Hixie>	hsivonen: does your validator catch duplicate IDs?
13:33	<Hixie>	hsivonen: i think it missed a duplicateid in the html5 spec, fwiw
13:35	<hsivonen>	Hixie: it should. If it didn't, it is a bug.
13:36	<hsivonen>	Hixie: it doesn't check IDREFs in MathML yet
13:36	<hsivonen>	nor hashed ID references in SVG
13:36	<Hixie>	check the html5 spec on the whatwg site at the moment
13:36	<Hixie>	id="parsing-main-inselect" is there twice, i believe
13:36	<Hixie>	and i didn't see an error
13:36	<Hixie>	though i could be mistaken
13:37	<Hixie>	(just fixed it when i noticed it)
13:37	<Hixie>	(but haven't regenned yet)
13:39	<hsivonen>	Hixie: thanks. I made a copy of the spec and will investigate
13:39	<Hixie>	np
13:40	<annevk>	hsivonen, window.onabort = function() { enableValidateButton() } or some such would be nice
13:41	<annevk>	(if that works, not sure)
13:41	<annevk>	btw, it seems duplicate ID checking is broken
13:41	<annevk>	for a simple document <body id=x><p id=x> it doesn't report errors
13:42	<hsivonen>	annevk: thanks. http://bugzilla.validator.nu/show_bug.cgi?id=150
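The duplicate-ID case annevk describes (<body id=x><p id=x>) can be reproduced outside the validator. The following is a minimal sketch using Python's standard html.parser; the names IdCollector and duplicate_ids are hypothetical and this is not how validator.nu implements the check.

```python
from collections import Counter
from html.parser import HTMLParser


class IdCollector(HTMLParser):
    """Counts every id attribute value seen on a start tag (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.ids = Counter()

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for bare attributes.
        for name, value in attrs:
            if name == "id" and value is not None:
                self.ids[value] += 1


def duplicate_ids(markup):
    """Return the sorted list of id values that occur more than once."""
    collector = IdCollector()
    collector.feed(markup)
    collector.close()
    return sorted(value for value, count in collector.ids.items() if count > 1)


if __name__ == "__main__":
    print(duplicate_ids("<body id=x><p id=x>"))
```

A conformance checker would report the location of each repeated ID as well; this sketch only lists the offending values.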
13:50	<Hixie>	ok it is done
13:50	<Hixie>	regenning now
13:51	<annevk>	i think i'll post about math and graphics too
13:51	<hsivonen>	I wonder why Maven can't download GPG-signed files over HTTP or something like that
13:51	<hsivonen>	letting them fetch stuff using SSH seems excessive
13:51	<annevk>	seems like the kind of thing people might want to shout at :)
13:52	<annevk>	Hixie, foreign lands was way better :(
13:53	<annevk>	it had this nice fairy tale ring to it
13:53	<Hixie>	i agree :-)
13:54	<Hixie>	i love the text interface to validator.nu
13:54	<Hixie>	i love being able to validate the spec in my script while the spec is being regenned
13:55	<hsivonen>	Hixie: do you have a spec for the line pragma enhancement and a reason why it belongs on the server and not in the client?
13:56	<hsivonen>	is it just adding a constant to line numbers?
13:56	<hsivonen>	(where the constant may be negative)
13:56	<Hixie>	subtracting, in my case
13:56	<Hixie>	right
13:56	<annevk>	Hixie, title needs to be title=""-ed
13:56	<Hixie>	i have two files
13:56	<Hixie>	header + main source
13:56	<annevk>	(the <code>title</code> element in the SVG ...
13:56	<Hixie>	i validate the concatenation
13:56	<Hixie>	but edit only the main source
13:56	<Hixie>	i'm open to other options
13:57	<hsivonen>	Hixie: I guess the other option would be doing the math in the client script
13:57	<Hixie>	but right now i have to first output |wc -l header|, and then go to the given line, and tell emacs to move by the given offset
13:57	<hsivonen>	Hixie: are you using my client or your own?
13:57	<Hixie>	i'm using wget
13:57	<zcorpan_>	"Elements that are from namespaces other than the HTML namespace and that convey content but not metadata, are embedded content for the purposes of the content models defined in this specification. (For example, MathML, or SVG.)"
13:57	<Hixie>	(actually, curl, but same idea)
13:57	<annevk>	also, the CDATA block state is not linked
13:58	<hsivonen>	Hixie: GET or POST, out of curiosity?
13:58	<annevk>	maybe that's because it's named CDATA state later on
13:58	<zcorpan_>	...seems to contradict the r1401 checkin
13:58	<Hixie>	hsivonen: GET with URI, i believe
13:59	<Hixie>	zcorpan_: valid point
14:00	<hsivonen>	Hixie: ok.
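The client-side option hsivonen mentions ("doing the math in the client script") amounts to subtracting the header's line count from each reported line number. A minimal sketch, assuming the header file ends with a newline and that the validated document is exactly header followed by main source; the function names are hypothetical.

```python
def count_lines(text):
    """Count newline characters, matching what `wc -l` reports."""
    return text.count("\n")


def to_main_source_line(reported_line, header_line_count):
    """Map a line number in header+main to a line number in the main source.

    Returns None when the reported line falls inside the header itself.
    """
    if reported_line <= header_line_count:
        return None
    return reported_line - header_line_count


if __name__ == "__main__":
    header = "<!DOCTYPE html>\n<title>spec</title>\n<body>\n"
    offset = count_lines(header)
    # A message reported at line 5 of the concatenation is line 2 of the main source.
    print(to_main_source_line(5, offset))
```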
14:02	<zcorpan_>	for some reason i thought the 1998 and Math were supposed to be the other way around
14:03	<Hixie>	i just fixed editorial problems that you've all listed
14:04	<Hixie>	bed time now
14:04	<hsivonen>	hmm. I wonder what the right filename and mime convention with .sig, .asc and .gpg is...
14:04	<MikeSmith>	Hixie - editorial nit: when you use "i.e.", it should always be followed by a comma
14:05	<zcorpan_>	Hixie: "A start tag, if the current node is a title element in the SVG namespace." links to the html <title> definition
14:05	<MikeSmith>	(i.e., not foreign)
14:05	<Philip`>	Hmm, Opera's MathML support doesn't seem to even cope with integrals
14:06	<Hixie>	zcorpan_: thought i fixed that
14:06	<Hixie>	MikeSmith: really?
14:06	<Hixie>	anyway
14:06	<Hixie>	bed time
14:06	<Hixie>	nn
14:06	<Philip`>	(Also its superscript integral pi looks like a box)
14:06	<zcorpan_>	nn
14:06	<Philip`>	(Uh)
14:06	<Philip`>	s/integral/italic/
14:06	<annevk>	hmm, the more difficult issues are still not solved :p
14:07	<MikeSmith>	Hixie - yeah, see any style guide (MLA or Chicago manual or whatever)
14:07	<annevk>	such as the magic HTML list and SVG fixups
14:07	<MikeSmith>	http://andromeda.rutgers.edu/~jlynch/Writing/e.html
14:07	<MikeSmith>	or http://www.colorado.edu/Publications/styleguide/abbrev.html#ieeg
14:08	<MikeSmith>	annevk - what's the magic HTML list?
14:08	<Philip`>	"Both "i.e." and "e.g." should have periods after each letter and be followed by a comma."
14:08	<Philip`>	That's only a SHOULD
14:08	<zcorpan_>	annevk: they don't seem very difficult to me, although the html list needs research
14:08	<MikeSmith>	Philip` - heh
14:09	<Philip`>	so it's fine to ignore that requirement when it make your text look silly
14:09	<Philip`>	*makes
14:09	<MikeSmith>	it should be a SHALT
14:09	<zcorpan_>	MikeSmith: tags that close <math> and <svg>
14:09	<annevk>	MikeSmith, a list of element names that lets you escape the foreign lands
14:09	<MikeSmith>	ah
14:09	<annevk>	with a parse error, but at least you're home safe
14:09	<zcorpan_>	:)
14:10	<MikeSmith>	I can see we're going to have some fun with "foreign elements" as a term
14:11	<annevk>	I suggested "in foreign content". I regret that now, as Hixie had "in foreign lands"
14:11	<zcorpan_>	we should bikeshed content vs lands on the list!
14:11	<Philip`>	Why is it foreign, when it's a proper part of HTML?
14:12	<Philip`>	We should set up a forum poll
14:12	<zcorpan_>	Philip`: go ahead :)
14:12	<zcorpan_>	http://forums.whatwg.org/
14:14	jgraham	suggests a whatwg green vs w3c blue poll
14:15	<jgraham>	especially if we can somehow turn it into a flamewar
14:15	<zcorpan_>	jgraham: go ahead :P
14:15	<Philip`>	It's obvious that the green would win, because it's so much better
14:15	<MikeSmith>	you guys should call yourselves the WAG WG
14:16	<MikeSmith>	wag in the 17th/18th-century sense of the word
14:17	<zcorpan_>	"A person who is fond of making jokes"?
14:18	<Philip`>	"A mischievous boy (often as a mother's term of endearment to a baby boy); in wider application, a youth, young man, a ‘fellow’, ‘chap’. Obs."?
14:19	<Philip`>	Actually that's mostly 16th century
14:20	MikeSmith	remembers Robert Burns poem "Epitaph for a Wag"
14:20	<Philip`>	("1573-80 TUSSER Husb. (1878) 177 For euerie trifle leaue ianting thy nag, but rather make lackey of Jack boie thy wag." - hmm, I lack sufficient backward compatibility to read that)
14:20	<MikeSmith>	about not stepping on a guy's grave because he might've been your father
14:21	<annevk>	ah, it was me _and_ Hixie (regarding moving tests)
14:21	<jgraham>	Philip`: But green is a major accessibility issue because some people are Red/Green colour blind so they might think the spec is red and dangerous. You're just a typical heartless WHATWGer ;)
14:23	<Philip`>	jgraham: Some people are blue-yellow colourblind so they would think the spec is a sunflower and could suffer terrible gardening accidents
14:24	<Dashiva>	I wonder if red/green colorblind people are more likely to say "green looks like red" or "red looks like green"
14:25	<jgraham>	Philip`: You should contact the wai people urgently.
14:29	<Philip`>	Usually I try to avoid the problems in practice by just making sure things look readable when printed in black-and-white
14:39	<annevk>	I think we can not continue this discussion until the PFWG has provided us with a formal reply.
14:39	<jgraham>	Hmm. Where does the spec define that doctype tokens cannot have a null public id or system id
14:39	<jgraham>	?
14:39	<annevk>	after the tokenizer it seems
14:39	<annevk>	when the doctype node is constructed, to be specific
14:40	<jgraham>	ah, OK.
14:40	<jgraham>	I was looking in the tokenizer
14:59	<MikeSmith>	"I think we need a way to coax authors into more semantic seriousness."
14:59	MikeSmith	is re-reading some parts of new-vocabs thread
14:59	<MikeSmith>	no clue what "semantic seriousness" is
15:00	MikeSmith	trying to get a read on where this guy's coming from
15:01	<MikeSmith>	OK, http://www.albany.edu/~hammond/ provides some clues
15:01	<MikeSmith>	"Generalized Extensible LaTeX-Like MarkUp (GELLMU) is the name of my project that originated with the aim of building a bridge from traditional LaTeX to the new world of XML languages. "
15:04	<MikeSmith>	"HTML is a rather low-powered member of the SGML family. The notion of “power” for a language under the umbrella of SGML has to do with the number of available translations to other document languages, both within and without SGML."
16:03	<zcorpan_>	hsivonen: " (except the langattribute maps to xml:lang)." s/lang/lang / in http://about.validator.nu/
19:10	<MikeSmith>	Hixie - about "Is there something that grows downwards that we could use as a metaphor here instead of 'stack'?" (r1344)
19:10	<MikeSmith>	sinkhole
19:11	<Philip`>	Stalactite?
19:17	<MikeSmith>	icicle
19:17	<Philip`>	Beard
19:17	<MikeSmith>	heh
19:44	<zcorpan_>	"Attributes may be separated from each other by one or more space characters." hmm, shouldn't that be a must?
19:52	<zcorpan_>	"If an attribute using the X attribute syntax is to be followed by another attribute, then there must be a space character separating the two." for all 4 values of X
19:53	<zcorpan_>	seems needlessly repetitive :)
19:53	<zcorpan_>	(although perhaps the spec should be reverted to not require spaces between attributes; i'm not sure it's helping or hurting)
20:19	<zcorpan_>	should "<![CDATA[" be case-insensitive?
20:19	<zcorpan_>	i mean, everything else in text/html is case-insensitive
20:23	<zcorpan_>	if cdata blocks are only supported in foreign lands, will authors use <math><mtext><![CDATA[ instead of <pre> or <xmp> when they don't want to escape their text with entities?
20:25	<zcorpan_>	how does foreign lands affect innerHTML?
20:38	zcorpan_	notes that cdata blocks suffer from the copy-paste cargo-cult problem
20:39	<zcorpan_>	blah blah <svg> ... <![CDATA[ ... </svg> blah
20:44	<zcorpan_>	why ban cdata blocks in svg <desc> etc?
22:41	<Hixie>	ew, yes, the CDATA blocks do suffer from cargo-cult issues
22:41	<Hixie>	crap
22:42	<Hixie>	oh i bet there are totally pages that foul up because of that
22:42	<Hixie>	crap crap crap
22:43	<hsivonen>	terminating CDATA on > would avoid that problem but would be very very wrong XML-wise
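The cargo-cult problem zcorpan_ sketches ("blah blah <svg> ... <![CDATA[ ... </svg> blah") and the terminate-on-> idea can be illustrated with a toy scanner. This is only a sketch under stated assumptions: split_cdata is a hypothetical helper, not the HTML5 tokenizer, and it assumes an unterminated block runs to the end of input.

```python
CDATA_OPEN = "<![CDATA["


def split_cdata(text, terminator="]]>"):
    """Return (before, cdata_text, after) for the first CDATA block in text.

    If the terminator never appears, the block is assumed to swallow
    everything up to the end of the input.
    """
    start = text.find(CDATA_OPEN)
    if start == -1:
        return text, "", ""
    body_start = start + len(CDATA_OPEN)
    end = text.find(terminator, body_start)
    if end == -1:
        return text[:start], text[body_start:], ""
    return text[:start], text[body_start:end], text[end + len(terminator):]


if __name__ == "__main__":
    sample = "blah blah <svg> ... <![CDATA[ ... </svg> blah"
    # XML-style terminator: the rest of the page is swallowed.
    print(split_cdata(sample))
    # Terminating on ">" limits the damage, but is not XML behaviour.
    print(split_cdata(sample, terminator=">"))
```

With the XML terminator, the unclosed block consumes the closing </svg> and the trailing text; with ">" as terminator, only the text up to the next tag's ">" is consumed, which is the trade-off hsivonen points out.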
22:43	<annevk>	http://www.microsoft.com/presspass/press/2008/apr08/04-05LetterPR.mspx hits for "share": 10, "employee": 1
22:43	<annevk>	just "shareholder": 7
22:43	<Hixie>	hsivonen: yeah...
22:44	<Hixie>	hsivonen: or even <
22:45	<Hixie>	i'll make a note to study this
22:45	<annevk>	thought: make svg:style and svg:script trigger CDATA and drop <![cdata[
22:46	<Hixie>	yeah that might work
22:46	<hsivonen>	problem: copypaste
22:46	<Hixie>	how so?
22:47	<Hixie>	you mean if people use cdata in <mtext> or something?
22:47	<hsivonen>	no, I mean XML-style scripts would parse differently
22:47	<annevk>	parsing <svg:script> as <html:script> within text/html would also be way more backwards compatible
22:48	<Hixie>	hsivonen: we could have a separate mode that also strips <![CDATA[ and ]]>
23:06	<annevk>	it also makes sense for internal consistency, imo
23:29	<Philip`>	How come nobody uses IE's datasrc/datafld thing?
23:32	<Philip`>	Oh, some people do use it
23:33	<Philip`>	but only, like, two people in 2^17
23:40	<Hixie>	yeah <t:video>
23:40	<Hixie>	people don't use many of IE's weird extensions, it's kind of funny
23:40	<Hixie>	is there a faster way to parse an xml file into a DOM in python than xml.dom.minidom.parse() ?
23:43	<hober>	I think lxml is faster
23:44	<Hixie>	k
23:44	<Hixie>	thx
23:47	<jgraham_>	lxml != DOM
23:47	<Philip`>	http://software.hixie.ch/utilities/js/live-dom-viewer/?%3Cbody%3E%0D%0A%3Cxml%20id%3Dx%3E%0D%0A%3Cr%3E%0D%0A%20%3Ci%3E%3Ca%3ECheese%3C%2Fa%3E%3Cb%3E250g%3C%2Fb%3E%3C%2Fi%3E%0D%0A%20%3Ci%3E%3Ca%3ERhino%3C%2Fa%3E%3Cb%3E3%20tbsp%3C%2Fb%3E%3C%2Fi%3E%0D%0A%3C%2Fr%3E%0D%0A%3C%2Fxml%3E%0D%0A%0D%0A%3Ctable%20datasrc%3D%23x%3E%0D%0A%20%20%3Ctr%3E%0D%0A%20%20%20%20%3Ctd%3E%3Cdiv%20datafld%3Da%3E%3C%2Fdiv%3E%0D%0A%20%20%20%20%3Ctd%3E%3Cdiv%20datafld%3Db%3E%3C%2Fd
23:47	<Philip`>	It's not even that hard to use or anything
23:48	<jgraham_>	(it depends if you want a real DOM or if some other representation will do. If you do want a real DOM minidom still isn't a good choice)
23:49	<Philip`>	(Oops, got cut off...)
23:50	<Philip`>	(but the bit that got cut off is obvious, so fix it yourself if you care)
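On the earlier question about alternatives to xml.dom.minidom.parse(): jgraham_'s distinction between a real DOM and "some other representation" can be seen in the standard library alone. The sketch below reads the same data (taken from the visible part of Philip`'s example URL) once through minidom's DOM API and once through xml.etree.ElementTree, which is a tree API but not a DOM. No speed claims are made here, and the function names are hypothetical; lxml, which hober mentions, is a third-party package and is not used.

```python
import xml.dom.minidom
import xml.etree.ElementTree as ET

SAMPLE = (
    "<r>"
    "<i><a>Cheese</a><b>250g</b></i>"
    "<i><a>Rhino</a><b>3 tbsp</b></i>"
    "</r>"
)


def names_with_minidom(source):
    """Read the text of every <a> element through the W3C-style DOM API."""
    document = xml.dom.minidom.parseString(source)
    return [node.firstChild.data for node in document.getElementsByTagName("a")]


def names_with_elementtree(source):
    """Read the same text through ElementTree, which is not a DOM."""
    root = ET.fromstring(source)
    return [element.text for element in root.iter("a")]


if __name__ == "__main__":
    print(names_with_minidom(SAMPLE))
    print(names_with_elementtree(SAMPLE))
```

Code that depends on DOM methods such as getElementsByTagName or firstChild would need rewriting to move to an ElementTree-style API.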