#whatwg on 2007-10-12

00:30	<Hixie_>	mjs, others, any input on this? http://lists.whatwg.org/pipermail/whatwg-whatwg.org/2007-October/012683.html
00:30	<Hixie_>	should i include such an API?
00:32	<othermaciej>	Hixie_: hadn't thought about it deeply - I can see how it might be useful, the only concern I have is that preflighting for the availability of a resource in the cache instead of looking for an error on access introduces possible races
00:32	<othermaciej>	(in other words the cache item could expire between when you ask and when you try to access it)
00:32	<Hixie_>	indeed
00:33	<Hixie_>	and vice versa, could be added in between
00:33	<othermaciej>	true
00:33	<othermaciej>	or you could go offline/online in between, thus changing the rules of what resources can be served even expired
00:33	<othermaciej>	but I'm not sure how to achieve the desired result without preflighting
00:34	<Hixie_>	yeah
00:35	<kingryan>	could the API take a function that is executed iff the resource is available? then the impl could synchronize on the cache
00:35	<Hixie_>	maybe
00:35	kingryan	doesn't know how hard his suggestion would be to implement
00:37	<othermaciej>	the hard part to implement would be ensuring whatever load you may trigger uses that cached copy
00:37	<othermaciej>	it could be setting a frame's window.location, it could be setting an img src, it could be an XMLHttpRequest...
00:37	<othermaciej>	there's lots of ways to trigger a resource load
00:37	<othermaciej>	in a sense the most robust API would be to add a "try if local, otherwise error" version of all of those
00:38	<othermaciej>	or at least the ones that are considered important for this purpose
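
The race discussed above (a resource expiring between the availability check and the access) and kingryan's callback idea can be sketched roughly as follows. This is a toy in-memory model, not any browser's cache; `ResourceCache`, `is_available`, and `with_resource` are hypothetical names made up for the illustration.

```python
import threading


class ResourceCache:
    """Toy in-memory cache standing in for an offline resource cache (hypothetical)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._entries = {}

    def put(self, url, body):
        with self._lock:
            self._entries[url] = body

    def expire(self, url):
        with self._lock:
            self._entries.pop(url, None)

    def is_available(self, url):
        # Preflight style: the answer may already be stale when the caller
        # acts on it, because the entry can expire (or appear) afterwards.
        with self._lock:
            return url in self._entries

    def with_resource(self, url, callback):
        # Callback style: lookup and use happen under one lock, so the
        # entry cannot expire between the check and the access.
        with self._lock:
            if url not in self._entries:
                return False
            callback(self._entries[url])
            return True
```

With the preflight style, a caller can see `is_available(url)` return `True` and still fail on the next line if `expire(url)` ran in between. The callback style closes that window, but only for work done inside the callback; as othermaciej notes, loads triggered by other means (an img src, a frame location, an XMLHttpRequest) would still need to be tied to the cached copy. The lock here is not reentrant, so the callback must not call back into the cache.
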
08:14	<Lachy>	good morning :-)
08:29	<hsivonen>	Lachy: morning
09:34	<hsivonen>	I wonder what the point of the NCNameness of xml:id is or the point of the normalization
09:36	<othermaciej>	architecture
09:36	<othermaciej>	xml architecture
09:39	<hsivonen>	off-hand, I don't see what architectural problem the NCNameness solves here
09:40	<hsivonen>	it's just an arbitrary "we like this kind of strings but not other kinds of strings"
09:40	<othermaciej>	my implication is that everything about xml boils down to that sort of thing
09:41	<hsivonen>	yeah
09:42	<hsivonen>	so far, xml:id is taking me more than thrice the # of lines of code needed for XHTML5 ids.
09:46	<hsivonen>	for now, I'm going to pretend that the conditional IDness of SVG 1.2 id doesn't exist and SVG ids are unconditionally IDs.
09:47	<othermaciej>	conditional idness?
09:48	<othermaciej>	are id attributes not ID when an xml:id is also present or something?
09:48	<hsivonen>	othermaciej: according to SVG 1.2, IIRC, yes
09:48	<hsivonen>	not cool
09:48	<othermaciej>	they probably did that to work around the fact that you can't have id="x" xml:id="x"
09:49	<othermaciej>	which you would need to both work with older versions of SVG and to drink the latest kool-aid
09:51	<hsivonen>	othermaciej: it is easy to guess why they did it, but it is the wrong fix
09:51	<hsivonen>	othermaciej: the right fix is to use only id='x' without xml:id='x'
09:51	<hsivonen>	anyway, last bullet point under http://www.w3.org/TR/SVGMobile12/struct.html#Core.attrib
09:51	<othermaciej>	hsivonen: but that would entail not using xml:id, which is the cool new thing, and thus obviously right to use
09:51	<hsivonen>	so my memory didn't fail me
09:55	<zcorpan>	hsivonen: would you be shot down for not supporting xml:id at all? :)
10:05	<hsivonen>	zcorpan: I don't know
10:05	<hsivonen>	I feel like venting on www-svg, but it would probably be wasted effort
10:43	<hsivonen>	I figured that now that the idea of href='' instead of xlink:href='' is no longer outrageous, I might as well try suggesting not working around the problems xml:id creates
10:46	<OmegaJunior>	Where would href i.o. xlink:href be outrageous? Not in html5, I suppose. In xhtml5 perhaps?
10:48	<hsivonen>	OmegaJunior: SVG
10:48	<OmegaJunior>	Ah
10:57	<hsivonen>	Hixie_: is there a good reason why ID assignment shouldn't happen if the ID candidate is the empty string? why the exception?
11:22	<zcorpan>	http://software.hixie.ch/utilities/js/live-dom-viewer/?%3Cp%20id%3D%22%22%3E%3Cscript%3Ew(document.getElementById(%22%22))%3C%2Fscript%3E
11:22	<zcorpan>	null in ie, firefox, safari
11:23	<hsivonen>	zcorpan: ok. thanks
11:44	<Lachy>	Hixie_, yt?
11:45	<Lachy>	Hixie_, the data URI kitchen is broken http://software.hixie.ch/utilities/cgi/data/data
11:46	<Lachy>	Hixie_, it seems to only work for file uploads, not text input or http URIs
11:59	<jwalden>	nice, roc followed up on isLocallyAvailable so I don't have to feel obligated to do so :-)
13:36	<zcorpan>	was something interesting said during the telecon? reading the log it seemed pretty hollow
13:39	<hsivonen>	I suppose attributes that have IDness should have IDness and remain in the infoset even if the value is "". so all code everywhere that is ID-sensitive needs to check both for IDness and the value emptystringness :-(
13:42	<zcorpan>	why do you suppose so?
13:43	<hsivonen>	zcorpan: it seems wrong to change the IDness based on value. moreover, I'm not sure if non-validating DTD processing could inject ""-valued IDs into the pipeline
13:44	<zcorpan>	ok
13:45	<zcorpan>	then html5 shouldn't say that empty id="" doesn't do ID assignment, but dom core should say that the empty string as argument to getElementById() should return null
13:46	<zcorpan>	correct?
13:55	<hsivonen>	zcorpan: depends on whether ID assignment and IDness assignment mean different things
13:55	<hsivonen>	IDness assignment clearly means that querying the attribute for its type returns "ID"
13:56	<hsivonen>	I'm not sure what ID assigment exactly means but I understood in to mean: put in a hashtable with the ID value as the key
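
The distinction hsivonen draws can be sketched as below, assuming "ID assignment" means an entry in a hashtable keyed by the ID value and "IDness" means the attribute's reported type. `Element`, `Document`, `attribute_type`, and `get_element_by_id` are hypothetical names for the illustration, not DOM Core or any parser's API.

```python
class Element:
    def __init__(self, name, attrs=None):
        self.name = name
        self.attrs = dict(attrs or {})


class Document:
    ID_ATTRS = {"id"}  # assumption: the attributes that have IDness

    def __init__(self):
        self._by_id = {}

    def attribute_type(self, attr_name):
        # IDness depends only on the attribute, never on its value.
        return "ID" if attr_name in self.ID_ATTRS else "CDATA"

    def add(self, element):
        for attr_name, value in element.attrs.items():
            if self.attribute_type(attr_name) == "ID" and value != "":
                # ID assignment: the first element with a given ID wins.
                self._by_id.setdefault(value, element)

    def get_element_by_id(self, value):
        # The empty string never matches, as zcorpan observed in
        # IE, Firefox and Safari.
        if value == "":
            return None
        return self._by_id.get(value)
```

In this sketch id="" keeps its IDness (the type is still "ID") but never enters the table, and the lookup rejects the empty string on its own, so either rule alone yields null for getElementById("").
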
13:56	hsivonen	keeps mistyping assignment over and over
14:25	<zcorpan>	i should write an xml-stylesheet spec that doesn't have fatal error requirements
14:26	<zcorpan>	which would also benefit xbl2
14:51	<zcorpan>	i wonder how i should approach that...
14:52	<zcorpan>	i mean, to make progress, i should just go ahead and reverse engineer relevant implementations, write test cases and a spec, then ask for feedback
14:53	<zcorpan>	but politically that might not be the best way to do it
14:55	<zcorpan>	perhaps i should start out with reverse engineering and demos, and point out the problems to the relevant WG(s), and if i don't get a response then i go ahead and write a spec and then ask for feedback
15:02	<zcorpan>	any thoughts? which are the relevant WGs for xml-stylesheet?
15:04	<heycam>	zcorpan, xml core wg seems most appropriate
15:04	<zcorpan>	heycam: ok
15:05	<heycam>	on http://www.w3.org/XML/Group/Core they list working on that document again as a "future task"
15:07	heycam	sleeps
15:16	<hsivonen>	zcorpan: fwiw, the idea that XML Core WG has about "relevant implementations" is likely to be radically different from yours
15:32	<ROBOd>	hello guys! pardon me barging into the discussion. i have one quick question: did microsoft (via chris wilson?) publish the complete html5 review? iirc, it was supposed they'll publish a review. i haven't seen it yet
15:33	<Philip`>	They haven't done, though ChrisW mentioned it yesterday
15:34	<Philip`>	00:29 < DanC> ChrisW: yes, I'm working with the IE team on our review...
15:34	<Philip`>	00:30 < DanC> ... I'll have some stuff sent out prior to the ftf meeting.
15:35	<ROBOd>	aha, thanks Philip`
15:37	<zcorpan>	hsivonen: i'd like to learn which implementations they consider relevant
16:47	<zcorpan>	http://blogs.s60.com/browser/2007/10/coring_the_browser.html -- hmm, so they support SVG via plugin instead of using WebCore's native support? (remember that they parse xml with the html parser)
16:50	<zcorpan>	"The difference with browsing is that HTML, CSS and ECMAScript create a really complex system, and the standards can never exactly specify the "correct" behavior in every case."
16:51	<zcorpan>	oh?
19:32	<kingryan>	is thomas broyer around here?
19:56	<Hixie_>	Lachy: odd
19:56	<Hixie_>	oh i know why
19:56	<Hixie_>	issues with the content-type sanitation
19:58	<Hixie_>	fixed
21:22	<Vito`>	hello, we're using html5lib to generate plain text from archived html pages. we're finding html5lib bombs out with maximum recursion errors on some pages.
21:22	<kingryan>	Vito`: please give an example
21:24	<Vito`>	http://mavra.perilith.com/~vito/html5lib/2002-0919-120019.html
21:25	<Vito`>	this is the python html5lib, and we're just doing parser=html5lib.HTMLParser();dom=parser.parse(filecontents);
21:25	<kingryan>	backtrace?
21:25	<Vito`>	maximum recursion depth reached, or some such
21:26	<Vito`>	the backtrace is accordingly > 1000 lines
21:26	<kingryan>	that'd be the error message. do you have backtrace?
21:26	<kingryan>	ah
21:26	<kingryan>	can you at least give us an idea of where the recursion is happening?
21:26	<Vito`>	lines 273 and 866 are repeated
21:26	<kingryan>	of what file?
21:27	<kingryan>	and which version of html5lib are you using?
21:27	<Vito`>	html5parser.py, in endTagHtml, self.parser.phase.processEndTag(name) and self.endTagHandler[name](name)
21:28	<kingryan>	I can't reproduce the error in trunk in either python or ruby
21:28	<Vito`>	hm
21:29	<kingryan>	which version are you using?
21:29	<Philip`>	I get no error in the Python trunk version either
21:29	<Vito`>	trying to find out
21:29	<Philip`>	and http://james.html5.org/cgi-bin/parsetree/parsetree.py?uri=http%3A%2F%2Fmavra.perilith.com%2F%7Evito%2Fhtml5lib%2F2002-0919-120019.html looks alright
21:44	<Vito`>	I thought it was the latest, but I guess we're running 0.9 or something. Installing 0.10 locally got the first batch of failures passing.
21:44	<Vito`>	I'll let you know if anything new comes up. Thanks for the sanity check.
21:51	<jgraham>	Hmm. In principle there are places where html5lib could have problems with recursion as it uses recursive algorithms in some places where iterative ones could be used
21:51	<jgraham>	But AFAIK there was at least one infinite loop bug fixed since 0.9
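
jgraham's point about recursive versus iterative algorithms can be illustrated with a generic tree walk. This is not html5lib code; `Node`, `depth_recursive`, `depth_iterative`, and `make_chain` are hypothetical names, and the example only shows why deeply nested input can exhaust Python's recursion limit while an explicit stack does not.

```python
import sys


class Node:
    def __init__(self, children=None):
        self.children = children or []


def depth_recursive(node):
    # One or more Python frames per tree level: deep trees can raise
    # RecursionError once sys.getrecursionlimit() is exceeded.
    return 1 + max((depth_recursive(c) for c in node.children), default=0)


def depth_iterative(root):
    # Same result with an explicit stack, so tree depth is bounded by
    # memory rather than by the interpreter's call stack.
    deepest = 0
    stack = [(root, 1)]
    while stack:
        node, depth = stack.pop()
        deepest = max(deepest, depth)
        for child in node.children:
            stack.append((child, depth + 1))
    return deepest


def make_chain(length):
    # Build a degenerate tree: each node has exactly one child.
    root = node = Node()
    for _ in range(length - 1):
        child = Node()
        node.children.append(child)
        node = child
    return root
```

For a chain twice as long as `sys.getrecursionlimit()`, `depth_iterative` returns the length while `depth_recursive` raises `RecursionError`. Raising the limit with `sys.setrecursionlimit` only moves the threshold.
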
21:52	<Vito`>	we have a bit over 23k archived pages and we were hitting something pretty frequently
21:52	<Vito`>	but it's all OKs so far with 0.10
21:57	<Philip`>	If you're parsing lots of pages, it may be worth looking at hsivonen's Java HTML5 parser since it's around a hundred times faster than the Python one
21:59	<Vito`>	alright, I've an AssertionError using 0.10. Should I try with trunk or are they the same?
22:01	<kingryan>	they're mostly the same
22:03	<Vito`>	happens with trunk as well
22:04	<Vito`>	http://mavra.perilith.com/~vito/html5lib/2003-0701-120001.html
22:04	<Vito`>	I can save these and stuff them all into the issue tracker as well, of course
22:04	<Vito`>	given that many more are passing than failing now
22:28	<jgraham>	Vito`: That looks like a recent regression. I'll have to investigate further
22:29	<Vito`>	I've had a handful of those now, plus one with an encoding error. I'll just put them all in the tracker.
22:30	<jgraham>	(in the meantime you can try removing the assertion that fires; I _think_ the only bad side effect is that the source position reported for errors might be wrong)
22:30	<jgraham>	Thanks
22:32	<jgraham>	Vito`: Add me as the owner for the bugs you file (jgraham.html)
22:33	<Vito`>	k
22:40	<Hixie_>	othermaciej: do you mind if i skip replying to <video>-related e-mails from you if the spec already does everything you asked for in those e-mails, or would you rather have replies to all your mails? (either is fine, just checking which you prefer)
22:40	<othermaciej>	Hixie_: if the emails predate the current version of <video> then I can do without such replies
22:41	<othermaciej>	Hixie_: I will have new feedback from Apple soon relative to the current spec as a baseline, so I am not worried about things getting lost
22:41	<Hixie_>	k
22:42	<Hixie_>	these predate the <video> dinner at google
22:49	<jgraham>	Vito`: I think I have fixed one of your issues. I'll update svn in a few minutes once I fix a few issues with my working copy
22:55	<Vito`>	ooh
22:55	<Vito`>	"Warning: Undefined behaviour for end tag section"
22:55	<Vito`>	didn't log the file for that one, darn
22:55	<jgraham>	Vito`: That's expected
22:56	<Vito`>	ah
22:56	<jgraham>	<section> is a new HTML 5 tag but its parsing isn't yet defined (we treat it like a generic unknown element). However you will sometimes find authors inventing tags in the wild
22:57	<jgraham>	So you could have encountered a rouge <section>
22:57	<Vito`>	I wonder what page that was. There were ~30 of them.
22:57	<Hixie_>	a red section? Is that, like, a porn site?
22:58	<jgraham>	Hixie_: ?
22:58	<Vito`>	rouge
22:58	<hober>	maybe you mean rogue, Vito`
22:58	<jgraham>	Sorry being slow
22:59	<Hixie_>	sorry :-)
22:59	<jgraham>	:-p
23:00	<Vito`>	I assume "Warning: Undefined behaviour for end tag header" is the same sort of thing?
23:01	<jgraham>	Uh hu.
23:01	<Vito`>	fascinating
23:03	<jgraham>	Hmm. Odd. html5lib is failing unit tests even without my change. This isn't supposed to happen :-|
23:07	<jgraham>	Maybe it's just kingryan's extra tests
23:17	<Philip`>	Incidentally, RIP has slightly interesting error handling - it has a version field, and v1 of the spec defines draconian error handling for packets with version=1 (e.g. reserved fields must contain zero, else the data is rejected), but non-draconian handling if the packet has version >= 2
23:17	<Hixie_>	what's RIP? And is that actually implemented?
23:18	<Hixie_>	(and do people ever give a version field?)
23:18	<Philip`>	so v2 of the protocol can start using the reserved fields, being certain that v1 implementations won't be sneakily using those fields anyway, while still being backward-compatible with v1 implementations
23:18	<Philip`>	It's a routing protocol
23:18	<Philip`>	(RFC1058)
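
The version-dependent strictness Philip` describes might look roughly like this. It is a sketch based on one reading of the RFC 1058 packet layout (a 4-byte header followed by 20-byte entries), not a RIP implementation; `accept_rip_packet` is a hypothetical name, and dropping version 0 packets outright is an assumption taken from that reading rather than from the discussion.

```python
import struct

HEADER = struct.Struct("!BBH")     # command, version, must-be-zero
ENTRY = struct.Struct("!HHIIII")   # family, zero, address, zero, zero, metric


def accept_rip_packet(data):
    """Return True if the packet should be processed, False if dropped."""
    if len(data) < HEADER.size or (len(data) - HEADER.size) % ENTRY.size:
        return False
    _command, version, header_zero = HEADER.unpack_from(data, 0)
    if version == 0:
        return False  # assumption: version 0 is ignored outright
    if version >= 2:
        # Lenient: a later version may have given the reserved fields meaning.
        return True
    # Version 1 is strict: every must-be-zero field has to be zero.
    if header_zero != 0:
        return False
    for offset in range(HEADER.size, len(data), ENTRY.size):
        _family, zero_a, _addr, zero_b, zero_c, _metric = ENTRY.unpack_from(data, offset)
        if zero_a or zero_b or zero_c:
            return False
    return True
```

A version 1 packet with a non-zero reserved field is dropped, while the same bytes labelled version 2 are accepted, which is what lets a later protocol version reuse those fields safely.
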
23:18	<Hixie_>	interesting
23:18	<Hixie_>	oh, that RIP
23:19	<Hixie_>	i shoulda recognised the name
23:21	<Philip`>	As far as I'm aware, people do actually use it (on very small networks, since it's very simple (which is why I've been looking at RIP and not at anything more interesting and complex :-) ))
23:22	<Hixie_>	:-)
23:23	<Vito`>	jgraham... I'm most of the way through this corpus now, and that assertion error and the unicode error are the only two unique failures I've seen
23:23	<Philip`>	(I think I'll end up having to work with BGP, which looks much scarier)
23:24	<jgraham>	Vito`: That sounds like it could be worse (did you get the message that I checked in a fix that I think helps with the assertion to trunk?)
23:25	<Vito`>	if it's in trunk I'll update and check the pages against it
23:29	<Vito`>	also html5lib can't handle GIF files with inappropriate MIME types
23:30	<Vito`>	just... so you know
23:30	<Philip`>	Hmm, it worked fine when I passed a PDF through it once
23:30	<Philip`>	What kind of problem did you get?
23:31	<Vito`>	unicodedecodeerror
23:32	<Philip`>	Ah
23:32	<Vito`>	awesome
23:32	<Vito`>	recursion error
23:32	<Vito`>	and it didn't log
23:32	<Vito`>	argh
23:34	<jgraham>	Vito`: I don't know what test data you're using but it clearly rocks :)
23:35	<Vito`>	it's just our group's cache of bookmarked sites, crawled over the past few years
23:35	<kingryan>	jgraham: yes, I've added some test which may break the python impl
23:35	<kingryan>	jgraham: in ruby it was mostly a matter of adding error messages to the parserError() calls
23:35	<jgraham>	kingryan: It mostly seems to be small things; I'm just working through it now
23:36	<kingryan>	cool
23:36	<kingryan>	I meant to write a note to the ML about it, but forgot
23:37	<Vito`>	jgraham... testing against your updated trunk now
23:40	<Vito`>	jgraham... looks good against a couple of the failing pages
23:45	<Vito`>	jgraham... I can't seem to mark you as owner of the unicode issue I just reported
23:45	<jgraham>	Vito`: Not to worry; I'll do it