00:02
<sicking>
Hixie, btw, I also suggest that we support <![CDATA[ ]]> everywhere we parse PCDATA and RCDATA, due to goal 2 above. Sounded like Opera's experiment in this area showed that it doesn't break the web. But it's an orthogonal discussion to the one we're having now, so I wasn't going to raise it until after
00:06
<annevk3>
sicking, actually, we encountered issues with supporting <![CDATA[
00:06
<sicking>
annevk3, big enough that you removed the support?
00:06
<annevk3>
sicking, I don't think we have yet, but I think we should
00:07
<sicking>
annevk3, why haven't you yet, and why should you?
00:07
<annevk3>
because it causes issues and supporting CDATA has no real benefit
00:07
Philip`
hopes that if Opera doesn't remove CDATA support, they at least make it less insane
00:08
<sicking>
annevk3, i doubt you'll ever find a feature that you can deploy without any issues. The most recent example we had was a site breaking because we implemented document.readyState
00:08
<annevk3>
sicking, but CDATA is not a feature, it's near useless
00:08
<sicking>
annevk3, since we'll have to support <![CDATA[]]> inside SVG, I think the consistency would be nice
00:09
<annevk3>
maybe, I'd rather remove it there too
00:09
<annevk3>
(also Opera's current CDATA support doesn't match the HTML5 spec)
00:10
<sicking>
annevk3, i'd be supportive of that if we think there isn't much existing SVG content that uses it. In other words, if we don't think it'd affect the ability to copy SVG into HTML
00:12
<annevk3>
there's very little SVG so that's pretty hard to tell
00:13
<annevk3>
I don't recall ever needing it in any SVG content I've created, but then I mostly do simple things
00:14
<Philip`>
http://philip.html5.org/misc/spec-links-anim.svgz
00:14
<Philip`>
That's got a script, but doesn't use CDATA
00:16
<Philip`>
Hmm, why does View Source in Firefox 3 go unbelievably slowly on that SVG file?
00:16
<Philip`>
If the script had to use < or & then I probably would have put it in <![CDATA[]]>
00:16
<Philip`>
but it didn't so I didn't
00:17
<sicking>
have a good weekend people
00:17
<Philip`>
Yikes, FF3 uses about 400MB of RAM to view-source on that page :-/
00:18
olliej
opens in S4
00:18
olliej
wonders how badly it will fare
00:18
<sicking>
Philip`, view-source will always use more memory than the actual page
00:19
<olliej>
memory seems to have peaked at ~600mb :-O
00:19
olliej
wonders wtf is happening
00:19
<sicking>
Philip`, it doesn't use any entities either though, so should work find
00:19
<sicking>
fine even
00:20
<annevk3>
what's the advantage of toTempURL over toDataURL? working around IE bugs?
00:21
<Dashiva>
That's how I understood it
00:21
<annevk3>
feature design 101: don't propose a new feature to work around UA bugs in an existing feature
00:21
<Dashiva>
Not sure why supporting toTempURL would be easier than fixing data urls
00:22
<Philip`>
I think the idea is that you should be able to say img.src = canvas.toSomeKindOfThingThatWorksInImgSrc() and have it work in browsers that don't support data URIs
00:22
<annevk3>
Philip`, right, see above
00:23
<annevk3>
nn
00:23
<Philip`>
Dashiva: Because you can't fix data URIs in IE6
00:23
<Philip`>
(I assume)
00:24
<Dashiva>
But can you fix toTempURL in IE6?
00:24
<Philip`>
If you're writing a plugin or add-on or whatever it is, then you presumably can
00:25
<Philip`>
since you can save the canvas data to disk and then use file:/// to refer to it
00:25
<Philip`>
or you can register a protocol handler for customHandler://
00:26
<Dashiva>
Register a protocol handler for data:, feed the contents?
00:26
<Philip`>
I've got no idea whether you can do that
00:26
<Philip`>
(If you could, it'd be bad in terms of IE-compatibility with other sites that use data: URIs)
00:42
<hdh>
(defun last-heading () (search-backward-regexp "C-x C-f") (match-string 0)) (setq mode-line-format '(:eval (last-heading)))
00:42
<hdh>
idk how to plug the eval part into the existing modeline
00:42
<hdh>
the text seems to keep its formatting in the matched buffer
01:02
<Hixie>
hdh: interesting
11:21
<benh_>
Historical question: does anyone know why HTML4 deprecated u, s, strike, but not b/i/big/small, even though it discouraged the latter? What was special about u/s/strike?
12:55
<annevk3>
Hixie, s/buu can/but can/
12:55
<annevk3>
Hixie, s/is can be/can be/
12:57
<annevk3>
Hixie, s/sections that cause/sections cause/ (I think, the sentence does not seem correct otherwise)
12:59
<annevk3>
Hixie, "For each <span>cache host</span> associated with an <span>application cache</span>" Isn't a cache host always associated with one?
15:02
<annevk3>
Gmail on a message I sent: "4:06 PM (-1 minutes ago)"
15:50
<Andrii>
annevk3: google invented the time machine, cutting edge technology
16:51
<Hixie>
annevk3: please send mail for feedback, pleeeeease. :-)
17:25
<annevk3>
Hixie, next time or also for those four lines?
17:25
<annevk3>
IRC is convenient, there's less UI involved
17:26
<Dashiva>
I'm betting he wants the paper trail
17:28
<Philip`>
He prints out emails?
17:29
<Philip`>
Someone needs to make an IRC bot which lets you say "!feedback s/buu can/but can/" and it will automatically send an email to Hixie
17:35
<krijnh>
Doesn't need to be an IRC bot, since all the logs are available as HTML as well :)
17:36
<Philip`>
krijnh: Good point :-)
17:37
<Philip`>
krijnh: If you could change your log processor to let us embed RDF in our IRC messages that will get translated into RDFa in the logs, then we could write a simple RDFa-based tool to automatically extract all the feedback and email it
17:37
<annevk3>
yeah, and with enough annotation everything gets done automatically
17:38
<Dashiva>
That reminds me of joel's post about spec writing
17:39
<Dashiva>
"And then some people go into a dark place where they imagine automatically generating implementations from specs, and think they have invented a way to program without programming" (by memory)
17:39
gsnedders
starts implementing parse errors in html5lib php
17:40
<takkaria>
parse errors are overrated
17:52
<annevk3>
gsnedders, why do you need parse errors? isn't a treebuilder more useful?
17:52
<gsnedders>
annevk3: Trying to finish Tokenizer first before moving on
17:59
gsnedders
returns to the point where more than 50% of tests pass
18:40
<gsnedders>
Weeee… 23 tests failing now
18:41
<Philip`>
Delete those tests, then you'll pass 100%
18:46
<gsnedders>
14% perf. regression from throwing parse errors
18:47
<Philip`>
In a document that has no parse errors?
18:47
<gsnedders>
Yeah
18:47
<gsnedders>
(i.e., the spec)
18:48
<takkaria>
ouch
18:49
<takkaria>
I'm not sure hubbub is ever going to have parse error reporting
18:50
<gsnedders>
12.0s is still a massive improvement over the 48s it was a week ago
18:50
<takkaria>
sure :)
18:52
gsnedders
runs with profiler
18:52
<gsnedders>
(This takes it back to around 50s :P)
18:54
<Philip`>
You need to multithread your tokeniser
18:55
<gsnedders>
Multi-threading in PHP? :P
18:55
<Philip`>
Shouldn't be too hard to just split the input document into n pieces, speculatively parse the last n-1, and discard any results that are invalidated by the tokeniser state at the end of the previous piece
19:00
<jgraham>
Philip`: presumably you would end up being wrong a lot of the time
19:00
<jgraham>
Which seems bad
19:01
<Philip`>
jgraham: You could scan forwards to the next '>' and assume you're now going to be in the data state, which is likely to be right quite often
19:02
<takkaria>
is it?
19:02
<Philip`>
and if you were in a <script> or something then you make sure you've kept enough state so you can sync up once you've reached the </script>
19:02
<jgraham>
Hmm. You're making this sound surprisingly reasonable
19:02
<Philip`>
Really?
19:02
<Philip`>
That wasn't my intent
19:02
<jgraham>
Which suggests that you're misleading me somehow
19:05
<Philip`>
takkaria: I suppose it should be fairly easy to instrument a tokeniser to report how often it sees '>' when it's in the data state (and PCDATA, and no escape flag)
19:09
<Philip`>
Does Python have a multiprocessing thing nowadays that isn't unbearably hard to use efficiently?
19:09
<gsnedders>
Philip`: Yes, multiprocessing
19:09
<Philip`>
Ah, sounds good
19:09
<Philip`>
Maybe html5lib should do this! :-)
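A toy sketch of the speculative scheme Philip` outlines above, using Python's multiprocessing as discussed; the two-state tokeniser below is a deliberately crude stand-in for a real HTML tokeniser (no script or RCDATA handling), so this only illustrates the split/guess/discard idea:

    from multiprocessing import Pool

    def tokenise(text, state='data', pending=''):
        """Crude two-state tokeniser; returns (tokens, end_state, pending_text)."""
        tokens = []
        for ch in text:
            if state == 'data':
                if ch == '<':
                    if pending:
                        tokens.append(('text', pending))
                    pending, state = '', 'tag'
                else:
                    pending += ch
            else:  # inside a tag
                if ch == '>':
                    tokens.append(('tag', pending))
                    pending, state = '', 'data'
                else:
                    pending += ch
        return tokens, state, pending

    def speculate(chunk):
        """Worker: skip just past the chunk's first '>' and assume the data state."""
        skip = chunk.find('>') + 1              # 0 when the chunk has no '>' at all
        return skip, tokenise(chunk[skip:])

    def parallel_tokenise(document, jobs=4):
        if not document:
            return []
        size = -(-len(document) // jobs)        # ceiling division
        chunks = [document[i:i + size] for i in range(0, len(document), size)]
        with Pool(jobs) as pool:
            guesses = pool.map(speculate, chunks[1:])

        tokens, state, pending = tokenise(chunks[0])
        for chunk, (skip, guess) in zip(chunks[1:], guesses):
            # Re-run only the prefix the worker skipped, carrying the real state.
            prefix, p_state, p_pending = tokenise(chunk[:skip], state, pending)
            if p_state == 'data' and not p_pending:
                # The worker's assumed start state was right: keep its output.
                tokens += prefix
                more, state, pending = guess
                tokens += more
            else:
                # Speculation invalidated (the boundary fell mid-token): redo it.
                more, state, pending = tokenise(chunk, state, pending)
                tokens += more
        if pending:
            tokens.append(('text', pending))
        return tokens

    # if __name__ == '__main__':                # multiprocessing needs this guard
    #     print(parallel_tokenise('<p>hello <b>world</b></p>' * 4))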
19:10
<gsnedders>
Will all zero users of php-html5lib kill me if I make 23 test cases fail?
19:11
<annevk3>
we should make error reporting optional in the tests
19:11
<annevk3>
or flag tests that rely on error reporting
19:11
<Philip`>
That's easy
19:11
<Philip`>
if 'ParseError' in expected_tokens: it relies on error reporting
19:11
<annevk3>
it seems to me that the PHP parser is not intended for building a validator so it should just not do it and be fast :)
19:12
<Philip`>
and you could just strip out all the ParseErrors when comparing your tokeniser against the test result
19:12
<annevk3>
yeah, I guess that's the best way
19:12
<annevk3>
you still want to test error handling
19:13
<gsnedders>
Yeah, that's what it currently does, for a few more minutes at least
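A minimal filter along the lines Philip` suggests above, assuming the tokeniser test format of the time, in which parse errors appear as plain "ParseError" entries in each test's expected output list:

    def relies_on_error_reporting(expected_tokens):
        """Philip`'s check: the test needs error reporting iff it expects a ParseError."""
        return 'ParseError' in expected_tokens

    def strip_parse_errors(expected_tokens):
        """Drop ParseError entries so a tokeniser without error reporting can be compared."""
        return [token for token in expected_tokens if token != 'ParseError']

    # e.g. assert my_tokens == strip_parse_errors(test['output'])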
19:14
<gsnedders>
annevk3: I'd disagree that it is irrelevant. You might want to only allow valid comments on my blog. Oh, wait, you already do.
19:15
<annevk3>
i wouldn't mind syntax errors actually
19:15
<takkaria>
gsnedders: testcases failing is bad, mmkay
19:15
<annevk3>
i'd just validate the tree
19:15
<annevk3>
validate the tree based on some whitelists
19:15
<Philip`>
Someone should make a blog comment CAPTCHA system which presents you with a random word (just in plain text) and requires you to use it in a grammatically-correct sentence in your comment
19:16
<gsnedders>
takkaria: All failures are due to the parse errors, and one of them is somewhat questionable (I'd argue that the test case relies on impl. specific behaviour)
19:38
<olliej>
Philip`: heheh
19:39
<gsnedders>
Philip`: How do you determine whether a sentence is grammatically correct?
19:44
<Philip`>
gsnedders: Mechanical Turk
19:53
<Niictar>
But... couldm
19:54
<Niictar>
But... couldn't a bot be programmed to generate a nonsensical but grammatically correct sentence after doing a dictionary lookup of the word in question?
19:55
<Niictar>
Or, just quote a sentence example straight out of said dictionary? :P
20:00
<olliej>
Niictar: sssh
20:00
<olliej>
Niictar: although that captcha might be fairly good for filtering out most reddit/digg/youtube commenters :D
20:01
<Niictar>
=D
20:35
<gsnedders>
"It quite clearly shows that Humbert Hubmert does \emph{not} love Lolita." — Discuss.
20:35
<gsnedders>
:P
20:36
<Philip`>
s/Hubmert/Humbert/
20:36
gsnedders
is typing quickly damnit!
20:38
<Dashiva>
If you reversed the grammar captcha, it could work well for sites where real users are unable to write properly, whereas a bot would be too successful
20:57
gsnedders
makes another tweet exactly 140 characters long
20:57
<gsnedders>
(And 142 bytes)
21:10
<jwalden>
gsnedders: is the limit 140 characters or 140 code points?
21:12
<gsnedders>
jwalden: It appears to apply NFC
21:13
<gsnedders>
(So my first test failed)
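Roughly what gsnedders seems to have run into, sketched in Python; the string here is made up, not his actual tweet. A decomposed sequence counts as fewer characters once NFC is applied, while the composed characters still take two bytes each in UTF-8, which is how a 140-character tweet ends up at 142 bytes:

    import unicodedata

    decomposed = 'e\u0301' * 2 + 'x' * 138           # typed as 142 code points
    nfc = unicodedata.normalize('NFC', decomposed)   # 'e' + combining acute -> 'é'

    print(len(decomposed))              # 142 code points before normalisation
    print(len(nfc))                     # 140 characters, which is what gets counted
    print(len(nfc.encode('utf-8')))     # 142 bytes, since each 'é' is 2 bytes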