#whatwg on 2008-03-14

00:42	<Hixie>	Philip`: i just used your code :-P
00:44	<Philip`>	Hixie: I never claimed that my code was a suitable general-purpose replacement :-)
00:49	<Hixie>	:-P
00:50	<Hixie>	ok i replaced it with the old code
00:50	<Hixie>	and put an ie8.html file there for IE
01:38	<Philip`>	http://realtech.burningbird.net/standards/xhtmlate-wordpress-comments/ seems good, though it fails to ban invalid characters from author names
01:43	<Philip`>	(and I can't go in to edit my post and fix the totally accidental inclusion of invalid characters because I get a YSoD :-( )
02:46	<Philip`>	Does anyone know how to make an exploit based on http://golem.ph.utexas.edu/instiki/new/Testing%20%3Ciframe%20onload=alert(document%26%23x2e;cookie)%3E work in XHTML?
02:46	<Philip`>	The difficulty is that it's seemingly impossible to use a / character
03:19	<Hixie>	Philip`: escape/encode the exploit somehow in the js, e.g. %-encoded, then just have the exploit do something like eval(unescape(...exploit...)) and %-encode that and put it in the uri
03:21	<Philip`>	Hixie: The JS won't get executed unless the XHTML is well-formed, and I can't see how to make it well-formed
03:21	<Hixie>	oh yeah to make something well-formed you need a slash
03:22	<Hixie>	can you use the query component?
03:22	<Philip`>	In some cases it might be possible to do something with <![CDATA[ and somehow make it line up well-formedly with an existing ]]> somewhere later in the document, but I don't think that can work here
03:24	<Philip`>	As far as I can see, the query string is just ignored
03:54	Hixie	snaps
04:02	Philip`	wonders if he can calculate the tensile strength of a Hixie
10:17	hsivonen	notes that the serialization algorithm now escapes " to &quot; outside attributes. has it always been that way?
11:48	Xiven-	conveys his thanks to Philip` for the lesson in Unicode safety
12:01	<Philip`>	Xiven-: My pleasure :-)
12:01	Philip`	hopes the comments etc don't allow in invalid characters
12:04	<Xiven->	it now removes invalid characters (based on the list at http://www.whatwg.org/specs/web-apps/current-work/#preprocessing) from all GET and POST data
12:07	Xiven-	also thanks annevk and Hixie for passing the message on to him :)
12:08	<hsivonen>	now that Hixie made astal non-characters parse errors, a conforming XHTML text content might not be conforming HTML text content
12:08	<hsivonen>	astral even
12:48	<Xiven->	but yes, the disturbing part was that PHP's XML parser appeared to run out of memory on some invalid characters
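A minimal sketch of the kind of input filtering discussed above. Note the swap: it uses the XML 1.0 Char production as the allow-list, not the HTML5 preprocessing list Xiven- cites, and the function name strip_xml_invalid is hypothetical.

```python
import re

# Anything outside the XML 1.0 "Char" production:
# #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
# (Assumption: this stands in for the HTML5 preprocessing list mentioned in the log.)
_XML_INVALID = re.compile(
    "[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def strip_xml_invalid(text: str) -> str:
    """Remove characters that an XML 1.0 parser would reject."""
    return _XML_INVALID.sub("", text)


# NUL, vertical tab and U+FFFE are dropped; tab and newline survive.
print(repr(strip_xml_invalid("a\x00b\x0bc\ufffe\td\n")))
```

Applying a filter like this to every GET and POST value before it reaches an XML serializer would avoid the well-formedness failure described earlier, at the cost of silently altering input.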
13:11	<hsivonen>	aargh. it is way too easy to accidentally move an IMAP mailbox inside another in Mail.app
13:11	<hsivonen>	a small slip of the pointing device and a click turns into a drag
13:16	<hsivonen>	Over the years, having an "are you sure" dialog for mailbox move and rename would save me non-trivial time
13:18	hsivonen	wonders why Unicode introduced REPLACEMENT CHARACTER when ASCII had 1A SUBSTITUTE
13:20	<zcorpan_>	hsivonen: the spec has always escaped " outside attributes, yes. but i've complained about that on the list
13:25	<hsivonen>	zcorpan_: ok
13:26	<zcorpan_>	at least i remember having complained about it, i can't find it on google or in the issues list
13:27	<zcorpan_>	ah http://lists.w3.org/Archives/Public/public-html/2007Jul/1030.html
13:28	<zcorpan_>	(though, not sure if it's in the issues list)
13:29	zcorpan_	had notes in http://simon.html5.org/tools/js/innerhtml-viewer/getInnerHTML.js
13:31	<hsivonen>	annevk, jgraham__: do you have an opinion on how test cases should test that forbidden characters emit parse errors?
13:32	<hsivonen>	(since the relative order of errors and tokens is not well-defined in that case)
13:32	<hsivonen>	for example, I implement the check as part of reading the next character from the stream
13:34	<Philip`>	hsivonen: The tests currently use something like ignoreErrorOrder:true for the cases where the order is undefined
13:36	<hsivonen>	Philip`: oh. I have to look into that
13:36	<hsivonen>	Philip`: thanks
13:38	<hsivonen>	it's crazy how long the read() method has become...
13:38	<annevk>	"HTML is tough"
13:39	<a-ja>	hsivonen: henri, whatcha think about that uF v2 idea of daniel's?
13:41	<hsivonen>	a-ja: I think it won't work because microformat producers are going to be sloppy and, therefore, consumers are going to be able to extract more data if they extract anything that looks like a microformat regardless of <meta> or profile='' or whatever
13:41	<a-ja>	breaks html5? or requires meta's to be allowed in standalone articles? just in head wouldn't seem to cut it
13:42	<annevk>	http://www.mnot.net/drafts/draft-nottingham-http-link-header-01.txt
13:42	<hsivonen>	a-ja: it isn't as much about breaking HTML5. I just think that microformat consumers will find ignoring the meta more valuable
13:44	<a-ja>	i think head profile is gonna have enough issues....especially with list of long url's. could easily break content-type sniff .5k / 1k limits
13:45	<annevk>	there's no such limit anymore though
13:45	<a-ja>	no?
13:45	<annevk>	right
13:46	<a-ja>	guess that's A Good Thing...at least in some ways
13:49	<a-ja>	FYI:
13:49	a-ja	offers FREE as in BEER bug bounty - A case to whoever gets bug 311366 patch checked in before b5 code freeze, and gets it to stick for FF3 final
13:49	<a-ja>	not takers yet ^
13:49	<a-ja>	s/not/no/
13:55	<Philip`>	http://golem.ph.utexas.edu/instiki/show/Sandbox - alas
14:00	<hsivonen>	is there an easy online tool for converting astral chars into surrogate pairs?
14:01	<hsivonen>	http://rishida.net/scripts/uniview/conversion.php
14:02	<Philip`>	hsivonen: http://www.fileformat.info/info/unicode/char/10000/index.htm
14:02	<hsivonen>	Philip`: thanks
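For reference, the conversion hsivonen asks about follows the standard UTF-16 arithmetic. This is a minimal sketch; the function name to_surrogate_pair is hypothetical and not taken from either tool linked above.

```python
from typing import Tuple


def to_surrogate_pair(code_point: int) -> Tuple[int, int]:
    """Split an astral code point (U+10000..U+10FFFF) into UTF-16 surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("not an astral code point: %#x" % code_point)
    offset = code_point - 0x10000          # 20-bit value
    high = 0xD800 + (offset >> 10)         # top 10 bits -> high (lead) surrogate
    low = 0xDC00 + (offset & 0x3FF)        # bottom 10 bits -> low (trail) surrogate
    return high, low


high, low = to_surrogate_pair(0x10000)
print("U+%04X U+%04X" % (high, low))       # U+D800 U+DC00
```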
14:34	<hsivonen>	http://odfalliance.org/resources/google-response-post-brm.pdf contains a couple of HTML-relevant points
14:34	<hsivonen>	1) Requiring vendor extensions to be documented in order to conform to the base language.
14:34	<hsivonen>	and
14:34	<hsivonen>	2) that Transitional isn't
15:25	<hsivonen>	I regret suggesting that 0x80-0x9F bytes be errors when ISO-8859-1 is declared
15:27	<annevk>	i don't think it makes much sense
15:27	<annevk>	iso-8859-1 is just another alias
15:29	<hsivonen>	Hixie: can I just say I was wrong and ask this detail to be reversed? (especially since doing the consistent thing with GBK would be a PITA)
15:29	<annevk>	hsivonen, can't you just use the declared encoding instead?
15:29	<annevk>	that way they'll automatically be errors
15:30	<hsivonen>	annevk: then the parser could no longer be used as an error reporting general-purpose parser
15:31	<hsivonen>	annevk: also, then show source would wrong assuming the 1252 interpretation is right
21:09	<hsivonen>	whoa. the Talk:Acid2 really is sad.
21:09	<hsivonen>	is the irc log at krijnhoetmer.nl a verifiable source?