#whatwg on 2007-06-18

01:46	mpt	wonders if <image/svg>, <application/mathml+xml>, etc would work
01:51	<othermaciej>	would work in what?
02:04	<mpt>	HTML
02:05	<mpt>	instead of having the HTML specification containing yet another registry for document types
02:14	<othermaciej>	you intend those to be tag names?
02:15	<othermaciej>	I don't think that generalizes, some interesting XML languages don't have a specific MIME type assigned
04:20	<Wolfman2000>	Evening. Is there a link that shows what is planned to be deprecated in HTML5?
04:52	<mpt>	Wolfman2000, http://dev.w3.org/cvsweb/~checkout~/html5/html4-differences/Overview.html#dropped-elements
04:53	<Wolfman2000>	this helps: thanks
06:38	<annevk>	yeah, XBL doesn't have a MIME type
07:05	<hsivonen>	Hixie: I'm awake now.
07:28	<annevk>	hsivonen, why is < more special than & or " or '?
07:29	<annevk>	they're all non-conforming in the end
07:30	<annevk>	jgraham, you around?
07:39	<hsivonen>	annevk: for unquoted attributes, no, they aren't all non-conforming in the end
07:41	<annevk>	attribute values?
07:41	<annevk>	oh, yeah
07:41	<hsivonen>	annevk: < is special, because it makes it look like a new tag is starting. but one isn't
07:41	<hsivonen>	annevk: so conformance checkers should be able to flag it for authors who are going WTF
07:41	<annevk>	title=2<5
07:41	<annevk>	there's a use case :)
07:41	<hsivonen>	annevk: also, it should be non-conforming to keep conforming docs reasonably safe for shipped Gecko and WebKit
07:41	<annevk>	<a title=2<5> already works in Firefox
07:41	<hsivonen>	oh
07:41	<annevk>	I think "<" is only special cased in some other states in at least Firefox
07:42	<hsivonen>	<a <title=2> doesn't
07:42	<hsivonen>	annevk: good point
07:43	<hsivonen>	anyway, I added warnings while I was at it in case Hixie disagrees about making it an error
07:44	<annevk>	I don't think it should be more of an error than a Unicode character that looks like "a" or something
07:45	<hsivonen>	I wonder how often Russians enter lookalikes by accident
07:47	<hsivonen>	but yeah, I wasn't properly thinking about some of those cases getting caught on a higher layer
07:48	<annevk>	I'm going to squash that bug in html5lib now I think
07:49	<annevk>	And fix all the tests...
07:49	<karlUshi>	<a title=q<p>math proposal</a>
07:49	<hsivonen>	annevk: please consider applying my patches for the tests before making other changes that prevent the patches from applying
07:51	<annevk>	at some point we should sort out math and ruby
07:51	<annevk>	hsivonen, can't you commit them yourself?
07:51	<hsivonen>	annevk: AFAIK, no
07:52	<karlUshi>	<a title=/b</a>math proposal</a>
07:55	<annevk>	indeed you can't
07:55	<Hixie>	hsivonen: you sent a mail about EOF having been dropped recently at some point from some sections in the tokeniser
07:55	<Hixie>	hsivonen: did you see if they actually got dropped? i think i may just never have had them!
07:57	<hsivonen>	Hixie: diff tells me you had them and dropped them
07:59	<Hixie>	huh
07:59	<Hixie>	any idea when?
08:00	<Hixie>	i'd love to bring them back exactly as they were
08:00	<Hixie>	totally wasn't my intention to drop them
08:00	<hsivonen>	Hixie: had them on June 12. not anymore on June 17
08:00	<Hixie>	ok, cool, thanks
08:00	<Hixie>	that'll help
08:00	<hsivonen>	Hixie: probably part of rev 886
08:00	<Hixie>	the reason i was looking for you earlier was to ask you what the use case for embedded svg was
08:00	<annevk>	maybe EOF and < were handled in the same way...
08:00	<hsivonen>	(rev # from off the top of my head)
08:01	<hsivonen>	annevk: they were
08:01	<Hixie>	EOF isn't in the diff for 866
08:01	<Hixie>	er, 886
08:03	<hsivonen>	Hixie: the use case is including diagrams or graphs
08:03	<Hixie>	aha
08:03	<hsivonen>	Hixie: with application/xhtml+xml you can include them inline
08:03	<Hixie>	899
08:04	<hsivonen>	Hixie: oh. sorry. I was thinking of another recent rev
08:04	<Wolfman2000>	Hixie: sounds like you know a lot about this HTML 5. Do you have any idea when it will eventually take over HTML 4?
08:04	<hsivonen>	Hixie: so not being able to include them inline in text/html is a feature parity bug between the serializations
08:04	<Hixie>	Wolfman2000: i wrote html5 :-) how do you mean, take over?
08:05	<Wolfman2000>	There are some people in other channels that are worried of the progress of HTML 5. They wonder when/if HTML 5 will become recommended over HTML 4.01 Strict.
08:05	<hsivonen>	Hixie: the fact that Jacques Distler, Sam Ruby and I intuitively want to include them inline when possible suggests that we think there's a point to it
08:05	<Hixie>	Wolfman2000: it won't be finished for many years
08:06	<othermaciej>	including XBL inline in HTML is surely useful
08:06	<Hixie>	Wolfman2000: then again, html4 isn't really finished yet either
08:06	<othermaciej>	and including SVG in inline XBL bindings for HTML is surely useful
08:06	<Wolfman2000>	Also, I have a demonstration page about a potentially valid use for target="_top". Let me get it up on the server.
08:06	<Hixie>	Wolfman2000: so...
08:06	<Wolfman2000>	Hixie: ...html4 isn't?
08:06	<hsivonen>	Hixie: if the diagram or graph is not shared between multiple docs and isn't binary, there's really no good reason not to if putting it inline is possible
08:06	<Hixie>	Wolfman2000: it's full of bugs and errors (e.g. it says that media=screen is the default, not media=all)
08:06	<hsivonen>	s/is possible/if possible/
08:07	<hsivonen>	s/if putting/put/
08:07	hsivonen	hasn't properly woken up yet
08:08	<annevk>	Wolfman2000, contrary to other specifications, HTML5 is driven by implementation
08:08	<Wolfman2000>	Anyway, the page I mentioned: http://courses.ncsu.edu/csc234/lec/651/jaf_index.html This page won't stay up forever: it's a design that has become retired due to lack of proper disabled support.
08:08	<Hixie>	hsivonen: should we also allow XBM inline? (just asking to find out where you think the boundary lies)
08:08	<annevk>	Wolfman2000, so when the spec is done, it will be properly implemented
08:08	<annevk>	Hixie, what's XBM?
08:08	<hsivonen>	Hixie: I don't know what XBM is
08:08	<Lachy>	did you mean XBL?
08:09	<Hixie>	Wolfman2000: can you send your suggestion / help to me by e-mail? ian⊙hc (or whatwg⊙wo if you are subscribed)
08:09	<Wolfman2000>	I'm not subscribed yet.
08:09	<Hixie>	hsivonen: a text bitmap format
08:09	<Wolfman2000>	What's the best way to subscribe?
08:09	<Hixie>	hsivonen: i would have said PNG but that's not a text format
08:09	<Hixie>	Wolfman2000: http://whatwg.org/mailing-list
08:10	<hsivonen>	Hixie: but in general I'd be ok with being able to include any namespaced stuff with prefixes and optimize SVG and MathML to work without prefixes
08:10	<karlUshi>	monochrome bitmap
08:10	<Hixie>	hsivonen: hm interesting
08:10	<othermaciej>	XBM is not an XML language
08:10	<hsivonen>	Hixie: I guess we shouldn't do XBM because it wouldn't work by just hacking the parser
08:10	<othermaciej>	or indeed a markup language
08:10	<othermaciej>	you could put it in a data: URL I guess
08:11	<Wolfman2000>	awaiting confirmation email
08:11	<karlUshi>	http://en.wikipedia.org/wiki/XBM
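
othermaciej's data: URL suggestion can be sketched roughly as below. This is only an illustration, not something proposed in the channel: the 8x1 bitmap, the `image/x-xbitmap` media type label, and the helper name `xbm_data_url` are all assumptions made for the example.

```python
from urllib.parse import quote, unquote

# Hypothetical 8x1 XBM image; XBM is plain C source text, not markup.
XBM_SOURCE = (
    "#define dot_width 8\n"
    "#define dot_height 1\n"
    "static unsigned char dot_bits[] = { 0xff };\n"
)

def xbm_data_url(xbm_text, media_type="image/x-xbitmap"):
    """Percent-encode XBM text into a data: URL (media type is an assumption)."""
    return "data:" + media_type + "," + quote(xbm_text, safe="")

url = xbm_data_url(XBM_SOURCE)
# Everything after the first comma is the percent-encoded payload.
payload = unquote(url.split(",", 1)[1])
```

Because the payload is just escaped text, no parser changes are involved, which fits hsivonen's point that XBM "wouldn't work by just hacking the parser".
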
08:11	<hsivonen>	Hixie: but putting SVG or MathML in the DOM already works, so fixing the parser is relatively low-hanging fruit compared to generalizing to XBM
08:11	<Hixie>	hsivonen: well fwiw i personally think it'd be great to have a math format and a vector graphics format in html. i think it would be a huge amount of work, though, and i think it would be highly controversial (so i don't plan on doing it anytime soon)
08:11	<Wolfman2000>	...on confirming the subscription request, can I use my preferred internet name instead of my real name?
08:11	<Hixie>	Wolfman2000: yes
08:12	<Wolfman2000>	Then consider me subscribed.
08:12	<hsivonen>	Hixie: well, pushing text/html and not being able to include arbitrary namespaces is also controversial to some
08:13	<hsivonen>	Hixie: and I'd expect SVG in text/html to have all the same warts as SVG in application/xhtml+xml
08:14	<Wolfman2000>	...so now that I'm in, all I do is send email to whatwg⊙wo and everyone sees it, right?
08:14	<Hixie>	hsivonen: i think general svg-like or mathml-like syntax in html has reasonably straightforward ways of being done (far from easy, but at least not technically difficult)
08:14	Wolfman2000	hasn't done this in awhile.
08:14	<hsivonen>	Hixie: so fixing SVG to WHATWG quality is a more general problem than enabling it in parsing
08:14	<Hixie>	hsivonen: i think a general-purpose namespaces system would be practically infeasible though in text/html
08:14	<Hixie>	Wolfman2000: yup
08:15	<hsivonen>	Hixie: would it be infeasible to hard-wire prefixes known to date and allowing the declaration of unknown prefixes?
08:15	<Hixie>	hsivonen: i'm not including svg 1.1 in html5 unchanged (e.g. requiring xlink namespace prefixes), that would just be missing a massive opporunity
08:15	<Hixie>	opportunity
08:15	<hsivonen>	(I am aware that the list of known prefixes to date is long)
08:16	<Hixie>	hsivonen: it couldn't be done just by using prefixes, that would have all kinds of issues (e.g. prefixes already do weird things in IE)
08:16	<hsivonen>	Hixie: more weird than what the obvious prefix bindings would do?
08:16	<Hixie>	hsivonen: i don't really have any interest in a simplistic solution that just shoehorns XML syntax into text/html to be honest
08:17	<Hixie>	hsivonen: i don't see the advantage and the costs can be great
08:17	<hsivonen>	Hixie: yeah, a general-purpose system would be more about fulfilling a bullet point
08:17	<Hixie>	hsivonen: but anyway, this is something that's on the cards already
08:18	<hsivonen>	Hixie: but special-casing SVG and MathML still has a point, I think
08:18	<hsivonen>	Hixie: ok
08:20	<Wolfman2000>	Email has been sent, webpage link included.
08:35	<Wolfman2000>	...great. I just received an email saying my email to the group got...well, bounced. It's awaiting moderator approval.
08:35	Wolfman2000	thought he just signed up.
08:35	<annevk>	did you e-mail to the list you signed up for?
08:35	<annevk>	there are four lists
08:37	<Wolfman2000>	I thought I signed up to the right list. I then emailed whatwg⊙wo
08:37	<Wolfman2000>	...oh crap. I signed up for Implementors.
08:38	<Wolfman2000>	so I emailed it to the wrong spot?
08:38	<Hixie>	Wolfman2000: if it got stuck in the moderator queue you'll have to resubscribe, sorry :-|
08:38	<Wolfman2000>	...wha?!?
08:39	<Wolfman2000>	one shot and that's it?
08:40	<Wolfman2000>	...strange. it still looks like I'm subscribed. At least...in Implementors.
08:40	<annevk>	yeah, you have to subscribe to the other list
08:40	<Hixie>	Wolfman2000: no i mean you'll have to subscribe to the other list
08:40	<annevk>	and then e-mail your message again
08:40	<Hixie>	Wolfman2000: the moderator queue is just a black hole
08:40	<Hixie>	(we were getting too much spam for me to keep up)
08:41	<Wolfman2000>	From this page: http://www.whatwg.org/mailing-list I ended up subscribing to Implementors
08:41	<Wolfman2000>	I assume there is a different page I'm supposed to go to then?
08:41	<annevk>	use http://lists.whatwg.org/listinfo.cgi/whatwg-whatwg.org
08:41	<annevk>	(it's linked from that same page)
08:42	<Wolfman2000>	...oy. four of them.
08:42	<Wolfman2000>	I'm assuming I should subscribe to all of them then?
08:42	<annevk>	no
08:42	<annevk>	just the ones you're interested in
08:43	<annevk>	I suggest you read the page briefly first
08:43	<Wolfman2000>	Probably a good idea.
08:46	<Wolfman2000>	alright, covered. In the end...I think what I wanted the most was help-whatwg.org instead of whatwg.org
08:46	<Wolfman2000>	about to re-send the email
08:47	<Hixie>	hsivonen: fixed the EOF isue
08:47	<jgraham>	annevk: I'm here now...
08:47	<Hixie>	issue
08:48	<Wolfman2000>	...alright, chose to resend to the same place. Let's hope it doesn't go to the black hole this time
08:49	<hsivonen>	Hixie: thanks
08:51	<annevk>	jgraham, it's already working
08:51	<annevk>	jgraham, I didn't have chardet but that seems to be optional now
08:51	<jgraham>	Great :)
08:52	<jgraham>	Less difficult questions early in the morning == good
08:52	<annevk>	Currently fixing the new entity stuff by making a small dirty hack that scrapes the HTML5 spec
08:52	<jgraham>	s/Less/Fewer
08:52	<annevk>	well, the multipage version
08:55	<hsivonen>	Safari appears not to have a chardet equivalent. does this mean that chardet is no longer needed on the real Web?
08:56	<hsivonen>	Opera seems to have autodetection available but it is scoped to Cyrillic, Chinese, Japanese or Korean at a time
08:56	<hsivonen>	what does IE7 do?
08:57	<Wolfman2000>	hsivonen: charset detection? Hmm...hang on a second, while I test a certain webpage.
08:57	<hsivonen>	Wolfman2000: yes
08:57	<Wolfman2000>	...I think it's still needed.
08:57	<Wolfman2000>	http://foonmix.nothing.sh/ Use Shift_JIS
08:58	<Wolfman2000>	I believe my options are set to use utf-8 by default
08:58	<Wolfman2000>	does that help a bit hsivonen, or did I misunderstand?
09:00	<hsivonen>	Wolfman2000: does IE7 have an autodetector?
09:00	<Wolfman2000>	hsivonen: I'm unsure: I'm on a Mac.
09:00	<Wolfman2000>	I was testing Safari
09:00	<hsivonen>	what do Japanese Safari users do? do they use another browser or switch encodings manually?
09:00	<Wolfman2000>	I needed to switch my encoding manually.
09:00	<Wolfman2000>	But I'm an American Safari user, so...I don't know.
09:00	<Wolfman2000>	Most Japanese people use Windows and IE. :(
09:01	<hsivonen>	well, hooking up jchardet to my tokenizer is on my todo list
09:01	<hsivonen>	I'd like to know, though, if passing only the first 512 bytes to chardet is enough
09:02	<Wolfman2000>	I don't know how to test that. I've only just signed up, and I'm merely a simple TA/web designer
09:22	zcorpan	is at the opera office in linköping
09:24	<annevk>	simonp
09:24	<annevk>	:)
09:25	<zcorpan>	@opera.com?
09:25	<annevk>	yeah
09:25	<zcorpan>	oh yep. didn't know i had an opera email already
09:28	<hsivonen>	Hixie: why did you remove "Otherwise, if the next character is a U+003B SEMICOLON, consume that too. If it isn't, there is a parse error.
09:28	<hsivonen>	"
09:28	<hsivonen>	Hixie: in entity tokenization
09:29	<annevk>	because it is part of the entity name
09:29	<hsivonen>	whoa
09:29	hsivonen	is only diffing the tokenization section
09:30	<annevk>	I thought this entity stuff would be trivial to implement but it's not
09:30	<annevk>	I think <object>, <video> etc. should allow block-level fallback...
09:30	<hsivonen>	annevk: It took me a while to get the previous entity parsing right with minimal string object creation
09:30	<annevk>	The new entity stuff isn't stable either
09:31	<annevk>	Apparently IE does something different from this for attributes
09:31	<annevk>	So maybe you should not fix that for now
09:31	<hsivonen>	yeah
09:31	<hsivonen>	actually, I did it without string object creation at all
09:33	<annevk>	Hixie, it would be useful for other standards if HTML5 defined Almost Standards Mode for them
09:33	<annevk>	Hixie, if we're going to keep it, that is
09:45	<annevk>	hsivonen, fixed the tests
09:47	<hsivonen>	annevk: thanks
09:48	<hsivonen>	Hixie: I'd like to make entity names without the terminating semicolon parse errors to help conformance checkers alleviate author confusion in the face of typos
09:48	zcorpan	agrees with hsivonen
09:49	<annevk>	blah &amp blah
09:50	<annevk>	zcorpan, have you changed your entity script to work for attribute values already?
09:50	<zcorpan>	annevk: no
09:50	<zcorpan>	i can do it though
09:50	<annevk>	might be useful to see what IE does there (and how it compares to normal entity parsing)
09:50	<annevk>	cool
09:50	<hsivonen>	annevk: if IE is even remotely sane, the entity handling differences for attributes can be handled by loading a different entity table
09:50	hsivonen	hasn't tested
09:51	<zcorpan>	hopefully it just is the same table without the entries that don't end with ;
09:52	<annevk>	hsivonen, wouldn't that always be possible?
09:53	<hsivonen>	Hixie: btw, for easy scraping it would be nice to have the entity table lexicographically sorted
09:53	<hsivonen>	Hixie: just mentioning this in case you edit it anyway
09:53	<annevk>	hsivonen, there's a table scraping script
09:53	<hsivonen>	annevk: I haven't considered insane options :-)
09:53	<annevk>	it's really easy
09:53	<hsivonen>	annevk: pointer?
09:54	<annevk>	http://html5lib.googlecode.com/svn/trunk/python/utils/extract-entities.py
09:54	<hsivonen>	annevk: thanks
09:54	<hsivonen>	annevk: I guess I'll add lex sort to the script
09:54	<annevk>	just wrote that for my own usage, but integrating the new entity handling didn't work out too well
09:54	<annevk>	hsivonen, can I add you to the html5lib project?
09:55	<annevk>	so you can simply commit those changes yourself
09:55	<hsivonen>	annevk: sure
09:55	<annevk>	you have a google account?
09:55	<hsivonen>	hsivonen⊙gc
09:56	<annevk>	done
09:56	<hsivonen>	thanks
09:56	<annevk>	http://code.google.com/u/hsivonen/
10:13	<zcorpan_>	http://simon.html5.org/test/html/parsing/entities/trailing-semicolon/002.htm -- that is <img alt>... dunno if ie has different rules for <img src> or <a href>
10:15	<annevk>	could you add another three columns for the non attribute case?
10:17	<annevk>	IE actually differs for &entity; &entity and &entityX
10:17	<zcorpan_>	sure
10:17	<annevk>	I suppose &entity means &entity< or something?
10:18	<annevk>	or maybe &entity followed by a space
10:22	<annevk>	what would be more useful I suppose is if you checked the results using DOM methods and then just printed how they are supported... :)
10:24	<zcorpan_>	yeah
10:32	<hsivonen>	where might I find a list of characters that are allowed (per spec) in unquoted attribute values in HTML 4.01?
10:32	<annevk>	Should </br> also cause the active formatting elements to be reconstructed?
10:33	<hsivonen>	(without spending hours trying to grok SGML myself after borrowing the Handbook from a library)
10:33	<annevk>	I think it's [a-zA-Z0-9]
10:33	<hsivonen>	annevk: what about hyphens, underscores and the like?
10:35	<annevk>	I'd think details would be in http://www.w3.org/TR/html4/appendix/notes.html
10:35	<annevk>	but it doesn't seem like it
10:36	<annevk>	"In certain cases, authors may specify the value of an attribute without any quotation marks. The attribute value may only contain letters (a-z and A-Z), digits (0-9), hyphens (ASCII decimal 45), periods (ASCII decimal 46), underscores (ASCII decimal 95), and colons (ASCII decimal 58). We recommend using quotation marks even when it is possible to eliminate them."
10:36	<annevk>	http://www.w3.org/TR/html4/intro/sgmltut.html#h-3.2.2
10:37	<hsivonen>	LCNMCHAR ".-_:"
10:37	<hsivonen>	yeah
10:37	<hsivonen>	thanks
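
The HTML 4.01 rule annevk quotes above can be checked mechanically. A minimal sketch, assuming only the character list in that quotation; the function name `may_be_unquoted` is made up for the example, and rejecting the empty string is an assumption rather than something stated in the channel.

```python
import re

# Letters, digits, hyphens, periods, underscores and colons, per the
# HTML 4.01 section 3.2.2 text quoted in the log.
UNQUOTED_OK = re.compile(r"[A-Za-z0-9._:-]+")

def may_be_unquoted(value):
    """Return True if value is non-empty and uses only the allowed characters."""
    return UNQUOTED_OK.fullmatch(value) is not None
```

Under this rule annevk's earlier `title=2<5` example would need quotes in HTML 4.01, since `<` is not in the allowed set.
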
10:56	<annevk>	bah
10:56	<annevk>	</br> is harder than it looks
10:57	<zcorpan_>	ok, the good news is that ie does the same thing with entities in attributes for both <img alt> and <a href>
10:57	<zcorpan_>	the bad news is that <img alt="&AElig"> works but <img alt="&AEligX"> doesn't
10:58	<zcorpan_>	need to figure out which characters end entities
10:58	<zcorpan_>	in attribute values
11:01	<annevk>	ah, that's only for attribute values?
11:01	<zcorpan_>	yeah
11:01	<annevk>	quoted versus double quoted versus unquoted too maybe?
11:01	<zcorpan_>	oh, that better work the same...
11:01	<zcorpan_>	but i'll test it too
11:03	<hsivonen>	when the generic facet of my validation service sees an XHTML 1.0 doctype in text/html, I will (in a future release) tokenize as HTML5 but validate as XHTML 1.0 and I'm going to say that this is bogus but I am doing it for the users' convenience
11:03	<hsivonen>	should I make the message a warning or an error?
11:03	<hsivonen>	"bogus" means error but "convenience" means warning
11:03	<zcorpan_>	error
11:04	<zcorpan_>	say that it isn't processed as xhtml by browsers unless the document is served with an xml mime type
11:04	<zcorpan_>	or something
11:04	<virtuelv>	annevk: don't most browsers interpret </br> as <br>?
11:05	<annevk>	yes
11:05	<hsivonen>	zcorpan_: makes sense.
11:05	<hsivonen>	annevk: do you have an opinion?
11:06	<annevk>	hsivonen, warning seems fine
11:06	<hsivonen>	zcorpan_: or would it make sense to turn it into a warning if the user checked the lax content type checkbox?
11:06	<annevk>	it's not actively harmful
11:06	hsivonen	is inclined to bind this to the lax type option
11:08	<hsivonen>	doh. I'm already doing something else for the lax type option, so that doesn't work
11:10	<zcorpan_>	hsivonen: with the lax option set, wouldn't you process it as xml?
11:10	<hsivonen>	zcorpan_: yes. I can't even remember anymore what the lax option does
11:10	<hsivonen>	the code for it is rather hairy, too
11:11	<zcorpan_>	in any case, when you parse xhtml with the html parser, emit an error imho
11:11	<hsivonen>	zcorpan_: ok
11:11	<zcorpan_>	if, with the lax option, you parse text/html as xml, a warning is fine
11:12	<zcorpan_>	back to entities: it seems any character except [a-zA-Z0-9] ends an entity in attribute values
11:17	<annevk>	so you basically consume chars until you hit something outside that range
11:17	<annevk>	hmm
11:21	<zcorpan_>	or you consume as many as possible that match the entity table, and for the longest match, check if the next character is in that range. if yes, emit the consumed characters, otherwise emit the entity
11:24	<annevk>	ok, rearchitected my </br> fix
11:24	<annevk>	should be easy to add </p> later
11:25	<annevk>	and _tons_ of other elements that act like that...
11:25	<annevk>	I love </plaintext>
11:27	<annevk>	zcorpan_, assuming the entity table doesn't have ; that should work I suppose
11:28	<zcorpan_>	yeah, the ; is not part of the entity name. we need to revert to the old table and instead have a third column that says which entities always require a ;
11:29	<annevk>	and a fourth that says which entities require that for attribute values...
11:30	<zcorpan_>	that is the same
11:31	<zcorpan_>	unless the next character is [a-zA-Z0-9], in which case all entities require a ;
11:32	<annevk>	how does that cover <a href="&region">&region</a>
11:32	<annevk>	oh right
11:32	<annevk>	interesting
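
The longest-match rule zcorpan_ describes for attribute values can be sketched as follows. This is a rough illustration of the behaviour discussed here, not html5lib's code or the spec's algorithm: the three-entry `ENTITIES` table and the function name `decode_attribute_value` are hypothetical, and the proposed extra column for entities that always require a `;` is left out.

```python
import string

# Hypothetical three-entry table; names are stored without the trailing ";".
ENTITIES = {"amp": "&", "reg": "\u00ae", "AElig": "\u00c6"}
ALNUM = set(string.ascii_letters + string.digits)

def decode_attribute_value(value):
    out = []
    i = 0
    while i < len(value):
        if value[i] != "&":
            out.append(value[i])
            i += 1
            continue
        # Find the longest entity name that matches right after the "&".
        best = None
        for name in ENTITIES:
            if value.startswith(name, i + 1) and (best is None or len(name) > len(best)):
                best = name
        if best is None:
            out.append("&")
            i += 1
            continue
        end = i + 1 + len(best)
        if end < len(value) and value[end] == ";":
            # Properly terminated: emit the entity and skip the ";".
            out.append(ENTITIES[best])
            i = end + 1
        elif end < len(value) and value[end] in ALNUM:
            # Followed by a letter or digit: keep the consumed text literally.
            out.append(value[i:end])
            i = end
        else:
            # Followed by anything else (or end of value): emit the entity.
            out.append(ENTITIES[best])
            i = end
    return "".join(out)
```

With this table, `&region` stays literal because `reg` is followed by `i`, which matches annevk's `<a href="&region">` case, while `&AElig` at the end of a value is expanded and `&AEligX` is not.
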
11:32	<annevk>	what about & as terminating character and ?
11:32	<annevk>	or did you already try it for a big range?
11:34	<zcorpan_>	http://simon.html5.org/test/html/parsing/entities/trailing-semicolon/004.htm
11:35	<annevk>	good stuff :)
11:36	<annevk>	maybe you should use <span> instead of <a> for 003
11:37	<zcorpan_>	span doesn't have a href attribute :)
11:37	<annevk>	use title :)
11:37	<zcorpan_>	the point was to test a URI attribute
11:37	<annevk>	ok
11:37	<zcorpan_>	though i could use # if you don't want a lot of 404s :)
11:38	<annevk>	I suppose that could help
11:39	<zcorpan_>	done
11:40	<zcorpan_>	sent results to the list
11:42	<annevk>	heh, fun that you replied to my message :p
11:42	annevk	goes to fetch some food before it's gone
11:44	<zcorpan_>	i thought it was appropriate as a reply :)
11:47	<hsivonen>	does IE7 support &apos;?
11:48	<zcorpan_>	no
11:48	<hsivonen>	that's weird
11:48	<zcorpan_>	yes
11:48	<zcorpan_>	iirc i filed a bug on that during their "beta" stage
11:49	<hsivonen>	gotta remember to make it a warning
11:49	<zcorpan_>	http://simon.html5.org/test/ie7b2-bugs/014.html
11:49	<zcorpan_>	opera doesn't support &TRADE;
11:50	hsivonen	adds a note in the source
11:50	<zcorpan_>	annevk: is there a bug on that? (can i check that? :P )
12:09	hsivonen	wonders what's the best practice regarding memory allocation for growable buffers in a reusable library class
12:10	<hsivonen>	that is, should I optimize speed and risk memory leaks?
12:10	<hsivonen>	never leak memory and risk speed?
12:10	<hsivonen>	or let the user of the library decide?
12:11	<hsivonen>	annevk: does html5lib ever shrink buffers that grow depending on input? or does Python make these decisions for you?
12:19	<annevk>	zcorpan_, Opera does
12:19	<annevk>	zcorpan_, fetch a newer build now you can ;)
12:20	<annevk>	hsivonen, I'm not competent enough to answer that question
12:20	<annevk>	hsivonen, I can say as much as that we don't have weird constraints anywhere to my knowledge
12:22	<hsivonen>	annevk: ok
13:06	<annevk>	onload is broken in Safari: http://www.howtocreate.co.uk/safaribenchmarks.html ?
13:06	<annevk>	you'd think that if onload is broken pages would be broken as well...
13:38	<Fuzzy76>	I guess "broken" is a subjective term
13:39	<annevk>	it certainly explains the statistics on the safari download page...
13:39	<Fuzzy76>	yes... I've seen several other benchmarks, and none of them showed anything NEAR the numbers from Apple.
14:12	<annevk>	can someone explain to me how "After DOCTYPE public identifier state" and "Before DOCTYPE system identifier state" are different?
14:12	<annevk>	seems like they could be merged
14:13	<annevk>	i'll keep them separate for now...
14:16	Philip`	wonders if there's a reliable way to get multiple asynchronous XMLHttpRequests in flight at once (so the frequency of response arrival can be faster than the round-trip time)
14:29	<annevk>	wtf
14:29	<annevk>	doctype name is no longer uppercase?!
14:29	<annevk>	uppercased*
14:29	<annevk>	this is problematic
14:31	<annevk>	seems to be what Firefox does
14:31	<annevk>	but the amount of testcases that relies on this quirk...
14:35	annevk	fixes tests
14:47	<annevk>	Soonish people should be able to use html5lib to determine whether a page will render in quirks or standards mode
14:49	<mpt>	zcorpan_, what if someone really does want to style the <head> (e.g. head, title {display: block} title {font-size: 2em;})?
14:53	<annevk>	you don't need a scoped style sheet for that
14:55	<Philip`>	(Hmm, I can't fix my problem with XMLHttpRequest, but I can dynamically add <script> elements to the page while cycling through server port numbers so it has one outstanding request per port, since the scripts appear to get loaded asynchronously)
14:56	<Philip`>	(Oh, but they're only asynchronous in Firefox, not Opera, so that won't simply work. But XMLHttpRequest appears to do pipelining in Opera, so I just need to switch between the two methods. And work out what to do for Safari...)
15:00	Philip`	can't quite find what HTML5 says should happen in terms of synchrony when adding a (non-async) <script> to the DOM
15:02	<Philip`>	Oh, looks like it ought to be asynchronous, since the pausing is only done inside the tree construction algorithm
15:07	<zcorpan_>	mpt: if scoped stylesheets are changed to not affect their parent, then you couldn't use a scoped stylesheet for it anyway. and as anne says, you can already do that without scoped stylesheets
15:08	<zcorpan_>	btw, a girl here (at opera) will be implementing an html5 parser in c++
15:08	<Lachy>	IE's cryptic JavaScript error messages are really annoying :-(
15:08	<Lfe>	zcorpan_: i would like her even more if she somehow left out those pluses ^_^
15:09	<zcorpan_>	Lfe: heh
15:09	<Lachy>	I'm writing a test case to test the toUpperCase and toLowerCase functions in JavaScript against the unicode data file
15:10	<Philip`>	Could provide a C API around the C++ implementation, so it's easily embeddable in other languages (like C, or Python ctypes, or whatever)
15:10	<Lachy>	so far, I've identified 3 bugs in Firefox within the first 500 chars (cause it takes far too long to process all 17000)
15:13	<met_>	looks like people are confused by all those storages http://ajaxian.com/archives/firefox-3-sqlite-and-more
15:41	<zcorpan_>	http://simon.html5.org/temp/html5-opera.txt are things that i might write tests for this summer (though probably less than that, that's just a first filtering)
15:42	<zcorpan_>	anyone want me to look at something in particular?
15:50	<hsivonen>	annevk: I haven't implemented anything that is in the tree construction part (yet)
15:59	<annevk>	ah
15:59	<annevk>	i just landed all that's needed to enable quirks mode checking
15:59	<annevk>	someone just has to hook in some flag
16:00	annevk	hopes jgraham can make it look prettier
16:01	<annevk>	and we need to update the DOCTYPE token to handle systemId and publicId in case they are not None
16:02	<zcorpan_>	comments before the doctype don't trigger quirks mode per html5? even bogus comments? iirc this triggers quirks mode in firefox: </ foo ><!doctype html>
16:02	<zcorpan_>	but <? foo ><!doctype html> is standards mode
16:03	<annevk>	</ foo><!doctype html> doesn't in Opera
16:03	<annevk>	doesn't give you a comment token either
16:05	<zcorpan_>	</ foo><!doctype html> is quirks mode in ie7
16:07	<Philip`>	XXX<!doctype html> is standards mode in FF too
16:07	<Philip`>	(unless that > is pushed beyond the first 1024 bytes)
16:08	<annevk>	So Firefox is sniffing before actual parsing?
16:09	<annevk>	Guess that's why it's called "doctype sniffing" here and there
16:09	<zcorpan_>	Philip`: :-O wow, i don't think that was the case before
16:09	<zcorpan_>	annevk: yeah
16:11	<zcorpan_>	Philip`: or perhaps i just didn't test that case
16:11	<hsivonen>	annevk: or you could put JSON nulls in the array for public and system id when not present
16:12	<hsivonen>	annevk: since that handles nicely the cases when only one is absent
16:12	<hsivonen>	annevk: and you need to know which on
16:12	<hsivonen>	e
16:12	<hsivonen>	Philip`: that's weird. IIRC, around Mozilla 1.1 it wasn't like that.
16:12	<Philip`>	It looks like FF must be doing some look-ahead before parsing - compare <!--><!doctype html> vs <!--><!doctype xhtml>
16:12	<annevk>	hsivonen, that's for the tokenizer tests
16:13	<annevk>	hsivonen, I was thinking about the tree construction stage
16:13	<hsivonen>	oh
16:13	<annevk>	maybe I should handle the tokenizer tests first, prolly easier to make testcases too
16:22	<annevk>	hsivonen, should I use None in the tests?
16:23	<hsivonen>	annevk: yes
16:23	<hsivonen>	annevk: I'm assuming that your JSON impl. maps None to JSON null
16:24	<annevk>	I'm talking about the test format
16:24	<hsivonen>	annevk: tree tests?
16:25	<annevk>	tokenizer tests
16:26	<annevk>	http://html5lib.googlecode.com/svn/trunk/testdata/tokenizer/
16:27	<hsivonen>	annevk: JSON null please
16:27	<annevk>	that throws an error somewhere else
16:28	<hsivonen>	annevk: in your JSON to Python mapper?
16:29	<hsivonen>	I'd expect a correct doctype to look like this: ["DOCTYPE", "HTML", null, null, false]
16:29	<annevk>	the last one should be true I think
16:29	<annevk>	as the flag is now "correct"
16:29	<hsivonen>	argh.
16:29	<hsivonen>	annevk: you are right, of course
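
For reference, this is how the DOCTYPE entry under discussion comes out of Python's standard `json` module, with JSON null mapping to `None`. A small sketch only: the variable names (`public_id`, `system_id`, `correct`) are labels chosen here for the five positions, inferred from the conversation rather than taken from the test format's documentation.

```python
import json

# hsivonen's example entry, with the last flag set to true as annevk suggested.
line = '["DOCTYPE", "HTML", null, null, true]'
kind, name, public_id, system_id, correct = json.loads(line)

# Serializing the same values again reproduces the original text.
round_trip = json.dumps([kind, name, public_id, system_id, correct])
```

Using null for each absent identifier keeps the array length fixed, so a test can tell which of the two identifiers is missing when only one is absent.
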
16:30	<Philip`>	More specifically: FF seems to do standards mode if the first 1024 characters from the first non-whitespace character onwards, parsed using quirks mode rules, contains at least one doctype, and the first doctype is a valid HTML one and is not preceded by any non-comment non-text nodes
16:30	<annevk>	it seems all tests were a bit bogus with respect to that
16:30	<Philip`>	(or something roughly like that)
16:31	<zcorpan_>	Philip`: you sure parsing is in quirks mode initially?
16:31	zcorpan_	should be heading home now so he doesn't miss the train
16:32	<Philip`>	I think so - <!--><!doctype html><!--> results in two empty comments, instead of one comment with the text "><!doctype html><!" in it
16:32	<annevk>	ah ok
16:32	<annevk>	I didn't have simplejson and hence I got some simplified json parser that didn't get null
16:32	<zcorpan_>	but is the document standards mode or quirks mode?
16:32	<annevk>	implemented null in it now
16:33	<zcorpan_>	if quirks mode then the parser is initially in standards mode -- otherwise you would have seen the doctype in the pre-parse and switched to standards mode
16:33	<annevk>	Philip`, <!--> should always be a single comment
16:33	<zcorpan_>	i might have written something about this at sitepoint forums at some point
16:34	<zcorpan_>	anyway, i'm leaving now
16:34	zcorpan_	waves
16:34	<Philip`>	If I do <!doctype html><!--><!doctype html><!--> then it is CSS1Compat and it says "#comment: ><!doctype html><!"
16:35	<Philip`>	If I do <!--><!doctype html><!--> then it is BackCompat and it says "#comment","#comment"
16:36	<Philip`>	so... it's parsing in standards mode, not finding the doctype, then re-parsing in quirks mode (and finding the doctype but not changing mode)?
16:37	<annevk>	I have updated some of the tests
17:38	<annevk>	I fixed all the DOCTYPE tests and the tokenizer part of the implementation. I also added some more tokenizer tests to cover PUBLIC and SYSTEM ids.
17:45	<rubys>	annevk: you've been busy! :-)
17:46	<annevk>	yeah, I feel a bit sorry for the ruby project
17:46	<rubys>	nah, won't be hard to catch up, the diffs on the python code point the way.
17:46	<annevk>	cool
17:47	<annevk>	There are still some things to implement such as proper DOCTYPE tokens
17:47	<rubys>	i'd like to wait to resync until you slow down...
17:48	<annevk>	I'm about to go home
17:48	<annevk>	so go ahead :)
17:48	<rubys>	cool, and I see the python tests are passing, which is a good sign.
17:52	<annevk>	yeah, I fixed the tests along with the implementation although 3 are still failing
17:52	<annevk>	I hope Thomas can fix that
17:52	<rubys>	i don't see any failing... which ones fail for you?
17:52	<annevk>	some sanitize and serializer tests
17:53	<rubys>	I just tried again... no failures.
17:53	<annevk>	hmm ok
17:54	<annevk>	maybe I'm missing something
18:36	<annevk>	jgraham, have you looked at handling "comments" within RCDATA and CDATA blocks?
18:36	<annevk>	jgraham, seems like we need some character buffer
19:18	<jgraham>	annevk: I was going to ask you the same thing :)
19:18	<jgraham>	I haven't, yet
19:18	<jgraham>	As I've been a bit busy
19:18	<jgraham>	I was happy to see all your checkins today though
19:54	<zcorpan_>	Philip`: yes, exactly
19:59	<Jero>	off topic: what do you guys think of this design? http://jero.net/lab/redesign2/
20:09	<zcorpan_>	Jero: looks a bit like a standard template for a blog
20:11	<Jero>	well, it is a blog ;), but i see what you mean
20:12	<Jero>	I'll probably have to dust off my Photoshop skills and try to come up with something original
20:24	<Hixie>	annevk: we could define almost standards mode, but there'd be absolutely no detectable conformance criteria in the spec for it :-/
20:24	<Hixie>	hsivonen: i thought the table _was_ sorted
20:34	<Jero>	for all those who care: I now implemented the entire tokenization and tree construction algorithms in my HTML5 parser in PHP (http://jero.net/lab/ph5p/)
20:34	<Jero>	now to get rid of those bugs (http://jero.net/lab/ph5p/tests.html)
20:35	<Jero>	(and optimizing might not be such a bad idea)
20:39	<rubys>	jero: have you taken a look at html5lib's testdata directory?
20:40	<Jero>	not recently, but the tests in my tests.html file are from the first test file
20:51	<Hixie>	annevk: defined almost standards mode
21:28	<Hixie>	jesus, <nobr> is wacked in html parsers
21:28	<Hixie>	how are we gonna handle _that_
21:29	<Dashiva>	Compared to what parsers?
21:32	<Hixie>	how do you mean?
21:33	<Dashiva>	Just wondered if there was somewhere it wasn't wacked, since you qualified the statement like that
21:38	<Hixie>	Dashiva: oh well like xml parsing
22:09	<Hixie>	hsivonen: yt?
22:46	<Philip`>	Hmph, now I need to do canvas text rendering :-(
23:03	<Hixie>	http://lists.w3.org/Archives/Public/public-html/2007Jun/0375.html
23:03	<Hixie>	o_O
23:53	Hixie	wonders how it is "frequently unclear whether a suggestion is aimed at the language definition or at the browser behaviour specification"
23:53	<Hixie>	isn't it pretty obvious?
23:55	<jgraham>	Hixie: From Phillip's mail to public-html? That whole email made no sense to me.
23:55	<Hixie>	i have to say i generally understand mails to whatwg a whole lot better than those to p-h
23:56	<Hixie>	it's kinda annoying since it makes it harder for me to deal with the p-h ones
23:57	<jgraham>	whatwg (still) seems to be where all the work is getting done... maybe we can change the tag line to "Putting the 'work' into working group" or somesuch ;)
23:58	<Hixie>	no comment