#whatwg on 2008-07-28

09:28	<melvster>	Hi All, looking at example 2.1 http://dev.w3.org/html5/html-author/#html-syntax I am wondering if the <p> should be closed?
09:29	<hsivonen>	melvster: no need to close it there
09:29	<melvster>	is it because it's the last node?
09:30	<hsivonen>	yes, the last p node in a container doesn't need to be closed
09:30	<melvster>	hsivonen: OK thanks!
09:31	<zcorpan>	melvster: "A p element's end tag may be omitted if the p element is immediately followed by an address, article, aside, blockquote, datagrid, dialog, dir, div, dl, fieldset, footer, form, h1, h2, h3, h4, h5, h6, header, hr, menu, nav, ol, p, pre, section, table, or ul, element, or if there is no more content in the parent element."
09:32	<melvster>	zcorpan: thanks again, i should have read that first
09:32	<zcorpan>	no worries :)
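
The rule zcorpan quotes can be read as a simple lookup. The following is a minimal sketch, assuming the caller already knows what follows the p element; the function name and the convention of passing None for "no more content in the parent element" are illustrative assumptions, and the element list is copied from the quoted sentence.

```python
# Element names copied from the sentence quoted in the log (2008 draft wording).
P_END_TAG_OMISSIBLE_BEFORE = frozenset("""
    address article aside blockquote datagrid dialog dir div dl fieldset
    footer form h1 h2 h3 h4 h5 h6 header hr menu nav ol p pre section
    table ul
""".split())


def p_end_tag_may_be_omitted(next_sibling):
    """Return True if </p> may be left out.

    next_sibling is the tag name of the element immediately after the p,
    or None when there is no more content in the parent element
    (hypothetical calling convention). Anything else, such as text,
    should be passed as a name that is not in the list, e.g. "#text".
    """
    if next_sibling is None:
        return True
    return next_sibling.lower() in P_END_TAG_OMISSIBLE_BEFORE
```

This only models the syntax rule; the browser compatibility problems zcorpan lists for article, form, hr and table are a separate concern.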
09:33	<zcorpan>	though <p><article> has a bad back compat story
09:34	<zcorpan>	and <p><form> is buggy in some browsers
09:34	<zcorpan>	<p><hr> in ie too iirc
09:34	<zcorpan>	and <p><table>
09:35	<hsivonen>	which RFC defines the syntax for charset names?
09:35	hsivonen	should keep better notes in source comments
09:35	<zcorpan>	<p>x<hr> is different in ie
09:39	<hsivonen>	http://tools.ietf.org/html/rfc2978 apparently
09:51	<zcorpan>	hmm, the bail-out list could be different for mathml and svg so that <svg><font> is allowed but <math><font> not
09:53	<hsivonen>	Hixie: is it safe to refer to http://www.whatwg.org/style/specification from elsewhere? is that URI expected to point to a "Working Draft" style?
09:53	<hsivonen>	Hixie: or should I copy the style sheet?
09:53	<hsivonen>	(I'm assuming the style sheet is covered by the WHATWG document license)
09:54	<zcorpan>	could also add <a>, <script> and <style> to the mathml list
09:54	<Hixie>	it's covered by whatever license you want
09:54	<Hixie>	i make no promises about not changing it
09:54	<Hixie>	it's not working draft vs any other kind of spec though
09:55	<Hixie>	iirc i have a class on the body element to decide the 'working draft' banner
09:55	<zcorpan>	yep
09:55	<hsivonen>	Hixie: ok.
09:55	<zcorpan>	Hixie: what do you think about different bail-out lists (see above)?
09:55	<hsivonen>	(now that I think about it, I already saw zcorpan use the class on body)
09:56	<hsivonen>	zcorpan: one possibility is bailing out on font only if it has the kind of attributes presentational HTML has
09:58	<zcorpan>	hsivonen: oh yep didn't think of that
09:58	<zcorpan>	not sure which is better
09:59	<Hixie>	zcorpan: i'm not yet convinced we can add <svg><font>, need to study that further
10:00	<hsivonen>	Hixie: we could also require a <defs> context
10:00	<Hixie>	that would be profiling svg in a more weird way
10:00	<Hixie>	it's one thing to say "you can't use font"
10:00	<Hixie>	it's another to say "you can use font in these specific cases..."
10:01	<hsivonen>	Isn't <font> in practice always used as a child of <defs> in non-contrived SVG?
10:03	<zcorpan>	Hixie: doesn't not bailing on <script> break some pages with <math><script> ?
10:03	<zcorpan>	(or style/a)
10:04	<hsivonen>	Hixie: could you please add ids to the paragraphs starting with "In the foo state," and giving the conformance reqs for the shape attribute states for image map area?
10:04	<hsivonen>	s/shape/coords/
10:04	<Hixie>	hsivonen: not to my knowledge, why would you bother with the <defs>?
10:04	<Hixie>	if you want changes, send mail
10:05	<Hixie>	i'm not near the editor right now
10:05	<hsivonen>	Hixie: I'm just generalizing from the stuff Philip` found in Wikipedia
10:05	<hsivonen>	Hixie: ok. I'll send mail
10:05	<Hixie>	thx
10:06	<Hixie>	i think basing it on attributes would be better than on context, given that the whole point is to bail if someone does something stupid
10:06	<Hixie>	they're more likely to put an html font after a bunch of random svg copied and pasted, than to put svg font attributes on an html font element
10:07	<zcorpan>	Hixie: makes sense
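
The attribute-based idea from hsivonen and Hixie could be sketched as below. This is an illustration, not spec text: the function name is made up, and the set of "presentational HTML" attributes is assumed to be color, face and size, the attributes of the HTML font element.

```python
# Assumption: these are the attributes that mark a font element as the
# presentational HTML kind rather than an SVG font.
HTML_FONT_ATTRS = frozenset({"color", "face", "size"})


def font_breaks_out_of_foreign_content(tag, attrs):
    """Decide whether a start tag seen inside <svg> or <math> should bail out.

    tag is the tag name; attrs is any iterable of attribute names
    (a dict of name -> value works too). Only the font case is modelled.
    """
    if tag.lower() != "font":
        return False
    return any(name.lower() in HTML_FONT_ATTRS for name in attrs)
```

Keying on attributes rather than on a defs ancestor matches Hixie's point that a stray HTML font pasted after SVG markup is the likely mistake.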
10:07	<Hixie>	oh btw someone sent me mail about a major bug in the parser that i need to fix
10:07	<Hixie>	basically the generic cdata element parsing algorithm thingy totally doesn't work with document.write()
10:08	<Hixie>	consider <script>document.write("<style>a");document.write("b</style>")</script>
10:08	<Hixie>	(or worse, nested <script> elements)
10:08	hsivonen	fires up his GWT test harness...
10:08	<Hixie>	so i'm going to split the cdata algorithm into its own state
10:09	<Hixie>	instead of being a tokeniser pull, bring it in line with everything else (tokeniser push)
10:09	<zcorpan>	Hixie: why doesn't that work?
10:09	<Hixie>	is there anything else that pulls from the tokeniser at this point?
10:09	<hsivonen>	Hixie: no. (and I've already implemented everything as push)
10:10	<Hixie>	zcorpan: because at the end of the document.write() input stream the tokeniser is stopped and the tree construction stage is exited, so you lose the fact that you're in the middle of a pulling step
10:10	<zcorpan>	Hixie: ah
10:10	<Hixie>	hsivonen: did you implement the generic cdata thing as having a new variable to preserve the source state?
10:10	<Hixie>	hsivonen: or?
10:11	<hsivonen>	Hixie: I implemented it as a flag in the tree builder
10:11	<hsivonen>	Hixie: and it appears that your example above breaks it :-(
10:11	<Hixie>	ah, somewhat like the earlier split of insertion modes vs states?
10:11	<Hixie>	oh?
10:12	<hsivonen>	at least when using WebKit as the engine in GWT, "b" never ends up inside the style element
10:12	<hsivonen>	it's completely lost somewhere
10:12	<Hixie>	fun
10:12	<Hixie>	if you replace style with script the problem becomes worse
10:12	<Hixie>	because the element is added with the end tag, not the start tag
10:12	<Hixie>	so you end up losing the element altogether in a naive push implementation
10:14	<Philip`>	<script>document.write('<style></sty');document.write('le>')</script> - how would that work with the tokeniser upon seeing the "</", since the "[if] the next few characters do not match the tag name of the last start tag token emitted" condition wouldn't make sense at that point?
10:14	<hsivonen>	I'm trying to review what exactly I'm doing but Eclipse beachballs on me
10:14	<Philip`>	Wait, do I mean <style>?
10:15	<Philip`>	Oh, yes, I think I do
10:15	<Hixie>	Philip`: yeah, i noticed the same problem with the <![CDATA and <!DOCTYPE tokenising
10:16	<Hixie>	Philip`: but that's easy to fix, you just say that the tokeniser stops when it's missing data to resolve an ambiguous state and wave your hand and move on
10:16	<Hixie>	Philip`: "implementation detail"
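
Both examples (the style content split across two document.write() calls, and the end tag split as "</sty" plus "le>") come down to the tokeniser keeping its own state between writes. Here is a minimal push-style sketch, assuming the start tag has already been seen and that only an exact, ASCII case-insensitive `</name>` closes the element; the class and token names are invented for illustration and this is not the algorithm from the spec.

```python
import string

_ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


class ResumableCdataTokeniser:
    """Tokenises the content of one CDATA-like element across several writes."""

    def __init__(self, tag_name):
        self.name = tag_name.translate(_ASCII_LOWER)
        self.closer = "</" + self.name + ">"
        self.pending = ""    # ambiguous tail held back until more data arrives
        self.closed = False

    def write(self, chunk):
        data = self.pending + chunk
        self.pending = ""
        if self.closed:
            # Past the end tag: hand everything back for normal tokenising.
            return [("rest", data)] if data else []
        lowered = data.translate(_ASCII_LOWER)  # length-preserving
        tokens = []
        pos = lowered.find(self.closer)
        if pos != -1:
            if pos > 0:
                tokens.append(("chars", data[:pos]))
            tokens.append(("end_tag", self.name))
            self.closed = True
            rest = data[pos + len(self.closer):]
            if rest:
                tokens.append(("rest", rest))
            return tokens
        # No complete end tag yet. Hold back the longest suffix that could
        # still turn into the end tag once the next write arrives.
        keep = 0
        for n in range(min(len(self.closer) - 1, len(data)), 0, -1):
            if self.closer.startswith(lowered[-n:]):
                keep = n
                break
        split = len(data) - keep
        if split > 0:
            tokens.append(("chars", data[:split]))
        self.pending = data[split:]
        return tokens
```

Because the state lives in the object rather than in a suspended pull loop, stopping after one write and resuming on the next loses nothing. A real implementation would also need to flush `pending` at end of input.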
10:16	<hsivonen>	Hixie: here's what I do:
10:16	<hsivonen>	1) Everything is tokenizer push
10:16	<hsivonen>	2) tree builder has a variable called cdataOrRcdataTimesToPop
10:16	<Philip`>	Hixie: Can't you do the same hand-waving in the generic CDATA whatnots, then?
10:17	<Hixie>	Philip`: no, because when you abort that tree construction stage you return the previous one, which is in the middle of doing the cdata processing
10:17	<hsivonen>	3) If the spec calls for pushing the head element on stack first, cdataOrRcdataTimesToPop is set to 2. else, it is set to 1
10:17	<hsivonen>	4) endTag pops cdataOrRcdataTimesToPop times
10:18	<hsivonen>	and zeros cdataOrRcdataTimesToPop
10:18	<Philip`>	Ah
10:18	<Hixie>	ah
10:18	<Hixie>	that won't work :-)
10:18	<Hixie>	but makes sense given the spec today
10:18	<hsivonen>	5) if cdataOrRcdataTimesToPop > 0, characters just accumulate and returns early without inspecting insertion mode
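
Steps 1 to 5 as hsivonen lists them can be sketched roughly as follows. Only the counter name comes from the log (rendered in Python style); the class, method names and the list-based stack are assumptions. As Hixie says above, this bookkeeping fits the spec as written at the time but does not survive the document.write() case.

```python
class TreeBuilderSketch:
    """Toy model of the cdataOrRcdataTimesToPop bookkeeping."""

    def __init__(self):
        self.stack = ["html"]                    # stack of open elements
        self.cdata_or_rcdata_times_to_pop = 0    # step 2
        self.text = []                           # characters gathered in CDATA/RCDATA
        self.insertion_mode_lookups = 0          # stands in for normal processing

    def start_cdata_element(self, name, push_head_first=False):
        # Step 3: two pops are owed if head had to be pushed first, else one.
        if push_head_first:
            self.stack.append("head")
            self.cdata_or_rcdata_times_to_pop = 2
        else:
            self.cdata_or_rcdata_times_to_pop = 1
        self.stack.append(name)

    def characters(self, data):
        # Step 5: while the counter is positive, accumulate and return early
        # without inspecting the insertion mode.
        if self.cdata_or_rcdata_times_to_pop > 0:
            self.text.append(data)
            return
        self.insertion_mode_lookups += 1

    def end_tag(self, name):
        # Step 4: pop that many times, then zero the counter.
        for _ in range(self.cdata_or_rcdata_times_to_pop):
            self.stack.pop()
        self.cdata_or_rcdata_times_to_pop = 0
```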
10:19	<Hixie>	anyway dunno when i'll fix this, i expect it's in the coming few weeks though
10:19	<hsivonen>	It bothers me that I don't know what happened to "b" in the GWT case
10:19	<Hixie>	i've been avoiding the parser folder because i've been hoping the svgwg will fix the issues you, takkaria, and myself raised with their proposal
10:20	<Hixie>	but i guess eventually i'll go in and deal with it
10:21	<hsivonen>	Hixie: well, both takkaria and I said we'd prefer your/zcorpan's suggestion
10:21	<Hixie>	apparently <datagrid> is the next target
10:21	<Hixie>	hsivonen: zcorpan claims the svgwg proposal is as much his as the current spec's :-P
10:23	<Hixie>	ok bed time nn
12:59	<hsivonen>	Does HTML5 define where LWS is really allowed in the http://tools.ietf.org/html/rfc2045#section-5.1 syntax for Web purposes?
13:33	<gsnedders>	hsivonen: no
13:35	<hsivonen>	gsnedders: OK. thanks.
13:35	<hsivonen>	gsnedders: do you happen to document it for HTTP?
13:35	<gsnedders>	(and I don't know either)
13:35	<hsivonen>	it appears I have made up a definition then
13:35	<hsivonen>	I'll just write that down in my spec
13:35	<gsnedders>	I'm still (occasionally) working on the overall syntax of the entire HTTP structure
13:35	<gsnedders>	Not got to anything so exact as parsing actual headers :P
13:48	<hsivonen>	http://hsivonen.iki.fi/html5-datatypes/ comments welcome
13:50	<Philip`>	hsivonen: s/hecking/checking/
13:51	<hsivonen>	Philip`: thanks
13:55	<hsivonen>	explain to me, then, how members of Congress can sit for hours on end in a hearing without going to the toilet
13:59	<zcorpan>	is this valid? <img usemap=# src=x><map name>
14:01	<hsivonen>	zcorpan: no, the name attribute must be non-empty
14:01	<zcorpan>	hsivonen: ah
14:01	<hsivonen>	zcorpan: but according to the datatype lib, usemap=# is valid (checking referential integrity happens elsewhere)
14:02	<zcorpan>	ok
14:18	<hsivonen>	hendry: did you get an instance of the CSS validator running? If yes, under which servlet container?
14:38	<zcorpan>	hmm, why does name allow whitespace
14:39	<hsivonen>	zcorpan: in validator or in spec?
14:39	<zcorpan>	hsivonen: in spec
14:39	<zcorpan>	at least if id is not present
14:43	<hsivonen>	I've now postponed rel checking well over a year.
14:43	<hsivonen>	I wonder if the rel stuff is still at risk...
14:44	<hsivonen>	at least the registry was discussed relatively recently on public-html
14:59	<Lachy>	I finally found some time to review the SVG WG's proposal. Personally, I'm not particularly fond of it
15:01	<Lachy>	there isn't really sufficient justification for some of the requirements it tries to address, beyond keeping it theoretically-pure-well-formed XML
15:07	<zcorpan>	should we add alt to embed? apparently opera supports it
15:12	<hsivonen>	zcorpan: would it be rendered when the plugin isn't installed?
15:12	<zcorpan>	hsivonen: yes
15:15	<hsivonen>	hmm. Flash is supposed to be accessible in itself. Video plug-ins are supposed to get superseded by <video>. Apart from Silverlight, newer plugins tend to be non-rendered and provide JS APIs
15:16	<hsivonen>	like Gears or the Garmin plugin for integrating GPS devices
15:17	<hsivonen>	it seems to me that the use case would be customizing the "Boohoo. Go install a plugin." message that e.g. Firefox generates as UI.
15:18	<zcorpan>	i guess
15:21	<zcorpan>	hsivonen: s/strings match/strings that match/
15:21	<zcorpan>	hsivonen: might squeeze in a "the" in there too
15:22	<hsivonen>	zcorpan: fixed, thanks
15:22	<zcorpan>	hsivonen: what is xml-name used for?
15:23	<hsivonen>	zcorpan: it's used for XHTML 1.0 backports. it probably shouldn't be in the lib in theory, but putting it there is convenient for me
15:23	<zcorpan>	hsivonen: ok
15:55	gsnedders	wonders whether to do something that'll make him unpopular with many around here: serve XHTML as text/html
15:57	<gsnedders>	Actually, I can just do this in Ruby, and use a pre-existing HTML parser!
15:57	<gsnedders>	Yay!
17:52	Philip`	discovers that if he makes a Jabber client send namespace-ill-formed XML to a group chat, then ejabberd propagates it to all the other clients and they detect the error and disconnect
17:53	<Philip`>	and when they reconnect and rejoin the group, the server helpfully sends the past messages to the newly-joining clients, which breaks them again
17:54	<gsnedders>	Hahahaha.
17:54	<gsnedders>	Awesome.
17:55	<gsnedders>	I think something is wrong. Fx fails all of the HTTP parsing tests
17:56	<Philip`>	I'd be more concerned if it passed them all
17:58	<gsnedders>	Yes, but it doesn't even try running them.
17:58	<gsnedders>	Which is why it claims to fail them all.
17:58	<gsnedders>	expected_xhr.onreadystatechange is never hit :\
17:59	<gsnedders>	Interesting.
17:59	<gsnedders>	Opera has changed behaviour with HTTP/0.9
18:00	<gsnedders>	Got status code 0, expected 200
18:00	<gsnedders>	Got status text , expected OK
18:01	takkaria	chuckles at Philip` and his XML games
18:01	<Philip`>	(It works against individuals by sending normal messages too, but the server doesn't appear to resend them after the first time)
18:02	<jmb>	Philip`: that's pretty nasty :)
18:04	<gsnedders>	Why is onreadystatechange never called?
18:16	<gsnedders>	readyState is getting changed :\
18:19	<Philip`>	https://support.process-one.net/browse/EJAB-680
18:19	<takkaria>	Philip`: would you be able to give me some statistics on the number of pages which include CRs and NULs in attribute values?
18:21	<gsnedders>	Philip`: Also, could you see if you have any pages that start with "HTTP" case-insensitively but not case-sensitively?
18:23	<Philip`>	takkaria: Maybe - I guess it should be easy to modify hsivonen's tokeniser to detect that
18:23	<Philip`>	takkaria: Do you care how many times per page it occurs, or just how many pages it occurs >= 1 time on?
18:23	<takkaria>	Philip`: the latter
18:23	<Philip`>	gsnedders: By "pages", do you mean the body of the page (i.e. after parsing and stripping the HTTP headers)?
18:24	<gsnedders>	Philip`: No, I mean the entire response
18:24	<gsnedders>	(i.e., what the response-line begins with)
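
The check gsnedders is asking about needs the raw response bytes, which is why Philip`'s parsed data cannot answer it. Assuming those bytes were available, a minimal sketch would be (the function name is hypothetical):

```python
def starts_with_http_only_case_insensitively(raw: bytes) -> bool:
    """True if the response begins with "HTTP" in some casing other than all caps."""
    head = raw[:4]
    # bytes.upper() only touches ASCII letters, so this is a plain ASCII
    # case-insensitive comparison.
    return head.upper() == b"HTTP" and head != b"HTTP"
```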
18:24	<Philip`>	gsnedders: In that case, no
18:24	<gsnedders>	or status-line, or whatever it's called
18:24	<Philip`>	gsnedders: since I didn't save the raw response bytes, only the parsed representation
18:25	<gsnedders>	Philip`: Ah.
18:25	<Philip`>	gsnedders: (since I couldn't see a way to make HttpClient return the raw response bytes)
18:25	<gsnedders>	Philip`: Write your own! :P
18:25	<Philip`>	gsnedders: My own HTTP client? I can't do that until someone's written a proper spec on how to write one :-p
18:26	<gsnedders>	Philip`: Oh, all I'm writing is how to parse the response/request. You don't need to do either of those :P
18:50	<Philip`>	takkaria: It looks like about 10% have \r in attribute values somewhere, and about 0% have \0
18:50	<Philip`>	and of those 0%, most are JPEG and PDF files
18:51	<Philip`>	Wait a minute, I'll change it to only look at text/html...
18:53	<Philip`>	http://www.slovanova.sk/ - aha, actual HTML with a \0
18:55	Philip`	waits ten minutes while it searches through all the other files
18:57	<takkaria>	10% is an interestingly high value
19:09	<Philip`>	takkaria: From something like 126989 text/html pages in total:
19:09	<Philip`>	16 \0 in attribute value
19:09	<Philip`>	10622 \r in attribute value
19:09	<Philip`>	47 \r\n in attribute value
19:10	<Philip`>	(Those "\r\n" are slightly bogus - it should have aborted after finding the first "\r", but I didn't detect \rs that came after entities and got unconsumed, so it didn't notice until it got to the \n)
19:12	<Philip`>	takkaria: http://philip.html5.org/data/attr-chars.txt lists them all
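
A per-page count of this kind ("how many pages it occurs >= 1 time on") can be sketched with Python's standard html.parser. This is a stand-in for illustration: Philip` modified hsivonen's tokeniser, and html.parser is not an HTML5 tokeniser, so edge cases will differ. The class and function names and the "nul"/"cr" keys are assumptions.

```python
from html.parser import HTMLParser


class AttrCharScanner(HTMLParser):
    """Records which of NUL and CR appear in any attribute value of one page."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.found = set()

    def handle_starttag(self, tag, attrs):
        for _name, value in attrs:
            if value is None:       # attribute without a value
                continue
            if "\0" in value:
                self.found.add("nul")
            if "\r" in value:
                self.found.add("cr")


def count_pages(pages):
    """Count pages (not occurrences) that have each character in an attribute value."""
    counts = {"nul": 0, "cr": 0}
    for markup in pages:
        scanner = AttrCharScanner()
        scanner.feed(markup)
        scanner.close()
        for key in scanner.found:
            counts[key] += 1
    return counts
```

The markup has to be decoded without newline translation, otherwise the carriage returns are gone before the scanner sees them. For scale, 10622 out of roughly 126989 text/html pages is about 8.4%.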
21:25	<weinig>	Hixie: ping
21:40	gsnedders	still needs a decent idea for his computing project for this year for school :\
21:48	<gsnedders>	40 hours project, with 20 hours for impl.
21:51	<Philip`>	A day of coding? That's not much :-p
21:51	<gsnedders>	You aren't expected to do it in one day :P
21:52	<Philip`>	That doesn't demonstrate much dedication
21:52	<gsnedders>	You're only expected to have an hour of class time per day five days a week
21:53	<gsnedders>	Of course, I'm doing it out of class, so I have none :P
22:48	gsnedders	would like to do something in Haskell or C/C++
22:48	<gsnedders>	But I can't decide what :P
22:49	<gsnedders>	Anybody got any suggestions?
22:53	<jgraham>	gsnedders: Well learning Haskell might be a 20 hour project on its own. Not much to hand in though
22:53	<gsnedders>	jgraham: That's problematic, especially when it counts for 40% of the final grade.
22:54	<jgraham>	You could implement a domain-specific language for something
22:54	<gsnedders>	Writing an html5 parser is too big, probably
22:54	<jgraham>	gsnedders: an html5 would probably take longer, yes
22:55	<jgraham>	http://effbot.org/zone/simple-top-down-parsing.htm seems like a nice article about language parsing
22:55	<jgraham>	s/an html5/an html5 parser/
22:56	<gsnedders>	jgraham: Yeah, writing HTML 5 would take a while :P
22:56	<jgraham>	Well if you manage it in 20 hours Hixie will look a bit silly :)
22:58	<gsnedders>	hmmm.
23:09	<gsnedders>	I could do something based on trying to detect spam
23:12	<jgraham>	gsnedders: I have a problem that I actually would like a solution to but don't currently have time to implement
23:12	<gsnedders>	jgraham: What problem? :P
23:15	<jgraham>	Many scientists use the arXiv preprint servers to keep up to date with current research. They basically provide a daily list of new preprints ordered by submission time. They are broadly categorised but only in astrophysics / high energy physics / computer science / etc. much broader than the expertise of most readers
23:15	<jgraham>	The time ordering creates two problems. One is that it is hard to find things that you are interested in, especially if they appear down the list somewhere
23:16	<jgraham>	The second is that papers appearing near the top of the listings tend to be noticed more and get more citations --- this is a measured effect
23:17	<jgraham>	What i want is an interface to the preprint server where the day's listings are ordered according to my personal reading habits
23:17	<jgraham>	These would be determined automatically using some sort of machine learning algorithm
23:19	<jgraham>	Basically the way I imagine it working is that when you click on a paper the keywords from that paper (authors, title, abstract text) are added to some weighting which increases the probability of papers with similar authors, titles or abstracts appearing at the top
23:20	<jgraham>	This is basically just a spam classification problem except that you're trying to pick out the most useful items and present them first
23:20	<jgraham>	Rather than discard the least useful items
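
The weighting jgraham describes can be sketched as a running keyword count. This is a deliberately naive stand-in for the "some sort of machine learning algorithm" part, and the shape of a paper record (a dict with title, abstract and authors) and all names are assumptions made for the example.

```python
import re
from collections import Counter


def keywords(paper):
    """Lower-cased words from the title, abstract and author names."""
    text = " ".join([paper["title"], paper["abstract"], " ".join(paper["authors"])])
    return set(re.findall(r"[a-z0-9]+", text.lower()))


class ReadingProfile:
    def __init__(self):
        self.weights = Counter()

    def clicked(self, paper):
        # Clicking a paper adds one to the weight of each of its keywords.
        self.weights.update(keywords(paper))

    def score(self, paper):
        return sum(self.weights[word] for word in keywords(paper))

    def rank(self, listing):
        # Highest score first; sorted() is stable, so papers with equal
        # scores keep their original (submission) order.
        return sorted(listing, key=self.score, reverse=True)
```

Usage would be to call `clicked()` whenever the reader opens a paper and `rank()` on each day's listing. Nothing is discarded, which is the difference from spam filtering that jgraham points out.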
23:20	<Hixie>	weinig\|kaphine: pong
23:21	<jgraham>	gsnedders: I don't know how easy writing the actual machine learning bit would be but you might be able to find library code for that
23:22	<jgraham>	In fact I know you can because I looked when I first thought about this problem
23:22	<weinig>	hey Hixie
23:22	<gsnedders>	jgraham: It'd be nice to do that with feeds, to do it in more generic form
23:23	<weinig>	Hixie: I as curious if you were considering specifying the rules for HTML entity error recovery at some point in HTML5?
23:23	<weinig>	s/as/am/
23:23	<jgraham>	gsnedders: Sure, although that's not the problem that I'm interested in :)
23:24	<gsnedders>	jgraham: The logic used to detect whether an article is of interest or not is the same, though
23:24	<Hixie>	weinig: HTML entity error recovery?
23:24	<gsnedders>	And that, I expect, is the hard part.
23:24	<weinig>	Hixie: something akin to the issue described in https://bugs.webkit.org/show_bug.cgi?id=4948
23:25	<jgraham>	gsnedders: Yes, I agree that it's essentially the same problem
23:25	<Hixie>	weinig: that's all already defined
23:25	<Hixie>	weinig: note that it differs from attributes and in body text (the spec handles that too)
23:25	<weinig>	Hixie: it is? great!
23:25	<gsnedders>	jgraham: It's just a matter of plugging in the data source
23:25	<weinig>	couldn't find it in the text
23:25	<weinig>	will look again
23:26	<Hixie>	weinig: just start from the data state in the tokeniser (or the attribute value state in the tokeniser)
23:26	<Hixie>	weinig: and pretend you are parsing each of those cases
23:26	weinig	nods
23:26	<franksalim>	gsnedders, jgraham: you would just need to generate a feed from arXiv if there isn't one already
23:26	<Hixie>	should all be hyperlinked properly
23:27	<jgraham>	franksalim: I'm pretty sure there is
23:30	<jgraham>	Although curiously it seems to be different to the web page