#whatwg on 2007-07-03

00:24	<rubys>	jgraham: ping?
00:27	<Hixie>	if you have what you think is a tree, in the form of a list A of mappings from one node to a list of nodes all of which are in list A
00:27	<Hixie>	is there a way short of walking the entire tree to verify that the list is indeed a tree and that there are thus no loops?
00:29	<othermaciej_>	there probably is, based on what graph properties make the graph a tree
00:29	<othermaciej_>	to be a tree you need to be not just cycle-free but also have exactly one directed edge pointing to each node (except the root)
00:29	<Hixie>	i guess i don't mean a tree, i mean a directed graph
00:30	<othermaciej_>	directed acyclic graph?
00:30	<Hixie>	right
00:30	<Hixie>	basically i have a list of table cells, each of which can be the header (through headers="") for zero or more other cells, and each of which can have zero or more header cells for itself
00:30	<Hixie>	but there mustn't be any loops
00:31	<othermaciej_>	let me look it up in my CLR
00:31	<Hixie>	i mean i'll do the full walk if there's no quicker way
00:31	<Hixie>	(memory is no object)
00:31	kingryan	thinks that's the only way
00:32	<kingryan>	you might be able to cache some of it, though
00:32	<othermaciej_>	I don't even know what you mean by "full walk"
00:32	<othermaciej_>	you'd have to walk every possible path, not just visit every node once
00:32	<othermaciej_>	if you are really brute forcing it
00:32	<Hixie>	yeah
00:32	<othermaciej_>	you'd have to show all paths through the graph terminate
00:34	<othermaciej_>	Hixie: iteratively removing nodes with no outgoing edges is one way
00:35	<Hixie>	ok screw this. i don't HAVE to check that headers="" don't form loops
00:35	<othermaciej_>	Hixie: you'd want a hashtable from node to nodes it points to, and one the other way
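othermaciej_'s sink-peeling approach repeatedly removes nodes that have no outgoing edges. It keeps one map from each node to the nodes it points to, and a second map for the reverse direction. A minimal Python sketch follows; it is not Hixie's code. It assumes the graph is a dict mapping each cell ID to the IDs it names in headers="", and `has_cycle_by_peeling` is a hypothetical name. Any node that never becomes a sink either lies on a cycle or leads into one.

```python
from collections import deque


def has_cycle_by_peeling(graph):
    """Return True if the directed graph contains a cycle.

    `graph` maps each node to the nodes it points to (an assumed
    representation, e.g. cell id -> ids named in its headers="").
    """
    # Collect every node, including ones that only appear as targets.
    nodes = set(graph)
    for targets in graph.values():
        nodes.update(targets)

    # The forward map gives out-degrees. The reverse map ("one the other
    # way") tells us whose out-degree drops when a node is removed.
    out_degree = {n: 0 for n in nodes}
    reverse = {n: [] for n in nodes}
    for source, targets in graph.items():
        for target in targets:
            out_degree[source] += 1
            reverse[target].append(source)

    # Start from every sink (a node with no outgoing edges).
    queue = deque(n for n in nodes if out_degree[n] == 0)
    removed = 0
    while queue:
        node = queue.popleft()
        removed += 1
        for source in reverse[node]:
            out_degree[source] -= 1
            if out_degree[source] == 0:
                queue.append(source)

    # Any node never removed lies on, or leads into, a cycle.
    return removed != len(nodes)
```

For example, `has_cycle_by_peeling({"td1": ["th1"], "th1": ["td1"]})` returns `True`. A plain chain such as `{"td1": ["th1"], "th1": []}` returns `False`.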
00:35	<Hixie>	at least not in the first pass
00:36	<kingryan>	Hixie: you only need to check them if you're going to be walking them (check to avoid inf. loops)
00:36	<Hixie>	yeah
00:36	<Philip`>	I think you could do a topological sort
00:36	<Hixie>	which i don't
00:36	<Philip`>	which'll tell you if it's got any cycles
00:36	<Hixie>	but i was hoping to be able to see how many pages had that problem
00:36	<othermaciej_>	Philip`: I'm not sure the obvious topological sort algorithms will terminate in finite time
00:37	<othermaciej_>	on a graph with cycles
00:37	<othermaciej_>	since topological sorts are designed to work on a DAG
00:37	<Philip`>	You can just do a depth-first search - start with each node being white, mark each one as grey when you recurse into it, mark it as grey when you recurse back out, and if you ever follow an edge into a grey node then there's a cycle
00:38	<Philip`>	Uh
00:38	<Philip`>	*mark it as black when you recurse back out
00:38	<othermaciej_>	that works
00:38	<othermaciej_>	hmm wait
00:38	<Philip`>	(You can do some thingy with numbering nodes as you turn them black, to get a topological sort, I think)
00:38	<othermaciej_>	I'm not sure it works
00:39	<othermaciej_>	not obvious to me that a cycle couldn't be observable only by visiting a black node
00:42	<othermaciej_>	DFS can detect cycles by identifying back-edges
00:43	<othermaciej_>	your algorithm is right
00:44	<othermaciej_>	I guess that would run in O(E) where E is the number of edges
00:44	<othermaciej_>	which seems like the best you could do
00:47	<Hixie>	and it'll work whatever order i do the nodes in, as far as i can tell
00:47	<Hixie>	which is useful
00:47	<Hixie>	in my case
00:50	<Philip`>	I think I can convince myself it's right by saying that if there is a cycle, then when the DFS reaches some node N in that cycle, it will not mark the node as black until either it has reached another grey node (and found a cycle) or has searched the whole cycle and got back to N (which is grey, so it finds the cycle) or has reached a black node in the cycle; and there can never be a black node in the cycle, because the cycle will be detected before an
00:50	<Philip`>	...before any node in the cycle is marked as black
00:52	<Philip`>	I guess you have to do something to make sure the DFS covers all the nodes (by repeatedly DFSing from some arbitrary remaining white node, until there are none)
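Philip` describes a three-colour depth-first search: a node turns grey on the way in and black on the way out. His correction and his note about restarting from any remaining white node are both included in the Python sketch below. It assumes the same dict-of-lists graph, and `find_cycle_dfs` is a hypothetical name. The explicit stack is a choice made here to avoid Python's recursion limit; it was not part of the discussion. Each node is coloured once and each edge is examined once, so the search runs in O(V+E). The post-order list corresponds to the "numbering nodes as you turn them black" idea, and it yields a topological order when no cycle exists.

```python
WHITE, GREY, BLACK = 0, 1, 2


def find_cycle_dfs(graph):
    """Three-colour DFS cycle check.

    Returns (has_cycle, order). When there is no cycle, `order` lists
    nodes so that each node comes after every node it points to.
    """
    colour = {}
    order = []

    # Restart from every node that is still white, so disconnected
    # parts of the graph are covered too.
    for start in graph:
        if colour.get(start, WHITE) != WHITE:
            continue
        colour[start] = GREY
        # Stack of (node, iterator over its targets) replaces recursion.
        stack = [(start, iter(graph.get(start, ())))]
        while stack:
            node, targets = stack[-1]
            advanced = False
            for target in targets:
                state = colour.get(target, WHITE)
                if state == GREY:
                    # Edge into a node still on the stack: a back edge.
                    return True, None
                if state == WHITE:
                    colour[target] = GREY
                    stack.append((target, iter(graph.get(target, ()))))
                    advanced = True
                    break
                # BLACK targets are already fully explored; skip them.
            if not advanced:
                # All targets explored: "recurse back out" and go black.
                colour[node] = BLACK
                order.append(node)
                stack.pop()
    return False, order
```

Calling it on `{"a": ["b", "c"], "b": ["c"]}` returns `(False, ["c", "b", "a"])`. The second edge into the already-black node `c` is correctly not treated as a cycle.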
00:53	<Hixie>	yeah i'm just going to go through every node with at least one outgoing edge (since i have to visit them anyway for unrelated reasons) and if it's white, i do the search
00:53	<Philip`>	It should be O(V) rather than O(E) because it'll never visit one node more than once
00:54	<Philip`>	except I'm probably confused and it's O(E) too, so it's more like O(min(V, E)), not that anybody actually cares, since V =~ E anyway for non-crazy graphs
00:55	<Hixie>	this is where i find out there's only 5 tables on the whole web with a headers="" attribute and therefore it could be O(N^4) and still complete in finite time
00:59	<othermaciej_>	Philip`: it has to traverse every edge at least once to see the color of the node at the other end
01:00	<othermaciej>	Philip`: but I guess it's O(V+E) since you need to visit disconnected nodes too
01:00	<Philip`>	Got to be careful in case you stumble across some gigantic table with hundreds of rows and columns that's been made accessible with (buggy) headers, since that might cause an O(N^4) algorithm to take a second or two
01:01	<othermaciej>	actually I guess you don't since Hixie's data structure only represents edges
01:01	<othermaciej>	hundreds could be worse than a second or two with an O(N^4) algorithm
01:01	<othermaciej>	N^4 gets bad pretty quickly
01:01	<Philip`>	Oh, whoops, I forgot it'd still have to look along all the edges to already-black nodes
01:02	<Hixie>	yeah N^4 is insanely bad if you've got anything of any kind of size
01:02	<Philip`>	100^4 = 10^8 which isn't all that bad if you're just following a few pointers :-)
01:03	<Hixie>	sadly i have to do a string lookup on every single one of these edges :-)
01:03	<Hixie>	(of course if it's bad, i'll optimise it more. we'll see)
01:04	<Philip`>	You could do an O(E) preprocessing step to do all the string lookups per edge, before doing the horribly inefficient but highly optimised O(N^4) cycle-finding algorithm on it :-)
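Philip`'s O(E) preprocessing step might look like the Python sketch below. Each headers="" token is looked up once, which turns the ID strings into integer cell indexes before any cycle search runs. IDs that match no cell are collected rather than turned into edges; Hixie later counts over a million tables with that problem. The `(id, headers)` input shape and the name `build_header_graph` are assumptions. The resulting dict of lists can be passed to any of the cycle checks discussed here.

```python
def build_header_graph(cells):
    """Resolve headers="" once per edge, before searching for cycles.

    `cells` is a hypothetical list of (id, headers_attribute) pairs, one
    per table cell; either value may be None. Returns (graph, bad_refs),
    where graph[i] lists the indexes of cells that cell i names as headers.
    """
    index_by_id = {}
    for i, (cell_id, _) in enumerate(cells):
        if cell_id is not None:
            # Keep the first cell with a given id.
            index_by_id.setdefault(cell_id, i)

    graph = {i: [] for i in range(len(cells))}
    bad_refs = []
    for i, (_, headers) in enumerate(cells):
        # headers="" holds a whitespace-separated list of ids.
        for token in (headers or "").split():
            target = index_by_id.get(token)
            if target is None:
                # The id does not name a cell in this table.
                bad_refs.append((i, token))
            else:
                graph[i].append(target)
    return graph, bad_refs
```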
01:04	<Hixie>	indeed
01:08	<othermaciej>	DFS isn't that hard to code, doesn't seem like a big deal
01:08	<Hixie>	indeed
01:08	<Hixie>	and you'll be glad to know it works
01:08	<Hixie>	sweet
01:08	<othermaciej>	nice
01:09	<Hixie>	it tested my three test tables in 0.244s including compiling the program and parsing the html
01:10	<Hixie>	and given that it took 0.245s to do the same program with only one empty test file...
01:10	<othermaciej>	it runs in negative time!
01:11	<Hixie>	and y'all were worried about it being slow!
01:11	<kingryan>	O(-N^4) ?
01:15	<Philip`>	Give it a really big table to test, and see if it returns the answer before you've even started the program
01:37	<Philip`>	Hmm, just remembered a slower but simpler way to find cycles: use a kind of negated variant of Bellman-Ford, by initialising every node's 'distance' value to 0, then setting v.distance=max(v.distance, 1+u.distance) for each edge (u,v), then repeating num_nodes+1 times, and if any has distance=num_nodes+1 then there's a cycle
01:40	<Philip`>	...or is that totally rubbish and wrong? I'm not quite sure now
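Philip`'s idea holds up if the final test is loosened. Below is a minimal Python sketch that uses the same dict-of-lists assumption; `has_cycle_bellman_ford` is a hypothetical name. When every distance starts at 0, an acyclic graph has no path longer than V - 1 edges, so all distances settle within V - 1 rounds. A cycle, on the other hand, keeps pushing distances up. Relaxing edges in place can overshoot, so checking for a distance exactly equal to num_nodes+1 is unreliable. The sketch instead applies the standard Bellman-Ford check: after V - 1 rounds, can any edge still be relaxed? This costs O(V·E), so the DFS remains the better choice.

```python
def has_cycle_bellman_ford(graph):
    """Negated Bellman-Ford: longest-path relaxation, all distances 0."""
    nodes = set(graph)
    for targets in graph.values():
        nodes.update(targets)
    distance = {n: 0 for n in nodes}
    edges = [(u, v) for u, targets in graph.items() for v in targets]

    # Up to V - 1 rounds of v.distance = max(v.distance, 1 + u.distance).
    for _ in range(len(nodes) - 1):
        changed = False
        for u, v in edges:
            if distance[u] + 1 > distance[v]:
                distance[v] = distance[u] + 1
                changed = True
        if not changed:
            # Fixed point: no edge can be relaxed, so there is no cycle.
            return False

    # An edge that can still be relaxed after V - 1 rounds means a cycle.
    return any(distance[u] + 1 > distance[v] for u, v in edges)
```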
02:02	<Hixie>	hsivonen: please confirm that since the last time i checked about your parsing e-mails, you have sent only one further message (about <select>)
02:21	<Hixie>	holy crap, according to this nearly half of all tables with headers="" have a cycle
02:21	<Hixie>	that seems unlikely
02:22	<Hixie>	in fact of 60,000 tables with headers="" that i just parsed, only 194 came out without some sort of error
02:22	<Hixie>	and of those, 177 didn't need headers="" at all because scope="" got the same effect
02:23	<Hixie>	leaving 17 tables out of 60,000 with headers="" (in just over 100,000,000 documents total) that used headers="" in a non-trivial yet correct way
02:23	Hixie	looks at those 17 tables
02:24	<Hixie>	one of them was the table on http://cgi.ebay.ie/Nokia-6210-unlocked-battery-charger-WARRANTY_W0QQitemZ200124682259QQihZ010QQcategoryZ3312QQcmdZViewItem
02:24	<Hixie>	and it only uses headers with the empty string as its value
02:24	<Hixie>	maybe i should exclude those, huh
02:24	<Hixie>	in fact 9 of these were variants on that ebay page
02:25	<othermaciej>	would that require assuming no header is a header for that cell?
02:25	<Hixie>	my headers="" algorithm used nothing but headers="" to assign headers to cells
02:25	<Hixie>	so <th> elements have no effect when headers="" is specified
02:26	<othermaciej>	what I'm wondering is, whether that is the specified behavior for headers=""
02:26	<Hixie>	in html4?
02:27	<othermaciej>	yeah
02:28	<Hixie>	ok i clearly need to look for tables with only blank headers="", since all but one of these uses of headers="" that differ from scope="" are blank headers="" only.
02:28	<othermaciej>	I guess HTML4 is not very clear on it
02:28	<Hixie>	(http://www.bls.gov/oco/cg/cgs041.htm being that page)
02:30	<Hixie>	and that page only uses headers="" to associate <th>s with parent <th>s
02:30	<Hixie>	it doesn't actually do anything to make the table accessible as far as i can tell
02:32	<othermaciej>	that's a pretty poor record
02:32	<Hixie>	i'm skeptical of the large number of loops
02:32	<Hixie>	that seems unlikely
02:32	<othermaciej>	.3% of usage being error-free seems pretty damn low, even by the already low standards of most HTML features
02:33	<othermaciej>	that does sound suspicious (the number of loops)
02:33	<Hixie>	i also scanned longdesc="" in the same survey. i had my script throw out obviously invalid uses of longdesc="", like pointing to a file that the parent <a href=""> points to.
02:34	<Hixie>	doing a spot check of the pages that came up as "good" uses, one was pointing to the same file, and another was pointing to a file that was the destination of a 301 redirect of a parent <a href="">
02:54	<Hixie>	wow, longdesc is a disaster zone far worse than i had imagined
02:57	<Hixie>	many of these are just pointing to the root of the site!
02:57	Hixie	adds another heuristic to look for that
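Here is a rough Python sketch of the kind of "likely to suck" heuristics Hixie mentions: a longdesc="" that points to the site root, or to the same file as an ancestor <a href="">. It uses only `urllib.parse`. The function name, its arguments, and the exact rules are assumptions for illustration, not Hixie's survey script. It also does not follow 301 redirects, which the spot check above shows can matter.

```python
from urllib.parse import urljoin, urlsplit


def longdesc_looks_bogus(page_url, longdesc, ancestor_href=None):
    """Hypothetical heuristics: flag a longdesc="" that resolves to the
    root of a site, or to the same file as an ancestor <a href="">.
    Redirects are not followed.
    """
    target = urlsplit(urljoin(page_url, longdesc))
    # Root of a domain: empty or "/" path and no query string.
    if target.path in ("", "/") and not target.query:
        return True
    if ancestor_href is not None:
        link = urlsplit(urljoin(page_url, ancestor_href))
        # Same file as the enclosing link, ignoring any #fragment.
        if target._replace(fragment="") == link._replace(fragment=""):
            return True
    return False
```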
02:57	<Hixie>	lol, the longdesc="" on http://www.felicieditore.it/ points to http://www.felicieditore.com/, which doesn't exist
03:00	<Hixie>	http://7mobile.de/shop/select?id=101787&v=010000 is a longdesc disaster in so many ways
03:06	<Lachy>	Hixie: is it looking so bad for headers and longdesc that you're going to consider leaving them out?
03:08	<Hixie>	i'm going to _consider_ leaving them out just like i'm going to consider leaving them in
03:09	<othermaciej>	right now it's looking kind of bad for headers even on just a "degrade gracefully in current versions of the #2 screen reader" basis
03:09	<Lachy>	ok. Maybe you could put them in, and include some algorithm to determine when it should be ignored due to it containing an illogical value
03:09	<othermaciej>	which I think was the best argument in its favor
03:10	<othermaciej>	if Hixie's data about how many uses are invalid holds up, anyway
03:10	<Hixie>	yeah i'm getting a sample of those with cycles to check that
03:15	<Hixie>	i think it's fair to say that no valid longdesc will ever point to the root of a domain, right?
03:17	<Hixie>	oh crap, missed dinner. bbl.
04:03	<Hixie>	ok there's definitely something wrong with the cycle detection
04:14	<othermaciej>	I think I found a mistake in CSS 2.1 (at least in the November 2006 WD)
04:15	<othermaciej>	is there any way to see a newer editor's draft so I can check if it is fixed before I report it?
04:15	<Hixie>	http://www.w3.org/Style/Group/css2-src/cover.html
04:15	Hixie	fixes the bug
04:16	<Hixie>	i was indexing using the wrong variable. duh.
04:16	<othermaciej>	can you check for me if this is really a mistake before I make an ass of myself
04:16	<othermaciej>	http://www.w3.org/Style/Group/css2-src/visufx.html says, about overflow, "It affects the clipping of all of the element's content except any descendant elements (and their respective content and descendants) whose containing block is the viewport or an ancestor of the element."
04:16	<othermaciej>	but obviously that is not supposed to apply to overflow on the viewport itself
04:16	<Hixie>	what's the error?
04:17	<othermaciej>	right?
04:17	<Hixie>	right, the viewport is not an element
04:18	<othermaciej>	ok, maybe just a lack of clarity, not an error
04:18	<othermaciej>	since if you interpret it that way, it doesn't say anything about how to clip for overflow on the viewport
04:18	<Hixie>	that sentence doesn't really say anything about anything
04:20	<othermaciej>	later examples seem to assume it is saying something
04:20	<Hixie>	yeah, css2.1 is only marginally better than html4 in terms of spec quality
04:27	<othermaciej>	ok maybe I won't bother with this, even though it was confusing to me, the actual behavior seems to be interoperable
05:02	<Hixie>	Lachy: yt?
06:26	<Hixie>	every page i've checked so far that has non-redundant headers="" actually uses them incorrectly.
06:27	<Hixie>	although maybe we need a heuristic for the top-left cell
06:45	<Hixie>	ok i finally found a page with a real longdesc=""
06:45	<Hixie>	http://www.britanniarescue.com/about/strategy/
06:45	<Hixie>	http://www.britanniarescue.com/online/longdesc/index.php#BRlogo
06:46	<Hixie>	the longdesc is inaccurate, and it would be more useful for the information in that file to be in alt="" text anyway
06:59	<Hixie>	longdesc="mailto:trustee⊙nc"
06:59	<Hixie>	wtf
07:25	<hsivonen>	Hixie: confirmed only one additional email
07:28	<Hixie>	thanks
07:28	<Hixie>	just making sure none of your mails fall through the cracks when i speed-read the html list...
07:55	<hsivonen>	Hixie: should I CC you next time?
07:56	<Hixie>	no, it's ok
07:56	<Hixie>	just making sure
07:56	<hsivonen>	ok
07:57	<hsivonen>	on the face of it, http://www.britanniarescue.com/about/strategy/ seems to have decorative images. why do they bother with longdesc?
07:57	<Hixie>	i just select all mail to html and read it, then select all mail to the next list and read it, etc
07:57	<Hixie>	i have no idea why they use it
07:57	<Hixie>	probably because It's The Law
07:58	<Hixie>	after looking at all this in more detail, i'm starting to suspect that the accessibility advocacy has maybe done more damage than help, sadly
07:59	<hsivonen>	yeah. in some twisted way it seems to me that by speccing accessibility features we might actually create lawyerbombs :-(
08:20	<Lachy>	Hey Hixie, I'm here now
08:21	<Hixie>	hey
08:22	<Hixie>	i found a workaround around whatever it was i was going to ask you
08:22	<Hixie>	which i've forgotten now
08:22	<Lachy>	ok, no worries
08:23	Lachy	is off to see the Transforms movie now
08:23	<Hixie>	aha, the next wave of data is in
08:23	<Lachy>	*Transformers
08:23	Hixie	examines
08:25	<Hixie>	lol
08:25	<Hixie>	one of the longdesc=""s points to a file called spacer.txt
08:25	<Hixie>	i have my doubts about the usefulness of THAT longdesc
08:29	<Dashiva>	How excellent, an accessible spacer gif
08:29	<Hixie>	there are 8 times more longdesc=""s that point to the same page as an ancestor <a href=""> than there are longdesc=""s that didn't get caught on any of my "likely to suck" heuristics
08:30	<Hixie>	and out of 8 million <table>s with a cell with a headers="" attribute, twenty thousand had a cycle in the headers=""
08:30	<Hixie>	jesus
08:30	<Hixie>	and over a million had IDs that pointed to elements that weren't cells!
08:31	<Hixie>	ten thousand had overlapping cells
08:32	<Hixie>	in about four million cases, the headers="" attribute was redundant given the algorithm in the spec for mapping <th>s to <td>s
08:32	<Hixie>	in about 80,000 cases the headers="" attribute _would_ have been redundant if all the headers used <th> elements instead of <td>
08:32	<Hixie>	leaving about 2 million cases that might be valid which i'll have to look at
08:35	<Hixie>	2 for 2 on broken uses so far
09:19	<hsivonen>	http://tools.ietf.org/html/draft-walsh-tobin-hrri-00
09:20	<annevk>	that's been up for a while now, not?
09:21	<annevk>	although I don't think they are actually fixing anything
09:21	<annevk>	they are just widening the range of allowed characters
09:25	<hsivonen>	annevk: may have been. I dunno. found out today
09:25	<zcorpan>	a superset of IRI?
09:26	<hsivonen>	zcorpan: so it seems
09:26	<hsivonen>	URL5
09:26	<zcorpan>	yeah
09:27	<annevk>	that's what we need, yes
09:27	<annevk>	that's not what it is :(
09:28	<hsivonen>	URL, URI, IRI, HRRI, URL5
09:30	<zcorpan>	were there not more names somewhere in between?
09:30	annevk	learns about ephemeral
09:30	<annevk>	there's XRI -> HRRI
09:30	<annevk>	iirc
09:31	<annevk>	IRIs are not done yet fwiw
09:38	<annevk>	dropped / not included / omitted / ...?
09:38	<annevk>	suggestions?
09:40	<annevk>	excluded?
09:41	<zcorpan>	2007-07-01 17:35 Ben 'Cerbera' Millard "absent" might be even better?
09:41	<zcorpan>	2007-07-01 17:35 Ben 'Cerbera' Millard "not included" can still imply "we decided not to include these"
09:41	<zcorpan>	2007-07-01 17:35 Ben 'Cerbera' Millard "absent" just means "not present"
09:42	<annevk>	cool
10:04	<zcorpan>	people really think that new features will suffer less from interop problems than existing features
10:05	<annevk>	it's mostly an academic exercise it seems
10:05	<annevk>	although not a real interesting one at that
10:42	<Hixie>	"Is XHTML 5 the successor of XHTML 2? Of course not." seems to beg the question with tr/52/21/
10:42	<Hixie>	didn't someone already ask him that?
10:44	<Hixie>	oh i see henri basically said that already
10:44	<annevk>	maybe we should have "HTML 5" (language) and HTML and XHTML (syntax)
10:44	<annevk>	the XHTML syntax for HTML 5 shorthand would be XHTML5 but that would be unofficial
10:44	<othermaciej>	s/beg the question/invite the question/
10:45	othermaciej	hopes that here at least he can still be gently pedantic
10:45	zcorpan	hasn't seen the tr/// constructor before
10:45	<othermaciej>	it's sed syntax
10:45	<othermaciej>	(also perl I think)
10:46	<othermaciej>	same source as s/foo/bar/
10:50	<zcorpan>	seems useful :)
10:52	zcorpan	also learns that other punctuation and parentheses can be used instead of slashes
10:56	<annevk>	the WHATWG sniffing algorithm doesn't seem to deal with .ico formats, bitmaps, etc.
10:59	<zcorpan>	http://del.icio.us/url/99931bd7993088a7dc60da0a031732e1 -- "(X)HTML4"
10:59	<Hixie>	annevk: seems easiest to just ignore the whole issue, frankly. it's not like the spec is called "xhtml5"
10:59	<Hixie>	annevk: does the spec allow for extra rows to sniff such types?
11:00	<krijnh>	zcorpan: vpieters? :|
11:00	<annevk>	Hixie, no it says "User agents must ignore any rows for image types that they do not support."
11:00	<annevk>	which seems to conflict with the warning earlier on
11:00	<annevk>	I might have mentioned that on the mailing list already
11:00	<zcorpan>	krijnh: and condor87
11:01	<Hixie>	annevk: ah well we'll have to add rows then
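If rows are added, they might resemble the entries below. This is a hedged Python sketch of a pattern/mask/type table and is not text from the WHATWG spec. The byte signatures are the well-known magic numbers for Windows icon (00 00 01 00), cursor (00 00 02 00) and BMP ("BM") files. The MIME type labels and the `sniff_image_type` helper are assumptions.

```python
# Hypothetical extra rows for an image-type sniffing table, in a
# (pattern, mask, MIME type) shape. Signatures are standard magic numbers;
# the MIME type labels are assumptions.
EXTRA_IMAGE_ROWS = [
    (b"\x00\x00\x01\x00", b"\xff\xff\xff\xff", "image/x-icon"),  # .ico
    (b"\x00\x00\x02\x00", b"\xff\xff\xff\xff", "image/x-icon"),  # .cur
    (b"BM", b"\xff\xff", "image/bmp"),
]


def sniff_image_type(data, rows=EXTRA_IMAGE_ROWS):
    """Return the MIME type of the first row whose masked pattern matches
    the start of `data`, or None if no row matches."""
    for pattern, mask, mime in rows:
        if len(data) < len(pattern):
            continue
        if all((data[i] & mask[i]) == pattern[i] for i in range(len(pattern))):
            return mime
    return None
```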
11:09	annevk	ponders about <picture>
11:10	<annevk>	it seems such an obvious failure, how can they not see it?
11:13	<hsivonen>	annevk: indeed
11:14	<hsivonen>	annevk: Sander Tekelenburg's attempt at making it backwards compatible should show that the nice idea gets out of control quickly when you scratch the surface
11:14	<annevk>	neither proposal even works in IE7
11:15	<hsivonen>	I try to focus on tree building instead of spending the whole day replying to the list
11:16	<annevk>	I think I'll work on some tests for getBoundingClientRect and getClientRects or something
11:16	<annevk>	lunch first!
11:16	<hsivonen>	I'm getting more and more convinced that grouping by insertion mode first and by element second makes sense
11:16	<annevk>	you're keeping insertion modes?
11:17	<hsivonen>	with fall through for IN_TABLE etc. to IN_BODY and from IN_BODY to IN_HEAD_NOSCRIPT to IN_HEAD
11:17	<hsivonen>	annevk: no. I have just phases
11:17	<annevk>	oh ok
11:17	<annevk>	i like your code for the tokenizer quite a bit
11:18	<annevk>	although the comments are quite verbose
11:18	<hsivonen>	annevk: it's the spec :-)
11:18	<annevk>	yeah :)
11:18	<hsivonen>	too bad that doing the same for tree building is too much work
11:19	<annevk>	we just need lots of testcases
11:19	<annevk>	if zcorpan gets a proper browser framework to work for html5lib tests I assume we'll get even more testcases there
11:20	<hsivonen>	I intend to print my tree builder and the spec and go over them with a highlighter pen to check that everything is there
11:20	<annevk>	especially since the testformat is quite easy and the output can be generated using tools (assuming html5lib is compliant)
11:21	<annevk>	not sure yet how to test the formpointer stuff
11:21	<annevk>	that may require some extension
11:22	<hsivonen>	annevk: I have been thinking of a sanitizer tree that puts a UUID ID on <form> and form='' on out-of-subtree associated inputs
11:29	<Hixie>	so has anyone actually defined the problem that <picture> is intended to solve?
11:31	<hsivonen>	Hixie: implicitly, the problem is that <img> doesn't allow structured fallback--only a plain string
11:31	<Hixie>	aah
11:32	<Hixie>	does he elaborate on why <object> and longdesc="" don't handle this well enough?
11:32	<Hixie>	http://www.grupodignidade.org.br/projetos.php - <img src="img/logo.gif" alt="logo" width="160" height="80" longdesc="http://www.grupodignidade.org.br/img/logo.gif"; />
11:32	<Hixie>	sigh
11:32	<hsivonen>	Hixie: for <object>, yes. for longdecs, I no longer remember
11:32	<Hixie>	k
11:33	<Hixie>	bed time
11:33	<Hixie>	nn
11:33	<hsivonen>	nn
11:39	<annevk>	the table and longdesc study is interesting
11:59	<zcorpan>	hmm, it's not possible to check what case elements are in the dom in html, is it? except perhaps trying getElementsByTagNameNS or something
12:04	<annevk>	don't think so
12:04	<annevk>	unless localName is somehow secured
12:05	<zcorpan>	given webkit's implementation experience with my suggestion about localName, even that seems to be a dead end
12:07	<zcorpan>	i'll just have to use toLowerCase()
12:11	<zcorpan>	http://simon.html5.org/temp/html5lib-tests/wrapper.html -- got something working at least. now i just need to figure out how to parse and test the real files. or perhaps i'll just use another wrapper with some php. that may be simpler, dunno
12:14	<zcorpan>	the function fails in ie if there's a short bogus comment like <!foo>
12:31	<zcorpan>	</> results in a "/" element in ie
12:38	<zcorpan>	same as </foo> really
12:39	<zcorpan>	stray </x:y> gets dropped
12:52	<annevk>	dropping </> works just as well
12:57	<zcorpan>	oh sure. i was surprised that ie didn't drop it
13:46	<annevk>	lol
13:46	<annevk>	tr > tbody > td
13:46	<annevk>	tbody is not implied!
13:59	<Philip`>	Shouldn't that be "tbody > tr > td"?
13:59	<annevk>	yeah
14:01	<Philip`>	Ah
14:43	<zcorpan>	making progress...: http://simon.html5.org/temp/html5lib-tests/wrapper.html
14:44	<zcorpan>	now i just need to make the text file into two arrays
14:45	annevk	wonders in what kind of fantasyland some people live
14:45	<annevk>	"I was thinking exactly the opposite, and wondering whether Microsoft might be persuaded to migrate their horrific “Active-X” strings from the opening <object> tag to an nested <param>."
14:46	<Philip`>	zcorpan: "Security error: attempted to read protected variable" - why doesn't Opera like that?
14:47	<zcorpan>	Philip`: dunno, works in Kestrel
14:48	<Philip`>	Oh, okay, maybe it's only a problem with 9.2
14:49	<annevk>	evil data: URIs
14:49	<hsivonen>	annevk: in a world where the value of π is a legislative decision
14:55	<zcorpan>	any suggestions on how to read the text file with js?
14:56	<hsivonen>	zcorpan: XHR?
14:56	<zcorpan>	hsivonen: yeah. although in firefox i got a "syntax error" when trying to read .responseText
14:59	<zcorpan>	but let's assume that doesn't happen in firefox and i can read the file... how do i then parse it into two arrays?
14:59	<zcorpan>	my previous attempt with split() was too naïve and didn't really work
15:00	<Philip`>	Regular expressions?
15:00	<Philip`>	Whatever the problem, they are always the solution
15:00	<annevk>	:p
15:00	<hsivonen>	"now you have two problems" :-)
15:00	<annevk>	why doesn't split("\n\n") work?
15:02	<zcorpan>	does that work with multiple lines?
15:02	<zcorpan>	also, what if a test has e.g. \n\n as data
15:02	<zcorpan>	or doesn't the syntax allow for that?
15:02	<annevk>	oh right, yes
15:02	<zcorpan>	i think it does, so long as no test has \n\n as data
15:03	<annevk>	no \n\n can occur
15:03	<zcorpan>	ok
15:03	<annevk>	just split on \n\n#data or something and remove #data from the first line too
15:03	<zcorpan>	splitting removes automatically
15:06	<Philip`>	http://wiki.whatwg.org/wiki/Parser_tests#Tree_Construction_Tests doesn't seem to say it has to have blank lines between tests - the only delimiter is "\n#data\n"
15:06	<annevk>	sure, but the first test doesn't start with \n\n
15:06	<annevk>	Philip`, except for the first test...
15:06	<annevk>	also, two newlines is sort of accepted
15:06	<Philip`>	/^#data$/
15:08	<Philip`>	Uh
15:08	<Philip`>	/^#data$/m
15:10	<Philip`>	(or something like /\n*^#data\n/m if you want to strip newlines, assuming the last test doesn't end with a newline)
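The following Python sketch splits a tree-construction .dat file with Philip`'s anchored pattern and then builds the two arrays zcorpan wants: the inputs and the expected trees. It assumes the #data / #errors / #document layout described on the wiki page and does not handle any additional sections that other test files may use. `parse_tree_tests` is a hypothetical name. The file begins with "#data", so the first piece of the split is empty and gets skipped. The ^ anchor prevents input such as foo#data from being split.

```python
import re


def parse_tree_tests(text):
    """Split an html5lib tree-construction .dat file into two parallel
    lists: input markup and expected "| "-prefixed tree dumps.
    """
    inputs, expected = [], []
    # With re.M, ^ only matches "#data" at the start of a line, and \n*
    # swallows the blank lines between tests.
    for chunk in re.split(r"\n*^#data\n", text, flags=re.M):
        if not chunk.strip():
            # The file starts with "#data", so the first piece is empty.
            continue
        # Prepend "\n" so a test with empty #data still has "\n#errors\n".
        data, _, rest = ("\n" + chunk).partition("\n#errors\n")
        data = data[1:]  # drop the "\n" prepended above
        _, _, document = rest.partition("#document\n")
        inputs.append(data)
        expected.append(document.rstrip("\n"))
    return inputs, expected
```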
15:11	Philip`	wonders if anyone has written test cases for test case parsers
15:12	<Philip`>	though I'm not entirely sure how you'd parse the tests for the test parser
15:12	<zcorpan>	we need a parsing spec for the test case format
15:12	<zcorpan>	-_-
15:40	<annevk>	I tweaked http://wiki.whatwg.org/wiki/Parser_tests#Tree_Construction_Tests a bit to make it more clear what the actual format is
15:41	<Philip`>	The link at the bottom to the tests should probably be updated
15:42	<Philip`>	'a line that says "#errors:"' - probably shouldn't have the colon
15:43	<annevk>	at some point the format used by http://html5lib.googlecode.com/svn/trunk/testdata/tree-construction/tests4.dat should be added too and the description could use some more whitespace...
15:56	<zcorpan>	yay
15:57	<zcorpan>	works in Kestrel now
15:58	<annevk>	zcorpan, sweet
15:58	<zcorpan>	firefox boils at...: Error: unexpected end of XML source
15:58	<zcorpan>	Source File: data:text/html,<script><div></script></div><title><p></title><p><p>
15:58	<zcorpan>	Line: 1, Column: 4
15:58	<zcorpan>	Source Code:
15:58	<zcorpan>	<div>
15:58	<annevk>	ah
15:59	<zcorpan>	is that e4x or something?
15:59	<Philip`>	It works in precisely none of the five browsers I have access to :-(
15:59	<annevk>	put encodeURIComponent around it
15:59	<annevk>	maybe that will make it work better (it's also theoretically more correct)
15:59	<zcorpan>	don't think that's the problem
15:59	<zcorpan>	it's <script><div></script> in the actual test
16:00	<annevk>	maybe catch all error events and silence them?
16:01	<annevk>	iframe.onerror = function ...
16:01	<Philip`>	That would be parsed as E4X, I believe - it's only in the cases of <!--...--> and <![CDATA[...]]> where you have to use type="text/javascript;e4x=1"
16:02	<annevk>	iframe.onerror = null
16:02	<annevk>	or something
16:02	<Philip`>	(http://developer.mozilla.org/en/docs/E4X)
16:02	<zcorpan>	annevk: doesn't help
16:02	<zcorpan>	annevk: don't think JS errors bubble up to the parent document
16:03	<annevk>	zcorpan, iframe.contentWindow.onerror = null
16:03	<zcorpan>	annevk: nope
16:04	<annevk>	does it actually work if you remove that test?
16:05	<zcorpan>	hmm. no.
16:05	<annevk>	btw, it would be nice if you showed the input data in the result tree as well
16:06	<annevk>	makes it easier to analyze potential errors
16:06	<Philip`>	Could change the tests to do <script type="unsupported"> so browsers won't try running them
16:07	<annevk>	that may work
16:08	<zcorpan>	or use //<div> instead of <div>
16:08	<zcorpan>	annevk: done
16:08	<annevk>	done what?
16:09	<zcorpan>	showed the input data
16:09	<annevk>	ah
16:09	<annevk>	does it matter though that browsers run them?
16:10	<zcorpan>	no, don't think so
16:10	<annevk>	zcorpan, btw iframe.contentWindow.onerror = function(foo,bar,baz) { return false }
16:10	<annevk>	might prevent the error from appearing
16:10	<zcorpan>	it's some other reason why it doesn't work in firefox
16:10	<zcorpan>	ok
16:12	<zcorpan>	xhr only works on the same domain, right
16:12	<zcorpan>	might need a server side script to include external tests
16:12	<annevk>	yeah, same-origin
16:15	<Philip`>	If the external tests were in a format that was valid JS, you could include them with <script src>
16:16	<zcorpan>	well, they're not. :)
16:16	<Philip`>	Or if you could change the external tests to be in a format that was valid JS :-)
16:17	<zcorpan>	seems simpler to write a server-side wrapper for this
16:17	<Philip`>	but I guess the point of it being external is that it's external and out of your control
16:17	<annevk>	zcorpan, how about a document.write() version?
16:18	<zcorpan>	annevk: ?
16:18	<annevk>	zcorpan, instead of iframe.src = do iframe.contentDocument.open(); iframe.contentDocument.write(testdata); etc.
16:18	<annevk>	that's how the live-dom-viewer works
16:19	<zcorpan>	ah
16:19	<zcorpan>	ok
16:21	<zcorpan>	it doesn't fire a load even then. but i guess i could make it work. what's the benefit?
16:21	<annevk>	works in IE
16:21	<annevk>	just copy some of the live-dom-viewer logic
16:21	<annevk>	should be doable
16:24	<zcorpan>	works in firefox with that change
16:25	<zcorpan>	and opera 9.2
16:27	<zcorpan>	ie only wants to load the first test
16:30	<annevk>	that's an improvement
16:32	<zcorpan>	"childNodes is null or not an object"
16:32	<zcorpan>	for (var i = 0; i < node.childNodes.length; i += 1) {
16:34	<annevk>	hmm
16:34	<zcorpan>	ah
16:34	<zcorpan>	contentDocument -> contentWindow.document
16:34	<annevk>	whoa
16:35	<annevk>	that's supposed to be equivalent
16:35	<Philip`>	It's kind of irritating when you're trying to write tests to help interoperability between browsers, but then you can't even write a script to run the tests without hitting non-interoperability issues between every browser...
16:35	<zcorpan>	now it works in ie
16:35	<zcorpan>	Philip`: yeah
16:35	<zcorpan>	but it outputs everything on one line
16:36	<zcorpan>	\n -> \r\n ?
16:36	<annevk>	yeah
16:37	<zcorpan>	YAY!
16:37	<zcorpan>	:D
16:37	<zcorpan>	doesn't work in safari though
16:38	<annevk>	hmm
16:38	<annevk>	blame mjs :p
16:38	<zcorpan>	othermaciej: yt? :)
16:39	<annevk>	IE fails everything because of its fixed <title>
16:41	<annevk>	zcorpan, the test output numbers don't match the test input numbers
16:41	<annevk>	zcorpan, it seems that way
16:41	<zcorpan>	the output numbers are 1 greater right?
16:42	<annevk>	hmm, IE and Opera seem to be one off
16:42	<zcorpan>	yeah
16:42	<zcorpan>	it's correct
16:42	<zcorpan>	the first test is empty
16:42	<zcorpan>	.split(/\n*#data\n/m)
16:42	<annevk>	so why are they one off?
16:43	<annevk>	IE saying it's 24 and Opera claiming it's 25...
16:43	<zcorpan>	"foobar".split("foo") // ["", "bar"]
16:44	<zcorpan>	i guess i could remove the first entry from the array but it seemed simpler to ignore it
16:45	<zcorpan>	they might do different things with split()
16:47	<zcorpan>	yep
16:47	<zcorpan>	javascript:(function(){var arr = "#data\nfoo".split(/\n*#data\n/m); alert(arr.length); })()
16:49	<Philip`>	(Is it intentional that that will match strings like "foo#data\n"?)
16:49	<zcorpan>	not really
16:50	<Philip`>	(That was what the ^ in /\n*^#data\n/m was for :-) )
16:51	<zcorpan>	(fixed)
16:53	<zcorpan>	ok, fixed the number of tests issue
16:56	<zcorpan>	ie passes test 101
16:57	<annevk>	<html><head><title></title><body></body></html> ...
16:58	<zcorpan>	amazing that i got the format right on the first try. i didn't even look at the documentation
16:58	<annevk>	hixie designed it
16:59	<zcorpan>	Hixie: if you could get people to use html right on the first try... ;)
16:59	<annevk>	I'm quite disappointed by the large number of fails
16:59	<annevk>	Hopefully that will improve in due course by either updating the tests or the spec
17:00	<zcorpan>	annevk: in which browser?
17:00	<annevk>	all?
17:00	<Philip`>	Could you make a table of the results for all browsers, to see which tests don't match any browser's reality?
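Philip`'s cross-browser table could look like the following sketch. It assumes each browser's run is stored as an array of pass/fail booleans indexed by test number. That input shape is hypothetical and is not what the wrapper page actually records. Tests that no browser passes are listed separately, because those are the ones most likely to indicate a wrong test or a spec change.

```javascript
// Hypothetical input: { browserName: [true, false, ...] }, one boolean per test, true = pass.
function resultsTable(results) {
  const browsers = Object.keys(results);
  const count = Math.max(0, ...browsers.map(b => results[b].length));
  const rows = [["test", ...browsers].join("\t")];
  for (let i = 0; i < count; i++) {
    // A missing entry (shorter array) is reported as FAIL.
    const cells = browsers.map(b => (results[b][i] ? "PASS" : "FAIL"));
    rows.push([String(i + 1), ...cells].join("\t"));
  }
  return rows.join("\n");
}

// 1-based numbers of tests that fail in every browser.
function failingEverywhere(results) {
  const browsers = Object.keys(results);
  const count = Math.max(0, ...browsers.map(b => results[b].length));
  const failing = [];
  for (let i = 0; i < count; i++) {
    if (browsers.length > 0 && browsers.every(b => !results[b][i])) {
      failing.push(i + 1);
    }
  }
  return failing;
}

const sample = { ie: [true, false, false], opera: [false, false, true] };
console.log(resultsTable(sample));
console.log(failingEverywhere(sample));
```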
17:01	<zcorpan>	i guess
17:01	<zcorpan>	but there are more tests
17:01	<zcorpan>	i want to figure out how to run those
17:01	<zcorpan>	first food
17:01	<annevk>	another for loop around the xhr
17:01	<annevk>	or just merge everything on the server
17:01	<zcorpan>	yeah
17:02	<annevk>	it would be good if you at some point committed this back to html5lib
17:03	<annevk>	then we can make the acid-parser test
17:03	<zcorpan>	perhaps i don't need to do server side magic
17:03	<annevk>	other things that might be nice: 1) some colors on the result page to make it easier to scan 2) collapsible items on the result page
17:04	<annevk>	especially the second is useful given the large number of tests that fail :)
17:04	zcorpan	makes notes
17:05	<annevk>	zcorpan, did you "fix" the difference in counting with IE?
17:07	<annevk>	I'm thinking that it might be useful to include a bunch of <title></title> in a lot of testcases to make the IE results more usable
17:08	<Philip`>	Could you post-process the results to ignore ones where the only difference is the "\| <title>" line?
17:09	<Philip`>	(or mark as uninteresting, rather than entirely ignore them)
17:10	<annevk>	that'd be another option
17:10	<annevk>	prolly better
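Philip`'s post-processing idea, marking a result as uninteresting instead of dropping it, can be sketched as a comparison that ignores `| <title>` lines. It assumes the expected and actual trees are plain strings in the html5lib `#document` dump format. `classifyResult` is a hypothetical name.

```javascript
// A dump line that is only "|", indentation, and "<title>" (e.g. IE's fixed title element).
const TITLE_LINE = /^\|\s*<title>$/;

// Returns "pass", "uninteresting" (differs only by <title> lines), or "fail".
function classifyResult(expected, actual) {
  if (expected === actual) return "pass";
  const withoutTitles = s => s.split("\n").filter(line => !TITLE_LINE.test(line)).join("\n");
  return withoutTitles(expected) === withoutTitles(actual) ? "uninteresting" : "fail";
}

console.log(classifyResult("| <html>\n|   <head>\n|   <body>",
                           "| <html>\n|   <head>\n|     <title>\n|   <body>"));
```

This only covers an empty title. If IE's title ever carried a text child, that text line would still count as a difference.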
17:32	<rubys>	any html5lib developers awake here? :-)
17:36	annevk	is
17:37	<annevk>	zcorpan ported html5lib tests to browsers
17:37	<annevk>	see http://simon.html5.org/temp/html5lib-tests/wrapper.html for tree-construction/tests1
17:38	<rubys>	Anne, can you do me a favor and svn update and then run:
17:38	<rubys>	python parse.py --tree "<p><b><i><u></p><p>X"
17:41	<annevk>	I get two <p> siblings, the second containing the same as the first plus "X" as the deepest child
17:43	<rubys>	nevermind, I found my problem (the actual test2 #45 actually has a new line in the middle)
17:43	<rubys>	sorry to bother you
17:43	<annevk>	no worries
18:01	<annevk>	hsivonen, how would this UUID stuff work?
18:02	<annevk>	hsivonen, what I'm interested in is annotating the test results for tree construction with that information
18:28	<met_>	http://ydnar.vox.com/library/post/webkit-team-adds-audio-video-support.html
18:35	<zcorpan>	annevk: i did
18:40	<othermaciej>	zcorpan: what's the problem?
19:51	<zcorpan>	othermaciej: http://simon.html5.org/temp/html5lib-tests/wrapper.html doesn't work in safari (for windows). don't know why
19:52	<othermaciej>	I was hoping it would be obvious but there's a whole lot of script there
19:53	<zcorpan>	would the web inspector help me debug? how do i activate it on windows?
19:53	<othermaciej>	zcorpan: it's got a "parse error" and a "maximum call stack size exceeded"
19:53	<othermaciej>	the JavaScript error console (in the debug menu) would tell you that
19:53	<zcorpan>	don't see a debug menu
19:54	<othermaciej>	yeah, you have to turn it on with a command-line switch
19:54	<othermaciej>	google for "safari windows debug menu"
19:54	<othermaciej>	I don't remember the details at the moment
19:54	<billmason>	http://rakaz.nl/item/enabling_the_debug_menu_on_safari_for_windows
19:54	<zcorpan>	ok, will do
19:54	<othermaciej>	is dom2string going to recurse to a depth of more than 99?
19:54	<zcorpan>	billmason: cheers
19:54	<othermaciej>	if so, that's probably the problem
19:55	<othermaciej>	we should probably relax that stack limit
19:55	<zcorpan>	it might
19:57	<zcorpan>	but i don't think that's the problem, it didn't work with one test with the input "Test" either
20:03	<zcorpan>	is "run" a reserved word?
20:05	<hasather>	zcorpan: no
20:05	<zcorpan>	what is the SyntaxError: Parse Error on line 1 in http://simon.html5.org/temp/html5lib-tests/wrapper.html ?
20:16	<zcorpan_>	works when i have only 1 test in the file
20:16	<zcorpan_>	2 tests as well
20:17	<hasather>	seems to be a problem with the test that looks like this: "<script><div></script></div><title><p></title><p><p>"
20:20	<hasather>	zcorpan: that seems to be the only test that has unallowed content in a script element
20:22	<jgraham>	zcorpan_: TestData in http://html5lib.googlecode.com/svn/trunk/python/tests/support.py contains the testcase parser that html5lib uses (you have to pass it a list of the section headings e.g. ("data", "errors", "document"))
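The idea jgraham describes, a parser that is told the section headings, can be sketched in JavaScript. This is a hypothetical analogue and not a port of html5lib's `TestData`. A `#name` line whose name is in `sections` starts a new section, and the first listed heading starts a new test. Trailing newlines are trimmed from every section to drop the blank line between tests, so a `#data` value that really ends in newlines would lose them. A stricter parser would need the format's exact rules.

```javascript
// Hypothetical section-based parser for html5lib-style test files.
function parseTests(fileText, sections) {
  const tests = [];
  let current = null;
  let section = null;
  for (const line of fileText.split("\n")) {
    const heading = line.startsWith("#") ? line.slice(1) : null;
    if (heading !== null && sections.includes(heading)) {
      if (heading === sections[0]) {
        current = {};          // the first heading (e.g. "data") opens a new test
        tests.push(current);
      }
      section = heading;
      if (current) current[section] = [];
      continue;
    }
    if (current && section) current[section].push(line);
  }
  // Join each section's lines and trim trailing blank lines (see caveat above).
  return tests.map(t => {
    const out = {};
    for (const key of Object.keys(t)) out[key] = t[key].join("\n").replace(/\n+$/, "");
    return out;
  });
}

const text = "#data\nTest\n#errors\nerr1\n#document\n| <html>\n|   <head>\n\n#data\n<p>\n#errors\n#document\n| <html>\n";
console.log(parseTests(text, ["data", "errors", "document"]));
```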
20:22	<jgraham>	(that was a FYI if you have any more issues with the test format)
20:28	<zcorpan_>	hasather: ah. yes of course
20:29	<zcorpan_>	jgraham: thanks
20:31	<zcorpan_>	othermaciej: seems like the problem is the number of recursions indeed. not sure if i can/will work around that
20:34	<othermaciej>	zcorpan_: I'm sure your function could easily be rewritten not to be recursive
20:34	<zcorpan_>	othermaciej: can you do it for me? :)
20:36	<othermaciej>	zcorpan_: don't have time to actually test, but I can tell you roughly how to do it
20:37	<othermaciej>	you're effectively doing a preorder tree traversal
20:37	<othermaciej>	you can do that with a stack, or since you have parent pointers just with a simple loop
20:38	<othermaciej>	when entering a node, you do the entry processing (print node itself, increment indent)
20:39	<othermaciej>	then you check if it has children - if so, enter the first child
20:39	<zcorpan_>	(the live dom viewer has the same problem btw)
20:39	<othermaciej>	if no children, check for a next sibling - if present, do exit processing for current node and enter the next sibling
20:40	<othermaciej>	if no next sibling, do exit processing for this node, then continue from the parent as if it had no children (i.e. exit to the parent's next sibling or parent's parent and so forth)
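The steps othermaciej lists can be written as a loop with no recursion, so tree depth no longer touches the script stack limit. The sketch uses plain objects with `firstChild`, `nextSibling`, and `parentNode` so that it runs outside a browser. Real DOM nodes expose the same three properties. The `el` builder and the `dumpTree` name are hypothetical, and the indentation mimics the `| ` dump format used by the tests.

```javascript
// Hypothetical builder for DOM-like nodes with parent and sibling pointers.
function el(name, ...children) {
  const node = { name, parentNode: null, firstChild: children[0] || null, nextSibling: null };
  children.forEach((child, i) => {
    child.parentNode = node;
    child.nextSibling = children[i + 1] || null;
  });
  return node;
}

// Preorder dump without recursion: enter a node, descend to its first child,
// otherwise exit nodes upward until one has a next sibling.
function dumpTree(root) {
  const lines = [];
  let depth = 0;
  let node = root;
  while (node) {
    // Entry processing: print the node, then indent its children further.
    lines.push("|" + " ".repeat(1 + 2 * depth) + node.name);
    depth++;
    if (node.firstChild) {
      node = node.firstChild;
      continue;
    }
    while (node) {
      depth--;                        // exit processing for this node
      if (node === root) return lines.join("\n");
      if (node.nextSibling) {
        node = node.nextSibling;      // continue with the next sibling
        break;
      }
      node = node.parentNode;         // no sibling: exit the parent as well
    }
  }
  return lines.join("\n");
}

console.log(dumpTree(el("<html>", el("<head>"), el("<body>", el("<p>")))));
```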
20:40	<zcorpan_>	ok. thanks
20:41	<othermaciej>	we use this style of tree traversal internal to webcore all the time
20:41	<othermaciej>	in fact, we have an internal traverseNextNode function that does it
20:41	<othermaciej>	(although that doesn't visit a node again when exiting, which I think you want)
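For comparison, a "next node in document order" step without the exit visit can be sketched like this. It is written in the spirit of the WebCore function othermaciej mentions, but it is not WebCore's code. The `stayWithin` argument keeps the walk inside one subtree, and `el` is the same hypothetical builder for DOM-like objects.

```javascript
// Hypothetical builder for DOM-like nodes with parent and sibling pointers.
function el(name, ...children) {
  const node = { name, parentNode: null, firstChild: children[0] || null, nextSibling: null };
  children.forEach((child, i) => {
    child.parentNode = node;
    child.nextSibling = children[i + 1] || null;
  });
  return node;
}

// Next node in preorder, or null once the walk would leave stayWithin.
function traverseNextNode(node, stayWithin) {
  if (node.firstChild) return node.firstChild;
  while (node && node !== stayWithin) {
    if (node.nextSibling) return node.nextSibling;
    node = node.parentNode;
  }
  return null;
}

function preorderNames(root) {
  const names = [];
  for (let n = root; n; n = traverseNextNode(n, root)) names.push(n.name);
  return names;
}

console.log(preorderNames(el("html", el("head"), el("body", el("p"), el("div")))));
```

Because there is no exit visit, this form cannot emit closing output or restore indentation on the way back up. That is why zcorpan_'s dump needs the enter/exit variant.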
20:42	<zcorpan_>	yeah, i want to catch misnested nodes in ie
20:43	<zcorpan_>	or perhaps that's just a check before you process the children
22:06	<zcorpan_>	hmm. the question is how to handle misnested nodes.
22:17	<Philip`>	zcorpan_: Output "FAIL" and then stop?
22:36	othermaciej	facepalms at continuing mail from Rob Burns
22:38	<zcorpan_>	Philip`: yeah... but the recursive algorithm could output the entire tree anyway, which is nicer for debugging
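One way to get both behaviours, flagging misnesting without giving up the full tree, is a check run before descending into a node's children that records problems instead of stopping. The sketch assumes the symptom to catch is a child whose `parentNode` does not point back to the node it was reached from. It also guards against a `nextSibling` chain that loops. Both are assumptions about what a broken DOM in IE might look like, and `checkChildren` is a hypothetical name.

```javascript
// Hypothetical consistency check for one node's child list.
// Returns a list of problem descriptions; an empty list means the children look consistent.
function checkChildren(node) {
  const problems = [];
  const seen = new Set();
  for (let child = node.firstChild; child; child = child.nextSibling) {
    if (seen.has(child)) {
      problems.push("sibling loop at " + child.name);  // stop walking this list, keep reporting
      break;
    }
    seen.add(child);
    if (child.parentNode !== node) {
      problems.push(child.name + " has a different parentNode");
    }
  }
  return problems;
}

const other = { name: "p" };
const b = { name: "b", parentNode: other, nextSibling: null, firstChild: null };
console.log(checkChildren({ name: "div", firstChild: b }));
```

The dump can then print these messages next to the affected node and carry on, so the output still shows the whole tree for debugging.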
22:38	<Philip`>	I don't quite see how trying to publish one document after four months counts as "rushing"
22:39	<Hixie>	<td id="m1" axis="mainMenu" headers="m1" valign="top">
22:39	<Hixie>	sigh
22:39	<zcorpan_>	Hixie: hah
22:40	<othermaciej>	now that's some compact information
22:40	<othermaciej>	Hixie: is that the sort of thing causing all the cycles?
22:44	<Hixie>	it's at least one cause
22:44	<Hixie>	i'm going to rerun the survey with a special hack to count those separately
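Counting self-references such as `<td id="m1" headers="m1">` separately from longer loops can be sketched as follows. The sketch uses the approach suggested earlier in the log: repeatedly remove cells that point at no remaining header (a Kahn-style topological sort). Any cells left over lie on a cycle or lead into one. The input shape, a `Map` from cell id to the ids listed in its `headers=""`, and the name `analyzeHeaders` are assumptions. Ids that match no cell are ignored.

```javascript
// Hypothetical analysis of one table's headers="" graph.
// cells: Map<string, string[]> from a cell id to the ids in its headers="" attribute.
function analyzeHeaders(cells) {
  let selfReferences = 0;
  const remainingOut = new Map();  // cell id -> distinct existing headers it still points at
  const referrers = new Map();     // header id -> ids of cells that list it
  for (const id of cells.keys()) {
    remainingOut.set(id, 0);
    referrers.set(id, []);
  }
  for (const [id, headers] of cells) {
    for (const h of new Set(headers)) {
      if (h === id) { selfReferences++; continue; }  // counted separately, not an edge
      if (!cells.has(h)) continue;                   // dangling id: nothing to follow
      remainingOut.set(id, remainingOut.get(id) + 1);
      referrers.get(h).push(id);
    }
  }
  // Peel off cells with no remaining headers; each removal may free up its referrers.
  const queue = [...cells.keys()].filter(id => remainingOut.get(id) === 0);
  let removed = 0;
  while (queue.length > 0) {
    const id = queue.pop();
    removed++;
    for (const r of referrers.get(id)) {
      remainingOut.set(r, remainingOut.get(r) - 1);
      if (remainingOut.get(r) === 0) queue.push(r);
    }
  }
  // hasCycle covers loops of two or more cells; self-references are reported on their own.
  return { selfReferences, hasCycle: removed < cells.size };
}

console.log(analyzeHeaders(new Map([["m1", ["m1"]]])));
console.log(analyzeHeaders(new Map([["a", ["b"]], ["b", ["a"]]])));
```

Each cell and each `headers=""` reference is handled a constant number of times, so this avoids walking every possible path through the graph.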
22:47	<Hixie>	i really have to stop e-mailing public-html
23:04	<zcorpan_>	annevk: are there tests on things like </p>, <html></p>, <head></p>, etc, in the html5lib tests?
23:05	<zcorpan_>	public-html starts to get pretty high traffic again
23:16	<Hixie>	typical longdesc: http://130.83.47.128/masterfiles/descriptions/logo.txt
23:16	<webben>	typical of what?
23:17	<Hixie>	typical of the longdescs that are actually not completely bogus
23:17	<Hixie>	(that's from http://130.83.47.128/vv/ss/comments/13.205.en.tud)
23:17	<Hixie>	(the first one on my list of "interesting" uses)
23:18	<webben>	not a terrible longdesc I suppose
23:18	<webben>	distinguishing between alternate text and explaining what the image is
23:18	<Hixie>	<a href="http://www.google.co.jp/">
23:18	<Hixie>	<img src="http://blog2.fc2.com/2/20century/file/Logo_20s.gif" alt="Google" height="75" width="143" longdesc="http://www.google.co.jp/logos.html" /></a>
23:18	<webben>	shame they didn't explain what the logo actually depicts
23:19	Hixie	bangs head against table
23:19	<jgraham>	zcorpan_: I can't see any tests for those cases (though I thought anne had checked some in...). If you want to add some I can add you to the html5lib members list
23:20	<webben>	Hixie: maybe the text is helpful for that one
23:20	webben	can't read Japanese
23:20	<webben>	oh wait, Google can read Japanese
23:20	<Philip`>	But that logo.txt longdesc is in the wrong language for that page (which I guess could be because the site's developers had no way to actually test longdesc so it fell out of sync with the page contents)...
23:20	<Hixie>	from that en.tud page, lower down:
23:20	<Hixie>	<img src="/masterfiles/images/blue10x1.gif" alt="[Abstandhalter]" title="[Abstandhalter]" longdesc="/masterfiles/descriptions/abstandhalter.txt">
23:20	<Hixie>	guess what the "/masterfiles/descriptions/abstandhalter.txt" file contains
23:20	<webben>	Philip`: good point
23:23	<Hixie>	i think i've yet to see an actually useful, valuable use of longdesc="" in this study
23:24	<Hixie>	bbl
23:24	<webben>	Hixie: you should include uses of D-links
23:24	<webben>	since for a long time D-link was used as a longdesc alternative based on poor support for longdesc
23:26	<webben>	see also: http://www.w3.org/TR/WCAG10-HTML-TECHS/#long-descriptions
23:26	<webben>	it would be interesting to know how many links in the wild have a value of D or [D] or similar
23:26	<webben>	s/value/text content/
23:28	Philip`	wants to rewrite his own rubbish survey tool to be slightly less rubbish, so he can get vaguely interesting numbers about common features
23:29	<webben>	how many links ... and what they point to, of course
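The D-link count webben asks for could start from something like this sketch. It assumes a survey tool has already extracted each link's text and `href` (for example with html5lib), and it treats only the trimmed text `D` or `[D]`, case-insensitively, as a D-link. Both the matching rule and the name `countDLinks` are assumptions, and "or similar" variants would need a broader pattern.

```javascript
// Hypothetical D-link matcher: text exactly "D" or "[D]", any case.
const D_LINK_TEXT = /^(?:d|\[d\])$/i;

// links: array of { text, href }; returns the total and a count per target URL.
function countDLinks(links) {
  let count = 0;
  const targets = new Map();
  for (const { text, href } of links) {
    if (!D_LINK_TEXT.test(text.trim())) continue;
    count++;
    targets.set(href, (targets.get(href) || 0) + 1);
  }
  return { count, targets };
}

const result = countDLinks([
  { text: "D", href: "fig1.html" },
  { text: " [d] ", href: "fig1.html" },
  { text: "Download", href: "x.zip" },
]);
console.log(result.count, [...result.targets]);
```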
23:29	jgraham	wants a google-scale cluster to run a survey on
23:30	<jgraham>	and a pony, of course
23:31	<jgraham>	But seriously, Philip`, it would be nice if your survey tool was more widely available. It would be even better if the parser was fast. I wonder if any of the HTML5-parser-in-C projects are going to produce something soon?
23:32	<Philip`>	At least my initial version taught me that SQLite is completely rubbish when you have concurrency - it kept throwing exceptions because the whole database was locked
23:32	<Philip`>	so I need to rewrite it with MySQL or something
23:34	<Philip`>	and I think it should do some simple crawling, rather than only looking at a fixed list of URLs, so it can find more stuff to look at
23:35	<Philip`>	(and a faster parser would definitely be useful :-) )
23:37	<Philip`>	(A Java one would probably be as good as a C one)
23:39	<bewest>	sounds like a bunch of people are interested in some kind of survey tool available to the community
23:40	<webben>	Here's a good example of longdesc-as-long-alternative: http://www.fhwa.dot.gov/hfl/framework/04.cfm referring to http://www.fhwa.dot.gov/hfl/framework/longdesc.cfm#fig1
23:40	<bewest>	purpose would be 2-fold, correct? 1.) survey usage of authoring techniques on the web. 2.) test parsers?
23:41	<Philip`>	3.) Confirm whether Hixie's stats are reasonable, or if he's just making up all the numbers :-)
23:42	<bewest>	I've thought about doing this with ec2 and Alexa's web services
23:42	<bewest>	eg greptheweb, and MSR
23:42	<bewest>	alexa has crawled documents in s3
23:43	<bewest>	but that costs money
23:44	<zcorpan_>	jgraham: sure. i might check in this browser port too
23:45	<zcorpan_>	othermaciej: rewrote the function to not be recursive but still get the same error in safari
23:45	<bewest>	Philip`: so you already have some kind of survey tool? how does it work?
23:46	<Philip`>	bewest: Ah, I wasn't aware of those things, though I tend to never consider anything that requires money :-)
23:47	<bewest>	yeah...
23:47	<bewest>	usually I don't either
23:47	<bewest>	except that I work at the company that makes those services
23:47	<Philip`>	It was just something simple for things like http://canvex.lazyilluminati.com/misc/copyright.html and http://canvex.lazyilluminati.com/misc/summary.html
23:48	<Philip`>	(and a few other things which I can't remember where I put)
23:48	<Philip`>	where I give it a list of a few thousand URLs (from Yahoo search results for arbitrary terms), and it just downloads them then parses them (with html5lib) and looks for certain stuff
23:49	<Philip`>	(and sort of does those things in parallel, if you run lots of copies of the program, except most of the processes keep dying because SQLite gets unhappy)
23:50	<Philip`>	(and then some pages cause quadratic behaviour in html5lib and you have to manually delete them from the database)
23:50	<Philip`>	(so it's all just horribly hacked together :-p )
23:51	<bewest>	heh
23:52	<othermaciej>	zcorpan_: that's odd
23:52	<othermaciej>	zcorpan_: pointer?
23:53	<zcorpan_>	othermaciej: http://simon.html5.org/temp/html5lib-tests/wrapper.html
23:53	<Hixie>	webben: studying text contents is much harder for various reasons
23:54	<webben>	of course it's harder
23:54	<webben>	but given we're talking about what's basically a language for marking up text, such study is pretty critical
23:55	<Hixie>	be my guest :-)
23:57	<othermaciej>	zcorpan_: very confusing
23:57	<othermaciej>	zcorpan_: I'll try debugging it in a while - need to get coffee first
23:57	<zcorpan_>	othermaciej: ok
23:58	<zcorpan_>	man, i've really spent all day on this thing
23:59	<Hixie>	how does it feel to be paid to do this nonsense? :-)
23:59	<jgraham>	zcorpan_: You should now be able to commit to html5lib svn. If you're committing tests that html5lib doesn't pass, it's really good to email html5lib-discuss⊙gc so people know there hasn't been a regression