#whatwg on 2008-04-07

00:57	<takkaria>	annevk: on your latest post, first para, s/totaled/totalled/ and s/amount/number/
00:59	<annevk>	ty, fixed
01:00	<annevk>	night all
01:00	annevk	-> bed
01:58	<annevk>	http://lists.w3.org/Archives/Public/public-xhtml2/2008Apr/0020.html
01:58	annevk	can't sleep (for those who read the earlier message)
02:12	<BenMillard>	it was snowing here yesterday morning...very unusual for this part of the UK
02:13	<BenMillard>	especially in April!
02:16	<Philip`>	Unfortunately it had all melted by the time I woke up (around 1pm) :-(
02:16	<BenMillard>	I took some photos...I suck at doing that, though they will appear on my blog eventually
02:21	<tomg>	it was nice
02:21	<tomg>	was shocked at how white it was
02:23	<BenMillard>	indeed, it was proper powder snow
02:23	<tomg>	only light snow was forecast
02:26	Hixie	e-mails the xhtml2 group
02:41	<MikeSmith>	BenMillard - can you give a brief summary of the goals of your ongoing table work?
02:41	<MikeSmith>	is there a particular hypothesis you're working under?
02:42	<BenMillard>	it's mostly to understand how tables get authored in reality, so we can design HTML5 features to be more realistic and robust
02:42	<BenMillard>	particularly the table header association mechanism
02:43	<BenMillard>	where "we" means HTMLWG in general
02:44	<BenMillard>	I should probably summarise the goals in the document :P
02:45	<MikeSmith>	BenMillard - yeah, that would be useful, I think
02:46	<MikeSmith>	summary of what you've done so far, and what else you want to do going forward
02:46	<MikeSmith>	what specific product/deliverable you have in mind
02:46	<BenMillard>	the document starts with a "Numbers" section which summarises what's been done
02:47	<MikeSmith>	for example, we could end up producing/publishing a W3C Note from it
02:47	<MikeSmith>	BenMillard - what's the URL?
02:47	<BenMillard>	oh, sorry I thought you were looking at it! it's here: http://sitesurgeon.co.uk/tables/
02:48	<BenMillard>	deliverables include helping develop the HTML5 header association algorithm with other HTMLWG participants (James Graham and Simon Pieters, mostly)
02:49	<MikeSmith>	BenMillard - hmm, seems like that wouldn't be a specific deliverable but instead something the deliverable/document could be used for
02:50	<MikeSmith>	have you gotten much feedback from WAI folks about it yet?
02:50	<BenMillard>	I don't recall talking to any WAI people about it yet
02:50	<Hixie>	http://www.w3.org/2003/entities/2007xml/unicode.xml has a last-modified date of 1970
02:50	<Hixie>	go w3c!
02:50	<MikeSmith>	heh
02:51	<BenMillard>	MikeSmith, it was done in my spare time so I've not been tracking feedback around it
02:51	Hixie	removes the -N from his command line so that his script will always download the file instead of assuming that it has an accurate date
02:52	<Hixie>	oh actually
02:52	<Hixie>	that could be a minefield bug
02:52	<Hixie>	with xslt
02:52	<Hixie>	nevermind
02:52	<MikeSmith>	Hixie - you mean the headers?
02:52	<MikeSmith>	Last-Modified: Sun, 06 Apr 2008 10:46:09
02:52	<Hixie>	yeah
02:52	<Hixie>	yeah, minefield bug with xslt
02:53	<Hixie>	lovely
02:53	<MikeSmith>	ah, OK
02:53	<MikeSmith>	go minefield!
02:54	<BenMillard>	MikeSmith, yes the document itself does not contain a product
02:54	<MikeSmith>	BenMillard - fwiw, I think it might be worthwhile to post the URL and short summary to one of the WAI lists
02:54	<MikeSmith>	if you've not done that already
02:55	<BenMillard>	it's not under active development at the moment; I can't afford the time
02:56	<BenMillard>	I'd like it to be used by anyone who finds it useful, though
02:56	MikeSmith	is getting lots of "Connection reset by peer http://intertwingly.net/blog/index.atom" when trying to get Sam's feed
02:56	<MikeSmith>	BenMillard - I see
02:57	<BenMillard>	I've been seeking sponsorship from various places to make the work more sustainable
02:57	<BenMillard>	I took the whole of last month off work to try and speed that up, but haven't quite got there
03:02	<MikeSmith>	BenMillard - I would think investing some time in doing a little awareness-raising about it might help with getting others to consider sponsoring further work
03:02	<MikeSmith>	e.g., by posting about it to WAI lists or elsewhere
03:03	<BenMillard>	MikeSmith, that's a good idea. can you suggest a specific list for me to post it to? I've only sent it to public-html before now.
03:05	<MikeSmith>	BenMillard - no, I can't suggest any specific list. I don't follow them. Karl Dubost might know
03:06	<MikeSmith>	might be worthwhile for you to get in touch with Michael Cooper and/or Shawn Henry
03:06	<MikeSmith>	both W3C staff for WAI
03:07	<BenMillard>	I've spoken to Shawn Henry in person in November 2007; I could ask her
03:10	<MikeSmith>	BenMillard - yeah, she would certainly be able to tell you
03:11	MikeSmith	drops off to head into office
03:11	<BenMillard>	MikeSmith, thanks for your advice about this. I'm a noob when it comes to W3C!
03:11	<BenMillard>	oh bugger, he left just before I sent that
03:35	<jwalden>	BenMillard: ^
03:38	<MikeSmith>	BenMillard - I'm a noob too.. I've worked for the W3C only for a year now. most people here actually have a lot more experience with the W3C than I do
03:40	<MikeSmith>	I'm sort of like a foreign element that's been inserted, and not clear yet if it's going to make things worse or better
03:42	<jwalden>	<foreignElement id="MikeSmith"/>
03:49	<BenMillard>	MikeSmith, that's cool
03:49	MikeSmith	reads dbaron's latest
03:50	<MikeSmith>	http://dbaron.org/log/20080406-acid3
03:50	<MikeSmith>	"Teaching to the Test" (on HTML5 and Acid3 and Firefox)
04:17	<BenMillard>	MikeSmith, I've updated the tables document: http://sitesurgeon.co.uk/tables/
04:18	<BenMillard>	away for lunch now (at 4am local time!)
05:22	<BenMillard>	MikeSmith, I've updated the tables document http://sitesurgeon.co.uk/tables/ is this better?
05:46	<MikeSmith>	BenMillard - which parts did you change/add?
05:46	<MikeSmith>	The Goals and Deliverables part?
05:46	<BenMillard>	that's right
05:47	<BenMillard>	I also moved the Feedback section to the top to emphasise the openness of its development
05:49	<BenMillard>	MikeSmith, using the UserName, Message format is more compatible with IRC clients and the #whatwg log than using UserName - Message
05:49	<BenMillard>	e.g.:
05:49	<BenMillard>	BenMillard, which parts did you change/add?
05:49	<MikeSmith>	um, thanks for the tip
05:50	<Hixie>	personally i'm more of a foo: bar fan than a foo, bar fan
05:51	<BenMillard>	either of those are compatible with IRC clients and the log, afaict, but foo - bar is not
05:52	<MikeSmith>	BenMillard - I don't know what you mean by compatible with IRC clients and the log
05:52	<MikeSmith>	not that I really care to get into a discussion about it
05:53	<BenMillard>	the highlight lines directed at you when either of the conventional methods are followed
05:53	<BenMillard>	*they
05:54	<BenMillard>	so if you address me as BenMillard, message it gets highlighted and I'm more likely to see it
05:54	<BenMillard>	(or BenMillard: message)
05:55	<Hixie>	either works with a decent irc client like irssi :-)
05:55	<MikeSmith>	BenMillard - yeah, I'd say that's a problem between you and your IRC client
05:55	jwalden	snickers
05:56	<jwalden>	don't most clients get set off when your nick appears anywhere in the line, surrounded by \b ?
05:58	<BenMillard>	the , and : forms are the only ones I've found interoperable...is there a standard for this?
05:59	<jwalden>	standard? IRC? hah!
06:00	<jwalden>	or so I've been led to believe
06:02	<MikeSmith>	I think the standard for IRC is "please leave your %s at the door"
06:03	<MikeSmith>	jwalden - fwiw, XChat highlights the nick of the user who uses your nick in a message
06:03	<MikeSmith>	not your own nick
06:03	<MikeSmith>	which sorta makes sense to me
06:04	<MikeSmith>	I know my own name
06:04	<MikeSmith>	(most of the time)
06:04	<jwalden>	I used the phrase "set off" because it probably differs across clients; Chatzilla highlights the individual message, I assume others do other thigns
06:04	<MikeSmith>	the useful piece of information for this case seems to be, who's calling?
06:04	<jwalden>	s/gns/ngs/
06:06	<BenMillard>	the IRC logs for this channel are at http://krijnhoetmer.nl/irc-logs/ and it supports both the , and : forms. so if you use one of those forms, while in this channel, the highlighting feature will work in the logs
06:06	<BenMillard>	although I agree that any message which mentions a user's name should light up in that user's client
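The two behaviours discussed above (a client reacting to a nick anywhere in a line, versus a log viewer reacting only to the "nick, message" and "nick: message" forms) can be sketched in Python. This is a rough illustration, not how any particular client or the log viewer is implemented; the function names `mentions` and `addressed_to` are hypothetical. It uses lookarounds rather than a literal \b so that nicks ending in punctuation, such as Philip`, still match.

```python
import re


def mentions(nick: str, line: str) -> bool:
    """True if the nick appears anywhere in the line as a whole word.

    Equivalent to wrapping the nick in \\b for purely alphanumeric nicks,
    but also works when the nick starts or ends with punctuation.
    """
    pattern = r"(?<!\w)" + re.escape(nick) + r"(?!\w)"
    return re.search(pattern, line) is not None


def addressed_to(nick: str, line: str) -> bool:
    """True only for the 'nick, message' and 'nick: message' forms."""
    return re.match(re.escape(nick) + r"[,:]\s", line) is not None


line_dash = "BenMillard - which parts did you change/add?"
line_comma = "BenMillard, which parts did you change/add?"

print(mentions("BenMillard", line_dash), addressed_to("BenMillard", line_dash))
print(mentions("BenMillard", line_comma), addressed_to("BenMillard", line_comma))
```

Under these assumptions the "nick - message" form is still caught by the whole-word match but not by the stricter prefix match, which is the difference the participants are describing.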
07:12	<hsivonen>	http://bitworking.org/news/317/Revisionist-XHistory
07:12	<othermaciej>	hi all
07:13	<hsivonen>	hi
07:27	<hsivonen>	surprisingly, one of the hotspots of Validator.nu is TreeSet doing a silly number of compares when the inserted items are already ordered or reverse-ordered.
07:28	<hsivonen>	too bad the JDK and Commons Collections don't seem to have head/tail-biased linked list-backed SortedSets.
07:29	<othermaciej>	that's not a good way to make a sorted data structure
07:30	<othermaciej>	(it's a good way to make an ordered associative data structure)
07:30	<hsivonen>	othermaciej: what's not good? TreeSet?
07:30	<othermaciej>	no, a linked-list backed set
07:30	<othermaciej>	a TreeSet (assuming it's a balanced tree) is a good way to make a balanced associative structure
07:31	<othermaciej>	but inserting N items into it should be N log N so it's surprising it would be a hot spot
07:31	<hsivonen>	othermaciej: why not if you know the insertion will always be either to head of the list or the next from head?
07:31	<othermaciej>	if you have items that are already ordered, you would want to use a ListHashSet
07:31	<othermaciej>	(I think that's what Java calls it)
07:32	<hsivonen>	othermaciej: the insertions are almost ordered
07:32	<hsivonen>	that is, the new insertion is most often to the head of the list, but sometimes a slot or two further
07:33	<hsivonen>	(LinkedHashSet is not what I need here)
07:33	<othermaciej>	that doesn't have an insertBefore?
07:33	<othermaciej>	(too bad, it should be easy to do)
07:35	<hsivonen>	it appears it does not
07:39	<hsivonen>	the show source feature ends up comparing locations 29 times the number of location objects
07:39	<hsivonen>	that's not good
08:08	<BenMillard>	annevk, you wrote "heh, RDF fanatics use " and <em> for quotes" and I see things like that quite frequently on the blogs of markup/accessibility enthusiasts/experts
08:12	<BenMillard>	indeed, it's hard to find anyone using <q>...present company excepted :)
08:13	<BenMillard>	getting the right punctuation seems more important to authors than using the right element
08:17	<jwalden>	problem being q's quotation behavior (CSS's, that is) is underspecified, as dbaron tells me
08:18	<BenMillard>	Hixie, I made editorial changes to http://sitesurgeon.co.uk/tables/ which include clarifying the markup used by each group in "How Authors Indicate Headers in Data Tables". They were a bit vague before. Let me know if this changes anything.
08:18	<Hixie>	k
08:19	<Hixie>	i probably won't look at table stuff for some time
09:09	zcorpan_	did not know about Document.strictErrorChecking
09:10	zcorpan_	will define in dom5core what happens when it is false
09:24	<BenMillard>	Philip`, my (badly taken) photos of UK snow are now blogged: http://projectcerbera.com/blog/2008/04#day06
09:46	<jwalden>	oh man
09:46	<jwalden>	that looks AWESOME
10:55	<hsivonen>	othermaciej: switching to HeadBiasedSortedSet and TailBiasedSortedSet changed the comparison patterns for the better by approximately a factor of 29
10:56	<hsivonen>	I don't know what kind of balanced tree TreeSet has, but it sure compares a lot
10:57	<othermaciej>	how big is your data set?
10:57	<hsivonen>	othermaciej: the HTML 5 spec
10:57	<hsivonen>	about 16000 items in the set
10:58	<othermaciej>	log base 2 of that is 14
10:58	<othermaciej>	must not be that well balanced
10:58	<othermaciej>	(well, close to 14)
10:58	<hsivonen>	or it compares everything twice
10:58	<hsivonen>	or something
10:59	<hsivonen>	the next big hotspots are IO and XPath
11:01	<hsivonen>	making IO go away is hard, but making XPath go away is quite doable and something I want to do anyway
11:02	<othermaciej>	I am still somewhat surprised that inserting into a tree-based data structure could be the top hot spot for a program
11:02	<hsivonen>	doh brainfart re compares everything twice
11:03	<hsivonen>	anyway, the profiler data is what it is.
11:08	<othermaciej>	sorting an already-sorted 16000 element array in JavaScript takes 10 milliseconds
11:08	<othermaciej>	(on Safari on a decent machine)
11:08	<othermaciej>	I would suspect something is broken about TreeSet
11:11	<hsivonen>	the factor 29 wasn't a time factor but number of invocations of compareTo factor
11:11	<hsivonen>	so it can't be CPU timing weirdness with the profiler
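To make the comparison-count argument concrete, here is a small Python sketch of an end-biased sorted container that counts comparisons, set against the rough n log2 n figure for a balanced tree. It is not hsivonen's HeadBiasedSortedSet: the class name `EndBiasedSortedList` is hypothetical, it is backed by a Python list rather than a linked list, and it does not reject duplicates. The input patterns (fully ordered, and neighbours swapped to model "a slot or two further") are assumptions too.

```python
import math


class EndBiasedSortedList:
    """Keeps items ascending; searches for the insertion point from the end."""

    def __init__(self):
        self.items = []
        self.comparisons = 0

    def add(self, value):
        i = len(self.items)
        # Walk back from the largest element until the slot is found.
        while i > 0:
            self.comparisons += 1
            if self.items[i - 1] <= value:
                break
            i -= 1
        self.items.insert(i, value)


n = 16000

# Already-ordered input: every insertion lands at the end.
ordered = EndBiasedSortedList()
for v in range(n):
    ordered.add(v)

# Nearly ordered input: each pair of neighbours arrives swapped.
nearly = EndBiasedSortedList()
for v in range(0, n, 2):
    nearly.add(v + 1)
    nearly.add(v)

# Rough comparison budget for n insertions into a balanced tree.
tree_estimate = n * math.log2(n)

print(ordered.comparisons, nearly.comparisons, round(tree_estimate))
```

With input like this the end-biased structure needs a small constant number of comparisons per insertion, while a balanced tree needs on the order of log2(16000), about 14, per insertion. That gap is smaller than the factor of 29 reported above, so this sketch does not explain the TreeSet measurement.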
11:40	<zcorpan>	2137 entities
11:41	<zcorpan>	it's like learning simplified chinese
11:42	<hsivonen>	I think this entity business is a bad idea, but I haven't gotten around to sending mail yet
11:44	<hsivonen>	annevk: re blog: http://www.photobasement.com/wp-content/uploads/2008/04/quotationmarks.jpg
11:46	<zcorpan>	wiki syndrome?
12:05	<annevk>	zcorpan, fwiw, I think you should drop the strict error reporting thing from dom5
12:09	<jruderman>	hsivonen: nice photo
12:12	<gsnedders>	Philip`: http://pastebin.ca/974108
12:16	<zcorpan>	annevk: isn't that too late given acid3?
12:17	<zcorpan>	annevk: or do you mean just drop the attribute and leave the strict behavior intact?
12:18	<annevk>	what does acid3 have to do with anything?
12:18	<zcorpan>	it checks that createElementNS('...', 'foo::') raises an exception, e.g.
12:21	<annevk>	I meant the method on document, fwiw
12:24	<zcorpan>	ok
12:24	<zcorpan>	although it would be nice to be able to create an html5 parser in js
12:24	<hsivonen>	so here I was parsing XML
12:24	<hsivonen>	and it went really slowly
12:24	<annevk>	zcorpan, document.innerHTML
12:25	<hsivonen>	until I realized I should prevent it from fetching DTDs from w3.org...
12:25	<zcorpan>	annevk: true, the dom3core attribute doesn't help legacy browsers anyway
12:27	zcorpan	uses del.icio.us as his dom5core issue tracker
12:40	<annevk>	hmm, entity changes are impossible to track using web-apps-tracker
13:05	<Philip`>	gsnedders: How come so many people spell "connection" wrong, but get every other header correct?
13:06	<gsnedders>	Philip`: there are plenty of rare mistakes, though
13:06	<gsnedders>	Philip`: like spaces and not hyphens comes up a fair bit
13:07	<gsnedders>	Philip`: but cneonction is caused by a proxy, I can't remember which
13:07	<gsnedders>	Philip`: it was to avoid keeping the connection open, IIRC
13:07	<gsnedders>	Philip`: there was a bizarre reason for it
13:08	<Philip`>	Ah, that's what http://www.nextthing.org/archives/2005/08/07/fun-with-http-headers says
13:08	<gsnedders>	the web is weird.
13:10	<toruvinn>	gsnedders, my guess would be the 'neon proxy', there was something like that.
13:11	<Philip`>	("... I had a database with 2,686,155 page responses and 23,699,737 response headers. The actual downloading of all of this took about a week." - that sounds really quite slow)
13:11	<toruvinn>	haha, awesome page, Philip`.
13:11	<toruvinn>	thanks.
13:13	<gsnedders>	how do you plot something using gnuplot from a data file taking the log of one column of data?
13:13	<Philip`>	gsnedders: "set logscale x 2" might be what you want
13:15	<gsnedders>	Philip`: no, y
13:15	<Philip`>	or "plot 'foo.dat' using 1:(log($2)) ..." might be
13:15	<gsnedders>	But that's good enough :)
13:16	<gsnedders>	it still shows a huge long tail
13:20	<Philip`>	Everything has a long tail :-)
13:22	<annevk>	from the blog: "Can we get OOXML in HTML5 too? They seem to be very similar in their approaches to standardisation."
13:22	<annevk>	wtf, really...
13:32	<Philip`>	annevk: You might need to be more specific than "the blog", since there are several
13:33	<annevk>	oops, s/the/my/
13:33	<Philip`>	Ah, that narrows it down sufficiently
13:33	<zcorpan>	annevk's blog is the blog, didn't you know?
13:33	<gsnedders>	I mean, nobody reads my blog
13:33	<gsnedders>	Except, maybe, James Holderness (who I strongly suspect does)
13:35	<hsivonen>	hmm. V.nu parser perf sucks compared to Xerces
13:35	<hsivonen>	so badly that I suspect it is IO buffering and nothing in the algorithm
13:38	<Philip`>	Need more abstraction, so you can use the same IO buffering in both implementations
13:42	gsnedders	hopes that email is pointless sending
13:43	<gsnedders>	http://lists.w3.org/Archives/Public/ietf-http-wg/2008AprJun/0124.html
13:44	<gsnedders>	(or rather, I hope that sending that email isn't pointless)
13:44	<Philip`>	Count the number of emails that have been sent in the past day; calculate how much better the world is today than it was yesterday; divide; conclude that all emails are almost entirely useless, and so you should stop writing them
13:46	<gsnedders>	That would be a time-saver.
13:47	gsnedders	concludes he MUST reply to what a girl sent him ages ago
13:47	<gsnedders>	and apologise for being so damned slow.
13:50	gsnedders	can't believe he actually just used an RFC2119 term there
13:50	<hsivonen>	Philip`: neither parser pegs the CPU, btw, which also points to IO
13:51	<Philip`>	It could also point to Thread.sleep calls, but I assume you've avoided doing that
13:56	<annevk>	zcorpan, there's more than one blog?
13:57	<hsivonen>	heh. the hotspot in V.nu is isNcname
13:57	<hsivonen>	which wouldn't be needed if the DOM impl. accepted any element name
13:58	<annevk>	isNcname is becoming easier in a few months, I think
13:59	hsivonen	changes the test setup from DOM to SaxTree
14:05	<zcorpan>	SaxTree doesn't do such checks?
14:07	<hsivonen>	zcorpan: it doesn't
14:10	<hsivonen>	hmm. I'll just try SAX with defaulthandler
14:15	<hsivonen>	a java.util.regex-based isNcname is incredibly bad
14:27	<hsivonen>	looks like it's all about how often they go and read from the underlying FileInputStream
14:31	<hsivonen>	Xerces has special UTF-8 decoding...
14:35	<hsivonen>	OK. I have created a bug in my bytes to UTF-8 buffering
14:37	<hsivonen>	bytes to UTF-16 that is
15:47	<hsivonen>	Hixie: I can now confirm that not calling JDK intern() really makes a difference
16:02	<hsivonen>	Hixie: is this on your radar: https://bugzilla.mozilla.org/show_bug.cgi?id=427329#c7
17:09	<annevk>	hsivonen, I don't think we should start special casing the parser for that
17:09	<annevk>	fwiw
19:35	<hsivonen>	annevk: btw, the NCName thing isn't getting any better per spec--only worse
19:35	<hsivonen>	annevk: the point of checking for NCNames is to avoid exceptions in existing software--not as much to comply with XML infosets
19:37	<annevk>	ah
20:13	<Hixie>	hsivonen: it's not clear to me that the parser is the problem
20:13	<hsivonen>	Hixie: it's claimed that backing out the parser fix helps
20:14	<andersca>	hey Hixie
20:28	<Hixie>	hsivonen: i thought it was claimed that it didn't
20:28	<Hixie>	oh, my bad
20:28	<Hixie>	misread it
21:35	<hsivonen>	http://typophile.com/node/43971
21:36	<annevk>	I wonder if it's really embedding
21:36	annevk	was just reading that
21:39	<annevk>	http://lists.w3.org/Archives/Public/www-style/2007Dec/thread.html#msg84
21:48	<Hixie>	ok wtf
21:49	<Hixie>	"REPORT /webapps/!svn/bc/1409/source HTTP/1.1" is taking up insane amounts of CPU on my box
21:49	<annevk>	maybe html5.org?
21:50	<jgraham>	Er, that would be me
21:50	<Hixie>	aha!
21:50	<Hixie>	the magic of irc
21:50	<jgraham>	I didn't realise it would take up CPU on your box
21:50	<Hixie>	jgraham: go ahead, it's ok
21:50	jgraham	is ignorant
21:50	<Hixie>	jgraham: i'm sure you have legitimate reasons for it :-)
21:50	<Hixie>	jgraham: just making sure it wasn't some runaway script or something
21:51	<Hixie>	what is REPORT, anyway?
21:51	<Hixie>	svn blame?
21:51	<jgraham>	I was just wondering why html5lib's EOF handling appears to be different to the spec
21:51	<jgraham>	Hixie: yep
21:51	<Hixie>	cool
21:53	<hsivonen>	I wonder why multiple ns doesn't go to XHTML5 validation: http://www.w3.org/2008/03/validators-chart
21:55	<Hixie>	i wonder why text/html with DTD doesn't go to (x)html5 validator
21:56	<Hixie>	in fact that whole thing is WAY more complex than necessary or desirable
21:56	<Hixie>	where does it come from?
21:57	<annevk>	W3C :)
21:58	<hsivonen>	Hixie: http://lists.w3.org/Archives/Public/www-tag/2008Apr/0017.html
21:58	<annevk>	http://lists.w3.org/Archives/Public/www-validator/2008Apr/0014.html ?
21:58	<annevk>	Web page study: http://nikitathespider.com/articles/ByTheNumbers/
21:59	<Hixie>	ah
21:59	<hsivonen>	enabling NVDL in Validator.nu seems to be only a tiny bit of hacking away
22:00	<hsivonen>	once again there's some bad entity resolving that I need to fix
22:07	<hober>	this is awesome: http://nikitathespider.com/articles/ByTheNumbers/0803/MediaTypes.png
22:07	<Hixie>	looks basically like the numbers i got
22:07	<Hixie>	iirc i got 0.0044% to 0.2% depending on what kinds of pages i included
22:08	<Hixie>	(lower if i focused on the actively maintained web, higher if i included everything i could)
22:09	<Philip`>	I got 0.03% application/xhtml+xml from dmoz.org
22:09	<Philip`>	(and 99.8% text/html)
22:10	gsnedders	needs to get more HTTP headers :P
22:10	<Philip`>	gsnedders: Why? :-)
22:10	<gsnedders>	I mean, 1.1 million is nothing
22:10	<Philip`>	Depends on what you want to do with it
22:10	<gsnedders>	write a spec! :P
22:11	<Hixie>	i can never think of good examples for data-*
22:11	<Philip`>	People do real statistics with a sample size of hundreds - you don't always need billions :-)
22:11	<gsnedders>	Philip`: I know :)
22:11	<gsnedders>	Philip`: But it is all the edge cases that are helpful to have a large sample size for
22:12	<Philip`>	The web must be fractal, since you always find more edge cases when you look in more detail
22:12	<Hixie>	for writing the parser, i found that testing implementations was more useful than the data from the web
22:12	<Hixie>	but for defining new features, the data is invaluable
22:13	<Hixie>	i don't understand how people wrote specs without
22:13	<gsnedders>	Speaking of implementations, I need to email a guy at Opera
22:13	<gsnedders>	But I heart my left hand, and typig is slower than normal
22:14	<gsnedders>	typing, eve
22:14	<gsnedders>	*even
22:14	<Philip`>	Do you mean s/heart/hurt/ ?
22:14	<gsnedders>	yes
22:15	<Hixie>	Philip`, you're a braver man than i. i wasn't going to touch that one with a barge pole.
22:15	gsnedders	wonders what that one is
22:15	gsnedders	looks on the lists
22:15	<hober>	I imagine it was the s/// above
22:15	<gsnedders>	ah
22:15	<gsnedders>	oh dear…
22:16	<gsnedders>	now I realise…
22:16	<gsnedders>	I would say I ought to go hide in a corner because I didn't realise, but in this case, that's the wrong thing to say.
22:19	<gsnedders>	Hixie: Can I call you sick for just thinking of that?
22:19	<hober>	indeed.
22:23	<gsnedders>	Now, let me leave before I make an even more regrettable fuck up.
22:33	<Hixie>	annevk: what specs are you editor of these days?
22:36	<Hixie>	hey bloo
22:36	<blooberry>	hey hixie. 8-}
22:36	<Hixie>	wassup dude
22:36	<blooberry>	statistics.
22:36	<blooberry>	(trying to figure out how to present data and things)
22:37	<Hixie>	good times
22:38	<blooberry>	if you say so. ;-} visions of standard deviations dancing through my head
22:41	<Philip`>	Just say "the error bars are too small to show on this graph"
22:41	<blooberry>	I like that. 8-}
22:41	<Hixie>	hah
22:42	<andersca>	hey Hixie
22:42	<Hixie>	hey
22:46	<jgraham>	It's always good if you can claim the error bars aren't meaningful
22:47	<Hixie>	it's not at all clear to me what my error bars should actually be on some of my stats
22:47	<Hixie>	i mean, i can tell you exactly what the count was for the n billion pages
22:47	<Hixie>	it's not an estimate
22:47	<Hixie>	but since it's just a biased sample of an infinite number of pages...
22:47	<Hixie>	i don't know what to conclude
22:48	<jgraham>	Hixie: The error bars are supposed to represent the error on the population average based on the properties of your sample
22:49	<jgraham>	But since, as you note, you have a biased sample of the population it's not clear what that actually means
22:49	<Hixie>	so if n out of N pages had property X, what's the error on the population average for the property X?
22:49	<andersca>	Hixie: I have another application cache question for you
22:49	<Hixie>	go for it
22:50	<andersca>	Hixie: about the networking model
22:50	<andersca>	Hixie: so when a browsing context is associated with an application cache, all loads should go through the cache
22:51	<Hixie>	with the caveats defined in 4.6.5.1. Changes to the networking model, yes
22:51	<andersca>	yeah
22:52	<andersca>	now I understand that if I have a browsing context that is associated with an application cache
22:52	<andersca>	and the current document has a subframe, which is loaded from the cache
22:52	<andersca>	then that subframe's browsing context is not associated with an application cache?
22:52	<Hixie>	iirc the idea is that only the top-level browsing context matters
22:53	<Hixie>	but let me see if i can find that somewhere in the spec
22:53	<andersca>	the cache selection process will be invoked without a manifest URI for the subframe
22:56	<Hixie>	aha, found it
22:56	<Hixie>	"A child browsing context is always associated with the same browsing context as its parent browsing context, if any."
22:56	<Hixie>	from 4.6.2 Application caches
22:57	<jgraham>	Hixie: I think you just use the binomial std. deviation which is (pN(1-p))^0.5
22:57	<Hixie>	jgraham: where p = n/N ?
22:58	<jgraham>	Hixie: Yeah
22:58	<Hixie>	jgraham: so sqrt(n*(1-(n/N))) ?
22:58	<jgraham>	Statistics is not something that I have done a lot of recently
22:58	<Philip`>	Hixie: sqrt(n(n/N)(1-n/N)), I think
22:59	<Philip`>	and then there's a 95% chance the population mean is within +/- 2 s.d. of the sample mean, I think
22:59	<Hixie>	so (pn(1-p))^0.5, not (pN(1-p))^0.5
23:00	<annevk>	Hixie, http://wiki.whatwg.org/wiki/User:Annevk
23:00	<Philip`>	Hixie: Oops, I think I should have said sqrt(N(n/N)(1-n/N))
23:00	<Hixie>	that's what jgraham said, right
23:01	<Hixie>	sqrt(n*(1-(n/N)))
23:01	<jgraham>	Yeah, that's what I said
23:01	<jgraham>	sqrt(N(n/N)(1-n/N)) that is
23:01	<Philip`>	Oh, simplifying the multiplications makes it more complex to see if it's right :-)
23:01	<jgraham>	or at least what I meant
23:02	Philip`	suggests it is a premature optimisation
23:04	<Hixie>	that graph can't be right
23:04	<Philip`>	(That calculation of s.d. only works if n/N is sufficiently non-extreme, like 20 < n < N-20 or something)
23:09	<Hixie>	y=sqrt(x(1-(x/N))) for N=1e9 from x=0..N results in a pretty curve that crosses the x axis at 0 and N and that peaks at about y=5e4
23:10	<Hixie>	which seems unintuitive if y really represents the likely error
23:10	<Hixie>	at x
23:12	<Hixie>	wait, n and N are almost certainly not the n and N i was talking about here
23:12	<Hixie>	i'm guessing n is the sample size and N the population size
23:12	<Hixie>	in which case i can't work out the error, since the population size is infinite, or at least unknowable
23:12	<jgraham>	No, N should be the sample size
23:12	<Philip`>	n is the number with property P out of the sample size of N, and the population is assumed to be infinite
23:12	<Hixie>	oh
23:12	<Hixie>	well then
23:12	<Hixie>	something is wrong
23:12	<Hixie>	for this graph doesn't make sense
23:12	<jgraham>	(imagine flipping coins; the population is infinite then too)
23:13	<Hixie>	there's no way that if i find 1000 pages out of 1e9 that the error is less than if i find 10000
23:13	<Hixie>	i guess it makes sense that the error would be symmetric
23:14	<Hixie>	about 50%
23:14	<Hixie>	since otherwise you could just define your problem as its reverse and your error would drop to zero
23:14	<Hixie>	but shouldn't the error for n 0.01% or n 99.99% be greater than for n 50%?
23:15	<Philip`>	It should peak at x=5e8, y=1.6e4, not at y=5e4, I think
23:15	<Hixie>	er yes, i meant 1.5e4 but the 1. was cut off on my display
23:18	<Hixie>	annevk: cool, thanks (re wiki page)
23:20	<Philip`>	Hixie: If you had a coin that gave heads 55% of the time, you wouldn't be surprised if it gave 50 heads out of 100 throws, because that's within expected random variation. But if you had a coin that gave heads 5% of the time, you would be surprised if you got 0 heads out of 100 throws (because the chance of that is 0.95^100 = 0.6%)
23:21	<Philip`>	So it's a 5% difference between sample and population means in both cases, but that's expected in the n/N=50 case and too extreme in the n/N=0 case
23:21	<Hixie>	fair enough
23:21	<Philip`>	so the expected variation is much lower nearer n=0
23:22	<Hixie>	makes sense
23:22	Hixie	looks at the actual numbers
23:22	<Philip`>	(though the binomial normal approximation model breaks down when you actually get n=0)
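The coin example can be checked directly. The Python sketch below computes the exact binomial probability of zero heads in 100 throws of a 5% coin, and then expresses both observed results as a distance from the expected count in standard deviations. The helper names `binomial_pmf` and `binomial_sd` are illustrative, not from the discussion.

```python
import math


def binomial_pmf(k, n, p):
    """Exact probability of k successes in n independent trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def binomial_sd(n, p):
    """Standard deviation of the number of successes in n trials."""
    return math.sqrt(n * p * (1 - p))


throws = 100

# Coin that gives heads 5% of the time: chance of seeing no heads at all.
p_zero_heads = binomial_pmf(0, throws, 0.05)

# How far each observed result sits from its expected count, in s.d. units.
sd_55 = binomial_sd(throws, 0.55)
sd_05 = binomial_sd(throws, 0.05)
z_55 = (55 - 50) / sd_55  # 50 heads observed, 55 expected
z_05 = (5 - 0) / sd_05    # 0 heads observed, 5 expected

print(f"P(0 heads | p=0.05) = {p_zero_heads:.4f}")
print(f"p=0.55: s.d. {sd_55:.2f}, 50 heads is {z_55:.2f} s.d. from expected")
print(f"p=0.05: s.d. {sd_05:.2f}, 0 heads is {z_05:.2f} s.d. from expected")
```

The same five-point gap is about one standard deviation for the 55% coin but more than two for the 5% coin, which is why the expected variation shrinks near n=0.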
23:22	<annevk>	Hixie, creating a new stats page?
23:22	<Hixie>	annevk: no, bloo made me think about it
23:23	<Hixie>	Philip`: so in a sample of 7e9 pages as my recent one, if i find 500 pages with a tag, that's really 500 +/- 22?
23:23	<Hixie>	i guess that makes sense
23:23	<jgraham>	Near n=0 it becomes poisson-like, right? So the error ~sqrt(n)
23:24	<Philip`>	Near n=0 I think you can just calculate the binomial directly, instead of approximating
23:25	<jgraham>	Right, but if you just want a good estimate and are lazy :)
23:26	<Philip`>	Hixie: 22 is the standard deviation, not the expected error - I think it's something like 66% chance that the sample mean is within +/- 1 s.d. of the true mean
23:26	<Philip`>	Hixie: so you want 2 s.d. (500 +/- 44) for 95% confidence
23:26	<Hixie>	ah right
23:26	<Philip`>	(I do hope I'm remembering this right...)
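As a worked check of the figures in this exchange, the following Python sketch applies the binomial formula sqrt(N(n/N)(1-n/N)) to the numbers quoted above: 500 pages out of a sample of 7e9, and the peak of the curve for N=1e9. The function name `count_sd` is hypothetical, and the sketch assumes independent random sampling, which the participants note does not really hold for this data.

```python
import math


def count_sd(n, N):
    """Binomial s.d. of a count: n items with the property in a sample of N."""
    p = n / N
    return math.sqrt(N * p * (1 - p))


# 500 pages with a given tag out of 7e9 sampled pages.
N = 7e9
n = 500
sd = count_sd(n, N)
low, high = n - 2 * sd, n + 2 * sd  # roughly 95% interval on the count

# For very rare features the Poisson-like shortcut sqrt(n) is almost identical.
poisson_sd = math.sqrt(n)

# Peak of y = sqrt(x(1 - x/N)) for N = 1e9, which occurs at x = N/2.
peak = count_sd(5e8, 1e9)

print(f"s.d. = {sd:.1f}, 2 s.d. interval = {low:.0f} to {high:.0f}")
print(f"sqrt(n) = {poisson_sd:.1f}, curve peak = {peak:.0f}")
```

The standard deviation on the proportion, rather than on the count, is the same figure divided by N.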
23:27	<Hixie>	this is all basically a complicated way of saying "we can't really tell anything for sure but we might as well assume it's all right"
23:27	<jgraham>	Philip`: The bit about std deviations is right
23:27	<Philip`>	The 95% thing means if you do this 20 times then you can expect to be wrong once, but hopefully only a little bit wrong :-)
23:28	<Hixie>	except i can't
23:28	<Hixie>	since i can't take a different sample
23:28	<jgraham>	But I _think_ you can make an estimate of the uncertainty on p using this method
23:28	<Hixie>	and i know the numbers precisely for my actual "sample"
23:28	<jgraham>	which is what you really care about
23:28	<Philip`>	Hixie: You should take a random sample of your 7e9 pages, and then you could do proper statistics on that, using the 7e9 as the population :-)
23:29	<Hixie>	that would be worthless
23:29	<Hixie>	since i can just do it on the whole thing!
23:29	<jgraham>	(like naievely you could say the probability of a page containing the tag is 500/7e9 +/- 22/7e9, only if might be more complicated than that)
23:30	<jgraham>	s/naievely/naively/
23:30	<jgraham>	s/if/it/
23:30	<Hixie>	if it's 500 +/- 44 out of N to have 95% confidence that the same proportion applies in the population as a whole
23:30	<Hixie>	that means that out of any random sample of N pages, there'll be 500-44 to 500+44 out of N that have this feature
23:31	<Hixie>	right?
23:31	<Hixie>	which is basically no error
23:31	<Hixie>	i mean, on the cosmic scale of things
23:31	<Philip`>	If they're random samples from the same infinitely large population (where "infinitely large" means "much larger than the sample size"), then yes
23:31	<jgraham>	http://en.wikipedia.org/wiki/Margin_of_error
23:32	<jgraham>	Philip`: Which brings me back to the point about it being good if you can say the error bars are meaningless
23:32	<Philip`>	(because obviously if sample size = population size then you'll find precisely 500 in any sample of size N, so you have to assume infinite population to make sure the samples are independent, I think)
23:32	<Hixie>	i'm going to continue pretending that the margin of error is as close to 0 as makes no difference so long as i find something on more than 10000 or so pages
23:33	<Hixie>	Philip`: yeah, the sad thing here is that the samples aren't at all random for me. They're the N most interesting pages, for some pretty precise and known-useful definition of interesting
23:33	<jgraham>	Hixie: Well if you can get a variation from 0.2-0.044 depending on which pages you sample you're dominated by systematic error anyway
23:34	<Hixie>	jgraham: exactly
23:34	<Philip`>	It seems unlikely that 456 vs 544 pages using some feature would have any practical significance on design decisions, which is all that really matters
23:34	<Hixie>	right
23:38	<Philip`>	Mostly it's just nice to not use too many decimal places when presenting data, like 1e4 out of 7e9 should be 0.00014% and not 0.0001429%, because meaningless decimal places remind me of physics lessons :-)
23:39	<Hixie>	yeah well in my case i have to round the data and add in some error anyway to keep the data from being too accurate
23:39	<Hixie>	so
23:39	<Hixie>	:-)
23:39	<Philip`>	How do you know if you're adding enough error? :-)
23:40	<Hixie>	i'm pretty sure i add enough
23:40	<Hixie>	and that's all i'll say about that :-P
23:42	<takkaria>	Hixie: where's the html5 svn repo viewer online? I can't seem to find it
23:42	<Hixie>	there's a link at the top of the spec
23:43	<takkaria>	that figures. :) ta
23:44	Philip`	is slightly reminded of Cryptonomicon, calculating exactly how much of the collected signals intelligence could be used before it would become sufficiently accurate that it would reveal its source
23:44	<Hixie>	yeah
23:44	<Philip`>	except this isn't quite as serious as a war
23:44	<Hixie>	indeed
23:45	<Hixie>	one of the things i do is report numbers for different characteristics from samples collected at different times
23:45	<Hixie>	so the numbers aren't self-consistent even if you try to combine them
23:46	<Hixie>	(they're close enough though)
23:46	<Hixie>	(to draw conclusions from for the spec, i mean)
23:46	<Philip`>	It's nice to work in areas that are trivial in the grand scheme of things, like HTML, so it doesn't matter when you mess up :-)
23:47	<Hixie>	yeah really
23:47	<Hixie>	we can have a big impact, but if we screw up, oh well! no biggie
23:48	<Philip`>	The internet is a demonstration that you can mess up quite a large number of things and we'll still carry on just fine
23:52	<Hixie>	aaah
23:53	<Hixie>	i broke mathml
23:53	<Hixie>	and didn't notice
23:53	<Hixie>	crap
23:54	<Hixie>	how do we handle <mglyph>
23:54	<fearphage>	http://files.myopera.com/fearphage/static/bugs.xhtml?http://my.imaginary/site/ this document was originally made and served as text/html. now it's served as application/xhtml+xml. can anyone tell me why #299801 (3rd test from the bottom) is failing and how to make it pass (if possible). the problem revolves around xml + document.evaluate with a null namespace
23:54	<fearphage>	is there a way to query xml nodes using xpath with a null namespace?