#whatwg on 2008-03-20

00:24	<andersca>	is the idea that the offline cache should only be accessed when navigator.onLine is false?
00:24	<Hixie>	no
00:24	<Hixie>	the offline cache is always accessed
00:24	<Hixie>	is the spec not clear about that?
00:25	<andersca>	"The navigator.onLine attribute must return false if the user agent will not contact the network when the user follows links or when a script requests a remote page"
00:25	<andersca>	that's what got me confused
00:26	<Hixie>	ah, yeah, that should probably be clarified
00:26	<Hixie>	please send mail :-)
00:26	<andersca>	will do
00:26	<Hixie>	that text predates the offline cache
00:26	<andersca>	ah, got it
00:26	<andersca>	Hixie: also, can an application cache group ever have more than two caches?
00:26	<Hixie>	yes
00:27	<Hixie>	if there are many browsing contexts all using the same cache group
00:27	<Hixie>	and the cache group gets upgraded a number of times
00:27	<Hixie>	with only one browsing context getting upgraded each time
00:28	<andersca>	ah, that makes sense
00:29	<andersca>	so I should be able to get rid of older caches as contexts referencing them go away
00:32	<Hixie>	yeah
00:32	<andersca>	cool
00:33	<andersca>	this is starting to make sense
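
The cache-group behaviour described above (a group accumulating more than two caches, and older ones being discarded as the contexts using them go away) can be sketched roughly as follows. This is a toy Python model, not the spec's algorithm: the class `CacheGroup`, its methods, and the integer version numbers are all hypothetical names introduced for illustration.

```python
class CacheGroup:
    """Toy model of an application cache group; all names are hypothetical."""

    def __init__(self):
        self.versions = []        # cache versions, oldest first
        self.users = {}           # version -> set of browsing-context ids
        self._next_version = 0

    def add_version(self):
        """Record an upgrade: a new cache becomes the newest in the group."""
        version = self._next_version
        self._next_version += 1
        self.versions.append(version)
        self.users[version] = set()
        return version

    def attach(self, context_id, version):
        """Associate a browsing context with one cache, leaving any older one."""
        self.detach(context_id)
        self.users[version].add(context_id)

    def detach(self, context_id):
        """Drop a context (e.g. its window closed), then discard unused caches."""
        for contexts in self.users.values():
            contexts.discard(context_id)
        self._prune()

    def _prune(self):
        if not self.versions:
            return
        newest = self.versions[-1]
        # Assumption: the newest cache is always kept, even with no users yet.
        keep = [v for v in self.versions if v == newest or self.users[v]]
        self.users = {v: self.users[v] for v in keep}
        self.versions = keep


group = CacheGroup()
v0 = group.add_version()
for ctx in ("a", "b", "c"):
    group.attach(ctx, v0)
v1 = group.add_version()
group.attach("a", v1)            # only one context is upgraded each time
v2 = group.add_version()
group.attach("b", v2)
three_caches = list(group.versions)
group.detach("a")                # nothing references v1 any more
after_a = list(group.versions)
group.detach("c")                # nothing references v0 any more
after_c = list(group.versions)
```

The design choice here is simple per-cache bookkeeping of which contexts still use it; a real implementation would also have to account for caches that are mid-update.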
02:18	<Hixie>	i suppose we could handle the indent thing by using multiple columns
02:18	<Hixie>	but that feels like a hack
05:17	<Hixie>	ok i think i've looked at enough tables
05:17	<Hixie>	now to study the ones i marked as being interesting
07:49	bewes1	pokes othermaciej
07:49	<othermaciej>	bewes1: yes?
07:49	<bewes1>	oh, I wanted to ask you about geolocation
07:50	<othermaciej>	I am at your service
07:50	<bewes1>	I didn't think each HTTP request would re-consume the resources necessary to do geolocation
07:51	<bewes1>	is it unreasonable to assume the browser has some cached version of the geolocation results?
07:52	<othermaciej>	if there's a cached version it could be completely wrong
07:52	<bewes1>	without triggering new lookups?
07:52	<othermaciej>	let me use iPhone as an example
07:52	<bewes1>	so it's a cache invalidation issue?
07:52	<othermaciej>	I believe its geolocation has been discussed a little publicly
07:52	<othermaciej>	it uses a combination of WiFi (looking for nearby networks) and triangulation from cell towers
07:52	<othermaciej>	both of these fire up the radio
07:53	<othermaciej>	and may need to use high power signals
07:53	<othermaciej>	it does not do this all the time
07:53	<othermaciej>	only when you ask
07:53	<othermaciej>	if you put it in your pocket and drive across town, it has no idea if you are still near where you were
07:53	<othermaciej>	and it would have to suck a lot of battery to find out
07:53	<othermaciej>	so it's far better to only provide geolocation info on demand
07:53	<othermaciej>	instead of sending it to every server all the time
07:53	<bewes1>	yeah, I guess I was imagining there'd be some kind of reliable cache in between
07:54	<othermaciej>	how would you know if your cache is valid without geolocating again?
07:54	<othermaciej>	keep in mind, many sites (perhaps most) will not even use this data
07:54	<bewes1>	yeah
07:55	<othermaciej>	I would also add that it is awkward to get a request header to script code, and often the interesting location-based things you want to do will be in script
07:55	<bewes1>	yeah, I guess I was missing some perspective on devices for which it's really really expensive and a pretty unreliable cache
07:55	<bewes1>	although your points on usage are good too
07:55	<othermaciej>	I think what I said may be true for just about any phone with location support but no hardware GPS
07:56	<bewes1>	yeah
07:58	<bewes1>	how do you know where the wifi signals are? do you always require at least one cell tower?
07:58	<othermaciej>	I don't know all of the nitty gritty technical details
08:06	<MikeSmith>	fwiw, I use location-based services both from mobile browsers and other apps on devices here in Japan quite a lot, and location interaction from browsers deployed here is always as othermaciej describes
08:07	<MikeSmith>	that is, nothing is cached
08:08	<MikeSmith>	but some non-browser apps on mobile devices here are capable of dynamically updating as you move
08:09	<MikeSmith>	e.g., they can show you your updated position on a map as you move
08:09	<MikeSmith>	in near real-time
08:09	<MikeSmith>	like the navigation systems in cars
08:10	<MikeSmith>	another thing is that the mechanism being used to determine the position isn't really exposed to apps
08:11	<MikeSmith>	there's just an API for making a location query, and the device makes a determination about what the optimal mechanism is for determining your location
08:11	<bewes1>	yeah, I was just curious
08:17	<bewes1>	MikeSmith: what kinds of apps use the near real-time movement?
08:17	<MikeSmith>	bewes1 - apps for trip-planning/trip-routing
08:18	<MikeSmith>	most widely deployed one is this:
08:19	<MikeSmith>	http://www.navitime.com/howtouse.html
08:19	<bewes1>	oh, do you mean the ones in cars? I thought you meant devices you'd carry with you all the time
08:19	<MikeSmith>	Navitime
08:19	<MikeSmith>	not in cars, but on mobile handsets/ mobile phones
08:19	<MikeSmith>	but basically the same functionality as car navigation systems
08:19	<bewes1>	ah
08:20	<MikeSmith>	except that since here in Tokyo most people don't drive cars, the apps are for train commuting and pedestrians
08:20	<MikeSmith>	use case is that you just fire up the app and tell it where you want to go
08:21	<bewes1>	pretty convenient
08:21	<MikeSmith>	e.g., give it an address or name of a business or something
08:21	<MikeSmith>	then the app shows you the route: tells what train station is closest, then shows you an interactive map of how to get to the train station
08:22	<bewes1>	that's pretty good stuff
08:22	<MikeSmith>	then same thing after you get off the train at the closest station to your destination
08:22	<bewes1>	does it include when the next train arrives?
08:22	<MikeSmith>	yep
08:22	<MikeSmith>	it has access to complete train schedules for all train lines in Japan
08:23	<bewes1>	how often are trains late?
08:24	<MikeSmith>	bewes1 - rarely
08:25	<MikeSmith>	typically only in cases of really bad weather
08:25	<MikeSmith>	or if somebody jumps in front of one of them
08:26	<MikeSmith>	btw, the actual Navitime app and others like it is either a Java/J2ME or BREW app, depending on which handset it's on
08:26	<MikeSmith>	but there's no real reason that a remote Web-based app could not provide the same features
08:26	<MikeSmith>	except that we don't have any standard scripting APIs that would enable Web developers to write such apps
08:59	<hsivonen>	looks like headers='' hasn't registered on html4all yet
09:43	<hsivonen>	hmm. the restriction against entities in the internal character encoding decl is pretty annoying
09:51	<hsivonen>	Hixie: is "# ]
09:51	<hsivonen>	# The encoding name must be serialised without the use of character entity references or character escapes of any kind. "
09:52	<hsivonen>	based on browser behavior?
09:52	<Philip`>	I assume that's to make the preparse thing work correctly?
09:52	<hsivonen>	the way the spec is written, the tree builder's encoding switching thing works even with escaping
09:53	<hsivonen>	Philip`: presumably
09:54	<hsivonen>	but I have to wonder if this kind of layering-breaking restriction is right
09:55	<hsivonen>	Hixie: assuming that the restriction stays, shouldn't it apply not only to the encoding name but the entire attribute?
09:56	<hsivonen>	bah. I'll leave implementation for another day and send email for now
09:57	<hsivonen>	in a way, this restriction is even more annoying than the old 512-byte restriction
09:57	<hsivonen>	mainly because I implemented that one already
10:03	Philip`	only sees half a dozen cases of & in charsets
10:03	<Philip`>	like <meta http-equiv="Content-Type" content="text/html;>charset=iso-8859-1" />
10:03	<Philip`>	and <meta http-equiv="content-type" content="The 4 annual Los Angeles Independent Horror Film Festival ...and Music Too!! Creeping Your Way, October 2005> <meta name=" generator" content="Microsoft FrontPage 4.0">
10:04	<Philip`>	and similar things
10:39	<zcorpan>	Philip`, or Hixie: it would be very useful to have data about pages that use "<!-- ... > ... EOF" and "<script><!-- ... </script> ... EOF" (with no --> before EOF)
10:40	<zcorpan>	we're a bit scared that we'll break many pages when fixing that
10:53	<zcorpan>	or <script><!-- ... </script> ... --> (with no --> before </script>)
10:54	<zcorpan>	(and no further script blocks afterwards)
11:40	Philip`	needs a better grep
11:51	<hsivonen>	has anyone tested if meta can cause browsers to switch to a non-UTF-* non-US-ASCII superset encoding and reparse?
12:50	<Philip`>	zcorpan: http://philip.html5.org/data/pages-with-unclosed-comments.txt shows the pages that have unclosed comments, assuming my regexp was about right
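
A rough way to flag pages with an unclosed comment of the kind zcorpan asks about is sketched below in Python. Philip`'s actual regexp is not shown in the log, so this is a stand-in under simplifying assumptions: it scans the raw text for `<!--` with no later `-->`, and it ignores `<script>` contents, CDATA, and the browser-specific closing rules discussed later. The function name `has_unclosed_comment` is hypothetical.

```python
def has_unclosed_comment(html: str) -> bool:
    """Return True if some '<!--' is never followed by '-->' before EOF."""
    pos = 0
    while True:
        start = html.find("<!--", pos)
        if start == -1:
            return False          # no further comment openers
        end = html.find("-->", start + 4)
        if end == -1:
            return True           # comment runs to end of file
        pos = end + 3             # continue scanning after this comment


closed = has_unclosed_comment("<p>a<!-- note --> b</p>")
unclosed = has_unclosed_comment("<p>a<!-- note > b</p>")
```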
12:52	<hsivonen>	what does the spec change about that relative to Opera's existing behavior?
12:52	<Philip`>	http://parsetree.validator.nu/?doc=http://brianyeedds.com/ - the HTML5 way doesn't seem to work very well
12:54	<Philip`>	http://parsetree.validator.nu/?doc=http://clubleonberg.free.fr/ loses content too
12:54	<Philip`>	http://parsetree.validator.nu/?doc=http://cyrilvictor.photo.free.fr/ too
12:54	<Philip`>	So it seems a reasonably common problem
12:54	<hsivonen>	Philip`: spec bug or validator.nu bug?
12:55	<Philip`>	hsivonen: Spec, since it doesn't reparse comments if it finds EOF inside them
12:55	<hsivonen>	reparsing is evil, though
12:55	<Philip`>	(as far as I'm aware)
12:56	<Philip`>	Why is it evil in this case?
12:56	<Philip`>	(when it's only reparsing comment text)
12:56	<Philip`>	(so it's already got the text saved in memory, and it hasn't been inserting elements that are hard to undo)
12:56	<hsivonen>	1) It makes crafted network errors (EOF) cause scripts to run
12:57	<Philip`>	Oh, http://james.html5.org/cgi-bin/parsetree/parsetree.py?uri=http%3A%2F%2Fcyrilvictor.photo.free.fr%2F says something different
12:57	<hsivonen>	2) It is a world of pain for parser developers
12:59	<hsivonen>	having to rewind the byte stream is seriously bad
12:59	<Philip`>	http://james.html5.org/cgi-bin/parsetree/parsetree.py?source=%3Cscript%3E%3C!----x%3E%3C/script%3Efoo - html5lib bug?
12:59	<hsivonen>	taking the decoded comment buffer and emulating document.writing it is also bad but not quite as bad
13:19	<Philip`>	jgraham: Is it intentional that you disabled all the Python tokeniser tests?
13:20	<Philip`>	Oh, looks like just a typo
13:24	Philip`	fixes the --x> thing in html5lib
13:28	<Philip`>	(Not sure it's the most sensible fix, since it's only really the data='-' case that causes issues, but it should be safe enough...)
13:52	<hsivonen>	http://lists.w3.org/Archives/Public/www-style/2008Mar/0279.html
13:56	<zcorpan>	Philip`: wow thanks!
13:57	<zcorpan>	the first has --!> ...
14:05	<Philip`>	http://parsetree.validator.nu/?doc=http://www.graniteschools.org/jr/churchill/
14:05	<Philip`>	That seems to be the <script> thing you were asking about too?
14:05	<Hixie>	zcorpan, hsivonen: send mail
14:06	<hsivonen>	already did about the encoding stuff
14:06	<hsivonen>	I'll send a pre-emptive mail about zcorpan's expected email :-)
14:06	Philip`	needs to make his HTML-grep multithreaded, but isn't entirely sure how to do that without getting the output stream horribly confused
14:08	<zcorpan>	Philip`: yeah
14:10	<Philip`>	zcorpan: I'm using /(?is)(<script[^>]*>[^<]*<!--([^>]|(?<!--)>)*</script>)([^<]|<(?!/script))*-->([^<]|<(?!/script))*$/ which hopefully is something like what you meant by "<script><!-- ... </script> ... --> (with no --> before </script>"
14:12	<Philip`>	zcorpan: http://philip.html5.org/data/pages-with-unclosed-scripts-and-comment-stuff.txt
14:13	<hsivonen>	zcorpan: does Opera mitigate the script running risk upon reparse?
14:15	<zcorpan>	Philip`: yep, thanks
14:16	<zcorpan>	hsivonen: i don't understand the question
14:17	zcorpan	notes that firefox and safari close comments at --!>
14:17	<zcorpan>	and doing so has better compat with pages i've looked at so far
14:18	<hsivonen>	zcorpan: the security characteristics of the bytes in the "comment" changes radically depending on whether they are comments or markup that can run scripts and instantiate iframes
14:19	<Philip`>	zcorpan: In quirks or standards?
14:19	<zcorpan>	Philip`: both
14:19	<zcorpan>	hsivonen: yeah. the reparsed comment can run scripts and instantiate iframes...
14:20	<Philip`>	zcorpan: <!-- --foo> for any foo closes the comment in FF2 standards, as far as I can tell
14:20	<Philip`>	--!> is only special in quirks
14:20	<zcorpan>	i'm testing in firefox 3
14:21	<zcorpan>	http://software.hixie.ch/utilities/js/live-dom-viewer/?%3C!doctype%20html%3E%3C!--x--!%3E%20x%20--%3E
14:22	<Philip`>	http://software.hixie.ch/utilities/js/live-dom-viewer/?%3C!doctype%20html%3E%3C!--x--foo%3E%20x%20--%3E
14:22	zcorpan	notes that some pages use // -- ></script>
14:22	<Philip`>	and
14:23	<Philip`>	http://software.hixie.ch/utilities/js/live-dom-viewer/?%3C!--x--foo%3E%20x%20--%3E
14:23	<Philip`>	http://software.hixie.ch/utilities/js/live-dom-viewer/?%3C!--x--!%3E%20x%20--%3E
14:23	<zcorpan>	Philip`: ah
14:25	<Philip`>	s/for any foo/for any foo not containing an odd number of '--'/
14:26	<zcorpan>	and is not > :)
14:26	<hsivonen>	gotta love the simplicity of comments
14:27	<Philip`>	foo = '>' will still cause the comment to close when writing <!-- --foo>, so I believe my statement is true without that detail :-)
14:28	<zcorpan>	so about half of the pages i've looked at so far would break if we implemented html5 comment parsing...
14:28	<zcorpan>	that's 1/2000 of pages
14:29	Philip`	calculated 0.05%, which agrees
14:36	<Hixie>	crockford is still on his warpath
14:37	<Hixie>	http://www.internetnews.com/webcontent/article.php/3735341/Can+We+Fix+The+Web.htm
14:37	<Hixie>	http://diveintomark.org/archives/2008/02/21/the-bolero-of-troll
14:39	<Philip`>	It seems kind of obvious to me that the web is completely broken and can't be fixed, so we're just trying to improve at hobbling along as best we can
14:40	svl	wishes everything he made was as successful and broken as the web.
14:43	<Hixie>	seriously
14:47	Philip`	wonders if there is any successful technology that isn't broken
14:53	<Dashiva>	The paperclip
14:54	<Dashiva>	Unless you count broken by design, that is
14:54	<Philip`>	Paperclips don't work in microwaves
14:54	<Dashiva>	I meant that character in office
14:54	<Philip`>	(at least the plastic-coated metal ones don't)
14:54	<Philip`>	Oh, okay
14:55	<Philip`>	I think being designed around a bad idea counts as being broken :-)
14:57	<Dashiva>	And if you're counting missing features as broken, then anything that doesn't implement the universe is broken some way or another :)
14:58	<Camaban>	isn't the universe broken too? :)
14:58	<Philip`>	I don't count all feature requests as bugs :-)
14:59	<Philip`>	I think entropy is a fundamental long-term design flaw
15:06	<Camaban>	lol
15:06	<hsivonen>	hmm. I wonder if I can trust all Java implementations to provide a non-buffering Windows-1252 decoder (one that can be swapped out without stuff getting lost in a buffer)...
15:38	<hsivonen>	does anyone have good test cases that should cause the parser to switch encodings mid-stream or reparse?
16:17	<hsivonen>	I deployed lots and lots of parser changes on Validator.nu.
16:19	<Hixie>	cool
16:19	<Hixie>	i'm looking at tables like this a lot:
16:19	<Hixie>	http://www.linz.govt.nz/publications/statement%2Dintent%2D0607/financial/financial-statements/financial-performance/index.html
16:20	<Hixie>	i'm tempted to say that a header cell in the first column, with no significant data cells on its row, should become a row group header
16:20	<Hixie>	but it actually breaks on that example
16:20	<Hixie>	so i don't know if i should
16:20	<Hixie>	(it breaks on the last row)
16:20	<Hixie>	(everything else could be fine)
16:22	hsivonen	wishes each Gecko docshell had a thread
16:23	<hsivonen>	Firefox 3 beachballs on me
16:28	<Pavlov_>	i haven't had it beachball on me in ages
16:29	<hsivonen>	Pavlov_: try http://parsetree.validator.nu/?doc=http://www.whatwg.org/specs/web-apps/current-work/source
16:29	<hsivonen>	with lots of other tabs open
16:30	<Hixie>	wfm in ff2, though interaction is a bit jerky while the page is loading. (x11)
16:34	<Hixie>	ironically, i think the wide headers in tables like the following actually probably don't deserve to be read out regularly:
16:34	<Hixie>	http://broads-authority.gov.uk/boating/navigating/tide-tables.html
16:44	<Hixie>	http://www.flickr.com/photos/joeclark/192878174/ is an insane table
16:49	<Hixie>	i don't plan on supporting this kind of table: http://www.flickr.com/photos/joeclark/185786111/
16:50	<hsivonen>	Hixie: why not if the icons have alt
16:50	<Hixie>	the problem with that table is the column headers
16:50	<Hixie>	there are two columns, but they vary in number of cells covered per row
16:51	<Hixie>	(the images are a non-issue)
16:51	<hsivonen>	I'd say there are 8 columns
16:51	<Hixie>	(i should clarify that i mean that i won't support it without use of scope or headers)
16:51	<hsivonen>	ah
16:51	<Hixie>	(obviously if you explicitly list the headers, you can do almost anything)
16:55	<Hixie>	http://sitesurgeon.co.uk/tables/astro/06-seasons/original.html is weird
16:55	<Hixie>	i don't understand why the years are duplicated
16:56	<hsivonen>	fwiw, the parser at parsetree.validator.nu is now configured to allow reparsing on late meta
16:57	<Hixie>	let me know if you ever get a page that validates despite having to trigger that
16:58	<hsivonen>	Hixie: the validator is configured to give up if the reparse would have to be triggered
16:58	<hsivonen>	since the validator is in the streaming mode and cannot undo the start of the document
16:58	<Hixie>	ah ok
16:59	<hsivonen>	I implemented on-the-fly encoder change if the characters decoded so far would decode the same way with all ASCII supersets
16:59	<hsivonen>	but I think that code path will be extremely rare, because when meta is seen, it's not only the past characters that matter but only the decoded upcoming characters in the read buffer
17:00	<Hixie>	s/but only/but also/
17:00	<hsivonen>	the rarity of the code path occurred to me only after implementing
17:02	<Hixie>	http://www.biggerbras.com/bras-by-size-panties-shapers-sizing-catalog.shtml is weird because i don't know what column i'd use for the horizontal headers
17:02	<hsivonen>	I think the plausible real-world case would go like this: over 512 bytes of inline ASCII-only style and script, meta declaring utf-8, a couple of KB of ASCII text, a copyright sign at the footer
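
The condition hsivonen describes for changing the decoder on the fly can be sketched as a check that every byte already handed to the decoder is plain ASCII, since such bytes decode identically under any ASCII-superset encoding. This is only an illustration in Python, not Validator.nu's code (which is Java); the function name and the split into `consumed` and `buffered` bytes are assumptions.

```python
def can_switch_decoder_in_place(consumed: bytes, buffered: bytes) -> bool:
    """True if swapping to another ASCII-superset decoder needs no reparse.

    `consumed` is what the tokenizer has already seen; `buffered` is what the
    decoder has already read ahead and decoded when the meta is encountered.
    """
    return all(byte < 0x80 for byte in consumed + buffered)


head = b"<style>" + b"a{}" * 200 + b"</style><meta charset=utf-8>"
ascii_body = b"<p>" + b"plain text " * 50
footer = "\u00a9 2008".encode("utf-8")   # copyright sign: non-ASCII bytes

safe = can_switch_decoder_in_place(head, ascii_body)
# If the read buffer already reaches the footer, the shortcut no longer applies.
unsafe = can_switch_decoder_in_place(head, ascii_body + footer)
```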
17:03	<Hixie>	a lot of the more complicated tables are financial
17:03	<Hixie>	i recommend we get rid of money
17:04	<hsivonen>	Hixie: if you consider the actual use case of converting measurements to bra sizes, you'd have "Back Size" cell as the row heading
17:04	<hsivonen>	since Band size is a computed value
17:04	<Hixie>	yeah, that was my conclusion too
17:05	<Hixie>	but i don't know if that's what blind users would find most useful
17:05	<hsivonen>	besides, band size culumn is redundant with the actual data cells
17:05	<hsivonen>	column
17:16	hsivonen	notes that a non-ASCII-superset sniffed by chardet can never become certain
17:17	<hsivonen>	I wonder if chardet-supported non-ASCII superset encodings are common
17:17	<Hixie>	are there any?
17:18	<Hixie>	or do you mean the BOM detection?
17:19	<hsivonen>	I mean detecting UTF-16BE or UTF-16LE if those are supported by chardet
17:21	<hsivonen>	they seem to be
17:21	<Hixie>	my ff2 has stopped having a working "find", so i can't search for utf-16, sigh
17:22	<hsivonen>	There's nsUCS2BEVerifier and the same as LE
17:38	<hsivonen>	hmm. why am I seeing Ukrainian and Russian in the Firefox autodetector menu but I'm not seeing them in the jchardet source?
17:40	<hsivonen>	bah! the port is incomplete!
17:40	<hsivonen>	it lacks nsCyrillicDetector
17:49	Philip`	sees an XML namespace "urn:ietf:params:xml:ns:netconf:base:1.0", which seems much harder to remember than the w3.org ones
17:51	<hsivonen>	it comes from the IETF, so it has lots of colons like an IPv6 address :-)
17:58	<hsivonen>	hmm. If both jchardet and the ICU detector are enabled, which one should run first?
17:59	<Philip`>	It quite looks like the NETCONF RFC misunderstands XML namespaces
18:00	<Philip`>	e.g. for <rpc message-id="101" xmlns="...netconf..." xmlns:ex="...example..." ex:user-id="fred"> it says "Note that the "user-id" attribute is not in the NETCONF namespace.", which seems to be implying that message-id is
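
The point about the rpc example can be checked with any namespace-aware XML parser: a default `xmlns` declaration applies to elements, while an unprefixed attribute stays in no namespace. Below is a small Python sketch using the standard library's `xml.etree.ElementTree`. The NETCONF URI is the one quoted earlier in the log; `EXAMPLE` is a hypothetical stand-in for the elided "...example..." namespace, and `namespace_of` is a helper written for this illustration.

```python
import xml.etree.ElementTree as ET

NETCONF = "urn:ietf:params:xml:ns:netconf:base:1.0"
EXAMPLE = "http://example.net/ns"  # hypothetical; the log elides the real URI

doc = (
    f'<rpc message-id="101" xmlns="{NETCONF}" '
    f'xmlns:ex="{EXAMPLE}" ex:user-id="fred"/>'
)
rpc = ET.fromstring(doc)


def namespace_of(name):
    """Extract the URI from ElementTree's '{uri}local' form, else None."""
    return name[1:].split("}", 1)[0] if name.startswith("{") else None


element_ns = namespace_of(rpc.tag)
attr_ns = {name: namespace_of(name) for name in rpc.attrib}
```

Here the element lands in the NETCONF namespace, `user-id` lands in the example namespace, and `message-id` is in no namespace at all, which is why singling out `user-id` as "not in the NETCONF namespace" is misleading.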
18:01	<Hixie>	wait let me get this right
18:01	<Hixie>	are you saying that someone writing a standard... didn't understand xmlns?
18:01	<Hixie>	surely you jest
18:02	<othermaciej>	xmlns misunderstood, film at 11
18:03	<Philip`>	and also it has <rpc-reply xmlns="...netconf..." xmlns:xc="...netconf...">...<top xmlns="...example..."><interface xc:operation="replace"> in an example, to set "the operation attribute" (and it never says that should be a namespaced attribute, and never implies it except in the example)
18:04	<Hixie>	i am shocked.
18:04	<Hixie>	shocked, i say.
18:04	<Hixie>	shocked and dismayed.
18:04	<hsivonen>	attribute namespaces misunderstood film at 11
18:05	<andersca>	lo
18:05	<andersca>	l
18:05	<Philip`>	At least it forbids doctype declarations, which could be considered sensible
18:06	<standards_barbie>	xmlns is hard, let's go shopping
18:10	<Hixie>	i wonder if i should just treat "<tr><th>X</th> <!-- no significant content --> </tr>" the same as "<tr><th colspan=N>X</th></tr>" (where N = number of columns)
18:10	<Hixie>	instead of treating the former in a separate way
18:14	<Hixie>	data:text/html;base64,PHRhYmxlPjx0cj48dGg%2BSDx0aD5IPHRoPkg8dHI%2BPHRoPkg8dHI%2BPHRoPkg8dGQ%2BRDx0ZD5EPHRyPjx0aCBjb2xzcGFuPTM%2BSDx0cj48dGg%2BSDx0ZD5EPHRkPkQ8dHI%2BPHRoPkg8dHI%2BPHRoPkg8dGQ%2BRDx0ZD5EPHRyPg%3D%3D
18:14	<Hixie>	what should the default interpretation of that table be?
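
For readers who want to see the table in question without loading it in a browser, the data: URI can be decoded with a few lines of Python (standard library only). This is just a convenience sketch; the per-row cell count is a crude substring count, not an implementation of any table-processing algorithm.

```python
import base64
from urllib.parse import unquote

DATA_URI = (
    "data:text/html;base64,PHRhYmxlPjx0cj48dGg%2BSDx0aD5IPHRoPkg8dHI%2BPHRoPkg8"
    "dHI%2BPHRoPkg8dGQ%2BRDx0ZD5EPHRyPjx0aCBjb2xzcGFuPTM%2BSDx0cj48dGg%2BSDx0ZD5E"
    "PHRkPkQ8dHI%2BPHRoPkg8dHI%2BPHRoPkg8dGQ%2BRDx0ZD5EPHRyPg%3D%3D"
)

# Split off the "data:text/html;base64" header, undo the percent-escapes,
# then base64-decode the payload back into markup.
_, payload = DATA_URI.split(",", 1)
markup = base64.b64decode(unquote(payload)).decode("ascii")

# Crude per-row summary: how many <th>/<td> start tags each non-empty row has.
rows = [row for row in markup.split("<tr>")[1:] if row]
cells_per_row = [row.count("<th") + row.count("<td") for row in rows]

print(markup)
print(cells_per_row)
```

The decoded markup alternates rows containing a lone `<th>` with rows of one header plus two data cells, and includes one row whose header uses `colspan=3`, which is the contrast Hixie is asking about.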
18:24	Hixie	trips on some passing tumbleweed
18:25	<Philip`>	I think the browser should add a <caption> element saying "ERROR: Table is too complex"
18:28	<Hixie>	heh
18:30	<Philip`>	It seems helpful for authors if the algorithm is simple and predictable enough that they can tell whether their slightly-fancy table is going to be automatically headerised correctly, or if they're going to have to manually annotate it
18:30	<Philip`>	at least for authors who know a bit about the table thing, and care about it, but can't be bothered to use some awkward ugly tool to tell them how the table really is processed
18:31	<Philip`>	(rather than trying to make the algorithm really complex so it handles lots of weird edge cases)
18:46	<andersca>	Hixie: "If the resource is not being loaded as part of navigation of a top-level browsing context"
18:47	<andersca>	Hixie: that statement would be true when loading something in an iframe, right?
19:03	<Hixie>	andersca: or a stylesheet, or an image, or XMLHttpRequest, etc, yeah
19:03	<Hixie>	or script
19:03	<Hixie>	or any number of other things
19:04	<Hixie>	basically anything except the maii document :-)
19:04	<Hixie>	main
19:04	<andersca>	although I guess that only xmlhttprequests can cause the cache selection algorithm to be invoked
19:05	<Hixie>	i think the cache selection algorithm can only be invoked when there's a browsing context
19:05	<andersca>	ah, yeah
19:46	<Hixie>	http://www.pcworld.com/article/id,140408/article.html is funny
19:47	<Hixie>	(given safari 3.1)
19:47	<andersca>	...nothing about safari :(
19:48	<Hixie>	yeah i love that the only browser to actually _ship_ <video> is the only one not mentioned
19:49	<othermaciej>	the point being, they wanted to do it, and a few months later we actually did it
19:49	<Hixie>	exactly
19:49	<Hixie>	http://www.linuxworld.com/community/?q=node/1768 is also funny
19:49	<andersca>	lol
19:49	<Hixie>	because actually, what the commenter doesn't realise is that if what he suggests were to happen, it would have the opposite effect
19:50	<tomg>	heh
20:06	<hsivonen>	clearly, one-way IO streams are the wrong abstraction for feeding an HTML parser from HTTP
20:07	<hsivonen>	the right interface would reuse the HTTP lib cache backing store for rewinding
20:07	<othermaciej>	that assumes the cache is available as a usable memory buffer during parsing
20:07	<othermaciej>	likely untrue
20:07	<othermaciej>	(unless the item is already in cache)
20:08	<othermaciej>	(or came in as one network read)
20:08	virtuelv	refrains from commenting on <video>
20:10	<hsivonen>	othermaciej: it seems to me that the HTTP lib should create a memory cache object for uncacheable resources in case of rewinding
20:11	<othermaciej>	hsivonen: what I mean is, if it's a resizable buffer, you can't count on it remaining good while the network thread may be loading more data
20:11	<othermaciej>	though you could count on the individual chunks it gives you I suppose, depending on design of the library
20:12	<virtuelv>	or wait, I'll comment, but emphasizing that this is my highly personal opinion, and not reflecting anybody else's: Shipping video without support for a freely implementable codec is like not shipping <video> at all
20:12	<hsivonen>	oh yeah, the interface can't be a simple memory pointer
20:12	<othermaciej>	virtuelv: I'm sure at least one of the many codecs QuickTime handles is freely implementable
20:13	<othermaciej>	and I know for sure that you can download an Ogg codec, and that is believed by many to be freely implementable
20:13	<hsivonen>	I guess I should test if Safari 3.1 got the codecs parameter right
20:13	<othermaciej>	I'm not sure it did
20:13	<othermaciej>	test cases and/or bug reports welcome
20:14	<othermaciej>	don't worry about it poisoning the well, as you can see we are on a faster release cycle these days
20:14	<virtuelv>	othermaciej: yes, but unless codecs are automatically located and automatically installed, next to no one is going to install it, hence rendering it unusable
20:14	<virtuelv>	(the freely implementable quicktime codecs are likely aged and inefficient, no?)
20:14	<hsivonen>	actually, I'm worried that the codecs RFC may have too big a mismatch with QuickTime/GStreamer/DirectShow codec identities
20:15	<gsnedders>	that's hardly a new issue. Just look at charsets
20:15	<othermaciej>	of the codecs QuickTime supports, I am pretty sure at least H.261 and MPEG-1 are freely implementable (in the sense that all patents must be expired), and probably H.263 as well
20:15	<othermaciej>	I don't know if it is that bad a mismatch
20:15	<gsnedders>	othermaciej: H.263 is far too recent
20:16	<gsnedders>	othermaciej: MPEG-1 is too recent, but probably not by the time we get to REC
20:16	<virtuelv>	then again. this is what is wrong with the entire patent system
20:16	<hsivonen>	does any browser support any flavor of ISCII?
20:17	<othermaciej>	virtuelv: for now we are only exporting the platform capabilities, but we are trying to help the work towards a free baseline codec
20:18	<othermaciej>	gsnedders: I think I was a little mixed up, I believe the state with H.263 is that in practice none of the possibly relevant patent holders appear to ever assert their patents against it
20:18	<gsnedders>	othermaciej: what platform capabilities on Windows? DirectShow? Or do you require QT?
20:18	<othermaciej>	not sure if that counts as "freely implementable"
20:18	<othermaciej>	we require QT on Windows
20:18	<othermaciej>	the Gtk port uses GStreamer
20:18	<othermaciej>	the Qt port uses Phonon
20:19	<othermaciej>	I wouldn't rule out supporting DirectShow on Windows someday
20:19	<gsnedders>	othermaciej: I doubt whether that's good enough for the people who need to take the risk
20:19	<othermaciej>	or possibly having some built-in codecs
20:20	<virtuelv>	othermaciej: it is the use of a non-free format that irks me
20:20	<gsnedders>	I'm not sure how I feel about built-in codecs. On the face of it, I'm against it. Why re-implement something that the platform provides?
20:20	<virtuelv>	many linux distros do not install stuff that has known patent issues
20:21	<hsivonen>	how recent is Phonon? I don't recall seeing it on Ubuntu package lists, so I was unaware of the whole framework
20:21	<gsnedders>	hsivonen: Part of KDE4
20:21	<gsnedders>	hsivonen: so pretty
20:22	<othermaciej>	gsnedders: I mean that if there is a common baseline then maybe we'd provide it for platforms that do not have a platform version, should there be any
20:22	<gsnedders>	othermaciej: hmm, I'd lean towards just installing it at a platform level with the browser
20:22	<hsivonen>	gsnedders: thanks. that explains it
20:32	<Hixie>	my whiteboard is full of little table diagrams
20:40	<Hixie>	http://junkyard.damowmow.com/306
20:44	<andersca>	nice
20:59	<hsivonen>	Philip`: did you develop a list of good chardet test URIs as a side effect of your chardet testing?
21:03	<svl>	One of my friends just brought up a 'problem' with dialog: how to use if the conversation has "branches"? (e.g. choose-your-own adventure, marking up a script from a computer game, ...)
21:03	<svl>	Personally I don't think anything is a good fit for that... but anyone care to differ?
21:03	<Hixie>	<dialog> is just for the dialog part
21:03	<Hixie>	the prose is still in <p>s
21:04	<Hixie>	so you'd have <p>...</p><p>...</p><dialog>...</dialog><p> Do you want to <a href=...>...</a> or <a href=...>...</a>?</p>
21:05	<Philip`>	hsivonen: I'm not quite sure which chardet testing you mean
21:05	<svl>	Ah! So start a new dialog for each branch.
21:06	<Philip`>	Oh, that thing with buffer lengths
21:07	<Philip`>	hsivonen: I'm not sure what a "good chardet test URI" would be, so I don't already have a list of them
21:08	<hsivonen>	Philip`: ok. A good chardet test URI would be a real-world page that lacks an external encoding declaration, a BOM and a charset meta within the first 512 bytes
21:10	<hsivonen>	on the other hand, Windows-1252 pages with no declarations are needed for testing against the detector accidentally degrading the result compared to simple default
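
hsivonen's definition of a good chardet test page (no external encoding declaration, no BOM, no charset meta within the first 512 bytes) can be approximated with a filter like the one below. This is a hedged Python sketch: the regular expression is a loose approximation rather than the spec's meta prescan, and the function name and parameters are hypothetical.

```python
import re

# Loose match for a charset declaration inside a <meta> tag (not the spec prescan).
META_CHARSET = re.compile(rb"<meta[^>]*charset\s*=", re.IGNORECASE)
BOMS = (b"\xef\xbb\xbf", b"\xff\xfe", b"\xfe\xff")  # UTF-8, UTF-16LE, UTF-16BE


def is_chardet_test_candidate(body: bytes, content_type=None) -> bool:
    """True if nothing cheap declares the encoding, so a detector must guess."""
    if content_type and "charset=" in content_type.lower():
        return False                      # external (HTTP) declaration
    if body.startswith(BOMS):
        return False                      # byte order mark
    return META_CHARSET.search(body[:512]) is None


late_meta = b"<!-- " + b"x" * 600 + b' --><meta charset="iso-8859-1">'
early_meta = b'<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'

candidate = is_chardet_test_candidate(late_meta, "text/html")
not_candidate = is_chardet_test_candidate(early_meta, "text/html")
```

As the following line of the log notes, undeclared Windows-1252 pages matter as well, to check that a detector does not do worse than the plain default.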
21:13	hsivonen	realizes that Philip` already published http://philip.html5.org/data/charsets.html
21:13	<andersca>	Hixie: got another q for you
21:13	<Hixie>	shoot
21:13	<andersca>	Hixie: "If there is already an application cache identified by this manifest URI, and that application cache contains a resource with the URI of the manifest"
21:13	<andersca>	I should only look at the newest cache in the group there, right
21:14	<Philip`>	hsivonen: Ah, okay - I could fairly easily make a list of pages with no HTTP Content-Type and no meta in 512 bytes
21:14	<Hixie>	the newest "idle" or "complete" or whatever-i-called-it cache, right
21:14	<Hixie>	andersca: i should make that clearer, can you mail about that?
21:14	<Philip`>	hsivonen: (I don't believe I found a single page with a BOM, so that's not especially relevant)
21:14	<andersca>	Hixie: I sure can
21:14	<Hixie>	andersca: thanks
21:15	Hixie	is so deep within the table algorithm it hurts
21:15	<Hixie>	right now i'm changing my boneheaded decision to make the table algorithm be one-based to being zero-based
21:15	<Philip`>	hsivonen: (or my BOM-detector was buggy)
21:16	<hsivonen>	Philip`: that kind of list would be useful
21:21	<Hixie>	i need to change y_current to y_next, someone remind me of that in about 25 minutes
21:23	<andersca>	Hixie: hmm, cache status is per group,
21:23	<Philip`>	hsivonen: I have a list of 42K URIs with neither HTTP content-type nor sniffable-in-512-bytes meta
21:23	<Hixie>	right
21:23	<Philip`>	hsivonen: (i.e. about a third of all the pages)
21:23	<Hixie>	but while you're running the update thingy, one of the caches isn't ready
21:23	<Hixie>	right?
21:23	<Hixie>	something like that
21:23	<Hixie>	i forget the terminology i made up
21:23	<andersca>	this is not the update thingy, this is the select thingy
21:23	<andersca>	:)
21:23	<Hixie>	right
21:23	<Hixie>	but they can happen at the same time
21:23	<Hixie>	so the select thingy needs to be aware of the update thingy
21:24	<hsivonen>	Philip`: great. do you have it at an URI?
21:24	<Philip`>	hsivonen: I have a file:// URI for it
21:24	<andersca>	Hixie: ah
21:24	<andersca>	Let cache be the most recently updated application cache identified by manifest URI (that is, the newest version found in cache group).
21:24	<Hixie>	right
21:25	<andersca>	so it doesn't need to be the newest cache, it needs to be the most recently updated cache
21:25	<Hixie>	isn't that the same thing?
21:25	<hsivonen>	Philip`: I think I can't dereference file:// properly. but then I don't need the whole list, since I'm not patient enough to inspect so many pages manually
21:26	<Philip`>	hsivonen: Also http://philip.html5.org/data/pages-without-obvious-charset.txt
21:26	<Philip`>	Uh, and s/&amp;/&/g before using it
21:26	<hsivonen>	Philip`: thank you
21:26	<Philip`>	(Actually, I might as well make that change...)
21:28	Philip`	wonders why sed doesn't work
21:29	<Philip`>	Anyway, fixed ampersands now
21:30	<Philip`>	http://020epos.de/ - he's standing on a scarily enormous phone
21:36	jgraham	cheers zero based table algorithm
21:36	<Hixie>	yeah i dunno why i did it one-based
21:36	<Hixie>	but let me tell you, there's nothing quite like offsetting an entire algorithm by one
21:36	<hsivonen>	surprisingly many frontpages are ascii-only meta refreshes
21:38	<jgraham>	I blame Pascal (that is one based by default, right?)
21:38	<Hixie>	not as far as i recall
21:38	<Hixie>	oh, their old-style strings were 1-based (in that index 0 was the length byte), if that's what you mean
21:39	jgraham	has never programmed Pascal so doesn't actually know, but knows that Hixie has
21:39	<Hixie>	yeah, i grew up on pascal
21:39	<Hixie>	various kinds thereof
21:40	<Philip`>	$[ = 1;
21:40	<Hixie>	dude
21:40	<Hixie>	that's deprecated
21:40	<Philip`>	perlvar merely says it's "highly discouraged"
21:41	<Hixie>	that's what being deprecated means
21:41	<Philip`>	(unlike e.g. $* which is explicitly deprecated)
21:41	<hsivonen>	Pascal's indexing is harmful
21:41	<Hixie>	i really don't recall ever having any problem with indexing in pascal, but maybe the compilers i used had "fixed" that feature
21:41	<Hixie>	i don't recall using 1-based arrays
21:42	<Hixie>	and strings were 0-based, except for the anomaly of the zeroth byte of a string being its length instead of the first character
21:42	<Hixie>	(and more modern string types in pascal are all zero-based)
21:43	<hsivonen>	what's with all these Chinese government sites having a meta refresh instead of a proper root page?
21:43	<Philip`>	Option Base 1;
21:43	<Philip`>	Uh, without the semicolon
21:43	<jgraham>	Of course Fortran will let you set array indices on a case-by-case basis
21:43	<Hixie>	never used basic
21:43	<Hixie>	i'm glad to say
21:44	<Hixie>	(well, except for a couple of things, but i never used "option base" in any basic i ever used)
21:45	<hsivonen>	haha. not even meta refresh:
21:45	<hsivonen>	UrlTable[0] = "index_cn.html";
21:45	<hsivonen>	location.href=UrlTable[Math.round(Math.random()*(UrlTable.length-1))];
21:45	<jgraham>	real a(-134, 126)
21:45	<Hixie>	jgraham: yeah, pascal is the same, you can set arrays to be whatever indices you want
21:45	<Philip`>	hsivonen: Uh, isn't that going to go to undefined half the time?
21:45	<Hixie>	jgraham: on another note, i just regenned the spec, if you have a moment it'd be great to see if you can find bugs in the new algorithm (i only changed the base)
21:45	<gsnedders>	Hixie: weren't myself, you, and cwilso discussing Pascal a while back, when cwilso mentioned arrays could be any based?
21:46	<hsivonen>	Philip`: so it seems
21:46	<jgraham>	Hixie you wanted to be reminded now "i need to change y_current to y_next"
21:46	<Hixie>	yeah i did that
21:46	<gsnedders>	Hixie: sorry, I'm a bit behind reading
21:46	<hsivonen>	who needs 1-based indices when you can select from one slot at random?
21:46	<Hixie>	gsnedders: yeah, "array[2..92] of byte;" or some such
21:46	<gsnedders>	fun :\
21:46	<Philip`>	For interoperability, HTML5 needs to define the random number generator seed that must be initialised at every page load
21:47	gsnedders	ponders
21:47	<jgraham>	Apparently some well known scientific codes fake 1-based arrays in C just because the authors were more used to Fortran
21:47	<gsnedders>	It seems I'll likely be going to Cambridge twice this year.
21:48	<hsivonen>	hah. I found a chinese govt site that has over 512 bytes of style before meta. loading it in Firefox tells me it is an attack site
21:48	hsivonen	is assuming that .gov.cn is government
21:49	<Hixie>	if 0<=z<5, z_max = 4, what would you call the constant 5? z_what?
21:49	<Philip`>	Mathematicians can't even decide whether N (the set of natural numbers) starts at 0 or 1
21:49	<Hixie>	or would you say z_max was 5?
21:49	<gsnedders>	Philip`: 1.
21:49	<gsnedders>	:P
21:49	<Philip`>	Hixie: Is z an integer?
21:49	<hsivonen>	I conclude that heuristics for Simplified Chinese work
21:49	<hsivonen>	onto Japanese
21:50	<gsnedders>	Hixie: z_max has any relevance elsewhere?
21:50	<Hixie>	Philip`: yes
21:50	<Philip`>	Hixie: I'd say z_max was the maximum z, i.e. 4, not 5, and I'd say anything else was crazy
21:50	<Hixie>	Philip`: ok, so what is 5?
21:50	<Philip`>	z_max+1
21:50	<jgraham>	z_max + 1
21:51	<Hixie>	well, if we were talking about x instead of z, i'd say it was "width"
21:51	<jgraham>	Range?
21:52	<Philip`>	Just use numbers and letters, and don't bother with meaningful English words
21:52	<jgraham>	(but if you do that you need to define z_range = z_max - z_min +1)
21:52	<Hixie>	screw it, i'll use x_width and y_height
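A minimal sketch of the naming question above, assuming z is an integer with 0 <= z < 5. The names z_min and z_width are illustrative only; the log settles on x_width and y_height for the table algorithm.

```python
# Illustrative names only, following the discussion above.
z_min = 0
z_max = 4                      # the largest valid value of z
z_width = z_max - z_min + 1    # the count of valid values, i.e. "z_max + 1" when z_min is 0

# A half-open range stops just before z_min + z_width, so it ends at z_max.
valid_z = list(range(z_min, z_min + z_width))
```

With a zero-based index, the "width" constant doubles as the exclusive upper bound, which is why it needs a name separate from z_max.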
21:54	<Philip`>	gsnedders: Hopefully at least one of those occasions won't be too rainy
21:54	<gsnedders>	Philip`: it tends to be all right in July, at least
21:56	<andersca>	Hixie: another q about the selection process
21:56	<andersca>	"Otherwise, there is no matching application cache: create a new application cache identified by this manifest URI, store the resource in that cache, categorised as an implicit entry, and then invoke the application cache update process."
21:56	<jgraham>	gsnedders: We're having a conference in July and the conference "gift" will be an umbrella...
21:57	<gsnedders>	jgraham: heh.
21:57	<andersca>	Hixie: "store the resource" means store the headers + contents?
21:57	<gsnedders>	jgraham: I mean, I'm heading south (within the northern hemisphere)! The weather should be better, no?
21:57	<Hixie>	andersca: it means the resource, including any metadata, forks, alternate streams, anything that comes with it
21:58	<andersca>	Hixie: OK - so
21:58	<andersca>	Hixie: since this is during the selection process, the resource has most likely not finished loading yet
21:58	<jgraham>	gsnedders: Well it's supposed to be one of the dries places in the country, but I think the west coast of Scotland is too
21:58	<jgraham>	driest
21:58	<gsnedders>	jgraham: s/west/east
21:59	<jgraham>	er, yeah
21:59	<Hixie>	andersca: indeed
22:00	<andersca>	Hixie: and that is probably not a problem since the update process is started right after adding it
22:01	<Hixie>	andersca: i'm not sure why it would be a problem at all really, other than for implementation reasons
22:02	<Hixie>	as in, i could see architectural problems with systems not designed to support this being an issue
22:02	<gsnedders>	jgraham: yeah, it's as dry here as it is in Rome
22:02	<andersca>	Hixie: right
22:02	<Hixie>	but conceptually, it's just a matter of deciding where to stream the data to
22:03	<andersca>	what if I load my html file which has a cache manifest and an iframe which references the same html file
22:03	<hsivonen>	ok. I trust the heuristics work well enough. time to deploy
22:03	<Hixie>	andersca: the iframe always just uses the top-level browsing context's cache, and never ends up creating its own cache itself
22:04	<andersca>	Hixie: yeah, but where would the iframe get the html file from? would it depend on whether the cache was finished updating or not?
22:05	<Hixie>	andersca: it would get it from whatever cache its top-level browsing context is associated with. if it's not associated with anything, it would get it from the main cache.
22:05	<Hixie>	(or, failing that, the network)
22:08	<andersca>	Hixie: I mean, let's say I have something like
22:09	<andersca>	a file called test.html which looks like
22:09	<andersca>	<html manifest=...><iframe src="test.html"></iframe></html>
22:10	<Hixie>	is this the first time you visit it, or the second time?
22:10	<andersca>	the first time
22:10	<Hixie>	i.e. do you already have a cache for it?
22:10	<Hixie>	ok
22:10	<andersca>	I do not
22:10	<hsivonen>	parsetree.validator.nu now uses both chardet and the ICU detector
22:10	<hsivonen>	the validation side doesn't use heuristics
22:11	<Hixie>	andersca: ok, so, regardless of what's going on with the manifest, the result is an infinitely nested iframe of the same document. however:
22:11	<andersca>	yeah, I agree it's a bad test case :)
22:11	<Hixie>	andersca: until such time as the manifest has been downloaded and processed, the iframes will be populated from the main cache
22:11	<Hixie>	andersca: then, once the manifest is processed, the iframe will start getting populated from the manifest cache
22:12	<Hixie>	4.6.5.1. Changes to the networking model is the key section for this
22:13	<Hixie>	(note in particular step 5)
22:13	<Hixie>	(though it doesn't affect this case)
22:13	<andersca>	step 3 is the relevant step here, right
22:14	<Hixie>	the 3rd paragraph of step 24 of the application cache update process is the key here
22:15	<Hixie>	when that happens, suddenly the new network model takes effect
22:16	Philip`	knows nothing about this algorithm, but the mere existence of a 3rd paragraph of a 24th step makes it sound unreasonably complex :-p
22:16	<andersca>	ah
22:16	<andersca>	Philip`: not necessarily, it can also be detailed
22:16	<hsivonen>	why is it that I see more silly refresh and window.location front pages on Chinese and Korean sites than on German and Polish sites?
22:16	<Hixie>	Philip`: it's pretty verbose
22:16	<Hixie>	Philip`: and believe me, mucking in the way that browsers load files is always gonna be complex, unless it's underspecified :-)
22:17	<Philip`>	hsivonen: I would expect you'd see more <marquee>s on them too
22:17	<Philip`>	which presumably is a cultural issue
22:18	<Philip`>	or I'm just stereotyping :-)
22:18	<hsivonen>	Philip`: sure, marquee is cultural, but is failing to serve proper content at site root cultural?
22:21	<Philip`>	Aha, fortunately I have evidence for my opinion - 30% of the .cn sites in dmoz.org use <marquee>, vs 3% of all sites
22:22	<Philip`>	(.kr and .jp are around 3%)
22:23	<hsivonen>	FrontPage sure seems to be popular on Taiwan
22:23	<Philip`>	(and .tw is about 3% too)
22:24	Philip`	wonders if there's a good way of automatically finding significant correlations like this
22:27	<Hixie>	make a list of all the characteristics, get the characteristics for all the demographics you want tested, and then flag any values that are more than one stddev outside the mean?
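A rough sketch of the flagging idea above for a single characteristic, assuming the population standard deviation and a hypothetical helper name flag_outliers. The percentages are the approximate <marquee> figures quoted earlier in the log, not exact data.

```python
from statistics import mean, pstdev

def flag_outliers(rates, threshold=1.0):
    """Return the entries lying more than `threshold` stddevs from the mean."""
    values = list(rates.values())
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return {}  # every group has the same value, so nothing stands out
    return {key: value for key, value in rates.items()
            if abs(value - mu) > threshold * sigma}

# Approximate <marquee> usage by TLD, in percent, as quoted in the log.
marquee_rates = {".cn": 30.0, ".kr": 3.0, ".jp": 3.0, ".tw": 3.0}
flagged = flag_outliers(marquee_rates)
```

Running the same function once per characteristic would give the full list of candidate correlations; with only a handful of groups a single large value also drags the mean and stddev toward itself, so the one-stddev cutoff is only a starting point.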
22:28	<hsivonen>	hmm. the detection fails big time in .ru
22:28	<hsivonen>	KOI8-R gets misdetected as Chinese UTF-16
22:28	<Philip`>	hsivonen: Is that because there's no Cyrillic detector in jchardet?
22:29	<hsivonen>	Philip`: probably
22:29	<hsivonen>	I think I should reject UTF-16 in the detectors
22:29	<hsivonen>	real UTF-16 comes with a BOM anyway
22:29	<hsivonen>	or so I would hope
22:31	<Philip`>	Real UTF-16 tends to stay away from web
22:31	<Philip`>	Actually, that's a completely unfounded statement
22:32	<Philip`>	The web tends to stay away from UTF-16
22:39	<bzed>	lxml 2.0 seems to break the html5lib tests: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=471638
22:47	<jgraham>	bzed: known issue that we will fix soonish (it's the second thing on my priority list)
22:48	<jgraham>	Do you need a fix for some release, in which case I will move it to first?
22:51	<bzed>	jgraham: as long as it is fixed before Debian's Lenny is frozen, I'll be fine. That's not too far away, though, but I guess there're at least 4 weeks left to fix it
22:52	<Hixie>	jgraham: is the bug in lxml 2 or html5lib?
22:54	<jgraham>	Hixie: It's in html5lib
22:55	<jgraham>	We were using a kind of hack to make lxml work (using a technically invalid tag name to represent the notional root node), which worked OK in lxml 1 but not in 2 which actually checks
22:55	<Hixie>	ah
22:57	<jgraham>	(it's actually mostly fixed in svn but there are a few issues with treewalkers and so on that need to be ironed out)
23:00	<bzed>	jgraham: thanks for the info, I'll ping you again in a few weeks :)
23:00	<jgraham>	bzed: Yeah do.
23:28	<Hixie>	jgraham: ok, i think i got all the kinks out of the algorithm
23:29	<Hixie>	jgraham: if you have a moment, i'd love to make sure it's not broken before checking it in :-)
23:29	<Hixie>	http://www.whatwg.org/specs/web-apps/current-work/#processing
23:35	<jgraham>	Hixie: I'll have a look but I don't have a good way to check for mistakes
23:36	<Hixie>	i am relying on your brain :-)
23:36	<jgraham>	As I said...
23:37	<Hixie>	i've checked it in, anyway
23:37	<Hixie>	i figure if there are issues, we'll find them soon enough
23:37	<Hixie>	oh crap i forgot to update the next section
23:39	<jgraham>	A row group is a set of rows anchored at a slot (0, groupy) with a particular height such that the row group covers all the slots with coordinates (x, y) where 0 ≤ x < x_width-1
23:40	<jgraham>	should that be 0 ≤ x < x_width
23:40	<jgraham>	?
23:40	<jgraham>	(since x = x_width-1 is the last column)
23:40	<Hixie>	yes
23:40	<Hixie>	good catch
23:40	<Hixie>	will fix
23:40	<jgraham>	Similarly with column groups
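A small sketch of why the bound matters, assuming a row group anchored at (0, group_y) covers rows group_y up to but not including group_y + height. The quoted spec text only gives the x condition, so the y range and the function name row_group_slots are assumptions.

```python
def row_group_slots(group_y, height, x_width):
    """List the (x, y) slots covered by a row group anchored at (0, group_y)."""
    return [(x, y)
            for y in range(group_y, group_y + height)  # assumed y range
            for x in range(x_width)]                   # 0 <= x < x_width

slots = row_group_slots(group_y=1, height=2, x_width=3)
last_column = max(x for x, _ in slots)
```

With the bound written as x < x_width - 1, the slots in the last column (x = x_width - 1) would be left out of every row group.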
23:42	<jgraham>	Can a cell be in 0 row groups now? I thought that there was always at least 1 implied row group before
23:43	<jruderman>	does Opera have an email address for reporting security issues?
23:43	<Hixie>	it's always been able to be in zero groups
23:43	<Hixie>	xhtml: <table><tr><td>cell with no groups</td></tr></table>
23:44	<Hixie>	ok, fixed the problems above, and the section after (assigning), too. (reload if you care to check)
23:44	<jgraham>	Ah, XHTML
23:45	<jgraham>	jruderman: I think you use the form at http://www.opera.com/support/bugs/
23:45	<jruderman>	jgraham: you can't attach files there, so it's kinda useless for reporting security holes
23:46	<jgraham>	jruderman: "When your report is submitted, you will get an e-mail address to which you can send updates and attachments relevant to your report. "
23:46	<jruderman>	oh
23:46	<jruderman>	thanks