00:00
<takkaria>
oh, the first one seems to do something now
00:18
<Philip`>
takkaria: You have to wait for it to load a zillion script files
00:18
<Philip`>
(Presumably it'd be possible to pack them all into a single .js file if you cared about performance)
00:19
Philip`
hopes he is actually testing it correctly, and didn't forget to change some of the prefixes or something
00:40
<LeifHS>
annevk5: Yes, <embed> is like <source> - kind of. However, <source> is part of the <video> element - it is an extension of <video>. What I meant was that WebKit is treating <embed> as if it is part of <object> (the same way that <source> is part of <video>).
00:40
karlcow
always wonders why talented people become suddenly blind when they are passionate about their tech
00:50
<LeifHS>
zcorpan: Is it possible for you to create an archetypal example of what you mean?
08:31
Hixie
now has a basic proof of concept of his new <datagrid> API design
08:32
<Hixie>
now i just have to check how plausible it is
08:32
<Hixie>
from the authoring side...
08:37
<Hixie>
this is one of the first times that i've designed something for which i think a structure a bit like a B-tree would actually be a pretty good fit
08:45
<MikeSmith>
wow
08:45
<MikeSmith>
B-tree
08:45
<MikeSmith>
that's a blast from the past
08:46
<MikeSmith>
Hixie: seems like datagrid is ultimately going to sink or swim based on how much implementor commitment there is
08:47
<MikeSmith>
and given that there's not been much implementor commitment forthcoming so far, I wonder where that's going to leave it
08:48
<MikeSmith>
if we really want to get to LC by this Fall
08:48
<Hixie>
my impression is that browser vendors seem in agreement that it would be a useful feature, but that it isn't a priority
08:51
<MikeSmith>
Hixie: I think it's something more than just a useful feature
08:52
<Hixie>
oh?
08:52
<MikeSmith>
to me, it's something that would appeal quite a bit to Web developers
08:52
<Hixie>
that's what i mean by "useful feature" :-)
08:53
<Hixie>
note that by LC we don't have to have commitments to implement, only agreement that the features should be in the language
08:53
<MikeSmith>
Hixie: it's the "isn't a priority" to browser developers that's the big stumbling block
08:53
<Hixie>
in fact it's only when entering CR that we have to list features that might be at risk
08:53
<Hixie>
and even then we have until REC to see them implemented
08:54
<Hixie>
so lack of implementation commitements is not a big deal so long as implementors don't disagree that it would be good to implement eventually
08:54
<Hixie>
commitments
08:54
<MikeSmith>
I guess
08:55
<MikeSmith>
I would personally rather not see us take a particular feature into CR without a clear commitment from multiple browser vendors to implement it
08:55
<Hixie>
sure
08:56
<MikeSmith>
maybe we need to light a fire under some asses as far as datagrid
08:56
<hsivonen>
myvidoop.com no longer shows an EV cert. Am I being MITMed?
08:56
<MikeSmith>
e.g., threaten "this is going to be removed from HTML5 unless we get a clear indication of vendor support"
08:56
<MikeSmith>
hsivonen: MITMed
08:57
<MikeSmith>
?
08:57
<hsivonen>
MikeSmith: Man In The Middle
08:57
Hixie
watches the EV cert security model fall apart
08:58
<zcorpan>
hsivonen: do what normal people do and discard any dialogs and don't notice the lack of dialogs
08:58
<MikeSmith>
EV cert, despite whatever faults it has, is approximately one gazillion times better than non-EV certs
08:59
<Hixie>
for cert vendors, sure
08:59
<hsivonen>
MikeSmith: how so? It's causing me distress now. Probably for no good reason.
08:59
<hsivonen>
If I'm being MITMed, how do I email vidoop, when I should assume that by now the MX record in my DNS cache is poisoned as well?
09:00
<Hixie>
certs in general can answer the question "is this who i think it is", which is never a question users ask. They ask (implicitly) the question "am I being attacked", which is not possible to answer using any kind of SSL cert that I know of, EV or otherwise.
09:01
<Hixie>
(and EV certs can answer the first question with $450 more confidence than non-EV certs, that's about it.)
09:01
<MikeSmith>
Hixie: EV certs have documented rules for identity
09:01
<Hixie>
right. "$450 more confidence".
09:01
<MikeSmith>
when you pay (whatever dollar amount) for a non-EV cert, you are a sucker
09:02
<MikeSmith>
because you are paying for nothing
09:02
<Hixie>
you are paying for someone to sign it
09:02
<MikeSmith>
CAs are bound to do zero vetting otherwise
09:02
<Philip`>
You're paying for something that prevents passive network attackers from reading all the data send to/from your web site, which seems quite useful
09:03
<Philip`>
*sent
09:05
<Hixie>
Philip`: you can get that with a self-signed cert
09:05
<Philip`>
Hixie: You can't since it won't work in Firefox 3
09:05
<MikeSmith>
the EV-cert work is similar to a lot of things in that it's really easy to make jackass comments about it from the outside after the fact, throwing rocks, without any understanding of the difficulties involved in ever having tried to do something like it at all
09:05
<Hixie>
Philip`: that's a bug (it should encrypt but not show any ui about it)
09:05
<Hixie>
imho, anyway
09:06
<Hixie>
MikeSmith: as someone who is involved in the security work of three browsers, i feel pretty qualified in the subject. I'm fully aware that there has not been anything better proposed. That doesn't mean it's good.
09:06
<hsivonen>
I emailed vidoop. I wonder if they'll answer and with what answer.
09:07
<hsivonen>
Hixie: hey, that's the argument against the Public Suffix List
09:07
<Hixie>
i'm not saying we shouldn't have EVs
09:07
<Hixie>
and the public suffix list sucks too, yes
09:07
<Hixie>
:-)
09:07
zcorpan
thinks they'll answer "What is MITM?"
09:09
<Hixie>
(and before someone thinks i'm only picking on technologies i'm not working on, html sucks in many ways too. It, like the other two, just happens to be the best we have in the real world.)
09:09
<hsivonen>
zcorpan: their core business is making people feel more secure, so one would hope they'd know what MITM is
09:10
<MikeSmith>
Hixie: yeah, the point is that there was not anything better proposed, and that coming up with anything at all involved burning up a lot of time negotiating things with the CAs, and not all browser projects cared to actually show up to bother to take the time to actually involve themselves actively in the discussions
09:10
<Hixie>
oh?
09:10
<Hixie>
who wasn't involved?
09:11
<MikeSmith>
Opera was involved, Microsoft was involved, George Staikos was involved
09:14
<Hixie>
interesting
09:14
<Hixie>
anyway, bed time
09:15
<Hixie>
nn
09:23
<MikeSmith>
hsivonen: about http://bugzilla.validator.nu/show_bug.cgi?id=437
09:23
<MikeSmith>
I'm trying to figure out which part of the code it is that adds the hyperlinking back to the spec
09:24
<hsivonen>
MikeSmith: nu.validator.spec.html5.Html5SpecBuilder
09:24
MikeSmith
looks now
09:28
<MikeSmith>
hsivonen: also, there seems to be some brokenness in the current linking
09:29
<MikeSmith>
test, with, e.g., <!DOCTYPE html><title>foo</title><img>
09:29
<MikeSmith>
links to <img> doc end up as:
09:29
<MikeSmith>
http://www.whatwg.org/specs/web-apps/current-work/#null
09:29
<hsivonen>
MikeSmith: that code is *very* brittle :-(
09:30
<MikeSmith>
hsivonen: i see
09:31
<hsivonen>
I really should get to fixing outstanding V.nu bugs, but I'm still blocked on eliminating a key memory management issue from the C++ version of the parser.
09:32
<MikeSmith>
hsivonen: yeah, understood
09:33
<MikeSmith>
I'm happy to help with, at the very least, dealing with some of the low-hanging fruit
09:33
<MikeSmith>
as far as v.nu bugs go
09:33
zcorpan
wonders how to ensure that text doesn't use non-XML 1.0 Char characters in a text-based CMS
09:33
<hsivonen>
MikeSmith: I very much appreciate your help. The recent fixes have been great.
09:34
<hsivonen>
zcorpan: run everything through a magic regexp?
09:35
<zcorpan>
i guess
09:37
<Philip`>
zcorpan: Remove all non-ASCII characters, and all below 0x20
09:38
<Philip`>
Alternatively: Don't try to stop people inputting non-XML 1.0 Char characters
09:38
<jgraham>
zcorpan: html5lib has a big regexp somewhere
09:38
<Philip`>
and just make sure you deal with the issue when outputting to XML
09:38
<zcorpan>
Philip`: i need non-ascii characters
09:38
<jgraham>
zcorpan: Why are you trying to produce AML with a text-based CMS?
09:38
<jgraham>
*XML
09:38
<Philip`>
(because otherwise you'll forget to validate some of the input, and get invalid text in your database, and then it'll be a pain to get rid of)
09:39
<zcorpan>
jgraham: i want to have an Atom feed and the CMS i have is text-based
09:39
<jgraham>
s/somewhere/in ihatexml.py/
09:39
<zcorpan>
lol
09:39
<Philip`>
Escape on output, it's the only way to be sure :-)
09:40
<jgraham>
zcorpan: I feel there must be a "now I have two problems" type comment in here somewhere
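[The "magic regexp" approach suggested above might look like this in Python; the character ranges come from the Char production in XML 1.0 §2.2, and the function name is made up for illustration:]

```python
import re

# Anything outside the XML 1.0 "Char" production (spec section 2.2):
# #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_NOT_XML_CHAR = re.compile(
    '[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]'
)

def strip_non_xml_chars(text):
    """Drop characters that may not appear anywhere in an XML 1.0 document."""
    return _NOT_XML_CHAR.sub('', text)
```

[As Philip` notes, escaping/filtering on output is still needed regardless, since markup-significant characters are a separate problem from invalid characters.]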
09:41
<MikeSmith>
heh
10:19
<MikeSmith>
http://www.whatwg.org/specs/web-apps/current-work/multipage/editing.html#spelling-and-grammar-checking
10:20
<MikeSmith>
"The spellcheck attribute is an enumerated attribute whose keywords are the empty string, true and false."
10:21
<MikeSmith>
so <p spellcheck>foo</p> should be invalid, right?
10:21
<Philip`>
That looks like an empty string to me
10:21
<MikeSmith>
but <p spellcheck="">foo</p> is valid
10:21
<MikeSmith>
Philip`: <p spellcheck> is the same as empty string?
10:21
<Philip`>
The tokeniser doesn't make any distinction between those two cases, unless I'm horribly mistaken
10:21
<MikeSmith>
OK
10:21
<Philip`>
and I'm pretty sure I'm not horribly mistaken
10:21
<Philip`>
though I could be wrong about that
10:21
<jfkthame>
Philip`: are you the one behind http://fonts.philip.html5.org/ by any chance?
10:22
<Philip`>
jfkthame: I am
10:22
<MikeSmith>
hsivonen: <p spellcheck> vs. <p spellcheck=""> ?
10:22
<jfkthame>
Philip`: cool page! however, there's a problem with at least one of the fonts
10:22
<jfkthame>
please see https://bugzilla.mozilla.org/show_bug.cgi?id=487549
10:23
<Philip`>
jfkthame: "Not authorized"
10:23
<jfkthame>
aaarrrgghhh, sorry....
10:23
<Philip`>
jfkthame: You could CC me on the bug if you want :-)
10:24
<jfkthame>
bugzilla id?
10:24
<Philip`>
philip.taylor⊙ccau
10:24
<jgraham>
Has Philip` inadvertently found yet another browser security bug
10:24
<Philip`>
If I had to guess wildly, it would be that it crashes on OS X
10:24
<jfkthame>
right - due to an invalid kern table in one of the subsetted fonts
10:25
<jfkthame>
you're cc'd
10:27
<jfkthame>
we could potentially do some more validation, but it's not feasible to exhaustively test fonts before handing them over to the OS or font rendering library, so if that isn't completely robust, bad fonts will always be a risk
10:29
Philip`
wonders why his copy of the font doesn't look anything like a font, and then realises after several minutes that it's been gzipped
10:31
<jfkthame>
looking at the original PixAntiqua.ttf, i see that it has the apple-format kern table, and offhand it looks valid
10:31
<jfkthame>
i guess the subsetting code is at fault, then, when it strips out all the irrelevant kern pairs from the subset font
10:36
<jfkthame>
ah, looks like the problem is Kern.pm (from Font::TTF).... it always packs the kern table header in the old format, without checking the version number to see whether that's correct
10:37
Philip`
was coming to the same conclusion :-)
10:37
<Philip`>
Is kern table version 1 documented somewhere?
10:38
<jfkthame>
http://developer.apple.com/textfonts/TTRefMan/RM06/Chap6kern.html
10:38
jgraham
notes there seems very little point in the bug being marked security sensitive now since most of the information needed to reproduce it seems to be documented here :)
10:38
<Philip`>
I suppose I'd really have to support it properly, to fix up all the glyph IDs, and can't just pass it through :-(
10:39
<Philip`>
but I'm too lazy to do that
10:39
<Philip`>
but I could make it just drop the entire kern table
10:40
<Philip`>
jgraham: It's not a clearly exploitable bug, and there's probably hundreds of other ways you could make font renderers crash because they weren't designed for untrusted input, so I suppose the relative risk is quite low :-)
10:40
<jfkthame>
jgraham: yup - 'fraid so - it's basically just another case of "feed random data to the font system and you'll probably crash something"
10:41
<Philip`>
Doesn't fill me with great confidence in the security of browsers if they're dependent on such things :-/
10:42
<takkaria>
font renderers obviously need to stop trusting input
10:42
<jfkthame>
you mean "the security of operating systems", don't you? the browser is just a way to get a font onto the user's system
10:43
<Philip`>
I mean the security of the process of using my web browser to view a (untrusted) page
10:43
<jfkthame>
takkaria: right - it's no different from jpeg or png decoders or whatever - just that we've been attacking those for longer and so they've had a lot more hardening
10:44
<Philip`>
Were JPEG and PNG decoders as crash-prone as TTF decoders are, when they were first used in browsers?
10:44
Philip`
supposes it's quite possible they were
10:44
<Philip`>
(Has anyone tried making Theora files that crash Firefox?)
10:44
<jfkthame>
ISTR plenty of security alerts in that area over the years
10:47
jfkthame
needs to go for now - Philip`, have fun with the Theora idea!
10:48
<Philip`>
jfkthame: Thanks for pointing me at the bug :-)
10:50
<gsnedders>
Oh, yeah, it's Thursday today.
10:50
gsnedders
forgot
10:55
Philip`
fixes his code to drop the kern table
10:55
<Philip`>
I wonder how many OS X users' browsers my page crashed before that fix
10:58
<jgraham>
gsnedders: You never could get the hang of Thursdays?
10:59
<gsnedders>
jgraham: Nah, it's just it's school holidays so I have no concept of time anymore.
10:59
Philip`
is not on holiday but has no concept of time anyway
11:44
<eighty4>
gsnedders: so no concept of time == holiday?
12:02
<gsnedders>
eighty4: No, if holiday then no concept of time. The inverse does not hold true.
12:03
<eighty4>
:)
12:03
<eighty4>
I'm having 2 days completely alone tomorrow... during that time I'll probably have no concept of time
12:15
<remysharp>
Is this a sensible place to ask html5 questions - in particular about using it for new sites/pages?
12:16
<gsnedders>
remysharp: Yes
12:16
<gsnedders>
remysharp: But I would say that, wouldn't I? :)
12:16
<jgraham>
gsnedders: The right answer was no. We fail on the first clause (Is this a sensible place)
12:16
<gsnedders>
jgraham: True.
12:17
<remysharp>
Hmm, okay - how about senseless questions?
12:17
<jgraham>
remysharp: However it is still a good place to ask HTML 5 questions
12:17
<gsnedders>
jgraham: But giving the right answer would be sensible.
12:17
<jgraham>
Perhaps the best
12:17
<jgraham>
Just note the /topic
12:18
<remysharp>
So, in the <footer> element, if I have a list of elements -
12:18
<remysharp>
As per this example: http://dev.w3.org/html5/spec/Overview.html#the-nav-element
12:18
<remysharp>
shouldn't the list be in a nav element?
12:18
remysharp
hmm - was that the right link...
12:19
<gsnedders>
remysharp: "Not all groups of links on a page need to be in a nav element — only sections that consist of primary navigation blocks are appropriate for the nav element."
12:19
<gsnedders>
remysharp: They aren't primary navigation links
12:19
<remysharp>
gsnedders: ah, right, so only "primary" - great - thanks.
12:20
<remysharp>
So I've been using a few live examples around the web to help me code my html5 page -
12:20
<remysharp>
and I've been looking at the uxlondon.com site -
12:20
<remysharp>
which uses the <div class="section"> method to get around having to use JS to trigger IE to see html
12:20
<remysharp>
so - my question is -
12:20
<jgraham>
(On the other hand it does make some sense that the primary navigation might be partially in the footer so the content model restriction maybe doesn't make sense)
12:21
<remysharp>
is there a real reason why all their 'section's contain 'article's?
12:21
<remysharp>
I wouldn't have always nested an article in a section
12:21
<remysharp>
and the spec description of an article is user-generated content or an article/blog entry, etc.
12:22
<remysharp>
example: http://uxlondon.com/speakers/
12:22
<jgraham>
remysharp: At first glance, no
12:22
<remysharp>
any thoughts on that at all? or perhaps just a design choice they went with
12:23
<gsnedders>
At a further glance, no.
12:23
<jgraham>
remysharp: I think their markup is unnecessarily redundant
12:23
<remysharp>
okay, good - that's what I was thinking too - so at least I'm partially following it.
12:23
<remysharp>
Using the same page as an example (the output)
12:24
<remysharp>
I am marking up a speakers page for my own project:
12:24
<remysharp>
and I've got a list of speakers - which normally I'd put in a <ul>
12:24
<remysharp>
however - I kind of want to put each one in a <section> element with the whole thing nested inside an <article> element
12:25
<remysharp>
would that make sense? (or would you like a quick example of what I mean?)
12:25
<gsnedders>
You're asking about sense again…
12:25
<remysharp>
!! :-D
12:25
<remysharp>
yeah, sorry!
12:25
gsnedders
reads the actual question
12:26
<jgraham>
remysharp: Something like <article><h2>Speakers</h2><section><h3>J. Smith</h3>[...]
12:26
<remysharp>
I suspect a quick mock of the markup I'm suggesting might help
12:26
<remysharp>
yeah - jgraham that looks like what I was thinking
12:26
<remysharp>
so no use of <ul> at all
12:26
<jgraham>
It's not really clear why the outside bit would be <article> rather than <section> but I guess it barely makes any difference anyway
12:26
<remysharp>
but my normal html4 approach would be to use a list element, but with html5, I kinda don't want to
12:26
<gsnedders>
remysharp: I'd go for <article><h2>Speakers</h2><dl><dt>Mr John Smith<dd><p>Mr John Smith is awesome.<p>He even wrote <cite>My Magical Wonderland</cite></dl></article>
12:27
<remysharp>
but isn't Mr John Smith a header within a section?
12:27
<jgraham>
gsnedders: Doesn't work so well if you ant to generate a toc that has the speakers listed
12:27
<jgraham>
*want
12:27
gsnedders
shrugs
12:27
<beowulf>
i'd go with gsnedders
12:28
<beowulf>
in terms of that question :)
12:28
<jgraham>
remysharp: I think, assuming each speaker will have a little description, that using section+headers is fine
12:28
<beowulf>
though i'd probably s/article/section
12:28
<gsnedders>
jgraham: I do have a vague clue about what the outlining algorithm says :P
12:29
<remysharp>
damn - is there a preferred pastebin?
12:30
<gsnedders>
The one that Google takes you to when you click on "I'm feeling lucky!" because it takes the least amount of effort to find.
12:30
<jgraham>
remysharp: Try your markup in http://gsnedders.html5.org/outliner/
12:30
<remysharp>
jgraham: ta
12:30
<gsnedders>
(OMG! I WROTE THAT!)
12:30
<jgraham>
remysharp: (That is not a pastebin)
12:30
<remysharp>
yeah, sure - oh
12:30
<remysharp>
there's not going to be a url, is there?
12:30
<remysharp>
hixie has one I believe - that saves the url...trying that
12:30
<jgraham>
remysharp: Ask gsnedders :)
12:31
<jgraham>
Oh, yeah you could use the LDV
12:31
<gsnedders>
remysharp: Just allowing a textarea?
12:31
<gsnedders>
Yeah, that's on my to-do list.
12:32
<remysharp>
right - there you go:
12:32
<remysharp>
http://tr.im/iv56
12:34
<beowulf>
i'd say the first header is redundant unless it wraps the <p> and that the outer article is a section
12:34
<jgraham>
remysharp: No need for the <header> element unless you plan to make a subheading
12:34
<gsnedders>
The header elements are both needless
12:34
<beowulf>
but i'm easily the dumbest person in the room, just to clarify
12:35
<remysharp>
There would obviously be a header element at the top of the page
12:35
<remysharp>
but you're saying the name shouldn't be in a header
12:35
<jgraham>
remysharp: No, no no :)
12:36
<gsnedders>
I'm saying you gain nothing by having it in a header element
12:36
<jgraham>
<header><h2>Foo</h2></header> === <h2>Foo</h2>
12:36
<gsnedders>
A header element is only of use when you have a heading and subheading and you only want the heading to appear in the TOC
12:36
<remysharp>
sorry, yeah, doesn't gain anything
12:36
<jgraham>
The extra <header> is redundant
12:36
<gsnedders>
<header><h2>Foo</h2><h3>Bar</h3></header> === <h2>Foo</h2> in terms of TOC too
12:37
<remysharp>
gsnedders: why does the h3 text get lost then?
12:37
<remysharp>
(in the TOC)
12:37
<remysharp>
or does it read the highest level heading and use that?
12:37
<gsnedders>
jgraham: Explain!
12:37
<gsnedders>
:P
12:37
<gsnedders>
remysharp: The highest level heading is all that's used
12:37
<remysharp>
cool - that makes sense.
12:38
<remysharp>
so if a TOC was generated from that page - if I omitted the <header> on the names of speakers, their names wouldn't appear in the TOC - is that right?
12:38
<beowulf>
gsnedders: what does <header><h1>Foo</h1><h1>Bar</h1></header> come out as in the TOC?
12:38
<gsnedders>
beowulf: Foo
12:39
<gsnedders>
remysharp: They would. The h3 element would make them.
12:39
<remysharp>
right - got you.
12:39
<remysharp>
so header is only if there's mixed content and you want a specific (or the highest) to be used.
12:39
<remysharp>
that makes sense as to why it's utterly redundant in my example.
12:39
<gsnedders>
beowulf: (It's the first highest order header)
12:40
<gsnedders>
(There are, however, bugs in what the spec currently says.)
12:40
<remysharp>
So, on that same topic, would it be fair to say that in this example, the <header> is redundant:
12:40
<remysharp>
<header><h1>My site</h1><p>Tag line</p></header>
12:41
<gsnedders>
remysharp: No
12:41
<jgraham>
remysharp: Technically, no
12:41
<gsnedders>
(the p element will not be associated with any section, IIRC)
12:42
<jgraham>
It won't have any observable effect on the outline, but semantically it is right
12:42
<gsnedders>
http://www.w3.org/Bugs/Public/show_bug.cgi?id=6750
12:42
<jgraham>
Oh, maybe it does have some observable effect
12:42
<jgraham>
then
12:42
<gsnedders>
jgraham: Not really
12:42
<gsnedders>
jgraham: Unless you have a UI which shows what element is linked to what section, it doesn't.
12:42
<jgraham>
gsnedders: That is an observable effect
12:43
<gsnedders>
Well, not in any current implementation of the algorithm, seeing as both of our implementations just build a TOC.
12:43
<jgraham>
gsnedders: In principle though
12:43
<remysharp>
just backpedalling to the question about using sections within an article instead of a <ul><li> collection - does that look right in essence?
12:44
<jgraham>
remysharp: Yes
12:44
<jgraham>
It is better than <ul> for sure
12:44
<remysharp>
awesome. total head f**k based on getting used to using lists for everything, but feel right.
12:44
<remysharp>
*feels right
12:45
<beowulf>
jgraham: why is it better than a list?
12:46
<jgraham>
beowulf: Even extant AT will allow you to navigate easily by header elements, for example
12:58
<Philip`>
If you want to generate a TOC of all the speakers, use <span class="author"> and then write a script that extracts all the author names and sticks them into a TOC list
12:58
<Philip`>
which is, like, two lines of code
12:59
<hsivonen>
Hmm. Is keygen really meant to differ from input 'in select'?
13:16
<MikeSmith>
hsivonen: I wondered about that too, when looking at your treebuilder code
13:16
<MikeSmith>
maybe worth a bugzilla to clarify
13:20
<hsivonen>
http://software.hixie.ch/utilities/js/live-dom-viewer/saved/73
13:21
<hsivonen>
like <input> in Gecko. Not like <input> in Opera and Safari.
13:24
<hsivonen>
MikeSmith: bug filed just in case
13:30
<gsnedders>
MikeSmith: Which Irish author is it you want me to read, again?
13:43
<hsivonen>
could someone please point out to me why the following XSD regexp: "\s*(none|xMinYMin|xMidYMin|xMaxYMin|xMinYMid|xMidYMid|xMaxYMid|xMinYMax|xMidYMax|xMaxYMax)\s+(meet|slice)?\s*"
13:43
<hsivonen>
does not match the word 'none'?
13:44
<hsivonen>
ooh
13:44
<hsivonen>
now I see it
13:44
<hsivonen>
\s+
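[The failure mode can be reproduced with Python's re module - Python's \s is not identical to XSD's either, but the \s+ placement bug shows up the same way. Alternation abridged here:]

```python
import re

KEYWORD = r'(none|xMinYMin|xMidYMid|xMaxYMax)'  # abridged alternation

# Original pattern: \s+ demands whitespace after the keyword even when the
# optional (meet|slice) is absent, so the bare word 'none' can never match.
broken = re.fullmatch(r'\s*' + KEYWORD + r'\s+(meet|slice)?\s*', 'none')

# Moving \s+ inside the optional group fixes it.
fixed = re.fullmatch(r'\s*' + KEYWORD + r'(\s+(meet|slice))?\s*', 'none')
```

[broken is None; fixed matches.]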
13:45
<hsivonen>
that's a classing mis-use of XSD \s, BTW
13:45
<hsivonen>
XSD \s does *not* equal XML whitespace
13:46
<gsnedders>
WHAT!?
13:46
<hsivonen>
gsnedders: welcome to the world of i18n political correctness. Zs FTW!
13:46
<gsnedders>
Not White_Space and not Zs?
13:46
<hsivonen>
gsnedders: IIRC, Zs
13:47
<gsnedders>
That doesn't even include U+000A IIRC!
13:47
<hsivonen>
s/classing/classical/
13:48
<hsivonen>
s/al// I suppose
13:48
<gsnedders>
Yeah
13:48
<gsnedders>
It's now right :)
13:48
Philip`
would be unable to resist the temptation to write that regexp with (none|x(Min|Mid|Max)Y(Min|Mid|Max))
13:49
<Philip`>
(Fortunately I would be able to resist (none|xM(in|id|ax)YM(in|id|ax)) because that's just crazy)
13:51
<Philip`>
(Shouldn't the regexp start with "(defer\s+)?"?)
13:53
<hsivonen>
defer?
13:54
<Philip`>
That's what http://www.w3.org/TR/SVG/coords.html#PreserveAspectRatioAttribute says
13:55
<hsivonen>
Philip`: good point.
13:56
<hsivonen>
is defer only allowed on <image>?
13:56
<Philip`>
Sounds like it's allowed everywhere, but ignored except on <image>
13:57
<hsivonen>
heycam: ^
13:57
<hsivonen>
heycam: the delta of the V.nu copy of the SVG 1.1 schema and the W3C copy is growing
13:59
<hsivonen>
MikeSmith: I deployed your recent checkins.
14:05
<MikeSmith>
hsivonen: thanks
14:06
<MikeSmith>
gsnedders: Flann O'Brien
14:10
<heycam>
hsivonen, spec problem?
14:11
<heycam>
or just a problem with the SVG 1.1 DTD not being as restrictive as it could be?
14:37
<hsivonen>
heycam: schema bug
14:37
<hsivonen>
heycam: schema bug in the RELAX NG schema
14:55
<heycam>
hsivonen, ok
14:55
<heycam>
so that rng isn't really official
14:56
<heycam>
we're going to be making a new one soon, for 1.1, but starting from the 1.2T rng
14:58
<heycam>
ah i see that regex you quote is from that unofficial rng
14:58
<gsnedders>
MikeSmith: Well, it's my birthday in just over a week ;P
14:59
<jgraham>
gsnedders: How old are you going to be? 5? 6? I lose track
14:59
<gsnedders>
jgraham: 7
14:59
<gsnedders>
Sorry, I lie. 10, in an as of yet undecided base.
15:00
<heycam>
hsivonen, erk, seems like the regex in the 1.2T relaxng is horribly wrong!
15:00
<heycam>
\s*(none|xMidYMid)\s*(meet)?\s*
15:00
heycam
raises an issue
15:02
<krikey>
I was sat like billy no mates in #html5
15:02
<krikey>
someone could have coma and got me :)
15:02
<heycam>
http://www.w3.org/Graphics/SVG/WG/track/issues/2257
15:02
<jgraham>
Hard to get someone if you're in a coma
15:03
<krikey>
ye
15:03
heycam
goes to watch some more daily show
16:26
<Philip`>
heycam: That issue fails to mention that it accepts strings like "nonemeet" too
16:26
<jgraham>
http://www.ecma-international.org/news/PressReleases/PR_Ecma_finalises_major_revision_of_ECMAScript.htm
16:27
<cryzed>
Hey :)
16:27
<cryzed>
Is someone from the html5lib for Python here?
16:27
<jgraham>
cryzed: Yes
16:28
<cryzed>
great :)
16:28
<jgraham>
"TC39 members will create and test implementations of the candidate specification to verify its correctness and the feasibility of creating interoperable implementations". I wonder if they mean interoperable implementations that can ship on the web
16:28
<cryzed>
So basically I want to know
16:29
<cryzed>
Do I get the lxml.html parser with this : html5lib.HTMLParser(tree=treebuilders.getTreeBuilder("lxml"))?
16:29
<Philip`>
jgraham: Try reading the next sentence
16:29
<Philip`>
"The test implementations will also be used for web compatibility testing to ensure that the revised specification remains compatible with existing web applications."
16:29
<jgraham>
Philip`: Oh.
16:30
<Philip`>
cryzed: No - that uses html5lib's HTML5 parser (and constructs an lxml document from it), not lxml's non-standard HTML parser
16:30
<jgraham>
cryzed: Yes, unless you use the latest svn in which case the best option is to use html5lib.parse(input, tree="lxml")
16:31
<cryzed>
Well
16:31
<cryzed>
what now?
16:31
<jgraham>
Oh, sorry, I misunderstood the question
16:31
<jgraham>
cryzed: What do you actually want to do
16:31
<cryzed>
wait
16:31
<cryzed>
http://paste.pocoo.org/show/ad9DwVRGDKIhXNgqSk8j/
16:31
jgraham
is not doing very well at reading at the moment
16:31
<cryzed>
I want to parse my blog with the html5lib
16:32
<cryzed>
and then scrape it with the resulting elementtree
16:32
<cryzed>
Unfortunately I don't really find any documentation for the ElementTree except http://effbot.org/zone/pythondoc-elementtree-ElementTree.htm#elementtree.ElementTree.ElementTree-class
16:32
<jgraham>
cryzed: That looks vaguely sensible. What is the problem?
16:33
<cryzed>
http://paste.pocoo.org/show/KHOsRJHJNnHNvS9ec0Bg/ that should work
16:33
<cryzed>
doesn't though
16:33
<jgraham>
http://codespeak.net/lxml/tutorial.html
16:33
<cryzed>
http://paste.pocoo.org/show/ld7f4BUjcUZbnyFG4vGd/
16:33
<cryzed>
ah...
16:33
<cryzed>
the case..
16:34
<jgraham>
findall()?
16:34
<cryzed>
yes
16:34
<cryzed>
and I DO need to supply
16:34
<cryzed>
an argument
16:34
<cryzed>
which argument do I need to supply to find all tags?
16:34
<cryzed>
.
16:34
<cryzed>
?
16:35
<jgraham>
if you just want all the child nodes you can just do "for item in element:"
16:35
<cryzed>
etree.findall(".//*"): that works as well
16:35
<cryzed>
thanks though jgraham :)
16:35
<cryzed>
Is the argument which I pass to findall called a "xpath"?
16:35
<cryzed>
ElementPath
16:36
<cryzed>
found it..
16:36
<cryzed>
sorry I seem to ask only stupid questions
16:36
<jgraham>
cryzed: If you know xpath and are using lxml you can do element.xpath(xpath_expression)
16:37
<cryzed>
Or I can just use the ElementPath?
16:37
<cryzed>
http://effbot.org/zone/element-xpath.htm
16:37
<jgraham>
e.g. element.xpath(".//a") finds all a descendants
16:37
<jgraham>
cryzed: ElementPaths are like a subset of XPath 1.0
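[The ElementPath subset jgraham mentions also works in the stdlib xml.etree, not just lxml - a small sketch with made-up markup:]

```python
import xml.etree.ElementTree as ET

root = ET.fromstring('<div><p>Hello <a href="#">world</a></p></div>')

all_elements = root.findall('.//*')  # every descendant element (p and a here)
links = root.findall('.//a')         # just the <a> descendants
text = ''.join(root.itertext())      # text content with markup stripped
```

[findall takes an ElementPath expression, roughly a subset of XPath 1.0; full .xpath() is lxml-only.]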
16:38
<cryzed>
Hrmm, I don't think I do need full xpath support, thanks for the tip though
16:38
<cryzed>
jgraham, I read on the lxml.html documentation
16:38
<cryzed>
about the following function:
16:38
<cryzed>
*method
16:38
<cryzed>
.text_content():
16:38
<jgraham>
Also if you use a really up to date lxml you can probably get CSS Selectors
16:39
<cryzed>
this isn't available in the lxml etree, right?
16:39
<jgraham>
cryzed: html5lib just generates an lxml tree. It has all the features of whichever lxml you have installed
16:39
<cryzed>
well, yes
16:39
<cryzed>
the problem is
16:40
<cryzed>
lxml.html
16:40
<cryzed>
the html Etree is a special tree
16:40
<cryzed>
How do I tell html5lib
16:40
<cryzed>
to use the lxml.html tree?
16:40
<jgraham>
Oh, yeah
16:40
<smedero>
jgraham: I don't think the .text_content() method exists in lxml.etree
16:40
<cryzed>
Is there any way to use the lxml.html tree?
16:40
<jgraham>
So, I don't think you can at the moment because of some weirdness in the way that lxml is set up
16:40
<cryzed>
I think this would be really comfortable for webscraping
16:41
<jgraham>
At least that is my recollection from when I implemented this stuff a while ago
16:41
<Philip`>
etree.tostring(node, method='text')
16:41
<Philip`>
might be similar to node.text_content()
16:41
<smedero>
yeah, that should be in the ballpark
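[Philip`'s tostring suggestion also works with the stdlib xml.etree serializer, illustrated here with made-up markup:]

```python
import xml.etree.ElementTree as ET

td = ET.fromstring('<td>foo <b>bar</b> baz</td>')

# method='text' serializes only the character data, skipping all tags --
# roughly what lxml.html's .text_content() gives you.
flattened = ET.tostring(td, method='text', encoding='unicode')
```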
16:41
<jgraham>
It's something like you can't create comments in lxml.html or...
16:41
<jgraham>
.xpath(".//text()) works
16:42
<jgraham>
.xpath(".//text()")
16:42
<cryzed>
thanks
16:42
<cryzed>
I found in the lxml.html implementation the following
16:42
<cryzed>
_collect_string_content(self)
16:42
<cryzed>
should work if there is no other way
16:59
<cryzed>
the .text attribute
16:59
<cryzed>
works beautifully
17:01
<cryzed>
..not
17:02
<jgraham>
cryzed: for <a><b>foo</b>bar</a> a.text == None
17:02
<jgraham>
b.text == "foo"
17:02
<jgraham>
b.tail =="bar"
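[jgraham's .text/.tail example, checked with the stdlib ElementTree; lxml behaves the same way here:]

```python
import xml.etree.ElementTree as ET

a = ET.fromstring('<a><b>foo</b>bar</a>')
b = a[0]

# .text is the text *before the first child*; text that follows a child
# element lives on that child's .tail, not on the parent.
assert a.text is None
assert b.text == 'foo'
assert b.tail == 'bar'
```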
17:03
<cryzed>
is there any way to get the whole text and preserve the formatting?
17:03
<cryzed>
for example replace <br>
17:03
<cryzed>
with \n
17:03
<jgraham>
cryzed: Not an easy way that I know of
17:03
<cryzed>
hrm
17:03
<jgraham>
You would need to walk the tree, normalize whitespace and make whatever replacements you want
17:05
<cryzed>
http://paste.pocoo.org/show/5gbDet52tiXSMRWkLZ4y/ ?
17:06
<jgraham>
That will only do children of the td
17:07
<cryzed>
yes
17:07
<cryzed>
that's what I want actually
17:08
<cryzed>
sorry if I start to get annoying
17:08
<cryzed>
but why doesn't that work: http://paste.pocoo.org/show/cfmGTUkDaWSCr1ehDH3X/ ?
17:13
<jgraham>
Trying to get the "blockquote" attribute of a "blockquote" element?
17:14
<cryzed>
well.. kinda
17:14
<cryzed>
^^
17:14
<cryzed>
It works in BeautifulSoup :D...
17:14
<jgraham>
Er, what does it do?
17:14
<jgraham>
I mean if you really have <blockquote blockquote=something> I guess it should work
17:15
<cryzed>
it should get me the text WITHOUT markup WITH formatting out of the blockquote tags
17:15
<cryzed>
>IN BETWEEN HERE<
17:16
<gsnedders>
with formatting without markup? how?
17:16
<cryzed>
well
17:16
<cryzed>
for example
17:16
<jgraham>
cryzed: If you just want the text you can do element.xpath(".//text()")
17:16
<gsnedders>
jgraham: You don't need XPath for that! Peh!
17:17
<cryzed>
<pre>That's some fancy text <br>comment</pre>
17:17
<cryzed>
Should result to
17:17
<jgraham>
If you want to do some formatting on the text you need to decide what formatting you want
17:17
<cryzed>
That's some fancy text
17:17
<cryzed>
comment
17:17
<cryzed>
gsnedders, what should I use?
17:17
<gsnedders>
jgraham: return etree.tostring(element, encoding=unicode, method='text', with_tail=False) is better than that
17:18
<cryzed>
I can't access this function
17:18
<jgraham>
And implement that by e.g. walking the tree replacing <br> with "\n" and adding the .tail of the br to the right place
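jgraham's suggested tree walk (replace each `<br>` with "\n" and splice in its `.tail` at the right place) can be sketched like this, using the stdlib ElementTree so it runs standalone; `text_with_breaks` is a hypothetical helper name, and the same logic applies to an lxml tree:

```python
import xml.etree.ElementTree as ET

def text_with_breaks(element):
    # Collect text in document order, turning <br> into "\n" and keeping
    # each child's .tail (the text that follows it) in the right place.
    parts = [element.text or ""]
    for child in element:
        if child.tag == "br":
            parts.append("\n")
        else:
            parts.append(text_with_breaks(child))
        parts.append(child.tail or "")
    return "".join(parts)

# XML input needs <br/> self-closed; a tree built by an HTML parser would not.
pre = ET.fromstring("<pre>That's some fancy text <br/>comment</pre>")
print(text_with_breaks(pre))
```

This prints "That's some fancy text " and "comment" on two lines, matching the output cryzed describes below.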
17:18
<jgraham>
gsnedders: Define "better"
17:18
<gsnedders>
jgraham: Quicker
17:18
<gsnedders>
jgraham: The result is identical :P
17:19
<jgraham>
gsnedders: Seems unlikely to be a problem in this case
17:19
<jgraham>
It is much longer to type and easier to get wrong (maybe)
17:19
gsnedders
just wraps it in a function :P
17:19
<cryzed>
etree doesn't have the attribute .tostring
17:19
<gsnedders>
cryzed: from lxml import etree
17:20
<cryzed>
I originally only wanted to import html5lib :|..
18:01
<cryzed>
Is it a good idea to use the latest svn build?
18:01
<gsnedders>
cryzed: Absolutely.
18:02
<gsnedders>
cryzed: It's a better idea than using the latest release
18:02
<cryzed>
Okay
18:02
<cryzed>
btw
18:02
<cryzed>
gsnedders, if I had an custom BeautifulSoup.py
18:02
<cryzed>
*a
18:02
<cryzed>
Could I somehow tell the treebuilder
18:02
<cryzed>
to use this BeautifulSoup?
18:03
<cryzed>
http://furyu-tei.sakura.ne.jp/archives/BSXPath.zip I found this
18:03
<cryzed>
it seems to have XPath support
18:03
<Philip`>
html5lib's BeautifulSoup support is not particularly reliable
18:03
<cryzed>
Yeah, but it works
18:03
<cryzed>
I did some things with it
18:03
<cryzed>
And I somehow think that using BeautifulSoup is easier than lxml
18:04
<Philip`>
It works as long as you don't do one of the things that doesn't work :-)
18:04
<cryzed>
e.g?
18:05
<Philip`>
e.g. http://code.google.com/p/html5lib/issues/detail?id=80
18:05
<cryzed>
oh..
18:06
<cryzed>
lxml
18:06
<cryzed>
doesn't work any better
18:06
<cryzed>
lol
18:06
<cryzed>
html5lib.HTMLParser(tree=html5lib.treebuilders.getTreeBuilder("lxml")).parse("<a><div><div><a>")
18:06
<cryzed>
>>> etree.tostring(e)
18:06
<cryzed>
'<html><head/><body><a/><div><a></a><div><a></a><a/></div></div></body></html>'
18:06
<Philip`>
That works much better, since it gives the right output and doesn't throw exceptions :-p
18:06
<cryzed>
right?
18:06
<cryzed>
oh.. you are right
18:06
gsnedders
guesses you want the html5lib serializer
18:07
<Philip`>
cryzed: "Right" according to the HTML5 spec
18:07
<cryzed>
Oke
18:07
<cryzed>
Guys, let me tell you what I want to do
18:07
<Philip`>
Bit peculiar how it mixes "<a></a>" and "<a/>"...
18:07
<cryzed>
Oke
18:07
<cryzed>
I'm sure some of you know 4chan
18:08
<cryzed>
I want to write a Python "API" for it
18:08
<cryzed>
take a look at this page for example:
18:08
<cryzed>
http://zip.4chan.org/x/res/1648607.html
18:08
<cryzed>
So
18:08
<cryzed>
each reply has got an id attribute and a class
18:08
<cryzed>
Well, basically I want to use lxml
18:08
<cryzed>
to scrape this site
18:08
<cryzed>
get the text between the blockquotes
18:09
<cryzed>
remove the tags
18:09
<cryzed>
and replace the <br> tags with \n
18:09
<cryzed>
and save it into a variable
18:09
<cryzed>
at the this is all getting wrapped in a class called Reply
18:09
<cryzed>
*at the end
18:10
<cryzed>
http://iohosaf.pastebin.com/d5b10f992
18:10
<cryzed>
This is what I've got so far
18:10
<cryzed>
in the implementation with BeautifulSoup
18:10
<cryzed>
works, but is ugly imho
18:13
<Philip`>
(On your original question about using a different BeautifulSoup.py: It may be sufficient to just put it in Python's search path, like by using PYTHONPATH=some-directory-which-contains-that-file)
18:13
<cryzed>
Philip`, okay
18:13
<cryzed>
I could just rename it to BeautifulSoup.py
18:13
<cryzed>
and place it locally
18:13
<cryzed>
next to my script
18:13
<cryzed>
probably
18:13
<Philip`>
I'm not certain but I think that ought to get picked up when html5lib tries loading it
18:14
Philip`
has to go away
18:15
<cryzed>
Well oke
18:16
<cryzed>
thanks Philip`
18:16
<cryzed>
I'll get myself the newest html5lib
18:16
<cryzed>
and just tune my old lib
18:16
<cryzed>
with BeautifulSoup a bit
18:16
<cryzed>
and hope that it works
19:42
<hsivonen>
"I don't think anyone wants to break plugins" - iPhone, anyone?
19:42
<hsivonen>
(quote from HTML WG telecon minutes)
19:43
<smedero>
it was a surreal couple of moments... for sure.
19:43
<gsnedders>
hsivonen: Well, arguably it never broke
20:45
<tantek>
hsivonen - neither iPhone nor BlackBerry browsers support Flash / plugins
21:07
gsnedders
feels bad…
21:07
<gsnedders>
Appealing to authority :(
21:08
<takkaria>
some authority is good
21:08
<gsnedders>
annevk?
21:16
<gsnedders>
ARGH!
21:16
gsnedders
gets annoyed at Google again
21:16
<gsnedders>
I can't simply look for anything relating to Lolita without getting a ton of results of porn
21:26
<Philip`>
gsnedders: Maybe you shouldn't be using the image search with SafeSearch off
21:27
<gsnedders>
Philip`: I'm not using image search. That doesn't help me write an English dissertation.
21:40
<gsnedders>
Hmm… Nabokov almost always uses "to sob" and rarely "to cry"…
22:42
<Hixie>
heycam: yt?
22:43
<Hixie>
i have a method that takes as its argument an array of values
22:43
<Hixie>
the values are typed
22:44
<Hixie>
er, i mean, an array of arrays of values, which are typed
22:44
<Hixie>
let's say, it's an array of arrays of (DOMString, long) pairs
22:44
<Hixie>
e.g.
22:44
<Hixie>
foo([['a', 1], ['b', 2], ['c', 3]]);
22:44
<Hixie>
is there a sane way to describe that in WebIDL?