| 09:28 | <melvster> | Hi All, looking at example 2.1 http://dev.w3.org/html5/html-author/#html-syntax I am wondering if the <p> should be closed? |
| 09:29 | <hsivonen> | melvster: no need to close it there |
| 09:29 | <melvster> | is it because it's the last node? |
| 09:30 | <hsivonen> | yes, the last p node in a container doesn't need to be closed |
| 09:30 | <melvster> | hsivonen: OK thanks! |
| 09:31 | <zcorpan> | melvster: "A p element's end tag may be omitted if the p element is immediately followed by an address, article, aside, blockquote, datagrid, dialog, dir, div, dl, fieldset, footer, form, h1, h2, h3, h4, h5, h6, header, hr, menu, nav, ol, p, pre, section, table, or ul, element, or if there is no more content in the parent element." |
| 09:32 | <melvster> | zcorpan: thanks again, i should have read that first |
| 09:32 | <zcorpan> | no worries :) |
| 09:33 | <zcorpan> | though <p><article> has a bad back compat story |
| 09:34 | <zcorpan> | and <p><form> is buggy in some browsers |
| 09:34 | <zcorpan> | <p><hr> in ie too iirc |
| 09:34 | <zcorpan> | and <p><table> |
| 09:35 | <hsivonen> | which RFC defines the syntax for charset names? |
| 09:35 | hsivonen | should keep better notes in source comments |
| 09:35 | <zcorpan> | <p>x<hr> is different in ie |
| 09:39 | <hsivonen> | http://tools.ietf.org/html/rfc2978 apparently |
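The end-tag rule zcorpan quoted at 09:31 can be read as a simple membership test. The following is a minimal sketch, not anything from the spec or a validator: the function name and the convention of passing `None` for "no more content in the parent element" are assumptions made for illustration, and the element list is copied from the quoted sentence (a draft of that era, so it may differ from later versions).

```python
# Element names copied from the spec sentence quoted in the log.
# The list reflects the draft being discussed, not necessarily later ones.
P_END_TAG_OMITTING_FOLLOWERS = frozenset(
    "address article aside blockquote datagrid dialog dir div dl fieldset "
    "footer form h1 h2 h3 h4 h5 h6 header hr menu nav ol p pre section "
    "table ul".split()
)


def p_end_tag_omissible(next_sibling):
    """Return True if </p> may be omitted under the quoted rule.

    next_sibling is a hypothetical parameter: the tag name of the element
    that immediately follows the p element, or None when there is no more
    content in the parent element.
    """
    if next_sibling is None:
        return True
    return next_sibling.lower() in P_END_TAG_OMITTING_FOLLOWERS


print(p_end_tag_omissible(None))    # last node in its parent
print(p_end_tag_omissible("hr"))    # allowed by the rule, though zcorpan notes IE quirks
print(p_end_tag_omissible("span"))  # not in the list
```

This only encodes the conformance rule; the back-compat problems zcorpan mentions for `<p><article>`, `<p><form>`, `<p><hr>` and `<p><table>` are about how browsers actually parse, which this sketch does not model.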
| 09:51 | <zcorpan> | hmm, the bail-out list could be different for mathml and svg so that <svg><font> is allowed but <math><font> not |
| 09:53 | <hsivonen> | Hixie: is it safe to refer to http://www.whatwg.org/style/specification from elsewhere? is that URI expected to point to a "Working Draft" style? |
| 09:53 | <hsivonen> | Hixie: or should I copy the style sheet? |
| 09:53 | <hsivonen> | (I'm assuming the style sheet is covered by the WHATWG document license) |
| 09:54 | <zcorpan> | could also add <a>, <script> and <style> to the mathml list |
| 09:54 | <Hixie> | it's covered by whatever license you want |
| 09:54 | <Hixie> | i make no promises about not changing it |
| 09:54 | <Hixie> | it's not working draft vs any other kind of spec though |
| 09:55 | <Hixie> | iirc i have a class on the body element to decide the 'working draft' banner |
| 09:55 | <zcorpan> | yep |
| 09:55 | <hsivonen> | Hixie: ok. |
| 09:55 | <zcorpan> | Hixie: what do you think about different bail-out lists (see above)? |
| 09:55 | <hsivonen> | (now that I think about it, I already saw zcorpan use the class on body) |
| 09:56 | <hsivonen> | zcorpan: one possibility is bailing out on font only if it has the kind of attributes presentational HTML has |
| 09:58 | <zcorpan> | hsivonen: oh yep didn't think of that |
| 09:58 | <zcorpan> | not sure which is better |
| 09:59 | <Hixie> | zcorpan: i'm not yet convinced we can add <svg><font>, need to study that further |
| 10:00 | <hsivonen> | Hixie: we could also require a <defs> context |
| 10:00 | <Hixie> | that would be profiling svg in a more weird way |
| 10:00 | <Hixie> | it's one thing to say "you can't use font" |
| 10:00 | <Hixie> | it's another to say "you can use font in these specific cases..." |
| 10:01 | <hsivonen> | Isn't <font> in practice always used as a child of <defs> in non-contrived SVG? |
| 10:03 | <zcorpan> | Hixie: doesn't not bailing on <script> break some pages with <math><script> ? |
| 10:03 | <zcorpan> | (or style/a) |
| 10:04 | <hsivonen> | Hixie: could you please add ids to the paragraphs starting with "In the foo state," and giving the conformance reqs for the shape attribute states for image map area? |
| 10:04 | <hsivonen> | s/shape/coords/ |
| 10:04 | <Hixie> | hsivonen: not to my knowledge, why would you bother with the <defs>? |
| 10:04 | <Hixie> | if you want changes, send mail |
| 10:05 | <Hixie> | i'm not near the editor right now |
| 10:05 | <hsivonen> | Hixie: I'm just generalizing from the stuff Philip` found in Wikipedia |
| 10:05 | <hsivonen> | Hixie: ok. I'll send mail |
| 10:05 | <Hixie> | thx |
| 10:06 | <Hixie> | i think basing it on attributes would be better than on context, given that the whole point is to bail if someone does something stupid |
| 10:06 | <Hixie> | they're more likely to put an html font after a bunch of random svg copied and pasted, than to put svg font attributes on an html font element |
| 10:07 | <zcorpan> | Hixie: makes sense |
| 10:07 | <Hixie> | oh btw someone sent me mail about a major bug in the parser that i need to fix |
| 10:07 | <Hixie> | basically the generic cdata element parsing algorithm thingy totally doesn't work with document.write() |
| 10:08 | <Hixie> | consider <script>document.write("<style>a");document.write("b</style>")</script> |
| 10:08 | <Hixie> | (or worse, nested <script> elements) |
| 10:08 | hsivonen | fires up his GWT test harness... |
| 10:08 | <Hixie> | so i'm going to split the cdata algorithm into its own state |
| 10:09 | <Hixie> | instead of being a tokeniser pull, bring it in line with everything else (tokeniser push) |
| 10:09 | <zcorpan> | Hixie: why doesn't that work? |
| 10:09 | <Hixie> | is there anything else that pulls from the tokeniser at this point? |
| 10:09 | <hsivonen> | Hixie: no. (and I've already implemented everything as push) |
| 10:10 | <Hixie> | zcorpan: because at the end of the document.write() input stream the tokeniser is stopped and the tree construction stage is exited, so you lose the fact that you're in the middle of a pulling step |
| 10:10 | <zcorpan> | Hixie: ah |
| 10:10 | <Hixie> | hsivonen: did you implement the generic cdata thing as having a new variable to preserve the source state? |
| 10:10 | <Hixie> | hsivonen: or? |
| 10:11 | <hsivonen> | Hixie: I implemented it as a flag in the tree builder |
| 10:11 | <hsivonen> | Hixie: and it appears that your example above breaks it :-( |
| 10:11 | <Hixie> | ah, somewhat like the earlier split of insertion modes vs states? |
| 10:11 | <Hixie> | oh? |
| 10:12 | <hsivonen> | at least when using WebKit as the engine in GWT, "b" never ends up inside the style element |
| 10:12 | <hsivonen> | it's completely lost somewhere |
| 10:12 | <Hixie> | fun |
| 10:12 | <Hixie> | if you replace style with script the problem becomes worse |
| 10:12 | <Hixie> | because the element is added with the end tag, not the start tag |
| 10:12 | <Hixie> | so you end up losing the element altogether in a naive push implementation |
| 10:14 | <Philip`> | <script>document.write('<style></sty');document.write('le>')</script> - how would that work with the tokeniser upon seeing the "</", since the "[if] the next few characters do not match the tag name of the last start tag token emitted" condition wouldn't make sense at that point? |
| 10:14 | <hsivonen> | I'm trying to review what exactly I'm doing but Eclipse beachballs on me |
| 10:14 | <Philip`> | Wait, do I mean <style>? |
| 10:15 | <Philip`> | Oh, yes, I think I do |
| 10:15 | <Hixie> | Philip`: yeah, i noticed the same problem with the <![CDATA and <!DOCTYPE tokenising |
| 10:16 | <Hixie> | Philip`: but that's easy to fix, you just say that the tokeniser stops when it's missing data to resolve an ambiguous state and wave your hand and move on |
| 10:16 | <Hixie> | Philip`: "implementation detail" |
| 10:16 | <hsivonen> | Hixie: here's what I do: |
| 10:16 | <hsivonen> | 1) Everything is tokenizer push |
| 10:16 | <hsivonen> | 2) tree builder has a variable called cdataOrRcdataTimesToPop |
| 10:16 | <Philip`> | Hixie: Can't you do the same hand-waving in the generic CDATA whatnots, then? |
| 10:17 | <Hixie> | Philip`: no, because when you abort that tree construction stage you return the previous one, which is in the middle of doing the cdata processing |
| 10:17 | <hsivonen> | 3) If the spec calls for pushing the head element on stack first, cdataOrRcdataTimesToPop is set to 2. else, it is set to 1 |
| 10:17 | <hsivonen> | 4) endTag pops cdataOrRcdataTimesToPop times |
| 10:18 | <hsivonen> | and zeros cdataOrRcdataTimesToPop |
| 10:18 | <Philip`> | Ah |
| 10:18 | <Hixie> | ah |
| 10:18 | <Hixie> | that won't work :-) |
| 10:18 | <Hixie> | but makes sense given the spec today |
| 10:18 | <hsivonen> | 5) if cdataOrRcdataTimesToPop > 0, characters just accumulate and the handler returns early without inspecting insertion mode |
| 10:19 | <Hixie> | anyway dunno when i'll fix this, i expect it's in the coming few weeks though |
| 10:19 | <hsivonen> | It bothers me that I don't know what happened to "b" in the GWT case |
| 10:19 | <Hixie> | i've been avoiding the parser folder because i've been hoping the svgwg will fix the issues you, takkaria, and myself raised with their proposal |
| 10:20 | <Hixie> | but i guess eventually i'll go in and deal with it |
| 10:21 | <hsivonen> | Hixie: well, both takkaria and I said we'd prefer your/zcorpan's suggestion |
| 10:21 | <Hixie> | apparently <datagrid> is the next target |
| 10:21 | <Hixie> | hsivonen: zcorpan claims the svgwg proposal is as much his as the current spec's :-P |
| 10:23 | <Hixie> | ok bed time nn |
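The document.write() problem discussed above comes down to where the "I am inside a style/script element" fact is stored. The following is a toy sketch, not the spec's algorithm and not hsivonen's implementation: the class name, its attributes, and the token tuples are all hypothetical, and it only understands one attribute-less tag. It shows the idea Hixie describes of making the raw-text handling a tokeniser state that survives between pushes, plus the "stop when data is missing to resolve an ambiguity" behaviour he suggests for Philip`'s split `</sty` + `le>` case.

```python
class PushTokenizer:
    """Toy push tokeniser for one raw-text element (hypothetical names)."""

    def __init__(self, tag="style"):
        self.open_tag = f"<{tag}>"
        self.close_tag = f"</{tag}>"
        self.in_raw_text = False  # persists across feed() calls
        self.pending = ""         # ambiguous tail held back from the last chunk

    def feed(self, chunk):
        """Consume one document.write()-style chunk and return its tokens."""
        data = self.pending + chunk
        self.pending = ""
        tokens = []
        while data:
            marker = self.close_tag if self.in_raw_text else self.open_tag
            idx = data.lower().find(marker)
            if idx >= 0:
                if idx:
                    tokens.append(("chars", data[:idx]))
                tokens.append(("end" if self.in_raw_text else "start", marker))
                self.in_raw_text = not self.in_raw_text
                data = data[idx + len(marker):]
                continue
            # No complete marker: hold back the longest tail that could
            # still turn into the marker once more input arrives.
            keep = 0
            for n in range(min(len(marker) - 1, len(data)), 0, -1):
                if marker.startswith(data[-n:].lower()):
                    keep = n
                    break
            if len(data) > keep:
                tokens.append(("chars", data[:len(data) - keep]))
            self.pending = data[len(data) - keep:]
            data = ""
        return tokens


# Hixie's example: the element's content is split across two writes.
t = PushTokenizer()
print(t.feed("<style>a"))
print(t.feed("b</style>"))

# Philip`'s example: the end tag itself is split across two writes.
t = PushTokenizer()
print(t.feed("<style></sty"))
print(t.feed("le>"))
```

Because nothing here depends on a call stack that is torn down when a chunk ends, the "b" in the first example is still reported as character data belonging to the open element. A real tokeniser would of course also need attributes, the matching of the end tag against the last start tag emitted, and the script-specific behaviour mentioned at 10:12.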
| 12:59 | <hsivonen> | Does HTML5 define where LWS is really allowed in the http://tools.ietf.org/html/rfc2045#section-5.1 syntax for Web purposes? |
| 13:33 | <gsnedders> | hsivonen: no |
| 13:35 | <hsivonen> | gsnedders: OK. thanks. |
| 13:35 | <hsivonen> | gsnedders: do you happen to document it for HTTP? |
| 13:35 | <gsnedders> | (and I don't know either) |
| 13:35 | <hsivonen> | it appears I have made up a definition then |
| 13:35 | <hsivonen> | I'll just write that down in my spec |
| 13:35 | <gsnedders> | I'm still (occasionally) working on the overall syntax of the entire HTTP structure |
| 13:35 | <gsnedders> | Not got to anything so exact as parsing actual headers :P |
| 13:48 | <hsivonen> | http://hsivonen.iki.fi/html5-datatypes/ comments welcome |
| 13:50 | <Philip`> | hsivonen: s/hecking/checking/ |
| 13:51 | <hsivonen> | Philip`: thanks |
| 13:55 | <hsivonen> | explain to me, then, how congresspeople can sit for hours on end in a hearing without going to the bathroom |
| 13:59 | <zcorpan> | is this valid? <img usemap=# src=x><map name> |
| 14:01 | <hsivonen> | zcorpan: no, the name attribute must be non-empty |
| 14:01 | <zcorpan> | hsivonen: ah |
| 14:01 | <hsivonen> | zcorpan: but according to the datatype lib, usemap=# is valid (checking referential integrity happens elsewhere) |
| 14:02 | <zcorpan> | ok |
| 14:18 | <hsivonen> | hendry: did you get an instance of the CSS validator running? If yes, under which servlet container? |
| 14:38 | <zcorpan> | hmm, why does name allow whitespace |
| 14:39 | <hsivonen> | zcorpan: in validator or in spec? |
| 14:39 | <zcorpan> | hsivonen: in spec |
| 14:39 | <zcorpan> | at least if id is not present |
| 14:43 | <hsivonen> | I've now postponed rel checking well over a year. |
| 14:43 | <hsivonen> | I wonder if the rel stuff is still at risk... |
| 14:44 | <hsivonen> | at least the registry was discussed relatively recently on public-html |
| 14:59 | <Lachy> | I finally found some time to review the SVG WG's proposal. Personally, I'm not particularly fond of it |
| 15:01 | <Lachy> | there isn't really sufficient justification for some of the requirements it tries to address, beyond keeping it theoretically-pure-well-formed XML |
| 15:07 | <zcorpan> | should we add alt to embed? apparently opera supports it |
| 15:12 | <hsivonen> | zcorpan: would it be rendered when the plugin isn't installed? |
| 15:12 | <zcorpan> | hsivonen: yes |
| 15:15 | <hsivonen> | hmm. Flash is supposed to be accessible in itself. Video plug-ins are supposed to get superseded by <video>. Apart from Silverlight, newer plugins tend to be non-rendered and provide JS APIs |
| 15:16 | <hsivonen> | like Gears or the Garmin plugin for integrating GPS devices |
| 15:17 | <hsivonen> | it seems to me that the use case would be customizing the "Boohoo. Go install a plugin." message that e.g. Firefox generates as UI. |
| 15:18 | <zcorpan> | i guess |
| 15:21 | <zcorpan> | hsivonen: s/strings match/strings that match/ |
| 15:21 | <zcorpan> | hsivonen: might squeeze in a "the" in there too |
| 15:22 | <hsivonen> | zcorpan: fixed, thanks |
| 15:22 | <zcorpan> | hsivonen: what is xml-name used for? |
| 15:23 | <hsivonen> | zcorpan: it's used for XHTML 1.0 backports. it probably shouldn't be in the lib in theory, but putting it there is convenient for me |
| 15:23 | <zcorpan> | hsivonen: ok |
| 15:55 | gsnedders | wonders whether to do something that'll make him unpopular with many around here: serve XHTML as text/html |
| 15:57 | <gsnedders> | Actually, I can just do this in Ruby, and use a pre-existing HTML parser! |
| 15:57 | <gsnedders> | Yay! |
| 17:52 | Philip` | discovers that if he makes a Jabber client send namespace-ill-formed XML to a group chat, then ejabberd propagates it to all the other clients and they detect the error and disconnect |
| 17:53 | <Philip`> | and when they reconnect and rejoin the group, the server helpfully sends the past messages to the newly-joining clients, which breaks them again |
| 17:54 | <gsnedders> | Hahahaha. |
| 17:54 | <gsnedders> | Awesome. |
| 17:55 | <gsnedders> | I think something is wrong. Fx fails all of the HTTP parsing tests |
| 17:56 | <Philip`> | I'd be more concerned if it passed them all |
| 17:58 | <gsnedders> | Yes, but it doesn't even try running them. |
| 17:58 | <gsnedders> | Which is why it claims to fail them all. |
| 17:58 | <gsnedders> | expected_xhr.onreadystatechange is never hit :\ |
| 17:59 | <gsnedders> | Interesting. |
| 17:59 | <gsnedders> | Opera has changed behaviour with HTTP/0.9 |
| 18:00 | <gsnedders> | Got status code 0, expected 200 |
| 18:00 | <gsnedders> | Got status text , expected OK |
| 18:01 | takkaria | chuckles at Philip` and his XML games |
| 18:01 | <Philip`> | (It works against individuals by sending normal messages too, but the server doesn't appear to resend them after the first time) |
| 18:02 | <jmb> | Philip`: that's pretty nasty :) |
| 18:04 | <gsnedders> | Why is onreadystatechange never called? |
| 18:16 | <gsnedders> | readyState is getting changed :\ |
| 18:19 | <Philip`> | https://support.process-one.net/browse/EJAB-680 |
| 18:19 | <takkaria> | Philip`: would you be able to give me some statistics on the number of pages which include CRs and NULs in attribute values? |
| 18:21 | <gsnedders> | Philip`: Also, could you see if you have any pages that start with "HTTP" case-insensitively but not case-sensitively? |
| 18:23 | <Philip`> | takkaria: Maybe - I guess it should be easy to modify hsivonen's tokeniser to detect that |
| 18:23 | <Philip`> | takkaria: Do you care how many times per page it occurs, or just how many pages it occurs >= 1 time on? |
| 18:23 | <takkaria> | Philip`: the latter |
| 18:23 | <Philip`> | gsnedders: By "pages", do you mean the body of the page (i.e. after parsing and stripping the HTTP headers)? |
| 18:24 | <gsnedders> | Philip`: No, I mean the entire response |
| 18:24 | <gsnedders> | (i.e., what the response-line begins with) |
| 18:24 | <Philip`> | gsnedders: In that case, no |
| 18:24 | <gsnedders> | or status-line, or whatever it's called |
| 18:24 | <Philip`> | gsnedders: since I didn't save the raw response bytes, only the parsed representation |
| 18:25 | <gsnedders> | Philip`: Ah. |
| 18:25 | <Philip`> | gsnedders: (since I couldn't see a way to make HttpClient return the raw response bytes) |
| 18:25 | <gsnedders> | Philip`: Write your own! :P |
| 18:25 | <Philip`> | gsnedders: My own HTTP client? I can't do that until someone's written a proper spec on how to write one :-p |
| 18:26 | <gsnedders> | Philip`: Oh, all I'm writing is how to parse the response/request. You don't need to do either of those :P |
| 18:50 | <Philip`> | takkaria: It looks like about 10% have \r in attribute values somewhere, and about 0% have \0 |
| 18:50 | <Philip`> | and of those 0%, most are JPEG and PDF files |
| 18:51 | <Philip`> | Wait a minute, I'll change it to only look at text/html... |
| 18:53 | <Philip`> | http://www.slovanova.sk/ - aha, actual HTML with a \0 |
| 18:55 | Philip` | waits ten minutes while it searches through all the other files |
| 18:57 | <takkaria> | 10% is an interestingly high value |
| 19:09 | <Philip`> | takkaria: From something like 126989 text/html pages in total: |
| 19:09 | <Philip`> | 16 \0 in attribute value |
| 19:09 | <Philip`> | 10622 \r in attribute value |
| 19:09 | <Philip`> | 47 \r\n in attribute value |
| 19:10 | <Philip`> | (Those "\r\n" are slightly bogus - it should have aborted after finding the first "\r", but I didn't detect \rs that came after entities and got unconsumed, so it didn't notice until it got to the \n) |
| 19:12 | <Philip`> | takkaria: http://philip.html5.org/data/attr-chars.txt lists them all |
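For reference, the counts Philip` posted at 19:09 can be turned into percentages directly. This is only arithmetic on the numbers in the log; the total is the approximate figure he gave ("something like 126989"), and the variable and function names are made up for the example.

```python
# Figures as posted in the log; the total is approximate per Philip`.
TOTAL_TEXT_HTML_PAGES = 126989
PAGES_WITH = {"\\0": 16, "\\r": 10622, "\\r\\n": 47}


def share(count, total=TOTAL_TEXT_HTML_PAGES):
    """Percentage of pages with at least one occurrence."""
    return 100.0 * count / total


for label, count in PAGES_WITH.items():
    print(f"{label:5} {count:6d} pages  {share(count):.3f}%")
```

On these numbers the share of pages with a `\r` in some attribute value works out to a little over 8%, slightly below the rough "about 10%" from the first pass, which also counted non-text/html files.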
| 21:25 | <weinig> | Hixie: ping |
| 21:40 | gsnedders | still needs a decent idea for his computing project for this year for school :\ |
| 21:48 | <gsnedders> | 40 hours project, with 20 hours for impl. |
| 21:51 | <Philip`> | A day of coding? That's not much :-p |
| 21:51 | <gsnedders> | You aren't expected to do it in one day :P |
| 21:52 | <Philip`> | That doesn't demonstrate much dedication |
| 21:52 | <gsnedders> | You're only expected to have an hour of class time per day five days a week |
| 21:53 | <gsnedders> | Of course, I'm doing it out of class, so I have none :P |
| 22:48 | gsnedders | would like to do something in Haskell or C/C++ |
| 22:48 | <gsnedders> | But I can't decide what :P |
| 22:49 | <gsnedders> | Anybody got any suggestions? |
| 22:53 | <jgraham> | gsnedders: Well learning Haskell might be a 20 hour project on its own. Not much to hand in though |
| 22:53 | <gsnedders> | jgraham: That's problematic, especially when it counts for 40% of the final grade. |
| 22:54 | <jgraham> | You could implement a domain-specific language for something |
| 22:54 | <gsnedders> | Writing an html5 parser is too big, probably |
| 22:54 | <jgraham> | gsnedders: an html5 would probably take longer, yes |
| 22:55 | <jgraham> | http://effbot.org/zone/simple-top-down-parsing.htm seems like a nice article about language parsing |
| 22:55 | <jgraham> | s/an html5/an html5 parser/ |
| 22:56 | <gsnedders> | jgraham: Yeah, writing HTML 5 would take a while :P |
| 22:56 | <jgraham> | Well if you manage it in 20 hours Hixie will look a bit silly :) |
| 22:58 | <gsnedders> | hmmm. |
| 23:09 | <gsnedders> | I could do something based on trying to detect spam |
| 23:12 | <jgraham> | gsnedders: I have a problem that I actually would like a solution to but don't currently have time to implement |
| 23:12 | <gsnedders> | jgraham: What problem? :P |
| 23:15 | <jgraham> | Many scientists use the arXiv preprint servers to keep up to date with current research. They basically provide a daily list of new preprints ordered by submission time. They are broadly categorised, but only into astrophysics / high energy physics / computer science / etc., which is much broader than the expertise of most readers |
| 23:15 | <jgraham> | The time ordering creates two problems. One is that it is hard to find things that you are interested in, especially if they appear down the list somewhere |
| 23:16 | <jgraham> | The second is that papers appearing near the top of the listings tend to be noticed more and get more citations --- this is a measured effect |
| 23:17 | <jgraham> | What i want is an interface to the preprint server where the day's listings are ordered according to my personal reading habits |
| 23:17 | <jgraham> | These would be determined automatically using some sort of machine learning algorithm |
| 23:19 | <jgraham> | Basically the way I imagine it working is that when you click on a paper the keywords from that paper (authors, title, abstract text) are added to some weighting which increases the probability of papers with similar authors, titles or abstracts appearing at the top |
| 23:20 | <jgraham> | This is basically just a spam classification problem except that you're trying to pick out the most useful items and present them first |
| 23:20 | <jgraham> | Rather than discard the least useful items |
| 23:20 | <Hixie> | weinig|kaphine: pong |
| 23:21 | <jgraham> | gsnedders: I don't know how easy writing the actual machine learning bit would be but you might be able to find library code for that |
| 23:22 | <jgraham> | In fact I know you can because I looked when I first thought about this problem |
| 23:22 | <weinig> | hey Hixie |
| 23:22 | <gsnedders> | jgraham: It'd be nice to do that with feeds, to do it in more generic form |
| 23:23 | <weinig> | Hixie: I as curious if you were considering specifying the rules for HTML entity error recovery at some point in HTML5? |
| 23:23 | <weinig> | s/as/am/ |
| 23:23 | <jgraham> | gsnedders: Sure, although that's not the problem that I'm interested in :) |
| 23:24 | <gsnedders> | jgraham: The logic used to detect whether an article is of interest or not is the same, though |
| 23:24 | <Hixie> | weinig: HTML entity error recovery? |
| 23:24 | <gsnedders> | And that, I expect, is the hard part. |
| 23:24 | <weinig> | Hixie: something akin to the issue described in https://bugs.webkit.org/show_bug.cgi?id=4948 |
| 23:25 | <jgraham> | gsnedders: Yes, I agree that it's essentially the same problem |
| 23:25 | <Hixie> | weinig: that's all already defined |
| 23:25 | <Hixie> | weinig: note that it differs from attributes and in body text (the spec handles that too) |
| 23:25 | <weinig> | Hixie: it is? great! |
| 23:25 | <gsnedders> | jgraham: It's just a matter of plugging in the data source |
| 23:25 | <weinig> | couldn't find it in the text |
| 23:25 | <weinig> | will look again |
| 23:26 | <Hixie> | weinig: just start from the data state in the tokeniser (or the attribute value state in the tokeniser) |
| 23:26 | <Hixie> | weinig: and pretend you are parsing each of those cases |
| 23:26 | weinig | nods |
| 23:26 | <franksalim> | gsnedders, jgraham: you would just need to generate a feed from arXiv if there isn't one already |
| 23:26 | <Hixie> | should all be hyperlinked properly |
| 23:27 | <jgraham> | franksalim: I'm pretty sure there is |
| 23:30 | <jgraham> | Although curiously it seems to be different to the web page |
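The click-weighted ordering jgraham describes from 23:17 onwards can be sketched in a few lines. This is a deliberately naive illustration under stated assumptions, not a recommendation of a particular learning algorithm: the `ReadingProfile` class, the dictionary shape of a paper (`title`, `abstract`, `authors`), and the plain word-count weighting are all invented here, and nothing below talks to arXiv or to a feed.

```python
import re
from collections import Counter


def keywords(paper):
    """Lowercased word set from a paper's title, abstract and author names."""
    text = " ".join([paper["title"], paper["abstract"], *paper["authors"]])
    return set(re.findall(r"[a-z0-9]+", text.lower()))


class ReadingProfile:
    """Accumulates keyword weights from the papers a reader clicks on."""

    def __init__(self):
        self.weights = Counter()

    def record_click(self, paper):
        # Each keyword of a clicked paper gains one unit of weight.
        self.weights.update(keywords(paper))

    def score(self, paper):
        # Sum of learned weights over the paper's keywords (0 if unseen).
        return sum(self.weights[word] for word in keywords(paper))

    def rank(self, papers):
        # Highest score first; ties keep their original (submission) order.
        return sorted(papers, key=self.score, reverse=True)


profile = ReadingProfile()
profile.record_click({"title": "Weak lensing", "abstract": "galaxy shapes",
                      "authors": ["C. Reader"]})
listing = [
    {"title": "Parsing tag soup", "abstract": "tokeniser states",
     "authors": ["A. Author"]},
    {"title": "Galaxy cluster surveys",
     "abstract": "weak lensing of galaxy clusters", "authors": ["B. Writer"]},
]
for paper in profile.rank(listing):
    print(profile.score(paper), paper["title"])
```

As gsnedders points out, the scoring logic does not care where the items come from, so the same profile could be applied to entries from any feed. A more serious version would at least drop stop words and normalise for abstract length, or use a library classifier as jgraham suggests.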