| 08:27 | <yorick> | the html5 boolean attributes seem to be inconsistent with the html4 ones |
| 08:28 | <yorick> | html4: <option selected="selected">contents</option> |
| 08:28 | <yorick> | html5: <div draggable="true">contents</div> |
| 08:34 | <yorick> | also, is there a possibility to set DataTransfer on dragstart to insensitive, so it can be accessed when dragging over something? |
| 08:41 | <jgraham> | Philip`: Can you get data on things that start <!-- but with a > and no --> before the end of the document |
| 08:41 | jgraham | isn't quite sure how to express that as a regexp |
| 09:11 | <annevk5> | yorick, draggable is not a boolean attribute per HTML5 |
| 09:24 | <yorick> | annevk5: then what is it? |
| 09:36 | <zcorpan> | yorick: an enumerated attribute |
| 09:52 | zcorpan | adds another entry to http://wiki.whatwg.org/wiki/HTML5_Presentations |
| 11:37 | <gsnedders> | Ergh… |
| 11:37 | gsnedders | doesn't want to sign up to another bug tracker to report a bug in Validator.nu |
| 11:54 | <Philip`> | jgraham: You mean something like, um, /<!--[^>]*(?<!-->)>([^>]|(?<!-->)>)*$/ perhaps? |
| 11:55 | <Philip`> | Whoops, not that one |
| 11:55 | <Philip`> | /<!--[^>]*(?<!--)>([^>]|(?<!--)>)*$/ |
| 11:58 | <jgraham> | Philip`: Perhaps |
| 11:59 | <jgraham> | If that matches "<!-- foo > bar" and "<!-- foo>" but not "<!-- foo > bar -->" |
| 12:00 | <Philip`> | It does |
| 12:00 | Philip` | should take this opportunity to hook his grep tool up to his new set of pages... |
| 12:00 | <jgraham> | Ah, well that sounds like what I want then |
| 12:01 | Philip` | notes that "(?<!--)" confusingly has nothing to do with the string "<!--", it's just a negative lookbehind assertion on the string "--" |
| 12:02 | <jgraham> | Ah, that makes a little more sense |
| 12:02 | <gsnedders> | And this is why you shouldn't use regex to parse HTML P |
| 12:02 | <gsnedders> | * :P |
| 12:03 | <jgraham> | gsnedders: No this is why should shouldn't have ultra-weird comment parsing |
| 12:03 | <gsnedders> | jgraham: It's saner than SGML. |
| 12:03 | <jgraham> | which requires lookahead |
| 12:04 | <jgraham> | s/should/you/ |
| 12:05 | <gsnedders> | I don't. I have perfectly sane comment parsing, thank you very much. |
| 12:06 | <jgraham> | gsnedders: You use a sophisticated biological neural network to parse comments and it's not even that reliable. How can you describe that as sane? |
| 12:06 | <gsnedders> | jgraham: Through my own insaity. |
| 12:06 | <gsnedders> | *insanity. |
| 12:10 | <Philip`> | gsnedders: Parsing it's easy, it's just like /<!(-?>|--.*?-->)/ |
| 12:10 | <Philip`> | s/it's/is/ |
| 12:11 | <Philip`> | Uh |
| 12:11 | <Philip`> | gsnedders: Parsing it's easy, it's just like /<!(-?>|([^-]|--).*?-->)/ |
| 12:11 | <Philip`> | s/it's/is/ |
| 12:11 | <Philip`> | or something like that |
| 12:12 | <Philip`> | but anyway it's easy |
| 12:12 | <MikeSmith> | gsnedders: I'm looking at the class="" bug now |
| 12:12 | <Philip`> | The difficulty is trying to match things that are *not* matched by the normal state machine |
| 12:24 | <gsnedders> | How the hell do you get script to validate in HTML 4.01? |
| 12:24 | gsnedders | stabs SGML |
| 12:27 | <Philip`> | Use <script src> |
| 12:27 | <Philip`> | If you want inline scripts, use <script src="data:text/javascript,..."> |
| 12:30 | <Dashiva> | gsnedders: Just rephrase all your less-than tests :) |
| 12:39 | <MikeSmith> | from the HTML4 spec, I can't tell whether id and class are allowed to be empty or not |
| 12:43 | <Dashiva> | It says "must begin with a letter" for id |
| 12:44 | <Dashiva> | But it's taken from the SGML spec, so would probably have to look there |
| 12:46 | <MikeSmith> | hmm, the XHTML DTD defines the value of class as NMTOKENS |
| 12:46 | <gsnedders> | Yay! More undocumented differences between XHTML 1.0 and HTML 4.01! |
| 12:48 | <MikeSmith> | I think as far as the HTML4 and XHTML1 specs are concerned, the value of class can't be empty |
| 12:48 | <Dashiva> | Can't class be a list of zero class names? |
| 12:49 | <MikeSmith> | Dashiva: not as far as I can see, as far as XML goes |
| 12:50 | <MikeSmith> | http://www.w3.org/TR/REC-xml/#NT-Nmtoken |
| 12:50 | <MikeSmith> | Nmtokens ::= Nmtoken (#x20 Nmtoken)* |
| 12:50 | <MikeSmith> | Nmtoken ::= (NameChar)+ |
| 12:50 | <MikeSmith> | and XHTML1 DTD says: |
| 12:51 | <Dashiva> | Fair enough. Where does it say it's NMTOKENS? xhtml1 says CDATA where I'm looking |
| 12:51 | <MikeSmith> | "class NMTOKENS #IMPLIED |
| 12:51 | <MikeSmith> | I'm looking at a DTD on my local system |
| 12:52 | <MikeSmith> | XHTML Modularization |
| 12:52 | <gsnedders> | Well, it claims to be an XHTML 1.0 schema, not an XHTML Mod. one |
| 12:53 | <Dashiva> | What I found: http://www.w3.org/TR/xhtml1/dtds.html#dtdentry_xhtml1-strict.dtd_coreattrs |
| 12:53 | <Dashiva> | I guess it's obsolete |
| 12:53 | <MikeSmith> | Dashiva: yeah, that's what I'm looking at now |
| 12:53 | <MikeSmith> | well, it's right, even if it's obsolete |
| 12:54 | <MikeSmith> | and XHTML modularization is wrong |
| 12:54 | <MikeSmith> | I mean, in practice at least |
| 12:54 | <Dashiva> | YSOD on empty class attribute? :) |
| 12:54 | <MikeSmith> | YSOD? |
| 12:55 | <Dashiva> | Yellow screen of death |
| 12:55 | <MikeSmith> | heh |
| 12:58 | <MikeSmith> | the HTML 4.01 DTD used by the W3C validator says "class CDATA #IMPLIED" |
| 12:59 | <MikeSmith> | and all the XHTML1 DTDs it uses say the same |
| 13:00 | <MikeSmith> | but the XHTML 1.1 DTD it uses says "class NMTOKENS #IMPLIED" |
| 13:00 | <MikeSmith> | so those mooncalfs apparently redefined it in XHTML 1.1 |
| 13:01 | <MikeSmith> | I think in validator.nu we should preserve that brokenness |
| 13:02 | <MikeSmith> | to learn people not to bother validating against XHTML 1.1 |
| 13:03 | <MikeSmith> | but I see hsivonen had the foresight to not even include an XHTML1.1-checking option in validator.nu |
| 13:04 | <MikeSmith> | I wonder if they bothered to document this in the XHTML 1.1 spec |
| 13:05 | <MikeSmith> | nope |
| 13:05 | <MikeSmith> | http://www.w3.org/TR/xhtml11/changes.html#a_changes |
| 13:14 | <Philip`> | jgraham: With that search thing, how many results do you want? |
| 13:18 | <Philip`> | jgraham: Actually, you can have them all |
| 13:19 | <Philip`> | jgraham: http://philip.html5.org/data/comments-not-closed-but-with-a-gt-after-them.txt |
| 13:22 | <jgraham> | Philip`: Great, thanks |
| 13:30 | Philip` | hopes his regexp isn't wrong |
| 13:38 | jgraham | hasn't checked yet |
| 13:39 | <Philip`> | <!doctype html> seems to be one of those most popular HTML5 features |
| 13:39 | Philip` | sees it on 64 distinct domains |
| 13:40 | <Philip`> | like last.fm and pear.php.net and maps.google.com and edward.oconnor.cx and help.godaddy.com |
| 13:41 | <Philip`> | s/those most/the most/ |
| 14:22 | <MikeSmith> | gsnedders: http://bugzilla.validator.nu/attachment.cgi?id=85 |
| 14:26 | <MikeSmith> | hsivonen: ↑ |
| 14:30 | <MikeSmith> | http://bugzilla.validator.nu/attachment.cgi?id=86 |
| 16:11 | <zcorpan> | jgraham: have you documented <!-- and --> in web ecmascript yet? |
| 16:13 | <Philip`> | Hmm, I've seeded that dotnetdotcom index of web pages for almost 24 hours, and there's been zero connections |
| 16:13 | <Philip`> | Torrents aren't so useful for files that nobody wants to download anyway |
| 16:36 | <jgraham> | zcorpan: No, good point |
| 16:37 | <jgraham> | Philip`: I would quite like to download it but I'm not sure that it's a good idea over my crappy/bandwidth limited home connection |
| 16:38 | <Philip`> | jgraham: They provide plain HTTP download too, which might be more compatible with your connection |
| 16:39 | <Philip`> | and it's only 2.5GB, which is smaller than they claim |
| 16:41 | <Philip`> | And if you don't want all of it, you could just download the first N megabytes and discard the last entry |
| 16:41 | <Philip`> | where N is the largest value that still is considered a good idea to download |
| 20:46 | <jgraham> | Hmm, unless I am missing something, it looks like (almost?) all the cases of <!-- not followed by --> are in <script> blocks |
| 20:46 | <Philip`> | I looked at three, and saw one which wasn't |
| 20:46 | <Philip`> | but I don't know which one that was |
| 20:48 | <jgraham> | Yeah, my script is a bit buggy |
| 20:49 | <Philip`> | I hope mine wasn't |
| 20:49 | Philip` | wishes people would independently verify his data :-) |
| 20:49 | jgraham | would like to do that |
| 20:50 | <jgraham> | I will get some or more of that dotnetdotcom data at some point |
| 20:50 | <Philip`> | Also I hope their data isn't buggy |
| 20:51 | <jgraham> | That is, of course, quite possible |
| 20:51 | <Philip`> | I haven't seen any problems though |
| 20:52 | <Philip`> | except a few pages which seemingly return bogus HTTP responses that the HttpClient parser dies on |
| 20:52 | <Philip`> | but those could be legitimately broken servers |
| 20:53 | <Philip`> | I guess the main question is what direction their sample is most biased in |
| 20:54 | <Philip`> | (The sample of the web that they crawl, not the sample of their crawled data that they published (which they claim is uniform, and seemingly is restricted to 200s and text/html)) |
| 20:54 | <jgraham> | Philip`: How is your sample biased? html5lib is telling me that all the sites are in no-quirks mode which seems unreasonable |
| 20:55 | <jgraham> | (the small selection I looked at were compatible with that hypothesis but that was like 2 sites) |
| 20:57 | <Philip`> | jgraham: I didn't do any sampling myself |
| 20:57 | <gsnedders> | Is html5lib reliable? |
| 20:57 | <Philip`> | Looking at a few random pages from my list, www.articlear.com/profile/Jason-Uvios/994 looks quirky |
| 20:58 | <jgraham> | gsnedders: No |
| 20:59 | <jgraham> | This is a good way of finding bugs in it though :) |
| 23:18 | <Hixie> | hsivonen: do you know if the spec reflects what you want wrt foster parenting? |
| 23:18 | <Hixie> | i have a comment suggesting you want something reverted but it seems already reverted |
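
Note on the 11:55 regexp: the pattern Philip` posted, and the three cases jgraham listed at 11:59, can be checked with a short script. This is only a sketch. It assumes Python's `re` module behaves the same as whatever engine was used in the log for this pattern, and the names `UNCLOSED_COMMENT` and `has_unclosed_comment_with_gt` are made up for illustration. It says nothing about how the grep tool itself was implemented.

```python
import re

# Pattern copied from the 11:55 message. As noted at 12:01, "(?<!--)" is
# "(?<!" + "--" + ")": a negative lookbehind that rejects a position when
# the two characters before it are "--". It is unrelated to "<!--".
UNCLOSED_COMMENT = re.compile(r"<!--[^>]*(?<!--)>([^>]|(?<!--)>)*$")


def has_unclosed_comment_with_gt(text: str) -> bool:
    """Return True if text has a "<!--" followed by a ">" and no "-->"
    anywhere between that "<!--" and the end of the text."""
    return UNCLOSED_COMMENT.search(text) is not None


if __name__ == "__main__":
    # The three samples from the 11:59 message.
    for sample in ["<!-- foo > bar", "<!-- foo>", "<!-- foo > bar -->"]:
        print(repr(sample), has_unclosed_comment_with_gt(sample))
```

Without `re.MULTILINE`, `$` only matches at the end of the string (or just before a final newline), and `[^>]` also matches newlines, so the same pattern can be run over a whole page at once.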