08:27
<yorick>
the html5 boolean attributes seem to be inconsistent with the html4 ones
08:28
<yorick>
html4: <option selected="selected">contents</option>
08:28
<yorick>
html5: <div draggable="true">contents</div>
08:34
<yorick>
also, is there a possibility to set DataTransfer on dragstart to insensitive, so it can be accessed when dragging over something?
08:41
<jgraham>
Philip`: Can you get data on things that start <!-- but with a > and no --> before the end of the document
08:41
jgraham
isn't quite sure how to express that as a regexp
09:11
<annevk5>
yorick, draggable is not a boolean attribute per HTML5
09:24
<yorick>
annevk5: then what is it?
09:36
<zcorpan>
yorick: an enumerated attribute
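To make the boolean/enumerated distinction concrete, here is a minimal Python sketch. It follows the current HTML Standard's rules, not anything stated in this log. For a boolean attribute such as selected, presence alone means true, and a conforming value is either empty or an ASCII case-insensitive match for the attribute's own name. draggable is instead an enumerated attribute with the keywords true and false, and a missing or unrecognized value falls back to the auto state. The helper names are hypothetical.

```python
import string
from typing import Optional

_ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)

def ascii_lower(s: str) -> str:
    # Lowercase only A-Z, as HTML's "ASCII case-insensitive" comparisons do.
    return s.translate(_ASCII_LOWER)

def is_conforming_boolean_value(attr_name: str, value: str) -> bool:
    # Boolean attribute (e.g. selected): presence alone means "true";
    # this only checks whether the written value is conforming.
    return value == "" or ascii_lower(value) == ascii_lower(attr_name)

def draggable_state(value: Optional[str]) -> str:
    # Enumerated attribute: known keywords map to states; a missing or
    # unrecognized value (including "") falls back to the "auto" state.
    if value is None:
        return "auto"
    keyword = ascii_lower(value)
    return keyword if keyword in ("true", "false") else "auto"

print(is_conforming_boolean_value("selected", "selected"))  # True
print(is_conforming_boolean_value("selected", "true"))      # False
print(draggable_state("true"), draggable_state(None), draggable_state("yes"))  # true auto auto
```

So selected="selected" and draggable="true" are not two spellings of one convention: the first is a boolean attribute, while the second is an enumerated attribute whose keyword happens to be "true".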
09:52
zcorpan
adds another entry to http://wiki.whatwg.org/wiki/HTML5_Presentations
11:37
<gsnedders>
Ergh…
11:37
gsnedders
doesn't want to sign up to another bug tracker to report a bug in Validator.nu
11:54
<Philip`>
jgraham: You mean something like, um, /<!--[^>]*(?<!-->)>([^>]|(?<!-->)>)*$/ perhaps?
11:55
<Philip`>
Whoops, not that one
11:55
<Philip`>
/<!--[^>]*(?<!--)>([^>]|(?<!--)>)*$/
11:58
<jgraham>
Philip`: Perhaps
11:59
<jgraham>
If that matches "<!-- foo > bar" and "<!-- foo>" but not "<!-- foo > bar -->"
12:00
<Philip`>
It does
12:00
Philip`
should take this opportunity to hook his grep tool up to his new set of pages...
12:00
<jgraham>
Ah, well that sounds like what I want then
12:01
Philip`
notes that "(?<!--)" confusingly has nothing to do with the string "<!--", it's just a negative lookbehind assertion on the string "--"
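A quick way to check Philip`'s pattern against jgraham's three cases is to run it through Python's re module, which supports fixed-width negative lookbehind. The pattern below is copied unchanged. The wrapper function name is hypothetical.

```python
import re

# Philip`'s pattern, unchanged. "(?<!--)" is a negative lookbehind: each ">"
# after it only matches if the two characters before it are not "--".
UNCLOSED_COMMENT_WITH_GT = re.compile(r"<!--[^>]*(?<!--)>([^>]|(?<!--)>)*$")

def has_unclosed_comment_with_gt(doc: str) -> bool:
    # Roughly: some "<!--" is followed by at least one ">", and no ">" from
    # there to the end of the document is preceded by "--". Because the
    # lookbehind also sees the "--" of "<!--" itself, "<!-->" counts as closed.
    return UNCLOSED_COMMENT_WITH_GT.search(doc) is not None

for s in ["<!-- foo > bar", "<!-- foo>", "<!-- foo > bar -->"]:
    print(repr(s), has_unclosed_comment_with_gt(s))  # True, True, False
```

Since [^>] also matches newlines, the pattern scans across line breaks to the end of the document, which is the "before the end of the document" condition jgraham asked for.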
12:02
<jgraham>
Ah, that makes a little more sense
12:02
<gsnedders>
And this is why you shouldn't use regex to parse HTML P
12:02
<gsnedders>
* :P
12:03
<jgraham>
gsnedders: No this is why should shouldn't have ultra-weird comment parsing
12:03
<gsnedders>
jgraham: It's saner than SGML.
12:03
<jgraham>
which requires lookahead
12:04
<jgraham>
s/should/you/
12:05
<gsnedders>
I don't. I have perfectly sane comment parsing, thank you very much.
12:06
<jgraham>
gsnedders: You use a sophisticated biological neural network to parse comments and it's not even that reliable. How can you describe that as sane?
12:06
<gsnedders>
jgraham: Through my own insaity.
12:06
<gsnedders>
*insanity.
12:10
<Philip`>
gsnedders: Parsing it's easy, it's just like /<!(-?>|--.*?-->)/
12:10
<Philip`>
s/it's/is/
12:11
<Philip`>
Uh
12:11
<Philip`>
gsnedders: Parsing it's easy, it's just like /<!(-?>|([^-]|--).*?-->)/
12:11
<Philip`>
s/it's/is/
12:11
<Philip`>
or something like that
12:12
<Philip`>
but anyway it's easy
12:12
<MikeSmith>
gsnedders: I'm looking at the class="" bug now
12:12
<Philip`>
The difficulty is trying to match things that are *not* matched by the normal state machine
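For illustration, here is Philip`'s second comment regex run over some sample input in Python. The pattern is his, unchanged. Passing re.DOTALL so that comments can span lines is an addition made for this sketch, and the helper name is hypothetical. As his "or something like that" suggests, it is only an approximation: for example, it does not match "<!-->", which the current HTML Standard tokenizer treats as an (abruptly closed) empty comment.

```python
import re
from typing import List

# Philip`'s second attempt: "<!" followed either by an optional "-" and ">"
# (e.g. "<!>", "<!->"), or by a non-dash character or "--" and then anything
# up to the first "-->". DOTALL (added here) lets "." match newlines.
COMMENT = re.compile(r"<!(-?>|([^-]|--).*?-->)", re.DOTALL)

def comments(doc: str) -> List[str]:
    # Return every non-overlapping match, scanning left to right.
    return [m.group(0) for m in COMMENT.finditer(doc)]

print(comments("a <!-- one --> b <!> c"))  # ['<!-- one -->', '<!>']
print(comments("<!-->"))                   # []
```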
12:24
<gsnedders>
How the hell do you get script to validate in HTML 4.01?
12:24
gsnedders
stabs SGML
12:27
<Philip`>
Use <script src>
12:27
<Philip`>
If you want inline scripts, use <script src="data:text/javascript,...">
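Philip`'s data: URI trick moves the script body out of the element's content and into an attribute value, so the SGML parser never has to treat it as script content. Below is a minimal Python sketch of building such a tag, assuming the only goal is passing HTML 4.01 validation. The helper name is hypothetical, percent-encoding with urllib.parse.quote is one choice among several, and the type attribute is added because the HTML 4.01 DTD makes it required on script.

```python
from urllib.parse import quote

def data_uri_script(js: str) -> str:
    # Percent-encode everything except unreserved characters, so the
    # attribute value contains no "<", "&" or '"' for the validator to trip on.
    src = "data:text/javascript," + quote(js, safe="")
    # HTML 4.01 declares type as #REQUIRED on <script>.
    return '<script type="text/javascript" src="' + src + '"></script>'

print(data_uri_script('if (a < b && c) alert("</p>");'))
```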
12:30
<Dashiva>
gsnedders: Just rephrase all your less-than tests :)
12:39
<MikeSmith>
from the HTML4 spec, I can't tell whether id and class are allowed to be empty or not
12:43
<Dashiva>
It says "must begin with a letter" for id
12:44
<Dashiva>
But it's taken from the SGML spec, so would probably have to look there
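The "must begin with a letter" wording Dashiva quotes comes from HTML 4.01's definition of ID and NAME tokens (section 6.2): a letter, followed by any number of letters, digits, hyphens, underscores, colons, and periods. Read literally, that requires at least one character, so an empty id fails. Here is a small Python sketch of that lexical check only; it ignores uniqueness and whatever the SGML declaration might add. The helper name is hypothetical.

```python
import re

# HTML 4.01 section 6.2: a letter, then any number of letters, digits,
# "-", "_", ":" or ".". The leading letter makes the empty string invalid.
ID_TOKEN = re.compile(r"[A-Za-z][A-Za-z0-9_:.-]*")

def is_html4_id_token(value: str) -> bool:
    return ID_TOKEN.fullmatch(value) is not None

print(is_html4_id_token("main"), is_html4_id_token(""))  # True False
```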
12:46
<MikeSmith>
hmm, the XHTML DTD defines the value of class as NMTOKENS
12:46
<gsnedders>
Yay! More undocumented differences between XHTML 1.0 and HTML 4.01!
12:48
<MikeSmith>
I think as far as the HTML4 and XHTML1 specs are concerned, the value of class can't be empty
12:48
<Dashiva>
Can't class be a list of zero class names?
12:49
<MikeSmith>
Dashiva: not as far as I can see, as far as XML goes
12:50
<MikeSmith>
http://www.w3.org/TR/REC-xml/#NT-Nmtoken
12:50
<MikeSmith>
Nmtokens ::= Nmtoken (#x20 Nmtoken)*
12:50
<MikeSmith>
Nmtoken ::= (NameChar)+
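MikeSmith's point follows directly from the two productions: Nmtoken needs at least one NameChar, so the empty string cannot match Nmtokens. Below is a Python sketch of the check, with two stated simplifications. NameChar is cut down to ASCII letters, digits, ".", "-", "_" and ":" (the real XML 1.0 production also allows many non-ASCII ranges), and the attribute-value normalization that XML 1.0 section 3.3.3 applies to non-CDATA types is done first, ignoring entity and character references. Because of that normalization, a whitespace-only value fails too. The helper names are hypothetical.

```python
import re

# Simplified NameChar (ASCII only); Nmtokens ::= Nmtoken (#x20 Nmtoken)*
NMTOKENS = re.compile(r"[A-Za-z0-9._:-]+(?: [A-Za-z0-9._:-]+)*")

def normalize_tokenized_value(raw: str) -> str:
    # XML 1.0 section 3.3.3 for non-CDATA types: whitespace characters become
    # spaces, leading/trailing spaces are dropped, and runs collapse to one.
    spaced = re.sub(r"[\t\n\r]", " ", raw)
    return re.sub(r" +", " ", spaced.strip(" "))

def is_valid_nmtokens(raw: str) -> bool:
    return NMTOKENS.fullmatch(normalize_tokenized_value(raw)) is not None

print(is_valid_nmtokens("foo bar"), is_valid_nmtokens(""))  # True False
```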
12:50
<MikeSmith>
and XHTML1 DTD says:
12:51
<Dashiva>
Fair enough. Where does it say it's NMTOKENS? xhtml1 says CDATA where I'm looking
12:51
<MikeSmith>
"class NMTOKENS #IMPLIED"
12:51
<MikeSmith>
I'm looking at a DTD on my local system
12:52
<MikeSmith>
XHTML Modularization
12:52
<gsnedders>
Well, it claims to be an XHTML 1.0 schema, not an XHTML Mod. one
12:53
<Dashiva>
What I found: http://www.w3.org/TR/xhtml1/dtds.html#dtdentry_xhtml1-strict.dtd_coreattrs
12:53
<Dashiva>
I guess it's obsolete
12:53
<MikeSmith>
Dashiva: yeah, that's what I'm looking at now
12:53
<MikeSmith>
well, it's right, even if it's obsolete
12:54
<MikeSmith>
and XHTML modularization is wrong
12:54
<MikeSmith>
I mean, in practice at least
12:54
<Dashiva>
YSOD on empty class attribute? :)
12:54
<MikeSmith>
YSOD?
12:55
<Dashiva>
Yellow screen of death
12:55
<MikeSmith>
heh
12:58
<MikeSmith>
the HTML 4.01 DTD used by the W3C validator says "class CDATA #IMPLIED"
12:59
<MikeSmith>
and all the XHTML1 DTDs it uses say the same
13:00
<MikeSmith>
but the XHTML 1.1 DTD it uses says "class NMTOKENS #IMPLIED"
13:00
<MikeSmith>
so those mooncalfs apparently redefined it in XHTML 1.1
13:01
<MikeSmith>
I think in validator.nu we should preserve that brokenness
13:02
<MikeSmith>
to learn people not to bother validating against XHTML 1.1
13:03
<MikeSmith>
but I see hsivonen had the foresight to not even include an XHTML1.1-checking option in validator.nu
13:04
<MikeSmith>
I wonder if they bothered to document this in the XHTML 1.1 spec
13:05
<MikeSmith>
nope
13:05
<MikeSmith>
http://www.w3.org/TR/xhtml11/changes.html#a_changes
13:14
<Philip`>
jgraham: With that search thing, how many results do you want?
13:18
<Philip`>
jgraham: Actually, you can have them all
13:19
<Philip`>
jgraham: http://philip.html5.org/data/comments-not-closed-but-with-a-gt-after-them.txt
13:22
<jgraham>
Philip`: Great, thanks
13:30
Philip`
hopes his regexp isn't wrong
13:38
jgraham
hasn't checked yet
13:39
<Philip`>
<!doctype html> seems to be one of those most popular HTML5 features
13:39
Philip`
sees it on 64 distinct domains
13:40
<Philip`>
like last.fm and pear.php.net and maps.google.com and edward.oconnor.cx and help.godaddy.com
13:41
<Philip`>
s/those most/the most/
14:22
<MikeSmith>
gsnedders: http://bugzilla.validator.nu/attachment.cgi?id=85
14:26
<MikeSmith>
hsivonen: ↑
14:30
<MikeSmith>
http://bugzilla.validator.nu/attachment.cgi?id=86
16:11
<zcorpan>
jgraham: have you documented <!-- and --> in web ecmascript yet?
16:13
<Philip`>
Hmm, I've seeded that dotnetdotcom index of web pages for almost 24 hours, and there's been zero connections
16:13
<Philip`>
Torrents aren't so useful for files that nobody wants to download anyway
16:36
<jgraham>
zcorpan: No, good point
16:37
<jgraham>
Philip`: I would quite like to download it but I'm not sure that it's a good idea over my crappy/bandwidth limited home connection
16:38
<Philip`>
jgraham: They provide plain HTTP download too, which might be more compatible with your connection
16:39
<Philip`>
and it's only 2.5GB, which is smaller than they claim
16:41
<Philip`>
And if you don't want all of it, you could just download the first N megabytes and discard the last entry
16:41
<Philip`>
where N is the largest value that still is considered a good idea to download
20:46
<jgraham>
Hmm, unless I am missing something, it looks like (almost?) all the cases of <!-- not followed by --> are in <script> blocks
20:46
<Philip`>
I looked at three, and saw one which wasn't
20:46
<Philip`>
but I don't know which one that was
20:48
<jgraham>
Yeah, my script is a bit buggy
20:49
<Philip`>
I hope mine wasn't
20:49
Philip`
wishes people would independently verify his data :-)
20:49
jgraham
would like to do that
20:50
<jgraham>
I will get some or more of that dotnetdotcom data at some point
20:50
<Philip`>
Also I hope their data isn't buggy
20:51
<jgraham>
That is, of course, quite possible
20:51
<Philip`>
I haven't seen any problems though
20:52
<Philip`>
except a few pages which seemingly return bogus HTTP responses that the HttpClient parser dies on
20:52
<Philip`>
but those could be legitimately broken servers
20:53
<Philip`>
I guess the main question is what direction their sample is most biased in
20:54
<Philip`>
(The sample of the web that they crawl, not the sample of their crawled data that they published (which they claim is uniform, and seemingly is restricted to 200s and text/html))
20:54
<jgraham>
Philip`: How is your sample biased? html5lib is telling me that all the sites are in no-quirks mode which seems unreasonable
20:55
<jgraham>
(the small selection I looked at were compatible with that hypothesis but that was like 2 sites)
20:57
<Philip`>
jgraham: I didn't do any sampling myself
20:57
<gsnedders>
Is html5lib reliable?
20:57
<Philip`>
Looking at a few random pages from my list, www.articlear.com/profile/Jason-Uvios/994 looks quirky
20:58
<jgraham>
gsnedders: No
20:59
<jgraham>
This is a good way of finding bugs in it though :)
23:18
<Hixie>
hsivonen: do you know if the spec reflects what you want wrt foster parenting?
23:18
<Hixie>
i have a comment suggesting you want something reverted but it seems already reverted