01:12
<Lachy>
I managed to write a simple python script using html5lib to read the spec source and find all the element summaries. Tomorrow, I should be able to extend the script to generate element summaries for the authoring guide
01:13
<Hixie>
neat
01:13
<Hixie>
let me know if you want me to do a POST to a particular url every time i do a commit
01:13
<Hixie>
that way you could have your spec autoregenerate when i update the spec
01:13
<Lachy>
yeah, it's a lot better than I had been doing, which involved a lot more manual work. This should at least speed up progress a lot
01:15
<Lachy>
you check in so frequently compared with me, and most of your changes won't affect the element summaries, so that wouldn't be worth the effort
01:16
<Lachy>
I would like some way to know that element summaries have changed, though, if possible, so I know I should regenerate that section.
01:17
<Lachy>
what do you think about the template design I linked above?
01:17
<Lachy>
any suggestions for improvement?
01:39
<Hixie>
the DOM interface part is going to be huge for some elements
01:40
<Hixie>
i dunno
01:40
<Hixie>
are the categories and suchlike going to be links?
01:41
<Hixie>
maybe the attributes and IDL should both just be <dl>s with name/definition pairs
01:42
<karlcow>
Lachy: feel free to share it somewhere ;) I mean the script
01:42
<Lachy>
yeah, they will all be linked once anolis generates them
01:42
<Lachy>
karlcow, it will be checked into CVS once it's finished
01:42
<karlcow>
cool
01:43
<Lachy>
Hixie, do you mean <dl><dt>Attributes <dd>attr1<dd>attr2 ... <dt>DOM Interfaces <dd>prop1<dd>prop2 ...</dl>?
01:43
<Hixie>
no
01:43
<Lachy>
or do you mean each attribute <dt>Attr <dd>description of attr
01:43
<Hixie>
the latter
01:44
<Hixie>
i was about to type an example but you got there first :-)
01:45
<Lachy>
if each attribute was to have a description, I was considering using a table instead.
01:45
<Hixie>
that's fine too
01:45
<Lachy>
But then I'd need to write short summary descriptions for each attribute, though I wanted to avoid cluttering the summary boxes with too much information
01:46
<Lachy>
hmm, maybe something like the comments for each attr in the HTML4 DTD would be ok. e.g. http://www.w3.org/TR/html401/struct/links.html#edef-A
01:48
<Hixie>
just have the full descriptions there
01:49
<Hixie>
you'll probably have to split the attribute descriptions from the more meaty "how to use the element" text anyway
01:51
<Lachy>
maybe. it depends how much I need to write for each attribute description
01:51
<Lachy>
if it's a lot, then it's not really going to fit in the space too well
01:51
<Hixie>
on the web the space is infinite :-)
01:52
<Hixie>
i wouldn't worry too much about having the right template
01:52
<Hixie>
once you've got descriptions for a dozen elements or two, you'll have a much better idea of what you have to put in the template
01:54
<Lachy>
reload the template and take a look now
02:05
jwalden
wonders if the HTTP spec will ever sanction use of Set-Cookie, WWW-Authenticate, and Proxy-Authenticate as they are actually used
02:06
<Lachy>
wtf? Sam incorrectly defined the meaning of a strawman http://lists.w3.org/Archives/Public/public-html/2009Jan/0688.html and then Leif incorrectly pointed out something isn't a strawman, by either its real definition or Sam's http://lists.w3.org/Archives/Public/public-html/2009Feb/0000.html
02:06
jwalden
suspects not, based on http://trac.tools.ietf.org/wg/httpbis/trac/ticket/129
05:11
<Hixie>
Lachy: i was surprised to see sam's suggestion of actually having people call each other out; it'll be interesting to see how such an approach fares
05:11
<Hixie>
Lachy: the attribute side looks good; i still think the idl side should use a similar technique instead of webidl
05:26
<heycam>
hey Hixie i'm here for a bit
05:26
<Hixie>
hey
05:26
<Hixie>
two things
05:27
<Hixie>
1 - when you have a moment, i would appreciate your feedback on the WindowProxy section in the spec
05:27
<heycam>
k i'll have a look during the week
05:27
<Hixie>
(it's very short)
05:28
<Hixie>
2 - we have to say that the "this" keyword in JS at the global scope returns the WindowProxy object instead of the actual global object
05:28
<heycam>
uh oh
05:28
<Hixie>
i'm unsure whether it's best to say this in WebIDL, with the JS binding stuff, or if we should say it in HTML5, in a JS-specific section
05:28
<Hixie>
(or in ES 3.x, but they probably don't want a forward dependency on html5)
05:29
<heycam>
so it's a violation of es3?
05:29
<heycam>
i haven't looked much into the whole split window thing: is it that "this" at the global scope returns one object, while a different object is in the scope chain?
05:33
<heycam>
as to whether it's more appropriate in web idl or html5, dunno
05:33
<heycam>
guess i'll need to look at what exactly it entails
05:34
<Hixie>
yeah it's a violation of es 3.1
05:34
<Hixie>
yes, the top object in the scope chain is the global object (the Window object for that script's Document)
05:34
<Hixie>
and the .window, this, etc, attributes all return a WindowProxy object
05:37
<heycam>
so perhaps it needn't be a violation, it's just that evaluating the contents of a <script> is done with some extra futzing of this/scope chain, rather than being whatever a global code execution says in ES
05:38
<heycam>
in web idl, i don't say anything about what object is the global object, or what happens when a top level script is executed
05:38
<heycam>
i just say what properties exist on the global, and what their functionality is, etc.
05:38
<Hixie>
html5 mentions what the global object is, without mentioning its importance to js
05:39
<heycam>
would you want an explicit hook for html5 to define what's the global object?
05:39
<heycam>
(an explicit hook in web idl that is?)
05:39
<Hixie>
i'm ok with the current text, but i'm also very happy to make it explicit, sure
05:39
<Hixie>
if we do add such text, we can add text to make |this| different at the same time
05:40
<Hixie>
see a recent e-mail on whatwg about this for some background, btw
05:40
<Hixie>
from bz i think
05:45
<heycam>
so this behaviour is the same when executing a <script> as when, say, firing an event listener?
05:49
<heycam>
maybe web idl should have a definition for executing a script with a particular object in the scope chain?
05:50
<heycam>
i see the definition for "The script settings determined from the node" seems pretty generic
05:50
<heycam>
would you imagine this split window thing would be applicable to other languages?
05:50
heycam
wonders why some <dfn>s are italicised now
05:52
<heycam>
it'd be nice if whatwg⊙wo mails had an Archived-At header, like w3.org lists
05:52
<Hixie>
sorry, had a real life distraction
05:53
<Hixie>
yes, the behavior is the same for all scripts, javascript:, <script>, event handlers, whatever. Though event handlers already have an explicit definition of the scope chain somewhere in html5.
05:53
<heycam>
so the only diff is the replacement of the scope chain object with some other object (the "inner" window)
05:53
<Hixie>
do other languages have a global object, even?
05:53
heycam
shrugs
05:54
<heycam>
gotta go, bbl
05:54
<Hixie>
k
05:54
<Hixie>
later
05:54
<Hixie>
thanks
12:25
jgraham
has parts of html5lib limping along in python 3
12:26
<jgraham>
Like enough to do html5lib.parse(b"<html></html>") but not enough to deal with the possibility of passing in a file object with the encoding set but the wrong error handling mode
12:26
<jgraham>
Which I don't really know how to deal with apart from saying "don't do that"
12:28
<jgraham>
This is, of course a distraction and I should really be concentrating on the MathML + SVG stuff but it is quite a fun distraction
12:31
<takkaria>
it's interesting. when it came to data interchange, people first jumped on XML for it. now they're jumping on RDF-in-XML
12:31
<takkaria>
I wonder if there's going to be an abstraction over that at some point, too
13:34
<gsnedders>
Is it bad that Safari's lack of regex in it's find box is annoying
13:34
<gsnedders>
*its
13:34
<gsnedders>
?
13:44
<Lachy>
gsnedders, I don't know of any browser that supports regex searching
14:29
<gsnedders>
Does anyone have any data about how common different character sets are?
14:32
<karlcow>
Lachy: https://addons.mozilla.org/en-US/firefox/addon/6534 something like this maybe?
14:35
<Lachy>
karlcow, thanks
15:06
<hsivonen>
http://video.dld-conference.com/watch/dTMYg3z?t=dld09 at 20 minutes
15:11
<Philip`>
gsnedders: Do you mean something like http://philip.html5.org/data/charsets.html ?
15:11
<gsnedders>
Philip`: yes
15:11
<Philip`>
gsnedders: I don't think anyone has any data like that
15:12
<gsnedders>
Philip`: That data there should be enough to work from
15:12
<Philip`>
Oh, right, I have data like that - that was convenient
15:13
<gsnedders>
Philip`: :P
15:13
<gsnedders>
I'd be interested in a bigger less bias sample, though
15:13
<gsnedders>
(Seeming dmoz is bias towards English)
15:14
<Philip`>
I don't think the concept of "less bias" exists
15:14
<Philip`>
You can just choose the direction in which you want the bias to be
15:14
<Philip`>
Ooh, a blizzard
15:15
<Philip`>
(dmoz is definitely biased towards English, and I think it's also biased towards western European)
15:16
<Philip`>
(so it's really bad for e.g. Chinese sites)
15:17
<Philip`>
*.uk: 161279 pages
15:17
<Philip`>
*.de: 295359
15:17
<Philip`>
*.jp: 119322
15:17
<Philip`>
*.cn: 7582
15:17
<jcranmer>
.zh?
15:18
<Philip`>
*.zh: 36
15:19
<Philip`>
Whoops
15:19
<Philip`>
That's because '.' was a regexp
15:19
<Philip`>
If I make it \. then there's 0
15:19
<Philip`>
(The other numbers don't change much)
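The pitfall Philip` describes, an unescaped "." acting as a wildcard, can be reproduced with a minimal sketch. The host names below are invented for illustration and the patterns are only an assumption about the kind of match involved; the actual query run against the dmoz data is not shown in the log.

```python
import re

# Hypothetical host names, invented for illustration only.
hosts = ["example.zh", "www.bzh-news.com", "xzh.example.org", "example.cn"]

# Unescaped: "." matches any character, so "bzh" and "xzh" also match.
loose = [h for h in hosts if re.search(r".zh", h)]

# Escaped: "\." matches only a literal dot.
strict = [h for h in hosts if re.search(r"\.zh", h)]

print(loose)
print(strict)
```

When the pattern is built from a plain string, re.escape() is one way to avoid escaping by hand.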
15:20
<gsnedders>
Philip`: Yeah, that's my problem.
15:20
<gsnedders>
What's ZH?
15:20
<annevk>
hsivonen, interesting
15:21
myakura
thinks those *.jp sites are years old
15:21
<annevk>
hsivonen, didn't know he was CEO of CC
15:21
<annevk>
(that Flash player they are using is also really cool btw)
15:22
<Philip`>
gsnedders: Alexa's list might be better, since it's got ~20K for *.cn (and similar for *.jp and *.uk)
15:23
<gsnedders>
Philip`: But do you have data for that? :P
15:23
<Philip`>
gsnedders: (though it only lists sites rather than pages, so it's entirely biased towards the initial entry pages)
15:24
<Philip`>
gsnedders: No, but you could easily download the list and then download a few thousand .cn pages and search them for patterns :-)
15:31
<karlcow>
Philip`: there would be benefit of having separate sets for different countries. I had this discussion with blooberry about the mama too.
15:32
<karlcow>
The issue is how to create these sets
15:33
<Philip`>
karlcow: The problem is that it would encourage comparison between datasets, and comparisons are very dangerous because you're usually comparing the bias of the samples rather than real differences
15:33
<karlcow>
There is also all the invisible Web. The one behind passwd or intranets
15:34
<karlcow>
Philip`: yes indeed, you have to always give the context of your studies, plus a bit of error calculations.
15:34
<karlcow>
There is nothing *true* in sample surveys, just giving you hints but not certainty
15:35
karlcow
has suddenly a big wave of astrophysical studies and research coming back to his head
15:36
<blooberry>
With publicly available sets, the burden of determining the set is pretty much off the entity that is doing the analysis. Any other set requires a lot of extra legwork, both in getting and describing the additional URLs, examining its bias. Not to say that publicly available sets don't have bias, it is just that they are more easily documented.
15:37
<annevk>
can someone quickly tell me whether the character that e.g. &euml; represents can have multiple Unicode representations?
15:37
<annevk>
e.g. its own character and one using two characters that combine into one?
15:37
<Philip`>
karlcow: Error calculations don't seem especially helpful here - I can say "(4.56+/-0.07)% of pages in dmoz.org have property P" based on the sample size, but that says nothing meaningful about other data sets (e.g. dmoz.org yesterday or tomorrow, or some other selection of pages)
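A figure such as "(4.56+/-0.07)%" is the kind of result a binomial standard error gives. The sketch below assumes a simple random sample, and the counts are hypothetical, chosen only so the output has a similar shape; it is not a reconstruction of how the dmoz numbers were computed. As noted above, the error bar describes sampling noise within one data set and says nothing about how that set is biased.

```python
import math


def proportion_with_error(hits, sample_size):
    """Return (proportion, standard error) for a simple binomial sample."""
    p = hits / sample_size
    se = math.sqrt(p * (1 - p) / sample_size)
    return p, se


# Hypothetical counts, chosen only for illustration.
p, se = proportion_with_error(4560, 100000)
print(f"({100 * p:.2f} +/- {100 * se:.2f})%")
```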
15:39
<karlcow>
Philip`: again context… and calculation of confidence depending on your data set.
15:40
<gsnedders>
annevk: What is euml?
15:40
<annevk>
data:text/html,&euml;
15:40
<gsnedders>
annevk: Yes, it has at least two representations
15:40
<annevk>
thx
15:41
<karlcow>
U+00EB
15:41
<gsnedders>
karlcow: Yeah, I could find that. I don't however know Unicode codepoints off the top of my head. :)
15:42
karlcow
neither
15:48
<blooberry>
So, is there a way to use available URL sets that is both reproducible and has the ability to improve the set in ways that are statistically meaningful?
15:48
<karlcow>
ah the combining diaeresis is 0308
15:48
<karlcow>
and the diaeresis is 00A8
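The two representations discussed here can be checked with Python's standard unicodedata module. This is a minimal sketch using only the code points named in the conversation: the precomposed U+00EB, and "e" followed by the combining character U+0308.

```python
import unicodedata

precomposed = "\u00eb"   # single code point: e with diaeresis
decomposed = "e\u0308"   # "e" followed by the combining diaeresis

# The two strings render alike but are not equal code point for code point.
print(precomposed == decomposed)

# Normalization converts between the two forms.
nfc = unicodedata.normalize("NFC", decomposed)
nfd = unicodedata.normalize("NFD", precomposed)
print(nfc == precomposed)
print(nfd == decomposed)

# U+00A8 is the spacing (non-combining) character.
print(unicodedata.name("\u00a8"))
```

Comparing text that may arrive in either form is usually done after normalizing both sides to the same form.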
15:49
<karlcow>
blooberry: as a downloaded set or as an accessible set ?
15:49
karlcow
will have to explain that
15:51
<karlcow>
a URL set is "biological". It changes in time, because of the nature of the Web. So for example to compare the results of two different tools, we need a downloaded dated set of URIs and their content (and metadata going with it)
15:52
<blooberry>
karlcow: *nodding agreement*
15:53
<karlcow>
Then there is the quality of the set… and here the same kind of issues as in astrophysics when I was doing star fields research. At a point you have to select the stars, because if you pick up randomly, you get many interesting stats, but not really usable in a scientific way
15:54
<blooberry>
What if you balance it between selected and random?
15:54
<karlcow>
I guess the balanced set helps you to test the quality of the random
15:55
<karlcow>
s/balanced/selected/
16:12
<karlcow>
the selected set helps you to test hypotheses in a controlled environment. For example, new Web sites released by Web agencies (recent contracts) to see what the new professional practices are.
16:14
<karlcow>
on a Japanese only set and on a few years, we could also see for example if people are switching to utf-8 or not.
16:14
<karlcow>
without necessarily knowing if it's a local choice or a constraint of the tools to produce Web sites.
16:16
<karlcow>
blooberry: did you play with different user agent strings for the MAMA, faking the string to see the variability?
16:20
<blooberry>
That's been on my list of things to do. I played around with it once with the Alexa Top 1000 and had interesting results, but they were hard to interpret.
16:20
<blooberry>
I've never come up with the perfect way to compare the results between UA strings.
16:20
<blooberry>
I kept finding different things that were variable between versions.
16:24
<blooberry>
I pulled content using the latest UAs from Opera, FF, and MSIE, as well as the forthcoming Opera UA, as well as a UA called "foo" for comparison. 8-}
16:28
<karlcow>
blooberry: hehe. The Power of Foo! known by old monks in China for generations. They tackle issues on bad Web pages.
16:29
<blooberry>
yes, they are the experts in Kung Foo
16:30
<blooberry>
I found that I had to ignore both IMG references and hyperlink HREFs, because many sites randomize the arguments for them to aid tracking...it just makes the analysis job harder. ;-}
16:32
<blooberry>
At a certain point I was really wondering if what I was deleting was too crucial to throw away. I'm not sure of the answer to that.
20:59
<Lachy>
Looks like some people involved with HTML4All have started creating their own HTML spec http://html4all.org/mailman/archives/list_html4all.org/2009-February/001038.html
21:04
<gsnedders>
I'm just confused trying to read the wiki draft
21:08
Hixie
looks to see if they have any good ideas worth taking into html5
21:15
<Lachy>
my script to generate the element summaries for the authoring guide is working well, though it's not quite finished
21:16
<karlcow>
they called it "4.1"
21:16
<gsnedders>
5 > 4.1. EOF;
21:16
<gsnedders>
:P
21:16
<karlcow>
gsnedders: and you want candies with that?
21:17
<gsnedders>
karlcow: Nah, not really
21:17
<gsnedders>
karlcow: A minion that could write an English essay for me would be nice though
21:17
<karlcow>
or maybe an enhancer, because it seems there is an issue with the length of your thing
21:18
<Lachy>
gsnedders, do you know how I can use the html5lib serialiser to pretty print the output from my script?
21:19
<Lachy>
or at least print it in a way that doesn't insert new lines everywhere
21:19
<gsnedders>
Lachy: Not off the top of my head
21:19
<Lachy>
I'm using the example provided here http://code.google.com/p/html5lib/wiki/UserDocumentation#Serialization_of_Streams
21:20
<Lachy>
but it seems the print statement adds a LF after every line. Is there a way to stop that?
21:21
karlcow
wonders if they built it from scratch or if they took the source of html 5 and changed some of the things
21:21
<gsnedders>
karlcow: Parts are from html5 (run through tidy and converted to XHTML 1.0 Strict), others are from scratch, as far as I can tell
21:21
<Lachy>
ah, it works if I use: "print item," The comma must suppress the LF
21:22
<karlcow>
that gives now html5, html 4.1, html6, hmm what else
21:22
<karlcow>
gsnedders: ah. thanks
21:22
<gsnedders>
Lachy: You probably want to just write to file directly
21:23
<karlcow>
http://html4all.org/wiki/index.php/Inaugural_Members
21:23
<Lachy>
gsnedders, I figured if I write to standard output, then I can eventually just pipe it to other scripts that incorporate those summaries into the main document
21:23
<gsnedders>
Lachy: import sys
21:24
<gsnedders>
Lachy: sys.stdout.write(item)
21:24
gsnedders
shrugs
21:25
<karlcow>
hmmm mainly rob http://html4all.org/wiki/index.php/Special:Recentchanges
21:26
<karlcow>
at least Rob does all the wiki edits
21:27
<Lachy>
gsnedders, thanks
21:27
<Lachy>
Hixie, this is the output from my script so far http://lachy.id.au/temp/template.html
21:28
<gsnedders>
Because authors love IDL!
21:28
<Lachy>
gsnedders, I need to do something better with the IDL, eventually
21:28
<Hixie>
Lachy: looks familiar :-P
21:28
<Lachy>
I just have to figure out what and how. For now, it's easier to just stick it in there as-is as a placeholder
21:29
<Lachy>
Hixie, it should. My script just steals all the content from the spec and reformats it
21:29
<Hixie>
:-)
21:30
<karlcow>
http://code.google.com/intl/fr/search/#q=%22import%20html5lib%22
21:31
<karlcow>
http://search.koders.com/default.aspx?s=%22import+html5lib%22&btn=&la=%2A&li=%2A
21:32
<karlcow>
hmm beautifulsoup not a lot of results either http://search.koders.com/default.aspx?s=%22import+BeautifulSoup%22&btn=&la=%2A&li=%2A
21:33
<gsnedders>
It doesn't even include Anolis! :'(
21:48
<Lachy>
http://dev.w3.org/html5/html-author/#the-html-vocabulary-and-apis
21:52
<Lachy>
karlcow, as you requested yesterday, the script to generate the element summaries is now in CVS http://dev.w3.org/html5/html-author/utils/
21:52
<Lachy>
see elements.py and elements.template.html
21:54
<karlcow>
excellent
21:54
<karlcow>
lachy++
21:55
<Philip`>
Lachy: "print item," prints a space at the end, which probably isn't what you want
21:56
<Lachy>
Philip`, ok. I changed it to sys.stdout.write() as gsnedders suggested anyway
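The difference between the print statement and sys.stdout.write() can be sketched as follows. The chunk list is a hypothetical stand-in for whatever the serializer yields, and the code is written for Python 3, where the rough equivalent of the old "print item," form would be print(item, end="").

```python
import io
import sys


def write_chunks(chunks, out=None):
    """Write each chunk exactly as given, adding no space or newline."""
    if out is None:
        out = sys.stdout
    for chunk in chunks:
        out.write(chunk)


# Hypothetical serializer output, standing in for the real token stream.
chunks = ["<dl>", "<dt>Categories", "<dd>Flow content", "</dl>"]

# Capture into a buffer to show that nothing is inserted between chunks.
buffer = io.StringIO()
write_chunks(chunks, buffer)
result = buffer.getvalue()

# Writing to standard output keeps the script usable in a pipeline.
write_chunks(chunks)
sys.stdout.write("\n")
```

Because write() emits only what it is given, the output can be piped into another script without stray spaces or line feeds.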
21:57
<Philip`>
(Ooh, the ground is covered in snow, for the first time since ages and ages ago)
21:58
<Philip`>
karlcow: Your searches will miss people writing "from BeautifulSoup import ..."
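A search that catches both import styles could look like the sketch below. The pattern and the sample lines are assumptions made for illustration; they are not the query syntax of the code search sites linked above.

```python
import re

# Matches "import BeautifulSoup" and "from BeautifulSoup import ..."
# at the start of a line (leading whitespace allowed).
IMPORT_RE = re.compile(
    r"^\s*(?:import\s+BeautifulSoup\b|from\s+BeautifulSoup\s+import\b)",
    re.MULTILINE,
)

# Hypothetical source lines, invented for illustration.
samples = [
    "import BeautifulSoup",
    "from BeautifulSoup import BeautifulSoup",
    "import html5lib",
    "# BeautifulSoup is mentioned but not imported",
]

matches = [s for s in samples if IMPORT_RE.search(s)]
print(matches)
```

This still misses forms such as "import os, BeautifulSoup", so it gives a lower bound rather than an exact count.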
22:04
<Lachy>
http://dev.w3.org/html5/html-author/#the-html-vocabulary-and-apis now has category descriptions, and appropriate links from each element summary
22:06
<karlcow>
Philip`: indeed.
22:20
<gsnedders>
gsnedders top tip: remember the syrup when making flapjack.
22:21
<gsnedders>
It doesn't work when you forget it
22:47
<olliej>
yoyo all
23:28
<Lachy>
Hixie, what's so wrong with the datagrid API that now apparently needs to be rewritten?
23:30
<Hixie>
it's synchronous
23:30
<Hixie>
Alex Russell sent mail about it a few weeks ago
23:31
<Lachy>
ok