00:17
<TabAtkins>
Phew, nearly done with DOM.
00:18
<TabAtkins>
Only at line 4500, but it moves extremely fast now that most links have been converted.
00:18
<TabAtkins>
Then there's the slogging task of fixing all the linking failures, which hopefully shouldn't be too long.
01:43
<strugee>
Hixie: ping
01:43
<Hixie>
yo
01:43
<strugee>
we were going to discuss developers.whatwg.org
01:44
<Hixie>
yesyes
01:44
<strugee>
ok
01:44
<strugee>
so
01:44
strugee
looks at the GitHub issue
01:47
<strugee>
hmm
01:47
<strugee>
so how complex is your script?
01:47
<strugee>
do you actually have to parse the entire document into a DOM-like structure, or can you parse individual parts?
01:49
<Hixie>
i parse the entire thing into a dom
01:49
<Hixie>
then do extensive manipulations on it
01:49
<Hixie>
then serialise it back out
01:50
<strugee>
right
01:50
<strugee>
but do you actually need to?
01:50
<Hixie>
what would constitute a need to?
01:50
<Hixie>
i mean, i dunno how to answer that question
01:51
<Hixie>
the code to do it is http://software.hixie.ch/applications/wattsi/src/wattsi.pas
01:51
<strugee>
e.g. to change the <title> you could find the beginning and end of <head> and parse that, then change <title>
01:51
<strugee>
so you wouldn't have to parse the entire document
01:51
<strugee>
ok, thanks for the link
01:51
<Hixie>
how do you "find the beginning and end of <head>" without parsing it?
01:52
<Hixie>
(the code above isn't quite complete, there's also some glue perl code that isn't published anywhere)
01:52
<Hixie>
(but the heavy lifting is in that file)
01:53
<strugee>
you just search for the keyword "<head>" in the string
01:53
<strugee>
obviously that doesn't work on arbitrary documents but since the source is in a known state I think we can get away with it
01:53
<strugee>
unless I'm doing something dumb?
01:54
<Hixie>
i mean, i don't guarantee that i won't change that to <head class="draft"> or something
01:55
<Hixie>
in particular, i don't see how that would work for the headings and cross-references
01:55
<Hixie>
which are all over the place
01:55
<strugee>
so use the keyword "<head" instead of "<head>"
01:55
<strugee>
ok, that was what I was asking
01:55
<Hixie>
i don't guarantee i won't spell it "<HEAD" or remove the tag and rely on it being implied
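[Editor's sketch, not part of the log.] The objection above can be shown with a few lines of Python. This is a minimal illustration, assuming made-up sample documents; the function name `find_head_literal` and the sample strings are hypothetical and do not come from the spec source or from wattsi.

```python
def find_head_literal(source: str) -> int:
    """Return the index of the literal text '<head>', or -1 if it is absent."""
    return source.find("<head>")


# Hypothetical inputs covering the variations Hixie lists in the log:
# an added attribute, an upper-case tag, and an omitted (implied) tag.
samples = {
    "plain": "<!DOCTYPE html><html><head><title>t</title></head><body></body></html>",
    "with_attribute": '<!DOCTYPE html><html><head class="draft"><title>t</title></head>',
    "uppercase": "<!DOCTYPE html><HTML><HEAD><title>t</title></HEAD>",
    "implied": "<!DOCTYPE html><title>t</title><p>body",
}

# Only the "plain" sample is found; every other variant returns -1,
# even though all four are documents with a head element once parsed.
results = {name: find_head_literal(text) for name, text in samples.items()}

if __name__ == "__main__":
    for name, index in results.items():
        print(name, index)
```

Loosening the search to "<head" would recover the attribute case only; the upper-case and implied-tag cases still fail, which is the reason a real parser is suggested later in the log.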
01:55
<strugee>
right
01:55
<Hixie>
who knows what i'll be doing 10 years from now :-)
01:56
strugee
chuckles
01:56
<strugee>
ok, scratch that idea :)
01:56
<strugee>
are you familiar with Node.js streams?
01:56
<strugee>
that was what I was trying to make it fit into
01:57
<Hixie>
i'm not very familiar with Node.js
01:57
<Hixie>
only done one project in it and it was a while ago
01:57
<strugee>
ok
01:58
<strugee>
the basic idea is that you do transforms on a stream of text instead of a string
01:58
<Hixie>
i mean i'm familiar with the idea of streams in general
01:58
<Hixie>
they're not new :-)
01:59
<strugee>
right :)
01:59
<Hixie>
i don't think you can use a one-pass approach with this though, since the cross-references go both ways
01:59
<strugee>
ok
01:59
<strugee>
makes sense
01:59
<Hixie>
i mean, at the most basic level, if you want to output the table of contents at the start, you need to have seen the last heading before you output the table of contents
02:00
<strugee>
yeah
02:01
<strugee>
I wonder if you could collect all the footnotes, headers, etc. in one pass and then write them in in a second
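[Editor's sketch, not part of the log.] The two-pass idea could look roughly like the following: the first pass reads the whole document and collects headings, and only then is the table of contents written ahead of the body. This assumes every heading carries an `id` attribute, and it uses Python's standard-library `html.parser`, which is a lenient tokenizer rather than a spec-compliant HTML parser. The names `HeadingCollector`, `build_toc`, and `render` are hypothetical and are not taken from wattsi.

```python
from html import escape
from html.parser import HTMLParser

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class HeadingCollector(HTMLParser):
    """Pass 1: record (level, id, text) for every heading in the document."""

    def __init__(self):
        super().__init__()
        self.headings = []
        self._current = None  # [level, id, text parts] while inside a heading

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_TAGS:
            self._current = [int(tag[1]), dict(attrs).get("id"), []]

    def handle_data(self, data):
        if self._current is not None:
            self._current[2].append(data)

    def handle_endtag(self, tag):
        if self._current is not None and tag == f"h{self._current[0]}":
            level, anchor, parts = self._current
            self.headings.append((level, anchor, "".join(parts).strip()))
            self._current = None


def build_toc(source: str) -> str:
    """Build a flat table of contents from every heading in `source`."""
    collector = HeadingCollector()
    collector.feed(source)
    collector.close()
    lines = ["<ol>"]
    for _level, anchor, text in collector.headings:
        href = escape(anchor or "", quote=True)
        lines.append(f'<li><a href="#{href}">{escape(text)}</a></li>')
    lines.append("</ol>")
    return "\n".join(lines)


def render(source: str) -> str:
    """Pass 2: emit the table of contents first, then the unchanged body."""
    return build_toc(source) + "\n" + source
```

Note that `render` cannot emit its first byte until `build_toc` has consumed the last heading, so the whole document is buffered either way; that is the constraint raised at 01:59.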
02:01
<strugee>
if worst comes to worst I can just buffer the document and parse
02:01
<Hixie>
well, html parsing as a general case is a non-streaming operation
02:02
<Hixie>
but you can probably assume that i don't rely on things like bogus table parsing and so skip those parts
02:02
<Hixie>
i guess
02:02
<Hixie>
writing an html parser using the spec's built-in model is a pretty large task as it is
02:03
<Hixie>
writing one in another model entirely is a whole extra level of pain
02:03
<Hixie>
since you have to be able to reason about all the implications
02:05
<strugee>
yeah, no kidding
02:08
<strugee>
hmm
02:09
<Hixie>
that's why i was saying you really want an off-the-shelf parser if you're not relying on the output of mine
02:10
<Hixie>
(though it has to be a pretty configurable one since i add a new void element)
02:10
<Hixie>
(and a bunch of attributes)
02:11
<Hixie>
the alternative is you tell me what the output should be and i just generate it
02:11
<strugee>
ok
02:12
<Hixie>
i mean, i already generate https://html.spec.whatwg.org/.wattsi-output/multipage-dev/
02:12
<Hixie>
which is pretty close, i'd wager
02:12
<Hixie>
that output would be very regular html
02:12
<Hixie>
which you could then massage more easily
02:13
<Hixie>
if you wanted to
02:15
<strugee>
ok
02:15
<strugee>
yeah, that would be nice
02:16
<strugee>
what's done to that version?
02:16
<strugee>
transform to HTML and strip non-dev content?
02:17
<strugee>
if so then I think we'd just need to insert styles and fix the title
02:17
<strugee>
or not even the title, since I think that's done already in the source
02:18
<Hixie>
what should the style link be?
02:20
<Hixie>
i've added a <link> to styles.css in the same directory
02:20
<Hixie>
want some <script src=""> too?
02:21
<Hixie>
and anything else?
02:21
<Hixie>
the old developers.w.o version had some different handling of references
02:21
<Hixie>
i'm happy to do something with that too
02:22
<strugee>
yeah, I think we'll need scripts
02:22
<strugee>
I can take care of minifying it into one file
02:22
<strugee>
so just a scripts.js in the root would be fine, I think
02:27
<strugee>
I'm getting 404 on https://html.spec.whatwg.org/.wattsi-output/multipage-dev/
02:28
<strugee>
and https://html.spec.whatwg.org/.wattsi-output/ shows a directory listing with nothing in it
02:29
<Hixie>
yeah the caniuse.json data is broken
02:29
<Hixie>
which is breaking my script
02:31
<strugee>
ok
02:32
<strugee>
so when that's fixed, stuff will magically appear in that directory?
02:32
<Hixie>
yeah
02:32
<Hixie>
well
02:32
<Hixie>
not when caniuse.json is fixed
02:32
<Hixie>
but when i work around it
02:32
<Hixie>
in a few minutes
02:32
<Hixie>
i tried deleting the file
02:32
<Hixie>
and my script was unhappy
02:33
<Hixie>
man my script really freaking wants real data in this file
02:33
<Hixie>
hm this file isn't broken, wtf
02:34
<Hixie>
maybe there's a bug in my json parser
02:35
<Hixie>
something fishy going on
02:35
<strugee>
ok
02:35
<strugee>
keep me posted
02:37
<Hixie>
wt_f_
02:37
<Hixie>
wget is screwing me
02:38
<Hixie>
$ wget -o /dev/null -O caniuse.json --no-check-certificate https://raw.githubusercontent.com/Fyrd/caniuse/master/data.json
02:38
<Hixie>
$ wget -o /dev/null -O caniuse2.json --no-check-certificate https://raw.githubusercontent.com/Fyrd/caniuse/master/data.json
02:38
<Hixie>
$ sha512sum caniuse*
02:38
<Hixie>
10acc62e0124195a1378b1ea79a91db4f13d861629afdcf76a928c2e7b359a890cdc0d5ee261d138aa4a86544b8b648a4cd44cdb610a7391fdddda861594b688 caniuse2.json
02:38
<Hixie>
805859fcdf2d05c06e3f35a0bc7f539d7dbe02d613c06dc4475592bd5e9204161a659573edfff96ae0121c00bbcd3fa89d1802fbd07507714e180c4ffbd673cc caniuse.json
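[Editor's sketch, not part of the log.] The same check as the `sha512sum` comparison above can be done with Python's standard `hashlib`. The file names come from the log; the helper names `sha512_of` and `same_content` are hypothetical.

```python
import hashlib


def sha512_of(path, chunk_size=65536):
    """Return the hex SHA-512 digest of a file, reading it in chunks."""
    digest = hashlib.sha512()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_content(path_a, path_b):
    """True when the two files hash to the same SHA-512 digest."""
    return sha512_of(path_a) == sha512_of(path_b)
```

For example, `same_content("caniuse.json", "caniuse2.json")` would be expected to return False for the two downloads shown above, since their digests differ.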
02:42
<strugee>
it couldn't be changing that fast, could it?
02:42
<Hixie>
no
02:42
<Hixie>
no idea what's up
02:42
<strugee>
let me try
02:44
<Hixie>
it worked once i nuked the existing file on disk a second time
02:44
<Hixie>
dunno what's up with THAT
02:44
<Hixie>
wget trying to do some sort of completion or something
02:44
<strugee>
_weird_
02:44
<strugee>
mine match, by the way
02:44
<strugee>
I did it with curl
02:44
<Hixie>
mine too, now. :-/
02:45
<Hixie>
anyway, look now
02:45
<strugee>
ok
02:47
<strugee>
ok, fantastic
02:47
<strugee>
by the way, what's link-fixup.js?
02:48
<strugee>
it's 404ing in my console when I load multipage-dev/
02:48
<Hixie>
handles breaking links
02:48
<Hixie>
try html.spec.whatwg.org/multipage/#the-canvas-element
02:49
<strugee>
s/multipage/multipage-dev/?
02:51
<Hixie>
in theory it works for the multipage-dev version too
02:51
<strugee>
nevermind
02:51
<Hixie>
but i was giving you an example of it working on the spec multipage version
02:51
<strugee>
no, I see what you were saying now
02:52
<caitp>
is there any reason why the single page version exists? you can't find anything on it anyway
02:52
<strugee>
are you talking about the submission thing at the bottom?
02:52
<caitp>
99/100 times, cmd+f yields nothing useful
02:53
<caitp>
not that trying to find anything in the multipage version is any better
02:53
<caitp>
you need like a dxr for the spec
02:54
<strugee>
or we should put the built version in Git or something so you can grep it
02:55
<caitp>
eh but then i'd have to keep a fork of it
02:55
<strugee>
*shrugs* set up some automation
02:57
<Hixie>
we have a copy in git, for the record :-)
02:58
<Hixie>
on github
02:58
<Hixie>
why does cmd+f not work for you? is your browser broken?
03:09
<strugee>
all right, laptop's dying
03:09
<strugee>
Hixie: it's been a pleasure
03:10
<strugee>
I'll ping you on GitHub or something if I need anything else
03:10
<Hixie>
np. keep me posted about what you want.
03:10
<strugee>
thanks!
03:10
<caitp__>
it's not that the browser is broken, it's that A) the document is too large, B) there are too many references to key phrases, C) many of the references don't lead anywhere useful, or lead to the references section, which in turn leads nowhere useful
03:10
<caitp__>
it's basically organized really badly
03:11
<caitp__>
a dxr would help a lot
03:11
<caitp__>
cross referencing the referenced specs, and not all being a client-side operation
03:11
<caitp__>
would make it a lot easier to work with
03:12
<Hixie>
if there are problems, file bugs with concrete suggestions
03:13
<Hixie>
cos "it's basically organized really badly" is really unactionable given that i've spent the last 11 years organising it as best as i can :-)
03:13
<Hixie>
more or less full-time :-)
03:13
<caitp__>
"dxr" is a pretty concrete suggestion, I think
03:13
<caitp__>
server-side spec searching and cross-referencing, that's pretty concrete :p
03:14
<Hixie>
i've no idea what you mean by this
03:14
<caitp__>
huk
03:15
<caitp__>
okay so the classical example would be, suppose you had to load the entire source tree of chromium into your browser, formatted
03:15
<caitp__>
that would be a pretty big document, right?
03:15
<caitp__>
searching it would be hard, expensive, and most of the time wouldn't turn up anything particularly useful
03:16
<caitp__>
the spec is pretty big, searching it frequently turns up nothing useful
03:17
<caitp__>
so it would be nice to provide better ways of doing this, and even caching entries for common searches
03:17
<Hixie>
what are you talking about. searching the spec works great, i do it all the time.
03:17
<Hixie>
plus every term is cross-referenced in both directions.
03:17
<caitp__>
mmmm not so much
03:17
<Hixie>
with a little popup that shows all the references
03:18
<caitp__>
even doing something as simple as looking up interfaces takes way longer than it should
03:18
<Hixie>
and i've watched chromium developers, they basically use "grep" which is more or less the same as find-in-page
03:18
<Hixie>
there's an interface index
03:18
<caitp__>
grep is faster, but again, means keeping a fork
03:19
<Hixie>
grep is the same speed as find-in-page
03:19
<caitp__>
no it really isn't
03:20
<Hixie>
are you using some ancient browser or something?
03:20
<caitp__>
no
03:21
<caitp__>
I mostly use canary on an 8 core machine, but it's not any better with stable, or safari 8, or FF nightly or stable
03:21
<caitp__>
it's a big document, it loads slowly, and referenced terms link nowhere useful, and unreferenced terms link nowhere
03:22
<caitp__>
it's not very usable for quickly looking something up
03:22
<caitp__>
and if I'm on a laptop, forget it =)
03:22
<Hixie>
for each "referenced terms link nowhere useful" case, file a bug.
03:22
<Hixie>
saying what you want it to link to.
03:23
<Hixie>
for "unreferenced terms" i've no idea what you mean
03:23
<caitp__>
they all lead to the references section, which means A) i lose my place which was very difficult to get to in the first place, and B) don't immediately link to anything useful, just a references section
03:23
<caitp__>
"unreferenced" eg, not in the references section, no popup, no hyperlink
03:23
<caitp__>
just text
03:23
<Hixie>
but for the original complaint, find in page... it's pretty much instantaneous for me. i don't understand the problem.
03:23
<Hixie>
how is grep faster?
03:24
<Hixie>
i've no idea what you mean by "they all lead to the references section"
03:24
<Hixie>
what is "all" here?
03:24
<Hixie>
i mean there's 4313 <dfn> elements in the document
03:24
<Hixie>
so quite obviously, not all the cross-references are to the references section
03:24
<Hixie>
in fact hardly any are
03:24
<caitp__>
I'm not sure what your language for it is, but a cross-referenced term is typically a link which opens a popup window, with links which lead to a references section at the bottom of the document
03:25
<caitp__>
and by "typically" I mean always, as far as I can tell
03:25
<Hixie>
can you give me an example?
03:25
<caitp__>
pretty much any time I find one, that's what happens :p
03:25
<Hixie>
ok find one
03:25
<Hixie>
tell me what it is
03:25
<Hixie>
i'm at a loss as to what you mean
03:25
<Hixie>
the only links into the references section i'm aware of are the [FOO] links
03:26
<Hixie>
which are, well, the links to the references section.
03:26
<Hixie>
and i don't think any of those do any popups or anything
03:26
<Hixie>
(there is an open bug on making them more helpful)
03:42
<caitp__>
i still prefer the dxr idea
03:51
<Hixie>
caitp__: can you give me an example of what you mean? what are these cross-references that aren't useful?
03:51
<caitp__>
i'm on my laptop, too much to open the single page site
03:52
<caitp__>
typical experience though, is looking up details on some attribute in an idl interface
03:52
<Hixie>
the multipage copy has the same links
03:52
<caitp__>
leads to a references link
03:52
<Hixie>
so feel free to give me the links from there
03:55
<MikeSmith>
I do searches in the single-page spec all the time but I do it from elinks
03:55
<MikeSmith>
or lynx
03:55
<MikeSmith>
but I guess that's not a general solution that a lot of other people would be happy with
03:57
<MikeSmith>
but I'll also agree that trying to load and use the single-page rendered version in a real browser is not practical
03:57
<Hixie>
i mean, it's the only way i use the spec
03:57
<Hixie>
and i do it all the time
04:16
<zewt>
i find the multipage one unusable and always use the single page one (broken backrefs are a dealbreaker)
04:16
<MikeSmith>
I use a 4-year-old laptop most of the time so I guess I should get a faster machine
04:22
<caitp__>
i guess everyone has different experiences with it
04:23
<caitp__>
online search + fast loading smaller pages would go a long way though, i think
04:24
<Hixie>
MikeSmith: i mean, the desktop i use for this is from 2010...
06:25
<zewt>
why am i still using chrome? now it's saying "foo.zip may harm your browsing experience, so Chrome has blocked it", and it literally won't let me save the file
06:26
<zewt>
pretty unbelievable
06:28
<caitp__>
it just wants you to have the best browsing experience possible and gets a bit carried away
06:28
<zewt>
by making me download this file twice (because it already downloaded all of it, it just won't save it)
06:29
<zewt>
at least half a gig isn't as much as it used to be
06:44
<KevinMarks_>
Do any implementations of audio or video tags support media fragments in sources?
07:00
<annevk>
TabAtkins: well, DOM wasn't built in a day
07:07
annevk
uses single-page without a problem too, find in page combined with backreferences is pretty much all I use...
07:37
<TabAtkins>
strugee: Please don't try and do text-based transforms on an HTML document. Just use a real HTML parser and deal with it that way. Python has html5lib; I'm unsure of what standards-compliant parsers exist in JS.
07:43
<annevk>
https://github.com/aredridel/html5 perhaps?
07:52
<Domenic>
Also parse5
10:30
<strugee>
TabAtkins: yeah, I'm off that train. Hixie talked me out of it
10:31
<strugee>
it wasn't that hard. I didn't hold out much hope that it could work anyway :)
18:05
<annevk>
Very hard not to 386 https://bugzilla.mozilla.org/show_bug.cgi?id=999544