02:21
Philip`
wonders if there could be getImageData(sx, sy, sw, sh, hires) where hires=true makes it return the full-resolution backing bitmap with width!=sw, otherwise it scales the bitmap down and returns with width==sw
02:21
<Philip`>
so people who don't know or don't care can just do getImageData(sx, sy, sw, sh) and get what looks like sensible pixels, whereas people who really want to do high-quality filters can set hires and take more care
02:22
<Philip`>
(It's compatible with existing browsers and content, too...)
02:23
<othermaciej>
Philip`: putImageData would have to have the same flag
02:23
<othermaciej>
Philip`: then the get/put invariant would hold only in hires mode
02:24
<Philip`>
Yep
02:24
<Philip`>
so people who just want to assume canvas pixels are really pixels can do so, and they'll get some visual degradation if they start doing get/putImageData, but it'll do what they expect
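A toy illustration of the degradation Philip` and othermaciej describe. It assumes a 2x backing store and a grayscale bitmap stored as a list of rows (real ImageData is RGBA). `downscale` and `upscale` are hypothetical helpers standing in for what a non-hires getImageData/putImageData pair might do; they are not part of any canvas API. A hires get/put would copy the backing pixels unchanged, so the round trip would be exact.

```python
def downscale(bitmap, factor):
    """Box-filter a row-major grayscale bitmap by an integer factor.

    Assumes width and height are multiples of factor; each output pixel
    is the integer mean of a factor x factor block.
    """
    h, w = len(bitmap), len(bitmap[0])
    out = []
    for y in range(0, h, factor):
        row = []
        for x in range(0, w, factor):
            block = [bitmap[yy][xx]
                     for yy in range(y, y + factor)
                     for xx in range(x, x + factor)]
            row.append(sum(block) // len(block))
        out.append(row)
    return out


def upscale(bitmap, factor):
    """Nearest-neighbour upscale: each pixel becomes a factor x factor block."""
    return [[v for v in row for _ in range(factor)]
            for row in bitmap for _ in range(factor)]


backing = [[0, 255],
           [255, 0]]                   # 2x2 device pixels = 1 CSS pixel
low = downscale(backing, 2)            # what a low-res get would return
print(low)                             # [[127]]
print(upscale(low, 2) == backing)      # False: the checkerboard is lost
```

The low-res path still "does what they expect" at the CSS-pixel level; only the fine detail is lost.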
02:26
<Philip`>
(Most people aren't going to do anything with ImageData, so normal canvas code will look high-res and fine)
02:48
<othermaciej>
yeah
08:14
<annevk>
A hires flag could work. Or maybe we should keep that as an option for future extensions...
08:16
<hsivonen>
Roger Johansson's way of saying that I win makes me think that he thinks I said something so crazy that it isn't worth talking about.
08:17
<hsivonen>
http://www.456bereastreet.com/archive/200705/is_html_5_a_slippery_slope/#comment26
08:20
<annevk>
Hah, only entity declarations... Guess he didn't read the specification very carefully then
08:20
<annevk>
XML parsing can be very simple. You simply have to ignore the hard parts.
08:20
<annevk>
(At that point error handling comes almost for free.)
08:34
<mikeday>
heh
08:36
<othermaciej>
html parsing is also easy if you ignore the hard parts
08:37
<annevk>
yeah
08:37
<annevk>
HTML is easy to understand if you don't go down the rabbit hole
08:47
<mikeday>
For any X, X is easy if you ignore the hard parts of X.
08:49
<othermaciej>
for some X, it's all hard parts
08:52
<mikeday>
right, but ignoring them reduces it to the empty set, which sounds pretty easy :)
08:59
<mikeday>
Here is a question: in the US-ASCII encoding, is 0x7F a legal octet?
09:00
<othermaciej>
ASCII 0x7F is the control character DEL
09:01
<mikeday>
so basically any 7-bit code is fine, only high bit set is garbage
09:01
<othermaciej>
also known as ^?
09:02
<othermaciej>
0x00 - 0x1F and 0x7F are not printable but they are defined
09:03
<mikeday>
that's fine then
09:04
<mikeday>
presumably in Latin1, 0x80 to 0x9F are garbage and need to be turned to U+FFFD?
09:04
<mikeday>
or passed through as is for compatibility with Windows-1252? :/
09:08
<othermaciej>
web pages that claim to be latin-1 have to be processed as windows-1252
09:09
<othermaciej>
in fact, even in pages that claim to be unicode you have to treat the range 0x80-0x9f as the win-latin1 characters
09:09
<annevk>
yeah
09:09
<annevk>
and all character references in that range too
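A minimal sketch of the remapping othermaciej and annevk describe, assuming it is applied to text that has already been decoded. That way it covers pages mislabelled as Latin-1, C1 characters that arrive through a Unicode encoding, and numeric character references such as &#133; (0x85). It relies on Python's built-in cp1252 codec, which leaves 0x81, 0x8D, 0x8F, 0x90 and 0x9D undefined; the helper name `remap_c1` is hypothetical.

```python
def remap_c1(text: str) -> str:
    """Replace C1 controls U+0080-U+009F with their Windows-1252 meanings.

    Code points with no Windows-1252 mapping (0x81, 0x8D, 0x8F, 0x90,
    0x9D in Python's cp1252 codec) are left unchanged.
    """
    out = []
    for ch in text:
        cp = ord(ch)
        if 0x80 <= cp <= 0x9F:
            try:
                ch = bytes([cp]).decode("cp1252")
            except UnicodeDecodeError:
                pass  # undefined in Windows-1252: keep the control char
        out.append(ch)
    return "".join(out)


# Bytes labelled ISO-8859-1 but actually written as Windows-1252:
print(remap_c1(b"\x93hi\x94".decode("latin-1")))  # “hi”
```

This is the same mapping mikeday's Prince hack applies at the glyph layer (0x85 becomes U+2026), just done before anything else sees the text.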
09:10
<annevk>
HTML5 hasn't defined that yet
09:10
<othermaciej>
so yes, the unicode spec is universally violated on the web
09:10
<annevk>
it should just be redefined...
09:10
<annevk>
that would be nice, actually
09:10
<othermaciej>
Unicode5!
09:11
<othermaciej>
unfortunately, I think that would probably break non-web uses of unicode
09:11
<annevk>
this would also only apply to text/html I suppose?
09:11
<othermaciej>
(though maybe not, depending on how other Microsoft apps handle it)
09:11
<othermaciej>
in Safari I think we apply it to XML, but perhaps there is no need to
09:12
<othermaciej>
well, I guess actually someone should test it in XHR in IE
09:12
<othermaciej>
at least you can make a web browser without violating the TCP or IP standards
09:12
<othermaciej>
but I think you have to violate just about everything else in the protocol / format stack
09:15
<othermaciej>
actually, I don't know if the control characters are used for anything so it might not be incompatible in practice
09:15
<annevk>
interesting
09:15
<mikeday>
hmm, so even 0x80-0x9f encoded in UTF-8 need to be passed through, that's cute
09:16
<othermaciej>
mikeday: yes, or UTF-16, or UTF-32 -- it is deeply lame
09:16
<annevk>
we should just update several encoding / decoding standards :)
09:16
<mikeday>
in Prince we have a hack to handle mislabelled pages that claim to be Latin1 but are actually Windows-1252
09:17
<mikeday>
because fonts won't have glyphs for control codes, so we fallback to the Windows-1252 character instead
09:17
<mikeday>
eg. we can't find a glyph for 0x85, so we fallback to U+2026
09:17
<mikeday>
which is what 0x85 maps to in Windows-1252
09:17
<mikeday>
that way the hack fits in at the font / glyph layer rather than the UNICODE layer; dunno if that's any better though.
09:18
<annevk>
it would fail people doing scripting etc.
09:18
<othermaciej>
that depends - if you have JavaScript support, it won't be sufficient
09:18
<annevk>
or support for :contains()
09:18
<annevk>
it would also fail attribute selectors etc.
09:18
<othermaciej>
we do our hack at the character set decoding layer
09:19
<othermaciej>
(and in the tokenizer when decoding entities)
09:19
<annevk>
html5lib only has the tokenizer bit atm
09:21
<othermaciej>
I think the entity bit is done only by our html parser
09:22
<mikeday>
good points, especially :contains(), which we support
09:22
<mikeday>
none of the sites that screw up the encoding use :contains(), thankfully :)
09:22
annevk
points out that :contains() has been dropped
09:22
<annevk>
I believe the latest proposal is foo[#text^="X"]
09:23
<annevk>
and td[#column=2]
09:23
<annevk>
it might be that I'm mistaken though
09:23
<mikeday>
nice to drop it after we've had it implemented for years
09:23
<mikeday>
td[#column=2] { display: none }
09:23
<annevk>
the problem is that nobody else did
09:24
mikeday
grins
09:24
<annevk>
and I suppose it's significantly easier to support in a UA that doesn't do scripting
09:24
<mikeday>
right
09:24
<annevk>
(#column would be on the semantic level btw, not based on display:table-* crap)
09:25
<mikeday>
?
09:25
<mikeday>
td { display: table-cell } is how we determine what a table cell *is*
09:26
<annevk>
in this case you would need to have some knowledge about the HTML namespace aiui
09:26
<othermaciej>
is #column meant to be the more CSS-ish way to do <col> styling?
09:26
<mikeday>
oh, great.
09:26
<annevk>
othermaciej, yeah, think so
09:27
<annevk>
but so far I haven't seen much more than just that
09:27
<othermaciej>
but column position is determined by HTML markup only?
09:27
<annevk>
it was planned for Selectors 2 or something
09:27
<othermaciej>
I guess that makes sense, to make it work with CSS tables, you would need computed style to determine matching
09:27
<annevk>
othermaciej, iirc, yes
09:27
<annevk>
or any other table markup language you support for that matter
09:27
<othermaciej>
it is funny how often the issue that selectors can't depend on computed style comes up
09:28
<annevk>
:hover
09:28
<annevk>
and :bound-element come to mind
09:31
<othermaciej>
how does it relate to :hover?
09:34
<mikeday>
:hover { display: none } is rather confusing
09:35
<annevk>
for instance
09:37
<mikeday>
hmm, seems like the most annoying thing about parsing is buffer management
09:38
<annevk>
ah, that sounds like something we didn't have to worry about :)
09:38
<annevk>
will your parser have an open source license btw?
09:38
MikeSmith
waves to mikeday
09:38
<mikeday>
hi MikeSmith
09:38
<MikeSmith>
hei
09:39
<mikeday>
annevk, yes, that way I can steal code from you guys, and the rest of the world :)
09:39
mikeday
examines inputstream.py
09:40
<annevk>
well, in theory you can rip html5lib apart and exploit it in some given that the license is MIT
09:40
<annevk>
but, cool!
09:40
<annevk>
s/given/way, given/
09:40
<mikeday>
it would be nice if there were common open source libraries for HTML parsing, though
09:41
<mikeday>
it's something that many applications could benefit from
09:41
<annevk>
yes, so go for it! :)
09:41
<mikeday>
screen scrapers, search engines, unconventional browsers
09:41
<mikeday>
AI projects
09:41
<mikeday>
PDF formatters *cough*
09:41
<annevk>
having an open source HTML parser is vital for the web
09:42
<mikeday>
pity no one thought of that back in 1996 hey :)
09:42
<annevk>
having one separate from browsers is very nice
09:46
<gsnedders>
mikeday: Netscape did in '98 :P
09:47
<mikeday>
?
09:47
<mikeday>
you mean Mozilla? :)
09:47
<gsnedders>
mikeday: Netscape Communicator.
09:47
<mikeday>
did the HTML parser make it through the big rewrite?
09:48
<gsnedders>
mikeday: I don't think so
09:48
<gsnedders>
mikeday: the reason for the rewrite was the lack of quality of the source base, in part
09:48
<mikeday>
heh, gotta love that
09:48
<mikeday>
release something open source, to prompt people to develop a new thing from scratch
09:48
<gsnedders>
mikeday: arguably it was Moz though, as Mozilla was the codename for Netscape Navigator all along :P
09:49
<gsnedders>
mikeday: the decision to rewrite was made after open sourcing it. the plan when it was open sourced was to release a Netscape 5 based on that codebase
09:49
<mikeday>
yeah
09:49
<mikeday>
fair bit of work went into it before the decision to rewrite it, right?
09:50
<gsnedders>
mikeday: http://wp.netscape.com/newsref/pr/newsrelease558.html
09:51
<mikeday>
gawd, that's sad reading today
09:51
<mikeday>
"seed the market for Netcenter" indeed
09:51
<mikeday>
"unprecedented levels of innovation in the browser market"
09:51
<mikeday>
yep, the innovation was pretty unprecedented after 1998
09:52
<gsnedders>
mikeday: looking it up, it seems as if they spent a year just getting out all the parts that they didn't own the IP for
09:53
<gsnedders>
mikeday: shortly after they did that, it was scrapped
09:53
<mikeday>
whole thing reeks of desperation, but I guess it all worked out alright in the end
09:53
<mikeday>
and by 2008, the web will be slightly improved over the web of 1998 :)
09:53
<mikeday>
more Flash and less Java, anyway :)
09:53
<gsnedders>
it's arguable that Gecko is overly bloated, like Netscape before it
09:54
<gsnedders>
as far as OSS goes, WebKit and KHTML have _far_ smaller codebases, yet their standards compliance isn't lacking
09:55
<mikeday>
right
09:55
<gsnedders>
but Mozilla 2 is meant to work on cutting that down
09:55
<mikeday>
and the cycle continues
09:56
<gsnedders>
once there start to be browsers based on WebKit for other OSes, Gecko may simply be forced to become smaller and quicker to keep market share
09:56
<mikeday>
anyone in the mood for a "why are people still developing applications in C++" rant? :)
09:56
<gsnedders>
ergh. no.
09:56
<gsnedders>
:P
09:56
<mikeday>
good :)
09:58
<mikeday>
what is the relationship between Mozilla 2 and Firefox, exactly?
09:59
<gsnedders>
mikeday: "Mozilla" nowadays is mostly all the frameworks
09:59
<mikeday>
so Firefox 2 is using Mozilla 1.x?
10:00
<gsnedders>
yes
10:00
<gsnedders>
Firefox < 4 is using Mozilla 1.x
10:01
<gsnedders>
but Fx4 isn't expected till '09
10:01
<mikeday>
groovy
10:02
<gsnedders>
and the question is whether they can keep up while needing to change so much
10:02
<mikeday>
is Firefox migrating to be a JavaScript/XUL based application?
10:03
<gsnedders>
huh? surely it already is?
10:03
<mikeday>
I thought it still had some C++ components?
10:03
<gsnedders>
well, sure, but the UI is all XUL
10:04
<mikeday>
right
10:04
<gsnedders>
but that's what an XML User Interface Language should be used for :P
10:04
<mikeday>
I just thought the idea was to have more JavaScript, less C++
10:05
<gsnedders>
I'm not sure how much there is for Fx4 yet
10:05
<mikeday>
and less XPCOM
10:05
mikeday
shrugs
10:05
<gsnedders>
my understanding was more standard C++ and less XPCOM
10:05
<mikeday>
ah, like exceptions and RTTI?
10:06
<gsnedders>
no idea
10:06
<mikeday>
oh well :)
10:06
gsnedders
has zero knowledge of C++
10:06
<Philip`>
Is it just me, or are they totally crazy for planning to do automated refactoring of C++ with highly experimental tools?
10:06
<mikeday>
live fast, die young
10:06
<Philip`>
It might work, but I still think it's totally crazy
10:06
<gsnedders>
Philip`: probably yes. but I guess that's why they need so many branches in VCS :P
10:07
<mikeday>
might as well go all the way and write a C++ -> JavaScript converter
10:07
<Philip`>
I'm fairly sure they've been planning that too
10:07
<gsnedders>
there's JIT compiling for JS2… but I can't remember seeing anything about actually moving to JS
10:08
<Philip`>
(at least for things that don't need the performance and don't need access to non-JS-interfaced objects)
10:09
<Philip`>
http://wiki.mozilla.org/Mozilla_2 - "Semi-automated refactoring work: ... identification of C++ ripe for conversion to JS2" - hmm, maybe not automatically doing the actual conversion
10:10
<Philip`>
Oh, but: http://wiki.mozilla.org/Static_Analysis - "Identify C++ to convert to JS2... ... and translate it automatically. C++ candidate code uses only scriptable interfaces, strings, primitives."
10:11
<mikeday>
hmm, so it also depends on JavaScript 2, which is relatively new...
10:11
<mikeday>
all sounds rather risky :)
10:11
Philip`
wonders if it's sensible to be worried that JIT compiling won't exactly improve Mozilla's memory usage situation
10:12
<mikeday>
anyone remember that Java version of Netscape?
10:12
<gsnedders>
but the memory usage should go down in other parts by not having so much code, often reimplemented where OS libraries can be used
10:17
<Philip`>
http://wiki.mozilla.org/Grendel ?
10:19
<mikeday>
yeah, that's the one
10:20
<mikeday>
I guess some things do change
10:20
<mikeday>
no one in 1998 would have believed that the browser would be *implemented* in JavaScript
10:20
<mikeday>
and that Java would be used to implement *server* applications
10:20
<mikeday>
we're through the looking glass now :/
10:21
<Philip`>
Java doesn't seem to have the best track record of making desktop applications that aren't frustratingly slow and ugly
10:22
<mikeday>
indeed.
10:23
<Philip`>
(I think it's a really boring language but it seems to work well enough for other types of application)
10:24
<mikeday>
funny how Java is boring now
10:24
<mikeday>
when it was introduced it was hot! exciting! applets! coffee!
10:24
<mikeday>
first programming language with a major marketing campaign
10:25
<mikeday>
'scuse me
10:27
<Philip`>
C++ templates are 'interesting' and let you write compile-time Turing machines and parser generators and stuff; Java's generics look similar but merely add type restrictions and add zero extra power to the language, which is boring :-(
10:28
<Philip`>
I must admit I did make a web page with animated wobbling coloured text in a Java applet about ten years ago and thought it was great, though
10:38
<mpt>
Anyone remember Corel Office for Java?
10:38
<mpt>
That was awesome
10:38
<mpt>
Launch it, go make lunch, come back, and ooh, the splash screen's up already
10:39
<Philip`>
I think I managed to avoid ever hearing about it
10:39
<Philip`>
which was probably fortunate :-)
10:43
<Philip`>
http://www.somis.dundee.ac.uk/pub/corelindex.htm - I wonder if that version works
10:43
<Philip`>
(Don't have a Java-enabled browser to test with, though...)
10:45
Philip`
installs one
10:46
<Philip`>
"Loading Java Applet Failed" - aha, that must be the "Write once, run anywhere" feature
10:51
<Philip`>
http://www.acm.org/pubs/citations/journals/cacm/1999-42-10/p72-cusumano/ - oh, I didn't know Netscape even made their own Java VM
11:17
<mikeday>
what does charStack[-1] mean in Python? last item in charStack?
11:17
<Dashiva>
yes
11:18
<mikeday>
thanks :)
11:19
<Philip`>
unless charStack is a dict, in which case it's the entry with key -1
11:19
<Philip`>
but a dict called charStack would not seem like a sensible idea
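The two cases Dashiva and Philip` describe, side by side (`table` is just an illustrative name):

```python
# A list: negative indices count from the end, so -1 is the last item.
charStack = ["<", "a", "b"]
print(charStack[-1])  # b
print(charStack[-2])  # a

# A dict: -1 is an ordinary key with no positional meaning.
table = {-1: "minus one", 0: "zero"}
print(table[-1])      # minus one
```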
11:20
<mikeday>
it's a good test of a language in a way, how much of it can you understand if you don't know it
11:21
<Philip`>
By the way, I'd be quite interested in a fast (e.g. C(++)) version of html5lib since I've been complaining about how it takes ages to parse files in Python :-)
11:22
<mikeday>
does "be quite interested in" translate to "willing to help code a"? :)
11:23
<mikeday>
just out of interest, how long does it take to parse files with html5lib?
11:23
<Philip`>
I have approximately no time for the next three weeks, but I should be freer after then, so then I'd probably be willing and able :-)
11:24
<mikeday>
sounds good, as the task should take at least 3+n weeks
11:25
<annevk>
once you got something running it would be nice if you could host it online like on code.google.com or something
11:25
<annevk>
if I can find some time I'd be interested in learning some C and looking into it :)
11:26
<mikeday>
will do
11:26
<mikeday>
annevk, you read the whole file into a string, use regexp to do line conversion
11:26
<mikeday>
that's just cheating :)
11:27
<mikeday>
also, I think I found a typo: # Normalize new ipythonlines and null characters
11:27
<mikeday>
s/new ipythonlines/newlines/ ?
11:27
<annevk>
mikeday, htmlinputstream.py is not my job
11:28
<Philip`>
mikeday: For me parsing the HTML5 spec with html5lib, it takes 16 seconds normally, and 11 seconds with Pysco
11:28
<annevk>
jgraham, care to look at that?
11:28
<Philip`>
s/Pysco/Psyco/
11:28
<annevk>
I think in C it should take less than a second
11:29
<mikeday>
libxml2 HTML parser takes 0.3s
11:29
<Philip`>
Bonus points if it works as a chtml5lib module that you can plug in to Python to replace html5lib with zero effort :-)
11:30
<annevk>
mikeday, there you go :)
11:30
<mikeday>
the html5lib code is rather elegant
11:30
<mikeday>
it would be nice if it could be compiled efficiently rather than hand optimising it
11:32
<mikeday>
hmm, the inputstream has a queue of characters that have been read and then putback
11:32
<mikeday>
however, the queue will never have more than 1 character on it
11:33
<annevk>
sometimes it will
11:33
<annevk>
the way we use it anyway
11:33
<mikeday>
if charsUntil is called multiple times...
11:34
<annevk>
some code appends to the queue
11:34
<annevk>
if we're talking about the same thing
11:34
<mikeday>
oh, outside of inputstream
11:34
<mikeday>
and here's me thinking it was a private variable :)
11:35
<annevk>
searching for self.stream.queue in tokenizer.py should give you an idea
11:35
<mikeday>
yeah
11:36
<mikeday>
I was expecting some kind of putback method
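What the putback method mikeday expected might look like, as a hypothetical sketch. It is not html5lib's API (annevk notes the tokenizer appends to `self.stream.queue` directly). Here the pushed-back characters are private and come back out last-in, first-out, so a caller that reads ahead ungets characters in reverse order.

```python
class InputStream:
    """Hypothetical character stream with an explicit putback (unget) method."""

    def __init__(self, text):
        self._text = text
        self._pos = 0
        self._pushed = []  # characters handed back via unget(), used as a stack

    def char(self):
        """Return the next character, or None at end of input."""
        if self._pushed:
            return self._pushed.pop()
        if self._pos < len(self._text):
            c = self._text[self._pos]
            self._pos += 1
            return c
        return None

    def unget(self, c):
        """Push c back so that the next char() call returns it."""
        if c is not None:
            self._pushed.append(c)


s = InputStream("<!x")
s.char()                   # '<'
c1, c2 = s.char(), s.char()
s.unget(c2)                # give back the lookahead, last read first
s.unget(c1)
print(s.char(), s.char(), s.char())  # ! x None
```

Keeping the queue private means the invariant mikeday assumed (at most what the stream itself pushed back) is enforced by the class rather than by convention.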
11:36
<zcorpan>
annevk: i found out that you fail http://simon.html5.org/test/css/magic-body/overflow/004.htm too (updated the table)
11:37
<mikeday>
that's a clever test
11:41
<mikeday>
hmm, parsing is easier when you load the entire document into memory first
11:41
<mikeday>
but not every application is going to like that.
11:42
<Philip`>
Are you going to attempt to handle <script> and document.write, so the application can modify the input stream while it's being parsed?
11:42
<mikeday>
eventually.
11:42
<mikeday>
obviously full support would require a JavaScript interpreter
11:43
<mikeday>
but you could always start with a hack that just handles trivial document.write calls, for testing
11:43
<zcorpan>
spidermonkey
11:43
<mikeday>
not every app will care about JavaScript
11:43
mikeday
nods
11:43
<mikeday>
just make it an optional dependency
11:43
<Philip`>
The script interpreter could just be offloaded into the application that's using the HTML parser, with some way it can be notified of document modifications and then modify the input stream back again, rather than building any scripting into the parser library itself
11:44
<mikeday>
can <script> do DOM manipulation when the entire document is not yet loaded?
11:45
<mikeday>
(please say no)
11:45
<zcorpan>
yes
11:45
<zcorpan>
(sorry)
11:45
<Lachy>
if it couldn't, that would affect the ability to do incremental rendering
11:45
<mikeday>
oh dear :)
11:45
<annevk>
do you mean loaded or parsed?
11:46
<mikeday>
I mean parsed
11:46
<zcorpan>
still a yes
11:46
<mikeday>
ie. the document text has not yet finished loading
11:46
<Philip`>
You could just make it a parser for script-is-disabled UAs, since real web browsers are probably going to be writing their own new parsers anyway and not many other people need scripting inside HTML :-)
11:46
<annevk>
the moment </script> is emitted script is executed and affects the input stream
11:46
<mikeday>
what happens if it starts mucking about with the root element? is that well defined?
11:46
<Lachy>
unless there's a defer attribute
11:47
<annevk>
mikeday, I think everything should work ok...
11:47
<mikeday>
what if it writes out a <script> element, like a self-reproducing element?
11:47
<annevk>
although I'm not entirely sure what would happen if it just removed the entire tree...
11:48
zcorpan
tested this before but can't remember what happens
11:48
<mikeday>
suicide scrips
11:48
<mikeday>
ts
11:49
<zcorpan>
iirc the rest of the element that was removed will not be inserted into the document, but i'll test it again
11:49
<mikeday>
so you could have a long page, then a script at the bottom that deletes all preceding content, then a second totally different page
11:50
<Philip`>
data:text/html,<script>function f(){document.write('<script>' + f + ' f()<'+'/script>Ha ')};f()</script>
11:50
<Philip`>
In Firefox it's seemingly limited to 20 invocations
11:51
<Philip`>
In Opera it kind of loops forever
11:51
<mikeday>
lovely :)
11:52
<Philip`>
In IE6 it's limited to 4
11:52
<Philip`>
(or 5, depending on whether you count the initial one)
11:52
Philip`
wonders if that counts as a DOS bug in Opera
11:53
<annevk>
it's not like we freeze
11:53
<annevk>
I'll file a bug anyway to be on the safe side
11:54
<annevk>
btw, it seems that Firefox on my machine does way more invocations...
11:54
<Philip`>
Ah, that's true
11:54
<Philip`>
though it still eats ~5MB of memory a second
11:54
<zcorpan>
firefox doesn't add the rest of the element. opera does (and it will end up at the parent node relative to if the script didn't run). ie7 aborts and shows an error page...
11:54
<mikeday>
eventually will trigger the OOM killer and potentially kill X-Windows
11:55
<mikeday>
anyway, it's a beautiful document, makes the billion laughs attack look tame by comparison
11:56
<Philip`>
I'm not sure if I'm looking at the wrong numbers but it doesn't seem like Opera is freeing the half a gigabyte it used up while I had that page open...
11:56
<annevk>
seems to be fixed in more recent versions of Opera
11:56
<annevk>
they show a single "Ha"
11:56
<Philip`>
Ah, Firefox 2 goes up to 100
11:57
<Philip`>
Actually, it goes down from 100
11:57
<Philip`>
(data:text/html,<script>var i=1;function f(){document.write('<script>' + f + ' f()<'+'/script> '+(i++))};f()</script>)
11:58
<Philip`>
(FF3 goes down from 21, IE6 goes down from 6)
11:59
<Lachy>
you don't need to write out ' + f + ' each time, just <script>f()</script> is sufficient
11:59
<mikeday>
what's the trigger for cutting off execution, would you need to track how many script blocks were created by the original script block?
11:59
<mikeday>
a bit like tracking recursion depth, but for script elements...
11:59
<Philip`>
Oh, that's true... This way is more quine-like, though :-)
11:59
<zcorpan>
http://simon.html5.org/test/html/parsing/dom-mutations/
12:00
<mikeday>
if you just blindly added the document.write text to the input stream,
12:00
<mikeday>
I don't see how you'd keep track of who created it
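One possible answer to mikeday's question, as a sketch under simplifying assumptions: tag each chunk of document.write() output with the nesting depth of the script that wrote it, and refuse to run scripts found in chunks at or beyond a limit. Scripts here are hypothetical Python callables named by `<script>NAME</script>` markers, a marker split across chunks is not recognised, and no browser is claimed to work this way.

```python
import re
from collections import deque

# Hypothetical markup: <script>NAME</script>, where NAME looks up a Python
# callable returning the text that script would document.write().
SCRIPT_RE = re.compile(r"<script>(\w+)</script>")


def run_document(source, scripts, max_depth=20):
    """Expand scripts in source and return the resulting text.

    Each pending chunk carries the depth of the script that wrote it
    (0 for the original source). Scripts found in chunks at max_depth or
    deeper are dropped instead of run, which stops self-replicating pages.
    """
    pending = deque([(source, 0)])  # (text, depth), consumed from the left
    out = []
    while pending:
        text, depth = pending.popleft()
        m = SCRIPT_RE.search(text)
        if m is None:
            out.append(text)
            continue
        out.append(text[:m.start()])
        rest = text[m.end():]
        if rest:
            pending.appendleft((rest, depth))
        if depth < max_depth:
            written = scripts[m.group(1)]()
            if written:
                # Written text is inserted in front of the rest of the input.
                pending.appendleft((written, depth + 1))
    return "".join(out)


quine = {"f": lambda: "Ha <script>f</script>"}
print(run_document("<script>f</script>", quine, max_depth=3))  # prints "Ha " three times
```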
12:00
<Lachy>
this version loops 125 times in FF data:text/html,<script>var i=1;function f(){document.write(i++ + '<br><script>f()<\/script>')};f();</script>
12:01
<Lachy>
using the live DOM viewer, IE stops after 6
12:01
<zcorpan>
21 times for me in firefox3
12:02
<Lachy>
I'm testing FF2
12:04
<Philip`>
(Opera doesn't like the \ in that address)
12:04
<Lachy>
well, try %-encoding it then
12:05
Philip`
just used '+' instead
12:10
<mikeday>
hmm, be nice if C had coroutines, or python-style generators
12:11
<mikeday>
then if you run out of input halfway through parsing, you could just suspend and wait for more
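What mikeday is wishing for, shown with Python generators: a toy tokenizer written as straight-line code that simply suspends inside `get_char()` when the buffer is empty and resumes when the caller `send()`s the next chunk. It only splits text from `<...>` tags and is a hypothetical illustration, not html5lib's tokenizer.

```python
def tokenize(emit):
    """Toy generator tokenizer. send() it chunks of text, then None for EOF.

    Emits ("text", s) and ("tag", name) tuples through the emit callback.
    """
    chunk, pos, eof = "", 0, False

    def get_char():
        # Sub-generator: return the next character, or None at EOF,
        # suspending (yield) while the buffer is empty.
        nonlocal chunk, pos, eof
        while pos >= len(chunk) and not eof:
            data = yield
            if data is None:
                eof = True
            else:
                chunk, pos = data, 0
        if pos >= len(chunk):
            return None
        c = chunk[pos]
        pos += 1
        return c

    text = []
    while True:
        c = yield from get_char()
        if c is None:
            break
        if c != "<":
            text.append(c)
            continue
        if text:
            emit(("text", "".join(text)))
            text = []
        name = []
        while True:
            c = yield from get_char()
            if c is None or c == ">":
                break
            name.append(c)
        emit(("tag", "".join(name)))
    if text:
        emit(("text", "".join(text)))


tokens = []
t = tokenize(tokens.append)
next(t)                          # run until it first needs input
for piece in ["hel", "lo <b", "r>wor", "ld"]:
    t.send(piece)                # chunk boundaries fall mid-token
try:
    t.send(None)                 # end of input: the generator finishes
except StopIteration:
    pass
print(tokens)  # [('text', 'hello '), ('tag', 'br'), ('text', 'world')]
```

The local variables (`text`, `name`) survive each suspension for free, which is exactly what the C coroutine tricks discussed next cannot offer.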
12:13
<Philip`>
http://www.chiark.greenend.org.uk/~sgtatham/coroutines.html - not exactly elegant, unfortunately, but that's what you get for using C
12:14
<mikeday>
hah, already reading that page
12:14
<mikeday>
cuter than setjmp/longjmp,
12:14
<mikeday>
but you can't use local variables, and only works for one single function
12:15
<Philip`>
Bah, who needs locals?
12:15
<mikeday>
gcc labels as values would also work nicely
12:15
<mikeday>
and allow the use of switch statements as well I suppose
12:15
<mikeday>
at the cost of making the code gcc-specific
12:15
Philip`
doesn't like GCC-specific code
12:15
<mikeday>
right
12:15
<mikeday>
or just make the code block on read
12:16
<mikeday>
and let applications use multiple threads if they care so much
12:16
<mikeday>
or callback on read, so the application can handle it and abort
12:16
<Philip`>
(I'm not entirely sure why I don't like it, but I guess it's just because portable standard code seems nicer)
12:16
<mikeday>
that's what pretty much every other library does
12:17
<mikeday>
annoying for application developers, but "that's what you get for using C"! :)
12:18
<Philip`>
Could something similar to libpng work?
12:18
<mikeday>
libpng is callbacks
12:18
<mikeday>
you say read, it reads by calling your provided read function
12:19
<mikeday>
hmm, it uses setjmp/longjmp too
12:19
<mikeday>
but only for error handling
12:19
<Philip`>
Its setjmp/longjmp is a bit nasty when you want to use it in C++ :-(
12:19
<mikeday>
jpeglib uses callbacks, but I think has I/O suspension as well,
12:20
<mikeday>
so you can pause a read and go and do something else
12:20
<Philip`>
Do HTML parsers need to abort and return errors?
12:20
<mikeday>
I think error handling shouldn't require setjmp hackery
12:20
<mikeday>
libpng is a bit exceptional in that regard (hah!)
12:21
<mikeday>
I was just hoping to avoid I/O callbacks if possible
12:21
<mikeday>
as it's not polite for a library to take over the main loop in that way
12:22
<mikeday>
doesn't seem possible without some funky code though.
12:24
<Philip`>
Hmm, you'd want something more like "while (some external loop getting data from the network) { parser->heres_some_more_data(buf, size); while (! parser->needs_more_data() && ! ui_needs_to_be_more_responsive()) parser->do_some_parsing(); if (is_finished()) hooray(); }" ?
12:24
<mikeday>
right, something like that
12:24
<mikeday>
although I probably wouldn't update the UI in the same thread
12:25
<mikeday>
parser.open(), parser.write(), write, write, parser.close()
12:26
<mikeday>
requires the parser to be able to suspend itself rather carefully when it runs out of data though
12:26
<mikeday>
which complicates the parser code
12:26
<mikeday>
just as demonstrated on that coroutines page
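A toy tokenizer in the open/write/close shape mikeday describes, written without coroutines. Everything needed to stop at an arbitrary chunk boundary (the current state and the partial token) lives in instance fields, which is roughly the bookkeeping the coroutines page hides behind its macros. The class and method names are hypothetical, and the constructor stands in for `open()`.

```python
class PushTokenizer:
    """Hypothetical push-style tokenizer: write() chunks, then close().

    Emits ("text", s) and ("tag", name) tuples through the emit callback.
    """

    def __init__(self, emit):        # plays the role of parser.open()
        self.emit = emit
        self.state = "data"          # "data" or "tag"
        self.buf = []                # characters of the token in progress

    def write(self, chunk):
        for c in chunk:
            if self.state == "data":
                if c == "<":
                    self._flush_text()
                    self.state = "tag"
                else:
                    self.buf.append(c)
            elif c == ">":           # inside a tag
                self.emit(("tag", "".join(self.buf)))
                self.buf = []
                self.state = "data"
            else:
                self.buf.append(c)

    def close(self):
        if self.state == "tag":      # unterminated tag at end of input
            self.emit(("tag", "".join(self.buf)))
            self.buf = []
        else:
            self._flush_text()
        self.state = "data"

    def _flush_text(self):
        if self.buf:
            self.emit(("text", "".join(self.buf)))
            self.buf = []


tokens = []
p = PushTokenizer(tokens.append)
for piece in ["hel", "lo <b", "r>wor", "ld"]:
    p.write(piece)
p.close()
print(tokens)  # [('text', 'hello '), ('tag', 'br'), ('text', 'world')]
```

Every loop variable that would have been a local in straight-line code has to become a field, which is why this style gets unwieldy as the number of tokenizer states grows.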
12:27
<Philip`>
Could it be useful to copy libxml2's interface, so the HTML5 parser could just fit in as a new input source there and users use libxml2 like they normally do?
12:27
<mikeday>
I would rather not, to be honest, as libxml2 doesn't need to be any bigger
12:27
<mikeday>
or any more complex
12:28
<mikeday>
two different SAX interfaces, a DOM interface, the reader interface, etc.
12:28
Philip`
has never actually used libxml2
12:28
<mikeday>
and too many XML-specific assumptions, which make the current libxml2 HTML parser fit rather awkwardly
12:28
<Philip`>
though I have used Xerces-C, so I know what giant over-engineered XML libraries feel like
12:28
<mikeday>
it's great for what it does, but it's grown by accretion and now it can't throw anything away
12:28
<mikeday>
for example, I'd much rather that it didn't have its own HTTP implementation
12:29
<mikeday>
and just provided stubs for interfacing with curl, or whatever
12:29
<Philip`>
Okay, that sounds like convincing reasons to stay away from it :-)
12:29
<mikeday>
there is some good code in there for specific tasks though, that's worth learning from
12:30
<mikeday>
but an HTML-specific library could be slightly simpler, as well as implementing more of HTML.
12:30
<mikeday>
oh well, I'll have to stop procrastinating and get some of the ugly code written
12:31
<mikeday>
once you accept that it's going to be ugly, and stop trying to make it elegant, you get it done quicker.
12:35
<Philip`>
(Hmm, Xerces is a third of a million lines, and the .dll has half a megabyte just for the exported symbol names... All I wanted was a simple plain XML parser)
12:37
<mikeday>
libxml2 is < 200k lines of source
12:37
<mikeday>
about 220k including headers
12:38
<mikeday>
includes a fair bit of stuff though, XPath, some XSD schema and RELAX NG, etc.
12:39
<mikeday>
and HTML and DocBook parsers :/
12:41
<Philip`>
Maybe I should use TinyXml, since that's only six thousand lines...
12:42
<mikeday>
actually, libxml2 also can be configured to be smaller, eg. --without-xpath, --without-schemas
12:42
<mikeday>
handy when you need a small footprint
12:42
<mikeday>
(I quite like libxml2, generally speaking :)
12:47
<Philip`>
I think the project I'm working on actually has three XML parsers already, since it uses a couple of external libraries that come with expat and libxml2. Plus there's an XML-converted-to-some-fast-binary-format loader, since XML parsing was too slow. Actually there are two of those (since I wanted to load some game's different binary XML format too), and also two separate XML writers. But it all works, which is enough to keep me happy
12:48
<mikeday>
heh
12:48
<mikeday>
we end up linking to expat indirectly as well, because fontconfig uses it
12:49
<mikeday>
feels a bit strange using two different parsers, but memory is cheap.
12:49
<mikeday>
anyway, must go
12:49
mikeday
waves
12:49
<Philip`>
Bye :-)
14:25
<annevk>
Hixie, postMessage() is not the only member of Document with relaxed settings: open(), write(), writeln(), close(), location have that too
14:26
<annevk>
Hixie, as such, moving postMessage() doesn't make much sense to us
19:31
annevk
ponders about the text/xsl discussion
19:35
<JonT>
ponders?
19:36
<ddfreyne>
v., -dered, -der·ing, -ders. To weigh in the mind with thoroughness and care.
19:36
<ddfreyne>
but "/me weighs in the mind with thoroughness and care" sounds so much cooler than "/me ponders"
19:37
<JonT>
thanks ddfreyne
19:38
<Philip`>
"ponder, n. Obs. 1. A weight; spec. one used with a scale or balance. Hence in extended use: anything weighty, esp. a weighty attack or blow."
19:44
nickshanks
ponders Philip around the lug 'ole ;-)
19:58
<annevk>
lol
20:00
<annevk>
Hixie, see a bit further up btw about postMessage()
20:02
<Hixie>
as far as i can tell you are wrong
20:02
<Hixie>
you can't document.write() to another domain
20:02
<Hixie>
that would be all kinds of dangerous
20:04
<annevk>
They have a different policy from some of the other members, anyway
20:07
<Hixie>
not in, e.g., safari, at least
20:09
<annevk>
even document.location?
20:12
<Hixie>
yup
20:12
<Hixie>
(you can get to it via window.location)
20:16
<annevk>
so contentDocument.location throws but contentWindow.location doesn't?
20:16
<Hixie>
yah
20:16
<Hixie>
i believe so
20:16
<Hixie>
test it
20:16
<Hixie>
actually it's contentDocument that throws
20:16
<Hixie>
not contentDocument.location
20:18
<Hixie>
bbiab, work
20:18
<annevk>
I get an exception for both contentDocument.location and contentWindow.location ...
20:18
<Hixie>
ah maybe "location" isn't one of the safe ones; try history
20:18
<Hixie>
anyway
20:19
<Hixie>
really. on. my. way. to. work. really.
20:20
<annevk>
contentDocument doesn't throw in Opera
20:20
<annevk>
or Firefox
20:20
<annevk>
tested using
20:20
<annevk>
data:text/html,<iframe src=http://www.google.com></iframe>x<script>; alert(document.getElementsByTagName('iframe')[0].contentDocument) </script>
21:55
<Hixie>
Jeffrey Zeldman follows the WHATWG twitter feed
21:56
<hasather>
http://jeremiahgrossman.blogspot.com/2007/05/html-5-in-works.html
22:03
<Hixie>
right!
22:03
<Hixie>
lunch is eaten, e-mail is looked at, though not read
22:03
<Hixie>
time to do some editing!
22:03
<zcorpan_>
have fun :)
23:05
<Hixie>
wow.
23:05
<Hixie>
talk about a lack of interoperability
23:05
<Hixie>
http://www.hixie.ch/tests/adhoc/html/canvas/037.html
23:08
<Hixie>
i have three browsers and SIX different renderings
23:08
<Philip`>
And the spec gives a seventh rendering, and potentially an eighth if you interpret it differently ;-)
23:09
<Hixie>
the original spec gives a 7th
23:09
<Hixie>
and i recently changed it to an 8th, yes
23:09
zcorpan_
writes some test cases on <body bgcolor> parsing
23:09
<Hixie>
sweet kittens what a mess
23:10
<Hixie>
ok let's see
23:10
<Hixie>
opera is definitely buggy
23:10
<Hixie>
safari 2 is definitely buggy
23:10
<Philip`>
In the original spec I couldn't tell whether it was meant to draw a cone between offsets 0 and 1, or between -infinity and +infinity
23:11
<Hixie>
well it definitely has to go through the two circles you specify in the createRadialGradient method
23:11
<Hixie>
and opera and safari 2 don't do that
23:12
<Hixie>
nor does camino as far as i can tell
23:12
<Hixie>
i don't understand where camino gets its rendering from
23:12
<Philip`>
Oh, I noticed that problem in Opera but not in the cases I tried in Safari
23:12
<Hixie>
this leaves firefox 2 linux, minefield mac, and webkit trunk as my potentially correct renderings
23:12
<Philip`>
(Konqueror 3.8 is utterly broken with radial gradients too)
23:13
<Hixie>
minefield mac ignores the first stop so that's wrong...
23:13
<Hixie>
firefox 2 linux creates two cones, one truncated
23:13
<Hixie>
that's gotta be wrong
23:14
<Hixie>
so that leaves webkit trunk
23:14
<Hixie>
which creates an infinite cone
23:14
<Hixie>
let's test webkit trunk with a cone that ends finitely
23:15
<Hixie>
it works
23:15
<Hixie>
ok
23:15
<Hixie>
that wins
23:17
<Philip`>
"minefield mac ignores the first stop" - I see all four colours in 037 with Minefield on Linux
23:17
<Hixie>
yeah i expect the linux one has a less buggy cairo
23:18
<Philip`>
Ah, okay - I think they're all using the same Cairo version, but presumably different backend code within Cairo
23:19
<Hixie>
holy crap
23:19
<Hixie>
minefield on linux gives us a 7th rendering
23:30
<jruderman>
on mac, i see a green-blue gradient, and some magenta, and some horribly aliased borders. i don't see any black or yellow.
23:30
<Philip`>
Given that Cairo doesn't even bother to do this consistently between platforms or between versions, it seems to be a case that doesn't come up very often in practice, so nobody will mind what the output is as long as it's consistent
23:30
<jruderman>
(firefox trunk)
23:31
<Hixie>
jruderman: the existence of the green-blue gradient is the only constant in the results
23:31
<Hixie>
well, except safari 2, which just draws black
23:32
<jruderman>
hah
23:32
<Hixie>
(the test description is wrong)
23:32
<jruderman>
i'm surprised, i thought canvas was simple and already consistent across browsers
23:32
<jruderman>
oh
23:32
<Hixie>
jruderman: it's surprisingly consistent in the areas i wrote good spec tests for
23:33
<jruderman>
"Hixie has NFI how the test below should render."
23:33
<Philip`>
Oh, Safari 2 just doesn't do gradients in fillRect - you can do rect() fill() instead and it'll work more usefully
23:33
<jruderman>
Hixie: hehe
23:33
<Philip`>
(in case you want to test its actual gradient rendering)
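Philip`'s workaround, filling a rect() path instead of calling fillRect(), can be wrapped in a small helper. A sketch only: `fillRectViaPath` and the recording mock are hypothetical names, while beginPath(), rect() and fill() are the standard 2D-context methods.

```javascript
// Hypothetical workaround helper: fill a rectangle through the path API
// instead of fillRect(), for engines (Safari 2, per the log) that ignore
// gradient fillStyles in fillRect().
function fillRectViaPath(ctx, x, y, w, h) {
  ctx.beginPath();      // start a fresh path (discards any current path)
  ctx.rect(x, y, w, h); // add the rectangle as a closed subpath
  ctx.fill();           // paint it with the current fillStyle
}

// Tiny recording mock so the sketch can run outside a browser.
function makeRecordingContext() {
  const calls = [];
  return {
    calls,
    beginPath() { calls.push("beginPath"); },
    rect(x, y, w, h) { calls.push(`rect(${x},${y},${w},${h})`); },
    fill() { calls.push("fill"); },
  };
}

// In a browser: ctx.fillStyle = gradient; fillRectViaPath(ctx, 0, 0, 100, 50);
```

One difference from fillRect(): this replaces the context's current path with the rectangle, so call it before building any path you still need.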
23:33
<Hixie>
Philip`: yeah
23:33
<jruderman>
what's with the jagged diagonal borders?
23:34
<Hixie>
what's a good third dimension variable name other than t, r and z?
23:34
<Philip`>
jruderman: It's consistent in the obvious cases, until you try doing something non-trivial and find different bugs in every variable :-)
23:34
<Philip`>
Uh
23:34
<Philip`>
s/variable/browser/
23:35
<Philip`>
(I don't know where that word came from)
23:35
<Hixie>
probably came from you reading what i just wrote :-P
23:35
<jruderman>
Hixie: w?
23:35
<Hixie>
w might work
23:35
<Hixie>
i was thinking s
23:35
<jruderman>
Hixie: h, for height?
23:35
<Philip`>
alpha?
23:35
<Hixie>
but w is good
23:35
<Hixie>
alpha might work too
23:35
<Hixie>
alpha is better actually yeah
23:36
<jruderman>
Hixie: a, for altitude, and to confuse Philip` ?
23:36
<bewest>
d for depth?
23:36
<Hixie>
it's not a physical dimension
23:36
<Philip`>
Lowercase omega, to confuse jruderman?
23:36
<Hixie>
(it's a position along an imaginary line)
23:36
<Hixie>
ooo
23:36
<Hixie>
omega
23:36
<Hixie>
omega it is
23:36
<bewest>
heh
23:37
<Hixie>
(or "wibble" as we used to call it at university)
23:38
<Philip`>
I'm alright until people start using xi and zeta, and then I get totally confused
23:47
<zcorpan_>
http://simon.html5.org/test/html/parsing/color-attributes/001.htm
23:50
<zcorpan_>
can anyone test that in safari?
23:51
<Hixie>
hold on
23:51
<zcorpan_>
just see if there are any with #00ff00
23:51
<Hixie>
nope
23:51
<zcorpan_>
interesting
23:52
<Hixie>
<body>, <head>, <html> are blank; <isindex> is null
23:52
<Hixie>
on trunk
23:53
<Hixie>
safari 2 <noembed>, <noframes>, <nolayer>, <noscript> are also blank
23:53
<Hixie>
and <isindex> is not null
23:55
<Hixie>
ok, i have a new, "more mathematical" definition for radial gradients
23:55
<Hixie>
let's fix these tests to match
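The ω chosen above is the parameter of that definition. A sketch, assuming the wording the HTML canvas spec later used for createRadialGradient: circle ω has centre (x0 + ω(x1 - x0), y0 + ω(y1 - y0)) and radius r0 + ω(r1 - r0); circles are painted from ω near +infinity down to -infinity, only where the radius is positive, and each pixel keeps the first colour painted on it. The ω that colours a pixel is therefore the largest root, with positive radius, of a quadratic. `radialOmega` is a hypothetical name, and mapping ω to a colour-stop colour is left out.

```javascript
// Sketch: find the ω whose circle paints point (px, py), given the two
// circles passed to createRadialGradient(x0, y0, r0, x1, y1, r1).
// The point lies on circle ω when
//   (px - x0 - ω·dx)² + (py - y0 - ω·dy)² = (r0 + ω·dr)²,
// i.e. a·ω² + b·ω + c = 0 with the coefficients below.
function radialOmega(x0, y0, r0, x1, y1, r1, px, py) {
  const dx = x1 - x0, dy = y1 - y0, dr = r1 - r0;
  const qx = px - x0, qy = py - y0;
  const a = dx * dx + dy * dy - dr * dr;
  const b = -2 * (qx * dx + qy * dy + r0 * dr);
  const c = qx * qx + qy * qy - r0 * r0;

  let roots;
  if (a === 0) {
    // Linear case: centre distance equals radius difference
    // (internally tangent circles). b === 0 here is degenerate.
    roots = b === 0 ? [] : [-c / b];
  } else {
    const disc = b * b - 4 * a * c;
    if (disc < 0) return null; // no circle passes through the point
    const s = Math.sqrt(disc);
    roots = [(-b + s) / (2 * a), (-b - s) / (2 * a)];
  }

  // Larger ω is painted first and wins, so take the largest root
  // whose radius is positive; null means the pixel stays transparent.
  const valid = roots.filter((w) => r0 + w * dr > 0);
  return valid.length ? Math.max(...valid) : null;
}

// radialOmega(0, 0, 0, 0, 0, 10, 5, 0) → 0.5 (halfway out between concentric circles)
```

Roots below 0 or above 1 are kept, so the gradient extends into an infinite cone (the behaviour the log credits to WebKit trunk), with those pixels taking the end stop colours.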
23:56
<Philip`>
"more mathematical" = "more Greek letters"?
23:56
<zcorpan_>
what happens with http://simon.html5.org/test/html/parsing/color-attributes/002.htm ?
23:56
<Hixie>
Philip`: yup!
23:56
<Hixie>
zcorpan_: trunk: lime, xxff
23:56
<Hixie>
s2 same
23:57
<zcorpan_>
ah, so it's not handled in the parser like in other browsers
23:57
<zcorpan_>
ok
23:59
<zcorpan_>
hm, wonder what is most sane to do here. perhaps just the rendering section should say how to interpret bgcolor values or something
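zcorpan's suggestion is that the spec spell out how arbitrary bgcolor strings become colours. As a reference point, the core of the legacy colour-parsing rules that HTML later standardised can be sketched as below. This is a sketch, not the spec text: named colours such as lime are deliberately not handled, and `parseLegacyColour` is a hypothetical name.

```javascript
// Sketch of HTML's later "legacy colour value" rules for bgcolor-style
// attributes. Named colours are NOT handled. Returns [r, g, b] or null.
function parseLegacyColour(input) {
  if (input === "") return null;
  // Strip leading/trailing ASCII whitespace only.
  let s = input.replace(/^[\t\n\f\r ]+|[\t\n\f\r ]+$/g, "");
  if (/^transparent$/i.test(s)) return null;

  // Three-digit "#rgb" shorthand: each digit is scaled by 17 (0xf → 0xff).
  if (s.length === 4 && s[0] === "#" && /^[0-9a-fA-F]{3}$/.test(s.slice(1))) {
    return [...s.slice(1)].map((h) => parseInt(h, 16) * 17);
  }

  // Characters outside the BMP become "00"; then cap at 128 characters.
  s = Array.from(s, (ch) => (ch.codePointAt(0) > 0xffff ? "00" : ch)).join("");
  if (s.length > 128) s = s.slice(0, 128);
  if (s[0] === "#") s = s.slice(1);
  // Anything that is not a hex digit becomes "0".
  s = s.replace(/[^0-9a-fA-F]/g, "0");
  // Pad with "0" until the length is a non-zero multiple of three.
  while (s.length === 0 || s.length % 3 !== 0) s += "0";

  let len = s.length / 3;
  let parts = [s.slice(0, len), s.slice(len, 2 * len), s.slice(2 * len)];
  // Keep only the last 8 characters of each component.
  if (len > 8) {
    parts = parts.map((p) => p.slice(len - 8));
    len = 8;
  }
  // Drop leading zeros shared by all components while longer than 2.
  while (len > 2 && parts.every((p) => p[0] === "0")) {
    parts = parts.map((p) => p.slice(1));
    len -= 1;
  }
  // Keep only the first two characters of each component.
  if (len > 2) parts = parts.map((p) => p.slice(0, 2));
  return parts.map((p) => parseInt(p, 16));
}

// parseLegacyColour("xxff") → [0, 255, 0], i.e. #00ff00
```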