00:39
<tantek>
ok email to public-html sent. onto the next issue. yay.
01:39
<AryehGregor>
Okay, I seriously feel Google Docs has regressed in functionality sometime over the last few weeks. Yet another minor thing that drives me crazy:
01:40
<AryehGregor>
Yet another minor thing that's been driving me crazy: have one line that starts with an LTR word followed by one that starts with an RTL word. Go to start of line beginning with LTR word. Hit down arrow. The cursor is now visually at the beginning of the next line, but logically in the middle -- any new text you insert will be logically after the RTL word at the start of the line.
01:40
<AryehGregor>
Argh.
01:40
AryehGregor
is going to look for places to officially complain soon
01:41
<tantek>
now try combining that LTR / RTL mixing with text-overflow:ellipsis for extra good times.
02:44
<roc>
jamesr: Gecko handles non-BMP characters pretty well
02:44
<roc>
zewt: IVS too
02:45
<jamesr>
roc: in the native code?
02:46
<roc>
what do you mean "native code"?
02:46
<jamesr>
c++
02:46
<zewt>
yeah, the native part is easy, it's the user scripts part that's hard
02:46
<jamesr>
i'm interested in how you would handle this from javascript, if you do
02:46
<jamesr>
do you roll your own length / slice / etc functions?
02:47
<zewt>
well, usually you don't actually "need" the length (for definitions of "length" more complicated than "the number of codepoints")
02:47
<roc>
It's sort of like UTF-8
02:47
<roc>
a lot of the things you do with strings just work
02:47
<zewt>
it's exactly like utf-8 (except you need more data to handle it correctly, eg. knowing which codepoints are combining)
02:47
<zewt>
er
02:47
<zewt>
+ surrogates of course
02:47
<roc>
editing and selection are interesting
02:47
<zewt>
(sorry, multitasking too much)
02:47
<jamesr>
yeah - mapping selection ranges to strings and such can be very tricky
02:48
<jamesr>
or questions like "what's the first letter"
02:48
<zewt>
or just avoiding splitting surrogates in half
02:48
<zewt>
i haven't tried dealing with it myself either, so i don't really know what the practical issues are
02:48
<roc>
A lot of Web pages, even apps, should work fine with non-BMP chars
02:48
<zewt>
if code is simple, yeah
02:49
<zewt>
"accidentally" :)
02:49
<jamesr>
most of the time
02:49
<jamesr>
but something like this: https://developer.mozilla.org/en/JavaScript/Reference/Global_Objects/String/slice#Example:_Using_slice_to_create_a_new_string
02:49
<jamesr>
will fail if str1 has non-BMP chars at unexpected places
02:49
<roc>
depends on why you're using slice ... where the offsets came from
02:49
<jamesr>
by creating str2 with unmatched surrogate pairs
02:50
<jamesr>
right, so how do you get offsets that respect surrogate pairs? ecmascript doesn't provide any APIs for dealing
02:50
<roc>
if you searched for some substring and you slice at the start or end of the substring, you're probably OK
02:50
<zewt>
jamesr: well, technically combining characters also hit that too (even with utf-32), though that's a slightly lower level of breakage than splitting surrogates (the results are comprehensible, even if weird)
02:50
<jamesr>
are you checking the high bits of each char in JS?
02:50
<roc>
right, combining chars have basically the same set of problems
02:50
<jamesr>
let's say you wanted to truncate a long string if it's more than 30 letters long
02:50
<zewt>
jamesr: offsets should always be literal offsets into the string, not "number of codepoints after surrogate decoding"...
02:50
<zewt>
yeah, that's where it becomes trickier
02:50
<roc>
jamesr: use text-overflow:ellipsis!
02:51
jamesr
isn't fully confident that webkit will get that right
02:51
<roc>
I am very confident Gecko gets that right :-)
02:51
<roc>
it's basically just part of the clusterization problem
02:51
<jamesr>
we probably do, i'm not familiar with this part of wk
02:51
<roc>
that's not really a BMP-related issue at all
02:54
<zewt>
basically it's a bigger problem with utf-16 because if you get it wrong, you don't just end up with a weird string, you end up with a corrupt one
02:55
<zewt>
i think you can paste in mismatched surrogates into pages from other sources anyway, so it's not like it's the only way that can be introduced
02:57
<roc>
yes
02:57
<roc>
although strings that start with combining chars aren't technically illegal, they're a real pain to deal with :-)
02:58
<zewt>
more a pain for implementors than users, i think :P
02:58
<zewt>
as a user i'd just expect them to act as a non-combining character when there's nothing to combine with (in general)
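A minimal Python sketch of the truncation problem jamesr raises above. Python strings are sequences of code points, so the sketch models a JavaScript string as a list of UTF-16 code units. The helper names (`to_utf16_units`, `from_utf16_units`, `truncate_units`) are hypothetical and not from the log. It only avoids cutting a surrogate pair in half; it does nothing about combining characters, which zewt and roc note have much the same problem.

```python
def to_utf16_units(text):
    """Return the UTF-16 code units of text, the way a JavaScript string stores them."""
    raw = text.encode("utf-16-le")
    return [int.from_bytes(raw[i:i + 2], "little") for i in range(0, len(raw), 2)]


def from_utf16_units(units):
    """Rebuild a Python string from well-formed UTF-16 code units."""
    raw = b"".join(unit.to_bytes(2, "little") for unit in units)
    return raw.decode("utf-16-le")


def truncate_units(units, limit):
    """Keep at most `limit` code units without splitting a surrogate pair."""
    cut = min(limit, len(units))
    if 0 < cut < len(units):
        ends_on_high = 0xD800 <= units[cut - 1] <= 0xDBFF
        next_is_low = 0xDC00 <= units[cut] <= 0xDFFF
        if ends_on_high and next_is_low:
            cut -= 1  # back off so the pair is dropped whole
    return units[:cut]


# U+1F42D is outside the BMP, so it occupies two code units (a surrogate pair).
units = to_utf16_units("ab\U0001F42Dcd")
naive = units[:3]                 # ends on a lone high surrogate
safe = truncate_units(units, 3)   # backs off to two units
print([hex(u) for u in naive])
print(from_utf16_units(safe))
```

The naive slice mirrors what `slice` with an arbitrary offset does in JavaScript; the safe version trades a slightly shorter result for a well-formed string.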
03:11
<rniwa>
oh man... did I miss yet-another unicode goodness discussion?
03:11
<zewt>
several :)
03:12
rniwa
is glad he missed it
03:12
<rniwa>
I don't wanna know weird webkit unicode bugs
03:13
<zewt>
Your search - site:*.w3.org "ExtendedAttributeNoArgs" - did not match any documents.
03:13
<zewt>
"???"
03:15
<heycam>
zewt, problem in my grammar?
03:16
<zewt>
searching for instances of that string in specs other than webidl
03:16
<heycam>
should be ExtendedAttributeNoArg btw, no "s" on the end
03:16
<zewt>
(not familiar with webidl at all; was surprised that it didn't even find webidl itself)
03:16
<heycam>
but it's just a symbol in the grammar, i wouldn't expect other specs to use that word
03:16
<zewt>
http://dev.w3.org/2006/webapi/WebIDL/#idl-extended-attributes <- NoArgs
03:16
<zewt>
figuring that out is why i was searching for it :)
03:17
<heycam>
zewt, ah so it is. oh that's right, i think i renamed it. probably after the last TR publication. :)
03:17
<heycam>
it used to say something-or-other "takes no argument", but somebody found that wording a bit weird
03:18
<zewt>
google's cache is up to date, but for some reason it doesn't find that keyword
03:18
<heycam>
oh that is weird, because if I do site:dev.w3.org then it does find it
03:19
<zewt>
was trying to figure out if there's any way to mark up "flags" on functions (sort of like Python decorators), but that's not really a language binding issue, so
03:19
<zewt>
yeah, smells like a rare "provable google search bug" :)
03:20
<zewt>
site:dev.w3.org works, site:*.w3.org doesn't (which normally does)
03:21
<heycam>
(btw I tend to use just site:w3.org, which includes subdomains too)
06:22
<zcorpan>
woah. i thought importScripts() was same-origin, too. no idea how i could have missed that
06:33
<zcorpan>
wait, wait, wait, wait. people write DTD fragments when proposing a new element?
06:42
<MikeSmith_>
yeah, somebody should have waved that off before they sent it
07:57
<jgraham>
zcorpan: After thinking a bit I decided to be glad they didn't send in XSD for the new element
08:02
<zcorpan>
touche
08:06
<hsivonen>
using DTD fragment to propose new elements is a bit like implying that UTF-16 is OK when proposing solutions to the encoding problem
08:15
<zcorpan>
hmm, <hr> in <select> is supported in some browsers (on some OSes)? http://forums.whatwg.org/bb3/viewtopic.php?f=1&t=4948
08:20
<hsivonen>
zcorpan: could be a difference between WebKit's old parser and the HTML5 parser
08:21
<zcorpan>
seems like a nice feature to me
08:26
<zcorpan>
hsivonen: we have four ways to opt in to utf-8
08:27
<hsivonen>
zcorpan: do you count the two meta syntaxes separately?
08:28
<zcorpan>
yeah
08:43
<annevk>
oh yes
08:43
<annevk>
I "won" the media type parameter debate on EventSource
08:44
<annevk>
with data
08:51
<MikeSmith>
been very quiet so far this week
08:52
<annevk>
MikeSmith: you have been?
08:53
<MikeSmith>
lists
08:53
<MikeSmith>
but now I remember I've got about 800 unread messages in my inbox
08:53
<annevk>
ah, was just about to ask what you were plotting and scheming
08:53
<MikeSmith>
I've been in Seoul
08:54
<MikeSmith>
though I did manage to get some plotting and scheming in while here
08:55
<annevk>
even now that we've closed account registration the wiki still gets daily spam
09:01
<MikeSmith>
annevk: visited the opera korea office
09:01
<MikeSmith>
and incidentally also finally figured out how to get my mutt to display some korean e-mail messages it wouldn't before
09:03
<MikeSmith>
mostly
09:04
<annevk>
sounds like a good time :)
09:04
<MikeSmith>
heh
09:05
<MikeSmith>
mail UAs here use ks_c_5601-1987 for some reason
09:05
<MikeSmith>
which as far as I can see is effectively the same as euc-kr
09:06
<MikeSmith>
at least all I ended up needing to do for mutt was to alias ks_c_5601-1987 to euc-kr
09:07
<jgraham>
Is plotting and scheming when you draw graphs in lisp?
09:14
<MikeSmith>
heh
09:15
<MikeSmith>
chrome WebRequest API is interesting
09:16
<MikeSmith>
annevk: "Apple seems to be working on bringing Web Notification support to Safari"
09:17
<MikeSmith>
http://peter.sh/2011/12/reverse-flexible-rows-and-columns-socket-api-and-panels/
09:26
<annevk>
MikeSmith: read that, sounds cool
09:27
<annevk>
MikeSmith: hopefully it will get jgraham to do some work on the spec o_O
09:29
<jgraham>
I know, I know
09:48
<MikeSmith>
annevk: "No blank line after the signature." is ambiguous and kind of confusing
09:48
<MikeSmith>
to me at least
09:49
<MikeSmith>
I first took it to mean, "A blank line after the signature is not allowed."
09:49
<MikeSmith>
when I tried typing in text in that first line
09:53
<annevk>
I'm not that great with error messages I'm afraid
11:52
<gsnedders>
Why am I unsurprised at olliej arguing like crazy against multi-vm bindings in WebKit?
11:52
<gsnedders>
(well, multi-language-vm bindings)
11:54
<annevk>
ohunt is great
12:05
<hsivonen>
gsnedders: whoa. I didn't read the thread carefully enough. I thought it was multi-vm bindings. is V8 becoming a bi-language VM?
12:06
<gsnedders>
hsivonen: No.
12:06
<jgraham>
Which thread?
12:06
<gsnedders>
hsivonen: My point is WebKit already has multi-VM bindings: JSC and V8
12:06
<gsnedders>
hsivonen: Multiple VMs for the same language, yes, but multiple VMs.
12:07
<gsnedders>
https://lists.webkit.org/pipermail/webkit-dev/2011-December/018775.html
12:10
<smaug____>
Gecko has been actively trying to get rid of its multi-vm support
12:12
<hsivonen>
annevk: do you have a test suite for responseType == "json" already?
12:12
<gsnedders>
Presto supported futhark and Carakan for a while concurrently, though that was little work.
12:13
<jgraham>
So, is it me or is that thread a relic of a poor VCS? I mean there are obviously technical reasons why multi-VM is bad for the web, but the idea of having to ask to create a branch seems weird
12:14
<gsnedders>
jgraham: They want control over what branches they have in the official repo
12:14
<annevk>
hsivonen: no
12:14
<gsnedders>
jgraham: There is no objection to creating a branch elsewhere
12:14
<hsivonen>
jgraham: they are using SVN, so of course they have a poor VCS :-)
12:14
<jgraham>
hsivonen: I know
12:15
<karlcow>
http://lucumr.pocoo.org/2011/12/7/thoughts-on-python3/
12:15
<gsnedders>
(and there is an official git clone, and I think the recommendation is basically to create a clone of that)
12:15
<karlcow>
Because as it stands, Python 3 is the XHTML of the programming language world. It's incompatible to what it tries to replace but does not offer much besides being more “correct”.
12:16
<jgraham>
karlcow: I entirely disagree
12:16
<karlcow>
that was a quick answer to a looooong blog post
12:19
<jgraham>
karlcow: Not to the blogpost, to you
12:20
<jgraham>
Which was one sentence from the blogpost
12:20
<hsivonen>
jgraham: what karlcow said is a quote from the post
12:20
<hsivonen>
nevermind
12:20
<jgraham>
Right, I know that after reading the post :)
12:20
<karlcow>
hsivonen: poork markup from me
12:20
<jgraham>
Right, some quotation marks would have helped
12:20
<jgraham>
(the conclusion of the blogpost is actually very reasonable)
12:20
<hsivonen>
I haven't really had a good look at Python 3, but from a distance it sure looks XHTML2-ish
12:22
Philip`
thought it was obvious that karlcow was quoting, because the sentence started with a capital letter :-p
12:23
<karlcow>
Philip`: and there was no sex-related comments, no approximate English or silly poetic license
12:23
<karlcow>
which I just achieved in that last sentence
12:52
<jgraham>
"LLVM is turning into a real
12:52
<jgraham>
option for the web."
12:54
<jgraham>
Maybe someone should read http://lists.cs.uiuc.edu/pipermail/llvmdev/2011-October/043719.html
13:09
<annevk>
hsivonen: warning for non-labeled content we should definitely do for HTML I think
13:10
<annevk>
hsivonen: not sure whether I paid attention at the time, but if Hixie still feels that way I would disagree with him now
13:21
<jgraham>
Is it me or is roc's blog dead?
13:23
<MikeSmith>
jgraham: doesn't work for me either
13:24
<MikeSmith>
"The blog you are looking for could not be found."
16:00
<zewt>
hmm, this is fairly bizarre
16:01
<zewt>
ff8, in zh-TW Windows (CP950), loading zh-TW HTML with no @lang, is defaulting to a Japanese font
16:01
<zewt>
it only uses a zh-TW font if I explicitly set @lang, or use Big5 instead of UTF-8
16:02
<hsivonen>
zewt: is CP950 an exclusively Traditional Chinese encoding?
16:02
<zewt>
it's windows's equivalent of Big5, which is what it picks if you select "Chinese (Taiwan)"
16:03
<zewt>
it has a separate encoding for simplified chinese
16:03
<hsivonen>
zewt: so is the page using CP950 or UTF-8?
16:03
<zewt>
but it's even more bizarre to pick a *japanese* font, I'd be less surprised if it mixed up zh-TW and zh-CN
16:03
<zewt>
hsivonen: http://zewt.org/~glenn/test-zh-TW-utf-8.html
16:03
<hsivonen>
zewt: absent @lang, Unihan in Firefox defaults to Japanese
16:04
<zewt>
that's very surprising
16:04
<hsivonen>
zewt: boo. that page doesn't declare its encoding
16:04
<zewt>
(not to say a bad thing, not depending on the locale like charsets do is a plus, just not what I'd ever expect)
16:04
<hsivonen>
zewt: having Web content behavior depend on browser locale is evil
16:04
<zewt>
it does, or it did a minute ago
16:05
<zewt>
but now apache is being stupid, apparently
16:05
<zewt>
hsivonen: yes, but a very common evil
16:06
<zewt>
gave apache a kick and my content-type header is back
16:07
hsivonen
mumbles about apache not updating etag when headers change
16:07
<zewt>
i suppose that so long as in practice, every chinese system also has japanese fonts installed, then that's okay
16:07
<zewt>
which is the case in Windows, i believe (all asian fonts are installed as a unit)
16:21
<hsivonen>
annevk: fwiw, I think I have an implementation of spec-compliant responseType "json"
16:24
<zewt>
okay, now i'm confused: what about Chrome? it also appears to default to Japanese for UTF-8, and it doesn't support @lang at all as far as I know
16:25
<zewt>
... so, other than setting a zh-TW font by name (evil), how does anyone display zh-TW in chrome?
16:25
<zewt>
(i expect I'm doing something silly)
16:26
<hsivonen>
zewt: please file a bug on Chrome
16:27
<zewt>
saying what? no doubt they already know they don't support @lang, but I'm not sure what the expected path is supposed to be currently
16:27
<hsivonen>
zewt: saying they should consider @lang when doing font selection
16:27
<zewt>
they (chrome + webkit) must already know that :)
16:28
<hsivonen>
zewt: they might not. also, they might not know that someone cares
16:28
<zewt>
but how's it supposed to work today? are a billion people just hacking around it with explicit font names?
16:28
<hsivonen>
these newfangled browsers trying to get away without doing stuff that Gecko has done for a decade
16:29
<hsivonen>
zewt: dunno. possibly.
16:29
<hsivonen>
zewt: or using parochial encodings
16:29
<zewt>
if they don't know by now that every Chinese speaker doesn't want pages displayed in a Japanese font, then I don't think I could convince them
16:29
<hsivonen>
zewt: or does Chrome not vary the font for Chinese legacy encodings?
16:30
<zewt>
i expect it does for those, though my big5 test is mojibake at the moment; let me see why
16:30
<erlehmann>
zewt, dont misunderestimate the chromium team!
16:31
<zewt>
heh
16:31
<zewt>
well, webkit, really
16:31
<zewt>
i sort of feel bad for all the blame the chrome guys get for things that are webkit's fault :P
16:34
<zewt>
ah, hold on, it's rendering as zh-CN
16:34
<zewt>
(which is hard for my weak western eyes to distinguish from jp)
16:35
<erlehmann>
gajin baka!
16:35
<zewt>
indeed
16:35
<zewt>
i'm an american; i take *pride* in not being able to read other languages!
16:35
<hsivonen>
zewt: huh? aren't Japanese glyph designs generally closer to Traditional Chinese designs than Simplified Chinese designs?
16:35
<erlehmann>
western font culture is pig disgusting!
16:36
<zewt>
hsivonen: i need to pick a test case with bigger character differences
16:37
<erlehmann>
hsivonen, japanese glyph designs are most closely resembled by pokemans, see 🐭 🐮 🐯 🐵
16:37
<erlehmann>
emoji are the new fancy shit!
16:38
<erlehmann>
though i question the need for U+1F5FE SILHOUETTE OF JAPAN
16:38
<zewt>
unicode jumped the shark with PILE OF POO
16:38
<erlehmann>
or U+1F5FC TOKYO TOWER
16:38
<erlehmann>
zewt, i heard the poop has eyes on ios!
16:39
<erlehmann>
however i cannot confirm due to not running non-free operating systems. *strokes his neckbeard*
16:39
<hsivonen>
interesting. according to the comments at http://my.opera.com/ODIN/blog/2011/08/09/introducing-oupeng-a-chinese-opera Opera has a Mini server farm behind the Great Firewall
16:39
<karlcow>
http://www.fileformat.info/info/unicode/block/miscellaneous_symbols_and_pictographs/list.htm
16:40
<erlehmann>
a mini server farm! like a beowulf cluster of wall warts!
16:40
<erlehmann>
karlcow, know any other font besides symbola that can do that?
16:40
<erlehmann>
i have symbola installed
16:40
<erlehmann>
but it looks weird
16:41
<erlehmann>
how does the apple do it? do they have SVG fonts for emoji?
16:41
<erlehmann>
hi, cowboy.
16:42
<karlcow>
hmmm good question not sure what apple installed on the system I haven't checked
16:43
<erlehmann>
the apple confounds me.
16:44
<zewt>
okay, found the source of my confusion
16:44
<zewt>
http://zewt.org/~glenn/encoding%20utf-8.html renders correctly in FF8 in Win7 (my desktop), but incorrectly in XP (my test VM; zh-TW shows the jp glyph)
16:46
<zewt>
chrome on http://zewt.org/~glenn/enc/big5.html shows the zh-CN glyph (which is weird--it should be zh-TW, right?)
16:49
<hsivonen>
zewt: did you test zh-Hant and zh-Hans?
16:49
<zewt>
nope, don't know anything about those
16:49
<zewt>
(not that I know a great deal about zh-anything)
16:49
<erlehmann>
zsh!
17:28
<zewt>
fyi, probably not surprisingly, ie8 does use the locale to pick the font
17:28
<zewt>
(for utf-8)
17:44
<Ms2ger>
Are there any websocket api tests already?
17:46
<jgraham>
Ms2ger: I don't know of any public ones
17:47
<jgraham>
We had some but they also tested the protocol so they need to be updated
17:47
<jgraham>
(in general testing websockets is pretty hrad)
17:47
<jgraham>
*hard
17:48
<jgraham>
Because you need a server component that can give you arbitrary bits on the wire in response to your messages
17:48
<Ms2ger>
Oh, MS has some
17:48
<jgraham>
to check that the implementation does the right thing API wise when it hits various edge cases
17:49
<jgraham>
Ms2ger: Just "WebSocket in window" type tests?
17:49
<Ms2ger>
new WebSocket(>2 arguments)
17:50
<Ms2ger>
(Which happens to throw in Gecko)
17:51
<jgraham>
Hmm, these tests do require a server
17:52
<jgraham>
http://html5labs-interop.cloudapp.net/
17:52
<jgraham>
http://html5labs-interop.cloudapp.net/wsdemo.html ... "To view this content please install Silverlight"
17:53
<Ms2ger>
\o/
17:53
jgraham
will look slightly confused for a bit and then get on with something more useful
17:55
<zewt>
please install a proprietary browser plugin like it's 1999
17:56
<Ms2ger>
Interesting, MS's W3C-submitted tests support MozWebSocket
18:06
<Ms2ger>
"Undefined variable: WebSocket"
18:06
Ms2ger
glares towards Opera
18:55
<jgraham>
Ms2ger: Didn't your moher teach you it is rude to glare?
18:56
<Ms2ger>
My moher didn't, no
18:59
<jgraham>
What about your mother? did she teach you that it's rude to mock people's typos?
19:01
<shepazu>
or did you teach her how to suck eggs?
19:01
<Ms2ger>
She taught that it's fair game to laugh at doctors in astrophysics
19:05
<jgraham>
Dammit.
19:06
<jgraham>
I somehow ended up in a ridiculed minority.
19:06
<JonathanNeal>
Were any of you guys able to check this out? https://github.com/jonathantneal/html5css/blob/master/style.css it's the ua css i wrote up based on the html5 spec.
19:26
<Hixie>
annevk: btw, looks like your multipage generator script isn't deleting old sections
19:26
<Hixie>
annevk: e.g. we still have content-models.html and apis-in-html-documents.html and even video.html which haven't been updated in months
19:27
<Hixie>
(pretty sure that's not a problem on my end because i delete the entire directory when regenerating it)
19:32
<Ms2ger>
Hixie, there's a bug about that
19:32
<Ms2ger>
(At least for the W3C copy
19:32
<Ms2ger>
)
19:40
<zewt>
starting to see somewhat more pages catching and breaking major browser hotkeys
19:40
<zewt>
twitter and now AWS's console
20:06
<zewt>
http://zewt.org/~glenn/encoding%20utf-8.html strange; IE9 shows zh-CN in the lang=jp case (firefox shows all three fine, and IE9 does get zh-TW and zh-CN right)
20:11
<zewt>
but it gets it right with shift-jis (or else all of Japan would be yelling at them)
20:11
<zewt>
it feels like some browser vendors are actively trying to prevent asia (especially Japan) from using UTF-8
20:15
<zewt>
aha, it wants lang=ja (other browsers accept lang=jp, so that's what I ended up typing)
20:16
<smaug____>
annevk: ping
20:45
<Hixie>
heycam|away: does SVG have path objects?
20:46
<Hixie>
it's ridiculous how many people are asking for ellipses suddenly
20:54
<Philip`>
Maybe they thought that asking for ellipses is the most effective way to get a non-terrible API for drawing circles
20:55
Philip`
thinks APIs that require you to remember to use the argument 2*Math.PI don't count as non-terrible
20:58
<TobiX>
Philip`: As long as you don't have to use LOGO to draw a circle ;)
21:22
<TabAtkins>
tantek: You still need me for anything? (Returning the ping from a while ago without reading anything from near it.)
21:24
<zewt>
did the unsubscribed-poster setting change on the list? curious how that spam got through
21:24
<TabAtkins>
It happens occasionally.
21:28
<gavinc>
are there any standalone implementations of UTF-8 decoding with error handling?
21:31
<tantek>
TabAtkins - I think it may have been about my documentation of the proposal to allow space instead of T in date-and-time microsyntaxes
21:31
<tantek>
it appears to be resolved now in WHATWG
21:32
<TabAtkins>
kk
21:32
<tantek>
however you're still encouraged to review and edit/improve the research documentation: http://wiki.whatwg.org/wiki/Time_element#permit_space_instead_of_T_in_datetimes
21:35
tantek
is proceeding with attempting to grow consensus on time enhancements across WHATWG and HTMLWG specs.
21:44
<jgraham>
gavinc: What do you mean by "implementation" in this case? A C library that will take UTF8 and give you back (some string format)?
21:45
<jgraham>
(not that I know the answer in any case, but...)
21:45
<zewt>
i assume he means the particular rules defined by http://www.whatwg.org/specs/web-apps/current-work/multipage/infrastructure.html#utf-8
21:45
<gavinc>
zewt: Yes
21:45
<gavinc>
jgraham: or even just an example that isn't stuck way way down inside a browser
21:46
<zewt>
i'd imagine the utf8-to-codepoints code is probably not very complex ... maybe check webkit?
21:46
<zewt>
(webkit tends to have code which is easier to read in isolation than gecko, in my brief experiences with both)
21:47
<gavinc>
zewt: well, most languages have UTF-8 parsers for reading byte streams. I guess I was wondering if anyone had already implemented utf-8 with error recovery as one of those
21:47
<zewt>
well, they might implement error recovery, but in different ways
21:47
<zewt>
i assume you want those particular rules
21:48
<gavinc>
zewt: Right, so specifically THOSE rules
21:48
<gavinc>
Java, Python and Ruby don't do any error recovery, they just explode on invalid utf-8
21:49
<gkellogg>
gavinc: I can't say that for sure about Ruby, as the case I'm looking at doesn't actually look like UTF-8
21:53
Philip`
uses the code at http://trac.wildfiregames.com/browser/ps/trunk/source/lib/utf8.cpp , which can either abort and report errors or else replace invalid stuff (for some value of 'stuff') with U+FFFD
21:53
<zewt>
here's (some version of) webkit's utf-8 decoding: http://gitorious.org/webkit/sedkit/blobs/94b6ef65ee422c62fb91ca9be76ee2fa5310d718/Source/WebCore/platform/text/TextCodecUTF8.cpp
21:54
<zewt>
not as simple as an "example code" standalone might be, but not complex
21:55
<zewt>
(i don't know if webkit actually implements utf-8 error handling exactly per spec, but I'd hope so)
21:55
<zewt>
also I don't know what version that is (I just googled for the source file of whatever WebKit source version is lying around in my ~)
21:56
<gavinc>
zewt: yeah, specifically line 169 http://gitorious.org/webkit/sedkit/blobs/94b6ef65ee422c62fb91ca9be76ee2fa5310d718/Source/WebCore/platform/text/TextCodecUTF8.cpp#line169
21:56
<zewt>
decodeNonASCIISequence looks like the fast path with no error handling, check TextCodecUTF8::decode below it
21:57
<gavinc>
No, it's still doing the error handling as specified
21:58
<zewt>
it's the asserts that make it look dubious
21:59
<zewt>
haven't squinted very hard at it
22:00
<zewt>
there's still enough platform stuff in there that it may be more work to extract it to use it independently than to just rewrite it
22:00
<Velmont>
gavinc: chardet python library is using charset sniffing code/algorithm from browsers at least.
22:02
<gkellogg>
ww: so, it seems that the problem is specific to Ruby 1.9. Running on a different version, I seem to be able to parse (without actually generating triples) in about 22 secs. I'll try making triples next
22:02
<gavinc>
gkellogg: mischan ;)
22:03
<gkellogg>
:)
22:03
<gavinc>
of course chardet 410s... sigh, yeah
22:05
<gavinc>
Velmont: chardet just does the detection, not the bytes-> chars :(
22:07
<zewt>
->codepoints :)
22:07
<Wilto>
annevk: Ping.
22:15
<gavinc>
zewt: yes those :P
22:22
<jgraham>
Hmm, I'm pretty sure the python codecs module has some possibility for error recovery, but I doubt it matches this spec
22:22
<gavinc>
reading now, it's C with goto, may take a few passes at reading
22:23
<jgraham>
errors='replace': replace malformed data with a suitable replacement marker, such as '?' or '\ufffd'
22:23
<gavinc>
yeah, I think it gets darn close to what HTML5 says
22:23
<zewt>
jgraham: would be interesting if they could be convinced to converge on html's rules
22:24
<gavinc>
Was also looking if there were any test cases yet for that algorithm
22:24
<jgraham>
zewt: I don't know how or if it is different
22:24
<zewt>
i don't either
22:25
<gavinc>
one example, it doesn't replace surrogates
22:25
<zewt>
or if they're prevented from changing it for compat, add an errors='html' mode
22:26
<gavinc>
exactly :D
22:27
<gavinc>
was hoping that it could be done by registering a new codec, but it looks like that would be staggeringly slower
22:30
<jgraham>
gavinc: Where is the actual decoder in python?
22:31
<jgraham>
I mean actual python decoder
22:31
<jgraham>
in c
22:31
<jgraham>
Well I mean something. I'm not sure either way round was clear
22:33
<zewt>
PyUnicode_DecodeUTF8Stateful, I think
22:33
<zewt>
at least in 3.1.3
22:34
<gavinc>
PyUnicode_DecodeUTF8Stateful
22:34
<gavinc>
yeah
22:34
<zewt>
i win
22:34
<gavinc>
I was reading it :P
22:39
<gavinc>
http://hg.python.org/cpython/file/17ceebc61b65/Objects/unicodeobject.c#l2555
22:54
<gavinc>
Well, the example in HTML comes out as A��B�C☺�... so perhaps some test cases are in order :D
23:00
<gavinc>
https://gist.github.com/1445180
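A small Python 3 sketch of the built-in `errors='replace'` recovery jgraham quotes above. The byte string is an assumption chosen here to mix stray continuation bytes with truncated three-byte sequences; it is not taken from the log or the gist, and the result shows only what CPython's own handler does, not a claim about the HTML rules.

```python
# Assumed sample: 'A', two stray continuation bytes, 'B', a truncated
# three-byte sequence, 'C', a complete U+263A, and a sequence cut off at the end.
sample = b"A\x98\xbaB\xe2\x98C\xe2\x98\xba\xe2\x98"

# Strict mode (the default) raises on the first malformed byte.
try:
    sample.decode("utf-8")
    strict_failed = False
except UnicodeDecodeError as exc:
    strict_failed = True
    print("strict:", exc.reason, "at byte", exc.start)

# Replace mode substitutes U+FFFD for each malformed span and keeps going.
decoded = sample.decode("utf-8", errors="replace")
print([f"U+{ord(ch):04X}" for ch in decoded])
```

Listing the code points rather than printing the string makes it easy to count how many replacement characters each malformed span produced, which is where decoders tend to differ.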
23:02
<annevk>
Wilto: ?
23:03
<annevk>
hsivonen: cool
23:03
<Wilto>
Hey man, Mike Taylor mentioned I might want to get ahold of you—I’ve been working with Scott Jehl et al. on the whole “responsive images” problem.
23:03
<annevk>
Hixie: that can be fixed, I'm currently just overwriting files
23:03
<Wilto>
We whipped up that canvas-based demo the other day.
23:03
<Wilto>
( http://scottjehl.com/imgwithfallback.html )
23:04
<annevk>
ah cool
23:04
<annevk>
yeah brucel keeps bugging me about it
23:04
<Wilto>
He doesn’t know it, but he might have just become my hero.
23:04
<Wilto>
I’ve been obsessing over this stuff for months.
23:04
<annevk>
so personally I'm not sure the requirements from authors are quite clear enough for a new markup feature
23:04
<annevk>
that's my a
23:05
<annevk>
my b is that I sort of feel that long term bandwidth will be less of an issue and everything will be 300ppi
23:05
<annevk>
and new markup only hits long term
23:05
<Hixie>
annevk: if you would that'd be great. i had someone send me a link to the webrtc section today, without them realising it was dead
23:05
<annevk>
Hixie: so probably tomorrow around this time it will be fixed
23:06
<annevk>
Hixie: unless I get to it before going to bed somehow
23:06
<Hixie>
annevk: roger, thanks
23:09
<annevk>
Wilto: both a and b might be false; there's not really sufficient information imo other than that some people are hitting this problem today
23:10
<Wilto>
annevk: Granted, yeah. And believe me, I have in no way been banking on “we need a new element”—I assume that’s a phrase that’s thrown around a lot when people first run into an issue like this.
23:10
<annevk>
Wilto: that's the brucel stance :)
23:10
<Wilto>
Thing is—and again, not something I say easily—this just doesn’t feel solvable on the front end.
23:11
<Wilto>
I would absolutely love to write up the history of the issue. I’m sure you guys have been getting it in fits and starts.
23:12
<Wilto>
It is a sordid tale of cookies and dynamically-injected base tags.
23:13
<zewt>
gar, quotostrophes
23:14
<Wilto>
The general thinking is that a reliable fallback pattern already exists in video/audio/canvas.
23:15
<zewt>
but that's a different pattern
23:15
<heycam>
Hixie, yes SVG does have path objects. but they, like much of the SVG DOM, suck. :)
23:15
<annevk>
Wilto: yeah I know, but that is mostly for different formats, although admittedly media queries are there (not sure if they are implemented though)
23:15
<Wilto>
Well, in terms of not breaking things that already exist.
23:16
<heycam>
Hixie, so we (SVGWG) want to improve the SVG DOM in the SVG2 timeframe (so some time over the next year)
23:16
<Hixie>
heycam: Are these the DOM nodes for some sort of path element, or are they separate Path objects?
23:16
<heycam>
Hixie, no, separate path objects
23:16
<heycam>
Hixie, however unlike say SVGPoint/SVGMatrix you can't create a disconnected SVGPath
23:16
<heycam>
Hixie, they only exist to reflect path data on elements at the moment
23:16
<Hixie>
k
23:16
<Hixie>
i'm gonna guess i'll be coming up with a new interface, but i'll take a look to see what can be reused...
23:17
<heycam>
Hixie, ok if you get to it before we do (which sounds likely) I'll take a look at what you come up with and see where to go from there
23:17
<Hixie>
i'm expecting to do work on canvas early next year
23:18
<Hixie>
one of the things is adding the path primitives
23:18
<Hixie>
would be great to have your feedback when i do, of course
23:18
<heycam>
yep sure
23:18
<Wilto>
annevk: So, in your opinion, is this idea a total non-starter?
23:18
<annevk>
Wilto: no, all I have is some thoughts
23:18
<Hixie>
heycam: i dunno how much similar they'll be. I mean, some of the stuff I'm looking at doing is the focus ring stuff, hit testing, etc.
23:19
<Wilto>
Or would it help if I wrote up, uh, anything. Like I said: desperate times.
23:19
<annevk>
Wilto: what we need for standards is mostly data
23:19
<annevk>
Wilto: which indeed usually starts with someone outlining the problem and such
23:19
<Hixie>
heycam: explicit stroking and filling, adding text to a path, create a path with text along another path
23:19
<heycam>
Hixie, but these'll be methods on the canvas rather than the path object I would expect?
23:19
<annevk>
Wilto: in this case someone might have already done something in that direction, I forgot, but not recently
23:19
<Hixie>
heycam: dunno
23:19
<annevk>
Wilto: you can take a few hints from http://wiki.whatwg.org/wiki/FAQ#Is_there_a_process_for_adding_new_features_to_a_specification.3F
23:20
<heycam>
Hixie, create a path with text along another path eh :) yes we're interested in a "convert text to its outline paths" too (probably including text along a path)
23:20
<annevk>
Wilto: it's not a literal process, but it contains the steps we usually go through
23:20
<Wilto>
annevk: No—we’ve got lots of notes on the subject, but the official documentation is pretty much “ask me.” I just wanted to check in with you to see if I was completely off the rails, y’know?
23:20
<annevk>
Wilto: okay
23:21
<annevk>
Wilto: so no, you're not, a write up would be great
23:21
<Wilto>
Excellent. I appreciate you taking the time, man!
23:21
<annevk>
Wilto: we can't really promise anything, but it's valuable to have that information
23:21
<Wilto>
Oh, no, no promises expected. It doesn’t hurt to have it all down on paper somewhere anyway.
23:22
<annevk>
Wilto: even if we decide not to do it in the end, at least it's more reasoned than a hunch from someone
23:22
<gavinc>
The overlong utf-8 sequence \xC0\xAF (/) should produce a SINGLE U+FFFD character; is that the correct reading of http://www.whatwg.org/specs/web-apps/current-work/multipage/infrastructure.html#utf-8 ?
23:22
<annevk>
Wilto: right; cool!
23:22
<annevk>
and with that I'm gonna get some sleep :)
23:22
<Wilto>
Have a good one!
23:22
<annevk>
nn
23:23
<zewt>
gavinc: yep; a whole overlong sequence is replaced with a single replacement character
23:24
<gavinc>
zewt: Guess what Java and Python don't do? ;)
23:24
<zewt>
lots of software doesn't deal with overlong sequences adequately
23:24
<zewt>
i don't think vim does
23:25
<gavinc>
yeah, iconv doesn't seem to :\
23:25
<gavinc>
Not sure I'm calling it correctly however
23:25
<zewt>
it's not always wanted--in some cases faster performance with fewer checks is the goal
23:26
<zewt>
python seems to throw an error on that sequence
23:26
<gavinc>
print '\xc0\xaf'.decode('utf-8', 'replace')
23:26
<zewt>
>>> '\xc0\xaf'.decode("utf-8")
23:26
<zewt>
in replace mode i'm not sure
23:26
<zewt>
though
23:26
<gavinc>
�� :D
23:26
<zewt>
ah right, that's legal, just a different recovery mode
23:27
<gavinc>
close, seems to meet all the other HTML5 rules
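A hedged Python 3 sketch of the `errors='html'` idea floated earlier, using `codecs.register_error` from the standard library. The handler name `replace-overlong-sketch` and the function `collapse_overlong` are hypothetical. It covers only the one difference discussed here (an overlong two-byte form starting with 0xC0 or 0xC1 collapsing to a single U+FFFD, per the reading zewt confirms) and is not a full implementation of the spec's rules.

```python
import codecs


def collapse_overlong(exc):
    """Decode-error handler: emit one U+FFFD, and if the bad byte is 0xC0 or
    0xC1 (which can only begin an overlong two-byte form), also swallow the
    continuation byte that follows it."""
    if not isinstance(exc, UnicodeDecodeError):
        raise exc
    data = exc.object
    resume = exc.end
    is_overlong_lead = data[exc.start] in (0xC0, 0xC1)
    if is_overlong_lead and resume < len(data) and 0x80 <= data[resume] <= 0xBF:
        resume += 1
    # Return the replacement text and the byte offset where decoding resumes.
    return ("\ufffd", resume)


# Register under a hypothetical name; any unused name would do.
codecs.register_error("replace-overlong-sketch", collapse_overlong)

builtin = b"\xc0\xaf".decode("utf-8", "replace")
custom = b"\xc0\xaf".decode("utf-8", "replace-overlong-sketch")
print(len(builtin), len(custom))
```

An error handler only runs on malformed input, so well-formed text still goes through the built-in decoder's normal path; whether that avoids the slowdown gavinc expected from a whole new codec is not something this sketch measures.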
23:28
<zewt>
vim doesn't detect it and converts it directly :(
23:29
<gavinc>
bad vim!
23:29
<zewt>
well
23:29
<zewt>
vim keeps the original binary sequence unchanged; what's wrong is that it renders it
23:29
<gavinc>
as a slash?
23:29
<zewt>
yeah
23:30
<zewt>
probably just a naive utf-8 decoder
23:40
<Hixie>
kennyluck: I need more context for some of your bugs! :-) (e.g. bug 14890)
23:41
<Hixie>
haha
23:42
<Hixie>
bug 15051 ends with "omg, you wannbe w3c"
23:42
<Hixie>
i'm pretty sure if there's one thing i don't want to be, it's the w3c. :-P
23:42
<Hixie>
dunno about you guys. :-P
23:42
<zewt>
hixietf