00:39 | <tantek> | ok email to public-html sent. onto the next issue. yay. |
01:39 | <AryehGregor> | Okay, I seriously feel Google Docs has regressed in functionality sometime over the last few weeks. |
01:40 | <AryehGregor> | Yet another minor thing that's been driving me crazy: have one line that starts with an LTR word followed by one that starts with an RTL word. Go to start of line beginning with LTR word. Hit down arrow. The cursor is now visually at the beginning of the next line, but logically in the middle -- any new text you insert will be logically after the RTL word at the start of the line. |
01:40 | <AryehGregor> | Argh. |
01:40 | AryehGregor | is going to look for places to officially complain soon |
01:41 | <tantek> | now try combining that LTR / RTL mixing with text-overflow:ellipsis for extra good times. |
02:44 | <roc> | jamesr: Gecko handles non-BMP characters pretty well |
02:44 | <roc> | zewt: IVS too |
02:45 | <jamesr> | roc: in the native code? |
02:46 | <roc> | what do you mean "native code"? |
02:46 | <jamesr> | c++ |
02:46 | <zewt> | yeah, the native part is easy, it's the user scripts part that's hard |
02:46 | <jamesr> | i'm interested in how you would handle this from javascript, if you do |
02:46 | <jamesr> | do you roll your own length / slice / etc functions? |
02:47 | <zewt> | well, usually you don't actually "need" the length (for definitions of "length" more complicated than "the number of codepoints") |
02:47 | <roc> | It's sort of like UTF-8 |
02:47 | <roc> | a lot of the things you do with strings just work |
02:47 | <zewt> | it's exactly like utf-8 (except you need more data to handle it correctly, eg. knowing which codepoints are combining) |
02:47 | <zewt> | er |
02:47 | <zewt> | + surrogates of course |
02:47 | <roc> | editing and selection are interesting |
02:47 | <zewt> | (sorry, multitasking too much) |
02:47 | <jamesr> | yeah - mapping selection ranges to strings and such can be very tricky |
02:48 | <jamesr> | or questions like "what's the first letter" |
02:48 | <zewt> | or just avoiding splitting surrogates in half |
02:48 | <zewt> | i haven't tried dealing with it myself either, so i don't really know what the practical issues are |
02:48 | <roc> | A lot of Web pages, even apps, should work fine with non-BMP chars |
02:48 | <zewt> | if code is simple, yeah |
02:49 | <zewt> | "accidentally" :) |
02:49 | <jamesr> | most of the time |
02:49 | <jamesr> | but something like this: https://developer.mozilla.org/en/JavaScript/Reference/Global_Objects/String/slice#Example:_Using_slice_to_create_a_new_string |
02:49 | <jamesr> | will fail if str1 has non-BMP chars at unexpected places |
02:49 | <roc> | depends on why you're using slice ... where the offsets came from |
02:49 | <jamesr> | by creating str2 with unmatched surrogate pairs |
02:50 | <jamesr> | right, so how do you get offsets that respect surrogate pairs? ecmascript doesn't provide any APIs for dealing |
02:50 | <roc> | if you searched for some substring and you slice at the start or end of the substring, you're probably OK |
02:50 | <zewt> | jamesr: well, technically combining characters also hit that too (even with utf-32), though that's a slightly lower level of breakage than splitting surrogates (the results are comprehensible, even if weird) |
02:50 | <jamesr> | are you checking the high bits of each char in JS? |
02:50 | <roc> | right, combining chars have basically the same set of problems |
02:50 | <jamesr> | let's say you wanted to truncate a long string if it's more than 30 letters long |
02:50 | <zewt> | jamesr: offsets should always be literal offsets into the string, not "number of codepoints after surrogate decoding"... |
02:50 | <zewt> | yeah, that's where it becomes trickier |
02:50 | <roc> | jamesr: use text-overflow:ellipsis! |
02:51 | jamesr | isn't fully confident that webkit will get that right |
02:51 | <roc> | I am very confident Gecko gets that right :-) |
02:51 | <roc> | it's basically just part of the clusterization problem |
02:51 | <jamesr> | we probably do, i'm not familiar with this part of wk |
02:51 | <roc> | that's not really a BMP-related issue at all |
02:54 | <zewt> | basically it's a bigger problem with utf-16 because if you get it wrong, you don't just end up with a weird string, you end up with a corrupt one |
02:55 | <zewt> | i think you can paste in mismatched surrogates into pages from other sources anyway, so it's not like it's the only way that can be introduced |
02:57 | <roc> | yes |
02:57 | <roc> | although strings that start with combining chars aren't technically illegal, they're a real pain to deal with :-) |
02:58 | <zewt> | more a pain for implementors than users, i think :P |
02:58 | <zewt> | as a user i'd just expect them to act as a non-combining character when there's nothing to combine with (in general) |
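The truncation problem raised above (cutting a string at a fixed offset without splitting a surrogate pair) can be sketched as follows. This is a minimal Python illustration that emulates UTF-16 code units, which is what JavaScript string offsets index; the helper names (`utf16_units`, `safe_truncate`) are hypothetical and come from neither the chat nor any library. It only avoids splitting surrogate pairs and does nothing about combining characters, which the discussion notes have much the same problem.

```python
def utf16_units(text):
    """Return the UTF-16 code units of text as a list of ints."""
    data = text.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


def units_to_text(units):
    """Turn a well-formed list of UTF-16 code units back into a str."""
    return b"".join(u.to_bytes(2, "little") for u in units).decode("utf-16-le")


def is_high_surrogate(unit):
    return 0xD800 <= unit <= 0xDBFF


def is_low_surrogate(unit):
    return 0xDC00 <= unit <= 0xDFFF


def safe_truncate(text, max_units):
    """Keep at most max_units UTF-16 code units without splitting a surrogate pair."""
    units = utf16_units(text)
    if len(units) <= max_units:
        return text
    cut = max_units
    # If the cut would land between a high and a low surrogate, back up one
    # unit so that the whole pair is dropped rather than half of it.
    if cut > 0 and is_high_surrogate(units[cut - 1]) and is_low_surrogate(units[cut]):
        cut -= 1
    return units_to_text(units[:cut])


sample = "ab\U0001F42Dcd"  # U+1F42D takes two UTF-16 code units
shortened = safe_truncate(sample, 3)  # a naive cut at 3 would keep half of the pair
```

A naive cut at offset 3 of `sample` would end on a dangling high surrogate; the sketch backs up to offset 2 instead.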
03:11 | <rniwa> | oh man... did I miss yet-another unicode goodness discussion? |
03:11 | <zewt> | several :) |
03:12 | rniwa | is glad he missed it |
03:12 | <rniwa> | I don't wanna know weird webkit unicode bugs |
03:13 | <zewt> | Your search - site:*.w3.org "ExtendedAttributeNoArgs" - did not match any documents. |
03:13 | <zewt> | "???" |
03:15 | <heycam> | zewt, problem in my grammar? |
03:16 | <zewt> | searching for instances of that string in specs other than webidl |
03:16 | <heycam> | should be ExtendedAttributeNoArg btw, no "s" on the end |
03:16 | <zewt> | (not familiar with webidl at all; was surprised that it didn't even find webidl itself) |
03:16 | <heycam> | but it's just a symbol in the grammar, i wouldn't expect other specs to use that word |
03:16 | <zewt> | http://dev.w3.org/2006/webapi/WebIDL/#idl-extended-attributes <- NoArgs |
03:16 | <zewt> | figuring that out is why i was searching for it :) |
03:17 | <heycam> | zewt, ah so it is. oh that's right, i think i renamed it. probably after the last TR publication. :) |
03:17 | <heycam> | it used to say something-or-other "takes no argument", but somebody found that wording a bit weird |
03:18 | <zewt> | google's cache is up to date, but for some reason it doesn't find that keyword |
03:18 | <heycam> | oh that is weird, because if I do site:dev.w3.org then it does find it |
03:19 | <zewt> | was trying to figure out if there's any way to mark up "flags" on functions (sort of like Python decorators), but that's not really a language binding issue, so |
03:19 | <zewt> | yeah, smells like a rare "provable google search bug" :) |
03:20 | <zewt> | site:dev.w3.org works, site:*.w3.org doesn't (which normally does) |
03:21 | <heycam> | (btw I tend to use just site:w3.org, which includes subdomains too) |
06:22 | <zcorpan> | woah. i thought importScripts() was same-origin, too. no idea how i could have missed that |
06:33 | <zcorpan> | wait, wait, wait, wait. people write DTD fragments when proposing a new element? |
06:42 | <MikeSmith_> | yeah, somebody should have waved that off before they sent it |
07:57 | <jgraham> | zcorpan: After thinking a bit I decided to be glad they didn't send in XSD for the new element |
08:02 | <zcorpan> | touche |
08:06 | <hsivonen> | using DTD fragment to propose new elements is a bit like implying that UTF-16 is OK when proposing solutions to the encoding problem |
08:15 | <zcorpan> | hmm, <hr> in <select> is supported in some browsers (on some OSes)? http://forums.whatwg.org/bb3/viewtopic.php?f=1&t=4948 |
08:20 | <hsivonen> | zcorpan: could be a difference between WebKit's old parser and the HTML5 parser |
08:21 | <zcorpan> | seems like a nice feature to me |
08:26 | <zcorpan> | hsivonen: we have four ways to opt in to utf-8 |
08:27 | <hsivonen> | zcorpan: do you count the two meta syntaxes separately? |
08:28 | <zcorpan> | yeah |
08:43 | <annevk> | oh yes |
08:43 | <annevk> | I "won" the media type parameter debate on EventSource |
08:44 | <annevk> | with data |
08:51 | <MikeSmith> | been very quiet so far this week |
08:52 | <annevk> | MikeSmith: you have been? |
08:53 | <MikeSmith> | lists |
08:53 | <MikeSmith> | but now I remember I've got about 800 unread messages in my inbox |
08:53 | <annevk> | ah, I was just about to ask what you were plotting and scheming |
08:53 | <MikeSmith> | I've been in Seoul |
08:54 | <MikeSmith> | though I did manage to get some plotting and scheming in while here |
08:55 | <annevk> | even now we closed account registration the wiki still gets daily spam |
09:01 | <MikeSmith> | annevk: visited the opera korea office |
09:01 | <MikeSmith> | and incidentally also finally figured out how to get my mutt to display some korean e-mail messages it wouldn't before |
09:03 | <MikeSmith> | mostly |
09:04 | <annevk> | sounds like a good time :) |
09:04 | <MikeSmith> | heh |
09:05 | <MikeSmith> | mail UAs here use ks_c_5601-1987 for some reason |
09:05 | <MikeSmith> | which as far as I can see is effectively the same as euc-kr |
09:06 | <MikeSmith> | at least all I ended up needing to do for mutt was to alias ks_c_5601-1987 to euc-kr |
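The aliasing described above can be illustrated outside mutt. As a hedged sketch: CPython's codec registry, as far as I know, already treats `ks_c_5601-1987` as an alias of `euc-kr`, so both labels should resolve to the same codec. The helper name `same_codec` is hypothetical, and this shows nothing about mutt's own configuration.

```python
import codecs


def same_codec(label_a, label_b):
    """Return True if both encoding labels resolve to the same Python codec."""
    return codecs.lookup(label_a).name == codecs.lookup(label_b).name


sample = "한국어"
# Both labels should produce identical bytes if one is an alias of the other.
as_ks_c = sample.encode("ks_c_5601-1987")
as_euc_kr = sample.encode("euc-kr")
```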
09:07 | <jgraham> | Is plotting and scheming when you draw graphs in lisp? |
09:14 | <MikeSmith> | heh |
09:15 | <MikeSmith> | chrome WebRequest API is interesting |
09:16 | <MikeSmith> | annevk: "Apple seems to be working on bringing Web Notification support to Safari" |
09:17 | <MikeSmith> | http://peter.sh/2011/12/reverse-flexible-rows-and-columns-socket-api-and-panels/ |
09:26 | <annevk> | MikeSmith: read that, sounds cool |
09:27 | <annevk> | MikeSmith: hopefully it will get jgraham to do some work on the spec o_O |
09:29 | <jgraham> | I know, I know |
09:48 | <MikeSmith> | annevk: "No blank line after the signature." is ambiguous and kind of confusing |
09:48 | <MikeSmith> | to me at least |
09:49 | <MikeSmith> | I first took it to mean, "A blank line after the signature is not allowed." |
09:49 | <MikeSmith> | when I tried typing in text in that first line |
09:53 | <annevk> | I'm not that great with error messages I'm afraid |
11:52 | <gsnedders> | Why am I unsurprised at olliej arguing like crazy against multi-vm bindings in WebKit? |
11:52 | <gsnedders> | (well, multi-language-vm bindings) |
11:54 | <annevk> | ohunt is great |
12:05 | <hsivonen> | gsnedders: whoa. I didn't read the thread carefully enough. I thought it was multi-vm bindings. is V8 becoming a bi-language VM? |
12:06 | <gsnedders> | hsivonen: No. |
12:06 | <jgraham> | Which thread? |
12:06 | <gsnedders> | hsivonen: My point is WebKit already has multi-VM bindings: JSC and V8 |
12:06 | <gsnedders> | hsivonen: Multiple VMs for the same language, yes, but multiple VMs. |
12:07 | <gsnedders> | https://lists.webkit.org/pipermail/webkit-dev/2011-December/018775.html |
12:10 | <smaug____> | Gecko has been actively trying to get rid of multi-vm support |
12:12 | <hsivonen> | annevk: do you have a test suite for responseType == "json" already? |
12:12 | <gsnedders> | Presto supported futhark and Carakan for a while concurrently, though that was little work. |
12:13 | <jgraham> | So, is it me or is that thread a relic of a poor VCS? I mean there are obviously technical reasons why multi-VM is bad for the web, but the idea of having to ask to create a branch seems weird |
12:14 | <gsnedders> | jgraham: They want control over what branches they have in the official repo |
12:14 | <annevk> | hsivonen: no |
12:14 | <gsnedders> | jgraham: There is no objection to creating a branch elsewhere |
12:14 | <hsivonen> | jgraham: they are using SVN, so of course they have a poor VCS :-) |
12:14 | <jgraham> | hsivonen: I know |
12:15 | <karlcow> | http://lucumr.pocoo.org/2011/12/7/thoughts-on-python3/ |
12:15 | <gsnedders> | (and there is an official git clone, and I think the recommendation is basically to create a clone of that) |
12:15 | <karlcow> | Because as it stands, Python 3 is the XHTML of the programming language world. It's incompatible to what it tries to replace but does not offer much besides being more “correct”. |
12:16 | <jgraham> | karlcow: I entirely disagree |
12:16 | <karlcow> | that was a quick answer to a looooong blog post |
12:19 | <jgraham> | karlcow: Not to the blogpost, to you |
12:20 | <jgraham> | Which was one sentence from the blogpost |
12:20 | <hsivonen> | jgraham: what karlcow said is a quote from the post |
12:20 | <hsivonen> | nevermind |
12:20 | <jgraham> | Right, I know that after reading the post :) |
12:20 | <karlcow> | hsivonen: poor markup from me |
12:20 | <jgraham> | Right, some quotation marks would have helped |
12:20 | <jgraham> | (the conclusion of the blogpost is actually very reasonable) |
12:20 | <hsivonen> | I haven't really had a good look at Python 3, but from a distance it sure looks XHTML2-ish |
12:22 | Philip` | thought it was obvious that karlcow was quoting, because the sentence started with a capital letter :-p |
12:23 | <karlcow> | Philip`: and there was no sex-related comments, no approximate English or silly poetic license |
12:23 | <karlcow> | which I just achieved in that last sentence |
12:52 | <jgraham> | "LLVM is turning into a real |
12:52 | <jgraham> | option for the web." |
12:54 | <jgraham> | Maybe someone should read http://lists.cs.uiuc.edu/pipermail/llvmdev/2011-October/043719.html |
13:09 | <annevk> | hsivonen: warning for non-labeled content we should definitely do for HTML I think |
13:10 | <annevk> | hsivonen: not sure whether I paid attention at the time, but if Hixie still feels that way I would disagree with him now |
13:21 | <jgraham> | Is it me or is roc's blog dead? |
13:23 | <MikeSmith> | jgraham: not working for me either |
13:24 | <MikeSmith> | "The blog you are looking for could not be found." |
16:00 | <zewt> | hmm, this is fairly bizarre |
16:01 | <zewt> | ff8, in zh-TW Windows (CP950), loading zh-TW HTML with no @lang, is defaulting to a Japanese font |
16:01 | <zewt> | it only uses a zh-TW font if I explicitly set @lang, or use Big5 instead of UTF-8 |
16:02 | <hsivonen> | zewt: is CP950 an exclusively Traditional Chinese encoding? |
16:02 | <zewt> | it's windows's equivalent of Big5, which is what it picks if you select "Chinese (Taiwan)" |
16:03 | <zewt> | it has a separate encoding for simplified chinese |
16:03 | <hsivonen> | zewt: so is the page using CP950 or UTF-8? |
16:03 | <zewt> | but it's even more bizarre to pick a *japanese* font, I'd be less surprised if it mixed up zh-TW and zh-CN |
16:03 | <zewt> | hsivonen: http://zewt.org/~glenn/test-zh-TW-utf-8.html |
16:03 | <hsivonen> | zewt: absent @lang, Unihan in Firefox defaults to Japanese |
16:04 | <zewt> | that's very surprising |
16:04 | <hsivonen> | zewt: boo. that page doesn't declare its encoding |
16:04 | <zewt> | (not to say a bad thing, not depending on the locale like charsets do is a plus, just not what I'd ever expect) |
16:04 | <hsivonen> | zewt: having Web content behavior depend on browser locale is evil |
16:04 | <zewt> | it does, or it did a minute ago |
16:05 | <zewt> | but now apache is being stupid, apparently |
16:05 | <zewt> | hsivonen: yes, but a very common evil |
16:06 | <zewt> | gave apache a kick and my content-type header is back |
16:07 | hsivonen | mumbles about apache not updating etag when headers change |
16:07 | <zewt> | i suppose that so long as in practice, every chinese system also has japanese fonts installed, then that's okay |
16:07 | <zewt> | which is the case in Windows, i believe (all asian fonts are installed as a unit) |
16:21 | <hsivonen> | annevk: fwiw, I think I have an implementation of spec-compliant responseType "json" |
16:24 | <zewt> | okay, now i'm confused: what about Chrome? it also appears to default to Japanese for UTF-8, and it doesn't support @lang at all as far as I know |
16:25 | <zewt> | ... so, other than setting a zh-TW font by name (evil), how does anyone display zh-TW in chrome? |
16:25 | <zewt> | (i expect I'm doing something silly) |
16:26 | <hsivonen> | zewt: please file a bug on Chrome |
16:27 | <zewt> | saying what? no doubt they already know they don't support @lang, but I'm not sure what the expected path is supposed to be currently |
16:27 | <hsivonen> | zewt: saying they should consider @lang when doing font selection |
16:27 | <zewt> | they (chrome + webkit) must already know that :) |
16:28 | <hsivonen> | zewt: they might not. also, they might not know that someone cares |
16:28 | <zewt> | but how's it supposed to work today? are a billion people just hacking around it with explicit font names? |
16:28 | <hsivonen> | these newfangled browsers trying to get away without doing stuff that Gecko has done for a decade |
16:29 | <hsivonen> | zewt: dunno. possibly. |
16:29 | <hsivonen> | zewt: or using parochial encodings |
16:29 | <zewt> | if they don't know by now that every Chinese speaker doesn't want pages displayed in a Japanese font, then I don't think I could convince them |
16:29 | <hsivonen> | zewt: or does Chrome not vary the font for Chinese legacy encodings? |
16:30 | <zewt> | i expect it does for those, though my big5 test is mojibake at the moment; let me see why |
16:30 | <erlehmann> | zewt, dont misunderestimate the chromium team! |
16:31 | <zewt> | heh |
16:31 | <zewt> | well, webkit, really |
16:31 | <zewt> | i sort of feel bad for all the blame the chrome guys get for things that are webkit's fault :P |
16:34 | <zewt> | ah, hold on, it's rendering as zh-CN |
16:34 | <zewt> | (which is hard for my weak western eyes to distinguish from jp) |
16:35 | <erlehmann> | gajin baka! |
16:35 | <zewt> | 確かに |
16:35 | <zewt> | i'm an american; i take *pride* in not being able to read other languages! |
16:35 | <hsivonen> | zewt: huh? aren't Japanese glyph designs generally closer to Traditional Chinese designs than Simplified Chinese designs? |
16:35 | <erlehmann> | western font culture is pig disgusting! |
16:36 | <zewt> | hsivonen: i need to pick a test case with bigger character differences |
16:37 | <erlehmann> | hsivonen, japanese glyph designs are most closely resembled by pokemans, see 🐭 🐮 🐯 🐵 |
16:37 | <erlehmann> | emoji are the new fancy shit! |
16:38 | <erlehmann> | though i question the need for U+1F5FE SILHOUETTE OF JAPAN |
16:38 | <zewt> | unicode jumped the shark with PILE OF POO |
16:38 | <erlehmann> | or U+1F5FC TOKYO TOWER |
16:38 | <erlehmann> | zewt, i heard the poop has eyes on ios! |
16:39 | <erlehmann> | however i cannot confirm due to not running non-free operating systems. *strokes his neckbeard* |
16:39 | <hsivonen> | interesting. according to the comments at http://my.opera.com/ODIN/blog/2011/08/09/introducing-oupeng-a-chinese-opera Opera has a Mini server farm behind the Great Firewall |
16:39 | <karlcow> | http://www.fileformat.info/info/unicode/block/miscellaneous_symbols_and_pictographs/list.htm |
16:40 | <erlehmann> | a mini server farm! like a beowulf cluster of wall warts! |
16:40 | <erlehmann> | karlcow, know any other font besides symbola that can do that? |
16:40 | <erlehmann> | i have symbola installed |
16:40 | <erlehmann> | but it looks weird |
16:41 | <erlehmann> | how does the apple do it? do they have SVG fonts for emoji? |
16:41 | <erlehmann> | hi, cowboy. |
16:42 | <karlcow> | hmmm good question not sure what apple installed on the system I haven't checked |
16:43 | <erlehmann> | the apple confounds me. |
16:44 | <zewt> | okay, found the source of my confusion |
16:44 | <zewt> | http://zewt.org/~glenn/encoding%20utf-8.html renders correctly in FF8 in Win7 (my desktop), but incorrectly in XP (my test VM; zh-TW shows the jp glyph) |
16:46 | <zewt> | chrome on http://zewt.org/~glenn/enc/big5.html shows the zh-CN glyph (which is weird--it should be zh-TW, right?) |
16:49 | <hsivonen> | zewt: did you test zh-Hant and zh-Hans? |
16:49 | <zewt> | nope, don't know anything about those |
16:49 | <zewt> | (not that I know a great deal about zh-anything) |
16:49 | <erlehmann> | zsh! |
17:28 | <zewt> | fyi, probably not surprisingly, ie8 does use the locale to pick the font |
17:28 | <zewt> | (for utf-8) |
17:44 | <Ms2ger> | Are there any websocket api tests already? |
17:46 | <jgraham> | Ms2ger: I don't know of any public ones |
17:47 | <jgraham> | We had some but they also tested the protocol so they need to be updated |
17:47 | <jgraham> | (in general testing websockets is pretty hrad) |
17:47 | <jgraham> | *hard |
17:48 | <jgraham> | Because you need a server component that can give you arbitrary bits on the wire in response to your messages |
17:48 | <Ms2ger> | Oh, MS has some |
17:48 | <jgraham> | to check that the implementation does the right thing API wise when it hits various edge cases |
17:49 | <jgraham> | Ms2ger: Just "WebSocket in window" type tests? |
17:49 | <Ms2ger> | new WebSocket(>2 arguments) |
17:50 | <Ms2ger> | (Which happens to throw in Gecko) |
17:51 | <jgraham> | Hmm, these tests do require a server |
17:52 | <jgraham> | http://html5labs-interop.cloudapp.net/ |
17:52 | <jgraham> | http://html5labs-interop.cloudapp.net/wsdemo.html ... "To view this content please install Silverlight" |
17:53 | <Ms2ger> | \o/ |
17:53 | jgraham | will look slightly confused for a bit and then get on with something more useful |
17:55 | <zewt> | please install a proprietary browser plugin like it's 1999 |
17:56 | <Ms2ger> | Interesting, MS's W3C-submitted tests support MozWebSocket |
18:06 | <Ms2ger> | "Undefined variable: WebSocket" |
18:06 | Ms2ger | glares towards Opera |
18:55 | <jgraham> | Ms2ger: Didn't your moher teach you it is rude to glare? |
18:56 | <Ms2ger> | My moher didn't, no |
18:59 | <jgraham> | What about your mother? did she teach you that it's rude to mock people's typos? |
19:01 | <shepazu> | or did you teach her how to suck eggs? |
19:01 | <Ms2ger> | She taught that it's fair game to laugh at doctors in astrophysics |
19:05 | <jgraham> | Dammit. |
19:06 | <jgraham> | I somehow ended up in a ridiculed minority. |
19:06 | <JonathanNeal> | Were any of you guys able to check this out? https://github.com/jonathantneal/html5css/blob/master/style.css it's the ua css i wrote up based on the html5 spec. |
19:26 | <Hixie> | annevk: btw, looks like your multipage generator script isn't deleting old sections |
19:26 | <Hixie> | annevk: e.g. we still have content-models.html and apis-in-html-documents.html and even video.html which haven't been updated in months |
19:27 | <Hixie> | (pretty sure that's not a problem on my end because i delete the entire directory when regenerating it) |
19:32 | <Ms2ger> | Hixie, there's a bug about that |
19:32 | <Ms2ger> | (At least for the W3C copy |
19:32 | <Ms2ger> | ) |
19:40 | <zewt> | starting to see somewhat more pages catching and breaking major browser hotkeys |
19:40 | <zewt> | twitter and now AWS's console |
20:06 | <zewt> | http://zewt.org/~glenn/encoding%20utf-8.html strange; IE9 shows zh-CN in the lang=jp case (firefox shows all three fine, and IE9 does get zh-TW and zh-CN right) |
20:11 | <zewt> | but it gets it right with shift-jis (or else all of Japan would be yelling at them) |
20:11 | <zewt> | it feels like some browser vendors are actively trying to prevent asia (especially Japan) from using UTF-8 |
20:15 | <zewt> | aha, it wants lang=ja (other browsers accept lang=jp, so that's what I ended up typing) |
20:16 | <smaug____> | annevk: ping |
20:45 | <Hixie> | heycam|away: does SVG have path objects? |
20:46 | <Hixie> | it's ridiculous how many people are asking for ellipses suddenly |
20:54 | <Philip`> | Maybe they thought that asking for ellipses is the most effective way to get a non-terrible API for drawing circles |
20:55 | Philip` | thinks APIs that require you to remember to use the argument 2*Math.PI don't count as non-terrible |
20:58 | <TobiX> | Philip`: As long as you don't have to use LOGO to draw a circle ;) |
21:22 | <TabAtkins> | tantek: You still need me for anything? (Returning the ping from a while ago without reading anything from near it.) |
21:24 | <zewt> | did the unsubscribed-poster setting change on the list? curious how that spam got through |
21:24 | <TabAtkins> | It happens occasionally. |
21:28 | <gavinc> | are there any standalone implementations of the UTF-8, with error handling decoding? |
21:31 | <tantek> | TabAtkins - I think it may have been about my documentation of the proposal to allow space instead of t in date-and-time microsyntaxes |
21:31 | <tantek> | it appears to be resolved now in WHATWG |
21:32 | <TabAtkins> | kk |
21:32 | <tantek> | however you're still encouraged to review and edit/improve the research documentation: http://wiki.whatwg.org/wiki/Time_element#permit_space_instead_of_T_in_datetimes |
21:35 | tantek | is proceeding with attempting to grow consensus on time enhancements across WHATWG and HTMLWG specs. |
21:44 | <jgraham> | gavinc: What do you mean by "implementation" in this case? A C library that will take UTF8 and give you back (some string format)? |
21:45 | <jgraham> | (not that I know the answer in any case, but...) |
21:45 | <zewt> | i assume he means the particular rules defined by http://www.whatwg.org/specs/web-apps/current-work/multipage/infrastructure.html#utf-8 |
21:45 | <gavinc> | zewt: Yes |
21:45 | <gavinc> | jgraham: or even just an example that isn't stuck way way down inside a browser |
21:46 | <zewt> | i'd imagine the utf8-to-codepoints code is probably not very complex ... maybe check webkit? |
21:46 | <zewt> | (webkit tends to have code which is easier to read in isolation than gecko, in my brief experiences with both) |
21:47 | <gavinc> | zewt: well, most languages have UTF-8 parsers for reading byte streams. I guess I was wondering if anyone had already implemented utf-8 with error recovery as one of those |
21:47 | <zewt> | well, they might implement error recovery, but in different ways |
21:47 | <zewt> | i assume you want those particular rules |
21:48 | <gavinc> | zewt: Right, so specifically THOSE rules |
21:48 | <gavinc> | Java, Python and Ruby don't do any error recovery, they just explode on invalid utf-8 |
21:49 | <gkellogg> | gavinc: I can't say that for sure about Ruby, as the case I'm looking at doesn't actually look like UTF-8 |
21:53 | Philip` | uses the code at http://trac.wildfiregames.com/browser/ps/trunk/source/lib/utf8.cpp , which can either abort and report errors or else replace invalid stuff (for some value of 'stuff') with U+FFFD |
21:53 | <zewt> | here's (some version of) webkit's utf-8 decoding: http://gitorious.org/webkit/sedkit/blobs/94b6ef65ee422c62fb91ca9be76ee2fa5310d718/Source/WebCore/platform/text/TextCodecUTF8.cpp |
21:54 | <zewt> | not as simple as an "example code" standalone might be, but not complex |
21:55 | <zewt> | (i don't know if webkit actually implements utf-8 error handling exactly per spec, but I'd hope so) |
21:55 | <zewt> | also I don't know what version that is (I just googled for the source file of whatever WebKit source version is lying around in my ~) |
21:56 | <gavinc> | zewt: yeah, specifically line 169 http://gitorious.org/webkit/sedkit/blobs/94b6ef65ee422c62fb91ca9be76ee2fa5310d718/Source/WebCore/platform/text/TextCodecUTF8.cpp#line169 |
21:56 | <zewt> | decodeNonASCIISequence looks like the fast path with no error handling, check TextCodecUTF8::decode below it |
21:57 | <gavinc> | No, it's still doing the error handling as specified |
21:58 | <zewt> | it's the asserts that make it look dubious |
21:59 | <zewt> | haven't squinted very hard at it |
22:00 | <zewt> | there's still enough platform stuff in there that it may be more work to extract it to use it independently than to just rewrite it |
22:00 | <Velmont> | gavinc: chardet python library is using charset sniffing code/algorithm from browsers at least. |
22:02 | <gkellogg> | ww: so, it seems that the problem is specific to Ruby 1.9. Running on a different version, I seem to be able to parse (without actually generating triples) in about 22 secs. I'll try making triples next |
22:02 | <gavinc> | gkellogg: mischan ;) |
22:03 | <gkellogg> | :) |
22:03 | <gavinc> | of course chardet 410s... sigh, yeah |
22:05 | <gavinc> | Velmont: chardet just does the detection, not the bytes-> chars :( |
22:07 | <zewt> | ->codepoints :) |
22:07 | <Wilto> | annevk: Ping. |
22:15 | <gavinc> | zewt: yes those :P |
22:22 | <jgraham> | Hmm, I'm pretty sure the python codecs module has some possibility for error recovery, but I doubt it matches this spec |
22:22 | <gavinc> | reading now, it's C with goto, may take a few passes at reading |
22:23 | <jgraham> | errors='replace': replace malformed data with a suitable replacement marker, such as '?' or '\ufffd' |
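For reference, the difference between Python's default strict decoding and the `errors='replace'` mode quoted above can be sketched as below. This is a minimal illustration only; it makes no claim that Python's replacement behaviour matches the HTML spec's UTF-8 error-handling rules in every case, and the helper names are hypothetical.

```python
def decode_strict(raw):
    """Decode UTF-8 with the default handler, which raises on malformed input."""
    return raw.decode("utf-8")  # errors="strict" is the default


def decode_lenient(raw):
    """Decode UTF-8, substituting U+FFFD for malformed input."""
    return raw.decode("utf-8", errors="replace")


data = b"A\xffB"  # the byte 0xFF never appears in well-formed UTF-8

try:
    decode_strict(data)
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

lenient = decode_lenient(data)
```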
22:23 | <gavinc> | yeah, I think it gets darn close to what HTML5 says |
22:23 | <zewt> | jgraham: would be interesting if they could be convinced to converge on html's rules |
22:24 | <gavinc> | Was also looking if there were any test cases yet for that algorithm |
22:24 | <jgraham> | zewt: I don't know how or if it is different |
22:24 | <zewt> | i don't either |
22:25 | <gavinc> | one example, it doesn't replace surrogates |
22:25 | <zewt> | or if they're prevented from changing it for compat, add an errors='html' mode |
22:26 | <gavinc> | exactly :D |
22:27 | <gavinc> | was hoping that it could be done by registering a new codec, but it looks like that would be staggeringly slower |
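A lighter-weight mechanism than a whole new codec is a custom error handler registered with `codecs.register_error`. The sketch below is an assumption-laden illustration of that hook, not the approach gavinc tried: the handler name `html-sketch` is hypothetical, and the handler only mirrors plain U+FFFD replacement. It does not implement the HTML spec's UTF-8 error-handling rules, which would need care about exactly which bytes each replacement consumes.

```python
import codecs


def replace_and_resume(exc):
    """Error handler: emit U+FFFD and resume after the bytes the decoder rejected."""
    if not isinstance(exc, UnicodeDecodeError):
        raise exc  # this sketch only handles decoding errors
    # exc.start and exc.end delimit the offending bytes in exc.object.
    return ("\ufffd", exc.end)


# "html-sketch" is a made-up handler name used for illustration only.
codecs.register_error("html-sketch", replace_and_resume)


def decode_with_sketch(raw):
    """Decode UTF-8 using the custom handler registered above."""
    return raw.decode("utf-8", errors="html-sketch")


result = decode_with_sketch(b"A\xffB")
```

The handler runs in Python for every malformed sequence, so heavily corrupted input would decode more slowly than with the built-in `replace` handler.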
22:30 | <jgraham> | gavinc: Where is the actual decoder in python? |
22:31 | <jgraham> | I mean actual python decoder |
22:31 | <jgraham> | in c |
22:31 | <jgraham> | Well I mean something. I'm not sure either way round was clear |
22:33 | <zewt> | PyUnicode_DecodeUTF8Stateful, I think |
22:33 | <zewt> | at least in 3.1.3 |
22:34 | <gavinc> | PyUnicode_DecodeUTF8Stateful |
22:34 | <gavinc> | yeah |
22:34 | <zewt> | i win |
22:34 | <gavinc> | I was reading it :P |
22:39 | <gavinc> | http://hg.python.org/cpython/file/17ceebc61b65/Objects/unicodeobject.c#l2555 |
22:54 | <gavinc> | Well, the example in HTML comes out as A��B�C☺�... so perhaps some test cases are in order :D |
23:00 | <gavinc> | https://gist.github.com/1445180 |
23:02 | <annevk> | Wilto: ? |
23:03 | <annevk> | hsivonen: cool |
23:03 | <Wilto> | Hey man, Mike Taylor mentioned I might want to get ahold of you—I’ve been working with Scott Jehl et al. on the whole “responsive images” problem. |
23:03 | <annevk> | Hixie: that can be fixed, I'm currently just overwriting files |
23:03 | <Wilto> | We whipped up that canvas-based demo the other day. |
23:03 | <Wilto> | ( http://scottjehl.com/imgwithfallback.html ) |
23:04 | <annevk> | ah cool |
23:04 | <annevk> | yeah brucel keeps bugging me about it |
23:04 | <Wilto> | He doesn’t know it, but he might have just become my hero. |
23:04 | <Wilto> | I’ve been obsessing over this stuff for months. |
23:04 | <annevk> | so personally I'm not sure the requirements from authors are quite clear enough for a new markup feature |
23:04 | <annevk> | that's my a |
23:05 | <annevk> | my b is that I sort of feel that long term bandwidth will be less of an issue and everything will be 300ppi |
23:05 | <annevk> | and new markup only hits long term |
23:05 | <Hixie> | annevk: if you would that'd be great. i had someone send me a link to the webrtc section today, without them realising it was dead |
23:05 | <annevk> | Hixie: so probably tomorrow around this time it will be fixed |
23:06 | <annevk> | Hixie: unless I get to it before going to bed somehow |
23:06 | <Hixie> | annevk: roger, thanks |
23:09 | <annevk> | Wilto: both a and b might be false; there's not really sufficient information imo other than that some people are hitting this problem today |
23:10 | <Wilto> | annevk: Granted, yeah. And believe me, I have in no way been banking on “we need a new element”—I assume that’s a phrase that’s thrown around a lot when people first run into an issue like this. |
23:10 | <annevk> | Wilto: that's the brucel stance :) |
23:10 | <Wilto> | Thing is—and again, not something I say easily—this just doesn’t feel solvable on the front end. |
23:11 | <Wilto> | I would absolutely love to write up the history of the issue. I’m sure you guys have been getting it in fits and starts. |
23:12 | <Wilto> | It is a sordid tale of cookies and dynamically-injected base tags. |
23:13 | <zewt> | gar, quotostrophes |
23:14 | <Wilto> | The general thinking is that a reliable fallback pattern already exists in video/audio/canvas. |
23:15 | <zewt> | but that's a different pattern |
23:15 | <heycam> | Hixie, yes SVG does have path objects. but they, like much of the SVG DOM, suck. :) |
23:15 | <annevk> | Wilto: yeah I know, but that is mostly for different formats, although admittedly media queries are there (not sure if they are implemented though) |
23:15 | <Wilto> | Well, in terms of not breaking things that already exist. |
23:16 | <heycam> | Hixie, so we (SVGWG) want to improve the SVG DOM in the SVG2 timeframe (so some time over the next year) |
23:16 | <Hixie> | heycam: Are these the DOM nodes for some sort of path element, or are they separate Path objects? |
23:16 | <heycam> | Hixie, no, separate path objects |
23:16 | <heycam> | Hixie, however unlike say SVGPoint/SVGMatrix you can't create a disconnected SVGPath |
23:16 | <heycam> | Hixie, they only exist to reflect path data on elements at the moment |
23:16 | <Hixie> | k |
23:16 | <Hixie> | i'm gonna guess i'll be coming up with a new interface, but i'll take a look to see what can be reused... |
23:17 | <heycam> | Hixie, ok if you get to it before we do (which sounds likely) I'll take a look at what you come up with and see where to go from there |
23:17 | <Hixie> | i'm expecting to do work on canvas early next year |
23:18 | <Hixie> | one of the things is adding the path primitives |
23:18 | <Hixie> | would be great to have your feedback when i do, of course |
23:18 | <heycam> | yep sure |
23:18 | <Wilto> | annevk: So, in your opinion, is this idea a total non-starter? |
23:18 | <annevk> | Wilto: no, all I have is some thoughts |
23:18 | <Hixie> | heycam: i dunno how similar they'll be. I mean, some of the stuff I'm looking at doing is the focus ring stuff, hit testing, etc. |
23:19 | <Wilto> | Or would it help if I wrote up, uh, anything. Like I said: desperate times. |
23:19 | <annevk> | Wilto: what we need for standards is mostly data |
23:19 | <annevk> | Wilto: which indeed usually starts with someone outlining the problem and such |
23:19 | <Hixie> | heycam: explicit stroking and filling, adding text to a path, create a path with text along another path |
23:19 | <heycam> | Hixie, but these'll be methods on the canvas rather than the path object I would expect? |
23:19 | <annevk> | Wilto: in this case someone might have already done something in that direction, I forgot, but not recently |
23:19 | <Hixie> | heycam: dunno |
23:19 | <annevk> | Wilto: you can take a few hints from http://wiki.whatwg.org/wiki/FAQ#Is_there_a_process_for_adding_new_features_to_a_specification.3F |
23:20 | <heycam> | Hixie, create a path with text along another path eh :) yes we're interested in a "convert text to its outline paths" too (probably including text along a path) |
23:20 | <annevk> | Wilto: it's not a literal process, but it contains the steps we usually go through |
23:20 | <Wilto> | annevk: No—we’ve got lots of notes on the subject, but the official documentation is pretty much “ask me.” I just wanted to check in with you to see if I was completely off the rails, y’know? |
23:20 | <annevk> | Wilto: okay |
23:21 | <annevk> | Wilto: so no, you're not, a write up would be great |
23:21 | <Wilto> | Excellent. I appreciate you taking the time, man! |
23:21 | <annevk> | Wilto: we can't really promise anything, but it's valuable to have that information |
23:21 | <Wilto> | Oh, no, no promises expected. It doesn’t hurt to have it all down on paper somewhere anyway. |
23:22 | <annevk> | Wilto: even if we decide not to do it in the end, at least it's more reasoned than a hunch from someone |
23:22 | <gavinc> | The overlong utf-8 sequence \xC0\xAF (/) should produce a SINGLE U+FFFD character; is that the correct reading of http://www.whatwg.org/specs/web-apps/current-work/multipage/infrastructure.html#utf-8 ? |
23:22 | <annevk> | Wilto: right; cool! |
23:22 | <annevk> | and with that I'm gonna get some sleep :) |
23:22 | <Wilto> | Have a good one! |
23:22 | <annevk> | nn |
23:23 | <zewt> | gavinc: yep; a whole overlong sequence is replaced with a single replacement character |
23:24 | <gavinc> | zewt: Guess what Java and Python don't do? ;) |
23:24 | <zewt> | lots of software doesn't deal with overlong sequences adequately |
23:24 | <zewt> | i don't think vim does |
23:25 | <gavinc> | yeah, iconv doesn't seem to :\ |
23:25 | <gavinc> | Not sure I'm calling it correctly however |
23:25 | <zewt> | it's not always wanted--in some cases faster performance with fewer checks is the goal |
23:26 | <zewt> | python seems to throw an error on that sequence |
23:26 | <gavinc> | print '\xc0\xaf'.decode('utf-8', 'replace') |
23:26 | <zewt> | >>> '\xc0\xaf'.decode("utf-8") |
23:26 | <zewt> | in replace mode i'm not sure |
23:26 | <zewt> | though |
23:26 | <gavinc> | �� :D |
23:26 | <zewt> | ah right, that's legal, just a different recovery mode |
23:27 | <gavinc> | close, seems to meet all the other HTML5 rules |
23:28 | <zewt> | vim doesn't detect it and converts it directly :( |
23:29 | <gavinc> | bad vim! |
23:29 | <zewt> | well |
23:29 | <zewt> | vim keeps the original binary sequence unchanged; what's wrong is that it renders it |
23:29 | <gavinc> | as a slash? |
23:29 | <zewt> | yeah |
23:30 | <zewt> | probably just a naive utf-8 decoder |
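[Editor's note] The exchange above about \xC0\xAF can be reproduced with a small sketch. It is written for Python 3 (the snippets in the log use Python 2 syntax), and the helper names naive_decode_two_byte and is_overlong_two_byte are hypothetical, made up for illustration. The naive function only mimics the kind of unchecked decoding zewt guesses at; it is not vim's actual code.

```python
def naive_decode_two_byte(data: bytes) -> str:
    """Decode a 2-byte UTF-8 sequence with no validity checks at all."""
    lead, trail = data[0], data[1]
    # Take the low 5 bits of the lead byte and the low 6 bits of the trail byte.
    return chr(((lead & 0x1F) << 6) | (trail & 0x3F))


def is_overlong_two_byte(data: bytes) -> bool:
    """True if data is a 2-byte form that encodes a code point below U+0080."""
    lead, trail = data[0], data[1]
    if (lead & 0xE0) != 0xC0 or (trail & 0xC0) != 0x80:
        return False  # not shaped like a 2-byte sequence
    return (((lead & 0x1F) << 6) | (trail & 0x3F)) < 0x80


overlong_slash = b"\xc0\xaf"

# An unchecked decoder turns the overlong form into a plain slash.
naive_result = naive_decode_two_byte(overlong_slash)

# Python's built-in decoder rejects it in strict mode...
try:
    overlong_slash.decode("utf-8")
    strict_raised = False
except UnicodeDecodeError:
    strict_raised = True

# ...and in replace mode substitutes one U+FFFD per offending byte.
replaced = overlong_slash.decode("utf-8", "replace")

print(repr(naive_result), strict_raised, [hex(ord(c)) for c in replaced])
```

A decoder that skips the overlong check is faster but lets "/" slip past filters that only look for the byte 0x2F, which is why the sequence comes up in the discussion.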
23:40 | <Hixie> | kennyluck: I need more context for some of your bugs! :-) (e.g. bug 14890) |
23:41 | <Hixie> | haha |
23:42 | <Hixie> | bug 15051 ends with "omg, you wannbe w3c" |
23:42 | <Hixie> | i'm pretty sure if there's one thing i don't want to be, it's the w3c. :-P |
23:42 | <Hixie> | dunno about you guys. :-P |
23:42 | <zewt> | hixietf |