00:14
<Hixie>
OK!
00:14
Hixie
flexes fingers
00:14
<Hixie>
character encoding spec, where are you so i can (a) implement you and then (b) fix you good and well.
00:52
<annevk>
I was planning on sleeping, but then I came across this: http://steinbaugh.com/asides/ems-layout/ "but I’m not planning anything new until HTML 5 or XHTML 2 gets finalized" someone should tell him
00:52
<Hixie>
xhtml2 might be finalised relatively soon
00:53
<Hixie>
so step 3 of the "algorithm for extracting an encoding from a Content-Type" says "If the next six characters are not 'charset', return nothing"
00:53
<Hixie>
anyone know if i meant that to be a case-insensitive check?
00:54
<Philip`>
Hmm, it still says "six"?
00:55
<Hixie>
uh yeah i should fix that while i'm at it
00:55
<Hixie>
but that wasn't the point :-)
01:03
<Philip`>
Lots of people use capital CHARSET so it would be most useful if it was case insensitive
01:03
<Hixie>
good to know
01:04
<Philip`>
Oh, is this for HTTP Content-Type rather than <meta>?
01:04
<Hixie>
it's for "text/html;charset=foo"
01:04
<Hixie>
whether in Content-Type or in <meta content="">
01:04
<Hixie>
(not <meta charset="">)
01:05
<Philip`>
Ah, okay
01:05
<Hixie>
still case-insensitive?
01:06
<Philip`>
In the HTTP header, I see 3 CHARSET, 223 Charset, 25451 charset, but there are far more CHARSETs in <meta>
01:06
<Hixie>
k
01:06
<Hixie>
case-insensitive it is
01:06
<Philip`>
(I don't currently have a nice way of counting contents of <meta> though)
01:06
<Hixie>
i'm setting up a script to parse the docs and find that
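A minimal sketch of the case-insensitive check settled on above, for strings like "text/html;charset=foo" whether they come from Content-Type or from <meta content="">. The function name, the quote handling, and the stop-at-semicolon rule are assumptions for illustration, not the spec's wording.

```python
import re


def extract_charset(content: str):
    """Return the charset named in a Content-Type-style string, or None."""
    # Case-insensitive search, so "CHARSET=" and "Charset=" both match.
    match = re.search(r"charset\s*=\s*", content, flags=re.IGNORECASE)
    if match is None:
        return None
    rest = content[match.end():]
    if rest[:1] in ('"', "'"):
        # Quoted value (assumed behaviour): take text up to the matching quote.
        end = rest.find(rest[0], 1)
        return rest[1:end] if end != -1 else None
    # Unquoted value: stop at whitespace or a semicolon.
    value = re.match(r"[^\s;]*", rest).group(0)
    return value or None


print(extract_charset("text/html;charset=foo"))
print(extract_charset("text/html; CHARSET=UTF-8"))
```

The value is returned as written; mapping it to a known encoding name is a separate step that this sketch leaves out.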
14:00
<annevk>
hmm
14:01
<annevk>
is it worth responding to the global href thread?
14:05
<Philip`>
If you have good arguments then it's probably useful to respond and to update the FAQ to be clearer about those points
14:05
<annevk>
dunno
14:06
<annevk>
guess i'll leave it for now
14:07
<annevk>
i think <a> should be allowed to wrap around "block level elements" and i think that adding href everywhere complicates processing models too much for no real benefit
14:09
<annevk>
using script for older user agents is a valid point though
14:10
<Philip`>
Why does it make the processing model any more complicated than it already has to be to handle global onclick and :hover?
14:13
<annevk>
because every element gains an additional default click event handler
14:14
<annevk>
(that might conflict with existing click event handlers (in case of form controls) so you need to make choices how to handle those, etc.)
14:14
<annevk>
more ways to do something always makes things more complicated, and increases QA cost, etc.
14:18
<Philip`>
Ah, sounds like the issue is that e.g. a <button> currently has one default click handler and n DOM click event handlers, and <button href> would change that to be >1 default click handler rather than changing it to n+1 DOM click event handlers
14:19
<Philip`>
(and so then things like preventDefault would become confusing, because there's no longer just one default handler)
14:20
<annevk>
it's also not clear if submitting the form and following a link at the same time makes any sense
14:22
<Philip`>
Another issue is you'd sometimes want the other link-related attributes like rel and target and ping, and you'd have media and hreflang and type too for consistency, which would get messy
15:24
<Philip`>
http://www.google.com/m/search?q=%ef%bf%bf
15:33
<zcorpan>
Philip`: it amuses you doesn't it? :)
15:36
<Philip`>
zcorpan: It would be more fun if it wasn't so trivial to find exactly the same bugs in every XHTML site :-)
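For context, %ef%bf%bf is the UTF-8 encoding of U+FFFF, which the XML 1.0 Char production excludes, so a site that echoes the query into an XHTML page presumably ends up with a document that is not well-formed. A rough sketch of a check a site could run before echoing input; both function names are made up for illustration.

```python
from urllib.parse import unquote_to_bytes


def is_xml_char(cp: int) -> bool:
    """True if the code point matches the XML 1.0 Char production."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def query_is_xml_safe(percent_encoded: str) -> bool:
    """Decode a percent-encoded UTF-8 value and test every character."""
    text = unquote_to_bytes(percent_encoded).decode("utf-8")
    return all(is_xml_char(ord(ch)) for ch in text)


print(query_is_xml_safe("%ef%bf%bf"))
print(query_is_xml_safe("hello"))
```

The sketch assumes the input is valid UTF-8; malformed bytes would raise UnicodeDecodeError and need their own handling.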
15:43
<Lachy>
the problem is that so few developers understand, or even bother to consider, character encoding issues
15:44
<Lachy>
the bigger problem is that character encodings are too complex for most people to understand
15:45
<annevk>
hmm, why do I get "invalid character" in Opera for http://www.google.com/m/search?q=%ef%bf%bf but the same Acid3 test still fails?
15:49
<Camaban>
I agree with that Lachy, I've never found anything talking about character encoding that I could understand terribly well. I can look up and find various bits of info about doctypes and stuff, but good, understandable character encoding info seems a lot harder to find
15:55
<annevk>
ALA prefers IE-propaganda :o
16:03
<Lachy>
Camaban, search for "Guide to Unicode" in google, then read the first result
16:03
<Lachy>
http://lachy.id.au/log/2004/12/guide-to-unicode-part-1
16:06
<Camaban>
will have a look :)
16:23
<Camaban>
Lachy: sorry, but that comes under the heading of 'stuff I struggle to understand', and I've only read the first couple of paragraphs
16:23
<Lachy>
Camaban, really? I was told by so many people that it was really easy to understand
16:23
<Camaban>
Since version 1.1, the Unicode standard has remained fully compatible with ISO/IEC 10646: Universal Multiple-Octet Coded Character Set. The ISO/IEC 10646 standard defines a character repertoire and character code points (or code positions), as well as two character encodings, UCS-2 and UCS-4, allowing for up to 2^32 code points.
16:24
<Camaban>
that means absolutely nothing to me
16:24
<Lachy>
it gets easier
16:24
<Camaban>
I'll have to have another go at getting further into it when I have a bit more time then
16:26
<annevk>
Camaban, seen http://www.joelonsoftware.com/articles/Unicode.html already?
16:27
<Camaban>
annevk: no, looks like perhaps I didn't know what to search for :)
16:27
<Lachy>
if you read that one from Joel, then you have to read this one first http://ln.hixie.ch/?start=1066145333&count=1
16:28
<annevk>
Camaban, I think that one is really good, and pretty accessible too
16:29
<Camaban>
from a quick scan, it looks so
16:30
<Camaban>
as a guy who generally codes up HTML/CSS, I tended to search for stuff about character encoding to check what I should be putting in the HTML to make it work properly, the idea of actually needing to search for, and find out about unicode hadn't occurred to me
16:31
<annevk>
it explains encodings further down
16:31
<Camaban>
yeah, I see that, I guess it needs a few more links to come up in google better :)
17:41
<met_>
annevk http://www.w3.org/TR/2008/WD-XMLHttpRequest2-20080225/ HTML 5 (work in pgoress) s/pgoress/progress
18:36
<annevk>
met_, thanks, fixed
18:38
<met_>
annevk, is it possible to make such changes in a released TR document?
18:38
<met_>
or there is some w3c policy?
18:40
<annevk>
TR docs are snapshots
18:40
<annevk>
it will be fixed in the next snapshot
18:41
<annevk>
sometimes TR docs are "edited in place", but going through that trouble for a Working Draft is not worth it
18:41
<met_>
ok
19:26
<zcorpan>
<!doctype html public "-//w3c//dtd html 4.0 transitional//en" " > appeared 23 times in Philip`'s data
19:27
<zcorpan>
though don't seem too broken in ie because they get terminated at <html lang="..."> or so
19:29
<zcorpan>
looking at the pages it is getting increasingly clear that ignoring the last two characters in the fpi results in better web compat
19:29
<Philip`>
The "//en" characters?
19:29
<zcorpan>
yeah
19:31
<zcorpan>
pages with <!doctype html public "-//w3c//dtd html 4.0 transitional//de"> need quirks mode
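One way to read the idea of ignoring the language code at the end of the FPI is as a case-insensitive prefix match on the public identifier. This is only a sketch: the prefix tuple holds the single identifier mentioned above, and the function name is hypothetical.

```python
# Prefixes that select quirks mode regardless of the trailing language code.
# Only the identifier discussed above is listed; a real list would be longer.
QUIRKS_PREFIXES = ("-//w3c//dtd html 4.0 transitional//",)


def triggers_quirks(public_id: str) -> bool:
    """True if the public identifier starts with a known quirks prefix."""
    return public_id.lower().startswith(QUIRKS_PREFIXES)


print(triggers_quirks("-//W3C//DTD HTML 4.0 Transitional//EN"))
print(triggers_quirks("-//w3c//dtd html 4.0 transitional//de"))
```

Matching on a prefix means "//en" and "//de" are treated alike, which is the behaviour argued for here; the system identifier is not considered in this sketch.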
19:43
<zcorpan>
aha!
19:43
<zcorpan>
http://www.quintomiglio.com/ is the first that renders better in standards mode in opera and firefox
19:44
<zcorpan>
(gets quirks in opera, standards in firefox)
19:44
<zcorpan>
i had found 14 before that one where the opposite is true
19:45
<zcorpan>
(and a bunch that would render pretty much the same in quirks and no-quirks)
19:51
<annevk>
man, what did I do to this Garett Smith
19:51
<annevk>
Garrett*
20:34
<Hixie>
annevk: ?
20:36
<annevk>
yo
20:36
<annevk>
ignore my www-archive e-mail please
20:36
<Hixie>
the acid3 one?
20:36
<Hixie>
i saw your reply, was already ignoring both :-)
20:36
<Hixie>
i was wondering what made you wonder what you did to garrett
20:37
<annevk>
he seems so hostile
20:37
<Hixie>
right, but which thread?
20:38
<annevk>
http://lists.w3.org/Archives/Public/www-style/2008Feb/thread.html#msg274
20:39
<Hixie>
oh, www-style
20:39
<Hixie>
i don't read that anymore
20:40
<annevk>
i do
20:40
<annevk>
I'm still interested in CSS and nobody else is doing it
20:41
<Hixie>
yeah
20:41
<Hixie>
if i wasn't doing html5 i probably would be
20:41
<Hixie>
though i'd be sorely tempted to start a competing organisation to standardise css properly, with a real community, etc, like the whatwg
20:42
<Hixie>
lord, you're right, wtf did you do to this guy
20:44
<jwalden>
existed
20:45
<othermaciej>
he seems kind of angry
20:47
Hixie
mumbles something about being glad this poem discussion is going on on public-html and not whatwg
20:47
<Hixie>
i like Philip Taylor's comment:
20:47
<Hixie>
"I'm reluctant to enter into this debate, yet
20:47
<Hixie>
feel strangely compelled so to do."
20:48
<annevk>
i tried reading that wiki page, but it was very unclear and very big
20:48
<annevk>
so i gave up and did something else
20:49
<Hixie>
i fear that when it comes time to go through the wiki pages, a lot of them will be getting a response of "the problem is not described well enough for me to address this issue"
20:50
<annevk>
surprisingly many begin with describing "the solution"
20:50
<Lachy>
that's because describing a solution is much easier than describing what the problem is
20:51
<Hixie>
yup
20:51
<Hixie>
but as editor i can't care about solutions without understanding the problem
20:51
<Hixie>
since i can't evaluate a solution without knowing the problem
20:51
<Lachy>
I gave up on the HTMLWG wiki a long time ago, when I realised, despite several attempts to nudge them in the right direction, their content remained mostly useless
20:53
<annevk>
as with usability, it seems better to learn what people need than what they want
20:54
<Philip`>
http://tug.ctan.org/cgi-bin/filenameSearch.py?filename=%00 - http://www.hipocampo.org/buscar.asp?search=%01 - http://virtueventures.com/services.php?page=6%ef%bf%bf - this really is too easy
20:54
<Lachy>
I really don't understand what I did to provoke this hostile response. http://www.w3.org/mid/47C5A056.4080108⊙mn - I thought all I did was try to clarify what the spec was saying to someone who misunderstood it
20:55
<Hixie>
he'd already told me off for not understanding him
20:55
<Hixie>
anyway as far as i'm concerned that issue is resolved
20:55
<Hixie>
since the term "prose" is no longer in the spec
20:56
<Lachy>
yeah, him telling you off is understandable. Everyone seems to do that :-)
20:57
<annevk>
you're part of Hixie's posse Lachy, deal with it :p
20:57
<Hixie>
:-)
21:02
gsnedders
hides from the Cabel
21:02
<Lachy>
gsnedders, do you have a fear of cables or did you mean cabal?
21:02
<gsnedders>
Lachy: cabal
21:08
<jwalden>
!summon zcorpan
21:11
Hixie
disillusions people on the semantic web in html5 http://realtech.burningbird.net/semweb/semantic-web-dull-as-dishwater-edition/#comment-372
21:13
<Lachy>
Hixie, no comment from you on that page
21:14
<Lachy>
is it awaiting approval?
21:15
<Hixie>
yes
21:21
Dashiva
wonders how they go from "People make links because they're useful, search engines add extra value" to "People add metadata because... there's no benefit whatsoever"
21:22
<Hixie>
feel free to comment also
21:22
<Hixie>
my comment is there now btw
21:23
<Hixie>
http://www.zingermans.com/ is great
21:23
<Hixie>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-16" />
21:24
<Dashiva>
I'll leave commenting to the official opera dudes
21:31
<Lachy>
Dashiva, non-opera people can comment too. It saves us the work :-)
21:32
<Dashiva>
Yeah, but I might say something to make chaals come after me
21:34
<Lachy>
why would chaals come after you? you don't work for Opera, do you?
21:34
<Dashiva>
I have worked for opera three summers in a row now, and I'm several kinds of volunteer rest of the year :)
22:03
<Dashiva>
You got a reply, Hixie :)
22:03
<Hixie>
woah, big reply
22:05
<Dashiva>
I wonder if links going 404 counts as data rot or not as far as bananas go
22:20
<annevk>
i don't think chaals has a bias towards Opera employees
22:20
<annevk>
btw
22:24
<annevk>
lol, TAG gets involved in ARIA
22:29
Hixie
searches for the mail anne refers to in his "Unlikely to be useful" folder
22:31
<annevk>
Hixie, it now changes UTF-16 to UTF-8 in two separate places, that's the idea?
22:31
<Hixie>
yeah. it was almost three, but i reduced it to two
22:32
<Hixie>
not sure how to merge the last two, but if it gets more complex i'll abstract it out into a separate "set of steps"
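A rough sketch of the UTF-16 to UTF-8 substitution as a single shared step. The reasoning is that a <meta> declaration that could be read as ASCII-compatible bytes cannot really be in UTF-16. The function name and the list of UTF-16 labels are assumptions, not the spec's text.

```python
# Labels treated as UTF-16 here; the exact set is an assumption.
UTF16_LABELS = ("utf-16", "utf-16le", "utf-16be")


def effective_meta_encoding(declared: str) -> str:
    """Map a charset declared in <meta> to the encoding actually used."""
    name = declared.lower()
    if name in UTF16_LABELS:
        # The declaration was readable as ASCII-compatible bytes,
        # so the document cannot really be UTF-16.
        return "utf-8"
    return name


print(effective_meta_encoding("UTF-16"))
print(effective_meta_encoding("windows-1252"))
```

Keeping the substitution in one helper is one way to avoid repeating it in two places.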
22:33
<annevk>
oh also, if you're testing charset and stuff, apparently two out of four browsers require http-equiv=content-type
22:33
<Hixie>
which two?
22:33
<Hixie>
http://www.hixie.ch/tests/adhoc/html/parsing/encoding/ is my test suite btw
22:33
<annevk>
IE and Opera iirc
22:33
<Hixie>
k
22:33
<Hixie>
seems safe not to require it then
22:34
<annevk>
i suppose
22:34
<Hixie>
it's sad that tbl is no longer really up to date with html on the web
22:35
<Hixie>
hey does anyone have IE? my mac is trying to update itself and i can't run the VM while that's happening
22:35
<annevk>
let's keep track of where we are in a decade
22:36
<annevk>
i have IE7 running, though UI-wise it's limited
22:36
<Hixie>
i need to know the result of these tests:
22:36
<Hixie>
http://www.hixie.ch/tests/adhoc/html/parsing/encoding/069.html
22:36
<Hixie>
http://www.hixie.ch/tests/adhoc/html/parsing/encoding/070.html
22:36
<Hixie>
http://www.hixie.ch/tests/adhoc/html/parsing/encoding/071.html
22:36
<Hixie>
http://www.hixie.ch/tests/adhoc/html/parsing/encoding/072.html
22:36
<annevk>
69: FAIL
22:37
<annevk>
windows-1254 used
22:37
<annevk>
70 fail, windows-1252 used
22:37
<annevk>
71, fail, windows-1254 used
22:37
<annevk>
72 fail, windows-1252 used
22:38
<Hixie>
well crap
22:38
<Hixie>
that's three browsers, three different sets of results
22:38
<Hixie>
firefox is Windows-1252 for all four
22:38
<SadEagle>
heh. for more fun, not that you care, konq3 uses iso-8859-9 --- supposedly, and... well, let's not talk about what 4 does
22:38
<Hixie>
safari is ISO-8859-9 (equiv of 1254 for the sake of this test) for all four
22:39
<Hixie>
how about opera?
22:39
<annevk>
maybe IE doesn't do content="...;charset='...'"
22:39
<annevk>
Opera passes all four
22:40
<Hixie>
ISO-8859-9?
22:40
<Hixie>
hm
22:40
<Hixie>
well crap
22:40
<annevk>
Opera ftw
22:40
<annevk>
:)
22:40
<Hixie>
well, "pass" is actually not what the spec says right now
22:40
<Hixie>
windows-1252 for all four is what the spec currently requires
22:41
<Hixie>
looks like spaces in encoding names is rare
22:41
<Dashiva>
Oh, Hixie. You and your wacky testcase hijinx
22:41
<Hixie>
so i guess, no trimming.
22:44
<Philip`>
annevk: In IE7, I see "PASS" ("Windows-1252") in 070 and 072
22:44
<Hixie>
i just changed the tests
22:44
<Philip`>
Ah
22:44
<Hixie>
anne's earlier report is now out of date
22:49
<Dashiva>
"But professor, these are the same questions as last year's exam!" "I know. I changed the answers."
22:53
Philip`
is reminded of a quiz show that asked how many moons the Earth has, and the correct answer was two; and the next year they asked the same question, and the answer "two" resulted in -10 points because the correct answer was now five
22:53
<Lachy>
Hixie, Philip`, do either of you have any stats on how much VoteLinks are used? rel=vote-for, vote-against, and vote-abstain?
22:54
<Hixie>
haven't seen it, but i haven't done studies for rel="" in a while
22:55
<blooberry>
lachy: what context are they used in? (element)
22:56
<Lachy>
blooberry, http://microformats.org/wiki/vote-links
22:56
<blooberry>
ah...A element. gotcha
22:57
<blooberry>
quick check: 7 times in DMoz URL set for vote-for. didn't find any for the others. Would have to do a deeper check to see if I'm looking for the right thing.
22:59
<Lachy>
technorati has removed their vote-links tracking page. I guess that means it failed. http://www.technorati.com/live/votes.html
23:00
<Hixie>
live and learn
23:02
<blooberry>
ah, I see...it is used more in REV attribute than REL. I found cases for all values...not many though: for: 29 cases; against: 14 cases; abstain: 12
23:09
<Hixie>
out of?
23:10
<blooberry>
3.5 million
23:11
<Hixie>
wow
23:11
<Philip`>
I see http://www.ehlinelaw.com and http://www.wiredprairie.us out of some tens of thousands that it's still processing
23:11
<Hixie>
that's low enough that they might be typos!
23:11
<blooberry>
indeed. 8-}
23:11
<blooberry>
I'll get a URL list for those now...should take a couple minutes. There could be some URL overlap there too.
23:12
<Lachy>
0.0008% seems quite insignificant
23:12
<Lachy>
I don't need it
23:13
<Philip`>
Hmph, lots of web pages use attributes
23:13
<Lachy>
Just needed to know whether or not it had any real world usage, in order to say whether or not it should be included in an HTML reference I'm reviewing
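The 0.0008% figure matches the 29 rev=vote-for cases alone; adding the against and abstain cases roughly doubles it. A quick check of the arithmetic, assuming each case counts as one page out of the 3.5 million:

```python
def share_percent(count: int, total: int) -> float:
    """Return count as a percentage of total."""
    return 100.0 * count / total


TOTAL_PAGES = 3_500_000

# Counts reported above: for 29, against 14, abstain 12.
vote_for_only = share_percent(29, TOTAL_PAGES)
all_values = share_percent(29 + 14 + 12, TOTAL_PAGES)

print(f"vote-for only: {vote_for_only:.4f}%")
print(f"all three:     {all_values:.4f}%")
```

Either way the share is well under a hundredth of a percent.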
23:13
Philip`
's interesting-attribute extraction thing has got to 500MB of output from 50K pages
23:14
<Philip`>
blooberry: By the way, I don't know if you noticed but it looks like almost all of cnn.com has been removed from dmoz.org
23:15
<Hixie>
man i wish people wouldn't say things like "in 8.2.2.1 step 3"
23:15
<Hixie>
who knows what that will refer to by the time i reply to the e-mail
23:15
<Lachy>
LOL
23:16
<Dashiva>
"In line 8137..."
23:16
<blooberry>
philip`: hadn't noticed *cheers*
23:17
<Lachy>
Hixie, maybe you should add a big note to the top of the spec, just above the TOC that says section numbers are subject to change, please refer to sections by title (or something equally descriptive)
23:18
<Hixie>
the people who would notice that are clever enough to work it out for themselves :-)
23:18
<blooberry>
philip`: I have had some similarly large reports for some web page factors
23:20
<Philip`>
I don't want to use up all my disk space because then I'll have to try to figure out the LVM commands to add an extra partition and I'll probably break everything horribly
23:20
<Lachy>
the TOC is likely to be one of the first things people look at when they load the spec, before going to somewhere more specific. I'm sure they would see it if it were big, red and blinking.
23:21
<Dashiva>
It may blink, but parts of it must not flash for more than 3 seconds.
23:28
<gsnedders>
Hixie: am I guilty (of using numbers)?
23:28
<gsnedders>
I know at times I use both number and section title
23:30
<blooberry>
lachy: ah...just noticed the results list came back for rev=vote-*. only 37 URLs total, and 8 of those were on mp3.com URLs.
23:30
<blooberry>
philip`: Interesting. It didn't find either of those URLs that you mentioned. 8-{ *looks to see if they are actually in my database*
23:30
<blooberry>
(my crawl is from ~3 months ago though)
23:34
<Philip`>
http://www.ehlinelaw.com http://www.wiredprairie.us http://maxicine.com/cine/criticas-cine-moriras-en-tres-dias-31303 is all, out of 125K from a few days ago
23:34
<Hixie>
the rev=vote-against on http://www.wiredprairie.us is bogus
23:34
<Hixie>
(it's to a javascript: uri!)
23:35
<Hixie>
it's talking about what the user is voting against
23:35
<Hixie>
not what the page is voting against
23:36
<Hixie>
and http://www.ehlinelaw.com seems to have vote-for on almost all internal links
23:36
<Hixie>
looks like a misguided SEO attempt
23:39
<blooberry>
hmm. my script isn't picking up the REV from ehlinelaw...interesting and weird.
23:41
<Philip`>
blooberry: Is that something like case sensitivity, or more significant misparsing of the page?
23:42
<blooberry>
it could be. This is worth tracking down. 8-}
23:48
<blooberry>
ah...heheh. not a bug. In my output I was just putting it in a different section that I had hidden. It does find it, so it must be a more recent addition.
23:57
<annevk>
nice, getSVGDocument() == contentDocument now
23:57
<annevk>
would've been better if the former was never introduced, but still