00:34
<Domenic>
TabAtkins: if I were to write a piece of software that took (propName, value) pairs and parsed value according to the grammar for the CSS propName (e.g. given propName = "margin", parsed value according to `<margin-width>{1,4} | inherit`), where would I find an index of propName => grammar productions?
00:35
<TabAtkins>
Nowhere yet. It's something we plan to add to Shepherd.
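For context, the index Domenic is asking for would just map property names to their value grammars. A minimal hypothetical sketch in TypeScript (the table shape, entries, and helper are illustrative assumptions, not an actual Shepherd API):

```typescript
// Hypothetical propName => grammar index. No public database of these
// mappings exists yet; per TabAtkins, one is planned for Shepherd.
const propertyGrammars: Record<string, string> = {
  margin: "<margin-width>{1,4} | inherit",
  padding: "<padding-width>{1,4} | inherit",
};

// A value parser would look up the grammar before parsing the value:
function grammarFor(propName: string): string | undefined {
  return propertyGrammars[propName];
}
```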
00:35
<Domenic>
dang, ok
00:51
<JohnMH>
https://url.spec.whatwg.org/
00:51
<JohnMH>
Specifically, "The URL Standard defines URLs, domains, IP addresses, the application/x-www-form-urlencoded format, and their API."
00:51
<JohnMH>
What does the URL Standard have to do with domains or IP addresses?
00:52
<JohnMH>
I understand that these are possible inputs for the host field of a URL, but what do they have to do with a URL itself?
01:02
<Domenic>
JohnMH: https://url.spec.whatwg.org/#hosts-%28domains-and-ip-addresses%29
01:23
<JohnMH>
Domenic: That's not defining domains and IPs, that's using them
02:24
<Domenic>
Sure look like definitions to me.
02:27
<JohnMH>
Domenic: You're referencing RFCs, not defining what a domain or IP is.
02:28
<Domenic>
It does both.
02:29
<JohnMH>
Domenic: If you were to define a domain or IP, you would need to talk about what is and is not valid, why, and best practices.
02:29
<JohnMH>
And, where it says domains, I assume it means hosts?
03:00
<Domenic>
No, a host is a distinct definition; scroll down a bit.
03:01
<Domenic>
(In general it seems like a lot of the things you're asking about are answered after scrolling down.)
04:15
<JohnMH>
Domenic: Are you referring to Section 3?
04:16
<JohnMH>
That doesn't define a host, it incorrectly references RFCs and reiterates information one would find in those RFCs. A host isn't "a domain, an IPv4 address, or an IPv6 address."
04:17
<JohnMH>
A host is any hostname or address, not specifically a domain or IP address
04:17
<JohnMH>
protocol://myhost/ is a valid URL
04:18
<JohnMH>
protocol:///myhost/ is not, for example
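For what it's worth, a parser implementing the URL Standard (Node.js ships one as its global `URL` class) distinguishes these two inputs roughly as follows; the properties shown assume Node's spec-conformant handling of non-special schemes:

```typescript
// Two slashes after a non-special scheme introduce an authority:
const a = new URL("protocol://myhost/");
console.log(a.host, a.pathname); // "myhost" "/"

// Three slashes yield an empty authority: the host is "" and the extra
// slash becomes part of the path. No error is thrown either way.
const b = new URL("protocol:///myhost/");
console.log(b.host, b.pathname); // "" "/myhost/"
```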
04:20
<Domenic>
You keep stating what things "are". It's very curious to me why you think your definitions of these terms are more interesting than the definitions in the URL standard.
04:21
<JohnMH>
These aren't my definitions.
04:22
<JohnMH>
Domenic: Hostnames are first defined here: https://tools.ietf.org/html/rfc952
04:22
<JohnMH>
You may follow from there to other RFCs
04:22
<Domenic>
But why would I want to?
04:23
<JohnMH>
To have a proper definition of a hostname, which the "URL Standard" shouldn't include in the first place
04:23
<Domenic>
You keep making statements about how the world "should" be, and what is "proper". Can you perhaps envision other people might disagree with you on those "should"s and "proper"s?
04:24
<JohnMH>
RFCs which everyone has based their definitions on for more than 30 years
04:24
<Domenic>
Well, clearly not everyone, including some of the most widely-used software on the planet.
04:25
<JohnMH>
That's not true, and you're simply ignoring that fact
04:26
<Domenic>
I guess I can't do much more to argue against such assertions than say "no, you're the one saying untrue things and ignoring facts"
04:26
<JohnMH>
For example, wget, curl, WinHTTP
04:26
<Domenic>
For example, Netscape, IE, Firefox, Edge, Chrome, Safari
04:26
<Domenic>
(Opera!)
04:26
<JohnMH>
Opera is just based on Chromium, why should it even be considered its own browser?
04:26
<Domenic>
I meant Presto Opera
04:27
<Domenic>
Anyway, bedtime for me; night-night. Can discuss more tomorrow, but hopefully you can see how people might have different perspectives than yours.
04:27
<JohnMH>
I don't see how supporting malformatted URLs is actually something you'd want in any document.
04:28
<JohnMH>
Instead of supporting malformatted URLs, it'd be best to either throw an error at the user or just transparently rewrite it
04:28
<JohnMH>
I don't know why you're only using browsers for these examples, though.
04:28
<JohnMH>
Browsers are far from the only software which use URLs.
04:34
<Domenic>
transparently rewriting it is exactly what is done!
04:34
<Domenic>
that is what the process of parsing does
04:35
<Domenic>
For example! http://software.hixie.ch/utilities/js/live-dom-viewer/?saved=4182
04:35
<Domenic>
(look at the "log:" output)
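The "transparent rewrite" Domenic is pointing at is just what URL parsing does; a quick sketch of the effect using Node's URL-Standard `URL` class (the live-dom-viewer log itself is not reproduced here):

```typescript
// Malformed special-scheme input is canonicalized, not rejected:
const u = new URL("HTTPS:////EXAMPLE.com/./a/../b");
console.log(u.href); // "https://example.com/b"
```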
04:36
<Domenic>
As for why I am using browsers, well, we are in the channel of an organization focused on making browsers better. But it's true that if you had software that did not want to interoperate with browsers at all, then it could use different algorithms for URL processing
04:36
<Domenic>
I would assume you would not discuss that algorithm in this channel though
04:38
<Domenic>
I am excited to discover that we may be more in alignment than previously thought, though, with your "transparent rewrite" idea.
04:38
<Domenic>
But I really should sleep.
04:39
<JohnMH>
Domenic: But it shouldn't be in any spec to do so
04:39
<JohnMH>
It shouldn't be done, in all honesty
04:39
<JohnMH>
It's only even seen as a good idea for UX
04:39
<JohnMH>
Which is a horrible reason to do anything.
04:40
<Domenic>
Ah, now, again, hopefully you can see how there are people in the world---some of them working on very popular software---who disagree with the idea that UX is a horrible reason to do things
04:40
<Domenic>
And it might be a good idea for those people to all agree on the details of how to do things, while valuing UX
04:40
<JohnMH>
Those people are, more than likely, working in UX fields
04:40
<Domenic>
Thus, those people might start writing a standard
04:40
<Domenic>
No, sorry, those people are browser engineers!
04:41
<JohnMH>
Yes, browser UX?
04:41
<Domenic>
No, browser code
04:41
<JohnMH>
That can easily be the same thing.
04:41
<Domenic>
Everyone on a browser values UX. The browser is the user's agent, after all
04:41
<JohnMH>
In any case, browsers need to follow the already defined standards for URLs, it's already well done.
04:41
<Domenic>
Hmm, well, that would have been a good argument to make in 1985 when timbl was working on Mosaic
04:42
<Domenic>
(1985 = probably not accurate)
04:42
<JohnMH>
If any changes need to be made to the actual URL standard, which I'm absolutely agreeing to, then that's entirely different
04:42
<JohnMH>
but having two different standards for URLs is ridiculous
04:42
<Domenic>
But he chose a different path, as did all the others following him. And so it turns out that the former "standards" became obsolete, as real-world software was not interested in following them.
04:42
<JohnMH>
Why should browsers follow a different definition of a URL than any other software?
04:43
<Domenic>
Yeah, that's why the URL standard obsoletes the many (U/I)R(I/N/L) RFCs
04:43
<Domenic>
They don't!
04:43
<Domenic>
Much other software also follows the URL standard, as other software often wants to interoperate with browsers.
04:43
<JohnMH>
But you're using only browsers for examples, and you're saying that other software should just use a "different algorithm for URL processing"?
04:44
<Domenic>
I'm not saying that. I'm saying that, maybe software you write, that doesn't want to interoperate with browsers, could do that
04:44
<Domenic>
But in general, the software most people write wants to interoperate with browsers, so it follows the URL standard
04:44
<Domenic>
We have some pretty extensive test results on this from a while back (URL parsing libraries in various languages' stdlibs), let me dig them up for you...
04:44
<JohnMH>
I don't see how the ability to use git://user@host/repo.git as a URL has to do with that software wanting to interoperate with browsers.
04:44
<JohnMH>
It's just another URL.
04:45
<Domenic>
Right, I think most people want to be able to say "it's just another URL" when writing their software, and use a generic URL processing library that works the same in all scenarios. That will be one following the URL standard.
04:45
<JohnMH>
It's the same with ssh://user@host or sftp://user@host, or any other protocol.
04:45
<JohnMH>
Exactly!
04:45
<JohnMH>
So it's important not to prioritize browsers in this case, as all software should definitely be using the same standard.
04:46
<JohnMH>
Browsers aren't different than any other software, in this regard
04:46
<Domenic>
Well, we want to prioritize interoperability with browsers, as they are some of the most widely-used URL-consuming software on the planet.
04:47
<Domenic>
But let me phrase it another way for you:
04:47
<Domenic>
We want all software to agree on a single URL definition
04:47
<Domenic>
All software is afraid of losing users by making things that used to work, stop working
04:47
<Domenic>
(Some software might be written by people who think that this is not a big problem.)
04:48
<Domenic>
How are we going to get to this world, while still accounting for the fact that nobody wants to lose users?
04:48
<Domenic>
Probably, by converging on union semantics.
04:48
<JohnMH>
Browsers may be 50% of all URL usage, but that's not important.
04:48
<Domenic>
I'd guess 95%
04:48
<JohnMH>
I'd definitely say less than 60%.
04:48
<JohnMH>
But that's not the point,
04:48
<Domenic>
Well, it kind of is
04:48
<JohnMH>
Browsers definitely aren't special here.
04:48
<Domenic>
How are we going to reach interop? What is the shortest path? Changing 95% or changing 5%?
04:49
<JohnMH>
Neither
04:49
<JohnMH>
Define a standard. Everyone should follow that standard. Those who don't will either define their own standard, or will fade away.
04:49
<Domenic>
(Remember the prevalence of mobile devices, usage of which heavily outweighs that of computers. There it may be more obvious that most URL-processing software is a browser or web view.)
04:49
<Domenic>
Ah, see, no, that does not work.
04:49
<Domenic>
Nothing "fades away"
04:49
<Domenic>
Standards are not in and of themselves a good thing
04:49
<JohnMH>
URL processing on Android devices is usually intent-based, not used by the browser at all
04:50
<Domenic>
They are good to the extent they *add value for users*
04:50
<JohnMH>
For example, the Facebook application abuses "internal" protocols
04:50
<Domenic>
If you define a standard that does not add value for users (for example, by increasing interop) it will not be followed.
04:51
<JohnMH>
Increasing interoperability isn't an issue anymore. URLs have already been defined.
04:52
<Domenic>
Those two statements are entirely unrelated
04:52
<Domenic>
And the first is just not true
04:52
<JohnMH>
There is no reason not to follow the existing definition of a URL, as we have been for quite a while.
04:52
<Domenic>
Ah, but there is
04:52
<Domenic>
Because doing so does not increase value for users
04:52
<Domenic>
It in fact decreases value in many cases, by making their websites stop working
04:52
<JohnMH>
No, but you aren't adding anything useful to URLs in the first place.
04:53
<Domenic>
Sure we are. We're "adding" the ability to process URLs that occur in the wild.
04:53
<JohnMH>
That's an implementation issue, not an issue of standardization.
04:53
<JohnMH>
It should return an error, when the URL is erroneous.
04:53
<JohnMH>
There is no useful data on websites which use multiple forward slashes after protocols, for example.
04:54
<Domenic>
Hmm, another fundamental disconnect.
04:54
<JohnMH>
That is an invalid URL at that point, so a browser should tell the user that it is invalid.
04:54
<Domenic>
Implementations need to be interoperable, thus we write standards.
04:54
<Domenic>
Whereas you seem to think we write standards to define how things "should" be.
04:54
<Domenic>
Telling the user "this web page is invalid" does not add value.
04:54
<Domenic>
Do you remember the 90s when you would get a popup for every JavaScript error?
04:54
<JohnMH>
I miss those
04:55
<JohnMH>
They told me what sites were trash
04:55
<Domenic>
But hopefully you can see how people don't generally think they add user value
04:55
<Domenic>
Maybe they would to you
04:55
<JohnMH>
That is what a standard is. See: https://www.ietf.org/rfc/rfc1738.txt
04:55
<Domenic>
In which case I suggest writing a browser extension that enforces your personal standards on all the websites you visit, so you can find "trash"
04:55
<Domenic>
Yeah, as I said, a fundamental disconnect.
04:56
<JohnMH>
Why write a browser extension? My browser of choice doesn't run JavaScript to begin with.
04:56
<Domenic>
So that you can detect URLs that you don't like
04:56
<Domenic>
Those pages are presumably trash too
04:56
<JohnMH>
I can already see the URLs
04:56
<Domenic>
fair enough
04:56
<Domenic>
Anyway, I think we've discovered our fundamental disconnects here: adding user value and increasing interoperability, versus defining how the world should be.
04:57
<Domenic>
Hopefully you can see that a lot of software values the former, and thus standards organizations work to accommodate that software in some cases.
04:57
<JohnMH>
And I suppose this is the difference between this spec and a real standard definition?
04:57
<Domenic>
I can understand if you value the latter more.
04:57
<Domenic>
Well, between the URL Standard and a JohnMH "real standard", sure.
04:58
<JohnMH>
No, I'm talking about RFC 1738
04:58
<Domenic>
As I said.
04:58
<JohnMH>
I definitely didn't write it
04:59
<Domenic>
Yeah, but you're the one making up categories
04:59
<JohnMH>
* It's not my definition, it's the generally accepted definition of a URL.
04:59
<Domenic>
That's just false
04:59
<JohnMH>
Not at all, Firefox followed it pretty closely until recently
04:59
<JohnMH>
Lynx still does
04:59
<JohnMH>
EWW does as well, w3m too
04:59
<Domenic>
None of those examples has convinced me that it is "generally accepted"
05:00
<Domenic>
It's definitely accepted by JohnMH and software he likes, though. That I'll grant.
05:00
<JohnMH>
Firefox is one of the most popular browsers, easily comparable to Chrome or other Chromium based things.
05:00
<Domenic>
Yes, and Firefox follows the URL Standard (and is in fact getting closer and closer to a 100% implementation based on Rust!)
05:01
<JohnMH>
Yes, Servo..
05:01
<Domenic>
Right, but they're going to be porting the Servo URL parser into Firefox; very exciting!
05:01
<JohnMH>
That's silly
05:01
<JohnMH>
In any case
05:02
<JohnMH>
Domenic: You also have popular software such as cURL, wget and even browsers such as w3m, eww and lynx which don't follow this "standard", but follow RFC 1738 instead
05:03
<Domenic>
It's true! Some software doesn't follow the URL Standard, especially software JohnMH likes to cite in his arguments on the internet! I do not contest this!
05:03
<JohnMH>
No, instead this software follows the defined standard since 1994, RFC 1738
06:12
<annevk>
That RFC has been obsoleted several times over
06:26
<MikeSmith>
It didn't seem like JohnMH had any actual problem he was trying to solve
06:28
<annevk>
He was just doing some 386'ing to pass the time I guess
11:54
<annevk>
I started writing a reply to some of the URL stuff going on on Twitter but it reads very much like https://annevankesteren.nl/2016/04/network-effects so far
11:58
annevk
finally found a good reference for when folks say something is "(in)sane": http://whatprivilege.com/replacing-crazy-for-ableism-and-preciseness-of-language/
11:58
<jgraham>
I haven't read twitter, but I read Daniel's blog post and it seemed like the whole HTML parsing thing again with s/HTML/URL/
12:00
<annevk>
It's always a variant on that
12:46
<JohnMH>
annevk: Perhaps "obsoleted", but every revision was based on 1738, and didn't change much. It definitely didn't change to allow https:///google.com/ to be valid
13:14
<MikeSmith>
JohnMH: I agree with you that it's not important to care about what browsers do and other tools should strictly follow 20-year old RFCs and pretend that browsers don't exist
13:15
<MikeSmith>
because that's clearly better for users
13:17
<Ms2ger>
MikeSmith, why would you care about interoperating with browsers?
13:17
<Ms2ger>
MikeSmith, the future is in the Semantic Web anyway
13:19
<JohnMH>
MikeSmith: What you describe is a UX issue, which shouldn't have anything to do with what a URL is.
13:22
<hsivonen>
which spec does insertAdjacentHTML live in these days?
13:22
<JohnMH>
URLs go far beyond the context of browsers, and while I understand that WHATWG is solely browsers, you're presenting a specification which many tools and programs other than browsers have to follow.
13:24
<hsivonen>
ooh. domparsing.spec.whatwg.org redirects but is not listed on spec.whatwg.org
13:24
<hsivonen>
confusing
13:25
<Ms2ger>
JohnMH, I don't think this discussion is going to go anywhere
13:30
<hsivonen>
Ms2ger: Looks like I've never replied to say whether I was OK with https://bitbucket.org/ms2ger/dom-parsing-and-serialization/commits/a5d7da5a4f86 . I think I am.
13:31
<Ms2ger>
That's good, I suppose :)
13:31
<JohnMH>
Ms2ger: Why not? There is definitely a good reason to discuss this, and change would definitely be for the better.
13:43
<jgraham>
I think that people are going to have a hard time believing that "change would definitely be for the better" if you don't consider interoperability amongst implementations, including browsers, to be the goal
13:46
<JohnMH>
Interoperability is definitely a goal; that's the point of standardizing something.
13:48
<jgraham>
OK, so the URL spec describes, to the best of our knowledge, what browsers must implement to interoperate with existing content, and with each other
13:49
<jgraham>
Changes to the URL spec therefore have to meet the bar of "this change does not reduce the ability to interoperate with existing content or reduce the chance for browsers to interoperate with each other"
13:59
<hsivonen>
Why might moving Gecko's mapped attributes to show up after non-mapped attributes in the iteration order make the iteration order closer to WebKit/Blink/spec?
14:00
<hsivonen>
I reviewed a patch that changed that but now I can't articulate why that bit of the patch was an improvement
14:10
<Ms2ger>
hsivonen, I don't know why that part was in there either
14:11
<JohnMH>
jgraham: If it refers only to browsers, perhaps the name shouldn't be URL, but BRL
14:12
<Ms2ger>
And HTML should be BTML?
14:12
<halindrome>
Ms2ger: lol
14:13
<Ms2ger>
No, seriously
14:13
<JohnMH>
No, as HTML is rendered primarily by browsers
14:13
<Ms2ger>
URLs are used primarily by browsers too
14:13
<JohnMH>
No, and that seems to be a common misconception here.
14:13
<halindrome>
Ms2ger: well... no.
14:14
<Ms2ger>
I guess that's where the discussion ends, then
14:21
<jgraham>
I don't understand why other tools wouldn't want to interoperate with browsers here. For example any kind of web scraper will need to have the same handling of URLs as a browser. If such a scraper wanted to be cURL based it would therefore benefit from cURL and browsers having compatible URL handling.
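As a concrete illustration of jgraham's scraper point, resolving an extracted href against the page's base URL has to match what a browser would do; a minimal sketch with Node's `URL` class (the page and hrefs are made up):

```typescript
// A scraper must resolve relative hrefs exactly as a browser would:
const base = "https://example.com/products/index.html";
console.log(new URL("/about", base).href);  // "https://example.com/about"
console.log(new URL("?page=2", base).href); // "https://example.com/products/index.html?page=2"
```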
14:24
<hsivonen>
Ms2ger: thanks. Maybe I should revert the reordering of mapped and non-mapped, then.
14:36
<annevk>
hsivonen: insertion order, pretty please
14:38
<annevk>
jgraham: I tried to have that discussion with bagder in Mozilla's #necko
14:39
<annevk>
jgraham: not entirely sure I convinced him, but close enough
14:48
<hsivonen>
annevk: we aren't getting rid of mapped attributes in this iteration
14:48
<hsivonen>
annevk: non-mapped attributes changed to insertion order
14:48
<hsivonen>
annevk: and the mapped ones
14:48
<hsivonen>
annevk: but all mapped ones have to come before all the non-mapped ones or all the non-mapped ones have to come before all the mapped ones
14:49
<hsivonen>
annevk: we used to have mapped before non-mapped. Now we have non-mapped before mapped, and a bug report that this broke at least one JS program.
14:51
<hsivonen>
It's kinda disappointing that there's still another round of complaining about how the WHATWG does specs.
14:51
<hsivonen>
After all, the track record should be pretty good by now.
14:51
<annevk>
hsivonen: complaining will continue until morale improves
14:56
<JohnMH>
jgraham: Yes, but you're saying that as if browsers are the only things that use URLs, and as if browsers are authoritative over other software.
14:57
<jgraham>
No I'm not
14:58
<jgraham>
I'm saying that browsers have hard compat requirements dictated by two decades of legacy web content
14:59
<jgraham>
Those constraints tend to be more stringent than for other pieces of software, so naturally browsers have to have more say in standards that apply to both them and other application classes without the same constraints
15:00
<jgraham>
Plus, as I pointed out, it's often necessary for applications in other classes to directly interop with web content, so they actually have the same compat constraints
15:07
<hsivonen>
JohnMH: curl is an example of a non-browser piece of software that tries to be compatible with the Web as written for browsers
15:07
<hsivonen>
it seems to me that it would make sense to implement a URL Standard-conforming URL parser in curl instead of patching things piecemeal
15:56
<JohnMH>
hsivonen: curl doesn't try to be compatible with the "Web as written for browsers", curl reluctantly changes to these non-standard implementations
15:56
<JohnMH>
curl is not just for HTTP, curl also supports GOPHER, FTP, SFTP and more
16:06
<jgraham>
I am genuinely bemused by the idea that you would prefer to have a URL parser implement a weird nonstandard behaviour to get compatibility, because the compatible standard wasn't the standard that you had to break to make your implementation useful
16:23
<JohnMH>
The implementation is useful without nonstandard parsing, which should be done by a pre-formatting library, not be part of the URL parser itself
16:28
<jgraham>
I think the difference between a "preformatting library + a url parser" and a "url parser", where the aggregate behaviour of both is identical, is one made entirely of self-delusion
16:30
<caitp>
this looks like a productive topic
16:31
<JohnMH>
Well, no. It's silly to say that it can't be done properly in practice. It's not hard to create a formatting function defined separately from the thing meant to define a URL, rather than making the URL definition handle malformed URLs and guess what the user wants.
16:31
<annevk>
There's no guessing, it's all deterministic
16:31
<jgraham>
JohnMH: Your problem is that you have a broken definition of what it means to do something properly
16:32
<JohnMH>
jgraham: To do something properly is to follow the defined standards, not to change the standards based on implementations.
16:32
<jgraham>
Nope
16:32
<JohnMH>
Just because an implementation decides to do something silly doesn't mean that the standards should be changed to accommodate it.
16:33
<jgraham>
This isn't *an implementation*. We have been through this. It's what's required to work with two decades of legacy content.
16:33
<JohnMH>
That's literally the opposite of a standard, and it's more like something Microsoft would try to do. (See Microsoft Office "standards" and the history behind that for why it doesn't work, even for Microsoft)
16:33
<annevk>
It's not "an implementation", it's the clients making up 90% of the web
16:33
<jgraham>
Now I feel we reached the unproductive part of the conversation
16:34
<JohnMH>
90% of the web isn't HTTP
16:34
<annevk>
Wut?
16:34
<jgraham>
I suppose it won't help if I mention that we went through this exact discussion in 2007 with the HTML parser and the result doesn't look good for your position
16:34
<JohnMH>
Accepting any number of forward slashes isn't going to fix anything, it's just going to mean that now clients can "acceptably" add any given number of forward slashes.
16:34
<JohnMH>
HTML is different, only browsers use HTML for the most part.
16:34
<annevk>
I don't think it means that at all
16:35
<JohnMH>
99% of HTML use is in browsers
16:35
<JohnMH>
With URLs, less than 20% of usage is browser based
16:35
<jgraham>
[citation needed]
16:35
<JohnMH>
Android, and other platforms, internally use URLs to communicate between applications or processes
16:36
<jgraham>
But if it helps we had exactly the same arguments put forward that we weren't considering non-browser implementations for HTML parsing
16:36
<JohnMH>
Yeah, but those are edge cases
16:36
<JohnMH>
When it comes to URLs, less than 60% (and that's a very liberal guess) of usage is from browsers
16:36
<JohnMH>
There's even an operating system where literally everything is a URL.
16:37
<jgraham>
Redox?
16:37
<jgraham>
I would be interested to know what it uses to parse urls
16:37
<JohnMH>
Yes, Redox
16:39
<jgraham>
Because it seems totally reasonable that e.g. Android should be able to use the same url parsing library for the browser component and for other components
16:39
<nox>
jgraham: I died on the inside a bit.
16:39
<nox>
jgraham: https://github.com/redox-os/redox/blob/master/kernel/fs/url.rs#L31-L43
16:39
<jgraham>
nox: haha
16:40
<jgraham>
Fair to say it uses something that looks almost, but not quite entirely, unlike urls
16:42
<JohnMH>
jgraham: A URL is universal, it isn't specific to browsers.
16:42
<jgraham>
Anyway, a scheme where you have to have multiple URL libraries just so that some applications can reject a subset of inputs for not corresponding to some standard written before we understood the process for achieving compatibility is more complex for everyone
16:43
<jgraham>
It's hard to understand how you would advocate it from an engineering point of view rather than a "religious" point of view ("RFCs MUST be followed")
16:44
<JohnMH>
The process of achieving compatibility is not hard, you just have to agree on a standard and have everyone follow it. From a development standpoint, that's simple. If any party believes anything should be added, write an RFC and see what people think of it
16:44
<JohnMH>
If people approve of it, sweet! You've got a new standard.
16:44
<nox>
So what's the problem, given there is the URL standard?
16:45
<JohnMH>
The URL standard doesn't define a URL, it defines something specific to browsers.
16:45
<smaug____>
annevk: I'm trying to understand the process here. Perhaps there isn't any process. But why are we discussing about rootnode issue (https://github.com/whatwg/dom/issues/241) in https://github.com/whatwg/dom/pull/248 ?
16:45
<JohnMH>
It should either not claim to be about URLs, or should include more broad input than just browsers.
16:45
<nox>
The RFCs don't define a URL, they define something that nobody uses.
16:45
<annevk>
smaug____: the discussion seemed concluded, so I just want a final okay for the change
16:46
<smaug____>
annevk: and why that final ok should happen in that pull ?
16:46
<nox>
How did you quantify that "with URLs, less than 20% of usage is browser based", btw?
16:46
<smaug____>
and not in the issue
16:46
<smaug____>
it is really confusing to have the same conversation happening in multiple places
16:46
<annevk>
smaug____: generally that is how it works when reviewing changes
16:47
<smaug____>
well, I'm not reviewing anything here
16:48
<smaug____>
anyhow comments somewhere
16:48
<smaug____>
getRootNode() is ok to me
16:48
<smaug____>
s/comments/commented/
16:48
<JohnMH>
Starting with RFC 1738 and ending with RFC 3986, we have the definition of a URL as followed by Java SE, and other platforms.
16:48
<TabAtkins>
Like... there are over a trillion web pages in existence.
16:48
<annevk>
smaug____: sorry for the confusion
16:48
<TabAtkins>
So there are presumably 5 trillion, what, RDF databases?
16:48
<smaug____>
annevk: I blame github :)
16:49
<smaug____>
it is a confusing tool
16:49
<nox>
JohnMH: Where is protocol:///myhost/ valid in the URL standard?
16:49
<annevk>
smaug____: shout it from the rooftops
16:49
<JohnMH>
nox: Anywhere, but the third / would be part of the host
16:49
<JohnMH>
Which is correct, by the way
16:49
<nox>
lolwat
16:49
<JohnMH>
See the file protocol for the only example where you'd want that, though
16:49
<nox>
In which RFC?
16:50
<JohnMH>
nox: It says, nowhere, that multiple forward slashes are valid, which is fine because they shouldn't be in the sense you're bringing up.
16:50
<smaug____>
has cabanier been here recently
16:50
<nox>
I don't parse that sentence.
16:51
<annevk>
smaug____: haven't seen him
16:51
<annevk>
smaug____: he's been somewhat active though on GitHub and I think there was an email too some days ago
16:51
<JohnMH>
nox: Multiple forward slashes = not valid; if parsed, everything past the first or second forward slash, depending on the protocol, should be part of the next field in the URL
16:51
<nox>
Where is that stated?
16:51
<smaug____>
annevk: yeah, would be better to chat here about the hit region stuff than in github
16:52
<JohnMH>
nox: It's only stated that it's not valid
16:52
nox
yawns.
16:52
<annevk>
smaug____: guess you'll have to email him
16:52
<JohnMH>
If you have https://googol///t?b#s
16:52
<JohnMH>
what do you expect that to be parsed as?
16:52
<nox>
That's not the same at all as what you said before.
16:52
<annevk>
smaug____: I told Domenic I'm willing to work on it, but not this month, too many things going on
16:53
<JohnMH>
It's not, but I'm proving the point
16:53
<nox>
No you are not.
16:53
<JohnMH>
I'd expect the server at googol to receive a request with the path ///t
16:53
<nox>
You are just talking of something else which is completely unrelated.
16:53
<JohnMH>
which can be a real path, but it isn't likely
16:53
<JohnMH>
So, that considered, what's keeping the same from being true of the host field?
16:54
<smaug____>
annevk: totally understand. I'm somewhat worried that most of hit region stuff in the spec will need to be rewritten, though hopefully simplified significantly
16:54
<JohnMH>
nox: For example, under similar circumstances let's say we have the url https:////googol/t
16:54
<smaug____>
but I could be wrong here. Which is why I'm still trying to figure out why we current design is what it is.
16:55
<smaug____>
s/we/the/
16:55
<JohnMH>
I would expect the server at "/googol" to receive a request on the path "t"
16:55
<JohnMH>
or "/t"
16:55
<nox>
That's an empty host as per the rules of the HTTPS scheme. So even if you want to use RFC 1738 you are just wrong.
16:55
<JohnMH>
Exactly
16:55
<annevk>
smaug____: you saw https://lists.w3.org/Archives/Public/www-archive/2016Apr/0001.html, right?
16:55
<JohnMH>
So if that's not valid, why do you want to make it valid?
16:55
<nox>
Where is it valid?
16:56
<JohnMH>
That's exactly what this spec is trying to do, make parsing of /// become valid
16:56
<smaug____>
annevk: I did
16:56
<nox>
Where?
16:57
<smaug____>
annevk: but still wondering why someone ended up as complicated setup as it is now. I hope if the ID stuff is just removed, it all becomes good enough.
17:00
<JohnMH>
nox: https://url.spec.whatwg.org/#special-authority-ignore-slashes-state
17:01
<JohnMH>
This is suggesting that any given number of slashes should be accepted, or that a backslash is even a possible character
17:01
<nox>
"syntax violation"
17:02
<nox>
"06:28 <JohnMH> Instead of supporting malformatted URLs, it'd be best to either throw an error at the user or just transparently rewrite it"
17:02
<JohnMH>
That is a UX issue, though
17:02
<nox>
lol
17:02
<JohnMH>
not an issue of the definition of a URL
17:02
nox
is out of that discussion.
17:02
<JohnMH>
The definition of a URL has nothing to do with how address bars parse input
17:03
<annevk>
correct
17:03
<annevk>
Anyone who states otherwise is wrong
17:04
<JohnMH>
So, perhaps, the entire malformed URL parsing section of this spec should be dropped.
17:04
<annevk>
If it was about address bars, certainly
17:04
<nox>
You do realise that it does exactly what you said in my quote, right?
17:04
<annevk>
Fortunately however we know not to define UX
17:05
<JohnMH>
Right, but UX has nothing to do with URLs.
17:05
<annevk>
As I said
17:05
<JohnMH>
Parsing malformed URLs is entirely UX.
17:05
<annevk>
I sense agreement, yay, time to eat some soup
17:06
<annevk>
Ah no, that's where you're wrong, but still, gotta eat soup
17:06
<nox>
JohnMH: How so?
17:07
<nox>
And as I said, the URL standard says that "https:///" is a syntax violation, btw.
17:22
<JohnMH>
nox: Yes, but it also specifies that it is parsed
17:22
<jyasskin>
JohnMH: The users are the people writing web pages, and the spec describes how to give them a consistent UX.
17:22
<JohnMH>
URLs have absolutely nothing to do with UX
17:22
<nox>
You are free to report an error.
17:24
<jyasskin>
JohnMH: If a website author writes <a href="https:///my-server.com/">, that's parsed according to the URL spec, which defines how to give them and the people browsing their site a consistent experience.
17:25
annevk
has been reading https://twitter.com/sarahjeong/status/730407790112464897 over dinner, wtf
17:25
<JohnMH>
That consistent experience should include an error message for malformed URLs?
17:25
<annevk>
(the whole thread)
17:28
<gsnedders>
JohnMH: but that isn't what browsers do today or what browsers did a decade ago, and any change would likely break sites
17:33
<JohnMH>
gsnedders: It's not an issue if that breaks sites, it will bring better development to the table and websites which don't fix it will just have errors, like Netscape and IE7 would show for them.
17:33
<jgraham>
ding ding, we have a loser
17:34
<jgraham>
"compatibility doesn't matter" in a didscussion about web browsers immediately shows that your ideas aren't based on the real world
17:35
<gsnedders>
it's an issue if it breaks sites because users simply move to a browser where the site still works, therefore changing behaviour doesn't help
17:36
<annevk>
JohnMH: what's your objective? I wonder if there's some meaningful progress we can make or whether we're just wasting each other's time
17:37
<nox>
Btw it's not like cURL rejects https:///google.com/
17:37
<nox>
It just misparses it in a silly way: curl: (6) Could not resolve host: https
17:38
<nox>
So cURL's argument isn't really a URL either, it's like a fancy address bar with its own idiosyncrasies.
17:38
<JohnMH>
curl accepts a valid URL, nothing more
17:39
<JohnMH>
jgraham: I didn't say "compatibility doesn't matter"
17:39
<nox>
It considered https:///google.com/ to be a valid URL then.
17:39
<JohnMH>
In fact, I said they should all give an error when they run into.. an error.
17:39
<JohnMH>
nox: No, it expects it to be a valid URL and attempts to process it
17:39
<nox>
"curl accepts a valid URL"
17:40
<nox>
It accepted https:///google.com/
17:40
<nox>
If I had https as a valid host on my computer, it would have fetched who knows what.
17:40
<jgraham>
http://webcache.googleusercontent.com/search?q=cache:https://daniel.haxx.se/blog/2016/05/11/my-url-isnt-your-url/&num=1&strip=1&vwsrc=0 says curl accepts non-RFC things in nonstandard ways
17:40
<nox>
It was a DNS error, not cURL yelling about it not being a URL.
17:40
<JohnMH>
It expects a valid URL, it only does extremely basic URL validation
17:40
<jgraham>
So it doesn't actually meet any standard
17:41
<nox>
lol
17:41
<jgraham>
JohnMH: Compatibility in this case being compatibility with content
17:41
<JohnMH>
jgraham: Yes, unfortunately curl has been following browsers in the past few years, that's another issue and I do have an open issue on the curl project
17:42
<annevk>
JohnMH: you're not going to state your objective?
17:42
<JohnMH>
jgraham: Compatibility is not an excuse to accept extremely malformed URLs, in fact, those people should be punished in the best way we can punish them: Simply not parsing their malformed URLs.
17:42
<jyasskin>
JohnMH: I'm interested in annevk's question. Do you have a concrete objective here? Complaining about a lack of purity in WHATWG's specifications really is a waste of time.
17:42
<nox>
See topic.
17:43
<JohnMH>
I'm complaining that WHATWG's URL specification isn't a URL specification at all, but that it is describing what browsers incorrectly call a URL. Either this form should not be dealt with at all, or should be dealt with separately from URLs as I have suggested.
17:43
<jgraham>
That "objective" is pure theoretical purity
17:43
<annevk>
I see, I think I can safely say that neither of those two things will happen
17:43
<jgraham>
It holds little weight here
17:44
<nox>
For as long as I remember, URLs have always been a wild wild west.
17:44
<JohnMH>
jgraham: There is no theoretical purity to it. You define it, you follow it. Any changes go through an acceptance process, and then other people can follow them
17:44
<JohnMH>
If software doesn't follow the standards, that's an issue
17:45
<nox>
If they don't for decades, you just change the standard.
17:45
<jgraham>
JohnMH: Thought experiment. You are a release manager for a browser. An engineer comes to you and proposes changing your URL parser to match the RFCs exactly. After doing some research you find this will break 10% of sites, which will continue to work in competing implementations. What do you do>
17:45
<jgraham>
s/>/?/
17:45
<JohnMH>
That isn't necessarily true. There is nothing that needs changing since RFC 3986
17:46
<nox>
How do you know that?
17:46
<JohnMH>
jgraham: I would accept the properly done URL parser, and suggest that the address bar handle weird input you expect from users.
17:46
<nox>
It's not just about the address bar, but about href attributes and whatnot too.
17:46
<jgraham>
JohnMH: Congratulations, your browser just lost most of its users. You went out of business.
17:47
<JohnMH>
nox: It may surprise you, but the curl project fetches content from URLs. No, I don't mean https:// and http://, I mean protocol://scheme
17:47
<JohnMH>
jgraham: Business?
17:47
<JohnMH>
Why was I trying to start a business for a browser?
17:47
<jgraham>
Sure, browser vendors are businesses
17:47
<nox>
JohnMH: It parsed https://google.com/ as https being a host.
17:47
<nox>
Err, sorry,
17:47
<JohnMH>
Why wasn't it a free software project, which properly respects users' freedoms and accepts changes from users?
17:47
<nox>
https:///google.com/
17:48
<jgraham>
Free software projects can also be businesses and certainly don't accept arbitrary changes from users
17:48
<JohnMH>
There is also the issue that the URL specification says that it defines "domains, IPv4 addresses and IPv6 addresses" when in reality it simply references RFCs and says "That's what X is"
17:48
<JohnMH>
jgraham: Not arbitrary, sure. But that's another discussion entirely, and not relevant to this.
17:49
<jgraham>
Anyway, you have clearly demonstrated that you are working from such a different worldview from the other people here that this conversation is never going to be productive.
17:49
<JohnMH>
I don't know why you even brought up browsers, the majority of software which uses URLs isn't browsers, it's misc software from operating systems to application IPC.
17:49
<gsnedders>
the majority of software that processes URLs in web pages is web browsers.
17:49
<gsnedders>
s/is/are/
17:49
<JohnMH>
In web pages, yes
17:49
<JohnMH>
Sure
17:49
<nox>
"If you specify URL without protocol:// prefix, curl will attempt to guess what protocol you might want. It will then default to HTTP but try other protocols based on often-used host name prefixes. For example, for host names starting with "ftp." curl will assume you want to speak FTP."
17:49
<JohnMH>
But URLs are definitely not only used in web pages
17:50
<nox>
JohnMH: How did you see that most URLs aren't used in browsers?
17:50
<JohnMH>
nox: The Android operating system, among other pieces of software, use URLs in IPC
17:50
<JohnMH>
*uses
17:51
<nox>
Given the set of whatever URLs are used in any Android version is fixed,
17:51
<nox>
and given users can waste their time as long as they want to on their phone browsing the Web,
17:51
<nox>
aren't they mostly using URLs from the Web even on their Android phone?
17:52
<JohnMH>
No, definitely not
17:52
<JohnMH>
every time an application is launched, a URL has been passed to that application
17:53
<JohnMH>
Every time there's a notification or other event, there's a URL attached to that, which contains the package name and payload
17:53
<JohnMH>
I'm not saying that's a good idea, but it's one of the largest uses of URLs.
17:53
<nox>
Still smaller than users browsing the Web, UAs doing some AJAX crap, and whatever else.
17:54
<jyasskin>
To avoid confusion, it *might* make sense to clarify that url.spec.whatwg.org covers URLs used on the web, and not URLs used for other purposes. That said, it doesn't seem like Android folks are actually getting confused by this.
17:55
<annevk>
jyasskin: Android uses a different URL library from Chrome?
17:55
<annevk>
jyasskin: I thought Google had a Google-wide URL library
17:56
<jyasskin>
annevk: I'm not an authority on this, but I suspect that Intent parsing uses a different library from Chrome. I could be totally wrong though.
17:56
<gsnedders>
I do still wonder if we want to add a strict flag to fail hard on a parse error as with HTML
17:56
<annevk>
gsnedders: yeah, me too, but I haven't really seen it deployed for HTML and I'm not sure what good it does
17:57
<annevk>
jyasskin: it'd also surprise me if switching to a Chrome-like library would be incompatible with what they do today
17:58
jyasskin
is double-checking my assumption that Intent parsing uses its own library.
17:58
<gsnedders>
annevk: IMO it makes more sense for URLs, given things like IPC which use them, where you probably do want to require validity and fail hard
17:58
<jgraham>
I suspect that intents use something java based and chrome for android uses something different
17:59
<annevk>
gsnedders: as with markup, it makes more sense to me that any checks are at the model-level, not syntax
18:00
<JohnMH>
jyasskin: If that were the case, the name should be changed from URL to a more fitting name, such as BRL
18:00
<gsnedders>
annevk: IMO in the IPC case you really want to fail hard rather than end up with the security consequences of a message going somewhere you weren't expecting
18:06
<jyasskin>
https://android.googlesource.com/platform/frameworks/base/+/master/core/java/android/content/Intent.java#4690 has the intent parsing code. It calls to android.net.Uri to actually parse the URL, which is in https://android.googlesource.com/platform/frameworks/base/+/master/core/java/android/net/Uri.java.
19:33
<annevk>
jyasskin: ah right, should have realized since I was reading about the case in court earlier
19:34
<annevk>
lovely Java
19:34
<annevk>
jyasskin: Android might want to look into http://galimatias.mola.io/
20:06
<JohnMH>
annevk: But that uses the "URL Standard", which is not an actual standard but a set of references and instructions for formatting malformed URLs?
20:07
<JohnMH>
In fact, there is an obvious issue with the software, one the developer admits to..
20:07
<JohnMH>
Even if one wanted to use the "URL Standard", it doesn't parse any scheme other than HTTP and HTTPS
20:08
<JohnMH>
With everything else, it's just "scheme data"
20:09
<TabAtkins>
annevk: Chrome reimplements most things that also exist in "Google-wide" code, because we can't have dependencies into the google3 system if we want it to be independently buildable.
20:38
<jyasskin>
annevk: I don't have much leverage with the Android folks, but also I think gsnedders' point is right, that Intents don't intend or need to parse arbitrary URLs from the web, and should prefer to fail fast because they have security implications. That said, I'm not sure the existing implementation ever fails...
20:40
<annevk>
jyasskin: at that point it might be better to pass around URL records directly
20:40
<annevk>
jyasskin: oh well, doesn't really matter
20:41
<jyasskin>
They have their own legacy problem, in that `new Intent("string")` has worked, and they can't break compatibility with old apps using it, but it's a different legacy problem from the web. :-/
20:42
<gsnedders>
annevk: the IPC might not allow arbitrary objects
20:55
<hober>
why are we putting the url standard in scare quotes now
21:14
<zcorpan_>
"URL" "Standard"
21:15
<TabAtkins>
"U" RL
21:20
<JohnMH>
hober: Because the so-called "URL Standard" isn't actually a URL standard, it's a standard which claims to define URLs, but only defines browser usage of URLs.
21:46
jgraham
puts "hober" in scare quotes
21:47
<gsnedders>
Father "Ted".
21:48
<SimonSapin>
JohnMH: "With everything else, it's just "scheme data"" the spec changed on that point a few months ago
21:50
<JohnMH>
SimonSapin: I was talking about the library, not the spec itself.
21:50
<JohnMH>
SimonSapin: To suggest that library, with such a serious issue, is definitely not the right thing to do, unless an issue has been opened on the project's tracker.
21:53
<SimonSapin>
JohnMH: sorry, I haven’t followed all the discussion. Which library?
21:53
<hober>
hmm.
21:54
<JohnMH>
SimonSapin: http://galimatias.mola.io/
22:14
<gsnedders>
TabAtkins: http://specs.xanthir.com/css-font-display/ doesn't load
22:14
<gsnedders>
TabAtkins: which https://tabatkins.github.io/specs/css-font-display/ redirects to
22:26
<TabAtkins>
gsnedders: lolwut
22:27
<gsnedders>
TabAtkins: am I being stupid?
22:27
<TabAtkins>
no, i don't remember why this is happening, is all
22:31
<JohnMH>
TabAtkins: Address doesn't resolve
22:31
<JohnMH>
It's just the subdomain
22:33
<TabAtkins>
yes, i know that the link is broken
22:33
<TabAtkins>
dunno why i have things redirecting
22:34
<TabAtkins>
...huh, because I have a cname file and github pays attention to that now
22:34
<TabAtkins>
don't even remember why i have that
22:36
<TabAtkins>
anyway, deleted
22:37
<JohnMH>
GitHub has always "paid attention" to that
23:37
<zcorpan>
Exception marked using <span> rather than <code>:
23:37
<zcorpan>
7344: throw an <span>"<code>IndexSizeError</code>"</span> <code>DOMException</code>. Otherwise, first,