WHATWG on 2021-07-07

02:30	<sujaldev>	oki thanks!
02:48	<sideshowbarker>	wonderful, https://drafts.csswg.org/ completely broken again for the Nth time
02:50	<sideshowbarker>	after this having happened so frequently, it’s baffling that there’s not a watchdog or something set up
02:51	sideshowbarker	goes off to raise an https://github.com/w3c/csswg-drafts issue about it for at least the 3rd time
12:30	<smaug>	I have asked this before, I think. Is there some documentation for a good workflow for making spec changes? Including the usual github pr creation but also what checks people do to find if there are relevant WPTs etc.
14:18	<freddy>	Admittedly, you are probably not exactly the intended audience but this might help https://wpc.guide/bug-guide/
14:19	<freddy>	but TBH, I often just look at similar patches and the follow-up bugs (e.g., for WPT) they led to
16:49	<bakkot>	annevk: domenic suggested you might have an opinion on https://github.com/bakkot/proposal-arraybuffer-base64/issues/5. the question under consideration (as I see it; he is welcome to correct) is, must any new base64-decoding API in the web platform decline to validate the padding bytes by default (and hence match the behavior of `atob`), or can it be stricter by default in pursuit of making it harder to accidentally fall into security issues arising from the (I think fairly natural) assumption that base64 encoding is one-to-one?
16:49	<bakkot>	also interested in opinions from others
16:50	<bakkot>	Domenic seems to feel strongly that it is more important to match `atob`, I feel fairly strongly that it is more important to make the less-secure behavior opt-in, so I am hoping we can get other opinions on the matter
16:55	<Domenic>	Matching data: URLs and other base64 encoding behaviors in the platform (e.g. SRI digests) is more important than atob
17:01	<Luca Casonato>	I agree
17:03	<bakkot>	I understand that position even less; how many people are ever manually decoding a data: url or an SRI digest?
17:04	<bakkot>	(CSP, incidentally, requires padding, so the web platform is not uniform here)
17:05	<bakkot>	actually SRI does too, as far as I can tell from the spec
17:06	<Domenic>	Yeah the spec doesn't seem to match browsers there
17:08	<Domenic>	I can't find base64 decoding in the CSP/SRI specs, only the implementations
17:09	<bakkot>	that's because they aren't specified in terms of decoding; they compare base64-encoded strings by string equality, and assume that base64 decoding is 1-to-1, since everyone makes this assumption.
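A minimal sketch of the pitfall described above, assuming a runtime with a global `atob` (browsers and recent Node.js). The variable names are illustrative only and do not come from the CSP or SRI specs. It shows two different base64 strings that a forgiving decoder maps to the same byte, so string equality and byte equality disagree:

```javascript
// "YQ==" is the canonical encoding of the single byte "a".
// "YR==" differs only in the four unused trailing bits of its second character.
const canonical = "YQ==";
const variant = "YR==";

// Comparing the encoded strings says they differ...
const sameString = canonical === variant; // false

// ...but a forgiving decoder such as atob discards the unused bits,
// so both strings decode to the same byte.
const sameBytes = atob(canonical) === atob(variant); // true

console.log({ sameString, sameBytes });
```

A check written as string equality on the encoded form is therefore stricter than a check on the decoded bytes, which is the mismatch under discussion.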
17:17	<bakkot>	[replying to Domenic: Yeah the spec doesn't seem to match browsers there] Browsers disagree; Firefox follows the spec, Chrome does not
17:39	<shu>	[replying to bakkot: that's because they aren't specified in terms of decoding; they compare base64-encoded strings by string equality, and assume that base64 decoding is 1-to-1, since everyone makes this assumption.] i find this footgun argument to be pretty persuasive. fwiw i feel more radicalized than before from reading those tweets, and feel like if anything we should try to change the web default
17:52	<sideshowbarker>	Domenic: r? https://github.com/whatwg/whatwg.org/pull/371
17:58	<sideshowbarker>	Domenic: also https://github.com/whatwg/html-build/pull/265
18:00	<Domenic>	[replying to shu: i find this footgun argument to be pretty persuasive. fwiw i feel more radicalized than before from reading those tweets, and feel like if anything we should try to change the web default] I just don't think these people are JS practitioners. They are crypto people (?) who maybe use base64 for crypto purposes. We should not expose base64 at all if our audience is people hand-rolling crypto.
18:00	<Domenic>	Note also that the tweets note that Go has the same default as JS
18:01	<Domenic>	It feels really bad for TC39 to try to shift the web default by fiat
18:02	<shu>	the "we shouldn't expose this at all" is a separate argument, which i think i also disagree with, but we should table in this context
18:03	<Domenic>	Well it's relevant because what is our goal in exposing this
18:03	<Domenic>	Is it to allow people to decode base64 in a fashion they're used to from Node/Go/Deno/the web?
18:04	<Domenic>	Or is it to allow people to do secure crypto-adjacent stuff that assumes bijection?
18:04	<shu>	[replying to Domenic: It feels really bad for TC39 to try to shift the web default by fiat] i have a hard time engaging with this as well -- there are technical reasons to prefer the stricter variant, which there is disagreement with. it similarly feels bad to accuse TC39 of some kind of power grab here
18:04	<bakkot>	Filippo Valsorda is one of the people your employer pays to be hand-rolling crypto
18:04	<Domenic>	[replying to bakkot: Filippo Valsorda is one of the people your employer pays to be hand-rolling crypto] Yes, and I don't think he's the target audience for this API.
18:04	<Domenic>	I don't really think pointing at someone with a different opinion and saying "you get your money from the same dude" is really that insightful
18:05	<shu>	yes let's... cool down on that front
18:10	<Luca Casonato>	Just for clarity: this is about the strictness of base64 decoding right? We do all agree that base64 encoding should always include padding?
18:11	<Domenic>	Yes
18:12	<shu>	i've tried to steer the conversation to something more pragmatic, but so long as we're talking about broad-scope arguments like "we shouldn't diverge at all from existing standardized web APIs" and "the current default is not great", i do find the latter more convincing because i'm not sure what group the former position helps in this particular case
18:12	<Luca Casonato>	In this case, what are some arguments for being strict with padding by default? What use cases would this benefit?
18:13	<shu>	for the concrete AMP case, it seemed like the weird Java printer was, well, regarded as weird, and an attempt was made to fix it
18:13	<shu>	[replying to Luca Casonato: In this case, what are some arguments for being strict with padding by default? What use cases would this benefit?] the use case that the (imo reasonable) assumption that base64 is a bijection holds, and that when it does not, it prompts the author to take a second look and figure out why
18:16	<Domenic>	Who is going around saying "I really want bijection"? I'm worried about people going "I want to switch from Buffer.from(x, "base64") to the new thing, but I did and now we have a production outage because I didn't know about how TC39 decided to diverge from Node semantics and data that used to work now fails"
18:16	<shu>	i think that's precisely the point of why it's so surprising -- nobody's going around saying that because they don't even realize it's not. i certainly didn't!
18:16	<Domenic>	I suspect people don't expect it.
18:17	<Luca Casonato>	[replying to shu: the use case that the (imo reasonable) assumption that base64 is a bijection holds, and that when it does not, it prompts the author to take a second look and figure out why] shu: Many other languages and ecosystems have set the precedent that this is not the case though: Go, Python, Java, Deno, Node, Rust. I don't think it is a reasonable assumption to make.
18:17	<Domenic>	Bijection is just not a property one usually insists on for encoding/decoding, is my claim. Certainly not with text encodings!
18:17	<Luca Casonato>	https://docs.rs/base64/0.13.0/base64/ <- too little padding is ok, too much is not
18:17	<bakkot>	Domenic: to be clear, `Buffer.from(x, "base64")` accepts mixed base64 and base64url in the same string, which I am definitely not proposing to support
18:17	<Domenic>	[replying to bakkot: Domenic: to be clear, `Buffer.from(x, "base64")` accepts mixed base64 and base64url in the same string, which I am definitely not proposing to support] Well OK, that's pretty bonkers, fair enough.
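To make "mixed base64 and base64url in the same string" concrete, here is a small sketch of how a decoder could tell the two alphabets apart before decoding. `classifyAlphabet` is a hypothetical helper written for this illustration; it is not part of Node, the proposal, or any spec. It relies only on the fact that the two alphabets differ in exactly two characters (`+` and `/` versus `-` and `_`):

```javascript
// Hypothetical helper: report which base64 alphabet a string uses.
function classifyAlphabet(input) {
  const hasStandard = /[+/]/.test(input); // characters unique to standard base64
  const hasUrlSafe = /[-_]/.test(input);  // characters unique to base64url
  if (hasStandard && hasUrlSafe) return "mixed";
  if (hasStandard) return "base64";
  if (hasUrlSafe) return "base64url";
  return "either"; // no distinguishing characters present
}

console.log(classifyAlphabet("ab+/")); // "base64"
console.log(classifyAlphabet("ab-_")); // "base64url"
console.log(classifyAlphabet("ab+_")); // "mixed"
```

A decoder that does not want to support the mixed case could reject any input classified as "mixed" up front.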
18:17	<shu>	Domenic: i'm worried about the latter too, but that's why i want more data here! it's all too easy to raise the specter of possible compat breakage
18:18	<bakkot>	(CSP does too, fun fact)
18:18	<Domenic>	Seems like that should be a non-default mode...
18:18	<shu>	and while we're here and i'm a little worked up, i really want us to stop pitting whatwg-vs-tc39, both inside and outside of tc39
18:18	<Domenic>	(I'll open a tracking issue)
18:19	<shu>	i mean, obviously JS is a stakeholder in the web platform and we want to improve it too. there are delegates who don't care about the web as much, but perpetuating that dichotomy entrenches that position which is counter-productive
18:27	<Domenic>	I don't think this is a very WHATWG-vs.-TC39 thing; for me at least I tried to frame all my comments as about technical/API surface concerns on each proposal. E.g. even if TC39 wants to do this then I still am unsure about putting things on the prototypes vs. a separate utility class. At first I thought it was more a web-vs-non-web thing. But recent research showing that Deno/Node/Go/etc. all do forgiving base-64 makes it seem like it's not even that. It's apparently about whether you expect a new JS API to have some bijection property vs. whether you expect it to align with the JS ecosystem/other standard library APIs.
18:31	<bakkot>	[replying to Domenic: Bijection is just not a property one usually insists on for encoding/decoding, is my claim. Certainly not with text encodings!] My claim is that a great many people assume it holds, for base64 in particular. Either Chrome's implementation of CSP or the CSP spec itself assumes it holds, so it's not like it's only amateurs who make this mistake. (I don't know what the intent of the CSP authors was, so I'm not saying Chrome is wrong, just that it doesn't match the spec-as-written.) That is to say, my claim is that very few people have the correct intuition about what `atob` and friends actually do in this edge case, and in any case are unlikely to be exposed to it, and as such we should match what they expect these APIs to do, not what they actually do.
18:35	<shu>	[replying to Domenic: I don't think this is a very WHATWG-vs.-TC39 thing; for me at least I tried to frame all my comments as about technical/API surface concerns on each proposal. E.g. even if TC39 wants to do this then I still am unsure about putting things on the prototypes vs. a separate utility class. At first I thought it was more a web-vs-non-web thing. But recent research showing that Deno/Node/Go/etc. all do forgiving base-64 makes it seem like it's not even that. It's apparently about whether you expect a new JS API to have some bijection property vs. whether you expect it to align with the JS ecosystem/other standard library APIs.] i was responding mainly to the "TC39 tries to shift the web default by fiat" comment
20:13	<timothygu>	Another data point in the "base64-being-bijection" problem. By default, the GNU coreutils `base64` program emits base64-encoded output wrapped at 76 cols, so already it requires whitespace to be ignored during decoding. In terms of decoding, all popular non-JS base64 decoders that I tested (Perl, Python, Ruby, coreutils) ignore whitespace and treat `YQ==` and `YR==` similarly (return `a` with no errors). It really seems like forgiving-decode is already an established default across ecosystems.
20:20	<shu>	ah cool, thanks for the extra datapoint
20:20	<bakkot>	timothygu: so there's two notions of "ignoring padding". "treat YQ== and YR== similarly" is one; "treat YQ and YQ== similarly" is the other
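The two notions can be separated mechanically. The sketch below is only an illustration: `paddingIssues` and its result fields are hypothetical names, and it assumes the input is otherwise valid standard-alphabet base64 with no whitespace. It reports missing `=` characters ("YQ" vs "YQ==") separately from non-zero unused bits in the last data character ("YR==" vs "YQ=="):

```javascript
const ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Hypothetical helper: classify the two kinds of padding looseness.
function paddingIssues(input) {
  const body = input.replace(/=+$/, ""); // data characters without trailing "="
  const remainder = body.length % 4;     // 0, 2 or 3 for decodable input
  if (remainder === 1) throw new RangeError("length is not decodable");

  const expectedPad = (4 - remainder) % 4;
  const actualPad = input.length - body.length;

  // A final group of 2 characters carries 4 unused bits; of 3 characters, 2 bits.
  const unusedBits = remainder === 2 ? 4 : remainder === 3 ? 2 : 0;
  let lastValue = 0;
  if (remainder !== 0) {
    lastValue = ALPHABET.indexOf(body[body.length - 1]);
    if (lastValue < 0) throw new RangeError("invalid base64 character");
  }
  const mask = (1 << unusedBits) - 1;

  return {
    missingPadding: actualPad < expectedPad,       // "YQ" instead of "YQ=="
    nonZeroPaddingBits: (lastValue & mask) !== 0,  // "YR==" instead of "YQ=="
  };
}

console.log(paddingIssues("YQ=="));
console.log(paddingIssues("YQ"));
console.log(paddingIssues("YR=="));
```

A strict-by-default decoder in the sense discussed here would reject inputs where either flag is set; a forgiving one would accept both.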
20:22	<timothygu>	Yeah I understand. I was trying to conclude that the notion of “base64 as bijection” may already be a lost cause
20:24	<bakkot>	I claim that literally zero people will have put `YR==` into a base64 decoder except as one of a.) manually, to see what happens; b.) leaking memory, or c.) because they are actively malicious and trying to take advantage of the looseness of parsers
20:27	<bakkot>	since there is no reasonable way to end up with `YR==` as an input, I am not at all convinced you can meaningfully extrapolate from "base64 decoders in other languages accept it" to "we also should accept it". it is an implementation detail that no one encounters in real life outside of malicious input.
20:30	<bakkot>	(as another data point, the most popular Rust base64 decoder rejects `YR==`, so it's not universal.)
20:31	<timothygu>	[replying to bakkot: I claim that literally zero people will have put `YR==` into a base64 decoder except as one of a.) manually, to see what happens; b.) leaking memory, or c.) because they are actively malicious and trying to take advantage of the looseness of parsers] would it make sense to gather metrics to support this claim? I could imagine some browser adding a usecounter to atob() for inconsistent padding
20:33	<bakkot>	if you think that claim requires support, sure. I am confident in its accuracy personally.