03:14
<Domenic>
I guess ES does not really have any "ASCII whitespace tests"? Anyway, people might find the fact that the web platform has multiple definitions of ASCII whitespace interesting. https://github.com/whatwg/infra/issues/670 plus, previously, https://github.com/whatwg/infra/pull/649
03:42
<Meghan Denny>
there's https://tc39.es/ecma262/#sec-white-space / https://tc39.es/ecma262/#prod-WhiteSpace
03:47
<Domenic>
Yeah, not ASCII-restricted though
04:31
<ljharb>
i would assume that anything that /\s/ matches is considered whitespace. any other definition would be very surprising
04:47
<Richard Gibson>
well, the behavior of \s is based on General Category "Space_Separator" plus select additions and not on property White_Space—as a result, U+0085 NEXT LINE (which has property White_Space) is not matched by it (regardless of flags), while U+FEFF ZERO WIDTH NO-BREAK SPACE (which does not have property White_Space) is matched by it. See also https://github.com/tc39/ecma262/pull/3303 , which was unfortunately closed
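(A quick way to check the two code points mentioned above, as a minimal sketch. The helper names `matchesBackslashS` and `hasWhiteSpaceProperty` are made up for illustration; the second relies on the `\p{White_Space}` Unicode property escape, which needs the `u` flag.)

```javascript
// U+0085 NEXT LINE has the Unicode White_Space property, but \s does not match it.
const nextLine = "\u0085";
// U+FEFF ZERO WIDTH NO-BREAK SPACE lacks White_Space, but \s matches it.
const zwnbsp = "\uFEFF";

// Hypothetical helpers, named here only for illustration.
const matchesBackslashS = (ch) => /\s/.test(ch);
const hasWhiteSpaceProperty = (ch) => /\p{White_Space}/u.test(ch);

console.log(matchesBackslashS(nextLine), hasWhiteSpaceProperty(nextLine)); // false true
console.log(matchesBackslashS(zwnbsp), hasWhiteSpaceProperty(zwnbsp));     // true false
```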
04:54
<ljharb>
i'm just saying that "\s" literally means "whitespace", so regardless of specs or standards or web reality, it'll be surprising if that doesn't hold
04:54
<ljharb>
unicode nonsense doesn't change that either way :-)
04:54
<Domenic>
ASCII whitespace is a useful thing to have in specs when attempting to parse data formats, etc. JSON has its own definition (which does not match JS's \s or White_Space).
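(To illustrate the JSON point, a rough sketch that probes which characters `JSON.parse` tolerates around a value. `jsonAcceptsAsWhitespace` is a hypothetical helper and is only meaningful for whitespace-like candidates, not for digits or other JSON syntax characters.)

```javascript
// Hypothetical probe: does JSON.parse accept `ch` as insignificant whitespace
// in front of a number? Only intended for whitespace-like candidates.
const jsonAcceptsAsWhitespace = (ch) => {
  try {
    JSON.parse(ch + "1");
    return true;
  } catch {
    return false;
  }
};

// JSON whitespace is only space, tab, line feed and carriage return.
console.log([" ", "\t", "\n", "\r"].every(jsonAcceptsAsWhitespace)); // true

// Vertical tab, form feed and no-break space all match \s but are rejected by JSON.
for (const ch of ["\u000B", "\u000C", "\u00A0"]) {
  console.log(/\s/.test(ch), jsonAcceptsAsWhitespace(ch)); // true false
}
```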
04:55
<ljharb>
sure, JSON is its own unique thing. we changed JS a number of years ago to include 2 more newline characters so it'd match JSON, as i recall
04:56
<ljharb>
i think that regardless of what the current reality is, we should be striving to make there be a single definition of "whitespace", and for all things to match it
04:57
<ljharb>
in particular, at least, i hope nobody would ever add an additional definition of "whitespace" :-)
04:57
<Domenic>
I don't agree; I think it's useful for there to be separate definitions for whitespace and ASCII whitespace.
04:57
<Domenic>
This is similar to how both lowercasing and ASCII lowercasing are useful operations, for example.
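(A small sketch of the lowercasing analogy. `asciiLowercase` is a hypothetical helper written for this example, not a built-in; it only touches A through Z, while `String.prototype.toLowerCase` applies Unicode case mappings.)

```javascript
// Hypothetical helper: lowercases only the ASCII letters A-Z and leaves
// every other code unit untouched.
const asciiLowercase = (s) =>
  s.replace(/[A-Z]/g, (c) => String.fromCharCode(c.charCodeAt(0) + 0x20));

// U+212A KELVIN SIGN is not an ASCII letter, but Unicode lowercases it to "k".
const kelvin = "\u212A";

console.log(kelvin.toLowerCase() === "k");        // true
console.log(asciiLowercase(kelvin) === kelvin);   // true
console.log(asciiLowercase("Content-Type"));      // content-type
```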
04:57
<ljharb>
if the latter is a subset of the former, it might be fine, sure
04:58
<ljharb>
i'm saying the ideal is one definition; more than one is fine, but we should be striving to minimize that. if we end up with two that's pretty decent.
05:00
<bakkot>
the base64 proposal is the first place to my knowledge that introduces "ASCII whitespace" to JS https://tc39.es/proposal-arraybuffer-base64/spec/#sec-skipasciiwhitespace
05:01
<bakkot>
it does not include vertical tab
05:01
<bakkot>
it does include form feed
05:02
<bakkot>
this matches the infra definition, which is probably where I copied it from
05:07
<bakkot>
ah, yes, because that's what atob uses https://infra.spec.whatwg.org/#forgiving-base64-decode
05:08
<bakkot>
including the exclusion of vertical tab https://github.com/tc39/proposal-arraybuffer-base64/issues/5#issuecomment-1783929861
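(For reference, a minimal sketch of an ASCII-whitespace skipper along the lines described above: tab, line feed, form feed, carriage return and space are skipped, vertical tab is not. This is an illustration under that assumption, not the proposal's spec text; the name `skipAsciiWhitespace` and the `ASCII_WHITESPACE` set are chosen here for the example.)

```javascript
// Code units treated as ASCII whitespace in this sketch:
// TAB, LF, FF, CR, SPACE. U+000B VERTICAL TAB is deliberately absent.
const ASCII_WHITESPACE = new Set([0x09, 0x0a, 0x0c, 0x0d, 0x20]);

// Returns the first index at or after `index` that is not ASCII whitespace.
function skipAsciiWhitespace(string, index) {
  while (index < string.length && ASCII_WHITESPACE.has(string.charCodeAt(index))) {
    index++;
  }
  return index;
}

console.log(skipAsciiWhitespace(" \t\f\r\nQQ==", 0)); // 5
console.log(skipAsciiWhitespace("\u000BQQ==", 0));    // 0 (vertical tab is not skipped)
```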