03:14 | <Domenic> | I guess ES does not really have any "ASCII whitespace tests"? Anyway, people might find the fact that the web platform has multiple definitions of ASCII whitespace interesting. https://github.com/whatwg/infra/issues/670 plus, previously, https://github.com/whatwg/infra/pull/649 |
03:42 | <Meghan Denny> | there's https://tc39.es/ecma262/#sec-white-space / https://tc39.es/ecma262/#prod-WhiteSpace |
03:47 | <Domenic> | Yeah, not ASCII-restricted though |
04:31 | <ljharb> | i would assume that anything that /\s/ matches is considered whitespace. any other definition would be very surprising |
04:47 | <Richard Gibson> | well, the behavior of \s is based on General Category "Space_Separator" plus select additions and not on property White_Space; as a result, U+0085 NEXT LINE (which has property White_Space) is not matched by it (regardless of flags), while U+FEFF ZERO WIDTH NO-BREAK SPACE (which does not have property White_Space) is matched by it. See also https://github.com/tc39/ecma262/pull/3303 , which was unfortunately closed |
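A minimal sketch of the mismatch described in the message above, assuming a standard ECMAScript engine. The constant names are illustrative only and do not come from any spec.

```javascript
// U+0085 NEXT LINE has the Unicode White_Space property,
// but it is not in the set that \s matches.
const NEXT_LINE = "\u0085";

// U+FEFF ZERO WIDTH NO-BREAK SPACE lacks White_Space,
// but ECMAScript lists it as WhiteSpace, so \s matches it.
const ZWNBSP = "\uFEFF";

const sMatchesNextLine = /\s/.test(NEXT_LINE);         // false
const sMatchesNextLineUnicode = /\s/u.test(NEXT_LINE); // false, the u flag does not change this
const sMatchesZwnbsp = /\s/.test(ZWNBSP);              // true

console.log({ sMatchesNextLine, sMatchesNextLineUnicode, sMatchesZwnbsp });
```

So "whatever \s matches" and "whatever Unicode calls White_Space" are already two different sets inside the language itself.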
04:54 | <ljharb> | i'm just saying that "\s" literally means "whitespace", so regardless of specs or standards or web reality, it'll be surprising if that doesn't hold |
04:54 | <ljharb> | unicode nonsense doesn't change that either way :-) |
04:54 | <Domenic> | ASCII whitespace is a useful thing to have in specs when attempting to parse data formats, etc. JSON has its own definition (which does not match JS's \s or White_Space ). |
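To illustrate the JSON point above, here is a rough comparison of what `JSON.parse` tolerates around a value versus what `\s` matches. The helper `jsonAccepts` and the `candidates` table are hypothetical names made up for this sketch.

```javascript
// Hypothetical helper: true if JSON.parse accepts the text without throwing.
function jsonAccepts(text) {
  try {
    JSON.parse(text);
    return true;
  } catch {
    return false;
  }
}

const candidates = {
  space: " ",
  tab: "\t",
  lineFeed: "\n",
  carriageReturn: "\r",
  formFeed: "\f",
  verticalTab: "\v",
  noBreakSpace: "\u00A0",
};

const results = {};
for (const [name, ch] of Object.entries(candidates)) {
  results[name] = {
    regexpS: /\s/.test(ch),            // does \s match this character?
    json: jsonAccepts(ch + "1" + ch),  // does JSON allow it around a value?
  };
}

console.log(results);
```

All seven characters match `\s`, but only space, tab, line feed, and carriage return are accepted as JSON whitespace.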
04:55 | <ljharb> | sure, JSON is its own unique thing. we changed JS a number of years ago to include 2 more newline characters so it'd match JSON, as i recall |
04:56 | <ljharb> | i think that regardless of what the current reality is, we should be striving to make there be a single definition of "whitespace", and for all things to match it |
04:57 | <ljharb> | in particular, at least, i hope nobody would ever add an additional definition of "whitespace" :-) |
04:57 | <Domenic> | I don't agree; I think it's useful for there to be separate definitions for whitespace and ASCII whitespace. |
04:57 | <Domenic> | This is similar to how both lowercasing and ASCII lowercasing are useful operations, for example. |
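The lowercasing analogy can be sketched as follows. `asciiLowercase` is a hypothetical helper written for this example, loosely following the idea of Infra's "ASCII lowercase" (only A to Z are changed); it is not a built-in.

```javascript
// Hypothetical helper: lowercase only ASCII upper alphas (A-Z),
// leaving every other code point untouched.
function asciiLowercase(s) {
  return s.replace(/[A-Z]/g, (c) => c.toLowerCase());
}

const KELVIN_SIGN = "\u212A"; // U+212A KELVIN SIGN, looks like "K"

// Full Unicode lowercasing maps the Kelvin sign to an ASCII "k".
const unicodeLowered = KELVIN_SIGN.toLowerCase();

// ASCII lowercasing leaves it alone.
const asciiLowered = asciiLowercase(KELVIN_SIGN);

console.log(unicodeLowered === "k");          // true
console.log(asciiLowered === KELVIN_SIGN);    // true
console.log(asciiLowercase("Content-Type"));  // "content-type"
```

When parsing a protocol or data format, the ASCII-only variant is usually the one you want, since it cannot turn a non-ASCII input into an ASCII keyword.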
04:57 | <ljharb> | if the latter is a subset of the former, it might be fine, sure |
04:58 | <ljharb> | i'm saying the ideal is one definition; more than one is fine, but we should be striving to minimize that. if we end up with two that's pretty decent. |
05:00 | <bakkot> | the base64 proposal is the first place to my knowledge that introduces "ASCII whitespace" to JS https://tc39.es/proposal-arraybuffer-base64/spec/#sec-skipasciiwhitespace |
05:01 | <bakkot> | it does not include vertical tab |
05:01 | <bakkot> | it does include form feed |
05:02 | <bakkot> | this matches the infra definition, which is probably where I copied it from |
05:07 | <bakkot> | ah, yes, because that's what atob uses https://infra.spec.whatwg.org/#forgiving-base64-decode |
05:08 | <bakkot> | including excluding vertical tab https://github.com/tc39/proposal-arraybuffer-base64/issues/5#issuecomment-1783929861 |
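A rough sketch of the "ASCII whitespace" set discussed above (tab, line feed, form feed, carriage return, space; no vertical tab), with a skip helper in the spirit of the proposal's SkipAsciiWhitespace operation. This is an illustration under those assumptions, not the spec text; the names `ASCII_WHITESPACE` and `skipAsciiWhitespace` are chosen for this example.

```javascript
// Code units treated as ASCII whitespace in this sketch:
// U+0009 TAB, U+000A LF, U+000C FF, U+000D CR, U+0020 SPACE.
// U+000B VERTICAL TAB is deliberately absent.
const ASCII_WHITESPACE = new Set([0x09, 0x0a, 0x0c, 0x0d, 0x20]);

// Returns the first index at or after `index` that is not ASCII whitespace
// (or string.length if the rest of the string is all ASCII whitespace).
function skipAsciiWhitespace(string, index) {
  while (index < string.length && ASCII_WHITESPACE.has(string.charCodeAt(index))) {
    index++;
  }
  return index;
}

console.log(skipAsciiWhitespace(" \t\f\n\rQQ==", 0)); // 5
console.log(skipAsciiWhitespace("\vQQ==", 0));        // 0, vertical tab is not skipped
console.log(/\s/.test("\v"));                         // true, \s is broader
```

Every member of this set also matches `\s`, so it is a strict subset of the regexp definition.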