02:21 | <rbuckton> | I just ran across a strange case while writing additional tests for RegExp Modifiers. I've found exactly two cases where
A quick test of the same patterns and inputs in C# shows no disagreement, so it's not clear to me if this is expected or possibly a bug in |
02:31 | <rbuckton> | possibly having to do with how Unicode case folding for those characters produces an ASCII character. It just seems strange to have something that is not considered a word character when preserving case, but is considered a word character when ignoring case. |
02:39 | <bakkot> | the original sin here is that \b and \w are not unicode-aware even in u mode |
02:40 | <bakkot> | this behavior follows immediately from that: U+017F is not an ASCII word character, but it case-folds to s, which is, and i means that the regex operates on case-folded characters |
02:40 | <bakkot> | the decision to make \b and \w not unicode-aware predates me, unfortunately, so I cannot tell you why this is. it does seem... bad. |
02:40 | <bakkot> | (\d too but that one matters a lot less.) |
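The behavior bakkot describes can be checked directly. The following is a minimal sketch, assuming an engine that follows the spec for `\w` under the combined `i` and `u` flags; the second sample, U+212A (KELVIN SIGN), is an assumption on my part about which other character is affected, since the log does not name it. The variable names are illustrative only.

```javascript
// U+017F LATIN SMALL LETTER LONG S and U+212A KELVIN SIGN are not ASCII,
// but simple case folding maps them to "s" and "k" respectively.
// (U+212A is an assumed second example; the log only names U+017F.)
const samples = ["\u017F", "\u212A"];

const results = samples.map((ch) => ({
  codePoint:
    "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0"),
  // Case-sensitive Unicode mode: \w is still ASCII-only.
  wordInU: /\w/u.test(ch),
  // Case-insensitive Unicode mode: matching works on case-folded characters.
  wordInIU: /\w/iu.test(ch),
  // The fold itself: the character matches ASCII "s" or "k" under iu.
  foldsToAscii: /^[sk]$/iu.test(ch),
}));

console.log(results);
```

Each sample should report `wordInU` as false and `wordInIU` as true, which is the asymmetry rbuckton ran into.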
03:19 | <Justin Ridgewell> | Time to introduce a w flag for very very unicode mode? |
03:27 | <bakkot> | we actually did specifically discuss and reject the possibility of making \b etc unicode-aware in v-mode https://github.com/tc39/notes/blob/2fccc7f7a38201354a007394ab867ec7b245b464/meetings/2021-08/aug-31.md#regexp-set-notation--properties-of-strings |
04:59 | <Justin Ridgewell> | I do not remember this |
05:31 | <rbuckton> | I think waldemar's concern at the time was that changing \b, \w, and \d shouldn't be tied to the mode that adds set notation. We'd need to opt in either with a new mode or a {u} suffix. Either is fine so long as the new mode could be included in the modifiers list, i.e., \b{u} or (?w:\b) (or whatever flag we'd use) would work for those cases. |
05:38 | <rbuckton> | Oh, I guess I mentioned modifiers during that discussion as well. |
15:40 | <Richard Gibson> | the decision to make https://github.com/tc39/proposal-regexp-unicode-property-escapes/issues/22#issuecomment-279930140
(https://github.com/tc39/proposal-regexp-unicode-property-escapes/issues/22 is the [failed] attempt to make those escapes Unicode-aware under the |
15:57 | <shu> | who can add new members to the tc39 organization on GH? |
15:57 | <shu> | i'd like to add a V8 bot account for the purposes of test262 2-way sync. i can add the account to the right teams but first it has to be part of the tc39 organization, apparently |
16:07 | <ljharb> | done. |