06:17 | <mathiasbynens> | Bakkot: wdyt about allowing `[\p{Seq}]`? the only remaining surprising cases are then negation (either through `[^...]` or `\P{...}`) |
06:28 | <Bakkot> | mathiasbynens that feels... pretty weird at first glance? but it is possible that is just the novelty of it. |
06:34 | <Bakkot> | mathiasbynens I guess that seems like it makes things worse, not better; my major problem with `\p{Seq}` in general is that a thing I am expecting to match one character is suddenly matching maybe more than one character, and to figure out which it's going to do I need to go track down some table in Unicode |
06:35 | <Bakkot> | and expanding that so that `[]` now also maybe matches more than one character does not seem like an improvement |
06:49 | <mathiasbynens> | Bakkot: yeah, I similarly hadn't seriously considered it before. I get it from the UTC perspective though: UTS #18 has always allowed strings in character classes |
06:50 | <mathiasbynens> | Bakkot: anyway, it's one of those things where if we throw now (like in the current proposal), we can always decide to loosen that up later |
06:51 | <mathiasbynens> | OTOH, the discussion on whether character classes can match multiple code points influences the syntax discussion |
06:53 | <mathiasbynens> | Bakkot: out of curiosity, why do you need to know if the thing that's matched is just one code point or more? |
06:53 | <mathiasbynens> | with /u you already don't know if it's 1 UTF-16 code unit or 1 code point (which could be 2 such units), and that's a Good Thing(tm) |
06:54 | <mathiasbynens> | this is the next step |
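
A minimal sketch of the `/u` point made above, assuming a JavaScript engine that supports the ES2015 `u` flag. The sample string and variable names are illustrative and not from the discussion.

```javascript
// U+1F4A9 is a single code point encoded as two UTF-16 code units (a surrogate pair).
const astral = "\u{1F4A9}";

const codeUnits = astral.length;        // .length counts UTF-16 code units
const codePoints = [...astral].length;  // string iteration walks code points

// Without /u, `.` consumes one code unit, so a single `.` cannot span the pair.
const matchesWithoutU = /^.$/.test(astral);

// With /u, `.` consumes one whole code point.
const matchesWithU = /^.$/u.test(astral);

console.log({ codeUnits, codePoints, matchesWithoutU, matchesWithU });
```

So the same pattern source already consumes either one or two code units depending on the flag and the input, which is the precedent being appealed to here.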
06:59 | <Bakkot> | mathiasbynens the way I reason about regular expressions is by walking them over strings, one character at a time |
06:59 | <Bakkot> | if we could redefine 'character' to 'glyph' everywhere then I could walk one glyph at a time |
07:00 | <Bakkot> | but as long as `.` means one code point I am stuck with thinking about code points |
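
A small sketch of the code-point-versus-glyph gap described above, again assuming `u` flag support. The combining-accent example is an assumption chosen for illustration, not something taken from the conversation.

```javascript
// "e" followed by U+0301 COMBINING ACUTE ACCENT renders as one glyph
// but consists of two code points.
const glyph = "e\u0301";

const glyphCodePoints = [...glyph].length;  // string iteration walks code points

// Even with /u, `.` matches exactly one code point, so one `.` is not enough.
const oneDotMatches = /^.$/u.test(glyph);

// Two `.`s are needed to walk across this single visible glyph.
const twoDotsMatch = /^..$/u.test(glyph);

console.log({ glyphCodePoints, oneDotMatches, twoDotsMatch });
```

Walking the pattern over the string therefore still proceeds code point by code point, not glyph by glyph.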