00:02 | <drousso> | jridgewell can I assist in writing tests? |
00:03 | <jridgewell> | I have 0 tests... |
00:04 | <jridgewell> | How would you like to test? |
00:04 | <jridgewell> | help** |
00:06 | <drousso> | no idea lol |
00:07 | <drousso> | just thought i'd offer, given that i already have something testable ;P |
00:07 | <drousso> | i've never written anything test262, but am keen to learn :) |
00:07 | <jridgewell> | Maybe if you could make a list of the things you think we should test? |
00:07 | <jridgewell> | I have also never written a test262 test... |
00:10 | <devsnek> | aim for 100% coverage https://coveralls.io/builds/29699669/source?filename=src/runtime-semantics/AssignmentExpression.mjs |
00:10 | <devsnek> | unfortunately most of my parser is a dependency so i can't get good coverage data out of that |
00:11 | <rkirsling> | it would be super ideal if you could just be like "hey, when this holds, just do exactly what = is already doing" but given that it's a totally separate operator you basically need to do all the same tests again... :( |
00:12 | <devsnek> | use ai to merge `+=` tests with `||` tests |
00:13 | <rkirsling> | AI? In *my* parser? |
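
As a rough illustration of why the existing `=` tests cannot simply be reused, here is a minimal sketch of one behaviour that logical assignment tests would need to cover: with `||=`, the setter should not run when the left-hand side is already truthy. The `countSetterCalls` helper is hypothetical and is not taken from test262.

```javascript
// Hypothetical helper (not from test262): counts how often the setter runs
// when `||=` is applied to an accessor property holding `initial`.
function countSetterCalls(initial) {
  let stored = initial;
  let sets = 0;
  const obj = {
    get value() {
      return stored;
    },
    set value(v) {
      sets += 1;
      stored = v;
    },
  };
  // Short-circuits: assigns only when the current value is falsy.
  obj.value ||= "fallback";
  return { sets, value: obj.value };
}

const truthyCase = countSetterCalls("kept"); // setter should be skipped
const falsyCase = countSetterCalls("");      // setter should run once
```

A plain `obj.value = obj.value || "fallback"` would call the setter in both cases, which is one reason the operator needs its own tests.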
00:13 | <rkirsling> | ljharb: is this still somewhere on the editor group docket? https://github.com/tc39/ecma262/pull/1860 |
00:14 | <drousso> | jridgewell i'll see if I can come up with a list tonight |
00:14 | <devsnek> | drousso: are y'all able to get coverage from bytecode and stuff |
00:14 | <ljharb> | rkirsling: it's not, but i can add it |
00:14 | <drousso> | i already have a bunch of the straightforward cases either implemented or discussed in <https://webkit.org/b/209716> |
00:15 | <drousso> | devsnek i don't know, i'm a noob with JSC 😅 |
00:15 | <drousso> | i did this work partly to try to learn more |
00:15 | <rkirsling> | ljharb: much obliged |
00:15 | <drousso> | CC keith_miller msaboff |
00:16 | <devsnek> | if it can't, engine262 can do `npm run coverage` for an html which might be useful |
00:17 | <devsnek> | you could also do `./node_modules/.bin/nyc node test/test262/test262.js test/language/whatever-logical-assignment-tests**` |
00:19 | <rkirsling> | lol I'm pretty sure that's just engine262 though |
00:19 | <devsnek> | oh yeah that's just to verify the tests |
00:19 | <devsnek> | to some degree |
00:20 | <devsnek> | it doesn't tell you if all the engines will pass :P |
00:20 | <devsnek> | you'd need uh |
00:20 | <devsnek> | test262-harness for that |
00:39 | <rkirsling> | how does GH still not have a crying reaction |
00:39 | <rkirsling> | #stifledexpressivity |
00:41 | <rkirsling> | ljharb: should I update myself as the presenter for the forbidden extensions PR, if you were just hesitant to make me do it? |
05:21 | <ljharb> | rkirsling: absolutely! no need for a PR for that change either, just push a commit :-) |
15:40 | <bradleymeck> | what link are we using for hallway track? |
15:41 | <mpcsh> | I think this - https://hub.link/bHXk2f8 |
15:44 | <jridgewell> | Reminder that this chat is public. |
15:45 | <jridgewell> | Only share links in the Reflector. |
15:45 | <mpcsh> | this channel is private, right? only #tc39 is public I thought |
15:45 | <michaelficarra> | ugh |
15:45 | <devsnek> | this channel is public now too |
15:46 | <mpcsh> | oooof. sorry y'all, I didn't know that changed. |
15:46 | <michaelficarra> | it is publicly viewable, not open to public contributions |
15:46 | <michaelficarra> | it's understandable, this is only the second meeting where that's the case |
15:46 | <devsnek> | ystartsev: can a new link be generated |
15:46 | <michaelficarra> | it's easy enough to change the hubs link if we find unwanted users |
15:48 | <devsnek> | i'm going to have to deep clean my laptop from this zoom install |
15:49 | <devsnek> | zoom added code snippets apparently |
15:51 | <michaelficarra> | I don't even know what that means |
15:51 | <devsnek> | in the text chat |
15:51 | <devsnek> | we might need to have a no-zoom-text-chat rule btw |
15:52 | <michaelficarra> | yeah, at least for the sake of my sanity |
15:52 | <michaelficarra> | there's already enough places to follow during a meeting |
15:52 | <devsnek> | well for the public discussion rule as well, unless note takers are going to be summarizing the text chat |
15:53 | <michaelficarra> | yes I understood |
15:53 | <michaelficarra> | where do I find the TCQ link? |
15:53 | <michaelficarra> | it's not in the channel topic and not in the Reflector issue |
15:54 | <devsnek> | i'm not sure |
15:59 | <bradleymeck> | reflector |
16:07 | <apaprocki> | Is someone going to post the hubs URL to the Reflector issue? |
16:17 | <ystartsev> | devsnek: yep |
16:18 | <ystartsev> | apaprocki i can generate one |
16:25 | <bradleymeck> | its on the reflector now |
16:26 | <shu> | what's the idea for the hubs, are we supposed to idle there? |
16:26 | <shu> | or join it during breaks? |
16:27 | <ystartsev> | shu: either works |
16:27 | <ystartsev> | i do recommend reducing the settings so that it doesnt have the fan spinning all the time |
16:28 | <bradleymeck> | shu: i think idling during breaks at least allows people to start conversation in the way hallway track usually works |
16:28 | <devsnek> | i'm planning to join it during breaks |
16:28 | <bradleymeck> | if you don't idle there people can't approach in a public fashion, would have to do it via DMs etc. which plenary couldn't pick up on |
16:29 | <ystartsev> | bradley and i are in there already if people wanna join |
16:30 | <devsnek> | might need some contrast work https://usercontent.irccloud-cdn.com/file/MuiR0i0G/IMG_20200331_112945.jpg |
16:38 | <ystartsev> | should the room be more.. fun? |
16:39 | <rkirsling> | if you have to ask... ;) |
16:39 | <rkirsling> | (just kidding, I'm still making breakfast over here) |
16:39 | <drousso> | what does one do if one doesn't have a VR headset? :P |
16:40 | <bradleymeck> | drousso: you can just use WASD controls |
16:40 | <drousso> | nice! |
16:40 | <drousso> | is there a link? |
16:40 | <mpcsh> | reflector |
16:47 | <rkirsling> | there are some fun avatars in the Newest category hehe |
16:49 | <devsnek> | msaboff: |
16:49 | <devsnek> | oh oops |
16:55 | <robpalme> | over the next hour, folks are likely to request access details to dial in - in all cases please go to the Reflector link 275 (posted in the IRC channel subject) |
17:06 | <bradleymeck> | resolution 1x1 |
17:06 | <devsnek> | oh wow there are three pages of people |
17:07 | <ljharb> | it'd be great to keep chat to IRC, instead of zoom, during plenary stuff - it's hard enough to keep track of all the existing chat venues |
17:07 | <rkirsling> | yes please |
17:10 | <ljharb> | bterlson: heads up if you want to call on somebody, as the zoom host, you can unmute them (which asks them to confirm) but it gets their attention |
17:10 | <ystartsev> | there is also a stream of the zoom in the hub |
17:11 | <ljharb> | ystartsev: ah i didn't realize that, thanks |
17:11 | <Bakkot> | suggestion: introduce yourself the first time you talk |
17:11 | <caiolima> | lol for Keith's background |
17:11 | <rkirsling> | Bakkot: because that worked so well at JSConf EU :P |
17:11 | <rkirsling> | (half kidding, I think it will work better in plenary) |
17:12 | <Bakkot> | I wasn't there, so I will assume it worked perfectly |
17:12 | <rkirsling> | you didn't watch our panel on YouTube? 😱 |
17:12 | <rkirsling> | I'll link it in TDZ |
17:13 | <Bakkot> | I don't watch videos as a rule |
17:13 | <rkirsling> | that's...fair |
17:17 | <littledan> | akirose: If you're doing the schedule, i have my constraints updated here https://github.com/tc39/agendas/blob/master/2020/03.md#schedule-constraints |
17:17 | <littledan> | apologies for these being so last-minute |
17:17 | <akirose> | ty for letting me know |
17:20 | <robpalme> | We have four note-takers who have generously volunteered for this session ahead of time: Rick, Philip, Mark, and Jason! |
17:21 | <rkirsling> | 👏 |
17:21 | <littledan> | to clarify: invited experts only need to sign the IPR form *once ever*. No need to sign again for this meeting. |
17:27 | <devsnek> | akirose: are you using side-by-side mode? |
17:27 | <akirose> | i have no idea what that is |
17:28 | <devsnek> | oh to see rob at the same time as the slides |
17:28 | <akirose> | i have zoom full-screened and i'm flipping between it and the agenda schedule i'm working on |
17:28 | <ljharb> | devsnek: i see rob in the lower right corner, floating over the shared screen |
17:28 | <michaelficarra> | ooohh Oracle, that should be exciting |
17:28 | <devsnek> | aha full screen |
17:30 | <Bakkot> | would love to have graal people participating more |
17:38 | <bradleymeck> | i wonder how the trademark thing will go if they do join |
17:38 | <rkirsling> | bradleymeck: see tdz |
17:48 | <rwaldron> | bterlson the Test262 update is not on the TCQ agenda |
17:48 | <bterlson> | whoops I'll add it |
17:48 | <bterlson> | leo? |
17:48 | <bterlson> | doing it? or you for old times sake? :-P |
17:49 | <rwaldron> | Me |
17:55 | <rkirsling> | I wanted to blurt out "workin' on it" re JSC but didn't know whether that was a New Topic or what lol |
18:15 | <shu> | akirose: MylesBorins: bterlson: for the schedule, not sure if the agenda view on tcq is up-to-date, but please schedule the incubator call chartering to be sometime on the last day |
18:15 | <shu> | (as noted in the schedule constraints) |
18:15 | <MylesBorins> | I don't think it is up to date |
18:15 | <akirose> | it's not done yet, but i'll post the WIP schedule on the reflector |
18:16 | <shu> | ah okay, great |
18:17 | <rkirsling> | nice speediness there jridgewell :D |
18:17 | <jridgewell> | The tests aren't great… |
18:18 | <rkirsling> | yeah but officially nonzero |
18:18 | <jridgewell> | drousso might help me make more tests. |
18:18 | <rkirsling> | 👍 |
18:22 | <jackworks79> | test |
18:22 | <devsnek> | hello jackworks79 |
18:23 | <jackworks79> | is this a public IRC channel now? |
18:24 | <michaelficarra> | jackworks79: the public can view this IRC channel and read its logs, yes |
18:25 | <jridgewell> | jackworks79: Only delegates can message, though |
18:25 | <Bakkot> | #tc39 is the one which anyone can message |
18:25 | <Bakkot> | (which I'd encourage using most of the time, but not for discussing ongoing meeting stuff) |
18:29 | <akirose> | i need to apologize— Bakkot i'm asking you to go after lunch due to a scheduling constraint, maybe i'll ask s-h-u to do his PSA next instead. |
18:29 | <Bakkot> | yup seems fine |
18:29 | <akirose> | ty |
18:32 | <michaelficarra> | since when have we tried to prevent people from deadlocking themselves? |
18:33 | <devsnek> | the deadlock is that it will never wake up |
18:33 | <devsnek> | nothing can wake it |
18:35 | <michaelficarra> | while(1); |
18:36 | <Bakkot> | michaelficarra since we started trying to design a memory model usable by mortals |
18:36 | <Bakkot> | it is way less obvious when you've accidentally gotten into a deadlock with multithreading than when you just have an infinite loop, as a rule |
18:36 | <michaelficarra> | this was the single-threaded case though |
18:37 | <devsnek> | i think the design here is that you don't know if the buffer you're given is shared or not |
18:37 | <devsnek> | or more that you don't want to bother checking |
18:38 | <jackworks79> | one of my friends is doing this thing to block the main JS thread: |
18:38 | <michaelficarra> | devsnek: good point |
18:38 | <devsnek> | jridgewell: with those logical assignment tests v |
18:38 | <devsnek> | https://gc.gy/53384906.png |
18:38 | <devsnek> | 100% coverage of the runtime semantics! |
18:38 | <jackworks79> | with ({}) while (Atomics.load(...) === old) // blocking on the main thread |
18:39 | <jridgewell> | 😉 |
18:41 | <jackworks79> | he said "if you won't allow me to lock (Atomics.wait) on the main thread, I'll use a more hacky and stupid approach (a while loop checking a spin lock) to lock the main thread" |
18:41 | <devsnek> | someone's audio is dying |
18:42 | <apaprocki> | I muted brian |
18:42 | <msaboff> | ty |
18:42 | <apaprocki> | somehow I wound up with host privs so I used them :P |
18:42 | <msaboff> | drunk with power |
18:44 | <bradleymeck> | jackworks79: correct, but thats a spin lock instead of a full on sleep |
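
A minimal sketch of the spin-wait pattern being described, assuming a shared `Int32Array` slot that another thread would eventually update. The helper name `spinUntilChanged` and the `maxSpins` budget are hypothetical additions so that the loop always terminates; the pattern quoted above has no such bound.

```javascript
// Hypothetical helper: busy-waits on a shared slot instead of sleeping.
// Unlike Atomics.wait, this never suspends the thread; it burns CPU until
// the value changes or the spin budget is exhausted.
function spinUntilChanged(i32, index, old, maxSpins) {
  for (let spins = 0; spins < maxSpins; spins++) {
    if (Atomics.load(i32, index) !== old) {
      return true; // another agent (or earlier code) changed the slot
    }
  }
  return false; // gave up; the slot still holds `old`
}

const shared = new Int32Array(new SharedArrayBuffer(4));

// Nothing has written to the slot yet, so the budget runs out.
const before = spinUntilChanged(shared, 0, 0, 1000);

// Simulate the write that would normally come from another thread.
Atomics.store(shared, 0, 1);
const after = spinUntilChanged(shared, 0, 0, 1000);
```

`Atomics.wait(shared, 0, 0)` would put the thread to sleep instead, but it throws a `TypeError` on agents that are not allowed to block, which is what pushes people toward the loop.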
18:46 | <mathiasbynens> | 0 is the new NaN, folks |
18:46 | <jackworks79> | (he is making a remote sync DOM so he needs to force a block on the main thread to wait for the result from a remote DOM env) |
18:47 | <devsnek> | Symbol.for('es.no.waiters') |
18:47 | <Bakkot> | what an excellent way to burn users' battery |
18:48 | <devsnek> | should use an evented model |
18:49 | <jridgewell> | It's really difficult to provide a sync API over an async thread |
18:49 | <jridgewell> | So, making it evented would just break the user's code |
18:50 | <jridgewell> | AMP has the same issue with WorkerDOM |
18:50 | <jackworks79> | yeah, in workers you can use Atomics.wait to block the thread and make it fake sync |
18:50 | <devsnek> | make amp v2 |
18:51 | <mathiasbynens> | can y'all give more of a heads-up when you move agenda items around? previously it was communicated that named groups would happen before lunch |
18:51 | <brad4d> | I love that the "frozen" slide is constantly moving |
18:52 | <bradleymeck> | We also are looking at this wait behavior for instrumenting CJS in Node |
18:52 | <bradleymeck> | but we don't have [[CanWait]] set to false for anything |
18:52 | <michaelficarra> | brad4d: it's snowflakes though |
18:53 | <devsnek> | RIP v8 8.2 |
20:17 | <devsnek> | 👀 https://gc.gy/53390815.png |
20:17 | <devsnek> | https://gc.gy/53390839.png |
20:22 | <rkirsling> | agree with mathiasbynens 👍 |
20:24 | <rkirsling> | the whole `var y = { \u0066or: x } = { for: 42 };` is legal thing makes me cry too |
20:24 | <devsnek> | \u0066 in chat |
20:25 | <mathiasbynens> | Bakkot: you didn't mention that in non-u regexps, there's a more interesting case for /\u{FOO}/ if FOO consists of 0-9 only e.g. `123456`: then, it matches the literal character `u` repeated `123456` times. |
20:25 | <rkirsling> | (JSC has a ton of outstanding test262 failures about keywords-with-escapes as identifiers) |
20:25 | <mathiasbynens> | (#funfact but didn't want to waste committee time) |
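
The quantifier case mentioned above can be checked with a small count. This is a sketch using `3` in place of `123456`; the variable names are illustrative only.

```javascript
// Without the `u` flag, `\u` (not followed by four hex digits) is just the
// letter "u", and `{3}` is then read as an ordinary quantifier.
const legacy = /^\u{3}$/;

// With the `u` flag, `\u{3}` is a code point escape for U+0003.
const unicode = /^\u{3}$/u;

const results = {
  legacyMatchesThreeUs: legacy.test("uuu"),
  legacyMatchesCodePoint: legacy.test("\u0003"),
  unicodeMatchesCodePoint: unicode.test("\u0003"),
  unicodeMatchesThreeUs: unicode.test("uuu"),
};
```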
20:25 | <devsnek> | non-unicode regex has a lot of scary stuff |
20:25 | <rbuckton> | I'd like to go on record as being opposed to allowing non-IdentifierNames in named groups, as they conflict with some RegExp related proposals I'm putting together. |
20:26 | <devsnek> | gibson042: can't the programs be expressed with \u{} |
20:26 | <littledan> | I'm not convinced that Waldemar's asciifier should be a goal... it's possible to write, but would just be slightly more complicated |
20:27 | <rbuckton> | (since it sounds like my question won't make it before the queue is cut off) |
20:27 | <littledan> | so I don't understand why the goal needs to include that the asciifier is so simple |
20:28 | <mathiasbynens> | I maintain such an asciifier... |
20:28 | <devsnek> | what is an asciifier |
20:28 | <akirose> | if you quit and re-join you have to re-add your full name 🤦🏻‍♀️ |
20:28 | <devsnek> | turning things outside ascii range into escapes? |
20:28 | <rkirsling> | akirose: yeah I can't figure out how to correct this :-/ |
20:29 | <shu> | mathiasbynens: waldemar's contention is that you can't convert a non-unicode regex into a unicode regex in general |
20:29 | <shu> | mathiasbynens: i don't know enough about regexps to say, is that actually true? |
20:29 | <akirose> | i had to find my own face in the gallery and choose "rename" from the context menu |
20:29 | <jackworks79> | I 口 Unicode |
20:30 | <mathiasbynens> | gibson042: can you give an example? |
20:31 | <mathiasbynens> | aah this is what i was missing earlier. we could totally make \u{...} work specifically for group names. i missed that gibson042 wouldn't want that |
20:32 | <rbuckton> | The reason `\u{1d49c}` isn't supported without the `u` flag is for back compat, however there's no back-compat concern to allow them without the `u` flag in a named capture group, as named capture groups were new syntax so there is no back-compat hazard. |
20:32 | <robpalme> | please really consider if you must be on the queue - we have to end this topic in 6 mins |
20:32 | <mathiasbynens> | rbuckton: exactly |
20:32 | <michaelficarra> | I don't think this is getting resolved in the next 10 minutes |
20:33 | <rbuckton> | The main reason *not* to allow them would be confusion due to inconsistency. |
20:33 | <michaelficarra> | jackworks: your pseudo-tofu bothers me so much more than real tofu |
20:33 | <mathiasbynens> | there's gonna be inconsistency in any case. we have to pick which inconsistency we want to live with |
20:33 | <devsnek> | are humans going to be confused by this though |
20:33 | <devsnek> | this is all tooling output |
20:33 | <mathiasbynens> | i think the inconsistency between pattern + match.groups.IDENTIFIER is what matters most |
20:34 | <rkirsling> | michaelficarra: lol I thought this too 😂 |
20:34 | <devsnek> | +1 to mathias |
20:36 | <jridgewell> | Then we could just allow any key name? |
20:37 | <jackworks79> | use ['for'] syntax any key name is already allowed imo 🤣 |
20:38 | <akirose> | 2 minute warning |
20:41 | <ljharb> | gibson042: isn't that what you objected to? |
20:41 | <gibson042> | what is the "this" here? |
20:42 | <rkirsling> | I get the urgency to resolve this matter but it feels really rushed given the temp of the room |
20:42 | <gibson042> | I think \u{…} should have identical treatment in a non-Unicode regex regardless of its use for matching vs. capture-group naming |
20:42 | <rwaldron> | akirose I would like to participate in "Make SharedArrayBuffer optional", but I also have to go at 5pm (hard stop, child care) |
20:42 | <msaboff> | We could come back to it with a longer time box, but it seems to be blocking to adopting ES 2020 |
20:42 | <gibson042> | i.e., equivalence with "u{…}" |
20:42 | <rwaldron> | If shu doesn't mind, could we do that tomorrow? |
20:43 | <shu> | i'm flexible for my items |
20:43 | <rwaldron> | shu I appreciate that |
20:43 | <ljharb> | gibson042: right, what kevin and waldemar were suggesting is, that `\u{1234}` would mean *that character* in both u and non-u regexes, not a literal "u{1234}", if i understand correctly |
20:44 | <ljharb> | gibson042: and i thought your position was, that in a non-u regex, `\u{1234}` must mean `u{1234}` |
20:44 | <shu> | i don't think we're talking about \u{nnn} at all...? |
20:44 | <rwaldron> | akirose shu the only other item that I want to participate in is "Atomics.waitAsync error rejection PR", but presumably that wont be reached until tomorrow or Thursday anyway |
20:44 | <gibson042> | ah, I think you have misunderstood Waldemar at least |
20:44 | <ljharb> | ah k |
20:44 | <ljharb> | i'm asking about the current slide |
20:44 | <shu> | aren't we talking about the surrogate pair syntax only? |
20:44 | <shu> | rwaldron: yeah, if at all, it's a late addition so i expect it to be at the end of the meeting |
20:44 | <ljharb> | shu: the question's the same tho |
20:44 | <msaboff> | In all of this, there is the other part of capture group names, the first "character" needs to have the Identifier_Start property and subsequent "characters" have the property Identifier_Continue. |
20:44 | <shu> | ljharb: it is? |
20:44 | <ljharb> | in a non-u regex, `\u` means `u` |
20:44 | <rwaldron> | shu that's my expectation as well |
20:45 | <michaelficarra> | there's still a bunch of people who didn't fix their display name |
20:45 | <ljharb> | my understanding of gibson042's position was that that should remain true in named capture groups |
20:45 | <ljharb> | shu: ^ |
20:45 | <ljharb> | and my understanding of waldemar and kevin's preference was to make it be an actual character escape in non-u regexes (as well, ofc, as in u regexes) |
20:45 | <ljharb> | did i misunderstand? |
20:47 | <gibson042> | in order to represent in ASCII a regular expression with a non-BMP capture group name, it is necessary to allow *at least one of* `/(?<\ud835\udc9c>.)/` with surrogate-pair semantics or `/(?<\u{1d49c}>.)/` with code point semantics. I am against the latter because of inconsistency with \u{1d49c} in non-Unicode regexes outside of capture groups. |
20:47 | <gibson042> | and I believe that is also Waldemar's position |
20:47 | <ljharb> | ahhh ok |
20:47 | <ljharb> | so you want the non-curly surrogate pair syntax to mean "the char" but you want the curly form to be illegal (in a non-u regex)? |
20:47 | <gibson042> | I'm less sure about Kevin, but I think that matches as well |
20:47 | <devsnek> | why is ?<...> not always the ID_Identifier semantics |
20:48 | <devsnek> | er |
20:48 | <devsnek> | ID_... semantics |
20:48 | <ljharb> | gibson042: did my paraphrase make sense? |
20:48 | <devsnek> | i think i agree with shane that the thing in the arrows should always be an identifier |
20:49 | <rkirsling> | akirose: well put |
20:50 | <keith_miller> | I'm with Shane on this one; I think it should just be an identifier production |
20:50 | <keith_miller> | IIUC |
20:50 | <devsnek> | that would allow \u{} |
20:50 | <keith_miller> | correct |
20:51 | <devsnek> | 👍🏻 |
20:51 | <ljharb> | keith_miller: even in a non-unicode regex? |
20:51 | <devsnek> | yes |
20:51 | <jridgewell> | I still don't understand Waldemar's ASCIIfier |
20:51 | <keith_miller> | It's not observable there? |
20:51 | <rbuckton> | My internet connection just died, I will try to rejoin shortly. |
20:51 | <jridgewell> | You must be aware of the RegExp context |
20:51 | <msaboff> | I think that identifier production should be the same for both non-Unicode and Unicode RegExp's. |
20:51 | <keith_miller> | because it's not actually used in part of the regexp |
20:51 | <keith_miller> | ljharb:^ |
20:51 | <jridgewell> | Because if you naively changed the pretty A into a `\u{CODE}` in a non-unicode regex, it'd break the regex. |
20:52 | <keith_miller> | It's essentially a comment? |
20:52 | <ljharb> | keith_miller: ah ok, so the context is different for you between "the regex pattern itself" and "the annotation of the capture group" |
20:52 | <ljharb> | gibson042: thoughts on ^ ? |
20:52 | <jridgewell> | And if we allow any "name" there, why not allow _any_ name? |
20:52 | <devsnek> | i think shane put it nicely with the template example |
20:52 | <Bakkot> | jridgewell you can just change the pretty A into the two surrogate halves and it will work in both cases |
20:52 | <devsnek> | ${...} is like (<...> |
20:52 | <keith_miller> | ljharb: yeah |
20:52 | <devsnek> | er (?<...> |
20:53 | <Bakkot> | jridgewell the main reason not to allow any name is because it can conflict with numeric (non-named) matches, which isn't currently terrible but gets weird with some potential other features |
20:53 | <jridgewell> | Bringing up Waldemar's funny behavior `\u123\u456*` isn't the same as `\u{12345}*` |
20:53 | <msaboff> | ljharb I agree with keith_miller. Capture group names should be treated separately to the RegExp's pattern |
20:53 | <jridgewell> | So a naive ASCIIfier always needs to parse the regex. |
20:53 | <Bakkot> | shane, are you in IRC? I don't know your handle |
20:53 | <devsnek> | sffc: |
20:53 | <jridgewell> | It's just not possible to do otherwise. |
20:54 | <Bakkot> | jridgewell but `\u123\u456*` is the same as `A*` |
20:54 | <Bakkot> | in both unicode and non-unicode regexes |
20:54 | <Bakkot> | (I think...) |
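
A small sketch for checking the claim above with real surrogate halves (the `\u123\u456` in the messages is shorthand). It compares the raw character against its two escaped halves under a `*` quantifier, with and without the `u` flag; the helper name `matchesAll` is illustrative only.

```javascript
const A = "\u{1d49c}";  // the "pretty A", U+1D49C, two UTF-16 code units
const text = A + A;     // four code units in total

// Hypothetical helper: does `^<atom>*$` match `text` under the given flags?
function matchesAll(atomSource, flags) {
  return new RegExp("^" + atomSource + "*$", flags).test(text);
}

const rawLegacy = matchesAll(A, "");
const escapedLegacy = matchesAll("\\ud835\\udc9c", "");
const rawUnicode = matchesAll(A, "u");
const escapedUnicode = matchesAll("\\ud835\\udc9c", "u");
const bracedUnicode = matchesAll("\\u{1d49c}", "u");
```

Without the `u` flag the quantifier binds only to the trailing surrogate, for the raw character and for the escaped halves alike, so the two spellings agree with each other in each mode even though the two modes disagree.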
20:54 | <gibson042> | I am against `/(?<\u{1d49c}>\u{1d49c})/` returning a group named "𝒜" with value "u{1d49c}", which is what the "apply code point semantics specifically when naming capture groups" implies |
20:54 | <jridgewell> | But we can't use the unicode codepoint anymore |
20:54 | <ljharb> | gibson042: ok - the thing you're against seems to be what a number of people are settling on |
20:54 | <devsnek> | i am okay with the example gibson042 just sent |
20:55 | <sffc> | hi. ok, so I understand the desire to represent regexes as strings, in which case the thing inside (?<...>) is interpreted as a string. In that situation, though, then (?<0>) should produce a capture group named with the string "0". |
20:55 | <Bakkot> | gibson042 there's a mandatory `>` after the group name, so you can't have that particular case |
20:55 | <gibson042> | having different semantics for "\u{…}" sequences based on where they appear in the regex is just too much cognitive burden for too little gain IMO |
20:55 | <jridgewell> | gibson042: Won't allowing the surrogate code point do exactly that? |
20:55 | <rkirsling> | yeah +1 to consistency about "what an identifier is" from me |
20:55 | <gibson042> | Bakkot: it's there |
20:56 | <Bakkot> | oh, sorry, I can't read |
20:56 | <Bakkot> | yup |
20:56 | <msaboff> | Bakkot Your slides said that let \ud835\udc9c; is not valid, but let \u{1d49c}; is. Shouldn't RegExp capture names be the same? |
20:56 | <michaelficarra> | we're talking about IdentifierNames, right? not Identifier? |
20:57 | <rkirsling> | separate question: has anyone ever suggested making `let \ud835\udc9c;` valid? |
20:57 | <devsnek> | michaelficarra: yes |
20:57 | <michaelficarra> | I *really* hope we wouldn't apply ReservedWord restrictions to named capture groups |
20:57 | <msaboff> | That is my understanding. |
20:57 | <michaelficarra> | rkirsling: I'm actually not sure why that's invalid |
20:57 | <michaelficarra> | I can't figure it out |
20:58 | <Bakkot> | michaelficarra because `\ud835` is not ID_Start |
20:58 | <michaelficarra> | remember that `let \u0065` is valid, so why wouldn't surrogate halves be valid? |
20:58 | <michaelficarra> | oh really, it checks the first code unit for ID_Start? |
20:58 | <michaelficarra> | okay then |
20:59 | <rkirsling> | alright that's fair |
20:59 | <michaelficarra> | btw where should we be having the priority discussion? |
20:59 | <michaelficarra> | some people felt that it was unacceptable for the spec to go out with known incoherencies, but I think it's fine |
21:00 | <msaboff> | Bakkot If that is the case, don't we want named capture group name syntax to be the same as identifiers? |
21:00 | <msaboff> | I actually think that if we allow \u{}, we should also allow \u\u |
21:01 | <devsnek> | that would be iffy |
21:01 | <devsnek> | \u\u isn't disallowed btw |
21:01 | <devsnek> | its just that the first one isn't a valid identifier start |
21:01 | <sffc> | Reading the notes, I think WH's point was that anything represented in `/.../` syntax should also be able to be represented in `new RegExp("...")` syntax |
21:02 | <sffc> | and vice versa |
21:02 | <rwaldron> | akirose shu I have to go now, but I think this proposal is ok |
21:02 | <rwaldron> | See you all tomorrow. |
21:02 | <devsnek> | 👋🏻 |
21:02 | <msaboff> | I haven't looked at the spec or our parser to see whether, if a valid first surrogate \u escape is followed by a valid second surrogate, it is supposed to be treated as a single Unicode code point. |
21:02 | <akirose> | ty rwaldron |
21:03 | <devsnek> | msaboff: https://tc39.es/ecma262/#sec-identifier-names-static-semantics-early-errors |
21:04 | <msaboff> | Thanks, looking... |
21:04 | <devsnek> | the first rule |
21:04 | <gibson042> | it's stronger than that... anything representable in `/…/` should be representable without using any code unit outside of 0x20 through 0x7E |
21:04 | <devsnek> | It is a Syntax Error if the SV of UnicodeEscapeSequence is none of "$", or "_", or the UTF16Encoding of a code point matched by the UnicodeIDStart lexical grammar production. |
21:04 | <Bakkot> | msaboff y'alls parser has a bunch of problems with non-BMP in general |
21:04 | <Bakkot> | doesn't even allow `let 𝒜` |
21:04 | <devsnek> | we'd have to modify our grammar to perform utf16 decoding on identifiers |
21:04 | <devsnek> | just typing that gives me shivers |
21:06 | <msaboff> | devsnek That says that \uHHHH is for BMP characters only. |
21:07 | <devsnek> | what does |
21:08 | <msaboff> | The spec only allows for one escape for an IdentifierStart or IdentifierPart. That is only one \uHHHH escape and not two together |
21:08 | <devsnek> | right |
21:09 | <msaboff> | If you want a non-BMP codepoint, you have to use \u{} |
21:09 | <devsnek> | right |
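
The identifier rules summarised above can be probed from script. This is a sketch that assumes `new Function` is available for parsing and that the engine handles non-BMP identifiers; the helper name `parses` is hypothetical.

```javascript
// Hypothetical helper: reports whether `source` parses as a function body.
function parses(source) {
  try {
    new Function(source);
    return true;
  } catch (error) {
    if (error instanceof SyntaxError) return false;
    throw error; // anything else is unexpected
  }
}

// A single BMP escape for an ID_Start character is fine ("e").
const bmpEscape = parses("let \\u0065;");

// A braced code point escape can name a non-BMP ID_Start character.
const bracedEscape = parses("let \\u{1d49c};");

// Two \uHHHH escapes are checked one at a time, and a lone lead surrogate
// is not ID_Start, so this is an early error.
const surrogatePair = parses("let \\ud835\\udc9c;");
```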
21:09 | <msaboff> | Given we aren't likely to change this for identifiers, I think named capture group identifiers should follow the same rules. |
21:10 | <msaboff> | This would include non-Unicode RegExps |
21:10 | <devsnek> | indeed |
21:10 | <michaelficarra> | I think we should change it for identifiers |
21:11 | <Bakkot> | with the effect that `/(?<\u{1d49c}>\u{1d49c})/` returns a group named "𝒜" with value "u{1d49c}", as gibson042 pointed out above? |
21:11 | <Bakkot> | msaboff ^ |
21:11 | <sffc> | Can someone explain the argument for allowing \u\u in capturing groups? |
21:11 | <devsnek> | sffc: that code example that bakkot just posted |
21:11 | <Bakkot> | devsnek sffc I will write it out; it is not that code example |
21:12 | <msaboff> | Bakkot Yes it would. |
21:12 | <devsnek> | it isn't? |
21:12 | <devsnek> | i thought the entire argument was that the example shouldn't be allowed |
21:12 | <devsnek> | so you'd have to use surrogate pairs |
21:12 | <Bakkot> | devsnek oh I guess that's part of it |
21:12 | <Bakkot> | let me write it out. |
21:13 | <bradleymeck> | shu: found the comment about different alloc trade offs https://freenode.logbot.info/tc39/20200219#c3271588-c3271590 that i was curious about, just a note no action |
21:13 | <sffc> | I don't see what Bakkot's example has to do with `(?<\u\u>)`; the question of what to do with `(?<\u{}>)` is a separate question |
21:14 | <devsnek> | this makes me want to buy facerig now |
21:14 | <Bakkot> | sffc the major argument is, currently if you are turning a JS source text into ascii you can do that for regexes by replacing any non-BMP with two escaped surrogate halves. if we disallow `\u\u`, now you have to actually parse it. and if you can't use `(?<\u{1d49c}>` in non-unicode regexes, then you can't do the thing you are trying to do at all; some people think you should not be able to use that (this is the relevance of my |
21:14 | <Bakkot> | previous example). |
21:15 | <devsnek> | how are they just replacing everything with surrogate pairs |
21:15 | <devsnek> | those aren't valid identifiers |
21:15 | <Bakkot> | "for regexes" |
21:15 | <shu> | bradleymeck: yeah, that's the buffer allocated if you go through Wasm.Memory |
21:15 | <devsnek> | ohhh just for regexes |
21:15 | <shu> | bradleymeck: once you get the SAB constructor back out, that follows JS rules (no rounding, page boundaries, whatever) |
21:17 | <sffc> | Bakkot: so the problem is only for non-unicode regexes. But we already don't allow Unicode identifiers as capture group names in non-unicode regexes, according to the last slide in the presentation. |
21:17 | <Bakkot> | the current state is incoherent. |
21:18 | <Bakkot> | we neither allow nor disallow; I apologize that my slides suggested otherwise. |
21:19 | <sffc> | It makes sense if `/(?<\u{1d49c}>\u{1d49c})/` would have the behavior you suggested earlier (returns a group named "𝒜" with value "u{1d49c}"). Maybe it's a little weird, but it's well-defined. |
21:19 | <msaboff> | Bakkot Would you be fine with allowing \u{} in a named capture group ID for non-unicode RegExps? |
21:20 | <sffc> | Alternatively, it would make sense if you just forbid all non-BMP identifiers in non-unicode RegExp capture group names, in which case your example would be a compile error. |
21:20 | <gibson042> | you can't do that; `/(?<𝒜>.)/` is already valid |
21:21 | <Bakkot> | gibson042 this feature is not widely enough used for us to worry about back compat, I think |
21:21 | <gibson042> | ok, fair enough |
21:21 | <msaboff> | I think we can separate what is valid for a capture group identifier and what is matched by the RegExp |
21:21 | <Bakkot> | msaboff I would (actually my assumed it was uncontroversial that they should be allowed); other people have said they would object to that, though. |
21:22 | <Bakkot> | *actually my presentation assumed |
21:22 | <michaelficarra> | reminder: please add your company name to your Zoom display name |
21:22 | <devsnek> | private fields aren't in the spec btw |
21:23 | <ljharb> | right, this is about adding it to the stage 3 private fields spec |
21:23 | <devsnek> | mhm |
21:23 | <msaboff> | Bakkot And then would you also support NOT allowing \uHHHH\uHHHH as a surrogate pair as a named capture group ID. It could be valid if each escape is a valid ID start / ID part. |
21:24 | <devsnek> | can't we just use IdentifierName |
21:25 | <gibson042> | people arguing for `/(?<\u{1d49c}>.)/` to be equivalent to `/(?<𝒜>.)/` while `/\u{1d49c}/` is NOT equivalent to `/𝒜/`... what's the benefit? |
21:25 | <jridgewell> | `foo?.bar.#baz` => `foo == null ? undefined : foo.bar.#baz` |
21:25 | <Bakkot> | msaboff I lean towards the other side of that question. I think approximately zero humans ever write or read code containing unicode escapes in named capture groups, so it makes sense to make things as easy as possible for tooling responsible for generating it. Allowing escaped surrogate halves here makes it easy for tooling. |
21:26 | <devsnek> | gibson042: that we use a consistent identifier everywhere |
21:26 | <mathiasbynens> | gibson042: what would Waldemar's hypothetical ASCIIfier return for the regexp-stored-as-string '[<U+0000>-<U+10FFFF>]', with the condition that it doesn't know whether the target RegExp will have the u flag or not? |
21:26 | <msaboff> | I get that, but does it make sense to have different grammar rules for identifiers in the language versus named capture group identifiers? |
21:26 | <mathiasbynens> | gibson042: if you know you end up in a `u` regexp, you can output `/\0-\u{10FFFF}/u` and call it a day, but that pattern wouldn't work in non-`u` |
21:26 | <Bakkot> | mathiasbynens you can replace those with the two surrogate halves and it will preserve semantics in both cases, I am almost certain. |
21:27 | <mathiasbynens> | Bakkot: well no you'd break the range in the non-u case |
21:27 | <Bakkot> | how sure are you? |
21:27 | <mathiasbynens> | Bakkot: 100% |
21:27 | <mathiasbynens> | Bakkot: since you're now creating a range between U+0000 and highSurrogate(U+10FFFF) |
21:28 | <msaboff> | Bakkot I don't think so. In mathiasbynens example, the RegExp is quite different with and without /u |
21:28 | <Bakkot> | mathiasbynens "since you're now creating"... in the non-u case? |
21:28 | <mathiasbynens> | and then trailSurrogate(U+10FFFF) is a lone character in the character class... |
21:28 | <Bakkot> | are you sure that's not what you already had? |
21:28 | <Bakkot> | I think that's what you already had |
21:28 | <mathiasbynens> | Bakkot: it isn't, is what i'm saying |
21:29 | <ljharb> | gus's point here is really good. |
21:30 | <devsnek> | :D |
21:30 | jridgewell | reverts PR to original state |
21:30 | <devsnek> | lol |
21:30 | <rkirsling> | 👏 |
21:31 | <Bakkot> | mathiasbynens I am pretty sure that's what you already had, at least in real engines |
21:31 | <Bakkot> | `/^[𝒜]$/.test('\ud835')` is "true" |
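The behaviour cited above can be checked directly. This is a minimal sketch, assuming a current engine; the variable names are illustrative and not from the discussion:

```javascript
// Without the `u` flag, a non-BMP character inside a character class is
// treated as two separate UTF-16 code units rather than one code point.
const nonUnicodeClass = /^[𝒜]$/;
const unicodeClass = /^[𝒜]$/u;

const classResults = {
  nonULead: nonUnicodeClass.test('\ud835'),  // lead surrogate on its own
  nonUTrail: nonUnicodeClass.test('\udc9c'), // trail surrogate on its own
  nonUWhole: nonUnicodeClass.test('𝒜'),      // two code units, one class position
  uLead: unicodeClass.test('\ud835'),
  uWhole: unicodeClass.test('𝒜'),
};
console.log(classResults);
```

The non-`u` class matches either surrogate half alone but not the full character; the `u` class does the opposite.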
21:32 | <devsnek> | i think bmeck suggested we should set up specifying optional chaining and member expressions using some sort of macro syntax |
21:32 | <mathiasbynens> | Bakkot: I'm not talking about engines? |
21:32 | <devsnek> | i'd be in favor of doing such a thing |
21:32 | <Bakkot> | mathiasbynens I think also per spec, sorry |
21:32 | <sffc> | I think the main question for me is how `...` in `(?<...>)` is interpreted. If it's an identifier, then `\u\u` should be disallowed. If it's a string, then `0` should be allowed. |
21:32 | <mathiasbynens> | Bakkot: I'm talking about neither of those things :/ |
21:32 | <Bakkot> | mathiasbynens sorry, I am confused then |
21:32 | <msaboff> | Bakkot If mathiasbynens range was written with the last codepoint in the range as two \u escapes, do you agree with what he says the range becomes? |
21:33 | <rkirsling> | lol, "@jridgewell pushed 0 commits." |
21:33 | <Bakkot> | msaboff yes, my point is that it was already that thing to start with, I'm pretty sure |
21:33 | <mathiasbynens> | Bakkot: waldemar described an asciifier that takes a regexp-stored-as-string and turns it into an actual piece of sourcetext representing a regexp literal, WITHOUT knowing whether that literal will get a `u` flag or not |
21:33 | <msaboff> | With the /u flag it is completely different. |
21:33 | <Bakkot> | mathiasbynens yes? |
21:33 | <shu> | the consensus to the optional chaining for hash names was allow everywhere, right? |
21:34 | <Bakkot> | msaboff it only becomes that thing in non-unicode regexes |
21:34 | <shu> | (i'm confused by the second sentence in the consensus in the notes) |
21:34 | <devsnek> | shu: yes |
21:34 | <jridgewell> | I think I pushed -2 commits. |
21:34 | <Bakkot> | in unicode regexes, it is still the single range |
21:34 | <msaboff> | I think we agree |
21:34 | <Bakkot> | let me write out the four cases here |
21:34 | <mathiasbynens> | Bakkot: that's my point. the asciifier already needs to either a) know whether or not it gets the u flag or b) go out of its way to produce polyglot patterns that work properly in either case |
21:35 | <Bakkot> | mathiasbynens can it not just unconditionally put the two surrogate halves? |
21:35 | <Bakkot> | that is what I am having trouble with. |
21:35 | <Bakkot> | if not, why not? |
21:35 | <Bakkot> | what case does that break, and why does it break it? |
21:35 | <sffc> | So is waldemar's point that both `/(?<𝒜>/` and `/(?<𝒜>/u` asciify to the same thing? |
21:35 | <msaboff> | I think that an asciifier must know if the u flag is present. |
21:35 | <sffc> | (I'm missing a `)` in those examples) |
21:36 | <mathiasbynens> | that would behave differently with u vs non-u |
21:36 | <rkirsling> | yeah, are we really saying that there's currently ALWAYS a way to write a regexp-as-string without knowing whether it's unicode? |
21:36 | <Bakkot> | mathiasbynens _what_ would behave differently with `u` vs non-`u`? |
21:36 | <rkirsling> | like, why would that invariant ever exist? |
21:36 | <mathiasbynens> | Bakkot: another example: `'[💩-💫]'` |
21:36 | <mathiasbynens> | Bakkot: what would you output that works in both `u` and non-`u`? |
21:37 | <Bakkot> | that's never a legal non-u regex, so you don't have to worry about it |
21:37 | <mathiasbynens> | Bakkot: you're changing the goalposts though, the asciifier needs to produce output that's valid for either, since it doesn't know! |
21:37 | <mathiasbynens> | https://mathiasbynens.be/notes/es6-unicode-regex is full of examples |
21:38 | <msaboff> | I think we are heading into the weeds here. Let's focus on the named capture group IDs. |
21:38 | <gibson042> | yes, please |
21:38 | <Bakkot> | mathiasbynens outputting '[\uXXX\uXXX-\uXXX\uXXX]' will preserve the semantics: in the non-u case it is (still) an error, and in the u case it is (still) a single range |
21:38 | <Bakkot> | so your asciifier has preserved the semantics |
21:38 | <Bakkot> | which is what it needed to do |
21:38 | <Bakkot> | (sorry, four Xs, obviously) |
21:39 | <sffc> | When you're not in a named capture group, your asciifyer can output `\u\u`. I think that's fine. |
21:39 | <ljharb> | littledan: github has a "template" feature; you don't need to fork it, you *should* click the "use this template" button |
21:39 | <msaboff> | There is the interesting case of a RegExp with a NCG ID with non-BMP characters, but I don't think that is too controversial |
21:39 | <mathiasbynens> | the whole asciifier argument doesn't make sense. it's possible to produce patterns that work in either case, but it needs some work. capture group IDs would not be unique |
21:39 | <ljharb> | littledan: as opposed to making a totally disconnected repo |
21:39 | <Bakkot> | mathiasbynens... what? |
21:40 | <littledan> | ljharb: Oh cool. Could we point people to this? |
21:40 | <ljharb> | littledan: it's a big green button on the template repo |
21:40 | <gibson042> | repeating myself: in order to represent in ASCII a regular expression with a non-BMP capture group name, it is necessary to allow *at least one of* `/(?<\ud835\udc9c>.)/` with surrogate-pair semantics or `/(?<\u{1d49c}>.)/` with code point semantics. I am against the latter because of inconsistency with \u{1d49c} in non-Unicode regexes outside of capture groups. |
21:40 | <ljharb> | littledan: https://github.com/tc39/template-for-proposals |
21:40 | <littledan> | heh yeah that's really clear |
21:40 | <littledan> | sorry |
21:40 | <Bakkot> | mathiasbynens I am confused. my point is, currently you can write a regex asciifier which preserves semantics easily. if you have to parse it and treat named capture groups and non-named-capture groups, it is now harder. do you disagree with either of those two sentences? if so, which and why? |
21:41 | <ljharb> | littledan: np, the feature didn't exist when i first made the template |
21:41 | <littledan> | hmm, should we tell people to use that button in the #create-your-proposal-repo section? |
21:41 | <ljharb> | ideally, but i wasn't sure the committee had consensus on recommending that template yet |
21:41 | <Bakkot> | gibson042 other people are in favor of allowing `/(?<\u{1d49c}>.)/`, it sounds like, and damn the inconsistency (which seems fair enough to me; no human will ever write that code so the inconsistency isn't really a problem) |
21:41 | <sffc> | gibson042: I don't care about the inconsistency you mention. I find it more inconsistent that we are introducing a context in which we allow surrogate pairs as identifier names. |
21:42 | <mathiasbynens> | Bakkot: "easily"? you got the `<U+0>-<U+10FFFF>` wrong |
21:42 | <msaboff> | gibson042 The problem is that \ud835\udc9c is not valid for a JS identifier but \u{1d49c} is. |
21:42 | <sffc> | So I'm +1 to "damn the inconsistency" |
21:42 | <Bakkot> | mathiasbynens I still do not understand how I got it wrong. |
21:42 | <mathiasbynens> | Bakkot: what would you output? |
21:42 | <Bakkot> | mathiasbynens please give an example of a regex where the output of my algorithm does not have the same semantics as the input. |
21:42 | <msaboff> | So you want different syntax for specifying an ID in the two contexts |
21:43 | <Bakkot> | mathiasbynens '[\uXXXX\uXXXX-\uXXXX\uXXXX]' |
21:43 | <mathiasbynens> | Bakkot: what would \uXXXX\uXXXX look like for U+0000 exactly? |
21:43 | <Bakkot> | mathiasbynens sorry, yes, for BMP code points it would just be `\uXXXX`, of course |
21:43 | <Bakkot> | so, `\u0000` |
21:44 | <mathiasbynens> | that's not a working regexp :/ |
21:44 | <mathiasbynens> | "where the output of my algorithm does not have the same semantics as the input" is key |
21:44 | <Bakkot> | mathiasbynens _neither was the input_ |
21:44 | <Bakkot> | so the semantics are preserved |
21:44 | <mathiasbynens> | Waldemar is saying his asciifier doesn't know which output flags are used |
21:44 | <Bakkot> | or, wait, hang on |
21:45 | <Bakkot> | wait why isn't it a working regex |
21:45 | <mathiasbynens> | the input _is_ valid |
21:45 | <Bakkot> | I keep getting confused between this and the '[💩-💫]' case |
21:45 | <Bakkot> | mathiasbynens why isn't it a working regex |
21:45 | <sffc> | Do we have agreement on allowing `/(?<\u{1d49c}>\u{1d49c})/` with the inconsistency about `\u{1d49c}` being interpreted differently in the capture group versus the main regex? |
21:45 | <robpalme> | back in 10 mins! |
21:45 | <Bakkot> | sffc gibson042 explicitly objected to that |
21:46 | <gibson042> | as did Waldemar, and probably more strongly than me if we're being honest |
21:46 | <msaboff> | sffc I'm fine with that |
21:46 | <rkirsling> | that's a pretty strong point of contention among the committee then :( |
21:46 | <mathiasbynens> | Bakkot: so you'd do something like [\0-\uLEAD\uTRAIL], which creates a range between U+0000 and U+LEAD, and then adds U+TRAIL as a lone character |
21:46 | <Bakkot> | mathiasbynens right, which is what your input regex did. |
21:47 | <mathiasbynens> | Bakkot: no, the input is a string, which represents a regex, per Waldemar's description |
21:47 | <sffc> | The alternative from my perspective is if we allow `/(?<0>.)/` and interpret "0" as a string, such that you can do `.groups["0"]` |
21:47 | <Bakkot> | or rather: right, that's what it does in the non-u case, which is what the input regex did in the non-u case. in the u case it creates a single range, which is what the input regex did in the u case. |
21:47 | <mathiasbynens> | Bakkot: you cannot say that's what the input did, because you cannot know this without knowing whether it's `u` vs non-`u` |
21:47 | <mathiasbynens> | which this supposed asciifier doesn't |
21:48 | <mathiasbynens> | so you cannot produce a broken pattern, you have to make something that works |
21:48 | <Bakkot> | mathiasbynens the job of the asciifier is to preserve the semantics. that is its only job. if the input was going to be used with `u`, the semantics are preserved. if the input was going to be used without `u`, the semantics are preserved. so the semantics are preserved either way. |
21:49 | <Bakkot> | do you disagree about the job of the asciifier, or do you disagree that in both branches the semantics are the same? |
21:49 | <msaboff> | What would an asciifier do with |
21:49 | <msaboff> | let s = "[\0-<U+10FFFF>]"; |
21:49 | <msaboff> | r = new RegExp(s, "u") |
21:49 | <mathiasbynens> | ^ |
21:49 | <msaboff> | Where the <U+10FFFF> is the actual character |
21:49 | <Bakkot> | msaboff `let s = [\u0000-\uTRAIL\uLEAD]"` |
21:50 | <Bakkot> | r then has the same semantics |
21:50 | <mathiasbynens> | it does not lol |
21:50 | <sffc> | Can someone address my question about whether the capture group is a string, an identifier, or something special? My understanding is that if we go with what gibson042 and waldemar prefer, then we're introducing a new context where surrogate pairs are allowed as identifiers, but other strings are not. |
21:51 | <msaboff> | sffc: it (should be) an identifier |
21:51 | <mathiasbynens> | Bakkot: oh you meant a leading quote there |
21:51 | <gibson042> | mathiasbynens: Bakkot's point is that for both Unicode and non-Unicode regular expressions, `[\0-\ud835\udc9c]` is equivalent to `[\0-𝒜]` so that contextual awareness is irrelevant |
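The equivalence claimed here can be spot-checked by compiling both spellings with and without the `u` flag. This is a sketch only: the sample inputs are an arbitrary selection and `sameBehaviour` is a hypothetical helper, so it illustrates the claim rather than proving it.

```javascript
// Build each pattern from a string so the same text can be compiled
// both with and without the `u` flag.
const escapedPattern = '^[\\0-\\ud835\\udc9c]$'; // two escaped surrogate halves
const rawPattern = '^[\\0-𝒜]$';                  // the raw non-BMP character
const samples = ['a', '\ud835', '\udc9c', '\ud836', '𝒜', '\u{1d49d}'];

// True when both spellings agree on every sample under the given flags.
function sameBehaviour(flags) {
  const fromEscapes = new RegExp(escapedPattern, flags);
  const fromRaw = new RegExp(rawPattern, flags);
  return samples.every((s) => fromEscapes.test(s) === fromRaw.test(s));
}

console.log(sameBehaviour(''), sameBehaviour('u'));
```

The two flag settings still differ from each other (the `u` version matches '𝒜' as a whole, the non-`u` version does not); the point is only that each spelling means the same thing under a given flag.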
21:52 | <Bakkot> | mathiasbynens yes, leading quote, of course |
21:52 | <Bakkot> | yeah what gibson042 said |
21:52 | <mathiasbynens> | ok the disconnect is, i've been thinking of an asciifier that's like a JS function that accepts a string |
21:53 | <mathiasbynens> | whereas you see it as a tool operating on the source code |
21:53 | <sffc> | gibson042: if the capture group name is an identifier, then how do you justify the inconsistency of allowing `\u\u` in this context but not in a `let \u\u` context? |
21:54 | <gibson042> | I'd prefer it in both, but would justify the inconsistency by pointing out that this is an encoding inside a literal |
21:54 | <mathiasbynens> | if that's what you're doing, you could just transform any such group names globally, right? |
21:54 | <mathiasbynens> | much like variable name minification |
21:54 | <mathiasbynens> | but not without changing potentially observable semantics, sure |
21:55 | <gibson042> | and the prohibition against non-IdentifierNames could in principle be relaxed without changing my position |
21:55 | <sffc> | gibson042: ok, so we agree that we have an inconsistency with both outcomes. |
21:55 | <keith_miller> | to be fair even the asciifier would have observable semantics :P |
21:55 | <mathiasbynens> | keith_miller: right... |
21:55 | <msaboff> | The rule in https://tc39.es/ecma262/#prod-RegExpIdentifierName resolves to RegExpIdentifierStart followed by RegExpIdentifierPart, but they have the same productions as IdentifierStart and IdentifierPart |
21:55 | <jridgewell> | `/[\0-\ud835\udc9c]/u` is not equivalent to `/[\0-𝒜]/u` |
21:55 | <keith_miller> | changes |
21:56 | <keith_miller> | because you changed the length of the file |
21:56 | <mathiasbynens> | and .source and .toString() etc. |
21:56 | <Bakkot> | jridgewell how sure of that claim are you |
21:56 | <gibson042> | sffc: it's an inconsistency of the same sort that allows `\n` but not raw U+000A in string literals |
21:56 | <jridgewell> | I just tested in Chrome |
21:56 | <Bakkot> | jridgewell what test did you run? |
21:56 | <gibson042> | jridgewell: for what input do you get different results? |
21:56 | <ljharb> | keith_miller: that's not really observable in JS tho |
21:56 | <jridgewell> | Nope, never mind, I forgot to change it. |
21:57 | <jridgewell> | I hit up twice. 😳 |
21:57 | <keith_miller> | I meant function length |
21:57 | <keith_miller> | ljharb:^ |
21:57 | <keith_miller> | sorry |
21:58 | <sffc> | gibson042 What is the behavior of `/(?<\ud835\udc9c>.)/` in your preference? Does it throw? |
21:58 | <mathiasbynens> | ljharb: function tostring, and source + toString on the regexp too |
21:58 | <gibson042> | sffc: no, it is equivalent to `/(?<𝒜>.)/` |
21:58 | <ljharb> | keith_miller: ah true |
21:59 | <ljharb> | mathiasbynens: also true |
21:59 | <gibson042> | it is how you express non-BMP capture group names without using non-ASCII source |
21:59 | <sffc> | gibson042: right, so you consider `/(?<𝒜>.)/` a valid regex that produces a group name of "𝒜"? |
21:59 | <Bakkot> | mathiasbynens what did you mean by "transform any such group names globally"? |
21:59 | <gibson042> | just like `/(\ud835\udc9c/` is how you express non-BMP matches without using non-ASCII source |
22:00 | <gibson042> | yes |
22:00 | <gibson042> | err, just like `/\ud835\udc9c/` is how you express non-BMP matches without using non-ASCII source |
22:01 | <sffc> | Bakkot: what do you think about allowing non-IdentifierNames in the regex capture group name, as gibson042 suggested would be compatible with his position? |
22:01 | <mathiasbynens> | Bakkot: like if the tool sees a group named `𝒜` it could rename that to `__renamed_1` and give `match.groups.𝒜` the same treatment |
22:01 | <Bakkot> | mathiasbynens sure, but `x = match.groups; x.𝒜` is harder |
22:02 | <mathiasbynens> | Bakkot: yeah |
22:02 | <Bakkot> | and by "harder" I mean "uncomputable" |
22:02 | <msaboff> | I think we are letting this asciifier argument have too much sway. Do we think this is a major use case? |
22:02 | <Bakkot> | sffc I would not want to allow capture group names which, when considered as code points, are not identifiers |
22:03 | <Bakkot> | msaboff in honesty I think it's pretty much the only use case. |
22:03 | <Bakkot> | msaboff that is, I don't think a human is ever going to write unicode escape sequences in group names. it's always going to be tools. |
22:03 | <Bakkot> | so, my preference is to make life easy for tools. |
22:04 | <michaelficarra> | so just allow \u{} everywhere |
22:04 | <michaelficarra> | if that's your only goal |
22:04 | <mathiasbynens> | michaelficarra: we can't do that in non-u regexps OUTSIDE of named groups, but within named groups yesssssssssssssss I'm all for it |
22:04 | <msaboff> | I don't know how prevalent of a use case it is, but I believe that humans WILL write unicode escapes for group names. Not common but likely given the use of poor dev tools. |
22:05 | <Bakkot> | msaboff it's pretty common; I have seen multiple bespoke JS-asciification tools in use at enterprises (all broken to some extent, but we don't need to make them more broken) |
22:05 | <msaboff> | michaelficarra I'm for it in NCG IDs as well. |
22:05 | <Bakkot> | michaelficarra as mathiasbynens says, you can't do it outside of group names. so now the tools have to parse the regex, instead of blindly replacing everything in the regex. |
22:05 | <devsnek> | is there not just a babel plugin |
22:06 | <sffc> | Bakkot: can you explain why you would not want to allow non-identifiers in capture group names? |
22:06 | <gibson042> | so sffc and maybe mathiasbynens are against allowing `/(?<\ud835\udc9c>.)/` because of IdentifierName, and Waldemar and I are against allowing `/(?<\u{1d49c}>.)/` because of non-/u regexp semantics |
22:06 | <devsnek> | actually doesn't babel shell out to a regex parser |
22:06 | <gibson042> | but at least one of them must be allowed in order to support all-ASCII source |
22:06 | <sffc> | gibson042: that's my understanding of the situation, yes |
22:06 | <mathiasbynens> | I just want (or would like) to be able to copy-paste 𝒜𝒜𝒜 in `/(?<𝒜𝒜𝒜>.)/` and the corresponding `match.groups.𝒜𝒜𝒜` |
22:06 | <mathiasbynens> | in all cases |
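For reference, the uncontested half of this (a `u`-flag regex) can be sketched as below. The non-`u` spellings are what the discussion is about, so they are deliberately left out; variable names are illustrative.

```javascript
// With the `u` flag, a non-BMP group name can be written raw or as a
// code point escape, and is read back as an ordinary property.
const rawName = /(?<𝒜>.)/u.exec('x');
const codePointName = /(?<\u{1d49c}>.)/u.exec('x');

const groupResults = {
  raw: rawName.groups.𝒜,
  codePoint: codePointName.groups['𝒜'],
  sameKey: Object.keys(rawName.groups)[0] === Object.keys(codePointName.groups)[0],
};
console.log(groupResults);
```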
22:07 | <msaboff> | Bakkot What would an asciifier do for the "let \u{1d49c};" case? |
22:07 | <gibson042> | that's already ASCII, so it would leave it alone |
22:08 | <Bakkot> | sffc: two reasons: one is that having `>` gets weird, and the other is that having numerics like `0` gets weird |
22:08 | <msaboff> | Okay, then what does an asciifier do for "let 𝒜;"? Won't it also convert it to "let \u{1d49c};"? |
22:08 | <msaboff> | If so, it can't do that without context. |
22:08 | <michaelficarra> | Bakkot mathiasbynens: sorry yes that's what I meant by "everywhere"; "regardless of u flag, inside NCGs" |
22:08 | <gibson042> | yes, it would have to |
22:09 | <msaboff> | So the only weird context is in the pattern part of a RegExp? |
22:09 | <gibson042> | unless we allow surrogate pairs wherever \u{…} is allowed, an asciifier must parse |
22:09 | <Bakkot> | must tokenize |
22:09 | <Bakkot> | doesn't have to do a full parse |
22:09 | <msaboff> | gibson042 I think that is a breaking change for normal identifiers. |
22:09 | <Bakkot> | (except as necessary for tokenization) |
22:10 | <devsnek> | is there any possible (?< in regex that isn't a group name |
22:10 | <gibson042> | Bakkot: +1 |
22:10 | <Bakkot> | devsnek: `\(?<` |
22:10 | <devsnek> | ok excluding escapes |
22:10 | <Bakkot> | '[(?<]' |
22:10 | <msaboff> | I don't think so besides the escape. |
22:11 | <devsnek> | Bakkot: wouldn't that require a parse |
22:11 | <devsnek> | not just tokenize |
22:11 | <devsnek> | to know you're inside a capture group |
22:11 | <devsnek> | er |
22:11 | <devsnek> | character set |
22:12 | <devsnek> | whatever you call brackets |
22:12 | <mathiasbynens> | character class |
22:12 | <mathiasbynens> | hmm interesting |
22:13 | <Bakkot> | devsnek why? |
22:13 | <msaboff> | What should an asciifier do with: |
22:13 | <msaboff> | let first = "\0"; |
22:13 | <msaboff> | let last = "𝒜"; |
22:14 | <devsnek> | cuz you have to know whether you're parsing a group name or not |
22:14 | <devsnek> | theoretically |
22:14 | <msaboff> | let r = new RegExp("[" + first + "-" + last + "]") |
22:14 | <Bakkot> | devsnek: my desired state is, we end up such that you can replace any non-bmp in any regex (or string) with two escaped surrogates |
22:14 | <Bakkot> | msaboff you can also safely replace non-BMP code points in strings with two escaped surrogate halves |
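The "blind" replacement being described might look like the sketch below. `asciify` is a hypothetical helper written for illustration, not any of the tools mentioned in the discussion, and it assumes its input is the text of a string or regex literal.

```javascript
// Replace every non-ASCII UTF-16 code unit with a \uXXXX escape.
// The regex has no `u` flag, so a non-BMP character is visited as two
// code units and comes out as two escaped surrogate halves.
function asciify(literalText) {
  return literalText.replace(/[^\x00-\x7f]/g, (unit) =>
    '\\u' + unit.charCodeAt(0).toString(16).padStart(4, '0'));
}

const asciifiedString = asciify('"𝒜"');
console.log(asciifiedString);
// For string literals the round trip preserves the value.
console.log(JSON.parse(asciifiedString) === '𝒜');
```

No parsing is involved, which is the property being argued for; the open question is whether the same output stays valid inside a capture group name.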
22:15 | <devsnek> | i mean if we went with "whatever identifiers do" |
22:15 | <Bakkot> | devsnek oh, yes. that's my point; that's what I am hoping to avoid. |
22:15 | <gibson042> | it's also worth noting that `let \u0061\u0061` *is* valid, but `let \ud835\udc9c` is not simply because code unit U+D835 is not treated as part of a surrogate pair |
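The identifier rules mentioned here can be checked by asking the engine whether each declaration parses. A small sketch; `parses` is a hypothetical helper:

```javascript
// Returns true if `source` is accepted as a function body.
function parses(source) {
  try {
    new Function(source);
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) return false;
    throw e;
  }
}

const identifierChecks = {
  bmpEscapes: parses('let \\u0061\\u0061;'),       // two escapes spelling `aa`
  codePointEscape: parses('let \\u{1d49c};'),      // one code point escape
  surrogateEscapes: parses('let \\ud835\\udc9c;'), // escaped surrogate halves
};
console.log(identifierChecks);
```

Only the last form is rejected, because each escape is checked on its own and neither surrogate half is an identifier character.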
22:15 | <gibson042> | Bakkot: I agree |
22:16 | <msaboff> | That doesn't work for let r = new RegExp("(?<" + last + ">.)"); |
22:16 | <Bakkot> | msaboff how does it not? |
22:17 | <sffc> | If the asciifyer isn't able to have different behavior based on whether you are in a capture group or whether you are in a Unicode vs non-Unicode regex, then you need to expand to `\u\u`. Are those two restrictions required for the asciifyer? |
22:17 | <mathiasbynens> | let last = '\uD835\uDC9C'; // which is === '𝒜' |
22:17 | <msaboff> | I doubt that the two surrogate escapes are ID_Start and ID_Continue |
22:17 | <gibson042> | exactly |
22:17 | <mathiasbynens> | there won't be any escapes by the time you put it into the RegExp at runtime |
22:19 | <Bakkot> | sffc in practice, most asciifiers I see in the wild are bespoke and at least a little bit broken because they are not aware of all the absurd edge cases in JS. I am hopeful we can minimize new sharp edges. |
22:19 | <gibson042> | if `let \ud835\udc9c` were interpreted analogously to `"\ud835\udc9c"`, with the two escapes recognized as a single code point, then an asciifier could always replace non-ASCII code points with \u…\u… surrogate pairs |
22:19 | <sffc> | gibson042: non-Unicode regexes already have strange behavior when you embed Unicode characters. We're making the behavior no less strange. |
22:20 | <mathiasbynens> | in general, it seems bad to expose the concept of surrogates in more places |
22:20 | <gibson042> | we're not talking about changing the semantics of non-Unicode regexes with non-ASCII characters |
22:20 | <sffc> | +1 mathiasbynens |
22:20 | <gibson042> | I actually disagree with that, but it's a bit of a tangent anyway |
22:20 | <Bakkot> | sffc we are making the behavior harder for tools to get right, and no harder for humans to get right |
22:21 | <sffc> | gibson042: asciifyers aside, what should `/(?<\u{1d49c}>.)/` do in your opinion? |
22:23 | <gibson042> | it should be recognized as equivalent to `/(?<u{1d49c}>.)/`, i.e. an attempt to create a regex with a capture group named "u{1d49c}" |
22:23 | <Bakkot> | oof |
22:23 | <Bakkot> | I do not like that option |
22:23 | <gibson042> | which is currently invalid because group names must be IdentifierNames |
22:24 | <gibson042> | just like `/\u{1d49c}/` matches only "u{1d49c}" |
22:24 | <mathiasbynens> | named captures should have been `u`-only |
22:24 | <gibson042> | IOW, "\u{" has no special semantics in non-Unicode regular expressions |
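A quick illustration of the point about `\u{` outside group names, assuming an engine with the web-compatibility (Annex B) regex grammar, as browsers and Node have:

```javascript
// Without `u`, `\u` not followed by four hex digits is just `u`, and
// `{1d49c}` is not a valid quantifier, so it is matched literally.
const braceNonUnicode = /^\u{1d49c}$/;
const braceUnicode = /^\u{1d49c}$/u;

const braceResults = {
  nonULiteral: braceNonUnicode.test('u{1d49c}'),
  nonUChar: braceNonUnicode.test('𝒜'),
  uLiteral: braceUnicode.test('u{1d49c}'),
  uChar: braceUnicode.test('𝒜'),
};
console.log(braceResults);
```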
22:25 | <Bakkot> | so the effective answer is, it should be an error, right? |
22:25 | <gibson042> | yes |
22:25 | <Bakkot> | ok good that's not so bad then |
22:26 | <devsnek> | mathias +1 |
22:26 | <gibson042> | and the spec machinery would be essentially "UTF16Decode, then require the result to conform with IdentifierName" |
22:27 | <devsnek> | i'd rather push proper unicode escapes into old regex than push old escapes into identifiers |
22:27 | <gibson042> | but you *can't* push them all the way in |
22:28 | <mathiasbynens> | devsnek: would be amazing if that was web compatible :o |
22:28 | <michaelficarra> | devsnek: they already exist in identifiers, they're just arguably handled wrong |
22:28 | <devsnek> | i meant in the group name |
22:28 | <devsnek> | not generally |
22:28 | <michaelficarra> | oh :-( |
22:28 | <devsnek> | lol |
22:28 | <mathiasbynens> | devsnek: ah yes, 100% agree |
22:28 | <devsnek> | tfw you confuse three people all at once |
22:28 | <michaelficarra> | in different ways |
22:29 | <keith_miller> | shu: Out of curiosity how did this API come up? |
22:29 | <keith_miller> | Was it from talking to graphics peoples? |
22:30 | <mathiasbynens> | gibson042: sorry for being a broken record, but why is that inconsistency (of \u{...} being allowed in named groups, but not elsewhere, in non-u regexps) too much for you? |
22:30 | <rkirsling> | ^ and is this an objection and not just a dispreference? |
22:31 | <mathiasbynens> | gibson042: i don't understand how that apparently outweighs the `/(?<𝒜𝒜𝒜>.)/u` && `match.groups.𝒜𝒜𝒜` consistency, which seems much more common |
22:31 | <Bakkot> | mathiasbynens: wait, what inconsistency |
22:32 | <Bakkot> | has anyone suggested `/(?<𝒜𝒜𝒜>.)/u` && `match.groups.𝒜𝒜𝒜` not work? |
22:32 | <mathiasbynens> | no |
22:32 | <mathiasbynens> | in plenary when it was suggested that we could make \u{...} work in group names within non-u RegExps |
22:32 | <Bakkot> | ah |
22:33 | <Bakkot> | (I am fine with that fwiw) |
22:33 | <Bakkot> | mathiasbynens: also, same question for you: why is the inconsistency of `\u\u` being allowed in named group names, but not in identifiers outside of literals, too much for you? |
22:33 | <mathiasbynens> | gibson042 said that'd be inconsistent with \u{} elsewhere in non-u RegExps (which is true) |
22:34 | <mathiasbynens> | Bakkot: the way i see it, we have to choose between the two, and so we should choose based on which pattern is more common |
22:34 | <gibson042> | It's too much because it adds *even more* complexity to an already overwhelming part of the language, and does so for very little benefit IMO. This is a strong dispreference, but I (though not necessarily Waldemar) would yield to supermajority. |
22:34 | <Bakkot> | mathiasbynens: why do you see it that we have to choose between the two? |
22:35 | <Bakkot> | mathiasbynens: my preference is to allow both (in both kind of regexes), as we do in strings and `u` regexs |
22:35 | <gibson042> | sffc and maybe mathiasbynens are against allowing `/(?<\ud835\udc9c>.)/` because of IdentifierName, and Waldemar and I are against allowing `/(?<\u{1d49c}>.)/` because of non-Unicode regexp semantics... but at least one of them must be allowed in order to support all-ASCII source |
22:35 | <Bakkot> | this makes life easiest for tooling authors and creates in expectation zero problems for any other humans, I would guess |
22:36 | <mathiasbynens> | what gibson said ^ |
22:36 | <Bakkot> | ugh |
22:36 | <mathiasbynens> | did i get that wrong? |
22:36 | <Bakkot> | yeah I think that's correct |
22:36 | <Bakkot> | I would like us to think first about what the actual effects of our decisions on future humans will be |
22:36 | <mathiasbynens> | and i agree on "zero problems for humans" |
22:36 | <mathiasbynens> | i just don't like to make the language uglier by allowing surrogates in more places |
22:37 | <Bakkot> | I appreciate that preference, I just think it should be outweighed by the relatively substantial likelihood that this decision leads to someone shipping broken code to real users as a result of tooling which is not aware of this edge case |
22:37 | <mathiasbynens> | i would hope (perhaps naively) that future humans always use the `u` flag |
22:37 | <Bakkot> | some will, many won't |
22:37 | <gibson042> | isn't it worse to have `\ud835\udc9c` sometimes be two code points and sometimes one? |
22:39 | <shu> | keith_miller: oh, no, not from the graphics folks |
22:39 | <mathiasbynens> | gibson042: hmm? |
22:39 | <keith_miller> | Interesting, where did it come up? |
22:40 | <gibson042> | regarding "i just don't like to make the language uglier by allowing surrogates in more places", I think it's worse to have more places where `\ud835\udc9c` represents two code points rather than one |
22:41 | <ljharb> | benjamn: wait, import.meta inherits from Module.prototype in node?? |
22:41 | <ljharb> | benjamn: or, you might want it to |
22:41 | <shu> | keith_miller: surma brought it up in working with bitmaps pulling out the A's instead of the RGB's, i think is the direct motivating example |
22:41 | <keith_miller> | got it |
22:41 | <benjamn> | ljharb: no, but Module.prototype was a useful feature of CommonJS |
22:41 | <Bakkot> | gibson042: if it's forbidden it doesn't really represent anything |
22:41 | <shu> | keith_miller: and indeed, the plan for RGBs was to make 3 views for each channel |
22:42 | <gibson042> | because `/(?<\u0061\u0061>.)/` is valid now and will presumably remain valid |
22:42 | <mathiasbynens> | gibson042: i don't follow. how does allowing \u{...} in non-u RegExp group names increase the number of cases where `\ud835\udc9c` represents 2 code points? |
22:42 | <shu> | keith_miller: there's a category mismatch for me for the graphics use cases needing more expressivity -- simple strides exist in other languages and enjoy use, despite lacking the extra expressivity |
22:43 | <Bakkot> | gibson042: that is neither valid nor invalid now (per spec), and could be made invalid without breaking anyone (I suspect) |
22:43 | <shu> | keith_miller: so maybe the high-order bit here is actually how much implementation burden is there, given that this is intended to be a smallish, incremental ergonomic win |
22:44 | <mathiasbynens> | Bakkot: hm, that would be another deviation from Identifier though |
22:44 | <gibson042> | right. So rejecting `/(?<\ud835\udc9c>.)/` can only be on the basis of treating it as two code points |
22:44 | <shu> | keith_miller: that is, i'm pushing back against the framing that satisfying all graphics use cases is a pre-req |
22:45 | <Bakkot> | mathiasbynens \ud835\udc9c is not legal in identifiers, is it? |
22:45 | <gibson042> | which it is not in the same regex outside of naming a group |
22:45 | <mathiasbynens> | Bakkot: no |
22:45 | <Bakkot> | mathiasbynens wait which "no" |
22:45 | <Bakkot> | "no, it is not legal" or "no, you're mistaken, it is legal" |
22:45 | <mathiasbynens> | Bakkot: escaped surrogate pairs in identifiers == not valid |
22:45 | <gibson042> | `\ud835\udc9c` is not a valid IdentifierName because it is interpreted as two code units, neither of which are in a valid class |
22:46 | <wsdferdksl> | You can already access a non-BMP property using foo["\ud835\udc9c"]. |
22:46 | <keith_miller> | shu: I'd roughly agree with that assessment but I phrase it as the cost is roughly known/fixed but there may be enough use cases to justify it |
22:46 | <mathiasbynens> | (it's v late and i've been v difficult here, apologies and cheers for bearing with me so far) |
22:46 | <gibson042> | err, two code *points* |
22:46 | <shu> | keith_miller: yeah, point taken |
22:46 | <gibson042> | if it were interpreted as a surrogate pair for a single code point, then it would be a valid identifier |
22:46 | <wsdferdksl> | RegExes should work like strings |
22:46 | <keith_miller> | I'm not trying to say you have to solve all use cases only that there are still a lot of use cases that are not very ergonomic anyway |
22:47 | <gibson042> | which is what happens inside strings and inside regular expression literals outside of naming capture groups |
22:47 | <keith_miller> | with this api* |
22:47 | <shu> | keith_miller: right, so it comes down to how big is the set of use cases that would be made ergonomic, and how much work do we have to do for it |
22:47 | <msaboff> | I just checked and the current spec only allows unicode escapes in NCG Identifiers for Unicode. And it allows both \uXXXX\uXXXX and \u{XXXXX} for NCG identifiers.
22:47 | <keith_miller> | yeah |
22:47 | <keith_miller> | I think we're on the same page |
22:47 | <shu> | keith_miller: which are both pretty valid; all the use cases on the explainer now are graphics, and if graphics folks are like "lol no" then that's just bad motivation. if we can't find better ones then yeah, just do it in user code |
22:48 | <mathiasbynens> | wsdferdksl: regexes have different concepts of what constitutes a "character" depending on the `u` flag, so strings/regexps don't map nicely |
22:48 | <Bakkot> | msaboff: the current spec has an early error for "the SV of RegExpUnicodeEscapeSequence", which is not an operation which is defined |
22:48 | <wsdferdksl> | I'm talking about non-u regexes |
22:48 | <mathiasbynens> | msaboff: sorry, what is NCG? |
22:48 | <Bakkot> | named capture group |
22:48 | <mathiasbynens> | ah duh |
22:49 | <wsdferdksl> | Both those and strings work with 16-bit chunks |
22:49 | <msaboff> | Named Capture Group |
22:49 | <gibson042> | regexes of both kinds recognize surrogate pairs as single code points outside of naming capture groups |
22:49 | <wsdferdksl> | No |
22:49 | <mathiasbynens> | no, look at atoms |
22:49 | <Bakkot> | character classes too |
22:49 | <msaboff> | gibson042 in practice pretty much in reality no |
22:49 | <mathiasbynens> | gibson042: e.g. \uLEAD\uTRAIL{2}
22:50 | <mathiasbynens> | https://mathiasbynens.be/notes/es6-unicode-regex has some examples |
22:51 | <gibson042> | this is veering into semantics now; non-Unicode regexes operate on UTF-16 code units |
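The code-unit versus code-point distinction argued over above can be checked directly. The following is a minimal sketch, assuming any engine with ES2015 `u`-flag support; the variable names are illustrative only. It shows that a non-`u` regex treats the surrogate pair for U+1D49C as two separate units for atoms and quantifiers, while a `u` regex treats it as one code point.

```javascript
// U+1D49C (𝒜) is stored in a JS string as the surrogate pair \ud835 \udc9c.
const scriptA = "\ud835\udc9c";

// Without `u`, `.` matches a single UTF-16 code unit, so one dot is not enough.
const oneDotNonUnicode = /^.$/.test(scriptA);   // false
const twoDotsNonUnicode = /^..$/.test(scriptA); // true

// With `u`, `.` matches one whole code point.
const oneDotUnicode = /^.$/u.test(scriptA);     // true

// Without `u`, the quantifier binds only to the trail surrogate:
// the pattern means "\ud835 followed by \udc9c twice".
const quantNonUnicode = /^\ud835\udc9c{2}$/.test("\ud835\udc9c\udc9c"); // true

// With `u`, the escaped pair is one atom, so {2} repeats the whole code point.
const quantUnicode = /^\ud835\udc9c{2}$/u.test(scriptA + scriptA);             // true
const quantUnicodeOnUnits = /^\ud835\udc9c{2}$/u.test("\ud835\udc9c\udc9c");   // false

console.log({
  oneDotNonUnicode, twoDotsNonUnicode, oneDotUnicode,
  quantNonUnicode, quantUnicode, quantUnicodeOnUnits,
});
```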
22:52 | <keith_miller> | lol |
22:52 | <Bakkot> | wsdferdksl mathiasbynens gibson042: I don't think we're likely to resolve this today. are you all OK with the committee approving the current spec, including this oversight, as the candidate for 2020, and trying to resolve this later? |
22:52 | <wsdferdksl> | No |
22:52 | <wsdferdksl> | We should resolve this |
22:52 | <gibson042> | I am, since it's not new anyway |
22:52 | <mathiasbynens> | Bakkot: I see no rush tbh. I'd rather resolve it properly |
22:53 | <Bakkot> | wsdferdksl: given that we don't appear to be close to consensus, how can we resolve it? |
22:53 | <Bakkot> | I guess we could call a formal vote for this question |
22:53 | <wsdferdksl> | There is only one solution that works for ASCIIfiers, and it's not difficult to do. |
22:53 | <mathiasbynens> | let's keep the spec as-is until we can get proper consensus (which doesn't have to be in this meeting imho) |
22:53 | <msaboff> | Bakkot: Is the "SV value of RegExpUnicodeEscapeSequence" confusing in the context of a Unicode RegExp? |
22:53 | <mathiasbynens> | wsdferdksl: there are two solutions: we could make \u{...} work in NCG in non-u regexps |
22:53 | <Bakkot> | msaboff It's not defined at all, yes |
22:54 | <Bakkot> | that's why this issue comes up |
22:54 | <wsdferdksl> | I don't see why we don't have consensus. It's not like people are going to be writing this kind of stuff. |
22:54 | <Bakkot> | well, we don't |
22:54 | <wsdferdksl> | Why not? |
22:54 | <Bakkot> | people feel strongly about consistency with identifiers, mostly |
22:55 | <wsdferdksl> | How is that relevant? |
22:55 | <wsdferdksl> | It's not like people are going to be writing this kind of stuff. |
22:55 | <michaelficarra> | wsdferdksl: is this your first meeting? |
22:55 | <mathiasbynens> | wsdferdksl: why don't we make \u{...} work _only_ in NCG in non-u regexps? that way we don't expose the unfortunate concept of surrogates to more places in the language |
22:56 | <gibson042> | "it adds *even more* complexity to an already overwhelming part of the language, and does so for very little benefit IMO" |
22:56 | <wsdferdksl> | That would be gratuitously confusing. Once again, it's not like folks are going to be writing this stuff by hand. |
22:56 | <gibson042> | I'm stepping out for a bit, be back later |
22:57 | <mathiasbynens> | gibson042: can't you say the same thing about allowing individually-escaped paired surrogates in groups? |
22:58 | <mathiasbynens> | i'm heading off, should've gone to bed hours ago. thanks for bearing with me y'all. and Bakkot, I really appreciate your work on trying to fix this spec bug, one way or another -- thanks! |
22:58 | <Bakkot> | mathiasbynens thanks for engaging; sleep well |
23:00 | <msaboff> | Bakkot How do you reconcile that with "SV of UnicodeEscapeSequence"? The only difference compared to RegExpUnicodeEscapeSequence is that RegExpUnicodeEscapeSequence includes \uXXXX\uXXXX. |
23:01 | <bradleymeck> | xs uses freeze |
23:01 | <sffc> | Regexes are confusing. This is an edge case. My preference is to make `/(?<\u{1d49c}>.)/ == /(?<𝒜>.)/`. I think it's more important to have consistency with language syntax than consistency in behavior. If someone is surprised by the behavior, we have a reason for it. |
23:01 | <Bakkot> | msaboff: "SV" is an operation which is not defined for RegExpUnicodeEscapeSequence |
23:02 | <Bakkot> | but it is defined for UnicodeEscapeSequence |
23:02 | <Bakkot> | wsdferdksl: to be concrete, of the following four regular expression literals, which do you think ought to be legal? /(?<\ud835\udc9c>.)/ /(?<\u{1d49c}>.)/ /(?<\ud835\udc9c>.)/u /(?<\u{1d49c}>.)/u
23:03 | <robpalme> | 4 min break! |
23:03 | <devsnek> | btw https://arai-a.github.io/ecma262-compare |
23:03 | <wsdferdksl> | Bakkot: 0, 2, and 3. |
23:03 | <msaboff> | Bakkot Maybe the fastest path to victory is defining "SV of RegExpUnicodeEscapeSequence" |
23:04 | <wsdferdksl> | The rationale being that those three out of the four are "legal" if you don't enclose the escapes inside (?<>).
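One way to see what a given engine does with the four candidate group names is to compile each one and catch the `SyntaxError`. This is only a sketch: `isLegal` is a hypothetical helper, and the outcome for the first three patterns depends on the engine and its version, which is exactly the disagreement in this discussion. Only the `\u{...}` form with the `u` flag is assumed to be accepted everywhere named groups are supported.

```javascript
// Hypothetical helper: report whether the running engine accepts a pattern.
function isLegal(source, flags) {
  try {
    new RegExp(source, flags);
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) return false;
    throw e; // anything else is unexpected
  }
}

// The four candidates from the discussion, in the same order.
const candidates = [
  ["(?<\\ud835\\udc9c>.)", ""],
  ["(?<\\u{1d49c}>.)", ""],
  ["(?<\\ud835\\udc9c>.)", "u"],
  ["(?<\\u{1d49c}>.)", "u"],
];

const results = candidates.map(([source, flags]) => ({
  literal: `/${source}/${flags}`,
  legal: isLegal(source, flags),
}));

for (const { literal, legal } of results) {
  console.log(literal, legal ? "accepted" : "SyntaxError");
}
```

Running this in different engines (or different versions of the same engine) is a quick way to compare implementations against whichever semantics the committee settles on.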
23:04 | <Bakkot> | msaboff: yeah, but there's normative implications and we have to get people to agree on the normative behavior |
23:04 | <devsnek> | ljharb: bradleymeck: https://nodejs.org/api/vm.html#vm_constructor_new_vm_sourcetextmodule_code_options |
23:04 | <devsnek> | i just remembered this is a thing |
23:05 | <bradleymeck> | non-stage 4 features, in my runtime :gasp: |
23:05 | <devsnek> | we can deprecate it |
23:05 | <devsnek> | actually we don't even need to do that |
23:05 | <devsnek> | it's still experimental
23:05 | <Bakkot> | wsdferdksl: would you be OK with /(?<\u{1d49c}>.)/ being legal? I would prefer it to be legal just for simplicity of tooling, personally |
23:05 | <msaboff> | Bakkot: do you think that any implementation is doing something different than the obvious? |
23:05 | <Bakkot> | msaboff yup |
23:06 | <Bakkot> | both you and chrome are |
23:06 | <wsdferdksl> | It's a bit more complex, but I wouldn't object to that, if the other three were also legal. |
23:06 | <msaboff> | Let me look at your slides again... |
23:06 | <Bakkot> | msaboff in particular, the "obvious" thing would not depend on the presence of the `u` flag, but JSC and V8 both do |
23:07 | <robpalme> | ok break time is over! |
23:07 | <wsdferdksl> | It wouldn't simplify the tooling because you can't use \u{} outside of (?<>) |
23:08 | <Bakkot> | wsdferdksl it depends on the tooling. I could imagine tooling which parses regexes and re-serializes them, and being able to use the same serialization logic for group names for both `u` and non-`u` regexes is slightly simpler.
23:08 | <Bakkot> | but yes it's a very small win |
23:09 | <msaboff> | But the spec requires the u flag to get to the RegExpUnicodeEscapeSequence production. I think we comply in light of the u flag. Note that a patch landed last night in JSC that deals with this.
23:09 | <benjamn> | does anyone know the rationale for using the m suffix for decimals? (as in .2m) |
23:09 | <devsnek> | as opposed to? |
23:09 | <benjamn> | haha, yes, it seems pretty arbitrary |
23:09 | <Bakkot> | msaboff the spec does not require the flag, in my reading? |
23:09 | <benjamn> | .2d? |
23:10 | <wsdferdksl> | 0x34d |
23:10 | <devsnek> | f means float, i means imaginary, etc |
23:10 | <benjamn> | oh sure, probably shouldn't be a-f |
23:10 | <benjamn> | why does m mean decimal though? |
23:10 | <devsnek> | that i don't know |
23:10 | <benjamn> | is it like two n's, smushed together? |
23:10 | <benjamn> | like a fraction? |
23:10 | <Bakkot> | deciMal |
23:10 | <shu> | you can't spell decimal without m |
23:10 | <rkirsling> | https://github.com/tc39/proposal-decimal/#why-are-literals-m-why-not-d |
23:10 | <benjamn> | shu: true that |
23:11 | <rbuckton> | _M_oney |
23:11 | <devsnek> | lol |
23:11 | <shu> | oooo |
23:11 | <devsnek> | that's how the twitter teachers will teach it 100% |
23:12 | <msaboff> | https://tc39.es/ecma262/#prod-RegExpUnicodeEscapeSequence has the U suffix and all rules that evaluate to it require U to be true. |
23:12 | <benjamn> | yes! I love the idea of #{ numerator, denominator } |
23:13 | <Bakkot> | msaboff the `[U]` suffix means it's a parameter, not a requirement |
23:13 | <msaboff> | The +U means it must be true |
23:13 | <devsnek> | benjamn: in a decimal type n/d is just that |
23:13 | <rbuckton> | Rationale for C#'s use of `m` (tldr; it was the next best letter in `decimal`): https://stackoverflow.com/a/977562 |
23:14 | <Bakkot> | msaboff which +U? |
23:14 | <rkirsling> | definitely better `m` than `i` or `l` |
23:14 | <sffc> | suggestion for decimal: `#{ mantissa, scale }` where mantissa is a BigInt |
23:14 | <devsnek> | what is scale |
23:14 | <benjamn> | devsnek: oh yes, I'm not suggesting that tuples would automatically serve all the decimal use cases |
23:14 | <sffc> | scale is a power of 10 |
23:14 | <devsnek> | oh exponent ok |
23:14 | <sffc> | `#{ 123n, -2 }` is 1.23 |
23:15 | <msaboff> | Every RHS rule for https://tc39.es/ecma262/#prod-RegExpUnicodeEscapeSequence |
23:15 | <devsnek> | 1.23m is 1.23 |
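The mantissa/scale idea can be sketched with today's BigInt. This is an illustration only: the `#{ ... }` record syntax and the `m` literal suffix are proposals, so a plain object stands in for the record, and `decimalToString` is a hypothetical helper name. The value represented is `mantissa * 10^scale`.

```javascript
// Hypothetical helper: render { mantissa, scale } as a decimal string,
// where mantissa is a BigInt and scale is an integer power of 10.
function decimalToString({ mantissa, scale }) {
  const negative = mantissa < 0n;
  let digits = (negative ? -mantissa : mantissa).toString();
  if (scale >= 0) {
    // Positive scale just appends zeros.
    digits += "0".repeat(scale);
  } else {
    // Negative scale places a decimal point, padding with leading zeros.
    const fractionLength = -scale;
    digits = digits.padStart(fractionLength + 1, "0");
    const split = digits.length - fractionLength;
    digits = digits.slice(0, split) + "." + digits.slice(split);
  }
  return (negative ? "-" : "") + digits;
}

console.log(decimalToString({ mantissa: 123n, scale: -2 })); // "1.23"
console.log(decimalToString({ mantissa: 5n, scale: -3 }));   // "0.005"
console.log(decimalToString({ mantissa: 12n, scale: 1 }));   // "120"
```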
23:15 | <Bakkot> | msaboff: `[~U] u Hex4Digits` |
23:15 | <Bakkot> | so not every RHS rule |
23:16 | <msaboff> | That is not valid for NCG Identifiers. |
23:16 | <Bakkot> | How not? |
23:16 | <Bakkot> | GroupName is `GroupName[U] :: < RegExpIdentifierName[?U] >` |
23:17 | <Bakkot> | and then `RegExpIdentifierName[U] :: RegExpIdentifierStart[?U]` -> `RegExpIdentifierStart[U] :: \RegExpUnicodeEscapeSequence[?U]` |
23:18 | <msaboff> | In your mind, how does the character value definition for RegExpUnicodeEscapeSequence in https://tc39.es/ecma262/#sec-patterns-static-semantics-character-value answer the SV question? |
23:19 | <benjamn> | rbuckton: oh wow, I did not expect "the [first good] letter in decimal" to be such a persuasive argument |
23:19 | <Bakkot> | https://tc39.es/ecma262/#sec-patterns-static-semantics-character-value gives semantics for CharacterValue for every RHS that RegExpUnicodeEscapeSequence produces |
23:19 | <benjamn> | but d, e, c, and i are all quite problematic… so yeah |
23:20 | <msaboff> | Only the \uHex4Digits is valid to NCG ids in non-u RegExps
23:20 | <michaelficarra> | sffc: that sounds a lot like a reduced form of Rationals |
23:20 | <rbuckton> | in C#: `d`-double,`e`-exponent, `c`-char, `i`-integer, `l`-long. Only other option would have been `a` |
23:21 | <msaboff> | Therefore we (currently) can't have a non-BMP character in a NCG id for non-u RegExps. Agree? |
23:21 | <rbuckton> | I'm currently pursuing a struct/value-type proposal with syntax for operator overloading. |
23:22 | <msaboff> | And we (currently) can have non-BMP codepoints in NCG ids for u flagged RegExps. |
23:22 | <devsnek> | can we get a tc39-regex channel |
23:22 | <devsnek> | (/s) |
23:22 | <msaboff> | So your slide #10 is conforming. |
23:22 | <Bakkot> | msaboff we can have a non-BMP character in a NCG id as two `\uHex4Digits`. or rather, the spec does not say whether or not we can have that.
23:25 | <msaboff> | The way I read the spec is we must interpret two `\uHex4Digits` as two individual codepoints for non-unicode RegExp NCG ids. They must each be appropriate ID characters depending on their position.
23:25 | <msaboff> | They might be dangling surrogates, but that would be a syntax error as they wouldn't be ID codepoints. |
23:26 | <michaelficarra> | msaboff: that's how I originally expected non-u regexps to work |
23:26 | <Bakkot> | The "must each be appropriate ID characters depending on their position" bit is the part where the current specification does not provide an answer. |
23:26 | <Bakkot> | but, yes, that would be the smallest delta from the current specification (and is what my current PR does) |
23:27 | <msaboff> | First position IDStart and following codepoints IDContinue |
23:27 | <rbuckton> | I think we need an official "stage 1" proposal repo to investigate operator overloading in all of its various forms, and as a single place to collect the various requirements and concerns. |
23:27 | <devsnek> | https://github.com/tc39/proposal-operator-overloading |
23:28 | <drousso> | ^ thanks :) |
23:28 | <devsnek> | i'm really not a fan of that proposal though |
23:28 | <msaboff> | Bakkot I think the current spec DOES provide the answer for the non u flag NCG id case. |
23:28 | <rbuckton> | devsnek: yes and no. That proposal is currently very specific to the constructor-based overloading approach. I've already expressed concern over this approach from a static analysis/tooling perspective. |
23:29 | <devsnek> | if js had operator overloading i'd want it to be all dynamic and whatnot |
23:29 | <ljharb> | at stage 1, proposals are about solving problems |
23:29 | <Bakkot> | msaboff: it doesn't tell you what "SV of UnicodeEscapeSequence" is, so you can't tell if it satisfies "the UTF16Encoding of a code point matched by the UnicodeIDStart lexical grammar production" |
23:29 | <msaboff> | I may disagree with what it says and be sympathetic to what you want, but that is different than how I read the current spec.
23:30 | <msaboff> | What does the end of section https://tc39.es/ecma262/#sec-static-semantics-sv say to you about "SV of UnicodeEscapeSequence"?
23:31 | <Bakkot> | msaboff: sorry, my previous message should read "it doesn't tell you what the SV of RegExpUnicodeEscapeSequence is" |
23:31 | <Bakkot> | msaboff: the relevant rule is "Early Errors: RegExpIdentifierStart[U]::\RegExpUnicodeEscapeSequence[?U] It is a Syntax Error if the SV of RegExpUnicodeEscapeSequence is none of "$", or "_", or the UTF16Encoding of a code point matched by the UnicodeIDStart lexical grammar production." |
23:31 | <ystartsev> | It feels like we are being pulled off topic a bit |
23:31 | <ystartsev> | (comment regarding discussion in the video) |
23:33 | <devsnek> | mfw 1n is not equal to 1 |
23:33 | <shu> | did someone remove my queue item? |
23:33 | <devsnek> | why do they have relational equality but not absolute equality |
23:34 | <Bakkot> | devsnek: `===` does not do type coercion |
23:34 | <msaboff> | Bakkot: Back to what I said earlier, I think that is what the end of https://tc39.es/ecma262/#sec-patterns-static-semantics-character-value describes. (Even though it talks about the CharacterValue, which is a String with one code point.) |
23:34 | <devsnek> | Bakkot: i'm not saying type coercion is needed |
23:34 | <Bakkot> | msaboff: that defines CharacterValue, not SV |
23:35 | <devsnek> | 1n and 1 both exactly represent the mathematical value 1 |
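The BigInt comparison behavior being discussed, as it stands in engines with BigInt support, can be shown in a few lines. The variable names are illustrative only.

```javascript
// Strict equality never coerces, and BigInt and Number are different types.
const strictEqual = 1n === 1; // false

// Loose equality and the relational operators compare mathematical values.
const looseEqual = 1n == 1;   // true
const lessThan = 1n < 2;      // true
const lessOrEqual = 2n <= 2;  // true

console.log({ strictEqual, looseEqual, lessThan, lessOrEqual });
```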
23:35 | <Bakkot> | my point is that "SV of RegExpUnicodeEscapeSequence" is not defined |
23:35 | <Bakkot> | yes, the smallest delta would be to use CharacterValue instead; that's what my current PR does |
23:35 | <Bakkot> | but it does not, currently, use CharacterValue instead |
23:36 | <msaboff> | IMHO, SV is the obvious string with a single CharacterValue. |
23:36 | <msaboff> | If we added such a rule, would that be sufficient?
23:37 | <Bakkot> | Yes, the problem is that the absence of semantics here means that such a decision would be normative |
23:37 | <Bakkot> | which is why I brought it to committee |
23:37 | <Bakkot> | and then people had opinions about which semantics to choose |
23:39 | <Bakkot> | msaboff: although, that said, that decision would mean it was impossible to render `/(?<𝒜>.)/` as ascii |
23:39 | <msaboff> | Make your PR "the SV of RegExpUnicodeEscapeSequence is the one character string of the CharacterValue of RegExpUnicodeEscapeSequence." and make it normative. |
23:39 | <Bakkot> | which also does seem legitimately bad |
23:40 | <msaboff> | That is a separate change to the RegExp section. |
23:40 | <keith_miller> | akirose: Can we post the meeting agenda for June? I meant to put https://github.com/tc39/ecma262/pull/1912 on the agenda but I thought PRs were automatic... I don't want to forget about it lol |
23:40 | <ljharb> | keith_miller: i'll do that |
23:41 | <keith_miller> | great thanks! |
23:41 | <msaboff> | Bakkot You need to change how RegExpIdentifierStart and RegExpIdentifierPart are defined |
23:41 | <Bakkot> | yeah |
23:41 | <Bakkot> | I am reasonably sure I can spec any possible semantics here if we can agree on the semantics |
23:41 | <Bakkot> | I just want us to agree on one thing
23:42 | <robpalme> | we may finish at 17:09 if this runs for the full 30 minute slot |
23:43 | <devsnek> | 🎉 |
23:44 | <ystartsev> | ljharb: i may be interested in helping |
23:44 | <rkirsling> | how exciting |
23:45 | <devsnek> | i may be interested in commenting on the github issues |
23:46 | <benjamn> | am I correct in assuming `a ??= b ??= c` would mean `a ?? (a = (b ?? (b = c)))`? |
23:46 | <rkirsling> | devsnek: I might be interested in the complement of that |
23:46 | <devsnek> | that's how you open the js portal |
23:47 | <rkirsling> | benjamn: yep looks right |
23:47 | <rkirsling> | (just without the double-evaluations) |
23:47 | <benjamn> | rkirsling: ahh good point |
23:47 | <ystartsev> | i think it would be `a = a ?? b ?? c` ? |
23:47 | <devsnek> | no |
23:47 | <devsnek> | assignment chains return the most right-hand side |
23:48 | <devsnek> | s/return/use/ |
23:48 | <ystartsev> | ah ok |
23:48 | <devsnek> | `let _v = c; b ??= _v; a ??= _v;` |
23:48 | <jridgewell> | benjamn: I believe you are correct |
23:49 | <benjamn> | devsnek: is there a way to avoid evaluating c in that, if a or b is already defined? |
23:49 | <jridgewell> | https://babeljs.io/repl/#?browsers=&build=&builtIns=false&spec=false&loose=false&code_lz=IYAg_GC8IEblIDGQ&debug=false&forceAllTransforms=false&shippedProposals=false&circleciRepo=&evaluate=false&fileSize=false&timeTravel=false&sourceType=module&lineWrap=true&presets=stage-1&prettier=false&targets=&version=7.9.0&externalPlugins= |
23:50 | <jridgewell> | I don't believe `b` is set if `a` short-circuits |
23:50 | <devsnek> | benjamn: oh you're right it doesn |
23:50 | <devsnek> | doesn't evaluate the right hand side |
23:50 | <jridgewell> | Yah |
23:50 | <devsnek> | i forgot about that |
23:50 | <bradleymeck> | time pressure isn't good |
23:51 | <benjamn> | jridgewell: that reduces to |
23:51 | <benjamn> | (sorry premature send) |
23:52 | <jridgewell> | `a != null ? a : a = b != null ? _b : b = c;` |
23:53 | <benjamn> | yes |
23:53 | <jridgewell> | (s/_b/b/) |
23:53 | <benjamn> | if null there means nullish (including undefined) |
23:53 | <jridgewell> | Yes |
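The short-circuiting of a chained `a ??= b ??= c` can be observed by logging every read and write. This is a minimal sketch, assuming an engine that already implements logical assignment; `makeScope` is a hypothetical helper that exposes `a`, `b`, and `c` as logging accessors on an object, since plain variables cannot be instrumented this way.

```javascript
// Hypothetical helper: wrap values in accessors that record reads and writes.
function makeScope(initial) {
  const log = [];
  const store = { ...initial };
  const scope = {};
  for (const name of Object.keys(initial)) {
    Object.defineProperty(scope, name, {
      get() { log.push(`get ${name}`); return store[name]; },
      set(value) { log.push(`set ${name}`); store[name] = value; },
    });
  }
  return { scope, log, store };
}

// Case 1: `a` is already defined, so neither `b` nor `c` is touched.
const first = makeScope({ a: 1, b: undefined, c: 3 });
first.scope.a ??= first.scope.b ??= first.scope.c;
console.log(first.log); // [ 'get a' ]

// Case 2: `a` and `b` are both nullish, so `c` is read once and
// assigned to `b`, then the same value is assigned to `a`.
const second = makeScope({ a: null, b: undefined, c: 3 });
second.scope.a ??= second.scope.b ??= second.scope.c;
console.log(second.log); // [ 'get a', 'get b', 'get c', 'set b', 'set a' ]
```

Each operand is read at most once, which is the "without the double-evaluations" point made above about the hand-written expansion.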
23:53 | <ljharb> | ystartsev: awesome, thanks! i'll post an issue shortly and tag you |
23:53 | <ystartsev> | ljharb: great thanks |
23:54 | <devsnek> | https://engine262.js.org/#gist=b972418f6ec2e52a7c9c711eda60a446 |
23:55 | <benjamn> | devsnek: cool that's what I would hope |
23:55 | <benjamn> | chained logical assignment seems entirely… logical |
23:55 | <ljharb> | keith_miller: june agenda is pushed, feel free to commit directly to it |
23:55 | <devsnek> | except when the assignment target is const /s |
23:55 | <rkirsling> | oh shit that didn't get brought up did it |
23:56 | <rkirsling> | or did I just look away and miss it? |
23:56 | <devsnek> | i don't remember it happening |
23:56 | <ljharb> | rkirsling: it didn't, no |
23:56 | <rkirsling> | 😬 |
23:56 | <rkirsling> | like |
23:56 | <rkirsling> | I don't expect it to change consensus |
23:56 | <rkirsling> | but it deserves public mention |
23:57 | <rkirsling> | I should've thought to bring it up myself |
23:57 | <drousso> | ...oops |
23:57 | <drousso> | yeah same |
23:57 | <littledan> | if we don't ship a spec, the latest spec will still have the bug |
23:57 | <ljharb> | littledan: put that on the queue |
23:57 | <rkirsling> | ^ |
23:57 | <devsnek> | ^ |
23:58 | <littledan> | eh I'm fine just leaving it here; the day's almost over |