2024-08-02 [12:52:03.0078] Congrats @stephenbelanger:matrix.org: https://github.com/nodejs/node/pull/48528#issuecomment-2266038177 [12:52:45.0704] Heh, that was quick. 😅 [12:52:46.0277] Last perf improvements were https://github.com/nodejs/node/pull/48528#issuecomment-2253643765 [13:02:17.0080] - `.get()` is ~50% faster regardless of the number of async resources - `.get()` is ~15% faster regardless of the number of nested `.run()` calls - Creating new async resources/promises is 50% to 1400% faster depending on number of resources - It’s 2%-14% slower when no `Variable`s are in use? [13:02:39.0611] * 1. `.get()` is ~50% faster regardless of the number of async resources 2. `.get()` is ~15% faster regardless of the number of nested `.run()` calls 3. Creating new async resources/promises is 50% to 1400% faster depending on number of resources - It’s 2%-14% slower when no `Variable`s are in use? [13:03:01.0492] Is there a test for speed of the `.run()` itself? [13:22:55.0516] The degradation when not in use is because of the infectious nature of AsyncResource. I plan on poking at that a bit to see what I can do to improve the performance there, but given that apps without _any_ variables don't actually _exist_ in-practice it's not a _huge_ concern. [13:24:37.0446] And no, I don't think there's a run-only benchmark. I want to build out the benchmarks a bit more around it. Because it was layered over async_hooks forever, and _that_ was always a pile of hacks, it never really got much serious consideration when it came to performance, stability, correctness, etc. I'm hoping to change that with the move off of an async_hooks core. 2024-08-03 [07:58:46.0325] Great work Stephen! [13:49:38.0881] got the same fun happening in deno as well. 
it was nice to see after landing the optimizations in V8 that we can measure the overhead in cpu cycles now 😄 https://github.com/denoland/deno/commit/3a1a1cc030fb7fc90d51ee27162466d6ac924926 [13:50:24.0145] hopefully this proposal can kill that last bit of O(n) complexity 2024-08-04 [10:22:10.0203] > <@devsnek:matrix.org> hopefully this proposal can kill that last bit of O(n) complexity Which one? 2024-08-05 [05:47:32.0243] littledan: in the node impl its `super(AsyncContextFrame.current())` and in the deno impl its `{...previousContextMapping}`. There's no efficient copy-on-write structure available in js like there is in native code [07:00:34.0576] > <@devsnek:matrix.org> littledan: in the node impl its `super(AsyncContextFrame.current())` and in the deno impl its `{...previousContextMapping}`. There's no efficient copy-on-write structure available in js like there is in native code it should be possible to implement a HAMT in JS, shouldn't it? (hash based on an incrementing number that each Variable has) [07:00:57.0671] or, equally, the simple linked list model (good enough sometimes--all web frameworks do this) [08:35:08.0026] A linked list of contexts is what I was going for originally, but it makes things uncollectable even if they’ve been replaced in the store because they are held further up in the linked list. Needed to do the map clone approach for reasonable memory characteristics. [08:35:33.0212] you can use a HAMT for this (the engine doesn't have special powers that JS doesn't have) [08:35:44.0339] https://en.wikipedia.org/wiki/Hash_array_mapped_trie [08:36:09.0083] in particular a "persistent" one [08:36:21.0567] I think engines would *not* implement this with a linked list plus magical GC 2024-08-07 [11:55:41.0660] Steve Hicks: You previously mentioned that with a single property name on events, developers would have to look up which would be the source of that snapshot for each event. 
If most events with async sources don't propagate the context but some do, wouldn't developers also have to look up when the context is propagated? [11:55:48.0883] and I suspect that would be a lot harder to google [12:55:36.0445] > <@abotella:igalia.com> Steve Hicks: You previously mentioned that with a single property name on events, developers would have to look up which would be the source of that snapshot for each event. If most events with async sources don't propagate the context but some do, wouldn't developers also have to look up when the context is propagated? That's a fair point. I really do prefer the consistency, but ultimately I think getting the right default as much as possible is a more important trade-off. I think in order to reason much further about this, we need two things: (1) a more concrete list of events and what context they will run in, and (2) a better understanding of whether the "no context exists" case falls back on the top-level (empty) context or else the registration-time context. What's the best way to collaborate on #1? [12:59:08.0269] last time I checked there were 250+ event names in the web platform (which is not the same as distinct events, since e.g. the `error` event on window is very different from the `error` event on say `WebSocket`) [13:00:00.0675] only those that have async sources matter here, but I don't think there's a good way to get the full list short of analyzing every single one [13:00:17.0630] * only those that have async sources matter here, but I don't think there's a good way to get the full list of those, short of analyzing every single one [13:01:21.0902] although maybe there's a way to analyze e.g. 
chromium code to get a partial list of events that are guaranteed to have async sources [13:01:55.0646] I can ask my internal chrome contacts to see if they've got any pointers [13:03:18.0820] one thing nicolo-ribaudo pointed out is that we could try to reach out to JS educators, maybe giving some example APIs for libraries that would be using AsyncContext, and let developers tell us which events they'd use those libraries with [13:03:46.0662] since we have a selection bias in that first-party developers won't be engaging because they will not be using AsyncContext directly most of the time [13:04:56.0175] that's an interesting approach - I'm a little unsure of what it would mean to use an event with a library, though. [13:05:31.0008] I imagine a tracing or DI library is generally pretty orthogonal from the event system, so it more comes down to just what events your application uses at all [13:07:04.0390] I'm not familiar with DI systems at all, so I'll have to trust you on that [13:07:24.0247] by DI here, it's referring to the same kind of thing as React Context [13:07:44.0639] it'll be good for us all to become familiar with these things, as they're a potentially important use of AsyncContext [13:08:08.0659] an example of a framework that uses this DI terminology is Vue https://vuejs.org/guide/components/provide-inject [13:11:36.0005] Well, I think there's a lot more to DI than what react context does (or else I may have a too-limited view of what react context is about, since my primary experience is just to avoid prop drilling). For instance, it's typically used to inject top-level/singleton services, like schedulers, RPC clients, or data stores. My experience with DI is mostly on the server via Guice (in Java) but in that context, it's a pretty different flavor from drilling props. A lot of the motivation tends to be looser coupling to support testing and late-loading. 
[13:13:32.0837] Andreu Botella: you mentioned 250+ event names, but presumably this also doesn't include (e.g.) custom element lifecycle callbacks, mutation/intersection observers, etc. [13:14:47.0036] for custom element callbacks, as far as I'm aware they're always either triggered synchronously from JS code, or caused by a user or browser event, so I don't think there are any possible async sources [13:16:09.0011] for observers, most of them bunch multiple observations, so calling the callback with only one of them would be a merge [13:17:54.0559] in the document I describe exposing the snapshot for each of them in the observation object, if it would be useful [13:18:08.0872] Chengzhong Wu pointed out that for `PerformanceObserver` they would be [13:19:55.0015] Yeah, resource timing in fetch is only available from `PerformanceObserver` so it is useful to get each resource timing event's relevant context snapshot [13:33:03.0545] another important question: if we use the top-level context for user/browser-sourced events, would users of AsyncContext need a way to tell whether it is the top-level context? or would every possible use only care about its own variables, and it could check whether they have the default values? [13:35:42.0496] I'm thinking of userland schedulers, which would have no variables, they would just store a snapshot – is there any use case where it would make sense to do that inside an event listener, and where the scheduler would want to replace the snapshot if it's empty? [13:41:55.0631] > <@abotella:igalia.com> for custom element callbacks, as far as I'm aware they're always either triggered synchronously from JS code, or caused by a user or browser event, so I don't think there are any possible async sources One async source I'm aware of is when a contenteditable element is deleted by user interaction. 
[13:42:58.0192] there's never good terminology for this stuff 😮‍💨 [13:43:22.0323] when I say async source, I'm discounting events caused by user or browser interactions [13:44:09.0314] Ah, I thought "user" meant "developer". So you're not counting a UI click initiating a click event as an async source? [13:47:31.0679] if we go with not using registration-time always, then we'd have one consistent behavior for events fired synchronously from the action of some JS code, and another consistent behavior (whether top-level context or registration context) for events that were not fired due to the action of some JS code [13:48:06.0293] what I'm concerned with is cases where you have JS code starting an async operation in the browser that eventually fires an event [13:48:26.0835] since those cases would have to be handled manually, and it's unfeasible to handle every single one of them in the initial rollout [13:48:41.0811] * since the spec and browser changes for those cases would have to be handled manually, and it's unfeasible to handle every single one of them in the initial rollout [13:48:48.0125] * since the spec and browser changes for those events would have to be handled manually, and it's unfeasible to handle every single one of them in the initial rollout [13:49:54.0790] Do you have an example of "JS code starting an async operation in the browser that eventually fires an event"? Are you thinking of e.g. XHR? 
[13:50:20.0778] XHR is a good and common example, which is why I keep using it, but there are many others [13:51:00.0527] a lot of those are obscure though [13:52:52.0408] * if we go with not using registration-time always, then we'd have one consistent behavior for events fired synchronously from the action of some JS code, and another consistent behavior (whether top-level context or registration context) for events that were not fired due to the action of some JS code in the same agent [13:53:05.0115] I'm not optimistic, but is there any way to hedge here? When we're rolling out userland APIs internally, we try to lock things down as much as possible at the outset so that not-yet-defined behavior is an early error. I'm not coming up with any good way to fail fast if anyone were to try to rely on a specific choice of context in the interim before we figure out what the correct default is. [13:53:54.0044] > <@stephenhicks:matrix.org> Well, I think there's a lot more to DI than what react context does (or else I may have a too-limited view of what react context is about, since my primary experience is just to avoid prop drilling). For instance, it's typically used to inject top-level/singleton services, like schedulers, RPC clients, or data stores. My experience with DI is mostly on the server via Guice (in Java) but in that context, it's a pretty different flavor from drilling props. A lot of the motivation tends to be looser coupling to support testing and late-loading. I think React Context is used for a lot of things (certainly data stores, not 100% sure about the others) [13:55:35.0049] > <@stephenhicks:matrix.org> I'm not optimistic, but is there any way to hedge here? When we're rolling out userland APIs internally, we try to lock things down as much as possible at the outset so that not-yet-defined behavior is an early error. 
I'm not coming up with any good way to fail fast if anyone were to try to rely on a specific choice of context in the interim before we figure out what the correct default is. Yeah, I can't come up with a way either [13:57:42.0057] The sledgehammer approach would just be to make `variable.get()` throw until a new snapshot is restored, but I don't see that flying. [13:58:17.0248] (and even then, people end up depending on errors, etc... web compat is a pain) [13:59:04.0545] The only future proof default is no context [13:59:20.0954] Provide both originating and registration via properties [14:02:51.0478] I don't think that satisfies the goal that AC-unaware user code does the right thing by default. And it doesn't really leave the door open to providing a different context later (e.g. we have a `beginTrace` API that starts a new trace if there's not one currently active, or opens a child span if there is). But changing an event from no-context to registration context (for example) ends up changing this behavior, which isn't good. [14:46:23.0137] Backing up a bit, I wonder how many events are actually ambiguous. I suspect that the majority will be pretty clear. It's a lot of upfront work, but if we can narrow down the list of events that we really don't know what to do with, that might make it easier to talk about. Is something like https://www.w3.org/wiki/List_of_events a good place to start? 
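The `beginTrace` concern above can be made concrete with a rough sketch. It assumes Node.js, with `AsyncLocalStorage` standing in for an `AsyncContext.Variable` and `AsyncResource.bind` standing in for "run the listener in its registration context"; the `beginTrace` function here is a hypothetical reimplementation for illustration, not the internal API mentioned in the chat.

```javascript
const { AsyncLocalStorage, AsyncResource } = require("node:async_hooks");

// Stand-in for an AsyncContext.Variable holding the active span.
const currentSpan = new AsyncLocalStorage();
let nextId = 1;

// Hypothetical beginTrace: starts a root trace when no span is active,
// otherwise opens a child span of the active one.
function beginTrace(name, fn) {
  const parent = currentSpan.getStore();
  const span = { id: nextId++, name, parentId: parent ? parent.id : null };
  return currentSpan.run(span, () => fn(span));
}

const target = new EventTarget();
const results = {};

beginTrace("bootstrap", () => {
  // Plain listener: runs in whatever context dispatchEvent() is called from.
  target.addEventListener("ping", () => {
    beginTrace("handler", (span) => { results.dispatch = span.parentId; });
  });
  // Bound listener: emulates dispatching in the registration context.
  target.addEventListener("ping", AsyncResource.bind(() => {
    beginTrace("handler", (span) => { results.registration = span.parentId; });
  }));
});

// Dispatched outside any trace, long after "bootstrap" has finished.
target.dispatchEvent(new Event("ping"));
console.log(results);
```

Under these assumptions the plain listener starts a new root trace, while the bound listener silently becomes a child of the bootstrap span, which is the behavior change being described.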
[14:51:35.0813] that looks very outdated, but it might be a place to start [14:52:12.0049] the definitive list of events is probably https://github.com/w3c/webref/tree/main/ed/events, scraped directly from web specs [14:52:48.0921] although not all of those are specs that are far enough in the process to be implementable in browsers, let alone implemented [14:53:09.0811] * although not all of those are specs that are far along in the process to be implementable in browsers, let alone implemented [14:58:57.0796] I have a buggy script running around that can take webref data and filter it based on MDN's compat data, I can work on that tomorrow if you think it would be useful [14:59:22.0360] * I have a buggy script running around that can take webref data and filter it based on MDN's compat data, I can pull it out and try to make it work on events tomorrow if you think it would be useful [15:00:19.0321] * I have a buggy script somewhere that can take webref data and filter it based on MDN's compat data, I can pull it out and try to make it work on events tomorrow if you think it would be useful 2024-08-08 [17:39:47.0699] that sounds useful, I was about to try to put something like that together myself [11:19:07.0742] I'm looking at chromium event dispatch code, and it seems like there are a number of events (especially dealing with navigation, cross-window communication and similar things) that are synchronous if the window is active, but not if it's in the back/forward cache [11:21:32.0793] That seems consistent with click, where it can be triggered either programmatically or by the user interface. Active vs fallback context should be fine? 
[11:22:44.0302] could be [11:23:29.0475] for the `message` event, though, chromium is already propagating the context for task attribution [11:31:24.0497] huh, it seems like for at least some of these events with an async source, the way chromium dispatches them would make it easy to propagate the context in the implementation [11:32:18.0886] but I don't know how that translates to other browsers, and that definitely does not translate to the specs 2024-08-12 [10:54:04.0407] So as I mentioned earlier, Chromium has a way to track what scheduled some async tasks (seems to be used for devtools and for ad tracking), so I made a patch that prints out the type of event, and for async sources some string related to the scheduler, for each event dispatch [10:54:14.0603] https://chromium-review.googlesource.com/c/chromium/src/+/5774262 [10:55:43.0715] maybe there's a way to turn that into telemetry that we could get, but the timeline on getting that will be months [10:56:15.0663] I also don't expect that the async task tracking covers every event with an async source [10:56:32.0252] * I also don't expect that this async task tracking covers every event with an async source, so we might not get a full list of such events [10:56:54.0653] but I might run a number of Blink web tests overnight and see what results we get 2024-08-14 [17:58:23.0401] I put together a spreadsheet from the @webref/events github: https://docs.google.com/spreadsheets/d/1r-IjEyTEuCzQtJgSyY-a5htrQFME9RPylkv3hT_puqg/edit?gid=1971388798#gid=1971388798 [19:09:51.0412] So far a handful of cases seem to maybe want to snapshot the context that was active when a task is queued - e.g. `securitypolicyviolation` is fired _asynchronously_ after JS code programmatically does something that violates CSP - this seems basically impossible to polyfill, for what that's worth. 
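As a rough illustration of "snapshot the context that was active when a task is queued", here is a sketch assuming Node.js, with `AsyncLocalStorage` in place of an `AsyncContext.Variable` and an `AsyncResource` in place of a snapshot. The `queueViolation` and `drain` functions are hypothetical stand-ins for engine internals. A userland polyfill has no hook at the point where the browser queues the event, which is why this case is hard to polyfill.

```javascript
const { AsyncLocalStorage, AsyncResource } = require("node:async_hooks");

const task = new AsyncLocalStorage();
const target = new EventTarget();
const pending = [];

// Hypothetical stand-in for browser internals queuing an event:
// the AsyncResource captures the context active at queue time.
function queueViolation(detail) {
  pending.push({ detail, resource: new AsyncResource("QueuedViolation") });
}

// Hypothetical stand-in for the event loop firing the queued events later,
// restoring each event's queue-time context around its dispatch.
function drain() {
  for (const { detail, resource } of pending.splice(0)) {
    const event = new Event("violation");
    event.detail = detail;
    resource.runInAsyncScope(() => target.dispatchEvent(event));
  }
}

const observed = [];
target.addEventListener("violation", (e) => {
  observed.push([e.detail, task.getStore()]);
});

task.run("task-A", () => queueViolation("inline-script"));
task.run("task-B", () => queueViolation("eval"));
drain(); // called from the top level, outside any task
console.log(observed);
```

Each listener call sees the context of the code that caused the violation, even though `drain()` itself runs with no context.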
[10:17:36.0915] what do you mean by "preserve" here? [10:18:08.0579] maybe we should maintain a "rationale" column in this doc [10:20:40.0608] sounds good, and i can also add anyone else as an editor if you ping me what account to add 2024-08-19 [13:56:05.0312] I was talking with our tracing folks today (both from the tracing side and the web frameworks side) and had broad agreement that they generally just want event handlers to do as little as possible in terms of snapshot/restore. I told them about my concerns with polyfilling these "async causes" (i.e. programmatic JS that triggers an event to be queued for dispatch in a future microtask) and the consensus was that, for any events we tend to care about, it's basically always more accurate to run listeners in the root/empty context rather than the registration context, and not to worry about a more-accurate programmatic causal context that we can't get access to. [13:58:04.0553] But with our current polyfill (which dispatches events in the registration context), they're running into real problems where event listeners are registered in the "application bootstrap context" and they really want stronger guarantees that that context will completely disappear when bootstrapping is complete. [14:05:10.0504] people have this intuition of "do as little as possible" or "disappear" but I don't think they mean the null context, I think they mean some sort of causal context, which we've discussed the difficulty of defining. Did you raise this difficulty with the requesting teams? [14:05:45.0309] or maybe you're saying, they are happy with the null context? [14:07:41.0368] > <@stephenhicks:matrix.org> But with our current polyfill (which dispatches events in the registration context), they're running into real problems where event listeners are registered in the "application bootstrap context" and they really want stronger guarantees that that context will completely disappear when bootstrapping is complete. 
This is excellent to know that you're really trying this in a polyfill and it's really not working out. But can you explain a little more about how this comes up mechanically, within a system that already does event delegation? [14:08:11.0341] * [EDIT: Sorry I misread what you wrote above, you already did address this] people have this intuition of "do as little as possible" or "disappear" but I don't think they mean the null context, I think they mean some sort of causal context, which we've discussed the difficulty of defining. Did you raise this difficulty with the requesting teams? [14:18:07.0661] If we do that, and at some later point we realize there's some existing event with an async source that needs the async dispatch context to be propagated, that won't be able to be changed [14:18:23.0293] we wouldn't even be able to have use counters [14:19:37.0730] what do you mean by "use counters"? [14:19:43.0371] > <@stephenhicks:matrix.org> But with our current polyfill (which dispatches events in the registration context), they're running into real problems where event listeners are registered in the "application bootstrap context" and they really want stronger guarantees that that context will completely disappear when bootstrapping is complete. one thing to note is that, if you have a particular variable (like the trace/span id) that you always want to be a particular value (like null), you can set that value explicitly in your bootstrapping code. I'm really curious why this kind of thing didn't work for you [14:19:53.0949] > <@stephenhicks:matrix.org> what do you mean by "use counters"? this is about assessing the web compatibility of later changes in semantics [14:20:02.0269] I'm not sure I'm saying that async events never dispatch with a causal context. [14:22:07.0347] Chromium and Firefox (probably also WebKit, but I don't know that for sure) have telemetry where, if a page uses some combination of features, that usage gets recorded, to assess e.g. 
how likely is a change in semantics to affect websites [14:24:02.0670] you could have use counters for uses of `AsyncContext.Variable.prototype.get()` with the null context, or similar, but you can't track cases like
```js
foo.addEventListener("bar", () => {
  asyncVar.run("baz", () => {
    someUnrelatedAsyncVar.get();
  });
});
```
[14:24:31.0828] hm... [14:24:50.0145] I guess it's not impossible to track, but it might add implementation complexity [14:26:09.0042] okay, you could track all uses of contexts derived from an async event without a lot of complexity, but not from a specific async event [14:26:27.0395] Andreu, I think you're jumping ahead a few steps too many when you're talking about this future compatibility risk; let's focus on figuring out their needs first [14:27:41.0037] Steve Hicks: Are we talking about the span id/trace id? Is this just one variable? Could you say more about how/when it's initialized, and how "application bootstrap time" becomes misleading compared to being identifiably "null"? [14:28:29.0183] Yes, the place where this is coming up is with the root trace, where the bootstrap runs in a bootstrap trace, and then event handlers look for a missing trace to know to start a new one (rather than continue an existing one). It might be possible to explicitly zero out the trace when registering listeners - I've asked whether that's viable. [14:30:03.0186] > <@stephenhicks:matrix.org> But with our current polyfill (which dispatches events in the registration context), they're running into real problems where event listeners are registered in the "application bootstrap context" and they really want stronger guarantees that that context will completely disappear when bootstrapping is complete. This sounds like the tail wagging the dog. They want to ensure that their bootstrap context is GC’d, and think events not capturing context will ensure that, but that’s just not the case. What happens when they have a pending promise or `setInterval`? 
I think null-context is the least useful choice for users. It’s simple enough for them to clear the bootstrap context before adding event listeners, but we shouldn’t force everyone else to wrap their event handlers because of this case. [14:31:08.0004] > <@stephenhicks:matrix.org> I'm not sure I'm saying that async events never dispatch with a causal context. The GC use case prevents us from ever making a causal context event. [14:31:23.0704] * The GC use case prevents us from ever using a causal context in events. [14:31:37.0094] Registration context is simply wrong for many, many events. There may be a more useful causal context that's available, and we should use it when possible, even if we can't polyfill it. [14:31:39.0577] > <@stephenhicks:matrix.org> Yes, the place where this is coming up is with the root trace, where the bootstrap runs in a bootstrap trace, and then event handlers look for a missing trace to know to start a new one (rather than continue an existing one). It might be possible to explicitly zero out the trace when registering listeners - I've asked whether that's viable. ah, OK, they want a bootstrap trace, and then they want event handlers to run in a null trace.... could we address this by loading a null trace snapshot right around addEventListener, narrowly? [14:32:08.0905] or is it that they really do want a causal trace? [14:32:43.0751] My understanding is that the code running `addEventListener` is largely out of our control - third party libraries and whatnot [14:33:18.0971] maybe patch addEventListener then? [14:33:59.0177] > <@stephenhicks:matrix.org> Registration context is simply wrong for many, many events. There may be a more useful causal context that's available, and we should use it when possible, even if we can't polyfill it. Sure, I see good arguments for causal context. But I don’t think there are any good ones for null context. [14:34:01.0597] > <@littledan:matrix.org> maybe patch addEventListener then? 
That way lies madness, let me tell you... [14:34:13.0351] > <@stephenhicks:matrix.org> Registration context is simply wrong for many, many events. There may be a more useful causal context that's available, and we should use it when possible, even if we can't polyfill it. concrete examples of cases where it is wrong is extremely helpful! Now we have one with tracing and bootstrap contexts. If we can assemble more, it'll be very useful. [14:34:44.0076] I think my point was that when causal context is unavailable, empty context is the best alternative - both for lossy polyfills and for standards [14:34:44.0456] > <@stephenhicks:matrix.org> Registration context is simply wrong for many, many events. There may be a more useful causal context that's available, and we should use it when possible, even if we can't polyfill it. * concrete examples of cases where it is wrong is extremely helpful! Now we have one with bootstrap traces. If we can assemble more, it'll be very useful. [14:35:28.0119] sure, I guess I'm just trying to understand the details of their requirements, and then we can think of the various possible ways to meet them [14:35:35.0830] rather than reaching a "point" yet [14:36:31.0201] A new piece of information for me today was that there's a desire to trace how long bootstrap takes, and so this is what makes the registration context wrong. I did not understand that phenomenon before this conversation. [14:36:47.0854] so this is very helpful [14:36:48.0625] > <@stephenhicks:matrix.org> That way lies madness, let me tell you... In particular, I've been patching addEventListener for my polyfill, and it's super subtle. [14:37:20.0655] > <@stephenhicks:matrix.org> That way lies madness, let me tell you... * In particular, I've been patching addEventListener for my polyfill, and it's super subtle. 
Also, short-term patches are nonviable because removeEventListener [14:37:40.0491] > <@stephenhicks:matrix.org> In particular, I've been patching addEventListener for my polyfill, and it's super subtle. Also, short-term patches are nonviable because removeEventListener yeah, I can imagine... I'm just trying to understand the constraints, and curious what kinds of third party libraries you're trying to trace through [14:37:44.0466] * In particular, I've been patching addEventListener for my polyfill, and it's super subtle. Also, short-term patches are nonviable because removeEventListener. So I think we need to put that out of our heads as a remotely viable suggestion. [14:37:53.0905] and what kind of flexibility you have about how you use them [14:38:24.0142] d3 and tanstack router were two that were named [14:39:09.0998] and it's not so much that we want to "trace through" them, so much as that they're involved in the app and all it takes is one library not doing it right to mess everything up - this is why the right defaults are critical [14:43:01.0453] Node.js _never_ patches events, and this is very intentional. Event emitters themselves are not async, they inherit asynchrony from what publishes to them. Therefore it’s actually that _triggering_ thing you need to propagate context to and leave event emitters alone to just continue in the context the dispatch happens in. [14:43:46.0851] In web platform that triggering thing can be internals, so that needs to be taken into consideration. [14:44:14.0942] But it should be treated as a propagation of the web platform thing, not a propagation of event emitters. [14:45:04.0115] Which means event emitters should not be expected to have “consistent” behaviour, because the conditions in which they are triggered is not consistent. 
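The point about Node.js event emitters continuing in the context of whatever publishes to them can be checked with a small sketch. It uses only Node's `EventEmitter` and `AsyncLocalStorage`; the variable names are illustrative.

```javascript
const { AsyncLocalStorage } = require("node:async_hooks");
const { EventEmitter } = require("node:events");

const requestId = new AsyncLocalStorage();
const emitter = new EventEmitter();
const seen = [];

// The listener is registered while "bootstrap" is the active store.
requestId.run("bootstrap", () => {
  emitter.on("data", () => seen.push(requestId.getStore()));
});

// emit() invokes listeners synchronously, so each call observes the
// context of the code that called emit(), not the registration context.
requestId.run("request-1", () => emitter.emit("data"));
emitter.emit("data"); // emitted outside any run(): no store is active

console.log(seen);
```

The "bootstrap" value never reaches the listener, because nothing in `EventEmitter` captures context at registration time.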
[14:45:10.0964] propagating the context through everything async in the web platform is simply not feasible at this point in time, which is why I'm talking about async sources for events (which in your terms is async internals that eventually triggers the event) [14:45:46.0881] that way we can focus only on the async internals that are observable [14:46:20.0888] It doesn’t need to propagate fully _through_ it necessarily. Only to the extent that is observable from JavaScript. (At least at this point, anyway…) [14:47:06.0928] I _do_ want to have it actually accessible _everywhere_ though as I would _really_ like for profiler samples to capture current context state. [14:47:31.0540] But, as you say, that’s a large effort. [14:47:34.0768] so, how would it be if EventTargets had a snapshot associated with them (say, the default for the whole document) and ran all of their events there? [14:47:47.0612] * so, how would it be if EventTargets had a snapshot associated with them (say, the default for the whole document, so the global one) and ran all of their events there? [14:48:21.0778] and if there's another relevant snapshot for an event, we pass that in a property [14:49:01.0202] we have to define the web semantics completely, we can't say "whatever falls out" [14:49:34.0844] Binding to the initialization of an EventTarget? I can see how that could be useful for resource attribution, but would not work for the application tracing case without needing to rebind everything to the execution flow path (aka through flow) [14:50:33.0188] there are two problems we're discussing: the one Steven H raised (avoid misattribution to the bootstrap phase) and this one you're raising (do all the tracing causally) [14:51:48.0632] I think they’re related problems. [14:52:13.0288] The attribution depends on the flow you’re expecting. 
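The idea floated above (an EventTarget carrying one snapshot in which all of its listeners run, with any other relevant snapshot exposed as a property on the event) might look roughly like this. This is only a sketch, assuming Node.js, with `AsyncResource` standing in for `AsyncContext.Snapshot`; the `SnapshotTarget` class and the `dispatchSnapshot` property are hypothetical names, not part of any proposal text.

```javascript
const { AsyncLocalStorage, AsyncResource } = require("node:async_hooks");

const label = new AsyncLocalStorage();

// Hypothetical EventTarget that runs every listener in the context that
// was active when the target was constructed.
class SnapshotTarget extends EventTarget {
  #home = new AsyncResource("SnapshotTargetHome");

  dispatchEvent(event) {
    // Expose the dispatch-time context as a property instead of making
    // it the ambient context for listeners.
    event.dispatchSnapshot = new AsyncResource("DispatchSnapshot");
    return this.#home.runInAsyncScope(() => super.dispatchEvent(event));
  }
}

const target = label.run("document-default", () => new SnapshotTarget());
const seen = {};

target.addEventListener("load", (event) => {
  seen.listener = label.getStore();
  // Opting in to the other snapshot is explicit.
  seen.viaProperty = event.dispatchSnapshot.runInAsyncScope(() => label.getStore());
});

label.run("fetch-task", () => target.dispatchEvent(new Event("load")));
console.log(seen);
```

Here the listener sees "document-default" ambiently and can reach "fetch-task" only through the event property, which shows the trade-off being debated: consistent defaults, at the cost of causal context being opt-in.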
[14:52:14.0858] sure, the perfect solution to everything also solves the easier version that Steven described where it's OK for us to cheat and use the null context [14:53:08.0025] > <@stephenbelanger:matrix.org> But, as you say, that’s a large effort. Even having all events with an internal async source propagate the causal context won't be easy, and we have already gotten pushback (from Domenic Denicola) saying that that will be too burdensome for folks working on web specs [14:53:36.0539] > <@abotella:igalia.com> Even having all events with an internal async source propagate the causal context won't be easy, and we have already gotten pushback (from Domenic Denicola) saying that that will be too burdensome for folks working on web specs yes, we've gotten this pushback from Anne van Kesteren as well. Together, they represent Chrome and Safari. [14:53:40.0858] Attributing something to boot or EventTarget creation implies you are interested in viewing things from the perspective of resource ownership. Through or causal flow is a more direct line to what triggered the thing. [14:54:09.0080] > <@stephenbelanger:matrix.org> Attributing something to boot or EventTarget creation implies you are interested in viewing things from the perspective of resource ownership. Through or causal flow is a more direct line to what triggered the thing. sorry, I meant, just giving up 100% and saying, it's the global default snapshot [14:54:54.0900] Stephen Belanger: What do you want us to do when you bring up, browsers should do causal flow, when Andreu is reporting that we have pushback from browsers on this, and you also say you don't actually care much about the browser case? 
[14:55:26.0744] I think the right behavior is causal flow, but it seems unfeasible [14:55:35.0316] I think “giving up” is a valid decision given the complexity, though needs to be very clearly encoded and explained as “This is not the _correct_ decision, this is just what we have _now_ because solving this case is hard” [14:56:10.0523] Meaning, not painting ourselves into a corner with expecting that flow forever. [14:56:27.0271] > <@stephenbelanger:matrix.org> I think “giving up” is a valid decision given the complexity, though needs to be very clearly encoded and explained as “This is not the _correct_ decision, this is just what we have _now_ because solving this case is hard” Yes, but the problem with that is that the web is not versioned, so if programs start to rely on the behavior *now*, that behavior will be locked forever [14:56:46.0758] and it's even going to be hard to *detect* whether programs rely on that behavior [14:56:55.0296] I don't think there are any good solutions [14:57:11.0090] right, I don't see a way to avoid "painting ourselves into a corner" [14:57:20.0379] we have to choose what the semantics are [14:57:39.0795] > <@littledan:matrix.org> Stephen Belanger: What do you want us to do when you bring up, browsers should do causal flow, when Andreu is reporting that we have pushback from browsers on this, and you also say you don't actually care much about the browser case? I mean _I_ care about the browser case. APM vendors maybe _less_ so, but it’s a space we’re gradually starting to need to care about. [14:57:54.0897] > <@stephenbelanger:matrix.org> I mean _I_ care about the browser case. APM vendors maybe _less_ so, but it’s a space we’re gradually starting to need to care about. 
OK sorry ignore the last clause and just focus on the first two [14:58:31.0821] It’s more just that servers have very different execution patterns from browsers, and if you define something _only_ considering browser runtime semantics then you make it unusable for servers. [14:58:56.0677] maybe you could be more concrete about a server case that you're worried about us defining the wrong semantics for? [15:00:19.0861] > <@abotella:igalia.com> Yes, but the problem with that is that the web is not versioned, so if programs start to rely on the behavior *now*, that behavior will be locked forever Yes, that’s my concern with the “giving up” approach. If we can be sure that one would never _expect_ an empty context, but it could be present in some scenarios, then it could be seen as just an absence of support for that scenario. But if people begin to _expect_ an empty context in that case then it becomes contract. [15:02:26.0909] > <@littledan:matrix.org> maybe you could be more concrete about a server case that you're worried about us defining the wrong semantics for? I’ve stated it before, as has Matteo. Users expect things to persist into _temporally_ continuing code, not just call-recursive code. AsyncContext as it is now gets the call-recursive flow fine, but doesn’t flow _out_ to get the temporally continuous context. [15:03:31.0839] > <@stephenbelanger:matrix.org> I’ve stated it before, as has Matteo. Users expect things to persist into _temporally_ continuing code, not just call-recursive code. AsyncContext as it is now gets the call-recursive flow fine, but doesn’t flow _out_ to get the temporally continuous context. 
I know you and Matteo have stated this broad goal, but I think what we need to do is collect more concrete cases of variables and callsites where it's important that this handling be done [15:03:40.0409] Like we need to be able to create a span in a mysql query call, have that span flow out to the function that called it, and become the parent of the next span created in logically continuing code. [15:03:57.0624] > <@stephenbelanger:matrix.org> Yes, that’s my concern with the “giving up” approach. If we can be sure that one would never _expect_ an empty context, but it could be present in some scenarios, then it could be seen as just an absence of support for that scenario. But if people begin to _expect_ an empty context in that case then it becomes contract. There are at least two cases (XHR and same-window `postMessage`) where things internal to Chrome need the context to be propagated asynchronously. So it wouldn't be using the empty context for all events with an async source. But that's probably not enough for developers to not begin to expect an empty context in all other cases [15:04:24.0145] > <@stephenbelanger:matrix.org> Like we need to be able to create a span in a mysql query call, have that span flow out to the function that called it, and become the parent of the next span created in logically continuing code. Can this be done by mutating a span object that's held in an asynccontext variable? I thought that's what OTel already does [15:05:08.0716] No, because you lose the causality if you don’t actually have causal flow. [15:05:09.0322] > <@littledan:matrix.org> Can this be done by mutating a span object that's held in an asynccontext variable? I thought that's what OTel already does This is also what we're doing with our internal tracing framework [15:05:29.0648] > <@stephenbelanger:matrix.org> No, because you lose the causality if you don’t actually have causal flow. can you elaborate on what you mean here? 
[15:05:44.0401] I’ve provided examples before: if you, for example, do a Promise.all(…), you have no way to know which branch you’re trying to merge back to. [15:06:11.0289] You’d have two branches trying to write to the same span and it would break. [15:06:32.0918] but... wasn't your solution to choose arbitrarily among them anyway? [15:06:39.0109] > <@stephenhicks:matrix.org> This is also what we're doing with our internal tracing framework And we prefer it over the flow-through model because of the lexical guarantees against context being nuked by a bad actor in the middle. [15:07:51.0202] You can only nuke flow-through if you mess with the context through a global graph, which it’s designed to generally not do. [15:08:12.0071] And you can cause the same problems with around flow right now. [15:08:38.0348] We’re doing exactly that right now with AsyncResource in Node.js messing with a global graph and causing problems for people. [15:09:08.0346] Around flow is not inherently safer. [15:09:41.0395] It’s actually _less_ safe because you frequently need to bind out of the paths it gives you because most are only conditionally useful.
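
The "two branches writing to the same span" concern can be sketched roughly as follows. This is a toy, synchronous model and not the proposal's implementation: `Variable` is a minimal stand-in for `AsyncContext.Variable`, while `spanHolder`, `startQuery`, and `trace` are hypothetical names invented for illustration. The two concurrent branches of a `Promise.all(…)` are simulated by starting both queries before completing either.

```javascript
// Toy stand-in for AsyncContext.Variable (assumption: synchronous only).
let currentMapping = new Map();

class Variable {
  run(value, fn) {
    const previous = currentMapping;
    // Copy the mapping so the outer context is never mutated.
    currentMapping = new Map(previous).set(this, value);
    try {
      return fn();
    } finally {
      currentMapping = previous;
    }
  }
  get() {
    return currentMapping.get(this);
  }
}

// The variable holds a mutable object; spans "flow out" by mutating it.
const spanHolder = new Variable();

// Hypothetical instrumented call: the span's parent is whatever was
// latest when the call started; completing publishes it as the latest.
function startQuery(name) {
  const holder = spanHolder.get();
  const span = { name, parent: holder.latest };
  return () => {
    holder.latest = span;
    return span;
  };
}

// Start two "concurrent" branches, complete them in the given order,
// then create the next span in the logically continuing code.
function trace(completionOrder) {
  const root = { name: "request", parent: null };
  return spanHolder.run({ latest: root }, () => {
    const finish = {
      a: startQuery("query-a"),
      b: startQuery("query-b"),
    };
    for (const key of completionOrder) finish[key]();
    return startQuery("next")();
  });
}

const nextAfterAB = trace(["a", "b"]);
const nextAfterBA = trace(["b", "a"]);
console.log(nextAfterAB.parent.name); // "query-b"
console.log(nextAfterBA.parent.name); // "query-a"
```

In this sketch the parent of the continuing span depends only on which branch happened to finish last, which is one way to read both the "it would break" and the "choose arbitrarily" remarks.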
[15:10:54.0304] I think the ideal state to my mind is that we have a version of my spreadsheet where all 400 rows are filled out consistently, with each either being "registration time", "empty context", or some other combination involving more specific contexts, etc. And then polyfills would likely fall back on empty context when they can't access the correct one. [15:12:39.0394] Around flow also requires _substantially_ more binding logic than through flow as through flow is in most cases just a continuation of the scope you are already in, while other semantics are continuously trying to return to a prior state out of the state it is in presently. [15:13:29.0055] > <@stephenbelanger:matrix.org> It’s actually _less_ safe because you frequently need to bind out of the paths it gives you because most are only conditionally useful. This is a good point, which demonstrates that the least safe thing of all is bad defaults. Any time anyone needs to break from the defaults, it leads to risk of (1) not doing it because they didn't realize they needed to, (2) doing it too much/inappropriately because it's subtle. [15:13:56.0057] > <@stephenhicks:matrix.org> I think the ideal state to my mind is that we have a version of my spreadsheet where all 400 rows are filled out consistently, with each either being "registration time", "empty context", or some other combination involving more specific contexts, etc. And then polyfills would likely fall back on empty context when they can't access the correct one. sounds good, but it'd be easier to fill this out if we had some agreed-on examples of cases where we don't want to use the registration context and want to use something else in particular. Do you have a few off the top of your head? [15:15:32.0508] Honestly, I almost never want registration context. setTimeout is about it. 
[15:15:52.0881] XHR [15:16:11.0886] so, idk, if you could go crazy and fill out the table with your opinions (maybe we could make different columns for us each to "vote") that'd be helpful [15:16:23.0241] I don't know whether you want things to be null or something else, for the other cases [15:16:51.0446] I was suggesting above, let's go crazy and make all events always be in the null context; I didn't really get a read for whether you liked that idea [15:17:29.0830] sorry, yes, I think that's closer to what I'd prefer [15:18:04.0847] though I'm not sure if that answer is quite nuanced enough to fly [15:18:36.0396] what sort of nuance do you think would be good to correct? [15:23:26.0992] A lot of events seem to be triggerable either by code or by user interaction. I would like to see the active context when the dispatch task was queued for the former case, and the empty context for the latter. This requires an understanding that the listener needs to be able to handle both empty and non-empty contexts, depending on how the event was triggered. [15:23:52.0920] for polyfills, everything looks user-triggered [15:24:06.0065] OK, if this is the only thing to follow (dispatch context vs empty context) it seems simple enough for me to drop my "this is also Zalgo" thing [15:24:46.0130] also seems simple enough for browsers' concerns about "can we do this without having some huge understanding of tracing whenever handling any callback" [15:25:13.0516] in Andreu's analysis, he identified many cases where the registration context just seemed like the right answer, beyond setTimeout. Did you have any thoughts on those? 
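
A rough sketch of the "dispatch context vs empty context" idea, under heavy assumptions: `Variable` is a toy synchronous stand-in for `AsyncContext.Variable`, and `ToyTarget`, `dispatchFromCode`, `dispatchFromUser`, and `runInEmptyContext` are hypothetical names, not web platform APIs. Dispatch here is synchronous, so the "context when the dispatch task was queued" is simply the caller's context.

```javascript
// Toy stand-in for AsyncContext.Variable (assumption: synchronous only).
let currentMapping = new Map();

class Variable {
  run(value, fn) {
    const previous = currentMapping;
    currentMapping = new Map(previous).set(this, value);
    try {
      return fn();
    } finally {
      currentMapping = previous;
    }
  }
  get() {
    return currentMapping.get(this);
  }
}

// Hypothetical helper: run fn with no variables set at all.
function runInEmptyContext(fn) {
  const previous = currentMapping;
  currentMapping = new Map();
  try {
    return fn();
  } finally {
    currentMapping = previous;
  }
}

class ToyTarget {
  constructor() {
    this.listeners = [];
  }
  addListener(fn) {
    // Note: the registration-time context is deliberately not captured.
    this.listeners.push(fn);
  }
  // Code-triggered dispatch: listeners see the dispatching context.
  dispatchFromCode() {
    for (const listener of this.listeners) listener();
  }
  // User-triggered dispatch: listeners see the empty context.
  dispatchFromUser() {
    runInEmptyContext(() => {
      for (const listener of this.listeners) listener();
    });
  }
}

const traceVar = new Variable();
const seen = [];
const button = new ToyTarget();

// Listener is registered during "bootstrap" but must handle both cases.
traceVar.run("bootstrap", () => {
  button.addListener(() => {
    const parent = traceVar.get();
    seen.push(parent === undefined ? "root span" : `child of ${parent}`);
  });
});

traceVar.run("submit-handler", () => button.dispatchFromCode());
button.dispatchFromUser();

console.log(seen); // [ 'child of submit-handler', 'root span' ]
```

Neither call attributes the listener to "bootstrap", which is the misattribution the earlier messages wanted to avoid.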
[15:25:32.0265] I don't know which those were [15:25:46.0879] anything before the event section in the web integration document [15:25:57.0934] I think XHR might be one case where it's something other than dispatch-or-empty [15:26:10.0693] can you look at Andreu's doc and identify which parts you agree and disagree with? [15:26:57.0492] > <@littledan:matrix.org> can you look at Andreu's doc and identify which parts you agree and disagree with? This is https://github.com/tc39/proposal-async-context/pull/100 ? [15:27:00.0851] yeah [15:27:13.0371] thanks [15:37:13.0921] * Observers can batch, so you can't get the causal context - I don't feel strongly between null vs registration, but probably we need to expose the cause via a mutation record property * I don't have enough context for action registrations to speak intelligently, but I think they are similar to events. Certainly web component lifecycle callbacks need to get a causally relevant (or fall back on null) snapshot, but I suspect something like mediaSession might also have a more relevant context, e.g. if you call `play()` programmatically or something [15:39:39.0583] Async completion callbacks all seem good as they are - I generally see registration-time as a preferred default for callbacks that will run at most once, at a semi-predictable time. When a callback will be called more than once, indefinitely into the future, I think registration is the wrong _default_, though you may still need to opt back into it in some cases. [15:41:58.0579] can you elaborate on which cases we should opt back into registration context? [15:42:21.0929] for observers: do you think the causal context is necessary for the MVP, or is it OK if it's a "for-later" thing? [15:42:41.0204] observers are a fun test of what we mean by "synchronous"!
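
For reference, the difference between seeing the caller's context and opting back into the registration context can be sketched like this. It is a toy synchronous model: `Variable` and `wrap` are hypothetical stand-ins written for illustration, loosely modelled on `AsyncContext.Variable` and `AsyncContext.Snapshot.wrap` from the proposal, and are not the proposal's implementation.

```javascript
// Toy stand-in for AsyncContext.Variable (assumption: synchronous only).
let currentMapping = new Map();

class Variable {
  run(value, fn) {
    const previous = currentMapping;
    currentMapping = new Map(previous).set(this, value);
    try {
      return fn();
    } finally {
      currentMapping = previous;
    }
  }
  get() {
    return currentMapping.get(this);
  }
}

// Hypothetical stand-in for Snapshot.wrap: capture the mapping that is
// current now and restore it around every later call of fn.
function wrap(fn) {
  const captured = currentMapping;
  return (...args) => {
    const previous = currentMapping;
    currentMapping = captured;
    try {
      return fn(...args);
    } finally {
      currentMapping = previous;
    }
  };
}

const traceVar = new Variable();
const callback = () => traceVar.get();

let plain;
let wrapped;
traceVar.run("registration", () => {
  plain = callback;         // no capture: sees whatever context calls it
  wrapped = wrap(callback); // explicit opt-in to the registration context
});

const observed = traceVar.run("caller", () => ({
  plain: plain(),
  wrapped: wrapped(),
}));

console.log(observed); // { plain: 'caller', wrapped: 'registration' }
```

Under a non-registration default, only the callbacks that really need the registration context would pay for the explicit `wrap` call.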
[15:49:08.0800] for observers, since the cause would be a new property, that can be left for later without risk of breakage [16:30:14.0488] > <@stephenhicks:matrix.org> Yes, the place where this is coming up is with the root trace, where the bootstrap runs in a bootstrap trace, and then event handlers look for a missing trace to know to start a new one (rather than continue an existing one). It might be possible to explicitly zero out the trace when registering listeners - I've asked whether that's viable. Is the tracing in event handlers agnostic to event types? Implementations that instrument known event types can decide, based on event type semantics, whether to create a new root span or a child span. [16:31:49.0008] As in OpenTelemetry, awareness of event type semantics is needed to produce useful traces. 2024-08-20 [17:28:17.0321] > <@littledan:matrix.org> can you elaborate on which cases we should opt back into registration context? By "opt back into" I meant `AsyncContext.Snapshot.wrap` to handle the rare exception to the default - so not "us" opting in [17:31:40.0722] > <@legendecas:matrix.org> Is the tracing in event handlers agnostic to event types? Implementations that instrument known event types can decide, based on event type semantics, whether to create a new root span or a child span. My understanding here is that we _do_ fire some events synchronously via `element.click()` or `.focus()` and that those cascading cases are treated as child spans, whereas a user-initiated click (or focus) would be treated as a root span (but with the same tree of children).
In this case, having event handlers "do nothing" produces exactly the right behavior - user-initiated actions come in with an empty context, while synchronous events inherit the caller's context. [23:43:29.0630] I think the question was about something else: do you have event-specific logic in your tracing system? Do we expect this to exist generally? The answer to this has significant impact on what kinds of API shapes would or wouldn’t work (even if you have identified one which would work). [01:04:55.0080] Ah, I think it is probably event-specific, but I don't know exactly. I'll ask, though can you explain why it's such a significant impact? [01:08:15.0691] Though, on second thought, the specificity could easily only be in the product code, rather than in the infrastructure. [06:22:51.0045] Whether it is event-specific determines whether we could indicate important context in event-specific properties, as you had proposed, and have a bit of logic to fix things up with that information [06:24:38.0143] This is really core because we know that different variables need different kinds of information flow. So, we are trying to figure out what the requirements are for this tracing variable (which is the motivation for all of the null context and flow-through discussion as far as I can tell) [06:28:19.0085] Also, maybe you have some other system that you have been comparing AsyncContext to, and converting an existing tracing system to be based on AsyncContext? Can you tell us about this baseline and what its semantics are? [07:18:35.0524] I was under the impression that choosing between initial context and registration context for user-sourced events would be trivial, but after thinking some more about it, I don't think it would be [07:19:06.0916] after all, you could fire an event synchronously in an inline `