TC39 Async Context on 2024-05-15

04:28	<Steve Hicks>	One of my action items was to go through Stephen's docs and try to figure out the gist of them. I think Stephen was also going to try to distill it down a bit further, though it was late for him by the end of the meeting. Variable.prototype.wrap is a convenience function and I think we agreed to more or less table it for now. But there's also a question of per-instance propagation configuration that is impossible to do in userland (i.e. some vars propagate via init/registration context, while others propagate via causal/calling context). Per a discussion today with Scott Haseley, it sounds like there's already some precedent for this in how v8 handles yield vs. normal CPED/attribution, in that the latter propagates over setTimeout while the former is dropped. In the case of multiple variable configs, I think it would boil down to having effectively two separate variable linked lists instead of one, and they're treated differently by snapshot swapping. This may not be infeasible, so it's maybe worth considering whether it leads to a good solution. We didn't get a chance to discuss callingContext, though (tbh) it's still my preferred option for ensuring the APM use case is viable, due to balancing the extra complexity and feasibility of ever actually landing this proposal, vs stalling out from trying to do too much - but we do still need to figure out Promise.all. One (totally off-the-cuff) possibility would be to just stack up all the causal contexts, first-to-last. You know how many promises were merged, so you can pop that many contexts off the stack if you need them all. Upshot from discussion with Jatin was that he agreed that calling context is crucial, and figured that most userland schedulers (at least the ones he owns) would probably need to go with that default.
Registration context is useless for our use of computed signals and effects (and he was particularly concerned about losing the ability to see causal context if/when Signals land in the standard), and we'd need to propagate causal context throughout all the stages of (user interaction) -> (lazy load, controller instantiation) -> (rpc fetch) -> (model cache update) -> (component rerender) in order to ensure tracing works correctly. These are all (currently) userland schedulers, at least, so it's possible, but ultimately the hope is to replace the model cache with signals.
04:48	<Steve Hicks>	I took a stab at implementing the stack-based Promise.all in userland with callingContext, and I ran into a brick wall because there's no way to actually stack multiple top-level frames: playground
04:53	<Steve Hicks>	I can see a few variants on callingContext: (1) it just puts the previous frame directly on top of the current one (i.e. behaves identically to Snapshot.run) - in this case, one could just make it a snapshot? This makes it impossible to access deeper-nested ones, since `callingContext()` will just get you back to the previous (registration?) context. Unless maybe it takes a depth argument. In that case, the stacking might just be possible. Or (2) it restores the entire context stack to whatever it looked like in the calling environment, such that a second `callingContext` would go back further in causation history.
11:07	<Stephen Belanger>	I made this small (-ish) example of how we're doing that differentiation between child-of and follows-from relationships and what we're trying to do with holding the minimum possible data in the store (just the ID). https://gist.github.com/Qard/6ceaca8bb792679e82c7693513baee0e
11:37	<Stephen Belanger>	In those examples we have a solution to the need to separate child-of and follows-from relationships, which is not too terribly complicated, so we can live with that. And as I expressed previously the multiple follows-from thing is not too terribly important as we at least get one of the branches so we can still mostly understand the execution structure. But we do need to be able to flow through at least singular pathed merges like an await or then continuation of a single promise. The examples above are meant to show that we are expecting something which logically continues from a particular point is expected to be able to attribute itself to that. Whereas what we get currently with both async/await and promises is a flat structure where all the `mysql.query` spans within those examples would get flattened up to linking with the http.server span, even if the second query has another query between it and the http handler starting.
14:37	<littledan>	I am confused by general comments on registration time vs call time. Can we do more to dig into the detailed cases? (Am chatting with Jatin about this now too)
14:38	<littledan>	Andreu had some sort of point by point analysis. What if we made that a Google Doc and then we could comment together on which things could/should be different for which use cases?
14:49	<littledan>	In reality there will be a mix of both registration time and call time, so I have trouble understanding conversations which are phrased like “vs”
15:27	<Steve Hicks>	A doc sounds like a good idea. Yes, there will be a mix, but there's more nuance than that. There's questions of consistency (e.g. button.addEventListener with a UI click vs. <button onclick="..."> with programmatic button.click() - do these behave the same? My opinion is no) and expressivity (I think it's clear we need some option to override the default in either direction). As long as there's an override to fix any mismatched default, I think we're in pretty good shape.
15:58	<Stephen Belanger>	I'm wondering if await/yield/then binding should just be a config per-store and we can just hold two sets of stores so ones that do have that turned on do those binds and ones that have it turned off don't get tracked in that list at all. Just a random idea. And to be clear, I don't care which way is the default. If we have the capability to switch to the other on our stores then that's basically the one single major blocker for APM vendors right now, as far as I can tell. 🤔
16:22	<Steve Hicks>	I think if we had that option then it would end up needing to be three different sets in the long run, since `scheduler.yield` (and/or `scheduler.currentTaskSignal`) would need yet a different propagation, where it does propagate across `await`, but not through `setTimeout`.
19:13	<littledan>	A doc sounds like a good idea. Yes, there will be a mix, but there's more nuance than that. There's questions of consistency (e.g. button.addEventListener with a UI click vs. <button onclick="..."> with programmatic button.click() - do these behave the same? My opinion is no) and expressivity (I think it's clear we need some option to override the default in either direction). As long as there's an override to fix any mismatched default, I think we're in pretty good shape. Can you say more about how you imagine that option being used? One possible default could be “use the originating/call context where available, otherwise fall back to registration if it doesn’t exist” and you could override that to “always registration time” by wrapping your callback yourself.
19:13	<littledan>	In that case, no options bag needed
19:14	<littledan>	Another is “always registration time, and you get passed the originating snapshot in a property of the event, which you can then .run within if you want” (again, you could choose the opposite default by wrapping the callback, this time in something that got the snapshot out and applied it)
19:15	<littledan>	In either case it would be OK to include an option as an ergonomic nicety but it seems optional to me
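The "wrap the callback yourself" override discussed above can be sketched roughly as follows. This is only an illustration: Node.js `AsyncLocalStorage` stands in for the proposed `AsyncContext.Variable`, a Node.js `EventEmitter` stands in for an API whose default is the calling (dispatch) context, and `wrap` is a hypothetical helper, not part of any proposal text.

```javascript
// Stand-ins (assumptions): AsyncLocalStorage ~ AsyncContext.Variable,
// EventEmitter ~ a dispatcher whose listeners see the calling context.
const { AsyncLocalStorage } = require('node:async_hooks');
const { EventEmitter } = require('node:events');

const v = new AsyncLocalStorage();

// Hypothetical helper: capture the value at registration time and
// re-enter it whenever the callback is later invoked.
function wrap(variable, fn) {
  const captured = variable.getStore();
  return (...args) => variable.run(captured, fn, ...args);
}

const emitter = new EventEmitter();
const seen = [];

v.run('registration', () => {
  // Unwrapped listener: observes whatever context emit() is called in.
  emitter.on('ping', () => seen.push(['plain', v.getStore()]));
  // Wrapped listener: always observes the registration-time value.
  emitter.on('ping', wrap(v, () => seen.push(['wrapped', v.getStore()])));
});

v.run('dispatch', () => emitter.emit('ping'));

console.log(seen);
// [ [ 'plain', 'dispatch' ], [ 'wrapped', 'registration' ] ]
```

Note that this helper only pins a single variable; pinning every variable at once would need something like a snapshot instead.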
19:15	<littledan>	I think if we had that option then it would end up needing to be three different sets in the long run, since `scheduler.yield` (and/or `scheduler.currentTaskSignal`) would need yet a different propagation, where it does propagate across `await`, but not through `setTimeout`. Why were these semantics chosen for priority, btw?
19:16	<littledan>	I'm wondering if await/yield/then binding should just be a config per-store and we can just hold two sets of stores so ones that do have that turned on do those binds and ones that have it turned off don't get tracked in that list at all. Just a random idea. And to be clear, I don't care which way is the default. If we have the capability to switch to the other on our stores then that's basically the one single major blocker for APM vendors right now, as far as I can tell. 🤔 Yeah I could see the “two types of variables” idea but I don’t see how it solves the “maintain follows-from links” problem
19:18	<littledan>	Also I don’t really know how we would make the call-biased variables work
19:19	<Steve Hicks>	Another is “always registration time, and you get passed the originating snapshot in a property of the event, which you can then .run within if you want” (again, you could choose the opposite default by wrapping the callback, this time in something that got the snapshot out and applied it) I think this approach is problematic because it only really works for events. But there's a handful of other APIs (e.g. IntersectionObserver and MutationObserver, various Promise APIs, hypothetical future signals, etc) that don't have any events and you'd need a different custom solution for each to solve effectively the same problem.
19:19	<Andreu Botella>	I think this approach is problematic because it only really works for events. But there's a handful of other APIs (e.g. IntersectionObserver and MutationObserver, various Promise APIs, hypothetical future signals, etc) that don't have any events and you'd need a different custom solution for each to solve effectively the same problem. for observers you could have a property of the observer entry
19:20	<littledan>	Yeah I think this works better for observers than other options since they have a single callback for multiple things
19:21	<Andreu Botella>	Why were these semantics chosen for priority, btw? I think because `scheduler.yield()` wants to distinguish between a continuation of the current task and a subtask
19:21	<littledan>	For promise-based APIs: I am having trouble picturing what we would want and how; maybe you could give a concrete example of where you don’t want the restore-around-await semantics (“registration time”) and what you want instead?
19:21	<littledan>	I think because `scheduler.yield()` wants to distinguish between a continuation of the current task and a subtask What does that have to do with setTimeout?
19:21	<Steve Hicks>	Why were these semantics chosen for priority, btw? I don't know the background there. I scanned through https://github.com/WICG/scheduling-apis/blob/main/explainers/yield-and-continuation.md but don't see anything specifically about this choice.
19:22	<Andreu Botella>	What does that have to do with setTimeout? `setTimeout` would be a subtask
19:22	<littledan>	I don't know the background there. I scanned through https://github.com/WICG/scheduling-apis/blob/main/explainers/yield-and-continuation.md but don't see anything specifically about this choice. I guess you are relaying this case based on personal communication with Scott? Maybe he can clarify (or join here)?
19:23	<Steve Hicks>	I guess you are relaying this case based on personal communication with Scott? Maybe he can clarify (or join here)? Yes, I can ask.
19:23	<littledan>	Can you say more about how you imagine that option being used? One possible default could be “use the originating/call context where available, otherwise fall back to registration if it doesn’t exist” and you could override that to “always registration time” by wrapping your callback yourself. What do you think of this option Steve Hicks ?
19:24	<Andreu Botella>	from a conversation I had with him: It's important (as of now, subject to change) that those [yield-related fields of the object propagated through CPED] are not propagated to subtasks and events. The idea is that yield() can inherit the priority of the current task, but the current task and subtasks are not necessarily related (i.e. breaking up the current task by yielding in a loop does not imply other work spawned should have the same priority). It's possible this will change, but as of now we need to keep that behavior.
19:24	<Steve Hicks>	For promise-based APIs: I am having trouble picturing what we would want and how; maybe you could give a concrete example of where you don’t want the restore-around-await semantics (“registration time”) and what you want instead? My understanding is that this is what Stephen is asking for. I don't have quite as good a sense of the use case, but from the examples I've seen, he wants to `await openFile()` and have a trace span opened in `openFile` still be present on the outside.
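For reference, the restore-around-await behaviour being discussed looks roughly like this. It is a minimal sketch that again uses Node.js `AsyncLocalStorage` as a stand-in for an `AsyncContext.Variable`; the span names and the `openFile` body are made up for illustration.

```javascript
// Stand-in (assumption): AsyncLocalStorage ~ AsyncContext.Variable.
const { AsyncLocalStorage } = require('node:async_hooks');

const span = new AsyncLocalStorage();

// Hypothetical instrumented function that opens its own span.
async function openFile() {
  return span.run('fs.open', async () => {
    await null; // some async work inside the span
    return span.getStore(); // still 'fs.open' inside the function
  });
}

const result = span.run('http.server', async () => {
  const inside = await openFile();
  // After the await, the caller's context is restored, so the span
  // opened inside openFile() is no longer visible here.
  const after = span.getStore();
  return { inside, after };
});

result.then((r) => console.log(r));
// { inside: 'fs.open', after: 'http.server' }
```

The APM request, as understood here, is for `after` to be able to see (or link to) `'fs.open'` rather than only `'http.server'`.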
19:25	<littledan>	Do you run into cases where you want this behavior with promises?
19:26	<Steve Hicks>	Yeah I think this works better for observers than other options since they have a single callback for multiple things That may be so on an individual level, but it's still a different solution for each situation, which I see as a big problem since it leads to everyone having to figure out for every given situation "how do I do this thing?". Also, it's ideal when userland APIs can have analogous behavior to builtins, and so every userland scheduler would also need to come up with their own custom solution.
19:29	<Steve Hicks>	What do you think of this option Steve Hicks ? I don't love the "where available" framing - it feels very "zalgo-adjacent" where you can never really be sure what context something will run in because it depends on external factors (e.g. for a click handler, it could run in either, depending on if it's dispatched programmatically or by user action). As a result, you just can't really rely on anything.
19:29	<littledan>	Do you have another idea for how we should handle observers?
19:30	<Steve Hicks>	I favor a general solution that doesn't rely on details of the scheduling API's shape.
19:30	<Steve Hicks>	something more like AsyncContext.callingContext where it works in all cases
19:31	<littledan>	I favor a general solution that doesn't rely on details of the scheduling API's shape. Of course, but I guess the scheduling API assumes it can be based on a primitive with certain properties, and we are trying to understand what that primitive is…
19:31	<littledan>	If the decision was not made for a very strong reason and turns out to be kinda irregular compared to other needs, we shouldn’t necessarily turn ourselves inside out trying to solve for it. But if it’s a good reason, that is different
19:32	<littledan>	I don’t understand how callingContext would relate to dropping things on setTimeout
19:33	<Steve Hicks>	sorry, my statement about dropping on setTimeout was just about how neither of the two default-propagation behaviors we're considering would actually work to enable replacing the current yield propagation (as currently spec'd) with AsyncContext
19:34	<littledan>	I don't love the "where available" framing - it feels very "zalgo-adjacent" where you can never really be sure what context something will run in because it depends on external factors (e.g. for a click handler, it could run in either, depending on if it's dispatched programmatically or by user action). As a result, you just can't really rely on anything. Yeah, I share the Zalgo concern, but maybe a bit more broadly. With signals, for example, it feels kinda Zalgo to me if we propagate in things about where the computed was read from (since that is a race in itself). But from talking with Jatin, I understand that he wants to see what triggers what in responding to a user gesture, so it’s kinda needed. An unfortunate contradiction
19:35	<Steve Hicks>	I'll go back to the doc idea you had - we need to get more known use cases and situations documented, I think, in order to get more insight into the downstream ramifications on application code, etc
19:35	<littledan>	sorry, my statement about dropping on setTimeout was just about how neither of the two default-propagation behaviors we're considering would actually work to enable replacing the current yield propagation (as currently spec'd) with AsyncContext Yeah, I agree; do you have an idea for an alternative that would handle this?
19:35	<Steve Hicks>	Yeah, I agree; do you have an idea for an alternative that would handle this? Sadly no. Change the scheduler spec?
19:36	<Steve Hicks>	(to allow propagating across child tasks like an ordinary async var)
19:37	<littledan>	That is my first intuition but it’s because I don’t understand the motivation for the current design
19:38	<littledan>	What would be unscalable is for each variable to have custom logic at each point where it might be propagated. I guess APMs have this power today though.
19:39	<Steve Hicks>	In terms of downstream repercussions, I'm thinking about app developers writing their handlers, middleware, signals, etc. I believe a fundamental axiom here is (or at least, I'd like it to be) that frameworks can put vars in place and app developers don't need to be aware of what those vars are - so needing to explicitly do anything with callingContext in their own callbacks would be a problem, and if there's a few layers of application code in the way such that the framework can't just pull their variable off the "top" callingContext, then that approach probably wouldn't work.
19:41	<littledan>	Agreed. And in general you can have lots of merges that look like that, I think (so Promise.all integration isn’t quite enough)
19:41	<littledan>	This is why the “two classes of variables” idea appeals to me somewhat (but I still don’t know how it would work)
19:43	<Steve Hicks>	This is why the “two classes of variables” idea appeals to me somewhat (but I still don’t know how it would work) agreed - especially if the "calling context" flavor means that it doesn't propagate across an `await`, then I'm not sure it's viable, though (IIUC) that would be more consistent with how then() would behave?
19:45	<Andreu Botella>	On an unrelated note, what do you expect this to print? `function cb() { asyncVar.run("foo", () => { throw new Error(); }); } asyncVar.run("bar", () => { setTimeout(cb, 0); }); window.addEventListener("error", () => { console.log(asyncVar.get()); }, {useOriginatingContext: true});`
19:46	<Andreu Botella>	with the current spec, the only thing this could print is `bar`, but I'd expect that's not the expected behavior
19:48	<Steve Hicks>	with the current spec, the only thing this could print is `bar`, but I'd expect that's not the expected behavior I find that incredibly surprising. My mental model is that `v.run(a, () => v.run(b, f))` is equivalent to `v.run(b, f)`.
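The mental model stated above (an inner `run` fully shadows the outer one for the same variable) can be checked with a small sketch. Node.js `AsyncLocalStorage` is used here only as a stand-in for the proposed `AsyncContext.Variable`; `f` is an arbitrary reader function.

```javascript
// Stand-in (assumption): AsyncLocalStorage ~ AsyncContext.Variable.
const { AsyncLocalStorage } = require('node:async_hooks');

const v = new AsyncLocalStorage();
const f = () => v.getStore();

// Nested run: the inner value shadows the outer one while f executes.
const nested = v.run('a', () => v.run('b', f));
// Direct run with the inner value only.
const direct = v.run('b', f);

console.log(nested, direct); // b b
```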
19:48	<Andreu Botella>	oh wait
19:48	<Steve Hicks>	though obviously that would change w/ callingContext
19:48	<Andreu Botella>	my bad
19:48	<Andreu Botella>	I meant to have `setTimeout` there
19:50	<Steve Hicks>	I'm still lacking some intuition here... how is this different from running `cb` directly in bar?
19:51	<Andreu Botella>	I guess it's not
19:51	<Andreu Botella>	the thing is, when `.run()` returns it will always restore the previous context, even if the callback threw
19:51	<Steve Hicks>	other than i guess that the error is async
19:52	<Andreu Botella>	so when the execution gets back to `setTimeout`, the current context is `bar`
19:52	<Andreu Botella>	the context active at throw time isn't preserved
19:53	<Steve Hicks>	I thought it was? Isn't that the point of useOriginatingContext?
19:53	<Steve Hicks>	so yah, I'd still expect either undefined or foo
19:54	<Andreu Botella>	I thought it was? Isn't that the point of useOriginatingContext? `useOriginatingContext` is there so the registration time isn't used
19:55	<Andreu Botella>	run is basically implemented like: `function run(value, cb) { const previousContext = changeContext(updateContext(value)); try { return cb(); } finally { changeContext(previousContext); /* this loses track of foo and restores bar */ } }`
19:56	<Andreu Botella>	there's currently nothing in the spec text that preserves the context in which an exception is thrown
19:56	<Andreu Botella>	because the current context is switched back to the previous one in the finally
19:57	<Steve Hicks>	ah, my understanding was that unhandled rejections (at least) would hold onto the rejection context. I'd assumed that this extended to "error" as well, though that's not an API I'm familiar with
19:58	<Andreu Botella>	yeah, I think no one considered error until I started looking into it last week
19:58	<Andreu Botella>	I suspect that making that work would mean patching how completions work in the spec 😰
19:59	<Andreu Botella>	though I guess you could also have a `lastThrowContext` global state
19:59	<Andreu Botella>	that would only be used if `cb()` threw
20:00	<Andreu Botella>	that's a much less invasive change
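A rough sketch of the `lastThrowContext` idea, assuming a toy synchronous model with a single variable. This is not the spec algorithm: `currentContext`, `lastThrow`, and the rule for keeping the innermost context are all assumptions made for illustration.

```javascript
// Toy synchronous model with one variable; NOT the spec algorithm.
let currentContext;   // stand-in for the agent's current context
let lastThrow = null; // hypothetical "lastThrowContext" slot: { error, context }

function run(value, cb) {
  const previousContext = currentContext;
  currentContext = value;
  try {
    return cb();
  } catch (error) {
    // Assumption: keep the innermost context. Outer run() frames that
    // merely see the same error pass through must not overwrite it.
    if (lastThrow === null || lastThrow.error !== error) {
      lastThrow = { error, context: currentContext };
    }
    throw error;
  } finally {
    // As before, "foo" stops being the *current* context here.
    currentContext = previousContext;
  }
}

function cb() {
  run('foo', () => { throw new Error('boom'); });
}

const observed = {};
run('bar', () => {
  try {
    cb(); // stand-in for the host invoking the timer callback
  } catch (error) {
    // Stand-in for the host dispatching the "error" event.
    observed.current = currentContext;
    observed.thrownIn = lastThrow.context;
  }
});

console.log(observed); // { current: 'bar', thrownIn: 'foo' }
```

Only `run` changes in this sketch; no individual `throw` site is instrumented.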
20:20	<Andreu Botella>	Do we want to expose the last thrown context to userland? Are there userland implementations of something like the error event?
20:20	<Andreu Botella>	Something like that in V8 would definitely be needed for JS runtimes like Node.js and Deno, since they implement the error event in JS
20:22	<Steve Hicks>	Does this require instrumenting every `throw` in order to polyfill?
20:22	<Andreu Botella>	Does this require instrumenting every `throw` in order to polyfill? no, I think it would only require changing the `run` implementation
20:23	<Steve Hicks>	ah, so if you catch in a run, then you know it was a throw-context
20:23	<Steve Hicks>	what about await-resumptions?
20:24	<Steve Hicks>	probably would be handled by the async function instrumentation, i guess
20:25	<Andreu Botella>	what about await-resumptions? I'll have to think about that
20:25	<Andreu Botella>	for the JS runtime use case, I think you only need sync throw handling
20:26	<Andreu Botella>	since anything else would use unhandledrejection rather than error