TC39 Async Context on 2024-08-19

20:56	<Steve Hicks>	I was talking with our tracing folks today (both from the tracing side and the web frameworks side) and had broad agreement that they generally just want event handlers to do as little as possible in terms of snapshot/restore. I told them about my concerns with polyfilling these "async causes" (i.e. programmatic JS that triggers an event to be queued for dispatch in a future microtask) and the consensus was that, for any events we tend to care about, it's basically always more accurate to run listeners in the root/empty context rather than the registration context, and not to worry about a more-accurate programmatic causal context that we can't get access to.
20:58	<Steve Hicks>	But with our current polyfill (which dispatches events in the registration context), they're running into real problems where event listeners are registered in the "application bootstrap context" and they really want stronger guarantees that that context will completely disappear when bootstrapping is complete.
21:05	<littledan>	[EDIT: Sorry I misread what you wrote above, you already did address this] people have this intuition of "do as little as possible" or "disappear" but I don't think they mean the null context, I think they mean some sort of causal context, which we've discussed the difficulty of defining. Did you raise this difficulty with the requesting teams?
21:05	<littledan>	or maybe you're saying, they are happy with the null context?
21:07	<littledan>	[In reply to Steve Hicks: But with our current polyfill (which dispatches events in the registration context), they're running into real problems where event listeners are registered in the "application bootstrap context" and they really want stronger guarantees that that context will completely disappear when bootstrapping is complete.] This is excellent to know that you're really trying this in a polyfill and it's really not working out. But can you explain a little more about how this comes up mechanically, within a system that already does event delegation?
21:18	<Andreu Botella>	If we do that, and at some later point we realize there's some existing event with an async source that needs the async dispatch context to be propagated, that won't be able to be changed
21:18	<Andreu Botella>	we wouldn't even be able to have use counters
21:19	<Steve Hicks>	what do you mean by "use counters"?
21:19	<littledan>	[In reply to Steve Hicks: But with our current polyfill (which dispatches events in the registration context), they're running into real problems where event listeners are registered in the "application bootstrap context" and they really want stronger guarantees that that context will completely disappear when bootstrapping is complete.] one thing to note is that, if you have a particular variable (like the trace/span id) that you always want to be a particular value (like null), you can set that value explicitly in your bootstrapping code. I'm really curious why this kind of thing didn't work for you
21:19	<littledan>	[In reply to Steve Hicks: what do you mean by "use counters"?] this is about assessing the web compatibility of later changes in semantics
21:20	<Steve Hicks>	I'm not sure I'm saying that async events never dispatch with a causal context.
21:22	<Andreu Botella>	Chromium and Firefox (probably also WebKit, but I don't know that for sure) have telemetry where, if a page uses some combination of features, that usage gets recorded, to assess e.g. how likely a change in semantics is to affect websites
21:24	<Andreu Botella>	you could have use counters for uses of `AsyncContext.Variable.prototype.get()` with the null context, or similar, but you can't track cases like `foo.addEventListener("bar", () => { asyncVar.run("baz", () => { someUnrelatedAsyncVar.get(); }); });`
21:24	<Andreu Botella>	hm...
21:24	<Andreu Botella>	I guess it's not impossible to track, but it might add implementation complexity
21:26	<Andreu Botella>	okay, you could track all uses of contexts derived from an async event without a lot of complexity, but not from a specific async event
21:26	<littledan>	Andreu, I think you're jumping ahead a few steps too many when you're talking about this future compatibility risk; let's focus on figuring out their needs first
21:27	<littledan>	Steve Hicks: Are we talking about the span id/trace id? Is this just one variable? Could you say more about how/when it's initialized, and how "application bootstrap time" becomes misleading compared to being identifiably "null"?
21:28	<Steve Hicks>	Yes, the place where this is coming up is with the root trace, where the bootstrap runs in a bootstrap trace, and then event handlers look for a missing trace to know to start a new one (rather than continue an existing one). It might be possible to explicitly zero out the trace when registering listeners - I've asked whether that's viable.
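The bootstrap-trace problem described here can be sketched in a few lines. This is a minimal illustration, not the polyfill under discussion: it uses Node.js `AsyncLocalStorage` as a stand-in for an `AsyncContext.Variable`, and `handleClick`, `registerInRegistrationContext`, and `registerInEmptyContext` are hypothetical names. It also captures a single variable rather than a whole snapshot.

```javascript
const { AsyncLocalStorage } = require("node:async_hooks");

// Assumption: AsyncLocalStorage stands in for an AsyncContext.Variable
// that holds the current trace id.
const traceVar = new AsyncLocalStorage();

// Hypothetical handler logic: continue an existing trace, or start a new root.
function handleClick() {
  const trace = traceVar.getStore();
  return trace === undefined ? "new root trace" : `continues ${trace}`;
}

// Registration-context dispatch: capture the value at registration time
// and re-enter it whenever the listener runs.
function registerInRegistrationContext(listener) {
  const captured = traceVar.getStore();
  return () => traceVar.run(captured, listener);
}

// Empty-context dispatch: always run the listener with no trace set.
function registerInEmptyContext(listener) {
  return () => traceVar.run(undefined, listener);
}

let viaRegistration;
let viaEmpty;
traceVar.run("bootstrap-trace", () => {
  viaRegistration = registerInRegistrationContext(handleClick);
  viaEmpty = registerInEmptyContext(handleClick);
});

// Long after bootstrap has finished, a "user click" invokes the listeners.
console.log(viaRegistration()); // continues bootstrap-trace
console.log(viaEmpty()); // new root trace
```

With registration-context dispatch the handler never sees a missing trace, so it keeps attributing work to the bootstrap trace.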
21:30	<Justin Ridgewell>	[In reply to Steve Hicks: But with our current polyfill (which dispatches events in the registration context), they're running into real problems where event listeners are registered in the "application bootstrap context" and they really want stronger guarantees that that context will completely disappear when bootstrapping is complete.] This sounds like the tail wagging the dog. They want to ensure that their bootstrap context is GC’d, and think that events not capturing context will ensure that, but that’s just not the case. What happens when they have a pending promise or `setInterval`? I think null-context is the least useful choice for users. It’s simple enough for them to clear the bootstrap context before adding event listeners, but we shouldn’t force everyone else to wrap their event handlers because of this case.
21:31	<Justin Ridgewell>	The GC use case prevents us from ever using a causal context in events.
21:31	<Steve Hicks>	Registration context is simply wrong for many, many events. There may be a more useful causal context that's available, and we should use it when possible, even if we can't polyfill it.
21:31	<littledan>	[In reply to Steve Hicks: Yes, the place where this is coming up is with the root trace, where the bootstrap runs in a bootstrap trace, and then event handlers look for a missing trace to know to start a new one (rather than continue an existing one). It might be possible to explicitly zero out the trace when registering listeners - I've asked whether that's viable.] ah, OK, they want a bootstrap trace, and then they want event handlers to run in a null trace.... could we address this by loading a null trace snapshot right around addEventListener, narrowly?
21:32	<littledan>	or is it that they really do want a causal trace?
21:32	<Steve Hicks>	My understanding is that the code running `addEventListener` is largely out of our control - third party libraries and whatnot
21:33	<littledan>	maybe patch addEventListener then?
21:33	<Justin Ridgewell>	[In reply to Steve Hicks: Registration context is simply wrong for many, many events. There may be a more useful causal context that's available, and we should use it when possible, even if we can't polyfill it.] Sure, I see good arguments for causal context. But I don’t think there are any good ones for null context.
21:34	<Steve Hicks>	[In reply to littledan: maybe patch addEventListener then?] That way lies madness, let me tell you...
21:34	<littledan>	concrete examples of cases where it is wrong is extremely helpful! Now we have one with bootstrap traces. If we can assemble more, it'll be very useful.
21:34	<Steve Hicks>	I think my point was that when causal context is unavailable, empty context is the best alternative - both for lossy polyfills and for standards
21:35	<littledan>	sure, I guess I'm just trying to understand the details of their requirements, and then we can think of the various possible ways to meet them
21:35	<littledan>	rather than reaching a "point" yet
21:36	<littledan>	A new piece of information for me today was that there's a desire to trace how long bootstrap takes, and so this is what makes the registration context wrong. I did not understand that phenomenon before this conversation.
21:36	<littledan>	so this is very helpful
21:36	<Steve Hicks>	In particular, I've been patching addEventListener for my polyfill, and it's super subtle. Also, short-term patches are nonviable because removeEventListener. So I think we need to put that out of our heads as a remotely viable suggestion.
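One reading of the `removeEventListener` concern is that wrapping a listener changes its identity, so the patch has to remember every wrapper for as long as the listener might be removed. The sketch below assumes that reading. It uses Node.js `AsyncLocalStorage` as a stand-in for an `AsyncContext.Variable`, `patchEventTarget` is a hypothetical helper, and it handles only function listeners (not objects with `handleEvent`).

```javascript
const { AsyncLocalStorage } = require("node:async_hooks");

// Assumption: AsyncLocalStorage stands in for an AsyncContext.Variable.
const traceVar = new AsyncLocalStorage();

// Hypothetical patch: wrap each listener so it runs with no trace set.
// The WeakMap remembers wrappers so removeEventListener can find them again.
function patchEventTarget(target) {
  const wrappers = new WeakMap();
  const add = target.addEventListener.bind(target);
  const remove = target.removeEventListener.bind(target);

  target.addEventListener = (type, listener, options) => {
    let wrapped = wrappers.get(listener);
    if (!wrapped) {
      wrapped = (event) => traceVar.run(undefined, () => listener(event));
      wrappers.set(listener, wrapped);
    }
    add(type, wrapped, options);
  };

  target.removeEventListener = (type, listener, options) => {
    // Without this lookup the original function would never match the wrapper.
    remove(type, wrappers.get(listener) ?? listener, options);
  };
}

const target = new EventTarget();
patchEventTarget(target);

const seen = [];
const onPing = () => seen.push(traceVar.getStore());

traceVar.run("bootstrap-trace", () => {
  target.addEventListener("ping", onPing);
  target.dispatchEvent(new Event("ping")); // listener sees undefined
});

target.removeEventListener("ping", onPing);
target.dispatchEvent(new Event("ping")); // no listener left, nothing recorded

console.log(seen); // [ undefined ]
```

If the patch were installed only temporarily, a later unpatched `removeEventListener(type, onPing)` would not match the wrapper and the listener would stay registered.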
21:37	<littledan>	[In reply to Steve Hicks: In particular, I've been patching addEventListener for my polyfill, and it's super subtle. Also, short-term patches are nonviable because removeEventListener] yeah, I can imagine... I'm just trying to understand the constraints, and curious what kinds of third party libraries you're trying to trace through
21:37	<littledan>	and what kind of flexibility you have about how you use them
21:38	<Steve Hicks>	d3 and tanstack router were two that were named
21:39	<Steve Hicks>	and it's not so much that we want to "trace through" them, so much as that they're involved in the app and all it takes is one library not doing it right to mess everything up - this is why the right defaults are critical
21:43	<Stephen Belanger>	Node.js never patches events, and this is very intentional. Event emitters themselves are not async, they inherit asynchrony from what publishes to them. Therefore it’s actually that triggering thing you need to propagate context to and leave event emitters alone to just continue in the context the dispatch happens in.
21:43	<Stephen Belanger>	In web platform that triggering thing can be internals, so that needs to be taken into consideration.
21:44	<Stephen Belanger>	But it should be treated as a propagation of the web platform thing, not a propagation of event emitters.
21:45	<Stephen Belanger>	Which means event emitters should not be expected to have “consistent” behaviour, because the conditions in which they are triggered are not consistent.
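The Node.js behaviour described above can be observed with `EventEmitter` and `AsyncLocalStorage`: the listener sees whatever context is active where `emit()` is called, and any asynchrony comes from the thing that calls `emit()`. A small sketch, with illustrative store values:

```javascript
const { AsyncLocalStorage } = require("node:async_hooks");
const { EventEmitter } = require("node:events");

const als = new AsyncLocalStorage();
const emitter = new EventEmitter();
const seen = [];

// The listener is registered in one context...
als.run("registration", () => {
  emitter.on("data", () => seen.push(als.getStore()));
});

// ...but emit() is synchronous, so it runs in the emitting context.
als.run("dispatch", () => {
  emitter.emit("data");
});

// When the trigger is asynchronous, the timer carries the context,
// and the emitter again just runs in whatever context calls emit().
const done = new Promise((resolve) => {
  als.run("scheduler", () => {
    setTimeout(() => {
      emitter.emit("data");
      resolve(seen);
    }, 0);
  });
});

done.then((result) => console.log(result)); // [ 'dispatch', 'scheduler' ]
```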
21:45	<Andreu Botella>	propagating the context through everything async in the web platform is simply not feasible at this point in time, which is why I'm talking about async sources for events (which in your terms is async internals that eventually triggers the event)
21:45	<Andreu Botella>	that way we can focus only on the async internals that are observable
21:46	<Stephen Belanger>	It doesn’t need to propagate fully through it necessarily. Only to the extent that is observable from JavaScript. (At least at this point, anyway…)
21:47	<Stephen Belanger>	I do want to have it actually accessible everywhere though as I would really like for profiler samples to capture current context state.
21:47	<Stephen Belanger>	But, as you say, that’s a large effort.
21:47	<littledan>	so, how would it be if EventTargets had a snapshot associated with them (say, the default for the whole document, so the global one) and ran all of their events there?
21:48	<littledan>	and if there's another relevant snapshot for an event, we pass that in a property
21:49	<littledan>	we have to define the web semantics completely, we can't say "whatever falls out"
21:49	<Stephen Belanger>	Binding to the initialization of an EventTarget? I can see how that could be useful for resource attribution, but would not work for the application tracing case without needing to rebind everything to the execution flow path (aka through flow)
21:50	<littledan>	there are two problems we're discussing: the one Steve H raised (avoid misattribution to the bootstrap phase) and this one you're raising (do all the tracing causally)
21:51	<Stephen Belanger>	I think they’re related problems.
21:52	<Stephen Belanger>	The attribution depends on the flow you’re expecting.
21:52	<littledan>	sure, the perfect solution to everything also solves the easier version that Steve described where it's OK for us to cheat and use the null context
21:53	<Andreu Botella>	[In reply to Stephen Belanger: But, as you say, that’s a large effort.] Even having all events with an internal async source propagate the causal context won't be easy, and we have already gotten pushback (from Domenic Denicola) saying that that will be too burdensome for folks working on web specs
21:53	<littledan>	[In reply to Andreu Botella: Even having all events with an internal async source propagate the causal context won't be easy, and we have already gotten pushback (from Domenic Denicola) saying that that will be too burdensome for folks working on web specs] yes, we've gotten this pushback from Anne van Kesteren as well. Together, they represent Chrome and Safari.
21:53	<Stephen Belanger>	Attributing something to boot or EventTarget creation implies you are interested in viewing things from the perspective of resource ownership. Through or causal flow is a more direct line to what triggered the thing.
21:54	<littledan>	[In reply to Stephen Belanger: Attributing something to boot or EventTarget creation implies you are interested in viewing things from the perspective of resource ownership. Through or causal flow is a more direct line to what triggered the thing.] sorry, I meant, just giving up 100% and saying, it's the global default snapshot
21:54	<littledan>	Stephen Belanger: What do you want us to do when you bring up, browsers should do causal flow, when Andreu is reporting that we have pushback from browsers on this, and you also say you don't actually care much about the browser case?
21:55	<Andreu Botella>	I think the right behavior is causal flow, but it seems unfeasible
21:55	<Stephen Belanger>	I think “giving up” is a valid decision given the complexity, though needs to be very clearly encoded and explained as “This is not the correct decision, this is just what we have now because solving this case is hard”
21:56	<Stephen Belanger>	Meaning, not painting ourselves into a corner with expecting that flow forever.
21:56	<Andreu Botella>	[In reply to Stephen Belanger: I think “giving up” is a valid decision given the complexity, though needs to be very clearly encoded and explained as “This is not the correct decision, this is just what we have now because solving this case is hard”] Yes, but the problem with that is that the web is not versioned, so if programs start to rely on the behavior now, that behavior will be locked forever
21:56	<Andreu Botella>	and it's even going to be hard to detect whether programs rely on that behavior
21:56	<Andreu Botella>	I don't think there are any good solutions
21:57	<littledan>	right, I don't see a way to avoid "painting ourselves into a corner"
21:57	<littledan>	we have to choose what the semantics are
21:57	<Stephen Belanger>	[In reply to littledan: Stephen Belanger: What do you want us to do when you bring up, browsers should do causal flow, when Andreu is reporting that we have pushback from browsers on this, and you also say you don't actually care much about the browser case?] I mean I care about the browser case. APM vendors maybe less so, but it’s a space we’re gradually starting to need to care about.
21:57	<littledan>	[In reply to Stephen Belanger: I mean I care about the browser case. APM vendors maybe less so, but it’s a space we’re gradually starting to need to care about.] OK sorry ignore the last clause and just focus on the first two
21:58	<Stephen Belanger>	It’s more just that servers have very different execution patterns from browsers, and if you define something only considering browser runtime semantics then you make it unusable for servers.
21:58	<littledan>	maybe you could be more concrete about a server case that you're worried about us defining the wrong semantics for?
22:00	<Stephen Belanger>	[In reply to Andreu Botella: Yes, but the problem with that is that the web is not versioned, so if programs start to rely on the behavior now, that behavior will be locked forever] Yes, that’s my concern with the “giving up” approach. If we can be sure that one would never expect an empty context, but it could be present in some scenarios, then it could be seen as just an absence of support for that scenario. But if people begin to expect an empty context in that case then it becomes contract.
22:02	<Stephen Belanger>	[In reply to littledan: maybe you could be more concrete about a server case that you're worried about us defining the wrong semantics for?] I’ve stated it before, as has Matteo. Users expect things to persist into temporally continuing code, not just call-recursive code. AsyncContext as it is now gets the call-recursive flow fine, but doesn’t flow out to get the temporally continuous context.
22:03	<littledan>	[In reply to Stephen Belanger: I’ve stated it before, as has Matteo. Users expect things to persist into temporally continuing code, not just call-recursive code. AsyncContext as it is now gets the call-recursive flow fine, but doesn’t flow out to get the temporally continuous context.] I know you and Matteo have stated this broad goal, but I think what we need to do is collect more concrete cases of variables and callsites where it's important that this handling be done
22:03	<Stephen Belanger>	Like we need to be able to create a span in a mysql query call, have that span flow out to the function that called it, and become the parent of the next span created in logically continuing code.
22:03	<Andreu Botella>	[In reply to Stephen Belanger: Yes, that’s my concern with the “giving up” approach. If we can be sure that one would never expect an empty context, but it could be present in some scenarios, then it could be seen as just an absence of support for that scenario. But if people begin to expect an empty context in that case then it becomes contract.] There are at least two cases (XHR and same-window `postMessage`) where things internal to Chrome need the context to be propagated asynchronously. So it wouldn't be using the empty context for all events with an async source. But that's probably not enough for developers to not begin to expect an empty context in all other cases
22:04	<littledan>	[In reply to Stephen Belanger: Like we need to be able to create a span in a mysql query call, have that span flow out to the function that called it, and become the parent of the next span created in logically continuing code.] Can this be done by mutating a span object that's held in an asynccontext variable? I thought that's what OTel already does
22:05	<Stephen Belanger>	No, because you lose the causality if you don’t actually have causal flow.
22:05	<Steve Hicks>	[In reply to littledan: Can this be done by mutating a span object that's held in an asynccontext variable? I thought that's what OTel already does] This is also what we're doing with our internal tracing framework
22:05	<littledan>	[In reply to Stephen Belanger: No, because you lose the causality if you don’t actually have causal flow.] can you elaborate on what you mean here?
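The "mutate a span object held in a context variable" approach can be sketched as follows. This is an illustration under assumptions, not OpenTelemetry's or any internal framework's actual code: `AsyncLocalStorage` stands in for an `AsyncContext.Variable`, and `startSpan`, `mysqlQuery`, and `handleRequest` are hypothetical names.

```javascript
const { AsyncLocalStorage } = require("node:async_hooks");

// Assumption: AsyncLocalStorage stands in for an AsyncContext.Variable
// whose value is a mutable per-trace record.
const traceVar = new AsyncLocalStorage();

// Hypothetical instrumentation: start a span parented on the most recent one,
// then record it on the shared object so later code in the same trace sees it.
function startSpan(name) {
  const trace = traceVar.getStore();
  const span = { name, parent: trace.lastSpan ? trace.lastSpan.name : null };
  trace.lastSpan = span;
  return span;
}

async function mysqlQuery() {
  startSpan("mysql.query");
  await Promise.resolve(); // pretend to wait for the database
}

async function handleRequest() {
  await mysqlQuery();
  // No context value flowed out of mysqlQuery; the shared object was mutated.
  return startSpan("render");
}

const result = traceVar.run({ lastSpan: null }, handleRequest);
result.then((span) => console.log(span)); // { name: 'render', parent: 'mysql.query' }
```

The context value itself never changes here; only the object it points to does, which is why this works without flow-through semantics.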
22:05	<Stephen Belanger>	I’ve provided examples before that if you, for example, do a Promise.all(…), have no way to know which branch you’re trying to merge back to.
22:06	<Stephen Belanger>	You’d have two branches trying to write to the same span and it would break.
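The `Promise.all(…)` case can be made concrete with a small sketch. It uses Node.js `AsyncLocalStorage` as a stand-in for an `AsyncContext.Variable` and shows only today's behaviour, where the continuation after the join keeps the caller's value. Which branch a flow-through model would pick is exactly the open question, and the sketch does not answer it.

```javascript
const { AsyncLocalStorage } = require("node:async_hooks");

// Assumption: AsyncLocalStorage stands in for an AsyncContext.Variable.
const spanVar = new AsyncLocalStorage();

// Each branch runs under its own span value.
function branch(name) {
  return spanVar.run(name, async () => {
    await Promise.resolve();
    return spanVar.getStore();
  });
}

async function main() {
  const inside = await Promise.all([branch("span-a"), branch("span-b")]);
  // After the join, the continuation sees the caller's value, not either branch.
  return { inside, afterJoin: spanVar.getStore() };
}

const result = spanVar.run("parent-span", main);
result.then((r) => console.log(r));
// { inside: [ 'span-a', 'span-b' ], afterJoin: 'parent-span' }
```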
22:06	<littledan>	but... wasn't your solution to choose arbitrarily among them anyway?
22:06	<Steve Hicks>	[In reply to Steve Hicks: This is also what we're doing with our internal tracing framework] And we prefer it over the flow-through model because of the lexical guarantees against context being nuked by a bad actor in the middle.
22:07	<Stephen Belanger>	You can only nuke flow-through if you mess with the context through a global graph, which it’s designed to generally not do.
22:08	<Stephen Belanger>	And you can cause the same problems with around flow right now.
22:08	<Stephen Belanger>	We’re doing exactly that right now with AsyncResource in Node.js messing with a global graph and causing problems for people.
22:09	<Stephen Belanger>	Around flow is not inherently safer.
22:09	<Stephen Belanger>	It’s actually less safe because you frequently need to bind out of the paths it gives you because most are only conditionally useful.
22:10	<Steve Hicks>	I think the ideal state to my mind is that we have a version of my spreadsheet where all 400 rows are filled out consistently, with each either being "registration time", "empty context", or some other combination involving more specific contexts, etc. And then polyfills would likely fall back on empty context when they can't access the correct one.
22:12	<Stephen Belanger>	Around flow also requires substantially more binding logic than through flow as through flow is in most cases just a continuation of the scope you are already in, while other semantics are continuously trying to return to a prior state out of the state it is in presently.
22:13	<Steve Hicks>	[In reply to Stephen Belanger: It’s actually less safe because you frequently need to bind out of the paths it gives you because most are only conditionally useful.] This is a good point, which demonstrates that the least safe thing of all is bad defaults. Any time anyone needs to break from the defaults, it leads to risk of (1) not doing it because they didn't realize they needed to, (2) doing it too much/inappropriately because it's subtle.
22:13	<littledan>	[In reply to Steve Hicks: I think the ideal state to my mind is that we have a version of my spreadsheet where all 400 rows are filled out consistently, with each either being "registration time", "empty context", or some other combination involving more specific contexts, etc. And then polyfills would likely fall back on empty context when they can't access the correct one.] sounds good, but it'd be easier to fill this out if we had some agreed-on examples of cases where we don't want to use the registration context and want to use something else in particular. Do you have a few off the top of your head?
22:15	<Steve Hicks>	Honestly, I almost never want registration context. setTimeout is about it.
22:15	<Steve Hicks>	XHR
22:16	<littledan>	so, idk, if you could go crazy and fill out the table with your opinions (maybe we could make different columns for us each to "vote") that'd be helpful
22:16	<littledan>	I don't know whether you want things to be null or something else, for the other cases
22:16	<littledan>	I was suggesting above, let's go crazy and make all events always be in the null context; I didn't really get a read for whether you liked that idea
22:17	<Steve Hicks>	sorry, yes, I think that's closer to what I'd prefer
22:18	<Steve Hicks>	though I'm not sure if that answer is quite nuanced enough to fly
22:18	<littledan>	what sort of nuance do you think would be good to correct?
22:23	<Steve Hicks>	A lot of events seem to be triggerable either by code or by user interaction. I would like to see the active context when the dispatch task was queued for the former case, and the empty context for the latter. This requires an understanding that the listener needs to be able to handle both empty and non-empty contexts, depending on how the event was triggered.
22:23	<Steve Hicks>	for polyfills, everything looks user-triggered
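The "dispatch context for programmatic triggers, empty context for user interaction" rule can be modelled with a toy task queue. Everything here is a hypothetical sketch: `AsyncLocalStorage` stands in for an `AsyncContext.Variable`, and `queueFromCode`, `queueFromUser`, and `drain` are illustrative names rather than platform APIs.

```javascript
const { AsyncLocalStorage } = require("node:async_hooks");

// Assumption: AsyncLocalStorage stands in for an AsyncContext.Variable.
const ctx = new AsyncLocalStorage();
const listeners = [];
const queue = [];

function addListener(fn) {
  listeners.push(fn);
}

// Programmatic trigger: remember the context active when the dispatch was queued.
function queueFromCode() {
  queue.push({ cause: ctx.getStore() });
}

// User interaction: there is no JavaScript cause, so use the empty context.
function queueFromUser() {
  queue.push({ cause: undefined });
}

// Run every queued dispatch, entering the recorded cause for each listener call.
function drain() {
  const results = [];
  for (const task of queue.splice(0)) {
    for (const fn of listeners) {
      results.push(ctx.run(task.cause, fn));
    }
  }
  return results;
}

// The listener is registered during bootstrap but never sees that context.
ctx.run("bootstrap", () => addListener(() => ctx.getStore() ?? "(empty)"));

ctx.run("save-button-flow", () => queueFromCode());
queueFromUser();

const seen = drain();
console.log(seen); // [ 'save-button-flow', '(empty)' ]
```

A polyfill that cannot observe the queuing call would only ever take the `queueFromUser` path, so the same listener has to cope with both outcomes.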
22:24	<littledan>	OK, if this is the only thing to follow (dispatch context vs empty context) it seems simple enough for me to drop my "this is also Zalgo" thing
22:24	<littledan>	also seems simple enough for browsers' concerns about "can we do this without having some huge understanding of tracing whenever handling any callback"
22:25	<littledan>	in Andreu's analysis, he identified many cases where the registration context just seemed like the right answer, beyond setTimeout. Did you have any thoughts on those?
22:25	<Steve Hicks>	I don't know which those were
22:25	<Andreu Botella>	anything before the event section in the web integration document
22:25	<Steve Hicks>	I think XHR might be one case where it's something other than dispatch-or-empty
22:26	<littledan>	can you look at Andreu's doc and identify which parts you agree and disagree with?
22:26	<Steve Hicks>	[In reply to littledan: can you look at Andreu's doc and identify which parts you agree and disagree with?] This is https://github.com/tc39/proposal-async-context/pull/100 ?
22:27	<Andreu Botella>	yeah
22:27	<littledan>	thanks
22:37	<Steve Hicks>	Observers can batch, so you can't get the causal context - I don't feel strongly between null vs registration, but probably we need to expose the cause via a mutation record property. I don't have enough context for action registrations to speak intelligently, but I think they are similar to events. Certainly web component lifecycle callbacks need to get a causally relevant (or fall back on null) snapshot, but I suspect something like mediaSession might also have a more relevant context, e.g. if you call `play()` programmatically or something
22:39	<Steve Hicks>	Async completion callbacks all seem good as they are - I generally see registration-time as a preferred default for callbacks that will run at most once, at a semi-predictable time. When a callback will be called more than once, indefinitely into the future, I think registration is the wrong default, though you may still need to opt back into it in some cases.
22:41	<littledan>	can you elaborate on which cases we should opt back into registration context?
22:42	<littledan>	for observers: do you think the causal context is necessary for the MVP, or is it OK if it's a "for-later" thing?
22:42	<littledan>	observers are a fun test of what we mean by "synchronous"!
22:49	<Andreu Botella>	for observers, since the cause would be a new property, that can be left for later without risk of breakage
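Why batching rules out a single causal context, and how a per-record property could carry the cause instead, can be sketched with a toy observer. `BatchingObserver` and the `cause` property are hypothetical and not taken from any spec; `AsyncLocalStorage` stands in for an `AsyncContext.Variable`, and delivering the batch in the empty context is just one of the options discussed.

```javascript
const { AsyncLocalStorage } = require("node:async_hooks");

// Assumption: AsyncLocalStorage stands in for an AsyncContext.Variable.
const ctx = new AsyncLocalStorage();

// Hypothetical observer that batches records and delivers them in one callback.
class BatchingObserver {
  constructor(callback) {
    this.callback = callback;
    this.records = [];
  }

  // Called synchronously by whatever changes the observed thing.
  record(change) {
    // `cause` is a hypothetical property name used for illustration only.
    this.records.push({ change, cause: ctx.getStore() });
  }

  // One delivery for the whole batch, run here in the empty context.
  flush() {
    const batch = this.records.splice(0);
    return ctx.run(undefined, () => this.callback(batch));
  }
}

const observer = new BatchingObserver((batch) => ({
  callbackContext: ctx.getStore(),
  causes: batch.map((r) => r.cause),
}));

ctx.run("flow-a", () => observer.record("add child"));
ctx.run("flow-b", () => observer.record("set attribute"));

const delivered = observer.flush();
console.log(delivered);
// { callbackContext: undefined, causes: [ 'flow-a', 'flow-b' ] }
```

Because the cause lives on each record rather than in the callback's own context, adding it later would not change what existing callbacks observe.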
23:30	<Chengzhong Wu>	[In reply to Steve Hicks: Yes, the place where this is coming up is with the root trace, where the bootstrap runs in a bootstrap trace, and then event handlers look for a missing trace to know to start a new one (rather than continue an existing one). It might be possible to explicitly zero out the trace when registering listeners - I've asked whether that's viable.] Is the trace on the event handler agnostic to event types? An implementation that instruments known event types can decide from the event type's semantics whether to create a new root span or a child span.
23:31	<Chengzhong Wu>	Like in OpenTelemetry, where the instrumentation needs to be aware of event type semantics to produce useful traces.