20:56 | <Steve Hicks> | I was talking with our tracing folks today (both from the tracing side and the web frameworks side) and had broad agreement that they generally just want event handlers to do as little as possible in terms of snapshot/restore. I told them about my concerns with polyfilling these "async causes" (i.e. programmatic JS that triggers an event to be queued for dispatch in a future microtask) and the consensus was that, for any events we tend to care about, it's basically always more accurate to run listeners in the root/empty context rather than the registration context, and not to worry about a more-accurate programmatic causal context that we can't get access to. |
20:58 | <Steve Hicks> | But with our current polyfill (which dispatches events in the registration context), they're running into real problems where event listeners are registered in the "application bootstrap context" and they really want stronger guarantees that that context will completely disappear when bootstrapping is complete. |
21:05 | <littledan> | [EDIT: Sorry I misread what you wrote above, you already did address this] people have this intuition of "do as little as possible" or "disappear" but I don't think they mean the null context, I think they mean some sort of causal context, which we've discussed the difficulty of defining. Did you raise this difficulty with the requesting teams? |
21:05 | <littledan> | or maybe you're saying, they are happy with the null context? |
21:18 | <Andreu Botella> | If we do that, and at some later point we realize there's some existing event with an async source that needs the async dispatch context to be propagated, that won't be able to be changed |
21:18 | <Andreu Botella> | we wouldn't even be able to have use counters |
21:19 | <Steve Hicks> | what do you mean by "use counters"? |
21:20 | <Steve Hicks> | I'm not sure I'm saying that async events never dispatch with a causal context. |
21:22 | <Andreu Botella> | Chromium and Firefox (probably also WebKit, but I don't know that for sure) have telemetry where, if a page uses some combination of features, that usage gets recorded, to assess e.g. how likely a change in semantics is to affect websites |
21:24 | <Andreu Botella> | you could have use counters for uses of
|
21:24 | <Andreu Botella> | hm... |
21:24 | <Andreu Botella> | I guess it's not impossible to track, but it might add implementation complexity |
21:26 | <Andreu Botella> | okay, you could track all uses of contexts derived from an async event without a lot of complexity, but not from a specific async event |
21:26 | <littledan> | Andreu, I think you're jumping ahead a few steps too many when you're talking about this future compatibility risk; let's focus on figuring out their needs first |
21:27 | <littledan> | Steve Hicks: Are we talking about the span id/trace id? Is this just one variable? Could you say more about how/when it's initialized, and how "application bootstrap time" becomes misleading compared to being identifiably "null"? |
21:28 | <Steve Hicks> | Yes, the place where this is coming up is with the root trace, where the bootstrap runs in a bootstrap trace, and then event handlers look for a missing trace to know to start a new one (rather than continue an existing one). It might be possible to explicitly zero out the trace when registering listeners - I've asked whether that's viable. |
21:30 | <Justin Ridgewell> | setInterval ? I think null-context is the least useful choice for users. It’s simple enough for them to clear the bootstrap context before adding event listeners, but we shouldn’t force everyone else to wrap their event handlers because of this case. |
21:31 | <Justin Ridgewell> | The GC use case prevents us from ever using a causal context in events. |
21:31 | <Steve Hicks> | Registration context is simply wrong for many, many events. There may be a more useful causal context that's available, and we should use it when possible, even if we can't polyfill it. |
21:32 | <littledan> | or is it that they really do want a causal trace? |
21:32 | <Steve Hicks> | My understanding is that the code running addEventListener is largely out of our control - third party libraries and whatnot |
21:33 | <littledan> | maybe patch addEventListener then? |
21:34 | <littledan> | concrete examples of cases where it is wrong are extremely helpful! Now we have one with bootstrap traces. If we can assemble more, it'll be very useful. |
21:34 | <Steve Hicks> | I think my point was that when causal context is unavailable, empty context is the best alternative - both for lossy polyfills and for standards |
21:35 | <littledan> | sure, I guess I'm just trying to understand the details of their requirements, and then we can think of the various possible ways to meet them |
21:35 | <littledan> | rather than reaching a "point" yet |
21:36 | <littledan> | A new piece of information for me today was that there's a desire to trace how long bootstrap takes, and so this is what makes the registration context wrong. I did not understand that phenomenon before this conversation. |
21:36 | <littledan> | so this is very helpful |
21:36 | <Steve Hicks> | In particular, I've been patching addEventListener for my polyfill, and it's super subtle. Also, short-term patches are nonviable because removeEventListener. So I think we need to put that out of our heads as a remotely viable suggestion. |
21:37 | <littledan> | In particular, I've been patching addEventListener for my polyfill, and it's super subtle. Also, short-term patches are nonviable because removeEventListener |
21:37 | <littledan> | and what kind of flexibility you have about how you use them |
21:38 | <Steve Hicks> | d3 and tanstack router were two that were named |
21:39 | <Steve Hicks> | and it's not so much that we want to "trace through" them, so much as that they're involved in the app and all it takes is one library not doing it right to mess everything up - this is why the right defaults are critical |
21:43 | <Stephen Belanger> | Node.js never patches events, and this is very intentional. Event emitters themselves are not async, they inherit asynchrony from what publishes to them. Therefore it’s actually that triggering thing you need to propagate context to and leave event emitters alone to just continue in the context the dispatch happens in. |
21:43 | <Stephen Belanger> | In web platform that triggering thing can be internals, so that needs to be taken into consideration. |
21:44 | <Stephen Belanger> | But it should be treated as a propagation of the web platform thing, not a propagation of event emitters. |
21:45 | <Stephen Belanger> | Which means event emitters should not be expected to have “consistent” behaviour, because the conditions in which they are triggered are not consistent. |
21:45 | <Andreu Botella> | propagating the context through everything async in the web platform is simply not feasible at this point in time, which is why I'm talking about async sources for events (which in your terms is async internals that eventually triggers the event) |
21:45 | <Andreu Botella> | that way we can focus only on the async internals that are observable |
21:46 | <Stephen Belanger> | It doesn’t need to propagate fully through it necessarily. Only to the extent that is observable from JavaScript. (At least at this point, anyway…) |
21:47 | <Stephen Belanger> | I do want to have it actually accessible everywhere though as I would really like for profiler samples to capture current context state. |
21:47 | <Stephen Belanger> | But, as you say, that’s a large effort. |
21:47 | <littledan> | so, how would it be if EventTargets had a snapshot associated with them (say, the default for the whole document, so the global one) and ran all of their events there? |
21:48 | <littledan> | and if there's another relevant snapshot for an event, we pass that in a property |
21:49 | <littledan> | we have to define the web semantics completely, we can't say "whatever falls out" |
21:49 | <Stephen Belanger> | Binding to the initialization of an EventTarget? I can see how that could be useful for resource attribution, but would not work for the application tracing case without needing to rebind everything to the execution flow path (aka through flow) |
21:50 | <littledan> | there are two problems we're discussing: the one Steven H raised (avoid misattribution to the bootstrap phase) and this one you're raising (do all the tracing causally) |
21:51 | <Stephen Belanger> | I think they’re related problems. |
21:52 | <Stephen Belanger> | The attribution depends on the flow you’re expecting. |
21:52 | <littledan> | sure, the perfect solution to everything also solves the easier version that Steven described where it's OK for us to cheat and use the null context |
21:53 | <littledan> | Even having all events with an internal async source propagate the causal context won't be easy, and we have already gotten pushback (from Domenic Denicola) saying that that will be too burdensome for folks working on web specs |
21:53 | <Stephen Belanger> | Attributing something to boot or EventTarget creation implies you are interested in viewing things from the perspective of resource ownership. Through or causal flow is a more direct line to what triggered the thing. |
21:54 | <littledan> | Stephen Belanger: What do you want us to do when you bring up, browsers should do causal flow, when Andreu is reporting that we have pushback from browsers on this, and you also say you don't actually care much about the browser case? |
21:55 | <Andreu Botella> | I think the right behavior is causal flow, but it seems unfeasible |
21:55 | <Stephen Belanger> | I think “giving up” is a valid decision given the complexity, though needs to be very clearly encoded and explained as “This is not the correct decision, this is just what we have now because solving this case is hard” |
21:56 | <Stephen Belanger> | Meaning, not painting ourselves into a corner with expecting that flow forever. |
21:56 | <Andreu Botella> | and it's even going to be hard to detect whether programs rely on that behavior |
21:56 | <Andreu Botella> | I don't think there are any good solutions |
21:57 | <littledan> | right, I don't see a way to avoid "painting ourselves into a corner" |
21:57 | <littledan> | we have to choose what the semantics are |
21:57 | <littledan> | I mean I care about the browser case. APM vendors maybe less so, but it’s a space we’re gradually starting to need to care about. |
21:58 | <Stephen Belanger> | It’s more just that servers have very different execution patterns from browsers, and if you define something only considering browser runtime semantics then you make it unusable for servers. |
21:58 | <littledan> | maybe you could be more concrete about a server case that you're worried about us defining the wrong semantics for? |
22:00 | <Stephen Belanger> | Yes, but the problem with that is that the web is not versioned, so if programs start to rely on the behavior now, that behavior will be locked forever |
22:03 | <littledan> | I’ve stated it before, as has Matteo. Users expect things to persist into temporally continuing code, not just call-recursive code. AsyncContext as it is now gets the call-recursive flow fine, but doesn’t flow out to get the temporally continuous context. |
22:03 | <Stephen Belanger> | Like we need to be able to create a span in a mysql query call, have that span flow out to the function that called it, and become the parent of the next span created in logically continuing code. |
22:03 | <Andreu Botella> | Yes, that’s my concern with the “giving up” approach. If we can be sure that one would never expect an empty context, but it could be present in some scenarios, then it could be seen as just an absence of support for that scenario. But if people begin to expect an empty context in that case then it becomes contract. postMessage ) where things internal to Chrome need the context to be propagated asynchronously. So it wouldn't be using the empty context for all events with an async source. But that's probably not enough for developers to not begin to expect an empty context in all other cases |
22:05 | <Stephen Belanger> | No, because you lose the causality if you don’t actually have causal flow. |
22:05 | <Steve Hicks> | Can this be done by mutating a span object that's held in an asynccontext variable? I thought that's what OTel already does |
22:05 | <Stephen Belanger> | I’ve provided examples before that if you, for example, do a Promise.all(…), you have no way to know which branch you’re trying to merge back to. |
22:06 | <Stephen Belanger> | You’d have two branches trying to write to the same span and it would break. |
22:06 | <littledan> | but... wasn't your solution to choose arbitrarily among them anyway? |
22:06 | <Steve Hicks> | This is also what we're doing with our internal tracing framework |
22:07 | <Stephen Belanger> | You can only nuke flow-through if you mess with the context through a global graph, which it’s designed to generally not do. |
22:08 | <Stephen Belanger> | And you can cause the same problems with around flow right now. |
22:08 | <Stephen Belanger> | We’re doing exactly that right now with AsyncResource in Node.js messing with a global graph and causing problems for people. |
22:09 | <Stephen Belanger> | Around flow is not inherently safer. |
22:09 | <Stephen Belanger> | It’s actually less safe because you frequently need to bind out of the paths it gives you because most are only conditionally useful. |
22:10 | <Steve Hicks> | I think the ideal state to my mind is that we have a version of my spreadsheet where all 400 rows are filled out consistently, with each either being "registration time", "empty context", or some other combination involving more specific contexts, etc. And then polyfills would likely fall back on empty context when they can't access the correct one. |
22:12 | <Stephen Belanger> | Around flow also requires substantially more binding logic than through flow as through flow is in most cases just a continuation of the scope you are already in, while other semantics are continuously trying to return to a prior state out of the state it is in presently. |
22:15 | <Steve Hicks> | Honestly, I almost never want registration context. setTimeout is about it. |
22:15 | <Steve Hicks> | XHR |
22:16 | <littledan> | so, idk, if you could go crazy and fill out the table with your opinions (maybe we could make different columns for us each to "vote") that'd be helpful |
22:16 | <littledan> | I don't know whether you want things to be null or something else, for the other cases |
22:16 | <littledan> | I was suggesting above, let's go crazy and make all events always be in the null context; I didn't really get a read for whether you liked that idea |
22:17 | <Steve Hicks> | sorry, yes, I think that's closer to what I'd prefer |
22:18 | <Steve Hicks> | though I'm not sure if that answer is quite nuanced enough to fly |
22:18 | <littledan> | what sort of nuance do you think would be good to correct? |
22:23 | <Steve Hicks> | A lot of events seem to be triggerable either by code or by user interaction. I would like to see the active context when the dispatch task was queued for the former case, and the empty context for the latter. This requires an understanding that the listener needs to be able to handle both empty and non-empty contexts, depending on how the event was triggered. |
22:23 | <Steve Hicks> | for polyfills, everything looks user-triggered |
22:24 | <littledan> | OK, if this is the only thing to follow (dispatch context vs empty context) it seems simple enough for me to drop my "this is also Zalgo" thing |
22:24 | <littledan> | also seems simple enough for browsers' concerns about "can we do this without having some huge understanding of tracing whenever handling any callback" |
22:25 | <littledan> | in Andreu's analysis, he identified many cases where the registration context just seemed like the right answer, beyond setTimeout. Did you have any thoughts on those? |
22:25 | <Steve Hicks> | I don't know which those were |
22:25 | <Andreu Botella> | anything before the event section in the web integration document |
22:25 | <Steve Hicks> | I think XHR might be one case where it's something other than dispatch-or-empty |
22:26 | <littledan> | can you look at Andreu's doc and identify which parts you agree and disagree with? |
22:27 | <Andreu Botella> | yeah |
22:27 | <littledan> | thanks |
22:39 | <Steve Hicks> | Async completion callbacks all seem good as they are - I generally see registration-time as a preferred default for callbacks that will run at most once, at a semi-predictable time. When it will call more than once, indefinitely into the future, I think registration is the wrong default, though you may still need to opt back into it in some cases. |
22:41 | <littledan> | can you elaborate on which cases we should opt back into registration context? |
22:42 | <littledan> | for observers: do you think the causal context is necessary for the MVP, or is it OK if it's a "for-later" thing? |
22:42 | <littledan> | observers are a fun test of what we mean by "synchronous"! |
22:49 | <Andreu Botella> | for observers, since the cause would be a new property, that can be left for later without risk of breakage |
23:31 | <Chengzhong Wu> | Like in OpenTelemetry, you need to be aware of event type semantics to produce useful traces. |
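
For readers following the bootstrap-trace problem discussed above, here is a minimal sketch of how the two dispatch policies differ. It is only an illustration under stated assumptions: `ToyVariable`, `ToyTarget`, `Policy`, and `observe` are hypothetical names invented for this example, the variable is a synchronous stand-in rather than the real AsyncContext API, and dispatch happens synchronously instead of in a later task.

```typescript
// Toy, synchronous stand-in for a context variable (assumption: NOT the real
// AsyncContext API, just enough to show which value a listener observes).
class ToyVariable<T> {
  private current: T | undefined = undefined;

  // Run fn with the variable set to value, restoring the old value afterwards.
  run<R>(value: T | undefined, fn: () => R): R {
    const saved = this.current;
    this.current = value;
    try {
      return fn();
    } finally {
      this.current = saved;
    }
  }

  get(): T | undefined {
    return this.current;
  }
}

// The two listener policies compared in the conversation.
type Policy = "registration" | "empty";

const trace = new ToyVariable<string>();

// Hypothetical miniature event target that remembers the trace that was
// active when each listener was registered.
class ToyTarget {
  private listeners: { fn: () => void; registeredTrace: string | undefined }[] = [];

  constructor(private policy: Policy) {}

  addListener(fn: () => void): void {
    this.listeners.push({ fn, registeredTrace: trace.get() });
  }

  dispatch(): void {
    for (const l of this.listeners) {
      // "registration" restores the snapshot taken in addListener;
      // "empty" runs the listener with no trace at all.
      const value = this.policy === "registration" ? l.registeredTrace : undefined;
      trace.run(value, l.fn);
    }
  }
}

// Registers a listener during a "bootstrap" trace, dispatches after bootstrap
// has finished, and reports which trace the listener ended up using.
function observe(policy: Policy): string {
  const target = new ToyTarget(policy);
  let seen = "";
  trace.run("bootstrap", () => {
    target.addListener(() => {
      // The handler starts a new root trace only when no trace is present.
      seen = trace.get() ?? "new-root-trace";
    });
  });
  target.dispatch(); // bootstrap is over; think of this as a user click
  return seen;
}

console.log(observe("registration"));
console.log(observe("empty"));
```

Under the registration policy the listener still sees the bootstrap trace long after bootstrapping has finished, which is the misattribution described above; under the empty policy it sees no trace and starts a new root one. The sketch does not model the "dispatch context for programmatic triggers" variant, since that needs a notion of queued tasks that this toy leaves out.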