00:53 | <littledan> | Now, the fact that this wrapping would make a handful of existing use cases impossible is clearly a problem, and points to why we need some way to access the causing context, rather than just the registration context. This could be as simple as a AsyncContext.Snapshot.noWrap that would prevent any automatic wrapping (or other implicit snapshot restoration that the VM would do via abstract operations), though it's not clear to me whether this would (or should!) provide the non-restoring async/await behavior (vs. just affecting what context callbacks are entered with the first time). |
00:54 | <littledan> | Like it cancels out a wrap that happens around it? |
01:42 | <Steve Hicks> | yes, something like that - aside from being able to observe callingContext, wrapping is idempotent, so nowrap would maybe cancel all wraps that happen around it? not quite clear - it's just one option. |
02:26 | <littledan> | It feels kinda vaguely non-compositional to cancel out wrap; maybe better to avoid wrapping in the first place. Weren’t we discussing options for that? (But then you run into the “no null context” issue also) |
02:27 | <littledan> | I am not being very clear with noncompositional… I guess I mean, if you replace f with n => f(n), then noWrap stops working |
02:27 | <Steve Hicks> | not necessarily - it depends on how it's implemented. |
02:29 | <Steve Hicks> | If the implementation is to explicitly restore the previous snapshot, then it's robust to n => f(n) , though it makes wrap no longer idempotent (i.e. wrap(wrap(unwrap(f))) == wrap(f) ) and there's potentially weirdness around unwrap(unwrap(f)) == f or some such... |
02:30 | <Steve Hicks> | or if it popped off a stack instead of just swapping in the previous then it really would undo one layer for each |
02:30 | <Steve Hicks> | but it would only work from the inside - unwrap(wrap(f)) == wrap(f) , which is surprising |
02:32 | <Steve Hicks> | I don't love those properties... but I think any observability of callingContext probably breaks idempotency of wrap , so maybe there's no way to keep that one? |
08:07 | <littledan> | Huh, how would it figure out the previous one at runtime? Currently you can just use a single global variable for the current snapshot. |
08:36 | <Stephen Belanger> | Whould you mind expanding on the distinguishing of child-of and follow-from? All promise resolutions are scheduled asynchronously, it seems to me that in this case "child-of" would be meaning-less in promise's case The resolution is async, but we create a span around the thing that created the promise, so the child-of or follows-from relationship should already have been established at this point. When calling an async function, the portion up to the first await is called synchronously, so if we choose to wrap that part in a span then that activity should be considered a child-of relationship. However, anything after the first await is schedule asynchronously and so should be considered a continuation and therefore a follows-from relationship.
|
08:42 | <littledan> | Do OTel implementations in any other language with async/await make this distinction properly? If so it would be great to be able to refer to their implementation strategies. |
08:42 | <Stephen Belanger> | where can I read more about that pile of hacks? (maybe you already wrote about it somewhere?) |
08:45 | <Stephen Belanger> | I hope it's not necessary either, but I can see how Ste{v,ph}en's points argue for it to be necessary. (I'm having trouble understanding what you're supposed to do with currentContext besides reconstruct this array) |
08:51 | <Stephen Belanger> | The "don't bind around await" case feels a lot like "just use a global variable" to me. To my mind, automatic binding around await is basically one of the main purposes of AC, so if you're not doing that, what are you gaining? Is it just the question of restoring context after awaiting an already-done promise? Assuming all awaited promises were pending when the await kicked off, then a simple model seems like doing nothing would "just work" since the continuation would follow immediately afterwards, so the global value would be correct. Of course this breaks down when the await doesn't actually have to wait - and maybe also when two independent jobs are running interleaved? |
08:53 | <Stephen Belanger> | To Stephen's point about rewriting code from callbacks to promise/await, I agree that they should be consistent, but I'd argue that it's a transient issue. Callback-based code should be updated to wrap the callbacks so that the behavior is the same. I understand that the fact that ALS has existed for a while with the other behavior makes this a bit of a problem... Ideally the standard Node APIs would have all been updated to wrap/bind when ALS first came out, before anyone began depending on the non-wrapping behavior. |
08:54 | <Stephen Belanger> | I'm not even sure what use there would be for the other way with callbacks. I don't see any use case for that flow. 🤔 |
08:58 | <Stephen Belanger> | Do OTel implementations in any other language with async/await make this distinction properly? If so it would be great to be able to refer to their implementation strategies. |
08:58 | <littledan> | Yes, basically every other language flows this way with OTel, but other languages also provide things like bytecode manipulation so it's able to do this externally. With Node.js there is no solution. |
08:59 | <littledan> | Bytecode manipulation isn’t magic; I would be surprised if we can’t do what’s achieved with it |
09:00 | <Stephen Belanger> | Not sure. Like I said, most do this externally to the runtime itself so I'm unsure if any particular language is a good example of it. |
09:02 | <Chengzhong Wu> |
In the middle of a promise chain, it reads to me that the proposed solution needs to depends on additional cost to determine a child-of relationship now |
09:02 | <littledan> | That’s OK if it’s external, if it implements the right semantics I am still happy to have that as the reference point |
09:02 | <Stephen Belanger> | Bytecode manipulation isn’t magic; I would be surprised if we can’t do what’s achieved with it |
09:03 | <littledan> | Much of the code we need to patch to do that in JS is not reachable at runtime. With bytecode you can manipulate everything, even private things. |
09:04 | <Stephen Belanger> |
PROPOSED CONTEXT: SPAN1.2 ? What I'm expecting is for the context there to be a continuation of SPAN1 . |
09:05 | <Stephen Belanger> | Sounds good, so can a language runtime, which is what we are talking about changing. It would help us to have a concrete reference point even if it is external/messy |
09:07 | <Chengzhong Wu> | The example is equivalent to async-await as:
Is this the proposed behavior in https://github.com/tc39/proposal-async-context/issues/83? |
09:12 | <Stephen Belanger> | Again, I'm unclear what the decimal point thing is about here? |
09:13 | <littledan> | I'll have to talk to my team about that and see what examples I can get from them as I'm really not he expert on other languages. 😅 |
09:14 | <Chengzhong Wu> | Again, I'm unclear what the decimal point thing is about here? |
09:16 | <Stephen Belanger> | Ah, no. Only the first is a child-of because it occurs within the sync initial part of the async function. |
09:16 | <Stephen Belanger> | The second should be follows-from. |
09:18 | <Stephen Belanger> | The connection.query(...) expression is called before its result is awaited, and it's the calling of that which produces a span. That span is therefore within the synchronous scope of the http server request span. |
09:18 | <Stephen Belanger> | If there was some other await before it, even one which did not produce a span, then it would be considered follows-from. |
09:19 | <Chengzhong Wu> | yeah, so in a promise chain, child-of is no longer effective (only the first of a promise chain is sync) and I'm curious about how you create a graph with nesting semantics. |
09:21 | <Stephen Belanger> | That would look the same. The creation of that initial promise produces a sync/child-of span, and then anything in following then(...) continuations would be follows-from of whichever thing in the chain last produced a span. |
09:22 | <Stephen Belanger> | So from a user code behaviour perspective, callbacks, promises, and async/await would all produce the same graph. |
09:23 | <Stephen Belanger> | And in the less common case where a continuation is attached far away from where the promise is created, a manual bind could be used where necessary to express that further away logical path, if it makes sense. |
09:23 | <Stephen Belanger> | But generally from APM perspective we basically never want that further away path. |
09:23 | <Stephen Belanger> | Meaning register time of a promise continuation. |
09:24 | <Stephen Belanger> | We want resolve time basically universally. |
09:24 | <Stephen Belanger> | As do basically all userland context flow use cases I'm aware of. I'm very unclear what the actual use case is for register time binding. 😕 |
09:25 | <littledan> | Sometimes there is no particular resolve time context to apply, and we just need to fall back to register time |
09:25 | <littledan> | If it goes to a browser/os primitive |
09:27 | <littledan> | There’s the general connection pooling case, which generalizes this |
09:28 | <littledan> | Generally to get child-of edges, don’t you want restore after await? Sync is not enough. |
09:29 | <Stephen Belanger> | Resolve time naturally flows out to eventually promise-creation time--the thing you called which created a promise. |
09:29 | <littledan> | What do you mean “flows out to”? |
09:30 | <Stephen Belanger> | Generally to get child-of edges, don’t you want restore after await? Sync is not enough. |
09:32 | <Stephen Belanger> | What do you mean “flows out to”? |
09:33 | <littledan> | yeah, so in a promise chain, child-of is no longer effective (only the first of a promise chain is sync) and I'm curious about how you create a graph with nesting semantics. |
09:33 | <Stephen Belanger> | Creation of a promise is just an allocation. It's not controlling flow at that point. The actual scheduling of a promise is the providing of a value to resolve it with. |
09:36 | <Andreu Botella> | I mean the context flows up to some execution barrier where the JS-side work to be done on the scheduling side is done. With a promise, this is generally when the resolve happens. But if it's a native promise or something like that then none of the resolving stage happens within JS so you just follow the flow back upward until you get to a point in the causal that was in JS, which is when the original call happened. AsyncContext.Snapshot.p.run along the promise resolution path? |
09:37 | <Stephen Belanger> | in JS code that creates a promise and then resolves it, wouldn't this only be the case if there are no |
09:39 | <Stephen Belanger> | From my perspective: context flows into the promise executor, it flows into anything that happens within that executor, it then captures whenever the resolve(...) is called as that is what actually schedules passing the value to continuations. If the context value is set outside the promise constructor it will use that context. If it gets set somewhere within the executor leading up to when it calls resolve(...) it will use that. It all flows naturally toward the resolve point where it schedules the value to pass to continuations. |
09:40 | <Andreu Botella> | so IIUC, you're saying that in cases like await someAPI() , the resolve time is a child of the context before the await |
09:40 | <Andreu Botella> | but that is not the case with things like
|
09:41 | <Stephen Belanger> | Now you could look at it the other way of attaching a continuation is the scheduling and the resolve is a fulfillment of that scheduling, but the fulfillment is not triggered by that attaching of a continuation so I don't really feel like that's actually correct. Possibly a useful path to follow in some cases, though I'm unclear when. |
09:43 | <Stephen Belanger> | Yeah, so the expression which produces the promise which will be awaited occurs within that context, so the context flows into it naturally. The context of that outer function should reach into the promise and flow all the way to its resolve, unless some other context is set along that path. When the promise resolves the following execution is then a continuation of that resolution and should therefore flow the context which was captured at the point that path resolved. |
09:44 | <Stephen Belanger> | In your example there, you captured a snapshot and ran that around the resolve, so that changes the context which should come out of the await. |
09:47 | <Stephen Belanger> | This is why I feel flowing up to the resolve barrier makes a lot more sense, and then not binding around awaits at all, or maybe having some secondary flow that could be captured for those. If you follow the resolve path you will naturally have the context from the start of the async function flow through all the awaits and continue to be available, only changing if something in that causal path changed the context around what led to their resolve. |
09:48 | <Stephen Belanger> | Binding around awaits is chopping off branches of the execution graph, which I'm unclear why anyone would want to do that by default. 🤔 |
09:49 | <Andreu Botella> | but in cases like this, wouldn't this mess up tracing?
|
09:49 | <Stephen Belanger> | Like I get that there seems to be some use case people have in mind for why it makes sense for them to bind and restore around awaits, but I still have not seen any adequate explanation on why anyone would ever want that. 😕 |
09:50 | <Stephen Belanger> |
someApi() explicitly emptying the context for some reason? |
09:51 | <Andreu Botella> | I meant the definition of someApi() above, which is |
09:51 | <Stephen Belanger> | Ah, you mean it's restoring the snapshot captured previously? |
09:52 | <Stephen Belanger> | Yeah, that would return to whatever the context was which was captured in that snapshot. |
09:52 | <Stephen Belanger> | So if you're using a snapshot in that way then that is expected. |
09:53 | <Stephen Belanger> | And this is also why I would like to have per-instance bind because sometimes it makes sense globally like connection pools, but sometimes opinions may vary. |
09:53 | <Andreu Botella> | maybe this isn't something worth worrying about, but someone might be using snapshots to do things around their AsyncContext.Variable s not realizing that that might affect the context in which their promise gets resolved, which would affect other use libraries that use AsyncContext |
09:54 | <Andreu Botella> | also, if you have an interaction of various libraries that use AsyncContext for their separate goals, they might interact in unforeseen ways |
09:54 | <Stephen Belanger> | But yes, I believe it is "correct" for that to break context. Or rather, it might be incorrect that it is doing that snapshot restore. |
09:54 | <Stephen Belanger> | maybe this isn't something worth worrying about, but someone might be using snapshots to do things around their AsyncResource . 🙈 |
09:56 | <Stephen Belanger> | So in Node.js, async_hooks made the mistake of producing a singular execution graph which is grafted with AsyncResource binds. AsyncLocalStorage is layered over it so it inherits that mistake. We want it to not be layered over that so we can fix that problem. |
09:57 | <Andreu Botella> | I'm not sure I understand. Even if we do have per-instance wrap, you would still need global wraps, right? |
09:57 | <Stephen Belanger> | APMs (unfortunately) fairly often use AsyncResource to "fix" context management flows, but then we break other vendors because we change the graph they were expecting. 😬 |
09:58 | <Stephen Belanger> | Yes, global is still needed. |
09:58 | <Stephen Belanger> | Just not always. |
09:58 | <Stephen Belanger> | And this is specifically why I created the WindowChannel concept. So different consumers could make different decisions in a reasonably user-friendly way. |
09:59 | <Andreu Botella> | So my worry is about a (maybe small) library that might not realize they're supposed to be doing per-instance wrap rather than global wrap, since it works fine in their tests |
10:00 | <Andreu Botella> | Or cases where a library might have multiple variables and they use global wrap to handle all of them at once |
10:00 | <Stephen Belanger> | I mean, that will be a problem. But without instance-scoped bind it's still a problem, just without a solution. |
10:01 | <Andreu Botella> | right, I thought making per-instance wrap part of the API was a convenience thing, or perhaps a worry about performance, but it's part of the affordance of the API design |
10:01 | <Stephen Belanger> | People are going to use APIs wrong. We can't stop that. At best we can try to make it clear when they're doing it wrong. |
10:02 | <Stephen Belanger> | I mean, you can just do it with a store.get() and a closure that does store.run(...) around the orignal function. So it's easy enough to do externally. But I personally feel it should be part of the actual API, for clarity. |
10:02 | <Andreu Botella> | yeah, I see that now |
10:03 | <Stephen Belanger> | Well, that's not completely true if we have the sync vs continuation differentiation, but that's another thing. |
10:04 | <Stephen Belanger> | That's also why in my RFC I was explicitly defining separate windows. |
10:07 | <Stephen Belanger> | With separate windows you could make the store.run(...) be a continuationWindow.run(...) instead and key the child-of versus follows-from decision off that. |
10:10 | <Stephen Belanger> | I think separating those concepts from user perspective might be a bit too much complication though. I'm definitely one that would advocate for making it actively difficult to use APIs wrong. It can be challenging to do, but given a large enough pool of users you will for sure encounter a bunch which find some way to use it wrong. 😅 |
10:11 | <Stephen Belanger> | The less footguns we can give users the better. 🙈 |
12:26 | <Stephen Belanger> | If we have instance-scoped bind, should we give the instance and global versions different but related names to make the intent a bit more clear? Something like instance.bind(...) and Variable.bindAllVariables(...) or something like that? |
13:04 | <Chengzhong Wu> | This per-instance bind sounds like a requirement after the behavior is defined as flowing out to the outer scope. In the current form, this example would not be a problem:
|
13:06 | <Chengzhong Wu> | the problem also seems to be what we want to avoid as dynamic scope |
13:23 | <littledan> | APMs (unfortunately) fairly often use |