00:53
<littledan>
Now, the fact that this wrapping would make a handful of existing use cases impossible is clearly a problem, and points to why we need some way to access the causing context, rather than just the registration context. This could be as simple as a AsyncContext.Snapshot.noWrap that would prevent any automatic wrapping (or other implicit snapshot restoration that the VM would do via abstract operations), though it's not clear to me whether this would (or should!) provide the non-restoring async/await behavior (vs. just affecting what context callbacks are entered with the first time).
How would noWrap work?
00:54
<littledan>
Like it cancels out a wrap that happens around it?
01:42
<Steve Hicks>
yes, something like that - aside from being able to observe callingContext, wrapping is idempotent, so nowrap would maybe cancel all wraps that happen around it? not quite clear - it's just one option.
02:26
<littledan>
It feels kinda vaguely non-compositional to cancel out wrap; maybe better to avoid wrapping in the first place. Weren’t we discussing options for that? (But then you run into the “no null context” issue also)
02:27
<littledan>
I am not being very clear with noncompositional… I guess I mean, if you replace f with n => f(n), then noWrap stops working
02:27
<Steve Hicks>
not necessarily - it depends on how it's implemented.
02:29
<Steve Hicks>
If the implementation is to explicitly restore the previous snapshot, then it's robust to n => f(n), though it makes wrap no longer idempotent (i.e. wrap(wrap(unwrap(f))) == wrap(f)) and there's potentially weirdness around unwrap(unwrap(f)) == f or some such...
02:30
<Steve Hicks>
or if it popped off a stack instead of just swapping in the previous then it really would undo one layer for each
02:30
<Steve Hicks>
but it would only work from the inside - unwrap(wrap(f)) == wrap(f), which is surprising
02:32
<Steve Hicks>
I don't love those properties... but I think any observability of callingContext probably breaks idempotency of wrap, so maybe there's no way to keep that one?
08:07
<littledan>
Huh, how would it figure out the previous one at runtime? Currently you can just use a single global variable for the current snapshot.
08:36
<Stephen Belanger>
Whould you mind expanding on the distinguishing of child-of and follow-from? All promise resolutions are scheduled asynchronously, it seems to me that in this case "child-of" would be meaning-less in promise's case

The resolution is async, but we create a span around the thing that created the promise, so the child-of or follows-from relationship should already have been established at this point. When calling an async function, the portion up to the first await is called synchronously, so if we choose to wrap that part in a span then that activity should be considered a child-of relationship. However, anything after the first await is schedule asynchronously and so should be considered a continuation and therefore a follows-from relationship.

async function foo() {
  console.log('1')
  await Promise.resolve()
  console.log('3')
}
foo()
console.log('2')
08:42
<littledan>
Do OTel implementations in any other language with async/await make this distinction properly? If so it would be great to be able to refer to their implementation strategies.
08:42
<Stephen Belanger>
where can I read more about that pile of hacks? (maybe you already wrote about it somewhere?)
There's not really any writing on it presently, but basically because we don't flow into promise continuations we need to keep an object at the top-level of the request which we push spans into and basically guess what the parent is because, unlike with callbacks, the parent does not flow into the continuation. This means we have to look at what has been pushed into that list in the past and try to guess what the parent would be. Sometimes it's simply the last span in the list, but not always, so we need to do some additional analysis to figure out what was the last span which could be a realistic parent, which requires we also hold a bunch of the span metadata. Holding all this metadata is expensive given that we could have hundreds of thousands of concurrent spans. What we want is for the correct context to just be able to flow through the resolve path so we only need to store the IDs and can entirely eliminate that big bag of data which lives for the whole life of the request.
08:45
<Stephen Belanger>
I hope it's not necessary either, but I can see how Ste{v,ph}en's points argue for it to be necessary. (I'm having trouble understanding what you're supposed to do with currentContext besides reconstruct this array)
We don't need the list of all context historically, only a way at any given snapshot restore to elect to keep the prior value which already flowed in and additionally to be able to have a way to configure stores to capture resolve time of promises rather than register time.
08:51
<Stephen Belanger>
The "don't bind around await" case feels a lot like "just use a global variable" to me. To my mind, automatic binding around await is basically one of the main purposes of AC, so if you're not doing that, what are you gaining? Is it just the question of restoring context after awaiting an already-done promise? Assuming all awaited promises were pending when the await kicked off, then a simple model seems like doing nothing would "just work" since the continuation would follow immediately afterwards, so the global value would be correct. Of course this breaks down when the await doesn't actually have to wait - and maybe also when two independent jobs are running interleaved?

So a global variable is obviously a bit too naive to work, but where I'm struggling is to understand how an AC.Variable with mutable state falls short? That seems like it would address the issue of interleaved independent tasks, at least, and since (within the same task) you're presumably not clobbering the actual container variable, then that part shouldn't need to change after an await (and if it did, it could presumably be communicated back upstream). ISTR Stephen mentioning performance concerns, but it's important to compare this against the performance cost of additional overhead needed to maintain a more complex graph, or more different kinds of state that need extra work to fork/join.
It's not that we need no binds for async/await, it's that we want it to bind at the resolve point of the promise being awaited rather than at registration time directly at the point the await occurs. This would mean context would flow through promises rather than around them. It would also mean the restore which occurs after the await resumes would be that resolve point value rather than what the value was before the await occurred.
08:53
<Stephen Belanger>
To Stephen's point about rewriting code from callbacks to promise/await, I agree that they should be consistent, but I'd argue that it's a transient issue. Callback-based code should be updated to wrap the callbacks so that the behavior is the same. I understand that the fact that ALS has existed for a while with the other behavior makes this a bit of a problem... Ideally the standard Node APIs would have all been updated to wrap/bind when ALS first came out, before anyone began depending on the non-wrapping behavior.
It's intentional that it behaves that way with callbacks. That's how context management is supposed to work as it's not useful for most cases if it doesn't flow into continuations. That's literally the point of the "async" in AsyncLocalStorage.
08:54
<Stephen Belanger>
I'm not even sure what use there would be for the other way with callbacks. I don't see any use case for that flow. 🤔
08:58
<Stephen Belanger>
Do OTel implementations in any other language with async/await make this distinction properly? If so it would be great to be able to refer to their implementation strategies.
Yes, basically every other language flows this way with OTel, but other languages also provide things like bytecode manipulation so it's able to do this externally. With Node.js there is no solution.
08:58
<littledan>
Yes, basically every other language flows this way with OTel, but other languages also provide things like bytecode manipulation so it's able to do this externally. With Node.js there is no solution.
Other languages work the way you are proposing we adopt for JS? That is great to know. Can you suggest a concrete one for us to look into?
08:59
<littledan>
Bytecode manipulation isn’t magic; I would be surprised if we can’t do what’s achieved with it
09:00
<Stephen Belanger>
Not sure. Like I said, most do this externally to the runtime itself so I'm unsure if any particular language is a good example of it.
09:02
<Chengzhong Wu>
handleRequest((req, res) => {
  // ROOT SPAN0
  connection.query('SELECT 1 + 1 AS two') // SPAN1
    .then(a => {
      // PROPOSED CONTEXT: SPAN1.2
      // If the context.variable at this point is SPAN1.2, how do we determine
      // child-of or follow-from?
      // As proposed this should be a follow-from SPAN1.3 instead of SPAN2
      return connection.query('SELECT 2 + 2 AS four') // SPAN2
    })
    .then(b => {
      // PROPOSED CONTEXT: SPAN3.2
      res.writeHead(200, { 'Content-Type': 'text/plain' })
      res.end(a[0].two + b[0].four)
    })
})

function connectionQuery(query) {
  // Current Context: SPANx
  return makeFirstRequest() // SPANx.1
    .then(() => {
      makeSecondRequest() // SPANx.2
    })
}

In the middle of a promise chain, it reads to me that the proposed solution needs to depends on additional cost to determine a child-of relationship now

09:02
<littledan>
That’s OK if it’s external, if it implements the right semantics I am still happy to have that as the reference point
09:02
<Stephen Belanger>
Bytecode manipulation isn’t magic; I would be surprised if we can’t do what’s achieved with it
Much of the code we need to patch to do that in JS is not reachable at runtime. With bytecode you can manipulate everything, even private things.
09:03
<littledan>
Much of the code we need to patch to do that in JS is not reachable at runtime. With bytecode you can manipulate everything, even private things.
Sounds good, so can a language runtime, which is what we are talking about changing. It would help us to have a concrete reference point even if it is external/messy
09:04
<Stephen Belanger>
handleRequest((req, res) => {
  // ROOT SPAN0
  connection.query('SELECT 1 + 1 AS two') // SPAN1
    .then(a => {
      // PROPOSED CONTEXT: SPAN1.2
      // If the context.variable at this point is SPAN1.2, how do we determine
      // child-of or follow-from?
      // As proposed this should be a follow-from SPAN1.3 instead of SPAN2
      return connection.query('SELECT 2 + 2 AS four') // SPAN2
    })
    .then(b => {
      // PROPOSED CONTEXT: SPAN3.2
      res.writeHead(200, { 'Content-Type': 'text/plain' })
      res.end(a[0].two + b[0].four)
    })
})

function connectionQuery(query) {
  // Current Context: SPANx
  return makeFirstRequest() // SPANx.1
    .then(() => {
      makeSecondRequest() // SPANx.2
    })
}

In the middle of a promise chain, it reads to me that the proposed solution needs to depends on additional cost to determine a child-of relationship now

What do you mean by PROPOSED CONTEXT: SPAN1.2? What I'm expecting is for the context there to be a continuation of SPAN1.
09:05
<Stephen Belanger>
Sounds good, so can a language runtime, which is what we are talking about changing. It would help us to have a concrete reference point even if it is external/messy
I'll have to talk to my team about that and see what examples I can get from them as I'm really not he expert on other languages. 😅
09:07
<Chengzhong Wu>

The example is equivalent to async-await as:

handleRequest(async (req, res) => {
  // ROOT SPAN0
  await connection.query('SELECT 1 + 1 AS two') // SPAN1
  // PROPOSED CONTEXT: SPAN1.2
  await connection.query('SELECT 2 + 2 AS four') // SPAN2
  // PROPOSED CONTEXT: SPAN2.2
  res.writeHead(200, { 'Content-Type': 'text/plain' })
  res.end(a[0].two + b[0].four)
})

Is this the proposed behavior in https://github.com/tc39/proposal-async-context/issues/83?

09:12
<Stephen Belanger>
Again, I'm unclear what the decimal point thing is about here?
09:13
<littledan>
I'll have to talk to my team about that and see what examples I can get from them as I'm really not he expert on other languages. 😅
Sounds good, maybe we can all learn from this then
09:14
<Chengzhong Wu>
Again, I'm unclear what the decimal point thing is about here?
Decimal points represent child-of relations, increased number represets a follow-of relation.
09:16
<Stephen Belanger>
Ah, no. Only the first is a child-of because it occurs within the sync initial part of the async function.
09:16
<Stephen Belanger>
The second should be follows-from.
09:18
<Stephen Belanger>
The connection.query(...) expression is called before its result is awaited, and it's the calling of that which produces a span. That span is therefore within the synchronous scope of the http server request span.
09:18
<Stephen Belanger>
If there was some other await before it, even one which did not produce a span, then it would be considered follows-from.
09:19
<Chengzhong Wu>
yeah, so in a promise chain, child-of is no longer effective (only the first of a promise chain is sync) and I'm curious about how you create a graph with nesting semantics.
09:21
<Stephen Belanger>
That would look the same. The creation of that initial promise produces a sync/child-of span, and then anything in following then(...) continuations would be follows-from of whichever thing in the chain last produced a span.
09:22
<Stephen Belanger>
So from a user code behaviour perspective, callbacks, promises, and async/await would all produce the same graph.
09:23
<Stephen Belanger>
And in the less common case where a continuation is attached far away from where the promise is created, a manual bind could be used where necessary to express that further away logical path, if it makes sense.
09:23
<Stephen Belanger>
But generally from APM perspective we basically never want that further away path.
09:23
<Stephen Belanger>
Meaning register time of a promise continuation.
09:24
<Stephen Belanger>
We want resolve time basically universally.
09:24
<Stephen Belanger>
As do basically all userland context flow use cases I'm aware of. I'm very unclear what the actual use case is for register time binding. 😕
09:25
<littledan>
Sometimes there is no particular resolve time context to apply, and we just need to fall back to register time
09:25
<littledan>
If it goes to a browser/os primitive
09:27
<littledan>
There’s the general connection pooling case, which generalizes this
09:28
<littledan>
Generally to get child-of edges, don’t you want restore after await? Sync is not enough.
09:29
<Stephen Belanger>
Resolve time naturally flows out to eventually promise-creation time--the thing you called which created a promise.
09:29
<littledan>
What do you mean “flows out to”?
09:30
<Stephen Belanger>
Generally to get child-of edges, don’t you want restore after await? Sync is not enough.
Child-of edges are easy. They are just things happening synchronously within the current scope, so we don't need anything for that. Not sure why restoring matters here? 🤔
09:32
<Stephen Belanger>
What do you mean “flows out to”?
I mean the context flows up to some execution barrier where the JS-side work to be done on the scheduling side is done. With a promise, this is generally when the resolve happens. But if it's a native promise or something like that then none of the resolving stage happens within JS so you just follow the flow back upward until you get to a point in the causal that was in JS, which is when the original call happened.
09:33
<littledan>
yeah, so in a promise chain, child-of is no longer effective (only the first of a promise chain is sync) and I'm curious about how you create a graph with nesting semantics.
So it seems like we have a disagreement in this chat about this statement. How can we look into it further? Maybe analyzing some concrete cases?
09:33
<Stephen Belanger>
Creation of a promise is just an allocation. It's not controlling flow at that point. The actual scheduling of a promise is the providing of a value to resolve it with.
09:36
<Andreu Botella>
I mean the context flows up to some execution barrier where the JS-side work to be done on the scheduling side is done. With a promise, this is generally when the resolve happens. But if it's a native promise or something like that then none of the resolving stage happens within JS so you just follow the flow back upward until you get to a point in the causal that was in JS, which is when the original call happened.
in JS code that creates a promise and then resolves it, wouldn't this only be the case if there are no AsyncContext.Snapshot.p.run along the promise resolution path?
09:37
<Stephen Belanger>
in JS code that creates a promise and then resolves it, wouldn't this only be the case if there are no AsyncContext.Snapshot.p.run along the promise resolution path?
Not sure what you mean by that. Do you have an example?
09:39
<Stephen Belanger>
From my perspective: context flows into the promise executor, it flows into anything that happens within that executor, it then captures whenever the resolve(...) is called as that is what actually schedules passing the value to continuations. If the context value is set outside the promise constructor it will use that context. If it gets set somewhere within the executor leading up to when it calls resolve(...) it will use that. It all flows naturally toward the resolve point where it schedules the value to pass to continuations.
09:40
<Andreu Botella>
so IIUC, you're saying that in cases like await someAPI(), the resolve time is a child of the context before the await
09:40
<Andreu Botella>

but that is not the case with things like

const somePreviousSnapshot = new AsyncContext.Snapshot();

function someAPI() {
  return new Promise(resolve => {
    // I don't know why someone might do this, but they *can* do it.
    somePreviousSnapshot.run(resolve);
  });
}
09:41
<Stephen Belanger>
Now you could look at it the other way of attaching a continuation is the scheduling and the resolve is a fulfillment of that scheduling, but the fulfillment is not triggered by that attaching of a continuation so I don't really feel like that's actually correct. Possibly a useful path to follow in some cases, though I'm unclear when.
09:43
<Stephen Belanger>
Yeah, so the expression which produces the promise which will be awaited occurs within that context, so the context flows into it naturally. The context of that outer function should reach into the promise and flow all the way to its resolve, unless some other context is set along that path. When the promise resolves the following execution is then a continuation of that resolution and should therefore flow the context which was captured at the point that path resolved.
09:44
<Stephen Belanger>
In your example there, you captured a snapshot and ran that around the resolve, so that changes the context which should come out of the await.
09:47
<Stephen Belanger>
This is why I feel flowing up to the resolve barrier makes a lot more sense, and then not binding around awaits at all, or maybe having some secondary flow that could be captured for those. If you follow the resolve path you will naturally have the context from the start of the async function flow through all the awaits and continue to be available, only changing if something in that causal path changed the context around what led to their resolve.
09:48
<Stephen Belanger>
Binding around awaits is chopping off branches of the execution graph, which I'm unclear why anyone would want to do that by default. 🤔
09:49
<Andreu Botella>

but in cases like this, wouldn't this mess up tracing?

async function someFunctionYouCareAboutTracing() {
	const response1 = await fetch(...);
	await someAPI();  // this empties out the context after the await!
	const response2 = await fetch(...);
	return await doSomethingWith(response1, response2);
}
09:49
<Stephen Belanger>
Like I get that there seems to be some use case people have in mind for why it makes sense for them to bind and restore around awaits, but I still have not seen any adequate explanation on why anyone would ever want that. 😕
09:50
<Stephen Belanger>

but in cases like this, wouldn't this mess up tracing?

async function someFunctionYouCareAboutTracing() {
	const response1 = await fetch(...);
	await someAPI();  // this empties out the context after the await!
	const response2 = await fetch(...);
	return await doSomethingWith(response1, response2);
}
What about that would mess it up? Is someApi() explicitly emptying the context for some reason?
09:51
<Andreu Botella>
I meant the definition of someApi() above, which is
09:51
<Stephen Belanger>
Ah, you mean it's restoring the snapshot captured previously?
09:52
<Stephen Belanger>
Yeah, that would return to whatever the context was which was captured in that snapshot.
09:52
<Stephen Belanger>
So if you're using a snapshot in that way then that is expected.
09:53
<Stephen Belanger>
And this is also why I would like to have per-instance bind because sometimes it makes sense globally like connection pools, but sometimes opinions may vary.
09:53
<Andreu Botella>
maybe this isn't something worth worrying about, but someone might be using snapshots to do things around their AsyncContext.Variables not realizing that that might affect the context in which their promise gets resolved, which would affect other use libraries that use AsyncContext
09:54
<Andreu Botella>
also, if you have an interaction of various libraries that use AsyncContext for their separate goals, they might interact in unforeseen ways
09:54
<Stephen Belanger>
But yes, I believe it is "correct" for that to break context. Or rather, it might be incorrect that it is doing that snapshot restore.
09:54
<Stephen Belanger>
maybe this isn't something worth worrying about, but someone might be using snapshots to do things around their AsyncContext.Variables not realizing that that might affect the context in which their promise gets resolved, which would affect other use libraries that use AsyncContext
Yep, this is why Node.js hates AsyncResource. 🙈
09:56
<Stephen Belanger>
So in Node.js, async_hooks made the mistake of producing a singular execution graph which is grafted with AsyncResource binds. AsyncLocalStorage is layered over it so it inherits that mistake. We want it to not be layered over that so we can fix that problem.
09:57
<Andreu Botella>
I'm not sure I understand. Even if we do have per-instance wrap, you would still need global wraps, right?
09:57
<Stephen Belanger>
APMs (unfortunately) fairly often use AsyncResource to "fix" context management flows, but then we break other vendors because we change the graph they were expecting. 😬
09:58
<Stephen Belanger>
Yes, global is still needed.
09:58
<Stephen Belanger>
Just not always.
09:58
<Stephen Belanger>
And this is specifically why I created the WindowChannel concept. So different consumers could make different decisions in a reasonably user-friendly way.
09:59
<Andreu Botella>
So my worry is about a (maybe small) library that might not realize they're supposed to be doing per-instance wrap rather than global wrap, since it works fine in their tests
10:00
<Andreu Botella>
Or cases where a library might have multiple variables and they use global wrap to handle all of them at once
10:00
<Stephen Belanger>
I mean, that will be a problem. But without instance-scoped bind it's still a problem, just without a solution.
10:01
<Andreu Botella>
right, I thought making per-instance wrap part of the API was a convenience thing, or perhaps a worry about performance, but it's part of the affordance of the API design
10:01
<Stephen Belanger>
People are going to use APIs wrong. We can't stop that. At best we can try to make it clear when they're doing it wrong.
10:02
<Stephen Belanger>
I mean, you can just do it with a store.get() and a closure that does store.run(...) around the orignal function. So it's easy enough to do externally. But I personally feel it should be part of the actual API, for clarity.
10:02
<Andreu Botella>
yeah, I see that now
10:03
<Stephen Belanger>
Well, that's not completely true if we have the sync vs continuation differentiation, but that's another thing.
10:04
<Stephen Belanger>
That's also why in my RFC I was explicitly defining separate windows.
10:07
<Stephen Belanger>
With separate windows you could make the store.run(...) be a continuationWindow.run(...) instead and key the child-of versus follows-from decision off that.
10:10
<Stephen Belanger>
I think separating those concepts from user perspective might be a bit too much complication though. I'm definitely one that would advocate for making it actively difficult to use APIs wrong. It can be challenging to do, but given a large enough pool of users you will for sure encounter a bunch which find some way to use it wrong. 😅
10:11
<Stephen Belanger>
The less footguns we can give users the better. 🙈
12:26
<Stephen Belanger>
If we have instance-scoped bind, should we give the instance and global versions different but related names to make the intent a bit more clear? Something like instance.bind(...) and Variable.bindAllVariables(...) or something like that?
13:04
<Chengzhong Wu>

This per-instance bind sounds like a requirement after the behavior is defined as flowing out to the outer scope. In the current form, this example would not be a problem:

async function someFunctionYouCareAboutTracing() {
	const response1 = await fetch(...);
	await someAPI();  // the current context will not be modified by this call
	const response2 = await fetch(...);
	return await doSomethingWith(response1, response2);
}
13:06
<Chengzhong Wu>
the problem also seems to be what we want to avoid as dynamic scope
13:23
<littledan>
APMs (unfortunately) fairly often use AsyncResource to "fix" context management flows, but then we break other vendors because we change the graph they were expecting. 😬
Do you have an example of this sort of fix, and the costs and benefits of it, that we can look into?