05:14
<Steve Hicks>
The bad actor changing your value is only a problem if you explicitly give them the store and let them do that. If you just keep your stores private this is not a real problem.
I don't think that's right. Simply restoring a global snapshot is enough. If the subtask you're awaiting resolves its promise in a context that didn't derive from the one that was active when you called it, then you've lost your state even though it didn't have access to your variable.
05:24
<Steve Hicks>
So I would say the generally encouraged way to do binds should be instance-scoped by default and global bind should only ever be a "Are you sure you know what you're doing?" type of API for the power-user cases like module authors making sure their resource pool will not leak implementation details that would never be relevant to user code execution flow. Pool mechanisms I would say are almost universally okay to bind globally, but almost every other scenario is a matter of opinion and should (at least in my opinion) probably not bind at all by default and always follow that path through internals because otherwise you end up with these strange flows like with async/await not flowing through awaits the way most users seem to expect.
I found this initially surprising, but I'm wondering if this actually makes more sense under the "flow through await" scenario. With preserve-around-await semantics and default registration-time binding, you end up needing a bunch of global binds just to get reasonable behavior. Again, my axiom here is that application code and library/framework code shouldn't need to be aware of each other's variables in order to do the right thing. With preserve-around-await, global binds seems like generally the thing you need to uphold that axiom. But with flow-through-await, I can imagine that maybe that's no longer required, provided you're not somehow picking up promises that came from vastly different contexts, which seems generally unlikely to happen in most common situations.
08:15
<Stephen Belanger>
I don't think that's right. Simply restoring a global snapshot is enough. If the subtask you're awaiting resolves its promise in a context that didn't derive from the one that was active when you called it, then you've lost your state even though it didn't have access to your variable.

Well, yes, global snapshots are bad, which is why they should be discouraged except when absolutely necessary. If you flow through rather than around things then it's generally most advisable to actually bind-per store to only do graph reductions where needed whereas trying to bind around things all over the place takes way more binds and is often inescapable.

It's a lot better to just let things flow through by default and then provide some additional tools to reduce the graph where necessary. In general cases this is just the bind method, but for awaits it might make sense to have a store option to make it auto-bind on awaits or something like that. I personally feel it makes a lot more sense for await binds to be an option rather than a default we need to find a way out of, but I don't care too much either way, so long as the tools can do what is needed in a reasonable way.

10:58
<Chengzhong Wu>
I think the confusion in the issue was caused by the difference of AsyncLocalStorage behavior around the timing of enabling promise hooks. Enabling promise hooks before and after a promise creation/resolution would change how AsyncLocalStorage propagates a value is a problem on either semantics.
11:28
<Stephen Belanger>
It's not just the timing of enabling PromiseHook. It is that the portion of an async function before the first await behaves differently from the rest of the function because the context capture will have already happened around the outer await before passing over the inner awaits. It's confusing to users that the context is not flowing out.
11:29
<Stephen Belanger>
Regardless of timing inconsistencies, the user is clearly expecting that they should be able to retrieve the context set after an await within the inner async function from the outer async function (or module scope, in this case).
11:44
<littledan>
Wait, the init hook isn’t flowing out, it is a third thing
12:06
<littledan>
why are global snapshots bad?
12:06
<Chengzhong Wu>
It's not just the timing of enabling PromiseHook. It is that the portion of an async function before the first await behaves differently from the rest of the function because the context capture will have already happened around the outer await before passing over the inner awaits. It's confusing to users that the context is not flowing out.
The proposal as it current stands is maintaining the lexical consistent value inside a async function body across await for AsyncContext.Variable, which is part of the approach aiming to address the dynamic scope concerns from the previous meeting's feedback.
12:10
<littledan>
It happens to use enterWith(...) rather than run(...), but the confusion around expected flow is effectively the same.
What? I think "restore after await" would equally fix that bug (though agree that flows-out works here as well)
12:10
<Stephen Belanger>
why are global snapshots bad?
They're not bad exactly, just should not be the default tool people reach for as it means they are influencing the graphs of every store which, more often than not (in my experience with ALS users), is not actually what you want.
12:11
<Stephen Belanger>
There should be clearly communicated difference between "This is a universally applicable binding point." and "I want my context to flow in this way."
12:13
<Stephen Belanger>
The proposal as it current stands is maintaining the lexical consistent value inside a async function body across await for AsyncContext.Variable, which is part of the approach aiming to address the dynamic scope concerns from the previous meeting's feedback.
Yes, and I still don't understand why we are going for lexical scope given that, as far as I've seen, everyone is expecting it to follow execution flow.
12:13
<Stephen Belanger>
I don't know what the concerns were with dynamic scope. Is information that captured somewhere?
12:14
<littledan>
Also, I'd really appreciate if people reconsidered much of what James M Snell was was saying in https://github.com/nodejs/node/issues/46262. As far as people that understand the issues of context flow, he's one of very few others I'd trust to understand this stuff, having done a bunch of work on the Cloudflare equivalent of AsyncLocalStorage.
Yeah, James was working closely with us when he filed that issue, and the current spec reflects what he is proposing with respect to rejection. I may have misunderstood his comments on resolving (as you can see in the thread); I guess we should get back in touch to understand his point of view better.
12:15
<Stephen Belanger>
What? I think "restore after await" would equally fix that bug (though agree that flows-out works here as well)
It is doing "restore after await" as that is how PromiseHook works. That's precisely the problem here is that the user is assuming the context value will flow out of the awaited promises into the scope in which it was awaited.
12:15
<littledan>
we are definitely not doing lexical scope in general. There are definitely cases where some other relevant snapshot is used.
12:15
<littledan>
people have suggested things like "how about all callbacks just automatically close over the context" and this is broken and not what is proposed.
12:16
<Stephen Belanger>
Yeah, it seems like the flow is entirely reasonable with callbacks. I just don't understand why logically equivalent flows with promises and async/await don't flow in the same way. That seems extremely confusing to me.
12:16
<Stephen Belanger>
people have suggested things like "how about all callbacks just automatically close over the context" and this is broken and not what is proposed.
Broken in what way?
12:17
<littledan>
well, as you've been saying, sometimes it loses important information
12:17
<Stephen Belanger>
Any callback for an async task should flow context through it.
12:17
<littledan>
but also sometimes the registration context is the most relevant one (often it's the only possible one)
12:18
<Stephen Belanger>
But to be clear I mean through and not around, which is what binding does rather than just capturing at the edges.
12:19
<Stephen Belanger>
In Node.js we intentionally push capturing to as close to the edges as possible, meaning binding on the internal AsyncWrap type callbacks, so context flows through all the internals to get there.
12:19
<Stephen Belanger>
We could have bound the user callbacks directly instead, but then we would miss all that internal behaviour which may be relevant.
12:20
<littledan>

I've asked two questions that I don't yet know the answer to:

  • How are we handling merges, given that both contexts are often relevant (and in general this forms a big tree--there are more than two relevant things)?
  • How could per-variable handling scale, when you have a lot of different libraries and a lot of different variables and they're all supposed to fit together? It's easier for me to understand "two classes of variables".
12:20
<Stephen Belanger>
but also sometimes the registration context is the most relevant one (often it's the only possible one)
I have never found that with ALS. The appropriate place to do "resolve" path does get pushed a bit out sometimes though. Like if the resolve is in C++ internals somewhere then it gets pushed back to whatever call ran the C++ code which led to that resolve.
12:21
<littledan>
I have never found that with ALS. The appropriate place to do "resolve" path does get pushed a bit out sometimes though. Like if the resolve is in C++ internals somewhere then it gets pushed back to whatever call ran the C++ code which led to that resolve.
For example, in setTimeout, there's nothing to do but restore the snapshot from when it was previously called
12:21
<Stephen Belanger>
Everything always has some cause somewhere.
12:22
<littledan>
sure, the cause is often the registration time
12:22
<littledan>
so we're not necessarily disagreeing on the semantics
12:22
<littledan>
you wouldn't want internals of the scheduler to be understood as the cause, though--that would scramble information
12:22
<Stephen Belanger>
It's not that setTimeout needs to be registration time. It's that registration and scheduling time happen at the same time in some case. But it's not the registering of a callback which you actually care about, it's the path that led to it being called.
12:23
<littledan>
It's not that setTimeout needs to be registration time. It's that registration and scheduling time happen at the same time in some case. But it's not the registering of a callback which you actually care about, it's the path that led to it being called.
sure, yes I agree these are the semantics we are going for. As I've been saying, the "registration time vs call time" framing is not helpful. Instead, it's more like the organizing principle is, "preserve relevant information".
12:24
<littledan>
which sometimes means "registration time" in the terms we've been using
12:24
<littledan>
let's think about better ways to call this; there's just no debate about what setTimeout does
12:25
<Andreu Botella>
the cases where this distinction matters are cases where the "scheduling time" is at the start of the agent before any JS code can run – in which case the agent is necessarily empty, and necessarily preserves no relevant information
12:25
<littledan>
we should not be in the business of running anything with an empty context ever IMO. That would preclude many important context use cases.
12:26
<Stephen Belanger>

Yep, so I've found the way to do that effectively is to look specifically at where synchronous execution stops and tasks are stored to return to later in some way. In Node.js we create handle objects and, generally, the creation of those objects captures context and the callback is registered at that same point.

The reason we capture at this specific point though is because we don't have native context. If we did then the actual correct point would be where it queues the kernel task and should flow all the way up to that point and then capture context to restore when the kernel task comes back. For complete context flow you want to be as close to those edges as possible.

12:27
<Andreu Botella>
we should not be in the business of running anything with an empty context ever IMO. That would preclude many important context use cases.
I assume you mean we shouldn't be running anything that could not possibly have any data flow from the initial execution of the script or later, right?
12:27
<Andreu Botella>
because the initial execution will have an empty context, and that context might flow to places
12:28
<Stephen Belanger>
Everything in the process has a cause and therefore should always have a parent to flow from. The top-level flows from the fact of the process starting, which maybe doesn't have a "value" per-se, but I think that is probably the clearest expression of never having an "empty" context.
12:28
<littledan>
because the initial execution will have an empty context, and that context might flow to places
oh, sure. to make what I'm saying concrete, I mean: the host should be able to put some internal variables in that initial context, and then be able to rely on their presence; JS itself should never be creating an empty context.
12:29
<littledan>
Everything in the process has a cause and therefore should always have a parent to flow from. The top-level flows from the fact of the process starting, which maybe doesn't have a "value" per-se, but I think that is probably the clearest expression of never having an "empty" context.
cool, sounds like we're all agreeing, "JS shouldn't create empty contexts". This doesn't resolve the "two causes" point which your document so clearly lays out.
12:31
<Stephen Belanger>
Context is basically just a mirror of the directed acyclic graph expressing the flow of the code of the entire process. That then has reductions applied to carve paths around any irrelevant implementation details. Some things are universally irrelevant, like you'll want microtasks to flow context in a fairly universal way, but as things get to higher levels of abstraction from the instructions running on the cpu the more path decisions become subjective.
12:32
<littledan>

I've asked two questions that I don't yet know the answer to:

  • How are we handling merges, given that both contexts are often relevant (and in general this forms a big tree--there are more than two relevant things)?
  • How could per-variable handling scale, when you have a lot of different libraries and a lot of different variables and they're all supposed to fit together? It's easier for me to understand "two classes of variables".
still looking forward to answers to these questions
12:32
<Stephen Belanger>
cool, sounds like we're all agreeing, "JS shouldn't create empty contexts". This doesn't resolve the "two causes" point which your document so clearly lays out.
Are you referring to call versus continuation context in my doc?
12:32
<littledan>
yes
12:32
<littledan>
(I think)
12:32
<Stephen Belanger>
Ah, I've said a few times that particular point is far less critical than the ability to flow through awaits.
12:33
<Stephen Belanger>
The call versus continuation context thing is basically just differentiating between the sync code within a store.run(...) from anything it has propagated that context into.
12:33
<littledan>
in the terms you set up, it sounds like you're saying, "we should switch from call to continuation when it comes to await"
12:34
<littledan>
and what I'm missing is the understanding of why that's not a meaningful loss of information (given that it's useful for establishing a certain sense of parentage)
12:34
<Stephen Belanger>
We create a span and store it in the context with store.run(span, ...). If inside the function given to that we then immediately create another span because some other instrumented function was called synchronously, we would consider that a child-of relationship and link that accordingly. But if it happened within an async continuation then it is a follows-from relationship. So we differentiate the sync code from the continuations.
12:35
<Stephen Belanger>
We can do that easily enough by just wrapping the store runs with a global flag like this: https://gist.github.com/Qard/6ceaca8bb792679e82c7693513baee0e/#file-span-js-L66-L70
12:36
<Andreu Botella>

wait, if we go with resolution context, and we don't change how the current API works otherwise, we'd run into this. Is this what we want?

await asyncVar.run("foo", async () => {
    await doSomethingHere();
});

console.log(asyncVar.get()) // "foo"
12:37
<Stephen Belanger>
So it's not critical that the API does the whole two paths thing. Like I've said before, what is in my doc is essentially the ideal from APM perspective, but quite possibly has things which shouldn't actually exist at language level and should just be done externally. But we're aiming for a balance that anything we need to do externally is as low-cost as possible, because what we have right now is very expensive.
12:38
<Stephen Belanger>

wait, if we go with resolution context, and we don't change how the current API works otherwise, we'd run into this. Is this what we want?

await asyncVar.run("foo", async () => {
    await doSomethingHere();
});

console.log(asyncVar.get()) // "foo"
Yes. That is actually what users keep telling me they are expecting.
12:38
<Andreu Botella>
Yes. That is actually what users keep telling me they are expecting.
that is a very different behavior from the sync case
12:38
<littledan>
I know about the expense of PromiseHooks and patching all the libraries (which we all definitely want to resolve); what kind of expense is related to this issue that you're talking about now?
12:39
<Stephen Belanger>
The issue posted earlier is a user expecting that, and I get people making the same complaint about ALS someone or other every other week or so.
12:39
<littledan>
I'm pretty sure if we'll get a bunch of complaints in the other direction if you change it; this is just a difficult feature to have any intuition for
12:39
<littledan>
(that's not an argument one way or another0
12:40
<Stephen Belanger>
I know about the expense of PromiseHooks and patching all the libraries (which we all definitely want to resolve); what kind of expense is related to this issue that you're talking about now?
Just the general expense of all the stuff we have to do. Like I pointed out before that we presently need to keep a list of all spans in-memory from the lifetime of the whole request as we don't know how to do follows-from relationships otherwise.
12:40
<Andreu Botella>
function syncCase() {
    asyncVar.run("foo", () => {
        doSomethingHere()
    });

    console.log(asyncVar.get());  // undefined
}

async function asyncCase() {
    await asyncVar.run("foo", async () => {
        await doSomethingHere()
    });

    console.log(asyncVar.get());  // "foo"
}
12:41
<Stephen Belanger>
that is a very different behavior from the sync case
It is, but it's identical to how the same code rewritten with callbacks flows, and that's the confusion users are running into. We keep seeing people rewriting their apps with promises and then being confused when the context doesn't flow the way they're used to anymore.
12:41
<littledan>
Just the general expense of all the stuff we have to do. Like I pointed out before that we presently need to keep a list of all spans in-memory from the lifetime of the whole request as we don't know how to do follows-from relationships otherwise.
oh right, this part. I don't get it, can't you offload that processing to happen later? or this would require a change in the Otel protocol?
12:42
<Stephen Belanger>
It would require a change in the OTel spec which they've basically said unequivocally no to ever changing. 😐️
12:42
<Andreu Botella>
It is, but it's identical to how the same code rewritten with callbacks flows, and that's the confusion users are running into. We keep seeing people rewriting their apps with promises and then being confused when the context doesn't flow the way they're used to anymore.
sure, people rewriting code from callbacks to promises might prefer resolution time, but what about people writing async/await code from scratch? Do we know what they need? Would we be breaking them?
12:43
<littledan>
It is, but it's identical to how the same code rewritten with callbacks flows, and that's the confusion users are running into. We keep seeing people rewriting their apps with promises and then being confused when the context doesn't flow the way they're used to anymore.
is this always the form of this sort of migration, or do people also have this confusion with greenfield promise apps?
12:43
<Stephen Belanger>
Well, as I said the other day, if we follow resolution path that follows a causation path from the creation of the promise so we still actually get the same context flow unless someone goes and changes the context in the middle.
12:43
<littledan>
It would require a change in the OTel spec which they've basically said unequivocally no to ever changing. 😐️
Can you say more about the interactions with upstream OTel folks? Has anyone offered to make an efficient open-source collector to actually take advantage of this protocol improvement?
12:44
<Andreu Botella>
in web APIs there are cases where you can get a promise from somewhere else, that wasn't created in a data flow starting from the current function
12:45
<Stephen Belanger>
is this always the form of this sort of migration, or do people also have this confusion with greenfield promise apps?
I see the same confusion with people doing this with new projects too. Moreso from conversions, but also our customers tend to be enterprises that have ancient codebases they've been maintaining forever, so there's a clear bias to who we're talking to regularly.
12:45
<littledan>
and all customer complaints are private? Can you talk with any of them and ask them if they'd be willing to be in touch with us?
12:46
<Stephen Belanger>
Can you say more about the interactions with upstream OTel folks? Has anyone offered to make an efficient open-source collector to actually take advantage of this protocol improvement?
Can't be done in the collector because the data is needed in-process for distributed tracing headers wherever outgoing activity may occur.
12:48
<Stephen Belanger>
in web APIs there are cases where you can get a promise from somewhere else, that wasn't created in a data flow starting from the current function
True, but then it wasn't caused by that function so probably should still be keeping the registration context. And in any case this is what bind is for--in exceptional cases a user (or possibly even the runtime sometimes) can decide to apply a bind to capture the context at a different point.
12:48
<littledan>
Can't be done in the collector because the data is needed in-process for distributed tracing headers wherever outgoing activity may occur.
maybe I'm using the wrong terms, but it sounds like you had some idea for an improvement which was rejected. What was the idea, and how was it proposed?
12:49
<Andreu Botella>
how would you bind a promise?
12:49
<Stephen Belanger>
and all customer complaints are private? Can you talk with any of them and ask them if they'd be willing to be in touch with us?
Yes, I'm trying to talk with some of them right now. Getting time from enterprises to talk about things like this can be a bit time-consuming so we'll see how that goes...
12:49
<littledan>
Yes, I'm trying to talk with some of them right now. Getting time from enterprises to talk about things like this can be a bit time-consuming so we'll see how that goes...
thanks, I appreciate that
12:51
<Stephen Belanger>
maybe I'm using the wrong terms, but it sounds like you had some idea for an improvement which was rejected. What was the idea, and how was it proposed?
I've had some conversations with them about the current shape of their context management system. It's not multi-tenant and instead carries everything in one big immutably-cloned map tree thing, which adapts very poorly to AsyncLocalStorage, but they refuse to change it because spec says it should work this certain way and seem to believe they got everything right on the first try so nothing can ever change. 😐️
12:51
<Stephen Belanger>
The API is super convoluted.
12:52
<Stephen Belanger>
The context is a map, which you make clones of every time you attempt to modify it, but modifying it doesn't change what the active value is, it just changes what's in your copy. Then you can eventually activate it and it will give you a token to deactivate it at some later point.
12:53
<littledan>
...that sounds analogous semantically to what we have
12:53
<Stephen Belanger>
Sort of...but inside out.
12:54
<littledan>
where can I learn more about their context system and the discussion around changing it?
12:54
<Stephen Belanger>
You pass around the contexts yourself rather than the system managing it for you.
12:54
<Andreu Botella>
True, but then it wasn't caused by that function so probably should still be keeping the registration context. And in any case this is what bind is for--in exceptional cases a user (or possibly even the runtime sometimes) can decide to apply a bind to capture the context at a different point.
The APIs I'm thinking of are things like https://developer.mozilla.org/en-US/docs/Web/API/ReadableStreamDefaultReader/closed, or https://developer.mozilla.org/en-US/docs/Web/API/FontFaceSet/ready. I don't see how those could possibly made to work with resolution time, unless every user knows to work around them
12:57
<Stephen Belanger>

The ReadableStream closed thing seems like it would flow from whatever caused the close. Likely the final interaction, or if none is visible than just the creation of the stream itself.

The second seems to me like clearly the context would just be the root context as it causally flows from the document loading, as far as I can tell?

12:58
<Andreu Botella>
regardless of where they flow from, it's almost certainly not from the function in which they're awaited
12:58
<Andreu Botella>
so wouldn't that lose the context before the await?
12:59
<Stephen Belanger>
Probably, but that's what bind is for. 🀷
13:00
<littledan>
You pass around the contexts yourself rather than the system managing it for you.
sure, so, what's the problem in practice with this? does it actually differ in what can be expressed?
13:00
<Stephen Belanger>
Could also possibly do some kind of special-casing that things which would flow from somewhere that doesn't have a context value could auto-bind differently or something like that.
13:00
<littledan>
Probably, but that's what bind is for. 🀷
we should anticipate that most people don't know how to use bind most of the time and just won't use it, IMO.
13:01
<Stephen Belanger>
It's single-tenant and you pass the entire map around. It's basically like if only AsyncContext.Snapshot existed and AsyncContext.Variable did not.
13:01
<littledan>
OK, so OTel just uses one variable, that seems fine to me
13:02
<littledan>
I'm not yet understanding where things mismatch
13:02
<littledan>
did you ever explain this mismatch to them in writing?
13:02
<Stephen Belanger>
It's not that it's just one variable. It's multiple variables, but they're all living in a single map you have to pass around yourself.
13:03
<Stephen Belanger>
did you ever explain this mismatch to them in writing?
In Slack conversations.
13:04
<Stephen Belanger>
The problem is in the mutable cloning thing. You're basically passing around many clones and you "activate" a particular clone at some point in time, so it's essentially like you're passing around a bunch of these snapshots all over the place manually, but still doing the same step of making it the "active" value it has stored globally.
13:06
<littledan>
The problem is in the mutable cloning thing. You're basically passing around many clones and you "activate" a particular clone at some point in time, so it's essentially like you're passing around a bunch of these snapshots all over the place manually, but still doing the same step of making it the "active" value it has stored globally.
I don't see how that differs meaningfully from what we have
13:06
<Stephen Belanger>
Rather than holding values separately it stores them all together in this giant map and then lets you pass it around yourself, which means you have to be sure the flow is correct for all the types of data within the store or you can cause problems.
13:07
<shaylew>
we should anticipate that most people don't know how to use bind most of the time and just won't use it, IMO.
(that or only kind of know how to use bind, and end up using it defensively because they don't know which promises they can't await without breaking their context... meaning they break anything that relies on the causal-but-not-scoped propagation out of an await)
13:07
<littledan>
(that or only kind of know how to use bind, and end up using it defensively because they don't know which promises they can't await without breaking their context... meaning they break anything that relies on the causal-but-not-scoped propagation out of an await)
yes it would be horrible if people felt like they had to use bind defensively everywhere... what would the point be
13:09
<Stephen Belanger>
I don't see how that differs meaningfully from what we have
The main difference is the activation token thing. You don't replace the context, you activate any given context and get a token back when you do which you can call another API at some later point to deactivate it.
13:10
<Stephen Belanger>
It doesn't have any sort of store.run(...) scope, it just sets it for linear time and you manage it yourself.
13:10
<littledan>
so are there any usages of this API which are not "well-balanced"?
13:11
<littledan>
also at what level does this API exist? Is it just conceptual, or is it an actual thing in code which must be implemented per spec?
13:11
<Stephen Belanger>
It's an actual thing in code, and it's left to the user so there is most certainly misuses of it in the wild.
13:14
<littledan>
I see. Where is this implemented?
13:20
<James M Snell>
Sorry I missed a large chunk of this conversation but I can expand a bit more on the await/resolve/reject issue. The key problem is sometimes what you really want is *both*. That is, sometimes after a promise settles, what you want is whatever the context was before you started waiting for it (resolve or reject), and sometimes what you want is whatever the context was when it was settled (resolve or reject). And sometimes you might actually want both at the same time
13:21
<littledan>
Sorry I missed a large chunk of this conversation but I can expand a bit more on the await/resolve/reject issue. The key problem is sometimes what you really want is *both*. That is, sometimes after a promise settles, what you want is whatever the context was before you started waiting for it (resolve or reject), and sometimes what you want is whatever the context was when it was settled (resolve or reject). And sometimes you might actually want both at the same time
Yeah, I can see how both are relevant. But also, you need to be right there to "catch" both of them before the next context inheritance happens (and presumably you'd choose one of them at that point), right?
13:24
<Stephen Belanger>
I see. Where is this implemented?
All over the place, sadly. It starts here and then gets "activated" all over the place in instrumentation code or user code in various ways.
13:25
<Stephen Belanger>
It's concept of "multi-tenancy" is just using symbol keys into the one big map.
13:25
<James M Snell>
Well, the choosing bit is difficult. Users don't always have the info the decide which to choose
13:26
<Stephen Belanger>
Yeah, the choosing part is specifically why I created the diagnostics channel integration.
13:26
<Stephen Belanger>
And have been pushing for something similar with my universal context management RFC.
13:27
<Stephen Belanger>
With WindowChannel.
13:28
<Stephen Belanger>
I'm referring to the docs in this comment specifically.
13:29
<littledan>
So it's not critical that the API does the whole two paths thing. Like I've said before, what is in my doc is essentially the ideal from APM perspective, but quite possibly has things which shouldn't actually exist at language level and should just be done externally. But we're aiming for a balance that anything we need to do externally is as low-cost as possible, because what we have right now is very expensive.
James M Snell: would be great to get your take on this comment, outlining a possible policy to deal with the two contexts
13:30
<shaylew>
so are there any usages of this API which are not "well-balanced"?
(See the warnings about rust tracing::Span in async code https://docs.rs/tracing/latest/tracing/span/struct.Span.html#in-asynchronous-code for some consequences of this sort of design in the wild -- but they're better off because Futures are expressive enough to write Future::instrument without needing built-in help)
13:33
<Stephen Belanger>
James M Snell: would be great to get your take on this comment, outlining a possible policy to deal with the two contexts
These are two different issues. What James was referring to is the difference between register versus resolve path. What I was referring to was not a path thing at all but rather just the different points in context flow of the sync function a value is initially set for and the continuations that value flows into.
13:35
<Stephen Belanger>
There is a function given to store.run(...) which runs synchronously. Any async tasks scheduled within that function capture the value, but it's a different form of the value in that it was explicitly set for that scope not propagated to that scope.
13:38
<James M Snell>
I think I'll have to digest the conversation here a bit more to comment adequately. Give me a bit of time as my morning is just starting here and I'm finishing up a fairly busy on call week with a few remaining tasks. Should be able to jump in again later this morning
14:10
<James M Snell>
There's a lot in this thread to catch up on. Stephen Belanger can you indulge me with a bit of a tl;dr on your second issue here... the "What I was referring to..." part. Just want to make sure I'm groking what you're saying before I respond
14:17
<Stephen Belanger>

So I've been working on a plan for Tracing Channel splitting it internally into two separate "WindowChannel" types. One represents the call and the other represents the continuation. The same concept seemed at least relevant here so I wanted to share the doc.

From OpenTelemetry perspective we have two types of span relationships: child-of and follows-from for sync and async children respectively. To tell when the span in a context is sync or async we can split the store.run(...) into a "call window" around the sync phase of running the function given to it and the then separate "continuation windows" around each continuation. This describes the context behaviour at the two points a bit differently so "call window" describes it as a currently executing context to express that the wrapped function is still executing and should treat nested things as child-of related, and a "continuation window" describes it as a propagated context to express that the original task function ran at some point in the past and the value for it in the context is now representing that the task is done and things may now want to link to it as a follows-from relationship.

14:20
<Stephen Belanger>
This could easily enough be done by just wrapping the store.run(...) to set a global flag around the given function to tell it if its currently running or not, but it keeps getting brought up for some reason even though I've been trying to explain a few times now that I'm not really concerned about that particular design idea but rather about the flowing around awaits rather than through them issue.
14:21
<Stephen Belanger>
Intuitively people are expecting context of promise code to flow similarly to equivalent callback code, but this is not the case as we bind to register time rather than resolve.
14:23
<James M Snell>
I want to make sure I'm understanding the distinction correctly between "flowing around" vs "flowing through" awaits. Do you have a quick code example to illustrate it. I think I know what you're saying just need to make sure
14:24
<James M Snell>
even if just pseudocode
14:33
<James M Snell>
And I'm sorry if you aleady did provide that example earlier... there's a lot of convo ^^ previously so difficult to find what was already discussed
14:49
<Stephen Belanger>
const store = new AsyncLocalStorage()

async function foo() {
  store.getStore() // 2
}

async function main() {
  await store.run(2, foo)
  store.getStore() // 1 or 2?
}

store.run(1, main)
14:49
<Stephen Belanger>
So in this example, 1 is flowing around and 2 is flowing through.
14:51
<Stephen Belanger>

Comparatively, here's the equivalent with callbacks:

const store = new AsyncLocalStorage()

function foo(cb) {
  store.getStore() // 2
  setImmediate(cb)
}

function main() {
  store.run(2, foo, () => {
    store.getStore() // clearly should be 2
  })
}

store.run(1, main)
14:54
<Stephen Belanger>
Flowing around is essentially just local variable scope which, to me, doesn't really seem to have much value. I've asked a few times for an explanation or some use case why anyone would want that flow but have not yet got any answer on that. πŸ€”
14:55
<Andreu Botella>
Flowing around is essentially just local variable scope which, to me, doesn't really seem to have much value. I've asked a few times for an explanation or some use case why anyone would want that flow but have not yet got any answer on that. πŸ€”
it would be lexical scope within a function, but it propagates across function calls in a way that lexical scope doesn't
14:55
<littledan>
Flowing around is essentially just local variable scope which, to me, doesn't really seem to have much value. I've asked a few times for an explanation or some use case why anyone would want that flow but have not yet got any answer on that. πŸ€”
this is a straw man, no one is proposing fully lexically scoped semantics, but rather sticking with how AsyncLocalStorage currently works
14:57
<Stephen Belanger>
this is a straw man, no one is proposing fully lexically scoped semantics, but rather sticking with how AsyncLocalStorage currently works
I'm aware this is how AsyncLocalStorage presently works, and I have also stated that a bunch of times that how it works now should not be taken as any suggestion of correctness as it mostly certainly is not and has numerous well-known issues which have been discussed at length in the past.
14:58
<Stephen Belanger>
it would be lexical scope within a function, but it propagates across function calls in a way that lexical scope doesn't
Yep, I understand how it flows, I just don't understand why anyone would ever want it to flow in that particular way. As I said, I still have not had any specific use case explained to me for why we elected to flow it in that way.
14:58
<James M Snell>
Got it. and yeah, that's what I suspected you meant :-) ... the answer is: whether it should be 1 or 2 depends on the storage cell is being used for. That is, In some cases what you want is 1 and others what you want is 2 . Even in your second example there using the callback, there are use cases where store.getStore() within the callback should not "clearly be 2".
14:58
<Stephen Belanger>
In the case of AsyncLocalStorage, it just flows that way because that's just what we were actually able to do with the limited tools and resources we had available when building it.
14:59
<Stephen Belanger>
But it's most certainly an imperfect API.
14:59
<littledan>
Got it. and yeah, that's what I suspected you meant :-) ... the answer is: whether it should be 1 or 2 depends on the storage cell is being used for. That is, In some cases what you want is 1 and others what you want is 2 . Even in your second example there using the callback, there are use cases where store.getStore() within the callback should not "clearly be 2".
One vague idea I've had is, should there be two kinds of variables, one which leans in one direction and one in the other? (sometimes there is only one choice though)
15:00
<littledan>
don't you think it's kinda curious that, after ALS was implemented, the V8 team separately made CPED for another couple use cases that Chrome had, and the semantics basically coincide?
15:00
<Stephen Belanger>
Yeah, I don't have a specific answer on what such a thing should look like, but there have been discussions like the parameter versus return flow thing, but I don't think that's quite right as we need it to flow through which means both ways. (At least for the tracing use case)
15:00
<James M Snell>
I've been stewing over that also... whether there should be two types of variables... haven't landed on anything concrete yet. Alternatively one variable with two storage cells is an option. Unfortunately the DX gets a bit muddled and complicated
15:01
<Stephen Belanger>
My thinking was a per-store configuration or something to select which types of flows you want in a few specific cases.
15:01
<James M Snell>
by two storage cells I mean something like... (a) is the context value going in, (b) is the context value going out, both are accessible
15:02
<littledan>
by two storage cells I mean something like... (a) is the context value going in, (b) is the context value going out, both are accessible
that only works if you can run code in enough places (e.g., you'd want to inject logic into Promise.all, I think)
15:02
<Stephen Belanger>
That or just building out the tools so we can not only reduce the graph with binds but expand the graph back to the original paths with the inverse. (We've called that callingContext so far, but it's still very much up for debate.)
15:05
<James M Snell>
that only works if you can run code in enough places (e.g., you'd want to inject logic into Promise.all, I think)
Yep. gets complicated very quickly
15:05
<littledan>
That or just building out the tools so we can not only reduce the graph with binds but expand the graph back to the original paths with the inverse. (We've called that callingContext so far, but it's still very much up for debate.)
I don't get it, where does the logic live to enable that?
15:05
<Stephen Belanger>
The ideal I've been aiming for separately is having both context management and diagnostics channel landing together with integration between the two and using the WindowChannel concept to handle the multiple paths problem, but clearly that's not really an option here where we have AsyncContext already at Stage 2 and nothing for diagnostics channel yet in TC39. (or Web Perf, as Dan had suggested previously.)
15:07
<Stephen Belanger>
I don't get it, where does the logic live to enable that?
Where the context is swapped by a snapshot.run(...), we can just capture the context that had already flowed to that point and make that accessible in some way.
15:08
<Andreu Botella>
In the case of AsyncLocalStorage, it just flows that way because that's just what we were actually able to do with the limited tools and resources we had available when building it.
I'm surely missing some stuff here, but V8 gives you a kResolve promise hook, so wouldn't that give you the tools to flow the context through promise resolution?
15:12
<Stephen Belanger>
It's been a bit, but there was some complications with it being layered over async_hooks and async_hooks not flowing that way.
15:13
<littledan>
Where the context is swapped by a snapshot.run(...), we can just capture the context that had already flowed to that point and make that accessible in some way.
yeah so this "just" part is a little vague for me. I guess Zones let you have a callback run at this point, and we've very deliberately avoided that.
15:13
<littledan>
It's been a bit, but there was some complications with it being layered over async_hooks and async_hooks not flowing that way.
it'd be interesting to go into the history of async_hooks to understand why they made that decision (which matches CPED)
15:14
<Stephen Belanger>
Because async_hooks is not about flow, it's about resource lifetimes.
15:14
<Stephen Belanger>
It just happened to be that you could hack together a context manager of sorts on top of its events.
15:15
<littledan>
I thought context managers were always a primary use case of async_hooks
15:16
<Stephen Belanger>
Definitely not, no.
15:17
<Stephen Belanger>
Some adjustments were made to make it a bit less crashy when trying to do that, but it was very much about resource tracking.
15:19
<Stephen Belanger>
async_hooks was created and then years later core finally accepted that context management was actually a reasonable thing to want.
15:36
<James M Snell>

I think the two types of variables makes sense... but maybe with two distinct behaviors.. For this example I'm going to change up the terms a bit from what is currently spec'd and known. What the spec currently calls AsyncContext.Variable, I'm going to call AsyncContext.Scope instead... it's a thing that can be entered with a specific value. Then, what I'm calling AsyncContext.Variable in this example is a thing whose value can be set (similar to enterWith but that cannot be explicitly entered... the value is just retained as the context is propagated.

const store1 = new AsyncContext.Scope
const store2 = new AsyncContext.Variable

async function doSomething() {
  store2.set(`abc${store1.get()}`);
}

await Promise.all([
  store1.run(1, async () => { await doSomething(); console.log(store2.get()); // 'abc1' }),
  store1.run(2, async () => { await doSomething(); console.log(store2.get()); // 'abc2' }),
]);
console.log(store2.get()); // undefined ?

The key point is store2 here is not a context that can be entered but only a value that can be set at any point during a flow such that the value propagates through the flow

16:01
<James M Snell>
With such an approach, an AsyncContext.Scope is really just a wrapper around AsyncContext.Variable that sets the value on enter and unsets it on exit, while the AsyncContext.Variable is just the actual stored value and caries it through the async flow. In this example, store1 would have a .run(...) and a .get() but no .set(...) ... while store2 would have a .get() and .set(...) but no .run(...)
16:07
<James M Snell>
or put another way, calling .run(...) on a AsyncContext.Scope creates and enters a new async context scope by copying the current map and setting the new value for the specific cell... while calling .set() on the AsyncContext.Variable only modifies the value for that cell in the current context scope map (basically splitting out ALS's concept of enterWith() ... which I'm not super excited about but ....)
16:08
<littledan>
I imagine that tracing would want to have one Variable and one Scope, right?
16:08
<littledan>
both are useful
16:08
<littledan>
actually I'm confused, I don't really understand why Variable is based on .set and not also .run
16:09
<littledan>
wouldn't .run be sufficient?
16:10
<James M Snell>
It could be... I think the difference comes down to being when I call run(123, () => {}), I expect the value 123 to only be accessible from within that callback passed and anything subordinate to that... not necessarily preserved once that scope exits
16:11
<James M Snell>
where set(...) mutates the value in place without creating that subordinate scope at all..... I guess this is me just admitting to myself finally that there are valid use cases for ALS enterWith(...)
16:17
<littledan>
oh I see, set makes more sense semantically
16:18
<littledan>
Honestly I have to agree with others who pointed out that some in TC39 will see this version to be "not well behaved"
16:19
<James M Snell>
I don't disagree
16:19
<littledan>
then we get into a sort of political cost/benefit analysis... not where I'd like to be with language design
16:19
<James M Snell>
I just have to admit the use cases are valid and there are many places where We Need Both .... makes things way more complicated tho
16:24
<James M Snell>
this is why I think having two separate types makes sense tho, rather than the ALS way of smooshing things together. Having a distinct type of storage cell that explicitly mutates the current context makes the semantics quite a bit cleaner
16:25
<littledan>
what if we designed both of these, but could be OK with them achieving different levels/timelines of standardization?
16:25
<James M Snell>
I'd be good with that
16:25
<littledan>
I want to encourage you to not use AsyncContext.Variable for something different, though; let's choose a different name, just to keep the discussions less confusing
16:26
<littledan>
at least as a working title
16:26
<James M Snell>
heh, noted :-)
16:27
<James M Snell>
I'm not just sure Variable is the right term to use semantically for something that can't be mutated in place once set ;-) ... so it's a bit weird to use that for the "entering a scope with this value" case
16:27
<James M Snell>
but definitely important not to add even more confusion ;-)
16:27
<littledan>
sure, but that's a separate argument
16:27
<littledan>
and we had a sort of extensive bikeshed there, so to reopen it, you'd first want to understand the matrix of everyone else's opinions...
16:28
<Chengzhong Wu>
Yes, name bikeshed could be re-opened if there is an extension to the proposal
17:29
<James M Snell>
async_hooks was created and then years later core finally accepted that context management was actually a reasonable thing to want.
... and at the time async_hooks and promise_hooks were the only option to implementing something reasonable... despite the numerous obvious warts
19:01
<shaylew>

I think the two types of variables makes sense... but maybe with two distinct behaviors.. For this example I'm going to change up the terms a bit from what is currently spec'd and known. What the spec currently calls AsyncContext.Variable, I'm going to call AsyncContext.Scope instead... it's a thing that can be entered with a specific value. Then, what I'm calling AsyncContext.Variable in this example is a thing whose value can be set (similar to enterWith but that cannot be explicitly entered... the value is just retained as the context is propagated.

const store1 = new AsyncContext.Scope
const store2 = new AsyncContext.Variable

async function doSomething() {
  store2.set(`abc${store1.get()}`);
}

await Promise.all([
  store1.run(1, async () => { await doSomething(); console.log(store2.get()); // 'abc1' }),
  store1.run(2, async () => { await doSomething(); console.log(store2.get()); // 'abc2' }),
]);
console.log(store2.get()); // undefined ?

The key point is store2 here is not a context that can be entered but only a value that can be set at any point during a flow such that the value propagates through the flow

I haven't really looked at what languages with mutable dynamically bound variables do (I was mostly on the pure-ish functional side of PLT) but this looks an awful lot like something that would have been invented before under that name and if you're lucky the design space has been explored a nonzero amount
19:24
<littledan>
my experience with mutable dynamically bound variables in Factor was negative. It can be surprising how much or little the write to the variable is propagated up/backwards and then read by someone else. .run has a simple answer to this question: it isn't propagated up/backwards.
19:31
<James M Snell>
Yeah, I really wanted us to be able to get rid of the enterwith API pattern as it's not great at all
19:41
<littledan>
like we would have issues where a scope was introduced with a different variable in mind, and then it made an unrelated set act differently
19:41
<littledan>
so we ended up rediscovering that setting a dynamically scoped variable was an anti-pattern, and we moved towards doing .run instead
19:57
<Andreu Botella>
so we ended up rediscovering that setting a dynamically scoped variable was an anti-pattern, and we moved towards doing .run instead
even with .set(), I don't think this would be dynamic scoping, at least not in the sense that https://github.com/tc39/proposal-async-context/blob/master/SCOPING.md discusses
19:57
<Andreu Botella>
since you would still need to have access to the object in order to call .set()
19:58
<Andreu Botella>
but I don't have a background in PLT
19:59
<Andreu Botella>
so I might be missing stuff
20:04
<Andreu Botella>
also, how would this interact with events? Should Scope (i.e. the current Variable) always use the registration context and Variable (the one with .set()) always the originating context?
20:05
<Andreu Botella>
(does that even make sense)
20:44
<James M Snell>
good question... I don't know yet. going to have to stew on the semantics for a bit
20:50
<littledan>
btw Factor dynamic scope was definitely "restore after await" -- the language actually uses cooperatively scheduled coroutines, and it just felt natural to restore all the dynamically scoped variables after returning from yield. We didn't have anything for observability, though.
20:50
<littledan>
(Factor was a toy, no need to take real lessons from it)
21:29
<Steve Hicks>

I've asked two questions that I don't yet know the answer to:

  • How are we handling merges, given that both contexts are often relevant (and in general this forms a big tree--there are more than two relevant things)?
  • How could per-variable handling scale, when you have a lot of different libraries and a lot of different variables and they're all supposed to fit together? It's easier for me to understand "two classes of variables".

For merges, I think the flow-around branch is available (if you want it) via Snapshot.wrap() (or Variable.wrap, etc). If the flow-through branch is the default then we can allow the caller to opt into flow-around with wrap, then that seems pretty reasonable. For octopus merges (e.g. Promise.all) you should have access to the individual promises, so you can dig them up to await them individually if you want to examine their resolution contexts (assuming those resolutoin contexts are still tied to the promises). As a default, I think the all promise should maintain the most-recently-resolved parent promise context.

I have similar concerns as you do with per-variable handling. I'm really big on my ZK axiom: different components need to be able to do the right thing with Zero Knowledge of the vars used by any other components. I think the only way for this to be tenable is if the default flow just does the right thing - the instant anyone needs to manually bind anything, you're already in danger.

21:55
<Steve Hicks>
The APIs I'm thinking of are things like https://developer.mozilla.org/en-US/docs/Web/API/ReadableStreamDefaultReader/closed, or https://developer.mozilla.org/en-US/docs/Web/API/FontFaceSet/ready. I don't see how those could possibly made to work with resolution time, unless every user knows to work around them
Thinking outside the box, I wonder if there's any reasonable way to spec these as somehow "transparent", so that awaiting them wouldn't impact the current environment? This seems somewhat related in my mind to the case of a shared cache. If you've got an initial request that kicks off a computation it may make sense to include the whole computation time in the tracing, but if a different request joins the ongoing computation, it shouldn't be traced as a normal complete operation, and it certainly shouldn't clobber the second request's context.
22:02
<Steve Hicks>
what if we designed both of these, but could be OK with them achieving different levels/timelines of standardization?
I want to add a word of caution about polyfillability - even if we standardize incrementally, we should have this in mind such that a polyfill for the second feature could hopefully be built using native versions of the first, rather than starting from scratch a second time. There's some brainstorming about having some built-in standard async vars, e.g. for cancellation (which, incidentally, probably wants flow-around "scope" semantics, rather than the flow-throw "mutable variable" semantics) - depending on how these would work, it could actually be really bad to have to wipe out the native implementation to polyfill additional features on it if there might be builtins that would do unspoofable brand checks on something being a real var vs a polyfill imitation.
22:08
<Andreu Botella>
I'm not sure that one of the features would be polyfillable based on the other, brand checks or no
22:11
<Steve Hicks>
I'm very intrigued by the Scope/Variable idea and need to think through the question of how they interact - it seems a little bit odd that entering/exiting one Scope X would somehow affect the lifetime of an unrelated Variable Y.
22:12
<Steve Hicks>
(reminds me a bit of how \let and \global\let interact in TeX, which is pretty confusing)
22:12
<littledan>
yeah I think these are each fundamentally different and currently inexpressible primitives
22:12
<littledan>
I was not imagining that they would be polyfillable
22:13
<littledan>
I'm still having trouble understanding the "flow-around semantics are available if you want it" argument--do we actually expect people to use these APIs that way?
22:13
<littledan>
what should the thought process of a normal developer be, when they are trying to decide whether to Snapshot.wrap a promise that they're awaiting?
22:13
<littledan>
I was imagining that it's just expert library authors who maintain connection pools and such who would use these APIs
22:14
<littledan>
this is not an argument that we must use flow-around semantics, I'm just talking about that one part of the story
22:15
<Steve Hicks>
I think the answer is that we really can't rely on any "normal" developers having any idea how to wrap things correctly, so we need to design it to "do the right thing" as much as possible
22:19
<Steve Hicks>
and i worry that either default (flow-around vs flow-through) is going to require opt-outs in normal code... so maybe per-store configurability is the only viable option
22:20
<littledan>
and i worry that either default (flow-around vs flow-through) is going to require opt-outs in normal code... so maybe per-store configurability is the only viable option
maybe if we could somehow survey existing code and figure out when it does or should do an opt out, for which variables, and we can derive patterns for this?
22:21
<littledan>
I'm not even if per-store configurability (if it's a global default to push things in the call/resolve-time vs registration-time direction in the cases where both are available) will be a solution; sometimes you want the registration-time ancestor of the call-time ancestor
22:55
<Steve Hicks>

I think the two types of variables makes sense... but maybe with two distinct behaviors.. For this example I'm going to change up the terms a bit from what is currently spec'd and known. What the spec currently calls AsyncContext.Variable, I'm going to call AsyncContext.Scope instead... it's a thing that can be entered with a specific value. Then, what I'm calling AsyncContext.Variable in this example is a thing whose value can be set (similar to enterWith but that cannot be explicitly entered... the value is just retained as the context is propagated.

const store1 = new AsyncContext.Scope
const store2 = new AsyncContext.Variable

async function doSomething() {
  store2.set(`abc${store1.get()}`);
}

await Promise.all([
  store1.run(1, async () => { await doSomething(); console.log(store2.get()); // 'abc1' }),
  store1.run(2, async () => { await doSomething(); console.log(store2.get()); // 'abc2' }),
]);
console.log(store2.get()); // undefined ?

The key point is store2 here is not a context that can be entered but only a value that can be set at any point during a flow such that the value propagates through the flow

One concern I have with the mutable variable idea is how one is supposed to bound the mutation - I think it's common to make a change with a lifetime, which is why we've gravitated toward the run() encapsulation. As you've pointed out, with the flow-through semantics, run is very clearly the wrong name, and set is much more appropriate. But your example also hints at some mechanism whereby changes to the variable would still somehow go out of scope? I think that's important, but it seems like it would lead to lots of confusion. If some other scope is entered internally, would that inner scope accidentally prune off a branch of var mutations? That also doesn't seem right.