20:31
<Steve Hicks>
I had some personal issues come up this week and missed the session where we talked about yield*. But I was remiss because I really wanted to use the opportunity to get a sense of how the broader committee might feel about extending this to also include ordinary yield. I don't know if there's any opportunity to still bring it up in plenary at this point, or if not, I suppose I could reach out individually to folks who we think might be most opposed? At this point, I'm still convinced that dropping all the spec around generators (so that there's no context preservation around any yields, and yield just propagates through from the next() call) is the right thing to do - (1) 99% of the time it's irrelevant, (2) it's better performance to not need to do the extra swapping, (3) in the rare cases where it's not irrelevant, I'm pretty sure dispatch-context is the correct choice, (4) it's much less invasive to wrap it to get the other behavior, (5) generators are more like data than code - e.g. https://gist.github.com/jkrems/04a2b34fb9893e4c2b5c
20:47
<bakkot>
my understanding of the presentation is that it made yield* more like repeated yield, so I don't know what "extending this to also include ordinary yield" means
21:05
<Steve Hicks>
This is the opposite of my understanding. As I understand it, the previous situation was that yield* was basically just repeated yield, but the discussion this morning was to make it so that yield* would not restore the generator's internal/initial context but would instead pass on the next() calling context into the delegate.
21:10
<Justin Ridgewell>
for (const it of gen()) yield it will now behave differently than yield* gen()
21:10
<bakkot>
oh, sorry, you're right, I'm wrong
21:10
<Justin Ridgewell>
Only the inner gen() sees the difference, the wrapping generator does not.
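The two forms Justin contrasts can be checked in plain JS today: they produce identical values, and only the context-propagation semantics under discussion would let the inner generator tell them apart. (They also differ in how next/throw arguments are forwarded, but that's beside the point here.)

```javascript
function* inner() { yield 1; yield 2; }

function* viaDelegate() { yield* inner(); }               // delegation
function* viaLoop() { for (const v of inner()) yield v; } // manual re-yield

// Same output either way; under the new spec text only *inner* could
// observe the difference (delegation forwards the next() caller's context
// to inner, while the loop resumes inner from inside viaLoop's own
// restored context).
[...viaDelegate()]; // [1, 2]
[...viaLoop()];     // [1, 2]
```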
21:11
<Steve Hicks>
Adding one more point to my earlier paragraph: (6) this would also restore consistency between yield and yield*
21:12
<bakkot>
I'm still not totally sure what "extending this to also include ordinary yield" means though
21:13
<bakkot>
now both behave as, snapshot context before the expression, restore it after
21:13
<bakkot>
(or else I am still misunderstanding)
21:14
<Steve Hicks>
ah, I see what you're saying
21:15
<Steve Hicks>
Yes, that part hasn't changed - yield* and yield both snapshot and restore around them within the body of the generator
21:16
<Steve Hicks>
In that sense, it's less of an extension. What I was referring to is that previously both versions prevented access to the next() calling context, whereas now yield* has access to it (if the delegate iterator is an ordinary object, rather than a generator), but yield doesn't expose that access at all (since there's no other next() call to delegate through).
21:17
<Steve Hicks>
What I'm talking about is dropping the snapshot-and-restore before/after yield behavior, which would expose the context next() was called in to the generator body directly
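A minimal sketch of the dispatch-context behavior Steve describes, using a toy synchronous stand-in for a context variable (hypothetical `ContextVar`; the real AsyncContext.Variable also flows across awaits, which this cannot model). With no snapshot-and-restore machinery, the generator body simply resumes synchronously inside whatever context next() was called in:

```javascript
// Toy synchronous dynamic variable -- NOT the proposal's API, just enough
// to make the semantics observable.
class ContextVar {
  #stack = [undefined];
  run(value, fn) {
    this.#stack.push(value);
    try { return fn(); } finally { this.#stack.pop(); }
  }
  get() { return this.#stack[this.#stack.length - 1]; }
}

const who = new ContextVar();

function* inner() {
  yield who.get(); // sees the context of the next() call that resumed it
  yield who.get();
}

const it = who.run('creator', () => inner());          // created in 'creator'
const a = who.run('worker-1', () => it.next().value);  // 'worker-1'
const b = who.run('worker-2', () => it.next().value);  // 'worker-2'
```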
21:18
<bakkot>
that seems very wrong to me
21:18
<bakkot>
I guess I just don't buy the argument about generators being more data-like
21:19
<Mathieu Hofman>
Yeah I think the restore semantics inside generators are still the correct ones
21:21
<Mathieu Hofman>
There is still a general problem that some code is interested in the trigger context rather than the continuation context, but that's not specific to generators
21:22
<Jan Olaf Martin>
A question here would be: What are the situations where yield would meaningfully restore context that isn't already restored by the scheduling mechanism that calls next?
21:23
<Steve Hicks>
This is definitely still a general problem. My point is that generators will almost never actually observe any difference (since they're almost always iterated in the same context they're instantiated in) but in the cases when that's not the case, the simpler "don't do anything with context" behavior seems more likely to be correct - and as a bonus, it's a performance win in the vast majority of cases where it didn't matter in the first place.
21:24
<Steve Hicks>
Which is basically what Jan just said, in a few more words.
22:05
<bakkot>
Basically any time you're doing anything more complicated than just iterating the whole thing in one shot, I would think
22:06
<bakkot>

I guess this is not all that common, but e.g. I have this helper where I want a function to be sync when passed a sync callback and async when passed an async callback

function split(genFn) {
  function runSync(...args) {
    const it = genFn(...args);
    let next = it.next();
    while (true) {
      if (next.done) return next.value;
      next = it.next(next.value);
    }
  }

  async function runAsync(...args) {
    const it = genFn(...args);
    let next = it.next();
    while (true) {
      if (next.done) {
        return await next.value;
      }
      try {
        next = it.next(await next.value);
      } catch (err) {
        next = it.throw(err);
      }
    }
  }

  return { sync: runSync, async: runAsync };
}
22:06
<bakkot>
and I think it would be pretty weird if the async output of this helper did not work like the equivalent async function
22:10
<Justin Ridgewell>
They will be exactly equivalent in the current spec text, the new spec text, and with Steve's suggested change.
22:11
<bakkot>
Ah, because the async wrapper handles the state? Ok
22:11
<Justin Ridgewell>
The 1 edge case that would cause a difference is so niche that I don't want to write it, it will almost never come up
22:11
<Justin Ridgewell>
Exactly
22:11
<Justin Ridgewell>
The await next.value will do the capture for you
22:12
<bakkot>
Ok I'm a bit less worried about the proposed change then
22:13
<bakkot>
I can imagine e.g. having multiple async "workers" which are pulling from a generator as a source of tasks, where the generator wants to use the state when it was created not the state in each worker
22:13
<bakkot>
but that's also pretty niche
22:14
<Justin Ridgewell>
That can be handled very easily with Snapshot
22:15
<Justin Ridgewell>
And whether you would need that depends on whether the workers are created outside the context you want to propagate anyway

22:15
<bakkot>
I mean, sure, you can always snapshot and restore around each yield manually, right?
22:16
<nicolo-ribaudo>
Fwiw I'm not against what Steve is proposing -- what I presented today is not my preferred solution but a compromise to push very slightly towards that direction.
22:16
<Justin Ridgewell>
No, not without accepting a snapshot as part of the gen's parameters
22:17
<nicolo-ribaudo>
Because the generator body already runs after the first .next() and not in the gen function call, right?
22:17
<bakkot>
IIRC parameter expressions run at the time the generator is called so you can do , snapshot = AsyncContext.snapshot() and not require the caller to pass it explicitly
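bakkot's recollection is right, and it's checkable in plain JS without AsyncContext: a generator function's parameter default expressions evaluate when the generator is called, not at the first .next(), even though the body stays suspended until then:

```javascript
let current = 'call-time';

// The default for `snapshot` runs when gen() is invoked.
function* gen(snapshot = current) {
  yield snapshot;
}

const it = gen();              // default evaluated here
current = 'next-time';
const seen = it.next().value;  // 'call-time', not 'next-time'
```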
22:17
<bakkot>
but that is kind of weird yes
22:17
<Justin Ridgewell>
Also because you can't wrap yield or await in a closure
22:18
<Justin Ridgewell>
You'd have to wrap the code before and after the yield/await, and break out of your snapshots every time you wanted to yield/await a new value.
22:18
<Justin Ridgewell>
It's sooo cumbersome
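A sketch of the cumbersome shape Justin is describing, again with a toy synchronous stand-in for the proposal's API (`ContextVar`, `tag`, and `producer` are all hypothetical names). Because `yield` can't appear inside a callback, the body has to leave and re-enter the captured context around every single yield:

```javascript
// Toy synchronous dynamic variable standing in for AsyncContext.
class ContextVar {
  #stack = [undefined];
  run(value, fn) {
    this.#stack.push(value);
    try { return fn(); } finally { this.#stack.pop(); }
  }
  get() { return this.#stack[this.#stack.length - 1]; }
}

const tag = new ContextVar();

// The snapshot is captured via a parameter default (evaluated when
// producer() is called), then re-entered for every chunk of work.
function* producer(snapshot = tag.get()) {
  let out = tag.run(snapshot, () => `${tag.get()}:a`);
  const input = yield out; // the yield itself must sit outside run()
  out = tag.run(snapshot, () => `${tag.get()}:${input}`);
  yield out;
}

const it = tag.run('creator', () => producer());
const first = it.next().value;      // 'creator:a', though next() ran outside any run()
const second = it.next('b').value;  // 'creator:b'
```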
22:28
<Jan Olaf Martin>
This sounds to me like the same issue you'd have if you had a "task factory function". And you could also argue for the exact opposite - that if it needs to do work in the context of the worker task, you maybe want that work to be tracked in that context. Definitely all very speculative and niche but it seems like that would be even harder if yield "forces" a context restore?
22:29
<bakkot>
I have a hard time imagining a case where a generator is the natural way to express the production of work but you want the body of the generator to be evaluated in the context of the worker tasks
22:30
<Justin Ridgewell>
An easy one is storing workerId in an AsyncContext.Variable
22:30
<Jan Olaf Martin>
and the produced work might want to close over a trace or span id
22:30
<Justin Ridgewell>
The producer could then read the worker's id (or trace/span/etc)
22:30
<Jan Olaf Martin>
and it should close over the trace of the worker task, not the creation of the queue/generator
22:31
<Jan Olaf Martin>
tbh, that feels more likely than the case where it needs to close over context from the generator creation. again, in the unlikely scenario that we're already in
22:31
<bakkot>
Why would the generator want the workerId?
22:32
<bakkot>
I would really naturally expect it to close over the creation of the queue
22:32
<Jan Olaf Martin>
because it might create values that do async work (e.g. run async functions and yield promises)
22:32
<bakkot>
but again why would those things want the context of the workers?
22:32
<bakkot>
the workers are not the thing which caused the work to be done
22:33
<Jan Olaf Martin>
because otherwise you wouldn't be able to trace IO to the worker task?
22:33
<bakkot>
the thing which made the generator is that thing
22:33
<bakkot>
... why would you want to trace IO to the worker task?
22:33
<bakkot>
Maybe you are imagining a different thing than I am?
22:34
<Justin Ridgewell>
We're just making hypotheticals, but I often deal with async logging that I'd like to match
22:34
<Justin Ridgewell>
Eg, concurrent calls from worker1 and worker2, which are executing producer code with awaits in it
22:34
<Justin Ridgewell>
Without workerId.get(), the logs will appear jumbled: start start end end
22:35
<Justin Ridgewell>
With consumer side propagation, we could do console.log(workerId.get(), …) and match the sequential execution of both: w1:start w2:start w1:end w2:end
22:38
<Jan Olaf Martin>
Quick and dirty attempt from my side, very similar idea: https://gist.github.com/jkrems/9fa7fe92bdc2b8a4b1964c971882c6bd
22:42
<Jan Olaf Martin>
bakkot: Did you have an example on your end of when the context when entering the generator would be the more useful one (in a case where the two are actively different)? That might complete the picture :)
23:17
<bakkot>

I am imagining something like

let treeId = new AsyncContext.Variable();

async function fetchChildren(node) {
  console.log(`fetching children for node in tree from file ${treeId.get()}`);
  return thing;
}

async function* walk(node) {
  for (let child of await fetchChildren(node)) {
    yield child;
    if (child.hasChildren) {
      yield* walk(child);
    }
  }
}

let runner = new TaskRunner({ workers: 4 });

document.querySelector('#process').addEventListener('click', async () => {
  let file = document.querySelector('#file').files[0];
  let tree = parse(await file.bytes());
  treeId.run(file.name, () => {
    runner.add(walk(tree).map(console.log));
  });
});
23:19
<bakkot>
the fact that the TaskRunner is the thing coordinating work is not relevant to the generator or the things it's calling; it only cares about the context which created the work
23:20
<bakkot>
(here I am imagining TaskRunner is something like, Promise.race reads from the first worker's iterable, then reads the next one, etc, looping back to the start when the end of the list is reached and removing an iterable once it is finished)
23:55
<Mathieu Hofman>

> I guess this is not all that common, but e.g. I have this helper where I want a function to be sync when passed a sync callback and async when passed an async callback […]
We have the same thing in https://www.npmjs.com/package/@endo/trampoline