TC39 Structs and Shared Structs on 2024-07-18

16:53	<rbuckton>	With shu out, depending on who is in attendance today I'd like to spend some time discussing correlation. I have a rough sketch of a very simple correlation mechanism I've put together here: https://gist.github.com/rbuckton/b00ca9660fb888486da07b22e38dd1e9, though I'd like to hear more about other approaches.
16:57	<nicolo-ribaudo>	I'd also like to present my idea for re-using modules for correlation -- I have some drawings/images but unfortunately not something in written form
17:02	<littledan>	Meeting starting now, https://meet.google.com/kth-mssd-uqw
17:05	<rbuckton>	Since Shu is the host and is not present, I've created a new meet for this instance of the meeting: https://meet.google.com/iwo-weak-rfn
18:03	<rbuckton>	Gist about bootstrapping a Worker: https://gist.github.com/rbuckton/08d020fc80da308ad3a1991384d4ff62
18:15	<rbuckton>	The point of the `shared struct S "identity-key"` syntax is that the key is statically known, which makes it unforgeable dynamically (outside of an evaluator). I mentioned CSP as it offers a way to set limits on dynamic evaluation, but we could also impose such limits without CSP by introducing opt-in mechanisms to enable correlation, just as I demonstrated with `new Worker(..., { correlate: true })`. We could, for example, forbid user-defined identities in `eval`/`new Function`/etc. by default and require some type of opt-in mechanism to enable it. While this is not as granular as, say, passing a capability token to each individual declaration, it does establish a trust boundary by requiring an explicit grant when running an evaluator.
18:16	<rbuckton>	(re: "We could, for example, forbid user-defined identities") Or rather than forbidding them, we just don't correlate between the outside world and the evaluator.
18:19	<rbuckton>	In this model, rather than handing the capability to the `shared struct` declaration, you're handing the capability to the evaluator. If you need to execute or communicate with untrusted code, then you need to establish a trust boundary around it, and only grant the correlation capability to trusted code.
18:23	<Ashley Claymore>	(re: rbuckton's Worker bootstrapping gist) Apps could also maybe do something similar to how React components can all add something to the `<head>` tag, collecting them all up as they are evaluated, as with custom HTML elements. A library could provide a decorator that users add to their structs, which collects them, and the place that starts the worker can then ask the library for the list of all decorated structs. I wonder if it should also be possible to register structs lazily, to avoid having to import everything eagerly just in case they are used.
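A minimal sketch of Ashley's collection idea, assuming a hypothetical helper library: `sharedStruct` plays the role of the decorator, `sharedStructLazy` records a loader instead of a definition, and the worker-spawning code can list identities without forcing any lazy loads. Ordinary classes stand in for shared structs, and the helper is applied as a plain function call because decorator syntax is not yet available in every runtime.

```javascript
// Hypothetical helper library; none of these names come from the proposal.
// Ordinary classes stand in for shared structs.
const registered = new Map(); // identity -> { value } or { load }

// Decorator-style helper: `sharedStruct(id)(cls)` records `cls` eagerly.
// With decorator syntax this would read `@sharedStruct(id) class ... {}`.
function sharedStruct(identity) {
  return (cls) => {
    registered.set(identity, { value: cls });
    return cls;
  };
}

// Lazy variant: records only a loader, so the defining code is not
// evaluated until something actually asks for this struct.
function sharedStructLazy(identity, load) {
  registered.set(identity, { load });
}

// For the code that starts the worker: list identities without loading anything.
function sharedStructIdentities() {
  return [...registered.keys()];
}

// Resolve one struct on first use, caching the result of a lazy loader.
function resolveSharedStruct(identity) {
  const entry = registered.get(identity);
  if (entry === undefined) throw new Error(`unknown shared struct: ${identity}`);
  if (!('value' in entry)) entry.value = entry.load();
  return entry.value;
}

// Usage
const Point = sharedStruct('app#Point')(class Point {
  constructor(x, y) {
    this.x = x;
    this.y = y;
  }
});
let lineLoads = 0;
sharedStructLazy('app#Line', () => {
  lineLoads += 1;
  return class Line {};
});
```

Listing identities leaves `app#Line` unloaded; only `resolveSharedStruct('app#Line')` runs its loader, and only once.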
18:55	<Mathieu Hofman>	So I was mistaken when I said it was fine to have a use-once unforgeable token. At the end of the day, if it's used as a key in a global/per-realm registry, and user code can sense whether that key has been used before or not, it becomes a global communication channel, which simply holding an immutable key value shouldn't enable (regardless of the forgeability of said value). Because the prototype registration is per realm, we cannot use a simple immutable value as a correlation key where the user is in a position to provide a conflicting definition.
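To make Mathieu's point concrete, here is a toy model (all names hypothetical) of a per-realm registry that accepts the first registration for a key and rejects later ones. Two parties that share nothing but eight frozen key objects can pass a byte through it, because each attempted registration reveals whether the key was used before.

```javascript
// Toy model of a per-realm correlation registry keyed by immutable values.
// All names are hypothetical; the first registration for a key wins.
class RealmRegistry {
  #prototypes = new Map();

  // Returns true if this call registered `key`, false if it was already taken.
  register(key, proto) {
    if (this.#prototypes.has(key)) return false;
    this.#prototypes.set(key, proto);
    return true;
  }
}

const realm = new RealmRegistry();
// Eight frozen keys handed to both parties ahead of time. The keys carry no
// mutable state themselves, yet they become a channel through `realm`.
const keys = Array.from({ length: 8 }, (_, i) => Object.freeze({ bit: i }));

// Sender: "uses up" the key for every 1 bit.
function send(byte) {
  keys.forEach((key, i) => {
    if ((byte >> i) & 1) realm.register(key, {});
  });
}

// Receiver: a failed registration means the sender used that key.
// (Reading consumes the keys, but the byte has already been transferred.)
function receive() {
  return keys.reduce(
    (byte, key, i) => byte | ((realm.register(key, {}) ? 0 : 1) << i),
    0,
  );
}

send(42);
console.log(receive()); // 42
```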
18:55	<Mathieu Hofman>	I think this observation may apply to the module source proposal as well, as technically a module source is considered an immutable "safe to share" value, but because it could be linked to different modules or in different evaluators/compartments, different evaluations of the module source in the same realm would result in different prototype behaviors for the same shared struct type.
18:55	<Mathieu Hofman>	Finally, a similar problem occurs with bundlers and string correlation tokens. Let's assume library "shared-awesomeness" is used by libraries "cool-helpers" and "nice-tools", and my app uses both. Even if both these libraries use the same version of "shared-awesomeness", the package manager could have installed separate copies, which would be evaluated separately. The correlation token would attempt to collapse the independent declarations, which would cause issues. Even if we don't fail the multiple definition, you would end up with one of the two definitions being ignored, which is a problem if there is any kind of shared state surrounding the definition.
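A sketch of the "twin install" hazard, assuming correlation is modeled as a hypothetical first-definition-wins registry keyed by a string token. Each call to `loadSharedAwesomeness` stands in for evaluating one installed copy of the library; a plain class stands in for the shared struct, and a module-level `Set` plays the "shared state surrounding the definition".

```javascript
// Hypothetical first-definition-wins registry keyed by a string token.
const correlationRegistry = new Map();
function defineCorrelated(token, definition) {
  if (!correlationRegistry.has(token)) correlationRegistry.set(token, definition);
  return correlationRegistry.get(token); // later definitions are silently ignored
}

// Each call stands in for evaluating one installed copy of "shared-awesomeness".
function loadSharedAwesomeness() {
  const tracked = new Set(); // state surrounding the definition, private to this copy
  const Point = defineCorrelated(
    'shared-awesomeness#Point',
    class Point {
      constructor(x) {
        this.x = x;
        tracked.add(this);
      }
    },
  );
  return { Point, isTracked: (p) => tracked.has(p) };
}

const viaCoolHelpers = loadSharedAwesomeness();
const viaNiceTools = loadSharedAwesomeness();
const p = new viaNiceTools.Point(1);

console.log(viaNiceTools.Point === viaCoolHelpers.Point); // true: collapsed
console.log(viaNiceTools.isTracked(p)); // false: constructor closed over the first copy's Set
console.log(viaCoolHelpers.isTracked(p)); // true
```

The second copy's `Point` silently became the first copy's class, so the second copy's own bookkeeping never sees the instances it creates.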
18:58	<rbuckton>	(re: Mathieu's "twin" libraries example) In this case I would say this means neither is valid, not one or the other. If the concern is detecting 1 vs 2+, that would require you to grant the permission to an evaluator for malicious code to use it, which is why you would want to isolate untrusted code behind a separate trust boundary (e.g., a shadow realm, an `iframe`, etc.).
18:59	<Mathieu Hofman>	In the unforgeable token case, a way around this may be to reify the mutable aspect onto the object itself, e.g. having an exotic data property that exposes the currently registered prototype in the realm. It would make it clear that the object is a direct "proxy" for the realm's registration of that type.
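One possible reading of this suggestion, sketched with hypothetical names in a single realm: the token stays frozen, but an accessor on it reports whatever prototype the realm currently has registered. Because the mutable state is visible on the token itself, holding the token is plainly holding a view of shared mutable state rather than an inert immutable value. The suggestion calls for an exotic data property with per-realm lookup, which an ordinary accessor only approximates.

```javascript
// Hypothetical single-realm model; not an API from the proposal.
function makeRealmRegistry() {
  const registrations = new WeakMap(); // token -> registered prototype

  return {
    createToken() {
      const token = Object.freeze(
        Object.create(null, {
          // Accessor standing in for an exotic data property that always
          // reflects this realm's current registration for the token.
          registeredPrototype: {
            get: () => registrations.get(token),
            enumerable: true,
          },
        }),
      );
      return token;
    },
    register(token, proto) {
      if (registrations.has(token)) throw new TypeError('already registered');
      registrations.set(token, proto);
    },
  };
}
```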
19:00	<Mathieu Hofman>	What do you mean "neither" is valid? One declaration is evaluated before the other. When evaluating the first one, the engine is not in a position to know a second one is coming with the same token.
19:11	<rbuckton>	Fair, but my point about detection remains. If you are evaluating untrusted code, you should put something between you and the untrusted code. If we require that a static identity must be laid down in an actual file, then untrusted code can't just produce new files on demand (if it can, you have far greater problems). If you want to allow an evaluator to correlate on a static identity, you must explicitly grant it the permission to do so. How I'd imagined this working is that whatever "registry" a Realm uses for this correlation is only passed down to child Realms (or evaluators) by an explicit grant. If that is not provided, the child Realm/evaluator only gets its own "registry" (and we could theoretically also deny the ability for a child Realm/evaluator to have a registry at all). However, if you run your untrusted code in the same Realm, it would share your registry. Thus, you really want to be able to isolate untrusted code into a different Realm.
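A minimal model of the inheritance rule described here, with hypothetical names throughout: a child realm or evaluator receives the parent's correlation registry only when the parent explicitly grants it, and otherwise starts with a private one, so identical static identities correlate only across the boundary the parent chose.

```javascript
// Hypothetical model; the proposal does not define these classes or options.
class CorrelationRegistry {
  #types = new Map(); // static identity -> type definition

  // Return the type already correlated with `identity`, or define it now.
  lookupOrDefine(identity, define) {
    if (!this.#types.has(identity)) this.#types.set(identity, define());
    return this.#types.get(identity);
  }
}

class ModelRealm {
  constructor(registry = new CorrelationRegistry()) {
    this.registry = registry;
  }

  // Children share this realm's registry only with an explicit grant;
  // otherwise they get a fresh registry of their own.
  createChild({ correlate = false } = {}) {
    return new ModelRealm(correlate ? this.registry : new CorrelationRegistry());
  }

  // Stand-in for evaluating `shared struct Point "app#Point" { ... }` here.
  declarePoint() {
    return this.registry.lookupOrDefine('app#Point', () => class Point {});
  }
}

const root = new ModelRealm();
const trusted = root.createChild({ correlate: true });
const untrusted = root.createChild();
```

Here `trusted.declarePoint()` and `root.declarePoint()` yield the same type, while `untrusted.declarePoint()` yields an unrelated one even though it uses the same identity string.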
19:17	<global_lover>	hopefully there are notes?
19:32	<Mathieu Hofman>	(re: rbuckton's "Fair, but my point about detection remains...") I am really confused as to how the example I provided with "twin" libraries relates to untrusted eval. Are you suggesting that a bundler doesn't simply bundle but also modifies the shared struct declaration of each library installation to generate the token?
19:35	<shu>	why... would two different versions have the same correlation token?
19:36	<Mathieu Hofman>	2 different installs. Doesn't have to be different versions. Package managers do weird and complicated things.
19:37	<Mathieu Hofman>	To be clear, this "eval twin" problem is a major issue in the community today with class private fields, where separate installs don't recognize each other's instances
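The private-field version of the problem can be reproduced in plain JavaScript today. In this sketch each call to the hypothetical `loadPackageCopy` stands in for a package manager installing and evaluating a separate copy of the same source: the class text is identical, but each evaluation creates a distinct `#id`, so brand checks (and `instanceof`) fail across copies.

```javascript
// Each call stands in for evaluating one installed copy of the same package.
function loadPackageCopy() {
  class Widget {
    #id;
    constructor(id) {
      this.#id = id;
    }
    // Ergonomic brand check (ES2022): true only for objects carrying *this* #id.
    static isWidget(obj) {
      return #id in obj;
    }
  }
  return { Widget };
}

const copyA = loadPackageCopy();
const copyB = loadPackageCopy();
const w = new copyA.Widget(1);

console.log(copyA.Widget.isWidget(w)); // true
console.log(copyB.Widget.isWidget(w)); // false
console.log(w instanceof copyB.Widget); // false
```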
19:37	<ljharb>	(it's a common occurrence; it's what peerDependencies are for)
19:37	<shu>	how is it solved for private fields?
19:38	<Mathieu Hofman>	The thing is that because we're tacking a collapse mechanism on top of that, now we need to define what happens
19:38	<Mathieu Hofman>	(re: "how is it solved for private fields?") It's not
19:38	<ljharb>	the "solution" is to use peer deps to force only one copy of the thing to be installed
19:38	<shu>	in the beginning one of my assumptions has been that this kind of correlation mechanism needs explicit handling by the tools
19:39	<Mathieu Hofman>	it's a major pain point today, and one reason some people consider private fields unacceptable
19:39	<shu>	it's somewhat heartening that it's an existing problem in that at least we won't be introducing a new one, and adding motivation to solve the existing problem as well
19:40	<shu>	for private fields is the challenge it's unclear if it's supposed to be collapsed?
19:41	<shu>	like, can bundlers do a byte-by-byte comparison of the two copies and collapse them? if not, why not?
19:43	<Mathieu Hofman>	if you collapse for private fields, you introduce other potential issues: if the declaration uses any state from its surrounding scope, you risk conflicts between "I recognize the private fields" vs "my surrounding scope doesn't know about this instance"
19:43	<shu>	i don't understand that, may need to see something concrete
19:44	<Mathieu Hofman>	because a byte for byte comparison of a definition is only safe if the definition is pure, aka doesn't close over any surrounding state
19:44	<shu>	how can a package close over different state?
19:45	<shu>	like, you can only close over stuff lexically enclosing you. this is two different copies of the same package, wouldn't they have the same enclosing lexical scope?
19:45	<Mathieu Hofman>	I was talking about the class / struct declaration, but you can extend that to the whole module if you want.
19:46	<shu>	right, i think you have to collapse at package granularity
19:46	<shu>	i don't see how a bundler can collapse at like... expression or statement granularity
19:46	<Mathieu Hofman>	the bindings of the module might resolve to different imports
19:46	<shu>	so how is it two copies of the same package? it's two different packages at that point
19:47	<shu>	in which case not recognizing each others' private fields seems working as intended
19:47	<shu>	in any case what's the issue with what ljharb was saying about peerDependencies? it's too unergonomic?
19:48	<Mathieu Hofman>	Correct, not recognizing private fields is what the spec intends, because it's different declarations, but it's not what the users intend, or understand. For them it's the "same" package
19:48	<shu>	i can't reconcile the user intent that it's the same module with "bindings of the module might resolve to different imports"
19:49	<shu>	i was understanding user intent of the same package to mean everything is the same, down to the environment chain and its contents, except it's evaluated twice
19:51	<Mathieu Hofman>	peer dependencies, while the correct answer for being explicit about deduplication, are fairly unergonomic and not widely adopted
19:51	<shu>	is that an outreach issue or a fundamental one, do you think?
19:51	<shu>	anyway we can table this for a little bit later. were there notes / what's the upshot of the discussion today?
19:52	<Mathieu Hofman>	I'm not sure if anyone took notes.
19:53	<shu>	were there any conclusions or action items?
19:56	<Mathieu Hofman>	We explored a few promising options for correlation (module source, string-based correlation token with opt-in, object-based unforgeable token variation), but I think it's still early enough and everyone needs time to analyze the implications of each more
19:56	<shu>	got it, thank you
19:57	<shu>	off for a week, see you at the meeting
20:03	<rbuckton>	IMO, the current state re: private fields and dependency duplication is not only fine, it's correct. Dependency deduplication is a concern for package managers. You can't always expect two versions of the same package to cooperate, and it's far more than private state that is the problem. You also have weak maps, `instanceof` checks, duplicated initialization logic, etc.
20:22	<rbuckton>	I keep wondering if we're trying to solve the global communications channel concern at the wrong level. In a vanilla JS environment, communication channels abound because globals are mutable. You have to go out of your way to lock down an environment to prevent them from existing. It's a bad idea to run untrusted code in-process, much less without some form of isolation, be that via a `Worker`, an `iframe`, etc. It seems like what we really need is a formal isolation mechanism that lets us grant or deny capabilities to evaluators. If you really want to run untrusted code in-process, do so with a `Worker` or `ShadowRealm` and use that to establish a trust boundary and grant or deny capabilities:
    new Worker(..., {
      // requested capabilities cannot exceed current capabilities
      caps: {
        unsafe: true | false,            // allow `unsafe` code
        correlate: true | false | Array, // manually correlate shared structs
        eval: true | false,              // allow or disallow `eval`/`new Function`
        workers: true | false,           // allow or disallow creating child workers/agents
        realms: true | false,            // allow or disallow creating child realms
        foreign: true | false,           // allow or disallow access to objects from foreign realms
        // ...
      }
    });
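The rule "requested capabilities cannot exceed current capabilities" implies a monotonic attenuation check. A sketch of one way it could work for the `caps` shape above, under two assumptions not stated in the chat: any capability the child does not request defaults to `false`, and an `Array` for `correlate` lists the struct identities being shared. `permits` and `attenuate` are hypothetical names.

```javascript
// Hypothetical attenuation check for a `caps` object like the one sketched
// above. Assumption: capabilities the child does not request default to false.
function permits(have, want) {
  if (want === false) return true; // giving up a capability is always allowed
  if (have === true) return true; // a full grant can be passed on or narrowed
  if (Array.isArray(have) && Array.isArray(want)) {
    // an explicit list (e.g. of `correlate` identities) can only shrink
    return want.every((item) => have.includes(item));
  }
  return false; // e.g. asking for `true` while holding only a list, or anything from `false`
}

function attenuate(current, requested) {
  // Start from "nothing granted" for every capability the parent knows about.
  const granted = Object.fromEntries(Object.keys(current).map((name) => [name, false]));
  for (const [name, want] of Object.entries(requested)) {
    if (!permits(current[name] ?? false, want)) {
      throw new Error(`capability "${name}" exceeds the current grant`);
    }
    granted[name] = want;
  }
  return Object.freeze(granted);
}

const root = { eval: true, workers: true, correlate: ['app#Point', 'app#Line'] };
const child = attenuate(root, { correlate: ['app#Point'] });
```

Defaulting unrequested capabilities to `false` is a least-authority choice; inheriting them from the parent would be an equally plausible reading of the comment.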
20:54	<Mathieu Hofman>	We have successfully been using Compartments to run untrusted code. Separate global context and immutable intrinsics is all that is needed. Separate realms or even workers are way too heavyweight for the kind of separation we're interested in.
20:56	<Mathieu Hofman>	The spec currently has no observable internal mutable state (per realm, or per agent), and we don't want any to be introduced, at least not without some clear mitigations
20:59	<Mathieu Hofman>	We use direct `eval` and `with` to shim the separate global context of compartments. This is used, for example, by LavaMoat to isolate npm packages and restrict the capabilities they have access to. It works, and it's used in production systems.
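A heavily simplified sketch of the `with` + direct `eval` technique, for illustration only: it gives evaluated code its own global bindings (a few shared intrinsics plus caller-supplied endowments) and stops unknown identifiers from reaching the host's global object. It is not a security boundary. Real Compartment shims such as SES also freeze the intrinsics and close escape routes like reaching `Function` through a function's `constructor` property, none of which is attempted here. The helper name and the intrinsics allowlist are assumptions.

```javascript
// Hypothetical allowlist of host intrinsics exposed (unfrozen!) to evaluated code.
const SHARED_INTRINSICS = ['Array', 'Object', 'Math', 'JSON', 'String', 'Number', 'Boolean', 'Error'];

function makeCompartmentLikeEvaluator(endowments = {}) {
  // The evaluated code's own global bindings: shared intrinsics plus endowments.
  const globals = Object.create(null);
  for (const name of SHARED_INTRINSICS) globals[name] = globalThis[name];
  Object.assign(globals, endowments);

  const scope = new Proxy(globals, {
    // Claim every name except the two the evaluator itself needs, so unknown
    // identifiers never fall through to the host's global object (they read
    // as undefined instead of throwing a ReferenceError).
    has: (target, key) => key !== 'eval' && key !== '__src__',
    // Report no unscopables so `with` honours every claimed binding.
    get: (target, key) => (key === Symbol.unscopables ? undefined : target[key]),
  });

  // Function-constructor bodies are sloppy-mode, so `with` is permitted here.
  // `eval(__src__)` is a direct eval because `eval` resolves to the real
  // %eval% intrinsic, so the evaluated code sees the `with` scope.
  const run = new Function('__src__', 'with (this) { return eval(__src__); }');
  return (src) => run.call(scope, src);
}

const evaluate = makeCompartmentLikeEvaluator({ x: 41 });
```

With this setup, `evaluate('x + 1')` sees the endowment, `evaluate('Math.max(1, 2)')` sees the shared intrinsic, and `evaluate('typeof setTimeout')` reports `'undefined'` because the host global is hidden.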
21:37	<Ashley Claymore>	(re: ljharb's peer-deps "solution") You still need to get everyone to agree on the version range, and the wider the range, the harder it is to test and assert that it actually works across that range. I wouldn't classify it as a solved problem.
21:52	<ljharb>	it's solved, it's just not easy. there are lots of ways to use a CI matrix and continually test on all supported versions of a peer dep; it's just that most authors aren't fully diligent about it.