2024-06-03
[07:30:20.0506] <dminor>
Hi :) I'd like to bring up `implementation-contributed` again... After some discussions with Yulia and the other implementers, I've come back around to wanting to upstream our `non262` tests to test262. What I'm proposing is to upstream our non262 tests, in test262 format, but without the intention of reworking them to follow exactly what test262 does (e.g. testing a single part of the spec). Based on this, I don't think `staging` is the correct place.

The benefit to us is that we can simplify our testing, and work towards writing new tests in `staging` as opposed to increasing our corpus of `non262` tests. The benefit we see to the community is that we would increase the set of tests available to other implementations, including tests that test cross-cutting concerns, rather than individual bits of functionality.

[07:42:51.0430] <littledan>
Why wouldn't `staging` be a good place to upstream tests while not having the intention to rework them?

[07:44:35.0932] <littledan>
If the name is too evocative of eventually removing things, then we could make another directory which is set up the same way technically, which just removes that intentionality. I agree that there shouldn't be an expectation that tests which are shared like this are necessarily eventually reworked somehow or other--that's not the best use of people's time, and would limit the number of tests shared if they had to have that expectation.

[07:45:13.0092] <littledan>
(probably we should still remove the current contents of implementation-contributed, though, right?)

[07:45:44.0080] <littledan>
I had thought that folks already agreed on the plan that there is no expectation to rework these shared tests, so I'm having a little bit of trouble understanding dminor 's message.

[07:52:44.0333] <Ms2ger>
Fwiw: https://github.com/tc39/test262/compare/main...impl-sm

[09:01:34.0875] <dminor>
Looks like Ms2ger has already done the work :)

[09:03:10.0648] <dminor>
I have no strong opinion about where the tests go, I do think that the name `staging` suggests that the tests are intended to be moved into mainstream test262 at some point.

[09:03:32.0365] <dminor>
Whatever we pick, we should document the expectations around `staging`, if they aren't already.

[09:21:55.0049] <shu>
dminor: https://github.com/tc39/test262/blob/249657722525cc8e43b1ddb91f8df0b4b011fcf6/CONTRIBUTING.md#staging

[09:23:04.0659] <shu>
IIUC sounds like you'd like to change the "live temporarily" part, not anything before that

[09:28:46.0142] <dminor>
Yes, that sounds right to me. I guess it makes sense for me to open a PR and we can continue any discussion there.

[09:30:48.0956] <ptomato>
> <@littledan:matrix.org> (probably we should still remove the current contents of implementation-contributed, though, right?)

I was going to do this once we had something sensible to put there instead, but I wouldn't mind if someone just deleted it now. I don't think the current contents of i-c are runnable

[09:32:44.0735] <ptomato>
> <@shuyuguo:matrix.org> IIUC sounds like you'd like to change the "live temporarily" part, not anything before that

FWIW I think this was written with the assumption that it would be tests for new stage 2.7 / 3 features going into staging, and as it's documented elsewhere in that section that staging doesn't "count" towards the required coverage for stage 4. I don't think we anticipated a case like Spidermonkey's non262 tests at that time

[09:35:13.0235] <shu>
> <@pchimento:igalia.com> FWIW I think this was written with the assumption that it would be tests for new stage 2.7 / 3 features going into staging, and as it's documented elsewhere in that section that staging doesn't "count" towards the required coverage for stage 4. I don't think we anticipated a case like Spidermonkey's non262 tests at that time

agreed, this warrants a new consensus / discussion. i'm saying mechanically it should be the same as what's described in that section, except the "live temporarily" part

[09:38:23.0228] <dminor>
ptomato: do you have a preference between `staging` and new directory for this?

[09:55:42.0235] <ptomato>
Personally, I don't care that much 🙂

[09:56:20.0394] <ptomato>
i-c seems okay to me especially if we're going to put transformed tests from other engines there

[09:57:07.0517] <ptomato>
I wouldn't have a problem with staging, either. You're right that the name suggests temporary, though

[09:59:52.0020] <dminor>
I'll open a PR for i-c as that name seems closer to the intended use here. I'll open another PR to remove the current contents, as I believe we have consensus to move ahead with that.

[11:18:53.0983] <shu>
dminor: thanks for doing the work for getting more tests in the commons

[13:07:09.0669] <ljharb>
sorry i missed this here, before engaging on the PR

[13:07:33.0890] <ljharb>
i think that everything not in the main suite is temporary.

[13:07:48.0451] <ljharb>
 * i think that everything not in the main suite is temporary. it's just that the timeline may be long.

[13:17:33.0690] <shu>
like, finitely long but longer than our lifetimes? :)

[14:53:47.0146] <littledan>
Yeah it is important to be able to land shared tests somewhere while having literally no intent to ever do this work

[14:54:10.0605] <littledan>
Maybe someone else will decide to do it but that would be a separate decision 


2024-06-04
[09:55:09.0580] <ljharb>
i agree that lots of them might stay there forever. but it's still "temporary" in intention

[11:21:39.0327] <ptomato>
Sure, that works for me as well - as long as we don't commit ourselves as maintainers to do it

[11:22:57.0913] <ptomato>
I guess there's still a difference in intention though; what's been in staging so far has been coverage for pre-stage 4 stuff that needs to get migrated before stage 4.

[13:43:04.0104] <ljharb>
i mean the goal isn't to have an implementer not have to have local tests - it's to upstream tests that others can benefit from. and anything that applies across implementations probably should be in the main suite, right

[13:43:05.0643] <ljharb>
 * i mean the goal isn't to have an implementer not have to have local tests - it's to upstream tests that others can benefit from. and anything that applies across implementations probably should be in the main suite, right?

[13:49:07.0117] <shu>
> <@ljharb:matrix.org> i mean the goal isn't to have an implementer not have to have local tests - it's to upstream tests that others can benefit from. and anything that applies across implementations probably should be in the main suite, right?

i'd say yes, only if we expand the notion of what kind of tests the main suite ought to contain

[13:49:23.0308] <ljharb>
what are some examples of tests that wouldn't currently fit?

[13:50:46.0769] <shu>
currently the norm for a test262 test is that it tests a spec concept as isolated as possible, if preferable

to contrast, there are a lot of tests that engine implementers write that're testing interactions between N features that's hard to tease apart

[13:51:53.0582] <shu>
 * currently the norm for a test262 test is that it tests a spec concept as isolated as possible

to contrast, there are a lot of tests that engine implementers write that're testing interactions between N features that's hard to tease apart

[13:52:26.0723] <shu>
as a concrete example, engines do some form of conservative TDZ elision analysis in the parser

[13:54:17.0665] <shu>
if V8 writes a bunch of tests that covers its analysis, it seems helpful (or at least non-harmful outside of taking time) for other engines to run them too

[13:54:35.0055] <shu>
from the spec's POV a lot of these tests are going to be redundant as they test the same spec steps over and over

[13:54:36.0481] <Richard Gibson>
what would such a test look like in test262?

[13:55:05.0735] <shu>
you'd test that a certain program correctly throws / or not throw due to TDZ

[13:55:25.0893] <shu>
but there might be multiple tests that look like they're testing the exact same 5 steps of the spec that has to do with TDZ

[13:55:34.0454] <shu>
but might be testing how different productions get analyzed

[13:55:41.0796] <shu>
how would i organize those tests in test262?

[13:56:09.0842] <shu>
i would love to just be able to put those in the main suite, personally

[13:56:33.0742] <shu>
but i also know i don't care nearly enough about how the suite is organized as others

[13:59:54.0717] <shu>
put another way, if we broadly think of test262 tests as written so as to maximize spec step coverage, and engine tests as written so as to maximize engine code coverage, the latter often don't map cleanly onto the first. and if they don't, and the organization depends on this mapping, i think either we 

1. relax the organizational requirements of the main suite
2. put them somewhere else so the main suite can retain its existing organizational structure

[14:00:24.0595] <shu>
 * put another way, if we broadly think of test262 tests as written so as to maximize spec step coverage, and engine tests as written so as to maximize engine code coverage, the latter often don't map cleanly onto the former. and if they don't, and the organization depends on this mapping, i think either we

1. relax the organizational requirements of the main suite
2. put them somewhere else so the main suite can retain its existing organizational structure

[14:01:57.0233] <shu>
another concrete example: for resizable buffers, we wrote a lot of tests testing that the TA methods work on normal JS arrays with different kinds of contents (like, an array only containing doubles, or ints, since V8 has special optimizations for those)

[14:02:01.0374] <shu>
i think those are also valuable for other engines to run

[14:02:23.0372] <shu>
but the spec doesn't have a notion of a JS array containing only ints or doubles being anything special. it's just an array

[14:02:29.0634] <shu>
so how should i organize that?

[14:05:48.0600] <ljharb>
i very much agree these kinds of tests are valuable for other engines to run

[14:06:06.0434] <ljharb>
and thanks, that helps me see the ideological divide between test262 now, and a test262 that would contain these

[14:06:09.0153] <Richard Gibson>
I think that would look like the directories in https://github.com/tc39/test262/tree/main/test/language/block-scope

[14:06:29.0766] <Richard Gibson>
(as an ad-hoc improvement)

[14:14:25.0138] <shu>
fwiw i think there is also tremendous value in the "purity" of the existing way test262  is organized

[14:14:43.0415] <shu>
for new entrants it's valuable to know that it's just testing spec coverage

[14:21:13.0151] <littledan>
> <@ljharb:matrix.org> i agree that lots of them might stay there forever. but it's still "temporary" in intention

this is the part that we are disagreeing on

[14:21:49.0765] <littledan>
The intention. It seemed like the intention was the part that made dminor uncomfortable.

[14:25:21.0895] <littledan>
In particular I don’t want to return to the practice of “pruning” tests which are understood to be duplicates just because they refer to the same line of spec. This may reduce test coverage because it may hit different paths in implementations, and implementations sometimes repeat each other’s bugs, even for reasons that don’t correspond to a spec line

[14:26:11.0924] <littledan>
So if staging must have this “temporary” idea attached to it, that sounds like a reason to use a differently-named directory which avoids this implication.

[14:35:02.0159] <Richard Gibson>
 * I think that would look like the directories in https://github.com/tc39/test262/tree/main/test/language/block-scope and https://github.com/tc39/test262/tree/main/test/language/statements/const and https://github.com/tc39/test262/tree/main/test/language/statements/let , which do have multiple files covering the same TDZ requirements

[14:49:56.0867] <shu>
if we do put them somewhere other than the main suite i'd like the two suites to be messaged as siblings

[14:50:13.0070] <shu>
not that one is less rigorous or lesser than the other in any way


2024-06-05
[20:54:19.0489] <ljharb>
so the picture i'm getting is that the current main suite is for unique tests that test the spec. but, there's value in having another top-level suite for things that may not be unique, and test known code paths in one or more implementations?

[22:50:47.0064] <littledan>
Maybe that’s a way to consider it; I was thinking of, more openly, the parallel suite could be “tests that someone finds interesting”. I would not want to prune tests just because they are not demonstrating themselves to trace an implementation code path. The point is to be a good fit for automated two-way sync with no central code review. The main reason for someone else to delete a test is if it is incorrect.

[22:52:03.0052] <littledan>
(Maybe we are restricting “someone” to “some implementer”. And maybe “the test takes too long or otherwise is impractical for automation” could be another reason for removal.)

[15:07:39.0411] <ljharb>
btw pre-emptively cancelling next week's meeting, since it's plenary week


2024-06-06
[17:54:23.0154] <Richard Gibson>
I'd rather have an inline explanation/justification for tests that seem redundant than a separate parallel suite

[22:19:35.0334] <ljharb>
that could work too, depending on the ratio/quantity of them

[22:19:43.0189] <littledan>
> <@gibson042:matrix.org> I'd rather have an inline explanation/justification for tests that seem redundant than a separate parallel suite

This is the key issue: asking for all of these justifications leads to lower test coverage in practice.

[22:21:18.0886] <littledan>
And it will be less overhead to share the tests if we can do them in a separate directory within test262 as opposed to using a separate repo—using a directory in the shared repo will lead to the tests being run in more cases

[22:21:24.0272] <littledan>
> <@ljharb:matrix.org> that could work too, depending on the ratio/quantity of them

What do you mean?

[22:21:49.0720] <ljharb>
like, if there's a lot, then it'd be worse to have inline comments to have a parallel suite

[22:22:12.0718] <ljharb>
inline exceptions only make sense when it applies to a minority of cases

[22:23:25.0803] <littledan>
The question for us to answer is whether we oppose the proposal for various JS engines to implement two-way sync to share tests with literally no code review or documentation requirements. Bloomberg is funding Igalia to implement parts of this infrastructure, and Shu and dminor have expressed interest in making it part of their development processes.

[23:09:18.0668] <Richard Gibson>
> <@littledan:matrix.org> This is the key issue: asking for all of these justifications leads to lower test coverage in practice.

can you provide an example? Because I have already identified instances of that _not_ happening: https://matrix.to/#/!dGIsaFIukRWLdpurvh:matrix.org/$7CSIfN5Vw3P7TC8tkN1j_QrdfJXFt2XaiFt5aeWZU8Q?via=matrix.org&via=igalia.com&via=mozilla.org

[04:34:08.0371] <littledan>
you've identified cases where people have been able to land tests that exercised the same line of spec, but this is different from showing that this other potential source of tests never increases coverage. (I'm not going to dig up an example right now.)

[04:45:48.0719] <littledan>
But I wonder why Mozilla has its non262 directory; presumably some of the tests there provide some value

[05:18:23.0510] <Ms2ger>
😱🍇

[05:19:59.0835] <dminor>
It's largely historical reasons, from what I can tell. My preference would be for us to write new tests in `staging`, but we still see value in keeping the older tests around, and sharing them with other implementations. I think there's around 4000 of them, so inline comments is a non-starter for me for this use case.

[05:40:32.0543] <littledan>
right I think there's value in sharing tests

[06:33:36.0284] <Richard Gibson>
likewise

[06:35:22.0328] <littledan>
so, how do you feel about having a parallel suite, to support that goal?

[07:02:44.0977] <Richard Gibson>
opposed, with mild to moderate strength. As described above, I would rather satisfy that goal without a parallel suite (and to clarify, I consider inline explanation/justification for tests that seem redundant to be valuable but not necessary)

[08:18:22.0616] <littledan>
sorry, so how would you like to satisfy the goal?

[08:20:47.0746] <Richard Gibson>
by including tests that seem redundant inside the existing suite, ideally with inline explanation/justification for the apparent redundancy but not forcing that as a requirement (since we already have examples of this pattern without such explanations)

[08:28:29.0423] <dminor>
since we're exporting the tests by means of a script, I guess we could automatically add a comment indicating that these are historical (and potentially redundant) tests. but I'm not seeing the advantage of having these tests added to the main corpus as opposed to kept separate. could you expand on why you think this option is better?

[08:37:51.0755] <littledan>
Would be great to get reviews of this mega-PR: https://github.com/tc39/test262/pull/4103

[08:39:24.0612] <Richard Gibson>
then maybe I'm confused about the context, because **that** scenario seems to align with [`staging`](https://github.com/tc39/test262/blob/main/CONTRIBUTING.md#staging) as observed above by ljharb ("_tests that are subject to fewer requirements, in order to get tests running across more than one implementation as early as possible_", "_Implementations may designate a group of people who have permission to review and land tests in the staging folder_")

[08:40:29.0376] <littledan>
right, that's what we're talking about: a different folder inside the same repo (because breaking up into multiple repos would cause extra work for no benefit)

[08:40:40.0002] <littledan>
maybe you were thinking separate repo when we were talking about "suites"

[08:41:04.0107] <littledan>
maybe we'd put some in staging and some in implementation-contributed, but the "semantics" [when/how the tests run] would be the same