| 14:03 | <bakkot> | neat https://github.com/v8/agents/tree/main/extensions/ecma262_state_machine |
| 17:03 | <snek> | this is what engine262 does heh |
| 17:24 | <Daniel Dyryl> | Hi. I am a community member working on the Stage 0 proposals JSON.parseBinary and ArrayBuffer.prototype.detach. Matteo Collina at Fastify has given positive feedback, Daniel Minor at V8 suggested I bring them here to find a potential champion, and the TC39 README on GitHub mentions this specific Matrix channel. Use cases: receiving payloads by servers (not only HTTP), receiving payloads by clients, JSON Web Tokens, filesystem manipulation, memory-constrained environments, high-throughput network communication. Most of the applications out there. Links to the repositories with open-source benchmarks and examples: |
| 17:30 | <eemeli> | On json-parse-binary, I find it surprising that the explainer does not appear to make any reference to the {type: 'json'} import attribute, or Fetch's Response.json() method. Is that intentional? |
| 17:30 | <bakkot> | I have a hard time understanding what problem json-parse-binary is solving. The only times I ever have a byte array containing JSON are when reading from the network or disk. From the network there's already .json(), and from disk there's type: 'json' imports. So... what's the remaining use case? |
| 17:33 | <bakkot> | for .detach, I think if you are in a situation where you are allocating and detaching a bunch of arraybuffers, something has already gone wrong. You'd generally be doing encodeInto or setFromBase64 or similar, to avoid allocating the arraybuffers in the first place. So it is difficult for me to imagine a situation where the overhead of .transfer(0) is relevant |
| 17:59 | <Daniel Dyryl> | There is nothing to imagine; see the benchmarks. It is literally a speed boost. I didn’t mention imports or Response.json because it might look rude to propose changing existing things before I had even found a delegate. But yes, asking WHATWG to use JSON.parseBinary in Fetch, and making use of it in JSON imports, were among my subsequent ideas, just not listed. So yeah, intentional. |
| 18:00 | <bakkot> | Microbenchmarks are almost never convincing without a story for why a real application might do something anything like as frequently as a microbenchmark |
| 18:01 | <bakkot> | Browsers could already implement .json and type: "json" imports as parsing over binary; I don't think they've done so but there's nothing stopping them |
| 18:01 | <Daniel Dyryl> | Saying that “it saves cpu time and memory” doesn’t convince you? |
| 18:01 | <bakkot> | Not on its own, no. |
| 18:03 | <bakkot> | Basically any possible standard library functions would save nonzero cpu and memory for nonzero programs. There needs to be more than that, such as an argument that it would save significant cpu and memory for many programs. |
| 18:05 | <Daniel Dyryl> | Parsing over binary JSON saves us from constant reallocations, plus the UTF-16 upgrade of a string caused by a single emoji. By avoiding Error we potentially prevent stack traces from leaking unintentionally. If browsers had implemented this, it would be even easier to expose such a JSON parser, so nothing stops them, and nothing stops us. |
| 18:06 | <bakkot> | What I mean is: if your argument is "reading json from the network is slow because it goes through a string", then the solution is to convince browsers to implement a faster .json(), not to add a feature to the language |
| 18:08 | <Daniel Dyryl> | .json() belongs to Response, but in the case of node:http, we receive payload as chunks and not as a Response object |
| 18:10 | <bakkot> | For things to be worth adding to the language, they need to at minimum be useful in browsers, not just in node. but fwiw node already vendors simdjson so they might be open to exposing it to userland if you ask. |
| 18:10 | <Daniel Dyryl> | Concerning “detach” and avoiding ArrayBuffer allocation: in the same node:http, those chunks are allocated by the framework, not by us. By using “detach” we could optimise our handlers. If not everyone adopts it, then the libraries we use every day can. |
| 18:12 | <bakkot> | I would think an additional zero-length ArrayBuffer allocation is ~never going to be a relevant amount of CPU time when the network is involved, but I could be wrong about that. |
| 18:12 | <Daniel Dyryl> | Not only Node, but Bun, Deno as well. I am proposing this to a language because there is a wide choice of instruments. |
| 18:12 | <bakkot> | WinterTC is the appropriate venue for server-runtime-only features |
| 18:13 | <bakkot> | Although IIRC bun and deno do expose Response-based HTTP servers, so they could just use .json already |
| 18:14 | <Daniel Dyryl> | By using Response.json we give up the ability to distinguish a network error from a parsing error. One method throws for both cases |
| 18:15 | <bakkot> | .json |
| 18:15 | <bakkot> | so they could all do this |
| 18:16 | <Daniel Dyryl> | So it can adopt these language features. Or should I spend much more time asking each of them, when there is a centralised place like the ECMAScript specification? |
| 18:16 | <bakkot> | .json already exists. If what you want is for it to be more performant, that needs to be done by the implementors of it. |
| 18:17 | <bakkot> | If what you want is a different thing, TC39, WhatWG, or WinterTC might each be reasonable, depending on the thing you want. If you have a use case for parsing binary json that isn't just "I want .json to be faster" then it might be reasonable to do it in TC39 |
| 18:18 | <Daniel Dyryl> | But still, thanks for this discussion. I literally spent the previous week in solitude. Changes to existing APIs are much more likely to be avoided, since they risk breaking backwards compatibility. |
| 18:19 | <bakkot> | .json could operate on raw bytes, without an intermediate string, with no user-visible changes |
| 18:19 | <Daniel Dyryl> | So yes, for me it is reasonable to propose my ideas to TC39, because they are new, clearly express developers’ intentions, can be used in the browser and on the backend, and improve performance |
| 18:20 | <Daniel Dyryl> | For some framework like uWebSockets.js, where data comes only in chunks, using Response.json() would be possible if we could create Response from binary chunks. Can we? |
| 18:21 | <Daniel Dyryl> | Oh, wait. We actually can |
| 18:21 | <bakkot> | The Response constructor can take a Uint8Array, yes |
| 18:21 | <Daniel Dyryl> | Just a minute |
| 18:22 | <bakkot> | or a Blob if you have multiple Uint8Arrays |
| 18:22 | <bakkot> | (sidebar: it is a little weird that Blob doesn't have .json, but whatever) |
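To illustrate the route just discussed: `Response` can wrap a single Uint8Array directly, and a `Blob` can join multiple chunks, after which `.json()` parses the bytes. A sketch, assuming a runtime with the Fetch globals (Node ≥ 18 or a browser):

```javascript
// Parsing JSON from raw bytes via the Fetch Response wrapper.
(async () => {
  const bytes = new TextEncoder().encode('{"key":"value","key2":123123}');

  // One chunk: pass the Uint8Array straight to the Response constructor.
  const single = await new Response(bytes).json();

  // Many chunks (e.g. from a socket): wrap them in a Blob first.
  const chunks = [bytes.subarray(0, 12), bytes.subarray(12)];
  const multi = await new Response(new Blob(chunks)).json();

  console.log(single.key2, multi.key2); // 123123 123123
})();
```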
| 18:30 | <Daniel Dyryl> | Made a quick microbenchmark (I believe they tell a story): 1 million iterations of “await new Response(nodeBuffer).json()” take 6 seconds, while 1 million iterations of “JSON.parse(nodeBuffer.toString())” take 300ms |
| 18:30 | <Daniel Dyryl> | So the async parser path is not really worth pursuing |
| 18:31 | <Daniel Dyryl> | The JSON in nodeBuffer (a Buffer.from() result) is small: {"key": "value", "key2": 123123} |
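A sketch of the microbenchmark described above, for Node (it uses `Buffer`). The iteration count is reduced from 1 million so it finishes quickly; absolute numbers vary by runtime and hardware, and as noted above microbenchmarks like this only tell part of the story.

```javascript
// Compare Response.json() against JSON.parse(buffer.toString())
// on a small JSON payload.
(async () => {
  const nodeBuffer = Buffer.from('{"key": "value", "key2": 123123}');
  const N = 10_000;

  let t = performance.now();
  for (let i = 0; i < N; i++) JSON.parse(nodeBuffer.toString());
  console.log('JSON.parse:   ', Math.round(performance.now() - t), 'ms');

  t = performance.now();
  for (let i = 0; i < N; i++) await new Response(nodeBuffer).json();
  console.log('Response.json:', Math.round(performance.now() - t), 'ms');
})();
```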
| 18:33 | <Daniel Dyryl> | But since Blob was mentioned, probably JSON.parseBinary would also benefit from accepting it. An additional extension, thanks |
| 18:36 | <Daniel Dyryl> | In Bun results are 230ms for JSON.parse and 550ms for Response.json |
| 18:36 | <Daniel Dyryl> | Still a gap |
| 18:42 | <bakkot> | That does suggest there is substantial room for engines to optimize .json, yup |
| 18:43 | <bakkot> | Adding a parseBinary to the language wouldn't cause them to do that work, though. |
| 18:44 | <Daniel Dyryl> | Yes. And again, I propose not only eliminating the intermediate JS string, but also no longer throwing SyntaxError |
| 18:44 | <Daniel Dyryl> | So the return type is different |
| 18:45 | <eemeli> | What's the use case where the performance of the error handling of JSON parsing is actually relevant? |
| 18:46 | <Daniel Dyryl> | Literally every DDoS attack. I don’t think that attackers try to generously obey the rules of JSON syntax |
| 18:46 | <Daniel Dyryl> | Finding a weak spot here means using the worst case for the server |
| 18:46 | <Daniel Dyryl> | By using JSON.parseBinary we prevent these incidents |
| 18:47 | <Daniel Dyryl> | All SyntaxError does is generate a stack trace. We need that for debugging, but we can’t debug a payload that came from who-knows-where |
| 18:48 | <eemeli> | I don't see that represented in the explainer at all. |
| 18:48 | <Daniel Dyryl> | Because imagine me writing to everyone: “you can DDoS every JS backend out there” |
| 18:49 | <Daniel Dyryl> | That was a precaution. Now in this environment I can share it with you |
| 18:54 | <eemeli> | I don't really buy the argument that a savings of about 3µs (according to the benchmark results in the explainer) while dealing with a single network request is worth the cost of not throwing a SyntaxError. |
| 18:55 | <bakkot> | Stack traces are generally created lazily already |
| 18:55 | <bakkot> | So if you don't read .stack they're almost free |
| 18:55 | <Daniel Dyryl> | errors.mjs shows 2-3 seconds of difference |
| 18:56 | <Daniel Dyryl> | “throw” also has its drawbacks in performance |
| 18:56 | <eemeli> | Yeah, for 1 000 000 iterations. |
| 18:56 | <Daniel Dyryl> | 500ms |
| 18:56 | <Daniel Dyryl> | DDoS is about 3 requests? |
| 18:58 | <eemeli> | My point is that you're showing that there is a potential for saving 3µs per network request by not throwing an error. That's such a minuscule fraction of the total time spent on that request that it's not worth the cost of having e.g. JSON.parse and JSON.parseBinary have different APIs. |
| 18:59 | <Daniel Dyryl> | Is the 3µs from errors only, or from strings as well? |
| 18:59 | <eemeli> | I don't know, i'm quoting your own data. |
| 19:01 | <eemeli> | Hang on, what does "DDoS is about 3 requests" mean? |
| 19:02 | <Daniel Dyryl> | I mean that those attacks involve dozens of requests, not 3 (that was just an example of a low number) |
| 19:05 | <Daniel Dyryl> | I looked again at the benchmark closest to real scenarios: https://github.com/Guthib-of-Dan/proposal-json-parse-binary#parsing-benchmark Not 3µs, but 3µs to 2ms per parse. |
| 19:09 | <Daniel Dyryl> | I have mentioned in the repo the “secure-json-parse” package, which uses the Error.stackTraceLimit = 0 trick before parsing. Their benchmarks show 7900 avg for JSON.parse / 3500 avg latency |
| 19:11 | <eemeli> | How long would you consider it to take for the server to deal with a single request? Not just the JSON parsing, but all the work involved before that? |
| 19:14 | <Daniel Dyryl> | Is there any special benchmark (not a microbenchmark) I can make which would convince you? With all work involved |
| 19:20 | <eemeli> | Probably not, tbh. As far as I see, you're making an argument that the JS language itself needs to be changed for performance reasons, primarily to deal with hypothetical DDoS attacks, by avoiding throwing a SyntaxError from JSON parsing. This seems like an extraordinary claim, for which I'm not seeing the commensurate extraordinary evidence. Or an indication for why this problem really needs to be solved at the JS spec level. |
| 19:23 | <Daniel Dyryl> | To conclude, you see my ideas as ones optimising for rare edge-cases, which save roughly several milliseconds? |
| 19:26 | <eemeli> | To amend that: ... which might in certain cases (according to your benchmarks) save some microseconds in the context of operations that are liable to consume in total at least multiple milliseconds. |
| 19:28 | <Daniel Dyryl> | Still, I am grateful for this discussion, I learned a ton of internal functionality and don’t regret at least trying. I consider these ideas closed from now on. Well, I tried. |
| 19:33 | <Daniel Dyryl> | Wait. What about ArrayBuffer.prototype.detach? So far we have only dealt with JSON.parseBinary |