02:05
<Mathieu Hofman>
So has anyone else ever needed a String.codePointCompare function (a la Intl.Collator.prototype.compare) to use with sort, for comparing strings by Unicode code points instead of by code units (the default when the comparator is missing)? It seems that there is no Intl locale / collation that will do a dumb code point compare.
02:06
<Mathieu Hofman>
Bonus is that implementing this natively would allow engines using an internal utf8 representation for strings to just compare them by bytes!
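For illustration, here is a minimal userland sketch of the comparator being asked about. The name codePointCompare is hypothetical (no such built-in exists), and the handling of lone surrogates (compared by their own value) is an assumption of this sketch, not something settled in the discussion.

```javascript
// Hypothetical helper: orders strings by Unicode code point rather than
// by UTF-16 code unit. Lone surrogates compare by their own value.
function codePointCompare(a, b) {
  let i = 0;
  while (i < a.length && i < b.length) {
    const x = a.codePointAt(i);
    const y = b.codePointAt(i);
    if (x !== y) return x < y ? -1 : 1;
    // The strings are identical up to here, so both advance by the
    // same number of code units (2 for a surrogate pair, else 1).
    i += x > 0xffff ? 2 : 1;
  }
  // One string is a prefix of the other (or they are equal).
  return Math.sign(a.length - b.length);
}

// U+FF5E is a single code unit 0xFF5E; U+1F600 is the pair 0xD83D 0xDE00.
const sample = ["\u{1F600}", "\uFF5E", "a"];
console.log([...sample].sort());                 // code units: "a", U+1F600, U+FF5E
console.log([...sample].sort(codePointCompare)); // code points: "a", U+FF5E, U+1F600
```

The two orders only differ when a supplementary-plane character is compared against a BMP character in the range U+E000 to U+FFFF, which is why the default sort usually looks right.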
02:14
<bakkot>
I have never needed to sort strings by code point, no
02:15
<bakkot>
I don't think any major engines use internal utf8 representations but I could be mistaken
02:16
<bakkot>
how did you find yourself needing this?
02:18
<bakkot>
speaking of sorting, though, I do want to have a Array<T>.sortBy(fn) method where the function is a map from T to Comparable: string | number | bigint | Array<Comparable>, and which sorts the inputs by comparing their outputs from fn (throwing if the outputs are of unlike types, and comparing arrays lexicographically)
02:18
<bakkot>
and given such a thing you could do array.sortBy(s => [...s])
02:19
<bakkot>
of course we are extremely unlikely to get any new array prototype methods with reasonable names, so I guess it would have to be a static Array.sortBy(arr, fn), which... ugh. but I'd still take it.
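A rough sketch of the sortBy idea as a standalone function, to make the semantics concrete. The names sortBy and compareKeys are hypothetical, and returning a new array (rather than sorting in place) and ignoring NaN are choices of this sketch only.

```javascript
// Compare two keys: string | number | bigint | array of such keys.
// Throws on unlike types; arrays compare lexicographically.
function compareKeys(x, y) {
  const xIsArray = Array.isArray(x);
  const yIsArray = Array.isArray(y);
  if (xIsArray !== yIsArray || (!xIsArray && typeof x !== typeof y)) {
    throw new TypeError("cannot compare keys of unlike types");
  }
  if (xIsArray) {
    const n = Math.min(x.length, y.length);
    for (let i = 0; i < n; i++) {
      const c = compareKeys(x[i], y[i]);
      if (c !== 0) return c;
    }
    return Math.sign(x.length - y.length);
  }
  if (!["string", "number", "bigint"].includes(typeof x)) {
    throw new TypeError("unsupported key type: " + typeof x);
  }
  return x < y ? -1 : x > y ? 1 : 0;
}

// Hypothetical static form: computes each key once, returns a sorted copy.
function sortBy(arr, fn) {
  return arr
    .map((value) => ({ value, key: fn(value) }))
    .sort((p, q) => compareKeys(p.key, q.key))
    .map((p) => p.value);
}

// Sort strings by code point, using numeric code points as the key.
const toCodePoints = (s) => Array.from(s, (c) => c.codePointAt(0));
console.log(sortBy(["\u{1F600}", "\uFF5E", "a"], toCodePoints));
// "a", U+FF5E, U+1F600
```

One caveat under this sketch: because string keys are compared with `<`, which works on code units, a key of `[...s]` would still order a supplementary-plane character before U+FF5E. Mapping each element to its numeric code point, as above, avoids that.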
02:24
<Mathieu Hofman>
We need a portable way of sorting strings for Ocapn, and settled on unicode codepoint comparison. This is basically an interop question.
04:14
<Aapo Alasuutari>
Side quest: Are there actually ~any engines that use UTF-8 as their string representation? Mine does, but I'm wondering if there are others and if they simply accept string methods being non-standard, or if they take measures to hide the backing representation.
05:04
<Mathieu Hofman>
Moddable's XS can be built to use either utf-8 or cesu-8
05:05
<Mathieu Hofman>
I thought that v8 supported utf-8 strings, especially when interacting with the DOM
05:46
<Domenic>
DOM uses WTF-16, sometimes (but rarely) censoring lone surrogates on the boundaries
05:49
<Justin Ridgewell>
Moddable's XS can be built to use either utf-8 or cesu-8
Why cesu-8?
05:50
<Mathieu Hofman>
compactness of strings while keeping compatibility with utf-16
05:51
<Mathieu Hofman>
it makes some operations a little costly however (like random access to string index)
13:02
<Mathieu Hofman>
We need a portable way of sorting strings for Ocapn, and settled on unicode codepoint comparison. This is basically an interop question.
Also for interop with SQLite which by default encodes strings in utf-8 and sorts them with no collation.
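To illustrate why UTF-8 storage lines up with code point order: comparing UTF-8 encodings byte by byte yields the same order as comparing code points, which is what a plain byte-wise comparison of UTF-8 text would produce. A small sketch, assuming a runtime with a global TextEncoder; the function name compareUtf8Bytes is hypothetical, and note that TextEncoder replaces lone surrogates with U+FFFD, so this only matches code point order for well-formed strings.

```javascript
const encoder = new TextEncoder();

// Hypothetical helper: compares two strings by the bytes of their
// UTF-8 encodings, similar in spirit to a memcmp-style comparison.
function compareUtf8Bytes(a, b) {
  const x = encoder.encode(a);
  const y = encoder.encode(b);
  const n = Math.min(x.length, y.length);
  for (let i = 0; i < n; i++) {
    if (x[i] !== y[i]) return x[i] < y[i] ? -1 : 1;
  }
  return Math.sign(x.length - y.length);
}

// U+FF5E encodes as EF BD 9E; U+1F600 encodes as F0 9F 98 80.
console.log(compareUtf8Bytes("\uFF5E", "\u{1F600}")); // -1, same as code point order
console.log("\uFF5E" < "\u{1F600}");                  // false, code unit order differs
```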