00:06
<gsnedders>
nox: BTW, isn't it still the case that SSE2 is *way* quicker with aligned reads? Do you not want to check input is aligned first?
00:07
<nox>
gsnedders: I guess I could.
00:07
<nox>
jamesr___: Yeah, have to check.
00:07
<nox>
gsnedders: I'm not sure the simd crate handles that though.
00:08
<jamesr___>
sse2 instructions generally require 16 byte alignment or they fault
00:09
<jamesr___>
or some do, I think
00:11
<nox>
Alignment matters only for load and store.
00:12
<nox>
gsnedders: In general when handling multibyte encodings, you can't stay aligned during the whole reading anyway.
00:13
<nox>
gsnedders: Maybe through very fancy shuffling to handle continuation bytes across chunks, but I'm not sure it's worth it.
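The alignment check discussed above can be sketched in plain Rust, without the simd crate. This is a minimal illustration, not code from rust-encoding: `is_aligned_16`, `split_at_alignment` and `Aligned16` are hypothetical names. The idea is to process a short unaligned head byte by byte, then hand the remainder, which starts on a 16-byte boundary, to an aligned load path.

```rust
/// Returns true if the slice starts on a 16-byte boundary,
/// which is what aligned SSE2 loads and stores require.
pub fn is_aligned_16(bytes: &[u8]) -> bool {
    (bytes.as_ptr() as usize) % 16 == 0
}

/// Splits the input into an unaligned head (0 to 15 bytes) and a
/// remainder that begins on a 16-byte boundary. If the input is too
/// short to reach a boundary, the whole input ends up in the head.
pub fn split_at_alignment(bytes: &[u8]) -> (&[u8], &[u8]) {
    let misalign = (bytes.as_ptr() as usize) % 16;
    let head_len = if misalign == 0 { 0 } else { 16 - misalign };
    bytes.split_at(head_len.min(bytes.len()))
}

/// Hypothetical helper type: a buffer forced to 16-byte alignment,
/// handy for exercising the functions above.
#[repr(align(16))]
pub struct Aligned16(pub [u8; 32]);
```

As nox points out, this only helps while the reader advances in whole 16-byte chunks; a multibyte sequence straddling a chunk boundary breaks that assumption.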
00:15
<Domenic>
hmm how did nobody else catch that ASCII to UTF8 is a memcpy...
00:16
<jsbell>
I was wondering about that; isn't the code actually UTF-8 to ASCII, which requires range validation?
00:17
<jsbell>
(I glanced at the code only enough to realize I didn't care that much...)
00:17
<Domenic>
yeah same...
00:17
<Domenic>
"ASCIIEncoder" implies you are right
00:34
<jamesr___>
"maybe ASCII" -> utf8 is not a memcpy, if you want to map bytes with the high bit set to an error value in some way
06:08
<annevk>
TabAtkins: same line
06:08
<annevk>
TabAtkins: also, I prefer <li><p>Text to be on one line if <li> only contains a single <p>
06:09
<annevk>
jamesr___: yeah, seems to be about checking invalid bytes
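To make the "not a memcpy" point concrete, here is a hedged sketch of decoding "maybe ASCII" input. It is not the rust-encoding implementation; `decode_ascii_lossy` and `decode_ascii_strict` are hypothetical names, and using U+FFFD as the error value is an assumption. Every byte has to be inspected, because any byte with the high bit set is not ASCII.

```rust
/// Decodes "maybe ASCII" bytes into a UTF-8 String, replacing every
/// byte with the high bit set by U+FFFD (an assumed error value).
pub fn decode_ascii_lossy(input: &[u8]) -> String {
    let mut out = String::with_capacity(input.len());
    for &b in input {
        if b.is_ascii() {
            out.push(b as char);
        } else {
            out.push('\u{FFFD}');
        }
    }
    out
}

/// Strict variant: returns the index of the first non-ASCII byte,
/// or borrows the input as &str when every byte is ASCII.
pub fn decode_ascii_strict(input: &[u8]) -> Result<&str, usize> {
    match input.iter().position(|b| !b.is_ascii()) {
        Some(i) => Err(i),
        // Pure ASCII is always valid UTF-8, so this cannot fail.
        None => Ok(std::str::from_utf8(input).expect("ASCII is valid UTF-8")),
    }
}
```

Only the strict variant's success path is copy-free, and even that path has already scanned the whole input.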
07:54
<nox>
Domenic: UTF-8 is compatible with US-ASCII.
07:54
<nox>
Domenic: Not all bytes are US-ASCII code points.
07:54
<nox>
So no, decoding ASCII into UTF-8 isn't memcpy.
08:10
<Ms2ger>
What
08:10
<Ms2ger>
If it's actually ASCII, there's no bytes with the high bit set, so it is a memcpy
08:11
<nox>
Ms2ger: In the context of rust-encoding, you don't know if input is actually in said encoding.
08:11
<nox>
Ms2ger: That's why the UTF-8 decoder isn't a noop either.
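The same reasoning applies to UTF-8 input of unknown provenance. The following sketch uses only the Rust standard library, not rust-encoding; `decode_utf8` and `decode_utf8_lossy` are hypothetical wrapper names. It shows that a UTF-8 "decoder" still has to validate its input before handing it out as a string.

```rust
/// Validates untrusted bytes as UTF-8. On failure, returns the number
/// of leading bytes that were valid, as reported by the standard library.
pub fn decode_utf8(input: &[u8]) -> Result<&str, usize> {
    std::str::from_utf8(input).map_err(|e| e.valid_up_to())
}

/// Lossy variant: invalid sequences are replaced with U+FFFD.
pub fn decode_utf8_lossy(input: &[u8]) -> String {
    String::from_utf8_lossy(input).into_owned()
}
```

When the input is already valid, `decode_utf8` borrows it without copying, but the validation pass over every byte still happens.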