| 00:06 | <gsnedders> | nox: BTW, isn't it still the case that SSE2 is *way* quicker with aligned reads? Do you not want to check input is aligned first? |
| 00:07 | <nox> | gsnedders: I guess I could. |
| 00:07 | <nox> | jamesr___: Yeah, have to check. |
| 00:07 | <nox> | gsnedders: I'm not sure the simd crate handles that though. |
| 00:08 | <jamesr___> | sse2 instructions generally require 16 byte alignment or they fault |
| 00:09 | <jamesr___> | or some do, I think |
| 00:11 | <nox> | Alignment matters only for load and store. |
| 00:12 | <nox> | gsnedders: In general when handling multibyte encodings, you can't stay aligned during the whole reading anyway. |
| 00:13 | <nox> | gsnedders: Maybe through very fancy shuffling to handle continuation bytes across chunks, but I'm not sure it's worth it. |
| 00:15 | <Domenic> | hmm how did nobody else catch that ASCII to UTF8 is a memcpy... |
| 00:16 | <jsbell> | I was wondering about that; isn't the code actually UTF-8 to ASCII, which requires range validation? |
| 00:17 | <jsbell> | (I glanced at the code only enough to realize I didn't care that much...) |
| 00:17 | <Domenic> | yeah same... |
| 00:17 | <Domenic> | "ASCIIEncoder" implies you are right |
| 00:34 | <jamesr___> | "maybe ASCII" -> utf8 is not a memcpy, if you want to map bytes with the high bit set to an error value in some way |
| 06:08 | <annevk> | TabAtkins: same line |
| 06:08 | <annevk> | TabAtkins: also, I prefer <li><p>Text to be on one line if <li> only contains a single <p> |
| 06:09 | <annevk> | jamesr___: yeah, seems to be about checking invalid bytes |
| 07:54 | <nox> | Domenic: UTF-8 is compatible with US-ASCII. |
| 07:54 | <nox> | Domenic: Not all bytes are US-ASCII code points. |
| 07:54 | <nox> | So no, decoding ASCII into UTF-8 isn't memcpy. |
| 08:10 | <Ms2ger> | What |
| 08:10 | <Ms2ger> | If it's actually ASCII, there's no bytes with the high bit set, so it is a memcpy |
| 08:11 | <nox> | Ms2ger: In the context of rust-encoding, you don't know if input is actually in said encoding. |
| 08:11 | <nox> | Ms2ger: That's why the UTF-8 decoder isn't a noop either. |
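
The disagreement above comes down to whether the input is trusted. Below is a minimal sketch of that distinction, not rust-encoding's actual code. The function name `decode_maybe_ascii` is hypothetical, and mapping non-ASCII bytes to U+FFFD is an assumed error policy chosen for illustration. When every byte has the high bit clear, the result is byte-for-byte identical to the input (effectively a copy after a check). Otherwise each offending byte has to be handled individually.

```rust
/// Hypothetical helper: decode bytes that are *claimed* to be US-ASCII
/// into a UTF-8 `String`. Assumed policy: any byte with the high bit set
/// is not ASCII and is replaced with U+FFFD.
fn decode_maybe_ascii(input: &[u8]) -> String {
    // Fast path: if every byte is < 0x80 the input is already valid UTF-8,
    // so decoding is a validation pass followed by a plain copy.
    if input.is_ascii() {
        return String::from_utf8(input.to_vec()).expect("ASCII is valid UTF-8");
    }

    // Slow path: the input was not really ASCII, so walk it byte by byte.
    let mut out = String::with_capacity(input.len());
    for &b in input {
        if b < 0x80 {
            out.push(b as char); // ASCII byte maps to the same code point
        } else {
            out.push('\u{FFFD}'); // invalid byte becomes the replacement character
        }
    }
    out
}
```

With this sketch, `decode_maybe_ascii(b"hello")` returns a string with the same bytes as the input, while an input such as `[b'h', 0xE9, b'i']` grows in length because the single invalid byte becomes a three-byte replacement character. That growth is why the untrusted case cannot be a straight memcpy.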