Hey HN,
A CSV parser using Go 1.26's experimental simd/archsimd package.
I wanted to see what the new SIMD API looks like in practice. CSV parsing is mostly "find these bytes in a buffer"—load 64 bytes, compare, get a bitmask of positions. The interesting part was handling chunk boundaries correctly (quotes and line endings can split across chunks).
- Drop-in replacement for encoding/csv
- ~20% faster for unquoted data on AVX-512
- Quoted data is slower (still optimizing)
- Scalar fallback for non-AVX-512
Requires GOEXPERIMENT=simd.
https://github.com/nnnkkk7/go-simdcsv
Feedback on edge cases or the SIMD implementation welcome.
This is super cool! If I'm understanding your implementation correctly, you run bit-by-bit state machine logic to check whether quotes should be escaped, etc. You can do that in a single pass by using a carry-less polynomial multiplication instruction (_mm_clmulepi64_si128, which needs PCLMULQDQ rather than AVX-512, I believe), or by just computing the carry-less (prefix) XOR directly on the quote mask and then ANDing the inverse with the bitmask for quotes. Simdjson uses this trick, and I use it as well in my Rust SIMD CSV parser:
https://github.com/juliusgeo/csimdv-rs/blob/681df3b036f30c5a...
This is a good write-up on how the approach works: https://nullprogram.com/blog/2021/12/04/
Thanks for the tip! Your comment prompted me to refactor the quote handling - replaced the bit-by-bit state machine loop with prefix XOR, and switched to adjacent bit masking for double-quote detection. Seeing a nice performance improvement in benchmarks. Go's simd/archsimd doesn't have CLMUL yet, but the XOR cascade works well. Appreciate your feedback!
A benchmark comparison with the C# SIMD-optimized CSV parser [1] would be fun to see.
Oh, nice! I’ll try to do it!!