Tanstack Start | Bf-Tree: modern read-write-optimized concurrent larger-than-memory range index

Bf-Tree: modern read-write-optimized concurrent larger-than-memory range index(github.com)

48 points by SchwKatze 5 hours ago | 7 comments

koverstreet 7 minutes ago
No multithreaded write benchmarks. That's a major omission, that's where b-trees crush LSM trees.
algorithmsRcool 3 hours ago
I get excited every time I see a paper from Bradish. I've learned so much about high performance software from studying systems that he has worked on. (not to diminish his many co-authors and contributors)
Some of his other projects:
[0] https://github.com/microsoft/garnet [1] https://github.com/microsoft/FASTER [2] https://github.com/microsoft/Trill
- reactordev 2 hours ago |parent
  You should check out Microsoft Orleans
  https://github.com/dotnet/orleans
0xdeafbeef 4 hours ago
I've tested with wal enabled, got deadlock several times, so looks raw for now
- heliumtera 3 hours ago |parent
  I think a fair comparison would be against a whitepaper? Clearly this is an exploratory venture and not production grade software.
  You managed to clone the repo an run your test by yourself, whatever the outcome is this is a plus against the standard model for scientific research.
  Also, a breath of fresh air among every other show HN thread using hundreds of adjectives to describe the "behavior" of a fully vibed system. I think this is a good model for presenting engineering projects.
  - SchwKatze 3 hours ago |parent
    > You managed to clone the repo an run your test by yourself, whatever the outcome is this is a plus against the standard model for scientific research.
    That's so true, which is kinda funny since one of the cornerstone of scientific thinking is reproducibility.
    IMHO they're using the best tool for this, nix.
tombert 3 hours ago
Interesting. I've been on/off working on something, though it's an append-only hash index thing (based on Erlang's Bitcask): https://git.sr.ht/~tombert/feocask
I haven't done any benchmarks on it yet so I have no idea how well it performs, but I have a very strict "no explicit lock" policy, so everything is handled with separate files with a reconciliation service that uses a vector clock to determine ordering.
NOTE: A lot of this is AI-assisted, treat what I have written as more of a whiteboard proof of concept. I'm in the process of rewriting it by hand to do some more aggressive optimizations.