> Don’t pass around data of size 4046-4080 bytes or 8161-8176 bytes, by value (at least not on an AMD Ryzen 3900X).
What a fascinating CPU bug. I am quite curious as to how that came to pass.
That's called 4K aliasing. It occurs when you store to one memory location, then load from another memory location that is offset from the first by 4KB.
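Not from the article, just a minimal sketch of the pattern: a load that closely follows a store to an address exactly 4 KiB away, so the two share their low 12 address bits while living on different pages. Whether and how strongly the penalty shows up depends on the microarchitecture, and the names here are mine.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <vector>

// Store to buf[hi], then immediately load buf[0]. When hi == 4096 the two
// addresses match in their low 12 bits, so the load can falsely appear to
// depend on the in-flight store and gets delayed/re-issued. volatile keeps
// the compiler from hoisting or deleting the accesses.
uint64_t hammer(volatile uint8_t* buf, size_t hi, size_t iters) {
    uint64_t sum = 0;
    for (size_t i = 0; i < iters; ++i) {
        buf[hi] = (uint8_t)i;  // store to one page
        sum += buf[0];         // load from an address 'hi' bytes below it
    }
    return sum;
}

int main() {
    std::vector<uint8_t> buf(8192);
    const size_t iters = 200000000;
    for (size_t hi : {size_t(4096), size_t(4160)}) {  // aliasing vs. harmless offset
        auto t0 = std::chrono::steady_clock::now();
        uint64_t s = hammer(buf.data(), hi, iters);
        auto t1 = std::chrono::steady_clock::now();
        auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(t1 - t0).count();
        std::printf("offset %zu: %lld ms (checksum %llu)\n",
                    hi, (long long)ms, (unsigned long long)s);
    }
    return 0;
}
```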
Me too, and I hope this article gets more traction.
Apparently some sizes are cursed!
It would be great to repeat the author's tests on other CPU models.
I wonder what the page size is on his system (and what effective alignment his pointers have). If it's 4K, the sizes look really close to 0x1000 and 0x2000 - maybe crossing page boundaries?
There is no pass-by-value overhead. There are only implementation decisions.
Pass by value describes the semantics of a function call, not the implementation. Passing a const reference in C++ is pass-by-value. If the user opts to pass "a copy" instead, nothing requires the compiler to actually copy the data. The compiler is required only to supply the actual parameter as if it were copied.
This might be true in the abstract, but it's not true of actual compilers dealing with real-world calling conventions. Absent inlining or whole-program optimization, calling conventions across translation units don't leave much room for flexibility.
The semantics of pass by const reference are also not exactly the same as pass by value in C++. The compiler can't in general assume a const reference doesn't alias other arguments or global variables and so has to be more conservative with certain optimizations than with pass by value.
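A tiny sketch of that aliasing point (hypothetical functions, not from the article):

```cpp
int global = 0;

// With a const reference, the compiler must assume x might refer to `global`
// (the caller could write sum_by_ref(global)), so x has to be reloaded after
// every store to global.
int sum_by_ref(const int& x) {
    global = 1;   // may modify *(&x)
    int a = x;    // reload required
    global = 2;
    int b = x;    // reload required
    return a + b; // sum_by_ref(global) returns 3
}

// With pass by value, x is a private copy: stores to global can't touch it,
// so the compiler is free to keep x in a register the whole time.
int sum_by_val(int x) {
    global = 1;
    int a = x;
    global = 2;
    int b = x;
    return a + b; // sum_by_val(global) returns twice global's old value
}
```

Since the two functions can return different results for the same argument, the optimizer really does have to treat them differently.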
Unfortunately "the compiler is required to supply the actual parameter as if it was copied" is leaky with respect to the ABI and linker. In C and C++ you cannot fully abstract it.
I usually use ChatGPT for such microbenchmarks (of course I design the benchmark myself and use the LLM only as a dumb code generator, so I don't need to remember how to measure time with nanosecond precision; I still have to add workarounds to prevent the compiler from over-optimizing the code). It's amazing that when you get curious (for example, what is the fastest way to find an int in a small sorted array: linear search, binary search, or a branchless full scan?) you can get the answer in a couple of minutes instead of spending 20-30 minutes writing the code manually.
By the way, the fastest was a branchless linear scan, up to 32-64 elements, as far as I remember.
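Presumably something like this (my sketch, not the commenter's code): count the elements smaller than the key; with no early exit there is no data-dependent branch to mispredict.

```cpp
#include <cstddef>

// Branchless "full scan" lower bound over a small sorted array: the comparison
// result (0 or 1) is added to the index instead of being branched on. Returns
// the index of the first element >= key, like std::lower_bound.
size_t lower_bound_branchless(const int* a, size_t n, int key) {
    size_t idx = 0;
    for (size_t i = 0; i < n; ++i)
        idx += (size_t)(a[i] < key);  // compare + add; the only branch left is
                                      // the perfectly predictable loop counter
    return idx;
}
```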
In C++, I’ve noticed that ChatGPT is fixated on unordered_maps. No matter the situation, when I ask what container would be wise to use, it’s always unordered_map. Even when you tell it the container will have at most a few hundred elements (a size at which you can iterate through a vector to find what you are looking for before the unordered_map has even had its morning coffee), it pushes the map. With enough prodding, it will eventually concede that a vector pretty much beats everything for small .size()s.
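For illustration (hypothetical types, not a real benchmark): at a few hundred elements, a linear scan over contiguous memory avoids hashing and bucket pointer-chasing entirely.

```cpp
#include <algorithm>
#include <string>
#include <unordered_map>
#include <vector>

struct Entry { int key; std::string value; };

// Flat vector: one contiguous, cache- and prefetch-friendly pass.
const std::string* lookup_vec(const std::vector<Entry>& v, int key) {
    auto it = std::find_if(v.begin(), v.end(),
                           [key](const Entry& e) { return e.key == key; });
    return it == v.end() ? nullptr : &it->value;
}

// unordered_map: hash the key, then chase pointers through the bucket.
const std::string* lookup_map(const std::unordered_map<int, std::string>& m,
                              int key) {
    auto it = m.find(key);
    return it == m.end() ? nullptr : &it->second;
}
```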
I agree with ChatGPT here.