The author seems a bit too excited about the discovery that the dot product of the vectors [a, b] and [1, 1] is a + b. I don't think the problem with getting neural nets to do arithmetic is that they literally can't add two coefficients of a vector; it's that the inputs and outputs are in a different modality (e.g. digit sequences) and you want a generic architecture that can also do other tasks (e.g. text prediction in general). If you knew in advance that you just needed to calculate a + b, you could skip the neural network altogether.
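To make that concrete, here's a minimal sketch (plain NumPy; the function name is mine): the entire "adder network" is a fixed dot product with hand-set weights, no training involved.

    import numpy as np

    # The whole "network": y = W . x with hand-chosen weights W = [1, 1].
    W = np.array([1.0, 1.0])

    def add(a, b):
        # Dot product of [a, b] with [1, 1] is exactly a + b.
        return W @ np.array([a, b])

    print(add(3.0, 4.0))  # 7.0, exact to floating-point precision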
I'm going to guess the main takeaway is that the weights can be trained reliably if your transfer functions are sufficiently "stiff"? It's not as if you need training for the operations presented (anyone could choose the weights manually), but maybe it extends to more complex mathematical operations?
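If I've read the setup right (going by the NALU paper the post cites, which constructs each weight as W = tanh(W_hat) * sigmoid(M_hat)), the "stiffness" is that both factors saturate, so large-magnitude raw parameters pin the effective weight to exactly -1, 0, or +1. A hedged sketch of just that construction:

    import numpy as np

    def effective_weight(w_hat, m_hat):
        # NALU-style weight construction: W = tanh(W_hat) * sigmoid(M_hat).
        # tanh saturates to +/-1 and sigmoid to 0/1, so pushing the raw
        # parameters far from zero pins W to the discrete set {-1, 0, +1}.
        return np.tanh(w_hat) / (1.0 + np.exp(-m_hat))

    print(effective_weight(0.5, 0.5))     # ~0.29, a mushy mid-range value
    print(effective_weight(20.0, 20.0))   # ~ +1.0
    print(effective_weight(-20.0, 20.0))  # ~ -1.0
    print(effective_weight(0.0, -20.0))   # ~  0.0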
To be honest, it does feel a bit like Claude output (which the author states they used): it reads convincingly "academic", but it seems like a drawn-out tautology. For example, it's no surprise that its precision is the same as floating point, as it's essentially carrying out the exact same operations on the CPU.
Please do correct me if I'm wrong! I've not read the cited paper on "Neural Arithmetic Logic Units", which may clear some stuff up.
The stiff-function observation is not new; it has existed in general linear solver theory for decades, if not centuries. But stiff functions do not scale the way training requires.
Would someone be able to say whether this is somehow related to encoding data as polar coordinates? At my level of knowledge it looks like it could be.
For some context: to learn more about quantum computing, I was trying to build an evolutionary-style ML algorithm to generate quantum circuits from the quantum machine primitives; the type where the fittest survive and mutate.
In terms of compute (this was a few years ago), I was limited in the number of qubits I could simulate, as there had to be many simulations.
The solution I found was to encode data into the spin of the qubit (which is an analog value), so I used polar coordinates to "encode data" (see the sketch after this comment).
The matrix values looked a lot like this, so I was wondering whether Hill Space is related. I was having to make up some things as I went along, and it would be useful to find the right area to learn more about.
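For what it's worth, encoding an analog value into a qubit's rotation angle is what the quantum-ML literature usually calls angle (or rotation) encoding, so that may be the search term you want. A minimal single-qubit sketch of my guess at what you described (NumPy only; the function name is mine):

    import numpy as np

    def angle_encode(x):
        # Map a scalar x in [0, 1] to a rotation angle and write the qubit
        # state in polar form: |psi> = cos(theta/2)|0> + sin(theta/2)|1>.
        theta = np.pi * x
        return np.array([np.cos(theta / 2.0), np.sin(theta / 2.0)])

    state = angle_encode(0.25)
    print(state, np.sum(state ** 2))  # amplitudes; the norm stays 1.0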
You and the author both sound like AI gloop.
If I wanted to read bot slop, I would have a Reddit account.