This is from the same gentleman who (among other things) demonstrated that printf() is Turing complete and wrote a first person shooter in 13kB of Javascript.
> demonstrated that printf() is Turing complete and wrote a first person shooter in ...
Not gonna lie, I thought that sentence would end with the FPS being done in printf.
This guy wrote tic-tac-toe in a single call to printf for IOCCC 2020 competition:
That is quite literally a work of art.
It’s very fun and impressive but it’s absolutely not a single call.
Maybe a generous interpretation of the comment and a realisation that common language isn't always 100% precise would be better than pointless arguments about semantics.
There is only a single printf written in the source code.
That I can agree with!
I don't think it's an unreasonable criticism, otherwise the challenge is trivial:
``` function printg(arg) { printf(arg); } ```
That, of course, isn't what they did.
They did put it in a while loop though, which, if my programming isn't completely out of date, is capable of calling the function more than once.
The development writeup for the Doom is interesting, with many details. https://nicholas.carlini.com/writing/2019/javascript-doom-cl...
There was one month time to complete the competition. But it seems you were allowed to reuse any existing other code.
Looks like this was quite fun to work on.
(I feel a bit sad that I would never be able to get one month of free time to work on this now, due to family and job...)
Wow that’s a fascinating read, thanks for linking it!!
> a first person shooter in 13kB of Javascript
I was somewhat disappointed to realize they used WebGL for rendering the graphics.
There was an earlier version of the underlying 3d engine that used only Canvas. WebGL use is justified like this:
> Once I actually got that working doing all of the math by hand in JavaScript I decided that using WebGL would probably be worth it, and actually probably wasn't cheating all that much. WebGL exposes access to the GPU-enabled rendering engine through JavaScript. While it does abstract away some of the rendering, it's less than I thought---it just supports the ability to do the necessary math efficiently---so I decided this wouldn't be cheating. And fortunately, it didn't take long to reproduce the initial renderer, but this time supporting much better (and more efficient) graphics.
From: https://nicholas.carlini.com/writing/2019/3d-renderer-javasc...
That makes sense: As far as I understand, OpenGL 2.0 and beyond don’t really provide much fixed-function/predefined logic anymore, e.g. I believe you need to provide your own vertex and pixel shaders.
You could argue that rasterization itself is being taken care by the implementation, though.
Why is this disappointing?
This point was where this changed from crazy/fun to absolutely extraordinary, where calculations of multiple possible positions all occurred in parallel, running a regex over an increasing series of state & variable sets, aka threads:
> And now for my absolute favorite part of the language we've developed. By the magic of regular expressions (and the fact that they perform substitution globally over the entire string), we can run multiple threads simultaneously!
Also:
> What do you want out of a conclusion to a blog post like this? I don't really have much to conclude. I guess I'll just say that I think more people should do entirely pointless things like this. It's a lot of fun, no one cares how long it takes you to finish, no one cares if it works or not, and incidentally, it teaches you more than you wanted to know about dozens of areas of computer science outside your field.
What a wonderful ethos.
This makes me wonder whether I could achieve such a thing if I removed all my preoccupations of other stuff.
For me what I take out of it is the power to sit down, focus your mind on something then who knows the lengths of what is possible. That, and the author is clearly very talented/skilled and creative.
Apparently it takes more than skill but also persistence and concentration. Not many schools of thought explain this well .
No, it takes (1) knowing what you learn in a compilers class (or upper level math classes) and (2) putting it to work. He didn't write 80,000 regular expressions, he wrote a compiler that wrote those expressions. Commercial-quality compilers are hard to write but simple compilers are straightforward if you know the fundamentals.
It's like the problem of solving the Rubik's cube. If you look at it in terms of geometry and spatial intuition it's intractable. If you treat it as an abstract algebra problem and break it down into "solve the top of the cube, solve the middle row of the cube, solve the bottom of the cube" and then develop a set of operators that will let you permute certain parts of the cube it takes a moderate amount of skill, persistence and concentration.
CS students do exercises such as writing moderately complex programs for a Turing machine and it's an exercise like what he did.
====
Funny my project last month was a chess engine. For a long time I'd wondered if I could make a decent MCTS chess engine but someone close to me has been getting serious about chess (usually wins against the random person, usually loses at the chess club, is fighting hard to get up the bracket) so writing a program that was a match for him seemed like a fun and meaningful project and I decided to try the conventional route of alpha-beta search.
If you tried to write a chess engine from zero you'd probably struggle, but the lifetime value you get out of education (CS or otherwise) is looking things up the literature, learning from other people's experiences, and putting it to work. So this has been my guide
https://www.chessprogramming.org/Main_Page
I started out with Python with the goal of making something tiny and stylish and started out with a move generator from
https://python-chess.readthedocs.io/en/latest/
and once I got the signs right in alpha-beta negamax (when I had them wrong it discovered https://en.wikipedia.org/wiki/Fool%27s_mate) it beat my tester several times before he regrouped, traded a knight for a good pawn structure, and got his first win.
The next goal is to take it to the chess club which means it has to respect time control. I switched to Java because its faster and because it is easy to have a comms thread interrupt a think thread. The first try wasn't faster because my move ordering was terrible. I was much faster implementing it though because I could cut and paste the test cases I had in Python and if I knew anything at this point it was the signs in negamax.
https://en.wikipedia.org/wiki/Iterative_deepening_depth-firs... is not only good for time control but gets better move ordering and I'm in the middle of implementing that. It would take me years to find out that Iterative Deepening works a lot better than you might expect, the Killer Heuristic, etc. Reading is my superpower.
Okay, as you wish.
Though I've never seen people accomplish stuff like a) understand regex; and b) write a compiler for it without "persistence and concentration.". Perhaps some get it delivered through divine apparition of some kind.
What is it that really triggered your lengthy response and the narrative..., I really fail to understand, sorry.
Agree. The whole point of the article is to generate the long list of regex programmatically.
A quick way to verify this is to download the repo, remove everything other than main.py and regex-chess.json, and the programme will still work.
All the other python files are building up to regex-chess.json, see e.g. the imports and output to write_regex_json.py.
There's a bug somewhere it seems like, as it ends the following game with "Illegal move, you lose", even though it's not an illegal move:
1. e2e4, e7e5 2. d2d4, e5d4 3. d1d4, a7a5 4. g1f3, b7b5 5. b1c3, a5a4 6. c3b5, a4a3 7. b5a3, a8a3 8. b2a3 --> Illegal Move You Lose. Game over.
FEN of game above: 1nbqkbnr/2pp1ppp/8/8/3QP3/P4N2/P1P2PPP/R1B1KB1R b KQk - 0 8
a2a3 on the first move or on the second move after e2e4 outputs that you have played an illegal move.
But this is a bug as these are legal moves.
Simply using a2a4 as the first move does that too.
ah interesting, a2a3 does the same. May be a bug with moves on the a file by white
I fear not the man who plays chess with 84,688 regular expressions, but I fear the man who plays chess with one regular expression.
If there was a general heuristic to make sequentially-applied regexes linear (combine any 2 into 1), it could be applied to this
help, I'm being nerd-sniped
right away I see backreferences as a potential problem
it would be a VERY long regex, but you're basically encoding a chess engine, so...
"help, I'm being nerd-sniped"
Just breathe
Close your eyes
Picture an algebraic representation of the higher order function that represents your real life utility and minimizes the use of resources including your cognitive energy.
Focus on optimizing that function.
Ohmmmmm
I fear people in general because I have social anxiety
When I see this sort of thing I just I want to take my hat off and stand in solemn appreciation for the true heroes among men.
The bug with a-file moves has been fixed: https://github.com/carlini/regex-chess/issues/1.
Previously: Chess written in sed https://news.ycombinator.com/item?id=6261314
Of course, the sed version does make use of control flow commands in sed and only probes 1ply (I think) so this version is significantly different in that regard.
This is not only a chess engine, it's a computer and an assembly language, built in only using Regexp
I normally don’t lose so quickly opening with a2a4 !
oh, cool
I encountered similar bug in the a-column via e2e4-d2d4-d1d4-g1f3-f1b5-d4e5-e5c5-e1g1-b2a3
Google en passant
Holy hell!
Compare also https://codegolf.stackexchange.com/q/3503/32575
That shouldn't be really surprising, as all divisibility rules are necessarily regular because anything more complex wouldn't be human-executable "rules".
Humans are able to check whether a string of parens, like ()(()()), is matched but finite state machines can't.
In any case, if you know how the regex is constructed, it's not surprising. But I found it fun to actually do the construction, instead of just being theoretically aware of the possibility.
Is this balanced or not: ((((((((((((((((((((())))))))))))))))))))? Humans can check only so many parentheses before losing track of them, so it is still an FSM in my opinion.
Also, those regexes are directly translated from the equivalent and much smaller FSM. Regexes are necessarily complex only because they have Kleene stars and nothing else; it's like representing every Boolean circuits with NAND, which is of course possible and a little fun fact but the process itself isn't exactly fun to me.
Humans can't see the balance at a glance, but we can still easily check the balance of arbitrarily complex nested parenthesis because we are not limited in the same way an FSM is. We're just way way way slower than a computer.
Yeah, I agree that humans can indeed check some non-regular languages. That doesn't however mean that humans are inherently capable for checking all non-regular languages, as they are severely limited in the working memory size. Most if not all divisibility rules are a set of least significant digits or weighted running sums because they are subject to the same constraint, so they are indeed necessarily regular.
It all depends on what assumptions you are making. And what model you want to use. (In some sense, as far as we can tell, the entirely visible universe can only contain a finite amount of information. But it still makes more sense most of the time to talk about real world computers as if they implement something like eg a RAM machine or lambda calculus.)
Regular languages admit words of arbitrary length. Eg a regular language can tell you whether 461523850177206879302813461556288524354486376376930935555512181511680646984669431923718933249775297346246192616509695718413981019441670321942082230577379960485875768935619924872490629853314107285524330300421382702242540015152210668552218484465230532702298574921915359545891160565971424053668201732275877291369 is divisible by 7. But a human would have a harder time with this than with the paren example given by the grandfather comment.
I'll happily admit that there is (to me) no reason to believe that humans can do things that a Turing machine could not, or that we are magically exempt from stuff like the halting theorem or have special insights in NP-complete problems. I am only arguing that we are unarguably more powerful than FSMs, and with some pen and paper (or perhaps an endless string of tape...) we are not as limited by our working memory size. But we are very slow.
These sorts of power discussions require a certain amount of grace, unless you're really just super, super excited to read several paragraphs of intensely mathematical caveats, not just from the first poster, but from every poster in such a thread. There is a reasonable sense in which we can say with a straight face that humans are more powerful than finite state machines, even though in principle a human's entire life, all possible choices and all possible outcomes, could be encoded into some hypothetical state machine. If you want the really good philosophical bones, consult https://www.scottaaronson.com/papers/philos.pdf . Such a hypothetical state machine would require an exponential amount of power to construct (and given the sizes in question, we need not even specify what we mean by "power" since it's exponential in energy, exponential in mass, exponential in "math elements", it'll catch up to you no matter what you're tracking in), whereas we can say with a straight face that a human can increment and decrement a parenthesis count fairly trivially to a high degree (even if there may be error) whereas a finite state machine must give up at some point at some finite depth, and moreover, as you add depth to the state machine it rapidly expands in size whereas the parentheses counter is only tracking an int, which is growing at most log on the number of parenthesis count and thus often reasonably treated as constant for any reasonable problem size.
A parenthesis counter that's limited to, say, 1024 bits for its count, is still a finite state machine and can be implemented in those 1024 bits (plus a few extras for book keeping, I guess).
You don't need to have every state in an FSM be explicitly constructed up-front.
The pdf you mention is great, btw.
The idea of doing something without a defined "productive" goal might help to do things differently, discover new ways, and in the end stumble on an innovation?
Tried it, 2 comments:
1) Engine gives a piece early with: 1. b3 b6 2. Bb2 Bb7 3. e4 ??Bxg2
2) when you enter an uppercase letter it says illegal move
Edit: and here I am trying to "stumble" on an innovation and be productive again. Erh... Humans
Well, winning with checkmate in just a few moves is also possible (d4 d5 - c4 dxc4 - e4 d8xd4 and after d1xd4, d4c4, c4xc7 and c7xc8 won by checkmate).
I guess playing 'good' chess is not the point, the point is that you can play at all using regexp. (The 'move a2a3 and lose as not considered legal' is more serious then it not actually playing well).
One good way some people learned programming is by building replacements for Python builtin/standard modules functions in Python
As it's already been reported in this comment section, at a seemingly random stage of the game when it's the player to move, an error caused by an illegal move is triggered. Happened to me twice already in the opening. Once for a King's knight move to f3 and once for a simple pawn move as follows: 1.Nc3(b1c3) Nc6(b8c6) 2.g3(g2g3) g6(g7g6) 3.Bg2(f1g2) Bg7(f8g7) 4.e3(e2e3) Bxc3(g7c3) 5.bxc3(b2c3) a6(a7a6) 6.c4(c3c4) a5(a6a5) 7.Ne2(g1e2) a4(a5a4) 5.a3(a2a3) saying a2a3 is illegal.
Creative approach. Very impressive if it'd work, but you cannot finish the game, because it doesn't recognize the moves properly. The program understanding and strength is poor, but it was to be expected and it's beyond the point of this intellectual exercise, I guess.
This is truly impressive, I'm in complete awe.
I do think there are some bugs based on playing a game against it. It has a tendency to give up its queen and other pieces. and it blundered mate in 1 at the end of the game when it had moves that led to mate in 2 or 3.
Usually even a 2-ply engine should avoid these mistakes unless the evaluation function is completely silly, which may be the case here, I don't know. I tried looking at the code but it didn't make much sense to me, I'm not smart enough to understand this regex "runtime" based code. Could also be a bug of using < instead of > somewhere, or vice versa, making it choose the worst move instead of the best one.
My tester could trounce a 2-ply minimax engine easily. At 6 plies with the alpha-beta optimization it beat my tester for the first time (beats the average person, gets wrecked at the chess club but is trying to change that) which frustrated him greatly but after he spent a day thinking about strategy he prevailed. (Without alpha-beta the 6 ply search would have been completely unreasonable)
I got the signs wrong and it managed to fool's mate itself!
I struggled with testing for a while because when it makes bad moves you don't know if you correctly coded a bad chess engine or incorrectly coded a bad chess engine. Eventually I started using chess puzzles
https://www.chessprogramming.org/Test-Positions
which are not unit tests because they take seconds to run, but unlike a real game where there is no right move, there really is a right solution. BK.01 from
https://www.chessprogramming.org/Bratko-Kopec_Test
is a particularly nice one because it runs quickly!
Yeah, a 2-ply engine is pretty terrible at chess. Especially with no quiescence search.
I know what you're describing well, I've dabbled quite a bit in chess engine dev myself, and I'm planning to get back into it soon; I've got some interesting new ideas recently I wanna try out(once they're fleshed out enough to actually be implemented, right now they're just fanciful ideas I'm kicking around my head).
Testing is a bitch though, for sure. I know that stockfish is constantly being playtasted against itself, with a new instance spawned for every pull request etc, and then given an elo rating. That way they can tell if a potential change makes it weaker or stronger.
Debugging isn't easy either. Forget about stepping over code in the debugger. You have no idea whether the bug is only triggered after billions of nodes. That's a lot of stack frames to step through. And forget about debug prints too, for the most part, because putting an unconditional debug print in your search() , qsearch() or eval() will quickly lead to gigabytes and gigabytes of output...
Only helpful thing I found was to use asserts. Find invariants, and in your debug version check them every node, die if they don't hold and barf out your stack frame or a core dump. If you're lucky the bug is somewhere near where the assert failed in the call tree. Even that isn't guaranteed though.
How long does one single move take to calculate for you folks on a normal phone? On mine (running an i.MX 8M quad-core on Firefox), the first move took several minutes.
less than 5 seconds to respond to d2d4 on apple a18 pro
It does seem to play worse than it should, by game went: 1. d4 d5 2. c4 dxc4 3. e4 Qxd4 4. Qxd4 5. Bc4 6. Nf3 7. O-O 8. Rd1 9. Qd8#
Oh no dude, it's worse than that: c4, Qa4, Qxa5, Qc7, Qxc8# lol
Nice find. I tried to get an even shorter mate, but its tendency to give up its pieces got in the way: 1. e4 e5 2. Bc4 Bc5 3. Qh5 Bxf2+ 4. Ke2 Bxg1 5. Qxf7#
Can anyone get a mate in 4?
1. e4 e5 2. Nf3 Nf6 3. d3 Ne4 4. dxe4 a5 5. Ne5 b5 6. Qf3 a4 7. Qf7#
Still, impressive for "a pile of regexen".
Oh, that was easier than expected: 1. e4 e5 2. Qh5 a6 3. Bc4 a5 4. Qxf7# I assume a mate in 3 (or a Fool's mate in 2) is not possible...
I love the author's philosophy of doing 'entirely pointless things' for the joy of learning. This project reminds me why I got into programming in the first place - not just to solve practical problems, but to explore the weird and wonderful possibilities of code. The fact that this works at all is just mind-blowing!
That's some impressive code wizardry! I thought the 2-ply search would make it respond to a mate-in-1 threat, but the following game demonstrates otherwise:
1. e4 e5 2. Nf3 Nf6 3. Nxe5 Nxe4 4. Qe2 Nxd2 5. Nc6+ Ne4 6. Qxe4+ Qe7 7. Nxe7 Bxe7 8. Nc3 a5 9. Nd5 a4 10. Qxe7#
9. .., Nc6/O-O/Kf8 would have avoided mate in 1. Maybe this is related to the a2-a4 bug noticed by others?!
Apparently it's more of a 1.75ply search according to the author :)
Try: c4, Qa4, Qxa5, Qc7, Qxc8# lol
Kudoa for this, but it feels like there should be a more direct way? I mean, he first invented basically a general-purpose execution platform. That in itself is cool, but the fact that it then can execute a chess program is not actually that surprising.
What about directly encoding the rules of the game plus some basic strategy?
It's not a general-purpose execution program. It only executes bounded loops, not free loops.
In chess the word "strategy" is used for something different than "tactics". My tester can decide to sacrifice a knight to get pawns the way he wants (strategy), my chess program on the other hand is better at tactics (looking ahead a few moves and setting up a fork https://en.wikipedia.org/wiki/Fork_(chess)
Lasker's famous quote is "better a bad plan than no plan at all" but chess engines play superhuman chess with superior tactics and no strategy. There's nothing like the "basic strategy" in blackjack, rather you can make a very strong chess program by the exhaustive search he's using, but you have to optimize it a lot.
This is totally what jwz needed for xmas...
This is fantastic, but of course strictly speaking the use of back references means these aren't true regular language expressions but the "enhanced" kind that give security headaches for all the big tech firms.
a2a3 gives me illegal move game over. What am I missing here?
Somewhat surprisingly (given the creator's regex chops) capital letters are not accepted. My guess is that your autocorrect changed the move to A2a3.
Bad news if you're almost done with a game and enter a move with the wrong case.
I'm on PC (firefox) and can reproduce it, no capital letters in my input
Hm. I don't get any errors unless I use a capital, and I get an error every time I do.
Amazing!
And just in time when the novelty of playing chess in Postscript (https://seriot.ch/projects/pschess.html) has worn off :)
I'd be really helpful if you added some faded square outlines in the background. I think maybe I made an illegal move with a bishop because of misreading the diagonal.
assign_pop() is implemented a bit oddly. In particular, the second regex starts off with
This should have just been written like the following (which in fact eq() does do, though eq() is itself missing the %%` -> %% regex):(%%)([^`]\n?#stack:\n)
As it is the [^`] matches the newline, which is why the \n has to be marked as optional and in practice will always be skipped.(%%)(\n#stack:\n)
It doesn't seem to know how to handle pawn promotion. It told me it was an illegal move and that I'd lost the game.
It has an interesting response to the queens gambit accepted. Immediate queen sac
d2d4 d7d5 c2c4 d5d4 e2e4 d8d4 !? d1d4
I read 'cheese engine' and excitedly clicked.
I think my priorities are out of line
Tried the queens gambit and the computer hung the queen on move three :)
“Now comes the clever part.”
God bless our soldiers who see that regex is turing complete and choose to implement fun programs. Yall are truly a different breed :)
Regex isn't (necessarily) turing complete :)
> Because our program just consists of a sequence of regular expressions, you can't loop at all! That, technically, means we can't actually perform Turing Complete But we can do any bounded computation by just unrolling any loops we may have.
Although some (most?) implementations may be. Though by the above quote, the author didn't make use of that.
That's the point, I think: for a large number of real-world algorithms, you don't actually need a Turing Machine. There was a very well-written explanation of this on the front page[1] some time ago, concluded with:
> Any algorithm that can be implemented by a Turing Machine such that its runtime is bounded by some primitive recursive function of input can also be implemented by a primitive recursive function!
Also, "The Little Typer" book explores a language based on "primitive recursive functions" and shows what can be done in it and how.
[1] https://matklad.github.io/2024/08/01/primitive-recursive-fun...
I appreciate the resources and recommendations. I've been interesting in Rocq (formerly Coq) recently, and I've seen dependent types mentioned, so I've been curious to learn more.
On that note, I discovered Dafny[1] recently, as a more accessible way to program with proofs. There's a companion book[2] that seems very accessible and down-to-earth (and, unfortunately, quite expensive). I didn't have the time to play with it yet, but it looks like it does what Ada/SPARK does (and more), but with less verbose syntax and more options for compilation targets. It seems to be actively developed, too. Personally, I had a very hard time getting into Coq, which is a proof assistant more than a programming environment - Dafny seems much more welcoming for a "working programmer" :)
I appreciate even more ideas to work with. A more "working" proof language sounds interesting. While I agree that Rocq is decidably not for a "working programmer," I've had a lot of fun working through the book "Software Foundations". Last night, I was able to formally prove the pumping lemma for regular languages, and that was very satisfying and enjoyable. Another reason I'm learning Rocq is due to my (largely uninformed) interest in homotopy type theory.
Sure, but it really should be done using PEG.
</Joke>
This is seriously brilliant.
anyone that can read and understand regex i have concerns for
That's awesome!
I love this.
Playing chess with strings to build datasets for text generation.
I want to share this quick win.
The other day I was asking myself some theoretical chess questions, and wanted to answer them programmatically and needed to build some custom chess datasets for that.
I needed the chess basic routines, like getting the next legal moves, displaying the board, and some rudimentary position scores. I contemplated writing from scratch. I contemplated using some library. But instead I settled for a higher level choice : interfacing with Stockfish game engine over a text interface.
There is something called UCI, which stands for Universal Chess Interface, ( https://official-stockfish.github.io/docs/stockfish-wiki/UCI... ), to use it you start a new stockfish process and write and read from the standard inputs.
So instead of writing bug prone routines to check the validity of board positions, it turn the basic routines into a simple wrapper of parsing task to read and write UCI protocol to use a battle tested engine.
A chess position state is simply defined as a vector<string> representing the sequence of moves. Moves are string in long algebraic notation.
This architectural decision allows for very quick (LLM-powered development) prototyping.
namespace bp = boost::process; bp::ipstream is; bp::opstream os;
bp::child c("../Stockfish/src/stockfish", bp::std_in < os, bp::std_out > is);
void displayBoard( const vector<string> & moveSeq, bp::ipstream& is, bp::opstream& os );
void getLegalMoves( const vector<string> & moveSeq, vector<string>& legalMoves, bp::ipstream& is, bp::opstream& os );
void getTopKMoveAndScoreAtDepthFromPosition(const vector<string> & moveSeq,int K, int D, vector<pair<string,int> >& topkmoves, bp::ipstream& is, bp::opstream& os , bool debug = false);
void displayBoard( const vector<string> & moveSeq, bp::ipstream& is, bp::opstream& os ) {
os << "position startpos moves";
for( int i = 0 ; i < moveSeq.size() ; i++)
{
}os << " " << moveSeq[i];
os << endl;
os << "d" << endl;
os << "isready" << endl;
string line;
while (getline(is, line)) { if (!line.compare(0, 7, "readyok")) break; cout << line << endl; }
}
You get the gist...
c4, Qa4, Qxa5, Qc7, Qxc8# lol
I wonder if this is inspired by LLMs which similarly do tons of repetitive processing most of which is unrelated to the final answer.
No, because the repetitive processing here is critical to the final result.