I feel like LuaJIT probably deserves a mention here. It doesn't always get astounding performance, but it very often does, and you get all the inlining and dynamic specialization goodness that tracing JITs make cheap. And it demonstrates extremely convincingly that PyPy's difficulty with invoking native code isn't essential to tracing JITs.
PyPy's difficulty with invoking native code is not because of its tracing JIT but because the CPython C API makes it very hard to deviate from _any_ CPython implementation details, including e.g. choice of memory management and precise layout of objects in memory. I wrote extensively about it here: https://pypy.org/posts/2018/09/inside-cpyext-why-emulating-c...
HPy proves that it is indeed possible to have high-performance C extensions with PyPy: https://pypy.org/posts/2019/12/hpy-kick-off-sprint-report-18...
Thank you! I didn't mean to imply that it was because of the tracing JIT; rather, I meant to say explicitly that it was not because of the tracing JIT.
But isn't Lua very significantly simpler than Python? No MRO lookups, no descriptors, no __getitem__ and the like, no other operator overloading. It looks like turning Lua code into efficient native code should be a lot more doable, in more cases.
Lua does have __getitem__ (it's called __index) and operator overloading. Popular libraries like LPeg use operator overloading extensively. You can use __index to implement whatever method resolution order (MRO) and descriptors you want (because Lua conflates __getitem__ and __getattr__). Moreover, in Lua, you can even do things like change the metatable of _ENV or _G. So, while Lua is indeed very significantly simpler than Python, it's not clear that the simplicity entitles a compiler to make many more assumptions about the meanings of constructs in your Lua code. The technique that allows a JIT to work well for either language is to hoist most of the relevant guards (is this division operand a number rather than an LPeg pattern?) out of the native-code-compiled high-performance loop, bailing out to a slow path if they fail.
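To make the guard-hoisting idea concrete, here is a minimal Python sketch. It is a conceptual illustration only, not what LuaJIT or PyPy actually emit, and it gains no speed in plain CPython. The `Pattern` class and both function names are hypothetical; `Pattern` stands in for any object that overloads division, the way an LPeg pattern does.

```python
from array import array


class Pattern:
    """Hypothetical stand-in for an object that overloads division."""

    def __init__(self, name):
        self.name = name

    def __truediv__(self, other):
        return Pattern(f"({self.name} / {other})")


def divide_all_generic(values, divisor):
    """Slow path: every iteration dispatches dynamically on the operand types."""
    return [v / divisor for v in values]


def divide_all(values, divisor):
    """Check loop-invariant guards once, then run a loop that assumes floats."""
    # Guards hoisted out of the loop. A typed array of doubles guarantees that
    # every element is a float, so no per-element type check is needed inside.
    if type(values) is array and values.typecode == "d" and type(divisor) is float:
        return array("d", (v / divisor for v in values))
    # A guard failed: bail out to the generic slow path.
    return divide_all_generic(values, divisor)
```

Calling `divide_all(array("d", [2.0, 4.0]), 2.0)` takes the specialized path, while `divide_all([Pattern("a")], 2)` fails the guards and falls back to ordinary dynamic dispatch.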
PyPy is trying to solve a harder problem than LuaJIT in another way, though: it's intended as a framework for writing a tracing JIT for your own language, not just a JIT for a single language. I've seen very promising prototypes using PyPy's infrastructure for this, but I'm not sure if any other PyPy-implemented language is really ready for general use.
IIRC, LuaJIT also doesn't like native code. Most native calls will cause LuaJIT to fall back to the unoptimized interpreter, with the exception of some special-cased functions from the standard library.
https://chrisfls.github.io/luajit-wiki/NYI/
LuaJIT prefers extension modules written in Lua with LuaJIT's FFI library, instead of those written in C using the traditional Lua–C API.
While it's true that if you call Lua extensions written with the Lua API from LuaJIT it will be slow, that is much less of a problem in practice in LuaJIT. LuaJIT's FFI is extremely fast, and that's what you usually use. (This is an option in PyPy, but the FFI is less fantastic.)
... If you're willing to write your own extension modules. The problem with PyPy is that there is a multitude of important libraries using the CPython API.
And a significant chunk of these important extension libraries are supported in PyPy using the emulated C-extension API (cpyext).
Right, the performance cost of cpyext is what we're contrasting with ctypes-like approaches like LuaJIT's FFI in this thread. In https://news.ycombinator.com/item?id=42656395 Antonio Cuni linked the standard explanation of why cpyext is so slow and also HPy, which I'm embarrassed to say I didn't know about.
Yes, and those libraries mostly don't exist for Lua. It's a big reason to use Python instead of Lua, and to use CPython rather than much better implementations like PyPy, but not much of a reason to use PUC Lua instead of LuaJIT.
On the other hand, there are also a multitude of important libraries using the C ABI, and, as you said, you can call those C libraries pretty easily with the LuaJIT FFI, without "writing extension modules". This is a big reason to use Lua instead of Python, as long as you can use LuaJIT.
Here's an example of the activity you're describing as "writing an extension module". Let's imagine that we have a garbage file we want to delete, and for some reason we're trapped in Lua, so we have to "write an extension module" to invoke unlink() from libc and call it:

    $ touch garbagefile
    $ luajit
    LuaJIT 2.1.0-beta3 -- Copyright (C) 2005-2022 Mike Pall. https://luajit.org/
    JIT: ON SSE3 SSE4.1 BMI2 fold cse dce fwd dse narrow loop abc sink fuse
    > ffi = require 'ffi'
    > ffi.cdef 'int unlink(const char *pathname);'
    > libc = ffi.load '/lib/x86_64-linux-gnu/libc.so.6'
    > =libc.unlink
    cdata<int ()>: 0x7ff25ed39a00
    > libc.unlink 'garbagefile'
    >
    $ ls -l garbagefile
    ls: cannot access 'garbagefile': No such file or directory

That took literally three lines of code and less than two minutes. You can call that "writing an extension module" if you want, but I think that phrasing is really misleading; the impression it gives of what we're talking about is pretty far from the truth. It's like when I wired two RJ-45 jacks together crossing over the appropriate pairs for a 10BaseT null modem and said I'd built a "low-power full-duplex Ethernet switch".
This works for any library, not just libc. Let's see what version of libcdparanoia I think I have installed:

    > ffi.cdef 'extern char *cdda_version();'
    > cdda = ffi.load '/usr/lib/x86_64-linux-gnu/libcdda_interface.so.0'
    > =ffi.string(cdda.cdda_version())
    10.2

As a more extended example, take a look at https://gitlab.com/kragen/bubbleos/-/blob/master/yeso/yeso.l..., a binding I wrote for a C library I'd written without giving any thought to Lua. Basically I copied and pasted the relevant sections from my .h file into the Lua code and added a few lines of Lua to load the relevant shared library:

    local yeso = ffi.load(sodir .. lib)

And then the C functions defined in the .so and declared to the LuaJIT FFI were directly callable as properties of that `yeso` table, like `yeso.yw_wait`, `yeso.yw_close`, etc. There's another couple of pages in that .lua file but it's just a simple, convenient OO façade over the procedural-style C interface. Plus defining some constants from the .h file.
Can't you do the same thing in Python with `ctypes`? Well, kind of. I mean, I did! But it's a huge pain in the ass, and the result is still worse. Contrast https://gitlab.com/kragen/bubbleos/-/blob/master/yeso/yeso.p..., which provides a more limited binding to the same API in the same way. For example, here's the definition of `ypic` from yeso.h:

    typedef struct {
      ypix *p;
      yp_p2 size;
      int stride;
    } ypic;

And here's the definition of `ypic` in yeso.lua:

    typedef struct {
      ypix *p;
      yp_p2 size;
      int stride;
    } ypic;

I literally just copied and pasted the C. LuaJIT's C parser parses this at runtime. (Then at https://gitlab.com/kragen/bubbleos/-/blob/master/yeso/yeso.l... I added some methods to it, which is something you can't do with `ctypes`; you have to make a separate wrapper class. But in a sense those are just syntactic sugar.)
Now, here's the definition of `ypic` in yeso.py:

    class ypic(Structure):
        _fields_ = [
            ('p', POINTER(ypix)),
            ('size', yp_p2),
            ('stride', c_int),
        ]

It's a lot more work for much less return. It's not just that it's more verbose; there are also many more opportunities to screw up the types in a subtle way, and then instead of an exception traceback you get a core dump to debug with GDB. It's still better than using CPython's shitty PyObject API, but it's not in the same league as LuaJIT.
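For comparison with the LuaJIT `unlink` session above, here is a rough `ctypes` sketch of the same call. It assumes a POSIX system where `ctypes.CDLL(None)` exposes libc's symbols (Linux or macOS); the `unlink` wrapper function and the temporary file are only for illustration. Note that the prototype is spelled out by hand in Python terms rather than parsed from the C declaration.

```python
import ctypes
import os
import tempfile

# Load the symbols already linked into this process, which includes libc
# (assumption: a POSIX system such as Linux or macOS).
libc = ctypes.CDLL(None, use_errno=True)

# The prototype is declared manually. If argtypes/restype are omitted, ctypes
# guesses at the call site and assumes an int return value.
libc.unlink.argtypes = [ctypes.c_char_p]
libc.unlink.restype = ctypes.c_int


def unlink(path):
    """Call libc's unlink() and raise OSError if it reports failure."""
    if libc.unlink(os.fsencode(path)) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err), path)


# Usage: create a throwaway file, then delete it through libc.
fd, garbagefile = tempfile.mkstemp()
os.close(fd)
unlink(garbagefile)
```

For a plain `int f(const char *)` the declaration is short, but every struct, pointer and callback type needs the same manual translation, which is where the subtle mistakes creep in.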
I don't want to come off as too positive on Lua here; I think that as a language it has several fatal flaws. (I wrote in more detail on this two weeks ago at https://news.ycombinator.com/item?id=42519070.) But being able to invoke native code is actually one of its strong points.
The equivalent of this (and strongly inspired by LuaJIT's FFI) in the Python world is cffi, btw: https://cffi.readthedocs.io/en/stable/
Oh, thanks! That's a second thing in this thread I'm embarrassed to have not known already. Does it get native-like performance in PyPy the way LuaJIT's FFI does? I'll have to try it with Yeso to see if it's an improvement.
It should get pretty good performance, yes. Not sure how native-like we get with the JIT. Gut feeling would be a bit slower than gcc -O0? I would be very interested in your experience if you do try it.
I was wondering where LuaJIT would come up, too.
Nice article. I'd also like to know the PyPy developers' thoughts on the copy-and-patch approach chosen to implement the new JIT under development for CPython.