Memstop: Use LD_PRELOAD to delay process execution when low on memory (github.com)

61 points by ingve 7 days ago | 52 comments

AnotherGoodName 4 days ago
Something similar works on Android.
If you have an app that absolutely needs a lot of memory (transcoding HD videos in my case), do this before running it:
```
  - Spawn a new process and catch the unix signals for that process, specifically SIGINT. 
  - Have that new process allocate all the memory you need in your actual process. 
  - The OS will kill it via its Low Memory Killer (LMK) daemon. 
  - Since you caught the SIGINT in the parent process you can avoid the Android "Application X is using too much memory" notification that pops up to the user, making it a silent failure. 
  - Lastly, sleep a few ms to let Android's LMK kill other apps.
```
Now run your memory-intensive process. This works for games too. It's a huge hack, but it's needed for some of the lower-end devices that don't have GBs of RAM. It's also needed because Java apps will just eat memory until something tells them to clean up; you need some overallocation to fail in order to trigger the whole memory-freeing process on Android.
- phh 4 days ago |parent
  As someone who has had to tweak LMK many times, along with Android's way of computing which process/app has which priority and how a service is restarted, I want to scream very loudly. But I guess that works.
- Y_Y 4 days ago |parent
  Ideally you do this in a bank app, so that when you switch over to that PSD2 forced nonsense the app that made the payment request has died and loses the session and you get to start all over.
  - mschuster91 3 days ago |parent
    I've seen that shit happen on devices with >4GB of RAM. The fact that opening the bank's 2FA app can be enough to destroy the session of the app requesting payment is telling a few things:
    a) phone manufacturers play ridiculous games when it comes to hiding that their SoCs are dogshit
    b) so. many. apps. just outright suck at testing because they test their apps in emulators where there isn't Facebook and tons of other background crap running and thus never test their implementation of the "application is about to be killed off for lack of resources, save state to persistent storage" code path.
    c) Facebook plain sucks. Their iOS app [1] and their android app [2] are both ridiculously bloated and that has led to problems well over a decade ago, and it hasn't gotten better.
    d) Advertising, tracking, data mining and other spyware SDKs are just as bad as Facebook. The amount of code and RAM that each application carries just for this crap is insane, partially made worse by neither iOS nor Android supporting code deduplication and shared libraries.
    [1] https://quellish.tumblr.com/post/126712999812/how-on-earth-t...
    [2] https://news.ycombinator.com/item?id=28533119
- dekhn 3 days ago |parent
  I was going to ask if Android exposes mlock, which is a call that implements memory locking on normal UNIX and LINUX machines, but it looks like it is restricted to 64kb on Android.
- OptionOfT 4 days ago |parent
  So Android will kill your fake process.
  Even when that is done, does it keep on killing other apps? Why would it do that? Because once your fake process is gone, the memory pressure is gone.
  - AnotherGoodName 4 days ago |parent
    Yes, from what I've seen, once Android starts hitting memory issues it'll call onTrimMemory in all activities.
    https://developer.android.com/reference/android/app/Activity...
  - jayd16 4 days ago |parent
    Probably tries to kill older background apps before newer services.
- somethingsome 4 days ago |parent
  Oh please, say more, it seems very interesting.
  I didn't really get the last two bullet points
  - mmastrac 4 days ago |parent
    I think they are describing a "balloon" - you allocate enough memory to trigger a low-memory condition. Android then tries to kill off any large apps. You can then allocate large amounts of RAM in your own app after giving Android some time to shut down background apps, etc.
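    A minimal sketch of that balloon in C, under some loud assumptions: `inflate_balloon` is a made-up helper name, and it stops at a caller-supplied cap instead of allocating until the OS kills it (which is what the original description does). The balloon lives in a child process, so all of its memory is released the moment the child exits or is killed.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: inflate a "balloon" of at most cap_bytes in a child
 * process, touching one byte per page so the memory is really backed.
 * Returns 0 if the child reached the cap (or malloc failed) and exited
 * normally, the signal number if the child was killed (for example SIGKILL
 * from a low-memory killer), or -1 on error. */
int inflate_balloon(size_t cap_bytes, size_t chunk_bytes)
{
    if (chunk_bytes == 0)
        return -1;

    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        page = 4096; /* assumption: fall back to a common page size */

    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: allocate chunk by chunk until the cap or a failure. */
        for (size_t used = 0; used + chunk_bytes <= cap_bytes; used += chunk_bytes) {
            volatile char *chunk = malloc(chunk_bytes);
            if (chunk == NULL)
                break;
            for (size_t i = 0; i < chunk_bytes; i += (size_t)page)
                chunk[i] = 1; /* volatile write keeps the touch from being optimized out */
        }
        _exit(0); /* the whole balloon is freed when the child exits */
    }

    /* Parent: wait for the balloon to deflate, one way or another. */
    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

    The caller would run something like `inflate_balloon(cap, 1 << 20)`, sleep briefly to give the system time to reclaim memory, and only then start the real work. Whether that actually evicts background apps depends entirely on the platform's low-memory policy.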
    - jbreckmckye 4 days ago |parent
      This reminds me of those "memory freer" apps they used to distribute for Windows XP. Did they actually do anything useful? Or just force Windows to write everything to the pagefile?
      - swinglock 4 days ago |parent
        That could perhaps be useful if you had just enough memory to run a game, if not for other processes using it too. In that case background services should be written to swap while running the game which will probably cause stutters. Doing it before starting the game instead would be better, if it has to be done. So such a utility might then be useful in some cases when someone couldn't afford enough RAM.
    - sim7c00 3 days ago |parent
      Is this the term, balloon? Like a balloon driver? Some VM stuff; I suddenly thought it may serve a similar purpose before starting a VM, since those can have a lot of memory to load in. Makes sense, but this is a total guess from your use of balloon. Any idea by chance? :D
      - mmastrac 2 days ago |parent
        That's literally it. Inflate the balloon, squeeze ram, you're good to go.
  - AnotherGoodName 4 days ago |parent
    When you spawn a child process you can catch the OS level events acting on the process. These are called Signals.
    https://unix.stackexchange.com/questions/176235/fork-and-how...
    If you wish to exit gracefully in some way (eg. cleanup) you can catch SIGINT and SIGTERM in a function in the parent process. Signals propagate unless caught. This way you can avoid the parent process (imagine a UI visible to the user) dying if you have a subprocess (memory intensive video encoding) terminated by the OS.
    Signals are undertaught in CS honestly.
    - cryptonector 4 days ago |parent
      Er, no. You cannot catch in one process a signal posted to another. I think you might be confusing signals generated by the tty/pty, which go to the foreground process group, and signals generated by the kernel for OOMs (which go only to the victim process, not the whole group). What you can do is see what signal a child process died with (see the `wait*(2)` system calls), and with cgroups you can find out about OOM kills of processes in a cgroup, but you still can't "catch" the signal and hold the victim alive and its memory allocated for a bit.
      - AnotherGoodName 4 days ago |parent
        >If you don't define an interrupt handler (or explicitly ignore the signal), both processes would just exit with a SIGINT code (the default behaviour).
        That's what I mean by catching it. You either trap the signal or you have the parent process exit too. The goal is to avoid the parent dying and the OS messaging the user "process killed for too much memory" which happens without the trap.
        The trigger for memory is separately just to hit a failed malloc in a loop.
        - cryptonector 4 days ago |parent
          You're still confusing signals generated by the tty/pty (which go to the foreground process _group_) and OOM signals which _don't_ go to the victim process' process group. You cannot catch a signal that wasn't posted to the process or its process group -- you just cannot.
          What you can do is notice via the wait syscalls that the child died with `SIGKILL`, and in a sense this is "catching" that that happened, but no signal handler runs (nor could it, since `SIGKILL` is what's used and you cannot set a handler for `SIGKILL`).
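          To make the wait-based approach concrete, here is a small sketch. `observe_child_death` is a made-up name, and the child kills itself with `SIGKILL` purely as a stand-in for an OOM kill; the point is that the parent learns the cause of death from `waitpid()` without any signal handler being involved.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that dies from SIGKILL (standing in for
 * an OOM kill) and return the signal the parent observes via waitpid(),
 * 0 if the child exited normally, or -1 on error. */
int observe_child_death(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        raise(SIGKILL); /* simulate the kernel killing this process */
        _exit(0);       /* not reached */
    }

    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;

    /* No handler ran in either process; the parent only reads the status. */
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

          The parent survives simply because the signal was never sent to it, not because it trapped anything.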
        - somethingsome 4 days ago |parent
          Thank you both, that was what I was missing. I haven't used signals in programming in many, many years, and the 'catching' of signals was what bothered me most.
marcodiego 4 days ago
I've seen energy-aware scheduling, literally decades of effort that culminated in the EEVDF scheduler so that it was possible to have a good scheduler that worked well on desktops, servers and HPC... and, alongside all those efforts, a giant parallel one to prevent the OOM killer or influence it to behave better.
I really wonder if a "simple" memory-aware scheduler that punished tasks whose memory behavior (allocation or access) slows down the system would be enough. I mean, it doesn't happen anymore, but some years ago it was relatively simple to soft-crash a system just by trying to open a file that was significantly larger than the physical RAM. By 'soft-crashing' I mean the system became so slow that it was faster to reboot than wait for it to recover by itself. What if such a process was punished (for slowing down the system) by being slowed down (not being scheduled or getting lower cpu times) in a way that, no matter what it did, the other tasks continued fast enough so that it could be (even manually) killed without soft-crashing the system? Is there a reason why memory-aware scheduling was never explored or am I wrong and it was explored and proved not good?
- Asooka 3 days ago |parent
  For batch jobs I would really like a scheduler that will pause and fully swap out processes until memory is available again. For example, when compiling a C++ project, some source files or some link steps will require vast amounts of memory. In that case you would want to swap out all the other currently running compiler processes so the memory hungry one can do its job, then swap them back in. I don't want to punish the memory hungry process, actually I want exactly the opposite - I want everything else to get out of its way. The build system will eventually finish running processes that take up a lot of memory and will continue the ones that require little memory.
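  The pausing half of that can already be done from userspace with job-control signals. A rough sketch, with `pause_and_resume_demo` as a made-up name and a dummy child standing in for a compiler process; note that stopping a process does not by itself swap it out, it only makes its pages cold so the kernel is free to reclaim them under pressure.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: stop a child with SIGSTOP, confirm it stopped,
 * resume it with SIGCONT, confirm it resumed, then clean it up.
 * Returns 1 if both state changes were observed, 0 if not, -1 on error. */
int pause_and_resume_demo(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        for (;;)
            pause(); /* stand-in for a long-running batch job */
    }

    int status = 0;
    int ok = 0;

    /* Each step only runs if the previous one succeeded, so we never wait
     * for a "continued" event that cannot arrive. */
    if (kill(pid, SIGSTOP) == 0 &&
        waitpid(pid, &status, WUNTRACED) == pid && WIFSTOPPED(status) &&
        kill(pid, SIGCONT) == 0 &&
        waitpid(pid, &status, WCONTINUED) == pid && WIFCONTINUED(status))
        ok = 1;

    kill(pid, SIGKILL);       /* the demo child is no longer needed */
    waitpid(pid, &status, 0); /* reap it so no zombie is left behind */
    return ok;
}
```

  Deciding which jobs to stop, and when to wake them, is the hard part and is not addressed here.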
- toast0 4 days ago |parent
  > I really wonder if a "simple" memory-aware scheduler that punished tasks whose memory behavior (allocation or access) slows down the system would be enough. What if such a process was punished (for slowing down the system) by being slowed down (not being scheduled or getting lower cpu times) in a way that, no matter what it did, the other tasks continued fast enough so that it could be (even manually) killed without soft-crashing the system?
  This approach is hard to make work, because once the system is in memory shortage, almost all processes will be slowing the system. There's already a penalty for accessing memory that's not currently paged in --- the process will be descheduled pending the i/o, and other processes can run during that time ... until they access memory that's not paged in. You can easily get into a situation where most of your cpu time is spent in paging and no useful work gets done. This can happen even without swap; the paging will just happen on memory-mapped files. Even if you're not using mmap for data files, your executables and libraries are mmapped, so the system will page those out and in in an effort to manage the memory shortage.
  To make a system easier to operate, I like to run with a small swap partition and monitor swap usage both in % and by rate. You can often get a small window of a still responsive system to try to identify the rogue process and kill it without having to restart the whole thing. A small partition means a big problem will quickly hit the OOM killer without being in swap hell for ages.
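  For the monitoring part, a rough sketch of what "% and rate" could look like on Linux using `sysinfo(2)`. The three function names are made up for illustration, and the sampling interval and alert thresholds are left to the caller.

```c
#include <sys/sysinfo.h>

/* Percentage of swap in use, given total and free amounts in the same unit.
 * Returns 0.0 when no swap is configured or the inputs are inconsistent. */
double swap_used_percent(unsigned long long total, unsigned long long free_amount)
{
    if (total == 0 || free_amount > total)
        return 0.0;
    return 100.0 * (double)(total - free_amount) / (double)total;
}

/* Change in swap usage between two samples, in percentage points per second. */
double swap_rate(double prev_percent, double cur_percent, double seconds)
{
    if (seconds <= 0.0)
        return 0.0;
    return (cur_percent - prev_percent) / seconds;
}

/* Read current swap usage via sysinfo(2). Returns a percentage, or -1.0 on error. */
double read_swap_used_percent(void)
{
    struct sysinfo si;
    if (sysinfo(&si) != 0)
        return -1.0;
    /* totalswap and freeswap share the same mem_unit, so it cancels out. */
    return swap_used_percent(si.totalswap, si.freeswap);
}
```

  A monitor would call `read_swap_used_percent()` periodically and alert when either the level or the value from `swap_rate()` crosses a threshold.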
  There might be research or practice from commercial unix and mainframe where multi-tenancy is more common? What I've seen on the free software side is mostly avoiding the issue or trying to address it with policy limits on memory usage. Probably more thorough memory accounting is a necessary step to doing a better job, but adding more ram when you run into problems is effective mitigation, so....
meatmanek 4 days ago
I assumed it paused the program while it's running, by e.g. intercepting malloc calls or something, but no it just delays the startup.
I'm wondering what the value of this using LD_PRELOAD is, rather than just being a wrapper command that takes the command to execute as arguments. I guess it's easier to inject into a preexisting build system because it's all configured via environment variables?
- zackmorris 3 days ago |parent
  Ya that's what I wanted too. I'm actually flabbergasted that malloc() isn't a blocking call on most OSs, that would wait until the requested amount of memory was available before returning. That way programs would just suspend/resume as needed in low memory situations, rather than crashing. A process viewer could show which programs are blocked waiting for more memory, and users could optionally resume them manually. We could even serialize and unserialize entire programs and have them resume when enough memory is available.
  This one simple thing would have freed users from running under the assumption that programs can crash at any time, and allowed them to operate at a higher level of abstraction to get more real work done.
  This missed opportunity with a blocking malloc() is one of any number of obscene design decisions that I can't unsee in the tech status quo.
  In my experience, basically all software bugs stem from asynchronous/nonblocking behavior. Because it's difficult to prove that async code is deterministic without coming full circle and restructuring it as sync. For example, higher-order methods like map/reduce and scatter/gather arrays can replace iterators. And this can now be done automagically by LLM code assistants, and static analyzers since the middle of last century. Once we see that this is possible, especially if we use all const variables to avoid mutation (mostly future-proofing our logic), it's hard to avoid asking ourselves why we're all doing it the hard way. We should be able to click a block of code and choose "make sync" or "make functional" and vice versa. So that beginners could write batch files and macros with familiar iteration syntax and transpile it to safe and reliable functional code. And experts could write pure functional code in shorthand and export it as imperative code for others to verify.
  This was always another dream of mine, since I've been waiting for big companies to do it since the mid 1990s but they can't be bothered, yet I'll likely spend the rest of my life building CRUD apps to make rent.
  - jeroenhd 3 days ago |parent
    > I'm actually flabbergasted that malloc() isn't a blocking call on most OSs, that would wait until the requested amount of memory was available before returning.
    In many cases, the memory is "available" in the form of swap space (once other applications are swapped out). In other cases it's hard to tell how much memory is available (swap space can be variable, like with compressed swap space/compressed memory).
    It's not that hard to write your own malloc() that does this, of course; all you need is a quick wrapper like
```
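    #include <stdlib.h>   /* malloc */
    #include <unistd.h>   /* usleep */

    /* Retry malloc() every 100 ms until it succeeds. */
    void* malloc_blocking(size_t size) {
        void* out;
        do {
            out = malloc(size);
            if (out == NULL) usleep(100000);  /* back off before retrying */
        } while (out == NULL);
        return out;
    }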
```
    The problem with this "solution" is that malloc almost never fails. Applications don't typically handle malloc failure well, either; very few pre-allocate all the resources necessary to even log and notify the user of a memory allocation failure, let alone fail gracefully.
    In my experience, Windows deals with low-memory situations a lot better. It has APIs notifying applications that memory is running low so they can clear caches (i.e. by using CreateMemoryResourceNotification) and will often try to extend the page file temporarily rather than kill applications if it can help it.
    On Linux, you can measure the system memory state and application memory usage in a loop (do you check /proc/meminfo or the cgroup API? If so, which version? Good luck figuring that out!) but you're still left to the whims of the OOM daemon.
    You cannot just "make sync" everything. Asynchronous code isn't written to make your life harder. It increases throughput and performance. That's why Rust still allows for asynchronous code in various methods (threads and async) with a whole type system to make memory exchange safe while maintaining performance.
    Also, I've very rarely encountered bugs in any code I've debugged that are caused by asynchronous/nonblocking behaviour. You get a deadlock every now and then, but most bugs are either in the business logic layer or in code making assumptions about things like "variables being initialized by the caller" and "pointers not being freed when used" or even just "pointers not being null". Using better languages and actually leveraging the many tools available these days can prevent most of those bugs.
- layer8 4 days ago |parent
  I think the LD_PRELOAD is automatically inherited by whatever processes make executes. Otherwise you’d have to wrap each individual build step.
- cryptonector 4 days ago |parent
  One would think it would be better to interpose on `execve()` or even `fork()` rather than to simply do the waiting in an ELF `.init` section. After all, if the parent is spawning lots of child processes then that is a problem in itself. But yeah, this approach will presumably work most of the time.
  Incidentally, as pure-Go and pure-Rust programs proliferate, `LD_PRELOAD` stops being useful.
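  For readers who haven't seen the "wait in `.init`" trick, here is a rough sketch of the general shape using a GCC/Clang constructor function. This is not memstop's actual code: `available_percent`, `wait_for_memory` and the `DEMO_MIN_AVAILABLE_PERCENT` variable are all made up for illustration.

```c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Parse MemTotal and MemAvailable (in kB) from a /proc/meminfo-style stream.
 * Returns MemAvailable as a whole percentage of MemTotal, or -1 if either
 * field is missing. */
int available_percent(FILE *meminfo)
{
    char line[256];
    unsigned long long total = 0, avail = 0;
    int have_total = 0, have_avail = 0;

    while (fgets(line, sizeof line, meminfo) != NULL) {
        if (sscanf(line, "MemTotal: %llu kB", &total) == 1)
            have_total = 1;
        else if (sscanf(line, "MemAvailable: %llu kB", &avail) == 1)
            have_avail = 1;
    }
    if (!have_total || !have_avail || total == 0)
        return -1;
    return (int)(avail * 100 / total);
}

/* Runs before main() in every dynamically linked process that loads this
 * library, which is what LD_PRELOAD arranges. */
__attribute__((constructor))
static void wait_for_memory(void)
{
    /* DEMO_MIN_AVAILABLE_PERCENT is a made-up variable for this sketch. */
    const char *env = getenv("DEMO_MIN_AVAILABLE_PERCENT");
    if (env == NULL)
        return; /* not configured: never delay startup */
    int wanted = atoi(env);

    for (;;) {
        FILE *f = fopen("/proc/meminfo", "r");
        if (f == NULL)
            return; /* cannot measure: do not block startup */
        int percent = available_percent(f);
        fclose(f);
        if (percent < 0 || percent >= wanted)
            return;
        sleep(1); /* poll until enough memory is reported available */
    }
}
```

  Built as a shared object (for example `gcc -shared -fPIC`) and named in `LD_PRELOAD`, the constructor runs in each child a build tool spawns, as long as the environment is passed down. It has the same blind spots discussed elsewhere in the thread: statically linked binaries never load it, and the check is racy.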
  - LegionMammal978 3 days ago |parent
    LD_PRELOAD still works with most Rust programs, which will link to libc as usual. (And in any case, you can't get things like full ASLR with a proper ET_EXEC binary.) What doesn't work is interjecting function calls under the assumption that the program can't work around them.
imp0cat 4 days ago
I just wanted to point out that GNU parallel has built-in options to do the same thing when running parallel processes that could possibly overwhelm the computer.
```
  --memfree size
    Minimum memory free when starting another job. The size can be postfixed with K, M, G, T, P, k, m, g, t, or p (see UNIT PREFIX).
```
rini17 4 days ago
No information about design philosophy, whether it triggers on RSS or virtual memory. And I'd think adding swap would be recommended as a place to stow away these stopped processes?
A naive approach might end in deadlock when all processes that could free up memory are stopped.
- otterley 4 days ago |parent
  Not only that, but it’s inherently racy (TOCTOU); another process could allocate a huge block of memory between the time the decision is made to let the controlled program start and the time that program finishes initializing.
  A better solution is to use a proper memory-reservation scheduler, not hacks like this. Kubernetes has such a thing.
- menaerus 3 days ago |parent
  It's pretty naive, it reads /proc/meminfo and extracts MemTotal and MemAvailable: https://github.com/surban/memstop/blob/master/memstop.c#L40-....
  In other words, that's not how virtual memory works. And I actually don't know how I would go and solve this problem because of the way commit, overcommit, and paging algorithms can interfere with the goal.
nialv7 3 days ago
1. You don't need this. Just run programs inside cgroup and set a memory limit (systemd-run lets you do this in a single convenient command). When the program reaches its memory limit it will be throttled.
2. Also often a bad idea. If you slow down a process you are also stopping it from _releasing_ memory.
- zokier 3 days ago |parent
  > When the program reaches its memory limit it will be throttled.
  I thought it will be killed?
  - jeroenhd 3 days ago |parent
    I believe they're talking about memory.high (https://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.gi...):
```
    memory.high
        A read-write single value file which exists on non-root
        cgroups.  The default is "max".

        Memory usage throttle limit.  If a cgroup's usage goes
        over the high boundary, the processes of the cgroup are
        throttled and put under heavy reclaim pressure.

        Going over the high limit never invokes the OOM killer and
        under extreme conditions the limit may be breached. The high
        limit should be used in scenarios where an external process
        monitors the limited cgroup to alleviate heavy reclaim
        pressure.
```
    This is unlike memory.max, which does invoke the OOM killer. You can still reach memory.max after crossing memory.high, but you're less likely to do so if you have a method of dealing with the throttling.
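    Since memory.high is just a file in the cgroup's directory, setting it from a supervisor is a plain write. A minimal sketch, assuming cgroup v2 and a cgroup directory you are allowed to write to; `set_memory_high` is a made-up helper and the directory path is whatever your setup uses.

```c
#include <stdio.h>

/* Hypothetical helper: write `value` (a byte count such as "1073741824",
 * or "max") to the memory.high file inside cgroup_dir.
 * Returns 0 on success, -1 on failure. */
int set_memory_high(const char *cgroup_dir, const char *value)
{
    char path[4096];
    int n = snprintf(path, sizeof path, "%s/memory.high", cgroup_dir);
    if (n < 0 || (size_t)n >= sizeof path)
        return -1; /* path did not fit in the buffer */

    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;

    int ok = fputs(value, f) >= 0;
    if (fclose(f) != 0) /* the kernel may reject the value at flush time */
        ok = 0;
    return ok ? 0 : -1;
}
```

    The external monitor the documentation mentions would be a separate piece: something that notices the throttling and reacts, for example by pausing or killing work.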
ape4 4 days ago
This could be a nice systemd unit option.
- arianvanp 3 days ago |parent
  There is already memory.high (MemoryHigh= in systemd) you can set on cgroups which does something similar at the kernel level.
  But it's challenging to use correctly, as it's easy to end up in a livelock situation where the process never frees memory but also never gets killed.
  See all the details that Kubernetes had with introducing this
  https://github.com/kubernetes/enhancements/issues/2570
d00mB0t 7 days ago
OOM eat your heart out :D This is great, but there are security implications when using LD_PRELOAD--but I like it! More programs like this for parallel computing please.
- masfuerte 4 days ago |parent
  I'd be more worried about the possibility of deadlock.
- josephcsible 7 days ago |parent
  > there are security implication when using LD_PRELOAD
  What do you mean?
  - d00mB0t 7 days ago |parent
    A malicious user could inject memstop.so into a critical system service and delay execution--Writing a wrapper script would work, along with keeping unprivileged users from using LD_PRELOAD.
    - josephcsible 7 days ago |parent
      If a malicious user can control the environment of critical system services, you're already pwned. There's no actual security issue there and no value in such a wrapper script.
      - giingyui 4 days ago |parent
        It’s called “being on the other side of this airtight hatchway”
      - d00mB0t 7 days ago |parent
        You sure about that? :)
        - josephcsible 7 days ago |parent
          Can you give a counterexample?
        - coherentpony 4 days ago |parent
          You are the person that made the initial claim. The burden of proof is on you, not someone else.
        - josephcsible 4 days ago |parent
          Huh? You claimed there was a security vulnerability and I disagreed and asked for an example of it.
    - pjc50 4 days ago |parent
      If it's a critical service running as root, there's no way you're allowed to inject stuff into it. That's already a far bigger security vulnerability.
      (I don't get the wrapper script suggestion, wrap what?)
      - mpyne 4 days ago |parent
        In addition, the link-loader already should ignore LD_PRELOAD for setuid binaries so even if you're not root but running a setuid binary, LD_PRELOAD can't help you (and if it can it's a security flaw with the link-loader).
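        On Linux with glibc, a process can check whether it was started in that restricted "secure-execution" mode by reading the `AT_SECURE` entry of the auxiliary vector. A tiny sketch; `in_secure_execution` is a made-up wrapper name, while `getauxval` and `AT_SECURE` are the real glibc interface.

```c
#include <sys/auxv.h>

/* Returns nonzero when the process runs in secure-execution mode (for
 * example a setuid binary started by a different user), zero otherwise. */
int in_secure_execution(void)
{
    return getauxval(AT_SECURE) != 0;
}
```

        In that mode the dynamic loader ignores or heavily restricts `LD_PRELOAD`, so a preloaded library like the one discussed here would normally not be loaded at all.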