Improving Steam Client stability on Linux: setenv and multithreaded environments

Leaflet@lemmy.world · 2 months ago

Improving Steam Client stability on Linux: setenv and multithreaded environments

tal@lemmy.today · edit-2 2 months ago

That’s not a completely reliable fix, a third party library could still call setenv and trigger crashes, there’s still a risk of data races, but we’ve observed a significant reduction in SIGABRT volumes.

Hmm. If they want a dirty hack, I expect they could do a library interposer that overrides setenv(3) and getenv(3) symbols with versions that grab a global “environment variable” lock before calling the actual function.

They say that they’re having problems with third party libraries that use environment variables. If they’re using third-party libraries statically-linked against libc, I suppose that won’t work, but as long as they’re dynamically-linked, should be okay.

EDIT: Though you’ve still got an atomic update problem with the returned buffer, doing things the way they are, if you don’t want to leak memory. Like, one thread might have half-updated the value of the buffer when another is reading the buffer after returning from the interposer’s version of the function. That shouldn’t directly crash, but you can get a mangled environment variable value. And there’s not going to be guarantees on synchronization on access to the buffer, unlike the getenv() call itself.

thinks

This is more of a mind-game solution, but…

Well, you can’t track lifetime of pointers to a buffer. So there’s no true fix that doesn’t leak memory. Because the only absolute fix is to return a new buffer from getenv() for each unique setenv(), because POSIX provides no lifetime bounds.

But if you assume that anything midway through a buffer read is probably going to do so pretty soon, which is probably true…

You can maybe play tricks with mmap() and mremap(), if you’re willing to blow a page per environment variable that you want to update and a page of virtual address space per update, and some temporary memory. The buffer you return from the interposer’s getenv() is an mmap()ed range. In the interposer’s setenv(), if the value is modified, you mremap() with MREMAP_DONTUNMAP. Future calls to getenv() return the new address. That gives you a userspace page fault handler to the old range, which I suppose – haven’t written userspace page fault handlers myself – can probably block the memory read until the new value is visible and synchronize on visibility of changes across threads.

If you assume that any read of the buffer is sequential and moving forward, then if a page fault triggers on an attempted access at the address at the start of the page, then you can return the latest value of the value.

If you get a fault via an address into the middle of the buffer, and you still have a copy of the old value, then you’ve smacked into code in the middle of reading the buffer. Return the old value.

A given amount of time after an update, you’re free to purge old values from setenv(). Can do so out of the interposer’s functions.

You can never eliminate that chance that a thread has read the first N bytes of an environment variable buffer, then gone to sleep for ten minutes, then suddenly wants the remainder. In that case, you have to permit for the possibility that the thread sees part of the old environment variable value and part of the new. But you can expend temporary memory to remember old values longer to make that ever-more unlikely.

Improving Steam Client stability on Linux: setenv and multithreaded environments

Improving Steam Client stability on Linux: setenv and multithreaded environments

Just a moment...