> The only clear case is to call execve(2) in the child process just after fork(2).
It’s crystal clear that you should not call fork() then execve() in any nontrivial process or multithreaded process. Use posix_spawn() or, if you must roll your own, use vfork() or clone(). Even if you can arrange to use fork() correctly like this, it’s very slow in a process that uses a lot of MAP_PRIVATE memory.
One more reason to use posix_spawn() instead of fork() is that the latter must duplicate the parent process's memory, so for large processes which work under time constraints this can take a nontrivial amount of time. This can bite when calling system() to execute a shell command, since under the hood it also calls fork().
For the purposes of this discussion, it doesn't matter whether system() uses clone() or fork() - they have the exact same problem that they have to do a non-trivial amount of virtual address space setup, and can trigger OOM if you don't have overcommit enabled. (Indeed Glibc's fork() is just a wrapper around clone() rather than the legacy fork() syscall.)
On the contrary, the purposes of this discussion were set out by amluto above:
> you should not call fork() [...] Use posix_spawn() or [...] use vfork() or clone().
and by kruzcek:
> This can bite [...] since [...] it also calls fork().
The people at the head of this discussion set out the premise that it very much does matter whether fork() or clone() or vfork() is called, by system() or directly, and this is in fact the very thing that they are discussing.
They are wrong about what system() actually does under the covers, they were slipshod about what C library they were talking about, and like you they have not picked up on the CLONE_VM flag being the important thing when it comes to the memory map, but they are right that there is a difference in the address space work that one has to be careful about and it is not "the exact same problem".
vfork() is even harder to use correctly than fork().
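For illustration, here is a minimal sketch of the narrow pattern in which vfork() is generally considered usable: the child does nothing except call execve() and, if that fails, _exit(). The helper name run_with_vfork() is hypothetical, and the sketch ignores signal handling and EINTR.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run `path` with vfork()+execve() and return the
 * child's exit status, or -1 if the child could not be created or waited for. */
int run_with_vfork(const char *path, char *const argv[], char *const envp[])
{
    pid_t pid = vfork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: it borrows the parent's address space until execve() or
         * _exit(), so it must not return, call exit(), or modify any data. */
        execve(path, argv, envp);
        _exit(127);               /* only reached if execve() failed */
    }

    /* Parent: resumes once the child has exec'd or exited. */
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Anything more elaborate in the child (closing descriptors in a loop, changing directories, reporting errors) is where the difficulty mentioned above starts.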
posix_spawn() is not a silver bullet either. Its API is cumbersome, it severely limits what you can do in the child process before exec(), and it has no sane way to handle errors in the pre-exec() stage.
And AFAICS the glibc implementation of posix_spawn() doesn't even fix the memory duplication problem out of the box. It uses vfork() only if you use the non-standard POSIX_SPAWN_USEVFORK flag.
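To make the "cumbersome API" point concrete, here is a rough sketch of spawning a command with its stdout redirected to a file. The helper name spawn_with_stdout() and the file mode are assumptions for illustration. Whether the library uses fork(), vfork() or clone() underneath depends on the C library and its version.

```c
#include <fcntl.h>
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/* Hypothetical helper: run argv[0] (looked up via PATH) with stdout
 * redirected to out_path. Returns the exit status, or -1 on failure. */
int spawn_with_stdout(char *const argv[], const char *out_path)
{
    posix_spawn_file_actions_t fa;
    pid_t pid;
    int status;
    int rc;

    if (posix_spawn_file_actions_init(&fa) != 0)
        return -1;

    /* Pre-exec work has to be described up front as "file actions";
     * arbitrary code cannot run in the child. */
    rc = posix_spawn_file_actions_addopen(&fa, STDOUT_FILENO, out_path,
                                          O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (rc == 0)
        rc = posix_spawnp(&pid, argv[0], &fa, NULL, argv, environ);
    posix_spawn_file_actions_destroy(&fa);
    if (rc != 0)
        return -1;                /* rc is an errno value, not -1 */

    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Note that a failure in the pre-exec stage may surface either as an error return from posix_spawnp() or as a child that exits with status 127, depending on the implementation, which is the error-handling weakness complained about above.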
You could fork a "forker" process early in the program and use pipes to make it do what you can't do later on. Use socketpairs if you need to send descriptors as well.
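A rough sketch of that "forker" idea, assuming Linux-style AF_UNIX SOCK_SEQPACKET sockets so that each command travels as one message. The names start_forker(), forker_run() and forker_loop() are hypothetical, commands are run via /bin/sh purely for brevity, and descriptor passing is left out.

```c
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Helper process: receive one shell command per message, run it, and
 * reply with its exit status (-1 if it could not be run or was killed). */
static void forker_loop(int sock)
{
    char cmd[1024];
    for (;;) {
        ssize_t n = recv(sock, cmd, sizeof cmd - 1, 0);
        if (n <= 0)
            _exit(0);             /* parent closed its end, or error */
        cmd[n] = '\0';

        int32_t result = -1;
        int status;
        pid_t pid = fork();       /* cheap: the helper stays small */
        if (pid == 0) {
            close(sock);
            execl("/bin/sh", "sh", "-c", cmd, (char *)NULL);
            _exit(127);
        }
        if (pid > 0 && waitpid(pid, &status, 0) == pid && WIFEXITED(status))
            result = WEXITSTATUS(status);
        if (send(sock, &result, sizeof result, 0) < 0)
            _exit(1);
    }
}

/* Call early, before the program grows or starts threads.
 * Returns the parent's end of the socket pair, or -1 on failure. */
int start_forker(void)
{
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_SEQPACKET, 0, sv) < 0)
        return -1;
    pid_t pid = fork();
    if (pid < 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }
    if (pid == 0) {
        close(sv[0]);
        forker_loop(sv[1]);
        _exit(0);                 /* not reached */
    }
    close(sv[1]);
    return sv[0];
}

/* Ask the helper to run a non-empty command; returns its exit status or -1. */
int forker_run(int sock, const char *cmd)
{
    int32_t result;
    if (*cmd == '\0' || send(sock, cmd, strlen(cmd), 0) < 0)
        return -1;
    if (recv(sock, &result, sizeof result, 0) != (ssize_t)sizeof result)
        return -1;
    return result;
}
```

The large process then never forks itself; it only exchanges small messages with a helper whose address space was captured while it was still small. Closing the returned socket makes the helper exit, and the caller would still need to reap it.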
A less thorough article, which conversely generated more Hacker News discussion (than exists here at the time that I write this), can be found via https://news.ycombinator.com/item?id=8449164 .
While there are flags for closing file descriptors on exec(), they are not set by default. This means that the developers of every application that uses fork() must set these flags themselves; otherwise descriptors leak into the child.
* https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/pos...
* https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/uni...
Under the hood in the musl C library, system() calls posix_spawn() which in turn calls clone().
* https://git.musl-libc.org/cgit/musl/tree/src/process/system....
* https://git.musl-libc.org/cgit/musl/tree/src/process/posix_s...
* https://news.ycombinator.com/item?id=9653238
* https://github.com/torvalds/linux/blob/0e9b10395018ab78bf6bf...
closefrom(2) is a nice BSDism, btw. After setting up stdin, stdout and stderr, just call closefrom(3). For the same money, it could have been designed to close a range instead; odd that this was not chosen.
Presumably for issues similar to those pointed out here.