fix: use PR_SET_PDEATHSIG so to ensure child processes are killed in a timely manner (#1626)
Some users have reported issues where child processes are not cleaned up after Codex exits (e.g., https://github.com/openai/codex/issues/1570). This is generally a tricky issue on operating systems: if a parent process receives `SIGKILL`, then it terminates immediately and cannot communicate with the child. **It only helps on Linux**, but this PR introduces the use of `prctl(2)` so that if the parent process dies, `SIGTERM` will be delivered to the child process. Whereas previously, I believe that if Codex spawned a long-running process (like `tsc --watch`) and the Codex process received `SIGKILL`, the `tsc --watch` process would be reparented to the init process and would never be killed. Now with the use of `prctl(2)`, the `tsc --watch` process should receive `SIGTERM` in that scenario. We still need to come up with a solution for macOS. I've started to look at `launchd`, but I'm researching a number of options.
This commit is contained in:
1
codex-rs/Cargo.lock
generated
1
codex-rs/Cargo.lock
generated
@@ -669,6 +669,7 @@ dependencies = [
|
|||||||
"fs2",
|
"fs2",
|
||||||
"futures",
|
"futures",
|
||||||
"landlock",
|
"landlock",
|
||||||
|
"libc",
|
||||||
"maplit",
|
"maplit",
|
||||||
"mcp-types",
|
"mcp-types",
|
||||||
"mime_guess",
|
"mime_guess",
|
||||||
|
|||||||
@@ -22,6 +22,7 @@ env-flags = "0.1.1"
|
|||||||
eventsource-stream = "0.2.3"
|
eventsource-stream = "0.2.3"
|
||||||
fs2 = "0.4.3"
|
fs2 = "0.4.3"
|
||||||
futures = "0.3"
|
futures = "0.3"
|
||||||
|
libc = "0.2.174"
|
||||||
mcp-types = { path = "../mcp-types" }
|
mcp-types = { path = "../mcp-types" }
|
||||||
mime_guess = "2.0"
|
mime_guess = "2.0"
|
||||||
rand = "0.9"
|
rand = "0.9"
|
||||||
|
|||||||
@@ -384,6 +384,31 @@ async fn spawn_child_async(
|
|||||||
cmd.env(CODEX_SANDBOX_NETWORK_DISABLED_ENV_VAR, "1");
|
cmd.env(CODEX_SANDBOX_NETWORK_DISABLED_ENV_VAR, "1");
|
||||||
}
|
}
|
||||||
|
|
||||||
|
// If this Codex process dies (including being killed via SIGKILL), we want
|
||||||
|
// any child processes that were spawned as part of a `"shell"` tool call
|
||||||
|
// to also be terminated.
|
||||||
|
|
||||||
|
// This relies on prctl(2), so it only works on Linux.
|
||||||
|
#[cfg(target_os = "linux")]
|
||||||
|
unsafe {
|
||||||
|
cmd.pre_exec(|| {
|
||||||
|
// This prctl call effectively requests, "deliver SIGTERM when my
|
||||||
|
// current parent dies."
|
||||||
|
if libc::prctl(libc::PR_SET_PDEATHSIG, libc::SIGTERM) == -1 {
|
||||||
|
return Err(io::Error::last_os_error());
|
||||||
|
}
|
||||||
|
|
||||||
|
// Though if there was a race condition and this pre_exec() block is
|
||||||
|
// run _after_ the parent (i.e., the Codex process) has already
|
||||||
|
// exited, then the parent is the _init_ process (which will never
|
||||||
|
// die), so we should just terminate the child process now.
|
||||||
|
if libc::getppid() == 1 {
|
||||||
|
libc::raise(libc::SIGTERM);
|
||||||
|
}
|
||||||
|
Ok(())
|
||||||
|
});
|
||||||
|
}
|
||||||
|
|
||||||
match stdio_policy {
|
match stdio_policy {
|
||||||
StdioPolicy::RedirectForShellTool => {
|
StdioPolicy::RedirectForShellTool => {
|
||||||
// Do not create a file descriptor for stdin because otherwise some
|
// Do not create a file descriptor for stdin because otherwise some
|
||||||
|
|||||||
Reference in New Issue
Block a user