feat: initial import of Rust implementation of Codex CLI in codex-rs/ (#629)

As stated in `codex-rs/README.md`:

Today, Codex CLI is written in TypeScript and requires Node.js 22+ to run it. For a number of users, this runtime requirement inhibits adoption: they would be better served by a standalone executable. As maintainers, we want Codex to run efficiently in a wide range of environments with minimal overhead. We also want to take advantage of operating system-specific APIs to provide better sandboxing, where possible.

To that end, we are moving forward with a Rust implementation of Codex CLI contained in this folder, which has the following benefits:

- The CLI compiles to small, standalone, platform-specific binaries.
- Can make direct, native calls to [seccomp](https://man7.org/linux/man-pages/man2/seccomp.2.html) and [landlock](https://man7.org/linux/man-pages/man7/landlock.7.html) in order to support sandboxing on Linux.
- No runtime garbage collection, resulting in lower memory consumption and better, more predictable performance.

Currently, the Rust implementation is materially behind the TypeScript implementation in functionality, so continue to use the TypeScript implementation for the time being. We will publish native executables via GitHub Releases as soon as we feel the Rust version is usable.
2025-04-24 13:31:40 -07:00
parent acc4acc81e
commit 31d0d7a305
71 changed files with 14099 additions and 0 deletions
--- a/codex-rs/core/Cargo.toml
+++ b/codex-rs/core/Cargo.toml
@@ -0,0 +1,62 @@
+[package]
+name = "codex-core"
+version = "0.1.0"
+edition = "2021"
+
+[lib]
+name = "codex_core"
+path = "src/lib.rs"
+
+[dependencies]
+anyhow = "1"
+async-channel = "2.3.1"
+base64 = "0.21"
+bytes = "1.10.1"
+clap = { version = "4", features = ["derive", "wrap_help"], optional = true }
+codex-apply-patch = { path = "../apply-patch" }
+dirs = "6"
+env-flags = "0.1.1"
+eventsource-stream = "0.2.3"
+expanduser = "1.2.2"
+fs-err = "3.1.0"
+futures = "0.3"
+mime_guess = "2.0"
+patch = "0.7"
+rand = "0.9"
+reqwest = { version = "0.12", features = ["json", "stream"] }
+serde = { version = "1", features = ["derive"] }
+serde_json = "1"
+thiserror = "2.0.12"
+tokio = { version = "1", features = [
+    "io-std",
+    "macros",
+    "process",
+    "rt-multi-thread",
+    "signal",
+] }
+tokio-util = "0.7.14"
+toml = "0.8.20"
+tracing = { version = "0.1.41", features = ["log"] }
+tree-sitter = "0.25.3"
+tree-sitter-bash = "0.23.3"
+
+[target.'cfg(target_os = "linux")'.dependencies]
+libc = "0.2.172"
+landlock = "0.4.1"
+seccompiler = "0.5.0"
+
+# Build OpenSSL from source for musl builds.
+[target.x86_64-unknown-linux-musl.dependencies]
+openssl-sys = { version = "*", features = ["vendored"] }
+
+[dev-dependencies]
+assert_cmd = "2"
+predicates = "3"
+tempfile = "3"
+wiremock = "0.6"
+
+[features]
+default = []
+
+# Separate feature so that `clap` is not a mandatory dependency.
+cli = ["clap"]
--- a/codex-rs/core/README.md
+++ b/codex-rs/core/README.md
@@ -0,0 +1,10 @@
+# codex-core
+
+This crate implements the business logic for Codex. It is designed to be used by the various Codex UIs written in Rust.
+
+For non-Rust UIs, we are also working to define a _protocol_ for talking to Codex. See:
+
+- [Specification](../docs/protocol_v1.md)
+- [Rust types](./src/protocol.rs)
+
+You can use the `proto` subcommand of the executable in the [`cli` crate](../cli) to speak the protocol using newline-delimited JSON over stdin/stdout.
--- a/codex-rs/core/prompt.md
+++ b/codex-rs/core/prompt.md
@@ -0,0 +1,98 @@
+Please resolve the user's task by editing and testing the code files in your current code execution session.
+You are a deployed coding agent.
+Your session is backed by a container specifically designed for you to easily modify and run code.
+The repo(s) are already cloned in your working directory, and you must fully solve the problem for your answer to be considered correct.
+
+You MUST adhere to the following criteria when executing the task:
+
+- Working on the repo(s) in the current environment is allowed, even if they are proprietary.
+- Analyzing code for vulnerabilities is allowed.
+- Showing user code and tool call details is allowed.
+- User instructions may override the _CODING GUIDELINES_ section in this developer message.
+- Do not use `ls -R`, `find`, or `grep` - these are slow in large repos. Use `rg` and `rg --files`.
+- Use `apply_patch` to edit files: {"command":["apply_patch","*** Begin Patch\n*** Update File: path/to/file.py\n@@ def example():\n- pass\n+ return 123\n*** End Patch"]}
+- If completing the user's task requires writing or modifying files:
+  - Your code and final answer should follow these _CODING GUIDELINES_:
+    - Fix the problem at the root cause rather than applying surface-level patches, when possible.
+    - Avoid unneeded complexity in your solution.
+      - Ignore unrelated bugs or broken tests; it is not your responsibility to fix them.
+    - Update documentation as necessary.
+    - Keep changes consistent with the style of the existing codebase. Changes should be minimal and focused on the task.
+      - Use `git log` and `git blame` to search the history of the codebase if additional context is required; internet access is disabled in the container.
+    - NEVER add copyright or license headers unless specifically requested.
+    - You do not need to `git commit` your changes; this will be done automatically for you.
+    - If there is a .pre-commit-config.yaml, use `pre-commit run --files ...` to check that your changes pass the pre-commit checks. However, do not fix pre-existing errors on lines you didn't touch.
+      - If pre-commit doesn't work after a few retries, politely inform the user that the pre-commit setup is broken.
+    - Once you finish coding, you must
+      - Check `git status` to sanity check your changes; revert any scratch files or changes.
+      - Remove all inline comments you added as much as possible, even if they look normal. Check using `git diff`. Inline comments must be generally avoided, unless active maintainers of the repo, after long careful study of the code and the issue, will still misinterpret the code without the comments.
+      - Check if you accidentally added copyright or license headers. If so, remove them.
+      - Try to run pre-commit if it is available.
+      - For smaller tasks, describe in brief bullet points
+      - For more complex tasks, include a brief high-level description, use bullet points, and include details that would be relevant to a code reviewer.
+- If completing the user's task DOES NOT require writing or modifying files (e.g., the user asks a question about the code base):
+  - Respond in a friendly tone as a remote teammate, who is knowledgeable, capable and eager to help with coding.
+- When your task involves writing or modifying files:
+  - Do NOT tell the user to "save the file" or "copy the code into a file" if you already created or modified the file using `apply_patch`. Instead, reference the file as already saved.
+  - Do NOT show the full contents of large files you have already written, unless the user explicitly asks for them.
+
+§ `apply-patch` Specification
+
+Your patch language is a stripped‑down, file‑oriented diff format designed to be easy to parse and safe to apply. You can think of it as a high‑level envelope:
+
+*** Begin Patch
+[ one or more file sections ]
+*** End Patch
+
+Within that envelope, you get a sequence of file operations.
+You MUST include a header to specify the action you are taking.
+Each operation starts with one of three headers:
+
+*** Add File: <path> - create a new file. Every following line is a + line (the initial contents).
+*** Delete File: <path> - remove an existing file. Nothing follows.
+*** Update File: <path> - patch an existing file in place (optionally with a rename).
+
+May be immediately followed by *** Move to: <new path> if you want to rename the file.
+Then one or more “hunks”, each introduced by @@ (optionally followed by a hunk header).
+Within a hunk each line starts with:
+
+`+` for inserted text,
+`-` for removed text, or
+space ( ) for context.
+
+At the end of a truncated hunk you can emit *** End of File.
+
+Patch := Begin { FileOp } End
+Begin := "*** Begin Patch" NEWLINE
+End := "*** End Patch" NEWLINE
+FileOp := AddFile | DeleteFile | UpdateFile
+AddFile := "*** Add File: " path NEWLINE { "+" line NEWLINE }
+DeleteFile := "*** Delete File: " path NEWLINE
+UpdateFile := "*** Update File: " path NEWLINE [ MoveTo ] { Hunk }
+MoveTo := "*** Move to: " newPath NEWLINE
+Hunk := "@@" [ header ] NEWLINE { HunkLine } [ "*** End of File" NEWLINE ]
+HunkLine := (" " | "-" | "+") text NEWLINE
+
+A full patch can combine several operations:
+
+*** Begin Patch
+*** Add File: hello.txt
++Hello world
+*** Update File: src/app.py
+*** Move to: src/main.py
+@@ def greet():
+-print("Hi")
++print("Hello, world!")
+*** Delete File: obsolete.txt
+*** End Patch
+
+It is important to remember:
+
+- You must include a header with your intended action (Add/Delete/Update)
+- You must prefix new lines with `+` even when creating a new file
+
+You can invoke apply_patch like:
+
+```
+shell {"command":["apply_patch","*** Begin Patch\n*** Add File: hello.txt\n+Hello, world!\n*** End Patch\n"]}
+```
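Review note on `prompt.md`: the grammar in the `apply-patch` specification is small enough to sketch a parser for. The block below is a minimal, std-only illustration of the envelope and the three file-operation headers. It is not the API of the `codex-apply-patch` crate; `FileOp` and `parse_patch` are hypothetical names, the sketch assumes the literal `*** ` header prefix, and hunk lines are collected without being interpreted.

```rust
/// One file operation from the patch envelope (hypothetical type, for illustration only).
#[derive(Debug, PartialEq)]
enum FileOp {
    Add { path: String, contents: String },
    Delete { path: String },
    Update { path: String, move_to: Option<String>, hunk_lines: Vec<String> },
}

/// Parse `Patch := Begin { FileOp } End` into a list of operations.
fn parse_patch(patch: &str) -> Result<Vec<FileOp>, String> {
    let mut lines = patch.lines().peekable();
    if lines.next() != Some("*** Begin Patch") {
        return Err("missing '*** Begin Patch'".to_string());
    }
    let mut ops = Vec::new();
    while let Some(line) = lines.next() {
        if line == "*** End Patch" {
            return Ok(ops);
        } else if let Some(path) = line.strip_prefix("*** Add File: ") {
            // AddFile: every following line must start with '+'.
            let mut contents = String::new();
            while let Some(body) = lines.peek().and_then(|l| l.strip_prefix('+')) {
                contents.push_str(body);
                contents.push('\n');
                lines.next();
            }
            ops.push(FileOp::Add { path: path.to_string(), contents });
        } else if let Some(path) = line.strip_prefix("*** Delete File: ") {
            ops.push(FileOp::Delete { path: path.to_string() });
        } else if let Some(path) = line.strip_prefix("*** Update File: ") {
            // Optional MoveTo directly after the Update header.
            let move_to = lines
                .peek()
                .and_then(|l| l.strip_prefix("*** Move to: "))
                .map(str::to_string);
            if move_to.is_some() {
                lines.next();
            }
            // Collect raw hunk lines until the next "*** " header.
            let mut hunk_lines = Vec::new();
            while let Some(next) = lines.peek() {
                if next.starts_with("*** ") && *next != "*** End of File" {
                    break;
                }
                hunk_lines.push(next.to_string());
                lines.next();
            }
            ops.push(FileOp::Update { path: path.to_string(), move_to, hunk_lines });
        } else {
            return Err(format!("unexpected line: {line}"));
        }
    }
    Err("missing '*** End Patch'".to_string())
}

fn main() {
    let patch = "*** Begin Patch\n*** Add File: hello.txt\n+Hello, world!\n*** End Patch\n";
    println!("{:?}", parse_patch(patch));
}
```

A real implementation would also validate hunk line prefixes and apply the hunks to the target file; this sketch only shows how the headers partition the patch.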
--- a/codex-rs/core/src/approval_mode_cli_arg.rs
+++ b/codex-rs/core/src/approval_mode_cli_arg.rs
@@ -0,0 +1,61 @@
+//! Standard type to use with the `--approval-mode` CLI option.
+//! Available when the `cli` feature is enabled for the crate.
+
+use clap::ValueEnum;
+
+use crate::protocol::AskForApproval;
+use crate::protocol::SandboxPolicy;
+
+#[derive(Clone, Debug, ValueEnum)]
+#[value(rename_all = "kebab-case")]
+pub enum ApprovalModeCliArg {
+    /// Run all commands without asking for user approval.
+    /// Only asks for approval if a command fails to execute, in which case it
+    /// will escalate to the user to ask for un-sandboxed execution.
+    OnFailure,
+
+    /// Only run "known safe" commands (e.g. ls, cat, sed) without
+    /// asking for user approval. Will escalate to the user if the model
+    /// proposes a command that is not allow-listed.
+    UnlessAllowListed,
+
+    /// Never ask for user approval.
+    /// Execution failures are immediately returned to the model.
+    Never,
+}
+
+#[derive(Clone, Debug, ValueEnum)]
+#[value(rename_all = "kebab-case")]
+pub enum SandboxModeCliArg {
+    /// Network syscalls will be blocked
+    NetworkRestricted,
+    /// Filesystem writes will be restricted
+    FileWriteRestricted,
+    /// Network and filesystem writes will be restricted
+    NetworkAndFileWriteRestricted,
+    /// No restrictions; full "unsandboxed" mode
+    DangerousNoRestrictions,
+}
+
+impl From<ApprovalModeCliArg> for AskForApproval {
+    fn from(value: ApprovalModeCliArg) -> Self {
+        match value {
+            ApprovalModeCliArg::OnFailure => AskForApproval::OnFailure,
+            ApprovalModeCliArg::UnlessAllowListed => AskForApproval::UnlessAllowListed,
+            ApprovalModeCliArg::Never => AskForApproval::Never,
+        }
+    }
+}
+
+impl From<SandboxModeCliArg> for SandboxPolicy {
+    fn from(value: SandboxModeCliArg) -> Self {
+        match value {
+            SandboxModeCliArg::NetworkRestricted => SandboxPolicy::NetworkRestricted,
+            SandboxModeCliArg::FileWriteRestricted => SandboxPolicy::FileWriteRestricted,
+            SandboxModeCliArg::NetworkAndFileWriteRestricted => {
+                SandboxPolicy::NetworkAndFileWriteRestricted
+            }
+            SandboxModeCliArg::DangerousNoRestrictions => SandboxPolicy::DangerousNoRestrictions,
+        }
+    }
+}
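Review note on `approval_mode_cli_arg.rs`: with `rename_all = "kebab-case"`, the values accepted on the command line are derived from the variant names. The std-only sketch below spells out the strings for `--approval-mode`, assuming clap's usual kebab-case conversion; `ApprovalMode` and `parse_approval_mode` are hypothetical stand-ins, not part of this crate, and the real parsing is done by clap's `ValueEnum` derive.

```rust
/// Hypothetical mirror of `ApprovalModeCliArg`, without the clap dependency.
#[derive(Debug, PartialEq)]
enum ApprovalMode {
    OnFailure,
    UnlessAllowListed,
    Never,
}

/// Map a kebab-case CLI value to a variant (assumed to match what clap derives).
fn parse_approval_mode(value: &str) -> Option<ApprovalMode> {
    match value {
        "on-failure" => Some(ApprovalMode::OnFailure),
        "unless-allow-listed" => Some(ApprovalMode::UnlessAllowListed),
        "never" => Some(ApprovalMode::Never),
        _ => None,
    }
}

fn main() {
    for value in ["on-failure", "unless-allow-listed", "never", "OnFailure"] {
        println!("{value} -> {:?}", parse_approval_mode(value));
    }
}
```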
--- a/codex-rs/core/src/client.rs
+++ b/codex-rs/core/src/client.rs
@@ -0,0 +1,374 @@
+use std::collections::BTreeMap;
+use std::io::BufRead;
+use std::path::Path;
+use std::pin::Pin;
+use std::sync::LazyLock;
+use std::task::Context;
+use std::task::Poll;
+use std::time::Duration;
+
+use bytes::Bytes;
+use eventsource_stream::Eventsource;
+use futures::prelude::*;
+use reqwest::StatusCode;
+use serde::Deserialize;
+use serde::Serialize;
+use serde_json::Value;
+use tokio::sync::mpsc;
+use tokio::time::timeout;
+use tokio_util::io::ReaderStream;
+use tracing::debug;
+use tracing::trace;
+use tracing::warn;
+
+use crate::error::CodexErr;
+use crate::error::Result;
+use crate::flags::get_api_key;
+use crate::flags::CODEX_RS_SSE_FIXTURE;
+use crate::flags::OPENAI_API_BASE;
+use crate::flags::OPENAI_REQUEST_MAX_RETRIES;
+use crate::flags::OPENAI_STREAM_IDLE_TIMEOUT_MS;
+use crate::flags::OPENAI_TIMEOUT_MS;
+use crate::models::ResponseInputItem;
+use crate::models::ResponseItem;
+use crate::util::backoff;
+
+#[derive(Default, Debug, Clone)]
+pub struct Prompt {
+    pub input: Vec<ResponseInputItem>,
+    pub prev_id: Option<String>,
+    pub instructions: Option<String>,
+}
+
+#[derive(Debug)]
+pub enum ResponseEvent {
+    OutputItemDone(ResponseItem),
+    Completed { response_id: String },
+}
+
+#[derive(Debug, Serialize)]
+struct Payload<'a> {
+    model: &'a str,
+    #[serde(skip_serializing_if = "Option::is_none")]
+    instructions: Option<&'a String>,
+    input: &'a Vec<ResponseInputItem>,
+    tools: &'a [Tool],
+    tool_choice: &'static str,
+    parallel_tool_calls: bool,
+    reasoning: Option<Reasoning>,
+    #[serde(skip_serializing_if = "Option::is_none")]
+    previous_response_id: Option<String>,
+    stream: bool,
+}
+
+#[derive(Debug, Serialize)]
+struct Reasoning {
+    effort: &'static str,
+    #[serde(skip_serializing_if = "Option::is_none")]
+    generate_summary: Option<bool>,
+}
+
+#[derive(Debug, Serialize)]
+struct Tool {
+    name: &'static str,
+    #[serde(rename = "type")]
+    kind: &'static str, // "function"
+    description: &'static str,
+    strict: bool,
+    parameters: JsonSchema,
+}
+
+/// Generic JSON‑Schema subset needed for our tool definitions
+#[derive(Debug, Clone, Serialize)]
+#[serde(tag = "type", rename_all = "lowercase")]
+enum JsonSchema {
+    String,
+    Number,
+    Array {
+        items: Box<JsonSchema>,
+    },
+    Object {
+        properties: BTreeMap<String, JsonSchema>,
+        required: &'static [&'static str],
+        #[serde(rename = "additionalProperties")]
+        additional_properties: bool,
+    },
+}
+
+/// Tool usage specification
+static TOOLS: LazyLock<Vec<Tool>> = LazyLock::new(|| {
+    let mut properties = BTreeMap::new();
+    properties.insert(
+        "command".to_string(),
+        JsonSchema::Array {
+            items: Box::new(JsonSchema::String),
+        },
+    );
+    properties.insert("workdir".to_string(), JsonSchema::String);
+    properties.insert("timeout".to_string(), JsonSchema::Number);
+
+    vec![Tool {
+        name: "shell",
+        kind: "function",
+        description: "Runs a shell command, and returns its output.",
+        strict: false,
+        parameters: JsonSchema::Object {
+            properties,
+            required: &["command"],
+            additional_properties: false,
+        },
+    }]
+});
+
+#[derive(Clone)]
+pub struct ModelClient {
+    model: String,
+    client: reqwest::Client,
+}
+
+impl ModelClient {
+    pub fn new(model: impl ToString) -> Self {
+        let model = model.to_string();
+        let client = reqwest::Client::new();
+        Self { model, client }
+    }
+
+    pub async fn stream(&mut self, prompt: &Prompt) -> Result<ResponseStream> {
+        if let Some(path) = &*CODEX_RS_SSE_FIXTURE {
+            // short circuit for tests
+            warn!(path, "Streaming from fixture");
+            return stream_from_fixture(path).await;
+        }
+
+        let payload = Payload {
+            model: &self.model,
+            instructions: prompt.instructions.as_ref(),
+            input: &prompt.input,
+            tools: &TOOLS,
+            tool_choice: "auto",
+            parallel_tool_calls: false,
+            reasoning: Some(Reasoning {
+                effort: "high",
+                generate_summary: None,
+            }),
+            previous_response_id: prompt.prev_id.clone(),
+            stream: true,
+        };
+
+        let url = format!("{}/v1/responses", *OPENAI_API_BASE);
+        debug!(url, "POST");
+        trace!("request payload: {}", serde_json::to_string(&payload)?);
+
+        let mut attempt = 0;
+        loop {
+            attempt += 1;
+
+            let res = self
+                .client
+                .post(&url)
+                .bearer_auth(get_api_key()?)
+                .header("OpenAI-Beta", "responses=experimental")
+                .header(reqwest::header::ACCEPT, "text/event-stream")
+                .json(&payload)
+                .timeout(*OPENAI_TIMEOUT_MS)
+                .send()
+                .await;
+            match res {
+                Ok(resp) if resp.status().is_success() => {
+                    let (tx_event, rx_event) = mpsc::channel::<Result<ResponseEvent>>(16);
+
+                    // spawn task to process SSE
+                    let stream = resp.bytes_stream().map_err(CodexErr::Reqwest);
+                    tokio::spawn(process_sse(stream, tx_event));
+
+                    return Ok(ResponseStream { rx_event });
+                }
+                Ok(res) => {
+                    let status = res.status();
+                    // The OpenAI Responses endpoint returns structured JSON bodies even for 4xx/5xx
+                    // errors. When we bubble early with only the HTTP status the caller sees an opaque
+                    // "unexpected status 400 Bad Request" which makes debugging nearly impossible.
+                    // Instead, read (and include) the response text so higher layers and users see the
+                    // exact error message (e.g. "Unknown parameter: 'input[0].metadata'"). The body is
+                    // small and this branch only runs on error paths so the extra allocation is
+                    // negligible.
+                    if !(status == StatusCode::TOO_MANY_REQUESTS || status.is_server_error()) {
+                        // Surface the error body to callers. Use `unwrap_or_default` per Clippy.
+                        let body = (res.text().await).unwrap_or_default();
+                        return Err(CodexErr::UnexpectedStatus(status, body));
+                    }
+
+                    if attempt > *OPENAI_REQUEST_MAX_RETRIES {
+                        return Err(CodexErr::RetryLimit(status));
+                    }
+
+                    // Pull out Retry‑After header if present.
+                    let retry_after_secs = res
+                        .headers()
+                        .get(reqwest::header::RETRY_AFTER)
+                        .and_then(|v| v.to_str().ok())
+                        .and_then(|s| s.parse::<u64>().ok());
+
+                    let delay = retry_after_secs
+                        .map(Duration::from_secs)
+                        .unwrap_or_else(|| backoff(attempt));
+                    tokio::time::sleep(delay).await;
+                }
+                Err(e) => {
+                    if attempt > *OPENAI_REQUEST_MAX_RETRIES {
+                        return Err(e.into());
+                    }
+                    let delay = backoff(attempt);
+                    tokio::time::sleep(delay).await;
+                }
+            }
+        }
+    }
+}
+
+#[derive(Debug, Deserialize, Serialize)]
+struct SseEvent {
+    #[serde(rename = "type")]
+    kind: String,
+    response: Option<Value>,
+    item: Option<Value>,
+}
+
+#[derive(Debug, Deserialize)]
+struct ResponseCompleted {
+    id: String,
+}
+
+async fn process_sse<S>(stream: S, tx_event: mpsc::Sender<Result<ResponseEvent>>)
+where
+    S: Stream<Item = Result<Bytes>> + Unpin,
+{
+    let mut stream = stream.eventsource();
+
+    // If the stream stays completely silent for an extended period treat it as disconnected.
+    let idle_timeout = *OPENAI_STREAM_IDLE_TIMEOUT_MS;
+    // The response id returned from the `response.completed` message.
+    let mut response_id = None;
+
+    loop {
+        let sse = match timeout(idle_timeout, stream.next()).await {
+            Ok(Some(Ok(sse))) => sse,
+            Ok(Some(Err(e))) => {
+                debug!("SSE Error: {e:#}");
+                let event = CodexErr::Stream(e.to_string());
+                let _ = tx_event.send(Err(event)).await;
+                return;
+            }
+            Ok(None) => {
+                match response_id {
+                    Some(response_id) => {
+                        let event = ResponseEvent::Completed { response_id };
+                        let _ = tx_event.send(Ok(event)).await;
+                    }
+                    None => {
+                        let _ = tx_event
+                            .send(Err(CodexErr::Stream(
+                                "stream closed before response.completed".into(),
+                            )))
+                            .await;
+                    }
+                }
+                return;
+            }
+            Err(_) => {
+                let _ = tx_event
+                    .send(Err(CodexErr::Stream("idle timeout waiting for SSE".into())))
+                    .await;
+                return;
+            }
+        };
+
+        let event: SseEvent = match serde_json::from_str(&sse.data) {
+            Ok(event) => event,
+            Err(e) => {
+                debug!("Failed to parse SSE event: {e}, data: {}", &sse.data);
+                continue;
+            }
+        };
+
+        trace!(?event, "SSE event");
+        match event.kind.as_str() {
+            // Individual output item finalised. Forward immediately so the
+            // rest of the agent can stream assistant text/functions *live*
+            // instead of waiting for the final `response.completed` envelope.
+            //
+            // IMPORTANT: We used to ignore these events and forward the
+            // duplicated `output` array embedded in the `response.completed`
+            // payload.  That produced two concrete issues:
+            //   1. No real‑time streaming – the user only saw output after the
+            //      entire turn had finished, which broke the “typing” UX and
+            //      made long‑running turns look stalled.
+            //   2. Duplicate `function_call_output` items – both the
+            //      individual *and* the completed array were forwarded, which
+            //      confused the backend and triggered 400
+            //      "previous_response_not_found" errors because the duplicated
+            //      IDs did not match the incremental turn chain.
+            //
+            // The fix is to forward the incremental events *as they come* and
+            // drop the duplicated list inside `response.completed`.
+            "response.output_item.done" => {
+                let Some(item_val) = event.item else { continue };
+                let Ok(item) = serde_json::from_value::<ResponseItem>(item_val) else {
+                    debug!("failed to parse ResponseItem from output_item.done");
+                    continue;
+                };
+
+                let event = ResponseEvent::OutputItemDone(item);
+                if tx_event.send(Ok(event)).await.is_err() {
+                    return;
+                }
+            }
+            // Final response completed – includes array of output items & id
+            "response.completed" => {
+                if let Some(resp_val) = event.response {
+                    match serde_json::from_value::<ResponseCompleted>(resp_val) {
+                        Ok(r) => {
+                            response_id = Some(r.id);
+                        }
+                        Err(e) => {
+                            debug!("failed to parse ResponseCompleted: {e}");
+                            continue;
+                        }
+                    };
+                };
+            }
+            other => debug!(other, "sse event"),
+        }
+    }
+}
+
+pub struct ResponseStream {
+    rx_event: mpsc::Receiver<Result<ResponseEvent>>,
+}
+
+impl Stream for ResponseStream {
+    type Item = Result<ResponseEvent>;
+
+    fn poll_next(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Option<Self::Item>> {
+        self.rx_event.poll_recv(cx)
+    }
+}
+
+/// Used in tests to stream from a text SSE file.
+async fn stream_from_fixture(path: impl AsRef<Path>) -> Result<ResponseStream> {
+    let (tx_event, rx_event) = mpsc::channel::<Result<ResponseEvent>>(16);
+    let f = std::fs::File::open(path.as_ref())?;
+    let lines = std::io::BufReader::new(f).lines();
+
+    // insert \n\n after each line for proper SSE parsing
+    let mut content = String::new();
+    for line in lines {
+        content.push_str(&line?);
+        content.push_str("\n\n");
+    }
+
+    let rdr = std::io::Cursor::new(content);
+    let stream = ReaderStream::new(rdr).map_err(CodexErr::Io);
+    tokio::spawn(process_sse(stream, tx_event));
+    Ok(ResponseStream { rx_event })
+}
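Review note on `client.rs`: the retry policy in `ModelClient::stream` can be read as two small decisions: which statuses are retried, and how long to wait. The sketch below isolates them using only the standard library. `is_retryable` and `retry_delay` are hypothetical helper names, and the `backoff` here is an assumed exponential schedule standing in for `crate::util::backoff`, whose body is not part of this diff.

```rust
use std::time::Duration;

/// Assumed stand-in for `crate::util::backoff` (not shown in the diff):
/// 200 ms doubling per attempt, capped at 2^6 steps.
fn backoff(attempt: u64) -> Duration {
    let exp = attempt.saturating_sub(1).min(6) as u32;
    Duration::from_millis(200 * 2u64.pow(exp))
}

/// Only 429 and 5xx responses are retried; other statuses surface the error body.
fn is_retryable(status: u16) -> bool {
    status == 429 || (500..=599).contains(&status)
}

/// Prefer a numeric `Retry-After` value (seconds); otherwise fall back to backoff.
fn retry_delay(retry_after: Option<&str>, attempt: u64) -> Duration {
    retry_after
        .and_then(|s| s.parse::<u64>().ok())
        .map(Duration::from_secs)
        .unwrap_or_else(|| backoff(attempt))
}

fn main() {
    println!("{:?}", retry_delay(Some("3"), 1));
    println!("{:?}", retry_delay(None, 2));
    println!("{}", is_retryable(400));
}
```

As in the diff, a `Retry-After` header given as an HTTP date fails the integer parse and falls through to the backoff schedule.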
--- a/codex-rs/core/src/codex.rs
+++ b/codex-rs/core/src/codex.rs
--- a/codex-rs/core/src/codex_wrapper.rs
+++ b/codex-rs/core/src/codex_wrapper.rs
@@ -0,0 +1,85 @@
+use std::sync::atomic::AtomicU64;
+use std::sync::Arc;
+
+use crate::config::Config;
+use crate::protocol::AskForApproval;
+use crate::protocol::Event;
+use crate::protocol::EventMsg;
+use crate::protocol::Op;
+use crate::protocol::SandboxPolicy;
+use crate::protocol::Submission;
+use crate::util::notify_on_sigint;
+use crate::Codex;
+use tokio::sync::Notify;
+use tracing::debug;
+
+/// Spawn a new [`Codex`] and initialise the session.
+///
+/// Returns the wrapped [`Codex`] **and** the `SessionConfigured` event that
+/// is received as a response to the initial `ConfigureSession` submission so
+/// that callers can surface the information to the UI.
+pub async fn init_codex(
+    approval_policy: AskForApproval,
+    sandbox_policy: SandboxPolicy,
+    model_override: Option<String>,
+) -> anyhow::Result<(CodexWrapper, Event, Arc<Notify>)> {
+    let ctrl_c = notify_on_sigint();
+    let config = Config::load().unwrap_or_default();
+    debug!("loaded config: {config:?}");
+    let codex = CodexWrapper::new(Codex::spawn(ctrl_c.clone())?);
+    let init_id = codex
+        .submit(Op::ConfigureSession {
+            model: model_override.or_else(|| config.model.clone()),
+            instructions: config.instructions,
+            approval_policy,
+            sandbox_policy,
+        })
+        .await?;
+
+    // The first event must be `SessionConfigured`. Validate and forward it to
+    // the caller so that they can display it in the conversation history.
+    let event = codex.next_event().await?;
+    if event.id != init_id
+        || !matches!(
+            &event,
+            Event {
+                id: _id,
+                msg: EventMsg::SessionConfigured { .. },
+            }
+        )
+    {
+        return Err(anyhow::anyhow!(
+            "expected SessionConfigured but got {event:?}"
+        ));
+    }
+
+    Ok((codex, event, ctrl_c))
+}
+
+pub struct CodexWrapper {
+    next_id: AtomicU64,
+    codex: Codex,
+}
+
+impl CodexWrapper {
+    fn new(codex: Codex) -> Self {
+        Self {
+            next_id: AtomicU64::new(0),
+            codex,
+        }
+    }
+
+    /// Returns the id of the Submission.
+    pub async fn submit(&self, op: Op) -> crate::error::Result<String> {
+        let id = self
+            .next_id
+            .fetch_add(1, std::sync::atomic::Ordering::SeqCst)
+            .to_string();
+        self.codex.submit(Submission { id: id.clone(), op }).await?;
+        Ok(id)
+    }
+
+    pub async fn next_event(&self) -> crate::error::Result<Event> {
+        self.codex.next_event().await
+    }
+}
--- a/codex-rs/core/src/config.rs
+++ b/codex-rs/core/src/config.rs
@@ -0,0 +1,42 @@
+use dirs::home_dir;
+use serde::Deserialize;
+
+/// Embedded fallback instructions that mirror the TypeScript CLI’s default system prompt. These
+/// are compiled into the binary so a clean install behaves correctly even if the user has not
+/// created `~/.codex/instructions.md`.
+const EMBEDDED_INSTRUCTIONS: &str = include_str!("../prompt.md");
+
+#[derive(Default, Deserialize, Debug, Clone)]
+pub struct Config {
+    pub model: Option<String>,
+    pub instructions: Option<String>,
+}
+
+impl Config {
+    /// Load ~/.codex/config.toml and ~/.codex/instructions.md (if present).
+    /// Currently always returns `Some`: missing files fall back to defaults.
+    pub fn load() -> Option<Self> {
+        let mut cfg: Config = Self::load_from_toml().unwrap_or_default();
+
+        // Highest precedence → user‑provided ~/.codex/instructions.md (if present)
+        // Fallback           → embedded default instructions baked into the binary
+
+        cfg.instructions =
+            Self::load_instructions().or_else(|| Some(EMBEDDED_INSTRUCTIONS.to_string()));
+
+        Some(cfg)
+    }
+
+    fn load_from_toml() -> Option<Self> {
+        let mut p = home_dir()?;
+        p.push(".codex/config.toml");
+        let contents = std::fs::read_to_string(&p).ok()?;
+        toml::from_str(&contents).ok()
+    }
+
+    fn load_instructions() -> Option<String> {
+        let mut p = home_dir()?;
+        p.push(".codex/instructions.md");
+        std::fs::read_to_string(&p).ok()
+    }
+}
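Review note on `config.rs`: the instruction precedence (user file first, embedded prompt as fallback) can be sketched in isolation. The block below takes the config directory as a parameter so the fallback is easy to exercise; `resolve_instructions` is a hypothetical helper, and the constant is a placeholder for the `include_str!("../prompt.md")` value in the diff.

```rust
use std::fs;
use std::path::Path;

/// Placeholder for the embedded prompt (the diff uses `include_str!("../prompt.md")`).
const EMBEDDED_INSTRUCTIONS: &str = "embedded default instructions";

/// Highest precedence: `<codex_dir>/instructions.md`; fallback: the embedded text.
fn resolve_instructions(codex_dir: &Path) -> String {
    fs::read_to_string(codex_dir.join("instructions.md"))
        .unwrap_or_else(|_| EMBEDDED_INSTRUCTIONS.to_string())
}

fn main() {
    // A directory that does not exist yields the embedded fallback.
    let missing = std::env::temp_dir().join("codex-sketch-missing-dir");
    println!("{}", resolve_instructions(&missing));
}
```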
--- a/codex-rs/core/src/error.rs
+++ b/codex-rs/core/src/error.rs
@@ -0,0 +1,103 @@
+use reqwest::StatusCode;
+use serde_json;
+use std::io;
+use thiserror::Error;
+use tokio::task::JoinError;
+
+pub type Result<T> = std::result::Result<T, CodexErr>;
+
+#[derive(Error, Debug)]
+pub enum SandboxErr {
+    /// Error from sandbox execution
+    #[error("sandbox denied exec error, exit code: {0}, stdout: {1}, stderr: {2}")]
+    Denied(i32, String, String),
+
+    /// Error from linux seccomp filter setup
+    #[cfg(target_os = "linux")]
+    #[error("seccomp setup error")]
+    SeccompInstall(#[from] seccompiler::Error),
+
+    /// Error from linux seccomp backend
+    #[cfg(target_os = "linux")]
+    #[error("seccomp backend error")]
+    SeccompBackend(#[from] seccompiler::BackendError),
+
+    /// Error from linux landlock
+    #[error("Landlock was not able to fully enforce all sandbox rules")]
+    LandlockRestrict,
+}
+
+#[derive(Error, Debug)]
+pub enum CodexErr {
+    /// Returned by `ModelClient` when the SSE stream disconnects or errors out **after** the HTTP
+    /// handshake has succeeded but **before** it finished emitting `response.completed`.
+    ///
+    /// The Session loop treats this as a transient error and will automatically retry the turn.
+    #[error("stream disconnected before completion: {0}")]
+    Stream(String),
+
+    /// Returned by run_command_stream when the spawned child process timed out (10s).
+    #[error("timeout waiting for child process to exit")]
+    Timeout,
+
+    /// Returned by run_command_stream when the child could not be spawned (its stdout/stderr pipes
+    /// could not be captured). Analogous to the previous `CodexError::Spawn` variant.
+    #[error("spawn failed: child stdout/stderr not captured")]
+    Spawn,
+
+    /// Returned by run_command_stream when the user pressed Ctrl‑C (SIGINT). Session uses this to
+    /// surface a polite FunctionCallOutput back to the model instead of crashing the CLI.
+    #[error("interrupted (Ctrl‑C)")]
+    Interrupted,
+
+    /// Unexpected HTTP status code.
+    #[error("unexpected status {0}: {1}")]
+    UnexpectedStatus(StatusCode, String),
+
+    /// Retry limit exceeded.
+    #[error("exceeded retry limit, last status: {0}")]
+    RetryLimit(StatusCode),
+
+    /// Agent loop died unexpectedly
+    #[error("internal error; agent loop died unexpectedly")]
+    InternalAgentDied,
+
+    /// Sandbox error
+    #[error("sandbox error: {0}")]
+    Sandbox(#[from] SandboxErr),
+
+    // -----------------------------------------------------------------
+    // Automatic conversions for common external error types
+    // -----------------------------------------------------------------
+    #[error(transparent)]
+    Io(#[from] io::Error),
+
+    #[error(transparent)]
+    Reqwest(#[from] reqwest::Error),
+
+    #[error(transparent)]
+    Json(#[from] serde_json::Error),
+
+    #[cfg(target_os = "linux")]
+    #[error(transparent)]
+    LandlockRuleset(#[from] landlock::RulesetError),
+
+    #[cfg(target_os = "linux")]
+    #[error(transparent)]
+    LandlockPathFd(#[from] landlock::PathFdError),
+
+    #[error(transparent)]
+    TokioJoin(#[from] JoinError),
+
+    #[error("missing environment variable {0}")]
+    EnvVar(&'static str),
+}
+
+impl CodexErr {
+    /// Minimal shim so that existing `e.downcast_ref::<CodexErr>()` checks continue to compile
+    /// after replacing `anyhow::Error` in the return signature. This mirrors the behavior of
+    /// `anyhow::Error::downcast_ref` but works directly on our concrete enum.
+    pub fn downcast_ref<T: std::any::Any>(&self) -> Option<&T> {
+        (self as &dyn std::any::Any).downcast_ref::<T>()
+    }
+}
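The `#[from]` attributes on `CodexErr` are what let `?` convert external error types automatically. As a rough illustration, the sketch below hand-writes the kind of `From` impl that `thiserror` derives, using only the standard library. `SketchErr` and `file_len` are hypothetical names introduced for this example; they are not part of `codex-core`.

```rust
use std::fmt;
use std::fs;
use std::io;

/// Hypothetical, trimmed-down stand-in for `CodexErr` (illustration only).
#[derive(Debug)]
enum SketchErr {
    Io(io::Error),
    EnvVar(&'static str),
}

impl fmt::Display for SketchErr {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            // Comparable to `#[error(transparent)]`: defer to the inner error's message.
            SketchErr::Io(e) => write!(f, "{e}"),
            SketchErr::EnvVar(name) => write!(f, "missing environment variable {name}"),
        }
    }
}

impl std::error::Error for SketchErr {}

// Roughly what `#[from]` provides: a `From` impl that the `?` operator can use.
impl From<io::Error> for SketchErr {
    fn from(e: io::Error) -> Self {
        SketchErr::Io(e)
    }
}

/// Returns the size of a file; any `io::Error` becomes `SketchErr::Io` via `?`.
fn file_len(path: &str) -> Result<usize, SketchErr> {
    let bytes = fs::read(path)?;
    Ok(bytes.len())
}
```

Calling `file_len` on a path that does not exist yields `Err(SketchErr::Io(_))` without any explicit `map_err` at the call site, which is the ergonomics the enum above appears to be aiming for.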
--- a/codex-rs/core/src/exec.rs
+++ b/codex-rs/core/src/exec.rs
@@ -0,0 +1,277 @@
+use std::io;
+use std::path::PathBuf;
+use std::process::ExitStatus;
+use std::process::Stdio;
+use std::sync::Arc;
+use std::time::Duration;
+use std::time::Instant;
+
+use serde::Deserialize;
+use tokio::io::AsyncReadExt;
+use tokio::io::BufReader;
+use tokio::process::Command;
+use tokio::sync::Notify;
+
+use crate::error::CodexErr;
+use crate::error::Result;
+use crate::error::SandboxErr;
+
+/// Maximum we keep for each stream (100 KiB).
+const MAX_STREAM_OUTPUT: usize = 100 * 1024;
+
+const DEFAULT_TIMEOUT_MS: u64 = 10_000;
+
+/// Hardcode this since it does not seem worth including the libc crate just
+/// for this.
+const SIGKILL_CODE: i32 = 9;
+
+const MACOS_SEATBELT_READONLY_POLICY: &str = include_str!("seatbelt_readonly_policy.sbpl");
+
+#[derive(Deserialize, Debug, Clone)]
+pub struct ExecParams {
+    pub command: Vec<String>,
+    pub workdir: Option<String>,
+
+    /// This is the maximum time in milliseconds that the command is allowed to run.
+    #[serde(rename = "timeout")]
+    // The wire format uses `timeout`, which has ambiguous units, so we use
+    // `timeout_ms` as the field name so it is clear in code.
+    pub timeout_ms: Option<u64>,
+}
+
+#[derive(Clone, Copy, Debug, PartialEq)]
+pub enum SandboxType {
+    None,
+
+    /// Only available on macOS.
+    MacosSeatbelt,
+
+    /// Only available on Linux.
+    LinuxSeccomp,
+}
+
+#[cfg(target_os = "linux")]
+async fn exec_linux(
+    params: ExecParams,
+    writable_roots: &[PathBuf],
+    ctrl_c: Arc<Notify>,
+) -> Result<RawExecToolCallOutput> {
+    crate::linux::exec_linux(params, writable_roots, ctrl_c).await
+}
+
+#[cfg(not(target_os = "linux"))]
+async fn exec_linux(
+    _params: ExecParams,
+    _writable_roots: &[PathBuf],
+    _ctrl_c: Arc<Notify>,
+) -> Result<RawExecToolCallOutput> {
+    Err(CodexErr::Io(io::Error::new(
+        io::ErrorKind::InvalidInput,
+        "linux sandbox is not supported on this platform",
+    )))
+}
+
+pub async fn process_exec_tool_call(
+    params: ExecParams,
+    sandbox_type: SandboxType,
+    writable_roots: &[PathBuf],
+    ctrl_c: Arc<Notify>,
+) -> Result<ExecToolCallOutput> {
+    let start = Instant::now();
+
+    let raw_output_result = match sandbox_type {
+        SandboxType::None => exec(params, ctrl_c).await,
+        SandboxType::MacosSeatbelt => {
+            let ExecParams {
+                command,
+                workdir,
+                timeout_ms,
+            } = params;
+            let seatbelt_command = create_seatbelt_command(command, writable_roots);
+            exec(
+                ExecParams {
+                    command: seatbelt_command,
+                    workdir,
+                    timeout_ms,
+                },
+                ctrl_c,
+            )
+            .await
+        }
+        SandboxType::LinuxSeccomp => exec_linux(params, writable_roots, ctrl_c).await,
+    };
+    let duration = start.elapsed();
+    match raw_output_result {
+        Ok(raw_output) => {
+            let exit_code = raw_output.exit_status.code().unwrap_or(-1);
+            let stdout = String::from_utf8_lossy(&raw_output.stdout).to_string();
+            let stderr = String::from_utf8_lossy(&raw_output.stderr).to_string();
+
+            // NOTE(ragona): This is much less restrictive than the previous check. If we exec
+            // a command, and it returns anything other than success, we assume that it may have
+            // been a sandboxing error and allow the user to retry. (The user of course may choose
+            // not to retry, or in a non-interactive mode, would automatically reject the approval.)
+            if exit_code != 0 && sandbox_type != SandboxType::None {
+                return Err(CodexErr::Sandbox(SandboxErr::Denied(
+                    exit_code, stdout, stderr,
+                )));
+            }
+
+            Ok(ExecToolCallOutput {
+                exit_code,
+                stdout,
+                stderr,
+                duration,
+            })
+        }
+        Err(err) => {
+            tracing::error!("exec error: {err}");
+            Err(err)
+        }
+    }
+}
+
+pub fn create_seatbelt_command(command: Vec<String>, writable_roots: &[PathBuf]) -> Vec<String> {
+    let (policies, cli_args): (Vec<String>, Vec<String>) = writable_roots
+        .iter()
+        .enumerate()
+        .map(|(index, root)| {
+            let param_name = format!("WRITABLE_ROOT_{index}");
+            let policy: String = format!("(subpath (param \"{param_name}\"))");
+            let cli_arg = format!("-D{param_name}={}", root.to_string_lossy());
+            (policy, cli_arg)
+        })
+        .unzip();
+
+    let full_policy = if policies.is_empty() {
+        MACOS_SEATBELT_READONLY_POLICY.to_string()
+    } else {
+        let scoped_write_policy = format!("(allow file-write*\n{}\n)", policies.join(" "));
+        format!("{MACOS_SEATBELT_READONLY_POLICY}\n{scoped_write_policy}")
+    };
+
+    let mut seatbelt_command: Vec<String> = vec![
+        "sandbox-exec".to_string(),
+        "-p".to_string(),
+        full_policy,
+    ];
+    seatbelt_command.extend(cli_args);
+    seatbelt_command.push("--".to_string());
+    seatbelt_command.extend(command);
+    seatbelt_command
+}
+
+#[derive(Debug)]
+pub struct RawExecToolCallOutput {
+    pub exit_status: ExitStatus,
+    pub stdout: Vec<u8>,
+    pub stderr: Vec<u8>,
+}
+
+#[derive(Debug)]
+pub struct ExecToolCallOutput {
+    pub exit_code: i32,
+    pub stdout: String,
+    pub stderr: String,
+    pub duration: Duration,
+}
+
+pub async fn exec(
+    ExecParams {
+        command,
+        workdir,
+        timeout_ms,
+    }: ExecParams,
+    ctrl_c: Arc<Notify>,
+) -> Result<RawExecToolCallOutput> {
+    let mut child = {
+        if command.is_empty() {
+            return Err(CodexErr::Io(io::Error::new(
+                io::ErrorKind::InvalidInput,
+                "command args are empty",
+            )));
+        }
+
+        let mut cmd = Command::new(&command[0]);
+        if command.len() > 1 {
+            cmd.args(&command[1..]);
+        }
+        if let Some(dir) = &workdir {
+            cmd.current_dir(dir);
+        }
+        cmd.stdout(Stdio::piped()).stderr(Stdio::piped());
+        cmd.kill_on_drop(true);
+        cmd.spawn()?
+    };
+
+    let stdout_handle = tokio::spawn(read_capped(
+        BufReader::new(child.stdout.take().expect("stdout is not piped")),
+        MAX_STREAM_OUTPUT,
+    ));
+    let stderr_handle = tokio::spawn(read_capped(
+        BufReader::new(child.stderr.take().expect("stderr is not piped")),
+        MAX_STREAM_OUTPUT,
+    ));
+
+    let interrupted = ctrl_c.notified();
+    let timeout = Duration::from_millis(timeout_ms.unwrap_or(DEFAULT_TIMEOUT_MS));
+    let exit_status = tokio::select! {
+        result = tokio::time::timeout(timeout, child.wait()) => {
+            match result {
+                Ok(Ok(exit_status)) => exit_status,
+                Ok(e) => e?,
+                Err(_) => {
+                    // timeout
+                    child.start_kill()?;
+                    // Debatable whether `child.wait().await` should be called here.
+                    synthetic_exit_status(128 + SIGKILL_CODE)
+                }
+            }
+        }
+        _ = interrupted => {
+            child.start_kill()?;
+            synthetic_exit_status(128 + SIGKILL_CODE)
+        }
+    };
+
+    let stdout = stdout_handle.await??;
+    let stderr = stderr_handle.await??;
+
+    Ok(RawExecToolCallOutput {
+        exit_status,
+        stdout,
+        stderr,
+    })
+}
+
+async fn read_capped<R: AsyncReadExt + Unpin>(
+    mut reader: R,
+    max_output: usize,
+) -> io::Result<Vec<u8>> {
+    let mut buf = Vec::with_capacity(max_output.min(8 * 1024));
+    let mut tmp = [0u8; 8192];
+
+    loop {
+        let n = reader.read(&mut tmp).await?;
+        if n == 0 {
+            break;
+        }
+        if buf.len() < max_output {
+            let remaining = max_output - buf.len();
+            buf.extend_from_slice(&tmp[..remaining.min(n)]);
+        }
+    }
+    Ok(buf)
+}
+
+#[cfg(unix)]
+fn synthetic_exit_status(code: i32) -> ExitStatus {
+    use std::os::unix::process::ExitStatusExt;
+    std::process::ExitStatus::from_raw(code)
+}
+
+#[cfg(windows)]
+fn synthetic_exit_status(code: i32) -> ExitStatus {
+    use std::os::windows::process::ExitStatusExt;
+    std::process::ExitStatus::from_raw(code as u32)
+}
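On Unix, `ExitStatusExt::from_raw` takes a raw wait status rather than an exit code, which affects how the value built by `synthetic_exit_status(128 + SIGKILL_CODE)` is later read through `exit_status.code()`. The following is a minimal, Unix-only sketch of that decoding, assuming the usual Linux wait-status layout; `describe` is a hypothetical helper and is not part of the diff.

```rust
use std::os::unix::process::ExitStatusExt;
use std::process::ExitStatus;

/// Hypothetical helper: returns `(exit code, terminating signal)` for a raw
/// wait status, as interpreted by the standard library on Unix.
fn describe(raw: i32) -> (Option<i32>, Option<i32>) {
    let status = ExitStatus::from_raw(raw);
    (status.code(), status.signal())
}
```

Under this reading, a raw value of 137 decodes as "terminated by signal 9" with no exit code, so a later `code().unwrap_or(-1)` would produce -1. A normal exit with code 137 would instead be encoded as `137 << 8`.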
--- a/codex-rs/core/src/flags.rs
+++ b/codex-rs/core/src/flags.rs
@@ -0,0 +1,30 @@
+use std::time::Duration;
+
+use env_flags::env_flags;
+
+use crate::error::CodexErr;
+use crate::error::Result;
+
+env_flags! {
+    pub OPENAI_DEFAULT_MODEL: &str = "o3";
+    pub OPENAI_API_BASE: &str = "https://api.openai.com";
+    pub OPENAI_API_KEY: Option<&str> = None;
+    pub OPENAI_TIMEOUT_MS: Duration = Duration::from_millis(30_000), |value| {
+        value.parse().map(Duration::from_millis)
+    };
+    pub OPENAI_REQUEST_MAX_RETRIES: u64 = 4;
+    pub OPENAI_STREAM_MAX_RETRIES: u64 = 10;
+
+    /// Maximum idle time (no SSE events received) before the stream is treated as
+    /// disconnected and retried by the agent. The default of 75 s is slightly
+    /// above OpenAI’s documented 60 s load‑balancer timeout.
+    pub OPENAI_STREAM_IDLE_TIMEOUT_MS: Duration = Duration::from_millis(75_000), |value| {
+        value.parse().map(Duration::from_millis)
+    };
+
+    pub CODEX_RS_SSE_FIXTURE: Option<&str> = None;
+}
+
+pub fn get_api_key() -> Result<&'static str> {
+    OPENAI_API_KEY.ok_or_else(|| CodexErr::EnvVar("OPENAI_API_KEY"))
+}
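The two `Duration` flags above share the same parsing closure: read the value as an integer number of milliseconds and wrap it in a `Duration`. A standalone sketch of that pattern follows, using only the standard library. `parse_ms` and `timeout_or_default` are hypothetical names, and how `env_flags` itself reports a malformed value is not shown in this diff, so the sketch simply returns the parse error.

```rust
use std::num::ParseIntError;
use std::time::Duration;

/// Hypothetical helper mirroring the closure `|value| value.parse().map(Duration::from_millis)`.
fn parse_ms(value: &str) -> Result<Duration, ParseIntError> {
    value.parse::<u64>().map(Duration::from_millis)
}

/// Hypothetical helper: use the raw value when present, otherwise fall back to `default`.
fn timeout_or_default(raw: Option<&str>, default: Duration) -> Result<Duration, ParseIntError> {
    match raw {
        Some(value) => parse_ms(value),
        None => Ok(default),
    }
}
```

For example, `timeout_or_default(Some("1500"), Duration::from_millis(30_000))` gives a 1.5-second duration, while passing `None` keeps the 30-second default.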
--- a/codex-rs/core/src/is_safe_command.rs
+++ b/codex-rs/core/src/is_safe_command.rs
@@ -0,0 +1,332 @@
+use tree_sitter::Parser;
+use tree_sitter::Tree;
+use tree_sitter_bash::LANGUAGE as BASH;
+
+pub fn is_known_safe_command(command: &[String]) -> bool {
+    if is_safe_to_call_with_exec(command) {
+        return true;
+    }
+
+    // TODO(mbolin): Also support safe commands that are piped together such
+    // as `cat foo | wc -l`.
+    matches!(
+        command,
+        [bash, flag, script]
+            if bash == "bash"
+            && flag == "-lc"
+            && try_parse_bash(script).and_then(|tree|
+                try_parse_single_word_only_command(&tree, script)).is_some_and(|parsed_bash_command| is_safe_to_call_with_exec(&parsed_bash_command))
+    )
+}
+
+fn is_safe_to_call_with_exec(command: &[String]) -> bool {
+    let cmd0 = command.first().map(String::as_str);
+
+    match cmd0 {
+        Some(
+            "cat" | "cd" | "echo" | "grep" | "head" | "ls" | "pwd" | "rg" | "tail" | "wc" | "which",
+        ) => true,
+
+        Some("find") => {
+            // Certain options to `find` can delete files, write to files, or
+            // execute arbitrary commands, so we cannot auto-approve the
+            // invocation of `find` in such cases.
+            #[rustfmt::skip]
+            const UNSAFE_FIND_OPTIONS: &[&str] = &[
+                // Options that can execute arbitrary commands.
+                "-exec", "-execdir", "-ok", "-okdir",
+                // Option that deletes matching files.
+                "-delete",
+                // Options that write pathnames to a file.
+                "-fls", "-fprint", "-fprint0", "-fprintf",
+            ];
+
+            !command
+                .iter()
+                .any(|arg| UNSAFE_FIND_OPTIONS.contains(&arg.as_str()))
+        }
+
+        // Git
+        Some("git") => matches!(
+            command.get(1).map(String::as_str),
+            Some("branch" | "status" | "log" | "diff" | "show")
+        ),
+
+        // Rust
+        Some("cargo") if command.get(1).map(String::as_str) == Some("check") => true,
+
+        // Special-case `sed -n {N|M,N}p FILE`
+        Some("sed")
+            if {
+                command.len() == 4
+                    && command.get(1).map(String::as_str) == Some("-n")
+                    && is_valid_sed_n_arg(command.get(2).map(String::as_str))
+                    && command.get(3).map(String::is_empty) == Some(false)
+            } =>
+        {
+            true
+        }
+
+        // ── anything else ─────────────────────────────────────────────────
+        _ => false,
+    }
+}
+
+fn try_parse_bash(bash_lc_arg: &str) -> Option<Tree> {
+    let lang = BASH.into();
+    let mut parser = Parser::new();
+    parser.set_language(&lang).expect("load bash grammar");
+
+    let old_tree: Option<&Tree> = None;
+    parser.parse(bash_lc_arg, old_tree)
+}
+
+/// If `tree` represents a single Bash command whose name and every argument is
+/// an ordinary `word`, return those words in order; otherwise, return `None`.
+///
+/// `src` must be the exact source string that was parsed into `tree`, so we can
+/// extract the text for every node.
+pub fn try_parse_single_word_only_command(tree: &Tree, src: &str) -> Option<Vec<String>> {
+    // Any parse error is an immediate rejection.
+    if tree.root_node().has_error() {
+        return None;
+    }
+
+    // (program …) with exactly one statement
+    let root = tree.root_node();
+    if root.kind() != "program" || root.named_child_count() != 1 {
+        return None;
+    }
+
+    let cmd = root.named_child(0)?; // (command …)
+    if cmd.kind() != "command" {
+        return None;
+    }
+
+    let mut words = Vec::new();
+    let mut cursor = cmd.walk();
+
+    for child in cmd.named_children(&mut cursor) {
+        match child.kind() {
+            // The command name node wraps one `word` child.
+            "command_name" => {
+                let word_node = child.named_child(0)?; // make sure it's only a word
+                if word_node.kind() != "word" {
+                    return None;
+                }
+                words.push(word_node.utf8_text(src.as_bytes()).ok()?.to_owned());
+            }
+            // Positional‑argument word (allowed).
+            "word" | "number" => {
+                words.push(child.utf8_text(src.as_bytes()).ok()?.to_owned());
+            }
+            "string" => {
+                if child.child_count() == 3
+                    && child.child(0)?.kind() == "\""
+                    && child.child(1)?.kind() == "string_content"
+                    && child.child(2)?.kind() == "\""
+                {
+                    words.push(child.child(1)?.utf8_text(src.as_bytes()).ok()?.to_owned());
+                } else {
+                    // Anything else means the command is *not* plain words.
+                    return None;
+                }
+            }
+            "concatenation" => {
+                // TODO: Consider things like `'ab\'a'`.
+                return None;
+            }
+            "raw_string" => {
+                // Raw string is a single word, but we need to strip the quotes.
+                let raw_string = child.utf8_text(src.as_bytes()).ok()?;
+                let stripped = raw_string
+                    .strip_prefix('\'')
+                    .and_then(|s| s.strip_suffix('\''));
+                if let Some(stripped) = stripped {
+                    words.push(stripped.to_owned());
+                } else {
+                    return None;
+                }
+            }
+            // Anything else means the command is *not* plain words.
+            _ => return None,
+        }
+    }
+
+    Some(words)
+}
+
+/* ----------------------------------------------------------
+Helpers for validating `sed -n` arguments
+---------------------------------------------------------- */
+
+/// Returns true if `arg` matches /^(\d+,)?\d+p$/
+fn is_valid_sed_n_arg(arg: Option<&str>) -> bool {
+    // unwrap or bail
+    let s = match arg {
+        Some(s) => s,
+        None => return false,
+    };
+
+    // must end with 'p', strip it
+    let core = match s.strip_suffix('p') {
+        Some(rest) => rest,
+        None => return false,
+    };
+
+    // split on ',' and ensure 1 or 2 numeric parts
+    let parts: Vec<&str> = core.split(',').collect();
+    match parts.as_slice() {
+        // single number, e.g. "10"
+        [num] => !num.is_empty() && num.chars().all(|c| c.is_ascii_digit()),
+
+        // two numbers, e.g. "1,5"
+        [a, b] => {
+            !a.is_empty()
+                && !b.is_empty()
+                && a.chars().all(|c| c.is_ascii_digit())
+                && b.chars().all(|c| c.is_ascii_digit())
+        }
+
+        // anything else (more than one comma) is invalid
+        _ => false,
+    }
+}
+#[cfg(test)]
+mod tests {
+    use super::*;
+
+    fn vec_str(args: &[&str]) -> Vec<String> {
+        args.iter().map(|s| s.to_string()).collect()
+    }
+
+    #[test]
+    fn known_safe_examples() {
+        assert!(is_safe_to_call_with_exec(&vec_str(&["ls"])));
+        assert!(is_safe_to_call_with_exec(&vec_str(&["git", "status"])));
+        assert!(is_safe_to_call_with_exec(&vec_str(&[
+            "sed", "-n", "1,5p", "file.txt"
+        ])));
+
+        // Safe `find` command (no unsafe options).
+        assert!(is_safe_to_call_with_exec(&vec_str(&[
+            "find", ".", "-name", "file.txt"
+        ])));
+    }
+
+    #[test]
+    fn unknown_or_partial() {
+        assert!(!is_safe_to_call_with_exec(&vec_str(&["foo"])));
+        assert!(!is_safe_to_call_with_exec(&vec_str(&["git", "fetch"])));
+        assert!(!is_safe_to_call_with_exec(&vec_str(&[
+            "sed", "-n", "xp", "file.txt"
+        ])));
+
+        // Unsafe `find` commands.
+        for args in [
+            vec_str(&["find", ".", "-name", "file.txt", "-exec", "rm", "{}", ";"]),
+            vec_str(&[
+                "find", ".", "-name", "*.py", "-execdir", "python3", "{}", ";",
+            ]),
+            vec_str(&["find", ".", "-name", "file.txt", "-ok", "rm", "{}", ";"]),
+            vec_str(&["find", ".", "-name", "*.py", "-okdir", "python3", "{}", ";"]),
+            vec_str(&["find", ".", "-delete", "-name", "file.txt"]),
+            vec_str(&["find", ".", "-fls", "/etc/passwd"]),
+            vec_str(&["find", ".", "-fprint", "/etc/passwd"]),
+            vec_str(&["find", ".", "-fprint0", "/etc/passwd"]),
+            vec_str(&["find", ".", "-fprintf", "/root/suid.txt", "%#m %u %p\n"]),
+        ] {
+            assert!(
+                !is_safe_to_call_with_exec(&args),
+                "expected {:?} to be unsafe",
+                args
+            );
+        }
+    }
+
+    #[test]
+    fn bash_lc_safe_examples() {
+        assert!(is_known_safe_command(&vec_str(&["bash", "-lc", "ls"])));
+        assert!(is_known_safe_command(&vec_str(&["bash", "-lc", "ls -1"])));
+        assert!(is_known_safe_command(&vec_str(&[
+            "bash",
+            "-lc",
+            "git status"
+        ])));
+        assert!(is_known_safe_command(&vec_str(&[
+            "bash",
+            "-lc",
+            "grep -R \"Cargo.toml\" -n"
+        ])));
+        assert!(is_known_safe_command(&vec_str(&[
+            "bash",
+            "-lc",
+            "sed -n 1,5p file.txt"
+        ])));
+        assert!(is_known_safe_command(&vec_str(&[
+            "bash",
+            "-lc",
+            "sed -n '1,5p' file.txt"
+        ])));
+
+        assert!(is_known_safe_command(&vec_str(&[
+            "bash",
+            "-lc",
+            "find . -name file.txt"
+        ])));
+    }
+
+    #[test]
+    fn bash_lc_unsafe_examples() {
+        assert!(
+            !is_known_safe_command(&vec_str(&["bash", "-lc", "git", "status"])),
+            "Four arg version is not known to be safe."
+        );
+        assert!(
+            !is_known_safe_command(&vec_str(&["bash", "-lc", "'git status'"])),
+            "The extra quoting around 'git status' makes it a program named 'git status' and is therefore unsafe."
+        );
+
+        assert!(
+            !is_known_safe_command(&vec_str(&["bash", "-lc", "find . -name file.txt -delete"])),
+            "Unsafe find option should not be auto‑approved."
+        );
+    }
+
+    #[test]
+    fn test_try_parse_single_word_only_command() {
+        let script_with_single_quoted_string = "sed -n '1,5p' file.txt";
+        let parsed_words = try_parse_bash(script_with_single_quoted_string)
+            .and_then(|tree| {
+                try_parse_single_word_only_command(&tree, script_with_single_quoted_string)
+            })
+            .unwrap();
+        assert_eq!(
+            vec![
+                "sed".to_string(),
+                "-n".to_string(),
+                // Ensure the single quotes are properly removed.
+                "1,5p".to_string(),
+                "file.txt".to_string()
+            ],
+            parsed_words,
+        );
+
+        let script_with_number_arg = "ls -1";
+        let parsed_words = try_parse_bash(script_with_number_arg)
+            .and_then(|tree| try_parse_single_word_only_command(&tree, script_with_number_arg))
+            .unwrap();
+        assert_eq!(vec!["ls", "-1"], parsed_words,);
+
+        let script_with_double_quoted_string_with_no_funny_stuff_arg = "grep -R \"Cargo.toml\" -n";
+        let parsed_words = try_parse_bash(script_with_double_quoted_string_with_no_funny_stuff_arg)
+            .and_then(|tree| {
+                try_parse_single_word_only_command(
+                    &tree,
+                    script_with_double_quoted_string_with_no_funny_stuff_arg,
+                )
+            })
+            .unwrap();
+        assert_eq!(vec!["grep", "-R", "Cargo.toml", "-n"], parsed_words);
+    }
+}
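The pattern documented on `is_valid_sed_n_arg`, `/^(\d+,)?\d+p$/`, can also be checked with `split_once` instead of collecting the parts into a `Vec`. The sketch below is an alternative formulation for illustration only, not the implementation in this diff; `is_digits` and `matches_sed_print_range` are hypothetical names.

```rust
/// True if `s` is a non-empty run of ASCII digits.
fn is_digits(s: &str) -> bool {
    !s.is_empty() && s.bytes().all(|b| b.is_ascii_digit())
}

/// Hypothetical equivalent of matching `^(\d+,)?\d+p$` without a regex engine.
fn matches_sed_print_range(arg: &str) -> bool {
    // The argument must end with the `p` (print) command.
    let core = match arg.strip_suffix('p') {
        Some(rest) => rest,
        None => return false,
    };
    match core.split_once(',') {
        // `M,N`: both sides must be plain numbers; a second comma ends up in
        // `end` and fails the digit check.
        Some((start, end)) => is_digits(start) && is_digits(end),
        // `N`: a single line number.
        None => is_digits(core),
    }
}
```

Under this sketch, `10p` and `1,5p` are accepted, while `xp`, `p`, `,5p`, `1,p`, and `1,2,3p` are rejected.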
--- a/codex-rs/core/src/lib.rs
+++ b/codex-rs/core/src/lib.rs
@@ -0,0 +1,30 @@
+//! Root of the `codex-core` library.
+
+// Prevent accidental direct writes to stdout/stderr in library code. All
+// user‑visible output must go through the appropriate abstraction (e.g.,
+// the TUI or the tracing stack).
+#![deny(clippy::print_stdout, clippy::print_stderr)]
+
+mod client;
+pub mod codex;
+pub mod codex_wrapper;
+pub mod config;
+pub mod error;
+pub mod exec;
+mod flags;
+mod is_safe_command;
+#[cfg(target_os = "linux")]
+mod linux;
+mod models;
+pub mod protocol;
+mod safety;
+pub mod util;
+
+pub use codex::Codex;
+
+#[cfg(feature = "cli")]
+mod approval_mode_cli_arg;
+#[cfg(feature = "cli")]
+pub use approval_mode_cli_arg::ApprovalModeCliArg;
+#[cfg(feature = "cli")]
+pub use approval_mode_cli_arg::SandboxModeCliArg;
--- a/codex-rs/core/src/linux.rs
+++ b/codex-rs/core/src/linux.rs
@@ -0,0 +1,320 @@
+use std::collections::BTreeMap;
+use std::io;
+use std::path::PathBuf;
+use std::sync::Arc;
+
+use crate::error::CodexErr;
+use crate::error::Result;
+use crate::error::SandboxErr;
+use crate::exec::exec;
+use crate::exec::ExecParams;
+use crate::exec::RawExecToolCallOutput;
+
+use landlock::Access;
+use landlock::AccessFs;
+use landlock::CompatLevel;
+use landlock::Compatible;
+use landlock::Ruleset;
+use landlock::RulesetAttr;
+use landlock::RulesetCreatedAttr;
+use landlock::ABI;
+use seccompiler::apply_filter;
+use seccompiler::BpfProgram;
+use seccompiler::SeccompAction;
+use seccompiler::SeccompCmpArgLen;
+use seccompiler::SeccompCmpOp;
+use seccompiler::SeccompCondition;
+use seccompiler::SeccompFilter;
+use seccompiler::SeccompRule;
+use seccompiler::TargetArch;
+use tokio::sync::Notify;
+
+pub async fn exec_linux(
+    params: ExecParams,
+    writable_roots: &[PathBuf],
+    ctrl_c: Arc<Notify>,
+) -> Result<RawExecToolCallOutput> {
+    // Allow READ on /
+    // Allow WRITE on /dev/null
+    let ctrl_c_copy = ctrl_c.clone();
+    let writable_roots_copy = writable_roots.to_vec();
+
+    // Isolate thread to run the sandbox from
+    let tool_call_output = std::thread::spawn(move || {
+        let rt = tokio::runtime::Builder::new_current_thread()
+            .enable_all()
+            .build()
+            .expect("Failed to create runtime");
+
+        rt.block_on(async {
+            let abi = ABI::V5;
+            let access_rw = AccessFs::from_all(abi);
+            let access_ro = AccessFs::from_read(abi);
+
+            let mut ruleset = Ruleset::default()
+                .set_compatibility(CompatLevel::BestEffort)
+                .handle_access(access_rw)?
+                .create()?
+                .add_rules(landlock::path_beneath_rules(&["/"], access_ro))?
+                .add_rules(landlock::path_beneath_rules(&["/dev/null"], access_rw))?
+                .set_no_new_privs(true);
+
+            if !writable_roots_copy.is_empty() {
+                ruleset = ruleset.add_rules(landlock::path_beneath_rules(
+                    &writable_roots_copy,
+                    access_rw,
+                ))?;
+            }
+
+            let status = ruleset.restrict_self()?;
+
+            // TODO(wpt): Probably wanna expand this more generically and not warn every time.
+            if status.ruleset == landlock::RulesetStatus::NotEnforced {
+                return Err(CodexErr::Sandbox(SandboxErr::LandlockRestrict));
+            }
+
+            if let Err(e) = install_network_seccomp_filter() {
+                return Err(CodexErr::Sandbox(e));
+            }
+
+            exec(params, ctrl_c_copy).await
+        })
+    })
+    .join();
+
+    match tool_call_output {
+        Ok(Ok(output)) => Ok(output),
+        Ok(Err(e)) => Err(e),
+        Err(e) => Err(CodexErr::Io(io::Error::new(
+            io::ErrorKind::Other,
+            format!("thread join failed: {e:?}"),
+        ))),
+    }
+}
+
+fn install_network_seccomp_filter() -> std::result::Result<(), SandboxErr> {
+    // Build rule map.
+    let mut rules: BTreeMap<i64, Vec<SeccompRule>> = BTreeMap::new();
+
+    // Helper – insert unconditional deny rule for syscall number.
+    let mut deny_syscall = |nr: i64| {
+        rules.insert(nr, vec![]); // empty rule vec = unconditional match
+    };
+
+    deny_syscall(libc::SYS_connect);
+    deny_syscall(libc::SYS_accept);
+    deny_syscall(libc::SYS_accept4);
+    deny_syscall(libc::SYS_bind);
+    deny_syscall(libc::SYS_listen);
+    deny_syscall(libc::SYS_getpeername);
+    deny_syscall(libc::SYS_getsockname);
+    deny_syscall(libc::SYS_shutdown);
+    deny_syscall(libc::SYS_sendto);
+    deny_syscall(libc::SYS_sendmsg);
+    deny_syscall(libc::SYS_sendmmsg);
+    deny_syscall(libc::SYS_recvfrom);
+    deny_syscall(libc::SYS_recvmsg);
+    deny_syscall(libc::SYS_recvmmsg);
+    deny_syscall(libc::SYS_getsockopt);
+    deny_syscall(libc::SYS_setsockopt);
+    deny_syscall(libc::SYS_ptrace);
+
+    // For `socket` we allow AF_UNIX and deny everything else. A matching rule triggers
+    // the deny action below, so the condition must match when arg0 != AF_UNIX.
+    let unix_only_rule = SeccompRule::new(vec![SeccompCondition::new(
+        0, // first argument (domain)
+        SeccompCmpArgLen::Dword,
+        SeccompCmpOp::Ne,
+        libc::AF_UNIX as u64,
+    )?])?;
+
+    rules.insert(libc::SYS_socket, vec![unix_only_rule]);
+    rules.insert(libc::SYS_socketpair, vec![]); // always deny (Unix can use socketpair but fine, keep open?)
+
+    let filter = SeccompFilter::new(
+        rules,
+        SeccompAction::Allow,                     // default – allow
+        SeccompAction::Errno(libc::EPERM as u32), // when rule matches – return EPERM
+        if cfg!(target_arch = "x86_64") {
+            TargetArch::x86_64
+        } else if cfg!(target_arch = "aarch64") {
+            TargetArch::aarch64
+        } else {
+            unimplemented!("unsupported architecture for seccomp filter");
+        },
+    )?;
+
+    let prog: BpfProgram = filter.try_into()?;
+
+    apply_filter(&prog)?;
+
+    Ok(())
+}
+
+#[cfg(test)]
+mod tests_linux {
+    use super::*;
+    use crate::exec::process_exec_tool_call;
+    use crate::exec::ExecParams;
+    use crate::exec::SandboxType;
+    use std::sync::Arc;
+    use tempfile::NamedTempFile;
+    use tokio::sync::Notify;
+
+    #[allow(clippy::print_stdout)]
+    async fn run_cmd(cmd: &[&str], writable_roots: &[PathBuf]) {
+        let params = ExecParams {
+            command: cmd.iter().map(|elm| elm.to_string()).collect(),
+            workdir: None,
+            timeout_ms: Some(200),
+        };
+        let res = process_exec_tool_call(
+            params,
+            SandboxType::LinuxSeccomp,
+            writable_roots,
+            Arc::new(Notify::new()),
+        )
+        .await
+        .unwrap();
+
+        if res.exit_code != 0 {
+            println!("stdout:\n{}", res.stdout);
+            println!("stderr:\n{}", res.stderr);
+            panic!("exit code: {}", res.exit_code);
+        }
+    }
+
+    #[tokio::test]
+    async fn test_root_read() {
+        run_cmd(&["ls", "-l", "/bin"], &[]).await;
+    }
+
+    #[tokio::test]
+    #[should_panic]
+    async fn test_root_write() {
+        let tmpfile = NamedTempFile::new().unwrap();
+        let tmpfile_path = tmpfile.path().to_string_lossy();
+        run_cmd(
+            &["bash", "-lc", &format!("echo blah > {}", tmpfile_path)],
+            &[],
+        )
+        .await;
+    }
+
+    #[tokio::test]
+    async fn test_dev_null_write() {
+        run_cmd(&["bash", "-lc", "echo blah > /dev/null"], &[]).await;
+    }
+
+    #[tokio::test]
+    async fn test_writable_root() {
+        let tmpdir = tempfile::tempdir().unwrap();
+        let file_path = tmpdir.path().join("test");
+        run_cmd(
+            &[
+                "bash",
+                "-lc",
+                &format!("echo blah > {}", file_path.to_string_lossy()),
+            ],
+            &[tmpdir.path().to_path_buf()],
+        )
+        .await;
+    }
+
+    /// Helper that runs `cmd` under the Linux sandbox and asserts that the command
+    /// does NOT succeed (i.e. returns a non‑zero exit code). Note that if the binary
+    /// is missing, `spawn` fails with an I/O error rather than an exit code, and
+    /// this helper panics instead of skipping the test.
+    async fn assert_network_blocked(cmd: &[&str]) {
+        let params = ExecParams {
+            command: cmd.iter().map(|s| s.to_string()).collect(),
+            workdir: None,
+            // Give the tool a generous 2‑second timeout so even slow DNS timeouts
+            // do not stall the suite.
+            timeout_ms: Some(2_000),
+        };
+
+        let result = process_exec_tool_call(
+            params,
+            SandboxType::LinuxSeccomp,
+            &[],
+            Arc::new(Notify::new()),
+        )
+        .await;
+
+        let (exit_code, stdout, stderr) = match result {
+            Ok(output) => (output.exit_code, output.stdout, output.stderr),
+            Err(CodexErr::Sandbox(SandboxErr::Denied(exit_code, stdout, stderr))) => {
+                (exit_code, stdout, stderr)
+            }
+            _ => {
+                panic!("expected sandbox denied error, got: {:?}", result);
+            }
+        };
+
+        dbg!(&stderr);
+        dbg!(&stdout);
+        dbg!(&exit_code);
+
+        // A completely missing binary exits with 127.  Anything else should also
+        // be non‑zero (EPERM from seccomp will usually bubble up as 1, 2, 13…)
+        // If—*and only if*—the command exits 0 we consider the sandbox breached.
+
+        if exit_code == 0 {
+            panic!(
+                "Network sandbox FAILED - {:?} exited 0\nstdout:\n{}\nstderr:\n{}",
+                cmd, stdout, stderr
+            );
+        }
+    }
+
+    #[tokio::test]
+    async fn sandbox_blocks_curl() {
+        assert_network_blocked(&["curl", "-I", "http://openai.com"]).await;
+    }
+
+    #[cfg(target_os = "linux")]
+    #[tokio::test]
+    async fn sandbox_blocks_wget() {
+        assert_network_blocked(&["wget", "-qO-", "http://openai.com"]).await;
+    }
+
+    #[tokio::test]
+    async fn sandbox_blocks_ping() {
+        // ICMP requires raw socket – should be denied quickly with EPERM.
+        assert_network_blocked(&["ping", "-c", "1", "8.8.8.8"]).await;
+    }
+
+    #[tokio::test]
+    async fn sandbox_blocks_nc() {
+        // Zero‑length connection attempt to localhost.
+        assert_network_blocked(&["nc", "-z", "127.0.0.1", "80"]).await;
+    }
+
+    #[tokio::test]
+    async fn sandbox_blocks_ssh() {
+        // Force ssh to attempt a real TCP connection but fail quickly.  `BatchMode`
+        // avoids password prompts, and `ConnectTimeout` keeps the hang time low.
+        assert_network_blocked(&[
+            "ssh",
+            "-o",
+            "BatchMode=yes",
+            "-o",
+            "ConnectTimeout=1",
+            "github.com",
+        ])
+        .await;
+    }
+
+    #[tokio::test]
+    async fn sandbox_blocks_getent() {
+        assert_network_blocked(&["getent", "ahosts", "openai.com"]).await;
+    }
+
+    #[tokio::test]
+    async fn sandbox_blocks_dev_tcp_redirection() {
+        // The `/dev/tcp/<host>/<port>` redirection is a shell feature rather than
+        // a real device, so the connection attempt is made through bash. Not all
+        // images ship bash; a missing binary exits 127, which also counts as blocked.
+        assert_network_blocked(&["bash", "-c", "echo hi > /dev/tcp/127.0.0.1/80"]).await;
+    }
+}
--- a/codex-rs/core/src/models.rs
+++ b/codex-rs/core/src/models.rs
@@ -0,0 +1,175 @@
+use base64::Engine;
+use serde::ser::Serializer;
+use serde::Deserialize;
+use serde::Serialize;
+
+use crate::protocol::InputItem;
+
+#[derive(Debug, Clone, Serialize, Deserialize)]
+#[serde(tag = "type", rename_all = "snake_case")]
+pub enum ResponseInputItem {
+    Message {
+        role: String,
+        content: Vec<ContentItem>,
+    },
+    FunctionCallOutput {
+        call_id: String,
+        output: FunctionCallOutputPayload,
+    },
+}
+
+#[derive(Debug, Clone, Serialize, Deserialize)]
+#[serde(tag = "type", rename_all = "snake_case")]
+pub enum ContentItem {
+    InputText { text: String },
+    InputImage { image_url: String },
+    OutputText { text: String },
+}
+
+#[derive(Debug, Clone, Serialize, Deserialize)]
+#[serde(tag = "type", rename_all = "snake_case")]
+pub enum ResponseItem {
+    Message {
+        role: String,
+        content: Vec<ContentItem>,
+    },
+    FunctionCall {
+        name: String,
+        // The Responses API returns the function call arguments as a *string* that contains
+        // JSON, not as an already‑parsed object. We keep it as a raw string here and let
+        // Session::handle_function_call parse it into a Value. This exactly matches the
+        // Chat Completions + Responses API behavior.
+        arguments: String,
+        call_id: String,
+    },
+    // NOTE: The input schema for `function_call_output` objects that clients send to the
+    // OpenAI /v1/responses endpoint is NOT the same shape as the objects the server returns on the
+    // SSE stream. When *sending*, `output` must be a plain string; the `success` flag is kept for
+    // local bookkeeping only. The upstream TypeScript CLI does this implicitly. To ensure we
+    // serialize exactly the expected shape we introduce a dedicated payload struct with a manual
+    // `Serialize` impl (see `FunctionCallOutputPayload`).
+    FunctionCallOutput {
+        call_id: String,
+        output: FunctionCallOutputPayload,
+    },
+    #[serde(other)]
+    Other,
+}
+
+impl From<Vec<InputItem>> for ResponseInputItem {
+    fn from(items: Vec<InputItem>) -> Self {
+        Self::Message {
+            role: "user".to_string(),
+            content: items
+                .into_iter()
+                .filter_map(|c| match c {
+                    InputItem::Text { text } => Some(ContentItem::InputText { text }),
+                    InputItem::Image { image_url } => Some(ContentItem::InputImage { image_url }),
+                    InputItem::LocalImage { path } => match std::fs::read(&path) {
+                        Ok(bytes) => {
+                            let mime = mime_guess::from_path(&path)
+                                .first()
+                                .map(|m| m.essence_str().to_owned())
+                                .unwrap_or_else(|| "application/octet-stream".to_string());
+                            let encoded = base64::engine::general_purpose::STANDARD.encode(bytes);
+                            Some(ContentItem::InputImage {
+                                image_url: format!("data:{};base64,{}", mime, encoded),
+                            })
+                        }
+                        Err(err) => {
+                            tracing::warn!(
+                                "Skipping image {} – could not read file: {}",
+                                path.display(),
+                                err
+                            );
+                            None
+                        }
+                    },
+                })
+                .collect::<Vec<ContentItem>>(),
+        }
+    }
+}
+
+#[expect(dead_code)]
+#[derive(Deserialize, Debug, Clone)]
+pub struct FunctionCallOutputPayload {
+    pub content: String,
+    pub success: Option<bool>,
+}
+
+// The Responses API expects `output` to be a plain string for both outcomes:
+//   • success → output is a plain string (no nested object)
+//   • failure → output is still a plain string; `{ content, success:false }` is rejected
+// The upstream TypeScript CLI implements this in its serialize path.
+// We replicate that behavior with a manual Serialize impl.
+
+impl Serialize for FunctionCallOutputPayload {
+    fn serialize<S>(&self, serializer: S) -> Result<S::Ok, S::Error>
+    where
+        S: Serializer,
+    {
+        // The upstream TypeScript CLI always serializes `output` as a *plain string* regardless
+        // of whether the function call succeeded or failed. The boolean is purely informational
+        // for local bookkeeping and is NOT sent to the OpenAI endpoint. Sending the nested object
+        // form `{ content, success:false }` triggers a 400 response. Mirror the JS CLI
+        // exactly: always emit a bare string.
+
+        serializer.serialize_str(&self.content)
+    }
+}
+
+// Implement Display so callers can treat the payload like a plain string when logging or doing
+// trivial substring checks in tests (existing tests call `.contains()` on the output). Display
+// returns the raw `content` field.
+
+impl std::fmt::Display for FunctionCallOutputPayload {
+    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
+        f.write_str(&self.content)
+    }
+}
+
+impl std::ops::Deref for FunctionCallOutputPayload {
+    type Target = str;
+    fn deref(&self) -> &Self::Target {
+        &self.content
+    }
+}
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+
+    #[test]
+    fn serializes_success_as_plain_string() {
+        let item = ResponseInputItem::FunctionCallOutput {
+            call_id: "call1".into(),
+            output: FunctionCallOutputPayload {
+                content: "ok".into(),
+                success: None,
+            },
+        };
+
+        let json = serde_json::to_string(&item).unwrap();
+        let v: serde_json::Value = serde_json::from_str(&json).unwrap();
+
+        // Success case -> output should be a plain string
+        assert_eq!(v.get("output").unwrap().as_str().unwrap(), "ok");
+    }
+
+    #[test]
+    fn serializes_failure_as_string() {
+        let item = ResponseInputItem::FunctionCallOutput {
+            call_id: "call1".into(),
+            output: FunctionCallOutputPayload {
+                content: "bad".into(),
+                success: Some(false),
+            },
+        };
+
+        let json = serde_json::to_string(&item).unwrap();
+        let v: serde_json::Value = serde_json::from_str(&json).unwrap();
+
+        assert_eq!(v.get("output").unwrap().as_str().unwrap(), "bad");
+    }
+}
--- a/codex-rs/core/src/protocol.rs
+++ b/codex-rs/core/src/protocol.rs
@@ -0,0 +1,275 @@
+//! Defines the protocol for a Codex session between a client and an agent.
+//!
+//! Uses a SQ (Submission Queue) / EQ (Event Queue) pattern to asynchronously communicate
+//! between user and agent.
+
+use std::collections::HashMap;
+use std::path::PathBuf;
+
+use serde::Deserialize;
+use serde::Serialize;
+
+/// Submission Queue Entry - requests from user
+#[derive(Debug, Clone, Deserialize, Serialize)]
+pub struct Submission {
+    /// Unique id for this Submission to correlate with Events
+    pub id: String,
+    /// Payload
+    pub op: Op,
+}
+
+/// Submission operation
+#[derive(Debug, Clone, Deserialize, Serialize)]
+#[serde(tag = "type", rename_all = "snake_case")]
+#[non_exhaustive]
+pub enum Op {
+    /// Configure the model session.
+    ConfigureSession {
+        /// If not specified, server will use its default model.
+        model: Option<String>,
+        /// Model instructions
+        instructions: Option<String>,
+        /// When to escalate for approval for execution
+        approval_policy: AskForApproval,
+        /// How to sandbox commands executed in the system
+        sandbox_policy: SandboxPolicy,
+    },
+
+    /// Abort current task.
+    /// The server sends no corresponding Event.
+    Interrupt,
+
+    /// Input from the user
+    UserInput {
+        /// User input items, see `InputItem`
+        items: Vec<InputItem>,
+    },
+
+    /// Approve a command execution
+    ExecApproval {
+        /// The id of the submission we are approving
+        id: String,
+        /// The user's decision in response to the request.
+        decision: ReviewDecision,
+    },
+
+    /// Approve a code patch
+    PatchApproval {
+        /// The id of the submission we are approving
+        id: String,
+        /// The user's decision in response to the request.
+        decision: ReviewDecision,
+    },
+}
+
+/// Determines how liberally commands are auto‑approved by the system.
+#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash, Serialize, Deserialize)]
+pub enum AskForApproval {
+    /// Under this policy, only “known safe” commands (as determined by
+    /// `is_known_safe_command()`) that **only read files** are auto‑approved.
+    /// Everything else will ask the user to approve.
+    UnlessAllowListed,
+
+    /// In addition to everything allowed by **`UnlessAllowListed`**, commands that
+    /// *write* to files **within the user’s approved list of writable paths**
+    /// are also auto‑approved.
+    /// TODO(ragona): fix
+    AutoEdit,
+
+    /// *All* commands are auto‑approved, but they are expected to run inside a
+    /// sandbox where network access is disabled and writes are confined to a
+    /// specific set of paths. If the command fails, it will be escalated to
+    /// the user to approve execution without a sandbox.
+    OnFailure,
+
+    /// Never ask the user to approve commands. Failures are immediately returned
+    /// to the model, and never escalated to the user for approval.
+    Never,
+}
+
+/// Determines execution restrictions for model shell commands
+#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash, Serialize, Deserialize)]
+pub enum SandboxPolicy {
+    /// Network syscalls will be blocked
+    NetworkRestricted,
+    /// Filesystem writes will be restricted
+    FileWriteRestricted,
+    /// Network and filesystem writes will be restricted
+    NetworkAndFileWriteRestricted,
+    /// No restrictions; full "unsandboxed" mode
+    DangerousNoRestrictions,
+}
+
+/// User input
+#[non_exhaustive]
+#[derive(Debug, Clone, Deserialize, Serialize)]
+#[serde(tag = "type", rename_all = "snake_case")]
+pub enum InputItem {
+    Text {
+        text: String,
+    },
+    /// Pre‑encoded data: URI image.
+    Image {
+        image_url: String,
+    },
+
+    /// Local image path provided by the user.  This will be converted to an
+    /// `Image` variant (base64 data URL) during request serialization.
+    LocalImage {
+        path: std::path::PathBuf,
+    },
+}
+
+/// Event Queue Entry - events from agent
+#[derive(Debug, Clone, Deserialize, Serialize)]
+pub struct Event {
+    /// Submission `id` that this event is correlated with.
+    pub id: String,
+    /// Payload
+    pub msg: EventMsg,
+}
+
+/// Response event from the agent
+#[non_exhaustive]
+#[derive(Debug, Clone, Deserialize, Serialize)]
+#[serde(tag = "type", rename_all = "snake_case")]
+pub enum EventMsg {
+    /// Error while executing a submission
+    Error {
+        message: String,
+    },
+
+    /// Agent has started a task
+    TaskStarted,
+
+    /// Agent has completed all actions
+    TaskComplete,
+
+    /// Agent text output message
+    AgentMessage {
+        message: String,
+    },
+
+    /// Ack the client's configure message.
+    SessionConfigured {
+        /// Tell the client what model is being queried.
+        model: String,
+    },
+
+    /// Notification that the server is about to execute a command.
+    ExecCommandBegin {
+        /// Identifier so this can be paired with the ExecCommandEnd event.
+        call_id: String,
+        /// The command to be executed.
+        command: Vec<String>,
+        /// The command's working directory if not the default cwd for the
+        /// agent.
+        cwd: String,
+    },
+
+    ExecCommandEnd {
+        /// Identifier for the ExecCommandBegin that finished.
+        call_id: String,
+        /// Captured stdout
+        stdout: String,
+        /// Captured stderr
+        stderr: String,
+        /// The command's exit code.
+        exit_code: i32,
+    },
+
+    ExecApprovalRequest {
+        /// The command to be executed.
+        command: Vec<String>,
+        /// The command's working directory.
+        cwd: PathBuf,
+        /// Optional human‑readable reason for the approval (e.g. retry without
+        /// sandbox).
+        #[serde(skip_serializing_if = "Option::is_none")]
+        reason: Option<String>,
+    },
+
+    ApplyPatchApprovalRequest {
+        changes: HashMap<PathBuf, FileChange>,
+        /// Optional explanatory reason (e.g. request for extra write access).
+        #[serde(skip_serializing_if = "Option::is_none")]
+        reason: Option<String>,
+
+        /// When set, the agent is asking the user to allow writes under this
+        /// root for the remainder of the session.
+        #[serde(skip_serializing_if = "Option::is_none")]
+        grant_root: Option<PathBuf>,
+    },
+
+    BackgroundEvent {
+        message: String,
+    },
+
+    /// Notification that the agent is about to apply a code patch. Mirrors
+    /// `ExecCommandBegin` so front‑ends can show progress indicators.
+    PatchApplyBegin {
+        /// Identifier so this can be paired with the PatchApplyEnd event.
+        call_id: String,
+
+        /// If true, there was no ApplyPatchApprovalRequest for this patch.
+        auto_approved: bool,
+
+        /// The changes to be applied.
+        changes: HashMap<PathBuf, FileChange>,
+    },
+
+    /// Notification that a patch application has finished.
+    PatchApplyEnd {
+        /// Identifier for the PatchApplyBegin that finished.
+        call_id: String,
+        /// Captured stdout (summary printed by apply_patch).
+        stdout: String,
+        /// Captured stderr (parser errors, IO failures, etc.).
+        stderr: String,
+        /// Whether the patch was applied successfully.
+        success: bool,
+    },
+}
+
+/// User's decision in response to an ExecApprovalRequest or ApplyPatchApprovalRequest.
+#[derive(Debug, Default, Clone, Copy, Deserialize, Serialize)]
+#[serde(rename_all = "snake_case")]
+pub enum ReviewDecision {
+    /// User has approved this command and the agent should execute it.
+    Approved,
+
+    /// User has approved this command and wants to automatically approve any
+    /// future identical instances (`command` and `cwd` match exactly) for the
+    /// remainder of the session.
+    ApprovedForSession,
+
+    /// User has denied this command and the agent should not execute it, but
+    /// it should continue the session and try something else.
+    #[default]
+    Denied,
+
+    /// User has denied this command and the agent should not do anything until
+    /// the user's next command.
+    Abort,
+}
+
+#[derive(Debug, Clone, Deserialize, Serialize)]
+#[serde(rename_all = "snake_case")]
+pub enum FileChange {
+    Add {
+        content: String,
+    },
+    Delete,
+    Update {
+        unified_diff: String,
+        move_path: Option<PathBuf>,
+    },
+}
+
+#[derive(Debug, Clone, Deserialize, Serialize)]
+pub struct Chunk {
+    /// 1-based line index of the first line in the original file
+    pub orig_index: u32,
+    pub deleted_lines: Vec<String>,
+    pub inserted_lines: Vec<String>,
+}
--- a/codex-rs/core/src/safety.rs
+++ b/codex-rs/core/src/safety.rs
@@ -0,0 +1,236 @@
+use std::collections::HashMap;
+use std::collections::HashSet;
+use std::path::Component;
+use std::path::Path;
+use std::path::PathBuf;
+
+use codex_apply_patch::ApplyPatchFileChange;
+
+use crate::exec::SandboxType;
+use crate::is_safe_command::is_known_safe_command;
+use crate::protocol::AskForApproval;
+use crate::protocol::SandboxPolicy;
+
+#[derive(Debug)]
+pub enum SafetyCheck {
+    AutoApprove { sandbox_type: SandboxType },
+    AskUser,
+    Reject { reason: String },
+}
+
+pub fn assess_patch_safety(
+    changes: &HashMap<PathBuf, ApplyPatchFileChange>,
+    policy: AskForApproval,
+    writable_roots: &[PathBuf],
+) -> SafetyCheck {
+    if changes.is_empty() {
+        return SafetyCheck::Reject {
+            reason: "empty patch".to_string(),
+        };
+    }
+
+    match policy {
+        AskForApproval::OnFailure | AskForApproval::AutoEdit | AskForApproval::Never => {
+            // Continue to see if this can be auto-approved.
+        }
+        // TODO(ragona): I'm not sure this is actually correct? I believe in this case
+        // we want to continue to the writable paths check before asking the user.
+        AskForApproval::UnlessAllowListed => {
+            return SafetyCheck::AskUser;
+        }
+    }
+
+    if is_write_patch_constrained_to_writable_paths(changes, writable_roots) {
+        SafetyCheck::AutoApprove {
+            sandbox_type: SandboxType::None,
+        }
+    } else if policy == AskForApproval::OnFailure {
+        // Only auto‑approve when we can actually enforce a sandbox. Otherwise
+        // fall back to asking the user because the patch may touch arbitrary
+        // paths outside the project.
+        match get_platform_sandbox() {
+            Some(sandbox_type) => SafetyCheck::AutoApprove { sandbox_type },
+            None => SafetyCheck::AskUser,
+        }
+    } else if policy == AskForApproval::Never {
+        SafetyCheck::Reject {
+            reason: "writing outside of the project; rejected by user approval settings"
+                .to_string(),
+        }
+    } else {
+        SafetyCheck::AskUser
+    }
+}
+
+pub fn assess_command_safety(
+    command: &[String],
+    approval_policy: AskForApproval,
+    sandbox_policy: SandboxPolicy,
+    approved: &HashSet<Vec<String>>,
+) -> SafetyCheck {
+    let approve_without_sandbox = || SafetyCheck::AutoApprove {
+        sandbox_type: SandboxType::None,
+    };
+
+    // Previously approved or allow-listed commands
+    // All approval modes allow these commands to continue without sandboxing
+    if is_known_safe_command(command) || approved.contains(command) {
+        // TODO(ragona): I think we should consider running even these inside the sandbox, but it's
+        // a change in behavior so I'm keeping it at parity with upstream for now.
+        return approve_without_sandbox();
+    }
+
+    // Command was not known-safe or allow-listed
+    match sandbox_policy {
+        // Only the dangerous sandbox policy will run arbitrary commands outside a sandbox
+        SandboxPolicy::DangerousNoRestrictions => approve_without_sandbox(),
+        // All other policies try to run the command in a sandbox if it is available
+        _ => match get_platform_sandbox() {
+            // We have a sandbox, so we can approve the command in all modes
+            Some(sandbox_type) => SafetyCheck::AutoApprove { sandbox_type },
+            None => {
+                // We do not have a sandbox, so we need to consider the approval policy
+                match approval_policy {
+                    // Never is our "non-interactive" mode; it must automatically reject
+                    AskForApproval::Never => SafetyCheck::Reject {
+                        reason: "auto-rejected by user approval settings".to_string(),
+                    },
+                    // Otherwise, we ask the user for approval
+                    _ => SafetyCheck::AskUser,
+                }
+            }
+        },
+    }
+}
+
+pub fn get_platform_sandbox() -> Option<SandboxType> {
+    if cfg!(target_os = "macos") {
+        Some(SandboxType::MacosSeatbelt)
+    } else if cfg!(target_os = "linux") {
+        Some(SandboxType::LinuxSeccomp)
+    } else {
+        None
+    }
+}
+
+fn is_write_patch_constrained_to_writable_paths(
+    changes: &HashMap<PathBuf, ApplyPatchFileChange>,
+    writable_roots: &[PathBuf],
+) -> bool {
+    // Early‑exit if there are no declared writable roots.
+    if writable_roots.is_empty() {
+        return false;
+    }
+
+    // Normalize a path by removing `.` and resolving `..` without touching the
+    // filesystem (works even if the file does not exist).
+    fn normalize(path: &Path) -> Option<PathBuf> {
+        let mut out = PathBuf::new();
+        for comp in path.components() {
+            match comp {
+                Component::ParentDir => {
+                    out.pop();
+                }
+                Component::CurDir => { /* skip */ }
+                other => out.push(other.as_os_str()),
+            }
+        }
+        Some(out)
+    }
+
+    // Determine whether `path` is inside **any** writable root. Both `path`
+    // and roots are converted to absolute, normalized forms before the
+    // prefix check.
+    let is_path_writable = |p: &PathBuf| {
+        let cwd = match std::env::current_dir() {
+            Ok(cwd) => cwd,
+            Err(_) => return false,
+        };
+
+        let abs = if p.is_absolute() {
+            p.clone()
+        } else {
+            cwd.join(p)
+        };
+        let abs = match normalize(&abs) {
+            Some(v) => v,
+            None => return false,
+        };
+
+        writable_roots.iter().any(|root| {
+            let root_abs = if root.is_absolute() {
+                root.clone()
+            } else {
+                normalize(&cwd.join(root)).unwrap_or_else(|| cwd.join(root))
+            };
+
+            abs.starts_with(&root_abs)
+        })
+    };
+
+    for (path, change) in changes {
+        match change {
+            ApplyPatchFileChange::Add { .. } | ApplyPatchFileChange::Delete => {
+                if !is_path_writable(path) {
+                    return false;
+                }
+            }
+            ApplyPatchFileChange::Update { move_path, .. } => {
+                if !is_path_writable(path) {
+                    return false;
+                }
+                if let Some(dest) = move_path {
+                    if !is_path_writable(dest) {
+                        return false;
+                    }
+                }
+            }
+        }
+    }
+
+    true
+}
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+
+    #[test]
+    fn test_writable_roots_constraint() {
+        let cwd = std::env::current_dir().unwrap();
+        let parent = cwd.parent().unwrap().to_path_buf();
+
+        // Helper to build a single‑entry map representing a patch that adds a
+        // file at `p`.
+        let make_add_change = |p: PathBuf| {
+            let mut m = HashMap::new();
+            m.insert(
+                p.clone(),
+                ApplyPatchFileChange::Add {
+                    content: String::new(),
+                },
+            );
+            m
+        };
+
+        let add_inside = make_add_change(PathBuf::from("inner.txt"));
+        let add_outside = make_add_change(parent.join("outside.txt"));
+
+        assert!(is_write_patch_constrained_to_writable_paths(
+            &add_inside,
+            &[PathBuf::from(".")]
+        ));
+
+        let add_outside_2 = make_add_change(parent.join("outside.txt"));
+        assert!(!is_write_patch_constrained_to_writable_paths(
+            &add_outside_2,
+            &[PathBuf::from(".")]
+        ));
+
+        // With parent dir added as writable root, it should pass.
+        assert!(is_write_patch_constrained_to_writable_paths(
+            &add_outside,
+            &[PathBuf::from("..")]
+        ))
+    }
+}
--- a/codex-rs/core/src/seatbelt_readonly_policy.sbpl
+++ b/codex-rs/core/src/seatbelt_readonly_policy.sbpl
@@ -0,0 +1,70 @@
+(version 1)
+
+; inspired by Chrome's sandbox policy:
+; https://source.chromium.org/chromium/chromium/src/+/main:sandbox/policy/mac/common.sb;l=273-319;drc=7b3962fe2e5fc9e2ee58000dc8fbf3429d84d3bd
+
+; start with closed-by-default
+(deny default)
+
+; allow read-only file operations
+(allow file-read*)
+
+; child processes inherit the policy of their parent
+(allow process-exec)
+(allow process-fork)
+(allow signal (target self))
+
+(allow file-write-data
+  (require-all
+    (path "/dev/null")
+    (vnode-type CHARACTER-DEVICE)))
+
+; sysctls permitted.
+(allow sysctl-read
+  (sysctl-name "hw.activecpu")
+  (sysctl-name "hw.busfrequency_compat")
+  (sysctl-name "hw.byteorder")
+  (sysctl-name "hw.cacheconfig")
+  (sysctl-name "hw.cachelinesize_compat")
+  (sysctl-name "hw.cpufamily")
+  (sysctl-name "hw.cpufrequency_compat")
+  (sysctl-name "hw.cputype")
+  (sysctl-name "hw.l1dcachesize_compat")
+  (sysctl-name "hw.l1icachesize_compat")
+  (sysctl-name "hw.l2cachesize_compat")
+  (sysctl-name "hw.l3cachesize_compat")
+  (sysctl-name "hw.logicalcpu_max")
+  (sysctl-name "hw.machine")
+  (sysctl-name "hw.ncpu")
+  (sysctl-name "hw.nperflevels")
+  (sysctl-name "hw.optional.arm.FEAT_BF16")
+  (sysctl-name "hw.optional.arm.FEAT_DotProd")
+  (sysctl-name "hw.optional.arm.FEAT_FCMA")
+  (sysctl-name "hw.optional.arm.FEAT_FHM")
+  (sysctl-name "hw.optional.arm.FEAT_FP16")
+  (sysctl-name "hw.optional.arm.FEAT_I8MM")
+  (sysctl-name "hw.optional.arm.FEAT_JSCVT")
+  (sysctl-name "hw.optional.arm.FEAT_LSE")
+  (sysctl-name "hw.optional.arm.FEAT_RDM")
+  (sysctl-name "hw.optional.arm.FEAT_SHA512")
+  (sysctl-name "hw.optional.armv8_2_sha512")
+  (sysctl-name "hw.memsize")
+  (sysctl-name "hw.pagesize")
+  (sysctl-name "hw.packages")
+  (sysctl-name "hw.pagesize_compat")
+  (sysctl-name "hw.physicalcpu_max")
+  (sysctl-name "hw.tbfrequency_compat")
+  (sysctl-name "hw.vectorunit")
+  (sysctl-name "kern.hostname")
+  (sysctl-name "kern.maxfilesperproc")
+  (sysctl-name "kern.osproductversion")
+  (sysctl-name "kern.osrelease")
+  (sysctl-name "kern.ostype")
+  (sysctl-name "kern.osvariant_status")
+  (sysctl-name "kern.osversion")
+  (sysctl-name "kern.secure_kernel")
+  (sysctl-name "kern.usrstack64")
+  (sysctl-name "kern.version")
+  (sysctl-name "sysctl.proc_cputype")
+  (sysctl-name-prefix "hw.perflevel")
+)
--- a/codex-rs/core/src/util.rs
+++ b/codex-rs/core/src/util.rs
@@ -0,0 +1,68 @@
+use std::sync::Arc;
+use std::time::Duration;
+
+use rand::Rng;
+use tokio::sync::Notify;
+use tracing::debug;
+
+/// Returns a [`Notify`] whose waiters are woken each time SIGINT (Ctrl-C) is received.
+pub fn notify_on_sigint() -> Arc<Notify> {
+    let notify = Arc::new(Notify::new());
+
+    tokio::spawn({
+        let notify = Arc::clone(&notify);
+        async move {
+            loop {
+                tokio::signal::ctrl_c().await.ok();
+                debug!("Keyboard interrupt");
+                notify.notify_waiters();
+            }
+        }
+    });
+
+    notify
+}
+
+/// Default exponential back‑off schedule (`attempt` is 1‑based, ±20% jitter): 200ms → 400ms → 800ms → 1600ms.
+pub(crate) fn backoff(attempt: u64) -> Duration {
+    let base_delay_ms = 200u64 * (1u64 << attempt.saturating_sub(1));
+    let jitter = rand::rng().random_range(0.8..1.2);
+    let delay_ms = (base_delay_ms as f64 * jitter) as u64;
+    Duration::from_millis(delay_ms)
+}
+
+/// Return `true` if the current working directory is inside a Git repository.
+///
+/// The check walks up the directory hierarchy looking for an entry named
+/// `.git`. This approach does **not** require the `git` binary or the `git2`
+/// crate and is therefore fairly lightweight.
+///
+/// Because the check uses `Path::exists`, it matches both a `.git`
+/// *directory* (regular work‑trees) and a `.git` *file* (as written by
+/// `git worktree add` and submodules). It does **not** detect a bare
+/// repository, or a checkout whose Git directory is supplied only through
+/// the `GIT_DIR` environment variable.  If you need Codex to work from such
+/// a location, pass the `--allow-no-git-exec` CLI flag, which disables the
+/// repo requirement.
+pub fn is_inside_git_repo() -> bool {
+    // Best‑effort: any IO error is treated as "not a repo" – the caller can
+    // decide what to do with the result.
+    let mut dir = match std::env::current_dir() {
+        Ok(d) => d,
+        Err(_) => return false,
+    };
+
+    loop {
+        if dir.join(".git").exists() {
+            return true;
+        }
+
+        // Pop one component (go up one directory).  `pop` returns false when
+        // we have reached the filesystem root.
+        if !dir.pop() {
+            break;
+        }
+    }
+
+    false
+}
--- a/codex-rs/core/tests/live_agent.rs
+++ b/codex-rs/core/tests/live_agent.rs
@@ -0,0 +1,219 @@
+//! Live integration tests that exercise the full [`Codex`] stack **against the real
+//! OpenAI `/v1/responses` API**.  These tests complement the lightweight mock‑based
+//! unit tests by verifying that the agent can drive an end‑to‑end conversation,
+//! stream incremental events, execute function‑call tool invocations and safely
+//! chain multiple turns inside a single session – the exact scenarios that have
+//! historically been brittle.
+//!
+//! The live tests are **ignored by default** so CI remains deterministic and free
+//! of external dependencies.  Developers can opt‑in locally with e.g.
+//!
+//! ```bash
+//! OPENAI_API_KEY=sk-... cargo test --test live_agent -- --ignored --nocapture
+//! ```
+//!
+//! Make sure your key has access to the experimental *Responses* API and that
+//! any billable usage is acceptable.
+
+use std::time::Duration;
+
+use codex_core::protocol::AskForApproval;
+use codex_core::protocol::EventMsg;
+use codex_core::protocol::InputItem;
+use codex_core::protocol::Op;
+use codex_core::protocol::SandboxPolicy;
+use codex_core::protocol::Submission;
+use codex_core::Codex;
+use tokio::sync::Notify;
+use tokio::time::timeout;
+
+fn api_key_available() -> bool {
+    std::env::var("OPENAI_API_KEY").is_ok()
+}
+
+/// Helper that spawns a fresh [`Codex`] agent and sends the mandatory
+/// *ConfigureSession* submission (with id `"init"`).  The caller receives the
+/// constructed [`Codex`] once the `SessionConfigured` event has been observed.
+async fn spawn_codex() -> Codex {
+    assert!(
+        api_key_available(),
+        "OPENAI_API_KEY must be set for live tests"
+    );
+
+    // Environment tweaks to keep the tests snappy and inexpensive while still
+    // exercising retry/robustness logic.
+    std::env::set_var("OPENAI_REQUEST_MAX_RETRIES", "2");
+    std::env::set_var("OPENAI_STREAM_MAX_RETRIES", "2");
+
+    let agent = Codex::spawn(std::sync::Arc::new(Notify::new())).unwrap();
+
+    agent
+        .submit(Submission {
+            id: "init".into(),
+            op: Op::ConfigureSession {
+                model: None,
+                instructions: None,
+                approval_policy: AskForApproval::OnFailure,
+                sandbox_policy: SandboxPolicy::NetworkAndFileWriteRestricted,
+            },
+        })
+        .await
+        .expect("failed to submit init");
+
+    // Drain events up to and including SessionConfigured so subsequent helper
+    // loops don't have to special‑case it.
+    loop {
+        let ev = timeout(Duration::from_secs(30), agent.next_event())
+            .await
+            .expect("timeout waiting for init event")
+            .expect("agent channel closed");
+        if matches!(ev.msg, EventMsg::SessionConfigured { .. }) {
+            break;
+        }
+    }
+
+    agent
+}
+
+/// Verifies that the agent streams incremental *AgentMessage* events **before**
+/// emitting `TaskComplete` and that a second task inside the same session does
+/// not get tripped up by a stale `previous_response_id`.
+#[ignore]
+#[tokio::test(flavor = "multi_thread", worker_threads = 2)]
+async fn live_streaming_and_prev_id_reset() {
+    if !api_key_available() {
+        eprintln!("skipping live_streaming_and_prev_id_reset – OPENAI_API_KEY not set");
+        return;
+    }
+
+    let codex = spawn_codex().await;
+
+    // ---------- Task 1 ----------
+    codex
+        .submit(Submission {
+            id: "task1".into(),
+            op: Op::UserInput {
+                items: vec![InputItem::Text {
+                    text: "Say the words 'stream test'".into(),
+                }],
+            },
+        })
+        .await
+        .unwrap();
+
+    let mut saw_message_before_complete = false;
+    loop {
+        let ev = timeout(Duration::from_secs(60), codex.next_event())
+            .await
+            .expect("timeout waiting for task1 events")
+            .expect("agent closed");
+
+        match ev.msg {
+            EventMsg::AgentMessage { .. } => saw_message_before_complete = true,
+            EventMsg::TaskComplete => break,
+            EventMsg::Error { message } => panic!("agent reported error in task1: {message}"),
+            _ => (),
+        }
+    }
+
+    assert!(
+        saw_message_before_complete,
+        "Agent did not stream any AgentMessage before TaskComplete"
+    );
+
+    // ---------- Task 2 (same session) ----------
+    codex
+        .submit(Submission {
+            id: "task2".into(),
+            op: Op::UserInput {
+                items: vec![InputItem::Text {
+                    text: "Respond with exactly: second turn succeeded".into(),
+                }],
+            },
+        })
+        .await
+        .unwrap();
+
+    let mut got_expected = false;
+    loop {
+        let ev = timeout(Duration::from_secs(60), codex.next_event())
+            .await
+            .expect("timeout waiting for task2 events")
+            .expect("agent closed");
+
+        match &ev.msg {
+            EventMsg::AgentMessage { message } if message.contains("second turn succeeded") => {
+                got_expected = true;
+            }
+            EventMsg::TaskComplete => break,
+            EventMsg::Error { message } => panic!("agent reported error in task2: {message}"),
+            _ => (),
+        }
+    }
+
+    assert!(got_expected, "second task did not receive expected answer");
+}
+
+/// Exercises a *function‑call → shell execution* round‑trip by instructing the
+/// model to run a harmless `echo` command.  The test asserts that:
+///   1. the function call is executed (we see `ExecCommandBegin`/`End` events)
+///   2. the captured stdout reaches the client unchanged.
+#[ignore]
+#[tokio::test(flavor = "multi_thread", worker_threads = 2)]
+async fn live_shell_function_call() {
+    if !api_key_available() {
+        eprintln!("skipping live_shell_function_call – OPENAI_API_KEY not set");
+        return;
+    }
+
+    let codex = spawn_codex().await;
+
+    const MARKER: &str = "codex_live_echo_ok";
+
+    codex
+        .submit(Submission {
+            id: "task_fn".into(),
+            op: Op::UserInput {
+                items: vec![InputItem::Text {
+                    text: format!(
+                        "Use the shell function to run the command `echo {MARKER}` and no other commands."
+                    ),
+                }],
+            },
+        })
+        .await
+        .unwrap();
+
+    let mut saw_begin = false;
+    let mut saw_end_with_output = false;
+
+    loop {
+        let ev = timeout(Duration::from_secs(60), codex.next_event())
+            .await
+            .expect("timeout waiting for function‑call events")
+            .expect("agent closed");
+
+        match ev.msg {
+            EventMsg::ExecCommandBegin { command, .. } => {
+                assert_eq!(command, vec!["echo", MARKER]);
+                saw_begin = true;
+            }
+            EventMsg::ExecCommandEnd {
+                stdout, exit_code, ..
+            } => {
+                assert_eq!(exit_code, 0, "echo returned non‑zero exit code");
+                assert!(stdout.contains(MARKER));
+                saw_end_with_output = true;
+            }
+            EventMsg::TaskComplete => break,
+            EventMsg::Error { message } => panic!("agent error during shell test: {message}"),
+            _ => (),
+        }
+    }
+
+    assert!(saw_begin, "ExecCommandBegin event missing");
+    assert!(
+        saw_end_with_output,
+        "ExecCommandEnd with expected output missing"
+    );
+}
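Each live test above repeats the same pattern: read events with a per-event timeout, collect what matters, stop at `TaskComplete`, and fail on `Error`. The sketch below shows that loop in isolation. It is an analogue only: it uses a `std::sync::mpsc` channel instead of `Codex::next_event()` and tokio's `timeout`, and the `Event` enum and `drain_until_complete` are hypothetical stand-ins for the real `EventMsg` type.

```rust
use std::sync::mpsc::Receiver;
use std::time::Duration;

/// Hypothetical stand-in for the subset of `EventMsg` the tests care about.
#[derive(Debug)]
enum Event {
    AgentMessage(String),
    TaskComplete,
    Error(String),
}

/// Reads events until `TaskComplete`, returning the agent messages seen so far.
/// Fails if the agent reports an error, the channel closes, or no event
/// arrives within `per_event`.
fn drain_until_complete(
    rx: &Receiver<Event>,
    per_event: Duration,
) -> Result<Vec<String>, String> {
    let mut messages = Vec::new();
    loop {
        match rx.recv_timeout(per_event) {
            Ok(Event::AgentMessage(text)) => messages.push(text),
            Ok(Event::TaskComplete) => return Ok(messages),
            Ok(Event::Error(message)) => return Err(format!("agent error: {message}")),
            // Covers both a timeout and a disconnected sender.
            Err(err) => return Err(format!("no event within {per_event:?}: {err}")),
        }
    }
}
```

Returning a `Result` rather than panicking inside the loop keeps the helper reusable; a test can still call `.expect(...)` on the result to get the same failure behavior as the inline loops.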
--- a/codex-rs/core/tests/live_cli.rs
+++ b/codex-rs/core/tests/live_cli.rs
@@ -0,0 +1,143 @@
+//! Optional smoke tests that hit the real OpenAI /v1/responses endpoint. They are `#[ignore]` by
+//! default so CI stays deterministic and free. Developers can run them locally with
+//! `cargo test --test live_cli -- --ignored` provided they set a valid `OPENAI_API_KEY`.
+
+use assert_cmd::prelude::*;
+use predicates::prelude::*;
+use std::process::Command;
+use std::process::Stdio;
+use tempfile::TempDir;
+
+fn require_api_key() -> String {
+    std::env::var("OPENAI_API_KEY")
+        .expect("OPENAI_API_KEY env var not set — skip running live tests")
+}
+
+/// Helper that spawns the binary inside a TempDir with minimal flags. Returns (Assert, TempDir).
+fn run_live(prompt: &str) -> (assert_cmd::assert::Assert, TempDir) {
+    use std::io::Read;
+    use std::io::Write;
+    use std::thread;
+
+    let dir = TempDir::new().unwrap();
+
+    // Build a plain `std::process::Command` so we have full control over the underlying stdio
+    // handles. `assert_cmd`’s own `Command` wrapper always forces stdout/stderr to be piped
+    // internally which prevents us from streaming them live to the terminal (see its `spawn`
+    // implementation). Instead we configure the std `Command` ourselves, then later hand the
+    // resulting `Output` to `assert_cmd` for the familiar assertions.
+
+    let mut cmd = Command::cargo_bin("codex-rs").unwrap();
+    cmd.current_dir(dir.path());
+    cmd.env("OPENAI_API_KEY", require_api_key());
+
+    // We want three things at once:
+    //   1. live streaming of the child’s stdout/stderr while the test is running
+    //   2. captured output so we can keep using assert_cmd’s `Assert` helpers
+    //   3. cross‑platform behavior (best effort)
+    //
+    // To get that we:
+    //   • set both stdout and stderr to `piped()` so we can read them programmatically
+    //   • spawn a thread for each stream that copies bytes into two sinks:
+    //       – the parent process’ stdout/stderr for live visibility
+    //       – an in‑memory buffer so we can pass it to `assert_cmd` later
+
+    // Pass the prompt after the `--` separator so it is parsed as a positional argument, not a flag.
+    cmd.arg("--allow-no-git-exec")
+        .arg("-v")
+        .arg("--")
+        .arg(prompt);
+
+    cmd.stdin(Stdio::piped());
+    cmd.stdout(Stdio::piped());
+    cmd.stderr(Stdio::piped());
+
+    let mut child = cmd.spawn().expect("failed to spawn codex-rs");
+
+    // Send the terminating newline so Session::run exits after the first turn.
+    child
+        .stdin
+        .as_mut()
+        .expect("child stdin unavailable")
+        .write_all(b"\n")
+        .expect("failed to write to child stdin");
+
+    // Helper that tees a ChildStdout/ChildStderr into both the parent’s stdio and a Vec<u8>.
+    fn tee<R: Read + Send + 'static>(
+        mut reader: R,
+        mut writer: impl Write + Send + 'static,
+    ) -> thread::JoinHandle<Vec<u8>> {
+        thread::spawn(move || {
+            let mut buf = Vec::new();
+            let mut chunk = [0u8; 4096];
+            loop {
+                match reader.read(&mut chunk) {
+                    Ok(0) => break,
+                    Ok(n) => {
+                        writer.write_all(&chunk[..n]).ok();
+                        writer.flush().ok();
+                        buf.extend_from_slice(&chunk[..n]);
+                    }
+                    Err(_) => break,
+                }
+            }
+            buf
+        })
+    }
+
+    let stdout_handle = tee(
+        child.stdout.take().expect("child stdout"),
+        std::io::stdout(),
+    );
+    let stderr_handle = tee(
+        child.stderr.take().expect("child stderr"),
+        std::io::stderr(),
+    );
+
+    let status = child.wait().expect("failed to wait on child");
+    let stdout = stdout_handle.join().expect("stdout thread panicked");
+    let stderr = stderr_handle.join().expect("stderr thread panicked");
+
+    let output = std::process::Output {
+        status,
+        stdout,
+        stderr,
+    };
+
+    (output.assert(), dir)
+}
+
+#[ignore]
+#[test]
+fn live_create_file_hello_txt() {
+    if std::env::var("OPENAI_API_KEY").is_err() {
+        eprintln!("skipping live_create_file_hello_txt – OPENAI_API_KEY not set");
+        return;
+    }
+
+    let (assert, dir) = run_live("Use the shell tool with the apply_patch command to create a file named hello.txt containing the text 'hello'.");
+
+    assert.success();
+
+    let path = dir.path().join("hello.txt");
+    assert!(path.exists(), "hello.txt was not created by the model");
+
+    let contents = std::fs::read_to_string(path).unwrap();
+
+    assert_eq!(contents.trim(), "hello");
+}
+
+#[ignore]
+#[test]
+fn live_print_working_directory() {
+    if std::env::var("OPENAI_API_KEY").is_err() {
+        eprintln!("skipping live_print_working_directory – OPENAI_API_KEY not set");
+        return;
+    }
+
+    let (assert, dir) = run_live("Print the current working directory using the shell function.");
+
+    assert
+        .success()
+        .stdout(predicate::str::contains(dir.path().to_string_lossy()));
+}
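The `tee` helper inside `run_live` is what makes "live streaming plus captured output" possible. The sketch below isolates the same idea so it can be tried without spawning a child process. It follows the structure of the helper in this patch but is a standalone illustration; the explicit `W` type parameter and the names `captured` and `chunk` are choices made here.

```rust
use std::io::Read;
use std::io::Write;
use std::thread;

/// Copies everything from `reader` into `writer` (for live visibility) while
/// also collecting the same bytes into a buffer, returned via the join handle.
fn tee<R, W>(mut reader: R, mut writer: W) -> thread::JoinHandle<Vec<u8>>
where
    R: Read + Send + 'static,
    W: Write + Send + 'static,
{
    thread::spawn(move || {
        let mut captured = Vec::new();
        let mut chunk = [0u8; 4096];
        loop {
            match reader.read(&mut chunk) {
                // EOF or a read error both end the copy loop.
                Ok(0) | Err(_) => break,
                Ok(n) => {
                    // Echo failures are ignored on purpose: losing the live
                    // view should not lose the captured bytes.
                    let _ = writer.write_all(&chunk[..n]);
                    let _ = writer.flush();
                    captured.extend_from_slice(&chunk[..n]);
                }
            }
        }
        captured
    })
}
```

Running one such thread per stream matters: if stdout and stderr were drained sequentially on a single thread, a child that fills the pipe of the stream not currently being read could block indefinitely.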
--- a/codex-rs/core/tests/previous_response_id.rs
+++ b/codex-rs/core/tests/previous_response_id.rs
@@ -0,0 +1,156 @@
+use std::time::Duration;
+
+use codex_core::protocol::AskForApproval;
+use codex_core::protocol::InputItem;
+use codex_core::protocol::Op;
+use codex_core::protocol::SandboxPolicy;
+use codex_core::protocol::Submission;
+use codex_core::Codex;
+use serde_json::Value;
+use tokio::time::timeout;
+use wiremock::matchers::method;
+use wiremock::matchers::path;
+use wiremock::Match;
+use wiremock::Mock;
+use wiremock::MockServer;
+use wiremock::Request;
+use wiremock::ResponseTemplate;
+
+/// Matcher asserting that JSON body has NO `previous_response_id` field.
+struct NoPrevId;
+
+impl Match for NoPrevId {
+    fn matches(&self, req: &Request) -> bool {
+        serde_json::from_slice::<Value>(&req.body)
+            .map(|v| v.get("previous_response_id").is_none())
+            .unwrap_or(false)
+    }
+}
+
+/// Matcher asserting that JSON body HAS a `previous_response_id` field.
+struct HasPrevId;
+
+impl Match for HasPrevId {
+    fn matches(&self, req: &Request) -> bool {
+        serde_json::from_slice::<Value>(&req.body)
+            .map(|v| v.get("previous_response_id").is_some())
+            .unwrap_or(false)
+    }
+}
+
+/// Build minimal SSE stream with completed marker.
+fn sse_completed(id: &str) -> String {
+    format!(
+        "event: response.completed\n\
+data: {{\"type\":\"response.completed\",\"response\":{{\"id\":\"{}\",\"output\":[]}}}}\n\n\n",
+        id
+    )
+}
+
+#[tokio::test(flavor = "multi_thread", worker_threads = 2)]
+async fn keeps_previous_response_id_between_tasks() {
+    // Mock server
+    let server = MockServer::start().await;
+
+    // First request – must NOT include `previous_response_id`.
+    let first = ResponseTemplate::new(200)
+        .insert_header("content-type", "text/event-stream")
+        .set_body_raw(sse_completed("resp1"), "text/event-stream");
+
+    Mock::given(method("POST"))
+        .and(path("/v1/responses"))
+        .and(NoPrevId)
+        .respond_with(first)
+        .expect(1)
+        .mount(&server)
+        .await;
+
+    // Second request – MUST include `previous_response_id`.
+    let second = ResponseTemplate::new(200)
+        .insert_header("content-type", "text/event-stream")
+        .set_body_raw(sse_completed("resp2"), "text/event-stream");
+
+    Mock::given(method("POST"))
+        .and(path("/v1/responses"))
+        .and(HasPrevId)
+        .respond_with(second)
+        .expect(1)
+        .mount(&server)
+        .await;
+
+    // Environment
+    std::env::set_var("OPENAI_API_KEY", "test-key");
+    std::env::set_var("OPENAI_API_BASE", server.uri());
+    std::env::set_var("OPENAI_REQUEST_MAX_RETRIES", "0");
+    std::env::set_var("OPENAI_STREAM_MAX_RETRIES", "0");
+
+    let codex = Codex::spawn(std::sync::Arc::new(tokio::sync::Notify::new())).unwrap();
+
+    // Init session
+    codex
+        .submit(Submission {
+            id: "init".into(),
+            op: Op::ConfigureSession {
+                model: None,
+                instructions: None,
+                approval_policy: AskForApproval::OnFailure,
+                sandbox_policy: SandboxPolicy::NetworkAndFileWriteRestricted,
+            },
+        })
+        .await
+        .unwrap();
+    // drain init event
+    let _ = codex.next_event().await.unwrap();
+
+    // Task 1 – triggers first request (no previous_response_id)
+    codex
+        .submit(Submission {
+            id: "task1".into(),
+            op: Op::UserInput {
+                items: vec![InputItem::Text {
+                    text: "hello".into(),
+                }],
+            },
+        })
+        .await
+        .unwrap();
+
+    // Wait for TaskComplete
+    loop {
+        let ev = timeout(Duration::from_secs(1), codex.next_event())
+            .await
+            .unwrap()
+            .unwrap();
+        if matches!(ev.msg, codex_core::protocol::EventMsg::TaskComplete) {
+            break;
+        }
+    }
+
+    // Task 2 – should include `previous_response_id` (triggers second request)
+    codex
+        .submit(Submission {
+            id: "task2".into(),
+            op: Op::UserInput {
+                items: vec![InputItem::Text {
+                    text: "again".into(),
+                }],
+            },
+        })
+        .await
+        .unwrap();
+
+    // Wait for TaskComplete or error
+    loop {
+        let ev = timeout(Duration::from_secs(1), codex.next_event())
+            .await
+            .unwrap()
+            .unwrap();
+        match ev.msg {
+            codex_core::protocol::EventMsg::TaskComplete => break,
+            codex_core::protocol::EventMsg::Error { message } => {
+                panic!("unexpected error: {message}")
+            }
+            _ => (),
+        }
+    }
+}
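To make the body served by the mock concrete, the sketch below reproduces `sse_completed` and adds a deliberately simplified splitter that turns a body into `(event, data)` pairs. `parse_sse` is hypothetical and exists only for illustration: it is not how `codex-core` or the `eventsource-stream` crate parse streams, and it ignores parts of the SSE format such as comments, `id:` fields, and multi-line `data:`.

```rust
/// Same shape as the `sse_completed` helper in the test above.
fn sse_completed(id: &str) -> String {
    format!(
        "event: response.completed\n\
data: {{\"type\":\"response.completed\",\"response\":{{\"id\":\"{id}\",\"output\":[]}}}}\n\n\n"
    )
}

/// Hypothetical, simplified splitter: events are separated by a blank line,
/// and only `event: ` and `data: ` lines are recognized.
fn parse_sse(body: &str) -> Vec<(String, String)> {
    let mut events = Vec::new();
    for block in body.split("\n\n") {
        let mut name = None;
        let mut data = String::new();
        for line in block.lines() {
            if let Some(rest) = line.strip_prefix("event: ") {
                name = Some(rest.to_string());
            } else if let Some(rest) = line.strip_prefix("data: ") {
                data.push_str(rest);
            }
        }
        // Blocks without an `event:` line (e.g. trailing newlines) are skipped.
        if let Some(name) = name {
            events.push((name, data));
        }
    }
    events
}
```

With this splitter, `sse_completed("resp1")` yields a single `response.completed` event whose data carries the response id, which is the id the second request is expected to echo back as `previous_response_id`.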
--- a/codex-rs/core/tests/stream_no_completed.rs
+++ b/codex-rs/core/tests/stream_no_completed.rs
@@ -0,0 +1,109 @@
+//! Verifies that the agent retries when the SSE stream terminates before
+//! delivering a `response.completed` event.
+
+use std::time::Duration;
+
+use codex_core::protocol::AskForApproval;
+use codex_core::protocol::InputItem;
+use codex_core::protocol::Op;
+use codex_core::protocol::SandboxPolicy;
+use codex_core::protocol::Submission;
+use codex_core::Codex;
+use tokio::time::timeout;
+use wiremock::matchers::method;
+use wiremock::matchers::path;
+use wiremock::Mock;
+use wiremock::MockServer;
+use wiremock::Request;
+use wiremock::Respond;
+use wiremock::ResponseTemplate;
+
+fn sse_incomplete() -> String {
+    // Only a single event; the `response.completed` event is missing.
+    "event: response.output_item.done\n\n".to_string()
+}
+
+fn sse_completed(id: &str) -> String {
+    format!(
+        "event: response.completed\n\
+data: {{\"type\":\"response.completed\",\"response\":{{\"id\":\"{}\",\"output\":[]}}}}\n\n\n",
+        id
+    )
+}
+
+#[tokio::test(flavor = "multi_thread", worker_threads = 2)]
+async fn retries_on_early_close() {
+    let server = MockServer::start().await;
+
+    struct SeqResponder;
+    impl Respond for SeqResponder {
+        fn respond(&self, _: &Request) -> ResponseTemplate {
+            use std::sync::atomic::AtomicUsize;
+            use std::sync::atomic::Ordering;
+            static CALLS: AtomicUsize = AtomicUsize::new(0);
+            let n = CALLS.fetch_add(1, Ordering::SeqCst);
+            if n == 0 {
+                ResponseTemplate::new(200)
+                    .insert_header("content-type", "text/event-stream")
+                    .set_body_raw(sse_incomplete(), "text/event-stream")
+            } else {
+                ResponseTemplate::new(200)
+                    .insert_header("content-type", "text/event-stream")
+                    .set_body_raw(sse_completed("resp_ok"), "text/event-stream")
+            }
+        }
+    }
+
+    Mock::given(method("POST"))
+        .and(path("/v1/responses"))
+        .respond_with(SeqResponder {})
+        .expect(2)
+        .mount(&server)
+        .await;
+
+    // Environment
+    std::env::set_var("OPENAI_API_KEY", "test-key");
+    std::env::set_var("OPENAI_API_BASE", server.uri());
+    std::env::set_var("OPENAI_REQUEST_MAX_RETRIES", "0");
+    std::env::set_var("OPENAI_STREAM_MAX_RETRIES", "1");
+    std::env::set_var("OPENAI_STREAM_IDLE_TIMEOUT_MS", "2000");
+
+    let codex = Codex::spawn(std::sync::Arc::new(tokio::sync::Notify::new())).unwrap();
+
+    codex
+        .submit(Submission {
+            id: "init".into(),
+            op: Op::ConfigureSession {
+                model: None,
+                instructions: None,
+                approval_policy: AskForApproval::OnFailure,
+                sandbox_policy: SandboxPolicy::NetworkAndFileWriteRestricted,
+            },
+        })
+        .await
+        .unwrap();
+    let _ = codex.next_event().await.unwrap();
+
+    codex
+        .submit(Submission {
+            id: "task".into(),
+            op: Op::UserInput {
+                items: vec![InputItem::Text {
+                    text: "hello".into(),
+                }],
+            },
+        })
+        .await
+        .unwrap();
+
+    // Wait until TaskComplete (should succeed after retry).
+    loop {
+        let ev = timeout(Duration::from_secs(10), codex.next_event())
+            .await
+            .unwrap()
+            .unwrap();
+        if matches!(ev.msg, codex_core::protocol::EventMsg::TaskComplete) {
+            break;
+        }
+    }
+}
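`SeqResponder` above counts calls with a function-local `static`, which is shared by every instance in the process. A possible alternative, sketched below under the assumption that per-instance state is preferable, keeps the counter as a field. The `Canned` enum is a hypothetical stand-in for the two `ResponseTemplate` values, and this `respond` takes no request argument, so it is not a drop-in `wiremock::Respond` implementation.

```rust
use std::sync::atomic::AtomicUsize;
use std::sync::atomic::Ordering;

/// Hypothetical stand-in for the two canned SSE bodies used by the test.
#[derive(Debug, PartialEq)]
enum Canned {
    Incomplete,
    Completed,
}

/// Responder whose call counter lives in the instance rather than in a
/// function-local `static`, so each mock starts counting from zero.
struct SeqResponder {
    calls: AtomicUsize,
}

impl SeqResponder {
    fn new() -> Self {
        Self {
            calls: AtomicUsize::new(0),
        }
    }

    /// First call returns the truncated stream; every later call returns the
    /// completed one. `&self` suffices because the counter is atomic.
    fn respond(&self) -> Canned {
        let n = self.calls.fetch_add(1, Ordering::SeqCst);
        if n == 0 {
            Canned::Incomplete
        } else {
            Canned::Completed
        }
    }
}
```

The behavior for a single mock is the same as in the test; the difference only shows up if a second responder is created in the same process, where the `static` version would never serve the incomplete body again.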