Files
llmx/codex-rs/protocol/src/protocol.rs

1326 lines
43 KiB
Rust
Raw Normal View History

//! Defines the protocol for a Codex session between a client and an agent.
//!
//! Uses a SQ (Submission Queue) / EQ (Event Queue) pattern to asynchronously communicate
//! between user and agent.
use std::collections::HashMap;
use std::fmt;
use std::path::Path;
use std::path::PathBuf;
use std::str::FromStr;
use std::time::Duration;
use crate::ConversationId;
use crate::config_types::ReasoningEffort as ReasoningEffortConfig;
use crate::config_types::ReasoningSummary as ReasoningSummaryConfig;
use crate::custom_prompts::CustomPrompt;
use crate::message_history::HistoryEntry;
use crate::models::ContentItem;
use crate::models::ResponseItem;
use crate::num_format::format_with_separators;
use crate::parse_command::ParsedCommand;
use crate::plan_tool::UpdatePlanArgs;
feat: support mcp_servers in config.toml (#829) This adds initial support for MCP servers in the style of Claude Desktop and Cursor. Note this PR is the bare minimum to get things working end to end: all configured MCP servers are launched every time Codex is run, there is no recovery for MCP servers that crash, etc. (Also, I took some shortcuts to change some fields of `Session` to be `pub(crate)`, which also means there are circular deps between `codex.rs` and `mcp_tool_call.rs`, but I will clean that up in a subsequent PR.) `codex-rs/README.md` is updated as part of this PR to explain how to use this feature. There is a bit of plumbing to route the new settings from `Config` to the business logic in `codex.rs`. The most significant chunks for new code are in `mcp_connection_manager.rs` (which defines the `McpConnectionManager` struct) and `mcp_tool_call.rs`, which is responsible for tool calls. This PR also introduces new `McpToolCallBegin` and `McpToolCallEnd` event types to the protocol, but does not add any handlers for them. (See https://github.com/openai/codex/pull/836 for initial usage.) To test, I added the following to my `~/.codex/config.toml`: ```toml # Local build of https://github.com/hideya/mcp-server-weather-js [mcp_servers.weather] command = "/Users/mbolin/code/mcp-server-weather-js/dist/index.js" args = [] ``` And then I ran the following: ``` codex-rs$ cargo run --bin codex exec 'what is the weather in san francisco' [2025-05-06T22:40:05] Task started: 1 [2025-05-06T22:40:18] Agent message: Here’s the latest National Weather Service forecast for San Francisco (downtown, near 37.77° N, 122.42° W): This Afternoon (Tue): • Sunny, high near 69 °F • West-southwest wind around 12 mph Tonight: • Partly cloudy, low around 52 °F • SW wind 7–10 mph ... ``` Note that Codex itself is not able to make network calls, so it would not normally be able to get live weather information like this. However, the weather MCP is [currently] not run under the Codex sandbox, so it is able to hit `api.weather.gov` and fetch current weather information. --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/829). * #836 * __->__ #829
2025-05-06 15:47:59 -07:00
use mcp_types::CallToolResult;
use mcp_types::Tool as McpTool;
use serde::Deserialize;
use serde::Serialize;
use serde_json::Value;
use serde_with::serde_as;
use strum_macros::Display;
2025-08-18 13:08:53 -07:00
use ts_rs::TS;
/// Open/close tags for special user-input blocks. Used across crates to avoid
/// duplicated hardcoded strings.
pub const USER_INSTRUCTIONS_OPEN_TAG: &str = "<user_instructions>";
pub const USER_INSTRUCTIONS_CLOSE_TAG: &str = "</user_instructions>";
pub const ENVIRONMENT_CONTEXT_OPEN_TAG: &str = "<environment_context>";
pub const ENVIRONMENT_CONTEXT_CLOSE_TAG: &str = "</environment_context>";
pub const USER_MESSAGE_BEGIN: &str = "## My request for Codex:";
/// Submission Queue Entry - requests from user
#[derive(Debug, Clone, Deserialize, Serialize)]
pub struct Submission {
/// Unique id for this Submission to correlate with Events
pub id: String,
/// Payload
pub op: Op,
}
/// Submission operation
feat: record messages from user in ~/.codex/history.jsonl (#939) This is a large change to support a "history" feature like you would expect in a shell like Bash. History events are recorded in `$CODEX_HOME/history.jsonl`. Because it is a JSONL file, it is straightforward to append new entries (as opposed to the TypeScript file that uses `$CODEX_HOME/history.json`, so to be valid JSON, each new entry entails rewriting the entire file). Because it is possible for there to be multiple instances of Codex CLI writing to `history.jsonl` at once, we use advisory file locking when working with `history.jsonl` in `codex-rs/core/src/message_history.rs`. Because we believe history is a sufficiently useful feature, we enable it by default. Though to provide some safety, we set the file permissions of `history.jsonl` to be `o600` so that other users on the system cannot read the user's history. We do not yet support a default list of `SENSITIVE_PATTERNS` as the TypeScript CLI does: https://github.com/openai/codex/blob/3fdf9df1335ac9501e3fb0e61715359145711e8b/codex-cli/src/utils/storage/command-history.ts#L10-L17 We are going to take a more conservative approach to this list in the Rust CLI. For example, while `/\b[A-Za-z0-9-_]{20,}\b/` might exclude sensitive information like API tokens, it would also exclude valuable information such as references to Git commits. As noted in the updated documentation, users can opt-out of history by adding the following to `config.toml`: ```toml [history] persistence = "none" ``` Because `history.jsonl` could, in theory, be quite large, we take a[n arguably overly pedantic] approach in reading history entries into memory. Specifically, we start by telling the client the current number of entries in the history file (`history_entry_count`) as well as the inode (`history_log_id`) of `history.jsonl` (see the new fields on `SessionConfiguredEvent`). The client is responsible for keeping new entries in memory to create a "local history," but if the user hits up enough times to go "past" the end of local history, then the client should use the new `GetHistoryEntryRequest` in the protocol to fetch older entries. Specifically, it should pass the `history_log_id` it was given originally and work backwards from `history_entry_count`. (It should really fetch history in batches rather than one-at-a-time, but that is something we can improve upon in subsequent PRs.) The motivation behind this crazy scheme is that it is designed to defend against: * The `history.jsonl` being truncated during the session such that the index into the history is no longer consistent with what had been read up to that point. We do not yet have logic to enforce a `max_bytes` for `history.jsonl`, but once we do, we will aspire to implement it in a way that should result in a new inode for the file on most systems. * New items from concurrent Codex CLI sessions amending to the history. Because, in absence of truncation, `history.jsonl` is an append-only log, so long as the client reads backwards from `history_entry_count`, it should always get a consistent view of history. (That said, it will not be able to read _new_ commands from concurrent sessions, but perhaps we will introduce a `/` command to reload latest history or something down the road.) Admittedly, my testing of this feature thus far has been fairly light. I expect we will find bugs and introduce enhancements/fixes going forward.
2025-05-15 16:26:23 -07:00
#[derive(Debug, Clone, Deserialize, Serialize, PartialEq)]
#[serde(tag = "type", rename_all = "snake_case")]
feat: support the chat completions API in the Rust CLI (#862) This is a substantial PR to add support for the chat completions API, which in turn makes it possible to use non-OpenAI model providers (just like in the TypeScript CLI): * It moves a number of structs from `client.rs` to `client_common.rs` so they can be shared. * It introduces support for the chat completions API in `chat_completions.rs`. * It updates `ModelProviderInfo` so that `env_key` is `Option<String>` instead of `String` (for e.g., ollama) and adds a `wire_api` field * It updates `client.rs` to choose between `stream_responses()` and `stream_chat_completions()` based on the `wire_api` for the `ModelProviderInfo` * It updates the `exec` and TUI CLIs to no longer fail if the `OPENAI_API_KEY` environment variable is not set * It updates the TUI so that `EventMsg::Error` is displayed more prominently when it occurs, particularly now that it is important to alert users to the `CodexErr::EnvVar` variant. * `CodexErr::EnvVar` was updated to include an optional `instructions` field so we can preserve the behavior where we direct users to https://platform.openai.com if `OPENAI_API_KEY` is not set. * Cleaned up the "welcome message" in the TUI to ensure the model provider is displayed. * Updated the docs in `codex-rs/README.md`. To exercise the chat completions API from OpenAI models, I added the following to my `config.toml`: ```toml model = "gpt-4o" model_provider = "openai-chat-completions" [model_providers.openai-chat-completions] name = "OpenAI using Chat Completions" base_url = "https://api.openai.com/v1" env_key = "OPENAI_API_KEY" wire_api = "chat" ``` Though to test a non-OpenAI provider, I installed ollama with mistral locally on my Mac because ChatGPT said that would be a good match for my hardware: ```shell brew install ollama ollama serve ollama pull mistral ``` Then I added the following to my `~/.codex/config.toml`: ```toml model = "mistral" model_provider = "ollama" ``` Note this code could certainly use more test coverage, but I want to get this in so folks can start playing with it. For reference, I believe https://github.com/openai/codex/pull/247 was roughly the comparable PR on the TypeScript side.
2025-05-08 21:46:06 -07:00
#[allow(clippy::large_enum_variant)]
#[non_exhaustive]
pub enum Op {
/// Abort current task.
/// This server sends [`EventMsg::TurnAborted`] in response.
Interrupt,
/// Input from the user
UserInput {
/// User input items, see `InputItem`
items: Vec<InputItem>,
},
/// Similar to [`Op::UserInput`], but contains additional context required
/// for a turn of a [`crate::codex_conversation::CodexConversation`].
UserTurn {
/// User input items, see `InputItem`
items: Vec<InputItem>,
/// `cwd` to use with the [`SandboxPolicy`] and potentially tool calls
/// such as `local_shell`.
cwd: PathBuf,
/// Policy to use for command approval.
approval_policy: AskForApproval,
/// Policy to use for tool calls such as `local_shell`.
sandbox_policy: SandboxPolicy,
/// Must be a valid model slug for the [`crate::client::ModelClient`]
/// associated with this conversation.
model: String,
/// Will only be honored if the model is configured to use reasoning.
#[serde(skip_serializing_if = "Option::is_none")]
effort: Option<ReasoningEffortConfig>,
/// Will only be honored if the model is configured to use reasoning.
summary: ReasoningSummaryConfig,
// The JSON schema to use for the final assistant message
final_output_json_schema: Option<Value>,
},
/// Override parts of the persistent turn context for subsequent turns.
///
/// All fields are optional; when omitted, the existing value is preserved.
/// This does not enqueue any input it only updates defaults used for
/// future `UserInput` turns.
OverrideTurnContext {
/// Updated `cwd` for sandbox/tool calls.
#[serde(skip_serializing_if = "Option::is_none")]
cwd: Option<PathBuf>,
/// Updated command approval policy.
#[serde(skip_serializing_if = "Option::is_none")]
approval_policy: Option<AskForApproval>,
/// Updated sandbox policy for tool calls.
#[serde(skip_serializing_if = "Option::is_none")]
sandbox_policy: Option<SandboxPolicy>,
/// Updated model slug. When set, the model family is derived
/// automatically.
#[serde(skip_serializing_if = "Option::is_none")]
model: Option<String>,
/// Updated reasoning effort (honored only for reasoning-capable models).
///
/// Use `Some(Some(_))` to set a specific effort, `Some(None)` to clear
/// the effort, or `None` to leave the existing value unchanged.
#[serde(skip_serializing_if = "Option::is_none")]
effort: Option<Option<ReasoningEffortConfig>>,
/// Updated reasoning summary preference (honored only for reasoning-capable models).
#[serde(skip_serializing_if = "Option::is_none")]
summary: Option<ReasoningSummaryConfig>,
},
/// Approve a command execution
ExecApproval {
/// The id of the submission we are approving
id: String,
/// The user's decision in response to the request.
decision: ReviewDecision,
},
/// Approve a code patch
PatchApproval {
/// The id of the submission we are approving
id: String,
/// The user's decision in response to the request.
decision: ReviewDecision,
},
feat: record messages from user in ~/.codex/history.jsonl (#939) This is a large change to support a "history" feature like you would expect in a shell like Bash. History events are recorded in `$CODEX_HOME/history.jsonl`. Because it is a JSONL file, it is straightforward to append new entries (as opposed to the TypeScript file that uses `$CODEX_HOME/history.json`, so to be valid JSON, each new entry entails rewriting the entire file). Because it is possible for there to be multiple instances of Codex CLI writing to `history.jsonl` at once, we use advisory file locking when working with `history.jsonl` in `codex-rs/core/src/message_history.rs`. Because we believe history is a sufficiently useful feature, we enable it by default. Though to provide some safety, we set the file permissions of `history.jsonl` to be `o600` so that other users on the system cannot read the user's history. We do not yet support a default list of `SENSITIVE_PATTERNS` as the TypeScript CLI does: https://github.com/openai/codex/blob/3fdf9df1335ac9501e3fb0e61715359145711e8b/codex-cli/src/utils/storage/command-history.ts#L10-L17 We are going to take a more conservative approach to this list in the Rust CLI. For example, while `/\b[A-Za-z0-9-_]{20,}\b/` might exclude sensitive information like API tokens, it would also exclude valuable information such as references to Git commits. As noted in the updated documentation, users can opt-out of history by adding the following to `config.toml`: ```toml [history] persistence = "none" ``` Because `history.jsonl` could, in theory, be quite large, we take a[n arguably overly pedantic] approach in reading history entries into memory. Specifically, we start by telling the client the current number of entries in the history file (`history_entry_count`) as well as the inode (`history_log_id`) of `history.jsonl` (see the new fields on `SessionConfiguredEvent`). The client is responsible for keeping new entries in memory to create a "local history," but if the user hits up enough times to go "past" the end of local history, then the client should use the new `GetHistoryEntryRequest` in the protocol to fetch older entries. Specifically, it should pass the `history_log_id` it was given originally and work backwards from `history_entry_count`. (It should really fetch history in batches rather than one-at-a-time, but that is something we can improve upon in subsequent PRs.) The motivation behind this crazy scheme is that it is designed to defend against: * The `history.jsonl` being truncated during the session such that the index into the history is no longer consistent with what had been read up to that point. We do not yet have logic to enforce a `max_bytes` for `history.jsonl`, but once we do, we will aspire to implement it in a way that should result in a new inode for the file on most systems. * New items from concurrent Codex CLI sessions amending to the history. Because, in absence of truncation, `history.jsonl` is an append-only log, so long as the client reads backwards from `history_entry_count`, it should always get a consistent view of history. (That said, it will not be able to read _new_ commands from concurrent sessions, but perhaps we will introduce a `/` command to reload latest history or something down the road.) Admittedly, my testing of this feature thus far has been fairly light. I expect we will find bugs and introduce enhancements/fixes going forward.
2025-05-15 16:26:23 -07:00
/// Append an entry to the persistent cross-session message history.
///
/// Note the entry is not guaranteed to be logged if the user has
/// history disabled, it matches the list of "sensitive" patterns, etc.
AddToHistory {
/// The message text to be stored.
text: String,
},
/// Request a single history entry identified by `log_id` + `offset`.
GetHistoryEntryRequest { offset: usize, log_id: u64 },
/// Request the full in-memory conversation transcript for the current session.
/// Reply is delivered via `EventMsg::ConversationHistory`.
GetPath,
/// Request the list of MCP tools available across all configured servers.
/// Reply is delivered via `EventMsg::McpListToolsResponse`.
ListMcpTools,
/// Request the list of available custom prompts.
ListCustomPrompts,
/// Request the agent to summarize the current conversation context.
/// The agent will use its existing context (either conversation history or previous response id)
/// to generate a summary which will be returned as an AgentMessage event.
Compact,
Review Mode (Core) (#3401) ## 📝 Review Mode -- Core This PR introduces the Core implementation for Review mode: - New op `Op::Review { prompt: String }:` spawns a child review task with isolated context, a review‑specific system prompt, and a `Config.review_model`. - `EnteredReviewMode`: emitted when the child review session starts. Every event from this point onwards reflects the review session. - `ExitedReviewMode(Option<ReviewOutputEvent>)`: emitted when the review finishes or is interrupted, with optional structured findings: ```json { "findings": [ { "title": "<≤ 80 chars, imperative>", "body": "<valid Markdown explaining *why* this is a problem; cite files/lines/functions>", "confidence_score": <float 0.0-1.0>, "priority": <int 0-3>, "code_location": { "absolute_file_path": "<file path>", "line_range": {"start": <int>, "end": <int>} } } ], "overall_correctness": "patch is correct" | "patch is incorrect", "overall_explanation": "<1-3 sentence explanation justifying the overall_correctness verdict>", "overall_confidence_score": <float 0.0-1.0> } ``` ## Questions ### Why separate out its own message history? We want the review thread to match the training of our review models as much as possible -- that means using a custom prompt, removing user instructions, and starting a clean chat history. We also want to make sure the review thread doesn't leak into the parent thread. ### Why do this as a mode, vs. sub-agents? 1. We want review to be a synchronous task, so it's fine for now to do a bespoke implementation. 2. We're still unclear about the final structure for sub-agents. We'd prefer to land this quickly and then refactor into sub-agents without rushing that implementation.
2025-09-12 16:25:10 -07:00
/// Request a code review from the agent.
Review { review_request: ReviewRequest },
/// Request to shut down codex instance.
Shutdown,
}
/// Determines the conditions under which the user is consulted to approve
/// running the command proposed by Codex.
2025-08-18 13:08:53 -07:00
#[derive(Debug, Clone, Copy, Default, PartialEq, Eq, Hash, Serialize, Deserialize, Display, TS)]
#[serde(rename_all = "kebab-case")]
#[strum(serialize_all = "kebab-case")]
pub enum AskForApproval {
/// Under this policy, only "known safe" commands—as determined by
/// `is_safe_command()`—that **only read files** are autoapproved.
/// Everything else will ask the user to approve.
#[serde(rename = "untrusted")]
#[strum(serialize = "untrusted")]
UnlessTrusted,
/// *All* commands are autoapproved, but they are expected to run inside a
/// sandbox where network access is disabled and writes are confined to a
/// specific set of paths. If the command fails, it will be escalated to
/// the user to approve execution without a sandbox.
OnFailure,
/// The model decides when to ask the user for approval.
#[default]
OnRequest,
/// Never ask the user to approve commands. Failures are immediately returned
/// to the model, and never escalated to the user for approval.
Never,
}
/// Determines execution restrictions for model shell commands.
2025-08-18 13:08:53 -07:00
#[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize, Display, TS)]
#[strum(serialize_all = "kebab-case")]
#[serde(tag = "mode", rename_all = "kebab-case")]
pub enum SandboxPolicy {
/// No restrictions whatsoever. Use with caution.
#[serde(rename = "danger-full-access")]
DangerFullAccess,
/// Read-only access to the entire file-system.
#[serde(rename = "read-only")]
ReadOnly,
/// Same as `ReadOnly` but additionally grants write access to the current
/// working directory ("workspace").
#[serde(rename = "workspace-write")]
WorkspaceWrite {
/// Additional folders (beyond cwd and possibly TMPDIR) that should be
/// writable from within the sandbox.
#[serde(default, skip_serializing_if = "Vec::is_empty")]
writable_roots: Vec<PathBuf>,
/// When set to `true`, outbound network access is allowed. `false` by
/// default.
#[serde(default)]
network_access: bool,
/// When set to `true`, will NOT include the per-user `TMPDIR`
/// environment variable among the default writable roots. Defaults to
/// `false`.
#[serde(default)]
exclude_tmpdir_env_var: bool,
/// When set to `true`, will NOT include the `/tmp` among the default
/// writable roots on UNIX. Defaults to `false`.
#[serde(default)]
exclude_slash_tmp: bool,
},
fix: overhaul SandboxPolicy and config loading in Rust (#732) Previous to this PR, `SandboxPolicy` was a bit difficult to work with: https://github.com/openai/codex/blob/237f8a11e11fdcc793a09e787e48215676d9b95b/codex-rs/core/src/protocol.rs#L98-L108 Specifically: * It was an `enum` and therefore options were mutually exclusive as opposed to additive. * It defined things in terms of what the agent _could not_ do as opposed to what they _could_ do. This made things hard to support because we would prefer to build up a sandbox config by starting with something extremely restrictive and only granting permissions for things the user as explicitly allowed. This PR changes things substantially by redefining the policy in terms of two concepts: * A `SandboxPermission` enum that defines permissions that can be granted to the agent/sandbox. * A `SandboxPolicy` that internally stores a `Vec<SandboxPermission>`, but externally exposes a simpler API that can be used to configure Seatbelt/Landlock. Previous to this PR, we supported a `--sandbox` flag that effectively mapped to an enum value in `SandboxPolicy`. Though now that `SandboxPolicy` is a wrapper around `Vec<SandboxPermission>`, the single `--sandbox` flag no longer makes sense. While I could have turned it into a flag that the user can specify multiple times, I think the current values to use with such a flag are long and potentially messy, so for the moment, I have dropped support for `--sandbox` altogether and we can bring it back once we have figured out the naming thing. Since `--sandbox` is gone, users now have to specify `--full-auto` to get a sandbox that allows writes in `cwd`. Admittedly, there is no clean way to specify the equivalent of `--full-auto` in your `config.toml` right now, so we will have to revisit that, as well. Because `Config` presents a `SandboxPolicy` field and `SandboxPolicy` changed considerably, I had to overhaul how config loading works, as well. There are now two distinct concepts, `ConfigToml` and `Config`: * `ConfigToml` is the deserialization of `~/.codex/config.toml`. As one might expect, every field is `Optional` and it is `#[derive(Deserialize, Default)]`. Consistent use of `Optional` makes it clear what the user has specified explicitly. * `Config` is the "normalized config" and is produced by merging `ConfigToml` with `ConfigOverrides`. Where `ConfigToml` contains a raw `Option<Vec<SandboxPermission>>`, `Config` presents only the final `SandboxPolicy`. The changes to `core/src/exec.rs` and `core/src/linux.rs` merit extra special attention to ensure we are faithfully mapping the `SandboxPolicy` to the Seatbelt and Landlock configs, respectively. Also, take note that `core/src/seatbelt_readonly_policy.sbpl` has been renamed to `codex-rs/core/src/seatbelt_base_policy.sbpl` and that `(allow file-read*)` has been removed from the `.sbpl` file as now this is added to the policy in `core/src/exec.rs` when `sandbox_policy.has_full_disk_read_access()` is `true`.
2025-04-29 15:01:16 -07:00
}
feat: make .git read-only within a writable root when using Seatbelt (#1765) To make `--full-auto` safer, this PR updates the Seatbelt policy so that a `SandboxPolicy` with a `writable_root` that contains a `.git/` _directory_ will make `.git/` _read-only_ (though as a follow-up, we should also consider the case where `.git` is a _file_ with a `gitdir: /path/to/actual/repo/.git` entry that should also be protected). The two major changes in this PR: - Updating `SandboxPolicy::get_writable_roots_with_cwd()` to return a `Vec<WritableRoot>` instead of a `Vec<PathBuf>` where a `WritableRoot` can specify a list of read-only subpaths. - Updating `create_seatbelt_command_args()` to honor the read-only subpaths in `WritableRoot`. The logic to update the policy is a fairly straightforward update to `create_seatbelt_command_args()`, but perhaps the more interesting part of this PR is the introduction of an integration test in `tests/sandbox.rs`. Leveraging the new API in #1785, we test `SandboxPolicy` under various conditions, including ones where `$TMPDIR` is not readable, which is critical for verifying the new behavior. To ensure that Codex can run its own tests, e.g.: ``` just codex debug seatbelt --full-auto -- cargo test if_git_repo_is_writable_root_then_dot_git_folder_is_read_only ``` I had to introduce the use of `CODEX_SANDBOX=sandbox`, which is comparable to how `CODEX_SANDBOX_NETWORK_DISABLED=1` was already being used. Adding a comparable change for Landlock will be done in a subsequent PR.
2025-08-01 16:11:24 -07:00
/// A writable root path accompanied by a list of subpaths that should remain
/// readonly even when the root is writable. This is primarily used to ensure
/// toplevel VCS metadata directories (e.g. `.git`) under a writable root are
/// not modified by the agent.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct WritableRoot {
fix: tighten up checks against writable folders for SandboxPolicy (#2338) I was looking at the implementation of `Session::get_writable_roots()`, which did not seem right, as it was a copy of writable roots, which is not guaranteed to be in sync with the `sandbox_policy` field. I looked at who was calling `get_writable_roots()` and its only call site was `apply_patch()` in `codex-rs/core/src/apply_patch.rs`, which took the roots and forwarded them to `assess_patch_safety()` in `safety.rs`. I updated `assess_patch_safety()` to take `sandbox_policy: &SandboxPolicy` instead of `writable_roots: &[PathBuf]` (and replaced `Session::get_writable_roots()` with `Session::get_sandbox_policy()`). Within `safety.rs`, it was fairly easy to update `is_write_patch_constrained_to_writable_paths()` to work with `SandboxPolicy`, and in particular, it is far more accurate because, for better or worse, `SandboxPolicy::get_writable_roots_with_cwd()` _returns an empty vec_ for `SandboxPolicy::DangerFullAccess`, suggesting that _nothing_ is writable when in reality _everything_ is writable. With this PR, `is_write_patch_constrained_to_writable_paths()` now does the right thing for each variant of `SandboxPolicy`. I thought this would be the end of the story, but it turned out that `test_writable_roots_constraint()` in `safety.rs` needed to be updated, as well. In particular, the test was writing to `std::env::current_dir()` instead of a `TempDir`, which I suspect was a holdover from earlier when `SandboxPolicy::WorkspaceWrite` would always make `TMPDIR` writable on macOS, which made it hard to write tests to verify `SandboxPolicy` in `TMPDIR`. Fortunately, we now have `exclude_tmpdir_env_var` as an option on `SandboxPolicy::WorkspaceWrite`, so I was able to update the test to preserve the existing behavior, but to no longer write to `std::env::current_dir()`. --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/2338). * #2345 * #2329 * #2343 * #2340 * __->__ #2338
2025-08-15 09:06:15 -07:00
/// Absolute path, by construction.
feat: make .git read-only within a writable root when using Seatbelt (#1765) To make `--full-auto` safer, this PR updates the Seatbelt policy so that a `SandboxPolicy` with a `writable_root` that contains a `.git/` _directory_ will make `.git/` _read-only_ (though as a follow-up, we should also consider the case where `.git` is a _file_ with a `gitdir: /path/to/actual/repo/.git` entry that should also be protected). The two major changes in this PR: - Updating `SandboxPolicy::get_writable_roots_with_cwd()` to return a `Vec<WritableRoot>` instead of a `Vec<PathBuf>` where a `WritableRoot` can specify a list of read-only subpaths. - Updating `create_seatbelt_command_args()` to honor the read-only subpaths in `WritableRoot`. The logic to update the policy is a fairly straightforward update to `create_seatbelt_command_args()`, but perhaps the more interesting part of this PR is the introduction of an integration test in `tests/sandbox.rs`. Leveraging the new API in #1785, we test `SandboxPolicy` under various conditions, including ones where `$TMPDIR` is not readable, which is critical for verifying the new behavior. To ensure that Codex can run its own tests, e.g.: ``` just codex debug seatbelt --full-auto -- cargo test if_git_repo_is_writable_root_then_dot_git_folder_is_read_only ``` I had to introduce the use of `CODEX_SANDBOX=sandbox`, which is comparable to how `CODEX_SANDBOX_NETWORK_DISABLED=1` was already being used. Adding a comparable change for Landlock will be done in a subsequent PR.
2025-08-01 16:11:24 -07:00
pub root: PathBuf,
fix: tighten up checks against writable folders for SandboxPolicy (#2338) I was looking at the implementation of `Session::get_writable_roots()`, which did not seem right, as it was a copy of writable roots, which is not guaranteed to be in sync with the `sandbox_policy` field. I looked at who was calling `get_writable_roots()` and its only call site was `apply_patch()` in `codex-rs/core/src/apply_patch.rs`, which took the roots and forwarded them to `assess_patch_safety()` in `safety.rs`. I updated `assess_patch_safety()` to take `sandbox_policy: &SandboxPolicy` instead of `writable_roots: &[PathBuf]` (and replaced `Session::get_writable_roots()` with `Session::get_sandbox_policy()`). Within `safety.rs`, it was fairly easy to update `is_write_patch_constrained_to_writable_paths()` to work with `SandboxPolicy`, and in particular, it is far more accurate because, for better or worse, `SandboxPolicy::get_writable_roots_with_cwd()` _returns an empty vec_ for `SandboxPolicy::DangerFullAccess`, suggesting that _nothing_ is writable when in reality _everything_ is writable. With this PR, `is_write_patch_constrained_to_writable_paths()` now does the right thing for each variant of `SandboxPolicy`. I thought this would be the end of the story, but it turned out that `test_writable_roots_constraint()` in `safety.rs` needed to be updated, as well. In particular, the test was writing to `std::env::current_dir()` instead of a `TempDir`, which I suspect was a holdover from earlier when `SandboxPolicy::WorkspaceWrite` would always make `TMPDIR` writable on macOS, which made it hard to write tests to verify `SandboxPolicy` in `TMPDIR`. Fortunately, we now have `exclude_tmpdir_env_var` as an option on `SandboxPolicy::WorkspaceWrite`, so I was able to update the test to preserve the existing behavior, but to no longer write to `std::env::current_dir()`. --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/2338). * #2345 * #2329 * #2343 * #2340 * __->__ #2338
2025-08-15 09:06:15 -07:00
/// Also absolute paths, by construction.
feat: make .git read-only within a writable root when using Seatbelt (#1765) To make `--full-auto` safer, this PR updates the Seatbelt policy so that a `SandboxPolicy` with a `writable_root` that contains a `.git/` _directory_ will make `.git/` _read-only_ (though as a follow-up, we should also consider the case where `.git` is a _file_ with a `gitdir: /path/to/actual/repo/.git` entry that should also be protected). The two major changes in this PR: - Updating `SandboxPolicy::get_writable_roots_with_cwd()` to return a `Vec<WritableRoot>` instead of a `Vec<PathBuf>` where a `WritableRoot` can specify a list of read-only subpaths. - Updating `create_seatbelt_command_args()` to honor the read-only subpaths in `WritableRoot`. The logic to update the policy is a fairly straightforward update to `create_seatbelt_command_args()`, but perhaps the more interesting part of this PR is the introduction of an integration test in `tests/sandbox.rs`. Leveraging the new API in #1785, we test `SandboxPolicy` under various conditions, including ones where `$TMPDIR` is not readable, which is critical for verifying the new behavior. To ensure that Codex can run its own tests, e.g.: ``` just codex debug seatbelt --full-auto -- cargo test if_git_repo_is_writable_root_then_dot_git_folder_is_read_only ``` I had to introduce the use of `CODEX_SANDBOX=sandbox`, which is comparable to how `CODEX_SANDBOX_NETWORK_DISABLED=1` was already being used. Adding a comparable change for Landlock will be done in a subsequent PR.
2025-08-01 16:11:24 -07:00
pub read_only_subpaths: Vec<PathBuf>,
}
fix: tighten up checks against writable folders for SandboxPolicy (#2338) I was looking at the implementation of `Session::get_writable_roots()`, which did not seem right, as it was a copy of writable roots, which is not guaranteed to be in sync with the `sandbox_policy` field. I looked at who was calling `get_writable_roots()` and its only call site was `apply_patch()` in `codex-rs/core/src/apply_patch.rs`, which took the roots and forwarded them to `assess_patch_safety()` in `safety.rs`. I updated `assess_patch_safety()` to take `sandbox_policy: &SandboxPolicy` instead of `writable_roots: &[PathBuf]` (and replaced `Session::get_writable_roots()` with `Session::get_sandbox_policy()`). Within `safety.rs`, it was fairly easy to update `is_write_patch_constrained_to_writable_paths()` to work with `SandboxPolicy`, and in particular, it is far more accurate because, for better or worse, `SandboxPolicy::get_writable_roots_with_cwd()` _returns an empty vec_ for `SandboxPolicy::DangerFullAccess`, suggesting that _nothing_ is writable when in reality _everything_ is writable. With this PR, `is_write_patch_constrained_to_writable_paths()` now does the right thing for each variant of `SandboxPolicy`. I thought this would be the end of the story, but it turned out that `test_writable_roots_constraint()` in `safety.rs` needed to be updated, as well. In particular, the test was writing to `std::env::current_dir()` instead of a `TempDir`, which I suspect was a holdover from earlier when `SandboxPolicy::WorkspaceWrite` would always make `TMPDIR` writable on macOS, which made it hard to write tests to verify `SandboxPolicy` in `TMPDIR`. Fortunately, we now have `exclude_tmpdir_env_var` as an option on `SandboxPolicy::WorkspaceWrite`, so I was able to update the test to preserve the existing behavior, but to no longer write to `std::env::current_dir()`. --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/2338). * #2345 * #2329 * #2343 * #2340 * __->__ #2338
2025-08-15 09:06:15 -07:00
impl WritableRoot {
pub fn is_path_writable(&self, path: &Path) -> bool {
fix: tighten up checks against writable folders for SandboxPolicy (#2338) I was looking at the implementation of `Session::get_writable_roots()`, which did not seem right, as it was a copy of writable roots, which is not guaranteed to be in sync with the `sandbox_policy` field. I looked at who was calling `get_writable_roots()` and its only call site was `apply_patch()` in `codex-rs/core/src/apply_patch.rs`, which took the roots and forwarded them to `assess_patch_safety()` in `safety.rs`. I updated `assess_patch_safety()` to take `sandbox_policy: &SandboxPolicy` instead of `writable_roots: &[PathBuf]` (and replaced `Session::get_writable_roots()` with `Session::get_sandbox_policy()`). Within `safety.rs`, it was fairly easy to update `is_write_patch_constrained_to_writable_paths()` to work with `SandboxPolicy`, and in particular, it is far more accurate because, for better or worse, `SandboxPolicy::get_writable_roots_with_cwd()` _returns an empty vec_ for `SandboxPolicy::DangerFullAccess`, suggesting that _nothing_ is writable when in reality _everything_ is writable. With this PR, `is_write_patch_constrained_to_writable_paths()` now does the right thing for each variant of `SandboxPolicy`. I thought this would be the end of the story, but it turned out that `test_writable_roots_constraint()` in `safety.rs` needed to be updated, as well. In particular, the test was writing to `std::env::current_dir()` instead of a `TempDir`, which I suspect was a holdover from earlier when `SandboxPolicy::WorkspaceWrite` would always make `TMPDIR` writable on macOS, which made it hard to write tests to verify `SandboxPolicy` in `TMPDIR`. Fortunately, we now have `exclude_tmpdir_env_var` as an option on `SandboxPolicy::WorkspaceWrite`, so I was able to update the test to preserve the existing behavior, but to no longer write to `std::env::current_dir()`. --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/2338). * #2345 * #2329 * #2343 * #2340 * __->__ #2338
2025-08-15 09:06:15 -07:00
// Check if the path is under the root.
if !path.starts_with(&self.root) {
return false;
}
// Check if the path is under any of the read-only subpaths.
for subpath in &self.read_only_subpaths {
if path.starts_with(subpath) {
return false;
}
}
true
}
}
impl FromStr for SandboxPolicy {
type Err = serde_json::Error;
fn from_str(s: &str) -> Result<Self, Self::Err> {
serde_json::from_str(s)
fix: overhaul SandboxPolicy and config loading in Rust (#732) Previous to this PR, `SandboxPolicy` was a bit difficult to work with: https://github.com/openai/codex/blob/237f8a11e11fdcc793a09e787e48215676d9b95b/codex-rs/core/src/protocol.rs#L98-L108 Specifically: * It was an `enum` and therefore options were mutually exclusive as opposed to additive. * It defined things in terms of what the agent _could not_ do as opposed to what they _could_ do. This made things hard to support because we would prefer to build up a sandbox config by starting with something extremely restrictive and only granting permissions for things the user as explicitly allowed. This PR changes things substantially by redefining the policy in terms of two concepts: * A `SandboxPermission` enum that defines permissions that can be granted to the agent/sandbox. * A `SandboxPolicy` that internally stores a `Vec<SandboxPermission>`, but externally exposes a simpler API that can be used to configure Seatbelt/Landlock. Previous to this PR, we supported a `--sandbox` flag that effectively mapped to an enum value in `SandboxPolicy`. Though now that `SandboxPolicy` is a wrapper around `Vec<SandboxPermission>`, the single `--sandbox` flag no longer makes sense. While I could have turned it into a flag that the user can specify multiple times, I think the current values to use with such a flag are long and potentially messy, so for the moment, I have dropped support for `--sandbox` altogether and we can bring it back once we have figured out the naming thing. Since `--sandbox` is gone, users now have to specify `--full-auto` to get a sandbox that allows writes in `cwd`. Admittedly, there is no clean way to specify the equivalent of `--full-auto` in your `config.toml` right now, so we will have to revisit that, as well. Because `Config` presents a `SandboxPolicy` field and `SandboxPolicy` changed considerably, I had to overhaul how config loading works, as well. There are now two distinct concepts, `ConfigToml` and `Config`: * `ConfigToml` is the deserialization of `~/.codex/config.toml`. As one might expect, every field is `Optional` and it is `#[derive(Deserialize, Default)]`. Consistent use of `Optional` makes it clear what the user has specified explicitly. * `Config` is the "normalized config" and is produced by merging `ConfigToml` with `ConfigOverrides`. Where `ConfigToml` contains a raw `Option<Vec<SandboxPermission>>`, `Config` presents only the final `SandboxPolicy`. The changes to `core/src/exec.rs` and `core/src/linux.rs` merit extra special attention to ensure we are faithfully mapping the `SandboxPolicy` to the Seatbelt and Landlock configs, respectively. Also, take note that `core/src/seatbelt_readonly_policy.sbpl` has been renamed to `codex-rs/core/src/seatbelt_base_policy.sbpl` and that `(allow file-read*)` has been removed from the `.sbpl` file as now this is added to the policy in `core/src/exec.rs` when `sandbox_policy.has_full_disk_read_access()` is `true`.
2025-04-29 15:01:16 -07:00
}
}
impl SandboxPolicy {
/// Returns a policy with read-only disk access and no network.
fix: overhaul SandboxPolicy and config loading in Rust (#732) Previous to this PR, `SandboxPolicy` was a bit difficult to work with: https://github.com/openai/codex/blob/237f8a11e11fdcc793a09e787e48215676d9b95b/codex-rs/core/src/protocol.rs#L98-L108 Specifically: * It was an `enum` and therefore options were mutually exclusive as opposed to additive. * It defined things in terms of what the agent _could not_ do as opposed to what they _could_ do. This made things hard to support because we would prefer to build up a sandbox config by starting with something extremely restrictive and only granting permissions for things the user as explicitly allowed. This PR changes things substantially by redefining the policy in terms of two concepts: * A `SandboxPermission` enum that defines permissions that can be granted to the agent/sandbox. * A `SandboxPolicy` that internally stores a `Vec<SandboxPermission>`, but externally exposes a simpler API that can be used to configure Seatbelt/Landlock. Previous to this PR, we supported a `--sandbox` flag that effectively mapped to an enum value in `SandboxPolicy`. Though now that `SandboxPolicy` is a wrapper around `Vec<SandboxPermission>`, the single `--sandbox` flag no longer makes sense. While I could have turned it into a flag that the user can specify multiple times, I think the current values to use with such a flag are long and potentially messy, so for the moment, I have dropped support for `--sandbox` altogether and we can bring it back once we have figured out the naming thing. Since `--sandbox` is gone, users now have to specify `--full-auto` to get a sandbox that allows writes in `cwd`. Admittedly, there is no clean way to specify the equivalent of `--full-auto` in your `config.toml` right now, so we will have to revisit that, as well. Because `Config` presents a `SandboxPolicy` field and `SandboxPolicy` changed considerably, I had to overhaul how config loading works, as well. There are now two distinct concepts, `ConfigToml` and `Config`: * `ConfigToml` is the deserialization of `~/.codex/config.toml`. As one might expect, every field is `Optional` and it is `#[derive(Deserialize, Default)]`. Consistent use of `Optional` makes it clear what the user has specified explicitly. * `Config` is the "normalized config" and is produced by merging `ConfigToml` with `ConfigOverrides`. Where `ConfigToml` contains a raw `Option<Vec<SandboxPermission>>`, `Config` presents only the final `SandboxPolicy`. The changes to `core/src/exec.rs` and `core/src/linux.rs` merit extra special attention to ensure we are faithfully mapping the `SandboxPolicy` to the Seatbelt and Landlock configs, respectively. Also, take note that `core/src/seatbelt_readonly_policy.sbpl` has been renamed to `codex-rs/core/src/seatbelt_base_policy.sbpl` and that `(allow file-read*)` has been removed from the `.sbpl` file as now this is added to the policy in `core/src/exec.rs` when `sandbox_policy.has_full_disk_read_access()` is `true`.
2025-04-29 15:01:16 -07:00
pub fn new_read_only_policy() -> Self {
SandboxPolicy::ReadOnly
}
/// Returns a policy that can read the entire disk, but can only write to
/// the current working directory and the per-user tmp dir on macOS. It does
/// not allow network access.
pub fn new_workspace_write_policy() -> Self {
SandboxPolicy::WorkspaceWrite {
writable_roots: vec![],
network_access: false,
exclude_tmpdir_env_var: false,
exclude_slash_tmp: false,
fix: overhaul SandboxPolicy and config loading in Rust (#732) Previous to this PR, `SandboxPolicy` was a bit difficult to work with: https://github.com/openai/codex/blob/237f8a11e11fdcc793a09e787e48215676d9b95b/codex-rs/core/src/protocol.rs#L98-L108 Specifically: * It was an `enum` and therefore options were mutually exclusive as opposed to additive. * It defined things in terms of what the agent _could not_ do as opposed to what they _could_ do. This made things hard to support because we would prefer to build up a sandbox config by starting with something extremely restrictive and only granting permissions for things the user as explicitly allowed. This PR changes things substantially by redefining the policy in terms of two concepts: * A `SandboxPermission` enum that defines permissions that can be granted to the agent/sandbox. * A `SandboxPolicy` that internally stores a `Vec<SandboxPermission>`, but externally exposes a simpler API that can be used to configure Seatbelt/Landlock. Previous to this PR, we supported a `--sandbox` flag that effectively mapped to an enum value in `SandboxPolicy`. Though now that `SandboxPolicy` is a wrapper around `Vec<SandboxPermission>`, the single `--sandbox` flag no longer makes sense. While I could have turned it into a flag that the user can specify multiple times, I think the current values to use with such a flag are long and potentially messy, so for the moment, I have dropped support for `--sandbox` altogether and we can bring it back once we have figured out the naming thing. Since `--sandbox` is gone, users now have to specify `--full-auto` to get a sandbox that allows writes in `cwd`. Admittedly, there is no clean way to specify the equivalent of `--full-auto` in your `config.toml` right now, so we will have to revisit that, as well. Because `Config` presents a `SandboxPolicy` field and `SandboxPolicy` changed considerably, I had to overhaul how config loading works, as well. There are now two distinct concepts, `ConfigToml` and `Config`: * `ConfigToml` is the deserialization of `~/.codex/config.toml`. As one might expect, every field is `Optional` and it is `#[derive(Deserialize, Default)]`. Consistent use of `Optional` makes it clear what the user has specified explicitly. * `Config` is the "normalized config" and is produced by merging `ConfigToml` with `ConfigOverrides`. Where `ConfigToml` contains a raw `Option<Vec<SandboxPermission>>`, `Config` presents only the final `SandboxPolicy`. The changes to `core/src/exec.rs` and `core/src/linux.rs` merit extra special attention to ensure we are faithfully mapping the `SandboxPolicy` to the Seatbelt and Landlock configs, respectively. Also, take note that `core/src/seatbelt_readonly_policy.sbpl` has been renamed to `codex-rs/core/src/seatbelt_base_policy.sbpl` and that `(allow file-read*)` has been removed from the `.sbpl` file as now this is added to the policy in `core/src/exec.rs` when `sandbox_policy.has_full_disk_read_access()` is `true`.
2025-04-29 15:01:16 -07:00
}
}
/// Always returns `true`; restricting read access is not supported.
fix: overhaul SandboxPolicy and config loading in Rust (#732) Previous to this PR, `SandboxPolicy` was a bit difficult to work with: https://github.com/openai/codex/blob/237f8a11e11fdcc793a09e787e48215676d9b95b/codex-rs/core/src/protocol.rs#L98-L108 Specifically: * It was an `enum` and therefore options were mutually exclusive as opposed to additive. * It defined things in terms of what the agent _could not_ do as opposed to what they _could_ do. This made things hard to support because we would prefer to build up a sandbox config by starting with something extremely restrictive and only granting permissions for things the user as explicitly allowed. This PR changes things substantially by redefining the policy in terms of two concepts: * A `SandboxPermission` enum that defines permissions that can be granted to the agent/sandbox. * A `SandboxPolicy` that internally stores a `Vec<SandboxPermission>`, but externally exposes a simpler API that can be used to configure Seatbelt/Landlock. Previous to this PR, we supported a `--sandbox` flag that effectively mapped to an enum value in `SandboxPolicy`. Though now that `SandboxPolicy` is a wrapper around `Vec<SandboxPermission>`, the single `--sandbox` flag no longer makes sense. While I could have turned it into a flag that the user can specify multiple times, I think the current values to use with such a flag are long and potentially messy, so for the moment, I have dropped support for `--sandbox` altogether and we can bring it back once we have figured out the naming thing. Since `--sandbox` is gone, users now have to specify `--full-auto` to get a sandbox that allows writes in `cwd`. Admittedly, there is no clean way to specify the equivalent of `--full-auto` in your `config.toml` right now, so we will have to revisit that, as well. Because `Config` presents a `SandboxPolicy` field and `SandboxPolicy` changed considerably, I had to overhaul how config loading works, as well. There are now two distinct concepts, `ConfigToml` and `Config`: * `ConfigToml` is the deserialization of `~/.codex/config.toml`. As one might expect, every field is `Optional` and it is `#[derive(Deserialize, Default)]`. Consistent use of `Optional` makes it clear what the user has specified explicitly. * `Config` is the "normalized config" and is produced by merging `ConfigToml` with `ConfigOverrides`. Where `ConfigToml` contains a raw `Option<Vec<SandboxPermission>>`, `Config` presents only the final `SandboxPolicy`. The changes to `core/src/exec.rs` and `core/src/linux.rs` merit extra special attention to ensure we are faithfully mapping the `SandboxPolicy` to the Seatbelt and Landlock configs, respectively. Also, take note that `core/src/seatbelt_readonly_policy.sbpl` has been renamed to `codex-rs/core/src/seatbelt_base_policy.sbpl` and that `(allow file-read*)` has been removed from the `.sbpl` file as now this is added to the policy in `core/src/exec.rs` when `sandbox_policy.has_full_disk_read_access()` is `true`.
2025-04-29 15:01:16 -07:00
pub fn has_full_disk_read_access(&self) -> bool {
true
fix: overhaul SandboxPolicy and config loading in Rust (#732) Previous to this PR, `SandboxPolicy` was a bit difficult to work with: https://github.com/openai/codex/blob/237f8a11e11fdcc793a09e787e48215676d9b95b/codex-rs/core/src/protocol.rs#L98-L108 Specifically: * It was an `enum` and therefore options were mutually exclusive as opposed to additive. * It defined things in terms of what the agent _could not_ do as opposed to what they _could_ do. This made things hard to support because we would prefer to build up a sandbox config by starting with something extremely restrictive and only granting permissions for things the user as explicitly allowed. This PR changes things substantially by redefining the policy in terms of two concepts: * A `SandboxPermission` enum that defines permissions that can be granted to the agent/sandbox. * A `SandboxPolicy` that internally stores a `Vec<SandboxPermission>`, but externally exposes a simpler API that can be used to configure Seatbelt/Landlock. Previous to this PR, we supported a `--sandbox` flag that effectively mapped to an enum value in `SandboxPolicy`. Though now that `SandboxPolicy` is a wrapper around `Vec<SandboxPermission>`, the single `--sandbox` flag no longer makes sense. While I could have turned it into a flag that the user can specify multiple times, I think the current values to use with such a flag are long and potentially messy, so for the moment, I have dropped support for `--sandbox` altogether and we can bring it back once we have figured out the naming thing. Since `--sandbox` is gone, users now have to specify `--full-auto` to get a sandbox that allows writes in `cwd`. Admittedly, there is no clean way to specify the equivalent of `--full-auto` in your `config.toml` right now, so we will have to revisit that, as well. Because `Config` presents a `SandboxPolicy` field and `SandboxPolicy` changed considerably, I had to overhaul how config loading works, as well. There are now two distinct concepts, `ConfigToml` and `Config`: * `ConfigToml` is the deserialization of `~/.codex/config.toml`. As one might expect, every field is `Optional` and it is `#[derive(Deserialize, Default)]`. Consistent use of `Optional` makes it clear what the user has specified explicitly. * `Config` is the "normalized config" and is produced by merging `ConfigToml` with `ConfigOverrides`. Where `ConfigToml` contains a raw `Option<Vec<SandboxPermission>>`, `Config` presents only the final `SandboxPolicy`. The changes to `core/src/exec.rs` and `core/src/linux.rs` merit extra special attention to ensure we are faithfully mapping the `SandboxPolicy` to the Seatbelt and Landlock configs, respectively. Also, take note that `core/src/seatbelt_readonly_policy.sbpl` has been renamed to `codex-rs/core/src/seatbelt_base_policy.sbpl` and that `(allow file-read*)` has been removed from the `.sbpl` file as now this is added to the policy in `core/src/exec.rs` when `sandbox_policy.has_full_disk_read_access()` is `true`.
2025-04-29 15:01:16 -07:00
}
pub fn has_full_disk_write_access(&self) -> bool {
match self {
SandboxPolicy::DangerFullAccess => true,
SandboxPolicy::ReadOnly => false,
SandboxPolicy::WorkspaceWrite { .. } => false,
}
fix: overhaul SandboxPolicy and config loading in Rust (#732) Previous to this PR, `SandboxPolicy` was a bit difficult to work with: https://github.com/openai/codex/blob/237f8a11e11fdcc793a09e787e48215676d9b95b/codex-rs/core/src/protocol.rs#L98-L108 Specifically: * It was an `enum` and therefore options were mutually exclusive as opposed to additive. * It defined things in terms of what the agent _could not_ do as opposed to what they _could_ do. This made things hard to support because we would prefer to build up a sandbox config by starting with something extremely restrictive and only granting permissions for things the user as explicitly allowed. This PR changes things substantially by redefining the policy in terms of two concepts: * A `SandboxPermission` enum that defines permissions that can be granted to the agent/sandbox. * A `SandboxPolicy` that internally stores a `Vec<SandboxPermission>`, but externally exposes a simpler API that can be used to configure Seatbelt/Landlock. Previous to this PR, we supported a `--sandbox` flag that effectively mapped to an enum value in `SandboxPolicy`. Though now that `SandboxPolicy` is a wrapper around `Vec<SandboxPermission>`, the single `--sandbox` flag no longer makes sense. While I could have turned it into a flag that the user can specify multiple times, I think the current values to use with such a flag are long and potentially messy, so for the moment, I have dropped support for `--sandbox` altogether and we can bring it back once we have figured out the naming thing. Since `--sandbox` is gone, users now have to specify `--full-auto` to get a sandbox that allows writes in `cwd`. Admittedly, there is no clean way to specify the equivalent of `--full-auto` in your `config.toml` right now, so we will have to revisit that, as well. Because `Config` presents a `SandboxPolicy` field and `SandboxPolicy` changed considerably, I had to overhaul how config loading works, as well. There are now two distinct concepts, `ConfigToml` and `Config`: * `ConfigToml` is the deserialization of `~/.codex/config.toml`. As one might expect, every field is `Optional` and it is `#[derive(Deserialize, Default)]`. Consistent use of `Optional` makes it clear what the user has specified explicitly. * `Config` is the "normalized config" and is produced by merging `ConfigToml` with `ConfigOverrides`. Where `ConfigToml` contains a raw `Option<Vec<SandboxPermission>>`, `Config` presents only the final `SandboxPolicy`. The changes to `core/src/exec.rs` and `core/src/linux.rs` merit extra special attention to ensure we are faithfully mapping the `SandboxPolicy` to the Seatbelt and Landlock configs, respectively. Also, take note that `core/src/seatbelt_readonly_policy.sbpl` has been renamed to `codex-rs/core/src/seatbelt_base_policy.sbpl` and that `(allow file-read*)` has been removed from the `.sbpl` file as now this is added to the policy in `core/src/exec.rs` when `sandbox_policy.has_full_disk_read_access()` is `true`.
2025-04-29 15:01:16 -07:00
}
pub fn has_full_network_access(&self) -> bool {
match self {
SandboxPolicy::DangerFullAccess => true,
SandboxPolicy::ReadOnly => false,
SandboxPolicy::WorkspaceWrite { network_access, .. } => *network_access,
}
fix: overhaul SandboxPolicy and config loading in Rust (#732) Previous to this PR, `SandboxPolicy` was a bit difficult to work with: https://github.com/openai/codex/blob/237f8a11e11fdcc793a09e787e48215676d9b95b/codex-rs/core/src/protocol.rs#L98-L108 Specifically: * It was an `enum` and therefore options were mutually exclusive as opposed to additive. * It defined things in terms of what the agent _could not_ do as opposed to what they _could_ do. This made things hard to support because we would prefer to build up a sandbox config by starting with something extremely restrictive and only granting permissions for things the user as explicitly allowed. This PR changes things substantially by redefining the policy in terms of two concepts: * A `SandboxPermission` enum that defines permissions that can be granted to the agent/sandbox. * A `SandboxPolicy` that internally stores a `Vec<SandboxPermission>`, but externally exposes a simpler API that can be used to configure Seatbelt/Landlock. Previous to this PR, we supported a `--sandbox` flag that effectively mapped to an enum value in `SandboxPolicy`. Though now that `SandboxPolicy` is a wrapper around `Vec<SandboxPermission>`, the single `--sandbox` flag no longer makes sense. While I could have turned it into a flag that the user can specify multiple times, I think the current values to use with such a flag are long and potentially messy, so for the moment, I have dropped support for `--sandbox` altogether and we can bring it back once we have figured out the naming thing. Since `--sandbox` is gone, users now have to specify `--full-auto` to get a sandbox that allows writes in `cwd`. Admittedly, there is no clean way to specify the equivalent of `--full-auto` in your `config.toml` right now, so we will have to revisit that, as well. Because `Config` presents a `SandboxPolicy` field and `SandboxPolicy` changed considerably, I had to overhaul how config loading works, as well. There are now two distinct concepts, `ConfigToml` and `Config`: * `ConfigToml` is the deserialization of `~/.codex/config.toml`. As one might expect, every field is `Optional` and it is `#[derive(Deserialize, Default)]`. Consistent use of `Optional` makes it clear what the user has specified explicitly. * `Config` is the "normalized config" and is produced by merging `ConfigToml` with `ConfigOverrides`. Where `ConfigToml` contains a raw `Option<Vec<SandboxPermission>>`, `Config` presents only the final `SandboxPolicy`. The changes to `core/src/exec.rs` and `core/src/linux.rs` merit extra special attention to ensure we are faithfully mapping the `SandboxPolicy` to the Seatbelt and Landlock configs, respectively. Also, take note that `core/src/seatbelt_readonly_policy.sbpl` has been renamed to `codex-rs/core/src/seatbelt_base_policy.sbpl` and that `(allow file-read*)` has been removed from the `.sbpl` file as now this is added to the policy in `core/src/exec.rs` when `sandbox_policy.has_full_disk_read_access()` is `true`.
2025-04-29 15:01:16 -07:00
}
feat: make .git read-only within a writable root when using Seatbelt (#1765) To make `--full-auto` safer, this PR updates the Seatbelt policy so that a `SandboxPolicy` with a `writable_root` that contains a `.git/` _directory_ will make `.git/` _read-only_ (though as a follow-up, we should also consider the case where `.git` is a _file_ with a `gitdir: /path/to/actual/repo/.git` entry that should also be protected). The two major changes in this PR: - Updating `SandboxPolicy::get_writable_roots_with_cwd()` to return a `Vec<WritableRoot>` instead of a `Vec<PathBuf>` where a `WritableRoot` can specify a list of read-only subpaths. - Updating `create_seatbelt_command_args()` to honor the read-only subpaths in `WritableRoot`. The logic to update the policy is a fairly straightforward update to `create_seatbelt_command_args()`, but perhaps the more interesting part of this PR is the introduction of an integration test in `tests/sandbox.rs`. Leveraging the new API in #1785, we test `SandboxPolicy` under various conditions, including ones where `$TMPDIR` is not readable, which is critical for verifying the new behavior. To ensure that Codex can run its own tests, e.g.: ``` just codex debug seatbelt --full-auto -- cargo test if_git_repo_is_writable_root_then_dot_git_folder_is_read_only ``` I had to introduce the use of `CODEX_SANDBOX=sandbox`, which is comparable to how `CODEX_SANDBOX_NETWORK_DISABLED=1` was already being used. Adding a comparable change for Landlock will be done in a subsequent PR.
2025-08-01 16:11:24 -07:00
/// Returns the list of writable roots (tailored to the current working
/// directory) together with subpaths that should remain readonly under
/// each writable root.
pub fn get_writable_roots_with_cwd(&self, cwd: &Path) -> Vec<WritableRoot> {
match self {
SandboxPolicy::DangerFullAccess => Vec::new(),
SandboxPolicy::ReadOnly => Vec::new(),
SandboxPolicy::WorkspaceWrite {
writable_roots,
exclude_tmpdir_env_var,
exclude_slash_tmp,
network_access: _,
} => {
feat: make .git read-only within a writable root when using Seatbelt (#1765) To make `--full-auto` safer, this PR updates the Seatbelt policy so that a `SandboxPolicy` with a `writable_root` that contains a `.git/` _directory_ will make `.git/` _read-only_ (though as a follow-up, we should also consider the case where `.git` is a _file_ with a `gitdir: /path/to/actual/repo/.git` entry that should also be protected). The two major changes in this PR: - Updating `SandboxPolicy::get_writable_roots_with_cwd()` to return a `Vec<WritableRoot>` instead of a `Vec<PathBuf>` where a `WritableRoot` can specify a list of read-only subpaths. - Updating `create_seatbelt_command_args()` to honor the read-only subpaths in `WritableRoot`. The logic to update the policy is a fairly straightforward update to `create_seatbelt_command_args()`, but perhaps the more interesting part of this PR is the introduction of an integration test in `tests/sandbox.rs`. Leveraging the new API in #1785, we test `SandboxPolicy` under various conditions, including ones where `$TMPDIR` is not readable, which is critical for verifying the new behavior. To ensure that Codex can run its own tests, e.g.: ``` just codex debug seatbelt --full-auto -- cargo test if_git_repo_is_writable_root_then_dot_git_folder_is_read_only ``` I had to introduce the use of `CODEX_SANDBOX=sandbox`, which is comparable to how `CODEX_SANDBOX_NETWORK_DISABLED=1` was already being used. Adding a comparable change for Landlock will be done in a subsequent PR.
2025-08-01 16:11:24 -07:00
// Start from explicitly configured writable roots.
let mut roots: Vec<PathBuf> = writable_roots.clone();
// Always include defaults: cwd, /tmp (if present on Unix), and
// on macOS, the per-user TMPDIR unless explicitly excluded.
roots.push(cwd.to_path_buf());
// Include /tmp on Unix unless explicitly excluded.
if cfg!(unix) && !exclude_slash_tmp {
let slash_tmp = PathBuf::from("/tmp");
if slash_tmp.is_dir() {
roots.push(slash_tmp);
}
}
// Include $TMPDIR unless explicitly excluded. On macOS, TMPDIR
// is per-user, so writes to TMPDIR should not be readable by
// other users on the system.
//
// By comparison, TMPDIR is not guaranteed to be defined on
// Linux or Windows, but supporting it here gives users a way to
// provide the model with their own temporary directory without
// having to hardcode it in the config.
if !exclude_tmpdir_env_var
&& let Some(tmpdir) = std::env::var_os("TMPDIR")
&& !tmpdir.is_empty()
{
roots.push(PathBuf::from(tmpdir));
}
feat: make .git read-only within a writable root when using Seatbelt (#1765) To make `--full-auto` safer, this PR updates the Seatbelt policy so that a `SandboxPolicy` with a `writable_root` that contains a `.git/` _directory_ will make `.git/` _read-only_ (though as a follow-up, we should also consider the case where `.git` is a _file_ with a `gitdir: /path/to/actual/repo/.git` entry that should also be protected). The two major changes in this PR: - Updating `SandboxPolicy::get_writable_roots_with_cwd()` to return a `Vec<WritableRoot>` instead of a `Vec<PathBuf>` where a `WritableRoot` can specify a list of read-only subpaths. - Updating `create_seatbelt_command_args()` to honor the read-only subpaths in `WritableRoot`. The logic to update the policy is a fairly straightforward update to `create_seatbelt_command_args()`, but perhaps the more interesting part of this PR is the introduction of an integration test in `tests/sandbox.rs`. Leveraging the new API in #1785, we test `SandboxPolicy` under various conditions, including ones where `$TMPDIR` is not readable, which is critical for verifying the new behavior. To ensure that Codex can run its own tests, e.g.: ``` just codex debug seatbelt --full-auto -- cargo test if_git_repo_is_writable_root_then_dot_git_folder_is_read_only ``` I had to introduce the use of `CODEX_SANDBOX=sandbox`, which is comparable to how `CODEX_SANDBOX_NETWORK_DISABLED=1` was already being used. Adding a comparable change for Landlock will be done in a subsequent PR.
2025-08-01 16:11:24 -07:00
// For each root, compute subpaths that should remain read-only.
roots
feat: make .git read-only within a writable root when using Seatbelt (#1765) To make `--full-auto` safer, this PR updates the Seatbelt policy so that a `SandboxPolicy` with a `writable_root` that contains a `.git/` _directory_ will make `.git/` _read-only_ (though as a follow-up, we should also consider the case where `.git` is a _file_ with a `gitdir: /path/to/actual/repo/.git` entry that should also be protected). The two major changes in this PR: - Updating `SandboxPolicy::get_writable_roots_with_cwd()` to return a `Vec<WritableRoot>` instead of a `Vec<PathBuf>` where a `WritableRoot` can specify a list of read-only subpaths. - Updating `create_seatbelt_command_args()` to honor the read-only subpaths in `WritableRoot`. The logic to update the policy is a fairly straightforward update to `create_seatbelt_command_args()`, but perhaps the more interesting part of this PR is the introduction of an integration test in `tests/sandbox.rs`. Leveraging the new API in #1785, we test `SandboxPolicy` under various conditions, including ones where `$TMPDIR` is not readable, which is critical for verifying the new behavior. To ensure that Codex can run its own tests, e.g.: ``` just codex debug seatbelt --full-auto -- cargo test if_git_repo_is_writable_root_then_dot_git_folder_is_read_only ``` I had to introduce the use of `CODEX_SANDBOX=sandbox`, which is comparable to how `CODEX_SANDBOX_NETWORK_DISABLED=1` was already being used. Adding a comparable change for Landlock will be done in a subsequent PR.
2025-08-01 16:11:24 -07:00
.into_iter()
.map(|writable_root| {
let mut subpaths = Vec::new();
let top_level_git = writable_root.join(".git");
if top_level_git.is_dir() {
subpaths.push(top_level_git);
}
WritableRoot {
root: writable_root,
read_only_subpaths: subpaths,
}
})
.collect()
fix: overhaul SandboxPolicy and config loading in Rust (#732) Previous to this PR, `SandboxPolicy` was a bit difficult to work with: https://github.com/openai/codex/blob/237f8a11e11fdcc793a09e787e48215676d9b95b/codex-rs/core/src/protocol.rs#L98-L108 Specifically: * It was an `enum` and therefore options were mutually exclusive as opposed to additive. * It defined things in terms of what the agent _could not_ do as opposed to what they _could_ do. This made things hard to support because we would prefer to build up a sandbox config by starting with something extremely restrictive and only granting permissions for things the user as explicitly allowed. This PR changes things substantially by redefining the policy in terms of two concepts: * A `SandboxPermission` enum that defines permissions that can be granted to the agent/sandbox. * A `SandboxPolicy` that internally stores a `Vec<SandboxPermission>`, but externally exposes a simpler API that can be used to configure Seatbelt/Landlock. Previous to this PR, we supported a `--sandbox` flag that effectively mapped to an enum value in `SandboxPolicy`. Though now that `SandboxPolicy` is a wrapper around `Vec<SandboxPermission>`, the single `--sandbox` flag no longer makes sense. While I could have turned it into a flag that the user can specify multiple times, I think the current values to use with such a flag are long and potentially messy, so for the moment, I have dropped support for `--sandbox` altogether and we can bring it back once we have figured out the naming thing. Since `--sandbox` is gone, users now have to specify `--full-auto` to get a sandbox that allows writes in `cwd`. Admittedly, there is no clean way to specify the equivalent of `--full-auto` in your `config.toml` right now, so we will have to revisit that, as well. Because `Config` presents a `SandboxPolicy` field and `SandboxPolicy` changed considerably, I had to overhaul how config loading works, as well. There are now two distinct concepts, `ConfigToml` and `Config`: * `ConfigToml` is the deserialization of `~/.codex/config.toml`. As one might expect, every field is `Optional` and it is `#[derive(Deserialize, Default)]`. Consistent use of `Optional` makes it clear what the user has specified explicitly. * `Config` is the "normalized config" and is produced by merging `ConfigToml` with `ConfigOverrides`. Where `ConfigToml` contains a raw `Option<Vec<SandboxPermission>>`, `Config` presents only the final `SandboxPolicy`. The changes to `core/src/exec.rs` and `core/src/linux.rs` merit extra special attention to ensure we are faithfully mapping the `SandboxPolicy` to the Seatbelt and Landlock configs, respectively. Also, take note that `core/src/seatbelt_readonly_policy.sbpl` has been renamed to `codex-rs/core/src/seatbelt_base_policy.sbpl` and that `(allow file-read*)` has been removed from the `.sbpl` file as now this is added to the policy in `core/src/exec.rs` when `sandbox_policy.has_full_disk_read_access()` is `true`.
2025-04-29 15:01:16 -07:00
}
}
}
}
fix: overhaul SandboxPolicy and config loading in Rust (#732) Previous to this PR, `SandboxPolicy` was a bit difficult to work with: https://github.com/openai/codex/blob/237f8a11e11fdcc793a09e787e48215676d9b95b/codex-rs/core/src/protocol.rs#L98-L108 Specifically: * It was an `enum` and therefore options were mutually exclusive as opposed to additive. * It defined things in terms of what the agent _could not_ do as opposed to what they _could_ do. This made things hard to support because we would prefer to build up a sandbox config by starting with something extremely restrictive and only granting permissions for things the user as explicitly allowed. This PR changes things substantially by redefining the policy in terms of two concepts: * A `SandboxPermission` enum that defines permissions that can be granted to the agent/sandbox. * A `SandboxPolicy` that internally stores a `Vec<SandboxPermission>`, but externally exposes a simpler API that can be used to configure Seatbelt/Landlock. Previous to this PR, we supported a `--sandbox` flag that effectively mapped to an enum value in `SandboxPolicy`. Though now that `SandboxPolicy` is a wrapper around `Vec<SandboxPermission>`, the single `--sandbox` flag no longer makes sense. While I could have turned it into a flag that the user can specify multiple times, I think the current values to use with such a flag are long and potentially messy, so for the moment, I have dropped support for `--sandbox` altogether and we can bring it back once we have figured out the naming thing. Since `--sandbox` is gone, users now have to specify `--full-auto` to get a sandbox that allows writes in `cwd`. Admittedly, there is no clean way to specify the equivalent of `--full-auto` in your `config.toml` right now, so we will have to revisit that, as well. Because `Config` presents a `SandboxPolicy` field and `SandboxPolicy` changed considerably, I had to overhaul how config loading works, as well. There are now two distinct concepts, `ConfigToml` and `Config`: * `ConfigToml` is the deserialization of `~/.codex/config.toml`. As one might expect, every field is `Optional` and it is `#[derive(Deserialize, Default)]`. Consistent use of `Optional` makes it clear what the user has specified explicitly. * `Config` is the "normalized config" and is produced by merging `ConfigToml` with `ConfigOverrides`. Where `ConfigToml` contains a raw `Option<Vec<SandboxPermission>>`, `Config` presents only the final `SandboxPolicy`. The changes to `core/src/exec.rs` and `core/src/linux.rs` merit extra special attention to ensure we are faithfully mapping the `SandboxPolicy` to the Seatbelt and Landlock configs, respectively. Also, take note that `core/src/seatbelt_readonly_policy.sbpl` has been renamed to `codex-rs/core/src/seatbelt_base_policy.sbpl` and that `(allow file-read*)` has been removed from the `.sbpl` file as now this is added to the policy in `core/src/exec.rs` when `sandbox_policy.has_full_disk_read_access()` is `true`.
2025-04-29 15:01:16 -07:00
/// User input
#[non_exhaustive]
feat: record messages from user in ~/.codex/history.jsonl (#939) This is a large change to support a "history" feature like you would expect in a shell like Bash. History events are recorded in `$CODEX_HOME/history.jsonl`. Because it is a JSONL file, it is straightforward to append new entries (as opposed to the TypeScript file that uses `$CODEX_HOME/history.json`, so to be valid JSON, each new entry entails rewriting the entire file). Because it is possible for there to be multiple instances of Codex CLI writing to `history.jsonl` at once, we use advisory file locking when working with `history.jsonl` in `codex-rs/core/src/message_history.rs`. Because we believe history is a sufficiently useful feature, we enable it by default. Though to provide some safety, we set the file permissions of `history.jsonl` to be `o600` so that other users on the system cannot read the user's history. We do not yet support a default list of `SENSITIVE_PATTERNS` as the TypeScript CLI does: https://github.com/openai/codex/blob/3fdf9df1335ac9501e3fb0e61715359145711e8b/codex-cli/src/utils/storage/command-history.ts#L10-L17 We are going to take a more conservative approach to this list in the Rust CLI. For example, while `/\b[A-Za-z0-9-_]{20,}\b/` might exclude sensitive information like API tokens, it would also exclude valuable information such as references to Git commits. As noted in the updated documentation, users can opt-out of history by adding the following to `config.toml`: ```toml [history] persistence = "none" ``` Because `history.jsonl` could, in theory, be quite large, we take a[n arguably overly pedantic] approach in reading history entries into memory. Specifically, we start by telling the client the current number of entries in the history file (`history_entry_count`) as well as the inode (`history_log_id`) of `history.jsonl` (see the new fields on `SessionConfiguredEvent`). The client is responsible for keeping new entries in memory to create a "local history," but if the user hits up enough times to go "past" the end of local history, then the client should use the new `GetHistoryEntryRequest` in the protocol to fetch older entries. Specifically, it should pass the `history_log_id` it was given originally and work backwards from `history_entry_count`. (It should really fetch history in batches rather than one-at-a-time, but that is something we can improve upon in subsequent PRs.) The motivation behind this crazy scheme is that it is designed to defend against: * The `history.jsonl` being truncated during the session such that the index into the history is no longer consistent with what had been read up to that point. We do not yet have logic to enforce a `max_bytes` for `history.jsonl`, but once we do, we will aspire to implement it in a way that should result in a new inode for the file on most systems. * New items from concurrent Codex CLI sessions amending to the history. Because, in absence of truncation, `history.jsonl` is an append-only log, so long as the client reads backwards from `history_entry_count`, it should always get a consistent view of history. (That said, it will not be able to read _new_ commands from concurrent sessions, but perhaps we will introduce a `/` command to reload latest history or something down the road.) Admittedly, my testing of this feature thus far has been fairly light. I expect we will find bugs and introduce enhancements/fixes going forward.
2025-05-15 16:26:23 -07:00
#[derive(Debug, Clone, Deserialize, Serialize, PartialEq)]
#[serde(tag = "type", rename_all = "snake_case")]
pub enum InputItem {
Text {
text: String,
},
/// Preencoded data: URI image.
Image {
image_url: String,
},
/// Local image path provided by the user. This will be converted to an
/// `Image` variant (base64 data URL) during request serialization.
LocalImage {
path: std::path::PathBuf,
},
}
/// Event Queue Entry - events from agent
#[derive(Debug, Clone, Deserialize, Serialize)]
pub struct Event {
/// Submission `id` that this event is correlated with.
pub id: String,
/// Payload
pub msg: EventMsg,
}
/// Response event from the agent
2025-09-14 17:34:33 -07:00
/// NOTE: Make sure none of these values have optional types, as it will mess up the extension code-gen.
#[derive(Debug, Clone, Deserialize, Serialize, Display, TS)]
#[serde(tag = "type", rename_all = "snake_case")]
#[strum(serialize_all = "snake_case")]
pub enum EventMsg {
/// Error while executing a submission
Error(ErrorEvent),
/// Agent has started a task
TaskStarted(TaskStartedEvent),
/// Agent has completed all actions
TaskComplete(TaskCompleteEvent),
/// Usage update for the current session, including totals and last turn.
/// Optional means unknown — UIs should not display when `None`.
TokenCount(TokenCountEvent),
feat: show number of tokens remaining in UI (#1388) When using the OpenAI Responses API, we now record the `usage` field for a `"response.completed"` event, which includes metrics about the number of tokens consumed. We also introduce `openai_model_info.rs`, which includes current data about the most common OpenAI models available via the API (specifically `context_window` and `max_output_tokens`). If Codex does not recognize the model, you can set `model_context_window` and `model_max_output_tokens` explicitly in `config.toml`. When then introduce a new event type to `protocol.rs`, `TokenCount`, which includes the `TokenUsage` for the most recent turn. Finally, we update the TUI to record the running sum of tokens used so the percentage of available context window remaining can be reported via the placeholder text for the composer: ![Screenshot 2025-06-25 at 11 20 55 PM](https://github.com/user-attachments/assets/6fd6982f-7247-4f14-84b2-2e600cb1fd49) We could certainly get much fancier with this (such as reporting the estimated cost of the conversation), but for now, we are just trying to achieve feature parity with the TypeScript CLI. Though arguably this improves upon the TypeScript CLI, as the TypeScript CLI uses heuristics to estimate the number of tokens used rather than using the `usage` information directly: https://github.com/openai/codex/blob/296996d74e345b1b05d8c3451a06ace21c5ada96/codex-cli/src/utils/approximate-tokens-used.ts#L3-L16 Fixes https://github.com/openai/codex/issues/1242
2025-06-25 23:31:11 -07:00
/// Agent text output message
AgentMessage(AgentMessageEvent),
/// User/system input message (what was sent to the model)
UserMessage(UserMessageEvent),
/// Agent text output delta message
AgentMessageDelta(AgentMessageDeltaEvent),
/// Reasoning event from agent.
AgentReasoning(AgentReasoningEvent),
/// Agent reasoning delta event from agent.
AgentReasoningDelta(AgentReasoningDeltaEvent),
/// Raw chain-of-thought from agent.
AgentReasoningRawContent(AgentReasoningRawContentEvent),
/// Agent reasoning content delta event from agent.
AgentReasoningRawContentDelta(AgentReasoningRawContentDeltaEvent),
/// Signaled when the model begins a new reasoning summary section (e.g., a new titled block).
AgentReasoningSectionBreak(AgentReasoningSectionBreakEvent),
/// Ack the client's configure message.
SessionConfigured(SessionConfiguredEvent),
McpToolCallBegin(McpToolCallBeginEvent),
feat: support mcp_servers in config.toml (#829) This adds initial support for MCP servers in the style of Claude Desktop and Cursor. Note this PR is the bare minimum to get things working end to end: all configured MCP servers are launched every time Codex is run, there is no recovery for MCP servers that crash, etc. (Also, I took some shortcuts to change some fields of `Session` to be `pub(crate)`, which also means there are circular deps between `codex.rs` and `mcp_tool_call.rs`, but I will clean that up in a subsequent PR.) `codex-rs/README.md` is updated as part of this PR to explain how to use this feature. There is a bit of plumbing to route the new settings from `Config` to the business logic in `codex.rs`. The most significant chunks for new code are in `mcp_connection_manager.rs` (which defines the `McpConnectionManager` struct) and `mcp_tool_call.rs`, which is responsible for tool calls. This PR also introduces new `McpToolCallBegin` and `McpToolCallEnd` event types to the protocol, but does not add any handlers for them. (See https://github.com/openai/codex/pull/836 for initial usage.) To test, I added the following to my `~/.codex/config.toml`: ```toml # Local build of https://github.com/hideya/mcp-server-weather-js [mcp_servers.weather] command = "/Users/mbolin/code/mcp-server-weather-js/dist/index.js" args = [] ``` And then I ran the following: ``` codex-rs$ cargo run --bin codex exec 'what is the weather in san francisco' [2025-05-06T22:40:05] Task started: 1 [2025-05-06T22:40:18] Agent message: Here’s the latest National Weather Service forecast for San Francisco (downtown, near 37.77° N, 122.42° W): This Afternoon (Tue): • Sunny, high near 69 °F • West-southwest wind around 12 mph Tonight: • Partly cloudy, low around 52 °F • SW wind 7–10 mph ... ``` Note that Codex itself is not able to make network calls, so it would not normally be able to get live weather information like this. However, the weather MCP is [currently] not run under the Codex sandbox, so it is able to hit `api.weather.gov` and fetch current weather information. --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/829). * #836 * __->__ #829
2025-05-06 15:47:59 -07:00
McpToolCallEnd(McpToolCallEndEvent),
feat: support mcp_servers in config.toml (#829) This adds initial support for MCP servers in the style of Claude Desktop and Cursor. Note this PR is the bare minimum to get things working end to end: all configured MCP servers are launched every time Codex is run, there is no recovery for MCP servers that crash, etc. (Also, I took some shortcuts to change some fields of `Session` to be `pub(crate)`, which also means there are circular deps between `codex.rs` and `mcp_tool_call.rs`, but I will clean that up in a subsequent PR.) `codex-rs/README.md` is updated as part of this PR to explain how to use this feature. There is a bit of plumbing to route the new settings from `Config` to the business logic in `codex.rs`. The most significant chunks for new code are in `mcp_connection_manager.rs` (which defines the `McpConnectionManager` struct) and `mcp_tool_call.rs`, which is responsible for tool calls. This PR also introduces new `McpToolCallBegin` and `McpToolCallEnd` event types to the protocol, but does not add any handlers for them. (See https://github.com/openai/codex/pull/836 for initial usage.) To test, I added the following to my `~/.codex/config.toml`: ```toml # Local build of https://github.com/hideya/mcp-server-weather-js [mcp_servers.weather] command = "/Users/mbolin/code/mcp-server-weather-js/dist/index.js" args = [] ``` And then I ran the following: ``` codex-rs$ cargo run --bin codex exec 'what is the weather in san francisco' [2025-05-06T22:40:05] Task started: 1 [2025-05-06T22:40:18] Agent message: Here’s the latest National Weather Service forecast for San Francisco (downtown, near 37.77° N, 122.42° W): This Afternoon (Tue): • Sunny, high near 69 °F • West-southwest wind around 12 mph Tonight: • Partly cloudy, low around 52 °F • SW wind 7–10 mph ... ``` Note that Codex itself is not able to make network calls, so it would not normally be able to get live weather information like this. However, the weather MCP is [currently] not run under the Codex sandbox, so it is able to hit `api.weather.gov` and fetch current weather information. --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/829). * #836 * __->__ #829
2025-05-06 15:47:59 -07:00
WebSearchBegin(WebSearchBeginEvent),
WebSearchEnd(WebSearchEndEvent),
/// Notification that the server is about to execute a command.
ExecCommandBegin(ExecCommandBeginEvent),
feat: support mcp_servers in config.toml (#829) This adds initial support for MCP servers in the style of Claude Desktop and Cursor. Note this PR is the bare minimum to get things working end to end: all configured MCP servers are launched every time Codex is run, there is no recovery for MCP servers that crash, etc. (Also, I took some shortcuts to change some fields of `Session` to be `pub(crate)`, which also means there are circular deps between `codex.rs` and `mcp_tool_call.rs`, but I will clean that up in a subsequent PR.) `codex-rs/README.md` is updated as part of this PR to explain how to use this feature. There is a bit of plumbing to route the new settings from `Config` to the business logic in `codex.rs`. The most significant chunks for new code are in `mcp_connection_manager.rs` (which defines the `McpConnectionManager` struct) and `mcp_tool_call.rs`, which is responsible for tool calls. This PR also introduces new `McpToolCallBegin` and `McpToolCallEnd` event types to the protocol, but does not add any handlers for them. (See https://github.com/openai/codex/pull/836 for initial usage.) To test, I added the following to my `~/.codex/config.toml`: ```toml # Local build of https://github.com/hideya/mcp-server-weather-js [mcp_servers.weather] command = "/Users/mbolin/code/mcp-server-weather-js/dist/index.js" args = [] ``` And then I ran the following: ``` codex-rs$ cargo run --bin codex exec 'what is the weather in san francisco' [2025-05-06T22:40:05] Task started: 1 [2025-05-06T22:40:18] Agent message: Here’s the latest National Weather Service forecast for San Francisco (downtown, near 37.77° N, 122.42° W): This Afternoon (Tue): • Sunny, high near 69 °F • West-southwest wind around 12 mph Tonight: • Partly cloudy, low around 52 °F • SW wind 7–10 mph ... ``` Note that Codex itself is not able to make network calls, so it would not normally be able to get live weather information like this. However, the weather MCP is [currently] not run under the Codex sandbox, so it is able to hit `api.weather.gov` and fetch current weather information. --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/829). * #836 * __->__ #829
2025-05-06 15:47:59 -07:00
/// Incremental chunk of output from a running command.
ExecCommandOutputDelta(ExecCommandOutputDeltaEvent),
ExecCommandEnd(ExecCommandEndEvent),
feat: support mcp_servers in config.toml (#829) This adds initial support for MCP servers in the style of Claude Desktop and Cursor. Note this PR is the bare minimum to get things working end to end: all configured MCP servers are launched every time Codex is run, there is no recovery for MCP servers that crash, etc. (Also, I took some shortcuts to change some fields of `Session` to be `pub(crate)`, which also means there are circular deps between `codex.rs` and `mcp_tool_call.rs`, but I will clean that up in a subsequent PR.) `codex-rs/README.md` is updated as part of this PR to explain how to use this feature. There is a bit of plumbing to route the new settings from `Config` to the business logic in `codex.rs`. The most significant chunks for new code are in `mcp_connection_manager.rs` (which defines the `McpConnectionManager` struct) and `mcp_tool_call.rs`, which is responsible for tool calls. This PR also introduces new `McpToolCallBegin` and `McpToolCallEnd` event types to the protocol, but does not add any handlers for them. (See https://github.com/openai/codex/pull/836 for initial usage.) To test, I added the following to my `~/.codex/config.toml`: ```toml # Local build of https://github.com/hideya/mcp-server-weather-js [mcp_servers.weather] command = "/Users/mbolin/code/mcp-server-weather-js/dist/index.js" args = [] ``` And then I ran the following: ``` codex-rs$ cargo run --bin codex exec 'what is the weather in san francisco' [2025-05-06T22:40:05] Task started: 1 [2025-05-06T22:40:18] Agent message: Here’s the latest National Weather Service forecast for San Francisco (downtown, near 37.77° N, 122.42° W): This Afternoon (Tue): • Sunny, high near 69 °F • West-southwest wind around 12 mph Tonight: • Partly cloudy, low around 52 °F • SW wind 7–10 mph ... ``` Note that Codex itself is not able to make network calls, so it would not normally be able to get live weather information like this. However, the weather MCP is [currently] not run under the Codex sandbox, so it is able to hit `api.weather.gov` and fetch current weather information. --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/829). * #836 * __->__ #829
2025-05-06 15:47:59 -07:00
ExecApprovalRequest(ExecApprovalRequestEvent),
feat: support mcp_servers in config.toml (#829) This adds initial support for MCP servers in the style of Claude Desktop and Cursor. Note this PR is the bare minimum to get things working end to end: all configured MCP servers are launched every time Codex is run, there is no recovery for MCP servers that crash, etc. (Also, I took some shortcuts to change some fields of `Session` to be `pub(crate)`, which also means there are circular deps between `codex.rs` and `mcp_tool_call.rs`, but I will clean that up in a subsequent PR.) `codex-rs/README.md` is updated as part of this PR to explain how to use this feature. There is a bit of plumbing to route the new settings from `Config` to the business logic in `codex.rs`. The most significant chunks for new code are in `mcp_connection_manager.rs` (which defines the `McpConnectionManager` struct) and `mcp_tool_call.rs`, which is responsible for tool calls. This PR also introduces new `McpToolCallBegin` and `McpToolCallEnd` event types to the protocol, but does not add any handlers for them. (See https://github.com/openai/codex/pull/836 for initial usage.) To test, I added the following to my `~/.codex/config.toml`: ```toml # Local build of https://github.com/hideya/mcp-server-weather-js [mcp_servers.weather] command = "/Users/mbolin/code/mcp-server-weather-js/dist/index.js" args = [] ``` And then I ran the following: ``` codex-rs$ cargo run --bin codex exec 'what is the weather in san francisco' [2025-05-06T22:40:05] Task started: 1 [2025-05-06T22:40:18] Agent message: Here’s the latest National Weather Service forecast for San Francisco (downtown, near 37.77° N, 122.42° W): This Afternoon (Tue): • Sunny, high near 69 °F • West-southwest wind around 12 mph Tonight: • Partly cloudy, low around 52 °F • SW wind 7–10 mph ... ``` Note that Codex itself is not able to make network calls, so it would not normally be able to get live weather information like this. However, the weather MCP is [currently] not run under the Codex sandbox, so it is able to hit `api.weather.gov` and fetch current weather information. --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/829). * #836 * __->__ #829
2025-05-06 15:47:59 -07:00
ApplyPatchApprovalRequest(ApplyPatchApprovalRequestEvent),
feat: support mcp_servers in config.toml (#829) This adds initial support for MCP servers in the style of Claude Desktop and Cursor. Note this PR is the bare minimum to get things working end to end: all configured MCP servers are launched every time Codex is run, there is no recovery for MCP servers that crash, etc. (Also, I took some shortcuts to change some fields of `Session` to be `pub(crate)`, which also means there are circular deps between `codex.rs` and `mcp_tool_call.rs`, but I will clean that up in a subsequent PR.) `codex-rs/README.md` is updated as part of this PR to explain how to use this feature. There is a bit of plumbing to route the new settings from `Config` to the business logic in `codex.rs`. The most significant chunks for new code are in `mcp_connection_manager.rs` (which defines the `McpConnectionManager` struct) and `mcp_tool_call.rs`, which is responsible for tool calls. This PR also introduces new `McpToolCallBegin` and `McpToolCallEnd` event types to the protocol, but does not add any handlers for them. (See https://github.com/openai/codex/pull/836 for initial usage.) To test, I added the following to my `~/.codex/config.toml`: ```toml # Local build of https://github.com/hideya/mcp-server-weather-js [mcp_servers.weather] command = "/Users/mbolin/code/mcp-server-weather-js/dist/index.js" args = [] ``` And then I ran the following: ``` codex-rs$ cargo run --bin codex exec 'what is the weather in san francisco' [2025-05-06T22:40:05] Task started: 1 [2025-05-06T22:40:18] Agent message: Here’s the latest National Weather Service forecast for San Francisco (downtown, near 37.77° N, 122.42° W): This Afternoon (Tue): • Sunny, high near 69 °F • West-southwest wind around 12 mph Tonight: • Partly cloudy, low around 52 °F • SW wind 7–10 mph ... ``` Note that Codex itself is not able to make network calls, so it would not normally be able to get live weather information like this. However, the weather MCP is [currently] not run under the Codex sandbox, so it is able to hit `api.weather.gov` and fetch current weather information. --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/829). * #836 * __->__ #829
2025-05-06 15:47:59 -07:00
BackgroundEvent(BackgroundEventEvent),
feat: support mcp_servers in config.toml (#829) This adds initial support for MCP servers in the style of Claude Desktop and Cursor. Note this PR is the bare minimum to get things working end to end: all configured MCP servers are launched every time Codex is run, there is no recovery for MCP servers that crash, etc. (Also, I took some shortcuts to change some fields of `Session` to be `pub(crate)`, which also means there are circular deps between `codex.rs` and `mcp_tool_call.rs`, but I will clean that up in a subsequent PR.) `codex-rs/README.md` is updated as part of this PR to explain how to use this feature. There is a bit of plumbing to route the new settings from `Config` to the business logic in `codex.rs`. The most significant chunks for new code are in `mcp_connection_manager.rs` (which defines the `McpConnectionManager` struct) and `mcp_tool_call.rs`, which is responsible for tool calls. This PR also introduces new `McpToolCallBegin` and `McpToolCallEnd` event types to the protocol, but does not add any handlers for them. (See https://github.com/openai/codex/pull/836 for initial usage.) To test, I added the following to my `~/.codex/config.toml`: ```toml # Local build of https://github.com/hideya/mcp-server-weather-js [mcp_servers.weather] command = "/Users/mbolin/code/mcp-server-weather-js/dist/index.js" args = [] ``` And then I ran the following: ``` codex-rs$ cargo run --bin codex exec 'what is the weather in san francisco' [2025-05-06T22:40:05] Task started: 1 [2025-05-06T22:40:18] Agent message: Here’s the latest National Weather Service forecast for San Francisco (downtown, near 37.77° N, 122.42° W): This Afternoon (Tue): • Sunny, high near 69 °F • West-southwest wind around 12 mph Tonight: • Partly cloudy, low around 52 °F • SW wind 7–10 mph ... ``` Note that Codex itself is not able to make network calls, so it would not normally be able to get live weather information like this. However, the weather MCP is [currently] not run under the Codex sandbox, so it is able to hit `api.weather.gov` and fetch current weather information. --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/829). * #836 * __->__ #829
2025-05-06 15:47:59 -07:00
2025-08-21 01:15:24 -07:00
/// Notification that a model stream experienced an error or disconnect
/// and the system is handling it (e.g., retrying with backoff).
StreamError(StreamErrorEvent),
/// Notification that the agent is about to apply a code patch. Mirrors
/// `ExecCommandBegin` so frontends can show progress indicators.
PatchApplyBegin(PatchApplyBeginEvent),
/// Notification that a patch application has finished.
PatchApplyEnd(PatchApplyEndEvent),
feat: record messages from user in ~/.codex/history.jsonl (#939) This is a large change to support a "history" feature like you would expect in a shell like Bash. History events are recorded in `$CODEX_HOME/history.jsonl`. Because it is a JSONL file, it is straightforward to append new entries (as opposed to the TypeScript file that uses `$CODEX_HOME/history.json`, so to be valid JSON, each new entry entails rewriting the entire file). Because it is possible for there to be multiple instances of Codex CLI writing to `history.jsonl` at once, we use advisory file locking when working with `history.jsonl` in `codex-rs/core/src/message_history.rs`. Because we believe history is a sufficiently useful feature, we enable it by default. Though to provide some safety, we set the file permissions of `history.jsonl` to be `o600` so that other users on the system cannot read the user's history. We do not yet support a default list of `SENSITIVE_PATTERNS` as the TypeScript CLI does: https://github.com/openai/codex/blob/3fdf9df1335ac9501e3fb0e61715359145711e8b/codex-cli/src/utils/storage/command-history.ts#L10-L17 We are going to take a more conservative approach to this list in the Rust CLI. For example, while `/\b[A-Za-z0-9-_]{20,}\b/` might exclude sensitive information like API tokens, it would also exclude valuable information such as references to Git commits. As noted in the updated documentation, users can opt-out of history by adding the following to `config.toml`: ```toml [history] persistence = "none" ``` Because `history.jsonl` could, in theory, be quite large, we take a[n arguably overly pedantic] approach in reading history entries into memory. Specifically, we start by telling the client the current number of entries in the history file (`history_entry_count`) as well as the inode (`history_log_id`) of `history.jsonl` (see the new fields on `SessionConfiguredEvent`). The client is responsible for keeping new entries in memory to create a "local history," but if the user hits up enough times to go "past" the end of local history, then the client should use the new `GetHistoryEntryRequest` in the protocol to fetch older entries. Specifically, it should pass the `history_log_id` it was given originally and work backwards from `history_entry_count`. (It should really fetch history in batches rather than one-at-a-time, but that is something we can improve upon in subsequent PRs.) The motivation behind this crazy scheme is that it is designed to defend against: * The `history.jsonl` being truncated during the session such that the index into the history is no longer consistent with what had been read up to that point. We do not yet have logic to enforce a `max_bytes` for `history.jsonl`, but once we do, we will aspire to implement it in a way that should result in a new inode for the file on most systems. * New items from concurrent Codex CLI sessions amending to the history. Because, in absence of truncation, `history.jsonl` is an append-only log, so long as the client reads backwards from `history_entry_count`, it should always get a consistent view of history. (That said, it will not be able to read _new_ commands from concurrent sessions, but perhaps we will introduce a `/` command to reload latest history or something down the road.) Admittedly, my testing of this feature thus far has been fairly light. I expect we will find bugs and introduce enhancements/fixes going forward.
2025-05-15 16:26:23 -07:00
TurnDiff(TurnDiffEvent),
feat: record messages from user in ~/.codex/history.jsonl (#939) This is a large change to support a "history" feature like you would expect in a shell like Bash. History events are recorded in `$CODEX_HOME/history.jsonl`. Because it is a JSONL file, it is straightforward to append new entries (as opposed to the TypeScript file that uses `$CODEX_HOME/history.json`, so to be valid JSON, each new entry entails rewriting the entire file). Because it is possible for there to be multiple instances of Codex CLI writing to `history.jsonl` at once, we use advisory file locking when working with `history.jsonl` in `codex-rs/core/src/message_history.rs`. Because we believe history is a sufficiently useful feature, we enable it by default. Though to provide some safety, we set the file permissions of `history.jsonl` to be `o600` so that other users on the system cannot read the user's history. We do not yet support a default list of `SENSITIVE_PATTERNS` as the TypeScript CLI does: https://github.com/openai/codex/blob/3fdf9df1335ac9501e3fb0e61715359145711e8b/codex-cli/src/utils/storage/command-history.ts#L10-L17 We are going to take a more conservative approach to this list in the Rust CLI. For example, while `/\b[A-Za-z0-9-_]{20,}\b/` might exclude sensitive information like API tokens, it would also exclude valuable information such as references to Git commits. As noted in the updated documentation, users can opt-out of history by adding the following to `config.toml`: ```toml [history] persistence = "none" ``` Because `history.jsonl` could, in theory, be quite large, we take a[n arguably overly pedantic] approach in reading history entries into memory. Specifically, we start by telling the client the current number of entries in the history file (`history_entry_count`) as well as the inode (`history_log_id`) of `history.jsonl` (see the new fields on `SessionConfiguredEvent`). The client is responsible for keeping new entries in memory to create a "local history," but if the user hits up enough times to go "past" the end of local history, then the client should use the new `GetHistoryEntryRequest` in the protocol to fetch older entries. Specifically, it should pass the `history_log_id` it was given originally and work backwards from `history_entry_count`. (It should really fetch history in batches rather than one-at-a-time, but that is something we can improve upon in subsequent PRs.) The motivation behind this crazy scheme is that it is designed to defend against: * The `history.jsonl` being truncated during the session such that the index into the history is no longer consistent with what had been read up to that point. We do not yet have logic to enforce a `max_bytes` for `history.jsonl`, but once we do, we will aspire to implement it in a way that should result in a new inode for the file on most systems. * New items from concurrent Codex CLI sessions amending to the history. Because, in absence of truncation, `history.jsonl` is an append-only log, so long as the client reads backwards from `history_entry_count`, it should always get a consistent view of history. (That said, it will not be able to read _new_ commands from concurrent sessions, but perhaps we will introduce a `/` command to reload latest history or something down the road.) Admittedly, my testing of this feature thus far has been fairly light. I expect we will find bugs and introduce enhancements/fixes going forward.
2025-05-15 16:26:23 -07:00
/// Response to GetHistoryEntryRequest.
GetHistoryEntryResponse(GetHistoryEntryResponseEvent),
/// List of MCP tools available to the agent.
McpListToolsResponse(McpListToolsResponseEvent),
/// List of custom prompts available to the agent.
ListCustomPromptsResponse(ListCustomPromptsResponseEvent),
PlanUpdate(UpdatePlanArgs),
TurnAborted(TurnAbortedEvent),
/// Notification that the agent is shutting down.
ShutdownComplete,
ConversationPath(ConversationPathResponseEvent),
Review Mode (Core) (#3401) ## 📝 Review Mode -- Core This PR introduces the Core implementation for Review mode: - New op `Op::Review { prompt: String }:` spawns a child review task with isolated context, a review‑specific system prompt, and a `Config.review_model`. - `EnteredReviewMode`: emitted when the child review session starts. Every event from this point onwards reflects the review session. - `ExitedReviewMode(Option<ReviewOutputEvent>)`: emitted when the review finishes or is interrupted, with optional structured findings: ```json { "findings": [ { "title": "<≤ 80 chars, imperative>", "body": "<valid Markdown explaining *why* this is a problem; cite files/lines/functions>", "confidence_score": <float 0.0-1.0>, "priority": <int 0-3>, "code_location": { "absolute_file_path": "<file path>", "line_range": {"start": <int>, "end": <int>} } } ], "overall_correctness": "patch is correct" | "patch is incorrect", "overall_explanation": "<1-3 sentence explanation justifying the overall_correctness verdict>", "overall_confidence_score": <float 0.0-1.0> } ``` ## Questions ### Why separate out its own message history? We want the review thread to match the training of our review models as much as possible -- that means using a custom prompt, removing user instructions, and starting a clean chat history. We also want to make sure the review thread doesn't leak into the parent thread. ### Why do this as a mode, vs. sub-agents? 1. We want review to be a synchronous task, so it's fine for now to do a bespoke implementation. 2. We're still unclear about the final structure for sub-agents. We'd prefer to land this quickly and then refactor into sub-agents without rushing that implementation.
2025-09-12 16:25:10 -07:00
/// Entered review mode.
EnteredReviewMode(ReviewRequest),
/// Exited review mode with an optional final result to apply.
2025-09-14 17:34:33 -07:00
ExitedReviewMode(ExitedReviewModeEvent),
}
#[derive(Debug, Clone, Deserialize, Serialize, TS)]
pub struct ExitedReviewModeEvent {
pub review_output: Option<ReviewOutputEvent>,
}
// Individual event payload types matching each `EventMsg` variant.
#[derive(Debug, Clone, Deserialize, Serialize, TS)]
pub struct ErrorEvent {
pub message: String,
}
#[derive(Debug, Clone, Deserialize, Serialize, TS)]
pub struct TaskCompleteEvent {
pub last_agent_message: Option<String>,
}
#[derive(Debug, Clone, Deserialize, Serialize, TS)]
pub struct TaskStartedEvent {
pub model_context_window: Option<u64>,
}
#[derive(Debug, Clone, Deserialize, Serialize, Default, TS)]
feat: show number of tokens remaining in UI (#1388) When using the OpenAI Responses API, we now record the `usage` field for a `"response.completed"` event, which includes metrics about the number of tokens consumed. We also introduce `openai_model_info.rs`, which includes current data about the most common OpenAI models available via the API (specifically `context_window` and `max_output_tokens`). If Codex does not recognize the model, you can set `model_context_window` and `model_max_output_tokens` explicitly in `config.toml`. When then introduce a new event type to `protocol.rs`, `TokenCount`, which includes the `TokenUsage` for the most recent turn. Finally, we update the TUI to record the running sum of tokens used so the percentage of available context window remaining can be reported via the placeholder text for the composer: ![Screenshot 2025-06-25 at 11 20 55 PM](https://github.com/user-attachments/assets/6fd6982f-7247-4f14-84b2-2e600cb1fd49) We could certainly get much fancier with this (such as reporting the estimated cost of the conversation), but for now, we are just trying to achieve feature parity with the TypeScript CLI. Though arguably this improves upon the TypeScript CLI, as the TypeScript CLI uses heuristics to estimate the number of tokens used rather than using the `usage` information directly: https://github.com/openai/codex/blob/296996d74e345b1b05d8c3451a06ace21c5ada96/codex-cli/src/utils/approximate-tokens-used.ts#L3-L16 Fixes https://github.com/openai/codex/issues/1242
2025-06-25 23:31:11 -07:00
pub struct TokenUsage {
pub input_tokens: u64,
pub cached_input_tokens: u64,
feat: show number of tokens remaining in UI (#1388) When using the OpenAI Responses API, we now record the `usage` field for a `"response.completed"` event, which includes metrics about the number of tokens consumed. We also introduce `openai_model_info.rs`, which includes current data about the most common OpenAI models available via the API (specifically `context_window` and `max_output_tokens`). If Codex does not recognize the model, you can set `model_context_window` and `model_max_output_tokens` explicitly in `config.toml`. When then introduce a new event type to `protocol.rs`, `TokenCount`, which includes the `TokenUsage` for the most recent turn. Finally, we update the TUI to record the running sum of tokens used so the percentage of available context window remaining can be reported via the placeholder text for the composer: ![Screenshot 2025-06-25 at 11 20 55 PM](https://github.com/user-attachments/assets/6fd6982f-7247-4f14-84b2-2e600cb1fd49) We could certainly get much fancier with this (such as reporting the estimated cost of the conversation), but for now, we are just trying to achieve feature parity with the TypeScript CLI. Though arguably this improves upon the TypeScript CLI, as the TypeScript CLI uses heuristics to estimate the number of tokens used rather than using the `usage` information directly: https://github.com/openai/codex/blob/296996d74e345b1b05d8c3451a06ace21c5ada96/codex-cli/src/utils/approximate-tokens-used.ts#L3-L16 Fixes https://github.com/openai/codex/issues/1242
2025-06-25 23:31:11 -07:00
pub output_tokens: u64,
pub reasoning_output_tokens: u64,
feat: show number of tokens remaining in UI (#1388) When using the OpenAI Responses API, we now record the `usage` field for a `"response.completed"` event, which includes metrics about the number of tokens consumed. We also introduce `openai_model_info.rs`, which includes current data about the most common OpenAI models available via the API (specifically `context_window` and `max_output_tokens`). If Codex does not recognize the model, you can set `model_context_window` and `model_max_output_tokens` explicitly in `config.toml`. When then introduce a new event type to `protocol.rs`, `TokenCount`, which includes the `TokenUsage` for the most recent turn. Finally, we update the TUI to record the running sum of tokens used so the percentage of available context window remaining can be reported via the placeholder text for the composer: ![Screenshot 2025-06-25 at 11 20 55 PM](https://github.com/user-attachments/assets/6fd6982f-7247-4f14-84b2-2e600cb1fd49) We could certainly get much fancier with this (such as reporting the estimated cost of the conversation), but for now, we are just trying to achieve feature parity with the TypeScript CLI. Though arguably this improves upon the TypeScript CLI, as the TypeScript CLI uses heuristics to estimate the number of tokens used rather than using the `usage` information directly: https://github.com/openai/codex/blob/296996d74e345b1b05d8c3451a06ace21c5ada96/codex-cli/src/utils/approximate-tokens-used.ts#L3-L16 Fixes https://github.com/openai/codex/issues/1242
2025-06-25 23:31:11 -07:00
pub total_tokens: u64,
}
#[derive(Debug, Clone, Deserialize, Serialize, TS)]
pub struct TokenUsageInfo {
pub total_token_usage: TokenUsage,
pub last_token_usage: TokenUsage,
pub model_context_window: Option<u64>,
}
impl TokenUsageInfo {
pub fn new_or_append(
info: &Option<TokenUsageInfo>,
last: &Option<TokenUsage>,
model_context_window: Option<u64>,
) -> Option<Self> {
if info.is_none() && last.is_none() {
return None;
}
let mut info = match info {
Some(info) => info.clone(),
None => Self {
total_token_usage: TokenUsage::default(),
last_token_usage: TokenUsage::default(),
model_context_window,
},
};
if let Some(last) = last {
info.append_last_usage(last);
}
Some(info)
}
pub fn append_last_usage(&mut self, last: &TokenUsage) {
self.total_token_usage.add_assign(last);
self.last_token_usage = last.clone();
}
}
#[derive(Debug, Clone, Deserialize, Serialize, TS)]
pub struct TokenCountEvent {
pub info: Option<TokenUsageInfo>,
pub rate_limits: Option<RateLimitSnapshot>,
}
#[derive(Debug, Clone, Deserialize, Serialize, TS)]
pub struct RateLimitSnapshot {
pub primary: Option<RateLimitWindow>,
pub secondary: Option<RateLimitWindow>,
}
#[derive(Debug, Clone, Deserialize, Serialize, TS)]
pub struct RateLimitWindow {
/// Percentage (0-100) of the window that has been consumed.
pub used_percent: f64,
/// Rolling window duration, in minutes.
pub window_minutes: Option<u64>,
/// Seconds until the window resets.
pub resets_in_seconds: Option<u64>,
}
// Includes prompts, tools and space to call compact.
const BASELINE_TOKENS: u64 = 12000;
impl TokenUsage {
pub fn is_zero(&self) -> bool {
self.total_tokens == 0
}
pub fn cached_input(&self) -> u64 {
self.cached_input_tokens
}
pub fn non_cached_input(&self) -> u64 {
self.input_tokens.saturating_sub(self.cached_input())
}
/// Primary count for display as a single absolute value: non-cached input + output.
pub fn blended_total(&self) -> u64 {
self.non_cached_input() + self.output_tokens
}
/// For estimating what % of the model's context window is used, we need to account
/// for reasoning output tokens from prior turns being dropped from the context window.
/// We approximate this here by subtracting reasoning output tokens from the total.
/// This will be off for the current turn and pending function calls.
pub fn tokens_in_context_window(&self) -> u64 {
self.total_tokens
.saturating_sub(self.reasoning_output_tokens)
}
/// Estimate the remaining user-controllable percentage of the model's context window.
///
/// `context_window` is the total size of the model's context window.
/// `BASELINE_TOKENS` should capture tokens that are always present in
/// the context (e.g., system prompt and fixed tool instructions) so that
/// the percentage reflects the portion the user can influence.
///
/// This normalizes both the numerator and denominator by subtracting the
/// baseline, so immediately after the first prompt the UI shows 100% left
/// and trends toward 0% as the user fills the effective window.
pub fn percent_of_context_window_remaining(&self, context_window: u64) -> u8 {
if context_window <= BASELINE_TOKENS {
return 0;
}
let effective_window = context_window - BASELINE_TOKENS;
let used = self
.tokens_in_context_window()
.saturating_sub(BASELINE_TOKENS);
let remaining = effective_window.saturating_sub(used);
((remaining as f32 / effective_window as f32) * 100.0).clamp(0.0, 100.0) as u8
}
/// In-place element-wise sum of token counts.
pub fn add_assign(&mut self, other: &TokenUsage) {
self.input_tokens += other.input_tokens;
self.cached_input_tokens += other.cached_input_tokens;
self.output_tokens += other.output_tokens;
self.reasoning_output_tokens += other.reasoning_output_tokens;
self.total_tokens += other.total_tokens;
}
}
#[derive(Debug, Clone, Deserialize, Serialize)]
pub struct FinalOutput {
pub token_usage: TokenUsage,
}
impl From<TokenUsage> for FinalOutput {
fn from(token_usage: TokenUsage) -> Self {
Self { token_usage }
}
}
impl fmt::Display for FinalOutput {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
let token_usage = &self.token_usage;
write!(
f,
"Token usage: total={} input={}{} output={}{}",
format_with_separators(token_usage.blended_total()),
format_with_separators(token_usage.non_cached_input()),
if token_usage.cached_input() > 0 {
format!(
" (+ {} cached)",
format_with_separators(token_usage.cached_input())
)
} else {
String::new()
},
format_with_separators(token_usage.output_tokens),
if token_usage.reasoning_output_tokens > 0 {
format!(
" (reasoning {})",
format_with_separators(token_usage.reasoning_output_tokens)
)
} else {
String::new()
}
)
}
}
#[derive(Debug, Clone, Deserialize, Serialize, TS)]
pub struct AgentMessageEvent {
pub message: String,
}
#[derive(Debug, Clone, Deserialize, Serialize, TS)]
#[serde(rename_all = "snake_case")]
pub enum InputMessageKind {
/// Plain user text (default)
Plain,
/// XML-wrapped user instructions (<user_instructions>...)
UserInstructions,
/// XML-wrapped environment context (<environment_context>...)
EnvironmentContext,
}
#[derive(Debug, Clone, Deserialize, Serialize, TS)]
pub struct UserMessageEvent {
pub message: String,
#[serde(skip_serializing_if = "Option::is_none")]
pub kind: Option<InputMessageKind>,
#[serde(skip_serializing_if = "Option::is_none")]
pub images: Option<Vec<String>>,
}
impl<T, U> From<(T, U)> for InputMessageKind
where
T: AsRef<str>,
U: AsRef<str>,
{
fn from(value: (T, U)) -> Self {
let (_role, message) = value;
let message = message.as_ref();
let trimmed = message.trim();
if starts_with_ignore_ascii_case(trimmed, ENVIRONMENT_CONTEXT_OPEN_TAG)
&& ends_with_ignore_ascii_case(trimmed, ENVIRONMENT_CONTEXT_CLOSE_TAG)
{
InputMessageKind::EnvironmentContext
} else if starts_with_ignore_ascii_case(trimmed, USER_INSTRUCTIONS_OPEN_TAG)
&& ends_with_ignore_ascii_case(trimmed, USER_INSTRUCTIONS_CLOSE_TAG)
{
InputMessageKind::UserInstructions
} else {
InputMessageKind::Plain
}
}
}
fn starts_with_ignore_ascii_case(text: &str, prefix: &str) -> bool {
let text_bytes = text.as_bytes();
let prefix_bytes = prefix.as_bytes();
text_bytes.len() >= prefix_bytes.len()
&& text_bytes
.iter()
.zip(prefix_bytes.iter())
.all(|(a, b)| a.eq_ignore_ascii_case(b))
}
fn ends_with_ignore_ascii_case(text: &str, suffix: &str) -> bool {
let text_bytes = text.as_bytes();
let suffix_bytes = suffix.as_bytes();
text_bytes.len() >= suffix_bytes.len()
&& text_bytes[text_bytes.len() - suffix_bytes.len()..]
.iter()
.zip(suffix_bytes.iter())
.all(|(a, b)| a.eq_ignore_ascii_case(b))
}
#[derive(Debug, Clone, Deserialize, Serialize, TS)]
pub struct AgentMessageDeltaEvent {
pub delta: String,
}
#[derive(Debug, Clone, Deserialize, Serialize, TS)]
pub struct AgentReasoningEvent {
pub text: String,
}
#[derive(Debug, Clone, Deserialize, Serialize, TS)]
pub struct AgentReasoningRawContentEvent {
pub text: String,
}
#[derive(Debug, Clone, Deserialize, Serialize, TS)]
pub struct AgentReasoningRawContentDeltaEvent {
pub delta: String,
}
#[derive(Debug, Clone, Deserialize, Serialize, TS)]
pub struct AgentReasoningSectionBreakEvent {}
#[derive(Debug, Clone, Deserialize, Serialize, TS)]
pub struct AgentReasoningDeltaEvent {
pub delta: String,
}
#[derive(Debug, Clone, Deserialize, Serialize, TS)]
pub struct McpInvocation {
/// Name of the MCP server as defined in the config.
pub server: String,
/// Name of the tool as given by the MCP server.
pub tool: String,
/// Arguments to the tool call.
pub arguments: Option<serde_json::Value>,
}
#[derive(Debug, Clone, Deserialize, Serialize, TS)]
pub struct McpToolCallBeginEvent {
/// Identifier so this can be paired with the McpToolCallEnd event.
pub call_id: String,
pub invocation: McpInvocation,
}
#[derive(Debug, Clone, Deserialize, Serialize, TS)]
pub struct McpToolCallEndEvent {
/// Identifier for the corresponding McpToolCallBegin that finished.
pub call_id: String,
pub invocation: McpInvocation,
#[ts(type = "string")]
pub duration: Duration,
/// Result of the tool call. Note this could be an error.
fix: introduce ResponseInputItem::McpToolCallOutput variant (#1151) The output of an MCP server tool call can be one of several types, but to date, we treated all outputs as text by showing the serialized JSON as the "tool output" in Codex: https://github.com/openai/codex/blob/25a9949c49194d5a64de54a11bcc5b4724ac9bd5/codex-rs/mcp-types/src/lib.rs#L96-L101 This PR adds support for the `ImageContent` variant so we can now display an image output from an MCP tool call. In making this change, we introduce a new `ResponseInputItem::McpToolCallOutput` variant so that we can work with the `mcp_types::CallToolResult` directly when the function call is made to an MCP server. Though arguably the more significant change is the introduction of `HistoryCell::CompletedMcpToolCallWithImageOutput`, which is a cell that uses `ratatui_image` to render an image into the terminal. To support this, we introduce `ImageRenderCache`, cache a `ratatui_image::picker::Picker`, and `ensure_image_cache()` to cache the appropriate scaled image data and dimensions based on the current terminal size. To test, I created a minimal `package.json`: ```json { "name": "kitty-mcp", "version": "1.0.0", "type": "module", "description": "MCP that returns image of kitty", "main": "index.js", "dependencies": { "@modelcontextprotocol/sdk": "^1.12.0" } } ``` with the following `index.js` to define the MCP server: ```js #!/usr/bin/env node import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js"; import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js"; import { readFile } from "node:fs/promises"; import { join } from "node:path"; const IMAGE_URI = "image://Ada.png"; const server = new McpServer({ name: "Demo", version: "1.0.0", }); server.tool( "get-cat-image", "If you need a cat image, this tool will provide one.", async () => ({ content: [ { type: "image", data: await getAdaPngBase64(), mimeType: "image/png" }, ], }) ); server.resource("Ada the Cat", IMAGE_URI, async (uri) => { const base64Image = await getAdaPngBase64(); return { contents: [ { uri: uri.href, mimeType: "image/png", blob: base64Image, }, ], }; }); async function getAdaPngBase64() { const __dirname = new URL(".", import.meta.url).pathname; // From https://github.com/benjajaja/ratatui-image/blob/9705ce2c59ec669abbce2924cbfd1f5ae22c9860/assets/Ada.png const filePath = join(__dirname, "Ada.png"); const imageData = await readFile(filePath); const base64Image = imageData.toString("base64"); return base64Image; } const transport = new StdioServerTransport(); await server.connect(transport); ``` With the local changes from this PR, I added the following to my `config.toml`: ```toml [mcp_servers.kitty] command = "node" args = ["/Users/mbolin/code/kitty-mcp/index.js"] ``` Running the TUI from source: ``` cargo run --bin codex -- --model o3 'I need a picture of a cat' ``` I get: <img width="732" alt="image" src="https://github.com/user-attachments/assets/bf80b721-9ca0-4d81-aec7-77d6899e2869" /> Now, that said, I have only tested in iTerm and there is definitely some funny business with getting an accurate character-to-pixel ratio (sometimes the `CompletedMcpToolCallWithImageOutput` thinks it needs 10 rows to render instead of 4), so there is still work to be done here.
2025-05-28 19:03:17 -07:00
pub result: Result<CallToolResult, String>,
}
impl McpToolCallEndEvent {
pub fn is_success(&self) -> bool {
match &self.result {
Ok(result) => !result.is_error.unwrap_or(false),
Err(_) => false,
}
}
}
#[derive(Debug, Clone, Deserialize, Serialize, TS)]
pub struct WebSearchBeginEvent {
pub call_id: String,
}
#[derive(Debug, Clone, Deserialize, Serialize, TS)]
pub struct WebSearchEndEvent {
pub call_id: String,
pub query: String,
}
/// Response payload for `Op::GetHistory` containing the current session's
/// in-memory transcript.
#[derive(Debug, Clone, Deserialize, Serialize, TS)]
pub struct ConversationPathResponseEvent {
pub conversation_id: ConversationId,
pub path: PathBuf,
}
#[derive(Debug, Clone, Deserialize, Serialize, TS)]
pub struct ResumedHistory {
pub conversation_id: ConversationId,
pub history: Vec<RolloutItem>,
pub rollout_path: PathBuf,
}
#[derive(Debug, Clone, Deserialize, Serialize, TS)]
pub enum InitialHistory {
New,
Resumed(ResumedHistory),
Forked(Vec<RolloutItem>),
}
impl InitialHistory {
pub fn get_rollout_items(&self) -> Vec<RolloutItem> {
match self {
InitialHistory::New => Vec::new(),
InitialHistory::Resumed(resumed) => resumed.history.clone(),
InitialHistory::Forked(items) => items.clone(),
}
}
pub fn get_event_msgs(&self) -> Option<Vec<EventMsg>> {
match self {
InitialHistory::New => None,
InitialHistory::Resumed(resumed) => Some(
resumed
.history
.iter()
.filter_map(|ri| match ri {
RolloutItem::EventMsg(ev) => Some(ev.clone()),
_ => None,
})
.collect(),
),
InitialHistory::Forked(items) => Some(
items
.iter()
.filter_map(|ri| match ri {
RolloutItem::EventMsg(ev) => Some(ev.clone()),
_ => None,
})
.collect(),
),
}
}
}
#[derive(Serialize, Deserialize, Clone, Default, Debug, TS)]
pub struct SessionMeta {
pub id: ConversationId,
pub timestamp: String,
pub cwd: PathBuf,
pub originator: String,
pub cli_version: String,
pub instructions: Option<String>,
}
#[derive(Serialize, Deserialize, Debug, Clone, TS)]
pub struct SessionMetaLine {
#[serde(flatten)]
pub meta: SessionMeta,
#[serde(skip_serializing_if = "Option::is_none")]
pub git: Option<GitInfo>,
}
#[derive(Serialize, Deserialize, Debug, Clone, TS)]
#[serde(tag = "type", content = "payload", rename_all = "snake_case")]
pub enum RolloutItem {
SessionMeta(SessionMetaLine),
ResponseItem(ResponseItem),
Compacted(CompactedItem),
TurnContext(TurnContextItem),
EventMsg(EventMsg),
}
#[derive(Serialize, Deserialize, Clone, Debug, TS)]
pub struct CompactedItem {
pub message: String,
}
impl From<CompactedItem> for ResponseItem {
fn from(value: CompactedItem) -> Self {
ResponseItem::Message {
id: None,
role: "assistant".to_string(),
content: vec![ContentItem::OutputText {
text: value.message,
}],
}
}
}
#[derive(Serialize, Deserialize, Clone, Debug, TS)]
pub struct TurnContextItem {
pub cwd: PathBuf,
pub approval_policy: AskForApproval,
pub sandbox_policy: SandboxPolicy,
pub model: String,
#[serde(skip_serializing_if = "Option::is_none")]
pub effort: Option<ReasoningEffortConfig>,
pub summary: ReasoningSummaryConfig,
}
#[derive(Serialize, Deserialize, Clone)]
pub struct RolloutLine {
pub timestamp: String,
#[serde(flatten)]
pub item: RolloutItem,
}
#[derive(Serialize, Deserialize, Clone, Debug, TS)]
pub struct GitInfo {
/// Current commit hash (SHA)
#[serde(skip_serializing_if = "Option::is_none")]
pub commit_hash: Option<String>,
/// Current branch name
#[serde(skip_serializing_if = "Option::is_none")]
pub branch: Option<String>,
/// Repository URL (if available from remote)
#[serde(skip_serializing_if = "Option::is_none")]
pub repository_url: Option<String>,
}
Review Mode (Core) (#3401) ## 📝 Review Mode -- Core This PR introduces the Core implementation for Review mode: - New op `Op::Review { prompt: String }:` spawns a child review task with isolated context, a review‑specific system prompt, and a `Config.review_model`. - `EnteredReviewMode`: emitted when the child review session starts. Every event from this point onwards reflects the review session. - `ExitedReviewMode(Option<ReviewOutputEvent>)`: emitted when the review finishes or is interrupted, with optional structured findings: ```json { "findings": [ { "title": "<≤ 80 chars, imperative>", "body": "<valid Markdown explaining *why* this is a problem; cite files/lines/functions>", "confidence_score": <float 0.0-1.0>, "priority": <int 0-3>, "code_location": { "absolute_file_path": "<file path>", "line_range": {"start": <int>, "end": <int>} } } ], "overall_correctness": "patch is correct" | "patch is incorrect", "overall_explanation": "<1-3 sentence explanation justifying the overall_correctness verdict>", "overall_confidence_score": <float 0.0-1.0> } ``` ## Questions ### Why separate out its own message history? We want the review thread to match the training of our review models as much as possible -- that means using a custom prompt, removing user instructions, and starting a clean chat history. We also want to make sure the review thread doesn't leak into the parent thread. ### Why do this as a mode, vs. sub-agents? 1. We want review to be a synchronous task, so it's fine for now to do a bespoke implementation. 2. We're still unclear about the final structure for sub-agents. We'd prefer to land this quickly and then refactor into sub-agents without rushing that implementation.
2025-09-12 16:25:10 -07:00
/// Review request sent to the review session.
#[derive(Debug, Clone, Deserialize, Serialize, PartialEq, TS)]
pub struct ReviewRequest {
pub prompt: String,
pub user_facing_hint: String,
}
/// Structured review result produced by a child review session.
#[derive(Debug, Clone, Deserialize, Serialize, PartialEq, TS)]
pub struct ReviewOutputEvent {
pub findings: Vec<ReviewFinding>,
pub overall_correctness: String,
pub overall_explanation: String,
pub overall_confidence_score: f32,
}
impl Default for ReviewOutputEvent {
fn default() -> Self {
Self {
findings: Vec::new(),
overall_correctness: String::default(),
overall_explanation: String::default(),
overall_confidence_score: 0.0,
}
}
}
/// A single review finding describing an observed issue or recommendation.
#[derive(Debug, Clone, Deserialize, Serialize, PartialEq, TS)]
pub struct ReviewFinding {
pub title: String,
pub body: String,
pub confidence_score: f32,
pub priority: i32,
pub code_location: ReviewCodeLocation,
}
/// Location of the code related to a review finding.
#[derive(Debug, Clone, Deserialize, Serialize, PartialEq, TS)]
pub struct ReviewCodeLocation {
pub absolute_file_path: PathBuf,
pub line_range: ReviewLineRange,
}
/// Inclusive line range in a file associated with the finding.
#[derive(Debug, Clone, Deserialize, Serialize, PartialEq, TS)]
pub struct ReviewLineRange {
pub start: u32,
pub end: u32,
}
#[derive(Debug, Clone, Deserialize, Serialize, TS)]
pub struct ExecCommandBeginEvent {
/// Identifier so this can be paired with the ExecCommandEnd event.
pub call_id: String,
/// The command to be executed.
pub command: Vec<String>,
/// The command's working directory if not the default cwd for the agent.
pub cwd: PathBuf,
[1/3] Parse exec commands and format them more nicely in the UI (#2095) # Note for reviewers The bulk of this PR is in in the new file, `parse_command.rs`. This file is designed to be written TDD and implemented with Codex. Do not worry about reviewing the code, just review the unit tests (if you want). If any cases are missing, we'll add more tests and have Codex fix them. I think the best approach will be to land and iterate. I have some follow-ups I want to do after this lands. The next PR after this will let us merge (and dedupe) multiple sequential cells of the same such as multiple read commands. The deduping will also be important because the model often reads the same file multiple times in a row in chunks === This PR formats common commands like reading, formatting, testing, etc more nicely: It tries to extract things like file names, tests and falls back to the cmd if it doesn't. It also only shows stdout/err if the command failed. <img width="770" height="238" alt="CleanShot 2025-08-09 at 16 05 15" src="https://github.com/user-attachments/assets/0ead179a-8910-486b-aa3d-7d26264d751e" /> <img width="348" height="158" alt="CleanShot 2025-08-09 at 16 05 32" src="https://github.com/user-attachments/assets/4302681b-5e87-4ff3-85b4-0252c6c485a9" /> <img width="834" height="324" alt="CleanShot 2025-08-09 at 16 05 56 2" src="https://github.com/user-attachments/assets/09fb3517-7bd6-40f6-a126-4172106b700f" /> Part 2: https://github.com/openai/codex/pull/2097 Part 3: https://github.com/openai/codex/pull/2110
2025-08-11 11:26:15 -07:00
pub parsed_cmd: Vec<ParsedCommand>,
}
#[derive(Debug, Clone, Deserialize, Serialize, TS)]
pub struct ExecCommandEndEvent {
/// Identifier for the ExecCommandBegin that finished.
pub call_id: String,
/// Captured stdout
pub stdout: String,
/// Captured stderr
pub stderr: String,
/// Captured aggregated output
#[serde(default)]
pub aggregated_output: String,
/// The command's exit code.
pub exit_code: i32,
/// The duration of the command execution.
#[ts(type = "string")]
pub duration: Duration,
/// Formatted output from the command, as seen by the model.
pub formatted_output: String,
}
#[derive(Debug, Clone, Deserialize, Serialize, PartialEq, TS)]
#[serde(rename_all = "snake_case")]
pub enum ExecOutputStream {
Stdout,
Stderr,
}
#[serde_as]
#[derive(Debug, Clone, Deserialize, Serialize, PartialEq, TS)]
pub struct ExecCommandOutputDeltaEvent {
/// Identifier for the ExecCommandBegin that produced this chunk.
pub call_id: String,
/// Which stream produced this chunk.
pub stream: ExecOutputStream,
/// Raw bytes from the stream (may not be valid UTF-8).
#[serde_as(as = "serde_with::base64::Base64")]
#[ts(type = "string")]
pub chunk: Vec<u8>,
}
#[derive(Debug, Clone, Deserialize, Serialize, TS)]
pub struct ExecApprovalRequestEvent {
/// Identifier for the associated exec call, if available.
pub call_id: String,
/// The command to be executed.
pub command: Vec<String>,
/// The command's working directory.
pub cwd: PathBuf,
/// Optional human-readable reason for the approval (e.g. retry without sandbox).
#[serde(skip_serializing_if = "Option::is_none")]
pub reason: Option<String>,
}
#[derive(Debug, Clone, Deserialize, Serialize, TS)]
pub struct ApplyPatchApprovalRequestEvent {
/// Responses API call id for the associated patch apply call, if available.
pub call_id: String,
pub changes: HashMap<PathBuf, FileChange>,
/// Optional explanatory reason (e.g. request for extra write access).
#[serde(skip_serializing_if = "Option::is_none")]
pub reason: Option<String>,
/// When set, the agent is asking the user to allow writes under this root for the remainder of the session.
#[serde(skip_serializing_if = "Option::is_none")]
pub grant_root: Option<PathBuf>,
}
#[derive(Debug, Clone, Deserialize, Serialize, TS)]
pub struct BackgroundEventEvent {
pub message: String,
}
#[derive(Debug, Clone, Deserialize, Serialize, TS)]
2025-08-21 01:15:24 -07:00
pub struct StreamErrorEvent {
pub message: String,
}
#[derive(Debug, Clone, Deserialize, Serialize, TS)]
pub struct PatchApplyBeginEvent {
/// Identifier so this can be paired with the PatchApplyEnd event.
pub call_id: String,
/// If true, there was no ApplyPatchApprovalRequest for this patch.
pub auto_approved: bool,
/// The changes to be applied.
pub changes: HashMap<PathBuf, FileChange>,
}
#[derive(Debug, Clone, Deserialize, Serialize, TS)]
pub struct PatchApplyEndEvent {
/// Identifier for the PatchApplyBegin that finished.
pub call_id: String,
/// Captured stdout (summary printed by apply_patch).
pub stdout: String,
/// Captured stderr (parser errors, IO failures, etc.).
pub stderr: String,
/// Whether the patch was applied successfully.
pub success: bool,
}
#[derive(Debug, Clone, Deserialize, Serialize, TS)]
pub struct TurnDiffEvent {
pub unified_diff: String,
}
#[derive(Debug, Clone, Deserialize, Serialize, TS)]
feat: record messages from user in ~/.codex/history.jsonl (#939) This is a large change to support a "history" feature like you would expect in a shell like Bash. History events are recorded in `$CODEX_HOME/history.jsonl`. Because it is a JSONL file, it is straightforward to append new entries (as opposed to the TypeScript file that uses `$CODEX_HOME/history.json`, so to be valid JSON, each new entry entails rewriting the entire file). Because it is possible for there to be multiple instances of Codex CLI writing to `history.jsonl` at once, we use advisory file locking when working with `history.jsonl` in `codex-rs/core/src/message_history.rs`. Because we believe history is a sufficiently useful feature, we enable it by default. Though to provide some safety, we set the file permissions of `history.jsonl` to be `o600` so that other users on the system cannot read the user's history. We do not yet support a default list of `SENSITIVE_PATTERNS` as the TypeScript CLI does: https://github.com/openai/codex/blob/3fdf9df1335ac9501e3fb0e61715359145711e8b/codex-cli/src/utils/storage/command-history.ts#L10-L17 We are going to take a more conservative approach to this list in the Rust CLI. For example, while `/\b[A-Za-z0-9-_]{20,}\b/` might exclude sensitive information like API tokens, it would also exclude valuable information such as references to Git commits. As noted in the updated documentation, users can opt-out of history by adding the following to `config.toml`: ```toml [history] persistence = "none" ``` Because `history.jsonl` could, in theory, be quite large, we take a[n arguably overly pedantic] approach in reading history entries into memory. Specifically, we start by telling the client the current number of entries in the history file (`history_entry_count`) as well as the inode (`history_log_id`) of `history.jsonl` (see the new fields on `SessionConfiguredEvent`). The client is responsible for keeping new entries in memory to create a "local history," but if the user hits up enough times to go "past" the end of local history, then the client should use the new `GetHistoryEntryRequest` in the protocol to fetch older entries. Specifically, it should pass the `history_log_id` it was given originally and work backwards from `history_entry_count`. (It should really fetch history in batches rather than one-at-a-time, but that is something we can improve upon in subsequent PRs.) The motivation behind this crazy scheme is that it is designed to defend against: * The `history.jsonl` being truncated during the session such that the index into the history is no longer consistent with what had been read up to that point. We do not yet have logic to enforce a `max_bytes` for `history.jsonl`, but once we do, we will aspire to implement it in a way that should result in a new inode for the file on most systems. * New items from concurrent Codex CLI sessions amending to the history. Because, in absence of truncation, `history.jsonl` is an append-only log, so long as the client reads backwards from `history_entry_count`, it should always get a consistent view of history. (That said, it will not be able to read _new_ commands from concurrent sessions, but perhaps we will introduce a `/` command to reload latest history or something down the road.) Admittedly, my testing of this feature thus far has been fairly light. I expect we will find bugs and introduce enhancements/fixes going forward.
2025-05-15 16:26:23 -07:00
pub struct GetHistoryEntryResponseEvent {
pub offset: usize,
pub log_id: u64,
/// The entry at the requested offset, if available and parseable.
#[serde(skip_serializing_if = "Option::is_none")]
pub entry: Option<HistoryEntry>,
}
/// Response payload for `Op::ListMcpTools`.
#[derive(Debug, Clone, Deserialize, Serialize, TS)]
pub struct McpListToolsResponseEvent {
/// Fully qualified tool name -> tool definition.
pub tools: std::collections::HashMap<String, McpTool>,
}
/// Response payload for `Op::ListCustomPrompts`.
#[derive(Debug, Clone, Deserialize, Serialize, TS)]
pub struct ListCustomPromptsResponseEvent {
pub custom_prompts: Vec<CustomPrompt>,
}
#[derive(Debug, Default, Clone, Deserialize, Serialize, TS)]
pub struct SessionConfiguredEvent {
/// Name left as session_id instead of conversation_id for backwards compatibility.
pub session_id: ConversationId,
/// Tell the client what model is being queried.
pub model: String,
feat: record messages from user in ~/.codex/history.jsonl (#939) This is a large change to support a "history" feature like you would expect in a shell like Bash. History events are recorded in `$CODEX_HOME/history.jsonl`. Because it is a JSONL file, it is straightforward to append new entries (as opposed to the TypeScript file that uses `$CODEX_HOME/history.json`, so to be valid JSON, each new entry entails rewriting the entire file). Because it is possible for there to be multiple instances of Codex CLI writing to `history.jsonl` at once, we use advisory file locking when working with `history.jsonl` in `codex-rs/core/src/message_history.rs`. Because we believe history is a sufficiently useful feature, we enable it by default. Though to provide some safety, we set the file permissions of `history.jsonl` to be `o600` so that other users on the system cannot read the user's history. We do not yet support a default list of `SENSITIVE_PATTERNS` as the TypeScript CLI does: https://github.com/openai/codex/blob/3fdf9df1335ac9501e3fb0e61715359145711e8b/codex-cli/src/utils/storage/command-history.ts#L10-L17 We are going to take a more conservative approach to this list in the Rust CLI. For example, while `/\b[A-Za-z0-9-_]{20,}\b/` might exclude sensitive information like API tokens, it would also exclude valuable information such as references to Git commits. As noted in the updated documentation, users can opt-out of history by adding the following to `config.toml`: ```toml [history] persistence = "none" ``` Because `history.jsonl` could, in theory, be quite large, we take a[n arguably overly pedantic] approach in reading history entries into memory. Specifically, we start by telling the client the current number of entries in the history file (`history_entry_count`) as well as the inode (`history_log_id`) of `history.jsonl` (see the new fields on `SessionConfiguredEvent`). The client is responsible for keeping new entries in memory to create a "local history," but if the user hits up enough times to go "past" the end of local history, then the client should use the new `GetHistoryEntryRequest` in the protocol to fetch older entries. Specifically, it should pass the `history_log_id` it was given originally and work backwards from `history_entry_count`. (It should really fetch history in batches rather than one-at-a-time, but that is something we can improve upon in subsequent PRs.) The motivation behind this crazy scheme is that it is designed to defend against: * The `history.jsonl` being truncated during the session such that the index into the history is no longer consistent with what had been read up to that point. We do not yet have logic to enforce a `max_bytes` for `history.jsonl`, but once we do, we will aspire to implement it in a way that should result in a new inode for the file on most systems. * New items from concurrent Codex CLI sessions amending to the history. Because, in absence of truncation, `history.jsonl` is an append-only log, so long as the client reads backwards from `history_entry_count`, it should always get a consistent view of history. (That said, it will not be able to read _new_ commands from concurrent sessions, but perhaps we will introduce a `/` command to reload latest history or something down the road.) Admittedly, my testing of this feature thus far has been fairly light. I expect we will find bugs and introduce enhancements/fixes going forward.
2025-05-15 16:26:23 -07:00
/// The effort the model is putting into reasoning about the user's request.
#[serde(skip_serializing_if = "Option::is_none")]
pub reasoning_effort: Option<ReasoningEffortConfig>,
feat: record messages from user in ~/.codex/history.jsonl (#939) This is a large change to support a "history" feature like you would expect in a shell like Bash. History events are recorded in `$CODEX_HOME/history.jsonl`. Because it is a JSONL file, it is straightforward to append new entries (as opposed to the TypeScript file that uses `$CODEX_HOME/history.json`, so to be valid JSON, each new entry entails rewriting the entire file). Because it is possible for there to be multiple instances of Codex CLI writing to `history.jsonl` at once, we use advisory file locking when working with `history.jsonl` in `codex-rs/core/src/message_history.rs`. Because we believe history is a sufficiently useful feature, we enable it by default. Though to provide some safety, we set the file permissions of `history.jsonl` to be `o600` so that other users on the system cannot read the user's history. We do not yet support a default list of `SENSITIVE_PATTERNS` as the TypeScript CLI does: https://github.com/openai/codex/blob/3fdf9df1335ac9501e3fb0e61715359145711e8b/codex-cli/src/utils/storage/command-history.ts#L10-L17 We are going to take a more conservative approach to this list in the Rust CLI. For example, while `/\b[A-Za-z0-9-_]{20,}\b/` might exclude sensitive information like API tokens, it would also exclude valuable information such as references to Git commits. As noted in the updated documentation, users can opt-out of history by adding the following to `config.toml`: ```toml [history] persistence = "none" ``` Because `history.jsonl` could, in theory, be quite large, we take a[n arguably overly pedantic] approach in reading history entries into memory. Specifically, we start by telling the client the current number of entries in the history file (`history_entry_count`) as well as the inode (`history_log_id`) of `history.jsonl` (see the new fields on `SessionConfiguredEvent`). The client is responsible for keeping new entries in memory to create a "local history," but if the user hits up enough times to go "past" the end of local history, then the client should use the new `GetHistoryEntryRequest` in the protocol to fetch older entries. Specifically, it should pass the `history_log_id` it was given originally and work backwards from `history_entry_count`. (It should really fetch history in batches rather than one-at-a-time, but that is something we can improve upon in subsequent PRs.) The motivation behind this crazy scheme is that it is designed to defend against: * The `history.jsonl` being truncated during the session such that the index into the history is no longer consistent with what had been read up to that point. We do not yet have logic to enforce a `max_bytes` for `history.jsonl`, but once we do, we will aspire to implement it in a way that should result in a new inode for the file on most systems. * New items from concurrent Codex CLI sessions amending to the history. Because, in absence of truncation, `history.jsonl` is an append-only log, so long as the client reads backwards from `history_entry_count`, it should always get a consistent view of history. (That said, it will not be able to read _new_ commands from concurrent sessions, but perhaps we will introduce a `/` command to reload latest history or something down the road.) Admittedly, my testing of this feature thus far has been fairly light. I expect we will find bugs and introduce enhancements/fixes going forward.
2025-05-15 16:26:23 -07:00
/// Identifier of the history log file (inode on Unix, 0 otherwise).
pub history_log_id: u64,
/// Current number of entries in the history log.
pub history_entry_count: usize,
Replay EventMsgs from Response Items when resuming a session with history. (#3123) ### Overview This PR introduces the following changes: 1. Adds a unified mechanism to convert ResponseItem into EventMsg. 2. Ensures that when a session is initialized with initial history, a vector of EventMsg is sent along with the session configuration. This allows clients to re-render the UI accordingly. 3. Added integration testing ### Caveats This implementation does not send every EventMsg that was previously dispatched to clients. The excluded events fall into two categories: • “Arguably” rolled-out events Examples include tool calls and apply-patch calls. While these events are conceptually rolled out, we currently only roll out ResponseItems. These events are already being handled elsewhere and transformed into EventMsg before being sent. • Non-rolled-out events Certain events such as TurnDiff, Error, and TokenCount are not rolled out at all. ### Future Directions At present, resuming a session involves maintaining two states: • UI State Clients can replay most of the important UI from the provided EventMsg history. • Model State The model receives the complete session history to reconstruct its internal state. This design provides a solid foundation. If, in the future, more precise UI reconstruction is needed, we have two potential paths: 1. Introduce a third data structure that allows us to derive both ResponseItems and EventMsgs. 2. Clearly divide responsibilities: the core system ensures the integrity of the model state, while clients are responsible for reconstructing the UI.
2025-09-03 21:47:00 -07:00
/// Optional initial messages (as events) for resumed sessions.
/// When present, UIs can use these to seed the history.
#[serde(skip_serializing_if = "Option::is_none")]
pub initial_messages: Option<Vec<EventMsg>>,
pub rollout_path: PathBuf,
}
/// User's decision in response to an ExecApprovalRequest.
OpenTelemetry events (#2103) ### Title ## otel Codex can emit [OpenTelemetry](https://opentelemetry.io/) **log events** that describe each run: outbound API requests, streamed responses, user input, tool-approval decisions, and the result of every tool invocation. Export is **disabled by default** so local runs remain self-contained. Opt in by adding an `[otel]` table and choosing an exporter. ```toml [otel] environment = "staging" # defaults to "dev" exporter = "none" # defaults to "none"; set to otlp-http or otlp-grpc to send events log_user_prompt = false # defaults to false; redact prompt text unless explicitly enabled ``` Codex tags every exported event with `service.name = "codex-cli"`, the CLI version, and an `env` attribute so downstream collectors can distinguish dev/staging/prod traffic. Only telemetry produced inside the `codex_otel` crate—the events listed below—is forwarded to the exporter. ### Event catalog Every event shares a common set of metadata fields: `event.timestamp`, `conversation.id`, `app.version`, `auth_mode` (when available), `user.account_id` (when available), `terminal.type`, `model`, and `slug`. With OTEL enabled Codex emits the following event types (in addition to the metadata above): - `codex.api_request` - `cf_ray` (optional) - `attempt` - `duration_ms` - `http.response.status_code` (optional) - `error.message` (failures) - `codex.sse_event` - `event.kind` - `duration_ms` - `error.message` (failures) - `input_token_count` (completion only) - `output_token_count` (completion only) - `cached_token_count` (completion only, optional) - `reasoning_token_count` (completion only, optional) - `tool_token_count` (completion only) - `codex.user_prompt` - `prompt_length` - `prompt` (redacted unless `log_user_prompt = true`) - `codex.tool_decision` - `tool_name` - `call_id` - `decision` (`approved`, `approved_for_session`, `denied`, or `abort`) - `source` (`config` or `user`) - `codex.tool_result` - `tool_name` - `call_id` - `arguments` - `duration_ms` (execution time for the tool) - `success` (`"true"` or `"false"`) - `output` ### Choosing an exporter Set `otel.exporter` to control where events go: - `none` – leaves instrumentation active but skips exporting. This is the default. - `otlp-http` – posts OTLP log records to an OTLP/HTTP collector. Specify the endpoint, protocol, and headers your collector expects: ```toml [otel] exporter = { otlp-http = { endpoint = "https://otel.example.com/v1/logs", protocol = "binary", headers = { "x-otlp-api-key" = "${OTLP_TOKEN}" } }} ``` - `otlp-grpc` – streams OTLP log records over gRPC. Provide the endpoint and any metadata headers: ```toml [otel] exporter = { otlp-grpc = { endpoint = "https://otel.example.com:4317", headers = { "x-otlp-meta" = "abc123" } }} ``` If the exporter is `none` nothing is written anywhere; otherwise you must run or point to your own collector. All exporters run on a background batch worker that is flushed on shutdown. If you build Codex from source the OTEL crate is still behind an `otel` feature flag; the official prebuilt binaries ship with the feature enabled. When the feature is disabled the telemetry hooks become no-ops so the CLI continues to function without the extra dependencies. --------- Co-authored-by: Anton Panasenko <apanasenko@openai.com>
2025-09-29 19:30:55 +01:00
#[derive(Debug, Default, Clone, Copy, Deserialize, Serialize, PartialEq, Eq, Display, TS)]
#[serde(rename_all = "snake_case")]
pub enum ReviewDecision {
/// User has approved this command and the agent should execute it.
Approved,
/// User has approved this command and wants to automatically approve any
/// future identical instances (`command` and `cwd` match exactly) for the
/// remainder of the session.
ApprovedForSession,
/// User has denied this command and the agent should not execute it, but
/// it should continue the session and try something else.
#[default]
Denied,
/// User has denied this command and the agent should not do anything until
/// the user's next command.
Abort,
}
2025-08-18 13:08:53 -07:00
#[derive(Debug, Clone, Deserialize, Serialize, PartialEq, TS)]
#[serde(rename_all = "snake_case")]
pub enum FileChange {
Add {
content: String,
},
Delete {
content: String,
},
Update {
unified_diff: String,
move_path: Option<PathBuf>,
},
}
#[derive(Debug, Clone, Deserialize, Serialize, TS)]
pub struct Chunk {
/// 1-based line index of the first line in the original file
pub orig_index: u32,
pub deleted_lines: Vec<String>,
pub inserted_lines: Vec<String>,
}
#[derive(Debug, Clone, Deserialize, Serialize, TS)]
pub struct TurnAbortedEvent {
pub reason: TurnAbortReason,
}
2025-08-18 13:08:53 -07:00
#[derive(Debug, Clone, Deserialize, Serialize, PartialEq, TS)]
#[serde(rename_all = "snake_case")]
pub enum TurnAbortReason {
Interrupted,
Replaced,
ReviewEnded,
}
#[cfg(test)]
mod tests {
use super::*;
use anyhow::Result;
use serde_json::json;
use tempfile::NamedTempFile;
/// Serialize Event to verify that its JSON representation has the expected
/// amount of nesting.
#[test]
fn serialize_event() -> Result<()> {
let conversation_id = ConversationId::from_string("67e55044-10b1-426f-9247-bb680e5fe0c8")?;
let rollout_file = NamedTempFile::new()?;
let event = Event {
id: "1234".to_string(),
msg: EventMsg::SessionConfigured(SessionConfiguredEvent {
session_id: conversation_id,
model: "codex-mini-latest".to_string(),
reasoning_effort: Some(ReasoningEffortConfig::default()),
feat: record messages from user in ~/.codex/history.jsonl (#939) This is a large change to support a "history" feature like you would expect in a shell like Bash. History events are recorded in `$CODEX_HOME/history.jsonl`. Because it is a JSONL file, it is straightforward to append new entries (as opposed to the TypeScript file that uses `$CODEX_HOME/history.json`, so to be valid JSON, each new entry entails rewriting the entire file). Because it is possible for there to be multiple instances of Codex CLI writing to `history.jsonl` at once, we use advisory file locking when working with `history.jsonl` in `codex-rs/core/src/message_history.rs`. Because we believe history is a sufficiently useful feature, we enable it by default. Though to provide some safety, we set the file permissions of `history.jsonl` to be `o600` so that other users on the system cannot read the user's history. We do not yet support a default list of `SENSITIVE_PATTERNS` as the TypeScript CLI does: https://github.com/openai/codex/blob/3fdf9df1335ac9501e3fb0e61715359145711e8b/codex-cli/src/utils/storage/command-history.ts#L10-L17 We are going to take a more conservative approach to this list in the Rust CLI. For example, while `/\b[A-Za-z0-9-_]{20,}\b/` might exclude sensitive information like API tokens, it would also exclude valuable information such as references to Git commits. As noted in the updated documentation, users can opt-out of history by adding the following to `config.toml`: ```toml [history] persistence = "none" ``` Because `history.jsonl` could, in theory, be quite large, we take a[n arguably overly pedantic] approach in reading history entries into memory. Specifically, we start by telling the client the current number of entries in the history file (`history_entry_count`) as well as the inode (`history_log_id`) of `history.jsonl` (see the new fields on `SessionConfiguredEvent`). The client is responsible for keeping new entries in memory to create a "local history," but if the user hits up enough times to go "past" the end of local history, then the client should use the new `GetHistoryEntryRequest` in the protocol to fetch older entries. Specifically, it should pass the `history_log_id` it was given originally and work backwards from `history_entry_count`. (It should really fetch history in batches rather than one-at-a-time, but that is something we can improve upon in subsequent PRs.) The motivation behind this crazy scheme is that it is designed to defend against: * The `history.jsonl` being truncated during the session such that the index into the history is no longer consistent with what had been read up to that point. We do not yet have logic to enforce a `max_bytes` for `history.jsonl`, but once we do, we will aspire to implement it in a way that should result in a new inode for the file on most systems. * New items from concurrent Codex CLI sessions amending to the history. Because, in absence of truncation, `history.jsonl` is an append-only log, so long as the client reads backwards from `history_entry_count`, it should always get a consistent view of history. (That said, it will not be able to read _new_ commands from concurrent sessions, but perhaps we will introduce a `/` command to reload latest history or something down the road.) Admittedly, my testing of this feature thus far has been fairly light. I expect we will find bugs and introduce enhancements/fixes going forward.
2025-05-15 16:26:23 -07:00
history_log_id: 0,
history_entry_count: 0,
Replay EventMsgs from Response Items when resuming a session with history. (#3123) ### Overview This PR introduces the following changes: 1. Adds a unified mechanism to convert ResponseItem into EventMsg. 2. Ensures that when a session is initialized with initial history, a vector of EventMsg is sent along with the session configuration. This allows clients to re-render the UI accordingly. 3. Added integration testing ### Caveats This implementation does not send every EventMsg that was previously dispatched to clients. The excluded events fall into two categories: • “Arguably” rolled-out events Examples include tool calls and apply-patch calls. While these events are conceptually rolled out, we currently only roll out ResponseItems. These events are already being handled elsewhere and transformed into EventMsg before being sent. • Non-rolled-out events Certain events such as TurnDiff, Error, and TokenCount are not rolled out at all. ### Future Directions At present, resuming a session involves maintaining two states: • UI State Clients can replay most of the important UI from the provided EventMsg history. • Model State The model receives the complete session history to reconstruct its internal state. This design provides a solid foundation. If, in the future, more precise UI reconstruction is needed, we have two potential paths: 1. Introduce a third data structure that allows us to derive both ResponseItems and EventMsgs. 2. Clearly divide responsibilities: the core system ensures the integrity of the model state, while clients are responsible for reconstructing the UI.
2025-09-03 21:47:00 -07:00
initial_messages: None,
rollout_path: rollout_file.path().to_path_buf(),
}),
};
let expected = json!({
"id": "1234",
"msg": {
"type": "session_configured",
"session_id": "67e55044-10b1-426f-9247-bb680e5fe0c8",
"model": "codex-mini-latest",
"reasoning_effort": "medium",
"history_log_id": 0,
"history_entry_count": 0,
"rollout_path": format!("{}", rollout_file.path().display()),
}
});
assert_eq!(expected, serde_json::to_value(&event)?);
Ok(())
}
#[test]
fn vec_u8_as_base64_serialization_and_deserialization() -> Result<()> {
let event = ExecCommandOutputDeltaEvent {
call_id: "call21".to_string(),
stream: ExecOutputStream::Stdout,
chunk: vec![1, 2, 3, 4, 5],
};
let serialized = serde_json::to_string(&event)?;
assert_eq!(
r#"{"call_id":"call21","stream":"stdout","chunk":"AQIDBAU="}"#,
serialized,
);
let deserialized: ExecCommandOutputDeltaEvent = serde_json::from_str(&serialized)?;
assert_eq!(deserialized, event);
Ok(())
}
}