Currently we collect all all turn items in a vector, then we add it to
the history on success. This result in losing those items on errors
including aborting `ctrl+c`.
This PR:
- Adds the ability for the tool call to handle cancellation
- bubble the turn items up to where we are recording this info
Admittedly, this logic is an ad-hoc logic that doesn't handle a lot of
error edge cases. The right thing to do is recording to the history on
the spot as `items`/`tool calls output` come. However, this isn't
possible because of having different `task_kind` that has different
`conversation_histories`. The `try_run_turn` has no idea what thread are
we using. We cannot also pass an `arc` to the `conversation_histories`
because it's a private element of `state`.
That's said, `abort` is the most common case and we should cover it
until we remove `task kind`
The backend will be returning unix timestamps (seconds since epoch)
instead of RFC 3339 strings. This will make it more ergonomic for
developers to integrate against - no string parsing.
This change ensures that we store the absolute time instead of relative
offsets of when the primary and secondary rate limits will reset.
Previously these got recalculated relative to current time, which leads
to the displayed reset times to change over time, including after doing
a codex resume.
For previously changed sessions, this will cause the reset times to not
show due to this being a breaking change:
<img width="524" height="55" alt="Screenshot 2025-10-17 at 5 14 18 PM"
src="https://github.com/user-attachments/assets/53ebd43e-da25-4fef-9c47-94a529d40265"
/>
Fixes https://github.com/openai/codex/issues/4761
In the past, we were treating `input exceeded context window` as a
streaming error and retrying on it. Retrying on it has no point because
it won't change the behavior. In this PR, we surface the error to the
client without retry and also send a token count event to indicate that
the context window is full.
<img width="650" height="125" alt="image"
src="https://github.com/user-attachments/assets/c26b1213-4c27-4bfc-90f4-51a270a3efd5"
/>
# Tool System Refactor
- Centralizes tool definitions and execution in `core/src/tools/*`:
specs (`spec.rs`), handlers (`handlers/*`), router (`router.rs`),
registry/dispatch (`registry.rs`), and shared context (`context.rs`).
One registry now builds the model-visible tool list and binds handlers.
- Router converts model responses to tool calls; Registry dispatches
with consistent telemetry via `codex-rs/otel` and unified error
handling. Function, Local Shell, MCP, and experimental `unified_exec`
all flow through this path; legacy shell aliases still work.
- Rationale: reduce per‑tool boilerplate, keep spec/handler in sync, and
make adding tools predictable and testable.
Example: `read_file`
- Spec: `core/src/tools/spec.rs` (see `create_read_file_tool`,
registered by `build_specs`).
- Handler: `core/src/tools/handlers/read_file.rs` (absolute `file_path`,
1‑indexed `offset`, `limit`, `L#: ` prefixes, safe truncation).
- E2E test: `core/tests/suite/read_file.rs` validates the tool returns
the requested lines.
## Next steps:
- Decompose `handle_container_exec_with_params`
- Add parallel tool calls
We continue the separation between `codex app-server` and `codex
mcp-server`.
In particular, we introduce a new crate, `codex-app-server-protocol`,
and migrate `codex-rs/protocol/src/mcp_protocol.rs` into it, renaming it
`codex-rs/app-server-protocol/src/protocol.rs`.
Because `ConversationId` was defined in `mcp_protocol.rs`, we move it
into its own file, `codex-rs/protocol/src/conversation_id.rs`, and
because it is referenced in a ton of places, we have to touch a lot of
files as part of this PR.
We also decide to get away from proper JSON-RPC 2.0 semantics, so we
also introduce `codex-rs/app-server-protocol/src/jsonrpc_lite.rs`, which
is basically the same `JSONRPCMessage` type defined in `mcp-types`
except with all of the `"jsonrpc": "2.0"` removed.
Getting rid of `"jsonrpc": "2.0"` makes our serialization logic
considerably simpler, as we can lean heavier on serde to serialize
directly into the wire format that we use now.
We're trying to migrate from `session_id: Uuid` to `conversation_id:
ConversationId`. Not only does this give us more type safety but it
unifies our terminology across Codex and with the implementation of
session resuming, a conversation (which can span multiple sessions) is
more appropriate.
I started this impl on https://github.com/openai/codex/pull/3219 as part
of getting resume working in the extension but it's big enough that it
should be broken out.
- added `uninlined_format_args` to `[workspace.lints.clippy]` in the
`Cargo.toml` for the workspace
- ran `cargo clippy --tests --fix`
- ran `just fmt`
This PR improves the error message presented to the user when logged in
with ChatGPT and a rate-limit error occurs. In particular, it provides
the user with information about when the rate limit will be reset. It
removes older code that attempted to do the same but relied on parsing
of error messages that are not generated by the ChatGPT endpoint. The
new code uses newly-added error fields.
This PR does two things because after I got deep into the first one I
started pulling on the thread to the second:
- Makes `ConversationManager` the place where all in-memory
conversations are created and stored. Previously, `MessageProcessor` in
the `codex-mcp-server` crate was doing this via its `session_map`, but
this is something that should be done in `codex-core`.
- It unwinds the `ctrl_c: tokio::sync::Notify` that was threaded
throughout our code. I think this made sense at one time, but now that
we handle Ctrl-C within the TUI and have a proper `Op::Interrupt` event,
I don't think this was quite right, so I removed it. For `codex exec`
and `codex proto`, we now use `tokio::signal::ctrl_c()` directly, but we
no longer make `Notify` a field of `Codex` or `CodexConversation`.
Changes of note:
- Adds the files `conversation_manager.rs` and `codex_conversation.rs`
to `codex-core`.
- `Codex` and `CodexSpawnOk` are no longer exported from `codex-core`:
other crates must use `CodexConversation` instead (which is created via
`ConversationManager`).
- `core/src/codex_wrapper.rs` has been deleted in favor of
`ConversationManager`.
- `ConversationManager::new_conversation()` returns `NewConversation`,
which is in line with the `new_conversation` tool we want to add to the
MCP server. Note `NewConversation` includes `SessionConfiguredEvent`, so
we eliminate checks in cases like `codex-rs/core/tests/client.rs` to
verify `SessionConfiguredEvent` is the first event because that is now
internal to `ConversationManager`.
- Quite a bit of code was deleted from
`codex-rs/mcp-server/src/message_processor.rs` since it no longer has to
manage multiple conversations itself: it goes through
`ConversationManager` instead.
- `core/tests/live_agent.rs` has been deleted because I had to update a
bunch of tests and all the tests in here were ignored, and I don't think
anyone ever ran them, so this was just technical debt, at this point.
- Removed `notify_on_sigint()` from `util.rs` (and in a follow-up, I
hope to refactor the blandly-named `util.rs` into more descriptive
files).
- In general, I started replacing local variables named `codex` as
`conversation`, where appropriate, though admittedly I didn't do it
through all the integration tests because that would have added a lot of
noise to this PR.
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/2240).
* #2264
* #2263
* __->__ #2240
## Summary
- fix typo in usage limit banner text
- update error message tests
## Testing
- `just fmt`
- `RUSTC_BOOTSTRAP=1 just fix` *(fails: `let` expressions in this
position are unstable)*
- `RUSTC_BOOTSTRAP=1 cargo test --all-features` *(fails: `let`
expressions in this position are unstable)*
------
https://chatgpt.com/codex/tasks/task_i_689610fc1fe4832081bdd1118779b60b
Historically, we spawned the Seatbelt and Landlock sandboxes in
substantially different ways:
For **Seatbelt**, we would run `/usr/bin/sandbox-exec` with our policy
specified as an arg followed by the original command:
d1de7bb383/codex-rs/core/src/exec.rs (L147-L219)
For **Landlock/Seccomp**, we would do
`tokio::runtime::Builder::new_current_thread()`, _invoke
Landlock/Seccomp APIs to modify the permissions of that new thread_, and
then spawn the command:
d1de7bb383/codex-rs/core/src/exec_linux.rs (L28-L49)
While it is neat that Landlock/Seccomp supports applying a policy to
only one thread without having to apply it to the entire process, it
requires us to maintain two different codepaths and is a bit harder to
reason about. The tipping point was
https://github.com/openai/codex/pull/1061, in which we had to start
building up the `env` in an unexpected way for the existing
Landlock/Seccomp approach to continue to work.
This PR overhauls things so that we do similar things for Mac and Linux.
It turned out that we were already building our own "helper binary"
comparable to Mac's `sandbox-exec` as part of the `cli` crate:
d1de7bb383/codex-rs/cli/Cargo.toml (L10-L12)
We originally created this to build a small binary to include with the
Node.js version of the Codex CLI to provide support for Linux
sandboxing.
Though the sticky bit is that, at this point, we still want to deploy
the Rust version of Codex as a single, standalone binary rather than a
CLI and a supporting sandboxing binary. To satisfy this goal, we use
"the arg0 trick," in which we:
* use `std::env::current_exe()` to get the path to the CLI that is
currently running
* use the CLI as the `program` for the `Command`
* set `"codex-linux-sandbox"` as arg0 for the `Command`
A CLI that supports sandboxing should check arg0 at the start of the
program. If it is `"codex-linux-sandbox"`, it must invoke
`codex_linux_sandbox::run_main()`, which runs the CLI as if it were
`codex-linux-sandbox`. When acting as `codex-linux-sandbox`, we make the
appropriate Landlock/Seccomp API calls and then use `execvp(3)` to spawn
the original command, so do _replace_ the process rather than spawn a
subprocess. Incidentally, we do this before starting the Tokio runtime,
so the process should only have one thread when `execvp(3)` is called.
Because the `core` crate that needs to spawn the Linux sandboxing is not
a CLI in its own right, this means that every CLI that includes `core`
and relies on this behavior has to (1) implement it and (2) provide the
path to the sandboxing executable. While the path is almost always
`std::env::current_exe()`, we needed to make this configurable for
integration tests, so `Config` now has a `codex_linux_sandbox_exe:
Option<PathBuf>` property to facilitate threading this through,
introduced in https://github.com/openai/codex/pull/1089.
This common pattern is now captured in
`codex_linux_sandbox::run_with_sandbox()` and all of the `main.rs`
functions that should use it have been updated as part of this PR.
The `codex-linux-sandbox` crate added to the Cargo workspace as part of
this PR now has the bulk of the Landlock/Seccomp logic, which makes
`core` a bit simpler. Indeed, `core/src/exec_linux.rs` and
`core/src/landlock.rs` were removed/ported as part of this PR. I also
moved the unit tests for this code into an integration test,
`linux-sandbox/tests/landlock.rs`, in which I use
`env!("CARGO_BIN_EXE_codex-linux-sandbox")` as the value for
`codex_linux_sandbox_exe` since `std::env::current_exe()` is not
appropriate in that case.
This is a substantial PR to add support for the chat completions API,
which in turn makes it possible to use non-OpenAI model providers (just
like in the TypeScript CLI):
* It moves a number of structs from `client.rs` to `client_common.rs` so
they can be shared.
* It introduces support for the chat completions API in
`chat_completions.rs`.
* It updates `ModelProviderInfo` so that `env_key` is `Option<String>`
instead of `String` (for e.g., ollama) and adds a `wire_api` field
* It updates `client.rs` to choose between `stream_responses()` and
`stream_chat_completions()` based on the `wire_api` for the
`ModelProviderInfo`
* It updates the `exec` and TUI CLIs to no longer fail if the
`OPENAI_API_KEY` environment variable is not set
* It updates the TUI so that `EventMsg::Error` is displayed more
prominently when it occurs, particularly now that it is important to
alert users to the `CodexErr::EnvVar` variant.
* `CodexErr::EnvVar` was updated to include an optional `instructions`
field so we can preserve the behavior where we direct users to
https://platform.openai.com if `OPENAI_API_KEY` is not set.
* Cleaned up the "welcome message" in the TUI to ensure the model
provider is displayed.
* Updated the docs in `codex-rs/README.md`.
To exercise the chat completions API from OpenAI models, I added the
following to my `config.toml`:
```toml
model = "gpt-4o"
model_provider = "openai-chat-completions"
[model_providers.openai-chat-completions]
name = "OpenAI using Chat Completions"
base_url = "https://api.openai.com/v1"
env_key = "OPENAI_API_KEY"
wire_api = "chat"
```
Though to test a non-OpenAI provider, I installed ollama with mistral
locally on my Mac because ChatGPT said that would be a good match for my
hardware:
```shell
brew install ollama
ollama serve
ollama pull mistral
```
Then I added the following to my `~/.codex/config.toml`:
```toml
model = "mistral"
model_provider = "ollama"
```
Note this code could certainly use more test coverage, but I want to get
this in so folks can start playing with it.
For reference, I believe https://github.com/openai/codex/pull/247 was
roughly the comparable PR on the TypeScript side.
As stated in `codex-rs/README.md`:
Today, Codex CLI is written in TypeScript and requires Node.js 22+ to
run it. For a number of users, this runtime requirement inhibits
adoption: they would be better served by a standalone executable. As
maintainers, we want Codex to run efficiently in a wide range of
environments with minimal overhead. We also want to take advantage of
operating system-specific APIs to provide better sandboxing, where
possible.
To that end, we are moving forward with a Rust implementation of Codex
CLI contained in this folder, which has the following benefits:
- The CLI compiles to small, standalone, platform-specific binaries.
- Can make direct, native calls to
[seccomp](https://man7.org/linux/man-pages/man2/seccomp.2.html) and
[landlock](https://man7.org/linux/man-pages/man7/landlock.7.html) in
order to support sandboxing on Linux.
- No runtime garbage collection, resulting in lower memory consumption
and better, more predictable performance.
Currently, the Rust implementation is materially behind the TypeScript
implementation in functionality, so continue to use the TypeScript
implmentation for the time being. We will publish native executables via
GitHub Releases as soon as we feel the Rust version is usable.