## Summary
This PR is an alternative approach to #4711, but instead of changing our
storage, parses out shell calls in the client and reserializes them on
the fly before we send them out as part of the request.
What this changes:
1. Adds additional serialization logic when the
ApplyPatchToolType::Freeform is in use.
2. Adds a --custom-apply-patch flag to enable this setting on a
session-by-session basis.
This change is delicate, but is not meant to be permanent. It is meant
to be the first step in a migration:
1. (This PR) Add in-flight serialization with config
2. Update model_family default
3. Update serialization logic to store turn outputs in a structured
format, with logic to serialize based on model_family setting.
4. Remove this rewrite in-flight logic.
## Test Plan
- [x] Additional unit tests added
- [x] Integration tests added
- [x] Tested locally
# Tool System Refactor
- Centralizes tool definitions and execution in `core/src/tools/*`:
specs (`spec.rs`), handlers (`handlers/*`), router (`router.rs`),
registry/dispatch (`registry.rs`), and shared context (`context.rs`).
One registry now builds the model-visible tool list and binds handlers.
- Router converts model responses to tool calls; Registry dispatches
with consistent telemetry via `codex-rs/otel` and unified error
handling. Function, Local Shell, MCP, and experimental `unified_exec`
all flow through this path; legacy shell aliases still work.
- Rationale: reduce per‑tool boilerplate, keep spec/handler in sync, and
make adding tools predictable and testable.
Example: `read_file`
- Spec: `core/src/tools/spec.rs` (see `create_read_file_tool`,
registered by `build_specs`).
- Handler: `core/src/tools/handlers/read_file.rs` (absolute `file_path`,
1‑indexed `offset`, `limit`, `L#: ` prefixes, safe truncation).
- E2E test: `core/tests/suite/read_file.rs` validates the tool returns
the requested lines.
## Next steps:
- Decompose `handle_container_exec_with_params`
- Add parallel tool calls
We currently get information about rate limits in the response headers.
We want to forward them to the clients to have better transparency.
UI/UX plans have been discussed and this information is needed.
## Summary
Resolves a merge conflict between #3597 and #3560, and adds tests to
double check our apply_patch configuration.
## Testing
- [x] Added unit tests
---------
Co-authored-by: dedrisian-oai <dedrisian@openai.com>
## 📝 Review Mode -- Core
This PR introduces the Core implementation for Review mode:
- New op `Op::Review { prompt: String }:` spawns a child review task
with isolated context, a review‑specific system prompt, and a
`Config.review_model`.
- `EnteredReviewMode`: emitted when the child review session starts.
Every event from this point onwards reflects the review session.
- `ExitedReviewMode(Option<ReviewOutputEvent>)`: emitted when the review
finishes or is interrupted, with optional structured findings:
```json
{
"findings": [
{
"title": "<≤ 80 chars, imperative>",
"body": "<valid Markdown explaining *why* this is a problem; cite files/lines/functions>",
"confidence_score": <float 0.0-1.0>,
"priority": <int 0-3>,
"code_location": {
"absolute_file_path": "<file path>",
"line_range": {"start": <int>, "end": <int>}
}
}
],
"overall_correctness": "patch is correct" | "patch is incorrect",
"overall_explanation": "<1-3 sentence explanation justifying the overall_correctness verdict>",
"overall_confidence_score": <float 0.0-1.0>
}
```
## Questions
### Why separate out its own message history?
We want the review thread to match the training of our review models as
much as possible -- that means using a custom prompt, removing user
instructions, and starting a clean chat history.
We also want to make sure the review thread doesn't leak into the parent
thread.
### Why do this as a mode, vs. sub-agents?
1. We want review to be a synchronous task, so it's fine for now to do a
bespoke implementation.
2. We're still unclear about the final structure for sub-agents. We'd
prefer to land this quickly and then refactor into sub-agents without
rushing that implementation.
When item ids are sent to Responses API it will load them from the
database ignoring the provided values. This adds extra latency.
Not having the mode to store requests also allows us to simplify the
code.
## Breaking change
The `disable_response_storage` configuration option is removed.
This PR does the following:
- divides user msgs into 3 categories: plain, user instructions, and
environment context
- Centralizes adding user instructions and environment context to a
degree
- Improve the integration testing
Building on top of #3123
Specifically this
[comment](https://github.com/openai/codex/pull/3123#discussion_r2319885089).
We need to send the user message while ignoring the User Instructions
and Environment Context we attach.
## Summary
It appears that #2108 hit a merge conflict with #2355 - I failed to
notice the path difference when re-reviewing the former. This PR
rectifies that, and consolidates it into the protocol package, in line
with our philosophy of specifying types in one place.
## Testing
- [x] Adds config test for model_verbosity
The gpt-oss models require reasoning with subsequent Chat Completions
requests because otherwise the model forgets why the tools were called.
This change fixes that and also adds some additional missing
documentation around how to handle context windows in Ollama and how to
show the CoT if you desire to.
- Introduce websearch end to complement the begin
- Moves the logic of adding the sebsearch tool to
create_tools_json_for_responses_api
- Making it the client responsibility to toggle the tool on or off
- Other misc in #2371 post commit feedback
- Show the query:
<img width="1392" height="151" alt="image"
src="https://github.com/user-attachments/assets/8457f1a6-f851-44cf-bcca-0d4fe460ce89"
/>
Adds web_search tool, enabling the model to use Responses API web_search
tool.
- Disabled by default, enabled by --search flag
- When --search is passed, exposes web_search_request function tool to
the model, which triggers user approval. When approved, the model can
use the web_search tool for the remainder of the turn
<img width="1033" height="294" alt="image"
src="https://github.com/user-attachments/assets/62ac6563-b946-465c-ba5d-9325af28b28f"
/>
---------
Co-authored-by: easong-openai <easong@openai.com>
## Summary
GPT-5 introduced the concept of [custom
tools](https://platform.openai.com/docs/guides/function-calling#custom-tools),
which allow the model to send a raw string result back, simplifying
json-escape issues. We are migrating gpt-5 to use this by default.
However, gpt-oss models do not support custom tools, only normal
functions. So we keep both tool definitions, and provide whichever one
the model family supports.
## Testing
- [x] Tested locally with various models
- [x] Unit tests pass
**Summary**
- Adds `model_verbosity` config (values: low, medium, high).
- Sends `text.verbosity` only for GPT‑5 family models via the Responses
API.
- Updates docs and adds serialization tests.
**Motivation**
- GPT‑5 introduces a verbosity control to steer output length/detail
without pro
mpt surgery.
- Exposing it as a config knob keeps prompts stable and makes behavior
explicit
and repeatable.
**Changes**
- Config:
- Added `Verbosity` enum (low|medium|high).
- Added optional `model_verbosity` to `ConfigToml`, `Config`, and
`ConfigProfi
le`.
- Request wiring:
- Extended `ResponsesApiRequest` with optional `text` object.
- Populates `text.verbosity` only when model family is `gpt-5`; omitted
otherw
ise.
- Tests:
- Verifies `text.verbosity` serializes when set and is omitted when not
set.
- Docs:
- Added “GPT‑5 Verbosity” section in `codex-rs/README.md`.
- Added `model_verbosity` section to `codex-rs/config.md`.
**Usage**
- In `~/.codex/config.toml`:
- `model = "gpt-5"`
- `model_verbosity = "low"` (or `"medium"` default, `"high"`)
- CLI override example:
- `codex -c model="gpt-5" -c model_verbosity="high"`
**API Impact**
- Requests to GPT‑5 via Responses API include: `text: { verbosity:
"low|medium|h
igh" }` when configured.
- For legacy models or Chat Completions providers, `text` is omitted.
**Backward Compatibility**
- Default behavior unchanged when `model_verbosity` is not set (server
default “
medium”).
**Testing**
- Added unit tests for serialization/omission of `text.verbosity`.
- Ran `cargo fmt` and `cargo test --all-features` (all green).
**Docs**
- `README.md`: new “GPT‑5 Verbosity” note under Config with example.
- `config.md`: new `model_verbosity` section.
**Out of Scope**
- No changes to temperature/top_p or other GPT‑5 parameters.
- No changes to Chat Completions wiring.
**Risks / Notes**
- If OpenAI changes the wire shape for verbosity, we may need to update
`Respons
esApiRequest`.
- Behavior gated to `gpt-5` model family to avoid unexpected effects
elsewhere.
**Checklist**
- [x] Code gated to GPT‑5 family only
- [x] Docs updated (`README.md`, `config.md`)
- [x] Tests added and passing
- [x] Formatting applied
Release note: Add `model_verbosity` config to control GPT‑5 output verbosity via the Responses API (low|medium|high).
## Summary
We've experienced a bit of drift in system prompting for `apply_patch`:
- As pointed out in #2030 , our prettier formatting started altering
prompt.md in a few ways
- We introduced a separate markdown file for apply_patch instructions in
#993, but currently duplicate them in the prompt.md file
- We added a first-class apply_patch tool in #2303, which has yet
another definition
This PR starts to consolidate our logic in a few ways:
- We now only use
`apply_patch_tool_instructions.md](https://github.com/openai/codex/compare/dh--apply-patch-tool-definition?expand=1#diff-d4fffee5f85cb1975d3f66143a379e6c329de40c83ed5bf03ffd3829df985bea)
for system instructions
- We no longer include apply_patch system instructions if the tool is
specified
I'm leaving the definition in openai_tools.rs as duplicated text for now
because we're going to be iterated on the first-class tool soon.
## Testing
- [x] Added integration tests to verify prompt stability
- [x] Tested locally with several different models (gpt-5, gpt-oss,
o4-mini)
This pull request resolves#2296; I've confirmed if it works by:
1. Add settings to ~/.codex/config.toml:
```toml
model_reasoning_effort = "minimal"
```
2. Run the CLI:
```
cd codex-rs
cargo build && RUST_LOG=trace cargo run --bin codex
/status
tail -f ~/.codex/log/codex-tui.log
```
Co-authored-by: pakrym-oai <pakrym@openai.com>
This PR:
* Added the clippy.toml to configure allowable expect / unwrap usage in
tests
* Removed as many expect/allow lines as possible from tests
* moved a bunch of allows to expects where possible
Note: in integration tests, non `#[test]` helper functions are not
covered by this so we had to leave a few lingering `expect(expect_used`
checks around
## Summary
Currently, we use request-time logic to determine the user_instructions
and environment_context messages. This means that neither of these
values can change over time as conversations go on. We want to add in
additional details here, so we're migrating these to save these messages
to the rollout file instead. This is simpler for the client, and allows
us to append additional environment_context messages to each turn if we
want
## Testing
- [x] Integration test coverage
- [x] Tested locally with a few turns, confirmed model could reference
environment context and cached token metrics were reasonably high
Wait for newlines, then render markdown on a line by line basis. Word wrap it for the current terminal size and then spit it out line by line into the UI. Also adds tests and fixes some UI regressions.
## Summary
Forgot to remove this in #1869 last night! Too much of a performance hit
on the main thread. We can bring it back via an async thread on startup.
## Summary
Includes a new user message in the api payload which provides useful
environment context for the model, so it knows about things like the
current working directory and the sandbox.
## Testing
Updated unit tests
## Summary
In an effort to make tools easier to work with and more configurable,
I'm introducing `ToolConfig` and updating `Prompt` to take in a general
list of Tools. I think this is simpler and better for a few reasons:
- We can easily assemble tools from various sources (our own harness,
mcp servers, etc.) and we can consolidate the logic for constructing the
logic in one place that is separate from serialization.
- client.rs no longer needs arbitrary config values, it just takes in a
list of tools to serialize
A hefty portion of the PR is now updating our conversion of
`mcp_types::Tool` to `OpenAITool`, but considering that @bolinfest
accurately called this out as a TODO long ago, I think it's time we
tackled it.
## Testing
- [x] Experimented locally, no changes, as expected
- [x] Added additional unit tests
- [x] Responded to rust-review
https://github.com/openai/codex/pull/1835 has some messed up history.
This adds support for streaming chat completions, which is useful for ollama. We should probably take a very skeptical eye to the code introduced in this PR.
---------
Co-authored-by: Ahmed Ibrahim <aibrahim@openai.com>
To date, we have a number of hardcoded OpenAI model slug checks spread
throughout the codebase, which makes it hard to audit the various
special cases for each model. To mitigate this issue, this PR introduces
the idea of a `ModelFamily` that has fields to represent the existing
special cases, such as `supports_reasoning_summaries` and
`uses_local_shell_tool`.
There is a `find_family_for_model()` function that maps the raw model
slug to a `ModelFamily`. This function hardcodes all the knowledge about
the special attributes for each model. This PR then replaces the
hardcoded model name checks with checks against a `ModelFamily`.
Note `ModelFamily` is now available as `Config::model_family`. We should
ultimately remove `Config::model` in favor of
`Config::model_family::slug`.
## Summary
Our recent change in #1737 can sometimes lead to the model confusing
AGENTS.md context as part of the message. But a little prompting and
formatting can help fix this!
## Testing
- Ran locally with a few different prompts to verify the model
behaves well.
- Updated unit tests
Always store the entire conversation history.
Request encrypted COT when not storing Responses.
Send entire input context instead of sending previous_response_id
- Added support for message and reasoning deltas
- Skipped adding the support in the cli and tui for later
- Commented a failing test (wrong merge) that needs fix in a separate
PR.
Side note: I think we need to disable merge when the CI don't pass.
As noted in the updated docs, this makes it so that you can set:
```toml
model_supports_reasoning_summaries = true
```
as a way of overriding the existing heuristic for when to set the
`reasoning` field on a sampling request:
341c091c5b/codex-rs/core/src/client_common.rs (L152-L166)
When using the OpenAI Responses API, we now record the `usage` field for
a `"response.completed"` event, which includes metrics about the number
of tokens consumed. We also introduce `openai_model_info.rs`, which
includes current data about the most common OpenAI models available via
the API (specifically `context_window` and `max_output_tokens`). If
Codex does not recognize the model, you can set `model_context_window`
and `model_max_output_tokens` explicitly in `config.toml`.
When then introduce a new event type to `protocol.rs`, `TokenCount`,
which includes the `TokenUsage` for the most recent turn.
Finally, we update the TUI to record the running sum of tokens used so
the percentage of available context window remaining can be reported via
the placeholder text for the composer:

We could certainly get much fancier with this (such as reporting the
estimated cost of the conversation), but for now, we are just trying to
achieve feature parity with the TypeScript CLI.
Though arguably this improves upon the TypeScript CLI, as the TypeScript
CLI uses heuristics to estimate the number of tokens used rather than
using the `usage` information directly:
296996d74e/codex-cli/src/utils/approximate-tokens-used.ts (L3-L16)
Fixes https://github.com/openai/codex/issues/1242
This fixes a longstanding error in the Rust CLI where `codex.rs`
contained an errant `is_first_turn` check that would exclude the user
instructions for subsequent "turns" of a conversation when using the
responses API (i.e., when `previous_response_id` existed).
While here, renames `Prompt.instructions` to `Prompt.user_instructions`
since we now have quite a few levels of instructions floating around.
Also removed an unnecessary use of `clone()` in
`Prompt.get_full_instructions()`.
As explained in detail in the doc comment for `ParseMode::Lenient`, we
have observed that GPT-4.1 does not always generate a valid invocation
of `apply_patch`. Fortunately, the error is predictable, so we introduce
some new logic to the `codex-apply-patch` crate to recover from this
error.
Because we would like to avoid this becoming a de facto standard (as it
would be incompatible if `apply_patch` were provided as an actual
executable, unless we also introduced the lenient behavior in the
executable, as well), we require passing `ParseMode::Lenient` to
`parse_patch_text()` to make it clear that the caller is opting into
supporting this special case.
Note the analogous change to the TypeScript CLI was
https://github.com/openai/codex/pull/930. In addition to changing the
accepted input to `apply_patch`, it also introduced additional
instructions for the model, which we include in this PR.
Note that `apply-patch` does not depend on either `regex` or
`regex-lite`, so some of the checks are slightly more verbose to avoid
introducing this dependency.
That said, this PR does not leverage the existing
`extract_heredoc_body_from_apply_patch_command()`, which depends on
`tree-sitter` and `tree-sitter-bash`:
5a5aa89914/codex-rs/apply-patch/src/lib.rs (L191-L246)
though perhaps it should.
Previous to this PR, we always set `reasoning` when making a request
using the Responses API:
d7245cbbc9/codex-rs/core/src/client.rs (L108-L111)
Though if you tried to use the Rust CLI with `--model gpt-4.1`, this
would fail with:
```shell
"Unsupported parameter: 'reasoning.effort' is not supported with this model."
```
We take a cue from the TypeScript CLI, which does a check on the model
name:
d7245cbbc9/codex-cli/src/utils/agent/agent-loop.ts (L786-L789)
This PR does a similar check, though also adds support for the following
config options:
```
model_reasoning_effort = "low" | "medium" | "high" | "none"
model_reasoning_summary = "auto" | "concise" | "detailed" | "none"
```
This way, if you have a model whose name happens to start with `"o"` (or
`"codex"`?), you can set these to `"none"` to explicitly disable
reasoning, if necessary. (That said, it seems unlikely anyone would use
the Responses API with non-OpenAI models, but we provide an escape
hatch, anyway.)
This PR also updates both the TUI and `codex exec` to show `reasoning
effort` and `reasoning summaries` in the header.
I had seen issues where `codex-rs` would not always write files without
me pressuring it to do so, and between that and the report of
https://github.com/openai/codex/issues/900, I decided to look into this
further. I found two serious issues with agent instructions:
(1) We were only sending agent instructions on the first turn, but
looking at the TypeScript code, we should be sending them on every turn.
(2) There was a serious issue where the agent instructions were
frequently lost:
* The TypeScript CLI appears to keep writing `~/.codex/instructions.md`:
55142e3e6c/codex-cli/src/utils/config.ts (L586)
* If `instructions.md` is present, the Rust CLI uses the contents of it
INSTEAD OF the default prompt, even if `instructions.md` is empty:
55142e3e6c/codex-rs/core/src/config.rs (L202-L203)
The combination of these two things means that I have been using
`codex-rs` without these key instructions:
https://github.com/openai/codex/blob/main/codex-rs/core/prompt.md
Looking at the TypeScript code, it appears we should be concatenating
these three items every time (if they exist):
* `prompt.md`
* `~/.codex/instructions.md`
* nearest `AGENTS.md`
This PR fixes things so that:
* `Config.instructions` is `None` if `instructions.md` is empty
* `Payload.instructions` is now `&'a str` instead of `Option<&'a
String>` because we should always have _something_ to send
* `Prompt` now has a `get_full_instructions()` helper that returns a
`Cow<str>` that will always include the agent instructions first.
As shown in the screenshot, we now include reasoning messages from the
model in the TUI under the heading "codex reasoning":

To ensure these are visible by default when using `o4-mini`, this also
changes the default value for `summary` (formerly `generate_summary`,
which is deprecated in favor of `summary` according to the docs) from
unset to `"auto"`.
This is a substantial PR to add support for the chat completions API,
which in turn makes it possible to use non-OpenAI model providers (just
like in the TypeScript CLI):
* It moves a number of structs from `client.rs` to `client_common.rs` so
they can be shared.
* It introduces support for the chat completions API in
`chat_completions.rs`.
* It updates `ModelProviderInfo` so that `env_key` is `Option<String>`
instead of `String` (for e.g., ollama) and adds a `wire_api` field
* It updates `client.rs` to choose between `stream_responses()` and
`stream_chat_completions()` based on the `wire_api` for the
`ModelProviderInfo`
* It updates the `exec` and TUI CLIs to no longer fail if the
`OPENAI_API_KEY` environment variable is not set
* It updates the TUI so that `EventMsg::Error` is displayed more
prominently when it occurs, particularly now that it is important to
alert users to the `CodexErr::EnvVar` variant.
* `CodexErr::EnvVar` was updated to include an optional `instructions`
field so we can preserve the behavior where we direct users to
https://platform.openai.com if `OPENAI_API_KEY` is not set.
* Cleaned up the "welcome message" in the TUI to ensure the model
provider is displayed.
* Updated the docs in `codex-rs/README.md`.
To exercise the chat completions API from OpenAI models, I added the
following to my `config.toml`:
```toml
model = "gpt-4o"
model_provider = "openai-chat-completions"
[model_providers.openai-chat-completions]
name = "OpenAI using Chat Completions"
base_url = "https://api.openai.com/v1"
env_key = "OPENAI_API_KEY"
wire_api = "chat"
```
Though to test a non-OpenAI provider, I installed ollama with mistral
locally on my Mac because ChatGPT said that would be a good match for my
hardware:
```shell
brew install ollama
ollama serve
ollama pull mistral
```
Then I added the following to my `~/.codex/config.toml`:
```toml
model = "mistral"
model_provider = "ollama"
```
Note this code could certainly use more test coverage, but I want to get
this in so folks can start playing with it.
For reference, I believe https://github.com/openai/codex/pull/247 was
roughly the comparable PR on the TypeScript side.