This PR adds support for a model-based summary and risk assessment for
commands that violate the sandbox policy and require user approval. This
aids the user in evaluating whether the command should be approved.
The feature works by taking a failed command, passing it back to the model,
and asking it to summarize the command, assign a risk level (low, medium,
high), and assign a risk category (e.g. "data deletion" or "data
exfiltration"). It uses a new conversation thread so the context in the
existing thread doesn't influence the answer. If the call to the model
fails or takes longer than 5 seconds, it falls back to the current
behavior.
For now, this is an experimental feature and is gated by a config key
`experimental_sandbox_command_assessment`.
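As a rough illustration of the flow (not the actual implementation), the assessment can be raced against a timeout and discarded on any failure. The `assess` closure, the `CommandAssessment` shape, and the error type below are assumptions for the sketch:

```rust
use std::future::Future;
use std::time::Duration;

#[derive(Debug)]
pub enum RiskLevel {
    Low,
    Medium,
    High,
}

#[derive(Debug)]
pub struct CommandAssessment {
    pub summary: String,
    pub risk_level: RiskLevel,
    pub risk_category: String, // e.g. "data deletion" or "data exfiltration"
}

/// Ask the model (via `assess`, which should run in a fresh conversation) for
/// a judgment, but fall back to the plain approval prompt if the call errors
/// or takes longer than 5 seconds.
pub async fn assess_or_fall_back<F, Fut>(assess: F, command: &str) -> Option<CommandAssessment>
where
    F: FnOnce(&str) -> Fut,
    Fut: Future<Output = Result<CommandAssessment, String>>,
{
    match tokio::time::timeout(Duration::from_secs(5), assess(command)).await {
        Ok(Ok(assessment)) => Some(assessment),
        _ => None, // model error or timeout: keep the current behavior
    }
}
```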
Here is a screenshot of the approval prompt showing the risk assessment
and summary.
<img width="723" height="282" alt="image"
src="https://github.com/user-attachments/assets/4597dd7c-d5a0-4e9f-9d13-414bd082fd6b"
/>
## Summary
- wrap the default reqwest::Client inside a new
CodexHttpClient/CodexRequestBuilder pair and log the HTTP method, URL,
and status for each request (see the sketch after this list)
- update the auth/model/provider plumbing to use the new builder helpers
so headers and bearer auth continue to be applied consistently
- add the shared `http` dependency that backs the header conversion
helpers
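A minimal sketch of the logging-wrapper idea, assuming only plain `reqwest` and `tracing`; the real `CodexHttpClient`/`CodexRequestBuilder` pair also carries the header and bearer-auth plumbing mentioned above:

```rust
use reqwest::{Client, Method, Response, Url};

#[derive(Clone)]
pub struct LoggingHttpClient {
    inner: Client,
}

impl LoggingHttpClient {
    pub fn new(inner: Client) -> Self {
        Self { inner }
    }

    /// Send a request and log the method, URL, and resulting status.
    pub async fn execute(&self, method: Method, url: Url) -> reqwest::Result<Response> {
        let result = self.inner.request(method.clone(), url.clone()).send().await;
        match &result {
            Ok(response) => tracing::info!("{method} {url} -> {}", response.status()),
            Err(err) => tracing::warn!("{method} {url} failed: {err}"),
        }
        result
    }
}
```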
## Testing
- `CODEX_SANDBOX=seatbelt CODEX_SANDBOX_NETWORK_DISABLED=1 cargo test -p
codex-core`
- `CODEX_SANDBOX=seatbelt CODEX_SANDBOX_NETWORK_DISABLED=1 cargo test -p
codex-chatgpt`
- `CODEX_SANDBOX=seatbelt CODEX_SANDBOX_NETWORK_DISABLED=1 cargo test -p
codex-tui`
------
https://chatgpt.com/codex/tasks/task_i_68fa5038c17483208b1148661c5873be
1. I have seen too many reports of people hitting startup timeout errors
and thinking Codex is broken. Hopefully this will help people
self-serve. We may also want to consider raising the timeout to ~15s.
2. Make it clearer what a PAT (personal access token) is in the GitHub
error message
<img width="2378" height="674" alt="CleanShot 2025-10-23 at 22 05 06"
src="https://github.com/user-attachments/assets/d148ce1d-ade3-4511-84a4-c164aefdb5c5"
/>
I want to centralize input processing and management in
`ConversationHistory`. This requires `ConversationHistory` to have
access to `token_info` (i.e. so it can prevent adding an oversized input
to the history). It also makes more sense for this to live on
`ConversationHistory` than on `state`.
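A sketch of what that could look like; `TokenInfo`, the field names, and the token estimate are placeholders rather than the existing types:

```rust
/// Placeholder types: the real `ConversationHistory` stores `ResponseItem`s
/// and the real token accounting is richer than a single counter.
pub struct TokenInfo {
    pub used_tokens: u64,
    pub context_window: u64,
}

pub struct ConversationHistory {
    items: Vec<String>,
    token_info: TokenInfo,
}

impl ConversationHistory {
    /// Reject inputs that would not fit in the remaining context window.
    pub fn try_record_input(&mut self, input: String, estimated_tokens: u64) -> Result<(), String> {
        let remaining = self
            .token_info
            .context_window
            .saturating_sub(self.token_info.used_tokens);
        if estimated_tokens > remaining {
            return Err("input too large for the remaining context window".to_string());
        }
        self.items.push(input);
        self.token_info.used_tokens += estimated_tokens;
        Ok(())
    }
}
```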
Walk the sessions tree instead of using file_search so gitignored
CODEX_HOME directories can resume sessions. Add a regression test that
covers a .gitignore'd sessions directory.
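Conceptually the traversal is just a manual directory walk, roughly like the sketch below; the `.jsonl` extension and directory layout are assumptions, and the real code does more than this:

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Collect every rollout file under `sessions_dir`, even when CODEX_HOME
/// itself is covered by a .gitignore rule (no gitignore-aware search here).
pub fn collect_session_files(sessions_dir: &Path) -> io::Result<Vec<PathBuf>> {
    let mut files = Vec::new();
    let mut stack = vec![sessions_dir.to_path_buf()];
    while let Some(dir) = stack.pop() {
        for entry in fs::read_dir(&dir)? {
            let path = entry?.path();
            if path.is_dir() {
                stack.push(path);
            } else if path.extension().is_some_and(|ext| ext == "jsonl") {
                files.push(path);
            }
        }
    }
    Ok(files)
}
```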
Fixes #5247
Fixes #5412
---------
Co-authored-by: Owen Lin <owen@openai.com>
Currently we collect all turn items in a vector and only add them to the
history on success. This results in losing those items on errors,
including aborts via `ctrl+c`.
This PR:
- Adds the ability for the tool call to handle cancellation
- Bubbles the turn items up to where we record this info
Admittedly, this logic is ad hoc and doesn't handle a lot of error edge
cases. The right thing to do is to record to the history on the spot as
`items`/`tool call outputs` arrive. However, this isn't possible because
different `task_kind`s have different `conversation_histories`, and
`try_run_turn` has no idea which thread we are using. We also cannot pass
an `Arc` to the `conversation_histories` because they are a private
element of `state`.
That said, `abort` is the most common case and we should cover it until
we remove `task_kind`.
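Very roughly, the shape of the change could look like the sketch below; the types are stand-ins, and the real cancellation path runs through the tool-call machinery rather than a bare flag:

```rust
use std::sync::atomic::{AtomicBool, Ordering};

pub enum TurnOutcome {
    Completed(Vec<String>),
    Aborted(Vec<String>),
}

/// Keep whatever items were produced before the abort so the caller can still
/// record them into the conversation history.
pub fn run_turn(
    cancelled: &AtomicBool,
    mut next_item: impl FnMut() -> Option<String>,
) -> TurnOutcome {
    let mut items = Vec::new();
    loop {
        if cancelled.load(Ordering::Relaxed) {
            return TurnOutcome::Aborted(items);
        }
        match next_item() {
            Some(item) => items.push(item),
            None => return TurnOutcome::Completed(items),
        }
    }
}
```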
We are doing some ad-hoc logic when dealing with conversation history.
Ideally, we shouldn't mutate `Vec<ResponseItem>` manually at all and
should rely on `ConversationHistory` for those changes.
Those changes are:
- Adding input to the history
- Removing items from the history
- Correcting the history
I am also adding some `error` logs for cases we shouldn't ideally hit.
For example, we shouldn't be missing tool calls or their outputs, and we
shouldn't hit `ContextWindowExceeded` while performing `compact`.
This refactor will give us granular control over our context management.
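For example, the kind of check the new `error` logs correspond to could look like this sketch (names invented for illustration):

```rust
use std::collections::HashSet;

pub struct ToolCall {
    pub call_id: String,
}

pub struct ToolOutput {
    pub call_id: String,
}

/// Validate that every recorded tool call has a matching output and log the
/// cases we do not expect to hit, instead of patching the item vector ad hoc.
pub fn check_history_consistency(calls: &[ToolCall], outputs: &[ToolOutput]) {
    let output_ids: HashSet<&str> = outputs.iter().map(|o| o.call_id.as_str()).collect();
    for call in calls {
        if !output_ids.contains(call.call_id.as_str()) {
            // We should never be missing an output for a recorded tool call.
            tracing::error!("tool call {} has no matching output", call.call_id);
        }
    }
}
```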
I haven't heard of any issues with the stdio rmcp client, so let's
remove the legacy one and default to the new one.
The only code changes are moving code from the adapter inline; there
should be no meaningful functionality changes.
1. Adds AgentMessage, Reasoning, WebSearch items.
2. Switches the ResponseItem parsing to use the new items and also emit them.
3. Removes user-item kind and filters out "special" (environment) user
items when returning to clients.
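A hedged sketch of the overall shape: the item kinds from point 1 and the filtering from point 3. The real types carry richer payloads, and environment-context detection is more involved than a boolean flag:

```rust
/// Rough shape of the new item kinds; the real types carry richer payloads.
pub enum ThreadItem {
    AgentMessage { text: String },
    Reasoning { summary: String },
    WebSearch { query: String },
}

pub struct UserMessage {
    pub text: String,
    /// True for the environment-context messages Codex injects on the user role.
    pub is_environment_context: bool,
}

/// "Special" (environment) user items are dropped before items are returned
/// to clients.
pub fn client_visible_user_messages(messages: Vec<UserMessage>) -> Vec<UserMessage> {
    messages
        .into_iter()
        .filter(|message| !message.is_environment_context)
        .collect()
}
```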
While we do not want to encourage users to hardcode secrets in their
`config.toml` file, it should be possible to pass an API key
programmatically. For example, when using `codex app-server`, it is
possible to pass a "bag of configuration" as part of the
`NewConversationParams`:
682d05512f/codex-rs/app-server-protocol/src/protocol.rs (L248-L251)
It's not practical to change env vars of the `codex app-server` process
on the fly (which is how we usually read API key values), so this helps
with that.
## Summary
- make the plan tool available by default by removing the feature flag
and always registering the handler
- drop plan-tool CLI and API toggles across the exec, TUI, MCP server,
and app server code paths
- update tests and configs to reflect the always-on plan tool and guard
workspace restriction tests against env leakage
## Testing
Manually tested the extension.
------
https://chatgpt.com/codex/tasks/task_i_68f67a3ff2d083209562a773f814c1f9
This #[serial] approach is not ideal. I am tracking a separate issue to
create an injectable env var provider but I want to fix these tests
first.
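The injectable provider mentioned above could look roughly like this (trait and names are hypothetical, not what will land):

```rust
use std::collections::HashMap;

/// Hypothetical trait: production code reads the process environment, while
/// tests inject a fixed map instead of mutating global env vars under #[serial].
pub trait EnvProvider {
    fn var(&self, key: &str) -> Option<String>;
}

pub struct ProcessEnv;

impl EnvProvider for ProcessEnv {
    fn var(&self, key: &str) -> Option<String> {
        std::env::var(key).ok()
    }
}

pub struct FakeEnv(pub HashMap<String, String>);

impl EnvProvider for FakeEnv {
    fn var(&self, key: &str) -> Option<String> {
        self.0.get(key).cloned()
    }
}
```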
Fixes #5447
Today `sub_id` is the ID of a single incoming Codex Op submission. We then
associate all events triggered by this operation using the same
`sub_id`.
At the same time we are also creating a TurnContext per submission and
we'd like to start associating some events (item added/item completed)
with an entire turn instead of just the operation that started it.
Using the turn context when sending events gives us flexibility to change
the notification scheme.
Expose the session cwd in the notify payload and update docs so scripts
and extensions receive the real project path; users get accurate
project-aware notifications in CLI and VS Code.
Fixes #5387
Because the GitHub MCP is one of the most popular MCPs and it
confusingly doesn't support OAuth, we should make it clearer how to get
it working so people don't think Codex is broken.
Without proper `zsh -lc` parsing, we lose proper command parsing, turn
diff tracking, safe command checks, and other things we expect from raw
or `bash -lc` commands.
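At its core this means recognizing `zsh -lc <script>` the same way we already recognize `bash -lc <script>`, roughly as in this sketch (the real matcher is more involved than this):

```rust
/// Extract the inner script from `bash -lc <script>` or `zsh -lc <script>` so
/// it can flow through command parsing and safety checks.
pub fn extract_shell_script(argv: &[String]) -> Option<&str> {
    match argv {
        [shell, flag, script] if (shell == "bash" || shell == "zsh") && flag == "-lc" => {
            Some(script.as_str())
        }
        _ => None,
    }
}
```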
Some MCP servers expose a lot of tools. In those cases, it is reasonable
to allow/denylist tools for Codex to use so it doesn't get overwhelmed
with too many tools.
The new configuration options available in the `mcp_server` toml table
are:
* `enabled_tools`
* `disabled_tools`
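Conceptually the filtering reduces to an allow/deny check per tool name, roughly as sketched below (the function shape is illustrative, not the actual config code):

```rust
use std::collections::HashSet;

/// Keep a tool if it is not denylisted and, when an allowlist is configured,
/// only if it appears there.
pub fn is_tool_enabled(
    tool_name: &str,
    enabled_tools: Option<&HashSet<String>>,
    disabled_tools: &HashSet<String>,
) -> bool {
    if disabled_tools.contains(tool_name) {
        return false;
    }
    match enabled_tools {
        Some(allowed) => allowed.contains(tool_name),
        None => true,
    }
}
```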
Fixes #4796
Adds a `GET account/rateLimits/read` API to app-server. This calls the
codex backend to fetch the user's current rate limits.
This would be helpful in checking rate limits without having to send a
message.
For calling the codex backend usage API, I generated the types and
manually copied the relevant ones into `codex-backend-openapi-types`.
It would be nice to extend our internal OpenAPI generator to support Rust
so we don't have to run these manual steps.
Adds a new ItemStarted event and delivers UserMessage as the first item
type (more to come).
Renames `InputItem` to `UserInput`, since we're using the `Item`
suffix for actual items.
The backend will be returning Unix timestamps (seconds since the epoch)
instead of RFC 3339 strings. This makes it more ergonomic for
developers to integrate against: no string parsing required.
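For consumers the difference is roughly the one below (struct and field names are invented for illustration):

```rust
use serde::Deserialize;

#[derive(Deserialize)]
pub struct EventOld {
    /// Previously an RFC 3339 string such as "2025-10-17T17:14:18Z".
    pub created_at: String,
}

#[derive(Deserialize)]
pub struct EventNew {
    /// Now plain seconds since the Unix epoch: no string parsing required.
    pub created_at: i64,
}
```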
This should make it clearer that specific tools come from MCP servers.
#4806 requested that we add the server name but we already do that.
Fixes #4806
Tightened the docs so the sandbox guide matches reality, noted the new
tools.view_image toggle next to web search, and linked the README to the
getting-started guide which now owns the familiar tips (backtrack, --cd,
--add-dir, etc.).
Add a `--add-dir` CLI flag so sessions can use extra writable roots in
addition to the ones specified in the config file. These extra roots
apply only for the duration of the session.
Fixes #3303
Fixes #2797
The goal of this change:
1. Unify user input and user turn implementation.
2. Have a single place where turn/session setting overrides are applied.
3. Have a single place where turn context is created.
4. Create TurnContext only for an actual turn, and have a separate
structure for the current session settings (reusing ConfigureSession)
This change ensures that we store the absolute time at which the primary
and secondary rate limits will reset, instead of relative offsets.
Previously these got recalculated relative to the current time, which
caused the displayed reset times to drift, including after doing a codex
resume.
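A sketch of the idea: pin the reset moment once when the snapshot arrives instead of re-deriving it from "now" on every render (names are illustrative):

```rust
use std::time::{Duration, SystemTime, UNIX_EPOCH};

pub struct RateLimitResets {
    /// Absolute reset time, stored as seconds since the Unix epoch.
    pub resets_at: u64,
}

/// Convert the relative offset reported by the backend into an absolute time
/// once, at the moment the snapshot is received.
pub fn pin_reset_time(resets_in: Duration, received_at: SystemTime) -> RateLimitResets {
    let resets_at = (received_at + resets_in)
        .duration_since(UNIX_EPOCH)
        .expect("reset time should be after the Unix epoch")
        .as_secs();
    RateLimitResets { resets_at }
}
```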
For sessions recorded before this change, the reset times will not show,
since this is a breaking change:
<img width="524" height="55" alt="Screenshot 2025-10-17 at 5 14 18 PM"
src="https://github.com/user-attachments/assets/53ebd43e-da25-4fef-9c47-94a529d40265"
/>
Fixes https://github.com/openai/codex/issues/4761
`ParsedCommand::Read` has a `name` field that attempts to identify the
name of the file being read, but the file may not be in the `cwd` in
which the command is invoked, as demonstrated by this existing unit test:
0139f6780c/codex-rs/core/src/parse_command.rs (L250-L260)
As you can see, `tui/Cargo.toml` is the relative path to the file being
read.
This PR introduces a new `path: PathBuf` field to `ParsedCommand::Read`
that attempts to capture this information. When possible, this is an
absolute path, though when relative, it should be resolved against the
`cwd` that will be used to run the command to derive the absolute path.
This should make it easier for clients to provide UI for a "read file"
event that corresponds to the command execution.
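A sketch of the resolution rule described above; the real `ParsedCommand` enum has many more variants and fields:

```rust
use std::path::{Path, PathBuf};

/// Trimmed-down stand-in for the real enum.
pub enum ParsedCommand {
    Read { name: String, path: PathBuf },
}

/// Resolve a possibly relative read target against the `cwd` the command will
/// run in, so clients can show an absolute path for a "read file" event.
pub fn resolve_read_path(path: &Path, cwd: &Path) -> PathBuf {
    if path.is_absolute() {
        path.to_path_buf()
    } else {
        cwd.join(path)
    }
}
```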