Seeing timeouts on certain slow MCP servers starting up when Codex is
invoked. Before this change, the timeout was a hard-coded 10s. We need the
ability to define arbitrary timeouts on a per-server basis.
## Summary of changes
- Add `startup_timeout_ms` to `McpServerConfig`, with a 10s default when unset
- Use the per-server timeout for `initialize` and `tools/list`
- Introduce `ManagedClient` to store the client and timeout; rename
`LIST_TOOLS_TIMEOUT` to `DEFAULT_STARTUP_TIMEOUT`
- Update docs to document `startup_timeout_ms` with an example and options
table
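
A minimal sketch of how the per-server timeout might resolve, assuming the field and constant names from the summary above; the surrounding struct fields are placeholders, not the real `McpServerConfig` definition:

```rust
use std::time::Duration;

// Renamed from LIST_TOOLS_TIMEOUT; applied whenever `startup_timeout_ms` is unset.
const DEFAULT_STARTUP_TIMEOUT: Duration = Duration::from_secs(10);

// Placeholder shape of the per-server config entry.
#[derive(Debug, Clone)]
pub struct McpServerConfig {
    pub command: String,
    pub args: Vec<String>,
    pub startup_timeout_ms: Option<u64>,
}

impl McpServerConfig {
    // Timeout used for both `initialize` and the first `tools/list` call.
    pub fn startup_timeout(&self) -> Duration {
        self.startup_timeout_ms
            .map(Duration::from_millis)
            .unwrap_or(DEFAULT_STARTUP_TIMEOUT)
    }
}
```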
---------
Co-authored-by: Matthew Dolan <dolan-openai@users.noreply.github.com>
We're trying to migrate from `session_id: Uuid` to `conversation_id:
ConversationId`. Not only does this give us more type safety, it also
unifies our terminology across Codex: with the implementation of session
resuming, a conversation (which can span multiple sessions) is the more
appropriate concept.
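
For illustration, a newtype along these lines is what the migration points toward (a sketch only; the actual `ConversationId` in codex-rs may derive different traits):

```rust
use uuid::Uuid;

// Wrapping Uuid in a dedicated type prevents mixing up conversation ids with
// other Uuid-typed values, while keeping the same serialized representation.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct ConversationId(Uuid);

impl ConversationId {
    pub fn new() -> Self {
        ConversationId(Uuid::new_v4())
    }
}

impl std::fmt::Display for ConversationId {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        self.0.fmt(f)
    }
}
```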
I started this impl on https://github.com/openai/codex/pull/3219 as part
of getting resume working in the extension but it's big enough that it
should be broken out.
When item ids are sent to the Responses API, it loads them from the
database and ignores the provided values, which adds extra latency.
Not having a mode that stores requests also allows us to simplify the
code.
## Breaking change
The `disable_response_storage` configuration option is removed.
I'm not yet convinced I have the best heuristics for what to highlight,
but this feels like a useful step toward something a bit easier to
read, especially when the model is producing large commands.
<img width="669" height="589" alt="Screenshot 2025-09-03 at 8 21 56 PM"
src="https://github.com/user-attachments/assets/b9cbcc43-80e8-4d41-93c8-daa74b84b331"
/>
This also includes a fairly significant refactor of our line-wrapping logic.
We had multiple issues with context-size calculation:
1. The `initial_prompt_tokens` calculation based on cache size is not
reliable; cache misses might set it to a much higher value. For now it is
hardcoded to a safer constant.
2. The input context size for GPT-5 is 272k (that's where the 33% came from).
Fixed.
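
One plausible shape of the corrected math, with made-up constants and a hypothetical helper name (the real baseline value and formula live in codex-core):

```rust
// 272k input context window for GPT-5, per the note above.
const GPT5_INPUT_CONTEXT_TOKENS: u64 = 272_000;
// Fixed baseline replacing the unreliable cache-derived estimate (placeholder value).
const BASELINE_INITIAL_PROMPT_TOKENS: u64 = 12_000;

// Percentage of the usable input context still available after `total_tokens_used`.
fn percent_context_left(total_tokens_used: u64) -> u64 {
    let usable = GPT5_INPUT_CONTEXT_TOKENS - BASELINE_INITIAL_PROMPT_TOKENS;
    let used = total_tokens_used.saturating_sub(BASELINE_INITIAL_PROMPT_TOKENS);
    100u64.saturating_sub(used * 100 / usable)
}
```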
## Summary
Follow-up to #3056
This PR updates the mcp-server interface for reading the config settings
saved by the user. At the risk of introducing _another_ Config struct, I
think it makes sense to avoid tying our protocol to ConfigToml, as it's
become a bit unwieldy. GetConfigTomlResponse was a de-facto struct for
this already; better to make it explicit, in my opinion.
This is technically a breaking change of the mcp-server protocol, but
given the previous interface was introduced so recently in #2725, and we
have not yet even started to call it, I propose proceeding with the
breaking change - but am open to preserving the old endpoint.
## Testing
- [x] Added additional integration test coverage
When serializing to JSON, the existing solution created an enormous
array of ints, which is far more bytes on the wire than a base64-encoded
string would be.
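
For illustration, a small comparison of the two encodings using the serde, serde_json, and base64 crates; the struct and helper names are made up for the example.

```rust
use base64::Engine;
use serde::Serialize;

#[derive(Serialize)]
struct AsArray {
    data: Vec<u8>, // serializes as [0, 0, 0, ...]
}

#[derive(Serialize)]
struct AsBase64 {
    #[serde(serialize_with = "to_base64")]
    data: Vec<u8>, // serializes as a single base64 string
}

fn to_base64<S: serde::Serializer>(bytes: &Vec<u8>, s: S) -> Result<S::Ok, S::Error> {
    s.serialize_str(&base64::engine::general_purpose::STANDARD.encode(bytes))
}

fn main() {
    let data = vec![0u8; 32];
    let as_array = serde_json::to_string(&AsArray { data: data.clone() }).unwrap();
    let as_b64 = serde_json::to_string(&AsBase64 { data }).unwrap();
    // The int-array form is several times larger on the wire than the base64 form.
    println!("array: {} bytes, base64: {} bytes", as_array.len(), as_b64.len());
}
```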
Last week, I thought I found the smoking gun in our flaky integration
tests where holding these locks could have led to potential deadlock:
- https://github.com/openai/codex/pull/2876
- https://github.com/openai/codex/pull/2878
Yet even after those PRs went in, we continued to see flakiness in our
integration tests! With the additional logging added while debugging
those tests, I now saw things like:
```
read message from stdout: Notification(JSONRPCNotification { jsonrpc: "2.0", method: "codex/event/exec_approval_request", params: Some(Object {"id": String("0"), "msg": Object {"type": String("exec_approval_request"), "call_id": String("call1"), "command": Array [String("python3"), String("-c"), String("print(42)")], "cwd": String("/tmp/.tmpFj2zwi/workdir")}, "conversationId": String("c67b32c5-9475-41bf-8680-f4b4834ebcc6")}) })
notification: Notification(JSONRPCNotification { jsonrpc: "2.0", method: "codex/event/exec_approval_request", params: Some(Object {"id": String("0"), "msg": Object {"type": String("exec_approval_request"), "call_id": String("call1"), "command": Array [String("python3"), String("-c"), String("print(42)")], "cwd": String("/tmp/.tmpFj2zwi/workdir")}, "conversationId": String("c67b32c5-9475-41bf-8680-f4b4834ebcc6")}) })
read message from stdout: Request(JSONRPCRequest { id: Integer(0), jsonrpc: "2.0", method: "execCommandApproval", params: Some(Object {"conversation_id": String("c67b32c5-9475-41bf-8680-f4b4834ebcc6"), "call_id": String("call1"), "command": Array [String("python3"), String("-c"), String("print(42)")], "cwd": String("/tmp/.tmpFj2zwi/workdir")}) })
writing message to stdin: Response(JSONRPCResponse { id: Integer(0), jsonrpc: "2.0", result: Object {"decision": String("approved")} })
in read_stream_until_notification_message(codex/event/task_complete)
[mcp stderr] 2025-09-04T00:00:59.738585Z INFO codex_mcp_server::message_processor: <- response: JSONRPCResponse { id: Integer(0), jsonrpc: "2.0", result: Object {"decision": String("approved")} }
[mcp stderr] 2025-09-04T00:00:59.738740Z DEBUG codex_core::codex: Submission sub=Submission { id: "1", op: ExecApproval { id: "0", decision: Approved } }
[mcp stderr] 2025-09-04T00:00:59.738832Z WARN codex_core::codex: No pending approval found for sub_id: 0
```
That is, a response was sent for a request, but no callback was in place
to handle the response!
This time, I think I may have found the underlying issue (though the
fixes for holding locks for too long may also have been part of it):
I found cases where we were sending the request:
234c0a0469/codex-rs/core/src/codex.rs (L597)
before inserting the `Sender` into the `pending_approvals` map (which
has to wait on acquiring a mutex):
234c0a0469/codex-rs/core/src/codex.rs (L598-L601)
so it is possible the request could go out and the client could respond
before `pending_approvals` was updated!
Note this was happening in both `request_command_approval()` and
`request_patch_approval()`, which maps to the sorts of errors we have
been seeing when these integration tests have been flaking on us.
While here, I am also adding some extra logging that prints if inserting
into `pending_approvals` overwrites an entry as opposed to purely
inserting one. Today, a conversation can have only one pending request
at a time, but as we are planning to support parallel tool calls, this
invariant may not continue to hold, in which case we need to revisit
this abstraction.
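
An illustrative sketch of the ordering fix, using hypothetical names rather than the actual codex-rs types: register the pending approval before sending the request, so a fast client response always finds its callback.

```rust
use std::collections::HashMap;
use std::sync::Arc;
use tokio::sync::{oneshot, Mutex};

type PendingApprovals = Arc<Mutex<HashMap<String, oneshot::Sender<bool>>>>;

async fn request_command_approval(
    pending: PendingApprovals,
    sub_id: String,
    send_request: impl FnOnce(),
) -> oneshot::Receiver<bool> {
    let (tx, rx) = oneshot::channel();
    // Insert first; log if an entry already existed, so we notice when the
    // one-pending-request-per-conversation invariant stops holding.
    if pending.lock().await.insert(sub_id.clone(), tx).is_some() {
        eprintln!("overwrote pending approval for sub_id: {sub_id}");
    }
    // Only now send the request to the client.
    send_request();
    rx
}
```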
Adds a TUI resume flow with an interactive picker and quick resume.
- CLI:
- --resume / -r: open picker to resume a prior session
- --continue / -l: resume the most recent session (no picker)
- Behavior on resume: initial history is replayed, welcome banner
hidden, and the first redraw is suppressed to avoid flicker.
- Implementation:
- New tui/src/resume_picker.rs (paginated listing via
RolloutRecorder::list_conversations)
- App::run accepts ResumeSelection; resumes from disk when requested
- ChatWidget refactor with ChatWidgetInit and new_from_existing; replays
initial messages
- Tests: cover picker sorting/preview extraction and resumed-history
rendering.
- Docs: getting-started updated with flags and picker usage.
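A hypothetical sketch of the `ResumeSelection` mentioned in the implementation notes above; the variant names are assumptions based on the flags described, not the actual definition.

```rust
use std::path::PathBuf;

// How App::run is told whether and what to resume.
enum ResumeSelection {
    StartFresh,      // default: no resume requested
    Resume(PathBuf), // --resume: session file chosen in the picker
    MostRecent,      // --continue: skip the picker, use the latest session
}
```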
https://github.com/user-attachments/assets/1bb6469b-e5d1-42f6-bec6-b1ae6debda3b
This PR does the following:
- Divides user messages into 3 categories: plain, user instructions, and
environment context
- Centralizes adding user instructions and environment context to a
degree
- Improves the integration testing
Building on top of #3123, specifically this
[comment](https://github.com/openai/codex/pull/3123#discussion_r2319885089).
We need to send the user message while ignoring the User Instructions
and Environment Context we attach.
### Overview
This PR introduces the following changes:
1. Adds a unified mechanism to convert ResponseItem into EventMsg.
2. Ensures that when a session is initialized with initial history, a
vector of EventMsg is sent along with the session configuration. This
allows clients to re-render the UI accordingly.
3. Adds integration testing
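
As a rough illustration of point 1, a minimal sketch of such a conversion with simplified enums (the real ResponseItem and EventMsg carry many more variants and payloads):

```rust
// Simplified stand-ins for the real rollout and event types.
enum ResponseItem {
    UserMessage { text: String },
    AssistantMessage { text: String },
}

enum EventMsg {
    UserMessage { message: String },
    AgentMessage { message: String },
}

// Unified conversion used when replaying initial history to clients.
fn event_msg_from_response_item(item: &ResponseItem) -> EventMsg {
    match item {
        ResponseItem::UserMessage { text } => EventMsg::UserMessage { message: text.clone() },
        ResponseItem::AssistantMessage { text } => EventMsg::AgentMessage { message: text.clone() },
    }
}
```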
### Caveats
This implementation does not send every EventMsg that was previously
dispatched to clients. The excluded events fall into two categories:
- “Arguably” rolled-out events: examples include tool calls and apply-patch
calls. While these events are conceptually rolled out, we currently only
roll out ResponseItems. These events are already being handled elsewhere and
transformed into EventMsg before being sent.
- Non-rolled-out events: certain events such as TurnDiff, Error, and
TokenCount are not rolled out at all.
### Future Directions
At present, resuming a session involves maintaining two states:
- UI state: clients can replay most of the important UI from the provided
EventMsg history.
- Model state: the model receives the complete session history to
reconstruct its internal state.
This design provides a solid foundation. If, in the future, more precise
UI reconstruction is needed, we have two potential paths:
1. Introduce a third data structure that allows us to derive both
ResponseItems and EventMsgs.
2. Clearly divide responsibilities: the core system ensures the
integrity of the model state, while clients are responsible for
reconstructing the UI.
In this test, the ChatGPT token path is used, and the auth layer tries
to refresh the token if it thinks the token is “old.” Your helper writes
a fixed `last_refresh` timestamp that has now aged past the 28‑day
threshold, so the code attempts a real refresh against auth.openai.com,
never reaches the mock, and you end up with
`received_requests().await.unwrap()` being empty.
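A minimal sketch of the staleness check being described; the 28-day threshold and the `last_refresh` field come from the prose above, while the function and crate choice are assumptions.

```rust
use chrono::{DateTime, Duration, Utc};

// Tokens older than 28 days trigger a real refresh attempt instead of reuse.
fn needs_refresh(last_refresh: DateTime<Utc>) -> bool {
    Utc::now() - last_refresh > Duration::days(28)
}
```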
## Summary
It appears that #2108 hit a merge conflict with #2355 - I failed to
notice the path difference when re-reviewing the former. This PR
rectifies that, and consolidates it into the protocol package, in line
with our philosophy of specifying types in one place.
## Testing
- [x] Adds config test for model_verbosity
**What?**
Auto-approve patches when `SandboxPolicy::DangerFullAccess` is enabled
on platforms without sandbox support.
Changes in `codex-rs/core/src/safety.rs`: return
`SafetyCheck::AutoApprove { sandbox_type: SandboxType::None }` when no
sandbox is available and DangerFullAccess is set.
**Why?**
On platforms lacking sandbox support, requiring explicit user approval
despite `DangerFullAccess` being explicitly enabled adds friction
without additional safety. This aligns behavior with the stated policy
intent.
**How?**
Extend `assess_patch_safety` match:
* If `get_platform_sandbox()` returns `Some`, keep `AutoApprove {
sandbox_type }`.
* If `None` **and** `SandboxPolicy::DangerFullAccess`, return
`AutoApprove { SandboxType::None }`.
* Otherwise, fall back to `AskUser`.
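
A sketch of the extended match with simplified types, following the bullets above (the real `assess_patch_safety` takes more parameters and the enums have more variants):

```rust
enum SandboxType { None, Platform }
enum SandboxPolicy { ReadOnly, WorkspaceWrite, DangerFullAccess }
enum SafetyCheck { AutoApprove { sandbox_type: SandboxType }, AskUser }

// Stand-in: pretend no platform sandbox is available for this sketch.
fn get_platform_sandbox() -> Option<SandboxType> {
    None
}

fn assess_patch_safety(policy: &SandboxPolicy) -> SafetyCheck {
    match (get_platform_sandbox(), policy) {
        // A platform sandbox exists: keep auto-approving under it.
        (Some(sandbox_type), _) => SafetyCheck::AutoApprove { sandbox_type },
        // No sandbox, but the user explicitly chose DangerFullAccess.
        (None, SandboxPolicy::DangerFullAccess) => {
            SafetyCheck::AutoApprove { sandbox_type: SandboxType::None }
        }
        // No sandbox and a restrictive policy: fall back to asking the user.
        (None, _) => SafetyCheck::AskUser,
    }
}
```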
**Tests**
* Local checks:
```bash
cargo test && cargo clippy --tests && cargo fmt -- --config imports_granularity=Item
```
(Additionally: `just fmt`, `just fix -p codex-core`, `cargo check -p
codex-core`.)
**Docs**
No user-facing CLI changes. No README/help updates needed.
**Risk/Impact**
Reduces prompts on non-sandboxed platforms when DangerFullAccess is
explicitly chosen; consistent with policy semantics.
---------
Co-authored-by: Michael Bolin <bolinfest@gmail.com>
Correct the `shell` tool description for sandboxed runs and add targeted
tests.
- Fix the WorkspaceWrite description to clearly state that writes
outside the writable roots require escalated permissions; reads are not
restricted. The previous wording/formatting could be read as restricting
reads outside the workspace.
- Render the writable roots list on its own lines, after a newline
following "writable roots:", for clarity.
- Show the "Commands that require network access" note only in
WorkspaceWrite when network is disabled.
- Add focused tests that call `create_shell_tool_for_sandbox` directly
and assert the exact description text for WorkspaceWrite, ReadOnly, and
DangerFullAccess.
- Update AGENTS.md to note that `just fmt` can be run automatically
without asking.
- Move rollout persistence and listing into a dedicated module:
rollout/{recorder,list}.
- Expose lightweight conversation listing that returns file paths plus
the first 5 JSONL records for preview.
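
A hypothetical shape of the lightweight listing entry described above (field names are assumptions, not the actual codex-rs types):

```rust
use std::path::PathBuf;

struct ConversationPreview {
    // Path to the rollout file on disk.
    path: PathBuf,
    // First 5 JSONL records from the file, kept as raw JSON for preview rendering.
    head: Vec<serde_json::Value>,
}
```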
Bumps [thiserror](https://github.com/dtolnay/thiserror) from 2.0.12 to
2.0.16.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/dtolnay/thiserror/releases">thiserror's
releases</a>.</em></p>
<blockquote>
<h2>2.0.16</h2>
<ul>
<li>Add to "no-std" crates.io category (<a
href="https://redirect.github.com/dtolnay/thiserror/issues/429">#429</a>)</li>
</ul>
<h2>2.0.15</h2>
<ul>
<li>Prevent <code>Error::provide</code> API becoming unavailable from a
future new compiler lint (<a
href="https://redirect.github.com/dtolnay/thiserror/issues/427">#427</a>)</li>
</ul>
<h2>2.0.14</h2>
<ul>
<li>Allow build-script cleanup failure with NFSv3 output directory to be
non-fatal (<a
href="https://redirect.github.com/dtolnay/thiserror/issues/426">#426</a>)</li>
</ul>
<h2>2.0.13</h2>
<ul>
<li>Documentation improvements</li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="40b58536cc"><code>40b5853</code></a>
Release 2.0.16</li>
<li><a
href="83dfb5f99b"><code>83dfb5f</code></a>
Merge pull request <a
href="https://redirect.github.com/dtolnay/thiserror/issues/429">#429</a>
from dtolnay/nostd</li>
<li><a
href="9b4a99fb90"><code>9b4a99f</code></a>
Add to "no-std" crates.io category</li>
<li><a
href="f6145ebe84"><code>f6145eb</code></a>
Release 2.0.15</li>
<li><a
href="2717177976"><code>2717177</code></a>
Merge pull request <a
href="https://redirect.github.com/dtolnay/thiserror/issues/427">#427</a>
from dtolnay/caplints</li>
<li><a
href="2cd13e6767"><code>2cd13e6</code></a>
Make error_generic_member_access compatible with -Dwarnings</li>
<li><a
href="eea6799e2d"><code>eea6799</code></a>
Release 2.0.14</li>
<li><a
href="a2aa6d7a57"><code>a2aa6d7</code></a>
Merge pull request <a
href="https://redirect.github.com/dtolnay/thiserror/issues/426">#426</a>
from dtolnay/enotempty</li>
<li><a
href="f00ebc57be"><code>f00ebc5</code></a>
Allow build-script cleanup failure with NFSv3 output directory to be
non-fatal</li>
<li><a
href="61f28da3df"><code>61f28da</code></a>
Release 2.0.13</li>
<li>Additional commits viewable in <a
href="https://github.com/dtolnay/thiserror/compare/2.0.12...2.0.16">compare
view</a></li>
</ul>
</details>
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
## Summary
This PR implements advisory file locking for the message history using
Rust 1.89+ stabilized std::fs::File locking APIs, eliminating the need
for external dependencies.
## Key Changes
- **Stable API Usage**: Uses std::fs::File::try_lock() and
try_lock_shared() APIs stabilized in Rust 1.89
- **Cross-Platform Compatibility**:
- Unix systems use try_lock_shared() for advisory read locks
- Windows systems use try_lock() due to different lock semantics
- **Retry Logic**: Maintains existing retry behavior for concurrent
access scenarios
- **No External Dependencies**: Removes need for external file locking
crates
## Technical Details
The implementation provides advisory file locking to prevent corruption
when multiple Codex processes attempt to write to the message history
file simultaneously. The locking is platform-aware to handle differences
in Windows vs Unix file locking behavior.
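
A rough sketch of the platform-aware lock with retry, assuming the API shape stabilized in Rust 1.89 (`try_lock`/`try_lock_shared` returning `Result<(), TryLockError>`); the retry count and delay are placeholders, not the actual values.

```rust
use std::fs::{File, TryLockError};
use std::io;
use std::thread::sleep;
use std::time::Duration;

fn lock_history_file(file: &File) -> io::Result<()> {
    for _ in 0..5 {
        // Unix: advisory shared (read) lock; Windows: exclusive lock, because the
        // lock semantics differ there.
        #[cfg(unix)]
        let result = file.try_lock_shared();
        #[cfg(windows)]
        let result = file.try_lock();

        match result {
            Ok(()) => return Ok(()),
            Err(TryLockError::WouldBlock) => sleep(Duration::from_millis(100)),
            Err(TryLockError::Error(e)) => return Err(e),
        }
    }
    Err(io::Error::new(
        io::ErrorKind::WouldBlock,
        "message history file is still locked after retries",
    ))
}
```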
## Testing
- ✅ Builds successfully on all platforms
- ✅ Existing message history tests pass
- ✅ File locking retry logic verified
Related to discussion in #2773 about using stabilized Rust APIs instead
of external dependencies.
---------
Co-authored-by: Michael Bolin <bolinfest@gmail.com>
The gpt-oss models require reasoning to be included in subsequent Chat
Completions requests; otherwise the model forgets why the tools were
called. This change fixes that and also adds some missing documentation
on how to handle context windows in Ollama and how to show the CoT if
you want to.
## Summary
Fixes an issue with the lark grammar definition for the apply_patch
freeform tool. This does NOT change the defaults, merely patches the
root cause of the issue we were seeing with empty lines, and an issue
with config flowing through correctly.
Specifically, the following requires that a line is non-empty:
```
add_line: "+" /(.+)/ LF -> line
```
but many changes _should_ involve creating/updating empty lines. The new
definition is:
```
add_line: "+" /(.*)/ LF -> line
```
## Testing
- [x] Tested locally: reproduced the issue without the update and
confirmed that the model will produce empty lines with the new lark
grammar
We have two ways of loading a conversation with a previous history: fork
conversation and the experimental resume that we had before. In this PR,
I am unifying their code path: get the history items and record them in
a brand-new conversation. This PR also constrains the rollout recorder's
responsibilities to only recording to and loading from disk.
The PR also fixes a current bug when we have two forking in a row:
History 1:
<Environment Context>
UserMessage_1
UserMessage_2
UserMessage_3
**Fork with n = 1 (only remove one element)**
History 2:
<Environment Context>
UserMessage_1
UserMessage_2
<Environment Context>
**Fork with n = 1 (only remove one element)**
History 3:
<Environment Context>
UserMessage_1
UserMessage_2
**<Environment Context>**
This shouldn't happen, but it did because we were appending the `<Environment
Context>` after each spawn, and it is considered a _user message_.
Now, we don't add this message when restoring an old conversation.
- Introduce a websearch end event to complement the begin event
- Move the logic of adding the websearch tool into
create_tools_json_for_responses_api
- Make it the client's responsibility to toggle the tool on or off
- Other misc post-commit feedback from #2371
- Show the query:
<img width="1392" height="151" alt="image"
src="https://github.com/user-attachments/assets/8457f1a6-f851-44cf-bcca-0d4fe460ce89"
/>
Adds custom `/prompts` commands, loaded from `~/.codex/prompts/<command>.md`.
<img width="239" height="107" alt="Screenshot 2025-08-25 at 6 22 42 PM"
src="https://github.com/user-attachments/assets/fe6ebbaa-1bf6-49d3-95f9-fdc53b752679"
/>
---
Details:
1. Adds `Op::ListCustomPrompts` to core.
2. Returns `ListCustomPromptsResponse` with list of `CustomPrompt`
(name, content).
3. TUI calls the operation on load, and populates the custom prompts
(excluding prompts that collide with builtins).
4. Selecting the custom prompt automatically sends the prompt to the
agent.
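
A hypothetical sketch of the types involved, following the list above; field names follow the prose, not the actual codex-rs definitions.

```rust
// One entry per ~/.codex/prompts/<command>.md file.
struct CustomPrompt {
    name: String,    // the <command> part of the filename
    content: String, // markdown body sent to the agent when the prompt is selected
}

// Returned by core in response to Op::ListCustomPrompts.
struct ListCustomPromptsResponse {
    custom_prompts: Vec<CustomPrompt>,
}
```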
This PR fixes two edge cases in managing burst paste (mainly on
PowerShell).
Bugs:
- Needs a key event after paste to render the pasted items
> ChatComposer::flush_paste_burst_if_due() flushes on timeout. Called:
> - Pre-render in App on TuiEvent::Draw.
> - Via a delayed frame
>
BottomPane::request_redraw_in(ChatComposer::recommended_paste_flush_delay()).
- Parses two key events separately before starting to parse the paste burst
> When threshold is crossed, pull preceding burst chars out of the
textarea and prepend to paste_burst_buffer, then keep buffering.
- Integrates with #2567 to bring image pasting to Windows.
- added `uninlined_format_args` to `[workspace.lints.clippy]` in the
`Cargo.toml` for the workspace
- ran `cargo clippy --tests --fix`
- ran `just fmt`
This is a stopgap solution, but today, we are seeing the client get
flooded with events. Since we already truncate the output we send to the
model, it feels reasonable to limit how many deltas we send to the
client.
### What this PR does
This PR introduces a new public method,
`remove_conversation(conversation_id: Uuid)`, to the `ConversationManager`.
This allows consumers of the codex-core library to manually remove a
conversation from the manager's in-memory storage.
### Why this change is needed
I am currently adapting the Codex client to run as a long-lived server
application. In this server environment, ConversationManager instances
persist for extended periods, and new conversations are created for each
incoming user request.
The current implementation of ConversationManager stores all created
conversations in a HashMap indefinitely, with no mechanism for removal.
This leads to unbounded memory growth in a server context, as every new
conversation permanently occupies memory.
While an automatic TTL-based cleanup mechanism could be one solution, a
simpler, more direct remove_conversation method provides the necessary
control for my use case. It allows my server application to explicitly
manage the lifecycle of conversations, such as cleaning them up after a
request is fully processed or after a period of inactivity is detected
at the application level.
This change provides a minimal, non-intrusive way to address the memory
management issue for server-like applications built on top of
codex-core, giving developers the flexibility to implement their own
cleanup logic.
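A simplified sketch of what such a method could look like (the real `ConversationManager` stores richer conversation values and may use a different id type):

```rust
use std::collections::HashMap;
use std::sync::Arc;
use tokio::sync::Mutex;
use uuid::Uuid;

struct Conversation; // stand-in for the real conversation type

struct ConversationManager {
    conversations: Arc<Mutex<HashMap<Uuid, Arc<Conversation>>>>,
}

impl ConversationManager {
    /// Removes the conversation from in-memory storage, returning it if it was
    /// present, so server-style callers can bound memory growth themselves.
    pub async fn remove_conversation(&self, conversation_id: Uuid) -> Option<Arc<Conversation>> {
        self.conversations.lock().await.remove(&conversation_id)
    }
}
```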
Signed-off-by: M4n5ter <m4n5terrr@gmail.com>
Co-authored-by: Michael Bolin <mbolin@openai.com>