Commit Graph

260 Commits

Author SHA1 Message Date
Michael Bolin
e3b03eaccb feat: StreamableShell with exec_command and write_stdin tools (#2574) 2025-08-22 18:10:55 -07:00
Ahmed Ibrahim
311ad0ce26 fork conversation from a previous message (#2575)
This can be the underlying logic for starting a conversation from a
previous message. It will need some love in the UI.

Base for building this: #2588
2025-08-22 17:06:09 -07:00
Jeremy Rose
d994019f3f tui: coalesce command output; show unabridged commands in transcript (#2590)
https://github.com/user-attachments/assets/effec7c7-732a-4b61-a2ae-3cb297b6b19b
2025-08-22 16:32:31 -07:00
Ahmed Ibrahim
097782c775 Move models.rs to protocol (#2595)
Moving models.rs to protocol so we can use them in `Codex` operations
2025-08-22 22:18:54 +00:00
Michael Bolin
8ba8089592 fix: prefer sending MCP structuredContent as the function call response, if available (#2594)
Prior to this change, when we got a `CallToolResult` from an MCP server,
we JSON-serialized its `content` field as the `content` to send back to
the model as part of the function call output that we send back to the
model. This meant that we were dropping the `structuredContent` on the
floor.

Though reading
https://modelcontextprotocol.io/specification/2025-06-18/schema#tool, it
appears that if `outputSchema` is specified, then `structuredContent`
should be set, which seems to be a "higher-fidelity" response to the
function call. This PR updates our handling of `CallToolResult` to
prefer using the JSON-serialization of `structuredContent`, if present,
using `content` as a fallback.

Also, it appears that the sense of `success` was inverted prior to this
PR!
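
As a rough illustration of the preference order described above, here is a minimal sketch. The `CallToolResult` and `FunctionCallOutput` structs below are simplified, hypothetical stand-ins (JSON is modeled as pre-serialized strings to avoid dependencies), and treating a missing `is_error` as success is an assumption, not a statement about the actual code in this PR.

```rust
/// Hypothetical, simplified stand-in for an MCP `CallToolResult`.
/// JSON values are modeled as already-serialized strings.
pub struct CallToolResult {
    pub content: Vec<String>,
    pub structured_content: Option<String>,
    pub is_error: Option<bool>,
}

/// Hypothetical shape of the function call output sent back to the model.
#[derive(Debug, PartialEq)]
pub struct FunctionCallOutput {
    pub content: String,
    pub success: bool,
}

pub fn to_function_call_output(result: &CallToolResult) -> FunctionCallOutput {
    // Prefer the structured payload; fall back to a JSON array of `content`.
    let content = match &result.structured_content {
        Some(structured) => structured.clone(),
        None => format!("[{}]", result.content.join(",")),
    };
    // Assumption: an absent `is_error` means the call succeeded.
    // Note the negation: `is_error == true` maps to `success == false`.
    let success = !result.is_error.unwrap_or(false);
    FunctionCallOutput { content, success }
}
```

The negation on `is_error` is the kind of place where the sense of `success` can easily end up inverted.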
2025-08-22 14:10:18 -07:00
Jeremy Rose
57c498159a test: simplify tests in config.rs (#2586)
this is much easier to read, thanks @bolinfest for the suggestion.
2025-08-22 14:04:21 -07:00
Dylan
6f0b499594 [config] Detect git worktrees for project trust (#2585)
## Summary
When resolving our current directory as a project, we want to be a
little bit more clever:
1. If we're in a sub-directory of a git repo, resolve our project
against the root of the git repo
2. If we're in a git worktree, resolve the project against the root of
the git repo
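
One way the two rules above could be implemented is sketched below. It assumes the usual git layout, where a worktree's `.git` is a file containing `gitdir: <main repo>/.git/worktrees/<name>`; the function names are hypothetical and not taken from this PR.

```rust
use std::ffi::OsStr;
use std::fs;
use std::path::{Path, PathBuf};

/// Extract the path from the contents of a worktree's `.git` file.
pub fn parse_gitdir_file(contents: &str) -> Option<PathBuf> {
    contents
        .lines()
        .find_map(|line| line.strip_prefix("gitdir:"))
        .map(|rest| PathBuf::from(rest.trim()))
}

/// Map `<root>/.git/worktrees/<name>` back to `<root>`.
pub fn repo_root_from_gitdir(gitdir: &Path) -> Option<PathBuf> {
    gitdir
        .ancestors()
        .find(|p| p.file_name() == Some(OsStr::new(".git")))
        .and_then(|dot_git| dot_git.parent())
        .map(Path::to_path_buf)
}

/// Walk up from `cwd` until a `.git` directory (normal checkout) or a
/// `.git` file (worktree) is found, and return the main repo root.
pub fn resolve_project_root(cwd: &Path) -> Option<PathBuf> {
    for dir in cwd.ancestors() {
        let dot_git = dir.join(".git");
        if dot_git.is_dir() {
            return Some(dir.to_path_buf());
        }
        if dot_git.is_file() {
            let gitdir = parse_gitdir_file(&fs::read_to_string(&dot_git).ok()?)?;
            // A relative `gitdir:` is resolved against the worktree directory.
            let gitdir = if gitdir.is_absolute() { gitdir } else { dir.join(gitdir) };
            return repo_root_from_gitdir(&gitdir);
        }
    }
    None
}
```

Submodules also use a `.git` file, so this sketch would resolve a submodule to its superproject; whether that is desirable for project trust is a separate decision.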

## Testing
- [x] Added unit tests
- [x] Confirmed locally with a git worktree (the one I was using for
this feature)
2025-08-22 13:54:51 -07:00
Dylan
236c4f76a6 [apply_patch] freeform apply_patch tool (#2576)
## Summary
GPT-5 introduced the concept of [custom
tools](https://platform.openai.com/docs/guides/function-calling#custom-tools),
which allow the model to send a raw string result back, simplifying
json-escape issues. We are migrating gpt-5 to use this by default.

However, gpt-oss models do not support custom tools, only normal
functions. So we keep both tool definitions, and provide whichever one
the model family supports.

## Testing
- [x] Tested locally with various models
- [x] Unit tests pass
2025-08-22 13:42:34 -07:00
Eric Traut
dc42ec0eb4 Add AuthManager and enhance GetAuthStatus command (#2577)
This PR adds a central `AuthManager` struct that manages the auth
information used across conversations and the MCP server. Prior to this,
each conversation and the MCP server got their own private snapshots of
the auth information, and changes to one (such as a logout or token
refresh) were not seen by others.

This is especially problematic when multiple instances of the CLI are
run. For example, consider the case where you start CLI 1 and log in to
ChatGPT account X and then start CLI 2 and log out and then log in to
ChatGPT account Y. The conversation in CLI 1 is still using account X,
but if you create a new conversation, it will suddenly (and
unexpectedly) switch to account Y.

With the `AuthManager`, auth information is read from disk at the time
the `ConversationManager` is constructed, and it is cached in memory.
All new conversations use this same auth information, as do any token
refreshes.
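
The sharing model described above can be sketched roughly as follows. This is an illustration only: the `Auth` struct, its fields, and the method names are hypothetical, and the real `AuthManager` also handles reading from disk and token refresh.

```rust
use std::sync::{Arc, RwLock};

/// Hypothetical auth snapshot; the field names are assumptions.
#[derive(Debug, Clone, PartialEq)]
pub struct Auth {
    pub account: String,
    pub token: String,
}

/// One shared, in-memory copy of the auth information.
pub struct AuthManager {
    cached: RwLock<Option<Auth>>,
}

impl AuthManager {
    /// In this sketch the caller supplies what was read from disk.
    pub fn new(initial: Option<Auth>) -> Arc<Self> {
        Arc::new(Self { cached: RwLock::new(initial) })
    }

    /// Every holder of the `Arc` sees the same current value.
    pub fn auth(&self) -> Option<Auth> {
        let guard = self.cached.read().unwrap_or_else(|e| e.into_inner());
        guard.clone()
    }

    /// A logout (`None`) or token refresh is visible to all holders.
    pub fn set_auth(&self, auth: Option<Auth>) {
        let mut guard = self.cached.write().unwrap_or_else(|e| e.into_inner());
        *guard = auth;
    }
}
```

Handing each conversation an `Arc<AuthManager>` instead of a private snapshot is what makes a refresh in one place visible everywhere else.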

The `AuthManager` is also used by the MCP server's GetAuthStatus
command, which now returns the auth method currently used by the MCP
server.

This PR also includes an enhancement to the GetAuthStatus command. It
now accepts two new (optional) input parameters: `include_token` and
`refresh_token`. Callers can use this to request the in-use auth token
and can optionally request to refresh the token.

The PR also adds tests for the login and auth APIs that I recently added
to the MCP server.
2025-08-22 13:10:11 -07:00
vjain419
80b00a193e feat(gpt5): add model_verbosity for GPT‑5 via Responses API (#2108)
**Summary**
- Adds `model_verbosity` config (values: low, medium, high).
- Sends `text.verbosity` only for GPT‑5 family models via the Responses
API.
- Updates docs and adds serialization tests.

**Motivation**
- GPT‑5 introduces a verbosity control to steer output length/detail
without prompt surgery.
- Exposing it as a config knob keeps prompts stable and makes behavior
explicit and repeatable.

**Changes**
- Config:
  - Added `Verbosity` enum (low|medium|high).
  - Added optional `model_verbosity` to `ConfigToml`, `Config`, and
    `ConfigProfile`.
- Request wiring:
  - Extended `ResponsesApiRequest` with optional `text` object.
  - Populates `text.verbosity` only when model family is `gpt-5`; omitted
    otherwise.
- Tests:
  - Verifies `text.verbosity` serializes when set and is omitted when not
    set.
- Docs:
  - Added “GPT‑5 Verbosity” section in `codex-rs/README.md`.
  - Added `model_verbosity` section to `codex-rs/config.md`.

**Usage**
- In `~/.codex/config.toml`:
  - `model = "gpt-5"`
  - `model_verbosity = "low"` (or `"medium"` default, `"high"`)
- CLI override example:
  - `codex -c model="gpt-5" -c model_verbosity="high"`

**API Impact**
- Requests to GPT‑5 via Responses API include: `text: { verbosity:
"low|medium|high" }` when configured.
- For legacy models or Chat Completions providers, `text` is omitted.
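
A dependency-free sketch of the gating described above is shown below. The real code presumably uses serde on `ResponsesApiRequest`; here the JSON is built by hand, `request_body` and `text_verbosity_for` are hypothetical helpers, and no string escaping is done.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Verbosity {
    Low,
    Medium,
    High,
}

impl Verbosity {
    pub fn as_str(self) -> &'static str {
        match self {
            Verbosity::Low => "low",
            Verbosity::Medium => "medium",
            Verbosity::High => "high",
        }
    }
}

/// Only the `gpt-5` family gets a verbosity; everything else omits it.
pub fn text_verbosity_for(model_family: &str, configured: Option<Verbosity>) -> Option<Verbosity> {
    if model_family == "gpt-5" {
        configured
    } else {
        None
    }
}

/// Hand-rolled JSON for illustration; `model` is assumed to need no escaping.
pub fn request_body(model: &str, model_family: &str, configured: Option<Verbosity>) -> String {
    let mut body = format!("{{\"model\":\"{model}\"");
    if let Some(v) = text_verbosity_for(model_family, configured) {
        body.push_str(&format!(",\"text\":{{\"verbosity\":\"{}\"}}", v.as_str()));
    }
    body.push('}');
    body
}
```

When `model_verbosity` is unset, the `text` key is left out entirely rather than sent as `null`, which matches the "omitted" wording above.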

**Backward Compatibility**
- Default behavior unchanged when `model_verbosity` is not set (server
default “medium”).

**Testing**
- Added unit tests for serialization/omission of `text.verbosity`.
- Ran `cargo fmt` and `cargo test --all-features` (all green).

**Docs**
- `README.md`: new “GPT‑5 Verbosity” note under Config with example.
- `config.md`: new `model_verbosity` section.

**Out of Scope**
- No changes to temperature/top_p or other GPT‑5 parameters.
- No changes to Chat Completions wiring.

**Risks / Notes**
- If OpenAI changes the wire shape for verbosity, we may need to update
`ResponsesApiRequest`.
- Behavior gated to `gpt-5` model family to avoid unexpected effects
elsewhere.

**Checklist**
- [x] Code gated to GPT‑5 family only
- [x] Docs updated (`README.md`, `config.md`)
- [x] Tests added and passing
- [x] Formatting applied

Release note: Add `model_verbosity` config to control GPT‑5 output verbosity via the Responses API (low|medium|high).
2025-08-22 09:12:10 -07:00
Dylan
e4c275d615 [apply-patch] Clean up apply-patch tool definitions (#2539)
## Summary
We've experienced a bit of drift in system prompting for `apply_patch`:
- As pointed out in #2030, our prettier formatting started altering
prompt.md in a few ways
- We introduced a separate markdown file for apply_patch instructions in
#993, but currently duplicate them in the prompt.md file
- We added a first-class apply_patch tool in #2303, which has yet
another definition

This PR starts to consolidate our logic in a few ways:
- We now only use
[apply_patch_tool_instructions.md](https://github.com/openai/codex/compare/dh--apply-patch-tool-definition?expand=1#diff-d4fffee5f85cb1975d3f66143a379e6c329de40c83ed5bf03ffd3829df985bea)
for system instructions
- We no longer include apply_patch system instructions if the tool is
specified

I'm leaving the definition in openai_tools.rs as duplicated text for now
because we're going to be iterating on the first-class tool soon.

## Testing
- [x] Added integration tests to verify prompt stability
- [x] Tested locally with several different models (gpt-5, gpt-oss,
o4-mini)
2025-08-21 20:07:41 -07:00
Dylan
9f71dcbf57 [shell_tool] Small updates to ensure shell consistency (#2571)
## Summary
Small update to hopefully improve some shell edge cases, and make it
clearer to the model what the function is doing. Keeping `timeout` as an
alias means that calls with the previous name will still work.

## Test Plan
- [x] Tested locally, model still works
2025-08-21 19:58:07 -07:00
Jeremy Rose
750ca9e21d core: write explicit [projects] tables for trusted projects (#2523)
all of my trust_level settings in my ~/.codex/config.toml were on one
line.
2025-08-21 13:20:36 -07:00
Jeremy Rose
db934e438e read all AGENTS.md up to git root (#2532)
This updates our logic for AGENTS.md to match documented behavior, which
is to read all AGENTS.md files from cwd up to git root.
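
The lookup order can be sketched like this. `agents_md_candidates` is a hypothetical helper, and returning the git root's file first and the cwd's file last is an assumption about ordering, not something stated in this commit.

```rust
use std::path::{Path, PathBuf};

/// List candidate `AGENTS.md` paths from `git_root` down to `cwd`.
/// Callers would still filter the result for files that actually exist.
pub fn agents_md_candidates(cwd: &Path, git_root: &Path) -> Vec<PathBuf> {
    let mut dirs: Vec<&Path> = Vec::new();
    for dir in cwd.ancestors() {
        dirs.push(dir);
        if dir == git_root {
            break;
        }
    }
    // If `git_root` is not an ancestor of `cwd`, only consider `cwd` itself.
    if dirs.last() != Some(&git_root) {
        dirs.truncate(1);
    }
    dirs.iter().rev().map(|d| d.join("AGENTS.md")).collect()
}
```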
2025-08-21 08:52:17 -07:00
easong-openai
8ad56be06e Parse and expose stream errors (#2540) 2025-08-21 01:15:24 -07:00
Dylan
d2b2a6d13a [prompt] xml-format EnvironmentContext (#2272)
## Summary
Before we land #2243, let's start printing environment_context in our
preferred format. This struct will evolve over time with new
information. XML gives us a good balance: it is human-readable without
too much parsing, LLM-readable, and extensible.

Also moves us over to an Option-based struct, so we can easily provide
diffs to the model.
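
To make the idea concrete, here is a minimal sketch of an Option-based struct rendered as XML. The field names (`cwd`, `approval_policy`, `sandbox_mode`) and the exact tag layout are assumptions for illustration, not the format introduced by this PR.

```rust
/// Hypothetical fields; `None` means "unchanged / not reported".
#[derive(Debug, Default, Clone, PartialEq)]
pub struct EnvironmentContext {
    pub cwd: Option<String>,
    pub approval_policy: Option<String>,
    pub sandbox_mode: Option<String>,
}

/// Minimal escaping for text nodes.
fn escape(s: &str) -> String {
    s.replace('&', "&amp;").replace('<', "&lt;").replace('>', "&gt;")
}

impl EnvironmentContext {
    /// Emit only the fields that are set, so a diff is just a sparse struct.
    pub fn to_xml(&self) -> String {
        let mut lines = vec!["<environment_context>".to_string()];
        let fields = [
            ("cwd", &self.cwd),
            ("approval_policy", &self.approval_policy),
            ("sandbox_mode", &self.sandbox_mode),
        ];
        for (tag, value) in fields {
            if let Some(v) = value {
                lines.push(format!("  <{tag}>{}</{tag}>", escape(v)));
            }
        }
        lines.push("</environment_context>".to_string());
        lines.join("\n")
    }
}
```

With every field optional, a later turn can send a context containing only the values that changed.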

## Testing
- [x] Updated tests to reflect new format
2025-08-20 23:45:16 -07:00
eddy-win
050b9baeb6 Bridge command generation to powershell when on Windows (#2319)
## What? Why? How?
- When running on Windows, codex often tries to invoke bash commands,
which commonly fail (unless WSL is installed)
- Fix: Detect if powershell is available and, if so, route commands to
it
- Also add a `shell_name` property to the environment context so that
codex defaults to powershell commands when running in that environment

## Testing
- Tested within WSL and powershell (e.g. get top 5 largest files within
a folder and validated that commands generated were powershell commands)
- Tested within Zsh
- Updated unit tests

---------

Co-authored-by: Eddy Escardo <eddy@openai.com>
2025-08-20 16:30:34 -07:00
Ahmed Ibrahim
c579ae41ae Fix login for internal employees (#2528)
This PR:
- fixes login for internal employees, because we currently want to
prefer SIWC for them.
- fixes retrying forever on unauthorized access. We need to break
eventually on max retries.
2025-08-20 14:05:20 -07:00
Jeremy Rose
0ad4e11c84 detect terminal and include in request headers (#2437)
This adds the terminal version to the UA header.
2025-08-20 16:54:26 +00:00
Michael Bolin
ce434b1219 fix: prefer config var to env var (#2495) 2025-08-20 04:51:59 +00:00
Ahmed Ibrahim
d1f1e36836 Refresh ChatGPT auth token (#2484)
ChatGPT tokens live for only 1 hour. If the session runs longer, we
currently don't refresh the token. We should get the expiry timestamp
and attempt to refresh before it.
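
The "refresh before expiry" check might look like the sketch below. `should_refresh` and the safety margin are hypothetical; the commit does not say how far ahead of the expiry the refresh is attempted.

```rust
use std::time::{Duration, SystemTime};

/// Returns true when the token has expired or will expire within `margin`.
pub fn should_refresh(now: SystemTime, expires_at: SystemTime, margin: Duration) -> bool {
    match expires_at.duration_since(now) {
        // Still valid: refresh only if we are inside the safety margin.
        Ok(remaining) => remaining <= margin,
        // `expires_at` is already in the past.
        Err(_) => true,
    }
}
```

Passing `now` in explicitly, rather than calling `SystemTime::now()` inside, keeps the check easy to unit test.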
2025-08-19 21:01:31 -07:00
Gabriel Peal
eaae56a1b0 Client headers (#2487) 2025-08-19 23:32:15 -04:00
Gabriel Peal
77148a5c61 Diff command (#2476) 2025-08-19 22:50:28 -04:00
Michael Bolin
e58125e6c1 chore: Rust 1.89 promoted file locking to the standard library, so prefer stdlib to fs2 (#2467)
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/2467).
* __->__ #2467
* #2465
2025-08-19 13:22:46 -07:00
Michael Bolin
50c48e88f5 chore: upgrade to Rust 1.89 (#2465)
Codex created this PR from the following prompt:

> upgrade this entire repo to Rust 1.89. Note that this requires
updating codex-rs/rust-toolchain.toml as well as the workflows in
.github/. Make sure that things are "clippy clean" as this change will
likely uncover new Clippy errors. `just fmt` and `cargo clippy --tests`
are sufficient to check for correctness

Note this modifies a lot of lines because it folds nested `if`
statements using `&&`.

---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/2465).
* #2467
* __->__ #2465
2025-08-19 13:22:02 -07:00
Ahmed Ibrahim
e91c3d6d1c Support changing reasoning effort (#2435)
https://github.com/user-attachments/assets/50198ee8-5915-47a3-bb71-69af65add1ef

Building up on #2431 #2428
2025-08-19 17:55:07 +00:00
Dylan
e7e5fe91c8 [tui] Support /mcp command (#2430)
## Summary
Adds a `/mcp` command to list active tools. We can extend this command
to allow configuration of MCP tools, but for now a simple list command
will help debug if your config.toml and your tools are working as
expected.
2025-08-19 09:00:31 -07:00
Ahmed Ibrahim
97f995a749 Show login options when not signed in with ChatGPT (#2440)
Motivation: we have users who use their API key although they want to
use their ChatGPT account. We want to give them the chance to always
log in with their account.

This PR displays login options when the user is not signed in with
ChatGPT. Even if you have set an OpenAI API key as an environment
variable, you will still be prompted to log in with ChatGPT.

We’ve also added a new flag, `always_use_api_key_signing` (false by
default). When enabled, it ensures you are never asked to log in with
ChatGPT and always defaults to using your API key.



https://github.com/user-attachments/assets/b61ebfa9-3c5e-4ab7-bf94-395c23a0e0af

After ChatGPT sign in:


https://github.com/user-attachments/assets/d58b366b-c46a-428f-a22f-2ac230f991c0
2025-08-19 03:22:48 +00:00
Ahmed Ibrahim
c283f9f6ce Add an operation to override current task context (#2431)
- Added an operation to override current task context
- Added a test to check that cache stays the same
2025-08-18 19:59:19 +00:00
Ahmed Ibrahim
c9963b52e9 consolidate reasoning enums into one (#2428)
We have three enums for each of reasoning summaries and reasoning effort
with the same values. They can be consolidated into one.
2025-08-18 11:50:17 -07:00
Michael Bolin
712bfa04ac chore: move mcp-server/src/wire_format.rs to protocol/src/mcp_protocol.rs (#2423)
The existing `wire_format.rs` should share more types with the
`codex-protocol` crate (like `AskForApproval` instead of maintaining a
parallel `CodexToolCallApprovalPolicy` enum), so this PR moves
`wire_format.rs` into `codex-protocol`, renaming it as
`mcp_protocol.rs`. We also de-dupe types, where appropriate.

---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/2423).
* #2424
* __->__ #2423
2025-08-18 09:36:57 -07:00
Michael Bolin
b581498882 fix: introduce EventMsg::TurnAborted (#2365)
Introduces `EventMsg::TurnAborted` that should be sent in response to
`Op::Interrupt`.

In the MCP server, updates the handling of a
`ClientRequest::InterruptConversation` request such that it sends the
`Op::Interrupt` but does not respond to the request until it sees an
`EventMsg::TurnAborted`.
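
The request/acknowledge flow can be illustrated with plain channels. This is only a sketch of the pattern: the real server is async, the enums below are cut down to the variants needed here (`AgentMessage` is a hypothetical placeholder for "any other event"), and `interrupt_and_wait` is not a function from this PR.

```rust
use std::sync::mpsc::{Receiver, Sender};

#[derive(Debug, PartialEq)]
pub enum Op {
    Interrupt,
}

#[derive(Debug, PartialEq)]
pub enum EventMsg {
    AgentMessage(String),
    TurnAborted,
}

/// Send `Op::Interrupt`, then block until `EventMsg::TurnAborted` arrives.
/// Events seen while waiting are returned so the caller can still forward them.
pub fn interrupt_and_wait(
    ops: &Sender<Op>,
    events: &Receiver<EventMsg>,
) -> Result<Vec<EventMsg>, String> {
    ops.send(Op::Interrupt).map_err(|e| e.to_string())?;
    let mut seen_while_waiting = Vec::new();
    loop {
        match events.recv() {
            Ok(EventMsg::TurnAborted) => return Ok(seen_while_waiting),
            Ok(other) => seen_while_waiting.push(other),
            // The sending side hung up before confirming the abort.
            Err(e) => return Err(e.to_string()),
        }
    }
}
```

Only once this returns `Ok` would the server reply to the `ClientRequest::InterruptConversation` request.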
2025-08-17 21:40:31 -07:00
Kazuhiro Sera
dcfdd2faf5 Fix #2296 Add "minimal" reasoning effort for GPT 5 models (#2326)
This pull request resolves #2296. I've confirmed that it works as follows:

1. Add settings to ~/.codex/config.toml:
```toml
model_reasoning_effort = "minimal"
```

2. Run the CLI:
```
cd codex-rs
cargo build && RUST_LOG=trace cargo run --bin codex
/status
tail -f ~/.codex/log/codex-tui.log
```

Co-authored-by: pakrym-oai <pakrym@openai.com>
2025-08-15 12:59:52 -07:00
Michael Bolin
d262244725 fix: introduce codex-protocol crate (#2355) 2025-08-15 12:44:40 -07:00
Michael Bolin
17aa394ae7 feat: introduce Op:UserTurn (#2329)
This introduces `Op::UserTurn`, which makes it possible to override many
of the fields that were set when the `Session` was originally created
when creating a new conversation turn. This is one way we could support
changing things like `model` or `cwd` in the middle of the conversation,
though we may want to consider making each field optional, or
alternatively having a separate `Op` that mutates the `TurnContext`
associated with a `submission_loop()`.

---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/2329).
* #2345
* __->__ #2329
* #2343
* #2340
* #2338
2025-08-15 09:56:05 -07:00
Michael Bolin
13ed67cfc1 feat: introduce TurnContext (#2343)
This PR introduces `TurnContext`, which is designed to hold a set of
fields that should be constant for a turn of a conversation. Note that
the fields of `TurnContext` were previously governed by `Session`.

Ultimately, we want to enable users to change these values between turns
(changing model, approval policy, etc.), though in the current
implementation, the `TurnContext` is constant for the entire
conversation.

---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/2345).
* #2345
* #2329
* __->__ #2343
* #2340
* #2338
2025-08-15 09:40:02 -07:00
Michael Bolin
6730592433 fix: introduce MutexExt::lock_unchecked() so we stop ignoring unwrap() throughout codex.rs (#2340)
This way we are sure a dangerous `unwrap()` does not sneak in!
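
For readers unfamiliar with the pattern, an extension trait along these lines would concentrate the `unwrap()` in a single audited place. This is a guess at the shape based on the commit title; the actual trait in `codex.rs` may differ.

```rust
use std::sync::{Mutex, MutexGuard};

/// Extension trait so call sites do not need their own `unwrap()`.
pub trait MutexExt<T> {
    fn lock_unchecked(&self) -> MutexGuard<'_, T>;
}

impl<T> MutexExt<T> for Mutex<T> {
    // The one place where the lint is silenced. Panics if the mutex is
    // poisoned, exactly like a bare `lock().unwrap()`.
    #[allow(clippy::unwrap_used)]
    fn lock_unchecked(&self) -> MutexGuard<'_, T> {
        self.lock().unwrap()
    }
}
```

With the lint left enabled everywhere else, any new `unwrap()` in the file is flagged instead of blending in with the lock calls.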

---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/2340).
* #2345
* #2329
* #2343
* __->__ #2340
* #2338
2025-08-15 09:14:44 -07:00
Michael Bolin
26c8373821 fix: tighten up checks against writable folders for SandboxPolicy (#2338)
I was looking at the implementation of `Session::get_writable_roots()`,
which did not seem right, as it was a copy of writable roots, which is
not guaranteed to be in sync with the `sandbox_policy` field.

I looked at who was calling `get_writable_roots()` and its only call
site was `apply_patch()` in `codex-rs/core/src/apply_patch.rs`, which
took the roots and forwarded them to `assess_patch_safety()` in
`safety.rs`. I updated `assess_patch_safety()` to take `sandbox_policy:
&SandboxPolicy` instead of `writable_roots: &[PathBuf]` (and replaced
`Session::get_writable_roots()` with `Session::get_sandbox_policy()`).

Within `safety.rs`, it was fairly easy to update
`is_write_patch_constrained_to_writable_paths()` to work with
`SandboxPolicy`, and in particular, it is far more accurate because, for
better or worse, `SandboxPolicy::get_writable_roots_with_cwd()` _returns
an empty vec_ for `SandboxPolicy::DangerFullAccess`, suggesting that
_nothing_ is writable when in reality _everything_ is writable. With
this PR, `is_write_patch_constrained_to_writable_paths()` now does the
right thing for each variant of `SandboxPolicy`.
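
A simplified sketch of a per-variant check is given below. The enum is a cut-down, hypothetical version of `SandboxPolicy` (the `ReadOnly` variant and the field layout are assumptions), the function name is shortened, and paths are compared lexically without resolving `..` or symlinks.

```rust
use std::path::{Path, PathBuf};

/// Cut-down stand-in for the real `SandboxPolicy`.
pub enum SandboxPolicy {
    DangerFullAccess,
    ReadOnly,
    WorkspaceWrite { writable_roots: Vec<PathBuf> },
}

/// True if every target path may be written under `policy`.
pub fn is_write_constrained_to_writable_paths(
    policy: &SandboxPolicy,
    cwd: &Path,
    targets: &[PathBuf],
) -> bool {
    match policy {
        // Everything is writable, even though no roots are listed.
        SandboxPolicy::DangerFullAccess => true,
        // Nothing is writable.
        SandboxPolicy::ReadOnly => false,
        // Assumption: cwd plus the explicitly listed roots are writable.
        SandboxPolicy::WorkspaceWrite { writable_roots } => targets.iter().all(|t| {
            let abs = if t.is_absolute() { t.clone() } else { cwd.join(t) };
            // Caveat: lexical prefix check only; `..` is not normalized here.
            abs.starts_with(cwd) || writable_roots.iter().any(|r| abs.starts_with(r))
        }),
    }
}
```

Matching on the policy directly avoids the trap described above, where an empty list of roots for `DangerFullAccess` reads as "nothing is writable".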

I thought this would be the end of the story, but it turned out that
`test_writable_roots_constraint()` in `safety.rs` needed to be updated,
as well. In particular, the test was writing to
`std::env::current_dir()` instead of a `TempDir`, which I suspect was a
holdover from earlier when `SandboxPolicy::WorkspaceWrite` would always
make `TMPDIR` writable on macOS, which made it hard to write tests to
verify `SandboxPolicy` in `TMPDIR`. Fortunately, we now have
`exclude_tmpdir_env_var` as an option on
`SandboxPolicy::WorkspaceWrite`, so I was able to update the test to
preserve the existing behavior, but to no longer write to
`std::env::current_dir()`.

---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/2338).
* #2345
* #2329
* #2343
* #2340
* __->__ #2338
2025-08-15 09:06:15 -07:00
Dylan
6df8e35314 [tools] Add apply_patch tool (#2303)
## Summary
We've been seeing a number of issues and reports with our synthetic
`apply_patch` tool, e.g. #802. Let's make this a real tool - in my
anecdotal testing, it's critical for GPT-OSS models, but I'd like to
make it the standard across GPT-5 and codex models as well.

## Testing
- [x] Tested locally
- [x] Integration test
2025-08-15 11:55:53 -04:00
Parker Thompson
a075424437 Added allow-expect-in-tests / allow-unwrap-in-tests (#2328)
This PR:
* Added the clippy.toml to configure allowable expect / unwrap usage in
tests
* Removed as many expect/allow lines as possible from tests
* Moved a bunch of allows to expects where possible

Note: in integration tests, non-`#[test]` helper functions are not
covered by this, so we had to leave a few lingering `expect(expect_used)`
checks around.
2025-08-14 17:59:01 -07:00
Michael Bolin
2ecca79663 fix: run python_multiprocessing_lock_works integration test on Mac and Linux (#2318)
The high-order bit on this PR is that it makes it so `sandbox.rs` tests
both Mac and Linux, as we introduce a general
`spawn_command_under_sandbox()` function with platform-specific
implementations for testing.

An important, and interesting, discovery in porting the test to Linux is
that (for reasons cited in the code comments), `/dev/shm` has to be
added to `writable_roots` on Linux in order for `multiprocessing.Lock`
to work there. Granting write access to `/dev/shm` comes with some
degree of risk, so we do not make this the default for Codex CLI.

Piggybacking on top of #2317, this moves the
`python_multiprocessing_lock_works` test yet again, moving
`codex-rs/core/tests/sandbox.rs` to `codex-rs/exec/tests/sandbox.rs`
because in `codex-rs/exec/tests` we can use `cargo_bin()` like so:

```
let codex_linux_sandbox_exe = assert_cmd::cargo::cargo_bin("codex-exec");
```

which is necessary so we can use `codex_linux_sandbox_exe` and therefore
`spawn_command_under_linux_sandbox` in an integration test.

This also moves `spawn_command_under_linux_sandbox()` out of `exec.rs`
and into `landlock.rs`, which makes things more consistent with
`seatbelt.rs` in `codex-core`.

For reference, https://github.com/openai/codex/pull/1808 is the PR that
made the change to Seatbelt to get this test to pass on Mac.
2025-08-14 15:47:48 -07:00
Michael Bolin
a8c7f5391c fix: move general sandbox tests to codex-rs/core/tests/sandbox.rs (#2317)
Prior to this PR, `codex-rs/core/tests/sandbox.rs` contained
integration tests that were specific to Seatbelt. This PR moves those
tests to `codex-rs/core/src/seatbelt.rs` and designates
`codex-rs/core/tests/sandbox.rs` to be used as the home for
cross-platform (well, Mac and Linux...) sandbox tests.

To start, this migrates
`python_multiprocessing_lock_works_under_seatbelt()` from #1823 to the
new `sandbox.rs` because this is the type of thing that should work on
both Mac _and_ Linux, though I still need to do some work to clean up
the test so it works on both platforms.
2025-08-14 14:48:38 -07:00
David Z Hao
992e81d9b5 test(core): add seatbelt sem lock tests (#1823)
## Summary
- add a unit test to ensure the macOS seatbelt policy allows POSIX
semaphores
- add a macOS-only test that runs a Python multiprocessing Lock under
Seatbelt

## Testing
- `cargo test -p codex_core seatbelt_base_policy_allows_ipc_posix_sem
--no-fail-fast` (failed: failed to download from
`https://static.crates.io/crates/tokio-stream/0.1.17/download`)
- `cargo test -p codex_core seatbelt_base_policy_allows_ipc_posix_sem
--no-fail-fast --offline` (failed: attempting to make an HTTP request,
but --offline was specified)
- `cargo test --all-features --no-fail-fast --offline` (failed:
attempting to make an HTTP request, but --offline was specified)
- `just fmt` (failed: command not found: just)
- `just fix` (failed: command not found: just)

Ran the tests locally to confirm they pass on master and failed before
my previous change.

------
https://chatgpt.com/codex/tasks/task_i_6890f221e0a4833381cfb53e11499bcc
2025-08-14 14:23:06 -07:00
Jeremy Rose
7038827bf4 fix bash commands being incorrectly quoted in display (#2313)
The "display format" of commands was sometimes producing incorrect
quoting like `echo foo '>' bar`, which is importantly different from the
actual command that was being run. This refactors ParsedCommand to have
a string in `cmd` instead of a vec, as a `vec` can't accurately capture
a full command.
2025-08-14 17:08:29 -04:00
Michael Bolin
8f11652458 fix: parallelize logic in Session::new() (#2305)
#2291 made it so that `Session::new()` is on the critical path to
`Codex::spawn()`, which means it is on the hot path to CLI startup. This
refactors `Session::new()` to run a number of async tasks in parallel
that were previously run serially to try to reduce latency.
2025-08-14 13:29:58 -07:00
Dylan
544980c008 [context] Store context messages in rollouts (#2243)
## Summary
Currently, we use request-time logic to determine the user_instructions
and environment_context messages. This means that neither of these
values can change over time as conversations go on. We want to add
additional details here, so we're migrating to saving these messages
in the rollout file instead. This is simpler for the client, and allows
us to append additional environment_context messages to each turn if we
want.

## Testing
- [x] Integration test coverage
- [x] Tested locally with a few turns, confirmed model could reference
environment context and cached token metrics were reasonably high
2025-08-14 14:51:13 -04:00
Michael Bolin
cf7a7e63a3 exploration: create Session as part of Codex::spawn() (#2291)
Historically, `Codex::spawn()` would create the instance of `Codex` and
enforce, by construction, that `Op::ConfigureSession` was the first `Op`
submitted via `submit()`. Then over in `submission_loop()`, it would
handle the case for taking the parameters of `Op::ConfigureSession` and
turning it into a `Session`.

This approach has two challenges from a state management perspective:


f968a1327a/codex-rs/core/src/codex.rs (L718)

- The local `sess` variable in `submission_loop()` has to be `mut` and
`Option<Arc<Session>>` because it is not invariant that a `Session` is
present for the lifetime of the loop, so there is a lot of logic to deal
with the case where `sess` is `None` (e.g., the `send_no_session_event`
function and all of its callsites).
- `submission_loop()` is written in such a way that
`Op::ConfigureSession` could be observed multiple times, but in
practice, it is only observed exactly once at the start of the loop.

In this PR, we try to simplify the state management by _removing_ the
`Op::ConfigureSession` enum variant and constructing the `Session` as
part of `Codex::spawn()` so that it can be passed to `submission_loop()`
as `Arc<Session>`. The original logic from the `Op::ConfigureSession`
has largely been moved to the new `Session::new()` constructor.

---

Incidentally, I also noticed that the handling of `Op::ConfigureSession`
can result in events being dispatched in addition to
`EventMsg::SessionConfigured`, as an `EventMsg::Error` is created for
every MCP initialization error, so it is important to preserve that
behavior:


f968a1327a/codex-rs/core/src/codex.rs (L901-L916)

Though admittedly, I believe this does not play nice with #2264, as
these error messages will likely be dispatched before the client has a
chance to call `addConversationListener`, so we likely need to make it
so `newConversation` automatically creates the subscription, but we must
also guarantee that the "ack" from `newConversation` is returned before
any other conversation-related notifications are sent so the client
knows what `conversation_id` to match on.
2025-08-14 09:55:28 -07:00
Michael Bolin
085f166707 fix: make all fields of Session private (#2285)
As `Session` needs a bit of work, it will make things easier to move
around if we can start by reducing the extent of its public API. This
makes all the fields private, though adds three `pub(crate)` getters.

---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/2285).
* #2287
* #2286
* __->__ #2285
2025-08-13 22:53:54 -07:00
pakrym-oai
f1be7978cf Parse reasoning text content (#2277)
Sometimes CoT is returned as text content instead of `ReasoningText`. We
should parse it but not serialize it back on requests.

---------

Co-authored-by: Ahmed Ibrahim <aibrahim@openai.com>
2025-08-13 18:39:58 -07:00
pakrym-oai
de2c6a2ce7 Enable reasoning for codex-prefixed models (#2275)
## Summary
- enable reasoning for any model slug starting with `codex-`
- provide default model info for `codex-*` slugs
- test that codex models are detected and support reasoning

## Testing
- `just fmt`
- `just fix` *(fails: E0658 `let` expressions in this position are
unstable)*
- `cargo test --all-features` *(fails: E0658 `let` expressions in this
position are unstable)*

------
https://chatgpt.com/codex/tasks/task_i_689d13f8705483208a6ed21c076868e1
2025-08-13 17:02:50 -07:00