Commit Graph

74 Commits

Author SHA1 Message Date
Thibault Sottiaux
d7286e9829 chore: remove model upgrade popup (#4332) 2025-09-27 13:25:09 -07:00
jif-oai
1fc3413a46 ref: state - 2 (#4229)
Extracting tasks in a module and start abstraction behind a Trait (more
to come on this but each task will be tackled in a dedicated PR)
The goal was to drop the ActiveTask and to have a (potentially) set of
tasks during each turn
2025-09-26 13:49:08 +00:00
jif-oai
250b244ab4 ref: full state refactor (#4174)
## Current State Observations
- `Session` currently holds many unrelated responsibilities (history,
approval queues, task handles, rollout recorder, shell discovery, token
tracking, etc.), making it hard to reason about ownership and lifetimes.
- The anonymous `State` struct inside `codex.rs` mixes session-long data
with turn-scoped queues and approval bookkeeping.
- Turn execution (`run_task`) relies on ad-hoc local variables that
should conceptually belong to a per-turn state object.
- External modules (`codex::compact`, tests) frequently poke the raw
`Session.state` mutex, which couples them to implementation details.
- Interrupts, approvals, and rollout persistence all have bespoke
cleanup paths, contributing to subtle bugs when a turn is aborted
mid-flight.

## Desired End State
- Keep a slim `Session` object that acts as the orchestrator and façade.
It should expose a focused API (submit, approvals, interrupts, event
emission) without storing unrelated fields directly.
- Introduce a `state` module that encapsulates all mutable data
structures:
- `SessionState`: session-persistent data (history, approved commands,
token/rate-limit info, maybe user preferences).
- `ActiveTurn`: metadata for the currently running turn (sub-id, task
kind, abort handle) and an `Arc<TurnState>`.
- `TurnState`: all turn-scoped pieces (pending inputs, approval waiters,
diff tracker, review history, auto-compact flags, last agent message,
outstanding tool call bookkeeping).
- Group long-lived helpers/managers into a dedicated `SessionServices`
struct so `Session` does not accumulate "random" fields.
- Provide clear, lock-safe APIs so other modules never touch raw
mutexes.
- Ensure every turn creates/drops a `TurnState` and that
interrupts/finishes delegate cleanup to it.
2025-09-25 12:16:06 +02:00
iceweasel-oai
0e58870634 adds a windows-specific method to check if a command is safe (#4119)
refactors command_safety files into its own package, so we can add
platform-specific ones
Also creates a windows-specific of `is_known_safe_command` that just
returns false always, since that is what happens today.
2025-09-24 14:03:43 -07:00
pakrym-oai
addc946d13 Simplify tool implemetations (#4160)
Use Result<String, FunctionCallError> for all tool handling code and
rely on error propagation instead of creating failed items everywhere.
2025-09-24 17:27:35 +00:00
Jeremy Rose
b34e906396 Reland "refactor transcript view to handle HistoryCells" (#3753)
Reland of #3538
2025-09-18 20:55:53 +00:00
dedrisian-oai
72733e34c4 Add dev message upon review out (#3758)
Proposal: We want to record a dev message like so:

```
{
      "type": "message",
      "role": "user",
      "content": [
        {
          "type": "input_text",
          "text": "<user_action>
  <context>User initiated a review task. Here's the full review output from reviewer model. User may select one or more comments to resolve.</context>
  <action>review</action>
  <results>
  {findings_str}
  </results>
</user_action>"
        }
      ]
    },
```

Without showing in the chat transcript.

Rough idea, but it fixes issue where the user finishes a review thread,
and asks the parent "fix the rest of the review issues" thinking that
the parent knows about it.

### Question: Why not a tool call?

Because the agent didn't make the call, it was a human. + we haven't
implemented sub-agents yet, and we'll need to think about the way we
represent these human-led tool calls for the agent.
2025-09-16 18:43:32 -07:00
dedrisian-oai
7fe4021f95 Review mode core updates (#3701)
1. Adds the environment prompt (including cwd) to review thread
2. Prepends the review prompt as a user message (temporary fix so the
instructions are not replaced on backend)
3. Sets reasoning to low
4. Sets default review model to `gpt-5-codex`
2025-09-16 13:36:51 -07:00
Ahmed Ibrahim
a30e5e40ee enable-resume (#3537)
Adding the ability to resume conversations.
we have one verb `resume`. 

Behavior:

`tui`:
`codex resume`: opens session picker
`codex resume --last`: continue last message
`codex resume <session id>`: continue conversation with `session id`

`exec`:
`codex resume --last`: continue last conversation
`codex resume <session id>`: continue conversation with `session id`

Implementation:
- I added a function to find the path in `~/.codex/sessions/` with a
`UUID`. This is helpful in resuming with session id.
- Added the above mentioned flags
- Added lots of testing
2025-09-14 19:33:19 -04:00
jif-oai
8453915e02 feat: TUI onboarding (#3398)
Example of how onboarding could look like
2025-09-11 15:04:29 -07:00
jif-oai
c09ed74a16 Unified execution (#3288)
## Unified PTY-Based Exec Tool

Note: this requires to have this flag in the config:
`use_experimental_unified_exec_tool=true`

- Adds a PTY-backed interactive exec feature (“unified_exec”) with
session reuse via
  session_id, bounded output (128 KiB), and timeout clamping (≤ 60 s).
- Protocol: introduces ResponseItem::UnifiedExec { session_id,
arguments, timeout_ms }.
- Tools: exposes unified_exec as a function tool (Responses API);
excluded from Chat
  Completions payload while still supported in tool lists.
- Path handling: resolves commands via PATH (or explicit paths), with
UTF‑8/newline‑aware
  truncation (truncate_middle).
- Tests: cover command parsing, path resolution, session
persistence/cleanup, multi‑session
  isolation, timeouts, and truncation behavior.
2025-09-10 17:38:11 -07:00
dedrisian-oai
87654ec0b7 Persist model & reasoning changes (#2799)
Persists `/model` changes across both general and profile-specific
sessions.
2025-09-10 20:53:46 +00:00
Ahmed Ibrahim
45bd5ca4b9 Move initial history to protocol (#3422)
To fix an edge case of forking then resuming

#3419
2025-09-10 10:17:24 -07:00
Wang
ac8a3155d6 feat(core): re-export InitialHistory from conversation_manager (#3270)
This commit adds a re-export for InitialHistory from the internal
conversation_manager module in codex-core's lib.rs.

The `RolloutRecorder::get_rollout_history` method (exposed via `pub use
rollout::RolloutRecorder;`, already present in lib.rs) returns an
`InitialHistory` type, which is defined in the private
conversation_manager module. Without this re-export, consumers of the
public RolloutRecorder API would not be able to directly use the return
type, as they cannot access the private module. This would result in an
inconvenient experience where the method's return value cannot be
handled without additional, non-obvious imports.

By adding `pub use conversation_manager::InitialHistory;`, we make
InitialHistory available as `codex_core::InitialHistory`, improving API
ergonomics for users of the rollout functionality while keeping the
conversation_manager module internal.

No functional changes are made; this is a pure re-export for better
usability.

Signed-off-by: M4n5ter <m4n5terrr@gmail.com>
2025-09-09 10:37:08 -07:00
Michael Bolin
ace14e8d36 feat: add ArchiveConversation to ClientRequest (#3353)
Adds support for `ArchiveConversation` in the JSON-RPC server that takes
a `(ConversationId, PathBuf)` pair and:

- verifies the `ConversationId` corresponds to the rollout id at the
`PathBuf`
- if so, invokes
`ConversationManager.remove_conversation(ConversationId)`
- if the `CodexConversation` was in memory, send `Shutdown` and wait for
`ShutdownComplete` with a timeout
- moves the `.jsonl` file to `$CODEX_HOME/archived_sessions`

---------

Co-authored-by: Gabriel Peal <gabriel@openai.com>
2025-09-09 11:39:00 -04:00
Gabriel Peal
5eaaf307e1 Generate more typescript types and return conversation id with ConversationSummary (#3219)
This PR does multiple things that are necessary for conversation resume
to work from the extension. I wanted to make sure everything worked so
these changes wound up in one PR:
1. Generate more ts types
2. Resume rollout history files rather than create a new one every time
it is resumed so you don't see a duplicate conversation in history for
every resume. Chatted with @aibrahim-oai to verify this
3. Return conversation_id in conversation summaries
4. [Cleanup] Use serde and strong types for a lot of the rollout file
parsing
2025-09-08 17:54:47 -04:00
Jeremy Rose
d6182becbe syntax-highlight bash lines (#3142)
i'm not yet convinced i have the best heuristics for what to highlight,
but this feels like a useful step towards something a bit easier to
read, esp. when the model is producing large commands.

<img width="669" height="589" alt="Screenshot 2025-09-03 at 8 21 56 PM"
src="https://github.com/user-attachments/assets/b9cbcc43-80e8-4d41-93c8-daa74b84b331"
/>

also a fairly significant refactor of our line wrapping logic.
2025-09-05 14:10:32 +00:00
Ahmed Ibrahim
234c0a0469 TUI: Add session resume picker (--resume) and quick resume (--continue) (#3135)
Adds a TUI resume flow with an interactive picker and quick resume.

- CLI: 
  - --resume / -r: open picker to resume a prior session
  - --continue   / -l: resume the most recent session (no picker)
- Behavior on resume: initial history is replayed, welcome banner
hidden, and the first redraw is suppressed to avoid flicker.
- Implementation:
- New tui/src/resume_picker.rs (paginated listing via
RolloutRecorder::list_conversations)
  - App::run accepts ResumeSelection; resumes from disk when requested
- ChatWidget refactor with ChatWidgetInit and new_from_existing; replays
initial messages
- Tests: cover picker sorting/preview extraction and resumed-history
rendering.
- Docs: getting-started updated with flags and picker usage.



https://github.com/user-attachments/assets/1bb6469b-e5d1-42f6-bec6-b1ae6debda3b
2025-09-04 06:20:40 +00:00
Ahmed Ibrahim
2b96f9f569 Dividing UserMsgs into categories to send it back to the tui (#3127)
This PR does the following:

- divides user msgs into 3 categories: plain, user instructions, and
environment context
- Centralizes adding user instructions and environment context to a
degree
- Improve the integration testing

Building on top of #3123

Specifically this
[comment](https://github.com/openai/codex/pull/3123#discussion_r2319885089).
We need to send the user message while ignoring the User Instructions
and Environment Context we attach.
2025-09-04 05:34:50 +00:00
Ahmed Ibrahim
f2036572b6 Replay EventMsgs from Response Items when resuming a session with history. (#3123)
### Overview

This PR introduces the following changes:
	1.	Adds a unified mechanism to convert ResponseItem into EventMsg.
2. Ensures that when a session is initialized with initial history, a
vector of EventMsg is sent along with the session configuration. This
allows clients to re-render the UI accordingly.
	3. 	Added integration testing

### Caveats

This implementation does not send every EventMsg that was previously
dispatched to clients. The excluded events fall into two categories:
	•	“Arguably” rolled-out events
Examples include tool calls and apply-patch calls. While these events
are conceptually rolled out, we currently only roll out ResponseItems.
These events are already being handled elsewhere and transformed into
EventMsg before being sent.
	•	Non-rolled-out events
Certain events such as TurnDiff, Error, and TokenCount are not rolled
out at all.

### Future Directions

At present, resuming a session involves maintaining two states:
	•	UI State
Clients can replay most of the important UI from the provided EventMsg
history.
	•	Model State
The model receives the complete session history to reconstruct its
internal state.

This design provides a solid foundation. If, in the future, more precise
UI reconstruction is needed, we have two potential paths:
1. Introduce a third data structure that allows us to derive both
ResponseItems and EventMsgs.
2. Clearly divide responsibilities: the core system ensures the
integrity of the model state, while clients are responsible for
reconstructing the UI.
2025-09-04 04:47:00 +00:00
pakrym-oai
c636f821ae Add a common way to create HTTP client (#3110)
Ensure User-Agent and originator are always sent.
2025-09-03 10:11:02 -07:00
Ahmed Ibrahim
d77b33ded7 core(rollout): extract rollout module, add listing API, and return file heads (#1634)
- Move rollout persistence and listing into a dedicated module:
rollout/{recorder,list}.
- Expose lightweight conversation listing that returns file paths plus
the first 5 JSONL records for preview.
2025-09-03 07:39:19 +00:00
Dominik Kundel
b127a3643f Improve gpt-oss compatibility (#2461)
The gpt-oss models require reasoning with subsequent Chat Completions
requests because otherwise the model forgets why the tools were called.
This change fixes that and also adds some additional missing
documentation around how to handle context windows in Ollama and how to
show the CoT if you desire to.
2025-09-02 19:49:03 -07:00
pakrym-oai
03e2796ca4 Move CodexAuth and AuthManager to the core crate (#3074)
Fix a long standing layering issue.
2025-09-02 18:36:19 -07:00
dedrisian-oai
b8e8454b3f Custom /prompts (#2696)
Adds custom `/prompts` to `~/.codex/prompts/<command>.md`.

<img width="239" height="107" alt="Screenshot 2025-08-25 at 6 22 42 PM"
src="https://github.com/user-attachments/assets/fe6ebbaa-1bf6-49d3-95f9-fdc53b752679"
/>

---

Details:

1. Adds `Op::ListCustomPrompts` to core.
2. Returns `ListCustomPromptsResponse` with list of `CustomPrompt`
(name, content).
3. TUI calls the operation on load, and populates the custom prompts
(excluding prompts that collide with builtins).
4. Selecting the custom prompt automatically sends the prompt to the
agent.
2025-08-29 02:16:39 +00:00
Michael Bolin
e3b03eaccb feat: StreamableShell with exec_command and write_stdin tools (#2574) 2025-08-22 18:10:55 -07:00
Ahmed Ibrahim
097782c775 Move models.rs to protocol (#2595)
Moving models.rs to protocol so we can use them in `Codex` operations
2025-08-22 22:18:54 +00:00
Dylan
236c4f76a6 [apply_patch] freeform apply_patch tool (#2576)
## Summary
GPT-5 introduced the concept of [custom
tools](https://platform.openai.com/docs/guides/function-calling#custom-tools),
which allow the model to send a raw string result back, simplifying
json-escape issues. We are migrating gpt-5 to use this by default.

However, gpt-oss models do not support custom tools, only normal
functions. So we keep both tool definitions, and provide whichever one
the model family supports.

## Testing
- [x] Tested locally with various models
- [x] Unit tests pass
2025-08-22 13:42:34 -07:00
Jeremy Rose
db934e438e read all AGENTS.md up to git root (#2532)
This updates our logic for AGENTS.md to match documented behavior, which
is to read all AGENTS.md files from cwd up to git root.
2025-08-21 08:52:17 -07:00
Jeremy Rose
0ad4e11c84 detect terminal and include in request headers (#2437)
This adds the terminal version to the UA header.
2025-08-20 16:54:26 +00:00
Michael Bolin
d262244725 fix: introduce codex-protocol crate (#2355) 2025-08-15 12:44:40 -07:00
Michael Bolin
2ecca79663 fix: run python_multiprocessing_lock_works integration test on Mac and Linux (#2318)
The high-order bit on this PR is that it makes it so `sandbox.rs` tests
both Mac and Linux, as we introduce a general
`spawn_command_under_sandbox()` function with platform-specific
implementations for testing.

An important, and interesting, discovery in porting the test to Linux is
that (for reasons cited in the code comments), `/dev/shm` has to be
added to `writable_roots` on Linux in order for `multiprocessing.Lock`
to work there. Granting write access to `/dev/shm` comes with some
degree of risk, so we do not make this the default for Codex CLI.

Piggybacking on top of #2317, this moves the
`python_multiprocessing_lock_works` test yet again, moving
`codex-rs/core/tests/sandbox.rs` to `codex-rs/exec/tests/sandbox.rs`
because in `codex-rs/exec/tests` we can use `cargo_bin()` like so:

```
let codex_linux_sandbox_exe = assert_cmd::cargo::cargo_bin("codex-exec");
```

which is necessary so we can use `codex_linux_sandbox_exe` and therefore
`spawn_command_under_linux_sandbox` in an integration test.

This also moves `spawn_command_under_linux_sandbox()` out of `exec.rs`
and into `landlock.rs`, which makes things more consistent with
`seatbelt.rs` in `codex-core`.

For reference, https://github.com/openai/codex/pull/1808 is the PR that
made the change to Seatbelt to get this test to pass on Mac.
2025-08-14 15:47:48 -07:00
Dylan
544980c008 [context] Store context messages in rollouts (#2243)
## Summary
Currently, we use request-time logic to determine the user_instructions
and environment_context messages. This means that neither of these
values can change over time as conversations go on. We want to add in
additional details here, so we're migrating these to save these messages
to the rollout file instead. This is simpler for the client, and allows
us to append additional environment_context messages to each turn if we
want

## Testing
- [x] Integration test coverage
- [x] Tested locally with a few turns, confirmed model could reference
environment context and cached token metrics were reasonably high
2025-08-14 14:51:13 -04:00
Michael Bolin
08ed618f72 chore: introduce ConversationManager as a clearinghouse for all conversations (#2240)
This PR does two things because after I got deep into the first one I
started pulling on the thread to the second:

- Makes `ConversationManager` the place where all in-memory
conversations are created and stored. Previously, `MessageProcessor` in
the `codex-mcp-server` crate was doing this via its `session_map`, but
this is something that should be done in `codex-core`.
- It unwinds the `ctrl_c: tokio::sync::Notify` that was threaded
throughout our code. I think this made sense at one time, but now that
we handle Ctrl-C within the TUI and have a proper `Op::Interrupt` event,
I don't think this was quite right, so I removed it. For `codex exec`
and `codex proto`, we now use `tokio::signal::ctrl_c()` directly, but we
no longer make `Notify` a field of `Codex` or `CodexConversation`.

Changes of note:

- Adds the files `conversation_manager.rs` and `codex_conversation.rs`
to `codex-core`.
- `Codex` and `CodexSpawnOk` are no longer exported from `codex-core`:
other crates must use `CodexConversation` instead (which is created via
`ConversationManager`).
- `core/src/codex_wrapper.rs` has been deleted in favor of
`ConversationManager`.
- `ConversationManager::new_conversation()` returns `NewConversation`,
which is in line with the `new_conversation` tool we want to add to the
MCP server. Note `NewConversation` includes `SessionConfiguredEvent`, so
we eliminate checks in cases like `codex-rs/core/tests/client.rs` to
verify `SessionConfiguredEvent` is the first event because that is now
internal to `ConversationManager`.
- Quite a bit of code was deleted from
`codex-rs/mcp-server/src/message_processor.rs` since it no longer has to
manage multiple conversations itself: it goes through
`ConversationManager` instead.
- `core/tests/live_agent.rs` has been deleted because I had to update a
bunch of tests and all the tests in here were ignored, and I don't think
anyone ever ran them, so this was just technical debt, at this point.
- Removed `notify_on_sigint()` from `util.rs` (and in a follow-up, I
hope to refactor the blandly-named `util.rs` into more descriptive
files).
- In general, I started replacing local variables named `codex` as
`conversation`, where appropriate, though admittedly I didn't do it
through all the integration tests because that would have added a lot of
noise to this PR.




---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/2240).
* #2264
* #2263
* __->__ #2240
2025-08-13 13:38:18 -07:00
pakrym-oai
cb78f2333e Set user-agent (#2230)
Use the same well-defined value in all cases when sending user-agent
header
2025-08-12 16:40:04 +00:00
Gabriel Peal
7f6408720b [1/3] Parse exec commands and format them more nicely in the UI (#2095)
# Note for reviewers
The bulk of this PR is in in the new file, `parse_command.rs`. This file
is designed to be written TDD and implemented with Codex. Do not worry
about reviewing the code, just review the unit tests (if you want). If
any cases are missing, we'll add more tests and have Codex fix them.

I think the best approach will be to land and iterate. I have some
follow-ups I want to do after this lands. The next PR after this will
let us merge (and dedupe) multiple sequential cells of the same such as
multiple read commands. The deduping will also be important because the
model often reads the same file multiple times in a row in chunks

===

This PR formats common commands like reading, formatting, testing, etc
more nicely:

It tries to extract things like file names, tests and falls back to the
cmd if it doesn't. It also only shows stdout/err if the command failed.

<img width="770" height="238" alt="CleanShot 2025-08-09 at 16 05 15"
src="https://github.com/user-attachments/assets/0ead179a-8910-486b-aa3d-7d26264d751e"
/>
<img width="348" height="158" alt="CleanShot 2025-08-09 at 16 05 32"
src="https://github.com/user-attachments/assets/4302681b-5e87-4ff3-85b4-0252c6c485a9"
/>
<img width="834" height="324" alt="CleanShot 2025-08-09 at 16 05 56 2"
src="https://github.com/user-attachments/assets/09fb3517-7bd6-40f6-a126-4172106b700f"
/>

Part 2: https://github.com/openai/codex/pull/2097
Part 3: https://github.com/openai/codex/pull/2110
2025-08-11 14:26:15 -04:00
Dylan
aff97ed7dd [core] Separate tools config from openai client (#1858)
## Summary
In an effort to make tools easier to work with and more configurable,
I'm introducing `ToolConfig` and updating `Prompt` to take in a general
list of Tools. I think this is simpler and better for a few reasons:
- We can easily assemble tools from various sources (our own harness,
mcp servers, etc.) and we can consolidate the logic for constructing the
logic in one place that is separate from serialization.
- client.rs no longer needs arbitrary config values, it just takes in a
list of tools to serialize

A hefty portion of the PR is now updating our conversion of
`mcp_types::Tool` to `OpenAITool`, but considering that @bolinfest
accurately called this out as a TODO long ago, I think it's time we
tackled it.

## Testing
- [x] Experimented locally, no changes, as expected
- [x] Added additional unit tests
- [x] Responded to rust-review
2025-08-05 19:27:52 -07:00
Michael Bolin
d365cae077 fix: when using --oss, ensure correct configuration is threaded through correctly (#1859)
This PR started as an investigation with the goal of eliminating the use
of `unsafe { std::env::set_var() }` in `ollama/src/client.rs`, as
setting environment variables in a multithreaded context is indeed
unsafe and these tests were observed to be flaky, as a result.

Though as I dug deeper into the issue, I discovered that the logic for
instantiating `OllamaClient` under test scenarios was not quite right.
In this PR, I aimed to:

- share more code between the two creation codepaths,
`try_from_oss_provider()` and `try_from_provider_with_base_url()`
- use the values from `Config` when setting up Ollama, as we have
various mechanisms for overriding config values, so we should be sure
that we are always using the ultimate `Config` for things such as the
`ModelProviderInfo` associated with the `oss` id

Once this was in place,
`OllamaClient::try_from_provider_with_base_url()` could be used in unit
tests for `OllamaClient` so it was possible to create a properly
configured client without having to set environment variables.
2025-08-05 13:55:32 -07:00
easong-openai
9285350842 Introduce --oss flag to use gpt-oss models (#1848)
This adds support for easily running Codex backed by a local Ollama
instance running our new open source models. See
https://github.com/openai/gpt-oss for details.

If you pass in `--oss` you'll be prompted to install/launch ollama, and
it will automatically download the 20b model and attempt to use it.

We'll likely want to expand this with some options later to make the
experience smoother for users who can't run the 20b or want to run the
120b.

Co-authored-by: Michael Bolin <mbolin@openai.com>
2025-08-05 11:31:11 -07:00
Michael Bolin
136b3ee5bf chore: introduce ModelFamily abstraction (#1838)
To date, we have a number of hardcoded OpenAI model slug checks spread
throughout the codebase, which makes it hard to audit the various
special cases for each model. To mitigate this issue, this PR introduces
the idea of a `ModelFamily` that has fields to represent the existing
special cases, such as `supports_reasoning_summaries` and
`uses_local_shell_tool`.

There is a `find_family_for_model()` function that maps the raw model
slug to a `ModelFamily`. This function hardcodes all the knowledge about
the special attributes for each model. This PR then replaces the
hardcoded model name checks with checks against a `ModelFamily`.

Note `ModelFamily` is now available as `Config::model_family`. We should
ultimately remove `Config::model` in favor of
`Config::model_family::slug`.
2025-08-04 23:50:03 -07:00
Gabriel Peal
1f3318c1c5 Add a TurnDiffTracker to create a unified diff for an entire turn (#1770)
This lets us show an accumulating diff across all patches in a turn.
Refer to the docs for TurnDiffTracker for implementation details.

There are multiple ways this could have been done and this felt like the
right tradeoff between reliability and completeness:
*Pros*
* It will pick up all changes to files that the model touched including
if they prettier or another command that updates them.
* It will not pick up changes made by the user or other agents to files
it didn't modify.

*Cons*
* It will pick up changes that the user made to a file that the model
also touched
* It will not pick up changes to codegen or files that were not modified
with apply_patch
2025-08-04 11:57:04 -04:00
Dylan
e3565a3f43 [sandbox] Filter out certain non-sandbox errors (#1804)
## Summary
Users frequently complain about re-approving commands that have failed
for non-sandbox reasons. We can't diagnose with complete accuracy which
errors happened because of a sandbox failure, but we can start to
eliminate some common simple cases.

This PR captures the most common case I've seen, which is a `command not
found` error.

## Testing
- [x] Added unit tests
- [x] Ran a few cases locally
2025-08-03 13:05:48 -07:00
Michael Bolin
5a0ad5ab8f chore: refactor exec.rs: create separate seatbelt.rs and spawn.rs files (#1762)
At 550 lines, `exec.rs` was a bit large. In particular, I found it hard
to locate the Seatbelt-related code quickly without a file with
`seatbelt` in the name, so this refactors things so:

- `spawn_command_under_seatbelt()` and dependent code moves to a new
`seatbelt.rs` file
- `spawn_child_async()` and dependent code moves to a new `spawn.rs`
file
2025-07-31 13:11:47 -07:00
Michael Bolin
221ebfcccc fix: run apply_patch calls through the sandbox (#1705)
Building on the work of https://github.com/openai/codex/pull/1702, this
changes how a shell call to `apply_patch` is handled.

Previously, a shell call to `apply_patch` was always handled in-process,
never leveraging a sandbox. To determine whether the `apply_patch`
operation could be auto-approved, the
`is_write_patch_constrained_to_writable_paths()` function would check if
all the paths listed in the paths were writable. If so, the agent would
apply the changes listed in the patch.

Unfortunately, this approach afforded a loophole: symlinks!

* For a soft link, we could fix this issue by tracing the link and
checking whether the target is in the set of writable paths, however...
* ...For a hard link, things are not as simple. We can run `stat FILE`
to see if the number of links is greater than 1, but then we would have
to do something potentially expensive like `find . -inum <inode_number>`
to find the other paths for `FILE`. Further, even if this worked, this
approach runs the risk of a
[TOCTOU](https://en.wikipedia.org/wiki/Time-of-check_to_time-of-use)
race condition, so it is not robust.

The solution, implemented in this PR, is to take the virtual execution
of the `apply_patch` CLI into an _actual_ execution using `codex
--codex-run-as-apply-patch PATCH`, which we can run under the sandbox
the user specified, just like any other `shell` call.

This, of course, assumes that the sandbox prevents writing through
symlinks as a mechanism to write to folders that are not in the writable
set configured by the sandbox. I verified this by testing the following
on both Mac and Linux:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Can running a command in SANDBOX_DIR write a file in EXPLOIT_DIR?

# Codex is run in SANDBOX_DIR, so writes should be constrianed to this directory.
SANDBOX_DIR=$(mktemp -d -p "$HOME" sandboxtesttemp.XXXXXX)
# EXPLOIT_DIR is outside of SANDBOX_DIR, so let's see if we can write to it.
EXPLOIT_DIR=$(mktemp -d -p "$HOME" sandboxtesttemp.XXXXXX)

echo "SANDBOX_DIR: $SANDBOX_DIR"
echo "EXPLOIT_DIR: $EXPLOIT_DIR"

cleanup() {
  # Only remove if it looks sane and still exists
  [[ -n "${SANDBOX_DIR:-}" && -d "$SANDBOX_DIR" ]] && rm -rf -- "$SANDBOX_DIR"
  [[ -n "${EXPLOIT_DIR:-}" && -d "$EXPLOIT_DIR" ]] && rm -rf -- "$EXPLOIT_DIR"
}

trap cleanup EXIT

echo "I am the original content" > "${EXPLOIT_DIR}/original.txt"

# Drop the -s to test hard links.
ln -s "${EXPLOIT_DIR}/original.txt" "${SANDBOX_DIR}/link-to-original.txt"

cat "${SANDBOX_DIR}/link-to-original.txt"

if [[ "$(uname)" == "Linux" ]]; then
    SANDBOX_SUBCOMMAND=landlock
else
    SANDBOX_SUBCOMMAND=seatbelt
fi

# Attempt the exploit
cd "${SANDBOX_DIR}"

codex debug "${SANDBOX_SUBCOMMAND}" bash -lc "echo pwned > ./link-to-original.txt" || true

cat "${EXPLOIT_DIR}/original.txt"
```

Admittedly, this change merits a proper integration test, but I think I
will have to do that in a follow-up PR.
2025-07-30 16:45:08 -07:00
pakrym-oai
ea01a5ffe2 Add support for a separate chatgpt auth endpoint (#1712)
Adds a `CodexAuth` type that encapsulates information about available
auth modes and logic for refreshing the token.
Changes `Responses` API to send requests to different endpoints based on
the auth type.
Updates login_with_chatgpt to support API-less mode and skip the key
exchange.
2025-07-30 19:40:15 +00:00
Gabriel Peal
8828f6f082 Add an experimental plan tool (#1726)
This adds a tool the model can call to update a plan. The tool doesn't
actually _do_ anything but it gives clients a chance to read and render
the structured plan. We will likely iterate on the prompt and tools
exposed for planning over time.
2025-07-29 14:22:02 -04:00
Michael Bolin
5ebb7dd34c chore: split apply_patch logic out of codex.rs and into apply_patch.rs (#1703)
This is a straight refactor, moving apply-patch-related code from
`codex.rs` and into the new `apply_patch.rs` file. The only "logical"
change is inlining `#[allow(clippy::unwrap_used)]` instead of declaring
`#![allow(clippy::unwrap_used)]` at the top of the file (which is
currently the case in `codex.rs`).

---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/1703).
* #1705
* __->__ #1703
* #1702
* #1698
* #1697
2025-07-28 09:51:22 -07:00
Michael Bolin
2405c40026 chore: update Codex::spawn() to return a struct instead of a tuple (#1677)
Also update `init_codex()` to return a `struct` instead of a tuple, as well.
2025-07-27 20:01:35 -07:00
pakrym-oai
7ee87123a6 Optionally run using user profile (#1678) 2025-07-25 11:45:23 -07:00
Michael Bolin
a1641743a8 feat: expand the set of commands that can be safely identified as "trusted" (#1668)
This PR updates `is_known_safe_command()` to account for "safe
operators" to expand the set of commands that can be run without
approval. This concept existed in the TypeScript CLI, and we are
[finally!] porting it to the Rust one:


c9e2def494/codex-cli/src/approvals.ts (L531-L541)

The idea is that if we have `EXPR1 SAFE_OP EXPR2` and `EXPR1` and
`EXPR2` are considered safe independently, then `EXPR1 SAFE_OP EXPR2`
should be considered safe. Currently, `SAFE_OP` includes `&&`, `||`,
`;`, and `|`.

In the TypeScript implementation, we relied on
https://www.npmjs.com/package/shell-quote to parse the string of Bash,
as it could provide a "lightweight" parse tree, parsing `'beep || boop >
/byte'` as:

```
[ 'beep', { op: '||' }, 'boop', { op: '>' }, '/byte' ]
```

Though in this PR, we introduce the use of
https://crates.io/crates/tree-sitter-bash for parsing (which
incidentally we were already using in
[`codex-apply-patch`](c9e2def494/codex-rs/apply-patch/Cargo.toml (L18))),
which gives us a richer parse tree. (Incidentally, if you have never
played with tree-sitter, try the
[playground](https://tree-sitter.github.io/tree-sitter/7-playground.html)
and select **Bash** from the dropdown to see how it parses various
expressions.)

As a concrete example, prior to this change, our implementation of
`is_known_safe_command()` could verify things like:

```
["bash", "-lc", "grep -R \"Cargo.toml\" -n"]
```

but not:

```
["bash", "-lc", "grep -R \"Cargo.toml\" -n || true"]
```

With this change, the version with `|| true` is also accepted.

Admittedly, this PR does not expand the safety check to support
subshells, so it would reject, e.g. `bash -lc 'ls || (pwd && echo hi)'`,
but that can be addressed in a subsequent PR.
2025-07-24 14:13:30 -07:00