Commit Graph

691 Commits

Author SHA1 Message Date
Robby He
dc2f26f7b5 Fix is_api_message to correctly exclude reasoning messages (#6156)
## Problem

The `is_api_message` function in `conversation_history.rs` had a
misalignment between its documentation and implementation:

- **Comment stated**: "Anything that is not a system message or
'reasoning' message is considered an API message"
- **Code behavior**: Was returning `true` for `ResponseItem::Reasoning`,
meaning reasoning messages were incorrectly treated as API messages

This inconsistency could lead to reasoning messages being persisted in
conversation history when they should be filtered out.

## Root Cause

Investigation revealed that reasoning messages are explicitly excluded
throughout the codebase:

1. **Chat completions API** (lines 267-272 in `chat_completions.rs`)
omits reasoning from conversation history:
   ```rust
   ResponseItem::Reasoning { .. } | ResponseItem::Other => {
       // Omit these items from the conversation history.
       continue;
   }
   ```

2. **Existing tests** like `drops_reasoning_when_last_role_is_user` and
`ignores_reasoning_before_last_user` validate that reasoning should be
excluded from API payloads

## Solution

Fixed the `is_api_message` function to align with its documentation and
the rest of the codebase:

```rust
// Before: Reasoning was incorrectly returning true
ResponseItem::Reasoning { .. } | ResponseItem::WebSearchCall { .. } => true,

// After: Reasoning correctly returns false  
ResponseItem::WebSearchCall { .. } => true,
ResponseItem::Reasoning { .. } | ResponseItem::Other => false,
```

## Testing

- Enhanced existing test to verify reasoning messages are properly
filtered out
- All 264 core tests pass, including 8 chat completions tests that
validate reasoning behavior
- No regressions introduced

This ensures reasoning messages are consistently excluded from API
message processing across the entire codebase.
2025-11-03 20:55:41 -08:00
Eric Traut
1e0e553304 Fixed notify handler so it passes correct input_messages details (#6143)
This fixes bug #6121. 

The `input_messages` field passed to the notify handler is currently
empty because the logic is incorrectly including the OutputText rather
than InputText. I've fixed that and added proper filtering to remove
messages associated with AGENTS.md and other context injected by the
harness.

Testing: I wrote a notify handler and verified that the user prompt is
correctly passed through to the handler.
2025-11-03 14:23:04 -08:00
iceweasel-oai
07b7d28937 log sandbox commands to $CODEX_HOME instead of cwd (#6171)
Logging commands in the Windows Sandbox is temporary, but while we are
doing it, let's always write to CODEX_HOME instead of dirtying the cwd.
2025-11-03 13:12:33 -08:00
Ahmed Ibrahim
6ee7fbcfff feat: add the time after aborting (#5996)
Tell the model how much time passed after the user aborted the call.
2025-11-03 11:44:06 -08:00
iceweasel-oai
2eda75a8ee Do not skip trust prompt on Windows if sandbox is enabled. (#6167)
If the experimental windows sandbox is enabled, the trust prompt should
show on Windows.
2025-11-03 11:27:45 -08:00
Vinh Nguyen
a1ee10b438 fix: improve usage URLs in status card and snapshots (#6111)
Hi OpenAI Codex team, currently "Visit chatgpt.com/codex/settings/usage
for up-to-date information on rate limits and credits" message in status
card and error messages. For now, without the "https://" prefix, the
link cannot be clicked directly from most terminals or chat interfaces.

<img width="636" height="127" alt="Screenshot 2025-11-02 at 22 47 06"
src="https://github.com/user-attachments/assets/5ea11e8b-fb74-451c-85dc-f4d492b2678b"
/>

---

The fix is intent to improve this issue:

- It makes the link clickable in terminals that support it, hence better
accessibility
- It follows standard URL formatting practices
- It maintains consistency with other links in the application (like the
existing "https://openai.com/chatgpt/pricing" links)

Thank you!
2025-11-02 21:44:59 -08:00
Eric Traut
0c7efa0cfd Fix incorrect "deprecated" message about experimental config key (#6131)
When I enable `experimental_sandbox_command_assessment`, I get an
incorrect deprecation warning: "experimental_sandbox_command_assessment
is deprecated. Use experimental_sandbox_command_assessment instead."

This PR fixes this error.
2025-11-02 16:33:09 -08:00
Eric Traut
d5853d9c47 Changes to sandbox command assessment feature based on initial experiment feedback (#6091)
* Removed sandbox risk categories; feedback indicates that these are not
that useful and "less is more"
* Tweaked the assessment prompt to generate terser answers
* Fixed bug in orchestrator that prevents this feature from being
exposed in the extension
2025-11-01 14:52:23 -07:00
Thomas Stokes
d9118c04bf Parse the Azure OpenAI rate limit message (#5956)
Fixes #4161

Currently Codex uses a regex to parse the "Please try again in 1.898s"
OpenAI-style rate limit message, so that it can wait the correct
duration before retrying. Azure OpenAI returns a different error that
looks like "Rate limit exceeded. Try again in 35 seconds."

This PR extends the regex and parsing code to match in a more fuzzy
manner, handling anything matching the pattern "try again in
\<duration>\<unit>".
2025-11-01 09:33:13 -07:00
jif-oai
611e00c862 feat: compactor 2 (#6027)
Co-authored-by: pakrym-oai <pakrym@openai.com>
2025-10-31 14:27:08 -07:00
Ahmed Ibrahim
c8ebb2a0dc Add warning on compact (#6052)
This PR introduces the ability for `core` to send `warnings` as it can
send `errors. It also sends a warning on compaction.

<img width="811" height="187" alt="image"
src="https://github.com/user-attachments/assets/0947a42d-b720-420d-b7fd-115f8a65a46a"
/>
2025-10-31 13:27:33 -07:00
Dylan Hurd
88e083a9d0 chore: Add shell serialization tests for json (#6043)
## Summary
Can never have enough tests on this code path - checking that json
inside a shell call is deserialized correctly.

## Tests
- [x] These are tests 😎
2025-10-31 11:01:58 -07:00
Ahmed Ibrahim
1c8507b32a Truncate total tool calls text (#5979)
Put a cap on the aggregate output of text content on tool calls.

---------

Co-authored-by: Gabriel Peal <gpeal@users.noreply.github.com>
2025-10-31 10:30:36 -07:00
jif-oai
0508823075 test: undo (#6034) 2025-10-31 14:46:24 +00:00
pakrym-oai
2371d771cc Update user instruction message format (#6010) 2025-10-30 18:44:02 -07:00
Ahmed Ibrahim
dc2aeac21f override verbosity for gpt-5-codex (#6007)
we are seeing [reports](https://github.com/openai/codex/issues/6004) of
users having verbosity in their config.toml and facing issues.
gpt-5-codex doesn't accept other values rather than medium for
verbosity.
2025-10-31 00:45:05 +00:00
Jack
f842849bec docs: Fix markdown list item spacing in codex-rs/core/review_prompt.md (#4144)
Fixes a Markdown parsing issue where a list item used `*` without a
following space (`*Line ranges ...`). Per CommonMark, a space after the
list marker is required. Updated to `* Line ranges ...` so the guideline
renders as a standalone bullet. This change improves readability and
prevents mis-parsing in renderers.

Co-authored-by: Eric Traut <etraut@openai.com>
2025-10-30 17:39:21 -07:00
zhao-oai
dcf73970d2 rate limit errors now provide absolute time (#6000) 2025-10-30 20:33:25 -04:00
Ahmed Ibrahim
a3d3719481 Remove last turn reasoning filtering (#5986) 2025-10-30 23:20:32 +00:00
iceweasel-oai
87cce88f48 Windows Sandbox - Alpha version (#4905)
- Added the new codex-windows-sandbox crate that builds both a library
entry point (run_windows_sandbox_capture) and a CLI executable to launch
commands inside a Windows restricted-token sandbox, including ACL
management, capability SID provisioning, network lockdown, and output
capture
(windows-sandbox-rs/src/lib.rs:167, windows-sandbox-rs/src/main.rs:54).
- Introduced the experimental WindowsSandbox feature flag and wiring so
Windows builds can opt into the sandbox:
SandboxType::WindowsRestrictedToken, the in-process execution path, and
platform sandbox selection now honor the flag (core/src/features.rs:47,
core/src/config.rs:1224, core/src/safety.rs:19,
core/src/sandboxing/mod.rs:69, core/src/exec.rs:79,
core/src/exec.rs:172).
- Updated workspace metadata to include the new crate and its
Windows-specific dependencies so the core crate can link against it
(codex-rs/
    Cargo.toml:91, core/Cargo.toml:86).
- Added a PowerShell bootstrap script that installs the Windows
toolchain, required CLI utilities, and builds the workspace to ease
development
    on the platform (scripts/setup-windows.ps1:1).
- Landed a Python smoke-test suite that exercises
read-only/workspace-write policies, ACL behavior, and network denial for
the Windows sandbox
    binary (windows-sandbox-rs/sandbox_smoketests.py:1).
2025-10-30 15:51:57 -07:00
Bernard Niset
ff6d4cec6b fix: Update seatbelt policy for java on macOS (#3987)
# Summary

This PR is related to the Issue #3978 and contains a fix to the seatbelt
profile for macOS that allows to run java/jdk tooling from the sandbox.
I have found that the included change is the minimum change to make it
run on my machine.

There is a unit test added by codex when making this fix. I wonder if it
is useful since you need java installed on the target machine for it to
be relevant. I can remove it it is better.

Fixes #3978
2025-10-30 14:25:04 -07:00
Celia Chen
6ef658a9f9 [Hygiene] Remove include_view_image_tool config (#5976)
There's still some debate about whether we want to expose
`tools.view_image` or `feature.view_image` so those are left unchanged
for now, but this old `include_view_image_tool` config is good-to-go.
Also updated the doc to reflect that `view_image` tool is now by default
true.
2025-10-30 13:23:24 -07:00
Anton Panasenko
9572cfc782 [codex] add developer instructions (#5897)
we are using developer instructions for code reviews, we need to pass
them in cli as well.
2025-10-30 11:18:31 -07:00
Dylan Hurd
4a55646a02 chore: testing on freeform apply_patch (#5952)
## Summary
Duplicates the tests in `apply_patch_cli.rs`, but tests the freeform
apply_patch tool as opposed to the function call path. The good news is
that all the tests pass with zero logical tests, with the exception of
the heredoc, which doesn't really make sense in the freeform tool
context anyway.

@jif-oai since you wrote the original tests in #5557, I'd love your
opinion on the right way to DRY these test cases between the two. Happy
to set up a more sophisticated harness, but didn't want to go down the
rabbit hole until we agreed on the right pattern

## Testing
- [x] These are tests
2025-10-30 10:40:48 -07:00
jif-oai
f4f9695978 feat: compaction prompt configurable (#5959)
```
 codex -c compact_prompt="Summarize in bullet points"
 ```
2025-10-30 14:24:24 +00:00
Ahmed Ibrahim
5fcc380bd9 Pass initial history as an optional to codex delegate (#5950)
This will give us more freedom on controlling the delegation. i.e we can
fork our history and run `compact`.
2025-10-30 07:22:42 -07:00
jif-oai
aa76003e28 chore: unify config crates (#5958) 2025-10-30 10:28:32 +00:00
Ahmed Ibrahim
fac548e430 Send delegate header (#5942)
Send delegate type header
2025-10-30 09:49:40 +00:00
zhao-oai
b34efde2f3 asdf (#5940)
.
2025-10-30 01:10:41 +00:00
Ahmed Ibrahim
7aa46ab5fc ignore agent message deltas for the review mode (#5937)
The deltas produce the whole json output. ignore them.
2025-10-30 00:47:55 +00:00
pakrym-oai
3429e82e45 Add item streaming events (#5546)
Adds AgentMessageContentDelta, ReasoningContentDelta,
ReasoningRawContentDelta item streaming events while maintaining
compatibility for old events.

---------

Co-authored-by: Owen Lin <owen@openai.com>
2025-10-29 22:33:57 +00:00
Ahmed Ibrahim
13e1d0362d Delegate review to codex instance (#5572)
In this PR, I am exploring migrating task kind to an invocation of
Codex. The main reason would be getting rid off multiple
`ConversationHistory` state and streamlining our context/history
management.

This approach depends on opening a channel between the sub-codex and
codex. This channel is responsible for forwarding `interactive`
(`approvals`) and `non-interactive` events. The `task` is responsible
for handling those events.

This opens the door for implementing `codex as a tool`, replacing
`compact` and `review`, and potentially subagents.

One consideration is this code is very similar to `app-server` specially
in the approval part. If in the future we wanted an interactive
`sub-codex` we should consider using `codex-mcp`
2025-10-29 21:04:25 +00:00
jif-oai
db31f6966d chore: config editor (#5878)
The goal is to have a single place where we actually write files

In a follow-up PR, will move everything config related in a dedicated
module and move the helpers in a dedicated file
2025-10-29 20:52:46 +00:00
Rasmus Rygaard
39e09c289d Add a wrapper around raw response items (#5923)
We currently have nested enums when sending raw response items in the
app-server protocol. This makes downstream schemas confusing because we
need to embed `type`-discriminated enums within each other.

This PR adds a small wrapper around the response item so we can keep the
schemas separate
2025-10-29 20:32:40 +00:00
jif-oai
3183935bd7 feat: add output even in sandbox denied (#5908) 2025-10-29 18:21:18 +00:00
jif-oai
060637b4d4 feat: deprecation warning (#5825)
<img width="955" height="311" alt="Screenshot 2025-10-28 at 14 26 25"
src="https://github.com/user-attachments/assets/99729b3d-3bc9-4503-aab3-8dc919220ab4"
/>
2025-10-29 12:29:28 +00:00
jif-oai
fa92cd92fa chore: merge git crates (#5909)
Merge `git-apply` and `git-tooling` into `utils/`
2025-10-29 12:11:44 +00:00
Abhishek Bhardwaj
89591e4246 feature: Add "!cmd" user shell execution (#2471)
feature: Add "!cmd" user shell execution

This change lets users run local shell commands directly from the TUI by
prefixing their input with ! (e.g. !ls). Output is truncated to keep the
exec cell usable, and Ctrl-C cleanly
  interrupts long-running commands (e.g. !sleep 10000).

**Summary of changes**

- Route Op::RunUserShellCommand through a dedicated UserShellCommandTask
(core/src/tasks/user_shell.rs), keeping the task logic out of codex.rs.
- Reuse the existing tool router: the task constructs a ToolCall for the
local_shell tool and relies on ShellHandler, so no manual MCP tool
lookup is required.
- Emit exec lifecycle events (ExecCommandBegin/ExecCommandEnd) so the
TUI can show command metadata, live output, and exit status.

**End-to-end flow**

  **TUI handling**

1. ChatWidget::submit_user_message (TUI) intercepts messages starting
with !.
2. Non-empty commands dispatch Op::RunUserShellCommand { command };
empty commands surface a help hint.
3. No UserInput items are created, so nothing is enqueued for the model.

  **Core submission loop**
4. The submission loop routes the op to handlers::run_user_shell_command
(core/src/codex.rs).
5. A fresh TurnContext is created and Session::spawn_user_shell_command
enqueues UserShellCommandTask.

  **Task execution**
6. UserShellCommandTask::run emits TaskStartedEvent, formats the
command, and prepares a ToolCall targeting local_shell.
  7. ToolCallRuntime::handle_tool_call dispatches to ShellHandler.

  **Shell tool runtime**
8. ShellHandler::run_exec_like launches the process via the unified exec
runtime, honoring sandbox and shell policies, and emits
ExecCommandBegin/End.
9. Stdout/stderr are captured for the UI, but the task does not turn the
resulting ToolOutput into a model response.

  **Completion**
10. After ExecCommandEnd, the task finishes without an assistant
message; the session marks it complete and the exec cell displays the
final output.

  **Conversation context**

- The command and its output never enter the conversation history or the
model prompt; the flow is local-only.
  - Only exec/task events are emitted for UI rendering.

**Demo video**


https://github.com/user-attachments/assets/fcd114b0-4304-4448-a367-a04c43e0b996
2025-10-29 00:31:20 -07:00
Axojhf
802d2440b4 Fix bash detection failure in VS Code Codex extension on Windows under certain conditions (#3421)
Found that the VS Code Codex extension throws “Error starting
conversation” when initializing a conversation with Git for Windows’
bash on PATH.
Debugging showed the bash-detection logic did not return as expected;
this change makes it reliable in that scenario.
Possibly related to issue #2841.
2025-10-28 21:29:16 -07:00
pakrym-oai
ef3e075ad6 Refresh tokens more often and log a better message when both auth and token refresh fails (#5655)
<img width="784" height="153" alt="image"
src="https://github.com/user-attachments/assets/c44b0eb2-d65c-4fc2-8b54-b34f7e1c4d95"
/>
2025-10-28 18:55:53 -07:00
Anton Panasenko
149e198ce8 [codex][app-server] resume conversation from history (#5893) 2025-10-28 18:18:03 -07:00
zhao-oai
36113509f2 verify mime type of images (#5888)
solves: https://github.com/openai/codex/issues/5675

Block non-image uploads in the view_image workflow. We now confirm the
file’s MIME is image/* before building the data URL; otherwise we emit a
“unsupported MIME type” error to the model. This stops the agent from
sending application/json blobs that the Responses API rejects with 400s.

<img width="409" height="556" alt="Screenshot 2025-10-28 at 1 15 10 PM"
src="https://github.com/user-attachments/assets/a92199e8-2769-4b1d-8e33-92d9238c90fe"
/>
2025-10-28 14:52:51 -07:00
Ahmed Ibrahim
ef55992ab0 remove beta experimental header (#5892) 2025-10-28 21:28:56 +00:00
pakrym-oai
1b8f2543ac Filter out reasoning items from previous turns (#5857)
Reduces request size and prevents 400 errors when switching between API
orgs.

Based on Responses API behavior described in
https://cookbook.openai.com/examples/responses_api/reasoning_items#caching
2025-10-28 11:39:34 -07:00
Jeremy Rose
65107d24a2 Fix handling of non-main default branches for cloud task submissions (#5069)
## Summary
- detect the repository's default branch before submitting a cloud task
- expose a helper in `codex_core::git_info` for retrieving the default
branch name

Fixes #4888


------
https://chatgpt.com/codex/tasks/task_i_68e96093cf28832ca0c9c73fc618a309
2025-10-28 11:02:25 -07:00
jif-oai
5ba2a17576 chore: decompose submission loop (#5854) 2025-10-28 15:23:46 +00:00
jif-oai
be4bdfec93 chore: drop useless shell stuff (#5848) 2025-10-28 14:52:52 +00:00
jif-oai
7ff142d93f chore: speed-up pipeline (#5812)
Speed-up pipeline by:
* Decoupling tests and clippy
* Use pre-built binary in tests
* `sccache` for caching of the builds
2025-10-28 14:08:52 +00:00
Celia Chen
4a42c4e142 [Auth] Choose which auth storage to use based on config (#5792)
This PR is a follow-up to #5591. It allows users to choose which auth
storage mode they want by using the new
`cli_auth_credentials_store_mode` config.
2025-10-27 19:41:49 -07:00
Josh McKinney
66a4b89822 feat(tui): clarify Windows auto mode requirements (#5568)
## Summary
- Coerce Windows `workspace-write` configs back to read-only, surface
the forced downgrade in the approvals popup,
  and funnel users toward WSL or Full Access.
- Add WSL installation instructions to the Auto preset on Windows while
keeping the preset available for other
  platforms.
- Skip the trust-on-first-run prompt on native Windows so new folders
remain read-only without additional
  confirmation.
- Expose a structured sandbox policy resolution from config to flag
Windows downgrades and adjust tests (core,
exec, TUI) to reflect the new behavior; provide a Windows-only approvals
snapshot.

  ## Testing
  - cargo fmt
- cargo test -p codex-core
config::tests::add_dir_override_extends_workspace_writable_roots
- cargo test -p codex-exec
suite::resume::exec_resume_preserves_cli_configuration_overrides
- cargo test -p codex-tui
chatwidget::tests::approvals_selection_popup_snapshot
- cargo test -p codex-tui
approvals_popup_includes_wsl_note_for_auto_mode
  - cargo test -p codex-tui windows_skips_trust_prompt
  - just fix -p codex-core
  - just fix -p codex-tui
2025-10-28 01:19:32 +00:00