Commit Graph

25 Commits

Author SHA1 Message Date
Dylan
4ae6b9787a standardize shell description (#3514)
## Summary
Standardizes the shell description across sandbox_types, since we cover
this in the prompt, and have moved necessary details (like
network_access and writeable workspace roots) to EnvironmentContext
messages.

## Test Plan
- [x] updated unit tests
2025-09-12 14:24:09 -04:00
jif-oai
c09ed74a16 Unified execution (#3288)
## Unified PTY-Based Exec Tool

Note: this requires to have this flag in the config:
`use_experimental_unified_exec_tool=true`

- Adds a PTY-backed interactive exec feature (“unified_exec”) with
session reuse via
  session_id, bounded output (128 KiB), and timeout clamping (≤ 60 s).
- Protocol: introduces ResponseItem::UnifiedExec { session_id,
arguments, timeout_ms }.
- Tools: exposes unified_exec as a function tool (Responses API);
excluded from Chat
  Completions payload while still supported in tool lists.
- Path handling: resolves commands via PATH (or explicit paths), with
UTF‑8/newline‑aware
  truncation (truncate_middle).
- Tests: cover command parsing, path resolution, session
persistence/cleanup, multi‑session
  isolation, timeouts, and truncation behavior.
2025-09-10 17:38:11 -07:00
Jeremy Rose
f69f07b028 put workspace roots in the environment context (#3375)
to keep the tool description constant when the writable roots change.
2025-09-10 15:10:52 -07:00
Jeremy Rose
97000c6e6d core: correct sandboxed shell tool description (reads allowed anywhere) (#3069)
Correct the `shell` tool description for sandboxed runs and add targeted
tests.

- Fix the WorkspaceWrite description to clearly state that writes
outside the writable roots require escalated permissions; reads are not
restricted. The previous wording/formatting could be read as restricting
reads outside the workspace.
- Render the writable roots list on its own lines under a newline after
"writable roots:" for clarity.
- Show the "Commands that require network access" note only in
WorkspaceWrite when network is disabled.
- Add focused tests that call `create_shell_tool_for_sandbox` directly
and assert the exact description text for WorkspaceWrite, ReadOnly, and
DangerFullAccess.
- Update AGENTS.md to note that `just fmt` can be run automatically
without asking.
2025-09-03 10:02:34 -07:00
Ahmed Ibrahim
a56eb48195 Use the new search tool (#3086)
We were using the preview search tool in the past. We should use the new
one.
2025-09-03 01:16:47 -07:00
Ahmed Ibrahim
9dbe7284d2 Following up on #2371 post commit feedback (#2852)
- Introduce websearch end to complement the begin 
- Moves the logic of adding the sebsearch tool to
create_tools_json_for_responses_api
- Making it the client responsibility to toggle the tool on or off 
- Other misc in #2371 post commit feedback
- Show the query:

<img width="1392" height="151" alt="image"
src="https://github.com/user-attachments/assets/8457f1a6-f851-44cf-bcca-0d4fe460ce89"
/>
2025-08-28 19:24:38 -07:00
dedrisian-oai
4e9ad23864 Add "View Image" tool (#2723)
Adds a "View Image" tool so Codex can find and see images by itself:

<img width="1772" height="420" alt="Screenshot 2025-08-26 at 10 40
04 AM"
src="https://github.com/user-attachments/assets/7a459c7b-0b86-4125-82d9-05fbb35ade03"
/>
2025-08-27 17:41:23 -07:00
Michael Bolin
7b20db942a fix: build is broken on main; introduce ToolsConfigParams to help fix (#2663)
`ToolsConfig::new()` taking a large number of boolean params was hard to
manage and it finally bit us (see
https://github.com/openai/codex/pull/2660). This changes
`ToolsConfig::new()` so that it takes a struct (and also reduces the
visibility of some members, where possible).
2025-08-24 22:43:42 -07:00
Uhyeon Park
ee2ccb5cb6 Fix cache hit rate by making MCP tools order deterministic (#2611)
Fixes https://github.com/openai/codex/issues/2610

This PR sorts the tools in `get_openai_tools` by name to ensure a
consistent MCP tool order.

Currently, MCP servers are stored in a HashMap, which does not guarantee
ordering. As a result, the tool order changes across turns, effectively
breaking prompt caching in multi-turn sessions.

An alternative solution would be to replace the HashMap with an ordered
structure, but that would require a much larger code change. Given that
it is unrealistic to have so many MCP tools that sorting would cause
performance issues, this lightweight fix is chosen instead.

By ensuring deterministic tool order, this change should significantly
improve cache hit rates and prevent users from hitting usage limits too
quickly. (For reference, my own sessions last week reached the limit
unusually fast, with cache hit rates falling below 1%.)

## Result

After this fix, sessions with MCP servers now show caching behavior
almost identical to sessions without MCP servers.
Without MCP             |  With MCP
:-------------------------:|:-------------------------:
<img width="1368" height="1634" alt="image"
src="https://github.com/user-attachments/assets/26edab45-7be8-4d6a-b471-558016615fc8"
/> | <img width="1356" height="1632" alt="image"
src="https://github.com/user-attachments/assets/5f3634e0-3888-420b-9aaf-deefd9397b40"
/>
2025-08-24 19:56:24 -07:00
Reuben Narad
363636f5eb Add web search tool (#2371)
Adds web_search tool, enabling the model to use Responses API web_search
tool.
- Disabled by default, enabled by --search flag
- When --search is passed, exposes web_search_request function tool to
the model, which triggers user approval. When approved, the model can
use the web_search tool for the remainder of the turn
<img width="1033" height="294" alt="image"
src="https://github.com/user-attachments/assets/62ac6563-b946-465c-ba5d-9325af28b28f"
/>

---------

Co-authored-by: easong-openai <easong@openai.com>
2025-08-23 22:58:56 -07:00
Michael Bolin
e3b03eaccb feat: StreamableShell with exec_command and write_stdin tools (#2574) 2025-08-22 18:10:55 -07:00
Dylan
236c4f76a6 [apply_patch] freeform apply_patch tool (#2576)
## Summary
GPT-5 introduced the concept of [custom
tools](https://platform.openai.com/docs/guides/function-calling#custom-tools),
which allow the model to send a raw string result back, simplifying
json-escape issues. We are migrating gpt-5 to use this by default.

However, gpt-oss models do not support custom tools, only normal
functions. So we keep both tool definitions, and provide whichever one
the model family supports.

## Testing
- [x] Tested locally with various models
- [x] Unit tests pass
2025-08-22 13:42:34 -07:00
Dylan
e4c275d615 [apply-patch] Clean up apply-patch tool definitions (#2539)
## Summary
We've experienced a bit of drift in system prompting for `apply_patch`:
- As pointed out in #2030 , our prettier formatting started altering
prompt.md in a few ways
- We introduced a separate markdown file for apply_patch instructions in
#993, but currently duplicate them in the prompt.md file
- We added a first-class apply_patch tool in #2303, which has yet
another definition

This PR starts to consolidate our logic in a few ways:
- We now only use
`apply_patch_tool_instructions.md](https://github.com/openai/codex/compare/dh--apply-patch-tool-definition?expand=1#diff-d4fffee5f85cb1975d3f66143a379e6c329de40c83ed5bf03ffd3829df985bea)
for system instructions
- We no longer include apply_patch system instructions if the tool is
specified

I'm leaving the definition in openai_tools.rs as duplicated text for now
because we're going to be iterated on the first-class tool soon.

## Testing
- [x] Added integration tests to verify prompt stability
- [x] Tested locally with several different models (gpt-5, gpt-oss,
o4-mini)
2025-08-21 20:07:41 -07:00
Dylan
9f71dcbf57 [shell_tool] Small updates to ensure shell consistency (#2571)
## Summary
Small update to hopefully improve some shell edge cases, and make the
function clearer to the model what is going on. Keeping `timeout` as an
alias means that calls with the previous name will still work.

## Test Plan
- [x] Tested locally, model still works
2025-08-21 19:58:07 -07:00
Michael Bolin
50c48e88f5 chore: upgrade to Rust 1.89 (#2465)
Codex created this PR from the following prompt:

> upgrade this entire repo to Rust 1.89. Note that this requires
updating codex-rs/rust-toolchain.toml as well as the workflows in
.github/. Make sure that things are "clippy clean" as this change will
likely uncover new Clippy errors. `just fmt` and `cargo clippy --tests`
are sufficient to check for correctness

Note this modifies a lot of lines because it folds nested `if`
statements using `&&`.

---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/2465).
* #2467
* __->__ #2465
2025-08-19 13:22:02 -07:00
Dylan
6df8e35314 [tools] Add apply_patch tool (#2303)
## Summary
We've been seeing a number of issues and reports with our synthetic
`apply_patch` tool, e.g. #802. Let's make this a real tool - in my
anecdotal testing, it's critical for GPT-OSS models, but I'd like to
make it the standard across GPT-5 and codex models as well.

## Testing
- [x] Tested locally
- [x] Integration test
2025-08-15 11:55:53 -04:00
Parker Thompson
a075424437 Added allow-expect-in-tests / allow-unwrap-in-tests (#2328)
This PR:
* Added the clippy.toml to configure allowable expect / unwrap usage in
tests
* Removed as many expect/allow lines as possible from tests
* moved a bunch of allows to expects where possible

Note: in integration tests, non `#[test]` helper functions are not
covered by this so we had to leave a few lingering `expect(expect_used`
checks around
2025-08-14 17:59:01 -07:00
Yaroslav
f146981b73 feat: add JSON schema sanitization for MCP tools to ensure compatibil… (#1975)
…ity with internal JsonSchema enum

Closes: #1973 

Co-authored-by: Dylan Hurd <dylan.hurd@openai.com>
2025-08-10 17:57:39 -07:00
Dylan
725dd6be6a [approval_policy] Add OnRequest approval_policy (#1865)
## Summary
A split-up PR of #1763 , stacked on top of a tools refactor #1858 to
make the change clearer. From the previous summary:

> Let's try something new: tell the model about the sandbox, and let it
decide when it will need to break the sandbox. Some local testing
suggests that it works pretty well with zero iteration on the prompt!

## Testing
- [x] Added unit tests
- [x] Tested locally and it appears to work smoothly!
2025-08-05 20:44:20 -07:00
Dylan
aff97ed7dd [core] Separate tools config from openai client (#1858)
## Summary
In an effort to make tools easier to work with and more configurable,
I'm introducing `ToolConfig` and updating `Prompt` to take in a general
list of Tools. I think this is simpler and better for a few reasons:
- We can easily assemble tools from various sources (our own harness,
mcp servers, etc.) and we can consolidate the logic for constructing the
logic in one place that is separate from serialization.
- client.rs no longer needs arbitrary config values, it just takes in a
list of tools to serialize

A hefty portion of the PR is now updating our conversion of
`mcp_types::Tool` to `OpenAITool`, but considering that @bolinfest
accurately called this out as a TODO long ago, I think it's time we
tackled it.

## Testing
- [x] Experimented locally, no changes, as expected
- [x] Added additional unit tests
- [x] Responded to rust-review
2025-08-05 19:27:52 -07:00
Dylan
d31e149cb1 [prompt] Update prompt.md (#1839)
## Summary
Additional clarifications to our prompt. Still very concise, but we'll
continue to add more here.
2025-08-05 00:43:23 -07:00
Michael Bolin
136b3ee5bf chore: introduce ModelFamily abstraction (#1838)
To date, we have a number of hardcoded OpenAI model slug checks spread
throughout the codebase, which makes it hard to audit the various
special cases for each model. To mitigate this issue, this PR introduces
the idea of a `ModelFamily` that has fields to represent the existing
special cases, such as `supports_reasoning_summaries` and
`uses_local_shell_tool`.

There is a `find_family_for_model()` function that maps the raw model
slug to a `ModelFamily`. This function hardcodes all the knowledge about
the special attributes for each model. This PR then replaces the
hardcoded model name checks with checks against a `ModelFamily`.

Note `ModelFamily` is now available as `Config::model_family`. We should
ultimately remove `Config::model` in favor of
`Config::model_family::slug`.
2025-08-04 23:50:03 -07:00
Gabriel Peal
8828f6f082 Add an experimental plan tool (#1726)
This adds a tool the model can call to update a plan. The tool doesn't
actually _do_ anything but it gives clients a chance to read and render
the structured plan. We will likely iterate on the prompt and tools
exposed for planning over time.
2025-07-29 14:22:02 -04:00
Michael Bolin
e40f86b446 chore: logging cleanup (#1196)
Update what we log to make `RUST_LOG=debug` a bit easier to work with.
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/1196).
* #1167
* __->__ #1196
2025-06-02 13:31:33 -07:00
Michael Bolin
1bf82056b3 fix: introduce create_tools_json() and share it with chat_completions.rs (#1177)
The main motivator behind this PR is that `stream_chat_completions()`
was not adding the `"tools"` entry to the payload posted to the
`/chat/completions` endpoint. This (1) refactors the existing logic to
build up the `"tools"` JSON from `client.rs` into `openai_tools.rs`, and
(2) updates the use of responses API (`client.rs`) and chat completions
API (`chat_completions.rs`) to both use it.

Note this PR alone is not sufficient to get tool calling from chat
completions working: that is done in
https://github.com/openai/codex/pull/1167.

---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/1177).
* #1167
* __->__ #1177
2025-05-30 14:07:03 -07:00