Commit Graph

61 Commits

Author SHA1 Message Date
Owen Lin
c84fc83222 Use int timestamps for rate limit reset_at (#5383)
The backend will be returning unix timestamps (seconds since epoch)
instead of RFC 3339 strings. This will make it more ergonomic for
developers to integrate against - no string parsing.
2025-10-20 12:26:46 -07:00
Rasmus Rygaard
846960ae3d Generate JSON schema for app-server protocol (#5063)
Add annotations and an export script that let us generate app-server
protocol types as typescript and JSONSchema.

The script itself is a bit hacky because we need to manually label some
of the types. Unfortunately it seems that enum variants don't get good
names by default and end up with something like `EventMsg1`,
`EventMsg2`, etc. I'm not an expert in this by any means, but since this
is only run manually and we already need to enumerate the types required
to describe the protocol, it didn't seem that much worse. An ideal
solution here would be to have some kind of root that we could generate
schemas for in one go, but I'm not sure if that's compatible with how we
generate the protocol today.
2025-10-20 11:45:11 -07:00
Ahmed Ibrahim
049a61bcfc Auto compact at ~90% (#5292)
Users now hit a window exceeded limit and they usually don't know what
to do. This starts auto compact at ~90% of the window.
2025-10-20 11:29:49 -07:00
Thibault Sottiaux
0e08dd6055 fix: switch rate limit reset handling to timestamps (#5304)
This change ensures that we store the absolute time instead of relative
offsets of when the primary and secondary rate limits will reset.
Previously these got recalculated relative to current time, which leads
to the displayed reset times to change over time, including after doing
a codex resume.

For previously changed sessions, this will cause the reset times to not
show due to this being a breaking change:
<img width="524" height="55" alt="Screenshot 2025-10-17 at 5 14 18 PM"
src="https://github.com/user-attachments/assets/53ebd43e-da25-4fef-9c47-94a529d40265"
/>

Fixes https://github.com/openai/codex/issues/4761
2025-10-17 17:39:37 -07:00
Gabriel Peal
40fba1bb4c [MCP] Add support for resources (#5239)
This PR adds support for [MCP
resources](https://modelcontextprotocol.io/specification/2025-06-18/server/resources)
by adding three new tools for the model:
1. `list_resources`
2. `list_resource_templates`
3. `read_resource`

These 3 tools correspond to the [three primary MCP resource protocol
messages](https://modelcontextprotocol.io/specification/2025-06-18/server/resources#protocol-messages).

Example of listing and reading a GitHub resource tempalte
<img width="2984" height="804" alt="CleanShot 2025-10-15 at 17 31 10"
src="https://github.com/user-attachments/assets/89b7f215-2e2a-41c5-90dd-b932ac84a585"
/>

`/mcp` with Figma configured
<img width="2984" height="442" alt="CleanShot 2025-10-15 at 18 29 35"
src="https://github.com/user-attachments/assets/a7578080-2ed2-4c59-b9b4-d8461f90d8ee"
/>

Fixes #4956
2025-10-17 01:05:15 -04:00
Michael Bolin
995f5c3614 feat: add Vec<ParsedCommand> to ExecApprovalRequestEvent (#5222)
This adds `parsed_cmd: Vec<ParsedCommand>` to `ExecApprovalRequestEvent`
in the core protocol (`protocol/src/protocol.rs`), which is also what
this field is named on `ExecCommandBeginEvent`. Honestly, I don't love
the name (it sounds like a single command, but it is actually a list of
them), but I don't want to get distracted by a naming discussion right
now.

This also adds `parsed_cmd` to `ExecCommandApprovalParams` in
`codex-rs/app-server-protocol/src/protocol.rs`, so it will be available
via `codex app-server`, as well.

For consistency, I also updated `ExecApprovalElicitRequestParams` in
`codex-rs/mcp-server/src/exec_approval.rs` to include this field under
the name `codex_parsed_cmd`, as that struct already has a number of
special `codex_*` fields. Note this is the code for when Codex is used
as an MCP _server_ and therefore has to conform to the official spec for
an MCP elicitation type.
2025-10-15 13:58:40 -07:00
jif-oai
f98fa85b44 feat: message when stream get correctly resumed (#4988)
<img width="366" height="109" alt="Screenshot 2025-10-09 at 17 44 16"
src="https://github.com/user-attachments/assets/26bc6f60-11bc-4fc6-a1cc-430ca1203969"
/>
2025-10-10 09:07:14 +00:00
dedrisian-oai
4300236681 revert /name for now (#4978)
There was a regression where we'd read entire rollout contents if there
was no /name present.
2025-10-08 17:13:49 -07:00
dedrisian-oai
ec238a2c39 feat: Set chat name (#4974)
Set chat name with `/name` so they appear in the codex resume page:


https://github.com/user-attachments/assets/c0252bba-3a53-44c7-a740-f4690a3ad405
2025-10-08 16:35:35 -07:00
Gabriel Peal
3c5e12e2a4 [MCP] Add auth status to MCP servers (#4918)
This adds a queryable auth status for MCP servers which is useful:
1. To determine whether a streamable HTTP server supports auth or not
based on whether or not it supports RFC 8414-3.2
2. Allow us to build a better user experience on top of MCP status
2025-10-08 17:37:57 -04:00
Colin Young
b09f62a1c3 [Codex] Use Number instead of BigInt for TokenCountEvent (#4856)
Adjust to use typescript number so reduce casting and normalizing code
for VSCE since js supports up to 2^53-1
2025-10-06 18:59:37 -07:00
Ahmed Ibrahim
90ef94d3b3 Surface context window error to the client (#4675)
In the past, we were treating `input exceeded context window` as a
streaming error and retrying on it. Retrying on it has no point because
it won't change the behavior. In this PR, we surface the error to the
client without retry and also send a token count event to indicate that
the context window is full.

<img width="650" height="125" alt="image"
src="https://github.com/user-attachments/assets/c26b1213-4c27-4bfc-90f4-51a270a3efd5"
/>
2025-10-05 01:40:06 +00:00
Gabriel Peal
1d17ca1fa3 [MCP] Add support for MCP Oauth credentials (#4517)
This PR adds oauth login support to streamable http servers when
`experimental_use_rmcp_client` is enabled.

This PR is large but represents the minimal amount of work required for
this to work. To keep this PR smaller, login can only be done with
`codex mcp login` and `codex mcp logout` but it doesn't appear in `/mcp`
or `codex mcp list` yet. Fingers crossed that this is the last large MCP
PR and that subsequent PRs can be smaller.

Under the hood, credentials are stored using platform credential
managers using the [keyring crate](https://crates.io/crates/keyring).
When the keyring isn't available, it falls back to storing credentials
in `CODEX_HOME/.credentials.json` which is consistent with how other
coding agents handle authentication.

I tested this on macOS, Windows, WSL (ubuntu), and Linux. I wasn't able
to test the dbus store on linux but did verify that the fallback works.

One quirk is that if you have credentials, during development, every
build will have its own ad-hoc binary so the keyring won't recognize the
reader as being the same as the write so it may ask for the user's
password. I may add an override to disable this or allow
users/enterprises to opt-out of the keyring storage if it causes issues.

<img width="5064" height="686" alt="CleanShot 2025-09-30 at 19 31 40"
src="https://github.com/user-attachments/assets/9573f9b4-07f1-4160-83b8-2920db287e2d"
/>
<img width="745" height="486" alt="image"
src="https://github.com/user-attachments/assets/9562649b-ea5f-4f22-ace2-d0cb438b143e"
/>
2025-10-03 13:43:12 -04:00
pakrym-oai
4c566d484a Separate interactive and non-interactive sessions (#4612)
Do not show exec session in VSCode/TUI selector.
2025-10-02 13:06:21 -07:00
Jeremy Rose
45936f8fbd show "Viewed Image" when the model views an image (#4475)
<img width="1022" height="339" alt="Screenshot 2025-09-29 at 4 22 00 PM"
src="https://github.com/user-attachments/assets/12da7358-19be-4010-a71b-496ede6dfbbf"
/>
2025-10-02 18:36:03 +00:00
Michael Bolin
5881c0d6d4 fix: remove mcp-types from app server protocol (#4537)
We continue the separation between `codex app-server` and `codex
mcp-server`.

In particular, we introduce a new crate, `codex-app-server-protocol`,
and migrate `codex-rs/protocol/src/mcp_protocol.rs` into it, renaming it
`codex-rs/app-server-protocol/src/protocol.rs`.

Because `ConversationId` was defined in `mcp_protocol.rs`, we move it
into its own file, `codex-rs/protocol/src/conversation_id.rs`, and
because it is referenced in a ton of places, we have to touch a lot of
files as part of this PR.

We also decide to get away from proper JSON-RPC 2.0 semantics, so we
also introduce `codex-rs/app-server-protocol/src/jsonrpc_lite.rs`, which
is basically the same `JSONRPCMessage` type defined in `mcp-types`
except with all of the `"jsonrpc": "2.0"` removed.

Getting rid of `"jsonrpc": "2.0"` makes our serialization logic
considerably simpler, as we can lean heavier on serde to serialize
directly into the wire format that we use now.
2025-10-01 02:16:26 +00:00
vishnu-oai
04c1782e52 OpenTelemetry events (#2103)
### Title

## otel

Codex can emit [OpenTelemetry](https://opentelemetry.io/) **log events**
that
describe each run: outbound API requests, streamed responses, user
input,
tool-approval decisions, and the result of every tool invocation. Export
is
**disabled by default** so local runs remain self-contained. Opt in by
adding an
`[otel]` table and choosing an exporter.

```toml
[otel]
environment = "staging"   # defaults to "dev"
exporter = "none"          # defaults to "none"; set to otlp-http or otlp-grpc to send events
log_user_prompt = false    # defaults to false; redact prompt text unless explicitly enabled
```

Codex tags every exported event with `service.name = "codex-cli"`, the
CLI
version, and an `env` attribute so downstream collectors can distinguish
dev/staging/prod traffic. Only telemetry produced inside the
`codex_otel`
crate—the events listed below—is forwarded to the exporter.

### Event catalog

Every event shares a common set of metadata fields: `event.timestamp`,
`conversation.id`, `app.version`, `auth_mode` (when available),
`user.account_id` (when available), `terminal.type`, `model`, and
`slug`.

With OTEL enabled Codex emits the following event types (in addition to
the
metadata above):

- `codex.api_request`
  - `cf_ray` (optional)
  - `attempt`
  - `duration_ms`
  - `http.response.status_code` (optional)
  - `error.message` (failures)
- `codex.sse_event`
  - `event.kind`
  - `duration_ms`
  - `error.message` (failures)
  - `input_token_count` (completion only)
  - `output_token_count` (completion only)
  - `cached_token_count` (completion only, optional)
  - `reasoning_token_count` (completion only, optional)
  - `tool_token_count` (completion only)
- `codex.user_prompt`
  - `prompt_length`
  - `prompt` (redacted unless `log_user_prompt = true`)
- `codex.tool_decision`
  - `tool_name`
  - `call_id`
- `decision` (`approved`, `approved_for_session`, `denied`, or `abort`)
  - `source` (`config` or `user`)
- `codex.tool_result`
  - `tool_name`
  - `call_id`
  - `arguments`
  - `duration_ms` (execution time for the tool)
  - `success` (`"true"` or `"false"`)
  - `output`

### Choosing an exporter

Set `otel.exporter` to control where events go:

- `none` – leaves instrumentation active but skips exporting. This is
the
  default.
- `otlp-http` – posts OTLP log records to an OTLP/HTTP collector.
Specify the
  endpoint, protocol, and headers your collector expects:

  ```toml
  [otel]
  exporter = { otlp-http = {
    endpoint = "https://otel.example.com/v1/logs",
    protocol = "binary",
    headers = { "x-otlp-api-key" = "${OTLP_TOKEN}" }
  }}
  ```

- `otlp-grpc` – streams OTLP log records over gRPC. Provide the endpoint
and any
  metadata headers:

  ```toml
  [otel]
  exporter = { otlp-grpc = {
    endpoint = "https://otel.example.com:4317",
    headers = { "x-otlp-meta" = "abc123" }
  }}
  ```

If the exporter is `none` nothing is written anywhere; otherwise you
must run or point to your
own collector. All exporters run on a background batch worker that is
flushed on
shutdown.

If you build Codex from source the OTEL crate is still behind an `otel`
feature
flag; the official prebuilt binaries ship with the feature enabled. When
the
feature is disabled the telemetry hooks become no-ops so the CLI
continues to
function without the extra dependencies.

---------

Co-authored-by: Anton Panasenko <apanasenko@openai.com>
2025-09-29 11:30:55 -07:00
Ahmed Ibrahim
cb96f4f596 Add Reset in for rate limits (#4111)
- Parse the headers
- Reorganize the struct because it's getting too long
- show the resets at in the tui

<img width="324" height="79" alt="image"
src="https://github.com/user-attachments/assets/ca15cd48-f112-4556-91ab-1e3a9bc4683d"
/>
2025-09-24 15:31:08 +00:00
Ahmed Ibrahim
8227a5ba1b Send limits when getting rate limited (#4102)
Users need visibility on rate limits when they are rate limited.
2025-09-23 22:56:34 +00:00
pakrym-oai
fdb8dadcae Add exec output-schema parameter (#4079)
Adds structured output to `exec` via the `--structured-output`
parameter.
2025-09-23 13:59:16 -07:00
pakrym-oai
0f9a796617 Use anyhow::Result in tests for error propagation (#4105) 2025-09-23 13:31:36 -07:00
Ahmed Ibrahim
dd56750612 Change headers and struct of rate limits (#4060) 2025-09-22 21:06:20 +00:00
Ahmed Ibrahim
04504d8218 Forward Rate limits to the UI (#3965)
We currently get information about rate limits in the response headers.
We want to forward them to the clients to have better transparency.
UI/UX plans have been discussed and this information is needed.
2025-09-20 21:26:16 -07:00
dedrisian-oai
62258df92f feat: /review (#3774)
Adds `/review` action in TUI

<img width="637" height="370" alt="Screenshot 2025-09-17 at 12 41 19 AM"
src="https://github.com/user-attachments/assets/b1979a6e-844a-4b97-ab20-107c185aec1d"
/>
2025-09-18 14:14:16 -07:00
pakrym-oai
d4aba772cb Switch to uuid_v7 and tighten ConversationId usage (#3819)
Make sure conversations have a timestamp.
2025-09-18 14:37:03 +00:00
dedrisian-oai
2aa84b8891 Fix EventMsg Optional (#3604) 2025-09-15 00:34:33 +00:00
Ahmed Ibrahim
bbea6bbf7e Handle resuming/forking after compact (#3533)
We need to construct the history different when compact happens. For
this, we need to just consider the history after compact and convert
compact to a response item.

This needs to change and use `build_compact_history` when this #3446 is
merged.
2025-09-14 13:23:31 +00:00
dedrisian-oai
90a0fd342f Review Mode (Core) (#3401)
## 📝 Review Mode -- Core

This PR introduces the Core implementation for Review mode:

- New op `Op::Review { prompt: String }:` spawns a child review task
with isolated context, a review‑specific system prompt, and a
`Config.review_model`.
- `EnteredReviewMode`: emitted when the child review session starts.
Every event from this point onwards reflects the review session.
- `ExitedReviewMode(Option<ReviewOutputEvent>)`: emitted when the review
finishes or is interrupted, with optional structured findings:

```json
{
  "findings": [
    {
      "title": "<≤ 80 chars, imperative>",
      "body": "<valid Markdown explaining *why* this is a problem; cite files/lines/functions>",
      "confidence_score": <float 0.0-1.0>,
      "priority": <int 0-3>,
      "code_location": {
        "absolute_file_path": "<file path>",
        "line_range": {"start": <int>, "end": <int>}
      }
    }
  ],
  "overall_correctness": "patch is correct" | "patch is incorrect",
  "overall_explanation": "<1-3 sentence explanation justifying the overall_correctness verdict>",
  "overall_confidence_score": <float 0.0-1.0>
}
```

## Questions

### Why separate out its own message history?

We want the review thread to match the training of our review models as
much as possible -- that means using a custom prompt, removing user
instructions, and starting a clean chat history.

We also want to make sure the review thread doesn't leak into the parent
thread.

### Why do this as a mode, vs. sub-agents?

1. We want review to be a synchronous task, so it's fine for now to do a
bespoke implementation.
2. We're still unclear about the final structure for sub-agents. We'd
prefer to land this quickly and then refactor into sub-agents without
rushing that implementation.
2025-09-12 23:25:10 +00:00
jif-oai
ea225df22e feat: context compaction (#3446)
## Compact feature:
1. Stops the model when the context window become too large
2. Add a user turn, asking for the model to summarize
3. Build a bridge that contains all the previous user message + the
summary. Rendered from a template
4. Start sampling again from a clean conversation with only that bridge
2025-09-12 13:07:10 -07:00
jif-oai
c6fd056aa6 feat: reasoning effort as optional (#3527)
Allow the reasoning effort to be optional
2025-09-12 12:06:33 -07:00
Michael Bolin
9bbeb75361 feat: include reasoning_effort in NewConversationResponse (#3506)
`ClientRequest::NewConversation` picks up the reasoning level from the user's defaults in `config.toml`, so it should be reported in `NewConversationResponse`.
2025-09-11 21:04:40 -07:00
Ahmed Ibrahim
674e3d3c90 Add Compact and Turn Context to the rollout items (#3444)
Adding compact and turn context to the rollout items

based on #3440
2025-09-11 18:08:51 +00:00
Ahmed Ibrahim
162e1235a8 Change forking to read the rollout from file (#3440)
This PR changes get history op to get path. Then, forking will use a
path. This will help us have one unified codepath for resuming/forking
conversations. Will also help in having rollout history in order. It
also fixes a bug where you won't see the UI when resuming after forking.
2025-09-10 17:42:54 -07:00
Eric Traut
39db113cc9 Added images to UserMessageEvent (#3400)
This PR adds an `images` field to the existing `UserMessageEvent` so we
can encode zero or more images associated with a user message. This
allows images to be restored when conversations are restored.
2025-09-10 10:18:43 -07:00
Ahmed Ibrahim
45bd5ca4b9 Move initial history to protocol (#3422)
To fix an edge case of forking then resuming

#3419
2025-09-10 10:17:24 -07:00
Michael Bolin
2a76a08a9e fix: include rollout_path in NewConversationResponse (#3352)
Adding the `rollout_path` to the `NewConversationResponse` makes it so a
client can perform subsequent operations on a `(ConversationId,
PathBuf)` pair. #3353 will introduce support for `ArchiveConversation`.

---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/3352).
* #3353
* __->__ #3352
2025-09-09 00:11:48 -07:00
Gabriel Peal
5eaaf307e1 Generate more typescript types and return conversation id with ConversationSummary (#3219)
This PR does multiple things that are necessary for conversation resume
to work from the extension. I wanted to make sure everything worked so
these changes wound up in one PR:
1. Generate more ts types
2. Resume rollout history files rather than create a new one every time
it is resumed so you don't see a duplicate conversation in history for
every resume. Chatted with @aibrahim-oai to verify this
3. Return conversation_id in conversation summaries
4. [Cleanup] Use serde and strong types for a lot of the rollout file
parsing
2025-09-08 17:54:47 -04:00
Justin Lebar
18330c2362 Format large numbers in a more readable way. (#2046)
- In the bottom line of the TUI, print the number of tokens to 3 sigfigs
  with an SI suffix, e.g. "1.23K".
- Elsewhere where we print a number, I figure it's worthwhile to print
  the exact number, because e.g. it's a summary of your session. Here we print
  the numbers comma-separated.
2025-09-08 21:48:48 +00:00
Gabriel Peal
c8fab51372 Use ConversationId instead of raw Uuids (#3282)
We're trying to migrate from `session_id: Uuid` to `conversation_id:
ConversationId`. Not only does this give us more type safety but it
unifies our terminology across Codex and with the implementation of
session resuming, a conversation (which can span multiple sessions) is
more appropriate.

I started this impl on https://github.com/openai/codex/pull/3219 as part
of getting resume working in the extension but it's big enough that it
should be broken out.
2025-09-07 23:22:25 -04:00
pakrym-oai
0269096229 Move token usage/context information to session level (#3221)
Move context information into the main loop so it can be used to
interrupt the loop or start auto-compaction.
2025-09-06 15:19:23 +00:00
pakrym-oai
7df9e9c664 Correctly calculate remaining context size (#3190)
We had multiple issues with context size calculation:
1. `initial_prompt_tokens` calculation based on cache size is not
reliable, cache misses might set it to much higher value. For now
hardcoded to a safer constant.
2. Input context size for GPT-5 is 272k (that's where 33% came from).

Fixes.
2025-09-04 23:34:14 +00:00
Michael Bolin
91708bb031 fix: fix serde_as annotation and verify with test (#3170)
I didn't do https://github.com/openai/codex/pull/3163 correctly the
first time: now verified with a test.
2025-09-04 10:38:00 -07:00
Michael Bolin
0a83db5512 fix: use a more efficient wire format for ExecCommandOutputDeltaEvent.chunk (#3163)
When serializing to JSON, the existing solution created an enormous
array of ints, which is far more bytes on the wire than a base64-encoded
string would be.
2025-09-04 08:21:58 -07:00
Ahmed Ibrahim
2b96f9f569 Dividing UserMsgs into categories to send it back to the tui (#3127)
This PR does the following:

- divides user msgs into 3 categories: plain, user instructions, and
environment context
- Centralizes adding user instructions and environment context to a
degree
- Improve the integration testing

Building on top of #3123

Specifically this
[comment](https://github.com/openai/codex/pull/3123#discussion_r2319885089).
We need to send the user message while ignoring the User Instructions
and Environment Context we attach.
2025-09-04 05:34:50 +00:00
Ahmed Ibrahim
f2036572b6 Replay EventMsgs from Response Items when resuming a session with history. (#3123)
### Overview

This PR introduces the following changes:
	1.	Adds a unified mechanism to convert ResponseItem into EventMsg.
2. Ensures that when a session is initialized with initial history, a
vector of EventMsg is sent along with the session configuration. This
allows clients to re-render the UI accordingly.
	3. 	Added integration testing

### Caveats

This implementation does not send every EventMsg that was previously
dispatched to clients. The excluded events fall into two categories:
	•	“Arguably” rolled-out events
Examples include tool calls and apply-patch calls. While these events
are conceptually rolled out, we currently only roll out ResponseItems.
These events are already being handled elsewhere and transformed into
EventMsg before being sent.
	•	Non-rolled-out events
Certain events such as TurnDiff, Error, and TokenCount are not rolled
out at all.

### Future Directions

At present, resuming a session involves maintaining two states:
	•	UI State
Clients can replay most of the important UI from the provided EventMsg
history.
	•	Model State
The model receives the complete session history to reconstruct its
internal state.

This design provides a solid foundation. If, in the future, more precise
UI reconstruction is needed, we have two potential paths:
1. Introduce a third data structure that allows us to derive both
ResponseItems and EventMsgs.
2. Clearly divide responsibilities: the core system ensures the
integrity of the model state, while clients are responsible for
reconstructing the UI.
2025-09-04 04:47:00 +00:00
Jeremy Rose
e442ecedab rework message styling (#2877)
https://github.com/user-attachments/assets/cf07f62b-1895-44bb-b9c3-7a12032eb371
2025-09-02 17:29:58 +00:00
Ahmed Ibrahim
9dbe7284d2 Following up on #2371 post commit feedback (#2852)
- Introduce websearch end to complement the begin 
- Moves the logic of adding the sebsearch tool to
create_tools_json_for_responses_api
- Making it the client responsibility to toggle the tool on or off 
- Other misc in #2371 post commit feedback
- Show the query:

<img width="1392" height="151" alt="image"
src="https://github.com/user-attachments/assets/8457f1a6-f851-44cf-bcca-0d4fe460ce89"
/>
2025-08-28 19:24:38 -07:00
dedrisian-oai
b8e8454b3f Custom /prompts (#2696)
Adds custom `/prompts` to `~/.codex/prompts/<command>.md`.

<img width="239" height="107" alt="Screenshot 2025-08-25 at 6 22 42 PM"
src="https://github.com/user-attachments/assets/fe6ebbaa-1bf6-49d3-95f9-fdc53b752679"
/>

---

Details:

1. Adds `Op::ListCustomPrompts` to core.
2. Returns `ListCustomPromptsResponse` with list of `CustomPrompt`
(name, content).
3. TUI calls the operation on load, and populates the custom prompts
(excluding prompts that collide with builtins).
4. Selecting the custom prompt automatically sends the prompt to the
agent.
2025-08-29 02:16:39 +00:00
Ahmed Ibrahim
d0e06f74e2 send context window with task started (#2752)
- Send context window with task started
- Accounting for changing the model per turn
2025-08-27 00:04:21 -07:00
Reuben Narad
363636f5eb Add web search tool (#2371)
Adds web_search tool, enabling the model to use Responses API web_search
tool.
- Disabled by default, enabled by --search flag
- When --search is passed, exposes web_search_request function tool to
the model, which triggers user approval. When approved, the model can
use the web_search tool for the remainder of the turn
<img width="1033" height="294" alt="image"
src="https://github.com/user-attachments/assets/62ac6563-b946-465c-ba5d-9325af28b28f"
/>

---------

Co-authored-by: easong-openai <easong@openai.com>
2025-08-23 22:58:56 -07:00