valknar/llmx - llmx - dev.pivoine.art

Author	SHA1	Message	Date
pakrym-oai	aecbe0f333	Add helper for response created SSE events in tests (#4758 ) ## Summary - add a reusable `ev_response_created` helper that builds `response.created` SSE events for integration tests - update the exec and core integration suites to use the new helper instead of repeating manual JSON literals - keep the streaming fixtures consistent by relying on the shared helper in every touched test ## Testing - `just fmt` ------ https://chatgpt.com/codex/tasks/task_i_68e1fe885bb883208aafffb94218da61	2025-10-05 21:11:43 +00:00
Dylan	4764fc1ee7	feat: Freeform apply_patch with simple shell output (#4718 ) ## Summary This PR is an alternative approach to #4711, but instead of changing our storage, parses out shell calls in the client and reserializes them on the fly before we send them out as part of the request. What this changes: 1. Adds additional serialization logic when the ApplyPatchToolType::Freeform is in use. 2. Adds a --custom-apply-patch flag to enable this setting on a session-by-session basis. This change is delicate, but is not meant to be permanent. It is meant to be the first step in a migration: 1. (This PR) Add in-flight serialization with config 2. Update model_family default 3. Update serialization logic to store turn outputs in a structured format, with logic to serialize based on model_family setting. 4. Remove this rewrite in-flight logic. ## Test Plan - [x] Additional unit tests added - [x] Integration tests added - [x] Tested locally	2025-10-04 19:16:36 -07:00
Fouad Matin	a5b7675e42	add(core): managed config (#3868 ) ## Summary - Factor `load_config_as_toml` into `core::config_loader` so config loading is reusable across callers. - Layer `~/.codex/config.toml`, optional `~/.codex/managed_config.toml`, and macOS managed preferences (base64) with recursive table merging and scoped threads per source. ## Config Flow ``` Managed prefs (macOS profile: com.openai.codex/config_toml_base64) ▲ │ ~/.codex/managed_config.toml │ (optional file-based override) ▲ │ ~/.codex/config.toml (user-defined settings) ``` - The loader searches under the resolved `CODEX_HOME` directory (defaults to `~/.codex`). - Managed configs let administrators ship fleet-wide overrides via device profiles which is useful for enforcing certain settings like sandbox or approval defaults. - For nested hash tables: overlays merge recursively. Child tables are merged key-by-key, while scalar or array values replace the prior layer entirely. This lets admins add or tweak individual fields without clobbering unrelated user settings.	2025-10-03 13:02:26 -07:00
Michael Bolin	042d4d55d9	feat: `codex exec` writes only the final message to stdout (#4644 ) This updates `codex exec` so that, by default, most of the agent's activity is written to stderr so that only the final agent message is written to stdout. This makes it easier to pipe `codex exec` into another tool without extra filtering. I introduced `#![deny(clippy::print_stdout)]` to help enforce this change and renamed the `ts_println!()` macro to `ts_msg()` because (1) it no longer calls `println!()` and (2), `ts_eprintln!()` seemed too long of a name. While here, this also adds `-o` as an alias for `--output-last-message`. Fixes https://github.com/openai/codex/issues/1670	2025-10-03 16:22:12 +00:00
jif-oai	33d3ecbccc	chore: refactor tool handling (#4510 ) # Tool System Refactor - Centralizes tool definitions and execution in `core/src/tools/`: specs (`spec.rs`), handlers (`handlers/`), router (`router.rs`), registry/dispatch (`registry.rs`), and shared context (`context.rs`). One registry now builds the model-visible tool list and binds handlers. - Router converts model responses to tool calls; Registry dispatches with consistent telemetry via `codex-rs/otel` and unified error handling. Function, Local Shell, MCP, and experimental `unified_exec` all flow through this path; legacy shell aliases still work. - Rationale: reduce per‑tool boilerplate, keep spec/handler in sync, and make adding tools predictable and testable. Example: `read_file` - Spec: `core/src/tools/spec.rs` (see `create_read_file_tool`, registered by `build_specs`). - Handler: `core/src/tools/handlers/read_file.rs` (absolute `file_path`, 1‑indexed `offset`, `limit`, `L#: ` prefixes, safe truncation). - E2E test: `core/tests/suite/read_file.rs` validates the tool returns the requested lines. ## Next steps: - Decompose `handle_container_exec_with_params` - Add parallel tool calls	2025-10-03 13:21:06 +01:00
pakrym-oai	1d94b9111c	Use supports_color in codex exec (#4633 ) It knows how to detect github actions	2025-10-03 01:15:03 +00:00
pakrym-oai	c405d8c06c	Rename assistant message to agent message and fix item type field naming (#4610 ) Naming cleanup	2025-10-02 15:07:14 -07:00
pakrym-oai	f895d4cbb3	Minor cleanup of codex exec output (#4585 ) <img width="850" height="723" alt="image" src="https://github.com/user-attachments/assets/2ae067bf-ba6b-47bf-9ffe-d1c3f3aa1870" /> <img width="872" height="547" alt="image" src="https://github.com/user-attachments/assets/9058be24-6513-4423-9dae-2d5fd4cbf162" />	2025-10-02 14:17:42 -07:00
pakrym-oai	4c566d484a	Separate interactive and non-interactive sessions (#4612 ) Do not show exec session in VSCode/TUI selector.	2025-10-02 13:06:21 -07:00
Jeremy Rose	45936f8fbd	show "Viewed Image" when the model views an image (#4475 ) <img width="1022" height="339" alt="Screenshot 2025-09-29 at 4 22 00 PM" src="https://github.com/user-attachments/assets/12da7358-19be-4010-a71b-496ede6dfbbf" />	2025-10-02 18:36:03 +00:00
pakrym-oai	2f6fb37d72	Support CODEX_API_KEY for codex exec (#4615 ) Allows to set API key per invocation of `codex exec`	2025-10-02 09:59:45 -07:00
pakrym-oai	31102af54b	Add initial set of doc comments to the SDK (#4513 ) Also perform minor code cleanup.	2025-10-01 13:12:59 -07:00
Michael Bolin	5881c0d6d4	fix: remove mcp-types from app server protocol (#4537 ) We continue the separation between `codex app-server` and `codex mcp-server`. In particular, we introduce a new crate, `codex-app-server-protocol`, and migrate `codex-rs/protocol/src/mcp_protocol.rs` into it, renaming it `codex-rs/app-server-protocol/src/protocol.rs`. Because `ConversationId` was defined in `mcp_protocol.rs`, we move it into its own file, `codex-rs/protocol/src/conversation_id.rs`, and because it is referenced in a ton of places, we have to touch a lot of files as part of this PR. We also decide to get away from proper JSON-RPC 2.0 semantics, so we also introduce `codex-rs/app-server-protocol/src/jsonrpc_lite.rs`, which is basically the same `JSONRPCMessage` type defined in `mcp-types` except with all of the `"jsonrpc": "2.0"` removed. Getting rid of `"jsonrpc": "2.0"` makes our serialization logic considerably simpler, as we can lean heavier on serde to serialize directly into the wire format that we use now.	2025-10-01 02:16:26 +00:00
pakrym-oai	7fc3edf8a7	Remove legacy codex exec --json format (#4525 ) `codex exec --json` now maps to the behavior of `codex exec --experimental-json` with new event and item shapes. Thread events: - thread.started - turn.started - turn.completed - turn.failed - item.started - item.updated - item.completed Item types: - assistant_message - reasoning - command_execution - file_change - mcp_tool_call - web_search - todo_list - error Sample output: <details> `codex exec "list my assigned github issues" --json \| jq` ``` { "type": "thread.started", "thread_id": "01999ce5-f229-7661-8570-53312bd47ea3" } { "type": "turn.started" } { "type": "item.completed", "item": { "id": "item_0", "item_type": "reasoning", "text": "Planning to list assigned GitHub issues" } } { "type": "item.started", "item": { "id": "item_1", "item_type": "mcp_tool_call", "server": "github", "tool": "search_issues", "status": "in_progress" } } { "type": "item.completed", "item": { "id": "item_1", "item_type": "mcp_tool_call", "server": "github", "tool": "search_issues", "status": "completed" } } { "type": "item.completed", "item": { "id": "item_2", "item_type": "reasoning", "text": "Organizing final message structure" } } { "type": "item.completed", "item": { "id": "item_3", "item_type": "assistant_message", "text": "Assigned Issues\n- openai/codex#3267 – “stream error: stream disconnected before completion…” (bug) – last update 2025-09-08\n- openai/codex#3257 – “You've hit your usage limit. Try again in 4 days 20 hours 9 minutes.” – last update 2025-09-23\n- openai/codex#3054 – “reqwest SSL panic (library has no ciphers)” (bug) – last update 2025-09-03\n- openai/codex#3051 – “thread 'main' panicked at linux-sandbox/src/linux_run_main.rs:53:5:” (bug) – last update 2025-09-10\n- openai/codex#3004 – “Auto-compact when approaching context limit” (enhancement) – last update 2025-09-26\n- openai/codex#2916 – “Feature request: Add OpenAI service tier support for cost optimization” – last update 2025-09-12\n- openai/codex#1581 – “stream error: stream disconnected before completion: stream closed before response.complete; retrying...” (bug) – last update 2025-09-17" } } { "type": "turn.completed", "usage": { "input_tokens": 34785, "cached_input_tokens": 12544, "output_tokens": 560 } } ``` </details>	2025-09-30 17:21:37 -07:00
pakrym-oai	a534356fe1	Wire up web search item (#4511 ) Add handling for web search events.	2025-09-30 12:01:17 -07:00
pakrym-oai	c09e131653	Set originator for codex exec (#4485 ) Distinct from the main CLI.	2025-09-29 20:59:19 -07:00
pakrym-oai	ea82f86662	Rename conversation to thread in codex exec (#4482 )	2025-09-29 20:18:30 -07:00
pakrym-oai	a8edc57740	Add MCP tool call item to codex exec (#4481 ) No arguments/results for now. ``` { "type": "item.started", "item": { "id": "item_1", "item_type": "mcp_tool_call", "server": "github", "tool": "search_issues", "status": "in_progress" } } { "type": "item.completed", "item": { "id": "item_1", "item_type": "mcp_tool_call", "server": "github", "tool": "search_issues", "status": "completed" } } ```	2025-09-29 19:45:11 -07:00
pakrym-oai	4a80059b1b	Add turn.failed and rename session created to thread started (#4478 ) Don't produce completed when turn failed.	2025-09-29 18:38:04 -07:00
vishnu-oai	04c1782e52	OpenTelemetry events (#2103 ) ### Title ## otel Codex can emit [OpenTelemetry](https://opentelemetry.io/) log events that describe each run: outbound API requests, streamed responses, user input, tool-approval decisions, and the result of every tool invocation. Export is disabled by default so local runs remain self-contained. Opt in by adding an `[otel]` table and choosing an exporter. ```toml [otel] environment = "staging" # defaults to "dev" exporter = "none" # defaults to "none"; set to otlp-http or otlp-grpc to send events log_user_prompt = false # defaults to false; redact prompt text unless explicitly enabled ``` Codex tags every exported event with `service.name = "codex-cli"`, the CLI version, and an `env` attribute so downstream collectors can distinguish dev/staging/prod traffic. Only telemetry produced inside the `codex_otel` crate—the events listed below—is forwarded to the exporter. ### Event catalog Every event shares a common set of metadata fields: `event.timestamp`, `conversation.id`, `app.version`, `auth_mode` (when available), `user.account_id` (when available), `terminal.type`, `model`, and `slug`. With OTEL enabled Codex emits the following event types (in addition to the metadata above): - `codex.api_request` - `cf_ray` (optional) - `attempt` - `duration_ms` - `http.response.status_code` (optional) - `error.message` (failures) - `codex.sse_event` - `event.kind` - `duration_ms` - `error.message` (failures) - `input_token_count` (completion only) - `output_token_count` (completion only) - `cached_token_count` (completion only, optional) - `reasoning_token_count` (completion only, optional) - `tool_token_count` (completion only) - `codex.user_prompt` - `prompt_length` - `prompt` (redacted unless `log_user_prompt = true`) - `codex.tool_decision` - `tool_name` - `call_id` - `decision` (`approved`, `approved_for_session`, `denied`, or `abort`) - `source` (`config` or `user`) - `codex.tool_result` - `tool_name` - `call_id` - `arguments` - `duration_ms` (execution time for the tool) - `success` (`"true"` or `"false"`) - `output` ### Choosing an exporter Set `otel.exporter` to control where events go: - `none` – leaves instrumentation active but skips exporting. This is the default. - `otlp-http` – posts OTLP log records to an OTLP/HTTP collector. Specify the endpoint, protocol, and headers your collector expects: ```toml [otel] exporter = { otlp-http = { endpoint = "https://otel.example.com/v1/logs", protocol = "binary", headers = { "x-otlp-api-key" = "${OTLP_TOKEN}" } }} ``` - `otlp-grpc` – streams OTLP log records over gRPC. Provide the endpoint and any metadata headers: ```toml [otel] exporter = { otlp-grpc = { endpoint = "https://otel.example.com:4317", headers = { "x-otlp-meta" = "abc123" } }} ``` If the exporter is `none` nothing is written anywhere; otherwise you must run or point to your own collector. All exporters run on a background batch worker that is flushed on shutdown. If you build Codex from source the OTEL crate is still behind an `otel` feature flag; the official prebuilt binaries ship with the feature enabled. When the feature is disabled the telemetry hooks become no-ops so the CLI continues to function without the extra dependencies. --------- Co-authored-by: Anton Panasenko <apanasenko@openai.com>	2025-09-29 11:30:55 -07:00
pakrym-oai	cc1b21e47f	Add turn started/completed events and correct exit code on error (#4309 ) Adds new event for session completed that includes usage. Also ensures we return 1 on failures. ``` { "type": "session.created", "session_id": "019987a7-93e7-7b20-9e05-e90060e411ea" } { "type": "turn.started" } ... { "type": "turn.completed", "usage": { "input_tokens": 78913, "cached_input_tokens": 65280, "output_tokens": 1099 } } ```	2025-09-26 16:21:50 -07:00
pakrym-oai	ea095e30c1	Add todo-list tool support (#4255 ) Adds a 1-per-turn todo-list item and item.updated event ```jsonl {"type":"item.started","item":{"id":"item_6","item_type":"todo_list","items":[{"text":"Record initial two-step plan now","completed":false},{"text":"Update progress to next step","completed":false}]}} {"type":"item.updated","item":{"id":"item_6","item_type":"todo_list","items":[{"text":"Record initial two-step plan now","completed":true},{"text":"Update progress to next step","completed":false}]}} {"type":"item.completed","item":{"id":"item_6","item_type":"todo_list","items":[{"text":"Record initial two-step plan now","completed":true},{"text":"Update progress to next step","completed":false}]}} ```	2025-09-26 09:35:47 -07:00
pakrym-oai	8e3a048fec	Add codex exec testing helpers (#4254 ) Add a shortcut to create working directories and run codex exec with fake server.	2025-09-25 17:12:45 -07:00
pakrym-oai	67aab04c66	[codex exec] Add item.started and support it for command execution (#4250 ) Adds a new `item.started` event to `codex exec` and implements it for command_execution item type. ```jsonl {"type":"session.created","session_id":"019982d1-75f0-7920-b051-e0d3731a5ed8"} {"type":"item.completed","item":{"id":"item_0","item_type":"reasoning","text":"Executing commands securely\n\nI'm thinking about how the default harness typically uses \"bash -lc,\" while historically \"bash\" is what we've been using. The command should be executed as a string in our CLI, so using \"bash -lc 'echo hello'\" is optimal but calling \"echo hello\" directly feels safer. The sandbox makes sure environment variables like CODEX_SANDBOX_NETWORK_DISABLED=1 are set, so I won't ask for approval. I just need to run \"echo hello\" and correctly present the output."}} {"type":"item.completed","item":{"id":"item_1","item_type":"reasoning","text":"Preparing for tool calls\n\nI realize that I need to include a preamble before making any tool calls. So, I'll first state the preamble in the commentary channel, then proceed with the tool call. After that, I need to present the final message along with the output. It's possible that the CLI will show the output inline, but I must ensure that I present the result clearly regardless. Let's move forward and get this organized!"}} {"type":"item.completed","item":{"id":"item_2","item_type":"assistant_message","text":"Running `echo` to confirm shell access and print output."}} {"type":"item.started","item":{"id":"item_3","item_type":"command_execution","command":"bash -lc echo hello","aggregated_output":"","exit_code":null,"status":"in_progress"}} {"type":"item.completed","item":{"id":"item_3","item_type":"command_execution","command":"bash -lc echo hello","aggregated_output":"hello\n","exit_code":0,"status":"completed"}} {"type":"item.completed","item":{"id":"item_4","item_type":"assistant_message","text":"hello"}} ```	2025-09-25 22:25:02 +00:00
Jeremy Rose	4a5f05c136	make tests pass cleanly in sandbox (#4067 ) This changes the reqwest client used in tests to be sandbox-friendly, and skips a bunch of other tests that don't work inside the sandbox/without network.	2025-09-25 13:11:14 -07:00
pakrym-oai	344d4a1d68	Add explicit codex exec events (#4177 ) This pull request add a new experimental format of JSON output. You can try it using `codex exec --experimental-json`. Design takes a lot of inspiration from Responses API items and stream format. # Session and items Each invocation of `codex exec` starts or resumes a session. Session contains multiple high-level item types: 1. Assistant message 2. Assistant thinking 3. Command execution 4. File changes 5. To-do lists 6. etc. # Events Session and items are going through their life cycles which is represented by events. Session is `session.created` or `session.resumed` Items are `item.added`, `item.updated`, `item.completed`, `item.require_approval` (or other item types like `item.output_delta` when we need streaming). So a typical session can look like: <details> ``` { "type": "session.created", "session_id": "01997dac-9581-7de3-b6a0-1df8256f2752" } { "type": "item.completed", "item": { "id": "itm_0", "item_type": "assistant_message", "text": "I’ll locate the top-level README and remove its first line. Then I’ll show a quick summary of what changed." } } { "type": "item.completed", "item": { "id": "itm_1", "item_type": "command_execution", "command": "bash -lc ls -la \| sed -n '1,200p'", "aggregated_output": "pyenv: cannot rehash: /Users/pakrym/.pyenv/shims isn't writable\ntotal 192\ndrwxr-xr-x@ 33 pakrym staff 1056 Sep 24 14:36 .\ndrwxr-xr-x 41 pakrym staff 1312 Sep 24 09:17 ..\n-rw-r--r--@ 1 pakrym staff 6 Jul 9 16:16 .codespellignore\n-rw-r--r--@ 1 pakrym staff 258 Aug 13 09:40 .codespellrc\ndrwxr-xr-x@ 5 pakrym staff 160 Jul 23 08:26 .devcontainer\n-rw-r--r--@ 1 pakrym staff 6148 Jul 22 10:03 .DS_Store\ndrwxr-xr-x@ 15 pakrym staff 480 Sep 24 14:38 .git\ndrwxr-xr-x@ 12 pakrym staff 384 Sep 2 16:00 .github\n-rw-r--r--@ 1 pakrym staff 778 Jul 9 16:16 .gitignore\ndrwxr-xr-x@ 3 pakrym staff 96 Aug 11 09:37 .husky\n-rw-r--r--@ 1 pakrym staff 104 Jul 9 16:16 .npmrc\n-rw-r--r--@ 1 pakrym staff 96 Sep 2 08:52 .prettierignore\n-rw-r--r--@ 1 pakrym staff 170 Jul 9 16:16 .prettierrc.toml\ndrwxr-xr-x@ 5 pakrym staff 160 Sep 14 17:43 .vscode\ndrwxr-xr-x@ 2 pakrym staff 64 Sep 11 11:37 2025-09-11\n-rw-r--r--@ 1 pakrym staff 5505 Sep 18 09:28 AGENTS.md\n-rw-r--r--@ 1 pakrym staff 92 Sep 2 08:52 CHANGELOG.md\n-rw-r--r--@ 1 pakrym staff 1145 Jul 9 16:16 cliff.toml\ndrwxr-xr-x@ 11 pakrym staff 352 Sep 24 13:03 codex-cli\ndrwxr-xr-x@ 38 pakrym staff 1216 Sep 24 14:38 codex-rs\ndrwxr-xr-x@ 18 pakrym staff 576 Sep 23 11:01 docs\n-rw-r--r--@ 1 pakrym staff 2038 Jul 9 16:16 flake.lock\n-rw-r--r--@ 1 pakrym staff 1434 Jul 9 16:16 flake.nix\n-rw-r--r--@ 1 pakrym staff 10926 Jul 9 16:16 LICENSE\ndrwxr-xr-x@ 465 pakrym staff 14880 Jul 15 07:36 node_modules\n-rw-r--r--@ 1 pakrym staff 242 Aug 5 08:25 NOTICE\n-rw-r--r--@ 1 pakrym staff 578 Aug 14 12:31 package.json\n-rw-r--r--@ 1 pakrym staff 498 Aug 11 09:37 pnpm-lock.yaml\n-rw-r--r--@ 1 pakrym staff 58 Aug 11 09:37 pnpm-workspace.yaml\n-rw-r--r--@ 1 pakrym staff 2402 Jul 9 16:16 PNPM.md\n-rw-r--r--@ 1 pakrym staff 4393 Sep 12 14:36 README.md\ndrwxr-xr-x@ 4 pakrym staff 128 Sep 18 09:28 scripts\ndrwxr-xr-x@ 2 pakrym staff 64 Sep 11 11:34 tmp\n", "exit_code": 0, "status": "completed" } } { "type": "item.completed", "item": { "id": "itm_2", "item_type": "reasoning", "text": "Reviewing README.md file\n\nI've located the README.md file at the root, and it’s 4393 bytes. Now, I need to remove the first line, but first, I should check its content to make sure I’m patching it correctly. I’ll use sed to display the first 20 lines. By reviewing those lines, I can determine exactly what needs to be removed before I proceed with the editing. Let's do this carefully!" } } { "type": "item.completed", "item": { "id": "itm_3", "item_type": "command_execution", "command": "bash -lc sed -n '1,40p' README.md", "aggregated_output": "<h1 align=\"center\">OpenAI Codex CLI</h1>\n\n<p align=\"center\"><code>npm i -g @openai/codex</code><br />or <code>brew install codex</code></p>\n\n<p align=\"center\"><strong>Codex CLI</strong> is a coding agent from OpenAI that runs locally on your computer.\n</br>\n</br>If you want Codex in your code editor (VS Code, Cursor, Windsurf), <a href=\"https://developers.openai.com/codex/ide\">install in your IDE</a>\n</br>If you are looking for the <em>cloud-based agent</em> from OpenAI, <strong>Codex Web</strong>, go to <a href=\"https://chatgpt.com/codex\">chatgpt.com/codex</a></p>\n\n<p align=\"center\">\n <img src=\"./.github/codex-cli-splash.png\" alt=\"Codex CLI splash\" width=\"80%\" />\n </p>\n\n---\n\n## Quickstart\n\n### Installing and running Codex CLI\n\nInstall globally with your preferred package manager. If you use npm:\n\n```shell\nnpm install -g @openai/codex\n```\n\nAlternatively, if you use Homebrew:\n\n```shell\nbrew install codex\n```\n\nThen simply run `codex` to get started:\n\n```shell\ncodex\n```\n\n<details>\n<summary>You can also go to the <a href=\"https://github.com/openai/codex/releases/latest\">latest GitHub Release</a> and download the appropriate binary for your platform.</summary>\n\n", "exit_code": 0, "status": "completed" } } { "type": "item.completed", "item": { "id": "itm_4", "item_type": "reasoning", "text": "Updating README file\n\nThe user wants to update the README by removing the first line, which is an HTML h1 tag. I'll proceed to delete that line while keeping the rest intact. To do this, I need to use the apply_patch tool and create a patch in the diff format. I’ll take care to include context lines around the change. After removing the header, I’m considering whether to delete the following blank line as well to keep it tidy." } } { "type": "item.completed", "item": { "id": "itm_5", "item_type": "reasoning", "text": "Crafting the patch\n\nThe user has asked to remove just the first line, without mentioning the blank lines. Technically, leaving a blank line at the start is acceptable, but it's not ideal. Ideally, I'd prefer to remove the first line while maintaining the subsequent blank line. However, if I can also remove that blank line, it could tidy things up even more. Still, since the user's request is strictly to remove only the first line, I'll stick to that and craft the patch accordingly." } } { "type": "item.completed", "item": { "id": "itm_6", "item_type": "assistant_message", "text": "Applying the change by removing the first line from `README.md`." } } { "type": "item.completed", "item": { "id": "itm_7", "item_type": "file_change", "changes": [ { "path": "/Users/pakrym/code/codex/README.md", "kind": "update" } ], "status": "completed" } } { "type": "item.completed", "item": { "id": "itm_8", "item_type": "reasoning", "text": "Validating README Update\n\nI updated the README.md and now I need to validate that the first line has been successfully removed. To do this, I'll check the first 10 lines of the document to ensure everything looks correct. It’s always good to double-check these kinds of changes, so I’m going to take that extra step to make sure the update was executed properly. Let's see what those initial lines say!" } } { "type": "item.completed", "item": { "id": "itm_9", "item_type": "command_execution", "command": "bash -lc sed -n '1,20p' README.md", "aggregated_output": "\n<p align=\"center\"><code>npm i -g @openai/codex</code><br />or <code>brew install codex</code></p>\n\n<p align=\"center\"><strong>Codex CLI</strong> is a coding agent from OpenAI that runs locally on your computer.\n</br>\n</br>If you want Codex in your code editor (VS Code, Cursor, Windsurf), <a href=\"https://developers.openai.com/codex/ide\">install in your IDE</a>\n</br>If you are looking for the <em>cloud-based agent</em> from OpenAI, <strong>Codex Web</strong>, go to <a href=\"https://chatgpt.com/codex\">chatgpt.com/codex</a></p>\n\n<p align=\"center\">\n <img src=\"./.github/codex-cli-splash.png\" alt=\"Codex CLI splash\" width=\"80%\" />\n </p>\n\n---\n\n## Quickstart\n\n### Installing and running Codex CLI\n\nInstall globally with your preferred package manager. If you use npm:\n\n", "exit_code": 0, "status": "completed" } } { "type": "item.completed", "item": { "id": "itm_10", "item_type": "assistant_message", "text": "- Updated `README.md:1` by removing the opening H1 line.\n- The README now starts with the install instructions paragraph.\n\nWant me to also remove the leading blank line at the top?" } } ``` </details> The idea is to give users fully formatted items they can use directly in their rendering/application logic and avoid having them building up items manually based on events (unless they want to for streaming). This PR implements only the `item.completed` payload for some event types, more event types and item types to come. --------- Co-authored-by: Michael Bolin <mbolin@openai.com>	2025-09-25 17:47:09 +00:00
Dylan	594248f415	[exec] add include-plan-tool flag and print it nicely (#3461 ) ### Summary Sometimes in exec runs, we want to allow the model to use the `update_plan` tool, but that's not easily configurable. This change adds a feature flag for this, and formats the output so it's human-readable ## Test Plan <img width="1280" height="354" alt="Screenshot 2025-09-11 at 12 39 44 AM" src="https://github.com/user-attachments/assets/72e11070-fb98-47f5-a784-5123ca7333d9" />	2025-09-23 16:50:59 -07:00
pakrym-oai	fdb8dadcae	Add exec output-schema parameter (#4079 ) Adds structured output to `exec` via the `--structured-output` parameter.	2025-09-23 13:59:16 -07:00
jif-oai	be366a31ab	chore: clippy on redundant closure (#4058 ) Add redundant closure clippy rules and let Codex fix it by minimising FQP	2025-09-22 19:30:16 +00:00
jif-oai	e5fe50d3ce	chore: unify cargo versions (#4044 ) Unify cargo versions at root	2025-09-22 16:47:01 +00:00
pakrym-oai	14a115d488	Add non_sandbox_test helper (#3880 ) Makes tests shorter	2025-09-22 14:50:41 +00:00
pakrym-oai	9b18875a42	Use helpers instead of fixtures (#3888 ) Move to using test helper method everywhere.	2025-09-19 06:46:25 -07:00
Michael Bolin	8595237505	fix: ensure cwd for conversation and sandbox are separate concerns (#3874 ) Previous to this PR, both of these functions take a single `cwd`: `71038381aa/codex-rs/core/src/seatbelt.rs (L19-L25)` `71038381aa/codex-rs/core/src/landlock.rs (L16-L23)` whereas `cwd` and `sandbox_cwd` should be set independently (fixed in this PR). Added `sandbox_distinguishes_command_and_policy_cwds()` to `codex-rs/exec/tests/suite/sandbox.rs` to verify this.	2025-09-18 14:37:06 -07:00
dedrisian-oai	62258df92f	feat: /review (#3774 ) Adds `/review` action in TUI <img width="637" height="370" alt="Screenshot 2025-09-17 at 12 41 19 AM" src="https://github.com/user-attachments/assets/b1979a6e-844a-4b97-ab20-107c185aec1d" />	2025-09-18 14:14:16 -07:00
dependabot[bot]	fdf4a68646	chore(deps): bump tracing-subscriber from 0.3.19 to 0.3.20 in /codex-rs (#3620 ) Bumps [tracing-subscriber](https://github.com/tokio-rs/tracing) from 0.3.19 to 0.3.20. <details> <summary>Release notes</summary> <p><em>Sourced from <a href="https://github.com/tokio-rs/tracing/releases">tracing-subscriber's releases</a>.</em></p> <blockquote> <h2>tracing-subscriber 0.3.20</h2> <p><strong>Security Fix</strong>: ANSI Escape Sequence Injection (CVE-TBD)</p> <h2>Impact</h2> <p>Previous versions of tracing-subscriber were vulnerable to ANSI escape sequence injection attacks. Untrusted user input containing ANSI escape sequences could be injected into terminal output when logged, potentially allowing attackers to:</p> <ul> <li>Manipulate terminal title bars</li> <li>Clear screens or modify terminal display</li> <li>Potentially mislead users through terminal manipulation</li> </ul> <p>In isolation, impact is minimal, however security issues have been found in terminal emulators that enabled an attacker to use ANSI escape sequences via logs to exploit vulnerabilities in the terminal emulator.</p> <h2>Solution</h2> <p>Version 0.3.20 fixes this vulnerability by escaping ANSI control characters in when writing events to destinations that may be printed to the terminal.</p> <h2>Affected Versions</h2> <p>All versions of tracing-subscriber prior to 0.3.20 are affected by this vulnerability.</p> <h2>Recommendations</h2> <p>Immediate Action Required: We recommend upgrading to tracing-subscriber 0.3.20 immediately, especially if your application:</p> <ul> <li>Logs user-provided input (form data, HTTP headers, query parameters, etc.)</li> <li>Runs in environments where terminal output is displayed to users</li> </ul> <h2>Migration</h2> <p>This is a patch release with no breaking API changes. Simply update your Cargo.toml:</p> <pre lang="toml"><code>[dependencies] tracing-subscriber = "0.3.20" </code></pre> <h2>Acknowledgments</h2> <p>We would like to thank <a href="http://github.com/zefr0x">zefr0x</a> who responsibly reported the issue at <code>security@tokio.rs</code>.</p> <p>If you believe you have found a security vulnerability in any tokio-rs project, please email us at <code>security@tokio.rs</code>.</p> </blockquote> </details> <details> <summary>Commits</summary> <ul> <li><a href="`4c52ca5266`"><code>4c52ca5</code></a> fmt: fix ANSI escape sequence injection vulnerability (<a href="https://redirect.github.com/tokio-rs/tracing/issues/3368">#3368</a>)</li> <li><a href="`f71cebe41e`"><code>f71cebe</code></a> subscriber: impl Clone for EnvFilter (<a href="https://redirect.github.com/tokio-rs/tracing/issues/3360">#3360</a>)</li> <li><a href="`3a1f571102`"><code>3a1f571</code></a> Fix CI (<a href="https://redirect.github.com/tokio-rs/tracing/issues/3361">#3361</a>)</li> <li><a href="`e63ef57f3d`"><code>e63ef57</code></a> chore: prepare tracing-attributes 0.1.30 (<a href="https://redirect.github.com/tokio-rs/tracing/issues/3316">#3316</a>)</li> <li><a href="`6e59a13b1a`"><code>6e59a13</code></a> attributes: fix tracing::instrument regression around shadowing (<a href="https://redirect.github.com/tokio-rs/tracing/issues/3311">#3311</a>)</li> <li><a href="`e4df761275`"><code>e4df761</code></a> tracing: update core to 0.1.34 and attributes to 0.1.29 (<a href="https://redirect.github.com/tokio-rs/tracing/issues/3305">#3305</a>)</li> <li><a href="`643f392ebb`"><code>643f392</code></a> chore: prepare tracing-attributes 0.1.29 (<a href="https://redirect.github.com/tokio-rs/tracing/issues/3304">#3304</a>)</li> <li><a href="`d08e7a6eea`"><code>d08e7a6</code></a> chore: prepare tracing-core 0.1.34 (<a href="https://redirect.github.com/tokio-rs/tracing/issues/3302">#3302</a>)</li> <li><a href="`6e70c571d3`"><code>6e70c57</code></a> tracing-subscriber: count numbers of enters in <code>Timings</code> (<a href="https://redirect.github.com/tokio-rs/tracing/issues/2944">#2944</a>)</li> <li><a href="`c01d4fd9de`"><code>c01d4fd</code></a> fix docs and enable CI on <code>main</code> branch (<a href="https://redirect.github.com/tokio-rs/tracing/issues/3295">#3295</a>)</li> <li>Additional commits viewable in <a href="https://github.com/tokio-rs/tracing/compare/tracing-subscriber-0.3.19...tracing-subscriber-0.3.20">compare view</a></li> </ul> </details> <br /> [![Dependabot compatibility score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=tracing-subscriber&package-manager=cargo&previous-version=0.3.19&new-version=0.3.20)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores) Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`. [//]: # (dependabot-automerge-start) [//]: # (dependabot-automerge-end) --- <details> <summary>Dependabot commands and options</summary> <br /> You can trigger Dependabot actions by commenting on this PR: - `@dependabot rebase` will rebase this PR - `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it - `@dependabot merge` will merge this PR after your CI passes on it - `@dependabot squash and merge` will squash and merge this PR after your CI passes on it - `@dependabot cancel merge` will cancel a previously requested merge and block automerging - `@dependabot reopen` will reopen this PR if it is closed - `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually - `@dependabot show <dependency name> ignore conditions` will show all of the ignore conditions of the specified dependency - `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself) </details> Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2025-09-15 00:51:33 -07:00
Ahmed Ibrahim	a30e5e40ee	enable-resume (#3537 ) Adding the ability to resume conversations. we have one verb `resume`. Behavior: `tui`: `codex resume`: opens session picker `codex resume --last`: continue last message `codex resume <session id>`: continue conversation with `session id` `exec`: `codex resume --last`: continue last conversation `codex resume <session id>`: continue conversation with `session id` Implementation: - I added a function to find the path in `~/.codex/sessions/` with a `UUID`. This is helpful in resuming with session id. - Added the above mentioned flags - Added lots of testing	2025-09-14 19:33:19 -04:00
dedrisian-oai	90a0fd342f	Review Mode (Core) (#3401 ) ## 📝 Review Mode -- Core This PR introduces the Core implementation for Review mode: - New op `Op::Review { prompt: String }:` spawns a child review task with isolated context, a review‑specific system prompt, and a `Config.review_model`. - `EnteredReviewMode`: emitted when the child review session starts. Every event from this point onwards reflects the review session. - `ExitedReviewMode(Option<ReviewOutputEvent>)`: emitted when the review finishes or is interrupted, with optional structured findings: ```json { "findings": [ { "title": "<≤ 80 chars, imperative>", "body": "<valid Markdown explaining why this is a problem; cite files/lines/functions>", "confidence_score": <float 0.0-1.0>, "priority": <int 0-3>, "code_location": { "absolute_file_path": "<file path>", "line_range": {"start": <int>, "end": <int>} } } ], "overall_correctness": "patch is correct" \| "patch is incorrect", "overall_explanation": "<1-3 sentence explanation justifying the overall_correctness verdict>", "overall_confidence_score": <float 0.0-1.0> } ``` ## Questions ### Why separate out its own message history? We want the review thread to match the training of our review models as much as possible -- that means using a custom prompt, removing user instructions, and starting a clean chat history. We also want to make sure the review thread doesn't leak into the parent thread. ### Why do this as a mode, vs. sub-agents? 1. We want review to be a synchronous task, so it's fine for now to do a bespoke implementation. 2. We're still unclear about the final structure for sub-agents. We'd prefer to land this quickly and then refactor into sub-agents without rushing that implementation.	2025-09-12 23:25:10 +00:00
Michael Bolin	9bbeb75361	feat: include reasoning_effort in NewConversationResponse (#3506 ) `ClientRequest::NewConversation` picks up the reasoning level from the user's defaults in `config.toml`, so it should be reported in `NewConversationResponse`.	2025-09-11 21:04:40 -07:00
Michael Bolin	bec51f6c05	chore: enable clippy::redundant_clone (#3489 ) Created this PR by: - adding `redundant_clone` to `[workspace.lints.clippy]` in `cargo-rs/Cargol.toml` - running `cargo clippy --tests --fix` - running `just fmt` Though I had to clean up one instance of the following that resulted: ```rust let codex = codex; ```	2025-09-11 11:59:37 -07:00
Eric Traut	e13b35ecb0	Simplify auth flow and reconcile differences between ChatGPT and API Key auth (#3189 ) This PR does the following: * Adds the ability to paste or type an API key. * Removes the `preferred_auth_method` config option. The last login method is always persisted in auth.json, so this isn't needed. * If OPENAI_API_KEY env variable is defined, the value is used to prepopulate the new UI. The env variable is otherwise ignored by the CLI. * Adds a new MCP server entry point "login_api_key" so we can implement this same API key behavior for the VS Code extension. <img width="473" height="140" alt="Screenshot 2025-09-04 at 3 51 04 PM" src="https://github.com/user-attachments/assets/c11bbd5b-8a4d-4d71-90fd-34130460f9d9" /> <img width="726" height="254" alt="Screenshot 2025-09-04 at 3 51 32 PM" src="https://github.com/user-attachments/assets/6cc76b34-309a-4387-acbc-15ee5c756db9" />	2025-09-11 09:16:34 -07:00
Ahmed Ibrahim	162e1235a8	Change forking to read the rollout from file (#3440 ) This PR changes get history op to get path. Then, forking will use a path. This will help us have one unified codepath for resuming/forking conversations. Will also help in having rollout history in order. It also fixes a bug where you won't see the UI when resuming after forking.	2025-09-10 17:42:54 -07:00
Gabriel Peal	5eab4c7ab4	Replace config.responses_originator_header_internal_override with CODEX_INTERNAL_ORIGINATOR_OVERRIDE_ENV_VAR (#3388 ) The previous config approach had a few issues: 1. It is part of the config but not designed to be used externally 2. It had to be wired through many places (look at the +/- on this PR 3. It wasn't guaranteed to be set consistently everywhere because we don't have a super well defined way that configs stack. For example, the extension would configure during newConversation but anything that happened outside of that (like login) wouldn't get it. This env var approach is cleaner and also creates one less thing we have to deal with when coming up with a better holistic story around configs. One downside is that I removed the unit test testing for the override because I don't want to deal with setting the global env or spawning child processes and figuring out how to introspect their originator header. The new code is sufficiently simple and I tested it e2e that I feel as if this is still worth it.	2025-09-09 17:23:23 -04:00
Michael Bolin	2a76a08a9e	fix: include rollout_path in NewConversationResponse (#3352 ) Adding the `rollout_path` to the `NewConversationResponse` makes it so a client can perform subsequent operations on a `(ConversationId, PathBuf)` pair. #3353 will introduce support for `ArchiveConversation`. --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/3352). * #3353 * __->__ #3352	2025-09-09 00:11:48 -07:00
jif-oai	a9c68ea270	feat: Run cargo shear during CI (#3338 ) Run cargo shear as part of the CI to ensure no unused dependencies	2025-09-09 01:05:08 +00:00
Justin Lebar	18330c2362	Format large numbers in a more readable way. (#2046 ) - In the bottom line of the TUI, print the number of tokens to 3 sigfigs with an SI suffix, e.g. "1.23K". - Elsewhere where we print a number, I figure it's worthwhile to print the exact number, because e.g. it's a summary of your session. Here we print the numbers comma-separated.	2025-09-08 21:48:48 +00:00
Gabriel Peal	c8fab51372	Use ConversationId instead of raw Uuids (#3282 ) We're trying to migrate from `session_id: Uuid` to `conversation_id: ConversationId`. Not only does this give us more type safety but it unifies our terminology across Codex and with the implementation of session resuming, a conversation (which can span multiple sessions) is more appropriate. I started this impl on https://github.com/openai/codex/pull/3219 as part of getting resume working in the extension but it's big enough that it should be broken out.	2025-09-07 23:22:25 -04:00
pakrym-oai	0269096229	Move token usage/context information to session level (#3221 ) Move context information into the main loop so it can be used to interrupt the loop or start auto-compaction.	2025-09-06 15:19:23 +00:00
pakrym-oai	5775174ec2	Never store requests (#3212 ) When item ids are sent to Responses API it will load them from the database ignoring the provided values. This adds extra latency. Not having the mode to store requests also allows us to simplify the code. ## Breaking change The `disable_response_storage` configuration option is removed.	2025-09-05 10:41:47 -07:00
Ahmed Ibrahim	2b96f9f569	Dividing UserMsgs into categories to send it back to the tui (#3127 ) This PR does the following: - divides user msgs into 3 categories: plain, user instructions, and environment context - Centralizes adding user instructions and environment context to a degree - Improve the integration testing Building on top of #3123 Specifically this [comment](https://github.com/openai/codex/pull/3123#discussion_r2319885089). We need to send the user message while ignoring the User Instructions and Environment Context we attach.	2025-09-04 05:34:50 +00:00
Ahmed Ibrahim	f2036572b6	Replay EventMsgs from Response Items when resuming a session with history. (#3123 ) ### Overview This PR introduces the following changes: 1. Adds a unified mechanism to convert ResponseItem into EventMsg. 2. Ensures that when a session is initialized with initial history, a vector of EventMsg is sent along with the session configuration. This allows clients to re-render the UI accordingly. 3. Added integration testing ### Caveats This implementation does not send every EventMsg that was previously dispatched to clients. The excluded events fall into two categories: • “Arguably” rolled-out events Examples include tool calls and apply-patch calls. While these events are conceptually rolled out, we currently only roll out ResponseItems. These events are already being handled elsewhere and transformed into EventMsg before being sent. • Non-rolled-out events Certain events such as TurnDiff, Error, and TokenCount are not rolled out at all. ### Future Directions At present, resuming a session involves maintaining two states: • UI State Clients can replay most of the important UI from the provided EventMsg history. • Model State The model receives the complete session history to reconstruct its internal state. This design provides a solid foundation. If, in the future, more precise UI reconstruction is needed, we have two potential paths: 1. Introduce a third data structure that allows us to derive both ResponseItems and EventMsgs. 2. Clearly divide responsibilities: the core system ensures the integrity of the model state, while clients are responsible for reconstructing the UI.	2025-09-04 04:47:00 +00:00

1 2 3 4

154 Commits