valknar/llmx - llmx - dev.pivoine.art

Author	SHA1	Message	Date
Ahmed Ibrahim	f59978ed3d	Handle cancelling/aborting while processing a turn (#5543 ) Currently we collect all turn items in a vector, then add them to the history on success. This results in losing those items on errors, including aborting with `ctrl+c`. This PR: - Adds the ability for the tool call to handle cancellation - Bubbles the turn items up to where we are recording this info Admittedly, this logic is ad hoc and doesn't handle a lot of error edge cases. The right thing to do is to record to the history on the spot as `items`/`tool calls output` come. However, this isn't possible because different `task_kind`s have different `conversation_histories`. The `try_run_turn` has no idea which thread we are using. We also cannot pass an `arc` to the `conversation_histories` because it's a private element of `state`. That said, `abort` is the most common case and we should cover it until we remove `task kind`	2025-10-23 08:47:10 -07:00
jif-oai	8e291a1706	chore: clean `handle_container_exec_with_params` (#5516 ) Drop `handle_container_exec_with_params` to have a simpler and more straightforward execution path	2025-10-23 09:24:01 +01:00
Ahmed Ibrahim	273819aaae	Move changing turn input functionalities to `ConversationHistory` (#5473 ) We are doing some ad-hoc logic while dealing with conversation history. Ideally, we shouldn't mutate `vec[responseitem]` manually at all and should depend on `ConversationHistory` for those changes. Those changes are: - Adding input to the history - Removing items from the history - Correcting history I am also adding some `error` logs for cases we shouldn't ideally face. For example, we shouldn't be missing `toolcalls` or `outputs`. We shouldn't hit `ContextWindowExceeded` while performing `compact` This refactor will give us granular control over our context management.	2025-10-22 13:08:46 -07:00
Gabriel Peal	4cd6b01494	[MCP] Remove the legacy stdio client in favor of rmcp (#5529 ) I haven't heard of any issues with the stdio rmcp client so let's remove the legacy one and default to the new one. Any code changes are moving code from the adapter inline but there should be no meaningful functionality changes.	2025-10-22 12:06:59 -07:00
pakrym-oai	3c90728a29	Add new thread items and rewire event parsing to use them (#5418 ) 1. Adds AgentMessage, Reasoning, WebSearch items. 2. Switches the ResponseItem parsing to use new items and then also emit 3. Removes user-item kind and filters out "special" (environment) user items when returning to clients.	2025-10-22 10:14:50 -07:00
pakrym-oai	1b10a3a1b2	Enable plan tool by default (#5384 ) ## Summary - make the plan tool available by default by removing the feature flag and always registering the handler - drop plan-tool CLI and API toggles across the exec, TUI, MCP server, and app server code paths - update tests and configs to reflect the always-on plan tool and guard workspace restriction tests against env leakage ## Testing Manually tested the extension. ------ https://chatgpt.com/codex/tasks/task_i_68f67a3ff2d083209562a773f814c1f9	2025-10-21 16:25:05 +00:00
pakrym-oai	789e65b9d2	Pass TurnContext around instead of sub_id (#5421 ) Today `sub_id` is an ID of a single incoming Codex Op submission. We then associate all events triggered by this operation using the same `sub_id`. At the same time we are also creating a TurnContext per submission and we'd like to start associating some events (item added/item completed) with an entire turn instead of just the operation that started it. Using turn context when sending events gives us flexibility to change the notification scheme.	2025-10-21 08:04:16 -07:00
Thibault Sottiaux	7fc01c6e9b	feat: include cwd in notify payload (#5415 ) Expose the session cwd in the notify payload and update docs so scripts and extensions receive the real project path; users get accurate project-aware notifications in CLI and VS Code. Fixes #5387	2025-10-20 23:53:03 +00:00
Gabriel Peal	ef806456e4	[MCP] Dedicated error message for GitHub MCPs missing a personal access token (#5393 ) Because the GitHub MCP is one of the most popular MCPs and it confusingly doesn't support OAuth, we should make it more clear how to make it work so people don't think Codex is broken.	2025-10-20 16:23:26 -07:00
pakrym-oai	9c903c4716	Add ItemStarted/ItemCompleted events for UserInputItem (#5306 ) Adds a new ItemStarted event and delivers UserMessage as the first item type (more to come). Renames `InputItem` to `UserInput` considering we're using the `Item` suffix for actual items.	2025-10-20 13:34:44 -07:00
jif-oai	5e4f3bbb0b	chore: rework tools execution workflow (#5278 ) Re-work the tool execution flow. Read `orchestrator.rs` to understand the structure	2025-10-20 20:57:37 +01:00
Ahmed Ibrahim	049a61bcfc	Auto compact at ~90% (#5292 ) Users now hit a window exceeded limit and they usually don't know what to do. This starts auto compact at ~90% of the window.	2025-10-20 11:29:49 -07:00
pakrym-oai	2287d2afde	Create independent TurnContexts (#5308 ) The goal of this change: 1. Unify user input and user turn implementation. 2. Have a single place where turn/session setting overrides are applied. 3. Have a single place where turn context is created. 4. Create TurnContext only for actual turn and have a separate structure for current session settings (reuse ConfigureSession)	2025-10-18 17:43:08 -07:00
Gabriel Peal	41900e9d0f	[MCP] When MCP auth expires, prompt the user to log in again. (#5300 ) Similar to https://github.com/openai/codex/pull/5193 but catches a case where the user _has_ authenticated but the auth expired or was revoked. Before: <img width="2976" height="632" alt="CleanShot 2025-10-17 at 14 28 11" src="https://github.com/user-attachments/assets/7c1bd11d-c075-46cb-9298-48891eaa77fe" /> After: <img width="591" height="283" alt="image" src="https://github.com/user-attachments/assets/fc14e08c-1a33-4077-8757-ff4ed3f00f8f" />	2025-10-17 18:16:22 -04:00
pakrym-oai	c03e31ecf5	Support graceful agent interruption (#5287 )	2025-10-17 18:52:57 +00:00
jif-oai	6915ba2100	feat: better UX during refusal (#5260 ) <img width="568" height="169" alt="Screenshot 2025-10-16 at 18 28 05" src="https://github.com/user-attachments/assets/f42e8d6d-b7de-4948-b291-a5fbb50b1312" />	2025-10-17 11:06:55 +02:00
Gabriel Peal	40fba1bb4c	[MCP] Add support for resources (#5239 ) This PR adds support for [MCP resources](https://modelcontextprotocol.io/specification/2025-06-18/server/resources) by adding three new tools for the model: 1. `list_resources` 2. `list_resource_templates` 3. `read_resource` These 3 tools correspond to the [three primary MCP resource protocol messages](https://modelcontextprotocol.io/specification/2025-06-18/server/resources#protocol-messages). Example of listing and reading a GitHub resource template <img width="2984" height="804" alt="CleanShot 2025-10-15 at 17 31 10" src="https://github.com/user-attachments/assets/89b7f215-2e2a-41c5-90dd-b932ac84a585" /> `/mcp` with Figma configured <img width="2984" height="442" alt="CleanShot 2025-10-15 at 18 29 35" src="https://github.com/user-attachments/assets/a7578080-2ed2-4c59-b9b4-d8461f90d8ee" /> Fixes #4956	2025-10-17 01:05:15 -04:00
Anton Panasenko	c146585cdb	[codex][otel] propagate user email in otel events (#5223 ) include user email into otel events for proper user-level attribution in case of workspace setup	2025-10-15 17:53:33 -07:00
Michael Bolin	995f5c3614	feat: add Vec<ParsedCommand> to ExecApprovalRequestEvent (#5222 ) This adds `parsed_cmd: Vec<ParsedCommand>` to `ExecApprovalRequestEvent` in the core protocol (`protocol/src/protocol.rs`), which is also what this field is named on `ExecCommandBeginEvent`. Honestly, I don't love the name (it sounds like a single command, but it is actually a list of them), but I don't want to get distracted by a naming discussion right now. This also adds `parsed_cmd` to `ExecCommandApprovalParams` in `codex-rs/app-server-protocol/src/protocol.rs`, so it will be available via `codex app-server`, as well. For consistency, I also updated `ExecApprovalElicitRequestParams` in `codex-rs/mcp-server/src/exec_approval.rs` to include this field under the name `codex_parsed_cmd`, as that struct already has a number of special `codex_*` fields. Note this is the code for when Codex is used as an MCP _server_ and therefore has to conform to the official spec for an MCP elicitation type.	2025-10-15 13:58:40 -07:00
Michael Bolin	f38ad65254	chore: standardize on ParsedCommand from codex_protocol (#5218 ) Note these two types were identical, so it seems clear to standardize on the one in `codex_protocol` and eliminate the `Into` stuff. --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/5218). * #5222 * __->__ #5218	2025-10-15 13:00:22 -07:00
Gabriel Peal	8a281cd1f4	[MCP] Prompt `mcp login` when adding a streamable HTTP server that supports oauth (#5193 ) 1. If Codex detects that a `codex mcp add -url …` server supports oauth, it will auto-initiate the login flow. 2. If the TUI starts and an MCP server supports oauth but isn't logged in, it will give the user an explicit warning telling them to log in.	2025-10-15 12:27:40 -04:00
Javi	13035561cd	feat: pass codex thread ID in notifier metadata (#4582 )	2025-10-14 11:55:10 -07:00
jif-oai	f7b4e29609	feat: feature flag (#4948 ) Add a proper feature flag instead of having custom flags for everything. This is just for the experimental/wip part of the code. It can be used through the CLI: ```bash codex --enable unified_exec --disable view_image_tool ``` Or in the `config.toml` ```toml # Global toggles applied to every profile unless overridden. [features] apply_patch_freeform = true view_image_tool = false ``` Follow-up: In a following PR, the goal is to have default `bundles` of features that we can associate to a model	2025-10-14 17:50:00 +00:00
jif-oai	268a10f917	feat: add header for task kind (#5142 ) Add a header in the responses API request for the task kind (compact, review, ...) for observability purpose The header name is `codex-task-type`	2025-10-14 15:17:00 +00:00
jif-oai	f98fa85b44	feat: message when stream get correctly resumed (#4988 ) <img width="366" height="109" alt="Screenshot 2025-10-09 at 17 44 16" src="https://github.com/user-attachments/assets/26bc6f60-11bc-4fc6-a1cc-430ca1203969" />	2025-10-10 09:07:14 +00:00
dedrisian-oai	4300236681	revert /name for now (#4978 ) There was a regression where we'd read entire rollout contents if there was no /name present.	2025-10-08 17:13:49 -07:00
dedrisian-oai	ec238a2c39	feat: Set chat name (#4974 ) Set chat name with `/name` so they appear in the codex resume page: https://github.com/user-attachments/assets/c0252bba-3a53-44c7-a740-f4690a3ad405	2025-10-08 16:35:35 -07:00
Gabriel Peal	3c5e12e2a4	[MCP] Add auth status to MCP servers (#4918 ) This adds a queryable auth status for MCP servers, which is useful: 1. To determine whether a streamable HTTP server supports auth or not based on whether or not it supports RFC 8414-3.2 2. To allow us to build a better user experience on top of MCP status	2025-10-08 17:37:57 -04:00
Gabriel Peal	496cb801e1	[MCP] Add the ability to explicitly specify a credentials store (#4857 ) This lets users/companies explicitly choose whether to force/disallow the keyring/fallback file storage for mcp credentials. People who develop with Codex will want to use this until we sign binaries, or else each ad-hoc debug build will require keychain access on every build. I don't love this and am open to other ideas for how to handle that. ```toml mcp_oauth_credentials_store = "auto" mcp_oauth_credentials_store = "file" mcp_oauth_credentials_store = "keyring" ``` Defaults to `auto`	2025-10-07 22:39:32 -04:00
dedrisian-oai	b016a3e7d8	Remove instruction hack for /review (#4896 ) We used to put the review prompt in the first user message as well to bypass statsig overrides, but now that's been resolved and instructions are being respected, so we're duplicating the review instructions.	2025-10-07 12:47:00 -07:00
pakrym-oai	f2555422b9	Simplify parallel (#4829 ) make tool processing return a future and then collect futures. handle cleanup on Drop	2025-10-07 10:12:38 -07:00
jif-oai	dc3c6bf62a	feat: parallel tool calls (#4663 ) Add parallel tool calls. This is configurable at model level and tool level	2025-10-05 16:10:49 +00:00
Ahmed Ibrahim	cc2f4aafd7	Add truncation hint on truncated exec output. (#4740 ) When truncating output, add a hint of the total number of lines	2025-10-05 03:29:07 +00:00
Ahmed Ibrahim	90ef94d3b3	Surface context window error to the client (#4675 ) In the past, we were treating `input exceeded context window` as a streaming error and retrying on it. Retrying on it has no point because it won't change the behavior. In this PR, we surface the error to the client without retry and also send a token count event to indicate that the context window is full. <img width="650" height="125" alt="image" src="https://github.com/user-attachments/assets/c26b1213-4c27-4bfc-90f4-51a270a3efd5" />	2025-10-05 01:40:06 +00:00
jif-oai	33d3ecbccc	chore: refactor tool handling (#4510 ) # Tool System Refactor - Centralizes tool definitions and execution in `core/src/tools/`: specs (`spec.rs`), handlers (`handlers/`), router (`router.rs`), registry/dispatch (`registry.rs`), and shared context (`context.rs`). One registry now builds the model-visible tool list and binds handlers. - Router converts model responses to tool calls; Registry dispatches with consistent telemetry via `codex-rs/otel` and unified error handling. Function, Local Shell, MCP, and experimental `unified_exec` all flow through this path; legacy shell aliases still work. - Rationale: reduce per‑tool boilerplate, keep spec/handler in sync, and make adding tools predictable and testable. Example: `read_file` - Spec: `core/src/tools/spec.rs` (see `create_read_file_tool`, registered by `build_specs`). - Handler: `core/src/tools/handlers/read_file.rs` (absolute `file_path`, 1‑indexed `offset`, `limit`, `L#: ` prefixes, safe truncation). - E2E test: `core/tests/suite/read_file.rs` validates the tool returns the requested lines. ## Next steps: - Decompose `handle_container_exec_with_params` - Add parallel tool calls	2025-10-03 13:21:06 +01:00
jif-oai	69cb72f842	chore: sandbox refactor 2 (#4653 ) Revert the revert and fix the UI issue	2025-10-03 11:17:39 +01:00
Ahmed Ibrahim	ed5d656fa8	Revert "chore: sanbox extraction" (#4626 ) Reverts openai/codex#4286	2025-10-02 21:09:21 +00:00
pakrym-oai	4c566d484a	Separate interactive and non-interactive sessions (#4612 ) Do not show exec session in VSCode/TUI selector.	2025-10-02 13:06:21 -07:00
Jeremy Rose	45936f8fbd	show "Viewed Image" when the model views an image (#4475 ) <img width="1022" height="339" alt="Screenshot 2025-09-29 at 4 22 00 PM" src="https://github.com/user-attachments/assets/12da7358-19be-4010-a71b-496ede6dfbbf" />	2025-10-02 18:36:03 +00:00
jif-oai	b8195a17e5	chore: sanbox extraction (#4286 ) # Extract and Centralize Sandboxing - Goal: Improve safety and clarity by centralizing sandbox planning and execution. - Approach: - Add planner (ExecPlan) and backend registry (Direct/Seatbelt/Linux) with run_with_plan. - Refactor codex.rs to plan-then-execute; handle failures/escalation via the plan. - Delegate apply_patch to the codex binary and run it with an empty env for determinism.	2025-10-01 12:05:12 +01:00
Michael Bolin	5881c0d6d4	fix: remove mcp-types from app server protocol (#4537 ) We continue the separation between `codex app-server` and `codex mcp-server`. In particular, we introduce a new crate, `codex-app-server-protocol`, and migrate `codex-rs/protocol/src/mcp_protocol.rs` into it, renaming it `codex-rs/app-server-protocol/src/protocol.rs`. Because `ConversationId` was defined in `mcp_protocol.rs`, we move it into its own file, `codex-rs/protocol/src/conversation_id.rs`, and because it is referenced in a ton of places, we have to touch a lot of files as part of this PR. We also decide to get away from proper JSON-RPC 2.0 semantics, so we also introduce `codex-rs/app-server-protocol/src/jsonrpc_lite.rs`, which is basically the same `JSONRPCMessage` type defined in `mcp-types` except with all of the `"jsonrpc": "2.0"` removed. Getting rid of `"jsonrpc": "2.0"` makes our serialization logic considerably simpler, as we can lean heavier on serde to serialize directly into the wire format that we use now.	2025-10-01 02:16:26 +00:00
vishnu-oai	04c1782e52	OpenTelemetry events (#2103 ) ### Title ## otel Codex can emit [OpenTelemetry](https://opentelemetry.io/) log events that describe each run: outbound API requests, streamed responses, user input, tool-approval decisions, and the result of every tool invocation. Export is disabled by default so local runs remain self-contained. Opt in by adding an `[otel]` table and choosing an exporter. ```toml [otel] environment = "staging" # defaults to "dev" exporter = "none" # defaults to "none"; set to otlp-http or otlp-grpc to send events log_user_prompt = false # defaults to false; redact prompt text unless explicitly enabled ``` Codex tags every exported event with `service.name = "codex-cli"`, the CLI version, and an `env` attribute so downstream collectors can distinguish dev/staging/prod traffic. Only telemetry produced inside the `codex_otel` crate—the events listed below—is forwarded to the exporter. ### Event catalog Every event shares a common set of metadata fields: `event.timestamp`, `conversation.id`, `app.version`, `auth_mode` (when available), `user.account_id` (when available), `terminal.type`, `model`, and `slug`. 
With OTEL enabled Codex emits the following event types (in addition to the metadata above): - `codex.api_request` - `cf_ray` (optional) - `attempt` - `duration_ms` - `http.response.status_code` (optional) - `error.message` (failures) - `codex.sse_event` - `event.kind` - `duration_ms` - `error.message` (failures) - `input_token_count` (completion only) - `output_token_count` (completion only) - `cached_token_count` (completion only, optional) - `reasoning_token_count` (completion only, optional) - `tool_token_count` (completion only) - `codex.user_prompt` - `prompt_length` - `prompt` (redacted unless `log_user_prompt = true`) - `codex.tool_decision` - `tool_name` - `call_id` - `decision` (`approved`, `approved_for_session`, `denied`, or `abort`) - `source` (`config` or `user`) - `codex.tool_result` - `tool_name` - `call_id` - `arguments` - `duration_ms` (execution time for the tool) - `success` (`"true"` or `"false"`) - `output` ### Choosing an exporter Set `otel.exporter` to control where events go: - `none` – leaves instrumentation active but skips exporting. This is the default. - `otlp-http` – posts OTLP log records to an OTLP/HTTP collector. Specify the endpoint, protocol, and headers your collector expects: ```toml [otel] exporter = { otlp-http = { endpoint = "https://otel.example.com/v1/logs", protocol = "binary", headers = { "x-otlp-api-key" = "${OTLP_TOKEN}" } }} ``` - `otlp-grpc` – streams OTLP log records over gRPC. Provide the endpoint and any metadata headers: ```toml [otel] exporter = { otlp-grpc = { endpoint = "https://otel.example.com:4317", headers = { "x-otlp-meta" = "abc123" } }} ``` If the exporter is `none` nothing is written anywhere; otherwise you must run or point to your own collector. All exporters run on a background batch worker that is flushed on shutdown. If you build Codex from source the OTEL crate is still behind an `otel` feature flag; the official prebuilt binaries ship with the feature enabled. 
When the feature is disabled the telemetry hooks become no-ops so the CLI continues to function without the extra dependencies. --------- Co-authored-by: Anton Panasenko <apanasenko@openai.com>	2025-09-29 11:30:55 -07:00
Gabriel Peal	e555a36c6a	[MCP] Introduce an experimental official rust sdk based mcp client (#4252 ) The [official Rust SDK](`57fc428c57`) has come a long way since we first started our mcp client implementation 5 months ago and, today, it is much more complete than our own stdio-only implementation. This PR introduces a new config flag `experimental_use_rmcp_client` which will use a new mcp client powered by the sdk instead of our own. To keep this PR simple, I've only implemented the same stdio MCP functionality that we had but will expand on it with future PRs. --------- Co-authored-by: pakrym-oai <pakrym@openai.com>	2025-09-26 13:13:37 -04:00
jif-oai	1fc3413a46	ref: state - 2 (#4229 ) Extract tasks into a module and start abstracting them behind a Trait (more to come on this, but each task will be tackled in a dedicated PR). The goal is to drop the ActiveTask and to have (potentially) a set of tasks during each turn	2025-09-26 13:49:08 +00:00
jif-oai	250b244ab4	ref: full state refactor (#4174 ) ## Current State Observations - `Session` currently holds many unrelated responsibilities (history, approval queues, task handles, rollout recorder, shell discovery, token tracking, etc.), making it hard to reason about ownership and lifetimes. - The anonymous `State` struct inside `codex.rs` mixes session-long data with turn-scoped queues and approval bookkeeping. - Turn execution (`run_task`) relies on ad-hoc local variables that should conceptually belong to a per-turn state object. - External modules (`codex::compact`, tests) frequently poke the raw `Session.state` mutex, which couples them to implementation details. - Interrupts, approvals, and rollout persistence all have bespoke cleanup paths, contributing to subtle bugs when a turn is aborted mid-flight. ## Desired End State - Keep a slim `Session` object that acts as the orchestrator and façade. It should expose a focused API (submit, approvals, interrupts, event emission) without storing unrelated fields directly. - Introduce a `state` module that encapsulates all mutable data structures: - `SessionState`: session-persistent data (history, approved commands, token/rate-limit info, maybe user preferences). - `ActiveTurn`: metadata for the currently running turn (sub-id, task kind, abort handle) and an `Arc<TurnState>`. - `TurnState`: all turn-scoped pieces (pending inputs, approval waiters, diff tracker, review history, auto-compact flags, last agent message, outstanding tool call bookkeeping). - Group long-lived helpers/managers into a dedicated `SessionServices` struct so `Session` does not accumulate "random" fields. - Provide clear, lock-safe APIs so other modules never touch raw mutexes. - Ensure every turn creates/drops a `TurnState` and that interrupts/finishes delegate cleanup to it.	2025-09-25 12:16:06 +02:00
pakrym-oai	addc946d13	Simplify tool implementations (#4160 ) Use Result<String, FunctionCallError> for all tool handling code and rely on error propagation instead of creating failed items everywhere.	2025-09-24 17:27:35 +00:00
Ahmed Ibrahim	8227a5ba1b	Send limits when getting rate limited (#4102 ) Users need visibility on rate limits when they are rate limited.	2025-09-23 22:56:34 +00:00
pakrym-oai	fdb8dadcae	Add exec output-schema parameter (#4079 ) Adds structured output to `exec` via the `--structured-output` parameter.	2025-09-23 13:59:16 -07:00
jif-oai	b84a920067	chore: compact do not modify instructions (#4088 ) Keep the developer instruction and insert the summarisation message as a user message instead	2025-09-23 17:59:17 +01:00
pakrym-oai	5c7d9e27b1	Add notifier tests (#4064 ) Proposal: 1. Use anyhow for tests and avoid unwrap 2. Extract a helper for starting a test instance of codex	2025-09-23 14:25:46 +00:00


222 Commits