valknar/llmx - llmx - dev.pivoine.art

Author	SHA1	Message	Date
pakrym-oai	2f6fb37d72	Support CODEX_API_KEY for codex exec (#4615 ) Allows setting the API key per invocation of `codex exec`.	2025-10-02 09:59:45 -07:00
Gabriel Peal	35c76ad47d	fix: update the gpt-5-codex prompt to be more explicit that it should always use fenced code blocks with info tags (#4569 ) We get spurious reports that the model writes fenced code blocks without an info tag, which then causes auto-language detection in the extension to highlight the code incorrectly and show the wrong language. The model should really always include a tag when it can.	2025-10-01 22:41:56 -07:00
pakrym-oai	e899ae7d8a	Include request ID in the error message (#4572 ) To help with issue debugging.	2025-10-01 15:36:04 -07:00
iceweasel-oai	6f97ec4990	canonicalize display of Agents.md paths on Windows. (#4577 ) Canonicalize path on Windows to - remove unattractive path prefixes such as `\\?\` - simplify it (`../AGENTS.md` vs `C:\Users\iceweasel\code\coded\Agents.md`) [Screenshots: before and after]	2025-10-01 14:33:19 -07:00
Thibault Sottiaux	5d78c1edd3	Revert "chore: prompt update to enforce good usage of apply_patch" (#4576 ) Reverts openai/codex#3846	2025-10-01 20:11:36 +00:00
easong-openai	400a5a90bf	Fall back to configured instruction files if AGENTS.md isn't available (#4544 ) Allow users to configure an agents.md alternative to consume, but warn the user it may degrade model performance. Fixes #4376	2025-10-01 18:19:59 +00:00
Ahmed Ibrahim	d78d0764aa	Add Updated at time in resume picker (#4468 )	2025-10-01 10:40:43 -07:00
iceweasel-oai	dde615f482	implement command safety for PowerShell commands (#4269 ) Implement command safety for PowerShell commands on Windows. This change adds a new Windows-specific command-safety module under `codex-rs/core/src/command_safety/windows_safe_commands.rs` to strictly sanitise PowerShell invocations. Key points: - Introduce `is_safe_command_windows()` to only allow explicitly read-only PowerShell calls. - Parse and split PowerShell invocations (including inline `-Command` scripts and pipelines). - Block unsafe switches (`-File`, `-EncodedCommand`, `-ExecutionPolicy`, unknown flags, call operators, redirections, separators). - Whitelist only read-only cmdlets (`Get-ChildItem`, `Get-Content`, `Select-Object`, etc.), safe Git subcommands (`status`, `log`, `show`, `diff`, `cat-file`), and ripgrep without unsafe options. - Add comprehensive unit tests covering allowed and rejected command patterns (nested calls, side effects, chaining, redirections). This ensures Codex on Windows can safely execute discovery-only PowerShell workflows without risking destructive operations.	2025-10-01 09:56:48 -07:00
jif-oai	b8195a17e5	chore: sandbox extraction (#4286 ) # Extract and Centralize Sandboxing - Goal: Improve safety and clarity by centralizing sandbox planning and execution. - Approach: - Add planner (ExecPlan) and backend registry (Direct/Seatbelt/Linux) with run_with_plan. - Refactor codex.rs to plan-then-execute; handle failures/escalation via the plan. - Delegate apply_patch to the codex binary and run it with an empty env for determinism.	2025-10-01 12:05:12 +01:00
Michael Bolin	5881c0d6d4	fix: remove mcp-types from app server protocol (#4537 ) We continue the separation between `codex app-server` and `codex mcp-server`. In particular, we introduce a new crate, `codex-app-server-protocol`, and migrate `codex-rs/protocol/src/mcp_protocol.rs` into it, renaming it `codex-rs/app-server-protocol/src/protocol.rs`. Because `ConversationId` was defined in `mcp_protocol.rs`, we move it into its own file, `codex-rs/protocol/src/conversation_id.rs`, and because it is referenced in a ton of places, we have to touch a lot of files as part of this PR. We also decide to get away from proper JSON-RPC 2.0 semantics, so we also introduce `codex-rs/app-server-protocol/src/jsonrpc_lite.rs`, which is basically the same `JSONRPCMessage` type defined in `mcp-types` except with all of the `"jsonrpc": "2.0"` removed. Getting rid of `"jsonrpc": "2.0"` makes our serialization logic considerably simpler, as we can lean heavier on serde to serialize directly into the wire format that we use now.	2025-10-01 02:16:26 +00:00
jif-oai	f6a152848a	chore: prompt update to enforce good usage of apply_patch (#3846 ) Update prompt to prevent codex from using Python scripts or fancy commands to edit files. ## Testing: 3 scenarios have been considered: 1. Rename codex to meca_code. Proceed to the whole refactor file by file. Don't ask for approval at each step 2. Add a description to every single function you can find in the repo 3. Rewrite codex.rs in a more idiomatic way. Make sure to touch ONLY this file and that clippy does not complain at the end Before this update, about 22% (an estimate, as it is sometimes hard to find all the creative ways the model finds to edit files) of file edits were made using something other than a raw `apply_patch`. After this update, not a single edit without `apply_patch` was found. [EDIT] I did get a few `["bash", "-lc", "apply_path"]` calls when less than 10% of the context was left.	2025-09-30 10:18:59 -07:00
easong-openai	5b038135de	Add cloud tasks (#3197 ) Adds a TUI for managing, applying, and creating cloud tasks	2025-09-30 10:10:33 +00:00
pakrym-oai	c09e131653	Set originator for codex exec (#4485 ) Distinct from the main CLI.	2025-09-29 20:59:19 -07:00
Ahmed Ibrahim	16057e76b0	[Core]: add tail in the rollout data (#4461 ) This will help us show the conversation tail and last updated timestamp.	2025-09-29 14:32:26 -07:00
dedrisian-oai	83a4d4d8ed	Parse out frontmatter for custom prompts (#4456 ) [Cherry picked from https://github.com/openai/codex/pull/3565] Removes the frontmatter description/args from custom prompt files and only includes body.	2025-09-29 13:06:08 -07:00
vishnu-oai	04c1782e52	OpenTelemetry events (#2103 ) ### Title ## otel Codex can emit [OpenTelemetry](https://opentelemetry.io/) log events that describe each run: outbound API requests, streamed responses, user input, tool-approval decisions, and the result of every tool invocation. Export is disabled by default so local runs remain self-contained. Opt in by adding an `[otel]` table and choosing an exporter. ```toml [otel] environment = "staging" # defaults to "dev" exporter = "none" # defaults to "none"; set to otlp-http or otlp-grpc to send events log_user_prompt = false # defaults to false; redact prompt text unless explicitly enabled ``` Codex tags every exported event with `service.name = "codex-cli"`, the CLI version, and an `env` attribute so downstream collectors can distinguish dev/staging/prod traffic. Only telemetry produced inside the `codex_otel` crate—the events listed below—is forwarded to the exporter. ### Event catalog Every event shares a common set of metadata fields: `event.timestamp`, `conversation.id`, `app.version`, `auth_mode` (when available), `user.account_id` (when available), `terminal.type`, `model`, and `slug`. 
With OTEL enabled Codex emits the following event types (in addition to the metadata above): - `codex.api_request` - `cf_ray` (optional) - `attempt` - `duration_ms` - `http.response.status_code` (optional) - `error.message` (failures) - `codex.sse_event` - `event.kind` - `duration_ms` - `error.message` (failures) - `input_token_count` (completion only) - `output_token_count` (completion only) - `cached_token_count` (completion only, optional) - `reasoning_token_count` (completion only, optional) - `tool_token_count` (completion only) - `codex.user_prompt` - `prompt_length` - `prompt` (redacted unless `log_user_prompt = true`) - `codex.tool_decision` - `tool_name` - `call_id` - `decision` (`approved`, `approved_for_session`, `denied`, or `abort`) - `source` (`config` or `user`) - `codex.tool_result` - `tool_name` - `call_id` - `arguments` - `duration_ms` (execution time for the tool) - `success` (`"true"` or `"false"`) - `output` ### Choosing an exporter Set `otel.exporter` to control where events go: - `none` – leaves instrumentation active but skips exporting. This is the default. - `otlp-http` – posts OTLP log records to an OTLP/HTTP collector. Specify the endpoint, protocol, and headers your collector expects: ```toml [otel] exporter = { otlp-http = { endpoint = "https://otel.example.com/v1/logs", protocol = "binary", headers = { "x-otlp-api-key" = "${OTLP_TOKEN}" } }} ``` - `otlp-grpc` – streams OTLP log records over gRPC. Provide the endpoint and any metadata headers: ```toml [otel] exporter = { otlp-grpc = { endpoint = "https://otel.example.com:4317", headers = { "x-otlp-meta" = "abc123" } }} ``` If the exporter is `none` nothing is written anywhere; otherwise you must run or point to your own collector. All exporters run on a background batch worker that is flushed on shutdown. If you build Codex from source the OTEL crate is still behind an `otel` feature flag; the official prebuilt binaries ship with the feature enabled. 
When the feature is disabled the telemetry hooks become no-ops so the CLI continues to function without the extra dependencies. --------- Co-authored-by: Anton Panasenko <apanasenko@openai.com>	2025-09-29 11:30:55 -07:00
Michael Bolin	7407469791	chore: lower logging level from error to info for MCP startup (#4412 )	2025-09-28 15:13:44 -07:00
Thibault Sottiaux	d7286e9829	chore: remove model upgrade popup (#4332 )	2025-09-27 13:25:09 -07:00
Gabriel Peal	3a1be084f9	[MCP] Add experimental support for streamable HTTP MCP servers (#4317 ) This PR adds support for streamable HTTP MCP servers when `experimental_use_rmcp_client` is enabled. To set one up, simply add a new mcp server config with the url: ``` [mcp_servers.figma] url = "http://127.0.0.1:3845/mcp" ``` It also supports an optional `bearer_token` which will be provided in an authorization header. The full oauth flow is not supported yet. The config parsing will throw if it detects that the user mixed and matched config fields (like command + bearer token or url + env). The best way to review it is to review `core/src` and then `rmcp-client/src/rmcp_client.rs` first. The rest is tests and propagating the `Transport` struct around the codebase. Example with the Figma MCP: [screenshot]	2025-09-26 21:24:01 -04:00
Jeremy Rose	43b63ccae8	update composer + user message styling (#4240 ) Changes: - the composer and user messages now have a colored background that stretches the entire width of the terminal. - the prompt character was changed from a cyan `▌` to a bold `›`. - the "working" shimmer now follows the "dark gray" color of the terminal, better matching the terminal's color scheme. [Screenshots: iTerm and Terminal.app, each with dark, light, and colored backgrounds]	2025-09-26 16:35:56 -07:00
iceweasel-oai	55801700de	reject dangerous commands for AskForApproval::Never (#4307 ) If we detect a dangerous command but approval_policy is Never, simply reject the command.	2025-09-26 14:08:28 -07:00
Ahmed Ibrahim	1fba99ed85	/status followup (#4304 ) - Render `send a message to load usage data` in the beginning of the session - Render `data not available yet` if no rate limits were received - nit case - Deleted stale snapshots that were moved to `codex-rs/tui/src/status/snapshots`	2025-09-26 18:16:54 +00:00
Gabriel Peal	e555a36c6a	[MCP] Introduce an experimental official rust sdk based mcp client (#4252 ) The official Rust SDK (`57fc428c57`) has come a long way since we first started our mcp client implementation 5 months ago and, today, it is much more complete than our own stdio-only implementation. This PR introduces a new config flag `experimental_use_rmcp_client` which will use a new mcp client powered by the sdk instead of our own. To keep this PR simple, I've only implemented the same stdio MCP functionality that we had but will expand on it with future PRs. --------- Co-authored-by: pakrym-oai <pakrym@openai.com>	2025-09-26 13:13:37 -04:00
jif-oai	8797145678	fix: token usage for compaction (#4281 ) Emit token usage update when draining compaction	2025-09-26 16:24:27 +02:00
jif-oai	1fc3413a46	ref: state - 2 (#4229 ) Extract tasks into a module and start abstracting them behind a trait (more to come on this, but each task will be tackled in a dedicated PR). The goal was to drop the ActiveTask and to have a (potential) set of tasks during each turn.	2025-09-26 13:49:08 +00:00
iceweasel-oai	eb2b739d6a	core: add potentially dangerous command check (#4211 ) Certain shell commands are potentially dangerous, and we want to check for them. Unless the user has explicitly approved a command, we will always ask them for approval when one of these commands is encountered, regardless of whether they are in a sandbox, or what their approval policy is. The first (of probably many) such examples is `git reset --hard`. We will be conservative and check for any `git reset`.	2025-09-25 19:46:20 -07:00
pakrym-oai	a10403d697	Actually mount sse once (#4264 ) Mock server was responding with the same result many times.	2025-09-26 01:17:51 +00:00
pakrym-oai	8e3a048fec	Add codex exec testing helpers (#4254 ) Add a shortcut to create working directories and run codex exec with fake server.	2025-09-25 17:12:45 -07:00
Eric Traut	9f2ab97fbc	Fixed login failure with API key in IDE extension when a `.codex` directory doesn't exist (#4258 ) This addresses bug #4092 Testing: * Confirmed error occurs prior to fix if logging in using API key and no `~/.codex` directory exists * Confirmed after fix that `~/.codex` directory is properly created and error doesn't occur	2025-09-25 16:53:28 -07:00
Ahmed Ibrahim	7355ca48c5	fix (#4251 )	2025-09-25 15:12:25 -07:00
Jeremy Rose	4a5f05c136	make tests pass cleanly in sandbox (#4067 ) This changes the reqwest client used in tests to be sandbox-friendly, and skips a bunch of other tests that don't work inside the sandbox/without network.	2025-09-25 13:11:14 -07:00
pakrym-oai	acc2b63dfb	Fix error message (#4204 ) Co-authored-by: Ahmed Ibrahim <aibrahim@openai.com>	2025-09-25 11:10:40 -07:00
Michael Bolin	a0c37f5d07	chore: refactor attempt_stream_responses() out of stream_responses() (#4194 ) I would like to be able to swap in a different way to resolve model sampling requests, so this refactoring consolidates things behind `attempt_stream_responses()` to make that easier. Ideally, we would support an in-memory backend that we can use in our integration tests, for example.	2025-09-25 10:34:07 -07:00
jif-oai	250b244ab4	ref: full state refactor (#4174 ) ## Current State Observations - `Session` currently holds many unrelated responsibilities (history, approval queues, task handles, rollout recorder, shell discovery, token tracking, etc.), making it hard to reason about ownership and lifetimes. - The anonymous `State` struct inside `codex.rs` mixes session-long data with turn-scoped queues and approval bookkeeping. - Turn execution (`run_task`) relies on ad-hoc local variables that should conceptually belong to a per-turn state object. - External modules (`codex::compact`, tests) frequently poke the raw `Session.state` mutex, which couples them to implementation details. - Interrupts, approvals, and rollout persistence all have bespoke cleanup paths, contributing to subtle bugs when a turn is aborted mid-flight. ## Desired End State - Keep a slim `Session` object that acts as the orchestrator and façade. It should expose a focused API (submit, approvals, interrupts, event emission) without storing unrelated fields directly. - Introduce a `state` module that encapsulates all mutable data structures: - `SessionState`: session-persistent data (history, approved commands, token/rate-limit info, maybe user preferences). - `ActiveTurn`: metadata for the currently running turn (sub-id, task kind, abort handle) and an `Arc<TurnState>`. - `TurnState`: all turn-scoped pieces (pending inputs, approval waiters, diff tracker, review history, auto-compact flags, last agent message, outstanding tool call bookkeeping). - Group long-lived helpers/managers into a dedicated `SessionServices` struct so `Session` does not accumulate "random" fields. - Provide clear, lock-safe APIs so other modules never touch raw mutexes. - Ensure every turn creates/drops a `TurnState` and that interrupts/finishes delegate cleanup to it.	2025-09-25 12:16:06 +02:00
pakrym-oai	e85742635f	Send text parameter for non-gpt-5 models (#4195 ) We had a hardcoded check for gpt-5 before. Fixes: https://github.com/openai/codex/issues/4181	2025-09-24 22:00:06 +00:00
Michael Bolin	87b299aa3f	chore: drop unused values from env_flags (#4188 ) For the most part, we try to avoid environment variables in favor of config options so the environment variables do not leak into child processes. These environment variables are no longer honored, so let's delete them to be clear. Ultimately, I would also like to eliminate `CODEX_RS_SSE_FIXTURE` in favor of something cleaner.	2025-09-24 14:29:51 -07:00
iceweasel-oai	0e58870634	adds a windows-specific method to check if a command is safe (#4119 ) Refactors the command_safety files into their own package, so we can add platform-specific ones. Also creates a Windows-specific version of `is_known_safe_command` that always returns false, since that is what happens today.	2025-09-24 14:03:43 -07:00
pakrym-oai	addc946d13	Simplify tool implementations (#4160 ) Use Result<String, FunctionCallError> for all tool handling code and rely on error propagation instead of creating failed items everywhere.	2025-09-24 17:27:35 +00:00
Michael Bolin	639a6fd2f3	chore: upgrade to Rust 1.90 (#4124 ) Inspired by Dependabot's attempt to do this: https://github.com/openai/codex/pull/4029 The new version of Clippy found some unused structs that are removed in this PR. Though nothing stood out to me in the Release Notes in terms of things we should start to take advantage of: https://blog.rust-lang.org/2025/09/18/Rust-1.90.0/.	2025-09-24 08:32:00 -07:00
jif-oai	db4aa6f916	nit: 350k tokens (#4156 ) 350k tokens for gpt-5-codex auto-compaction and update comments for better description	2025-09-24 15:31:27 +00:00
Ahmed Ibrahim	cb96f4f596	Add Reset in for rate limits (#4111 ) - Parse the headers - Reorganize the struct because it's getting too long - show the resets at in the tui	2025-09-24 15:31:08 +00:00
jif-oai	af6304c641	nit: drop instruction override for auto-compact (#4137 ) drop instruction override for auto-compact as this is not used and dangerous as it invalidates the cache	2025-09-24 10:47:12 +01:00
jif-oai	b90eeabd74	nit: update auto compact to 250k (#4135 ) update auto compact for gpt-5-codex to 250k	2025-09-24 09:41:33 +00:00
Ahmed Ibrahim	8227a5ba1b	Send limits when getting rate limited (#4102 ) Users need visibility on rate limits when they are rate limited.	2025-09-23 22:56:34 +00:00
pakrym-oai	fdb8dadcae	Add exec output-schema parameter (#4079 ) Adds structured output to `exec` via the `--structured-output` parameter.	2025-09-23 13:59:16 -07:00
pakrym-oai	0f9a796617	Use anyhow::Result in tests for error propagation (#4105 )	2025-09-23 13:31:36 -07:00
jif-oai	b84a920067	chore: compact do not modify instructions (#4088 ) Keep the developer instruction and insert the summarisation message as a user message instead	2025-09-23 17:59:17 +01:00
pakrym-oai	76ecbb3d8e	Use TestCodex builder in stream retry tests (#4096 ) ## Summary - refactor the stream retry integration tests to construct conversations through `TestCodex` - remove bespoke config and tempdir setup now handled by the shared builder ## Testing - cargo test -p codex-core --test all stream_error_allows_next_turn::continue_after_stream_error - cargo test -p codex-core --test all stream_no_completed::retries_on_early_close ------ https://chatgpt.com/codex/tasks/task_i_68d2b94d83888320bc75a0bc3bd77b49	2025-09-23 08:57:08 -07:00
jif-oai	2451b19d13	chore: enable auto-compaction for `gpt-5-codex` (#4093 ) enable auto-compaction for `gpt-5-codex` at 220k tokens	2025-09-23 16:12:36 +01:00
pakrym-oai	5c7d9e27b1	Add notifier tests (#4064 ) Proposal: 1. Use anyhow for tests and avoid unwrap 2. Extract a helper for starting a test instance of codex	2025-09-23 14:25:46 +00:00
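
Commit 6f97ec4990 (#4577) removes the `\\?\` prefix from Agents.md paths before displaying them on Windows. On Windows, Rust's `std::fs::canonicalize` returns paths in this extended-length ("verbatim") form, which is where the prefix comes from. The block below is a minimal sketch of prefix stripping for display only, written against plain strings so it runs on any platform. The function name `strip_verbatim_prefix` is hypothetical (not taken from the Codex source), and the sketch does not attempt the relative-path simplification the commit also mentions.

```rust
/// Hypothetical display helper (not Codex's actual code): removes the Win32
/// "verbatim" prefix that `std::fs::canonicalize` produces on Windows, so the
/// path reads like one a user would type. Operates on plain strings so it can
/// be exercised on any platform.
pub fn strip_verbatim_prefix(path: &str) -> String {
    if let Some(rest) = path.strip_prefix(r"\\?\UNC\") {
        // `\\?\UNC\server\share\...` is the verbatim form of `\\server\share\...`.
        format!(r"\\{rest}")
    } else if let Some(rest) = path.strip_prefix(r"\\?\") {
        // `\\?\C:\dir\...` is the verbatim form of `C:\dir\...`.
        rest.to_string()
    } else {
        // Already a normal path: leave it untouched.
        path.to_string()
    }
}
```

Working on `&str` rather than `std::path::Path` keeps the sketch platform-independent; a Windows-only implementation would more likely inspect `Path::components()` and match on the `std::path::Prefix` variants instead.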

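Commit 83a4d4d8ed (#4456) removes the frontmatter description/args from custom prompt files and keeps only the body. The block below is a minimal sketch of that stripping step, assuming the frontmatter is a block delimited by `---` lines at the very start of the file (the usual YAML-frontmatter convention). `strip_frontmatter` is a hypothetical name, and parsing the description/args themselves is left out.

```rust
/// Hypothetical helper (not Codex's actual code): returns the body of a custom
/// prompt file with a leading `---`-delimited frontmatter block removed. If the
/// file does not open with a `---` line, or the block is never closed, the
/// input is returned unchanged.
pub fn strip_frontmatter(contents: &str) -> &str {
    // Frontmatter must start on the very first line (LF or CRLF endings).
    let Some(rest) = contents
        .strip_prefix("---\n")
        .or_else(|| contents.strip_prefix("---\r\n"))
    else {
        return contents;
    };

    // Walk the remaining lines, tracking the byte offset just past each one.
    let mut offset = 0;
    for line in rest.split_inclusive('\n') {
        offset += line.len();
        if line.trim_end_matches(&['\r', '\n'][..]) == "---" {
            // Everything after the closing delimiter is the prompt body.
            return &rest[offset..];
        }
    }

    // Unterminated frontmatter: keep the whole file as the body.
    contents
}
```

Returning the original text when the closing `---` is missing avoids silently dropping a prompt that merely happens to begin with a `---` line.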

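Commits eb2b739d6a (#4211) and 55801700de (#4307) together describe a rule: any `git reset` is treated as potentially dangerous, it requires approval unless the user has explicitly approved it, and when the approval policy is `AskForApproval::Never` it is rejected instead. The block below is a minimal sketch of that decision logic. `ApprovalPolicy`, `Decision`, and both functions are hypothetical simplifications, not Codex's actual types. Only direct argv such as ["git", "reset", "--hard"] is matched, so forms like `git -C dir reset` or shell-wrapped commands are out of scope.

```rust
/// Hypothetical, simplified approval policy: only the distinction between
/// "never ask" and "may ask" matters for this sketch.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ApprovalPolicy {
    Never,
    MayAsk,
}

/// Hypothetical outcome of the dangerous-command check.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Decision {
    /// Not flagged: hand off to the normal sandbox/approval flow (not modeled here).
    Continue,
    /// Flagged and not yet approved: ask the user.
    AskUser,
    /// Flagged, but the policy forbids asking: refuse to run it.
    Reject,
}

/// Conservatively flag every `git reset`, not just `git reset --hard`.
pub fn is_potentially_dangerous(argv: &[&str]) -> bool {
    matches!(argv, ["git", "reset", ..])
}

/// Apply the rule: explicit user approval or a non-dangerous command passes
/// through; otherwise ask, unless the policy says never to ask.
pub fn decide(argv: &[&str], user_approved: bool, policy: ApprovalPolicy) -> Decision {
    if user_approved || !is_potentially_dangerous(argv) {
        return Decision::Continue;
    }
    match policy {
        ApprovalPolicy::Never => Decision::Reject,
        ApprovalPolicy::MayAsk => Decision::AskUser,
    }
}
```

Matching on the slice pattern `["git", "reset", ..]` keeps the check as conservative as the commit intends: every `git reset` invocation is caught, whatever flags follow.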
503 Commits