valknar/llmx - llmx - dev.pivoine.art

Author	SHA1	Message	Date
Owen Lin	6582554926	[app-server] feat: v2 Turn APIs (#6216) Implements `turn/start` and `turn/interrupt`, along with their integration tests. These are relatively light wrappers around the existing core logic, and changes to core logic are minimal. One improvement for developer ergonomics: `turn/start` replaces both `SendUserMessage` (no turn overrides) and `SendUserTurn` (can override model, approval policy, etc.).	2025-11-06 16:36:36 +00:00
Thibault Sottiaux	667e841d3e	feat: support models with single reasoning effort (#6300)	2025-11-05 23:06:45 -08:00
Celia Chen	229d18f4d2	[App-server] Add account/login/cancel v2 endpoint (#6288) Adds the `account/login/cancel` v2 endpoint for auth. The implementation is similar to the `cancelLoginChatgpt` v1 endpoint.	2025-11-06 01:13:55 +00:00
Celia Chen	05f0b4f590	[App-server] Implement v2 for `account/login/start` and `account/login/completed` (#6183) This PR implements `account/login/start` and `account/login/completed`. Instead of having separate endpoints for login with ChatGPT and with an API key, we have a single enum handling the different login methods. For sync auth methods such as signing in with an API key, we still send a `completed` notification back to stay compatible with the async login flow.	2025-11-05 13:52:50 -08:00
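The single-enum design described in this commit can be illustrated roughly as follows. This is only a sketch under stated assumptions: `LoginMethod`, `Outgoing`, `start_login`, and the placeholder login id are hypothetical names that do not come from the PR, and the real types presumably carry more fields.

```rust
/// Hypothetical stand-in for the single enum that covers every login method.
#[derive(Debug, Clone, PartialEq)]
pub enum LoginMethod {
    ApiKey { api_key: String },
    ChatGpt,
}

/// Hypothetical outgoing messages: the direct response to `account/login/start`
/// plus the `account/login/completed` notification.
#[derive(Debug, Clone, PartialEq)]
pub enum Outgoing {
    StartResponse { login_id: Option<String> },
    CompletedNotification { login_id: Option<String>, success: bool },
}

/// Returns the messages emitted immediately when a login is started.
pub fn start_login(method: LoginMethod) -> Vec<Outgoing> {
    match method {
        // Synchronous method: the outcome is known at once, but a `completed`
        // notification is still emitted so clients can treat both flows alike.
        LoginMethod::ApiKey { api_key } => {
            let success = !api_key.trim().is_empty();
            vec![
                Outgoing::StartResponse { login_id: None },
                Outgoing::CompletedNotification { login_id: None, success },
            ]
        }
        // Asynchronous method: only the response is sent now; the `completed`
        // notification would follow once the browser flow finishes.
        LoginMethod::ChatGpt => vec![Outgoing::StartResponse {
            login_id: Some("login-1".to_string()), // placeholder id (assumption)
        }],
    }
}
```

With this shape a client can wait for the `completed` notification in both cases instead of special-casing the API key path.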
Eric Traut	d7953aed74	Fixes intermittent test failures in CI (#6282) I'm seeing two tests fail intermittently in CI. This PR attempts to address (or at least mitigate) the flakiness. * summarize_context_three_requests_and_instructions - The test snapshots server.received_requests() immediately after observing TaskComplete. Because the OpenAI /v1/responses call is streamed, the HTTP request can still be draining when that event fires, so wiremock occasionally reports only two captured requests. Fix is to wait for async activity to complete. * archive_conversation_moves_rollout_into_archived_directory - times out on a slow CI run. Mitigation is to increase timeout value from 10s to 20s.	2025-11-05 13:12:25 -08:00
Owen Lin	2ab1650d4d	[app-server] feat: v2 Thread APIs (#6214) Implements `thread/list`, `thread/start`, `thread/resume`, and `thread/archive`, along with their integration tests. These are relatively light wrappers around the existing core logic, and changes to core logic are minimal. One improvement for developer ergonomics: `thread/start` and `thread/resume` automatically attach a conversation listener internally, so clients don't have to make a separate `AddConversationListener` call like they do today. For consistency, also updated `model/list` and `feedback/upload` (naming conventions, list API params).	2025-11-05 20:28:43 +00:00
Owen Lin	edf4c3f627	[app-server] feat: export.rs supports a v2 namespace, initial v2 notifications (#6212) TypeScript and JSON schema exports: While working on Thread/Turn/Items type definitions, I realized we will run into name conflicts between v1 and v2 APIs (e.g. `RateLimitWindow`, which won't be reusable since v1 uses `RateLimitWindow` from `protocol/`, which uses snake_case, but we want to expose camelCase everywhere, so we'll define a V2 version of that struct that serializes as camelCase). To set us up for a clean and isolated v2 API, generate types into a `v2/` namespace for both TypeScript and JSON schema. - TypeScript: v2 types emit under `out_dir/v2/*.ts`, and root index.ts now re-exports them via `export * as v2 from "./v2";`. - JSON Schemas: v2 definitions bundle under `#/definitions/v2/` rather than the root. The location of the original types (v1 and types pulled from `protocol/` and other core crates) hasn't changed; they are still at the root. This is for backwards compatibility: no breaking changes to existing usages of v1 APIs and types. Notifications: While working on export.rs, I: - refactored server/client notifications with macros (like we already do for methods) so they also get exported (I noticed they weren't being exported at all). - removed the hardcoded list of types to export as JSON schema by leveraging the existing macros instead - and took a stab at API V2 notifications. These aren't wired up yet, and I expect to iterate on these this week.	2025-11-05 01:02:39 +00:00
Celia Chen	d3187dbc17	[App-server] v2 for account/updated and account/logout (#6175) V2 for `account/updated` and `account/logout` for app server. These correspond to the old `authStatusChange` and `LogoutChatGpt` respectively. Follow-up PRs will make other v2 endpoints call `account/updated` instead of `authStatusChange` too.	2025-11-03 22:01:33 -08:00
pakrym-oai	2371d771cc	Update user instruction message format (#6010)	2025-10-30 18:44:02 -07:00
Anton Panasenko	9572cfc782	[codex] add developer instructions (#5897) We are using developer instructions for code reviews, so we need to pass them in the CLI as well.	2025-10-30 11:18:31 -07:00
Rasmus Rygaard	39e09c289d	Add a wrapper around raw response items (#5923) We currently have nested enums when sending raw response items in the app-server protocol. This makes downstream schemas confusing because we need to embed `type`-discriminated enums within each other. This PR adds a small wrapper around the response item so we can keep the schemas separate.	2025-10-29 20:32:40 +00:00
Anton Panasenko	149e198ce8	[codex][app-server] resume conversation from history (#5893)	2025-10-28 18:18:03 -07:00
Gabriel Peal	1d76ba5ebe	[App Server] Allow fetching or resuming a conversation summary from the conversation id (#5890) This PR adds an option to app server to allow conversation summaries to be fetched from just the conversation id rather than the rollout path, for convenience, at the cost of some latency to discover the rollout path. This convenience is non-trivial, as it allows app servers to simply maintain conversation ids rather than rollout paths, and spares them the platform-specific (Windows) handling needed to store and encode those paths correctly.	2025-10-28 20:17:22 -04:00
Owen Lin	266419217e	chore: use anyhow::Result for all app-server integration tests (#5836) There's a lot of visual noise in app-server's integration tests due to the number of `.expect("<some_msg>")` lines which are largely redundant / not very useful. Clean them up by using `anyhow::Result` + `?` consistently. Replaces the existing pattern of: ``` let codex_home = TempDir::new().expect("create temp dir"); create_config_toml(codex_home.path()).expect("write config.toml"); let mut mcp = McpProcess::new(codex_home.path()) .await .expect("spawn mcp process"); timeout(DEFAULT_READ_TIMEOUT, mcp.initialize()) .await .expect("initialize timeout") .expect("initialize request"); ``` With: ``` let codex_home = TempDir::new()?; create_config_toml(codex_home.path())?; let mut mcp = McpProcess::new(codex_home.path()).await?; timeout(DEFAULT_READ_TIMEOUT, mcp.initialize()).await??; ```	2025-10-28 08:10:23 -07:00
Celia Chen	4a42c4e142	[Auth] Choose which auth storage to use based on config (#5792) This PR is a follow-up to #5591. It allows users to choose which auth storage mode they want by using the new `cli_auth_credentials_store_mode` config.	2025-10-27 19:41:49 -07:00
Michael Bolin	5907422d65	feat: annotate conversations with model_provider for filtering (#5658) Because conversations that use the Responses API can have encrypted reasoning messages, trying to resume a conversation with a different provider could lead to confusing "failed to decrypt" errors. (This is reproducible by starting a conversation using ChatGPT login and resuming it as a conversation that uses OpenAI models via Azure.) This changes `ListConversationsParams` to take a `model_providers: Option<Vec<String>>` and adds `model_provider` on each `ConversationSummary` it returns so these cases can be disambiguated. Note this ended up making changes to `codex-rs/core/src/rollout/tests.rs` because it had a number of cases where it expected `Some` for the value of `next_cursor`, but the list of rollouts was complete, so according to this docstring: `bcd64c7e72/codex-rs/app-server-protocol/src/protocol.rs (L334-L337)` If there are no more items to return, then `next_cursor` should be `None`. This PR updates that logic.	2025-10-27 02:03:30 -07:00
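The `next_cursor` rule quoted in this commit (return `None` once the listing is complete) can be sketched as below. This is an illustrative sketch only: `Page`, `list_page`, and the index-based cursor are assumptions made for the example, and the real API may use an opaque cursor instead.

```rust
/// Hypothetical page type; only `next_cursor` mirrors a name from the commit.
#[derive(Debug, PartialEq)]
pub struct Page<T> {
    pub items: Vec<T>,
    pub next_cursor: Option<usize>,
}

/// Returns one page of `all`, starting at `cursor` (or at 0 when absent).
pub fn list_page<T: Clone>(all: &[T], cursor: Option<usize>, page_size: usize) -> Page<T> {
    // Guard against a zero page size, which would never make progress.
    let page_size = page_size.max(1);
    let start = cursor.unwrap_or(0).min(all.len());
    let end = start.saturating_add(page_size).min(all.len());
    // Only hand out a cursor when items remain after this page.
    let next_cursor = if end < all.len() { Some(end) } else { None };
    Page {
        items: all[start..end].to_vec(),
        next_cursor,
    }
}
```

The case the commit fixes is the one where a page ends exactly at the end of the list: the comparison `end < all.len()` is false there, so no cursor is returned and the client does not issue a pointless extra request.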
Eric Traut	0533bd2e7c	Fixed flaky unit test (#5654) This PR fixes a test that is sporadically failing in CI. The problem is that two unit tests (the older `login_and_cancel_chatgpt` and a recently added `login_chatgpt_includes_forced_workspace_query_param`) exercise code paths that start the login server. The server binds to a hard-coded localhost port number, so attempts to start more than one server at the same time will fail. If these two tests happen to run concurrently, one of them will fail. To fix this, I've added a simple mutex. We can use this same mutex for future tests that use the same pattern.	2025-10-24 16:31:24 -07:00
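The mutex approach described in this commit can be sketched with the standard library alone. This is a minimal, synchronous sketch rather than the code from the PR: the names `LOGIN_SERVER_LOCK`, `lock_login_server`, and `with_login_server` are hypothetical, and the real tests may be async and use a different mutex type.

```rust
use std::sync::{Mutex, MutexGuard};

// Hypothetical process-wide lock shared by every test that starts the login
// server on its hard-coded port.
static LOGIN_SERVER_LOCK: Mutex<()> = Mutex::new(());

/// Acquires the lock. A poisoned lock (an earlier test panicked while holding
/// it) is recovered so that one failure does not cascade into the others.
pub fn lock_login_server() -> MutexGuard<'static, ()> {
    LOGIN_SERVER_LOCK
        .lock()
        .unwrap_or_else(|poisoned| poisoned.into_inner())
}

/// Runs `body` while holding the lock, so only one caller at a time can bind
/// the port. The guard is released when `body` returns or panics.
pub fn with_login_server<T>(body: impl FnOnce() -> T) -> T {
    let _guard = lock_login_server();
    body()
}
```

Each test that starts the server would wrap its body in `with_login_server(|| { ... })`. The tests can still be scheduled in parallel by the test runner, but the sections that bind the port run one after another.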
Anton Panasenko	6af83d86ff	[codex][app-server] introduce codex/event/raw_item events (#5578)	2025-10-24 22:41:52 +00:00
Eric Traut	f8af4f5c8d	Added model summary and risk assessment for commands that violate sandbox policy (#5536) This PR adds support for a model-based summary and risk assessment for commands that violate the sandbox policy and require user approval. This aids the user in evaluating whether the command should be approved. The feature works by taking a failed command and passing it back to the model and asking it to summarize the command, give it a risk level (low, medium, high) and a risk category (e.g. "data deletion" or "data exfiltration"). It uses a new conversation thread so the context in the existing thread doesn't influence the answer. If the call to the model fails or takes longer than 5 seconds, it falls back to the current behavior. For now, this is an experimental feature and is gated by a config key `experimental_sandbox_command_assessment`.	2025-10-24 15:23:44 -07:00
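The fallback rule in this commit (use the assessment only if the model call succeeds within the time limit) could look roughly like the sketch below. It is a thread-based sketch using only the standard library; `Assessment`, `RiskLevel`, and `assess_with_timeout` are hypothetical names, and the actual implementation is presumably async.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Risk levels named in the commit message.
#[derive(Debug, Clone, PartialEq)]
pub enum RiskLevel {
    Low,
    Medium,
    High,
}

/// Hypothetical shape of the model's answer.
#[derive(Debug, Clone, PartialEq)]
pub struct Assessment {
    pub summary: String,
    pub risk_level: RiskLevel,
    pub risk_category: String,
}

/// Runs `assess` on a helper thread and waits at most `limit` for it.
/// Returns `None` on error or timeout, which tells the caller to fall back
/// to the plain approval prompt.
pub fn assess_with_timeout<F>(assess: F, limit: Duration) -> Option<Assessment>
where
    F: FnOnce() -> Result<Assessment, String> + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // After a timeout the receiver is gone; ignoring the send error is fine.
        let _ = tx.send(assess());
    });
    match rx.recv_timeout(limit) {
        Ok(Ok(assessment)) => Some(assessment),
        Ok(Err(_)) | Err(_) => None,
    }
}
```

A caller would pass `Duration::from_secs(5)` to match the limit stated in the commit and show the existing prompt whenever the result is `None`.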
pakrym-oai	3c90728a29	Add new thread items and rewire event parsing to use them (#5418) 1. Adds AgentMessage, Reasoning, WebSearch items. 2. Switches the ResponseItem parsing to use new items and then also emit 3. Removes user-item kind and filters out "special" (environment) user items when returning to clients.	2025-10-22 10:14:50 -07:00
Owen Lin	26f314904a	[app-server] model/list API (#5382) Adds a `model/list` paginated API that returns the list of models supported by Codex.	2025-10-21 11:15:17 -07:00
Owen Lin	5c680c6587	[app-server] read rate limits API (#5302) Adds a `GET account/rateLimits/read` API to app-server. This calls the codex backend to fetch the user's current rate limits. This would be helpful in checking rate limits without having to send a message. For calling the codex backend usage API, I generated the types and manually copied the relevant ones into `codex-backend-openapi-types`. It'll be nice to extend our internal openapi generator to support Rust so we don't have to run these manual steps.	2025-10-20 14:11:54 -07:00
Gabriel Peal	d87f87e25b	Add forced_chatgpt_workspace_id and forced_login_method configuration options (#5303) This PR adds support for configs to specify a forced login method (chatgpt or api) as well as a forced chatgpt account id. This lets enterprises use [managed configs](https://developers.openai.com/codex/security#managed-configuration) to force all employees to use their company's workspace instead of their own or any other. When a workspace id is set, a query param is sent to the login flow which auto-selects the given workspace or errors if the user isn't a member of it. This PR is large but a large % of it is tests, wiring, and required formatting changes.	2025-10-20 08:50:54 -07:00
Michael Bolin	995f5c3614	feat: add Vec<ParsedCommand> to ExecApprovalRequestEvent (#5222) This adds `parsed_cmd: Vec<ParsedCommand>` to `ExecApprovalRequestEvent` in the core protocol (`protocol/src/protocol.rs`), which is also what this field is named on `ExecCommandBeginEvent`. Honestly, I don't love the name (it sounds like a single command, but it is actually a list of them), but I don't want to get distracted by a naming discussion right now. This also adds `parsed_cmd` to `ExecCommandApprovalParams` in `codex-rs/app-server-protocol/src/protocol.rs`, so it will be available via `codex app-server`, as well. For consistency, I also updated `ExecApprovalElicitRequestParams` in `codex-rs/mcp-server/src/exec_approval.rs` to include this field under the name `codex_parsed_cmd`, as that struct already has a number of special `codex_*` fields. Note this is the code for when Codex is used as an MCP _server_ and therefore has to conform to the official spec for an MCP elicitation type.	2025-10-15 13:58:40 -07:00
jif-oai	69cb72f842	chore: sandbox refactor 2 (#4653) Revert the revert and fix the UI issue.	2025-10-03 11:17:39 +01:00
Shijie Rao	7be3b484ad	feat: add file name to fuzzy search response (#4619) Summary: updated the fuzzy search result to include the file name. This should not affect CLI usage; the UI there will be addressed in a separate PR. Testing: tested locally and with the extension. Co-authored-by: shijie.rao <shijie.rao@squareup.com>	2025-10-02 18:19:13 -07:00
Ahmed Ibrahim	ed5d656fa8	Revert "chore: sanbox extraction" (#4626) Reverts openai/codex#4286	2025-10-02 21:09:21 +00:00
jif-oai	b8195a17e5	chore: sanbox extraction (#4286) Extract and Centralize Sandboxing. Goal: improve safety and clarity by centralizing sandbox planning and execution. Approach: - Add planner (ExecPlan) and backend registry (Direct/Seatbelt/Linux) with run_with_plan. - Refactor codex.rs to plan-then-execute; handle failures/escalation via the plan. - Delegate apply_patch to the codex binary and run it with an empty env for determinism.	2025-10-01 12:05:12 +01:00
Michael Bolin	5881c0d6d4	fix: remove mcp-types from app server protocol (#4537) We continue the separation between `codex app-server` and `codex mcp-server`. In particular, we introduce a new crate, `codex-app-server-protocol`, and migrate `codex-rs/protocol/src/mcp_protocol.rs` into it, renaming it `codex-rs/app-server-protocol/src/protocol.rs`. Because `ConversationId` was defined in `mcp_protocol.rs`, we move it into its own file, `codex-rs/protocol/src/conversation_id.rs`, and because it is referenced in a ton of places, we have to touch a lot of files as part of this PR. We also decide to get away from proper JSON-RPC 2.0 semantics, so we also introduce `codex-rs/app-server-protocol/src/jsonrpc_lite.rs`, which is basically the same `JSONRPCMessage` type defined in `mcp-types` except with all of the `"jsonrpc": "2.0"` removed. Getting rid of `"jsonrpc": "2.0"` makes our serialization logic considerably simpler, as we can lean more heavily on serde to serialize directly into the wire format that we use now.	2025-10-01 02:16:26 +00:00
Michael Bolin	32853ecbc5	fix: use macros to ensure request/response symmetry (#4529) Manually curating `protocol-ts/src/lib.rs` was error-prone, as expected. I finally asked Codex to write some Rust macros so we can ensure that: - For every variant of `ClientRequest` and `ServerRequest`, there is an associated `params` and `response` type. - All response types are included automatically in the output of `codex generate-ts`.	2025-09-30 18:06:05 -07:00
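The idea behind this commit, declaring each request once so that its params type, response type, and export entry cannot drift apart, can be sketched with a small `macro_rules!` macro. Everything below is hypothetical and is not the macro from the PR: `client_requests!`, the four struct names, and `response_type_names` are invented for illustration, with `response_type_names` standing in for whatever list the TypeScript generator consumes.

```rust
// One declaration per request yields the enum variant, the method string,
// and an entry in the list of response types to export.
macro_rules! client_requests {
    ($( $variant:ident {
        params: $params:ty,
        response: $response:ty,
        method: $method:literal
    } ),* $(,)?) => {
        pub enum ClientRequest {
            $( $variant($params) ),*
        }

        impl ClientRequest {
            /// Wire-level method name for this request.
            pub fn method(&self) -> &'static str {
                match self {
                    $( ClientRequest::$variant(_) => $method ),*
                }
            }
        }

        /// Names of all response types, in declaration order.
        pub fn response_type_names() -> Vec<&'static str> {
            vec![ $( {
                // Naming the type here makes a missing response type a compile error.
                let _ = std::marker::PhantomData::<$response>;
                stringify!($response)
            } ),* ]
        }
    };
}

// Hypothetical params/response types used only for this sketch.
pub struct ModelListParams { pub limit: Option<u32> }
pub struct ModelListResponse { pub models: Vec<String> }
pub struct ThreadStartParams { pub model: Option<String> }
pub struct ThreadStartResponse { pub thread_id: String }

client_requests! {
    ModelList { params: ModelListParams, response: ModelListResponse, method: "model/list" },
    ThreadStart { params: ThreadStartParams, response: ThreadStartResponse, method: "thread/start" },
}
```

Adding a request then means adding one entry to the macro invocation. Forgetting the response type fails to compile, so the mistake never reaches the generated output.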
Michael Bolin	d9dbf48828	fix: separate `codex mcp` into `codex mcp-server` and `codex app-server` (#4471) This is a very large PR with some non-backwards-compatible changes. Historically, `codex mcp` (or `codex mcp serve`) started a JSON-RPC-ish server that had two overlapping responsibilities: - Running an MCP server, providing some basic tool calls. - Running the app server used to power experiences such as the VS Code extension. This PR aims to separate these into distinct concepts: - `codex mcp-server` for the MCP server - `codex app-server` for the "application server" Note `codex mcp` still exists because it already has its own subcommands for MCP management (`list`, `add`, etc.) The MCP logic continues to live in `codex-rs/mcp-server` whereas the refactored app server logic is in the new `codex-rs/app-server` folder. Note that most of the existing integration tests in `codex-rs/mcp-server/tests/suite` were actually for the app server, so all the tests have been moved with the exception of `codex-rs/mcp-server/tests/suite/mod.rs`. Because this is already a large diff, I tried not to change more than I had to, so `codex-rs/app-server/tests/common/mcp_process.rs` still uses the name `McpProcess` for now, but I will do some mechanical renamings to things like `AppServer` in subsequent PRs. While `mcp-server` and `app-server` share some overlapping functionality (like reading streams of JSONL and dispatching based on message types) and some differences (completely different message types), I ended up doing a bit of copypasta between the two crates, as both have somewhat similar `message_processor.rs` and `outgoing_message.rs` files for now, though I expect them to diverge more in the near future. One material change is that of the initialize handshake for `codex app-server`, as we no longer use the MCP types for that handshake. Instead, we update `codex-rs/protocol/src/mcp_protocol.rs` to add an `Initialize` variant to `ClientRequest`, which takes the `ClientInfo` object we need to update the `USER_AGENT_SUFFIX` in `codex-rs/app-server/src/message_processor.rs`. One other material change is in `codex-rs/app-server/src/codex_message_processor.rs` where I eliminated a use of the `send_event_as_notification()` method I am generally trying to deprecate (because it blindly maps an `EventMsg` into a `JSONNotification`) in favor of `send_server_notification()`, which takes a `ServerNotification`, as that is intended to be a custom enum of all notification types supported by the app server. So to make this update, I had to introduce a new variant of `ServerNotification`, `SessionConfigured`, which is a non-backwards compatible change with the old `codex mcp`, and clients will have to be updated after the next release that contains this PR. Note that `codex-rs/app-server/tests/suite/list_resume.rs` also had to be updated to reflect this change. I introduced `codex-rs/utils/json-to-toml/src/lib.rs` as a small utility crate to avoid some of the copying between `mcp-server` and `app-server`.	2025-09-30 07:06:18 +00:00

31 Commits