valknar/llmx - llmx - dev.pivoine.art

Author	SHA1	Message	Date
jif-oai	8fed0b53c4	test: reduce time dependency on test harness (#5053) Tightened the CLI integration tests to stop relying on wall-clock sleeps: a new fs watcher helper waits for session files instead of timing out, and SSE mocks/fixtures make the flows deterministic.	2025-10-15 09:56:59 +01:00
Dylan	0a0a10d8b3	fix: apply_patch shell_serialization tests (#4786 ) ## Summary Adds additional shell_serialization tests specifically for apply_patch and other cases. ## Test Plan - [x] These are all tests	2025-10-14 13:00:49 -07:00
pakrym-oai	35a770e871	Simplify request body assertions (#4845) We'll have a lot more tests like these	2025-10-07 09:56:39 +01:00
pakrym-oai	b2d81a7cac	Make output assertions more explicit (#4784 ) Match using precise regexes.	2025-10-05 16:01:38 -07:00
pakrym-oai	aecbe0f333	Add helper for response created SSE events in tests (#4758 ) ## Summary - add a reusable `ev_response_created` helper that builds `response.created` SSE events for integration tests - update the exec and core integration suites to use the new helper instead of repeating manual JSON literals - keep the streaming fixtures consistent by relying on the shared helper in every touched test ## Testing - `just fmt` ------ https://chatgpt.com/codex/tasks/task_i_68e1fe885bb883208aafffb94218da61	2025-10-05 21:11:43 +00:00
Ahmed Ibrahim	90ef94d3b3	Surface context window error to the client (#4675) In the past, we treated `input exceeded context window` as a streaming error and retried on it. Retrying is pointless because it won't change the outcome. In this PR, we surface the error to the client without retrying and also send a token count event to indicate that the context window is full.	2025-10-05 01:40:06 +00:00
Gabriel Peal	1d17ca1fa3	[MCP] Add support for MCP OAuth credentials (#4517) This PR adds OAuth login support to streamable HTTP servers when `experimental_use_rmcp_client` is enabled. This PR is large but represents the minimal amount of work required for this to work. To keep this PR smaller, login can only be done with `codex mcp login` and `codex mcp logout`, but it doesn't appear in `/mcp` or `codex mcp list` yet. Fingers crossed that this is the last large MCP PR and that subsequent PRs can be smaller. Under the hood, credentials are stored in platform credential managers via the [keyring crate](https://crates.io/crates/keyring). When the keyring isn't available, it falls back to storing credentials in `CODEX_HOME/.credentials.json`, which is consistent with how other coding agents handle authentication. I tested this on macOS, Windows, WSL (Ubuntu), and Linux. I wasn't able to test the dbus store on Linux but did verify that the fallback works. One quirk: if you have stored credentials, then during development every build produces its own ad-hoc binary, so the keyring won't recognize the reader as being the same as the writer and may ask for the user's password. I may add an override to disable this or allow users/enterprises to opt out of the keyring storage if it causes issues.	2025-10-03 13:43:12 -04:00
jif-oai	33d3ecbccc	chore: refactor tool handling (#4510 ) # Tool System Refactor - Centralizes tool definitions and execution in `core/src/tools/`: specs (`spec.rs`), handlers (`handlers/`), router (`router.rs`), registry/dispatch (`registry.rs`), and shared context (`context.rs`). One registry now builds the model-visible tool list and binds handlers. - Router converts model responses to tool calls; Registry dispatches with consistent telemetry via `codex-rs/otel` and unified error handling. Function, Local Shell, MCP, and experimental `unified_exec` all flow through this path; legacy shell aliases still work. - Rationale: reduce per‑tool boilerplate, keep spec/handler in sync, and make adding tools predictable and testable. Example: `read_file` - Spec: `core/src/tools/spec.rs` (see `create_read_file_tool`, registered by `build_specs`). - Handler: `core/src/tools/handlers/read_file.rs` (absolute `file_path`, 1‑indexed `offset`, `limit`, `L#: ` prefixes, safe truncation). - E2E test: `core/tests/suite/read_file.rs` validates the tool returns the requested lines. ## Next steps: - Decompose `handle_container_exec_with_params` - Add parallel tool calls	2025-10-03 13:21:06 +01:00
pakrym-oai	2f6fb37d72	Support CODEX_API_KEY for codex exec (#4615) Allows setting the API key per invocation of `codex exec`	2025-10-02 09:59:45 -07:00
vishnu-oai	04c1782e52	OpenTelemetry events (#2103 ) ### Title ## otel Codex can emit [OpenTelemetry](https://opentelemetry.io/) log events that describe each run: outbound API requests, streamed responses, user input, tool-approval decisions, and the result of every tool invocation. Export is disabled by default so local runs remain self-contained. Opt in by adding an `[otel]` table and choosing an exporter. ```toml [otel] environment = "staging" # defaults to "dev" exporter = "none" # defaults to "none"; set to otlp-http or otlp-grpc to send events log_user_prompt = false # defaults to false; redact prompt text unless explicitly enabled ``` Codex tags every exported event with `service.name = "codex-cli"`, the CLI version, and an `env` attribute so downstream collectors can distinguish dev/staging/prod traffic. Only telemetry produced inside the `codex_otel` crate—the events listed below—is forwarded to the exporter. ### Event catalog Every event shares a common set of metadata fields: `event.timestamp`, `conversation.id`, `app.version`, `auth_mode` (when available), `user.account_id` (when available), `terminal.type`, `model`, and `slug`. 
With OTEL enabled Codex emits the following event types (in addition to the metadata above): - `codex.api_request` - `cf_ray` (optional) - `attempt` - `duration_ms` - `http.response.status_code` (optional) - `error.message` (failures) - `codex.sse_event` - `event.kind` - `duration_ms` - `error.message` (failures) - `input_token_count` (completion only) - `output_token_count` (completion only) - `cached_token_count` (completion only, optional) - `reasoning_token_count` (completion only, optional) - `tool_token_count` (completion only) - `codex.user_prompt` - `prompt_length` - `prompt` (redacted unless `log_user_prompt = true`) - `codex.tool_decision` - `tool_name` - `call_id` - `decision` (`approved`, `approved_for_session`, `denied`, or `abort`) - `source` (`config` or `user`) - `codex.tool_result` - `tool_name` - `call_id` - `arguments` - `duration_ms` (execution time for the tool) - `success` (`"true"` or `"false"`) - `output` ### Choosing an exporter Set `otel.exporter` to control where events go: - `none` – leaves instrumentation active but skips exporting. This is the default. - `otlp-http` – posts OTLP log records to an OTLP/HTTP collector. Specify the endpoint, protocol, and headers your collector expects: ```toml [otel] exporter = { otlp-http = { endpoint = "https://otel.example.com/v1/logs", protocol = "binary", headers = { "x-otlp-api-key" = "${OTLP_TOKEN}" } }} ``` - `otlp-grpc` – streams OTLP log records over gRPC. Provide the endpoint and any metadata headers: ```toml [otel] exporter = { otlp-grpc = { endpoint = "https://otel.example.com:4317", headers = { "x-otlp-meta" = "abc123" } }} ``` If the exporter is `none` nothing is written anywhere; otherwise you must run or point to your own collector. All exporters run on a background batch worker that is flushed on shutdown. If you build Codex from source the OTEL crate is still behind an `otel` feature flag; the official prebuilt binaries ship with the feature enabled. 
When the feature is disabled the telemetry hooks become no-ops so the CLI continues to function without the extra dependencies. --------- Co-authored-by: Anton Panasenko <apanasenko@openai.com>	2025-09-29 11:30:55 -07:00
pakrym-oai	a10403d697	Actually mount sse once (#4264 ) Mock server was responding with the same result many times.	2025-09-26 01:17:51 +00:00
pakrym-oai	8e3a048fec	Add codex exec testing helpers (#4254 ) Add a shortcut to create working directories and run codex exec with fake server.	2025-09-25 17:12:45 -07:00
Jeremy Rose	4a5f05c136	make tests pass cleanly in sandbox (#4067 ) This changes the reqwest client used in tests to be sandbox-friendly, and skips a bunch of other tests that don't work inside the sandbox/without network.	2025-09-25 13:11:14 -07:00
pakrym-oai	5c7d9e27b1	Add notifier tests (#4064 ) Proposal: 1. Use anyhow for tests and avoid unwrap 2. Extract a helper for starting a test instance of codex	2025-09-23 14:25:46 +00:00
jif-oai	e5fe50d3ce	chore: unify cargo versions (#4044 ) Unify cargo versions at root	2025-09-22 16:47:01 +00:00
pakrym-oai	14a115d488	Add non_sandbox_test helper (#3880 ) Makes tests shorter	2025-09-22 14:50:41 +00:00
pakrym-oai	9b18875a42	Use helpers instead of fixtures (#3888) Move to using test helper methods everywhere.	2025-09-19 06:46:25 -07:00
pakrym-oai	881c7978f1	Move responses mocking helpers to a shared lib (#3878 ) These are generally useful	2025-09-18 17:53:14 -07:00
Dylan	6df8e35314	[tools] Add apply_patch tool (#2303 ) ## Summary We've been seeing a number of issues and reports with our synthetic `apply_patch` tool, e.g. #802. Let's make this a real tool - in my anecdotal testing, it's critical for GPT-OSS models, but I'd like to make it the standard across GPT-5 and codex models as well. ## Testing - [x] Tested locally - [x] Integration test	2025-08-15 11:55:53 -04:00
Parker Thompson	a075424437	Added `allow-expect-in-tests` / `allow-unwrap-in-tests` (#2328) This PR: * Added the clippy.toml to configure allowable expect / unwrap usage in tests * Removed as many expect/allow lines as possible from tests * Moved a bunch of allows to expects where possible Note: in integration tests, non-`#[test]` helper functions are not covered by this, so we had to leave a few lingering `expect(expect_used` checks around	2025-08-14 17:59:01 -07:00
Michael Bolin	08ed618f72	chore: introduce ConversationManager as a clearinghouse for all conversations (#2240 ) This PR does two things because after I got deep into the first one I started pulling on the thread to the second: - Makes `ConversationManager` the place where all in-memory conversations are created and stored. Previously, `MessageProcessor` in the `codex-mcp-server` crate was doing this via its `session_map`, but this is something that should be done in `codex-core`. - It unwinds the `ctrl_c: tokio::sync::Notify` that was threaded throughout our code. I think this made sense at one time, but now that we handle Ctrl-C within the TUI and have a proper `Op::Interrupt` event, I don't think this was quite right, so I removed it. For `codex exec` and `codex proto`, we now use `tokio::signal::ctrl_c()` directly, but we no longer make `Notify` a field of `Codex` or `CodexConversation`. Changes of note: - Adds the files `conversation_manager.rs` and `codex_conversation.rs` to `codex-core`. - `Codex` and `CodexSpawnOk` are no longer exported from `codex-core`: other crates must use `CodexConversation` instead (which is created via `ConversationManager`). - `core/src/codex_wrapper.rs` has been deleted in favor of `ConversationManager`. - `ConversationManager::new_conversation()` returns `NewConversation`, which is in line with the `new_conversation` tool we want to add to the MCP server. Note `NewConversation` includes `SessionConfiguredEvent`, so we eliminate checks in cases like `codex-rs/core/tests/client.rs` to verify `SessionConfiguredEvent` is the first event because that is now internal to `ConversationManager`. - Quite a bit of code was deleted from `codex-rs/mcp-server/src/message_processor.rs` since it no longer has to manage multiple conversations itself: it goes through `ConversationManager` instead. 
- `core/tests/live_agent.rs` has been deleted because I had to update a bunch of tests and all the tests in here were ignored, and I don't think anyone ever ran them, so this was just technical debt, at this point. - Removed `notify_on_sigint()` from `util.rs` (and in a follow-up, I hope to refactor the blandly-named `util.rs` into more descriptive files). - In general, I started replacing local variables named `codex` as `conversation`, where appropriate, though admittedly I didn't do it through all the integration tests because that would have added a lot of noise to this PR. --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/2240). * #2264 * #2263 * __->__ #2240	2025-08-13 13:38:18 -07:00
easong-openai	6340acd885	Re-add markdown streaming (#2029 ) Wait for newlines, then render markdown on a line by line basis. Word wrap it for the current terminal size and then spit it out line by line into the UI. Also adds tests and fixes some UI regressions.	2025-08-12 17:37:28 -07:00
Dylan	0091930f5a	[core] Allow resume after client errors (#2053 ) ## Summary Allow tui conversations to resume after the client fails out of retries. I tested this with exec / mocked api failures as well, and it appears to be fine. But happy to add an exec integration test as well! ## Testing - [x] Added integration test - [x] Tested locally	2025-08-08 18:21:19 -07:00
Michael Bolin	7af9cedbd7	fix: create separate test_support crates to eliminate #[allow(dead_code)] (#1667) Because of a quirk of how integration tests work in Rust, we had a number of `#[allow(dead_code)]` annotations that were misleading because the functions _were_ being used, just not by all integration tests in a `tests/` folder, so when compiling a test that did not use the function, clippy would complain that it was unused. This fixes things by creating a "test_support" crate under the `tests/` folder that is imported as a dev dependency for the respective crate.	2025-07-24 12:19:46 -07:00

24 Commits
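
The `read_file` behaviour summarised in #4510 (a 1-indexed `offset`, a `limit`, and `L#: ` prefixes on each returned line) can be illustrated with a small standalone sketch. This is not the handler from `core/src/tools/handlers/read_file.rs`: the function name `read_lines`, its signature, and its error strings are hypothetical, it takes the file contents as a string rather than an absolute `file_path`, and it leaves out the "safe truncation" step the commit mentions.

```rust
/// Hypothetical helper (not the real handler): returns at most `limit` lines
/// starting at the 1-indexed line `offset`, each prefixed with `L#: `.
fn read_lines(contents: &str, offset: usize, limit: usize) -> Result<Vec<String>, String> {
    if offset == 0 {
        return Err("offset must be a 1-indexed line number".to_string());
    }
    if limit == 0 {
        return Err("limit must be greater than zero".to_string());
    }

    let selected: Vec<String> = contents
        .lines()
        .enumerate()
        // `enumerate` is 0-based, so line `offset` sits at index `offset - 1`.
        .skip(offset - 1)
        .take(limit)
        .map(|(idx, line)| format!("L{}: {}", idx + 1, line))
        .collect();

    if selected.is_empty() {
        return Err("offset exceeds file length".to_string());
    }
    Ok(selected)
}

fn main() {
    let contents = "alpha\nbeta\ngamma\ndelta\n";
    match read_lines(contents, 2, 2) {
        Ok(lines) => {
            for line in lines {
                println!("{line}");
            }
        }
        Err(err) => eprintln!("error: {err}"),
    }
}
```

With the sample input above, the call `read_lines(contents, 2, 2)` yields `L2: beta` and `L3: gamma`. Keeping the original line number in the prefix, rather than renumbering from 1, lets a caller refer back to positions in the full file.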