valknar/llmx - llmx - dev.pivoine.art

Author	SHA1	Message	Date
iceweasel-oai	eb2b739d6a	core: add potentially dangerous command check (#4211 ) Certain shell commands are potentially dangerous, and we want to check for them. Unless the user has explicitly approved a command, we will always ask them for approval when one of these commands is encountered, regardless of whether they are in a sandbox, or what their approval policy is. The first (of probably many) such examples is `git reset --hard`. We will be conservative and check for any `git reset`	2025-09-25 19:46:20 -07:00
pakrym-oai	a10403d697	Actually mount sse once (#4264 ) Mock server was responding with the same result many times.	2025-09-26 01:17:51 +00:00
pakrym-oai	8e3a048fec	Add codex exec testing helpers (#4254 ) Add a shortcut to create working directories and run codex exec with fake server.	2025-09-25 17:12:45 -07:00
Eric Traut	9f2ab97fbc	Fixed login failure with API key in IDE extension when a `.codex` directory doesn't exist (#4258 ) This addresses bug #4092 Testing: * Confirmed error occurs prior to fix if logging in using API key and no `~/.codex` directory exists * Confirmed after fix that `~/.codex` directory is properly created and error doesn't occur	2025-09-25 16:53:28 -07:00
Ahmed Ibrahim	7355ca48c5	fix (#4251 ) # External (non-OpenAI) Pull Request Requirements Before opening this Pull Request, please read the dedicated "Contributing" markdown file or your PR may be closed: https://github.com/openai/codex/blob/main/docs/contributing.md If your PR conforms to our contribution guidelines, replace this text with a detailed and high quality description of your changes.	2025-09-25 15:12:25 -07:00
Jeremy Rose	4a5f05c136	make tests pass cleanly in sandbox (#4067 ) This changes the reqwest client used in tests to be sandbox-friendly, and skips a bunch of other tests that don't work inside the sandbox/without network.	2025-09-25 13:11:14 -07:00
pakrym-oai	acc2b63dfb	Fix error message (#4204 ) Co-authored-by: Ahmed Ibrahim <aibrahim@openai.com>	2025-09-25 11:10:40 -07:00
Michael Bolin	a0c37f5d07	chore: refactor attempt_stream_responses() out of stream_responses() (#4194 ) I would like to be able to swap in a different way to resolve model sampling requests, so this refactoring consolidates things behind `attempt_stream_responses()` to make that easier. Ideally, we would support an in-memory backend that we can use in our integration tests, for example.	2025-09-25 10:34:07 -07:00
jif-oai	250b244ab4	ref: full state refactor (#4174 ) ## Current State Observations - `Session` currently holds many unrelated responsibilities (history, approval queues, task handles, rollout recorder, shell discovery, token tracking, etc.), making it hard to reason about ownership and lifetimes. - The anonymous `State` struct inside `codex.rs` mixes session-long data with turn-scoped queues and approval bookkeeping. - Turn execution (`run_task`) relies on ad-hoc local variables that should conceptually belong to a per-turn state object. - External modules (`codex::compact`, tests) frequently poke the raw `Session.state` mutex, which couples them to implementation details. - Interrupts, approvals, and rollout persistence all have bespoke cleanup paths, contributing to subtle bugs when a turn is aborted mid-flight. ## Desired End State - Keep a slim `Session` object that acts as the orchestrator and façade. It should expose a focused API (submit, approvals, interrupts, event emission) without storing unrelated fields directly. - Introduce a `state` module that encapsulates all mutable data structures: - `SessionState`: session-persistent data (history, approved commands, token/rate-limit info, maybe user preferences). - `ActiveTurn`: metadata for the currently running turn (sub-id, task kind, abort handle) and an `Arc<TurnState>`. - `TurnState`: all turn-scoped pieces (pending inputs, approval waiters, diff tracker, review history, auto-compact flags, last agent message, outstanding tool call bookkeeping). - Group long-lived helpers/managers into a dedicated `SessionServices` struct so `Session` does not accumulate "random" fields. - Provide clear, lock-safe APIs so other modules never touch raw mutexes. - Ensure every turn creates/drops a `TurnState` and that interrupts/finishes delegate cleanup to it.	2025-09-25 12:16:06 +02:00
pakrym-oai	e85742635f	Send text parameter for non-gpt-5 models (#4195 ) We had a hardcoded check for gpt-5 before. Fixes: https://github.com/openai/codex/issues/4181	2025-09-24 22:00:06 +00:00
Michael Bolin	87b299aa3f	chore: drop unused values from env_flags (#4188 ) For the most part, we try to avoid environment variables in favor of config options so the environment variables do not leak into child processes. These environment variables are no longer honored, so let's delete them to be clear. Ultimately, I would also like to eliminate `CODEX_RS_SSE_FIXTURE` in favor of something cleaner.	2025-09-24 14:29:51 -07:00
iceweasel-oai	0e58870634	adds a windows-specific method to check if a command is safe (#4119 ) refactors command_safety files into its own package, so we can add platform-specific ones Also creates a windows-specific of `is_known_safe_command` that just returns false always, since that is what happens today.	2025-09-24 14:03:43 -07:00
pakrym-oai	addc946d13	Simplify tool implemetations (#4160 ) Use Result<String, FunctionCallError> for all tool handling code and rely on error propagation instead of creating failed items everywhere.	2025-09-24 17:27:35 +00:00
Michael Bolin	639a6fd2f3	chore: upgrade to Rust 1.90 (#4124 ) Inspired by Dependabot's attempt to do this: https://github.com/openai/codex/pull/4029 The new version of Clippy found some unused structs that are removed in this PR. Though nothing stood out to me in the Release Notes in terms of things we should start to take advantage of: https://blog.rust-lang.org/2025/09/18/Rust-1.90.0/.	2025-09-24 08:32:00 -07:00
jif-oai	db4aa6f916	nit: 350k tokens (#4156 ) 350k tokens for gpt-5-codex auto-compaction and update comments for better description	2025-09-24 15:31:27 +00:00
Ahmed Ibrahim	cb96f4f596	Add Reset in for rate limits (#4111 ) - Parse the headers - Reorganize the struct because it's getting too long - show the resets at in the tui <img width="324" height="79" alt="image" src="https://github.com/user-attachments/assets/ca15cd48-f112-4556-91ab-1e3a9bc4683d" />	2025-09-24 15:31:08 +00:00
jif-oai	af6304c641	nit: drop instruction override for auto-compact (#4137 ) drop instruction override for auto-compact as this is not used and dangerous as it invalidates the cache	2025-09-24 10:47:12 +01:00
jif-oai	b90eeabd74	nit: update auto compact to 250k (#4135 ) update auto compact for gpt-5-codex to 250k	2025-09-24 09:41:33 +00:00
Ahmed Ibrahim	8227a5ba1b	Send limits when getting rate limited (#4102 ) Users need visibility on rate limits when they are rate limited.	2025-09-23 22:56:34 +00:00
pakrym-oai	fdb8dadcae	Add exec output-schema parameter (#4079 ) Adds structured output to `exec` via the `--structured-output` parameter.	2025-09-23 13:59:16 -07:00
pakrym-oai	0f9a796617	Use anyhow::Result in tests for error propagation (#4105 )	2025-09-23 13:31:36 -07:00
jif-oai	b84a920067	chore: compact do not modify instructions (#4088 ) Keep the developer instruction and insert the summarisation message as a user message instead	2025-09-23 17:59:17 +01:00
pakrym-oai	76ecbb3d8e	Use TestCodex builder in stream retry tests (#4096 ) ## Summary - refactor the stream retry integration tests to construct conversations through `TestCodex` - remove bespoke config and tempdir setup now handled by the shared builder ## Testing - cargo test -p codex-core --test all stream_error_allows_next_turn::continue_after_stream_error - cargo test -p codex-core --test all stream_no_completed::retries_on_early_close ------ https://chatgpt.com/codex/tasks/task_i_68d2b94d83888320bc75a0bc3bd77b49	2025-09-23 08:57:08 -07:00
jif-oai	2451b19d13	chore: enable auto-compaction for `gpt-5-codex` (#4093 ) enable auto-compaction for `gpt-5-codex` at 220k tokens	2025-09-23 16:12:36 +01:00
pakrym-oai	5c7d9e27b1	Add notifier tests (#4064 ) Proposal: 1. Use anyhow for tests and avoid unwrap 2. Extract a helper for starting a test instance of codex	2025-09-23 14:25:46 +00:00
Thibault Sottiaux	c93e77b68b	feat: update default (#4076 ) Changes: - Default model and docs now use gpt-5-codex. - Disables the GPT-5 Codex NUX by default. - Keeps presets available for API key users.	2025-09-22 20:10:52 -07:00
dedrisian-oai	c415827ac2	Truncate potentially long user messages in compact message. (#4068 ) If a prior user message is massive, any future `/compact` task would fail because we're verbatim copying the user message into the new chat.	2025-09-22 23:12:26 +00:00
Ahmed Ibrahim	dd56750612	Change headers and struct of rate limits (#4060 )	2025-09-22 21:06:20 +00:00
jif-oai	be366a31ab	chore: clippy on redundant closure (#4058 ) Add redundant closure clippy rules and let Codex fix it by minimising FQP	2025-09-22 19:30:16 +00:00
Jeremy Rose	19f46439ae	timeouts for mcp tool calls (#3959 ) defaults to 60sec, overridable with MCP_TOOL_TIMEOUT or on a per-server basis in the config.	2025-09-22 10:30:59 -07:00
jif-oai	e258ca61b4	chore: more clippy rules 2 (#4057 ) The only file to watch is the cargo.toml All the others come from just fix + a few manual small fix The set of rules have been taken from the list of clippy rules arbitrarily while trying to optimise the learning and style of the code while limiting the loss of productivity	2025-09-22 17:16:02 +00:00
jif-oai	e5fe50d3ce	chore: unify cargo versions (#4044 ) Unify cargo versions at root	2025-09-22 16:47:01 +00:00
pakrym-oai	14a115d488	Add non_sandbox_test helper (#3880 ) Makes tests shorter	2025-09-22 14:50:41 +00:00
dedrisian-oai	5996ee0e5f	feat: Add more /review options (#3961 ) Adds the following options: 1. Review current changes 2. Review a specific commit 3. Review against a base branch (PR style) 4. Custom instructions <img width="487" height="330" alt="Screenshot 2025-09-20 at 2 11 36 PM" src="https://github.com/user-attachments/assets/edb0aaa5-5747-47fa-881f-cc4c4f7fe8bc" /> --- \+ Adds the following UI helpers: 1. Makes list selection searchable 2. Adds navigation to the bottom pane, so you could add a stack of popups 3. Basic custom prompt view	2025-09-21 20:18:35 -07:00
Ahmed Ibrahim	04504d8218	Forward Rate limits to the UI (#3965 ) We currently get information about rate limits in the response headers. We want to forward them to the clients to have better transparency. UI/UX plans have been discussed and this information is needed.	2025-09-20 21:26:16 -07:00
pakrym-oai	9b18875a42	Use helpers instead of fixtures (#3888 ) Move to using test helper method everywhere.	2025-09-19 06:46:25 -07:00
pakrym-oai	881c7978f1	Move responses mocking helpers to a shared lib (#3878 ) These are generally useful	2025-09-18 17:53:14 -07:00
Ahmed Ibrahim	a7fda70053	Use a unified shell tell to not break cache (#3814 ) Currently, we change the tool description according to the sandbox policy and approval policy. This breaks the cache when the user hits `/approvals`. This PR does the following: - Always use the shell with escalation parameter: - removes `create_shell_tool_for_sandbox` and always uses unified tool via `create_shell_tool` - Reject the func call when the model uses escalation parameter when it cannot.	2025-09-19 00:08:28 +00:00
Michael Bolin	de64f5f007	fix: update try_parse_word_only_commands_sequence() to return commands in order (#3881 ) Incidentally, we had a test for this in `accepts_multiple_commands_with_allowed_operators()`, but it was verifying the bad behavior. Oops!	2025-09-18 16:07:38 -07:00
Michael Bolin	8595237505	fix: ensure cwd for conversation and sandbox are separate concerns (#3874 ) Previous to this PR, both of these functions take a single `cwd`: `71038381aa/codex-rs/core/src/seatbelt.rs (L19-L25)` `71038381aa/codex-rs/core/src/landlock.rs (L16-L23)` whereas `cwd` and `sandbox_cwd` should be set independently (fixed in this PR). Added `sandbox_distinguishes_command_and_policy_cwds()` to `codex-rs/exec/tests/suite/sandbox.rs` to verify this.	2025-09-18 14:37:06 -07:00
dedrisian-oai	62258df92f	feat: /review (#3774 ) Adds `/review` action in TUI <img width="637" height="370" alt="Screenshot 2025-09-17 at 12 41 19 AM" src="https://github.com/user-attachments/assets/b1979a6e-844a-4b97-ab20-107c185aec1d" />	2025-09-18 14:14:16 -07:00
Jeremy Rose	b34e906396	Reland "refactor transcript view to handle HistoryCells" (#3753 ) Reland of #3538	2025-09-18 20:55:53 +00:00
Jeremy Rose	71038381aa	fix error on missing notifications in [tui] (#3867 ) Fixes #3811.	2025-09-18 11:25:09 -07:00
jif-oai	277fc6254e	chore: use tokio mutex and async function to prevent blocking a worker (#3850 ) ### Why Use `tokio::sync::Mutex` `std::sync::Mutex` are not _async-aware_. As a result, they will block the entire thread instead of just yielding the task. Furthermore they can be poisoned which is not the case of `tokio` Mutex. This allows the Tokio runtime to continue running other tasks while waiting for the lock, preventing deadlocks and performance bottlenecks. In general, this is preferred in async environment	2025-09-18 18:21:52 +01:00
jif-oai	992b531180	fix: some nit Rust reference issues (#3849 ) Fix some small references issue. No behavioural change. Just making the code cleaner	2025-09-18 18:18:06 +01:00
jif-oai	4a5d6f7c71	Make ESC button work when auto-compaction (#3857 ) Only emit a task finished when the compaction comes from a `/compact`	2025-09-18 15:34:16 +00:00
pakrym-oai	d4aba772cb	Switch to uuid_v7 and tighten ConversationId usage (#3819 ) Make sure conversations have a timestamp.	2025-09-18 14:37:03 +00:00
jif-oai	4c97eeb32a	bug: Ignore tests for now (#3777 ) Ignore flaky / long tests for now	2025-09-18 10:43:45 +01:00
Thibault Sottiaux	c9505488a1	chore: update "Codex CLI harness, sandboxing, and approvals" section (#3822 )	2025-09-17 16:48:20 -07:00
dedrisian-oai	72733e34c4	Add dev message upon review out (#3758 ) Proposal: We want to record a dev message like so: ``` { "type": "message", "role": "user", "content": [ { "type": "input_text", "text": "<user_action> <context>User initiated a review task. Here's the full review output from reviewer model. User may select one or more comments to resolve.</context> <action>review</action> <results> {findings_str} </results> </user_action>" } ] }, ``` Without showing in the chat transcript. Rough idea, but it fixes issue where the user finishes a review thread, and asks the parent "fix the rest of the review issues" thinking that the parent knows about it. ### Question: Why not a tool call? Because the agent didn't make the call, it was a human. + we haven't implemented sub-agents yet, and we'll need to think about the way we represent these human-led tool calls for the agent.	2025-09-16 18:43:32 -07:00

1 2 3 4 5 ...

478 Commits