valknar/llmx - llmx - dev.pivoine.art

Author	SHA1	Message	Date
Gabriel Peal	4c9f7b6bcc	Fix flaky test_shell_command_approval_triggers_elicitation test (#1802 ) This doesn't flake very often but this should fix it.	2025-08-03 10:19:12 -04:00
David Z Hao	75eecb656e	Fix MacOS multiprocessing by relaxing sandbox (#1808 ) The following test script fails in the codex sandbox: ``` import multiprocessing from multiprocessing import Lock, Process def f(lock): with lock: print("Lock acquired in child process") if __name__ == '__main__': lock = Lock() p = Process(target=f, args=(lock,)) p.start() p.join() ``` with ``` Traceback (most recent call last): File "/Users/david.hao/code/codex/codex-rs/cli/test.py", line 9, in <module> lock = Lock() ^^^^^^ File "/Users/david.hao/.local/share/uv/python/cpython-3.12.9-macos-aarch64-none/lib/python3.12/multiprocessing/context.py", line 68, in Lock return Lock(ctx=self.get_context()) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/Users/david.hao/.local/share/uv/python/cpython-3.12.9-macos-aarch64-none/lib/python3.12/multiprocessing/synchronize.py", line 169, in __init__ SemLock.__init__(self, SEMAPHORE, 1, 1, ctx=ctx) File "/Users/david.hao/.local/share/uv/python/cpython-3.12.9-macos-aarch64-none/lib/python3.12/multiprocessing/synchronize.py", line 57, in __init__ sl = self._semlock = _multiprocessing.SemLock( ^^^^^^^^^^^^^^^^^^^^^^^^^ PermissionError: [Errno 1] Operation not permitted ``` After reading, adding this line to the sandbox configs fixes things - MacOS multiprocessing appears to use sem_lock(), which opens an IPC which is considered a disk write even though no file is created. I interrogated ChatGPT about whether it's okay to loosen, and my impression after reading is that it is, although would appreciate a close look Breadcrumb: You can run `cargo run -- debug seatbelt --full-auto <cmd>` to test the sandbox	2025-08-03 06:59:26 -07:00
aibrahim-oai	81bb1c9e26	Fix compact (#1798 ) We are not recording the summary in the history.	2025-08-02 12:05:06 -07:00
Jeremy Rose	7e0f506da2	check for updates (#1764 ) 1. Ping https://api.github.com/repos/openai/codex/releases/latest (at most once every 20 hrs) 2. Store the result in ~/.codex/version.jsonl 3. If CARGO_PKG_VERSION < latest_version, print a message at boot. --------- Co-authored-by: easong-openai <easong@openai.com>	2025-08-02 00:31:38 +00:00
pakrym-oai	929ba50adc	Update successful login page look (#1789 )	2025-08-01 23:30:15 +00:00
Michael Bolin	80555d4ff2	feat: make .git read-only within a writable root when using Seatbelt (#1765 ) To make `--full-auto` safer, this PR updates the Seatbelt policy so that a `SandboxPolicy` with a `writable_root` that contains a `.git/` _directory_ will make `.git/` _read-only_ (though as a follow-up, we should also consider the case where `.git` is a _file_ with a `gitdir: /path/to/actual/repo/.git` entry that should also be protected). The two major changes in this PR: - Updating `SandboxPolicy::get_writable_roots_with_cwd()` to return a `Vec<WritableRoot>` instead of a `Vec<PathBuf>` where a `WritableRoot` can specify a list of read-only subpaths. - Updating `create_seatbelt_command_args()` to honor the read-only subpaths in `WritableRoot`. The logic to update the policy is a fairly straightforward update to `create_seatbelt_command_args()`, but perhaps the more interesting part of this PR is the introduction of an integration test in `tests/sandbox.rs`. Leveraging the new API in #1785, we test `SandboxPolicy` under various conditions, including ones where `$TMPDIR` is not readable, which is critical for verifying the new behavior. To ensure that Codex can run its own tests, e.g.: ``` just codex debug seatbelt --full-auto -- cargo test if_git_repo_is_writable_root_then_dot_git_folder_is_read_only ``` I had to introduce the use of `CODEX_SANDBOX=sandbox`, which is comparable to how `CODEX_SANDBOX_NETWORK_DISABLED=1` was already being used. Adding a comparable change for Landlock will be done in a subsequent PR.	2025-08-01 16:11:24 -07:00
aibrahim-oai	97ab8fb610	MCP: add conversation.create tool [Stack 2/2] (#1783 ) Introduce conversation.create handler (handle_create_conversation) and wire it in MessageProcessor. Stack: Top: #1783 Bottom: #1784 --------- Co-authored-by: Gabriel Peal <gpeal@users.noreply.github.com>	2025-08-01 22:18:36 +00:00
aibrahim-oai	fe62f859a6	Add Error variant to ConversationCreateResult [Stack 1/2] (#1784 ) Switch ConversationCreateResult from a struct to a tagged enum (Ok | Error) Stack: Top: #1783 Bottom: #1784	2025-08-01 15:13:53 -07:00
Michael Bolin	92f3566d78	chore: introduce SandboxPolicy::WorkspaceWrite::include_default_writable_roots (#1785 ) Without this change, it is challenging to create integration tests to verify that the folders not included in `writable_roots` in `SandboxPolicy::WorkspaceWrite` are read-only because, by default, `get_writable_roots_with_cwd()` includes `TMPDIR`, which is where most integration tests do their work. This introduces a `use_exact_writable_roots` option to disable the default includes returned by `get_writable_roots_with_cwd()`. --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/1785). * #1765 * __->__ #1785	2025-08-01 14:15:55 -07:00
aibrahim-oai	f20de21cb6	collapse `stdout` and `stderr` delta events into one (#1787 )	2025-08-01 14:00:19 -07:00
aibrahim-oai	bc7beddaa2	feat: stream exec stdout events (#1786 ) ## Summary - stream command stdout as `ExecCommandStdout` events - forward streamed stdout to clients and ignore in human output processor - adjust call sites for new streaming API	2025-08-01 13:04:34 -07:00
Jeremy Rose	8360c6a3ec	fix insert_history modifier handling (#1774 ) This fixes a bug in insert_history_lines where writing `Line::From(vec!["A".bold(), "B".into()])` would write "B" as bold, because "B" didn't explicitly subtract bold.	2025-08-01 10:37:43 -07:00
aibrahim-oai	f918198bbb	Introduce a new function to just send user message [Stack 3/3] (#1686 ) - MCP server: add send-user-message tool to send user input to a running Codex session - Added integration tests for the happy and sad paths Changes: • Add tool definition and schema. • Expose tool in capabilities. • Route and handle tool requests with validation. • Tests for success, bad UUID, and missing session. Follow‑ups: • Listen path not implemented yet; the tool is present but marked “don’t use yet” in code comments. • Session run flag reset: clear running_session_id_set appropriately after turn completion/errors. This is the third PR in a stack. Stack: Final: #1686 Intermediate: #1751 First: #1750	2025-08-01 17:04:12 +00:00
pakrym-oai	88ea215c80	Add a custom originator setting (#1781 )	2025-08-01 09:55:23 -07:00
aibrahim-oai	b67c485d84	ci fix (#1782 )	2025-08-01 09:17:13 -07:00
aibrahim-oai	e2c994e32a	Add /compact (#1527 ) - Add operation to summarize the context so far. - The operation runs a compact task that summarizes the context. - The operation clears the previous context to free the context window - The operation doesn't use `run_task`, to avoid corrupting the session - Add /compact in the tui https://github.com/user-attachments/assets/e06c24e5-dcfb-4806-934a-564d425a919c	2025-07-31 21:34:32 -07:00
aibrahim-oai	ad0295b893	MCP server: route structured tool-call requests and expose mcp_protocol [Stack 2/3] (#1751 ) - Expose mcp_protocol from mcp-server for reuse in tests and callers. - In MessageProcessor, detect structured ToolCallRequestParams in tools/call and forward to a new handler. - Add handle_new_tool_calls scaffold (returns error for now). - Test helper: add send_send_user_message_tool_call to McpProcess to send ConversationSendMessage requests; This is the second PR in a stack. Stack: Final: #1686 Intermediate: #1751 First: #1750	2025-08-01 02:46:04 +00:00
aibrahim-oai	d3aa5f46b7	MCP Protocol: Align tool-call response with CallToolResult [Stack 1/3] (#1750 ) # Summary - Align MCP server responses with mcp_types by emitting [CallToolResult, RequestId] instead of an object. Update send-message result to a tagged enum: Ok or Error { message }. # Why Protocol compliance with current MCP schema. # Tests - Updated assertions in mcp_protocol.rs for create/stream/send/list and error cases. This is the first PR in a stack. Stack: Final: #1686 Intermediate: #1751 First: #1750	2025-08-01 02:30:03 +00:00
easong-openai	575590e4c2	Detect kitty terminals (#1748 ) We want to detect kitty terminals so we can preferentially upgrade their UX without degrading older terminals.	2025-08-01 00:30:44 +00:00
Jeremy Rose	4aca3e46c8	insert history lines with redraw (#1769 ) This delays the call to insert_history_lines until a redraw is happening. Crucially, the new lines are inserted _after the viewport is resized_. This results in fewer stray blank lines below the viewport when modals (e.g. user approval) are closed.	2025-07-31 17:15:26 -07:00
Jeremy Rose	d787434aa8	fix: always send KeyEvent, we now check kind in the handler (#1772 ) https://github.com/openai/codex/pull/1754 and #1771 fixed the same thing in colliding ways.	2025-08-01 00:13:36 +00:00
Jeremy Rose	ea69a1d72f	lighter approval modal (#1768 ) The yellow hazard stripes were too scary :) This also has the added benefit of not rendering anything at the full width of the terminal, so resizing is a little easier to handle. <img width="860" height="390" alt="Screenshot 2025-07-31 at 4 03 29 PM" src="https://github.com/user-attachments/assets/18476e1a-065d-4da9-92fe-e94978ab0fce" /> <img width="860" height="390" alt="Screenshot 2025-07-31 at 4 05 03 PM" src="https://github.com/user-attachments/assets/337db0da-de40-48c6-ae71-0e40f24b87e7" />	2025-07-31 17:10:52 -07:00
Jeremy Rose	610addbc2e	do not dispatch key releases (#1771 ) when we enabled KKP in https://github.com/openai/codex/pull/1743, we started receiving keyup events, but didn't expect them anywhere in our code. for now, just don't dispatch them at all.	2025-07-31 17:00:48 -07:00
pakrym-oai	0935e6a875	Send account id when available (#1767 ) For users with multiple accounts we need to specify the account to use.	2025-07-31 15:40:19 -07:00
easong-openai	6ce0a5875b	Initial planning tool (#1753 ) We need to optimize the prompt, but this causes the model to use the new planning_tool. <img width="765" height="110" alt="image" src="https://github.com/user-attachments/assets/45633f7f-3c85-4e60-8b80-902f1b3b508d" />	2025-07-31 20:45:52 +00:00
Michael Bolin	5a0ad5ab8f	chore: refactor exec.rs: create separate seatbelt.rs and spawn.rs files (#1762 ) At 550 lines, `exec.rs` was a bit large. In particular, I found it hard to locate the Seatbelt-related code quickly without a file with `seatbelt` in the name, so this refactors things so: - `spawn_command_under_seatbelt()` and dependent code moves to a new `seatbelt.rs` file - `spawn_child_async()` and dependent code moves to a new `spawn.rs` file	2025-07-31 13:11:47 -07:00
easong-openai	9aa11269a5	Fix double-scrolling in approval modal (#1754 ) Previously, pressing up or down arrow in the new approval modal would be the equivalent of two up or down presses.	2025-07-31 19:41:32 +00:00
Michael Bolin	06c786b2da	fix: ensure PatchApplyBeginEvent and PatchApplyEndEvent are dispatched reliably (#1760 ) This is a follow-up to https://github.com/openai/codex/pull/1705, as that PR inadvertently lost the logic where `PatchApplyBeginEvent` and `PatchApplyEndEvent` events were sent when patches were auto-approved. Though as part of this fix, I believe this also makes an important safety fix to `assess_patch_safety()`, as there was a case that returned `SandboxType::None`, which arguably is the thing we were trying to avoid in #1705. On a high level, we want there to be only one codepath where `apply_patch` happens, which should be unified with the patch to run `exec`, in general, so that sandboxing is applied consistently for both cases. Prior to this change, `apply_patch()` in `core` would either: * exit early, delegating to `exec()` to shell out to `apply_patch` using the appropriate sandbox * proceed to run the logic for `apply_patch` in memory `549846b29a/codex-rs/core/src/apply_patch.rs (L61-L63)` In this implementation, only the latter would dispatch `PatchApplyBeginEvent` and `PatchApplyEndEvent`, though the former would dispatch `ExecCommandBeginEvent` and `ExecCommandEndEvent` for the `apply_patch` call (or, more specifically, the `codex --codex-run-as-apply-patch PATCH` call). To unify things in this PR, we: * Eliminate the back half of the `apply_patch()` function, and instead have it also return with `DelegateToExec`, though we add an extra field to the return value, `user_explicitly_approved_this_action`. * In `codex.rs` where we process `DelegateToExec`, we use `SandboxType::None` when `user_explicitly_approved_this_action` is `true`. This means we no longer run the apply_patch logic in memory, as we always `exec()`. (Note this is what allowed us to delete so much code in `apply_patch.rs`.) 
* In `codex.rs`, we further update `notify_exec_command_begin()` and `notify_exec_command_end()` to take additional fields to determine what type of notification to send: `ExecCommand` or `PatchApply`. Admittedly, this PR also drops some of the functionality about giving the user the opportunity to expand the set of writable roots as part of approving the `apply_patch` command. I'm not sure how much that was used, and we should probably rethink how that works as we are currently tidying up the protocol to the TUI, in general.	2025-07-31 11:13:57 -07:00
pakrym-oai	549846b29a	Add codex login --api-key (#1759 ) Allow setting the API key via `codex login --api-key`	2025-07-31 17:48:49 +00:00
Jeremy Rose	96654a5d52	clamp render area to terminal size (#1758 ) this fixes a couple of panics that would happen when trying to render something larger than the terminal, or insert history lines when the top of the viewport is at y=0.	2025-07-31 09:59:36 -07:00
easong-openai	861ba86403	Show error message after panic (#1752 ) Previously we were swallowing errors and silently exiting, which isn't great for helping users help us.	2025-07-31 09:19:08 -07:00
Jeremy Rose	be0cd34300	fix git tests (#1747 ) the git tests were failing on my local machine due to gpg signing config in my ~/.gitconfig. tests should not be affected by ~/.gitconfig, so configure them to ignore it.	2025-07-31 09:17:59 -07:00
Jeremy Rose	d86270696e	streamline ui (#1733 ) Simplify and improve many UI elements. * Remove all-around borders in most places. These interact badly with terminal resizing and look heavy. Prefer left-side-only borders. * Make the viewport adjust to the size of its contents. * <kbd>/</kbd> and <kbd>@</kbd> autocomplete boxes appear below the prompt, instead of above it. * Restyle the keyboard shortcut hints & move them to the left. * Restyle the approval dialog. * Use synchronized rendering to avoid flashing during rerenders. https://github.com/user-attachments/assets/96f044af-283b-411c-b7fc-5e6b8a433c20 <img width="1117" height="858" alt="Screenshot 2025-07-30 at 5 29 20 PM" src="https://github.com/user-attachments/assets/0cc0af77-8396-429b-b6ee-9feaaccdbee7" />	2025-07-31 00:43:21 -07:00
pap-openai	defeafb279	add keyboard enhancements to support shift_return (#1743 ) For terminals that support [keyboard enhancements](https://docs.rs/libcrossterm/latest/crossterm/enum.KeyboardEnhancementFlags.html), adds the enhancements (enabling the [kitty keyboard protocol](https://sw.kovidgoyal.net/kitty/keyboard-protocol/)) to support a shift+enter listener. Users with terminals listed on [KKP](https://sw.kovidgoyal.net/kitty/keyboard-protocol/) should be able to press shift+return for a new line --------- Co-authored-by: easong-openai <easong@openai.com>	2025-07-31 03:23:56 +00:00
pakrym-oai	51b6bdefbe	Auto format toml (#1745 ) Add recommended extension and configure it to auto format prompt.	2025-07-30 18:37:00 -07:00
Jeremy Rose	f2134f6633	resizable viewport (#1732 ) Proof of concept for a resizable viewport. The general approach here is to duplicate the `Terminal` struct from ratatui, but with our own logic. This is a "light fork" in that we are still using all the base ratatui functions (`Buffer`, `Widget` and so on), but we're doing our own bookkeeping at the top level to determine where to draw everything. This approach could use improvement—e.g, when the window is resized to a smaller size, if the UI wraps, we don't correctly clear out the artifacts from wrapping. This is possible with a little work (i.e. tracking what parts of our UI would have been wrapped), but this behavior is at least at par with the existing behavior. https://github.com/user-attachments/assets/4eb17689-09fd-4daa-8315-c7ebc654986d cc @joshka who might have Thoughts™	2025-07-31 00:06:55 +00:00
Michael Bolin	221ebfcccc	fix: run apply_patch calls through the sandbox (#1705 ) Building on the work of https://github.com/openai/codex/pull/1702, this changes how a shell call to `apply_patch` is handled. Previously, a shell call to `apply_patch` was always handled in-process, never leveraging a sandbox. To determine whether the `apply_patch` operation could be auto-approved, the `is_write_patch_constrained_to_writable_paths()` function would check if all the paths listed in the patch were writable. If so, the agent would apply the changes listed in the patch. Unfortunately, this approach afforded a loophole: symlinks! * For a soft link, we could fix this issue by tracing the link and checking whether the target is in the set of writable paths, however... * ...For a hard link, things are not as simple. We can run `stat FILE` to see if the number of links is greater than 1, but then we would have to do something potentially expensive like `find . -inum <inode_number>` to find the other paths for `FILE`. Further, even if this worked, this approach runs the risk of a [TOCTOU](https://en.wikipedia.org/wiki/Time-of-check_to_time-of-use) race condition, so it is not robust. The solution, implemented in this PR, is to take the virtual execution of the `apply_patch` CLI into an _actual_ execution using `codex --codex-run-as-apply-patch PATCH`, which we can run under the sandbox the user specified, just like any other `shell` call. This, of course, assumes that the sandbox prevents writing through symlinks as a mechanism to write to folders that are not in the writable set configured by the sandbox. I verified this by testing the following on both Mac and Linux: ```shell #!/usr/bin/env bash set -euo pipefail # Can running a command in SANDBOX_DIR write a file in EXPLOIT_DIR? # Codex is run in SANDBOX_DIR, so writes should be constrained to this directory. 
SANDBOX_DIR=$(mktemp -d -p "$HOME" sandboxtesttemp.XXXXXX) # EXPLOIT_DIR is outside of SANDBOX_DIR, so let's see if we can write to it. EXPLOIT_DIR=$(mktemp -d -p "$HOME" sandboxtesttemp.XXXXXX) echo "SANDBOX_DIR: $SANDBOX_DIR" echo "EXPLOIT_DIR: $EXPLOIT_DIR" cleanup() { # Only remove if it looks sane and still exists [[ -n "${SANDBOX_DIR:-}" && -d "$SANDBOX_DIR" ]] && rm -rf -- "$SANDBOX_DIR" [[ -n "${EXPLOIT_DIR:-}" && -d "$EXPLOIT_DIR" ]] && rm -rf -- "$EXPLOIT_DIR" } trap cleanup EXIT echo "I am the original content" > "${EXPLOIT_DIR}/original.txt" # Drop the -s to test hard links. ln -s "${EXPLOIT_DIR}/original.txt" "${SANDBOX_DIR}/link-to-original.txt" cat "${SANDBOX_DIR}/link-to-original.txt" if [[ "$(uname)" == "Linux" ]]; then SANDBOX_SUBCOMMAND=landlock else SANDBOX_SUBCOMMAND=seatbelt fi # Attempt the exploit cd "${SANDBOX_DIR}" codex debug "${SANDBOX_SUBCOMMAND}" bash -lc "echo pwned > ./link-to-original.txt" || true cat "${EXPLOIT_DIR}/original.txt" ``` Admittedly, this change merits a proper integration test, but I think I will have to do that in a follow-up PR.	2025-07-30 16:45:08 -07:00
pakrym-oai	301ec72107	Add login status command (#1716 ) Print the current login mode, sanitized key and return an appropriate status.	2025-07-30 14:09:26 -07:00
pakrym-oai	e0e245cc1c	Send AGENTS.md as a separate user message (#1737 )	2025-07-30 13:56:24 -07:00
aibrahim-oai	2f5557056d	moving input item from MCP Protocol back to core Protocol (#1740 ) - Currently we have duplicate input item. Let's have one source of truth in the core. - Used Requestid type	2025-07-30 13:43:08 -07:00
pakrym-oai	ea01a5ffe2	Add support for a separate chatgpt auth endpoint (#1712 ) Adds a `CodexAuth` type that encapsulates information about available auth modes and logic for refreshing the token. Changes `Responses` API to send requests to different endpoints based on the auth type. Updates login_with_chatgpt to support API-less mode and skip the key exchange.	2025-07-30 19:40:15 +00:00
aibrahim-oai	93341797c4	fix ci (#1739 ) I think this commit broke the CI because it changed the `McpToolCallBeginEvent` type: `347c81ad00`	2025-07-30 11:32:38 -07:00
Jeremy Rose	347c81ad00	remove conversation history widget (#1727 ) this widget is no longer used.	2025-07-30 10:05:40 -07:00
aibrahim-oai	3823b32b7a	Mcp protocol (#1715 ) - Add typed MCP protocol surface in `codex-rs/mcp-server/src/mcp_protocol.rs` for `requests`, `responses`, and `notifications` - Requests: `NewConversation`, `Connect`, `SendUserMessage`, `GetConversations` - Message content parts: `Text`, `Image` (`ImageUrl`/`FileId`, optional `ImageDetail`), File (`Url`/`Id`/`inline Data`) - Responses: `ToolCallResponseEnvelope` with optional `isError` and `structuredContent` variants (`NewConversation`, `Connect`, `SendUserMessageAccepted`, `GetConversations`) - Notifications: `InitialState`, `ConnectionRevoked`, `CodexEvent`, `Cancelled` - Uniform `_meta` on `notifications` via `NotificationMeta` (`conversationId`, `requestId`) - Unit tests validate JSON wire shapes for key `requests`/`responses`/`notifications`	2025-07-29 20:14:41 -07:00
pakrym-oai	6b10e22eb3	Trim bash lc and run with login shell (#1725 ) include .zshenv, .zprofile by running with the `-l` flag and don't start a shell inside a shell when we see the typical `bash -lc` invocation.	2025-07-29 16:49:02 -07:00
Gabriel Peal	8828f6f082	Add an experimental plan tool (#1726 ) This adds a tool the model can call to update a plan. The tool doesn't actually _do_ anything but it gives clients a chance to read and render the structured plan. We will likely iterate on the prompt and tools exposed for planning over time.	2025-07-29 14:22:02 -04:00
easong-openai	f8fcaaaf6f	Relative instruction file (#1722 ) Passing in an instruction file with a bad path led to silent failures; relative instruction paths were also handled in an unintuitive fashion.	2025-07-29 10:06:05 -07:00
Jeremy Rose	fc85f4812f	feat: map ^U to kill-line-to-head (#1711 ) see [discussion](https://github.com/rhysd/tui-textarea/issues/51#issuecomment-3021191712), it's surprising that ^U behaves this way. IMO the undo/redo functionality in tui-textarea isn't good enough to be worth preserving, but if we do bring it back it should probably be on C-z / C-S-z / C-y.	2025-07-29 09:40:26 -07:00
easong-openai	efe7f3c793	alternate login wording? (#1723 ) Co-authored-by: Jeremy Rose <172423086+nornagon-openai@users.noreply.github.com>	2025-07-29 16:23:09 +00:00
Jeremy Rose	f66704a88f	replace login screen with a simple prompt (#1713 ) Perhaps there was an intention to make the login screen prettier, but it feels quite silly right now to just have a screen that says "press q", so replace it with something that lets the user directly login without having to quit the app. <img width="1283" height="635" alt="Screenshot 2025-07-28 at 2 54 05 PM" src="https://github.com/user-attachments/assets/f19e5595-6ef9-4a2d-b409-aa61b30d3628" />	2025-07-28 17:25:14 -07:00
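
Note on #1765: the commit message describes `get_writable_roots_with_cwd()` returning a `Vec<WritableRoot>`, where each `WritableRoot` can list read-only subpaths such as `.git/`. The following is only a rough sketch of that idea. The field names `root` and `read_only_subpaths` and the helper `is_path_writable` are hypothetical, not taken from the codex source, and the real enforcement is done by the Seatbelt policy rather than by a path check like this one.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical shape of a writable root that carves out read-only subpaths.
/// The real `WritableRoot` in codex-rs may look different.
#[derive(Debug, Clone, PartialEq)]
struct WritableRoot {
    /// Directory under which writes are allowed.
    root: PathBuf,
    /// Subpaths of `root` that stay read-only (e.g. `<root>/.git`).
    read_only_subpaths: Vec<PathBuf>,
}

impl WritableRoot {
    /// Returns true if `path` is under `root` and not under any read-only subpath.
    /// `Path::starts_with` compares whole components, so `/repo/.gitignore`
    /// is not treated as being inside `/repo/.git`.
    fn is_path_writable(&self, path: &Path) -> bool {
        if !path.starts_with(&self.root) {
            return false;
        }
        !self
            .read_only_subpaths
            .iter()
            .any(|read_only| path.starts_with(read_only))
    }
}

fn main() {
    let repo = WritableRoot {
        root: PathBuf::from("/repo"),
        read_only_subpaths: vec![PathBuf::from("/repo/.git")],
    };

    for candidate in ["/repo/src/main.rs", "/repo/.git/config", "/repo/.gitignore", "/tmp/x"] {
        println!("{candidate}: writable = {}", repo.is_path_writable(Path::new(candidate)));
    }
}
```

This purely lexical check does not resolve symlinks or hard links, which is the loophole #1705 describes and the reason the actual protection is delegated to the OS sandbox.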

302 Commits
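
Note on #1764: the update check is described as pinging the releases endpoint at most once every 20 hours and printing a message when `CARGO_PKG_VERSION < latest_version`. Below is a minimal sketch of just the throttle and the comparison. It assumes versions are plain `major.minor.patch` strings, and the function names `parse_version`, `is_newer`, and `should_check` are hypothetical. It performs no network or file I/O and is not the actual implementation.

```rust
use std::time::{Duration, SystemTime};

/// Parses a plain `major.minor.patch` string (an optional leading `v` is ignored).
/// Pre-release or build suffixes are not handled in this sketch.
fn parse_version(version: &str) -> Option<(u64, u64, u64)> {
    let mut parts = version.trim().trim_start_matches('v').split('.');
    let major = parts.next()?.parse().ok()?;
    let minor = parts.next()?.parse().ok()?;
    let patch = parts.next()?.parse().ok()?;
    if parts.next().is_some() {
        return None; // more than three components
    }
    Some((major, minor, patch))
}

/// Returns Some(true) if `latest` is strictly newer than `current`,
/// or None if either string cannot be parsed.
fn is_newer(latest: &str, current: &str) -> Option<bool> {
    // Tuples compare lexicographically: major first, then minor, then patch.
    Some(parse_version(latest)? > parse_version(current)?)
}

/// Returns true if no check was recorded yet or the last one is at least 20 hours old.
fn should_check(last_checked: Option<SystemTime>, now: SystemTime) -> bool {
    const INTERVAL: Duration = Duration::from_secs(20 * 60 * 60);
    match last_checked {
        None => true,
        // If the clock went backwards, err on the side of checking again.
        Some(then) => now
            .duration_since(then)
            .map(|elapsed| elapsed >= INTERVAL)
            .unwrap_or(true),
    }
}

fn main() {
    let now = SystemTime::now();
    if should_check(None, now) {
        // A real client would fetch the latest release here; this value is made up.
        let latest = "0.12.0";
        let current = "0.11.0";
        if is_newer(latest, current) == Some(true) {
            println!("Update available: {current} -> {latest}");
        }
    }
}
```

Comparing parsed tuples rather than raw strings avoids ordering `0.9.0` after `0.11.0`, which a plain string comparison would do.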