valknar/llmx - llmx - dev.pivoine.art

Author	SHA1	Message	Date
Jeremy Rose	64cfbbd3c8	support more keys in textarea (#1820 ) Added: * C-m for newline (not sure if this is actually treated differently to Enter, but tui-textarea handles it and it doesn't hurt) * C-d to delete one char forwards (same as Del) * A-bksp to delete backwards one word * A-arrows to navigate by word	2025-08-04 11:25:01 -07:00
easong-openai	a6139aa003	Update prompt.md (#1819 ) The existing prompt is really bad. As a low-hanging fruit, let's correct the apply_patch instructions - this helps smaller models successfully apply patches.	2025-08-04 10:42:39 -07:00
ae	dc15a5cf0b	feat: accept custom instructions in profiles (#1803 ) Allows users to set their experimental_instructions_file in configs. For example the below enables experimental instructions when running `codex -p foo`. ``` [profiles.foo] experimental_instructions_file = "/Users/foo/.codex/prompt.md" ``` # Testing - ✅ Running against a profile with experimental_instructions_file works. - ✅ Running against a profile without experimental_instructions_file works. - ✅ Running against no profile with experimental_instructions_file works. - ✅ Running against no profile without experimental_instructions_file works.	2025-08-04 09:34:46 -07:00
Gabriel Peal	1f3318c1c5	Add a TurnDiffTracker to create a unified diff for an entire turn (#1770 ) This lets us show an accumulating diff across all patches in a turn. Refer to the docs for TurnDiffTracker for implementation details. There are multiple ways this could have been done and this felt like the right tradeoff between reliability and completeness: Pros * It will pick up all changes to files that the model touched including if they prettier or another command that updates them. * It will not pick up changes made by the user or other agents to files it didn't modify. Cons * It will pick up changes that the user made to a file that the model also touched * It will not pick up changes to codegen or files that were not modified with apply_patch	2025-08-04 11:57:04 -04:00
Dylan	e3565a3f43	[sandbox] Filter out certain non-sandbox errors (#1804 ) ## Summary Users frequently complain about re-approving commands that have failed for non-sandbox reasons. We can't diagnose with complete accuracy which errors happened because of a sandbox failure, but we can start to eliminate some common simple cases. This PR captures the most common case I've seen, which is a `command not found` error. ## Testing - [x] Added unit tests - [x] Ran a few cases locally	2025-08-03 13:05:48 -07:00
Jeremy Rose	2576fadc74	shimmer on working (#1807 ) change the animation on "working" to be a text shimmer https://github.com/user-attachments/assets/f64529eb-1c64-493a-8d97-0f68b964bdd0	2025-08-03 18:51:33 +00:00
Jeremy Rose	78a1d49fac	fix command duration display (#1806 ) we were always displaying "0ms" before. <img width="731" height="101" alt="Screenshot 2025-08-02 at 10 51 22 PM" src="https://github.com/user-attachments/assets/f56814ed-b9a4-4164-9e78-181c60ce19b7" />	2025-08-03 11:33:44 -07:00
Jeremy Rose	d62b703a21	custom textarea (#1794 ) This replaces tui-textarea with a custom textarea component. Key differences: 1. wrapped lines 2. better unicode handling 3. uses the native terminal cursor This should perhaps be spun out into its own separate crate at some point, but for now it's convenient to have it in-tree.	2025-08-03 11:31:35 -07:00
Gabriel Peal	4c9f7b6bcc	Fix flaky test_shell_command_approval_triggers_elicitation test (#1802 ) This doesn't flake very often but this should fix it.	2025-08-03 10:19:12 -04:00
David Z Hao	75eecb656e	Fix MacOS multiprocessing by relaxing sandbox (#1808 ) The following test script fails in the codex sandbox: ``` import multiprocessing from multiprocessing import Lock, Process def f(lock): with lock: print("Lock acquired in child process") if __name__ == '__main__': lock = Lock() p = Process(target=f, args=(lock,)) p.start() p.join() ``` with ``` Traceback (most recent call last): File "/Users/david.hao/code/codex/codex-rs/cli/test.py", line 9, in <module> lock = Lock() ^^^^^^ File "/Users/david.hao/.local/share/uv/python/cpython-3.12.9-macos-aarch64-none/lib/python3.12/multiprocessing/context.py", line 68, in Lock return Lock(ctx=self.get_context()) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/Users/david.hao/.local/share/uv/python/cpython-3.12.9-macos-aarch64-none/lib/python3.12/multiprocessing/synchronize.py", line 169, in __init__ SemLock.__init__(self, SEMAPHORE, 1, 1, ctx=ctx) File "/Users/david.hao/.local/share/uv/python/cpython-3.12.9-macos-aarch64-none/lib/python3.12/multiprocessing/synchronize.py", line 57, in __init__ sl = self._semlock = _multiprocessing.SemLock( ^^^^^^^^^^^^^^^^^^^^^^^^^ PermissionError: [Errno 1] Operation not permitted ``` After reading, adding this line to the sandbox configs fixes things - MacOS multiprocessing appears to use sem_lock(), which opens an IPC which is considered a disk write even though no file is created. I interrogated ChatGPT about whether it's okay to loosen, and my impression after reading is that it is, although would appreciate a close look Breadcrumb: You can run `cargo run -- debug seatbelt --full-auto <cmd>` to test the sandbox	2025-08-03 06:59:26 -07:00
aibrahim-oai	81bb1c9e26	Fix compact (#1798 ) We are not recording the summary in the history.	2025-08-02 12:05:06 -07:00
Jeremy Rose	7e0f506da2	check for updates (#1764 ) 1. Ping https://api.github.com/repos/openai/codex/releases/latest (at most once every 20 hrs) 2. Store the result in ~/.codex/version.jsonl 3. If CARGO_PKG_VERSION < latest_version, print a message at boot. --------- Co-authored-by: easong-openai <easong@openai.com>	2025-08-02 00:31:38 +00:00
pakrym-oai	929ba50adc	Update succesfull login page look (#1789 )	2025-08-01 23:30:15 +00:00
Michael Bolin	80555d4ff2	feat: make .git read-only within a writable root when using Seatbelt (#1765 ) To make `--full-auto` safer, this PR updates the Seatbelt policy so that a `SandboxPolicy` with a `writable_root` that contains a `.git/` _directory_ will make `.git/` _read-only_ (though as a follow-up, we should also consider the case where `.git` is a _file_ with a `gitdir: /path/to/actual/repo/.git` entry that should also be protected). The two major changes in this PR: - Updating `SandboxPolicy::get_writable_roots_with_cwd()` to return a `Vec<WritableRoot>` instead of a `Vec<PathBuf>` where a `WritableRoot` can specify a list of read-only subpaths. - Updating `create_seatbelt_command_args()` to honor the read-only subpaths in `WritableRoot`. The logic to update the policy is a fairly straightforward update to `create_seatbelt_command_args()`, but perhaps the more interesting part of this PR is the introduction of an integration test in `tests/sandbox.rs`. Leveraging the new API in #1785, we test `SandboxPolicy` under various conditions, including ones where `$TMPDIR` is not readable, which is critical for verifying the new behavior. To ensure that Codex can run its own tests, e.g.: ``` just codex debug seatbelt --full-auto -- cargo test if_git_repo_is_writable_root_then_dot_git_folder_is_read_only ``` I had to introduce the use of `CODEX_SANDBOX=sandbox`, which is comparable to how `CODEX_SANDBOX_NETWORK_DISABLED=1` was already being used. Adding a comparable change for Landlock will be done in a subsequent PR.	2025-08-01 16:11:24 -07:00
aibrahim-oai	97ab8fb610	MCP: add conversation.create tool [Stack 2/2] (#1783 ) Introduce conversation.create handler (handle_create_conversation) and wire it in MessageProcessor. Stack: Top: #1783 Bottom: #1784 --------- Co-authored-by: Gabriel Peal <gpeal@users.noreply.github.com>	2025-08-01 22:18:36 +00:00
aibrahim-oai	fe62f859a6	Add Error variant to ConversationCreateResult [Stack 1/2] (#1784 ) Switch ConversationCreateResult from a struct to a tagged enum (Ok \| Error) Stack: Top: #1783 Bottom: #1784	2025-08-01 15:13:53 -07:00
Michael Bolin	92f3566d78	chore: introduce SandboxPolicy::WorkspaceWrite::include_default_writable_roots (#1785 ) Without this change, it is challenging to create integration tests to verify that the folders not included in `writable_roots` in `SandboxPolicy::WorkspaceWrite` are read-only because, by default, `get_writable_roots_with_cwd()` includes `TMPDIR`, which is where most integrationt tests do their work. This introduces a `use_exact_writable_roots` option to disable the default includes returned by `get_writable_roots_with_cwd()`. --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/1785). * #1765 * __->__ #1785	2025-08-01 14:15:55 -07:00
aibrahim-oai	f20de21cb6	collabse `stdout` and `stderr` delta events into one (#1787 )	2025-08-01 14:00:19 -07:00
aibrahim-oai	bc7beddaa2	feat: stream exec stdout events (#1786 ) ## Summary - stream command stdout as `ExecCommandStdout` events - forward streamed stdout to clients and ignore in human output processor - adjust call sites for new streaming API	2025-08-01 13:04:34 -07:00
Jeremy Rose	8360c6a3ec	fix insert_history modifier handling (#1774 ) This fixes a bug in insert_history_lines where writing `Line::From(vec!["A".bold(), "B".into()])` would write "B" as bold, because "B" didn't explicitly subtract bold.	2025-08-01 10:37:43 -07:00
aibrahim-oai	f918198bbb	Introduce a new function to just send user message [Stack 3/3] (#1686 ) - MCP server: add send-user-message tool to send user input to a running Codex session - Added an integration tests for the happy and sad paths Changes: • Add tool definition and schema. • Expose tool in capabilities. • Route and handle tool requests with validation. • Tests for success, bad UUID, and missing session. follow‑ups • Listen path not implemented yet; the tool is present but marked “don’t use yet” in code comments. • Session run flag reset: clear running_session_id_set appropriately after turn completion/errors. This is the third PR in a stack. Stack: Final: #1686 Intermediate: #1751 First: #1750	2025-08-01 17:04:12 +00:00
pakrym-oai	88ea215c80	Add a custom originator setting (#1781 )	2025-08-01 09:55:23 -07:00
aibrahim-oai	b67c485d84	ci fix (#1782 )	2025-08-01 09:17:13 -07:00
aibrahim-oai	e2c994e32a	Add /compact (#1527 ) - Add operation to summarize the context so far. - The operation runs a compact task that summarizes the context. - The operation clear the previous context to free the context window - The operation didn't use `run_task` to avoid corrupting the session - Add /compact in the tui https://github.com/user-attachments/assets/e06c24e5-dcfb-4806-934a-564d425a919c	2025-07-31 21:34:32 -07:00
aibrahim-oai	ad0295b893	MCP server: route structured tool-call requests and expose mcp_protocol [Stack 2/3] (#1751 ) - Expose mcp_protocol from mcp-server for reuse in tests and callers. - In MessageProcessor, detect structured ToolCallRequestParams in tools/call and forward to a new handler. - Add handle_new_tool_calls scaffold (returns error for now). - Test helper: add send_send_user_message_tool_call to McpProcess to send ConversationSendMessage requests; This is the second PR in a stack. Stack: Final: #1686 Intermediate: #1751 First: #1750	2025-08-01 02:46:04 +00:00
aibrahim-oai	d3aa5f46b7	MCP Protocol: Align tool-call response with CallToolResult [Stack 1/3] (#1750 ) # Summary - Align MCP server responses with mcp_types by emitting [CallToolResult, RequestId] instead of an object. Update send-message result to a tagged enum: Ok or Error { message }. # Why Protocol compliance with current MCP schema. # Tests - Updated assertions in mcp_protocol.rs for create/stream/send/list and error cases. This is the first PR in a stack. Stack: Final: #1686 Intermediate: #1751 First: #1750	2025-08-01 02:30:03 +00:00
easong-openai	575590e4c2	Detect kitty terminals (#1748 ) We want to detect kitty terminals so we can preferentially upgrade their UX without degrading older terminals.	2025-08-01 00:30:44 +00:00
Jeremy Rose	4aca3e46c8	insert history lines with redraw (#1769 ) This delays the call to insert_history_lines until a redraw is happening. Crucially, the new lines are inserted _after the viewport is resized_. This results in fewer stray blank lines below the viewport when modals (e.g. user approval) are closed.	2025-07-31 17:15:26 -07:00
Jeremy Rose	d787434aa8	fix: always send KeyEvent, we now check kind in the handler (#1772 ) https://github.com/openai/codex/pull/1754 and #1771 fixed the same thing in colliding ways.	2025-08-01 00:13:36 +00:00
Jeremy Rose	ea69a1d72f	lighter approval modal (#1768 ) The yellow hazard stripes were too scary :) This also has the added benefit of not rendering anything at the full width of the terminal, so resizing is a little easier to handle. <img width="860" height="390" alt="Screenshot 2025-07-31 at 4 03 29 PM" src="https://github.com/user-attachments/assets/18476e1a-065d-4da9-92fe-e94978ab0fce" /> <img width="860" height="390" alt="Screenshot 2025-07-31 at 4 05 03 PM" src="https://github.com/user-attachments/assets/337db0da-de40-48c6-ae71-0e40f24b87e7" />	2025-07-31 17:10:52 -07:00
Jeremy Rose	610addbc2e	do not dispatch key releases (#1771 ) when we enabled KKP in https://github.com/openai/codex/pull/1743, we started receiving keyup events, but didn't expect them anywhere in our code. for now, just don't dispatch them at all.	2025-07-31 17:00:48 -07:00
pakrym-oai	0935e6a875	Send account id when available (#1767 ) For users with multiple accounts we need to specify the account to use.	2025-07-31 15:40:19 -07:00
easong-openai	6ce0a5875b	Initial planning tool (#1753 ) We need to optimize the prompt, but this causes the model to use the new planning_tool. <img width="765" height="110" alt="image" src="https://github.com/user-attachments/assets/45633f7f-3c85-4e60-8b80-902f1b3b508d" />	2025-07-31 20:45:52 +00:00
Michael Bolin	5a0ad5ab8f	chore: refactor exec.rs: create separate seatbelt.rs and spawn.rs files (#1762 ) At 550 lines, `exec.rs` was a bit large. In particular, I found it hard to locate the Seatbelt-related code quickly without a file with `seatbelt` in the name, so this refactors things so: - `spawn_command_under_seatbelt()` and dependent code moves to a new `seatbelt.rs` file - `spawn_child_async()` and dependent code moves to a new `spawn.rs` file	2025-07-31 13:11:47 -07:00
easong-openai	9aa11269a5	Fix double-scrolling in approval model (#1754 ) Previously, pressing up or down arrow in the new approval modal would be the equivalent of two up or down presses.	2025-07-31 19:41:32 +00:00
Michael Bolin	06c786b2da	fix: ensure PatchApplyBeginEvent and PatchApplyEndEvent are dispatched reliably (#1760 ) This is a follow-up to https://github.com/openai/codex/pull/1705, as that PR inadvertently lost the logic where `PatchApplyBeginEvent` and `PatchApplyEndEvent` events were sent when patches were auto-approved. Though as part of this fix, I believe this also makes an important safety fix to `assess_patch_safety()`, as there was a case that returned `SandboxType::None`, which arguably is the thing we were trying to avoid in #1705. On a high level, we want there to be only one codepath where `apply_patch` happens, which should be unified with the patch to run `exec`, in general, so that sandboxing is applied consistently for both cases. Prior to this change, `apply_patch()` in `core` would either: * exit early, delegating to `exec()` to shell out to `apply_patch` using the appropriate sandbox * proceed to run the logic for `apply_patch` in memory `549846b29a/codex-rs/core/src/apply_patch.rs (L61-L63)` In this implementation, only the latter would dispatch `PatchApplyBeginEvent` and `PatchApplyEndEvent`, though the former would dispatch `ExecCommandBeginEvent` and `ExecCommandEndEvent` for the `apply_patch` call (or, more specifically, the `codex --codex-run-as-apply-patch PATCH` call). To unify things in this PR, we: * Eliminate the back half of the `apply_patch()` function, and instead have it also return with `DelegateToExec`, though we add an extra field to the return value, `user_explicitly_approved_this_action`. * In `codex.rs` where we process `DelegateToExec`, we use `SandboxType::None` when `user_explicitly_approved_this_action` is `true`. This means we no longer run the apply_patch logic in memory, as we always `exec()`. (Note this is what allowed us to delete so much code in `apply_patch.rs`.) * In `codex.rs`, we further update `notify_exec_command_begin()` and `notify_exec_command_end()` to take additional fields to determine what type of notification to send: `ExecCommand` or `PatchApply`. Admittedly, this PR also drops some of the functionality about giving the user the opportunity to expand the set of writable roots as part of approving the `apply_patch` command. I'm not sure how much that was used, and we should probably rethink how that works as we are currently tidying up the protocol to the TUI, in general.	2025-07-31 11:13:57 -07:00
pakrym-oai	549846b29a	Add codex login --api-key (#1759 ) Allow setting the API key via `codex login --api-key`	2025-07-31 17:48:49 +00:00
Jeremy Rose	96654a5d52	clamp render area to terminal size (#1758 ) this fixes a couple of panics that would happen when trying to render something larger than the terminal, or insert history lines when the top of the viewport is at y=0.	2025-07-31 09:59:36 -07:00
easong-openai	861ba86403	Show error message after panic (#1752 ) Previously we were swallowing errors and silently exiting, which isn't great for helping users help us.	2025-07-31 09:19:08 -07:00
Jeremy Rose	be0cd34300	fix git tests (#1747 ) the git tests were failing on my local machine due to gpg signing config in my ~/.gitconfig. tests should not be affected by ~/.gitconfig, so configure them to ignore it.	2025-07-31 09:17:59 -07:00
Jeremy Rose	d86270696e	streamline ui (#1733 ) Simplify and improve many UI elements. * Remove all-around borders in most places. These interact badly with terminal resizing and look heavy. Prefer left-side-only borders. * Make the viewport adjust to the size of its contents. * <kbd>/</kbd> and <kbd>@</kbd> autocomplete boxes appear below the prompt, instead of above it. * Restyle the keyboard shortcut hints & move them to the left. * Restyle the approval dialog. * Use synchronized rendering to avoid flashing during rerenders. https://github.com/user-attachments/assets/96f044af-283b-411c-b7fc-5e6b8a433c20 <img width="1117" height="858" alt="Screenshot 2025-07-30 at 5 29 20 PM" src="https://github.com/user-attachments/assets/0cc0af77-8396-429b-b6ee-9feaaccdbee7" />	2025-07-31 00:43:21 -07:00
pap-openai	defeafb279	add keyboard enhancements to support shift_return (#1743 ) For terminal that supports [keyboard enhancements](https://docs.rs/libcrossterm/latest/crossterm/enum.KeyboardEnhancementFlags.html), adds the enhancements (enabling [kitty keyboard protocol](https://sw.kovidgoyal.net/kitty/keyboard-protocol/)) to support shift+enter listener. Those users (users with terminals listed on [KPP](https://sw.kovidgoyal.net/kitty/keyboard-protocol/)) should be able to press shift+return for new line --------- Co-authored-by: easong-openai <easong@openai.com>	2025-07-31 03:23:56 +00:00
pakrym-oai	51b6bdefbe	Auto format toml (#1745 ) Add recommended extension and configure it to auto format prompt.	2025-07-30 18:37:00 -07:00
Jeremy Rose	f2134f6633	resizable viewport (#1732 ) Proof of concept for a resizable viewport. The general approach here is to duplicate the `Terminal` struct from ratatui, but with our own logic. This is a "light fork" in that we are still using all the base ratatui functions (`Buffer`, `Widget` and so on), but we're doing our own bookkeeping at the top level to determine where to draw everything. This approach could use improvement—e.g, when the window is resized to a smaller size, if the UI wraps, we don't correctly clear out the artifacts from wrapping. This is possible with a little work (i.e. tracking what parts of our UI would have been wrapped), but this behavior is at least at par with the existing behavior. https://github.com/user-attachments/assets/4eb17689-09fd-4daa-8315-c7ebc654986d cc @joshka who might have Thoughts™	2025-07-31 00:06:55 +00:00
Michael Bolin	221ebfcccc	fix: run apply_patch calls through the sandbox (#1705 ) Building on the work of https://github.com/openai/codex/pull/1702, this changes how a shell call to `apply_patch` is handled. Previously, a shell call to `apply_patch` was always handled in-process, never leveraging a sandbox. To determine whether the `apply_patch` operation could be auto-approved, the `is_write_patch_constrained_to_writable_paths()` function would check if all the paths listed in the paths were writable. If so, the agent would apply the changes listed in the patch. Unfortunately, this approach afforded a loophole: symlinks! * For a soft link, we could fix this issue by tracing the link and checking whether the target is in the set of writable paths, however... * ...For a hard link, things are not as simple. We can run `stat FILE` to see if the number of links is greater than 1, but then we would have to do something potentially expensive like `find . -inum <inode_number>` to find the other paths for `FILE`. Further, even if this worked, this approach runs the risk of a [TOCTOU](https://en.wikipedia.org/wiki/Time-of-check_to_time-of-use) race condition, so it is not robust. The solution, implemented in this PR, is to take the virtual execution of the `apply_patch` CLI into an _actual_ execution using `codex --codex-run-as-apply-patch PATCH`, which we can run under the sandbox the user specified, just like any other `shell` call. This, of course, assumes that the sandbox prevents writing through symlinks as a mechanism to write to folders that are not in the writable set configured by the sandbox. I verified this by testing the following on both Mac and Linux: ```shell #!/usr/bin/env bash set -euo pipefail # Can running a command in SANDBOX_DIR write a file in EXPLOIT_DIR? # Codex is run in SANDBOX_DIR, so writes should be constrianed to this directory. SANDBOX_DIR=$(mktemp -d -p "$HOME" sandboxtesttemp.XXXXXX) # EXPLOIT_DIR is outside of SANDBOX_DIR, so let's see if we can write to it. EXPLOIT_DIR=$(mktemp -d -p "$HOME" sandboxtesttemp.XXXXXX) echo "SANDBOX_DIR: $SANDBOX_DIR" echo "EXPLOIT_DIR: $EXPLOIT_DIR" cleanup() { # Only remove if it looks sane and still exists [[ -n "${SANDBOX_DIR:-}" && -d "$SANDBOX_DIR" ]] && rm -rf -- "$SANDBOX_DIR" [[ -n "${EXPLOIT_DIR:-}" && -d "$EXPLOIT_DIR" ]] && rm -rf -- "$EXPLOIT_DIR" } trap cleanup EXIT echo "I am the original content" > "${EXPLOIT_DIR}/original.txt" # Drop the -s to test hard links. ln -s "${EXPLOIT_DIR}/original.txt" "${SANDBOX_DIR}/link-to-original.txt" cat "${SANDBOX_DIR}/link-to-original.txt" if [[ "$(uname)" == "Linux" ]]; then SANDBOX_SUBCOMMAND=landlock else SANDBOX_SUBCOMMAND=seatbelt fi # Attempt the exploit cd "${SANDBOX_DIR}" codex debug "${SANDBOX_SUBCOMMAND}" bash -lc "echo pwned > ./link-to-original.txt" \|\| true cat "${EXPLOIT_DIR}/original.txt" ``` Admittedly, this change merits a proper integration test, but I think I will have to do that in a follow-up PR.	2025-07-30 16:45:08 -07:00
pakrym-oai	301ec72107	Add login status command (#1716 ) Print the current login mode, sanitized key and return an appropriate status.	2025-07-30 14:09:26 -07:00
pakrym-oai	e0e245cc1c	Send AGENTS.md as a separate user message (#1737 )	2025-07-30 13:56:24 -07:00
aibrahim-oai	2f5557056d	moving input item from MCP Protocol back to core Protocol (#1740 ) - Currently we have duplicate input item. Let's have one source of truth in the core. - Used Requestid type	2025-07-30 13:43:08 -07:00
pakrym-oai	ea01a5ffe2	Add support for a separate chatgpt auth endpoint (#1712 ) Adds a `CodexAuth` type that encapsulates information about available auth modes and logic for refreshing the token. Changes `Responses` API to send requests to different endpoints based on the auth type. Updates login_with_chatgpt to support API-less mode and skip the key exchange.	2025-07-30 19:40:15 +00:00
aibrahim-oai	93341797c4	fix ci (#1739 ) I think this commit broke the CI because it changed the `McpToolCallBeginEvent` type: `347c81ad00`	2025-07-30 11:32:38 -07:00

1 2 3 4 5 ...

310 Commits