This PR overhauls how active tool calls and completed tool calls are
displayed:
1. More use of colour to indicate success/failure and distinguish
between components like tool name+arguments
2. Previously, the entire `CallToolResult` was serialized to JSON and
pretty-printed. Now, we extract each individual `CallToolResultContent`
and print those
1. The previous solution was wasting space by unnecessarily showing
details of the `CallToolResult` struct to users, without formatting the
actual tool call results nicely
2. We're now able to show users more information from tool results in
less space, with nicer formatting when tools return JSON results
### Before:
<img width="1251" alt="Screenshot 2025-06-03 at 11 24 26"
src="https://github.com/user-attachments/assets/5a58f222-219c-4c53-ace7-d887194e30cf"
/>
### After:
<img width="1265" alt="image"
src="https://github.com/user-attachments/assets/99fe54d0-9ebe-406a-855b-7aa529b91274"
/>
## Future Work
1. Integrate image tool result handling better. We should be able to
display images even if they're not the first `CallToolResultContent`
2. Users should have some way to view the full version of truncated tool
results
3. It would be nice to add some left padding for tool results, make it
more clear that they are results. This is doable, just a little fiddly
due to the way `first_visible_line` scrolling works
4. There's almost certainly a better way to format JSON than "all on 1
line with spaces to make Ratatui wrapping work". But I think that works
OK for now.
The motivation behind this PR is to make it so a `HistoryCell` is more
like a `WidgetRef` that knows how to render itself into a `Rect` so that
it can be backed by something other than a `Vec<Line>`. Because a
`HistoryCell` is intended to appear in a scrollable list, we want to
ensure the stack of cells can be scrolled one `Line` at a time even if
the `HistoryCell` is not backed by a `Vec<Line>` itself.
To this end, we introduce the `CellWidget` trait whose key method is:
```
fn render_window(&self, first_visible_line: usize, area: Rect, buf: &mut Buffer);
```
The `first_visible_line` param is what differs from
`WidgetRef::render_ref()`, as a `CellWidget` needs to know the offset
into its "full view" at which it should start rendering.
The bookkeeping in `ConversationHistoryWidget` has been updated
accordingly to ensure each `CellWidget` in the history is rendered
appropriately.
This PR introduces support for `-c`/`--config` so users can override
individual config values on the command line using `--config
name=value`. Example:
```
codex --config model=o4-mini
```
Making it possible to set arbitrary config values on the command line
results in a more flexible configuration scheme and makes it easier to
provide single-line examples that can be copy-pasted from documentation.
Effectively, it means there are four levels of configuration for some
values:
- Default value (e.g., `model` currently defaults to `o4-mini`)
- Value in `config.toml` (e.g., user could override the default to be
`model = "o3"` in their `config.toml`)
- Specifying `-c` or `--config` to override `model` (e.g., user can
include `-c model=o3` in their list of args to Codex)
- If available, a config-specific flag can be used, which takes
precedence over `-c` (e.g., user can specify `--model o3` in their list
of args to Codex)
Now that it is possible to specify anything that could be configured in
`config.toml` on the command line using `-c`, we do not need to have a
custom flag for every possible config option (which can clutter the
output of `--help`). To that end, as part of this PR, we drop support
for the `--disable-response-storage` flag, as users can now specify `-c
disable_response_storage=true` to get the equivalent functionality.
Under the hood, this works by loading the `config.toml` into a
`toml::Value`. Then for each `key=value`, we create a small synthetic
TOML file with `value` so that we can run the TOML parser to get the
equivalent `toml::Value`. We then parse `key` to determine the point in
the original `toml::Value` to do the insert/replace. Once all of the
overrides from `-c` args have been applied, the `toml::Value` is
deserialized into a `ConfigToml` and then the `ConfigOverrides` are
applied, as before.
https://github.com/openai/codex/pull/1086 is a work-in-progress to make
Linux sandboxing work more like Seatbelt where, for the command we want
to sandbox, we build up the command and then hand it, and some sandbox
configuration flags, to another command to set up the sandbox and then
run it.
In the case of Seatbelt, macOS provides this helper binary and provides
it at `/usr/bin/sandbox-exec`. For Linux, we have to build our own and
pass it through (which is what #1086 does), so this makes the new
`codex_linux_sandbox_exe` available on `Config` so that it will later be
available in `exec.rs` when we need it in #1086.
I did a bit of research to understand why I could not use my mouse to
drag to select text to copy to the clipboard in iTerm.
Apparently https://github.com/openai/codex/pull/641 to enable mousewheel
scrolling broke this functionality. It seems that, unless we put in a
bit of effort, we can have drag-to-select or scrolling, but not both.
Though if you know the trick to hold down `Option` will dragging with
the mouse in iTerm, you can probably get by with this. (I did not know
about this option prior to researching this issue.)
Nevertheless, users may still prefer to disable mouse capture
altogether, so this PR introduces:
* the ability to set `tui.disable_mouse_capture = true` in `config.toml`
to disable mouse capture
* a new command, `/toggle-mouse-mode` to toggle mouse capture
Moving to Rust 1.87 introduced a clippy warning that
`SendError<AppEvent>` was too large.
In practice, the only thing we ever did when we got this error was log
it (if the mspc channel is closed, then the app is likely shutting down
or something, so there's not much to do...), so this finally motivated
me to introduce `AppEventSender`, which wraps
`std::sync::mpsc::Sender<AppEvent>` with a `send()` method that invokes
`send()` on the underlying `Sender` and logs an `Err` if it gets one.
This greatly simplifies the code, as many functions that previously
returned `Result<(), SendError<AppEvent>>` now return `()`, so we don't
have to propagate an `Err` all over the place that we don't really
handle, anyway.
This also makes it so we can upgrade to Rust 1.87 in CI.
Introduces support for slash commands like in the TypeScript CLI. We do
not support the full set of commands yet, but the core abstraction is
there now.
In particular, we have a `SlashCommand` enum and due to thoughtful use
of the [strum](https://crates.io/crates/strum) crate, it requires
minimal boilerplate to add a new command to the list.
The key new piece of UI is `CommandPopup`, though the keyboard events
are still handled by `ChatComposer`. The behavior is roughly as follows:
* if the first character in the composer is `/`, the command popup is
displayed (if you really want to send a message to Codex that starts
with a `/`, simply put a space before the `/`)
* while the popup is displayed, up/down can be used to change the
selection of the popup
* if there is a selection, hitting tab completes the command, but does
not send it
* if there is a selection, hitting enter sends the command
* if the prefix of the composer matches a command, the command will be
visible in the popup so the user can see the description (commands could
take arguments, so additional text may appear after the command name
itself)
https://github.com/user-attachments/assets/39c3e6ee-eeb7-4ef7-a911-466d8184975f
Incidentally, Codex wrote almost all the code for this PR!
This introduces a much-needed "profile" concept where users can specify
a collection of options under one name and then pass that via
`--profile` to the CLI.
This PR introduces the `ConfigProfile` struct and makes it a field of
`CargoToml`. It further updates
`Config::load_from_base_config_with_overrides()` to respect
`ConfigProfile`, overriding default values where appropriate. A detailed
unit test is added at the end of `config.rs` to verify this behavior.
Details on how to use this feature have also been added to
`codex-rs/README.md`.
This is a substantial PR to add support for the chat completions API,
which in turn makes it possible to use non-OpenAI model providers (just
like in the TypeScript CLI):
* It moves a number of structs from `client.rs` to `client_common.rs` so
they can be shared.
* It introduces support for the chat completions API in
`chat_completions.rs`.
* It updates `ModelProviderInfo` so that `env_key` is `Option<String>`
instead of `String` (for e.g., ollama) and adds a `wire_api` field
* It updates `client.rs` to choose between `stream_responses()` and
`stream_chat_completions()` based on the `wire_api` for the
`ModelProviderInfo`
* It updates the `exec` and TUI CLIs to no longer fail if the
`OPENAI_API_KEY` environment variable is not set
* It updates the TUI so that `EventMsg::Error` is displayed more
prominently when it occurs, particularly now that it is important to
alert users to the `CodexErr::EnvVar` variant.
* `CodexErr::EnvVar` was updated to include an optional `instructions`
field so we can preserve the behavior where we direct users to
https://platform.openai.com if `OPENAI_API_KEY` is not set.
* Cleaned up the "welcome message" in the TUI to ensure the model
provider is displayed.
* Updated the docs in `codex-rs/README.md`.
To exercise the chat completions API from OpenAI models, I added the
following to my `config.toml`:
```toml
model = "gpt-4o"
model_provider = "openai-chat-completions"
[model_providers.openai-chat-completions]
name = "OpenAI using Chat Completions"
base_url = "https://api.openai.com/v1"
env_key = "OPENAI_API_KEY"
wire_api = "chat"
```
Though to test a non-OpenAI provider, I installed ollama with mistral
locally on my Mac because ChatGPT said that would be a good match for my
hardware:
```shell
brew install ollama
ollama serve
ollama pull mistral
```
Then I added the following to my `~/.codex/config.toml`:
```toml
model = "mistral"
model_provider = "ollama"
```
Note this code could certainly use more test coverage, but I want to get
this in so folks can start playing with it.
For reference, I believe https://github.com/openai/codex/pull/247 was
roughly the comparable PR on the TypeScript side.
Sets submodules to use workspace lints. Added denying unwrap as a
workspace level lint, which found a couple of cases where we could have
propagated errors. Also manually labeled ones that were fine by my eye.
This is the first step in supporting other model providers in the Rust
CLI. Specifically, this PR adds support for the new entries in `Config`
and `ConfigOverrides` to specify a `ModelProviderInfo`, which is the
basic config needed for an LLM provider. This PR does not get us all the
way there yet because `client.rs` still categorically appends
`/responses` to the URL and expects the endpoint to support the OpenAI
Responses API. Will fix that next!
This introduces the use of the `tui-markdown` crate to parse an
assistant message as Markdown and style it using ANSI for a better user
experience. As shown in the screenshot below, it has support for syntax
highlighting for _tagged_ fenced code blocks:
<img width="907" alt="image"
src="https://github.com/user-attachments/assets/900dc229-80bb-46e8-b1bb-efee4c70ba3c"
/>
That said, `tui-markdown` is not as configurable (or stylish!) as
https://www.npmjs.com/package/marked-terminal, which is what we use in
the TypeScript CLI. In particular:
* The styles are hardcoded and `tui_markdown::from_str()` does not take
any options whatsoever. It uses "bold white" for inline code style which
does not stand out as much as the yellow used by `marked-terminal`:
65402cbda7/tui-markdown/src/lib.rs (L464)
I asked Codex to take a first pass at this and it came up with:
https://github.com/joshka/tui-markdown/pull/80
* If a fenced code block is not tagged, then it does not get
highlighted. I would rather add some logic here:
65402cbda7/tui-markdown/src/lib.rs (L262)
that uses something like https://pypi.org/project/guesslang/ to examine
the value of `text` and try to use the appropriate syntax highlighter.
* When we have a fenced code block, we do not want to show the opening
and closing triple backticks in the output.
To unblock ourselves, we might want to bundle our own fork of
`tui-markdown` temporarily until we figure out what the shape of the API
should be and then try to upstream it.
Some effects of this change:
- New formatting changes across many files. No functionality changes
should occur from that.
- Calls to `set_env` are considered unsafe, since this only happens in
tests we wrap them in `unsafe` blocks
https://github.com/openai/codex/pull/800 made `cwd` a property of
`Config` and made it so the `cwd` is not necessarily
`std::env::current_dir()`. As such, `is_inside_git_repo()` should check
`Config.cwd` rather than `std::env::current_dir()`.
This PR updates `is_inside_git_repo()` to take `Config` instead of an
arbitrary `PathBuf` to force the check to operate on a `Config` where
`cwd` has been resolved to what the user specified.
In order to expose Codex via an MCP server, I realized that we should be
taking `cwd` as a parameter rather than assuming
`std::env::current_dir()` as the `cwd`. Specifically, the user may want
to start a session in a directory other than the one where the MCP
server has been started.
This PR makes `cwd: PathBuf` a required field of `Session` and threads
it all the way through, though I think there is still an issue with not
honoring `workdir` for `apply_patch`, which is something we also had to
fix in the TypeScript version: https://github.com/openai/codex/pull/556.
This also adds `-C`/`--cd` to change the cwd via the command line.
To test, I ran:
```
cargo run --bin codex -- exec -C /tmp 'show the output of ls'
```
and verified it showed the contents of my `/tmp` folder instead of
`$PWD`.
Previous to this PR, `SandboxPolicy` was a bit difficult to work with:
237f8a11e1/codex-rs/core/src/protocol.rs (L98-L108)
Specifically:
* It was an `enum` and therefore options were mutually exclusive as
opposed to additive.
* It defined things in terms of what the agent _could not_ do as opposed
to what they _could_ do. This made things hard to support because we
would prefer to build up a sandbox config by starting with something
extremely restrictive and only granting permissions for things the user
as explicitly allowed.
This PR changes things substantially by redefining the policy in terms
of two concepts:
* A `SandboxPermission` enum that defines permissions that can be
granted to the agent/sandbox.
* A `SandboxPolicy` that internally stores a `Vec<SandboxPermission>`,
but externally exposes a simpler API that can be used to configure
Seatbelt/Landlock.
Previous to this PR, we supported a `--sandbox` flag that effectively
mapped to an enum value in `SandboxPolicy`. Though now that
`SandboxPolicy` is a wrapper around `Vec<SandboxPermission>`, the single
`--sandbox` flag no longer makes sense. While I could have turned it
into a flag that the user can specify multiple times, I think the
current values to use with such a flag are long and potentially messy,
so for the moment, I have dropped support for `--sandbox` altogether and
we can bring it back once we have figured out the naming thing.
Since `--sandbox` is gone, users now have to specify `--full-auto` to
get a sandbox that allows writes in `cwd`. Admittedly, there is no clean
way to specify the equivalent of `--full-auto` in your `config.toml`
right now, so we will have to revisit that, as well.
Because `Config` presents a `SandboxPolicy` field and `SandboxPolicy`
changed considerably, I had to overhaul how config loading works, as
well. There are now two distinct concepts, `ConfigToml` and `Config`:
* `ConfigToml` is the deserialization of `~/.codex/config.toml`. As one
might expect, every field is `Optional` and it is `#[derive(Deserialize,
Default)]`. Consistent use of `Optional` makes it clear what the user
has specified explicitly.
* `Config` is the "normalized config" and is produced by merging
`ConfigToml` with `ConfigOverrides`. Where `ConfigToml` contains a raw
`Option<Vec<SandboxPermission>>`, `Config` presents only the final
`SandboxPolicy`.
The changes to `core/src/exec.rs` and `core/src/linux.rs` merit extra
special attention to ensure we are faithfully mapping the
`SandboxPolicy` to the Seatbelt and Landlock configs, respectively.
Also, take note that `core/src/seatbelt_readonly_policy.sbpl` has been
renamed to `codex-rs/core/src/seatbelt_base_policy.sbpl` and that
`(allow file-read*)` has been removed from the `.sbpl` file as now this
is added to the policy in `core/src/exec.rs` when
`sandbox_policy.has_full_disk_read_access()` is `true`.
https://github.com/openai/codex/pull/642 introduced support for the
`--disable-response-storage` flag, but if you are a ZDR customer, it is
tedious to set this every time, so this PR makes it possible to set this
once in `config.toml` and be done with it.
Incidentally, this tidies things up such that now `init_codex()` takes
only one parameter: `Config`.
This changes how instantiating `Config` works and also adds
`approval_policy` and `sandbox_policy` as fields. The idea is:
* All fields of `Config` have appropriate default values.
* `Config` is initially loaded from `~/.codex/config.toml`, so values in
`config.toml` will override those defaults.
* Clients must instantiate `Config` via
`Config::load_with_overrides(ConfigOverrides)` where `ConfigOverrides`
has optional overrides that are expected to be settable based on CLI
flags.
The `Config` should be defined early in the program and then passed
down. Now functions like `init_codex()` take fewer individual parameters
because they can just take a `Config`.
Also, `Config::load()` used to fail silently if `~/.codex/config.toml`
had a parse error and fell back to the default config. This seemed
really bad because it wasn't clear why the values in my `config.toml`
weren't getting picked up. I changed things so that
`load_with_overrides()` returns `Result<Config>` and verified that the
various CLIs print a reasonable error if `config.toml` is malformed.
Finally, I also updated the TUI to show which **sandbox** value is being
used, as we do for other key values like **model** and **approval**.
This was also a reminder that the various values of `--sandbox` are
honored on Linux but not macOS today, so I added some TODOs about fixing
that.
Previously, the Rust TUI was writing log files to `/tmp`, which is
world-readable and not available on Windows, so that isn't great.
This PR tries to clean things up by adding a function that provides the
path to the "Codex config dir," e.g., `~/.codex` (though I suppose we
could support `$CODEX_HOME` to override this?) and then defines other
paths in terms of the result of `codex_dir()`.
For example, `log_dir()` returns the folder where log files should be
written which is defined in terms of `codex_dir()`. I updated the TUI to
use this function. On UNIX, we even go so far as to `chmod 600` the log
file by default, though as noted in a comment, it's a bit tedious to do
the equivalent on Windows, so we just let that go for now.
This also changes the default logging level to `info` for `codex_core`
and `codex_tui` when `RUST_LOG` is not specified. I'm not really sure if
we should use a more verbose default (it may be helpful when debugging
user issues), though if so, we should probably also set up log rotation?
This adds support for the `--disable-response-storage` flag across our
multiple Rust CLIs to support customers who have opted into Zero-Data
Retention (ZDR). The analogous changes to the TypeScript CLI were:
* https://github.com/openai/codex/pull/481
* https://github.com/openai/codex/pull/543
For a client using ZDR, `previous_response_id` will never be available,
so the `input` field of an API request must include the full transcript
of the conversation thus far. As such, this PR changes the type of
`Prompt.input` from `Vec<ResponseInputItem>` to `Vec<ResponseItem>`.
Practically speaking, `ResponseItem` was effectively a "superset" of
`ResponseInputItem` already. The main difference for us is that
`ResponseItem` includes the `FunctionCall` variant that we have to
include as part of the conversation history in the ZDR case.
Another key change in this PR is modifying `try_run_turn()` so that it
returns the `Vec<ResponseItem>` for the turn in addition to the
`Vec<ResponseInputItem>` produced by `try_run_turn()`. This is because
the caller of `run_turn()` needs to record the `Vec<ResponseItem>` when
ZDR is enabled.
To that end, this PR introduces `ZdrTranscript` (and adds
`zdr_transcript: Option<ZdrTranscript>` to `struct State` in `codex.rs`)
to take responsibility for maintaining the conversation transcript in
the ZDR case.
It is intuitive to try to scroll the conversation history using the
mouse in the TUI, but prior to this change, we only supported scrolling
via keyboard events.
This PR enables mouse capture upon initialization (and disables it on
exit) such that we get `ScrollUp` and `ScrollDown` events in
`codex-rs/tui/src/app.rs`. I initially mapped each event to scrolling by
one line, but that felt sluggish. I decided to introduce
`ScrollEventHelper` so we could debounce scroll events and measure the
number of scroll events in a 100ms window to determine the "magnitude"
of the scroll event. I put in a basic heuristic to start, but perhaps
someone more motivated can play with it over time.
`ScrollEventHelper` takes care of handling the atomic fields and thread
management to ensure an `AppEvent::Scroll` event is pumped back through
the event loop at the appropriate time with the accumulated delta.
As stated in `codex-rs/README.md`:
Today, Codex CLI is written in TypeScript and requires Node.js 22+ to
run it. For a number of users, this runtime requirement inhibits
adoption: they would be better served by a standalone executable. As
maintainers, we want Codex to run efficiently in a wide range of
environments with minimal overhead. We also want to take advantage of
operating system-specific APIs to provide better sandboxing, where
possible.
To that end, we are moving forward with a Rust implementation of Codex
CLI contained in this folder, which has the following benefits:
- The CLI compiles to small, standalone, platform-specific binaries.
- Can make direct, native calls to
[seccomp](https://man7.org/linux/man-pages/man2/seccomp.2.html) and
[landlock](https://man7.org/linux/man-pages/man7/landlock.7.html) in
order to support sandboxing on Linux.
- No runtime garbage collection, resulting in lower memory consumption
and better, more predictable performance.
Currently, the Rust implementation is materially behind the TypeScript
implementation in functionality, so continue to use the TypeScript
implmentation for the time being. We will publish native executables via
GitHub Releases as soon as we feel the Rust version is usable.