Commit Graph

77 Commits

Author SHA1 Message Date
Michael Bolin
c6fcec55fe fix: always send full instructions when using the Responses API (#1207)
This fixes a longstanding error in the Rust CLI where `codex.rs`
contained an errant `is_first_turn` check that would exclude the user
instructions on subsequent "turns" of a conversation when using the
Responses API (i.e., when `previous_response_id` existed).

While here, renames `Prompt.instructions` to `Prompt.user_instructions`
since we now have quite a few levels of instructions floating around.
It also removes an unnecessary use of `clone()` in
`Prompt.get_full_instructions()`.
2025-06-03 09:40:19 -07:00
Michael Bolin
6fcc528a43 fix: provide tolerance for apply_patch tool (#993)
As explained in detail in the doc comment for `ParseMode::Lenient`, we
have observed that GPT-4.1 does not always generate a valid invocation
of `apply_patch`. Fortunately, the error is predictable, so we introduce
some new logic to the `codex-apply-patch` crate to recover from this
error.

Because we would like to avoid this becoming a de facto standard (it
would be incompatible if `apply_patch` were provided as an actual
executable, unless we also introduced the lenient behavior there), we
require passing `ParseMode::Lenient` to `parse_patch_text()` to make it
clear that the caller is opting into supporting this special case.
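
For illustration, here is a hedged sketch of the opt-in call site; the patch text follows the `apply_patch` envelope format, and the exact signature and return type of `parse_patch_text()` are assumptions:

```rust
use codex_apply_patch::{parse_patch_text, ParseMode};

fn main() {
    // Strict parsing remains the default elsewhere; `Lenient` is an explicit
    // opt-in for recovering from the predictable GPT-4.1 malformation.
    let patch = r#"*** Begin Patch
*** Update File: src/main.rs
@@
-fn main() {}
+fn main() { println!("hello"); }
*** End Patch"#;

    match parse_patch_text(patch, ParseMode::Lenient) {
        Ok(hunks) => println!("parsed {} hunk(s)", hunks.len()),
        Err(err) => eprintln!("patch rejected even in lenient mode: {err}"),
    }
}
```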

Note the analogous change to the TypeScript CLI was
https://github.com/openai/codex/pull/930. In addition to changing the
accepted input to `apply_patch`, it also introduced additional
instructions for the model, which we include in this PR.

Note that `apply-patch` does not depend on either `regex` or
`regex-lite`, so some of the checks are slightly more verbose to avoid
introducing this dependency.

That said, this PR does not leverage the existing
`extract_heredoc_body_from_apply_patch_command()`, which depends on
`tree-sitter` and `tree-sitter-bash`:


5a5aa89914/codex-rs/apply-patch/src/lib.rs (L191-L246)

though perhaps it should.
2025-06-03 09:06:38 -07:00
Michael Bolin
0f3cc8f842 feat: make reasoning effort/summaries configurable (#1199)
Prior to this PR, we always set `reasoning` when making a request
using the Responses API:


d7245cbbc9/codex-rs/core/src/client.rs (L108-L111)

Though if you tried to use the Rust CLI with `--model gpt-4.1`, this
would fail with:

```shell
"Unsupported parameter: 'reasoning.effort' is not supported with this model."
```

We take a cue from the TypeScript CLI, which does a check on the model
name:


d7245cbbc9/codex-cli/src/utils/agent/agent-loop.ts (L786-L789)

This PR does a similar check, though also adds support for the following
config options:

```
model_reasoning_effort = "low" | "medium" | "high" | "none"
model_reasoning_summary = "auto" | "concise" | "detailed" | "none"
```

This way, if you have a model whose name happens to start with `"o"` (or
`"codex"`?), you can set these to `"none"` to explicitly disable
reasoning, if necessary. (That said, it seems unlikely anyone would use
the Responses API with non-OpenAI models, but we provide an escape
hatch, anyway.)
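
For example, a hypothetical `config.toml` for such a provider (the model name is illustrative):

```toml
# A third-party model whose name happens to start with "o" but that does
# not support the reasoning parameters.
model = "o-llama-70b"
model_reasoning_effort = "none"
model_reasoning_summary = "none"
```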

This PR also updates both the TUI and `codex exec` to show `reasoning
effort` and `reasoning summaries` in the header.
2025-06-02 16:01:34 -07:00
Michael Bolin
d7245cbbc9 fix: chat completions API now also passes tools along (#1167)
Prior to this PR, there were two big misses in `chat_completions.rs`:

1. The loop in `stream_chat_completions()` was only including items of
type `ResponseItem::Message` when building up the `"messages"` JSON for
the `POST` request to the `chat/completions` endpoint. This fixes things
by ensuring other variants (`FunctionCall`, `LocalShellCall`, and
`FunctionCallOutput`) are included, as well.
2. In `process_chat_sse()`, we were not recording tool calls and were
only emitting items of type
`ResponseEvent::OutputItemDone(ResponseItem::Message)` to the stream.
Now we introduce `FunctionCallState` (sketched below), which accumulates
the `delta`s of type `tool_calls` so that we can ultimately emit a
`ResponseItem::FunctionCall` when appropriate.
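
A rough sketch of that accumulator; the field and method names here are illustrative rather than the actual implementation:

```rust
/// Accumulates streamed `tool_calls` deltas until the function call is
/// complete and can be emitted as a single `ResponseItem::FunctionCall`.
#[derive(Default)]
struct FunctionCallState {
    call_id: Option<String>,
    name: Option<String>,
    arguments: String, // JSON argument fragments, concatenated across deltas
}

impl FunctionCallState {
    /// Fold one SSE delta into the accumulated state. The id and name
    /// typically arrive on the first delta; arguments trickle in afterwards.
    fn apply_delta(&mut self, id: Option<&str>, name: Option<&str>, args: Option<&str>) {
        if let Some(id) = id {
            self.call_id.get_or_insert_with(|| id.to_string());
        }
        if let Some(name) = name {
            self.name.get_or_insert_with(|| name.to_string());
        }
        if let Some(fragment) = args {
            self.arguments.push_str(fragment);
        }
    }
}
```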

While function calling now appears to work for chat completions with my
local testing, I believe that there are still edge cases that are not
covered and that this codepath would benefit from a battery of
integration tests. (As part of that further cleanup, we should also work
to support streaming responses in the UI.)

The other important part of this PR is some cleanup in
`core/src/codex.rs`. In particular, it was hard to reason about how
`run_task()` was building up the list of messages to include in a
request across the various cases:

- Responses API
- Chat Completions API
- Responses API used in concert with ZDR

I like to think things are a bit cleaner now where:

- `zdr_transcript` (if present) contains all messages in the history of
the conversation, which includes function call outputs that have not
been sent back to the model yet
- `pending_input` includes any messages the user has submitted while the
turn is in flight that need to be injected as part of the next `POST` to
the model
- `input_for_next_turn` includes the tool call outputs that have not
been sent back to the model yet
2025-06-02 13:47:51 -07:00
Michael Bolin
e40f86b446 chore: logging cleanup (#1196)
Update what we log to make `RUST_LOG=debug` a bit easier to work with.
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/1196).
* #1167
* __->__ #1196
2025-06-02 13:31:33 -07:00
Michael Bolin
e81327e5f4 feat: add hide_agent_reasoning config option (#1181)
This PR introduces a `hide_agent_reasoning` config option (that defaults
to `false`) that users can enable to make the output less verbose by
suppressing reasoning output.

To test, verified that this includes agent reasoning in the output:

```
echo hello | just exec
```

whereas this does not:

```
echo hello | just exec --config hide_agent_reasoning=true
```
2025-05-30 23:14:56 -07:00
Michael Bolin
1bf82056b3 fix: introduce create_tools_json() and share it with chat_completions.rs (#1177)
The main motivator behind this PR is that `stream_chat_completions()`
was not adding the `"tools"` entry to the payload posted to the
`/chat/completions` endpoint. This (1) refactors the existing logic to
build up the `"tools"` JSON from `client.rs` into `openai_tools.rs`, and
(2) updates the use of responses API (`client.rs`) and chat completions
API (`chat_completions.rs`) to both use it.

Note this PR alone is not sufficient to get tool calling from chat
completions working: that is done in
https://github.com/openai/codex/pull/1167.
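
For reference, a sketch of roughly what a `"tools"` entry for the `shell` function tool looks like in a chat completions payload; the parameter schema here is an assumption, not the exact JSON the crate emits:

```json
{
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "shell",
        "description": "Runs a shell command and returns its output.",
        "parameters": {
          "type": "object",
          "properties": {
            "command": { "type": "array", "items": { "type": "string" } }
          },
          "required": ["command"]
        }
      }
    }
  ]
}
```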

---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/1177).
* #1167
* __->__ #1177
2025-05-30 14:07:03 -07:00
Fouad Matin
828e2062c2 fix(codex-rs): use codex-mini-latest as default (#1164) 2025-05-29 16:55:19 -07:00
Michael Bolin
a768a6a41d fix: introduce ResponseInputItem::McpToolCallOutput variant (#1151)
The output of an MCP server tool call can be one of several types, but
to date, we treated all outputs as text by showing the serialized JSON
as the "tool output" in Codex:


25a9949c49/codex-rs/mcp-types/src/lib.rs (L96-L101)

This PR adds support for the `ImageContent` variant so we can now
display an image output from an MCP tool call.

In making this change, we introduce a new
`ResponseInputItem::McpToolCallOutput` variant so that we can work with
the `mcp_types::CallToolResult` directly when the function call is made
to an MCP server.

Though arguably the more significant change is the introduction of
`HistoryCell::CompletedMcpToolCallWithImageOutput`, which is a cell that
uses `ratatui_image` to render an image into the terminal. To support
this, we introduce `ImageRenderCache`, cache a
`ratatui_image::picker::Picker`, and add `ensure_image_cache()`, which
caches the appropriately scaled image data and dimensions based on the
current terminal size.

To test, I created a minimal `package.json`:

```json
{
  "name": "kitty-mcp",
  "version": "1.0.0",
  "type": "module",
  "description": "MCP that returns image of kitty",
  "main": "index.js",
  "dependencies": {
    "@modelcontextprotocol/sdk": "^1.12.0"
  }
}
```

with the following `index.js` to define the MCP server:

```js
#!/usr/bin/env node

import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { readFile } from "node:fs/promises";
import { join } from "node:path";

const IMAGE_URI = "image://Ada.png";

const server = new McpServer({
  name: "Demo",
  version: "1.0.0",
});

server.tool(
  "get-cat-image",
  "If you need a cat image, this tool will provide one.",
  async () => ({
    content: [
      { type: "image", data: await getAdaPngBase64(), mimeType: "image/png" },
    ],
  })
);

server.resource("Ada the Cat", IMAGE_URI, async (uri) => {
  const base64Image = await getAdaPngBase64();
  return {
    contents: [
      {
        uri: uri.href,
        mimeType: "image/png",
        blob: base64Image,
      },
    ],
  };
});

async function getAdaPngBase64() {
  const __dirname = new URL(".", import.meta.url).pathname;
  // From 9705ce2c59/assets/Ada.png
  const filePath = join(__dirname, "Ada.png");
  const imageData = await readFile(filePath);
  const base64Image = imageData.toString("base64");
  return base64Image;
}

const transport = new StdioServerTransport();
await server.connect(transport);
```

With the local changes from this PR, I added the following to my
`config.toml`:

```toml
[mcp_servers.kitty]
command = "node"
args = ["/Users/mbolin/code/kitty-mcp/index.js"]
```

Running the TUI from source:

```
cargo run --bin codex -- --model o3 'I need a picture of a cat'
```

I get:

<img width="732" alt="image"
src="https://github.com/user-attachments/assets/bf80b721-9ca0-4d81-aec7-77d6899e2869"
/>

Now, that said, I have only tested in iTerm and there is definitely some
funny business with getting an accurate character-to-pixel ratio
(sometimes the `CompletedMcpToolCallWithImageOutput` thinks it needs 10
rows to render instead of 4), so there is still work to be done here.
2025-05-28 19:03:17 -07:00
Michael Bolin
25a9949c49 fix: ensure inputSchema for MCP tool always has "properties" field when talking to OpenAI (#1150)
As noted in the comment introduced in this PR, this is analogous to the
issue reported in
https://github.com/openai/openai-agents-python/issues/449. This seems to
work now.
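
In other words, a tool whose `inputSchema` omits `properties` is normalized to something like the following before being sent to OpenAI (a sketch):

```json
{ "type": "object", "properties": {} }
```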
2025-05-28 17:17:21 -07:00
Michael Bolin
d60f350cf8 feat: add support for -c/--config to override individual config items (#1137)
This PR introduces support for `-c`/`--config` so users can override
individual config values on the command line using `--config
name=value`. Example:

```
codex --config model=o4-mini
```

Making it possible to set arbitrary config values on the command line
results in a more flexible configuration scheme and makes it easier to
provide single-line examples that can be copy-pasted from documentation.

Effectively, it means there are four levels of configuration for some
values:

- Default value (e.g., `model` currently defaults to `o4-mini`)
- Value in `config.toml` (e.g., user could override the default to be
`model = "o3"` in their `config.toml`)
- Specifying `-c` or `--config` to override `model` (e.g., user can
include `-c model=o3` in their list of args to Codex)
- If available, a config-specific flag can be used, which takes
precedence over `-c` (e.g., user can specify `--model o3` in their list
of args to Codex)

Now that it is possible to specify anything that could be configured in
`config.toml` on the command line using `-c`, we do not need to have a
custom flag for every possible config option (which can clutter the
output of `--help`). To that end, as part of this PR, we drop support
for the `--disable-response-storage` flag, as users can now specify `-c
disable_response_storage=true` to get the equivalent functionality.

Under the hood, this works by loading the `config.toml` into a
`toml::Value`. Then for each `key=value`, we create a small synthetic
TOML file with `value` so that we can run the TOML parser to get the
equivalent `toml::Value`. We then parse `key` to determine the point in
the original `toml::Value` to do the insert/replace. Once all of the
overrides from `-c` args have been applied, the `toml::Value` is
deserialized into a `ConfigToml` and then the `ConfigOverrides` are
applied, as before.
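
A condensed sketch of that merge; the function and variable names are mine, not the PR's:

```rust
use toml::value::Table;
use toml::Value;

/// Apply one `-c key=value` override to the parsed `config.toml` document.
fn apply_override(root: &mut Value, key: &str, raw: &str) {
    // Parse the value via a tiny synthetic TOML document; if it is not valid
    // TOML on its own (e.g. an unquoted string), fall back to a plain string.
    let value = format!("x = {raw}")
        .parse::<Value>()
        .map(|doc| doc["x"].clone())
        .unwrap_or_else(|_| Value::String(raw.to_string()));

    // Walk dotted segments (e.g. `tui.disable_mouse_capture`), creating
    // intermediate tables as needed, then insert/replace the final key.
    let segments: Vec<&str> = key.split('.').collect();
    let Some((last, parents)) = segments.split_last() else {
        return;
    };
    let mut cursor = root;
    for segment in parents {
        let Some(table) = cursor.as_table_mut() else {
            return; // an intermediate node is not a table; give up
        };
        cursor = table
            .entry(segment.to_string())
            .or_insert(Value::Table(Table::new()));
    }
    if let Some(table) = cursor.as_table_mut() {
        table.insert(last.to_string(), value);
    }
}
```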
2025-05-27 23:11:44 -07:00
Michael Bolin
29d154cb13 fix: use o4-mini as the default model (#1135)
Rollback of https://github.com/openai/codex/pull/972.
2025-05-27 09:12:55 -07:00
Michael Bolin
89ef4efdcf fix: overhaul how we spawn commands under seccomp/landlock on Linux (#1086)
Historically, we spawned the Seatbelt and Landlock sandboxes in
substantially different ways:

For **Seatbelt**, we would run `/usr/bin/sandbox-exec` with our policy
specified as an arg followed by the original command:


d1de7bb383/codex-rs/core/src/exec.rs (L147-L219)

For **Landlock/Seccomp**, we would do
`tokio::runtime::Builder::new_current_thread()`, _invoke
Landlock/Seccomp APIs to modify the permissions of that new thread_, and
then spawn the command:


d1de7bb383/codex-rs/core/src/exec_linux.rs (L28-L49)

While it is neat that Landlock/Seccomp supports applying a policy to
only one thread without having to apply it to the entire process, it
requires us to maintain two different codepaths and is a bit harder to
reason about. The tipping point was
https://github.com/openai/codex/pull/1061, in which we had to start
building up the `env` in an unexpected way for the existing
Landlock/Seccomp approach to continue to work.

This PR overhauls things so that we do similar things for Mac and Linux.
It turned out that we were already building our own "helper binary"
comparable to Mac's `sandbox-exec` as part of the `cli` crate:


d1de7bb383/codex-rs/cli/Cargo.toml (L10-L12)

We originally created this to build a small binary to include with the
Node.js version of the Codex CLI to provide support for Linux
sandboxing.

Though the sticky bit is that, at this point, we still want to deploy
the Rust version of Codex as a single, standalone binary rather than a
CLI and a supporting sandboxing binary. To satisfy this goal, we use
"the arg0 trick," in which we:

* use `std::env::current_exe()` to get the path to the CLI that is
currently running
* use the CLI as the `program` for the `Command`
* set `"codex-linux-sandbox"` as arg0 for the `Command`

A CLI that supports sandboxing should check arg0 at the start of the
program. If it is `"codex-linux-sandbox"`, it must invoke
`codex_linux_sandbox::run_main()`, which runs the CLI as if it were
`codex-linux-sandbox`. When acting as `codex-linux-sandbox`, we make the
appropriate Landlock/Seccomp API calls and then use `execvp(3)` to run
the original command, so we _replace_ the process rather than spawn a
subprocess. Incidentally, we do this before starting the Tokio runtime,
so the process should only have one thread when `execvp(3)` is called.
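
A minimal sketch of the arg0 trick; the helper entry point and overall wiring are simplified from the description above:

```rust
use std::env;
use std::io;
use std::os::unix::process::CommandExt; // for Command::arg0
use std::process::{Child, Command};

/// Re-invoke the current executable as the sandbox helper via arg0.
fn spawn_under_linux_sandbox(command: &[String]) -> io::Result<Child> {
    let self_exe = env::current_exe()?;
    Command::new(self_exe)
        .arg0("codex-linux-sandbox") // the helper checks this at startup
        .args(command)
        .spawn()
}

fn main() {
    // Check arg0 before anything else, including starting the Tokio runtime.
    if env::args().next().as_deref() == Some("codex-linux-sandbox") {
        // codex_linux_sandbox::run_main(); // Landlock/seccomp, then execvp(3)
        return;
    }
    // ...normal CLI entry point...
}
```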

Because the `core` crate that needs to spawn the Linux sandboxing is not
a CLI in its own right, this means that every CLI that includes `core`
and relies on this behavior has to (1) implement it and (2) provide the
path to the sandboxing executable. While the path is almost always
`std::env::current_exe()`, we needed to make this configurable for
integration tests, so `Config` now has a `codex_linux_sandbox_exe:
Option<PathBuf>` property to facilitate threading this through,
introduced in https://github.com/openai/codex/pull/1089.

This common pattern is now captured in
`codex_linux_sandbox::run_with_sandbox()` and all of the `main.rs`
functions that should use it have been updated as part of this PR.

The `codex-linux-sandbox` crate added to the Cargo workspace as part of
this PR now has the bulk of the Landlock/Seccomp logic, which makes
`core` a bit simpler. Indeed, `core/src/exec_linux.rs` and
`core/src/landlock.rs` were removed/ported as part of this PR. I also
moved the unit tests for this code into an integration test,
`linux-sandbox/tests/landlock.rs`, in which I use
`env!("CARGO_BIN_EXE_codex-linux-sandbox")` as the value for
`codex_linux_sandbox_exe` since `std::env::current_exe()` is not
appropriate in that case.
2025-05-23 11:37:07 -07:00
Michael Bolin
d1de7bb383 feat: add codex_linux_sandbox_exe: Option<PathBuf> field to Config (#1089)
https://github.com/openai/codex/pull/1086 is a work-in-progress to make
Linux sandboxing work more like Seatbelt where, for the command we want
to sandbox, we build up the command and then hand it, and some sandbox
configuration flags, to another command to set up the sandbox and then
run it.

In the case of Seatbelt, macOS provides this helper binary at
`/usr/bin/sandbox-exec`. For Linux, we have to build our own and
pass it through (which is what #1086 does), so this makes the new
`codex_linux_sandbox_exe` available on `Config` so that it will later be
available in `exec.rs` when we need it in #1086.
2025-05-22 21:52:28 -07:00
Michael Bolin
cb379d7797 feat: introduce support for shell_environment_policy in config.toml (#1061)
To date, when handling `shell` and `local_shell` tool calls, we were
spawning new processes using the environment inherited from the Codex
process itself. This means that the sensitive `OPENAI_API_KEY` that
Codex needs to talk to OpenAI models was made available to everything
run by `shell` and `local_shell`. While there are cases where that might
be useful, it does not seem like a good default.

This PR introduces a `shell_environment_policy` config option to
control the `env` used with these tool calls. It is inevitably a bit
complex so that it is possible to override individual components of the
policy without having to restate the entire thing.

Details are in the updated `README.md` in this PR, but here is the
relevant bit that explains the individual fields of
`shell_environment_policy`:

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `inherit` | string | `core` | Starting template for the environment:<br>`core` (`HOME`, `PATH`, `USER`, …), `all` (clone full parent env), or `none` (start empty). |
| `ignore_default_excludes` | boolean | `false` | When `false`, Codex removes any var whose **name** contains `KEY`, `SECRET`, or `TOKEN` (case-insensitive) before other rules run. |
| `exclude` | array&lt;string&gt; | `[]` | Case-insensitive glob patterns to drop after the default filter.<br>Examples: `"AWS_*"`, `"AZURE_*"`. |
| `set` | table&lt;string,string&gt; | `{}` | Explicit key/value overrides or additions – always win over inherited values. |
| `include_only` | array&lt;string&gt; | `[]` | If non-empty, a whitelist of patterns; only variables that match _one_ pattern survive the final step. (Generally used with `inherit = "all"`.) |


In particular, note that the default is `inherit = "core"`, so:

* if you have extra env variables that you want to inherit from the
parent process, use `inherit = "all"` and then specify `include_only`
* if you have extra env variables where you want to hardcode the values,
the default `inherit = "core"` will work fine, but then you need to
specify `set`
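
For instance (the variable names here are hypothetical):

```toml
[shell_environment_policy]
inherit = "all"
include_only = ["PATH", "HOME", "USER", "MY_APP_*"]

# Alternatively, start from the core template and hardcode extra values:
# inherit = "core"
# set = { MY_APP_HOME = "/opt/my-app" }
```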

This configuration is not battle-tested, so we will probably still have
to play with it a bit. `core/src/exec_env.rs` has the critical business
logic as well as unit tests.

Though if nothing else, prior to this change:

```
$ cargo run --bin codex -- debug seatbelt -- printenv OPENAI_API_KEY
# ...prints OPENAI_API_KEY...
```

But after this change it does not print anything (as desired).

One final thing to call out about this PR is that the
`configure_command!` macro we use in `core/src/exec.rs` has to do some
complex logic with respect to how it builds up the `env` for the process
being spawned under Landlock/seccomp. Specifically, doing
`cmd.env_clear()` followed by `cmd.envs(&$env_map)` (which is arguably
the most intuitive way to do it) caused the Landlock unit tests to fail
because the processes spawned by the unit tests started failing in
unexpected ways! If we forgo `env_clear()` in favor of updating env vars
one at a time, the tests still pass. The comment in the code talks about
this a bit, and while I would like to investigate this more, I need to
move on for the moment, but I do plan to come back to it to fully
understand what is going on. For example, this suggests that we might
not be able to spawn a C program that calls `env_clear()`, which would
be...weird. We may still have to fiddle with our Landlock config if that
is the case.
2025-05-22 09:51:19 -07:00
Michael Bolin
5746561428 chore: move types out of config.rs into config_types.rs (#1054)
`config.rs` is already quite long without these definitions. Since they
have no real dependencies of their own, let's move them to their own
file so `config.rs` can focus on the business logic of loading a config.
2025-05-20 11:55:25 -07:00
Michael Bolin
d766e845b3 feat: experimental --output-last-message flag to exec subcommand (#1037)
This introduces an experimental `--output-last-message` flag that can be
used to identify a file where the final message from the agent will be
written. Two use cases:

- Ultimately, we will likely add a `--quiet` option to `exec`, but even
if the user does not want any output written to the terminal, they
probably want to know what the agent did. Writing the output to a file
makes it possible to get that information in a clean way.
- Relatedly, when using `exec` in CI, it is easier to review the
transcript written "normally" (i.e., not as JSON or something with
extra escapes), but getting programmatic access to the last message is
likely helpful, so writing the last message to a file gets the best of
both worlds.

I am calling this "experimental" because it is possible that we are
overfitting and will want a more general solution to this problem that
would justify removing this flag.
2025-05-19 16:08:18 -07:00
Michael Bolin
1dc14cefa1 fix: make codex-mini-latest the default model in the Rust TUI (#972)
It's time to make `codex-mini-latest` the new default, as this should be
an "evergreen" model pointer.

* Equivalent change in TypeScript
https://github.com/openai/codex/pull/951
* See some notes about using `codex-mini-latest` with MCP in
https://github.com/openai/codex/pull/961
2025-05-16 17:08:18 -07:00
Michael Bolin
7ca84087e6 feat: make it possible to toggle mouse mode in the Rust TUI (#971)
I did a bit of research to understand why I could not use my mouse to
drag to select text to copy to the clipboard in iTerm.

Apparently https://github.com/openai/codex/pull/641 to enable mousewheel
scrolling broke this functionality. It seems that, unless we put in a
bit of effort, we can have drag-to-select or scrolling, but not both.
Though if you know the trick of holding down `Option` while dragging
with the mouse in iTerm, you can probably get by with this. (I did not
know about this option prior to researching this issue.)

Nevertheless, users may still prefer to disable mouse capture
altogether, so this PR introduces:

* the ability to set `tui.disable_mouse_capture = true` in `config.toml`
to disable mouse capture
* a new command, `/toggle-mouse-mode` to toggle mouse capture
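
The config option looks like this in `config.toml`:

```toml
[tui]
disable_mouse_capture = true  # restores drag-to-select at the cost of scrolling
```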
2025-05-16 16:16:50 -07:00
Michael Bolin
f48dd99f22 feat: add support for OpenAI tool type, local_shell (#961)
The new `codex-mini-latest` model expects a new tool with `{"type":
"local_shell"}`. Its contract is similar to the existing `function` tool
with `"name": "shell"`, so this takes the `local_shell` tool call into
`ExecParams` and sends it through the existing
`handle_container_exec_with_params()` code path.

This also adds the following logic when adding the default set of tools
to a request:

```rust
let default_tools = if self.model.starts_with("codex") {
    &DEFAULT_CODEX_MODEL_TOOLS
} else {
    &DEFAULT_TOOLS
};
```

That is, if the model name starts with `"codex"`, we add `{"type":
"local_shell"}` to the list of tools; otherwise, we add the
aforementioned `shell` tool.

To test this, I ran the TUI with `-m codex-mini-latest` and verified
that it used the `local_shell` tool. Though I also had some entries in
`[mcp_servers]` in my personal `config.toml`. The `codex-mini-latest`
model seemed eager to try the tools from the MCP servers first, so I
have personally commented them out for now, so keep an eye out if you're
testing `codex-mini-latest`!

Perhaps we should include more details with `{"type": "local_shell"}` or
update the following:


fd0b1b0208/codex-rs/core/prompt.md

For reference, the corresponding change in the TypeScript CLI is
https://github.com/openai/codex/pull/951.
2025-05-16 14:38:08 -07:00
Michael Bolin
dfd54e1433 chore: refactor handle_function_call() into smaller functions (#965)
Overall, `codex.rs` is still far too large, but at least there's less
indenting now that things have been moved into smaller functions.

This will also make it easier to introduce the `local_shell` tool in
https://github.com/openai/codex/pull/961.

---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/965).
* #961
* __->__ #965
2025-05-16 14:17:10 -07:00
Michael Bolin
1e39189393 feat: add support for file_opener option in Rust, similar to #911 (#957)
This ports the enhancement introduced in
https://github.com/openai/codex/pull/911 (and the fixes in
https://github.com/openai/codex/pull/919) for the TypeScript CLI to the
Rust one.
2025-05-16 11:33:08 -07:00
Michael Bolin
3d9f4fcd8a fix: introduce ExtractHeredocError that implements PartialEq (#958) 2025-05-16 09:42:27 -07:00
Michael Bolin
ce2ecbe72f feat: record messages from user in ~/.codex/history.jsonl (#939)
This is a large change to support a "history" feature like you would
expect in a shell like Bash.

History events are recorded in `$CODEX_HOME/history.jsonl`. Because it
is a JSONL file, it is straightforward to append new entries (as opposed
to the TypeScript CLI, which uses `$CODEX_HOME/history.json`; to keep it
valid JSON, each new entry entails rewriting the entire file). Because
it is possible for there to be multiple instances of Codex CLI writing
to `history.jsonl` at once, we use advisory file locking when working
with `history.jsonl` in `codex-rs/core/src/message_history.rs`.
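
The append path looks roughly like this (a sketch; the `fs2` crate stands in for whatever the real code uses for advisory locking):

```rust
use std::fs::OpenOptions;
use std::io::Write;
use std::os::unix::fs::OpenOptionsExt;
use std::path::Path;

use fs2::FileExt;

fn append_history_entry(path: &Path, json_line: &str) -> std::io::Result<()> {
    // Open append-only, creating the file with mode 0o600 so that other
    // users on the system cannot read the history.
    let file = OpenOptions::new()
        .create(true)
        .append(true)
        .mode(0o600)
        .open(path)?;

    // An advisory exclusive lock guards against interleaved writes from
    // concurrent Codex CLI sessions; it is released when `file` is dropped.
    file.lock_exclusive()?;
    writeln!(&file, "{json_line}")
}
```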

Because we believe history is a sufficiently useful feature, we enable
it by default. Though to provide some safety, we set the file
permissions of `history.jsonl` to be `o600` so that other users on the
system cannot read the user's history. We do not yet support a default
list of `SENSITIVE_PATTERNS` as the TypeScript CLI does:


3fdf9df133/codex-cli/src/utils/storage/command-history.ts (L10-L17)

We are going to take a more conservative approach to this list in the
Rust CLI. For example, while `/\b[A-Za-z0-9-_]{20,}\b/` might exclude
sensitive information like API tokens, it would also exclude valuable
information such as references to Git commits.

As noted in the updated documentation, users can opt-out of history by
adding the following to `config.toml`:

```toml
[history]
persistence = "none" 
```

Because `history.jsonl` could, in theory, be quite large, we take a[n
arguably overly pedantic] approach in reading history entries into
memory. Specifically, we start by telling the client the current number
of entries in the history file (`history_entry_count`) as well as the
inode (`history_log_id`) of `history.jsonl` (see the new fields on
`SessionConfiguredEvent`).

The client is responsible for keeping new entries in memory to create a
"local history," but if the user hits up enough times to go "past" the
end of local history, then the client should use the new
`GetHistoryEntryRequest` in the protocol to fetch older entries.
Specifically, it should pass the `history_log_id` it was given
originally and work backwards from `history_entry_count`. (It should
really fetch history in batches rather than one-at-a-time, but that is
something we can improve upon in subsequent PRs.)

The motivation behind this crazy scheme is that it is designed to defend
against:

* The `history.jsonl` being truncated during the session such that the
index into the history is no longer consistent with what had been read
up to that point. We do not yet have logic to enforce a `max_bytes` for
`history.jsonl`, but once we do, we will aspire to implement it in a way
that should result in a new inode for the file on most systems.
* New items from concurrent Codex CLI sessions appending to the history.
Because, in the absence of truncation, `history.jsonl` is an append-only
log, so long as the client reads backwards from `history_entry_count`,
it should always get a consistent view of history. (That said, it will
not be able to read _new_ commands from concurrent sessions, but perhaps
we will introduce a `/` command to reload latest history or something
down the road.)

Admittedly, my testing of this feature thus far has been fairly light. I
expect we will find bugs and introduce enhancements/fixes going forward.
2025-05-15 16:26:23 -07:00
Michael Bolin
ec5e82b77c chore: pin Rust version to 1.86 and use io::Error::other to prepare for 1.87 (#947)
Previously, our GitHub actions specified the Rust toolchain as
`dtolnay/rust-toolchain@stable`, which meant the version could change
out from under us. In this case, the move from 1.86 to 1.87 introduced
new clippy warnings, causing build failures.

Because it will take a little time to fix all the new clippy warnings,
this PR pins things to 1.86 for now to unbreak the build.

It also replaces `io::Error::new(io::ErrorKind::Other)` with
`io::Error::other()` in preparation for 1.87.
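
The mechanical replacement, for reference:

```rust
use std::io;

fn example() {
    // Before: the long-standing idiom, flagged by newer clippy.
    let _old = io::Error::new(io::ErrorKind::Other, "something failed");
    // After: the equivalent, shorter form (stable since Rust 1.74).
    let _new = io::Error::other("something failed");
}
```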
2025-05-15 14:07:16 -07:00
Michael Bolin
5fc9fc3e3e chore: expose codex_home via Config (#941) 2025-05-15 00:30:13 -07:00
Michael Bolin
34aa1991f1 chore: handle all cases for EventMsg (#936)
For now, this removes the `#[non_exhaustive]` attribute on `EventMsg` so
that we are forced to handle all `EventMsg` variants exhaustively. (We may
revisit this if/when we publish `core/` as a `lib` crate.) For now, it is
helpful to have this as a forcing function because we have effectively
two UIs (`tui` and `exec`) and usually when we add a new variant to
`EventMsg`, we want to be sure that we update both.
2025-05-14 13:36:43 -07:00
Michael Bolin
399e819c9b fix: increase timeout for test_dev_null_write (#933)
After updating this test in https://github.com/openai/codex/pull/923, I
have been getting some timeouts with this test in CI, so increasing the
timeout to match that of `test_writable_root`:


327cf41f0f/codex-rs/core/src/landlock.rs (L211-L213)
2025-05-14 10:06:14 -07:00
Yaroslav Halchenko
327cf41f0f Add codespell support (config, workflow to detect/not fix) and make it fix some typos (#903)
More about codespell: https://github.com/codespell-project/codespell .

I have personally introduced it to dozens if not hundreds of projects
already, and so far the feedback has been only positive.

The CI workflow has 'permissions' set only to 'read', so it should also
be safe.

Let me know if you would rather just take the typo fixes and drop the CI
workflow.

---------

Signed-off-by: Yaroslav O. Halchenko <debian@onerussian.com>
2025-05-14 09:39:49 -07:00
Michael Bolin
5bf9445351 fix: test_dev_null_write() was not using echo as intended (#923)
I believe this test meant to verify that echoing content to `/dev/null`
succeeded, but instead it was testing the equivalent of `echo
'blah > /dev/null'`.
2025-05-13 21:40:26 -07:00
Michael Bolin
a5f3a34827 fix: change EventMsg enum so every variant takes a single struct (#925)
https://github.com/openai/codex/pull/922 did this for the
`SessionConfigured` enum variant, and I think it is generally helpful to
be able to work with each variant's payload as its own type, so this
converts the remaining variants and updates all of the call sites.

Added a simple unit test to verify that the JSON-serialized version of
`Event` does not have any unexpected nesting.
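
The resulting shape, roughly (payload fields elided; other variants follow the same pattern):

```rust
// Each variant's payload is now a nameable type of its own.
struct SessionConfiguredEvent {
    // session UUID, history_entry_count, history_log_id, ...
}

enum EventMsg {
    SessionConfigured(SessionConfiguredEvent),
    // ...every other variant likewise wraps exactly one struct...
}
```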
2025-05-13 20:44:42 -07:00
Michael Bolin
e6c206d19d fix: tighten up some logic around session timestamps and ids (#922)
* update `SessionConfigured` event to include the UUID for the session
* show the UUID in the Rust TUI
* use local timestamps in log files instead of UTC
* include timestamps in log file names for easier discovery
2025-05-13 19:22:16 -07:00
Michael Bolin
3c03c25e56 feat: introduce --profile for Rust CLI (#921)
This introduces a much-needed "profile" concept where users can specify
a collection of options under one name and then pass that via
`--profile` to the CLI.

This PR introduces the `ConfigProfile` struct and makes it a field of
`ConfigToml`. It further updates
`Config::load_from_base_config_with_overrides()` to respect
`ConfigProfile`, overriding default values where appropriate. A detailed
unit test is added at the end of `config.rs` to verify this behavior.

Details on how to use this feature have also been added to
`codex-rs/README.md`.
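
A hypothetical `config.toml` illustrating the idea (table and key names approximate):

```toml
# Selected when no --profile flag is given.
profile = "everyday"

[profiles.everyday]
model = "o4-mini"

[profiles.deep-work]
model = "o3"
```

Then `codex --profile deep-work` picks up the second set of options.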
2025-05-13 16:52:52 -07:00
Michael Bolin
61b881d4e5 fix: agent instructions were not being included when ~/.codex/instructions.md was empty (#908)
I had seen issues where `codex-rs` would not always write files without
me pressuring it to do so, and between that and the report of
https://github.com/openai/codex/issues/900, I decided to look into this
further. I found two serious issues with agent instructions:

(1) We were only sending agent instructions on the first turn, but
looking at the TypeScript code, we should be sending them on every turn.

(2) There was a serious issue where the agent instructions were
frequently lost:

* The TypeScript CLI appears to keep writing `~/.codex/instructions.md`:
55142e3e6c/codex-cli/src/utils/config.ts (L586)
* If `instructions.md` is present, the Rust CLI uses the contents of it
INSTEAD OF the default prompt, even if `instructions.md` is empty:
55142e3e6c/codex-rs/core/src/config.rs (L202-L203)

The combination of these two things means that I have been using
`codex-rs` without these key instructions:
https://github.com/openai/codex/blob/main/codex-rs/core/prompt.md

Looking at the TypeScript code, it appears we should be concatenating
these three items every time (if they exist):

* `prompt.md`
* `~/.codex/instructions.md`
* nearest `AGENTS.md`

This PR fixes things so that:

* `Config.instructions` is `None` if `instructions.md` is empty
* `Payload.instructions` is now `&'a str` instead of `Option<&'a
String>` because we should always have _something_ to send
* `Prompt` now has a `get_full_instructions()` helper returning a
`Cow<str>` that always includes the agent instructions first (sketched
below).
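
Roughly, with `AGENT_INSTRUCTIONS` standing in for the embedded `prompt.md` contents (a sketch, not the exact code):

```rust
use std::borrow::Cow;

const AGENT_INSTRUCTIONS: &str = "..."; // embedded prompt.md (stand-in)

struct Prompt {
    user_instructions: Option<String>,
}

impl Prompt {
    /// Always lead with the agent instructions; borrow when there is
    /// nothing to concatenate.
    fn get_full_instructions(&self) -> Cow<'_, str> {
        match &self.user_instructions {
            Some(user) => Cow::Owned(format!("{AGENT_INSTRUCTIONS}\n{user}")),
            None => Cow::Borrowed(AGENT_INSTRUCTIONS),
        }
    }
}
```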
2025-05-12 17:24:44 -07:00
Michael Bolin
115fb0b95d fix: navigate initialization phase before tools/list request in MCP client (#904)
Apparently the MCP server implemented in JavaScript did not require the
`initialize` handshake before responding to `tools/list` or `tools/call`
requests, so I missed this.
2025-05-12 15:15:26 -07:00
jcoens-openai
f3bd143867 Disallow expect via lints (#865)
Adds `expect()` as a denied lint. The same deal applies as with
`unwrap()`: we now need to put `#[expect(...)]` on the ones we
legitimately want. Took care to allow `expect()` in test contexts.

# Tests

```
cargo fmt
cargo clippy --all-features --all-targets --no-deps -- -D warnings
cargo test
```
2025-05-12 08:45:46 -07:00
Michael Bolin
b4785b5f88 feat: include "reasoning" messages in Rust TUI (#892)
As shown in the screenshot, we now include reasoning messages from the
model in the TUI under the heading "codex reasoning":


![image](https://github.com/user-attachments/assets/d8eb3dc3-2f9f-4e95-847e-d24b421249a8)

To ensure these are visible by default when using `o4-mini`, this also
changes the default value for `summary` (formerly `generate_summary`,
which is deprecated in favor of `summary` according to the docs) from
unset to `"auto"`.
2025-05-10 21:43:27 -07:00
Michael Bolin
2b122da087 feat: add support for AGENTS.md in Rust CLI (#885)
The TypeScript CLI already has support for including the contents of
`AGENTS.md` in the instructions sent with the first turn of a
conversation. This PR brings this functionality to the Rust CLI.

To be considered, `AGENTS.md` must be in the `cwd` of the session, or in
one of the parent folders up to a Git/filesystem root (whichever is
encountered first).

By default, a maximum of 32 KiB of `AGENTS.md` will be included, though
this is configurable using the new-in-this-PR `project_doc_max_bytes`
option in `config.toml`.
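
For example, to double the limit:

```toml
project_doc_max_bytes = 65536  # default is 32768 (32 KiB)
```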
2025-05-10 17:52:59 -07:00
Michael Bolin
fde48aaa0d feat: experimental env var: CODEX_SANDBOX_NETWORK_DISABLED (#879)
When using Codex to develop Codex itself, I noticed that sometimes it
would try to add `#[ignore]` to the following tests:

```
keeps_previous_response_id_between_tasks()
retries_on_early_close()
```

Both of these tests start a `MockServer` that launches an HTTP server on
an ephemeral port and requires network access to hit it, which the
Seatbelt policy associated with `--full-auto` correctly denies. If I
wasn't paying attention to the code that Codex was generating, one of
these `#[ignore]` annotations could have slipped into the codebase,
effectively disabling the test for everyone.

To that end, this PR enables an experimental environment variable named
`CODEX_SANDBOX_NETWORK_DISABLED` that is set to `1` if the
`SandboxPolicy` used to spawn the process does not have full network
access. I say it is "experimental" because I'm not convinced this API is
quite right, but we need to start somewhere. (It might be more
appropriate to have an env var like `CODEX_SANDBOX=full-auto`, but the
challenge is that our newer `SandboxPolicy` abstraction does not map to
a simple set of enums like in the TypeScript CLI.)

We leverage this new functionality by adding the following code to the
aforementioned tests as a way to "dynamically disable" them:

```rust
if std::env::var(CODEX_SANDBOX_NETWORK_DISABLED_ENV_VAR).is_ok() {
    println!(
        "Skipping test because it cannot execute when network is disabled in a Codex sandbox."
    );
    return;
}
```

We can use the `debug seatbelt --full-auto` command to verify that
`cargo test` fails when run under Seatbelt prior to this change:

```
$ cargo run --bin codex -- debug seatbelt --full-auto -- cargo test
---- keeps_previous_response_id_between_tasks stdout ----

thread 'keeps_previous_response_id_between_tasks' panicked at /Users/mbolin/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/wiremock-0.6.3/src/mock_server/builder.rs:107:46:
Failed to bind an OS port for a mock server.: Os { code: 1, kind: PermissionDenied, message: "Operation not permitted" }
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace


failures:
    keeps_previous_response_id_between_tasks

test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

error: test failed, to rerun pass `-p codex-core --test previous_response_id`
```

Though after this change, the above command succeeds! This means that,
going forward, when Codex operates on Codex itself, when it runs `cargo
test`, only "real failures" should cause the command to fail.

As part of this change, I decided to tighten up the codepaths for
running `exec()` for shell tool calls. In particular, we do it in `core`
for the main Codex business logic itself, but we also expose this logic
via `debug` subcommands in the CLI in the `cli` crate. The logic for the
`debug` subcommands was not quite as faithful to the true business logic
as I liked, so I:

* refactored a bit of the Linux code, splitting `linux.rs` into
`linux_exec.rs` and `landlock.rs` in the `core` crate.
* gated less code behind `#[cfg(target_os = "linux")]`, because such
code does not get built by default when I develop on a Mac, which means I
either have to build the code in Docker or wait for CI signal
* introduced `macro_rules! configure_command` in `exec.rs` so we can
have both sync and async versions of this code. The synchronous version
seems more appropriate for straight threads or potentially fork/exec.
2025-05-09 18:29:34 -07:00
Michael Bolin
93817643ee chore: refactor exec() into spawn_child() and consume_truncated_output() (#878)
This PR is a straight refactor so that creating the `Child` process for
a `shell` tool call and consuming its output can be separate concerns.
For the actual tool call, we will always apply
`consume_truncated_output()`, but for the top-level debug commands in
the CLI (e.g., `debug seatbelt` and `debug landlock`), we only want to
use the `spawn_child()` part of `exec()`.

We want the subcommands to match the `shell` tool call usage as
faithfully as possible. This becomes more important when we introduce a
new parameter to `spawn_child()` in
https://github.com/openai/codex/pull/879.

---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/878).
* #879
* __->__ #878
2025-05-09 11:03:58 -07:00
Michael Bolin
27198bfe11 fix: make McpConnectionManager tolerant of MCPs that fail to start (#854)
I added a typo in my `config.toml` such that the `command` for one of my
`mcp_servers` did not exist and I verified that the error was surfaced
in the TUI (and that I was still able to use Codex).


![image](https://github.com/user-attachments/assets/f13cc08c-f4c6-40ec-9ab4-a9d75e03152f)
2025-05-08 23:45:54 -07:00
Michael Bolin
b940adae8e fix: get responses API working again in Rust (#872)
I inadvertently regressed support for the Responses API when adding
support for the chat completions API in
https://github.com/openai/codex/pull/862. This should get both APIs
working again, but the chat completions codepath seems more complex than
necessary. I'll try to clean that up shortly, but I want to get things
working again ASAP.
2025-05-08 22:49:15 -07:00
Michael Bolin
e924070cee feat: support the chat completions API in the Rust CLI (#862)
This is a substantial PR to add support for the chat completions API,
which in turn makes it possible to use non-OpenAI model providers (just
like in the TypeScript CLI):

* It moves a number of structs from `client.rs` to `client_common.rs` so
they can be shared.
* It introduces support for the chat completions API in
`chat_completions.rs`.
* It updates `ModelProviderInfo` so that `env_key` is `Option<String>`
instead of `String` (e.g., for ollama) and adds a `wire_api` field
* It updates `client.rs` to choose between `stream_responses()` and
`stream_chat_completions()` based on the `wire_api` for the
`ModelProviderInfo`
* It updates the `exec` and TUI CLIs to no longer fail if the
`OPENAI_API_KEY` environment variable is not set
* It updates the TUI so that `EventMsg::Error` is displayed more
prominently when it occurs, particularly now that it is important to
alert users to the `CodexErr::EnvVar` variant.
* `CodexErr::EnvVar` was updated to include an optional `instructions`
field so we can preserve the behavior where we direct users to
https://platform.openai.com if `OPENAI_API_KEY` is not set.
* Cleaned up the "welcome message" in the TUI to ensure the model
provider is displayed.
* Updated the docs in `codex-rs/README.md`.

To exercise the chat completions API from OpenAI models, I added the
following to my `config.toml`:

```toml
model = "gpt-4o"
model_provider = "openai-chat-completions"

[model_providers.openai-chat-completions]
name = "OpenAI using Chat Completions"
base_url = "https://api.openai.com/v1"
env_key = "OPENAI_API_KEY"
wire_api = "chat"
```

Though to test a non-OpenAI provider, I installed ollama with mistral
locally on my Mac because ChatGPT said that would be a good match for my
hardware:

```shell
brew install ollama
ollama serve
ollama pull mistral
```

Then I added the following to my `~/.codex/config.toml`:

```toml
model = "mistral"
model_provider = "ollama"
```

Note this code could certainly use more test coverage, but I want to get
this in so folks can start playing with it.

For reference, I believe https://github.com/openai/codex/pull/247 was
roughly the comparable PR on the TypeScript side.
2025-05-08 21:46:06 -07:00
Michael Bolin
a9adb4175c fix: enable clippy on tests (#870)
https://github.com/openai/codex/pull/855 added the clippy warning to
disallow `unwrap()`, but apparently we were not verifying that tests
were "clippy clean" in CI, so I ended up with a lot of local errors in
VS Code.

This turns on the check in CI and fixes the offenders.
2025-05-08 16:02:56 -07:00
jcoens-openai
87cf120873 Workspace lints and disallow unwrap (#855)
Sets submodules to use workspace lints. Added denying `unwrap()` as a
workspace-level lint, which found a couple of cases where we could have
propagated errors. Also manually labeled the ones that looked fine to my
eye.
2025-05-08 09:46:18 -07:00
Michael Bolin
86022f097e feat: read model_provider and model_providers from config.toml (#853)
This is the first step in supporting other model providers in the Rust
CLI. Specifically, this PR adds support for the new entries in `Config`
and `ConfigOverrides` to specify a `ModelProviderInfo`, which is the
basic config needed for an LLM provider. This PR does not get us all the
way there yet because `client.rs` still categorically appends
`/responses` to the URL and expects the endpoint to support the OpenAI
Responses API. Will fix that next!
2025-05-07 17:38:28 -07:00
Michael Bolin
cfe50c7107 fix: creating an instance of Codex requires a Config (#859)
I discovered that I accidentally introduced a change in
https://github.com/openai/codex/pull/829 where we load a fresh `Config`
in the middle of `codex.rs`:


c3e10e180a/codex-rs/core/src/codex.rs (L515-L522)

This is not good because the `Config` could differ from the one that has
the user's overrides specified from the CLI. Also, in unit tests, it
means the `Config` was picking up my personal settings as opposed to
using a vanilla config, which was problematic.

This PR cleans things up by moving the common case where
`Op::ConfigureSession` is derived from `Config` (originally done in
`codex_wrapper.rs`) and making it the standard way to initialize `Codex`
by putting it in `Codex::spawn()`. Note this also eliminates quite a bit
of boilerplate from the tests and relieves the caller of the
responsibility of minting unique IDs when invoking `submit()`.
2025-05-07 16:33:28 -07:00
Michael Bolin
c3e10e180a fix: remove CodexBuilder and Recorder (#858)
These abstractions were originally created exclusively for the REPL,
which was removed in https://github.com/openai/codex/pull/754.
Currently, they create some unnecessary Tokio tasks, so we are better off
without them. (We can always bring them back if we have a new use case.)
2025-05-07 16:11:42 -07:00
Michael Bolin
42617f8726 feat: save session transcripts when using Rust CLI (#845)
This adds support for saving transcripts when using the Rust CLI. Like
the TypeScript CLI, it saves the transcript to `~/.codex/sessions`,
though it uses JSONL for the file format (and `.jsonl` for the file
extension) so that even if Codex crashes, what was written to the
`.jsonl` file should generally still be valid JSONL content.
2025-05-07 13:49:15 -07:00
Michael Bolin
9da6ebef3f fix: add optional timeout to McpClient::send_request() (#852)
We now impose a 10s timeout on the initial `tools/list` request to an
MCP server. We do not apply a timeout for other types of requests yet,
but we should start enforcing those, as well.
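
The pattern, roughly (a sketch; the real code threads the optional timeout through `McpClient::send_request()` itself):

```rust
use std::future::Future;
use std::time::Duration;
use tokio::time::timeout;

/// Bound an MCP request future with a 10s deadline.
async fn with_timeout<T>(
    request: impl Future<Output = T>,
) -> Result<T, tokio::time::error::Elapsed> {
    timeout(Duration::from_secs(10), request).await
}
```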
2025-05-07 12:56:38 -07:00