## Summary
In an effort to make tools easier to work with and more configurable,
I'm introducing `ToolConfig` and updating `Prompt` to take in a general
list of Tools. I think this is simpler and better for a few reasons:
- We can easily assemble tools from various sources (our own harness,
mcp servers, etc.) and we can consolidate the logic for constructing the
logic in one place that is separate from serialization.
- client.rs no longer needs arbitrary config values, it just takes in a
list of tools to serialize
A hefty portion of the PR is now updating our conversion of
`mcp_types::Tool` to `OpenAITool`, but considering that @bolinfest
accurately called this out as a TODO long ago, I think it's time we
tackled it.
## Testing
- [x] Experimented locally, no changes, as expected
- [x] Added additional unit tests
- [x] Responded to rust-review
## Summary
Escalating out of sandbox is (almost always) not going to fix
long-running commands timing out - therefore we should just pass the
failure back to the model instead of asking the user to re-run a command
that took a long time anyway.
## Testing
- [x] Ran locally with a timeout and confirmed this worked as expected
https://github.com/openai/codex/pull/1835 has some messed up history.
This adds support for streaming chat completions, which is useful for ollama. We should probably take a very skeptical eye to the code introduced in this PR.
---------
Co-authored-by: Ahmed Ibrahim <aibrahim@openai.com>
Stream models thoughts and responses instead of waiting for the whole
thing to come through. Very rough right now, but I'm making the risk call to push through.
This lets us show an accumulating diff across all patches in a turn.
Refer to the docs for TurnDiffTracker for implementation details.
There are multiple ways this could have been done and this felt like the
right tradeoff between reliability and completeness:
*Pros*
* It will pick up all changes to files that the model touched including
if they prettier or another command that updates them.
* It will not pick up changes made by the user or other agents to files
it didn't modify.
*Cons*
* It will pick up changes that the user made to a file that the model
also touched
* It will not pick up changes to codegen or files that were not modified
with apply_patch
## Summary
- stream command stdout as `ExecCommandStdout` events
- forward streamed stdout to clients and ignore in human output
processor
- adjust call sites for new streaming API
- Add operation to summarize the context so far.
- The operation runs a compact task that summarizes the context.
- The operation clear the previous context to free the context window
- The operation didn't use `run_task` to avoid corrupting the session
- Add /compact in the tui
https://github.com/user-attachments/assets/e06c24e5-dcfb-4806-934a-564d425a919c
This is a follow-up to https://github.com/openai/codex/pull/1705, as
that PR inadvertently lost the logic where `PatchApplyBeginEvent` and
`PatchApplyEndEvent` events were sent when patches were auto-approved.
Though as part of this fix, I believe this also makes an important
safety fix to `assess_patch_safety()`, as there was a case that returned
`SandboxType::None`, which arguably is the thing we were trying to avoid
in #1705.
On a high level, we want there to be only one codepath where
`apply_patch` happens, which should be unified with the patch to run
`exec`, in general, so that sandboxing is applied consistently for both
cases.
Prior to this change, `apply_patch()` in `core` would either:
* exit early, delegating to `exec()` to shell out to `apply_patch` using
the appropriate sandbox
* proceed to run the logic for `apply_patch` in memory
549846b29a/codex-rs/core/src/apply_patch.rs (L61-L63)
In this implementation, only the latter would dispatch
`PatchApplyBeginEvent` and `PatchApplyEndEvent`, though the former would
dispatch `ExecCommandBeginEvent` and `ExecCommandEndEvent` for the
`apply_patch` call (or, more specifically, the `codex
--codex-run-as-apply-patch PATCH` call).
To unify things in this PR, we:
* Eliminate the back half of the `apply_patch()` function, and instead
have it also return with `DelegateToExec`, though we add an extra field
to the return value, `user_explicitly_approved_this_action`.
* In `codex.rs` where we process `DelegateToExec`, we use
`SandboxType::None` when `user_explicitly_approved_this_action` is
`true`. This means **we no longer run the apply_patch logic in memory**,
as we always `exec()`. (Note this is what allowed us to delete so much
code in `apply_patch.rs`.)
* In `codex.rs`, we further update `notify_exec_command_begin()` and
`notify_exec_command_end()` to take additional fields to determine what
type of notification to send: `ExecCommand` or `PatchApply`.
Admittedly, this PR also drops some of the functionality about giving
the user the opportunity to expand the set of writable roots as part of
approving the `apply_patch` command. I'm not sure how much that was
used, and we should probably rethink how that works as we are currently
tidying up the protocol to the TUI, in general.
Building on the work of https://github.com/openai/codex/pull/1702, this
changes how a shell call to `apply_patch` is handled.
Previously, a shell call to `apply_patch` was always handled in-process,
never leveraging a sandbox. To determine whether the `apply_patch`
operation could be auto-approved, the
`is_write_patch_constrained_to_writable_paths()` function would check if
all the paths listed in the paths were writable. If so, the agent would
apply the changes listed in the patch.
Unfortunately, this approach afforded a loophole: symlinks!
* For a soft link, we could fix this issue by tracing the link and
checking whether the target is in the set of writable paths, however...
* ...For a hard link, things are not as simple. We can run `stat FILE`
to see if the number of links is greater than 1, but then we would have
to do something potentially expensive like `find . -inum <inode_number>`
to find the other paths for `FILE`. Further, even if this worked, this
approach runs the risk of a
[TOCTOU](https://en.wikipedia.org/wiki/Time-of-check_to_time-of-use)
race condition, so it is not robust.
The solution, implemented in this PR, is to take the virtual execution
of the `apply_patch` CLI into an _actual_ execution using `codex
--codex-run-as-apply-patch PATCH`, which we can run under the sandbox
the user specified, just like any other `shell` call.
This, of course, assumes that the sandbox prevents writing through
symlinks as a mechanism to write to folders that are not in the writable
set configured by the sandbox. I verified this by testing the following
on both Mac and Linux:
```shell
#!/usr/bin/env bash
set -euo pipefail
# Can running a command in SANDBOX_DIR write a file in EXPLOIT_DIR?
# Codex is run in SANDBOX_DIR, so writes should be constrianed to this directory.
SANDBOX_DIR=$(mktemp -d -p "$HOME" sandboxtesttemp.XXXXXX)
# EXPLOIT_DIR is outside of SANDBOX_DIR, so let's see if we can write to it.
EXPLOIT_DIR=$(mktemp -d -p "$HOME" sandboxtesttemp.XXXXXX)
echo "SANDBOX_DIR: $SANDBOX_DIR"
echo "EXPLOIT_DIR: $EXPLOIT_DIR"
cleanup() {
# Only remove if it looks sane and still exists
[[ -n "${SANDBOX_DIR:-}" && -d "$SANDBOX_DIR" ]] && rm -rf -- "$SANDBOX_DIR"
[[ -n "${EXPLOIT_DIR:-}" && -d "$EXPLOIT_DIR" ]] && rm -rf -- "$EXPLOIT_DIR"
}
trap cleanup EXIT
echo "I am the original content" > "${EXPLOIT_DIR}/original.txt"
# Drop the -s to test hard links.
ln -s "${EXPLOIT_DIR}/original.txt" "${SANDBOX_DIR}/link-to-original.txt"
cat "${SANDBOX_DIR}/link-to-original.txt"
if [[ "$(uname)" == "Linux" ]]; then
SANDBOX_SUBCOMMAND=landlock
else
SANDBOX_SUBCOMMAND=seatbelt
fi
# Attempt the exploit
cd "${SANDBOX_DIR}"
codex debug "${SANDBOX_SUBCOMMAND}" bash -lc "echo pwned > ./link-to-original.txt" || true
cat "${EXPLOIT_DIR}/original.txt"
```
Admittedly, this change merits a proper integration test, but I think I
will have to do that in a follow-up PR.
Adds a `CodexAuth` type that encapsulates information about available
auth modes and logic for refreshing the token.
Changes `Responses` API to send requests to different endpoints based on
the auth type.
Updates login_with_chatgpt to support API-less mode and skip the key
exchange.
This adds a tool the model can call to update a plan. The tool doesn't
actually _do_ anything but it gives clients a chance to read and render
the structured plan. We will likely iterate on the prompt and tools
exposed for planning over time.
This is a straight refactor, moving apply-patch-related code from
`codex.rs` and into the new `apply_patch.rs` file. The only "logical"
change is inlining `#[allow(clippy::unwrap_used)]` instead of declaring
`#![allow(clippy::unwrap_used)]` at the top of the file (which is
currently the case in `codex.rs`).
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/1703).
* #1705
* __->__ #1703
* #1702
* #1698
* #1697
# Summary
- Writing effective evals for codex sessions requires context of the
overall repository state at the moment the session began
- This change adds this metadata (git repository, branch, commit hash)
to the top of the rollout of the session (if available - if not it
doesn't add anything)
- Currently, this is only effective on a clean working tree, as we can't
track uncommitted/untracked changes with the current metadata set.
Ideally in the future we may want to track unclean changes somehow, or
perhaps prompt the user to stash or commit them.
# Testing
- Added unit tests
- `cargo test && cargo clippy --tests && cargo fmt -- --config
imports_granularity=Item`
### Resulting Rollout
<img width="1243" height="127" alt="Screenshot 2025-07-17 at 1 50 00 PM"
src="https://github.com/user-attachments/assets/68108941-f015-45b2-985c-ea315ce05415"
/>
1. Emit call_id to exec approval elicitations for mcp client convenience
2. Remove the `-retry` from the call id for the same reason as above but
upstream the reset behavior to the mcp client
Always store the entire conversation history.
Request encrypted COT when not storing Responses.
Send entire input context instead of sending previous_response_id
## Summary
Adds a new mcp tool call, `codex-reply`, so we can continue existing
sessions. This is a first draft and does not yet support sessions from
previous processes.
## Testing
- [x] tested with mcp client
This updates the schema in `generate_mcp_types.py` from `2025-03-26` to
`2025-06-18`, regenerates `mcp-types/src/lib.rs`, and then updates all
the code that uses `mcp-types` to honor the changes.
Ran
```
npx @modelcontextprotocol/inspector just codex mcp
```
and verified that I was able to invoke the `codex` tool, as expected.
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/1621).
* #1623
* #1622
* __->__ #1621
## Summary
- extend rollout format to store all session data in JSON
- add resume/write helpers for rollouts
- track session state after each conversation
- support `LoadSession` op to resume a previous rollout
- allow starting Codex with an existing session via
`experimental_resume` config variable
We need a way later for exploring the available sessions in a user
friendly way.
## Testing
- `cargo test --no-run` *(fails: `cargo: command not found`)*
------
https://chatgpt.com/codex/tasks/task_i_68792a29dd5c832190bf6930d3466fba
This video is outdated. you should use `-c experimental_resume:<full
path>` instead of `--resume <full path>`
https://github.com/user-attachments/assets/7a9975c7-aa04-4f4e-899a-9e87defd947a
## Summary
- add OpenAI retry and timeout fields to Config
- inject these settings in tests instead of mutating env vars
- plumb Config values through client and chat completions logic
- document new configuration options
## Testing
- `cargo test -p codex-core --no-run`
------
https://chatgpt.com/codex/tasks/task_i_68792c5b04cc832195c03050c8b6ea94
---------
Co-authored-by: Michael Bolin <mbolin@openai.com>
- Added support for message and reasoning deltas
- Skipped adding the support in the cli and tui for later
- Commented a failing test (wrong merge) that needs fix in a separate
PR.
Side note: I think we need to disable merge when the CI don't pass.
As noted in the updated docs, this makes it so that you can set:
```toml
model_supports_reasoning_summaries = true
```
as a way of overriding the existing heuristic for when to set the
`reasoning` field on a sampling request:
341c091c5b/codex-rs/core/src/client_common.rs (L152-L166)
When using the OpenAI Responses API, we now record the `usage` field for
a `"response.completed"` event, which includes metrics about the number
of tokens consumed. We also introduce `openai_model_info.rs`, which
includes current data about the most common OpenAI models available via
the API (specifically `context_window` and `max_output_tokens`). If
Codex does not recognize the model, you can set `model_context_window`
and `model_max_output_tokens` explicitly in `config.toml`.
When then introduce a new event type to `protocol.rs`, `TokenCount`,
which includes the `TokenUsage` for the most recent turn.
Finally, we update the TUI to record the running sum of tokens used so
the percentage of available context window remaining can be reported via
the placeholder text for the composer:

We could certainly get much fancier with this (such as reporting the
estimated cost of the conversation), but for now, we are just trying to
achieve feature parity with the TypeScript CLI.
Though arguably this improves upon the TypeScript CLI, as the TypeScript
CLI uses heuristics to estimate the number of tokens used rather than
using the `usage` information directly:
296996d74e/codex-cli/src/utils/approximate-tokens-used.ts (L3-L16)
Fixes https://github.com/openai/codex/issues/1242
This PR reworks `assess_command_safety()` so that the combination of
`AskForApproval::Never` and `SandboxPolicy::DangerFullAccess` ensures
that commands are run without _any_ sandbox and the user should never be
prompted. In turn, it adds support for a new
`--dangerously-bypass-approvals-and-sandbox` flag (that cannot be used
with `--approval-policy` or `--full-auto`) that sets both of those
options.
Fixes https://github.com/openai/codex/issues/1254
This fixes a longstanding error in the Rust CLI where `codex.rs`
contained an errant `is_first_turn` check that would exclude the user
instructions for subsequent "turns" of a conversation when using the
responses API (i.e., when `previous_response_id` existed).
While here, renames `Prompt.instructions` to `Prompt.user_instructions`
since we now have quite a few levels of instructions floating around.
Also removed an unnecessary use of `clone()` in
`Prompt.get_full_instructions()`.
Previous to this PR, we always set `reasoning` when making a request
using the Responses API:
d7245cbbc9/codex-rs/core/src/client.rs (L108-L111)
Though if you tried to use the Rust CLI with `--model gpt-4.1`, this
would fail with:
```shell
"Unsupported parameter: 'reasoning.effort' is not supported with this model."
```
We take a cue from the TypeScript CLI, which does a check on the model
name:
d7245cbbc9/codex-cli/src/utils/agent/agent-loop.ts (L786-L789)
This PR does a similar check, though also adds support for the following
config options:
```
model_reasoning_effort = "low" | "medium" | "high" | "none"
model_reasoning_summary = "auto" | "concise" | "detailed" | "none"
```
This way, if you have a model whose name happens to start with `"o"` (or
`"codex"`?), you can set these to `"none"` to explicitly disable
reasoning, if necessary. (That said, it seems unlikely anyone would use
the Responses API with non-OpenAI models, but we provide an escape
hatch, anyway.)
This PR also updates both the TUI and `codex exec` to show `reasoning
effort` and `reasoning summaries` in the header.
Prior to this PR, there were two big misses in `chat_completions.rs`:
1. The loop in `stream_chat_completions()` was only including items of
type `ResponseItem::Message` when building up the `"messages"` JSON for
the `POST` request to the `chat/completions` endpoint. This fixes things
by ensuring other variants (`FunctionCall`, `LocalShellCall`, and
`FunctionCallOutput`) are included, as well.
2. In `process_chat_sse()`, we were not recording tool calls and were
only emitting items of type
`ResponseEvent::OutputItemDone(ResponseItem::Message)` to the stream.
Now we introduce `FunctionCallState`, which is used to accumulate the
`delta`s of type `tool_calls`, so we can ultimately emit a
`ResponseItem::FunctionCall`, when appropriate.
While function calling now appears to work for chat completions with my
local testing, I believe that there are still edge cases that are not
covered and that this codepath would benefit from a battery of
integration tests. (As part of that further cleanup, we should also work
to support streaming responses in the UI.)
The other important part of this PR is some cleanup in
`core/src/codex.rs`. In particular, it was hard to reason about how
`run_task()` was building up the list of messages to include in a
request across the various cases:
- Responses API
- Chat Completions API
- Responses API used in concert with ZDR
I like to think things are a bit cleaner now where:
- `zdr_transcript` (if present) contains all messages in the history of
the conversation, which includes function call outputs that have not
been sent back to the model yet
- `pending_input` includes any messages the user has submitted while the
turn is in flight that need to be injected as part of the next `POST` to
the model
- `input_for_next_turn` includes the tool call outputs that have not
been sent back to the model yet
Historically, we spawned the Seatbelt and Landlock sandboxes in
substantially different ways:
For **Seatbelt**, we would run `/usr/bin/sandbox-exec` with our policy
specified as an arg followed by the original command:
d1de7bb383/codex-rs/core/src/exec.rs (L147-L219)
For **Landlock/Seccomp**, we would do
`tokio::runtime::Builder::new_current_thread()`, _invoke
Landlock/Seccomp APIs to modify the permissions of that new thread_, and
then spawn the command:
d1de7bb383/codex-rs/core/src/exec_linux.rs (L28-L49)
While it is neat that Landlock/Seccomp supports applying a policy to
only one thread without having to apply it to the entire process, it
requires us to maintain two different codepaths and is a bit harder to
reason about. The tipping point was
https://github.com/openai/codex/pull/1061, in which we had to start
building up the `env` in an unexpected way for the existing
Landlock/Seccomp approach to continue to work.
This PR overhauls things so that we do similar things for Mac and Linux.
It turned out that we were already building our own "helper binary"
comparable to Mac's `sandbox-exec` as part of the `cli` crate:
d1de7bb383/codex-rs/cli/Cargo.toml (L10-L12)
We originally created this to build a small binary to include with the
Node.js version of the Codex CLI to provide support for Linux
sandboxing.
Though the sticky bit is that, at this point, we still want to deploy
the Rust version of Codex as a single, standalone binary rather than a
CLI and a supporting sandboxing binary. To satisfy this goal, we use
"the arg0 trick," in which we:
* use `std::env::current_exe()` to get the path to the CLI that is
currently running
* use the CLI as the `program` for the `Command`
* set `"codex-linux-sandbox"` as arg0 for the `Command`
A CLI that supports sandboxing should check arg0 at the start of the
program. If it is `"codex-linux-sandbox"`, it must invoke
`codex_linux_sandbox::run_main()`, which runs the CLI as if it were
`codex-linux-sandbox`. When acting as `codex-linux-sandbox`, we make the
appropriate Landlock/Seccomp API calls and then use `execvp(3)` to spawn
the original command, so do _replace_ the process rather than spawn a
subprocess. Incidentally, we do this before starting the Tokio runtime,
so the process should only have one thread when `execvp(3)` is called.
Because the `core` crate that needs to spawn the Linux sandboxing is not
a CLI in its own right, this means that every CLI that includes `core`
and relies on this behavior has to (1) implement it and (2) provide the
path to the sandboxing executable. While the path is almost always
`std::env::current_exe()`, we needed to make this configurable for
integration tests, so `Config` now has a `codex_linux_sandbox_exe:
Option<PathBuf>` property to facilitate threading this through,
introduced in https://github.com/openai/codex/pull/1089.
This common pattern is now captured in
`codex_linux_sandbox::run_with_sandbox()` and all of the `main.rs`
functions that should use it have been updated as part of this PR.
The `codex-linux-sandbox` crate added to the Cargo workspace as part of
this PR now has the bulk of the Landlock/Seccomp logic, which makes
`core` a bit simpler. Indeed, `core/src/exec_linux.rs` and
`core/src/landlock.rs` were removed/ported as part of this PR. I also
moved the unit tests for this code into an integration test,
`linux-sandbox/tests/landlock.rs`, in which I use
`env!("CARGO_BIN_EXE_codex-linux-sandbox")` as the value for
`codex_linux_sandbox_exe` since `std::env::current_exe()` is not
appropriate in that case.
To date, when handling `shell` and `local_shell` tool calls, we were
spawning new processes using the environment inherited from the Codex
process itself. This means that the sensitive `OPENAI_API_KEY` that
Codex needs to talk to OpenAI models was made available to everything
run by `shell` and `local_shell`. While there are cases where that might
be useful, it does not seem like a good default.
This PR introduces a complex `shell_environment_policy` config option to
control the `env` used with these tool calls. It is inevitably a bit
complex so that it is possible to override individual components of the
policy so without having to restate the entire thing.
Details are in the updated `README.md` in this PR, but here is the
relevant bit that explains the individual fields of
`shell_environment_policy`:
| Field | Type | Default | Description |
| ------------------------- | -------------------------- | ------- |
-----------------------------------------------------------------------------------------------------------------------------------------------
|
| `inherit` | string | `core` | Starting template for the
environment:<br>`core` (`HOME`, `PATH`, `USER`, …), `all` (clone full
parent env), or `none` (start empty). |
| `ignore_default_excludes` | boolean | `false` | When `false`, Codex
removes any var whose **name** contains `KEY`, `SECRET`, or `TOKEN`
(case-insensitive) before other rules run. |
| `exclude` | array<string> | `[]` | Case-insensitive glob
patterns to drop after the default filter.<br>Examples: `"AWS_*"`,
`"AZURE_*"`. |
| `set` | table<string,string> | `{}` | Explicit key/value
overrides or additions – always win over inherited values. |
| `include_only` | array<string> | `[]` | If non-empty, a
whitelist of patterns; only variables that match _one_ pattern survive
the final step. (Generally used with `inherit = "all"`.) |
In particular, note that the default is `inherit = "core"`, so:
* if you have extra env variables that you want to inherit from the
parent process, use `inherit = "all"` and then specify `include_only`
* if you have extra env variables where you want to hardcode the values,
the default `inherit = "core"` will work fine, but then you need to
specify `set`
This configuration is not battle-tested, so we will probably still have
to play with it a bit. `core/src/exec_env.rs` has the critical business
logic as well as unit tests.
Though if nothing else, previous to this change:
```
$ cargo run --bin codex -- debug seatbelt -- printenv OPENAI_API_KEY
# ...prints OPENAI_API_KEY...
```
But after this change it does not print anything (as desired).
One final thing to call out about this PR is that the
`configure_command!` macro we use in `core/src/exec.rs` has to do some
complex logic with respect to how it builds up the `env` for the process
being spawned under Landlock/seccomp. Specifically, doing
`cmd.env_clear()` followed by `cmd.envs(&$env_map)` (which is arguably
the most intuitive way to do it) caused the Landlock unit tests to fail
because the processes spawned by the unit tests started failing in
unexpected ways! If we forgo `env_clear()` in favor of updating env vars
one at a time, the tests still pass. The comment in the code talks about
this a bit, and while I would like to investigate this more, I need to
move on for the moment, but I do plan to come back to it to fully
understand what is going on. For example, this suggests that we might
not be able to spawn a C program that calls `env_clear()`, which would
be...weird. We may still have to fiddle with our Landlock config if that
is the case.
This introduces an experimental `--output-last-message` flag that can be
used to identify a file where the final message from the agent will be
written. Two use cases:
- Ultimately, we will likely add a `--quiet` option to `exec`, but even
if the user does not want any output written to the terminal, they
probably want to know what the agent did. Writing the output to a file
makes it possible to get that information in a clean way.
- Relatedly, when using `exec` in CI, it is easier to review the
transcript written "normally," (i.e., not as JSON or something with
extra escapes), but getting programmatic access to the last message is
likely helpful, so writing the last message to a file gets the best of
both worlds.
I am calling this "experimental" because it is possible that we are
overfitting and will want a more general solution to this problem that
would justify removing this flag.
The new `codex-mini-latest` model expects a new tool with `{"type":
"local_shell"}`. Its contract is similar to the existing `function` tool
with `"name": "shell"`, so this takes the `local_shell` tool call into
`ExecParams` and sends it through the existing
`handle_container_exec_with_params()` code path.
This also adds the following logic when adding the default set of tools
to a request:
```rust
let default_tools = if self.model.starts_with("codex") {
&DEFAULT_CODEX_MODEL_TOOLS
} else {
&DEFAULT_TOOLS
};
```
That is, if the model name starts with `"codex"`, we add `{"type":
"local_shell"}` to the list of tools; otherwise, we add the
aforementioned `shell` tool.
To test this, I ran the TUI with `-m codex-mini-latest` and verified
that it used the `local_shell` tool. Though I also had some entries in
`[mcp_servers]` in my personal `config.toml`. The `codex-mini-latest`
model seemed eager to try the tools from the MCP servers first, so I
have personally commented them out for now, so keep an eye out if you're
testing `codex-mini-latest`!
Perhaps we should include more details with `{"type": "local_shell"}` or
update the following:
fd0b1b0208/codex-rs/core/prompt.md
For reference, the corresponding change in the TypeScript CLI is
https://github.com/openai/codex/pull/951.
This is a large change to support a "history" feature like you would
expect in a shell like Bash.
History events are recorded in `$CODEX_HOME/history.jsonl`. Because it
is a JSONL file, it is straightforward to append new entries (as opposed
to the TypeScript file that uses `$CODEX_HOME/history.json`, so to be
valid JSON, each new entry entails rewriting the entire file). Because
it is possible for there to be multiple instances of Codex CLI writing
to `history.jsonl` at once, we use advisory file locking when working
with `history.jsonl` in `codex-rs/core/src/message_history.rs`.
Because we believe history is a sufficiently useful feature, we enable
it by default. Though to provide some safety, we set the file
permissions of `history.jsonl` to be `o600` so that other users on the
system cannot read the user's history. We do not yet support a default
list of `SENSITIVE_PATTERNS` as the TypeScript CLI does:
3fdf9df133/codex-cli/src/utils/storage/command-history.ts (L10-L17)
We are going to take a more conservative approach to this list in the
Rust CLI. For example, while `/\b[A-Za-z0-9-_]{20,}\b/` might exclude
sensitive information like API tokens, it would also exclude valuable
information such as references to Git commits.
As noted in the updated documentation, users can opt-out of history by
adding the following to `config.toml`:
```toml
[history]
persistence = "none"
```
Because `history.jsonl` could, in theory, be quite large, we take a[n
arguably overly pedantic] approach in reading history entries into
memory. Specifically, we start by telling the client the current number
of entries in the history file (`history_entry_count`) as well as the
inode (`history_log_id`) of `history.jsonl` (see the new fields on
`SessionConfiguredEvent`).
The client is responsible for keeping new entries in memory to create a
"local history," but if the user hits up enough times to go "past" the
end of local history, then the client should use the new
`GetHistoryEntryRequest` in the protocol to fetch older entries.
Specifically, it should pass the `history_log_id` it was given
originally and work backwards from `history_entry_count`. (It should
really fetch history in batches rather than one-at-a-time, but that is
something we can improve upon in subsequent PRs.)
The motivation behind this crazy scheme is that it is designed to defend
against:
* The `history.jsonl` being truncated during the session such that the
index into the history is no longer consistent with what had been read
up to that point. We do not yet have logic to enforce a `max_bytes` for
`history.jsonl`, but once we do, we will aspire to implement it in a way
that should result in a new inode for the file on most systems.
* New items from concurrent Codex CLI sessions amending to the history.
Because, in absence of truncation, `history.jsonl` is an append-only
log, so long as the client reads backwards from `history_entry_count`,
it should always get a consistent view of history. (That said, it will
not be able to read _new_ commands from concurrent sessions, but perhaps
we will introduce a `/` command to reload latest history or something
down the road.)
Admittedly, my testing of this feature thus far has been fairly light. I
expect we will find bugs and introduce enhancements/fixes going forward.
More about codespell: https://github.com/codespell-project/codespell .
I personally introduced it to dozens if not hundreds of projects already
and so far only positive feedback.
CI workflow has 'permissions' set only to 'read' so also should be safe.
Let me know if just want to take typo fixes in and get rid of the CI
---------
Signed-off-by: Yaroslav O. Halchenko <debian@onerussian.com>
https://github.com/openai/codex/pull/922 did this for the
`SessionConfigured` enum variant, and I think it is generally helpful to
be able to work with the values as each enum variant as their own type,
so this converts the remaining variants and updates all of the
callsites.
Added a simple unit test to verify that the JSON-serialized version of
`Event` does not have any unexpected nesting.