Adding the ability to resume conversations.
we have one verb `resume`.
Behavior:
`tui`:
`codex resume`: opens session picker
`codex resume --last`: continue last message
`codex resume <session id>`: continue conversation with `session id`
`exec`:
`codex resume --last`: continue last conversation
`codex resume <session id>`: continue conversation with `session id`
Implementation:
- I added a function to find the path in `~/.codex/sessions/` with a
`UUID`. This is helpful in resuming with session id.
- Added the above mentioned flags
- Added lots of testing
This PR adds a central `AuthManager` struct that manages the auth
information used across conversations and the MCP server. Prior to this,
each conversation and the MCP server got their own private snapshots of
the auth information, and changes to one (such as a logout or token
refresh) were not seen by others.
This is especially problematic when multiple instances of the CLI are
run. For example, consider the case where you start CLI 1 and log in to
ChatGPT account X and then start CLI 2 and log out and then log in to
ChatGPT account Y. The conversation in CLI 1 is still using account X,
but if you create a new conversation, it will suddenly (and
unexpectedly) switch to account Y.
With the `AuthManager`, auth information is read from disk at the time
the `ConversationManager` is constructed, and it is cached in memory.
All new conversations use this same auth information, as do any token
refreshes.
The `AuthManager` is also used by the MCP server's GetAuthStatus
command, which now returns the auth method currently used by the MCP
server.
This PR also includes an enhancement to the GetAuthStatus command. It
now accepts two new (optional) input parameters: `include_token` and
`refresh_token`. Callers can use this to request the in-use auth token
and can optionally request to refresh the token.
The PR also adds tests for the login and auth APIs that I recently added
to the MCP server.
The existing `wire_format.rs` should share more types with the
`codex-protocol` crate (like `AskForApproval` instead of maintaining a
parallel `CodexToolCallApprovalPolicy` enum), so this PR moves
`wire_format.rs` into `codex-protocol`, renaming it as
`mcp-protocol.rs`. We also de-dupe types, where appropriate.
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/2423).
* #2424
* __->__ #2423
## Summary
We've been seeing a number of issues and reports with our synthetic
`apply_patch` tool, e.g. #802. Let's make this a real tool - in my
anecdotal testing, it's critical for GPT-OSS models, but I'd like to
make it the standard across GPT-5 and codex models as well.
## Testing
- [x] Tested locally
- [x] Integration test
When using codex-tui on a linux system I was unable to run `cargo
clippy` inside of codex due to:
```
[pid 3548377] socketpair(AF_UNIX, SOCK_SEQPACKET|SOCK_CLOEXEC, 0, <unfinished ...>
[pid 3548370] close(8 <unfinished ...>
[pid 3548377] <... socketpair resumed>0x7ffb97f4ed60) = -1 EPERM (Operation not permitted)
```
And
```
3611300 <... recvfrom resumed>0x708b8b5cffe0, 8, 0, NULL, NULL) = -1 EPERM (Operation not permitted)
```
This PR:
* Fixes a bug that disallowed AF_UNIX to allow it on `socket()`
* Adds recvfrom() to the syscall allow list, this should be fine since
we disable opening new sockets. But we should validate there is not a
open socket inheritance issue.
* Allow socketpair to be called for AF_UNIX
* Adds tests for AF_UNIX components
* All of which allows running `cargo clippy` within the sandbox on
linux, and possibly other tooling using a fork server model + AF_UNIX
comms.
This adds support for easily running Codex backed by a local Ollama
instance running our new open source models. See
https://github.com/openai/gpt-oss for details.
If you pass in `--oss` you'll be prompted to install/launch ollama, and
it will automatically download the 20b model and attempt to use it.
We'll likely want to expand this with some options later to make the
experience smoother for users who can't run the 20b or want to run the
120b.
Co-authored-by: Michael Bolin <mbolin@openai.com>
This introduces some special behavior to the CLIs that are using the
`codex-arg0` crate where if `arg1` is `--codex-run-as-apply-patch`, then
it will run as if `apply_patch arg2` were invoked. This is important
because it means we can do things like:
```
SANDBOX_TYPE=landlock # or seatbelt for macOS
codex debug "${SANDBOX_TYPE}" -- codex --codex-run-as-apply-patch PATCH
```
which gives us a way to run `apply_patch` while ensuring it adheres to
the sandbox the user specified.
While it would be nice to use the `arg0` trick like we are currently
doing for `codex-linux-sandbox`, there is no way to specify the `arg0`
for the underlying command when running under `/usr/bin/sandbox-exec`,
so it will not work for us in this case.
Admittedly, we could have also supported this via a custom environment
variable (e.g., `CODEX_ARG0`), but since environment variables are
inherited by child processes, that seemed like a potentially leakier
abstraction.
This change, as well as our existing reliance on checking `arg0`, place
additional requirements on those who include `codex-core`. Its
`README.md` has been updated to reflect this.
While we could have just added an `apply-patch` subcommand to the
`codex` multitool CLI, that would not be sufficient for the standalone
`codex-exec` CLI, which is something that we distribute as part of our
GitHub releases for those who know they will not be using the TUI and
therefore prefer to use a slightly smaller executable:
https://github.com/openai/codex/releases/tag/rust-v0.10.0
To that end, this PR adds an integration test to ensure that the
`--codex-run-as-apply-patch` option works with the standalone
`codex-exec` CLI.
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/1702).
* #1705
* #1703
* __->__ #1702
* #1698
* #1697
Historically, we spawned the Seatbelt and Landlock sandboxes in
substantially different ways:
For **Seatbelt**, we would run `/usr/bin/sandbox-exec` with our policy
specified as an arg followed by the original command:
d1de7bb383/codex-rs/core/src/exec.rs (L147-L219)
For **Landlock/Seccomp**, we would do
`tokio::runtime::Builder::new_current_thread()`, _invoke
Landlock/Seccomp APIs to modify the permissions of that new thread_, and
then spawn the command:
d1de7bb383/codex-rs/core/src/exec_linux.rs (L28-L49)
While it is neat that Landlock/Seccomp supports applying a policy to
only one thread without having to apply it to the entire process, it
requires us to maintain two different codepaths and is a bit harder to
reason about. The tipping point was
https://github.com/openai/codex/pull/1061, in which we had to start
building up the `env` in an unexpected way for the existing
Landlock/Seccomp approach to continue to work.
This PR overhauls things so that we do similar things for Mac and Linux.
It turned out that we were already building our own "helper binary"
comparable to Mac's `sandbox-exec` as part of the `cli` crate:
d1de7bb383/codex-rs/cli/Cargo.toml (L10-L12)
We originally created this to build a small binary to include with the
Node.js version of the Codex CLI to provide support for Linux
sandboxing.
Though the sticky bit is that, at this point, we still want to deploy
the Rust version of Codex as a single, standalone binary rather than a
CLI and a supporting sandboxing binary. To satisfy this goal, we use
"the arg0 trick," in which we:
* use `std::env::current_exe()` to get the path to the CLI that is
currently running
* use the CLI as the `program` for the `Command`
* set `"codex-linux-sandbox"` as arg0 for the `Command`
A CLI that supports sandboxing should check arg0 at the start of the
program. If it is `"codex-linux-sandbox"`, it must invoke
`codex_linux_sandbox::run_main()`, which runs the CLI as if it were
`codex-linux-sandbox`. When acting as `codex-linux-sandbox`, we make the
appropriate Landlock/Seccomp API calls and then use `execvp(3)` to spawn
the original command, so do _replace_ the process rather than spawn a
subprocess. Incidentally, we do this before starting the Tokio runtime,
so the process should only have one thread when `execvp(3)` is called.
Because the `core` crate that needs to spawn the Linux sandboxing is not
a CLI in its own right, this means that every CLI that includes `core`
and relies on this behavior has to (1) implement it and (2) provide the
path to the sandboxing executable. While the path is almost always
`std::env::current_exe()`, we needed to make this configurable for
integration tests, so `Config` now has a `codex_linux_sandbox_exe:
Option<PathBuf>` property to facilitate threading this through,
introduced in https://github.com/openai/codex/pull/1089.
This common pattern is now captured in
`codex_linux_sandbox::run_with_sandbox()` and all of the `main.rs`
functions that should use it have been updated as part of this PR.
The `codex-linux-sandbox` crate added to the Cargo workspace as part of
this PR now has the bulk of the Landlock/Seccomp logic, which makes
`core` a bit simpler. Indeed, `core/src/exec_linux.rs` and
`core/src/landlock.rs` were removed/ported as part of this PR. I also
moved the unit tests for this code into an integration test,
`linux-sandbox/tests/landlock.rs`, in which I use
`env!("CARGO_BIN_EXE_codex-linux-sandbox")` as the value for
`codex_linux_sandbox_exe` since `std::env::current_exe()` is not
appropriate in that case.
Sets submodules to use workspace lints. Added denying unwrap as a
workspace level lint, which found a couple of cases where we could have
propagated errors. Also manually labeled ones that were fine by my eye.
Some effects of this change:
- New formatting changes across many files. No functionality changes
should occur from that.
- Calls to `set_env` are considered unsafe, since this only happens in
tests we wrap them in `unsafe` blocks
I started this PR because I wanted to share the `format_duration()`
utility function in `codex-rs/exec/src/event_processor.rs` with the TUI.
The question was: where to put it?
`core` should have as few dependencies as possible, so moving it there
would introduce a dependency on `chrono`, which seemed undesirable.
`core` already had this `cli` feature to deal with a similar situation
around sharing common utility functions, so I decided to:
* make `core` feature-free
* introduce `common`
* `common` can have as many "special interest" features as it needs,
each of which can declare their own deps
* the first two features of common are `cli` and `elapsed`
In practice, this meant updating a number of `Cargo.toml` files,
replacing this line:
```toml
codex-core = { path = "../core", features = ["cli"] }
```
with these:
```toml
codex-core = { path = "../core" }
codex-common = { path = "../common", features = ["cli"] }
```
Moving `format_duration()` into its own file gave it some "breathing
room" to add a unit test, so I had Codex generate some tests and new
support for durations over 1 minute.
Taking a pass at building artifacts per platform so we can consider
different distribution strategies that don't require users to install
the full `cargo` toolchain.
Right now this grabs just the `codex-repl` and `codex-tui` bins for 5
different targets and bundles them into a draft release. I think a
clearly marked pre-release set of artifacts will unblock the next step
of testing.
This changes how instantiating `Config` works and also adds
`approval_policy` and `sandbox_policy` as fields. The idea is:
* All fields of `Config` have appropriate default values.
* `Config` is initially loaded from `~/.codex/config.toml`, so values in
`config.toml` will override those defaults.
* Clients must instantiate `Config` via
`Config::load_with_overrides(ConfigOverrides)` where `ConfigOverrides`
has optional overrides that are expected to be settable based on CLI
flags.
The `Config` should be defined early in the program and then passed
down. Now functions like `init_codex()` take fewer individual parameters
because they can just take a `Config`.
Also, `Config::load()` used to fail silently if `~/.codex/config.toml`
had a parse error and fell back to the default config. This seemed
really bad because it wasn't clear why the values in my `config.toml`
weren't getting picked up. I changed things so that
`load_with_overrides()` returns `Result<Config>` and verified that the
various CLIs print a reasonable error if `config.toml` is malformed.
Finally, I also updated the TUI to show which **sandbox** value is being
used, as we do for other key values like **model** and **approval**.
This was also a reminder that the various values of `--sandbox` are
honored on Linux but not macOS today, so I added some TODOs about fixing
that.
As stated in `codex-rs/README.md`:
Today, Codex CLI is written in TypeScript and requires Node.js 22+ to
run it. For a number of users, this runtime requirement inhibits
adoption: they would be better served by a standalone executable. As
maintainers, we want Codex to run efficiently in a wide range of
environments with minimal overhead. We also want to take advantage of
operating system-specific APIs to provide better sandboxing, where
possible.
To that end, we are moving forward with a Rust implementation of Codex
CLI contained in this folder, which has the following benefits:
- The CLI compiles to small, standalone, platform-specific binaries.
- Can make direct, native calls to
[seccomp](https://man7.org/linux/man-pages/man2/seccomp.2.html) and
[landlock](https://man7.org/linux/man-pages/man7/landlock.7.html) in
order to support sandboxing on Linux.
- No runtime garbage collection, resulting in lower memory consumption
and better, more predictable performance.
Currently, the Rust implementation is materially behind the TypeScript
implementation in functionality, so continue to use the TypeScript
implmentation for the time being. We will publish native executables via
GitHub Releases as soon as we feel the Rust version is usable.