Phase 1: Repository & Infrastructure Setup

- Renamed directories: codex-rs -> llmx-rs, codex-cli -> llmx-cli
- Updated package.json files:
  - Root: llmx-monorepo
  - CLI: @llmx/llmx
  - SDK: @llmx/llmx-sdk
- Updated pnpm workspace configuration
- Renamed binary: codex.js -> llmx.js
- Updated environment variables: CODEX_* -> LLMX_*
- Changed repository URLs to valknar/llmx

🤖 Generated with Claude Code
2025-11-11 14:01:52 +01:00
parent 052b052832
commit f237fe560d
1151 changed files with 41 additions and 35 deletions
--- a/llmx-rs/responses-api-proxy/Cargo.toml
+++ b/llmx-rs/responses-api-proxy/Cargo.toml
@@ -0,0 +1,27 @@
+[package]
+edition = "2024"
+name = "codex-responses-api-proxy"
+version = { workspace = true }
+
+[lib]
+name = "codex_responses_api_proxy"
+path = "src/lib.rs"
+
+[[bin]]
+name = "codex-responses-api-proxy"
+path = "src/main.rs"
+
+[lints]
+workspace = true
+
+[dependencies]
+anyhow = { workspace = true }
+clap = { workspace = true, features = ["derive"] }
+codex-process-hardening = { workspace = true }
+ctor = { workspace = true }
+libc = { workspace = true }
+reqwest = { workspace = true, features = ["blocking", "json", "rustls-tls"] }
+serde = { workspace = true, features = ["derive"] }
+serde_json = { workspace = true }
+tiny_http = { workspace = true }
+zeroize = { workspace = true }
--- a/llmx-rs/responses-api-proxy/README.md
+++ b/llmx-rs/responses-api-proxy/README.md
@@ -0,0 +1,80 @@
+# codex-responses-api-proxy
+
+A strict HTTP proxy that only forwards `POST` requests to `/v1/responses` to the OpenAI API (`https://api.openai.com`), injecting the `Authorization: Bearer $OPENAI_API_KEY` header. Everything else is rejected with `403 Forbidden`.
+
+## Expected Usage
+
+**IMPORTANT:** `codex-responses-api-proxy` is designed to be run by a privileged user with access to `OPENAI_API_KEY` so that an unprivileged user cannot inspect or tamper with the process. Note that if `--http-shutdown` is specified, an unprivileged user _can_ make a `GET` request to `/shutdown` to shut down the server, since an unprivileged user cannot send `SIGTERM` to kill the process.
+
+A privileged user (i.e., `root` or a user with `sudo`) who has access to `OPENAI_API_KEY` would run the following to start the server, as `codex-responses-api-proxy` reads the auth token from `stdin`:
+
+```shell
+printenv OPENAI_API_KEY | env -u OPENAI_API_KEY codex-responses-api-proxy --http-shutdown --server-info /tmp/server-info.json
+```
+
+A non-privileged user would then run Codex as follows, specifying the `model_provider` dynamically:
+
+```shell
+PROXY_PORT=$(jq .port /tmp/server-info.json)
+PROXY_BASE_URL="http://127.0.0.1:${PROXY_PORT}"
+codex exec -c "model_providers.openai-proxy={ name = 'OpenAI Proxy', base_url = '${PROXY_BASE_URL}/v1', wire_api='responses' }" \
+    -c model_provider="openai-proxy" \
+    'Your prompt here'
+```
+
+When the unprivileged user is finished, they can shut down the server using `curl` (since `kill -SIGTERM` is not an option):
+
+```shell
+curl --fail --silent --show-error "${PROXY_BASE_URL}/shutdown"
+```
+
+## Behavior
+
+- Reads the API key from `stdin`. All callers should pipe the key in (for example, `printenv OPENAI_API_KEY | codex-responses-api-proxy`).
+- Formats the header value as `Bearer <key>` and attempts to `mlock(2)` the memory holding that header so it is not swapped to disk.
+- Listens on the provided port or an ephemeral port if `--port` is not specified.
+- Accepts exactly `POST /v1/responses` (no query string). The request body is forwarded to `https://api.openai.com/v1/responses` with `Authorization: Bearer <key>` set. All original request headers (except any incoming `Authorization`) are forwarded upstream, with `Host` overridden to `api.openai.com`. For other requests, it responds with `403`.
+- Optionally writes a single-line JSON file with server info, currently `{ "port": <u16>, "pid": <u32> }`.
+- Optional `--http-shutdown` enables `GET /shutdown` to terminate the process with exit code `0`. This allows one user (e.g., `root`) to start the proxy and another unprivileged user on the host to shut it down.
+
+## CLI
+
+```
+codex-responses-api-proxy [--port <PORT>] [--server-info <FILE>] [--http-shutdown] [--upstream-url <URL>]
+```
+
+- `--port <PORT>`: Port to bind on `127.0.0.1`. If omitted, an ephemeral port is chosen.
+- `--server-info <FILE>`: If set, the proxy writes a single line of JSON with `{ "port": <PORT>, "pid": <PID> }` once listening.
+- `--http-shutdown`: If set, enables `GET /shutdown` to exit the process with code `0`.
+- `--upstream-url <URL>`: Absolute URL to forward requests to. Defaults to `https://api.openai.com/v1/responses`.
+- Authentication is fixed to `Authorization: Bearer <key>` to match the Codex CLI expectations.
+
+For Azure, for example (ensure your deployment accepts `Authorization: Bearer <key>`):
+
+```shell
+printenv AZURE_OPENAI_API_KEY | env -u AZURE_OPENAI_API_KEY codex-responses-api-proxy \
+  --http-shutdown \
+  --server-info /tmp/server-info.json \
+  --upstream-url "https://YOUR_PROJECT_NAME.openai.azure.com/openai/deployments/YOUR_DEPLOYMENT/responses?api-version=2025-04-01-preview"
+```
+
+## Notes
+
+- Only `POST /v1/responses` is permitted. No query strings are allowed.
+- All request headers are forwarded to the upstream call (aside from overriding `Authorization` and `Host`). Response status and content-type are mirrored from upstream.
+
+## Hardening Details
+
+Care is taken to restrict access/copying to the value of `OPENAI_API_KEY` retained in memory:
+
+- We leverage [`codex_process_hardening`](https://github.com/openai/codex/blob/main/codex-rs/process-hardening/README.md) so `codex-responses-api-proxy` is run with standard process-hardening techniques.
+- At startup, we allocate a `1024` byte buffer on the stack and copy `"Bearer "` into the start of the buffer.
+- We then read from `stdin`, copying the contents into the buffer after `"Bearer "`.
+- After verifying the key matches `/^[a-zA-Z0-9_-]+$/` (and does not exceed the buffer), we create a `String` from that buffer (so the data is now on the heap).
+- We zero out the stack-allocated buffer using https://crates.io/crates/zeroize so it is not optimized away by the compiler.
+- We invoke `.leak()` on the `String` so we can treat its contents as a `&'static str`, as it will live for the rest of the process.
+- On UNIX, we `mlock(2)` the memory backing the `&'static str`.
+- When using the `&'static str` when building an HTTP request, we use `HeaderValue::from_static()` to avoid copying the `&str`.
+- We also invoke `.set_sensitive(true)` on the `HeaderValue`, which in theory indicates to other parts of the HTTP stack that the header should be treated with "special care" to avoid leakage:
+
+https://github.com/hyperium/http/blob/439d1c50d71e3be3204b6c4a1bf2255ed78e1f93/src/header/value.rs#L346-L376
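
The request filter described in the README's Behavior section can be sketched as a small pure function. This is a minimal illustration under stated assumptions, not the crate's code: `Route` and `classify` are hypothetical names, and the method is compared as a plain string rather than as a `tiny_http::Method` value.

```rust
/// Outcome of the proxy's request filter (hypothetical type, for illustration).
#[derive(Debug, PartialEq, Eq)]
pub enum Route {
    /// Forward the request body to the upstream URL.
    Forward,
    /// Respond with 200 and exit the process with code 0.
    Shutdown,
    /// Respond with 403 Forbidden.
    Forbidden,
}

/// Classifies a request by exact method and exact URL.
/// `url` is the raw request target, so any query string makes it differ
/// from "/v1/responses" and the request is rejected.
pub fn classify(method: &str, url: &str, http_shutdown: bool) -> Route {
    match (method, url) {
        // The shutdown endpoint exists only when --http-shutdown was passed.
        ("GET", "/shutdown") if http_shutdown => Route::Shutdown,
        ("POST", "/v1/responses") => Route::Forward,
        _ => Route::Forbidden,
    }
}
```

Matching on the full request target, rather than parsing out a path, is what makes "no query string" fall out for free: `/v1/responses?x=1` is simply a different string.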
--- a/llmx-rs/responses-api-proxy/npm/README.md
+++ b/llmx-rs/responses-api-proxy/npm/README.md
@@ -0,0 +1,13 @@
+# @openai/codex-responses-api-proxy
+
+<p align="center"><code>npm i -g @openai/codex-responses-api-proxy</code> to install <code>codex-responses-api-proxy</code></p>
+
+This package distributes the prebuilt [Codex Responses API proxy binary](https://github.com/openai/codex/tree/main/codex-rs/responses-api-proxy) for macOS, Linux, and Windows.
+
+To see available options, run:
+
+```
+node ./bin/codex-responses-api-proxy.js --help
+```
+
+Refer to [`codex-rs/responses-api-proxy/README.md`](https://github.com/openai/codex/blob/main/codex-rs/responses-api-proxy/README.md) for detailed documentation.
--- a/llmx-rs/responses-api-proxy/npm/bin/codex-responses-api-proxy.js
+++ b/llmx-rs/responses-api-proxy/npm/bin/codex-responses-api-proxy.js
@@ -0,0 +1,97 @@
+#!/usr/bin/env node
+// Entry point for the Codex responses API proxy binary.
+
+import { spawn } from "node:child_process";
+import path from "path";
+import { fileURLToPath } from "url";
+
+const __filename = fileURLToPath(import.meta.url);
+const __dirname = path.dirname(__filename);
+
+function determineTargetTriple(platform, arch) {
+  switch (platform) {
+    case "linux":
+    case "android":
+      if (arch === "x64") {
+        return "x86_64-unknown-linux-musl";
+      }
+      if (arch === "arm64") {
+        return "aarch64-unknown-linux-musl";
+      }
+      break;
+    case "darwin":
+      if (arch === "x64") {
+        return "x86_64-apple-darwin";
+      }
+      if (arch === "arm64") {
+        return "aarch64-apple-darwin";
+      }
+      break;
+    case "win32":
+      if (arch === "x64") {
+        return "x86_64-pc-windows-msvc";
+      }
+      if (arch === "arm64") {
+        return "aarch64-pc-windows-msvc";
+      }
+      break;
+    default:
+      break;
+  }
+  return null;
+}
+
+const targetTriple = determineTargetTriple(process.platform, process.arch);
+if (!targetTriple) {
+  throw new Error(
+    `Unsupported platform: ${process.platform} (${process.arch})`,
+  );
+}
+
+const vendorRoot = path.join(__dirname, "..", "vendor");
+const archRoot = path.join(vendorRoot, targetTriple);
+const binaryBaseName = "codex-responses-api-proxy";
+const binaryPath = path.join(
+  archRoot,
+  binaryBaseName,
+  process.platform === "win32" ? `${binaryBaseName}.exe` : binaryBaseName,
+);
+
+const child = spawn(binaryPath, process.argv.slice(2), {
+  stdio: "inherit",
+});
+
+child.on("error", (err) => {
+  console.error(err);
+  process.exit(1);
+});
+
+const forwardSignal = (signal) => {
+  if (!child.killed) {
+    try {
+      child.kill(signal);
+    } catch {
+      /* ignore */
+    }
+  }
+};
+
+["SIGINT", "SIGTERM", "SIGHUP"].forEach((sig) => {
+  process.on(sig, () => forwardSignal(sig));
+});
+
+const childResult = await new Promise((resolve) => {
+  child.on("exit", (code, signal) => {
+    if (signal) {
+      resolve({ type: "signal", signal });
+    } else {
+      resolve({ type: "code", exitCode: code ?? 1 });
+    }
+  });
+});
+
+if (childResult.type === "signal") {
+  process.kill(process.pid, childResult.signal);
+} else {
+  process.exit(childResult.exitCode);
+}
--- a/llmx-rs/responses-api-proxy/npm/package.json
+++ b/llmx-rs/responses-api-proxy/npm/package.json
@@ -0,0 +1,21 @@
+{
+  "name": "@openai/codex-responses-api-proxy",
+  "version": "0.0.0-dev",
+  "license": "Apache-2.0",
+  "bin": {
+    "codex-responses-api-proxy": "bin/codex-responses-api-proxy.js"
+  },
+  "type": "module",
+  "engines": {
+    "node": ">=16"
+  },
+  "files": [
+    "bin",
+    "vendor"
+  ],
+  "repository": {
+    "type": "git",
+    "url": "git+https://github.com/openai/codex.git",
+    "directory": "codex-rs/responses-api-proxy/npm"
+  }
+}
--- a/llmx-rs/responses-api-proxy/src/lib.rs
+++ b/llmx-rs/responses-api-proxy/src/lib.rs
@@ -0,0 +1,237 @@
+use std::fs::File;
+use std::fs::{self};
+use std::io::Write;
+use std::net::SocketAddr;
+use std::net::TcpListener;
+use std::path::Path;
+use std::path::PathBuf;
+use std::sync::Arc;
+use std::time::Duration;
+
+use anyhow::Context;
+use anyhow::Result;
+use anyhow::anyhow;
+use clap::Parser;
+use reqwest::Url;
+use reqwest::blocking::Client;
+use reqwest::header::AUTHORIZATION;
+use reqwest::header::HOST;
+use reqwest::header::HeaderMap;
+use reqwest::header::HeaderName;
+use reqwest::header::HeaderValue;
+use serde::Serialize;
+use tiny_http::Header;
+use tiny_http::Method;
+use tiny_http::Request;
+use tiny_http::Response;
+use tiny_http::Server;
+use tiny_http::StatusCode;
+
+mod read_api_key;
+use read_api_key::read_auth_header_from_stdin;
+
+/// CLI arguments for the proxy.
+#[derive(Debug, Clone, Parser)]
+#[command(name = "responses-api-proxy", about = "Minimal OpenAI responses proxy")]
+pub struct Args {
+    /// Port to listen on. If not set, an ephemeral port is used.
+    #[arg(long)]
+    pub port: Option<u16>,
+
+    /// Path to a JSON file to write startup info (single line). Includes {"port": <u16>, "pid": <u32>}.
+    #[arg(long, value_name = "FILE")]
+    pub server_info: Option<PathBuf>,
+
+    /// Enable HTTP shutdown endpoint at GET /shutdown
+    #[arg(long)]
+    pub http_shutdown: bool,
+
+    /// Absolute URL the proxy should forward requests to (defaults to OpenAI).
+    #[arg(long, default_value = "https://api.openai.com/v1/responses")]
+    pub upstream_url: String,
+}
+
+#[derive(Serialize)]
+struct ServerInfo {
+    port: u16,
+    pid: u32,
+}
+
+struct ForwardConfig {
+    upstream_url: Url,
+    host_header: HeaderValue,
+}
+
+/// Entry point for the library main, for parity with other crates.
+pub fn run_main(args: Args) -> Result<()> {
+    let auth_header = read_auth_header_from_stdin()?;
+
+    let upstream_url = Url::parse(&args.upstream_url).context("parsing --upstream-url")?;
+    let host = match (upstream_url.host_str(), upstream_url.port()) {
+        (Some(host), Some(port)) => format!("{host}:{port}"),
+        (Some(host), None) => host.to_string(),
+        _ => return Err(anyhow!("upstream URL must include a host")),
+    };
+    let host_header =
+        HeaderValue::from_str(&host).context("constructing Host header from upstream URL")?;
+
+    let forward_config = Arc::new(ForwardConfig {
+        upstream_url,
+        host_header,
+    });
+
+    let (listener, bound_addr) = bind_listener(args.port)?;
+    if let Some(path) = args.server_info.as_ref() {
+        write_server_info(path, bound_addr.port())?;
+    }
+    let server = Server::from_listener(listener, None)
+        .map_err(|err| anyhow!("creating HTTP server: {err}"))?;
+    let client = Arc::new(
+        Client::builder()
+            // Disable reqwest's 30s default so long-lived response streams keep flowing.
+            .timeout(None::<Duration>)
+            .build()
+            .context("building reqwest client")?,
+    );
+
+    eprintln!("responses-api-proxy listening on {bound_addr}");
+
+    let http_shutdown = args.http_shutdown;
+    for request in server.incoming_requests() {
+        let client = client.clone();
+        let forward_config = forward_config.clone();
+        std::thread::spawn(move || {
+            if http_shutdown && request.method() == &Method::Get && request.url() == "/shutdown" {
+                let _ = request.respond(Response::new_empty(StatusCode(200)));
+                std::process::exit(0);
+            }
+
+            if let Err(e) = forward_request(&client, auth_header, &forward_config, request) {
+                eprintln!("forwarding error: {e}");
+            }
+        });
+    }
+
+    Err(anyhow!("server stopped unexpectedly"))
+}
+
+fn bind_listener(port: Option<u16>) -> Result<(TcpListener, SocketAddr)> {
+    let addr = SocketAddr::from(([127, 0, 0, 1], port.unwrap_or(0)));
+    let listener = TcpListener::bind(addr).with_context(|| format!("failed to bind {addr}"))?;
+    let bound = listener.local_addr().context("failed to read local_addr")?;
+    Ok((listener, bound))
+}
+
+fn write_server_info(path: &Path, port: u16) -> Result<()> {
+    if let Some(parent) = path.parent()
+        && !parent.as_os_str().is_empty()
+    {
+        fs::create_dir_all(parent)?;
+    }
+
+    let info = ServerInfo {
+        port,
+        pid: std::process::id(),
+    };
+    let mut data = serde_json::to_string(&info)?;
+    data.push('\n');
+    let mut f = File::create(path)?;
+    f.write_all(data.as_bytes())?;
+    Ok(())
+}
+
+fn forward_request(
+    client: &Client,
+    auth_header: &'static str,
+    config: &ForwardConfig,
+    mut req: Request,
+) -> Result<()> {
+    // Only allow POST /v1/responses exactly, no query string.
+    let method = req.method().clone();
+    let url_path = req.url().to_string();
+    let allow = method == Method::Post && url_path == "/v1/responses";
+
+    if !allow {
+        let resp = Response::new_empty(StatusCode(403));
+        let _ = req.respond(resp);
+        return Ok(());
+    }
+
+    // Read request body
+    let mut body = Vec::new();
+    let mut reader = req.as_reader();
+    std::io::Read::read_to_end(&mut reader, &mut body)?;
+
+    // Build headers for upstream, forwarding everything from the incoming
+    // request except Authorization (we replace it below).
+    let mut headers = HeaderMap::new();
+    for header in req.headers() {
+        let name_ascii = header.field.as_str();
+        let lower = name_ascii.to_ascii_lowercase();
+        if lower.as_str() == "authorization" || lower.as_str() == "host" {
+            continue;
+        }
+
+        let header_name = match HeaderName::from_bytes(lower.as_bytes()) {
+            Ok(name) => name,
+            Err(_) => continue,
+        };
+        if let Ok(value) = HeaderValue::from_bytes(header.value.as_bytes()) {
+            headers.append(header_name, value);
+        }
+    }
+
+    // As part of our effort to keep `auth_header` secret, we use a
+    // combination of `from_static()` and `set_sensitive(true)`.
+    let mut auth_header_value = HeaderValue::from_static(auth_header);
+    auth_header_value.set_sensitive(true);
+    headers.insert(AUTHORIZATION, auth_header_value);
+
+    headers.insert(HOST, config.host_header.clone());
+
+    let upstream_resp = client
+        .post(config.upstream_url.clone())
+        .headers(headers)
+        .body(body)
+        .send()
+        .context("forwarding request to upstream")?;
+
+    // We have to create an adapter between a `reqwest::blocking::Response`
+    // and a `tiny_http::Response`. Fortunately, `reqwest::blocking::Response`
+    // implements `Read`, so we can use it directly as the body of the
+    // `tiny_http::Response`.
+    let status = upstream_resp.status();
+    let mut response_headers = Vec::new();
+    for (name, value) in upstream_resp.headers().iter() {
+        // Skip headers that tiny_http manages itself.
+        if matches!(
+            name.as_str(),
+            "content-length" | "transfer-encoding" | "connection" | "trailer" | "upgrade"
+        ) {
+            continue;
+        }
+
+        if let Ok(header) = Header::from_bytes(name.as_str().as_bytes(), value.as_bytes()) {
+            response_headers.push(header);
+        }
+    }
+
+    let content_length = upstream_resp.content_length().and_then(|len| {
+        if len <= usize::MAX as u64 {
+            Some(len as usize)
+        } else {
+            None
+        }
+    });
+
+    let response = Response::new(
+        StatusCode(status.as_u16()),
+        response_headers,
+        upstream_resp,
+        content_length,
+        None,
+    );
+
+    let _ = req.respond(response);
+    Ok(())
+}
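
The header handling in `forward_request` (drop any incoming `Authorization` and `Host`, forward the rest, then set the proxy's own values) can be illustrated with the standard library alone. This is a sketch, assuming headers are modelled as plain string pairs: `forwardable_headers` and `upstream_headers` are hypothetical helpers, and unlike the real code this version copies the auth value into a `String`, so it deliberately ignores the no-copy discipline used for the API key.

```rust
/// Returns the incoming headers that would be forwarded upstream:
/// names are lowercased, and `Authorization` and `Host` are dropped
/// regardless of how the client capitalised them.
pub fn forwardable_headers(incoming: &[(&str, &str)]) -> Vec<(String, String)> {
    incoming
        .iter()
        .filter_map(|(name, value)| {
            let lower = name.to_ascii_lowercase();
            if lower == "authorization" || lower == "host" {
                None
            } else {
                Some((lower, (*value).to_string()))
            }
        })
        .collect()
}

/// Builds the full upstream header list: the filtered client headers,
/// followed by the proxy-controlled `authorization` and `host` values.
pub fn upstream_headers(
    incoming: &[(&str, &str)],
    auth: &str,
    host: &str,
) -> Vec<(String, String)> {
    let mut headers = forwardable_headers(incoming);
    headers.push(("authorization".to_string(), auth.to_string()));
    headers.push(("host".to_string(), host.to_string()));
    headers
}
```

Filtering before inserting means a client-supplied `Authorization` can never reach the upstream, even if a later refactor switched from `insert` to `append` semantics.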
--- a/llmx-rs/responses-api-proxy/src/main.rs
+++ b/llmx-rs/responses-api-proxy/src/main.rs
@@ -0,0 +1,12 @@
+use clap::Parser;
+use codex_responses_api_proxy::Args as ResponsesApiProxyArgs;
+
+#[ctor::ctor]
+fn pre_main() {
+    codex_process_hardening::pre_main_hardening();
+}
+
+pub fn main() -> anyhow::Result<()> {
+    let args = ResponsesApiProxyArgs::parse();
+    codex_responses_api_proxy::run_main(args)
+}
--- a/llmx-rs/responses-api-proxy/src/read_api_key.rs
+++ b/llmx-rs/responses-api-proxy/src/read_api_key.rs
@@ -0,0 +1,342 @@
+use anyhow::Context;
+use anyhow::Result;
+use anyhow::anyhow;
+use zeroize::Zeroize;
+
+/// Use a generous buffer size to avoid truncation and to allow for longer API
+/// keys in the future.
+const BUFFER_SIZE: usize = 1024;
+const AUTH_HEADER_PREFIX: &[u8] = b"Bearer ";
+
+/// Reads the auth token from stdin and returns a static `Authorization` header
+/// value with the auth token used with `Bearer`. The header value is returned
+/// as a `&'static str` whose bytes are locked in memory to avoid accidental
+/// exposure.
+#[cfg(unix)]
+pub(crate) fn read_auth_header_from_stdin() -> Result<&'static str> {
+    read_auth_header_with(read_from_unix_stdin)
+}
+
+#[cfg(windows)]
+pub(crate) fn read_auth_header_from_stdin() -> Result<&'static str> {
+    use std::io::Read;
+
+    // Use of `std::io::stdin()` has the problem mentioned in the docstring on
+    // the UNIX version of `read_from_unix_stdin()`, so this should ultimately
+    // be replaced with the low-level Windows equivalent. Because we do not
+    // have an equivalent of mlock() on Windows right now, it is not pressing
+    // until we address that issue.
+    read_auth_header_with(|buffer| std::io::stdin().read(buffer))
+}
+
+/// We perform a low-level read with `read(2)` because `std::io::stdin()` has
+/// an internal BufReader:
+///
+/// https://github.com/rust-lang/rust/blob/bcbbdcb8522fd3cb4a8dde62313b251ab107694d/library/std/src/io/stdio.rs#L250-L252
+///
+/// that can end up retaining a copy of stdin data in memory with no way to zero
+/// it out, whereas we aim to guarantee there is exactly one copy of the API key
+/// in memory, protected by mlock(2).
+#[cfg(unix)]
+fn read_from_unix_stdin(buffer: &mut [u8]) -> std::io::Result<usize> {
+    use libc::c_void;
+    use libc::read;
+
+    // Perform one read(2) call into the provided buffer slice, retrying only
+    // if it is interrupted. Short reads and newline/EOF handling are managed by the caller.
+    loop {
+        let result = unsafe {
+            read(
+                libc::STDIN_FILENO,
+                buffer.as_mut_ptr().cast::<c_void>(),
+                buffer.len(),
+            )
+        };
+
+        if result == 0 {
+            return Ok(0);
+        }
+
+        if result < 0 {
+            let err = std::io::Error::last_os_error();
+            if err.kind() == std::io::ErrorKind::Interrupted {
+                continue;
+            }
+            return Err(err);
+        }
+
+        return Ok(result as usize);
+    }
+}
+
+fn read_auth_header_with<F>(mut read_fn: F) -> Result<&'static str>
+where
+    F: FnMut(&mut [u8]) -> std::io::Result<usize>,
+{
+    // TAKE CARE WHEN MODIFYING THIS CODE!!!
+    //
+    // This function goes to great lengths to avoid leaving the API key in
+    // memory longer than necessary and to avoid copying it around. We read
+    // directly into a stack buffer so the only heap allocation should be the
+    // one to create the String (with the exact size) for the header value,
+    // which we then immediately protect with mlock(2).
+    let mut buf = [0u8; BUFFER_SIZE];
+    buf[..AUTH_HEADER_PREFIX.len()].copy_from_slice(AUTH_HEADER_PREFIX);
+
+    let prefix_len = AUTH_HEADER_PREFIX.len();
+    let capacity = buf.len() - prefix_len;
+    let mut total_read = 0usize; // number of bytes read into the token region
+    let mut saw_newline = false;
+    let mut saw_eof = false;
+
+    while total_read < capacity {
+        let slice = &mut buf[prefix_len + total_read..];
+        let read = match read_fn(slice) {
+            Ok(n) => n,
+            Err(err) => {
+                buf.zeroize();
+                return Err(err.into());
+            }
+        };
+
+        if read == 0 {
+            saw_eof = true;
+            break;
+        }
+
+        // Search only the newly written region for a newline.
+        let newly_written = &slice[..read];
+        if let Some(pos) = newly_written.iter().position(|&b| b == b'\n') {
+            total_read += pos + 1; // include the newline for trimming below
+            saw_newline = true;
+            break;
+        }
+
+        total_read += read;
+
+        // Continue loop; if buffer fills without newline/EOF we'll error below.
+    }
+
+    // If buffer filled and we did not see newline or EOF, error out.
+    if total_read == capacity && !saw_newline && !saw_eof {
+        buf.zeroize();
+        return Err(anyhow!(
+            "API key is too large to fit in the {BUFFER_SIZE}-byte buffer"
+        ));
+    }
+
+    let mut total = prefix_len + total_read;
+    while total > prefix_len && (buf[total - 1] == b'\n' || buf[total - 1] == b'\r') {
+        total -= 1;
+    }
+
+    if total == AUTH_HEADER_PREFIX.len() {
+        buf.zeroize();
+        return Err(anyhow!(
+            "API key must be provided via stdin (e.g. printenv OPENAI_API_KEY | codex responses-api-proxy)"
+        ));
+    }
+
+    if let Err(err) = validate_auth_header_bytes(&buf[AUTH_HEADER_PREFIX.len()..total]) {
+        buf.zeroize();
+        return Err(err);
+    }
+
+    let header_str = match std::str::from_utf8(&buf[..total]) {
+        Ok(value) => value,
+        Err(err) => {
+            // In theory, validate_auth_header_bytes() should have caught
+            // any invalid UTF-8 sequences, but just in case...
+            buf.zeroize();
+            return Err(err).context("reading Authorization header from stdin as UTF-8");
+        }
+    };
+
+    let header_value = String::from(header_str);
+    buf.zeroize();
+
+    let leaked: &'static mut str = header_value.leak();
+    mlock_str(leaked);
+
+    Ok(leaked)
+}
+
+#[cfg(unix)]
+fn mlock_str(value: &str) {
+    use libc::_SC_PAGESIZE;
+    use libc::c_void;
+    use libc::mlock;
+    use libc::sysconf;
+
+    if value.is_empty() {
+        return;
+    }
+
+    let page_size = unsafe { sysconf(_SC_PAGESIZE) };
+    if page_size <= 0 {
+        return;
+    }
+    let page_size = page_size as usize;
+    if page_size == 0 {
+        return;
+    }
+
+    let addr = value.as_ptr() as usize;
+    let len = value.len();
+    let start = addr & !(page_size - 1);
+    let addr_end = match addr.checked_add(len) {
+        Some(v) => match v.checked_add(page_size - 1) {
+            Some(total) => total,
+            None => return,
+        },
+        None => return,
+    };
+    let end = addr_end & !(page_size - 1);
+    let size = end.saturating_sub(start);
+    if size == 0 {
+        return;
+    }
+
+    let _ = unsafe { mlock(start as *const c_void, size) };
+}
+
+#[cfg(not(unix))]
+fn mlock_str(_value: &str) {}
+
+/// The key should match /^[A-Za-z0-9\-_]+$/. Ensure there is no funny business
+/// with NUL characters and whatnot.
+fn validate_auth_header_bytes(key_bytes: &[u8]) -> Result<()> {
+    if key_bytes
+        .iter()
+        .all(|byte| byte.is_ascii_alphanumeric() || matches!(byte, b'-' | b'_'))
+    {
+        return Ok(());
+    }
+
+    Err(anyhow!(
+        "API key may only contain ASCII letters, numbers, '-' or '_'"
+    ))
+}
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+    use std::collections::VecDeque;
+    use std::io;
+
+    #[test]
+    fn reads_key_with_no_newlines() {
+        let mut sent = false;
+        let result = read_auth_header_with(|buf| {
+            if sent {
+                return Ok(0);
+            }
+            let data = b"sk-abc123";
+            buf[..data.len()].copy_from_slice(data);
+            sent = true;
+            Ok(data.len())
+        })
+        .unwrap();
+
+        assert_eq!(result, "Bearer sk-abc123");
+    }
+
+    #[test]
+    fn reads_key_with_short_reads() {
+        let mut chunks: VecDeque<&[u8]> =
+            VecDeque::from(vec![b"sk-".as_ref(), b"abc".as_ref(), b"123\n".as_ref()]);
+        let result = read_auth_header_with(|buf| match chunks.pop_front() {
+            Some(chunk) if !chunk.is_empty() => {
+                buf[..chunk.len()].copy_from_slice(chunk);
+                Ok(chunk.len())
+            }
+            _ => Ok(0),
+        })
+        .unwrap();
+
+        assert_eq!(result, "Bearer sk-abc123");
+    }
+
+    #[test]
+    fn reads_key_and_trims_newlines() {
+        let mut sent = false;
+        let result = read_auth_header_with(|buf| {
+            if sent {
+                return Ok(0);
+            }
+            let data = b"sk-abc123\r\n";
+            buf[..data.len()].copy_from_slice(data);
+            sent = true;
+            Ok(data.len())
+        })
+        .unwrap();
+
+        assert_eq!(result, "Bearer sk-abc123");
+    }
+
+    #[test]
+    fn errors_when_no_input_provided() {
+        let err = read_auth_header_with(|_| Ok(0)).unwrap_err();
+        let message = format!("{err:#}");
+        assert!(message.contains("must be provided"));
+    }
+
+    #[test]
+    fn errors_when_buffer_filled() {
+        let err = read_auth_header_with(|buf| {
+            let data = vec![b'a'; BUFFER_SIZE - AUTH_HEADER_PREFIX.len()];
+            buf[..data.len()].copy_from_slice(&data);
+            Ok(data.len())
+        })
+        .unwrap_err();
+        let message = format!("{err:#}");
+        let expected_error =
+            format!("API key is too large to fit in the {BUFFER_SIZE}-byte buffer");
+        assert!(message.contains(&expected_error));
+    }
+
+    #[test]
+    fn propagates_io_error() {
+        let err = read_auth_header_with(|_| Err(io::Error::other("boom"))).unwrap_err();
+
+        let io_error = err.downcast_ref::<io::Error>().unwrap();
+        assert_eq!(io_error.kind(), io::ErrorKind::Other);
+        assert_eq!(io_error.to_string(), "boom");
+    }
+
+    #[test]
+    fn errors_on_invalid_utf8() {
+        let mut sent = false;
+        let err = read_auth_header_with(|buf| {
+            if sent {
+                return Ok(0);
+            }
+            let data = b"sk-abc\xff";
+            buf[..data.len()].copy_from_slice(data);
+            sent = true;
+            Ok(data.len())
+        })
+        .unwrap_err();
+
+        let message = format!("{err:#}");
+        assert!(message.contains("API key may only contain ASCII letters, numbers, '-' or '_'"));
+    }
+
+    #[test]
+    fn errors_on_invalid_characters() {
+        let mut sent = false;
+        let err = read_auth_header_with(|buf| {
+            if sent {
+                return Ok(0);
+            }
+            let data = b"sk-abc!23";
+            buf[..data.len()].copy_from_slice(data);
+            sent = true;
+            Ok(data.len())
+        })
+        .unwrap_err();
+
+        let message = format!("{err:#}");
+        assert!(message.contains("API key may only contain ASCII letters, numbers, '-' or '_'"));
+    }
+}
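
The page-alignment arithmetic in `mlock_str` is easier to check in isolation. The sketch below is a hypothetical pure helper, `page_span`, that mirrors the same rounding: the start address is rounded down to a page boundary and the end is rounded up. The power-of-two check is an added assumption that the original does not make explicit, although the bit-mask trick relies on it.

```rust
/// Computes the page-aligned `(start, size)` region covering `len` bytes
/// at `addr`. Returns `None` for empty input, an unusable page size, or
/// address overflow, which are the cases where `mlock_str` returns early.
pub fn page_span(addr: usize, len: usize, page_size: usize) -> Option<(usize, usize)> {
    // The mask below only works when page_size is a power of two
    // (an assumption stated here; real page sizes satisfy it).
    if len == 0 || page_size == 0 || !page_size.is_power_of_two() {
        return None;
    }
    let mask = !(page_size - 1);
    // Round the start down to the containing page boundary.
    let start = addr & mask;
    // Round the end up to the next page boundary, guarding against overflow.
    let end = addr.checked_add(len)?.checked_add(page_size - 1)? & mask;
    let size = end.saturating_sub(start);
    if size == 0 { None } else { Some((start, size)) }
}
```

For example, with 4096-byte pages a 10-byte string at address 4090 straddles a page boundary, so the locked region starts at 0 and covers two pages (8192 bytes).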