valknar/llmx - llmx - dev.pivoine.art

Author	SHA1	Message	Date
Michael Bolin	e1f098b9b7	feat: add options to responses-api-proxy to support Azure (#6129) This PR introduces an `--upstream-url` option to the proxy CLI that determines the URL that Responses API requests should be forwarded to. To preserve existing behavior, the default value is `"https://api.openai.com/v1/responses"`. The motivation for this change is that the [Codex GitHub Action](https://github.com/openai/codex-action) should support those who use the OpenAI Responses API via Azure. Relevant issues: - https://github.com/openai/codex-action/issues/28 - https://github.com/openai/codex-action/issues/38 - https://github.com/openai/codex-action/pull/44 Though rather than introduce a bunch of new Azure-specific logic in the action as https://github.com/openai/codex-action/pull/44 proposes, we should leverage our Responses API proxy to get the _hardening_ benefits it provides: `d5853d9c47/codex-rs/responses-api-proxy/README.md (hardening-details)` This PR should make this straightforward to incorporate in the action. To see how the updated version of the action would consume these new options, see https://github.com/openai/codex-action/pull/47.	2025-11-03 10:06:00 -08:00
Thibault Sottiaux	c37469b5ba	docs: clarify responses proxy metadata (#5406)	2025-10-20 15:04:02 -07:00
Michael Bolin	a30a902db5	fix: use low-level stdin read logic to avoid a BufReader (#4778) `codex-responses-api-proxy` is designed so that there should be exactly one copy of the API key in memory (that is `mlock`'d on UNIX), but in practice, I was seeing two when I dumped the process data from `/proc/$PID/mem`. It appears that `std::io::stdin()` maintains an internal `BufReader` that we cannot zero out, so this PR changes the implementation on UNIX so that we use a low-level `read(2)` instead. Even though it seems like it would be incredibly unlikely, we also make this logic tolerant of short reads. Either `\n` or `EOF` must be sent to signal the end of the key written to stdin.	2025-10-05 13:58:30 -07:00
Michael Bolin	618a42adf5	feat: introduce npm module for codex-responses-api-proxy (#4417) This PR expands `.github/workflows/rust-release.yml` so that it also builds and publishes the `npm` module for `@openai/codex-responses-api-proxy` in addition to `@openai/codex`. Note that both `npm` modules are similar, in that they each contain a single `.js` file that is a thin launcher around the appropriate native executable. (Since we have a minimal dependency on Node.js, I also lowered the minimum version from 20 to 16 and verified that works on my machine.) As part of this change, we tighten up some of the docs around `codex-responses-api-proxy` and ensure the details regarding protecting the `OPENAI_API_KEY` in memory match the implementation. To test the `npm` build process, I ran: ``` ./codex-cli/scripts/build_npm_package.py --package codex-responses-api-proxy --version 0.43.0-alpha.3 ``` which stages the `npm` module for `@openai/codex-responses-api-proxy` in a temp directory, using the binary artifacts from https://github.com/openai/codex/releases/tag/rust-v0.43.0-alpha.3.	2025-09-28 19:34:06 -07:00
Michael Bolin	c549481513	feat: introduce responses-api-proxy (#4246) Details are in `responses-api-proxy/README.md`, but the key contribution of this PR is a new subcommand, `codex responses-api-proxy`, which reads the auth token for use with the OpenAI Responses API from `stdin` at startup and then proxies `POST` requests to `/v1/responses` over to `https://api.openai.com/v1/responses`, injecting the auth token as part of the `Authorization` header. The expectation is that `codex responses-api-proxy` is launched by a privileged user who has access to the auth token so that it can be used by unprivileged users of the Codex CLI on the same host. If the client only has one user account with `sudo`, one option is to: - run `sudo codex responses-api-proxy --http-shutdown --server-info /tmp/server-info.json` to start the server - record the port written to `/tmp/server-info.json` - relinquish their `sudo` privileges (which is irreversible!) like so: ``` sudo deluser $USER sudo || sudo gpasswd -d $USER sudo || true ``` - use `codex` with the proxy (see `README.md`) - when done, make a `GET` request to the server using the `PORT` from `server-info.json` to shut it down: ```shell curl --fail --silent --show-error "http://127.0.0.1:$PORT/shutdown" ``` To protect the auth token, we: - allocate a 1024 byte buffer on the stack and write `"Bearer "` into it to start - we then read from `stdin`, copying the contents into the buffer after the prefix - after verifying the input looks good, we create a `String` from that buffer (so the data is now on the heap) - we zero out the stack-allocated buffer using https://crates.io/crates/zeroize so it is not optimized away by the compiler - we invoke `.leak()` on the `String` so we can treat its contents as a `&'static str`, as it will live for the rest of the process - on UNIX, we `mlock(2)` the memory backing the `&'static str` - when using the `&'static str` when building an HTTP request, we use `HeaderValue::from_static()` to avoid copying the `&str` - we also invoke `.set_sensitive(true)` on the `HeaderValue`, which in theory indicates to other parts of the HTTP stack that the header should be treated with "special care" to avoid leakage: `439d1c50d7/src/header/value.rs (L346-L376)`	2025-09-26 08:19:00 -07:00
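
The token-handling steps listed in c549481513, together with the short-read tolerance added in a30a902db5, can be sketched roughly as follows. This is a std-only illustration, not the code from the repository: `read_key_with_prefix`, `into_static_header`, `PREFIX`, and `BUF_LEN` are hypothetical names, a generic `Read` stands in for the raw `read(2)` on stdin, and a plain `fill(0)` stands in for the `zeroize` crate (which the commit uses precisely because a plain write may be optimized away). The `mlock(2)` call and the `HeaderValue` steps are omitted.

```rust
use std::io::{self, ErrorKind, Read};

// Hypothetical names for illustration; the 1024-byte size and the
// "Bearer " prefix come from the commit message.
const PREFIX: &[u8] = b"Bearer ";
const BUF_LEN: usize = 1024;

/// Writes "Bearer " into `buf`, then appends the key read from `reader`.
/// Stops at the first `\n` or at EOF and tolerates short reads.
/// Returns the number of bytes used (prefix plus key).
fn read_key_with_prefix<R: Read>(
    reader: &mut R,
    buf: &mut [u8; BUF_LEN],
) -> io::Result<usize> {
    buf[..PREFIX.len()].copy_from_slice(PREFIX);
    let mut len = PREFIX.len();
    loop {
        // Conservative: reject input that fills the buffer before a
        // terminator has been seen.
        if len == buf.len() {
            return Err(io::Error::new(ErrorKind::InvalidData, "key too long"));
        }
        // Read straight into the unused tail of the caller's buffer so no
        // intermediate copy of the key is made by this function.
        let n = match reader.read(&mut buf[len..]) {
            Ok(0) => break, // EOF ends the key
            Ok(n) => n,     // possibly fewer bytes than requested
            Err(e) if e.kind() == ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        };
        if let Some(pos) = buf[len..len + n].iter().position(|&b| b == b'\n') {
            len += pos; // newline ends the key; anything after it is ignored
            break;
        }
        len += n;
    }
    if len == PREFIX.len() {
        return Err(io::Error::new(ErrorKind::InvalidData, "empty key"));
    }
    Ok(len)
}

/// Copies the validated bytes to the heap, clears the stack buffer, and
/// leaks the `String` so the header lives for the rest of the process.
fn into_static_header(
    buf: &mut [u8; BUF_LEN],
    len: usize,
) -> io::Result<&'static str> {
    let header = std::str::from_utf8(&buf[..len])
        .map_err(|_| io::Error::new(ErrorKind::InvalidData, "key is not UTF-8"))?
        .to_owned();
    // Stand-in only: the commit uses zeroize here so the compiler cannot
    // elide the write.
    buf.fill(0);
    let leaked: &'static str = header.leak();
    Ok(leaked)
}
```

In this sketch the caller owns the stack buffer, so it is the caller's responsibility to pass the same buffer to both functions; the leaked `&'static str` is what would then be handed to something like `HeaderValue::from_static()`.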

5 Commits