OpenTelemetry events (#2103)

### Title

## otel

Codex can emit [OpenTelemetry](https://opentelemetry.io/) **log events**
that
describe each run: outbound API requests, streamed responses, user
input,
tool-approval decisions, and the result of every tool invocation. Export
is
**disabled by default** so local runs remain self-contained. Opt in by
adding an
`[otel]` table and choosing an exporter.

```toml
[otel]
environment = "staging"   # defaults to "dev"
exporter = "none"          # defaults to "none"; set to otlp-http or otlp-grpc to send events
log_user_prompt = false    # defaults to false; redact prompt text unless explicitly enabled
```

Codex tags every exported event with `service.name = "codex-cli"`, the
CLI
version, and an `env` attribute so downstream collectors can distinguish
dev/staging/prod traffic. Only telemetry produced inside the
`codex_otel`
crate—the events listed below—is forwarded to the exporter.

### Event catalog

Every event shares a common set of metadata fields: `event.timestamp`,
`conversation.id`, `app.version`, `auth_mode` (when available),
`user.account_id` (when available), `terminal.type`, `model`, and
`slug`.

With OTEL enabled Codex emits the following event types (in addition to
the
metadata above):

- `codex.api_request`
  - `cf_ray` (optional)
  - `attempt`
  - `duration_ms`
  - `http.response.status_code` (optional)
  - `error.message` (failures)
- `codex.sse_event`
  - `event.kind`
  - `duration_ms`
  - `error.message` (failures)
  - `input_token_count` (completion only)
  - `output_token_count` (completion only)
  - `cached_token_count` (completion only, optional)
  - `reasoning_token_count` (completion only, optional)
  - `tool_token_count` (completion only)
- `codex.user_prompt`
  - `prompt_length`
  - `prompt` (redacted unless `log_user_prompt = true`)
- `codex.tool_decision`
  - `tool_name`
  - `call_id`
- `decision` (`approved`, `approved_for_session`, `denied`, or `abort`)
  - `source` (`config` or `user`)
- `codex.tool_result`
  - `tool_name`
  - `call_id`
  - `arguments`
  - `duration_ms` (execution time for the tool)
  - `success` (`"true"` or `"false"`)
  - `output`

### Choosing an exporter

Set `otel.exporter` to control where events go:

- `none` – leaves instrumentation active but skips exporting. This is
the
  default.
- `otlp-http` – posts OTLP log records to an OTLP/HTTP collector.
Specify the
  endpoint, protocol, and headers your collector expects:

  ```toml
  [otel]
  exporter = { otlp-http = {
    endpoint = "https://otel.example.com/v1/logs",
    protocol = "binary",
    headers = { "x-otlp-api-key" = "${OTLP_TOKEN}" }
  }}
  ```

- `otlp-grpc` – streams OTLP log records over gRPC. Provide the endpoint
and any
  metadata headers:

  ```toml
  [otel]
  exporter = { otlp-grpc = {
    endpoint = "https://otel.example.com:4317",
    headers = { "x-otlp-meta" = "abc123" }
  }}
  ```

If the exporter is `none` nothing is written anywhere; otherwise you
must run or point to your
own collector. All exporters run on a background batch worker that is
flushed on
shutdown.

If you build Codex from source the OTEL crate is still behind an `otel`
feature
flag; the official prebuilt binaries ship with the feature enabled. When
the
feature is disabled the telemetry hooks become no-ops so the CLI
continues to
function without the extra dependencies.

---------

Co-authored-by: Anton Panasenko <apanasenko@openai.com>
This commit is contained in:
vishnu-oai
2025-09-29 19:30:55 +01:00
committed by GitHub
parent d15253415a
commit 04c1782e52
38 changed files with 3069 additions and 142 deletions

View File

@@ -24,6 +24,7 @@ codex-file-search = { workspace = true }
codex-mcp-client = { workspace = true }
codex-rmcp-client = { workspace = true }
codex-protocol = { workspace = true }
codex-otel = { workspace = true, features = ["otel"] }
dirs = { workspace = true }
env-flags = { workspace = true }
eventsource-stream = { workspace = true }
@@ -91,6 +92,7 @@ tempfile = { workspace = true }
tokio-test = { workspace = true }
walkdir = { workspace = true }
wiremock = { workspace = true }
tracing-test = { workspace = true, features = ["no-env-filter"] }
[package.metadata.cargo-shear]
ignored = ["openssl-sys"]

View File

@@ -45,12 +45,13 @@ pub(crate) async fn apply_patch(
&turn_context.sandbox_policy,
&turn_context.cwd,
) {
SafetyCheck::AutoApprove { .. } => {
InternalApplyPatchInvocation::DelegateToExec(ApplyPatchExec {
action,
user_explicitly_approved_this_action: false,
})
}
SafetyCheck::AutoApprove {
user_explicitly_approved,
..
} => InternalApplyPatchInvocation::DelegateToExec(ApplyPatchExec {
action,
user_explicitly_approved_this_action: user_explicitly_approved,
}),
SafetyCheck::AskUser => {
// Compute a readable summary of path changes to include in the
// approval request so the user can make an informed decision.

View File

@@ -1,6 +1,19 @@
use std::time::Duration;
use crate::ModelProviderInfo;
use crate::client_common::Prompt;
use crate::client_common::ResponseEvent;
use crate::client_common::ResponseStream;
use crate::error::CodexErr;
use crate::error::Result;
use crate::model_family::ModelFamily;
use crate::openai_tools::create_tools_json_for_chat_completions_api;
use crate::util::backoff;
use bytes::Bytes;
use codex_otel::otel_event_manager::OtelEventManager;
use codex_protocol::models::ContentItem;
use codex_protocol::models::ReasoningItemContent;
use codex_protocol::models::ResponseItem;
use eventsource_stream::Eventsource;
use futures::Stream;
use futures::StreamExt;
@@ -15,25 +28,13 @@ use tokio::time::timeout;
use tracing::debug;
use tracing::trace;
use crate::ModelProviderInfo;
use crate::client_common::Prompt;
use crate::client_common::ResponseEvent;
use crate::client_common::ResponseStream;
use crate::error::CodexErr;
use crate::error::Result;
use crate::model_family::ModelFamily;
use crate::openai_tools::create_tools_json_for_chat_completions_api;
use crate::util::backoff;
use codex_protocol::models::ContentItem;
use codex_protocol::models::ReasoningItemContent;
use codex_protocol::models::ResponseItem;
/// Implementation for the classic Chat Completions API.
pub(crate) async fn stream_chat_completions(
prompt: &Prompt,
model_family: &ModelFamily,
client: &reqwest::Client,
provider: &ModelProviderInfo,
otel_event_manager: &OtelEventManager,
) -> Result<ResponseStream> {
if prompt.output_schema.is_some() {
return Err(CodexErr::UnsupportedOperation(
@@ -294,10 +295,13 @@ pub(crate) async fn stream_chat_completions(
let req_builder = provider.create_request_builder(client, &None).await?;
let res = req_builder
.header(reqwest::header::ACCEPT, "text/event-stream")
.json(&payload)
.send()
let res = otel_event_manager
.log_request(attempt, || {
req_builder
.header(reqwest::header::ACCEPT, "text/event-stream")
.json(&payload)
.send()
})
.await;
match res {
@@ -308,6 +312,7 @@ pub(crate) async fn stream_chat_completions(
stream,
tx_event,
provider.stream_idle_timeout(),
otel_event_manager.clone(),
));
return Ok(ResponseStream { rx_event });
}
@@ -351,6 +356,7 @@ async fn process_chat_sse<S>(
stream: S,
tx_event: mpsc::Sender<Result<ResponseEvent>>,
idle_timeout: Duration,
otel_event_manager: OtelEventManager,
) where
S: Stream<Item = Result<Bytes>> + Unpin,
{
@@ -374,7 +380,10 @@ async fn process_chat_sse<S>(
let mut reasoning_text = String::new();
loop {
let sse = match timeout(idle_timeout, stream.next()).await {
let sse = match otel_event_manager
.log_sse_event(|| timeout(idle_timeout, stream.next()))
.await
{
Ok(Some(Ok(ev))) => ev,
Ok(Some(Err(e))) => {
let _ = tx_event

View File

@@ -47,6 +47,7 @@ use crate::protocol::RateLimitWindow;
use crate::protocol::TokenUsage;
use crate::token_data::PlanType;
use crate::util::backoff;
use codex_otel::otel_event_manager::OtelEventManager;
use codex_protocol::config_types::ReasoningEffort as ReasoningEffortConfig;
use codex_protocol::config_types::ReasoningSummary as ReasoningSummaryConfig;
use codex_protocol::models::ResponseItem;
@@ -73,6 +74,7 @@ struct Error {
pub struct ModelClient {
config: Arc<Config>,
auth_manager: Option<Arc<AuthManager>>,
otel_event_manager: OtelEventManager,
client: reqwest::Client,
provider: ModelProviderInfo,
conversation_id: ConversationId,
@@ -84,6 +86,7 @@ impl ModelClient {
pub fn new(
config: Arc<Config>,
auth_manager: Option<Arc<AuthManager>>,
otel_event_manager: OtelEventManager,
provider: ModelProviderInfo,
effort: Option<ReasoningEffortConfig>,
summary: ReasoningSummaryConfig,
@@ -94,6 +97,7 @@ impl ModelClient {
Self {
config,
auth_manager,
otel_event_manager,
client,
provider,
conversation_id,
@@ -127,6 +131,7 @@ impl ModelClient {
&self.config.model_family,
&self.client,
&self.provider,
&self.otel_event_manager,
)
.await?;
@@ -163,7 +168,12 @@ impl ModelClient {
if let Some(path) = &*CODEX_RS_SSE_FIXTURE {
// short circuit for tests
warn!(path, "Streaming from fixture");
return stream_from_fixture(path, self.provider.clone()).await;
return stream_from_fixture(
path,
self.provider.clone(),
self.otel_event_manager.clone(),
)
.await;
}
let auth_manager = self.auth_manager.clone();
@@ -233,7 +243,7 @@ impl ModelClient {
let max_attempts = self.provider.request_max_retries();
for attempt in 0..=max_attempts {
match self
.attempt_stream_responses(&payload_json, &auth_manager)
.attempt_stream_responses(attempt, &payload_json, &auth_manager)
.await
{
Ok(stream) => {
@@ -258,6 +268,7 @@ impl ModelClient {
/// Single attempt to start a streaming Responses API call.
async fn attempt_stream_responses(
&self,
attempt: u64,
payload_json: &Value,
auth_manager: &Option<Arc<AuthManager>>,
) -> std::result::Result<ResponseStream, StreamAttemptError> {
@@ -291,7 +302,11 @@ impl ModelClient {
req_builder = req_builder.header("chatgpt-account-id", account_id);
}
let res = req_builder.send().await;
let res = self
.otel_event_manager
.log_request(attempt, || req_builder.send())
.await;
if let Ok(resp) = &res {
trace!(
"Response status: {}, cf-ray: {}",
@@ -322,6 +337,7 @@ impl ModelClient {
stream,
tx_event,
self.provider.stream_idle_timeout(),
self.otel_event_manager.clone(),
));
Ok(ResponseStream { rx_event })
@@ -399,6 +415,10 @@ impl ModelClient {
self.provider.clone()
}
pub fn get_otel_event_manager(&self) -> OtelEventManager {
self.otel_event_manager.clone()
}
/// Returns the currently configured model slug.
pub fn get_model(&self) -> String {
self.config.model.clone()
@@ -605,6 +625,7 @@ async fn process_sse<S>(
stream: S,
tx_event: mpsc::Sender<Result<ResponseEvent>>,
idle_timeout: Duration,
otel_event_manager: OtelEventManager,
) where
S: Stream<Item = Result<Bytes>> + Unpin,
{
@@ -616,7 +637,10 @@ async fn process_sse<S>(
let mut response_error: Option<CodexErr> = None;
loop {
let sse = match timeout(idle_timeout, stream.next()).await {
let sse = match otel_event_manager
.log_sse_event(|| timeout(idle_timeout, stream.next()))
.await
{
Ok(Some(Ok(sse))) => sse,
Ok(Some(Err(e))) => {
debug!("SSE Error: {e:#}");
@@ -630,6 +654,21 @@ async fn process_sse<S>(
id: response_id,
usage,
}) => {
if let Some(token_usage) = &usage {
otel_event_manager.sse_event_completed(
token_usage.input_tokens,
token_usage.output_tokens,
token_usage
.input_tokens_details
.as_ref()
.map(|d| d.cached_tokens),
token_usage
.output_tokens_details
.as_ref()
.map(|d| d.reasoning_tokens),
token_usage.total_tokens,
);
}
let event = ResponseEvent::Completed {
response_id,
token_usage: usage.map(Into::into),
@@ -637,12 +676,13 @@ async fn process_sse<S>(
let _ = tx_event.send(Ok(event)).await;
}
None => {
let _ = tx_event
.send(Err(response_error.unwrap_or(CodexErr::Stream(
"stream closed before response.completed".into(),
None,
))))
.await;
let error = response_error.unwrap_or(CodexErr::Stream(
"stream closed before response.completed".into(),
None,
));
otel_event_manager.see_event_completed_failed(&error);
let _ = tx_event.send(Err(error)).await;
}
}
return;
@@ -746,7 +786,9 @@ async fn process_sse<S>(
response_error = Some(CodexErr::Stream(message, delay));
}
Err(e) => {
debug!("failed to parse ErrorResponse: {e}");
let error = format!("failed to parse ErrorResponse: {e}");
debug!(error);
response_error = Some(CodexErr::Stream(error, None))
}
}
}
@@ -760,7 +802,9 @@ async fn process_sse<S>(
response_completed = Some(r);
}
Err(e) => {
debug!("failed to parse ResponseCompleted: {e}");
let error = format!("failed to parse ResponseCompleted: {e}");
debug!(error);
response_error = Some(CodexErr::Stream(error, None));
continue;
}
};
@@ -807,6 +851,7 @@ async fn process_sse<S>(
async fn stream_from_fixture(
path: impl AsRef<Path>,
provider: ModelProviderInfo,
otel_event_manager: OtelEventManager,
) -> Result<ResponseStream> {
let (tx_event, rx_event) = mpsc::channel::<Result<ResponseEvent>>(1600);
let f = std::fs::File::open(path.as_ref())?;
@@ -825,6 +870,7 @@ async fn stream_from_fixture(
stream,
tx_event,
provider.stream_idle_timeout(),
otel_event_manager,
));
Ok(ResponseStream { rx_event })
}
@@ -880,6 +926,7 @@ mod tests {
async fn collect_events(
chunks: &[&[u8]],
provider: ModelProviderInfo,
otel_event_manager: OtelEventManager,
) -> Vec<Result<ResponseEvent>> {
let mut builder = IoBuilder::new();
for chunk in chunks {
@@ -889,7 +936,12 @@ mod tests {
let reader = builder.build();
let stream = ReaderStream::new(reader).map_err(CodexErr::Io);
let (tx, mut rx) = mpsc::channel::<Result<ResponseEvent>>(16);
tokio::spawn(process_sse(stream, tx, provider.stream_idle_timeout()));
tokio::spawn(process_sse(
stream,
tx,
provider.stream_idle_timeout(),
otel_event_manager,
));
let mut events = Vec::new();
while let Some(ev) = rx.recv().await {
@@ -903,6 +955,7 @@ mod tests {
async fn run_sse(
events: Vec<serde_json::Value>,
provider: ModelProviderInfo,
otel_event_manager: OtelEventManager,
) -> Vec<ResponseEvent> {
let mut body = String::new();
for e in events {
@@ -919,7 +972,12 @@ mod tests {
let (tx, mut rx) = mpsc::channel::<Result<ResponseEvent>>(8);
let stream = ReaderStream::new(std::io::Cursor::new(body)).map_err(CodexErr::Io);
tokio::spawn(process_sse(stream, tx, provider.stream_idle_timeout()));
tokio::spawn(process_sse(
stream,
tx,
provider.stream_idle_timeout(),
otel_event_manager,
));
let mut out = Vec::new();
while let Some(ev) = rx.recv().await {
@@ -928,6 +986,18 @@ mod tests {
out
}
fn otel_event_manager() -> OtelEventManager {
OtelEventManager::new(
ConversationId::new(),
"test",
"test",
None,
Some(AuthMode::ChatGPT),
false,
"test".to_string(),
)
}
// ────────────────────────────
// Tests from `implement-test-for-responses-api-sse-parser`
// ────────────────────────────
@@ -979,9 +1049,12 @@ mod tests {
requires_openai_auth: false,
};
let otel_event_manager = otel_event_manager();
let events = collect_events(
&[sse1.as_bytes(), sse2.as_bytes(), sse3.as_bytes()],
provider,
otel_event_manager,
)
.await;
@@ -1039,7 +1112,9 @@ mod tests {
requires_openai_auth: false,
};
let events = collect_events(&[sse1.as_bytes()], provider).await;
let otel_event_manager = otel_event_manager();
let events = collect_events(&[sse1.as_bytes()], provider, otel_event_manager).await;
assert_eq!(events.len(), 2);
@@ -1073,7 +1148,9 @@ mod tests {
requires_openai_auth: false,
};
let events = collect_events(&[sse1.as_bytes()], provider).await;
let otel_event_manager = otel_event_manager();
let events = collect_events(&[sse1.as_bytes()], provider, otel_event_manager).await;
assert_eq!(events.len(), 1);
@@ -1178,7 +1255,9 @@ mod tests {
requires_openai_auth: false,
};
let out = run_sse(evs, provider).await;
let otel_event_manager = otel_event_manager();
let out = run_sse(evs, provider, otel_event_manager).await;
assert_eq!(out.len(), case.expected_len, "case {}", case.name);
assert!(
(case.expect_first)(&out[0]),

View File

@@ -1,5 +1,6 @@
use std::borrow::Cow;
use std::collections::HashMap;
use std::fmt::Debug;
use std::path::Path;
use std::path::PathBuf;
use std::sync::Arc;
@@ -11,6 +12,7 @@ use crate::client_common::REVIEW_PROMPT;
use crate::event_mapping::map_response_item_to_event_messages;
use crate::function_tool::FunctionCallError;
use crate::review_format::format_review_findings_block;
use crate::terminal;
use crate::user_notification::UserNotifier;
use async_channel::Receiver;
use async_channel::Sender;
@@ -125,6 +127,8 @@ use crate::unified_exec::UnifiedExecSessionManager;
use crate::user_instructions::UserInstructions;
use crate::user_notification::UserNotification;
use crate::util::backoff;
use codex_otel::otel_event_manager::OtelEventManager;
use codex_otel::otel_event_manager::ToolDecisionSource;
use codex_protocol::config_types::ReasoningEffort as ReasoningEffortConfig;
use codex_protocol::config_types::ReasoningSummary as ReasoningSummaryConfig;
use codex_protocol::custom_prompts::CustomPrompt;
@@ -422,11 +426,35 @@ impl Session {
}
}
let otel_event_manager = OtelEventManager::new(
conversation_id,
config.model.as_str(),
config.model_family.slug.as_str(),
auth_manager.auth().and_then(|a| a.get_account_id()),
auth_manager.auth().map(|a| a.mode),
config.otel.log_user_prompt,
terminal::user_agent(),
);
otel_event_manager.conversation_starts(
config.model_provider.name.as_str(),
config.model_reasoning_effort,
config.model_reasoning_summary,
config.model_context_window,
config.model_max_output_tokens,
config.model_auto_compact_token_limit,
config.approval_policy,
config.sandbox_policy.clone(),
config.mcp_servers.keys().map(String::as_str).collect(),
config.active_profile.clone(),
);
// Now that the conversation id is final (may have been updated by resume),
// construct the model client.
let client = ModelClient::new(
config.clone(),
Some(auth_manager.clone()),
otel_event_manager,
provider.clone(),
model_reasoning_effort,
model_reasoning_summary,
@@ -1122,9 +1150,15 @@ async fn submission_loop(
updated_config.model_context_window = Some(model_info.context_window);
}
let otel_event_manager = prev.client.get_otel_event_manager().with_model(
updated_config.model.as_str(),
updated_config.model_family.slug.as_str(),
);
let client = ModelClient::new(
Arc::new(updated_config),
auth_manager,
otel_event_manager,
provider,
effective_effort,
effective_summary,
@@ -1176,6 +1210,10 @@ async fn submission_loop(
}
}
Op::UserInput { items } => {
turn_context
.client
.get_otel_event_manager()
.user_prompt(&items);
// attempt to inject input into current task
if let Err(items) = sess.inject_input(items).await {
// no current task, spawn a new one
@@ -1193,6 +1231,10 @@ async fn submission_loop(
summary,
final_output_json_schema,
} => {
turn_context
.client
.get_otel_event_manager()
.user_prompt(&items);
// attempt to inject input into current task
if let Err(items) = sess.inject_input(items).await {
// Derive a fresh TurnContext for this turn using the provided overrides.
@@ -1211,11 +1253,18 @@ async fn submission_loop(
per_turn_config.model_context_window = Some(model_info.context_window);
}
let otel_event_manager =
turn_context.client.get_otel_event_manager().with_model(
per_turn_config.model.as_str(),
per_turn_config.model_family.slug.as_str(),
);
// Build a new client with perturn reasoning settings.
// Reuse the same provider and session id; auth defaults to env/API key.
let client = ModelClient::new(
Arc::new(per_turn_config),
auth_manager,
otel_event_manager,
provider,
effort,
summary,
@@ -1472,10 +1521,19 @@ async fn spawn_review_thread(
per_turn_config.model_context_window = Some(model_info.context_window);
}
let otel_event_manager = parent_turn_context
.client
.get_otel_event_manager()
.with_model(
per_turn_config.model.as_str(),
per_turn_config.model_family.slug.as_str(),
);
let per_turn_config = Arc::new(per_turn_config);
let client = ModelClient::new(
per_turn_config.clone(),
auth_manager,
otel_event_manager,
provider,
per_turn_config.model_reasoning_effort,
per_turn_config.model_reasoning_summary,
@@ -2140,16 +2198,21 @@ async fn handle_response_item(
.await;
Some(resp)
} else {
let result = handle_function_call(
sess,
turn_context,
turn_diff_tracker,
sub_id.to_string(),
name,
arguments,
call_id.clone(),
)
.await;
let result = turn_context
.client
.get_otel_event_manager()
.log_tool_result(name.as_str(), call_id.as_str(), arguments.as_str(), || {
handle_function_call(
sess,
turn_context,
turn_diff_tracker,
sub_id.to_string(),
name.to_owned(),
arguments.to_owned(),
call_id.clone(),
)
})
.await;
let output = match result {
Ok(content) => FunctionCallOutputPayload {
@@ -2170,6 +2233,7 @@ async fn handle_response_item(
status: _,
action,
} => {
let name = "local_shell";
let LocalShellAction::Exec(action) = action;
tracing::info!("LocalShellCall: {action:?}");
let params = ShellToolCallParams {
@@ -2183,11 +2247,18 @@ async fn handle_response_item(
(Some(call_id), _) => call_id,
(None, Some(id)) => id,
(None, None) => {
error!("LocalShellCall without call_id or id");
let error_message = "LocalShellCall without call_id or id";
turn_context
.client
.get_otel_event_manager()
.log_tool_failed(name, error_message);
error!(error_message);
return Ok(Some(ResponseInputItem::FunctionCallOutput {
call_id: "".to_string(),
output: FunctionCallOutputPayload {
content: "LocalShellCall without call_id or id".to_string(),
content: error_message.to_string(),
success: None,
},
}));
@@ -2196,15 +2267,26 @@ async fn handle_response_item(
let exec_params = to_exec_params(params, turn_context);
{
let result = handle_container_exec_with_params(
exec_params,
sess,
turn_context,
turn_diff_tracker,
sub_id.to_string(),
effective_call_id.clone(),
)
.await;
let result = turn_context
.client
.get_otel_event_manager()
.log_tool_result(
name,
effective_call_id.as_str(),
exec_params.command.join(" ").as_str(),
|| {
handle_container_exec_with_params(
name,
exec_params,
sess,
turn_context,
turn_diff_tracker,
sub_id.to_string(),
effective_call_id.clone(),
)
},
)
.await;
let output = match result {
Ok(content) => FunctionCallOutputPayload {
@@ -2229,16 +2311,21 @@ async fn handle_response_item(
input,
status: _,
} => {
let result = handle_custom_tool_call(
sess,
turn_context,
turn_diff_tracker,
sub_id.to_string(),
name,
input,
call_id.clone(),
)
.await;
let result = turn_context
.client
.get_otel_event_manager()
.log_tool_result(name.as_str(), call_id.as_str(), input.as_str(), || {
handle_custom_tool_call(
sess,
turn_context,
turn_diff_tracker,
sub_id.to_string(),
name.to_owned(),
input.to_owned(),
call_id.clone(),
)
})
.await;
let output = match result {
Ok(content) => content,
@@ -2344,6 +2431,7 @@ async fn handle_function_call(
"container.exec" | "shell" => {
let params = parse_container_exec_arguments(arguments, turn_context, &call_id)?;
handle_container_exec_with_params(
name.as_str(),
params,
sess,
turn_context,
@@ -2407,6 +2495,7 @@ async fn handle_function_call(
justification: None,
};
handle_container_exec_with_params(
name.as_str(),
exec_params,
sess,
turn_context,
@@ -2479,6 +2568,7 @@ async fn handle_custom_tool_call(
};
handle_container_exec_with_params(
name.as_str(),
exec_params,
sess,
turn_context,
@@ -2548,6 +2638,7 @@ fn maybe_translate_shell_command(
}
async fn handle_container_exec_with_params(
tool_name: &str,
params: ExecParams,
sess: &Session,
turn_context: &TurnContext,
@@ -2555,6 +2646,8 @@ async fn handle_container_exec_with_params(
sub_id: String,
call_id: String,
) -> Result<String, FunctionCallError> {
let otel_event_manager = turn_context.client.get_otel_event_manager();
if params.with_escalated_permissions.unwrap_or(false)
&& !matches!(turn_context.approval_policy, AskForApproval::OnRequest)
{
@@ -2618,6 +2711,7 @@ async fn handle_container_exec_with_params(
let safety = if *user_explicitly_approved_this_action {
SafetyCheck::AutoApprove {
sandbox_type: SandboxType::None,
user_explicitly_approved: true,
}
} else {
assess_safety_for_untrusted_command(
@@ -2649,7 +2743,23 @@ async fn handle_container_exec_with_params(
};
let sandbox_type = match safety {
SafetyCheck::AutoApprove { sandbox_type } => sandbox_type,
SafetyCheck::AutoApprove {
sandbox_type,
user_explicitly_approved,
} => {
otel_event_manager.tool_decision(
tool_name,
call_id.as_str(),
ReviewDecision::Approved,
if user_explicitly_approved {
ToolDecisionSource::User
} else {
ToolDecisionSource::Config
},
);
sandbox_type
}
SafetyCheck::AskUser => {
let decision = sess
.request_command_approval(
@@ -2661,15 +2771,45 @@ async fn handle_container_exec_with_params(
)
.await;
match decision {
ReviewDecision::Approved => (),
ReviewDecision::Approved => {
otel_event_manager.tool_decision(
tool_name,
call_id.as_str(),
ReviewDecision::Approved,
ToolDecisionSource::User,
);
}
ReviewDecision::ApprovedForSession => {
otel_event_manager.tool_decision(
tool_name,
call_id.as_str(),
ReviewDecision::ApprovedForSession,
ToolDecisionSource::User,
);
sess.add_approved_command(params.command.clone()).await;
}
ReviewDecision::Denied | ReviewDecision::Abort => {
ReviewDecision::Denied => {
otel_event_manager.tool_decision(
tool_name,
call_id.as_str(),
ReviewDecision::Denied,
ToolDecisionSource::User,
);
return Err(FunctionCallError::RespondToModel(
"exec command rejected by user".to_string(),
));
}
ReviewDecision::Abort => {
otel_event_manager.tool_decision(
tool_name,
call_id.as_str(),
ReviewDecision::Abort,
ToolDecisionSource::User,
);
return Err(FunctionCallError::RespondToModel(
"exec command aborted by user".to_string(),
));
}
}
// No sandboxing is applied because the user has given
// explicit approval. Often, we end up in this case because
@@ -2678,6 +2818,12 @@ async fn handle_container_exec_with_params(
SandboxType::None
}
SafetyCheck::Reject { reason } => {
otel_event_manager.tool_decision(
tool_name,
call_id.as_str(),
ReviewDecision::Denied,
ToolDecisionSource::Config,
);
return Err(FunctionCallError::RespondToModel(format!(
"exec command rejected: {reason:?}"
)));
@@ -2736,6 +2882,7 @@ async fn handle_container_exec_with_params(
}
Err(CodexErr::Sandbox(error)) => {
handle_sandbox_error(
tool_name,
turn_diff_tracker,
params,
exec_command_context,
@@ -2743,6 +2890,7 @@ async fn handle_container_exec_with_params(
sandbox_type,
sess,
turn_context,
&otel_event_manager,
)
.await
}
@@ -2752,7 +2900,9 @@ async fn handle_container_exec_with_params(
}
}
#[allow(clippy::too_many_arguments)]
async fn handle_sandbox_error(
tool_name: &str,
turn_diff_tracker: &mut TurnDiffTracker,
params: ExecParams,
exec_command_context: ExecCommandContext,
@@ -2760,6 +2910,7 @@ async fn handle_sandbox_error(
sandbox_type: SandboxType,
sess: &Session,
turn_context: &TurnContext,
otel_event_manager: &OtelEventManager,
) -> Result<String, FunctionCallError> {
let call_id = exec_command_context.call_id.clone();
let sub_id = exec_command_context.sub_id.clone();
@@ -2814,6 +2965,13 @@ async fn handle_sandbox_error(
sess.notify_background_event(&sub_id, "retrying command without sandbox")
.await;
otel_event_manager.tool_decision(
tool_name,
call_id.as_str(),
decision,
ToolDecisionSource::User,
);
// This is an escalated retry; the policy will not be
// examined and the sandbox has been set to `None`.
let retry_output_result = sess
@@ -2854,7 +3012,14 @@ async fn handle_sandbox_error(
))),
}
}
ReviewDecision::Denied | ReviewDecision::Abort => {
decision @ (ReviewDecision::Denied | ReviewDecision::Abort) => {
otel_event_manager.tool_decision(
tool_name,
call_id.as_str(),
decision,
ToolDecisionSource::User,
);
// Fall through to original failure handling.
Err(FunctionCallError::RespondToModel(
"exec command rejected by user".to_string(),
@@ -3129,13 +3294,17 @@ mod tests {
use super::*;
use crate::config::ConfigOverrides;
use crate::config::ConfigToml;
use crate::protocol::CompactedItem;
use crate::protocol::InitialHistory;
use crate::protocol::ResumedHistory;
use crate::state::TaskKind;
use crate::tasks::SessionTask;
use crate::tasks::SessionTaskContext;
use codex_protocol::mcp_protocol::AuthMode;
use codex_protocol::models::ContentItem;
use codex_protocol::models::ResponseItem;
use mcp_types::ContentBlock;
use mcp_types::TextContent;
use pretty_assertions::assert_eq;
@@ -3370,6 +3539,18 @@ mod tests {
})
}
fn otel_event_manager(conversation_id: ConversationId, config: &Config) -> OtelEventManager {
OtelEventManager::new(
conversation_id,
config.model.as_str(),
config.model_family.slug.as_str(),
None,
Some(AuthMode::ChatGPT),
false,
"test".to_string(),
)
}
pub(crate) fn make_session_and_context() -> (Session, TurnContext) {
let (tx_event, _rx_event) = async_channel::unbounded();
let codex_home = tempfile::tempdir().expect("create temp dir");
@@ -3381,9 +3562,11 @@ mod tests {
.expect("load default test config");
let config = Arc::new(config);
let conversation_id = ConversationId::default();
let otel_event_manager = otel_event_manager(conversation_id, config.as_ref());
let client = ModelClient::new(
config.clone(),
None,
otel_event_manager,
config.model_provider.clone(),
config.model_reasoning_effort,
config.model_reasoning_summary,
@@ -3448,9 +3631,11 @@ mod tests {
.expect("load default test config");
let config = Arc::new(config);
let conversation_id = ConversationId::default();
let otel_event_manager = otel_event_manager(conversation_id, config.as_ref());
let client = ModelClient::new(
config.clone(),
None,
otel_event_manager,
config.model_provider.clone(),
config.model_reasoning_effort,
config.model_reasoning_summary,
@@ -3741,10 +3926,12 @@ mod tests {
let mut turn_diff_tracker = TurnDiffTracker::new();
let tool_name = "shell";
let sub_id = "test-sub".to_string();
let call_id = "test-call".to_string();
let resp = handle_container_exec_with_params(
tool_name,
params,
&session,
&turn_context,
@@ -3770,6 +3957,7 @@ mod tests {
turn_context.sandbox_policy = SandboxPolicy::DangerFullAccess;
let resp2 = handle_container_exec_with_params(
tool_name,
params2,
&session,
&turn_context,

View File

@@ -1,8 +1,12 @@
use crate::config_profile::ConfigProfile;
use crate::config_types::DEFAULT_OTEL_ENVIRONMENT;
use crate::config_types::History;
use crate::config_types::McpServerConfig;
use crate::config_types::McpServerTransportConfig;
use crate::config_types::Notifications;
use crate::config_types::OtelConfig;
use crate::config_types::OtelConfigToml;
use crate::config_types::OtelExporterKind;
use crate::config_types::ReasoningSummaryFormat;
use crate::config_types::SandboxWorkspaceWrite;
use crate::config_types::ShellEnvironmentPolicy;
@@ -199,6 +203,9 @@ pub struct Config {
/// All characters are inserted as they are received, and no buffering
/// or placeholder replacement will occur for fast keypress bursts.
pub disable_paste_burst: bool,
/// OTEL configuration (exporter type, endpoint, headers, etc.).
pub otel: crate::config_types::OtelConfig,
}
impl Config {
@@ -719,6 +726,9 @@ pub struct ConfigToml {
/// All characters are inserted as they are received, and no buffering
/// or placeholder replacement will occur for fast keypress bursts.
pub disable_paste_burst: Option<bool>,
/// OTEL configuration.
pub otel: Option<crate::config_types::OtelConfigToml>,
}
impl From<ConfigToml> for UserSavedConfig {
@@ -1068,6 +1078,19 @@ impl Config {
.as_ref()
.map(|t| t.notifications.clone())
.unwrap_or_default(),
otel: {
let t: OtelConfigToml = cfg.otel.unwrap_or_default();
let log_user_prompt = t.log_user_prompt.unwrap_or(false);
let environment = t
.environment
.unwrap_or(DEFAULT_OTEL_ENVIRONMENT.to_string());
let exporter = t.exporter.unwrap_or(OtelExporterKind::None);
OtelConfig {
log_user_prompt,
environment,
exporter,
}
},
};
Ok(config)
}
@@ -1809,6 +1832,7 @@ model_verbosity = "high"
active_profile: Some("o3".to_string()),
disable_paste_burst: false,
tui_notifications: Default::default(),
otel: OtelConfig::default(),
},
o3_profile_config
);
@@ -1868,6 +1892,7 @@ model_verbosity = "high"
active_profile: Some("gpt3".to_string()),
disable_paste_burst: false,
tui_notifications: Default::default(),
otel: OtelConfig::default(),
};
assert_eq!(expected_gpt3_profile_config, gpt3_profile_config);
@@ -1942,6 +1967,7 @@ model_verbosity = "high"
active_profile: Some("zdr".to_string()),
disable_paste_burst: false,
tui_notifications: Default::default(),
otel: OtelConfig::default(),
};
assert_eq!(expected_zdr_profile_config, zdr_profile_config);
@@ -2002,6 +2028,7 @@ model_verbosity = "high"
active_profile: Some("gpt5".to_string()),
disable_paste_burst: false,
tui_notifications: Default::default(),
otel: OtelConfig::default(),
};
assert_eq!(expected_gpt5_profile_config, gpt5_profile_config);

View File

@@ -13,6 +13,8 @@ use serde::Deserialize;
use serde::Serialize;
use serde::de::Error as SerdeError;
pub const DEFAULT_OTEL_ENVIRONMENT: &str = "dev";
#[derive(Serialize, Debug, Clone, PartialEq)]
pub struct McpServerConfig {
#[serde(flatten)]
@@ -219,6 +221,64 @@ pub enum HistoryPersistence {
None,
}
// ===== OTEL configuration =====
#[derive(Deserialize, Debug, Clone, PartialEq)]
#[serde(rename_all = "kebab-case")]
pub enum OtelHttpProtocol {
/// Binary payload
Binary,
/// JSON payload
Json,
}
/// Which OTEL exporter to use.
#[derive(Deserialize, Debug, Clone, PartialEq)]
#[serde(rename_all = "kebab-case")]
pub enum OtelExporterKind {
None,
OtlpHttp {
endpoint: String,
headers: HashMap<String, String>,
protocol: OtelHttpProtocol,
},
OtlpGrpc {
endpoint: String,
headers: HashMap<String, String>,
},
}
/// OTEL settings loaded from config.toml. Fields are optional so we can apply defaults.
#[derive(Deserialize, Debug, Clone, PartialEq, Default)]
pub struct OtelConfigToml {
/// Log user prompt in traces
pub log_user_prompt: Option<bool>,
/// Mark traces with environment (dev, staging, prod, test). Defaults to dev.
pub environment: Option<String>,
/// Exporter to use. Defaults to `otlp-file`.
pub exporter: Option<OtelExporterKind>,
}
/// Effective OTEL settings after defaults are applied.
#[derive(Debug, Clone, PartialEq)]
pub struct OtelConfig {
pub log_user_prompt: bool,
pub environment: String,
pub exporter: OtelExporterKind,
}
impl Default for OtelConfig {
fn default() -> Self {
OtelConfig {
log_user_prompt: false,
environment: DEFAULT_OTEL_ENVIRONMENT.to_owned(),
exporter: OtelExporterKind::None,
}
}
}
#[derive(Debug, Clone, PartialEq, Eq, Deserialize)]
#[serde(untagged)]
pub enum Notifications {

View File

@@ -103,3 +103,5 @@ pub use codex_protocol::models::LocalShellExecAction;
pub use codex_protocol::models::LocalShellStatus;
pub use codex_protocol::models::ReasoningItemContent;
pub use codex_protocol::models::ResponseItem;
pub mod otel_init;

View File

@@ -0,0 +1,61 @@
use crate::config::Config;
use crate::config_types::OtelExporterKind as Kind;
use crate::config_types::OtelHttpProtocol as Protocol;
use crate::default_client::ORIGINATOR;
use codex_otel::config::OtelExporter;
use codex_otel::config::OtelHttpProtocol;
use codex_otel::config::OtelSettings;
use codex_otel::otel_provider::OtelProvider;
use std::error::Error;
/// Build an OpenTelemetry provider from the app Config.
///
/// Returns `None` when OTEL export is disabled.
pub fn build_provider(
config: &Config,
service_version: &str,
) -> Result<Option<OtelProvider>, Box<dyn Error>> {
let exporter = match &config.otel.exporter {
Kind::None => OtelExporter::None,
Kind::OtlpHttp {
endpoint,
headers,
protocol,
} => {
let protocol = match protocol {
Protocol::Json => OtelHttpProtocol::Json,
Protocol::Binary => OtelHttpProtocol::Binary,
};
OtelExporter::OtlpHttp {
endpoint: endpoint.clone(),
headers: headers
.iter()
.map(|(k, v)| (k.clone(), v.clone()))
.collect(),
protocol,
}
}
Kind::OtlpGrpc { endpoint, headers } => OtelExporter::OtlpGrpc {
endpoint: endpoint.clone(),
headers: headers
.iter()
.map(|(k, v)| (k.clone(), v.clone()))
.collect(),
},
};
OtelProvider::from(&OtelSettings {
service_name: ORIGINATOR.value.to_owned(),
service_version: service_version.to_string(),
codex_home: config.codex_home.clone(),
environment: config.otel.environment.to_string(),
exporter,
})
}
/// Filter predicate for exporting only Codex-owned events via OTEL.
/// Keeps events that originated from codex_otel module
pub fn codex_export_filter(meta: &tracing::Metadata<'_>) -> bool {
meta.target().starts_with("codex_otel")
}

View File

@@ -15,9 +15,14 @@ use crate::protocol::SandboxPolicy;
#[derive(Debug, PartialEq)]
pub enum SafetyCheck {
AutoApprove { sandbox_type: SandboxType },
AutoApprove {
sandbox_type: SandboxType,
user_explicitly_approved: bool,
},
AskUser,
Reject { reason: String },
Reject {
reason: String,
},
}
pub fn assess_patch_safety(
@@ -54,12 +59,16 @@ pub fn assess_patch_safety(
// fall back to asking the user because the patch may touch arbitrary
// paths outside the project.
match get_platform_sandbox() {
Some(sandbox_type) => SafetyCheck::AutoApprove { sandbox_type },
Some(sandbox_type) => SafetyCheck::AutoApprove {
sandbox_type,
user_explicitly_approved: false,
},
None if sandbox_policy == &SandboxPolicy::DangerFullAccess => {
// If the user has explicitly requested DangerFullAccess, then
// we can auto-approve even without a sandbox.
SafetyCheck::AutoApprove {
sandbox_type: SandboxType::None,
user_explicitly_approved: false,
}
}
None => SafetyCheck::AskUser,
@@ -118,6 +127,7 @@ pub fn assess_command_safety(
if is_known_safe_command(command) || approved.contains(command) {
return SafetyCheck::AutoApprove {
sandbox_type: SandboxType::None,
user_explicitly_approved: false,
};
}
@@ -143,13 +153,17 @@ pub(crate) fn assess_safety_for_untrusted_command(
| (Never, DangerFullAccess)
| (OnRequest, DangerFullAccess) => SafetyCheck::AutoApprove {
sandbox_type: SandboxType::None,
user_explicitly_approved: false,
},
(OnRequest, ReadOnly) | (OnRequest, WorkspaceWrite { .. }) => {
if with_escalated_permissions {
SafetyCheck::AskUser
} else {
match get_platform_sandbox() {
Some(sandbox_type) => SafetyCheck::AutoApprove { sandbox_type },
Some(sandbox_type) => SafetyCheck::AutoApprove {
sandbox_type,
user_explicitly_approved: false,
},
// Fall back to asking since the command is untrusted and
// we do not have a sandbox available
None => SafetyCheck::AskUser,
@@ -161,7 +175,10 @@ pub(crate) fn assess_safety_for_untrusted_command(
| (OnFailure, ReadOnly)
| (OnFailure, WorkspaceWrite { .. }) => {
match get_platform_sandbox() {
Some(sandbox_type) => SafetyCheck::AutoApprove { sandbox_type },
Some(sandbox_type) => SafetyCheck::AutoApprove {
sandbox_type,
user_explicitly_approved: false,
},
None => {
if matches!(approval_policy, OnFailure) {
// Since the command is not trusted, even though the
@@ -362,7 +379,8 @@ mod tests {
assert_eq!(
safety_check,
SafetyCheck::AutoApprove {
sandbox_type: SandboxType::None
sandbox_type: SandboxType::None,
user_explicitly_approved: false,
}
);
}
@@ -409,7 +427,10 @@ mod tests {
);
let expected = match get_platform_sandbox() {
Some(sandbox_type) => SafetyCheck::AutoApprove { sandbox_type },
Some(sandbox_type) => SafetyCheck::AutoApprove {
sandbox_type,
user_explicitly_approved: false,
},
None => SafetyCheck::AskUser,
};
assert_eq!(safety_check, expected);

View File

@@ -11,6 +11,8 @@ use codex_core::ReasoningItemContent;
use codex_core::ResponseItem;
use codex_core::WireApi;
use codex_core::spawn::CODEX_SANDBOX_NETWORK_DISABLED_ENV_VAR;
use codex_otel::otel_event_manager::OtelEventManager;
use codex_protocol::mcp_protocol::AuthMode;
use codex_protocol::mcp_protocol::ConversationId;
use core_test_support::load_default_config_for_test;
use futures::StreamExt;
@@ -70,13 +72,26 @@ async fn run_request(input: Vec<ResponseItem>) -> Value {
let summary = config.model_reasoning_summary;
let config = Arc::new(config);
let conversation_id = ConversationId::new();
let otel_event_manager = OtelEventManager::new(
conversation_id,
config.model.as_str(),
config.model_family.slug.as_str(),
None,
Some(AuthMode::ChatGPT),
false,
"test".to_string(),
);
let client = ModelClient::new(
Arc::clone(&config),
None,
otel_event_manager,
provider,
effort,
summary,
ConversationId::new(),
conversation_id,
);
let mut prompt = Prompt::default();

View File

@@ -1,4 +1,5 @@
use std::sync::Arc;
use tracing_test::traced_test;
use codex_core::ContentItem;
use codex_core::ModelClient;
@@ -8,6 +9,8 @@ use codex_core::ResponseEvent;
use codex_core::ResponseItem;
use codex_core::WireApi;
use codex_core::spawn::CODEX_SANDBOX_NETWORK_DISABLED_ENV_VAR;
use codex_otel::otel_event_manager::OtelEventManager;
use codex_protocol::mcp_protocol::AuthMode;
use codex_protocol::mcp_protocol::ConversationId;
use core_test_support::load_default_config_for_test;
use futures::StreamExt;
@@ -23,11 +26,15 @@ fn network_disabled() -> bool {
}
async fn run_stream(sse_body: &str) -> Vec<ResponseEvent> {
run_stream_with_bytes(sse_body.as_bytes()).await
}
async fn run_stream_with_bytes(sse_body: &[u8]) -> Vec<ResponseEvent> {
let server = MockServer::start().await;
let template = ResponseTemplate::new(200)
.insert_header("content-type", "text/event-stream")
.set_body_raw(sse_body.to_string(), "text/event-stream");
.set_body_bytes(sse_body.to_vec());
Mock::given(method("POST"))
.and(path("/v1/chat/completions"))
@@ -63,13 +70,26 @@ async fn run_stream(sse_body: &str) -> Vec<ResponseEvent> {
let summary = config.model_reasoning_summary;
let config = Arc::new(config);
let conversation_id = ConversationId::new();
let otel_event_manager = OtelEventManager::new(
conversation_id,
config.model.as_str(),
config.model_family.slug.as_str(),
None,
Some(AuthMode::ChatGPT),
false,
"test".to_string(),
);
let client = ModelClient::new(
Arc::clone(&config),
None,
otel_event_manager,
provider,
effort,
summary,
ConversationId::new(),
conversation_id,
);
let mut prompt = Prompt::default();
@@ -89,7 +109,8 @@ async fn run_stream(sse_body: &str) -> Vec<ResponseEvent> {
while let Some(event) = stream.next().await {
match event {
Ok(ev) => events.push(ev),
Err(e) => panic!("stream event error: {e}"),
// We still collect the error to exercise telemetry and complete the task.
Err(_e) => break,
}
}
events
@@ -318,3 +339,88 @@ async fn streams_reasoning_before_tool_call() {
assert!(matches!(events[3], ResponseEvent::Completed { .. }));
}
#[tokio::test]
#[traced_test]
async fn chat_sse_emits_failed_on_parse_error() {
if network_disabled() {
println!(
"Skipping test because it cannot execute when network is disabled in a Codex sandbox."
);
return;
}
let sse_body = concat!("data: not-json\n\n", "data: [DONE]\n\n");
let _ = run_stream(sse_body).await;
logs_assert(|lines: &[&str]| {
lines
.iter()
.find(|line| {
line.contains("codex.api_request") && line.contains("http.response.status_code=200")
})
.map(|_| Ok(()))
.unwrap_or(Err("cannot find codex.api_request event".to_string()))
});
logs_assert(|lines: &[&str]| {
lines
.iter()
.find(|line| {
line.contains("codex.sse_event")
&& line.contains("error.message")
&& line.contains("expected ident at line 1 column 2")
})
.map(|_| Ok(()))
.unwrap_or(Err("cannot find SSE event".to_string()))
});
}
#[tokio::test]
#[traced_test]
async fn chat_sse_done_chunk_emits_event() {
if network_disabled() {
println!(
"Skipping test because it cannot execute when network is disabled in a Codex sandbox."
);
return;
}
let sse_body = "data: [DONE]\n\n";
let _ = run_stream(sse_body).await;
logs_assert(|lines: &[&str]| {
lines
.iter()
.find(|line| line.contains("codex.sse_event") && line.contains("event.kind=message"))
.map(|_| Ok(()))
.unwrap_or(Err("cannot find SSE event".to_string()))
});
}
#[tokio::test]
#[traced_test]
async fn chat_sse_emits_error_on_invalid_utf8() {
if network_disabled() {
println!(
"Skipping test because it cannot execute when network is disabled in a Codex sandbox."
);
return;
}
let _ = run_stream_with_bytes(b"data: \x80\x80\n\n").await;
logs_assert(|lines: &[&str]| {
lines
.iter()
.find(|line| {
line.contains("codex.sse_event")
&& line.contains("error.message")
&& line.contains("UTF8 error: invalid utf-8 sequence of 1 bytes from index 0")
})
.map(|_| Ok(()))
.unwrap_or(Err("cannot find SSE event".to_string()))
});
}

View File

@@ -75,6 +75,33 @@ pub fn ev_function_call(call_id: &str, name: &str, arguments: &str) -> Value {
})
}
pub fn ev_custom_tool_call(call_id: &str, name: &str, input: &str) -> Value {
serde_json::json!({
"type": "response.output_item.done",
"item": {
"type": "custom_tool_call",
"call_id": call_id,
"name": name,
"input": input
}
})
}
pub fn ev_local_shell_call(call_id: &str, status: &str, command: Vec<&str>) -> Value {
serde_json::json!({
"type": "response.output_item.done",
"item": {
"type": "local_shell_call",
"call_id": call_id,
"status": status,
"action": {
"type": "exec",
"command": command,
}
}
})
}
/// Convenience: SSE event for an `apply_patch` custom tool call with raw patch
/// text. This mirrors the payload produced by the Responses API when the model
/// invokes `apply_patch` directly (before we convert it to a function call).
@@ -114,7 +141,7 @@ pub fn sse_response(body: String) -> ResponseTemplate {
.set_body_raw(body, "text/event-stream")
}
pub async fn mount_sse_once<M>(server: &MockServer, matcher: M, body: String)
pub async fn mount_sse_once_match<M>(server: &MockServer, matcher: M, body: String)
where
M: wiremock::Match + Send + Sync + 'static,
{
@@ -127,6 +154,23 @@ where
.await;
}
pub async fn mount_sse_once(server: &MockServer, body: String) {
Mock::given(method("POST"))
.and(path("/v1/responses"))
.respond_with(sse_response(body))
.expect(1)
.mount(server)
.await;
}
pub async fn mount_sse(server: &MockServer, body: String) {
Mock::given(method("POST"))
.and(path("/v1/responses"))
.respond_with(sse_response(body))
.mount(server)
.await;
}
pub async fn start_mock_server() -> MockServer {
MockServer::builder()
.body_print_limit(BodyPrintLimit::Limited(80_000))

View File

@@ -4,7 +4,7 @@ use codex_core::protocol::EventMsg;
use codex_core::protocol::InputItem;
use codex_core::protocol::Op;
use core_test_support::responses::ev_function_call;
use core_test_support::responses::mount_sse_once;
use core_test_support::responses::mount_sse_once_match;
use core_test_support::responses::sse;
use core_test_support::responses::start_mock_server;
use core_test_support::test_codex::test_codex;
@@ -30,7 +30,7 @@ async fn interrupt_long_running_tool_emits_turn_aborted() {
let body = sse(vec![ev_function_call("call_sleep", "shell", &args)]);
let server = start_mock_server().await;
mount_sse_once(&server, body_string_contains("start sleep"), body).await;
mount_sse_once_match(&server, body_string_contains("start sleep"), body).await;
let codex = test_codex().build(&server).await.unwrap().codex;

View File

@@ -16,6 +16,8 @@ use codex_core::built_in_model_providers;
use codex_core::protocol::EventMsg;
use codex_core::protocol::InputItem;
use codex_core::protocol::Op;
use codex_otel::otel_event_manager::OtelEventManager;
use codex_protocol::mcp_protocol::AuthMode;
use codex_protocol::mcp_protocol::ConversationId;
use codex_protocol::models::ReasoningItemReasoningSummary;
use codex_protocol::models::WebSearchAction;
@@ -664,13 +666,26 @@ async fn azure_responses_request_includes_store_and_reasoning_ids() {
let summary = config.model_reasoning_summary;
let config = Arc::new(config);
let conversation_id = ConversationId::new();
let otel_event_manager = OtelEventManager::new(
conversation_id,
config.model.as_str(),
config.model_family.slug.as_str(),
None,
Some(AuthMode::ChatGPT),
false,
"test".to_string(),
);
let client = ModelClient::new(
Arc::clone(&config),
None,
otel_event_manager,
provider,
effort,
summary,
ConversationId::new(),
conversation_id,
);
let mut prompt = Prompt::default();

View File

@@ -25,7 +25,7 @@ use core_test_support::responses::ev_assistant_message;
use core_test_support::responses::ev_completed;
use core_test_support::responses::ev_completed_with_tokens;
use core_test_support::responses::ev_function_call;
use core_test_support::responses::mount_sse_once;
use core_test_support::responses::mount_sse_once_match;
use core_test_support::responses::sse;
use core_test_support::responses::sse_response;
use core_test_support::responses::start_mock_server;
@@ -79,19 +79,19 @@ async fn summarize_context_three_requests_and_instructions() {
body.contains("\"text\":\"hello world\"")
&& !body.contains("You have exceeded the maximum number of tokens")
};
mount_sse_once(&server, first_matcher, sse1).await;
mount_sse_once_match(&server, first_matcher, sse1).await;
let second_matcher = |req: &wiremock::Request| {
let body = std::str::from_utf8(&req.body).unwrap_or("");
body.contains("You have exceeded the maximum number of tokens")
};
mount_sse_once(&server, second_matcher, sse2).await;
mount_sse_once_match(&server, second_matcher, sse2).await;
let third_matcher = |req: &wiremock::Request| {
let body = std::str::from_utf8(&req.body).unwrap_or("");
body.contains(&format!("\"text\":\"{THIRD_USER_MSG}\""))
};
mount_sse_once(&server, third_matcher, sse3).await;
mount_sse_once_match(&server, third_matcher, sse3).await;
// Build config pointing to the mock server and spawn Codex.
let model_provider = ModelProviderInfo {

View File

@@ -25,7 +25,7 @@ use codex_core::spawn::CODEX_SANDBOX_NETWORK_DISABLED_ENV_VAR;
use core_test_support::load_default_config_for_test;
use core_test_support::responses::ev_assistant_message;
use core_test_support::responses::ev_completed;
use core_test_support::responses::mount_sse_once;
use core_test_support::responses::mount_sse_once_match;
use core_test_support::responses::sse;
use core_test_support::wait_for_event;
use pretty_assertions::assert_eq;
@@ -702,13 +702,13 @@ async fn mount_initial_flow(server: &MockServer) {
&& !body.contains("\"text\":\"AFTER_RESUME\"")
&& !body.contains("\"text\":\"AFTER_FORK\"")
};
mount_sse_once(server, match_first, sse1).await;
mount_sse_once_match(server, match_first, sse1).await;
let match_compact = |req: &wiremock::Request| {
let body = std::str::from_utf8(&req.body).unwrap_or("");
body.contains("You have exceeded the maximum number of tokens")
};
mount_sse_once(server, match_compact, sse2).await;
mount_sse_once_match(server, match_compact, sse2).await;
let match_after_compact = |req: &wiremock::Request| {
let body = std::str::from_utf8(&req.body).unwrap_or("");
@@ -716,19 +716,19 @@ async fn mount_initial_flow(server: &MockServer) {
&& !body.contains("\"text\":\"AFTER_RESUME\"")
&& !body.contains("\"text\":\"AFTER_FORK\"")
};
mount_sse_once(server, match_after_compact, sse3).await;
mount_sse_once_match(server, match_after_compact, sse3).await;
let match_after_resume = |req: &wiremock::Request| {
let body = std::str::from_utf8(&req.body).unwrap_or("");
body.contains("\"text\":\"AFTER_RESUME\"")
};
mount_sse_once(server, match_after_resume, sse4).await;
mount_sse_once_match(server, match_after_resume, sse4).await;
let match_after_fork = |req: &wiremock::Request| {
let body = std::str::from_utf8(&req.body).unwrap_or("");
body.contains("\"text\":\"AFTER_FORK\"")
};
mount_sse_once(server, match_after_fork, sse5).await;
mount_sse_once_match(server, match_after_fork, sse5).await;
}
async fn mount_second_compact_flow(server: &MockServer) {
@@ -743,13 +743,13 @@ async fn mount_second_compact_flow(server: &MockServer) {
body.contains("You have exceeded the maximum number of tokens")
&& body.contains("AFTER_FORK")
};
mount_sse_once(server, match_second_compact, sse6).await;
mount_sse_once_match(server, match_second_compact, sse6).await;
let match_after_second_resume = |req: &wiremock::Request| {
let body = std::str::from_utf8(&req.body).unwrap_or("");
body.contains(&format!("\"text\":\"{AFTER_SECOND_RESUME}\""))
};
mount_sse_once(server, match_after_second_resume, sse7).await;
mount_sse_once_match(server, match_after_second_resume, sse7).await;
}
async fn start_test_conversation(

View File

@@ -67,7 +67,7 @@ async fn codex_returns_json_result(model: String) -> anyhow::Result<()> {
&& format.get("strict") == Some(&serde_json::Value::Bool(true))
&& format.get("schema") == Some(&expected_schema)
};
responses::mount_sse_once(&server, match_json_text_param, sse1).await;
responses::mount_sse_once_match(&server, match_json_text_param, sse1).await;
let TestCodex { codex, cwd, .. } = test_codex().build(&server).await?;

View File

@@ -12,6 +12,7 @@ mod fork_conversation;
mod json_result;
mod live_cli;
mod model_overrides;
mod otel;
mod prompt_caching;
mod review;
mod rmcp_client;

File diff suppressed because it is too large Load Diff

View File

@@ -12,7 +12,7 @@ use codex_core::protocol::Op;
use codex_core::protocol::SandboxPolicy;
use codex_protocol::config_types::ReasoningSummary;
use core_test_support::responses;
use core_test_support::responses::mount_sse_once;
use core_test_support::responses::mount_sse_once_match;
use core_test_support::skip_if_no_network;
use core_test_support::test_codex::test_codex;
use core_test_support::wait_for_event;
@@ -36,7 +36,7 @@ async fn stdio_server_round_trip() -> anyhow::Result<()> {
let server_name = "rmcp";
let tool_name = format!("{server_name}__echo");
mount_sse_once(
mount_sse_once_match(
&server,
any(),
responses::sse(vec![
@@ -49,7 +49,7 @@ async fn stdio_server_round_trip() -> anyhow::Result<()> {
]),
)
.await;
mount_sse_once(
mount_sse_once_match(
&server,
any(),
responses::sse(vec![
@@ -173,7 +173,7 @@ async fn streamable_http_tool_call_round_trip() -> anyhow::Result<()> {
let server_name = "rmcp_http";
let tool_name = format!("{server_name}__echo");
mount_sse_once(
mount_sse_once_match(
&server,
any(),
responses::sse(vec![
@@ -186,7 +186,7 @@ async fn streamable_http_tool_call_round_trip() -> anyhow::Result<()> {
]),
)
.await;
mount_sse_once(
mount_sse_once_match(
&server,
any(),
responses::sse(vec![

View File

@@ -28,7 +28,7 @@ async fn summarize_context_three_requests_and_instructions() -> anyhow::Result<(
let sse1 = sse(vec![ev_assistant_message("m1", "Done"), ev_completed("r1")]);
responses::mount_sse_once(&server, any(), sse1).await;
responses::mount_sse_once_match(&server, any(), sse1).await;
let notify_dir = TempDir::new()?;
// write a script to the notify that touches a file next to it