fix: Handle finish_reason 'length' to prevent hang when hitting max_tokens

When the response hits the max_tokens limit, the API returns finish_reason="length". Previously, this fell into the catch-all case which didn't emit pending items, causing llmx to hang with "working" status.

Now:
- Handle "length" the same as "stop" - emit assistant_item and reasoning_item
- Also made catch-all case defensive: emit pending items for any unknown finish_reason

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
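The dispatch described in this message can be sketched in isolation. This is a minimal, synchronous illustration and not the code in `chat_completions.rs`: `ResponseEvent` is reduced to two variants, items are plain strings, `std::sync::mpsc` stands in for the async channel, and the `flush_pending`, `on_finish_reason` and `run_turn` helpers are hypothetical names introduced only for this example. The tool-call branch of the real match is omitted.

```rust
use std::sync::mpsc::{channel, Sender};

// Hypothetical, simplified stand-in for the real llmx event type.
#[derive(Debug, PartialEq)]
enum ResponseEvent {
    OutputItemDone(String),
    Completed,
}

/// Send any buffered assistant/reasoning items, leaving both slots empty.
/// (Hypothetical helper; the actual commit repeats these `take()` calls inline.)
fn flush_pending(
    tx: &Sender<ResponseEvent>,
    assistant_item: &mut Option<String>,
    reasoning_item: &mut Option<String>,
) {
    if let Some(item) = assistant_item.take() {
        let _ = tx.send(ResponseEvent::OutputItemDone(item));
    }
    if let Some(item) = reasoning_item.take() {
        let _ = tx.send(ResponseEvent::OutputItemDone(item));
    }
}

/// Handle a finish_reason: flush pending items, then always emit Completed.
fn on_finish_reason(
    finish_reason: &str,
    tx: &Sender<ResponseEvent>,
    assistant_item: &mut Option<String>,
    reasoning_item: &mut Option<String>,
) {
    match finish_reason {
        // "length" (max_tokens reached) is treated exactly like "stop".
        "stop" | "length" => flush_pending(tx, assistant_item, reasoning_item),
        // Unknown reasons still flush, so the consumer is never left waiting.
        other => {
            eprintln!("unknown finish_reason: {other}, emitting pending items");
            flush_pending(tx, assistant_item, reasoning_item);
        }
    }
    let _ = tx.send(ResponseEvent::Completed);
}

/// Run one simulated turn and collect every event that was emitted.
fn run_turn(finish_reason: &str) -> Vec<ResponseEvent> {
    let (tx, rx) = channel();
    let mut assistant = Some("partial answer".to_string());
    let mut reasoning = Some("partial reasoning".to_string());
    on_finish_reason(finish_reason, &tx, &mut assistant, &mut reasoning);
    drop(tx); // close the channel so the iterator below terminates
    rx.iter().collect()
}

fn main() {
    for reason in ["stop", "length", "content_filter"] {
        println!("{reason}: {:?}", run_turn(reason));
    }
}
```

In this sketch every reason produces the same sequence: both pending items followed by `Completed`. Factoring the two `take()` calls into one helper is a choice made for the example; the commit itself keeps them inline in each branch.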
fix: Emit pending items before Completed when stream closes gracefully
2025-11-17 18:51:48 +01:00 · 2025-11-17 18:31:24 +01:00
1 changed file with 21 additions and 5 deletions
--- a/llmx-rs/core/src/chat_completions.rs
+++ b/llmx-rs/core/src/chat_completions.rs
@@ -645,7 +645,14 @@ async fn process_chat_sse<S>(
                return;
            }
            Ok(None) => {
-                // Stream closed gracefully – emit Completed with dummy id.
+                // Stream closed gracefully – emit any pending items first, then Completed
+                debug!("Stream closed gracefully (Ok(None)), emitting pending items");
+                if let Some(item) = assistant_item.take() {
+                    let _ = tx_event.send(Ok(ResponseEvent::OutputItemDone(item))).await;
+                }
+                if let Some(item) = reasoning_item.take() {
+                    let _ = tx_event.send(Ok(ResponseEvent::OutputItemDone(item))).await;
+                }
                let _ = tx_event
                    .send(Ok(ResponseEvent::Completed {
                        response_id: String::new(),
@@ -860,9 +867,9 @@ async fn process_chat_sse<S>(

                        let _ = tx_event.send(Ok(ResponseEvent::OutputItemDone(item))).await;
                    }
-                    "stop" => {
-                        // Regular turn without tool-call. Emit the final assistant message
-                        // as a single OutputItemDone so non-delta consumers see the result.
+                    "stop" | "length" => {
+                        // Regular turn without tool-call, or hit max_tokens limit.
+                        // Emit the final assistant message as a single OutputItemDone so non-delta consumers see the result.
                        if let Some(item) = assistant_item.take() {
                            let _ = tx_event.send(Ok(ResponseEvent::OutputItemDone(item))).await;
                        }
@@ -871,7 +878,16 @@ async fn process_chat_sse<S>(
                            let _ = tx_event.send(Ok(ResponseEvent::OutputItemDone(item))).await;
                        }
                    }
-                    _ => {}
+                    _ => {
+                        // Unknown finish_reason - still emit pending items to avoid hanging
+                        debug!("Unknown finish_reason: {}, emitting pending items", finish_reason);
+                        if let Some(item) = assistant_item.take() {
+                            let _ = tx_event.send(Ok(ResponseEvent::OutputItemDone(item))).await;
+                        }
+                        if let Some(item) = reasoning_item.take() {
+                            let _ = tx_event.send(Ok(ResponseEvent::OutputItemDone(item))).await;
+                        }
+                    }
                }

                // Emit Completed regardless of reason so the agent can advance.
Author	SHA1	Message	Date
Sebastian Krüger	0841ba05a8	fix: Handle finish_reason 'length' to prevent hang when hitting max_tokens	2025-11-17 18:51:48 +01:00
Sebastian Krüger	44dc7a3bed	fix: Emit pending items before Completed when stream closes gracefully - When SSE stream closes with Ok(None), now emits pending assistant_item and reasoning_item BEFORE sending Completed event - Previously would send Completed immediately without emitting accumulated messages, causing UI to hang with "working" state - This fixes the hang when API returns 200 OK but SSE stream has no events - Added debug logging for graceful stream closure 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-17 18:31:24 +01:00
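
The ordering fixed by the second commit (pending items first, then `Completed`, when the stream ends without a finish_reason) can be illustrated with a small synchronous model. This is a sketch under stated assumptions: a plain iterator of text deltas stands in for the SSE stream, its end stands in for `Ok(None)`, the event type is a simplified stand-in, and `process_until_closed` is a hypothetical name that does not appear in the commit.

```rust
// Hypothetical, simplified stand-in for the real llmx event type.
#[derive(Debug, PartialEq)]
enum ResponseEvent {
    OutputItemDone(String),
    Completed { response_id: String },
}

/// Accumulate text deltas until the stream closes, then emit the buffered
/// item (if any) before Completed. The iterator ending models Ok(None).
fn process_until_closed<I>(mut stream: I) -> Vec<ResponseEvent>
where
    I: Iterator<Item = String>,
{
    let mut events = Vec::new();
    let mut assistant_item: Option<String> = None;

    loop {
        match stream.next() {
            Some(delta) => {
                // Buffer the delta; nothing is emitted yet.
                assistant_item
                    .get_or_insert_with(String::new)
                    .push_str(&delta);
            }
            None => {
                // Stream closed gracefully: flush first, then Completed
                // with an empty (dummy) response id.
                if let Some(item) = assistant_item.take() {
                    events.push(ResponseEvent::OutputItemDone(item));
                }
                events.push(ResponseEvent::Completed {
                    response_id: String::new(),
                });
                return events;
            }
        }
    }
}

fn main() {
    let deltas = vec!["Hel".to_string(), "lo".to_string()];
    println!("{:?}", process_until_closed(deltas.into_iter()));
    println!("{:?}", process_until_closed(std::iter::empty()));
}
```

With buffered text the model emits the accumulated item and then `Completed`; with no deltas at all it emits only `Completed`, since there is nothing pending to flush.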