chore: Bump version to 0.1.8

This release includes two critical fixes:

1. fix: Accept '*** Create File:' as alias for '*** Add File:' in patch parser
   - Claude sometimes uses 'Create File' syntax instead of 'Add File'
   - Parser now accepts both markers to prevent validation failures
   - Updated error message to include both valid syntaxes

2. fix: Increase default max_tokens from 8192 to 20480
   - Claude Sonnet 4.5 was getting cut off mid-task
   - New default is 5 * 4096 = 20480 tokens
   - Claude Sonnet 4.5 supports up to 64K tokens
   - Gives Claude enough space to complete comprehensive tasks

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
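As a rough illustration of the first fix, the alias handling could look like the sketch below. The two marker strings are taken from the commit message; the function name `parse_add_file_header` is hypothetical and only shows the idea of trying the `Add File` prefix first and falling back to `Create File`.

```rust
// Marker strings as named in the commit message.
const ADD_FILE_MARKER: &str = "*** Add File: ";
const CREATE_FILE_MARKER: &str = "*** Create File: "; // alias for Add File

/// Hypothetical helper: returns the path when `line` is an "Add File"
/// header, also accepting "Create File" as an alias.
fn parse_add_file_header(line: &str) -> Option<&str> {
    // Tolerate padding around the header line.
    let line = line.trim();
    line.strip_prefix(ADD_FILE_MARKER)
        .or_else(|| line.strip_prefix(CREATE_FILE_MARKER))
}
```

With this shape, both spellings yield the same path, and any other header falls through to `None` so the caller can report the list of valid headers.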
debug: Add extensive logging for finish_reason handling
2025-11-17 20:50:54 +01:00 · 2025-11-17 19:27:07 +01:00 · 2025-11-17 18:51:48 +01:00 · 2025-11-17 18:31:24 +01:00 · 2025-11-17 18:23:12 +01:00 · 2025-11-17 18:15:24 +01:00
28 changed files with 1021 additions and 744 deletions
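For the `max_tokens` fix described in the commit message, a minimal sketch of the fallback follows, assuming the provider configuration exposes an optional limit. The names `DEFAULT_MAX_TOKENS` and `effective_max_tokens` are hypothetical and used only for illustration.

```rust
/// Default stated in the commit message: 5 * 4096 = 20480 tokens.
const DEFAULT_MAX_TOKENS: i64 = 5 * 4096;

/// Hypothetical helper: prefer the provider's configured limit and
/// fall back to the default when none is set.
fn effective_max_tokens(configured: Option<i64>) -> i64 {
    configured.unwrap_or(DEFAULT_MAX_TOKENS)
}
```

A provider that sets its own limit keeps it unchanged; only providers without a value pick up the new default.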
--- a/.github/workflows/rust-release.yml
+++ b/.github/workflows/rust-release.yml
@@ -476,7 +476,7 @@ jobs:
          tag: ${{ github.ref_name }}
          config: .github/dotslash-config.json

-  # Publish to npm using authentication token
+  # Publish to npm using Trusted Publishers (OIDC)
  publish-npm:
    # Publish to npm for stable releases and alpha pre-releases with numeric suffixes.
    if: ${{ needs.release.outputs.should_publish_npm == 'true' }}
@@ -485,6 +485,7 @@ jobs:
    runs-on: ubuntu-latest
    permissions:
      contents: read
+      id-token: write  # Required for OIDC authentication

    steps:
      - name: Setup Node.js
@@ -492,6 +493,10 @@ jobs:
        with:
          node-version: 22
          registry-url: "https://registry.npmjs.org"
+          scope: "@valknarthing"
+
+      - name: Update npm
+        run: npm install -g npm@latest

      - name: Download npm tarballs from release
        env:
@@ -511,10 +516,6 @@ jobs:
          VERSION: ${{ needs.release.outputs.version }}
          NPM_TAG: ${{ needs.release.outputs.npm_tag }}
        run: |
-          # Write auth token to the .npmrc file that setup-node created
-          echo "//registry.npmjs.org/:_authToken=${{ secrets.NPM_TOKEN }}" >> ${NPM_CONFIG_USERCONFIG}
-
-
          set -euo pipefail
          tag_args=()
          if [[ -n "${NPM_TAG}" ]]; then
@@ -526,24 +527,24 @@ jobs:
          )

          for tarball in "${tarballs[@]}"; do
-            npm publish "${GITHUB_WORKSPACE}/dist/npm/${tarball}" --access public "${tag_args[@]}"
+            npm publish "${GITHUB_WORKSPACE}/dist/npm/${tarball}" --provenance --access public "${tag_args[@]}"
          done

-  update-branch:
-    name: Update latest-alpha-cli branch
-    permissions:
-      contents: write
-    needs: release
-    runs-on: ubuntu-latest
-
-    steps:
-      - name: Update latest-alpha-cli branch
-        env:
-          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
-        run: |
-          set -euo pipefail
-          gh api \
-            repos/${GITHUB_REPOSITORY}/git/refs/heads/latest-alpha-cli \
-            -X PATCH \
-            -f sha="${GITHUB_SHA}" \
-            -F force=true
+  # update-branch:
+  #   name: Update latest-alpha-cli branch
+  #   permissions:
+  #     contents: write
+  #   needs: release
+  #   runs-on: ubuntu-latest
+  #
+  #   steps:
+  #     - name: Update latest-alpha-cli branch
+  #       env:
+  #         GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+  #       run: |
+  #         set -euo pipefail
+  #         gh api \
+  #           repos/${GITHUB_REPOSITORY}/git/refs/heads/latest-alpha-cli \
+  #           -X PATCH \
+  #           -f sha="${GITHUB_SHA}" \
+  #           -F force=true
--- a/README.md
+++ b/README.md
@@ -47,56 +47,56 @@ LLMX is powered by [LiteLLM](https://docs.litellm.ai/), which provides access to

 ```bash
 # Set your LiteLLM server URL (default: http://localhost:4000/v1)
-export LITELLM_BASE_URL="http://localhost:4000/v1"
-export LITELLM_API_KEY="your-api-key"
+export LLMX_BASE_URL="http://localhost:4000/v1"
+export LLMX_API_KEY="your-api-key"

 # Run LLMX
 llmx "hello world"
 ```

-**Configuration:** See [LITELLM-SETUP.md](./LITELLM-SETUP.md) for detailed setup instructions.
+**Configuration:** See [LITELLM-SETUP.md](https://github.com/valknarthing/llmx/blob/main/LITELLM-SETUP.md) for detailed setup instructions.

-You can also use LLMX with ChatGPT or OpenAI API keys. For authentication options, see the [authentication docs](./docs/authentication.md).
+You can also use LLMX with ChatGPT or OpenAI API keys. For authentication options, see the [authentication docs](https://github.com/valknarthing/llmx/blob/main/docs/authentication.md).

 ### Model Context Protocol (MCP)

-LLMX can access MCP servers. To configure them, refer to the [config docs](./docs/config.md#mcp_servers).
+LLMX can access MCP servers. To configure them, refer to the [config docs](https://github.com/valknarthing/llmx/blob/main/docs/config.md#mcp_servers).

 ### Configuration

-LLMX CLI supports a rich set of configuration options, with preferences stored in `~/.llmx/config.toml`. For full configuration options, see [Configuration](./docs/config.md).
+LLMX CLI supports a rich set of configuration options, with preferences stored in `~/.llmx/config.toml`. For full configuration options, see [Configuration](https://github.com/valknarthing/llmx/blob/main/docs/config.md).

 ---

 ### Docs & FAQ

- [**Getting started**](./docs/getting-started.md)
-  - [CLI usage](./docs/getting-started.md#cli-usage)
-  - [Slash Commands](./docs/slash_commands.md)
-  - [Running with a prompt as input](./docs/getting-started.md#running-with-a-prompt-as-input)
-  - [Example prompts](./docs/getting-started.md#example-prompts)
-  - [Custom prompts](./docs/prompts.md)
-  - [Memory with AGENTS.md](./docs/getting-started.md#memory-with-agentsmd)
- [**Configuration**](./docs/config.md)
-  - [Example config](./docs/example-config.md)
- [**Sandbox & approvals**](./docs/sandbox.md)
- [**Authentication**](./docs/authentication.md)
-  - [Auth methods](./docs/authentication.md#forcing-a-specific-auth-method-advanced)
-  - [Login on a "Headless" machine](./docs/authentication.md#connecting-on-a-headless-machine)
+- [**Getting started**](https://github.com/valknarthing/llmx/blob/main/docs/getting-started.md)
+  - [CLI usage](https://github.com/valknarthing/llmx/blob/main/docs/getting-started.md#cli-usage)
+  - [Slash Commands](https://github.com/valknarthing/llmx/blob/main/docs/slash_commands.md)
+  - [Running with a prompt as input](https://github.com/valknarthing/llmx/blob/main/docs/getting-started.md#running-with-a-prompt-as-input)
+  - [Example prompts](https://github.com/valknarthing/llmx/blob/main/docs/getting-started.md#example-prompts)
+  - [Custom prompts](https://github.com/valknarthing/llmx/blob/main/docs/prompts.md)
+  - [Memory with AGENTS.md](https://github.com/valknarthing/llmx/blob/main/docs/getting-started.md#memory-with-agentsmd)
+- [**Configuration**](https://github.com/valknarthing/llmx/blob/main/docs/config.md)
+  - [Example config](https://github.com/valknarthing/llmx/blob/main/docs/example-config.md)
+- [**Sandbox & approvals**](https://github.com/valknarthing/llmx/blob/main/docs/sandbox.md)
+- [**Authentication**](https://github.com/valknarthing/llmx/blob/main/docs/authentication.md)
+  - [Auth methods](https://github.com/valknarthing/llmx/blob/main/docs/authentication.md#forcing-a-specific-auth-method-advanced)
+  - [Login on a "Headless" machine](https://github.com/valknarthing/llmx/blob/main/docs/authentication.md#connecting-on-a-headless-machine)
 - **Automating LLMX**
  - [GitHub Action](https://github.com/valknarthing/llmx-action)
-  - [TypeScript SDK](./sdk/typescript/README.md)
-  - [Non-interactive mode (`llmx exec`)](./docs/exec.md)
- [**Advanced**](./docs/advanced.md)
-  - [Tracing / verbose logging](./docs/advanced.md#tracing--verbose-logging)
-  - [Model Context Protocol (MCP)](./docs/advanced.md#model-context-protocol-mcp)
- [**Zero data retention (ZDR)**](./docs/zdr.md)
- [**Contributing**](./docs/contributing.md)
- [**Install & build**](./docs/install.md)
-  - [System Requirements](./docs/install.md#system-requirements)
-  - [DotSlash](./docs/install.md#dotslash)
-  - [Build from source](./docs/install.md#build-from-source)
- [**FAQ**](./docs/faq.md)
+  - [TypeScript SDK](https://github.com/valknarthing/llmx/blob/main/sdk/typescript/README.md)
+  - [Non-interactive mode (`llmx exec`)](https://github.com/valknarthing/llmx/blob/main/docs/exec.md)
+- [**Advanced**](https://github.com/valknarthing/llmx/blob/main/docs/advanced.md)
+  - [Tracing / verbose logging](https://github.com/valknarthing/llmx/blob/main/docs/advanced.md#tracing--verbose-logging)
+  - [Model Context Protocol (MCP)](https://github.com/valknarthing/llmx/blob/main/docs/advanced.md#model-context-protocol-mcp)
+- [**Zero data retention (ZDR)**](https://github.com/valknarthing/llmx/blob/main/docs/zdr.md)
+- [**Contributing**](https://github.com/valknarthing/llmx/blob/main/docs/contributing.md)
+- [**Install & build**](https://github.com/valknarthing/llmx/blob/main/docs/install.md)
+  - [System Requirements](https://github.com/valknarthing/llmx/blob/main/docs/install.md#system-requirements)
+  - [DotSlash](https://github.com/valknarthing/llmx/blob/main/docs/install.md#dotslash)
+  - [Build from source](https://github.com/valknarthing/llmx/blob/main/docs/install.md#build-from-source)
+- [**FAQ**](https://github.com/valknarthing/llmx/blob/main/docs/faq.md)

 ---

--- a/llmx-cli/package.json
+++ b/llmx-cli/package.json
@@ -1,6 +1,6 @@
 {
  "name": "@valknarthing/llmx",
-  "version": "0.1.1",
+  "version": "0.1.2",
  "license": "Apache-2.0",
  "description": "LLMX CLI - Multi-provider coding agent powered by LiteLLM",
  "bin": {
--- a/llmx-rs/Cargo.lock
+++ b/llmx-rs/Cargo.lock
--- a/llmx-rs/Cargo.toml
+++ b/llmx-rs/Cargo.toml
@@ -43,7 +43,7 @@ members = [
 resolver = "2"

 [workspace.package]
-version = "0.1.1"
+version = "0.1.8"
 # Track the edition for all workspace crates in one place. Individual
 # crates can still override this value, but keeping it here means new
 # crates created with `cargo new -w ...` automatically inherit the 2024
--- a/llmx-rs/app-server/tests/common/mcp_process.rs
+++ b/llmx-rs/app-server/tests/common/mcp_process.rs
@@ -138,7 +138,7 @@ impl McpProcess {
            client_info: ClientInfo {
                name: "llmx-app-server-tests".to_string(),
                title: None,
-                version: "0.1.1".to_string(),
+                version: "0.1.7".to_string(),
            },
        })?);
        let req_id = self.send_request("initialize", params).await?;
--- a/llmx-rs/app-server/tests/suite/user_agent.rs
+++ b/llmx-rs/app-server/tests/suite/user_agent.rs
@@ -26,7 +26,7 @@ async fn get_user_agent_returns_current_llmx_user_agent() -> Result<()> {

    let os_info = os_info::get();
    let user_agent = format!(
-        "llmx_cli_rs/0.1.1 ({} {}; {}) {} (llmx-app-server-tests; 0.1.1)",
+        "llmx_cli_rs/0.1.7 ({} {}; {}) {} (llmx-app-server-tests; 0.1.7)",
        os_info.os_type(),
        os_info.version(),
        os_info.architecture().unwrap_or("unknown"),
--- a/llmx-rs/apply-patch/src/parser.rs
+++ b/llmx-rs/apply-patch/src/parser.rs
@@ -31,6 +31,7 @@ use thiserror::Error;
 const BEGIN_PATCH_MARKER: &str = "*** Begin Patch";
 const END_PATCH_MARKER: &str = "*** End Patch";
 const ADD_FILE_MARKER: &str = "*** Add File: ";
+const CREATE_FILE_MARKER: &str = "*** Create File: "; // Alias for Add File
 const DELETE_FILE_MARKER: &str = "*** Delete File: ";
 const UPDATE_FILE_MARKER: &str = "*** Update File: ";
 const MOVE_TO_MARKER: &str = "*** Move to: ";
@@ -245,8 +246,8 @@ fn check_start_and_end_lines_strict(
 fn parse_one_hunk(lines: &[&str], line_number: usize) -> Result<(Hunk, usize), ParseError> {
    // Be tolerant of case mismatches and extra padding around marker strings.
    let first_line = lines[0].trim();
-    if let Some(path) = first_line.strip_prefix(ADD_FILE_MARKER) {
-        // Add File
+    if let Some(path) = first_line.strip_prefix(ADD_FILE_MARKER).or_else(|| first_line.strip_prefix(CREATE_FILE_MARKER)) {
+        // Add File (also accepts Create File as alias)
        let mut contents = String::new();
        let mut parsed_lines = 1;
        for add_line in &lines[1..] {
@@ -331,7 +332,7 @@ fn parse_one_hunk(lines: &[&str], line_number: usize) -> Result<(Hunk, usize), P

    Err(InvalidHunkError {
        message: format!(
-            "'{first_line}' is not a valid hunk header. Valid hunk headers: '*** Add File: {{path}}', '*** Delete File: {{path}}', '*** Update File: {{path}}'"
+            "'{first_line}' is not a valid hunk header. Valid hunk headers: '*** Add File: {{path}}', '*** Create File: {{path}}', '*** Delete File: {{path}}', '*** Update File: {{path}}'"
        ),
        line_number,
    })
--- a/llmx-rs/core/src/chat_completions.rs
+++ b/llmx-rs/core/src/chat_completions.rs
@@ -56,7 +56,12 @@ pub(crate) async fn stream_chat_completions(
    let mut messages = Vec::<serde_json::Value>::new();

    let full_instructions = prompt.get_full_instructions(model_family);
-    messages.push(json!({"role": "system", "content": full_instructions}));
+    // Add cache_control to system instructions for Anthropic prompt caching
+    messages.push(json!({
+        "role": "system",
+        "content": full_instructions,
+        "cache_control": {"type": "ephemeral"}
+    }));

    let input = prompt.get_formatted_input();

@@ -161,7 +166,65 @@ pub(crate) async fn stream_chat_completions(
    // aggregated assistant message was recorded alongside an earlier partial).
    let mut last_assistant_text: Option<String> = None;

+    // Build a map of which call_ids have outputs
+    // We'll use this to ensure we never send a FunctionCall without its corresponding output
+    let mut call_ids_with_outputs: std::collections::HashSet<String> = std::collections::HashSet::new();
+
+    // First pass: collect all call_ids that have outputs
+    for item in input.iter() {
+        if let ResponseItem::FunctionCallOutput { call_id, .. } = item {
+            call_ids_with_outputs.insert(call_id.clone());
+        }
+    }
+
+    debug!("=== Chat Completions Request Debug ===");
+    debug!("Input items count: {}", input.len());
+    debug!("Call IDs with outputs: {:?}", call_ids_with_outputs);
+
+    // Second pass: find the first FunctionCall that doesn't have an output
+    let mut cutoff_at_idx: Option<usize> = None;
    for (idx, item) in input.iter().enumerate() {
+        if let ResponseItem::FunctionCall { call_id, name, .. } = item {
+            if !call_ids_with_outputs.contains(call_id) {
+                debug!("Found unanswered function call '{}' (call_id: {}) at index {}", name, call_id, idx);
+                cutoff_at_idx = Some(idx);
+                break;
+            }
+        }
+    }
+
+    if let Some(cutoff) = cutoff_at_idx {
+        debug!("Cutting off at index {} to avoid orphaned tool calls", cutoff);
+    } else {
+        debug!("No unanswered function calls found, processing all items");
+    }
+
+    // Track whether the MOST RECENT FunctionCall with each call_id was skipped
+    // This allows the same call_id to be retried - we only skip outputs for the specific skipped calls
+    let mut call_id_skip_state: std::collections::HashMap<String, bool> = std::collections::HashMap::new();
+
+    for (idx, item) in input.iter().enumerate() {
+        // Stop processing if we've reached an unanswered function call
+        if let Some(cutoff) = cutoff_at_idx {
+            if idx >= cutoff {
+                debug!("Stopping at index {} due to unanswered function call", idx);
+                break;
+            }
+        }
+
+        debug!("Processing item {} of type: {}", idx, match item {
+            ResponseItem::Message { role, .. } => format!("Message(role={})", role),
+            ResponseItem::FunctionCall { name, call_id, .. } => format!("FunctionCall(name={}, call_id={})", name, call_id),
+            ResponseItem::FunctionCallOutput { call_id, .. } => format!("FunctionCallOutput(call_id={})", call_id),
+            ResponseItem::LocalShellCall { .. } => "LocalShellCall".to_string(),
+            ResponseItem::CustomToolCall { .. } => "CustomToolCall".to_string(),
+            ResponseItem::CustomToolCallOutput { .. } => "CustomToolCallOutput".to_string(),
+            ResponseItem::Reasoning { .. } => "Reasoning".to_string(),
+            ResponseItem::WebSearchCall { .. } => "WebSearchCall".to_string(),
+            ResponseItem::GhostSnapshot { .. } => "GhostSnapshot".to_string(),
+            ResponseItem::Other => "Other".to_string(),
+        });
+
        match item {
            ResponseItem::Message { role, content, .. } => {
                // Build content either as a plain string (typical for assistant text)
@@ -175,7 +238,10 @@ pub(crate) async fn stream_chat_completions(
                        ContentItem::InputText { text: t }
                        | ContentItem::OutputText { text: t } => {
                            text.push_str(t);
-                            items.push(json!({"type":"text","text": t}));
+                            // Only add text content blocks that are non-empty
+                            if !t.trim().is_empty() {
+                                items.push(json!({"type":"text","text": t}));
+                            }
                        }
                        ContentItem::InputImage { image_url } => {
                            saw_image = true;
@@ -184,6 +250,11 @@ pub(crate) async fn stream_chat_completions(
                    }
                }

+                // Skip messages with empty or whitespace-only text content (unless they contain images)
+                if text.trim().is_empty() && !saw_image {
+                    continue;
+                }
+
                // Skip exact-duplicate assistant messages.
                if role == "assistant" {
                    if let Some(prev) = &last_assistant_text
@@ -219,6 +290,18 @@ pub(crate) async fn stream_chat_completions(
                call_id,
                ..
            } => {
+                // Validate that arguments is valid JSON before sending to API
+                // If invalid, skip this function call to avoid API errors
+                if serde_json::from_str::<serde_json::Value>(arguments).is_err() {
+                    debug!("Skipping malformed function call with invalid JSON arguments: {}", arguments);
+                    // Mark this call_id's most recent state as skipped
+                    call_id_skip_state.insert(call_id.clone(), true);
+                    continue;
+                }
+
+                // Mark this call_id's most recent state as NOT skipped (valid call)
+                call_id_skip_state.insert(call_id.clone(), false);
+
                let mut msg = json!({
                    "role": "assistant",
                    "content": null,
@@ -263,6 +346,12 @@ pub(crate) async fn stream_chat_completions(
                messages.push(msg);
            }
            ResponseItem::FunctionCallOutput { call_id, output } => {
+                // Skip outputs only if the MOST RECENT FunctionCall with this call_id was skipped
+                if call_id_skip_state.get(call_id) == Some(&true) {
+                    debug!("Skipping function call output for most recent skipped call_id: {}", call_id);
+                    continue;
+                }
+
                // Prefer structured content items when available (e.g., images)
                // otherwise fall back to the legacy plain-string content.
                let content_value = if let Some(items) = &output.content_items {
@@ -328,14 +417,39 @@ pub(crate) async fn stream_chat_completions(
        }
    }

+    debug!("Built {} messages for API request", messages.len());
+
+    // Add cache_control to conversation history for Anthropic prompt caching
+    // Add it to a message that's at least 3 messages before the end (stable history)
+    // This caches the earlier conversation while keeping recent turns uncached
+    if messages.len() > 4 {
+        let cache_idx = messages.len().saturating_sub(4);
+        if let Some(msg) = messages.get_mut(cache_idx) {
+            if let Some(obj) = msg.as_object_mut() {
+                obj.insert("cache_control".to_string(), json!({"type": "ephemeral"}));
+                debug!("Added cache_control to message at index {} (conversation history)", cache_idx);
+            }
+        }
+    }
+
+    debug!("=== End Chat Completions Request Debug ===");
+
    let tools_json = create_tools_json_for_chat_completions_api(&prompt.tools)?;
-    let payload = json!({
+    let mut payload = json!({
        "model": model_family.slug,
        "messages": messages,
        "stream": true,
        "tools": tools_json,
    });

+    // Add max_tokens - required by Anthropic Messages API
+    // Use provider config value or default to 20480 (5 * 4096, Claude Sonnet 4.5 supports up to 64K)
+    let max_tokens = provider.max_tokens.unwrap_or(20480);
+    if let Some(obj) = payload.as_object_mut() {
+        obj.insert("max_tokens".to_string(), json!(max_tokens));
+    }
+    debug!("Using max_tokens: {}", max_tokens);
+
    debug!(
        "POST to {}: {}",
        provider.get_full_url(&None),
@@ -496,7 +610,9 @@ async fn process_chat_sse<S>(
 ) where
    S: Stream<Item = Result<Bytes>> + Unpin,
 {
+    debug!("process_chat_sse started, idle_timeout={:?}", idle_timeout);
    let mut stream = stream.eventsource();
+    debug!("SSE stream initialized, waiting for first event");

    // State to accumulate a function call across streaming chunks.
    // OpenAI may split the `arguments` string over multiple `delta` events
@@ -531,7 +647,14 @@ async fn process_chat_sse<S>(
                return;
            }
            Ok(None) => {
-                // Stream closed gracefully – emit Completed with dummy id.
+                // Stream closed gracefully – emit any pending items first, then Completed
+                debug!("Stream closed gracefully (Ok(None)), emitting pending items");
+                if let Some(item) = assistant_item.take() {
+                    let _ = tx_event.send(Ok(ResponseEvent::OutputItemDone(item))).await;
+                }
+                if let Some(item) = reasoning_item.take() {
+                    let _ = tx_event.send(Ok(ResponseEvent::OutputItemDone(item))).await;
+                }
                let _ = tx_event
                    .send(Ok(ResponseEvent::Completed {
                        response_id: String::new(),
@@ -727,6 +850,7 @@ async fn process_chat_sse<S>(

            // Emit end-of-turn when finish_reason signals completion.
            if let Some(finish_reason) = choice.get("finish_reason").and_then(|v| v.as_str()) {
+                debug!("Received finish_reason: {}", finish_reason);
                match finish_reason {
                    "tool_calls" if fn_call_state.active => {
                        // First, flush the terminal raw reasoning so UIs can finalize
@@ -745,27 +869,46 @@ async fn process_chat_sse<S>(

                        let _ = tx_event.send(Ok(ResponseEvent::OutputItemDone(item))).await;
                    }
-                    "stop" => {
-                        // Regular turn without tool-call. Emit the final assistant message
-                        // as a single OutputItemDone so non-delta consumers see the result.
+                    "stop" | "length" => {
+                        // Regular turn without tool-call, or hit max_tokens limit.
+                        debug!("Processing finish_reason={}, assistant_item.is_some()={}, reasoning_item.is_some()={}",
+                            finish_reason, assistant_item.is_some(), reasoning_item.is_some());
+                        // Emit the final assistant message as a single OutputItemDone so non-delta consumers see the result.
+                        if let Some(item) = assistant_item.take() {
+                            debug!("Emitting assistant_item: {:?}", item);
+                            let _ = tx_event.send(Ok(ResponseEvent::OutputItemDone(item))).await;
+                        } else {
+                            debug!("No assistant_item to emit");
+                        }
+                        // Also emit a terminal Reasoning item so UIs can finalize raw reasoning.
+                        if let Some(item) = reasoning_item.take() {
+                            debug!("Emitting reasoning_item");
+                            let _ = tx_event.send(Ok(ResponseEvent::OutputItemDone(item))).await;
+                        } else {
+                            debug!("No reasoning_item to emit");
+                        }
+                    }
+                    _ => {
+                        // Unknown finish_reason - still emit pending items to avoid hanging
+                        debug!("Unknown finish_reason: {}, emitting pending items", finish_reason);
                        if let Some(item) = assistant_item.take() {
                            let _ = tx_event.send(Ok(ResponseEvent::OutputItemDone(item))).await;
                        }
-                        // Also emit a terminal Reasoning item so UIs can finalize raw reasoning.
                        if let Some(item) = reasoning_item.take() {
                            let _ = tx_event.send(Ok(ResponseEvent::OutputItemDone(item))).await;
                        }
                    }
-                    _ => {}
                }

                // Emit Completed regardless of reason so the agent can advance.
+                debug!("Sending Completed event after finish_reason={}", finish_reason);
                let _ = tx_event
                    .send(Ok(ResponseEvent::Completed {
                        response_id: String::new(),
                        token_usage: token_usage.clone(),
                    }))
                    .await;
+                debug!("Completed event sent, returning from SSE processor");

                // Prepare for potential next turn (should not happen in same stream).
                // fn_call_state = FunctionCallState::default();
@@ -774,6 +917,22 @@ async fn process_chat_sse<S>(
            }
        }
    }
+
+    // Stream ended without finish_reason - this can happen when the stream closes abruptly
+    debug!("Stream ended without finish_reason, emitting final items and Completed event");
+    if let Some(item) = assistant_item.take() {
+        let _ = tx_event.send(Ok(ResponseEvent::OutputItemDone(item))).await;
+    }
+    if let Some(item) = reasoning_item.take() {
+        let _ = tx_event.send(Ok(ResponseEvent::OutputItemDone(item))).await;
+    }
+    // Send Completed event so llmx knows the turn is done
+    let _ = tx_event
+        .send(Ok(ResponseEvent::Completed {
+            response_id: String::new(),
+            token_usage: token_usage.clone(),
+        }))
+        .await;
 }

 /// Optional client-side aggregation helper
--- a/llmx-rs/core/src/client.rs
+++ b/llmx-rs/core/src/client.rs
@@ -1123,6 +1123,7 @@ mod tests {
            request_max_retries: Some(0),
            stream_max_retries: Some(0),
            stream_idle_timeout_ms: Some(1000),
+            max_tokens: None,
            requires_openai_auth: false,
        };

@@ -1187,6 +1188,7 @@ mod tests {
            request_max_retries: Some(0),
            stream_max_retries: Some(0),
            stream_idle_timeout_ms: Some(1000),
+            max_tokens: None,
            requires_openai_auth: false,
        };

@@ -1224,6 +1226,7 @@ mod tests {
            request_max_retries: Some(0),
            stream_max_retries: Some(0),
            stream_idle_timeout_ms: Some(1000),
+            max_tokens: None,
            requires_openai_auth: false,
        };

@@ -1263,6 +1266,7 @@ mod tests {
            request_max_retries: Some(0),
            stream_max_retries: Some(0),
            stream_idle_timeout_ms: Some(1000),
+            max_tokens: None,
            requires_openai_auth: false,
        };

@@ -1298,6 +1302,7 @@ mod tests {
            request_max_retries: Some(0),
            stream_max_retries: Some(0),
            stream_idle_timeout_ms: Some(1000),
+            max_tokens: None,
            requires_openai_auth: false,
        };

@@ -1333,6 +1338,7 @@ mod tests {
            request_max_retries: Some(0),
            stream_max_retries: Some(0),
            stream_idle_timeout_ms: Some(1000),
+            max_tokens: None,
            requires_openai_auth: false,
        };

@@ -1437,6 +1443,7 @@ mod tests {
                request_max_retries: Some(0),
                stream_max_retries: Some(0),
                stream_idle_timeout_ms: Some(1000),
+                max_tokens: None,
                requires_openai_auth: false,
            };

--- a/llmx-rs/core/src/config/mod.rs
+++ b/llmx-rs/core/src/config/mod.rs
@@ -2809,6 +2809,7 @@ model_verbosity = "high"
            request_max_retries: Some(4),
            stream_max_retries: Some(10),
            stream_idle_timeout_ms: Some(300_000),
+            max_tokens: None,
            requires_openai_auth: false,
        };
        let model_provider_map = {
--- a/llmx-rs/core/src/event_mapping.rs
+++ b/llmx-rs/core/src/event_mapping.rs
@@ -54,7 +54,7 @@ fn parse_user_message(message: &[ContentItem]) -> Option<UserMessageItem> {
    Some(UserMessageItem::new(&content))
 }

-fn parse_agent_message(id: Option<&String>, message: &[ContentItem]) -> AgentMessageItem {
+fn parse_agent_message(id: Option<&String>, message: &[ContentItem]) -> Option<AgentMessageItem> {
    let mut content: Vec<AgentMessageContent> = Vec::new();
    for content_item in message.iter() {
        match content_item {
@@ -69,18 +69,23 @@ fn parse_agent_message(id: Option<&String>, message: &[ContentItem]) -> AgentMes
            }
        }
    }
+
+    // If the message has no content, return None to signal turn completion
+    // This happens when the API ends a turn with an empty assistant message (e.g., after tool calls)
+    if content.is_empty() {
+        return None;
+    }
+
    let id = id.cloned().unwrap_or_else(|| Uuid::new_v4().to_string());
-    AgentMessageItem { id, content }
+    Some(AgentMessageItem { id, content })
 }

 pub fn parse_turn_item(item: &ResponseItem) -> Option<TurnItem> {
    match item {
        ResponseItem::Message { role, content, id } => match role.as_str() {
            "user" => parse_user_message(content).map(TurnItem::UserMessage),
-            "assistant" => Some(TurnItem::AgentMessage(parse_agent_message(
-                id.as_ref(),
-                content,
-            ))),
+            "assistant" => parse_agent_message(id.as_ref(), content)
+                .map(TurnItem::AgentMessage),
            "system" => None,
            _ => None,
        },
--- a/llmx-rs/core/src/model_provider_info.rs
+++ b/llmx-rs/core/src/model_provider_info.rs
@@ -87,6 +87,10 @@ pub struct ModelProviderInfo {
    /// the connection as lost.
    pub stream_idle_timeout_ms: Option<u64>,

+    /// Maximum number of tokens to generate in the response. If not specified, defaults to 20480.
+    /// This is required by some providers (e.g., Anthropic via LiteLLM).
+    pub max_tokens: Option<i64>,
+
    /// Does this provider require an OpenAI API Key or ChatGPT login token? If true,
    /// user is presented with login screen on first run, and login preference and token/key
    /// are stored in auth.json. If false (which is the default), login screen is skipped,
@@ -290,6 +294,7 @@ pub fn built_in_model_providers() -> HashMap<String, ModelProviderInfo> {
                request_max_retries: None,
                stream_max_retries: None,
                stream_idle_timeout_ms: None,
+                max_tokens: None,
                requires_openai_auth: false,
            },
        ),
@@ -330,6 +335,7 @@ pub fn built_in_model_providers() -> HashMap<String, ModelProviderInfo> {
                request_max_retries: None,
                stream_max_retries: None,
                stream_idle_timeout_ms: None,
+                max_tokens: None,
                requires_openai_auth: true,
            },
        ),
@@ -375,6 +381,7 @@ pub fn create_oss_provider_with_base_url(base_url: &str) -> ModelProviderInfo {
        request_max_retries: None,
        stream_max_retries: None,
        stream_idle_timeout_ms: None,
+        max_tokens: None,
        requires_openai_auth: false,
    }
 }
@@ -415,6 +422,7 @@ base_url = "http://localhost:11434/v1"
            request_max_retries: None,
            stream_max_retries: None,
            stream_idle_timeout_ms: None,
+            max_tokens: None,
            requires_openai_auth: false,
        };

@@ -445,6 +453,7 @@ query_params = { api-version = "2025-04-01-preview" }
            request_max_retries: None,
            stream_max_retries: None,
            stream_idle_timeout_ms: None,
+            max_tokens: None,
            requires_openai_auth: false,
        };

@@ -478,6 +487,7 @@ env_http_headers = { "X-Example-Env-Header" = "EXAMPLE_ENV_VAR" }
            request_max_retries: None,
            stream_max_retries: None,
            stream_idle_timeout_ms: None,
+            max_tokens: None,
            requires_openai_auth: false,
        };

@@ -501,6 +511,7 @@ env_http_headers = { "X-Example-Env-Header" = "EXAMPLE_ENV_VAR" }
                request_max_retries: None,
                stream_max_retries: None,
                stream_idle_timeout_ms: None,
+                max_tokens: None,
                requires_openai_auth: false,
            }
        }
@@ -534,6 +545,7 @@ env_http_headers = { "X-Example-Env-Header" = "EXAMPLE_ENV_VAR" }
            request_max_retries: None,
            stream_max_retries: None,
            stream_idle_timeout_ms: None,
+            max_tokens: None,
            requires_openai_auth: false,
        };
        assert!(named_provider.is_azure_responses_endpoint());
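Note on the `max_tokens` field added above: every built-in provider and test fixture sets it to `None`, so the effective value comes from a fallback applied where the request is built. The sketch below shows one way that resolution could look. `DEFAULT_MAX_TOKENS`, `ProviderSettings`, and `effective_max_tokens` are hypothetical names, and the 8192 value is taken from the doc comment in this diff rather than from the request-building code, which is not shown here.

```rust
/// Assumed fallback, taken from the field's doc comment in this diff.
const DEFAULT_MAX_TOKENS: i64 = 8192;

/// Hypothetical, trimmed-down stand-in for `ModelProviderInfo`.
struct ProviderSettings {
    max_tokens: Option<i64>,
}

impl ProviderSettings {
    /// Resolves the value to send: the configured limit if present, else the default.
    fn effective_max_tokens(&self) -> i64 {
        self.max_tokens.unwrap_or(DEFAULT_MAX_TOKENS)
    }
}
```

Keeping the field as `Option<i64>` and resolving it late means existing provider definitions compile with `max_tokens: None` and a later change to the default touches only one constant.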
--- a/llmx-rs/core/src/tools/spec.rs
+++ b/llmx-rs/core/src/tools/spec.rs
@@ -693,7 +693,7 @@ pub(crate) fn create_tools_json_for_chat_completions_api(
    // We start with the JSON for the Responses API and then rewrite it to match
    // the chat completions tool call format.
    let responses_api_tools_json = create_tools_json_for_responses_api(tools)?;
-    let tools_json = responses_api_tools_json
+    let mut tools_json = responses_api_tools_json
        .into_iter()
        .filter_map(|mut tool| {
            if tool.get("type") != Some(&serde_json::Value::String("function".to_string())) {
@@ -712,6 +712,14 @@ pub(crate) fn create_tools_json_for_chat_completions_api(
            }
        })
        .collect::<Vec<serde_json::Value>>();
+
+    // Add cache_control to the last tool to enable Anthropic prompt caching
+    if let Some(last_tool) = tools_json.last_mut() {
+        if let Some(obj) = last_tool.as_object_mut() {
+            obj.insert("cache_control".to_string(), json!({"type": "ephemeral"}));
+        }
+    }
+
    Ok(tools_json)
 }
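Note on the `cache_control` hunk above: `last_mut()` returns `None` for an empty list, so the marker is attached to the final tool only and an empty tool list passes through unchanged. The sketch below isolates that pattern using only the standard library. The `Tool` alias (a string map) is a hypothetical stand-in for the `serde_json::Value` objects in the diff, and the flat `"ephemeral"` string stands in for the nested `{"type": "ephemeral"}` object.

```rust
use std::collections::BTreeMap;

// Hypothetical stand-in for a JSON tool object; the diff uses serde_json::Value.
type Tool = BTreeMap<String, String>;

/// Marks only the final tool; does nothing when the slice is empty.
fn mark_last_tool_cacheable(tools: &mut [Tool]) {
    if let Some(last) = tools.last_mut() {
        last.insert("cache_control".to_string(), "ephemeral".to_string());
    }
}
```

Mutating in place after `collect()` is why the hunk changes `let tools_json` to `let mut tools_json`.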

--- a/llmx-rs/core/tests/chat_completions_payload.rs
+++ b/llmx-rs/core/tests/chat_completions_payload.rs
@@ -58,6 +58,7 @@ async fn run_request(input: Vec<ResponseItem>) -> Value {
        request_max_retries: Some(0),
        stream_max_retries: Some(0),
        stream_idle_timeout_ms: Some(5_000),
+        max_tokens: None,
        requires_openai_auth: false,
    };

--- a/llmx-rs/core/tests/chat_completions_sse.rs
+++ b/llmx-rs/core/tests/chat_completions_sse.rs
@@ -58,6 +58,7 @@ async fn run_stream_with_bytes(sse_body: &[u8]) -> Vec<ResponseEvent> {
        request_max_retries: Some(0),
        stream_max_retries: Some(0),
        stream_idle_timeout_ms: Some(5_000),
+        max_tokens: None,
        requires_openai_auth: false,
    };

--- a/llmx-rs/core/tests/responses_headers.rs
+++ b/llmx-rs/core/tests/responses_headers.rs
@@ -47,6 +47,7 @@ async fn responses_stream_includes_subagent_header_on_review() {
        request_max_retries: Some(0),
        stream_max_retries: Some(0),
        stream_idle_timeout_ms: Some(5_000),
+        max_tokens: None,
        requires_openai_auth: false,
    };

@@ -135,6 +136,7 @@ async fn responses_stream_includes_subagent_header_on_other() {
        request_max_retries: Some(0),
        stream_max_retries: Some(0),
        stream_idle_timeout_ms: Some(5_000),
+        max_tokens: None,
        requires_openai_auth: false,
    };

--- a/llmx-rs/core/tests/suite/client.rs
+++ b/llmx-rs/core/tests/suite/client.rs
@@ -712,6 +712,7 @@ async fn azure_responses_request_includes_store_and_reasoning_ids() {
        request_max_retries: Some(0),
        stream_max_retries: Some(0),
        stream_idle_timeout_ms: Some(5_000),
+        max_tokens: None,
        requires_openai_auth: false,
    };

@@ -1195,6 +1196,7 @@ async fn azure_overrides_assign_properties_used_for_responses_url() {
        request_max_retries: None,
        stream_max_retries: None,
        stream_idle_timeout_ms: None,
+        max_tokens: None,
        requires_openai_auth: false,
    };

@@ -1272,6 +1274,7 @@ async fn env_var_overrides_loaded_auth() {
        request_max_retries: None,
        stream_max_retries: None,
        stream_idle_timeout_ms: None,
+        max_tokens: None,
        requires_openai_auth: false,
    };

--- a/llmx-rs/core/tests/suite/stream_error_allows_next_turn.rs
+++ b/llmx-rs/core/tests/suite/stream_error_allows_next_turn.rs
@@ -72,6 +72,7 @@ async fn continue_after_stream_error() {
        request_max_retries: Some(1),
        stream_max_retries: Some(1),
        stream_idle_timeout_ms: Some(2_000),
+        max_tokens: None,
        requires_openai_auth: false,
    };

--- a/llmx-rs/core/tests/suite/stream_no_completed.rs
+++ b/llmx-rs/core/tests/suite/stream_no_completed.rs
@@ -80,6 +80,7 @@ async fn retries_on_early_close() {
        request_max_retries: Some(0),
        stream_max_retries: Some(1),
        stream_idle_timeout_ms: Some(2000),
+        max_tokens: None,
        requires_openai_auth: false,
    };

--- a/llmx-rs/mcp-server/tests/common/mcp_process.rs
+++ b/llmx-rs/mcp-server/tests/common/mcp_process.rs
@@ -144,7 +144,7 @@ impl McpProcess {
        let initialized = self.read_jsonrpc_message().await?;
        let os_info = os_info::get();
        let user_agent = format!(
-            "llmx_cli_rs/0.1.1 ({} {}; {}) {} (elicitation test; 0.0.0)",
+            "llmx_cli_rs/0.1.7 ({} {}; {}) {} (elicitation test; 0.0.0)",
            os_info.os_type(),
            os_info.version(),
            os_info.architecture().unwrap_or("unknown"),
@@ -163,7 +163,7 @@ impl McpProcess {
                    "serverInfo": {
                        "name": "llmx-mcp-server",
                        "title": "LLMX",
-                        "version": "0.1.1",
+                        "version": "0.1.7",
                        "user_agent": user_agent
                    },
                    "protocolVersion": mcp_types::MCP_SCHEMA_VERSION
--- a/llmx-rs/tui/src/status/snapshots/llmx_tuistatustests__status_snapshot_includes_monthly_limit.snap
+++ b/llmx-rs/tui/src/status/snapshots/llmx_tuistatustests__status_snapshot_includes_monthly_limit.snap
@@ -5,7 +5,7 @@ expression: sanitized
 /status

 ╭───────────────────────────────────────────────────────────────────────────╮
-│  >_ LLMX (v0.1.1)                                                         │
+│  >_ LLMX (v0.1.7)                                                         │
 │                                                                           │
 │ Visit https://chatgpt.com/llmx/settings/usage for up-to-date              │
 │ information on rate limits and credits                                    │
--- a/llmx-rs/tui/src/status/snapshots/llmx_tuistatustests__status_snapshot_includes_reasoning_details.snap
+++ b/llmx-rs/tui/src/status/snapshots/llmx_tuistatustests__status_snapshot_includes_reasoning_details.snap
@@ -5,7 +5,7 @@ expression: sanitized
 /status

 ╭─────────────────────────────────────────────────────────────────╮
-│  >_ LLMX (v0.1.1)                                               │
+│  >_ LLMX (v0.1.7)                                               │
 │                                                                 │
 │ Visit https://chatgpt.com/llmx/settings/usage for up-to-date    │
 │ information on rate limits and credits                          │
--- a/llmx-rs/tui/src/status/snapshots/llmx_tuistatustests__status_snapshot_shows_empty_limits_message.snap
+++ b/llmx-rs/tui/src/status/snapshots/llmx_tuistatustests__status_snapshot_shows_empty_limits_message.snap
@@ -5,7 +5,7 @@ expression: sanitized
 /status

 ╭──────────────────────────────────────────────────────────────╮
-│  >_ LLMX (v0.1.1)                                            │
+│  >_ LLMX (v0.1.7)                                            │
 │                                                              │
 │ Visit https://chatgpt.com/llmx/settings/usage for up-to-date │
 │ information on rate limits and credits                       │
--- a/llmx-rs/tui/src/status/snapshots/llmx_tuistatustests__status_snapshot_shows_missing_limits_message.snap
+++ b/llmx-rs/tui/src/status/snapshots/llmx_tuistatustests__status_snapshot_shows_missing_limits_message.snap
@@ -5,7 +5,7 @@ expression: sanitized
 /status

 ╭──────────────────────────────────────────────────────────────╮
-│  >_ LLMX (v0.1.1)                                            │
+│  >_ LLMX (v0.1.7)                                            │
 │                                                              │
 │ Visit https://chatgpt.com/llmx/settings/usage for up-to-date │
 │ information on rate limits and credits                       │
--- a/llmx-rs/tui/src/status/snapshots/llmx_tuistatustests__status_snapshot_shows_stale_limits_message.snap
+++ b/llmx-rs/tui/src/status/snapshots/llmx_tuistatustests__status_snapshot_shows_stale_limits_message.snap
@@ -5,7 +5,7 @@ expression: sanitized
 /status

 ╭───────────────────────────────────────────────────────────────────╮
-│  >_ LLMX (v0.1.1)                                                 │
+│  >_ LLMX (v0.1.7)                                                 │
 │                                                                   │
 │ Visit https://chatgpt.com/llmx/settings/usage for up-to-date      │
 │ information on rate limits and credits                            │
--- a/llmx-rs/tui/src/status/snapshots/llmx_tuistatustests__status_snapshot_truncates_in_narrow_terminal.snap
+++ b/llmx-rs/tui/src/status/snapshots/llmx_tuistatustests__status_snapshot_truncates_in_narrow_terminal.snap
@@ -5,7 +5,7 @@ expression: sanitized
 /status

 ╭────────────────────────────────────────────╮
-│  >_ LLMX (v0.1.1)                          │
+│  >_ LLMX (v0.1.7)                          │
 │                                            │
 │ Visit https://chatgpt.com/llmx/settings/   │
 │ usage for up-to-date                       │
--- a/llmx-rs/tui/tests/fixtures/binary-size-log.jsonl
+++ b/llmx-rs/tui/tests/fixtures/binary-size-log.jsonl