9947fe37bb6a01348e51f7680240c4b6aa7d5481
The orchestrator was calling response.json(), which buffered the entire streaming response before returning it. This caused LiteLLM to receive only one chunk with empty content instead of token-by-token streaming.

Changes:
- Detect streaming requests by parsing the request body for "stream": true
- Use client.stream() with aiter_bytes() for streaming requests
- Return StreamingResponse with proper SSE headers
- Keep the original JSONResponse behavior for non-streaming requests

This fixes streaming through the vLLM → orchestrator → LiteLLM chain.
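A minimal sketch of the fix's core idea, assuming the FastAPI + httpx stack implied by `response.json()` / `client.stream()`. The function and variable names below are illustrative, not the project's actual code; the httpx-specific calls are noted in comments rather than executed:

```python
import asyncio
import json


def is_streaming_request(body: bytes) -> bool:
    """Detect a streaming request by parsing the body for "stream": true."""
    try:
        return json.loads(body).get("stream") is True
    except (ValueError, AttributeError):
        return False  # malformed or non-object body: treat as non-streaming


async def relay_chunks(upstream):
    """Yield each upstream chunk as it arrives instead of buffering.

    In the real proxy, `upstream` would be `resp.aiter_bytes()` from an
    httpx `client.stream(...)` response, and this generator would feed a
    FastAPI StreamingResponse with media_type="text/event-stream".
    """
    async for chunk in upstream:
        yield chunk  # pass the raw SSE bytes through untouched


async def _demo():
    # Stand-in for an upstream SSE stream from vLLM.
    async def fake_upstream():
        for c in (b"data: one\n\n", b"data: two\n\n"):
            yield c

    return [c async for c in relay_chunks(fake_upstream())]


chunks = asyncio.run(_demo())
```

The key point is that `relay_chunks` never accumulates the body: each chunk is forwarded as soon as it arrives, which is what restores token-by-token delivery to LiteLLM.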