runpod/models/vllm
Sebastian Krüger · 7f1890517d · 2025-11-21 18:25:50 +01:00
fix: enable eager execution for proper token streaming in vLLM

- Set enforce_eager=True to disable CUDA graphs, which were batching outputs
- Add disable_log_stats=True for better streaming performance
- This ensures AsyncLLMEngine yields tokens incrementally instead of returning the complete response (see the sketch below)
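
A minimal sketch of the configuration the commit describes, assuming vLLM's AsyncEngineArgs / AsyncLLMEngine API; the model name, prompt, and request id are placeholders for illustration, not values from this repo:

    import asyncio

    from vllm import SamplingParams
    from vllm.engine.arg_utils import AsyncEngineArgs
    from vllm.engine.async_llm_engine import AsyncLLMEngine

    # enforce_eager disables CUDA graph capture so outputs are produced step by
    # step; disable_log_stats skips per-iteration stat logging while streaming.
    engine_args = AsyncEngineArgs(
        model="facebook/opt-125m",  # placeholder model, not the one served here
        enforce_eager=True,
        disable_log_stats=True,
    )
    engine = AsyncLLMEngine.from_engine_args(engine_args)

    async def stream(prompt: str) -> None:
        params = SamplingParams(max_tokens=64, temperature=0.7)
        emitted = ""
        # generate() is an async generator yielding a RequestOutput after each
        # engine step; printing only the new suffix streams tokens incrementally
        # instead of waiting for the complete response.
        async for output in engine.generate(prompt, params, request_id="demo-1"):
            text = output.outputs[0].text
            print(text[len(emitted):], end="", flush=True)
            emitted = text
        print()

    asyncio.run(stream("Explain eager execution in one sentence."))

Per the commit message, without enforce_eager the CUDA-graph path was surfacing outputs in batches rather than token by token, which is the behavior this configuration avoids.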