runpod

Sebastian Krüger 7f1890517d fix: enable eager execution for proper token streaming in vLLM

- Set enforce_eager=True to disable CUDA graphs which were batching outputs
- Add disable_log_stats=True for better streaming performance
- This ensures AsyncLLMEngine yields tokens incrementally instead of returning complete response
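
As a rough illustration of the two settings named in this commit, the sketch below collects them into keyword arguments for a vLLM engine. The `StreamingEngineSettings` class, the `engine_kwargs` helper, and the model name are hypothetical and do not come from the repository; only `enforce_eager` and `disable_log_stats` are taken from the commit message. The vLLM calls appear only in comments, so the block runs without vLLM installed.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class StreamingEngineSettings:
    """Hypothetical container for the engine settings named in the commit."""

    model: str = "example/model-name"  # placeholder, not from the repository
    enforce_eager: bool = True         # run in eager mode, skipping CUDA graph capture
    disable_log_stats: bool = True     # turn off periodic engine statistics logging


def engine_kwargs(settings: StreamingEngineSettings) -> dict:
    """Return the settings as a plain dict of keyword arguments."""
    return asdict(settings)


kwargs = engine_kwargs(StreamingEngineSettings())

# With vLLM installed, these would typically be passed on along these lines
# (assumed usage, not copied from the repository):
#   from vllm import AsyncEngineArgs, AsyncLLMEngine
#   engine = AsyncLLMEngine.from_engine_args(AsyncEngineArgs(**kwargs))
```

Keeping the settings in one small object makes it easy to see which flags the streaming fix depends on, independent of how the service actually constructs its engine.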

2025-11-21 18:25:50 +01:00

model-orchestrator

fix: correct vLLM service port to 8000

2025-11-21 16:28:54 +01:00

models

fix: enable eager execution for proper token streaming in vLLM

2025-11-21 18:25:50 +01:00

scripts

feat: implement Ansible-based process architecture for RunPod

2025-11-21 15:37:18 +01:00

.env.example

Initial commit: RunPod multi-modal AI orchestration stack

2025-11-21 14:34:55 +01:00

.gitignore

Initial commit: RunPod multi-modal AI orchestration stack

2025-11-21 14:34:55 +01:00

ansible.cfg

feat: implement Ansible-based process architecture for RunPod

2025-11-21 15:37:18 +01:00

inventory.yml

feat: implement Ansible-based process architecture for RunPod

2025-11-21 15:37:18 +01:00

playbook.yml

feat: implement Ansible-based process architecture for RunPod

2025-11-21 15:37:18 +01:00