runpod

Author	SHA1	Message	Date
Sebastian Krüger	205e591fb4	feat: add ComfyUI integration with Ansible playbook - Add ComfyUI installation to Ansible playbook with 'comfyui' tag - Create ComfyUI requirements.txt and start.sh script - Clone ComfyUI from official GitHub repository - Symlink HuggingFace cache for Flux model access - ComfyUI runs on port 8188 with CORS enabled - Add ComfyUI to services list in playbook 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-21 20:41:43 +01:00
Sebastian Krüger	7f1890517d	fix: enable eager execution for proper token streaming in vLLM - Set enforce_eager=True to disable CUDA graphs which were batching outputs - Add disable_log_stats=True for better streaming performance - This ensures AsyncLLMEngine yields tokens incrementally instead of returning complete response	2025-11-21 18:25:50 +01:00
Sebastian Krüger	d21caa56bc	fix: implement incremental streaming deltas for vLLM chat completions - Track previous_text to calculate deltas instead of sending full accumulated text - Fixes WebUI streaming issue where responses appeared empty - Only send new tokens in each SSE chunk delta - Resolves OpenAI API compatibility for streaming chat completions	2025-11-21 17:23:18 +01:00
Sebastian Krüger	9a637cc4fc	refactor: clean Docker files and restore standalone model services - Remove all Docker-related files (Dockerfiles, compose.yaml) - Remove documentation files (README, ARCHITECTURE, docs/) - Remove old core/ directory (base_service, service_manager) - Update models.yaml with correct service_script paths (models/*/server.py) - Simplify vLLM requirements.txt to let vLLM manage dependencies - Restore original standalone vLLM server (no base_service dependency) - Remove obsolete vllm/, musicgen/, flux/ directories Process-based architecture is now fully functional on RunPod. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-21 16:17:38 +01:00
Sebastian Krüger	9ee626a78e	feat: implement Ansible-based process architecture for RunPod Major architecture overhaul to address RunPod Docker limitations: Core Infrastructure: - Add base_service.py: Abstract base class for all AI services - Add service_manager.py: Process lifecycle management - Add core/requirements.txt: Core dependencies Model Services (Standalone Python): - Add models/vllm/server.py: Qwen 2.5 7B text generation - Add models/flux/server.py: Flux.1 Schnell image generation - Add models/musicgen/server.py: MusicGen Medium music generation - Each service inherits from GPUService base class - OpenAI-compatible APIs - Standalone execution support Ansible Deployment: - Add playbook.yml: Comprehensive deployment automation - Add ansible.cfg: Ansible configuration - Add inventory.yml: Localhost inventory - Tags: base, python, dependencies, models, tailscale, validate, cleanup Scripts: - Add scripts/install.sh: Full installation wrapper - Add scripts/download-models.sh: Model download wrapper - Add scripts/start-all.sh: Start orchestrator - Add scripts/stop-all.sh: Stop all services Documentation: - Update ARCHITECTURE.md: Document distributed VPS+GPU architecture Benefits: - No Docker: Avoids RunPod CAP_SYS_ADMIN limitations - Fully reproducible via Ansible - Extensible: Add models in 3 steps - Direct Python execution (no container overhead) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-21 15:37:18 +01:00

5 Commits