Commit Graph

180 Commits

cc0d93060d feat: expose Supervisor web UI for proxy access
Disable Supervisor's built-in authentication and expose the web UI on all interfaces (0.0.0.0:9001) to enable reverse proxy access via nginx and Authelia SSO. Authentication is now handled by Authelia at supervisor.ai.pivoine.art.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-22 13:19:02 +01:00
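A reverse-proxy block along these lines would sit in front of the Supervisor UI. This is a hedged sketch only: the server name comes from the commit, but the upstream address is a placeholder and the Authelia `auth_request` wiring is omitted.

```nginx
# Sketch only; the real config lives in the nginx/Authelia setup, not this repo.
server {
    listen 443 ssl;
    server_name supervisor.ai.pivoine.art;

    location / {
        # Authelia auth_request directives would go here.
        proxy_pass http://gpu-host:9001;   # Supervisor listening on 0.0.0.0:9001
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
```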
16226c7b39 chore: remove Ansible infrastructure
Remove Ansible playbook and configuration files as all infrastructure
setup has been migrated to arty bash scripts.

Deleted:
- playbook.yml (~950 lines)
- inventory.yml
- ansible.cfg

All functionality now available via arty scripts:
- arty run install/essential
- arty run setup/system-packages
- arty run setup/comfyui-base
- arty run setup/comfyui-nodes
- arty run setup/supervisor
- arty run setup/tailscale

Model downloads will be handled separately (TBD).

🤖 Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-22 12:23:11 +01:00
d5965ce0c4 refactor: replace Ansible with arty bash scripts
Completely migrate infrastructure setup from Ansible to pure bash scripts in arty.yml, enabling simpler deployment and eliminating Ansible dependency on RunPod.

## New Installation Scripts

**System Setup:**
- setup/system-packages: Install apt packages, verify GPU
- setup/python-env: Configure Python, pip, core packages
- setup/comfyui-base: Install ComfyUI, create model directories
- setup/comfyui-nodes: Install 5 essential custom nodes
- setup/supervisor: Install Supervisor process manager
- setup/tailscale: Install Tailscale VPN (optional)

**Utility Scripts:**
- setup/validate: Validate installation, check packages/GPU/cache
- setup/cleanup: Clean sensitive files for template creation

**Orchestration Scripts:**
- install/minimal: System + Python + ComfyUI + Supervisor
- install/essential: Minimal + Custom nodes (recommended)
- install/full: Essential + Tailscale VPN

## Changes

### Added
- 11 new bash scripts replacing Ansible playbook tasks
- Comprehensive deployment notes with all commands
- Script categorization: System Setup, Service Management, Health Checks
- Detailed progress output with visual indicators (✓/✗)

### Removed
- Ansible shortcut scripts (ansible/base, ansible/supervisor, etc.)
- deps/comfyui-nodes (integrated into setup/comfyui-nodes)
- References to Ansible in deployment workflow

### Modified
- Deployment notes: Complete rewrite with organized sections
- Fresh deployment: Now uses arty run install/essential
- Documentation: Updated paths, commands, and workflow

## Benefits

- No Ansible dependency
- Faster debugging (pure bash, no templates)
- RunPod compatible (no sudo issues)
- Modular: Run individual steps or orchestration
- Better error handling and user feedback
- Easier to maintain and extend

## Migration Path

Model downloads still use Ansible playbook (temporary):
- ansible-playbook playbook.yml --tags comfyui-essential
- ansible-playbook playbook.yml --tags comfyui-models-all

After successful testing, Ansible files can be safely removed:
- playbook.yml
- inventory.yml
- ansible.cfg

## Usage

```bash
# Fresh deployment
arty run install/essential
ansible-playbook playbook.yml --tags comfyui-essential
arty run models/link-comfyui
arty run workflows/link-comfyui
arty run services/start
```

🤖 Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-22 12:21:05 +01:00
b370c16eb9 feat: add workflow linking script to arty.yml
Add new arty script `workflows/link-comfyui` to create symlinks for all
production workflows from git repository to ComfyUI workflows directory.

Script creates symlinks for:
- All 6 workflow category directories (text-to-image, image-to-image,
  image-to-video, text-to-music, upscaling, advanced)
- templates/ directory for future workflow templates
- README.md and WORKFLOW_STANDARDS.md documentation

Usage: arty run workflows/link-comfyui

Updates:
- Added workflows/link-comfyui script with informative output
- Updated Fresh Deployment guide to include workflow linking step
- Added section 6 "ComfyUI Workflows" with complete workflow listing
- Updated Storage section to document workflow symlinks
- Added workflow README to documentation list

🤖 Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-22 12:12:06 +01:00
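The linking step could look roughly like this. The category directory names come from the commit, but the repo and ComfyUI paths are placeholders that default to throwaway temp dirs so the sketch runs anywhere; the real script's paths differ.

```shell
# Hypothetical sketch of workflows/link-comfyui; real paths differ.
REPO="${REPO:-$(mktemp -d)/workflows}"              # git repo's workflows/ dir
TARGET="${TARGET:-$(mktemp -d)/ComfyUI/workflows}"  # ComfyUI workflows dir
mkdir -p "$REPO" "$TARGET"
for d in text-to-image image-to-image image-to-video text-to-music upscaling advanced templates; do
  mkdir -p "$REPO/$d"                 # ensure the source exists for this demo
  ln -sfn "$REPO/$d" "$TARGET/$d"     # symlink each category directory
  echo "linked $d"
done
```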
71a30c0e4d feat: add comprehensive ComfyUI workflow collection
Add 20 production-ready ComfyUI workflows across 6 categories:

Text-to-Image (4 workflows):
- FLUX Schnell (fast, 4 steps)
- FLUX Dev (high-quality, 20-50 steps)
- SDXL + Refiner (two-stage, detailed)
- SD3.5 Large (latest generation)

Image-to-Image (3 workflows):
- IP-Adapter Style Transfer
- IP-Adapter Face Portrait
- IP-Adapter Multi-Composition

Image-to-Video (3 workflows):
- CogVideoX (6s AI-driven video)
- SVD (14 frames, quick animations)
- SVD-XT (25 frames, extended)

Text-to-Music (4 workflows):
- MusicGen Small/Medium/Large
- MusicGen Melody (melody conditioning)

Upscaling (3 workflows):
- Ultimate SD Upscale (professional)
- Simple Upscale (fast)
- Face Upscale (portrait-focused)

Advanced (3 workflows):
- ControlNet Fusion (multi-control)
- AnimateDiff Video (text-to-video)
- Batch Pipeline (multiple variations)

Documentation:
- README.md: Usage guide, model requirements, examples
- WORKFLOW_STANDARDS.md: Development standards, best practices

All workflows include:
- API compatibility for orchestrator integration
- Error handling and validation
- VRAM optimization for 24GB GPUs
- Preview and save nodes
- Comprehensive metadata and parameters
- Performance benchmarks

🤖 Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-22 12:08:18 +01:00
6323488591 fix: add missing ComfyUI-Impact-Pack dependencies
- Add scikit-image for image processing
- Add piexif for EXIF metadata handling
- Add segment-anything for SAM model support

These dependencies are required for ComfyUI-Impact-Pack to load correctly.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-22 09:33:43 +01:00
664da9f4ea feat: add Supervisor process manager for service management
- Add supervisord.conf with ComfyUI and orchestrator services
- Update Ansible playbook with supervisor installation tag
- Rewrite start-all.sh and stop-all.sh to use Supervisor
- Add status.sh script for checking service status
- Update arty.yml with supervisor commands and shortcuts
- Update CLAUDE.md with Supervisor documentation and troubleshooting
- Services now auto-restart on crashes with centralized logging

Benefits:
- Better process control than manual pkill/background jobs
- Auto-restart on service crashes
- Centralized log management in /workspace/logs/
- Web interface for monitoring (port 9001)
- Works perfectly in RunPod containers (no systemd needed)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-22 09:22:16 +01:00
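A supervisord.conf in this spirit might look like the fragment below; the program names and paths are assumed rather than copied from the repo, while port 9001 and /workspace/logs/ come from the commit itself.

```ini
; Illustrative fragment, not the repository's actual supervisord.conf.
[inet_http_server]
port=127.0.0.1:9001

[supervisord]
logfile=/workspace/logs/supervisord.log

[program:comfyui]
command=python main.py --listen 0.0.0.0 --port 8188
directory=/workspace/ComfyUI
autorestart=true
stdout_logfile=/workspace/logs/comfyui.log
redirect_stderr=true
```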
2207d60f98 feat: add Arty configuration and Claude Code documentation
- Add arty.yml for repository management with environment profiles (prod/dev/minimal)
- Add CLAUDE.md with comprehensive architecture and usage documentation
- Add comfyui_models.yaml for ComfyUI model configuration
- Include deployment scripts for model linking and dependency installation
- Document all git repositories (ComfyUI + 10 custom nodes)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-22 02:50:36 +01:00
c9b01eef68 refactor: consolidate model management into Ansible playbook
Remove flux/musicgen standalone implementations in favor of ComfyUI:
- Delete models/flux/ and models/musicgen/ directories
- Remove redundant scripts (install.sh, download-models.sh, prepare-template.sh)
- Update README.md to reference Ansible playbook commands
- Update playbook.yml to remove flux/musicgen service definitions
- Add COMFYUI_MODELS.md with comprehensive model installation guide
- Update stop-all.sh to only manage orchestrator and vLLM services

All model downloads and dependency management now handled via
Ansible playbook tags (base, python, vllm, comfyui, comfyui-essential).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-22 00:31:26 +01:00
7edf17551a Add Llama 3.1 8B model to orchestrator 2025-11-21 21:29:57 +01:00
c4426ccc58 Remove Flux and MusicGen models from orchestrator
ComfyUI now handles Flux image generation directly.
MusicGen is not being used and has been removed.
2025-11-21 21:14:43 +01:00
205e591fb4 feat: add ComfyUI integration with Ansible playbook
- Add ComfyUI installation to Ansible playbook with 'comfyui' tag
- Create ComfyUI requirements.txt and start.sh script
- Clone ComfyUI from official GitHub repository
- Symlink HuggingFace cache for Flux model access
- ComfyUI runs on port 8188 with CORS enabled
- Add ComfyUI to services list in playbook

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-21 20:41:43 +01:00
538609da3e docs: add comprehensive README for AI model orchestrator
- Architecture overview (process-based orchestrator)
- Installation and setup instructions for RunPod
- Available models (text, image, music generation)
- API endpoints and usage examples
- Integration guide for LiteLLM proxy
- Troubleshooting section with streaming fixes
- Deployment options (RunPod, VPS, Tailscale VPN)
- Adding new models workflow
- Project structure documentation

Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-21 19:47:43 +01:00
34fc456a3b feat: add /v1/models endpoint and fix streaming proxy in subprocess orchestrator 2025-11-21 19:34:20 +01:00
91089d3edc feat: add /v1/models endpoint and systemd service for orchestrator
- Add OpenAI-compatible /v1/models endpoint to list available models
- Create systemd service file for proper service management
- Service runs as root with automatic restart on failure
- Logs to systemd journal for easy debugging
2025-11-21 19:28:16 +01:00
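A unit file matching that description would be on the order of the sketch below; the `ExecStart` path is a hypothetical placeholder, not the repository's actual layout, and stdout/stderr go to the journal by default.

```ini
# Hypothetical unit file; ExecStart path is illustrative.
[Unit]
Description=AI model orchestrator
After=network.target

[Service]
User=root
ExecStart=/usr/bin/python3 /workspace/model-orchestrator/orchestrator.py
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target
```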
9947fe37bb fix: properly proxy streaming requests without buffering
The orchestrator was calling response.json() which buffered the entire
streaming response before returning it. This caused LiteLLM to receive
only one chunk with empty content instead of token-by-token streaming.

Changes:
- Detect streaming requests by parsing request body for 'stream': true
- Use client.stream() with aiter_bytes() for streaming requests
- Return StreamingResponse with proper SSE headers
- Keep original JSONResponse behavior for non-streaming requests

This fixes streaming from vLLM → orchestrator → LiteLLM chain.
2025-11-21 19:21:56 +01:00
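The detection half of this fix can be sketched in a few lines; `wants_streaming` is an illustrative name, and the orchestrator pairs it with `httpx`'s `client.stream()` / `aiter_bytes()` and a `StreamingResponse` as described above.

```python
import json

def wants_streaming(raw_body: bytes) -> bool:
    """Return True when the request body asks for SSE streaming.

    Mirrors the commit's approach: inspect the body for "stream": true
    instead of buffering the upstream response with response.json().
    """
    try:
        payload = json.loads(raw_body or b"{}")
    except json.JSONDecodeError:
        return False
    return bool(payload.get("stream", False))

# Streaming requests are then relayed chunk-by-chunk, roughly (pseudocode):
#   async with client.stream("POST", upstream_url, content=raw_body) as r:
#       async for chunk in r.aiter_bytes():
#           yield chunk   # wrapped in a StreamingResponse with SSE headers
```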
7f1890517d fix: enable eager execution for proper token streaming in vLLM
- Set enforce_eager=True to disable CUDA graphs which were batching outputs
- Add disable_log_stats=True for better streaming performance
- This ensures AsyncLLMEngine yields tokens incrementally instead of returning complete response
2025-11-21 18:25:50 +01:00
94080da341 fix: remove incorrect start-vllm.sh that would break orchestrator architecture 2025-11-21 18:10:53 +01:00
6944e4ebd5 feat: add vllm serve script with proper streaming support 2025-11-21 18:08:21 +01:00
d21caa56bc fix: implement incremental streaming deltas for vLLM chat completions
- Track previous_text to calculate deltas instead of sending full accumulated text
- Fixes WebUI streaming issue where responses appeared empty
- Only send new tokens in each SSE chunk delta
- Resolves OpenAI API compatibility for streaming chat completions
2025-11-21 17:23:18 +01:00
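The delta logic amounts to slicing off the previously sent prefix on each step; a minimal sketch (function name illustrative):

```python
def to_deltas(accumulated):
    """Turn accumulated generation outputs into per-chunk SSE deltas.

    vLLM yields the full text generated so far on every step; OpenAI-style
    streaming clients expect only the new tokens in each chunk's delta.
    """
    previous_text = ""
    deltas = []
    for text in accumulated:
        deltas.append(text[len(previous_text):])  # only the new suffix
        previous_text = text
    return deltas
```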
57b706abe6 fix: correct vLLM service port to 8000
- Updated qwen-2.5-7b port from 8001 to 8000 in models.yaml
- Matches actual vLLM server default port configuration
- Tested and verified: orchestrator successfully loaded model and generated response
2025-11-21 16:28:54 +01:00
9a637cc4fc refactor: clean Docker files and restore standalone model services
- Remove all Docker-related files (Dockerfiles, compose.yaml)
- Remove documentation files (README, ARCHITECTURE, docs/)
- Remove old core/ directory (base_service, service_manager)
- Update models.yaml with correct service_script paths (models/*/server.py)
- Simplify vLLM requirements.txt to let vLLM manage dependencies
- Restore original standalone vLLM server (no base_service dependency)
- Remove obsolete vllm/, musicgen/, flux/ directories

Process-based architecture is now fully functional on RunPod.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-21 16:17:38 +01:00
9ee626a78e feat: implement Ansible-based process architecture for RunPod
Major architecture overhaul to address RunPod Docker limitations:

Core Infrastructure:
- Add base_service.py: Abstract base class for all AI services
- Add service_manager.py: Process lifecycle management
- Add core/requirements.txt: Core dependencies

Model Services (Standalone Python):
- Add models/vllm/server.py: Qwen 2.5 7B text generation
- Add models/flux/server.py: Flux.1 Schnell image generation
- Add models/musicgen/server.py: MusicGen Medium music generation
- Each service inherits from GPUService base class
- OpenAI-compatible APIs
- Standalone execution support

Ansible Deployment:
- Add playbook.yml: Comprehensive deployment automation
- Add ansible.cfg: Ansible configuration
- Add inventory.yml: Localhost inventory
- Tags: base, python, dependencies, models, tailscale, validate, cleanup

Scripts:
- Add scripts/install.sh: Full installation wrapper
- Add scripts/download-models.sh: Model download wrapper
- Add scripts/start-all.sh: Start orchestrator
- Add scripts/stop-all.sh: Stop all services

Documentation:
- Update ARCHITECTURE.md: Document distributed VPS+GPU architecture

Benefits:
- No Docker: Avoids RunPod CAP_SYS_ADMIN limitations
- Fully reproducible via Ansible
- Extensible: Add models in 3 steps
- Direct Python execution (no container overhead)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-21 15:37:18 +01:00
03a430894d docs: add clean extensible architecture design
Created comprehensive architecture document for RunPod deployment:

**Key Design Principles:**
- No Docker (direct Python for RunPod compatibility)
- Extensible (add models in 3 simple steps)
- Maintainable (clear structure, base classes)
- Simple (one command startup)

**Structure:**
- core/ - Base service class + service manager
- model-orchestrator/ - Request routing
- models/ - Service implementations (vllm, flux, musicgen)
- scripts/ - Install, start, stop, template prep
- docs/ - Adding models, deployment, templates

**Adding New Models:**
1. Create server.py inheriting BaseService
2. Add entry to models.yaml
3. Add requirements.txt

That's it! Orchestrator handles lifecycle automatically.

Next: Implement base_service.py and refactor existing services.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-21 15:16:51 +01:00
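Step 2 might look like the entry below; the field names are guessed from nearby commits (`service_script` appears later in the log) rather than taken from the actual schema.

```yaml
# Hypothetical models.yaml entry for a new model service
my-model:
  service_script: models/my-model/server.py  # step 1: server.py inheriting BaseService
  port: 8002                                 # step 3's requirements.txt sits alongside
```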
31be1932e7 wip: start architecture redesign for RunPod (no Docker)
Started redesigning architecture to run services directly without Docker:

**Completed:**
- Created new process-based orchestrator (orchestrator_subprocess.py)
- Uses subprocess instead of Docker SDK for process management
- Updated models.yaml to reference service_script paths
- vLLM server already standalone-ready

**Still needed:**
- Create/update Flux and MusicGen standalone servers
- Create systemd service files or startup scripts
- Update prepare-template script for Python deployment
- Remove Docker/Compose dependencies
- Test full stack on RunPod
- Update documentation

Reason for change: RunPod's containerized environment doesn't support
Docker-in-Docker (requires CAP_SYS_ADMIN). Direct Python execution is
simpler, faster, and more reliable for RunPod.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-21 15:09:30 +01:00
cd9e2eee2e fix: use legacy Docker builder for RunPod compatibility
- Set DOCKER_BUILDKIT=0 to use legacy builder
- BuildKit has permission issues in RunPod's containerized environment
- Legacy builder works reliably with RunPod's security constraints

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-21 15:01:16 +01:00
8f1d4bedd2 fix: update Docker daemon startup for RunPod environment
- Changed from systemctl/service to direct dockerd command
- Added --iptables=false --bridge=none flags (required for RunPod)
- Added proper error checking and 10s wait time
- Improved logging with verification step

This fixes Docker startup in RunPod's containerized environment, where
systemd is not available and iptables requires special handling.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-21 15:00:42 +01:00
0fa69cae28 refactor: rename docker-compose.gpu.yaml to compose.yaml
Simplified compose file naming to follow Docker Compose best practices:
- Renamed docker-compose.gpu.yaml to compose.yaml
- Updated all references in documentation files (README.md, DEPLOYMENT.md, GPU_DEPLOYMENT_LOG.md, RUNPOD_TEMPLATE.md)
- Updated references in scripts (prepare-template.sh)

This change enables simpler command syntax:
- Before: docker compose -f docker-compose.gpu.yaml up -d orchestrator
- After: docker compose up -d orchestrator

Generated with Claude Code (https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-21 14:49:42 +01:00
cafa0a1147 refactor: clean up runpod repository structure
Removed facefusion and VPS-related files:
- compose.yaml, postgres/, litellm-config.yaml (VPS services)
- Dockerfile, entrypoint.sh, disable-nsfw-filter.patch (facefusion)

Removed outdated documentation:
- DOCKER_GPU_SETUP.md, README_GPU_SETUP.md, SETUP_GUIDE.md
- TAILSCALE_SETUP.md, WIREGUARD_SETUP.md (covered in DEPLOYMENT.md)
- GPU_EXPANSION_PLAN.md (historical planning doc)
- gpu-server-compose.yaml, litellm-config-gpu.yaml (old versions)
- deploy-gpu-stack.sh, simple_vllm_server.py (old scripts)

Organized documentation:
- Created docs/ directory
- Moved DEPLOYMENT.md, RUNPOD_TEMPLATE.md, GPU_DEPLOYMENT_LOG.md to docs/
- Updated all documentation links in README.md

Final structure:
- Clean root directory with only GPU-specific files
- Organized documentation in docs/
- Model services in dedicated directories (model-orchestrator/, vllm/, flux/, musicgen/)
- Automation scripts in scripts/
2025-11-21 14:45:49 +01:00
277f1c95bd Initial commit: RunPod multi-modal AI orchestration stack
- Multi-modal AI infrastructure for RunPod RTX 4090
- Automatic model orchestration (text, image, music)
- Text: vLLM + Qwen 2.5 7B Instruct
- Image: Flux.1 Schnell via OpenEDAI
- Music: MusicGen Medium via AudioCraft
- Cost-optimized sequential loading on single GPU
- Template preparation scripts for rapid deployment
- Comprehensive documentation (README, DEPLOYMENT, TEMPLATE)
2025-11-21 14:34:55 +01:00