# RunPod Template Setup Guide

This guide explains how to deploy the AI Orchestrator (ComfyUI + vLLM) on RunPod using a custom Docker template and network volume.

## Architecture Overview

The deployment uses a **two-tier strategy**:

1. **Docker Image** (software layer) - Contains system packages, Supervisor, Tailscale
2. **Network Volume** (data layer) - Contains models, ComfyUI installation, venvs, configuration

This approach allows fast pod deployment (~2-3 minutes) while keeping all large files (models, ~80-200GB) on a persistent network volume.

## Prerequisites

- RunPod account with credits
- Container registry for hosting the template image (this guide uses a Gitea registry)
- HuggingFace account with API token (for model downloads)
- Tailscale account with auth key (optional, for VPN access)

## Step 1: Build and Push Docker Image

### Option A: Automated Build (Recommended)

The repository includes a Gitea workflow that automatically builds and pushes the Docker image to your Gitea container registry when you push to the `main` branch or create a version tag.

1. **Configure Gitea Secret:**
   - Go to your Gitea repository → Settings → Secrets
   - Add `REGISTRY_TOKEN` = your Gitea access token with registry permissions
   - (The workflow automatically uses your Gitea username via `gitea.actor`)

2. **Trigger Build:**

   ```bash
   # Push to main branch
   git push origin main

   # Or create a version tag
   git tag v1.0.0
   git push origin v1.0.0
   ```

3. **Monitor Build:**
   - Go to the Actions tab in Gitea
   - Wait for the build to complete (~5-10 minutes)
   - Note the Docker image name: `dev.pivoine.art/valknar/runpod-ai-orchestrator:latest`

### Option B: Manual Build

If you prefer to build manually:

```bash
# From the repository root
cd /path/to/runpod

# Build the image
docker build -t dev.pivoine.art/valknar/runpod-ai-orchestrator:latest .

# Login to your Gitea registry
docker login dev.pivoine.art

# Push to Gitea registry
docker push dev.pivoine.art/valknar/runpod-ai-orchestrator:latest
```
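Optionally, you can sanity-check the push before creating the template. A minimal sketch using standard Docker commands; the image name is the tag from the steps above:

```bash
# Pull the image back from the registry to confirm it is accessible
docker pull dev.pivoine.art/valknar/runpod-ai-orchestrator:latest

# Inspect its size (large models should NOT be baked in; those live on the network volume)
docker image inspect dev.pivoine.art/valknar/runpod-ai-orchestrator:latest \
  --format '{{.Size}} bytes'
```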
## Step 2: Create Network Volume

Network volumes persist your models and data across pod restarts and rebuilds.

1. **Go to RunPod Dashboard → Storage → Network Volumes**
2. **Click "New Network Volume"**
3. **Configure:**
   - **Name**: `ai-orchestrator-models`
   - **Size**: `200GB` (adjust based on your needs)
     - Essential models only: ~80GB
     - All models: ~137-200GB
   - **Datacenter**: Choose the one closest to you (the volume is tied to its datacenter)
4. **Click "Create Volume"**
5. **Note the Volume ID** (e.g., `vol-abc123def456`) for pod deployment

### Storage Requirements

| Configuration | Size | Models Included |
|--------------|------|-----------------|
| Essential | ~80GB | FLUX Schnell, 1-2 SDXL checkpoints, MusicGen Medium |
| Complete | ~137GB | All image/video/audio models from playbook |
| Full + vLLM | ~200GB | Complete + Qwen 2.5 7B + Llama 3.1 8B |

## Step 3: Create RunPod Template

1. **Go to RunPod Dashboard → Templates**
2. **Click "New Template"**
3. **Configure Template Settings:**

   **Container Configuration:**
   - **Template Name**: `AI Orchestrator (ComfyUI + vLLM)`
   - **Template Type**: Docker
   - **Container Image**: `dev.pivoine.art/valknar/runpod-ai-orchestrator:latest`
   - **Container Disk**: `50GB` (for system and temp files)
   - **Docker Command**: Leave empty (uses default `/start.sh`)

   **Volume Configuration:**
   - **Volume Mount Path**: `/workspace`
   - **Attach to Network Volume**: Select your volume ID from Step 2

   **Port Configuration:**
   - **Expose HTTP Ports**: `8188, 9000, 9001`
     - `8188` - ComfyUI web interface
     - `9000` - Model orchestrator API
     - `9001` - Supervisor web UI
   - **Expose TCP Ports**: `22` (SSH access)

   **Environment Variables:**

   ```
   HF_TOKEN=your_huggingface_token_here
   TAILSCALE_AUTHKEY=tskey-auth-your_tailscale_authkey_here
   SUPERVISOR_BACKEND_HOST=localhost
   SUPERVISOR_BACKEND_PORT=9001
   ```

   **Advanced Settings:**
   - **Start Jupyter**: No
   - **Start SSH**: Yes (handled by base image)

4. **Click "Save Template"**

## Step 4: First Deployment (Initial Setup)

The first time you deploy, you need to set up the network volume with models and configuration.

### 4.1 Deploy Pod

1. **Go to RunPod Dashboard → Pods**
2. **Click "Deploy"** or "GPU Pods"
3. **Select your custom template**: `AI Orchestrator (ComfyUI + vLLM)`
4. **Configure GPU:**
   - **GPU Type**: RTX 4090 (24GB VRAM) or higher
   - **Network Volume**: Select your volume from Step 2
   - **On-Demand vs Spot**: Choose based on budget
5. **Click "Deploy"**

### 4.2 SSH into Pod

```bash
# Get the pod SSH command from the RunPod dashboard
ssh root@<pod-ip> -p <ssh-port> -i ~/.ssh/id_ed25519

# Or use the RunPod web terminal
```

### 4.3 Initial Setup on Network Volume

```bash
# 1. Clone the repository to /workspace/ai
cd /workspace
git clone https://github.com/your-username/runpod.git ai
cd ai

# 2. Create .env file with your credentials
cp .env.example .env
nano .env
# Edit and add:
# HF_TOKEN=your_huggingface_token
# TAILSCALE_AUTHKEY=tskey-auth-your_key
# GPU_TAILSCALE_IP=

# 3. Download essential models (this takes 30-60 minutes)
ansible-playbook playbook.yml --tags comfyui-essential
# OR download all models (1-2 hours)
ansible-playbook playbook.yml --tags comfyui-models-all

# 4. Link models to ComfyUI
bash scripts/link-comfyui-models.sh
# OR if arty is available
arty run models/link-comfyui

# 5. Install ComfyUI custom nodes dependencies
cd /workspace/ComfyUI/custom_nodes/ComfyUI-Manager
pip install -r requirements.txt
cd /workspace/ai

# 6. Restart the container to apply all changes
exit
# Go to RunPod dashboard → Stop pod → Start pod
```

### 4.4 Verify Services

After the restart, SSH back in and check:

```bash
# Check supervisor status
supervisorctl -c /workspace/supervisord.conf status
# Expected output:
# comfyui    RUNNING   pid 123, uptime 0:01:00
# (orchestrator is disabled by default - enable it for vLLM)

# Test ComfyUI
curl -I http://localhost:8188

# Test Supervisor web UI
curl -I http://localhost:9001
```
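If you script your deployments, a small polling loop can replace the manual `curl` checks above. A minimal sketch, assuming the default ports from this guide:

```bash
#!/bin/bash
# Wait until ComfyUI (8188) and the Supervisor web UI (9001) answer HTTP requests
for port in 8188 9001; do
  until curl -s -o /dev/null "http://localhost:${port}"; do
    echo "Waiting for port ${port}..."
    sleep 5
  done
  echo "Port ${port} is up"
done
```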
## Step 5: Subsequent Deployments

After the initial setup, deploying new pods is quick (2-3 minutes):

1. **Deploy pod** with the same template + network volume
2. **Wait for startup** (~1-2 minutes for services to start)
3. **Access services:**
   - ComfyUI: `http://<pod-ip>:8188`
   - Supervisor: `http://<pod-ip>:9001`

**All models, configuration, and data persist on the network volume!**

## Step 6: Access Services

### Via Direct IP (HTTP)

Get the pod IP and ports from the RunPod dashboard:

```
ComfyUI:          http://<pod-ip>:8188
Supervisor UI:    http://<pod-ip>:9001
Orchestrator API: http://<pod-ip>:9000
SSH:              ssh root@<pod-ip> -p <ssh-port>
```

### Via Tailscale VPN (Recommended)

If you configured `TAILSCALE_AUTHKEY`, the pod automatically joins your Tailscale network:

1. **Get Tailscale IP:**

   ```bash
   ssh root@<pod-ip> -p <ssh-port>
   tailscale ip -4
   # Example output: 100.114.60.40
   ```

2. **Access via Tailscale:**

   ```
   ComfyUI:      http://<tailscale-ip>:8188
   Supervisor:   http://<tailscale-ip>:9001
   Orchestrator: http://<tailscale-ip>:9000
   SSH:          ssh root@<tailscale-ip>
   ```

3. **Update the LiteLLM config** on your VPS with the Tailscale IP

## Service Management

### Start/Stop Services

```bash
# Start all services
supervisorctl -c /workspace/supervisord.conf start all

# Stop all services
supervisorctl -c /workspace/supervisord.conf stop all

# Restart a specific service
supervisorctl -c /workspace/supervisord.conf restart comfyui

# View status
supervisorctl -c /workspace/supervisord.conf status
```

### Enable vLLM Models (Text Generation)

By default, only ComfyUI runs (to save VRAM). To enable vLLM:

1. **Stop ComfyUI** (frees up VRAM):

   ```bash
   supervisorctl -c /workspace/supervisord.conf stop comfyui
   ```

2. **Start the orchestrator** (manages vLLM models):

   ```bash
   supervisorctl -c /workspace/supervisord.conf start orchestrator
   ```

3. **Test text generation:**

   ```bash
   curl -X POST http://localhost:9000/v1/chat/completions \
     -H 'Content-Type: application/json' \
     -d '{"model":"qwen-2.5-7b","messages":[{"role":"user","content":"Hello"}]}'
   ```

### Switch Back to ComfyUI

```bash
# Stop orchestrator (stops all vLLM models)
supervisorctl -c /workspace/supervisord.conf stop orchestrator

# Start ComfyUI
supervisorctl -c /workspace/supervisord.conf start comfyui
```
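Because only one GPU workload should run at a time, the two command pairs above are natural candidates for a small helper. A hypothetical `switch-mode.sh` sketch (not part of the repository) wrapping the same `supervisorctl` calls:

```bash
#!/bin/bash
# switch-mode.sh (hypothetical helper): toggle between ComfyUI and vLLM.
# Usage: ./switch-mode.sh comfyui|vllm
CONF=/workspace/supervisord.conf

case "$1" in
  comfyui)
    supervisorctl -c "$CONF" stop orchestrator   # frees VRAM held by vLLM
    supervisorctl -c "$CONF" start comfyui
    ;;
  vllm)
    supervisorctl -c "$CONF" stop comfyui        # frees VRAM held by ComfyUI
    supervisorctl -c "$CONF" start orchestrator
    ;;
  *)
    echo "Usage: $0 {comfyui|vllm}" >&2
    exit 1
    ;;
esac
```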
## Updating the Template

When you make changes to code or configuration:

### Update Docker Image

```bash
# 1. Make changes to Dockerfile or start.sh

# 2. Push to repository
git add .
git commit -m "Update template configuration"
git push origin main

# 3. Gitea workflow auto-builds the new image

# 4. Terminate the old pod and deploy a new one with the updated image
```

### Update Network Volume Data

```bash
# SSH into the running pod
ssh root@<pod-ip> -p <ssh-port>

# Update repository
cd /workspace/ai
git pull

# Re-run Ansible if needed
ansible-playbook playbook.yml --tags <tags>

# Restart services
supervisorctl -c /workspace/supervisord.conf restart all
```

## Troubleshooting

### Pod fails to start

**Check logs:**

```bash
# Via SSH
cat /workspace/logs/supervisord.log
cat /workspace/logs/comfyui.err.log

# Via RunPod web terminal
tail -f /workspace/logs/*.log
```

**Common issues:**
- Missing `.env` file → Create `/workspace/ai/.env` with the required vars
- Supervisor config not found → Ensure `/workspace/ai/supervisord.conf` exists
- Port conflicts → Check whether services are already running

### Tailscale not connecting

**Check Tailscale status:**

```bash
tailscale status
tailscale ip -4
```

**Common issues:**
- Missing or invalid `TAILSCALE_AUTHKEY` in `.env`
- Auth key expired → Generate a new key in the Tailscale admin console
- Firewall blocking → RunPod should allow Tailscale by default

### Services not starting

**Check Supervisor:**

```bash
supervisorctl -c /workspace/supervisord.conf status
supervisorctl -c /workspace/supervisord.conf tail -f comfyui
```

**Common issues:**
- venv broken → Re-run `scripts/bootstrap-venvs.sh`
- Models not downloaded → Run the Ansible playbook again
- Python version mismatch → Rebuild the venvs

### Out of VRAM

**Check GPU memory:**

```bash
nvidia-smi
```

**RTX 4090 (24GB) capacity:**
- ComfyUI (FLUX Schnell): ~23GB (can't run alongside vLLM)
- vLLM (Qwen 2.5 7B): ~14GB
- vLLM (Llama 3.1 8B): ~17GB

**Solution:** Only run one service at a time (see the Service Management section)

### Network volume full

**Check disk usage:**

```bash
df -h /workspace
du -sh /workspace/*
```

**Clean up:**

```bash
# Remove old HuggingFace cache
rm -rf /workspace/huggingface_cache

# Re-download essential models only
cd /workspace/ai
ansible-playbook playbook.yml --tags comfyui-essential
```

## Cost Optimization

### Spot vs On-Demand

- **Spot instances**: ~70% cheaper, can be interrupted
- **On-Demand**: More expensive, guaranteed availability

**Recommendation:** Use spot for development, on-demand for production.

### Network Volume Pricing

- First 1TB: $0.07/GB/month
- Beyond 1TB: $0.05/GB/month

**200GB volume cost:** 200GB × $0.07/GB/month ≈ $14/month

### Pod Auto-Stop

Configure auto-stop in the RunPod pod settings to save costs when idle:

- Stop after 15 minutes idle
- Stop after 1 hour idle
- Manual stop only

## Advanced Configuration

### Custom Environment Variables

Add these to the template or pod environment variables:

```bash
# Model cache locations
HF_HOME=/workspace/huggingface_cache
TRANSFORMERS_CACHE=/workspace/huggingface_cache

# ComfyUI settings
COMFYUI_PORT=8188
COMFYUI_LISTEN=0.0.0.0

# Orchestrator settings
ORCHESTRATOR_PORT=9000

# GPU settings
CUDA_VISIBLE_DEVICES=0
```

### Multiple Network Volumes

You can attach multiple network volumes for organization:

1. **Models volume** - `/workspace/models` (read-only, shared)
2. **Data volume** - `/workspace/data` (read-write, per-project)

### Custom Startup Script

Override `/start.sh` behavior by creating `/workspace/custom-start.sh`:

```bash
#!/bin/bash
# Custom startup commands

# Source default startup
source /start.sh

# Add your custom commands here
echo "Running custom initialization..."
```
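If `/start.sh` only runs the override when the file is executable (an assumption; this guide does not show the start script's internals), mark it accordingly:

```bash
# Assumption: /start.sh checks for an executable /workspace/custom-start.sh
chmod +x /workspace/custom-start.sh
```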
## References

- [RunPod Documentation](https://docs.runpod.io/)
- [RunPod Templates Overview](https://docs.runpod.io/pods/templates/overview)
- [Network Volumes Guide](https://docs.runpod.io/storage/network-volumes)
- [ComfyUI Documentation](https://github.com/comfyanonymous/ComfyUI)
- [Supervisor Documentation](http://supervisord.org/)
- [Tailscale Documentation](https://tailscale.com/kb/)

## Support

For issues or questions:

- Check the troubleshooting section above
- Review the `/workspace/logs/` files
- Check the RunPod community forums
- Open an issue in the project repository