RunPod Template Setup Guide
This guide explains how to deploy the AI Orchestrator (ComfyUI + vLLM) on RunPod using a custom Docker template and network volume.
Architecture Overview
The deployment uses a two-tier strategy:
- Docker Image (software layer) - Contains system packages, Supervisor, Tailscale
- Network Volume (data layer) - Contains models, ComfyUI installation, venvs, configuration
This approach allows fast pod deployment (~2-3 minutes) while keeping all large files (models, ~80-200GB) on a persistent network volume.
Prerequisites
- RunPod account with credits
- Gitea account with container registry enabled (for hosting the template image)
- HuggingFace account with API token (for model downloads)
- Tailscale account with auth key (optional, for VPN access)
Step 1: Build and Push Docker Image
Option A: Automated Build (Recommended)
The repository includes a Gitea workflow that automatically builds and pushes the Docker image to your Gitea container registry when you push to the main branch or create a version tag.
1. Configure Gitea Secret:
   - Go to your Gitea repository → Settings → Secrets
   - Add REGISTRY_TOKEN = your Gitea access token with registry permissions
   - (The workflow automatically uses your Gitea username via gitea.actor)
2. Trigger Build:
   # Push to main branch
   git push origin main
   # Or create a version tag
   git tag v1.0.0
   git push origin v1.0.0
3. Monitor Build:
   - Go to the Actions tab in Gitea
   - Wait for the build to complete (~5-10 minutes)
   - Note the Docker image name: dev.pivoine.art/valknar/runpod-ai-orchestrator:latest
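To confirm the push succeeded, you can pull the image from any Docker host (assumes your registry credentials are valid):
docker login dev.pivoine.art
docker pull dev.pivoine.art/valknar/runpod-ai-orchestrator:latest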
Option B: Manual Build
If you prefer to build manually:
# From the repository root
cd /path/to/runpod
# Build the image
docker build -t dev.pivoine.art/valknar/runpod-ai-orchestrator:latest .
# Login to your Gitea registry
docker login dev.pivoine.art
# Push to Gitea registry
docker push dev.pivoine.art/valknar/runpod-ai-orchestrator:latest
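To publish a versioned tag alongside latest (same image, second tag; v1.0.0 is an example version):
docker tag dev.pivoine.art/valknar/runpod-ai-orchestrator:latest \
  dev.pivoine.art/valknar/runpod-ai-orchestrator:v1.0.0
docker push dev.pivoine.art/valknar/runpod-ai-orchestrator:v1.0.0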
Step 2: Create Network Volume
Network volumes persist your models and data across pod restarts and rebuilds.
1. Go to RunPod Dashboard → Storage → Network Volumes
2. Click "New Network Volume"
3. Configure:
   - Name: ai-orchestrator-models
   - Size: 200GB (adjust based on your needs)
     - Essential models only: ~80GB
     - All models: ~137-200GB
   - Datacenter: Choose the one closest to you (the volume is tied to a datacenter)
4. Click "Create Volume"
5. Note the Volume ID (e.g., vol-abc123def456) for pod deployment
Storage Requirements
| Configuration | Size | Models Included |
|---|---|---|
| Essential | ~80GB | FLUX Schnell, 1-2 SDXL checkpoints, MusicGen Medium |
| Complete | ~137GB | All image/video/audio models from playbook |
| Full + vLLM | ~200GB | Complete + Qwen 2.5 7B + Llama 3.1 8B |
Step 3: Create RunPod Template
1. Go to RunPod Dashboard → Templates
2. Click "New Template"
3. Configure Template Settings:

   Container Configuration:
   - Template Name: AI Orchestrator (ComfyUI + vLLM)
   - Template Type: Docker
   - Container Image: dev.pivoine.art/valknar/runpod-ai-orchestrator:latest
   - Container Disk: 50GB (for system and temp files)
   - Docker Command: Leave empty (uses the default /start.sh)

   Volume Configuration:
   - Volume Mount Path: /workspace
   - Attach to Network Volume: Select your volume ID from Step 2

   Port Configuration:
   - Expose HTTP Ports: 8188, 9000, 9001
     - 8188 - ComfyUI web interface
     - 9000 - Model orchestrator API
     - 9001 - Supervisor web UI
   - Expose TCP Ports: 22 (SSH access)

   Environment Variables:
   HF_TOKEN=your_huggingface_token_here
   TAILSCALE_AUTHKEY=tskey-auth-your_tailscale_authkey_here
   SUPERVISOR_BACKEND_HOST=localhost
   SUPERVISOR_BACKEND_PORT=9001

   Advanced Settings:
   - Start Jupyter: No
   - Start SSH: Yes (handled by the base image)

4. Click "Save Template"
Step 4: First Deployment (Initial Setup)
The first time you deploy, you need to set up the network volume with models and configuration.
4.1 Deploy Pod
1. Go to RunPod Dashboard → Pods
2. Click "Deploy" or "GPU Pods"
3. Select your custom template: AI Orchestrator (ComfyUI + vLLM)
4. Configure GPU:
   - GPU Type: RTX 4090 (24GB VRAM) or higher
   - Network Volume: Select your volume from Step 2
   - On-Demand vs Spot: Choose based on budget
5. Click "Deploy"
4.2 SSH into Pod
# Get pod SSH command from RunPod dashboard
ssh root@<pod-ip> -p <port> -i ~/.ssh/id_ed25519
# Or use RunPod web terminal
4.3 Initial Setup on Network Volume
# 1. Clone the repository to /workspace/ai
cd /workspace
git clone https://github.com/your-username/runpod.git ai
cd ai
# 2. Create .env file with your credentials
cp .env.example .env
nano .env
# Edit and add:
# HF_TOKEN=your_huggingface_token
# TAILSCALE_AUTHKEY=tskey-auth-your_key
# GPU_TAILSCALE_IP=<will be set automatically>
# 3. Download essential models (this takes 30-60 minutes)
ansible-playbook playbook.yml --tags comfyui-essential
# OR download all models (1-2 hours)
ansible-playbook playbook.yml --tags comfyui-models-all
# 4. Link models to ComfyUI
bash scripts/link-comfyui-models.sh
# OR if arty is available
arty run models/link-comfyui
# 5. Install ComfyUI custom nodes dependencies
cd /workspace/ComfyUI/custom_nodes/ComfyUI-Manager
pip install -r requirements.txt
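# (Optional, hedged) also install requirements for any other custom nodes that ship them:
for req in /workspace/ComfyUI/custom_nodes/*/requirements.txt; do
    [ -f "$req" ] || continue
    pip install -r "$req"
done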
cd /workspace/ai
# 6. Restart the container to apply all changes
exit
# Go to RunPod dashboard → Stop pod → Start pod
4.4 Verify Services
After restart, SSH back in and check:
# Check supervisor status
supervisorctl -c /workspace/supervisord.conf status
# Expected output:
# comfyui RUNNING pid 123, uptime 0:01:00
# (orchestrator is disabled by default - enable for vLLM)
# Test ComfyUI
curl -I http://localhost:8188
# Test Supervisor web UI
curl -I http://localhost:9001
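If you script this check, a small poll loop avoids re-running curl by hand (plain bash, no extra tooling assumed):
# Wait until ComfyUI responds on port 8188
until curl -sf -o /dev/null http://localhost:8188; do
    sleep 5
done
echo "ComfyUI is up"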
Step 5: Subsequent Deployments
After initial setup, deploying new pods is quick (2-3 minutes):
1. Deploy a pod with the same template + network volume
2. Wait for startup (~1-2 minutes for services to start)
3. Access services:
   - ComfyUI: http://<pod-ip>:8188
   - Supervisor: http://<pod-ip>:9001
All models, configuration, and data persist on the network volume!
Step 6: Access Services
Via Direct IP (HTTP)
Get pod IP and ports from RunPod dashboard:
ComfyUI: http://<pod-ip>:8188
Supervisor UI: http://<pod-ip>:9001
Orchestrator API: http://<pod-ip>:9000
SSH: ssh root@<pod-ip> -p <port>
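If you prefer not to expose the HTTP ports publicly, standard SSH port forwarding works with the same credentials:
# Forward ComfyUI and Supervisor to your local machine
ssh -L 8188:localhost:8188 -L 9001:localhost:9001 root@<pod-ip> -p <port>
# Then browse http://localhost:8188 and http://localhost:9001 locally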
Via Tailscale VPN (Recommended)
If you configured TAILSCALE_AUTHKEY, the pod automatically joins your Tailscale network:
1. Get the Tailscale IP:
   ssh root@<pod-ip> -p <port> tailscale ip -4
   # Example output: 100.114.60.40
2. Access via Tailscale:
   ComfyUI: http://<tailscale-ip>:8188
   Supervisor: http://<tailscale-ip>:9001
   Orchestrator: http://<tailscale-ip>:9000
   SSH: ssh root@<tailscale-ip>
3. Update the LiteLLM config on your VPS with the Tailscale IP
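Before editing the LiteLLM config, you can check that the orchestrator is reachable from the VPS (the /v1/models path is the standard OpenAI-compatible listing; your orchestrator's exact routes may differ):
curl http://<tailscale-ip>:9000/v1/models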
Service Management
Start/Stop Services
# Start all services
supervisorctl -c /workspace/supervisord.conf start all
# Stop all services
supervisorctl -c /workspace/supervisord.conf stop all
# Restart specific service
supervisorctl -c /workspace/supervisord.conf restart comfyui
# View status
supervisorctl -c /workspace/supervisord.conf status
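Typing the full -c flag gets repetitive; a shell alias helps (add it to ~/.bashrc on the pod if you want it to persist):
alias sv='supervisorctl -c /workspace/supervisord.conf'
# Then: sv status, sv restart comfyui, etc.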
Enable vLLM Models (Text Generation)
By default, only ComfyUI runs (to save VRAM). To enable vLLM:
1. Stop ComfyUI (frees up VRAM):
   supervisorctl -c /workspace/supervisord.conf stop comfyui
2. Start the orchestrator (manages vLLM models):
   supervisorctl -c /workspace/supervisord.conf start orchestrator
3. Test text generation:
   curl -X POST http://localhost:9000/v1/chat/completions \
     -H 'Content-Type: application/json' \
     -d '{"model":"qwen-2.5-7b","messages":[{"role":"user","content":"Hello"}]}'
Switch Back to ComfyUI
# Stop orchestrator (stops all vLLM models)
supervisorctl -c /workspace/supervisord.conf stop orchestrator
# Start ComfyUI
supervisorctl -c /workspace/supervisord.conf start comfyui
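If you switch often, a small helper script saves keystrokes. A minimal sketch (the path scripts/switch-mode.sh is hypothetical, not part of the repo):
#!/bin/bash
# Switch the GPU between ComfyUI and the vLLM orchestrator
SV="supervisorctl -c /workspace/supervisord.conf"
case "$1" in
    comfyui)      $SV stop orchestrator && $SV start comfyui ;;
    orchestrator) $SV stop comfyui && $SV start orchestrator ;;
    *) echo "usage: $0 {comfyui|orchestrator}" >&2; exit 1 ;;
esac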
Updating the Template
When you make changes to code or configuration:
Update Docker Image
# 1. Make changes to Dockerfile or start.sh
# 2. Push to repository
git add .
git commit -m "Update template configuration"
git push origin main
# 3. Gitea workflow auto-builds new image
# 4. Terminate old pod and deploy new one with updated image
Update Network Volume Data
# SSH into running pod
ssh root@<pod-ip> -p <port>
# Update repository
cd /workspace/ai
git pull
# Re-run Ansible if needed
ansible-playbook playbook.yml --tags <specific-tag>
# Restart services
supervisorctl -c /workspace/supervisord.conf restart all
Troubleshooting
Pod fails to start
Check logs:
# Via SSH
cat /workspace/logs/supervisord.log
cat /workspace/logs/comfyui.err.log
# Via RunPod web terminal
tail -f /workspace/logs/*.log
Common issues:
- Missing .env file → Create /workspace/ai/.env with the required vars
- Supervisor config not found → Ensure /workspace/ai/supervisord.conf exists
- Port conflicts → Check if services are already running
Tailscale not connecting
Check Tailscale status:
tailscale status
tailscale ip -4
Common issues:
- Missing or invalid TAILSCALE_AUTHKEY in .env
- Auth key expired → Generate a new key in the Tailscale admin console
- Firewall blocking → RunPod should allow Tailscale by default
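If the key expired, you can usually re-authenticate in place with a fresh key (standard Tailscale CLI; newer releases also accept the spelling --auth-key):
tailscale up --authkey=tskey-auth-<new-key>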
Services not starting
Check Supervisor:
supervisorctl -c /workspace/supervisord.conf status
supervisorctl -c /workspace/supervisord.conf tail -f comfyui
Common issues:
- venv broken → Re-run scripts/bootstrap-venvs.sh
- Models not downloaded → Run the Ansible playbook again
- Python version mismatch → Rebuild the venvs
Out of VRAM
Check GPU memory:
nvidia-smi
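For a compact, pollable view of VRAM use (standard nvidia-smi query flags):
watch -n 2 'nvidia-smi --query-gpu=memory.used,memory.total --format=csv,noheader'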
RTX 4090 (24GB) capacity:
- ComfyUI (FLUX Schnell): ~23GB (leaves no room to run vLLM alongside)
- vLLM (Qwen 2.5 7B): ~14GB
- vLLM (Llama 3.1 8B): ~17GB
Solution: Only run one service at a time (see Service Management section)
Network volume full
Check disk usage:
df -h /workspace
du -sh /workspace/*
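To see which directories dominate, go one level deeper and sort by size (GNU coreutils):
du -h --max-depth=2 /workspace | sort -rh | head -20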
Clean up:
# Remove old HuggingFace cache
rm -rf /workspace/huggingface_cache
# Re-download essential models only
cd /workspace/ai
ansible-playbook playbook.yml --tags comfyui-essential
Cost Optimization
Spot vs On-Demand
- Spot instances: ~70% cheaper, can be interrupted
- On-Demand: More expensive, guaranteed availability
Recommendation: Use spot for development, on-demand for production
Network Volume Pricing
- First 1TB: $0.07/GB/month
- Beyond 1TB: $0.05/GB/month
200GB volume cost: ~$14/month
Pod Auto-Stop
Configure auto-stop in RunPod pod settings to save costs when idle:
- Stop after 15 minutes idle
- Stop after 1 hour idle
- Manual stop only
Advanced Configuration
Custom Environment Variables
Add to template or pod environment variables:
# Model cache locations
HF_HOME=/workspace/huggingface_cache
TRANSFORMERS_CACHE=/workspace/huggingface_cache
# ComfyUI settings
COMFYUI_PORT=8188
COMFYUI_LISTEN=0.0.0.0
# Orchestrator settings
ORCHESTRATOR_PORT=9000
# GPU settings
CUDA_VISIBLE_DEVICES=0
Multiple Network Volumes
You can attach multiple network volumes for organization:
- Models volume - /workspace/models (read-only, shared)
- Data volume - /workspace/data (read-write, per-project)
Custom Startup Script
Override /start.sh behavior by creating /workspace/custom-start.sh:
#!/bin/bash
# Custom startup commands
# Source default startup
source /start.sh
# Add your custom commands here
echo "Running custom initialization..."
References
- RunPod Documentation
- RunPod Templates Overview
- Network Volumes Guide
- ComfyUI Documentation
- Supervisor Documentation
- Tailscale Documentation
Support
For issues or questions:
- Check troubleshooting section above
- Review the log files under /workspace/logs/
- Check the RunPod community forums
- Open issue in project repository