RunPod Template Creation Guide
This guide shows you how to create a reusable RunPod template so you never have to reinstall everything from scratch when Spot instances restart.
Why Create a Template?
Without Template (Manual Setup Every Time):
- ❌ Install Docker & Docker Compose (10-15 min)
- ❌ Install Tailscale (5 min)
- ❌ Pull Docker images (10-20 min)
- ❌ Download models: Qwen (~14GB), Flux (~12GB), MusicGen (~11GB) = 30-45 min
- ❌ Configure everything (5-10 min)
- Total: 60-90 minutes per Spot instance restart
With Template (Ready to Go):
- ✅ Everything pre-installed
- ✅ Models cached in /workspace
- ✅ Just start orchestrator
- Total: 2-3 minutes
Template Contents
System Software
- ✅ Docker 24.x + Docker Compose v2
- ✅ Tailscale latest
- ✅ NVIDIA Docker runtime
- ✅ Python 3.11
- ✅ Git, curl, wget, htop, nvtop
Docker Images (Pre-built)
- ✅ ai_orchestrator - Model orchestration service
- ✅ ai_vllm-qwen_1 - Text generation (vLLM + Qwen 2.5 7B)
- ✅ ai_musicgen_1 - Music generation (AudioCraft)
- ✅ ghcr.io/matatonic/openedai-images-flux:latest - Image generation
Model Cache (/workspace - Persistent)
- ✅ Qwen 2.5 7B Instruct (~14GB)
- ✅ Flux.1 Schnell (~12GB)
- ✅ MusicGen Medium (~11GB)
- Total: ~37GB cached
Project Files (/workspace/ai)
- ✅ All orchestrator code
- ✅ Docker Compose configurations
- ✅ Model service configurations
- ✅ Documentation
Step-by-Step Template Creation
Prerequisites
- RunPod account
- Active RTX 4090 pod (or similar GPU)
- SSH access to the pod
- This repository cloned locally
Step 1: Deploy Fresh Pod
# Create new RunPod instance:
# - GPU: RTX 4090 (24GB VRAM)
# - Disk: 50GB container disk
# - Network Volume: Attach or create 100GB+ volume
# - Template: Start with official PyTorch or CUDA template
# Note the SSH connection details (host, port, password)
Step 2: Prepare the Instance
Run the automated preparation script:
# On your local machine, copy everything to RunPod
scp -P <PORT> -r /home/valknar/Projects/runpod/* root@<HOST>:/workspace/ai/
# SSH to the pod
ssh -p <PORT> root@<HOST>
# Run the preparation script
cd /workspace/ai
chmod +x scripts/prepare-template.sh
./scripts/prepare-template.sh
What the script does:
- Installs Docker & Docker Compose
- Installs Tailscale
- Builds all Docker images
- Pre-downloads all models
- Validates everything works
- Cleans up temporary files
Estimated time: 45-60 minutes
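The script itself isn't reproduced here, but a minimal sketch of the shape it might take (paths and profile names as used in this guide; the fixed sleep is a crude assumption, and the real script should poll service health instead):
#!/usr/bin/env bash
# Hypothetical outline of prepare-template.sh -- a sketch, not the actual script
set -euo pipefail
# 1. Install Docker + Compose plugin and Tailscale via the official convenience scripts
curl -fsSL https://get.docker.com | sh
curl -fsSL https://tailscale.com/install.sh | sh
# 2. Build all service images defined in compose.yaml
cd /workspace/ai
docker compose build
# 3. Warm the model caches by starting each profile once, then stopping it
for profile in text image audio; do
  docker compose --profile "$profile" up -d
  sleep 300   # crude wait so models download; poll health endpoints in practice
  docker compose --profile "$profile" down
done
# 4. Clean up temporary data but keep images and model caches
docker system prune -f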
Step 3: Manual Verification
After the script completes, verify everything:
# Check Docker is installed
docker --version
docker compose version
# Check Tailscale
tailscale version
# Check all images are built
docker images | grep ai_
# Check models are cached
ls -lh /workspace/huggingface_cache/
ls -lh /workspace/flux/models/
ls -lh /workspace/musicgen/models/
# Test orchestrator starts
cd /workspace/ai
docker compose up -d orchestrator
docker logs ai_orchestrator
# Test model loading (should be fast since models are cached)
curl http://localhost:9000/health
# Stop orchestrator
docker compose down
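If you script this verification, note that the orchestrator can take a few seconds to bind its port; a short wait loop (endpoint and timeout are the assumptions used above) avoids false negatives:
# Poll the health endpoint for up to 60 seconds before giving up
for i in $(seq 1 30); do
  curl -sf http://localhost:9000/health && break
  sleep 2
done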
Step 4: Clean Up Before Saving
IMPORTANT: Remove secrets and temporary data before creating template!
# Remove sensitive data
rm -f /workspace/ai/.env
rm -f /root/.ssh/known_hosts
rm -f /root/.bash_history
# Clear logs
rm -f /var/log/*.log
docker system prune -f --volumes # Clean dangling data and build cache; avoid -a, which would also delete the pre-built images
# Clear Tailscale state (will re-authenticate on first use)
tailscale logout
# Create template-ready marker
echo "RunPod Multi-Modal AI Template v1.0" > /workspace/TEMPLATE_VERSION
echo "Created: $(date)" >> /workspace/TEMPLATE_VERSION
Step 5: Save Template in RunPod Dashboard
1. Go to RunPod Dashboard → "My Pods"
2. Select your prepared pod
3. Click "⋮" menu → "Save as Template"
4. Template Configuration:
   - Name: multi-modal-ai-v1.0
   - Description: Multi-Modal AI Stack with Orchestrator - Text: vLLM + Qwen 2.5 7B - Image: Flux.1 Schnell - Music: MusicGen Medium - Models pre-cached (~37GB) - Ready to deploy in 2-3 minutes
   - Category: AI/ML
   - Docker Image: (auto-detected)
   - Container Disk: 50GB
   - Expose Ports: 9000, 8001, 8002, 8003
   - Environment Variables (optional): HF_TOKEN=<leave empty, user will add>, TAILSCALE_AUTHKEY=<leave empty, user will add>
5. Click "Save Template"
6. Wait for template creation (5-10 minutes)
7. Test the template by deploying a new pod with it
Using Your Template
Deploy New Pod from Template
1. RunPod Dashboard → "➕ Deploy"
2. Select "Community Templates" or "My Templates"
3. Choose: multi-modal-ai-v1.0
4. Configure:
   - GPU: RTX 4090 (or compatible)
   - Network Volume: Attach your existing volume with /workspace mount
   - Environment: HF_TOKEN: Your Hugging Face token (Tailscale will be configured via SSH)
5. Deploy Pod
First-Time Setup (On New Pod)
# SSH to the new pod
ssh -p <PORT> root@<HOST>
# Navigate to project
cd /workspace/ai
# Create .env file
cat > .env <<EOF
HF_TOKEN=hf_your_token_here
GPU_TAILSCALE_IP=100.100.108.13
EOF
# Configure Tailscale (one-time)
tailscale up --authkey=<YOUR_TAILSCALE_KEY>
# Start orchestrator (models already cached, starts in seconds!)
docker compose up -d orchestrator
# Verify
curl http://localhost:9000/health
# Check logs
docker logs -f ai_orchestrator
Total setup time: 2-3 minutes! 🎉
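If Spot restarts are frequent, these few commands could also live in a small bootstrap script (hypothetical scripts/first-boot.sh; the name, arguments, and IP are assumptions, not part of the repository):
#!/usr/bin/env bash
# Hypothetical first-boot helper: pass the HF token and Tailscale authkey as arguments
set -euo pipefail
cd /workspace/ai
cat > .env <<EOF
HF_TOKEN=${1:?usage: first-boot.sh <hf_token> <tailscale_authkey>}
GPU_TAILSCALE_IP=100.100.108.13
EOF
tailscale up --authkey="${2:?missing tailscale authkey}"
docker compose up -d orchestrator
curl -sf http://localhost:9000/health && echo "orchestrator healthy"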
Updating SSH Config (If Spot Instance Restarts)
Since Spot instances can restart with new IPs/ports:
# On your local machine
# Update ~/.ssh/config with new connection details
Host gpu-pivoine
HostName <NEW_IP>
Port <NEW_PORT>
User root
IdentityFile ~/.ssh/id_ed25519
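Once updated, one command verifies both the new connection details and GPU visibility (host alias as defined above):
ssh gpu-pivoine 'hostname && nvidia-smi --query-gpu=name,memory.total --format=csv'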
Template Maintenance
Updating the Template
When you add new models or make improvements:
- Deploy a pod from your existing template
- Make your changes
- Test everything
- Clean up (remove secrets)
- Save as new template version: multi-modal-ai-v1.1
- Update your documentation
Version History
Keep track of template versions:
v1.0 (2025-11-21) - Initial release
- Text: Qwen 2.5 7B
- Image: Flux.1 Schnell
- Music: MusicGen Medium
- Docker orchestrator
v1.1 (future) - Planned
- Add Llama 3.1 8B
- Add Whisper Large v3
- Optimize model loading
Troubleshooting Template Creation
Models Not Downloading
# Manually trigger model downloads
docker compose --profile text up -d vllm-qwen
docker logs -f ai_vllm-qwen_1
# Wait for "Model loaded successfully"
docker compose stop vllm-qwen
# Repeat for other models
docker compose --profile image up -d flux
docker compose --profile audio up -d musicgen
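To confirm the downloads actually landed in the persistent cache, compare sizes against the ~37GB total listed earlier (paths as used in Step 3):
du -sh /workspace/huggingface_cache /workspace/flux/models /workspace/musicgen/models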
Docker Images Not Building
# Build images one at a time
docker compose build orchestrator
docker compose build vllm-qwen
docker compose build musicgen
# Check build logs for errors
docker compose build --no-cache --progress=plain orchestrator
Tailscale Won't Install
# Manual Tailscale installation
curl -fsSL https://tailscale.com/install.sh | sh
# Start daemon
tailscaled --tun=userspace-networking --socks5-server=localhost:1055 &
# Test
tailscale version
Template Too Large
RunPod templates have size limits. If your template is too large:
Option 1: Use network volume for models
- Move models to network volume: /workspace/models/
- Mount volume when deploying from template
- Models persist across pod restarts
Option 2: Reduce cached models
- Only cache most-used model (Qwen 2.5 7B)
- Download others on first use
- Accept slightly longer first-time startup
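For Option 2, the one kept model could be pre-cached explicitly instead of via a full service start; a sketch assuming the huggingface_hub CLI is installed and the cache path from Step 3:
# Pre-download only Qwen into the persistent cache
export HF_HOME=/workspace/huggingface_cache
pip install -U "huggingface_hub[cli]"
huggingface-cli download Qwen/Qwen2.5-7B-Instruct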
Option 3: Use Docker layer optimization
# In Dockerfile, order commands by change frequency
# Less frequently changed layers first
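Illustratively, that ordering might look like the following generic Dockerfile (a sketch, not this project's actual build file):
# Rarely changes: base image and system packages
FROM python:3.11-slim
RUN apt-get update && apt-get install -y --no-install-recommends git && rm -rf /var/lib/apt/lists/*
# Changes occasionally: dependencies, cached unless requirements.txt changes
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Changes often: application code last, so the layers above stay cached
COPY . .
CMD ["python", "main.py"]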
Cost Analysis
Template Storage Cost
- RunPod charges for template storage: ~$0.10/GB/month
- This template: 50GB = **$5/month**
- Worth it! Saves 60-90 minutes per Spot restart
Time Savings
- Spot instance restarts: 2-5 times per week (highly variable)
- Time saved per restart: 60-90 minutes
- Total saved per month: 8-20 hours
- Value: Priceless for rapid deployment
Advanced: Automated Template Updates
Create a CI/CD pipeline to automatically update templates:
# GitHub Actions workflow (future enhancement)
# 1. Deploy pod from template
# 2. Pull latest code
# 3. Rebuild images
# 4. Test
# 5. Save new template version
# 6. Notify team
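Until that pipeline exists, the same loop can be approximated with plain SSH from any CI runner (host alias and paths are the assumptions used throughout this guide; saving the template itself remains a manual dashboard step):
# Hypothetical manual version of the future pipeline
ssh gpu-pivoine 'cd /workspace/ai && git pull && docker compose build && docker compose up -d orchestrator'
ssh gpu-pivoine 'curl -sf http://localhost:9000/health' && echo "template candidate OK"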
Template Checklist
Before saving your template, verify:
- All Docker images built and working
- All models downloaded and cached
- Tailscale installed (but logged out)
- Docker Compose files present
- .env file removed (secrets cleared)
- Logs cleared
- SSH keys removed
- Bash history cleared
- Template version documented
- Test deployment successful
Support
If you have issues creating the template:
- Check /workspace/ai/scripts/prepare-template.sh logs
- Review Docker build logs: docker compose build --progress=plain
- Check model download logs: docker logs <container>
- Verify disk space: df -h
- Check network volume is mounted: mount | grep workspace
For RunPod-specific issues:
- RunPod Docs: https://docs.runpod.io/
- RunPod Discord: https://discord.gg/runpod
Next Steps
After creating your template:
- ✅ Test deployment from template
- ✅ Document in GPU_DEPLOYMENT_LOG.md
- ✅ Share template ID with team (if applicable)
- ✅ Set up monitoring (Netdata, etc.)
- ✅ Configure auto-stop for cost optimization
- ✅ Add more models as needed
Your multi-modal AI infrastructure is now portable and reproducible! 🚀