# RunPod Template Creation Guide

This guide shows you how to create a reusable RunPod template so you never have to reinstall everything from scratch when Spot instances restart.

## Why Create a Template?

**Without Template** (Manual Setup Every Time):

- ❌ Install Docker & Docker Compose (10-15 min)
- ❌ Install Tailscale (5 min)
- ❌ Pull Docker images (10-20 min)
- ❌ Download models: Qwen (~14GB), Flux (~12GB), MusicGen (~11GB) = 30-45 min
- ❌ Configure everything (5-10 min)
- **Total: 60-90 minutes per Spot instance restart**

**With Template** (Ready to Go):

- ✅ Everything pre-installed
- ✅ Models cached in `/workspace`
- ✅ Just start the orchestrator
- **Total: 2-3 minutes**

## Template Contents

### System Software

- ✅ Docker 24.x + Docker Compose v2
- ✅ Tailscale (latest)
- ✅ NVIDIA Docker runtime
- ✅ Python 3.11
- ✅ Git, curl, wget, htop, nvtop

### Docker Images (Pre-built)

- ✅ `ai_orchestrator` - Model orchestration service
- ✅ `ai_vllm-qwen_1` - Text generation (vLLM + Qwen 2.5 7B)
- ✅ `ai_musicgen_1` - Music generation (AudioCraft)
- ✅ `ghcr.io/matatonic/openedai-images-flux:latest` - Image generation

### Model Cache (`/workspace` - Persistent)

- ✅ Qwen 2.5 7B Instruct (~14GB)
- ✅ Flux.1 Schnell (~12GB)
- ✅ MusicGen Medium (~11GB)
- **Total: ~37GB cached**

### Project Files (`/workspace/ai`)

- ✅ All orchestrator code
- ✅ Docker Compose configurations
- ✅ Model service configurations
- ✅ Documentation

---

## Step-by-Step Template Creation

### Prerequisites

1. RunPod account
2. Active RTX 4090 pod (or similar GPU)
3. SSH access to the pod
4. This repository cloned locally

### Step 1: Deploy Fresh Pod

```bash
# Create a new RunPod instance:
# - GPU: RTX 4090 (24GB VRAM)
# - Disk: 50GB container disk
# - Network Volume: attach or create a 100GB+ volume
# - Template: start from an official PyTorch or CUDA template

# Note the SSH connection details (host, port, password)
```

### Step 2: Prepare the Instance

Run the automated preparation script:

```bash
# On your local machine, copy everything to RunPod
scp -P <PORT> -r /home/valknar/Projects/runpod/* root@<HOST>:/workspace/ai/

# SSH to the pod
ssh -p <PORT> root@<HOST>

# Run the preparation script
cd /workspace/ai
chmod +x scripts/prepare-template.sh
./scripts/prepare-template.sh
```

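If the `scp` copy is interrupted (large trees over a Spot-instance link), `rsync` over SSH is a resumable alternative to the command above:

```bash
# Resumable alternative to scp; re-running skips files already transferred
rsync -avz -e "ssh -p <PORT>" /home/valknar/Projects/runpod/ root@<HOST>:/workspace/ai/
```
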
**What the script does:**

1. Installs Docker & Docker Compose
2. Installs Tailscale
3. Builds all Docker images
4. Pre-downloads all models
5. Validates that everything works
6. Cleans up temporary files

**Estimated time: 45-60 minutes**

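For orientation, here is a condensed sketch of the steps such a preparation script performs. It is illustrative, not the actual contents of `scripts/prepare-template.sh`; the Compose profile names are taken from the troubleshooting section below.

```bash
#!/usr/bin/env bash
set -euo pipefail

# 1-2. Install Docker (official convenience script) and Tailscale
curl -fsSL https://get.docker.com | sh
curl -fsSL https://tailscale.com/install.sh | sh

# 3. Build all service images (compose.yaml is picked up automatically)
cd /workspace/ai
docker compose build

# 4. Pre-download models by starting each service once and letting its
#    cache fill (in practice, watch the logs instead of sleeping blindly)
for profile in text image audio; do
  docker compose --profile "$profile" up -d
  sleep 300
  docker compose --profile "$profile" down
done

# 5-6. Validate the compose file and clean up build cache
docker compose config -q
docker builder prune -af
```
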
### Step 3: Manual Verification

After the script completes, verify everything:

```bash
# Check Docker is installed
docker --version
docker compose version

# Check Tailscale
tailscale version

# Check all images are built
docker images | grep ai_

# Check models are cached
ls -lh /workspace/huggingface_cache/
ls -lh /workspace/flux/models/
ls -lh /workspace/musicgen/models/

# Test that the orchestrator starts (compose.yaml is picked up automatically)
cd /workspace/ai
docker compose up -d orchestrator
docker logs ai_orchestrator

# Test model loading (should be fast since models are cached)
curl http://localhost:9000/health

# Stop the orchestrator
docker compose down
```

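If the health check races the container start, a short retry loop helps. This assumes `/health` returns HTTP 200 once the orchestrator is ready, as the checks above suggest:

```bash
# Poll the orchestrator health endpoint for up to ~60 seconds
for i in $(seq 1 12); do
  curl -fsS http://localhost:9000/health >/dev/null && { echo "healthy"; break; }
  sleep 5
done
```
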
### Step 4: Clean Up Before Saving

**IMPORTANT**: Remove secrets and temporary data before creating the template!

```bash
# Remove sensitive data
rm -f /workspace/ai/.env
rm -f /root/.ssh/known_hosts
rm -f /root/.bash_history

# Clear logs
rm -f /var/log/*.log

# Clean stopped containers, dangling images, and build cache while keeping
# tagged images (note: `docker system prune -a` would delete the pre-built
# images and defeat the purpose of the template)
docker system prune -f
docker builder prune -af

# Clear Tailscale state (will re-authenticate on first use)
tailscale logout

# Create template-ready marker
echo "RunPod Multi-Modal AI Template v1.0" > /workspace/TEMPLATE_VERSION
echo "Created: $(date)" >> /workspace/TEMPLATE_VERSION
```

### Step 5: Save Template in RunPod Dashboard

1. **Go to RunPod Dashboard** → "My Pods"
2. **Select your prepared pod**
3. **Click the "⋮" menu** → "Save as Template"
4. **Template Configuration**:
   - **Name**: `multi-modal-ai-v1.0`
   - **Description**:

     ```
     Multi-Modal AI Stack with Orchestrator
     - Text: vLLM + Qwen 2.5 7B
     - Image: Flux.1 Schnell
     - Music: MusicGen Medium
     - Models pre-cached (~37GB)
     - Ready to deploy in 2-3 minutes
     ```

   - **Category**: `AI/ML`
   - **Docker Image**: (auto-detected)
   - **Container Disk**: 50GB
   - **Expose Ports**: 9000, 8001, 8002, 8003
   - **Environment Variables** (optional):

     ```
     HF_TOKEN=<leave empty, user will add>
     TAILSCALE_AUTHKEY=<leave empty, user will add>
     ```

5. **Click "Save Template"**
6. **Wait for template creation** (5-10 minutes)
7. **Test the template** by deploying a new pod with it

---

## Using Your Template

### Deploy New Pod from Template

1. **RunPod Dashboard** → "➕ Deploy"
2. **Select "Community Templates"** or "My Templates"
3. **Choose**: `multi-modal-ai-v1.0`
4. **Configure**:
   - GPU: RTX 4090 (or compatible)
   - Network Volume: attach your existing volume with the `/workspace` mount
   - Environment:
     - `HF_TOKEN`: your Hugging Face token
     - (Tailscale will be configured via SSH)
5. **Deploy Pod**

### First-Time Setup (On New Pod)

```bash
# SSH to the new pod
ssh -p <PORT> root@<HOST>

# Navigate to the project
cd /workspace/ai

# Create the .env file
cat > .env <<EOF
HF_TOKEN=hf_your_token_here
GPU_TAILSCALE_IP=100.100.108.13
EOF

# Configure Tailscale (one-time)
tailscale up --authkey=<YOUR_TAILSCALE_KEY>

# Start the orchestrator (models are already cached, so it starts in seconds)
docker compose up -d orchestrator

# Verify
curl http://localhost:9000/health

# Check logs
docker logs -f ai_orchestrator
```

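To confirm Compose picked up the `.env` values (Compose reads `.env` from the project directory automatically), render the effective configuration; the variable names match the file created above:

```bash
cd /workspace/ai
docker compose config | grep -iE 'hf_token|tailscale_ip'   # interpolated values should appear
```
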
**Total setup time: 2-3 minutes!** 🎉

### Updating SSH Config (If Spot Instance Restarts)

Since Spot instances can restart with new IPs/ports, update `~/.ssh/config` on your local machine with the new connection details:

```
Host gpu-pivoine
    HostName <NEW_IP>
    Port <NEW_PORT>
    User root
    IdentityFile ~/.ssh/id_ed25519
```

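After updating the entry, a quick connectivity test confirms the new details work (the alias name matches the config above):

```bash
# Should print the pod's hostname and its GPU
ssh gpu-pivoine 'hostname && nvidia-smi -L'
```
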
---

## Template Maintenance

### Updating the Template

When you add new models or make improvements:

1. Deploy a pod from your existing template
2. Make your changes
3. Test everything
4. Clean up (remove secrets)
5. Save as a new template version: `multi-modal-ai-v1.1`
6. Update your documentation

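A hedged sketch of the on-pod portion of that cycle (steps 2-4), reusing commands from earlier sections; adapt it to whatever you actually changed:

```bash
cd /workspace/ai
git pull                           # pull (or scp) your latest changes
docker compose build               # rebuild affected images
docker compose up -d orchestrator  # smoke-test
curl -fsS http://localhost:9000/health
docker compose down

# Then clean up secrets as in Step 4 before saving the new template version
rm -f .env /root/.bash_history
```
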
### Version History

Keep track of template versions:

```
v1.0 (2025-11-21) - Initial release
- Text: Qwen 2.5 7B
- Image: Flux.1 Schnell
- Music: MusicGen Medium
- Docker orchestrator

v1.1 (future) - Planned
- Add Llama 3.1 8B
- Add Whisper Large v3
- Optimize model loading
```

---

## Troubleshooting Template Creation

### Models Not Downloading

```bash
# Manually trigger model downloads
docker compose --profile text up -d vllm-qwen
docker logs -f ai_vllm-qwen_1
# Wait for "Model loaded successfully"
docker compose stop vllm-qwen

# Repeat for the other models
docker compose --profile image up -d flux
docker compose --profile audio up -d musicgen
```

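To block until a model finishes loading instead of watching the logs by hand, grep for the ready message quoted above (the exact log text may vary by service):

```bash
# Exits as soon as the ready message appears in the container logs
docker logs -f ai_vllm-qwen_1 2>&1 | grep -m1 "Model loaded successfully"
```
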
### Docker Images Not Building

```bash
# Build images one at a time (compose.yaml is picked up automatically)
docker compose build orchestrator
docker compose build vllm-qwen
docker compose build musicgen

# Check build logs for errors
docker compose build --no-cache --progress=plain orchestrator
```

### Tailscale Won't Install

```bash
# Manual Tailscale installation
curl -fsSL https://tailscale.com/install.sh | sh

# Start the daemon (userspace networking works without a /dev/net/tun device)
tailscaled --tun=userspace-networking --socks5-server=localhost:1055 &

# Test
tailscale version
```

### Template Too Large

RunPod templates have size limits. If your template is too large:

**Option 1**: Use a network volume for models
- Move models to the network volume: `/workspace/models/`
- Mount the volume when deploying from the template
- Models persist across pod restarts

**Option 2**: Reduce cached models
- Only cache the most-used model (Qwen 2.5 7B)
- Download the others on first use
- Accept a slightly longer first-time startup

**Option 3**: Use Docker layer optimization
```dockerfile
# In the Dockerfile, order commands by change frequency:
# stable layers (base image, system packages) come first,
# frequently changing layers (application code) come last,
# so rebuilds reuse the cached stable layers.
```

---

## Cost Analysis

### Template Storage Cost

- RunPod charges for template storage: ~$0.10/GB/month
- This template: ~50GB = **~$5/month**
- **Worth it!** Saves 60-90 minutes per Spot restart

### Time Savings

- Spot instance restarts: 2-5 times per week (highly variable)
- Time saved per restart: 60-90 minutes
- **Total saved per month: roughly 8-30 hours** (8-20 restarts × 1-1.5 hours each)
- **Value: priceless for rapid deployment**

---

## Advanced: Automated Template Updates

Create a CI/CD pipeline to automatically update templates:

```bash
# GitHub Actions workflow (future enhancement)
# 1. Deploy a pod from the template
# 2. Pull the latest code
# 3. Rebuild images
# 4. Test
# 5. Save a new template version
# 6. Notify the team
```

---

## Template Checklist

Before saving your template, verify:

- [ ] All Docker images built and working
- [ ] All models downloaded and cached
- [ ] Tailscale installed (but logged out)
- [ ] Docker Compose files present
- [ ] `.env` file removed (secrets cleared)
- [ ] Logs cleared
- [ ] SSH keys removed
- [ ] Bash history cleared
- [ ] Template version documented
- [ ] Test deployment successful

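A small spot-check for the secret-related items, using the paths from Step 4; treat it as a convenience, not a guarantee:

```bash
# Verify secrets were removed before saving the template
for f in /workspace/ai/.env /root/.bash_history /root/.ssh/known_hosts; do
  [ -e "$f" ] && echo "WARNING: $f still present" || echo "OK: $f removed"
done
tailscale status 2>&1 | head -n1   # should report a logged-out state
```
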
---

## Support

If you have issues creating the template:

1. Check the `/workspace/ai/scripts/prepare-template.sh` logs
2. Review Docker build logs: `docker compose build --progress=plain`
3. Check model download logs: `docker logs <container>`
4. Verify disk space: `df -h`
5. Check that the network volume is mounted: `mount | grep workspace`

For RunPod-specific issues:

- RunPod Docs: https://docs.runpod.io/
- RunPod Discord: https://discord.gg/runpod

---

## Next Steps

After creating your template:

1. ✅ Test deployment from the template
2. ✅ Document it in `GPU_DEPLOYMENT_LOG.md`
3. ✅ Share the template ID with your team (if applicable)
4. ✅ Set up monitoring (Netdata, etc.)
5. ✅ Configure auto-stop for cost optimization
6. ✅ Add more models as needed

**Your multi-modal AI infrastructure is now portable and reproducible!** 🚀