# RunPod Template Creation Guide

This guide shows you how to create a reusable RunPod template so you never have to reinstall everything from scratch when Spot instances restart.

## Why Create a Template?

**Without Template** (Manual Setup Every Time):

- ❌ Install Docker & Docker Compose (10-15 min)
- ❌ Install Tailscale (5 min)
- ❌ Pull Docker images (10-20 min)
- ❌ Download models: Qwen (~14GB), Flux (~12GB), MusicGen (~11GB) = 30-45 min
- ❌ Configure everything (5-10 min)
- **Total: 60-90 minutes per Spot instance restart**

**With Template** (Ready to Go):

- ✅ Everything pre-installed
- ✅ Models cached in `/workspace`
- ✅ Just start the orchestrator
- **Total: 2-3 minutes**

## Template Contents

### System Software

- ✅ Docker 24.x + Docker Compose v2
- ✅ Tailscale (latest)
- ✅ NVIDIA Docker runtime
- ✅ Python 3.11
- ✅ Git, curl, wget, htop, nvtop

### Docker Images (Pre-built)

- ✅ `ai_orchestrator` - Model orchestration service
- ✅ `ai_vllm-qwen_1` - Text generation (vLLM + Qwen 2.5 7B)
- ✅ `ai_musicgen_1` - Music generation (AudioCraft)
- ✅ `ghcr.io/matatonic/openedai-images-flux:latest` - Image generation

### Model Cache (`/workspace` - Persistent)

- ✅ Qwen 2.5 7B Instruct (~14GB)
- ✅ Flux.1 Schnell (~12GB)
- ✅ MusicGen Medium (~11GB)
- **Total: ~37GB cached**

### Project Files (`/workspace/ai`)

- ✅ All orchestrator code
- ✅ Docker Compose configurations
- ✅ Model service configurations
- ✅ Documentation

---

## Step-by-Step Template Creation

### Prerequisites

1. RunPod account
2. Active RTX 4090 pod (or similar GPU)
3. SSH access to the pod
4. 
   This repository cloned locally

### Step 1: Deploy Fresh Pod

```bash
# Create new RunPod instance:
# - GPU: RTX 4090 (24GB VRAM)
# - Disk: 50GB container disk
# - Network Volume: Attach or create 100GB+ volume
# - Template: Start with official PyTorch or CUDA template

# Note the SSH connection details (host, port, password)
```

### Step 2: Prepare the Instance

Run the automated preparation script:

```bash
# On your local machine, copy everything to RunPod
scp -P <port> -r /home/valknar/Projects/runpod/* root@<host>:/workspace/ai/

# SSH to the pod
ssh -p <port> root@<host>

# Run the preparation script
cd /workspace/ai
chmod +x scripts/prepare-template.sh
./scripts/prepare-template.sh
```

**What the script does:**

1. Installs Docker & Docker Compose
2. Installs Tailscale
3. Builds all Docker images
4. Pre-downloads all models
5. Validates everything works
6. Cleans up temporary files

**Estimated time: 45-60 minutes**

### Step 3: Manual Verification

After the script completes, verify everything:

```bash
# Check Docker is installed
docker --version
docker compose version

# Check Tailscale
tailscale version

# Check all images are built
docker images | grep ai_

# Check models are cached
ls -lh /workspace/huggingface_cache/
ls -lh /workspace/flux/models/
ls -lh /workspace/musicgen/models/

# Test orchestrator starts
cd /workspace/ai
docker compose -f compose.yaml up -d orchestrator
docker logs ai_orchestrator

# Test model loading (should be fast since models are cached)
curl http://localhost:9000/health

# Stop orchestrator
docker compose -f compose.yaml down
```

### Step 4: Clean Up Before Saving

**IMPORTANT**: Remove secrets and temporary data before creating the template!
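Before running the cleanup, a quick grep can flag leftover credential strings. This is a minimal sketch; `scan_secrets` and the token prefixes (`hf_`, `tskey-`) are assumptions, not part of this repo, so extend the pattern for whatever secrets your setup uses:

```shell
# Hypothetical pre-cleanup scan: flag files that still contain token-like
# strings. The prefixes (hf_, tskey-) are assumptions; extend as needed.
scan_secrets() {
  dir="${1:-/workspace/ai}"
  if grep -rlE 'hf_[A-Za-z0-9]{20,}|tskey-[A-Za-z0-9-]+' "$dir" 2>/dev/null; then
    echo "review the files above before saving the template"
  else
    echo "no obvious tokens found"
  fi
}
```

Running `scan_secrets /workspace/ai` before and after the cleanup script below gives a quick confirmation that nothing credential-shaped survives into the template.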
```bash
# Remove sensitive data
rm -f /workspace/ai/.env
rm -f /root/.ssh/known_hosts
rm -f /root/.bash_history

# Clear logs
rm -f /var/log/*.log

# Clean Docker build cache, stopped containers, and unused volumes,
# but keep tagged images (do NOT pass -a, which would delete the pre-built images)
docker system prune -f --volumes

# Clear Tailscale state (will re-authenticate on first use)
tailscale logout

# Create template-ready marker
echo "RunPod Multi-Modal AI Template v1.0" > /workspace/TEMPLATE_VERSION
echo "Created: $(date)" >> /workspace/TEMPLATE_VERSION
```

### Step 5: Save Template in RunPod Dashboard

1. **Go to RunPod Dashboard** → "My Pods"
2. **Select your prepared pod**
3. **Click "⋮" menu** → "Save as Template"
4. **Template Configuration**:
   - **Name**: `multi-modal-ai-v1.0`
   - **Description**:
     ```
     Multi-Modal AI Stack with Orchestrator
     - Text: vLLM + Qwen 2.5 7B
     - Image: Flux.1 Schnell
     - Music: MusicGen Medium
     - Models pre-cached (~37GB)
     - Ready to deploy in 2-3 minutes
     ```
   - **Category**: `AI/ML`
   - **Docker Image**: (auto-detected)
   - **Container Disk**: 50GB
   - **Expose Ports**: 9000, 8001, 8002, 8003
   - **Environment Variables** (optional):
     ```
     HF_TOKEN=
     TAILSCALE_AUTHKEY=
     ```
5. **Click "Save Template"**
6. **Wait for template creation** (5-10 minutes)
7. **Test the template** by deploying a new pod with it

---

## Using Your Template

### Deploy New Pod from Template

1. **RunPod Dashboard** → "➕ Deploy"
2. **Select "Community Templates"** or "My Templates"
3. **Choose**: `multi-modal-ai-v1.0`
4. **Configure**:
   - GPU: RTX 4090 (or compatible)
   - Network Volume: Attach your existing volume with `/workspace` mount
   - Environment:
     - `HF_TOKEN`: Your Hugging Face token
     - (Tailscale will be configured via SSH)
5. **Deploy Pod**

### First-Time Setup (On New Pod)

```bash
# SSH to the new pod
ssh -p <port> root@<host>

# Navigate to project
cd /workspace/ai

# Create .env file with your Hugging Face token
cat > .env << 'EOF'
HF_TOKEN=<your-huggingface-token>
EOF

# Start orchestrator (models already cached, starts in seconds!)
docker compose -f compose.yaml up -d orchestrator

# Verify
curl http://localhost:9000/health

# Check logs
docker logs -f ai_orchestrator
```

**Total setup time: 2-3 minutes!** 🎉

### Updating SSH Config (If Spot Instance Restarts)

Since Spot instances can restart with new IPs/ports:

```bash
# On your local machine
# Update ~/.ssh/config with the new connection details
Host gpu-pivoine
    HostName <new-host>
    Port <new-port>
    User root
    IdentityFile ~/.ssh/id_ed25519
```

---

## Template Maintenance

### Updating the Template

When you add new models or make improvements:

1. Deploy a pod from your existing template
2. Make your changes
3. Test everything
4. Clean up (remove secrets)
5. Save as a new template version: `multi-modal-ai-v1.1`
6. Update your documentation

### Version History

Keep track of template versions:

```
v1.0 (2025-11-21) - Initial release
- Text: Qwen 2.5 7B
- Image: Flux.1 Schnell
- Music: MusicGen Medium
- Docker orchestrator

v1.1 (future) - Planned
- Add Llama 3.1 8B
- Add Whisper Large v3
- Optimize model loading
```

---

## Troubleshooting Template Creation

### Models Not Downloading

```bash
# Manually trigger model downloads
docker compose --profile text up -d vllm-qwen
docker logs -f ai_vllm-qwen_1
# Wait for "Model loaded successfully"
docker compose stop vllm-qwen

# Repeat for other models
docker compose --profile image up -d flux
docker compose --profile audio up -d musicgen
```

### Docker Images Not Building

```bash
# Build images one at a time
docker compose -f compose.yaml build orchestrator
docker compose -f compose.yaml build vllm-qwen
docker compose -f compose.yaml build musicgen

# Check build logs for errors
docker compose -f compose.yaml build --no-cache --progress=plain orchestrator
```

### Tailscale Won't Install

```bash
# Manual Tailscale installation
curl -fsSL https://tailscale.com/install.sh | sh

# Start daemon
tailscaled --tun=userspace-networking --socks5-server=localhost:1055 &

# Test
tailscale version
```

### Template Too Large

RunPod templates have size
limits. If your template is too large:

**Option 1**: Use a network volume for models

- Move models to the network volume: `/workspace/models/`
- Mount the volume when deploying from the template
- Models persist across pod restarts

**Option 2**: Reduce cached models

- Only cache the most-used model (Qwen 2.5 7B)
- Download the others on first use
- Accept a slightly longer first-time startup

**Option 3**: Use Docker layer optimization

```dockerfile
# In the Dockerfile, order commands by change frequency:
# put the less frequently changed layers first
```

---

## Cost Analysis

### Template Storage Cost

- RunPod charges for template storage: ~$0.10/GB/month
- This template: ~50GB = **~$5/month**
- **Worth it!** Saves 60-90 minutes per Spot restart

### Time Savings

- Spot instance restarts: 2-5 times per week (highly variable)
- Time saved per restart: 60-90 minutes
- **Total saved per month: 8-20 hours**
- **Value: priceless for rapid deployment**

---

## Advanced: Automated Template Updates

Create a CI/CD pipeline to automatically update templates:

```bash
# GitHub Actions workflow (future enhancement)
# 1. Deploy pod from template
# 2. Pull latest code
# 3. Rebuild images
# 4. Test
# 5. Save new template version
# 6. Notify team
```

---

## Template Checklist

Before saving your template, verify:

- [ ] All Docker images built and working
- [ ] All models downloaded and cached
- [ ] Tailscale installed (but logged out)
- [ ] Docker Compose files present
- [ ] `.env` file removed (secrets cleared)
- [ ] Logs cleared
- [ ] SSH keys removed
- [ ] Bash history cleared
- [ ] Template version documented
- [ ] Test deployment successful

---

## Support

If you have issues creating the template:

1. Check the `/workspace/ai/scripts/prepare-template.sh` logs
2. Review Docker build logs: `docker compose build --progress=plain`
3. Check model download logs: `docker logs <container-name>`
4. Verify disk space: `df -h`
5. 
   Check the network volume is mounted: `mount | grep workspace`

For RunPod-specific issues:

- RunPod Docs: https://docs.runpod.io/
- RunPod Discord: https://discord.gg/runpod

---

## Next Steps

After creating your template:

1. ✅ Test deployment from template
2. ✅ Document in `GPU_DEPLOYMENT_LOG.md`
3. ✅ Share template ID with team (if applicable)
4. ✅ Set up monitoring (Netdata, etc.)
5. ✅ Configure auto-stop for cost optimization
6. ✅ Add more models as needed

**Your multi-modal AI infrastructure is now portable and reproducible!** 🚀
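As a complement to the Template Checklist above, the file-level items can be scripted into a quick pre-save check. This is a minimal sketch: the paths follow this guide, `template_check` and its `root` parameter are assumptions for testability, and the Docker/Tailscale items still need manual verification.

```shell
# Quick pre-save check for the file-level checklist items.
# Paths follow this guide; `root` lets you point the check at a test tree.
template_check() {
  root="${1:-}"
  ok=0
  [ -f "$root/workspace/TEMPLATE_VERSION" ] || { echo "missing TEMPLATE_VERSION marker"; ok=1; }
  [ ! -e "$root/workspace/ai/.env" ] || { echo ".env still present (secrets not cleared)"; ok=1; }
  [ ! -e "$root/root/.bash_history" ] || { echo "bash history not cleared"; ok=1; }
  [ "$ok" -eq 0 ] && echo "file checks passed"
  return "$ok"
}
```

On the live pod, `template_check` with no argument should print `file checks passed` once Step 4's cleanup has run.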