refactor: clean up runpod repository structure

Removed facefusion and VPS-related files:
- compose.yaml, postgres/, litellm-config.yaml (VPS services)
- Dockerfile, entrypoint.sh, disable-nsfw-filter.patch (facefusion)

Removed outdated documentation:
- DOCKER_GPU_SETUP.md, README_GPU_SETUP.md, SETUP_GUIDE.md
- TAILSCALE_SETUP.md, WIREGUARD_SETUP.md (covered in DEPLOYMENT.md)
- GPU_EXPANSION_PLAN.md (historical planning doc)
- gpu-server-compose.yaml, litellm-config-gpu.yaml (old versions)
- deploy-gpu-stack.sh, simple_vllm_server.py (old scripts)

Organized documentation:
- Created docs/ directory
- Moved DEPLOYMENT.md, RUNPOD_TEMPLATE.md, GPU_DEPLOYMENT_LOG.md to docs/
- Updated all documentation links in README.md

Final structure:
- Clean root directory with only GPU-specific files
- Organized documentation in docs/
- Model services in dedicated directories (model-orchestrator/, vllm/, flux/, musicgen/)
- Automation scripts in scripts/
commit cafa0a1147 (parent 277f1c95bd), 2025-11-21 14:45:49 +01:00
20 changed files with 8 additions and 4612 deletions

# RunPod Template Creation Guide
This guide shows you how to create a reusable RunPod template so you never have to reinstall everything from scratch when Spot instances restart.
## Why Create a Template?
**Without Template** (Manual Setup Every Time):
- ❌ Install Docker & Docker Compose (10-15 min)
- ❌ Install Tailscale (5 min)
- ❌ Pull Docker images (10-20 min)
- ❌ Download models: Qwen (~14GB), Flux (~12GB), MusicGen (~11GB) = 30-45 min
- ❌ Configure everything (5-10 min)
- **Total: 60-90 minutes per Spot instance restart**
**With Template** (Ready to Go):
- ✅ Everything pre-installed
- ✅ Models cached in `/workspace`
- ✅ Just start orchestrator
- **Total: 2-3 minutes**
## Template Contents
### System Software
- ✅ Docker 24.x + Docker Compose v2
- ✅ Tailscale latest
- ✅ NVIDIA Docker runtime
- ✅ Python 3.11
- ✅ Git, curl, wget, htop, nvtop
### Docker Images (Pre-built)
- `ai_orchestrator` - Model orchestration service
- `ai_vllm-qwen_1` - Text generation (vLLM + Qwen 2.5 7B)
- `ai_musicgen_1` - Music generation (AudioCraft)
- `ghcr.io/matatonic/openedai-images-flux:latest` - Image generation
### Model Cache (/workspace - Persistent)
- ✅ Qwen 2.5 7B Instruct (~14GB)
- ✅ Flux.1 Schnell (~12GB)
- ✅ MusicGen Medium (~11GB)
- **Total: ~37GB cached**
### Project Files (/workspace/ai)
- ✅ All orchestrator code
- ✅ Docker Compose configurations
- ✅ Model service configurations
- ✅ Documentation
---
## Step-by-Step Template Creation
### Prerequisites
1. RunPod account
2. Active RTX 4090 pod (or similar GPU)
3. SSH access to the pod
4. This repository cloned locally
### Step 1: Deploy Fresh Pod
```bash
# Create new RunPod instance:
# - GPU: RTX 4090 (24GB VRAM)
# - Disk: 50GB container disk
# - Network Volume: Attach or create 100GB+ volume
# - Template: Start with official PyTorch or CUDA template
# Note the SSH connection details (host, port, password)
```
### Step 2: Prepare the Instance
Run the automated preparation script:
```bash
# On your local machine, copy everything to RunPod
# (create the target directory first if needed: ssh -p <PORT> root@<HOST> "mkdir -p /workspace/ai")
scp -P <PORT> -r /home/valknar/Projects/runpod/* root@<HOST>:/workspace/ai/
# SSH to the pod
ssh -p <PORT> root@<HOST>
# Run the preparation script
cd /workspace/ai
chmod +x scripts/prepare-template.sh
./scripts/prepare-template.sh
```
**What the script does:**
1. Installs Docker & Docker Compose
2. Installs Tailscale
3. Builds all Docker images
4. Pre-downloads all models
5. Validates everything works
6. Cleans up temporary files
**Estimated time: 45-60 minutes**
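The repository's `scripts/prepare-template.sh` is the authoritative version; as a rough illustration of the flow above, a condensed sketch might look like the following (the `text`/`image`/`audio` profile names are taken from the troubleshooting section below, and the fixed `sleep` is a crude stand-in for polling container logs):
```bash
#!/usr/bin/env bash
# Condensed sketch of the preparation flow -- see scripts/prepare-template.sh
# for the real implementation.
set -euo pipefail

# 1. Tooling (Docker ships with most RunPod CUDA templates)
curl -fsSL https://tailscale.com/install.sh | sh

# 2. Build every service image defined in the GPU compose file
cd /workspace/ai
docker compose -f docker-compose.gpu.yaml build

# 3. Warm each model cache by running its profile once
for profile in text image audio; do
  docker compose -f docker-compose.gpu.yaml --profile "$profile" up -d
  sleep 600   # crude wait for the model download; poll logs in practice
  docker compose -f docker-compose.gpu.yaml --profile "$profile" stop
done

# 4. Sanity check: the images we just built should exist
docker images | grep ai_
```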
### Step 3: Manual Verification
After the script completes, verify everything:
```bash
# Check Docker is installed
docker --version
docker compose version
# Check Tailscale
tailscale version
# Check all images are built
docker images | grep ai_
# Check models are cached
ls -lh /workspace/huggingface_cache/
ls -lh /workspace/flux/models/
ls -lh /workspace/musicgen/models/
# Test orchestrator starts
cd /workspace/ai
docker compose -f docker-compose.gpu.yaml up -d orchestrator
docker logs ai_orchestrator
# Test model loading (should be fast since models are cached)
curl http://localhost:9000/health
# Stop orchestrator
docker compose -f docker-compose.gpu.yaml down
```
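Since containers can take a while on first start, polling is more reliable than a single immediate check; a minimal sketch (assuming `/health` returns HTTP 200 once the orchestrator is ready, as used above):
```bash
# Poll the orchestrator health endpoint for up to ~5 minutes
for i in $(seq 1 30); do
  if curl -fsS http://localhost:9000/health >/dev/null; then
    echo "orchestrator healthy after ~$((i * 10))s"
    break
  fi
  sleep 10
done
```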
### Step 4: Clean Up Before Saving
**IMPORTANT**: Remove secrets and temporary data before saving the template!
```bash
# Remove sensitive data
rm -f /workspace/ai/.env
rm -f /root/.ssh/known_hosts
rm -f /root/.bash_history
# Clear logs
rm -f /var/log/*.log
docker system prune -f --volumes # Clean Docker cache; omit -a so the pre-built images are kept
# Clear Tailscale state (will re-authenticate on first use)
tailscale logout
# Create template-ready marker
echo "RunPod Multi-Modal AI Template v1.0" > /workspace/TEMPLATE_VERSION
echo "Created: $(date)" >> /workspace/TEMPLATE_VERSION
```
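To catch mistakes before saving, a quick check mirroring the steps above can help (a sketch; `tailscale status` exits non-zero once logged out, hence the `|| true`):
```bash
# Pre-save sanity check: secrets gone, version marker present
test ! -f /workspace/ai/.env        && echo "OK: .env removed"
test ! -s /root/.bash_history       && echo "OK: bash history empty"
test -f /workspace/TEMPLATE_VERSION && echo "OK: version marker present"
tailscale status || true            # should report logged out
```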
### Step 5: Save Template in RunPod Dashboard
1. **Go to RunPod Dashboard** → "My Pods"
2. **Select your prepared pod**
3. **Click "⋮" menu** → "Save as Template"
4. **Template Configuration**:
- **Name**: `multi-modal-ai-v1.0`
- **Description**:
```
Multi-Modal AI Stack with Orchestrator
- Text: vLLM + Qwen 2.5 7B
- Image: Flux.1 Schnell
- Music: MusicGen Medium
- Models pre-cached (~37GB)
- Ready to deploy in 2-3 minutes
```
- **Category**: `AI/ML`
- **Docker Image**: (auto-detected)
- **Container Disk**: 50GB
- **Expose Ports**: 9000, 8001, 8002, 8003
- **Environment Variables** (optional):
```
HF_TOKEN=<leave empty, user will add>
TAILSCALE_AUTHKEY=<leave empty, user will add>
```
5. **Click "Save Template"**
6. **Wait for template creation** (5-10 minutes)
7. **Test the template** by deploying a new pod with it
---
## Using Your Template
### Deploy New Pod from Template
1. **RunPod Dashboard** → "Deploy"
2. **Select "Community Templates"** or "My Templates"
3. **Choose**: `multi-modal-ai-v1.0`
4. **Configure**:
- GPU: RTX 4090 (or compatible)
- Network Volume: Attach your existing volume with `/workspace` mount
- Environment:
- `HF_TOKEN`: Your Hugging Face token
- (Tailscale will be configured via SSH)
5. **Deploy Pod**
### First-Time Setup (On New Pod)
```bash
# SSH to the new pod
ssh -p <PORT> root@<HOST>
# Navigate to project
cd /workspace/ai
# Create .env file
cat > .env <<EOF
HF_TOKEN=hf_your_token_here
GPU_TAILSCALE_IP=100.100.108.13
EOF
# Configure Tailscale (one-time)
tailscale up --authkey=<YOUR_TAILSCALE_KEY>
# Start orchestrator (models already cached, starts in seconds!)
docker compose -f docker-compose.gpu.yaml up -d orchestrator
# Verify
curl http://localhost:9000/health
# Check logs
docker logs -f ai_orchestrator
```
**Total setup time: 2-3 minutes!** 🎉
### Updating SSH Config (If Spot Instance Restarts)
Since Spot instances can restart with new IPs/ports:
```bash
# On your local machine
# Update ~/.ssh/config with new connection details
Host gpu-pivoine
    HostName <NEW_IP>
    Port <NEW_PORT>
    User root
    IdentityFile ~/.ssh/id_ed25519
```
---
## Template Maintenance
### Updating the Template
When you add new models or make improvements:
1. Deploy a pod from your existing template
2. Make your changes
3. Test everything
4. Clean up (remove secrets)
5. Save as new template version: `multi-modal-ai-v1.1`
6. Update your documentation
### Version History
Keep track of template versions:
```
v1.0 (2025-11-21) - Initial release
- Text: Qwen 2.5 7B
- Image: Flux.1 Schnell
- Music: MusicGen Medium
- Docker orchestrator
v1.1 (future) - Planned
- Add Llama 3.1 8B
- Add Whisper Large v3
- Optimize model loading
```
---
## Troubleshooting Template Creation
### Models Not Downloading
```bash
# Manually trigger model downloads
docker compose --profile text up -d vllm-qwen
docker logs -f ai_vllm-qwen_1
# Wait for "Model loaded successfully"
docker compose stop vllm-qwen
# Repeat for other models
docker compose --profile image up -d flux
docker compose --profile audio up -d musicgen
```
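Once each profile has run, you can confirm the downloads actually completed by comparing cache sizes against the figures quoted earlier (~14 GB, ~12 GB, ~11 GB; paths as listed in Step 3):
```bash
du -sh /workspace/huggingface_cache \
       /workspace/flux/models \
       /workspace/musicgen/models
```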
### Docker Images Not Building
```bash
# Build images one at a time
docker compose -f docker-compose.gpu.yaml build orchestrator
docker compose -f docker-compose.gpu.yaml build vllm-qwen
docker compose -f docker-compose.gpu.yaml build musicgen
# Check build logs for errors
docker compose -f docker-compose.gpu.yaml build --no-cache --progress=plain orchestrator
```
### Tailscale Won't Install
```bash
# Manual Tailscale installation
curl -fsSL https://tailscale.com/install.sh | sh
# Start daemon
tailscaled --tun=userspace-networking --socks5-server=localhost:1055 &
# Test
tailscale version
```
### Template Too Large
RunPod templates have size limits. If your template is too large:
**Option 1**: Use network volume for models
- Move models to network volume: `/workspace/models/`
- Mount volume when deploying from template
- Models persist across pod restarts
**Option 2**: Reduce cached models
- Only cache most-used model (Qwen 2.5 7B)
- Download others on first use
- Accept slightly longer first-time startup
**Option 3**: Use Docker layer optimization
```dockerfile
# In Dockerfile, order commands by change frequency
# Less frequently changed layers first
```
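As a concrete illustration of that ordering (a generic sketch only; the base image, files, and entrypoint here are hypothetical, not this repo's actual Dockerfiles):
```dockerfile
FROM python:3.11-slim

# Rarely changing layers first: system packages, then pinned dependencies
RUN apt-get update && apt-get install -y --no-install-recommends curl \
    && rm -rf /var/lib/apt/lists/*
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Frequently changing application code last, so code edits
# invalidate only this final layer on rebuild
COPY . /app
WORKDIR /app
CMD ["python", "server.py"]
```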
---
## Cost Analysis
### Template Storage Cost
- RunPod charges for template storage: ~$0.10/GB/month
- This template: ~50GB = **~$5/month**
- **Worth it!** Saves 60-90 minutes per Spot restart
### Time Savings
- Spot instance restarts: 2-5 times per week (highly variable)
- Time saved per restart: 60-90 minutes
- **Total saved per month: roughly 8-30 hours** (2 restarts/week × 60 min ≈ 8-9 hours/month; 5 restarts/week × 90 min ≈ 30 hours/month)
- **Value: Priceless for rapid deployment**
---
## Advanced: Automated Template Updates
Create a CI/CD pipeline to automatically update templates:
```bash
# GitHub Actions workflow (future enhancement)
# 1. Deploy pod from template
# 2. Pull latest code
# 3. Rebuild images
# 4. Test
# 5. Save new template version
# 6. Notify team
```
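Sketched as a script, such a pipeline might look like the following; note the `runpodctl` invocation is hypothetical (check the RunPod docs for actual flags), and saving the template itself is still a dashboard action (Step 5):
```bash
#!/usr/bin/env bash
set -euo pipefail

TEMPLATE_ID="multi-modal-ai-v1.0"   # placeholder
POD_HOST="gpu-pivoine"              # SSH alias from this guide

# 1. Deploy a pod from the current template (hypothetical flags)
runpodctl create pod --templateId "$TEMPLATE_ID"

# 2. Pull latest code, rebuild, and smoke-test over SSH
ssh "root@${POD_HOST}" <<'EOF'
cd /workspace/ai
git pull
docker compose -f docker-compose.gpu.yaml build
docker compose -f docker-compose.gpu.yaml up -d orchestrator
curl -fsS http://localhost:9000/health
EOF

# 3. Clean up secrets (Step 4), then save the new version in the dashboard
```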
---
## Template Checklist
Before saving your template, verify:
- [ ] All Docker images built and working
- [ ] All models downloaded and cached
- [ ] Tailscale installed (but logged out)
- [ ] Docker Compose files present
- [ ] `.env` file removed (secrets cleared)
- [ ] Logs cleared
- [ ] SSH keys removed
- [ ] Bash history cleared
- [ ] Template version documented
- [ ] Test deployment successful
---
## Support
If you have issues creating the template:
1. Check `/workspace/ai/scripts/prepare-template.sh` logs
2. Review Docker build logs: `docker compose build --progress=plain`
3. Check model download logs: `docker logs <container>`
4. Verify disk space: `df -h`
5. Check network volume is mounted: `mount | grep workspace`
For RunPod-specific issues:
- RunPod Docs: https://docs.runpod.io/
- RunPod Discord: https://discord.gg/runpod
---
## Next Steps
After creating your template:
1. ✅ Test deployment from template
2. ✅ Document in `GPU_DEPLOYMENT_LOG.md`
3. ✅ Share template ID with team (if applicable)
4. ✅ Set up monitoring (Netdata, etc.)
5. ✅ Configure auto-stop for cost optimization
6. ✅ Add more models as needed
**Your multi-modal AI infrastructure is now portable and reproducible!** 🚀