Initial commit: RunPod multi-modal AI orchestration stack
- Multi-modal AI infrastructure for RunPod RTX 4090
- Automatic model orchestration (text, image, music)
- Text: vLLM + Qwen 2.5 7B Instruct
- Image: Flux.1 Schnell via OpenEDAI
- Music: MusicGen Medium via AudioCraft
- Cost-optimized sequential loading on a single GPU
- Template preparation scripts for rapid deployment
- Comprehensive documentation (README, DEPLOYMENT, TEMPLATE)
# RunPod Multi-Modal AI Stack

**Cost-optimized GPU deployment for text, image, and music generation on RunPod RTX 4090.**

This repository contains everything needed to deploy and manage a multi-modal AI infrastructure on RunPod, featuring intelligent model orchestration that automatically switches between models based on request type.

## Features

- **Text Generation**: Qwen 2.5 7B Instruct via vLLM (~50 tokens/sec)
- **Image Generation**: Flux.1 Schnell (~4-5 seconds per image)
- **Music Generation**: MusicGen Medium (30 seconds of audio in 60-90 seconds)
- **Automatic Model Switching**: Intelligent orchestrator manages sequential model loading
- **OpenAI-Compatible APIs**: Works with existing AI tools and clients
- **Easy Model Addition**: Just edit `model-orchestrator/models.yaml` to add new models
- **Template Support**: Create reusable templates for 2-3 minute deployments (vs 60-90 minutes)

## Quick Start

### Option 1: Deploy from Template (Recommended)

If you've already created a RunPod template:

1. Deploy a pod from the template in the RunPod dashboard
2. SSH to the pod
3. Create a `.env` file with your credentials (see the sketch below)
4. Start the orchestrator: `docker compose -f docker-compose.gpu.yaml up -d orchestrator`

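A minimal sketch of steps 3-4, assuming the stack reads its secrets from `.env` in `/workspace/ai`; the variable name below is a placeholder, not one defined by this repo:

```bash
cd /workspace/ai

# Hypothetical .env contents -- substitute whichever variables your
# docker-compose.gpu.yaml actually references (HF_TOKEN is a placeholder).
cat > .env <<'EOF'
HF_TOKEN=hf_xxxxxxxxxxxxxxxx
EOF

# Start only the orchestrator; it launches model services on demand.
docker compose -f docker-compose.gpu.yaml up -d orchestrator
```
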
**See**: [RUNPOD_TEMPLATE.md](RUNPOD_TEMPLATE.md) for template usage instructions.

### Option 2: Fresh Deployment

For first-time setup on a new RunPod instance:

1. Copy files to RunPod: `scp -r * gpu-server:/workspace/ai/`
2. SSH to the GPU server: `ssh gpu-server`
3. Run the preparation script: `cd /workspace/ai && chmod +x scripts/prepare-template.sh && ./scripts/prepare-template.sh`

**See**: [DEPLOYMENT.md](DEPLOYMENT.md) for the detailed deployment guide.

## Architecture

```
VPS (LiteLLM Proxy)
        ↓ Tailscale VPN
GPU Server (Orchestrator, Port 9000)
        ├── vLLM (Qwen 2.5 7B)  - Port 8001
        ├── Flux.1 Schnell      - Port 8002
        └── MusicGen Medium     - Port 8003
```

All requests route through the orchestrator, which automatically loads the appropriate model. Only one model is active at a time for cost optimization (~$0.50/hr vs ~$0.75/hr for multi-GPU).

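For intuition, a model switch is roughly equivalent to the manual compose operations below; the orchestrator automates this, and the service names are assumptions based on the directory layout, not verified against `docker-compose.gpu.yaml`:

```bash
# Unload the current model to free the 4090's 24 GB of VRAM
# (service names assumed; check docker-compose.gpu.yaml).
docker compose -f docker-compose.gpu.yaml stop vllm

# Load the model the incoming request needs
docker compose -f docker-compose.gpu.yaml up -d flux
```
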
## Cost Analysis

**RunPod RTX 4090 Spot Instance**:

- **Hourly**: ~$0.50
- **Monthly (24/7)**: ~$360
- **Monthly (8hr/day)**: ~$120

**Template Benefits**:

- **Without Template**: 60-90 minutes setup per Spot restart
- **With Template**: 2-3 minutes deployment time
- **Spot Restart Frequency**: 2-5 times per week (variable)

## Documentation

### Primary Docs

- **[DEPLOYMENT.md](DEPLOYMENT.md)** - Complete deployment and usage guide
- **[RUNPOD_TEMPLATE.md](RUNPOD_TEMPLATE.md)** - Template creation and usage
- **[GPU_DEPLOYMENT_LOG.md](GPU_DEPLOYMENT_LOG.md)** - Deployment history and technical notes

### Setup Guides (Historical)

- `DOCKER_GPU_SETUP.md` - Docker configuration for GPU support
- `TAILSCALE_SETUP.md` - Tailscale VPN setup
- `WIREGUARD_SETUP.md` - WireGuard VPN (deprecated; use Tailscale)
- `SETUP_GUIDE.md` - General setup instructions

### Architecture Components

- `model-orchestrator/` - FastAPI orchestrator managing the model lifecycle
- `vllm/` - Text generation service (Qwen 2.5 7B)
- `flux/` - Image generation service (Flux.1 Schnell)
- `musicgen/` - Music generation service (MusicGen Medium)
- `scripts/` - Automation scripts

## Creating a RunPod Template

**Why create a template?**

- Save 60-90 minutes on every Spot instance restart
- Pre-downloaded models (~37GB cached)
- Pre-built Docker images
- Ready-to-use configuration

**How to create:**

1. Run `scripts/prepare-template.sh` on a fresh RunPod instance
2. Wait 45-60 minutes for models to download and images to build
3. Save the pod as a template in the RunPod dashboard
4. Name: `multi-modal-ai-v1.0`

**See**: [RUNPOD_TEMPLATE.md](RUNPOD_TEMPLATE.md) for a step-by-step guide.

## Adding New Models

Adding models is easy! Just edit `model-orchestrator/models.yaml`:

```yaml
models:
  llama-3.1-8b:                  # New model
    type: text
    framework: vllm
    docker_service: vllm-llama
    port: 8004
    vram_gb: 17
    startup_time_seconds: 120
    endpoint: /v1/chat/completions
```

Then add the matching Docker service to `docker-compose.gpu.yaml` (sketched below) and restart the orchestrator.

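A hedged sketch of what that companion service might look like; the image, command, and volume path are illustrative defaults for a vLLM container, not values taken from this repo:

```yaml
# Hypothetical service entry for docker-compose.gpu.yaml.
services:
  vllm-llama:
    image: vllm/vllm-openai:latest
    command: ["--model", "meta-llama/Llama-3.1-8B-Instruct", "--port", "8004"]
    ports:
      - "8004:8004"
    volumes:
      # Reuse the shared model cache so weights survive restarts.
      - /workspace/models:/root/.cache/huggingface
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
```

Then restart the orchestrator (e.g. `docker compose -f docker-compose.gpu.yaml restart orchestrator`) so it re-reads `models.yaml`.
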
**See**: [DEPLOYMENT.md](DEPLOYMENT.md#adding-new-models) for complete instructions.

## Usage Examples

### Text Generation

```bash
curl http://100.100.108.13:9000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "qwen-2.5-7b", "messages": [{"role": "user", "content": "Hello!"}]}'
```

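Because the API is OpenAI-compatible, the assistant's reply sits at the standard path in the response JSON; this variant just adds `jq` (assumed installed) to extract it:

```bash
# Same request, printing only the assistant's reply.
curl -s http://100.100.108.13:9000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "qwen-2.5-7b", "messages": [{"role": "user", "content": "Hello!"}]}' \
  | jq -r '.choices[0].message.content'
```
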
### Image Generation

```bash
curl http://100.100.108.13:9000/v1/images/generations \
  -H "Content-Type: application/json" \
  -d '{"model": "flux-schnell", "prompt": "a cute cat", "size": "1024x1024"}'
```

### Music Generation

```bash
curl http://100.100.108.13:9000/v1/audio/generations \
  -H "Content-Type: application/json" \
  -d '{"model": "musicgen-medium", "prompt": "upbeat electronic", "duration": 30}'
```

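This README doesn't document the response format; if the endpoint returns raw audio bytes (an assumption to verify in DEPLOYMENT.md), saving the result is a one-flag change:

```bash
# Write the response body straight to a file -- only valid if the
# endpoint returns raw audio rather than JSON (assumption).
curl -s http://100.100.108.13:9000/v1/audio/generations \
  -H "Content-Type: application/json" \
  -d '{"model": "musicgen-medium", "prompt": "upbeat electronic", "duration": 30}' \
  -o output.wav
```
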
## Infrastructure

- **Provider**: RunPod (Spot instance)
- **GPU**: NVIDIA RTX 4090, 24GB VRAM
- **Region**: Europe
- **Network**: Tailscale VPN (100.100.108.13)
- **Storage**: 922GB network volume at `/workspace`

## Monitoring

```bash
# Check active model
curl http://100.100.108.13:9000/health

# View orchestrator logs
docker logs -f ai_orchestrator

# GPU usage
nvidia-smi
```

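Two handy variants for watching VRAM churn while a model loads (both ship with the standard NVIDIA driver/utility set):

```bash
# Refresh the full nvidia-smi view every 2 seconds
watch -n 2 nvidia-smi

# Or stream GPU utilization and memory counters line by line
nvidia-smi dmon -s mu
```
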
## Support

For issues:

1. Check the orchestrator logs: `docker logs ai_orchestrator`
2. Review [DEPLOYMENT.md](DEPLOYMENT.md#troubleshooting)
3. Check [GPU_DEPLOYMENT_LOG.md](GPU_DEPLOYMENT_LOG.md) for deployment history

## License

Built with:

- [vLLM](https://github.com/vllm-project/vllm) - Apache 2.0
- [AudioCraft](https://github.com/facebookresearch/audiocraft) - MIT (code), CC-BY-NC (weights)
- [Flux.1](https://github.com/black-forest-labs/flux) - Apache 2.0
- [LiteLLM](https://github.com/BerriAI/litellm) - MIT

**Note**: MusicGen pre-trained weights are non-commercial (CC-BY-NC).