# RunPod Multi-Modal AI Stack

**Cost-optimized GPU deployment for text, image, and music generation on RunPod RTX 4090.**

This repository contains everything needed to deploy and manage a multi-modal AI infrastructure on RunPod, featuring intelligent model orchestration that automatically switches between models based on request type.

## Features

- **Text Generation**: Qwen 2.5 7B Instruct via vLLM (~50 tokens/sec)
- **Image Generation**: Flux.1 Schnell (~4-5 seconds per image)
- **Music Generation**: MusicGen Medium (30 seconds of audio in 60-90 seconds)
- **Automatic Model Switching**: Intelligent orchestrator manages sequential model loading
- **OpenAI-Compatible APIs**: Works with existing AI tools and clients
- **Easy Model Addition**: Just edit `model-orchestrator/models.yaml` to add new models
- **Template Support**: Create reusable templates for 2-3 minute deployments (vs 60-90 minutes)

## Quick Start

### Option 1: Deploy from Template (Recommended)

If you've already created a RunPod template:

1. Deploy a pod from the template in the RunPod dashboard
2. SSH into the pod
3. Create a `.env` file with your credentials
4. Start the orchestrator: `docker compose -f docker-compose.gpu.yaml up -d orchestrator`

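Once the orchestrator is up, a quick sanity check (the health endpoint and container name are the same ones used in the Monitoring section below):

```bash
# Confirm the orchestrator is serving and see which model (if any) is active
curl http://100.100.108.13:9000/health

# Tail the logs if the health check fails
docker logs -f ai_orchestrator
```
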
**See**: [RUNPOD_TEMPLATE.md](RUNPOD_TEMPLATE.md) for template usage instructions.

### Option 2: Fresh Deployment

For first-time setup on a new RunPod instance:

1. Copy files to RunPod: `scp -r * gpu-server:/workspace/ai/`
2. SSH to the GPU server: `ssh gpu-server`
3. Run the preparation script: `cd /workspace/ai && chmod +x scripts/prepare-template.sh && ./scripts/prepare-template.sh`

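These steps assume `gpu-server` resolves as an SSH alias. A minimal `~/.ssh/config` entry might look like the sketch below; the hostname comes from the Tailscale IP used throughout this README, while the user and port are assumptions to replace with your pod's actual values:

```bash
# Append an illustrative alias to ~/.ssh/config -- substitute your pod's real values
cat >> ~/.ssh/config <<'EOF'
Host gpu-server
    HostName 100.100.108.13   # Tailscale IP from the Architecture section
    User root                 # assumption: adjust to your pod's SSH user
    Port 22                   # assumption: adjust to your pod's mapped SSH port
EOF
```
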
**See**: [DEPLOYMENT.md](DEPLOYMENT.md) for detailed deployment guide.

## Architecture

```
VPS (LiteLLM Proxy)
  ↓ Tailscale VPN
GPU Server (Orchestrator - Port 9000)
  ├── vLLM (Qwen 2.5 7B)  - Port 8001
  ├── Flux.1 Schnell      - Port 8002
  └── MusicGen Medium     - Port 8003
```

All requests route through the orchestrator, which automatically loads the appropriate model. Only one model is active at a time for cost optimization (~$0.50/hr vs ~$0.75/hr for multi-GPU).
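
One way to observe the sequential loading, using the same requests as the Usage Examples below: issue a text request, then an image request. The second call should stall for roughly the incoming model's startup time while the orchestrator swaps models.

```bash
# First request loads Qwen (if it is not already active)
curl -s http://100.100.108.13:9000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "qwen-2.5-7b", "messages": [{"role": "user", "content": "Hello!"}]}'

# Second request forces a swap to Flux; expect extra latency while vLLM unloads
time curl -s http://100.100.108.13:9000/v1/images/generations \
  -H "Content-Type: application/json" \
  -d '{"model": "flux-schnell", "prompt": "a cute cat", "size": "1024x1024"}'
```
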
## Cost Analysis

**RunPod RTX 4090 Spot Instance**:

- **Hourly**: ~$0.50
- **Monthly (24/7)**: ~$360
- **Monthly (8 hr/day)**: ~$120

**Template Benefits**:

- **Without Template**: 60-90 minutes of setup per Spot restart
- **With Template**: 2-3 minutes of deployment time
- **Spot Restart Frequency**: 2-5 times per week (variable)
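
Those last three numbers compound. As rough arithmetic, taking the midpoint of each quoted range (the midpoints are my assumption):

```bash
# Weekly hours saved ≈ restarts/week × (setup_without − setup_with) minutes ÷ 60
echo "scale=1; 3.5 * (75 - 2.5) / 60" | bc   # ≈ 4.2 hours/week at the midpoints
```
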
## Documentation

### Primary Docs

- **[DEPLOYMENT.md](DEPLOYMENT.md)** - Complete deployment and usage guide
- **[RUNPOD_TEMPLATE.md](RUNPOD_TEMPLATE.md)** - Template creation and usage
- **[GPU_DEPLOYMENT_LOG.md](GPU_DEPLOYMENT_LOG.md)** - Deployment history and technical notes

### Setup Guides (Historical)

- `DOCKER_GPU_SETUP.md` - Docker configuration for GPU support
- `TAILSCALE_SETUP.md` - Tailscale VPN setup
- `WIREGUARD_SETUP.md` - WireGuard VPN (deprecated; use Tailscale)
- `SETUP_GUIDE.md` - General setup instructions

### Architecture Components

- `model-orchestrator/` - FastAPI orchestrator managing the model lifecycle
- `vllm/` - Text generation service (Qwen 2.5 7B)
- `flux/` - Image generation service (Flux.1 Schnell)
- `musicgen/` - Music generation service (MusicGen Medium)
- `scripts/` - Automation scripts

## Creating a RunPod Template

**Why create a template?**

- Save 60-90 minutes on every Spot instance restart
- Pre-downloaded models (~37 GB cached)
- Pre-built Docker images
- Ready-to-use configuration

**How to create:**

1. Run `scripts/prepare-template.sh` on a fresh RunPod instance
2. Wait 45-60 minutes for models to download and images to build
3. Save the pod as a template in the RunPod dashboard
4. Name it `multi-modal-ai-v1.0`
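
Before saving the template (step 3), it's worth confirming that step 2 actually finished. A rough check, assuming the model weights cache under `/workspace`:

```bash
# After prepare-template.sh finishes, confirm the caches actually landed
du -sh /workspace    # expect on the order of ~37 GB of cached model weights
docker images        # the pre-built service images should be listed here
```
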
**See**: [RUNPOD_TEMPLATE.md](RUNPOD_TEMPLATE.md) for step-by-step guide.

## Adding New Models

Adding models is easy! Just edit `model-orchestrator/models.yaml`:

```yaml
models:
  llama-3.1-8b:                      # new model entry
    type: text
    framework: vllm
    docker_service: vllm-llama
    port: 8004
    vram_gb: 17
    startup_time_seconds: 120
    endpoint: /v1/chat/completions
```

Then add the Docker service to `docker-compose.gpu.yaml` and restart the orchestrator.
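
With the YAML entry in place, the remaining steps are standard Docker Compose operations (the service name matches the hypothetical `vllm-llama` entry above):

```bash
# Build the new service's image, then restart the orchestrator so it re-reads models.yaml
docker compose -f docker-compose.gpu.yaml build vllm-llama
docker compose -f docker-compose.gpu.yaml restart orchestrator
```
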
**See**: [DEPLOYMENT.md](DEPLOYMENT.md#adding-new-models) for complete instructions.

## Usage Examples

### Text Generation

```bash
curl http://100.100.108.13:9000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "qwen-2.5-7b", "messages": [{"role": "user", "content": "Hello!"}]}'
```
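
For interactive clients, a streaming variant should also work, since vLLM's OpenAI-compatible server supports the standard `stream` flag (whether the orchestrator proxies the event stream cleanly is an assumption worth verifying):

```bash
# -N disables curl buffering so tokens appear as they stream
curl -N http://100.100.108.13:9000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "qwen-2.5-7b", "stream": true, "messages": [{"role": "user", "content": "Hello!"}]}'
```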

### Image Generation

```bash
curl http://100.100.108.13:9000/v1/images/generations \
  -H "Content-Type: application/json" \
  -d '{"model": "flux-schnell", "prompt": "a cute cat", "size": "1024x1024"}'
```

### Music Generation

```bash
curl http://100.100.108.13:9000/v1/audio/generations \
  -H "Content-Type: application/json" \
  -d '{"model": "musicgen-medium", "prompt": "upbeat electronic", "duration": 30}'
```

## Infrastructure

- **Provider**: RunPod (Spot Instance)
- **GPU**: NVIDIA RTX 4090 (24 GB VRAM)
- **Region**: Europe
- **Network**: Tailscale VPN (100.100.108.13)
- **Storage**: 922 GB network volume at `/workspace`

## Monitoring

```bash
# Check the active model
curl http://100.100.108.13:9000/health

# View orchestrator logs
docker logs -f ai_orchestrator

# GPU usage
nvidia-smi
```
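
During a model swap it can be useful to watch VRAM continuously; standard `nvidia-smi` query flags make that easy:

```bash
# Refresh GPU memory and utilization every 2 seconds while models load/unload
watch -n 2 nvidia-smi --query-gpu=memory.used,memory.total,utilization.gpu --format=csv
```
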
## Support

For issues:

1. Check orchestrator logs: `docker logs ai_orchestrator`
2. Review [DEPLOYMENT.md](DEPLOYMENT.md#troubleshooting)
3. Check [GPU_DEPLOYMENT_LOG.md](GPU_DEPLOYMENT_LOG.md) for deployment history

## License

Built with:

- [vLLM](https://github.com/vllm-project/vllm) - Apache 2.0
- [AudioCraft](https://github.com/facebookresearch/audiocraft) - MIT (code), CC-BY-NC (weights)
- [Flux.1](https://github.com/black-forest-labs/flux) - Apache 2.0
- [LiteLLM](https://github.com/BerriAI/litellm) - MIT

**Note**: MusicGen pre-trained weights are non-commercial (CC-BY-NC).