Initial commit: RunPod multi-modal AI orchestration stack
- Multi-modal AI infrastructure for RunPod RTX 4090
- Automatic model orchestration (text, image, music)
- Text: vLLM + Qwen 2.5 7B Instruct
- Image: Flux.1 Schnell via OpenEDAI
- Music: MusicGen Medium via AudioCraft
- Cost-optimized sequential loading on a single GPU
- Template preparation scripts for rapid deployment
- Comprehensive documentation (README, DEPLOYMENT, TEMPLATE)
# RunPod Multi-Modal AI Stack

**Cost-optimized GPU deployment for text, image, and music generation on RunPod RTX 4090.**

This repository contains everything needed to deploy and manage a multi-modal AI infrastructure on RunPod, featuring intelligent model orchestration that automatically switches between models based on request type.

## Features

- **Text Generation**: Qwen 2.5 7B Instruct via vLLM (~50 tokens/sec)
- **Image Generation**: Flux.1 Schnell (~4-5 seconds per image)
- **Music Generation**: MusicGen Medium (30 seconds of audio in 60-90 seconds)
- **Automatic Model Switching**: Intelligent orchestrator manages sequential model loading
- **OpenAI-Compatible APIs**: Works with existing AI tools and clients
- **Easy Model Addition**: Just edit `model-orchestrator/models.yaml` to add new models
- **Template Support**: Create reusable templates for 2-3 minute deployments (vs 60-90 minutes)

## Quick Start

### Option 1: Deploy from Template (Recommended)

If you've already created a RunPod template:

1. Deploy a pod from the template in the RunPod dashboard
2. SSH to the pod
3. Create a `.env` file with your credentials (see the sketch below)
4. Start the orchestrator: `docker compose -f docker-compose.gpu.yaml up -d orchestrator`

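A minimal sketch of steps 3-4, assuming the stack reads its secrets from `.env` in `/workspace/ai`; the variable name below is a placeholder, not one defined by this repo:

```bash
cd /workspace/ai

# Hypothetical .env contents -- substitute whichever variables your
# docker-compose.gpu.yaml actually references (HF_TOKEN is a placeholder).
cat > .env <<'EOF'
HF_TOKEN=hf_xxxxxxxxxxxxxxxx
EOF

# Start only the orchestrator; it launches model services on demand.
docker compose -f docker-compose.gpu.yaml up -d orchestrator
```
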
**See**: [RUNPOD_TEMPLATE.md](RUNPOD_TEMPLATE.md) for template usage instructions.

### Option 2: Fresh Deployment

For first-time setup on a new RunPod instance:

1. Copy files to RunPod: `scp -r * gpu-server:/workspace/ai/`
2. SSH to the GPU server: `ssh gpu-server`
3. Run the preparation script: `cd /workspace/ai && chmod +x scripts/prepare-template.sh && ./scripts/prepare-template.sh`

**See**: [DEPLOYMENT.md](DEPLOYMENT.md) for the detailed deployment guide.

## Architecture

```
VPS (LiteLLM Proxy)
        ↓ Tailscale VPN
GPU Server (Orchestrator, Port 9000)
        ├── vLLM (Qwen 2.5 7B)  - Port 8001
        ├── Flux.1 Schnell      - Port 8002
        └── MusicGen Medium     - Port 8003
```

All requests route through the orchestrator, which automatically loads the appropriate model. Only one model is active at a time for cost optimization (~$0.50/hr vs ~$0.75/hr for multi-GPU).

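For intuition, a model switch is roughly equivalent to the manual compose operations below; the orchestrator automates this, and the service names are assumptions based on the directory layout, not verified against `docker-compose.gpu.yaml`:

```bash
# Unload the current model to free the 4090's 24 GB of VRAM
# (service names assumed; check docker-compose.gpu.yaml).
docker compose -f docker-compose.gpu.yaml stop vllm

# Load the model the incoming request needs
docker compose -f docker-compose.gpu.yaml up -d flux
```
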
## Cost Analysis

**RunPod RTX 4090 Spot Instance**:

- **Hourly**: ~$0.50
- **Monthly (24/7)**: ~$360
- **Monthly (8hr/day)**: ~$120

**Template Benefits**:

- **Without Template**: 60-90 minutes setup per Spot restart
- **With Template**: 2-3 minutes deployment time
- **Spot Restart Frequency**: 2-5 times per week (variable)

## Documentation

### Primary Docs

- **[DEPLOYMENT.md](DEPLOYMENT.md)** - Complete deployment and usage guide
- **[RUNPOD_TEMPLATE.md](RUNPOD_TEMPLATE.md)** - Template creation and usage
- **[GPU_DEPLOYMENT_LOG.md](GPU_DEPLOYMENT_LOG.md)** - Deployment history and technical notes

### Setup Guides (Historical)

- `DOCKER_GPU_SETUP.md` - Docker configuration for GPU support
- `TAILSCALE_SETUP.md` - Tailscale VPN setup
- `WIREGUARD_SETUP.md` - WireGuard VPN (deprecated; use Tailscale)
- `SETUP_GUIDE.md` - General setup instructions

### Architecture Components

- `model-orchestrator/` - FastAPI orchestrator managing the model lifecycle
- `vllm/` - Text generation service (Qwen 2.5 7B)
- `flux/` - Image generation service (Flux.1 Schnell)
- `musicgen/` - Music generation service (MusicGen Medium)
- `scripts/` - Automation scripts

## Creating a RunPod Template

**Why create a template?**

- Save 60-90 minutes on every Spot instance restart
- Pre-downloaded models (~37GB cached)
- Pre-built Docker images
- Ready-to-use configuration

**How to create:**

1. Run `scripts/prepare-template.sh` on a fresh RunPod instance
2. Wait 45-60 minutes for models to download and images to build
3. Save the pod as a template in the RunPod dashboard
4. Name: `multi-modal-ai-v1.0`

**See**: [RUNPOD_TEMPLATE.md](RUNPOD_TEMPLATE.md) for a step-by-step guide.

## Adding New Models

Adding models is easy! Just edit `model-orchestrator/models.yaml`:

```yaml
models:
  llama-3.1-8b:                  # New model
    type: text
    framework: vllm
    docker_service: vllm-llama
    port: 8004
    vram_gb: 17
    startup_time_seconds: 120
    endpoint: /v1/chat/completions
```

Then add the matching Docker service to `docker-compose.gpu.yaml` (sketched below) and restart the orchestrator.

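A hedged sketch of what that companion service might look like; the image, command, and volume path are illustrative defaults for a vLLM container, not values taken from this repo:

```yaml
# Hypothetical service entry for docker-compose.gpu.yaml.
services:
  vllm-llama:
    image: vllm/vllm-openai:latest
    command: ["--model", "meta-llama/Llama-3.1-8B-Instruct", "--port", "8004"]
    ports:
      - "8004:8004"
    volumes:
      # Reuse the shared model cache so weights survive restarts.
      - /workspace/models:/root/.cache/huggingface
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
```

Then restart the orchestrator (e.g. `docker compose -f docker-compose.gpu.yaml restart orchestrator`) so it re-reads `models.yaml`.
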
**See**: [DEPLOYMENT.md](DEPLOYMENT.md#adding-new-models) for complete instructions.

## Usage Examples

### Text Generation

```bash
curl http://100.100.108.13:9000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "qwen-2.5-7b", "messages": [{"role": "user", "content": "Hello!"}]}'
```

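Because the API is OpenAI-compatible, the assistant's reply sits at the standard path in the response JSON; this variant just adds `jq` (assumed installed) to extract it:

```bash
# Same request, printing only the assistant's reply.
curl -s http://100.100.108.13:9000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "qwen-2.5-7b", "messages": [{"role": "user", "content": "Hello!"}]}' \
  | jq -r '.choices[0].message.content'
```
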
### Image Generation

```bash
curl http://100.100.108.13:9000/v1/images/generations \
  -H "Content-Type: application/json" \
  -d '{"model": "flux-schnell", "prompt": "a cute cat", "size": "1024x1024"}'
```

### Music Generation

```bash
curl http://100.100.108.13:9000/v1/audio/generations \
  -H "Content-Type: application/json" \
  -d '{"model": "musicgen-medium", "prompt": "upbeat electronic", "duration": 30}'
```

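This README doesn't document the response format; if the endpoint returns raw audio bytes (an assumption to verify in DEPLOYMENT.md), saving the result is a one-flag change:

```bash
# Write the response body straight to a file -- only valid if the
# endpoint returns raw audio rather than JSON (assumption).
curl -s http://100.100.108.13:9000/v1/audio/generations \
  -H "Content-Type: application/json" \
  -d '{"model": "musicgen-medium", "prompt": "upbeat electronic", "duration": 30}' \
  -o output.wav
```
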
## Infrastructure

- **Provider**: RunPod (Spot instance)
- **GPU**: NVIDIA RTX 4090, 24GB VRAM
- **Region**: Europe
- **Network**: Tailscale VPN (100.100.108.13)
- **Storage**: 922GB network volume at `/workspace`

## Monitoring

```bash
# Check active model
curl http://100.100.108.13:9000/health

# View orchestrator logs
docker logs -f ai_orchestrator

# GPU usage
nvidia-smi
```

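Two handy variants for watching VRAM churn while a model loads (both ship with the standard NVIDIA driver/utility set):

```bash
# Refresh the full nvidia-smi view every 2 seconds
watch -n 2 nvidia-smi

# Or stream GPU utilization and memory counters line by line
nvidia-smi dmon -s mu
```
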
## Support

For issues:

1. Check the orchestrator logs: `docker logs ai_orchestrator`
2. Review [DEPLOYMENT.md](DEPLOYMENT.md#troubleshooting)
3. Check [GPU_DEPLOYMENT_LOG.md](GPU_DEPLOYMENT_LOG.md) for deployment history

## License

Built with:

- [vLLM](https://github.com/vllm-project/vllm) - Apache 2.0
- [AudioCraft](https://github.com/facebookresearch/audiocraft) - MIT (code), CC-BY-NC (weights)
- [Flux.1](https://github.com/black-forest-labs/flux) - Apache 2.0
- [LiteLLM](https://github.com/BerriAI/litellm) - MIT

**Note**: MusicGen pre-trained weights are non-commercial (CC-BY-NC).