RunPod Multi-Modal AI Stack

Cost-optimized GPU deployment for text, image, and music generation on a RunPod RTX 4090.

This repository contains everything needed to deploy and manage a multi-modal AI infrastructure on RunPod, featuring intelligent model orchestration that automatically switches between models based on request type.

Features

  • Text Generation: Qwen 2.5 7B Instruct via vLLM (~50 tokens/sec)
  • Image Generation: Flux.1 Schnell (~4-5 seconds per image)
  • Music Generation: MusicGen Medium (generates 30 seconds of audio in 60-90 seconds)
  • Automatic Model Switching: Intelligent orchestrator manages sequential model loading
  • OpenAI-Compatible APIs: Works with existing AI tools and clients
  • Easy Model Addition: Just edit model-orchestrator/models.yaml to add new models
  • Template Support: Create reusable templates for 2-3 minute deployments (vs 60-90 minutes)

Quick Start

Option 1: Deploy from Template

If you've already created a RunPod template:

  1. Deploy pod from template in RunPod dashboard
  2. SSH to the pod
  3. Create .env file with your credentials
  4. Start orchestrator: docker compose -f compose.yaml up -d orchestrator
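
For step 3, the exact contents of .env depend on your setup; a minimal sketch, assuming a Hugging Face token for model downloads and a Tailscale auth key (the variable names are illustrative assumptions, not confirmed by this repo's compose.yaml):

# .env — example only; match the variable names compose.yaml actually reads
HUGGING_FACE_HUB_TOKEN=hf_xxxxxxxx
TAILSCALE_AUTHKEY=tskey-xxxxxxxx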

See: docs/RUNPOD_TEMPLATE.md for template usage instructions.

Option 2: Fresh Deployment

For first-time setup on a new RunPod instance:

  1. Copy files to RunPod: scp -r * gpu-server:/workspace/ai/
  2. SSH to GPU server: ssh gpu-server
  3. Run preparation script: cd /workspace/ai && chmod +x scripts/prepare-template.sh && ./scripts/prepare-template.sh

See: docs/DEPLOYMENT.md for detailed deployment guide.

Architecture

VPS (LiteLLM Proxy)
    ↓ Tailscale VPN
GPU Server (Orchestrator Port 9000)
    ├── vLLM (Qwen 2.5 7B) - Port 8001
    ├── Flux.1 Schnell - Port 8002
    └── MusicGen Medium - Port 8003

All requests route through the orchestrator, which automatically loads the appropriate model. Only one model is active at a time for cost optimization (~$0.50/hr vs ~$0.75/hr for multi-GPU).
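
A minimal sketch of that switching logic, assuming each backend runs as a compose service and exposes a /health endpoint (illustrative only; the actual implementation lives in model-orchestrator/):

import subprocess
import time
import urllib.request

# Service names and ports mirror the architecture diagram above.
MODELS = {
    "qwen-2.5-7b": {"service": "vllm", "port": 8001},
    "flux-schnell": {"service": "flux", "port": 8002},
    "musicgen-medium": {"service": "musicgen", "port": 8003},
}

active_model = None

def ensure_loaded(model: str) -> None:
    """Sequentially swap models: stop the active service, start the requested one."""
    global active_model
    if active_model == model:
        return  # Already loaded, nothing to do.
    if active_model is not None:
        subprocess.run(
            ["docker", "compose", "stop", MODELS[active_model]["service"]],
            check=True,
        )
    subprocess.run(
        ["docker", "compose", "up", "-d", MODELS[model]["service"]],
        check=True,
    )
    # Wait until the backend reports healthy before routing traffic to it
    # (assumes each backend serves a /health route).
    url = f"http://localhost:{MODELS[model]['port']}/health"
    while True:
        try:
            urllib.request.urlopen(url, timeout=2)
            break
        except OSError:
            time.sleep(2)
    active_model = model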

Cost Analysis

RunPod RTX 4090 Spot Instance:

  • Hourly: ~$0.50
  • Monthly (24/7): ~$360
  • Monthly (8hr/day): ~$120
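
These figures follow directly from the hourly rate: $0.50 × 24 × 30 ≈ $360 for 24/7 operation, and $0.50 × 8 × 30 = $120 at 8 hours per day.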

Template Benefits:

  • Without Template: 60-90 minutes setup per Spot restart
  • With Template: 2-3 minutes deployment time
  • Spot Restart Frequency: 2-5 times per week (variable)

Documentation
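
Detailed guides live in docs/:

  • docs/DEPLOYMENT.md - Deployment guide
  • docs/RUNPOD_TEMPLATE.md - Template creation and usage
  • docs/GPU_DEPLOYMENT_LOG.md - Deployment history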

Architecture Components

  • model-orchestrator/ - FastAPI orchestrator managing model lifecycle
  • vllm/ - Text generation service (Qwen 2.5 7B)
  • flux/ - Image generation service (Flux.1 Schnell)
  • musicgen/ - Music generation service (MusicGen Medium)
  • scripts/ - Automation scripts

Creating a RunPod Template

Why create a template?

  • Save 60-90 minutes on every Spot instance restart
  • Pre-downloaded models (~37GB cached)
  • Pre-built Docker images
  • Ready-to-use configuration

How to create:

  1. Run scripts/prepare-template.sh on a fresh RunPod instance
  2. Wait 45-60 minutes for models to download and images to build
  3. Save pod as template in RunPod dashboard
  4. Name: multi-modal-ai-v1.0

See: docs/RUNPOD_TEMPLATE.md for step-by-step guide.

Adding New Models

Adding models is easy! Just edit model-orchestrator/models.yaml:

models:
  llama-3.1-8b:  # New model
    type: text
    framework: vllm
    docker_service: vllm-llama
    port: 8004
    vram_gb: 17
    startup_time_seconds: 120
    endpoint: /v1/chat/completions

Then add the Docker service to compose.yaml and restart the orchestrator.
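
A sketch of what that compose.yaml entry might look like (the image, command, and GPU settings are assumptions; copy them from the existing vllm service in this repo):

vllm-llama:
  image: vllm/vllm-openai:latest  # assumption: reuse the existing vLLM service's image
  command: --model meta-llama/Llama-3.1-8B-Instruct --port 8004
  ports:
    - "8004:8004"
  deploy:
    resources:
      reservations:
        devices:
          - driver: nvidia
            count: 1
            capabilities: [gpu]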

See: docs/DEPLOYMENT.md for complete instructions.

Usage Examples

Text Generation

curl http://100.100.108.13:9000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "qwen-2.5-7b", "messages": [{"role": "user", "content": "Hello!"}]}'

Image Generation

curl http://100.100.108.13:9000/v1/images/generations \
  -H "Content-Type: application/json" \
  -d '{"model": "flux-schnell", "prompt": "a cute cat", "size": "1024x1024"}'

Music Generation

curl http://100.100.108.13:9000/v1/audio/generations \
  -H "Content-Type: application/json" \
  -d '{"model": "musicgen-medium", "prompt": "upbeat electronic", "duration": 30}'

Infrastructure

  • Provider: RunPod (Spot Instance)
  • GPU: NVIDIA RTX 4090 (24 GB VRAM)
  • Region: Europe
  • Network: Tailscale VPN (100.100.108.13)
  • Storage: 922 GB network volume mounted at /workspace

Monitoring

# Check active model
curl http://100.100.108.13:9000/health

# View orchestrator logs
docker logs -f ai_orchestrator

# GPU usage
nvidia-smi
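
To watch model switches live during testing, you can poll the health endpoint:

# Poll the active model every 5 seconds
watch -n 5 "curl -s http://100.100.108.13:9000/health"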

Support

For issues:

  1. Check orchestrator logs: docker logs ai_orchestrator
  2. Review docs/DEPLOYMENT.md
  3. Check docs/GPU_DEPLOYMENT_LOG.md for deployment history

License

Built with:

  • vLLM (text generation)
  • Flux.1 Schnell (image generation)
  • MusicGen Medium (music generation)
  • FastAPI (model orchestrator)

Note: MusicGen's pre-trained weights are licensed for non-commercial use only (CC-BY-NC).
