# RunPod Multi-Modal AI Architecture

A clean, extensible, Python-based architecture for RunPod GPU instances.

## Design Principles
- No Docker - Direct Python execution for RunPod compatibility
- Extensible - Adding new models requires minimal code
- Maintainable - Clear structure and separation of concerns
- Simple - One command to start, easy to debug
## Directory Structure

```text
runpod/
├── core/                     # Core infrastructure
│   ├── base_service.py       # Abstract base class for all services
│   ├── service_manager.py    # Process lifecycle management
│   └── requirements.txt      # Core dependencies
│
├── model-orchestrator/       # Request orchestration
│   ├── orchestrator.py       # Main orchestrator (process-based)
│   ├── models.yaml           # Model registry (simple config)
│   └── requirements.txt      # Orchestrator dependencies
│
├── models/                   # Model service implementations
│   ├── vllm/                 # Text generation
│   │   ├── server.py         # vLLM service (inherits base_service)
│   │   └── requirements.txt  # vLLM dependencies
│   │
│   ├── flux/                 # Image generation
│   │   ├── server.py         # Flux service
│   │   └── requirements.txt  # Flux dependencies
│   │
│   └── musicgen/             # Music generation
│       ├── server.py         # MusicGen service
│       └── requirements.txt  # AudioCraft dependencies
│
├── scripts/                  # Deployment & management
│   ├── install.sh            # Install all dependencies
│   ├── download-models.sh    # Pre-download models
│   ├── start-all.sh          # Start orchestrator + services
│   ├── stop-all.sh           # Stop all services
│   └── prepare-template.sh   # RunPod template preparation
│
├── systemd/                  # Optional systemd services
│   ├── ai-orchestrator.service
│   └── install-services.sh
│
└── docs/                     # Documentation
    ├── ADDING_MODELS.md      # Guide for adding new models
    ├── DEPLOYMENT.md         # Deployment guide
    └── RUNPOD_TEMPLATE.md    # Template creation guide
```
## Component Responsibilities

### Core (`core/`)

- `base_service.py`: Abstract base class for all model services
  - Health check endpoint
  - Graceful shutdown
  - Logging configuration
  - Common utilities

- `service_manager.py`: Process lifecycle management
  - Start/stop services
  - Health monitoring
  - Auto-restart on failure
  - Resource cleanup
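As a rough illustration, `base_service.py` might look something like the sketch below, assuming FastAPI/uvicorn as the web stack and a `/health` route (both are assumptions); only the `initialize()`/`create_app()` hooks and the `run()` entry point are taken from the example later in this document.

```python
# core/base_service.py -- illustrative sketch only, not the actual implementation
import logging
from abc import ABC, abstractmethod

import uvicorn
from fastapi import FastAPI


class BaseService(ABC):
    """Shared plumbing for every model service: app setup, health check, logging."""

    def __init__(self, name: str, port: int):
        self.name = name
        self.port = port
        self.app = FastAPI(title=name)
        logging.basicConfig(level=logging.INFO)
        self.logger = logging.getLogger(name)

        # Health check endpoint polled by the service manager / orchestrator
        # (the /health path is an assumption).
        @self.app.get("/health")
        async def health():
            return {"status": "ok", "service": self.name}

    @abstractmethod
    async def initialize(self):
        """Load model weights; runs once at startup."""

    @abstractmethod
    def create_app(self):
        """Register model-specific FastAPI routes on self.app."""

    def run(self):
        # Load the model on startup, register routes, then serve.
        self.app.add_event_handler("startup", self.initialize)
        self.create_app()
        # uvicorn handles SIGINT/SIGTERM for graceful shutdown.
        uvicorn.run(self.app, host="0.0.0.0", port=self.port)
```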
### Orchestrator (`model-orchestrator/`)

- `orchestrator.py`: Routes requests to the appropriate model
  - Reads the `models.yaml` configuration
  - Manages model switching
  - Proxies requests to services
  - OpenAI-compatible API

- `models.yaml`: Simple model registry

  ```yaml
  models:
    model-name:
      type: text|image|audio
      service_script: path/to/server.py
      port: 8001
      startup_time: 120
      endpoint: /v1/chat/completions
  ```
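To make the routing concrete, here is a sketch of how the orchestrator could load this registry and pick a model from the request path. PyYAML is assumed; `load_registry` and `resolve_model` are hypothetical helper names, not the real API.

```python
# model-orchestrator/orchestrator.py -- routing sketch (illustrative)
import yaml


def load_registry(path: str = "model-orchestrator/models.yaml") -> dict:
    """Load the model registry into a plain dict keyed by model name."""
    with open(path) as f:
        return yaml.safe_load(f)["models"]


def resolve_model(registry: dict, request_path: str) -> tuple[str, dict] | None:
    """Find the model whose declared endpoint matches the incoming path."""
    for name, cfg in registry.items():
        if request_path.startswith(cfg["endpoint"]):
            return name, cfg
    return None


# Example: "/v1/chat/completions" resolves to the text model, and the
# orchestrator proxies the request to http://localhost:<port> for it.
```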
### Models (`models/`)

Each model directory contains:

- `server.py`: Service implementation (inherits `BaseService`)
- `requirements.txt`: Model-specific dependencies

Services are standalone and can run independently for testing.
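For example, a standalone service can be smoke-tested against its health endpoint (the `/health` path is an assumption; port 8001 matches the registry example above):

```python
# Smoke-test a standalone service (assumes a /health route on the service port)
import requests

resp = requests.get("http://localhost:8001/health", timeout=5)
print(resp.status_code, resp.json())
```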
### Scripts (`scripts/`)

- `install.sh`: Install Python packages for all services
- `download-models.sh`: Pre-download models to `/workspace`
- `start-all.sh`: Start the orchestrator (which manages model services)
- `stop-all.sh`: Graceful shutdown of all services
- `prepare-template.sh`: RunPod template preparation
## Adding a New Model (3 steps)

### 1. Create Model Service

```python
# models/mymodel/server.py
from core.base_service import BaseService


class MyModelService(BaseService):
    def __init__(self):
        super().__init__(
            name="mymodel",
            port=8004
        )

    async def initialize(self):
        """Load model"""
        self.model = load_my_model()

    def create_app(self):
        """Define FastAPI routes"""
        @self.app.post("/v1/mymodel/generate")
        async def generate(request: MyRequest):
            return self.model.generate(request.prompt)


if __name__ == "__main__":
    service = MyModelService()
    service.run()
```
### 2. Add to Registry

```yaml
# model-orchestrator/models.yaml
models:
  mymodel:
    type: custom
    service_script: models/mymodel/server.py
    port: 8004
    startup_time: 60
    endpoint: /v1/mymodel/generate
```
### 3. Add Dependencies

```text
# models/mymodel/requirements.txt
transformers==4.36.0
torch==2.1.0
```

That's it! The orchestrator handles everything else.
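As a usage example, the new endpoint could then be exercised through the orchestrator on port 9000. The `prompt` field mirrors the `request.prompt` access in the service example above and is an assumption about the request schema:

```python
# Call the new model through the orchestrator (port 9000, per Request Flow below)
import requests

resp = requests.post(
    "http://localhost:9000/v1/mymodel/generate",
    json={"prompt": "a short test prompt"},
    timeout=300,  # the first request may wait while the model service starts
)
print(resp.json())
```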
## Request Flow

```text
Client Request
      ↓
Orchestrator (port 9000)
      ↓  (determines model from endpoint)
Model Service (port 8001-800X)
      ↓
Response
```
## Startup Flow

1. Run `scripts/start-all.sh`
2. Orchestrator starts on port 9000
3. Orchestrator reads `models.yaml`
4. On the first request:
   - Orchestrator starts the appropriate model service
   - Waits for its health check
   - Proxies the request
5. On subsequent requests:
   - Same model: direct proxy
   - Different model: stop the current service, start the new one (see the sketch below)
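A rough sketch of what that switch could look like inside the service manager, assuming one resident model at a time and the assumed `/health` route; `switch_model` is a hypothetical name, while the config keys follow the registry example above.

```python
# Illustrative model-switching logic (one model resident at a time)
import subprocess
import time

import requests


def switch_model(current: subprocess.Popen | None, cfg: dict) -> subprocess.Popen:
    """Stop the running service, start the requested one, and wait for its health check."""
    if current is not None:
        current.terminate()          # graceful shutdown first
        current.wait(timeout=30)

    proc = subprocess.Popen(["python3", cfg["service_script"]])

    deadline = time.time() + cfg.get("startup_time", 120)
    url = f"http://localhost:{cfg['port']}/health"
    while time.time() < deadline:
        try:
            if requests.get(url, timeout=2).ok:
                return proc
        except requests.ConnectionError:
            pass
        time.sleep(2)

    proc.kill()
    raise RuntimeError(f"Service on port {cfg['port']} failed its health check")
```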
## Benefits
- Simple: No Docker complexity, just Python
- Fast: No container overhead, direct execution
- Debuggable: Standard Python processes, easy to inspect
- Extensible: Add models by creating one file + YAML entry
- Maintainable: Clear structure, base classes, DRY principles
- Portable: Works anywhere Python runs (local, RunPod, other clouds)
## Development Workflow

```bash
# Local development
python3 models/vllm/server.py               # Test a service directly
python3 model-orchestrator/orchestrator.py  # Test the orchestrator

# RunPod deployment
./scripts/install.sh           # Install dependencies
./scripts/download-models.sh   # Pre-download models
./scripts/start-all.sh         # Start everything

# Create template
./scripts/prepare-template.sh  # Prepare for template save
```
## Future Enhancements
- Load balancing across multiple GPUs
- Model pooling (keep multiple models loaded)
- Batch request queueing
- Metrics and monitoring
- Auto-scaling based on demand