docs: add clean extensible architecture design
Created comprehensive architecture document for RunPod deployment:

**Key Design Principles:**
- No Docker (direct Python for RunPod compatibility)
- Extensible (add models in 3 simple steps)
- Maintainable (clear structure, base classes)
- Simple (one command startup)

**Structure:**
- core/ - Base service class + service manager
- model-orchestrator/ - Request routing
- models/ - Service implementations (vllm, flux, musicgen)
- scripts/ - Install, start, stop, template prep
- docs/ - Adding models, deployment, templates

**Adding New Models:**
1. Create server.py inheriting BaseService
2. Add entry to models.yaml
3. Add requirements.txt

That's it! Orchestrator handles lifecycle automatically.

Next: Implement base_service.py and refactor existing services.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
# RunPod Multi-Modal AI Architecture

**Clean, extensible Python-based architecture for RunPod GPU instances**

## Design Principles

1. **No Docker** - Direct Python execution for RunPod compatibility
2. **Extensible** - Adding new models requires minimal code
3. **Maintainable** - Clear structure and separation of concerns
4. **Simple** - One command to start, easy to debug

## Directory Structure

```
runpod/
├── core/                    # Core infrastructure
│   ├── base_service.py      # Abstract base class for all services
│   ├── service_manager.py   # Process lifecycle management
│   └── requirements.txt     # Core dependencies
│
├── model-orchestrator/      # Request orchestration
│   ├── orchestrator.py      # Main orchestrator (process-based)
│   ├── models.yaml          # Model registry (simple config)
│   └── requirements.txt     # Orchestrator dependencies
│
├── models/                  # Model service implementations
│   ├── vllm/                # Text generation
│   │   ├── server.py        # vLLM service (inherits base_service)
│   │   └── requirements.txt # vLLM dependencies
│   │
│   ├── flux/                # Image generation
│   │   ├── server.py        # Flux service
│   │   └── requirements.txt # Flux dependencies
│   │
│   └── musicgen/            # Music generation
│       ├── server.py        # MusicGen service
│       └── requirements.txt # AudioCraft dependencies
│
├── scripts/                 # Deployment & management
│   ├── install.sh           # Install all dependencies
│   ├── download-models.sh   # Pre-download models
│   ├── start-all.sh         # Start orchestrator + services
│   ├── stop-all.sh          # Stop all services
│   └── prepare-template.sh  # RunPod template preparation
│
├── systemd/                 # Optional systemd services
│   ├── ai-orchestrator.service
│   └── install-services.sh
│
└── docs/                    # Documentation
    ├── ADDING_MODELS.md     # Guide for adding new models
    ├── DEPLOYMENT.md        # Deployment guide
    └── RUNPOD_TEMPLATE.md   # Template creation guide
```

## Component Responsibilities

### Core (`core/`)

- **base_service.py**: Abstract base class for all model services (sketched below)
  - Health check endpoint
  - Graceful shutdown
  - Logging configuration
  - Common utilities

- **service_manager.py**: Process lifecycle management (sketched below)
  - Start/stop services
  - Health monitoring
  - Auto-restart on failure
  - Resource cleanup
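`base_service.py` is still to be implemented (see the commit notes), but a minimal sketch, assuming FastAPI/uvicorn and a `/health` route, could look like the following; the `initialize`/`create_app` hook names and the constructor arguments are taken from the example later in this document:

```python
# core/base_service.py (illustrative sketch, not the committed implementation)
import logging
from abc import ABC, abstractmethod

import uvicorn
from fastapi import FastAPI


class BaseService(ABC):
    """Shared plumbing for every model service: app, health check, logging."""

    def __init__(self, name: str, port: int):
        self.name = name
        self.port = port
        self.model = None
        self.app = FastAPI(title=name)
        logging.basicConfig(level=logging.INFO)
        self.log = logging.getLogger(name)

        @self.app.get("/health")
        async def health():
            return {"service": self.name, "status": "ok"}

        # Load the model once the event loop is running.
        self.app.add_event_handler("startup", self.initialize)

    @abstractmethod
    async def initialize(self):
        """Load model weights; implemented by each service."""

    @abstractmethod
    def create_app(self):
        """Register model-specific routes on self.app."""

    def run(self):
        self.create_app()
        uvicorn.run(self.app, host="0.0.0.0", port=self.port)
```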
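Likewise, a minimal sketch of the lifecycle side, assuming services are plain subprocesses polled over the `/health` endpoint above (the restart and timeout policies here are assumptions):

```python
# core/service_manager.py (illustrative sketch)
import subprocess
import sys
import time

import requests


class ServiceManager:
    """Start, monitor, and stop model service processes."""

    def __init__(self):
        self.processes = {}  # name -> subprocess.Popen

    def start(self, name: str, script: str, port: int, startup_time: int):
        proc = subprocess.Popen([sys.executable, script])
        self.processes[name] = proc
        self._wait_healthy(port, timeout=startup_time)

    def _wait_healthy(self, port: int, timeout: int):
        deadline = time.time() + timeout
        while time.time() < deadline:
            try:
                if requests.get(f"http://localhost:{port}/health", timeout=2).ok:
                    return
            except requests.ConnectionError:
                pass
            time.sleep(1)
        raise RuntimeError(f"service on port {port} never became healthy")

    def check(self, name: str, script: str, port: int, startup_time: int):
        """Auto-restart: relaunch the service if its process has died."""
        proc = self.processes.get(name)
        if proc is not None and proc.poll() is not None:
            self.start(name, script, port, startup_time)

    def stop(self, name: str):
        proc = self.processes.pop(name, None)
        if proc is not None:
            proc.terminate()  # ask for a graceful shutdown first
            try:
                proc.wait(timeout=10)
            except subprocess.TimeoutExpired:
                proc.kill()   # force-kill only if it hangs
```

Calling `terminate()` before `kill()` gives a service time to unload its model cleanly, matching the graceful-shutdown goal above.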
### Orchestrator (`model-orchestrator/`)

- **orchestrator.py**: Routes requests to the appropriate model (a sketch follows below)
  - Reads `models.yaml` configuration
  - Manages model switching
  - Proxies requests to services
  - OpenAI-compatible API

- **models.yaml**: Simple model registry

```yaml
models:
  model-name:
    type: text|image|audio
    service_script: path/to/server.py
    port: 8001
    startup_time: 120
    endpoint: /v1/chat/completions
```
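Putting the registry and the service manager together, `orchestrator.py` could look roughly like this; the catch-all route and single-active-model policy mirror the startup flow described later, while the httpx-based forwarding is an assumption:

```python
# model-orchestrator/orchestrator.py (illustrative sketch)
from pathlib import Path

import httpx
import yaml
from fastapi import FastAPI, HTTPException, Request, Response

from core.service_manager import ServiceManager

app = FastAPI()
manager = ServiceManager()
models = yaml.safe_load(Path("model-orchestrator/models.yaml").read_text())["models"]
active = None  # name of the model service currently running


def model_for(path: str):
    """Look up the model whose registered endpoint matches the request path."""
    for name, cfg in models.items():
        if path == cfg["endpoint"]:
            return name, cfg
    raise HTTPException(status_code=404, detail=f"no model registered for {path}")


@app.api_route("/{path:path}", methods=["GET", "POST"])
async def proxy(path: str, request: Request):
    global active
    name, cfg = model_for("/" + path)
    if active != name:  # model switch: stop the current service, start the new one
        if active is not None:
            manager.stop(active)
        # Blocking wait is acceptable for this single-tenant sketch.
        manager.start(name, cfg["service_script"], cfg["port"], cfg["startup_time"])
        active = name
    async with httpx.AsyncClient(timeout=None) as client:
        upstream = await client.request(
            request.method,
            f"http://localhost:{cfg['port']}/{path}",
            content=await request.body(),
            headers={"content-type": request.headers.get("content-type", "application/json")},
        )
    return Response(content=upstream.content, status_code=upstream.status_code)
```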
### Models (`models/`)

Each model directory contains:

- **server.py**: Service implementation (inherits `BaseService`)
- **requirements.txt**: Model-specific dependencies

Services are standalone - each can run independently for testing.
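For example, after launching a service directly you can hit its health route from a quick script; the `/health` path and port follow the sketches and registry examples above and are assumptions:

```python
# Terminal 1: python3 models/vllm/server.py
# Terminal 2: run this standalone smoke test
import requests

print(requests.get("http://localhost:8001/health", timeout=5).json())
```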
### Scripts (`scripts/`)

- **install.sh**: Install Python packages for all services
- **download-models.sh**: Pre-download models to `/workspace`
- **start-all.sh**: Start orchestrator (which manages model services)
- **stop-all.sh**: Graceful shutdown of all services
- **prepare-template.sh**: RunPod template preparation
## Adding a New Model (3 steps)

### 1. Create Model Service

```python
# models/mymodel/server.py
from pydantic import BaseModel

from core.base_service import BaseService


class MyRequest(BaseModel):
    prompt: str


class MyModelService(BaseService):
    def __init__(self):
        super().__init__(
            name="mymodel",
            port=8004
        )

    async def initialize(self):
        """Load model (load_my_model is a placeholder for your loading code)"""
        self.model = load_my_model()

    def create_app(self):
        """Define FastAPI routes"""
        @self.app.post("/v1/mymodel/generate")
        async def generate(request: MyRequest):
            return self.model.generate(request.prompt)


if __name__ == "__main__":
    service = MyModelService()
    service.run()
```

### 2. Add to Registry

```yaml
# model-orchestrator/models.yaml
models:
  mymodel:
    type: custom
    service_script: models/mymodel/server.py
    port: 8004
    startup_time: 60
    endpoint: /v1/mymodel/generate
```

### 3. Add Dependencies

```
# models/mymodel/requirements.txt
transformers==4.36.0
torch==2.1.0
```

That's it! The orchestrator handles everything else.
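Once those three steps are done, the new endpoint is reachable through the orchestrator. A hypothetical call (the URL and field name follow the examples above):

```python
import requests

resp = requests.post(
    "http://localhost:9000/v1/mymodel/generate",
    json={"prompt": "hello"},
    timeout=300,  # generous timeout: the first call may wait for model startup
)
print(resp.json())
```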

## Request Flow

```
Client Request
      ↓
Orchestrator (port 9000)
      ↓  (determines model from endpoint)
Model Service (port 8001-800X)
      ↓
Response
```

## Startup Flow

1. Run `scripts/start-all.sh`
2. Orchestrator starts on port 9000
3. Orchestrator reads `models.yaml`
4. On first request:
   - Orchestrator starts the appropriate model service
   - Waits for the health check
   - Proxies the request
5. On subsequent requests:
   - If same model: direct proxy
   - If different model: stop current, start new (illustrated below)
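That switch is visible from the client side: back-to-back requests to two different endpoints pay the stop/start cost on the second call. A hypothetical timing check, with endpoints taken from the registry examples above:

```python
import time

import requests

REQUESTS = [
    ("/v1/chat/completions", {"model": "model-name",
                              "messages": [{"role": "user", "content": "hi"}]}),
    ("/v1/mymodel/generate", {"prompt": "hi"}),
]

for path, payload in REQUESTS:
    start = time.time()
    requests.post(f"http://localhost:9000{path}", json=payload, timeout=600)
    # The second call includes the model switch (stop + start + health wait).
    print(f"{path}: {time.time() - start:.1f}s")
```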
## Benefits

- **Simple**: No Docker complexity, just Python
- **Fast**: No container overhead, direct execution
- **Debuggable**: Standard Python processes, easy to inspect
- **Extensible**: Add models by creating one file + a YAML entry
- **Maintainable**: Clear structure, base classes, DRY principles
- **Portable**: Works anywhere Python runs (local, RunPod, other clouds)
## Development Workflow

```bash
# Local development
python3 models/vllm/server.py               # Test a service directly
python3 model-orchestrator/orchestrator.py  # Test the orchestrator

# RunPod deployment
./scripts/install.sh          # Install dependencies
./scripts/download-models.sh  # Pre-download models
./scripts/start-all.sh        # Start everything

# Create template
./scripts/prepare-template.sh  # Prepare for template save
```
## Future Enhancements

- Load balancing across multiple GPUs
- Model pooling (keep multiple models loaded)
- Batch request queueing
- Metrics and monitoring
- Auto-scaling based on demand