diff --git a/ARCHITECTURE.md b/ARCHITECTURE.md
new file mode 100644
index 0000000..388b51a
--- /dev/null
+++ b/ARCHITECTURE.md
@@ -0,0 +1,212 @@
# RunPod Multi-Modal AI Architecture

**Clean, extensible Python-based architecture for RunPod GPU instances**

## Design Principles

1. **No Docker** - Direct Python execution for RunPod compatibility
2. **Extensible** - Adding a new model requires minimal code
3. **Maintainable** - Clear structure and separation of concerns
4. **Simple** - One command to start, easy to debug

## Directory Structure

```
runpod/
├── core/                    # Core infrastructure
│   ├── base_service.py      # Abstract base class for all services
│   ├── service_manager.py   # Process lifecycle management
│   └── requirements.txt     # Core dependencies
│
├── model-orchestrator/      # Request orchestration
│   ├── orchestrator.py      # Main orchestrator (process-based)
│   ├── models.yaml          # Model registry (simple config)
│   └── requirements.txt     # Orchestrator dependencies
│
├── models/                  # Model service implementations
│   ├── vllm/                # Text generation
│   │   ├── server.py        # vLLM service (inherits BaseService)
│   │   └── requirements.txt # vLLM dependencies
│   │
│   ├── flux/                # Image generation
│   │   ├── server.py        # Flux service
│   │   └── requirements.txt # Flux dependencies
│   │
│   └── musicgen/            # Music generation
│       ├── server.py        # MusicGen service
│       └── requirements.txt # AudioCraft dependencies
│
├── scripts/                 # Deployment & management
│   ├── install.sh           # Install all dependencies
│   ├── download-models.sh   # Pre-download models
│   ├── start-all.sh         # Start orchestrator + services
│   ├── stop-all.sh          # Stop all services
│   └── prepare-template.sh  # RunPod template preparation
│
├── systemd/                 # Optional systemd services
│   ├── ai-orchestrator.service
│   └── install-services.sh
│
└── docs/                    # Documentation
    ├── ADDING_MODELS.md     # Guide for adding new models
    ├── DEPLOYMENT.md        # Deployment guide
    └── RUNPOD_TEMPLATE.md   # Template creation guide
```

## Component Responsibilities

### Core (`core/`)
- **base_service.py**: Abstract base class for all model services (sketched below)
  - Health check endpoint
  - Graceful shutdown
  - Logging configuration
  - Common utilities

- **service_manager.py**: Process lifecycle management
  - Start/stop services
  - Health monitoring
  - Auto-restart on failure
  - Resource cleanup
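For orientation, here is a minimal sketch of what `base_service.py` provides. It assumes FastAPI + uvicorn and mirrors only the hooks the "Adding a New Model" example below relies on (`__init__(name, port)`, `initialize`, `create_app`, `run`); everything else is illustrative rather than the exact implementation:

```python
# core/base_service.py - illustrative sketch, not the exact implementation
import logging
from abc import ABC, abstractmethod

import uvicorn
from fastapi import FastAPI


class BaseService(ABC):
    """Shared plumbing for all model services."""

    def __init__(self, name: str, port: int):
        self.name = name
        self.port = port
        self.app = FastAPI(title=name)
        logging.basicConfig(
            level=logging.INFO,
            format=f"%(asctime)s [{name}] %(levelname)s %(message)s",
        )
        self.log = logging.getLogger(name)

        @self.app.get("/health")
        async def health():
            # Polled by the service manager / orchestrator for readiness
            return {"status": "ok", "service": self.name}

    @abstractmethod
    async def initialize(self):
        """Load model weights; runs once before the first request."""

    @abstractmethod
    def create_app(self):
        """Register model-specific routes on self.app."""

    def run(self):
        self.create_app()
        # Run initialize() once the event loop is up, then serve.
        self.app.add_event_handler("startup", self.initialize)
        # uvicorn traps SIGINT/SIGTERM, so shutdown is graceful by default.
        uvicorn.run(self.app, host="0.0.0.0", port=self.port)
```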
### Orchestrator (`model-orchestrator/`)
- **orchestrator.py**: Routes requests to the appropriate model
  - Reads the `models.yaml` configuration
  - Manages model switching
  - Proxies requests to services
  - OpenAI-compatible API

- **models.yaml**: Simple model registry
  ```yaml
  models:
    model-name:
      type: text|image|audio
      service_script: path/to/server.py
      port: 8001
      startup_time: 120
      endpoint: /v1/chat/completions
  ```

### Models (`models/`)
Each model directory contains:
- **server.py**: Service implementation (inherits `BaseService`)
- **requirements.txt**: Model-specific dependencies

Services are standalone, so each one can run independently for testing.

### Scripts (`scripts/`)
- **install.sh**: Install Python packages for all services
- **download-models.sh**: Pre-download models to `/workspace`
- **start-all.sh**: Start the orchestrator (which manages the model services)
- **stop-all.sh**: Graceful shutdown of all services
- **prepare-template.sh**: RunPod template preparation

## Adding a New Model (3 steps)

### 1. Create the Model Service

```python
# models/mymodel/server.py
from pydantic import BaseModel

from core.base_service import BaseService


class MyRequest(BaseModel):
    """Request body for the generate endpoint."""
    prompt: str


class MyModelService(BaseService):
    def __init__(self):
        super().__init__(
            name="mymodel",
            port=8004
        )

    async def initialize(self):
        """Load model"""
        self.model = load_my_model()  # placeholder for your loading code

    def create_app(self):
        """Define FastAPI routes"""
        @self.app.post("/v1/mymodel/generate")
        async def generate(request: MyRequest):
            return self.model.generate(request.prompt)


if __name__ == "__main__":
    service = MyModelService()
    service.run()
```

### 2. Add to Registry

```yaml
# model-orchestrator/models.yaml
models:
  mymodel:
    type: custom
    service_script: models/mymodel/server.py
    port: 8004
    startup_time: 60
    endpoint: /v1/mymodel/generate
```

### 3. Add Dependencies

```
# models/mymodel/requirements.txt
transformers==4.36.0
torch==2.1.0
```

That's it! The orchestrator handles everything else.

## Request Flow

```
Client Request
      ↓
Orchestrator (port 9000)
      ↓  (determines model from endpoint)
Model Service (port 8001-800X)
      ↓
Response
```

## Startup Flow

1. Run `scripts/start-all.sh`
2. Orchestrator starts on port 9000
3. Orchestrator reads `models.yaml`
4. On first request:
   - Orchestrator starts the appropriate model service
   - Waits for its health check
   - Proxies the request
5. On subsequent requests:
   - If same model: direct proxy
   - If different model: stop current, start new

## Benefits

- **Simple**: No Docker complexity, just Python
- **Fast**: No container overhead, direct execution
- **Debuggable**: Standard Python processes, easy to inspect
- **Extensible**: Add a model by creating one file + one YAML entry
- **Maintainable**: Clear structure, base classes, DRY principles
- **Portable**: Works anywhere Python runs (local, RunPod, other clouds)

## Development Workflow

```bash
# Local development
python3 models/vllm/server.py              # Test a service directly
python3 model-orchestrator/orchestrator.py # Test the orchestrator

# RunPod deployment
./scripts/install.sh          # Install dependencies
./scripts/download-models.sh  # Pre-download models
./scripts/start-all.sh        # Start everything

# Create template
./scripts/prepare-template.sh # Prepare for template save
```

## Future Enhancements

- Load balancing across multiple GPUs
- Model pooling (keep multiple models loaded)
- Batch request queueing
- Metrics and monitoring
- Auto-scaling based on demand
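## Quick Smoke Test

Once `start-all.sh` is running, the whole request flow can be exercised with a short script against the orchestrator's OpenAI-compatible endpoint. A minimal sketch, assuming a text model registered under the key `vllm` in `models.yaml` and the usual OpenAI chat-completion response schema:

```python
# smoke_test.py - end-to-end check through the orchestrator (stdlib only)
import json
import urllib.request

payload = {
    "model": "vllm",  # assumed key; must match an entry in model-orchestrator/models.yaml
    "messages": [{"role": "user", "content": "Say hello in one sentence."}],
}
req = urllib.request.Request(
    "http://localhost:9000/v1/chat/completions",  # orchestrator port from the flow above
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    body = json.loads(resp.read())
    print(body["choices"][0]["message"]["content"])
```

The first call will be slow while the orchestrator starts the model service and waits for its health check; subsequent calls to the same model are proxied directly.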