diff --git a/ARCHITECTURE.md b/ARCHITECTURE.md
new file mode 100644
index 0000000..388b51a
--- /dev/null
+++ b/ARCHITECTURE.md
@@ -0,0 +1,212 @@
# RunPod Multi-Modal AI Architecture

**Clean, extensible Python-based architecture for RunPod GPU instances**

## Design Principles

1. **No Docker** - Direct Python execution for RunPod compatibility
2. **Extensible** - Adding a new model requires minimal code
3. **Maintainable** - Clear structure and separation of concerns
4. **Simple** - One command to start, easy to debug

## Directory Structure

```
runpod/
├── core/                    # Core infrastructure
│   ├── base_service.py      # Abstract base class for all services
│   ├── service_manager.py   # Process lifecycle management
│   └── requirements.txt     # Core dependencies
│
├── model-orchestrator/      # Request orchestration
│   ├── orchestrator.py      # Main orchestrator (process-based)
│   ├── models.yaml          # Model registry (simple config)
│   └── requirements.txt     # Orchestrator dependencies
│
├── models/                  # Model service implementations
│   ├── vllm/                # Text generation
│   │   ├── server.py        # vLLM service (inherits BaseService)
│   │   └── requirements.txt # vLLM dependencies
│   │
│   ├── flux/                # Image generation
│   │   ├── server.py        # Flux service
│   │   └── requirements.txt # Flux dependencies
│   │
│   └── musicgen/            # Music generation
│       ├── server.py        # MusicGen service
│       └── requirements.txt # AudioCraft dependencies
│
├── scripts/                 # Deployment & management
│   ├── install.sh           # Install all dependencies
│   ├── download-models.sh   # Pre-download models
│   ├── start-all.sh         # Start orchestrator + services
│   ├── stop-all.sh          # Stop all services
│   └── prepare-template.sh  # RunPod template preparation
│
├── systemd/                 # Optional systemd services
│   ├── ai-orchestrator.service
│   └── install-services.sh
│
└── docs/                    # Documentation
    ├── ADDING_MODELS.md     # Guide for adding new models
    ├── DEPLOYMENT.md        # Deployment guide
    └── RUNPOD_TEMPLATE.md   # Template creation guide
```

## Component Responsibilities

### Core (`core/`)
- **base_service.py**: Abstract base class for all model services (sketched below)
  - Health check endpoint
  - Graceful shutdown
  - Logging configuration
  - Common utilities

- **service_manager.py**: Process lifecycle management
  - Start/stop services
  - Health monitoring
  - Auto-restart on failure
  - Resource cleanup
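For orientation, here is a minimal sketch of what `base_service.py` provides. It assumes FastAPI + uvicorn and mirrors only the hooks the "Adding a New Model" example below relies on (`__init__(name, port)`, `initialize`, `create_app`, `run`); everything else is illustrative rather than the exact implementation:

```python
# core/base_service.py - illustrative sketch, not the exact implementation
import logging
from abc import ABC, abstractmethod

import uvicorn
from fastapi import FastAPI


class BaseService(ABC):
    """Shared plumbing for all model services."""

    def __init__(self, name: str, port: int):
        self.name = name
        self.port = port
        self.app = FastAPI(title=name)
        logging.basicConfig(
            level=logging.INFO,
            format=f"%(asctime)s [{name}] %(levelname)s %(message)s",
        )
        self.log = logging.getLogger(name)

        @self.app.get("/health")
        async def health():
            # Polled by the service manager / orchestrator for readiness
            return {"status": "ok", "service": self.name}

    @abstractmethod
    async def initialize(self):
        """Load model weights; runs once before the first request."""

    @abstractmethod
    def create_app(self):
        """Register model-specific routes on self.app."""

    def run(self):
        self.create_app()
        # Run initialize() once the event loop is up, then serve.
        self.app.add_event_handler("startup", self.initialize)
        # uvicorn traps SIGINT/SIGTERM, so shutdown is graceful by default.
        uvicorn.run(self.app, host="0.0.0.0", port=self.port)
```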
### Orchestrator (`model-orchestrator/`)
- **orchestrator.py**: Routes requests to the appropriate model
  - Reads the `models.yaml` configuration
  - Manages model switching
  - Proxies requests to services
  - OpenAI-compatible API

- **models.yaml**: Simple model registry
  ```yaml
  models:
    model-name:
      type: text|image|audio
      service_script: path/to/server.py
      port: 8001
      startup_time: 120
      endpoint: /v1/chat/completions
  ```

### Models (`models/`)
Each model directory contains:
- **server.py**: Service implementation (inherits `BaseService`)
- **requirements.txt**: Model-specific dependencies

Services are standalone, so each one can run independently for testing.

### Scripts (`scripts/`)
- **install.sh**: Install Python packages for all services
- **download-models.sh**: Pre-download models to `/workspace`
- **start-all.sh**: Start the orchestrator (which manages the model services)
- **stop-all.sh**: Graceful shutdown of all services
- **prepare-template.sh**: RunPod template preparation

## Adding a New Model (3 steps)

### 1. Create the Model Service

```python
# models/mymodel/server.py
from pydantic import BaseModel

from core.base_service import BaseService


class MyRequest(BaseModel):
    """Request body for the generate endpoint."""
    prompt: str


class MyModelService(BaseService):
    def __init__(self):
        super().__init__(
            name="mymodel",
            port=8004
        )

    async def initialize(self):
        """Load model"""
        self.model = load_my_model()  # placeholder for your loading code

    def create_app(self):
        """Define FastAPI routes"""
        @self.app.post("/v1/mymodel/generate")
        async def generate(request: MyRequest):
            return self.model.generate(request.prompt)


if __name__ == "__main__":
    service = MyModelService()
    service.run()
```

### 2. Add to Registry

```yaml
# model-orchestrator/models.yaml
models:
  mymodel:
    type: custom
    service_script: models/mymodel/server.py
    port: 8004
    startup_time: 60
    endpoint: /v1/mymodel/generate
```

### 3. Add Dependencies

```
# models/mymodel/requirements.txt
transformers==4.36.0
torch==2.1.0
```

That's it! The orchestrator handles everything else.

## Request Flow

```
Client Request
      ↓
Orchestrator (port 9000)
      ↓  (determines model from endpoint)
Model Service (port 8001-800X)
      ↓
Response
```

## Startup Flow

1. Run `scripts/start-all.sh`
2. Orchestrator starts on port 9000
3. Orchestrator reads `models.yaml`
4. On first request:
   - Orchestrator starts the appropriate model service
   - Waits for its health check
   - Proxies the request
5. On subsequent requests:
   - If same model: direct proxy
   - If different model: stop current, start new

## Benefits

- **Simple**: No Docker complexity, just Python
- **Fast**: No container overhead, direct execution
- **Debuggable**: Standard Python processes, easy to inspect
- **Extensible**: Add a model by creating one file + one YAML entry
- **Maintainable**: Clear structure, base classes, DRY principles
- **Portable**: Works anywhere Python runs (local, RunPod, other clouds)

## Development Workflow

```bash
# Local development
python3 models/vllm/server.py              # Test a service directly
python3 model-orchestrator/orchestrator.py # Test the orchestrator

# RunPod deployment
./scripts/install.sh          # Install dependencies
./scripts/download-models.sh  # Pre-download models
./scripts/start-all.sh        # Start everything

# Create template
./scripts/prepare-template.sh # Prepare for template save
```

## Future Enhancements

- Load balancing across multiple GPUs
- Model pooling (keep multiple models loaded)
- Batch request queueing
- Metrics and monitoring
- Auto-scaling based on demand
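## Quick Smoke Test

Once `start-all.sh` is running, the whole request flow can be exercised with a short script against the orchestrator's OpenAI-compatible endpoint. A minimal sketch, assuming a text model registered under the key `vllm` in `models.yaml` and the usual OpenAI chat-completion response schema:

```python
# smoke_test.py - end-to-end check through the orchestrator (stdlib only)
import json
import urllib.request

payload = {
    "model": "vllm",  # assumed key; must match an entry in model-orchestrator/models.yaml
    "messages": [{"role": "user", "content": "Say hello in one sentence."}],
}
req = urllib.request.Request(
    "http://localhost:9000/v1/chat/completions",  # orchestrator port from the flow above
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    body = json.loads(resp.read())
    print(body["choices"][0]["message"]["content"])
```

The first call will be slow while the orchestrator starts the model service and waits for its health check; subsequent calls to the same model are proxied directly.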