feat: implement Ansible-based process architecture for RunPod

Major architecture overhaul to address RunPod Docker limitations:

Core Infrastructure:
- Add base_service.py: Abstract base class for all AI services
- Add service_manager.py: Process lifecycle management
- Add core/requirements.txt: Core dependencies

Model Services (Standalone Python):
- Add models/vllm/server.py: Qwen 2.5 7B text generation
- Add models/flux/server.py: Flux.1 Schnell image generation
- Add models/musicgen/server.py: MusicGen Medium music generation
- Each service inherits from the GPUService base class
- OpenAI-compatible APIs
- Each server can also run standalone, outside the orchestrator
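Because the servers are OpenAI-compatible, each one has to emit the standard completion envelope. A minimal sketch of that response shape (the helper name is hypothetical; field values are illustrative):

```python
import time
import uuid


def completion_response(model: str, text: str) -> dict:
    """Build a minimal OpenAI-style /v1/completions response body."""
    return {
        "id": f"cmpl-{uuid.uuid4().hex[:12]}",
        "object": "text_completion",
        "created": int(time.time()),
        "model": model,
        "choices": [
            {"index": 0, "text": text, "finish_reason": "stop"},
        ],
    }
```

Clients already written against the OpenAI API can then point at the GPU host without code changes.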

Ansible Deployment:
- Add playbook.yml: Comprehensive deployment automation
- Add ansible.cfg: Ansible configuration
- Add inventory.yml: Localhost inventory
- Tags: base, python, dependencies, models, tailscale, validate, cleanup
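The tags allow partial runs: `--tags` is a standard `ansible-playbook` flag, and the file names below come from this commit, though which tag combinations are sensible is an assumption. The command is built as a string here so the sketch runs without Ansible installed; in practice you would execute it directly:

```shell
# Re-run only the model-download and validation stages:
cmd='ansible-playbook -i inventory.yml playbook.yml --tags models,validate'
echo "$cmd"
```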

Scripts:
- Add scripts/install.sh: Full installation wrapper
- Add scripts/download-models.sh: Model download wrapper
- Add scripts/start-all.sh: Start orchestrator
- Add scripts/stop-all.sh: Stop all services
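One plausible shape for the stop script, assuming each service records its PID on startup; the PID-file layout is an assumption, not taken from the actual scripts/stop-all.sh:

```shell
# Sketch only: assumes each service writes /tmp/ai-services/<name>.pid
# when it starts. PIDDIR can override the default location.
piddir="${PIDDIR:-/tmp/ai-services}"
mkdir -p "$piddir"

for pidfile in "$piddir"/*.pid; do
  [ -e "$pidfile" ] || continue          # glob matched nothing
  pid=$(cat "$pidfile")
  kill "$pid" 2>/dev/null || true        # service may already have exited
  rm -f "$pidfile"
done
echo "all services stopped"
```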

Documentation:
- Update ARCHITECTURE.md: Document distributed VPS+GPU architecture

Benefits:
- No Docker: Avoids RunPod CAP_SYS_ADMIN limitations
- Reproducible: Entire environment rebuilds from the Ansible playbook
- Extensible: Add models in 3 steps
- Simple: Direct Python execution, no container overhead

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-21 15:37:18 +01:00
parent 03a430894d
commit 9ee626a78e
17 changed files with 1817 additions and 5 deletions


@@ -1,13 +1,15 @@
 # RunPod Multi-Modal AI Architecture
-**Clean, extensible Python-based architecture for RunPod GPU instances**
+**Clean, extensible distributed AI infrastructure spanning VPS and GPU**
 ## Design Principles
-1. **No Docker** - Direct Python execution for RunPod compatibility
-2. **Extensible** - Adding new models requires minimal code
-3. **Maintainable** - Clear structure and separation of concerns
-4. **Simple** - One command to start, easy to debug
+1. **Distributed** - VPS (UI/proxy) + GPU (models) connected via Tailscale
+2. **No Docker on GPU** - Direct Python for RunPod compatibility
+3. **Extensible** - Adding new models requires minimal code
+4. **Maintainable** - Clear structure and separation of concerns
+5. **Simple** - One command to start, easy to debug
+6. **OpenAI Compatible** - Works with standard AI tools
 ## Directory Structure