feat: implement Ansible-based process architecture for RunPod

Major architecture overhaul to address RunPod Docker limitations:

Core Infrastructure:
- Add base_service.py: Abstract base class for all AI services
- Add service_manager.py: Process lifecycle management
- Add core/requirements.txt: Core dependencies

Model Services (Standalone Python):
- Add models/vllm/server.py: Qwen 2.5 7B text generation
- Add models/flux/server.py: Flux.1 Schnell image generation
- Add models/musicgen/server.py: MusicGen Medium music generation
- Each service inherits from GPUService base class
- OpenAI-compatible APIs
- Standalone execution support

Ansible Deployment:
- Add playbook.yml: Comprehensive deployment automation
- Add ansible.cfg: Ansible configuration
- Add inventory.yml: Localhost inventory
- Tags: base, python, dependencies, models, tailscale, validate, cleanup

Scripts:
- Add scripts/install.sh: Full installation wrapper
- Add scripts/download-models.sh: Model download wrapper
- Add scripts/start-all.sh: Start orchestrator
- Add scripts/stop-all.sh: Stop all services

Documentation:
- Update ARCHITECTURE.md: Document distributed VPS+GPU architecture

Benefits:
- No Docker: Avoids RunPod CAP_SYS_ADMIN limitations
- Fully reproducible via Ansible
- Extensible: Add models in 3 steps
- Direct Python execution (no container overhead)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-21 15:37:18 +01:00
parent 03a430894d
commit 9ee626a78e
17 changed files with 1817 additions and 5 deletions

@@ -0,0 +1,36 @@
#!/bin/bash
#
# Download AI Models
# Wrapper for Ansible models tag
#
set -e
cd "$(dirname "$0")/.."
echo "========================================="
echo " Downloading AI Models (~37GB)"
echo "========================================="
echo ""
# Source .env if it exists (set -a exports every variable it defines)
if [ -f .env ]; then
    set -a
    source .env
    set +a
fi
# Check HF_TOKEN; report the problem on stderr
if [ -z "$HF_TOKEN" ]; then
    echo "Error: HF_TOKEN not set" >&2
    echo "Add HF_TOKEN to .env file" >&2
    exit 1
fi
# Run Ansible with models tag
ansible-playbook playbook.yml --tags models
echo ""
echo "========================================="
echo " Model download complete!"
echo "========================================="
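
The `HF_TOKEN` check above could be generalized if more required variables are added later. The following is a minimal sketch, not part of this commit: `require_env` is a hypothetical helper name, and it assumes the variables are exported, which holds for anything loaded from `.env` under `set -a`.

```shell
# Hypothetical helper (not part of the commit): verify that every named
# environment variable is set and non-empty, reporting all missing ones.
require_env() {
    local name missing=0
    for name in "$@"; do
        # printenv prints the value of an exported variable; empty output
        # means the variable is unset or empty.
        if [ -z "$(printenv "$name" || true)" ]; then
            echo "Error: $name not set (add it to .env)" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Possible use in download-models.sh, after sourcing .env:
# require_env HF_TOKEN || exit 1
```

Reporting every missing variable before exiting avoids fixing them one run at a time.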

scripts/install.sh

@@ -0,0 +1,50 @@
#!/bin/bash
#
# Install AI Infrastructure
# Wrapper script for Ansible playbook
#
# Usage:
#   ./install.sh               # Full installation
#   ./install.sh --tags base   # Install specific components
#
set -e
cd "$(dirname "$0")/.."
echo "========================================="
echo " RunPod AI Infrastructure Installation"
echo "========================================="
echo ""
# Check if Ansible is installed
if ! command -v ansible-playbook &> /dev/null; then
    echo "Ansible not found. Installing..."
    # apt-get is used here because apt warns that its CLI is not stable for scripts
    sudo apt-get update
    sudo apt-get install -y ansible
fi
# Source .env if it exists; otherwise warn
if [ -f .env ]; then
    set -a
    source .env
    set +a
else
    echo "Warning: .env file not found"
    echo "Copy .env.example to .env and add your HF_TOKEN"
    echo ""
fi
# Run Ansible playbook
echo "Running Ansible playbook..."
echo ""
ansible-playbook playbook.yml "$@"
echo ""
echo "========================================="
echo " Installation complete!"
echo "========================================="
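
Because `install.sh` forwards its arguments to `ansible-playbook` via `"$@"`, the tags named in the commit message can be combined on the command line. The sketch below is illustrative only: `join_tags` is a hypothetical helper, while `--tags`, `--skip-tags`, and `--list-tags` are standard `ansible-playbook` options.

```shell
# Hypothetical helper (not part of the commit): join arguments with commas
# to build the value passed to --tags or --skip-tags.
join_tags() {
    local IFS=,
    echo "$*"
}

# Tag names below are the ones listed in the commit message.
# ./scripts/install.sh --list-tags
# ./scripts/install.sh --tags "$(join_tags base python dependencies)"
# ./scripts/install.sh --skip-tags "$(join_tags models tailscale)"
```

Running with `--list-tags` first shows which tags the playbook defines without executing any tasks.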

scripts/start-all.sh

@@ -0,0 +1,35 @@
#!/bin/bash
#
# Start AI Orchestrator
# Starts the model orchestrator which manages all AI services
#
set -e
cd "$(dirname "$0")/.."
echo "========================================="
echo " Starting AI Orchestrator"
echo "========================================="
echo ""
# Source .env if it exists; otherwise warn
if [ -f .env ]; then
    set -a
    source .env
    set +a
else
    echo "Warning: .env file not found"
    echo "Copy .env.example to .env and add your configuration"
    echo ""
fi
# Start orchestrator
echo "Starting orchestrator on port 9000..."
python3 model-orchestrator/orchestrator_subprocess.py
echo ""
echo "Orchestrator stopped"
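
Since the script runs under `set -e`, a non-zero exit from the orchestrator ends the script immediately, so "Orchestrator stopped" is printed only after a clean exit. One way to always report the outcome is sketched below; `run_and_report` is a hypothetical helper name, not something this commit provides.

```shell
# Hypothetical helper (not part of the commit): run a command in the
# foreground, then report its exit code even when it fails.
run_and_report() {
    local status=0
    # The "|| status=$?" keeps set -e from aborting on a non-zero exit.
    "$@" || status=$?
    echo "Orchestrator stopped (exit code $status)"
    return "$status"
}

# Possible use in start-all.sh:
# run_and_report python3 model-orchestrator/orchestrator_subprocess.py
```

Returning the original status lets a caller or supervisor still see that the orchestrator failed.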

scripts/stop-all.sh

@@ -0,0 +1,24 @@
#!/bin/bash
#
# Stop AI Services
# Gracefully stops all running AI services
#
set -e
echo "========================================="
echo " Stopping AI Services"
echo "========================================="
echo ""
# Kill orchestrator and model processes
echo "Stopping orchestrator..."
pkill -f "orchestrator_subprocess.py" || echo "Orchestrator not running"
echo "Stopping model services..."
pkill -f "models/vllm/server.py" || echo "vLLM not running"
pkill -f "models/flux/server.py" || echo "Flux not running"
pkill -f "models/musicgen/server.py" || echo "MusicGen not running"
echo ""
echo "All services stopped"
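
`pkill` sends SIGTERM by default and returns without waiting, so "All services stopped" can be printed while a model server is still shutting down. A sketch of a stop that waits and then escalates follows; `stop_pid` and its default timeout are assumptions for illustration, not part of this commit.

```shell
# Hypothetical helper (not part of the commit): send SIGTERM to one PID,
# wait up to $2 seconds (default 10) for it to exit, then send SIGKILL.
stop_pid() {
    local pid="$1" timeout="${2:-10}" waited=0

    # kill -0 sends no signal; it only tests whether the PID can be signalled.
    if ! kill -0 "$pid" 2>/dev/null; then
        echo "PID $pid not running"
        return 0
    fi

    kill -TERM "$pid" 2>/dev/null || true
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$waited" -ge "$timeout" ]; then
            echo "PID $pid still alive after ${timeout}s, sending SIGKILL"
            kill -KILL "$pid" 2>/dev/null || true
            return 0
        fi
        sleep 1
        waited=$((waited + 1))
    done
    echo "PID $pid stopped"
}

# Possible use with the patterns from stop-all.sh (pgrep -f prints matching PIDs):
# for pid in $(pgrep -f "models/vllm/server.py"); do stop_pid "$pid" 15; done
```

Waiting before escalating gives GPU processes time to release memory, which matters if the services are restarted right away.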