Commit Graph

58 Commits

Author SHA1 Message Date
9ee626a78e feat: implement Ansible-based process architecture for RunPod
Major architecture overhaul to address RunPod Docker limitations:

Core Infrastructure:
- Add base_service.py: Abstract base class for all AI services
- Add service_manager.py: Process lifecycle management
- Add core/requirements.txt: Core dependencies

Model Services (Standalone Python):
- Add models/vllm/server.py: Qwen 2.5 7B text generation
- Add models/flux/server.py: Flux.1 Schnell image generation
- Add models/musicgen/server.py: MusicGen Medium music generation
- Each service inherits from GPUService base class
- OpenAI-compatible APIs
- Standalone execution support

Ansible Deployment:
- Add playbook.yml: Comprehensive deployment automation
- Add ansible.cfg: Ansible configuration
- Add inventory.yml: Localhost inventory
- Tags: base, python, dependencies, models, tailscale, validate, cleanup

Scripts:
- Add scripts/install.sh: Full installation wrapper
- Add scripts/download-models.sh: Model download wrapper
- Add scripts/start-all.sh: Start orchestrator
- Add scripts/stop-all.sh: Stop all services

Documentation:
- Update ARCHITECTURE.md: Document distributed VPS+GPU architecture

Benefits:
- No Docker: Avoids RunPod CAP_SYS_ADMIN limitations
- Fully reproducible via Ansible
- Extensible: Add models in 3 steps
- Direct Python execution (no container overhead)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-21 15:37:18 +01:00
03a430894d docs: add clean extensible architecture design
Created comprehensive architecture document for RunPod deployment:

**Key Design Principles:**
- No Docker (direct Python for RunPod compatibility)
- Extensible (add models in 3 simple steps)
- Maintainable (clear structure, base classes)
- Simple (one command startup)

**Structure:**
- core/ - Base service class + service manager
- model-orchestrator/ - Request routing
- models/ - Service implementations (vllm, flux, musicgen)
- scripts/ - Install, start, stop, template prep
- docs/ - Adding models, deployment, templates

**Adding New Models:**
1. Create server.py inheriting BaseService
2. Add entry to models.yaml
3. Add requirements.txt

That's it! Orchestrator handles lifecycle automatically.

Next: Implement base_service.py and refactor existing services.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-21 15:16:51 +01:00
31be1932e7 wip: start architecture redesign for RunPod (no Docker)
Started redesigning architecture to run services directly without Docker:

**Completed:**
- Created new process-based orchestrator (orchestrator_subprocess.py)
- Uses subprocess instead of Docker SDK for process management
- Updated models.yaml to reference service_script paths
- vLLM server already standalone-ready

**Still needed:**
- Create/update Flux and MusicGen standalone servers
- Create systemd service files or startup scripts
- Update prepare-template script for Python deployment
- Remove Docker/Compose dependencies
- Test full stack on RunPod
- Update documentation

Reason for change: RunPod's containerized environment doesn't support
Docker-in-Docker (requires CAP_SYS_ADMIN). Direct Python execution is
simpler, faster, and more reliable for RunPod.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-21 15:09:30 +01:00
cd9e2eee2e fix: use legacy Docker builder for RunPod compatibility
- Set DOCKER_BUILDKIT=0 to use legacy builder
- BuildKit has permission issues in RunPod's containerized environment
- Legacy builder works reliably with RunPod's security constraints

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-21 15:01:16 +01:00
8f1d4bedd2 fix: update Docker daemon startup for RunPod environment
- Changed from systemctl/service to direct dockerd command
- Added --iptables=false --bridge=none flags (required for RunPod)
- Added proper error checking and 10s wait time
- Improved logging with verification step

This fixes Docker startup in RunPod's containerized environment where
systemd is not available and iptables require special handling.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-21 15:00:42 +01:00
0fa69cae28 refactor: rename docker-compose.gpu.yaml to compose.yaml
Simplified compose file naming to follow Docker Compose best practices:
- Renamed docker-compose.gpu.yaml to compose.yaml
- Updated all references in documentation files (README.md, DEPLOYMENT.md, GPU_DEPLOYMENT_LOG.md, RUNPOD_TEMPLATE.md)
- Updated references in scripts (prepare-template.sh)

This change enables simpler command syntax:
- Before: docker compose -f docker-compose.gpu.yaml up -d orchestrator
- After: docker compose up -d orchestrator

Generated with Claude Code (https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-21 14:49:42 +01:00
cafa0a1147 refactor: clean up runpod repository structure
Removed facefusion and VPS-related files:
- compose.yaml, postgres/, litellm-config.yaml (VPS services)
- Dockerfile, entrypoint.sh, disable-nsfw-filter.patch (facefusion)

Removed outdated documentation:
- DOCKER_GPU_SETUP.md, README_GPU_SETUP.md, SETUP_GUIDE.md
- TAILSCALE_SETUP.md, WIREGUARD_SETUP.md (covered in DEPLOYMENT.md)
- GPU_EXPANSION_PLAN.md (historical planning doc)
- gpu-server-compose.yaml, litellm-config-gpu.yaml (old versions)
- deploy-gpu-stack.sh, simple_vllm_server.py (old scripts)

Organized documentation:
- Created docs/ directory
- Moved DEPLOYMENT.md, RUNPOD_TEMPLATE.md, GPU_DEPLOYMENT_LOG.md to docs/
- Updated all documentation links in README.md

Final structure:
- Clean root directory with only GPU-specific files
- Organized documentation in docs/
- Model services in dedicated directories (model-orchestrator/, vllm/, flux/, musicgen/)
- Automation scripts in scripts/
2025-11-21 14:45:49 +01:00
277f1c95bd Initial commit: RunPod multi-modal AI orchestration stack
- Multi-modal AI infrastructure for RunPod RTX 4090
- Automatic model orchestration (text, image, music)
- Text: vLLM + Qwen 2.5 7B Instruct
- Image: Flux.1 Schnell via OpenEDAI
- Music: MusicGen Medium via AudioCraft
- Cost-optimized sequential loading on single GPU
- Template preparation scripts for rapid deployment
- Comprehensive documentation (README, DEPLOYMENT, TEMPLATE)
2025-11-21 14:34:55 +01:00