10 Commits

Author SHA1 Message Date
571431955d feat: add RunPod Docker template with automated build workflow
- Add Dockerfile with minimal setup (supervisor, tailscale)
- Add start.sh bootstrap script for container initialization
- Add Gitea workflow for automated Docker image builds
- Add comprehensive RUNPOD_TEMPLATE.md documentation
- Add bootstrap-venvs.sh for Python venv health checks

This enables deployment of the AI orchestrator on RunPod using:
- Minimal Docker image (~2-3GB) for fast deployment
- Network volume for models and data persistence (~80-200GB)
- Automated builds on push to main or version tags
- Full Tailscale VPN integration
- Supervisor process management
2025-11-23 21:53:56 +01:00
18cd87fbd1 refactor: reorganize webdav-sync into dedicated directory
Clean up project structure by organizing WebDAV sync service properly.

Changes:
- Move scripts/comfyui_webdav_sync.py → webdav-sync/webdav_sync.py
- Create webdav-sync/requirements.txt with watchdog and webdavclient3
- Remove webdav dependencies from model-orchestrator/requirements.txt
- Delete unused scripts/ folder (start-all.sh, status.sh, stop-all.sh)
- Update supervisord.conf to use new path /workspace/ai/webdav-sync/webdav_sync.py

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-22 19:03:48 +01:00
79442bd62e feat: add WebDAV sync service for ComfyUI outputs
Add Python watchdog service to automatically sync ComfyUI outputs to HiDrive WebDAV storage.

Changes:
- Add scripts/comfyui_webdav_sync.py: File watcher service using watchdog + webdavclient3
- Update model-orchestrator/requirements.txt: Add watchdog and webdavclient3 dependencies
- Update supervisord.conf: Add webdav-sync program with ENV variable support
- Update arty.yml: Add service management scripts (start/stop/restart/status/logs)

WebDAV credentials are now loaded from .env file (WEBDAV_URL, WEBDAV_USERNAME, WEBDAV_PASSWORD, WEBDAV_REMOTE_PATH)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-22 18:58:18 +01:00
664da9f4ea feat: add Supervisor process manager for service management
- Add supervisord.conf with ComfyUI and orchestrator services
- Update Ansible playbook with supervisor installation tag
- Rewrite start-all.sh and stop-all.sh to use Supervisor
- Add status.sh script for checking service status
- Update arty.yml with supervisor commands and shortcuts
- Update CLAUDE.md with Supervisor documentation and troubleshooting
- Services now auto-restart on crashes with centralized logging

Benefits:
- Better process control than manual pkill/background jobs
- Auto-restart on service crashes
- Centralized log management in /workspace/logs/
- Web interface for monitoring (port 9001)
- Works perfectly in RunPod containers (no systemd needed)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-22 09:22:16 +01:00
c9b01eef68 refactor: consolidate model management into Ansible playbook
Remove flux/musicgen standalone implementations in favor of ComfyUI:
- Delete models/flux/ and models/musicgen/ directories
- Remove redundant scripts (install.sh, download-models.sh, prepare-template.sh)
- Update README.md to reference Ansible playbook commands
- Update playbook.yml to remove flux/musicgen service definitions
- Add COMFYUI_MODELS.md with comprehensive model installation guide
- Update stop-all.sh to only manage orchestrator and vLLM services

All model downloads and dependency management now handled via
Ansible playbook tags (base, python, vllm, comfyui, comfyui-essential).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-22 00:31:26 +01:00
9ee626a78e feat: implement Ansible-based process architecture for RunPod
Major architecture overhaul to address RunPod Docker limitations:

Core Infrastructure:
- Add base_service.py: Abstract base class for all AI services
- Add service_manager.py: Process lifecycle management
- Add core/requirements.txt: Core dependencies

Model Services (Standalone Python):
- Add models/vllm/server.py: Qwen 2.5 7B text generation
- Add models/flux/server.py: Flux.1 Schnell image generation
- Add models/musicgen/server.py: MusicGen Medium music generation
- Each service inherits from GPUService base class
- OpenAI-compatible APIs
- Standalone execution support

Ansible Deployment:
- Add playbook.yml: Comprehensive deployment automation
- Add ansible.cfg: Ansible configuration
- Add inventory.yml: Localhost inventory
- Tags: base, python, dependencies, models, tailscale, validate, cleanup

Scripts:
- Add scripts/install.sh: Full installation wrapper
- Add scripts/download-models.sh: Model download wrapper
- Add scripts/start-all.sh: Start orchestrator
- Add scripts/stop-all.sh: Stop all services

Documentation:
- Update ARCHITECTURE.md: Document distributed VPS+GPU architecture

Benefits:
- No Docker: Avoids RunPod CAP_SYS_ADMIN limitations
- Fully reproducible via Ansible
- Extensible: Add models in 3 steps
- Direct Python execution (no container overhead)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-21 15:37:18 +01:00
cd9e2eee2e fix: use legacy Docker builder for RunPod compatibility
- Set DOCKER_BUILDKIT=0 to use legacy builder
- BuildKit has permission issues in RunPod's containerized environment
- Legacy builder works reliably with RunPod's security constraints

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-21 15:01:16 +01:00
8f1d4bedd2 fix: update Docker daemon startup for RunPod environment
- Changed from systemctl/service to direct dockerd command
- Added --iptables=false --bridge=none flags (required for RunPod)
- Added proper error checking and 10s wait time
- Improved logging with verification step

This fixes Docker startup in RunPod's containerized environment where
systemd is not available and iptables require special handling.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-21 15:00:42 +01:00
0fa69cae28 refactor: rename docker-compose.gpu.yaml to compose.yaml
Simplified compose file naming to follow Docker Compose best practices:
- Renamed docker-compose.gpu.yaml to compose.yaml
- Updated all references in documentation files (README.md, DEPLOYMENT.md, GPU_DEPLOYMENT_LOG.md, RUNPOD_TEMPLATE.md)
- Updated references in scripts (prepare-template.sh)

This change enables simpler command syntax:
- Before: docker compose -f docker-compose.gpu.yaml up -d orchestrator
- After: docker compose up -d orchestrator

Generated with Claude Code (https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-21 14:49:42 +01:00
277f1c95bd Initial commit: RunPod multi-modal AI orchestration stack
- Multi-modal AI infrastructure for RunPod RTX 4090
- Automatic model orchestration (text, image, music)
- Text: vLLM + Qwen 2.5 7B Instruct
- Image: Flux.1 Schnell via OpenEDAI
- Music: MusicGen Medium via AudioCraft
- Cost-optimized sequential loading on single GPU
- Template preparation scripts for rapid deployment
- Comprehensive documentation (README, DEPLOYMENT, TEMPLATE)
2025-11-21 14:34:55 +01:00