runpod

Author	SHA1	Message	Date
Sebastian Krüger	571431955d	feat: add RunPod Docker template with automated build workflow - Add Dockerfile with minimal setup (supervisor, tailscale) - Add start.sh bootstrap script for container initialization - Add Gitea workflow for automated Docker image builds - Add comprehensive RUNPOD_TEMPLATE.md documentation - Add bootstrap-venvs.sh for Python venv health checks This enables deployment of the AI orchestrator on RunPod using: - Minimal Docker image (~2-3GB) for fast deployment - Network volume for models and data persistence (~80-200GB) - Automated builds on push to main or version tags - Full Tailscale VPN integration - Supervisor process management	2025-11-23 21:53:56 +01:00
Sebastian Krüger	18cd87fbd1	refactor: reorganize webdav-sync into dedicated directory Clean up project structure by organizing WebDAV sync service properly. Changes: - Move scripts/comfyui_webdav_sync.py → webdav-sync/webdav_sync.py - Create webdav-sync/requirements.txt with watchdog and webdavclient3 - Remove webdav dependencies from model-orchestrator/requirements.txt - Delete unused scripts/ folder (start-all.sh, status.sh, stop-all.sh) - Update supervisord.conf to use new path /workspace/ai/webdav-sync/webdav_sync.py 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-22 19:03:48 +01:00
Sebastian Krüger	79442bd62e	feat: add WebDAV sync service for ComfyUI outputs Add Python watchdog service to automatically sync ComfyUI outputs to HiDrive WebDAV storage. Changes: - Add scripts/comfyui_webdav_sync.py: File watcher service using watchdog + webdavclient3 - Update model-orchestrator/requirements.txt: Add watchdog and webdavclient3 dependencies - Update supervisord.conf: Add webdav-sync program with ENV variable support - Update arty.yml: Add service management scripts (start/stop/restart/status/logs) WebDAV credentials are now loaded from .env file (WEBDAV_URL, WEBDAV_USERNAME, WEBDAV_PASSWORD, WEBDAV_REMOTE_PATH) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-22 18:58:18 +01:00
Sebastian Krüger	664da9f4ea	feat: add Supervisor process manager for service management - Add supervisord.conf with ComfyUI and orchestrator services - Update Ansible playbook with supervisor installation tag - Rewrite start-all.sh and stop-all.sh to use Supervisor - Add status.sh script for checking service status - Update arty.yml with supervisor commands and shortcuts - Update CLAUDE.md with Supervisor documentation and troubleshooting - Services now auto-restart on crashes with centralized logging Benefits: - Better process control than manual pkill/background jobs - Auto-restart on service crashes - Centralized log management in /workspace/logs/ - Web interface for monitoring (port 9001) - Works perfectly in RunPod containers (no systemd needed) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-22 09:22:16 +01:00
Sebastian Krüger	c9b01eef68	refactor: consolidate model management into Ansible playbook Remove flux/musicgen standalone implementations in favor of ComfyUI: - Delete models/flux/ and models/musicgen/ directories - Remove redundant scripts (install.sh, download-models.sh, prepare-template.sh) - Update README.md to reference Ansible playbook commands - Update playbook.yml to remove flux/musicgen service definitions - Add COMFYUI_MODELS.md with comprehensive model installation guide - Update stop-all.sh to only manage orchestrator and vLLM services All model downloads and dependency management now handled via Ansible playbook tags (base, python, vllm, comfyui, comfyui-essential). 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-22 00:31:26 +01:00
Sebastian Krüger	9ee626a78e	feat: implement Ansible-based process architecture for RunPod Major architecture overhaul to address RunPod Docker limitations: Core Infrastructure: - Add base_service.py: Abstract base class for all AI services - Add service_manager.py: Process lifecycle management - Add core/requirements.txt: Core dependencies Model Services (Standalone Python): - Add models/vllm/server.py: Qwen 2.5 7B text generation - Add models/flux/server.py: Flux.1 Schnell image generation - Add models/musicgen/server.py: MusicGen Medium music generation - Each service inherits from GPUService base class - OpenAI-compatible APIs - Standalone execution support Ansible Deployment: - Add playbook.yml: Comprehensive deployment automation - Add ansible.cfg: Ansible configuration - Add inventory.yml: Localhost inventory - Tags: base, python, dependencies, models, tailscale, validate, cleanup Scripts: - Add scripts/install.sh: Full installation wrapper - Add scripts/download-models.sh: Model download wrapper - Add scripts/start-all.sh: Start orchestrator - Add scripts/stop-all.sh: Stop all services Documentation: - Update ARCHITECTURE.md: Document distributed VPS+GPU architecture Benefits: - No Docker: Avoids RunPod CAP_SYS_ADMIN limitations - Fully reproducible via Ansible - Extensible: Add models in 3 steps - Direct Python execution (no container overhead) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-21 15:37:18 +01:00
Sebastian Krüger	cd9e2eee2e	fix: use legacy Docker builder for RunPod compatibility - Set DOCKER_BUILDKIT=0 to use legacy builder - BuildKit has permission issues in RunPod's containerized environment - Legacy builder works reliably with RunPod's security constraints 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-21 15:01:16 +01:00
Sebastian Krüger	8f1d4bedd2	fix: update Docker daemon startup for RunPod environment - Changed from systemctl/service to direct dockerd command - Added --iptables=false --bridge=none flags (required for RunPod) - Added proper error checking and 10s wait time - Improved logging with verification step This fixes Docker startup in RunPod's containerized environment where systemd is not available and iptables requires special handling. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-21 15:00:42 +01:00
Sebastian Krüger	0fa69cae28	refactor: rename docker-compose.gpu.yaml to compose.yaml Simplified compose file naming to follow Docker Compose best practices: - Renamed docker-compose.gpu.yaml to compose.yaml - Updated all references in documentation files (README.md, DEPLOYMENT.md, GPU_DEPLOYMENT_LOG.md, RUNPOD_TEMPLATE.md) - Updated references in scripts (prepare-template.sh) This change enables simpler command syntax: - Before: docker compose -f docker-compose.gpu.yaml up -d orchestrator - After: docker compose up -d orchestrator Generated with Claude Code (https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-21 14:49:42 +01:00
Sebastian Krüger	277f1c95bd	Initial commit: RunPod multi-modal AI orchestration stack - Multi-modal AI infrastructure for RunPod RTX 4090 - Automatic model orchestration (text, image, music) - Text: vLLM + Qwen 2.5 7B Instruct - Image: Flux.1 Schnell via OpenEDAI - Music: MusicGen Medium via AudioCraft - Cost-optimized sequential loading on single GPU - Template preparation scripts for rapid deployment - Comprehensive documentation (README, DEPLOYMENT, TEMPLATE)	2025-11-21 14:34:55 +01:00

10 Commits