- Add Dockerfile with minimal setup (supervisor, tailscale)
- Add start.sh bootstrap script for container initialization
- Add Gitea workflow for automated Docker image builds
- Add comprehensive RUNPOD_TEMPLATE.md documentation
- Add bootstrap-venvs.sh for Python venv health checks
This enables deployment of the AI orchestrator on RunPod using:
- Minimal Docker image (~2-3GB) for fast deployment
- Network volume for models and data persistence (~80-200GB)
- Automated builds on push to main or version tags
- Full Tailscale VPN integration
- Supervisor process management
Clean up project structure by organizing WebDAV sync service properly.
Changes:
- Move scripts/comfyui_webdav_sync.py → webdav-sync/webdav_sync.py
- Create webdav-sync/requirements.txt with watchdog and webdavclient3
- Remove webdav dependencies from model-orchestrator/requirements.txt
- Delete unused scripts/ folder (start-all.sh, status.sh, stop-all.sh)
- Update supervisord.conf to use new path /workspace/ai/webdav-sync/webdav_sync.py
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Add Python watchdog service to automatically sync ComfyUI outputs to HiDrive WebDAV storage.
Changes:
- Add scripts/comfyui_webdav_sync.py: File watcher service using watchdog + webdavclient3
- Update model-orchestrator/requirements.txt: Add watchdog and webdavclient3 dependencies
- Update supervisord.conf: Add webdav-sync program with ENV variable support
- Update arty.yml: Add service management scripts (start/stop/restart/status/logs)
WebDAV credentials are now loaded from .env file (WEBDAV_URL, WEBDAV_USERNAME, WEBDAV_PASSWORD, WEBDAV_REMOTE_PATH)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Add supervisord.conf with ComfyUI and orchestrator services
- Update Ansible playbook with supervisor installation tag
- Rewrite start-all.sh and stop-all.sh to use Supervisor
- Add status.sh script for checking service status
- Update arty.yml with supervisor commands and shortcuts
- Update CLAUDE.md with Supervisor documentation and troubleshooting
- Services now auto-restart on crashes with centralized logging
Benefits:
- Better process control than manual pkill/background jobs
- Auto-restart on service crashes
- Centralized log management in /workspace/logs/
- Web interface for monitoring (port 9001)
- Works perfectly in RunPod containers (no systemd needed)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Remove flux/musicgen standalone implementations in favor of ComfyUI:
- Delete models/flux/ and models/musicgen/ directories
- Remove redundant scripts (install.sh, download-models.sh, prepare-template.sh)
- Update README.md to reference Ansible playbook commands
- Update playbook.yml to remove flux/musicgen service definitions
- Add COMFYUI_MODELS.md with comprehensive model installation guide
- Update stop-all.sh to only manage orchestrator and vLLM services
All model downloads and dependency management now handled via
Ansible playbook tags (base, python, vllm, comfyui, comfyui-essential).
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Set DOCKER_BUILDKIT=0 to use legacy builder
- BuildKit has permission issues in RunPod's containerized environment
- Legacy builder works reliably with RunPod's security constraints
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Changed from systemctl/service to direct dockerd command
- Added --iptables=false --bridge=none flags (required for RunPod)
- Added proper error checking and 10s wait time
- Improved logging with verification step
This fixes Docker startup in RunPod's containerized environment where
systemd is not available and iptables require special handling.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>