Commit Graph

180 Commits

cc0d93060d feat: expose Supervisor web UI for proxy access
Disable Supervisor's built-in authentication and expose the web UI on all interfaces (0.0.0.0:9001) to enable reverse proxy access via nginx and Authelia SSO. Authentication is now handled by Authelia at supervisor.ai.pivoine.art.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-22 13:19:02 +01:00
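A reverse-proxy block along these lines would sit in front of the Supervisor UI. This is a hedged sketch only: the server name comes from the commit, but the upstream address is a placeholder and the Authelia `auth_request` wiring is omitted.

```nginx
# Sketch only; the real config lives in the nginx/Authelia setup, not this repo.
server {
    listen 443 ssl;
    server_name supervisor.ai.pivoine.art;

    location / {
        # Authelia auth_request directives would go here.
        proxy_pass http://gpu-host:9001;   # Supervisor listening on 0.0.0.0:9001
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
```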
16226c7b39 chore: remove Ansible infrastructure
Remove Ansible playbook and configuration files as all infrastructure
setup has been migrated to arty bash scripts.

Deleted:
- playbook.yml (~950 lines)
- inventory.yml
- ansible.cfg

All functionality now available via arty scripts:
- arty run install/essential
- arty run setup/system-packages
- arty run setup/comfyui-base
- arty run setup/comfyui-nodes
- arty run setup/supervisor
- arty run setup/tailscale

Model downloads will be handled separately (TBD).

🤖 Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-22 12:23:11 +01:00
d5965ce0c4 refactor: replace Ansible with arty bash scripts
Completely migrate infrastructure setup from Ansible to pure bash scripts in arty.yml, enabling simpler deployment and eliminating Ansible dependency on RunPod.

## New Installation Scripts

**System Setup:**
- setup/system-packages: Install apt packages, verify GPU
- setup/python-env: Configure Python, pip, core packages
- setup/comfyui-base: Install ComfyUI, create model directories
- setup/comfyui-nodes: Install 5 essential custom nodes
- setup/supervisor: Install Supervisor process manager
- setup/tailscale: Install Tailscale VPN (optional)

**Utility Scripts:**
- setup/validate: Validate installation, check packages/GPU/cache
- setup/cleanup: Clean sensitive files for template creation

**Orchestration Scripts:**
- install/minimal: System + Python + ComfyUI + Supervisor
- install/essential: Minimal + Custom nodes (recommended)
- install/full: Essential + Tailscale VPN

## Changes

### Added
- 11 new bash scripts replacing Ansible playbook tasks
- Comprehensive deployment notes with all commands
- Script categorization: System Setup, Service Management, Health Checks
- Detailed progress output with visual indicators (✓/✗)

### Removed
- Ansible shortcut scripts (ansible/base, ansible/supervisor, etc.)
- deps/comfyui-nodes (integrated into setup/comfyui-nodes)
- References to Ansible in deployment workflow

### Modified
- Deployment notes: Complete rewrite with organized sections
- Fresh deployment: Now uses arty run install/essential
- Documentation: Updated paths, commands, and workflow

## Benefits

- No Ansible dependency
- Faster debugging (pure bash, no templates)
- RunPod compatible (no sudo issues)
- Modular: Run individual steps or orchestration
- Better error handling and user feedback
- Easier to maintain and extend

## Migration Path

Model downloads still use Ansible playbook (temporary):
- ansible-playbook playbook.yml --tags comfyui-essential
- ansible-playbook playbook.yml --tags comfyui-models-all

After successful testing, Ansible files can be safely removed:
- playbook.yml
- inventory.yml
- ansible.cfg

## Usage

```bash
# Fresh deployment
arty run install/essential
ansible-playbook playbook.yml --tags comfyui-essential
arty run models/link-comfyui
arty run workflows/link-comfyui
arty run services/start
```

🤖 Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-22 12:21:05 +01:00
b370c16eb9 feat: add workflow linking script to arty.yml
Add new arty script `workflows/link-comfyui` to create symlinks for all
production workflows from git repository to ComfyUI workflows directory.

Script creates symlinks for:
- All 6 workflow category directories (text-to-image, image-to-image,
  image-to-video, text-to-music, upscaling, advanced)
- templates/ directory for future workflow templates
- README.md and WORKFLOW_STANDARDS.md documentation

Usage: arty run workflows/link-comfyui

Updates:
- Added workflows/link-comfyui script with informative output
- Updated Fresh Deployment guide to include workflow linking step
- Added section 6 "ComfyUI Workflows" with complete workflow listing
- Updated Storage section to document workflow symlinks
- Added workflow README to documentation list

🤖 Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-22 12:12:06 +01:00
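The linking step could look roughly like this. The category directory names come from the commit, but the repo and ComfyUI paths are placeholders that default to throwaway temp dirs so the sketch runs anywhere; the real script's paths differ.

```shell
# Hypothetical sketch of workflows/link-comfyui; real paths differ.
REPO="${REPO:-$(mktemp -d)/workflows}"              # git repo's workflows/ dir
TARGET="${TARGET:-$(mktemp -d)/ComfyUI/workflows}"  # ComfyUI workflows dir
mkdir -p "$REPO" "$TARGET"
for d in text-to-image image-to-image image-to-video text-to-music upscaling advanced templates; do
  mkdir -p "$REPO/$d"                 # ensure the source exists for this demo
  ln -sfn "$REPO/$d" "$TARGET/$d"     # symlink each category directory
  echo "linked $d"
done
```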
71a30c0e4d feat: add comprehensive ComfyUI workflow collection
Add 20 production-ready ComfyUI workflows across 6 categories:

Text-to-Image (4 workflows):
- FLUX Schnell (fast, 4 steps)
- FLUX Dev (high-quality, 20-50 steps)
- SDXL + Refiner (two-stage, detailed)
- SD3.5 Large (latest generation)

Image-to-Image (3 workflows):
- IP-Adapter Style Transfer
- IP-Adapter Face Portrait
- IP-Adapter Multi-Composition

Image-to-Video (3 workflows):
- CogVideoX (6s AI-driven video)
- SVD (14 frames, quick animations)
- SVD-XT (25 frames, extended)

Text-to-Music (4 workflows):
- MusicGen Small/Medium/Large
- MusicGen Melody (melody conditioning)

Upscaling (3 workflows):
- Ultimate SD Upscale (professional)
- Simple Upscale (fast)
- Face Upscale (portrait-focused)

Advanced (3 workflows):
- ControlNet Fusion (multi-control)
- AnimateDiff Video (text-to-video)
- Batch Pipeline (multiple variations)

Documentation:
- README.md: Usage guide, model requirements, examples
- WORKFLOW_STANDARDS.md: Development standards, best practices

All workflows include:
- API compatibility for orchestrator integration
- Error handling and validation
- VRAM optimization for 24GB GPUs
- Preview and save nodes
- Comprehensive metadata and parameters
- Performance benchmarks

🤖 Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-22 12:08:18 +01:00
6323488591 fix: add missing ComfyUI-Impact-Pack dependencies
- Add scikit-image for image processing
- Add piexif for EXIF metadata handling
- Add segment-anything for SAM model support

These dependencies are required for ComfyUI-Impact-Pack to load correctly.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-22 09:33:43 +01:00
664da9f4ea feat: add Supervisor process manager for service management
- Add supervisord.conf with ComfyUI and orchestrator services
- Update Ansible playbook with supervisor installation tag
- Rewrite start-all.sh and stop-all.sh to use Supervisor
- Add status.sh script for checking service status
- Update arty.yml with supervisor commands and shortcuts
- Update CLAUDE.md with Supervisor documentation and troubleshooting
- Services now auto-restart on crashes with centralized logging

Benefits:
- Better process control than manual pkill/background jobs
- Auto-restart on service crashes
- Centralized log management in /workspace/logs/
- Web interface for monitoring (port 9001)
- Works perfectly in RunPod containers (no systemd needed)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-22 09:22:16 +01:00
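A supervisord.conf in this spirit might look like the fragment below; the program names and paths are assumed rather than copied from the repo, while port 9001 and /workspace/logs/ come from the commit itself.

```ini
; Illustrative fragment, not the repository's actual supervisord.conf.
[inet_http_server]
port=127.0.0.1:9001

[supervisord]
logfile=/workspace/logs/supervisord.log

[program:comfyui]
command=python main.py --listen 0.0.0.0 --port 8188
directory=/workspace/ComfyUI
autorestart=true
stdout_logfile=/workspace/logs/comfyui.log
redirect_stderr=true
```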
2207d60f98 feat: add Arty configuration and Claude Code documentation
- Add arty.yml for repository management with environment profiles (prod/dev/minimal)
- Add CLAUDE.md with comprehensive architecture and usage documentation
- Add comfyui_models.yaml for ComfyUI model configuration
- Include deployment scripts for model linking and dependency installation
- Document all git repositories (ComfyUI + 10 custom nodes)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-22 02:50:36 +01:00
c9b01eef68 refactor: consolidate model management into Ansible playbook
Remove flux/musicgen standalone implementations in favor of ComfyUI:
- Delete models/flux/ and models/musicgen/ directories
- Remove redundant scripts (install.sh, download-models.sh, prepare-template.sh)
- Update README.md to reference Ansible playbook commands
- Update playbook.yml to remove flux/musicgen service definitions
- Add COMFYUI_MODELS.md with comprehensive model installation guide
- Update stop-all.sh to only manage orchestrator and vLLM services

All model downloads and dependency management now handled via
Ansible playbook tags (base, python, vllm, comfyui, comfyui-essential).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-22 00:31:26 +01:00
7edf17551a Add Llama 3.1 8B model to orchestrator 2025-11-21 21:29:57 +01:00
c4426ccc58 Remove Flux and MusicGen models from orchestrator
ComfyUI now handles Flux image generation directly.
MusicGen is not being used and has been removed.
2025-11-21 21:14:43 +01:00
205e591fb4 feat: add ComfyUI integration with Ansible playbook
- Add ComfyUI installation to Ansible playbook with 'comfyui' tag
- Create ComfyUI requirements.txt and start.sh script
- Clone ComfyUI from official GitHub repository
- Symlink HuggingFace cache for Flux model access
- ComfyUI runs on port 8188 with CORS enabled
- Add ComfyUI to services list in playbook

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-21 20:41:43 +01:00
538609da3e docs: add comprehensive README for AI model orchestrator
- Architecture overview (process-based orchestrator)
- Installation and setup instructions for RunPod
- Available models (text, image, music generation)
- API endpoints and usage examples
- Integration guide for LiteLLM proxy
- Troubleshooting section with streaming fixes
- Deployment options (RunPod, VPS, Tailscale VPN)
- Adding new models workflow
- Project structure documentation

Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-21 19:47:43 +01:00
34fc456a3b feat: add /v1/models endpoint and fix streaming proxy in subprocess orchestrator 2025-11-21 19:34:20 +01:00
91089d3edc feat: add /v1/models endpoint and systemd service for orchestrator
- Add OpenAI-compatible /v1/models endpoint to list available models
- Create systemd service file for proper service management
- Service runs as root with automatic restart on failure
- Logs to systemd journal for easy debugging
2025-11-21 19:28:16 +01:00
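A unit file matching that description would be on the order of the sketch below; the `ExecStart` path is a hypothetical placeholder, not the repository's actual layout, and stdout/stderr go to the journal by default.

```ini
# Hypothetical unit file; ExecStart path is illustrative.
[Unit]
Description=AI model orchestrator
After=network.target

[Service]
User=root
ExecStart=/usr/bin/python3 /workspace/model-orchestrator/orchestrator.py
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target
```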
9947fe37bb fix: properly proxy streaming requests without buffering
The orchestrator was calling response.json() which buffered the entire
streaming response before returning it. This caused LiteLLM to receive
only one chunk with empty content instead of token-by-token streaming.

Changes:
- Detect streaming requests by parsing request body for 'stream': true
- Use client.stream() with aiter_bytes() for streaming requests
- Return StreamingResponse with proper SSE headers
- Keep original JSONResponse behavior for non-streaming requests

This fixes streaming from vLLM → orchestrator → LiteLLM chain.
2025-11-21 19:21:56 +01:00
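The detection half of this fix can be sketched in a few lines; `wants_streaming` is an illustrative name, and the orchestrator pairs it with `httpx`'s `client.stream()` / `aiter_bytes()` and a `StreamingResponse` as described above.

```python
import json

def wants_streaming(raw_body: bytes) -> bool:
    """Return True when the request body asks for SSE streaming.

    Mirrors the commit's approach: inspect the body for "stream": true
    instead of buffering the upstream response with response.json().
    """
    try:
        payload = json.loads(raw_body or b"{}")
    except json.JSONDecodeError:
        return False
    return bool(payload.get("stream", False))

# Streaming requests are then relayed chunk-by-chunk, roughly (pseudocode):
#   async with client.stream("POST", upstream_url, content=raw_body) as r:
#       async for chunk in r.aiter_bytes():
#           yield chunk   # wrapped in a StreamingResponse with SSE headers
```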
7f1890517d fix: enable eager execution for proper token streaming in vLLM
- Set enforce_eager=True to disable CUDA graphs which were batching outputs
- Add disable_log_stats=True for better streaming performance
- This ensures AsyncLLMEngine yields tokens incrementally instead of returning complete response
2025-11-21 18:25:50 +01:00
94080da341 fix: remove incorrect start-vllm.sh that would break orchestrator architecture 2025-11-21 18:10:53 +01:00
6944e4ebd5 feat: add vllm serve script with proper streaming support 2025-11-21 18:08:21 +01:00
d21caa56bc fix: implement incremental streaming deltas for vLLM chat completions
- Track previous_text to calculate deltas instead of sending full accumulated text
- Fixes WebUI streaming issue where responses appeared empty
- Only send new tokens in each SSE chunk delta
- Resolves OpenAI API compatibility for streaming chat completions
2025-11-21 17:23:18 +01:00
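The delta logic amounts to slicing off the previously sent prefix on each step; a minimal sketch (function name illustrative):

```python
def to_deltas(accumulated):
    """Turn accumulated generation outputs into per-chunk SSE deltas.

    vLLM yields the full text generated so far on every step; OpenAI-style
    streaming clients expect only the new tokens in each chunk's delta.
    """
    previous_text = ""
    deltas = []
    for text in accumulated:
        deltas.append(text[len(previous_text):])  # only the new suffix
        previous_text = text
    return deltas
```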
57b706abe6 fix: correct vLLM service port to 8000
- Updated qwen-2.5-7b port from 8001 to 8000 in models.yaml
- Matches actual vLLM server default port configuration
- Tested and verified: orchestrator successfully loaded model and generated response
2025-11-21 16:28:54 +01:00
9a637cc4fc refactor: clean Docker files and restore standalone model services
- Remove all Docker-related files (Dockerfiles, compose.yaml)
- Remove documentation files (README, ARCHITECTURE, docs/)
- Remove old core/ directory (base_service, service_manager)
- Update models.yaml with correct service_script paths (models/*/server.py)
- Simplify vLLM requirements.txt to let vLLM manage dependencies
- Restore original standalone vLLM server (no base_service dependency)
- Remove obsolete vllm/, musicgen/, flux/ directories

Process-based architecture is now fully functional on RunPod.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-21 16:17:38 +01:00
9ee626a78e feat: implement Ansible-based process architecture for RunPod
Major architecture overhaul to address RunPod Docker limitations:

Core Infrastructure:
- Add base_service.py: Abstract base class for all AI services
- Add service_manager.py: Process lifecycle management
- Add core/requirements.txt: Core dependencies

Model Services (Standalone Python):
- Add models/vllm/server.py: Qwen 2.5 7B text generation
- Add models/flux/server.py: Flux.1 Schnell image generation
- Add models/musicgen/server.py: MusicGen Medium music generation
- Each service inherits from GPUService base class
- OpenAI-compatible APIs
- Standalone execution support

Ansible Deployment:
- Add playbook.yml: Comprehensive deployment automation
- Add ansible.cfg: Ansible configuration
- Add inventory.yml: Localhost inventory
- Tags: base, python, dependencies, models, tailscale, validate, cleanup

Scripts:
- Add scripts/install.sh: Full installation wrapper
- Add scripts/download-models.sh: Model download wrapper
- Add scripts/start-all.sh: Start orchestrator
- Add scripts/stop-all.sh: Stop all services

Documentation:
- Update ARCHITECTURE.md: Document distributed VPS+GPU architecture

Benefits:
- No Docker: Avoids RunPod CAP_SYS_ADMIN limitations
- Fully reproducible via Ansible
- Extensible: Add models in 3 steps
- Direct Python execution (no container overhead)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-21 15:37:18 +01:00
03a430894d docs: add clean extensible architecture design
Created comprehensive architecture document for RunPod deployment:

**Key Design Principles:**
- No Docker (direct Python for RunPod compatibility)
- Extensible (add models in 3 simple steps)
- Maintainable (clear structure, base classes)
- Simple (one command startup)

**Structure:**
- core/ - Base service class + service manager
- model-orchestrator/ - Request routing
- models/ - Service implementations (vllm, flux, musicgen)
- scripts/ - Install, start, stop, template prep
- docs/ - Adding models, deployment, templates

**Adding New Models:**
1. Create server.py inheriting BaseService
2. Add entry to models.yaml
3. Add requirements.txt

That's it! Orchestrator handles lifecycle automatically.

Next: Implement base_service.py and refactor existing services.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-21 15:16:51 +01:00
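Step 2 might look like the entry below; the field names are guessed from nearby commits (`service_script` appears later in the log) rather than taken from the actual schema.

```yaml
# Hypothetical models.yaml entry for a new model service
my-model:
  service_script: models/my-model/server.py  # step 1: server.py inheriting BaseService
  port: 8002                                 # step 3's requirements.txt sits alongside
```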
31be1932e7 wip: start architecture redesign for RunPod (no Docker)
Started redesigning architecture to run services directly without Docker:

**Completed:**
- Created new process-based orchestrator (orchestrator_subprocess.py)
- Uses subprocess instead of Docker SDK for process management
- Updated models.yaml to reference service_script paths
- vLLM server already standalone-ready

**Still needed:**
- Create/update Flux and MusicGen standalone servers
- Create systemd service files or startup scripts
- Update prepare-template script for Python deployment
- Remove Docker/Compose dependencies
- Test full stack on RunPod
- Update documentation

Reason for change: RunPod's containerized environment doesn't support
Docker-in-Docker (requires CAP_SYS_ADMIN). Direct Python execution is
simpler, faster, and more reliable for RunPod.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-21 15:09:30 +01:00
cd9e2eee2e fix: use legacy Docker builder for RunPod compatibility
- Set DOCKER_BUILDKIT=0 to use legacy builder
- BuildKit has permission issues in RunPod's containerized environment
- Legacy builder works reliably with RunPod's security constraints

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-21 15:01:16 +01:00
8f1d4bedd2 fix: update Docker daemon startup for RunPod environment
- Changed from systemctl/service to direct dockerd command
- Added --iptables=false --bridge=none flags (required for RunPod)
- Added proper error checking and 10s wait time
- Improved logging with verification step

This fixes Docker startup in RunPod's containerized environment, where
systemd is not available and iptables requires special handling.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-21 15:00:42 +01:00
0fa69cae28 refactor: rename docker-compose.gpu.yaml to compose.yaml
Simplified compose file naming to follow Docker Compose best practices:
- Renamed docker-compose.gpu.yaml to compose.yaml
- Updated all references in documentation files (README.md, DEPLOYMENT.md, GPU_DEPLOYMENT_LOG.md, RUNPOD_TEMPLATE.md)
- Updated references in scripts (prepare-template.sh)

This change enables simpler command syntax:
- Before: docker compose -f docker-compose.gpu.yaml up -d orchestrator
- After: docker compose up -d orchestrator

Generated with Claude Code (https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-21 14:49:42 +01:00
cafa0a1147 refactor: clean up runpod repository structure
Removed facefusion and VPS-related files:
- compose.yaml, postgres/, litellm-config.yaml (VPS services)
- Dockerfile, entrypoint.sh, disable-nsfw-filter.patch (facefusion)

Removed outdated documentation:
- DOCKER_GPU_SETUP.md, README_GPU_SETUP.md, SETUP_GUIDE.md
- TAILSCALE_SETUP.md, WIREGUARD_SETUP.md (covered in DEPLOYMENT.md)
- GPU_EXPANSION_PLAN.md (historical planning doc)
- gpu-server-compose.yaml, litellm-config-gpu.yaml (old versions)
- deploy-gpu-stack.sh, simple_vllm_server.py (old scripts)

Organized documentation:
- Created docs/ directory
- Moved DEPLOYMENT.md, RUNPOD_TEMPLATE.md, GPU_DEPLOYMENT_LOG.md to docs/
- Updated all documentation links in README.md

Final structure:
- Clean root directory with only GPU-specific files
- Organized documentation in docs/
- Model services in dedicated directories (model-orchestrator/, vllm/, flux/, musicgen/)
- Automation scripts in scripts/
2025-11-21 14:45:49 +01:00
277f1c95bd Initial commit: RunPod multi-modal AI orchestration stack
- Multi-modal AI infrastructure for RunPod RTX 4090
- Automatic model orchestration (text, image, music)
- Text: vLLM + Qwen 2.5 7B Instruct
- Image: Flux.1 Schnell via OpenEDAI
- Music: MusicGen Medium via AudioCraft
- Cost-optimized sequential loading on single GPU
- Template preparation scripts for rapid deployment
- Comprehensive documentation (README, DEPLOYMENT, TEMPLATE)
2025-11-21 14:34:55 +01:00