Initial commit

All checks were successful
Build and Push RunPod Docker Image / build-and-push (push) Successful in 14s

CLAUDE.md (new file, +105)
@@ -0,0 +1,105 @@
# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Overview

RunPod AI Orchestrator - a Docker-based deployment template for GPU instances on RunPod. It orchestrates AI services (ComfyUI, vLLM, AudioCraft) using Supervisor for process management, with optional Tailscale VPN integration.

## Architecture

### Service Orchestration

- **Supervisor** manages all services via `supervisord.conf`
- Services run in isolated Python venvs under `services/<name>/venv/`
- Logs are written to the `.logs/` directory
- Each service has its own `requirements.txt` (see the launch sketch below)
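To make the wiring concrete, here is a minimal sketch of the command shape a `[program:comfyui]` entry in `supervisord.conf` would run: the service's own venv interpreter, with output captured under `.logs/`. The entrypoint path and flags are illustrative assumptions, not taken from this repo's actual config.

```bash
# Hypothetical launch line for one supervised service; supervisord itself
# normally handles the log redirection via stdout_logfile.
services/comfyui/venv/bin/python services/comfyui/main.py \
  --listen 0.0.0.0 --port 8188 \
  >> .logs/comfyui.log 2>&1
```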
### Services (managed by Supervisor)

| Service | Port | Description | Auto-start |
|---------|------|-------------|------------|
| ComfyUI | 8188 | Node-based image/video/audio generation | Yes |
| WebDAV Sync | - | Uploads ComfyUI outputs to HiDrive | Yes |
| AudioCraft | - | Music generation | Yes |
| vLLM Llama | 8001 | Llama 3.1 8B language model | No |
| vLLM BGE | 8002 | BGE embedding model | No |
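Since only some services auto-start, probing the listed ports is a quick way to see what is actually up. A hedged sketch: that ComfyUI answers on `/` and that the vLLM servers expose the OpenAI-compatible `/v1/models` route matches upstream defaults, but is not confirmed by this commit.

```bash
# Probe the HTTP services from inside the pod; the vLLM endpoints only
# respond once those services have been started manually (autostart=false).
curl -fsS http://localhost:8188/ > /dev/null && echo "ComfyUI up"
curl -fsS http://localhost:8001/v1/models && echo "vLLM Llama up"
curl -fsS http://localhost:8002/v1/models && echo "vLLM BGE up"
```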
### Model Management

- Models are defined in YAML configs: `models/models_civitai.yaml`, `models/models_huggingface.yaml`
- Downloaded to `.cache/` and symlinked into `services/comfyui/models/` (see the symlink sketch below)
- Uses external scripts: `artifact_civitai_download.sh`, `artifact_huggingface_download.sh`
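The cache-then-symlink flow keeps large downloads on the persistent volume while ComfyUI sees them in its expected layout. A minimal sketch of what `arty models/link` presumably does per model; the model file name and the `checkpoints/` target directory are illustrative assumptions:

```bash
# Hypothetical link step for one cached model; real names come from the
# models_*.yaml definitions.
mkdir -p services/comfyui/models/checkpoints
ln -sf "$PWD/.cache/sd_xl_base_1.0.safetensors" \
  "services/comfyui/models/checkpoints/sd_xl_base_1.0.safetensors"
```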
### Repository Management (Arty)

- `arty.yml` defines git repos to clone (ComfyUI + custom nodes)
- Repos are cloned to `services/comfyui/` and `services/comfyui/custom_nodes/`
- Run `arty sync` to clone/update all dependencies

## Common Commands

### Full Setup (on RunPod)

```bash
arty setup   # Complete setup: deps, tailscale, services, comfyui, models, supervisor
```
### Supervisor Control

```bash
arty supervisor/start      # Start supervisord
arty supervisor/stop       # Stop all services
arty supervisor/status     # Check service status
arty supervisor/restart    # Restart all services

supervisorctl -c supervisord.conf status            # Direct status check
supervisorctl -c supervisord.conf start comfyui     # Start specific service
supervisorctl -c supervisord.conf tail -f comfyui   # Follow logs
```
### Model Management

```bash
arty models/download   # Download models from Civitai/HuggingFace
arty models/link       # Symlink cached models to ComfyUI
```

### Setup Components

```bash
arty deps              # Clone git references
arty setup/tailscale   # Configure Tailscale VPN
arty setup/services    # Create venvs for all services
arty setup/comfyui     # Install ComfyUI and custom node dependencies
```

### Docker Build (CI runs on Gitea)

```bash
docker build -t runpod-ai-orchestrator .
```
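For a local smoke test of the image outside RunPod, something like the following should work on a machine with the NVIDIA container toolkit; the port and env-file choices mirror the docs above and are assumptions, not part of the CI workflow.

```bash
# Hypothetical local run; on RunPod the pod template injects GPU, ports,
# and environment instead.
docker run --rm --gpus all \
  -p 8188:8188 -p 9001:9001 \
  --env-file .env \
  runpod-ai-orchestrator
```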
## Environment Variables

Set in `.env` (or the RunPod template); an example sketch follows the list:

- `HF_TOKEN` - HuggingFace API token
- `TAILSCALE_AUTHKEY` - Tailscale auth key (optional)
- `CIVITAI_API_KEY` - Civitai API key for model downloads
- `WEBDAV_URL`, `WEBDAV_USERNAME`, `WEBDAV_PASSWORD`, `WEBDAV_REMOTE_PATH` - WebDAV sync config
- `PUBLIC_KEY` - SSH public key for RunPod access
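A sketch of what a filled-in `.env` might look like; every value below is a placeholder, and the WebDAV URL format in particular is an assumption:

```bash
# Example .env (placeholders only)
HF_TOKEN=hf_xxxxxxxxxxxxxxxx
TAILSCALE_AUTHKEY=tskey-auth-xxxxxxxx   # optional; omit to skip VPN setup
CIVITAI_API_KEY=xxxxxxxxxxxxxxxx
WEBDAV_URL=https://webdav.example.com/remote.php/dav
WEBDAV_USERNAME=me
WEBDAV_PASSWORD=changeme
WEBDAV_REMOTE_PATH=/comfyui-outputs
PUBLIC_KEY="ssh-ed25519 AAAA... user@host"
```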
## File Structure

```
├── Dockerfile                    # Minimal base image (PyTorch + CUDA)
├── start.sh                      # Container entrypoint
├── supervisord.conf              # Process manager config
├── arty.yml                      # Git repos + setup scripts
├── models/
│   ├── models_civitai.yaml       # Civitai model definitions
│   └── models_huggingface.yaml   # HuggingFace model definitions
├── services/
│   ├── comfyui/                  # ComfyUI + custom_nodes (cloned by arty)
│   ├── audiocraft/               # AudioCraft Studio (cloned by arty)
│   ├── vllm/                     # vLLM configs
│   └── webdav-sync/              # Output sync service
└── .gitea/workflows/             # CI/CD for Docker builds
```
## Important Notes

- **Network Volume**: On RunPod, `/workspace` is the persistent network volume. The orchestrator repo is cloned to `/workspace/orchestrator`.
- **Service Ports**: ComfyUI (8188), Supervisor Web UI (9001), vLLM Llama (8001), vLLM BGE (8002)
- **vLLM services** are disabled by default (`autostart=false`) to conserve GPU memory; start them on demand as sketched below
- **Custom nodes** have their dependencies installed into the ComfyUI venv during setup
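To bring a vLLM service up on demand without touching the others, something along these lines should work; the program names `vllm-llama` and `vllm-bge` are assumptions and should be checked against `supervisord.conf`:

```bash
# Start (and later stop) a vLLM service on demand; program names assumed.
supervisorctl -c supervisord.conf start vllm-llama
supervisorctl -c supervisord.conf stop vllm-llama
```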
arty.yml (52 lines changed)
@@ -111,34 +111,33 @@ scripts:
     echo "========================================="
     echo ""

-    if [ -n "${TAILSCALE_AUTHKEY:-}" ]; then
-      echo " Starting Tailscale daemon..."
-      tailscaled --tun=userspace-networking --socks5-server=localhost:1055 &
-      sleep 3
-
-      echo " Connecting to Tailscale network..."
-      HOSTNAME="runpod-ai-orchestrator"
-      tailscale up --authkey="$TAILSCALE_AUTHKEY" --advertise-tags=tag:gpu --hostname="$HOSTNAME" || {
-        echo " ⚠ Tailscale connection failed, continuing without VPN"
-      }
-
-      # Get Tailscale IP if connected
-      TAILSCALE_IP=$(tailscale ip -4 2>/dev/null || echo "not connected")
-      if [ "$TAILSCALE_IP" != "not connected" ]; then
-        echo " ✓ Tailscale connected"
-        echo " Hostname: $HOSTNAME"
-        echo " IP: $TAILSCALE_IP"
-
-        # Export for other services
-        export GPU_TAILSCALE_IP="$TAILSCALE_IP"
-      else
-        echo " ⚠ Tailscale failed to obtain IP"
-      fi
-    else
-      echo " ⚠ Tailscale disabled (no TAILSCALE_AUTHKEY in env)"
-      echo " Services requiring VPN connectivity will not work"
-    fi
+    if [ ! "$TAILSCALE_AUTHKEY" ]; then
+      echo " ⚠ Tailscale disabled (no TAILSCALE_AUTHKEY in env)"
+      echo " Services requiring VPN connectivity will not work"
+      exit 1
+    fi
+
+    echo " Starting Tailscale daemon..."
+    tailscaled --tun=userspace-networking --socks5-server=localhost:1055 &
+    sleep 3
+
+    echo " Connecting to Tailscale network..."
+    HOSTNAME="runpod-ai-orchestrator"
+    tailscale up --authkey="$TAILSCALE_AUTHKEY" --advertise-tags=tag:gpu --hostname="$HOSTNAME" || {
+      echo " ⚠ Tailscale connection failed, continuing without VPN"
+    }
+
+    # Get Tailscale IP if connected
+    TAILSCALE_IP=$(tailscale ip -4 2>/dev/null || echo "not connected")
+    if [ "$TAILSCALE_IP" == "not connected" ]; then
+      echo " ⚠ Tailscale failed to obtain IP"
+      exit 1
+    fi
+
+    echo " ✓ Tailscale connected"
+    echo " Hostname: $HOSTNAME"
+    echo " IP: $TAILSCALE_IP"

   setup/services: |
     echo "========================================="
     echo " Setting up services python venvs"
@@ -241,6 +240,7 @@ scripts:
   # Supervisor Control Scripts
   #
   supervisor/start: |
+    mkdir -p .logs/
     supervisord -c supervisord.conf

   supervisor/stop: |
@@ -1,4 +0,0 @@
-model: meta-llama/Llama-3.1-8B-Instruct
-host: "0.0.0.0"
-port: 8001
-uvicorn-log-level: "info"
runpod.yml (new file, +42)
@@ -0,0 +1,42 @@
# RunPod Pod Configuration
# Used by service_runpod_control.sh
#
# Usage:
#   service_runpod_control.sh create   # Create pod from this config
#   service_runpod_control.sh get      # Show pod status
#   service_runpod_control.sh start    # Start the pod
#   service_runpod_control.sh stop     # Stop the pod
#   service_runpod_control.sh remove   # Delete the pod

pod:
  # Required fields
  name: "runpod-ai-orchestrator"
  gpuType: "NVIDIA GeForce RTX 4090"
  gpuCount: 1

  # Template and volume IDs (from RunPod dashboard)
  templateId: "runpod-ai-orchestrator"
  networkVolumeId: "runpod-ai-orchestrator"
  imageName: "dev.pivoine.art/valknar/runpod-ai-orchestrator:latest"

  # Exposed ports
  ports:
    - "22/tcp"

  # Optional: Resource limits
  # containerDiskSize: 20   # GB (default: 20)
  # volumeSize: 1           # GB (default: 1)
  # volumePath: "/runpod"   # Mount path
  # mem: 20                 # Minimum memory GB
  # vcpu: 1                 # Minimum vCPUs

  # Optional: Cloud selection
  # secureCloud: false      # Use secure cloud only
  # communityCloud: false   # Use community cloud only

  # Optional: Custom image (overrides template)
  # imageName: "runpod/pytorch:2.1.0-py3.10-cuda11.8.0-devel-ubuntu22.04"

  # Optional: Environment variables
  # env:
  #   - "HF_TOKEN=your_token_here"
  #   - "CUSTOM_VAR=value"
start.sh (4 lines changed)
@@ -25,7 +25,7 @@ if [ ! -d "$PWD/bin" ] ; then
   git clone https://dev.pivoine.art/valknar/bin.git "$PWD/bin"
   echo " ✓ bin cloned"
 else
-  cd "$PWD/bin" && git stash && git pull && git stash pop || true
+  cd "$PWD/bin" && git fetch && git reset --hard origin/main
   echo " ✓ bin updated"
   cd -
 fi
@@ -33,7 +33,7 @@ if [ ! -d "$PWD/orchestrator" ] ; then
   git clone https://dev.pivoine.art/valknar/runpod-ai-orchestrator.git "$PWD/orchestrator"
   echo " ✓ orchestrator cloned"
 else
-  cd "$PWD/orchestrator" && git stash && git pull && git stash pop || true
+  cd "$PWD/orchestrator" && git fetch && git reset --hard origin/main
   echo " ✓ orchestrator updated"
   cd -
 fi