Initial commit
All checks were successful
Build and Push RunPod Docker Image / build-and-push (push) Successful in 14s

2025-11-26 18:04:53 +01:00
parent 5c61ac5c67
commit 5f8c843b22
5 changed files with 175 additions and 32 deletions

CLAUDE.md (new file, 105 lines)

@@ -0,0 +1,105 @@
# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## Overview
RunPod AI Orchestrator - A Docker-based deployment template for GPU instances on RunPod. Orchestrates AI services (ComfyUI, vLLM, AudioCraft) using Supervisor for process management, with optional Tailscale VPN integration.
## Architecture
### Service Orchestration
- **Supervisor** manages all services via `supervisord.conf`
- Services run in isolated Python venvs under `services/<name>/venv/`
- Logs written to `.logs/` directory
- Each service has its own requirements.txt
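As a loose illustration of these conventions (not the actual `supervisord.conf`, whose exact stanzas may differ), a program entry for a service would look roughly like this:

```ini
; Hypothetical sketch only -- see supervisord.conf for the real config.
; Shows the conventions above: per-service venv, logs under .logs/
[program:comfyui]
command=%(here)s/services/comfyui/venv/bin/python main.py --listen 0.0.0.0 --port 8188
directory=%(here)s/services/comfyui
autostart=true
autorestart=true
stdout_logfile=%(here)s/.logs/comfyui.out.log
stderr_logfile=%(here)s/.logs/comfyui.err.log
```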
### Services (managed by Supervisor)
| Service | Port | Description | Auto-start |
|---------|------|-------------|------------|
| ComfyUI | 8188 | Node-based image/video/audio generation | Yes |
| WebDAV Sync | - | Uploads ComfyUI outputs to HiDrive | Yes |
| AudioCraft | - | Music generation | Yes |
| vLLM Llama | 8001 | Llama 3.1 8B language model | No |
| vLLM BGE | 8002 | BGE embedding model | No |
### Model Management
- Models defined in YAML configs: `models/models_civitai.yaml`, `models/models_huggingface.yaml`
- Downloaded to `.cache/` and symlinked to `services/comfyui/models/`
- Uses external scripts: `artifact_civitai_download.sh`, `artifact_huggingface_download.sh`
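The exact schema of these YAML files is not shown in this commit; purely as an assumed sketch, a `models_huggingface.yaml` entry might pair a repo id with a cache file and a ComfyUI symlink target:

```yaml
# Assumed schema -- the real format consumed by
# artifact_huggingface_download.sh may differ.
models:
  - repo: stabilityai/stable-diffusion-xl-base-1.0   # HuggingFace repo id
    file: sd_xl_base_1.0.safetensors                 # downloaded into .cache/
    link: services/comfyui/models/checkpoints/       # symlink target
```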
### Repository Management (Arty)
- `arty.yml` defines git repos to clone (ComfyUI + custom nodes)
- Repos cloned to `services/comfyui/` and `services/comfyui/custom_nodes/`
- Run `arty sync` to clone/update all dependencies
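`arty` is an in-house tool, so its manifest schema is an assumption here; the repo list in `arty.yml` presumably maps destination paths to clone URLs along these lines:

```yaml
# Hypothetical shape only -- consult arty.yml for the real schema.
refs:
  services/comfyui: https://github.com/comfyanonymous/ComfyUI.git
  services/comfyui/custom_nodes/ComfyUI-Manager: https://github.com/ltdrdata/ComfyUI-Manager.git
```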
## Common Commands
### Full Setup (on RunPod)
```bash
arty setup # Complete setup: deps, tailscale, services, comfyui, models, supervisor
```
### Supervisor Control
```bash
arty supervisor/start # Start supervisord
arty supervisor/stop # Stop all services
arty supervisor/status # Check service status
arty supervisor/restart # Restart all services
supervisorctl -c supervisord.conf status # Direct status check
supervisorctl -c supervisord.conf start comfyui # Start specific service
supervisorctl -c supervisord.conf tail -f comfyui # Follow logs
```
### Model Management
```bash
arty models/download # Download models from Civitai/HuggingFace
arty models/link # Symlink cached models to ComfyUI
```
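In effect, `arty models/link` reproduces a pattern like the following, keeping the large files in `.cache/` while ComfyUI sees them under its own model tree (paths and the `checkpoints/` subdirectory are assumed for illustration):

```bash
# Illustrative only -- the actual linking is done by the arty script.
for f in .cache/*.safetensors; do
  ln -sfn "$(realpath "$f")" "services/comfyui/models/checkpoints/$(basename "$f")"
done
```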
### Setup Components
```bash
arty deps # Clone git references
arty setup/tailscale # Configure Tailscale VPN
arty setup/services # Create venvs for all services
arty setup/comfyui # Install ComfyUI and custom node dependencies
```
### Docker Build (CI runs on Gitea)
```bash
docker build -t runpod-ai-orchestrator .
```
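The Dockerfile itself is described here only as a minimal PyTorch + CUDA base, so the following is a sketch under that assumption (base image tag borrowed from the commented example in `runpod.yml`):

```dockerfile
# Hypothetical sketch -- see the actual Dockerfile in this repo.
FROM runpod/pytorch:2.1.0-py3.10-cuda11.8.0-devel-ubuntu22.04
RUN apt-get update && apt-get install -y --no-install-recommends \
        git supervisor curl \
    && rm -rf /var/lib/apt/lists/*
COPY start.sh /start.sh
RUN chmod +x /start.sh
CMD ["/start.sh"]
```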
## Environment Variables
Expected in `.env` (or the RunPod template); all are required unless marked optional:
- `HF_TOKEN` - HuggingFace API token
- `TAILSCALE_AUTHKEY` - Tailscale auth key (optional)
- `CIVITAI_API_KEY` - Civitai API key for model downloads
- `WEBDAV_URL`, `WEBDAV_USERNAME`, `WEBDAV_PASSWORD`, `WEBDAV_REMOTE_PATH` - WebDAV sync config
- `PUBLIC_KEY` - SSH public key for RunPod access
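A matching `.env` skeleton, with placeholder values only:

```bash
# .env -- placeholders; never commit real secrets
HF_TOKEN=hf_xxxxxxxxxxxxxxxx
TAILSCALE_AUTHKEY=tskey-auth-xxxxxxxx        # optional
CIVITAI_API_KEY=xxxxxxxxxxxxxxxx
WEBDAV_URL=https://webdav.example.com/
WEBDAV_USERNAME=user
WEBDAV_PASSWORD=changeme
WEBDAV_REMOTE_PATH=/comfyui-outputs
PUBLIC_KEY="ssh-ed25519 AAAA... user@host"
```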
## File Structure
```
├── Dockerfile # Minimal base image (PyTorch + CUDA)
├── start.sh # Container entrypoint
├── supervisord.conf # Process manager config
├── arty.yml # Git repos + setup scripts
├── models/
│ ├── models_civitai.yaml # Civitai model definitions
│ └── models_huggingface.yaml # HuggingFace model definitions
├── services/
│ ├── comfyui/ # ComfyUI + custom_nodes (cloned by arty)
│ ├── audiocraft/ # AudioCraft Studio (cloned by arty)
│ ├── vllm/ # vLLM configs
│ └── webdav-sync/ # Output sync service
└── .gitea/workflows/ # CI/CD for Docker builds
```
## Important Notes
- **Network Volume**: On RunPod, `/workspace` is the persistent network volume. The orchestrator repo is cloned to `/workspace/orchestrator`.
- **Service Ports**: ComfyUI (8188), Supervisor Web UI (9001), vLLM Llama (8001), vLLM BGE (8002)
- **vLLM services** are disabled by default (autostart=false) to conserve GPU memory
- **Custom nodes** have their dependencies installed into the ComfyUI venv during setup
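Because the vLLM programs are configured with `autostart=false`, bringing one up on demand is a normal supervisorctl call (program names are assumed to match the service names above):

```bash
supervisorctl -c supervisord.conf start vllm-llama   # name assumed
supervisorctl -c supervisord.conf stop vllm-llama    # free GPU memory again
```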

arty.yml (modified)

```diff
@@ -111,34 +111,33 @@ scripts:
     echo "========================================="
     echo ""
-    if [ -n "${TAILSCALE_AUTHKEY:-}" ]; then
-      echo "  Starting Tailscale daemon..."
-      tailscaled --tun=userspace-networking --socks5-server=localhost:1055 &
-      sleep 3
-      echo "  Connecting to Tailscale network..."
-      HOSTNAME="runpod-ai-orchestrator"
-      tailscale up --authkey="$TAILSCALE_AUTHKEY" --advertise-tags=tag:gpu --hostname="$HOSTNAME" || {
-        echo "  ⚠ Tailscale connection failed, continuing without VPN"
-      }
-      # Get Tailscale IP if connected
-      TAILSCALE_IP=$(tailscale ip -4 2>/dev/null || echo "not connected")
-      if [ "$TAILSCALE_IP" != "not connected" ]; then
-        echo "  ✓ Tailscale connected"
-        echo "    Hostname: $HOSTNAME"
-        echo "    IP: $TAILSCALE_IP"
-        # Export for other services
-        export GPU_TAILSCALE_IP="$TAILSCALE_IP"
-      else
-        echo "  ⚠ Tailscale failed to obtain IP"
-      fi
-    else
-      echo "  ⚠ Tailscale disabled (no TAILSCALE_AUTHKEY in env)"
-      echo "    Services requiring VPN connectivity will not work"
-    fi
+    if [ ! "$TAILSCALE_AUTHKEY" ]; then
+      echo "  Tailscale disabled (no TAILSCALE_AUTHKEY in env)"
+      echo "  Services requiring VPN connectivity will not work"
+      exit 1
+    fi
+    echo "  Starting Tailscale daemon..."
+    tailscaled --tun=userspace-networking --socks5-server=localhost:1055 &
+    sleep 3
+    echo "  Connecting to Tailscale network..."
+    HOSTNAME="runpod-ai-orchestrator"
+    tailscale up --authkey="$TAILSCALE_AUTHKEY" --advertise-tags=tag:gpu --hostname="$HOSTNAME" || {
+      echo "  ⚠ Tailscale connection failed, continuing without VPN"
+    }
+    # Get Tailscale IP if connected
+    TAILSCALE_IP=$(tailscale ip -4 2>/dev/null || echo "not connected")
+    if [ "$TAILSCALE_IP" == "not connected" ]; then
+      echo "  ⚠ Tailscale failed to obtain IP"
+      exit 1
+    fi
+    echo "  ✓ Tailscale connected"
+    echo "    Hostname: $HOSTNAME"
+    echo "    IP: $TAILSCALE_IP"
   setup/services: |
     echo "========================================="
     echo "  Setting up services python venvs"
```
```diff
@@ -241,6 +240,7 @@ scripts:
   #
   # Supervisor Control Scripts
   #
   supervisor/start: |
+    mkdir -p .logs/
     supervisord -c supervisord.conf
   supervisor/stop: |
```
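The added `mkdir -p .logs/` fixes a bootstrap ordering problem: supervisord does not create missing directories for program log files, so on a fresh checkout every service pointing its stdout/stderr logs into `.logs/` would fail to spawn until the directory exists.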

vLLM Llama config (file deleted)

```diff
@@ -1,4 +0,0 @@
-model: meta-llama/Llama-3.1-8B-Instruct
-host: "0.0.0.0"
-port: 8001
-uvicorn-log-level: "info"
```

runpod.yml (new file, 42 lines)

@@ -0,0 +1,42 @@
```yaml
# RunPod Pod Configuration
# Used by service_runpod_control.sh
#
# Usage:
#   service_runpod_control.sh create   # Create pod from this config
#   service_runpod_control.sh get      # Show pod status
#   service_runpod_control.sh start    # Start the pod
#   service_runpod_control.sh stop     # Stop the pod
#   service_runpod_control.sh remove   # Delete the pod

pod:
  # Required fields
  name: "runpod-ai-orchestrator"
  gpuType: "NVIDIA GeForce RTX 4090"
  gpuCount: 1

  # Template and volume IDs (from RunPod dashboard)
  templateId: "runpod-ai-orchestrator"
  networkVolumeId: "runpod-ai-orchestrator"
  imageName: "dev.pivoine.art/valknar/runpod-ai-orchestrator:latest"

  # Exposed ports
  ports:
    - "22/tcp"

  # Optional: Resource limits
  # containerDiskSize: 20   # GB (default: 20)
  # volumeSize: 1           # GB (default: 1)
  # volumePath: "/runpod"   # Mount path
  # mem: 20                 # Minimum memory GB
  # vcpu: 1                 # Minimum vCPUs

  # Optional: Cloud selection
  # secureCloud: false      # Use secure cloud only
  # communityCloud: false   # Use community cloud only

  # Optional: Custom image (overrides template)
  # imageName: "runpod/pytorch:2.1.0-py3.10-cuda11.8.0-devel-ubuntu22.04"

  # Optional: Environment variables
  # env:
  #   - "HF_TOKEN=your_token_here"
  #   - "CUSTOM_VAR=value"
```

start.sh (modified)

```diff
@@ -25,7 +25,7 @@ if [ ! -d "$PWD/bin" ] ; then
   git clone https://dev.pivoine.art/valknar/bin.git "$PWD/bin"
   echo "  ✓ bin cloned"
 else
-  cd "$PWD/bin" && git stash && git pull && git stash pop || true
+  cd "$PWD/bin" && git fetch && git reset --hard origin/main
   echo "  ✓ bin updated"
   cd -
 fi
@@ -33,7 +33,7 @@ if [ ! -d "$PWD/orchestrator" ] ; then
   git clone https://dev.pivoine.art/valknar/runpod-ai-orchestrator.git "$PWD/orchestrator"
   echo "  ✓ orchestrator cloned"
 else
-  cd "$PWD/orchestrator" && git stash && git pull && git stash pop || true
+  cd "$PWD/orchestrator" && git fetch && git reset --hard origin/main
   echo "  ✓ orchestrator updated"
   cd -
 fi
```
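The switch from `git stash && git pull && git stash pop` to `git fetch && git reset --hard origin/main` trades preservation of local edits for determinism: the stash sequence could leave a conflicted or dirty checkout when the pop failed, while a hard reset always lands on a clean `origin/main` (note this assumes `main` is the default branch and silently discards any local changes).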