Compare commits
4 Commits
5770563d9a
...
9439185b3d
| Author | SHA1 | Date | |
|---|---|---|---|
| 9439185b3d | |||
| 571431955d | |||
| 0e3150e26c | |||
| f6de19bec1 |
114
.gitea/workflows/build-docker-image.yml
Normal file
114
.gitea/workflows/build-docker-image.yml
Normal file
@@ -0,0 +1,114 @@
|
|||||||
|
name: Build and Push RunPod Docker Image
|
||||||
|
|
||||||
|
on:
|
||||||
|
push:
|
||||||
|
branches:
|
||||||
|
- main
|
||||||
|
tags:
|
||||||
|
- 'v*.*.*'
|
||||||
|
pull_request:
|
||||||
|
branches:
|
||||||
|
- main
|
||||||
|
workflow_dispatch:
|
||||||
|
inputs:
|
||||||
|
tag:
|
||||||
|
description: 'Custom tag for the image'
|
||||||
|
required: false
|
||||||
|
default: 'manual'
|
||||||
|
|
||||||
|
env:
|
||||||
|
REGISTRY: dev.pivoine.art
|
||||||
|
IMAGE_NAME: valknar/runpod-ai-orchestrator
|
||||||
|
|
||||||
|
jobs:
|
||||||
|
build-and-push:
|
||||||
|
runs-on: ubuntu-latest
|
||||||
|
|
||||||
|
steps:
|
||||||
|
- name: Checkout repository
|
||||||
|
uses: actions/checkout@v4
|
||||||
|
|
||||||
|
- name: Set up Docker Buildx
|
||||||
|
uses: docker/setup-buildx-action@v3
|
||||||
|
with:
|
||||||
|
platforms: linux/amd64
|
||||||
|
|
||||||
|
- name: Log in to Gitea Container Registry
|
||||||
|
uses: docker/login-action@v3
|
||||||
|
with:
|
||||||
|
registry: ${{ env.REGISTRY }}
|
||||||
|
username: ${{ gitea.actor }}
|
||||||
|
password: ${{ secrets.REGISTRY_TOKEN }}
|
||||||
|
|
||||||
|
- name: Extract metadata (tags, labels)
|
||||||
|
id: meta
|
||||||
|
uses: docker/metadata-action@v5
|
||||||
|
with:
|
||||||
|
images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
|
||||||
|
tags: |
|
||||||
|
# Tag as 'latest' for main branch
|
||||||
|
type=raw,value=latest,enable={{is_default_branch}}
|
||||||
|
# Tag with branch name
|
||||||
|
type=ref,event=branch
|
||||||
|
# Tag with PR number
|
||||||
|
type=ref,event=pr
|
||||||
|
# Tag with git tag (semver)
|
||||||
|
type=semver,pattern={{version}}
|
||||||
|
type=semver,pattern={{major}}.{{minor}}
|
||||||
|
type=semver,pattern={{major}}
|
||||||
|
# Tag with commit SHA
|
||||||
|
type=sha,prefix={{branch}}-
|
||||||
|
# Custom tag from workflow_dispatch
|
||||||
|
type=raw,value=${{ gitea.event.inputs.tag }},enable=${{ gitea.event_name == 'workflow_dispatch' }}
|
||||||
|
labels: |
|
||||||
|
org.opencontainers.image.title=RunPod AI Orchestrator
|
||||||
|
org.opencontainers.image.description=Minimal Docker template for RunPod deployment with ComfyUI + vLLM orchestration, Supervisor process management, and Tailscale VPN integration
|
||||||
|
org.opencontainers.image.vendor=valknar
|
||||||
|
org.opencontainers.image.source=https://dev.pivoine.art/${{ gitea.repository }}
|
||||||
|
|
||||||
|
- name: Build and push Docker image
|
||||||
|
uses: docker/build-push-action@v5
|
||||||
|
with:
|
||||||
|
context: .
|
||||||
|
file: ./Dockerfile
|
||||||
|
platforms: linux/amd64
|
||||||
|
push: ${{ gitea.event_name != 'pull_request' }}
|
||||||
|
tags: ${{ steps.meta.outputs.tags }}
|
||||||
|
labels: ${{ steps.meta.outputs.labels }}
|
||||||
|
cache-from: type=registry,ref=${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:buildcache
|
||||||
|
cache-to: type=registry,ref=${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:buildcache,mode=max
|
||||||
|
|
||||||
|
- name: Generate image digest
|
||||||
|
if: gitea.event_name != 'pull_request'
|
||||||
|
run: |
|
||||||
|
echo "### Docker Image Published :rocket:" >> $GITEA_STEP_SUMMARY
|
||||||
|
echo "" >> $GITEA_STEP_SUMMARY
|
||||||
|
echo "**Registry:** \`${{ env.REGISTRY }}\`" >> $GITEA_STEP_SUMMARY
|
||||||
|
echo "**Image:** \`${{ env.IMAGE_NAME }}\`" >> $GITEA_STEP_SUMMARY
|
||||||
|
echo "" >> $GITEA_STEP_SUMMARY
|
||||||
|
echo "**Tags:**" >> $GITEA_STEP_SUMMARY
|
||||||
|
echo "\`\`\`" >> $GITEA_STEP_SUMMARY
|
||||||
|
echo "${{ steps.meta.outputs.tags }}" >> $GITEA_STEP_SUMMARY
|
||||||
|
echo "\`\`\`" >> $GITEA_STEP_SUMMARY
|
||||||
|
echo "" >> $GITEA_STEP_SUMMARY
|
||||||
|
echo "**Pull command:**" >> $GITEA_STEP_SUMMARY
|
||||||
|
echo "\`\`\`bash" >> $GITEA_STEP_SUMMARY
|
||||||
|
echo "docker pull ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:latest" >> $GITEA_STEP_SUMMARY
|
||||||
|
echo "\`\`\`" >> $GITEA_STEP_SUMMARY
|
||||||
|
echo "" >> $GITEA_STEP_SUMMARY
|
||||||
|
echo "**Use in RunPod template:**" >> $GITEA_STEP_SUMMARY
|
||||||
|
echo "\`\`\`" >> $GITEA_STEP_SUMMARY
|
||||||
|
echo "Container Image: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:latest" >> $GITEA_STEP_SUMMARY
|
||||||
|
echo "\`\`\`" >> $GITEA_STEP_SUMMARY
|
||||||
|
|
||||||
|
- name: PR Comment - Image built but not pushed
|
||||||
|
if: gitea.event_name == 'pull_request'
|
||||||
|
run: |
|
||||||
|
echo "### Docker Image Built Successfully :white_check_mark:" >> $GITEA_STEP_SUMMARY
|
||||||
|
echo "" >> $GITEA_STEP_SUMMARY
|
||||||
|
echo "Image was built successfully but **not pushed** (PR builds are not published)." >> $GITEA_STEP_SUMMARY
|
||||||
|
echo "" >> $GITEA_STEP_SUMMARY
|
||||||
|
echo "**Would be tagged as:**" >> $GITEA_STEP_SUMMARY
|
||||||
|
echo "\`\`\`" >> $GITEA_STEP_SUMMARY
|
||||||
|
echo "${{ steps.meta.outputs.tags }}" >> $GITEA_STEP_SUMMARY
|
||||||
|
echo "\`\`\`" >> $GITEA_STEP_SUMMARY
|
||||||
26
Dockerfile
Normal file
26
Dockerfile
Normal file
@@ -0,0 +1,26 @@
|
|||||||
|
# RunPod AI Orchestrator Template
|
||||||
|
# Minimal Docker image for ComfyUI + vLLM orchestration
|
||||||
|
# Models and application code live on network volume at /workspace
|
||||||
|
|
||||||
|
FROM runpod/pytorch:2.4.0-py3.11-cuda12.4.1-devel-ubuntu22.04
|
||||||
|
|
||||||
|
# Install Supervisor for process management
|
||||||
|
RUN pip install --no-cache-dir supervisor
|
||||||
|
|
||||||
|
# Install Tailscale for VPN connectivity
|
||||||
|
RUN curl -fsSL https://tailscale.com/install.sh | sh
|
||||||
|
|
||||||
|
# Install additional system utilities
|
||||||
|
RUN apt-get update && apt-get install -y \
|
||||||
|
wget \
|
||||||
|
&& rm -rf /var/lib/apt/lists/*
|
||||||
|
|
||||||
|
# Copy the startup script
|
||||||
|
COPY start.sh /start.sh
|
||||||
|
RUN chmod +x /start.sh
|
||||||
|
|
||||||
|
# Set working directory to /workspace (network volume mount point)
|
||||||
|
WORKDIR /workspace
|
||||||
|
|
||||||
|
# RunPod calls /start.sh by default
|
||||||
|
CMD ["/start.sh"]
|
||||||
503
RUNPOD_TEMPLATE.md
Normal file
503
RUNPOD_TEMPLATE.md
Normal file
@@ -0,0 +1,503 @@
|
|||||||
|
# RunPod Template Setup Guide
|
||||||
|
|
||||||
|
This guide explains how to deploy the AI Orchestrator (ComfyUI + vLLM) on RunPod using a custom Docker template and network volume.
|
||||||
|
|
||||||
|
## Architecture Overview
|
||||||
|
|
||||||
|
The deployment uses a **two-tier strategy**:
|
||||||
|
|
||||||
|
1. **Docker Image** (software layer) - Contains system packages, Supervisor, Tailscale
|
||||||
|
2. **Network Volume** (data layer) - Contains models, ComfyUI installation, venvs, configuration
|
||||||
|
|
||||||
|
This approach allows fast pod deployment (~2-3 minutes) while keeping all large files (models, ~80-200GB) on a persistent network volume.
|
||||||
|
|
||||||
|
## Prerequisites
|
||||||
|
|
||||||
|
- RunPod account with credits
|
||||||
|
- Docker Hub account (for hosting the template image)
|
||||||
|
- HuggingFace account with API token (for model downloads)
|
||||||
|
- Tailscale account with auth key (optional, for VPN access)
|
||||||
|
|
||||||
|
## Step 1: Build and Push Docker Image
|
||||||
|
|
||||||
|
### Option A: Automated Build (Recommended)
|
||||||
|
|
||||||
|
The repository includes a Gitea workflow that automatically builds and pushes the Docker image to your Gitea container registry when you push to the `main` branch or create a version tag.
|
||||||
|
|
||||||
|
1. **Configure Gitea Secret:**
|
||||||
|
- Go to your Gitea repository → Settings → Secrets
|
||||||
|
- Add `REGISTRY_TOKEN` = your Gitea access token with registry permissions
|
||||||
|
- (The workflow automatically uses your Gitea username via `gitea.actor`)
|
||||||
|
|
||||||
|
2. **Trigger Build:**
|
||||||
|
```bash
|
||||||
|
# Push to main branch
|
||||||
|
git push origin main
|
||||||
|
|
||||||
|
# Or create a version tag
|
||||||
|
git tag v1.0.0
|
||||||
|
git push origin v1.0.0
|
||||||
|
```
|
||||||
|
|
||||||
|
3. **Monitor Build:**
|
||||||
|
- Go to Actions tab in Gitea
|
||||||
|
- Wait for build to complete (~5-10 minutes)
|
||||||
|
- Note the Docker image name: `dev.pivoine.art/valknar/runpod-ai-orchestrator:latest`
|
||||||
|
|
||||||
|
### Option B: Manual Build
|
||||||
|
|
||||||
|
If you prefer to build manually:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# From the repository root
|
||||||
|
cd /path/to/runpod
|
||||||
|
|
||||||
|
# Build the image
|
||||||
|
docker build -t dev.pivoine.art/valknar/runpod-ai-orchestrator:latest .
|
||||||
|
|
||||||
|
# Login to your Gitea registry
|
||||||
|
docker login dev.pivoine.art
|
||||||
|
|
||||||
|
# Push to Gitea registry
|
||||||
|
docker push dev.pivoine.art/valknar/runpod-ai-orchestrator:latest
|
||||||
|
```
|
||||||
|
|
||||||
|
## Step 2: Create Network Volume
|
||||||
|
|
||||||
|
Network volumes persist your models and data across pod restarts and rebuilds.
|
||||||
|
|
||||||
|
1. **Go to RunPod Dashboard → Storage → Network Volumes**
|
||||||
|
|
||||||
|
2. **Click "New Network Volume"**
|
||||||
|
|
||||||
|
3. **Configure:**
|
||||||
|
- **Name**: `ai-orchestrator-models`
|
||||||
|
- **Size**: `200GB` (adjust based on your needs)
|
||||||
|
- Essential models only: ~80GB
|
||||||
|
- All models: ~137-200GB
|
||||||
|
- **Datacenter**: Choose closest to you (volume tied to datacenter)
|
||||||
|
|
||||||
|
4. **Click "Create Volume"**
|
||||||
|
|
||||||
|
5. **Note the Volume ID** (e.g., `vol-abc123def456`) for pod deployment
|
||||||
|
|
||||||
|
### Storage Requirements
|
||||||
|
|
||||||
|
| Configuration | Size | Models Included |
|
||||||
|
|--------------|------|-----------------|
|
||||||
|
| Essential | ~80GB | FLUX Schnell, 1-2 SDXL checkpoints, MusicGen Medium |
|
||||||
|
| Complete | ~137GB | All image/video/audio models from playbook |
|
||||||
|
| Full + vLLM | ~200GB | Complete + Qwen 2.5 7B + Llama 3.1 8B |
|
||||||
|
|
||||||
|
## Step 3: Create RunPod Template
|
||||||
|
|
||||||
|
1. **Go to RunPod Dashboard → Templates**
|
||||||
|
|
||||||
|
2. **Click "New Template"**
|
||||||
|
|
||||||
|
3. **Configure Template Settings:**
|
||||||
|
|
||||||
|
**Container Configuration:**
|
||||||
|
- **Template Name**: `AI Orchestrator (ComfyUI + vLLM)`
|
||||||
|
- **Template Type**: Docker
|
||||||
|
- **Container Image**: `dev.pivoine.art/valknar/runpod-ai-orchestrator:latest`
|
||||||
|
- **Container Disk**: `50GB` (for system and temp files)
|
||||||
|
- **Docker Command**: Leave empty (uses default `/start.sh`)
|
||||||
|
|
||||||
|
**Volume Configuration:**
|
||||||
|
- **Volume Mount Path**: `/workspace`
|
||||||
|
- **Attach to Network Volume**: Select your volume ID from Step 2
|
||||||
|
|
||||||
|
**Port Configuration:**
|
||||||
|
- **Expose HTTP Ports**: `8188, 9000, 9001`
|
||||||
|
- `8188` - ComfyUI web interface
|
||||||
|
- `9000` - Model orchestrator API
|
||||||
|
- `9001` - Supervisor web UI
|
||||||
|
- **Expose TCP Ports**: `22` (SSH access)
|
||||||
|
|
||||||
|
**Environment Variables:**
|
||||||
|
```
|
||||||
|
HF_TOKEN=your_huggingface_token_here
|
||||||
|
TAILSCALE_AUTHKEY=tskey-auth-your_tailscale_authkey_here
|
||||||
|
SUPERVISOR_BACKEND_HOST=localhost
|
||||||
|
SUPERVISOR_BACKEND_PORT=9001
|
||||||
|
```
|
||||||
|
|
||||||
|
**Advanced Settings:**
|
||||||
|
- **Start Jupyter**: No
|
||||||
|
- **Start SSH**: Yes (handled by base image)
|
||||||
|
|
||||||
|
4. **Click "Save Template"**
|
||||||
|
|
||||||
|
## Step 4: First Deployment (Initial Setup)
|
||||||
|
|
||||||
|
The first time you deploy, you need to set up the network volume with models and configuration.
|
||||||
|
|
||||||
|
### 4.1 Deploy Pod
|
||||||
|
|
||||||
|
1. **Go to RunPod Dashboard → Pods**
|
||||||
|
2. **Click "Deploy"** or "GPU Pods"
|
||||||
|
3. **Select your custom template**: `AI Orchestrator (ComfyUI + vLLM)`
|
||||||
|
4. **Configure GPU:**
|
||||||
|
- **GPU Type**: RTX 4090 (24GB VRAM) or higher
|
||||||
|
- **Network Volume**: Select your volume from Step 2
|
||||||
|
- **On-Demand vs Spot**: Choose based on budget
|
||||||
|
5. **Click "Deploy"**
|
||||||
|
|
||||||
|
### 4.2 SSH into Pod
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Get pod SSH command from RunPod dashboard
|
||||||
|
ssh root@<pod-ip> -p <port> -i ~/.ssh/id_ed25519
|
||||||
|
|
||||||
|
# Or use RunPod web terminal
|
||||||
|
```
|
||||||
|
|
||||||
|
### 4.3 Initial Setup on Network Volume
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# 1. Clone the repository to /workspace/ai
|
||||||
|
cd /workspace
|
||||||
|
git clone https://github.com/your-username/runpod.git ai
|
||||||
|
cd ai
|
||||||
|
|
||||||
|
# 2. Create .env file with your credentials
|
||||||
|
cp .env.example .env
|
||||||
|
nano .env
|
||||||
|
|
||||||
|
# Edit and add:
|
||||||
|
# HF_TOKEN=your_huggingface_token
|
||||||
|
# TAILSCALE_AUTHKEY=tskey-auth-your_key
|
||||||
|
# GPU_TAILSCALE_IP=<will be set automatically>
|
||||||
|
|
||||||
|
# 3. Download essential models (this takes 30-60 minutes)
|
||||||
|
ansible-playbook playbook.yml --tags comfyui-essential
|
||||||
|
|
||||||
|
# OR download all models (1-2 hours)
|
||||||
|
ansible-playbook playbook.yml --tags comfyui-models-all
|
||||||
|
|
||||||
|
# 4. Link models to ComfyUI
|
||||||
|
bash scripts/link-comfyui-models.sh
|
||||||
|
|
||||||
|
# OR if arty is available
|
||||||
|
arty run models/link-comfyui
|
||||||
|
|
||||||
|
# 5. Install ComfyUI custom nodes dependencies
|
||||||
|
cd /workspace/ComfyUI/custom_nodes/ComfyUI-Manager
|
||||||
|
pip install -r requirements.txt
|
||||||
|
cd /workspace/ai
|
||||||
|
|
||||||
|
# 6. Restart the container to apply all changes
|
||||||
|
exit
|
||||||
|
# Go to RunPod dashboard → Stop pod → Start pod
|
||||||
|
```
|
||||||
|
|
||||||
|
### 4.4 Verify Services
|
||||||
|
|
||||||
|
After restart, SSH back in and check:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Check supervisor status
|
||||||
|
supervisorctl -c /workspace/supervisord.conf status
|
||||||
|
|
||||||
|
# Expected output:
|
||||||
|
# comfyui RUNNING pid 123, uptime 0:01:00
|
||||||
|
# (orchestrator is disabled by default - enable for vLLM)
|
||||||
|
|
||||||
|
# Test ComfyUI
|
||||||
|
curl -I http://localhost:8188
|
||||||
|
|
||||||
|
# Test Supervisor web UI
|
||||||
|
curl -I http://localhost:9001
|
||||||
|
```
|
||||||
|
|
||||||
|
## Step 5: Subsequent Deployments
|
||||||
|
|
||||||
|
After initial setup, deploying new pods is quick (2-3 minutes):
|
||||||
|
|
||||||
|
1. **Deploy pod** with same template + network volume
|
||||||
|
2. **Wait for startup** (~1-2 minutes for services to start)
|
||||||
|
3. **Access services:**
|
||||||
|
- ComfyUI: `http://<pod-ip>:8188`
|
||||||
|
- Supervisor: `http://<pod-ip>:9001`
|
||||||
|
|
||||||
|
**All models, configuration, and data persist on the network volume!**
|
||||||
|
|
||||||
|
## Step 6: Access Services
|
||||||
|
|
||||||
|
### Via Direct IP (HTTP)
|
||||||
|
|
||||||
|
Get pod IP and ports from RunPod dashboard:
|
||||||
|
|
||||||
|
```
|
||||||
|
ComfyUI: http://<pod-ip>:8188
|
||||||
|
Supervisor UI: http://<pod-ip>:9001
|
||||||
|
Orchestrator API: http://<pod-ip>:9000
|
||||||
|
SSH: ssh root@<pod-ip> -p <port>
|
||||||
|
```
|
||||||
|
|
||||||
|
### Via Tailscale VPN (Recommended)
|
||||||
|
|
||||||
|
If you configured `TAILSCALE_AUTHKEY`, the pod automatically joins your Tailscale network:
|
||||||
|
|
||||||
|
1. **Get Tailscale IP:**
|
||||||
|
```bash
|
||||||
|
ssh root@<pod-ip> -p <port>
|
||||||
|
tailscale ip -4
|
||||||
|
# Example output: 100.114.60.40
|
||||||
|
```
|
||||||
|
|
||||||
|
2. **Access via Tailscale:**
|
||||||
|
```
|
||||||
|
ComfyUI: http://<tailscale-ip>:8188
|
||||||
|
Supervisor: http://<tailscale-ip>:9001
|
||||||
|
Orchestrator: http://<tailscale-ip>:9000
|
||||||
|
SSH: ssh root@<tailscale-ip>
|
||||||
|
```
|
||||||
|
|
||||||
|
3. **Update LiteLLM config** on your VPS with the Tailscale IP
|
||||||
|
|
||||||
|
## Service Management
|
||||||
|
|
||||||
|
### Start/Stop Services
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Start all services
|
||||||
|
supervisorctl -c /workspace/supervisord.conf start all
|
||||||
|
|
||||||
|
# Stop all services
|
||||||
|
supervisorctl -c /workspace/supervisord.conf stop all
|
||||||
|
|
||||||
|
# Restart specific service
|
||||||
|
supervisorctl -c /workspace/supervisord.conf restart comfyui
|
||||||
|
|
||||||
|
# View status
|
||||||
|
supervisorctl -c /workspace/supervisord.conf status
|
||||||
|
```
|
||||||
|
|
||||||
|
### Enable vLLM Models (Text Generation)
|
||||||
|
|
||||||
|
By default, only ComfyUI runs (to save VRAM). To enable vLLM:
|
||||||
|
|
||||||
|
1. **Stop ComfyUI** (frees up VRAM):
|
||||||
|
```bash
|
||||||
|
supervisorctl -c /workspace/supervisord.conf stop comfyui
|
||||||
|
```
|
||||||
|
|
||||||
|
2. **Start orchestrator** (manages vLLM models):
|
||||||
|
```bash
|
||||||
|
supervisorctl -c /workspace/supervisord.conf start orchestrator
|
||||||
|
```
|
||||||
|
|
||||||
|
3. **Test text generation:**
|
||||||
|
```bash
|
||||||
|
curl -X POST http://localhost:9000/v1/chat/completions \
|
||||||
|
-H 'Content-Type: application/json' \
|
||||||
|
-d '{"model":"qwen-2.5-7b","messages":[{"role":"user","content":"Hello"}]}'
|
||||||
|
```
|
||||||
|
|
||||||
|
### Switch Back to ComfyUI
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Stop orchestrator (stops all vLLM models)
|
||||||
|
supervisorctl -c /workspace/supervisord.conf stop orchestrator
|
||||||
|
|
||||||
|
# Start ComfyUI
|
||||||
|
supervisorctl -c /workspace/supervisord.conf start comfyui
|
||||||
|
```
|
||||||
|
|
||||||
|
## Updating the Template
|
||||||
|
|
||||||
|
When you make changes to code or configuration:
|
||||||
|
|
||||||
|
### Update Docker Image
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# 1. Make changes to Dockerfile or start.sh
|
||||||
|
# 2. Push to repository
|
||||||
|
git add .
|
||||||
|
git commit -m "Update template configuration"
|
||||||
|
git push origin main
|
||||||
|
|
||||||
|
# 3. Gitea workflow auto-builds new image
|
||||||
|
|
||||||
|
# 4. Terminate old pod and deploy new one with updated image
|
||||||
|
```
|
||||||
|
|
||||||
|
### Update Network Volume Data
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# SSH into running pod
|
||||||
|
ssh root@<pod-ip> -p <port>
|
||||||
|
|
||||||
|
# Update repository
|
||||||
|
cd /workspace/ai
|
||||||
|
git pull
|
||||||
|
|
||||||
|
# Re-run Ansible if needed
|
||||||
|
ansible-playbook playbook.yml --tags <specific-tag>
|
||||||
|
|
||||||
|
# Restart services
|
||||||
|
supervisorctl -c /workspace/supervisord.conf restart all
|
||||||
|
```
|
||||||
|
|
||||||
|
## Troubleshooting
|
||||||
|
|
||||||
|
### Pod fails to start
|
||||||
|
|
||||||
|
**Check logs:**
|
||||||
|
```bash
|
||||||
|
# Via SSH
|
||||||
|
cat /workspace/logs/supervisord.log
|
||||||
|
cat /workspace/logs/comfyui.err.log
|
||||||
|
|
||||||
|
# Via RunPod web terminal
|
||||||
|
tail -f /workspace/logs/*.log
|
||||||
|
```
|
||||||
|
|
||||||
|
**Common issues:**
|
||||||
|
- Missing `.env` file → Create `/workspace/ai/.env` with required vars
|
||||||
|
- Supervisor config not found → Ensure `/workspace/ai/supervisord.conf` exists
|
||||||
|
- Port conflicts → Check if services are already running
|
||||||
|
|
||||||
|
### Tailscale not connecting
|
||||||
|
|
||||||
|
**Check Tailscale status:**
|
||||||
|
```bash
|
||||||
|
tailscale status
|
||||||
|
tailscale ip -4
|
||||||
|
```
|
||||||
|
|
||||||
|
**Common issues:**
|
||||||
|
- Missing or invalid `TAILSCALE_AUTHKEY` in `.env`
|
||||||
|
- Auth key expired → Generate new key in Tailscale admin
|
||||||
|
- Firewall blocking → RunPod should allow Tailscale by default
|
||||||
|
|
||||||
|
### Services not starting
|
||||||
|
|
||||||
|
**Check Supervisor:**
|
||||||
|
```bash
|
||||||
|
supervisorctl -c /workspace/supervisord.conf status
|
||||||
|
supervisorctl -c /workspace/supervisord.conf tail -f comfyui
|
||||||
|
```
|
||||||
|
|
||||||
|
**Common issues:**
|
||||||
|
- venv broken → Re-run `scripts/bootstrap-venvs.sh`
|
||||||
|
- Models not downloaded → Run Ansible playbook again
|
||||||
|
- Python version mismatch → Rebuild venvs
|
||||||
|
|
||||||
|
### Out of VRAM
|
||||||
|
|
||||||
|
**Check GPU memory:**
|
||||||
|
```bash
|
||||||
|
nvidia-smi
|
||||||
|
```
|
||||||
|
|
||||||
|
**RTX 4090 (24GB) capacity:**
|
||||||
|
- ComfyUI (FLUX Schnell): ~23GB (can't run with vLLM)
|
||||||
|
- vLLM (Qwen 2.5 7B): ~14GB
|
||||||
|
- vLLM (Llama 3.1 8B): ~17GB
|
||||||
|
|
||||||
|
**Solution:** Only run one service at a time (see Service Management section)
|
||||||
|
|
||||||
|
### Network volume full
|
||||||
|
|
||||||
|
**Check disk usage:**
|
||||||
|
```bash
|
||||||
|
df -h /workspace
|
||||||
|
du -sh /workspace/*
|
||||||
|
```
|
||||||
|
|
||||||
|
**Clean up:**
|
||||||
|
```bash
|
||||||
|
# Remove old HuggingFace cache
|
||||||
|
rm -rf /workspace/huggingface_cache
|
||||||
|
|
||||||
|
# Re-download essential models only
|
||||||
|
cd /workspace/ai
|
||||||
|
ansible-playbook playbook.yml --tags comfyui-essential
|
||||||
|
```
|
||||||
|
|
||||||
|
## Cost Optimization
|
||||||
|
|
||||||
|
### Spot vs On-Demand
|
||||||
|
|
||||||
|
- **Spot instances**: ~70% cheaper, can be interrupted
|
||||||
|
- **On-Demand**: More expensive, guaranteed availability
|
||||||
|
|
||||||
|
**Recommendation:** Use spot for development, on-demand for production
|
||||||
|
|
||||||
|
### Network Volume Pricing
|
||||||
|
|
||||||
|
- First 1TB: $0.07/GB/month
|
||||||
|
- Beyond 1TB: $0.05/GB/month
|
||||||
|
|
||||||
|
**200GB volume cost:** ~$14/month
|
||||||
|
|
||||||
|
### Pod Auto-Stop
|
||||||
|
|
||||||
|
Configure auto-stop in RunPod pod settings to save costs when idle:
|
||||||
|
- Stop after 15 minutes idle
|
||||||
|
- Stop after 1 hour idle
|
||||||
|
- Manual stop only
|
||||||
|
|
||||||
|
## Advanced Configuration
|
||||||
|
|
||||||
|
### Custom Environment Variables
|
||||||
|
|
||||||
|
Add to template or pod environment variables:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Model cache locations
|
||||||
|
HF_HOME=/workspace/huggingface_cache
|
||||||
|
TRANSFORMERS_CACHE=/workspace/huggingface_cache
|
||||||
|
|
||||||
|
# ComfyUI settings
|
||||||
|
COMFYUI_PORT=8188
|
||||||
|
COMFYUI_LISTEN=0.0.0.0
|
||||||
|
|
||||||
|
# Orchestrator settings
|
||||||
|
ORCHESTRATOR_PORT=9000
|
||||||
|
|
||||||
|
# GPU settings
|
||||||
|
CUDA_VISIBLE_DEVICES=0
|
||||||
|
```
|
||||||
|
|
||||||
|
### Multiple Network Volumes
|
||||||
|
|
||||||
|
You can attach multiple network volumes for organization:
|
||||||
|
|
||||||
|
1. **Models volume** - `/workspace/models` (read-only, shared)
|
||||||
|
2. **Data volume** - `/workspace/data` (read-write, per-project)
|
||||||
|
|
||||||
|
### Custom Startup Script
|
||||||
|
|
||||||
|
Override `/start.sh` behavior by creating `/workspace/custom-start.sh`:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
#!/bin/bash
|
||||||
|
# Custom startup commands
|
||||||
|
|
||||||
|
# Source default startup
|
||||||
|
source /start.sh
|
||||||
|
|
||||||
|
# Add your custom commands here
|
||||||
|
echo "Running custom initialization..."
|
||||||
|
```
|
||||||
|
|
||||||
|
## References
|
||||||
|
|
||||||
|
- [RunPod Documentation](https://docs.runpod.io/)
|
||||||
|
- [RunPod Templates Overview](https://docs.runpod.io/pods/templates/overview)
|
||||||
|
- [Network Volumes Guide](https://docs.runpod.io/storage/network-volumes)
|
||||||
|
- [ComfyUI Documentation](https://github.com/comfyanonymous/ComfyUI)
|
||||||
|
- [Supervisor Documentation](http://supervisord.org/)
|
||||||
|
- [Tailscale Documentation](https://tailscale.com/kb/)
|
||||||
|
|
||||||
|
## Support
|
||||||
|
|
||||||
|
For issues or questions:
|
||||||
|
- Check troubleshooting section above
|
||||||
|
- Review `/workspace/logs/` files
|
||||||
|
- Check RunPod community forums
|
||||||
|
- Open issue in project repository
|
||||||
@@ -33,8 +33,8 @@
|
|||||||
"properties": {
|
"properties": {
|
||||||
"Node name for S&R": "CheckpointLoaderSimple"
|
"Node name for S&R": "CheckpointLoaderSimple"
|
||||||
},
|
},
|
||||||
"widgets_values": ["waiIllustriousSDXL_v150.safetensors"],
|
"widgets_values": ["ponyDiffusionV6XL_v6StartWithThisOne.safetensors"],
|
||||||
"title": "WAI-NSFW-Illustrious SDXL Checkpoint (Anime/Furry)"
|
"title": "Pony Diffusion V6 XL Checkpoint (Anime/Furry)"
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"id": 2,
|
"id": 2,
|
||||||
@@ -242,7 +242,7 @@
|
|||||||
"version": "1.0",
|
"version": "1.0",
|
||||||
"description": "Production workflow for Pony Diffusion V6 XL optimized for anime, cartoon, and furry NSFW generation with danbooru tag support and balanced content (safe/questionable/explicit)",
|
"description": "Production workflow for Pony Diffusion V6 XL optimized for anime, cartoon, and furry NSFW generation with danbooru tag support and balanced content (safe/questionable/explicit)",
|
||||||
"category": "nsfw",
|
"category": "nsfw",
|
||||||
"model": "add-detail-xl.safetensors",
|
"model": "ponyDiffusionV6XL_v6StartWithThisOne.safetensors",
|
||||||
"recommended_settings": {
|
"recommended_settings": {
|
||||||
"sampler": "euler_ancestral or dpmpp_2m",
|
"sampler": "euler_ancestral or dpmpp_2m",
|
||||||
"scheduler": "normal or karras",
|
"scheduler": "normal or karras",
|
||||||
|
|||||||
@@ -61,6 +61,9 @@ model_categories:
|
|||||||
base_model: SDXL 1.0
|
base_model: SDXL 1.0
|
||||||
vram_gb: 12
|
vram_gb: 12
|
||||||
tags: [nsfw, anime, furry, cartoon, versatile]
|
tags: [nsfw, anime, furry, cartoon, versatile]
|
||||||
|
files:
|
||||||
|
- source: "ponyDiffusionV6XL_v6StartWithThisOne.safetensors"
|
||||||
|
dest: "ponyDiffusionV6XL_v6StartWithThisOne.safetensors"
|
||||||
training_info:
|
training_info:
|
||||||
images: "2.6M aesthetically ranked"
|
images: "2.6M aesthetically ranked"
|
||||||
ratio: "1:1:1 safe/questionable/explicit"
|
ratio: "1:1:1 safe/questionable/explicit"
|
||||||
@@ -226,6 +229,9 @@ model_categories:
|
|||||||
tags: [negative, quality, anatomy, sdxl]
|
tags: [negative, quality, anatomy, sdxl]
|
||||||
trigger_word: "BadX"
|
trigger_word: "BadX"
|
||||||
usage: "embedding:BadX"
|
usage: "embedding:BadX"
|
||||||
|
files:
|
||||||
|
- source: "BadX-neg.pt"
|
||||||
|
dest: "BadX-neg.pt"
|
||||||
notes: "Use in negative prompt with LUSTIFY, RealVisXL, or other SDXL checkpoints. Fixes facial/hand artifacts."
|
notes: "Use in negative prompt with LUSTIFY, RealVisXL, or other SDXL checkpoints. Fixes facial/hand artifacts."
|
||||||
|
|
||||||
# ==========================================================================
|
# ==========================================================================
|
||||||
@@ -244,6 +250,9 @@ model_categories:
|
|||||||
tags: [negative, quality, pony, nsfw]
|
tags: [negative, quality, pony, nsfw]
|
||||||
trigger_word: "zPDXL3"
|
trigger_word: "zPDXL3"
|
||||||
usage: "embedding:zPDXL3"
|
usage: "embedding:zPDXL3"
|
||||||
|
files:
|
||||||
|
- source: "zPDXL3.safetensors"
|
||||||
|
dest: "zPDXL3.safetensors"
|
||||||
recommended_settings:
|
recommended_settings:
|
||||||
strength: "1.0-2.0"
|
strength: "1.0-2.0"
|
||||||
notes: "ONLY works with Pony Diffusion models. Removes censoring and improves quality."
|
notes: "ONLY works with Pony Diffusion models. Removes censoring and improves quality."
|
||||||
@@ -260,6 +269,9 @@ model_categories:
|
|||||||
tags: [negative, nsfw, pony]
|
tags: [negative, nsfw, pony]
|
||||||
trigger_word: "zPDXLxxx"
|
trigger_word: "zPDXLxxx"
|
||||||
usage: "embedding:zPDXLxxx"
|
usage: "embedding:zPDXLxxx"
|
||||||
|
files:
|
||||||
|
- source: "zPDXLxxx.pt"
|
||||||
|
dest: "zPDXLxxx.pt"
|
||||||
recommended_settings:
|
recommended_settings:
|
||||||
strength: "1.0-2.0"
|
strength: "1.0-2.0"
|
||||||
notes: "ONLY for Pony Diffusion models. Enables explicit NSFW content generation."
|
notes: "ONLY for Pony Diffusion models. Enables explicit NSFW content generation."
|
||||||
|
|||||||
108
scripts/bootstrap-venvs.sh
Executable file
108
scripts/bootstrap-venvs.sh
Executable file
@@ -0,0 +1,108 @@
|
|||||||
|
#!/bin/bash
|
||||||
|
# Virtual Environment Health Check and Bootstrap Script
|
||||||
|
# Checks if Python venvs are compatible with current Python version
|
||||||
|
# Rebuilds venvs if needed
|
||||||
|
|
||||||
|
set -e
|
||||||
|
|
||||||
|
echo "=== Python Virtual Environment Health Check ==="
|
||||||
|
|
||||||
|
# Get current system Python version
|
||||||
|
SYSTEM_PYTHON=$(python3 --version | awk '{print $2}')
|
||||||
|
SYSTEM_PYTHON_MAJOR_MINOR=$(echo "$SYSTEM_PYTHON" | cut -d'.' -f1,2)
|
||||||
|
|
||||||
|
echo "System Python: $SYSTEM_PYTHON ($SYSTEM_PYTHON_MAJOR_MINOR)"
|
||||||
|
|
||||||
|
# List of venvs to check
|
||||||
|
VENVS=(
|
||||||
|
"/workspace/ai/vllm/venv"
|
||||||
|
"/workspace/ai/webdav-sync/venv"
|
||||||
|
"/workspace/ComfyUI/venv"
|
||||||
|
)
|
||||||
|
|
||||||
|
REBUILD_NEEDED=0
|
||||||
|
|
||||||
|
# Check each venv
|
||||||
|
for VENV_PATH in "${VENVS[@]}"; do
|
||||||
|
if [ ! -d "$VENV_PATH" ]; then
|
||||||
|
echo "⚠ venv not found: $VENV_PATH (will be created on first service start)"
|
||||||
|
continue
|
||||||
|
fi
|
||||||
|
|
||||||
|
VENV_NAME=$(basename $(dirname "$VENV_PATH"))
|
||||||
|
echo ""
|
||||||
|
echo "Checking venv: $VENV_NAME ($VENV_PATH)"
|
||||||
|
|
||||||
|
# Check if venv Python executable works
|
||||||
|
if ! "$VENV_PATH/bin/python" --version >/dev/null 2>&1; then
|
||||||
|
echo " ❌ BROKEN - Python executable not working"
|
||||||
|
REBUILD_NEEDED=1
|
||||||
|
continue
|
||||||
|
fi
|
||||||
|
|
||||||
|
# Get venv Python version
|
||||||
|
VENV_PYTHON=$("$VENV_PATH/bin/python" --version 2>&1 | awk '{print $2}')
|
||||||
|
VENV_PYTHON_MAJOR_MINOR=$(echo "$VENV_PYTHON" | cut -d'.' -f1,2)
|
||||||
|
|
||||||
|
echo " venv Python: $VENV_PYTHON ($VENV_PYTHON_MAJOR_MINOR)"
|
||||||
|
|
||||||
|
# Compare major.minor versions
|
||||||
|
if [ "$SYSTEM_PYTHON_MAJOR_MINOR" != "$VENV_PYTHON_MAJOR_MINOR" ]; then
|
||||||
|
echo " ⚠ VERSION MISMATCH - System is $SYSTEM_PYTHON_MAJOR_MINOR, venv is $VENV_PYTHON_MAJOR_MINOR"
|
||||||
|
REBUILD_NEEDED=1
|
||||||
|
else
|
||||||
|
# Check if pip works
|
||||||
|
if ! "$VENV_PATH/bin/pip" --version >/dev/null 2>&1; then
|
||||||
|
echo " ❌ BROKEN - pip not working"
|
||||||
|
REBUILD_NEEDED=1
|
||||||
|
else
|
||||||
|
echo " ✓ HEALTHY"
|
||||||
|
fi
|
||||||
|
fi
|
||||||
|
done
|
||||||
|
|
||||||
|
# If any venv needs rebuild, warn the user
|
||||||
|
if [ $REBUILD_NEEDED -eq 1 ]; then
|
||||||
|
echo ""
|
||||||
|
echo "========================================"
|
||||||
|
echo " ⚠ WARNING: Some venvs need rebuilding"
|
||||||
|
echo "========================================"
|
||||||
|
echo ""
|
||||||
|
echo "One or more Python virtual environments are incompatible with the current"
|
||||||
|
echo "Python version or are broken. This can happen when:"
|
||||||
|
echo " - Docker image Python version changed"
|
||||||
|
echo " - venv files were corrupted"
|
||||||
|
echo " - Binary dependencies are incompatible"
|
||||||
|
echo ""
|
||||||
|
echo "RECOMMENDED ACTIONS:"
|
||||||
|
echo ""
|
||||||
|
echo "1. vLLM venv rebuild:"
|
||||||
|
echo " cd /workspace/ai/vllm"
|
||||||
|
echo " rm -rf venv"
|
||||||
|
echo " python3 -m venv venv"
|
||||||
|
echo " source venv/bin/activate"
|
||||||
|
echo " pip install -r requirements.txt"
|
||||||
|
echo ""
|
||||||
|
echo "2. ComfyUI venv rebuild:"
|
||||||
|
echo " cd /workspace/ComfyUI"
|
||||||
|
echo " rm -rf venv"
|
||||||
|
echo " python3 -m venv venv"
|
||||||
|
echo " source venv/bin/activate"
|
||||||
|
echo " pip install -r requirements.txt"
|
||||||
|
echo ""
|
||||||
|
echo "3. WebDAV sync venv rebuild (if used):"
|
||||||
|
echo " cd /workspace/ai/webdav-sync"
|
||||||
|
echo " rm -rf venv"
|
||||||
|
echo " python3 -m venv venv"
|
||||||
|
echo " source venv/bin/activate"
|
||||||
|
echo " pip install -r requirements.txt"
|
||||||
|
echo ""
|
||||||
|
echo "Services may fail to start until venvs are rebuilt!"
|
||||||
|
echo "========================================"
|
||||||
|
echo ""
|
||||||
|
else
|
||||||
|
echo ""
|
||||||
|
echo "✓ All virtual environments are healthy"
|
||||||
|
fi
|
||||||
|
|
||||||
|
exit 0
|
||||||
141
start.sh
Normal file
@@ -0,0 +1,141 @@
|
|||||||
|
#!/bin/bash
# RunPod container startup script
# Initializes the container environment and starts all services:
#   [1] SSH server, [2] PATH setup, [3] .env loading, [4] Tailscale VPN,
#   [5] Python venv health check, [6] Supervisor config, [7] Supervisor launch.
# Ends with `sleep infinity` so the entrypoint blocks and keeps the
# container alive, as RunPod expects.

set -e

echo "========================================"
echo " RunPod AI Orchestrator - Starting"
echo "========================================"

# [1/7] Start SSH server (required by RunPod)
echo "[1/7] Starting SSH server..."
service ssh start
echo "  ✓ SSH server started"

# [2/7] Add /workspace/bin to PATH for arty and custom scripts
echo "[2/7] Configuring PATH..."
export PATH="/workspace/bin:$PATH"
echo "  ✓ PATH updated: /workspace/bin added"

# [3/7] Source environment variables from network volume.
# `set -a` auto-exports every variable assigned while sourcing, so
# supervisord and its managed services inherit them.
echo "[3/7] Loading environment from network volume..."
if [ -f /workspace/ai/.env ]; then
    set -a
    source /workspace/ai/.env
    set +a
    echo "  ✓ Environment loaded from /workspace/ai/.env"
else
    echo "  ⚠ No .env file found at /workspace/ai/.env"
    echo "    Some services may not function correctly without environment variables"
fi

# [4/7] Configure and start Tailscale VPN
echo "[4/7] Configuring Tailscale VPN..."
if [ -n "${TAILSCALE_AUTHKEY:-}" ]; then
    echo "  Starting Tailscale daemon..."
    # Userspace networking: works without /dev/net/tun inside the container.
    tailscaled --tun=userspace-networking --socks5-server=localhost:1055 &

    # Wait (bounded, up to ~10s) for the tailscaled control socket to come up
    # instead of a fixed sleep — fixes a race where `tailscale up` could run
    # before the daemon was ready.
    for _ in 1 2 3 4 5 6 7 8 9 10; do
        if tailscale status >/dev/null 2>&1; then
            break
        fi
        sleep 1
    done

    echo "  Connecting to Tailscale network..."
    # Use a dedicated variable rather than clobbering bash's special HOSTNAME,
    # which child processes and hostname-reading tools may rely on.
    TS_HOSTNAME="runpod-$(hostname)"
    tailscale up --authkey="$TAILSCALE_AUTHKEY" --advertise-tags=tag:gpu --hostname="$TS_HOSTNAME" || {
        echo "  ⚠ Tailscale connection failed, continuing without VPN"
    }

    # Get Tailscale IP if connected
    TAILSCALE_IP=$(tailscale ip -4 2>/dev/null || echo "not connected")
    if [ "$TAILSCALE_IP" != "not connected" ]; then
        echo "  ✓ Tailscale connected"
        echo "    Hostname: $TS_HOSTNAME"
        echo "    IP: $TAILSCALE_IP"

        # Export for other services
        export GPU_TAILSCALE_IP="$TAILSCALE_IP"
    else
        echo "  ⚠ Tailscale failed to obtain IP"
    fi
else
    echo "  ⚠ Tailscale disabled (no TAILSCALE_AUTHKEY in .env)"
    echo "    Services requiring VPN connectivity will not work"
fi

# [5/7] Check Python virtual environments health
echo "[5/7] Checking Python virtual environments..."
PYTHON_VERSION=$(python3 --version)
echo "  System Python: $PYTHON_VERSION"

# Check if bootstrap script exists and run it
if [ -f /workspace/ai/scripts/bootstrap-venvs.sh ]; then
    echo "  Running venv health check..."
    bash /workspace/ai/scripts/bootstrap-venvs.sh
else
    echo "  ⚠ No venv bootstrap script found (optional)"
fi

# [6/7] Configure Supervisor
echo "[6/7] Configuring Supervisor process manager..."
if [ -f /workspace/ai/supervisord.conf ]; then
    # Supervisor expects config at /workspace/supervisord.conf (based on arty scripts)
    if [ ! -f /workspace/supervisord.conf ]; then
        cp /workspace/ai/supervisord.conf /workspace/supervisord.conf
        echo "  ✓ Supervisor config copied to /workspace/supervisord.conf"
    fi

    # Create logs directory if it doesn't exist
    mkdir -p /workspace/logs

    echo "  ✓ Supervisor configured"
else
    echo "  ⚠ No supervisord.conf found at /workspace/ai/supervisord.conf"
    echo "    Supervisor will not be started"
fi

# [7/7] Start Supervisor to manage services
echo "[7/7] Starting Supervisor and managed services..."
if [ -f /workspace/supervisord.conf ]; then
    # Start supervisor daemon
    supervisord -c /workspace/supervisord.conf
    echo "  ✓ Supervisor daemon started"

    # Wait a moment for services to initialize
    sleep 3

    # Display service status (guarded: status failure must not kill startup
    # under `set -e`)
    echo ""
    echo "Service Status:"
    echo "---------------"
    supervisorctl -c /workspace/supervisord.conf status || echo "  ⚠ Could not query service status"
else
    echo "  ⚠ Skipping Supervisor startup (no config file)"
fi

# Display connection information
echo ""
echo "========================================"
echo " Container Ready"
echo "========================================"
echo "Services:"
echo "  - SSH: port 22"
echo "  - ComfyUI: http://localhost:8188"
echo "  - Supervisor Web UI: http://localhost:9001"
echo "  - Model Orchestrator: http://localhost:9000"
if [ -n "${TAILSCALE_IP:-}" ] && [ "$TAILSCALE_IP" != "not connected" ]; then
    echo "  - Tailscale IP: $TAILSCALE_IP"
fi
echo ""
echo "Network Volume: /workspace"
echo "Project Directory: /workspace/ai"
echo "Logs: /workspace/logs"
echo ""
echo "To view service logs:"
echo "  supervisorctl -c /workspace/supervisord.conf tail -f <service_name>"
echo ""
echo "To manage services:"
echo "  supervisorctl -c /workspace/supervisord.conf status"
echo "  supervisorctl -c /workspace/supervisord.conf restart <service_name>"
echo "========================================"

# Keep container running
echo "Container is running. Press Ctrl+C to stop."
sleep infinity
|
||||||
Reference in New Issue
Block a user