# AI Infrastructure
This directory contains AI-related configurations for the VPS deployment.
## Multi-Modal GPU Infrastructure (Migrated)
The multi-modal AI orchestration stack (text, image, music generation) has been moved to a dedicated repository:
Repository: https://dev.pivoine.art/valknar/runpod
The RunPod repository contains:
- Model orchestrator for automatic switching between text, image, and music models
- vLLM + Qwen 2.5 7B (text generation)
- Flux.1 Schnell (image generation)
- MusicGen Medium (music generation)
- RunPod template creation scripts
- Complete deployment documentation
This separation allows for independent management of:
- VPS Services (this repo): Open WebUI, Crawl4AI, AI database
- GPU Services (runpod repo): Model inference, orchestration, RunPod templates
## VPS AI Services (`ai/compose.yaml`)
This compose stack manages the VPS-side AI infrastructure that integrates with the GPU server:
### Services
#### ai_postgres
Dedicated PostgreSQL 16 instance with pgvector extension for AI workloads:
- Vector similarity search support
- Isolated from core database for performance
- Used by Open WebUI for RAG and embeddings
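A quick sanity check of the extension, as a sketch: the container name matches this stack, but the role, database, table, and embedding values below are placeholders, not values taken from the compose file:

```bash
# Confirm pgvector is installed (role/database names are assumptions)
docker exec ai_postgres psql -U webui -d webui \
  -c "CREATE EXTENSION IF NOT EXISTS vector;"

# Sketch of a similarity search: order rows by L2 distance (<->) to a
# query embedding. The table and column names here are hypothetical.
docker exec ai_postgres psql -U webui -d webui -c "
  SELECT id
  FROM document_chunk
  ORDER BY embedding <-> '[0.1, 0.2, 0.3]'
  LIMIT 5;"
```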
#### webui (Open WebUI)
ChatGPT-like interface exposed at `ai.pivoine.art:8080`:
- Claude API integration via Anthropic
- RAG support with document upload
- Vector storage via pgvector
- Web search capability
- SMTP email via IONOS
- User signup enabled
#### crawl4ai
Internal web scraping service for LLM content preparation:
- API on port 11235 (not exposed publicly)
- Optimized for AI/RAG workflows
- Integration with Open WebUI and n8n
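A minimal request sketch, assuming Crawl4AI's documented Docker API with a `/crawl` endpoint; since the port is not published externally, run it from a container on the same Docker network (the `crawl4ai` service hostname is an assumption):

```bash
# Hypothetical request against Crawl4AI's HTTP API; verify the endpoint
# and payload shape against the version you deploy.
curl -s http://crawl4ai:11235/crawl \
  -H "Content-Type: application/json" \
  -d '{"urls": ["https://example.com"]}'
```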
## Integration with GPU Server
The VPS AI services connect to the GPU server via Tailscale VPN:
- VPS Tailscale IP: `100.102.217.79`
- GPU Tailscale IP: `100.100.108.13`
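Before debugging anything at the application layer, it is worth confirming the VPN path itself (run on the VPS):

```bash
# Verify the GPU server is reachable over Tailscale
tailscale status | grep 100.100.108.13
tailscale ping 100.100.108.13
```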
LiteLLM Proxy (port 4000 on VPS) routes requests:
- Claude API for chat completions
- GPU orchestrator for self-hosted models (text, image, music)
See `../litellm-config.yaml` for routing configuration.
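Since LiteLLM exposes an OpenAI-compatible API, a request against the proxy looks roughly like this; the model alias and API key below are placeholders, and the real aliases live in `../litellm-config.yaml`:

```bash
# Chat completion via the LiteLLM proxy (model name is a placeholder)
curl -s http://localhost:4000/v1/chat/completions \
  -H "Authorization: Bearer $LITELLM_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet",
    "messages": [{"role": "user", "content": "Hello"}]
  }'
```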
## Environment Variables
Required in `.env`:
```bash
# AI Database
AI_DB_PASSWORD=<password>
# Open WebUI
AI_WEBUI_SECRET_KEY=<secret>
# Claude API
ANTHROPIC_API_KEY=<api_key>
# Email (IONOS SMTP)
ADMIN_EMAIL=<email>
SMTP_HOST=smtp.ionos.com
SMTP_PORT=587
SMTP_USER=<smtp_user>
SMTP_PASSWORD=<smtp_password>
```
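One way to generate the secrets (any sufficiently random values will do):

```bash
# Generate random values for the secrets above
openssl rand -hex 32   # AI_WEBUI_SECRET_KEY
openssl rand -hex 24   # AI_DB_PASSWORD
```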
## Backup Configuration
AI services are backed up daily via Restic:
- `ai_postgres_data`: 3 AM (7 daily, 4 weekly, 6 monthly, 2 yearly)
- `ai_webui_data`: 3 AM (same retention)
- `ai_crawl4ai_data`: 3 AM (same retention)

Repository: `/mnt/hidrive/users/valknar/Backup`
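To inspect what Restic has captured, something like the following works; the `--tag` value is an assumption that the backup jobs tag snapshots by volume name:

```bash
# List snapshots for one AI volume (tag naming is an assumption)
restic -r /mnt/hidrive/users/valknar/Backup snapshots --tag ai_postgres_data
```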
## Management Commands
```bash
# Start AI stack
pnpm arty up ai_postgres webui crawl4ai
# View logs
docker logs -f ai_webui
docker logs -f ai_postgres
docker logs -f ai_crawl4ai
# Check Open WebUI
curl http://ai.pivoine.art:8080/health
# Restart AI services
pnpm arty restart ai_postgres webui crawl4ai
```
## GPU Server Management
For GPU server operations (model orchestration, template creation, etc.):
```bash
# Clone the dedicated repository
git clone ssh://git@dev.pivoine.art:2222/valknar/runpod.git
# See runpod repository for:
# - Model orchestration setup
# - RunPod template creation
# - GPU deployment guides
```
## Documentation
### VPS AI Services
- `GPU_DEPLOYMENT_LOG.md` - VPS AI deployment history
### GPU Server (Separate Repository)
- `runpod/README.md` - Main GPU documentation
- `runpod/DEPLOYMENT.md` - Deployment guide
- `runpod/RUNPOD_TEMPLATE.md` - Template creation
## Architecture Overview
```
┌─────────────────────────────────────────────────────────────────┐
│ VPS (Tailscale: 100.102.217.79) │
│ ┌───────────────────────────────────────────────────────────┐ │
│ │ LiteLLM Proxy (Port 4000) │ │
│ │ Routes to: Claude API + GPU Orchestrator │ │
│ └───────┬───────────────────────────────────────────────────┘ │
│ │ │
│ ┌───────▼─────────┐ ┌──────────────┐ ┌─────────────────┐ │
│ │ Open WebUI │ │ Crawl4AI │ │ AI PostgreSQL │ │
│ │ Port: 8080 │ │ Port: 11235 │ │ + pgvector │ │
│ └─────────────────┘ └──────────────┘ └─────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
│ Tailscale VPN
┌──────────────────────────────┼──────────────────────────────────┐
│ RunPod GPU Server (Tailscale: 100.100.108.13) │
│ ┌───────────────────────────▼──────────────────────────────┐ │
│ │ Orchestrator (Port 9000) │ │
│ │ Manages sequential model loading │ │
│ └─────┬──────────────┬──────────────────┬──────────────────┘ │
│ │ │ │ │
│ ┌─────▼──────┐ ┌────▼────────┐ ┌──────▼───────┐ │
│ │vLLM │ │Flux.1 │ │MusicGen │ │
│ │Qwen 2.5 7B │ │Schnell │ │Medium │ │
│ │Port: 8001 │ │Port: 8002 │ │Port: 8003 │ │
│ └────────────┘ └─────────────┘ └──────────────┘ │
└─────────────────────────────────────────────────────────────────┘
```
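A quick end-to-end check of this topology from the VPS; the `/health` route on the orchestrator is an assumption, so consult the runpod repository for its actual API:

```bash
# Reach the GPU orchestrator through the Tailscale VPN
# (the /health endpoint is assumed, not confirmed)
curl -s http://100.100.108.13:9000/health
```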
## Support
For issues:
- VPS AI services: check logs via `docker logs`
- GPU server: see the runpod repository documentation
- LiteLLM routing: review `../litellm-config.yaml`