# AI Infrastructure

This directory contains AI-related configurations for the VPS deployment.

## Multi-Modal GPU Infrastructure (Migrated)

**The multi-modal AI orchestration stack (text, image, music generation) has been moved to a dedicated repository:**

**Repository**: https://dev.pivoine.art/valknar/runpod

The RunPod repository contains:

- Model orchestrator for automatic switching between text, image, and music models
- vLLM + Qwen 2.5 7B (text generation)
- Flux.1 Schnell (image generation)
- MusicGen Medium (music generation)
- RunPod template creation scripts
- Complete deployment documentation

This separation allows for independent management of:

- **VPS Services** (this repo): Open WebUI, Crawl4AI, AI database
- **GPU Services** (runpod repo): Model inference, orchestration, RunPod templates

## VPS AI Services (ai/compose.yaml)

This compose stack manages the VPS-side AI infrastructure that integrates with the GPU server.

### Services

#### ai_postgres

Dedicated PostgreSQL 16 instance with the pgvector extension for AI workloads:

- Vector similarity search support
- Isolated from the core database for performance
- Used by Open WebUI for RAG and embeddings

#### webui (Open WebUI)

ChatGPT-like interface exposed at `ai.pivoine.art:8080`:

- Claude API integration via Anthropic
- RAG support with document upload
- Vector storage via pgvector
- Web search capability
- SMTP email via IONOS
- User signup enabled

#### crawl4ai

Internal web scraping service for LLM content preparation:

- API on port 11235 (not exposed publicly)
- Optimized for AI/RAG workflows
- Integration with Open WebUI and n8n

## Integration with GPU Server

The VPS AI services connect to the GPU server via Tailscale VPN:

- **VPS Tailscale IP**: 100.102.217.79
- **GPU Tailscale IP**: 100.100.108.13

**LiteLLM Proxy** (port 4000 on VPS) routes requests:

- Claude API for chat completions
- GPU orchestrator for self-hosted models (text, image, music)

See `../litellm-config.yaml` for routing configuration.
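As a quick smoke test, the proxy speaks the OpenAI-compatible API, so requests like the following should work from the VPS. The model alias (`claude-3-5-sonnet` here) and the `LITELLM_MASTER_KEY` variable are assumptions for illustration; use whatever is defined in `../litellm-config.yaml` and your `.env`:

```bash
# Minimal sketch: query the LiteLLM proxy from the VPS.
# Assumes a model alias "claude-3-5-sonnet" in ../litellm-config.yaml
# and a LITELLM_MASTER_KEY set in .env — adjust both to your config.

# List the models LiteLLM currently routes
curl -s http://localhost:4000/v1/models \
  -H "Authorization: Bearer $LITELLM_MASTER_KEY"

# Send a chat completion through the proxy
curl -s http://localhost:4000/v1/chat/completions \
  -H "Authorization: Bearer $LITELLM_MASTER_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-3-5-sonnet",
    "messages": [{"role": "user", "content": "Hello from the VPS"}]
  }'
```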
## Environment Variables

Required in `.env`:

```bash
# AI Database
AI_DB_PASSWORD=

# Open WebUI
AI_WEBUI_SECRET_KEY=

# Claude API
ANTHROPIC_API_KEY=

# Email (IONOS SMTP)
ADMIN_EMAIL=
SMTP_HOST=smtp.ionos.com
SMTP_PORT=587
SMTP_USER=
SMTP_PASSWORD=
```

## Backup Configuration

AI services are backed up daily via Restic:

- **ai_postgres_data**: 3 AM (7 daily, 4 weekly, 6 monthly, 2 yearly)
- **ai_webui_data**: 3 AM (same retention)
- **ai_crawl4ai_data**: 3 AM (same retention)

Repository: `/mnt/hidrive/users/valknar/Backup`

## Management Commands

```bash
# Start AI stack
pnpm arty up ai_postgres webui crawl4ai

# View logs
docker logs -f ai_webui
docker logs -f ai_postgres
docker logs -f ai_crawl4ai

# Check Open WebUI
curl http://ai.pivoine.art:8080/health

# Restart AI services
pnpm arty restart ai_postgres webui crawl4ai
```

## GPU Server Management

For GPU server operations (model orchestration, template creation, etc.):

```bash
# Clone the dedicated repository
git clone ssh://git@dev.pivoine.art:2222/valknar/runpod.git

# See runpod repository for:
# - Model orchestration setup
# - RunPod template creation
# - GPU deployment guides
```

## Documentation

### VPS AI Services

- [GPU_DEPLOYMENT_LOG.md](GPU_DEPLOYMENT_LOG.md) - VPS AI deployment history

### GPU Server (Separate Repository)

- [runpod/README.md](https://dev.pivoine.art/valknar/runpod) - Main GPU documentation
- [runpod/DEPLOYMENT.md](https://dev.pivoine.art/valknar/runpod) - Deployment guide
- [runpod/RUNPOD_TEMPLATE.md](https://dev.pivoine.art/valknar/runpod) - Template creation

## Architecture Overview

```
┌─────────────────────────────────────────────────────────────────┐
│                 VPS (Tailscale: 100.102.217.79)                 │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │                 LiteLLM Proxy (Port 4000)                 │  │
│  │         Routes to: Claude API + GPU Orchestrator          │  │
│  └───────┬───────────────────────────────────────────────────┘  │
│          │                                                      │
│  ┌───────▼─────────┐  ┌──────────────┐  ┌─────────────────┐     │
│  │   Open WebUI    │  │   Crawl4AI   │  │  AI PostgreSQL  │     │
│  │   Port: 8080    │  │ Port: 11235  │  │   + pgvector    │     │
│  └─────────────────┘  └──────────────┘  └─────────────────┘     │
└─────────────────────────────────────────────────────────────────┘
                               │ Tailscale VPN
┌──────────────────────────────┼──────────────────────────────────┐
│          RunPod GPU Server (Tailscale: 100.100.108.13)          │
│  ┌───────────────────────────▼──────────────────────────────┐   │
│  │                 Orchestrator (Port 9000)                  │   │
│  │             Manages sequential model loading              │   │
│  └─────┬──────────────┬──────────────────┬───────────────────┘   │
│        │              │                  │                      │
│  ┌─────▼──────┐  ┌────▼────────┐  ┌──────▼───────┐              │
│  │vLLM        │  │Flux.1       │  │MusicGen      │              │
│  │Qwen 2.5 7B │  │Schnell      │  │Medium        │              │
│  │Port: 8001  │  │Port: 8002   │  │Port: 8003    │              │
│  └────────────┘  └─────────────┘  └──────────────┘              │
└─────────────────────────────────────────────────────────────────┘
```

## Support

For issues:

- **VPS AI services**: Check logs via `docker logs`
- **GPU server**: See runpod repository documentation
- **LiteLLM routing**: Review `../litellm-config.yaml`
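When an issue spans both hosts, it can help to verify each layer from the VPS outward. The sketch below is a minimal debugging pass; the orchestrator's `/health` path and the psql user/database names are assumptions (see the runpod repository and `ai/compose.yaml` for the real values):

```bash
# Minimal debugging sketch, run from the VPS. The /health route on the
# orchestrator is an assumption — check the runpod repo for the real path.

# 1. Is the GPU server reachable over Tailscale?
tailscale ping 100.100.108.13

# 2. Does the orchestrator answer on port 9000? (hypothetical /health path)
curl -s http://100.100.108.13:9000/health

# 3. Is pgvector actually enabled in the AI database?
#    (user/database names are assumptions — match them to ai/compose.yaml)
docker exec ai_postgres psql -U postgres -d postgres \
  -c "SELECT extname, extversion FROM pg_extension WHERE extname = 'vector';"
```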