diff --git a/ai/README.md b/ai/README.md
deleted file mode 100644
index 4f386a9..0000000
--- a/ai/README.md
+++ /dev/null
@@ -1,170 +0,0 @@

# AI Infrastructure

This directory contains AI-related configurations for the VPS deployment.

## Multi-Modal GPU Infrastructure (Migrated)

**The multi-modal AI orchestration stack (text, image, and music generation) has been moved to a dedicated repository:**

**Repository**: https://dev.pivoine.art/valknar/runpod

The RunPod repository contains:
- A model orchestrator that switches automatically between text, image, and music models
- vLLM + Qwen 2.5 7B (text generation)
- Flux.1 Schnell (image generation)
- MusicGen Medium (music generation)
- RunPod template creation scripts
- Complete deployment documentation

This separation allows independent management of:
- **VPS services** (this repo): Open WebUI, Crawl4AI, AI database
- **GPU services** (runpod repo): model inference, orchestration, RunPod templates

## VPS AI Services (ai/compose.yaml)

This compose stack manages the VPS-side AI infrastructure that integrates with the GPU server.

### Services

#### ai_postgres
A dedicated PostgreSQL 16 instance with the pgvector extension for AI workloads:
- Vector similarity search support
- Isolated from the core database for performance
- Used by Open WebUI for RAG and embeddings

#### webui (Open WebUI)
A ChatGPT-like interface exposed at `ai.pivoine.art:8080`:
- Claude API integration via Anthropic
- RAG support with document upload
- Vector storage via pgvector
- Web search capability
- SMTP email via IONOS
- User signup enabled

#### crawl4ai
An internal web-scraping service that prepares content for LLMs:
- API on port 11235 (not exposed publicly)
- Optimized for AI/RAG workflows
- Integrates with Open WebUI and n8n

## Integration with GPU Server

The VPS AI services connect to the GPU server over a Tailscale VPN:
- **VPS Tailscale IP**: 100.102.217.79
- **GPU Tailscale IP**: 100.100.108.13
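With both hosts on the tailnet, the link can be sanity-checked from the VPS. This is a minimal sketch: it assumes the `tailscale` CLI is installed, and the orchestrator's `/health` path is an assumption — check the runpod repository for the endpoints it actually serves.

```shell
#!/bin/sh
# Tailscale addresses from the list above
VPS_IP="100.102.217.79"
GPU_IP="100.100.108.13"
ORCH_URL="http://${GPU_IP}:9000"

# Ping the GPU server over the tailnet (skipped if the CLI is absent)
if command -v tailscale >/dev/null 2>&1; then
  tailscale ping -c 3 "$GPU_IP"
fi

# Probe the orchestrator port; /health is an assumed path,
# adjust to whatever the orchestrator in the runpod repo exposes
curl -fsS --max-time 5 "${ORCH_URL}/health" \
  || echo "orchestrator unreachable from this host"
```

If the ping succeeds but the probe fails, the VPN is up and the problem is on the orchestrator side.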
**LiteLLM Proxy** (port 4000 on the VPS) routes requests to:
- The Claude API for chat completions
- The GPU orchestrator for self-hosted models (text, image, music)

See `../litellm-config.yaml` for the routing configuration.

## Environment Variables

Required in `.env`:

```bash
# AI Database
AI_DB_PASSWORD=

# Open WebUI
AI_WEBUI_SECRET_KEY=

# Claude API
ANTHROPIC_API_KEY=

# Email (IONOS SMTP)
ADMIN_EMAIL=
SMTP_HOST=smtp.ionos.com
SMTP_PORT=587
SMTP_USER=
SMTP_PASSWORD=
```

## Backup Configuration

AI services are backed up daily via Restic:
- **ai_postgres_data**: 3 AM (7 daily, 4 weekly, 6 monthly, 2 yearly snapshots)
- **ai_webui_data**: 3 AM (same retention)
- **ai_crawl4ai_data**: 3 AM (same retention)

Repository: `/mnt/hidrive/users/valknar/Backup`

## Management Commands

```bash
# Start the AI stack
pnpm arty up ai_postgres webui crawl4ai

# View logs
docker logs -f ai_webui
docker logs -f ai_postgres
docker logs -f ai_crawl4ai

# Check Open WebUI
curl http://ai.pivoine.art:8080/health

# Restart AI services
pnpm arty restart ai_postgres webui crawl4ai
```

## GPU Server Management

For GPU server operations (model orchestration, template creation, etc.):

```bash
# Clone the dedicated repository
git clone ssh://git@dev.pivoine.art:2222/valknar/runpod.git

# See the runpod repository for:
# - Model orchestration setup
# - RunPod template creation
# - GPU deployment guides
```

## Documentation

### VPS AI Services
- [GPU_DEPLOYMENT_LOG.md](GPU_DEPLOYMENT_LOG.md) - VPS AI deployment history

### GPU Server (Separate Repository)
- [runpod/README.md](https://dev.pivoine.art/valknar/runpod) - Main GPU documentation
- [runpod/DEPLOYMENT.md](https://dev.pivoine.art/valknar/runpod) - Deployment guide
- [runpod/RUNPOD_TEMPLATE.md](https://dev.pivoine.art/valknar/runpod) - Template creation

## Architecture Overview

```
┌─────────────────────────────────────────────────────────────────┐
│ VPS (Tailscale: 100.102.217.79)                                 │
│ ┌───────────────────────────────────────────────────────────┐   │
│ │ LiteLLM Proxy (Port 4000)                                 │   │
│ │ Routes to: Claude API + GPU Orchestrator                  │   │
│ └───────┬───────────────────────────────────────────────────┘   │
│         │                                                       │
│ ┌───────▼─────────┐ ┌──────────────┐ ┌─────────────────┐        │
│ │ Open WebUI      │ │ Crawl4AI     │ │ AI PostgreSQL   │        │
│ │ Port: 8080      │ │ Port: 11235  │ │ + pgvector      │        │
│ └─────────────────┘ └──────────────┘ └─────────────────┘        │
└─────────────────────────────────────────────────────────────────┘
                               │ Tailscale VPN
┌──────────────────────────────┼──────────────────────────────────┐
│ RunPod GPU Server (Tailscale: 100.100.108.13)                   │
│ ┌────────────────────────────▼─────────────────────────────┐    │
│ │ Orchestrator (Port 9000)                                 │    │
│ │ Manages sequential model loading                         │    │
│ └─────┬──────────────┬──────────────────┬──────────────────┘    │
│       │              │                  │                       │
│ ┌─────▼──────┐ ┌─────▼───────┐ ┌────────▼─────┐                 │
│ │ vLLM       │ │ Flux.1      │ │ MusicGen     │                 │
│ │ Qwen 2.5 7B│ │ Schnell     │ │ Medium       │                 │
│ │ Port: 8001 │ │ Port: 8002  │ │ Port: 8003   │                 │
│ └────────────┘ └─────────────┘ └──────────────┘                 │
└─────────────────────────────────────────────────────────────────┘
```

## Support

For issues:
- **VPS AI services**: check logs via `docker logs`
- **GPU server**: see the runpod repository documentation
- **LiteLLM routing**: review `../litellm-config.yaml`
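For reference, a LiteLLM routing entry pairing the Claude API with the GPU orchestrator could look like the sketch below. The model names, the Claude model version, and the assumption that the orchestrator exposes an OpenAI-compatible `/v1` endpoint are all illustrative; the authoritative configuration lives in `../litellm-config.yaml`.

```yaml
# Sketch of a litellm-config.yaml model_list (names are assumptions)
model_list:
  - model_name: claude-sonnet           # served via the Anthropic API
    litellm_params:
      model: anthropic/claude-3-5-sonnet-20241022
      api_key: os.environ/ANTHROPIC_API_KEY
  - model_name: qwen-2.5-7b             # served by the GPU orchestrator
    litellm_params:
      model: openai/qwen-2.5-7b
      api_base: http://100.100.108.13:9000/v1   # GPU Tailscale IP
      api_key: none
```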
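Similarly, the `ai_postgres` service described above might be declared roughly as follows. This is a sketch only: the image tag, database name, and volume layout are assumptions, and the actual definition is in `ai/compose.yaml`.

```yaml
# Sketch of the ai_postgres service (details are assumptions)
services:
  ai_postgres:
    image: pgvector/pgvector:pg16       # PostgreSQL 16 with pgvector baked in
    environment:
      POSTGRES_DB: webui                # hypothetical database name
      POSTGRES_PASSWORD: ${AI_DB_PASSWORD}
    volumes:
      - ai_postgres_data:/var/lib/postgresql/data

volumes:
  ai_postgres_data:                     # backed up daily via Restic (see above)
```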