Commit Graph

12 Commits

SHA1 Message Date
5af3eeb333 feat: add BGE embedding service and reorganize supervisor groups
- Add vLLM embedding server for BAAI/bge-large-en-v1.5 (port 8002)
- Reorganize supervisor into two logical groups:
  - comfyui-services: comfyui, webdav-sync
  - vllm-services: vllm-qwen, vllm-llama, vllm-embedding
- Update arty.yml service management scripts for new group structure
- Add individual service control scripts for all vLLM models

Note: the embedding server currently uses a placeholder implementation.
For production use, switch to sentence-transformers or native vLLM embedding mode.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-25 06:32:01 +01:00
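A minimal supervisord sketch of the two-group layout listed in the commit above; only the group and program names come from the commit, while the `[program:...]` section and its command are illustrative assumptions:

```ini
; Sketch of the two logical groups described in this commit.
[group:comfyui-services]
programs=comfyui,webdav-sync

[group:vllm-services]
programs=vllm-qwen,vllm-llama,vllm-embedding

[program:vllm-embedding]
; Hypothetical command; the actual server script and flags are not in the log.
command=python vllm/embedding_server.py --port 8002
autorestart=true
```

Grouping turns whole-stack operations into one command, e.g. `supervisorctl restart vllm-services:*`.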
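For the production path the note suggests, here is a minimal sketch using sentence-transformers behind an OpenAI-style endpoint; the FastAPI wrapper and request shape are assumptions, and only the model name and port 8002 come from the commit:

```python
# Minimal sketch: serve BAAI/bge-large-en-v1.5 via sentence-transformers.
# The endpoint path and request/response shapes are illustrative assumptions.
from fastapi import FastAPI
from pydantic import BaseModel
from sentence_transformers import SentenceTransformer

app = FastAPI()
model = SentenceTransformer("BAAI/bge-large-en-v1.5")

class EmbedRequest(BaseModel):
    input: list[str]

@app.post("/v1/embeddings")
def embed(req: EmbedRequest):
    # encode() returns a numpy array; normalizing matches BGE's
    # recommended cosine-similarity usage.
    vectors = model.encode(req.input, normalize_embeddings=True)
    return {"data": [{"index": i, "embedding": v.tolist()}
                     for i, v in enumerate(vectors)]}
```

Run with e.g. `uvicorn embedding_server:app --port 8002` to keep the port the commit assigns.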
1ad99cdb53 refactor: replace orchestrator with dedicated vLLM servers for Qwen and Llama 2025-11-23 16:00:03 +01:00
cc0f55df38 fix: reduce max_model_len to 20000 to fit in 24GB VRAM 2025-11-23 15:43:37 +01:00
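Rough arithmetic behind this reduction, assuming Qwen2.5-7B-Instruct's published attention shape (28 layers, 4 KV heads, head dim 128) and an fp16 cache; this is a back-of-envelope sketch, not vLLM's exact memory accounting:

```python
# Approximate KV-cache footprint per context length for Qwen2.5-7B (fp16).
layers, kv_heads, head_dim, bytes_per_val = 28, 4, 128, 2
per_token = 2 * layers * kv_heads * head_dim * bytes_per_val  # K and V planes
for ctx in (32768, 20000):
    print(f"{ctx:>6} tokens -> {ctx * per_token / 2**30:.2f} GiB KV cache")
# ~1.75 GiB at 32768 vs ~1.07 GiB at 20000, on top of roughly 15 GB of
# fp16 weights -- the shorter window leaves the 24 GB card some headroom.
```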
5cfd03f1ef fix: improve streaming with proper delta format and increase max_model_len to 32768 2025-11-23 15:38:18 +01:00
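The "proper delta format" refers to the OpenAI chat-completions streaming protocol: server-sent events whose chunks carry incremental `delta` fields rather than full messages. The shape of one chunk, with illustrative values:

```python
# One streaming chunk, serialized on the wire as "data: <json>\n\n";
# the stream ends with a literal "data: [DONE]\n\n" sentinel.
chunk = {
    "id": "chatcmpl-abc123",            # illustrative id
    "object": "chat.completion.chunk",
    "created": 1732373898,              # illustrative timestamp
    "model": "Qwen/Qwen2.5-7B-Instruct",
    "choices": [
        {"index": 0, "delta": {"content": "Hello"}, "finish_reason": None},
    ],
}
```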
fdd724298a fix: increase max_tokens limit from 4096 to 32768 for LLMX CLI support 2025-11-23 15:10:06 +01:00
a8c2ee1b90 fix: make model name and port configurable via environment variables 2025-11-23 13:45:01 +01:00
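A minimal sketch of the pattern this commit describes; the variable names and defaults are assumptions, since the log only says the model name and port became environment-configurable:

```python
import os

# Hypothetical variable names and defaults.
MODEL_NAME = os.environ.get("MODEL_NAME", "Qwen/Qwen2.5-7B-Instruct")
PORT = int(os.environ.get("PORT", "8000"))
```

This lets one server script back both vllm-qwen and vllm-llama, each launched with its own MODEL_NAME and PORT.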
16112e50f6 fix: relax dependency version constraints for vllm compatibility 2025-11-23 13:33:46 +01:00
e0a43259d4 fix: update pydantic version constraint to match vllm requirements 2025-11-23 13:33:22 +01:00
b94df17845 feat: add requirements.txt for vLLM models
🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-23 13:25:03 +01:00
897dcb175a refactor: reorganize directory structure and remove hardcoded paths
Move comfyui and vllm out of the models/ directory to the top level for
better organization. Replace all hardcoded /workspace paths with relative
paths to make the configuration portable across different environments.

Changes:
- Move models/comfyui/ → comfyui/
- Move models/vllm/ → vllm/
- Remove models/ directory (empty)
- Update arty.yml: replace /workspace with environment variables
- Update supervisord.conf: use relative paths from /workspace/ai
- Update all script references to use new paths
- Maintain TQDM_DISABLE=1 to fix BrokenPipeError

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-22 20:49:27 +01:00
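A minimal sketch of what the path change can look like in Python; the AI_ROOT variable name and its fallback are illustrative assumptions:

```python
import os
from pathlib import Path

# Resolve the root from the environment, falling back to this file's
# directory, so the same config also works outside /workspace/ai.
ROOT = Path(os.environ.get("AI_ROOT", Path(__file__).resolve().parent))
COMFYUI_DIR = ROOT / "comfyui"  # previously hardcoded under /workspace
VLLM_DIR = ROOT / "vllm"
```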
9a637cc4fc refactor: clean Docker files and restore standalone model services
- Remove all Docker-related files (Dockerfiles, compose.yaml)
- Remove documentation files (README, ARCHITECTURE, docs/)
- Remove old core/ directory (base_service, service_manager)
- Update models.yaml with correct service_script paths (models/*/server.py)
- Simplify vLLM requirements.txt to let vLLM manage dependencies
- Restore original standalone vLLM server (no base_service dependency)
- Remove obsolete vllm/, musicgen/, flux/ directories

Process-based architecture is now fully functional on RunPod.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-21 16:17:38 +01:00
277f1c95bd Initial commit: RunPod multi-modal AI orchestration stack
- Multi-modal AI infrastructure for RunPod RTX 4090
- Automatic model orchestration (text, image, music)
- Text: vLLM + Qwen 2.5 7B Instruct
- Image: Flux.1 Schnell via OpenEDAI
- Music: MusicGen Medium via AudioCraft
- Cost-optimized sequential loading on single GPU
- Template preparation scripts for rapid deployment
- Comprehensive documentation (README, DEPLOYMENT, TEMPLATE)
2025-11-21 14:34:55 +01:00
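Cost-optimized sequential loading means at most one model occupies the single RTX 4090 at a time; a minimal sketch of that idea follows (class and names are assumptions, not the stack's actual orchestrator):

```python
import gc
import torch

class SequentialLoader:
    """Keep at most one model resident on the GPU at a time."""

    def __init__(self, factories):
        self.factories = factories  # name -> zero-arg callable building a model
        self.name = None
        self.model = None

    def get(self, name):
        if name != self.name:
            # Release the previous model and reclaim VRAM before switching.
            self.model = None
            gc.collect()
            torch.cuda.empty_cache()
            self.model = self.factories[name]()
            self.name = name
        return self.model
```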