b2de3b17ee
fix: adjust VRAM allocation for concurrent Llama+BGE
Build and Push RunPod Docker Image / build-and-push (push) Successful in 13s
- Llama: 85% GPU memory utilization, 8K context (model weights need ~15 GB at baseline)
- BGE: 10% GPU memory utilization (1.3 GB model)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-30 20:16:00 +01:00
f668e06228
feat: add BGE embedding model for concurrent operation with Llama
Build and Push RunPod Docker Image / build-and-push (push) Successful in 36s
- Create config_bge.yaml for BAAI/bge-large-en-v1.5 on port 8002
- Reduce Llama VRAM to 70% and context to 16K for concurrent use
- Add BGE service to supervisord under the vllm group
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-30 19:55:13 +01:00
b9beef283d
fix: remove vllm embedding
Build and Push RunPod Docker Image / build-and-push (push) Successful in 14s
2025-11-27 01:24:05 +01:00
90fa8a073c
fix: remove vllm embedding
Build and Push RunPod Docker Image / build-and-push (push) Successful in 36s
2025-11-27 01:12:57 +01:00
4d7c811a46
fix: vllm gpu utilization 2
Build and Push RunPod Docker Image / build-and-push (push) Successful in 14s
2025-11-27 00:57:14 +01:00
eaa8e0ebab
fix: vllm gpu utilization
Build and Push RunPod Docker Image / build-and-push (push) Successful in 14s
2025-11-27 00:50:42 +01:00
5c61ac5c67
Initial commit
Build and Push RunPod Docker Image / build-and-push (push) Successful in 1m28s
2025-11-26 17:15:08 +01:00