33 Commits

Author SHA1 Message Date
f8694653d0 fix: adjust VRAM for 24K context based on actual usage
All checks were successful
Build and Push RunPod Docker Image / build-and-push (push) Successful in 13s
Based on error output, the model actually uses ~17.5GB (not the 15GB previously estimated).
- Llama: 85% VRAM for 24576 context (3GB KV cache)
- BGE: 6% VRAM (reduced to fit)
- Total: 91%

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-30 22:46:34 +01:00
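The final allocation above maps onto standard vLLM CLI options. A hedged sketch (the flag names are real vLLM options; the model id and port are placeholders, not taken from this repo's start.sh):

```shell
# 85% of a 24 GB RTX 4090 for Llama with a 24576-token context
# (~3 GB of that budget goes to the KV cache, per the commit message).
vllm serve meta-llama/Llama-3.1-8B-Instruct \
  --gpu-memory-utilization 0.85 \
  --max-model-len 24576 \
  --port 8000
```

Note that `--gpu-memory-utilization` is a fraction of total VRAM reserved by this vLLM instance (weights + KV cache + activations), which is why the two services' fractions must sum to under 1.0.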
078043e35a feat: balance Llama 24K context with concurrent BGE
Adjusted VRAM allocation for concurrent operation:
- Llama: 80% VRAM, 24576 context
- BGE: 8% VRAM
- Total: 88% of 24GB RTX 4090

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-30 22:39:08 +01:00
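The 88% figure checks out arithmetically. An illustrative budget calculation (not from the repo; the 80/8 split and 24 GB total come from the commit message, MiB rounding is assumed):

```shell
# Rough VRAM budget for the concurrent Llama + BGE split on a 24 GB card.
TOTAL_MIB=24576                        # 24 GiB RTX 4090
LLAMA_MIB=$(( TOTAL_MIB * 80 / 100 ))  # Llama: weights + 24K-context KV cache
BGE_MIB=$((  TOTAL_MIB * 8 / 100 ))    # BGE embedder
echo "llama=${LLAMA_MIB} bge=${BGE_MIB} used=$(( LLAMA_MIB + BGE_MIB ))"
# → llama=19660 bge=1966 used=21626
```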
c969d10eaf feat: increase Llama context to 32K with 95% VRAM
For larger input requirements, increased max-model-len from 20480 to 32768.
BGE remains available but cannot run concurrently at this VRAM level.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-30 22:37:11 +01:00
f68bc47915 feat: increase Llama max-model-len to 20480
Adjusted VRAM allocation for larger context window:
- Llama: 90% VRAM, 20480 context (up from 8192)
- BGE: 8% VRAM (down from 10%)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-30 22:15:08 +01:00
b2de3b17ee fix: adjust VRAM allocation for concurrent Llama+BGE
- Llama: 85% GPU, 8K context (model needs ~15GB base)
- BGE: 10% GPU (1.3GB model)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-30 20:16:00 +01:00
f668e06228 feat: add BGE embedding model for concurrent operation with Llama
- Create config_bge.yaml for BAAI/bge-large-en-v1.5 on port 8002
- Reduce Llama VRAM to 70% and context to 16K for concurrent use
- Add BGE service to supervisor with vllm group

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-30 19:55:13 +01:00
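The commit's config_bge.yaml is not shown here, but a vLLM embedding endpoint like the one it describes can be sketched as a serve invocation. Hedged sketch: `--task embed` is a real vLLM option in recent versions (older releases spell it `--task embedding`); the utilization fraction and max length are assumptions matching BGE's 512-token limit, not values from the repo:

```shell
# BAAI/bge-large-en-v1.5 served as an OpenAI-compatible embeddings API
# on port 8002, alongside the Llama instance on the same GPU.
vllm serve BAAI/bge-large-en-v1.5 \
  --task embed \
  --gpu-memory-utilization 0.10 \
  --max-model-len 512 \
  --port 8002
```

Running both under supervisor in a shared `vllm` group, as the commit does, lets them be started and stopped together.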
c32340f23c fix: start.sh
2025-11-28 11:11:52 +01:00
f3225f70d6 fix: start.sh
2025-11-28 11:01:36 +01:00
875c25d802 fix: tailscale stale dir
2025-11-27 23:37:36 +01:00
1e1832c166 fix: start.sh
2025-11-27 23:23:08 +01:00
07c6ab50e7 fix: tailscale --force-reauth
2025-11-27 23:17:23 +01:00
977b9c0f4f fix: proxy root path
2025-11-27 15:36:45 +01:00
6e57b21c78 fix: audiocraft HF_HOME env
2025-11-27 14:58:09 +01:00
9e4c3f5a9c fix: supervisor upscale
2025-11-27 12:27:50 +01:00
364bf5f46d chore: supervisor reload
2025-11-27 12:25:36 +01:00
7cde108b88 fix: upscale service ref
2025-11-27 12:05:45 +01:00
44bfa271e3 feat: upscale service
2025-11-27 12:04:19 +01:00
69bf6af419 chore: minimal refactoring
2025-11-27 11:51:21 +01:00
d625bf1c23 feat: SKIP_ARTY_SETUP env
2025-11-27 11:41:21 +01:00
44742a1fb4 feat: FLUX.2 Dev models
2025-11-27 11:35:37 +01:00
a08879625c fix: remove audiocraft autostart
2025-11-27 10:52:57 +01:00
2669ff65a9 feat: HunyuanVideo 1.5
2025-11-27 10:01:28 +01:00
400e534f17 fix: webdav-sync config
2025-11-27 09:33:31 +01:00
b9beef283d fix: remove vllm embedding
2025-11-27 01:24:05 +01:00
90fa8a073c fix: remove vllm embedding
2025-11-27 01:12:57 +01:00
4d7c811a46 fix: vllm gpu utilization 2
2025-11-27 00:57:14 +01:00
eaa8e0ebab fix: vllm gpu utilization
2025-11-27 00:50:42 +01:00
5e9aa8f25d fix: HF_HOME for vllm
2025-11-27 00:20:42 +01:00
874e9a639d fix: models linking
2025-11-26 23:51:21 +01:00
15182802ed fix: arty.yml
2025-11-26 22:03:03 +01:00
73aea85181 docs: README.md
2025-11-26 21:26:19 +01:00
5f8c843b22 Initial commit
2025-11-26 20:20:45 +01:00
5c61ac5c67 Initial commit
2025-11-26 17:15:08 +01:00