Time estimate for indexing ~25,000 READMEs under the 5,000 requests/hour rate limit:
- Requests needed: ~25,000 (1 per README + metadata)
- Cycles needed: 25,000 / 4,500 ≈ 5.56, i.e. roughly 5.5 cycles
- Time per cycle: ~44 min work + ~16 min wait = 60 min total
- Total time: 5.5 × 60 = 330 minutes (5.5 hours)
Previous timeouts were insufficient; progress at each timeout:
- 170 minutes: completed 2.9 cycles (~13,500 requests)
- 270 minutes: would complete 4.5 cycles (~20,000 requests)
- 330 minutes: allows full completion with buffer
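The estimate above can be sketched as a small model. This is only an illustration: `CycleModel`, `requestsWithin`, and `minutesNeeded` are hypothetical names, not part of the indexer, and the model assumes requests are issued at a uniform rate during each work phase and that the final partial cycle needs no trailing wait.

```typescript
// Hypothetical model of the work/wait cycle described above.
interface CycleModel {
  requestsPerCycle: number; // requests issued per work phase (4,500 in the estimate)
  workMinutes: number;      // minutes of work per cycle (~44)
  waitMinutes: number;      // minutes spent waiting for the rate limit (~16)
}

const model: CycleModel = { requestsPerCycle: 4500, workMinutes: 44, waitMinutes: 16 };

// Requests completed before a timeout, counting full cycles plus
// whatever part of the next work phase fits in the leftover time.
function requestsWithin(timeoutMinutes: number, m: CycleModel): number {
  const cycleMinutes = m.workMinutes + m.waitMinutes;
  const fullCycles = Math.floor(timeoutMinutes / cycleMinutes);
  const leftover = timeoutMinutes - fullCycles * cycleMinutes;
  const partialWork = Math.min(leftover, m.workMinutes) / m.workMinutes;
  return Math.round((fullCycles + partialWork) * m.requestsPerCycle);
}

// Minutes needed for a given number of requests; the last work phase
// is not followed by a wait.
function minutesNeeded(totalRequests: number, m: CycleModel): number {
  const cycleMinutes = m.workMinutes + m.waitMinutes;
  const fullCycles = Math.floor(totalRequests / m.requestsPerCycle);
  const remainder = totalRequests - fullCycles * m.requestsPerCycle;
  if (remainder === 0) {
    // Finished exactly at the end of a work phase: drop the final wait.
    return Math.max(0, fullCycles * cycleMinutes - m.waitMinutes);
  }
  return fullCycles * cycleMinutes + (remainder / m.requestsPerCycle) * m.workMinutes;
}
```

Under these assumptions, `requestsWithin(170, model)` gives 13,500, matching the figure for the 170-minute timeout, and `minutesNeeded(25000, model)` comes to roughly 324 minutes, in the same range as the 330-minute figure.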
Changes:
- Job timeout: 180m → 330m (5.5 hours)
- Script timeout: 170m → 320m
- Within the GitHub Actions job limit on GitHub-hosted runners (360 minutes/6 hours)
Alternative: Use 'sample' mode for faster builds if full index
is not immediately needed.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Changes for CI mode (process.env.CI === 'true'):
- Remove grace period (was 10min) to enable continuous monitoring
- Increase check frequency from 1% to 10% to catch low rate limits early
- Raise proactive threshold from 200 to 500 requests
- Increase resume threshold from 100 to 1000 requests
This avoids wasting time on small batches (e.g. 184 requests = 2 min
of work followed by a 13 min wait): work now proceeds in batches of
1,000-5,000 requests, making better use of the 170-minute timeout.
Local mode unchanged: maintains user-friendly behavior with fewer
interruptions.
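A minimal sketch of how the CI thresholds above might interact. `RateLimitPolicy`, `shouldPause`, and `shouldResume` are hypothetical names, and the comparison semantics (pause below the proactive threshold, resume at or above the resume threshold) are an assumption rather than the indexer's actual logic.

```typescript
// Hypothetical policy shape; the values come from the change list above.
interface RateLimitPolicy {
  gracePeriodMinutes: number; // time before rate-limit monitoring starts
  checkFraction: number;      // fraction of iterations that check the limit
  proactiveThreshold: number; // pause when remaining drops below this
  resumeThreshold: number;    // resume only once remaining reaches this
}

const previousCiPolicy: RateLimitPolicy = {
  gracePeriodMinutes: 10,
  checkFraction: 0.01,
  proactiveThreshold: 200,
  resumeThreshold: 100,
};

const ciPolicy: RateLimitPolicy = {
  gracePeriodMinutes: 0,
  checkFraction: 0.1,
  proactiveThreshold: 500,
  resumeThreshold: 1000,
};

// Assumed semantics: stop issuing requests once the budget runs low.
function shouldPause(remaining: number, policy: RateLimitPolicy): boolean {
  return remaining < policy.proactiveThreshold;
}

// Assumed semantics: keep waiting until a worthwhile batch is available.
function shouldResume(remaining: number, policy: RateLimitPolicy): boolean {
  return remaining >= policy.resumeThreshold;
}
```

With the old values, 184 remaining requests would satisfy `shouldResume`, producing the small batch described above; with the new values the indexer would keep waiting until at least 1000 requests are available.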
Co-Authored-By: Claude <noreply@anthropic.com>
Use MIN_REMAINING_TO_CONTINUE = 100 in CI environments to allow
incremental progress within the 170-minute timeout constraint, while
maintaining 4500 locally for better user experience with fewer
interruptions during indexing.
This fixes the timeout issue in which waiting for a nearly full
rate-limit reset (4500/5000) took ~58 minutes per cycle, so builds
exceeded the 170-minute timeout after just 3 cycles.
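As a rough illustration of the selection described above: the two threshold values are from this change, while the helper names are hypothetical and the CI flag is passed in explicitly (in the real code it would presumably come from `process.env.CI === 'true'`).

```typescript
// Threshold values from this change; function names are illustrative only.
function minRemainingToContinue(isCI: boolean): number {
  return isCI ? 100 : 4500;
}

// Assumed check: continue only if enough of the rate limit remains.
function canContinue(remaining: number, isCI: boolean): boolean {
  return remaining >= minRemainingToContinue(isCI);
}

// Total minutes consumed by a number of cycles of a given length.
function minutesForCycles(cycles: number, minutesPerCycle: number): number {
  return cycles * minutesPerCycle;
}
```

For example, `minutesForCycles(3, 58)` is 174, which is over the 170-minute timeout, while `canContinue(500, true)` is true and `canContinue(500, false)` is false.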
Co-Authored-By: Claude <noreply@anthropic.com>
- Initialize database inside indexer process to ensure connection exists
- Configure GitHub token in same process as indexer
- Make indexer throw errors instead of returning early so CI can detect failures
- Remove duplicate token configuration step
- Pass GITHUB_TOKEN as environment variable to build step
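The throw-instead-of-return change can be sketched roughly as follows. All names here (`IndexerDeps`, `runIndexer`, `exitCodeFor`) are hypothetical, and the actual indexing work is omitted; the point is only that a thrown error can be turned into a non-zero exit code, whereas an early return looks like success to CI.

```typescript
// Hypothetical dependencies, injected so the sketch stays self-contained.
interface Database {
  close(): void;
}

interface IndexerDeps {
  githubToken?: string;         // would come from the GITHUB_TOKEN env variable
  openDatabase: () => Database; // opened inside the indexer process
}

function runIndexer(deps: IndexerDeps): void {
  if (!deps.githubToken) {
    // Throwing (rather than returning early) makes the failure visible to the caller.
    throw new Error("GITHUB_TOKEN is not set");
  }
  const db = deps.openDatabase();
  try {
    // ... indexing work would go here ...
  } finally {
    db.close();
  }
}

// Maps success or failure of a run to a process-style exit code.
function exitCodeFor(run: () => void): number {
  try {
    run();
    return 0;
  } catch {
    return 1;
  }
}
```

A caller could pass the result of `exitCodeFor(() => runIndexer(deps))` to the process exit code so that the workflow step fails when the token is missing.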
Co-Authored-By: Claude <noreply@anthropic.com>
This function is required by the GitHub Actions workflow for
gathering database statistics after the build completes.
Co-Authored-By: Claude <noreply@anthropic.com>