feat: implement incremental indexing and remove proactive rate limit
Major performance improvements for CI builds:

1. **Removed proactive rate limit threshold**
   - No longer waits at 500 remaining requests
   - Uses full 5000 request quota before forced wait
   - Maximizes work per rate limit cycle

2. **Implemented incremental indexing**
   - Checks if repository already exists in database
   - Compares last_commit (pushedAt) to detect changes
   - Only fetches README for new or updated repositories
   - Skips README fetch for unchanged repos (major time savings)

3. **Increased timeout to GitHub maximum**
   - Job timeout: 180m → 360m (6 hours, GitHub free tier max)
   - Script timeout: 170m → 350m
   - Allows full first-run indexing to complete

Impact on performance:

**First run (empty database):**
- Same as before: ~25,000 repos need full indexing
- Will use all 360 minutes but should complete

**Subsequent runs (incremental):**
- Only fetches READMEs for changed repos (~5-10% typically)
- Dramatically faster: estimated 30-60 minutes instead of 360
- Makes daily automated builds sustainable

Files changed:
- lib/github-api.js: Removed proactive rate limit check
- lib/indexer.js: Added incremental indexing logic
- .github/workflows/build-database.yml: Increased timeout to 360m

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
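The incremental indexing rule described in the commit message can be sketched as follows. This is a minimal illustration, not the code in lib/indexer.js (which this page does not show): the function names `needsReadmeFetch` and `planIndexing`, the `lookup` callback, and the `nameWithOwner` field are assumptions made for the example. Only `last_commit` and `pushedAt` come from the message itself.

```javascript
// Hypothetical helper: decide whether a repository's README must be re-fetched.
// `existing` is the row already stored in the database (undefined/null if absent).
// `pushedAt` is the push timestamp reported for the repository by the API.
function needsReadmeFetch(existing, pushedAt) {
  if (!existing) return true;              // new repository: index it fully
  if (!existing.last_commit) return true;  // no stored timestamp: re-fetch to be safe
  // Compare as instants so equivalent timestamps in different string
  // formats are not mistaken for a change. Invalid dates yield NaN,
  // and NaN !== NaN is true, so unparseable values also trigger a fetch.
  const stored = new Date(existing.last_commit).getTime();
  const incoming = new Date(pushedAt).getTime();
  return stored !== incoming;
}

// Hypothetical planner: split incoming repositories into those whose README
// should be fetched and those that can be skipped as unchanged.
// `lookup(name)` is an assumed callback returning the stored row, if any.
function planIndexing(repos, lookup) {
  const toFetch = [];
  const skipped = [];
  for (const repo of repos) {
    const existing = lookup(repo.nameWithOwner);
    if (needsReadmeFetch(existing, repo.pushedAt)) {
      toFetch.push(repo.nameWithOwner);
    } else {
      skipped.push(repo.nameWithOwner);
    }
  }
  return { toFetch, skipped };
}
```

Under this sketch, a first run against an empty database places every repository in `toFetch`, while later runs fetch only new repositories and those whose `pushedAt` differs from the stored `last_commit`.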
8
.github/workflows/build-database.yml
vendored
@@ -22,7 +22,7 @@ permissions:
 jobs:
   build-database:
     runs-on: ubuntu-latest
-    timeout-minutes: 180 # 3 hours max
+    timeout-minutes: 360 # 6 hours (GitHub Actions maximum for free tier)
 
     steps:
       - name: Checkout repository
@@ -60,8 +60,8 @@ jobs:
           INDEX_MODE="${{ github.event.inputs.index_mode || 'full' }}"
           echo "Index mode: $INDEX_MODE"
 
-          # Build the index in non-interactive mode (170m timeout, job timeout is 180m)
-          timeout 170m node -e "
+          # Build the index in non-interactive mode (350m timeout, job timeout is 360m)
+          timeout 350m node -e "
             const db = require('./lib/database');
             const dbOps = require('./lib/db-operations');
             const indexer = require('./lib/indexer');
@@ -96,7 +96,7 @@ jobs:
           " || {
             EXIT_CODE=$?
             if [ $EXIT_CODE -eq 124 ]; then
-              echo "❌ Index building timed out after 170 minutes"
+              echo "❌ Index building timed out after 350 minutes"
               echo "This may indicate rate limiting issues or too many lists to index"
             fi
             exit $EXIT_CODE
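The timeout message above points at rate limiting, which this commit changes by dropping the proactive threshold. A minimal sketch of that difference follows, assuming the client tracks the remaining-request count and the reset time (in epoch seconds) that GitHub reports with its rate limit information. The function name `waitMsBeforeNextRequest`, its parameters, and the one-second safety margin are hypothetical; the actual logic in lib/github-api.js is not shown on this page.

```javascript
// Hypothetical helper: how long to pause before the next API request.
// `remaining`         requests left in the current rate limit window
// `resetEpochSeconds` when the window resets, as seconds since the Unix epoch
// `nowMs`             current time in milliseconds (injectable for testing)
// `threshold`         pause once `remaining` drops to this value or below;
//                     0 models the new behavior, 500 models the old one
function waitMsBeforeNextRequest(remaining, resetEpochSeconds, nowMs = Date.now(), threshold = 0) {
  if (remaining > threshold) {
    return 0; // quota still available: keep working, no proactive pause
  }
  const resetMs = resetEpochSeconds * 1000;
  // Wait until the window resets, plus a one-second margin (an assumption)
  // so the next request does not land exactly on the boundary.
  return Math.max(0, resetMs - nowMs) + 1000;
}
```

With `threshold` at 0, a client holding 400 remaining requests keeps going; with the old value of 500 it would already be waiting for the reset, leaving part of the quota unused in each cycle.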