8 Commits

Author SHA1 Message Date
valknarness
279cc2fa25 fix: bypass rate limiting for raw.githubusercontent.com requests
CRITICAL FIX: requests to raw.githubusercontent.com do NOT count against the
GitHub API rate limit, but the code was rate-limiting all requests the same way.

Problem:
- README fetches (~25,000) were going through rateLimitedRequest()
- This added artificial delays, proactive checks, and unnecessary waits
- Build took ~7 hours instead of ~2-3 hours
- Only getRepoInfo() API calls actually count against rate limits

Solution:
1. Created fetchRawContent() function for direct raw content fetches
2. Updated getReadme() to use fetchRawContent()
3. Updated getAwesomeListsIndex() to use fetchRawContent()
4. Reduced workflow timeout: 330m → 180m (3 hours)
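
The split described above can be sketched roughly as follows. This is a minimal illustration, not the code in lib/github-api.js: only the names fetchRawContent() and getReadme() come from the commit message, while needsRateLimit(), the injectable fetchImpl parameter, the branch list, and the README.md path are assumptions made for the example.

```javascript
// Hypothetical sketch: only fetchRawContent and getReadme are names from
// the commit message; everything else here is an illustrative assumption.
const RAW_HOST = 'raw.githubusercontent.com';

// Decide whether a URL should go through the API rate limiter.
function needsRateLimit(url) {
  return new URL(url).hostname !== RAW_HOST;
}

// Fetch raw file content directly, with no rate-limit bookkeeping.
// Resolves to null on 404 so callers can try another location.
async function fetchRawContent(url, fetchImpl = globalThis.fetch) {
  if (needsRateLimit(url)) {
    throw new Error(`fetchRawContent only handles ${RAW_HOST} URLs: ${url}`);
  }
  const res = await fetchImpl(url);
  if (res.status === 404) return null;
  if (!res.ok) throw new Error(`HTTP ${res.status} for ${url}`);
  return res.text();
}

// Try assumed default branch names in order and return the first README found.
async function getReadme(owner, repo, fetchImpl = globalThis.fetch) {
  for (const branch of ['main', 'master']) {
    const url = `https://${RAW_HOST}/${owner}/${repo}/${branch}/README.md`;
    const text = await fetchRawContent(url, fetchImpl);
    if (text !== null) return text;
  }
  return null;
}
```

Passing the fetch implementation as a parameter is only a convenience for testing the sketch without network access; API calls such as getRepoInfo() would keep using the rate-limited path.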

Impact:
- Build time: ~7 hours → ~2-3 hours (roughly 60-70% reduction)
- Only the ~25K API calls (getRepoInfo) count against the 5,000/hour limit
- ~25K README fetches are now unrestricted via raw.githubusercontent.com
- Will complete well within the GitHub Actions 6-hour per-job limit

Files changed:
- lib/github-api.js: Add fetchRawContent(), update getReadme() and
  getAwesomeListsIndex() to use it
- .github/workflows/build-database.yml: Reduce timeout to 180 minutes

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-28 06:04:14 +01:00
valknarness
9c166fe56f fix: increase workflow timeout to 330 minutes (5.5 hours)
Accurate calculation for ~25,000 READMEs with 5000/hour rate limit:
- Requests needed: ~25,000 (1 per README + metadata)
- Cycles needed: 25,000 / 4,500 ≈ 5.5 cycles
- Time per cycle: ~44 min work + ~16 min wait = 60 min total
- Total time: 5.5 × 60 = 330 minutes (5.5 hours)
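
The arithmetic above can be reproduced with a small helper. The function name and parameter names are assumptions for illustration; the inputs (25,000 requests, 4,500 usable requests per cycle, 60 minutes per cycle) are the figures from this message. Without rounding, the estimate comes out slightly above 330 minutes, because 25,000 / 4,500 is closer to 5.56 than to 5.5.

```javascript
// Rough build-time model using the figures from the commit message.
// estimateBuildMinutes and its parameter names are illustrative assumptions.
function estimateBuildMinutes(totalRequests, requestsPerCycle, minutesPerCycle) {
  const cycles = totalRequests / requestsPerCycle; // rate-limit windows needed
  return { cycles, minutes: cycles * minutesPerCycle };
}

const full = estimateBuildMinutes(25000, 4500, 60);
console.log(full.cycles.toFixed(2));  // prints "5.56"
console.log(Math.ceil(full.minutes)); // prints 334
```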

Previous timeouts were insufficient:
- 170 minutes: completed 2.9 cycles (~13,500 requests)
- 270 minutes: would complete 4.5 cycles (~20,000 requests)
- 330 minutes: allows full completion with buffer

Changes:
- Job timeout: 180m → 330m (5.5 hours)
- Script timeout: 170m → 320m
- Within the GitHub Actions per-job limit (360 minutes / 6 hours)

Alternative: Use 'sample' mode for faster builds if full index
is not immediately needed.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-28 00:31:58 +01:00
valknarness
a136b929b0 fix: suspend 2025-10-27 02:36:46 +01:00
valknarness
c0d3ffd328 fix: CI indexing 2025-10-26 22:04:46 +01:00
valknarness
509795ab82 Fix workflow database initialization and error handling
- Initialize database inside indexer process to ensure connection exists
- Configure GitHub token in same process as indexer
- Make indexer throw errors instead of returning early for CI failure detection
- Remove duplicate token configuration step
- Pass GITHUB_TOKEN as environment variable to build step
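
A minimal sketch of the throw-instead-of-return pattern listed above, assuming hypothetical names: runIndexer(), main(), initDb() and buildIndex() are placeholders for illustration and are not taken from the repository. Only the GITHUB_TOKEN environment variable comes from this message.

```javascript
// Hypothetical sketch; runIndexer, main, initDb and buildIndex are
// illustrative names, not the repository's actual functions.
async function runIndexer({ env = process.env, initDb, buildIndex }) {
  const token = env.GITHUB_TOKEN;
  if (!token) {
    // Throw rather than return early, so the caller can fail the CI step.
    throw new Error('GITHUB_TOKEN is not set');
  }
  const db = await initDb();    // database is opened inside the indexer process
  return buildIndex(db, token); // token is used in the same process
}

// Entry point: any thrown error is mapped to a non-zero exit code.
async function main(deps) {
  try {
    await runIndexer(deps);
    return 0;
  } catch (err) {
    console.error(`Indexing failed: ${err.message}`);
    return 1;
  }
}
```

A caller could then set the exit status with something like `main(deps).then((code) => { process.exitCode = code; })`, which lets the workflow step fail when indexing does.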

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-26 14:05:21 +01:00
valknarness
10910d8537 fix: github workflow 2025-10-26 14:00:45 +01:00
valknarness
c73a14510b feat: github workflow 2025-10-26 13:55:27 +01:00
valknarness
4cdcc62e15 feat: github workflow 2025-10-26 13:48:23 +01:00