Commit Graph

2 Commits

Author SHA1 Message Date
valknar b832b62f5e fix: normalize Bosnia & Herzegovina and USA team name variants
Add TEAM_ALIASES to lib/wiki-scraper.ts applied at extraction time so both
scraper and sync consistently produce canonical names. Removes the duplicate
alias map from seed.ts in favour of the shared normalizeTeam() export.

Aliases added:
  Bosnia & Herzegovina  → Bosnia and Herzegovina
  USA                   → United States

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-15 17:33:05 +02:00
valknar f885e4312c refactor: extract lib/wiki-scraper.ts, make scraper composable, sync from Wikipedia
Move all scraping logic (fetchWikiHtml, scrapeYear, scrapeSquads and all
helpers) into lib/wiki-scraper.ts as exported functions shared by both scripts.

scrape-wikipedia.ts becomes a composable CLI:
  pnpm scrape [year]             — matches + squads (default)
  pnpm scrape [year] --matches   — matches/meta/stadiums only
  pnpm scrape [year] --squads    — squads only

sync.ts drops the openfootball GitHub dependency entirely and scrapes
Wikipedia directly. Incremental: completed groups (all matches have FT
scores) are detected via DB query and their sub-pages are skipped each run.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-15 17:23:17 +02:00