Wikipedia lists match times as "6:00 p.m." (1-digit hour), which didn't
match the \d{2}:\d{2} regex, so those matches were stored with a NULL time.
Introduced parseTime12h() to handle 1-2 digit hours plus AM/PM and convert
to 24h. Also sort upcomingMatches with NULLS LAST so unscheduled games appear
after timed ones rather than first. Dropped the "openfootball" data attribution.
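
For illustration, a minimal sketch of what a 12h-to-24h parser of this kind
could look like. The signature, the accepted spellings ("p.m.", "PM", "pm")
and the null return for unparseable input are assumptions, not necessarily
what parseTime12h() in this repo does.

```typescript
// Hypothetical sketch of a 12-hour time parser; the real parseTime12h() may differ.
function parseTime12h(raw: string): string | null {
  // 1-2 digit hour, 2-digit minute, then a.m./p.m. with optional dots and spacing.
  const m = raw.trim().match(/^(\d{1,2}):(\d{2})\s*([ap])\.?\s*m\.?$/i);
  if (!m) return null;

  let hour = Number(m[1]);
  const minute = Number(m[2]);
  if (hour < 1 || hour > 12 || minute > 59) return null;

  const isPm = m[3].toLowerCase() === "p";
  if (hour === 12) {
    // 12 a.m. is midnight (00), 12 p.m. is noon (12).
    hour = isPm ? 12 : 0;
  } else if (isPm) {
    hour += 12;
  }

  // Zero-pad the hour so the result is always HH:MM.
  return (hour < 10 ? "0" : "") + hour + ":" + m[2];
}
```

Input that is already in 24h form (no AM/PM marker) returns null in this
sketch, so a caller would still need the original \d{2}:\d{2} path for those.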
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The 2026 FIFA World Cup has 12 groups (A-L). The previous regex only matched
A-H, causing Groups I, J, K and L to fall through undetected and collapse into
Group H.
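
A sketch of the widened match, assuming the headings being parsed look like
"Group A". The helper name and the exact heading format are hypothetical;
only the A-L range comes from this change.

```typescript
// Hypothetical helper: extracts the group letter from a heading such as "Group K".
// The character class covers all 12 groups (A-L) instead of only A-H.
const GROUP_HEADING = /^Group ([A-L])$/;

function parseGroupLetter(heading: string): string | null {
  const m = heading.trim().match(GROUP_HEADING);
  return m ? m[1] : null;
}
```

Returning null for anything outside A-L makes an unexpected heading visible
to the caller instead of silently attaching its matches to the previous group.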
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add TEAM_ALIASES to lib/wiki-scraper.ts, applied at extraction time, so the
scraper and sync both produce canonical names. Removes the duplicate alias
map from seed.ts in favour of the shared normalizeTeam() export.
Aliases added:
  Bosnia & Herzegovina → Bosnia and Herzegovina
  USA → United States
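
Roughly, the alias lookup can be pictured as below. The two entries are the
ones listed above; the Map-based shape and the trimming are assumptions for
illustration and may not match lib/wiki-scraper.ts.

```typescript
// Sketch only: the real TEAM_ALIASES / normalizeTeam() may be structured differently.
const TEAM_ALIASES = new Map<string, string>([
  ["Bosnia & Herzegovina", "Bosnia and Herzegovina"],
  ["USA", "United States"],
]);

// Returns the canonical team name, or the trimmed input when no alias exists.
// (In the repo this is an export of lib/wiki-scraper.ts.)
function normalizeTeam(name: string): string {
  const trimmed = name.trim();
  return TEAM_ALIASES.get(trimmed) ?? trimmed;
}
```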
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Move all scraping logic (fetchWikiHtml, scrapeYear, scrapeSquads and all
helpers) into lib/wiki-scraper.ts as exported functions shared by both scripts.
scrape-wikipedia.ts becomes a composable CLI:
  pnpm scrape [year]            matches + squads (default)
  pnpm scrape [year] --matches  matches/meta/stadiums only
  pnpm scrape [year] --squads   squads only
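
One way the flag handling above could be wired up, as a sketch. The
ScrapeOptions type and parseScrapeArgs() are hypothetical names; the default
year is left undefined here because this message does not state it.

```typescript
// Hypothetical option parsing for the scrape CLI; names are illustrative only.
interface ScrapeOptions {
  year: number | undefined; // undefined means "use the script's default year"
  matches: boolean;
  squads: boolean;
}

function parseScrapeArgs(argv: string[]): ScrapeOptions {
  const flags = new Set(argv.filter((a) => a.startsWith("--")));
  const positional = argv.filter((a) => !a.startsWith("--"));

  const wantMatches = flags.has("--matches");
  const wantSquads = flags.has("--squads");
  // With neither flag given, run both stages (the documented default).
  const both = !wantMatches && !wantSquads;

  return {
    year: positional.length > 0 ? Number(positional[0]) : undefined,
    matches: both || wantMatches,
    squads: both || wantSquads,
  };
}
```

In a script this would typically be called as
parseScrapeArgs(process.argv.slice(2)).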
sync.ts drops the openfootball GitHub dependency entirely and scrapes
Wikipedia directly. Incremental: completed groups (all matches have FT
scores) are detected via DB query and their sub-pages are skipped each run.
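
The completed-group check can be illustrated in memory as follows. This is a
stand-in for the DB query, not the query itself: the MatchRow shape and the
"both scores non-null means FT" rule are assumptions.

```typescript
// Hypothetical row shape; the real schema and query are not shown in this message.
interface MatchRow {
  group: string;
  homeScore: number | null;
  awayScore: number | null;
}

// A group counts as completed only if every one of its matches has a final score.
function completedGroups(rows: MatchRow[]): Set<string> {
  const status = new Map<string, boolean>();
  for (const r of rows) {
    const hasFt = r.homeScore !== null && r.awayScore !== null;
    status.set(r.group, (status.get(r.group) ?? true) && hasFt);
  }

  const done = new Set<string>();
  status.forEach((complete, group) => {
    if (complete) done.add(group);
  });
  return done;
}
```

A sync run would then skip the sub-page of any group in the returned set and
fetch only the rest.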
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>