Commit Graph

5 Commits

Author SHA1 Message Date
valknar 9ce2a4e27c fix: use full player names from title attr, preserve UTC offset in match times
Wikipedia abbreviates goal scorer display text (e.g. "Müller") but the
<a title="Thomas Müller"> attribute always has the full name. Switch
parseGoals() to prefer title attr and strip disambiguation suffixes like
"(soccer, born 1993)". This ensures Gerd Müller and Thomas Müller get
separate player pages.

Also preserve the UTC offset from Wikipedia's ftime (e.g. "12:00 UTC-4")
so that isLive() can accurately compute UTC kickoff time instead of
treating local time as UTC. upcomingMatches sorts by SPLIT_PART on the
HH:MM part to ignore the timezone suffix.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-15 18:14:53 +02:00
valknar 187ee2e312 fix: parse Wikipedia 12h time format and sort upcoming matches with NULLS LAST
Wikipedia stores match times as "6:00 p.m." (1-digit hour) which didn't
match the \d{2}:\d{2} regex, producing NULL for those matches. Introduced
parseTime12h() to handle 1-2 digit hours + AM/PM and convert to 24h.
Also sort upcomingMatches by NULLS LAST so unscheduled games appear after
timed ones rather than first. Dropped "openfootball" data attribution.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-15 17:50:30 +02:00
valknar 42063cdfda fix: extend group heading regex from [a-h] to [a-z] for 2026 Groups I-L
2026 FIFA World Cup has 12 groups (A-L). The previous regex only matched A-H,
causing Groups I, J, K, L to fall through undetected and collapse into Group H.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-15 17:39:38 +02:00
valknar b832b62f5e fix: normalize Bosnia & Herzegovina and USA team name variants
Add TEAM_ALIASES to lib/wiki-scraper.ts applied at extraction time so both
scraper and sync consistently produce canonical names. Removes the duplicate
alias map from seed.ts in favour of the shared normalizeTeam() export.

Aliases added:
  Bosnia & Herzegovina  → Bosnia and Herzegovina
  USA                   → United States

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-15 17:33:05 +02:00
valknar f885e4312c refactor: extract lib/wiki-scraper.ts, make scraper composable, sync from Wikipedia
Move all scraping logic (fetchWikiHtml, scrapeYear, scrapeSquads and all
helpers) into lib/wiki-scraper.ts as exported functions shared by both scripts.

scrape-wikipedia.ts becomes a composable CLI:
  pnpm scrape [year]             — matches + squads (default)
  pnpm scrape [year] --matches   — matches/meta/stadiums only
  pnpm scrape [year] --squads    — squads only

sync.ts drops the openfootball GitHub dependency entirely and scrapes
Wikipedia directly. Incremental: completed groups (all matches have FT
scores) are detected via DB query and their sub-pages are skipped each run.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-15 17:23:17 +02:00