feat: replace Kaggle CSV with Wikipedia scraper for historical match data

Add scripts/scrape-wikipedia.ts, which fetches all 22 World Cups (1930–2022)
from English Wikipedia via the MediaWiki API. It follows group sub-pages,
detects extra time (AET) and penalty shoot-outs, parses goals, and writes
openfootball-format JSON to app/data/openfootball/.
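
For illustration only, the AET/penalty detection step could look roughly like the sketch below. The scraper's actual code is not shown in this excerpt, so `ParsedScore`, `parseScore`, and the input strings are hypothetical, and the sketch assumes the score cell reads like "3–3 (a.e.t.)" with the shoot-out result in a separate field.

```typescript
// Hypothetical result shape; not taken from scripts/scrape-wikipedia.ts.
interface ParsedScore {
  home: number;
  away: number;
  extraTime: boolean;
  penalties: { home: number; away: number } | null;
}

// Two integers separated by a hyphen, en dash, or minus sign.
const PAIR_RE = /(\d+)\s*[-–−]\s*(\d+)/;
// "(a.e.t.)" or "(aet)", case-insensitive.
const AET_RE = /\(\s*a\.?e\.?t\.?\s*\)/i;

function parsePair(text: string): [number, number] | null {
  const m = PAIR_RE.exec(text);
  return m ? [Number(m[1]), Number(m[2])] : null;
}

// scoreText: the score cell; penaltyText: the shoot-out cell, if present.
function parseScore(scoreText: string, penaltyText?: string): ParsedScore | null {
  const score = parsePair(scoreText);
  if (!score) return null; // no recognisable score, e.g. a match not played

  const pens = penaltyText ? parsePair(penaltyText) : null;
  return {
    home: score[0],
    away: score[1],
    // Assumption: a shoot-out is only reached after extra time.
    extraTime: AET_RE.test(scoreText) || pens !== null,
    penalties: pens ? { home: pens[0], away: pens[1] } : null,
  };
}
```

Keeping the parsing in a pure function like this makes it easy to test against saved page fragments without hitting the network.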

Rewrite scripts/seed.ts to read these local JSON files instead of the Kaggle
CSV, producing 965 matches and 2716 goals with per-group assignments for all
historical tournaments (enabling group standings on tournament pages).
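
As a rough sketch of why per-group assignments matter, standings for one group can be derived from the seeded matches alone. The `Match` and `Row` shapes and `groupStandings` below are hypothetical and do not reflect the actual schema. The sketch also assumes three points per win by default (tournaments before 1994 awarded two, hence the parameter) and uses only points, goal difference, and goals scored as tie-breakers.

```typescript
// Hypothetical shapes; the real seed schema is not shown in this commit.
interface Match {
  group: string;
  home: string;
  away: string;
  homeGoals: number;
  awayGoals: number;
}

interface Row {
  team: string;
  played: number;
  won: number;
  drawn: number;
  lost: number;
  goalsFor: number;
  goalsAgainst: number;
  points: number;
}

function groupStandings(matches: Match[], group: string, pointsPerWin = 3): Row[] {
  const rows = new Map<string, Row>();
  const rowFor = (team: string): Row => {
    let row = rows.get(team);
    if (!row) {
      row = { team, played: 0, won: 0, drawn: 0, lost: 0, goalsFor: 0, goalsAgainst: 0, points: 0 };
      rows.set(team, row);
    }
    return row;
  };

  for (const m of matches) {
    if (m.group !== group) continue; // only matches tagged with this group
    const h = rowFor(m.home);
    const a = rowFor(m.away);
    h.played++; a.played++;
    h.goalsFor += m.homeGoals; h.goalsAgainst += m.awayGoals;
    a.goalsFor += m.awayGoals; a.goalsAgainst += m.homeGoals;
    if (m.homeGoals > m.awayGoals) { h.won++; a.lost++; h.points += pointsPerWin; }
    else if (m.homeGoals < m.awayGoals) { a.won++; h.lost++; a.points += pointsPerWin; }
    else { h.drawn++; a.drawn++; h.points++; a.points++; }
  }

  // Sort by points, then goal difference, then goals scored, then name.
  return Array.from(rows.values()).sort((x, y) =>
    y.points - x.points ||
    (y.goalsFor - y.goalsAgainst) - (x.goalsFor - x.goalsAgainst) ||
    y.goalsFor - x.goalsFor ||
    x.team.localeCompare(y.team));
}
```

Without a group label on each match, the filter in the loop has nothing to key on, which is why the earlier CSV-based seed could not support this view.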

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-15 11:39:53 +02:00
parent 83b1ad3e35
commit 5dcd22ad22
88 changed files with 95625 additions and 127 deletions
@@ -10,6 +10,7 @@
"lint": "eslint",
"seed": "tsx scripts/seed.ts",
"sync": "tsx scripts/sync.ts",
"scrape": "tsx scripts/scrape-wikipedia.ts",
"db:generate": "drizzle-kit generate",
"db:push": "drizzle-kit push"
},
@@ -17,6 +18,7 @@
"@apollo/client": "^4.2.3",
"@graphql-tools/schema": "^10.0.33",
"@heroicons/react": "^2.2.0",
"cheerio": "^1.2.0",
"drizzle-orm": "^0.45.2",
"flag-icons": "^7.5.0",
"graphql": "^16.14.2",