feat: replace Kaggle CSV with Wikipedia scraper for historical match data
Add scripts/scrape-wikipedia.ts, which fetches all 22 World Cups (1930–2022) from English Wikipedia via the MediaWiki API. It handles group sub-pages, detects AET and penalty results, parses goals, and writes openfootball-format JSON to app/data/openfootball/.

Rewrite scripts/seed.ts to read these local JSON files instead of the Kaggle CSV. The seed now produces 965 matches and 2716 goals, with per-group assignments for all historical tournaments (enabling group standings on tournament pages).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
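A minimal sketch of what AET/penalty detection on a score cell can look like. It assumes score text such as `2–1 (a.e.t.)` or `1–1 (a.e.t.) (4–3 p)`; both formats, the `ParsedScore` type and the `parseScore` name are hypothetical and not taken from scripts/scrape-wikipedia.ts, which may handle more variants (for example a separate "Penalties" row).

```typescript
// Hypothetical shape for one parsed score cell; the real scraper's
// types may differ.
interface ParsedScore {
  home: number;
  away: number;
  aet: boolean;                 // decided after extra time
  penalties?: [number, number]; // shoot-out result, if any
}

// Wikipedia writes scores with an en dash; accept a hyphen too.
const MAIN = /^\s*(\d+)\s*[–-]\s*(\d+)/;
const AET = /a\.e\.t\./i;
// Assumed shoot-out suffix such as "(4–3 p)" or "(4–3 pen.)".
const PENS = /\((\d+)\s*[–-]\s*(\d+)\s*p(?:en)?\.?\)/i;

function parseScore(cell: string): ParsedScore | null {
  const main = MAIN.exec(cell);
  if (!main) return null; // no numeric score, e.g. a walkover

  const pens = PENS.exec(cell);
  const result: ParsedScore = {
    home: Number(main[1]),
    away: Number(main[2]),
    // At World Cups a shoot-out only follows extra time.
    aet: AET.test(cell) || pens !== null,
  };
  if (pens) result.penalties = [Number(pens[1]), Number(pens[2])];
  return result;
}
```

For example, `parseScore("1–1 (a.e.t.) (4–3 p)")` yields `{ home: 1, away: 1, aet: true, penalties: [4, 3] }`, while a plain `"2–1"` gives `aet: false` and no `penalties` field.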
@@ -10,6 +10,7 @@
     "lint": "eslint",
     "seed": "tsx scripts/seed.ts",
     "sync": "tsx scripts/sync.ts",
+    "scrape": "tsx scripts/scrape-wikipedia.ts",
     "db:generate": "drizzle-kit generate",
     "db:push": "drizzle-kit push"
   },
@@ -17,6 +18,7 @@
     "@apollo/client": "^4.2.3",
     "@graphql-tools/schema": "^10.0.33",
     "@heroicons/react": "^2.2.0",
+    "cheerio": "^1.2.0",
     "drizzle-orm": "^0.45.2",
     "flag-icons": "^7.5.0",
     "graphql": "^16.14.2",
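The `scrape` entry runs scripts/scrape-wikipedia.ts, which per the commit message pulls tournament pages and their group sub-pages through the MediaWiki API. Below is a minimal sketch of building such requests, assuming the `action=parse` module and sub-page titles of the form `2022 FIFA World Cup Group A`; `parseUrl` and `groupPages` are hypothetical helpers, not functions from the script.

```typescript
// Hypothetical request builder; the real scraper's request code may differ.
const API = "https://en.wikipedia.org/w/api.php";

function parseUrl(page: string): string {
  const params = new URLSearchParams({
    action: "parse",    // render a page to HTML
    page,               // page title; spaces are encoded as "+"
    prop: "text",       // return only the rendered HTML
    format: "json",
    formatversion: "2", // parse.text comes back as a plain string
    redirects: "1",     // follow title redirects
  });
  return `${API}?${params.toString()}`;
}

// Assumed naming scheme for per-group sub-pages.
function groupPages(year: number, groups: string[]): string[] {
  return groups.map((g) => `${year} FIFA World Cup Group ${g}`);
}
```

A caller would `fetch` each URL (ideally with a descriptive `User-Agent`, as Wikimedia asks of API clients), read `parse.text` from the JSON response, and hand that HTML to cheerio, which appears in the dependency hunk above, to extract match rows and goals.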