refactor: consolidate data/ into single root directory, fix historical player names

Merge data/wikipedia/{year}/ into data/{year}/ so there is a single
canonical location for World Cup JSON files. Update scrape and seed
scripts to use data/ instead of data/wikipedia/.

Re-scraped all 22 years (1930-2022) with fixed player name extraction
(full name from <a title="..."> rather than abbreviated display text)
so historical goals now show e.g. "Thomas Müller" not "Müller".

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-15 18:27:35 +02:00
parent 9ce2a4e27c
commit d37ebe201e
48 changed files with 2488 additions and 3630 deletions
+3 -3
@@ -1,6 +1,6 @@
/**
* Scrape English Wikipedia for World Cup data and write JSON files to
- * data/wikipedia/{year}/.
+ * data/{year}/.
*
* Usage:
* pnpm scrape # all years, matches + squads
@@ -17,7 +17,7 @@ import {
} from '../lib/wiki-scraper'
const __dirname = path.dirname(fileURLToPath(import.meta.url))
-const DATA_DIR = path.join(__dirname, '../data/wikipedia')
+const DATA_DIR = path.join(__dirname, '../data')
const YEARS = [
1930,1934,1938,1950,1954,1958,1962,1966,1970,1974,
@@ -95,7 +95,7 @@ async function main() {
console.log()
}
-console.log('\nDone! Files written to data/wikipedia/{year}/')
+console.log('\nDone! Files written to data/{year}/')
}
main().catch(e => { console.error(e); process.exit(1) })
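
The name-extraction fix described in the commit message (prefer the link's `title` attribute over the abbreviated display text) is not part of this file's diff. A minimal sketch of the idea follows, assuming the scraper sees each scorer as a raw `<a>` element string. `extractPlayerName` is a hypothetical helper, not the actual code in `lib/wiki-scraper`, and stripping a trailing parenthetical such as "(Brazilian footballer)" is an added assumption about Wikipedia titles.

```typescript
// Hypothetical helper illustrating the fix; the real lib/wiki-scraper code may differ.
// Prefers the anchor's title attribute (the linked article's name) over the
// abbreviated link text, and falls back to the link text when no title is present.
function extractPlayerName(anchorHtml: string): string {
  // Value of title="..." on the opening <a> tag, if any.
  const title = /<a\b[^>]*\btitle="([^"]*)"/i.exec(anchorHtml)?.[1]
  // Inner content of the anchor, used only as a fallback.
  const text = /<a\b[^>]*>([\s\S]*?)<\/a>/i.exec(anchorHtml)?.[1] ?? ''
  const raw = title !== undefined && title.trim() !== '' ? title : text
  return raw
    .replace(/<[^>]+>/g, '')           // drop any nested tags in the fallback text
    .replace(/\s*\([^)]*\)\s*$/, '')   // assumption: strip a trailing "(...)" disambiguator
    .trim()
}

const scorer = extractPlayerName(
  '<a href="/wiki/Thomas_M%C3%BCller" title="Thomas Müller">Müller</a>'
)
console.log(scorer) // "Thomas Müller"
```

Falling back to the link text keeps unlinked or title-less entries from becoming empty strings. A production scraper would more likely read the attribute through an HTML parser than through regular expressions.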