feat: scrape tournament meta from Wikipedia, drop world_cup.csv

Add worldcup.meta.json per year with host, teams_count, winner, runner_up,
third_place, fourth_place — derived from match results (Final/Third-place
match) with infobox as fallback for edge cases like 1950's round-robin final.
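
The derive-then-fallback rule above can be sketched roughly as follows. This is an
illustration only, not the scraper's actual code: the `Match` and `Placements`
shapes, the `decide` and `derivePlacements` names, and the exact stage labels
are assumptions made for the example.

```typescript
// Hypothetical shapes; the real scraper's types are not shown in this commit.
interface Match {
  stage: string; // assumed labels: "Final", "Third-place match"
  home: string;
  away: string;
  homeGoals: number;
  awayGoals: number;
  winner?: string; // assumed to be set when extra time or penalties decided the match
}

interface Placements {
  winner?: string;
  runner_up?: string;
  third_place?: string;
  fourth_place?: string;
}

// Returns [winner, loser], or undefined when the result alone cannot decide it.
function decide(m: Match): [string, string] | undefined {
  const winner =
    m.winner ??
    (m.homeGoals > m.awayGoals
      ? m.home
      : m.awayGoals > m.homeGoals
        ? m.away
        : undefined);
  if (winner === undefined) return undefined;
  return [winner, winner === m.home ? m.away : m.home];
}

// Prefer match results; use infobox values only where no deciding match exists
// (for example 1950, which had a final round-robin group instead of a Final).
function derivePlacements(matches: Match[], infobox: Placements): Placements {
  const final = matches.find((m) => m.stage === "Final");
  const third = matches.find((m) => m.stage === "Third-place match");
  const f = final ? decide(final) : undefined;
  const t = third ? decide(third) : undefined;
  return {
    winner: f?.[0] ?? infobox.winner,
    runner_up: f?.[1] ?? infobox.runner_up,
    third_place: t?.[0] ?? infobox.third_place,
    fourth_place: t?.[1] ?? infobox.fourth_place,
  };
}
```

With no "Final" or "Third-place match" entry in `matches`, every field comes
from the infobox argument, which is the 1950 case described above.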

Fix infobox host extraction to handle <br>-separated multi-host entries
(2002: Japan / South Korea). Fix squad scraper to filter out zero-player
phantom sections that Wikipedia appends (References, Captains, etc.).
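
A minimal sketch of the two fixes above, under stated assumptions: `splitHosts`
and `dropEmptySections` are hypothetical names, the infobox cell is assumed to
arrive as an HTML string, and the section shape is inferred from the `name` /
`players` / `date_of_birth` fields visible in the squad JSON diff.

```typescript
// Hypothetical types, inferred from the squad JSON in this commit's diff.
interface Player {
  name: string;
  date_of_birth?: string;
}

interface SquadSection {
  name: string;
  players: Player[];
}

// Split an infobox host cell on <br>, <br/>, or <br /> and strip any other tags,
// so a multi-host entry yields one string per host.
function splitHosts(cellHtml: string): string[] {
  return cellHtml
    .split(/<br\s*\/?>/i)
    .map((part) => part.replace(/<[^>]+>/g, "").trim())
    .filter((part) => part.length > 0);
}

// Drop sections that parsed to zero players (for example "Captains").
function dropEmptySections(sections: SquadSection[]): SquadSection[] {
  return sections.filter((s) => s.players.length > 0);
}
```

Filtering on an empty `players` array rather than on a list of known section
titles is a design choice in this sketch: it does not need updating when a
page adds a new non-squad heading.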

Drop app/data/world_cup.csv and the PLACEMENTS/parseCsv code in seed.ts —
all tournament metadata now comes from the scraped JSON files.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-15 17:09:45 +02:00
parent ff4989f39f
commit d1171267a8
34 changed files with 319 additions and 254 deletions
@@ -0,0 +1,8 @@
{
"host": "France",
"teams_count": 32,
"winner": "France",
"runner_up": "Brazil",
"third_place": "Croatia",
"fourth_place": "Netherlands"
}
@@ -4388,17 +4388,5 @@
"date_of_birth": "1974-07-15"
}
]
},
{
"name": "Players",
"players": []
},
{
"name": "Captains",
"players": []
},
{
"name": "Goalkeepers",
"players": []
}
]