- Detect Wikipedia plain-text rate-limit response ("You are making too many
requests") and wait 30s before retrying, rather than silently failing
- Increase inter-attempt delay from 3s to 15s per attempt
- Increase group subpage delay from 1.2s to 3s, year delay from 0.6s to 2s
- Re-scrape 1982, 1998, 2002, 2006 which had failed groups; all groups now
complete — e.g. 2002 now has 64 matches including Group E (Germany/Klose)
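The rate-limit handling described above can be sketched roughly as follows. Only the marker string and the 30s/15s delays come from this commit; isRateLimited, fetchWithBackoff, and the injected fetchText callback are hypothetical names, not the scraper's actual code.

```typescript
// Marker text and delays are taken from the commit notes; all names are illustrative.
const RATE_LIMIT_MARKER = "You are making too many requests";
const RATE_LIMIT_WAIT_MS = 30_000; // wait after a plain-text rate-limit response
const RETRY_DELAY_MS = 15_000;     // wait after an ordinary failed attempt

// Wikipedia answers with plain text instead of JSON/HTML when throttling.
function isRateLimited(body: string): boolean {
  return body.includes(RATE_LIMIT_MARKER);
}

const sleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// fetchText is injected so the retry loop stays independent of the HTTP client.
async function fetchWithBackoff(
  fetchText: () => Promise<string>,
  maxAttempts = 3,
  wait: (ms: number) => Promise<void> = sleep,
): Promise<string> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      const body = await fetchText();
      if (!isRateLimited(body)) return body;
      // Rate-limited: back off for longer before the next attempt.
      if (attempt < maxAttempts) await wait(RATE_LIMIT_WAIT_MS);
    } catch (err) {
      if (attempt === maxAttempts) throw err;
      await wait(RETRY_DELAY_MS);
    }
  }
  // Surface the failure instead of silently returning nothing.
  throw new Error(`Still rate-limited after ${maxAttempts} attempts`);
}
```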
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Merge data/wikipedia/{year}/ into data/{year}/ so there is a single
canonical location for World Cup JSON files. Update scrape and seed
scripts to use data/ instead of data/wikipedia/.
Re-scraped all 22 years (1930-2022) with fixed player name extraction
(full name from <a title="..."> rather than abbreviated display text)
so historical goals now show e.g. "Thomas Müller" not "Müller".
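A minimal sketch of the title-based name extraction, assuming the scorer cell contains a single <a> element. extractPlayerName is a hypothetical helper and the regex approach is an assumption; the real scraper may use an HTML parser instead.

```typescript
// Prefer the anchor's title attribute (full article name) over its visible text.
function extractPlayerName(cellHtml: string): string | null {
  const anchor = cellHtml.match(/<a\b([^>]*)>([\s\S]*?)<\/a>/);
  if (!anchor) return null;
  const attrs = anchor[1];
  const text = anchor[2];
  const title = attrs.match(/\btitle="([^"]*)"/);
  // Fall back to the display text (with nested tags stripped) when no title exists.
  return (title ? title[1] : text.replace(/<[^>]+>/g, "")).trim();
}
```

Titles that carry a disambiguation suffix or HTML entities would need extra handling, which this sketch omits.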
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Needed to recover from duplicate team entries (Bosnia & Herzegovina / USA)
that persisted because ON CONFLICT matching is on team IDs, so old rows
with wrong team IDs are never updated. --force clears all 2026 data and
orphaned teams before re-syncing clean.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add TEAM_ALIASES to lib/wiki-scraper.ts applied at extraction time so both
scraper and sync consistently produce canonical names. Removes the duplicate
alias map from seed.ts in favour of the shared normalizeTeam() export.
Aliases added:
  Bosnia & Herzegovina → Bosnia and Herzegovina
  USA → United States
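For illustration, the shared alias lookup might look like the sketch below. TEAM_ALIASES and normalizeTeam are the names used in this commit; the Map type and the trimming step are assumptions.

```typescript
// Alias pairs as listed in the commit; the container type is an assumption.
const TEAM_ALIASES = new Map<string, string>([
  ["Bosnia & Herzegovina", "Bosnia and Herzegovina"],
  ["USA", "United States"],
]);

// Returns the canonical name, or the trimmed input when no alias is registered.
function normalizeTeam(name: string): string {
  const trimmed = name.trim();
  return TEAM_ALIASES.get(trimmed) ?? trimmed;
}
```

Applying this once at extraction time means the scraper output and the sync step see the same names, so neither needs its own map.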
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Move all scraping logic (fetchWikiHtml, scrapeYear, scrapeSquads and all
helpers) into lib/wiki-scraper.ts as exported functions shared by both scripts.
scrape-wikipedia.ts becomes a composable CLI:
  pnpm scrape [year]            matches + squads (default)
  pnpm scrape [year] --matches  matches/meta/stadiums only
  pnpm scrape [year] --squads   squads only
sync.ts drops the openfootball GitHub dependency entirely and scrapes
Wikipedia directly. Incremental: completed groups (all matches have FT
scores) are detected via DB query and their sub-pages are skipped each run.
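The completed-group rule can be expressed as a small predicate. The commit performs this check with a DB query; the sketch below shows the same idea over in-memory rows, and MatchRow with its field names is a hypothetical shape.

```typescript
// Hypothetical row shape: a null score means the match has no full-time result yet.
interface MatchRow {
  group: string;
  score1: number | null;
  score2: number | null;
}

// A group is complete when every one of its matches has both FT scores.
function completedGroups(matches: MatchRow[]): Set<string> {
  const seen = new Set<string>();
  const open = new Set<string>();
  for (const m of matches) {
    seen.add(m.group);
    if (m.score1 === null || m.score2 === null) open.add(m.group);
  }
  return new Set(Array.from(seen).filter((g) => !open.has(g)));
}
```

Sub-pages for groups in the returned set can then be skipped on the next run.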
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add worldcup.meta.json per year with host, teams_count, winner, runner_up,
third_place, fourth_place — derived from match results (Final/Third-place
match) with infobox as fallback for edge cases like 1950's round-robin final.
Fix infobox host extraction to handle <br>-separated multi-host entries
(2002: Japan / South Korea). Fix squad scraper to filter out zero-player
phantom sections that Wikipedia appends (References, Captains, etc.).
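A rough sketch of the multi-host split, assuming the infobox host cell arrives as an HTML fragment. splitHosts is a hypothetical name and the regex handling is illustrative only.

```typescript
// Split on <br>, <br/>, or <br /> and strip any remaining tags from each part.
function splitHosts(infoboxHtml: string): string[] {
  return infoboxHtml
    .split(/<br\s*\/?>/i)
    .map((part) => part.replace(/<[^>]+>/g, "").trim())
    .filter((part) => part.length > 0);
}
```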
Drop app/data/world_cup.csv and the PLACEMENTS/parseCsv code in seed.ts —
all tournament metadata now comes from the scraped JSON files.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Move world_cup.csv to app/data/ directly (the only remaining Kaggle file
used by seed.ts for tournament metadata). Delete the rest of the Kaggle CSVs.
Update path constants in scrape-wikipedia.ts and seed.ts accordingly.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add scripts/scrape-wikipedia.ts that fetches all 22 World Cups (1930–2022)
from English Wikipedia via MediaWiki API, handles group sub-pages, AET/penalty
detection, and goal parsing, writing openfootball-format JSON to app/data/openfootball/.
Rewrite scripts/seed.ts to read these local JSON files instead of the Kaggle
CSV, producing 965 matches and 2716 goals with per-group assignments for all
historical tournaments (enabling group standings on tournament pages).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Previously each match goal sync did: DELETE (auto-commit) → N
individual INSERTs (each auto-commit). During those ~50ms readers
saw 0 goals for the match — the inconsistency window.
Now: collectGoals() builds the rows in memory, replaceGoals() wraps
the DELETE + single bulk VALUES INSERT in a transaction. Under
Postgres READ COMMITTED, readers see the old goals until commit and
the full new set after — never an empty window.
Also reduce the sync pool from max:5 to max:2; the job is fully sequential
and was holding idle connections unnecessarily.
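The DELETE plus bulk INSERT pattern can be sketched as below. replaceGoals is named in this commit, but its body here is an assumption, as are buildGoalInsert, the Tx interface, and every column except match_id. Tx stands in for a single dedicated connection, since BEGIN/COMMIT only form a transaction when issued on the same connection.

```typescript
// Hypothetical goal row; column names other than match_id are assumptions.
interface GoalRow { teamId: number; player: string; minute: number; }
interface Statement { text: string; values: unknown[]; }

// Build one multi-row INSERT so all goals for a match land in a single statement.
function buildGoalInsert(matchId: number, goals: GoalRow[]): Statement | null {
  if (goals.length === 0) return null;
  const values: unknown[] = [];
  const tuples = goals.map((g, i) => {
    values.push(matchId, g.teamId, g.player, g.minute);
    const b = i * 4;
    return `($${b + 1}, $${b + 2}, $${b + 3}, $${b + 4})`;
  });
  const text =
    "INSERT INTO goals (match_id, team_id, player, minute) VALUES " +
    tuples.join(", ");
  return { text, values };
}

// Minimal stand-in for one database connection.
interface Tx { query(text: string, values?: unknown[]): Promise<unknown>; }

// Readers keep seeing the old rows until COMMIT, then the full new set.
async function replaceGoals(tx: Tx, matchId: number, goals: GoalRow[]): Promise<void> {
  await tx.query("BEGIN");
  try {
    await tx.query("DELETE FROM goals WHERE match_id = $1", [matchId]);
    const insert = buildGoalInsert(matchId, goals);
    if (insert) await tx.query(insert.text, insert.values);
    await tx.query("COMMIT");
  } catch (err) {
    await tx.query("ROLLBACK");
    throw err;
  }
}
```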
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
TypeScript doesn't narrow module-level consts across closure
boundaries, so the explicit process.exit(1) guard isn't enough —
add ! assertion at the usage site inside run().
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The DDL block in sync.ts was a "safety net" but caused misleading
password auth errors when Coolify's scheduled task ran without
DATABASE_URL injected — the fallback `wc:wc` password was wrong.
- Drop the silent `?? 'postgres://wc:wc@...'` fallback; exit with a
clear message if DATABASE_URL is missing so the root cause is obvious
- Remove the 90-line CREATE TABLE IF NOT EXISTS block — seed.ts runs
before the server starts and guarantees all tables exist
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Soviet Union (su), Yugoslavia (yu), East Germany, Germany DR, FR Yugoslavia,
and Czechoslovakia have no valid entry in flag-icons. Map them to null in
TEAM_ISO so getIso() returns null, and render a muted initials badge in
TeamFlag instead of a broken/empty sprite. Also drop the buggy 2-char
substring fallback, which could turn an unknown team name into a valid but
unrelated country code and so render the wrong flag.
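A minimal sketch of the null mapping and the initials fallback. TEAM_ISO and getIso are the names used in this commit; the Map type, the sample entries, and teamInitials are assumptions for illustration.

```typescript
// null marks teams with no usable flag-icons sprite; entries here are a sample.
const TEAM_ISO = new Map<string, string | null>([
  ["Germany", "de"],
  ["Soviet Union", null],
  ["Yugoslavia", null],
  ["Czechoslovakia", null],
]);

// Unknown teams also return null; no substring guessing.
function getIso(team: string): string | null {
  return TEAM_ISO.get(team) ?? null;
}

// Hypothetical helper for the badge text: first letter of each word, max 3.
function teamInitials(team: string): string {
  return team
    .split(/\s+/)
    .filter((w) => w.length > 0)
    .map((w) => w.charAt(0).toUpperCase())
    .join("")
    .slice(0, 3);
}
```

A flag component can then branch on getIso(team) === null and render teamInitials(team) instead of a sprite.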
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- scripts/seed.ts: one-time import of Kaggle FIFA dataset (matches_1930_2022.csv,
world_cup.csv) covering all 964 matches and 2720 goals from 1930-2022 with full
scorer names, minutes, penalties, and own goals for every tournament
- scripts/sync.ts: stripped to 2026 only (openfootball live data); historical years
removed since Kaggle is now authoritative for 1930-2022
- Dockerfile: copy app/data into runner image; CMD runs seed.ts before server.js so
a fresh deployment auto-seeds on first start (skips if already seeded)
- package.json: add 'seed' script; use --force to re-import from updated CSV files
- app/data/kaggle/: bundle Kaggle CSV files in repo
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
syncGoals() ran DELETE FROM goals WHERE match_id=X at the top of each
goals1/goals2 pass, so processing goals2 (away team) wiped out the goals1
(home team) rows that had just been inserted. Every match with goals from
both sides lost all home-team goals, including Ronaldo's hat-trick vs Spain
and Kane's vs Panama.
Fix: move the DELETE above the goals1/goals2 loop so it executes once per match.
Result: 2018 goal count corrected from 107 to 169; hat tricks from 8 to 18.
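The difference between the two shapes can be reproduced with an in-memory stand-in for the goals table. GoalStore, syncSideBuggy, and syncGoalsFixed are hypothetical names, and the player labels are placeholders rather than real data.

```typescript
type Goal = { matchId: number; side: 1 | 2; player: string };

// In-memory stand-in for the goals table.
class GoalStore {
  rows: Goal[] = [];
  deleteForMatch(matchId: number): void {
    this.rows = this.rows.filter((r) => r.matchId !== matchId);
  }
  insert(goal: Goal): void {
    this.rows.push(goal);
  }
}

// Buggy shape: the DELETE runs at the start of every per-side pass,
// so the second pass removes what the first pass inserted.
function syncSideBuggy(store: GoalStore, matchId: number, side: 1 | 2, players: string[]): void {
  store.deleteForMatch(matchId);
  for (const player of players) store.insert({ matchId, side, player });
}

// Fixed shape: one DELETE per match, then both sides are inserted.
function syncGoalsFixed(store: GoalStore, matchId: number, goals1: string[], goals2: string[]): void {
  store.deleteForMatch(matchId);
  const sides: Array<[1 | 2, string[]]> = [[1, goals1], [2, goals2]];
  for (const [side, players] of sides) {
    for (const player of players) store.insert({ matchId, side, player });
  }
}
```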
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
openfootball uses 'West Germany' for 1954–1990 era matches. All DB references
(matches, goals, group_standings, squads) have been merged into the Germany
team on both local and VPS. TEAM_ALIASES map prevents re-creation on re-sync.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Drizzle ORM mutates client.options (parsers/serializers) after the postgres
client is created, which causes the separately-passed password option to be
lost on the actual connection attempt. Root cause confirmed on VPS: raw
postgres.js query succeeded while drizzle.execute() failed with auth error.
Fix: encode the password directly in DATABASE_URL (%23 = #, %5D = ], %3D = =).
postgres.js decodes percent-encoding correctly. No separate DB_PASSWORD env
var needed in the app container anymore.
DB_PASSWORD is still used by the Postgres container (POSTGRES_PASSWORD).
Coolify env var to set: DATABASE_URL=postgres://wc:<encoded-pass>@db:5432/worldcup
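One way to produce the encoded value is encodeURIComponent, sketched below. buildDatabaseUrl is a hypothetical helper and the sample password in the usage line is made up; the user, host, port, and database name follow the URL shown above.

```typescript
// Percent-encode the credentials so characters like #, ], and = survive URL parsing.
function buildDatabaseUrl(
  user: string,
  password: string,
  host: string,
  port: number,
  db: string,
): string {
  const u = encodeURIComponent(user);
  const p = encodeURIComponent(password);
  return `postgres://${u}:${p}@${host}:${port}/${db}`;
}

// Usage with a made-up password containing the three problem characters.
const exampleUrl = buildDatabaseUrl("wc", "a#b]c=d", "db", 5432, "worldcup");
// exampleUrl === "postgres://wc:a%23b%5Dc%3Dd@db:5432/worldcup"
```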
Also adds resolver-level isMissingTable() guards so the app returns empty
results instead of GraphQL errors on a fresh deploy before sync runs.
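A guard of this kind might be sketched as follows, assuming the driver error exposes the Postgres SQLSTATE on a code property (42P01 is undefined_table). isMissingTable is the name used in this commit, but its body here and the orEmpty wrapper are assumptions.

```typescript
// Postgres SQLSTATE for "relation does not exist".
const UNDEFINED_TABLE = "42P01";

// Assumes the error object carries the SQLSTATE as `code`; errors wrapped by
// an ORM (for example under `cause`) would need unwrapping first.
function isMissingTable(err: unknown): boolean {
  return (
    typeof err === "object" &&
    err !== null &&
    (err as { code?: unknown }).code === UNDEFINED_TABLE
  );
}

// Hypothetical resolver helper: empty list on a fresh DB, rethrow anything else.
async function orEmpty<T>(query: () => Promise<T[]>): Promise<T[]> {
  try {
    return await query();
  } catch (err) {
    if (isMissingTable(err)) return [];
    throw err;
  }
}
```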
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The sync script created its own postgres client and only read DATABASE_URL,
bypassing the DB_PASSWORD override that lib/db/index.ts already applied.
Since DATABASE_URL has no password embedded, auth always failed.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Full-stack World Cup web app (1930–2026):
- Next.js 16 + TailwindCSS 4 + GraphQL Yoga + Apollo Client 4 + Drizzle + PostgreSQL 16
- 23 tournaments synced from openfootball/worldcup.json (matches, goals, teams, stadiums, squads, standings)
- Pages: home (live), groups, stats, history, search, /tournaments/[year], /teams/[slug], /players/[name]
- Live match detection via isLive() + Apollo 60 s poll
- pnpm with node-linker=hoisted for Docker compatibility
- docker-compose.yml with Traefik labels (HTTPS redirect, TLS, security middleware)
- docker-compose.dev.yml for local dev (DB only, port 5432 exposed)
- Dockerfile: multi-stage pnpm build, standalone Next.js output, sync script bundled
- .env.example with all required variables documented
- Comprehensive README with local dev, deployment, schema, and GraphQL API reference
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>