The Stink: Methods and Data Sources
Dataset
4,067 finished Premier League fixtures across 11 seasons:
| Season | Matches | WhoScored | Mclachbot | Notes |
|---|---|---|---|---|
| 2015/16 | 380 | Yes | — | |
| 2016/17 | 380 | Yes | — | |
| 2017/18 | 380 | Yes | Yes | Mclachbot coverage begins |
| 2018/19 | 380 | Yes | Yes | |
| 2019/20 | 379 | Yes | Yes | 1 unscraped match |
| 2020/21 | 380 | Yes | Yes | |
| 2021/22 | 377 | Yes | Yes | 3 unscraped matches |
| 2022/23 | 380 | Yes | Yes | |
| 2023/24 | 380 | Yes | Yes | |
| 2024/25 | 380 | Yes | Yes | |
| 2025/26 | 271 | Yes | Yes | Through late Feb (GW27) — feed ended |
Total: 4,067 matches with WhoScored event-level data. Opta aggregated shot patterns (mclachbot) cover 9 seasons from 2017/18 onwards (180 team-season records).
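As a quick arithmetic check on the totals above, the season counts from the table can be summed directly. This is only a sketch: the variable names are illustrative and the figure of 20 teams per season is taken from the record counts quoted in this document.

```typescript
// Match counts copied from the season table above.
const matchesBySeason: [string, number][] = [
  ["2015/16", 380], ["2016/17", 380], ["2017/18", 380], ["2018/19", 380],
  ["2019/20", 379], ["2020/21", 380], ["2021/22", 377], ["2022/23", 380],
  ["2023/24", 380], ["2024/25", 380], ["2025/26", 271],
];

const totalMatches = matchesBySeason.reduce((sum, [, n]) => sum + n, 0);

// Mclachbot coverage starts in 2017/18; 20 teams per season.
// Season labels share one format, so string comparison orders them correctly.
const mclachbotSeasons = matchesBySeason.filter(([season]) => season >= "2017/18").length;
const teamSeasonRecords = mclachbotSeasons * 20;

console.log(totalMatches, mclachbotSeasons, teamSeasonRecords); // 4067 9 180
```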
Data sources
WhoScored / Opta event data
- Match-level event streams scraped from WhoScored and stored in Postgres (whoscored_match_data)
- Exported to src/data/pl-meta/whoscored-events.json via scripts/export-whoscored-pl-meta.ts
- Shot context determined by Opta qualifiers: 22 (RegularPlay), 23 (FastBreak), 24 (SetPiece), 25 (FromCorner), 26 (DirectFreekick)
- Throw-in set pieces tagged via qualifier 160 (ThrowinSetPiece): shots from throw-in sequences that do not carry the standard shot-context qualifiers. Counted under set pieces in the export
- Penalties identified by qualifier 9; own goals excluded (qualifier 28)
- Body part: 15 (Head), 20 (RightFoot), 72 (LeftFoot)
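The qualifier mapping above could be sketched as a small classifier. The ShotEvent shape, the function names, and the precedence order between qualifiers are assumptions made for illustration; the actual export script may resolve overlapping qualifiers differently.

```typescript
// Hypothetical event shape: only the qualifier IDs attached to a shot.
type ShotEvent = { qualifierIds: number[] };

type ShotContext =
  | "own_goal" | "penalty" | "corner" | "direct_free_kick"
  | "set_piece" | "counter_attack" | "open_play";

// Qualifier IDs as listed in this document.
const Q = {
  penalty: 9, ownGoal: 28, regularPlay: 22, fastBreak: 23,
  setPiece: 24, fromCorner: 25, directFreekick: 26, throwinSetPiece: 160,
} as const;

// Assumed precedence: own goal, penalty, corner, direct free kick,
// set piece (including throw-in set pieces), fast break, else open play.
function classifyShot(shot: ShotEvent): ShotContext {
  const has = (id: number) => shot.qualifierIds.includes(id);
  if (has(Q.ownGoal)) return "own_goal"; // excluded from shot totals
  if (has(Q.penalty)) return "penalty";
  if (has(Q.fromCorner)) return "corner";
  if (has(Q.directFreekick)) return "direct_free_kick";
  if (has(Q.setPiece) || has(Q.throwinSetPiece)) return "set_piece";
  if (has(Q.fastBreak)) return "counter_attack";
  return "open_play";
}

// Dead ball = set piece + corner + direct free kick; penalties are excluded.
function isDeadBall(context: ShotContext): boolean {
  return context === "set_piece" || context === "corner" || context === "direct_free_kick";
}
```

Under these assumptions, a header carrying only qualifiers 160 and 15 is classified as a set-piece shot and counts as a dead ball, while a shot carrying qualifier 9 does not.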
Opta aggregated shot patterns (mclachbot)
- Source: data/mclachbot-shots/aggregated-by-team-season.json, covering 9 seasons (2017/18 to 2025/26), 180 team-season records, 32 unique teams
- Per-team-season breakdown: total shots/goals/xG, dead ball shots/goals/xG, pattern-level splits (corner, set_piece, free_kick, throw_in), including header sub-counts per pattern
- Includes against-side totals (shots/goals/xG conceded) for each pattern
- Current-season granular data in src/data/pl-meta/mclachbot-aggregated/: one file per restart type (corner, set_piece, free_kick, throw_in), 2025/26 only, 20 teams. Includes attack and defence sides per team
SportMonks Football API
- src/data/pl-meta/fixture-stats.json: fixture-level statistics for 5 seasons (2021/22 to 2025/26), including corner counts
- src/data/pl-meta/squad-physicals.json: starts-weighted average height per team-season, 6 seasons (2020/21 to 2025/26). Only 2025/26 used in the article (height vs header conversion scatter)
- src/data/pl-meta/header-shots-by-club.json: per-club header shots, goals, conversion rate, and xG for 2025/26
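A starts-weighted average height, as used in squad-physicals.json, could be computed along these lines. The record shape and field names are hypothetical; only the weighting idea comes from the description above.

```typescript
// Hypothetical per-player record; field names are assumptions.
type PlayerHeight = { heightCm: number; starts: number };

// Starts-weighted mean: each player's height counts once per start.
// Returns null when the squad has no recorded starts.
function startsWeightedHeight(players: PlayerHeight[]): number | null {
  const totalStarts = players.reduce((sum, p) => sum + p.starts, 0);
  if (totalStarts === 0) return null;
  const weighted = players.reduce((sum, p) => sum + p.heightCm * p.starts, 0);
  return weighted / totalStarts;
}
```

For example, a 190 cm player with 30 starts and a 170 cm player with 10 starts give (190 × 30 + 170 × 10) / 40 = 185 cm, closer to the regular starter than the plain mean of 180 cm.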
Volley chart data (Sankey + suppression)
- src/data/pl-meta/volley-chart-data.json: 2025/26 only, 20 teams
- Sankey: set-piece shots conceded, structured as flows from source → phase → body part → outcome. Built from WhoScored event data with custom sequence analysis
- "Second phase" = any follow-up shot within 20 seconds of the initial restart event
- Suppression: dead-ball shots faced, goals conceded, and xG conceded per team, broken down by restart type (corner, set piece, free kick, throw-in)
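The 20-second second-phase rule could be sketched as follows. The input shape (one restart time plus the shot times that follow it) and the treatment of the earliest shot as the first ball are assumptions; the article's own sequence analysis is not shown in this document.

```typescript
type Phase = "first_ball" | "second_phase" | "outside_window";

const SECOND_PHASE_WINDOW_SECONDS = 20;

// Tags each shot in one restart sequence. Times are match seconds.
// Assumption: the earliest shot is the first ball; later shots count as
// second phase only if they fall within 20 seconds of the restart.
function tagPhases(restartSecond: number, shotSeconds: number[]): Phase[] {
  const ordered = [...shotSeconds].sort((a, b) => a - b);
  return ordered.map((t, i): Phase => {
    if (i === 0) return "first_ball";
    return t - restartSecond <= SECOND_PHASE_WINDOW_SECONDS
      ? "second_phase"
      : "outside_window";
  });
}
```

With a restart at second 100 and shots at 103, 110 and 125, the first is the first ball, the second falls in the second phase, and the third is outside the window because 25 seconds have elapsed.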
Definitions
- Dead ball: set piece (including throw-in set pieces, qualifier 160) + from corner + direct free kick. Excludes penalties. This is the article's primary unit of analysis
- Corner conversion: goals scored from corner-sourced shots / total shots from corners (not corners taken)
- Restart types: corners, indirect free kicks (set_piece in Opta), direct free kicks (free_kick), and throw-in set-piece sequences (throw_in)
- Second phase: follow-up shots occurring within 20 seconds of the initial restart event. Used in the Sankey diagram
- Big Six: Arsenal, Chelsea, Liverpool, Manchester City, Manchester United, Tottenham Hotspur
- xG (expected goals): Opta-provided xG values. This article does not use a custom xG model
- Rolling window: 20-match rolling average used for the construction timeline chart
Chart methodology
1. Where goals come from (stacked area, 11 seasons)
- Source: whoscored-events.json (4,067 matches, 2015/16 to 2025/26)
- Per-season aggregation of all shots and goals by context: open play, dead ball, counter attack, penalty
- Toggles between % of goals and % of shots. "Focus" view isolates a single segment
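The percentage view behind the stacked area chart amounts to normalising each season's counts by context. A minimal sketch, with a hypothetical ContextCounts shape that works for either goals or shots:

```typescript
// Hypothetical per-season counts (goals or shots) by context.
type ContextCounts = {
  openPlay: number;
  deadBall: number;
  counterAttack: number;
  penalty: number;
};

// Converts raw counts into percentage shares of the season total.
// Returns all zeros for an empty season rather than dividing by zero.
function toPercentShares(counts: ContextCounts): ContextCounts {
  const total = counts.openPlay + counts.deadBall + counts.counterAttack + counts.penalty;
  const pct = (n: number) => (total === 0 ? 0 : (100 * n) / total);
  return {
    openPlay: pct(counts.openPlay),
    deadBall: pct(counts.deadBall),
    counterAttack: pct(counts.counterAttack),
    penalty: pct(counts.penalty),
  };
}
```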
2. The structural break (league-level trends, 9 seasons)
- Source: aggregated-by-team-season.json (2017/18 to 2025/26)
- Three league-wide metrics per season: dead-ball shot share (intent), dead-ball goal share (outcome), corner conversion (execution)
- Also counts teams with 25%+ of goals from dead balls each season
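The three league-wide metrics and the 25% team count could be derived from one season's team records roughly as below. The TeamSeason field names are assumptions, not the actual keys in aggregated-by-team-season.json, and the league figures here pool counts across teams rather than averaging team-level rates.

```typescript
// Hypothetical team-season record; field names are assumptions.
type TeamSeason = {
  team: string;
  shots: number;
  goals: number;
  deadBallShots: number;
  deadBallGoals: number;
  cornerShots: number;
  cornerGoals: number;
};

function leagueMetrics(rows: TeamSeason[]) {
  const sum = (pick: (r: TeamSeason) => number) => rows.reduce((s, r) => s + pick(r), 0);
  const ratio = (num: number, den: number) => (den === 0 ? 0 : num / den);
  return {
    // Intent: share of all shots that come from dead balls.
    deadBallShotShare: ratio(sum((r) => r.deadBallShots), sum((r) => r.shots)),
    // Outcome: share of all goals that come from dead balls.
    deadBallGoalShare: ratio(sum((r) => r.deadBallGoals), sum((r) => r.goals)),
    // Execution: goals per corner-sourced shot (not per corner taken).
    cornerConversion: ratio(sum((r) => r.cornerGoals), sum((r) => r.cornerShots)),
    // Teams scoring at least 25% of their goals from dead balls.
    teamsAbove25Pct: rows.filter((r) => ratio(r.deadBallGoals, r.goals) >= 0.25).length,
  };
}
```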
3. The miasma theory (height vs header conversion, 2025/26)
- Height: starts-weighted average from squad-physicals.json (2025/26 only)
- Header conversion: from header-shots-by-club.json (2025/26 only)
- Rank comparison chart: teams ranked by height on the left axis and by header conversion on the right, with connecting lines
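The two rankings behind the comparison chart could be produced with a helper like the one below. The ClubMetric shape is hypothetical, and tie handling (ties keep input order, since the sort is stable) is an assumption rather than the article's documented behaviour.

```typescript
// Hypothetical merged record per club; field names are assumptions.
type ClubMetric = { team: string; heightCm: number; headerConversion: number };

// Rank 1 = highest value. Ties keep their input order.
function rankBy(
  rows: ClubMetric[],
  key: "heightCm" | "headerConversion",
): Map<string, number> {
  const sorted = [...rows].sort((a, b) => b[key] - a[key]);
  return new Map(sorted.map((r, i): [string, number] => [r.team, i + 1]));
}

// Pairs each club's height rank with its header-conversion rank,
// which is what the connecting lines in the chart would join.
function rankPairs(rows: ClubMetric[]) {
  const byHeight = rankBy(rows, "heightCm");
  const byHeader = rankBy(rows, "headerConversion");
  return rows.map((r) => ({
    team: r.team,
    heightRank: byHeight.get(r.team) ?? null,
    headerRank: byHeader.get(r.team) ?? null,
  }));
}
```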
4. Engineered corners (corner xG per shot, 9 seasons)
- Source: aggregated-by-team-season.json, corner pattern
- xG per corner shot for Arsenal and Manchester City, 2017/18 to 2025/26, with coaching change markers (Jover to City 2019, Jover to Arsenal 2021)
5. Construction timeline (rolling 20-match, 11 seasons)
- Source: whoscored-events.json, filtered to Arsenal and Man City
- Rolling 20-match window: dead-ball conversion rate (goals/shots) and dead-ball shot share (shots/total shots)
- Coaching events marked: Jover → City (July 2019), Jover → Arsenal (July 2021)
- Toggle between "Jover focus" (zoomed to coaching-change window) and "Full history"
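The rolling 20-match series could be computed as sketched here. The MatchTotals shape is hypothetical, and pooling shots and goals across the window before taking ratios (rather than averaging per-match ratios) is an assumption about how the chart is built.

```typescript
// Hypothetical per-match totals for one team, in chronological order.
type MatchTotals = { shots: number; deadBallShots: number; deadBallGoals: number };

type RollingPoint = { conversion: number | null; shotShare: number | null };

// One point per full window; the first point covers matches 1..size.
function rollingDeadBall(matches: MatchTotals[], size = 20): RollingPoint[] {
  const points: RollingPoint[] = [];
  for (let end = size; end <= matches.length; end++) {
    const win = matches.slice(end - size, end);
    const shots = win.reduce((s, m) => s + m.shots, 0);
    const dbShots = win.reduce((s, m) => s + m.deadBallShots, 0);
    const dbGoals = win.reduce((s, m) => s + m.deadBallGoals, 0);
    points.push({
      // Dead-ball goals per dead-ball shot; null when no dead-ball shots.
      conversion: dbShots === 0 ? null : dbGoals / dbShots,
      // Dead-ball shots as a share of all shots; null when no shots.
      shotShare: shots === 0 ? null : dbShots / shots,
    });
  }
  return points;
}
```

A team with fewer matches than the window size produces an empty series, so a full 380-match season yields 361 points per team at the default window of 20.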
6. Different pipes (restart mix, 2025/26)
- Source: mclachbot-aggregated/ current-season files
- Stacked bar showing share of set-piece xG (or goals) by restart type: corner, indirect FK, throw-in, direct FK
- Toggles: xG vs goals, attack ("for") vs defence ("against"). Five featured teams plus league average
7. Who drags the average? (throw-in index, 2025/26)
- Source: mclachbot-aggregated/ current-season files
- Throw-in share of total set-piece xG (or goals), ranked across all 20 teams, with league average reference line
- Shows attack and defence sides
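The restart mix and the throw-in index both reduce to shares of set-piece xG (or goals) by restart type. A sketch, assuming a hypothetical RestartTotals record per team and a league average that pools values across teams; a mean of team-level shares would give a different reference line.

```typescript
// Hypothetical per-team totals (xG or goals) by restart type.
type RestartTotals = {
  team: string;
  corner: number;
  setPiece: number; // indirect free kicks
  freeKick: number; // direct free kicks
  throwIn: number;
};

const setPieceTotal = (r: RestartTotals) => r.corner + r.setPiece + r.freeKick + r.throwIn;

// Throw-in share of a team's total set-piece value; 0 when the total is 0.
function throwInShare(r: RestartTotals): number {
  const total = setPieceTotal(r);
  return total === 0 ? 0 : r.throwIn / total;
}

// Ranks teams by throw-in share (highest first) and adds a pooled league average.
function throwInIndex(rows: RestartTotals[]) {
  const ranked = rows
    .map((r) => ({ team: r.team, share: throwInShare(r) }))
    .sort((a, b) => b.share - a.share);
  const leagueThrowIn = rows.reduce((s, r) => s + r.throwIn, 0);
  const leagueTotal = rows.reduce((s, r) => s + setPieceTotal(r), 0);
  return { ranked, leagueAverage: leagueTotal === 0 ? 0 : leagueThrowIn / leagueTotal };
}
```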
8. Brentford's throw (throw-in volume over time)
- Source: aggregated-by-team-season.json, Brentford only, throw_in pattern
- Bars: throw-in shots per season. Line: xG per throw-in shot. Covers Brentford's Premier League seasons (2021/22 to 2025/26)
9. Set-piece shots against (Sankey, 2025/26)
- Source: volley-chart-data.json, sankey section
- Flow diagram: restart source → phase (first ball / second phase) → body part → outcome (goal / on target / off target / blocked)
- Team selector compares any club against the league aggregate
10. Attack and defence (suppression dot plot, 2025/26)
- Defence side: volley-chart-data.json suppression data (dead-ball goals conceded)
- Attack side: mclachbot-aggregated/all-dead-ball.json (dead-ball goals scored)
- Sorted by net dead-ball goals (scored minus conceded)
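Because the attack and defence figures come from different files, the dot plot needs a join by team before sorting. A minimal sketch, assuming both sources can be reduced to a hypothetical TeamGoals list with matching team names:

```typescript
// Hypothetical reduced record from either source.
type TeamGoals = { team: string; goals: number };

type NetRow = { team: string; scored: number; conceded: number; net: number };

// Joins attack and defence totals by team name and sorts by net goals,
// best first. Teams missing from either source are skipped (assumption).
function netDeadBallGoals(scored: TeamGoals[], conceded: TeamGoals[]): NetRow[] {
  const concededByTeam = new Map(conceded.map((r): [string, number] => [r.team, r.goals]));
  const rows: NetRow[] = [];
  for (const s of scored) {
    const against = concededByTeam.get(s.team);
    if (against === undefined) continue;
    rows.push({ team: s.team, scored: s.goals, conceded: against, net: s.goals - against });
  }
  return rows.sort((a, b) => b.net - a.net);
}
```

Since the two files are built from different feeds, team names may need normalising before the join; that step is not shown here.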
Caveats
- 2025/26 is in progress: 271 of 380 matches in WhoScored data. All current-season charts and statistics will change as the season completes.
- WhoScored data has small gaps: 2019/20 is missing 1 match (379 of 380), 2021/22 is missing 3 matches (377 of 380). These are unscraped fixtures and should not materially affect aggregates.
- "Set piece" categorisation relies on Opta qualifier tagging. Qualifier 160 (ThrowinSetPiece) was discovered during analysis — some shots from throw-in sequences were initially unclassified. The export script now counts them correctly.
- The mclachbot aggregated dataset starts at 2017/18. Charts using this source cover 9 seasons, not 11. The first two seasons (2015/16 and 2016/17) appear only in the stacked area and construction timeline charts, which use WhoScored event data directly.
- Height data is starts-weighted squad average. It does not capture set-piece-specific personnel (e.g. a team may bring on a tall substitute for late corners). Single-season snapshot (2025/26).
- Corner conversion is volatile at team level with small samples. A single season of 20–40 corner shots per club is insufficient to distinguish skill from variance. League-wide rates (400+ corner shots) are more stable.
- The Sankey "second phase" is defined as follow-up shots within 20 seconds of the initial restart event. This window is arbitrary but consistent across all teams and matches.
- xG values are Opta-provided. This article does not use a custom expected goals model. Opta xG is a proprietary model and its methodology is not publicly documented in full.
- Single-season charts (scatter, restart mix, Sankey, suppression) reflect a snapshot of 2025/26 at the time of data export. They do not show trends and should not be extrapolated.
- The Man City decline table shows selected seasons (2017/18, 2021/22, 2024/25, 2025/26) rather than all 9 available. This is an editorial choice to highlight the arc: pre-Jover, peak, and decline.
- The Arsenal pre/post-Jover xG comparison uses 2017/18–2020/21 as "before" and 2021/22–2024/25 as "with Jover" (4 seasons each). 2025/26 is excluded from the "with" average because it is incomplete.