Fantasy Football as a Stats Class: Teaching Data Literacy with FPL

Unknown

2026-01-30

Teach probability and predictive modelling using Fantasy Premier League stats. A semester module with labs, projects, and active-recall techniques.

Hook: Turn student boredom into real-world data skills with Fantasy Premier League

Students often find probability and predictive modelling abstract, disconnected from daily life, and hard to retain. Use Fantasy Premier League (FPL) as the semester-long lab: real, messy, timely sports data that motivates decisions every gameweek. By the end of the module, learners will read team news, clean live data feeds, build predictive models, estimate probabilities, and make risk-aware decisions — all while practicing active recall and spaced repetition study techniques.

Why FPL is the perfect data-literacy playground in 2026

Recent trends (late 2025–early 2026) make sports data more accessible and pedagogically powerful:

Granular event data and richer open APIs mean students can work with xG, expected assists, pressures, and event timelines, not just goals and minutes.
Classroom-friendly tools like JupyterHub/Colab, cloud notebooks, and lightweight AutoML have lowered barriers to hands-on modelling for undergrads and high-schoolers.
Open-source event-level repositories and widely published team news (injuries, rotations, transfers) can feed classroom pipelines, which makes them a natural vehicle for teaching Bayesian updating and decision revision under uncertainty.

That combination — large, real datasets + immediate, consequential decisions — makes FPL ideal to teach the full statistics lifecycle: data collection, cleaning, exploratory analysis, modelling, evaluation, and decision-making.

Course overview: A semester-long module (12–14 weeks)

The module below assumes weekly FPL gameweeks and a semester of 12–14 sessions (one session per week, 2–3 hours each). Each week pairs a statistical concept with an FPL-focused lab, plus low-stakes retrieval practice to reinforce learning.

Learning goals

Interpret and compute probabilities from sports events.
Develop predictive models (regression, classification, tree ensembles, time-series).
Perform feature engineering using team news and advanced metrics.
Make evidence-based decisions under uncertainty and explain trade-offs.
Apply active recall and spaced repetition for durable retention.

Weekly syllabus (12–14 weeks)

Week 1 — Orientation & data sources:
- Intro to FPL mechanics, common stats, and key data sources: official FPL endpoints, community-maintained FPL datasets, event-level repositories, and weekly team-news scrapes (BBC or club team news feeds).
- Lab: Access an FPL dataset; explore player, team, and fixture tables.
- Retrieval practice: Quick quiz on definitions (xG, minutes, clean sheets).
Week 2 — Descriptive statistics & visualization:
- Measures of central tendency, dispersion, and distributions applied to player points and xG.
- Lab: Visualize distributions (histograms, boxplots) and spot outliers (e.g., burst weeks).
Week 3 — Probability basics & event modelling:
- Discrete vs continuous distributions, Poisson processes for goals, and modeling rare events.
- Lab: Estimate goal probabilities using Poisson and discuss limits (overdispersion).
Week 4 — Hypothesis testing & confidence intervals:
- t-tests, chi-square tests for categorical outcomes, and bootstrap confidence intervals for player means.
- Lab: Test whether Team A concedes significantly more chances than Team B.
Week 5 — Feature engineering & team news:
- Turn raw event feeds and team news (injuries, rotation risk) into predictive features.
- Lab: Build “rotation risk” and “fixture difficulty” features using team news scrapes and opponent stats.
Week 6 — Regression models (linear & Poisson):
- Use GLMs to predict expected points and goals; discuss assumptions and diagnostics.
- Lab: Fit Poisson regression for goals; compare to baseline mean predictor.
Week 7 — Classification & probability calibration:
- Logistic regression, probability calibration, and interpreting predicted probabilities for events like “player >6 points”.
- Lab: Build classifier for “big-game week” performance and evaluate Brier score or calibration plots.
Week 8 — Tree-based models & explainability:
- Decision trees, random forests, gradient boosting; use SHAP or partial dependence to explain predictions.
- Lab: Train XGBoost/LightGBM to predict player points and produce explanation plots for captaincy picks.
Week 9 — Time-series & concept drift:
- Rolling windows, exponential smoothing, and handling concept drift from transfers or managerial changes.
- Lab: Build rolling average features for player form; detect drift after transfer windows.
Week 10 — Cross-validation & model evaluation:
- Appropriate CV for time-ordered sports data, loss functions for points prediction, and backtesting.
- Lab: Backtest a season-long strategy (e.g., captaincy heuristic vs model-based picks) and examine how the models behave under distribution shift.
Week 11 — Decision theory & risk management:
- Expected utility, opportunity cost, portfolio thinking (team selection), and chip/tactical choices in FPL.
- Lab: Create expected-utility functions for transfers and captaincy choices under uncertainty.
Week 12 — Ensemble strategies & automated pipelines:
- Combine models, automate data pulls, and implement a simple pick-recommendation pipeline.
- Lab: Deploy a weekly recommendation notebook; schedule automated data pulls and alerts for team news.
Weeks 13–14 — Final projects & presentations:
- Student teams present a predictive model, live demo, and decision framework for a full gameweek strategy.
- Assessment: Model performance, interpretability, and a short pedagogy piece explaining how decisions were driven by data.
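As a starting point for the Week 3 lab, here is a minimal sketch of the Poisson calculation using only the Python standard library. The xG value and the goal counts are made-up illustrations, not real FPL data, and the function names are hypothetical.

```python
import math
import statistics


def poisson_pmf(k, lam):
    """P(X = k) for a Poisson-distributed count with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def prob_at_least(k, lam):
    """P(X >= k), computed as 1 minus the probability of fewer than k events."""
    return 1.0 - sum(poisson_pmf(i, lam) for i in range(k))


def dispersion_ratio(goal_counts):
    """Sample variance divided by mean; values well above 1 hint at overdispersion."""
    return statistics.variance(goal_counts) / statistics.fmean(goal_counts)


# Hypothetical input: a team expected to score 1.4 goals in its next match.
team_xg = 1.4
p_blank = poisson_pmf(0, team_xg)        # probability the team fails to score
p_two_plus = prob_at_least(2, team_xg)   # probability of two or more goals

# Made-up goals-per-match sample used to check the Poisson assumption.
goals_per_match = [0, 1, 1, 2, 3, 0, 1, 4]
ratio = dispersion_ratio(goals_per_match)

print(f"P(0 goals) = {p_blank:.3f}, P(2+ goals) = {p_two_plus:.3f}")
print(f"variance/mean = {ratio:.2f}")
```

A Poisson variable has variance equal to its mean, so a ratio noticeably above 1 is a simple prompt for the overdispersion discussion.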

Instructional design: Active learning and spaced repetition

Design each week to combine short lectures, a hands-on lab, and retrieval practice. Integrate active recall with low-stakes quizzes and use spaced repetition for key formulas and modelling recipes.

Use daily/weekly flashcards (Anki or Quizlet) for distributions, model assumptions, and evaluation metrics.
Weekly “two-minute recall” at the start of labs: students recite one theorem, one model diagnostic, and one feature-engineering trick.
End-of-week tiny projects that take 15–45 minutes to encourage repeated practice and retention.

Datasets, APIs and tools: Practical setup (2026)

Recommended stack for reproducible, classroom-friendly workflows:

Python (Pandas, scikit-learn, statsmodels, xgboost/lightgbm, SHAP)
R (tidyverse, caret, mgcv) for stats-first classes
JupyterHub/Colab for managed student environments
Versioning: GitHub classroom for submissions and reproducibility
APIs & data: Official FPL data endpoints, community-maintained FPL datasets, event-level repositories, and weekly team news scrapes (e.g., BBC/club updates) for rotation/injury features

Tip: Some data providers publish free or sanitized datasets for teaching and research. Prefer these where possible to avoid licensing problems and to protect classroom privacy.

Sample labs and assignments (actionable templates)

Lab: Build a “Captaincy Probability” model (2–3 hours)

Objective: Predict the probability a chosen player will score >8 FPL points in the upcoming gameweek (a common definition of a successful captain pick).
Data: Past 2 seasons of player match-level stats, opponent defensive metrics, recent minutes, and latest team news (injury/doubt flags).
Steps:
- Feature engineering: rolling 3–6 match means, fixture difficulty, home/away, opponent xG conceded, rotation risk indicator.
- Train logistic regression and a gradient-boosted tree; calibrate probabilities and compare Brier scores.
- Explain top drivers using SHAP; produce a 1-slide recommendation for captaincy with uncertainty bands.
Deliverable: Jupyter notebook, calibration plot, and a short rationale (≤200 words).
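The feature-engineering step is where leakage most often creeps in, so a small sketch may help. It assumes a player's gameweek points arrive as a plain list in chronological order; the function names and sample scores are hypothetical. Each rolling mean uses only earlier matches, so the feature never sees the score it is meant to predict.

```python
from collections import deque


def rolling_form(points, window=3):
    """For each match, the mean of up to `window` previous scores.

    Returns None for the first match, where no history exists yet.
    """
    history = deque(maxlen=window)
    features = []
    for score in points:
        features.append(sum(history) / len(history) if history else None)
        history.append(score)  # added only after the feature is recorded
    return features


def captain_labels(points, threshold=8):
    """1 if the player scored more than `threshold` points, else 0."""
    return [int(score > threshold) for score in points]


# Made-up points for one player over five gameweeks.
points = [2, 6, 12, 1, 9]
form = rolling_form(points, window=3)
labels = captain_labels(points)

for gw, (x, y) in enumerate(zip(form, labels), start=1):
    print(f"GW{gw}: form={x} label={y}")
```

Pairing `form` with `labels` row by row gives a leak-free training table that the logistic regression or gradient-boosted model can consume.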

Assignment: Transfer decision under uncertainty (take-home)

Students have one free transfer and must choose between two players. Provide model-based expected points distributions and team news scenarios (best case / worst case).
Students compute expected utility for each choice, factor in hit penalties (each transfer beyond the free allowance costs 4 points), and justify their pick using probability statements (e.g., “Player A has a 35% chance to outscore Player B by ≥4 points”).
Grading emphasizes reasoning and uncertainty communication, not raw model accuracy.
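One way students might produce those probability statements is a small Monte Carlo simulation. The sketch below is an assumption-heavy illustration: each player's score is modelled as a normal draw (which can go negative) gated by a probability of featuring, and all the numbers are invented.

```python
import random


def simulate_points(mean, sd, p_plays, rng):
    """One simulated gameweek score: 0 if the player does not feature, else a normal draw."""
    if rng.random() > p_plays:
        return 0.0
    return rng.gauss(mean, sd)


def compare_transfer(player_a, player_b, margin=4.0, hit_cost=0.0,
                     n_sims=20000, seed=42):
    """Return (P(A outscores B by >= margin), expected gain of A over B net of hit_cost).

    Each player is a hypothetical (mean, sd, p_plays) tuple.
    """
    rng = random.Random(seed)  # fixed seed so results are reproducible
    diffs = [
        simulate_points(*player_a, rng) - simulate_points(*player_b, rng)
        for _ in range(n_sims)
    ]
    p_margin = sum(d >= margin for d in diffs) / n_sims
    expected_gain = sum(diffs) / n_sims - hit_cost
    return p_margin, expected_gain


# Invented scenario: A has the higher ceiling but is an injury doubt.
player_a = (7.0, 3.0, 0.8)
player_b = (5.0, 3.0, 1.0)

p_margin, gain_free = compare_transfer(player_a, player_b)
_, gain_with_hit = compare_transfer(player_a, player_b, hit_cost=4.0)

print(f"P(A beats B by 4+) = {p_margin:.2f}")
print(f"Expected gain, free transfer = {gain_free:.2f}")
print(f"Expected gain, after a 4-point hit = {gain_with_hit:.2f}")
```

Running the comparison with and without `hit_cost` makes the trade-off concrete: a transfer that looks attractive as a free move can have negative expected value once the penalty is included.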

Assessment & rubrics

Balance technical skill with communication and decision-making. Suggested weighting:

Weekly labs & quizzes — 30% (encourages regular retrieval practice)
Midterm project (model + write-up) — 25%
Final team project & presentation — 35%
Participation & peer review — 10%

Rubric highlights:

Reproducibility: notebooks run start-to-finish.
Model validation: appropriate backtesting and time-aware CV.
Communication: concise decision memo aimed at an FPL manager.
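To make "time-aware CV" concrete for graders and students, here is a minimal expanding-window splitter. It is a sketch with hypothetical names and defaults; scikit-learn's `TimeSeriesSplit` offers a comparable, more complete tool.

```python
def expanding_window_splits(gameweeks, min_train=5, horizon=1):
    """Yield (train_gws, test_gws) pairs in which every test gameweek
    comes after all training gameweeks."""
    ordered = sorted(set(gameweeks))
    for end in range(min_train, len(ordered) - horizon + 1):
        yield ordered[:end], ordered[end:end + horizon]


# Illustrative season fragment: gameweeks 1 to 8.
splits = list(expanding_window_splits(range(1, 9), min_train=5))

for train_gws, test_gws in splits:
    print(f"train on GW{train_gws[0]}-GW{train_gws[-1]}, test on GW{test_gws}")
```

Because each fold trains only on the past, the backtest mimics the information a manager actually has before a deadline, which ordinary shuffled k-fold does not.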

Classroom examples & mini case studies

Use short case studies to cement concepts. Example:

A manager must decide whether to transfer in a forward who has high xG but is flagged as doubtful in team news. Students apply Bayesian updating: the prior model predicts a 55% chance of ≥6 points. After a late team news update indicating a likely absence, the updated probability drops to 20%, so the model recommends holding the transfer and searching for alternatives. Students then simulate season outcomes to estimate the long-term ROI of that conservative strategy.

That case ties probability, team news scraping, and decision theory into a single classroom moment.
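The update in this case study can be sketched in a few lines. Every input below is an assumption chosen for illustration so that the result lands near the 55% and 20% figures above; they are not estimates from real data. The idea is to update the probability that the player starts using Bayes' rule, then combine it with the chance of a good score in each scenario.

```python
def bayes_update(prior, lik_if_true, lik_if_false):
    """Posterior P(H | evidence) from a prior P(H) and the two likelihoods."""
    evidence = prior * lik_if_true + (1 - prior) * lik_if_false
    return prior * lik_if_true / evidence


def prob_good_score(p_start, p_good_if_start, p_good_if_not):
    """Law of total probability over the start / no-start scenarios."""
    return p_start * p_good_if_start + (1 - p_start) * p_good_if_not


# Assumed inputs (illustrative only).
p_start_prior = 0.90        # belief the forward starts, before the news
p_news_if_start = 0.04      # chance of a "likely absent" report if he does start
p_news_if_not = 0.90        # chance of the same report if he does not start
p_good_if_start = 0.60      # P(>= 6 points | starts)
p_good_if_not = 0.05        # P(>= 6 points | does not start)

p_start_post = bayes_update(p_start_prior, p_news_if_start, p_news_if_not)
p_before = prob_good_score(p_start_prior, p_good_if_start, p_good_if_not)
p_after = prob_good_score(p_start_post, p_good_if_start, p_good_if_not)

print(f"P(start): {p_start_prior:.2f} -> {p_start_post:.2f}")
print(f"P(>=6 points): {p_before:.3f} -> {p_after:.3f}")
```

Asking students to vary `p_news_if_start` shows how much the conclusion depends on how reliable the team news source is.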

Addressing common classroom challenges

Data messiness: Teach data cleaning as a skill; give starter notebooks with noisy examples and a checklist (missingness, duplicates, timezone issues in match timestamps).
Math anxiety: Use applied examples first (visual intuition), then formalize formulas. Use Anki flashcards for critical equations.
Group work fairness: Use GitHub assignment templates and require a short author contribution statement.
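The data-cleaning checklist can be turned into a short starter function. This is a sketch under stated assumptions: rows are dictionaries with hypothetical `player_id`, `kickoff` (ISO 8601 string with a UTC offset), and `minutes` fields, and the sample rows are invented.

```python
from datetime import datetime, timezone


def clean_match_rows(rows):
    """Convert kickoffs to UTC, drop rows that repeat a (player_id, kickoff)
    pair, and count rows with missing minutes."""
    seen = set()
    cleaned = []
    missing_minutes = 0
    for row in rows:
        # Normalise the timestamp so the same instant compares equal
        # regardless of the offset it was recorded in.
        kickoff_utc = datetime.fromisoformat(row["kickoff"]).astimezone(timezone.utc)
        key = (row["player_id"], kickoff_utc)
        if key in seen:
            continue  # duplicate of a row already kept
        seen.add(key)
        if row.get("minutes") is None:
            missing_minutes += 1
        cleaned.append({**row, "kickoff": kickoff_utc})
    return cleaned, missing_minutes


# Invented noisy input: the second row is the first one in another timezone.
raw_rows = [
    {"player_id": 1, "kickoff": "2026-01-17T15:00:00+00:00", "minutes": 90},
    {"player_id": 1, "kickoff": "2026-01-17T16:00:00+01:00", "minutes": 90},
    {"player_id": 2, "kickoff": "2026-01-17T15:00:00+00:00", "minutes": None},
]

cleaned, missing_minutes = clean_match_rows(raw_rows)
print(f"{len(cleaned)} rows kept, {missing_minutes} with missing minutes")
```

The timezone duplicate is a useful teaching case: the two rows look different as strings but describe the same match.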

Ethics, reproducibility & 2026 considerations

When using live sports data in teaching, cover ethical and reproducibility topics:

Respect data licensing and give attribution for proprietary datasets.
Discuss model misuse and gambling risks; emphasize the course is about statistical reasoning, not guaranteed betting strategies.
Teach model explainability and responsible reporting (especially when recommending risky transfers).

Also, highlight 2026 trends: automated model explainers (SHAP, LIME) are now classroom staples, and educational AutoML tools help novices prototype models quickly while keeping instructors focused on interpretation and decision trade-offs. Consider edge-deployed or on-device personalization for privacy-sensitive classroom demos.

Measuring success: Learning outcomes & metrics

Evaluate student progress using these measurable outcomes:

Ability to produce a reproducible model and backtest it on unseen gameweeks.
Measured improvement in probabilistic calibration (e.g., Brier score reduction) between midterm and final projects.
Student self-reported confidence in making data-driven decisions (pre/post surveys).
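For the calibration outcome, the Brier score is simple enough to compute by hand. The sketch below uses invented forecasts for four events to show what a midterm-to-final improvement could look like; scikit-learn's `brier_score_loss` computes the same quantity.

```python
def brier_score(probs, outcomes):
    """Mean squared difference between forecast probabilities and 0/1 outcomes.

    Lower is better; always forecasting 0.5 scores 0.25.
    """
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


# Invented example: did each of four captain picks exceed the points threshold?
outcomes = [1, 0, 0, 1]
midterm_probs = [0.9, 0.8, 0.3, 0.4]
final_probs = [0.8, 0.3, 0.2, 0.6]

midterm_brier = brier_score(midterm_probs, outcomes)
final_brier = brier_score(final_probs, outcomes)
coin_flip_brier = brier_score([0.5] * len(outcomes), outcomes)

print(f"midterm: {midterm_brier:.4f}, final: {final_brier:.4f}, "
      f"always-0.5 baseline: {coin_flip_brier:.4f}")
```

Reporting the baseline alongside student scores keeps the metric interpretable: a model that cannot beat the constant forecast has not learned anything useful yet.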

Teacher resources & quick start checklist

Clone starter repo with notebooks (data ingestion, cleaning, feature templates).
Provision JupyterHub or Google Colab links for students.
Prepare weekly short quizzes and Anki deck for spaced repetition.
Set up automated team-news scrapes and a sandbox API key to avoid rate limits, and run the scrapes on a lightweight serverless scheduler for reliability.
Create grading rubric emphasizing interpretation and decision justification.

Student study strategies aligned with the module

Daily 20–30 minute review sessions using flashcards for metrics and formulas.
Weekly deliberate practice: replicate one published FPL analysis and explain the steps aloud (retrieval + elaboration).
Peer-teaching: run short peer review sessions where students defend their transfer choices using model evidence.

Final project ideas (inspired by 2026 analytics)

Real-time captaincy recommender that ingests team news and outputs probability-weighted picks.
Season-long portfolio strategy: use portfolio optimization to allocate squad value across positions under budget constraints.
Explainable ensemble model predicting both points and rotation risk, packaged as an instructor-ready dashboard. Consider integrating multimodal workflow tooling for reproducible media artifacts.
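For teams considering the portfolio project, a brute-force toy version shows the shape of the problem before any optimization library is introduced. The player pool, prices, and expected points are invented, and exhaustive search only works for very small pools; a real squad would need integer programming plus position and club limits.

```python
from itertools import combinations


def best_squad(players, squad_size, budget):
    """Exhaustively pick `squad_size` players maximising total expected points
    without exceeding `budget`. Each player is (name, cost, expected_points)."""
    best, best_points = None, float("-inf")
    for combo in combinations(players, squad_size):
        cost = sum(p[1] for p in combo)
        points = sum(p[2] for p in combo)
        if cost <= budget and points > best_points:
            best, best_points = combo, points
    return best, best_points


# Invented pool: (name, cost in millions, expected points next gameweek).
pool = [
    ("A", 9.0, 6.5),
    ("B", 7.5, 5.8),
    ("C", 6.0, 5.0),
    ("D", 5.5, 4.1),
    ("E", 4.5, 3.0),
]

squad, total_points = best_squad(pool, squad_size=3, budget=20.0)
names = tuple(p[0] for p in squad)
print(f"Best trio under budget: {names}, expected points = {total_points:.1f}")
```

In this toy pool the highest-scoring player is left out of the best trio, which is a compact way to introduce opportunity cost under a budget constraint.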

Wrap-up: From weekly quizzes to season-winning insights

Teaching statistics through FPL gives students a compelling context to practice data literacy, probability, and predictive modelling with live, meaningful feedback. The module trains not only technical skills — cleaning data, building models, evaluating forecasts — but also the critical habit of making decisions under uncertainty and documenting reasoning clearly.

Start small: one lab, one active recall activity, and one project. Scale up as students gain confidence. By the end of a semester, learners will have a portfolio of reproducible work, a sharper intuition for probability, and practical experience translating model outputs into real FPL decisions.

Call to action

Ready to turn your stats class into a season-long, hands-on data lab? Download our free starter syllabus, ready-to-run Jupyter notebooks, and an Anki deck designed for this module. Sign up for our educator newsletter to get weekly updates (including live team-news templates) and a template rubric you can adopt today.
