Mean Reversion and Premier League Predictions
- George Ferridge
- Aug 21, 2023
With the first two weekends of Premier League action passed and the third fast approaching, it makes sense to start making some predictions. After some stunning debuts from new players (looking at you, Sandro Tonali) and some surprising performances from teams that were predicted to do badly or well (looking at you, Wolves and Manchester United), what do we think is going to happen this season?
Unfortunately, my approach to predictions has always been and will always be from the perspective of an economist. So while I enjoyed this start to Premier League action, two matches will never be a large enough sample size to extrapolate to the season. If we look back to 2016/2017, for example, no one in their right mind would have predicted Antonio Conte’s Chelsea side to walk away with the league title after a disappointing 3-0 loss to Arsenal in September of that season. So if we should ignore this past weekend’s games, what is the best way to make an educated prediction about the league instead of just who we “feel” might do well?

The easiest approach here would be to assume that the league will finish much as it did last season, since not an awful lot has changed in the grand scheme of things. So our Premier League prediction for this season should look something like the table to the right.
Instantly, this doesn’t feel right. I may be slightly biased as a Chelsea fan, but I would be shocked if they were to finish 12th again this season. Gary O’Neil’s Bournemouth also appear to have massively overperformed at the tail end of last season. Can we reasonably expect them to be completely out of the relegation battle in 15th? Similarly, Arsenal performed well above most people’s preseason expectations by finishing 2nd. Should they be predicted to push on and challenge City even further for the title, or instead to drop off towards a top four battle?
So maybe another approach needs to be taken to predicting this season’s final standings. What we really want to get to the bottom of with these predictions is what the average is for each of these teams. Last year is not indicative because we expect a degree of mean reversion. While this concept is used fairly frequently in the world of sports, I will take a moment to explain for those to whom this is a new idea.
Mean reversion is the idea that, in anything you do repeatedly, an unusually good or bad performance tends to be followed by one closer to your personal average. Say that you go bowling, for example, and your average score is around 100. This week when you go, you bowl a fantastic 130! What is your best estimate for how you will perform the following week? Do you think you’ll score 130 or above again? Highly unlikely. What is most likely is that your score will revert (or regress) back towards your “true” ability, which is 100. This doesn’t mean that after that 130 you’ll be any worse as a bowler, but simply that the 130 combined your usual ability with an unusually good day, and that good fortune is unlikely to repeat the following week.
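The bowling example can be checked with a small simulation. The sketch below assumes that each weekly score is an independent draw from a normal distribution centred on a true ability of 100 with a standard deviation of 20. Both the distribution and the spread are illustrative assumptions rather than real bowling data, and the function name is hypothetical.

```python
import random
from statistics import mean

def next_week_after_hot_streak(true_ability=100.0, spread=20.0,
                               threshold=130.0, n_bowlers=200_000, seed=14):
    """Average next-week score for bowlers who just scored >= threshold.

    Assumption: every weekly score is an independent normal draw around
    the same true ability, so skill never changes between weeks.
    """
    rng = random.Random(seed)
    follow_ups = []
    for _ in range(n_bowlers):
        this_week = rng.gauss(true_ability, spread)
        next_week = rng.gauss(true_ability, spread)
        if this_week >= threshold:
            follow_ups.append(next_week)
    return mean(follow_ups), len(follow_ups)

average_follow_up, n_hot = next_week_after_hot_streak()
print(f"{n_hot} bowlers scored 130+; next-week average: {average_follow_up:.1f}")
```

Under these assumptions the follow-up average sits close to 100 rather than 130, even though nothing about the bowlers has changed between weeks.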
Mean reversion has been studied since the 19th century, but one of its best-known illustrations comes from Daniel Kahneman’s time helping the Israeli Air Force. Air Force instructors were trying to figure out the best way to motivate their pilots and had settled on a strategy that involved constantly berating the pilots for how they flew, regardless of how good it was. Why is this? Because they had noticed that after a pilot flew particularly poorly and was lambasted, they tended to be better the following session. Meanwhile, pilots who had been praised after a good session tended to perform worse. Kahneman astutely observed that the reason for this had nothing to do with the way the pilots were being treated by their instructors. Mean reversion simply meant that after an above average session, pilots were likely to perform worse, and vice versa.
So how does this relate to our football predictions? Well, we want to figure out, as best we can, what the average is for every team in the league, to take out the possibility of last year being an above or below average year.
There are two main ways that we can accomplish this easily using publicly available data. The first is to increase our sample size by looking back at multiple seasons and taking the average. In their last five seasons, Arsenal have finished 2nd, 5th, 8th, 8th, and 5th. So our prediction for their finish this season would be (2+5+8+8+5)/5, which gives us 5.6. Rounding to the nearest place, our best guess for Arsenal’s finish would be 6th. Again, it doesn’t feel quite right.
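The multi-season average is simple enough to reproduce in a few lines. This is a minimal sketch using the Arsenal finishes quoted above; the variable names are illustrative.

```python
from statistics import mean

# Arsenal's last five league finishes, most recent first (from the text).
arsenal_finishes = [2, 5, 8, 8, 5]

average_finish = mean(arsenal_finishes)    # (2+5+8+8+5)/5
predicted_position = round(average_finish)  # nearest whole league place

print(average_finish)      # 5.6
print(predicted_position)  # 6
```

Every season carries equal weight here, so a finish from five years ago counts exactly as much as last season’s.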
The reason for this is the dynamic nature of football. Five years ago Bukayo Saka and Gabriel Martinelli were still in school, and Martin Odegaard was a Real Madrid youth player failing to live up to his potential. Other clubs in the league over the past five years have had many managers (Wolves), new ownership (Chelsea), and total changes in club philosophy (Burnley). So to get a better, but still imperfect, measure of the means of these clubs, we turn to our old friend xG.
Each year, various outlets develop a “Justice Table” which represents where teams in top leagues deserve to be placed based on their xG. For each game, outlets take the xG totals of both teams and calculate the probability of that game ending in a win, draw, or loss for each team. Each probability is then multiplied by the number of points awarded for that outcome (three for a win, one for a draw, none for a loss), and the results are added together to get the team’s expected points (xPTS) for that match.
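One way to sketch the xPTS calculation is shown below. It assumes that each team’s goals follow an independent Poisson distribution whose mean is that team’s xG total for the match. That is a simplifying assumption made for illustration, not necessarily how any particular outlet builds its table, and the names `expected_points` and `max_goals` are hypothetical.

```python
from math import exp, factorial

def poisson_pmf(goals, mean_goals):
    """Probability of scoring exactly `goals` under a Poisson model."""
    return exp(-mean_goals) * mean_goals ** goals / factorial(goals)

def expected_points(xg_for, xg_against, max_goals=15):
    """Return (win, draw, loss, xPTS) for the team that created `xg_for`.

    Assumption: each side's goals are independent Poisson draws whose
    means are the match xG totals. Scorelines above `max_goals` are
    ignored because their probability is negligible for realistic xG.
    """
    win = draw = loss = 0.0
    for scored in range(max_goals + 1):
        for conceded in range(max_goals + 1):
            p = poisson_pmf(scored, xg_for) * poisson_pmf(conceded, xg_against)
            if scored > conceded:
                win += p
            elif scored == conceded:
                draw += p
            else:
                loss += p
    # Three points for a win, one for a draw, none for a loss.
    xpts = 3 * win + 1 * draw + 0 * loss
    return win, draw, loss, xpts

# Hypothetical match: a side creates 1.8 xG and concedes 0.9 xG.
win, draw, loss, xpts = expected_points(1.8, 0.9)
print(f"win={win:.2f} draw={draw:.2f} loss={loss:.2f} xPTS={xpts:.2f}")
```

Summing `xpts` over every match a team plays would give its season xPTS total under this model.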

Take the match last Monday night between Manchester United and Wolves, for example. That match ended 1-0 to Manchester United, but the xG Wolves generated amounted to 2.49 compared to Manchester United’s 1.53. If that same match were replayed many times, with the exact same chances, Wolves would be expected to win more often than United. Their expected points from that match would then be higher than United’s. In reality, however, United gained three points while Wolves were forced to leave Old Trafford with nothing.
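The “replay the match” thought experiment can be imitated with a small Monte Carlo simulation. The sketch below uses the xG totals quoted above and assumes, purely for illustration, that each side’s goals in a replay are an independent Poisson draw with mean equal to its xG. The function names and the number of replays are hypothetical choices.

```python
import random
from math import exp

def sample_goals(rng, mean_goals):
    """Draw one Poisson-distributed goal count (Knuth's multiplication method)."""
    limit = exp(-mean_goals)
    goals, product = 0, rng.random()
    while product > limit:
        goals += 1
        product *= rng.random()
    return goals

def replay_match(xg_a, xg_b, n_replays=50_000, seed=14):
    """Share of simulated replays won by A, drawn, and won by B."""
    rng = random.Random(seed)
    a_wins = draws = b_wins = 0
    for _ in range(n_replays):
        goals_a = sample_goals(rng, xg_a)
        goals_b = sample_goals(rng, xg_b)
        if goals_a > goals_b:
            a_wins += 1
        elif goals_a == goals_b:
            draws += 1
        else:
            b_wins += 1
    return a_wins / n_replays, draws / n_replays, b_wins / n_replays

# xG totals quoted in the text: Wolves 2.49, Manchester United 1.53.
wolves_win, draw, united_win = replay_match(2.49, 1.53)
wolves_xpts = 3 * wolves_win + draw
united_xpts = 3 * united_win + draw
print(f"Wolves xPTS: {wolves_xpts:.2f}, United xPTS: {united_xpts:.2f}")
```

Across the simulated replays Wolves come out ahead on expected points, in contrast to the single real result.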
xPTS, then, offers a chance for us to take the luck out of the Premier League table as much as possible, allowing us to see where teams deserve to be based on their performances last season instead of just their results. In the long run, we expect teams to gain points in line with their xPTS, so the table to the left should serve as our best prediction of teams’ performances this coming season.
Even in this form, the results are imperfect. This method corrects some of the deficiencies in our original table, but serial under- or over-performance by some teams persists in the xPTS. Additionally, we have no indication of how the newly promoted teams may do! Unfortunately for any Luton fans, no amount of statistical mastery can make up for not having played in the top division for thirty years.
To remedy this, we can combine the two methods of aggregation by creating a three-year rolling average of Premier League xPTS. This should capture the expected performance of teams’ current squads better than a longer-term average while also taking into account perennial performance. The weighting used between the three seasons is uneven (55-30-15), to reflect last season being the best indicator of performance. The result serves as the official Zone 14 Data Driven Premier League Prediction for the 2023/2024 season.
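The 55-30-15 weighting can be sketched as follows. The team names and xPTS totals are made up for illustration and are not real data, and the function name is hypothetical. The sketch only handles teams with three seasons of top-flight xPTS, since the text does not say how teams with fewer seasons are treated.

```python
# Weights from the text: last season counts most (55-30-15).
WEIGHTS = (0.55, 0.30, 0.15)

def weighted_xpts(xpts_by_season, weights=WEIGHTS):
    """Weighted average of season xPTS totals, most recent season first."""
    if len(xpts_by_season) != len(weights):
        raise ValueError("need exactly one xPTS value per weight")
    return sum(w * x for w, x in zip(weights, xpts_by_season))

# Hypothetical xPTS totals, most recent season first (not real data).
example_teams = {
    "Team A": [70.0, 60.0, 50.0],
    "Team B": [55.0, 65.0, 72.0],
    "Team C": [62.0, 61.0, 63.0],
}

# Rank teams from highest to lowest weighted xPTS.
ranking = sorted(example_teams,
                 key=lambda team: weighted_xpts(example_teams[team]),
                 reverse=True)
for position, team in enumerate(ranking, start=1):
    print(position, team, round(weighted_xpts(example_teams[team]), 2))
```

In this made-up example Team B had the strongest season two years ago but ranks last, because the weighting favours recent form.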

We’ll see you at the end of the season to judge how we did.