15 June UPDATED: I have updated the Infostrada predictions with a more recent version, thanks to Simon Gleave (@SimonGleave).
The World Cup starts tomorrow. Prognosticators have been hard at work generating their predictions of who will advance and who will win. But which prediction is the best? Is it the one that picks the winner? Or is it the one that best anticipates the knock-out round seedings? How can we tell?
In an ongoing exercise here at The Least Thing I am going to evaluate 10 different World Cup predictions. To do this I am going to quantify the "skill" of each forecast. It is important to understand that forecast evaluation can be done, literally, in an infinite number of ways. Methodological choices must be made and different approaches may lead to different results. Below I'll spell out the choices that I've made and provide links to all the data.
A first thing to understand is that skill is a technical term which refers to how much a forecast improves upon what is called a "naive baseline," another technical term. (I went into more detail on this at FiveThirtyEight earlier this spring.) A naive baseline is essentially a simple prediction. For example, in forecast evaluation meteorologists use climatology as a naive baseline and mutual fund managers use the S&P 500 Index. The choice of which naive baseline to use can be the subject of debate, not least because it can set a low or a high bar for showing skill.
The naive baseline I have chosen to use in this exercise is the transfer market value of the 23-man World Cup teams from Transfermarkt.com. In an ideal world I would use the current club team salaries of each player in the tournament, but these just aren't publicly available. So I'm using the next best thing.
So for example, Lionel Messi, who plays his club team soccer at Barcelona and his national soccer for Argentina, is the world’s most valuable player. His rights have never been sold, as he has been with Barcelona since he was a child, yet he’s estimated to have a transfer market value of more than $200 million. By contrast all 23 men on the USA World Cup squad have a combined estimated value of $100 million. (I have all these data by player and team if you have any questions about them -- they are pretty interesting on their own.)
| Rank | Team | Transfer value |
| --- | --- | --- |
| 1 | Spain | $1,044,960,000 |
| 2 | Germany | $944,160,000 |
| 3 | Brazil | $803,040,000 |
| 4 | France | $691,740,000 |
| 5 | Argentina | $657,720,000 |
| 6 | Belgium | $584,640,000 |
| 7 | England | $561,120,000 |
| 8 | Italy | $542,640,000 |
| 9 | Portugal | $517,860,000 |
| 10 | Uruguay | $364,476,000 |
| 11 | Netherlands | $348,600,000 |
| 12 | Croatia | $324,660,000 |
| 13 | Colombia | $318,931,200 |
| 14 | Russia | $308,784,000 |
| 15 | Switzerland | $299,040,000 |
| 16 | Chile | $234,864,000 |
| 17 | Cote D'Ivoire | $202,389,600 |
| 18 | Cameroon | $198,072,000 |
| 19 | Bosnia and Herzegovina | $192,780,000 |
| 20 | Ghana | $183,708,000 |
| 21 | Japan | $164,640,000 |
| 22 | Mexico | $152,964,000 |
| 23 | Nigeria | $145,908,000 |
| 24 | Greece | $134,232,000 |
| 25 | Ecuador | $105,588,000 |
| 26 | United States of America | $97,104,000 |
| 27 | Algeria | $96,096,000 |
| 28 | Korea Republic | $88,074,000 |
| 29 | Costa Rica | $49,980,000 |
| 30 | Iran | $41,076,000 |
| 31 | Australia | $36,204,000 |
| 32 | Honduras | $35,952,000 |
In using these numbers, my naive assumption is that the higher valued team will beat a lower valued team. As a method of forecasting, that leaves a lot to be desired, obviously, as fans of Moneyball will no doubt understand. There is some evidence to suggest that, across sports leagues, soccer has the greatest chance for an underdog to win a match. So in principle, a forecaster using a more sophisticated method should be able to beat this naive baseline.
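To make the rule concrete, here is a minimal Python sketch of the naive baseline applied to one group. The squad values are copied from the table above for the four Group B teams. The function names (`naive_pick`, `predicted_advancers`) and the idea of ranking a group by predicted wins are assumptions made for illustration, not a description of the spreadsheet actually used.

```python
from itertools import combinations

# Squad transfer values (USD) copied from the table above, Group B teams only.
TRANSFER_VALUE = {
    "Spain": 1_044_960_000,
    "Netherlands": 348_600_000,
    "Chile": 234_864_000,
    "Australia": 36_204_000,
}


def naive_pick(team_a, team_b, values=TRANSFER_VALUE):
    """Return the team the naive baseline expects to win: the higher-valued squad."""
    return team_a if values[team_a] > values[team_b] else team_b


def predicted_advancers(teams, values=TRANSFER_VALUE, slots=2):
    """Play every pairing in a group under the naive rule and return the
    `slots` teams with the most predicted wins (illustrative assumption)."""
    wins = {team: 0 for team in teams}
    for team_a, team_b in combinations(teams, 2):
        wins[naive_pick(team_a, team_b, values)] += 1
    ranked = sorted(teams, key=lambda team: wins[team], reverse=True)
    return ranked[:slots]


group_b = ["Spain", "Netherlands", "Chile", "Australia"]
print(predicted_advancers(group_b))
```

Because the rule never predicts an upset, ranking a group by predicted wins is the same as ranking it by squad value.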
Here is what the naive baseline (based on rosters as of June 5) predicts for the tournament: The final 4 will see Brazil vs. Germany and Spain vs. Argentina. Spain wins the tournament, beating out almost everyone's favorite, Brazil. The USA does not get out of the group stage, but England does. All 8 of the top valued teams make it into the final 8.
While this naive baseline is just logic and assumptions, work done by “Soccernomics” authors Stefan Szymanski and Simon Kuper indicates that a football team’s payroll tends to predict where it winds up every year in the league table. Payrolls aren't the same thing as transfer fees, of course, but they are related. Unfortunately, as mentioned above, individual player salaries are not available for most soccer leagues around the world (MLS is a notable exception).
I will be evaluating 10 predictions over the course of the World Cup. They are (with links to the data sources that I have used, as of 10 June unless noted otherwise):
- Goldman Sachs
- FIFA World Rankings
- Elo Rankings
- Infostrada
- Hassan and Jimenez
- Danske Bank
- Bloomberg
- Andrew Yuan via The Economist
- Betfair.com Odds (on June 5, 2014 courtesy of Roddy Campbell)
- FiveThirtyEight
The predictions are not all expressed apples to apples. So to place them on a comparable basis I have made the following choices:
- A team with a higher probability of advancing from the group is assumed to beat a team with lower probability.
- If no group stage advancement probability is given I use the probability of winning the overall tournament in the same manner.
- This means that I have converted probabilities into deterministic forecasts. (There are of course far more sophisticated approaches to probabilistic forecast evaluation.)
- No draws are predicted, as no teams in the group stages have identical probabilities.
- The units here, in the group stage at least, will simply be games predicted correctly. No weightings.
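The choices listed above can be sketched in a few lines of Python. Everything in this example is hypothetical: the teams "A" to "D", the probabilities, the squad values, and the match results are made-up numbers, not taken from any of the 10 predictions or from real games. Two further assumptions are mine: a drawn match counts as a miss for every forecast (since no draws are predicted), and skill is expressed as games predicted correctly minus the naive baseline's count.

```python
from itertools import combinations


def deterministic_picks(score):
    """Turn one number per team (a probability, or a squad value for the
    baseline) into one predicted winner per match: the higher number wins."""
    picks = {}
    for team_a, team_b in combinations(sorted(score), 2):
        if score[team_a] == score[team_b]:
            raise ValueError(f"tie between {team_a} and {team_b}: no pick possible")
        picks[(team_a, team_b)] = team_a if score[team_a] > score[team_b] else team_b
    return picks


def games_correct(picks, results):
    """Count matches whose predicted winner is the actual winner.
    A result of None means a draw, scored here as a miss (assumption)."""
    return sum(1 for match, winner in results.items() if picks.get(match) == winner)


# Hypothetical inputs for a four-team group.
forecast_prob = {"A": 0.80, "B": 0.55, "C": 0.40, "D": 0.25}  # chance of advancing
baseline_value = {"A": 900, "B": 300, "C": 500, "D": 100}     # squad value, $ millions
results = {
    ("A", "B"): "A",
    ("A", "C"): None,  # draw
    ("A", "D"): "A",
    ("B", "C"): "B",
    ("B", "D"): "D",
    ("C", "D"): "C",
}

forecast_score = games_correct(deterministic_picks(forecast_prob), results)
baseline_score = games_correct(deterministic_picks(baseline_value), results)
skill = forecast_score - baseline_score  # positive means the forecast beat the baseline
print(forecast_score, baseline_score, skill)
```

In this made-up group the forecast gets 4 of 6 matches right and the baseline gets 3, so the forecast shows a skill of 1 game. A forecast that only matches the baseline's count shows no skill under this measure, however many games it gets right.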
Other choices could of course be made. These are designed to balance simplicity and transparency with a level playing field for the evaluation. Just as is the case with respect to the value of having a diversity of predictions, having a diversity of approaches to forecast evaluation would be instructive. No claim is made here that this is the only or best approach (laying the groundwork here for identifying eventual winners and losers).
With all that as background, below then are the predictions in one table (click on it for a bigger view). The yellow cells indicate the teams that the naive baseline sees advancing to the knockout stages, and the green shows the same for each of the 10 predictions. The numbers show the team rankings according to each prediction.
I will be tracking the performance of the 10 predictions against the naive baseline as the tournament unfolds, scoring them in a league table. I'll also discuss the methods and results as well as the sensitivity of the latter to the former. When the Group Stages wind up I'll reset for a second part of the prediction evaluation.
Finally, for now, I welcome any comments on this exercise. If there are other predictions that you'd like to track in the same manner alongside these, please enter them in the comments.
Let the game within the games begin!