Monday, September 12, 2011

Skill of the 2010/2011 PFPO Champions League Prediction

Last year the CIES Football Observatory (which also calls itself the Professional Football Players Observatory, or PFPO, and is supported by FIFA) issued a prediction for the knockout round of the UEFA Champions League.  The PFPO touted its effort prior to the final as follows:
The PFPO has recently innovated to predict outcomes of the major football competitions. The scenario envisaged for the current UEFA Champions League has proven to be quite accurate. From the final 16 onwards, we have correctly foreseen the outcome of 11 matches out of 14 (download the pdf). If Barcelona wins the trophy, we will reach the threshold of 80% correct predictions. However, the gap between the Catalan team and Manchester United is negligible. All will depend on their form on the day, injuries and, of course, chance!

Our prediction model confers a relative strength to clubs on the basis of four criteria: the experience accumulated by players in Champions League matches, their quality in terms of participation in recent international matches, weighted according to the level of the national team represented, the teams’ squad stability, and the results obtained in the group stage of this year’s Champions League competition. The club with the richest squad has an index of 100. All other values have been adjusted according to it.
Was the PFPO prediction a good one?  Well, on the one hand, correctly picking the winners of 11 of 14 games up to the final seems pretty good.  But as (the few) readers here know, evaluating a prediction based solely on relative accuracy can be misleading.  It is important to ask if the prediction had skill, meaning an improvement on a naive baseline expectation.

In this case, one such naive baseline might be the UEFA club coefficients, calculated during each season and used to seed teams in the Champions and Europa League draws.  They are publicly available and thus readily used as a naive baseline.

How would a prediction of last season's knock-out stages of the Champions League have looked using the UEFA team coefficients for 2010/2011 at the start of the knock-out stage?

Picking the higher-ranked team according to the UEFA team coefficients at the start of the knock-out stage would have correctly predicted 13 out of 14 games prior to the final (missing out on Inter).  This means that the PFPO methodology did not perform as well as the naive methodology -- that is, it showed no skill compared to a naive baseline expectation.  A small consolation for the PFPO is that it predicted Barcelona to win the Champions League, whereas the UEFA coefficients gave it narrowly to Manchester United.
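One way to put a number on "skill" is a simple score that asks how much of the baseline's remaining error the forecast removes. The sketch below uses only the counts quoted in this post (11 of 14 for the PFPO, 13 of 14 for the coefficient baseline). The formula is a common generic form of skill score, not something the PFPO or UEFA publishes, and the function name is purely illustrative.

```python
def skill_score(forecast_hits: int, baseline_hits: int, n_games: int) -> float:
    """Share of the baseline's remaining error that the forecast removes.

    Positive: the forecast beat the baseline. Zero: no improvement.
    Negative: the forecast did worse than the baseline.
    """
    if baseline_hits == n_games:
        raise ValueError("baseline is perfect; the skill score is undefined")
    forecast_acc = forecast_hits / n_games
    baseline_acc = baseline_hits / n_games
    return (forecast_acc - baseline_acc) / (1.0 - baseline_acc)


# Counts taken from the post: PFPO 11/14, UEFA-coefficient baseline 13/14.
pfpo_skill = skill_score(forecast_hits=11, baseline_hits=13, n_games=14)
print(f"PFPO skill vs. coefficient baseline: {pfpo_skill:+.2f}")
```

With these counts the score is negative (about -2), which is just another way of saying that the PFPO missed three games where the baseline missed one.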

I'll re-run this exercise later in the season when we get to the round of 16.  Stay tuned.


23 comments:

  1. Are you sure it's 13/14 correct, only getting Inter wrong (I assume their loss to Schalke)?

    I just checked Tottenham's ranking versus AC Milan on your link, they were 10th and we (Spurs) were 18th - and yet we beat them.

    That's the only one I checked.

    PFPO ranking put us higher than Milan btw, and so got that one right.

  2. I also note that the UEFA rankings for this season have Arsenal 5th, an improbable seeding given the absence of Fabregas, Nasri et al.

    I note that the first three PFPO criteria - ' the experience accumulated by players in Champions League matches, their quality in terms of participation in recent international matches, weighted according to the level of the national team represented, the teams’ squad stability' will take those departures into account, and then balance by group stage performance.

  3. I have harnessed the power of my brain and sentimentality, incorporating UEFA coefficients, anthropogenic climate change and local weather patterns, cultural histories, player injury probabilities and the critical desire/pay grade ratios, and have predicted a 2012 winner: Chelsea FC
    Take that PFPO!

  4. Thanks Roddy, you have to adjust the final 2010/2011 numbers to the round of 16 values using the UEFA methodology ... the numbers accumulate through the season. So Spurs were in fact ranked higher than ACM at the round of 16.

    The time to look at the numbers for 2011/2012 will be on the eve of the round of 16.

    Thanks!

  5. Ah, thanks. Where can I find that adjustment, or how do I do it? I note that the UEFA rankings for this season show Tottenham in 28th, and ACM in 13th - are they really so sensitive that we went from 8 places below them before the group stages last season to above them by the end of the group stages, then back to 15 below them by the end of the season?

    My instinct would be that rankings that change so fast are a little fickle, and perhaps therefore not the best for a naive benchmark!

  6. Roddy, They are described on the right hand side of this page:

    http://www.uefa.com/memberassociations/uefarankings/club//index.html

    Looks like Wikipedia also has a good description:
    http://en.wikipedia.org/wiki/UEFA_coefficient

  7. Thanks Roger.

    Now I'm thoroughly confused. I can't see from their description (or Wiki) how we could have started so far behind them, climbed above by the round of 16, and ended below again.

    On this page http://www.uefa.com/memberassociations/uefarankings/club/season=2011/index.html ACM were 10th with 94 points and THFC were 18th with 78 points - I don't know if that was at the end of the season, beginning, or in between - I guess the end assuming the table is 'live'?

    The previous season ACM were 9th (100 pts) and THFC 30th (56 pts), 44 points apart.

    So during the season we climbed 22 points, and ACM rose 6 points.

    You get points from two sources, your own performance in the two comps (THFC had hardly taken part, if at all) on the basis of:
    Group stage participation – 4 points
    Group stage win – 2 points
    Group stage draw – 1 point
    Round of 16 participation – 4 points

    and 20% of 'the association coefficient', I assume that means your country's teams, in which case THFC would have picked up some more points during the group stages of the two competitions, presumably the same or more than ACM would have done?
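The scoring rules quoted in this comment can be sketched as a small calculation. The point values below are simply the ones the commenter lists, so treat them as his reading of the UEFA rules rather than an authoritative table. The club record and the association coefficient in the example are hypothetical, and the names are invented for illustration.

```python
from dataclasses import dataclass

# Point values as quoted in the comment; not checked against UEFA's own tables.
GROUP_PARTICIPATION = 4
GROUP_WIN = 2
GROUP_DRAW = 1
ROUND_OF_16_PARTICIPATION = 4
ASSOCIATION_SHARE = 0.20  # the "20% of the association coefficient" rule


@dataclass
class GroupRecord:
    wins: int
    draws: int
    reached_round_of_16: bool


def coefficient_at_round_of_16(record: GroupRecord, association_coefficient: float) -> float:
    """Single-season coefficient on the eve of the round of 16 (a sketch)."""
    points = GROUP_PARTICIPATION
    points += record.wins * GROUP_WIN + record.draws * GROUP_DRAW
    if record.reached_round_of_16:
        points += ROUND_OF_16_PARTICIPATION
    return points + ASSOCIATION_SHARE * association_coefficient


# Hypothetical club: 3 wins and 2 draws in the group, association coefficient 15.0.
example = coefficient_at_round_of_16(GroupRecord(3, 2, True), association_coefficient=15.0)
print(example)
```

Because the association share is added to every club from the same country, two compatriots differ only by their own results, which is why the share matters mainly when comparing clubs across leagues.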

    Since it looks as though we were 44 points behind at the start of the season it seems unreasonable to have managed to get above them just in time to play them in the round of 16, and then slump to 16 points below again by the end of the season.

    Did you have a link or a press release showing exactly what the rankings were at the start of the round of 16?

    Don't answer if it's too boring!

    Regards,

    Roddy

  8. Thanks Roddy, I wouldn't be surprised if I had made a mistake, given the complexity, but one difference I see in our numbers is that I am using just the 10/11 numbers and nothing from previous years.

    I'll get back to this in a bit, as my spreadsheet is on my laptop ... More soon, Thx!

  9. Roddy, here are the ranks I back out of UEFA, please let me know if you see any errors, thanks!

    33.671 Manchester United
    33.642 Barcelona
    31.642 Real Madrid
    28.133 Schalke 04
    25.671 Chelsea
    25.016 Shakhtar
    24.133 Bayern
    23.671 Tottenham
    22.671 Arsenal
    21.642 Valencia
    20.314 Inter
    20.15 Marseille
    19.15 Lyon
    18.34 Kobenhavn
    18.314 AC Milan
    18.314 AS Roma
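For anyone who wants to check the count, here is a minimal sketch of the naive baseline run on the single-season values listed above. The ties and winners are the actual 2010/2011 knockout results. Holding the round-of-16 values fixed for the quarter-finals and semi-finals is a simplifying assumption of the sketch, and the helper name `pick` is illustrative.

```python
# Single-season coefficients on the eve of the round of 16, as listed above.
COEFF = {
    "Manchester United": 33.671, "Barcelona": 33.642, "Real Madrid": 31.642,
    "Schalke 04": 28.133, "Chelsea": 25.671, "Shakhtar": 25.016,
    "Bayern": 24.133, "Tottenham": 23.671, "Arsenal": 22.671,
    "Valencia": 21.642, "Inter": 20.314, "Marseille": 20.15,
    "Lyon": 19.15, "Kobenhavn": 18.34, "AC Milan": 18.314, "AS Roma": 18.314,
}

# (team A, team B, actual winner) for the 14 ties before the final.
TIES = [
    # Round of 16
    ("Lyon", "Real Madrid", "Real Madrid"),
    ("AC Milan", "Tottenham", "Tottenham"),
    ("Arsenal", "Barcelona", "Barcelona"),
    ("AS Roma", "Shakhtar", "Shakhtar"),
    ("Inter", "Bayern", "Inter"),
    ("Valencia", "Schalke 04", "Schalke 04"),
    ("Kobenhavn", "Chelsea", "Chelsea"),
    ("Marseille", "Manchester United", "Manchester United"),
    # Quarter-finals
    ("Real Madrid", "Tottenham", "Real Madrid"),
    ("Barcelona", "Shakhtar", "Barcelona"),
    ("Inter", "Schalke 04", "Schalke 04"),
    ("Chelsea", "Manchester United", "Manchester United"),
    # Semi-finals
    ("Real Madrid", "Barcelona", "Barcelona"),
    ("Schalke 04", "Manchester United", "Manchester United"),
]


def pick(team_a: str, team_b: str, coeff=COEFF) -> str:
    """Naive baseline: back the side with the higher coefficient.

    An exact tie would go to team_b; no tie in TIES is affected by that.
    """
    return team_a if coeff[team_a] > coeff[team_b] else team_b


misses = [(a, b) for a, b, winner in TIES if pick(a, b) != winner]
hits = len(TIES) - len(misses)
print(f"{hits} of {len(TIES)} correct; missed: {misses}")
```

On these inputs the sketch reports 13 of 14, with Inter v Bayern the one miss. Swapping in a five-year total for `COEFF` would be the way to test the alternative baseline discussed further down the thread.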

  10. That is a difference, I thought they used the aggregate of five years from the link? So we had some points from the Europa League in prior seasons, and Schalke, for example, had almost none.

    'The club coefficient rankings are based on the results of clubs competing in the five previous seasons of the UEFA Champions League and UEFA Europa League. The rankings determine the seeding of each club in all UEFA competition draws.'

    Coefficient calculation
    Clubs' coefficients are determined by the sum of all points won in the previous five years, plus 20% of the association coefficient over the same period (33% before 2009).

    These rankings will be updated after each round of UEFA club competition matches.'

    If that's right your naive baseline, defined as 'In this case, one such naive baseline might be the UEFA club coefficients, calculated during each season and used to seed teams in the Champions and Europa League draws.' would be the five year total in the right hand column, updated to the point of the round of sixteen?

    (Otherwise, as in your table above, we would have ranked above Inter at that point, who won Serie A and were the holders of the Champions League, and also above Arsenal, even more directly odd, however satisfying, in effect a ranking mainly based on group performance).

    I note that on the UEFA link they have now updated for the first round group games, and so Man U's total has dropped from 150 to 130 as the whole year five years prior has dropped in favour of this year's running total - they now rank below Marseille for example.

    I'll redo the naive baseline test using the four full prior years and your numbers above for the 2010/11 running total just prior to the round of 16.

  11. Roger, I've done a Google docs spreadsheet I'll link you to by email, but I think the naive baseline did worse, getting four wrong out of fourteen!

    Correct

    Lyon/Real Yes
    ACM/Spurs No
    Ars/Barca Yes
    Roma/Shak Yes
    Inter/Bayern No
    Valencia/Schalke No
    Copenhagen/Chelsea Yes
    Marseille/Man U Yes

    Real/Spurs Yes
    Barca/Shak Yes
    Inter/Schalke No
    Chelsea/Man U Yes

    Real/Barca Yes
    Schalke/Man U Yes

  12. Thanks Roddy, I'll have a look and get back to you a bit later today.

  13. Roddy, looking at my numbers here is one difference:

    ACM/Spurs, I have Spurs, you have ACM

    Here is how I arrived at the ranking:

    At the end of 2011

    Spurs: 24.671
    ACM: 18.314

    But you have to back out the additional point Spurs got for advancing to the round of 8 (i.e., subtract 1 point).

    So at the start of the round of 16 it was Spurs 23.671 and ACM 18.314

    Score one for the naive baseline.

    Does this make sense? Thanks!

  14. I agree Spurs got one more point that season for going a round further than ACM, and that at the start of the round of 16 Spurs had got more points *that season* than ACM. That's because we did better in the qualifying round (and because we got 20% of the English clubs' points tally, as did ACM of the Italian, if I understand the system, which I may well not).

    As an aside I understand your backing out of the one extra point to go back one round, BUT the 20% of your association points will also affect it, so the answer won't be 23.671, but something else. But that's what I used in my spreadsheet, the numbers you posted. One would need to understand the 20% rule, and then back out 20% of the extra points that Man U and Chelsea got, and Inter for beating Bayern too from the ACM score. I think. 'Clubs' coefficients are determined by the sum of all points won in the previous five years, plus 20% of the association coefficient over the same period'.

    But that's not the UEFA coefficient used for seedings, which was specifically your definition of the naive baseline? The seedings are done on the last five years total, no?

    'The club coefficient rankings are based on the results of clubs competing in the five previous seasons of the UEFA Champions League and UEFA Europa League. The rankings determine the seeding of each club in all UEFA competition draws.'

    So I've used your backed-out numbers, which I think are wrong, but not by much probably, and added them to the season end totals from the prior four years. That's the naive baseline at the start of the round of 16. I think.

  15. Roddy, yes, I am just using the single year number. That seems more appropriate as a naive baseline as it refers to the strength of the team in that year.

    My assumption is that since Serie A and the EPL have the same number of CL slots that the 20% rule applies equally to them (??) ...

  16. Yes, Italian teams and English teams both get 20% of 4 teams' results, my point was that needs deducting/backing out of the numbers to arrive at a starting point for the round of 16 as well as the specific team's score.

    I can't agree that the 'best' naive baseline is the single season UEFA coefficient, as a particularly good qualifying group, which is something of a lottery, or indeed a good performance by one's compatriots that season, has a strong influence on the points.

    I think your original definition of the naive baseline, the official seeding system, is the best available null hypothesis. That is what will be used for the following year's seedings, it's the official measure of probability. One can take it (the 5 year rolling number) at the start of the season, or perhaps at the start of the round of 16, but common sense tells me that that season's coefficient alone is too volatile.

    I like the way that the UEFA coefficient is entirely club-based, and the PFPO is player-based, quite different systems of prediction, interesting to contrast them.

  17. Roddy, Thanks ... I would actually agree with you about the UEFA coefficients being "best" -- I am actually using them here because I want to understand them better.

    A "better" naive baseline might simply be the oddsmakers view on the eve of the knock out rounds. Or even team salary or combined player value (as I used for the World Cup predictions).

    One thing I am doubtful of is how much value added comes with the PFPO methodology ... Thanks!

  18. You can't go wrong with Betfair, that would be my number one naive baseline, along with the official seeding method (which, to be pedantic, I think did worse than the PFPO). My instinct would be the UEFA rankings, but depreciate them over time, so a 20% weight for 4 years ago, 40% 3 years ago etc. Something like that. But the bookies will always be able to adjust in a way the rankings can't.
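The depreciation idea can be sketched as a weighted five-season total. The linear weights below (0.2, 0.4, 0.6, 0.8, 1.0, oldest season first) are one reading of "20% for 4 years ago, 40% for 3 years ago, etc."; the two clubs and their season totals are made up purely to show the effect, and the function name is illustrative.

```python
def depreciated_coefficient(season_points, weights=(0.2, 0.4, 0.6, 0.8, 1.0)):
    """Weighted multi-season total, oldest season first.

    The default weights are an assumed linear schedule; equal weights of 1.0
    reproduce a plain five-year sum.
    """
    if len(season_points) != len(weights):
        raise ValueError("need exactly one weight per season")
    return sum(w * p for w, p in zip(weights, season_points))


# Hypothetical clubs (made-up numbers): one fading, one improving.
fading = [30.0, 28.0, 20.0, 12.0, 10.0]
rising = [10.0, 12.0, 20.0, 28.0, 30.0]

flat = (1.0,) * 5
print("plain five-year sum:", depreciated_coefficient(fading, flat), depreciated_coefficient(rising, flat))
print("depreciated total:  ", depreciated_coefficient(fading), depreciated_coefficient(rising))
```

The two made-up clubs are level on the plain five-year sum, but the depreciated total ranks the improving club clearly ahead, which is the behaviour the comment is after.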

    The PFPO methodology is instinctively interesting, this year it would (perhaps correctly) promote Man City, who otherwise would start very lowly, and diminish (perhaps correctly) Arsenal with its player focus, but how it handles 25 man squads is beyond me.

    (I was at Spurs for our demolition by Man City, they were truly awesome, it was like watching an anglicised Barca).

  19. I've been musing on the 'unfair' system of getting 20% of one's compatriots' points, mainly because it puts Man U ahead of Barca when the evidence points the other way.

    But it makes good sense in another way. Both Tottenham last season and Man City this have a reasonable number of points, most gained from the 20% rule, and that's probably right. Any team in the EPL that can qualify for Europe, and I assume something of the same is true across European leagues, is a very decent team.

    It just feels odd from an EPL perspective given that we have had a few years where the Big Four qualify relentlessly, with an occasional outsider, so we have more than enough data on those four to rank them purely from their own European experience, ditto Real and Barca.

    The UEFA system can't predict a Schalke either way, with or without the 20% rule, I wonder what the rankings would look like without that rule.

  20. Hi Roddy, I think that UEFA's interest is both in fairly ranking team strength but also sharing team wealth, a la FFP regulations. So the rankings seem to be a compromise amongst those objectives.

    Imagine the thought experiment of instantaneously trading every player for Barcelona with, say, Copenhagen. The five-year rankings would still have Barcelona near top, even though they would in this scenario be a much weaker squad.

    I have started a look at CL diversity with respect to national leagues, building on this:

    http://www.playthegame.org/news/detailed/champions-league-201112-more-of-the-same-5200.html

    More on that soon ... Thanks!

  21. The PFPO rankings look designed to solve that thought experiment by focussing on squad ratings (perhaps a proxy for your player value suggestion), squad stability, and the group stage results.

    Man City are a small example of that thought experiment this year having bought extensively - Aguero, Clichy, Nasri.

    As I mentioned before, 'complimenting' teams like Spurs on qualifying from the EPL by awarding them 20% of the points is also a rational seeding strategy, an acknowledgement of their likely strength for qualifying at all.

    But I dare say money might come into it too!

    The diversity question is interesting. Given certain static or slow-moving features such as fan base, stadium revenues, national TV revenue and the division thereof, it's a classic capitalism inevitably leads to monopoly/oligopoly problem isn't it? With Chelsea back when Abramovich took over and Man City now as the exogenous factors.

    Did you get my geek UEFA rankings on email? I stripped out the 'complimentary' points from the top 16 seeds.

  22. Thanks Roddy, got the email, and it'll be weekend reading ... and given the effort you are putting in on this, maybe I could ask you if you'd be interested in a guest post here (guaranteed to reach dozens of faithful readers;-) on the UEFA rankings. They are an obscure topic, but worth highlighting for various reasons ... I'll get back to you via email. Thanks!

  23. What fun. Dozens of readers indeed! I'll ponder it, your 'skill' test might be a way in to an idea. Perhaps test what (reasonable) system over the last x years might work best, stopping short of data mining of course. Maybe test for depreciation systems. Thanks. I've done a few guest posts on climate/energy stuff on tAV and Delingpole.

    This is both displacement therapy and practice for my Masters starting in a very few days at LSE. Do you know Christopher Dougherty, my course director?
