Friday, November 16, 2012

Coaching Changes and the Limits of Social Science

A new paper by a team of authors, including several from the University of Colorado (my university), has been published in Social Science Quarterly on the performance consequences of head coaching changes in college football.

The paper, "Pushing 'Reset': The Conditional Effects of Coaching Replacements on College Football Performance" by Adler et al., concludes:
We find that for particularly poorly performing teams, coach replacements have little effect on team performance as measured against comparable teams that did not replace their coach. However, for teams with middling records—that is, teams where entry conditions for a new coach appear to be more favorable—replacing the head coach appears to result in worse performance over subsequent years than comparable teams who retained their coach.
They draw some practical conclusions from their study:
Our findings have important practical implications for the high stakes environment that is contemporary college football. When a college football team's performance is disappointing, the first and often only remedy administrators, fans, and sports writers turn to is firing the coach. This is usually an expensive approach to solving the problem.12 In fact, the concern of sky-rocketing head coaching salaries was the key finding in a 2009 Knight Commission on Intercollegiate Athletics report based on interviews with 95 FBS university presidents (Knight Commission on Intercollegiate Athletics, 2009). Despite the fanfare that often accompanies the hiring of a new coach, our research demonstrates that at least with respect to on-field performance, coach replacement can be expected to be, at best, a break-even antidote. These findings, coupled with the significant costs universities typically incur by choosing to replace a head football coach, suggest that universities should be cautious in their decision to discharge their coach for performance reasons.
A big problem with this paper, and one endemic across social science research, is the leap from a "Large N" study to a particular policy context, that is, a case where N=1. Let me explain.

The paper looked at 263 coaching changes from 1997 to 2010, performed a range of statistical tests on the data, and concludes:
[T]he key findings are that coaching replacements, on average, appear to provide short-term benefits to teams that are performing extremely poorly. However, if anything, they have a deleterious effect on performance among teams where entry conditions are most favorable. Importantly, this dispels the common rationale used by university athletic directors when firing the head coach, namely, that replacing the incumbent coach is a necessary step to improve on-field performance. Our findings demonstrate that the actual effects of such replacements are generally the opposite of what is intended.
The key phrases here are "on average" and "generally." The data show a distribution of outcomes, ranging from worse performance to improved performance.

When universities replace coaches, they are not making a "Large N" decision but an N=1 decision. The important policy question is not "what happens in general?" but "how do we end up on the right side of the performance distribution?"
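To make the "on average" point concrete, here is a minimal Python sketch. It does not use Adler et al.'s data. The count of 263 matches the paper, but the outcome measure (change in wins per season), the assumed average effect (half a win lost) and the assumed spread (two wins) are invented purely for illustration.

```python
import random
import statistics


def simulate_changes(n=263, mean_effect=-0.5, spread=2.0, seed=1):
    """Draw hypothetical per-team changes in wins after a coaching change.

    ASSUMPTION: outcomes are normally distributed with an invented mean
    and spread; these are not estimates from Adler et al.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return [rng.gauss(mean_effect, spread) for _ in range(n)]


changes = simulate_changes()

# The "Large N" summary: the average effect across all simulated changes.
avg = statistics.mean(changes)

# The N=1 view: what fraction of individual teams actually improved?
share_improved = sum(1 for c in changes if c > 0) / len(changes)

print(f"average change in wins: {avg:+.2f}")
print(f"share of teams that improved: {share_improved:.0%}")
```

Under these assumptions about 40 percent of teams would be expected to improve even though the average change is negative. The exact share depends entirely on the assumed spread; the point is only that a disappointing average is compatible with many individual successes.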

Individual coaches are not interchangeable "trials" drawn from a random statistical distribution, but living, breathing humans with unique characteristics and skills. We know from other research that individual coaches can add value to a team's performance. The key is context -- what coaches, in what settings, with what resources?

Would anyone like to argue that Bill Snyder was not key, on two separate occasions, to improving Kansas State football? Or that Bill McCartney was not key here at Colorado in the 1980s?

To be fair, Adler et al. do recognize this possibility in the paper, writing:
As with any statistical analysis, we cannot rule out the possibility that some specific instances of coaching replacements truly benefit a team. This is certainly a possibility and there is little doubt that many commentators, school administrators, and other observers believe that coaching changes are often responsible for turnarounds in team performance. However, it is important to bear in mind that the fact that a team’s performance improves following a coaching replacement does not necessarily mean that the coach should be given credit for the improvement.
However, the paper's analysis does not tell us which factors are correlated with, or otherwise connected to, performance improvement. What the authors have told us is that, in aggregate, coaching changes don't work out. Fair enough.

But that Large N finding tells us very little about particulars. It is far from the sort of information that can help a particular struggling program decide whether a coaching change might be a key factor contributing to improved performance.


  1. Coaches come in a range of quality, from elite to stinkers. Schools face two problems: they can't necessarily recognize ability based on past performance, and they can't necessarily attract or pay for the coaches whose ability they can recognize. So if they are stuck in mediocrity, they can only take a chance on coaches who will accept their job. If they truly want to improve their team's performance, they may have to guess more than once to find the diamond in the rough. So while changing coaches once may be no better, on average, than keeping the old coach, changing multiple times increases the odds of eventually reaching the goal.

  2. Hmmm.

    As a social scientist, I could take exception to being lumped in with your "endemic" characterization (in general, generalizations are wrong). And as one who has read the SSQ article (and a few hundred like it), I could also dispute that the authors even do what you claim.

    But let's stick with your main point. Are you suggesting that decision makers ignore "Large N" information?

  3. Rod-

    Thanks for the comments. I'm also a social scientist, and agree that in general, generalizations are wrong;-)

    Large N information is most relevant to decisions about Large N subjects. Such information is far less useful in decisions where N=1. Of course, to go much further than this, we'd have to get into specifics. For instance, does this paper provide any useful guidance to Mike Bohn, athletic director at Colorado? Perhaps it tells him to hire the right coach, not just any coach. But I'd guess he knows that.

    What is your view?


  4. [I thought I tried this already. Try again.]

    Suppose an AD chooses quality level Q for the football program. This entails a particular historical feel for the type of coach that can generate quality Q. That's the "Large N" contribution--what type of pool am I looking into?

    The "Small N" choice is then to put together a pool, at a point in time, that resembles the one suggested by the "Large N" information. The AD might get lucky and get more than he or she had hoped for. But the Large N information shapes that outlook.

    I just don't see anything productive in trying to say there is some separation in this idea. But maybe you just meant that it's a sequential problem for the AD--use the Large N information and then turn to the problem where Small N information is also useful.

  5. Thanks Rod,

    Take the Tennessee AD's decision to fire Derek Dooley. My argument is that the post-coaching-change experiences of the N=263 cases summarized by Adler et al. are essentially irrelevant to this decision.

    The paper says:

    "Despite the fanfare that often accompanies the hiring of a new coach, our research demonstrates that at least with respect to on-field performance, coach replacement can be expected to be, at best, a break-even antidote."

    In general, yes. But for Tennessee, is the current coaching change a break-even proposition? I'd say that depends on whom they hire. I simply don't think you can translate N=263 to N=1.

    The paper, and certainly the press release, make such a connection.