Wednesday, April 25, 2018

Some Resources on Testosterone Regulation in Elite Athletics

It appears that the IAAF is on the verge of announcing another set of regulations governing allowable natural testosterone levels in women athletes. This is a bad idea. In anticipation of the new regulations I thought I'd post up some resources for those who are interested in the issue.

The regulation of testosterone is only the latest effort by sports administrators to police how women should look. There are countless biological characteristics of humans that in some way contribute to elite athletics performance -- testosterone in women (but not in men) is the only naturally occurring biological characteristic that is regulated.

I take on this issue in some depth in this paper:
Pielke Jr, R. (2017). Sugar, spice and everything nice: how to end ‘sex testing’ in international athletics. International Journal of Sport Policy and Politics, 9:649-665. (PDF, free to read)
Remarkably, in 2011 the IAAF listed a set of criteria for how women should appear, lest they be reported to officials for investigation of their testosterone level. These criteria are listed in the slide below (from a talk I give on this subject). Two of the nine criteria have to do with breast size and shape.
More generally, let's say that you accept the argument that testosterone should be regulated. I don't, but let's play along. Even here, the science relied on by the IAAF does not support the case that they are making.

The IAAF bases its case on this paper:
Bermon, S., & Garnier, P. Y. (2017). Serum androgen levels and their relation to performance in track and field: mass spectrometry results from 2127 observations in male and female elite athletes. British Journal of Sports Medicine
That paper purports to show that women in certain events gain a benefit from testosterone levels in the higher end of the range found in female athletes. 

That paper has received a range of criticism as being flawed. Notably:
Franklin, S., Ospina Betancurt, J., & Camporesi, S. (2018). What statistical data of observational performance can tell us and what they cannot: the case of Dutee Chand v. AFI & IAAF. British Journal of Sports Medicine. Published Online First: 23 February 2018. doi: 10.1136/bjsports-2017-098513
That paper concludes:
we believe that it is scientifically incorrect to draw the conclusions in the Bermon and Garnier paper from the statistical results presented. Their paper claims that certain athletes have an advantage in precisely the five events where a significant effect was found: we calculate that a high share of those five significant effects are likely to be false positives.
Andrew Gelman also criticized the Bermon and Garnier paper:
the statistical analysis data processing in this paper is such a mess that I can’t really figure out what data they are working with, what exactly they are doing, or the connection between some of their analyses and their scientific goals. 
Gelman was motivated by Simon Franklin, a post-doc at LSE, who emailed him that:
There are more than a few problems with the paper, not least the fact that it makes causal claims from correlations in a highly selective sample, and the bizarre choice of comparing averages within the highest and lowest tertiles of fT levels using a student t-test (without any other statistical tests presented).

But most problematic is the multiple hypothesis testing. The authors test for a correlation between T-levels and performance across a total of over 40 events (men and women) and find a significant correlation in 5 events, at the 5% level. They then conclude:
Female athletes with high fT levels have a significant competitive advantage over those with low fT in 400 m, 400 m hurdles, 800 m, hammer throw, and pole vault.
These are 5 events for which they found significant correlations! And we are led to believe that there is no such advantage for any of the other events.
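To make the multiple-testing concern concrete, here is a rough back-of-the-envelope sketch in Python. It is my own illustration, not a calculation taken from any of the papers above. It assumes exactly 40 tests (the email says "over 40"), that the tests are independent, and that there is no true effect in any event. The function names are hypothetical, and the Bonferroni threshold is just one common correction, shown only for scale.

```python
# Back-of-the-envelope multiple-testing arithmetic.
# Assumptions (mine, not the paper's): exactly 40 independent tests,
# and no true effect in any event (every null hypothesis holds).

N_TESTS = 40   # assumed number of events tested ("over 40" in the email)
ALPHA = 0.05   # significance level used for each individual test


def expected_false_positives(n_tests, alpha):
    """Average number of 'significant' results produced by chance alone."""
    return n_tests * alpha


def prob_at_least_one_false_positive(n_tests, alpha):
    """Chance that at least one test comes up significant by chance alone."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_threshold(n_tests, alpha):
    """Per-test threshold that keeps the overall error rate at or below alpha."""
    return alpha / n_tests


if __name__ == "__main__":
    print("Expected false positives:",
          expected_false_positives(N_TESTS, ALPHA))
    print("P(at least one false positive):",
          round(prob_at_least_one_false_positive(N_TESTS, ALPHA), 3))
    print("Bonferroni per-test threshold:",
          bonferroni_threshold(N_TESTS, ALPHA))
```

Under these simplified assumptions, chance alone would be expected to produce about two significant results out of 40, which is why five significant events out of more than 40 tests is weaker evidence than it first appears.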
I also have written two critiques. First, a post-publication peer review:
My bottom line: The paper has some significant methodological issues, most notably the inclusion of female athletes who doped with those with naturally high levels of T. There is some double counting of athletes in 2011 and 2013. There is also speculation that the male findings are contaminated by doping. Methodological issues notwithstanding, the paper nonetheless strongly reinforces the 2015 CAS Chand decision. 

The IAAF data of Bermon and Garnier (2017) don't support the proposed regulations of testosterone in women at distances of 400m to one mile. Consider the figure below:
Let's accept the analysis as valid (maybe not, but let's play along). These IAAF data (pink bar) indicate that over distances of 400m, 800m and 1500m high testosterone women are on average 1.1% faster than their low testosterone counterparts. Unfair, IAAF might scream.

But look at the data that IAAF collected for men at 400m and 1500m (blue bar). These data indicate that high testosterone men are on average 1.1% faster than their low testosterone counterparts. Surely if high T in women in selected events where performance differs is to be regulated, then high T in men in selected events where performance differs is also to be regulated?

If IAAF responds that the T standard applies only to women but not men based on performance data, then this is the very hallmark of sex discrimination. This only scratches the surface of flawed T regulation.

We shall see what IAAF actually presents tomorrow. However, based on the evidence and arguments that IAAF has presented thus far, its T regulations are focused on one athlete (initials CS), discriminatory, sexist and (for those who think analysis of T levels in athlete performance is relevant) resting on a flawed evidence base.

There can be little doubt that this new policy will be challenged at CAS.

