Thursday, August 16, 2018

Thanks for Reading

For now, at least, this blog is not being updated. Thanks for reading.  If you'd like to be in touch, please send an email. If I've got something to say it'll be at rogerpielkejr.com.

Wednesday, August 1, 2018

Submission to BJSM on Unreliable Data and Results in IAAF Science on Testosterone

Along with Ross Tucker and Erik Boye, today I have submitted our revised paper on erroneous data and unreliable results in Bermon and Garnier (2017), the IAAF study which underpins its new testosterone regulations.

The full paper can be found here in PDF.

For additional background see here, here, here. I am happy to hear your comments or questions, here in the comments or on Twitter.

Friday, July 27, 2018

BJSM Lets Stand a Deeply Flawed Paper, Why?

A few weeks ago the New York Times wrote about a paper we had submitted to the British Journal of Sports Medicine calling for Bermon and Garnier (2017, BG17) to be retracted. You can get the back story at the links in the previous sentence, but two things to understand up front:
  • BG17 is not just any old scientific paper -- it is the only scientific basis for regulations to be implemented by the International Association of Athletics Federations (IAAF) governing naturally occurring testosterone in female athletes.
  • Calling for a retraction of a scientific paper is not something to be done lightly. BG17 is the first paper that I have called on publicly to be retracted in 25+ years of publishing, reviewing, serving on editorial boards and studying science in policy. Yes, it is that bad.
Today the editor of BJSM emailed with the following information (quoted in full from his message):
1. The BJSM editorial team has considered the various points raised to us about retracting BG17 (including yours) and stand by our decision that retraction would be inappropriate. 
2. We respect the authors’ decision not to open these data even though we support the general principle of data sharing.
No retraction, no sharing of data.

When should a paper be retracted? Fortunately, the publisher of BJSM has a policy on retraction which states:
Retractions are considered by journal editors in cases of evidence of unreliable data or findings, plagiarism, duplicate publication, and unethical research.
This retraction policy is similar to the recommendation of the Committee on Publication Ethics (COPE), whose guidelines are followed by most scientific publishers (PDF):
Retraction is a mechanism for correcting the literature and alerting readers to publications that contain such seriously flawed or erroneous data that their findings and conclusions cannot be relied upon.
Why have we called for BJSM to retract BG17? Because of seriously flawed and erroneous data such that the paper's conclusions cannot be relied on. This is such a clear case that it is baffling why BJSM has chosen not only to let the paper stand, but to not require the paper's flawed data to be shared openly.

Why is the case so clear?
An editorial board should be so lucky as to have such a clear-cut case. It's a no-brainer. The message to BG17 should be: Sorry guys but this effort is so flawed that we are going to pull it. End of story.

So why did the BJSM editorial board act as they did? I have no insight on their internal deliberations, but given the retraction policy of the publisher of BJSM and the ethical guidelines suggested by COPE, there logically can be only three possibilities.
  • The BJSM editorial board disagrees with our analysis and the statement of the lead author of BG17 that there are pervasive errors underlying the original analysis. This would be a very odd position to take, as it is contrary to both evidence and the admission of the researchers who wrote BG17 and BHKE18.
  • The BJSM editorial board accepts that there are pervasive errors in BG17 and has decided to let the paper stand regardless. This too would be an odd position to take, as it is unethical and unscientific (according to COPE) and contrary to the retraction policy that BJSM is expected to follow. No scientific publisher worthy of the title would let flawed science stand. 
  • The BJSM editorial board is uncertain about the presence of pervasive errors in BG17 and in the face of this uncertainty has decided to let the paper stand. This would be an exceptionally odd position to take in light of the fact that BJSM has concluded (emphasis added), "We respect the authors’ decision not to open these data even though we support the general principle of data sharing." A really good way to understand the true depth of data errors would be for BJSM to require the authors of BG17 to release fully 100% of their data that has no privacy concerns.
So which is it?

The bottom line here is that BJSM has failed in its core scientific obligations. By all appearances BJSM is acting in the interests of IAAF and protecting IAAF research from normal scientific scrutiny.  I have no idea why this is so, but it is a subject that I'll continue to pursue.

Ross Tucker, Erik Boye and I will be revising our submission to BJSM and will ask to have it reviewed, published and linked to BG17. Obviously more to come, stay tuned.

(Note: This post represents my views only, though everyone is welcome to share them.)

Tuesday, July 17, 2018

A Deeper Dive into the Scientific Basis for IAAF Testosterone Regulations

This post represents some notes, references and quotes that I'd like to have accessible. Perhaps they are useful to others. It is technical and terse, but if you are following along on Bermon and Garnier (2017, BG17), it'll probably make sense. As always here, caveat lector.

The IAAF testosterone regulations focus on what is called "circulating testosterone" or "serum testosterone." These concepts as well as "free testosterone" are defined as follows by Goldman et al. 2017:
Total testosterone refers to the sum of the concentrations of protein-bound and unbound testosterone in circulation. The fraction of circulating testosterone that is unbound to any plasma protein is referred to as the free testosterone fraction.
T and fT are important variables in BG17. For reasons I do not fully understand (I welcome explanations), BG17 chose not to analyze athletes according to T values, but instead divided them into tertiles based on fT levels.
In order to test the influence of serum androgen levels on athletic performance, in each of the 21 female athletic events and 22 male athletic events, athletes were classified in tertiles according to their fT concentration.
There are two methodological issues/questions here:
  1. Why use tertiles rather than look at correlating all data in continuous fashion (discussed here)?
  2. Why use fT at all, as T is measured in the study and fT is calculated (a point raised by Sönksen et al. 2018)?
There is no good answer to #1.

I welcome being educated on #2. 

However, there is an interesting twist here. In my reading and researching this topic, I came across a recent paper by David Handelsman (2017) on the use of "free testosterone" in clinical research. For those scoring at home, Handelsman is the lead author of the other new peer-reviewed paper (in addition to BG17) that is cited in the IAAF T regulations.

Handelsman (2017) says this about "free testosterone":
Despite being extant for decades, the use of FT measurement has barely been stated in a testable form and virtually never directly tested as a refutable hypothesis. Rather by inference and repetition as if it is self-evident, it has become entrenched as an enthusiastically wielded yet largely untested concept that goes from one paper to the next without ever seeming to pass through a questioning mind.
Hmm ... more:
A valid scientific concept requires a sound foundational theory and evidence and being open to testing and refutation not just an unshakeable belief. Introducing the FT into clinical guidelines is particularly hard to understand as using that unproven criterion can only provide false reassurance by merely shifting the uncertainty to an even shakier footing in a subtle bait-and-switch.
OK, so Handelsman is not a fan of "free testosterone" as a meaningful metric to be used in clinical guidelines. Got it.

Yet, fT is the basis for the binning of athletes that forms the very basis for the analysis and conclusions of BG17. Even if we were to ignore the clearly fatal data problems that we have identified, it would seem that one expert that the IAAF relies on has eviscerated the very basis for the study performed by the other experts that IAAF relies on. Even with perfect data, BG17 is problematic.

This provides yet another reason why BG17 doesn't form a legitimate basis for any regulatory action. At CAS, lawyers will surely have a great time asking Prof. Handelsman about his views on fT and its role in the methodologies of BG17.

But there is more. I discussed BHKE18 as a do-over, as it re-did the BG17 analysis after dropping some 220 data points. Important side note here: For comparison, BHKE18 says (emphasis added): 
We have excluded 230 observations, corrected some data capture errors and performed the modified analysis on a population of 1102 female athletes.
Below is my tabulation of data points in the new (BHKE18) and original (BG17), and the difference between the two representing what I've called here bad data. It appears that BHKE18 has miscalculated how many bad data points it identified (220 vs. 230) between the two studies. Sure it may be a typo, but it is exceedingly sloppy to get wrong the number of bad data points you are reporting:

Event          New n   Original n   Bad data
100 m             96          112         16
100 m H           59           73         14
200 m             59           71         12
400 m             62           67          5
400 m H           52           67         15
800 m             56           64          8
1500 m            55           66         11
3000 m SC         49           56          7
5000 m            36           40          4
10 000 m          29           33          4
Marathon          86           92          6
Discus            36           48         12
Hammer Throw      42           54         12
Shot Put          42           54         12
Javelin           42           55         13
Long Jump         50           62         12
Triple Jump       41           54         13
High Jump         44           56         12
Pole Vault        39           48          9
20 km RW          80           97         17
Heptathlon        47           53          6
SUM             1102         1322        220
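
The arithmetic behind this tabulation is easy to re-check. Below is a minimal Python sketch that simply re-adds the per-event counts listed above. The variable names are illustrative, and nothing here comes from the BG17 or BHKE18 authors.

```python
# Per-event counts copied from the tabulation above:
# event -> (n in BHKE18 "new", n in BG17 "original")
counts = {
    "100 m": (96, 112),
    "100 m H": (59, 73),
    "200 m": (59, 71),
    "400 m": (62, 67),
    "400 m H": (52, 67),
    "800 m": (56, 64),
    "1500 m": (55, 66),
    "3000 m SC": (49, 56),
    "5000 m": (36, 40),
    "10 000 m": (29, 33),
    "Marathon": (86, 92),
    "Discus": (36, 48),
    "Hammer Throw": (42, 54),
    "Shot Put": (42, 54),
    "Javelin": (42, 55),
    "Long Jump": (50, 62),
    "Triple Jump": (41, 54),
    "High Jump": (44, 56),
    "Pole Vault": (39, 48),
    "20 km RW": (80, 97),
    "Heptathlon": (47, 53),
}

new_total = sum(new for new, _ in counts.values())
original_total = sum(orig for _, orig in counts.values())

# "Bad data" per event is the original count minus the new count.
dropped_by_event = {event: orig - new for event, (new, orig) in counts.items()}
dropped_total = sum(dropped_by_event.values())

print(new_total, original_total, dropped_total)   # 1102 1322 220
print(f"{dropped_total / original_total:.1%}")     # 16.6%
```

The new total of 1102 matches the population that BHKE18 reports, while the difference from the original 1322 is 220, not the 230 exclusions that BHKE18 states.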

In addition to dropping the bad data, BHKE18 re-does the analysis focused on T not fT in response to Sönksen et al. 2018. In what sure looks like some kind of p-hacking, BHKE18 adopts some significant changes to their methodology (quotes from BHKE18 below, followed by my comments in italics):
  • BHKE18: "we used running events from 400 m up to 1 mile, on the basis that that is where T produces its greatest performance-enhancing effects"
    • This of course was a conclusion of BG17 and subsequently the focus of the IAAF regulations. With the bad data of BG17, there is no longer any evidence that these events are where T has the greatest effects. There is no basis here. This is affirming the consequent. 
  • BHKE18: "we have aggregated result from the long sprints (400 m events), then middle-distance runs (800 m and 1500 m), and finally long sprints and middle-distance runs (400 m, 800 m and 1500 m), into one event group for further statistical analysis."
    • No such grouping or aggregation was used in the original study. In fact BG17 said this: "These different athletic events were considered as distinct independent analyses and adjustment for multiple comparisons was not required." So a complete reversal, which is it?
  • BHKE18: "The time results were transformed into an index, that is, percentage of the best performance achieved by each event"
    • The original analysis focused on absolute times, not a percentage index.
  • BHKE18: "We have used the Spearman rankorder correlation coefficient to explore the correlation between competition results and testosterone levels, using a two-sided test at the 0.05 significance level."
    • Here we see a new statistical test (a common one at that), based on ranked ordering of values rather than the actual values themselves.
  • BHKE18: "Finally, we used a serum T threshold concentration of 2 nmol/L to identify a group of female athletes with ‘high T’ levels, for comparison against the results of athletes with T levels of less than 2 nmol/L (‘normal T’ levels)."
    • Where does 2 nmol/L come from? The IAAF regulations identify 5 nmol/L. 
It seems logical that if BHKE18 could have simply applied the same statistical methods of BG17 to the new dataset (minus the bad data) and obtained the same or similar results, they would have. The presence of so many methodological modifications is a big red flag.
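
To make two of these methodological changes concrete, here is a minimal Python sketch of a percentage-of-best index and a Spearman rank-order correlation. Everything in it is an assumption for illustration: the times and testosterone values are invented (they are not data from BG17 or BHKE18), the function names are hypothetical, and the index definition (best time divided by the athlete's time) is only one plausible reading of the BHKE18 description.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over any run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of the actual values."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Invented example values, NOT data from either paper.
times_s = [50.1, 50.9, 51.4, 52.0, 52.3, 53.8]   # race times in seconds
testosterone = [4.9, 0.9, 1.1, 0.7, 0.8, 0.6]    # serum T in nmol/L

# Assumed index: each time as a percentage of the best (fastest) time.
best = min(times_s)
index_pct = [100 * best / t for t in times_s]

rho_times = spearman(testosterone, times_s)
rho_index = spearman(testosterone, index_pct)
r_times = pearson(testosterone, times_s)

print(f"Spearman, T vs time:  {rho_times:.3f}")
print(f"Spearman, T vs index: {rho_index:.3f}")
print(f"Pearson,  T vs time:  {r_times:.3f}")
```

Because Spearman depends only on the ordering of values, converting times to an index like this one flips the sign of the coefficient but leaves its size unchanged, whereas a correlation on the actual values can move with a single extreme observation. That is why the effect of each change is best judged by running the old and new methods on the same data.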

BHKE18 clearly represents a do-over from a flawed study. But the authors double down and characterize it as somehow being an independent verification of the flawed study:
In conclusion, our complementary statistical analysis and sensitivity analysis using a modified analysis population shows consistent and robust results and has strengthened the evidence from this study, where we have shown exploratory evidence that female athletes with the highest T concentration have a significant competitive advantage over those with lower T concentration, in 400 m, 400 m hurdles, 800 m and hammer throw, and that there is a very strong correlation between testosterone levels and best results obtained in the World Championships in those events. A similar trend is also observed for 1500 m and pole vault events.
The results of BHKE18 are not the same as BG17, which were the basis for the IAAF regulations. From where I sit this situation looks like this:
  • BG17 relied on flawed data and questionable methods and arrived at a set of results that formed the basis for IAAF regulations;
  • When the data was challenged and errors identified, it seems logical that the methods of BG17 could not reproduce the results of BG17 using the new dataset (without the flawed data);
  • But the IAAF regulations had already been released, focused on four specific events identified based on BG17;
  • Altering the regulations based on errors in BG17 would of course mean admitting that BG17 was flawed in important respects, undercutting the basis for the regulations and IAAF;
  • So it seems that a considerable variety of new methods were introduced in BHKE18 that allowed the reduced-form dataset to plausibly approximate the results of BG17;
  • The regulations thus stand as written;
  • The new conclusions of BHKE18 are characterized as reinforcing BG17, giving at least a surface impression of a greater scientific basis for the regulations. In fact, the opposite has occurred.
This episode helps to illustrate why it is not a good idea to have an organization responsible for implementing desired regulations also be in charge of performing the science that produces the evidence on which those regulations are based. This would seem obvious, but has not really taken hold in the world of sports governance.

There is of course a need for BJSM to require the authors to release all data and code for both papers. One question that independent researchers will want to ask is how the results look when the methods of BG17 are applied to the data of BHKE18. Why were the methodological innovations introduced and what were their quantitative effects? This is the basic sort of independent check that makes science strong.

It is a fascinating case and no doubt has a few more twists and turns to come. It'll make for a great case study when all is said and done.

Thursday, July 12, 2018

A Call for Bermon and Garnier (2017) to be Retracted

The New York Times has a story just out on an analysis we've done on a recent IAAF study. Take a seat, this is a bombshell and these are my individual views on it.

Earlier this year, the IAAF announced new regulations governing natural testosterone levels in female athletes. One of the few academic studies that the regulations refer to is Bermon and Garnier (2017, hereafter BG17), conducted by two IAAF researchers and published in the British Journal of Sports Medicine.

Earlier this year several of us (me, Ross Tucker and Erik Boye) formally asked Drs. Bermon and Garnier to release their data (the part not involving private medical data) for purposes of independent replication. Dr. Bermon shared with us a subset of that data last week.

What the shared data shows is absolutely remarkable and has led to the three of us submitting a "Discussion" (here in PDF, as submitted except for a few typos fixed and page numbers added) to BJSM calling for BG17 to be formally retracted.

Here is what we wrote in that submission:
Due to the pervasiveness of problematic data we are calling for Bermon and Garnier (2017) to be retracted immediately by the authors and by BJSM. If a new analysis is subsequently completed and submitted for publication, we request that it be done so only with a full, independent audit of the underlying data and results by a team committed to keeping private the associated medical data. Further, upon publication, any such analysis should also in parallel publish performance data (i.e. not the medical data with privacy concerns) such that replication of this part of the analysis is possible by any independent scholar.

This case illustrates the importance of data sharing in science as well as the role of independent checks on data with policy or regulatory significance. We encourage BJSM to adopt immediately a more rigorous policy on data availability consistent with best practices among scientific publishers. Mistakes happen. Science is robust because they can be corrected.
We identified 3 types of errors in their data:
  •  Duplicated athletes: more than one time is included for an individual. In each of these instances, more than one time from the 2011 and 2013 World Championships is included for the same athlete, contrary to the paper’s stated methods;
  • Duplicated times: the same time is repeated once or more for an individual athlete, which is clearly a data error;
  • Phantom times: no athlete could be found with the reported time for the event.
We also identified the inclusion of times from Russian athletes who had been disqualified due to doping. The Table below shows a summary of the number of problematic data points we found for four events in the BG17 analysis (400m, 400mH, 800m, 1500m).


We found between 17% and 33% problematic data in the four women's events and suggested that such errors may be present throughout other women's and men's data. This is unacceptable in a peer-reviewed scientific paper. Thus, we have called for retraction, as a matter of basic scientific integrity. It's not a difficult call.

Much to our surprise we subsequently learned that Dr. Bermon and three colleagues had published a new letter at BJSM just days before our submission (which I'll call BHKE18 hereafter). From all indications, BHKE18 represents a "do over" after they realized that they had serious data problems in the original work. 

BHKE18 unambiguously also confirms our identification of bad data. Just compare the number of data points included in BHKE18 versus BG17 shown in the graph below.

There are fully 220 data points eliminated from one analysis to the next, representing ~17% of the total. The elimination of data (which BHKE18 alludes to in passing as some double counting in BG17) clearly supports our critique.

And yet, the elimination of problematic data points still does not reconcile with our re-creation of the BG17 dataset for the four events that we looked at closely.

Data points for four women's events
Event     BG17   BHKE18   PTB18
400 m       67       62      45
400 m H     67       52      48
800 m       64       56      53
1500 m      66       55      51
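
As a rough cross-check on these numbers, the short Python sketch below computes, for each event, the share of BG17 data points that do not appear in the re-created dataset (PTB18), and how many BHKE18 data points remain unaccounted for. It assumes that the gap between the BG17 and PTB18 columns can be read as the count of problematic data points. That is an interpretation of the table rather than something stated in either paper, and the variable names are illustrative.

```python
# Counts copied from the table above: event -> (BG17, BHKE18, PTB18)
points = {
    "400 m": (67, 62, 45),
    "400 m H": (67, 52, 48),
    "800 m": (64, 56, 53),
    "1500 m": (66, 55, 51),
}

# Assumed reading: BG17 minus PTB18 = problematic data points in BG17.
share_problematic = {
    event: (bg17 - ptb18) / bg17 for event, (bg17, _, ptb18) in points.items()
}

# Data points still in BHKE18 that the re-creation does not account for.
unreconciled = {
    event: bhke18 - ptb18 for event, (_, bhke18, ptb18) in points.items()
}

for event in points:
    print(
        f"{event:8s} problematic share of BG17: {share_problematic[event]:.1%}, "
        f"BHKE18 minus PTB18: {unreconciled[event]}"
    )
```

Under that reading, the shares run from roughly 17% (800 m) to roughly 33% (400 m), in line with the range quoted above, and every event still shows more data points in BHKE18 than in the re-creation.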

It appears that there remains problematic data. Further, the new letter is not peer reviewed, nor are its data publicly available for replication. By not being candid about their data errors in BG17, Bermon and colleagues have added confusion on top of confusion. This is not how science is supposed to work.

Mistakes are made, it is inevitable. What matters is what happens after that.

Here is what my colleague Erik Boye, Oslo University Hospital, says about this episode:
A set of data normally follows publications like BG17. The conclusions are linked to the data and their interpretation and the data must be made available to the general public. That is basic in science. If now the authors have received some help to understand that their data are fraught with errors they should call for a retraction and resubmit a new paper with new data if they so wish. We have pointed out this to the IAAF and to the publisher. None of them appear to handle this well. It is unacceptable that the paper stands and that a few people are informed that there were serious errors attached to the data and that unseen changes have been made to the data set. Furthermore, there is no sign that the new set of data has been subjected to any more of a critical review or that it will be released for external scrutiny.

For this reasons we should insist that scientific standards and rules are followed. In my practice at editorial boards (the EMBO and FEBS publications) I am certain that such a faulty data set would have released a demand for a retraction, with the possibility of a resubmittal.
I agree 100%. There is only one acceptable outcome here. BG17 must be retracted by BJSM. This could be done by a request from the authors or by BJSM itself. You do not get a "do over" in research when such pervasive errors are made. If Bermon et al. wish to submit another analysis for peer review, I'd expect that the data should be provided and a full audit done prior to publication.

By all indications neither the authors of BG17 nor IAAF intend to retract the paper. This says something about conflicts of interest in research, I would think. Thus, the ball is in the BJSM court. This will be a test of scientific integrity standards at BJSM. I hope they pass, for the sake of BJSM and of research integrity.

The IAAF analysis is far too important to be treated in such sloppy fashion. I'll be following up on the significance of the flawed data, IAAF's refusal to retract and what it might mean for the fate of the IAAF T regulations in days to come.

Tuesday, June 5, 2018

IAAF Opens Up on Testosterone: Some Reactions

My experience is that sports organizations rarely like to engage in public. However, this norm seems to be evolving, perhaps motivated both by necessity and by a newer commitment to engagement among forward-thinking sports administrators.

The IAAF, via one of its lawyers, Jonathan Taylor of Bird & Bird, has written a lengthy response to a Sports Integrity Initiative article on proposed new testosterone regulations. That Sports Integrity Initiative commentary can be found here. I am less interested in the back-and-forth than I am in what the IAAF response says about their proposed approach to testosterone regulation.

In this post I offer a few thoughts on the new IAAF arguments and applaud their commitment to public engagement. In that spirit, if Mr. Taylor or IAAF wish to comment here, I'm happy to host their views. Sport is better through such engagement, even (especially) when there is disagreement that can be clearly articulated.

Can IAAF Regulate Sport According to Athlete Biological Characteristics?

The answer here is clearly "yes."

Sports organizations routinely segregate athletes by biological characteristics, most obviously by weight classes in boxing and wrestling, and of course systematically in the Paralympics.

There are two logical fallacies here that are worth discarding up front, one typically advanced by opponents to T regulations and one advanced by IAAF in support of T regulations. They are:
  • Fallacy #1: Governing bodies do not (generally) regulate other "natural advantages" so IAAF cannot regulate T. 
  • Fallacy #2: Governing bodies do (sometimes) regulate other "natural advantages" so IAAF should regulate T.
The issue here is not going to be settled by invocation of general principles, but rather, the specific question of whether it is appropriate for IAAF to regulate women's athletics based on endogenous levels of T across four events.

What a big-picture view can tell us, however, is that biological regulation of athletes in the disciplines of athletics is extremely rare, and T would be the only biological characteristic that is regulated in all of the Olympic sport of Athletics. This fact does not determine an outcome, but it should set a high bar for approving any such regulatory action.

Are the Male/Female Classifications in Athletics Regulated According to Biology?

This is tricky. The answer however is clearly "no."

The T regulations are an effort to regulate the male/female classification according to a biological characteristic. But at the moment, and for much of recent years, there has been no such regulation in place. Thus, the male/female classification is not regulated at present according to biological characteristics.
  • Do men and women have different biological characteristics? Of course
  • Are men typically faster and stronger? Of course
Male and female are genders with a strong, but not perfect, correlation with the biology of sex. One important reason for this imperfect correlation is that male and female are discrete categorizations in sport competition, whereas the biology of sex is not discrete. 

Taylor observes that the parties to the Chand case (2015, here in PDF) all agreed that it is appropriate to distinguish male and female classifications because males enjoy such a performance advantage that virtually all females would be excluded from elite competition. This issue need not be debated.

But none of this helps in resolving questions about T regulation. The challenge at hand is to determine eligibility for participation in male and female classifications. To state that males and females compete in different classifications is simply to set the stage. We should beware circular reasoning. 

Right now, society outside of sport does all the work of determining who is female and who is male for purposes of elite sports competition. This work embodies complex social processes that integrate considerations of biology, culture, politics, law and more into determining who gets classified as female and who as male. IAAF is not satisfied with how society is doing this work and is seeking to create its own regulations. (As a comparison, not long ago the sport of gymnastics decided that it was unhappy with how society was accounting for the ages of its athletes and so internalized age certification as a regulatory process.)

But make no mistake, at present neither males nor females are classified according to biological characteristics by the IAAF. That is what the proposed regulations are about. 

Does Science Distinguish Between Males and Females?

The short answer is "no." There is simply no single biological characteristic -- chromosomes, hormones, whatever -- that uniquely and unambiguously distinguishes the biological sexes. This point is not particularly controversial, even by IAAF.

However, IAAF appears to be somewhat conflicted on this topic. The issues here are not male vs female, but female vs. female. This is clearly explained in the CAS Chand decision which (at 51) noted: "the Regulations do not police the male/female divide but establish a female/female divide within the female category."

The key issue here is not whether some females have biological characteristics more typically found among males, but whether those specific biological characteristics are clearly associated with a performance difference between females of a magnitude similar to those typically observed between males and females. Please read that sentence again.

In his response, Taylor introduces a biological characteristic into the debate that is not mentioned in the IAAF T regulations: testes. He writes:
the physical advantages enjoyed by male athletes are due to the fact that they have testes that produce testosterone in amounts that circulate in serum in the range 7.7 to 29.4 nmol/L, whereas female athletes have ovaries that produce much lower levels of testosterone, in the range 0.12 to 1.79 nmol/L. (RP: Based on a forthcoming, but not yet available paper by Handelsman et al.)
Males have testes, females have ovaries. OK, got it. Taylor then writes:
Due to conditions referred to as ‘differences in sex development’ (most often, 5-α reductase deficiency, or partial androgen insensitivity), an XY baby’s testes may not descend from the abdomen, so that it presents on birth with female or ambiguous genitalia, and so may be assigned the female sex. At puberty, however, the testes start producing the much larger levels of testosterone mentioned above, which (unless the XY female is completely androgen-insensitive) will have an androgenising effect on her body and will increase her circulating haemoglobin, in the same way as happens to an XY male at puberty.
So we have an individual with a "condition" called DSD ("differences in sex development") who has testes but "may be assigned the female sex." At puberty her body responds the same way as an XY male. So is Taylor implying that she is actually a male (i.e., with testes)?

Taylor further writes:
the ‘natural physiology’ of most DSD athletes includes male gonads (testes) that produce levels of circulating testosterone not in the normal female range (0.12 to 1.79 nmol/L in serum) but in the normal male range (7.7 to 29.4 nmol/L), producing (if the athlete is not androgen-insensitive) lean body mass and levels of circulating haemoglobin well above the normal female range and rather in the normal male range.
The language here is important (and confused). "Male gonads" -- can individual body parts have their own genders? Can a woman have male body parts? Can a man have female body parts? If IAAF wants a gonad policy they should call it a gonad policy. The presence of gonads/testes is completely irrelevant in the proposed regulations, as they are focused on testosterone levels.

This issue is important because the proposed IAAF regulations stress that they are not seeking to classify athletes as male or female:
These Regulations exist solely to ensure fair and meaningful competition within the female classification, for the benefit of the broad class of female athletes. In no way are they intended as any kind of judgement on or questioning of the sex or the gender identity of any athlete.
Taylor's introduction of testes would seem to betray this claim. He writes:
If it is not fair and meaningful for a female athlete to have to compete with a male athlete whose gonads produce 10-30 times more testosterone than she does, so too it is not fair and meaningful for that female athlete to have to compete with a DSD athlete whose gonads also produce 10-30 times more testosterone than she does.
The IAAF regulations explain that a female athlete who does not meet the regulatory standard "will not be eligible to compete in the female classification in a Restricted Event at an International Competition" but would be eligible to compete in the male classification.

If this is not sex testing and classification according to physical characteristics, I don't know what is.

What about Performance?

Taylor's response emphasizes biological differences and says very little about performance or how it is related to testosterone (presumably because he was writing a response, so fair enough). However, performance is absolutely essential to the IAAF case.

The CAS ruled against IAAF in the Chand case because the evidence available did not support the claim that high testosterone levels in certain female athletes were associated with a difference in performance between these women and other women that was similar to the difference between male and female.

CAS explained (527):
The Panel considers the lack of evidence regarding the quantitative relationship between enhanced levels of endogenous testosterone and enhanced athletic performance to be an important issue. While a 10% difference in athletic performance certainly justifies having separate male and female categories, a 1% difference may not justify a separation between athletes in the female category, given the many other relevant variables that also legitimately affect athletic performance. The numbers therefore matter. 
Because the performance numbers matter, levels of testosterone (or, unmentioned in the regulations, the presence of testes) are by themselves irrelevant. CAS judged that it is only if high levels of testosterone can be associated with a performance advantage of the order enjoyed by men over women that regulation might make sense.

CAS further explained (528):
However, in order to justify excluding an individual from competing in a particular category on the basis of a naturally occurring characteristic such as endogenous testosterone, it is not enough simply to establish that the characteristic has some performance enhancing effect. Instead, the IAAF needs to establish that the characteristic in question confers such a significant performance advantage over other members of the category that allowing individuals with that characteristic to compete would subvert the very basis for having the separate category and thereby prevent a level playing field. The degree or magnitude of the advantage is therefore critical.
This is where things get a bit sticky.

Upon receiving this judgment IAAF sought to commission research on the relationship of testosterone and performance. Rather than invite independent researchers to conduct such research, IAAF conducted it internally. This approach is clearly problematic because IAAF, as an interested party in the outcome, can hardly be called independent. Thus, IAAF handicapped itself from the outset.

The resulting research (much discussed on this blog) is Bermon and Garnier (2017). Not surprisingly, IAAF claims that its results support further regulation of testosterone. A close look doesn't really support this claim.

The most striking conclusion of this paper -- taking it at face value -- is that the resulting statistics come nowhere close to the 10% difference in athletic performance cited by CAS as an appropriate basis for regulation. In fact, the paper found no performance difference worth regulating in 19 of 23 athletic events in which women compete.

Think about that. After all of the talk of the overwhelming importance of testosterone to athletic performance, an internal IAAF study designed to look for such differences could not justify testosterone regulations for almost all women's events. Clearly, testosterone is not the magical athletic elixir claimed by some.

Of the four events that IAAF decided to regulate (400m, 400mH, 800m and 1500m), the Bermon and Garnier study found performance differences between the highest and lowest tertiles to be, respectively for each event: 1.5%, 3.1%, 1.6% and 0.3% (from Table 6). Only the first three were claimed to be statistically significant differences. These differences are similar to those that led CAS to suspend the original IAAF regulations in dispute in the Chand case, and far removed from 10%.

Given these numbers, it is surprising that IAAF has sought to again implement regulations that were previously unsuccessful at CAS. There seems to be no case here. Perhaps IAAF has some additional science in its back pocket.

Finally, on performance data, a last note. Along with Ross Tucker and Erik Boye, I have requested the underlying performance data of Bermon and Garnier. This is a normal request in research and should be expected of anyone who publishes peer-reviewed research. Thus far IAAF has not released the data. This is deeply troubling. We have engaged the journal's editor and will push this as far as it takes. As CAS explains, the numbers matter.

Bottom Line

It is very good to see IAAF (or its representatives) engaging in public. This is good for sport governance, for athlete rights and for the effective role of evidence in decision making. In this instance, I applaud Jonathan Taylor for his lengthy defense of the newly proposed IAAF regulations. He provides a further window into their basis and justification. His response also raises some important issues worthy of further debate and discussion.

Monday, May 14, 2018

Wisdom on College Hoops from Carlon Brown

This series of comments below from Carlon Brown (@carlonautentico) on Twitter offers a fantastic perspective on what college basketball prepares an athlete for and what it does not. Brown played at the University of Colorado and professionally in the US and overseas.

It'd be great to get him to my class next year. The perspective below is smart, so have a read.