The New York Times has a story just out on an analysis we've done of a recent IAAF study. Take a seat: this is a bombshell, and these are my individual views on it.
Earlier this year, the IAAF announced new regulations governing natural testosterone levels in female athletes. One of the few academic studies that the regulations refer to is Bermon and Garnier (2017, hereafter BG17), conducted by two IAAF researchers and published in the British Journal of Sports Medicine.
Earlier this year several of us (me, Ross Tucker and Erik Boye) formally asked Drs. Bermon and Garnier to release their data (the part not involving private medical data) for purposes of independent replication. Dr. Bermon shared with us a subset of that data last week.
What the shared data shows is absolutely remarkable and has led to the three of us submitting a "Discussion" (here in PDF, as submitted except for a few typos fixed and page numbers added) to BJSM calling for BG17 to be formally retracted.
Here is what we wrote in that submission:
"Due to the pervasiveness of problematic data we are calling for Bermon and Garnier (2017) to be retracted immediately by the authors and by BJSM. If a new analysis is subsequently completed and submitted for publication, we request that it be done so only with a full, independent audit of the underlying data and results by a team committed to keeping private the associated medical data. Further, upon publication, any such analysis should also in parallel publish performance data (i.e. not the medical data with privacy concerns) such that replication of this part of the analysis is possible by any independent scholar.
"This case illustrates the importance of data sharing in science as well as the role of independent checks on data with policy or regulatory significance. We encourage BJSM to adopt immediately a more rigorous policy on data availability consistent with best practices among scientific publishers. Mistakes happen. Science is robust because they can be corrected."
We identified 3 types of errors in their data:
- Duplicated athletes: more than one time is included for an individual. In each of these instances, more than one time from the 2011 and 2013 World Championships is included for the same athlete, contrary to the paper’s stated methods;
- Duplicated times: the same time is repeated once or more for an individual athlete, which is clearly a data error;
- Phantom times: no athlete could be found with the reported time for the event.
We also identified the inclusion of times from Russian athletes who had been disqualified due to doping. The table below shows a summary of the number of problematic data points we found for four events in the BG17 analysis (400m, 400mH, 800m, 1500m).
We found between 17% and 33% problematic data in the four women's events and suggested that such errors may be present throughout other women's and men's data. This is unacceptable in a peer-reviewed scientific paper. Thus, we have called for retraction, as a matter of basic scientific integrity. It's not a difficult call.
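For readers who want to see how the three error types described above differ in practice, here is a minimal sketch in Python. It is not the procedure we used, and it rests on assumptions: each data point is taken to have already been matched by hand to an athlete in the public results (or to nobody), the first time seen for an athlete is treated as the legitimate one, and the names `Entry`, `classify` and `problem_share` are made up for this illustration. The sample entries are invented and are not BG17 data.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Entry:
    athlete: Optional[str]  # None if no athlete could be matched to this time
    time_s: float           # reported performance, in seconds


def classify(entries):
    """Label each entry 'ok', 'phantom', 'duplicated_time' or 'duplicated_athlete'.

    Assumption: the first time seen for an athlete counts as the valid one.
    """
    seen_times = {}  # athlete -> set of times already seen for that athlete
    labels = []
    for e in entries:
        if e.athlete is None:
            labels.append("phantom")             # no athlete found for this time
        elif e.athlete not in seen_times:
            seen_times[e.athlete] = {e.time_s}
            labels.append("ok")
        elif e.time_s in seen_times[e.athlete]:
            labels.append("duplicated_time")     # same time repeated
        else:
            seen_times[e.athlete].add(e.time_s)
            labels.append("duplicated_athlete")  # extra, different time
    return labels


def problem_share(labels):
    """Fraction of entries that carry any label other than 'ok'."""
    if not labels:
        return 0.0
    flagged = len(labels) - Counter(labels)["ok"]
    return flagged / len(labels)


# Invented entries for illustration only; NOT the BG17 data.
sample = [
    Entry("Athlete A", 51.20),
    Entry("Athlete A", 51.20),  # same time repeated
    Entry("Athlete A", 50.95),  # second, different time for the same athlete
    Entry("Athlete B", 52.10),
    Entry(None, 49.99),         # nobody ran this time
    Entry("Athlete C", 52.40),
]
labels = classify(sample)
print(labels)
print(f"{problem_share(labels):.0%} of entries flagged")
```

The hard part of a real audit is the matching step that this sketch takes as given: tying each reported time to a specific athlete in the official results.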
Much to our surprise we subsequently learned that Dr. Bermon and three colleagues had published a new letter at BJSM (which I'll call BHKE18 hereafter) just days before our submission. From all indications, BHKE18 represents a "do over" after they realized that they had serious data problems in the original work.
BHKE18 also unambiguously confirms our identification of bad data. Just compare the number of data points included in BHKE18 versus BG17 shown in the graph below.
There are fully 220 data points eliminated from one analysis to the next, representing ~17% of the total. The elimination of data (which BHKE18 alludes to in passing as some double counting in BG17) clearly supports our critique.
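As a quick sanity check on those two numbers, the short Python sketch below works out what total they imply. It assumes that "~17%" is a percentage rounded to the nearest whole number and that "the total" means the number of data points in the original analysis; both are my reading of the sentence above, not figures taken from either paper.

```python
import math

removed = 220  # data points eliminated between the two analyses

# Assumption: "~17%" is rounded, so the true share lies between 16.5% and 17.5%.
low_share, high_share = 0.165, 0.175

smallest_total = math.ceil(removed / high_share)  # largest share -> smallest total
largest_total = math.floor(removed / low_share)   # smallest share -> largest total

print(f"Implied total: between {smallest_total} and {largest_total} data points")
```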
And yet, the elimination of problematic data points still does not reconcile with our re-creation of the BG17 dataset for the four events that we looked at closely.
Data points for four women's events
| 400 m H | 67 | 52 | 48 |
It appears that problematic data remains. Further, the new letter is not peer reviewed, nor are its data publicly available for replication. By not being candid about their data errors in BG17, Bermon and colleagues have added confusion on top of confusion. This is not how science is supposed to work.
Mistakes are made; that is inevitable. What matters is what happens after that.
Here is what my colleague Erik Boye, Oslo University Hospital, says about this episode:
"A set of data normally follows publications like BG17. The conclusions are linked to the data and their interpretation and the data must be made available to the general public. That is basic in science. If now the authors have received some help to understand that their data are fraught with errors they should call for a retraction and resubmit a new paper with new data if they so wish. We have pointed out this to the IAAF and to the publisher. None of them appear to handle this well. It is unacceptable that the paper stands and that a few people are informed that there were serious errors attached to the data and that unseen changes have been made to the data set. Furthermore, there is no sign that the new set of data has been subjected to any more of a critical review or that it will be released for external scrutiny.
"For these reasons we should insist that scientific standards and rules are followed. In my practice at editorial boards (the EMBO and FEBS publications) I am certain that such a faulty data set would have released a demand for a retraction, with the possibility of a resubmittal."
I agree 100%. There is only one acceptable outcome here. BG17 must be retracted by BJSM. This could be done by a request from the authors or by BJSM itself. You do not get a "do over" in research when such pervasive errors are made. If Bermon et al. wish to submit another analysis for peer review, I'd expect that the data should be provided and a full audit done prior to publication.
By all indications, neither the BG17 authors nor the IAAF intends to retract the paper. This says something about conflicts of interest in research, I would think. Thus, the ball is in BJSM's court. This will be a test of scientific integrity standards at BJSM. I hope they pass, for BJSM's sake and for the sake of research integrity.
The IAAF analysis is far too important to be treated in such a sloppy fashion. I'll be following up on the significance of the flawed data, IAAF's refusal to retract and what it might mean for the fate of the IAAF T regulations in the days to come.