Specifically, the IAAF argues that very strict scientific protocols must be followed in order for the blood data to have any practical meaning, and these protocols were not introduced until 2009: "any apparent differences in results from one sample to the next have no scientific validity. Those standardised procedures were only introduced in 2009." Consequently, in the IAAF's view, the conclusions that have been drawn from the pre-2009 data are simply invalid.
However, the IAAF published a peer-reviewed paper in 2011 using the same or very similar data, in which it made a contradictory claim: that the blood data could in fact be used to estimate the prevalence of doping among elite athletes. The IAAF's changing interpretation of the validity of blood testing science is notable.
Here is what the IAAF said on November 27, 2015 (at 4.10, p. 14, emphasis added):
[A]ny competent expert would refuse to consider the values from the blood samples collected by the IAAF prior to 2009 as reliable evidence of blood doping, and would certainly have refused to compare the values from one sample in a profile to other samples in the same profile, because those samples were collected before WADA finalised and published the mandatory requirements for collection, transport and analysis of ABP samples that are set out in the WADA ABP Protocol, and so did not all comply with all of those requirements. This did not mean the values were worthless. To the contrary, they could be used to identify athletes who should be targeted for testing, in an effort to collect a urine sample with rEPO present in it, and when (as often happened) rEPO was found in the urine, that adverse finding would then be competent evidence of doping. But it did mean that the pre-2009 blood values could not be said to be reliable evidence of doping in and of themselves, for the reasons just explained.
Yet, in a 2011 peer-reviewed paper (Sottas et al. 2011) the IAAF said almost exactly the opposite. Specifically, the IAAF noted that in blood testing, "the standardization was rigorous enough for drawing sound conclusions." That 2011 paper looks at the overall data to estimate the prevalence of blood doping among elite athletes, and found a rate of 18-19% among elite track and field endurance athletes, an estimate consistent with a 2011 study buried by WADA and with recent research.
Here is an excerpt from that 2011 IAAF paper (emphasis added):
Let us present through a simple example how the prevalence of blood doping can be estimated thanks to hematological data. The hemoglobin of a population composed of undoped Caucasian male endurance elite athletes living at low altitude is well described by a normal distribution, with a mean of 146 g/L and an SD of 9 g/L. If a blood sample is collected from 200 of these athletes, between 1 and 9 of them (4 on average) should present a value higher than 164 g/L. If 30 of these athletes presented a value higher than 164 g/L, then between 21 (11%) and 29 (15%) presented a value that was too high. Only an external cause (doping or a medical condition) can explain this discrepancy. If the prevalence of the medical conditions is known to be low, then doping is the primary cause. Although this simplistic example is biased - superior methods that use or do not use population statistics of the relevant cause (here doping) have been described elsewhere (3, 10) - it shows nevertheless that it is not necessary to have a test able to easily identify drug cheats to estimate the prevalence of test results attributable to the cause “doping”: A biomarker of doping with a known discriminative aptitude and the knowledge of the prevalence of confounding causes is enough. Today, the high standardization of the blood tests makes possible the estimation of the prevalence of blood doping by using epidemiological measures of occurrence.
The case being made by the IAAF in 2015 is apparently that the leaked (or perhaps stolen/hacked) data obtained by the German documentary film maker (ARD) is unreliable because it contains too many false positives - that is, athletes with seemingly suspicious blood values that have an entirely innocent explanation. The IAAF singles out Paula Radcliffe as an example of such a case.
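As an aside, the arithmetic in the hemoglobin example quoted from the 2011 paper can be checked in a few lines of Python. This is only a sketch of that simplified example, not the method actually used in Sottas et al. 2011. The mean, SD, threshold, sample size and the "1 to 9" range are copied from the quote; the function and constant names are my own.

```python
from statistics import NormalDist

# Figures taken from the quoted 2011 example.
MEAN_HB = 146.0       # mean hemoglobin of undoped athletes, g/L
SD_HB = 9.0           # standard deviation, g/L
THRESHOLD = 164.0     # cut-off used in the example, g/L
N_ATHLETES = 200      # athletes sampled
N_OBSERVED = 30       # athletes observed above the cut-off
CLEAN_RANGE = (1, 9)  # range of clean exceedances quoted in the paper


def expected_clean_exceedances(mean, sd, threshold, n):
    """Expected number of undoped athletes above the threshold."""
    p_exceed = 1.0 - NormalDist(mu=mean, sigma=sd).cdf(threshold)
    return n * p_exceed


def excess_fraction_bounds(n_observed, clean_range, n):
    """Lower and upper bounds on the fraction of unexplained high values."""
    clean_low, clean_high = clean_range
    return (n_observed - clean_high) / n, (n_observed - clean_low) / n


if __name__ == "__main__":
    expected = expected_clean_exceedances(MEAN_HB, SD_HB, THRESHOLD, N_ATHLETES)
    low, high = excess_fraction_bounds(N_OBSERVED, CLEAN_RANGE, N_ATHLETES)
    print(f"expected clean exceedances: {expected:.1f}")
    print(f"unexplained high values: {low:.1%} to {high:.1%}")
```

The 164 g/L cut-off sits two standard deviations above the mean, so the normal model puts a little over 2% of clean athletes above it, or between four and five out of 200. The unexplained fraction works out to 10.5% to 14.5%, which the paper rounds to 11% and 15%.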
The case of Radcliffe is interesting, and worth a future post of its own. But as anyone familiar with statistics knows, arguing about a large data set by picking out specific cases is not a good route to a robust interpretation of the evidence.
Here is what the IAAF can do to prove its case of many false positives: Apply the population-based statistical methodology that it used in Sottas et al. 2011 to the leaked data obtained by the ARD. If there are numerous "false positives" in the leaked data then the population-level analysis should show an incidence of doping that is statistically significantly higher than that reported in Sottas et al. 2011. This is just a logical point, easily addressed using mathematics.
If the population-level results from the two data sets are not statistically distinguishable, then the IAAF has no case to make about the lack of robustness of the leaked data set.
This is simple mathematics, and could easily be conducted by an independent party with all relevant identification metadata stripped from the data sets. In fact, to conduct the statistical analysis, which data set is which would not even have to be identified to the independent statistician.
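To make the idea concrete, here is a minimal sketch of one way an independent statistician could compare the two data sets blind. It uses a standard pooled two-proportion z-test, which is my own stand-in and much cruder than the methods in Sottas et al. 2011. The function name is hypothetical, and the counts in the usage lines are placeholders rather than real data.

```python
import math
from statistics import NormalDist


def two_proportion_z_test(flagged_a, n_a, flagged_b, n_b):
    """Compare two proportions with a pooled two-proportion z-test.

    Returns (z, two-sided p-value). A positive z means data set A has
    the higher proportion of flagged samples.
    """
    if n_a <= 0 or n_b <= 0:
        raise ValueError("sample sizes must be positive")
    p_a = flagged_a / n_a
    p_b = flagged_b / n_b
    # Pooled proportion under the null hypothesis of no difference.
    pooled = (flagged_a + flagged_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n_a + 1.0 / n_b))
    if se == 0.0:
        raise ValueError("no variation: all samples flagged or none flagged")
    z = (p_a - p_b) / se
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p_value


if __name__ == "__main__":
    # Placeholder counts only: flagged samples and totals for the
    # anonymised data sets "A" and "B".
    z, p = two_proportion_z_test(flagged_a=180, n_a=1000, flagged_b=150, n_b=1000)
    print(f"z = {z:.2f}, two-sided p = {p:.3f}")
```

The statistician only needs the counts labelled "A" and "B", so neither athlete identities nor the origin of each data set has to be disclosed. A large positive z with a small p-value would support the claim that one data set shows a higher apparent incidence. A small z would mean the two are not distinguishable at the population level.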
Until the IAAF resolves its internal inconsistencies on blood doping between 2011 and 2015, and applies simple and obvious statistical tests to the blood data, the organization will have a hard time convincing outside observers that its claims are worthy of any degree of trust.
Math can go a long way here. Use it.