Monday, August 20, 2007

When Statistics slams into Science

In the near future my first scientific publications should be coming out. The first is on quantile regression applications for understanding cotton fiber length distributions. The others build on my PhD thesis. The problem is that, according to this article, the probability gods would say that my research findings are false. John Ioannidis uses equations for statistical error rates to show that most published findings are wrong. His article reads like pilpul - in this case statistics of statistics - but it made me pause and think about the validity of the things I have written. I should confess that some of the claims I made in my master's thesis have already been debunked, and it was a painful process.
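
To make his point concrete, here is a rough Python sketch of the kind of error-rate arithmetic he uses - the chance that a "statistically significant" finding is actually true. The alpha, beta, and prior-odds values below are my own illustrative picks, not his.

```python
# Sketch of the error-rate arithmetic behind Ioannidis's argument: the chance
# that a "significant" finding is actually true (positive predictive value),
# given alpha (false positive rate), beta (false negative rate), and R, the
# prior odds that a tested relationship is real. Values are illustrative.
alpha = 0.05   # significance threshold
beta = 0.50    # i.e. 50% power, common in small studies
R = 0.10       # 1 true relationship for every 10 false ones tested

ppv = (1 - beta) * R / (R - beta * R + alpha)
print(f"chance a 'significant' finding is true: {ppv:.2f}")  # 0.50 here
```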

Truth in scientific papers rests on "proving" that results are "statistically significant." This is not as simple as it is usually presented. First, most statistical tests are based on the probability of the data given a null hypothesis; this probability is called a p-value. The null hypothesis is usually the opposite of what you really hypothesize: it typically assumes that the groups or treatments have no effect.
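
A minimal sketch of what a p-value is, using a coin-flip example of my own rather than anything from the cancer study below:

```python
# Null hypothesis: "the coin is fair". Data: 58 heads in 100 flips.
# The two-sided p-value is the probability, under that null, of a result
# at least as far from 50 heads as the one we observed.
from math import comb

n, observed = 100, 58

def prob_heads(k, n, p=0.5):
    return comb(n, k) * p**k * (1 - p)**(n - k)

# sum the probabilities of outcomes at least as extreme as 58 (<= 42 or >= 58)
p_value = sum(prob_heads(k, n) for k in range(n + 1)
              if abs(k - n / 2) >= abs(observed - n / 2))
print(f"p-value: {p_value:.3f}")  # roughly 0.13 - not "significant" at 0.05
```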

For example, suppose that you have been studying lung cancer. Among the patients there are four groups: chain smokers in asbestos underwear, smokers, non-smokers, and vegans from the Garden of Eden. Suppose the results of the study (obviously I made this up) were that 100% of the chain smokers in asbestos underwear had lung cancer, 20% of smokers, 10% of non-smokers, and 0% of the utopian vegetarians. The null hypothesis would be that all four groups have the same cancer rate, which is on its face a ridiculous idea. We "know" it is false; the significance of the findings is determined by how false it is. The p-value is then the probability of drawing a sample at least as extreme as our results, given that the null hypothesis is TRUE.
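
Here is how that made-up study might be tested in practice. The group sizes and counts below are invented to match the rates above; only the procedure matters.

```python
# A chi-square test of the made-up cancer data against the "all rates equal"
# null. Counts are invented: 20 people per group, with cancer rates of
# 100%, 20%, 10%, and 0% as in the text.
from scipy.stats import chi2_contingency

#          cancer, no cancer
table = [[20,  0],   # chain smokers in asbestos underwear
         [ 4, 16],   # smokers
         [ 2, 18],   # non-smokers
         [ 0, 20]]   # utopian vegetarians

chi2, p, dof, expected = chi2_contingency(table)
print(f"p-value under the 'all rates equal' null: {p:.2e}")
```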

This is a convoluted way of thinking, but it is at the core of most science. We assume that the study covers not the entire population but just a sample, and that different samples might give different estimates of the cancer rates depending on which people were included. If we were to repeat this study a million times with a different random sample each time, most estimates of the cancer rates would be close to the true values, but some would vary wildly depending on who happened to be picked. Statistical tests work out how often different sample results would occur, given an assumed population value.
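
A quick simulation of that "repeat the study a million times" idea, with a made-up true cancer rate of 10%:

```python
# Simulate drawing a million random samples from a population with a known
# 10% cancer rate and watch how the estimated rate bounces around from
# sample to sample. Numbers are illustrative.
import numpy as np

rng = np.random.default_rng(0)
true_rate, sample_size, repeats = 0.10, 100, 1_000_000

# each draw is one hypothetical study; the estimate is the fraction with cancer
estimates = rng.binomial(sample_size, true_rate, size=repeats) / sample_size

print(f"mean estimate: {estimates.mean():.3f}")  # close to 0.10
print(f"middle 95% of estimates: {np.percentile(estimates, [2.5, 97.5])}")
```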

For example, say there were 1000 utopian vegetarians in the world, 2 of them had lung cancer, and only 1000 people are in the original study. So 1000 out of 6.6 billion people are utopian vegetarians, and very few are likely to be included in the study. If, say, two are chosen at random each time, then almost every time those two will not have lung cancer. But there is a chance that both of the sick utopians would be picked, and then the estimate of the utopian cancer rate would not be close to 2/1000 but 100%. Now, to test the significance of the results of this made-up study, we would set the population frequencies equal for all groups - our null hypothesis. The probability of getting a sample as divergent from that as the one we saw (100, 20, 10, 0) would be pretty low, assuming the true population rates are all equal (say, 25, 25, 25, 25).
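
A small simulation of that rare-group problem, using the same made-up numbers (1000 utopians, 2 of them sick, 2 included per study):

```python
# How often does a study that includes only 2 of the 1000 utopian vegetarians
# land far from the true 2/1000 cancer rate?
import numpy as np

rng = np.random.default_rng(1)
population, sick, drawn, studies = 1000, 2, 2, 1_000_000

# hypergeometric draw: how many of the 2 sampled utopians are sick, per study
sick_in_sample = rng.hypergeometric(sick, population - sick, drawn, size=studies)

for k in range(drawn + 1):
    share = (sick_in_sample == k).mean()
    print(f"studies with {k} sick utopians ({k/drawn:.0%} estimated rate): {share:.6f}")
```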

As Ioannidis points out, the difficulty is that not all studies are done well; the biases of the researchers affect what the null hypothesis is; sample sizes may not be large enough; experiments too often are not replicated; and statistical significance is easier to reach with large effects than with small, subtle ones.
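
That last point - that large effects are easier to call significant than small ones - is easy to see in a quick simulation. The effect sizes and sample size here are arbitrary.

```python
# With the same sample size, a large effect is detected far more often than a
# small one. Two groups of 30 are compared with a t-test over many trials.
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(2)
n, trials = 30, 2000

def detection_rate(effect):
    hits = 0
    for _ in range(trials):
        a = rng.normal(0.0, 1.0, n)      # control group
        b = rng.normal(effect, 1.0, n)   # treated group, shifted by `effect`
        if ttest_ind(a, b).pvalue < 0.05:
            hits += 1
    return hits / trials

print(f"power for a large effect (1.0 SD): {detection_rate(1.0):.2f}")  # roughly 0.97
print(f"power for a small effect (0.2 SD): {detection_rate(0.2):.2f}")  # roughly 0.12
```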

What he neglects is likelihood, which I only wish I could write a book about; luckily, A.W.F. Edwards already did.
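
For what it's worth, here is the flavor of the likelihood view: instead of asking how surprising the data are under a null, compare how well two candidate hypotheses explain the same data. The numbers below are made up to echo the smoker group above, and the two candidate rates are my own picks.

```python
# Likelihood ratio for two hypothesized smoker cancer rates, given (made-up)
# data of 4 cancers among 20 smokers. No null hypothesis, no p-value: just
# how strongly the data favor one rate over the other.
from math import comb

cancers, smokers = 4, 20

def likelihood(rate):
    return comb(smokers, cancers) * rate**cancers * (1 - rate)**(smokers - cancers)

ratio = likelihood(0.20) / likelihood(0.10)
print(f"the data favor a 20% rate over a 10% rate by a factor of {ratio:.1f}")  # roughly 2.4
```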

1 comment:

Jon said...

Thank goodness for meta-analyses!