Monday, August 20, 2007

When Statistics slams into Science

In the near future my first scientific publications should be coming out. The first is on quantile regression applications for understanding cotton fiber length distributions. The others build on my PhD thesis. The problem is that, according to this article, the probability gods would say that my research findings are false. John Ioannidis uses equations for calculating statistical error rates to show that most articles are wrong. His article reads like pilpul - in this case statistics of statistics - but it made me pause and think about the validity of the things I have written. I should confess that some of the claims I made in my master's thesis have already been debunked, and it was a painful process.

Truth in scientific papers rests on "proving" that results are "statistically significant." This is not as simple as usually presented. First, most statistical tests are based on the probability of the data given a null hypothesis. This probability is called a p-value. The null hypothesis is usually the opposite of what you really hypothesize: it usually assumes that the groups or treatments have no effect.

For example, suppose that you have been measuring lung cancer. Among the patients there are four groups: chain smokers with asbestos underwear, smokers, non-smokers, and vegans from the Garden of Eden. Suppose the results of the study (obviously I made this up) were that 100% of chain smokers with asbestos underwear had lung cancer, 20% of smokers, 10% of non-smokers, and 0% of Utopian vegetarians. The null hypothesis would be that all four groups have the same cancer rate, which is at face value a ridiculous idea. We "know" that it is false; the significance of the findings is determined by how false it is. Now the p-value is the probability of drawing a sample at least as lopsided as ours, given that the null hypothesis is TRUE.
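Here is a rough sketch of that p-value idea in Python, using a shuffle (permutation) test rather than any particular textbook formula. The post only gives percentages, so the group size of 20 people is my own assumption, as are the names `spread` and `permutation_p_value`. If group membership really made no difference, we could shuffle the 80 people into four arbitrary groups and see how often the shuffled groups come out at least as lopsided as the real ones.

```python
import random

# Assumption: 20 people per group (the made-up study gives only percentages).
GROUP_SIZE = 20
# Lung cancer cases per group, matching the made-up rates of 100%, 20%, 10%, 0%.
observed_cases = [20, 4, 2, 0]


def spread(cases):
    """Sum of squared deviations of each group's case count from the average."""
    mean = sum(cases) / len(cases)
    return sum((c - mean) ** 2 for c in cases)


def permutation_p_value(cases, group_size, n_shuffles=10_000, seed=0):
    """Estimate how often random regrouping is at least as lopsided as the data."""
    rng = random.Random(seed)
    total_cases = sum(cases)
    n_people = group_size * len(cases)
    # One entry per person: 1 = has lung cancer, 0 = does not.
    people = [1] * total_cases + [0] * (n_people - total_cases)
    observed = spread(cases)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(people)
        shuffled = [sum(people[i:i + group_size])
                    for i in range(0, n_people, group_size)]
        if spread(shuffled) >= observed:
            hits += 1
    # Add one to both counts so the estimate is never exactly zero.
    return (hits + 1) / (n_shuffles + 1)


p_value = permutation_p_value(observed_cases, GROUP_SIZE)
print(f"estimated p-value: {p_value:.5f}")
```

With results this extreme, essentially no shuffle comes close, so the estimate sits at the smallest value that this many shuffles can report. A tiny p-value says the data would be very surprising if the null hypothesis were true; it does not by itself say the hypothesis you care about is right.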

This is a convoluted way of thinking, but is at the core of most science. We are assuming that the study is not of the entire population, just a sample, and that different samples might give different estimates of cancer rates depending on what people were included in the study. If we were to repeat this study a million times with a different random sample each time, most estimates of cancer rates would be close to the true value, but some would vary wildly, depending on the people picked to be included. Statistical tests find the sampling frequencies given a population value.
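The repeat-the-study thought experiment is easy to try on a computer. This is only a sketch: the true rate of 10%, the 50 people per study, and the 10,000 repeats (instead of a million, to keep it fast) are all numbers I picked for illustration.

```python
import random

TRUE_RATE = 0.10    # assumed true cancer rate in the population (hypothetical)
SAMPLE_SIZE = 50    # assumed number of people in each study (hypothetical)
N_STUDIES = 10_000  # the text imagines a million repeats; fewer runs faster


def estimate_rate(rng, true_rate, sample_size):
    """Draw one random sample and return the cancer rate observed in it."""
    cases = sum(1 for _ in range(sample_size) if rng.random() < true_rate)
    return cases / sample_size


rng = random.Random(42)
estimates = [estimate_rate(rng, TRUE_RATE, SAMPLE_SIZE) for _ in range(N_STUDIES)]

mean_estimate = sum(estimates) / N_STUDIES
share_close = sum(1 for e in estimates if abs(e - TRUE_RATE) <= 0.05) / N_STUDIES

print(f"mean of all estimates: {mean_estimate:.3f}")
print(f"share within 5 points of the truth: {share_close:.1%}")
print(f"smallest and largest estimate: {min(estimates):.2f}, {max(estimates):.2f}")
```

The average of the estimates lands very near the true rate and most individual studies are close to it, but the largest estimates are more than double the truth, purely because of who happened to be sampled.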

For example, say there were 1000 utopian vegetarians in the world in total, 2 of them had lung cancer, and only 1000 people are in the original study. Since only 1000 out of 6.6 billion people are utopian vegetarians, very few are likely to be included in the study. If, say, two are chosen randomly each time, then almost every time those two are not going to have lung cancer. But there is a chance that both of the sick utopians would be picked, and then the estimate of the utopian cancer rate would be not 2/1000 but 100%. Now, to prove the significance of the research results of this made-up study, we would set the population frequencies equal for all groups: our null hypothesis. The probability of getting a sample as divergent from that as we saw (100, 20, 10, 0) would be pretty low, assuming the true population values are equal (25, 25, 25, 25).
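The chance of drawing both sick utopians can be worked out exactly by counting. The numbers below (1000 utopians, 2 sick, 2 picked) come from the example above; the function name `prob_sick_in_sample` is just something I made up for this sketch, which assumes the two are picked at random without replacement.

```python
from math import comb

UTOPIANS = 1000  # utopian vegetarians in the world (from the example)
SICK = 2         # how many of them have lung cancer (from the example)
PICKED = 2       # how many utopians end up in one study (from the example)


def prob_sick_in_sample(k, population=UTOPIANS, sick=SICK, picked=PICKED):
    """Hypergeometric probability that exactly k of the picked utopians are sick."""
    healthy = population - sick
    return comb(sick, k) * comb(healthy, picked - k) / comb(population, picked)


for k in range(PICKED + 1):
    print(f"P({k} sick utopians in the sample) = {prob_sick_in_sample(k):.6f}")
```

Both picked utopians are healthy about 99.6% of the time, while both are sick only once in 499,500 draws. In that one unlucky study the estimated cancer rate would be 100% instead of 0.2%.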

As Ioannidis points out, the difficulty is that not all samples are done well; the biases of the researchers affect what the null hypothesis is; sample sizes may not be large enough; too often experiments aren't replicated; and statistical significance is easier to detect with large effects, not small subtle ones.

What he neglects is likelihood, which I only wish I could write a book about. Luckily, A.W.F. Edwards already did.

Friday, August 17, 2007

Colleen Renée Gardunia

We signed documents for a birth certificate and social security number so "girl" gardunia is now officially Colleen Renée.

Now that it is done I realize that there are a lot of e's in that name, as well as the accent. Accents don't work that well with Anglo keyboards, but the é can be typed by holding the ALT key and pressing 0233 on the numeric keypad. So Colleen's middle name is Ren[ALT-0233]e. She is like a sci-fi robot.

Wednesday, August 08, 2007

Pictures of the baby

Well, I spoke too soon. Leila isn't sure about the name yet.

It's a baby Girl!!!

Trial: Let me know if it works.

The baby has landed!!!

Leila woke me up sometime between three and four in the morning and told me she was having contractions. I started to go back to sleep, since contractions have come and gone all week, when I realized that she meant CONTRACTIONS! Luckily my Mom had changed her flights and didn't need to fly out of Chicago today, so she drove Grandma to the airport and took care of the kids while Leila and I were at the hospital.

I won't describe any of the harrowing details of the process, but I will say thank goodness for epidurals. Things went so much more smoothly once Leila wasn't completely overcome by pain.

The baby was born at 10:55 AM and weighed 7 lbs and 6 oz. She was bright eyed and awake, but quiet. It isn't 100% official yet, and I hope that I am not going out on a limb, but we are calling her Colleen Renée Gardunia.

Tuesday, August 07, 2007

Bowed Radio and "The Soloist"

I thought I would make a shameless plug for Bowed Radio. It is an odd mix of hard rock goth cello, bluegrass, modern orchestral, Oriental/Indian/Iranian/Egyptian classical music, and jazz violin music put together as a podcast each week.

Some of the music is downright weird - like any of the goth cello stuff; one song was entitled "My mother was an opium smoker." I bet that didn't go over great at Mother's Day brunch. But most of it is fresh and beautiful. I love the Middle Eastern music and the Indian/Chinese improvisations.

I don't know what the host, Mark Allender, does for a living. His group performs on one of the podcasts and he says they try to make as much noise as possible without pissing off the audience too much. Not exactly elevator music.

It has motivated me to expand my violin playing. I have just had the hardest time motivating myself to pick up scale books, etudes, and student concertos again. When I do practice, and it is not nearly often enough, I try to imitate some of the fiddle or eastern music I have been listening to. Since Mom is here we all pulled out our instruments, even Emily. Mom played a song she learned in high school and we played it back and forth to each other. It was fun.

Violin music stopped being fun for me some time ago. When I was at BYU I worked so hard at playing that first year that I burned out on it completely. My teacher, Wolfgang TsouTsouris, pushed me so hard, and instead of getting better I just got so tense. I couldn't seem to get in tune. I would play the same part over and over again for hours a day and it was never right. When I performed it was worse. It was like watching a puppet play, and I could only make gross changes to force my distant fingers to adjust to the music as it crescendoed out of control.

I ran into a book called "The Soloist" by Mark Salzman that captured that feeling so exactly that I had to buy it. I reread it regularly. The main character was a child prodigy who broke down and is unable to perform anymore. In the beginning of the book, he is a mediocre music teacher at a Californian university, until he gets a child prodigy as a student, is put on a jury for a murder trial, and finally gets a cat.

Like the main character, I have my violin ritual that has replaced much of my practice time. I now get out my violin and improvise on something that I have listened to, or I try to learn some fiddle songs. I think about how to teach my violin student and I let myself have fun while I play. When it stops being fun, I put it away.