Wednesday, October 26, 2005

Sometimes professors just don't listen

I have had this idea to use quantile-quantile comparisons to analyze AFIS distribution data. AFIS is a machine that takes a cotton sample, runs it through a wheel with little combs, and separates the fibers. The individual fibers are blown into an airstream in a narrow tube. In the middle of the tube is an optical sensor that registers when a fiber passes, along with its length and thickness. It can tell the difference between tangled and single fibers as well as trash. It then classifies the fibers into length, fineness, and maturity categories, while keeping count of trash and tangles, called neps. The distribution of fibers of different lengths makes a big difference in spinning quality. So people want to select for the best distribution of fibers, not just the best mean length or upper quartile length.

Quantiles divide a distribution into regular divisions by rank; the median, for example, is the 50% quantile. The mean is not a quantile, because it is not a rank. Anyway, the distribution, if continuous, can be divided into as many quantiles as one could want. In a box plot, the 25%, 50%, and 75% quantiles make the box. The whiskers are often drawn at the 5% quantile and the 95% or 97.5% quantile, though conventions vary. The length values at the same quantiles can be compared across different distributions. If they are the same at every quantile, then the distributions are identical. If they differ, the pattern of differences shows how the distributions differ.
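To make the comparison concrete, here is a minimal sketch in Python. The fiber lengths are made up for illustration (they are not AFIS output), the function names are my own, and the linear interpolation between ranks is just one of several common quantile conventions.

```python
def quantile(sorted_values, p):
    """Linearly interpolated quantile of pre-sorted data, with 0 <= p <= 1."""
    position = p * (len(sorted_values) - 1)
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] + fraction * (sorted_values[upper] - sorted_values[lower])


def qq_pairs(check, experimental, probs):
    """Pair the same quantiles of two samples as (prob, x, y).

    x is the experimental quantile and y is the check quantile.
    """
    check_sorted = sorted(check)
    exp_sorted = sorted(experimental)
    return [(p, quantile(exp_sorted, p), quantile(check_sorted, p)) for p in probs]


# Hypothetical fiber lengths in mm, invented for this example.
check_lengths = [18.0, 21.0, 23.0, 25.0, 26.0, 28.0, 30.0, 31.0, 33.0]
experimental_lengths = [20.0, 23.0, 25.0, 27.0, 28.0, 30.0, 32.0, 33.0, 35.0]

probs = [0.05, 0.25, 0.50, 0.75, 0.95]
pairs = qq_pairs(check_lengths, experimental_lengths, probs)
for p, x, y in pairs:
    print(f"{p:.2f}  experimental={x:.2f}  check={y:.2f}  diff={y - x:+.2f}")
```

In this toy data the experimental sample is the check sample shifted up by 2 mm, so the difference is the same at every quantile. A difference that changed from the low quantiles to the high ones would point to a change in the shape of the distribution rather than a simple shift.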

Anyway, to make a short story even longer and more boring, I went to a Statistics professor to talk about my idea. He thought it was great, but he has never actually gotten around to hearing how I want to do it. He has lectured me for two visits on the details of quantiles and distributions. Not that it hasn't been helpful, but he doesn't realize that I already know what he is trying to teach me.

I want to know if what he is calling a p-p plot, or a sample quantile-quantile graphical approach, can be extended to more than one comparison, kind of like covariance analysis. The y would be the quantiles of the check cultivar, and the x's the quantiles of the experimental cultivar. The equation would be y = B1*[Year]*[X] + B2*[Replication]*[X] + B3*[Genotype]*[X] + [error variance], where [] denotes a matrix. The test would be to see if the slopes for genotype, replication, and years are equal, as well as the intercepts. If this is not right, then the other test I thought of would be to look at the deviations from x=y for each distribution. He did tell me about graphing the quantiles as y-x against x, so that the plot sits around 0 instead of following a slope of 1; then the area under the curve can be calculated, which is deviations again. This number can be treated like a Wilcoxon-type statistic. I just need to read up on Wilcoxon statistics; I vaguely remember them from nonparametric statistics.
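The y-x against x idea can be sketched roughly as below. This is only my reading of the area calculation, using the trapezoid rule on made-up lengths; it is not the professor's exact statistic and it is not a Wilcoxon test. The names deviation_curve and trapezoid_area are hypothetical.

```python
def quantile(sorted_values, p):
    """Linearly interpolated quantile of pre-sorted data, with 0 <= p <= 1."""
    position = p * (len(sorted_values) - 1)
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] + fraction * (sorted_values[upper] - sorted_values[lower])


def deviation_curve(check, experimental, probs):
    """Return (x, y - x) points, where x is the experimental quantile
    and y is the check quantile at the same probability."""
    check_sorted = sorted(check)
    exp_sorted = sorted(experimental)
    points = []
    for p in probs:
        x = quantile(exp_sorted, p)
        y = quantile(check_sorted, p)
        points.append((x, y - x))
    return points


def trapezoid_area(points):
    """Signed area between the deviation curve and the zero line."""
    area = 0.0
    for (x0, d0), (x1, d1) in zip(points, points[1:]):
        area += 0.5 * (d0 + d1) * (x1 - x0)
    return area


# Hypothetical fiber lengths in mm, invented for this example.
check_lengths = [18.0, 21.0, 23.0, 25.0, 26.0, 28.0, 30.0, 31.0, 33.0]
experimental_lengths = [20.0, 23.0, 25.0, 27.0, 28.0, 30.0, 32.0, 33.0, 35.0]
probs = [0.05, 0.25, 0.50, 0.75, 0.95]

curve = deviation_curve(check_lengths, experimental_lengths, probs)
area = trapezoid_area(curve)
print(f"signed area under y - x: {area:.2f}")
```

Two identical distributions give a flat curve at 0 and an area of 0. Because the area is signed, positive and negative deviations can cancel each other out, so summing absolute deviations instead might be worth considering if crossing curves should count as a difference.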

I will put pictures up for the steps in the next few days.


Well, if that isn't boring enough for you, I don't know what is. I think with a few more posts like this I can cut my readership back to 0. I need to include a few references to swimsuits or hot chicks or cute girls or something in order to get someone to read the site. Statistics just isn't sexy enough.
