© 2006 American Thoracic Society
On the Usage of Principal Components Analysis and Multiple Testing

To the Editor:

Butler and colleagues, in their recent article (1), used principal components analysis to examine dietary patterns in relation to the onset of persistent cough with phlegm. They identified two principal components, defined as a "meat–dim sum" pattern and a "vegetable–fruit–soy" pattern, which explain 7.3% and 7.2% of the variance, respectively. The authors state that these components were retained after examination of scree plots, factor interpretability, eigenvalues, and percentage of variance explained. Because the authors did not specify the ranks of the chosen principal components, we question whether the two identified patterns are in fact the first and second components. The appropriate way to interpret the findings from principal components analysis is to consider the first few principal components, which together explain most of the original variation in the data. Kaiser's criterion, which retains components with eigenvalues greater than 1, provides an objective indicator for deciding on the number of principal components to be studied (2). The criterion of factor interpretability, by contrast, can be subjective and biased.

Using the two chosen principal components, the authors conducted multiple tests of association between quartiles of these principal components and six phenotypes based on the incidence or prevalence of cough and/or phlegm. The same data were also used to measure the association between these principal components and the incidence of asthma. They subsequently reported that the meat–dim sum pattern was positively associated with new-onset cough with phlegm (p = 0.02), and found weak associations for more chronic symptoms and incident asthma. A weak association was identified for the vegetable–fruit–soy pattern (p = 0.04), which was no longer significant after adjustment for nonstarch polysaccharide intake.
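To make the point about component ranks and Kaiser's criterion concrete, the following is a minimal sketch in Python using NumPy. It is not the analysis of Butler and colleagues: the data are synthetic, and the function name `kaiser_retained`, the sample size, and the number of items are hypothetical choices made only for illustration. The sketch ranks components by eigenvalue of the correlation matrix, reports the fraction of variance each explains, and flags those with eigenvalue greater than 1.

```python
import numpy as np


def kaiser_retained(data):
    """PCA on the correlation matrix of `data` (rows = subjects, columns = items).

    Returns the eigenvalues in descending order, the fraction of total
    variance each component explains, and the number of components with
    eigenvalue > 1 (Kaiser's criterion).
    """
    corr = np.corrcoef(data, rowvar=False)
    # eigvalsh returns eigenvalues in ascending order; reverse to rank components.
    eigenvalues = np.linalg.eigvalsh(corr)[::-1]
    explained = eigenvalues / eigenvalues.sum()
    n_retained = int(np.sum(eigenvalues > 1.0))
    return eigenvalues, explained, n_retained


# Synthetic data (an assumption for illustration only): 10 items driven by
# 2 latent patterns plus independent noise.
rng = np.random.default_rng(0)
n_subjects, n_items = 500, 10
latent = rng.normal(size=(n_subjects, 2))
loadings = rng.normal(size=(2, n_items))
data = latent @ loadings + rng.normal(size=(n_subjects, n_items))

eigenvalues, explained, n_retained = kaiser_retained(data)
for rank, (ev, frac) in enumerate(zip(eigenvalues, explained), start=1):
    decision = "retain" if ev > 1.0 else "drop"
    print(f"PC{rank}: eigenvalue={ev:.2f}, variance explained={frac:.1%}, {decision}")
```

Reporting the rank alongside the variance explained, as in the printed table, is what would allow a reader to verify whether two components explaining 7.3% and 7.2% of the variance are the leading ones.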
The same set of data, especially those for the unaffected controls, has been used multiple times, and correction for multiple comparisons was not performed. Such correction is crucial in epidemiology, where the number of tests performed is typically large, to minimize false-positive associations. Although overenthusiastic use of Bonferroni corrections may reduce the power to identify true associations, a conservative threshold (such as p = 0.01 or 0.001) can be adopted. None of the reported positive findings remains statistically significant even at the less stringent of these thresholds, 0.01. The issue of multiple testing arises again where the authors examined the correlations of food groups and nutrients with the two principal components. We advocate the appropriate use of statistics in epidemiologic studies, where associations are fundamentally established through statistical rigor.
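The arithmetic behind this argument can be sketched as follows. The two p values and the thresholds of 0.01 and 0.001 are taken from the text above; the count of 14 tests (two patterns, each tested against six cough and/or phlegm phenotypes plus incident asthma) and the family-wise level of 0.05 are assumptions made here purely for illustration, and the true number of tests in the original article may differ.

```python
# p values quoted in the letter for the two dietary patterns.
reported = {
    "meat-dim sum pattern": 0.02,
    "vegetable-fruit-soy pattern": 0.04,
}

# Assumed for illustration: 2 patterns x (6 phenotypes + incident asthma).
n_tests = 2 * (6 + 1)
# Assumed family-wise significance level (not stated in the letter).
alpha = 0.05


def bonferroni_threshold(alpha, n_tests):
    """Per-test threshold that keeps the family-wise error rate at or below alpha."""
    return alpha / n_tests


thresholds = {
    "p < 0.01": 0.01,
    "p < 0.001": 0.001,
    "Bonferroni": bonferroni_threshold(alpha, n_tests),
}

# For each finding, check whether it passes each threshold.
results = {
    (finding, name): p < cutoff
    for finding, p in reported.items()
    for name, cutoff in thresholds.items()
}

for (finding, name), passes in results.items():
    verdict = "significant" if passes else "not significant"
    print(f"{finding} (p = {reported[finding]}) at {name}: {verdict}")
```

Under these assumptions the Bonferroni threshold falls between the two fixed thresholds, and neither reported p value passes any of the three.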
University of Oxford, Oxford, United Kingdom

FOOTNOTES

Conflict of Interest Statement: Neither author has a financial relationship with a commercial entity that has an interest in the subject of this manuscript.

REFERENCES