© 2008 American Thoracic Society doi: 10.1164/rccm.200710-1547ED
Gene Expression Profiling in Chronic Obstructive Pulmonary Disease
University of Pittsburgh
Brigham & Women's Hospital

The launch of the human genome project and the development of gene expression profiling analyses at nearly the same time have had a dramatic effect on scientific inquiry in the molecular basis of human diseases. Previously, when one wished to study the effect of a disease process or environmental perturbation on various genes in the human body, it was necessary to first isolate the gene, then to study each individual gene in series. Now we can perform a parallel analysis using microarray or SAGE (serial analysis of gene expression) approaches on thousands of genes in the body simultaneously. However, questions remain about the ability of these new technologies to impact clinical medicine in a meaningful way.

Gene expression profiling studies using microarray approaches have the potential to profoundly affect the understanding of human diseases in identifying candidate genes important in pathogenesis of diseases, signaling pathways, and gene transcription regulatory networks. There have been several recent studies attempting to identify novel pathways involved in the pathogenesis of chronic obstructive pulmonary disease (COPD) (1–6) using gene expression profiling analyses. However, there was minimal overlap between differentially expressed genes among the different datasets. This issue highlights the complexity of expression profiling analysis in a human disease, such as COPD, with tissue heterogeneity and variable clinical phenotype. The nonoverlapping gene datasets from these studies are likely due to several factors, including differences in sample acquisition, disease severity, sample size, tissue and cell components, and expression platforms. Nevertheless, these observations have provided useful information and insights into the pathogenesis of COPD.

In this issue of the Journal (pp. 402–411), Wang and colleagues (7) provide comprehensive gene expression profiling on a large sampling of human lung tissues from both non-COPD and COPD patients in various GOLD (Global Initiative for Chronic Obstructive Lung Disease) stages, with the goal of identifying novel candidate genes and pathways for disease pathogenesis. Unique features of this study include a large group of individuals with well-characterized clinical phenotype including detailed lung function and quantitative lung morphometry. This study selected samples from a population of 185 tissue samples, which, although biased in the sense that each sample is from a patient who underwent lung resection, is more representative of the global COPD population than earlier studies. Wang and colleagues demonstrate that among the 203 significant gene transcripts in their "COPD" signature set, genes involved in tissue remodeling and repair, such as extracellular matrix, apoptosis, and inflammation, were important in the pathogenesis of COPD.

One major limitation of gene expression profiling studies involves the small sample size examined. This is a particular problem with human microarray studies in which the overall goal of making inferences about the general population is limited by the fact that gene expression information comes from a limited subset of the population. In many cases this leads to the phenomenon of overfitting, occurring when a large amount of gene expression data is obtained from a relatively small population. In cases of overfitting, the gene expression information obtained will fit a small population extremely well, and is often easily validated by reverse transcriptase–polymerase chain reaction, but the results are often not replicable in other follow-up studies. Often, the more genes that are profiled, and the more complicated the analysis, the greater the chances of overfitting.
This was the difficulty in many earlier microarray studies for COPD, and may help to explain the divergent results in gene expression between the different studies. The current study by Wang and colleagues is a good step toward the tenet that "more is better" in the field of microarray analysis.

Microarray technology may be useful clinically through its use in identifying biomarkers to classify disease processes. The process of classification uses the tools developed in the past 30 years within the data-mining and statistical machine-learning communities to aid in the diagnosis of diseases. One recent example of a classification strategy augmenting clinical diagnosis was the microarray study of tumor samples from patients with early-stage non–small cell lung cancer (8). That study used gene expression profiling to identify which patients were most likely to have a recurrence of cancer, and thus most likely to benefit from adjuvant chemotherapy. The algorithms employed found specific groups of meta-genes that could discriminate between lung cancer progressors and nonprogressors. These algorithms had an overall predictive accuracy of 72 to 79%, a substantial improvement over the 64% predictive accuracy of clinical diagnosis alone. The study is also important because it applied genomic technologies in a way that is immediately clinically applicable.

It must be emphasized that classification strategies attempt only to identify biomarkers, not to elucidate pathophysiology. This does not mean that the genes identified as biomarkers have no relevance to a particular disease process, but rather that any information regarding pathophysiology must be considered post hoc, and more information may need to be obtained from subsequent studies.
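The overfitting problem described above, and its consequence for classification accuracy, can be illustrated with a small simulation. The sketch below is purely hypothetical and is not the method of Wang and colleagues or of any cited study: it generates random "expression" values that carry no disease signal, picks the single gene that best separates two arbitrary groups in a small training set, and then scores that gene on independent samples. The sample sizes, gene counts, function names, and the simple midpoint-threshold rule are all assumptions chosen for illustration.

```python
import random


def simulate(n_samples, n_genes, rng):
    """Pure-noise expression matrix: labels are unrelated to every gene."""
    labels = [i % 2 for i in range(n_samples)]  # alternate control (0) / case (1)
    expr = [[rng.gauss(0.0, 1.0) for _ in range(n_genes)]
            for _ in range(n_samples)]
    return expr, labels


def fit_gene(expr, labels, g):
    """Threshold rule for gene g: midpoint between the two class means."""
    controls = [row[g] for row, y in zip(expr, labels) if y == 0]
    cases = [row[g] for row, y in zip(expr, labels) if y == 1]
    mean0 = sum(controls) / len(controls)
    mean1 = sum(cases) / len(cases)
    return (mean0 + mean1) / 2, mean1 > mean0


def accuracy(expr, labels, g, threshold, high_is_case):
    """Fraction of samples whose label is predicted correctly by gene g."""
    correct = 0
    for row, y in zip(expr, labels):
        predicted = int((row[g] > threshold) == high_is_case)
        correct += predicted == y
    return correct / len(labels)


def best_gene(expr, labels, n_genes):
    """Screen the first n_genes and keep the one with the best training fit."""
    best = None
    for g in range(n_genes):
        threshold, high_is_case = fit_gene(expr, labels, g)
        acc = accuracy(expr, labels, g, threshold, high_is_case)
        if best is None or acc > best[0]:
            best = (acc, g, threshold, high_is_case)
    return best


def run(seed=0, n_train=20, n_test=200, gene_counts=(1, 10, 100, 1000)):
    """Compare training and held-out accuracy as more genes are screened."""
    rng = random.Random(seed)
    max_genes = max(gene_counts)
    train_x, train_y = simulate(n_train, max_genes, rng)
    test_x, test_y = simulate(n_test, max_genes, rng)
    results = []
    for k in gene_counts:
        train_acc, g, threshold, high_is_case = best_gene(train_x, train_y, k)
        test_acc = accuracy(test_x, test_y, g, threshold, high_is_case)
        results.append((k, train_acc, test_acc))
    return results


if __name__ == "__main__":
    for k, train_acc, test_acc in run():
        print(f"{k:5d} genes screened: training accuracy {train_acc:.2f}, "
              f"held-out accuracy {test_acc:.2f}")
```

Because each larger screen includes the genes of the smaller one, the best training accuracy can only stay the same or rise as more genes are profiled. The data contain no real signal, so accuracy on the independent samples would be expected to remain near chance. That gap between apparent and replicable performance is what the text above calls overfitting.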
We have attempted to highlight the potency of microarray analysis in the elucidation of novel pathways and insights leading to an improved understanding of the pathogenesis of complex human diseases, such as COPD. This approach will undoubtedly yield fruitful new discoveries as long as we remain cautious and cognizant of the limitations, mentioned above, of these high-throughput microarray assays and analyses. The importance of the "garbage in–garbage out" adage that we often quote in gene expression profiling studies cannot be overstated. Future larger-scale multidisciplinary collaborative studies, evolving systems biology analysis, and machine-learning algorithms should further improve our confidence in the data generated from these approaches. In the spirit of the National Institutes of Health Roadmap and multidisciplinary translational research and medicine, thoracic surgeons, pulmonologists, radiologists, physiologists, pathologists, molecular biologists, computational biologists, and bioinformatics analysts must work together to carefully collect human samples from well-characterized, clinically phenotyped patients and then subject those samples to rigorous gene expression profiling assays and analyses.

FOOTNOTES
Conflict of Interest Statement: Neither author has a financial relationship with a commercial entity that has an interest in the subject of this manuscript.

REFERENCES