Published ahead of print on April 15, 2004, doi:10.1164/rccm.200401-066OC
© 2004 American Thoracic Society
Molecular Signatures in Biopsy Specimens of Lung Cancer
Department of Pathology, Department of Medicine, Department of Radiology, Department of Biomedical Informatics, and Herbert Irving Comprehensive Cancer Center, Columbia University College of Physicians and Surgeons, New York, New York
Correspondence and requests for reprints should be addressed to Charles A. Powell, M.D., Department of Pulmonary, Allergy, and Critical Care Medicine, Columbia University College of Physicians and Surgeons, 630 W. 168th Street, Box 91, New York, NY 10032. E-mail: cap6@columbia.edu
Gene expression profiles of resected tumors may predict treatment response and outcome. We hypothesized that profiles derived from lung tumor biopsies would discriminate tumor-specific gene signatures and provide predictive information about outcome. Lung carcinoma specimens were obtained from 23 patients undergoing computed tomography-guided transthoracic biopsy or endobronchial brushing for undiagnosed nodules. Excess tissue was processed for gene profiling. We built class prediction models for lung cancer histology and for cancer outcome. The histology model used an F test to identify 99 genes that were differentially expressed among lung cancer subtypes. The histology validation set class prediction accuracy rate was 86%. The outcome model used the maximum difference subset algorithm to identify 42 genes associated with high risk for cancer death. The outcome training set class prediction accuracy rate was 87%. In conclusion, gene expression profiles of biopsy specimens of lung cancers identify unique tumoral signatures that provide information about tissue morphology and prognosis. The use of specimens acquired from lung biopsy procedures to identify biomarkers of clinical outcome may have application in the management of patients with lung cancer. The procedures are safe and feasible; the efficacy and utility of this strategy will ultimately be determined by prospective clinical trials.
Key Words: lung neoplasms; microarray analysis of gene expression; prognosis
Lung cancer is the leading cause of cancer death in the United States, with 187,000 cases and 165,000 deaths expected in 2004 (1). Despite innovations in diagnostic testing, surgical technique, and the development of new therapeutic agents, the five-year survival rate has remained about 13–15% throughout the past three decades. Factors contributing to the low lung cancer survival rate include the small proportion of patients presenting with resectable disease and chemotherapy response rates ranging from 13 to 42% in patients with advanced stage disease (2, 3). However, even for patients with resected Stage I lung carcinoma, up to 30% will succumb to their disease within five years. Research has been directed toward the identification of patients at high risk for death after resection or chemotherapy; these individuals could be candidates for adjuvant therapy or alternative management strategies. Other than clinical stage, there are no established cancer-specific clinical variables or biomarkers that reliably identify individuals at increased risk for death after either surgical resection for early-stage non-small cell carcinomas or chemotherapy and/or radiation therapy for advanced stage carcinomas. Studies indicate that gene expression profiles of resected tumors can provide insights into lung carcinogenesis (4–6) and may predict risk for recurrence and death in early-stage lung carcinomas treated by surgical resection (7, 8). These studies suggest that prognostic information provided by molecular profiling of resected lung tumors may be useful in guiding adjuvant therapy or postresection surveillance strategies. However, because only approximately 20% of patients with lung cancer undergo surgical resection with curative intent (9), the applicability of this strategy may be limited.
In contrast, biopsy specimens obtained by computed tomography (CT)-guided approaches or by fiberoptic bronchoscopy are available from patients with both resectable and unresectable disease (10). Therefore, approaches to examine gene expression profiles from lung cancer biopsies may identify clinically relevant signatures that offer the potential to be widely applicable to the management of patients with lung cancer. We hypothesized that gene expression profiles derived from biopsies of lung tumors could discriminate tumor-specific gene signatures and provide predictive information about clinical outcome. Similar to other invasive diagnostic procedures, lung biopsies are safe but may be associated with complications that are infrequently medically significant. For transthoracic needle biopsies, pneumothorax rates are about 20%, and the incidence of hemoptysis varies from 5 to 15% (11–13). In addition, complications of fiberoptic bronchoscopy, endobronchial biopsy, and brushing are uncommon (10). To eliminate the risk of complications from obtaining biopsy specimens specifically for gene profiling analysis, we utilized residual material obtained from diagnostic lung cancer biopsies. Thus, no additional biopsies were performed specifically for these studies. We show that biopsy molecular signatures identify genes associated with tumor histology and that a classifier set of 42 genes can predict risk for lung cancer death. Some of the results of these studies have been reported previously in the form of an abstract (14).
Subjects were recruited from a consecutive series of patients referred for transthoracic needle biopsy or bronchoscopy of an undiagnosed lung nodule or mass. An additional inclusion criterion was the diagnosis of a primary lung carcinoma. Tissue specimens were obtained from 26 patients undergoing CT-guided biopsy (n = 23) (Temno coaxial core biopsy system; Allegiance/Cardinal Health, McGaw Park, IL) or endobronchial brushing (n = 3) (Cellebrity endoscopic cytology brush; Microvasive/Boston Scientific, Watertown, MA) of undiagnosed pulmonary nodules. After needle biopsy and brushing specimens were collected for pathologic diagnosis, the needle or brush containing cells that would otherwise have been discarded was placed into 1 ml of RNA extraction buffer (RNeasy minikit; Qiagen, Valencia, CA). cRNA was generated by the modified Eberwine protocol (http://www.affymetrix.com/support/technical/technotes/smallv2_technote.pdf) (15). Compared with the standard amplification protocol, the modified Eberwine procedure incorporates a second cycle of reverse transcription and a second cycle of in vitro transcription. Biotinylated cRNA was hybridized to the Affymetrix (Santa Clara, CA) U95Av2 DNA array, which contains probes for about 12,600 human genes. Probe-level analysis and normalization to nonmalignant lung tissue were performed according to the robust multiarray algorithm (16) (GeneTraffic; Iobion, La Jolla, CA). Affymetrix Microarray Suite 5.0 was used to designate each gene as present, absent, or marginal. We excluded from further analysis three arrays of poor quality as demonstrated by fewer than 35% of genes detected as present. Genes were filtered to remove those not present in at least two specimens and genes whose mean log ratio range was less than one. After filtering, 2,194 genes in 23 specimens were used for subsequent analyses. Analyses were performed with BRB ArrayTools (version 3.01; R. Simon and A. P. Lam, National Cancer Institute, Bethesda, MD) (17, 18) and with the Maximum Difference Subset (MDSS) algorithm (http://bioinformatics.upmc.edu/GE2/GEDA.html) (19) (microarray data available online at http://hora.cpmc.columbia.edu/dept/pulmonary/5ResearchPages/Laboratories/Powell%20Lab.htm).
It was not possible to perform cytologic analysis of specimens used for gene profiling because the residual specimens for research were immediately placed into lysis buffer. We examined the cellularity of four additional specimens acquired from transthoracic needle biopsies; these were collected by standard procedures but were not processed for gene expression analysis. We determined that 1,500–2,000 cells were present in residual specimens obtained from biopsy needles. Cells in the residual specimens were similar in morphology to the tumor cells in paraffin-embedded core biopsy tissues (see Figure E1 in the online supplement). RNA was not specifically quantitated. On the basis of cell counts and cRNA yields during processing for expression analysis, we estimate that needle biopsy specimens contained about 20–50 ng of total RNA. RNA yields from residual material on bronchoscopy brushings ranged from 500 to 600 ng. Biopsy histologic diagnosis was acquired from the medical record. Permanent sections were reviewed by a second pathologist, who concurred with the original diagnosis in each instance. The histology was classified according to the World Health Organization lung tumor classification scheme for small cell and non-small cell carcinoma (20). In biopsy and brushing specimens, a diagnosis of adenocarcinoma or squamous cell carcinoma was rendered when there were features associated with differentiation (e.g., gland formation or mucin droplets for adenocarcinoma; keratin or intercellular bridges for squamous carcinoma). If the carcinoma was poorly differentiated, a designation of "non-small cell carcinoma" was assigned.
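The gene filter described in the methods can be illustrated with a short sketch. This is not the GeneTraffic or BRB ArrayTools implementation; it assumes that "present" means a detection call of "P" and that the log ratio range of a gene is its maximum minus its minimum across specimens. The function name, the thresholds as parameters, and the toy data are all hypothetical.

```python
def filter_genes(log_ratios, calls, min_present=2, min_range=1.0):
    """Return gene names that pass both filters.

    log_ratios: dict mapping gene -> list of log ratios, one per specimen
    calls:      dict mapping gene -> list of detection calls ("P", "M", "A")
    A gene is kept if it is called present in at least `min_present`
    specimens and its log ratio range (max - min) is at least `min_range`.
    """
    kept = []
    for gene, values in log_ratios.items():
        n_present = sum(1 for call in calls[gene] if call == "P")
        value_range = max(values) - min(values)
        if n_present >= min_present and value_range >= min_range:
            kept.append(gene)
    return kept


# Hypothetical toy data: three genes measured in four specimens.
toy_log_ratios = {
    "GENE_A": [0.1, 1.4, -0.2, 0.9],  # wide range, present three times
    "GENE_B": [0.2, 0.3, 0.1, 0.4],   # range too narrow
    "GENE_C": [-1.0, 1.0, 0.0, 0.5],  # wide range, but present only once
}
toy_calls = {
    "GENE_A": ["P", "P", "A", "P"],
    "GENE_B": ["P", "P", "P", "P"],
    "GENE_C": ["P", "A", "A", "M"],
}
kept_genes = filter_genes(toy_log_ratios, toy_calls)
print(kept_genes)
```

Only the gene that satisfies both criteria survives in the toy data; in the study the same kind of filter reduced about 12,600 probed genes to 2,194.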
Clinical information for the subjects was obtained from the medical record and from patients' physicians (Table 1). All procedures were approved by the Columbia University Medical Center (New York, NY) Institutional Review Board and informed consent was obtained from participants.
For validation of the histology class prediction model, an independent set of 29 lung carcinoma resection specimens was microdissected and processed for microarray analysis according to standard protocols, as reported previously (6). For validation of the outcome class prediction model, gene expression and clinical data from a Massachusetts-based independent cohort of 109 patients with lung adenocarcinoma were accessed from http://www-genome.wi.mit.edu/mpr/lung/. Hu95Av2 CEL files from Massachusetts-based Dataset A (7) were imported into GeneTraffic and processed as described above. For the Mantel–Haenszel test for survivorship data (log rank test) (21), specimens were classified as high expression or low expression on the basis of gene expression relative to the median across all specimens. Statistical analyses of survival (22) were performed with SPSS 11.0 (SPSS, Chicago, IL). The following data sets were used for analysis: histology training set (n = 19 biopsies of adenocarcinoma, squamous cell carcinoma, and small cell carcinoma), histology validation set (n = 29 microdissected primary lung carcinoma specimens), outcome training set (n = 23 biopsies), and outcome validation set (n = 109 patients with lung adenocarcinoma from the Massachusetts-based cohort).
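The median split followed by a log rank test can be sketched as below. The study ran this analysis in SPSS; the code here is only an illustrative two-group log rank calculation written from the standard formula. Assigning specimens exactly at the median to the low group is an assumption, and the function names and toy follow-up data are hypothetical.

```python
import math
from statistics import median


def split_by_median(expression):
    """Label each specimen 'high' if its expression exceeds the median, else 'low'."""
    cutoff = median(expression)
    return ["high" if value > cutoff else "low" for value in expression]


def logrank_test(times, events, groups):
    """Two-group log rank test.

    times:  follow-up time per specimen
    events: 1 if death was observed, 0 if censored
    groups: 'high' or 'low' per specimen
    Returns (chi-square statistic with 1 degree of freedom, p value).
    """
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    observed_minus_expected = 0.0
    variance = 0.0
    for t in event_times:
        at_risk = [(g, tt, e) for g, tt, e in zip(groups, times, events) if tt >= t]
        n = len(at_risk)
        n_high = sum(1 for g, _, _ in at_risk if g == "high")
        deaths = sum(1 for _, tt, e in at_risk if tt == t and e == 1)
        deaths_high = sum(
            1 for g, tt, e in at_risk if tt == t and e == 1 and g == "high"
        )
        observed_minus_expected += deaths_high - deaths * n_high / n
        if n > 1:
            frac = n_high / n
            variance += deaths * frac * (1 - frac) * (n - deaths) / (n - 1)
    chi2 = observed_minus_expected ** 2 / variance if variance > 0 else 0.0
    # Survival function of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical toy data: expression of one gene, follow-up in months, death flag.
toy_expression = [5.1, 2.0, 7.3, 1.2, 6.8, 3.3, 8.0, 2.5]
toy_times = [6, 30, 4, 36, 9, 28, 5, 40]
toy_events = [1, 0, 1, 0, 1, 1, 1, 0]

toy_groups = split_by_median(toy_expression)
chi2, p_value = logrank_test(toy_times, toy_events, toy_groups)
print(toy_groups, round(chi2, 2), round(p_value, 4))
```

In the toy data all early deaths fall in the high-expression group, so the test statistic exceeds the 3.84 threshold that corresponds to p < 0.05.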
Immunohistochemistry
Biopsy specimens were adequate for gene expression profiling analysis in 23 of 26 cases. Because our procedures utilized residual material from clinically indicated biopsies, there were no patient complications attributable to the research procedures. A limitation of gene expression profiling of small specimens obtained in this manner is that the number of cells captured does not provide an adequate quantity of total RNA for analysis on Affymetrix oligonucleotide arrays, using standard amplification protocols. We therefore instituted the modified Eberwine procedure, which is an established modification designed to uniformly amplify RNA obtained from small samples for analysis on microarrays. We examined two potential sources of variability in gene profiling of small specimens obtained from diagnostic biopsies: nucleic acid amplification and cellular heterogeneity. To examine the variability introduced by the additional round of amplification in the modified Eberwine procedure, we compared gene expression data of tumor RNA (2 µg) processed by standard procedures with that of diluted tumor RNA (200 ng) from the same specimen, but processed by the modified Eberwine protocol. Examination of scatter plots and correlation coefficients shows that gene signal intensities were highly similar between the two methods of amplification, as has been shown by other researchers (23–25) (Figure 1A).
To examine variability introduced by the admixture of cells present in the diagnostic specimens, we compared gene expression data of biopsy material with that of diluted microdissected tumor RNA from the same patient. The results indicate that the gene expression intensities are similar, but that there is more heterogeneity than in the comparison of amplification protocols (Figure 1B). Because both specimens were processed according to the modified Eberwine procedure, the variability was likely attributable to the presence of cellular heterogeneity in biopsy specimens. Compared with microdissected resected tumors, which contain more than 90% tumor cells, the biopsy specimens often contain cells from normal lung, pleura, muscle, and skin; inflammatory cells; and blood leukocytes in addition to tumor cells. Despite this heterogeneity, we hypothesized that unique tumor-specific molecular signatures (i.e., histology classifiers) could be detected in these specimens.
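The comparisons above rest on correlation coefficients between paired signal intensities. A minimal sketch of that calculation follows, assuming a Pearson correlation on log2-transformed intensities; the source does not state which coefficient or transformation was used, and the intensity values are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return covariance / math.sqrt(var_x * var_y)


# Hypothetical signal intensities for five genes under the two protocols.
standard_protocol = [120.0, 850.0, 40.0, 2300.0, 610.0]
modified_protocol = [135.0, 790.0, 55.0, 2100.0, 650.0]

r = pearson_r(
    [math.log2(v) for v in standard_protocol],
    [math.log2(v) for v in modified_protocol],
)
print(round(r, 3))
```

A coefficient close to 1 corresponds to the tight scatter described for the amplification comparison; a lower value would be expected for the biopsy versus microdissected tumor comparison, where cellular heterogeneity adds variability.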
Histology
Among the lung histology classifier genes detected in the biopsy specimens, several have been identified in other studies that used the U95A microarray platform. These marker genes include ERBB2, TTF-1, MUC1, BENE, SELENBP1, TGFBR2 (adenocarcinoma); KIF5C, TMSNB, TUBB, FOXG1B, ESPL1, TRIM28 (small cell carcinoma); and KRT17, KRT6E, BPAG1 (squamous cell carcinoma) (6, 7, 27). To further examine the association of the classifiers with lung cancer histology, we performed class prediction testing with a k-nearest neighbor (28) leave-one-out cross-validation. In this procedure, one sample is removed from the training set, a new gene set is generated from which a classifier is generated, and this classifier is applied to the sample left out. This procedure is repeated for all the samples. Three-nearest-neighbor classifiers generated in this manner correctly predicted the histologic class for 13 (68%) of 19 samples. A permutation analysis of the predictor was performed. On the basis of 1,000 random permutations, the classifier had a p value of 0.035, indicating that the misclassification rate of the predictor was significantly smaller than the misclassification rate of the permutations. We tested the accuracy of the biopsy histology classifier model by using it to predict the histology of 29 independently obtained lung carcinoma resection specimens (histology validation set). The distribution of the histology validation set was adenocarcinoma (n = 22); small cell (n = 2); and squamous cell carcinoma (n = 5). The 99-gene histology classifier model was able to accurately predict histology in 25 (86%) of 29 tumors (Table 3). Four of the adenocarcinoma tumors were incorrectly classified as squamous cell carcinomas.
Interestingly, histologic sections of these tumors showed areas of squamous differentiation within a predominantly glandular tumor, and in a previous study three of these adenocarcinomas segregated with squamous cell carcinomas in an unsupervised clustering procedure (6). Therefore, histologic heterogeneity may have accounted for misclassification by histology classifier genes in these tumors. The results of histology training and validation set class prediction analyses indicate that gene expression profiles of lung biopsies were representative of histologically specific subtypes of lung carcinoma.
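The leave-one-out procedure described above, in which the gene set is reselected each time a sample is held out, can be sketched as follows. This is not the BRB ArrayTools implementation: it assumes a one-way ANOVA F statistic for gene ranking, a fixed number of top genes rather than a significance threshold, Euclidean distance, and majority voting among three neighbors. All names and the toy expression values are hypothetical.

```python
from collections import Counter


def f_statistic(values, labels):
    """One-way ANOVA F statistic for one gene across labeled samples."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    n, k = len(values), len(groups)
    if k < 2 or n <= k:
        return 0.0
    grand_mean = sum(values) / n
    between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values()
    )
    within = sum((v - sum(g) / len(g)) ** 2 for g in groups.values() for v in g)
    if within == 0:
        return float("inf") if between > 0 else 0.0
    return (between / (k - 1)) / (within / (n - k))


def select_genes(samples, labels, n_genes):
    """Indices of the n_genes with the largest F statistic."""
    n_features = len(samples[0])
    scores = [
        f_statistic([s[j] for s in samples], labels) for j in range(n_features)
    ]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return ranked[:n_genes]


def knn_predict(train, train_labels, query, k=3):
    """Majority vote among the k nearest training samples (squared Euclidean)."""
    distances = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, query)), label)
        for row, label in zip(train, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


def loocv_accuracy(samples, labels, n_genes=2, k=3):
    """Leave-one-out accuracy with gene selection repeated inside every fold."""
    correct = 0
    for i in range(len(samples)):
        train = samples[:i] + samples[i + 1:]
        train_labels = labels[:i] + labels[i + 1:]
        genes = select_genes(train, train_labels, n_genes)
        reduced_train = [[row[j] for j in genes] for row in train]
        reduced_query = [samples[i][j] for j in genes]
        if knn_predict(reduced_train, train_labels, reduced_query, k) == labels[i]:
            correct += 1
    return correct / len(samples)


# Hypothetical toy data: 9 samples, 4 genes; genes 0 and 1 carry the class signal.
toy_samples = [
    [5.0, 1.0, 0.3, 0.1], [5.2, 1.1, -0.2, 0.0], [4.9, 0.9, 0.1, -0.1],
    [1.0, 5.0, 0.0, 0.2], [1.1, 5.1, 0.2, -0.2], [0.9, 4.8, -0.1, 0.1],
    [1.0, 1.0, 0.1, 0.0], [0.9, 1.2, -0.3, 0.1], [1.1, 0.8, 0.2, -0.1],
]
toy_labels = ["adeno"] * 3 + ["squamous"] * 3 + ["small cell"] * 3
accuracy = loocv_accuracy(toy_samples, toy_labels)
print(accuracy)
```

Repeating gene selection inside each fold matters because selecting genes once on all samples would let the held-out sample influence its own classifier and inflate the accuracy estimate.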
Prognosis
We examined whether biopsy gene expression signatures could predict another clinically relevant end point, prognosis. Among the 23 patients who underwent lung biopsy, 6 cancer deaths occurred within 12 months. These patients were classified as high risk for early cancer death. We identified genes associated with high-risk and low-risk outcome, using the Maximum Difference Subset (MDSS) algorithm. This tool combines standard statistical tests (pooled variance t test) and machine prediction learning to identify class predictors with higher specificity and accuracy compared with other classification algorithms (19). In the biopsy data set, MDSS identified 42 genes associated with cancer death within 12 months (Table 4). We tested the accuracy of these predictors to classify risk for cancer death. The overall outcome training set class prediction accuracy rate was 87% (20 of 23 predicted correctly), with a p value of 0.008 based on 1,000 random permutations of the class labels.
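A permutation p value of the kind reported above can be sketched as follows. The sketch does not reproduce MDSS; it uses a hypothetical one-feature nearest-centroid classifier as a stand-in, and it assumes the common convention of counting the observed labeling among the permutations, so the smallest attainable p value is 1/(permutations + 1). The source does not state which convention was used.

```python
import random


def loocv_accuracy_1d(values, labels):
    """Leave-one-out accuracy of a nearest-centroid rule on a single feature.

    This classifier is a hypothetical stand-in for the real predictor.
    """
    correct = 0
    for i, (value, true_label) in enumerate(zip(values, labels)):
        centroids = {}
        for label in sorted(set(labels)):
            members = [
                v for j, (v, lab) in enumerate(zip(values, labels))
                if lab == label and j != i
            ]
            if members:
                centroids[label] = sum(members) / len(members)
        predicted = min(centroids, key=lambda lab: abs(value - centroids[lab]))
        if predicted == true_label:
            correct += 1
    return correct / len(values)


def permutation_p_value(values, labels, n_permutations=1000, seed=0):
    """Fraction of label permutations whose accuracy matches or beats the observed one."""
    rng = random.Random(seed)
    observed = loocv_accuracy_1d(values, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if loocv_accuracy_1d(values, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_permutations + 1)


# Hypothetical toy data: one risk score per patient, two well-separated classes.
toy_scores = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 10.0, 10.1, 10.2, 10.3, 10.4, 10.5]
toy_risk = ["low"] * 6 + ["high"] * 6

observed_accuracy, p_value = permutation_p_value(toy_scores, toy_risk)
print(observed_accuracy, p_value)
```

Because the whole cross-validation is rerun for every shuffled labeling, the p value reflects how often chance alone yields an accuracy as high as the observed one.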
To determine whether the outcome classifiers identified in expression profiling of lung cancer biopsies were applicable to other lung cancer gene expression data sets, we examined whether our genes were associated with cancer-free survival in an independent set of homogenized tumors resected from a large cohort of Massachusetts-based patients with lung adenocarcinoma (outcome validation set) (7). We determined that 9 of the 42 genes associated with risk for one-year cancer death in our outcome training set were associated with cancer-free survival in the Massachusetts-based outcome validation data set, using the log rank test (p < 0.05; Figure 2). These genes were as follows: CCNB1, FHL2, HLA-DPB1, LOXL2, IRS-1, PLOD2, MTHFD2, TGFB1, and TRIPBR2. This result suggests that despite differences in histologic subtypes, specimen types, and amplification protocols, selected outcome genes may be applicable to the prediction of lung carcinoma outcome in other patients.
Immunohistochemistry
Because tumor behavior may be modulated by signals from the tumor and its surrounding microenvironment, we examined immunolocalization of representative outcome marker proteins to determine whether expression was detectable in tumor cells. Antibodies were selected on the basis of commercial availability. Immunoreactivity for both FHL2 (nuclear) and cyclin B1 (cytoplasmic) was detectable in tumor cells, suggesting that biopsy gene expression signatures are derived from tumor cells (Figure 3).
Lung cancer biopsy gene expression profiles identify unique tumoral signatures that provide information about tissue morphology and clinical outcome. Using validated methods of gene identification that account for the statistical problems associated with multiple comparisons, the present study identified 42 genes associated with high risk for cancer death within one year. The use of specimens acquired by lung biopsy procedures to identify genes associated with clinical outcome suggests several applications as biomarkers of prognosis or treatment response. The relevance of the outcome marker genes identified in the biopsy specimens is supported by other studies indicating that several genes are associated with prognosis in patients with lung carcinoma or other carcinomas. Examples include MYC, encoding the nuclear transcription factor c-Myc, which functions in cell growth and proliferation and is frequently amplified in lung carcinoma (29). Increased expression of c-Myc is associated with adverse prognosis in lymphoma and node-negative breast carcinoma (30, 31). CCNB1 encodes the cell cycle-regulatory protein cyclin B1, which regulates the G2–M transition. Increased expression of cyclin B1 is associated with poor survival in esophageal carcinoma and in non-small cell lung carcinoma (32, 33). FHL2 encodes four and a half of LIM-only protein, which is a β-catenin-binding protein with trans-activation activity (34). FHL2 expression is increased in hepatoblastoma and is associated with cyclin D1 promoter activation in a β-catenin-dependent fashion. Whereas FHL2 is not directly associated with cancer outcome, cyclin D1 expression is associated with decreased survival in resected lung carcinomas (35). HLA-DPB1, which encodes a human MHC Class II lymphocyte antigen β chain, was associated with improved survival in our data set. A similar association was reported in a gene profiling study of diffuse large B cell lymphoma specimens.
Lower expression of HLA-DPB1 and other MHC Class II genes was associated with poor patient survival and decreased tumor immunosurveillance (36). The 5-year survival rate for lung cancer is about 15%, which is markedly lower than the rates for other common cancers of the breast, colon, and prostate (37). This discrepancy may be due to biological differences such as histologic heterogeneity or to the absence of proven screening programs that effectively detect cancers at an early, curable stage. However, even for surgically resected early Stage I non-small cell lung carcinomas, the recurrence rate is 3–5% annually and the 5-year survival rate is about 70%. Studies suggest that gene expression profiles of early-stage lung adenocarcinomas may predict risk for death (7, 8) and therefore may be useful to identify individuals who would be most likely to benefit from systemic therapy delivered before or after resection. Data from early-stage lung cancer systemic therapy trials indicate that neoadjuvant chemotherapy combined with radiation therapy (38) and adjuvant chemotherapy (39) may provide a survival benefit for a small proportion of patients. The potential role of lung biopsy gene expression profiling in the management of early-stage non-small cell carcinoma would be to identify patients with high-risk tumors who would be most likely to benefit from neoadjuvant systemic therapy. The potential utility of this approach has been demonstrated in breast carcinoma. Gene profiles obtained from breast tumors have been shown to predict a short-term clinical response to neoadjuvant docetaxel (40). Another potential role for gene profiling of lung cancer biopsies that might be applicable to the large proportion of patients with lung cancer with unresectable tumors is selection of chemotherapy agents. Advanced stage non-small cell carcinomas and small cell carcinomas are treated by systemic chemotherapy.
For non-small cell lung carcinomas, the average response rate in previously untreated patients ranges widely, from 13 to 42% (2); yet there are no reliable biomarkers to guide the selection of particular regimens for the patients who are most likely to benefit. In vitro studies show that the response of lung cancer cells and other cancer cells to single chemotherapy agents can be predicted by distinct gene expression profiles (41, 42). These results suggest that gene profiling may complement decisions regarding the selection of systemic chemotherapeutic agents. This hypothesis is supported by B cell lymphoma clinical trials that identified tumor gene expression predictors of patient survival after chemotherapy treatment (43, 44). Interestingly, adverse prognosis genes were associated with a proliferation functional class whereas favorable outcome was associated with MHC Class II function (43). In our lung biopsy data set, proliferation genes (CCNB1, MYC, FHL2, and NR4A2) and MHC Class II genes (HLA-DPB1) were similarly associated with adverse and favorable outcomes, respectively. Further characterization of the function of these genes in lung carcinogenesis may lead to the development of novel targeted therapies. Some methodologic limitations apply to our approach. First, our use of residual biopsy specimens did not consistently provide enough cellular material for gene expression analysis according to standard amplification protocols. Rather, we used a modified protocol that incorporated a second round of amplification and therefore increased the opportunity for variability and inconsistency in the data. However, our validation experiments and those performed by others indicate that experimental variability attributable to amplification procedures is small and that data produced from small specimens are reliable. Our technical adequacy rate was higher than those reported by other studies that examined gene expression profiles of lung and breast biopsies (25, 45).
Second, the sample size was relatively small, which may introduce bias and reduce the ability to generalize our results to other lung cancer populations. To address this issue, we examined the ability of the outcome classifier model to predict cancer-free survival in a large independent gene expression data set of lung adenocarcinoma tumors. Despite differences in tumor specimen composition and in experimental protocols, several of our cancer outcome classifier genes were similarly associated with cancer-free survival in Massachusetts-based lung adenocarcinoma cases. Future prospective validation of the gene classifier model in an independent cohort of patients undergoing biopsy will reduce confounding by technical and clinical factors and will confirm the generalizability of the results. Third, because our data set was composed entirely of lung carcinoma biopsies, we could not examine the utility of biopsy gene profiles to distinguish malignant tumors from benign nodules. Experience with screening chest CT indicates a high prevalence of nodules (25–66%) of which only a small fraction (1–3%) are malignant (46). Although nodule size and interval change in size are useful tools to distinguish malignant from benign lesions, it is possible that gene expression profiles of CT-detected nodules may enhance diagnostic algorithms and the clinical utility of the procedure. Other reports support the potential utility of biopsy gene profiles in the clinical management of breast carcinoma. Compared with breast biopsies, lung biopsy is associated with a higher risk of complications such as bleeding and pneumothorax. We addressed this risk in our study procedures by utilizing residual specimens from clinically indicated diagnostic lung biopsies; thus no medical risk was attributable to procedures utilized for gene expression analysis of lung biopsies.
The gene expression signatures generated by the lung biopsies are robust, clinically relevant, and have the potential to improve lung cancer treatment and outcome. The procedures are safe and feasible; we suggest that the efficacy and utility of this strategy are now appropriate for assessment by prospective clinical trials.
Analyses were performed with BRB ArrayTools, developed by Dr. Richard Simon and Amy Peng, and with MDSS, developed by Dr. James Lyons-Weiler and Satish Patel. We also thank Vladan Milkovic and Diane Alexis for technical assistance.
Supported by the National Institutes of Health (ES00354), the American Cancer Society (CRTG00058), and the Herbert and Florence Irving Scholar Fund.
This article has an online supplement, which is accessible from this issue's table of contents online at www.atsjournals.org.
Conflict of Interest Statement: A.C.B. does not have a financial relationship with a commercial entity that has an interest in the subject of this manuscript; L.S. does not have a financial relationship with a commercial entity that has an interest in the subject of this manuscript; G.D.N.P. does not have a financial relationship with a commercial entity that has an interest in the subject of this manuscript; K.L.W. does not have a financial relationship with a commercial entity that has an interest in the subject of this manuscript; L.W. does not have a financial relationship with a commercial entity that has an interest in the subject of this manuscript; J.H.M.A. does not have a financial relationship with a commercial entity that has an interest in the subject of this manuscript; R.A.F. does not have a financial relationship with a commercial entity that has an interest in the subject of this manuscript; C.A.P. does not have a financial relationship with a commercial entity that has an interest in the subject of this manuscript.
Received in original form January 15, 2004; accepted in final form April 12, 2004.