Published ahead of print on July 24, 2008, doi:10.1164/rccm.200712-1895OC
© 2008 American Thoracic Society doi: 10.1164/rccm.200712-1895OC
Proteomic and Computational Analysis of Bronchoalveolar Proteins during the Course of the Acute Respiratory Distress Syndrome

¹Medical Research Service of the VA Puget Sound Healthcare System, Seattle, Washington; Divisions of ²Pulmonary and Critical Care Medicine and ³Endocrinology and Metabolism, Department of Medicine; and ⁴Department of Microbiology; ⁵Center for Lung Biology; and ⁶Fred Hutchinson Cancer Research Institute, University of Washington, Seattle, Washington

Correspondence and requests for reprints should be addressed to Thomas R. Martin, M.D., Pulmonary Research Laboratories, VA Puget Sound Health Care System, 1660 S. Columbian Way, 151L, Seattle, WA 98108. E-mail: trmartin{at}u.washington.edu
Rationale: Acute lung injury causes complex changes in protein expression in the lungs. Whereas most prior studies focused on single proteins, newer methods that allow many proteins to be studied simultaneously could lead to a better understanding of pathogenesis and to new targets for treatment.
Objectives: To examine the changes in protein expression in the bronchoalveolar lavage fluid (BALF) of patients during the course of the acute respiratory distress syndrome (ARDS).
Methods: Using two-dimensional difference gel electrophoresis (DIGE), the expression of proteins in the BALF from patients on Days 1 (n = 7), 3 (n = 8), and 7 (n = 5) of ARDS was compared with that in normal volunteers (n = 9). The patterns of protein expression were analyzed using principal component analysis (PCA). Biological processes enriched among the BALF proteins of patients with ARDS were identified using Gene Ontology (GO) analysis. Protein networks modeling the interactions among BALF proteins were generated using Ingenuity Pathway Analysis.
Measurements and Main Results: An average of 991 protein spots was detected using DIGE. Of these, 80 protein spots, representing 37 unique proteins across all of the fluids, were identified by mass spectrometry. PCA confirmed important differences between the proteins in the ARDS and normal samples. GO analysis showed that these differences reflect enrichment of proteins involved in inflammation, infection, and injury. The protein network analysis showed that protein interactions in ARDS are complex and redundant, and it revealed unexpected central components of the networks.
Conclusions: Proteomics and protein network analysis reveal the complex nature of lung protein interactions in ARDS. The results provide new insights into protein networks in injured lungs and identify novel mediators that are likely to be involved in the pathogenesis and progression of acute lung injury.
Key Words: acute respiratory distress syndrome; acute lung injury; proteomic analysis; bronchoalveolar lavage; 2D gel electrophoresis
Studying single pathways and one-dimensional protein–protein interactions places major limitations on our understanding of the pathophysiology of complex diseases such as acute lung injury (ALI) and the acute respiratory distress syndrome (ARDS). Previous studies suggest that numerous biological processes, including inflammation, apoptosis, and thrombosis, are involved in the pathogenesis of ALI and ARDS (1). Traditional research methods that explore single biological pathways cannot capture the complex interactions among these processes. In contrast, systems-based methods such as proteomics analyze global biological changes and provide an opportunity to examine the complexity inherent in human diseases such as ALI and ARDS.

Proteomics methods have been applied to the study of lung injury by several groups of investigators. Bowler and coworkers used an electrophoresis-based proteomics method to show that, compared with healthy control subjects, patients with ARDS have differences in both the expression and the post-translational modification of proteins in distal lung fluid (2). Schnapp and colleagues used a shotgun proteomics approach to identify insulin-like growth factor-binding protein-3 (IGFBP-3) as a novel mediator of apoptotic pathways in acute lung injury (3). De Torre and coworkers used SELDI-TOF and electrophoresis-based proteomics methods to identify markers of lung inflammation, such as the S100A8 and S100A9 proteins, in the BALF of subjects challenged with endobronchial endotoxin and of patients with ARDS (4). These studies profiled proteins in ARDS lavage fluids at a single time point, but they did not address the complex and dynamic changes that occur during the course of lung injury.
Thus, the purpose of this study was to use proteomic analysis to profile the changes in protein expression in the lungs at the onset and during the course of acute lung injury, and to examine the protein pathways that have important roles in its pathogenesis. We used a quantitative proteomics approach to profile proteins in the bronchoalveolar lavage (BAL) fluid (BALF) of patients with ARDS at Days 1, 3, and 7 after the onset of illness and compared the results with protein profiles in the BALF of healthy control subjects. We then applied computational network analysis to map complex protein interactions in the lung fluids and to study how these interactions changed during the course of ARDS. This approach identified novel mediators of acute lung injury and showed that protein pathways were redundant and involved in multiple biological processes. These characteristics of the protein interactions in the lungs of patients with ARDS have important implications for the development of new molecular-based therapies. Some of the results of this study have been previously reported in the form of an abstract (5).
Patient Population
Patients with ARDS, as defined by the American-European Consensus Conference, were enrolled at Harborview Medical Center (Seattle, WA), a tertiary, university-based hospital (6). The patients underwent fiberoptic bronchoscopy and BAL in either the right middle lobe or the lingula on Days 1, 3, and 7, as previously described (7). Control BAL fluid samples were obtained from healthy, nonsmoking volunteers between the ages of 18 and 50 years. The experimental protocol was approved by the Institutional Review Board of the University of Washington. Informed consent was obtained from each patient or a legal representative.
Sample Preparation
To improve the detection of low-abundance proteins in the proteomic analysis, all samples were depleted of six highly abundant serum proteins (albumin, transferrin, haptoglobin, antitrypsin, IgG, and IgA) using a monoclonal IgG immunoaffinity HPLC column (Multiple Affinity Removal System; Agilent Technologies, Wilmington, DE). The BAL fluids were passed over the depletion column, which retained the high-abundance proteins. The buffer of the column flow-through fraction was exchanged to 7 M urea, 2 M thiourea, 4% CHAPS, and 10 mM Tris, pH 8.5, and the fraction was concentrated to approximately 100 µl using a 5-kD centrifugal filter (Amicon Ultra-15; Millipore). The final protein concentration was measured using the 2D-Quant Assay (Amersham Biosciences, Piscataway, NJ), which allows protein measurement in urea-based solutions. The specificity, reproducibility, and improved protein spot detection achieved with this approach in the proteomic analysis of BALF have been reported previously (8, 9).
Two-Dimensional Difference Gel Electrophoresis
For each subject, 75 µg of a reference standard and 75 µg of BAL protein were labeled with Cy3 and Cy5, respectively. These samples were then applied together to a single rehydrated 24-cm immobilized pH gradient (IPG) strip, pH 4–7 (GE Healthcare). Two-dimensional electrophoresis, separating proteins by isoelectric point in the first dimension and by molecular weight in the second, was performed as described in the online supplement. To detect individual protein spots, the gels were scanned using the Typhoon 9400 Series Variable Imager (GE Healthcare) with excitation wavelengths of 532 nm for Cy3 and 580 nm for Cy5. This procedure was performed with the samples from each of the 20 patients with ARDS (Day 1, n = 7; Day 3, n = 8; Day 7, n = 5) and the 9 control subjects, yielding a total of 29 separate gels.
Protein Spot Analysis
Identification of Protein Spots
Cytokine Assays
Data Analysis
Differences in the abundance of protein spots in the ARDS BALF at Days 1, 3, and 7 compared with normal BALF were determined using the EDGE algorithm (11). Multiple hypothesis testing was addressed by false discovery rate analysis using a Q-value threshold of 0.01.

Principal component analysis (PCA) was performed on the covariance matrix of normalized protein abundance values for all subjects (12). PCA is an exploratory analytical tool that is used to (1) reduce the complexity of a dataset and (2) identify meaningful groups and associations within it. PCA transforms a number of correlated variables (e.g., the abundance of each protein spot in each experimental sample) into a smaller number of uncorrelated variables, called principal components. The first principal component accounts for as much of the variability in the data as possible, and each succeeding component accounts for a successively smaller share of the remaining variability. In this study, PCA was used to cluster the experimental groups according to the expression of protein spots in the BALF. All protein spots that were present in more than 50% of the gels and identified by mass spectrometry were included in the PCA.
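The PCA step can be illustrated with a minimal sketch in plain Python. It assumes a small samples-by-spots matrix of normalized log2 abundances; the toy values are hypothetical, not study data. The leading eigenvectors of the covariance matrix are found by power iteration with deflation, a textbook substitute for the eigendecomposition routines that statistical packages use. The function names are illustrative only and do not correspond to the software used in the study.

```python
import math
import random


def center_and_covariance(samples):
    """Column-center a samples x spots matrix; return (centered, covariance)."""
    n, p = len(samples), len(samples[0])
    means = [sum(row[j] for row in samples) / n for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in samples]
    # Sample covariance between every pair of spots (n - 1 denominator).
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(p)]
           for i in range(p)]
    return centered, cov


def leading_eigenpair(mat, max_iter=1000, tol=1e-12, seed=0):
    """Power iteration for the dominant eigenvalue/eigenvector of a
    symmetric positive semidefinite matrix such as a covariance matrix."""
    p = len(mat)
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in range(p)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    eig = 0.0
    for _ in range(max_iter):
        w = [sum(mat[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # no variance left in this matrix
            return 0.0, v
        v = [x / norm for x in w]
        # Rayleigh quotient v^T M v estimates the eigenvalue.
        new_eig = sum(v[i] * sum(mat[i][j] * v[j] for j in range(p))
                      for i in range(p))
        if abs(new_eig - eig) < tol:
            return new_eig, v
        eig = new_eig
    return eig, v


def pca(samples, n_components=2):
    """Return (scores, explained_variance_ratio) for the first components."""
    centered, cov = center_and_covariance(samples)
    p = len(cov)
    total_variance = sum(cov[i][i] for i in range(p))  # trace of covariance
    mat = [row[:] for row in cov]
    components, ratios = [], []
    for _ in range(n_components):
        eig, vec = leading_eigenpair(mat)
        components.append(vec)
        ratios.append(eig / total_variance)
        # Deflation: subtract the variance this component already explains.
        mat = [[mat[i][j] - eig * vec[i] * vec[j] for j in range(p)]
               for i in range(p)]
    # Project each centered sample onto the components.
    scores = [[sum(r[j] * c[j] for j in range(p)) for c in components]
              for r in centered]
    return scores, ratios


if __name__ == "__main__":
    # Hypothetical log2 abundances for 3 spots in 3 control and 3 ARDS samples.
    normal = [[0.0, 0.1, 0.0], [0.1, 0.0, 0.1], [-0.1, 0.0, 0.0]]
    ards = [[2.0, 2.1, 0.0], [2.1, 1.9, 0.1], [1.9, 2.0, -0.1]]
    scores, ratios = pca(normal + ards, n_components=2)
    print("explained variance ratio:", [round(r, 3) for r in ratios])
    print("PC1 scores:", [round(s[0], 2) for s in scores])
```

In this toy example the two hypothetical groups fall on opposite sides of the first component. Real DIGE data have hundreds of spots and would normally be processed with a linear algebra library rather than hand-written power iteration.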
Enriched functional categories within the groups of proteins that were differentially expressed in the BALF of subjects with ARDS compared with healthy control subjects were determined using DAVID (Database for Annotation, Visualization, and Integrated Discovery) (13). A Benjamini-corrected P value was used to identify significantly enriched categories.

Protein interaction networks reflecting temporal changes in BALF protein expression during the development of ARDS were constructed using the Ingenuity Pathways Analysis software and database (IPA; Ingenuity Systems; Redwood City, CA) (14). This database has been manually curated from over 200,000 full-text, peer-reviewed scientific articles covering approximately 10,000 human, 8,000 mouse, and 5,000 rat genes. A molecular network of direct and indirect interactions among these mammalian orthologs served as the basis for creating smaller networks from the proteomics data. These subnetworks were built around the proteins with the highest connectivity using an iterative algorithm and were then merged to create a larger overall network, called the interactome. The program also added several protein hubs that had not been detected in the proteomics experiment but were critical for linking the detected proteins together. Details of the IPA program and database can be found at the manufacturer's website (http://www.ingenuity.com/products/pathways_analysis.html).
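The enrichment step performed with DAVID can be sketched as a one-sided hypergeometric (Fisher exact) test for each annotation category, followed by Benjamini-Hochberg adjustment across categories, which is the usual meaning of a Benjamini correction. DAVID itself reports a modified Fisher exact test (the EASE score), so this is an approximation of the idea rather than DAVID's exact computation. The category names and counts in the usage lines are hypothetical.

```python
import math


def hypergeom_upper_tail(overlap, list_size, category_size, population_size):
    """One-sided P(X >= overlap) for a hypergeometric X: list_size proteins
    drawn from population_size annotated proteins, category_size of which
    carry the annotation term."""
    upper = min(list_size, category_size)
    hits = sum(math.comb(category_size, i)
               * math.comb(population_size - category_size, list_size - i)
               for i in range(overlap, upper + 1))
    return hits / math.comb(population_size, list_size)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg step-up adjusted P values, in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, keeping adjusted values monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def enrichment(categories, list_size, population_size):
    """categories: dict term -> (overlap, category_size).
    Returns dict term -> (raw P, Benjamini-Hochberg adjusted P)."""
    terms = list(categories)
    raw = [hypergeom_upper_tail(categories[t][0], list_size,
                                categories[t][1], population_size)
           for t in terms]
    adjusted = benjamini_hochberg(raw)
    return {t: (p, q) for t, p, q in zip(terms, raw, adjusted)}


if __name__ == "__main__":
    # Hypothetical counts: a list of 22 proteins against a hypothetical
    # background of 10,000 annotated proteins.
    demo = {"term_A": (6, 150), "term_B": (2, 400), "term_C": (1, 50)}
    for term, (p, q) in enrichment(demo, 22, 10000).items():
        print(f"{term}: P = {p:.3g}, adjusted P = {q:.3g}")
```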
Patient Characteristics
The baseline clinical characteristics of the patients with ARDS are summarized in Table 1. There were no statistically significant differences in clinical characteristics among the patients with ARDS at Days 1, 3, and 7 of disease. Sepsis was the clinical risk factor for ARDS in all of the patients. The trends in mortality, protein concentration, and P/F ratio in the Day 7 group suggest that the patients who remained alive and intubated on Day 7 may have improved in these clinical parameters as their lung injury resolved. The control group consisted of nine normal volunteers who did not smoke cigarettes and did not have underlying lung disease.
Identification of BALF Proteins Using DIGE
An average of 991 protein spots was detected in each gel across all experimental groups. Irrelevant spots caused by dust, gel defects, and other experimental artifacts were excluded using an algorithm in the DeCyder program. Only protein spots that were found in more than 50% of the gels in every experimental group were considered for further analysis. Protein spots of interest were excised and analyzed by MALDI-TOF/TOF mass spectrometry for protein identification. A total of 80 protein spots, representing 37 unique proteins, were identified by tandem mass spectrometry. The locations of these 80 protein spots on a representative gel and their identities are shown in Figure 1 and Table 2, respectively. Overall, the 37 identified proteins represented diverse protein families, including opsonins, antioxidants, basement membrane proteins, coagulation proteins, and serum acute-phase reactants, among others. Of these 37 proteins, 22 were differentially expressed in the ARDS BALF over time compared with controls (EDGE analysis; Q-value threshold, 0.01); these proteins are marked with an asterisk in Table 2. The average abundance of each protein spot in the normal and ARDS BALF samples, expressed as the log2 ratio of the spot in the experimental sample (Cy5 wavelength) to the corresponding spot in the reference standard (Cy3 wavelength), is shown in the online supplement (Table E1).
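The two quantities used in this step, the standardized log2 abundance ratio (Cy5 sample over Cy3 reference) and the rule that a spot must appear on more than 50% of the gels in every group, can be expressed in a short sketch. It assumes spot volumes have already been matched across gels and background-corrected, which DeCyder handles in practice and this sketch does not reproduce. The data structures, spot identifiers, and gel identifiers are hypothetical.

```python
import math


def log2_abundance(cy5_volume, cy3_volume):
    """Standardized abundance of a spot: log2(sample volume / reference volume).
    Cy5 = BAL sample, Cy3 = reference standard on the same gel."""
    if cy5_volume <= 0 or cy3_volume <= 0:
        raise ValueError("spot volumes must be positive")
    return math.log2(cy5_volume / cy3_volume)


def passes_presence_filter(detected_on, groups, min_fraction=0.5):
    """True if the spot was detected on MORE than min_fraction of the gels
    in EVERY experimental group.
    detected_on: set of gel ids on which the spot was matched.
    groups: dict group name -> list of gel ids."""
    return all(
        sum(gel in detected_on for gel in gels) / len(gels) > min_fraction
        for gels in groups.values()
    )


def filter_spots(spot_detections, groups, min_fraction=0.5):
    """Keep the spot ids whose detection pattern passes the presence filter."""
    return [spot for spot, gels in spot_detections.items()
            if passes_presence_filter(gels, groups, min_fraction)]


if __name__ == "__main__":
    # Hypothetical design: 2 control gels and 3 Day 1 gels.
    groups = {"normal": ["n1", "n2"], "day1": ["a1", "a2", "a3"]}
    detections = {
        "spot_1": {"n1", "n2", "a1", "a2"},  # 2/2 and 2/3 of gels: kept
        "spot_2": {"n1", "a1", "a2", "a3"},  # 1/2 is not > 50%: dropped
    }
    print(filter_spots(detections, groups))
    print(log2_abundance(200.0, 100.0))  # twice the reference gives 1.0
```

The strict "greater than" comparison mirrors the "more than 50%" wording; with an even number of gels, a spot seen on exactly half of them is excluded.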
PCA
Principal component analysis (PCA) was used to reduce the complexity of the proteomics data and to examine global trends in protein expression in the lungs of patients with ARDS over the course of their disease. BALF samples were grouped according to the variance in their protein expression (Figure 2). The first three principal components captured the majority (> 70%) of the variance in protein expression between normal and ARDS BALF. The ARDS and normal samples clustered into distinct groups, but within the ARDS cluster, PCA did not discriminate among samples collected on Days 1, 3, and 7. These results indicate that the pathophysiologic events in ALI/ARDS produce significant derangements in the lungs that are reflected in dramatic changes in the BALF proteins at the onset of disease. By comparison, the PCA suggests that the changes in the lung proteome over the subsequent course of ALI/ARDS are more modest.
Functional Analysis
To better characterize the functional differences between proteins detected in the BALF of subjects with ARDS and those of normal subjects, the 22 proteins whose expression changed during the progression of ARDS were analyzed using the Gene Ontology (GO) database. The GO analysis revealed that the ARDS BALF was significantly enriched in proteins associated with inflammation, immunity, response to microbes, response to stress/injury, and enzyme inhibitor activity (Figure 3A). The proteins in each functional category are listed in the online supplement (Table E3). Many of the differentially expressed BALF proteins belonged to several enriched categories (Figure 3B). GO analysis of the proteins that were differentially present in the ARDS BALF on Days 1, 3, and 7 compared with normal BALF showed that the top five enriched biological processes were similar on each day of ARDS (Table E2).
Network Analysis
To examine the relationships among these functional groups of proteins in greater detail, an in silico protein network was created based on known protein interactions derived from the biomedical literature. The proteins detected in the proteomic analysis are shown on a red-to-green color scale corresponding to their abundance relative to the pooled standard used in the proteomic analysis (Figure 4). The connectivity among these identified proteins was enhanced by systematically incorporating proteins and other biological compounds, such as hormones, that have known interactions with the identified proteins but were not detected in the proteomic analysis. These additional proteins and mediators are shown in yellow. To validate their inclusion in the network, several of the proteins added by the IPA program were measured in the ARDS BALF using sensitive immunoassays (Figure 5). The concentrations of TNF-α and IL-1β were low, but measurable, in these dilute BALF samples. The concentration of IL-6 was higher at the onset of ALI but progressively decreased over the course of disease. These findings are concordant with our previously published work showing that TNF-α, IL-1β, IL-6, and LPS-binding protein (LBP), among other proteins, were detectable on Days 1, 3, and 7 of ARDS (12–15). All of these proteins were nearly undetectable in normal BALF.
The computational network analysis in this study was an attempt to capture the complex and dynamic nature of the changes in the lungs of patients with ALI/ARDS. The structure of the network generated from the proteomic analysis highlighted the complex interactions among its nodes and identified several proteins as central hubs of connectivity. There is increasing evidence that the functional stability of biological networks depends critically on such hubs (15). In the ARDS protein network, many of the central hubs were proteins previously implicated in the pathogenesis of acute lung injury, including TNF-α, IL-1β, LBP, and p38 MAPK (12–16). Other highly connected nodes, however, were proteins and biological compounds whose roles have not been well studied in ALI, including retinoic acid, β-estradiol, annexin A1, the S100 proteins, and the EGF receptor (EGFR).

The time-course network analysis revealed the temporally dynamic character of ARDS. Many members of the network differed significantly in expression between normal controls and Day 1 of ARDS (Figure 4). Complement proteins, antiproteases, annexin A3, S100 proteins (S100A9, S100A12), actin, and extracellular matrix proteins (basement membrane–specific heparan sulfate) were increased in the BALF on the first day of ARDS compared with normal BALF. Conversely, several proteins, including surfactant protein-A, annexin A1, fibrinogen, and fatty acid–binding protein, were decreased on Day 1 of ARDS. These changes in the ARDS BALF likely reflect perturbations in innate immune, oxidative, and apoptotic pathways, as well as cell and extracellular matrix breakdown during the acute phase of lung injury. In contrast to the marked changes in protein expression between normal and Day 1 ARDS BALF, the differences between Days 1 and 3 of ARDS were less dramatic (Figure 4).
Nevertheless, many proteins changed in abundance between these time points. Some proteins, such as complement C3 and peroxiredoxin 2, showed major differences, whereas others, including superoxide dismutase, complement C4, histidine-rich glycoprotein, and protein disulfide isomerase A3, showed modest but detectable changes. These changes likely reflect shifts in innate immune and oxidant pathways in the early stages of lung injury. The network expression profile at Day 3 versus Day 7 of ARDS also revealed several proteins that changed significantly (Figure 4), including annexin A3 (decreased), surfactant protein-A (increased), and actin (decreased), among others. These changes likely reflect regeneration of the lung epithelium, decreased cellular injury and turnover, and the resolution of lung injury. However, the changes in most proteins in the network between Days 3 and 7 of ARDS were modest. The changes in protein expression on Days 1, 3, and 7 of ARDS compared with normal control subjects are summarized in Figure 6.
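The idea of merging subnetworks into one interactome and ranking its nodes as hubs of connectivity can be sketched with an undirected graph, using each node's degree (its number of distinct interaction partners) as the connectivity measure. IPA's own network-generation algorithm is proprietary and is not reproduced here; degree is a simple stand-in, and the node names are hypothetical placeholders rather than proteins from Figure 4.

```python
from collections import defaultdict


def build_network(*edge_lists):
    """Merge one or more subnetworks, given as (node, node) interaction pairs,
    into a single undirected interactome. Nodes shared between subnetworks
    join them; duplicate edges and self-loops are ignored."""
    adjacency = defaultdict(set)
    for edges in edge_lists:
        for a, b in edges:
            if a != b:
                adjacency[a].add(b)
                adjacency[b].add(a)
    return dict(adjacency)


def top_hubs(adjacency, k=3):
    """Return the k most highly connected nodes (ties broken alphabetically)."""
    return sorted(adjacency, key=lambda node: (-len(adjacency[node]), node))[:k]


if __name__ == "__main__":
    # Two hypothetical subnetworks that share node "D".
    sub1 = [("H", "A"), ("H", "B"), ("H", "C"), ("H", "D")]
    sub2 = [("G", "D"), ("G", "E")]
    net = build_network(sub1, sub2)
    for node in top_hubs(net, k=3):
        print(node, len(net[node]))
```

Degree rewards nodes with many documented partners, which is one reason the well-studied-hub bias discussed later in this article applies to any literature-derived network.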
In this study we used a quantitative electrophoresis-based proteomics method (DIGE) to profile the changes in the expression of 37 proteins in the BALF of healthy subjects and patients on Days 1, 3, and 7 after the onset of ARDS. Bioinformatics and computational analysis showed important differences in the BALF proteins between normal subjects and patients with ARDS that reflect selective enrichment of specific biological processes, such as immunity, defense responses, and inflammatory responses to pathogens and injury. These findings are consistent with the current paradigm of the mechanisms leading to the development of lung injury, and suggest that critical events in the progression of ARDS can be captured by proteomic analysis of the BALF.
We examined temporal changes in the BALF proteome during the progression of ARDS by generating protein networks that mapped detailed protein interactions. This approach has several key advantages. By incorporating proteins and biological compounds that were not identified in our proteomics experiments but are known to be highly connected with other members of the lung interactome, the network analysis correctly predicted a number of critical mediators in the pathogenesis of ARDS. These proteins included TNF-α, IL-1β, and IL-6, among others. The unprompted addition of these proteins, which have previously identified roles in ARDS, to the in silico network has two important implications. First, it supports the interpretation that the protein networks generated in this study accurately model the proteome of the injured lung. Second, it suggests that nodes in the network that have not been well described in the pathogenesis of ARDS may also represent novel mediators of disease.
β-Estradiol is an example of one such novel mediator. In the network, β-estradiol had multiple predicted interactions with proteins such as TNF-α.
In addition to identifying novel mediators of disease, the network analysis revealed important global characteristics of the protein–protein relationships in the lungs of patients with ARDS. The most striking feature was the complex topology of the protein interactions. Nested in this complexity were two features with important implications for the development of molecular strategies to treat patients with ARDS. First, many of the protein pathways were redundant. For example, the p38 MAPK pathway has been implicated in the exaggerated inflammation seen in ARDS (20). The network analysis showed that p38 MAPK may exert part of its biological effect by increasing the expression of TNF-α.
In addition to redundancy, another feature of the protein interactome was that many protein nodes, such as TNF-α, were involved in multiple biological processes.

The complex characteristics of the protein network in the lungs of patients with ARDS illustrate the challenges of identifying effective therapeutic targets in human diseases. The redundancy of protein pathways suggests that interventions aimed at a single protein may not effectively disrupt pathogenic mechanisms. Furthermore, the involvement of single proteins in numerous biological processes highlights the difficulty of designing focused therapeutic strategies that do not disrupt other important homeostatic pathways. In light of these challenges, network analysis offers a way to examine these biological redundancies and overlapping functions and to identify rational targets for molecular-based interventions that maximize clinical efficacy and minimize adverse effects. This may be achieved by targeting nodes within the network that are not themselves highly connected but that critically influence densely connected hubs in the interactome.
Based on this premise, the protein network in this study predicts that the S100 proteins (S100A8 and S100A9) might be appropriate candidates for targeted intervention in ARDS, because these proteins interact with several key modulators of ARDS, including TNF-α.

This study has several potential limitations. First, only a subset of the lung proteome was examined using DIGE proteomic analysis. We therefore did not attempt to generate a comprehensive molecular model of ARDS, because building such a model from a limited subset of the lung proteome could lead to over-interpretation of the data and erroneous conclusions. Instead, we focused the computational analysis on identifying novel molecular connections of the identified proteins that could have a role in the pathophysiology of lung injury. In doing so, the computational analysis confirmed several previous findings in the BALF of patients with ARDS and suggested new pathway interactions. Future studies using alternative methods, such as mass spectrometry–based proteomic analysis (shotgun proteomics), will permit analysis of a larger portion of the BALF proteome and the development of more comprehensive molecular models of ALI and ARDS. In addition, new methods are needed that provide precise measurements of temporal changes in global protein networks.

Although the number of BALF samples analyzed in each experimental group was modest, two important features of this study suggest that the findings are robust. First, we used a proteomics approach (DIGE) that incorporates a pooled standard and minimizes the experimental variability associated with gel-to-gel comparisons. The DIGE method therefore increased the likelihood that the changes in protein expression between the experimental groups were true biological differences. Second, we analyzed only protein spots that were detected in more than 50% of the samples in every experimental group.
Thus, the proteome that we examined in this study is likely to be consistently present in the lungs of patients with ARDS. Because the control group consisted of healthy volunteers who were not mechanically ventilated, some of the differences in the proteomic analysis may be attributable to critical illness or mechanical ventilation rather than to lung injury. However, previous studies of mechanically ventilated patients without lung injury showed that their lungs do not contain increased amounts of total protein, neutrophils, or cytokines (34). This supports the interpretation that the changes seen in the proteomic analysis are likely due to the development of ARDS. Another limitation of this study is that the network analysis depends on current knowledge of protein interactions and is therefore subject to any inaccuracies and shortcomings in the biomedical literature. Furthermore, the identification of densely connected nodes is likely to be inherently biased, because many of these hubs have been well studied and their interactions are extensively documented. Therefore, predictions based on the protein network generated in this study will require further validation of their biological relevance.

Despite these limitations, the computational proteomics approach used in this study confirms previous findings about the role of individual proteins in ARDS and provides insights into novel proteins and mediators that are likely to be involved in the progression of lung injury. In addition, the analysis shows the dynamic nature of differential protein expression in the lungs of patients with ARDS, the complexity of protein–protein interactions, and the utility of network analysis in identifying potential targets for molecular-based interventions.
Future studies that examine the specific changes in these networks over time are likely to provide important insight about the key biological processes that interact to produce lung injury.
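The targeting idea raised in the discussion, nodes that are not highly connected themselves but influence densely connected hubs, can be made concrete with a simple screening rule: flag low-degree nodes that are adjacent to several hubs. This is a hypothetical heuristic offered for illustration, not the procedure used in this study. The degree cutoffs and node names are assumptions, and degree alone does not measure how strongly a node influences a hub.

```python
from collections import defaultdict


def adjacency_from_edges(edges):
    """Undirected adjacency sets from (node, node) interaction pairs."""
    adj = defaultdict(set)
    for a, b in edges:
        if a != b:
            adj[a].add(b)
            adj[b].add(a)
    return adj


def peripheral_hub_regulators(adj, hub_min_degree=4, max_degree=2,
                              min_hub_links=2):
    """Hypothetical screen: nodes with at most max_degree partners that touch
    at least min_hub_links hubs (nodes with >= hub_min_degree partners).
    Sorted by number of hub links (descending), then by name."""
    hubs = {n for n, partners in adj.items() if len(partners) >= hub_min_degree}
    candidates = []
    for node, partners in adj.items():
        if node in hubs or len(partners) > max_degree:
            continue
        hub_links = len(partners & hubs)
        if hub_links >= min_hub_links:
            candidates.append((node, hub_links))
    return sorted(candidates, key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Hypothetical network: two hubs (H1, H2) bridged by a low-degree node "x".
    edges = ([("H1", n) for n in ("a", "b", "c", "x")]
             + [("H2", n) for n in ("d", "e", "f", "x")])
    print(peripheral_hub_regulators(adjacency_from_edges(edges)))
```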
* These authors contributed equally to this study.
Funding: NIH/NHLBI HL073996, NIH/NHLBI HL090298.
This article has an online supplement, which is accessible from this issue's table of contents at www.atsjournals.org.
Originally Published in Press as DOI: 10.1164/rccm.200712-1895OC on July 24, 2008
Conflict of Interest Statement: None of the authors has a financial relationship with a commercial entity that has an interest in the subject of this manuscript.
Received in original form December 26, 2007; accepted in final form July 21, 2008