AJRCCM
Am. J. Respir. Crit. Care Med., Volume 161, Number 1, January 2000, 85-90

Interobserver Variation in Interpreting Chest Radiographs for the Diagnosis of Acute Respiratory Distress Syndrome

MAUREEN O. MEADE, RICHARD J. COOK, GORDON H. GUYATT, RYAN GROLL, JOHN R. KACHURA, MICHEL BEDARD, DEBORAH J. COOK, ARTHUR S. SLUTSKY, and THOMAS E. STEWART

Department of Medicine, McMaster University Faculty of Health Sciences, Hamilton, Ontario, Canada; Department of Statistics and Actuarial Science, University of Waterloo, Ontario, Canada; Department of Clinical Epidemiology and Biostatistics, McMaster University Faculty of Health Sciences, Hamilton, Ontario, Canada; Department of Medicine, Mount Sinai Hospital, and Adult Critical Care Medicine Program, University of Toronto, Toronto, Ontario, Canada; and Department of Medical Imaging, The Toronto Hospital, University of Toronto, Toronto, Ontario, Canada

    ABSTRACT

To measure the reliability of chest radiographic diagnosis of acute respiratory distress syndrome (ARDS), we conducted an observer agreement study in which two of eight intensivists and a radiologist, blinded to one another's interpretations, reviewed 778 radiographs from 99 critically ill patients. One intensivist and the radiologist participated in pilot training. Raters made a global rating of the presence of ARDS on the basis of diffuse bilateral infiltrates. We assessed interobserver agreement in a pairwise fashion. For rater pairings in which one rater had not participated in the consensus process, we found moderate levels of raw (0.68 to 0.80), chance-corrected (κ, 0.38 to 0.55), and chance-independent (Φ, 0.53 to 0.75) agreement. The pair of raters who participated in consensus training achieved substantial to almost perfect raw (0.88 to 0.94), chance-corrected (κ, 0.72 to 0.88), and chance-independent (Φ, 0.74 to 0.89) agreement. We conclude that intensivists without formal consensus training can achieve moderate levels of agreement. Consensus training is necessary to achieve the substantial or almost perfect levels of agreement optimal for the conduct of clinical trials. Meade MO, Cook RJ, Guyatt GH, Groll R, Kachura JR, Bedard M, Cook DJ, Slutsky AS, Stewart TE. Interobserver variation in interpreting chest radiographs for the diagnosis of acute respiratory distress syndrome.

    INTRODUCTION

Acute respiratory distress syndrome (ARDS) is an advanced form of acute lung injury characterized by diffuse pathophysiological changes of increased capillary permeability, inflammation, and tissue repair. The clinical syndrome includes a triad of hypoxemia, decreased lung compliance, and chest radiographic abnormalities. The high incidence of ARDS in patients with predisposing clinical conditions such as sepsis, gastric aspiration, and multiple trauma [up to 35% (1)], and the associated mortality of 20 to 74% (2, 3), have made ARDS a major concern for clinicians and investigators.

ARDS lies at the more severe end of the continuum of acute lung injury, and the threshold separating it from milder injury is somewhat arbitrary. Because of this arbitrariness, defining ARDS and identifying the syndrome in individual patients present a challenge. This problem of definition has led to considerable difficulties in comparing epidemiologic data on ARDS incidence (4), difficulties that will be resolved only through multinational collaboration (5). Variability in defining and identifying ARDS is also an important concern in clinical trials that use ARDS as an inclusion criterion or as a study outcome.

Whatever definition one chooses, the diagnosis of ARDS depends in part on identifying characteristic radiographic abnormalities. To be consistently useful, the interpretation of a radiological investigation must be reliable. Reliability, highly desirable in the delivery of clinical care, becomes crucial in clinical studies that rely on radiologic findings. Lack of reproducibility will inflate required sample sizes and can lead to false-negative trial results. The limited interobserver agreement that investigators have usually found when examining radiographic interpretation (6) suggests that both clinicians and scientists should attend to this issue.

We conducted a multicenter randomized trial of a pressure- and volume-limited ventilation strategy (10) in patients at high risk for ARDS. Our study demonstrated similar outcomes for the alternative strategies that we examined. When planning this study, we considered using ARDS as an inclusion criterion and as an outcome. We rejected ARDS as an inclusion criterion because we ultimately decided that our intervention might prevent the development of ARDS. We rejected ARDS as an outcome because the two ventilation strategies used different mean airway pressures, which could bias the chest radiographic and oxygenation criteria for ARDS. We were nevertheless interested in how often ARDS occurred and in how reliably we could measure that frequency. We therefore examined the extent to which intensive care physicians and a radiologist could agree on the radiologic diagnosis of ARDS.

    METHODS

Source of Chest Radiographs

We used films from patients enrolled in our randomized trial at seven participating hospitals in Toronto (Ontario, Canada). Adult patients who met the following criteria were eligible for the trial: intubated less than 24 h; peak airway pressure ≤ 30 cm H2O; hypoxemia, defined as PaO2/FIO2 < 250 on positive end-expiratory pressure (PEEP) = 5 cm H2O; and one or more known risk factors for ARDS. The trial excluded patients with the following characteristics: anticipated duration of ICU admission < 48 h; very unlikely survival, defined by premorbid or acute life expectancy; heart failure; acute asthmatic exacerbation; high risk of cardiac arrhythmia or ischemia; intracranial abnormalities associated with intracranial hypertension; or pregnancy.

Because the films were difficult to obtain, we omitted all films from the Ottawa center; from the Toronto hospitals, we omitted films that film library personnel could not locate. Study patients had chest radiographs taken at least once daily, and we chose the first film from each consecutive day of participation. We included 841 films from 99 patients; individual patients provided from 1 to 32 films (median, 7).

Raters

Three raters interpreted each radiograph. Seven study intensivist-investigators, one from each participating hospital, provided the first interpretation by reading the films taken at their own hospital (Rater 1). After the randomized trial was completed, two other raters, an intensivist (M.M., Rater 2) and a radiologist (J.K., Rater 3), interpreted each film independently and without knowledge of the other interpretations.

Preparation of Chest Radiographs

Site investigators reviewed films at the time they were taken. To prepare the films for review by Raters 2 and 3, we shuffled them in batches of approximately 150 as they arrived at the study office, numbered them in their new sequence, removed each film from the associated envelope, and covered the identification label with an opaque sticker bearing the study film number. The purpose of these preparations was to minimize bias that might occur if raters reviewed serial films from a single patient in sequence.
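The shuffling and relabeling step can be sketched in Python. This is a minimal illustration, assuming films arrive as a list of identifiers and that batches are cut in arrival order; the function name, the fixed batch size of 150 (the text says "approximately 150"), and the optional seed are illustrative assumptions rather than the study's actual procedure.

```python
import random

def prepare_batches(film_ids, batch_size=150, seed=None):
    """Shuffle films within arrival-order batches and assign sequential study numbers.

    film_ids: hypothetical film identifiers, in the order they reached the study office.
    Returns (study_number, film_id) pairs; the study number is what the opaque
    sticker covering the identification label would carry.
    """
    rng = random.Random(seed)
    labelled = []
    next_number = 1
    for start in range(0, len(film_ids), batch_size):
        batch = list(film_ids[start:start + batch_size])
        rng.shuffle(batch)  # break up serial films from the same patient
        for film in batch:
            labelled.append((next_number, film))
            next_number += 1
    return labelled

films = [f"patient{p:02d}-day{d:02d}" for p in range(1, 5) for d in range(1, 6)]
print(prepare_batches(films, batch_size=10, seed=1)[:3])
```

Shuffling only within batches mirrors the constraint that films were processed as they arrived; a single global shuffle would require waiting for every film.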

Review Process

Site investigators recorded their interpretations of study films on the data forms of the randomized trial. They had no study-specific training (no study-specific definitions or standardized techniques) in judging the presence of ARDS-related infiltrates. Raters 2 and 3 began by independently interpreting 63 films from 11 patients; we refer to these films as the "training set." Raters 2 and 3 then reviewed the training set films again, this time together, discussing the reasons for disagreement and refining the standards and rules they would apply when interpretation was difficult. We refer to this process as the "standardized review." Raters 2 and 3 then completed their interpretation of the full sample of 841 films, including the 63 training set films. The training set films were interspersed at random among the others, so the raters were unlikely to recognize them.

Radiograph Interpretation

Each interpreter made two ratings in accordance with two definitions for ARDS that are commonly used in clinical trials. One rating, based on the definition of ARDS provided by an American-European Consensus Conference (AECC) statement (5), involved deciding whether a chest radiograph had diffuse bilateral infiltrates. Although the original AECC statement specifies "bilateral infiltrates" in the list of criteria defining ARDS, we specified "diffuse bilateral infiltrates" for two reasons. First, the AECC statement included a discussion of ARDS as a diffuse process that is therefore associated with diffuse infiltrates. Second, we were unwilling to interpret films with discrete bilateral subsegmental infiltrates as being consistent with ARDS. The refined criteria arising from the standardization process included conventional definitions from the radiology literature (11), defining infiltrate as "any ill-defined opacity in the lung that neither destroys nor displaces the gross morphology of the lung and is presumed to represent a pathophysiological process." The refined criteria also included defining diffuse as widespread and continuous, by which the reviewers meant involving at least 80% of a lung field and not excluding specific lung segments.

The other rating, based on the Lung Injury Severity Score (4), involved deciding how many quadrants contained an area of consolidation. The consensus standardization led to a definition of consolidation as "a homogeneous opacity in the lung characterized by little or no loss of volume, by effacement of pulmonary blood vessels, and sometimes by the presence of an air bronchogram," and excluded definite effusions and masses. To distinguish between the upper and lower quadrants of a lung field, Raters 2 and 3 agreed to use the horizontal plane of the ipsilateral pulmonary artery at its midpoint at the hilum. When this landmark was obscured, they used the contralateral pulmonary artery and, when both were obscured, they used the midpoint of the height of the lung fields.

Statistical Methods

We were interested in the level of agreement in each of the three possible pairings among Raters 1, 2, and 3. We refer to the pairing of Raters 2 and 3 as the "standardized pair" to distinguish this pairing from other pairings in which one rater (the site intensivist/investigator) had not participated in the consensus process.

We measured agreement among raters by addressing two questions. The first question, "Is this chest radiograph consistent with ARDS?", is relevant to clinical practice and to the use of ARDS as an inclusion criterion in clinical trials. To address this question, we would like to use as many films as possible, and it would be convenient if we could treat the 841 films in this study as if they came from 841 different patients. However, we have serial films from 99 patients, so we cannot treat our observations as if they came from different patients (in technical language, as if they were independent). If we did assume independence, the results of standard κ-type analyses could be seriously distorted. We return to this issue below.

The second question, "Did this patient develop ARDS?", applies to the use of ARDS as a study outcome. Measuring agreement among raters in this setting requires reviewing all films for each patient and developing a criterion for when a series of films is consistent with ARDS. We tested two possible criteria: (1) any film consistent with ARDS, and (2) films on two consecutive days consistent with ARDS.
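The patient-level criteria can be sketched in Python, together with the Day 1 (first film) criterion that the analysis also uses. This minimal version assumes each patient's ratings are a list of booleans with exactly one entry per consecutive study day (the first film of each day) and no gaps; the function names are hypothetical.

```python
def first_film_positive(daily):
    """Day 1 criterion: the first film is read as consistent with ARDS."""
    return bool(daily) and bool(daily[0])

def any_film_positive(daily):
    """Criterion 1: at least one film is read as consistent with ARDS."""
    return any(daily)

def two_consecutive_days_positive(daily):
    """Criterion 2: films on two consecutive days are read as consistent with ARDS."""
    return any(today and tomorrow for today, tomorrow in zip(daily, daily[1:]))

patient = [False, True, False, True]  # hypothetical daily readings
print(first_film_positive(patient), any_film_positive(patient),
      two_consecutive_days_positive(patient))
```

The stricter consecutive-day rule discards isolated positive readings, which is why it yields fewer positive patients than the any-film rule.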

Because seven intensivists contributed to "Rater 1," we began by comparing the odds ratios of agreement between each of the seven and Raters 2 and 3 with respect to the presence of diffuse bilateral infiltrates on Day 1, on two consecutive days, or on any day. Testing did not reject the null hypothesis that the seven intensivists achieved the same levels of agreement with Raters 2 and 3, so we pooled them as a single Rater 1. Testing for heterogeneity of odds ratios generated by different observers, and pooling across estimates if no heterogeneity is found, is standard statistical methodology (12).
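The heterogeneity-then-pooling step can be sketched in Python. The text cites Breslow and Day (12) but does not say which test was used; this sketch substitutes Woolf's inverse-variance test, a simpler relative of the Breslow–Day test, and adds 0.5 to every cell of any table containing a zero. The function name and the input layout (one (a, b, c, d) agreement table per site intensivist) are illustrative assumptions.

```python
import math

def woolf_homogeneity(tables):
    """Woolf's test for homogeneity of odds ratios across 2x2 agreement tables.

    tables: list of (a, b, c, d) counts, one table per site intensivist.
    Returns (pooled_odds_ratio, q_statistic, degrees_of_freedom).
    """
    log_ors, weights = [], []
    for a, b, c, d in tables:
        if 0 in (a, b, c, d):
            a, b, c, d = (x + 0.5 for x in (a, b, c, d))  # Haldane correction
        log_ors.append(math.log(a * d / (b * c)))
        weights.append(1.0 / (1 / a + 1 / b + 1 / c + 1 / d))  # inverse variance
    pooled = sum(w * y for w, y in zip(weights, log_ors)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, log_ors))
    return math.exp(pooled), q, len(tables) - 1

print(woolf_homogeneity([(10, 2, 3, 9), (12, 3, 2, 8), (6, 1, 2, 7)]))
```

Under homogeneity, Q is approximately chi-square with `df` degrees of freedom, so a p value can be obtained with, for example, `scipy.stats.chi2.sf(q, df)`; only when Q is unremarkable would the pooled odds ratio be used.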

For ratings of the presence or absence of diffuse bilateral infiltrates, we calculated raw agreement, chance-corrected agreement (using κ), and chance-independent agreement (using Φ). Table 1 presents the formulas for these measures based on a 2 × 2 table. The rationale for using three measures is as follows. Raw agreement, the proportion of films in which both raters conclude that diffuse infiltrates were, or were not, present, can be misleading. In particular, if two raters both make a high (or both a low) proportion of positive ratings, raw agreement will be high even if the raters are just guessing; that is, their agreement will be high simply by chance. High chance agreement tends to occur when both observers believe that the prevalence of the clinical entity of interest in the population under study is high, or both believe it is low.

                              

TABLE 1

CALCULATIONS OF AGREEMENT*
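The three binary measures can be sketched in Python using their standard definitions. This minimal version assumes the cell layout implied by OR = ad/bc: a = both raters positive, b = rater A positive and rater B negative, c = rater A negative and rater B positive, d = both negative. The function name is illustrative.

```python
import math

def agreement_measures(a, b, c, d):
    """Raw agreement, Cohen's kappa, and Phi for a 2x2 agreement table."""
    n = a + b + c + d
    raw = (a + d) / n
    # Chance agreement: product of the two raters' marginal proportions.
    expected = ((a + b) * (a + c) + (c + d) * (b + d)) / n ** 2
    kappa = (raw - expected) / (1 - expected)  # undefined if expected == 1
    root_ad, root_bc = math.sqrt(a * d), math.sqrt(b * c)
    phi = (root_ad - root_bc) / (root_ad + root_bc)  # undefined if ad == bc == 0
    return raw, kappa, phi

# Two raters who each call 90% of 100 films positive, independently (guessing).
print(agreement_measures(81, 9, 9, 1))
```

The guessing example returns `(0.82, 0.0, 0.0)`: raw agreement looks respectable, while κ and Φ both show no agreement beyond what independent guessing produces.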

Because of this problem with raw agreement, we calculated chance-corrected agreement using the κ statistic (13). Although κ avoids spuriously high levels of agreement due to chance, it has its own limitations, which have led to sharp criticism (14). One of the major difficulties with κ is that when the proportion of positive ratings is extreme, the possible agreement above chance is small, and even moderate values of κ are difficult to achieve. Thus, if the same raters are studied in a variety of settings, κ will decrease as the proportion of positive ratings becomes more extreme, even if the way the raters interpret films does not change.

To address this limitation, we also calculated chance-independent agreement using Φ, a relatively new approach to assessing observer agreement (15). One begins by estimating the odds ratio from a 2 × 2 table displaying the agreement between two observers, such as the one presented in Table 1. The odds ratio is given by OR = ad/bc. Here it is simply the odds of a positive classification by rater B when rater A gives a positive classification, divided by the odds of a positive classification by rater B when rater A gives a negative classification. As such, it provides a natural measure of agreement. This measure can be made more easily interpretable by converting it to a scale that runs from -1.0 (representing extreme disagreement) to 1.0 (representing extreme agreement). The Φ statistic makes this conversion by the following formula:
$$\Phi=\frac{\sqrt{OR}-1}{\sqrt{OR}+1}=\frac{\sqrt{ad}-\sqrt{bc}}{\sqrt{ad}+\sqrt{bc}} \qquad (1)$$
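The two forms of equation 1 agree term by term. Substituting OR = ad/bc and multiplying the numerator and denominator of the first form by √(bc) gives the second form. Solving the first form for OR gives the inverse mapping, which lets a reported Φ be read back as an odds ratio of agreement:

$$
\Phi=\frac{\sqrt{ad/bc}-1}{\sqrt{ad/bc}+1}
=\frac{\sqrt{ad}-\sqrt{bc}}{\sqrt{ad}+\sqrt{bc}},
\qquad
OR=\left(\frac{1+\Phi}{1-\Phi}\right)^{2}
$$

For example, Φ = 0 corresponds to OR = 1 (the two raters' classifications are independent), and Φ = 0.6 corresponds to OR = (1.6/0.4)² = 16.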

When both margins are 0.5 (that is, both raters conclude that 50% of the patients are positive and 50% negative for the trait of interest), Φ is equal to κ.

Φ has three important advantages over existing approaches. First, it is independent of the level of chance agreement. Thus, investigators could expect to find similar levels of Φ whether the distribution of results is 50% positive and 50% negative, or 90% positive and 10% negative. This is not true of κ, a chance-corrected index of agreement.

Second, Φ allows modeling approaches that κ does not. For example, in the present data set, because agreement may not be independent across multiple films from a single patient, κ would not allow us to take full advantage of the 841 films that our raters evaluated. Φ allowed us to adjust for intrapatient correlation in assessments of serial radiographs, and thus to make more efficient use of the data and generate narrower confidence intervals around the level of agreement. Third, Φ also allowed us to test whether differences in agreement between pairings were significant, an option not available with κ.
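The first advantage, and the κ limitation described earlier, can be illustrated with two hypothetical tables that share the same odds ratio (ad/bc = 16) but differ in how often both raters call films positive. The compact `kappa` and `phi` helpers below are illustrative and assume the a, b, c, d cell layout implied by OR = ad/bc.

```python
import math

def kappa(a, b, c, d):
    n = a + b + c + d
    p_obs = (a + d) / n
    p_exp = ((a + b) * (a + c) + (c + d) * (b + d)) / n ** 2
    return (p_obs - p_exp) / (1 - p_exp)

def phi(a, b, c, d):
    return (math.sqrt(a * d) - math.sqrt(b * c)) / (math.sqrt(a * d) + math.sqrt(b * c))

# Hypothetical tables with identical odds ratios of agreement (ad/bc = 16).
balanced = (40, 10, 10, 40)  # each rater calls 50% of films positive
skewed = (80, 5, 5, 5)       # each rater calls about 89% of films positive
for name, table in [("balanced", balanced), ("skewed", skewed)]:
    print(name, round(kappa(*table), 3), round(phi(*table), 3))
```

This prints `balanced 0.6 0.6` and then `skewed 0.441 0.6`: κ falls as the positive proportion becomes extreme, while Φ, a function of the odds ratio alone, is unchanged. The balanced table also shows Φ = κ when both margins are 0.5.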

For ratings of the presence or absence of diffuse bilateral infiltrates, we compared not only the agreement between the three pairings of raters, but also the agreement between the standardized pairing (Raters 2 and 3) on the training set before and after the standardized review. Because of the possibility that viewing the training set twice may have influenced the standardized raters' interpretation of those films, we omitted them from the primary comparisons. Thus, the primary comparisons of the three possible pairings of raters included only 778 films.

As mentioned above, we could not use κ to calculate agreement on the presence or absence of diffuse bilateral infiltrates using all films, because multiple films from the same patient are not independent. Using the Φ statistic, we were able to assess agreement across all 778 films, applying maximum likelihood estimation based on the noncentral hypergeometric distribution to generate estimates that account for the correlation among multiple films from the same patient (the APPENDIX describes the approach to maximum likelihood estimation).
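The APPENDIX is not reproduced here, so the following Python sketch shows only one standard way to use the noncentral hypergeometric distribution for clustered agreement data; it may differ from the authors' method. It assumes each patient forms a stratum with its own 2 × 2 table of film readings, that a single odds ratio of agreement is common to all patients, and that the odds ratio is estimated by conditional maximum likelihood given each patient's margins (which absorbs patient-level differences in how often films are positive). All function names are hypothetical, and no confidence interval is computed.

```python
import math

def _support_and_log_weights(m1, n1, total):
    """Possible values of cell a and their log central-hypergeometric weights,
    given rater A's positives (m1), rater B's positives (n1) and the film count."""
    support = range(max(0, m1 + n1 - total), min(m1, n1) + 1)
    log_w = [math.log(math.comb(m1, x)) + math.log(math.comb(total - m1, n1 - x))
             for x in support]
    return list(support), log_w

def _expected_a(support, log_w, log_psi):
    """Mean of a under Fisher's noncentral hypergeometric with odds ratio exp(log_psi)."""
    terms = [lw + x * log_psi for x, lw in zip(support, log_w)]
    top = max(terms)  # subtract the maximum before exponentiating to avoid overflow
    probs = [math.exp(t - top) for t in terms]
    return sum(x * p for x, p in zip(support, probs)) / sum(probs)

def conditional_phi(strata, lo=-20.0, hi=20.0, iters=200):
    """Phi from the conditional MLE of an odds ratio shared by all patients.

    strata: one (a, b, c, d) table per patient, built from that patient's films.
    """
    informative = []
    for a, b, c, d in strata:
        support, log_w = _support_and_log_weights(a + b, a + c, a + b + c + d)
        if len(support) > 1:  # margins that allow only one table carry no information
            informative.append((a, support, log_w))
    if not informative:
        raise ValueError("no patient contributes information about the odds ratio")

    def score(log_psi):
        # Derivative of the conditional log-likelihood: observed minus expected a.
        return sum(a - _expected_a(s, lw, log_psi) for a, s, lw in informative)

    for _ in range(iters):  # the score decreases in log_psi, so bisection finds its root
        mid = (lo + hi) / 2
        if score(mid) > 0:
            lo = mid
        else:
            hi = mid
    root_or = math.exp((lo + hi) / 4)  # square root of the estimated odds ratio
    return (root_or - 1) / (root_or + 1)

print(conditional_phi([(3, 1, 1, 3), (2, 0, 1, 4), (5, 1, 0, 2)]))
```

Patients whose margins permit only one table (for example, a single film, or both raters calling every film positive) drop out of the conditional likelihood. An estimate that ends at a search bound signals an infinite odds ratio, as happens when no patient has any disagreement.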

We also conducted significance tests on the agreement between the three pairings of raters for the three ratings of the presence of radiographic ARDS (diffuse bilateral infiltrates present on the first film, on any film, and on two consecutive films), and on the agreement between the consensus raters before and after training. We interpreted both κ and Φ values as follows: less than 0, poor agreement; 0 to 0.2, slight; 0.2 to 0.4, fair; 0.4 to 0.6, moderate; 0.6 to 0.8, substantial; and 0.8 to 1.0, almost perfect (16).
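The descriptive scale can be written as a small Python helper. Because the bands are given only as "0.2 to 0.4", "0.4 to 0.6", and so on, assigning exact boundary values to the lower band is an assumption, as is the function name.

```python
def landis_koch_label(value):
    """Map a kappa or Phi value to the Landis and Koch descriptive scale.

    Boundary values (e.g., exactly 0.4) are assigned to the lower band here.
    """
    if value < 0:
        return "poor"
    for upper, label in [(0.2, "slight"), (0.4, "fair"), (0.6, "moderate"), (0.8, "substantial")]:
        if value <= upper:
            return label
    return "almost perfect"

print(landis_koch_label(0.74))  # e.g. a weighted kappa of 0.74
```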

Methods for calculating chance-independent agreement with multiple categories (in this case, multiple quadrants) remain undeveloped. Therefore, to assess agreement between the three pairings of raters on the rating of consolidation in 0 to 4 quadrants, we relied on weighted κ with quadratic weights, which allow for partial agreement (17). As explained above, lack of independence prevented us from using all films to estimate κ, which is why we used the new methodology of chance-independent agreement for the binary ratings. Because we had no equivalent methodology for multiple quadrants, we used only the first film from each patient to assess agreement on the number of quadrants involved.
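Weighted κ with quadratic weights for ratings of 0 to 4 quadrants can be sketched in Python, following the agreement-weight form of Cohen's statistic (17). The function name is illustrative. Like the analysis above, it treats each pair of ratings as independent, so it should be applied to one film per patient.

```python
def quadratic_weighted_kappa(r1, r2, n_categories=5):
    """Cohen's weighted kappa with quadratic agreement weights.

    r1, r2: paired ordinal ratings coded 0..n_categories-1 (here, 0 to 4 quadrants).
    Weight for categories i and j: 1 - (i - j)**2 / (k - 1)**2.
    Undefined when chance-expected agreement equals 1 (both raters use one category).
    """
    k, n = n_categories, len(r1)
    if n == 0 or len(r2) != n:
        raise ValueError("need two equal-length, non-empty rating lists")
    weight = [[1 - (i - j) ** 2 / (k - 1) ** 2 for j in range(k)] for i in range(k)]
    p1 = [r1.count(i) / n for i in range(k)]  # marginal proportions, rater 1
    p2 = [r2.count(j) / n for j in range(k)]  # marginal proportions, rater 2
    observed = sum(weight[x][y] for x, y in zip(r1, r2)) / n
    expected = sum(p1[i] * p2[j] * weight[i][j] for i in range(k) for j in range(k))
    return (observed - expected) / (1 - expected)

print(quadratic_weighted_kappa([0, 1, 2], [0, 2, 2]))
```

For the example ratings, the single one-quadrant disagreement earns partial credit of 1 − 1/16, and the statistic comes out at 0.8 (up to floating-point rounding).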

    RESULTS

The patients contributed from 1 to 33 films each to the agreement process, with a mean of 8.9. The seven intensivists who contributed to the "Rater 1" comparisons each evaluated films from between 3 and 27 patients. The proportions of patients judged by Raters 1, 2, and 3, respectively, to have diffuse bilateral infiltrates were 0.54, 0.27, and 0.30 on Day 1; 0.70, 0.60, and 0.61 on any day; and 0.64, 0.40, and 0.41 on two consecutive days. For the seven intensivists who contributed to the Rater 1 ratings, the proportions of patients with diffuse bilateral infiltrates present on Day 1, on any day, or on two consecutive days ranged from 0 to 0.78, 0.20 to 0.91, and 0.20 to 0.83, respectively. The rater with the proportions of 0 and 0.20 reviewed films from only 5 patients.

Tables 2A to 2C present the agreement across the three pairings of raters for the three approaches to judging whether bilateral infiltrates were present or absent, using raw agreement, κ, and Φ. Agreement between Raters 2 and 3 was substantial to almost perfect for all three criteria by all three measures. Raw agreement for the other two pairings ranged from 0.68 to 0.80; their agreement was moderate for all three criteria using κ, and moderate to substantial using Φ.

We were interested in whether the consistently higher agreement in the standardized pairing could be a chance phenomenon. Although methods for testing the statistical significance of the difference between two κ values in this situation have not been developed, the methodology of chance-independent agreement allows this comparison. Despite the consistency of the trend toward greater agreement in the standardized pairing, the difference in agreement approached conventional levels of significance for only one of the three ratings of bilateral infiltrates (p values of 0.83, 0.05, and 0.12 for first film positive, any film positive, and two consecutive films positive, comparing Raters 2 and 3 with Raters 1 and 3; and 0.24, 0.91, and 0.95 comparing Raters 2 and 3 with Raters 1 and 2).

This lack of significance could reflect limited power: we may not have had enough films to exclude chance as an explanation. Using all of the films could ameliorate this problem, but doing so requires adjustment for any lack of independence in ratings of multiple films from the same individual. Including all films in the evaluation of diffuse infiltrates and adjusting for lack of independence, Φ was 0.69 (95% CI, 0.60-0.77) for Raters 2 and 3, 0.60 (95% CI, 0.44-0.72) for Raters 1 and 2, and 0.56 (95% CI, 0.41-0.69) for Raters 1 and 3. The differences in these levels of Φ were highly significant (p < 0.001 comparing Raters 2 and 3 with either Raters 1 and 2 or Raters 1 and 3).

Table 3 addresses the hypothesis that the superior agreement between Raters 2 and 3, relative to the other pairs, resulted from the consensus process that Raters 2 and 3 undertook in reviewing the 63 training set films together. Table 3 presents the level of agreement on the presence of bilateral infiltrates before and after the consensus process. Although the numbers are small, there is a strong trend toward higher agreement after the consensus process. Here, the small data set leads to empty cells (cells with 0 observations), which make meaningful calculation of Φ difficult: an empty cell drives the odds ratio to 0 or infinity, forcing Φ to -1 or 1, or leaves it undefined.

                              

TABLE 3

CHANCE-CORRECTED AGREEMENT BETWEEN RATERS 2 AND 3 ON THE TRAINING SET OF 63 CHEST RADIOGRAPHS BEFORE AND AFTER STANDARDIZED REVIEW*

The weighted κ for the number of quadrants involved in the first film of each patient was as follows for the three pairings: Raters 2 and 3, 0.74 (95% CI, 0.63-0.85); Raters 1 and 2, 0.47 (95% CI, 0.31-0.63); and Raters 1 and 3, 0.54 (95% CI, 0.42-0.67).

    DISCUSSION

We found moderate to good agreement on the presence of diffuse bilateral infiltrates suggestive of ARDS, irrespective of which of several possible criteria we used. This level of agreement is high in comparison with most clinical ratings, and with many of the radiographic interpretations, that clinicians use regularly in practice. For instance, the intensivists in our study demonstrated considerably better agreement than those in a prior study of the interpretation of chest radiographs of patients with ARDS: Beards and coworkers found a κ of only 0.05 for intensivists' rating of the number of quadrants in which consolidation was present (18).

In the clinical trial setting, however, agreement that is less than excellent compromises precision of measurement, and may result in misleading findings, large sample size requirements, or both. For instance, consider a trial enrolling patients with established ARDS, in which the presence of bilateral infiltrates would constitute one criterion for inclusion. The site intensivist and Rater 2 (the study intensivist) agreed on 68% of the ratings of the presence of bilateral infiltrates in the first film from each patient (Table 2A). This limited level of agreement would lead to appreciable differences in the patients enrolled in the study. Similarly, if a study considered ARDS as an outcome and the presence of diffuse infiltrates at any time while the patients stayed in the ICU contributed to the diagnosis, intensivists' ratings would agree only 78% of the time (Table 2A). This limited level of agreement could contribute substantial random error to the study results.
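The sample-size cost of unreliable ratings can be sketched under an assumption the study itself does not make: treat a rater's reading as nondifferential misclassification of a hypothetical true ARDS status, with assumed sensitivity (Se) and specificity (Sp). An observed rate is then Se·p + (1 − Sp)(1 − p), so a true difference in rates shrinks by the factor (Se + Sp − 1). The rates, the 0.85 accuracy values, and the function names below are hypothetical and are not taken from the trial.

```python
import math
from statistics import NormalDist

def observed_rate(p_true, sens, spec):
    """ARDS rate seen when a true rate is read with imperfect sensitivity/specificity."""
    return sens * p_true + (1 - spec) * (1 - p_true)

def n_per_group(p1, p2, alpha=0.05, power=0.80):
    """Normal-approximation sample size per group for comparing two proportions."""
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    return math.ceil(z ** 2 * (p1 * (1 - p1) + p2 * (1 - p2)) / (p1 - p2) ** 2)

# Hypothetical trial: true ARDS rates of 40% vs 25%, with the outcome read
# at an assumed sensitivity and specificity of 0.85 in both arms.
p_control, p_treated = 0.40, 0.25
print(n_per_group(p_control, p_treated))
print(n_per_group(observed_rate(p_control, 0.85, 0.85),
                  observed_rate(p_treated, 0.85, 0.85)))
```

With these assumed values, the sketch prints 150 and then 331. Reading error of this size roughly doubles the per-group sample size for a two-sided α of 0.05 and 80% power, because the observed difference in rates falls from 0.15 to 0.105.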

Fortunately, there is a partial solution to this problem. Development of standardized criteria and reporting forms; pilot testing; and training of raters through review of disagreements, discussion of the reasons, and agreement about how to deal with difficult judgments are accepted methods of maximizing agreement in a wide variety of clinical ratings. These methods have resulted in acceptable levels of agreement in interpretation of pediatric chest radiographs in a multicenter study (19). We have provided empirical evidence of the magnitude of improved agreement that clinical trialists studying radiological findings in critically ill patients can achieve by modest pilot testing and consensus development. This process decreased the disagreement on the presence of infiltrates on the first film of each patient to 10% and on any film to 8% (Table 2A).

Strengths of this study include the careful masking of patient identity on the radiographs and the blinding of raters to one another's interpretations; the participation of both intensivists and a radiologist; the relatively large number of films read and the resulting relatively narrow confidence intervals; and our rigorous approach to data analysis. The study would have been stronger still had we had the resources to include more radiologists and intensivists and to conduct a more systematic evaluation of a training period that allows raters to develop consensus standards. The study intensivist who read every film (Rater 2) was a critical care fellow at the time of the study. Stage of training might have influenced the degree of improvement with training, and including additional readers at varying stages of training would have allowed us to explore this issue.

Inferences from our study may be limited by the lack of detail and explicitness in the current definitions of ARDS (5, 11). Available guidelines for reading and interpreting chest radiographs in patients receiving mechanical ventilation do not solve this problem, as they too offer only general approaches rather than explicit criteria (20). As a result, we developed our own detailed criteria; our criteria, however, do not have the benefit of a wide consensus. Ongoing work is likely to ameliorate or solve this problem in the future.

In reporting our results, we have relied on an innovative approach to measuring agreement with binary ratings. Like traditional measures of agreement, the Φ statistic takes values from -1.0 to 1.0. As described in METHODS, Φ has three important advantages over existing approaches. First, it is independent of the level of chance agreement. Second, Φ allows full use of information from nonindependent observations (in this case, multiple films from each patient). Third, Φ allows testing of whether variations in agreement between different pairings of the same raters are significant. These options are not available with κ. We believe these advantages may ultimately lead to Φ replacing κ as the standard measure of agreement for binary clinical ratings. Until we gain further experience with the new method, however, we suggest that investigators report both the standard κ and the Φ statistic.

In summary, we have demonstrated that intensivists can achieve moderate levels of agreement in the radiologic diagnosis of ARDS without specific training. Further consensus training can increase the level of agreement to substantial or almost perfect. Clinicians involved in clinical trials should seriously consider pilot training and assessment of the level of agreement in making clinical and radiographic ratings to enhance the power and accuracy of their studies.

                              

TABLE 2

AGREEMENT BETWEEN RATERS ON ASSESSING THE PRESENCE OF DIFFUSE BILATERAL INFILTRATES*

    Footnotes

Correspondence and requests for reprints should be addressed to Thomas E. Stewart, M.D., Department of Medicine, Mount Sinai Hospital, Suite 427, 600 University Avenue, Toronto, ON, Canada M5G 1X5. E-mail: tom.stewart{at}utoronto.ca

(Received in original form September 2, 1998 and in revised form July 7, 1999).

Acknowledgments: Supported in part by the Physicians' Services Incorporated Foundation of Ontario, the Ontario Thoracic Society, and Bayer Corporation.
    References

1. Garber, B. G., P. C. Hebert, J. D. Yelle, R. V. Hodder, and J. McGowan. 1996. Adult respiratory distress syndrome: a systematic overview of incidence and risk factors. Crit. Care Med. 24: 687-695.

2. Miller, R. S., L. D. Nelson, S. M. Di Russo, E. J. Rutherford, K. Safcsak, and J. A. Morris. 1992. High-level positive end-expiratory pressure management in trauma-associated adult respiratory distress syndrome. J. Trauma 33: 284-291.

3. Bell, R. C., J. J. Coalson, J. D. Smith, and W. G. Johanson. 1983. Multiple organ system failure and infection in adult respiratory distress syndrome. Ann. Intern. Med. 99: 293-298.

4. Murray, J., M. Matthay, J. Luce, and M. Flick. 1988. An expanded definition of the adult respiratory distress syndrome. Am. Rev. Respir. Dis. 138: 720-723.

5. Bernard, G. R., A. Artigas, K. L. Brigham, J. Carlet, K. Falke, L. Hudson, M. Lamy, J. R. LeGall, A. Morris, and R. Spragg. 1994. Report of the American-European consensus conference on acute respiratory distress syndrome: definitions, mechanisms, relevant outcomes, and clinical trial coordination. Am. J. Respir. Crit. Care Med. 149: 818-824.

6. Tudor, G. R., D. Finlay, and N. Taub. 1997. An assessment of inter-observer agreement and accuracy when reporting plain radiographs. Clin. Radiol. 52: 235-238.

7. Guyatt, G. H., M. Lefcoe, S. D. Walter, L. E. Griffith, D. King, C. Zylak, N. Hickey, and G. Carrier. 1995. Interobserver variation in computerized tomographic diagnosis of intrathoracic lymphadenopathy in patients with potentially resectable lung cancer. Chest 107: 116-119.

8. Maguire, W. M., P. G. Herman, A. Kahn, M. Simon-Gabor, V. Cruz, and T. M. Eacobacci. 1994. Interobserver agreement using computed radiography in the adult intensive care unit. Acad. Radiol. 1: 10-14.

9. Bloomfield, F. H., R. L. Teele, M. Voss, D. B. Knight, and J. E. Harding. 1999. Inter- and intra-observer variability in the assessment of atelectasis and consolidation in neonatal chest radiographs. Pediatr. Radiol. 29: 459-462.

10. Stewart, T. E., M. O. Meade, D. J. Cook, J. T. Granton, R. V. Hodder, S. E. Lapinsky, C. D. Mazer, R. F. McLean, E. S. Rogovein, B. D. Schouten, T. R. J. Todd, and A. S. Slutsky. 1998. Evaluation of a ventilation strategy to prevent barotrauma in patients at high risk for acute respiratory distress syndrome. N. Engl. J. Med. 338: 355-361.

11. Fraser, R. G., J. A. Peter Pare, P. D. Pare, R. S. Fraser, and G. P. Genereux. 1988. Diagnosis of Diseases of the Chest, 3rd ed. W.B. Saunders, Philadelphia. xiii-xx.

12. Breslow, N. E., and N. E. Day. 1980. Statistical Methods in Cancer Research, Vol. 1: The Analysis of Case-control Studies. International Agency for Research on Cancer.

13. Fleiss, J. L. 1971. Measuring nominal scale agreement among many raters. Psychol. Bull. 76: 378-382.

14. Maclure, M., and W. C. Willett. 1987. Misinterpretation and misuse of the kappa statistic. Am. J. Epidemiol. 126: 161-169.

15. Cook, R. J., and V. T. Farewell. 1995. Conditional inference for subject-specific and marginal agreement: two families of agreement measures. Can. J. Stat. 23: 333-344.

16. Landis, J. R., and G. G. Koch. 1977. The measurement of observer agreement for categorical data. Biometrics 33: 159-174.

17. Cohen, J. 1968. Weighted kappa: nominal scale agreement with provision for scaled disagreement or partial credit. Psychol. Bull. 70: 213-220.

18. Beards, S. C., A. Jackson, L. Hunt, A. Wood, C. M. Frerk, G. Brear, J. D. Edwards, and P. Nightingale. 1995. Interobserver variation in the chest radiograph component of the lung injury score. Anaesthesia 50: 928-932.

19. Cleveland, R. H., M. Schluchter, B. P. Wood, W. E. Berdon, M. I. Boechat, K. A. Easley, M. Meziane, R. B. Mellins, K. I. Norton, E. Singleton, and L. Trautwein. 1997. Chest radiograph data acquisition and quality assurance in multicentre studies. Pediatr. Radiol. 27: 880-887.

20. Winer-Muram, H. T., S. A. Rubin, M. Miniati, and J. V. Ellis. 1992. Guidelines for reading and interpreting chest radiographs in patients receiving mechanical ventilation. Chest 102(Suppl.): 565S-570S.
    APPENDIX

Let $y_{ijk} = 1$ if rater $j$ classifies subject $i$ as having diffuse bilateral infiltrates on day $k$, and let $y_{ijk} = 0$ otherwise, where $k = 1, 2, \ldots, k_i$; $j = 1, 2, 3$; and $i = 1, 2, \ldots, 99$. The classifications for a given subject are dependent over time, so $y_{ijk}$ and $y_{ijl}$ ($k \neq l$) are correlated. Furthermore, we are interested in relating the classifications from different raters, which is best achieved through a regression model. We can relate assessments by Raters 1 and 2 through the following random effects model:

$$\operatorname{logit}(\pi_{i1k}) = \alpha_i + \beta\, y_{i2k},$$

where $\pi_{i1k} = \Pr(y_{i1k} = 1 \mid \alpha_i, y_{i2k})$, the $\alpha_i \sim N(\alpha, \sigma^2)$ are iid, and $\beta$ is the log odds ratio reflecting the association between Raters 1 and 2. It is preferable, however, to condition on $y_{i1\cdot} = \sum_k y_{i1k}$, which is sufficient for $\alpha_i$, to obtain a noncentral hypergeometric distribution that is a function of $\beta$ alone (McCullagh, P., and J. A. Nelder. 1989. Generalized Linear Models. Chapman & Hall, London). This can be done for every patient, and taking the product of the resulting likelihoods gives an overall likelihood that is a function of $\beta$ alone. This likelihood is the same as that arising from the conditional analysis of the stratified $2 \times 2 \times K$ tables of case-control studies (one $2 \times 2$ table per patient), so it can be maximized using EGRET, SAS, S-PLUS, and many other statistical packages.
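To make the conditional likelihood concrete, the following Python sketch fits $\beta$ for one rater pair. It is an illustration under stated assumptions, not the software used in the study: each patient's data are assumed to be a list of daily (Rater 1, Rater 2) readings coded 0/1, and the function names (`stratum`, `fit_common_log_or`) and the Newton-Raphson fitting routine are our own hypothetical choices. For each patient, $T_i = \sum_k y_{i1k} y_{i2k}$ (days on which both raters read infiltrates) is, given $y_{i1\cdot}$ and Rater 2's readings, Fisher noncentral hypergeometric with odds ratio $e^{\beta}$. Patients whose margins allow only one value of $T_i$ carry no information about $\beta$ and are dropped.

```python
import math


def _logsumexp(xs):
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def stratum(days):
    """Reduce one patient's daily (y1, y2) readings (0/1 for Raters 1 and 2)
    to the observed T = days both raters read infiltrates, plus the support
    of T given the margins, with log weights log[C(n1, u) * C(n0, s - u)]."""
    n1 = sum(y2 for _, y2 in days)        # days Rater 2 read positive
    n0 = len(days) - n1                   # days Rater 2 read negative
    s = sum(y1 for y1, _ in days)         # y_{i1.}, sufficient for alpha_i
    t = sum(y1 * y2 for y1, y2 in days)   # observed concordant positives
    lo, hi = max(0, s - n0), min(s, n1)
    support = [(u, math.log(math.comb(n1, u) * math.comb(n0, s - u)))
               for u in range(lo, hi + 1)]
    return t, support


def _moments(beta, support):
    """Log normalizer, mean and variance of T under Fisher's noncentral
    hypergeometric distribution with log odds ratio beta."""
    logits = [lw + beta * u for u, lw in support]
    log_norm = _logsumexp(logits)
    probs = [math.exp(v - log_norm) for v in logits]
    mean = sum(p * u for p, (u, _) in zip(probs, support))
    var = sum(p * (u - mean) ** 2 for p, (u, _) in zip(probs, support))
    return log_norm, mean, var


def cond_loglik(beta, strata):
    """Product-over-patients conditional log-likelihood, omitting terms
    that do not involve beta."""
    return sum(beta * t - _moments(beta, sup)[0] for t, sup in strata)


def fit_common_log_or(patients, tol=1e-10, max_iter=100):
    """Conditional MLE of beta by Newton-Raphson with step halving.
    If every informative patient's T sits at the top (or bottom) of its
    support, the MLE is infinite and the returned value just keeps growing."""
    strata = [stratum(days) for days in patients]
    strata = [(t, sup) for t, sup in strata if len(sup) > 1]  # informative only
    if not strata:
        raise ValueError("no patient carries information about beta")
    beta = 0.0
    for _ in range(max_iter):
        score = info = 0.0
        for t, sup in strata:
            _, mean, var = _moments(beta, sup)
            score += t - mean     # observed minus expected concordant positives
            info += var           # Fisher information for beta
        if info == 0.0:
            break
        step = score / info
        ll_old = cond_loglik(beta, strata)
        # the log-likelihood is concave; halve the step if it would decrease
        while cond_loglik(beta + step, strata) < ll_old and abs(step) > tol:
            step /= 2
        beta += step
        if abs(step) < tol:
            break
    return beta
```

As a check on the logic, a single patient read on four days with pairs (1,1), (1,0), (0,1), (0,0) shows no association between the raters and yields $\hat\beta = 0$; a patient whose two readings are both (1,1) is uninformative and leaves the estimate unchanged.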

The effect of training is tested by adding, to the model for the log odds ratio in the conditional logistic regression, the main effect of an indicator variable $X$ that records whether the responses were obtained before or after a retraining session. For example, we may fit

$$\log(\psi_{12}) = \beta_{12} + \gamma_{12} X,$$

where $\psi_{12}$ is the odds ratio reflecting the agreement between Observers 1 and 2, and we may test $H_0\colon \gamma_{12} = 0$. If $H_0$ is not rejected, we have no evidence that the degree of agreement between Observers 1 and 2 differs before and after training. If $H_0$ is rejected, the sign and magnitude of $\gamma_{12}$ indicate whether, and by how much, the level of agreement deteriorated or improved.
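The appendix does not say which test statistic was used for $H_0\colon \gamma_{12} = 0$. One standard choice, sketched here as an assumption, is a likelihood ratio test on the conditional log-likelihood $\ell_c$, with $X$ coded 0 before and 1 after retraining (our coding); hats denote the unrestricted maximizer and the tilde the maximizer under $H_0$.

```latex
\psi_{12}^{\text{before}} = e^{\beta_{12}}, \qquad
\psi_{12}^{\text{after}} = e^{\beta_{12} + \gamma_{12}}, \qquad
\frac{\psi_{12}^{\text{after}}}{\psi_{12}^{\text{before}}} = e^{\gamma_{12}},

\Lambda = 2\left[\ell_c(\hat\beta_{12}, \hat\gamma_{12}) - \ell_c(\tilde\beta_{12}, 0)\right]
\;\overset{H_0}{\sim}\; \chi^2_1 \quad \text{(approximately)}.
```

Under this coding, $\hat\gamma_{12} > 0$ indicates improved agreement after retraining; for instance, $\hat\gamma_{12} = \ln 2$ would correspond to a doubling of the agreement odds ratio.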

Note that these more sophisticated methods were adopted to handle the dependence of each observer's assessments over time. Thus the formula $\Phi = (\sqrt{ad} - \sqrt{bc})/(\sqrt{ad} + \sqrt{bc})$ does not apply here; it does apply to all other analyses discussed in this article.
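For the single-table analyses to which the formula above applies, the following Python sketch computes raw agreement, Cohen's kappa, and $\Phi$ from a 2 × 2 table. The cell layout is our assumption, since the cells are defined outside this appendix: $a$ = both raters positive, $b$ = Rater 1 positive and Rater 2 negative, $c$ = Rater 1 negative and Rater 2 positive, $d$ = both negative. The function name `agreement_measures` is hypothetical, and the counts in the example are invented for illustration, not study data.

```python
import math


def agreement_measures(a, b, c, d):
    """Raw agreement, Cohen's kappa and Phi for a 2x2 table of two raters.

    Assumed layout: a = both positive, b = Rater 1 positive / Rater 2
    negative, c = Rater 1 negative / Rater 2 positive, d = both negative.
    Kappa is undefined if chance agreement is 1; Phi if ad = bc = 0.
    """
    n = a + b + c + d
    raw = (a + d) / n                          # observed proportion agreeing
    p1, p2 = (a + b) / n, (a + c) / n          # each rater's positive rate
    expected = p1 * p2 + (1 - p1) * (1 - p2)   # agreement expected by chance
    kappa = (raw - expected) / (1 - expected)
    root_ad, root_bc = math.sqrt(a * d), math.sqrt(b * c)
    phi = (root_ad - root_bc) / (root_ad + root_bc)
    return raw, kappa, phi


raw, kappa, phi = agreement_measures(40, 5, 10, 45)
print(round(raw, 3), round(kappa, 3), round(phi, 3))  # 0.85 0.7 0.714
```

Dividing numerator and denominator by $\sqrt{bc}$ gives $\Phi = (\sqrt{\mathrm{OR}} - 1)/(\sqrt{\mathrm{OR}} + 1)$ with $\mathrm{OR} = ad/bc$; in the example $\mathrm{OR} = 36$, so $\Phi = 5/7$. Because $\Phi$ depends only on the odds ratio, it does not change with the raters' marginal rates of positive readings, whereas kappa does.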

Copyright © 2000 American Thoracic Society