Published ahead of print on November 6, 2003, doi:10.1164/rccm.200204-347OC
© 2004 American Thoracic Society
Repeatability of Spirometry in 18,000 Adult Patients
Department of Medicine, University of Arizona, Tucson, Arizona; and Department of Radiology, University of Iowa Hospitals and Clinics, Iowa City, Iowa
Correspondence and requests for reprints should be addressed to Paul Enright, M.D., 4460 East Ina Road, Tucson, AZ 85718. E-mail: lungguy{at}aol.com
The objective of this study was to determine the limits for repeatability of FEV1, FVC, and PEF during spirometry test sessions in adult outpatients. A retrospective chart review of 18,000 consecutive patients, aged 20 to 90 years, referred to a large outpatient pulmonary function laboratory for testing was performed. Measurements included the differences between the highest and second-highest FVC (dFVC), FEV1 (dFEV1), and PEF (dPEF), from prebronchodilator spirometry, and anthropometric factors. Ninety percent of the patients were able to reproduce FEV1 within 120 ml (6.1%), FVC within 150 ml (5.3%), and PEF within 0.80 L/second (12%). Patient characteristics, such as sex, age, height, smoking status, and FEV1 (% predicted), had very little effect on repeatability, explaining only 2 to 4% of the variation in repeatability (expressed in milliliters). We conclude that the ability of patients to meet or exceed spirometry repeatability goals does not depend on patient characteristics when testing is performed by experienced personnel. The current American Thoracic Society repeatability goal of 200 ml for FEV1 and FVC may be too lenient.
Key Words: spirometry; quality control; airway obstruction; lung-function standards
It is important to try to achieve good repeatability (reproducibility) of FEV1 and FVC within a spirometry test session because poor repeatability reduces confidence in the interpretation of bronchodilator or methacholine response (short-term) and long-term (month-to-month or year-to-year) changes in lung function. For this reason, the American Thoracic Society (ATS) recommends that procedural sources of variation in lung function be minimized (1). On the other hand, trying to meet overly stringent criteria for within-test session repeatability is frustrating for technicians and patients alike, and lengthens test time. Guidelines for the performance of spirometry have been based on published analyses of thousands of spirometry tests done by experienced technicians to ensure that the repeatability goals are practical. The current ATS criteria for satisfactory spirometry (2) are based on the 90th percentile values obtained during a large, population-based survey, the Third National Health and Nutrition Examination Survey (3). The ATS standard states that after three acceptable maneuvers are performed, the two largest FEV1s should match within 200 ml (the difference between the highest and second-highest FEV1 within the prebronchodilator spirometry test session [dFEV1]), and the two largest FVCs should also match within 200 ml (the difference between the highest and second-highest FVC within the prebronchodilator spirometry test session [dFVC]). If these two repeatability criteria are not both met, then additional maneuvers should be performed in an attempt to achieve better repeatability (up to a total of eight maneuvers). The current European Thoracic Society criteria for satisfactory spirometry (4) state that the chosen (largest) values should not exceed the next highest by more than 5% or 100 ml, whichever is greater.
The European Thoracic Society criteria were chosen to agree with the earlier 1987 ATS spirometry standards (5). The European Thoracic Society also suggests that "as a useful criterion" the two highest PEFs match within 10% (the difference between the highest and second-highest PEF within the prebronchodilator spirometry test session [dPEF]). Most spirometry testing is done for patients with pulmonary problems and not for general population samples, so the current study was performed to determine the within-test session spirometry repeatability of adult patients (many sick and elderly), who were referred to a large pulmonary function laboratory for testing.
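The two sets of criteria described above can be compared with a short sketch. It is only an illustration: the function names are hypothetical, "within" is assumed to include the limit itself, and the 5% in the European criterion is assumed to be taken relative to the largest value.

```python
def top_two_difference(values_ml):
    """Return (highest value, highest minus second-highest value)."""
    if len(values_ml) < 2:
        raise ValueError("at least two maneuvers are required")
    ordered = sorted(values_ml, reverse=True)
    return ordered[0], ordered[0] - ordered[1]


def meets_ats_goal(fev1_ml, fvc_ml, goal_ml=200):
    """ATS goal as described in the text: dFEV1 and dFVC both within 200 ml.

    Assumption: a difference exactly equal to the goal counts as a match.
    """
    _, d_fev1 = top_two_difference(fev1_ml)
    _, d_fvc = top_two_difference(fvc_ml)
    return d_fev1 <= goal_ml and d_fvc <= goal_ml


def meets_european_goal(values_ml):
    """European criterion as described: the largest value exceeds the next
    highest by no more than 5% or 100 ml, whichever is greater.

    Assumption: the 5% is computed from the largest value.
    """
    best, diff = top_two_difference(values_ml)
    return diff <= max(0.05 * best, 100)


# Invented example values in milliliters, not data from the study.
print(meets_ats_goal([3100, 3020, 2950], [4200, 4050, 3900]))  # True
print(meets_european_goal([1500, 1380, 1300]))                 # False
```

In the second example the two best FEV1s differ by 120 ml, which satisfies a 200 ml goal but not the "5% or 100 ml" rule, showing how the choice of criterion matters most for patients with small lung volumes.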
Since 1992, all results from pulmonary function testing done at the outpatient pulmonary function laboratory of the Mayo Clinic in Rochester, MN have been stored in a database. The analyses for this manuscript were performed on a subset of that database, obtained by a search for all spirometry results performed on patients aged 20 to 90 years from January 1, 1996 to December 31, 2000. Spirometry was performed by 16 technicians with considerable full-time pulmonary function testing experience, who were certified by the American Association of Respiratory Care. Nine spirometers were used, all of the same model (Medical Graphics 1085 desktop system; St Paul, MN). This spirometer uses a screen pneumotach (Hans Rudolph model 3813; Kansas City, MO), which is heated to 37°C to prevent condensation forming on the screen. The pneumotachometer is located at the end of a 3-foot-long breathing tube. Spirometry test procedures conformed to 1995 ATS standards. The accuracy of each spirometer was checked daily, using a 3-liter calibration syringe, emptied at three different speeds. Testing was allowed on a given spirometer only after the measured volume errors were less than 3%. The patients were vigorously coached by a technician to perform forced expirations until three acceptable maneuvers (or a maximum of eight) were recorded. Acceptability was determined according to ATS recommendations. A color VGA monitor displayed a real-time tracing of exhaled flow versus volume, which was viewed by the subject and technician. Descriptive statistics were calculated for results from the three best maneuvers from each prebronchodilator test session, and for dFVC, dFEV1, and dPEF. To identify significant influences on performance, multiple regression analyses were performed on each performance-quality variable. 
The initial regressions included continuous independent variables for age, height, and percent-predicted FEV1 as well as dichotomous independent variables for male sex and smoking status (ever vs. never).
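The daily accuracy check described in the methods (a 3-liter syringe emptied at three speeds, with testing allowed only when volume errors are below 3%) can be sketched as follows. The function names and the example readings are hypothetical; the sketch assumes the error is expressed as a percentage of the nominal syringe volume.

```python
def volume_error_percent(measured_l, syringe_l=3.0):
    """Absolute volume error as a percentage of the nominal syringe volume."""
    return abs(measured_l - syringe_l) / syringe_l * 100.0


def calibration_passes(measured_volumes_l, syringe_l=3.0, max_error_pct=3.0):
    """True only if every syringe stroke reads within the allowed error."""
    return all(
        volume_error_percent(v, syringe_l) < max_error_pct
        for v in measured_volumes_l
    )


# Invented readings for slow, medium, and fast syringe strokes (liters).
print(calibration_passes([2.98, 3.02, 3.05]))  # True
print(calibration_passes([2.98, 3.02, 3.12]))  # False (3.12 L is a 4% error)
```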
Of all 18,526 patients tested, 52% were male, 54.8% reported ever smoking, 22.8% reported episodes of shortness of breath with wheezing during the prior 12 months, 11.0% gave a history of physician-diagnosed asthma, and 9.5% reported having emphysema or chronic obstructive pulmonary disease. The ranges of height, age, and impairment of lung function were very wide (see Table 1).
The ability of men and women to obtain reproducible FEV1s, FVCs, and peak flows was almost identical when expressed as percent difference. Only 5% of the patients were unable to match their highest FEV1 within 150 ml (see Table 2). Half of them matched the FEV1s within 58 ml (3%) and FVCs within 72 ml (2.6%). Ninety-five percent of the patients were also able to match their highest peak flow within one liter per second.
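Percentile limits of this kind (the median, 90th, and 95th percentile of dFEV1) can be derived from a list of per-session differences. The sketch below uses the nearest-rank definition of a percentile, which is an assumption; the text does not say which definition was used, and the data are invented for illustration.

```python
import math


def percentile_nearest_rank(values, pct):
    """Nearest-rank percentile: the smallest observed value such that at
    least pct percent of the observations are less than or equal to it."""
    if not values:
        raise ValueError("no observations")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


# Invented dFEV1 values (ml) for ten test sessions, not data from the study.
d_fev1_ml = [15, 30, 42, 55, 61, 70, 88, 95, 130, 240]

print(percentile_nearest_rank(d_fev1_ml, 50))  # 61
print(percentile_nearest_rank(d_fev1_ml, 90))  # 130
print(percentile_nearest_rank(d_fev1_ml, 95))  # 240
```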
Age did not affect repeatability in any of the models. Sex, height, smoking status, and the degree of lung function impairment (percent-predicted FEV1) explained less than 10% of the variance in the ability of patients to reproduce their spirometry values (see the R² values in Table 3). The use of absolute goals for repeatability (ml or L/second) results in less dependence on individual patient characteristics (a smaller R² for the model) when compared with the use of percentage goals (such as a 5% match). Older patients were able to obtain repeatability for all three variables (regardless of how the match was expressed) as often as younger patients. Shorter patients and those with worse baseline lung function were less able to obtain reproducible maneuvers when expressed as a percent difference.
In general, the spirometry quality of the adult patients in our study compared favorably with results reported by other investigators. We believe that relatively stringent within-test session repeatability goals for the key spirometry variables FEV1 and FVC are important because they improve confidence in the diagnostic discrimination of the test and the confidence with which changes in lung function may be interpreted by the physician who ordered the test. We believe that the "gold standard" by which repeatability goals are determined should be based on the ability of highly experienced technicians, using optimal quality instruments, to meet the goals in 9 of every 10 patients when testing a wide variety of patients referred for pulmonary function testing. Most of the technicians who tested the patients in this study had performed spirometry as their primary responsibility for many years, and the spirometers were very well maintained. The very large number of patients, with widely varying age, height, smoking status, and degree of lung disease, makes the results of this study highly generalizable.
Absolute versus Percent Match
Nine of every 10 patients could match their highest FEV1 within 120 ml (see Table 2, the 90th percentile for dFEV1), within 150 ml for FVC, and within 0.80 L/second for peak flow. These results suggest that the 1995 ATS recommendations (2) for spirometry repeatability goals (dFEV1 and dFVC < 200 ml) are too lenient for adults. The Third National Health and Nutrition Examination Survey was a large study of a general population sample (3), with a mean dFEV1 of 56 ml for women and 65 ml for men, almost identical to the results from our patients. The investigators recommended a goal of less than 200 ml, which was met by 94% of their subjects, regardless of height or sex. Our results are almost identical to this study of mostly healthy individuals, but we suggest that the goal be set so that 90% of patients will pass when tested by an experienced technician, instead of 95%.
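The argument above turns on what fraction of patients pass a given goal. A minimal sketch of that calculation follows; the function name and the data are hypothetical, and a difference equal to the goal is assumed to count as a pass.

```python
def pass_rate(differences_ml, goal_ml):
    """Fraction of test sessions whose top-two difference is within the goal."""
    if not differences_ml:
        raise ValueError("no test sessions")
    passed = sum(1 for d in differences_ml if d <= goal_ml)
    return passed / len(differences_ml)


# Invented dFEV1 values (ml) for ten test sessions, not data from the study.
sessions_ml = [15, 30, 42, 55, 61, 70, 88, 95, 130, 240]

print(pass_rate(sessions_ml, 200))  # 0.9
print(pass_rate(sessions_ml, 100))  # 0.8
```

Tightening the goal lowers the pass rate, so the choice between a 90% and a 95% target pass rate directly determines how stringent the goal in milliliters will be.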
Can Sick Patients Do as Well?
A very large study of young men in Norway found that 9.5% failed the dFEV1 goals of less than 5% or less than 100 ml, and this failure was more common in shorter men, older men, never smokers, and those with respiratory symptoms (7). A population-based study of 416 young adults (8) noted that young men with bronchial hyperresponsiveness and young women who were cigarette smokers were more likely than others to fail a dFEV1 goal of less than 100 ml. About 12% of their subjects failed that goal (almost the same percentage as the patients in our study). Our model for dFEV1 (milliliters) shows that smokers were only slightly more likely to have a larger dFEV1 (by an average of 3.5 ml when compared with never smokers). A study of 864 employees (mean age 45) found that workers with lower lung function, as well as older workers, were less able to obtain FEV1s matching within 5% (9), but there was no association with smoking status. We found no significant effect of age on any of the repeatability variables, but current and former smokers had a slightly higher mean dFEV1 and a slightly higher mean dFVC.
Can Children and the Elderly Do as Well?
Despite the association between cognitive function and the ability to perform reproducible spirometry in elderly persons (11), a population-based study of 5,201 persons aged 65 years and older, tested by 16 different technicians, demonstrated that only 3% could not match FEV1s within 200 ml (12). However, repeatability was not as good for a subsequent population-based study of elderly black subjects (13).
The need for PEF repeatability criteria during spirometry tests is less important than for FEV1 and FVC because PEF and time to peak flow are used primarily as indices of the effort of patients to blast out the air quickly during the first 100 to 200 milliseconds of the maneuver and not for detecting airway obstruction. The 1995 ATS recommendations stated that "Although there may be some benefit from using PEF repeatability to improve subject effort, no specific [PEF] repeatability criterion is recommended at this time." Coates and coworkers found that in children with asthma or cystic fibrosis tested by hospital-based pulmonary function laboratory technicians, dFEV1 was much more closely associated with variation in the FVC due to variation in the depth of inhalation preceding the FVC maneuver than with variation in peak flow (dPEF) (14). However, if PEF repeatability is used, a dPEF goal of either 0.8 L/second or 16% is reasonable because the influence of patient characteristics is nearly the same for the absolute difference and the percent difference.
Repeatability criteria should first be used by technicians as a goal while performing spirometry, with additional maneuvers (up to a total of eight) performed in an attempt to obtain a good match between the highest and second-highest values obtained. After the second and subsequent maneuvers, the spirometer's software should display the repeatability of acceptable maneuvers (or a quality grade, from A to F, based on acceptability and repeatability). After testing is completed, the degree of repeatability is valuable for grading the quality of the test session, as done in research studies (15-17) and recommended for office spirometry (18).
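The way a technician (or the spirometer's software) might apply a repeatability goal during a session can be sketched as a simple decision rule: continue until three acceptable maneuvers meet the goal, or stop at eight maneuvers. The function names, the tuple layout, and the 200 ml default are illustrative assumptions, not the logic of any particular spirometer.

```python
def top_two_gap(values_ml):
    """Difference between the highest and second-highest value."""
    ordered = sorted(values_ml, reverse=True)
    return ordered[0] - ordered[1]


def next_step(maneuvers, goal_ml=200, min_acceptable=3, max_maneuvers=8):
    """Decide what to do after the latest maneuver.

    maneuvers: list of (fev1_ml, fvc_ml, acceptable) tuples, one per attempt.
    Returns "done", "continue", or "stop".
    """
    acceptable = [m for m in maneuvers if m[2]]
    if len(acceptable) >= min_acceptable:
        d_fev1 = top_two_gap([m[0] for m in acceptable])
        d_fvc = top_two_gap([m[1] for m in acceptable])
        if d_fev1 <= goal_ml and d_fvc <= goal_ml:
            return "done"      # repeatability goal met
    if len(maneuvers) >= max_maneuvers:
        return "stop"          # maximum number of maneuvers reached
    return "continue"          # coach the patient through another maneuver


# Invented session: three acceptable maneuvers, dFEV1 = 80 ml, dFVC = 150 ml.
session = [(3100, 4200, True), (3020, 4050, True), (2950, 3900, True)]
print(next_step(session))  # done
```

Passing a tighter `goal_ml` (for example 120 or 150) shows how a more stringent goal would change how often additional maneuvers are requested.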
Will These Goals Be More Difficult with Other Spirometer Models?
Conclusions
These analyses could not have been performed without the pulmonary function database system established with considerable foresight by Drs. Joseph Rodarte and Robert Hyatt. The high quality of the tests is due to the experience, patience, and skills of the pulmonary function technicians of the Mayo Clinic in Rochester, Minnesota.
Conflict of Interest Statement: P.E. has no declared conflict of interest; K.C.B. has no declared conflict of interest; D.L.S. has no declared conflict of interest.
Received in original form April 19, 2002; accepted in final form November 5, 2003