Study Duration and Measurement Frequency
ABSTRACT
Measuring the longitudinal change in FEV1 is useful for assessing the adverse effects of respiratory exposures and pulmonary diseases. Investigators seek to estimate the "true" mean FEV1 slope ($\mu_\beta$) of an infinite population. The difference between the estimated mean FEV1 slope ($\hat{\mu}_\beta$) and the true mean slope, resulting from biological variation and measurement errors, can be minimized by increasing the number of subjects (N), years of follow-up (D), or the frequency of measurements (P). We defined maximum error emax such that $P[|\hat{\mu}_\beta - \mu_\beta| \le e_{max}] = 0.95$, and thus emax is one-half the width of the 95% confidence interval for $\mu_\beta$. We computed the values of emax on the basis of actual data obtained from 160 coal miners and working nonminers who had completed 11 spirometry measurements, using recommended equipment and procedures, at 6-mo intervals over 5 yr. Individual 5-yr FEV1 slopes (ΔFEV1) were calculated by linear regression. For a range of values of N, D, and P, tables are provided for emax, the magnitude of detectable differences in ΔFEV1 between two groups, and the recommended number of subjects needed in each of two groups to reliably detect the anticipated differences in ΔFEV1. The tables provide unique guidance for investigators in selecting among various study design options.
INTRODUCTION
Longitudinal change in FEV1 over time (ΔFEV1, or slope $\beta$) is a valuable health outcome measure when assessing the adverse effects of exposures or diseases in individuals and groups. However, spirometry results are subject to many sources of measurement noise, making it difficult to detect the effects of interest (the signal) (1). Three sources of variation that affect the accuracy of estimation of the FEV1 slope have been studied by previous investigators: variations arising in the measurement process, variations within an individual between occasions, and variability between individuals. With the relatively high variability associated with short-term measurements of ΔFEV1, the detection of important declines in FEV1 is difficult with short periods of observation (2). Up to 8 yr of follow-up has been recommended to appreciate with precision the rates of decline in FEV1, an intimidating prospect for most investigators (3). The absolute difference between the estimated mean FEV1 slope ($\hat{\mu}_\beta$) and the "true" mean slope ($\mu_\beta$), expressed by the "error" term ($e = |\hat{\mu}_\beta - \mu_\beta|$), can be minimized by increasing the number of subjects (N), the years of follow-up (D), and/or the frequency of measurements (P) (4, 5).
From a longitudinal field spirometry survey, using equipment and procedures recommended by the American Thoracic Society (ATS) (6), we calculated the intra- and intersubject variability under various available combinations of study duration and measurement frequency. We defined the term maximum error (emax) by setting the probability equal to 95% that the error (e) is less than or equal to emax. That is, $P[|\hat{\mu}_\beta - \mu_\beta| \le e_{max}] = 0.95$, and accordingly, emax is one-half the width of the 95% confidence interval for $\mu_\beta$. Using the field test results, we present tables of the maximum error values (emax) for ΔFEV1, the anticipated accuracy in estimation of ΔFEV1, the magnitude of differences in ΔFEV1 between two groups that can be reliably detected, and finally the number of subjects needed in each of the two groups for detecting anticipated differences in ΔFEV1. These results should provide guidance for investigators in designing more efficient longitudinal spirometry studies.
METHODS
Subjects and Spirometry
The subject selection and the spirometry measurements have been detailed elsewhere (7). A cohort of 411 underground coal miners and working nonminers was established in our previous study. Spirometry testing was conducted with an 8-L survey spirometer with an attached microprocessor (Eagle II; Warren E. Collins, Braintree, MA). All testing was performed at the worksite, using the 1979 spirometry standards of the American Thoracic Society (6). Forced exhalation maneuvers were done in the standing position with a nose clip in place. Three to five maneuvers were obtained from each subject. Spirometry was repeated at 6-mo intervals and individual 5-yr FEV1 slopes were calculated by linear regression. In the present study, we used data obtained from 160 subjects who had completed 11 measurements in approximately 5 yr.
Estimating the Maximum Error Values
The absolute difference between the estimated mean FEV1 slope ($\hat{\mu}_\beta$) and the unknown "true" mean slope ($\mu_\beta$) gives the amount of error in the estimation. To evaluate the precision of different study design strategies for repeated measurements of spirometry, we computed the maximum error values by setting the probability equal to 95% that the error is less than or equal to emax ($P[|\hat{\mu}_\beta - \mu_\beta| \le e_{max}] = 0.95$), and thus emax is one-half the width of the 95% confidence interval (CI). Values for emax were determined for the various combinations of study duration and number of equally spaced spirometry tests that were available in the entire sample of N = 160 subjects.
Among adults, the linear model for individual lung function decline provides a good approximation to the actual shape of the decay curve over a period of 4 to 6 yr (8). Assuming a 5-yr linear relation among 11 measurements, we estimated the variation around the fitted regression line for each individual j ($\hat{\sigma}^2_j$) and the average variation around the regression line for all study subjects. We also estimated the FEV1 slope for individual j ($\hat{\beta}_j$), and obtained the average of the $\hat{\beta}_j$ values, giving the average slope ($\hat{\bar{\beta}}$) for N individuals; $\hat{\mu}_\beta = \hat{\bar{\beta}}$ is the unbiased estimator of $\mu_\beta$ (the mean of an infinite population of individuals).
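The per-subject step described above can be sketched in a few lines of Python. This is only an illustration of ordinary least-squares regression of FEV1 on time, not the SAS procedure the authors used; the function name `fit_fev1_slope` and the FEV1 values are invented for the example.

```python
def fit_fev1_slope(times_yr, fev1_l):
    """Return (slope, intercept, mse) for a simple linear regression of FEV1 on time.

    slope is in L/yr; mse is the residual mean square (variation about the line).
    """
    n = len(times_yr)
    if n != len(fev1_l) or n < 3:
        raise ValueError("need at least three paired measurements")
    t_bar = sum(times_yr) / n
    y_bar = sum(fev1_l) / n
    sxx = sum((t - t_bar) ** 2 for t in times_yr)
    sxy = sum((t - t_bar) * (y - y_bar) for t, y in zip(times_yr, fev1_l))
    slope = sxy / sxx
    intercept = y_bar - slope * t_bar
    sse = sum((y - (intercept + slope * t)) ** 2 for t, y in zip(times_yr, fev1_l))
    mse = sse / (n - 2)  # two parameters (intercept, slope) are estimated
    return slope, intercept, mse


# Hypothetical subject: 11 tests at 6-mo intervals over 5 yr (values invented).
times = [0.5 * i for i in range(11)]
fev1 = [4.20, 4.12, 4.15, 4.05, 4.08, 4.01, 3.99, 3.95, 3.97, 3.90, 3.88]
slope, intercept, mse = fit_fev1_slope(times, fev1)
print(f"slope = {slope * 1000:.1f} ml/yr, MSE = {mse:.5f} L^2")
```

Averaging the slopes returned for each subject gives the group mean slope, and averaging the per-subject MSE values gives the pooled variation about the regression lines.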
For all 160 subjects, the individual spirometry tests were numbered 1 through 11, from the first to the last test during the 5 yr of follow-up. We used all combinations of data with equal spacing between tests and at least three measurements. Within the 5-yr study, there are 17 separate combinations of D and P (see Table 2), and a total of 72 possible combinations of equally spaced tests with at least 3 measurements, that is, 6 monthly, yearly, or at the beginning, middle, and end of the study. For example, there are five available combinations of tests for D = 3 yr and P = 3 tests, including tests 1-4-7, 2-5-8, 3-6-9, 4-7-10, and 5-8-11; similarly, there are three different combinations of tests available for D = 4 and P = 9; and so forth. We computed the maximum error values using within- and between-subject variation ($\hat{\sigma}^2$ and $\hat{\sigma}^2_\beta$) obtained from the full sample of N = 160 individuals, and the corresponding study duration (D) and the number of spirometry measurements (P) for all 72 available combinations of tests. We then grouped the combinations of tests into sets (with the same D and P, but different test numbers) and report the average maximum error value emax for each set (see Table 2). For assessing the sample size effect, we computed the emax for sample sizes of N = 30, 50, and 100, using the values of $\hat{\sigma}^2$ and $\hat{\sigma}^2_\beta$ obtained from the cohort of N = 160 individuals. All the calculations for parameters of $\hat{\beta}_j$, $\hat{\sigma}^2$, $\hat{\sigma}^2_\beta$, and the maximum error values were conducted by using SAS (Cary, NC) software (9) (see Appendix for a detailed explanation of the formulas used).
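The counts quoted above (72 combinations, 17 sets of D and P) can be checked by brute-force enumeration. The following Python sketch assumes only what the text states: tests numbered 1 through 11, 6 mo apart, equal spacing, and at least three tests per combination. The function name is illustrative.

```python
from collections import Counter

N_TESTS = 11        # tests numbered 1..11
INTERVAL_YR = 0.5   # 6 mo between consecutive tests


def equally_spaced_combinations(n_tests=N_TESTS, min_tests=3):
    """List every run of equally spaced test numbers with at least min_tests tests."""
    combos = []
    for step in range(1, n_tests):
        for first in range(1, n_tests + 1):
            tests = list(range(first, n_tests + 1, step))
            # every prefix of length >= min_tests is a valid combination
            for p in range(min_tests, len(tests) + 1):
                combos.append(tuple(tests[:p]))
    return combos


combos = equally_spaced_combinations()

# Group by (duration D in years, number of tests P).
sets = Counter(
    ((c[-1] - c[0]) * INTERVAL_YR, len(c)) for c in combos
)

print(len(combos))      # 72 combinations in total
print(len(sets))        # 17 distinct (D, P) sets
print(sets[(3.0, 3)])   # 5 combinations for D = 3 yr, P = 3
print(sets[(4.0, 9)])   # 3 combinations for D = 4 yr, P = 9
```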
Estimating the Detectable Difference in ΔFEV1
Suppose we want to detect a difference of $\Delta$ units between the average rates of change in two groups. Using Schlesselman's formula 14 (5), $\{\hat{\sigma}^2_\beta + 12(P - 1)\hat{\sigma}^2/[D^2 P(P + 1)]\}/N = \Delta^2/[2(Z_\alpha + Z_\beta)^2]$, we are able to estimate the magnitude of $\Delta$ under the following assumptions. Suppose we want to detect the difference at the significance level of $\alpha$ = 0.05 ($Z_\alpha$ = 1.64), using a one-sided test of significance, and $\beta$ = 0.20 (80% power, $Z_\beta$ = 0.84). We calculated the anticipated detectable differences in ΔFEV1 by substituting the values of P and D, and the corresponding values of $\hat{\sigma}^2$ and $\hat{\sigma}^2_\beta$ presented in Table 2, for a fixed sample size of N = 30, 50, 100, and 160 for each group, respectively. (For a two-sided test of significance, $Z_\alpha$ should be replaced by $Z_{\alpha/2}$, and $Z_{\alpha/2}$ = 1.96.)
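Solving the formula above for the detectable difference gives $\Delta = (Z_\alpha + Z_\beta)\sqrt{2V}$, where V is the left-hand side. A minimal Python sketch follows. The Z values are those quoted in the text; the variance inputs in the example are placeholders (a within-subject SD of 120 ml, as attributed to Berry in the Discussion, and an assumed between-subject slope SD of 30 ml/yr), not the Table 2 estimates, and the function names are illustrative.

```python
import math

Z_ALPHA = 1.64  # one-sided alpha = 0.05, as quoted in the text
Z_BETA = 0.84   # 80% power, as quoted in the text


def mean_slope_variance(sigma2_between, sigma2_within, duration_yr, n_tests, n_subjects):
    """Left-hand side of the formula: variance of the group mean slope."""
    slope_term = 12 * (n_tests - 1) * sigma2_within / (
        duration_yr ** 2 * n_tests * (n_tests + 1)
    )
    return (sigma2_between + slope_term) / n_subjects


def detectable_difference(sigma2_between, sigma2_within, duration_yr, n_tests,
                          n_subjects, z_alpha=Z_ALPHA, z_beta=Z_BETA):
    """Smallest difference in mean slope detectable with n_subjects per group."""
    v = mean_slope_variance(sigma2_between, sigma2_within,
                            duration_yr, n_tests, n_subjects)
    return (z_alpha + z_beta) * math.sqrt(2 * v)


# Placeholder variances (ml^2 and (ml/yr)^2); NOT the paper's Table 2 values.
SIGMA2_WITHIN = 120.0 ** 2
SIGMA2_BETWEEN = 30.0 ** 2

for n in (30, 50, 100, 160):
    delta = detectable_difference(SIGMA2_BETWEEN, SIGMA2_WITHIN, 3.0, 7, n)
    print(f"N = {n:3d} per group: detectable difference = {delta:.1f} ml/yr")
```

For a two-sided test, pass `z_alpha=1.96` instead of the default.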
Estimating the Number of Subjects Needed
On the basis of the above formula (Schlesselman's formula 14 [5]), we calculated the anticipated number of subjects required in each group to detect a difference in average ΔFEV1 between two groups for any fixed $\Delta$ units. The sample size estimation was made by using the corresponding values of $\hat{\sigma}^2$ and $\hat{\sigma}^2_\beta$ obtained from the full sample of N = 160 for various available values of D and P, at $\alpha$ = 0.05 and 80% power, using a one-sided test of significance.
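Rearranging Schlesselman's formula 14 for N gives $N = 2(Z_\alpha + Z_\beta)^2\{\hat{\sigma}^2_\beta + 12(P - 1)\hat{\sigma}^2/[D^2 P(P + 1)]\}/\Delta^2$. The sketch below rounds this up to a whole number of subjects. As before, the variance inputs in the example are placeholders rather than the study's estimates, so the printed numbers will not reproduce Table 5; the function name is illustrative.

```python
import math

Z_ALPHA = 1.64  # one-sided alpha = 0.05, as quoted in the text
Z_BETA = 0.84   # 80% power, as quoted in the text


def subjects_per_group(sigma2_between, sigma2_within, duration_yr, n_tests,
                       delta, z_alpha=Z_ALPHA, z_beta=Z_BETA):
    """Subjects needed in each of two groups to detect a slope difference delta."""
    slope_term = 12 * (n_tests - 1) * sigma2_within / (
        duration_yr ** 2 * n_tests * (n_tests + 1)
    )
    per_subject_variance = sigma2_between + slope_term
    n = 2 * (z_alpha + z_beta) ** 2 * per_subject_variance / delta ** 2
    return math.ceil(n)


# Placeholder variances (ml^2 and (ml/yr)^2); NOT the paper's Table 2 values.
SIGMA2_WITHIN = 120.0 ** 2
SIGMA2_BETWEEN = 30.0 ** 2

# Example design: tests every 6 mo for 3 yr (D = 3, P = 7), difference of 30 ml/yr.
print(subjects_per_group(SIGMA2_BETWEEN, SIGMA2_WITHIN, 3.0, 7, 30.0))
```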
RESULTS
Table 1 gives the characteristics for the original study cohort as a whole by the number of spirometry tests completed. In the present analysis, we included 160 participants who had completed all 11 spirometry measurements. The average age at baseline was about 39 yr (ranged from 25 to 65 yr); 24% were current smokers and 44% were nonsmokers at the initial survey. The mean FEV1 at baseline was 4.18 L, and 3.85 L at the last survey. A greater proportion of miners than nonminers participated in all 11 spirometry tests (59 versus 42%, p < 0.001). Otherwise, there were no significant differences in the baseline characteristics of those who were included in the study group (N = 160) versus those who were excluded from the current analyses (N = 251).
We calculated the within- and between-subject variations ($\hat{\sigma}^2$ and $\hat{\sigma}^2_\beta$) and the maximum error values for all 72 available combinations of tests, with study duration (D) varying from 1 to 5 yr and the number of measurements (P) from 3 to 11; we then grouped the results into 17 sets with the same D and P. Table 2 presents means of $\hat{\sigma}^2$ and $\hat{\sigma}^2_\beta$, by various available values of D and P, for N = 160. Table 3 summarizes the maximum error values for the 17 available sets of P and D, when N = 30, 50, 100, and 160 subjects. The results indicated that the longer the study duration and the larger the sample size, the smaller the maximum error. However, within the same study duration, testing every 6 or 12 mo resulted in similar maximum error values. When designing longitudinal spirometry studies, several combinations of D, P, and N are available that can achieve a desired value of maximum error. For example, to design a study in which the error in the estimation of FEV1 slope is intended to be less than or equal to 11.7 ml/yr (i.e., the length of the CI = 23.4 ml/yr), the investigator could choose among 16 options (as highlighted) from Table 3, ranging from N = 100, D = 5, P = 11 to N = 160, D = 3, P = 7.
Table 4 illustrates the size of the anticipated difference in average FEV1 slope between two groups that can be detected at $\alpha$ = 0.05 and 80% power when sample sizes are N = 30, 50, 100, and 160 for each of the two groups. For example, if the anticipated difference in ΔFEV1 between two groups is 25-30 ml/yr, then the study design would have nine options (as highlighted) from Table 4, ranging from N = 30, D = 5, P = 11 to N = 160, D = 2.5, P = 6. Again, with a longer study duration and a larger sample size, a smaller difference can be detected. However, with the same study duration and sample size, as the testing frequency increases the changes in detectable ΔFEV1 are relatively small. For longer follow-up duration and larger sample size, annual measurements are unnecessarily frequent.
Table 5 provides estimates of the number of subjects needed in each of two groups to detect a fixed difference in ΔFEV1 between the groups at a significance level of 0.05 with 80% power. For example, a study is planned for lung function in two groups of employees: those exposed to a chemical dust and those not exposed, in which FEV1 will be measured every 6 mo for 3 yr. If the exposure is estimated to result in a mean difference in FEV1 slope of at least 30 ml/yr between the exposed group and the unexposed group, Table 5 suggests that, to detect this effect, the sample size should be about 76 subjects in each group, as highlighted.
DISCUSSION
The rate of change in spirometric lung function is widely used as an outcome variable in investigating the possible hazard of specific exposures. The usefulness of FEV1 measurements in assessing the adverse effects of environmental exposures or disease processes is critically dependent on the intrinsic variability in the measurement. Many sources of variation can be controlled by assuring adequate training of personnel administering the tests, by selecting equipment that has been tested by accepted procedures, and by adhering to recommended test methodologies and procedures (10, 11).
After these sources of measurement variation have been controlled, the "error" term in estimating group mean FEV1 slopes can be minimized by increasing the number of subjects (N), the duration of follow-up (D), and/or the frequency of measurements (P). Because longitudinal studies are labor intensive and time consuming, investigators aim to select efficient design strategies that can reliably detect the anticipated effects.
Statistical techniques have previously been suggested for planning and evaluating longitudinal study design choices, such as frequency of measurement and study duration. In 1974, Berry estimated the standard deviation of the FEV1 between occasions (within-subject variation) to be 0.12 L, based on several previous studies. By substituting this single value of 0.12 L as the between-occasion standard deviation into the formula given by Schlesselman (5) [Eq. (6)], Berry produced a table of the standard errors (L/yr) of the FEV1 slope for study durations ranging from 1 to 10 yr and intervals between tests of 1, 3, 6, and 12 mo (4). The results provided information on trends for precision in the estimation of ΔFEV1, and suggested that for shorter follow-up periods, the estimation error predominates, whereas for longer periods of follow-up it is negligible.
Our study indicates similar conclusions: longer study durations and larger sample sizes give smaller maximum errors, whereas within the same study duration, testing every 6 or 12 mo results in similar emax values. However, in the current study, based on actual longitudinal spirometry tests performed with ATS-recommended equipment and procedures, we were able to calculate the within-subject and between-subject variation, $\hat{\sigma}^2$ and $\hat{\sigma}^2_\beta$, corresponding to a range of specific combinations of study duration and measurement frequency. Tables are reported for the maximum error (emax), a single indicator for assessing the precision of estimation for FEV1 slope. The emax values integrate the effects of D, P, N, and inter- and intrasubject variability, and offer unique guidance for investigators in selecting among various study designs. Tables are also presented for predicting the magnitude of differences in ΔFEV1 between two groups that can be reliably detected, and for estimating the number of subjects needed in each of two groups for reliably detecting anticipated differences in ΔFEV1.
Both this study and the study by Berry (4) provide tables recommending the number of subjects required in each of two groups for a longitudinal study of FEV1; however, the results are quite different. For example, Table VI in Berry (4) indicates that a 2-yr study with three test points would require 153 subjects in each of two groups to have sufficient power to detect a 30-ml/yr difference, whereas our study predicts the need for 229 subjects, a 50% increase. For a 5-yr study, our results suggest the need for 27 subjects, whereas Berry indicated 37 were needed. In contrast, for a 4-yr study, results from this study and those of Berry are similar (requiring 42 versus 45 subjects for 6-monthly measurements, and 54 versus 53 subjects for yearly testing). One of the reasons for the differences might be related to the values of within- and between-subject variation being used in the estimation. Berry used single values of $\hat{\sigma}$ and $\hat{\sigma}_\beta$ for different combinations of study duration and measurement frequency, rather than values from actual test results, as in the current study. Schlesselman commented that actual longitudinal data were needed to use these statistical techniques, and avoided speculation about the size of $\hat{\sigma}^2_\beta$ (5).
Certain cautions should be observed in using the tables to evaluate various study designs, because results from studies that use different methods or populations may not be fully comparable. First, the linear function for individual lung function decline provides a good approximation to the actual shape of the FEV1 decay curve for periods up to 4 to 6 yr. The adequacy of this approximation will decrease as the length of follow-up increases, but should still remain good for adults under 60 yr of age (8). For longer studies, basing statistical analyses solely on linear trends should be satisfactory as a first approximation, but may be insufficient for a more refined analysis, in which a second degree polynomial or other, more complicated curves may be appropriate. Second, an increase in within-subject variability in FEV1 has been observed among patients, compared with the general population (12). This might have implications for longitudinal study errors, particularly among current and former smokers, which represented about 55% of our study population. Among a currently working population, some investigators have found that smoking status has little effect on the standard deviation of FEV1 slopes (13). However, the applicability of our results to studies of populations with a high proportion of respiratory illness is not clear. Finally, in our study, testing was done throughout the 5-yr period according to ATS recommendations, by trained and motivated technicians, with ongoing quality assurance, and using an accurate volume spirometer. The results will be most suitable for studies using similar methods.
In summary, the results of this investigation provide guidance to researchers in selecting for longitudinal field spirometry studies a practical and efficient design that will result in acceptable values of maximum error and adequate power to detect anticipated differences in FEV1 slopes between two groups. The tables should be most useful for studies involving other middle-aged (25 to 65 yr), white, male working populations, with a similar proportion of smokers. It is anticipated that data from other longitudinal spirometry studies will be analyzed in a similar fashion, in order to provide data relevant to other populations.
Footnotes
Correspondence and requests for reprints should be addressed to Mei-Lin Wang, MD, NIOSH, 1095 Willowdale Road, Mail Stop H-2800, Morgantown, WV 26505. E-mail: mlw4@cdc.gov
(Received in original form March 31, 2000 and in revised form July 24, 2000).
Acknowledgments: The authors are grateful for the thoughtful comments on the manuscript provided by Drs. Paul Enright and Duane L. Sherrill (University of Arizona, Tucson, AZ), and by Drs. Dan S. Sharp and Eva Hnizdo, and by Ms. Patricia Schleiff (National Institute for Occupational Safety and Health).
Supported by the National Institute for Occupational Safety and Health.
References
1. Buist AS, Vollmer WM. The use of lung function tests in identifying factors that affect lung growth and aging. Stat Med 1988;7:11-18.
2. Hankinson J, Wagner GR. Medical screening using periodic spirometry for detection of chronic lung disease. Occup Med State Art Rev 1993;8:353-361.
3. Clement J, Van De Woestijne KP. Rapidly decreasing forced expiratory volume in one second or vital capacity and development of chronic airflow obstruction. Am Rev Respir Dis 1982;125:553-558.
4. Berry G. Longitudinal observations: their usefulness and limitations with special reference to the forced expiratory volume. Bull Physio-path Respir 1974;10:643-655.
5. Schlesselman JJ. Planning a longitudinal study: II. Frequency of measurement and study duration. J Chron Dis 1973;26:561-570.
6. American Thoracic Society. Standardization of spirometry. Am Rev Respir Dis 1979;119:4-11.
7. Hodgins P, Henneberger PK, Wang M-L, Petsonk EL. Bronchial responsiveness and five-year FEV1 decline: a study in miners and nonminers. Am J Respir Crit Care Med 1998;157:1390-1396.
8. Vollmer WM, Johnson LR, McCamant LE, Buist AS. Methodologic issues in the analysis of lung function data. J Chron Dis 1987;40:1013-1023.
9. SAS Institute. SAS/STAT user's guide: version 6, 3rd ed. Cary, NC: SAS Institute, Inc.; 1990. p. 818-875.
10. American Thoracic Society. Standardization of spirometry: 1987 update. ATS statement. Am Rev Respir Dis 1987;136:1285-1298.
11. American Thoracic Society. ATS statement: standardization of spirometry, 1994 update. Am J Respir Crit Care Med 1995;152:1107-1136.
12. Pennock BE, Rogers RM, McCaffree DR. Changes in measured spirometric indices: what is significant? Chest 1981;80:97.
13. Wang ML, McCabe L, Petsonk EL, Hankinson JL, Banks DE. Weight gain and longitudinal changes in lung function in steel workers. Chest 1997;111:1526-1532.
APPENDIX
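Suppose the distribution of slope corresponding to an infinite population of individuals is modeled by a probability density function $f(\beta)$. The mean and variance of this distribution, respectively, are
$$\mu_\beta = \int \beta f(\beta)\,d\beta, \qquad \sigma^2_\beta = \int (\beta - \mu_\beta)^2 f(\beta)\,d\beta.$$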
The unbiased estimator of $\mu_\beta$ is the mean of the estimated slopes for the sample of N individuals. For a given group of N individuals, the individual slopes $\beta_1, \beta_2, \ldots, \beta_N$ can be considered as a random sample of size N from the population of $\beta$ values. Let $\hat{\beta}_j$ denote the least-squares estimate of $\beta_j$, the true slope of individual j. The estimate of $\mu_\beta$ is
$$\hat{\mu}_\beta = \hat{\bar{\beta}} = \frac{1}{N}\sum_{j=1}^{N}\hat{\beta}_j$$
and the estimate of the variance $\sigma^2_\beta$ is
|
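The mean of $\hat{\bar{\beta}}$ (the expected value of $\hat{\bar{\beta}}$) is
$$E[\hat{\bar{\beta}}] = \frac{1}{N}\sum_{j=1}^{N} E[\hat{\beta}_j] = \mu_\beta,$$
where E denotes expectation. That is, $\hat{\bar{\beta}}$ is an unbiased estimator of $\mu_\beta$.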
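Similarly, the variance of $\hat{\bar{\beta}}$ is $V[\hat{\bar{\beta}}] = (1/N^2)\sum_{j=1}^{N} V[\hat{\beta}_j]$. For each individual,
$$V[\hat{\beta}_j] = \sigma^2_\beta + \frac{\sigma^2}{\sum_{i=1}^{P}(t_i - \bar{t})^2},$$
in which $t_1, \ldots, t_P$ are the times at which the tests were performed, and $\sigma^2$ is the variation about the regression line for the jth individual, which is assumed to be the same for all individuals. Thus,
$$V[\hat{\bar{\beta}}] = \frac{1}{N}\left[\sigma^2_\beta + \frac{\sigma^2}{\sum_{i=1}^{P}(t_i - \bar{t})^2}\right].$$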
Then, using formula 12 of Schlesselman (5), the estimated standard error of the sample mean slope of FEV1 is
|
|
in which $\hat{\sigma}^2_j$ is the mean square error (MSE) of the simple linear regression model for individual j.
|
is approximately normally distributed with mean zero and variance one. Then,
|
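That is, if we estimate $\mu_\beta$ with $\hat{\bar{\beta}}$, then the maximum error in our estimate is
$$e_{max} = 1.96 \times \text{(estimated standard error of } \hat{\bar{\beta}}\text{)}.$$
The maximum error value is exactly half the width of the 95% confidence interval for $\mu_\beta$.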
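As a rough numerical illustration of the maximum error, the sketch below combines the normal approximation (a 95% interval of plus or minus 1.96 standard errors) with the variance term for P equally spaced tests over D years that appears in Schlesselman's formula 14 as quoted in the Methods. Treating emax as 1.96 times the square root of that variance is an assumption of this sketch, and the variance inputs are placeholders (a within-subject SD of 120 ml, as attributed to Berry in the Discussion, and an assumed between-subject slope SD of 30 ml/yr), not the study's Table 2 estimates.

```python
import math


def mean_slope_variance(sigma2_between, sigma2_within, duration_yr, n_tests, n_subjects):
    """Variance of the group mean slope for P equally spaced tests over D years."""
    slope_term = 12 * (n_tests - 1) * sigma2_within / (
        duration_yr ** 2 * n_tests * (n_tests + 1)
    )
    return (sigma2_between + slope_term) / n_subjects


def max_error(sigma2_between, sigma2_within, duration_yr, n_tests, n_subjects, z=1.96):
    """Half-width of the approximate 95% confidence interval for the mean slope."""
    return z * math.sqrt(
        mean_slope_variance(sigma2_between, sigma2_within,
                            duration_yr, n_tests, n_subjects)
    )


# Placeholder variances (ml^2 and (ml/yr)^2); NOT the paper's Table 2 values.
SIGMA2_WITHIN = 120.0 ** 2
SIGMA2_BETWEEN = 30.0 ** 2

for n in (30, 50, 100, 160):
    e = max_error(SIGMA2_BETWEEN, SIGMA2_WITHIN, 5.0, 11, n)
    print(f"N = {n:3d}, D = 5 yr, P = 11: emax = {e:.1f} ml/yr")
```

Because N enters only through the divisor, quadrupling the number of subjects halves emax, whereas D enters the within-subject term as a square, which is why longer follow-up reduces the error so quickly.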