Published ahead of print on April 3, 2008, doi:10.1164/rccm.200708-1256OC
© 2008 American Thoracic Society doi: 10.1164/rccm.200708-1256OC
Tuberculosis Outbreaks Predicted by Characteristics of First Patients in a DNA Fingerprint Cluster
1 KNCV Tuberculosis Foundation, The Hague, The Netherlands; 2 Department of Infectious Diseases, Tropical Medicine and AIDS, Academic Medical Centre, Amsterdam, The Netherlands; 3 National Mycobacteria Reference Unit, Centre for Infectious Disease Control (CIb/LIS), National Institute of Public Health and the Environment, Bilthoven, The Netherlands; and 4 Department of Tuberculosis Control, Municipal Health Service, Amsterdam, The Netherlands
Correspondence and requests for reprints should be addressed to Sandra V. Kik, M.Sc., KNCV Tuberculosis Foundation, P.O. Box 146, 2501 CC The Hague, The Netherlands. E-mail: kiks{at}kncvtbc.nl
Rationale: Some clusters of patients who have Mycobacterium tuberculosis isolates with identical DNA fingerprint patterns grow faster than others. It is unclear what predictors determine cluster growth.
Objectives: To assess whether the development of a tuberculosis (TB) outbreak can be predicted by the characteristics of its first two patients.
Methods: Demographic and clinical data of all culture-confirmed patients with TB in the Netherlands from 1993 through 2004 were combined with DNA fingerprint data. Clusters were restricted to cluster episodes of 2 years to only detect newly arising clusters. Characteristics of the first two patients were compared between small (2–4 cases) and large (5 or more cases) cluster episodes.
Measurements and Main Results: Of 5,454 clustered cases, 1,756 (32%) were part of a cluster episode of 2 years. Of 622 cluster episodes, 54 (9%) were large and 568 (91%) were small episodes. Independent predictors for large cluster episodes were as follows: less than 3 months' time between the diagnosis of the first two patients, one or both patients were young (<35 yr), both patients lived in an urban area, and both patients came from sub-Saharan Africa.
Conclusions: In the Netherlands, patients in new cluster episodes should be screened for these risk factors. When the risk pattern applies, targeted interventions (e.g., intensified contact investigation) should be considered to prevent further cluster expansion.
Key Words: tuberculosis; transmission; DNA fingerprinting; prediction; epidemiology
Tuberculosis (TB) mainly results from Mycobacterium tuberculosis transmitted through coughing of patients. By DNA typing, one can distinguish different strains of M. tuberculosis (1). Patients sharing an identical M. tuberculosis strain are considered to be part of a "cluster," reflecting recent transmission of M. tuberculosis and rapid progression to disease from recent exogenous infection. Unique DNA fingerprint patterns are assumed to be due to reactivation of remote infections or recent transmission from patients outside the study period or study area (2, 3). Outbreaks of TB occur regularly (4), as evidenced by clustering. Outbreaks could result from failure of contact investigations to detect all contacts and treat those with a recent infection. Population-based studies in low-incidence countries have identified individual risk factors for involvement in a cluster (2, 5–8). Patients in clusters are more often male, young, of certain nationalities, long-term residents in low-endemic countries, urban residents, sputum smear positive, HIV infected, drug or alcohol abusers, or homeless. Other studies have tried to identify risk factors for being the first patient in a cluster and the generation of secondary cases (9, 10). However, these studies assumed that the first patient diagnosed was the source case, which is not necessarily true when patient presentation delay is long. More probable is that the source case will be among the first two patients in a cluster. Although risk factors for an individual to be part of or give rise to a cluster have been assessed (2, 5–10), predictors of further cluster growth have not. Risk factors that predict further cluster growth are relevant for TB control as they may predict outbreaks. Early identification of clusters that potentially become large could help focus TB control efforts, especially in low-incidence countries that are approaching the elimination phase of TB. 
The aim of our study was therefore to determine which characteristics of the first two cases in a cluster can predict the development of large clusters (of 5 or more cases). Some of the results of this study have been reported previously in the form of an abstract (11, 12).
Data Collection
We combined data from the Netherlands Tuberculosis Register (NTR), which includes demographic and clinical information of all patients with diagnosed TB, with data from the National Institute of Public Health and the Environment (RIVM) that include information on species identification, molecular typing, and drug susceptibility from all M. tuberculosis isolates in the Netherlands. Because the NTR is an anonymous register that includes routinely gathered surveillance data, no ethical approval was required for the study. Patients with culture-confirmed TB from January 1, 1993, through December 31, 2004, in both registers were matched on the basis of sex, date of birth, year of diagnosis, and postal code. Patients were included when data in both registers matched completely, or if one minor difference existed in one of the matching variables. For a mismatch in the year of diagnosis, only one calendar year difference between diagnosis (NTR) and isolation (RIVM) was tolerated. Duplicate matches were excluded.
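The matching rule described above can be illustrated with a minimal sketch. The `Record` type, its field names, and the helper functions are hypothetical and are not taken from the NTR or RIVM systems. The text does not define what counts as a "minor" difference for sex, date of birth, or postal code, so the sketch assumes that any single differing variable is tolerated; only the one-calendar-year limit on the year of diagnosis comes from the description.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Record:
    """Hypothetical record holding the four matching variables."""
    sex: str
    date_of_birth: date
    year: int          # year of diagnosis (NTR) or year of isolation (RIVM)
    postal_code: str


def count_mismatches(ntr: Record, rivm: Record) -> Optional[int]:
    """Count differing matching variables.

    Returns None when the years differ by more than one calendar year,
    because that difference is never tolerated.
    """
    if abs(ntr.year - rivm.year) > 1:
        return None
    return sum([
        ntr.sex != rivm.sex,
        ntr.date_of_birth != rivm.date_of_birth,
        ntr.year != rivm.year,
        ntr.postal_code != rivm.postal_code,
    ])


def is_match(ntr: Record, rivm: Record) -> bool:
    """Accept complete agreement or a difference in exactly one variable.

    Assumption: any single-variable difference is treated as 'minor'.
    """
    mismatches = count_mismatches(ntr, rivm)
    return mismatches is not None and mismatches <= 1


ntr_case = Record("F", date(1970, 1, 1), 1999, "1011AB")
rivm_case = Record("F", date(1970, 1, 1), 2000, "1011AB")
print(is_match(ntr_case, rivm_case))  # one-year difference only, so accepted
```

Excluding duplicate matches would be a separate step, for example dropping any record that is accepted against more than one counterpart.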
DNA Fingerprinting
Selection of Cluster Episodes
Selection of the First Two Cases
The first two cases of a cluster episode were selected according to their date of diagnosis. Characteristics of the first two cases were counted for each cluster episode. Missing values were counted as 0 as we assumed that, in these cases, the risk factor was not present. Consequently, for each cluster episode, the characteristics of interest in the first two cases were coded as either present in both (2), present in one (1), or absent (0).
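The coding scheme above can be sketched in a few lines. The function names are hypothetical, and `None` is used here as an assumed representation of a missing value; the age threshold of 35 years is the one used for "young" elsewhere in the article.

```python
from typing import Optional


def is_young(age: Optional[int]) -> Optional[bool]:
    """Return whether a patient is younger than 35 years; None if age is missing."""
    return None if age is None else age < 35


def code_characteristic(first: Optional[bool], second: Optional[bool]) -> int:
    """Code a characteristic of the first two cases as 0, 1, or 2.

    A missing value (None) is counted as 0, i.e. the risk factor is
    assumed to be absent, as described in the text.
    """
    return int(bool(first)) + int(bool(second))


# First case aged 28, age of the second case unknown: coded as present in one.
print(code_characteristic(is_young(28), is_young(None)))
```

Treating missing values as absent keeps every cluster episode in the analysis, at the cost of possibly underestimating how often a characteristic was present.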
Statistical Analysis
The relative count of each factor was compared between large and small clusters using logistic regression. Those predictors for which the count differed (P
Sensitivity Analysis
From 1993 through 2004, 18,200 patients with TB were reported to the NTR, 12,457 (68%) of whom had culture-confirmed TB. Of the culture-confirmed cases, 10,567 (85%) could be matched between the two datasets. Of these, 9,024 (85%) had complete agreement in the matching variables, whereas another 1,543 (15%) had a minor difference in one of the matching variables (Figure 2). No difference between matched and nonmatched patients was found regarding sex, age group, and nationality. Patients who were detected passively matched slightly more often (78%) compared with those found actively (65%).
Of the matched cases, 5,454 (52%) were clustered, representing 1,168 different DNA fingerprint patterns (Figure 2). In total, 622 cluster episodes of 2 years were identified, comprising 1,756 of the 5,454 clustered cases (32%). Five hundred and forty-two DNA fingerprint patterns were found in a single cluster episode, and 40 were found in two cluster episodes. The number of cases per cluster episode ranged from 2 to 20 (Figure 3). Of the 622 cluster episodes, 568 (91.3%) were small and 54 (8.7%) were large. Table 1 shows the characteristics of cases in cluster episodes and compares them with cases that were clustered but not involved in a cluster episode, to assess the possibility of selection bias. Because the number of cases assessed was very large, we considered a difference of more than 10% between the two groups relevant. Cases in cluster episodes less often had unknown information for several variables, were more often from Asia, and more often had TB caused by an M. tuberculosis strain of the Beijing genotype or by a strain that was resistant to rifampicin.
Table 2 shows the characteristics of the first two cases that are associated with large cluster episodes. Univariate analysis showed that the time between the first two patients was significantly shorter in large clusters than in small clusters. In 36 of 54 large clusters (67%), the first two cases were diagnosed within a period of 3 months, compared with 150 of 568 (26%) in small clusters. One or both early cases in large clusters were more often young (age < 35 yr). Furthermore, it was more common that both first cases of large cluster episodes lived in an urban setting. HIV infection and multidrug resistance (MDR) were both more often present in the first two cases of large clusters compared with small clusters (P = 0.051 and P = 0.039, respectively). The mean time between onset of symptoms and health seeking (patient delay) of the first patient was 13.7 weeks in large clusters compared with 8.9 weeks in small clusters (Mann-Whitney U test, P = 0.432; patient delay was known in 423 [68%] of all cluster episodes).
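As a rough illustration of the strength of the time-interval association, the counts quoted above can be turned into a crude odds ratio. This is an unadjusted back-of-the-envelope calculation from the two proportions given in the text; it is not the estimate reported in Table 2 or Table 3, and the function name is hypothetical.

```python
def crude_odds_ratio(exposed_large: int, total_large: int,
                     exposed_small: int, total_small: int) -> float:
    """Unadjusted odds ratio from a 2x2 table of exposure by cluster size."""
    a = exposed_large                 # large episodes, first two cases within 3 months
    b = total_large - exposed_large   # large episodes, longer interval
    c = exposed_small                 # small episodes, first two cases within 3 months
    d = total_small - exposed_small   # small episodes, longer interval
    return (a * d) / (b * c)


share_large = 36 / 54
share_small = 150 / 568
odds_ratio = crude_odds_ratio(36, 54, 150, 568)
print(f"{share_large:.0%} vs {share_small:.0%}, crude OR = {odds_ratio:.1f}")
```

With these counts the crude odds ratio is roughly 5.6, which is in line with the short interval being described later as the strongest predictor.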
Multivariate logistic regression revealed four significant independent predictors for large clusters: a period of less than 3 months between the date of diagnosis of the first two cases, young age of one or both, both living in an urban area, and both coming from sub-Saharan Africa (Table 3). We did not take MDR into account in the multivariate model because the number of MDR-TB cases was small. The discriminative ability of this multivariate model is shown by the receiver operating characteristic (ROC) curve (Figure 4). The area under the curve (AUC) of the ROC curve in Figure 4 (black line) is 0.79 (95% confidence interval [CI], 0.72–0.85). None of the possible interaction terms between independent predictors was significant or increased the AUC of the ROC curve. At the optimal cut point for predicting large cluster episodes, sensitivity was 65% and specificity was 82%. The corresponding positive and negative predictive values were 25 and 96%, respectively. The probability corresponding with this optimal cut point is 14%, indicating that clusters with a predicted probability above 14% are likely to become large. In Figure 5, the probability of a large cluster episode is given for all possible combinations of characteristics of the first two cases. When the first two cases in a cluster episode occur within 3 months' time and at least one of them is younger than 35 years, the risk of developing a large cluster is increased from one to more than five times, depending on their origin and place of residence.
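The link between sensitivity, specificity, and the predictive values can be checked with a short calculation. The sketch below uses only figures stated in the text (54 large and 568 small cluster episodes, sensitivity 65%, specificity 82%); the function name is hypothetical. Because the published sensitivity and specificity are rounded, the result is approximate and may differ from the reported predictive values by about one percentage point.

```python
from typing import Tuple


def predictive_values(sensitivity: float, specificity: float,
                      n_positive: int, n_negative: int) -> Tuple[float, float]:
    """Return (PPV, NPV) from sensitivity, specificity, and group sizes."""
    true_pos = sensitivity * n_positive
    false_neg = n_positive - true_pos
    true_neg = specificity * n_negative
    false_pos = n_negative - true_neg
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# 54 large and 568 small cluster episodes, as reported in the Results.
ppv, npv = predictive_values(0.65, 0.82, n_positive=54, n_negative=568)
print(f"PPV = {ppv:.1%}, NPV = {npv:.1%}")
```

The low positive predictive value follows directly from the low share of large episodes (about 9%): even with 82% specificity, false positives among the 568 small episodes outnumber true positives among the 54 large ones.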
To determine whether the ability to predict large clusters increased when characteristics of the first three instead of the first two cases were used, we repeated the analysis by comparing large clusters with small clusters with at least three cases (data not shown). Age was no longer a significant predictor (P = 0.19), and no additional independent predictors were found. The AUC of the ROC curve did not change to a relevant extent (from 0.79 to 0.81; 95% CI, 0.74–0.88). The size of a large cluster episode was arbitrarily set at five or more cases. Our model was still valid when large cluster episodes were defined as having four or more, or six or more cases (data not shown). However, the AUC of the ROC curve changed to 0.70 (95% CI, 0.64–0.76) and 0.87 (95% CI, 0.80–0.93), respectively (Figure 4). We also evaluated the usefulness of our model for the prediction of large cluster episodes that occurred during a 3-year instead of a 2-year period. As a consequence, 30 of our small cluster episodes became large and the AUC of the ROC curve decreased to 0.70 (95% CI, 0.63–0.76) when the same predicting characteristics were included. Origin from sub-Saharan Africa still appeared to be a risk factor but was no longer significant (Table 3).
This study showed that the growth of new TB clusters with five or more cases within 2 years can be predicted by the characteristics of the first two cases. Independent predictors for large cluster episodes were age under 35 years of one or both of the first two patients, both patients living in an urban area, both patients coming from sub-Saharan Africa, and less than 3 months' time between diagnosis of these first two patients. Sensitivity analysis showed that the discriminative ability of our model remained good when the definition of a large cluster episode was changed to include at least four or six cases, or when the time span of a large cluster covered 3 years instead of 2. Time between cases, age, nationality, and residence are all variables that are known shortly after the diagnosis of a new TB case and should be part of the national TB registration. When molecular data and the national registration are combined, new cluster episodes can be screened using these risk factors to identify those clusters at a higher risk of increasing in size, thereby providing an early warning system for municipal health services. In the United States, 38 to 57% of the clustered cases were found in addition to conventional contact investigation when information from genotyping M. tuberculosis isolates was used (21). Nowadays, a considerably faster method than IS6110-RFLP typing is available, which is based on variable numbers of tandem repeats (22). This technique makes it possible to give feedback on clustering of TB cases within a few weeks. A short time span between the first two patients in a cluster was the strongest predictor for large cluster episodes. One could attribute this strong association partly to our definition of a large cluster episode, which required more cases within 2 years than small cluster episodes. 
However, that does not explain our observations that only a period of less than 3 months between the first and second case was predictive, the same risk pattern was found when a cluster episode covered 3 instead of 2 years, and no decreasing trend was observed toward longer periods. Patient delay of first cases of large clusters was substantially longer (mean delay was 4.8 wk longer) than that of small clusters, although this difference was not significant. The prolonged patient delay of first cases of large clusters could explain the very short time span found between the first and second cases of large cluster episodes. Index cases that experienced a longer period with complaints and delayed health seeking may have infected more secondary cases and as a consequence gave rise to larger clusters than those index cases with a shorter patient delay. Population-based studies previously showed that young age is a risk factor for clustering (8, 9, 23–25). Usually, young index cases have more intimate contacts and more contacts in general (26), and as a result, generate more secondary cases (6, 9) than older index cases. This agrees with our finding that young age is a predictor of cluster growth. Also, in urban areas, the number of possible contacts and thus the chance that a patient with infectious TB will infect another person is greater than in rural areas (2, 25, 27). We showed that patients from sub-Saharan Africa more often gave rise to a large cluster episode. Different nationalities have been associated with clustering, depending on the study setting (2, 7–9, 28–31). Most studies showed that clustering tends to occur among persons with the same nationality (28, 29, 31). African patients, especially when coming from Morocco to the Netherlands (9) or from Somalia to Denmark (28), showed high risks of being the first case in a cluster. 
However, an epidemiologic link is rarely detected among African immigrants who share a DNA fingerprint (32, 33), which suggests that African cases may have contracted their infection in their home country where this DNA fingerprint is common (34). Underlying HIV infection, a well-known risk factor for progression to active TB disease and associated with a shorter time between successive cases in clusters with at least one HIV-positive person (35), is uncommon among patients with TB in the Netherlands (estimated prevalence is 4.1%) (36). We were unable to show that HIV infection was an independent predictor for the development of large cluster episodes. The aim of our study was to find a method to predict the majority of large clusters, without classifying too many small clusters incorrectly as large. The optimal cutoff point of our model allowed us to correctly predict 65% of the large cluster episodes, with only 18% of all small cluster episodes incorrectly predicted as becoming large. Because small cluster episodes occurred more frequently than large ones, the positive predictive value was 25% in this population. If all new cluster episodes with a probability above 14% were considered as potential outbreaks, this would lead to intensified case finding in 172 of 622 cluster episodes during our 8-year observation period; approximately 22 per year. In comparison with all patients with pulmonary TB in the Netherlands (672 in 2005) this is a rather small number. In addition, when intensified case finding succeeds and further transmission is prevented, the number of large clusters will gradually decrease over time. Although large cluster episodes occur rarely, the number of cases involved in large clusters can be substantial (4). Therefore, contact investigations around cases in potentially fast-growing clusters may need more attention than the routine investigation that is done in the Netherlands around all culture-confirmed TB cases. 
To our knowledge, few studies reported risk factors for cluster growth (10, 37). Driver and colleagues (37) showed that infectiousness of the initial cases was associated with a higher rate of cluster growth compared with clusters in which neither case was infectious. We were unable to confirm this finding because sputum smear results were missing in 29% of our cases and only available since 1996. Even when we considered all missing values as positive sputum smear results, sputum smear positivity of the first two patients was not an independent predictor. Through the selection of a 2-year time period to define cluster episodes, we may not have included all epidemiologically linked cases (38, 39). A recent study in the Netherlands showed that over half of the secondary cases caused by new strains (strains that were not isolated within the preceding 2 yr) occur within 2 years after introduction (40). Another study showed that 86% of cases that clustered within 2 years had an epidemiologic link that was evident or likely (33). We assumed that cases who develop active TB would do so at least within 2 years after infection; otherwise, they were not considered as a secondary case, but as a new source case. By this definition, we were able to include more than one cluster episode of a particular fingerprint. Our results therefore represent predictors for all possible outbreaks of emerging and reemerging strains rather than only new fingerprints. One limitation is that our model may not be valid for existing clusters that continue to have cases at least every 2 years. We found that clustered cases with an infection caused by an M. tuberculosis strain of the Beijing genotype, with rifampicin resistance or from Asia, were relatively more often part of a cluster episode of 2 years than other clustered cases. 
The association between rifampicin resistance and clustering can be explained by the fact that patients infected with a resistant strain remain infectious longer, because the resistance is usually not recognized directly at diagnosis. The fact that strains of the Beijing genotype were more often part of short cluster episodes is highly interesting, because this suggests that such strains transmit more successfully or that patients infected with such strains progress more rapidly to TB disease. Another limitation of our study is that we could only include culture-positive and matched TB cases and therefore excluded all possible transmission to and from patients who were not confirmed by culture, reducing potential cluster size. Furthermore, misclassification could have occurred due to instability of the fingerprint pattern (41), which would cause us to miss clustered cases. Because the number of large clusters is small, the power of our study to find or exclude risk factors with small relative risks was limited. In conclusion, we showed that the majority of TB outbreaks can be predicted by characteristics of the first two cases in a cluster episode. It is unclear whether the same predictive factors apply in other settings. Even in other low-endemic countries, the population and transmission patterns can differ from those in the Netherlands. However, the methodology we used can be applied by others to identify setting-specific predictive factors. TB cases who are part of new cluster episodes should be screened for the risk factors described in this study, and targeted interventions (e.g., intensified contact investigation) should be considered to prevent the predicted development of large clusters.
The authors are grateful to all Municipal Health Services in the Netherlands for regularly contributing data to the NTR for over 14 years. The authors also thank Nico Kalisvaart for his assistance in combining the two datasets and Saskia den Boon and Rein Houben for critical review of an earlier version of the manuscript.
Supported by the Netherlands Organization for Health Research and Development (ZonMw).
Originally Published in Press as DOI: 10.1164/rccm.200708-1256OC on April 3, 2008.
Conflict of Interest Statement: None of the authors has a financial relationship with a commercial entity that has an interest in the subject of this manuscript.
Received in original form August 24, 2007; accepted in final form April 1, 2008.