Original Research
1 Group Health Center for Health Studies, Group Health Cooperative, 1730 Minor
Ave., Ste. 1600, Seattle, WA 98101.
2 Department of Radiology, University of California, San Francisco, San
Francisco, CA.
3 Department of Family Medicine, Oregon Health Sciences University, Portland,
OR.
4 Department of Radiology, University of North Carolina, Chapel Hill, NC.
5 Department of Internal Medicine, University of Washington School of Medicine,
Seattle, WA.
Received August 15, 2007;
accepted after revision November 20, 2007.
Supported by the Agency for Healthcare Research and Quality (HS-10591) and
the National Cancer Institute (1R01 CA 107623 and K05 CA 104699; Breast Cancer
Surveillance Consortium: U01 CA63731, U01 CA86082, U01 CA63736, U01 CA70013,
U01 CA63740, U01 CA70040, U01 CA69976, and U01 CA86076).
Abstract
MATERIALS AND METHODS. We evaluated 45,007 initial short-interval follow-up mammograms from the Breast Cancer Surveillance Consortium interpreted 3–9 months after a probably benign assessment on a screening or diagnostic examination between 1994 and 2004. We linked these mammograms with patient characteristics and breast cancer diagnoses within 12 months. A subset of short-interval follow-up examinations (n = 13,907) was merged with radiologist characteristics collected from survey data from 130 interpreting radiologists. Using logistic regression, we fit generalized estimating equations to model sensitivity and specificity of short-interval follow-up mammograms by patient and radiologist characteristics.
RESULTS. For every 1,000 women, 8.0 women (0.8%) were diagnosed with breast cancer within 6 months and 11.3 (1.1%) within 12 months. Sensitivity was 83.3% (95% CI, 79.4–87.3%) for cancers diagnosed within 6 months and 60.5% (56.2–64.7%) for those diagnosed within 12 months. Specificity was 97.2% (96.9–97.6%) at 6 months and 97.3% (96.9–97.6%) at 12 months. Sensitivity at 12 months increased among women with unilateral short-interval follow-up mammograms (odds ratio, 1.56 [95% CI, 1.06–2.29]) and when the interpreting radiologist spent 10 or more hours a week in breast imaging (odds ratio, 3.25 [1.00–10.52]).
CONCLUSION. Initial short-interval follow-up mammography examinations had a lower sensitivity for detecting breast cancer within 12 months than other diagnostic mammograms (61% for short-interval follow-up vs 80% for diagnostic mammograms reported in the literature). However, sensitivity within the 6-month interval that is usually recommended for subsequent follow-up was 83%. Accuracy of short-interval follow-up mammograms was influenced by few patient and radiologist characteristics.
Keywords: breast cancer, diagnostic mammography, sensitivity, short-interval follow-up mammography, specificity
Despite the frequency of short-interval follow-up examinations in clinical practice, few data are available on their performance characteristics, such as sensitivity and specificity. Our study was stimulated by data that the Breast Cancer Surveillance Consortium (BCSC) provides on the Website of the National Cancer Institute [13]. One table on this Website shows performance characteristics of diagnostic mammography examinations in the United States. The reported sensitivity of short-interval follow-up examinations was substantially lower than that of other diagnostic examinations (55.8% vs 79.8%) [1, 13]. We found this lower sensitivity of concern because it suggests that nearly one half of breast cancers are missed on these short-interval follow-up examinations.
The purpose of this study was to examine the accuracy of short-interval follow-up examinations and factors that might affect interpretive performance relative to other diagnostic examinations. We used detailed patient information from the BCSC linked to radiologist information from a self-administered survey to describe the sensitivity and specificity of short-interval follow-up mammograms. We also evaluated the association between the performance of short-interval follow-up mammograms and patient and radiologist characteristics in clinical practice. Patient characteristics, such as age and breast density, have been associated with the interpretive performance of both screening and diagnostic mammography examinations [14–16]. We do not know whether patient characteristics are associated with the accuracy of short-interval follow-up examinations, and this information could be used to determine which women would be best suited for periodic short-interval follow-up surveillance. Previous studies have also shown that radiologist characteristics, such as number of years in practice, workload, and type of training, are associated with interpretive performance of screening examinations [17–20]. Therefore, we explored whether any radiologist factors predict better performance when interpreting short-interval follow-up examinations.
Data Collection
Short-interval follow-up mammograms: We included mammograms
from 1994 to 2004 in this study if the interpreting radiologist indicated that
the examination was a short-interval follow-up. We included each woman's first
unilateral or bilateral diagnostic mammography examination that was classified
as short-interval follow-up (n = 110,942 women). We excluded women
with a history of breast cancer (n = 7,271) because the
recommendation for and interpretation of short-interval follow-up mammograms
may be different for these women. We made several additional exclusions to
ensure that our analysis included only initial short-interval follow-up
mammograms. BI-RADS guidelines recommend short-interval follow-up mammograms
only after a probably benign assessment
[3]; therefore, we excluded
women with examinations without a previous assessment of probably benign in
the BCSC database (n = 25,490). Because initial short-interval
follow-up examinations normally occur about 6 months after the previous
mammogram or sonogram, we also excluded women who had an examination less than
3 months or more than 9 months before (n = 31,487). Finally, we
excluded women with short-interval follow-up examinations that were missing a
final assessment (n = 445) and those with unknown laterality of the
mammogram (n = 1,242), for a total sample size of 45,007
short-interval follow-up mammograms.
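The exclusion counts above can be tallied step by step to confirm that they arrive at the stated sample size. The following is a minimal sketch in Python; the variable names and short labels are illustrative, and the counts are those reported in the text.

```python
# Starting cohort and exclusions, as reported in the text above.
initial_women = 110_942
exclusions = [
    ("history of breast cancer", 7_271),
    ("no previous probably benign assessment in BCSC", 25_490),
    ("previous examination < 3 or > 9 months before", 31_487),
    ("missing final assessment", 445),
    ("unknown laterality", 1_242),
]

remaining = initial_women
for reason, n_excluded in exclusions:
    remaining -= n_excluded
    print(f"{reason:48s} -{n_excluded:>6,d} -> {remaining:>7,d}")

# The running total ends at the analytic sample size given in the text.
print(f"final sample: {remaining:,d}")
```

Run in order, the subtraction leaves 45,007 examinations, which agrees with the total given above.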
We linked each short-interval follow-up mammogram to the radiologist's final BI-RADS assessment and recommendation within 180 days after the completion of all imaging workup. We also linked each mammogram to a Surveillance, Epidemiology, and End Results (SEER) registry, state cancer registry, or local benign and malignant pathology database to determine whether a cancer diagnosis had been made within 365 days of the short-interval follow-up examination.
Patient Characteristics
We obtained patient information by linking to standardized questionnaires
completed by women at the time of the short-interval follow-up examination
[21]. The questionnaires
collected information on demographics (e.g., age and race), breast cancer risk
factors (e.g., family history of breast cancer, breast symptoms, menopausal
status, and current use of hormone therapy), and clinical history of previous
breast procedures (e.g., biopsy).
Radiologist Characteristics
Radiologists were eligible to complete a self-administered survey if they
interpreted mammograms in the year 2001 at one of three participating sites:
Group Health Cooperative, New Hampshire Mammography Network, or Colorado
Mammography Project. Of the 181 surveys mailed in 2002, 139 surveys were
returned, for a response rate of 77%; 130 of these radiologists interpreted
one or more short-interval follow-up mammograms from 1994 to 2004, making them
eligible for the present study. A detailed description of the radiologist
survey has been published elsewhere
[19]. Briefly, we collected
demographic and clinical characteristics of radiologists, including age, sex,
academic affiliation, number of years spent working in breast imaging,
workload, percentage of time spent in diagnostic imaging, number of breast
procedures conducted, and malpractice history. We double-entered all survey
data at each site and transferred the data file via secure file transfer
protocol to the statistical coordinating center. We linked these survey
responses to a subgroup of the short-interval follow-up mammograms described
previously that were interpreted by radiologists with survey data (n
= 13,907). This subanalysis did not link to all 45,007 short-interval
follow-up mammograms because we included mammograms in the main analyses that
were interpreted by radiologists who did not complete the survey.
Statistical Analyses
We evaluated rates of abnormal interpretation and breast cancer diagnosis
among all short-interval follow-up examinations and for each category of
patient and radiologist characteristics. We then evaluated the sensitivity and
specificity among all examinations and for each category of patient and
radiologist characteristics using the standard 12-month follow-up interval for
detecting breast cancer [3].
Because guidelines for short-interval follow-up examinations suggest that
women should return for a second short-interval follow-up examination after 6
months, we also calculated sensitivity and specificity using a 6-month outcome
interval. Positive examinations were defined as those with a final BI-RADS
assessment of 4, 5, or 0 with a recommendation for biopsy after the completion
of all diagnostic workup. Negative examinations were defined as those with a
final BI-RADS assessment of 1, 2, 3, or 0 without recommendation for biopsy.
We determined final BI-RADS assessments by looking for the first non-0
assessment from additional imaging within 180 days of the short-interval
follow-up mammogram; however, some 0 assessments remained unresolved. We
defined the abnormal interpretation rate as the number of positive
examinations divided by the total number of examinations. The definitions we
used for sensitivity and specificity are as follows:
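The equations themselves are not reproduced in this copy of the article. The following is a reconstruction, not the authors' typeset formulas, based on the definitions of positive and negative examinations given above and on the standard definitions of the two measures. The subscript t (the follow-up interval, 6 or 12 months) and the symbols TP, FN, TN, and FP are notation introduced here for illustration.

```latex
\[
\text{Sensitivity}_t = \frac{TP_t}{TP_t + FN_t},
\qquad
\text{Specificity}_t = \frac{TN_t}{TN_t + FP_t}
\]
```

Here TP_t and FN_t are the numbers of positive and negative examinations, respectively, that were followed by a breast cancer diagnosis within t months, and TN_t and FP_t are the numbers of negative and positive examinations, respectively, with no breast cancer diagnosis within t months.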
The 6- and 12-month definitions were not mutually exclusive (i.e., cancers included in the 6-month definition were also included in the 12-month definition). We matched cancer laterality among unilateral short-interval follow-up mammograms so that cancers diagnosed only in the same breast as the unilateral examination were counted as a cancer diagnosis. If a cancer was diagnosed in the opposite breast, it was not counted as a cancer diagnosis for the calculation of sensitivity, and instead was included as a negative examination in the calculation of specificity. We calculated 95% CIs for sensitivity and specificity using the robust variance estimates from generalized estimating equations (GEE) with an exchangeable correlation structure to account for correlation within radiologists who interpreted more than one short-interval follow-up mammogram [22].
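The classification and laterality-matching rules described above can be sketched in code. The example below is a minimal Python illustration under stated assumptions, not the authors' analysis code (the study used Stata with GEE): the `Exam` record, its field names, the toy data, and the 183-day cutoff used for "6 months" are all hypothetical; only the 365-day window and the BI-RADS rules come from the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Exam:
    birads: int                    # final BI-RADS assessment, 0-5
    biopsy_recommended: bool       # recommendation after all workup
    exam_side: str                 # "left", "right", or "bilateral"
    cancer_side: Optional[str]     # breast in which cancer was diagnosed, if any
    days_to_cancer: Optional[int]  # days from examination to diagnosis, if any


def is_positive(exam: Exam) -> bool:
    """Positive: BI-RADS 4 or 5, or BI-RADS 0 with a biopsy recommendation."""
    return exam.birads in (4, 5) or (exam.birads == 0 and exam.biopsy_recommended)


def counts_as_cancer(exam: Exam, window_days: int) -> bool:
    """Cancer counts only inside the window and, for unilateral exams, same side."""
    if exam.cancer_side is None or exam.days_to_cancer is None:
        return False
    if exam.days_to_cancer > window_days:
        return False
    return exam.exam_side in ("bilateral", exam.cancer_side)


def sensitivity_specificity(exams, window_days):
    """Return (sensitivity, specificity); assumes both outcome groups are non-empty."""
    tp = fn = tn = fp = 0
    for exam in exams:
        positive = is_positive(exam)
        if counts_as_cancer(exam, window_days):
            tp, fn = tp + positive, fn + (not positive)
        else:
            fp, tn = fp + positive, tn + (not positive)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical toy data, for illustration only.
TOY_EXAMS = [
    Exam(4, True, "left", "left", 40),          # positive, same-side cancer
    Exam(3, False, "left", "right", 100),       # opposite-breast cancer: not counted
    Exam(2, False, "bilateral", "right", 300),  # cancer after 6 but within 12 months
    Exam(0, True, "bilateral", None, None),     # positive, no cancer
    Exam(1, False, "right", None, None),        # negative, no cancer
]

print(sensitivity_specificity(TOY_EXAMS, 365))  # 12-month window
print(sensitivity_specificity(TOY_EXAMS, 183))  # assumed 6-month window
```

In the toy data, the third examination is a false negative under the 12-month window but a true negative under the 6-month window, and the second examination is treated as a negative examination without cancer because the cancer was in the opposite breast.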
Using logistic regression fit with GEE to account for correlation within
radiologists, we modeled the odds of a positive examination given a cancer
diagnosis (sensitivity) and the odds of a negative examination given no cancer
diagnosis (specificity). We conducted univariate analyses to determine which
covariates should be included in the multivariable models; those that were
statistically significant (p < 0.05) in one or more models (either
sensitivity or specificity) were included. The final models for the patient
characteristics were adjusted for patient age (continuous), menopausal status
and hormone therapy use (premenopausal, postmenopausal with no hormone
therapy, postmenopausal with hormone therapy), mammogram laterality (bilateral
vs unilateral), and breast density (categorized as BI-RADS density categories
3 and 4 [dense breasts] versus BI-RADS density categories 1 and 2 [fatty
breasts]). The final models for the radiologist characteristics were adjusted
for all patient characteristics just described and for self-reported
radiologist characteristics, including age group (35–44, 45–54, ≥ 55 years
old), academic affiliation (primary appointment vs affiliate or no
appointment), hours per week spent in breast imaging (< 10 vs ≥ 10), number
of mammograms interpreted in 2001 (≤ 1,000, 1,001–2,000, ≥ 2,001), percentage
of mammograms interpreted that were diagnostic (< 25% vs 25–100%), and
full-time work status (yes vs no). We conducted all statistical analyses
using Stata (StataCorp).
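Odds ratios from a logistic model are conventionally reported by exponentiating a coefficient and the ends of its confidence interval on the log-odds scale. The sketch below illustrates that conversion in Python; it is not the study's code. The function names are hypothetical, and the standard error in the usage example is back-calculated from a published interval (the unilateral vs bilateral odds ratio reported in this article) purely for illustration; the study's own standard errors came from GEE robust variance estimates.

```python
import math


def odds_ratio_ci(beta, se, z=1.96):
    """Convert a log-odds coefficient and its SE to (OR, lower, upper)."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def se_from_reported_ci(lower, upper, z=1.96):
    """Back-calculate the log-scale SE implied by a reported 95% CI."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


# Illustration: the reported OR of 1.56 (95% CI, 1.06-2.29).
implied_se = se_from_reported_ci(1.06, 2.29)
odds_ratio, ci_lower, ci_upper = odds_ratio_ci(math.log(1.56), implied_se)
print(f"OR {odds_ratio:.2f} (95% CI, {ci_lower:.2f}-{ci_upper:.2f})")
```

Because the interval is symmetric on the log scale, the point estimate sits near the geometric mean of the two limits rather than their arithmetic midpoint.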
The 12-month sensitivity of all short-interval follow-up examinations was 60.5% (95% CI, 56.2–64.7%); at 6 months this increased to 83.3% (79.4–87.3%). We noted trends in accuracy of the short-interval follow-up examinations by patient characteristics; however, none was statistically significant in unadjusted models. Crude sensitivity increased with patient age and among postmenopausal women not receiving hormone therapy. Sensitivity was also higher among unilateral short-interval follow-up examinations compared with bilateral examinations. The sensitivity was lower for women with extremely dense breasts and whose examination before the short-interval follow-up mammogram was obtained to evaluate a specific breast problem. The average specificity was 97.3% (97.0–97.6%) at 12 months and was unchanged when calculated at 6 months; specificity increased only among women with almost entirely fatty breasts (98.4%; 95% CI, 97.8–99.0%).
Adjusted odds ratios (ORs) for sensitivity and specificity by patient characteristics are shown in Table 2. Because trends in unadjusted sensitivity and specificity at 6 months were similar to those at 12 months, and the number of cancers at 6 months was small, we presented adjusted rates for 12 months only. Although sensitivity increased among postmenopausal women who did not use hormone therapy (compared with premenopausal women), confidence intervals were wide. Women who had unilateral mammograms had significantly increased sensitivity compared with those with bilateral examinations (OR, 1.56 [95% CI, 1.06–2.29]). Postmenopausal women (regardless of hormone therapy use) and women with breast symptoms had slightly lower specificity (and thus more false-positive examinations) than premenopausal women and women without symptoms, respectively. Women with dense breasts had significantly lower specificity than women with fatty breasts (OR, 0.78 [0.69–0.97]).
Radiologist Characteristics
We present interpretive performance by radiologists' characteristics in
Table 3. The overall rates of
abnormal interpretation, cancer diagnoses, sensitivity, and specificity for
this subgroup of short-interval follow-up examinations were similar to those
in the larger patient population obtaining these examinations described in
Table 1. We noted trends in
which unadjusted sensitivities increased among radiologists who had less than
10 years of experience interpreting mammograms (compared with ≥ 10 years), spent
10 or more hours per week in breast imaging (compared with < 10 hours), or
were female (compared with male).
A few radiologist characteristics were associated with sensitivity and specificity (Table 4). Radiologists who spent 10 or more hours per week in breast imaging had increased sensitivity and specificity compared with those who spent less than 10 hours per week (sensitivity OR, 3.25 [1.00–10.52]; specificity OR, 1.35 [0.88–2.07]). Radiologists with an affiliate or no academic appointment had increased specificity compared with those with a primary academic appointment (OR, 2.51 [1.20–5.22]).
80%) [1, 14]. Few patient or radiologist characteristics were statistically significantly associated with sensitivity. To our knowledge, this is the first article to describe the accuracy of initial short-interval follow-up mammograms and to evaluate the accuracy by patient and radiologist characteristics. Sickles et al. [1, 13] showed a similarly low sensitivity at 12 months (for all short-interval follow-up examinations) in results that were posted on the public BCSC Website; these results were not published or discussed in the original articles. Our study population was also from the BCSC, but included one new study site and additional years of data beyond those included in Sickles' articles. We also examined the influence of both patient and radiologist characteristics on sensitivity and specificity. These differences, together with our matching of the laterality of cancer diagnoses and examinations, might have accounted for the slightly higher sensitivity noted in our results (60.5% vs 55.8% in Sickles' earlier article [1, 13]). Other previous studies have evaluated the accuracy of the examination producing the initial short-interval follow-up recommendation (i.e., the examination given the probably benign assessment), but these studies did not evaluate the sensitivity or specificity of the initial short-interval follow-up examinations themselves [4, 7].
The reason for the low 12-month sensitivity of initial short-interval follow-up examinations is unclear, but there are several possible explanations. First, cancers assessed as probably benign (BI-RADS category 3) may not grow as rapidly as cancers that appear more suspicious for malignancy (BI-RADS category 4 or 5). Therefore, it may be more difficult to identify interval change (hence, recommend biopsy) at initial short-interval follow-up examinations because these examinations usually are performed 6 months rather than 1 year after the index mammogram that prompted the short-interval follow-up. The rationale for recommending an initial short-interval follow-up examination is to identify 6 months earlier those poorer-prognosis "probably benign" cancers that do grow sufficiently rapidly to be detected early [10–12].
A second possible reason for the low sensitivity of initial short-interval follow-up examinations is that radiologists interpreting them might be reassured by the previous radiologist's probably benign interpretation and thus have a higher threshold for calling the initial examination suspicious compared with other diagnostic examinations. This would be false reassurance if it is resulting in low sensitivity for diagnosing breast cancer. It would be interesting to evaluate short-interval follow-up sensitivity among radiologists who interpreted both the examination that resulted in a probably benign assessment and the initial short-interval follow-up examination; however, we were unable to do this in our study.
A third possible reason for the low sensitivity of initial short-interval
follow-up examinations may be that the standard BI-RADS definition to use a
12-month follow-up period for evaluating sensitivity and specificity does not
match the 6-month follow-up period recommended after an initial short-interval
follow-up examination. Our data showed that using a 6-month follow-up interval
for the definition of sensitivity increased the unadjusted sensitivity of
short-interval follow-up examinations to 83%, which is similar to the 12-month
sensitivity for other diagnostic examinations (80%) [14]. However, a
trade-off in defining the follow-up interval for sensitivity always exists:
if the follow-up interval is shortened, sensitivity increases, because fewer
false-negative examinations appear as interval cancers during this
shorter time. It has been recommended that the follow-up interval for defining
sensitivity and specificity of a screening or diagnostic test should match the
follow-up interval recommended for that test, so long as that is what occurs
in clinical practice [23].
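The trade-off described above can be made concrete with rough arithmetic on the figures reported in this article. The sketch below, in Python, back-calculates approximate counts from the rounded rates per 1,000 women and the rounded sensitivities; these implied counts are approximations derived here for illustration and are not figures reported by the study. The function name is hypothetical.

```python
N_EXAMS = 45_007  # short-interval follow-up mammograms in the analysis


def implied_counts(rate_per_1000, sensitivity, n_exams=N_EXAMS):
    """Approximate (cancers, true positives, false negatives) from rounded rates."""
    cancers = round(n_exams * rate_per_1000 / 1000)
    true_positives = round(cancers * sensitivity)
    return cancers, true_positives, cancers - true_positives


six_month = implied_counts(8.0, 0.833)      # 8.0 per 1,000; sensitivity 83.3%
twelve_month = implied_counts(11.3, 0.605)  # 11.3 per 1,000; sensitivity 60.5%

print("6 months :", six_month)
print("12 months:", twelve_month)
```

Under these approximations, the number of cancers detected at a positive examination is nearly the same for both intervals (about 300 vs about 308), whereas the number of cancers following a negative examination rises from about 60 to about 200. Most of the difference in sensitivity between the two intervals therefore comes from the denominator, consistent with the explanation above.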
Future research should evaluate the specific follow-up intervals that
radiologists are actually recommending after initial short-interval follow-up
examinations and whether women are complying with those recommendations.
Our study had several limitations. We were unable to retrospectively evaluate whether cancers did or did not show interval progression on short-interval follow-up mammograms, which might have helped to determine more effective thresholds for recommending biopsy rather than continued surveillance. In addition, we were unable to evaluate performance of initial short-interval follow-up examinations by the types of lesions requiring follow-up (e.g., mass, focal asymmetry, calcifications) or by the size and stage of the cancers that were diagnosed. These analyses were beyond the scope of our project.
Although our sample included 130 radiologists from three geographic areas in the United States, we had limited ability to evaluate the importance of individual radiologist characteristics because of a small sample size in some categories. However, all sensitivity and specificity analyses were based on multiple short-interval follow-up mammograms interpreted by the radiologists, increasing the statistical power of these calculations. Despite the large size of our cohort, we were also limited by the small number of women with breast cancer used to evaluate sensitivity, especially within the 6-month follow-up period. Overall, the rate of cancer diagnoses in this study was lower than has been reported for other short-interval follow-up studies. This may be because a large proportion of short-interval follow-up examinations in our study directly followed screening mammograms, which have a lower cancer rate compared with other diagnostic examinations. Given the low cancer rate and differences between this and previous studies, we caution the reader in interpreting our results.
Our study had several unique strengths. We were able to evaluate the interpretive performance of short-interval follow-up examinations in a large, geographically diverse population. The large population size allowed us to make several exclusions (such as women with a history of breast cancer or women without outcome information) and to analyze a population of women eligible for short-interval follow-up mammograms. Had we not made these exclusions, we likely would have increased the variability in our sensitivity and specificity estimates, thus decreasing the ease of interpretation of our results. Our population included unique detailed information on patient and radiologist characteristics available from linking several study databases. We also had the ability to link to cancer registry data, which enabled us to calculate sensitivity and specificity.
In conclusion, the sensitivity of diagnostic mammograms obtained as initial short-interval follow-up examinations is low when using the standard 12-month auditing definition for follow-up period. The reasons for this low sensitivity should be elucidated. We noted increases in sensitivity among women who underwent unilateral short-interval follow-up examinations and among radiologists who spent 10 or more hours per week in breast imaging; but overall, few patient or radiologist characteristics were associated with accuracy. The value of using a 6-month (rather than a 12-month) follow-up period for defining sensitivity also should be examined in future studies.
Acknowledgments
We thank the BCSC investigators, participating mammography facilities, and
radiologists for the data they provided for this study. A list of the BCSC
investigators and procedures for requesting BCSC data for research purposes
are provided at
http://breastscreening.cancer.gov/.