1 Department of Radiology, Fletcher Allen Health Care, University of Vermont
College of Medicine, UHC Campus, Burlington, VT 05401.
2 The Office of Health Promotion Research, University of Vermont, Burlington, VT
05401.
Received September 6, 2001;
accepted after revision October 1, 2002.
Supported by cooperative agreement U01CA70013 from the National Cancer
Institute.
Abstract
MATERIALS AND METHODS. Two radiologists independently double-interpreted 25,369 screening mammograms performed from November 1998 to April 2000. The second reviewer could add but could not delete recalls. The subsequent additional diagnostic imaging was performed in the same way whether generated from the first or the second reviewer. The outcome of each case was determined. The cancer detection rate and sensitivity are reported.
RESULTS. Double interpretation of screening mammograms detected 143 breast malignancies. The second reviewer found nine (6.3%) of 143 cancers and all except one were stage 0 or I. The sensitivity increased from 74.4% to 79.4% with double interpretation. The second reviewer contributed 371 of the 3591 total recalls, increasing the absolute rate of recalls by 1.5% (371/25,369) and the relative rate by 11.5% (371/3220). Six hundred seventy-two total biopsies were performed; 38 were generated by the second interpretation.
CONCLUSION. The relative increase in cancer detection as a result of the second reviewer is 6.3%, similar to the 5-15% reported in the literature. All but one of the nine additional cancers detected were in the early stages.
One proven method of increasing breast cancer detection is to have two radiologists interpret the screening mammogram [4, 5, 6, 7, 8, 9, 10, 11].
This method, known as double interpretation, is commonly used in other countries. Double interpretation has been performed in one of two ways: Mammograms are interpreted by each radiologist independently without discussion of the findings; or mammograms are interpreted in consensus, in which recall occurs only with agreement of the radiologists involved. Published independent double-interpretation studies describe an increase in cancer detection and recall rates [4, 5, 6, 7, 11]. The primary goal of this study was to examine to what extent cancer detection is improved when independent double interpretation of screening mammography is used in the clinical setting in an academic mammography practice in the United States.
The Vermont Breast Cancer Surveillance System [13], a member of the National Cancer Institute Breast Cancer Surveillance Consortium [14], provided mammographic and pathologic data for this study. The data in this study include total number of screening mammograms, the number of recalls, and biopsy results matched to mammographic assessments. Mammographic interpretations were matched with pathology outcomes within 365 days to determine true- and false-positive and negative findings on mammograms. The Vermont Breast Cancer Surveillance System also provided the data on each radiologist regarding volumes interpreted and recall rates. Descriptive data are provided.
BI-RADS mammographic assessments of categories 4 (suspicious abnormality), 5 (highly suggestive of malignancy), and 0 (unresolved) were considered positive, and categories 1, 2, and 3 (negative, benign, and probably benign) were considered negative. All ductal carcinoma in situ and invasive cancers were included in this study as cancers. Lobular carcinoma in situ was not included as a cancer.
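The classification rules described above can be summarized in a short Python sketch. The function names and the `cancer_within_365_days` flag are illustrative assumptions and are not taken from the surveillance system's software.

```python
# BI-RADS assessment categories treated as positive in this study.
POSITIVE_CATEGORIES = {0, 4, 5}
# BI-RADS assessment categories treated as negative in this study.
NEGATIVE_CATEGORIES = {1, 2, 3}


def is_positive(birads_category: int) -> bool:
    """Return True when the assessment counts as a positive mammogram."""
    if birads_category in POSITIVE_CATEGORIES:
        return True
    if birads_category in NEGATIVE_CATEGORIES:
        return False
    raise ValueError(f"unexpected BI-RADS category: {birads_category}")


def classify_outcome(birads_category: int, cancer_within_365_days: bool) -> str:
    """Label one screening examination as TP, FP, FN, or TN.

    cancer_within_365_days is a hypothetical flag: True when a ductal
    carcinoma in situ or invasive cancer was matched to the examination
    within 365 days (lobular carcinoma in situ does not count).
    """
    if is_positive(birads_category):
        return "TP" if cancer_within_365_days else "FP"
    return "FN" if cancer_within_365_days else "TN"
```

For example, under these assumptions a category 3 assessment followed by a cancer diagnosis within the year is labeled a false-negative.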
Double interpretation consisted of two radiologists independently interpreting the screening mammograms. The first radiologist marked the findings on the mammogram if a recall for additional imaging was recommended. The second radiologist, or double-reviewer, would then reinterpret the mammograms. The second reviewer looked at all images and could add but not delete recalls for additional imaging. The subsequent additional diagnostic imaging and any biopsies that followed were performed in a similar way, whether generated from the first or from the second reviewer. All discrepancies between the first and second reviewers were recorded and entered into an Access 2000 database (Microsoft, Redmond, WA) after appropriate institutional review board approvals were granted. The radiology and pathology reports of all the recalls generated by the second reviewer were reviewed to determine the outcome of imaging and of any biopsies performed. The details of this review were recorded in the Access database, which was queried using standard methods.
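The recall rule for independent double interpretation can be expressed as a simple logical combination of the two readings. The sketch below is only an illustration with hypothetical function names; it also shows the consensus variant described in the introduction for contrast.

```python
def independent_recall(first_recalls: bool, second_recalls: bool) -> bool:
    """Independent double interpretation as used in this study.

    The second reviewer may add a recall but cannot delete one, so the
    patient is recalled when either reviewer recommends it.
    """
    return first_recalls or second_recalls


def consensus_recall(first_recalls: bool, second_recalls: bool) -> bool:
    """Consensus double interpretation: recall only when both reviewers agree."""
    return first_recalls and second_recalls


# Hypothetical readings for four examinations: (first reviewer, second reviewer).
readings = [(True, False), (False, True), (True, True), (False, False)]
independent_total = sum(independent_recall(a, b) for a, b in readings)
consensus_total = sum(consensus_recall(a, b) for a, b in readings)
```

With the same readings, the independent rule can never produce fewer recalls than the first reviewer alone, whereas the consensus rule can.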
Statistical evaluation was performed using chi-square tests. A p value of less than 0.05 was considered significant.
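As an illustration of this kind of test, the following Python sketch computes a Pearson chi-square statistic and p value for a 2 x 2 table using only the standard library. The article does not state which software was used or whether a continuity correction was applied, so the uncorrected form is an assumption, and the counts in the example are hypothetical.

```python
import math


def chi_square_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Pearson chi-square test (no continuity correction) for [[a, b], [c, d]].

    Returns (statistic, p_value) for 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column total must be nonzero")
    statistic = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom, the upper-tail probability of the
    # chi-square distribution equals erfc(sqrt(statistic / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Hypothetical counts (recalled, not recalled) for two readers.
statistic, p_value = chi_square_2x2(10, 20, 20, 10)
significant = p_value < 0.05  # threshold used in this study
```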
Seven radiologists interpreted screening mammograms. Two of the seven radiologists specialize in breast imaging (one of the specialists in breast imaging works part-time). The remainder spent 10-50% of their time in breast imaging. The range of years of interpreting mammography was 3-18 years. The radiologist who had interpreted screening mammograms for 3 years was the only radiologist who was fellowship-trained in breast imaging. All radiologists met the MQSA requirements for the number of mammograms interpreted and continuing medical education. The number of screening mammograms interpreted by each radiologist and the recall rates for each individual are presented in Table 1. Radiologists C and D differed significantly from each other and from all other radiologists (p < 0.0001).
The demographics and screening histories of the women are shown in Table 2. The groups of women recalled differed significantly from those with normal screening results in regard to breast density, prevalence or incidence screening, and age. Recalls were more frequent with denser breasts, younger women, and prevalence screening. Recalls did not differ in family history. Table 2 reports the recall by incidence versus prevalence screening. Women for whom this was a first mammogram were recalled more frequently than women returning for mammography. The groups of women recalled by the first and second reviewers did not significantly differ.
Of the 672 biopsies, the first interpretation generated 94.3% (634) and the second interpretation, 5.7% (38). Two additional biopsies were performed because the patient or physician requested biopsy for BI-RADS category 3 interpretations. These two biopsies yielded benign results. Of the 38 biopsies generated by the second interpretation, nine (23.7%) were malignant. An 85-year-old patient refused a recommended biopsy for calcifications, and another case remained unresolved as a BI-RADS category 0 because the patient was lost to follow-up. The latter patient later returned to the system, and her subsequent mammogram was a BI-RADS category 1 examination.
Of the nine cancers detected by the second reviewer, four were ductal carcinoma in situ (Figs. 1A, 1B, 1C and 1D) and five were invasive ductal carcinoma; four of these five were associated with ductal carcinoma in situ. Four cancers were pathologic stage I and one was stage IIB cancer. The second reviewer detected a 7-mm invasive cancer in the stage IIB patient. However, at mastectomy this lesion was a 4-cm invasive cancer, most of which was occult on imaging (Figs. 2A, 2B, 2C, 2D and 2E). This was the only cancer detected by the second reviewer that was axillary node positive for metastatic disease (Table 3).
The second reviewer also added two cancers that were detected at biopsy 6 months after the screening examination. In these two cases, the initial additional imaging was reported as BI-RADS category 3; changes on the imaging were noted at 6 months and biopsy was recommended and performed. These two cancers are considered false-negatives in the sensitivity calculations.
A comparison of the cancers detected by the first and second reviewers is presented in Table 4. The numbers do not differ significantly; however, the number of the cancers detected by the second reviewer is small, and the differences would have to be great to achieve statistical significance.
To assess the effects of double interpretation, we reviewed the outcome data from our group. Our cancer detection rate with double interpretation was 5.6 per 1000 screening mammograms and without double interpretation was 5.3 per 1000 screening mammograms in the study population. This is in the expected range as reported by Linver et al. [15]. Our mammographic studies are a combination of incidence and prevalence screening examinations; the proportions can be seen in Table 2.
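The detection rates quoted above follow directly from the counts reported in this article (134 cancers detected by the first reviewer, nine more by the second reviewer, and 25,369 screening mammograms). The Python sketch below reproduces the arithmetic; the helper name `rate_per_1000` is an illustrative assumption.

```python
SCREENING_MAMMOGRAMS = 25_369
CANCERS_FIRST_REVIEWER = 134
CANCERS_SECOND_REVIEWER = 9


def rate_per_1000(events: int, examinations: int) -> float:
    """Events per 1000 screening examinations."""
    return 1000 * events / examinations


# Detection rate if only the first interpretation had been performed.
single_rate = rate_per_1000(CANCERS_FIRST_REVIEWER, SCREENING_MAMMOGRAMS)

# Detection rate with the second interpretation added.
double_rate = rate_per_1000(
    CANCERS_FIRST_REVIEWER + CANCERS_SECOND_REVIEWER, SCREENING_MAMMOGRAMS
)

print(f"single: {single_rate:.1f} per 1000, double: {double_rate:.1f} per 1000")
```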
We calculated false-negative rates as follows. A total of 143 cancers were detected on mammograms, 134 by the first reviewer and nine by the second reviewer. We found a total of 37 false-negative examinations: 26 cancers linked to screening mammograms with no recall (BI-RADS category 1, 2, or 3) and 11 cancers linked to category 1, 2, or 3 after additional images were obtained. Therefore, false-negative results occurred in 37 of 25,369 examinations, or 1.5 per 1000 screening mammograms. False-negative results were encountered in 37 (20.6%) of 180 proven cancers.
Sensitivity was calculated in two ways. First, we calculated the sensitivity using mammograms with positive results at final assessment after the workup was completed (assessment categories 4 and 5). With this method, the sensitivity is 134 (74.4%) of 180 for the first reviewer and increased to 143 (79.4%) of 180 with the addition of the second reviewer. Second, we calculated sensitivity using assessment category 0 as a mammogram with positive findings (as defined by the BI-RADS outcome audit). If this method is used, the sensitivity remained at 143 (79.4%) of 180 for the first reviewer and increased to 154 (85.6%) of 180 with the addition of the second reviewer.
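The two sensitivity calculations can be checked with a few lines of Python. This is a sketch of the arithmetic only, using the counts as reported in the text; the variable names are illustrative assumptions.

```python
# Counts reported in this article.
PROVEN_CANCERS = 180          # 143 detected on mammograms + 37 false-negatives
SCREENING_MAMMOGRAMS = 25_369
FALSE_NEGATIVES = 37


def percent(numerator: int, denominator: int) -> float:
    """Express a proportion as a percentage."""
    return 100 * numerator / denominator


# Method 1: only final assessments of category 4 or 5 count as positive.
method1_first = percent(134, PROVEN_CANCERS)
method1_double = percent(143, PROVEN_CANCERS)

# Method 2: assessment category 0 also counts as positive
# (numerators as reported in the text).
method2_first = percent(143, PROVEN_CANCERS)
method2_double = percent(154, PROVEN_CANCERS)

# False-negative examinations per 1000 screening mammograms.
false_negatives_per_1000 = 1000 * FALSE_NEGATIVES / SCREENING_MAMMOGRAMS
```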
Table 3 shows the radiologists who missed cancers as first reviewers and radiologists who detected cancers as second reviewers. The numbers are small, and no definite trends were noted with regard to years of practice, number of screening examinations interpreted, or recall rates. One pattern noted is that one radiologist overlooked three of the four ductal carcinoma in situ cases. This suggests one radiologist overlooks calcifications more commonly than the others do.
The 6.3% relative increase in cancer detection (as a result of the second reviewer) found in this study is similar to the findings of other reports in the literature, which describe an increase in cancer detection of 5-15% [4, 5, 6, 7, 8, 9, 10, 11]. Sensitivity increased (from 74.4% to 79.4%) with the double interpretation, as others have previously reported [5, 7] and as we expected. Warren and Duffy [6] reported a 3% (from 6.9% to 10%) increase in absolute recall rates and a 45% (3.1%/6.9%) relative increase in recall rates with independent double interpretation. The 1.5% absolute increase in recall rates and the 11.5% relative increase in recall rates in our study were lower than those in that report.
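The absolute and relative increases in recall rate in our study can be reproduced from the counts given in the abstract (3591 total recalls, 371 of them added by the second reviewer, among 25,369 screening mammograms). The following Python sketch shows the arithmetic; the variable names are illustrative assumptions.

```python
SCREENING_MAMMOGRAMS = 25_369
TOTAL_RECALLS = 3_591
SECOND_REVIEWER_RECALLS = 371

# Recalls that the first reviewer generated alone.
first_reviewer_recalls = TOTAL_RECALLS - SECOND_REVIEWER_RECALLS

# Absolute increase: added recalls as a share of all screening mammograms.
absolute_increase = 100 * SECOND_REVIEWER_RECALLS / SCREENING_MAMMOGRAMS

# Relative increase: added recalls as a share of first-reviewer recalls.
relative_increase = 100 * SECOND_REVIEWER_RECALLS / first_reviewer_recalls

# Overall recall rate with double interpretation.
overall_recall_rate = 100 * TOTAL_RECALLS / SCREENING_MAMMOGRAMS

print(
    f"absolute +{absolute_increase:.1f}%, "
    f"relative +{relative_increase:.1f}%, "
    f"overall {overall_recall_rate:.1f}%"
)
```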
Although our results are similar to those of previous reports, several differences in the studies are noted. Other studies of double interpretation performed in the clinical setting were conducted in countries outside the United States [5, 6, 8, 9, 11]. In studies from Europe and Canada, screening mammography is performed differently. For example, the screening interval varies from 2 to 3 years; and in several studies, only one mammographic view (mediolateral oblique) per breast was obtained at screening. In the study by Anderson et al. [9] from Scotland, the screening examinations were oblique views alone obtained every 3 years for women older than 50 years, and the interpretation rate for the first reviewer was 200 mammograms per hour. In our study, screening was annual screening for women older than 40 years, with craniocaudal and mediolateral oblique views obtained of each breast. The rate of first interpretation was 30 mammograms per hour. Kopans [10] reports a similar rate for first interpretations at his institution in the United States.
Outside the United States, recall rates are generally lower than in the United States. Anttinen et al. [4] report in Finland a recall rate of 5.5% with independent double interpretation, Anderson et al. [9] report that in Scotland the national average recall rate is 7.7%, Thurfjell et al. [5] in Sweden report a 4.8% recall rate with independent double interpretation, and Warren and Duffy [6] in England report a 10% recall rate with independent double interpretation. All of these recall rates are lower than the 14.2% recall rate of our group.
Several factors may explain our high recall rate. Our computer system automatically links assessment to BI-RADS recommendation. Also, the Vermont Breast Cancer Surveillance System has requested that all recalls be recorded as assessment category 0. In our group, a case is rarely reported as a category 3, 4, or 5 from the screening board, which is in contrast to many groups that do not record category 0 cases as consistently. In some practices, radiologists report only a final assessment or record cases as category 3, 4, or 5 and at the same time recommend additional mammographic views or sonography. Taplin et al. [21] found, in a review of 292,795 women, that the screening assessments of BI-RADS category 3 were associated with the recommendation of additional imaging in 36.9% of cases; category 4 assessments, with recommended additional imaging in 38.7% of cases; and category 5 assessments, with recommended additional imaging in 6.6% of cases. At our institution, all of these recommendations would have been classified as category 0 mammograms. These data suggest that if category 0, the category that records recall rates, is used properly, recall rates would increase, which may explain, at least in part, why our recall rate is high. It is also likely that our recall rate is high because of our established practice pattern. Our range of recall rates was 11.9-17.2%, but five of the seven radiologists had similar recall rates. Finally, cost regulations and monitoring of recalls do not occur in the United States as they do in other countries with national funding and greater regulation.
Another factor that inevitably plays a role in practice patterns in the United States is the medicolegal environment that exists in this country. Berlin [22] points out that radiologists are the specialists most frequently sued in malpractice lawsuits that involve breast cancer, and multimillion-dollar settlements are commonplace. These facts cannot help but influence recall rates, short-interval follow-up, and biopsy recommendations; in short, the practice of breast imaging.
Because of our high recall rate, we have modified our method of double interpretation to include consensus in which two radiologists must agree to recall a patient for additional imaging and recalls can be deleted. Previous reports of this type of double interpretation show a slight loss of sensitivity, with up to a 45% decrease in recall rates [4]. We are currently collecting these data; and although we expect similar results, we will review the data to determine the effects on our practice and our recall rates.
Beam et al. [23] found that specific pairs of radiologists achieve optimal true-positive and false-positive rates and other pairs do not. This finding is consistent with the concept that each person has a position on the receiver operating characteristic curve and certain combinations of radiologists will complement one another and others will not. We did not assess this belief: the numbers were too small to analyze and would be of no practical value at our institution because our schedules are too complex to allow certain pairs of radiologists to perform double interpretations.
Other methods of double interpretation have been considered, including having a mammography technologist perform one of the interpretations [24, 25]. Another promising option is for computer-aided detection devices to serve as the second reviewer. In our group, 0.18 full-time-equivalent radiologists were used to perform the double interpretation, and the potential exists for computer-aided detection to do this task. Several studies show increases in cancer detection with computer-aided detection [26, 27]. Computer-aided detection also allows billing for the double interpretation, which cannot be done when a radiologist performs the second interpretation.
In conclusion, our study found a 6.3% relative increase in cancer detection with the use of independent double interpretation of screening mammograms. This increased sensitivity was expected. The cancers detected by the second reviewers were stage 0 or I with the exception of one case, and these findings did not differ from the 134 cancers detected by the first reviewers. These cancers were detected early, which can lead to lower mortality, lower cost, and the use of fewer resources [16, 17, 18, 19, 20]. Our group has modified the method of double interpretation to a consensus type of double interpretation with the goal of maintaining an increase in detection and achieving a lower recall rate. It is possible that computer-aided detection will be able to replace a second radiologist as the second reviewer in the future.
Acknowledgments
We thank Betsy Sussman, Sally Herschorn, and the late Linda Roe for their
invaluable help in data collection, and Pamela Vacek for assistance in data
analysis.