DOI:10.2214/AJR.07.2393
AJR 2007; 189:1135-1141
© American Roentgen Ray Society


Original Research

Blinded Comparison of Computer-Aided Detection with Human Second Reading in Screening Mammography

Dianne Georgian-Smith1, Richard H. Moore2, Elkan Halpern3, Eren D. Yeh1, Elizabeth A. Rafferty2, Helen Anne D'Alessandro2, Mary Staffa2, Deborah A. Hall2, Kathleen A. McCarthy2 and Daniel B. Kopans2

1 Department of Radiology, Breast Imaging, Brigham and Women's Hospital, 75 Francis St., Boston, MA 02115.
2 AVON Breast Center, Boston, MA.
3 Department of Radiology, Institute of Technology Assessment, Massachusetts General Hospital, Boston, MA.

Received April 10, 2007; accepted after revision July 6, 2007.

 
Address correspondence to D. Georgian-Smith (dgeorgiansmith{at}partners.org).

CME

This article is available for CME credit. See www.arrs.org for more information.


Abstract
OBJECTIVE. The purpose of this study was to compare a human second reader with computer-aided detection (CAD) for the reduction of false-negative cases by a primary radiologist. We retrospectively reviewed our clinical practice.

MATERIALS AND METHODS. A total of 6,381 consecutive screening mammograms were interpreted by a primary reader. This radiologist then reinterpreted the studies using CAD ("CAD reader"). A second human reader, blinded to the CAD results but aware of the primary reader's findings, reviewed the studies for abnormalities not seen by the primary reader.

RESULTS. The second human reader called back two cancers that the CAD reader did not; the CAD system had marked both findings, but the primary reader dismissed the marks. Because of the small numbers, the difference between the CAD and second human readers was not statistically significant. Relative to the primary reader, the CAD and human second readers increased the recall rates by 6.3% and 7.2%, respectively (p = 0.70), and 10% and 14.7% of their respective additional callbacks went to biopsy. The positive predictive value was 0% (0/3) for the CAD reader and 40% (2/5) for the human second reader. The relative increase in the cancer detection rate compared with the primary reader's rate was 0% for the CAD reader and 15.4% (2/13) for the human second reader (p = 0.50).

CONCLUSION. A human second reader or the use of a CAD system can increase the cancer detection rate, but we found no statistical difference between the two because of the small sample size. A possible benefit of a human second reader is that CAD systems can only point to possible abnormalities, whereas a human must determine the significance of a finding. Having two humans review a study may increase detection rates because of interpreter (and hence perceptual) variability, not just because more lesions are detected.

Keywords: breast cancer • computer-aided detection • mammography • mammography recall rates • screening mammography


Introduction
The true prevalence of breast cancer in the general population may be as high as 20 cases per 1,000 women [1], yet reported detection rates range from three to eight cases per 1,000 women [2], depending on the mix of prevalent and incident cancers in the screened population. There is therefore room to improve our ability to detect cancers. As Moskowitz [3] elegantly discussed, good clinical practice requires attaining a high sensitivity in breast cancer detection. Tools that help radiologists achieve this goal are therefore desirable.

One method shown to improve cancer detection in screening mammography is double reading by two radiologists. Previously, we reported that in approximately 6,000 screening mammograms, a second human reader detected 7.7% additional cancers that had been missed by the primary reader (Hulka C et al., presented at the 1994 annual meeting of the Radiological Society of North America [RSNA]). Although double reading requires more resources and delays interpretation, referring physicians and women preferred it to immediate online interpretation once they were informed of its benefits [4]. Other investigators have reported increases in detection rates of 5–15% as a result of double reading [5–7].

Another proven method to increase the sensitivity of screening mammography, in both community and academic practice, is the use of a CAD system. In a private practice, Freer and Ulissey [8] reported an almost 20% increase in cancer detection with the use of a CAD system. Birdwell et al. [9] reported an overall detection rate of 29 cancers in 8,000 cases (approximately four cases per 1,000 women); CAD prompted the detection of two of these cancers, increasing detection sensitivity by 7.4%. Most recently, in the largest study to date (more than 21,000 screening studies), Morton et al. [10] reported similar results: a 7.6% increase in the number of cancers detected with the addition of CAD.

Because both double reading and CAD have been independently shown to improve sensitivity for breast cancer detection, we retrospectively reviewed our clinical academic practice to compare CAD with a blinded human second reader for the detection of additional breast cancer not seen by a primary radiologist. The purpose of this study was to compare the practice of a human second reader with a CAD reader for the reduction of the number of false-negative cases resulting from review by a primary radiologist.


Materials and Methods
Patients
Between June 11, 2001, and April 9, 2003, 6,381 consecutive screening mammograms were obtained at one outpatient screening center using a film-screen unit (DMR, GE Healthcare). The four-view screening studies were digitized and analyzed by a CAD system (ImageChecker CAD system, R2 Technology).

Radiologists
Eight academic, dedicated breast imagers with an average of 14 years (range, 3–26 years) of experience independently reviewed the mammograms under three conditions. No single radiologist interpreted a dominant share of the mammograms: the mean percentage of cases per radiologist was 12% (range, 7–20%). The interpretations therefore represented a fair cross section of our academic practice, supporting the generalizability of our results.

Review Sequence
We prospectively designed the review sequence so that the human second reader was blinded to the results of the CAD, but the clinical care of the patient was determined by the results of both. Therefore, the clinical practice was based on these two additional reviews. The results of this study were determined from a retrospective review of the clinical practice. Institutional review board approval was obtained before the start of the study. This study was conducted before HIPAA regulations were in effect.

We set three review conditions as follows: first, the primary radiologist; second, the CAD reader who was the primary radiologist along with input from the CAD system; and, third, the second human reader, a different radiologist than the primary one, who reinterpreted the images without knowledge of the CAD results but with knowledge of the primary reader's results.

Study Design
The primary reader: The screening films were batch read. Residents and fellows in training were not permitted to review the films, so that the primary reader would not be influenced by another reader. The primary reader interpreted each film-screen mammogram and then locked his or her impression into a computer. Previously obtained films, preferably 2 or more years old, were available for comparison but were not analyzed by the CAD system. Prior mammograms were available for 78% (3,930/5,049) of the patients at the time of interpretation. When an abnormality warranted recalling the patient for diagnostic evaluation, the primary reader marked the area or areas of concern on the films with a wax marker. These markings showed the second human reader which areas the primary reader had already flagged.

The CAD reader: After recording an impression without CAD, the primary reader turned on the CAD system for the four current-year views and reinterpreted the mammograms with the assistance of the CAD markings. These impressions were entered into the computer in a separate file, and any new areas of concern were not marked on the films with a wax marker, so that the second human reader would not be alerted to the CAD markings.

The second human reader: After the primary reader completed the initial interpretation and the interpretation with CAD, a different radiologist quickly scanned each case for areas of suspicion that the primary reader had not detected. It was not the function of the second radiologist to judge the appropriateness of the primary reader's calls; therefore, the primary reader's markings were never reversed by the second human reader.

Callback Workflow
Patients were recalled for additional diagnostic workup based on findings from any of the three interpretations. All patients who were called back for additional workup came to the diagnostic breast imaging center at the hospital.
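
The recall rule above amounts to a logical OR over the three readings, with any recall the primary reader did not make credited to the CAD reading, the second read, or both. The sketch below illustrates that bookkeeping under stated assumptions: the record type and function names (`ScreeningReads`, `is_recalled`, `additional_sources`) are hypothetical and are not taken from the study's software, and the cases are toy data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScreeningReads:
    """Hypothetical record of the three readings of one screening study.

    Each flag is True when that reading called the patient back.
    """

    primary: bool        # primary radiologist, before CAD
    cad_reader: bool     # same radiologist after reviewing the CAD marks
    second_reader: bool  # different radiologist, blinded to the CAD marks


def is_recalled(reads: ScreeningReads) -> bool:
    # Any one of the three interpretations is enough to recall the patient.
    return reads.primary or reads.cad_reader or reads.second_reader


def additional_sources(reads: ScreeningReads) -> frozenset:
    """Readings responsible for a recall that the primary reader did not make."""
    if reads.primary:
        return frozenset()  # already recalled by the primary reader
    sources = []
    if reads.cad_reader:
        sources.append("cad")
    if reads.second_reader:
        sources.append("second")
    return frozenset(sources)


# Toy cases for illustration only (not study data).
cases = [
    ScreeningReads(primary=True, cad_reader=True, second_reader=False),
    ScreeningReads(primary=False, cad_reader=True, second_reader=False),
    ScreeningReads(primary=False, cad_reader=False, second_reader=True),
    ScreeningReads(primary=False, cad_reader=True, second_reader=True),
    ScreeningReads(primary=False, cad_reader=False, second_reader=False),
]
recalled = [is_recalled(c) for c in cases]
extra = [additional_sources(c) for c in cases]
cad_added = sum("cad" in e for e in extra)          # additional CAD-reader callbacks
second_added = sum("second" in e for e in extra)    # additional second-reader callbacks
both = sum(e == {"cad", "second"} for e in extra)   # called back by both readings
```

Applied to a full case list, the three tallies would correspond to the additional CAD-reader callbacks, the additional second-reader callbacks, and their overlap reported in the Results.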

Pathology and Follow-Up
The histologic results for all of the cases recommended for biopsy were obtained. Malignant cases were defined as those with ductal carcinoma in situ, invasive ductal carcinoma, or invasive lobular carcinoma at either core biopsy or surgical biopsy. All of the screening patients were closely tracked for at least 12 months to identify potential false-negative cases.

Statistical Analysis

  1. The recall rate for each reader (primary, CAD, and second human) was calculated as the number of cases called back by that reader divided by the total number of screening cases.
    1. The relative increase in recall rate by the CAD reader was the number of additional callbacks by the CAD reader divided by the number of callbacks by the primary reader.
    2. The relative increase in recall rate by the second human reader was the number of additional callbacks by this reader divided by the number of callbacks by the primary reader.

  2. The biopsy rate for each reader was the number of patients who underwent a core or surgical biopsy (counted once if the patient underwent both procedures) divided by the number of patients called back by that reader.
  3. The positive predictive value (PPV) for each reader was the number of malignant histologic results divided by the number of biopsies (true-positive plus false-positive cases) generated by that reader.
  4. The cancer detection rate was the number of malignancies divided by the number of screening patients.
    1. The relative increase in cancer detection rate contributed by the CAD reader was the number of malignancies detected by the CAD reader divided by the number detected by the primary reader.
    2. The relative increase in cancer detection rate contributed by the second human reader was the number of malignancies detected by the second human reader divided by the number detected by the primary reader.

  5. False-negative cases were defined as mammograms interpreted as negative in patients in whom breast cancer developed within 12 months.
    1. The false-negative rate for the primary reader was the number of cases interpreted as negative in patients whose breast cancer was detected by the CAD reader, the second human reader, or both, or developed within 12 months, divided by the total number of screening cases.
    2. The false-negative rate for the CAD reader was the number of cases interpreted as negative in patients whose breast cancer was detected by the second human reader or developed within 12 months, divided by the total number of screening cases.
    3. The false-negative rate for the second human reader was the number of cases interpreted as negative in patients whose breast cancer was detected by the CAD reader or developed within 12 months, divided by the total number of screening cases.

  6. False-positive cases were defined as patients recommended for biopsy whose pathology results were benign.
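
The ratios defined above can be written as a few small functions. This is a minimal sketch: the container and function names are illustrative, and the counts plugged in are taken from the Results (6,381 screens; primary reader 475 callbacks, 70 biopsies, 13 cancers; CAD reader 30 additional callbacks, 3 biopsies, 0 cancers; second human reader 34, 5, and 2).

```python
from dataclasses import dataclass

N_SCREENED = 6381  # consecutive screening mammograms in the study


@dataclass(frozen=True)
class ReaderCounts:
    """Counts attributed to one reader (hypothetical container)."""

    callbacks: int  # cases called back; for CAD/second reader, the additional ones
    biopsies: int   # patients biopsied (core and/or surgical counted once)
    cancers: int    # biopsy-proven malignancies


def recall_rate(c: ReaderCounts, n_screened: int = N_SCREENED) -> float:
    # Callbacks divided by all screening cases.
    return c.callbacks / n_screened


def biopsy_rate(c: ReaderCounts) -> float:
    # Share of this reader's callbacks that went on to biopsy.
    return c.biopsies / c.callbacks


def ppv(c: ReaderCounts) -> float:
    # Malignancies per biopsy: true positives / (true + false positives).
    return c.cancers / c.biopsies if c.biopsies else float("nan")


def detection_per_1000(cancers: int, n_screened: int = N_SCREENED) -> float:
    return 1000 * cancers / n_screened


def relative_increase(additional: int, primary: int) -> float:
    # Extra callbacks (or cancers) relative to the primary reader's count.
    return additional / primary


primary = ReaderCounts(callbacks=475, biopsies=70, cancers=13)
cad = ReaderCounts(callbacks=30, biopsies=3, cancers=0)
second = ReaderCounts(callbacks=34, biopsies=5, cancers=2)

summary = {
    "primary_recall_pct": round(100 * recall_rate(primary), 1),
    "primary_biopsy_pct": round(100 * biopsy_rate(primary), 1),
    "primary_ppv_pct": round(100 * ppv(primary), 1),
    "primary_detect_per_1000": round(detection_per_1000(primary.cancers), 2),
    "cad_recall_increase_pct": round(100 * relative_increase(cad.callbacks, primary.callbacks), 1),
    "second_recall_increase_pct": round(100 * relative_increase(second.callbacks, primary.callbacks), 1),
    "cad_biopsy_pct": round(100 * biopsy_rate(cad), 1),
    "second_biopsy_pct": round(100 * biopsy_rate(second), 1),
    "second_ppv_pct": round(100 * ppv(second), 1),
    "second_detect_increase_pct": round(100 * relative_increase(second.cancers, primary.cancers), 1),
}
for name, value in summary.items():
    print(f"{name}: {value}")
```

Keeping the "additional" callbacks separate from the primary reader's callbacks makes explicit that the relative increases are ratios to the primary reader's counts, not to the screened population.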

Statistical significance was determined using a two-sided McNemar test to compare the performance of the CAD reader to the second human reader. There were no cases of multifocal or multicentric malignancies.
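
Because every mammogram was read by both the CAD reader and the second human reader, their calls are paired, and only the discordant cases (flagged by one reader but not the other) enter the McNemar test. The text does not say which form of the test was used, so the sketch below implements both the exact (binomial) form and the continuity-corrected chi-square form. The discordant counts are our derivation from the Results, not values reported by the authors: 29 CAD-only versus 33 second-reader-only callbacks (30 and 34 minus the one case both called back), and 0 versus 2 cancers.

```python
from math import comb, erfc, sqrt


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value from discordant counts b and c.

    Under H0 each discordant case is equally likely to favor either
    reader, so min(b, c) follows Binomial(b + c, 0.5).
    """
    n = b + c
    if n == 0:
        return 1.0
    k = min(b, c)
    lower_tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * lower_tail)


def mcnemar_chi2_cc(b: int, c: int) -> float:
    """Continuity-corrected McNemar chi-square p-value (1 degree of freedom)."""
    if b + c == 0:
        return 1.0
    stat = max(0, abs(b - c) - 1) ** 2 / (b + c)
    # Survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return erfc(sqrt(stat / 2))


# Derived discordant counts (assumption; see lead-in).
p_recall_exact = mcnemar_exact(29, 33)
p_recall_chi2 = mcnemar_chi2_cc(29, 33)
p_cancer_exact = mcnemar_exact(0, 2)
p_cancer_chi2 = mcnemar_chi2_cc(0, 2)
```

With these counts, the recall comparison gives p of about 0.70 under either form, and the exact form gives p = 0.50 for the cancer comparison (the chi-square approximation gives about 0.48 and is unreliable with only two discordant cases), which is consistent with the p values reported in the Results.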


Results
Readers' Performances
The primary reader called back 475 of 6,381 cases (7.4%) (Table 1). Biopsies were recommended in 70 of these 475 cases (14.7%), and 13 malignancies were detected (18.6% of biopsies), for a primary reader screening detection rate of 2.04 cases per 1,000 women (13/6,381). This rate reflects a population with primarily incident cancers, because 78% of the patients had previous studies.


TABLE 1: Performance by Radiologists and Computer-Aided Detection (CAD) Reviewer

 

The CAD reader called back an additional 30 cases, for an additional callback rate of 0.47% (30/6,381) (Table 1). Of these 30 callbacks, three (10%) were recommended for biopsy. There were no malignancies.

The second human reader called back 34 cases (0.53%, 34/6,381) in addition to the primary reader's callbacks (Table 1). All but one of these cases differed from those called back on the basis of CAD markings. Five of the 34 cases (14.7%) were recommended for biopsy, and two of these five (40%) were malignant. The second reader thus increased the number of detected cancers from 13 (primary reader) to 15, a relative increase in the cancer detection rate of 15.4% (2/13) (Table 1).

There was no statistically significant difference in performance between the CAD and second human readers, as measured by the recall rates and the relative increase in recall rates (p = 0.70) and by the relative increase in cancer detection rates (p = 0.50) (Table 1). However, because the CAD and second human readers differed by only two cancers, no statistical significance could be shown with such small numbers. In contrast, the two readers differed in a much larger number of callback cases: with 63 cases of disagreement, we would have had 80% power to detect a difference if either reader had been associated with twice as many callbacks as the other.

The overall screening cancer detection rate for all three readers was 2.35 per 1,000 (15/6,381). In addition, there were three interval malignancies not detected by any of the readers that developed within 1 year. There were, thus, a total of 18 cancers at the time of imaging in the cohort, for a prior probability of 2.82 cases per 1,000 women.
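
The false-negative definitions in the Statistical Analysis list can be applied to the 18 cancers described above. The sketch below encodes each cancer by the set of readings that called it back. It rests on an assumption not stated by the authors: a cancer called back by the primary reader is counted as detected by the CAD reading and the second read as well, because both readings started from the primary reader's calls. The resulting false-negative counts are derived for illustration and are not quoted from Table 1.

```python
N_SCREENED = 6381

# One entry per cancer in the cohort: the set of readings that called it back.
cancers = (
    [frozenset({"primary", "cad", "second"})] * 13  # found by the primary reader
    + [frozenset({"second"})] * 2                   # found only by the second human reader
    + [frozenset()] * 3                             # interval cancers missed by all readers
)


def per_1000(count: int, n_screened: int = N_SCREENED) -> float:
    return 1000 * count / n_screened


def false_negatives(reading: str) -> int:
    # A reading's false negatives are the cohort cancers it did not call back,
    # whether another reading found them or they surfaced as interval cancers.
    return sum(1 for callers in cancers if reading not in callers)


screen_detected = sum(1 for callers in cancers if callers)
fn_counts = {r: false_negatives(r) for r in ("primary", "cad", "second")}
print(round(per_1000(screen_detected), 2), round(per_1000(len(cancers)), 2), fn_counts)
```

Under these assumptions the primary and CAD readings each miss 5 of the 18 cancers and the second read misses 3, which corresponds to false-negative rates of about 0.78 and 0.47 per 1,000 screens.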

Malignancies Detected by the Second Human Reader
Two additional cancers not seen by the primary reader or the CAD reader were called back by the second reader (Figs. 1A, 1B, 1C and 2A, 2B, 2C, 2D). Of clinical significance is that the CAD system had marked both of the lesions, but the markings were dismissed by the primary reader. Both cases were called back for diagnostic workup by the second reader that resulted in recommendations for short-interval follow-up. Biopsies were subsequently recommended on the basis of findings at the 6-month workup, and malignancies were then diagnosed. Because cancer was detected within 1 year of the incident screening as a result of the second human reader's calls, these cases were counted as true-positive calls for the second reader.


Fig. 1A Malignancy detected by human second reviewer in 52-year-old woman with ductal carcinoma in situ (DCIS) who presented with group of three or four punctate calcifications on screening mammograms. At time of screening, these calcifications had arguably been stable for 3 years. Therefore, human second reviewer's motivation to recommend additional views is unknown. At diagnostic visit, radiologist thought that calcifications were stable, but short-interval follow-up was recommended. At that follow-up visit, radiologist thought that calcifications had increased in number since mammograms obtained 3.5 years earlier, although differences in technique were considered, and recommended biopsy. Pathology results were DCIS and calcifications were associated with carcinoma. Mediolateral oblique view. Photographic enlargement shows punctate calcifications (arrows) seen on mammograms obtained 3 years before study mammogram.

 

Fig. 1B Mediolateral oblique view. Photographic enlargement shows calcifications (arrows) seen at screening; patient was called back by human second reviewer. Diagnostic workup concluded stability, but short-term follow-up was recommended.

 

Fig. 1C Magnified (x1.8) mediolateral oblique view obtained 6 months after B, when biopsy was recommended for same calcifications (arrows).

 

Fig. 2A Malignancy detected by human second reviewer: 73-year-old woman with ductal carcinoma who was called back by human second reviewer for possible architectural distortion versus summation shadows. Abnormality was suspected on only mediolateral oblique projection of screening mammographic images. Of note is that computer-aided detection (CAD) system had marked this same image, but mark had been dismissed by "CAD reviewer." At time of diagnostic evaluation, many additional views were obtained, and finding was considered to be superimposition of shadows. However, short-term follow-up was recommended in 6 months based only on radiologist's "gut" feeling, even though mammogram was considered to be negative for abnormal findings. At that follow-up, finding was now thought to be architectural distortion in two views but was best seen in craniocaudal projection. Whether this change represented progression in malignancy versus differences in projection is not known. Pathology showed ductal carcinoma in situ. Mediolateral oblique (A) and craniocaudal (B) mammograms. Photographic enlargements show area considered to be overlapping shadows (arrows) after diagnostic workup.

 

Fig. 2B Craniocaudal mammogram. Photographic enlargement shows area considered to be overlapping shadows (arrows) after diagnostic workup.

 

Fig. 2C Craniocaudal mammogram (magnification, x1.8) obtained 6 months later shows architectural distortion (arrows) that prompted the radiologist to recommend surgical biopsy.

 

Fig. 2D Mediolateral oblique mammogram (magnification, x1.8) obtained 6 months later shows architectural distortion (arrows) that prompted the radiologist to recommend surgical biopsy.

 
Interval Cancers Not Detected by Any of the Readers
There were three false-negative cases of malignancy arising within 12 months of the negative screening mammograms not noted by any of the readers. On retrospective review of the original screening mammograms, two of the three cases were considered negative by consensus. The third false-negative for all three readers was considered to be a missed case on retrospective review. This was in a 76-year-old woman who developed a malignancy in the scar of a previously excised benign mass from 4 years earlier (Fig. 3A, 3B, 3C); the tumor was detected by palpation 9 months after the negative screening examination. It is noteworthy that the CAD system had marked this area, but the mark had been dismissed by the primary reader. The second human reader noted only postsurgical changes. In retrospective review, there were mammographic signs of malignancy in the scar characterized by increasing density at the biopsy site. A 2.1-cm invasive ductal carcinoma was surgically excised.


Fig. 3A False-negative case for all reviewers: 76-year-old woman with invasive ductal carcinoma. Craniocaudal mammogram obtained 4 years before study in which mass (arrow) was excised and was found to be benign (fibrocystic changes without atypia) at histology.

 

Fig. 3B False-negative case for all reviewers: 76-year-old woman with invasive ductal carcinoma. Screening mammogram, craniocaudal view, 2 years before study shows postsurgical changes.

 

Fig. 3C False-negative case for all reviewers: 76-year-old woman with invasive ductal carcinoma. Screening mammogram, craniocaudal view, at time of study in which increase in density at biopsy site was not detected by any of reviewers, although area was marked by computer-aided detection system.

 

Discussion
The main purpose of our study was to compare the performance of a CAD system as a second reader with that of a human second reader in an academic clinical practice. The performance measures for our study—that is, our recall rates, biopsy rates, positive predictive values, and cancer detection rates—were similar to those of Birdwell et al. [9], who also measured performance in a university hospital setting. In particular, we questioned whether there were differences between the two readers when used in screening for cancer not seen initially by a primary reader. The overall effect was to evaluate the contribution of each to the increase in detection of malignancy or, alternatively, to the decrease in the number of false-negative cases by the primary reader.

Our results show that, in our academic screening practice, the performance of a CAD system and that of a human second reader did not differ significantly in malignancy detection rates or callback rates; both readers added cases to those identified by the primary reader. However, the differences between these two additional readers were too small for statistical significance to be detected.

A recent large study by Fenton et al. [11] of more than 200,000 women in a consortium of community practices also showed no improvement in breast cancer detection with CAD. The cancer detection rate before and after CAD was 4.15 and 4.20 cases per 1,000 women, respectively, whereas the biopsy rate increased from 14.7 to 17.6 per 1,000 women (p < 0.001) and the recall rate increased from 10.1% to 13.2% (p < 0.001). There was therefore an overall decrease in accuracy after the introduction of CAD, although the proportion of ductal carcinoma in situ relative to invasive cancer increased, owing to the detection of calcifications by CAD. Although the study by Fenton et al. clearly differed from ours in design and purpose, both studies showed a lack of improvement in cancer detection with the use of CAD.

Although the number of cases was too small to determine whether CAD or the human reader was superior, the results of this study nevertheless highlight an important issue about the use of CAD: CAD can identify a lesion, but the human reader must still determine its significance. Two cancers marked by CAD were dismissed by a human, the primary reader. A possible advantage of a human second reader is not only that more lesions may be detected but also that a second human may interpret the findings differently (reader variability), and that difference may lead to earlier diagnosis. Variability in human perception is highlighted by the fact that, of the 64 additional callbacks by the CAD reader (30) and the human second reader (34), only one case was called back by both. This finding suggests that, although the clinical outcomes may look the same statistically, the two forms of double reading may be vastly different and therefore not equivalent.

To our knowledge, no previous study has used this specific methodology to compare CAD with a second reader, but several reports have compared CAD with double reading in case-controlled settings. Destounis et al. [12] examined the effect of CAD on the false-negative rate for screening-detected cancers in a double-reading practice in which the second reader was aware of the initial reader's interpretation. They retrospectively reviewed the prior mammograms of 318 cancer cases that had been clinically interpreted as negative by double reading. The prior years' mammograms were analyzed by CAD if three of five radiologists on a panel had deemed the findings "actionable." CAD marked the actionable findings in 37 of the 52 such cases (71%). In theory, therefore, CAD could have reduced the false-negative rate from 31% (98/318) to 19% (61/318).

Birdwell et al. [13] conducted a similar study of the potential effect of CAD on the false-negative rate, with similar results; unlike the cases in the Destounis et al. study [12], however, the clinical cases had been reviewed by only one radiologist. A panel reviewed the prior years' mammograms, interpreted as negative, of 427 patients with screening-detected cancers and found that 115 cases had visible lesions deemed "actionable," for a false-negative rate of 27%. As in the previous study, CAD marked the majority of these findings, 77% (88/115). Use of CAD would thus theoretically have reduced the false-negative rate from 27% (115/427) to 6.3% (27/427).
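
The "theoretic" reductions quoted for the Destounis et al. and Birdwell et al. studies follow from simple arithmetic: the CAD-marked misses are subtracted from the false negatives. This assumes that every CAD mark on a missed cancer would have been acted on, whereas the cases in the present study show that CAD marks can be dismissed by the reader, so the figures are best read as best-case estimates. A minimal sketch (the function name is illustrative):

```python
def theoretic_fn_rates(false_negatives: int, cad_marked: int, cancers: int) -> tuple:
    """False-negative rate before CAD, and after assuming every CAD-marked
    miss would have been recalled (a best-case assumption)."""
    before = false_negatives / cancers
    after = (false_negatives - cad_marked) / cancers
    return before, after


# Counts as quoted in the two preceding paragraphs.
destounis = theoretic_fn_rates(false_negatives=98, cad_marked=37, cancers=318)
birdwell = theoretic_fn_rates(false_negatives=115, cad_marked=88, cancers=427)
print([round(r, 3) for r in destounis], [round(r, 3) for r in birdwell])
```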

Our usual clinical practice for screening mammography is double reading in which the second reader is aware of the initial reader's interpretation. The ultimate goal of a second reader is to reduce the false-negative rate of the first reader. Before the use of CAD, we reported that double reading detected an additional 7.7% of cancers (Hulka C et al., presented at the 1994 annual meeting of RSNA). This finding has been corroborated in community practice by Taplin et al. [14], who showed that double reading increased sensitivity by an average of 7%. After introducing CAD into our clinical setting, we did not find it helpful in reducing the false-negative rate of the primary reader. Note, however, that our method of evaluating CAD differed significantly from those of Destounis et al. [12] and Birdwell et al. [13]: most of our cases were negative, and our study group was not drawn from a cohort of malignant cases.

As noted previously, the primary reader in our study dismissed two additional malignant cases that were identified by our human second reader. This difference was not statistically significant (p = 0.50) (Table 1), so any conclusion about clinical significance would be premature. Note, however, that the CAD system marked both cases found by the human second reader, as well as the one interval cancer judged by consensus to have detectable mammographic findings, yet in each instance the primary reader, who reviewed the CAD marks, dismissed the findings. Thus, despite our results, CAD retains the potential to affect the outcome of false-negative cases.
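The statistical test behind p = 0.50 is not named here. One test that reproduces this value, offered only as an assumption, is an exact two-sided McNemar (sign) test on the discordant cancers, treating the two cancers found only by the human second reader and none found only by the CAD reader as a 2-versus-0 split. A minimal Python sketch (the function name `exact_mcnemar_p` is hypothetical):

```python
from math import comb


def exact_mcnemar_p(b: int, c: int) -> float:
    """Two-sided exact McNemar (binomial sign) test on discordant counts.

    b: cancers detected only by reader A; c: cancers detected only by reader B.
    Under the null hypothesis each discordant case falls either way with p = 0.5.
    """
    n = b + c
    k = min(b, c)
    # One tail: P(X <= k) for X ~ Binomial(n, 0.5)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


print(exact_mcnemar_p(2, 0))  # 0.5
```

With only two discordant cases, even the most lopsided split cannot produce a smaller p value than 0.5.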

Nevertheless, there is still no clear consensus in the literature on the benefit of CAD versus double reading, particularly because of the effect of false-positive marks on specificity. Karssemeijer et al. [15] performed a retrospective study of 250 cancer cases and 250 normal cases that had been independently double read in clinical practice and evaluated the sensitivity of CAD for these cases. However, they considered only the findings that had been marked by the radiologists and ignored potential false-negative interpretations of cases with CAD-only marks. They concluded that independent double reading performed significantly better than CAD (Tukey-Kramer, p = 0.009) because human readers produced fewer false-positive marks, giving double reading a higher specificity than CAD. These results contrast with those of Ciatto et al. [16], who found CAD to be significantly more specific than double reading. Those investigators retrospectively reviewed the screening mammograms, interpreted as negative, of patients who later developed interval cancers and examined the effects of double reading and CAD on those cases. CAD was almost as sensitive as independent double reading but, in contrast to the previous study, was more specific. These conflicting findings on whether CAD or double reading is superior may be due to differences in methods and study populations.

Most of the findings highlighted by CAD in our study were false-positive marks. False-positive marks are distracting, and their large number undoubtedly contributed to the radiologists' failure to recognize the three "actionable" cases that were falsely interpreted as negative despite being marked by CAD: two false-negatives by the CAD reader and one false-negative interval cancer that, by consensus, showed abnormal findings. With the initial CAD software, there was approximately one mark per film (four per patient), and obviously benign findings, such as axillary lymph nodes and vascular calcifications, were marked. Large numbers of false-positive CAD marks have been reported in several studies [17–20].

The effect of superfluous marks on the detection of true findings, which our results illustrate, has been discussed by Ikeda et al. [17] and Astley [18]. To estimate a false-positive CAD marking rate, we use a best-case average of two marks per case. Of the 18 malignant cases in our population of approximately 6,000 patients, two had no mammographic findings on consensus retrospective review, leaving 16 cases with mammographic findings. If CAD had marked the lesion on both projections in all 16 cases, there would have been approximately 12,000 marks for the entire population, of which only 32 would have indicated malignancy. The remaining 11,968 marks would not have indicated malignancy, a false-positive marking rate of 99.7% (11,968/12,000). In this setting, the radiologist becomes numb to the CAD marks, treating each as insignificant and easily dismissed. This problem explains why the results of studies using retrospective case sets differ from those of prospective clinical studies.
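The false-positive marking arithmetic above can be written out as a short Python sketch. It assumes, as the text does, a best-case average of two CAD marks per patient and correct marks on both projections of each of the 16 cancers with findings; the function name is hypothetical. Substituting the four marks per patient seen with the initial software shows how the fraction grows.

```python
def cad_false_positive_fraction(patients: int, marks_per_case: float,
                                cancers_with_findings: int, views: int = 2) -> float:
    """Fraction of all CAD marks that do not point to a malignancy.

    Assumption: every cancer with mammographic findings is marked on each view.
    """
    total_marks = patients * marks_per_case
    true_marks = cancers_with_findings * views
    return (total_marks - true_marks) / total_marks


print(f"{cad_false_positive_fraction(6000, 2, 16):.1%}")  # 99.7%
print(f"{cad_false_positive_fraction(6000, 4, 16):.1%}")  # 99.9%
```

Even under the optimistic two-marks-per-case assumption, fewer than 3 in 1,000 marks point to a cancer.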

A human second reader rapidly processes findings, consciously or unconsciously dismissing most of them, whereas each CAD mark must be actively evaluated. As Ikeda et al. [17] noted, "it is the radiologist's knowledge of breast cancer imaging and diagnostic acumen that influences the choice to recall a finding, not the marking of the CAD system." Our results clearly corroborate this opinion. We also agree that failure to act on CAD marks that subsequently prove to be malignant does not indicate that radiologists are performing below the standard of care, particularly because more than 99% of the marks should be dismissed.

Limitations of this study include the sample size and the resulting inability to show a statistically significant difference between the human second reader and the CAD reader. To show statistical significance, the groups would need to differ by at least six cases; with any smaller difference, the power is zero, because no possible outcome could reach significance. If the difference of two cancers per group were maintained, we would have needed at least four times as many screening cases (more than 24,000), and all of the malignancies would need to have been identified by only one reader, an unlikely scenario. Another limitation is that CAD was used only on analogue films. Newer digital CAD algorithms may behave differently, producing fewer false-positive marks and thereby reducing their confounding impact. CAD has always excelled at detecting calcifications. With the large size and high contrast of digital mammographic images, human detection of calcifications has perhaps improved over that with film-screen mammography, possibly reducing the incremental effect of CAD. All of these issues deserve further study and could affect the results of a study of this design.
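The threshold of six cases can be reproduced under an assumption not stated in the text: an exact two-sided McNemar (sign) test at α = 0.05 in which every discordant cancer is detected by only one of the two readers. The sketch below, with hypothetical helper names, searches for the smallest such one-sided split that reaches significance. For any smaller number of discordant cases, even the most extreme outcome fails the test, which is what a power of zero means here.

```python
from math import comb


def exact_mcnemar_p(b: int, c: int) -> float:
    """Two-sided exact McNemar (sign) test on discordant counts b and c."""
    n = b + c
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n  # P(X <= k), X ~ Bin(n, 0.5)
    return min(1.0, 2 * tail)


def min_discordant_for_significance(alpha: float = 0.05) -> int:
    """Smallest n for which an n-versus-0 split of discordant cases gives p < alpha."""
    n = 1
    while exact_mcnemar_p(n, 0) >= alpha:
        n += 1
    return n


print(exact_mcnemar_p(5, 0))  # 0.0625, not significant
print(exact_mcnemar_p(6, 0))  # 0.03125, significant
print(min_discordant_for_significance())  # 6
```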

The workflow of our practice has since changed with the advent of digital screening mammography. We currently do not use a human second reader, but we do use CAD to supplement the primary reader. This change was driven by the increased time needed to review a digital screening mammogram compared with an analogue one; anecdotally, a digital review takes about twice as long in our practice. The additional time needed for two human readers, primary and second, is unacceptable in our institution, which strives to complete interpretations within 24 hours of acquisition. This practical consideration led us to discontinue human second reading. Each practice should weigh the merits of a second reader, whether human or computer, in the context of its own workflow.

In conclusion, our results are concordant with those of previous studies showing that either a human second reader or a CAD system can increase cancer detection in a screening program. However, our experience highlights a phenomenon that has received little emphasis: how the human who uses CAD interprets the CAD marks. The two cases detected by the human second reader were not recalled by the primary radiologist despite their identification by the CAD system; the primary reader was perhaps influenced by the very large number of false-positive marks.

CAD only identifies. Interpreting a mammogram still depends on the judgment of a radiologist, who relies on experience and knowledge to determine the importance of each finding. Variability in human performance is highlighted by the fact that the CAD reader and the human second reader called back different cases. A CAD system can minimize perceptual failure but cannot compensate for interpretation failure. A human second reader has the advantage of offering both a second interpretation and an additional opportunity for perception.


Acknowledgments
 
We thank Donna Burgess for her contributions to this study, including managing its operational aspects and supporting data preparation.


References

  1. Pollei SR, Mettler FA, Bartow SA, Moradian G, Moskowitz M. Occult breast cancer: prevalence and radiographic detectability. Radiology 1987;163:459–462.
  2. Sickles EA, Ominsky SH, Sollitto RA, Galvin HB, Monticciolo DL. Medical audit of a rapid-throughput mammography screening practice: methodology and results of 27,114 examinations. Radiology 1990;175:323–327.
  3. Moskowitz M. Retrospective reviews of breast cancer screening: what do we really learn from them? Radiology 1996;199:615–620.
  4. Slanetz P, Moore RH, Hulka CA, et al. Physicians' opinions on the delivery of mammographic screening services: immediate interpretation versus double reading. AJR 1996;167:377–379.
  5. Bird RE. Professional quality assurance for mammographic screening programs (commentary). Radiology 1990;177:587.
  6. Tabar L, Fagerberg G, Duffy SW, Day NE, Gad A, Grontoft O. Update of the Swedish two-county program of mammographic screening for breast cancer. Radiol Clin North Am 1992;30:187–210.
  7. Thurfjell EL, Lernevall KA, Taube A. Benefit of independent double reading in a population-based mammography screening program. Radiology 1994;191:241–244.
  8. Freer TW, Ulissey MJ. Screening mammography with computer-aided detection: prospective study of 12,860 patients in a community breast center. Radiology 2001;220:781–786.
  9. Birdwell RL, Bandodkar P, Ikeda DM. Computer-aided detection with screening mammography in a university hospital setting. Radiology 2005;236:451–457.
  10. Morton MJ, Whaley DH, Brandt KR, Amrani KK. Screening mammograms: interpretation with computer-aided detection, prospective evaluation. Radiology 2006;239:375–383.
  11. Fenton JJ, Taplin SH, Carney PA, et al. Influence of computer-aided detection on performance of screening mammography. N Engl J Med 2007;356:1399–1409.
  12. Destounis SV, DiNitto P, Logan-Young W, Bonaccio E, Zuley M, Willison KM. Can computer-aided detection with double reading of screening mammograms help decrease the false negative rate? Initial experience. Radiology 2004;232:578–584.
  13. Birdwell RL, Ikeda DM, O'Shaughnessy KF, Sickles EA. Mammographic characteristics of 115 missed cancers later detected with screening mammography and the potential utility of computer aided detection. Radiology 2001;219:192–202.
  14. Taplin SH, Rutter CM, Elmore JG, Seger D, White D, Brenner RJ. Accuracy of screening mammography using single vs independent double interpretation. AJR 2000;174:1257–1262.
  15. Karssemeijer N, Otten JD, Verbeek AL, et al. Computer-aided detection versus independent double reading of masses on mammograms. Radiology 2003;227:192–200.
  16. Ciatto S, Rosselli Del Turco M, Burke P, Visioli C, Paci E, Zappa M. Comparison of standard and double reading and computer-aided detection (CAD) of interval cancers at prior negative screening mammograms: blind review. Br J Cancer 2003;89:1645–1649.
  17. Ikeda DM, Birdwell RL, O'Shaughnessy KF, Sickles EA, Brenner RJ. Computer-aided detection output on 172 subtle findings on normal mammograms previously obtained in women with breast cancer detected at follow-up screening mammography. Radiology 2004;230:811–819.
  18. Astley SM. Computer-based detection and prompting of mammographic abnormalities. Br J Radiol 2004;77(spec no. 2):S194–S200.
  19. Soo MS, Rosen EL, Xia JQ, Ghate S, Baker JA. Computer-aided detection of amorphous calcifications. AJR 2005;184:887–892.
  20. Baker JA, Rosen EL, Lo JY, Gimenez EI, Walsh R, Soo MS. Computer-aided detection (CAD) in screening mammography: sensitivity of commercial CAD systems for detecting architectural distortion. AJR 2003;181:1083–1088.



This article has been cited by other articles:

References. J. ICRU, December 1, 2009; 9(2): 89–104.

R. L. Birdwell. The Preponderance of Evidence Supports Computer-aided Detection for Screening Mammography. Radiology, October 1, 2009; 253(1): 9–16.

R. M. Nishikawa and L. L. Pesce. Computer-aided Detection Evaluation Methods Are Not Created Equal. Radiology, June 1, 2009; 251(3): 634–636.

F. J. Gilbert, S. M. Astley, M. G. C. Gillan, O. F. Agbaje, M. G. Wallis, J. James, C. R. M. Boggis, S. W. Duffy, and the CADET II Group. Single Reading with Computer-Aided Detection for Screening Mammography. N. Engl. J. Med., October 16, 2008; 359(16): 1675–1684.

R. F. Brem. Blinded Comparison of Computer-Aided Detection with Human Second Reading in Screening Mammography: The Importance of the Question and the Critical Numbers Game. Am. J. Roentgenol., November 1, 2007; 189(5): 1142–1144.

