Original Research
1 Department of Radiology, Breast Imaging Center, Ullevaal University Hospital,
Kirkeveien 166, N-0407 Oslo, Norway.
2 R2 Technology, 2585 Augustine Dr., Santa Clara, CA.
3 Present address: 1720 Holt Ave., Los Altos, CA 94024.
Received December 22, 2005;
accepted after revision May 25, 2006.
A. Kshirsagar, S. Stapleton, and R. A. Castellino are (or were at the time
of our study) employees of R2 Technology, which makes the CAD system discussed
herein.
Abstract
MATERIALS AND METHODS. The cases of 3,683 women who underwent both screen-film mammography and full-field digital mammography (FFDM) with independent double reading for each technique were followed for 2 years to include cancers detected in the interval between screening rounds and cancers detected at the next screening round. Fifty-five biopsy-proven cancers were diagnosed. A baseline screening mammogram was defined as positive if at least one of the two independent readers scored it 2 or higher on a 5-point rating scale. The baseline mammograms of interval (n = 10) and second-round (n = 16) cancers were retrospectively classified as overlooked (n = 2), minimal sign actionable (n = 8), minimal sign nonactionable (n = 5), or normal (n = 11). The baseline mammograms of all 55 cancers were evaluated with a CAD system, and the CAD results were compared (McNemar's test for paired proportions) with the findings of prospective independent double reading of the mammograms obtained with each technique.
RESULTS. For radiographically visible lesions on the baseline mammograms, CAD sensitivity with FFDM was 95% (37/39) compared with 64% (25/39) for prospective double reading (p = 0.006), and CAD sensitivity with screen-film mammography was 85% (33/39) compared with 77% (30/39) for prospective double reading (p = 0.57). Of these 39 cancers, CAD correctly marked five (13%) on screen-film mammography and 14 (36%) on FFDM that had not been detected at prospective independent double reading.
CONCLUSION. CAD showed the potential to increase the cancer detection rate for FFDM and for screen-film mammography in breast cancer screening performed with independent double reading.
Keywords: breast, breast cancer, computer-aided detection, digital images, mammography, screening
Conventional screen-film mammography has been the technique of choice for screening programs. Full-field digital mammography (FFDM), however, offers several advantages, and its benefits are probably best realized with soft-copy display and interpretation of images. In three prospective studies comparing screen-film mammography and FFDM with soft-copy reading in screening populations, fewer cancers were detected with FFDM than with screen-film mammography in the Colorado-Massachusetts study [4] and the Oslo I study [5], but more cancers were detected with FFDM in the Oslo II study [6]. None of these differences, however, was statistically significant. The results of the large American College of Radiology Imaging Network trial [7] showed that the diagnostic accuracy of screen-film mammography was similar to that of FFDM. The accuracy of FFDM, however, was significantly higher than that of screen-film mammography among women younger than 50 years, women with heterogeneously dense or extremely dense breasts, and premenopausal and perimenopausal women.
Computer-aided detection (CAD) is designed to help radiologists increase the cancer detection rate and the rate of detection of early-stage cancers and to decrease interobserver variability [8-11]. Both retrospective [8-11] and prospective [12-16] clinical studies have yielded evidence of the benefit of CAD in screen-film mammography. Two sequential-reading studies [8, 12] (in which the images are interpreted without and then with CAD input), from a community practice and an academic setting, showed 19.5% and 7.4% increases, respectively, in the cancer detection rate. A large historical controlled study [14] (in which cancer detection rates were compared in pre-CAD and CAD periods) did not show an overall significant increase in cancer detection. Subsequent analysis [15] of the reported data, however, showed a 19.7% increase in cancer detection among the 17 of the study's 24 radiologists who were in the low-volume category. In a study with a similar historical control design, Cupples et al. [16] found that CAD was associated with a 16.1% increase in the cancer detection rate and, importantly, a 164% increase in the detection of invasive cancers 1.0 cm or smaller.
CAD systems are beneficial when they show malignant lesions that are visible and actionable on the image but that are overlooked by the radiologist (observational oversight) and when radiologists recognize and act on the missed or overlooked cancers identified with the CAD system. The former scenario is typically evaluated with retrospective studies, the ideal CAD system showing all cancers not identified by radiologists. The latter scenario can only truly be evaluated with prospective studies.
Although CAD with screen-film mammography has been evaluated in both retrospective and prospective studies, the performance of CAD with FFDM has not been widely established, even with retrospective studies, despite a general belief [7] that one key advantage of FFDM is easier implementation of CAD. One challenge with evaluating CAD with FFDM has been the lack of availability of databases from FFDM screening programs. The database from the Oslo I trial provides data on known cancers consecutively detected with FFDM screening. Because the Oslo I study was a paired screen-film mammography and FFDM trial with follow-up, the database contains cancers classified as missed or overlooked. This database can be used to evaluate the performance of a CAD system in a set of known cases of cancer and to evaluate the amount of correlation between cases of cancer the CAD system marks and the cases of cancer identified by radiologists.
The aim of our study was to evaluate the effect of a CAD system on mammograms acquired with FFDM and to compare the prospective independent double reading of screen-film mammography and FFDM in the paired Oslo I study with the findings of retrospective analysis of the cancer cases with a commercially available CAD system.
The Norwegian Breast Cancer Screening Program protocol has a 2-year interval between screening rounds. In this analysis, the study population was followed up for more than 2 years to include all cancers detected in the interval between screening rounds and cancers found at the second screening round. Diagnostic evaluations of recalled women were performed at the breast imaging center of our institution within 2 weeks after the consensus meeting. All cytologic and histologic examinations were performed in the department of pathology. The baseline interpretation results, results from diagnostic evaluation of recalled patients, cytologic and histologic findings for patients undergoing surgery, and histologic findings for patients with interval cancers and cancers detected at the second screening round were sent to a central database of the Norwegian Breast Cancer Screening Program located at the Cancer Registry of Norway. All malignant tumors diagnosed in Norway must be reported to the Cancer Registry of Norway, and this database is linked to the mammographic screening database for each county. This system enables complete surveillance of all women in the screening program, including women whose interval cancers might have been diagnosed at other institutions.
Imaging
The 3,683 women (mean age, 58.2 years) in the study population underwent
both screen-film mammography and FFDM. All screen-film mammograms were
acquired on one of two mammography units (Mammomat 300, Siemens Medical
Solutions) with Kodak Min-R 2000 film and Min-R 2190 screens (Eastman Kodak).
The FFDM images were acquired on a Senographe 2000D unit (GE Healthcare).
Mammograms for both imaging techniques consisted of the two standard views
(craniocaudal and mediolateral oblique) of each breast. Both examinations were
performed on the same day within minutes of each other in the screening unit
in downtown Oslo. Screen-film mammography was always performed first in case
the woman, for whatever reason, chose not to participate in the double
examination.
Eight radiologists, all with more than 4 years of screening mammographic experience, were divided into two teams of four. In any given week, one team interpreted the screen-film mammograms and the other team the FFDM examinations, and the teams alternated weekly between the two techniques. Within each team, each examination was interpreted independently by two radiologists.
The findings of prospective independent double reading of the baseline screen-film and FFDM screening images were recorded in the central database of the Norwegian Cancer Registry. A 5-point rating scale for probability of cancer was used: 1 = normal, definitely benign; 2 = probably benign; 3 = indeterminate; 4 = probably malignant; 5 = malignant. If at least one of the two readers categorized a mammographic finding 2 or higher (hereafter called positive), the case was reviewed at a baseline screening consensus meeting. The baseline screening consensus meeting could dismiss cases with a low abnormal mammographic score (rating score of 2). However, recall for diagnostic evaluation was mandatory for cases categorized 3 or higher by at least one of the two original readers. The reading protocol has been previously published [5].
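The scoring and consensus rules above amount to a simple decision function of the two readers' scores. The sketch below is a hypothetical encoding of that protocol for illustration only: it assumes each reader gives one score per examination, and the function names `triage` and `is_positive` and their return labels are not part of the screening program's software.

```python
def triage(score_a: int, score_b: int) -> str:
    """Outcome implied by two independent reader scores on the 5-point scale.

    Hypothetical encoding of the protocol described in the text:
      highest score 1         -> "negative"  (not reviewed at consensus)
      highest score 2         -> "consensus" (panel may dismiss or recall)
      highest score 3 or more -> "recall"    (recall is mandatory)
    """
    for score in (score_a, score_b):
        if not 1 <= score <= 5:
            raise ValueError(f"score must be between 1 and 5, got {score}")
    highest = max(score_a, score_b)
    if highest >= 3:
        return "recall"
    if highest == 2:
        return "consensus"
    return "negative"


def is_positive(score_a: int, score_b: int) -> bool:
    """'Positive' as defined in this study: at least one reader scored 2 or higher."""
    return triage(score_a, score_b) != "negative"


# One reader scores 2 (probably benign), the other 1 (normal).
print(triage(2, 1), is_positive(2, 1))  # consensus True
print(triage(1, 4))                     # recall
```

Because the highest of the two scores drives the outcome, a single reader's score of 3 or more is enough to force recall, while a score of 2 only sends the case to the consensus panel.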
The baseline screening mammograms of women in whom an interval cancer developed or cancer was detected in the second screening round were retrospectively reviewed in a meeting of the radiologists taking part in the study. The baseline screening mammograms were retrospectively classified into four categories as follows: Normal indicated that a later-developing malignant tumor could not be seen, even when the location and mammographic appearance of the subsequently developed cancer were known. Minimal sign nonactionable indicated that although minor changes were seen at the location of the later-developing cancer, these mammographic features were so minimal and nonspecific that a true-positive CAD prompt would likely have no influence on decision making regarding recall. Minimal sign actionable indicated that suspicious mammographic features were present that should have initiated a recall if prompted by CAD findings. Overlooked (missed) cancer indicated that obvious malignant mammographic features were present and the woman should have been recalled in the baseline screening round if the abnormality had been detected by the radiologists or was prompted by CAD findings.
The mammographic findings of the cancers detected at baseline screening and of the subsequent cancers visible and actionable in retrospect (i.e., minimal sign actionable and overlooked categories) were classified as one of the following: ill-defined mass, spiculated mass, distortion and asymmetric density, microcalcifications only, and density with calcifications.
Interval cancers. Ten interval cancers were diagnosed in the study population. Six were interpreted as normal by all four radiologists during the baseline screening round and were classified as either minimal sign nonactionable or normal in the retrospective review of the baseline mammograms. Four interval cancers had received a true-positive score in the baseline screening round (three on screen-film mammography and one on FFDM) but were dismissed at the baseline screening consensus meeting. Histologic examination of the three cancers with a true-positive screen-film mammographic score revealed two invasive ductal carcinomas and one ductal carcinoma in situ. The interval cancer with a true-positive FFDM score was an invasive lobular carcinoma.
Cancers detected at second screening round. Sixteen cancers in 15 women (one woman had bilateral breast cancer) were diagnosed in the screening round 2 years after the baseline. Three of these cancers (two invasive ductal carcinomas and one invasive lobular carcinoma) had received a true-positive FFDM score at baseline interpretation 2 years earlier but were dismissed at the baseline screening consensus meeting. At the retrospective review of the baseline mammograms of the 26 subsequently detected cancers (10 interval cancers and 16 detected in the second round), 11 were classified as normal, five as minimal sign nonactionable, eight as minimal sign actionable, and two as overlooked. These 26 interval and second-round cancers are hereafter called subsequent cancers.
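The counts above determine the size of the cancer set used later in the CAD comparison: baseline-detected cancers plus only those subsequent cancers judged visible and actionable. The short tally below is a sketch using the numbers reported in the text; the variable names are illustrative.

```python
from collections import Counter

# Retrospective classification of the baseline mammograms of the
# subsequent cancers, with the counts reported in the text.
subsequent = Counter({
    "normal": 11,
    "minimal sign nonactionable": 5,
    "minimal sign actionable": 8,
    "overlooked": 2,
})
# Only these categories count as visible and actionable.
ACTIONABLE = {"minimal sign actionable", "overlooked"}

TOTAL_CANCERS = 55
n_subsequent = sum(subsequent.values())  # interval + second-round cancers
n_actionable = sum(n for cat, n in subsequent.items() if cat in ACTIONABLE)
n_baseline = TOTAL_CANCERS - n_subsequent  # cancers detected at baseline
analysis_set = n_baseline + n_actionable

print(n_subsequent, n_actionable, n_baseline, analysis_set)  # 26 10 29 39
```

The 39 cancers that result are the denominator for all sensitivity comparisons between CAD and double reading.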
CAD
The baseline screen-film mammograms of the 55 cases of cancer were analyzed
with the latest screen-film mammographic version of a commercially available
CAD system (ImageChecker version 8.0, R2 Technology, pending U.S. Food and
Drug Administration approval at the time of the study) at an operating point
where the average number of false marks per normal four-view case was 2.2, as
measured in 345 clinically confirmed normal cases from different institutions
[17]. The baseline FFDM images
were analyzed with the FFDM version of the same CAD system, which produced an
average of 1.9 false-positive marks per normal four-view case measured in 97
consecutively evaluated normal cases from the Oslo I study
[18]. This CAD system displays
two types of marks. An asterisk indicates a mass or area of architectural
distortion, and a triangle indicates an area suggestive of
microcalcifications.
Scoring and Statistical Analysis
The areas marked by the CAD system were assessed by the consensus panel to
determine whether the location and characterization (mass or
microcalcification mark) of the CAD marks corresponded to the mammographically
detected cancer. Each case was classified as having true-positive or
false-negative findings on the basis of biopsy-proven ground truth. A CAD
prompt was considered true-positive if the center of the CAD mark was within
the confines of the cancer in at least one of the two standard views.
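In the study, the consensus panel applied this criterion by eye. As a purely illustrative sketch, the location part of the rule can be expressed as a point-in-polygon test, assuming (hypothetically) that each lesion is outlined as a polygon in image coordinates for each view and that each CAD mark is reduced to its center point. The check that the mark type (mass or microcalcification) matches the lesion is not modeled, and all names below are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]


def point_in_polygon(pt: Point, polygon: Sequence[Point]) -> bool:
    """Even-odd (ray casting) test: does pt lie inside the closed polygon?

    Points exactly on an edge may be classified either way.
    """
    x, y = pt
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through pt can be
        # crossed; this also skips horizontal edges, so the division is safe.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:  # crossing lies to the right of pt
                inside = not inside
    return inside


def cad_true_positive(
    mark_centers: Dict[str, List[Point]],          # view -> CAD mark centers
    lesion_outlines: Dict[str, Sequence[Point]],   # view -> lesion outline
) -> bool:
    """True if a CAD mark center lies within the lesion in at least one view."""
    return any(
        point_in_polygon(center, lesion_outlines[view])
        for view, centers in mark_centers.items()
        if view in lesion_outlines  # lesion may be absent from a view
        for center in centers
    )


square = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
# The mark misses the lesion on the craniocaudal (CC) view but hits it on
# the mediolateral oblique (MLO) view, so the case counts as true-positive.
print(cad_true_positive({"CC": [(15.0, 5.0)], "MLO": [(5.0, 5.0)]},
                        {"CC": square, "MLO": square}))  # True
```

Skipping views without a lesion outline mirrors cases such as a positioning failure, where the lesion is outside the image on one view and can only be marked on the other.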
The cancer detection rates reported in this study for CAD and for the radiologists were calculated on the basis of true-positive scores on the baseline images, even if the cancer was diagnosed subsequently. For the radiologists, a rating score of 2 or higher on the 5-point probability-of-cancer scale by at least one of the two independent readers was defined as a true-positive score. We determined the number of cancers detected at prospective independent double reading of the baseline screen-film and FFDM screening mammograms, the number of baseline-detected cancers correctly marked by the CAD system, and the number of subsequently diagnosed cancers judged at the retrospective review to be minimal sign actionable or overlooked on the baseline mammograms. McNemar's test for paired proportions (Epi Info, version 6, Centers for Disease Control and Prevention) was used for statistical analysis. Results with p < 0.025 were considered statistically significant.
Subsequent cancers. In 10 of the 26 cases of subsequently diagnosed cancer, the cancer was judged in retrospect to be visible and actionable, that is, in the overlooked or minimal sign actionable category. On the baseline mammograms, CAD marked 60% (6/10) of these cases on screen-film mammograms and 100% (10/10) on FFDM images. Of the four cancers with false-negative findings at double reading of screen-film mammograms and a true-positive CAD mark, CAD correctly marked two on both views and two on one view. However, one of the latter two cases had a positioning failure: the lesion was not visualized on the craniocaudal view. Of the seven overlooked or minimal sign actionable cancers with false-negative interpretation at FFDM double reading, CAD correctly marked three on both views and four on one view. The lesion not seen on the craniocaudal screen-film mammogram because of positioning failure was also outside the image on the FFDM craniocaudal view. Whether CAD would have marked this lesion on both views without the positioning failure is an open question. In a clinical setting, these cases had the potential for earlier detection with the assistance of CAD. CAD marked both overlooked cancers on both screen-film mammography and FFDM. CAD also correctly marked 5/5 (100%) of the cases retrospectively judged minimal sign nonactionable on screen-film mammography and 2/5 (40%) on FFDM, but a true-positive CAD mark would likely not have influenced the decision to recall these patients: the mammographic changes were minimal and nonspecific, so these cases were not considered further in this analysis. Figure 2 shows a flowchart of the results of CAD in these 15 cases.
Comparison of CAD Results and Double Reading Findings
Comparison of the screen-film mammographic and FFDM double reading findings
with the CAD results for 39 cases of cancer (29 baseline and 10 subsequently
diagnosed actionable lesions) showed a slightly higher true-positive score for
CAD with screen-film mammography (33/39 vs 30/39 true-positive findings) and a
remarkably higher score for CAD with FFDM (37/39 vs 25/39 true-positive
findings), especially for spiculated masses (12/12 vs 5/12 true-positive
findings) and microcalcifications (10/10 vs 6/10 true-positive findings).
The standalone sensitivity of CAD was determined by summing the data presented in Tables 1 and 2 for screen-film mammography and FFDM. In FFDM, CAD correctly marked 27/29 of the baseline malignant lesions and 10/10 of the subsequent visible and actionable lesions, for an FFDM CAD sensitivity of 95% (37/39). In screen-film mammography, CAD correctly marked 27/29 of the baseline malignant lesions and 6/10 of the subsequent visible and actionable lesions, for a screen-film mammography CAD sensitivity of 85% (33/39).
If all cancer cases given a score of 2 or more by at least one of the two independent readers had been acted on, the standalone sensitivity of independent double reading for the 29 baseline cancers and 10 subsequent visible and actionable lesions would have been 64% for FFDM (22 of 29 baseline plus 3 of 10 subsequent, i.e., 25/39) and 77% for screen-film mammography (27 of 29 plus 3 of 10, i.e., 30/39). Thus, for FFDM, CAD sensitivity was 95% (37/39) compared with 64% (25/39) for double reading, and for screen-film mammography, CAD sensitivity was 85% (33/39) compared with 77% (30/39) for double reading. In a two-by-two-table analysis of cancer detection, McNemar's test showed no significant difference between independent double reading and CAD for screen-film mammograms (p = 0.57) but a statistically significant difference for FFDM images (p = 0.006).
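The pooled sensitivities come from summing the counts across the two subgroups before dividing, not from adding the subgroup fractions. The short sketch below reproduces the four figures from the counts given in the text (CAD counts from the standalone analysis, reader counts from double reading); the variable names are illustrative. Note that 37/39 rounds to 95%, the value given in the abstract.

```python
N_BASELINE, N_SUBSEQUENT = 29, 10
N_TOTAL = N_BASELINE + N_SUBSEQUENT  # 39 visible and actionable cancers

# (hits among 29 baseline cancers, hits among 10 subsequent actionable cancers)
hits = {
    "CAD, FFDM": (27, 10),
    "CAD, screen-film": (27, 6),
    "Double reading, FFDM": (22, 3),
    "Double reading, screen-film": (27, 3),
}


def pooled_sensitivity(baseline_hits: int, subsequent_hits: int) -> float:
    """Sum the counts first; 27/29 + 10/10 is not a meaningful sensitivity."""
    return (baseline_hits + subsequent_hits) / N_TOTAL


for label, (b, s) in hits.items():
    print(f"{label}: {b + s}/{N_TOTAL} = {pooled_sensitivity(b, s):.0%}")
```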
Overall comparison for the 39 actionable cancers showed that CAD marked 5/39 (13%) of lesions on screen-film mammograms that had not been recalled in the double-reading environment (Table 3). Thus, the detection rate with screen-film mammography could potentially have increased by 13 percentage points, to a combined detection rate of 90%. Notably, on FFDM, CAD marked all 14/39 (36%) malignant lesions not recalled at independent double reading, potentially increasing the cancer detection rate by 36 percentage points, to a combined detection rate of 100% (Table 3) for these 39 cancers.
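McNemar's test uses only the discordant pairs of the paired 2 × 2 table. For FFDM, the discordant counts follow from the text: CAD marked all 14 cancers not flagged at double reading (b = 14), and because CAD marked 37 cancers and double reading flagged 25, the two cancers CAD missed must both have been flagged by double reading (c = 2). The sketch below assumes the continuity-corrected chi-square form of the test; the text does not state which form Epi Info 6 applied, but this form gives p ≈ 0.006 for these counts, in line with the reported FFDM result. It also computes the combined detection rate, that is, cancers flagged by double reading plus those marked only by CAD.

```python
import math
from typing import Tuple


def mcnemar_cc(b: int, c: int) -> Tuple[float, float]:
    """McNemar chi-square with continuity correction (1 df).

    b: cancers flagged by CAD only; c: cancers flagged by double reading only.
    Returns (chi-square statistic, two-sided p value).
    """
    if b + c == 0:
        raise ValueError("McNemar's test needs at least one discordant pair")
    chi2 = max(0, abs(b - c) - 1) ** 2 / (b + c)
    # With 1 df, P(X > chi2) = P(|Z| > sqrt(chi2)) = erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


def combined_rate(n_reader: int, n_cad_only: int, n_total: int) -> float:
    """Detection rate if cancers flagged by either readers or CAD were recalled."""
    return (n_reader + n_cad_only) / n_total


chi2, p = mcnemar_cc(b=14, c=2)  # FFDM discordant pairs
print(f"FFDM: chi2 = {chi2:.2f}, p = {p:.3f}")  # FFDM: chi2 = 7.56, p = 0.006
print(f"Combined, screen-film: {combined_rate(30, 5, 39):.0%}")  # 90%
print(f"Combined, FFDM: {combined_rate(25, 14, 39):.0%}")  # 100%
```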
Double reading is another method of reducing the false-negative rate of mammographic screening, with increases in cancer detection as high as 15% [22]. Because of concerns about practicality and cost-effectiveness, however, double reading has not been widely adopted except in European countries with population-based screening programs in which it is mandated. Another goal for CAD, therefore, is to produce increases in cancer detection similar to those of double reading without the added cost and complexity of having two radiologists review the same examinations. CAD consequently is considered an alternative to double reading rather than an addition to it. In an experimental study [23], CAD was found to be almost as sensitive as simulated double reading.
The few reports in the literature on CAD performance in FFDM indicate results equivalent to those reported for CAD analysis of secondarily digitized images [24-26]. This finding is not surprising, because previous experimental retrospective studies comparing screen-film mammography and FFDM showed that FFDM is comparable with screen-film mammography in the detectability, conspicuity, and characterization of microcalcifications [27].
To our knowledge, few studies have been conducted to evaluate CAD in a double reading environment. In a retrospective study, Destounis et al. [11] found that CAD had the potential to decrease the false-negative rate at double reading by more than one third. In our study, after exclusion of minimal sign nonactionable lesions, CAD correctly marked five cases of cancer on screen-film mammograms and 14 cases on FFDM; these cases had been missed at baseline independent double reading.
We found a 36% potential benefit of CAD in FFDM with soft-copy reading. In the Oslo I study, the lower cancer detection rate with FFDM compared with screen-film mammography can be explained in part by a learning curve effect and suboptimal reading environments for FFDM soft-copy review [28]. This explanation is supported by the higher cancer detection rate for FFDM in the Oslo II study, which was performed with the same radiologists as Oslo I after they had gained more experience in FFDM soft-copy reading and with improved reading environments [6]. Our study was performed after completion of the Oslo I study, using the database from that study; Oslo II was in progress during the data collection and analysis phases of our study. Nevertheless, our results show that CAD has the potential to increase the cancer detection rate even in mammographic screening programs with double reading. Our results also indicate that CAD with FFDM may be of special benefit to radiologists inexperienced with FFDM and soft-copy reading.
Many breast cancers detected at screening are visible in retrospect on the previous mammograms. The rate of missed detectable cancers was estimated to be 29% in one study [29], and other studies have shown that approximately 50-67% of malignant tumors are visible in retrospect on previous mammograms [9, 30]. Whether a lesion seen in retrospect is judged visible but not actionable, or instead minimal sign actionable or missed (overlooked), is subjective and depends to a large extent on whether the readers are blinded or informed [31].
Previous screening mammograms of patients with interval cancer have been classified into four categories: screening error, minimal sign present, occult, and occult also at diagnosis [32]. According to this classification, 13% of previous findings are screening errors and 38% are minimal sign present; that is, approximately one half of findings identified on previous mammograms are actionable [31].
Ikeda et al. [21] discussed a subset of cancers with perceptible but nonspecific mammographic findings that are marked by CAD even when the findings do not warrant recall. We therefore separated the minimal sign group into actionable (findings that warrant recall) and nonactionable (nonspecific findings that probably would not warrant recall in daily practice, even if prompted). Even after the latter are excluded from analysis, CAD still offers an additional potential benefit in a double reading setting. Of the 39 radiographically visible and actionable cancers in our study, 14 (36%) that were interpreted as normal after double reading were correctly marked by CAD in at least one view on FFDM. The corresponding number for screen-film mammography was five (13%). We believe these patients would have had a high probability of being recalled in the baseline screening round if CAD had been used.
Three limitations of this study need to be addressed. First, the study was a retrospective CAD analysis, which can provide insight into the potential effect of CAD on reader sensitivity. A prospective study is necessary to determine the actual benefit of CAD in daily practice. Our analysis was focused on the degree of correlation between CAD findings and clinical decisions to measure the potential benefit from CAD. Although other investigators have retrospectively measured this potential benefit of CAD, we are not aware of studies that included an analysis of cancers detected with FFDM screening or that were conducted with cases collected from a screening program such as the Oslo I trial. We believe a strength of our study was that our data set consisted not only of cancers detected with consecutive baseline screening with both screen-film mammography and FFDM but also of baseline mammograms of interval and subsequently detected cancers.
In a retrospective analysis such as that performed in this study, the measured effect of CAD is a potential rather than an actual benefit, because one cannot know whether a particular true-positive CAD mark would change a radiologist's decision in daily practice. We took a conservative approach in our retrospective classification of the missed cancers. We did not include all retrospectively visible cancers but included only cancers visible in retrospect that were also judged to be actionable, that is, those classified as overlooked and minimal sign actionable. Thus, if a retrospective review of the baseline mammograms were to reveal only a subtle nonspecific finding in the location of the subsequent cancer, it is likely that the radiologist in daily practice would (and should) discard such a prompt, assuming it is a false CAD mark. Acting on such nonspecific findings (although "correctly" marked by CAD) would result in an unacceptable increase in the recall rate, as shown by Ikeda et al. [21]. We believe that inclusion of such a CAD mark in an analysis as a true-positive CAD mark would lead to an overestimate of the potential benefit of CAD in daily practice. Therefore, we subgrouped minimal-sign lesions into nonactionable and actionable and included the latter only when marked by CAD as a true-positive finding and one that radiologists would likely act on.
A second limitation of this study was the small number of cases of cancer and consequently the lack of statistical power. The low number of cases of cancer may explain the 100% potential performance of radiologists plus CAD in FFDM mode. This finding is likely spurious. It seems unlikely that with the use of CAD, all patients with lesions marked by CAD would have been recalled in the baseline round, even though the lesions were visible and actionable. However, these cancers would have had a higher probability of being recalled in the baseline screening round with the use of CAD.
Third, this study did not show whether the CAD false-mark rate would lead to an increase in the recall rate greater than the increase in the cancer detection rate. Experience with single reading of screen-film mammograms suggests that the recall rate increases with CAD, but by an amount comparable with or less than the increase in the cancer detection rate [12-16].
Our findings show the potential benefit of CAD input for radiologists interpreting FFDM screening mammograms, even in screening programs with independent double reading. To our knowledge, our study is unique in that we compared CAD with the results of prospective independent double reading using a 5-point rating scale, as in daily practice, and in that we used two imaging techniques (screen-film mammography and FFDM). Furthermore, we measured the correlation between the radiologists' findings and CAD performance on FFDM, and thereby the potential benefit of CAD in FFDM, which we believe was important to establish.
In conclusion, our results indicate that CAD has the potential for increasing the cancer detection rate, even in breast cancer screening programs using independent double reading. Furthermore, our results indicate that CAD may be of particular value when FFDM with soft-copy reading is introduced into breast cancer screening programs.