Original Research
1 Department of Radiology, Caritas St. Elizabeth's Medical Center and Tufts
University School of Medicine, Boston, MA.
2 Present address: Department of Radiology, Hospital of Saint Raphael, New
Haven, CT.
3 Present address: Department of Radiology, Boston University School of
Medicine, Boston Medical Center, 88 East Newton St., Boston, MA 02118.
Received September 7, 2005;
accepted after revision February 14, 2006.
Address correspondence to P. J. Slanetz
(priscilla.slanetz{at}bmc.org).
Abstract
SUBJECTS AND METHODS. Over a 26-month period, 5,016 screening mammograms were interpreted without, and subsequently with, the assistance of the iCAD MammoReader detection system. Data collected for actionable findings included dominant feature (calcification, mass, asymmetry, architectural distortion), detection method (radiologist only, CAD only, or both radiologist and CAD), BI-RADS assessment code, associated histopathology for those undergoing biopsy, and tumor stage for malignant lesions. The study population was cross-checked against an independent reference standard to identify false-negative cases.
RESULTS. Of the 5,016 cases, the recall rate increased from 12% to 14% with the addition of CAD. Of the 107 (2%) patients who underwent biopsy, 101 (94%) were prompted by the radiologist and six (6%) were prompted by CAD. Of the 124 biopsies performed on actionable findings in the 107 patients, findings in 79 (64%) were benign and in 45 (36%) were in situ or invasive carcinoma. Three study participants who were not recalled by the radiologist with the assistance of CAD developed cancer within 1 year of the screening mammogram and were considered to be false-negative cases. The radiologist detected 43 (90%) of the 48 total malignancies and 45 (94%) of the 48 malignancies with the assistance of CAD. CAD missed eight cancers that were detected by the radiologist, which presented as architectural distortions (n = 3), irregular masses (n = 4), and a circumscribed mass (n = 1). CAD detected two in situ cancers as a faint cluster of calcifications that had not been perceived by the radiologist and one mass that was dismissed by the radiologist, accounting for at least a 4.7% increase in cancer detection rate. Sensitivity of screening mammography with the use of CAD (94%) represented an absolute and relative 4% increase over the sensitivity of the radiologist alone (90%). Specificity of screening mammography with and without the use of CAD was 99%.
CONCLUSION. Routine use of CAD while interpreting screening mammograms significantly increases recall rates, has no significant effect on positive predictive value for biopsy, and can increase cancer detection rate by at least 4.7% and sensitivity by at least 4%. This study provides "true" values for sensitivity and specificity for use of CAD in interpretation of screening mammography as measured prospectively in the context of a working clinical setting.
Keywords: breast cancer computer-aided detection mammography screening
Although there is a preponderance of evidence related to the usefulness of CAD as measured retrospectively in a laboratory setting [5, 7, 8], there are relatively few studies that prospectively examine the use of CAD in a working clinical environment [9-11]. Furthermore, there are no studies that provide "true" sensitivity and specificity of the use of CAD in the interpretation of screening mammography in a working clinical setting. It is time-consuming and expensive to measure prospectively the effect of the use of CAD, and the low incidence of breast cancer in a screening population largely limits the statistical significance of findings. Furthermore, measurement of sensitivity and specificity requires long-term follow-up and comparison of study participants against an independent reference source. Although retrospective studies can be designed to have the statistical power to conclude about the significance of findings, they are not true indicators of how CAD performs in the clinical arena and do not provide measures of sensitivity and specificity of screening mammography with the use of CAD in a working clinical setting.
The purpose of this article is to prospectively assess the clinical usefulness of CAD in the interpretation of screening mammograms and to determine its impact on recall rate, positive predictive value, cancer detection rates, and the sensitivity and specificity of screening mammography.
During the 26-month study period, a total of 5,016 asymptomatic women undergoing screening mammography and CAD evaluation were enrolled in the protocol. Standard screening guidelines were used, and high-risk patients between the ages of 35 and 40 years were encouraged to undergo screening. Patients with a history of breast carcinoma were accepted into the study as long as they had been disease-free for 5 years.
A total of 4,835 (98%) bilateral and 123 (2%) unilateral mammograms were obtained, with a standard two-view evaluation of each breast performed by a mammography-certified radiology technologist. All were imaged using Instrumentarium Performa or Diamond conventional film-screen systems (GE Healthcare). The mammograms were subsequently digitized and analyzed by the iCAD MammoReader system. This particular system uses two types of marks: circles indicating areas of possible microcalcifications and crosses indicating areas of possible masses or architectural distortion.
Each mammogram was first interpreted by the radiologist without knowledge of CAD results. In cases in which prior films were available, the images were compared with a study at least 2 years before the current study. If there was a positive finding, the radiologist made a decision regarding recall or follow-up before reviewing the CAD results. The digitized CAD-analyzed image with associated marks was then reviewed, with the radiologist focusing on areas marked by CAD. As a result of this review, the radiologist noted any actionable findings, defined as an abnormality on the screening study prompting the interpreting radiologist to recall a patient for additional imaging, and made additional recall decisions as necessary. In instances in which a finding prompted radiologist recall before review of CAD analysis, the radiologist marked the area on the printed CAD-analyzed image. On these images, "+" marks indicated potential masses identified by CAD, "O" marks indicated potential calcifications identified by CAD, and crayon marks indicated areas prompting recall by the interpreting radiologist. In this study, a radiologist could make the decision to recall a patient on reevaluation after viewing CAD results, but could not change a recall decision to a negative assessment based on lack of CAD marking in an area of potential abnormality.
Data collected for actionable findings included dominant feature (calcification, mass, asymmetry, architectural distortion), detection method (radiologist only, CAD only, or both radiologist and CAD), BI-RADS assessment code, associated histopathology for those undergoing biopsy, and tumor stage for malignant lesions. We did not systematically analyze lesion size or breast density in our study. The study population was subsequently cross-checked against the Caritas St. Elizabeth's Medical Center cancer registry of all breast malignancies diagnosed between June 1, 2002 and July 31, 2005, to determine the pool of patients who developed malignancy within 12 months of screening mammography.
All data were entered into a Microsoft Excel spreadsheet for analysis, and all statistical analysis was performed using the Statistical Analysis Toolpak (Microsoft). A p value of less than 0.05 was considered to be statistically significant.
As Table 1 shows, the use of CAD increased the total number of recalls by 15% (from 607 to 696) and increased the number of actionable findings by 12% (from 727 to 816). The recall rate increased in absolute terms by 2% (from 12% to 14%), which represented a statistically significant increase. The use of CAD increased the number of actionable microcalcification findings by 17% (from 153 to 179) and the number of actionable asymmetry and masses by 11% (from 574 to 637), producing a slight shift in the proportion of microcalcifications and asymmetry or masses deemed actionable.
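The article does not state which statistical test was used to assess the change in recall rate. As an illustrative sketch only, the recall figures above can be reproduced from the raw counts, and one plausible paired test (an exact McNemar test) can be applied under the assumption that the 89 CAD-prompted recalls are the only discordant readings, because the protocol did not allow a recall decision to be reversed after CAD review. The function and variable names below are hypothetical and are not taken from the study.

```python
from math import comb


def recall_rate(recalls: int, screened: int) -> float:
    """Fraction of screened women who were recalled."""
    return recalls / screened


def mcnemar_exact_p(b: int, c: int) -> float:
    """Two-sided exact McNemar p value from the two discordant-pair counts."""
    n = b + c
    k = min(b, c)
    # Binomial(n, 0.5) probability of a split at least as lopsided as observed
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Counts reported in the text
SCREENED = 5016
RECALLS_WITHOUT_CAD = 607
RECALLS_WITH_CAD = 696

rate_without = recall_rate(RECALLS_WITHOUT_CAD, SCREENED)
rate_with = recall_rate(RECALLS_WITH_CAD, SCREENED)
added_recalls = RECALLS_WITH_CAD - RECALLS_WITHOUT_CAD
relative_increase = added_recalls / RECALLS_WITHOUT_CAD

# Assumption: CAD could only add recalls, so the discordant pairs are (89, 0)
p_value = mcnemar_exact_p(added_recalls, 0)

print(f"recall rate without CAD: {rate_without:.1%}")
print(f"recall rate with CAD:    {rate_with:.1%}")
print(f"relative increase in recalls: {relative_increase:.0%}")
print(f"exact McNemar p value (assumed test): {p_value:.2e}")
```

Under these assumptions the p value falls far below the 0.05 threshold used in the study, which is consistent with the reported statistically significant increase; a test that treats the two readings as independent samples would not respect the paired design.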
Of the 696 patients recalled, 689 returned for subsequent diagnostic evaluation and seven were lost to follow-up. Table 2 reveals the effect of CAD on the BI-RADS assessment of recalled patients. The absolute number of patients categorized as BI-RADS category 1 or 2 increased 18% (from 391 to 462), patients categorized as BI-RADS category 3 increased 10% (from 103 to 113), and patients categorized as BI-RADS category 4 or 5 increased 6% (from 108 to 114). CAD effectively increased the proportion of patients with final BI-RADS assessments of categories 1 and 2 from 64% to 66%, while decreasing the proportion in BI-RADS categories 4 and 5 from 35% to 33%. Standard recommendations were made based on the BI-RADS assessment.
Of the 114 women recommended for biopsy, seven did not return and were deemed, for purposes of the study, lost to follow-up. Table 3 outlines the histopathologic characteristics for the 124 actionable findings in the remaining 107 women who were recommended for biopsy and who underwent the procedure. The effect of CAD was to decrease the proportion of malignant lesions (positive predictive value for biopsy) slightly, from 37% (43/117) to 36% (45/124), a change that was not statistically significant. The proportion of both benign and malignant lesions by category shifted modestly as a result of the effect of CAD. Furthermore, the use of CAD decreased the proportion of patients with malignancies to patients recalled by 9% (from 6.8% to 6.2%) and the proportion of malignant lesions to total actionable findings by 7% (from 5.9% to 5.5%).
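As a brief worked example, the positive predictive value for biopsy and the yield of malignant lesions per actionable finding can be recomputed from the counts quoted in this section. This is a minimal sketch with hypothetical names; it uses only numbers stated in the text and does not reproduce the study's significance testing.

```python
def proportion(numerator: int, denominator: int) -> float:
    """Simple proportion, e.g. malignant lesions / biopsied lesions."""
    return numerator / denominator


def relative_change(before: float, after: float) -> float:
    """Change from `before` to `after`, expressed as a fraction of `before`."""
    return (after - before) / before


# Positive predictive value for biopsy: malignant lesions / biopsied lesions
ppv_without_cad = proportion(43, 117)
ppv_with_cad = proportion(45, 124)

# Malignant lesions per actionable finding (actionable-finding counts from Table 1 text)
yield_without_cad = proportion(43, 727)
yield_with_cad = proportion(45, 816)
yield_change = relative_change(yield_without_cad, yield_with_cad)

print(f"PPV for biopsy: {ppv_without_cad:.1%} -> {ppv_with_cad:.1%}")
print(f"malignant lesions per actionable finding: "
      f"{yield_without_cad:.1%} -> {yield_with_cad:.1%} ({yield_change:+.0%})")
```

Because the sketch works from unrounded counts, the last digit may differ slightly from percentages quoted elsewhere in the article.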
As Table 4 shows, the two additional lesions detected with the aid of CAD were stage 0 and stage I tumors, increasing the proportion of early-stage (stage 0 or I) tumors detected by 1% (from 72% to 73%). The malignant mass marked by CAD on one view, but which was dismissed by the radiologist, was a stage II tumor.
Table 5 examines the effect of CAD on the detection of malignant lesions. CAD increased detection rate of malignant lesions by 4.7% (from 0.86% to 0.90%). The radiologist alone detected 43 (90%) of 48 malignant lesions. The use of CAD added two malignancies to increase the total to 45 (94%) of 48 malignant lesions. The radiologist alone detected 90% (27/30) of malignant asymmetry or masses and 89% of malignant microcalcifications (16/18). The radiologist with the assistance of CAD detected 90% (27/30) of malignant asymmetry or masses and 100% (18/18) of microcalcifications. CAD alone marked 67% (20/30) of asymmetry or masses and 100% of microcalcifications (18/18). Of the 48 malignancies detected, 77% (37/48) were initially noted by the radiologist and marked by CAD. Two percent (1/48) were marked by CAD and dismissed by the radiologist, and 4% (2/48) were not detected by either the radiologist or CAD.
The sensitivity and specificity of screening mammography both with and without the use of CAD were also calculated. The radiologist alone had a sensitivity of 90% (43/48 total malignant lesions detected), which with the assistance of CAD increased by 4% in absolute and relative terms to 94% (45/48). While not clinically relevant, CAD alone had a sensitivity of 79% (38/48). The specificity for both the radiologist alone and with the assistance of CAD was 99%. CAD missed eight cancers detected by the radiologist, which presented as architectural distortions (n = 3), irregular masses (n = 4), and a circumscribed mass (n = 1). An additional two masses that were missed by both the radiologist and CAD could not be observed on imaging and were categorized as false-negatives. CAD detected two in situ cancers as a faint cluster of calcifications that had not been perceived by the radiologist and one mass that was dismissed by the radiologist, which was categorized as a false-negative.
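The sensitivity and detection-rate figures reported above follow directly from the lesion counts. The sketch below, with hypothetical names, recomputes them and makes explicit the difference between an absolute change (percentage points) and a relative change (fraction of the starting value). It assumes the 48 malignancies and 5,016 screening examinations quoted in the text as denominators.

```python
TOTAL_MALIGNANCIES = 48
SCREENED = 5016

DETECTED_RADIOLOGIST = 43   # radiologist alone
DETECTED_WITH_CAD = 45      # radiologist assisted by CAD
MARKED_BY_CAD_ALONE = 38    # lesions marked by CAD regardless of the radiologist


def sensitivity(true_positives: int, total_positives: int) -> float:
    """True positives divided by all malignancies (true positives + false negatives)."""
    return true_positives / total_positives


sens_radiologist = sensitivity(DETECTED_RADIOLOGIST, TOTAL_MALIGNANCIES)
sens_with_cad = sensitivity(DETECTED_WITH_CAD, TOTAL_MALIGNANCIES)
sens_cad_alone = sensitivity(MARKED_BY_CAD_ALONE, TOTAL_MALIGNANCIES)

# Absolute change is in percentage points; relative change is a ratio
absolute_gain = sens_with_cad - sens_radiologist
relative_gain = absolute_gain / sens_radiologist

# Cancer detection rate per screening examination
cdr_radiologist = DETECTED_RADIOLOGIST / SCREENED
cdr_with_cad = DETECTED_WITH_CAD / SCREENED
cdr_relative_gain = (cdr_with_cad - cdr_radiologist) / cdr_radiologist

print(f"sensitivity: {sens_radiologist:.0%} -> {sens_with_cad:.0%} "
      f"(CAD alone {sens_cad_alone:.0%})")
print(f"absolute gain: {absolute_gain * 100:.1f} points, relative gain: {relative_gain:.1%}")
print(f"detection rate: {cdr_radiologist:.2%} -> {cdr_with_cad:.2%} "
      f"(relative gain {cdr_relative_gain:.1%})")
```

Because sensitivity and detection rate share the same numerators here, their relative gains computed from raw counts are identical (45/43 - 1); the figure of about 4% quoted for sensitivity corresponds to working from the rounded values of 90% and 94%.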
A true-positive mark by CAD does not directly translate to enhanced radiologist performance with CAD in a clinical setting. Rather, it is the interaction between the radiologist and the technology that produces increased cancer detection and sensitivity. In our study, there was one case of CAD marking a malignant lesion (mass) that was dismissed by the radiologist, possibly influenced by the low specificity of the CAD system. The patient presented 5 months later with a palpable mass (Figs. 1A, 1B, 1C, and 1D). Because no independent standard was used in previous prospective trials [9-11], it is impossible to determine the number of such cases for comparison with CAD specificity. As CAD systems evolve, the trade-off between sensitivity and specificity may become less severe, and further research may elucidate the association between CAD specificity and cancer detection rates.
In Freer and Ulissey's 2001 study [9], a 19.5% increase in cancer detection rate was associated with a statistically significant increase in recall rate and no decrease in positive predictive value of biopsy. Gur et al. [10] found no significant increase in either cancer detection or recall rate. It is possible, however, that an adjustment for patient characteristics would have resulted in modest increases in recall and cancer detection rate [14]. Birdwell et al. [11] found a 7.4% increase in cancer detection rate with a 1% increase in recall rate. The results from our study indicate a 4.7% increase in cancer detection rate (from 0.86% to 0.90%), a statistically significant increase in recall rate (12.1% to 13.9%), and no significant decrease in positive predictive value of biopsy (36.7% to 36.2%).
In our study, an additional 89 women were recalled and an additional six women biopsied to detect an additional two cancers, resulting in 44.5 recalls and 3.0 biopsies for each additional cancer detected. We compared these benchmarks to the results based on radiologist interpretation alone, in which 607 women were recalled and 101 women biopsied to detect 43 cancers, resulting in 14.1 recalls and 2.3 biopsies for each cancer detected. To determine whether the additional direct costs of the procedures and indirect costs associated with anxiety, pain, and discomfort of recalled and biopsied patients [15] as a result of the use of CAD are offset by the benefits of increased detection of malignancies would require a cost-benefit analysis incorporating long-term data on the effect of earlier detection on morbidity.
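The workload benchmarks in the preceding paragraph are simple ratios of procedures to cancers detected. As a minimal sketch with hypothetical names, using only the counts given in the text:

```python
def per_cancer(procedures: int, cancers: int) -> float:
    """Number of procedures (recalls or biopsies) per cancer detected."""
    return procedures / cancers


# Radiologist alone
recalls_per_cancer_base = per_cancer(607, 43)
biopsies_per_cancer_base = per_cancer(101, 43)

# Increment attributable to CAD: extra procedures per extra cancer detected
recalls_per_added_cancer = per_cancer(696 - 607, 45 - 43)
biopsies_per_added_cancer = per_cancer(107 - 101, 45 - 43)

print(f"radiologist alone: {recalls_per_cancer_base:.1f} recalls, "
      f"{biopsies_per_cancer_base:.1f} biopsies per cancer")
print(f"CAD increment:     {recalls_per_added_cancer:.1f} recalls, "
      f"{biopsies_per_added_cancer:.1f} biopsies per additional cancer")
```

The marginal ratios describe only the additional procedures prompted by CAD and are therefore much larger than the baseline ratios, which is why a formal cost-benefit analysis would be needed before drawing conclusions.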
Our methodology reflected a prospective study in a clinical environment and was largely concerned with the results of CAD as obtained from a real-life practice rather than reproducible theoretic values. Although we recognize that other investigators report a low reproducibility of marks with other CAD systems when converting analog images to digital format [18], we are not aware of this issue being studied with the iCAD system, and our study did not address this issue. In reality, however, analog-to-digital conversion is practiced by most centers in this country using CAD technology; consequently, our study represented the reality of clinical practice at this time.
The results of our study show that CAD decreased the number of false-negatives in our study from 5 to 3, outside of the low end of the range of potential benefit cited in a number of retrospective trials [5, 7, 8, 19]. However, this false-negative rate was determined in a clinical context and is not directly comparable to values obtained from retrospective studies. In one of the false-negative cases, CAD marked the lesion but the radiologist dismissed the finding, and in the remaining two false-negative cases, neither revealed any suspicious finding even on retrospective review. Thus, the results from retrospective analyses may represent maximum values for the potential effect of CAD on false-negative rates and are unlikely to be achieved in a clinical setting.
It is worthwhile to note that the radiologists who participated in this study are both experienced: one with 20 years of clinical experience, and the other with fellowship training in breast imaging and 9 years of clinical experience. Previous literature suggests that the use of CAD by an inexperienced radiologist results in a greater increase in the cancer detection rate and sensitivity than for an experienced radiologist [21]. Thus, the values we obtained for the effect of CAD on cancer detection rates and sensitivity of screening may represent lower values on a range that would be expected in a variety of clinical settings with radiologists of varying experience levels.
Although it is convenient to look at the impact of CAD on the detection rate, the ultimate goal of CAD is not to improve radiologic interpretation but to improve treatment outcomes. In their 2001 study, Freer and Ulissey [9] showed that use of CAD increased the proportion of early-stage tumors detected (stage 0 or I) from 73% to 78%. Other recent studies show CAD's effectiveness in detecting small lesions [22]. Our study similarly shows an increase in proportion of early-stage tumors detected from 68% to 70%, with two of the three lesions detected by CAD, but not by the radiologist, characterized as early stage, the third being a stage II tumor. Although it has been established that CAD can lead to earlier detection of malignant lesions, it is unclear whether this translates into the ultimate goal of improved morbidity and mortality rates.
CAD has been suggested as a potential replacement to double reviewing on the basis that its performance is comparable to that of double reviewing [23]. Double reviewing in screening mammography potentially increases the cancer detection rate by 3-15 women per 10,000 women screened and increases or decreases recall rates, depending on the method of double screening used [24-26]. Despite this improvement in sensitivity, the practice of double reviewing is uncommon in the United States, owing largely to logistic, resource, and financial considerations [14, 15, 27]. The results of this study, which show a 4.7% increase in cancer detection rate with a significant 2% increase in recall rate in absolute terms and no significant change in positive predictive value, suggest that further research, including economic analysis, is necessary to determine whether there is a role for CAD in place of, or even in conjunction with, double reviewing [16].
A number of studies have examined the effect of mammographic appearance and tumor size on radiologist interpretation and prognosis [7, 28] and on CAD sensitivity [29-32]. Data from this study are consistent with these previous studies, showing a greater sensitivity of CAD for overall microcalcifications than for masses and architectural distortions. Specifically, CAD detected 100% (18/18) of malignant lesions categorized as microcalcifications (Figs. 2A, 2B, and 2C) and 67% (20/30) of malignant lesions categorized as masses (Figs. 3A, 3B, 3C, 3D, 4A, 4B, 4C, and 4D). Our study did not examine how the system performed with different types of calcifications, which was the focus of a recent study that used a different CAD system [33].
It is noteworthy that CAD marked a malignant mass that the radiologist dismissed, suggesting that the way that a radiologist uses CAD is shaped by the radiologist's experience in using the technology and by the radiologist's notions regarding CAD's strengths and weaknesses. Although CAD may hold tremendous promise in improving the practice of screening mammography, its efficacy is ultimately limited by the ability of a radiologist to interact with CAD and capture that potential.
Fundamentally, an evaluation of CAD is a question of whether this technology merely shifts the receiver operating characteristic curve of a single radiologist or represents a point on a different curve entirely. The debate over the relationship between recall rate and cancer detection rates is especially relevant to this question [34-36]. This article does not attempt to provide an answer; instead, it provides a data point in a growing body of prospective data toward an assessment of the use of CAD in the clinical setting. It would require the weight of studies that use long-term cost-benefit analyses that look at effect on morbidity and mortality to provide a truly definitive statement. Furthermore, CAD must be shown to be a more cost-effective and feasible method for screening mammography than alternatives, including increasing radiologist recall rates or double-reviewer methodologies.
In summary, we show that the use of CAD in a clinical setting over a 26-month period increases the detection of cancer by at least 4.7% and the sensitivity of mammography by at least 4%. These increases in cancer detection and sensitivity come at the cost of a statistically significant 2% increase in recall rate, but with no statistically significant impact on the positive predictive value of biopsy. We report a "real" sensitivity of 94% and specificity of 99% for the use of CAD in interpretation of screening mammography.