Original Research
1 Group Health Cooperative, Center for Health Studies, Seattle, WA 98101.
2 National Cancer Institute, Applied Research Program, Division of Cancer
Control and Population Sciences, 6130 Executive Blvd., MSC 7004, EPN 4500,
Bethesda, MD 20892.
3 Department of Radiology, University of Washington, Seattle Cancer Care
Alliance, Seattle, WA 98109.
Received June 2, 2005;
accepted after revision January 6, 2006.
This work was funded by grant CA63731 from the National Cancer Institute
(NCI), but the opinions are solely those of the authors and do not imply any
endorsement by NCI or the federal government. In addition, R2 Technology
provided equipment and technical assistance for the project. The findings are
those of the authors and cannot be construed to reflect the thoughts or
opinions of R2 Technology.
Abstract
MATERIALS AND METHODS. We created a stratified random sample of screening mammograms weighted toward difficult cases and split evenly between women with fatty breast tissue and those with dense breast tissue: 114 patients were cancer-free, 114 had cancer diagnosed within 1 year after screening, and 113 had cancer diagnosed 13-24 months after screening. In test settings 6 months apart, 19 community radiologists interpreted 341 bilateral screening mammograms with and without CAD. We compared the sensitivity and specificity using regression models adjusting for repeated measures.
RESULTS. CAD assistance did not affect overall sensitivity (cancer by 1 year: 63.2% without CAD and 62.0% with CAD; cancer in 13-24 months: 33.5% without CAD and 32.3% with CAD), but its effect differed for visible masses that were marked by CAD compared with those that were not marked by CAD (hereafter referred to as "unmarked"). CAD was associated with improved sensitivity for marked visible cancers and decreased sensitivity for unmarked visible masses; the sensitivities without and with CAD, respectively, were as follows: marked cancer by 1 year, 82.7% versus 83.1%; marked cancer in 13-24 months, 55.2% versus 57.9%; unmarked cancer by 1 year, 37.4% versus 30.1%; unmarked cancer in 13-24 months, 29.7% versus 23.0% (p < 0.03 for both interactions between assistance and CAD marking for cancer by 1 year and cancer in 13-24 months). CAD marked 77% (70/91) of the visible cancers by 1 year and 67% (37/55) of the visible cancers in 13-24 months. CAD marked more visible calcified lesions (86%) than masses and asymmetric densities (67%) (p < 0.05). Overall specificity was 72% without and 75% with CAD (p < 0.02). CAD had a greater effect on both specificity (p < 0.02) and sensitivity (p < 0.03) among radiologists who interpret more than 50 mammograms per week. The results were the same for fatty breast tissue and dense breast tissue.
CONCLUSION. In this experiment, CAD increased interpretive specificity but did not affect sensitivity because visible noncalcified lesions that went unmarked by CAD were less likely to be assessed as abnormal by radiologists. Breast density did not affect CAD's performance.
Keywords: BI-RADS, breast cancer, computer-assisted detection, diagnosis, screening mammography
Methods for improving mammography interpretation are challenging because increasing radiologists' ability to find cancer when it is present (sensitivity) often comes at the cost of decreasing the ability to conclude that cancer is absent in someone who is disease-free (specificity). True improvements in mammography screening performance reflect either an increased ability to find lesions (detection) or better ability to distinguish benign from malignant lesions (discrimination) once detection occurs. When individuals simply interpret a higher proportion of lesions as being abnormal without improving detection or discrimination, sensitivity increases at a cost to specificity.
One promising approach to improving mammography interpretation is computer-assisted detection (CAD) [12-14]. This approach uses computer software to identify and mark areas of concern on digitized versions of film-screen mammograms [15] or soft-copy images obtained with full-field digital mammography equipment [16]. The marked images can be displayed on a small monitor at the base of the reviewer board after the initial interpretive review of the original films or on a large monitor as soft-copy images for full-field digital image review [12, 14].
Early evaluations of CAD showed improvements in cancer detection [13, 17, 18], although a more recent study has raised questions about CAD's performance [19]. CAD is now a U.S. Food and Drug Administration-approved and reimbursed service [5, 15, 18], but several issues remain unresolved including the effect of CAD on specificity, whether CAD can be used to find cancers earlier than otherwise would occur, and CAD's performance in dense versus fatty breast tissue [20-22]. This study was undertaken to evaluate whether CAD improves interpretive performance in a carefully constructed set of cases that allow the evaluation of cancer detection within 1 versus 2 years after screening, characterization of cancers, and the evaluation of the differential effects of CAD in fatty versus dense breasts. We hypothesized that CAD would improve sensitivity at no cost to specificity and would enable more accurate assessment of fatty compared with dense breast tissue.
≤ 50/wk (n = 10); 51-100/wk (n = 7); > 100/wk (n = 2).
Radiologists consented to participate in this
study and were given continuing medical education credit for their time. The
study was reviewed and approved by the health plan's human subjects and study
review committees.
Sample
We randomly sampled women with at least 2 years of health plan enrollment
subsequent to screening mammograms interpreted in the health plan from 1996
through 1998. Each selected woman contributed at most one examination to the
potential sample, with the most recent examination selected for inclusion when
more than one was available. We excluded examinations from women with a
diagnosis of breast cancer before 1996, those missing a breast density
assessment or an interpretation consistent with the American College of
Radiology lexicon in use at the time
[23], and those with a history
of breast augmentation.
We identified a set of 56,387 screening mammography examinations that met eligibility criteria, including 28,965 (51%) from women with fatty breasts (BI-RADS density 1 or 2, almost entirely fat and scattered fibroglandular densities) and 27,422 (49%) from women with dense breasts (BI-RADS density 3 or 4, heterogeneously dense and extremely dense) [23]. For these women, we linked data from screening mammograms to data from the Surveillance Epidemiology and End Results Reporting (SEER) Registry. We identified 527 women with invasive cancer or ductal carcinoma in situ diagnosed within 2 years of the selected screening mammogram: 226 women with fatty breasts and 301 women with dense breasts. Including all tumors diagnosed within 2 years of screening meant that the entire set of tumors detected at the time of the screening examination or in the subsequent interval was included as potential cases. We excluded women with lobular in situ lesions.
We used stratified sampling to select films, allowing us to create a difficult test set that focuses on three key issues: first, the effect of CAD on sensitivity and specificity; second, the differential effect of CAD by breast density; and, third, the ability of CAD to enable radiologists to find cancers earlier than using conventional mammography alone. Films were stratified by breast density (dense vs fatty), cancer status (noncancer, cancer by 1 year, or cancer in 13-24 months), and the original radiologist's assessment in practice (BI-RADS categories 0, 4, or 5 = positive; or BI-RADS categories 1, 2, or 3 = negative). We randomly selected films within each stratum. Power calculations were based on a chi-square test to detect the overall effect of assistance by simultaneously calculating the statistical significance of changes in true-positive and false-positive rates [24]. We planned to include at least 240 cancers (60 in each density-time-to-diagnosis stratum) and 120 noncancers (60 in each density stratum) to achieve 80% power to find a 2.5-point difference in either sensitivity or specificity.
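The stratified draw described above can be illustrated with a short sketch. It is only an illustration under stated assumptions: the field names (`density`, `cancer_status`, `original_positive`), the fixed `per_stratum` size, and the synthetic film list are hypothetical and are not taken from the study, whose actual stratum sizes are those reported in Table 1.

```python
import random
from collections import defaultdict


def stratified_sample(films, per_stratum, seed=0):
    """Randomly draw up to `per_stratum` films from each stratum.

    A stratum is the combination of breast density, cancer status, and the
    original radiologist's dichotomized assessment (hypothetical field names).
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for film in films:
        key = (film["density"], film["cancer_status"], film["original_positive"])
        strata[key].append(film)

    sample = []
    for key in sorted(strata):
        pool = strata[key]
        # Take everything if a stratum holds fewer films than requested.
        k = min(per_stratum, len(pool))
        sample.extend(rng.sample(pool, k))
    return sample


# Synthetic films, for illustration only (not study data).
films = [
    {
        "id": i,
        "density": ("fatty", "dense")[i % 2],
        "cancer_status": ("noncancer", "cancer_by_1yr", "cancer_13_24mo")[i % 3],
        "original_positive": i % 5 == 0,
    }
    for i in range(600)
]
sample = stratified_sample(films, per_stratum=10)
```

Sampling within each stratum, rather than from the pooled list, is what lets a test set over-represent rare combinations such as cancers that were originally read as negative.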
Table 1 shows the distribution of selected films across the sampling strata. We took a random sample among women in each quadrant of a table of cancer by 12 months (yes or no) after screening, cancer in 13-24 months (yes or no) after screening, and interpretation (negative or positive) for each level of density (fatty or dense). This test set was more difficult than a radiologist would encounter in practice. Table 2 shows the expected sensitivity and specificity values based on our sampling, and those numbers (e.g., 71.4% and 75.8% for cancers by 1 year in fatty breasts) are substantially lower than would occur in clinical practice, where sensitivity and specificity are reported to be 78% and 92%, respectively [25].
Expert Review
For analysis, all films were reviewed for BI-RADS density by an expert
radiologist on our team. Films from women with cancer were also reviewed for
visibility of the cancer and whether a CAD mark corresponded to the area of
the cancer. The radiologist had all prior films and those leading to the
diagnosis during her evaluation of the study films. She was also told the
location of the cancer as recorded in SEER. She used the 4-point density scale
recommended in BI-RADS and recorded visibility as "easily
visible," "visible (should have been found by most
radiologists)," "subtle (could have been found and worked up on a
good day)," "visible only in retrospect," and "not
visible." A tracing of the mammogram was made on a clear overlay of the
image and the cancer was circled. At a separate time, a printout of the
CAD-marked image was compared with the overlay to establish if the CAD mark
was within the area of the cancer. The visible cancers were morphologically
classified by one investigator as one of the following: mass, architectural
distortion, asymmetric density, asymmetric density and architectural
distortion, calcification, or mixed type with calcification.
Training in CAD
We used the ImageChecker M2 1000 system (version 2.2, R2 Technology) for
this test. A 2-hour course in CAD was required before the reviewers began
interpreting images, and a shorter version was given before resuming the
readings 6 months later. The course included practice cases reviewed by
representatives of R2 Technology on the study viewing board and an explanation
of CAD. Before each study was reviewed by a radiologist, 10 cases were
reviewed to reacquaint the radiologist with the R2 equipment and the
data-recording procedures. The results of these interpretations were not
included in the analysis.
Interpretive Testing
Figure 1 shows the design of
the reinterpretation study. Each radiologist rated all the films both with and
without assistance in two independent 4-hour sessions at the baseline (first
interpretation) and then again 6 months later (second interpretation). To
control for the order of computer assistance, radiologists were divided into
two groups (A and B) and films were divided into two sets (I and II), as noted
in Figure 1. The two film sets
included an approximately equal number of films from each stratum. The order
of the sets was changed for the two radiology groups so that we could evaluate
whether findings were affected by seeing the films with CAD assistance at the
first or second interpretation. Within the test sets, the order was not
altered between the first and second interpretations.
At each review session, radiologists recorded their assessments on data sheets that included separate ratings for each breast using BI-RADS coding (category 0, 1, 2, 3, 4, or 5). BI-RADS coding in use at the time associated a descriptive summary and recommendations with each of the following assessments: category 0 = incomplete, 1 = negative, 2 = benign, 3 = probably benign, 4 = suspicious for malignancy, and 5 = highly suggestive of malignancy. Radiologists also recorded their start and end time for each review session.
Analysis
Our analyses examine the effect of computer assistance on the average
sensitivity and specificity of assessments made by a group of radiologists. We
dichotomized the BI-RADS scale by considering BI-RADS categories 1, 2, and 3
assessments as "negative" and BI-RADS categories 0, 4, and 5
assessments as "positive." We combined the two breast assessments
into a single assessment per woman using the assessment in the breast with
cancer for women with unilateral cancer, the minimum assessment for women with
bilateral cancer (i.e., a negative assessment if one was given), and the
maximum assessment for women without cancer (i.e., the positive assessment if
one was given). When calculating maximum and minimum assessment for a woman,
category 0 cases were treated as a 3.5 in the ordered BI-RADS categories from
1 to 5.
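The scoring rule in this paragraph can be written out as a small sketch. The function and argument names are hypothetical; the logic follows the text: categories 0, 4, and 5 are positive, category 0 is ranked as 3.5, and the woman-level assessment depends on where the cancer is.

```python
def rank(category):
    """Order BI-RADS categories 1 to 5, placing category 0 at 3.5."""
    return 3.5 if category == 0 else float(category)


def is_positive(category):
    """BI-RADS 0, 4, and 5 count as positive; 1, 2, and 3 as negative."""
    return category in (0, 4, 5)


def woman_level_assessment(left, right, cancer_side):
    """Combine two breast-level BI-RADS categories into one per woman.

    cancer_side is 'left', 'right', 'both', or None for women without cancer.
    """
    if cancer_side == "left":
        return left
    if cancer_side == "right":
        return right
    if cancer_side == "both":
        # Bilateral cancer: the least abnormal of the two assessments.
        return min(left, right, key=rank)
    if cancer_side is None:
        # No cancer: the most abnormal of the two assessments.
        return max(left, right, key=rank)
    raise ValueError("cancer_side must be 'left', 'right', 'both', or None")
```

For example, a woman without cancer read as category 0 in one breast and category 3 in the other is scored as category 0 (positive), whereas a woman with bilateral cancer read as categories 4 and 1 is scored as category 1 (negative).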
We describe the overall test performance using simple averages of sensitivity and specificity across radiologists. We tested for differences in sensitivity and specificity using generalized estimating equations that adjust for clustering due to repeated assessment of mammograms both between and within radiologists [26]. This approach focuses on correct estimation of SEs of regression coefficients for non-nested data, but it does not allow estimation of CIs for estimated sensitivity and specificity. We separately estimated the effects of covariates on sensitivity and specificity using logistic regression models, and we tested for both overall effects of assistance and for differential effects of assistance as a function of mammogram and radiologist characteristics. We tested for differential intervention effects (i.e., effect modification) by including interactions between assistance and covariates.
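As a rough illustration of the descriptive step only, the sketch below computes each radiologist's sensitivity and specificity and then takes the simple average across radiologists. The data layout and names are hypothetical, and the sketch does not reproduce the generalized estimating equations used for significance testing.

```python
from statistics import mean


def sensitivity(readings):
    """Share of cancer cases given a positive woman-level assessment."""
    cancers = [r for r in readings if r["cancer"]]
    return sum(r["positive"] for r in cancers) / len(cancers)


def specificity(readings):
    """Share of noncancer cases given a negative woman-level assessment."""
    noncancers = [r for r in readings if not r["cancer"]]
    return sum(not r["positive"] for r in noncancers) / len(noncancers)


def average_across_radiologists(readings_by_reader, metric):
    """Simple (unweighted) average of a per-radiologist metric."""
    return mean(metric(readings) for readings in readings_by_reader.values())


# Toy readings for two hypothetical radiologists (not study data).
toy = {
    "reader_a": [
        {"cancer": True, "positive": True},
        {"cancer": True, "positive": False},
        {"cancer": False, "positive": False},
        {"cancer": False, "positive": True},
    ],
    "reader_b": [
        {"cancer": True, "positive": True},
        {"cancer": True, "positive": True},
        {"cancer": False, "positive": False},
        {"cancer": False, "positive": False},
    ],
}
avg_sensitivity = average_across_radiologists(toy, sensitivity)
avg_specificity = average_across_radiologists(toy, specificity)
```

Averaging per-radiologist rates gives each radiologist equal weight, which can differ slightly from pooling all readings when radiologists contribute different numbers of assessments.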
Covariates considered in the separate models included breast density (dense
or fatty), reviewer experience (< 10 vs ≥ 10 years), interpretation volume
(≤ 50 vs > 50 mammograms/wk), and order of presentation of test
films (assistance first or second). For women with cancer, we also used
separate models to test the effect on sensitivity of the following: lesion
visibility, presence of calcifications, and presence of a mass. Because our
sample included relatively few radiologists, covariance estimates adjust for
the small cluster size [27].
Statistical tests are based on corresponding regression coefficients and
reflect two-sided tests.
Tests for the association between marking of cancers by the CAD algorithm and lesion visibility, breast density, and timing of cancer diagnosis are based on the chi-square statistic or Fisher's exact test.
Of the 21 radiologists who initially agreed to participate in this study, 19 interpreted films during both review sessions including four who had prior CAD experience. Most (n = 17, 89%) interpret mammograms less than 40% of their work time and about half (n = 10, 53%) interpret 50 or fewer mammograms per week. Their average review time for approximately 90 examinations was 1 hour 35 minutes for unassisted review and 1 hour 32 minutes for assisted review.
Sensitivity
Sensitivity on the test set was higher than the positive proportion in
practice. On the test set, sensitivity dropped slightly with computer
assistance (cancer by 1 year: 63.2% vs 62.0%; cancer in 13-24 months: 33.5% vs
32.3%, respectively), but the differences with and without assistance were not
statistically significant for either cancer group. As shown in Figures
2A and
2B, there was no apparent
separation of the performance of assisted and unassisted interpretation when
graphed as their true-positive versus false-positive rate for cancers by 1
year or cancer in 13-24 months. However, in the modeling of sensitivity,
reading volume significantly modified the effect of assistance on sensitivity
for detection of cancers within 1 year (p < 0.02, based on the
interaction between assistance and reading volume).
Among radiologists who interpret 50 or fewer mammograms per week, CAD was associated with small improvements in sensitivity (cancer by 1 year: 1.7 points higher with assistance [63.3% without vs 65.0% with]; cancer in 13-24 months: 0.1 point higher with assistance [34.8% vs 34.9%]). Among radiologists who interpret more than 50 mammograms per week, CAD was associated with decreased sensitivity (cancer by 1 year: 4.5 points lower with assistance [63.2% vs 58.7%]; cancer in 13-24 months: 2.7 points lower with assistance [32.2% vs 29.5%]). We did not find evidence that the effect of assistance on sensitivity was modified by either breast density (p > 0.5, both cancer groups) or lesion visibility (p > 0.15, both cancer groups). Finally, the effect of computer assistance was not altered by whether CAD was used with the test set during the first or second round of review (p > 0.15, both cancer groups).
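The subgroup figures can be checked against the overall sensitivities with a reader-weighted average. This is a consistency sketch, not part of the original analysis; it assumes 10 radiologists in the lower-volume group and 9 in the higher-volume group (19 total), and the variable names are hypothetical.

```python
N_LOW, N_HIGH = 10, 9  # radiologists reading <= 50 and > 50 mammograms per week


def pooled(low, high):
    """Reader-weighted average of the two volume subgroups."""
    return (N_LOW * low + N_HIGH * high) / (N_LOW + N_HIGH)


# (lower-volume %, higher-volume %) as reported in the text.
subgroups = {
    ("cancer by 1 year", "without CAD"): (63.3, 63.2),
    ("cancer by 1 year", "with CAD"): (65.0, 58.7),
    ("cancer in 13-24 months", "without CAD"): (34.8, 32.2),
    ("cancer in 13-24 months", "with CAD"): (34.9, 29.5),
}

# Overall sensitivities as reported in the text.
reported = {
    ("cancer by 1 year", "without CAD"): 63.2,
    ("cancer by 1 year", "with CAD"): 62.0,
    ("cancer in 13-24 months", "without CAD"): 33.5,
    ("cancer in 13-24 months", "with CAD"): 32.3,
}

differences = {
    key: pooled(*values) - reported[key] for key, values in subgroups.items()
}
for key, diff in differences.items():
    print(key, round(pooled(*subgroups[key]), 2), "vs reported", reported[key])
```

Each pooled value falls within 0.1 percentage point of the reported overall figure, which is the agreement one would expect given that the subgroup values are rounded to one decimal place.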
Specificity
Specificity was higher with assistance than without assistance (75% vs 72%,
respectively; p < 0.02). Interpretation volume significantly modified
the effect of assistance on specificity (p < 0.03, based on an interaction
between assistance and volume). Assistance was associated with a larger
increase in specificity among radiologists who interpret more than 50
mammograms per week (6.6 points higher with assistance [78.4% vs 71.8%])
relative to those who interpret 50 or fewer mammograms per week (0.8 points
higher with assistance [72.4% vs 71.6%]). We did not find evidence that the
effect of assistance on specificity was modified either by breast density
(p > 0.95) or by whether the mammogram was first seen with or without CAD,
that is, whether CAD was used at the first or second interpretation
(p > 0.75).
Table 3 shows CAD marking by lesion visibility among the visible lesions. Sixty-seven percent (n = 152/227) of the cancers were visible but only 146 are considered in Tables 3 and 4 because six were missing information on markings (one cancer by 12 months and five cancers in 13-24 months). The proportion of visible masses differed by density and time to the cancer diagnosis. As expected, visibility differed significantly for masses in fatty versus dense breasts (p < 0.05). Among the cancers within fatty breasts (n = 124), 23% (n = 28) were easily visible, 18% (n = 23) were visible, 23% (n = 28) were subtle, 9% (n = 11) were visible in retrospect, and 27% (n = 34) were not visible. Among the cancers within dense breasts (n = 103), 7% (n = 7) were easily visible, 26% (n = 27) were visible, 22% (n = 23) were subtle, 5% (n = 5) were visible in retrospect, and 40% (n = 41) were not visible.
Among the cancers found within 12 months (n = 114), 25% (n = 28) were easily visible, 31% (n = 35) were visible, 22% (n = 25) were subtle, 4% (n = 4) were visible in retrospect, and 19% (n = 22) were not visible. Among the cancers found in 13-24 months (n = 113), 6% (n = 7) were easily visible, 13% (n = 15) were visible, 23% (n = 26) were subtle, 11% (n = 12) were visible in retrospect, and 47% (n = 53) were not visible.
Among the visible cancers with both lesion type and marking information (n = 145), 50% (n = 72) were masses without calcifications, 16% (n = 23) were calcifications only, 14% (n = 21) were mixed types with calcifications, 13% (n = 19) were asymmetric densities, 4% (n = 6) were architectural distortions, and 3% (n = 4) included both architectural distortions and asymmetric densities. The analysis of visibility and CAD marking excluded films missing CAD marking information. Among known visible cancers, the likelihood that a cancer was marked by CAD increased with the degree of visibility reported by the expert radiologist (p < 0.001) and did not differ between fatty and dense breasts (p > 0.50).
Across all cases, radiologists changed their interpretation from negative when unassisted to positive when assisted for an average of 8.4% of the cases and they changed from positive when unassisted to negative when assisted in an average of 10.5% of the cases. On average, 81% of cases were interpreted the same whether assisted or unassisted. Among the visible cancers radiologists changed their interpretation from negative when unassisted to positive when assisted in an average of 9.2% of the cases and they changed from positive when unassisted to negative when assisted in an average of 10.2% of the cases.
Table 4 shows CAD marking by cancer characteristics. Overall, 73% of visible cancers were marked by CAD. The CAD algorithm marked 77% (70/91) of the visible cancers diagnosed within 1 year and 67% (37/55) of the visible cancers found in 13-24 months. Computer marking of the lesions differed by lesion characteristics. CAD markings were more likely among cancers associated with calcifications (86%, 38/44) than among cancers without calcifications (i.e., mass, asymmetric density, or architectural distortion) (67%, 68/101) (p < 0.03).
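The comparison of marking rates by calcification status can be reproduced approximately from the counts given above. The sketch assumes an uncorrected Pearson chi-square test on the 2 x 2 table; the text says only that such tests used the chi-square statistic or Fisher's exact test, so the exact procedure is an assumption, and the function name is hypothetical.

```python
import math


def pearson_chi_square_2x2(a, b, c, d):
    """Return (statistic, p_value) for the 2x2 table [[a, b], [c, d]].

    Uses the uncorrected Pearson statistic with 1 degree of freedom.
    """
    n = a + b + c + d
    observed = [[a, b], [c, d]]
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    statistic = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Rows: with calcifications (38 of 44 marked), without (68 of 101 marked).
chi2, p = pearson_chi_square_2x2(38, 44 - 38, 68, 101 - 68)
print(f"chi-square = {chi2:.2f}, p = {p:.3f}")
```

Under this assumption the p value comes out below 0.03, in line with the reported result; a continuity correction or Fisher's exact test would give a somewhat different value.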
For cancers diagnosed by 1 year and those diagnosed in 13-24 months, the impact of CAD on sensitivity differed significantly (p < 0.03) depending on whether the technology marked the lesions. Among the visible cancers, CAD assistance was associated with increases in sensitivity when cancers were marked (from 82.7% to 83.1% for marked cancer by 1 year; from 55.2% to 57.9% for marked cancer in 13-24 months) and with decreases in sensitivity when cancers were not marked (from 37.4% to 30.1% for unmarked cancer by 1 year; from 29.7% to 23.0% for unmarked cancer in 13-24 months).
Although our results are surprising, they are consistent with recent work that showed no significant effect of CAD in an academic setting [18], but they differ from earlier reports showing improved detection [2, 10, 12, 13]. Our study offers an opportunity to better understand CAD, but comparing it with other studies is difficult because several factors influence the results of CAD evaluations: first, how improvement was measured; second, the mix of masses, asymmetric lesions, and calcified lesions involved; third, the mix of breast densities; fourth, the CAD algorithm; and, fifth, the interpretive abilities of the radiologists. How each study addresses each of these factors influences its comparability to the others.
Other work has looked at whether CAD marks lesions [4, 11, 13, 17, 28]. Our study tested the effect of CAD on radiologists' recommendations and hence on sensitivity and specificity. Our findings suggest that the radiologists did not act on all CAD marks. With CAD assistance, specificity improved or was unchanged. However, not all the visible tumors were marked by CAD and it appears that radiologists relied on the presence of CAD markings when assistance was present. One previous study showed a reduction in sensitivity with CAD assistance among cancers diagnosed by 1 year [29]. In our study, sensitivity decreases were restricted to radiologists who interpret more than 50 mammograms per week.
The early reports that describe the marking of visible lesions [13, 17] found the same rate of CAD marking that we report: CAD did not mark 23% of the lesions that were visible in retrospect [13]. Training for our study included instructions to the radiologists to recommend evaluation even if there were no CAD marks, but similar to two other studies, we found that radiologists were less likely to recommend evaluation of visible lesions that were not marked by CAD [8, 20, 22]. The reaction of radiologists to markings is critical to the success of this technology. Careful attention to this aspect of CAD instruction is essential in future training and research, and the tendency to defer to technology needs acknowledgment as a risk of its use.
The case mix, including the distribution of breast densities and tumor characteristics across films, also influences a CAD evaluation. Breast density can affect the performance of CAD because of its influence on lesion visibility [29]. Therefore, studies that have a higher proportion of dense breasts may show a lower impact of CAD on overall cancer detection because fewer lesions are visible. However, we found that marking of visible cancers occurs with similar frequency among fatty and dense breast tissue. Therefore in studies restricted to the effect of CAD on detection of visible lesions, density is unlikely to have an effect.
In keeping with previous reports, we found that calcifications are more likely to be marked by CAD than masses, asymmetric densities, or architectural distortions [12, 17, 30]. In practice, masses account for most nonpalpable breast lesions (59%) and most missed lesions (70%) [17, 31]. Within the film set examined by Warren Burhenne et al. [13] and Birdwell et al. [17], 47% of cancers were masses and 51% were calcified lesions. Because their set included a higher proportion of calcifications than expected in clinical practice, their study results may overestimate the potential impact of CAD [4, 13, 17].
Another consideration in our test set is the presence of bilateral cancers. The scoring system we used combines the breast-level interpretations into a single interpretation for each patient. For women without cancer, we took the most abnormal interpretation. For women with unilateral cancer, we took the interpretation of the breast with cancer. For women with bilateral cancer, we took the least abnormal interpretation. The four cases of bilateral cancer included in our set resulted in 113 assessments (57 unassisted and 56 assisted). Of these, 67% were in agreement for both breasts (i.e., both positive or both negative). Only 37 (33%) of the assessments of bilateral examinations were scored as negative because the radiologist did not identify both cancers. All bilateral cancers occurred within 12 months of screening, and these 37 discordant assessments represent only 0.9% of the 4,234 assessments of films with cancers diagnosed within 12 months. Because these interpretations were so rare (0.9% of the total number of interpretations), it is unlikely that the "minimum score" applied to bilateral cancers affected our results.
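The proportions quoted for the bilateral cases follow directly from the counts in this paragraph, as the short arithmetic sketch below shows. The variable names are hypothetical; the counts are those stated in the text.

```python
bilateral_assessments = 113   # 57 unassisted + 56 assisted
discordant = 37               # scored negative under the minimum rule
cancer_by_1yr_assessments = 4234

concordant = bilateral_assessments - discordant
pct_concordant = 100 * concordant / bilateral_assessments
pct_discordant = 100 * discordant / bilateral_assessments
pct_of_all = 100 * discordant / cancer_by_1yr_assessments

print(f"concordant: {pct_concordant:.0f}%")
print(f"discordant: {pct_discordant:.0f}%")
print(f"share of all 1-year cancer assessments: {pct_of_all:.1f}%")
```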
CAD algorithms continue to evolve over time, and therefore the particular version of a vendor's product that is tested will influence conclusions about CAD performance [13, 22, 32]. Product differences between vendors further complicate the evaluation of this technology. Because we used version 2.2 of ImageChecker by R2 Technology, we expected some improvement in performance relative to findings from Warren Burhenne et al. [13] who used the earlier version of ImageChecker (version 1.2). Although a newer version of ImageChecker is now on the market, our study demonstrates an approach to assessment that needs replication to test the new algorithm's performance with respect to visible noncalcified masses and asymmetric densities.
Finally, radiologist experience as reflected in training and interpreting volume may also affect CAD evaluation. Whether the study involves breast radiologists or general radiologists may affect performance measures [33]. Furthermore, the volume of images that radiologists interpret is expected to affect their performance, although the evidence for this is thin to date [14, 34, 35]. High-volume breast specialists may not have as much potential for improvement with the use of CAD as low-volume general radiologists. The CAD studies reported to date include primarily breast specialists [4, 10, 12, 13, 17]. One report including a breast specialist is from community practice [28]. Our study included mostly general radiologists and approximately half interpret 50 or fewer mammograms per week.
A potential limitation of our study is the test setting; the radiologists knew they were being evaluated and that the cancer rate in the test set was higher than that in practice. This knowledge might have made them pay closer attention to the films, and there is some evidence that the radiologists in this study elevated their performance. Sensitivity and specificity on the test set were the same or higher than expected from sampling. This elevation of practice would make it harder to show an advantage with computer assistance, but it should not affect the differential findings with respect to marked and unmarked lesions. In addition, a recent study that showed that the prevalence of lung abnormalities did not affect estimates of interpretive performance on chest radiographs is reassuring [36]. The authors of that lung study suggested that the use of a case set enriched with cancers would not bias estimates of CAD performance [36]. Finally, a recent evaluation of CAD in a practice setting showed no change in specificity and no significant improvement in detection among 24 radiologists interpreting 59,139 examinations compared with the same radiologists interpreting 56,432 examinations without CAD [18].
Another limitation of the study is that we used one radiologist as the standard for assessment of breast density and visibility of cancers. Although visualization of a gradient of markings by CAD across the radiologist's assessment of degree of visibility provides some internal validation, our findings with respect to the effect of CAD on the diagnosis of visible cancers might differ slightly if another radiologist had decided which cancers were visible.
Finally, we recognize there is no gold standard for the lesion characteristics and density designations identified in this study. The proportion of "visible" cancers and lesions with markings across lesion characteristics might change using other "experts" as the standard; however, the overall findings regarding the effect of CAD on sensitivity and specificity would not be altered. In studies in which individuals or panels of experts select visible lesions, the ambiguities of "visibility" and the unknown pool from which they are drawn (all cancers found at diagnosis, all cancers reported within a set period of time, or all cancers that occur among all screened women) make it difficult to generalize to the full range of cancers that occur in a population of women seen in a center. That generalization is what is critical to understanding whether CAD will improve practice and how well it is working.
Despite the challenges of evaluating CAD, progress is being made. The fears of making specificity worse are not being borne out. The studies by Warren Burhenne et al. [13], Freer [12], and Gur et al. [18] do not show clinically significant increases in recall among radiologists using CAD in mammographic interpretations. Our study suggests that specificity may actually improve, especially among experienced radiologists, but some of this improvement may come at a cost to sensitivity. The challenge is that to improve performance, radiologists must use judgment regarding whether to act on their findings or on the computer algorithm's markings. Our study also suggests that when CAD does not mark a lesion, reviewers may ignore their own findings, thereby decreasing their sensitivity. This finding raises the question of whether there is a potential for CAD to do harm.
To mitigate this potential harm, training should emphasize lesion characteristics of masses, asymmetric densities, and architectural distortions that may be missed by CAD while efforts continue to improve algorithms used to identify these lesions. For the algorithms to be improved, they should be first tested on a common test set with a known proportion of cases among high-versus low-density breast tissue and a known proportion of masses, calcifications, and mixed lesions to allow comparison with existing algorithms before testing new ones in a clinical setting. Then in clinical tests, studies should report the lesion characteristics and the proportion of lesions marked by CAD within the lesion subgroups. Ideally, studies would be based on film sets that are representative of the population to which CAD would be applied (e.g., screening mammograms or diagnostic mammograms). If sampling is done, the distribution of films across true-positive and true-negative interpretations among cancers and noncancers should also be included.
Visible lesions that are unmarked by CAD offer a potential for harm if radiologists do not act on them because of the absence of CAD markings. These visible unmarked lesions also offer the best chance to improve mammography performance if the CAD evaluation of masses, asymmetric densities, and architectural distortions is improved through further research [32].
Acknowledgments
We wish to acknowledge the careful work of Deb Seger, Alice Park, and
Letitia Hodgkinson who handled the many details of implementing this study
within an active health care environment and Tammy Dodd for shepherding the
manuscript to completion. We also greatly appreciate the assistance of R2
Technology, which provided equipment and technical support for the
project.
16 and 900 [docket no. 95N-0192],
RIN 0910-AA24 ed. Washington, DC: Department of Health and Human Services. 1997