Beth Israel Deaconess Medical Center and Harvard Medical School Boston, MA 02215
The article by Brem et al. [1] concludes that the "use of the computer-aided detection (CAD) system significantly improved the detection of breast cancer by increasing radiologist sensitivity by 21.2%." Significance is in the eyes of the beholder, and in the scientific literature it should be used only in the sense of statistical significance. On the other hand, I doubt that using tenths of a percentage point in this conclusion is statistically significant. Be that as it may, I find this article to be slightly misleading because it does not discuss the tradeoff between sensitivity and specificity.
If I review 100 screening mammograms a day, I will linger over and eventually pass approximately 10 findings that I deem minimally suspicious (chance of cancer believed by me to be < 1/500). If these estimates are correct, and I review 100 mammograms per day 5 days per week, then approximately twice per year one of these equivocal findings will later be diagnosed as cancer and come back to haunt me, or at least be brought to my attention and perhaps to the attention of a lawyer with a CAD system. If I called back each of these women, my sensitivity would increase, as it did in the article by Brem et al. [1], but my callback rate would increase from its present 6% to well above the accepted upper limit of 10%.
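The arithmetic behind this tradeoff can be sketched in a few lines. This is only an illustration under stated assumptions: the function names are hypothetical, the 50 working weeks per year are assumed rather than stated in the letter, the 10 equivocal findings are assumed to occur in women who would not otherwise be recalled, and 1 in 500 is the letter's upper bound on the per-finding chance of cancer, not a measured rate.

```python
def callback_rate_if_all_recalled(baseline_rate, extra_findings, mammograms):
    """Callback rate if every equivocal finding were recalled as well."""
    return baseline_rate + extra_findings / mammograms


def equivocal_findings_per_year(per_day, days_per_week, weeks_per_year):
    """Number of minimally suspicious findings passed over in one year."""
    return per_day * days_per_week * weeks_per_year


# Figures taken from the letter: 6% baseline, 10 findings per 100 mammograms.
new_rate = callback_rate_if_all_recalled(0.06, 10, 100)

# 50 working weeks per year is an assumption made for this sketch.
findings = equivocal_findings_per_year(10, 5, 50)

# Upper bound on cancers among those findings if each carried a 1/500 risk.
upper_bound_cancers = findings / 500

print(f"callback rate if all recalled: {new_rate:.0%}")
print(f"equivocal findings per year:   {findings}")
print(f"cancers per year (at 1/500):   {upper_bound_cancers:.0f}")
```

Under these assumptions the callback rate would be about 16%, and the 1-in-500 ceiling allows up to 5 such cancers per year. The estimate of roughly two per year corresponds to a per-finding probability closer to 1 in 1,250.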
A few biases are inherent in a retrospective study such as this. Brem et al. [1] point out that if only "one or more of the three study radiologists recommended workup...the case was classified as a missed cancer...and was processed by the CAD system," assuming that the area of suspicion corresponded to the area of subsequently proven cancer. Furthermore, the participants were all expert mammographers and had a high index of suspicion that a subtle previously missed cancer was present because "the study radiologists were told normal cases might be included as distracters among the cases they received. In fact, no normal cases were included."
In conclusion, Brem et al. [1] suggest that using CAD will increase the sensitivity of cancer detection in screening mammography, but they do not address, or even discuss, the equally important question of how this increase in sensitivity might decrease specificity, with increased callbacks and benign biopsies.
References
The George Washington University Medical Center Washington, DC 20037
Note. The reader's attention is directed to a second reply by Dr. Brem, which appears after the next letter.
Archival publications that address the impact of a commercial CAD system on recall rate include Warren Burhenne et al. [2] and Freer and Ulissey [3]. Warren Burhenne et al. assessed the impact of CAD on recall rate with two separate sets of cases: one set of 23,682 cases reviewed without CAD and a second set of 14,817 cases reviewed with CAD. The same 14 radiologists from five institutions reviewed relatively similar proportions of cases in each set, and no case subject was included in both sets. The recall rate without CAD was 8.3%, and the recall rate with CAD was 7.6%. Fairly complicated statistics were used to conclude that recall rates showed no statistically significant change caused by using CAD.
Freer and Ulissey [3] assessed the impact of CAD on recall rate when CAD is used clinically. They recorded radiologists' recall decisions before and after viewing CAD marks with 12,860 prospective cases interpreted by two radiologists at one institution. The recall rate was 6.5% (95% confidence interval [CI], 6.0–6.9%) without CAD and 7.7% (95% CI, 7.2–8.1%) with CAD. These 95% CIs do not overlap, indicating that this is a statistically significant increase in recall rate, but the magnitude of the increase (1.2%) is not likely to be considered clinically important, especially when balanced with their concurrent 19.5% increase in cancer detection with CAD.
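The non-overlap argument can be checked with a rough calculation. The sketch below uses a normal-approximation (Wald) interval, which is an assumption: the method Freer and Ulissey used is not stated here, so the bounds come out close to, but not exactly equal to, the published ones. It also treats the two recall rates as independent binomial proportions, although the same cases were read before and after CAD. The function name `wald_ci` is hypothetical.

```python
import math


def wald_ci(rate, n, z=1.96):
    """Normal-approximation 95% CI for a proportion observed in n cases."""
    half_width = z * math.sqrt(rate * (1 - rate) / n)
    return rate - half_width, rate + half_width


n_cases = 12860  # prospective cases reported by Freer and Ulissey [3]

without_cad = wald_ci(0.065, n_cases)  # recall rate before viewing CAD marks
with_cad = wald_ci(0.077, n_cases)     # recall rate after viewing CAD marks

# The intervals overlap only if the upper bound without CAD reaches
# the lower bound with CAD.
intervals_overlap = without_cad[1] >= with_cad[0]

print(f"without CAD: {without_cad[0]:.1%} to {without_cad[1]:.1%}")
print(f"with CAD:    {with_cad[0]:.1%} to {with_cad[1]:.1%}")
print(f"intervals overlap: {intervals_overlap}")
```

With these inputs the two approximate intervals are separated by a small gap, which agrees with the statement that the published CIs do not overlap.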
We conducted a companion study using the same commercial CAD system that we used for the work described in our recent AJR article [1] and presented the results at the Radiological Society of North America 2001 annual meeting [4]. This study used a research design similar to that of Freer and Ulissey [3], with 4,295 cases interpreted by 16 radiologists at five institutions. One difference was that we assessed workup rate, rather than recall rate, and included short-interval follow-up in our definition of workup. This may have led to the high workup rates reported in this study. The workup rate was 16.3% without CAD and 16.8% with CAD, which, with rounding, represented a 0.5% (23/4,295) increase in workup rate with CAD. Again, this 0.5% increase is not likely to be considered clinically important when balanced with the 21.2% increase in sensitivity reported in our companion study.
This leads to another point by Hall, who questioned the significance of the 21.2% increase in sensitivity we report. Statistical significance can be assessed with the 95% CI, reported as 14.5–28.3%. Since the lower bound (14.5%) is well above 0, this is a statistically significant result. In addition, we believe that even this lower bound would be considered a clinically significant improvement in radiologist sensitivity.
Finally, Hall points out that retrospective studies have limitations. We discuss these issues in reply to a companion letter by Mark Guenin concerning our AJR article [1]. The additional limitation that Hall points out is that we did not use normal distracters for the study radiologists who reviewed the potentially missed cancer cases. We expected most of the 377 potentially missed cancer cases not to have any subsequently diagnosed cancer lesions visible on the blinded review the study radiologists performed. The earlier examinations were selected because mammograms were available, without regard for the visibility of lesions on them. Therefore, we believed that this data set would have an adequate number of distracters without including additional ones. Because 177 of the 377 potentially missed cancer cases in fact did have lesions that were detected on blinded review, the other 200 cases functioned in effect as distracters. We did not include this discussion in the article, for the sake of brevity. Hopefully, this response clarifies the issue for Hall and other readers who may have had similar questions.
In summary, the available literature shows that CAD does not significantly affect recall rate. We believe this adds further support to our conclusion that the 21.2% increase in radiologist sensitivity using CAD that we report (and that is confirmed by Warren Burhenne et al. (20.8%) [2] and Freer and Ulissey (19.5%) [3]) is significant, as we discuss in our companion response to Guenin.
References