Tristán Associates Harrisburg, PA 17111
The definitive study of the benefits and drawbacks of computer-aided detection (CAD) for mammography was published in 2001 by Freer and Ulissey [1]. In that prospective study involving 12,860 mammograms, radiologists recorded their assessment without CAD, then reviewed the CAD analysis and modified their assessment if modification was appropriate. In other words, they duplicated the conditions of real-life mammographic interpretation. They found 19.5% more cancers with CAD than without. Perhaps the only way to improve on that study would be to repeat it at numerous institutions, to be certain that others could match their performance. The study design itself was flawless.
It therefore came as a surprise to read the article by Brem et al. in the September 2003 issue of AJR [2], which fell short of the standard set by Freer and Ulissey [1] and shed no new light on the subject. Brem et al. solicited from several institutions mammograms whose findings were interpreted as normal in women who were subsequently diagnosed with cancer by screening mammography within 2 years. Three radiologists individually reviewed each case, unaware of the anatomic site at which cancer was subsequently diagnosed, and each recorded whether he or she would have recalled the patient for additional studies; for those patients who were recalled, the site of concern was noted as well. Two additional radiologists reviewed the recalled patients' records and found that in half of them, the site of concern was indeed the location of the subsequently diagnosed cancer. Those cancers were judged to have been missed by the original interpreting radiologist. The missed cancers were analyzed using CAD, which placed marks on two thirds of them. The authors concluded that radiologists' sensitivity increased 21% with CAD.
In a study with an almost identical design published 3 years ago, Warren Burhenne et al. [3] reported virtually identical findings. Several assumptions inherent in the current study by Brem et al. and the earlier study by Warren Burhenne et al. were criticized in a letter by Sacks [4]. Briefly, the problems he identified were these:
Patients were not included for whom no earlier mammograms were available.
The time interval (9–24 months) during which earlier mammograms were obtained was set arbitrarily.
The sample did not include patients whose cancers became palpable later or whose cancer was missed again on the next round of screening.
Inevitably, most studies early in the evolution of any technology tend to be retrospective reviews, like that of Warren Burhenne et al. [3]. Prospective studies, like that of Freer and Ulissey in 2001 [1], are performed later under more realistic conditions. Why the big step backward with the article by Brem et al. [2]?
CAD is used as an aid to, never in place of, a radiologist. Therefore, a comparative analysis has little value. Proving that CAD can detect a given lesion is meaningless. What matters is its ability to convince a radiologist to pursue additional workup for a lesion that he or she might otherwise have overlooked or dismissed. The difference is critical to understanding why one should not simply add the sensitivity of CAD to the calculated sensitivity of a radiologist. If CAD never yielded a false-positive mark, then most radiologists would pursue each lesion marked by CAD, and it might be accurate simply to add the sensitivity of CAD to that of radiologists, as Brem et al. [2] have done (although the study would still suffer from the limitations pointed out by Sacks [4]). But such is far from the case. On average, CAD produces up to 2.8 marks per four-view mammogram, so most marks made by any CAD system are false-positives. Therefore, simply proving that CAD would have marked an overlooked cancer does nothing to measure whether radiologist performance would have been enhanced by CAD. A study like that of Brem et al. asks the question, "What might a radiologist theoretically have done using this technology, subject to certain numerous assumptions?" but a study like that of Freer and Ulissey asks, "How do radiologists actually perform using this technology?" I would choose the latter any day.
In summary, Brem et al. [2] have produced estimates of radiologists' sensitivity without and with CAD that do not reflect the day-to-day practice of mammography. A test under real-world conditions is the only fair test of CAD systems because their use is adjunctive to radiologists, not competitive with them.
References
The George Washington University Medical Center Washington, DC 20037
Note. The reader's attention is directed to another reply by Dr. Brem, which begins on page 1598.
Both studies were large trials conducted at multiple institutions (18 for our study [2] and 13 for Warren Burhenne et al. [3]) and led to similar outcomes:
The sensitivity of radiologists without CAD in our article was 377/(377 + 123) = 75.4%. The sensitivity of radiologists without CAD for Warren Burhenne et al. [3] was 427/(427 + 115) = 78.8%.
The sensitivity of radiologists when aided by CAD in our article was (377 + 80)/(377 + 123) = 91.4%. The sensitivity of radiologists when aided by CAD for Warren Burhenne et al. [3] (computation not shown in their article, but using the same calculation method as in our article with their data) was (427 + 89)/(427 + 115) = 95.2%.
The improvement in radiologist sensitivity with use of CAD in our article was (91.4% / 75.4%) - 1 = 21.2%. The improvement in radiologist sensitivity with use of CAD for Warren Burhenne et al. [3] (computation not shown in their article, but using the same calculation method as in our article with their data) was (95.2% / 78.8%) - 1 = 20.8%.
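The arithmetic in the three points above can be checked with a short Python sketch. The function and variable names are illustrative assumptions and do not come from either article; the counts are the ones quoted above.

```python
def sensitivity(detected, missed, cad_added=0):
    """Fraction of cancers detected; cad_added is the CAD-credited count
    added to the numerator, as in the formulas quoted in the text."""
    return (detected + cad_added) / (detected + missed)


def relative_improvement(unaided, aided):
    """Relative gain in sensitivity: (aided / unaided) - 1."""
    return aided / unaided - 1


# Counts quoted in the text:
# (detected without CAD, missed without CAD, additional detections credited to CAD)
studies = {
    "Brem et al. [2]": (377, 123, 80),
    "Warren Burhenne et al. [3]": (427, 115, 89),
}

for name, (detected, missed, cad_added) in studies.items():
    unaided = sensitivity(detected, missed)
    aided = sensitivity(detected, missed, cad_added)
    gain = relative_improvement(unaided, aided)
    print(f"{name}: {unaided:.1%} -> {aided:.1%} (relative gain {gain:.1%})")
```

The sketch reproduces only the calculation method described here; it does not model how the CAD-credited counts themselves were derived.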
In general, prospective studies such as the one accomplished by Freer and Ulissey [1] at one institution are preferable to retrospective studies such as ours [2] and the one by Warren Burhenne et al. [3], but conducting a prospective CAD study at 13–18 institutions would be a daunting task. Therefore, the data presented in both of these large-scale retrospective studies are a significant contribution to the medical literature. They even corroborate the 19.5% increase in cancer detection with CAD reported by Freer and Ulissey, with increases in radiologist sensitivity of 21.2% (our article) and 20.8% (Warren Burhenne et al.). The retrospective studies are approximately one order of magnitude larger in scale than the prospective study: the number of additional cancers detected using CAD, relative to the number detected without CAD in the same (Freer and Ulissey) or similar (our article and the one by Warren Burhenne et al.) cohorts, was eight of 41 at one institution (Freer and Ulissey), 80 of 377 at 18 institutions (our article), and 89 of 427 at 13 institutions (Warren Burhenne et al.).
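Because the total number of cancers cancels out of the ratio of aided to unaided sensitivity, each relative increase reduces to the additional detections divided by the detections without CAD. A minimal sketch with the counts quoted in the paragraph above (the dictionary layout and names are hypothetical):

```python
# (additional cancers detected using CAD,
#  cancers detected without CAD,
#  number of institutions)
cohorts = {
    "Freer and Ulissey [1]": (8, 41, 1),
    "Brem et al. [2]": (80, 377, 18),
    "Warren Burhenne et al. [3]": (89, 427, 13),
}


def relative_gain(additional, baseline):
    """Additional detections as a fraction of detections without CAD."""
    return additional / baseline


gains = {name: relative_gain(a, b) for name, (a, b, _) in cohorts.items()}

for name, gain in gains.items():
    institutions = cohorts[name][2]
    print(f"{name}: {gain:.1%} across {institutions} institution(s)")
```

The three ratios come out to roughly 19.5%, 21.2%, and 20.8%, matching the percentages cited in the text.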
We agree with Guenin that the criticisms discussed by Sacks [4] regarding the study by Warren Burhenne et al. [3] would also apply to our study. We also agree with the response by Warren Burhenne et al. [5] to Sacks' letter, which pointed out that the study design was described and its limitations addressed. Warren Burhenne et al. even mention, "as with all research studies, conclusions are always bracketed by the nature of the data acquired and the assumptions used to extrapolate results to the general population."
Guenin's suggestion that we "simply add the sensitivity of CAD to that of radiologists" is incorrect. Although the numerator of the CAD-assisted sensitivity contains the term "NCAD," we do not add CAD sensitivity to that of the radiologist. NCAD is a weighted average of the CAD detections in which the weighting corresponds to the number of panel radiologists detecting that lesion. This weighting partially accounts for the fact that not all CAD-detected lesions will be followed up by the radiologist.
Guenin also mentions, "CAD is used as an aid to, never in place of, a radiologist." We would fully agree with this statement and did not say anything to the contrary. He then goes on to state, "Therefore, a comparative analysis has little value." Our study is not a comparative analysis. We do not compare the performance of CAD alone versus a radiologist alone. Instead, we show performance improvement for radiologists using CAD.
In summary, all three CAD studies discussed here address the impact of CAD in screening mammography. As a prospective study, Freer and Ulissey's study [1] uses the study design of choice, but this single-institution study included a limited number of breast cancer cases, a disease with low incidence and prevalence. Our study [2] and that of Warren Burhenne et al. [3] have the intrinsic limitations associated with retrospective studies. However, our study design allowed inclusion of large numbers of cancer cases, which amounted to approximately one order of magnitude more than in Freer and Ulissey's study. Most important, even with the use of different manufacturers' CAD systems and the differences in design and magnitude, all three of these landmark CAD studies offer the same conclusion: Using mammographic CAD is expected to result in an increase in breast cancer detection of approximately 20%. This, we most likely can agree, is a benefit to breast imaging radiologists and the women they serve.
References