Original Research
1 Breast Imaging Section, Charlotte Radiology, P.A., 1701 East Blvd., Charlotte, NC 28203.
Received July 2, 2007;
accepted after revision December 9, 2007.
Address correspondence to M. Gromet
(matthew.gromet{at}charlotteradiology.com).
Abstract
MATERIALS AND METHODS. A review was performed of 231,221 screening mammograms interpreted by experienced mammographers from 2001 through 2005 in a community-based mammography program. In 112,413 (48.6%), mammographers performed the first of two readings. In 118,808 (51.4%), they performed a single reading aided by CAD.
RESULTS. For double reading, the first reader's recall rate was 10.2%; sensitivity, 81.4%; positive predictive value, as a percentage of positive screening mammograms resulting in a tissue diagnosis of cancer within 1 year (PPV1), 4.1%; and cancer detection rate, 4.12 per 1,000. After the double-reading process, the final recall rate was 11.9%; sensitivity, 88.0%; PPV1, 3.7%; and cancer detection rate, 4.46 per 1,000. For single-reading with CAD, the recall rate was 10.6%; sensitivity, 90.4%; PPV1, 3.9%; and cancer detection rate, 4.20 per 1,000. Statistically significant results included a lower recall rate with CAD compared with double reading (10.6% vs 11.9%, respectively; p < 0.0001); increased sensitivity with CAD compared with the first reader (90.4% vs 81.4%, p < 0.0001); and increased recall rate with CAD compared with the first reader (10.6% vs 10.2%, p < 0.0001).
CONCLUSION. Double reading increased sensitivity with a modest increase in the recall rate compared with single reading. Single reading with CAD, compared with double reading, resulted in a small, but not statistically significant, increase in sensitivity with a lower recall rate. Our results indicate that CAD enhances performance of a single reader, yielding increased sensitivity with only a small increase in recall rate.
Keywords: breast cancer, computer-aided detection, CAD, double reading, mammography, screening mammography
Our screening mammography practice started in 1985; the methodology has been previously reported [7]. The practice has grown to 10 sites in the Charlotte, NC, area and now obtains approximately 66,000 screening mammograms annually. Examinations are batch-read, with negative reports rendered by mail to the patient and her physician. Patients needing additional imaging (BI-RADS 0) are recalled by telephone to one of our comprehensive breast centers where workup and biopsy, if needed, are performed. Until 2003, we double-read our screening mammograms; during 2003, we converted to single reading with CAD. As a large, single-institution program with relative consistency of methods, patients, radiologists, and data collection, we believe that results from our experience before and after CAD implementation could provide additional evidence about the usefulness of this technology.
This study compares the recall rate, sensitivity, positive predictive value (PPV), and cancer detection rate for the three reading methods: single reading with CAD, double reading, and the first reader (without CAD) in a double-reading program. Biopsy and pathology data for positive cases are also compared.
We use a strict separation of screening from diagnostic mammography patients. Patients who indicate a lump, discharge, or skin or nipple changes are redirected to diagnostic mammography. We allow patients with implants to have a screening mammogram if they otherwise qualify; four views of each breast are obtained including "push-back" (Eklund) views. Patients who are within 5 years of a breast cancer diagnosis undergo diagnostic, not screening, mammography.
All mammograms in the study were film-screen studies obtained on Lorad M-IV equipment. The film was Kodak Min-R 2000 until November 2003; for the remainder of the study period, it was Kodak Min-R EV.
Data Tracking and Medical Audit
All screening mammograms (and subsequent diagnostic studies, biopsies, and
pathology) are tracked in a customized database developed and maintained by
Complete Mammography Data Management in Charlotte, NC. For double-read
studies, each radiologist's interpretation is tracked independently, as is the
final disposition. In this way, a false-negative can be attributed to the
first reader if a cancer was detected only after the second reading.
False-negatives include all cancers diagnosed within 1 year of a negative screening interpretation (excluding those diagnosed on the next annual screen) regardless of whether the cancer is visible on the mammogram. Because the interpretations are tracked separately, a cancer may appear as a true-positive for one reader and a false-negative for another reader. A cancer is counted as a true-positive in the practice's sensitivity calculation if the final disposition results in recall of the patient.
Positive mammograms are followed up to obtain pathology results to calculate true- and false-positives. False-negative studies are identified through our practice when interval cancers occur and by searching the cancer registries at the major hospital systems in Charlotte. Because some patients may leave our area, some interval cancers remain undiscovered, but our methodology was consistent throughout the study period.
Cancers are categorized as invasive or in situ (ductal carcinoma in situ [DCIS]); patients with both invasive and in situ cancers are categorized as having invasive cancer. Lobular carcinoma in situ (LCIS), atypical ductal hyperplasia, atypical lobular hyperplasia, and atypical papillary lesions are not included as malignancies.
Double-Reading Method
We employ a clerical assistant who hangs the screening mammograms on a
mammography multiviewer with comparison films, if available. A 3-year-old
prior examination is preferentially selected; additional prior films are made
available to the radiologist on request. After the first reader is given
demographic information about the patient and reviews the films, he or she
renders an interpretation, which is recorded by the assistant. One of our
specialized mammographers performs the first reading. The second reading is
usually performed by a general radiologist who maintains certification in
mammography but does not specialize in this area. Generalists were used for
the second reading because of their desire to maintain Mammography Quality
Standards Act [8] certification
and because of logistics: Our reading stations are dispersed geographically,
and we do not have enough subspecialists to second-read all cases. Cases
designated for recall by the first reader (subspecialist) are recalled
regardless of the second reading. Cases deemed negative by the first reader
and positive by a generalist second reader are routed to a third opinion by a
different subspecialist reader who determines the final reading. The
subspecialist third opinion allows us to reduce unnecessary recalls due to the
higher confidence and skill the subspecialist provides. Two negative opinions
are always required for a final negative interpretation.
CAD Reading Method
During 2003, CAD (Image Checker CL, R2 Technology; software version 3.2
when installed and sequentially upgraded to version 5.3 in 2004) was
implemented on a staged basis throughout our mammography reading sites. When
CAD was installed, double reading was discontinued in favor of a single read
by a specialized mammographer. CAD results were available to the reader on
small video displays below the mammogram. The study did not attempt to measure
the actual influence of CAD on individual readings, but rather the overall
patient outcomes when using CAD to read screening mammograms. In a small
minority of cases (2.1%), the reader obtained a second opinion before the
issuance of the report; this step may or may not have been related to a
CAD-marked abnormality. Because this second opinion was not part of a
structured double-reading program, these cases were still counted in the pool
of CAD single-read mammograms.
Statistical Analysis
Descriptive statistics, including counts and percents or means and SDs,
were calculated. The chi-square test was used for data measured on the nominal
scale. The Student's t test was used for data measured on the
interval scale. A p value of less than 0.05 was considered
statistically significant. SAS software (SAS Institute) was used for all
analysis.
Interpreting Radiologists
The nine radiologists whose data were used for this analysis had been
practicing mammography for a mean of 15 years before 2001 (range, 1–24
years). The only radiologist with less than 5 years' experience before 2001
joined our practice directly from a mammography fellowship in July 2000.
During the 2001–2005 study period, the nine radiologists interpreted a total of 428,909 mammograms; this total comprises the examinations in this analysis as well as second readings and diagnostic examinations. On an annual basis, they read a mean of 9,531 examinations and a median of 8,895 (range, 4,459–15,281 readings).
The nine radiologists' CAD experience during the study period included a mean of 13,201 examinations and a median of 12,137 examinations (range, 7,128–24,778 examinations).
Comparison of Patient Groups
Because historical studies are potentially biased by differences in patient
populations, we examined two variables to determine comparability of the cases
double-read and those single-read with CAD. The first variable was patient age
because the increasing incidence of cancer with age could bias the expected
cancer detection rate. The double-read patients had a mean age of 53.8 years
(SD, 11.5 years), and the single-read CAD patients had a mean age of 53.5
years (SD, 11.1 years). Figure 1 shows the age distribution by decade. Due to the large number of
patients, there was a statistically significant difference in average age
(p < 0.0001), even though the difference (0.3 year younger for the
CAD group) was small.
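The way a 0.3-year difference reaches significance at these sample sizes can be sketched from the summary statistics alone. The snippet below is an illustration, not the SAS analysis used in the study: it assumes the group sizes equal the numbers of mammograms (112,413 and 118,808), uses an unequal-variance (Welch-style) t statistic, and approximates the p value with the normal distribution, which is reasonable only because the samples are so large.

```python
import math

def two_sample_t_from_summary(mean1, sd1, n1, mean2, sd2, n2):
    """Unequal-variance t statistic and a normal-approximation two-sided p value."""
    standard_error = math.sqrt(sd1**2 / n1 + sd2**2 / n2)
    t = (mean1 - mean2) / standard_error
    # With roughly 10^5 observations per group, t is effectively a z score.
    p_two_sided = math.erfc(abs(t) / math.sqrt(2))
    return t, p_two_sided

# Reported ages; group sizes are assumed to equal the mammogram counts.
t_age, p_age = two_sample_t_from_summary(53.8, 11.5, 112_413, 53.5, 11.1, 118_808)
print(f"t = {t_age:.2f}, p = {p_age:.1e}")
```

The standard error of the difference is under 0.05 year, so even a 0.3-year gap lies several standard errors from zero.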
The data show that 58.6% of patients in the double-reading group had a 1-year prior mammogram compared with 61.7% of those in the CAD group. In addition, 8.5% of the double-read mammograms were the patients' first mammogram compared with 7.4% of the CAD-read studies. These differences are statistically significant (p < 0.0001).
For patients who had a prior mammogram of a known date, the mean time since the most recent prior mammogram was 19.1 months (SD, 10.8 months) for the double-read group and 18.8 months (SD, 10.9 months) for the CAD group. The quantitative effect of the age distribution and the time since the most recent prior mammogram on the expected outcomes was not determined as part of this study. However, both the small difference in age (younger for the CAD group) and the slightly larger differences in the time since the most recent prior mammogram (fewer first mammograms and more patients with mammograms from 1 year earlier for the CAD group) would tend to bias the CAD group toward a lower cancer detection rate, lower sensitivity, and lower recall rate [9].
First reader performance in the double-reading program—During the period of double reading, the 112,413 first reads by the nine radiologists were designated BI-RADS category 0 (recall for further imaging studies) in 11,418 cases, or 10.2%. After workup, these recalls yielded a cancer diagnosis in 463, for a cancer detection rate of 4.12 per 1,000. The positive predictive value, as a percentage of positive screening mammograms resulting in a tissue diagnosis of cancer within 1 year (PPV1), was 4.1% (463 divided by 11,418 recalls).
After medical audit, 106 cancers were diagnosed within 1 year of negative interpretations rendered by the first readers: 68 were interval cancers and 38 were detected by the second reader. The sensitivity of the first reader was 81.4% considering these 106 cases as false-negatives.
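The four audit measures reported for the first reader follow directly from the counts above. The sketch below shows the arithmetic; the class and field names are illustrative and are not part of the tracking database described in this article.

```python
from dataclasses import dataclass

@dataclass
class ScreeningAudit:
    screens: int          # screening mammograms read
    recalls: int          # BI-RADS category 0 interpretations
    true_positives: int   # recalled cases with cancer diagnosed within 1 year
    false_negatives: int  # cancers diagnosed within 1 year of a negative reading

    def recall_rate(self) -> float:
        return self.recalls / self.screens

    def cancer_detection_rate_per_1000(self) -> float:
        return 1000 * self.true_positives / self.screens

    def ppv1(self) -> float:
        return self.true_positives / self.recalls

    def sensitivity(self) -> float:
        return self.true_positives / (self.true_positives + self.false_negatives)

# Counts reported for the first reader in the double-reading program.
first_reader = ScreeningAudit(screens=112_413, recalls=11_418,
                              true_positives=463, false_negatives=106)

print(f"recall rate  {100 * first_reader.recall_rate():.1f}%")
print(f"CDR          {first_reader.cancer_detection_rate_per_1000():.2f} per 1,000")
print(f"PPV1         {100 * first_reader.ppv1():.1f}%")
print(f"sensitivity  {100 * first_reader.sensitivity():.1f}%")
```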
Contribution of the second reader in the double-reading program—All screening mammograms, whether read as negative or recalled by the first reader, received a second reading. Based on the second reading, including third opinions when needed, a total of 13,426 patients were recalled, or 11.9% of patients screened. After workup, 501 of these recalled patients had cancer, for a cancer detection rate of 4.46 per 1,000. The PPV1 was 3.7%.
The benefit of double reading was the detection of 38 additional cancers, causing a reduction of false-negatives to 68. The second reader increased the sensitivity by 6.6 absolute percentage points to 88.0%. This benefit came at a cost of 2,008 additional patients recalled and 140 additional biopsies performed. The PPV1 was lowered to 3.7%, and the cancer detection rate increased by 0.34 per 1,000.
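The incremental effect of the second reader can be reproduced from the reported counts. In this sketch the variable names are illustrative, and the recalls-per-additional-cancer figure is a derived ratio that is not reported in the article.

```python
SCREENS = 112_413  # double-read screening mammograms

# (recalls, cancers detected, false-negatives) as reported in the text
first_reader = (11_418, 463, 106)
double_read = (13_426, 501, 68)

def sensitivity(cancers, false_negatives):
    return cancers / (cancers + false_negatives)

extra_recalls = double_read[0] - first_reader[0]
extra_cancers = double_read[1] - first_reader[1]
sensitivity_gain_points = 100 * (
    sensitivity(double_read[1], double_read[2])
    - sensitivity(first_reader[1], first_reader[2])
)
cdr_gain_per_1000 = 1000 * extra_cancers / SCREENS
# Derived for illustration only; this ratio does not appear in the article.
recalls_per_extra_cancer = extra_recalls / extra_cancers

print(extra_recalls, extra_cancers)
print(f"{sensitivity_gain_points:.2f} points, {cdr_gain_per_1000:.2f} per 1,000")
```

Computed from the unrounded counts, the sensitivity gain is about 6.7 percentage points; the 6.6 quoted in the text is the difference between the rounded sensitivities (88.0% and 81.4%).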
Performance of single reader with CAD—The 118,808 screening studies read by a single reader using CAD resulted in 12,651 recalls (10.6%). Workup of these patients resulted in a cancer diagnosis in 499 patients, for a cancer detection rate of 4.20 per 1,000. The PPV1 was 3.9%. Medical audit identified 53 false-negatives, resulting in a sensitivity of 90.4%.
For 480 of the 499 cancers (96%) correctly recalled, the presence or absence of a CAD mark on the index lesion was recorded. CAD marked 257 of the invasive cancers and 135 of the in situ cancers. There was no CAD mark on 69 of the invasive cancers and 19 of the in situ cancers. CAD sensitivity for the radiologist-recalled malignancies was 78.8% (invasive), 87.7% (in situ), and 81.7% (total).
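The CAD mark rates quoted above can be checked from the four counts. This is a minimal sketch with illustrative names; it covers only the 480 recalled cancers for which the presence or absence of a mark was recorded.

```python
# (marked by CAD, not marked by CAD) among radiologist-recalled cancers
cad_marks = {
    "invasive": (257, 69),
    "in situ": (135, 19),
}

def mark_rate(marked, unmarked):
    return marked / (marked + unmarked)

rates = {kind: mark_rate(*counts) for kind, counts in cad_marks.items()}

total_marked = sum(marked for marked, _ in cad_marks.values())
total_cases = sum(marked + unmarked for marked, unmarked in cad_marks.values())
rates["total"] = total_marked / total_cases

for kind, rate in rates.items():
    print(f"{kind}: {100 * rate:.1f}%")
```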
Comparison of Reading Methods
Single reading with CAD showed no statistically significant difference from
double reading in sensitivity, cancer detection rate, or PPV1.
However, the recall rate was lower with CAD (10.6%) than with double reading
(11.9%). The difference of 1.3 percentage points was statistically significant (p <
0.0001).
Compared with the first reader performance in the double-reading program, single reading with CAD resulted in a significantly increased sensitivity (90.4% vs 81.4%, respectively; p < 0.0001) at a cost of a small increase in the recall rate (10.6% vs 10.2%, p < 0.0001). There was no statistically significant difference in PPV1 or cancer detection rate.
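The comparisons between the CAD cohort and the double-reading cohort can be approximated with a chi-square test on 2 × 2 tables built from the reported counts. The sketch below assumes a Pearson chi-square without continuity correction; the article does not state which variant was run in SAS, so the p values are indicative only.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (1 df, no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p

# Recall rate: CAD (12,651 of 118,808) vs double reading (13,426 of 112,413).
recall_chi2, recall_p = chi_square_2x2(12_651, 118_808 - 12_651,
                                       13_426, 112_413 - 13_426)

# Sensitivity: rows are (true-positives, false-negatives).
cad_vs_first_chi2, cad_vs_first_p = chi_square_2x2(499, 53, 463, 106)
cad_vs_double_chi2, cad_vs_double_p = chi_square_2x2(499, 53, 501, 68)

print(f"recall, CAD vs double reading:      p = {recall_p:.1e}")
print(f"sensitivity, CAD vs first reader:   p = {cad_vs_first_p:.1e}")
print(f"sensitivity, CAD vs double reading: p = {cad_vs_double_p:.2f}")
```

Under these assumptions the pattern matches the text: the recall-rate difference from double reading and the sensitivity gain over the first reader are significant, whereas the sensitivity difference from double reading is not.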
Biopsy and Pathology Data
For the first reader's recalls in double reading, we obtained a diagnostic
mammogram in 11,068 patients (97% of recalls). The diagnostic radiologist
recommended a biopsy in 1,540 of the first reader's recalls, and we performed
1,422 biopsies and diagnosed 435 cancers. The PPV3 (percentage of
biopsies resulting in a tissue diagnosis of cancer) attributed to the first
reader's recalls was 30.6%. The second reader's additional recalls led to 157
more biopsies recommended and 140 more performed. Thirty-one additional
cancers were diagnosed by these biopsies, yielding a PPV3
(attributed only to added recalls of the second reader) of 22.1%. The overall
PPV3 for double reading was 29.8%.
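The three PPV3 values for double reading follow from the biopsy and cancer counts in this paragraph. In the sketch below, the names are illustrative and the overall figure is obtained by pooling the first reader's biopsies with those added by the second reader.

```python
def ppv3(cancers, biopsies):
    """Fraction of performed biopsies that yielded a tissue diagnosis of cancer."""
    return cancers / biopsies

# (cancers diagnosed, biopsies performed) as reported in the text
first_reader = (435, 1_422)
second_reader_added = (31, 140)

overall = (first_reader[0] + second_reader_added[0],
           first_reader[1] + second_reader_added[1])

ppv3_first = ppv3(*first_reader)
ppv3_added = ppv3(*second_reader_added)
ppv3_overall = ppv3(*overall)

print(f"{100 * ppv3_first:.1f}% {100 * ppv3_added:.1f}% {100 * ppv3_overall:.1f}%")
```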
For the single reader with CAD, 12,651 recalls led to 12,376 diagnostic studies (98%), 1,929 biopsy recommendations, and 1,709 biopsies. PPV3 was 27.8%.
The number of cancers included in these PPV3 calculations does not include all malignancies that were used to calculate sensitivity. This is because the diagnostic mammogram (usually interpreted by a different radiologist) may not have explicitly recommended biopsy, but a biopsy was performed by clinician or patient preference. In addition, biopsies may have been delayed until after further clinical or radiographic follow-up. For a cancer to be included in our PPV3 calculations, it must have been diagnosed from a biopsy recommendation (or BI-RADS category 4 or 5) in the diagnostic mammogram generated by the recall. True-positives attributed to the screening mammography reader included all cancers diagnosed within 1 year after the abnormal screening examination. The higher sensitivity for in situ cancers is believed to be caused by the greater confidence of the radiologist in analyzing calcifications than masses and the decreased likelihood of a missed DCIS coming to clinical attention within 1 year. The reason the first reader's sensitivity for in situ malignancy is somewhat lower is that minimal calcifications recalled by the second reader may yield DCIS, which would be chargeable to the first reader as a false-negative.
Our double-reading experience in a community practice provides a large data set on which the impact of the second reader can be measured. Our findings are consistent with those of other published studies and support the benefits of double reading. For a specified pool of nine experienced first readers of 112,413 mammograms, sensitivity rose from 81.4% to 88.0% (an absolute increase of 6.6 percentage points in sensitivity and a relative gain of 8.2% in cancers detected) due to the double-reading process. The cancer detection rate rose from 4.12 to 4.46 per 1,000. These benefits came at a cost of a higher recall rate (11.9% vs 10.2%) and a lower PPV1 (3.7% vs 4.1%).
Because double reading is time-consuming and is not generally reimbursed, CAD has become increasingly popular in the United States as an alternative way to increase sensitivity. Medicare and private insurers' payments for CAD use have assisted hospitals and practices in adopting this technology. The major goal for CAD is to reduce oversights when screening mammograms are read [15]; the radiologist is still responsible for lesion analysis and the final interpretation of an examination. CAD has been shown in multiple studies to identify many subtle cancers [1, 3, 6]. In addition to marking significant lesions, CAD places up to 2.8 false marks per case [15]; the radiologist's skill at rejecting these false marks is critical to avoiding excessive recalls.
The results of several published prospective studies of sequential readings have shown increased cancer detection rates using CAD [6, 16–19]. In those studies, the radiologist first recorded an impression of the mammogram without CAD assistance. After reviewing the CAD data, the same radiologist rendered a final interpretation; any changes in the reading were attributed to CAD. Freer and Ulissey [16] used this method for 12,860 mammograms and showed a 19.5% increase in the cancer detection rate (from 41 to 49 malignancies) and a 19% increase in the recall rate (from 6.5% to 7.7%). Birdwell and colleagues [17], Ko and colleagues [20], and Helvie and colleagues [18] used similar methods for their studies but had fewer mammograms in their series (8,682, 5,016, and 2,389, respectively).
A potential criticism of a sequential reading study is the need to document a verifiable committed reading before CAD review to ensure its integrity. Nonetheless, there still could be bias in the way the radiologist "reads" the mammogram before CAD review; for example, the reader's scrutiny for calcifications might be less diligent if he or she believes that CAD is highly sensitive for highlighting calcifications.
Other studies of CAD have been based on historical controls [21–23]. Most of these studies have shown improved patient outcomes after CAD adoption. Cupples et al. [21] evaluated radiologists' performance in reading 7,872 mammograms before CAD and 19,402 mammograms after CAD installation. After CAD adoption, the cancer detection rate went up 16.1% (from 3.7 to 4.3 per 1,000) and the recall rate went up 8.1% (from 7.7% to 8.3%). Almost half of the CAD readings were done by a radiologist new to the practice who did not read any of the pre-CAD studies [21].
In contrast to the majority of investigators who have found benefits from CAD in their clinical studies, in a recent article in the New England Journal of Medicine Fenton et al. [24] concluded that CAD is associated with reduced accuracy of interpretations of screening mammograms. Their study results showed an increase in sensitivity from 80.4% to 84.0% and an increase in cancer detection rate from 4.15 to 4.20 per 1,000, but these two findings were not statistically significant. Because the study showed a 19.7% increase in the biopsy rate, which was statistically significant (p < 0.001), the authors argue that CAD provided no benefit.
The article by Fenton and colleagues [24] was based on a survey of results from 43 facilities in three states, although CAD was implemented in only seven facilities. Of the 429,345 single-read mammograms included in their study, only 7% (31,186) were read using CAD. The facilities that implemented CAD did so for a mean of 18 months (range, 2–25 months), which calculates to a mean of 12 CAD studies per day at each facility. Thirty-eight radiologists read the CAD studies, for a mean of only 821 examinations each. In an editorial in the same issue of the New England Journal of Medicine, Hall [25] pointed out a possible flaw in the study by Fenton and colleagues—that is, the time it takes readers to adjust to using CAD was not assessed. A full analysis of the Fenton et al. study is beyond the scope of this article; however, the lower volume of CAD studies and cancers diagnosed, the radiologists' reduced experience with CAD, and the comparison of merged data from seven CAD facilities to data from 36 that never used CAD are some important differences from our study.
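The volume figures cited for the study by Fenton and colleagues can be re-derived from the reported totals. The sketch below is illustrative; the number of reading days per month (21) is an assumption made here to obtain a per-day figure and does not come from either article.

```python
cad_mammograms = 31_186
all_mammograms = 429_345
cad_facilities = 7
cad_radiologists = 38
mean_months_of_cad_use = 18
WORKING_DAYS_PER_MONTH = 21  # assumption, not stated in the source

cad_share_percent = 100 * cad_mammograms / all_mammograms
mean_reads_per_radiologist = cad_mammograms / cad_radiologists
cad_reads_per_facility_day = (
    cad_mammograms / cad_facilities / mean_months_of_cad_use / WORKING_DAYS_PER_MONTH
)

print(f"{cad_share_percent:.1f}% of mammograms read with CAD")
print(f"{mean_reads_per_radiologist:.0f} CAD reads per radiologist")
print(f"{cad_reads_per_facility_day:.1f} CAD reads per facility per day")
```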
Our study compares CAD results to historical double-read control studies and to the first reader in the double-reading period. We limited the data to those of a specified pool of nine radiologists who participated in the double-read studies before CAD use (112,413) and the single-read CAD studies (118,808). With respect to the patients in our study, the CAD cohort had a very slightly younger average age, were less likely to be undergoing mammography for the first time, and were more likely to have undergone mammography 1 year earlier. These differences cause a slight bias against CAD in terms of sensitivity and cancer detection rate [9, 26]. Despite this bias, our results show CAD to have slightly greater sensitivity than double reading (90.4% vs 88.0%, respectively), although the difference does not reach statistical significance. The recall rate with CAD is lower, 10.6% compared with 11.9% (p < 0.0001). The cancer detection rate was slightly lower, but the difference was not statistically significant.
In comparing CAD results to those of our first reader in the double-reading program, there are gains in sensitivity (90.4% vs 81.4%) with a small increase in recall rate (10.6% vs 10.2%), both statistically significant (p < 0.0001). The first reader's interpretation was recorded and entered by a clerical assistant and constituted a documented final interpretation for that radiologist.
A strength of our study is the large number of malignancies and the large number of cases that were double-read and read with CAD. In particular, the number of CAD cases was nearly four times greater than that reported by Fenton et al. [24]. An additional strength is the use of data from the same pool of experienced high-volume readers; the mean CAD experience of our readers of 13,201 is far in excess of the 821 mean CAD reads in the study by Fenton and colleagues. Our direct data analysis, in our view, compares favorably to the statistical modeling used by Fenton and coworkers. Our independent recording and tracking of the first reader's interpretation in our computer program adds reliability to the subsequent comparison of single reading with CAD.
A potential weakness of a study that uses historical controls is the possible bias of changing populations, differing skills of the radiologists, or both. Many of the previously published studies cited herein, including the Fenton et al. [24] study, have used historical controls to examine CAD benefits. We were able to control for two of the most important modifiers of probability of disease: patient age and the time since the last mammogram. Although we used the same radiologists in both arms of the study, it is possible that their skills could have improved over time. However, our readers had a mean of 15 years' experience before the study period, and all are high-volume readers. We believe that significant changes in their skills during the study period are unlikely to explain the findings.
In conclusion, we found that both double reading and CAD are effective methods to increase the sensitivity of screening mammography for experienced mammogram readers. In our study, the second reader increased sensitivity by 6.6 percentage points, from 81.4% to 88.0%; the recall rate rose from 10.2% to 11.9%. Single reading enhanced by CAD review yielded a higher sensitivity of 90.4%, with a smaller increase in the recall rate from 10.2% to 10.6%. With manpower and cost constraints limiting the use of double reading in the United States, CAD appears to be an effective alternative that provides similar, and potentially greater, benefits. We conclude that CAD provides improved patient outcomes when added to single reading. This conclusion is based on a statistically significant gain in sensitivity from 81.4% to 90.4%, with only a small increase in the recall rate, when CAD study data were compared with the first reader in the double-reading program. Based on a historical methodology similar to that used by Fenton and colleagues [24], our study had fewer variables and yielded different results.
A randomized trial comparing CAD and double reading to single reading without CAD may be difficult to justify on ethical and medical–legal bases. However, a properly designed randomized study comparing double reading to single reading with CAD would provide the optimal assessment of these two competing strategies to improve the efficacy of reading screening mammograms.
Acknowledgments
I thank George McIntyre, of Complete Mammography Data Management, for his
tireless efforts to ensure complete and accurate data management and
reporting. I also thank Sondra Simons for her skill and diligence in pathology
tracking and data auditing.