Impact of Computer-Aided Detection in a Regional Screening Mammography Program
Abstract
OBJECTIVE. This study was conducted to prospectively assess the effect of computer-aided detection (CAD) on screening outcomes in a regional mammography program.
MATERIALS AND METHODS. Between January 1, 1998, and December 31, 2000, 27,274 consecutive screenings were performed. Radiologists' performance before CAD (n = 7,872) and with CAD (n = 19,402) was determined by annual audits. All positive biopsy results were reviewed; histopathology was reviewed and confirmed. Outcomes (recall, biopsy, and cancer detection rates) with CAD (1999, 2000) were compared with historical control data (1998).
RESULTS. With CAD, increases were seen in recall rate (8.1%, from 7.7% to 8.3%), biopsy rate (6.7%, from 1.4% to 1.5%), and cancer detection rate (16.1%, from 3.7 per 1,000 to 4.3 per 1,000). Detection rate of invasive cancers of 1.0 cm or less increased 164% (from 0.508 to 1.34 per 1,000 screens; p = 0.069). Detection rate of in situ cancers declined 6.7% (from 1.27 to 1.19 per 1,000; p = 0.849). In multivariable analysis of invasive cancers, early stage (stage I) was strongly associated with detection by CAD (odds ratio = 4.13, p = 0.025). Mean age at screening detection of cancer was 5.3 years younger in the CAD group than in the pre-CAD group (p = 0.060).
CONCLUSION. Increased detection rate, younger age at diagnosis, and significantly earlier stage of invasive cancer detection are consistent with a positive screening impact of CAD. Audit results were positive but generally not statistically significant due to sample size limitations. Our findings support the hypothesis that screening with CAD significantly improves detection of the specific cancer morphologies that CAD algorithms were designed to detect.
Introduction
The primary goal of screening mammography is early detection of breast cancers, which has been shown to increase survival [1, 2]. The potential for CAD to improve screening mammography outcomes by increasing the cancer detection rate has been shown in several retrospective studies [3–7]. In 2001, Freer and Ulissey [8] reported the results of their prospective study of the impact of CAD in a community breast cancer screening practice. They performed sequential readings (i.e., initial screening without CAD, followed by a review of the CAD-prompted findings) of 12,860 consecutive screening mammograms and tracked data on how often review of the CAD marks prompted a change in the initial, unaided interpretation. In this 1-year period, they showed a 19.5% increase in the number of screen-detected cancers with the routine use of CAD, with an increase in the proportion of early stage cancers (stage 0, I). Although the lesion type (calcification or mass) and cancer stage were reported, the size of cancers detected was not.
Since then, investigators from Stanford Medical School [9] and the Mayo Clinic Medical School [10] reported on the impact of CAD in their academic settings. Using a study design similar to that of Freer and Ulissey (i.e., a prospective clinical study of sequential readings), they reported an increase in cancer detection of 7.4% and 7.3%, respectively. In all three studies [8–10], although the recall rates increased with CAD, the increase was concordant with or less than the increase in cancers detected. Recently, Gur et al. [11] reported the results of a large prospective clinical study using CAD in their academic setting and concluded that CAD did not result in a statistically significant increase in cancer detection. This was not a sequential reading study but instead was based on historical controls (i.e., comparison of screening performance before and after the introduction of CAD). Neither size nor stage of the detected breast cancers was reported.
Late in 1998, a mammography CAD system was installed in our community-based regional breast cancer screening practice. In January 1999, we began a 2-year prospective study to determine the impact of CAD on our radiologists' performance. Similar to Gur et al. [11], our study was also based on historical controls rather than sequential reading of screening mammograms. We measured not only the usual parameters of mammography screening performance (recall rates, biopsy rates, and cancer detection rates) as in earlier studies cited, but also the type, size, and stage of cancers found by screening.
Materials and Methods
Study Period
A CAD system was first installed in early December 1998, followed by in-service training for radiologists and staff regarding system usage. Routine use of CAD with screening mammography had begun by January 1, 1999, and continued through December 31, 2000. Data collection for the study ended December 31, 2001, when all screening audit data for calendar year 2000 were finalized. Study data included the results of all short-term follow-up recommendations, the results of all biopsies performed arising from non-normal screenings, and the results of all surgical pathology reports for those patients with biopsy-proven breast cancer in 1998, 1999, and 2000.
Study Group
The study group included 27,274 consecutive screening mammogram examinations: 7,872 were interpreted without CAD in calendar year 1998 preceding CAD installation (pre-CAD group), and 19,402 were interpreted with CAD in the 2 years (1999 and 2000) after CAD installation (CAD group). In 1999, several short periods of screening without CAD occurred, typically of less than 1 day in duration, during downtime for service or software upgrades. The best estimate is that approximately 200 of 19,402 screenings (1.0%) may have been interpreted without CAD. It was not possible to exclude these examinations from the CAD group since neither the patients nor the screening results could be identified.
Screening mammograms were performed on asymptomatic women referred for routine screening by their primary care physicians. All women scheduled for routine screening mammography were required to complete and sign a patient history form. Those indicating clinical symptoms, such as a palpable lump, skin thickening, or nipple discharge, were converted from screening to diagnostic evaluations and were prospectively excluded from the study. Non-localized breast pain was not considered a sufficient clinical symptom to exclude women from routine screening.
Mammogram Examinations
All screening mammograms were performed at facilities accredited by the U.S. Food and Drug Administration (FDA) and in accordance with Standards of the American College of Radiology (ACR) and current requirements of the Mammography Quality Standards Act (MQSA) [12]. All were done using modern analog equipment (Mammomat 3000 or Nova 3000, Siemens Medical Solutions) and low-dose, film-screen technique (Min-R 2000 with rare earth intensification screens, processing chemistry, and DayLight autoprocessors, Kodak Health Imaging). Standard two-view (craniocaudal [CC] and mediolateral oblique [MLO]) mammograms of each breast were performed. If considered necessary by the technologist performing the examination, supplementary views (exaggerated craniocaudal, anterior compression) were obtained to complete the examination. Patients with augmentation implants also had implant-displaced views when possible. Facilities participating in this study included the main breast center located on the hospital campus and three affiliated off-site screening facilities. All screening examinations were interpreted at the main breast center, where the CAD processing also took place. Mammogram equipment, film screen combinations, processing, and film viewing technology were identical in the pre-CAD and CAD study groups. The only new technology was the introduction of CAD.
CAD Analysis
Standard views (CC and MLO) from each screening mammogram were analyzed by the CAD system (Image Checker M1000 with software v.2.0, R2 Technology) using a CAD processing unit and a separate CAD display unit. Implant-displaced views in CC and MLO projections, when obtained, were analyzed instead of nonimplant-displaced views. Other supplementary views, when obtained, were not analyzed by CAD.
Freer and Ulissey [8] have previously described in detail the film digitization and CAD analysis processes used in this study. A processing unit scans and digitizes the mammogram film's image and then uses proprietary software to analyze the digital image data. The display unit is a specially modified automated mammogram film viewer with two 5-inch (127 mm) monitors that display the findings of the CAD analysis. The interpreting radiologist first reviews the screening mammogram films and then activates the CAD display and visually correlates the CAD findings. A mass marker (*) indicates an area that could represent a mass (including spiculated and nonspiculated masses, architectural distortions, and asymmetries). A calcification marker (Δ) indicates an area of three or more grouped microcalcifications. The marker(s) on the CAD display screen prompt the interpreting radiologist to make sure that the corresponding area(s) of possible concern have been carefully examined on the original films. When available, prior mammogram examinations, usually from the preceding year, were compared with the current screening at the time of interpretation; however, the prior mammograms were not processed by the CAD system.
Interpreting Radiologists
Three radiologists (A, B, and C) participated in the screening mammography program during the pre-CAD and CAD study periods. All were experienced mammographers who met the established standards of the ACR, FDA, and MQSA. Radiologist A had completed a 1-year fellowship in mammography and breast intervention. Radiologist B had completed a 1-year fellowship in women's imaging, including 4 months of subspecialty training in mammography. Radiologist C had no subspecialty training in mammography but had more than 10 years of clinical experience. Radiologist D joined the practice in 1999 and thus only participated in screening during the CAD period. Radiologist D had completed a 1-year fellowship in sonography and had formerly served as the section chief of mammography at a medical school's radiology residency program. Because no pre-CAD data were available for radiologist D, data from the CAD period are presented for all radiologists combined and also for radiologists A, B, and C only. Almost half of the screens performed in the CAD period were by radiologist D. These data were included to improve statistical power.
Examination Results Coding
After interpreting the screening mammograms, each radiologist coded the results by ACR BI-RADS [13] category: 1, negative; 2, benign; or 0, incomplete, needs additional evaluation. BI-RADS categories 3 (probably benign), 4 (suspicious), and 5 (highly suggestive of malignancy) were reserved for coding the results of diagnostic evaluations after an initial BI-RADS category 0 screening mammogram. For screening mammography audit purposes, a BI-RADS category 3 (probably benign) result after a follow-up diagnostic examination was still treated as a positive screening. No formal double reading by a second radiologist was done.
Data Collection
Screening mammogram interpretation data were entered in a computerized reporting and database system (Mammography Reporting System, Mammography Reporting Systems, Inc.). This database was used to generate the audit data used for this study as well as for examination reports; recall letters; and tracking outcome data from all diagnostic mammograms, breast sonograms, and all other breast imaging and interventional procedures performed at our facilities. Combined data from the mammography database, clinical information from weekly interdisciplinary breast cancer pretreatment planning conferences (where all newly diagnosed breast cancers treated at our institution are presented and discussed), and the regional cancer registry were used to identify all screening mammograms that led to a diagnosis of breast cancer.
Biopsy and Surgical Pathology Data
Pathology reports of core needle biopsy or surgical biopsy histopathology or both were obtained for all cancers included in the study. No cancers in the study group were diagnosed by cytology. A recommendation for cyst aspiration was not considered to be a biopsy recommendation, even if cyst aspiration cytology was subsequently performed. Atypical hyperplasia, complex sclerosing lesion, and lobular carcinoma in situ diagnosed by core needle biopsy always prompted the recommendation for repeat surgical biopsy. Ductal carcinoma in situ was reported as a positive biopsy for breast cancer; lobular carcinoma in situ was not.
One pathologist reviewed all breast cancers diagnosed at our institution during the study period. Cancers were staged from the combined data of surgical pathology and imaging-guided needle biopsy procedures. Clinical cancer stages (i.e., stages 0–IV) and TNM classifications (including TX, NX, and MX designations) were reported according to the standards of the American Joint Committee on Cancer (AJCC) [14]. Tumor size, node status, and clinical stage were assessed from the best available clinical, imaging, and histopathology data in every case. For staging, pathologically unknown lymph node (pNX) and distant metastasis (pMX) status was presumed to be negative or absent if not clinically apparent in those patients lacking biopsy data. However, no invasive cancers ≤ 1.0 cm had incomplete surgical pathology data.
Statistical Analysis
Using the patient management system data, recall and biopsy rates were calculated and reported as percentages. Using biopsy-proven cancer as the gold standard, the positive predictive value (PPV) for biopsy (PPV: number of biopsy-proven cancers divided by the total number of biopsies) and the cancer detection rates were computed and reported as percentages. Statistical differences between the pre-CAD and CAD periods were analyzed with Stata version 8.0 software (StataCorp), using two-tailed Fisher's exact tests of proportions, Student's t tests of means, and multivariate logistic regression.
Screens were performed by three radiologists (A, B, and C) in the pre-CAD period and by four radiologists (A, B, C, and D) in the CAD period; therefore, some analyses of the CAD period were repeated excluding radiologist D. Some analyses were also repeated with the number of screening mammograms proportionately adjusted for the cases lost to follow-up. Aggregate means were computed as the total number of events (e.g., cancers detected) divided by the total denominator (in this case, screenings) without regard for interradiologist differences. Because the relative proportions of screenings conducted by radiologists A, B, and C were almost identical in the pre-CAD and CAD periods, weighting was not used in comparing aggregate pre-CAD and CAD rates.
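The pre-CAD versus CAD rate comparisons reported below were computed in Stata; for illustration, a two-tailed Fisher's exact test can also be implemented from first principles. The sketch below (Python rather than Stata, standard library only; the 29/7,872 and 83/19,402 cancer detection counts are those reported in the Results) is an assumption-laden illustration, not the analysis actually used:

```python
from math import lgamma, exp

def log_comb(n, k):
    # log of the binomial coefficient C(n, k), via log-gamma for numerical stability
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)

def fisher_exact_two_tailed(a, b, c, d):
    """Two-tailed Fisher's exact test on the 2x2 table [[a, b], [c, d]]."""
    r1, c1, n = a + b, a + c, a + b + c + d
    def log_p(x):  # hypergeometric log-probability of cell value x
        return log_comb(c1, x) + log_comb(n - c1, r1 - x) - log_comb(n, r1)
    obs = log_p(a)
    lo, hi = max(0, r1 + c1 - n), min(r1, c1)
    # sum the probabilities of every table at least as extreme as the observed one
    return sum(exp(lp) for x in range(lo, hi + 1)
               if (lp := log_p(x)) <= obs + 1e-9)

# cancers vs non-cancers: 29 of 7,872 screens pre-CAD, 83 of 19,402 screens with CAD
p = fisher_exact_two_tailed(29, 7872 - 29, 83, 19402 - 83)
# Table 1 reports p = 0.532 for this comparison
```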
Results
Outcome Audit Data
Outcome audit data are presented in Table 1. During the 3-year study period, 27,274 consecutive screening mammograms were performed: 7,872 in the pre-CAD group and 19,402 in the CAD group. Of the total, 2,217 patients (8.1%) were recalled for further evaluation. From these, 409 biopsies were advised, 392 biopsies were performed, and 112 cancers were confirmed by biopsy. The recommendation for screening recall resulted in a cancer diagnosis (PPV1) in 5.1% of cases (112/2,217). The recommendation for biopsy resulted in a cancer diagnosis (PPV2) in 27.4% of cases (112/409) and a positive biopsy rate (PPV3) of 28.6% (112/392).
Radiologist | No. of Screenings | No. of Screening Recalls | Recall Rate (%) | No. of Biopsies Advised | No. of Biopsies Done | Biopsy Rate (%) | No. of Cancers Found | PPV3 for Biopsies Done (%) | Cancer Detection Rate (per 1000) |
---|---|---|---|---|---|---|---|---|---|
Pre-CAD period | |||||||||
A | 4,750 | 362 | 7.62 | 73 | 71 | 1.49 | 18 | 25.4 | 3.79 |
B | 2,484 | 175 | 7.05 | 25 | 25 | 1.01 | 7 | 28.0 | 2.82 |
C | 638 | 68 | 10.70 | 12 | 12 | 1.88 | 4 | 33.3 | 6.27 |
D | 0 | 0 | NA | 0 | 0 | NA | 0 | NA | NA |
Total | 7,872 | 605 | 7.69 | 110 | 108 | 1.37 | 29 | 26.9 | 3.68 |
CAD period | |||||||||
A | 6,120 | 537 | 8.77 | 132 | 125 | 2.04 | 31 | 24.8 | 5.07 |
B | 3,370 | 255 | 7.57 | 48 | 47 | 1.39 | 10 | 21.3 | 2.97 |
C | 657 | 73 | 11.10 | 16 | 14 | 2.13 | 5 | 35.7 | 7.61 |
D | 9,255 | 747 | 8.07 | 103 | 98 | 1.06 | 37 | 37.8 | 4.00 |
Total | 19,402 | 1,612 | 8.31 | 299 | 284 | 1.46 | 83 | 29.2 | 4.28 |
Entire study | |||||||||
Total | 27,274 | 2,217 | 8.13 | 409 | 392 | 1.44 | 112 | 28.6 | 4.11 |
Performance (CAD vs Pre-CAD) | Recall Rate (%) | Biopsy Rate (%) | PPV3 for Biopsies Done (%) | Cancer Detection Rate (per 1,000) |
---|---|---|---|---|
Including radiologist D | | | | |
Percent change | 8.11 | 6.69 | 8.84 | 16.10 |
Rates | 8.31 vs 7.69 | 1.46 vs 1.37 | 29.2 vs 26.9 | 4.28 vs 3.68 |
p value | 0.092 | 0.613 | 0.708 | 0.532 |
Excluding radiologist D | | | | |
Percent change | 10.90 | 33.61 | –7.90 | 23.10 |
Rates | 8.52 vs 7.69 | 1.83 vs 1.37 | 24.7 vs 26.9 | 4.53 vs 3.68 |
p value | 0.042 | 0.018 | 0.680 | 0.415 |
Note–CAD = computer-aided detection, PPV = positive predictive value, NA = not applicable.
Seventeen patients (4.2%) for whom biopsy was recommended were lost to follow-up: two of 110 (1.8%) in the pre-CAD group and 15 of 299 (5.0%) in the CAD group. With CAD, the cancer detection rate increased 16.1% (4.3 vs 3.7 per 1,000). The detection rate of invasive cancers ≤ 1.0 cm increased 164% (from 0.508 to 1.34 per 1,000 screens; p = 0.069), whereas the detection rate of in situ cancers declined 6.7% (from 1.27 to 1.19 per 1,000; p = 0.849). Results of a similar magnitude were observed with or without inclusion of radiologist D.
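The aggregate figures above follow directly from the raw counts in Table 1; because rounded table values can drift slightly, percent changes are best computed from the counts themselves. A minimal arithmetic check (Python; all inputs are Table 1 totals):

```python
# Totals from Table 1
pre_screens, pre_cancers = 7_872, 29
cad_screens, cad_cancers = 19_402, 83
recalls, biopsies_advised, biopsies_done, cancers = 2_217, 409, 392, 112

pre_rate = 1000 * pre_cancers / pre_screens   # cancers per 1,000 screens
cad_rate = 1000 * cad_cancers / cad_screens
pct_change = 100 * (cad_rate / pre_rate - 1)

ppv1 = 100 * cancers / recalls            # cancer diagnoses per recall
ppv2 = 100 * cancers / biopsies_advised   # cancer diagnoses per biopsy advised
ppv3 = 100 * cancers / biopsies_done      # positive biopsy rate

print(round(pre_rate, 2), round(cad_rate, 2), round(pct_change, 1))
# → 3.68 4.28 16.1
print(round(ppv1, 1), round(ppv2, 1), round(ppv3, 1))
# → 5.1 27.4 28.6
```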
Age, Prevalence, and Incidence Data
Age, prevalence, and incidence data for the study group are presented in Table 2. The mean age of the study population (54.5 years) and the mean ages in the pre-CAD and CAD groups (54.4 years and 54.5 years, respectively) were almost identical. The mean age of patients with screening-detected breast cancer was 60.9 years, 6.4 years older than that of the entire study population. The mean age at diagnosis of breast cancer in the CAD group was 59.5 years, 5.3 years younger than the mean age at diagnosis in the pre-CAD group (64.8 years; p = 0.060). The age range for diagnosis of cancer was 35–90 years: 40–90 years pre-CAD and 35–90 years with CAD.
 | Pre-CAD | | | CAD | | | Entire Study | | |
---|---|---|---|---|---|---|---|---|---|
 | No. (%) | Age (yr) | SD | No. (%) | Age (yr) | SD | No. (%) | Age (yr) | SD |
All screenings | |||||||||
Prevalent | 1,068 (13.6) | 46.1 | 12.5 | 1,371 (7.1) | 44.5 | 11.4 | 2,436 (8.9) | 45.2 | 11.9 |
Incident | 6,804 (86.4) | 55.7 | 11.8 | 18,031 (92.9) | 55.3 | 12.1 | 24,838 (91.1) | 55.4 | 12.1 |
Total | 7,872 (100) | 54.4 | 12.3 | 19,402 (100) | 54.5 | 12.4 | 27,274 (100) | 54.5 | 12.4 |
Cancers detected | |||||||||
Prevalent | 2 (6.9) | 44.0 | 4.9 | 5 (6.0) | 48.9 | 7.7 | 7 (6.2) | 47.5 | 7.0 |
Incident | 27 (93.1) | 66.4 | 14.4 | 78 (94.0) | 60.2 | 12.1 | 105 (93.8) | 61.3 | 12.9 |
Total | 29 (100) | 64.8 | 15.1 | 83 (100) | 59.5 | 12.1 | 112 (100) | 60.9 | 13.1 |
Note–CAD = computer-aided detection.
The study group's 27,274 screenings included 2,436 (8.9%) prevalent (first) screenings and 24,838 (91.1%) incident screenings. The proportion of prevalent screenings decreased from 13.6% pre-CAD to 7.1% in the CAD group (p < 0.0005). Of 112 breast cancers, only seven prevalent cancers (6.2%) were found: two in the pre-CAD group and five in the CAD group. Five incident cancers were found in patients who indicated that they had undergone a previous screening at another facility, but neither the previous images nor the mammogram reports could be obtained.
The average interval between a true-positive screening and the previous negative screening was 21 months (median, 14 months; range, 11–65 months) in the 100 incident cancer cases for which previous mammogram results were available. The mean age at screening was almost 10 years younger among women receiving prevalent screenings than among those receiving incident screenings (45.2 years vs 55.4 years, p < 0.001), a difference seen in both the pre-CAD and CAD groups.
Histopathology and Cancer Stage
The histologic characteristics and stages of the 112 screening-detected cancers are presented in Table 3. The 96 ductal carcinomas included 33 ductal carcinomas in situ, 24 invasive ductal carcinomas, and 39 combined invasive and in situ ductal carcinomas. An extensive intraductal component was present in eight of the 39 (20.5%) invasive and in situ ductal carcinomas. The 16 other invasive cancers included six invasive lobular cancers (5.4% of all cancers). With CAD, relative differences of similar magnitudes were found with or without the inclusion of radiologist D.
Characteristic | Pre-CAD Group No. (%) | CAD Group, Radiologists A, B, C, and D No. (%) | CAD Group, Radiologists A, B, and C No. (%) |
---|---|---|---|
Histologic type | |||
Ductal carcinoma, in situ^a | 10 (34.5) | 23^b (27.7) | 13 (28.3)
All invasive ductal carcinomas^a | 13 (44.8) | 50 (60.2) | 26 (56.5)
Ductal carcinoma, invasive | 8 (27.6) | 16 (19.3) | 11 (23.9) |
Ductal carcinoma, invasive and in situ | 5 (17.2) | 34 (41.0) | 15 (32.6) |
All other invasive carcinomas^a | 6 (20.7) | 10 (12.0) | 7 (15.2)
Lobular carcinoma, invasive | 2 (6.9) | 4 (4.8) | 2 (4.4) |
Mixed mammary carcinoma | 1 (3.4) | 2 (2.4) | 2 (4.4) |
Mucinous (colloid) carcinoma | 2 (6.9) | 1 (1.2) | 1 (2.2) |
Tubular carcinoma | 1 (3.4) | 1 (1.2) | 1 (2.2) |
Tubulo-lobular carcinoma | 0 (0.0) | 1 (1.2) | 1 (2.2)
Papillary carcinoma, invasive and in situ | 0 (0.0) | 1 (1.2) | 0 (0.0) |
Total | 29 (100.0) | 83 (100.0), p = 0.290^b | 46 (100.0), p = 0.625^b
AJCC cancer stage
0^b | 10 (34.5) | 22 (26.5) | 13 (28.3)
I^b | 7 (24.1) | 38 (45.8) | 22 (47.8)
Other^b | 12 (41.4) | 22 (26.5) | 10 (21.7)
IIA | 5 (17.2) | 13 (15.7) | 4 (8.7) |
IIB | 4 (13.8) | 5 (6.0) | 4 (8.7) |
IIIA | 1 (3.4) | 1 (1.2) | 1 (2.2) |
IIIB | 0 (0.0) | 1 (1.2) | 0 (0.0) |
IV | 2 (6.9) | 2 (2.4) | 1 (2.2) |
Unstaged | 0 (0.0) | 1 (1.2) | 1 (2.2) |
Total | 29 (100.0) | 83 (100.0), p = 0.096^b | 46 (100.0), p = 0.076^b
T-stage
Tis (in situ)^b | 10 (34.5) | 23^b (27.7) | 13 (28.3)
T1mic–T1b^b | 4 (13.8) | 26 (31.3) | 15 (32.6)
T1mic (≤ 1 mm) | 0 (0.0) | 4 (4.8) | 1 (2.2) |
T1a (> 1 mm, ≤ 5 mm) | 0 (0.0) | 6 (7.2) | 5 (10.9) |
T1b (> 5 mm, ≤ 10 mm) | 4 (13.8) | 16 (19.3) | 9 (19.6) |
T1c, T2, T3 (> 10 mm)^b | 15 (51.7) | 31 (37.3) | 17 (37.0)
T4 | 1 (3.4) | 1 (1.2) | 0 (0.0) |
T1 or unstaged | 0 (0.0) | 2 (2.4) | 1 (2.2) |
Total | 29 (100.0) | 83 (100.0), p = 0.169^b | 46 (100.0), p = 0.199^b
Note–CAD = computer-aided detection, AJCC = American Joint Committee on Cancer.
^a The p values compare these categories in the pre-CAD group vs the CAD groups with and without radiologist D.
^b Includes one stage IV TisNXM1 cancer.
The more substantial changes seen with CAD were within the subgroup of invasive cancers, as summarized in Table 4. Proportionate increases occurred in invasive ductal carcinomas with an in situ component (116% increase, p = 0.053), stage I cancers (72% increase, p = 0.062), and invasive cancers ≤ 1.0 cm (T1mic, T1a, T1b; 112% increase; p = 0.103). The increase in prevalent invasive cancers was comparatively small (26%, p = 1.00). When these characteristics were combined with age at diagnosis in logistic regression analysis (Table 5), the strongest and most significant predictor for improved screening outcomes with CAD was early stage invasive cancer (stage I vs stage II or higher; odds ratio 4.13; p = 0.025). Age at diagnosis was almost significant (odds ratio = 0.96 per year, p = 0.064). Histologic type and prevalent versus incident screening were not significantly associated with the impact of CAD. AJCC cancer stage was a better predictor than the specific T-stage.
Invasive Cancer | Pre-CAD Group (n = 19) | CAD Group with Radiologists A, B, C, and D (n = 61)^a | p^b | CAD Group with Radiologists A, B, and C (n = 33) | p^b |
---|---|---|---|---|---|
Histologic type | |||||
Invasive ductal carcinoma | 8 (42.1%) | 16 (26.2%) | 11 (33.3%) | ||
Invasive carcinoma plus in situ ductal carcinoma | 5 (26.3%) | 34 (55.7%) | 0.053 | 15 (45.4%) | 0.386 |
Other invasive carcinomas | 6 (31.6%) | 10 (16.4%) | 7 (21.2%) | ||
DCIS only | 0 | 1 (1.6%)^a | 0 | |
AJCC stage | |||||
Stage I | 7 (36.8%) | 38 (63.3%) | 0.062 | 22 (68.8%) | 0.041 |
Stage II–IV | 12 (63.2%) | 22 (36.7%) | 10 (31.2%) | ||
Unstaged | 0 | 1 (1.6%) | 1 (3.0%) | ||
T-stage | |||||
Tis | 0 | 1 (1.6%) | 0 | ||
T1mic plus T1a plus T1b | 4 (21.0%) | 26 (42.6%) | 0.103 | 15 (46.9%) | 0.080 |
T1c–T4 | 15 (78.9%) | 32 (52.5%) | 17 (53.1%) | ||
T1 not otherwise specified | 0 | 1 (1.6%) | 0 | ||
Unstaged | 0 | 1 (1.6%) | 1 (3.0%) | ||
Screen type | |||||
Prevalent | 1 (5.3%) | 4 (6.6%) | 1.000 | 3 (9.1%) | 1.000 |
Incident | 18 (94.7%) | 57 (93.4%) | 30 (90.9%) |
Note–CAD = computer-aided detection, AJCC = American Joint Committee on Cancer, DCIS = ductal carcinoma in situ.
^a Includes one stage IV TisNXM1 cancer.
^b The p values compare pre-CAD versus CAD with and without radiologist D. Cancer categories DCIS, unstaged, and T1 not otherwise specified were not included in these p values.
Factor | Pre-CAD (n = 19) | CAD (n = 59)^a | Adjusted Odds Ratio^b | 95% CI | p |
---|---|---|---|---|---|
Mean age at diagnosis (yr) | 65.6 | 59.9 | 0.96 | 0.92–1.00 | 0.064 |
Histologic type | |||||
IDCA | 8 (42.1%) | 16 (27.1%) | 1.00 (ref) | ||
IDCA with DCIS | 5 (26.3%) | 33 (55.9%) | 2.34 | 0.61–8.93 | 0.214 |
Other invasive types | 6 (31.6%) | 10 (17.0%) | 0.49 | 0.11–2.14 | 0.345 |
AJCC stage | |||||
II–IV | 12 (63.2%) | 21 (35.6%) | 1.00 (ref) | ||
I | 7 (36.8%) | 38 (64.4%) | 4.13 | 1.20–14.3 | 0.025 |
Screen type | |||||
Incident | 18 (94.7%) | 55 (93.2%) | 1.00 (ref) | ||
Prevalent | 1 (5.3%) | 4 (6.8%) | 0.88 | 0.08–10.1 | 0.921 |
Note–CAD = computer-aided detection, CI = confidence interval, IDCA = invasive ductal carcinoma, DCIS = ductal carcinoma in situ, AJCC = American Joint Committee on Cancer, ref = referent category.
^a Excludes one unstaged cancer and one stage IV TisNXM1 cancer.
^b Logistic regression adjusted for all other variables in the table. The odds ratio for age at diagnosis is per year of age.
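As a cross-check on the adjusted model in Table 5, the unadjusted odds ratio for stage I detection can be computed directly from the Table 4 counts (CAD: 38 stage I vs 22 stage II–IV; pre-CAD: 7 vs 12). The sketch below (Python, standard library only) uses the conventional Woolf log-odds confidence interval; the adjusted odds ratio of 4.13 in Table 5 differs because it controls for age, histologic type, and screen type:

```python
from math import exp, log, sqrt

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio and Woolf 95% CI for the 2x2 table [[a, b], [c, d]]."""
    or_ = (a * d) / (b * c)
    se = sqrt(1 / a + 1 / b + 1 / c + 1 / d)      # standard error of log(OR)
    return or_, exp(log(or_) - z * se), exp(log(or_) + z * se)

# rows: CAD vs pre-CAD; columns: stage I vs stage II-IV invasive cancers (Table 4)
or_, lo, hi = odds_ratio_ci(38, 22, 7, 12)
print(round(or_, 2))   # → 2.96
```

The unadjusted estimate (about 3) is smaller than the adjusted odds ratio but points in the same direction.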
Discussion
Screening mammography with CAD has been shown to improve radiologists' performance in the detection of clinically occult cancers that otherwise would have been overlooked until either future screening or the development of symptoms or signs prompted a cancer diagnosis. The intent of this study was to determine if CAD would improve screening performance in our practice setting by prospective comparison of annual screening audits in the pre-CAD and CAD periods. Further, if we observed a benefit, we wanted to see how our results compared with those reported by others [8–11] and to verify, if possible, that earlier stages and smaller sizes of invasive cancers were indeed being detected with CAD.
Our 16.1% increase in the 2-year cancer detection rate with routine screening with CAD is similar to the 19.5% 1-year increase in cancer detection rates reported by Freer and Ulissey [8] and greater than the increases observed in other reports [9–11], particularly the 1.7% increase over approximately 1.5 years reported by Gur et al. [11]. Although our overall recall rate increased 8.1% (from 7.69% to 8.31%) and our biopsy rate increased 6.7% (from 1.37% to 1.46%) with the use of CAD, both increases were well below the 16.1% increase in cancers detected. The PPV3 for biopsy increased 8.8% (from 26.9% to 29.2%).
The percentage of ductal carcinoma in situ found in the CAD period decreased slightly, from 34.5% (10/29) to 27.7% (23/83). A possible explanation is that, for experienced radiologists, ductal in situ cancers presenting as calcifications represent less a problem of perception than one of characterization, reducing the potential benefit of CAD. However, the percentage of small invasive cancers found with CAD increased from 13.8% (4/29) to 31.3% (26/83), especially those with an associated ductal carcinoma in situ component (from 25.0% [1/4] to 76.9% [20/26]). In the CAD group, the improved detection of early stage cancers presenting as small groups of microcalcifications, small irregular masses, or both supports the hypothesis that the observed benefit of screening with CAD was due to improved detection of the specific types of cancers that the CAD algorithms were designed to detect. Further, the younger mean age of patients with screen-detected cancer in the CAD group (59.5 vs 64.8 years, p = 0.060) suggests that the benefit of CAD was both temporal (earlier cancer detection) and biologic (smaller tumor size and lower cancer stage).
Lack of previous comparison mammograms may decrease radiologists' ability to perceive subtle changes related to new or developing cancers. Since CAD does not analyze prior studies, it seems plausible that screening with CAD might have a relatively greater positive impact in cancer detection when previous studies are lacking (e.g., prevalent screenings). In our study, even though the percentage of prevalent screenings decreased significantly in the CAD group from 13.6% (1,068/7,872) to 7.1% (1,371/19,402; p < 0.0005), the detection rate for prevalent cancers increased 95.2% from 1.87 per thousand (2/1,068) to 3.65 per thousand (5/1,371) with CAD. Five additional incident cancers in the CAD group were diagnosed in patients for whom no previous films could be obtained for comparison. If these five cancers were also considered to be prevalence screenings, then the increased detection rate with CAD would double to 7.29 per thousand (10/1,371). This supports the hypothesis that the impact of CAD on screening outcomes might be greater when previous comparison studies are lacking.
An analysis of the clinical stage of cancers detected is useful but not sufficient to give the best description of the benefit that CAD may produce in cancer detection. Detection of small but aggressive high-grade invasive cancers may be more beneficial than detection of larger, low-grade invasive cancers or extensive in situ cancers. Cancers with positive axillary nodes may reflect more advanced clinical stages but not necessarily larger tumor size or greater ease of detection by screening. Stated more simply, minimal cancers are not always stage 0 or stage I lesions, and aggregate cancer detection rates at screening do not reflect these important clinical distinctions.
Including TNM staging (e.g., T1cN0 vs T1aN1) gives a more accurate picture of the size of cancers found at screening and perhaps a better measure of the potential benefit of screening with CAD than clinical cancer stage alone. Further, expressing the benefit of CAD by aggregate detection rates alone may be misleading. Detection rate is a time-dependent variable: no screening test increases the background incidence of the disease, and over a sufficient time interval the screening detection rate always reflects the background incidence rate of new cancers in the population being screened. As with all cancer screening tests, the more relevant question is "are the cancers being found earlier, both temporally and pathologically?"
The study by Gur et al. [11] is the only study we are aware of that prospectively evaluated the impact of CAD in a clinical setting using a historical control methodology similar to our study. However, no data regarding size, stage, or patient age of screening-detected cancers were included. They concluded that the use of CAD showed only a minimal increase in cancer detection (1.7%) for their group of 24 academic radiologists, a finding that did not reach statistical significance (p = 0.68). Although their database was four times larger than ours (115,571 vs 27,274 screens, respectively), their confidence interval was also larger (–11% to +19%), a range inclusive of the 16.1% increase observed in our study. Further, Feig et al. [15] analyzed the published data and found a 19.7% increase in cancer detection by the “low-volume” radiologists (representing 17 of the 24 radiologists). They also noted that the distribution of prevalent and incident screens in the pre-CAD and CAD groups would further tend to understate the increase in cancer detection noted with the use of CAD.
Our study has some limitations. First, given that cancer detection in a screened population is rare, the sample sizes required to detect statistically significant changes of the magnitude reported here and by others are very large. For example, more than 450,000 screens would be required in each of the pre-CAD and CAD groups to detect a statistically significant 10% change in cancer detection rate, given 80% power and cancer detection rates of approximately four per 1,000 screenings. With our much smaller screening populations (7,872 pre-CAD and 19,402 with CAD), we had only 9% power to detect a significant change of 16.1% in the cancer detection rate. For such a change to be statistically significant, with 80% power, more than 150,000 screenings would be required in each group. Therefore, it is not surprising that studies such as ours assessing the impact of CAD fail to reach statistical significance. Prolonging the study to accumulate more screening-detected cancers would increase the statistical power but would negate the transient benefit measured by an increase in the detection rate because no screening intervention, CAD included, increases the background incidence of disease over time.
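The power and sample-size figures above can be reproduced, approximately, with a standard normal-approximation two-sample test for proportions. The sketch below uses the detection rates (3.7 and 4.3 per 1,000) and group sizes (7,872 and 19,402) reported in this study; the function names are illustrative, not from any cited source.

```python
import math

def norm_cdf(x):
    # Standard normal CDF expressed via the error function
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def power_two_proportions(p1, p2, n1, n2):
    """Approximate power of a two-sided two-sample z-test for
    proportions at alpha = 0.05 (normal approximation)."""
    z_alpha = 1.959964  # z quantile for alpha/2 = 0.025
    p_bar = (n1 * p1 + n2 * p2) / (n1 + n2)
    se0 = math.sqrt(p_bar * (1 - p_bar) * (1 / n1 + 1 / n2))  # SE under H0
    se1 = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)  # SE under H1
    return norm_cdf((abs(p2 - p1) - z_alpha * se0) / se1)

def n_per_group(p1, p2):
    """Equal-n sample size per group for the same test,
    at alpha = 0.05 and 80% power (normal approximation)."""
    z_alpha, z_beta = 1.959964, 0.841621
    p_bar = (p1 + p2) / 2
    num = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
           + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(num / (p2 - p1) ** 2)

# Detection rates and group sizes reported in this study
power = power_two_proportions(0.0037, 0.0043, 7872, 19402)
print(f"power ~ {power:.2f}")        # on the order of 10%, as reported
print(n_per_group(0.0037, 0.0043))   # well over 150,000 screens per group
```

The pooled-variance formula under the null and the unpooled variance under the alternative follow the usual large-sample z-test construction; exact (binomial) methods would give slightly different, but similarly discouraging, numbers at event rates of roughly four per 1,000.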
Second, as with all historical control population studies [11], it is possible that inherent differences between the pre-CAD and CAD populations, rather than the introduction of screening with CAD, account for the observed increase in cancer detection rates. Specifically, because cancer detection rates are higher in prevalence screening than in incidence screening [16], differences in screening outcomes might be due to differences in the proportions of prevalence and incidence screening rather than to screening without or with CAD. In our study, the proportion of prevalence screening was significantly higher in the pre-CAD group than in the CAD group (13.6% vs 7.1%, respectively; p < 0.0005). This population difference would tend to skew cancer detection rates in favor of the pre-CAD group. If anything, therefore, this bias would cause the increase in detection rates with CAD to be underestimated rather than overstated.
Because the increase in cancer detection was measured by comparison with historical control data, the lack of pre-CAD data for radiologist D is a limitation. If radiologist D were superior to the other radiologists in screening ability, then including his data without a pre-CAD control would introduce a superiority bias that might account for the apparent improvement in screening with CAD. However, our audit data do not suggest a superiority (or inferiority) bias for radiologist D; in fact, the cancer detection rate was higher when radiologist D was excluded.
The historical control design of this study does not permit analysis of the potential for perceptual improvements in radiologists' screening ability over time (getting better with age). It is possible that learned improvement in screening ability over time, independent of any impact from CAD, may have biased the results in favor of improved cancer detection with CAD. To eliminate this potential bias, a prospective study comparing screening results on the same patient population with and without CAD would be required.
Third, in this study, cases that may have been changed from negative (i.e., BI-RADS 1 and 2) to recall (i.e., BI-RADS 0) solely because a CAD mark brought the lesion to the attention of the interpreting radiologist were not prospectively recorded. Therefore, comparison of our results with prospective clinical studies that used a sequential reading methodology [8–10] is difficult.
Fourth, there is no agreement on the definition of a “screening mammogram.” Some examinations that might have been coded as diagnostic in other facilities (because of breast implants, a prior history of breast cancer, or nonlocalized breast pain) have been included as screenings in our study. Asymptomatic women with augmentation implants were included, but only the implant-displaced views were screened with CAD. Women with a personal history of breast cancer were included if, after mastectomy, the opposite breast was being screened. Our screening protocol allows for breast cancer survivors to be returned to routine screening 3 years after completion of breast conservation therapy if clinical follow-up at our facility has shown no evidence of local or distant disease. Some breast cancer survivors continued to have diagnostic examinations indefinitely if so ordered by their referring physicians, however. Although other practices may use different criteria, our criteria for a screening mammogram were consistent during the pre-CAD and CAD periods.
Finally, it is possible that some staging errors occurred because of incomplete data (one of 83 biopsy-proven cancers [1.2%] in the CAD group was not clinically staged and three [3.4%] lacked complete TNM staging). In such cases, tumor stage was assigned based on the best available evidence (e.g., lesion size determined from the mammogram if not recorded on the pathology report). For purposes of cancer staging, when regional lymph nodes were not clinically suspicious and had not been biopsied (TNM / AJCC = Nx), they were considered to be negative. In such situations, the tumor stage assigned in Table 2 could potentially down-stage some cancers. However, all minimal cancers reported in our study were confirmed by surgical pathology and were consistent with the mammogram findings.
In conclusion, a 16.1% increase in cancer detection rate occurred in the 2-year period after the introduction of CAD into our screening mammography practice. The increase in cancer detection rate did not occur at the expense of a discordant increase in the screening recall rate (8.11%) or biopsy rate (6.69%). The PPV3 for biopsy increased slightly (8.84%). The substantial increase in the proportion of early stage invasive cancers detected with CAD was statistically significant, and the younger patient age at diagnosis approached significance. These findings affirm the benefit of CAD in our regional mammography program [17].
Footnote
Address correspondence to T. E. Cupples ([email protected]).
References
1. Tabar L, Vitak B, Chen HC, Yen MF, Duffy SW, Smith RA. Beyond randomized controlled trials: organized mammographic screening substantially reduces breast carcinoma mortality. Cancer 2001; 91:1724-1731
2. Duffy SW, Tabar L, Chen H, et al. The impact of organized mammography service screening on breast carcinoma mortality in seven Swedish counties. Cancer 2002; 95:458-469
3. Vyborny CJ. Can computers help radiologists read mammograms? Radiology 1994; 191:315-317
4. te Brake GM, Karssemeijer N, Hendriks JH. Automated detection of breast carcinomas not detected in a screening program. Radiology 1998; 207:465-471
5. Nishikawa RM, Doi K, Giger ML, et al. Computerized detection of clustered microcalcifications: evaluation of performance on mammograms from multiple centers. RadioGraphics 1995; 15:445-452
6. Kegelmeyer WP, Pruneda JM, Bourland PD, Hillis A, Riggs MW, Nipper ML. Computer-aided mammographic screening for spiculated lesions. Radiology 1994; 191:331-337
7. Warren Burhenne LJ, D'Orsi CJ, Feig SA, et al. The potential contribution of computer-aided detection to the sensitivity of screening mammography. Radiology 2000; 215:554-562
8. Freer TW, Ulissey MJ. Screening mammography with computer-aided detection: prospective study of 12,860 patients in a community breast center. Radiology 2001; 220:781-786
9. Bandodkar P, Birdwell R, Ikeda D. Computer aided detection (CAD) with screening mammography in an academic institution: preliminary findings. (abstr) Radiology 2002; 225(P):458
10. Morton MJ, Whaley DH, Brandt KR, Amrami KK. The effects of computer-aided detection (CAD) on a local/regional screening mammography program: prospective evaluation of 12,646 patients. (abstr) Radiology 2002; 225(P):459
11. Gur D, Sumkin JH, Rockette HE, et al. Changes in breast cancer detection and mammography recall rates after introduction of a computer-aided detection system. J Natl Cancer Inst 2004; 96:185-190
12. Quality standards and certification requirements for mammography facilities, 58 Federal Register 67565 (1993)
13. American College of Radiology. Breast imaging reporting and data system (BI-RADS), 2nd ed. Reston, VA: American College of Radiology, 1998
14. American Joint Committee on Cancer. AJCC cancer staging manual, 5th ed. Philadelphia, PA: Lippincott-Raven Publishers, 1997
15. Feig S, Sickles E, Evans WP, Linver M. Re: changes in breast cancer detection and mammography recall rates after introduction of a computer-aided detection system. (letter) J Natl Cancer Inst 2004; 96:1260-1261
16. Feig SA. Age-related accuracy of screening mammography: how should it be measured? Radiology 2000; 214:633-640
17. Butler WM, Cunningham JE, Bull D, et al. Breast cancer care: changing community standards. J Healthc Qual 2004; 26:22-28
Copyright
© American Roentgen Ray Society.
History
Submitted: August 18, 2004
Accepted: September 30, 2004
First published: November 23, 2012