Women's Imaging
Original Research
Comparison of Digital Mammography and Screen-Film Mammography in Breast Cancer Screening: A Review in the Irish Breast Screening Program
OBJECTIVE. Clinical trials to date into the use of full-field digital mammography (FFDM) for breast cancer screening have shown variable results. The aim of this study was to review the use of FFDM in a population-based breast cancer screening program and to compare the results with screen-film mammography.
MATERIALS AND METHODS. The study included 188,823 screening examinations of women between 50 and 64 years old; 35,204 (18.6%) mammograms were obtained using FFDM. All films were double read using a 5-point rating scale to indicate the probability of cancer. Patients with positive scores were recalled for further workup. The recall rate, cancer detection rate, and positive predictive value (PPV) of FFDM were compared with screen-film mammography.
RESULTS. The cancer detection rate was significantly higher for FFDM than screen-film mammography (6.3 vs 5.2 per 1,000, respectively; p = 0.01). The cancer detection rate for FFDM was higher than screen-film mammography for initial screening and subsequent screening, for invasive cancer and ductal carcinoma in situ, and across all age groups. The cancer detection rate for cancers presenting as microcalcifications was significantly higher for FFDM than for screen-film mammography (1.9 vs 1.3 per 1,000, p = 0.01). The recall rate was significantly higher for FFDM than screen-film mammography (4.0% vs 3.1%, p < 0.001). There was no significant difference in the PPVs of recall to assessment for FFDM and screen-film mammography (15.7% and 16.7%, p = 0.383).
CONCLUSION. FFDM resulted in significantly higher cancer detection and recall rates than screen-film mammography in women 50–64 years old. The PPVs of FFDM and screen-film mammography were comparable. The results of this study suggest that FFDM can be safely implemented in breast cancer screening programs.
Keywords: breast cancer, breast cancer screening, breast imaging, digital imaging, digital mammography, screen-film mammography
Mammography is a well-established screening tool, and screening has been shown to reduce breast cancer mortality through earlier detection. To date, screen-film mammography has been the reference standard for use in breast cancer screening programs, and all previous randomized controlled trials into population-based breast cancer screening programs were performed using screen-film mammography [1–3].
Since it first gained U.S. Food and Drug Administration approval in 2000, digital mammography has gained in popularity because of its many advantages over screen-film mammography, including elimination of film processing, storage, copying, and retrieval; the ability to manipulate images after acquisition; and the more efficient use of computer-aided detection and telemammography. There were initial concerns that the more limited spatial resolution of digital mammography compared with the reference standard of screen-film mammography might lead to a reduced sensitivity for cancer detection [4]. However, experimental studies have shown that digital systems have a higher detective quantum efficiency and dynamic range, leading to improved contrast resolution [5]. Also, concerns that the lower spatial resolution of digital imaging would limit the detection of microcalcifications have been discounted by studies reporting that full-field digital mammography (FFDM) shows improved image quality with higher reliability in characterizing calcifications compared with screen-film mammography [6, 7].
Clinical studies to date into the use of FFDM for breast cancer screening have shown variable results. In early trials, Lewin et al. [8, 9] and Skaane et al. [10] showed a nonsignificant higher cancer detection rate for screen-film mammography than for FFDM. However, the Digital Mammographic Imaging Screening Trial (DMIST) [11, 12], which represents the largest trial of digital mammography to date, concluded that FFDM was more accurate in screening pre- or perimenopausal women younger than 50 years with dense breasts. In a more recently published article describing the final results of the Oslo II study, Skaane et al. [13] reported a significantly higher cancer detection rate in women screened on FFDM, and a number of recent European studies have yielded results that are more favorable for digital mammography than for screen-film mammography [14–17]. Table 1 summarizes the main findings of these studies.
The Irish National Breast Screening Programme (INBSP) was launched in 2000. It invites women ranging in age from 50 to 64 years to undergo breast cancer screening every 2 years. Screening is performed both on-site in static screening units and off-site in mobile units, with all reading performed centrally in the static units. Digital mammography was first introduced into the screening program on a phased basis in January 2005. Between January 2005 and December 2007, 18.6% (35,204 of 188,823) of patients were screened on digital systems. In April 2008, the INBSP became the first national screening center in Europe to be fully digitized.
The aim of this study was to retrospectively review the performance of FFDM in a population-based screening program and to compare its performance with that of the standard of screen-film mammography with respect to recall rate, cancer detection rate, and positive predictive value (PPV).
Between January 1, 2005, and December 31, 2007, 245,863 invitations to undergo breast cancer screening in the INBSP were sent to 188,823 women. This group represented all eligible women (age range, 50–64 years) in the catchment area of the INBSP. The uptake rate was 76.8%, and 188,823 screenings of 146,114 women were performed during the study period.
Women were selected to undergo screen-film mammography or FFDM when they presented for screening. Assignment was based on the time of check-in—that is, every third or fourth patient was assigned to digital mammography depending on the screening center. Patient age, breast density, or menopausal status did not influence patient selection. Women were not offered a choice of screening technique.
Of the 188,823 screening mammograms, 35,204 (18.6%) were obtained using FFDM and 153,619 (81.4%) using screen-film mammography. These examinations were initial (prevalence) screenings in 53,702, of which 9,546 (17.8%) were performed on digital systems, and were subsequent (incidence) screenings in 135,121, of which 25,658 (19%) were performed digitally.
The composition of both groups was comparable in terms of screening round and age distribution. Twenty-seven percent (9,546 of 35,204) of digital screenings were initial studies compared with 28.7% (44,156 of 153,619) of analog screenings. For women undergoing their first screening, the average age was 53.5 years for FFDM and 54.1 years for screen-film mammography. For women undergoing their second or subsequent screening, the average age was 58.6 years for FFDM and 58.5 years for screen-film mammography. Table 2 compares the two groups in terms of age and screening round. Information regarding breast density, socioeconomic status, and cancer risk factors such as menopausal status or hormone replacement therapy use is not collected by the INBSP and is therefore not included for comparison.
A retrospective analysis was performed to determine if screening was performed on a digital system or screen-film system, if the woman was recalled for further assessment, if a biopsy was performed, and if cancer was diagnosed. The recall rate, biopsy rate, cancer detection rate, and PPV for women who underwent FFDM were then determined and compared with those values for women who underwent standard screen-film mammography screening.
All women signed a consent form to participate in the screening program and agreed in writing to the collection, storage, and exchange of their health records for audit and quality assurance. Institutional review board approval was obtained.
The screen-film mammography images were acquired on one of two units: a GE 800 T (GE Healthcare) using a molybdenum anode and molybdenum–rhodium filter or a Mammomat 3000 (Siemens Healthcare) using a molybdenum–tungsten anode and molybdenum–rhodium filter.
The FFDM images were acquired using one of three machines: Sectra MDM (Sectra), Lorad Selenia (Hologic), or GE Essential (GE Healthcare).
Seven specialist breast radiologists with a minimum of 5 years' experience in reading mammography and reading an average of 20,000 examinations per year participated in image interpretation.
Screen-film mammography images were read using standard motorized mammography alternators. A magnifying glass was offered for reading. Old films were available and displayed under the current studies.
FFDM soft-copy images were read using a PACS mammography review workstation (IDS5, Sectra). The workstation included two high-resolution monitors (2,000 × 5,000 pixels). Initially, all four views were displayed. The images were then displayed at full resolution with one mediolateral oblique view on each monitor followed by one craniocaudal view on each monitor. Any further image manipulation such as zooming or windowing was left to the discretion of the radiologist interpreting the study. The review workstation was located in a darkened room away from the mammography alternators. Previous mammograms were reviewed on a standard view-box adjacent to the workstation.
All mammograms were double read by two radiologists. Reader 2 was not blinded to reader 1's recommendations. Old films were available at the time of reading. Each mammogram was assigned an R category 1–5. The R classification system is a 5-category rating scale used to define the probability of cancer on a radiologic study. The categories are defined as follows: R1, normal study; R2, benign finding; R3, indeterminate finding requiring further workup; R4, likely malignant finding; and R5, highly suspicious for malignancy. The R classification system is similar to BI-RADS with the main difference being that all patients with R3 findings are recalled for further assessment, whereas a diagnosis of BI-RADS category 3 usually prompts short-interval follow-up [18]. Six-month recall is not practiced in our screening program.
For each case, if both readers assigned a category of R1 or R2, the woman was listed for routine screening in 2 years. If both readers assigned a category of R3–R5, the woman was automatically recalled for further workup.
If there was a discrepancy in R category—that is, if one reader assigned a category of R1 or R2 but the other reader assigned a category of R3, R4, or R5, the patient was listed for discussion at a consensus meeting. This process in INBSP has recently been described in an article by Shaw et al. [19]. The consensus meeting was held twice weekly. All radiologists were invited to attend, and a minimum of two was required. All cases with a discrepancy in R category from the previous week were reviewed, and a consensus was reached as to whether the patient should be recalled for assessment or listed for routine screening.
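The reading protocol described above amounts to a small decision rule, sketched here in Python. The names `Outcome`, `is_positive`, and `triage` are hypothetical labels chosen for illustration and are not part of any INBSP software; the only logic encoded is what the text states (R1 or R2 counts as negative, R3 to R5 as positive, and a split between readers goes to consensus).

```python
from enum import Enum


class Outcome(Enum):
    ROUTINE = "routine screening in 2 years"
    RECALL = "recall for further workup"
    CONSENSUS = "discuss at consensus meeting"


def is_positive(r_category: int) -> bool:
    """R3, R4, and R5 are treated as positive; R1 and R2 as negative."""
    if r_category not in (1, 2, 3, 4, 5):
        raise ValueError(f"R category must be 1-5, got {r_category}")
    return r_category >= 3


def triage(reader1: int, reader2: int) -> Outcome:
    """Combine two readers' R categories into a single screening outcome."""
    first, second = is_positive(reader1), is_positive(reader2)
    if first and second:
        return Outcome.RECALL      # both readers assigned R3-R5
    if not first and not second:
        return Outcome.ROUTINE     # both readers assigned R1 or R2
    return Outcome.CONSENSUS       # readers disagree on positive vs negative


if __name__ == "__main__":
    for pair in [(1, 2), (3, 5), (2, 4)]:
        print(pair, "->", triage(*pair).value)
```

The consensus meeting itself then resolves each discrepant case to either recall or routine screening; that human step is deliberately left outside the sketch.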
Further diagnostic workup for all women recalled to assessment was performed in the central screening units. Diagnostic workup involved acquiring additional mammographic views including spot compression, true lateral, and magnification views and performing an ultrasound examination and MRI as required.
All solid masses, R3–R5 microcalcifications, and other suspicious lesions underwent core biopsy using a 14-gauge biopsy device. An 11-gauge suction device (Mammotome, Ethicon Endo-Surgery) was used in some cases depending on radiologist preference. Biopsy was performed under ultrasound guidance if possible and stereotactic guidance was used if necessary. On-site specimen radiographs were obtained to confirm adequate sampling of microcalcifications.
All samples were evaluated by a dedicated breast pathologist. All lesions biopsied were discussed at a multidisciplinary meeting within 1 week of biopsy. The multidisciplinary meeting was attended by breast surgeons, breast radiologists, breast pathologists, and medical and radiation oncologists as well as nursing staff, administration staff, and radiographers.
The recall rate, cancer detection rate, biopsy rate, and PPV were calculated for digital mammography and were compared with those values for standard screen-film mammography. The studies were subdivided into initial screenings and subsequent screenings and according to patient age. The indications for recall were noted in all women diagnosed with cancer, and the cancer detection rate based on the abnormality detected was compared for the two groups. The recall and cancer detection rates were also compared as a function of time for the digital group to determine whether a learning curve was apparent.
The recall rate was defined as the percentage of women screened who were recalled for further diagnostic workup. The cancer detection rate was defined as the number of cancers detected per 1,000 women screened. The PPV1 was the number of cancers detected as a percentage of the women recalled for assessment. The PPV2 was the number of cancers detected as a percentage of the women who underwent biopsy.
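As a worked illustration of these four definitions, the following Python sketch computes them from raw counts. The class name `ScreeningCounts` and its property names are assumptions made for this example; the counts are the overall FFDM and screen-film totals reported in this article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScreeningCounts:
    screened: int   # screening examinations performed
    recalled: int   # women recalled for further diagnostic workup
    biopsied: int   # recalled women who underwent biopsy
    cancers: int    # cancers diagnosed

    @property
    def recall_rate_pct(self) -> float:
        """Recalled women as a percentage of women screened."""
        return 100.0 * self.recalled / self.screened

    @property
    def detection_per_1000(self) -> float:
        """Cancers detected per 1,000 women screened."""
        return 1000.0 * self.cancers / self.screened

    @property
    def ppv1_pct(self) -> float:
        """Cancers as a percentage of women recalled for assessment."""
        return 100.0 * self.cancers / self.recalled

    @property
    def ppv2_pct(self) -> float:
        """Cancers as a percentage of women who underwent biopsy."""
        return 100.0 * self.cancers / self.biopsied


# Overall totals taken from the results reported in this article.
ffdm = ScreeningCounts(screened=35_204, recalled=1_406, biopsied=470, cancers=221)
sfm = ScreeningCounts(screened=153_619, recalled=4_729, biopsied=1_698, cancers=792)

if __name__ == "__main__":
    for name, group in (("FFDM", ffdm), ("Screen-film", sfm)):
        print(
            f"{name}: recall {group.recall_rate_pct:.1f}%, "
            f"detection {group.detection_per_1000:.1f} per 1,000, "
            f"PPV1 {group.ppv1_pct:.1f}%, PPV2 {group.ppv2_pct:.1f}%"
        )
```

Rounded to one decimal place, these calculations agree with the overall recall rates, cancer detection rates, and PPVs quoted in the Results.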
Statistical analysis was performed using a statistical software program (SigmaStat 3.0, Systat Software). A chi-square test was used to compare recall rate, cancer detection rate, and PPVs in FFDM and screen-film mammography. The significance level was set at a p value of < 0.05.
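For readers who wish to check a comparison by hand, a chi-square test on a 2 × 2 table can be sketched with the Python standard library alone. This is not the SigmaStat procedure itself: the function name `chi_square_2x2` is hypothetical, and the `yates` flag is an assumption, because the article does not state whether a continuity correction was applied. For 1 degree of freedom the p value follows from the complementary error function.

```python
import math


def chi_square_2x2(events_a, total_a, events_b, total_b, yates=False):
    """Pearson chi-square test (1 df) comparing two proportions.

    Returns (chi2, p). Set yates=True to apply the continuity correction.
    """
    observed = [
        [events_a, total_a - events_a],
        [events_b, total_b - events_b],
    ]
    n = total_a + total_b
    row_totals = [total_a, total_b]
    col_totals = [events_a + events_b, n - (events_a + events_b)]

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            diff = abs(observed[i][j] - expected)
            if yates:
                diff = max(0.0, diff - 0.5)
            chi2 += diff ** 2 / expected

    # Survival function of the chi-square distribution with 1 df.
    p = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p


if __name__ == "__main__":
    # Cancer detection: 221 of 35,204 (FFDM) vs 792 of 153,619 (screen-film).
    chi2, p = chi_square_2x2(221, 35_204, 792, 153_619)
    print(f"cancer detection: chi2 = {chi2:.2f}, p = {p:.3f}")

    # PPV1: 221 of 1,406 recalled (FFDM) vs 792 of 4,729 recalled (screen-film).
    chi2, p = chi_square_2x2(221, 1_406, 792, 4_729, yates=True)
    print(f"PPV1 (with continuity correction): chi2 = {chi2:.2f}, p = {p:.3f}")
```

With or without the correction, the overall cancer detection comparison rounds to p = 0.01; the corrected PPV1 comparison comes out close to the reported p = 0.383.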
Table 3 summarizes the number of women screened, the recall rate, the cancer detection rate, and the PPV1 both in total and for initial and subsequent screenings. The rates for FFDM and screen-film mammography are compared.
Of the 188,823 screening examinations, 6,135 resulted in recall for further diagnostic workup, giving an overall recall rate of 3.2%. For women undergoing screen-film mammography, 3.1% (4,729 of 153,619) were recalled compared with 4.0% (1,406 of 35,204) of women who underwent FFDM. This difference was statistically significant (p < 0.001).
The recall rate was higher in women undergoing their first screening—6.0% (3,220 of 53,702). For first-screen women undergoing screen-film mammography, the recall rate was 5.7% (2,526 of 44,156) compared with 7.3% (694 of 9,546) in those undergoing FFDM (p < 0.001).
As expected, the recall rate was lower in women undergoing a subsequent screening: 2.2% (2,915 of 135,121 screenings). For subsequent-screen women undergoing screen-film mammography, the recall rate was 2.0% (2,203 of 109,463) compared with 2.8% (712 of 25,658) of women undergoing FFDM. This difference was also statistically significant (p < 0.001).
The recall rate for FFDM was significantly higher for women of all ages (Table 4). For women 50–54 years old, the recall rate for FFDM was 5.3% (649 of 12,188) versus 4.4% for screen-film mammography (2,343 of 53,229; p < 0.001). For women 55–59 years old, the recall rate for FFDM was 3.4% (421 of 12,493) versus 2.3% for screen-film mammography (1,306 of 56,209; p < 0.001). For women 60–64 years old, the recall rate for FFDM was 3.2% (336 of 10,523) versus 2.4% for screen-film mammography (1,080 of 44,181; p < 0.001).
During the study 1,013 cancers were detected giving an overall cancer detection rate of 5.4 per 1,000. The rate was higher in those undergoing their first screening round; 384 cancers were detected in 53,702 screenings (7.2 per 1,000). Six hundred twenty-nine cancers were detected in the 135,121 second or subsequent screenings (4.7 per 1,000).
The cancer detection rate was significantly higher in those screened on FFDM: 221 cancers were detected in 35,204 digital screenings (6.3 per 1,000), and 792 cancers were detected in 153,619 analog screenings (5.2 per 1,000) (p = 0.01).
When the study cohort was subdivided into women undergoing their first screening and those undergoing subsequent screening rounds, the detection rate remained higher in the digital group.
Of the women undergoing their initial screening, 75 cancers were detected in 9,546 digital screenings (7.9 per 1,000) and 309 cancers were detected in 44,156 analog screenings (7.0 per 1,000). This difference did not achieve statistical significance (p = 0.40).
Of the women undergoing a second or subsequent screening, a significantly higher number of cancers were detected in those screened on FFDM. One hundred forty-six cancers were detected in 25,658 digital screenings (5.7 per 1,000) and 483 cancers were detected in 109,463 analog screenings (4.4 per 1,000) (p = 0.008).
The cancer detection rate was higher for FFDM across all age categories, but this difference in detection rates did not achieve statistical significance for individual groups (Table 4). For women 50–54 years old, the cancer detection rate was 5.7 per 1,000 (70 cases in 12,188 screenings) for FFDM versus 4.9 per 1,000 (261 in 53,229) for screen-film mammography (p = 0.27). For women 55–59 years old, the cancer detection rate was 5.8 per 1,000 (72 cases in 12,493 screenings) for FFDM versus 4.7 per 1,000 (263 in 56,209) for screen-film mammography (p = 0.13). For women 60–64 years old, the cancer detection rate for FFDM was 7.5 per 1,000 (79 cases in 10,523 screenings) versus 6.1 per 1,000 (268 in 44,181) for screen-film mammography (p = 0.11).
In the women screened using screen-film mammography, 146 (18.4%) of the 792 cancers detected were ductal carcinoma in situ (DCIS). In the women screened on FFDM, 46 (20.8%) of the 221 cancers detected were DCIS (p = 0.48).
The cancers were subdivided into invasive cancers and DCIS, and the detection rates for screen-film mammography and FFDM were compared. The detection rates for both invasive cancers and DCIS were higher in women screened on FFDM. The detection rate for invasive cancers was 5.0 per 1,000 women screened using FFDM (175 of 35,204) and 4.2 per 1,000 women screened using screen-film mammography (646 of 153,619) (p = 0.054). The detection rate for DCIS was 1.3 per 1,000 women screened on FFDM (46 of 35,204) and 0.95 per 1,000 for women screened on screen-film mammography (146 of 153,619) (p = 0.072).
The PPV based on the number of women recalled for assessment (PPV1) who were subsequently diagnosed with cancer was 792 of 4,729 (16.7%) women undergoing screen-film mammography and 221 of 1,406 (15.7%) women undergoing FFDM (p = 0.383).
In women undergoing their initial screening, the PPV1 was 309 of 2,526 (12.2%) for screen-film mammography and 75 of 694 (10.8%) for FFDM (p = 0.337). In women undergoing a subsequent screening, the PPV1 was 483 of 2,203 (21.9%) for screen-film mammography and 146 of 712 (20.5%) for digital mammography (p = 0.455).
The biopsy rate was similar in both groups: 470 of 1,406 women (33.4%) recalled after FFDM screening underwent biopsy and 1,698 of 4,729 women (35.9%) recalled after screen-film mammography screening underwent biopsy (p = 0.09).
The PPV based on the number of women who underwent biopsy (PPV2) was also determined. For those screened on FFDM, 221 cancers were diagnosed in 470 women who underwent a biopsy, giving a PPV2 of 47%. In those screened on screen-film mammography, 792 cancers were diagnosed in 1,698 women biopsied, giving a PPV2 of 46.6%. This difference was not significant (p = 0.93).
The patients were subcategorized according to the date of screening, and the recall rate and cancer detection rate were compared over time.
In year 1, 7.8% (4,759 of 60,636) of women were screened on FFDM. Thirty-three percent (1,580 of 4,759) of these women were attending an initial screening. The recall rate in the FFDM group was 4.1% (193 of 4,759), and the cancer detection rate was 6.7 per 1,000 (32 of 4,759). The recall rate and cancer detection rate for screen-film mammography during this period were 3.2% (1,769 of 55,877) and 5.2 per 1,000 (292 of 55,877), respectively. The difference in recall rates was significant (p = 0.001), but the difference in cancer detection rates was not (p = 0.21).
In year 2, 17.3% (10,963 of 63,403) of women were screened using FFDM, 24.3% (2,664 of 10,963) of whom were undergoing their first screening. The recall rate was 3.7% (407 of 10,963), and the cancer detection rate was 6.3 per 1,000 (69 of 10,963). The recall rate and cancer detection rate for screen-film mammography were 2.9% (1,507 of 52,440) and 4.9 per 1,000 (257 of 52,440), respectively. Again, the difference in recall rates was significant (p < 0.001), but the difference in cancer detection rates was not (p = 0.07).
In year 3, 30.1% (19,482 of 64,784) of women were screened using FFDM and 27.2% (5,302 of 19,482) of these were initial screenings. The recall rate was 4.1% (806 of 19,482), and the cancer detection rate was 6.2 per 1,000 (120 of 19,482). The recall rate and cancer detection rate for screen-film mammography were 3.2% (1,453 of 45,302) and 5.4 per 1,000 (243 of 45,302), respectively. The difference in recall rates was again significant (p < 0.001), but the difference in cancer detection rates was not (p = 0.23).
There was no significant difference in the recall rate (p = 0.19) or cancer detection rate (p = 0.91) for FFDM over time.
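The article does not state which test was used for the comparison over time. As one plausible reading, the sketch below applies a Pearson chi-square test to the 3 × 2 table of yearly FFDM counts (2 degrees of freedom, for which the p value is exactly exp(−χ²/2)). The function names are hypothetical, and the choice of test is an assumption rather than the authors' documented method.

```python
import math


def chi_square_groups(events, totals):
    """Pearson chi-square statistic for k groups with a binary outcome."""
    if len(events) != len(totals):
        raise ValueError("events and totals must have the same length")
    overall = sum(events) / sum(totals)   # pooled event proportion
    chi2 = 0.0
    for observed_events, total in zip(events, totals):
        expected_events = total * overall
        expected_non = total - expected_events
        observed_non = total - observed_events
        chi2 += (observed_events - expected_events) ** 2 / expected_events
        chi2 += (observed_non - expected_non) ** 2 / expected_non
    return chi2


def p_value_three_groups(events, totals):
    """p value for exactly three groups (2 df): survival is exp(-chi2 / 2)."""
    if len(events) != 3:
        raise ValueError("closed-form p value here is valid only for 3 groups")
    return math.exp(-chi_square_groups(events, totals) / 2.0)


# FFDM counts for years 1-3 as reported in this article.
ffdm_screened = [4_759, 10_963, 19_482]
ffdm_recalled = [193, 407, 806]
ffdm_cancers = [32, 69, 120]

if __name__ == "__main__":
    print(f"recall over time: p = {p_value_three_groups(ffdm_recalled, ffdm_screened):.2f}")
    print(f"detection over time: p = {p_value_three_groups(ffdm_cancers, ffdm_screened):.2f}")
```

Under that assumption the results land near the reported p = 0.19 for recall and p = 0.91 for cancer detection, which is consistent with the absence of a learning-curve effect.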
The type of mammographic abnormality detected was recorded for all women who were diagnosed with cancer and was compared for screen-film mammography and FFDM. Table 5 shows the cancer detection rates for FFDM and screen-film mammography based on the type of mammographic abnormality detected. The cancer detection rate due to the detection of microcalcifications was significantly higher for FFDM for all cancers (i.e., invasive and DCIS combined) (1.9 per 1,000 for FFDM vs 1.3 per 1,000 for screen-film mammography, p = 0.01) and for DCIS alone (1.2 per 1,000 for FFDM vs 0.7 per 1,000 for screen-film mammography, p = 0.009). For invasive cancers, the cancer detection rate due to the detection of architectural distortion was significantly higher for FFDM than for screen-film mammography (1.0 vs 0.7 per 1,000, respectively; p = 0.03). There was no other significant difference between the two groups.
Table 6 compares the sizes of the tumors detected by FFDM and screen-film mammography. Of the 821 invasive tumors detected, 465 (56.6%) measured ≤ 15 mm at diagnosis and 232 (28.3%) measured ≤ 10 mm. For those screened using FFDM, 103 of 175 (58.9%) invasive cancers measured ≤ 15 mm and 43 (24.6%) measured ≤ 10 mm. For those screened using screen-film mammography, 361 of 646 (55.9%) invasive cancers measured ≤ 15 mm and 187 (28.9%) measured ≤ 10 mm. These differences were not statistically significant.
Table 1 summarizes the main findings of the previously published studies comparing digital and screen-film mammography. The recent publication of the final results of the Oslo II study [13] showed a significantly higher cancer detection rate in women between 45 and 69 years old screened on FFDM than in those screened on screen-film mammography (5.9 vs 3.8 per 1,000, respectively; p = 0.02). The results of our study support this finding, showing a significantly higher cancer detection rate for FFDM compared with screen-film mammography (6.3 vs 5.2 per 1,000, respectively; p = 0.01). These results suggest that digital mammography may be superior to screen-film mammography for cancer detection in women older than 50 years.
In our study, the two groups of patients were drawn from the same population and were similar in terms of age and screening round. The mammograms were interpreted using the same protocol and by the same radiologists throughout the 3-year period. When digital mammography was introduced, the screening program was well established with a reproducible practice of radiologist recall, reading policies, and guidelines. Therefore, it is unlikely that factors other than the imaging technique account for the higher cancer detection rate in the women who underwent digital mammography.
When the screening examinations were subdivided into initial and subsequent screenings, the difference in cancer detection rates remained. This difference was significant for women undergoing a second or subsequent screening (p = 0.008) but not for those undergoing initial screening (p = 0.40). This latter finding is probably due to the fact that a smaller number of women underwent initial screening, so the data lack the statistical power to determine a significant difference (Table 3).
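The power argument can be made concrete with a rough calculation. The sketch below uses a normal approximation for the difference of two proportions and ignores the negligible opposite tail; the function names are hypothetical, and this is an after-the-fact illustration using the observed initial-screening rates, not an analysis performed in the study.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def approx_power(p1: float, n1: int, p2: float, n2: int,
                 z_alpha: float = 1.959964) -> float:
    """Approximate power of a two-sided test (alpha = 0.05 by default)
    to detect a true difference between proportions p1 and p2."""
    se = math.sqrt(p1 * (1.0 - p1) / n1 + p2 * (1.0 - p2) / n2)
    z_effect = abs(p1 - p2) / se
    return normal_cdf(z_effect - z_alpha)


if __name__ == "__main__":
    # Initial screenings: 75 of 9,546 (FFDM) vs 309 of 44,156 (screen-film).
    power = approx_power(75 / 9_546, 9_546, 309 / 44_156, 44_156)
    print(f"approximate power for initial screenings: {power:.2f}")
```

If the true rates equaled those observed, this approximation puts the power for the initial-screening comparison at about 14%, well below the conventional 80% target, which is in keeping with the explanation offered above.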
The cancer detection rate was higher for FFDM than screen-film mammography for both invasive cancer (p = 0.054) and DCIS (p = 0.072), and these results approached statistical significance. Although the cancer detection rate was significantly higher in women undergoing digital mammography overall, when broken down into age groups the difference did not reach statistical significance (Table 4). Again, this finding is likely due to the smaller numbers in each subgroup, and a significant difference may emerge when larger numbers are compared.
Our results are concordant with the final results of the Oslo II trial [13, 14]. In that study, 23,929 women between 45 and 69 years old attending a population-based screening program were randomized to undergo either screen-film mammography or FFDM, with approximately 29% (6,944) undergoing FFDM. Images were interpreted using independent double reading, and positive results were discussed at a consensus meeting before recall. They reported a cancer detection rate of 3.8 per 1,000 for screen-film mammography and 5.9 per 1,000 for FFDM, which was statistically significant (p = 0.02, chi-square test).
Our screening program has many similarities to that used in the Oslo II study [13], such as biennial screening, independent double reading, and consensus review of reader discrepancies. However, our study represents a larger study population, with 35,204 digital screenings in our study versus 6,944 in theirs. Overall 1,013 cancers were detected in our study versus 105 cancers in Oslo II. The Oslo II investigators reported no significant difference in PPVs, which is also in accordance with our study. Their study included women younger than 50 years old, in whom FFDM has already been shown to be more effective than screen-film mammography for cancer detection [11].
In another Norwegian study, Vigeland et al. [15] compared 18,239 women screened on FFDM with 324,763 women screened on screen-film mammography and found a nonsignificant higher cancer detection rate for FFDM than screen-film mammography (7.7 vs 6.5 per 1,000, respectively; p = 0.058). The results achieved statistical significance for DCIS detection (2.1 vs 1.1 per 1,000, p < 0.001). A limitation of that study was that the large screen-film mammography group consisted of merged data from 18 different counties collected over a 9-year period and read by different radiologists. None of the radiologists who read the screen-film mammography screening examinations were involved in reading the digital screening examinations. In our study, seven experienced radiologists read both FFDM and screen-film mammography and remained constant throughout the 3-year period.
In the current study, the cancer detection rate and DCIS detection rate for cancers presenting as microcalcifications were significantly higher for FFDM than for screen-film mammography. The invasive cancer detection rate for cancers presenting as architectural distortion was also significantly higher for FFDM. These findings are concordant with those of another European study by Del Turco et al. [16]. In that study, 28,770 women between 50 and 69 years old undergoing biennial screening in a Florence screening program underwent either FFDM or screen-film mammography. Films were double read using the R classification system. Those investigators found a higher cancer detection rate for FFDM than screen-film mammography (7.2 vs 5.8 per 1,000, respectively; p = 0.14), but this difference did not reach statistical significance, likely because of the small sample size; 104 cancers were detected in the digital group and 84 cancers in the analog group. However, Del Turco et al. did report a significantly higher detection rate of cancers depicted as microcalcifications for FFDM than screen-film mammography (2.6 vs 1.2 per 1,000, p = 0.007). This finding suggests that the higher cancer detection rate for FFDM may be secondary to the improved detection of microcalcifications and architectural distortion.
Another European study showing favorable results for FFDM involved the comparison of three techniques: screen-film mammography, photon-counting direct radiography, and computed radiography (CR) [17]. For this retrospective study of a population-based screening program, investigators compared 52,172 screening studies. The cancer detection rates were 3.1 per 1,000 for screen-film mammography, 4.9 per 1,000 for photon-counting direct radiography (p = 0.01), and 3.8 per 1,000 for CR (p = 0.22). Unlike our study, they reported a significantly higher PPV for digital mammography: 22% for screen-film mammography, 47% for photon-counting direct radiography (p < 0.001), and 39% for CR (p < 0.001).
DMIST by Pisano et al. [11] represents the largest clinical trial of digital mammography published to date [12]. In that study, 42,760 asymptomatic women were recruited at 33 different sites to undergo both screen-film mammography and FFDM. Both examinations were read independently by two single readers, one reader for screen-film mammography and one for FFDM. The readers rated the mammograms using a 7-point malignancy scale suitable for receiver-operating-characteristic curve analysis and using BI-RADS. Further workup was performed if either reader recommended it. They found no significant difference between digital and screen-film mammography except in pre- and perimenopausal women younger than 50 years with dense breasts, in whom FFDM showed greater diagnostic accuracy.
It is difficult to compare our study with DMIST because ours is a retrospective study and we are unable to subcategorize patients by breast density and menopausal status. However, what is of interest is that because the INBSP does not currently offer screening to women younger than 50 years, the population reported to benefit from digital mammography by Pisano et al. [11] was not included in our study. We can therefore exclude that category of women as accounting for the significant difference in the cancer detection rates. Our study results suggest that a broader category of women than previously thought may benefit from the use of digital mammography in breast cancer screening.
Earlier trials by Lewin et al. [8, 9] and the Oslo I trial by Skaane et al. [10] reported a slightly higher cancer detection rate in women screened on screen-film mammography, although this difference was not statistically significant. In a study by Lewin et al. [9], 6,736 paired examinations were performed on women 40 years old and older. Forty-two cancers were detected in total: 33 by screen-film mammography and 27 by FFDM. The difference in these results was not significant (p > 0.1, McNemar chi-square test). Probably the most limiting aspect of that study in its ability to show true differences in cancer detection is the relatively small numbers of cancers diagnosed (42 vs 1,013 in the INBSP study). Also, in the study by Lewin et al., the digital images were acquired using a prototype unit, and a prototype workstation with more limited spatial resolution (1,800 × 2,300 pixels) was used for soft-copy display. The authors commented that the workstation interface was not user-friendly, which may have been a source of distraction to the reader. The studies were also read by a single reader only. Lewin et al. performed a discrepancy analysis of all cases in which the interpretation of the screen-film examination differed from that of the digital examination. The most common reasons cited for discrepancy were fortuitous positioning and minor differences in opinion rather than any factor inherent to the technique used.
The Oslo I study by Skaane et al. [10] also reported a nonsignificantly higher cancer detection rate in women screened on screen-film mammography than in those screened on FFDM (7.6 vs 6.2 per 1,000, respectively; p = 0.23, McNemar test). However, the numbers in that study were also relatively small (3,683 paired examinations), and the authors commented that the reading environment for the digital studies was suboptimal with high ambient lighting. They performed a conspicuity analysis for all cancers detected and concluded that both techniques were equal overall, with 61% of tumors showing equal conspicuity and 19.3% showing superior conspicuity on each of screen-film mammography and FFDM. They, therefore, concluded that the cancers missed on FFDM were not due to poor image quality because the cancers were visible in retrospect, and they attributed the misses to both a suboptimal reading environment and a learning curve effect. The number of cancers detected in that study was also relatively small, with 28 cancers detected on screen-film mammography and 23 detected on FFDM, again limiting its power to show small differences.
In this study, we compared the cancer detection rates for digital mammography over time to look for a learning curve effect but found no significant difference (p = 0.91).
In our study, the recall rate was significantly higher for digital mammography in all age categories and for women undergoing both initial and subsequent screenings. Again, these findings are similar to the Oslo studies [10, 13], both of which reported a higher recall rate for the FFDM group. This difference was significant in the Oslo II study (4.2% for digital mammography vs 2.5% for screen-film mammography, p < 0.001). Del Turco et al. [16] also reported a significantly higher recall rate overall for digital mammography than screen-film mammography (4.56% vs 3.96%, p = 0.01). In the Italian study, the recall rate for FFDM was significantly higher than screen-film mammography for the detection of microcalcifications but not for masses or architectural distortion. They found that although the recall rate was higher for women of all ages and all breast density categories, the difference in recall rates was significant only for women 50–59 years old and for women with dense breasts (> 75% density). We did not record breast density, but with large numbers in this study of a very homogeneous population, we think that it is reasonable to assume that the distribution of breast density would be similar for both groups.
The higher recall rate for FFDM than screen-film mammography in our study could be attributable to improved conspicuity of abnormalities with digital mammography. It could also reflect a degree of unfamiliarity with a new technique. However, if the latter is the case, then the recall rate should have decreased over time, which was not apparent in our study; we detected no significant difference in recall rate over time (p = 0.19). The higher recall rate in the digital group may account for the higher cancer detection rate because previous studies have shown that cancer detection rates increase with increasing recall rates. This increase in cancer detection rates occurs at the expense of increasing false-positive rates. Otten et al. [20] reported that breast cancer detection rates can be increased by lowering the threshold for recall, especially for recall rates of 1–4%. However, with further increases in recall rate, the cancer detection rate levels off with an associated disproportionate increase in false-positives. According to the study by Otten et al., for each 1% incremental increase in recall rate above 5%, the detection rate increases by only 0.03%, whereas PPVs decrease to less than 10%.
The aim of a screening program is to increase cancer detection while avoiding unnecessary morbidity and cost associated with increasing false-positive rates. The INBSP operates within strict quality assurance guidelines. Acceptable ranges for recall rate, biopsy rate, and cancer detection rate have been established nationally and are in keeping with international guidelines. The increased recall rate associated with FFDM in our study (from 3.1% with screen-film mammography to 4.0% with FFDM) is still within the acceptable range. The PPVs for FFDM and screen-film mammography were comparable, which implies that the increased recall rate was not associated with an unacceptable increase in false-positives.
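The relationship among the three reported measures can be checked with simple arithmetic. The sketch below assumes that the PPV of recall is the fraction of recalled women who are diagnosed with cancer, so that the cancer detection rate equals the recall rate multiplied by the PPV. The function names are hypothetical, the inputs are the rounded figures reported in this study, and the false-positive figures are derived here rather than reported.

```python
def detection_rate_per_1000(recall_rate: float, ppv: float) -> float:
    """Cancers detected per 1,000 screens, assuming PPV = cancers / recalls."""
    return recall_rate * ppv * 1000


def false_positive_recalls_per_1000(recall_rate: float, ppv: float) -> float:
    """Recalls per 1,000 screens that do not end in a cancer diagnosis."""
    return recall_rate * (1 - ppv) * 1000


# Rounded recall rates and PPVs as reported in this study.
ffdm_recall, ffdm_ppv = 0.040, 0.157
sfm_recall, sfm_ppv = 0.031, 0.167

ffdm_cdr = detection_rate_per_1000(ffdm_recall, ffdm_ppv)
sfm_cdr = detection_rate_per_1000(sfm_recall, sfm_ppv)
ffdm_fp = false_positive_recalls_per_1000(ffdm_recall, ffdm_ppv)
sfm_fp = false_positive_recalls_per_1000(sfm_recall, sfm_ppv)

print(f"FFDM: {ffdm_cdr:.1f} cancers, {ffdm_fp:.1f} false-positive recalls per 1,000")
print(f"SFM:  {sfm_cdr:.1f} cancers, {sfm_fp:.1f} false-positive recalls per 1,000")
```

With these inputs, the products round to the reported detection rates of 6.3 and 5.2 per 1,000, which suggests that the three published figures are mutually consistent.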
The recall rate of a screening program depends on a number of factors including the skill of the readers, factors inherent to the screening population such as age and screening round, national health policy, and medicolegal issues. During our study, the same experienced readers read the FFDM and screen-film mammography screening examinations with no change in protocol or recall practice. The threshold for recall of suspicious findings was not lowered for FFDM; therefore, it is likely that the increased recall rate was due to increased detection of mammographic abnormalities. Whether this increased detection was due to the increased perception of subtle abnormalities or to the increased interpretation of perceived abnormalities as being suspicious is a topic for on-going research.
In previous studies from the United States by both Lewin et al. [8, 9] and the DMIST group [11], the recall rates were much higher than the recall rates reported in the INBSP and other European studies [12]. This phenomenon has been previously described by Smith-Bindman et al. [21]; they reported that recall rates are twice as high in the United States as in the United Kingdom. Lewin et al. [9] found a significantly lower recall rate in the FFDM group than in the screen-film mammography group (11.8% vs 14.9%, respectively; p < 0.001, McNemar chi-square test). In the DMIST trial, the recall rate was 8.4% for both FFDM and screen-film mammography. The lower recall rate in our study and other European studies may be representative of inherent differences between the screening systems in Europe and the United States, as previously commented on by Skaane et al. [10]. The lower threshold for recall of subtle abnormalities in the United States is believed to reflect a difference in the medicolegal environment.
A potential criticism of our study is that because the women screened in 2006 and 2007 have not yet had their 2-year follow-up, early false-negative studies cannot be excluded. However, the main premise of our study is that the cancer detection rate is higher for digital mammography, and although we cannot evaluate for false-negative studies, this is true of both the digital and analog groups. Some women (approximately 25%) underwent two screening mammography examinations during the study period. We do not think that this would have influenced the results of the study because women are removed from the screening population to a symptomatic service once they are diagnosed with cancer and are recalled for a specific abnormality only once. Assignment to digital or analog mammography was not influenced by the type of mammography examination previously performed and was based only on the time of check-in.
Another potential criticism is the possibility that bias was introduced during assignment to a technique. However, patients were not questioned about menopausal status or hormone replacement therapy use before assignment, and their breast density was not reviewed. We therefore have no reason to believe that women were preferentially assigned to one technique over the other. There were no differences between the two groups in terms of screening round or age distribution. Information regarding breast density, menopausal status, and other risk factors such as hormone replacement therapy use, parity, and age of menarche is not recorded by the INBSP. This is a limitation of our study, and subtle differences between the two groups cannot be definitively excluded. However, the large size of our study population should minimize any potential hidden bias introduced by small differences in these factors.
This study represents the largest review of the use of digital mammography in a population-based breast screening program to date. The results are very favorable for FFDM, which showed a cancer detection rate significantly higher than that for screen-film mammography (6.3 vs 5.2 per 1,000, p = 0.01). The benefit was apparent from the outset and was maintained in both the initial and subsequent screening groups. These findings support the previous findings of Skaane et al. [13]. Women younger than 50 years who have been shown to benefit from digital mammography in the DMIST trial were not included in our study. The results of this study indicate that the benefit of digital mammography can be extended to a broader group of women up to the age of 64 years. These findings further suggest that FFDM with soft-copy reading can be safely implemented in large-scale breast cancer screening programs.
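For readers who wish to gauge the size of the headline difference, the following is a minimal sketch of a pooled two-proportion z-test. The examination totals (188,823 overall, 35,204 with FFDM) are those reported for this study. The cancer counts are an assumption: they are reconstructed from the rounded rates of 6.3 and 5.2 per 1,000 and sum to slightly more than the 1,013 cancers reported, so the result is approximate. This text does not state which test produced the published p value, and the function name `two_proportion_z` is hypothetical.

```python
import math


def two_proportion_z(x1: int, n1: int, x2: int, n2: int) -> tuple[float, float]:
    """Pooled two-proportion z statistic and two-sided p-value."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Examination totals reported for the study.
n_total = 188_823
n_ffdm = 35_204
n_sfm = n_total - n_ffdm

# ASSUMPTION: cancer counts reconstructed from the rounded per-1,000 rates;
# they are approximations, not the study's actual counts.
cancers_ffdm = round(6.3 / 1000 * n_ffdm)
cancers_sfm = round(5.2 / 1000 * n_sfm)

z, p_value = two_proportion_z(cancers_ffdm, n_ffdm, cancers_sfm, n_sfm)
print(f"FFDM {cancers_ffdm}/{n_ffdm}, SFM {cancers_sfm}/{n_sfm}")
print(f"z={z:.2f}, two-sided p={p_value:.3f}")
```

Under these reconstructed counts, the two-sided p value falls close to the reported 0.01, although an exact match should not be expected given the rounding of the inputs.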
Address correspondence to N. M. Hambly.
We thank Albert Winston and Donal Kiernan for their contribution to data collection.
