Medical Physics and Informatics
Original Research
The Effect of Image Processing on the Detection of Cancers in Digital Mammography
OBJECTIVE. The objective of our study was to investigate the effect of image processing on the detection of cancers in digital mammography images.
MATERIALS AND METHODS. Two hundred seventy pairs of breast images (both breasts, one view) were collected from eight systems using Hologic amorphous selenium detectors: 80 image pairs showed breasts containing subtle malignant noncalcification lesions; 30 image pairs, biopsy-proven benign lesions; 80 image pairs, simulated calcification clusters; and 80 image pairs, no cancer (normal). The 270 image pairs were processed with three types of image processing: standard (full enhancement), low contrast (intermediate enhancement), and pseudo–film-screen (no enhancement). Seven experienced observers inspected the images, locating regions they suspected to be cancer and rating each for likelihood of malignancy. The results were analyzed using a jackknife-alternative free-response receiver operating characteristic (JAFROC) analysis.
RESULTS. The detection of calcification clusters was significantly affected by the type of image processing: The JAFROC figure of merit (FOM) decreased from 0.65 with standard image processing to 0.63 with low-contrast image processing (p = 0.04) and from 0.65 with standard image processing to 0.61 with film-screen image processing (p = 0.0005). The detection of noncalcification cancers was not significantly different among the image-processing types investigated (p > 0.40).
CONCLUSION. These results suggest that image processing has a significant impact on the detection of calcification clusters in digital mammography. Of the three image-processing versions investigated on this system, standard image processing performed best for the detection of calcification clusters. The effect on cancer detection should be considered when selecting the type of image processing in the future.
Keywords: digital mammography, image processing, observer performance
In digital mammography, the final stage in the image formation process is the application of image processing. Image processing is applied to enhance the diagnostic content within the image—for example, by reducing the noise or enhancing the edges and image contrast.
Each manufacturer applies its own proprietary image-processing algorithm, and the resulting image appearances can be very different. Several studies have investigated how these differences in image appearance affect interpretation, including both preference studies [1, 2] and objective studies [3–9], with mixed results: some objective studies found significant differences among image-processing algorithms [3–5], whereas others did not [6–9]. However, these studies had the following limitations: different image-processing algorithms were used on systems with different detectors [3]; different radiologists inspected different images [8]; localization was not included in the task [4]; some cases came from the same patient, violating the independence assumed in the analysis [6]; the cancers tended to be obvious and easily detected with all image-processing types investigated [7]; and only calcifications were investigated (i.e., noncalcification cancers were not studied) [5, 9]. Therefore, the effect of the image-processing algorithm on the detection of the various radiologic features of breast cancer remains unclear.
The purpose of this work was to perform a retrospective observer study to investigate the impact of different image-processing algorithms on the detection of cancers in digital mammography. Our approach overcame several of the limitations described previously. Subtle simulated calcifications and subtle noncalcification cancers were included in the study; all the observers inspected all images processed with all image-processing algorithms investigated; the cases were from different patients, thus maintaining independence of cases; the image-processing algorithms investigated were representative of the range currently used in clinical practice; and the study was based on objective measurements rather than observer preference.
The study protocol was approved by the regional research ethics committee. Our ethics committee did not require individual consent because our study used anonymous mammographic images and data.
An expert breast radiologist (25 years of experience and 6000 screening mammography studies read annually) who did not take part in the study inspected the images and associated clinical information of 238 patients with pathology-proven cancer. Of these women, 185 presented for screening and 53 presented with symptoms of cancer. The cases were collected at two sites using eight mammography systems with amorphous selenium detectors (Hologic Selenia and Hologic Dimensions, Hologic). In the United Kingdom National Health Service (NHS) Breast Screening Programme, women 50–70 years old are invited to undergo mammographic screening every 3 years, and women older than 70 years may refer themselves. The examination consists of two views of each breast: a mediolateral oblique (MLO) view and a craniocaudal view. The radiologist marked the outline of each cancer in both views on digital images using an in-house web interface [10] and rated lesion conspicuity in each view (very subtle, subtle, or visible).
The expert radiologist selected 80 cases with malignant noncalcification lesions (i.e., mass, architectural distortion, or asymmetric density) from these 238 cases. Because obvious cancers would be easily detected with all image-processing options, cases were chosen in which lesion conspicuity was subtle or very subtle but the lesions were still detectable. In total, the 80 cases contained 83 noncalcification cancers; one patient had bilateral noncalcification cancers. A more detailed description of these cancers is given in Table 1.
In addition, the expert radiologist inspected the images of 52 consecutive patients with biopsy-proven benign lesions (50 screening-detected cases and two symptomatic cases) and marked the location of the lesions. The images of 30 of these patients were randomly selected for the study; they contained a mixture of calcification and noncalcification lesions with a range of conspicuities. The images of 160 patients assessed as having normal findings (i.e., not recalled from screening) were also used. Because of the time constraints of the study, these patients had not yet returned for their next screening round, so the normal cases have not been followed up. The interval cancer rate in the United Kingdom has been shown to be 0.55 cancers per 1000 women screened in the 12 months after negative findings on a screening study [11]. Therefore, even if this rate is applied to all 270 patients in our study, only 0.15 interval cancers would be expected. Although ideally all patients with normal findings would have been followed up, we do not think this lack of follow-up has significantly affected the results of this study.
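The expected number of interval cancers quoted above is a one-line calculation. The minimal sketch below applies the published rate of 0.55 per 1000 women [11] to all 270 study patients, as the text does, and also to only the 160 cases with normal findings, the ones actually lacking follow-up; the first figure is therefore a conservative upper bound.

```python
# Expected interval cancers among cases without follow-up.
# Rate from the text: 0.55 interval cancers per 1000 women in the
# 12 months after a negative screening study (UK data, reference [11]).
INTERVAL_RATE_PER_1000 = 0.55

def expected_interval_cancers(n_women: int, rate_per_1000: float = INTERVAL_RATE_PER_1000) -> float:
    """Expected count = number of women x per-woman rate."""
    return n_women * rate_per_1000 / 1000.0

all_cases = expected_interval_cancers(270)     # figure used in the text
normal_only = expected_interval_cancers(160)   # only the unfollowed normal cases
print(f"{all_cases:.4f} {normal_only:.4f}")    # 0.1485 0.0880
```

Either way the expected count is well below one case, which is the basis for judging the lack of follow-up unlikely to affect the results.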
The 238 cancer cases contained fewer subtle calcification clusters than were required for sufficient statistical power in this study. Therefore, simulated calcification clusters were instead inserted into 80 of the 160 images with normal findings, using the method described below. At the time of the study, simulation models of noncalcification cancers were not sufficiently advanced for use; therefore, the real noncalcification cancers described earlier were used.
Simulated calcification clusters have been used in observer studies of this type in the past [5, 9, 12]. Their main advantage is efficiency: collecting real subtle cancers is very time-consuming because of the low prevalence of cancer in breast screening, whereas a database of simulated cancers can be generated in a much shorter time. A further advantage for this study is the avoidance of selection bias. Every clinically detected cancer was, by definition, found by a radiologist viewing images acquired on a particular digital mammography system with a particular image-processing algorithm applied. If that clinically used algorithm is one of the arms in the study, the case set is therefore biased in its favor. Using simulated cancers removes this bias.
The simulated calcification cluster images were derived from unprocessed digital images of sliced mastectomy samples imaged on a digital microfocus specimen x-ray cabinet (model MX20 DC2, Faxitron X-Ray, Faxitron Bioptics) at five times magnification with an effective pixel size of 10 μm. The clusters within the mastectomy samples were a mixture of ductal carcinoma in situ and invasive carcinoma.
From these images, calcification clusters were extracted using a method described elsewhere [9, 13], producing an image of each calcification cluster with the surrounding background tissue removed. If a cluster were inserted into a normal breast image without further adjustment, it would look different from a cluster imaged within the patient's breast. The differences arise from the different magnifications; from the increased scatter and decreased subject contrast caused by the greater thickness of surrounding breast tissue; and from the different imaging systems and imaging parameters used to acquire the cluster images and the normal breast images. Therefore, before insertion, the appearance of each simulated calcification cluster was adjusted to account for these differences. This process has been described and validated elsewhere [14, 15].
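The validated adjustment is described in [14, 15] and is not reproduced here. As a rough illustration only, the sketch below shows three steps of that general kind: resampling a 10-μm cluster image to detector resolution by block averaging, reducing its contrast, and inserting it multiplicatively into a linear (unprocessed) breast image. The downsampling factor of 7 assumes a 70-μm detector pixel and ignores geometric magnification; the contrast exponent `k` is a hypothetical placeholder for scatter and tissue-thickness effects; and the published method also models system sharpness and noise, which this sketch omits.

```python
from typing import List

Image = List[List[float]]  # row-major 2D image

def block_average(img: Image, factor: int) -> Image:
    """Downsample by averaging non-overlapping factor x factor blocks (edges truncated)."""
    rows, cols = len(img) // factor, len(img[0]) // factor
    out = []
    for r in range(rows):
        row = []
        for c in range(cols):
            block = [img[r * factor + i][c * factor + j]
                     for i in range(factor) for j in range(factor)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out

def scale_contrast(transmission: Image, k: float) -> Image:
    """Scale attenuation in the log domain: exp(k * ln T) = T**k; k < 1 lowers contrast."""
    return [[t ** k for t in row] for row in transmission]

def insert_cluster(background: Image, cluster: Image, top: int, left: int) -> Image:
    """Multiply a linear background image by the cluster transmission map (copy returned)."""
    out = [row[:] for row in background]
    for i, row in enumerate(cluster):
        for j, t in enumerate(row):
            out[top + i][left + j] *= t
    return out

# Toy cluster at 10-um pixels: transmission 0.5 in the top-left 7 x 7 block.
hi_res = [[0.5 if (i < 7 and j < 7) else 1.0 for j in range(14)] for i in range(14)]
cluster = scale_contrast(block_average(hi_res, 7), k=0.5)  # hypothetical k
breast = [[1000.0] * 4 for _ in range(4)]                  # flat linear background
print(insert_cluster(breast, cluster, top=1, left=1))
```

Inserting multiplicatively in the linear domain (equivalently, additively after a log transform) keeps the cluster's attenuation consistent with the local background signal, which is why the insertion is done before any image processing is applied.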
Calcification clusters were inserted at a range of locations, avoiding positions very close to the skin edge and nipple. The breasts into which the clusters were inserted were randomly selected from all normal images collected, so they covered the full range of breast densities. When a location was being selected, the breast tissue immediately surrounding the cluster was categorized as homogeneously fatty, a mixture of fatty and glandular tissue, or homogeneously glandular, and locations were chosen so that the clusters were distributed approximately equally among these three categories of local breast density. In total, 89 calcification clusters were inserted, with one to three clusters per case. Examples of simulated calcification clusters inserted into breast images are shown in Figure 1.
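A simple way to obtain the density-balanced placement described above is to classify candidate sites by local tissue type and then sample the same number of sites from each class. The sketch below assumes a hypothetical per-site glandular fraction with arbitrary thresholds (0.25 and 0.75) and pools candidate sites that have already passed the skin-edge and nipple exclusion; in the study, local density was categorized visually, and the per-category counts are not reported. When the total is not a multiple of three, as with 89 clusters, the split can only be approximately equal.

```python
import random
from typing import Dict, List, Tuple

Location = Tuple[int, int]  # (row, col) of a candidate insertion site
CATEGORIES = ("fatty", "mixed", "glandular")

def categorize(glandular_fraction: float) -> str:
    """Hypothetical thresholds on the local glandular fraction around a site."""
    if glandular_fraction < 0.25:
        return "fatty"
    if glandular_fraction < 0.75:
        return "mixed"
    return "glandular"

def balanced_targets(n_clusters: int) -> Dict[str, int]:
    """Split n clusters across the three categories as evenly as possible."""
    base, extra = divmod(n_clusters, len(CATEGORIES))
    return {cat: base + (1 if i < extra else 0) for i, cat in enumerate(CATEGORIES)}

def pick_locations(candidates: Dict[Location, float], n_clusters: int, seed: int = 0) -> List[Location]:
    """Randomly sample the target number of sites from each density category."""
    rng = random.Random(seed)
    by_cat: Dict[str, List[Location]] = {c: [] for c in CATEGORIES}
    for loc, frac in candidates.items():
        by_cat[categorize(frac)].append(loc)
    chosen: List[Location] = []
    for cat, target in balanced_targets(n_clusters).items():
        if len(by_cat[cat]) < target:
            raise ValueError(f"not enough {cat} candidate sites")
        chosen.extend(rng.sample(by_cat[cat], target))
    return chosen

print(balanced_targets(89))  # {'fatty': 30, 'mixed': 30, 'glandular': 29}
```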
Fig. 1 —For this study, images of calcification clusters were inserted into breast images that had originally shown normal findings. A–C, Three examples of calcification clusters inserted into breast images. Regions shown are 17.5 × 17.5 mm².
It was important that the calcification clusters be realistic in appearance; therefore, after insertion into the breast images, the clusters were inspected by the same expert radiologist, who judged them to be realistic in appearance and location and subtle in appearance.
The 80 cases with inserted calcification clusters, 80 normal cases with no cancer present, 80 cases with noncalcification cancers, and 30 cases with biopsy-proven benign lesions made up the set of 270 cases used in the study. These 270 cases were processed using three types of Hologic image processing (Fig. 2). The first, the manufacturer’s standard image processing, is referred to as “standard” image processing (Fig. 2A). The second applies an intermediate amount of enhancement and is referred to as “low-contrast” image processing (Fig. 2B). The third applies no enhancement other than skin edge enhancement, simulating the appearance of a film-screen image, and is referred to as “film-screen” image processing (Fig. 2C). The low-contrast and film-screen image-processing algorithms were developed by Hologic for investigational purposes only. Nevertheless, the three types of image processing were selected to represent the wide range of image-processing algorithms seen on the digital mammography systems currently used in clinical practice.
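Hologic’s algorithms are proprietary and are not described here, so the following is only a generic illustration of what full, intermediate, and no enhancement can mean. It applies a single-scale unsharp mask to a 1D intensity profile with three hypothetical gains; real mammography processing is typically multiscale and operates on 2D images, and the skin edge enhancement retained in the film-screen arm is ignored.

```python
from typing import List

def box_blur(signal: List[float], radius: int) -> List[float]:
    """Moving average with the window clipped at the ends of the signal."""
    n = len(signal)
    out = []
    for i in range(n):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        out.append(sum(signal[lo:hi]) / (hi - lo))
    return out

def unsharp_mask(signal: List[float], gain: float, radius: int = 2) -> List[float]:
    """Add back gain * (signal - blurred); gain 0 returns the signal unchanged."""
    blurred = box_blur(signal, radius)
    return [s + gain * (s - b) for s, b in zip(signal, blurred)]

# Hypothetical gains standing in for the three study arms (not Hologic's settings).
LEVELS = {"film-screen": 0.0, "low-contrast": 0.5, "standard": 1.0}

profile = [10.0] * 5 + [12.0] + [10.0] * 5  # faint bright detail on a flat background
for name, gain in LEVELS.items():
    enhanced = unsharp_mask(profile, gain)
    print(name, round(enhanced[5] - enhanced[0], 3))
```

Here the detail’s contrast against the background grows from 2.0 to 2.8 to 3.6 as the gain increases; fine, high-contrast structures such as calcifications are the features most affected by this kind of edge enhancement.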
Fig. 2 —56-year-old woman with noncalcification cancer. Expert radiologist classified lesion conspicuity in this case, which was used in study, as subtle. A–C, Subtle noncalcification cancer is shown in insets with standard image processing (A), low-contrast image processing (B), and film-screen image processing (C).
For all the cases used in the study, images of both breasts were displayed but only one view (MLO or craniocaudal) was shown. Ideally, two views of both breasts would be displayed, as in the clinic. However, only 2D simulated calcification clusters were available and these could not be reoriented for simulation into both views. For normal and benign cases, the view was selected randomly (subject to the constraint that the benign lesions be visible in the randomly selected view); for cases containing noncalcification cancers, the view in which cancer was most subtle while remaining detectable was selected. Of the 270 image pairs used in the study, 142 were MLO views and 128 were craniocaudal views.
Seven mammography readers (six radiologists and one radiographer) took part in the observer study. They had 2–22 years of experience reading mammograms (2–7 years with digital mammography), and each read at least 5000 screening examinations per year. All observers were certified to interpret mammography images as defined by the United Kingdom NHS Breast Screening Programme [16] and had experience reading images from this manufacturer in their clinical practice. Each observer inspected the images in six sessions, with a break halfway through each session. The images were organized so that at least 2 weeks elapsed between an observer’s viewings of the same image pair with different image processing applied, and the reading order was randomized separately for each observer. Before the study, observers were trained in a pilot study (using 180 image pairs) to become familiar with the software, the different image-processing types, and the task they were being asked to perform. Additional training with a smaller set of 10 image pairs was performed at the beginning of the main study.
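The article does not give the exact reading schedule. One hypothetical scheme that satisfies the stated constraints (six sessions, every case read once with each processing type, at least 2 weeks between viewings of the same case, and a different order for each reader) is sketched below. It assumes sessions are held at least 1 week apart: the cases are split into two halves, sessions alternate between halves, and each half is read once with each processing type, so any given case reappears only every second session.

```python
import random
from typing import List, Tuple

PROCESSING = ("standard", "low-contrast", "film-screen")

def reading_schedule(case_ids: List[int], reader: int, seed: int = 0) -> List[List[Tuple[int, str]]]:
    """Six sessions of (case_id, processing) readings for one reader (hypothetical scheme)."""
    rng = random.Random(seed * 1000 + reader)  # reader-specific, reproducible order
    half = len(case_ids) // 2
    halves = [case_ids[:half], case_ids[half:]]
    order = [PROCESSING[(k + reader) % 3] for k in range(3)]  # rotate processing order by reader
    sessions = []
    for k in range(6):
        # Even sessions read half 0 with order[0], order[1], order[2];
        # odd sessions read half 1 with order[1], order[2], order[0].
        shift = 0 if k % 2 == 0 else 1
        proc = order[(k // 2 + shift) % 3]
        session = [(c, proc) for c in halves[k % 2]]
        rng.shuffle(session)  # randomized reading order within the session
        sessions.append(session)
    return sessions

sessions = reading_schedule(list(range(270)), reader=2)
print([len(s) for s in sessions])  # [135, 135, 135, 135, 135, 135]
```

For 270 cases this gives 135 readings per session and 810 readings per observer in total.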
The study was performed using a web-based interface developed in-house [10]. This interface allowed the study to be performed remotely at two sites simultaneously using the clinical workstation in each site. Each workstation contained a pair of 5-megapixel monitors (site 1: model MDMG-5121, Barco; site 2: model GS521-CL, Eizo) calibrated using the vendors’ software to the DICOM grayscale standard display function [17]. At both sites, room illuminance was maintained at an ambient level during sessions [18].
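The DICOM grayscale standard display function referred to above defines luminance as a function of a just-noticeable-difference (JND) index j, from about 0.05 cd/m² at j = 1 to about 4000 cd/m² at j = 1023; calibration maps the display’s driving levels onto this curve so that equal steps in pixel value give approximately equal perceived contrast. The sketch below evaluates the function; the constants are quoted from DICOM PS3.14 and should be checked against the standard before any real use.

```python
import math

# Constants of the DICOM PS3.14 Grayscale Standard Display Function.
A, B, C, D = -1.3011877, -2.5840191e-2, 8.0242636e-2, -1.0320229e-1
E, F, G, H = 1.3646699e-1, 2.8745620e-2, -2.5468404e-2, -3.1978977e-3
K, M = 1.2992634e-4, 1.3635334e-3

def gsdf_luminance(j: float) -> float:
    """Luminance in cd/m^2 for JND index j, defined for 1 <= j <= 1023."""
    if not 1 <= j <= 1023:
        raise ValueError("JND index must lie in [1, 1023]")
    x = math.log(j)  # natural logarithm of the JND index
    num = A + C * x + E * x**2 + G * x**3 + M * x**4
    den = 1 + B * x + D * x**2 + F * x**3 + H * x**4 + K * x**5
    return 10 ** (num / den)  # the rational function gives log10 of luminance

print(round(gsdf_luminance(1), 3), round(gsdf_luminance(1023)))
```

A calibration tool would restrict this curve to the monitor’s measured minimum and maximum luminance and distribute the available driving levels evenly in JND index; that inversion step is not shown.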
Observers were told that each image might contain no lesions, one lesion, or multiple lesions. They were asked to mark the center of any region they suspected of being cancerous, not to mark benign features, and to indicate whether each mark was a calcification cluster, a noncalcification cancer, or both. For each mark, the observer then answered the question “What is the likelihood that this lesion is malignant?” on a 5-point scale, with 1 being the lowest and 5 the highest likelihood of malignancy.
For data analysis, cases containing biopsy-proven malignant noncalcification cancers and simulated calcification clusters were assumed to be positive for disease. Cases containing biopsy-proven benign lesions or normal cases were assumed to be negative for disease.
A mark made by an observer had to lie within the outline of the cancer drawn by the expert radiologist to count as a correctly localized cancer; all other marks were considered false-positives. The ratings given to each lesion were analyzed using jackknife-alternative free-response receiver operating characteristic (JAFROC) analysis [19]. In this analysis, the performance of an observer reading images with a particular image-processing algorithm applied is expressed as the figure of merit (FOM), calculated as the trapezoidal area under the alternative free-response receiver operating characteristic (AFROC) curve extended to (1,1). This straight-line extension credits perfect decisions, such as normal images with no marks, and penalizes unmarked lesions [20].
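The JAFROC FOM defined above can be computed without drawing the curve, because the trapezoidal area under the AFROC curve extended to (1,1) equals a Wilcoxon-type statistic: the fraction of (negative case, lesion) pairs in which the lesion’s rating exceeds the highest false-positive rating on the negative case, with ties counted as one half. The sketch assumes that unmarked lesions and mark-free negative cases receive a rating of minus infinity and that all lesions are weighted equally; the JAFROC software’s handling of lesion weighting and of false-positives on abnormal cases may differ.

```python
import math
from typing import List, Sequence

UNMARKED = -math.inf  # rating for unmarked lesions and mark-free negative cases

def psi(lesion_rating: float, fp_rating: float) -> float:
    """Wilcoxon kernel: 1 if the lesion outranks the false positive, 0.5 on a tie."""
    if lesion_rating > fp_rating:
        return 1.0
    if lesion_rating == fp_rating:
        return 0.5
    return 0.0

def highest_fp(fp_ratings_per_case: Sequence[Sequence[float]]) -> List[float]:
    """Highest-rated false positive on each negative case (UNMARKED if none)."""
    return [max(r, default=UNMARKED) for r in fp_ratings_per_case]

def jafroc_fom(negative_case_fps: Sequence[Sequence[float]], lesion_ratings: Sequence[float]) -> float:
    """Average kernel value over all (negative case, lesion) pairs."""
    f = highest_fp(negative_case_fps)
    total = sum(psi(l, x) for x in f for l in lesion_ratings)
    return total / (len(f) * len(lesion_ratings))

# Four negative cases (two with false positives) and four lesions (one unmarked).
fom = jafroc_fom([[], [2], [], [3, 1]], [5, 4, UNMARKED, 2])
print(fom)  # 0.71875
```

A pair in which both the lesion and the negative case are unmarked contributes 0.5; this is how the straight-line extension credits mark-free normal images while an unmarked lesion still loses against any marked negative case.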
JAFROC analysis was performed using JAFROC software (JAFROC, version 4.0, D. P. Chakraborty). This analysis performs significance testing using the Dorfman-Berbaum-Metz ANOVA technique [21, 22]; the machine code for the requisite ANOVA module was obtained from the authors of that software package. In the primary analysis, only cases were treated as a random factor (so that the results apply to the population of cases but only for the readers used in the study). Secondary analysis was also performed with both cases and readers treated as random factors. For both, a p value < 0.05 was required for significance.
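The Dorfman-Berbaum-Metz method does not analyze FOMs directly; it analyzes jackknife pseudovalues, one per case for each reader and image-processing type. The sketch computes pseudovalues for a Wilcoxon-type FOM (the same pairwise comparison as the JAFROC statistic, with unrated findings set to minus infinity). The case record format is a hypothetical choice for illustration, and the ANOVA that the JAFROC software then applies to these pseudovalues is not reproduced.

```python
import math
from typing import Callable, List, Sequence, Tuple

UNMARKED = -math.inf
# Hypothetical case record: ("neg", false-positive ratings) or ("pos", lesion ratings).
Case = Tuple[str, List[float]]

def fom_from_cases(cases: Sequence[Case]) -> float:
    """Fraction of (negative case, lesion) pairs where the lesion outranks the
    highest false positive on the negative case; ties count one half."""
    fps = [max(r, default=UNMARKED) for kind, r in cases if kind == "neg"]
    lesions = [x for kind, r in cases if kind == "pos" for x in r]
    score = 0.0
    for f in fps:
        for l in lesions:
            score += 1.0 if l > f else 0.5 if l == f else 0.0
    return score / (len(fps) * len(lesions))

def jackknife_pseudovalues(cases: Sequence, statistic: Callable[[Sequence], float]) -> List[float]:
    """Pseudovalue for case i: N * theta(all cases) - (N - 1) * theta(all but case i)."""
    items = list(cases)
    n = len(items)
    theta_all = statistic(items)
    return [n * theta_all - (n - 1) * statistic(items[:i] + items[i + 1:])
            for i in range(n)]

cases = [("neg", []), ("neg", [2.0]), ("neg", []), ("neg", [3.0, 1.0]),
         ("pos", [5.0]), ("pos", [4.0]), ("pos", [UNMARKED]), ("pos", [2.0])]
print([round(v, 3) for v in jackknife_pseudovalues(cases, fom_from_cases)])
```

For a simple mean, pseudovalues reproduce the original data exactly, which is a convenient sanity check; for the FOM, each pseudovalue reflects how strongly a single case influences the reader’s score.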
An additional analysis compared the number of false-positives marked on images for each type of image processing. The number of false-positives was jackknifed over cases, and Dorfman-Berbaum-Metz ANOVA was performed. This analysis was first performed using false-positives on all cases and then repeated for two subgroups: normal cases and cases with biopsy-proven benign lesions. Finally, the number of times each cancer was correctly localized (with any rating) was determined.
The reader-averaged AFROC curves for the noncalcification cancers are shown in Figure 3. The reader-averaged FOMs were equal to 0.726 (95% CI, 0.681–0.770), 0.717 (95% CI, 0.670–0.763), and 0.719 (95% CI, 0.673–0.765) for the standard, low-contrast, and film-screen image-processing types, respectively. The differences between the reader-averaged FOMs for each pair of image-processing types are given in Table 2; the 95% CIs are shown in parentheses. When the 95% CI of the difference does not include zero, this indicates that there is a significant difference between the two image-processing types. No significant difference was found between any of the three image-processing pairs (p > 0.40) when readers were treated as either a fixed or a random factor in the ANOVA analysis.
Fig. 3 —Reader-averaged alternative free-response receiver operating characteristic curves for noncalcification cancers for three different image-processing (IP) types. Empirical curves are shown. Solid lines = measured data, dashed lines = straight line extension to (1,1).
The reader-averaged AFROC curves for calcification clusters are shown in Figure 4. The reader-averaged FOMs were equal to 0.652 (95% CI, 0.606–0.698), 0.628 (95% CI, 0.584–0.673), and 0.612 (95% CI, 0.568–0.655) for the standard, low-contrast, and film-screen image-processing types, respectively. The differences between the reader-averaged FOMs for each pair of image-processing types are given in Table 2. When readers were treated as fixed factors in the ANOVA, standard image processing was significantly better than both film-screen image processing (p = 0.0005) and low-contrast image processing (p = 0.04). There was no significant difference between film-screen and low-contrast image processing (p = 0.15). When readers were treated as random factors in the ANOVA, the difference between standard and low-contrast image processing became nonsignificant, whereas the difference between standard and film-screen image processing remained significant (p = 0.03).
Fig. 4 —Reader-averaged alternative free-response receiver operating characteristic curves for calcification clusters for three different image-processing (IP) types. Empirical curves are shown. Solid lines = measured data, dashed lines = straight line extension to (1,1).
The number of false-positive noncalcification marks did not differ significantly for any of the image-processing pairs investigated (Table 3). The number of false-positive calcification cluster marks was 15% higher with film-screen image processing than with standard image processing (p = 0.047) (Table 3). In the subgroup of cases containing biopsy-proven benign lesions, there was no significant difference in the number of false-positive calcification cluster marks for any image-processing pair (p > 0.16). In the subgroup of images with normal findings, the number of false-positive calcification cluster marks was significantly lower with standard image processing than with film-screen image processing (p = 0.041).
The number of times each cancer was correctly localized (a mark made within the outline of the cancer drawn by the expert radiologist) was calculated. A cancer can be localized at most 21 times (seven observers reading with all three image-processing types). Table 4 gives the proportion of cancers localized by all observers with all image-processing types and the proportion of cancers that were never localized.
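Localization as defined here reduces to a point-in-polygon test against the radiologist’s outline, followed by a count over the 21 readings (seven readers × three processing types). The sketch uses the even-odd ray-casting rule; the outline format (a list of vertex coordinates) and the keying of readings by (reader, processing type) are hypothetical choices for illustration.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]

def inside_outline(point: Point, outline: Sequence[Point]) -> bool:
    """Even-odd rule: toggle for each outline edge crossed by a ray to the right of the point."""
    x, y = point
    inside = False
    n = len(outline)
    for i in range(n):
        x1, y1 = outline[i]
        x2, y2 = outline[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal line through the point
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def times_localized(outline: Sequence[Point],
                    marks_by_reading: Dict[Tuple[str, str], List[Point]]) -> int:
    """Number of readings (reader, processing type) with at least one mark inside the outline."""
    return sum(any(inside_outline(m, outline) for m in marks)
               for marks in marks_by_reading.values())

square = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
print(inside_outline((5.0, 5.0), square), inside_outline((15.0, 5.0), square))  # True False
```

A cancer with a count of 21 was localized in every reading, and a cancer with a count of 0 was never localized; these are the two extremes summarized in Table 4.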
We investigated the effect of image processing on the detection of calcification clusters and noncalcification cancers and found that image processing had a significant effect on the detection of calcification clusters but not of noncalcification cancers. Previous studies have investigated the effect of image processing either on breast cancer detection in general, with calcifications and noncalcification cancers analyzed together [4, 6], or on one cancer type only [5, 9]. We analyzed noncalcification cancers and calcification clusters separately to determine whether image processing affects the detection of these radiologic features differently. Cole et al. [8] and Kamitani et al. [7] also analyzed calcification and noncalcification cancers separately. However, Kamitani et al. used fewer cancer cases than this study (11 calcification cases and 34 noncalcification cases). Cole et al. used far fewer cancer cases acquired on a system with an amorphous selenium detector (eight calcification cases and 13 mass cases), and different observers read the images processed with different algorithms; in this work, the same readers interpreted the same cases with each image-processing algorithm, allowing a paired comparison.
Our work found that the standard image-processing algorithm provided significantly higher detection of calcification clusters than the low-contrast and film-screen image-processing algorithms. This result agrees with the work of Zanca et al. [5], who performed a study with a design similar to ours and found significant differences in calcification detection between several pairs of image-processing algorithms. However, Warren et al. [9], who also performed a study of similar design, found no significant difference in detection between two image-processing algorithms. Given the similar designs of these studies, the differences in results may reflect differences in the image-processing algorithms investigated.
Our work also found no significant differences in the detection of noncalcification cancers among the image-processing types. This result agrees with the study by Cole et al. [8], which found no significant difference in the area under the receiver operating characteristic curve for masses; however, that study included only small numbers of cases, as described earlier.
In this study, marks made by observers on biopsy-proven benign lesions were considered false-positives. This approach has been used elsewhere [23] and assumes that the ground truth is defined by biopsy-proven cancers. Alternatively, such marks could be considered correct, because a radiologist must originally have recalled the patient for a biopsy to be performed (correct recall); when the results were reanalyzed on this basis, the same conclusions were reached.
Visser et al. [6] found that image processing affected the suspiciousness of normal and malignant features in the breast. In this work, the number of false-positive noncalcification marks did not change significantly with image processing. However, for calcification clusters, significantly fewer false-positives were reported on normal images with standard image processing than with film-screen image processing. Some of these normal images showed benign features for which the patient was not recalled in the clinic. The change in the number of false-positives on these images indicates that benign features that did not appear suspicious enough to warrant recall in the clinic can appear suspicious enough to be recalled when a different type of image processing is applied.
The biggest limitation of this work is the number of cases and readers. There was large variability among readers, as shown by the wide 95% CI of the reader-averaged FOM for each image-processing type. In addition, some of the noncalcification cancers were seen by all observers with all image-processing algorithms (Table 4); these lesions may have been too obvious to reveal differences among image-processing algorithms. Conversely, some of the calcification clusters were not seen at all (Table 4); these lesions were too subtle to be useful in the study. In retrospect, the study would have had more power if more subtle noncalcification cancers and more obvious calcification clusters had been included in the image set and if more observers had been enrolled. Because there is a practical limit to the number of readers that can be enrolled in such studies, the power limitations of this study emphasize the need for careful case selection when preparing observer studies.
In summary, for the particular image-processing algorithms and systems investigated, standard image processing was superior for the detection of calcification clusters, and there was no significant difference among image-processing algorithms for noncalcification lesions. The differences in the JAFROC FOM for calcification detection were approximately 4% and 6% relative to standard image processing. These differences are smaller than the changes in calcification detection reported for other factors, such as detector type or dose [9].
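The 4% and 6% figures can be checked against the reader-averaged calcification FOMs reported in the results. The check below also gives the corresponding absolute differences; expressing the change relative to the standard-processing FOM is our reading of how the percentages were derived.

```python
# Reader-averaged JAFROC FOMs for calcification clusters, as reported in the results.
FOM = {"standard": 0.652, "low-contrast": 0.628, "film-screen": 0.612}

for arm in ("low-contrast", "film-screen"):
    absolute = FOM["standard"] - FOM[arm]   # drop in FOM units
    relative = absolute / FOM["standard"]   # drop as a fraction of the standard FOM
    print(f"{arm}: absolute {absolute:.3f}, relative {relative:.1%}")
# low-contrast: absolute 0.024, relative 3.7%
# film-screen: absolute 0.040, relative 6.1%
```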
Finally, the results of this study apply only to the particular image-processing algorithms and systems investigated. For other image-processing algorithms and systems, we recommend that objective observer-performance measurements, such as the method used in this study, be applied when selecting the image processing to be used by radiologists.
This work is part of the OPTIMAM project and was supported by Cancer Research UK & Engineering and Physical Sciences Research Council Cancer Imaging Programme in Surrey in association with the Medical Research Council and Department of Health (England).
D. P. Chakraborty was supported in part by grants from the U.S. Department of Health & Human Services, National Institutes of Health (R01-EB005243 and R01-EB008688).
We thank Carole Kliger, Caroline Taylor, Victoria Cooke, Louise Wilkinson, Wilmi Pienaar, Rupika Mehta, and Christine Flis for taking part in the study. We also thank Lindsay Mungutroy and Charul Patel for their assistance in running the observer study. Thanks are also due to Claire Borrelli, Sharon Montaque, and Sue Storr for organizing time on the workstations to complete the study. We thank David Duncan for writing the software used to insert the calcification clusters into breast images.
We are grateful to the staff at King’s College Hospital—in particular, Asif Iqbal and Michael Michell for their assistance collecting breast images. We are also grateful for the help and cooperation of staff from the Jarvis Breast Screening Centre while collecting breast images for the study.
Finally, we are grateful to Hologic staff for advice and help in processing the images.
