AJR 2005; 185:194-198
© American Roentgen Ray Society


Original Research

Performance and Reproducibility of a Computerized Mass Detection Scheme for Digitized Mammography Using Rotated and Resampled Images: An Assessment

Bin Zheng, Glenn S. Maitz, Marie A. Ganott, Gordon Abrams, Joseph K. Leader and David Gur

Department of Radiology, University of Pittsburgh, 300 Halket St., Ste. 4200, Pittsburgh, PA 15213-3180.

Received July 26, 2004; accepted after revision October 1, 2004.

 
Address correspondence to B. Zheng (zhengb{at}upmc.edu).


Abstract
OBJECTIVE. Our objective was to compare the performance and reproducibility of a computer-aided detection (CAD) scheme that uses multiple rotated and resampled images with an in-house-developed CAD scheme (single-image-based) and a commercial CAD product in detecting masses depicted on digitized mammograms.

MATERIALS AND METHODS. Ninety-two film mammograms (acquired from 23 patients) were selected. Forty-four mass regions associated with malignancy were visually identified. A commercial CAD system was used to scan and process each image four times, for a total of 368 digitized images depicting 176 mass regions. Images were processed using two CAD schemes developed in our laboratory. One uses the detection results generated from a single image, and the other averages five detection scores generated after processing the originally digitized image and four slightly rotated and resampled images. A region-based analysis was used to compare reproducibility and performance levels among the two in-house schemes and the commercial system.

RESULTS. The commercial system detected a total of 98 mass regions (55.7% sensitivity) and 136 false-positive regions (an average of 0.37 per image). Among the detected mass regions, 76 represented 19 regions that were detected on all four scans and 22 represented 10 regions that were not fully reproducible. Eighty-eight false-positive detections represented 22 reproducible detections on all four scans. Our single-image-based scheme identified 87 mass regions and 160 false-positive regions. Seventeen mass regions and 28 false-positive regions were detected on all four scans. The multiple-image-based scheme identified 98 mass regions and 132 false-positive regions. Twenty-three mass regions were detected on all four scans. One hundred twelve of the 132 false-positive regions represented 28 reproducible detections.

CONCLUSION. Averaging detection scores from multiple rotated and resampled images generated from a single digitization of a film can reduce variations in detection scores. Our multiple-image-based scheme improved both performance and reproducibility over the single-image-based scheme. The multiple-image-based scheme yielded an overall performance comparable to that of the commercial system but with improved reproducibility.


Introduction
Computer-aided detection (CAD) systems are routinely used in many medical institutions around the world. Radiologists' confidence in CAD results is one of the most important factors in determining whether the use of these systems actually improves diagnostic performance [1, 2]. Several studies have suggested that both the performance levels and the reproducibility of these systems can affect radiologists' confidence in and reliance on CAD results [3-7]. The reproducibility of CAD schemes rarely has been reported, partially because a comprehensive assessment of reproducibility is tedious and difficult, requiring repeated digitization of a large number of film mammograms. Otherwise, the results may be unreliable [6]. In addition, the original mammograms are not always available to the researcher for this purpose. The reproducibility of true-positive findings (i.e., actual masses and microcalcification clusters) is generally substantially better than that of false-positive findings, suggesting that averaging detection results (CAD-generated likelihood scores for positive findings) obtained from repeatedly digitized films may improve reproducibility and overall performance (i.e., may increase sensitivity or decrease false-positive detection rate) [3, 8].

Although optical and electronic noise can change the pixel value distribution of a digitized image, it is believed that small shifts in film positioning between digitizations are also important factors affecting pixel values, resulting in poor reproducibility [6]. That effect is independent of the specific digitizer being used. In an attempt to improve the reproducibility and performance of an in-house-developed CAD scheme, we developed a method that generates multiple images from a single digitization of a film using small rotations and interpolations (resampling). This approach is intended to simulate small shifts in film positioning during repeated digitizations [9].



Fig. 1 Effective sizes and contrast levels (in digital values) of the 44 mass regions used in this study.

 
In this study, we digitized a set of mammograms four times and used these digitized images to evaluate the reproducibility and performance of a multiple-image-based CAD scheme that averages scores detected from five matched regions depicted on the original digitized image and four rotated and resampled images. The results are compared with those for a single-image-based scheme previously developed in our laboratory and a commercial CAD system.



Fig. 2 Comparison of repeated detections of true-positive (TP) mass regions using the three computer-aided detection schemes.

 



Fig. 3 Comparison of repeated detection of false-positive (FP) mass regions using the three computer-aided detection schemes.

 

Materials and Methods
Twenty-three four-view mammographic examinations were selected for the study. Each examination depicted a visible mass. Biopsy reports confirmed that all depicted masses were associated with cancer. Twenty-one of the 23 masses were visible in both craniocaudal and mediolateral oblique views, and two were visible in only the mediolateral oblique view because the masses were too interior to be imaged on the craniocaudal view. Hence, 44 mass regions were depicted in this data set. The locations of these mass regions were marked on the appropriate images by an experienced radiologist aided by the latest and prior images and all relevant radiology and pathology reports. Each of the 92 original films was scanned (digitized) and processed four times using a commercial CAD system (SecondLook, CADx Medical Systems; software version 6.0, iCAD). CAD results were saved, and the digitized images were transferred to a server in our laboratory. Figure 1 shows the distribution of measured effective sizes and contrast levels for the true-positive mass regions. Effective size was defined as the square root of the product of the longest and shortest axes across the depicted mass region [10], and contrast was defined as the difference between the mean pixel values inside the mass region and the surrounding background [11].

Simulation of repeated digitization using rotation and resampling of one digitized image is based on the hypothesis that small shifts in film positioning substantially contribute to poor reproducibility of CAD schemes [6]. The algorithm for rotating and resampling images has been described elsewhere [9]. In brief, each digitized image is first subsampled by averaging digital values. Each subsampled image is then automatically cropped to remove the majority of background pixels while retaining the entire area of breast tissue in the image. The resulting image (M columns × N rows) is then rotated slightly four times with rotation angles of α = ±0.4° and ±0.8°. The rotation center is located outside the image at (M - 573, N/2), where the origin (0, 0) of the coordinate system is at the top left corner of the image. During each rotation, the center pixel at the right edge of the image (M, N/2) is shifted by four pixels in the vertical direction, which represents a maximum linear displacement of 1.6 mm over the entire image. After rotation, the digital value of each pixel is resampled (interpolated) on the basis of four partially covered pixels in the initial digitized image:

$$I'_{x,y} = \sum_{i=1}^{2} \sum_{j=1}^{2} S_{i,j}\, I_{i,j},$$

where $S_{i,j}$ is the coverage ratio of the partial area ($0 \le S_{i,j} \le 1$ and $S_{1,1} + S_{1,2} + S_{2,1} + S_{2,2} = 1$), $I_{i,j}$ is the digital value of pixel (i, j) in the original digitized image, and $I'_{x,y}$ is the digital value of pixel (x, y) in the rotated and resampled image. In this manner, we generate from a single digitized image four images, each with a slightly different pixel value distribution.
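The rotation and coverage-weighted resampling described above can be sketched in Python with NumPy. This is a minimal illustration, not the implementation described in [9]: it assumes inverse mapping of each output pixel into the source image, uses bilinear weights as a stand-in for the coverage ratios (they also lie between 0 and 1 and sum to 1 over the four neighboring pixels), and clamps coordinates at the image border. The function names and the `center_offset` parameter are hypothetical.

```python
import numpy as np


def rotate_and_resample(image, alpha_deg, center_offset=573):
    """Rotate a 2-D image by alpha_deg about (M - center_offset, N/2) and
    resample each output pixel from its four nearest source pixels.

    Assumption: bilinear weights approximate the coverage ratios S_ij.
    The image is assumed to be at least 2 x 2 pixels.
    """
    n_rows, n_cols = image.shape              # N rows, M columns
    cx, cy = n_cols - center_offset, n_rows / 2.0
    a = np.deg2rad(alpha_deg)

    # Coordinates of every output pixel (origin at the top left corner).
    ys, xs = np.mgrid[0:n_rows, 0:n_cols].astype(float)
    dx, dy = xs - cx, ys - cy

    # Inverse rotation: where each output pixel comes from in the source.
    src_x = np.clip(cx + np.cos(a) * dx + np.sin(a) * dy, 0, n_cols - 1)
    src_y = np.clip(cy - np.sin(a) * dx + np.cos(a) * dy, 0, n_rows - 1)

    # Top-left pixel of the 2 x 2 neighborhood and fractional offsets.
    x0 = np.minimum(np.floor(src_x).astype(int), n_cols - 2)
    y0 = np.minimum(np.floor(src_y).astype(int), n_rows - 2)
    fx, fy = src_x - x0, src_y - y0

    # Four weights in [0, 1] that sum to 1 for every output pixel.
    s11 = (1 - fx) * (1 - fy)
    s12 = fx * (1 - fy)
    s21 = (1 - fx) * fy
    s22 = fx * fy

    return (s11 * image[y0, x0] + s12 * image[y0, x0 + 1]
            + s21 * image[y0 + 1, x0] + s22 * image[y0 + 1, x0 + 1])


def simulate_redigitizations(image, angles_deg=(-0.8, -0.4, 0.4, 0.8)):
    """Return one rotated and resampled image per angle."""
    return [rotate_and_resample(image, a) for a in angles_deg]
```

With a 573-pixel lever arm, a 0.4° rotation moves the right-edge center pixel by about 573 × tan(0.4°) ≈ 4 pixels, which is consistent with the shift quoted above.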

In this study, we compared the reproducibility and performance of three CAD schemes. The first scheme was that currently used in a commercial system (SecondLook, software version 6.0); the second scheme was a single-image-based scheme developed in our laboratory [12]. This scheme uses three stages to identify and classify suggestive mass regions. First, we use image subtraction after processing by two Gaussian filters with a large difference in the kernel sizes, followed by thresholding to identify between 10 and 30 suggestive regions per image. Second, an adaptive region growth algorithm defines three topographic layers for each region on the basis of local contrast measurement. Simple intralayer rules on growth ratio and shape factor are used to eliminate as many as 75-85% of the identified regions. Third, a set of features is also computed for each region, and the features are used as input values in an artificial neural network. The region is classified as positive or negative on the basis of the region-specific artificial neural network-generated detection score. The third scheme was a multiple-image-based scheme that uses the average of five detection scores for all matched regions after application of the second scheme to the originally digitized image and four rotated and resampled images. The region-matching criterion is as follows. If the distance between the centers of gravity of two regions is smaller than the maximum radial length of either of the two regions in question, they are considered to be matched [9]. The radial length is the computed distance from the region center of gravity to a pixel on the boundary (contour) [13]. If a matched region is not identified, a zero score is assigned to the region for this image. This matching and scoring is done automatically, without human intervention.
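The region-matching criterion and the five-score average can be illustrated with a short Python sketch. Several details are assumptions made for illustration only: "the maximum radial length of either of the two regions" is read as the larger of the two maximum radial lengths, the regions found on the originally digitized image serve as the reference list, coordinates from the rotated images are compared without mapping them back, and the nearest candidate is used when more than one region matches. The `Region` class and function names are hypothetical.

```python
from dataclasses import dataclass
from math import hypot


@dataclass
class Region:
    x: float                  # center of gravity, column coordinate
    y: float                  # center of gravity, row coordinate
    max_radial_length: float  # largest center-to-boundary distance
    score: float              # neural-network detection score


def is_match(a, b):
    """Two regions match if their centers are closer than the larger
    of their maximum radial lengths (assumed reading of the criterion)."""
    distance = hypot(a.x - b.x, a.y - b.y)
    return distance < max(a.max_radial_length, b.max_radial_length)


def averaged_score(reference, regions_per_rotated_image):
    """Average the reference score with one score per rotated image.

    An image with no matched region contributes a score of zero.
    """
    scores = [reference.score]
    for regions in regions_per_rotated_image:
        matches = [r for r in regions if is_match(reference, r)]
        if matches:
            nearest = min(
                matches,
                key=lambda r: hypot(reference.x - r.x, reference.y - r.y),
            )
            scores.append(nearest.score)
        else:
            scores.append(0.0)
    return sum(scores) / len(scores)
```

Because unmatched images contribute zero, a region that appears in only a few of the five images is pulled well below the decision threshold, whereas a region found consistently keeps a score near its single-image value.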

The three schemes were applied to all 368 images in an attempt to detect the 176 depicted mass regions. The overall performance and reproducibility of the three schemes were compared. A region-based analysis was performed, in which each depicted mass region was considered an independent observation. Similar to the commercial CAD product, one predetermined and fixed threshold was used in both in-house schemes. Suggestive regions with detection scores greater than this threshold were considered to be positive; otherwise, the regions were discarded. The same threshold (0.55) was used in this study as in a previous study to compare the performance of two commercial CAD systems and our single-image-based scheme [14].
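A region-based tally of this kind might look like the following Python sketch. It assumes that each suggestive region has already been labeled as overlapping a marked mass or not, and that each depicted mass yields at most one region above the threshold; the function name and the input format are hypothetical.

```python
THRESHOLD = 0.55  # fixed detection-score cutoff quoted in the text


def region_based_performance(regions, n_mass_regions, n_images,
                             threshold=THRESHOLD):
    """Summarize detections pooled over all images.

    regions: iterable of (score, is_true_mass) pairs.
    Regions scoring at or below the threshold are discarded.
    """
    kept = [(score, is_mass) for score, is_mass in regions
            if score > threshold]
    true_positives = sum(1 for _, is_mass in kept if is_mass)
    false_positives = len(kept) - true_positives
    return {
        "true_positives": true_positives,
        "false_positives": false_positives,
        "sensitivity": true_positives / n_mass_regions,
        "fp_per_image": false_positives / n_images,
    }
```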


Results
Figure 2 compares the reproducibility of true-positive detections of mass regions for the commercial CAD system, our single-image-based scheme, and our multiple-image-based scheme. A reproducible detection was defined as a mass region that was either detected or missed on all four scans. If a region was detected on only one, two, or three scans, it was considered nonreproducible. Of the 44 depicted mass regions, the commercial system generated 34 reproducible detections (including 19 that were actually detected and 15 that were totally missed). Using our single-image-based scheme, 35 mass regions were reproducible (including 17 that were actually detected and 18 that were totally missed). The multiple-image-based scheme generated 41 reproducible detections (including 23 that were actually detected and 18 that were totally missed). Hence, using the multiple-image-based scheme, the nonreproducible detections of true-positive findings were reduced by 67% (from 9 to 3) when compared with the single-image-based scheme.
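The reproducibility definition used here reduces to a simple rule on the per-scan detection flags. The Python sketch below applies that rule and then reconstructs the single-image-based counts reported above (17 always detected, 18 always missed, 9 nonreproducible). The per-region flag patterns are made up for illustration; only the category totals come from the text.

```python
from collections import Counter


def classify_region(detected_flags):
    """Classify one region from its detection flags over repeated scans."""
    n_detected = sum(bool(flag) for flag in detected_flags)
    if n_detected == len(detected_flags):
        return "reproducible_detected"
    if n_detected == 0:
        return "reproducible_missed"
    return "nonreproducible"


def summarize(regions_flags):
    """Count regions in each reproducibility category."""
    return Counter(classify_region(flags) for flags in regions_flags)


# Illustrative flag patterns that reproduce the reported category totals.
single_image_scheme = (
    [(True, True, True, True)] * 17
    + [(False, False, False, False)] * 18
    + [(True, False, True, False)] * 9
)
counts = summarize(single_image_scheme)
reproducible = counts["reproducible_detected"] + counts["reproducible_missed"]

# Relative reduction in nonreproducible regions, from 9 to 3.
reduction = (9 - 3) / 9
```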

Figure 3 compares the reproducibility of false-positive detections for the 368 images. The commercial CAD system detected a total of 136 false-positive regions representing 47 independent regions (different locations). Twenty-two (46.8%) of these 47 regions were reproducible. The single-image-based scheme detected a total of 160 false-positive regions in 63 different locations. Twenty-eight (44.4%) of these 63 regions were reproducible. Twenty-five (71.4%) of 35 nonreproducible regions were detected only once in four scans. The multiple-image-based scheme detected a total of 132 false-positive regions in 38 different locations. Twenty-eight (73.7%) of these regions were reproducible.

Tables 1, 2, and 3 compare the region-based performance levels among the three CAD schemes. Table 1 summarizes the total number of true- and false-positive mass regions detected in each of the four digitizations of the 92 images by these three schemes. It also summarizes the performance levels of these schemes in detecting the 176 mass regions depicted on all 368 digitized images. Compared with the single-image-based scheme, the multiple-image-based scheme detected 11 additional mass regions (from 87 to 98) and reduced false-positive detections by 28 (from 160 to 132). Compared with the multiple-image-based scheme, the commercial CAD system detected the same number of mass regions and four additional false-positive regions. Tables 2 and 3 show that the multiple-image-based scheme yielded higher reproducibility than did either the commercial CAD system or the single-image-based scheme. The multiple-image-based scheme detected four more reproducible mass regions than did the commercial CAD system (23 vs 19). The multiple-image-based scheme also generated six more reproducible false-positive detections than did the commercial CAD system (28 vs 22). As a result, the multiple-image-based scheme detected fewer independent regions than did either the commercial CAD system or the single-image-based scheme (Table 3).
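The summary figures follow directly from the reported totals, as the short Python check below shows. Only the counts (true- and false-positive totals, 176 mass regions, 368 images) come from the text; the variable names are illustrative.

```python
N_MASS_REGIONS = 176   # 44 mass regions x 4 digitizations
N_IMAGES = 368         # 92 films x 4 digitizations

# (true-positive regions, false-positive regions) as reported in the text
totals = {
    "commercial": (98, 136),
    "single_image": (87, 160),
    "multiple_image": (98, 132),
}

performance = {
    name: {
        "sensitivity": tp / N_MASS_REGIONS,
        "fp_per_image": fp / N_IMAGES,
    }
    for name, (tp, fp) in totals.items()
}

# Change from the single-image-based to the multiple-image-based scheme.
tp_single, fp_single = totals["single_image"]
tp_multi, fp_multi = totals["multiple_image"]
relative_sensitivity_gain = (tp_multi - tp_single) / tp_single
relative_fp_reduction = (fp_single - fp_multi) / fp_single

for name, values in performance.items():
    print(f"{name}: sensitivity {values['sensitivity']:.1%}, "
          f"{values['fp_per_image']:.2f} false positives per image")
```

The relative changes work out to about a 12.6% gain in detected mass regions and a 17.5% reduction in false-positive detections.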


TABLE 1: Performance Levels of the Three CAD Schemes: Number of True- and False-Positive Mass Regions Detected

 

TABLE 2: Number of True- and False-Positive Regions Detected All Four Times When the Three CAD Schemes Were Applied to 92 Images

 

TABLE 3: Number of Different Regions Detected at Least Once by Each CAD Scheme During Four Scans of 92 Images

 

Only four repeated digitizations of each image were used in this study. Therefore, we computed the largest variation among the detection scores for the 26 true-positive regions that were detected at least once (in four scans) by both the single-image-based and the multiple-image-based CAD schemes (Table 3). The average variations in scores for regions detected by the single-image-based and the multiple-image-based schemes were 0.088 and 0.042, respectively.
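One way to compute such a figure is sketched below in Python. It assumes that the "largest variation" for a region is the range (maximum minus minimum) of its detection scores over the four scans and that the reported value is the mean of these ranges over all regions; the text does not spell out the exact definition, and the example scores are invented.

```python
def largest_variation(scores):
    """Range of one region's detection scores across repeated scans."""
    return max(scores) - min(scores)


def mean_largest_variation(score_table):
    """Average the per-region ranges over all regions.

    score_table: one sequence of per-scan scores for each region.
    """
    ranges = [largest_variation(scores) for scores in score_table]
    return sum(ranges) / len(ranges)


# Invented scores for two regions, four scans each.
example_scores = [
    (0.60, 0.65, 0.58, 0.70),
    (0.80, 0.82, 0.79, 0.81),
]
example_mean = mean_largest_variation(example_scores)
```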


Discussion
Pixel value variations in repeated digitizations of film mammograms are caused mainly by slight shifts in film positioning and noise from the optical and electronic components of the digitizer [6]. These variations may affect computed feature values of suspected regions, potentially resulting in differences in the detection scores generated by CAD schemes. Current CAD schemes use a binary threshold to determine which region will be prompted (detected) or not prompted (discarded); hence, regions with detection scores close to this threshold (e.g., 0.55) could be detected in one scan and missed in another. One approach to improve the reproducibility of detection is to reduce possible variations in the detection scores. In this study, a 52% reduction (from 0.088 to 0.042) in the variation of detection scores was achieved. We demonstrated a practical approach in which a set of rotated and resampled images could replace the tedious task of repeating digitization of the same films to assess the reproducibility of CAD schemes. What is perhaps just as important is that previous studies indicated that false-positive detections were substantially less reproducible than true-positive detections [3, 8]. Therefore, using the average of detection scores generated from one digitized image and four corresponding rotated and resampled images, we could substantially improve the overall performance of our own CAD scheme (i.e., a 12.6% increase in sensitivity and a 17.5% reduction in the false-positive detection rate).

Assessment of the performance of a CAD scheme (i.e., sensitivity and false-positive rate) and assessment of its reproducibility are two different tasks. We noted that the number of true- and false-positive regions detected by the different schemes could be comparable but that the actual regions (locations) might be different. For example, the number of true- and false-positive regions detected by the commercial scheme was comparable to that detected by our multiple-image-based scheme. However, the two schemes demonstrated substantially different reproducibilities. Sixty-six percent (19/29) of the true-positive regions and 47% (22/47) of the false-positive regions were detected during all four digitizations by the commercial system, whereas 89% (23/26) of the true-positive regions and 74% (28/38) of the false-positive regions were detected four times by the multiple-image-based scheme.

The effective size and contrast levels of the mass regions, combined with the region-based sensitivity (55.7%) we obtained with the commercial system, suggest that our data set was not particularly difficult. Similar results (ranging from 52% to 56%) were found in two previous studies testing the performance of two leading commercial CAD systems [5, 14]. Hence, our relatively small data set is quite representative of the distribution of cases we sequentially ascertained from a large screening population [14].

The images in this preliminary study were digitized using a Multi-RAD 861 (Howtek Devices). Our previous tests indicated that this digitizer produces higher noise levels than do other digitizers used in other CAD systems. Because the pixel value variations in repeatedly digitized images are generated by both the small shifts of film positioning and the inherent noise of the digitizer, the improvements in performance with this method may vary for different digitizers, and we expect that this method would perform somewhat better for noisier images. Additional data are needed in this regard.

Using a desktop PC with a 1.8-GHz central processing unit (Athlon XP 2200, AMD) and 512 MB of random-access memory, our CAD scheme (for which the source code has not been optimized for computational speed) takes approximately 8-10 sec to complete the process on a set of five images (including the time required to rotate and resample the original digitized image four times). Compared with the time required to digitize film mammograms using current commercial CAD systems (approximately 45 sec to 1 min per image), the increase in computing time should have little impact on the overall efficiency of the CAD system.

Our measured reproducibility for the commercial system was comparable to that reported for another commercial system [3, 6, 8] despite the use of different types of digitizers and algorithms. We therefore believe that, although tested on images digitized by one commercial system, the concept of averaging detection scores of matched regions as depicted in a set of rotated and resampled images should be applicable to other CAD schemes that use digitized mammograms to identify suggestive regions based on features that are at least partially derived from the local pixel value distribution.


Acknowledgments
 
This work was supported in part by grants CA77850 and CA101733 to the University of Pittsburgh from the National Cancer Institute, National Institutes of Health.


References

  1. D'Orsi CJ. Computer-aided detection: there is no free lunch. Radiology 2001; 221:585-586
  2. Guenin MA. How not to assess computer-aided detection for mammography. (letter) AJR 2004; 182:1599
  3. Malich A, Azhari T, Bohm T, Fleck M, Kaiser WA. Reproducibility: an important factor determining the quality of computer aided detection (CAD) systems. Eur J Radiol 2000; 36:170-174
  4. Moberg K, Bjurstam N, Wilczek B, Rostgard L, Egge E, Muren C. Computer assisted detection of interval breast cancers. Eur J Radiol 2001; 39:104-110
  5. Zheng B, Ganott MA, Britton CA, et al. Soft-copy mammographic readings with different computer-assisted detection cueing environments: preliminary findings. Radiology 2001; 221:633-640
  6. Taylor CG, Champness J, Reddy M, Taylor P, Potts HW, Given-Wilson R. Reproducibility of prompts in computer-aided detection (CAD) of breast cancers. Clin Radiol 2003; 58:733-738
  7. Gur D, Sumkin JH, Rockette HE, et al. Changes in breast cancer detection and mammography recall rates after the introduction of a computer-aided detection system. J Natl Cancer Inst 2004; 96:185-190
  8. Zheng B, Hardesty LA, Poller WR, Sumkin JH, Golla S. Mammography with computer-aided detection: reproducibility assessment—initial experience. Radiology 2003; 228:58-62
  9. Zheng B, Gur D, Good WF, Hardesty LA. A method to test the reproducibility and to improve performance of computer-aided detection schemes for digitized mammograms. Med Phys 2004; 31:2964-2972
  10. Nishikawa RM, Giger ML, Doi K, et al. Effect of case selection on the performance of computer-aided detection schemes. Med Phys 1994; 21:265-269
  11. Zheng B, Chang YH, Gur D. On the reporting of mass contrast in CAD research. Med Phys 1996; 23:2007-2009
  12. Zheng B, Sumkin JH, Good WF, Maitz GS, Chang YH, Gur D. Applying computer-assisted detection schemes to digitized mammograms after JPEG data compression: an assessment. Acad Radiol 2000; 7:595-602
  13. Li L, Zheng Y, Zhang L, Clark RA. False-positive reduction in CAD mass detection using a competitive classification strategy. Med Phys 2001; 28:250-258
  14. Gur D, Stalder JS, Hardesty LA, et al. Computer-aided detection performance in mammographic examination of masses: assessment. Radiology 2004; 233:418-423
