1 All authors: Department of Radiology, University of Pittsburgh, A451 Scaife Hall, 3550 Terrace St., Pittsburgh, PA 15261-0001.
Received February 28, 2000;
accepted after revision May 15, 2000.
Supported in part by grants CA62800, CA66594, and CA67947 from the National
Cancer Institute.
Abstract
MATERIALS AND METHODS. Eight observers assessed 60 mammographic images obtained in six modes, ranging from noncompressed to a maximum data compression level of 101:1. Observers were asked to rate the images on a scale of 0 to 100 for the likelihood of the presence of a mass and also independently for the likelihood of the presence of clustered microcalcifications. In addition, observers were asked to rate their subjective assessment of the quality of each image for the detection of a mass and separately for the detection of microcalcifications. Receiver operating characteristic analyses were performed.
RESULTS. The average area under the receiver operating characteristic curve, Az, for the detection of clustered microcalcifications decreased significantly at the highest data compression level when compared with the noncompressed and two lowest levels of data compression (p < 0.01), and a trend test of the average area under the receiver operating characteristic curve for all observers was statistically significant (p < 0.05). No statistically significant differences in the detection of masses were found among the data compression levels.
CONCLUSION. At a high level of mammogram data compression, observer performance was degraded for the detection of clustered microcalcifications. Detection of masses was not affected by the data compression methods and levels used in this study.
The digitization process used a high-resolution, high-contrast sensitivity modified Lumisys-150 laser film digitizer (Lumisys, Sunnyvale, CA) that produces a scan matrix of 4096 x 5120 pixels for an 8 x 10 inch (18 x 24 cm) film by digitizing at a 50-µm sampling interval. This pixel resolution results in a Nyquist spatial frequency of 10 cycles/mm, which preserves much of the signal resolution of the analog film. The modulation transfer function at the Nyquist frequency is 30% in the fast-scan direction and 33% in the slow-scan direction. The 16-bit analog-to-digital converter of the digitizer permits a density measurement with a root mean square error of less than 0.01.
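The sampling figures quoted above can be checked with a few lines of arithmetic. The following is a minimal sketch that uses only the 50-µm sampling interval and the 4096 x 5120 scan matrix stated in the text; the function names are illustrative and are not part of the digitizer software.

```python
# Sampling geometry of the digitizer, using only figures quoted in the text.
SAMPLING_INTERVAL_MM = 0.050   # 50-µm sampling interval
MATRIX = (4096, 5120)          # scan matrix in pixels


def nyquist_cycles_per_mm(sampling_interval_mm):
    """Highest spatial frequency representable at a given sampling interval."""
    return 1.0 / (2.0 * sampling_interval_mm)


def scanned_extent_mm(matrix, sampling_interval_mm):
    """Physical size covered by the scan matrix, in millimetres."""
    return tuple(n * sampling_interval_mm for n in matrix)


nyquist = nyquist_cycles_per_mm(SAMPLING_INTERVAL_MM)
width_mm, height_mm = scanned_extent_mm(MATRIX, SAMPLING_INTERVAL_MM)
print(f"Nyquist frequency: {nyquist:.1f} cycles/mm")
print(f"Scanned extent: {width_mm:.1f} x {height_mm:.1f} mm")
```

The Nyquist frequency is the reciprocal of twice the sampling interval, which gives the 10 cycles/mm quoted above. The scanned extent works out to about 204.8 x 256 mm, which is slightly larger than 8 x 10 inches.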
Each digitized image was compressed at five different compression levels.
The JPEG compression algorithm used in this study is a 12-bit version of the
standard. Before the JPEG algorithm was applied, each digitized mammogram was
segmented to identify image areas corresponding to tissue as opposed to
background, and background pixels were set to a constant value. The actual
procedures used in this process have been described in detail elsewhere
[9]. Because it is the degree
of quantization rather than the compression ratio itself that determines the
degradation of an image by compression, a measure of quantization was used as
an independent variable and not the actual compression ratio. Specifically, we
produced quantization matrices by scaling a base matrix that had been designed
to preserve slightly more accuracy for the lowest frequency components (i.e.,
the zero-frequency component q00 = 0.5; low-frequency components
q01 = q10 = q11 = 0.7; and all other entries qij = 1.0 when i,j ≥ 2).
The five constant scaling factors we
applied to this base matrix were 128, 176, 240, 304, and 400, which produced
mean compression ratios of 24:1, 34:1, 52:1, 70:1, and 101:1, respectively.
With this scheme, increased image noise will reduce compressibility for the
same quantization factors, and digitization at different resolutions will
result in different spatial frequencies in the image being affected by each
quantization factor. In our data set, the highest level of quantization
produced a mean compression ratio of 101:1, which is substantially higher than
what is required to permit the efficient handling of mammographic images with
current digital technology, and it is in the range in which studies of other
algorithms applied to different image types have shown significant
deterioration in image quality [9, 10]. Note that these
particular quantization factors are appropriate only for images digitized at
the resolution and with the noise characteristics under consideration in this
study.
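The construction of the quantization matrices described above can be sketched as follows. This is an illustration under stated assumptions, not the code used in the study: it assumes 8 x 8 blocks (the block size of DCT-based JPEG), assumes that "scaling" means multiplying every base entry by the scaling factor, and assumes that a coefficient is quantized by dividing by the step and rounding to the nearest integer. The function names and the sample coefficient are hypothetical.

```python
BLOCK = 8  # assumption: standard 8 x 8 JPEG DCT blocks
SCALING_FACTORS = (128, 176, 240, 304, 400)  # values quoted in the text


def base_matrix(size=BLOCK):
    """Base matrix: finer steps for the lowest frequency components."""
    m = [[1.0] * size for _ in range(size)]
    m[0][0] = 0.5                          # zero-frequency component q00
    m[0][1] = m[1][0] = m[1][1] = 0.7      # low-frequency components
    return m


def scaled_matrix(factor, base=None):
    """Quantization matrix: every base entry multiplied by one factor."""
    base = base_matrix() if base is None else base
    return [[factor * q for q in row] for row in base]


def quantize(coefficient, step):
    """Assumed quantizer: divide by the step, round to the nearest integer."""
    return round(coefficient / step)


def dequantize(level, step):
    """Reconstruct the coefficient from its quantized level."""
    return level * step


matrices = {f: scaled_matrix(f) for f in SCALING_FACTORS}
for factor, m in matrices.items():
    print(factor, m[0][0], round(m[0][1], 1), m[2][2])

# Hypothetical DC coefficient, quantized at the lowest and highest levels.
coefficient = 1000.0
for factor in (128, 400):
    step = matrices[factor][0][0]
    level = quantize(coefficient, step)
    print(factor, level, dequantize(level, step))
```

Because the step sizes grow in proportion to the scaling factor, a larger factor discards more of each coefficient, which is why the quantization level rather than the resulting compression ratio is the natural independent variable.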
All images used in this study, both the data-compressed versions and the original (noncompressed) digitized versions, were laser-printed onto film with a specially modified Kodak Ektascan 2180 laser film printer (Eastman Kodak). This laser printer was modified to print a 43-µm pixel size. Laser-printed films were used for all image comparisons. The noise contributed by the laser printer is substantially lower than image noise in the original film.
The data sent to the printer are first passed through a specially designed lookup table. The first portion of the table, which corresponds to the brightest pixels, is quite flat (low contrast). The second portion of the table is steeper at the values that correspond to most of the tissue pixels in the image. The third part is relatively flat at the values corresponding to the darkest pixels in the image. This three-segment lookup table compensates for the laser printer's lower dynamic range (as compared with the original film) while preserving much of the contrast and brightness of the original film in the optical density range of 0.3 to 3.2. No enhancement processing was performed on the images (e.g., unsharp masking).
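The shape of such a three-segment table can be illustrated with a piecewise-linear sketch. The text gives neither the breakpoints nor the slopes, so every number below (the 12-bit range, the knee positions, and the share of the output range given to each flat segment) is a hypothetical value chosen only to show the shallow, steep, shallow pattern. Input codes are assumed to be ordered so that the first segment covers the brightest pixels.

```python
def three_segment_lut(n_levels=4096, knee_low=0.15, knee_high=0.85,
                      end_share=0.10):
    """Piecewise-linear table with shallow, steep, and shallow segments.

    All default values are hypothetical; the study's table is not given.
    """
    top = n_levels - 1
    x1, x2 = knee_low * top, knee_high * top           # input breakpoints
    y1, y2 = end_share * top, (1.0 - end_share) * top  # output breakpoints
    lut = []
    for x in range(n_levels):
        if x <= x1:        # first segment: low contrast
            y = y1 * x / x1
        elif x <= x2:      # second segment: steeper, most tissue pixels
            y = y1 + (y2 - y1) * (x - x1) / (x2 - x1)
        else:              # third segment: low contrast again
            y = y2 + (top - y2) * (x - x2) / (top - x2)
        lut.append(round(y))
    return lut


lut = three_segment_lut()
print(lut[0], lut[2048], lut[-1])
```

With these illustrative defaults, the middle segment maps 70% of the input range onto 80% of the output range, so contrast is expanded there and reduced at both ends.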
Observer Performance Study
Each image was labeled with a number. At the start of each observer
performance session, each participating mammographer, all of whom were unaware
of the study's purpose, started a computer program that produced a list of 30
random numbers in one mode (not identified to the participant). The observers
pulled the film images in the order corresponding to the printed list from an
ordered file and started a computer program that provided a unique scoring
screen for each image in the same order as shown on their list; the image
number was displayed on each scoring screen to ensure that the answers entered
into the computer were correctly matched with the image. Each screen was
identical, and each observer would view the film image and use a mouse to move
a slider on a scale of 0-100 (0, definitely absent; 100, definitely present)
to indicate confidence as to the likelihood of the presence of clustered
microcalcifications and use a separate slider for the presence of a mass.
Observers used similar sliding scales to assess the image quality for the
detection of clustered microcalcifications and separately for the detection of
a mass. Only when all the information was entered for a given image could the
observer access the next screen. The time the answer form remained on the
computer screen was stored in the answer file for that image as an
approximation of the time, in seconds, taken to review the accompanying
image.
Data Analysis
The area under the estimated receiver operating characteristic curve
(Az) was calculated for each observer, each compression
level, and each of the abnormalities with the use of the ROCKIT program
(Charles E. Metz, University of Chicago), which allows the option of
calculating the Az for continuous data
[11]. This calculation
resulted in a total of 96 Az estimates (eight observers
times six modes times two abnormalities).
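To make the quantity being estimated concrete, the sketch below computes an empirical area under the ROC curve from 0-100 confidence ratings. This is a deliberately simpler, nonparametric (Mann-Whitney) estimate and is not the fitted-curve Az produced by ROCKIT; the ratings are hypothetical and the function name is illustrative.

```python
def empirical_auc(ratings_positive, ratings_negative):
    """Nonparametric AUC: the probability that a randomly chosen abnormal
    case is rated higher than a randomly chosen normal case (ties = 1/2)."""
    wins = 0.0
    for p in ratings_positive:
        for n in ratings_negative:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(ratings_positive) * len(ratings_negative))


# Hypothetical ratings from one observer in one mode for one abnormality.
positive = [90, 75, 60, 40]   # cases with the abnormality present
negative = [10, 30, 40, 65]   # cases with the abnormality absent
auc = empirical_auc(positive, negative)
print(f"Empirical AUC: {auc:.5f}")

# One estimate per observer, mode, and abnormality.
N_OBSERVERS, N_MODES, N_ABNORMALITIES = 8, 6, 2
n_estimates = N_OBSERVERS * N_MODES * N_ABNORMALITIES
print(f"Number of estimates: {n_estimates}")
```

A parametric fit and this empirical estimate generally differ somewhat in value, so the sketch is meant only to show what an area estimate summarizes for each observer, mode, and abnormality.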
The data were analyzed with the use of a one-way analysis of variance and a trend analysis to test for any significant differences in observer performance due to data compression. The trend test was a regression analysis, for which the statistical inference was derived from the slope estimate and the corresponding t statistic.
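A regression-based trend test of this kind can be sketched with ordinary least squares. The example below is an illustration only: it regresses the six mean image quality ratings for clustered microcalcifications reported in this article on the mode index 0 to 5, which is an assumption about the regressor. The study's own test presumably used the individual observer values, so the t statistic printed here is not expected to reproduce the published p values.

```python
import math


def slope_and_t(x, y):
    """Least-squares slope and its t statistic (n - 2 degrees of freedom)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares around the fitted line.
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    standard_error = math.sqrt(sse / (n - 2) / sxx)
    return slope, slope / standard_error


# Assumption: mode index 0 (noncompressed) to 5 (highest compression).
mode_index = [0, 1, 2, 3, 4, 5]
# Mean quality ratings for microcalcifications, as reported in this article.
quality_means = [49, 47, 47, 44, 38, 33]

slope, t_stat = slope_and_t(mode_index, quality_means)
print(f"slope = {slope:.3f} rating points per mode, t = {t_stat:.2f}")
```

A negative slope with a large absolute t statistic indicates a downward trend across modes; converting t to a p value requires the t distribution with n - 2 degrees of freedom.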
For masses and clustered microcalcifications, image quality ratings were averaged for each observer, and the mean and standard error of the averages were calculated for each mode. Similarly, the estimates of the observer times, in seconds, were averaged for each observer, and the mean and standard error of those averages were computed for each mode. Times exceeding 300 sec (5 min) were excluded from the analysis in an effort to discard times that were excessively long because of external factors such as telephone calls and discussions. In this study, approximately 4% of the interpretations were excluded from the computation.
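The averaging and exclusion steps described above can be sketched as follows. The observer labels and times are hypothetical, and the function names are illustrative; only the 300-sec cutoff comes from the text.

```python
import math
from statistics import mean, stdev

MAX_TIME_SEC = 300  # times exceeding this value are excluded


def observer_means(times_by_observer, max_time=MAX_TIME_SEC):
    """Average each observer's times after dropping excessively long ones."""
    return {
        observer: mean([t for t in times if t <= max_time])
        for observer, times in times_by_observer.items()
    }


def mean_and_standard_error(values):
    """Mean of the per-observer averages and its standard error."""
    values = list(values)
    return mean(values), stdev(values) / math.sqrt(len(values))


# Hypothetical interpretation times (seconds) for one mode.
times = {
    "A": [60, 70, 400],   # 400 sec is excluded
    "B": [50, 90],
    "C": [80, 310, 40],   # 310 sec is excluded
}
per_observer = observer_means(times)
mode_mean, mode_se = mean_and_standard_error(per_observer.values())
print(per_observer)
print(f"mean = {mode_mean:.1f} sec, standard error = {mode_se:.2f} sec")
```

Averaging within each observer first means that every observer contributes equally to the mode mean, regardless of how many of that observer's times were excluded.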
Results
The means of the image quality ratings for each successive data compression
level, starting with noncompressed, were 55, 53, 53, 53, 50, and 50 for the
detection of a mass and 49, 47, 47, 44, 38, and 33 for the detection of
clustered microcalcifications. The results of a trend test showed a statistically significant
decrease in the perceived image quality rating for both abnormalities as data
compression levels increased (p < 0.01). For the detection of
microcalcifications, the perceived image quality was judged as poorer and
decreased faster with increasing data compression levels. The mean
interpretation times (±SD) in seconds for the different compression
levels were 67 (±15), 65 (±10), 58 (±17), 62
(±17), 62 (±21), and 59 (±14) for the noncompressed and
increasing data compression levels, respectively. These results were not
significantly different for any mode (p ≥ 0.12), and a trend test
was also not significant (p = 0.79).
Discussion
The relevance of this study is limited somewhat by the fact that we used only a single view, whereas multiple views are used (two at a minimum) in the clinical environment. However, relative detection performance levels and trends in performance as a function of levels of data compression are the main interest here rather than absolute detection rates. Therefore, we believe that the study results and conclusions are scientifically valid even if direct clinical relevance is reduced. As indicated, we did not use the conventional nondigitized film in our study because it was readily recognizable. Because conventional film is of higher quality (mainly resolution > 10 lp/mm) than the laser-printed film, one would expect potentially better performance with it. However, the quality of the noncompressed digital images was quite high, and the detection performance is expected to be similar to that with conventional film.
In a previous multipoint rank-order study performed by our group [12], a subset of the cases used in this study was presented to a different group of mammographers who were asked to rank the images in each case from best to worst with respect to their usefulness in making the correct diagnosis of masses and clustered microcalcifications. Observers were able to rank the films in approximately the order of the compression level even though they were unaware of the study's exact purpose. Although subjective, these results are consistent with the objective performance-based findings presented here.
Assuming that there is acceptable performance in interpreting mammograms at relatively high levels of data compression, the subjective ratings of image quality, which decrease steadily with increasing levels of data compression, suggest that subjective comfort of radiologists rather than objective degradation in performance may ultimately determine the maximum acceptable data compression level. We want to emphasize that objectively measured detection performance levels, rather than subjective ratings of perceived image quality, should ultimately determine what is acceptable (or not) in the clinical environment, but in practical terms this ideal is not always the case. When objective and subjective measures largely concur, as in this case, one should seriously evaluate the implications for both diagnosis and individual preferences.
In conclusion, high levels of data compression seem to affect observer performance in the detection of clustered microcalcifications. As important, perhaps, is that subjective assessment of image quality seems to indicate that comfort level of the observer decreases as well. The consistent trends we have observed suggest that there may also be an effect at lower compression levels, but measuring such effects would require a prohibitively large sample size. Hence, although possibly real, these effects may not be relevant to the clinical environment.
Acknowledgments
We thank Jill King and David Gur for their participation in the study and
data analysis as well as for their help with manuscript preparation.