Original Research
1 Department of Radiology, University of Ulsan, Asan Medical Center, 388-1,
Seoul 138-736, South Korea.
2 Department of Radiology and Lineberger Comprehensive Cancer Center, University
of North Carolina at Chapel Hill, Chapel Hill, North Carolina, 27599.
3 Salix Pharmaceuticals, Inc., Morrisville, North Carolina.
4 Department of Biostatistics, University of North Carolina at Chapel Hill,
Chapel Hill, North Carolina.
5 Department of Radiology, University of North Carolina at Chapel Hill, Chapel
Hill, North Carolina.
Received February 3, 2005;
accepted after revision May 3, 2005.
Address correspondence to E. D. Pisano.
Abstract
MATERIALS AND METHODS. A total of 130 consecutive cases with calcifications (44 malignant and 86 benign) that had been evaluated with needle or surgical biopsy were collected. Both screen-film mammography and soft-copy digital mammography were obtained in the same patients under existing research protocols using Fischer Imaging's SenoScan (n = 71), Lorad's digital mammography system (n = 35), and GE Healthcare's Senographe 2000D (n = 24). Eight trained radiologists scored all lesions, cropped or masked to display just the region of interest, on both screen-film and soft-copy digital mammography, with a month between reviews to reduce the effects of learning and memory. A 5-point malignancy scale was used, with 1 as definitely not, 2 as probably not, 3 as possibly, 4 as probably, and 5 as definitely malignant. Reviewers were randomly assigned condition order, and images within each condition were randomly ordered. Specificity was computed via nonparametric receiver operating characteristic (ROC) analysis separately for each reviewer and condition, and repeated measures analysis of variance was used to test for differences in specificity between conditions.
RESULTS. Across all reviewers, the mean specificity for 1 or 2 versus 3, 4, or 5 was 0.803 for screen-film mammography (range, 0.413-0.938; SD ± 0.166) and 0.833 for soft-copy digital mammography (range, 0.375-0.951; SD ± 0.187). Although the differences were not statistically significant (Student's t test p values from 0.19 to 0.99 across all cut points), the numeric values of specificity were consistently higher for soft-copy than for screen-film mammography. No statistically significant difference in specificity was seen at any of the possible cut points on the 5-point scale, although the primary analysis used 1 or 2 versus 3, 4, or 5 as the cut point for differentiating benign from malignant cases.
CONCLUSION. No statistically significant difference was shown in specificity achievable using soft-copy digital versus screen-film mammography in this study.
Keywords: breast, comparative studies, diagnostic radiology, digital images, mammography, observer performance
In this study, we investigated the use of soft-copy display for characterizing lesions on screening mammograms of women with dense breasts. Specifically, the objective was to compare the specificity of screen-film and soft-copy digital mammography for microcalcifications and to determine whether soft-copy digital mammography provides improved specificity compared with screen-film mammography.
Soft-copy images were cropped so that just the region of interest was shown, and these subimages were displayed, in both craniocaudal and mediolateral oblique views, using high-luminance, high-spatial-resolution monitors of 2,048 x 2,506 pixels (model 1654, Orwin Associates). Screen-film mammograms were masked to show only the same calcifications and were displayed on a light box. An experienced radiologist, who did not participate in the scoring of the images, annotated and masked the films.
Eight trained radiologists, all qualified mammographic interpreters under the U.S. Federal Mammography Quality Standards Act (MQSA), participated independently as interpreters in this study. All interpreters review mammograms as part of their everyday clinical practice of breast imaging. The performance task was to provide a probability of malignancy for each group of calcifications based on the perceived characteristics of the calcifications on screening mammograms. Calcifications were visible within the images presented for each case; the interpreters were shown only the regions of the mammograms containing calcifications so that they could focus solely on the characterization task. Each interpreter scored all lesions on both soft-copy digital and screen-film mammography. At least a month passed between interpretations in the two conditions, to reduce the effects of learning and memory. Interpreters were randomly assigned condition order, and images were randomly ordered within each condition. A 5-point malignancy scale was used, with 1 defined as definitely not malignant, 2 as probably not malignant, 3 as possibly malignant, 4 as probably malignant, and 5 as definitely malignant. The standard BI-RADS scale for likelihood-of-cancer classification was not used because it is not a continuous scale and because this study did not focus on the screening task per se (callback vs no callback). A rating of 1 or 2 on our malignancy scale was classified as benign, and a rating of 3, 4, or 5 as malignant. The scale asked the radiologists only for their estimate of the likelihood of malignancy, allowing them to assume that all cases reviewed would undergo biopsy.
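The dichotomization of the 5-point ratings, and the specificity it yields at each of the four possible cut points, can be sketched in Python as follows. This is a minimal illustration, not the authors' software: it assumes that specificity at a cut point is the empirical (nonparametric) ROC operating point, that is, the fraction of biopsy-proven benign cases rated at or below that cut point. The function names and the ratings in the usage example are hypothetical.

```python
from typing import Sequence

# Cut point k means ratings <= k are called benign and ratings > k malignant.
# k = 2 is the study's primary split (1 or 2 vs 3, 4, or 5).
PRIMARY_CUT_POINT = 2


def called_malignant(rating: int, cut_point: int = PRIMARY_CUT_POINT) -> bool:
    """Return True if a 1-5 malignancy rating is classified as malignant."""
    if not 1 <= rating <= 5:
        raise ValueError(f"rating must be 1-5, got {rating}")
    if not 1 <= cut_point <= 4:
        raise ValueError(f"cut point must be 1-4, got {cut_point}")
    return rating > cut_point


def specificity(benign_ratings: Sequence[int],
                cut_point: int = PRIMARY_CUT_POINT) -> float:
    """Fraction of biopsy-proven benign cases correctly called benign."""
    if not benign_ratings:
        raise ValueError("need at least one benign case")
    true_negatives = sum(not called_malignant(r, cut_point) for r in benign_ratings)
    return true_negatives / len(benign_ratings)


if __name__ == "__main__":
    # Illustrative ratings of 10 benign cases by one reader (not study data).
    ratings = [1, 2, 2, 3, 1, 4, 2, 1, 3, 2]
    for k in range(1, 5):
        print(k, specificity(ratings, k))
```

With these illustrative ratings, specificity rises from 0.3 at cut point 1 to 1.0 at cut point 4, because each higher cut point calls more ratings benign. Only benign cases enter the calculation, consistent with the study's focus on specificity rather than sensitivity.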
The radiologists were able to adjust window and level and to magnify (x2 power) each soft-copy image interactively. For screen-film mammography, appropriate masking of the film and viewbox was used, and radiologists were provided with a magnifying glass (x2 power). No prior films, patient history, or diagnostic views (magnification images or spot radiographs) were provided to the interpreters. A research associate presented the cases to each interpreter in random order.
Specificity was computed via nonparametric receiver operating characteristic (ROC) analysis separately for each interpreter and technique, and repeated measures analysis of variance was used to test for differences in specificity between conditions. To address our objectives while allocating the type I error rate to reflect the tests of most interest, a traditional stepdown approach to the repeated measures analysis of variance was conducted, with the α level divided as follows. First, we tested for the significance of the cut point by technique interaction, which asks whether the difference between soft-copy and screen-film mammography differs across the cut points, at the 0.02 α-level. Given a nonsignificant interaction, we then stepped down to the test of the main effect of technique, which tests for the difference between soft-copy and screen-film mammography averaged across the cut points, at the 0.02 α-level. Because the main effect of the cut point was not of interest, this test was ignored. Of secondary interest, regardless of the significance of the interaction, were the four separate Student's t tests comparing soft-copy and screen-film mammography at each cut point. These Student's t tests were essentially the stepdown tests for a significant technique by cut point interaction, but they were conducted whether or not the interaction was significant. Each Student's t test was conducted at the 0.01/4 = 0.0025 α-level. Although we were interested in soft-copy images only if they were superior to screen-film mammography images, the possibility existed that they would be worse. Therefore, all of the tests were two-sided.
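The α allocation and stepdown order described above can be expressed as a small decision function. This is a hypothetical sketch that takes precomputed two-sided p values as inputs (it does not compute them) and assumes a test is declared significant when p is strictly below its allocated α; the three allocations (0.02 + 0.02 + 0.01) sum to 0.05.

```python
from typing import Dict, Optional, Sequence

# Alpha allocation described in the text (0.02 + 0.02 + 0.01 = 0.05).
ALPHA_INTERACTION = 0.02     # cut point by technique interaction
ALPHA_MAIN_EFFECT = 0.02     # main effect of technique
ALPHA_T_TESTS_TOTAL = 0.01   # shared equally by the per-cut-point t tests
N_CUT_POINTS = 4


def stepdown_decisions(p_interaction: float, p_main_effect: float,
                       p_t_tests: Sequence[float]) -> Dict[str, object]:
    """Apply the stepdown order to precomputed two-sided p values."""
    if len(p_t_tests) != N_CUT_POINTS:
        raise ValueError("expected one t test p value per cut point")
    per_t_test_alpha = ALPHA_T_TESTS_TOTAL / N_CUT_POINTS  # 0.0025
    interaction_sig = p_interaction < ALPHA_INTERACTION
    # The technique main effect is examined only if the interaction is not significant.
    main_effect_sig: Optional[bool] = (
        None if interaction_sig else p_main_effect < ALPHA_MAIN_EFFECT
    )
    # The t tests are run whether or not the interaction is significant.
    t_tests_sig = [p < per_t_test_alpha for p in p_t_tests]
    return {
        "interaction": interaction_sig,
        "main_effect": main_effect_sig,
        "t_tests": t_tests_sig,
    }
```

With the interaction p value of 0.57 and main-effect p value of 0.21 reported in the Results, neither test reaches its 0.02 threshold, and any t test p value in the reported 0.19 to 0.99 range is well above 0.0025.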
Table 1 shows the mean specificity for each cut point, averaged across all interpreters. Across all interpreters, the mean specificity for malignancy scores of 1 or 2 versus 3, 4, or 5 was 0.803 for screen-film mammography (range, 0.413-0.938; SD ± 0.166) and 0.833 for soft-copy digital mammography (range, 0.375-0.951; SD ± 0.187). As a visual aid, Figure 1 plots the mean specificities from Table 1 versus cut point by technique. Note that cut point value 1 refers to the 1 versus 2 split, cut point value 2 to the 2 versus 3 split, cut point value 3 to the 3 versus 4 split, and cut point value 4 to the 4 versus 5 split. The separation of the lines at the second and third cut points shows that soft-copy digital mammography has a higher, although not statistically significant, specificity at these points, which are the two cut points of most clinical importance.
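The descriptive summaries above (mean, range, and SD of specificity across interpreters, and the mean at each cut point plotted by technique) can be sketched as follows. This is a minimal illustration with hypothetical function names; it assumes a sample SD (n - 1 denominator), which the text does not specify.

```python
import statistics
from typing import Dict, List, Sequence


def summarize(specificities: Sequence[float]) -> Dict[str, float]:
    """Mean, range, and SD of one technique's per-interpreter specificities."""
    if len(specificities) < 2:
        raise ValueError("need at least two interpreters for an SD")
    return {
        "mean": statistics.mean(specificities),
        "min": min(specificities),
        "max": max(specificities),
        "sd": statistics.stdev(specificities),  # n - 1 denominator (assumed)
    }


def mean_by_cut_point(per_interpreter: Sequence[Sequence[float]]) -> List[float]:
    """Mean specificity at each cut point.

    Row i is interpreter i; column k is cut point k + 1.
    """
    n_cuts = len(per_interpreter[0])
    if any(len(row) != n_cuts for row in per_interpreter):
        raise ValueError("every interpreter needs a value at every cut point")
    return [statistics.mean([row[k] for row in per_interpreter])
            for k in range(n_cuts)]
```

Calling `mean_by_cut_point` once per technique gives the two curves of mean specificity versus cut point; `summarize` applied to one column gives the per-cut-point mean, range, and SD.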
The test of the cut point by technique interaction resulted in an F-test statistic of 0.74 with a corresponding p value of 0.57, suggesting that the difference between soft-copy and screen-film mammography does not differ significantly across the cut points. The main effect test of technique resulted in an F-test statistic of 1.93 with a corresponding p value of 0.21, suggesting that the difference between soft-copy and screen-film mammography averaged across the cut points is not statistically significant. Finally, Table 2 shows the results of the four separate Student's t tests comparing screen-film and soft-copy mammography at each cut point. The lack of significance of these four tests suggests that no statistically significant difference exists between techniques at any of the individual cut points considered, although the primary analysis used 1 or 2 versus 3, 4, or 5 as the cut point for differentiating benign from malignant cases. Although not statistically significant (Student's t test p values from 0.19 to 0.99 across all cut points), the numeric values of specificity were consistently higher for soft-copy digital mammography than for screen-film mammography (Table 1).
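The per-cut-point Student's t tests and the technique main-effect F test can be related through a short sketch. It assumes that the t tests are paired across interpreters (each interpreter read the cases under both techniques) and that the analysis of variance is the standard univariate repeated measures model with interpreters as subjects. Under those assumptions, the F statistic for a two-level within-subject factor such as technique equals t² of a paired t test on each interpreter's specificity averaged over cut points, with (1, n - 1) degrees of freedom. The function names are hypothetical; a two-sided p value could be obtained from the t distribution with n - 1 degrees of freedom (for example, with `scipy.stats.t.sf`), which is omitted here to keep the example dependency-free.

```python
import math
import statistics
from typing import List, Sequence, Tuple


def paired_t(soft_copy: Sequence[float],
             screen_film: Sequence[float]) -> Tuple[float, int]:
    """Paired t statistic and degrees of freedom for per-interpreter
    differences (soft copy minus screen film)."""
    if len(soft_copy) != len(screen_film) or len(soft_copy) < 2:
        raise ValueError("need matched values for at least two interpreters")
    diffs = [a - b for a, b in zip(soft_copy, screen_film)]
    n = len(diffs)
    sd = statistics.stdev(diffs)  # sample SD of the differences
    if sd == 0:
        raise ValueError("differences have zero variance; t is undefined")
    t = statistics.mean(diffs) / (sd / math.sqrt(n))
    return t, n - 1


def technique_main_effect_f(soft_copy_by_cut: Sequence[Sequence[float]],
                            screen_film_by_cut: Sequence[Sequence[float]]
                            ) -> Tuple[float, int, int]:
    """Repeated measures F for the two-level technique factor.

    Row i holds interpreter i's specificities at each cut point. Averaging
    over cut points and squaring the paired t gives F with (1, n - 1) df.
    """
    soft_avg: List[float] = [statistics.mean(row) for row in soft_copy_by_cut]
    film_avg: List[float] = [statistics.mean(row) for row in screen_film_by_cut]
    t, df = paired_t(soft_avg, film_avg)
    return t * t, 1, df
```

The same `paired_t` applied to a single column (one cut point) gives the corresponding per-cut-point test; the interaction F test is not sketched here.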
Interpreter variability did not significantly affect the results of this study.
However, most previous reports have focused on comparing observer performance between screen-film mammography and hard-copy digital mammography. As PACS becomes more widely available, mammography is expected to convert rapidly to soft-copy display as well. Observer performance with soft-copy display must therefore also be compared with that of screen-film mammography and hard-copy digital mammography to determine whether digital mammography can completely replace film-based mammography.
The diagnostic accuracy of soft-copy and hard-copy interpretation is likely to be comparable if a high-resolution laser printer and a high-quality workstation with high-spatial- and high-contrast-resolution monitors are used [8]. However, when digital mammograms are printed on laser film, the flexibility of digital imaging is lost because the display parameters must be chosen before printing, and all of the available information cannot be optimally displayed in a single presentation. In contrast, soft-copy display is flexible, allowing online contrast manipulation, roaming, and zooming to full resolution.
Our hypothesis was that soft-copy digital mammography provides improved specificity compared with screen-film mammography for the diagnosis of breast calcifications. Although our results were not statistically significant, the numeric values of specificity were consistently higher for soft-copy display than for screen-film mammography, suggesting a nonsignificant trend in favor of the soft-copy technique. Recent studies support our results [1, 10]. Kuzmiak et al. [9] reported no statistically significant difference in diagnostic accuracy, including for microcalcifications, between magnified screen-film mammography and unmagnified soft-copy digital mammography of breast tissue biopsy specimens.
Many challenges remain as screen-film mammography is converted to soft-copy digital mammography. Interpretation of soft-copy images may be more time-consuming than interpretation of hard-copy digital mammograms because of the additional digital operations associated with viewing soft-copy images. However, a recent study found that the speed and accuracy of digital mammography interpretation with printed film versus soft-copy display were not significantly different, with soft-copy interpretation being slightly faster [11]. In that study, sensitivity was slightly greater for interpretations with printed film, and specificity was slightly greater for interpretations with soft copy. Interpretation with soft-copy display is therefore likely to be useful with digital mammography and is unlikely to significantly change accuracy or speed. Other challenges are the high costs associated with the required digital infrastructure, data storage and transmission, and developing support for this change from referring clinicians [12].
Despite these disadvantages, the long-term prospects of a filmless environment and soft-copy review seem to be inextricably linked to the future success of digital mammography. For soft-copy image interpretation to become an accepted replacement for screen-film mammography interpretation, its accuracy must be clearly established. Recent studies have found no compromise in the diagnostic accuracy of soft-copy digital mammography compared with that of screen-film mammography. Future enhancements, such as computer-aided detection and diagnosis, and new applications, such as telemammography, tomosynthesis, contrast subtraction, and dual-energy subtraction, could further improve diagnostic accuracy.
This study design has several limitations. The first is the relatively small number of cases analyzed, which made it difficult to establish statistical significance for the various digital systems; in particular, the number of cases from each machine type was insufficient for comparing the soft-copy performance of the different machine types. In addition, the soft-copy system used was not commercially available but rather a system that allowed display of images from any digital manufacturer. Another limitation is the exclusion of sensitivity from this study; including it would have given a more complete performance comparison. However, several articles have previously shown that the detection sensitivity of calcifications on digital mammography does not differ significantly from that on screen-film mammography [6, 7, 13-15]. A further limitation is that this study did not include the available compression and magnification views. However, because this limitation affected both techniques equally, it should not have altered our results. Finally, we did not study the effects of breast parenchymal pattern, patient age, lesion location, or exact interpretation time on our results. The role of these factors will need to be evaluated in future studies.
In conclusion, no statistically significant difference was shown in specificity achievable using soft-copy versus screen-film mammography in this study.
Acknowledgments
We thank the International Digital Mammography Development Group for their
provision of the cases used in this study.