1 Department of Radiology, Imaging Research, Ste. 4200, University of
Pittsburgh, and Magee-Womens Hospital, 300 Halket St., Pittsburgh, PA
15213.
2 Present address: Department of Radiology, Scott & White Clinic, 2401 S.
31st St., Temple, TX 76508.
Received April 29, 2002;
accepted after revision July 19, 2002.
Partially supported by grant CA85242 from the National Cancer Institute,
National Institutes of Health.
Abstract
MATERIALS AND METHODS. Eleven radiologists and one resident reviewed 128 cases three times: once without prior mammograms for comparison, once with mammograms from the most recent (1 year) examination, and once with mammograms acquired 2 years previously. They were asked to determine whether the patient should be recalled for additional procedures. Performances under the three conditions were compared.
RESULTS. Radiologists were significantly more accurate (p < 0.001) when comparison mammograms (obtained 1 or 2 years previously) were available. Sensitivity did not differ significantly between readings with mammograms from 1 year earlier and readings with mammograms from 2 years earlier (p > 0.10), but specificity did. Specificity using mammograms from the latest examination (obtained 1 year previously) as a reference was significantly better (p = 0.03) than specificity using mammograms obtained 2 years previously.
CONCLUSION. Comparison mammograms are important for accurate diagnosis, in particular for increasing specificity. The latest prior examination seems to be the optimal one for this purpose.
Introduction
In the past decade, a number of investigations into the need for and efficacy of different aspects of mammographic interpretation processes have been performed [6,7,8]. Alternatives include, but are not limited to, the use of one-view mammography, the use of technologists and other physician extenders in the diagnostic process, the use of computer-aided detection to increase sensitivity, and the viewing of previously acquired mammograms for comparison during the interpretation.
As periodic mammographic screening becomes more common, a larger fraction of the screened population has mammograms from previous years available for comparison. Several reports have addressed the effects of comparison mammograms on the diagnostic process; without exception, the authors found measurable benefits [9,10,11]. Those studies used mammograms that, in most cases, had been acquired during the latest examination, without specific restrictions on the time between the current and previous mammography. Consequently, the comparison mammograms may have been acquired anywhere from 1 to 4 years before the current examination. The need for high efficiency in many practices and the operational use of batch interpretations often make it impractical to compare mammograms from more than one previous examination, at least during the initial presentation. In many practices, preplacement (loading) of mammograms on a film multiviewer usually results, de facto, in images from a single reference examination being viewed. This leaves open the question of which reference examination is best for comparison when more than one is available. Anecdotal remarks about the possible use of mammograms obtained perhaps 2 years before the current examination are frequently heard in this field, but to our knowledge no previous study has addressed this question. We performed a preliminary three-mode observer performance study in which interpretation with no comparison mammograms available was compared with interpretation aided by comparison of the current mammograms with those from examinations performed 1 year or 2 years earlier.
Materials and Methods
An experienced mammographer reviewed this pool of candidate cases and excluded cases for which images were suboptimal because of technical reasons and cases that had one or more specific original images missing from the series. This process was followed by a review of images together with all supporting documentation (e.g., subsequent examinations, pathology reports). Cases that did not meet our selection criteria or for which verification could not be completed were eliminated from the study. In general, cases were excluded from the study if the comparison films were deemed to be of limited or no value to the final disposition of the case. An example of this type of case could be one with numerous benign-appearing cysts that were routinely and repeatedly checked using sonography and whose Breast Imaging Reporting and Data System (BI-RADS) [12] ratings fluctuated from year to year.
We reviewed all selected cases to ensure that the acquisition protocols and the choice of mammograms used did not change in a manner that could alter the results of the study for the periods in question. Changes, or lack thereof, between examinations were determined by evaluating the size and shape of masses and the number and pattern of microcalcifications in clusters.
For the purpose of this study, we did not perform objective quantitative assessments that required densitometric measurements, because the outcome was known (verified) for all cases and most had been recalled independently by a radiologist during the original interpretation. Because we were interested in a varied distribution of cases including confirmed benign and malignant masses and microcalcifications, a large number of mammograms that had been recommended for sonography were eliminated simply because of the high prevalence of such cases in the initial data set. The 128 series of examinations ultimately selected for our study included 30 biopsy-proven malignant findings and 28 biopsy-proven benign findings. In 56 cases, patients had been recalled for special views or sonography; we also included 14 cases as controls (seven interpreted as negative and seven interpreted as positive). Table 1 provides a summary of the distribution of cases used in the study.
The data set was randomized and divided into two subsets of 64 cases. Each set was placed on a film alternator three times with the appropriate prior mammograms (or lack thereof) for the specific mode to be interpreted. Because we wanted a large number of observers to participate in the study, we could not counterbalance modes by observers. Therefore, we used the following mode sequence in the study: 1, 0, 2 and 2, 0, 1. The first set of 64 cases was viewed by all observers using reference mammograms obtained 1 year previously; the second set, without previous mammograms for comparison; the third set, with mammograms obtained 2 years previously; the fourth set, with mammograms obtained 2 years previously; the fifth set, with no reference mammograms; and the sixth (last) set, with mammograms obtained 1 year previously. A minimum of 2 weeks was required between interpretation of one mode and placing of the next mode, but most observers did not view the same set of cases for approximately 2 months. Observers were informed what type of reference images they were being provided (1- or 2-year prior examinations).
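The reading schedule described above can be sketched in a few lines of Python. This is only an illustration of the stated mode sequence (1, 0, 2 and 2, 0, 1), not the software used in the study. The subset labels "A" and "B" and the assumption that the first three placements use one 64-case subset and the last three use the other are ours; the text does not state which subset was mounted at each placement.

```python
from collections import Counter

# Mode codes as used in the study.
MODE_LABELS = {
    0: "no prior mammograms",
    1: "prior mammograms from 1 year earlier",
    2: "prior mammograms from 2 years earlier",
}

# Mode sequence stated in the text for the six placements on the alternator.
MODE_SEQUENCE = [1, 0, 2, 2, 0, 1]

# ASSUMPTION (not stated in the text): placements 1-3 use subset "A",
# placements 4-6 use subset "B". Each subset holds 64 cases.
SUBSET_FOR_PLACEMENT = ["A", "A", "A", "B", "B", "B"]
CASES_PER_SUBSET = 64


def build_schedule():
    """Return one record per placement: its order, subset, and mode."""
    return [
        {"placement": i + 1, "subset": subset, "mode": mode}
        for i, (subset, mode) in enumerate(zip(SUBSET_FOR_PLACEMENT, MODE_SEQUENCE))
    ]


schedule = build_schedule()

# Number of placements per mode, and the resulting number of cases read per mode.
placements_per_mode = Counter(entry["mode"] for entry in schedule)
cases_per_mode = {
    mode: count * CASES_PER_SUBSET for mode, count in placements_per_mode.items()
}

# Modes seen by each subset, to confirm every subset is read under every mode.
modes_by_subset = {}
for entry in schedule:
    modes_by_subset.setdefault(entry["subset"], set()).add(entry["mode"])

for entry in schedule:
    print(entry["placement"], entry["subset"], MODE_LABELS[entry["mode"]])
```

Under this assumed assignment, each subset is read once under each mode, so every observer reads all 128 cases three times, which matches the design described in the text.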
Eleven board-certified radiologists and one resident were asked to participate in the study. Ten of the 11 routinely view mammograms (> 1000 cases per year each), and the other is an experienced observer who interpreted mammograms for years (albeit not currently). Each viewed the 128 cases (four views in each examination) three times and was asked to evaluate each breast (left or right) and decide whether he or she would recommend that the patient be recalled for additional procedures (binary decision). All questions were presented on a computerized scoring form, and the responses were entered (using a computer mouse) directly into a database designed specifically for this study. If the recommendation was for a recall, the observer was asked to answer secondary related questions that appeared on the screen, such as the reason for the recommendation for recall.
Observers were not restricted in the amount of time spent reviewing and reporting each case. The individualized sessions varied in duration, and most observers reviewed each mode in two or three sessions. Only after all observers had completed a mode was the next mode mounted on the viewer. The computer recorded the time each case was reviewed and reported. At the completion of the study, observers were asked to subjectively assess which mode (0, 1, or 2) was better for assessing the need for a recall. The overall accuracy by mode was computed, and the results were compared using the repeated measures logistic regression procedure.
Results
The percentage of cases that were accurately assessed in each group of breast images with a specific finding during the initial interpretation is shown in Table 3. For all groups, accuracy was consistently higher when reference mammograms (obtained either 1 or 2 years previously) were used than in mode 0 (no reference). To account for possible biases, we analyzed the data with respect to patient age by dividing the data set into two mutually exclusive subsets, with one group including all patients 60 years old or younger (77 cases) and the second group including all patients older than 60 years (51 cases). The results were not affected by patient age, and no interaction was found between patient age and mode. We then divided the data set into two groups based on whether the women had received hormone replacement therapy within the year before the "current" examination (61 cases) or had not (67 cases). No effect of hormone replacement therapy on accuracy was found, nor was there an interaction between hormone replacement therapy use and mode.
The average interpretation time over all observers was 47 ± 11 sec for mode 0, 59 ± 18 sec for mode 1, and 50 ± 13 sec for mode 2. Using the mammograms obtained 1 year previously as reference required more time per case than either of the other two modes (p < 0.05).
Results of the survey to determine observers' mode preferences are provided in Table 4. At the end of the interpreting experiment, observers were clearly divided in their preference for a specific reference examination, with most indicating a preference for the 2-year reference examination rather than the 1-year reference for the identification of several of the abnormalities listed. These subjective assessments are clearly not borne out by the objective measurements.
Discussion
Our results indicate that, despite a subjective preference for the 2-year examination (mode 2), radiologists performed better with the 1-year examination (mode 1). We emphasize that accuracy in this study is defined as the ability to correctly identify either cases that should have been recalled (or followed up) for additional procedures and that were ultimately determined to be positive, or those that should not have been recalled and ultimately were verified as negative. Therefore, accuracy is directly related to both positive and negative predictive values. The performance levels found in this study for specific findings, ranging from 37% to 60% in the case of positive interpretations (Table 3), are not unexpected in this set of subtle cases.
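To make the definition of accuracy above concrete, the following minimal sketch computes accuracy together with sensitivity, specificity, and the positive and negative predictive values from binary recall decisions and verified outcomes. The function name `recall_metrics` and the ten example cases are hypothetical and are not data from this study.

```python
def recall_metrics(decisions, truth):
    """Summarize binary recall decisions against verified outcomes.

    decisions: list of bool, True if the observer recommended recall.
    truth:     list of bool, True if the case was verified as positive.
    Returns a dict of proportions; a value is None when its denominator is 0.
    """
    if len(decisions) != len(truth):
        raise ValueError("decisions and truth must have the same length")

    tp = sum(1 for d, t in zip(decisions, truth) if d and t)          # correct recalls
    tn = sum(1 for d, t in zip(decisions, truth) if not d and not t)  # correct non-recalls
    fp = sum(1 for d, t in zip(decisions, truth) if d and not t)      # unnecessary recalls
    fn = sum(1 for d, t in zip(decisions, truth) if not d and t)      # missed positives

    def ratio(numerator, denominator):
        return numerator / denominator if denominator else None

    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


# HYPOTHETICAL example: 4 verified positive cases followed by 6 verified negatives.
example_truth = [True] * 4 + [False] * 6
example_decisions = [True, True, False, True, False, False, True, False, False, False]

example_metrics = recall_metrics(example_decisions, example_truth)
print(example_metrics)
```

In this made-up example, 3 of 4 positive cases are recalled and 5 of 6 negative cases are not, so accuracy is 8/10. Applying the same function separately to the decisions made under each mode would give the per-mode figures that the study compares, although the study itself tested differences with a repeated measures logistic regression.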
The reasons for the subjective preference for the 2-year examination as a reference are not clear, but perhaps change, if present, would be expected to be easier to detect and more accurately assessed when the interval between the two examinations is longer. Our objective evaluation did not support this speculation. Perhaps physical changes (such as body weight) during 2 years, potential hormonal effects (hormone replacement therapy), and the likelihood that some benign findings may also change in radiographic appearance over a period of 2 years result in the mammograms obtained 1 year previously being a better reference overall.
In an electronic environment, it may become practical to efficiently view images from more than one prior examination; however, this is not currently the case in most clinical settings.
We anticipate that the practice of comparing a current examination with only one prior examination will remain common for a while. On the basis of the preliminary results from our study, mammograms obtained 1 year previously, when available, should be the ones to use as a reference.