AJR 2003; 180:343-346
© American Roentgen Ray Society


Optimal Reference Mammography: A Comparison of Mammograms Obtained 1 and 2 Years Before the Present Examination

Jules H. Sumkin¹, Brenda L. Holbert¹,², Jennifer S. Herrmann¹, Christiane A. Hakim¹, Marie A. Ganott¹, William R. Poller¹, Ratan Shah¹, Lara A. Hardesty¹ and David Gur¹

¹ Department of Radiology, Imaging Research, Ste. 4200, University of Pittsburgh, and Magee-Womens Hospital, 300 Halket St., Pittsburgh, PA 15213.
² Present address: Department of Radiology, Scott & White Clinic, 2401 S. 31st St., Temple, TX 76508.

Received April 29, 2002; accepted after revision July 19, 2002.

 
Partially supported by grant CA85242 from the National Cancer Institute, National Institutes of Health.

Address correspondence to J. H. Sumkin.


Abstract
OBJECTIVE. We assessed and compared the benefit of using images acquired 1 year or 2 years previously during mammography interpretations.

MATERIALS AND METHODS. Eleven radiologists and one resident reviewed 128 cases three times: once without prior mammograms for comparison, once with mammograms from the most recent (1 year) examination, and once with mammograms acquired 2 years previously. They were asked to determine whether the patient should be recalled for additional procedures. Performances under the three conditions were compared.

RESULTS. Radiologists were significantly more accurate (p < 0.001) when comparison mammograms (obtained 1 or 2 years previously) were available. Sensitivity did not differ significantly between interpretations aided by mammograms obtained 1 year earlier and those aided by mammograms obtained 2 years earlier (p > 0.10), but specificity did. Specificity using mammograms from the latest examination (obtained 1 year previously) as a reference was significantly better (p = 0.03) than specificity using mammograms obtained 2 years previously.

CONCLUSION. Comparison mammograms are important for accurate diagnosis—in particular, for increasing specificity. The latest prior examination seems to be the optimal one for this purpose.


Introduction
Mammography is the most common and effective screening method used for the early detection of breast cancers [1, 2]. As more women comply with the current recommendations for annual screening of women older than 40 years, the total number of mammograms obtained each year is increasing [3]. In light of the large volume of mammographic examinations performed and the low yield of abnormalities in the screening environment, detection of subtle abnormalities (such as masses) surrounded by the complex normal anatomy is both difficult and time-consuming [4, 5]. Because the volume of mammograms to be interpreted is large and the reimbursement rate is low, a highly efficient interpretation process is needed.

In the past decade, a number of investigations into the need for and efficacy of different aspects of mammographic interpretation processes have been performed [6,7,8]. Alternatives include, but are not limited to, the use of one-view mammography, the use of technologists and other physician extenders in the diagnostic process, the use of computer-aided detection to increase sensitivity, and the viewing of previously acquired mammograms for comparison during the interpretation.

As periodic mammographic screening becomes more common, a larger fraction of the screened population has mammograms from previous years available for comparison. Several reports have addressed the effects of comparison mammograms on the diagnostic process; without exception, the authors found measurable benefits [9,10,11]. Their studies focused on using mammograms that, in most cases, had been acquired during the latest examination, without specific restrictions on the time between the current and previous mammography. Consequently, the comparison mammograms may have been acquired anywhere from 1 to 4 years before the current examination. The need for high efficiency in many practices and the operational use of batch interpretations often make it impractical to compare mammograms from more than one previous examination, at least during the initial presentation. In many practices, preplacement (loading) of mammograms on a film multiviewer usually results, de facto, in images from a single reference examination being viewed. This leaves open the question of which reference examination is the best to use for comparison purposes when more than one is available. Anecdotal suggestions that mammograms obtained perhaps 2 years before the current examination might serve as the reference are frequent in this field but, to our knowledge, no previous study has addressed this question. We performed a preliminary three-mode observer performance study in which interpretation with no comparison mammograms available was compared with interpretation aided by comparison of the current mammograms with those from examinations performed 1 year or 2 years earlier.


Materials and Methods
We selected a small subset of cases from a pool of approximately 30,000 annual mammograms performed at the Magee-Womens Hospital of Pittsburgh and affiliated clinics. From these, we selected approximately 350 cases that had at least three or, in most instances, four consecutive annual mammographic examinations and for which follow-up with additional procedures was recommended on the initial interpretation at the third year. A small number of these cases were ultimately biopsied. For patients who were recommended for follow-up and later were found to have negative findings on the basis of special views or sonography, mammography with negative findings the following year served as confirmation of the previous negative finding. Sixty similar subtle cases that were not recommended for follow-up during the third-year examination were selected as possible controls. These cases were later confirmed as being negative for cancer on the basis of mammography the following year. To enable an assessment of the differences between interpretation modes, most selected cases were quite subtle—namely, many changes were not obvious between the prior examination and the one considered "current" for purposes of our study.

An experienced mammographer reviewed this pool of candidate cases and excluded cases for which images were suboptimal because of technical reasons and cases that had one or more specific original images missing from the series. This process was followed by a review of images together with all supporting documentation (e.g., subsequent examinations, pathology reports). Cases that did not meet our selection criteria or for which verification could not be completed were eliminated from the study. In general, cases were excluded from the study if the comparison films were deemed to be of limited or no value to the final disposition of the case. An example of this type of case could be one with numerous benign-appearing cysts that were routinely and repeatedly checked using sonography and whose Breast Imaging Reporting and Data System (BI-RADS) [12] ratings fluctuated from year to year.

We reviewed all selected cases to ensure that the acquisition protocols and the choice of mammograms used did not change in a manner that could alter the results of the study for the periods in question. Changes, or lack thereof, between examinations were determined by evaluating the size and shape of masses and the number and pattern of microcalcifications in clusters.

For the purpose of this study, we did not perform objective quantitative assessments that required densitometric measurements, because the outcome was known (verified) for all cases and most had been recalled independently by a radiologist during the original interpretation. Because we were interested in a varied distribution of cases including confirmed benign and malignant masses and microcalcifications, a large number of mammograms that had been recommended for sonography were eliminated simply because of the high prevalence of such cases in the initial data set. The 128 series of examinations ultimately selected for our study included 30 biopsy-proven malignant findings and 28 biopsy-proven benign findings. In 56 cases, patients had been recalled for special views or sonography; we also included 14 cases as controls (seven interpreted as negative and seven interpreted as positive). Table 1 provides a summary of the distribution of cases used in the study.


TABLE 1 Mammography Case Distribution by Type of Original Recommendation for Follow-Up and Verified Outcome

 

The data set was randomized and divided into two subsets of 64 cases. Each subset was placed on a film alternator three times, once for each mode, with the appropriate prior mammograms (or lack thereof) for the mode to be interpreted. Because we wanted a large number of observers to participate in the study, we could not counterbalance modes by observers. Therefore, we used the following mode sequence in the study: 1, 0, 2 and 2, 0, 1. That is, all observers viewed the first placement of 64 cases using reference mammograms obtained 1 year previously; the second, without previous mammograms for comparison; the third, with mammograms obtained 2 years previously; the fourth, with mammograms obtained 2 years previously; the fifth, with no reference mammograms; and the sixth (last), with mammograms obtained 1 year previously. A minimum of 2 weeks was required between the interpretation of one mode and the mounting of the next, but most observers did not view the same set of cases for approximately 2 months. Observers were informed what type of reference images they were being provided (1- or 2-year prior examinations).
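
The reading schedule can be tallied in a few lines. The sketch below is illustrative only: the names `PLACEMENTS`, `CASES_PER_SUBSET`, and `case_readings_per_mode` are hypothetical, and because the text does not state which 64-case subset was mounted at each step, the code models only the order of the modes.

```python
from collections import Counter

# Mode at each of the six film-alternator placements, in viewing order.
# 0 = no prior mammograms, 1 = 1-year prior, 2 = 2-year prior.
PLACEMENTS = [1, 0, 2, 2, 0, 1]
CASES_PER_SUBSET = 64  # the 128 cases were split into two subsets of 64


def case_readings_per_mode(placements, cases_per_placement):
    """Return how many case readings each observer performs in each mode."""
    counts = Counter(placements)
    return {mode: n * cases_per_placement for mode, n in sorted(counts.items())}


per_mode = case_readings_per_mode(PLACEMENTS, CASES_PER_SUBSET)
print(per_mode)                # {0: 128, 1: 128, 2: 128}
print(sum(per_mode.values()))  # 384 case readings per observer
```

Each mode appears in exactly two placements, so every observer reads 128 cases in each mode, which is what allows the three modes to be compared on the same case set.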

Eleven board-certified radiologists and one resident were asked to participate in the study. Ten of the 11 routinely view mammograms (> 1000 cases per year each), and the other is an experienced observer who interpreted mammograms for years (albeit not currently). Each viewed the 128 cases (four views in each examination) three times and was asked to evaluate each breast (left or right) and decide whether he or she would recommend that the patient be recalled for additional procedures (binary decision). All questions were presented on a computerized scoring form, and the responses were entered (using a computer mouse) directly into a database designed specifically for this study. If the recommendation was for a recall, the observer was asked to answer secondary related questions that appeared on the screen, such as the reason for the recommendation for recall.

Observers were not restricted in the amount of time spent reviewing and reporting each case. The individualized sessions varied in duration, and most observers reviewed each mode in two or three sessions. Only after all observers had completed a mode was the next mode mounted on the viewer. The computer recorded the time each case was reviewed and reported. At the completion of the study, observers were asked to subjectively assess which mode (0, 1, or 2) was better for assessing the need for a recall. The overall accuracy by mode was computed, and the results were compared using the repeated measures logistic regression procedure.


Results
We analyzed the data by attributing an accurate recommendation for a recall for the appropriate breast (analysis was performed by individual breast rather than case) if a positive case (proven cancer) was recalled or a later verified negative was not recalled. Table 2 provides the fraction (percentage) of accurate recommendations for each mode for all cases and observers. The analysis was repeated for positive cases (sensitivity) and negative cases (specificity). Repeated measures logistic regression clearly shows significant differences (p < 0.001) among the three modes. Sensitivity is not affected significantly by the availability of prior mammograms for comparison (p > 0.10), but specificity is significantly affected (p < 0.001). When pairs of modes are compared using the same methodology, either mode 1 or mode 2 is significantly different from mode 0 (p < 0.001), and mode 1 shows borderline significance when compared with mode 2 (p = 0.06). In all paired comparisons, including the comparison of modes 1 and 2, the results are similar to the overall results in that sensitivity is not significantly affected (p > 0.15) but specificity is (p < 0.05).
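
The scoring rule can be made concrete with a short sketch. The code below is a minimal illustration and not the analysis software used in the study: the `BreastReading` record, the `score` function, and the toy readings are hypothetical, and the code computes only the raw fractions, not the repeated measures logistic regression.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BreastReading:
    """One observer's decision for one breast (hypothetical record layout)."""
    recalled: bool   # observer recommended recall for additional procedures
    positive: bool   # verified outcome: proven cancer in this breast


def score(readings):
    """Return (accuracy, sensitivity, specificity) as fractions.

    A reading is accurate when a positive breast is recalled or a
    verified-negative breast is not recalled.
    """
    tp = sum(r.recalled and r.positive for r in readings)
    tn = sum(not r.recalled and not r.positive for r in readings)
    n_pos = sum(r.positive for r in readings)
    n_neg = len(readings) - n_pos
    accuracy = (tp + tn) / len(readings)
    sensitivity = tp / n_pos if n_pos else float("nan")
    specificity = tn / n_neg if n_neg else float("nan")
    return accuracy, sensitivity, specificity


# Made-up readings for illustration only; these are not study data.
toy = [
    BreastReading(recalled=True, positive=True),
    BreastReading(recalled=False, positive=True),
    BreastReading(recalled=False, positive=False),
    BreastReading(recalled=True, positive=False),
    BreastReading(recalled=False, positive=False),
]
print(score(toy))  # (0.6, 0.5, 0.6666666666666666)
```

When negative breasts outnumber positive ones, as in a case set with 30 proven cancers among 128 cases, overall accuracy computed this way is weighted toward specificity.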


TABLE 2 Average Performance for All Observers in Correctly Recommending or Not Recommending a Specific Breast for Additional Procedures

 

The percentage of cases that were accurately assessed in each group of breast images with a specific finding during the initial interpretation is shown in Table 3. For all groups, the overall results have consistently higher accuracy when reference cases are used (obtained either 1 or 2 years previously) as compared with mode 0 (no reference). To account for possible biases, we analyzed the data with respect to patient age by dividing the data set into two mutually exclusive subsets, with one group including all patients 60 years old or younger (77 cases) and the second group including all patients older than 60 years (51 cases). The results were not affected by patient age, and no interaction was found between patient age and mode. We then divided the data set into two groups based on whether the women had received hormone replacement therapy within the last year before the "current" examination (61 cases) or they had not (67 cases). No effect of hormone replacement therapy on accuracy was found, nor was there an interaction between hormone replacement therapy use and mode.


TABLE 3 Observer Accuracy in Rating Cases with Specific Findings for Each Breast by Mode of Observation

 

The average interpretation time over all observers was 47 ± 11 sec for mode 0, 59 ± 18 sec for mode 1, and 50 ± 13 sec for mode 2. Using the mammograms obtained 1 year previously as reference required more time per case than either of the other two modes (p < 0.05).
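
For a sense of scale, the reported mean times can be converted into differences over a full reading of the case set. The sketch below uses only the means quoted above; the function name and the choice of 128 cases as the unit are assumptions made for illustration, and the standard deviations are ignored.

```python
# Mean interpretation time per case in seconds, as reported for each mode.
MEAN_SECONDS = {0: 47, 1: 59, 2: 50}
N_CASES = 128  # cases read in each mode by each observer


def extra_minutes(mode, baseline, n_cases=N_CASES):
    """Extra reading time, in minutes, of `mode` over `baseline` for n_cases."""
    return (MEAN_SECONDS[mode] - MEAN_SECONDS[baseline]) * n_cases / 60


print(extra_minutes(1, 0))  # 25.6 minutes more than reading without priors
print(extra_minutes(1, 2))  # 19.2 minutes more than with 2-year priors
```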

Results of the survey to determine observers' mode preferences are provided in Table 4. At the end of the interpreting experiment, observers were clearly divided in their preference for a specific reference examination, with most indicating a preference for the 2-year reference examination rather than the 1-year reference for the identification of several of the abnormalities listed. These subjective assessments are clearly not borne out by the objective measurements.


TABLE 4 Radiologists' Preference for Specific Observation Mode by Task

 


Discussion
The availability of comparison (reference) mammograms during the interpretation of mammographic examinations is recognized as an important issue [11]. In our study, the comparison examinations contributed to improved specificity rather than sensitivity, thus highlighting the need to use comparison studies to maintain low recall rates. This finding is in agreement with most previous studies in which the reference examination, regardless of the interval between examinations, was shown to be most useful in establishing that a suspicious area actually has negative findings. Despite the significance attributed to comparing current with previously acquired mammograms, no prior studies, to our knowledge, have been performed to assess what would constitute an optimal reference. The current practice of using the latest available mammograms as the reference is perhaps the result of convenience more than any systematic assessment. We attempted to evaluate which reference examination should be used as the initial one during the interpretation of mammograms. Although interpretation without a reference examination was included in the study (mode 0), the primary question was not whether the availability (or lack thereof) of such comparison is important, but rather whether a reference from an examination obtained 1 year previously is more appropriate than that obtained 2 years previously. Recognizing that much of the value of the reference is related to the task of confirming that a specific feature is stable (and hence that the finding is benign), it is not clear which reference examination can best serve for the assessment of a significant change or lack thereof.

Our results indicate that, despite a subjective preference for the 2-year examination (mode 2), radiologists performed better with the 1-year examination (mode 1). We emphasize that accuracy in this study is defined as the ability to correctly identify either cases that should have been recalled (or followed up) for additional procedures and that were ultimately determined to be positive, or those that should not have been recalled and ultimately were verified as negative. Therefore, accuracy is directly related to both positive and negative predictive values. The performance levels found in this study for specific findings, ranging from 37% to 60% in the case of positive interpretations (Table 3), are not unexpected in this set of subtle cases.

The reasons for the subjective preference for the 2-year examination as a reference are not clear, but perhaps change, if present, would be expected to be easier to detect and more accurately assessed when the interval between the two examinations is longer. Our objective evaluation did not support this expectation. Perhaps physical changes (such as body weight) during 2 years, potential hormonal effects (hormone replacement therapy), and the likelihood that some benign findings also change in radiographic appearance over a period of 2 years make the mammograms obtained 1 year previously a better reference overall.

In an electronic environment, it may become practical to efficiently view images from more than one prior examination; however, this is not currently the case in most clinical settings.

We anticipate that the practice of comparing a current examination with only one prior examination will remain common for a while. On the basis of the preliminary results from our study, mammograms obtained 1 year previously, when available, should be the ones to use as a reference.


References

  1. Miller AB. Mammography: reviewing the evidence—epidemiology aspects. Can Fam Physician 1993;39:85-90
  2. Smith RA. Breast cancer screening among women younger than age 50: a current assessment of the issues. CA Cancer J Clin 2000;50:312-336
  3. Feig SA, D'Orsi CJ, Hendrick RE, et al. American College of Radiology guidelines for breast cancer screening. AJR 1998;171:29-33
  4. Bird RE, Wallace TW, Yankaskas BC. Analysis of cancers missed at screening mammography. Radiology 1992;184:613-617
  5. Thurfjell EL, Lernevall KA, Taube AS. Benefit of independent double reading in a population-based mammography screening program. Radiology 1994;191:241-244
  6. Sickles EA. Findings at mammographic screening on only one standard projection: outcomes analysis. Radiology 1998;208:471-475
  7. Tonita JM, Hillis JP, Lim CH. Medical radiologic technologist review: effects on a population-based breast cancer screening program. Radiology 1999;211:529-533
  8. Freer TW, Ulissey MJ. Screening mammography with computer-aided detection: prospective study of 12,860 patients in a community breast center. Radiology 2001;220:781-786
  9. Thurfjell MG, Vitak B, Azavedo E, Svane G, Thurfjell E. Effect of sensitivity and specificity of mammography screening with or without comparison of old films. Acta Radiol 2000;41:52-56
  10. Callaway MP, Boggis CR, Astley SA, Hutt I. The influence of previous films on screening mammographic interpretation and detection of breast carcinoma. Clin Radiol 1997;52:527-529
  11. Bassett LW, Shayestehfar B, Hirbawi I. Obtaining previous mammograms for comparison: usefulness and costs. AJR 1994;163:1083-1086
  12. American College of Radiology. Breast imaging reporting and data system (BI-RADS), 3rd ed. Reston, VA: American College of Radiology, 1998



This article has been cited by other articles:


L. Hadjiiski, B. Sahiner, M. A. Helvie, H.-P. Chan, M. A. Roubidoux, C. Paramagul, C. Blane, N. Petrick, J. Bailey, K. Klein, et al.
Breast Masses: Computer-aided Diagnosis with Serial Mammograms
Radiology, August 1, 2006; 240(2): 343-356.

D. B. Kopans, J. H. Sumkin, and D. Gur
Older Is Better
Am. J. Roentgenol., August 1, 2003; 181(2): 593-594.

F. M. Hall, J. H. Sumkin, and D. Gur
Optimal Interval for Comparison Mammograms
Am. J. Roentgenol., August 1, 2003; 181(2): 594.

R. J. Brenner, J. H. Sumkin, and D. Gur
Prior Mammograms: How Old Is Old?
Am. J. Roentgenol., August 1, 2003; 181(2): 594-595.

