Original Research
1 Breast Care Center, University of Wisconsin Medical School, 600 Highland Ave.,
Rm. G3/101, Madison, WI 53792-1804.
2 The University of Iowa Hospitals and Clinics, Department of Radiology, Iowa
City, IA.
3 Department of Statistics and Department of Biostatistics & Medical
Informatics, University of Wisconsin, Madison, WI.
Received June 17, 2004;
accepted after revision November 10, 2004.
E. S. Burnside is supported by the General Electric Research in Radiology
Academic Fellowship and the Judith Stitt Award sponsored by the Wisconsin
Women's Health Foundation.
Abstract
MATERIALS AND METHODS. We analyzed recall rate, cancer detection, minimal cancer detection, detection of low-stage cancer, and tumor size from consecutive screening mammography examinations from October 2001 to July 2003. The initial 7,984 mammograms were interpreted in the midst of a busy breast imaging practice. Although these studies were not read online, the interpretations were often interrupted for telephone calls, procedures, and diagnostic mammograms. The remaining 1,538 studies were interpreted after the institution of dedicated uninterrupted batch reading.
RESULTS. Recall rates were 20.1% before and 16.2% after the introduction of batch reading (p < 0.001). Cancer detection rates were not significantly different: 5.6 cancers were detected per 1,000 examinations without and 7.2 were detected per 1,000 with batch reading. Prognostic factors for breast cancers diagnosed between these groups also were not significantly different. Of the screening-detected cancers diagnosed before batch reading, minimal cancers comprised 67% and low-stage cancers accounted for 76%. Of the cancers diagnosed using batch reading, 73% were minimal and 91% were low stage. The mean size of cancers, 11.7 mm without batch reading and 9.1 mm with batch reading, also showed no statistically significant difference.
CONCLUSION. Our experience shows that batch reading can significantly reduce screening mammography recall rates without affecting the cancer detection rate or the proportion of cancers diagnosed with favorable prognostic indicators.
Interpretation techniques, such as correctly identifying summation artifacts and ignoring findings of doubtful significance, can decrease the number of recalls in a screening population [15]. Developing this expertise may be easier said than done and likely also relates to experience. Finally, practice routines, such as carefully comparing screening examinations to prior studies, can significantly reduce false-positive interpretations [16-19].
Because false-positive interpretations contribute significantly to patient anxiety and increased health care costs, investigators have attempted to determine the optimal level of recall [5, 20, 21]. Yankaskas et al. [22] proposed that a recall rate of between 4.9% and 5.5% realizes the best trade-off between sensitivity and positive predictive value and reported that above this level, there was no correlation between recall and cancer detection rates. In contrast, Gur et al. [23] found that among 10 trained radiologists recall rates did correlate with cancer detection rates at higher levels of recall (that is, above 10%). Despite the controversy, published data continue to document the difficulty of achieving a low recall rate in the United States while preserving a high cancer detection rate [7].
The Mammography Quality Standards Act (MQSA), established to standardize mammography interpretation in the United States, does not appear to have decreased false-positive interpretation rates significantly. Recently, authors and policy makers have considered the possibility of an accreditation procedure that would allow only radiologists with accuracy above a certain threshold to interpret mammography. Beam et al. [24] found that this type of proscriptive policy for screening mammography would severely limit access. Therefore, we set out to determine whether batch reading is another strategy that can be applied to improve recall rates.
Methods used to interpret screening mammograms can be broadly categorized as, first, dedicated batch reading; second, nonbatch reading offline; and third, nonbatch reading online. Dedicated batch reading requires an uninterrupted block of time designated to interpret a group of screening mammograms in succession. Nonbatch reading offline refers to interpreting screening mammograms in the midst of other duties such as diagnostic mammography or procedures after the patient has left the premises. Nonbatch reading online entails interpreting mammograms with similar interruptions while the patient waits for her results.
It seems intuitive that dedicated batch reading in a quiet, distraction-free environment would improve performance. This style of interpretation has been advocated by many experts, but adoption has been limited. In 1994, Houn and Brown [25] discovered in a survey of 1,057 facilities that only 20% used batch interpretation.
Two groups have previously studied the impact of different practice styles on the efficacy of screening. In both studies, performance parameters of screening mammograms interpreted when the patient was present (online) were compared to those interpreted after the patient had left the screening site (offline). The first group of investigators found that online interpretation resulted in higher recall rates without higher cancer detection [26]. The second study showed a higher recall rate and a higher positive predictive value of biopsy recommendation in those patients read online (Paramagul C et al., presented at the 2003 annual meeting of the American Roentgen Ray Society). Both groups recognized the possibility of selection bias, as those patients read online were self-selected. In contrast, our study was designed to answer an unexplored question: whether a transition from nonbatch to batch reading can improve performance in the context of all cases read "offline."
Before instituting an effort to improve practice, it is important to determine specific benchmarks for optimal performance. Practice guidelines for screening mammography are available from several sources including the American College of Radiology (ACR) and the Agency for Healthcare Research and Quality (AHRQ) [6, 27]. Importantly, the literature emphasizes that there must be a step-wise approach to improving performance. It is imperative to first document a sufficient cancer detection rate and prove that the cancers diagnosed at screening have favorable prognostic characteristics before attempting to decrease false-positive interpretations [28].
Facilities
Our breast imaging practice provides interpretation services to three
fixed-site mammography facilities. The first is a breast care center
integrated with our institution's Comprehensive Cancer Center. The breast care
center houses breast imaging and breast surgery and oncology clinics. At this
facility, a full range of breast imaging and interventional services is
provided including screening and diagnostic mammography, galactography,
sonography, breast MRI, percutaneous imaging-guided biopsy, and needle
localization. Both digital mammography equipment and analog mammography
equipment are used. Residents rotate through the breast care center site
monthly and preview as many screening mammograms as possible.
Multidisciplinary breast conferences are held at this site once a week. The
second and third sites are outreach clinics that perform only screening
examinations. One of these clinics performs only analog mammography, and the
other performs both digital and analog examinations.
At all sites, screening examinations are performed without radiologist oversight. The technologist obtains the routine screening views, checks them for technical quality, and then discharges the woman from the facility. The films are then given to the dedicated breast care center file room person. This individual makes a diligent attempt to retrieve previous examinations, which are used for comparison with the current study. The file room staff then hangs the films on a dedicated mammography alternator along with prior examinations (if available) to be interpreted by the radiologist.
In the first phase of our study, from October 1, 2001, to February 15, 2003, the radiologists used the nonbatch offline method of interpretation, interpreting screening examinations in free moments between other scheduled examinations that required radiologist oversight. No consistent dedicated time was available for interpreting the screening examinations. The interpretation of screening mammograms was routinely interrupted by telephone calls, diagnostic mammography, interventional procedures, and other activities throughout the day.
In the second phase of our study, from February 16, 2003, to July 30, 2003, dedicated time was provided for uninterrupted batch reading offline. In addition, the telephones in the reading room where screening examinations were interpreted were changed to lines for outgoing calls only. The staff members at the breast care center (where mammography interpretation occurred) understood and supported the program of distraction-free batch reading of screening mammography. After July 30, 2003, our practice adopted structured reporting. Because this change in practice may influence the recall rate and thus confound the effect of batch reading in isolation, the study was ended at this point.
Interpreting Radiologists
Five radiologists interpreting images from October 2001 to July 2003 were
included in this study. We included only radiologists who interpreted at least
100 screening cases before and after the introduction of batch reading. Five
radiologists who interpreted screening mammography during the time of the
study were excluded because they interpreted predominantly before or after the
introduction of batch reading and comparison of their performance between the
two sessions was not possible. All of the radiologists included were
board-certified (by the American Board of Radiology) and fulfilled the MQSA
requirements in terms of volume of studies interpreted per year and continuing
medical education. Two radiologists are fellowship-trained in breast imaging,
two practice predominantly in other subspecialties, and one is a general
radiologist.
Reading conditions before and after batch reading did not change. The same alternators were used for mammography interpretation and ambient light was meticulously minimized throughout the study. The same mammography equipment was used to perform screening mammography for the studies done before and after the adoption of batch reading. Rotating residents participated equally between the study groups. The interpretation of screening mammography was generally done in the morning because it was uniformly the radiologists' preference. Approximately 20-40 cases were evaluated per day, but fluctuations in volume were common in both groups.
Several changes in practice occurred during the time of this study including the adoption of digital mammography (May 2002) and computer-assisted detection (CAD) (analog in October 2002 and digital in April 2003). We performed subset analysis to control for these possible confounding variables.
Data Collection and Analysis
Data from these screening examinations were collected in a computerized database linked to a Web-based electronic medical record. The mammography report was dictated in free text, but the BI-RADS category was recorded in structured form to enable auditing of this information [30]. Screening examinations were considered abnormal if either breast was assessed as BI-RADS category 0 (incomplete, need additional imaging), category 4 (suspicious), or category 5 (highly suggestive of malignancy). For all screening mammograms ultimately requiring a tissue diagnosis, we routinely determined the outcome through our biopsy database or through the referring clinician if the patient was cared for outside our institution. For biopsies resulting in a diagnosis of malignancy (ductal carcinoma in situ or any invasive carcinoma), we also recorded the lesion size, axillary nodal status, and stage (based on the American Joint Committee on Cancer staging system) [31].
To measure performance, we recorded the number of low-stage cancers and minimal cancers in each group. Any cancer judged to be stage 0 or I was considered low stage. Ductal carcinoma in situ and invasive cancers less than 1 cm were considered minimal cancers. These data were aggregated into the desired timeframes and de-identified before analysis. Our institutional review board determined that this retrospective study was exempt from requiring informed consent. From these data, we determined outcomes, including abnormal interpretation rate (i.e., recall rate), cancer detection rate, the proportion of detected cancers that were low stage, the proportion that were minimal cancers, and tumor size, to judge the performance of our breast cancer screening program before and after batch reading began.
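The outcome definitions above can be expressed as a short Python sketch. This is an illustration only, not the software used in the study: the `Cancer` record, the function names, and the sample values in the usage note are hypothetical, and stage is assumed to be stored as a plain string such as "0", "I", or "II".

```python
from dataclasses import dataclass
from typing import List, Optional

# BI-RADS categories treated as an abnormal screening result in the text above.
ABNORMAL_BIRADS = {0, 4, 5}


@dataclass
class Cancer:
    """Hypothetical record for one screening-detected cancer."""
    is_dcis: bool                      # pure ductal carcinoma in situ
    invasive_size_mm: Optional[float]  # size of invasive tumor, None for DCIS
    stage: str                         # AJCC stage stored as "0", "I", "II", ...


def is_abnormal(birads_left: int, birads_right: int) -> bool:
    """An examination is abnormal if either breast is category 0, 4, or 5."""
    return birads_left in ABNORMAL_BIRADS or birads_right in ABNORMAL_BIRADS


def is_low_stage(cancer: Cancer) -> bool:
    """Stage 0 or I counts as low stage."""
    return cancer.stage in {"0", "I"}


def is_minimal(cancer: Cancer) -> bool:
    """DCIS, or an invasive cancer smaller than 1 cm (10 mm), counts as minimal."""
    if cancer.is_dcis:
        return True
    return cancer.invasive_size_mm is not None and cancer.invasive_size_mm < 10.0


def summarize(n_exams: int, n_abnormal: int, cancers: List[Cancer]) -> dict:
    """Compute the audit outcomes for one group of screening examinations."""
    n_cancers = len(cancers)
    return {
        "recall_rate_pct": 100.0 * n_abnormal / n_exams,
        "cancers_per_1000": 1000.0 * n_cancers / n_exams,
        "minimal_fraction": sum(is_minimal(c) for c in cancers) / n_cancers,
        "low_stage_fraction": sum(is_low_stage(c) for c in cancers) / n_cancers,
    }
```

For example, a made-up group of 1,000 examinations with 200 abnormal interpretations and four cancers (one DCIS, one 8-mm stage I, one 15-mm stage I, one 25-mm stage II) would give a recall rate of 20%, 4 cancers per 1,000, half minimal, and three quarters low stage.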
We used S-PLUS programming software (Insightful) for data calculations and statistical analysis. The Student's t test was used for comparison of data having a normal distribution. The chi-square test was used to compare proportional data. A p value of less than 0.05 was considered statistically significant.
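As an illustration of the chi-square comparison of proportions, the following Python sketch tests the pooled recall rates reported in this article. It is not the S-PLUS code used for the study. The recall counts are not given in the text, so they are approximated here by rounding the reported percentages applied to the reported group sizes (an assumption), and the statistic is the Pearson chi-square without continuity correction, which may differ slightly from the variant the original analysis used.

```python
import math


def chi_square_2x2(a: int, b: int, c: int, d: int):
    """Pearson chi-square test (no continuity correction) for the table
    [[a, b], [c, d]]. Returns (statistic, p_value) with 1 degree of freedom."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom, the upper-tail probability is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Group sizes and recall rates as reported; counts are APPROXIMATIONS
# reconstructed from the rounded percentages, not the study's raw data.
n_prebatch, n_batch = 7984, 1538
recalled_prebatch = round(n_prebatch * 0.201)
recalled_batch = round(n_batch * 0.162)

chi2, p = chi_square_2x2(
    recalled_prebatch, n_prebatch - recalled_prebatch,
    recalled_batch, n_batch - recalled_batch,
)
print(f"chi-square = {chi2:.2f}, p = {p:.5f}")
```

With these reconstructed counts the p value falls below 0.001, in line with the significance level reported for the pooled recall rates.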
When all cases were pooled, the recall rate was 20.1% before and 16.2% after the introduction of batch reading (p < 0.001). Neither prognostic factors for breast cancers diagnosed between these groups (Table 2) nor cancer detection rates (Table 3) were significantly different. Because pooling the results of multiple radiologists may lead to an erroneous interpretation of the data (i.e., volume differences between physicians with a high recall rate and those with a low recall rate between the prebatch and batch reading groups might influence our results), we also analyzed each radiologist's performance in Table 3. These numbers indicate that each physician's recall rate decreased after the institution of batch reading. In fact, the radiologists with fellowship training in breast imaging improved to a statistically significant degree. Further, to control for differential volumes between reviewers, we used the change in recall and cancer detection rates between prebatch and batch reading phases to test the significance of the differences in Table 4. This analysis, which weights each reviewer equally to eliminate the effect of volume, shows a statistically significant average improvement in recall rate of 3.8 percentage points after batch reading.
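One way to weight each reviewer equally is to compute a single before-minus-after difference per radiologist and test whether the mean of those differences is zero. The Python sketch below assumes a paired (one-sample) t test on the differences, which the text does not state explicitly, and it uses made-up recall rates for five readers, not the values from Table 3.

```python
import math
import statistics

# Two-sided 5% critical value of Student's t for 4 degrees of freedom
# (five readers), taken from standard tables.
T_CRIT_DF4 = 2.776


def paired_t(pre, post):
    """Return (mean difference, t statistic, degrees of freedom) for the
    per-reader differences pre[i] - post[i]."""
    diffs = [a - b for a, b in zip(pre, post)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    std_err = statistics.stdev(diffs) / math.sqrt(n)
    return mean_diff, mean_diff / std_err, n - 1


# HYPOTHETICAL recall rates (%) for five readers, for illustration only.
pre_rates = [22.0, 18.0, 25.0, 19.0, 16.0]
post_rates = [18.0, 16.0, 19.0, 16.0, 11.0]

mean_diff, t_stat, df = paired_t(pre_rates, post_rates)
significant = abs(t_stat) > T_CRIT_DF4
print(f"mean change = {mean_diff:.1f} points, t = {t_stat:.2f}, df = {df}")
```

Because every reader contributes exactly one difference, a high-volume reader cannot dominate the result the way he or she can in the pooled comparison.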
We also analyzed the cancer detection rate for each individual physician to avoid the pitfalls of pooling data. Table 3 also confirms that most (all but one) of the radiologists had an improved cancer detection rate after the institution of batch reading. Table 4 quantifies the average improvement as 2.1 cancers per 1,000 screening mammograms. Although this is not a statistically significant improvement, it confirms that the cancer detection rate did not decrease with the improvement in recall rate.
We evaluated the effect of batch reading on analog and digital screening mammography separately (Table 5). Of all the screening examinations performed at our institution, 76.3% were analog and 23.7% were digital. The improvement in recall rate for both analog and digital studies was statistically significant.
We found that the introduction of batch reading did in fact help each of the radiologists in our practice decrease their individual recall rates. This improvement was statistically significant for the group and for each of the two fellowship-trained breast imagers. Fellowship-trained breast imagers may be more profoundly affected by distractions in a setting where batch reading is not performed. The fellowship-trained breast imagers, it turns out, were trained using batch reading for screening examinations. When these breast imagers find themselves in a setting rife with distraction, false-positive rates increase. Of note, the fellowship-trained breast imagers maintained a high cancer detection rate throughout the study. By contrast, individuals accustomed to interpreting screening examinations in the midst of other clinical activities may benefit less from batch reading or the improvements may take longer. Verifying this theory of differential effects of batch reading based on training or prior experience will require further study with a larger patient population.
It is important to emphasize that we did not randomize patients between the study groups. For this reason, we showed that the patient populations were not significantly different in terms of age, family history of breast cancer, and available comparison in order to confirm that they were unbiased. We also must consider changes in our practice other than batch reading when weighing the validity of our results. Digital mammography was added to the clinical practice in May 2002. In theory, this change might have caused an increased recall rate initially in the prebatch group, with a subsequent decreased recall rate in the batch group as radiologists gain experience. Two facts mitigate the likelihood that the digital learning curve is responsible for our results. First, our analysis of analog mammography alone shows a statistically significant decrease in recall rate with batch reading. Second, a separate analysis of digital mammography before batch reading shows that recall rates did not start to markedly improve until batch reading began (Fig. 1). Though the coincident decrease in digital recall rate and the adoption of batch reading is suggestive of a causal effect, we cannot eliminate the possibility that a learning curve may have contributed to this improvement. Therefore, although we have shown a definite improvement in recall rate for analog mammography, the effect of batch reading on digital mammography needs further clarification. Hopefully, randomized prospective studies with larger samples of digital mammograms will elucidate the value of batch reading for this technique.
CAD was added in October 2002 for analog images and April 2003 for digital images. During the 4-month period between introduction of analog CAD and the introduction of batch reading, a recall rate of 19.8% was recorded, slightly less than the prior 12 months of the study. Therefore, CAD did not inflate the recall rate before institution of batch reading. In addition, it is highly unlikely that CAD played a role in decreasing the recall rate during these or subsequent months because available evidence clearly establishes that recall rates are increased or unaffected by CAD [32, 33]. The institution of digital CAD during the batch reading trial period would more likely increase recall rates in this group, thereby attenuating our results. Given these considerations, we believe that the changes in our practice outside the scope of the batch reading trial should not significantly alter our conclusions. It would be valuable to conduct either a randomized prospective or a controlled retrospective analysis, despite the logistic difficulties involved in such an undertaking. Such a trial could evaluate the effect batch reading has on the recall rates of radiologists of variable training and experience, of certain types of findings and not others, and of digital versus analog mammography.
Another challenge we faced was the small size of our batch reading cohort related to decreased volume and the short time frame of this phase of our study. The reason for the smaller number of studies in the batch reading group was twofold. First, an experienced physician was hired in February 2003 to interpret only screening examinations in an effort to decrease recall rates and to aid in staffing. This hire decreased the volume interpreted by the physicians included in the study. Second, we chose to terminate the study when structured reporting was instituted. The structured reporting software changed the way we collected patient history, thus making it impossible to confirm the similarity of the patient population between the prebatch and batch groups. In addition, because the effect of structured reporting on recall rate is unknown, we decided it was prudent to end our data collection at this point. Despite the small size of our batch reading cohort, we were able to show a statistically significant improvement in recall rate while maintaining our cancer detection rate. In fact, these results have continued beyond the discontinuation of our study, as shown in Figure 2. The recall rate has continued to drop for the physicians in the study and for the practice as a whole. Cancer detection rates have remained stable.
Finally, our recall rates are markedly higher than the guidelines set by the AHRQ and those achieved in other countries [7, 27]. This may limit the generalizability of our study to practices with significantly lower recall rates. We believe that recall rates between 10% and 20% are not rare in the United States because many articles published in peer-reviewed journals describe recall rates similar to ours [23, 33, 34]. Therefore, we believe our findings are valuable.
Although the advantages of batch reading include decreased false-positive interpretations without a decrease in cancers detected, there are some trade-offs inherent in instituting this type of program. One disadvantage of batch reading is a slight delay in the interpretation of screening mammography. Same-day interpretation is often not possible because the films must be batched and hung by designated personnel. In general, most would concur that a short delay in screening mammography, a nonurgent study, is well worth the benefit of improved accuracy. Another disadvantage of batch reading is that the breast imaging physician is not available at all times for consultation or interpretation of other studies. Given the shortage of subspecialty mammographers, this can be a significant sacrifice for patients and referring physicians. Given the results of this study, we believe optimizing screening mammography performance should be the priority.
Many obstacles to decreasing the recall rates from screening mammography still remain in this country. One such obstacle is the threat of legal liability for a failure to diagnose breast cancer, which may increase the rate of false-positive interpretations of screening mammograms. In this project, we have shown that batch reading in a quiet, uninterrupted manner can decrease this tendency without lowering the cancer detection rate. As opposed to other solutions, such as tort reform or a proscriptive system of accreditation, which are extremely difficult or impractical to impose, batch reading is a relatively easy solution to implement. In addition, given the evidence that recall rates likely correlate with cancer detection rates (i.e., as recall rate decreases so does cancer detection rate), a method that avoids this trade-off is extremely valuable [23]. Batch reading may allow reviewers to elevate their receiver operating characteristic curve rather than simply change the threshold at which they operate (i.e., improve both sensitivity and specificity without trading off between the two) [8, 23].
Our results indicate that batch reading decreases abnormal interpretation rates while maintaining cancer detection. Although the finding that batch reading improves performance may not be surprising, it has not been documented before. Despite the intuitive nature of our findings, batch reading is not uniformly used. We hope this evidence will prompt further study. We believe that, despite the pressures of modern practice, interpreting radiologists should be supported in adopting quiet, distraction-free batch reading for all screening mammograms.
Acknowledgments
We thank Edward A. Sickles, MD, whose mentorship on study design was an
invaluable resource. Thanks also to Tonia Feiner for assistance with database
queries.