Original Research
1 Department of Family and Community Medicine, University of California, Davis,
4860 Y St., Ste 2300, Sacramento, CA 95817.
2 Disease Control and Vector Biology Unit, Department of Infectious &
Tropical Diseases, London School of Hygiene & Tropical Medicine, London,
England.
3 Department of Family & Community Medicine, Dartmouth Medical School,
Lebanon, NH.
4 Department of Biostatistics, University of Alabama at Birmingham, Birmingham,
AL.
5 Breast Imaging Center, The Emory Clinic, Atlanta, GA.
6 Department of Radiology, University of California, San Francisco, San
Francisco, CA.
7 Division of General Internal Medicine, University of Washington, Harborview
Medical Center, Seattle, WA.
8 Group Health Cooperative, Center for Health Studies, Seattle, WA.
9 Present address: Applied Research Program, National Cancer Institute,
Bethesda, MD.
10 Cancer Research and Biostatistics, Seattle, WA.
11 Northwestern Memorial Hospital, Lynn Sage Breast Cancer Center, Chicago,
IL.
Received March 15, 2005;
accepted after revision May 25, 2005.
Supported by the Agency for Healthcare Research and Quality and the National
Cancer Institute (grants HS10591, U01 CA63731, 5 U01 CA 63736-09, and 1 U01
CA86082-01).
Abstract
SUBJECTS AND METHODS. Radiologists (n = 105) who routinely interpret screening mammograms in three states (Washington, Colorado, and New Hampshire) completed a mailed survey in 2001. Radiologists were asked to estimate how frequently they recommended additional diagnostic testing after screening mammography and the positive predictive value of their recommendations for biopsy (PPV2). We then used outcomes from 336,128 screening mammography examinations interpreted by the radiologists from 1998 to 2001 to ascertain their true rates of recommendations for diagnostic testing and PPV2.
RESULTS. Radiologists' self-reported rate of recommending immediate additional imaging (11.1%) exceeded their actual rate (9.1%) (mean difference, 1.9%; 95% confidence interval [CI], 0.9-3.0%). The mean self-reported rate of recommending short-interval follow-up was 6.2%; the true rate was 1.8% (mean difference, 4.3%; 95% CI, 3.6-5.1%). Similarly, the mean self-reported and true rates of recommending immediate biopsy or surgical evaluation were 3.2% and 0.6%, respectively (mean difference, 2.6%; 95% CI, 1.8-3.4%). Conversely, radiologists' mean self-reported PPV2 (18.3%) was significantly less than their mean true PPV2 (27.6%) (mean difference, -9.3%; 95% CI, -12.4% to -6.2%).
CONCLUSION. Despite regular performance feedback, community radiologists may overestimate their true rates of recommending further evaluation after screening mammography and underestimate their true positive predictive value.
Keywords: breast cancer, breast imaging, mammography
Although most clinicians receive little or no feedback regarding their clinical performance, many radiologists who interpret screening mammograms receive regular feedback regarding their interpretive performance. Enforced by the U.S. Food and Drug Administration (FDA), the federal Mammography Quality Standards Act (MQSA) of 1992 requires mammography facilities to collect data on cancer outcomes of women who receive recommendations for biopsy from facility radiologists [4]. The explicit goal of the act's audit requirements is to assist facilities in quality assurance and improvement efforts, although facilities are not obligated to use collected data for these purposes. Nevertheless, the FDA encourages facilities to monitor a range of mammography outcomes and to communicate audit results to radiologists [4]. Many facilities now deliver regular performance feedback to mammographers, including common metrics such as positive predictive value ("biopsy yield") and the overall proportion of women recalled for additional imaging and evaluation ("recall rate").
In 1999, the FDA began enforcing audits at the level of the individual radiologist, but little is known about how radiologists use and interpret performance feedback. We sought to determine whether mammographers who have received regular feedback about biopsy yield and recall rate can accurately estimate their true performance in these domains.
Subjects and Methods
Radiologist Survey
A committee of mammography experts and community radiologists developed a
mail survey instrument, which included questions regarding radiologist
demographics, experience in mammography, and frequency of various mammography
recommendations. Sequential revision of the survey instrument was guided by
extensive pilot testing with community mammographers. One three-part question
asked radiologists to estimate the percentage of screening mammography
examinations they interpreted for which they recommended immediate additional
imaging (e.g., sonogram, diagnostic mammogram views); short-interval follow-up
(i.e., follow-up mammogram in 3-6 months); or immediate biopsy or surgical
evaluation. The subsequent question asked respondents to estimate the positive
predictive value of their biopsy recommendations: "Among women whose
screening mammograms you recall for additional workup and then recommend for
biopsy, what percent do you think turn out to have breast cancer within one
year of the screening mammogram?"
Surveys were mailed with informed consent materials, and response was encouraged by telephone follow-up when necessary. Completed survey data were double-entered into a relational database at each site.
Mammography Data
Radiologist survey data were linked with computerized mammography data for
bilateral screening mammograms interpreted by responding radiologists from
1998 to 2001. Included mammograms were designated "routine
screening" by the interpreting radiologists and were performed on women
older than 40 years without a history of breast cancer and without breast
implants. Individual mammogram records contained the date of examination and
the BI-RADS assessment and recommendations
[6]. Within consortium
facilities, radiologists recorded BI-RADS assessment separately and
independently from their follow-up recommendation
[7]. Thus, in addition to a
BI-RADS assessment category, each mammogram includes one of the following
recommendations: normal interval follow-up, immediate additional imaging,
short-interval follow-up, or immediate biopsy or surgical evaluation. Breast
cancer outcomes within each registry are ascertained by regular linkage with
regional cancer registries. After encrypting identifiers, mammography data
were sent via file transfer protocol for central data analysis in Seattle,
WA.
Definitions of Actual Recommendation Rates and Positive Predictive Value
For each radiologist, we determined the proportion of screening mammograms
with the following recommendations (1998-2001): immediate additional imaging,
short-interval follow-up, and immediate biopsy or surgical evaluation. Because
our survey question asked radiologists about the positive predictive value of
their biopsy recommendations (PPV2), we defined a screening
mammogram as positive if it contained a recommendation for immediate biopsy
and had a BI-RADS assessment of 3, 0, 4, or 5. Although a departure from
BI-RADS, radiologists occasionally recommend biopsy alongside a BI-RADS
assessment of 3 or 0 [7], so we
initially included mammograms with these BI-RADS assessments. We calculated
the positive predictive value of a biopsy recommendation (PPV2) for
each radiologist from 1998 to 2001 as the proportion of women who were
diagnosed with breast cancer (including ductal carcinoma in situ) within 1
year of an initial positive screening mammogram. We obtained essentially
identical PPV2 estimates after including only mammograms with
biopsy recommendations and a BI-RADS assessment of 4 or 5, so we report here
the results of the initial calculation.
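The recommendation rates and PPV2 defined above reduce to simple per-radiologist counts. The following Python sketch is a minimal illustration under stated assumptions, not the consortium's actual code: the `Screen` record, its field names, and the recommendation labels are hypothetical stand-ins for the registry data. The `positive_birads` parameter allows the restriction to BI-RADS 4 or 5 used in the sensitivity check.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Screen:
    """One screening mammogram (hypothetical record layout)."""
    radiologist: str
    birads: str              # BI-RADS assessment category: "0", "1", ..., "5"
    recommendation: str      # "normal", "additional_imaging", "short_interval", or "biopsy"
    cancer_within_1yr: bool  # breast cancer (including DCIS) diagnosed within 1 year


def radiologist_metrics(screens, positive_birads=frozenset({"0", "3", "4", "5"})):
    """Per-radiologist recommendation rates and PPV2.

    A screen counts as positive for PPV2 when it carries a biopsy
    recommendation and its BI-RADS category is in `positive_birads`.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for s in screens:
        c = counts[s.radiologist]
        c["screens"] += 1
        c[s.recommendation] += 1
        if s.recommendation == "biopsy" and s.birads in positive_birads:
            c["positive"] += 1
            c["cancers"] += int(s.cancer_within_1yr)

    metrics = {}
    for rad, c in counts.items():
        n = c["screens"]
        metrics[rad] = {
            "additional_imaging_rate": c["additional_imaging"] / n,
            "short_interval_rate": c["short_interval"] / n,
            "biopsy_rate": c["biopsy"] / n,
            "n_positive": c["positive"],
            # PPV2 is undefined for a radiologist with no positive screens.
            "ppv2": c["cancers"] / c["positive"] if c["positive"] else None,
        }
    return metrics
```

For example, a radiologist whose four screens comprise one additional-imaging recall, one normal examination, and two biopsy recommendations (one followed by cancer) would have an additional-imaging rate of 25%, a biopsy recommendation rate of 50%, and a PPV2 of 50%.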
Data Analyses
We first performed analyses to check for time trends in
recommendation rates or PPV2 during the study period. Generalized
estimating equations were used to examine the association between recall rate
(the probability of a BI-RADS assessment of 0, 4, 5, or 3 with a
recommendation for immediate follow-up) and screening year
[8]. The association between
PPV2 and year was investigated by selecting screenings with a
BI-RADS assessment of 3, 0, 4, or 5 with biopsy recommendation and fitting a
similar model in which the outcome was the probability of a cancer diagnosis
during follow-up. Both models included the year of screening mammogram as the
main covariate of interest (1998, 1999, 2000, or 2001), adjusted for
mammography registry, and accounted for the correlation within a radiologist
using an independence working correlation structure. We found no statistically
significant association between study year and either recall rate or
PPV2.
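With an independence working correlation, the GEE point estimates coincide with ordinary logistic regression; what the GEE adds is a cluster-robust (sandwich) variance that accounts for screens read by the same radiologist. The stdlib-only Python sketch below illustrates that idea under simplifying assumptions that differ from the model described above: year enters as a single linear term (years since the first study year) rather than as four categories, and the registry adjustment is omitted. The function and argument names are hypothetical.

```python
import math
from collections import defaultdict


def _score_and_info(rows, b0, b1):
    """Score vector, information matrix, and per-cluster scores at (b0, b1)."""
    u = [0.0, 0.0]
    a = [[0.0, 0.0], [0.0, 0.0]]
    cluster_u = defaultdict(lambda: [0.0, 0.0])
    for cluster, t, y in rows:
        x = (1.0, t)
        mu = 1.0 / (1.0 + math.exp(-(b0 + b1 * t)))
        w = mu * (1.0 - mu)
        for j in range(2):
            u[j] += x[j] * (y - mu)
            cluster_u[cluster][j] += x[j] * (y - mu)
            for k in range(2):
                a[j][k] += w * x[j] * x[k]
    return u, a, cluster_u


def _inv2(m):
    """Inverse of a 2x2 matrix."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det], [-m[1][0] / det, m[0][0] / det]]


def gee_logit_trend(rows, iters=25):
    """Logistic GEE with independence working correlation and a linear year term.

    rows: (radiologist_id, t, y), t = years since the first study year,
    y = 1 if the event (e.g., a recall) occurred, else 0.
    Returns (b0, b1, robust SE of b1, two-sided Wald p-value for b1).
    """
    rows = list(rows)
    beta = [0.0, 0.0]
    for _ in range(iters):  # Newton-Raphson on the independence estimating equations
        u, a, _ = _score_and_info(rows, *beta)
        ai = _inv2(a)
        beta = [beta[j] + ai[j][0] * u[0] + ai[j][1] * u[1] for j in range(2)]

    _, a, cluster_u = _score_and_info(rows, *beta)
    ai = _inv2(a)
    # Sandwich variance A^-1 B A^-1, with B summed over radiologists (clusters).
    b = [[sum(s[j] * s[k] for s in cluster_u.values()) for k in range(2)]
         for j in range(2)]
    var_b1 = sum(ai[1][p] * b[p][q] * ai[q][1] for p in range(2) for q in range(2))
    se_b1 = math.sqrt(var_b1)
    z = beta[1] / se_b1
    return beta[0], beta[1], se_b1, math.erfc(abs(z) / math.sqrt(2.0))
```

A slope near zero with a large p-value corresponds to the "no statistically significant association with study year" finding; in practice a fitted GEE routine from a statistics package would be used instead of hand-rolled code.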
Next, we computed the mean differences (and 95% confidence intervals [CIs]) between radiologists' self-reported and actual rates of each recommendation and PPV2. Mean recommendation rates were weighted by the number of screening mammograms interpreted by each radiologist during the study period; mean PPV2 values were weighted by the total number of positive mammograms with biopsy recommendations for each radiologist. We used weighted means so that results would not be unduly affected by radiologists who made relatively few recommendations during the study period. General linear models were used to examine whether these mean differences were associated with radiologist characteristics (e.g., demographics, academic affiliation, breast imaging experience). Finally, we tested for statistically significant correlation between radiologists' self-reported and actual recommendation rates and PPV2. Statistical tests were two-sided with an alpha level of 0.05.
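The weighting scheme and the correlation test described above can be sketched in a few lines of Python. This is a minimal illustration, not the study's analysis code: it assumes an unweighted Pearson correlation (the text does not say whether the correlations were weighted), and it omits the confidence intervals because their method is not described. The function names are hypothetical.

```python
import math


def weighted_mean_difference(self_reported, actual, weights):
    """Weighted mean of (self-reported - actual) across radiologists.

    weights: screens interpreted (for recommendation rates) or positive
    screens with biopsy recommendations (for PPV2).
    """
    total = sum(weights)
    return sum(w * (s - a) for s, a, w in zip(self_reported, actual, weights)) / total


def pearson_r(x, y):
    """Unweighted Pearson correlation between perceived and actual values."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```

For example, two hypothetical radiologists with self-reported versus actual rates of 10% vs 8% and 12% vs 11%, who read 3,000 and 1,000 screens respectively, give a weighted mean difference of 1.75 percentage points, compared with 1.5 points unweighted; the higher-volume reader dominates, which is the intended effect of the weighting.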
Results
Actual Versus Perceived Performance
Radiologists' mean perceived rate of recommending immediate additional
imaging (11.1%) was slightly higher than their actual mean rate (9.1%) (mean
difference, 1.9%; 95% CI, 0.9%-3.0%) (Table
2). However, radiologists' mean perceived rate of recommending
short-interval follow-up was more than three times the actual rate (6.2% vs
1.8%), and their mean perceived rate of recommending immediate biopsy or
surgical evaluation was more than five times the actual rate (3.2% vs 0.6%)
(Table 2). In contrast, the
mean perceived PPV2 (18.3%) was significantly less than the actual
PPV2 (27.6%) (mean difference, -9.3%; 95% CI, -12.4% to -6.2%).
In general, radiologists overestimated their rates of recommending further evaluation and underestimated their PPV2 regardless of their demographic characteristics, full-time versus part-time status, academic affiliation, experience in breast imaging, or recent volume of mammography (data not shown). In bivariate analyses, radiologists with a primary academic affiliation, those who were fellowship trained in breast imaging, and those who interpreted a lower volume of mammograms overestimated their rate of recommending immediate additional imaging to a significantly greater degree (p < 0.05) (data not shown).
A moderate correlation was found between radiologists' perceived and actual rates of recommendations for immediate additional imaging (r = 0.36, p < 0.001; Fig. 1). Radiologists' perceived and actual rates of recommendations for short-interval follow-up were similarly correlated (r = 0.41, p < 0.001). Perceived and actual rates of recommendation for immediate biopsy or surgical evaluation were only weakly correlated, and the correlation was not statistically significant (r = 0.17, p = 0.09). No significant correlation was found between radiologists' perceived and actual PPV2 (r = 0.10, p = 0.31; Fig. 2).
Discussion
One explanation for our findings is that most radiologists simply do not review or remember the results from their past outcome audits. Although little is known about how radiologists use data from outcome audits to modify their interpretive practice, studies in other clinical settings suggest that feedback has the greatest effect on the minority of clinicians who deviate substantially from the practice norm and has a comparatively small effect on most clinicians who may view themselves as within the norm [9]. Similarly, most radiologists might judge from audit reports whether their interpretive performance is within the norm, and if so, they may quickly forget their audit results. If radiologists use audit reports in this manner, the principal effect of the MQSA audit requirements may be the encouragement of normative interpretive behavior among U.S. mammographers.
Radiologists overestimated the frequency with which they recommend further evaluation after screening mammography and underestimated their PPV2. Together, these findings suggest that radiologists in the study tended to believe their false-positive rate is higher than it actually is; in other words, they tended to underestimate their specificity. Why might U.S. mammographers overestimate their true false-positive rate? Recall rates are known to be higher in the United States than in screening programs in other countries, which may be partly attributable to fears of malpractice among U.S. mammographers [10, 11]. Media reports [12, 13] have also emphasized that the relatively high U.S. recall rate has not substantially increased the cancer detection rate compared with screening programs in other countries. Mammographers in our sample may have developed an exaggerated impression of their own false-positive rate if their self-perceptions were influenced by media reports suggesting that U.S. recall rates are unnecessarily high.
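As a rough illustration of why the two misperceptions point in the same direction, the proportion of screens that end in a biopsy recommendation without a subsequent cancer diagnosis can be approximated as the biopsy recommendation rate times (1 - PPV2). Plugging in the group means reported in the Results is only a crude approximation, since a product of means is not the mean of per-radiologist products and PPV2 uses a slightly narrower denominator than the biopsy recommendation rate:

```latex
\begin{aligned}
\text{FP}_{\text{biopsy}} &\approx r_{\text{biopsy}}\,(1-\text{PPV}_2)\\
\text{perceived:}\quad 0.032 \times (1-0.183) &\approx 0.026 \quad (\approx 26 \text{ per } 1000 \text{ screens})\\
\text{actual:}\quad 0.006 \times (1-0.276) &\approx 0.0043 \quad (\approx 4 \text{ per } 1000 \text{ screens})
\end{aligned}
```

On these approximate figures, the perceived rate of false-positive biopsy recommendations is roughly six times the actual rate.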
Our study has several important limitations. First, we compared self-reported recommendation rates and PPV2 in 2001 with actual rates computed from 1998 to 2001; using the longer period allowed more precise estimates of individual radiologist performance. Although we found no evidence of temporal trends in recommendation rates or PPV2 during the study period, recall rate or PPV2 could have changed over time for individual radiologists. Given the absence of temporal trends across our study population, however, we remain confident in the validity of our principal findings. Second, although we piloted our mail survey extensively to reduce the likelihood of misinterpretation, some radiologists could have misunderstood the survey questions regarding follow-up recommendations or PPV2. Finally, although the radiologists in the study may not be representative of the entire U.S. population of mammographers, our sample includes both community-based and academic radiologists practicing in diverse facilities in three distinct U.S. geographic regions.
We conclude that radiologists within three U.S. mammography registries tended to overestimate their true frequency of recommending further evaluation after screening mammography. The same radiologists tended to underestimate their PPV2, despite receiving at least annual feedback from their facilities regarding these specific aspects of their interpretive performance. Our findings suggest that many radiologists may not accurately state the results of outcome audits previously reported to them. This calls into question the potential value of future federal regulations that might require reporting of specific outcomes as a means of feedback to encourage improved clinical performance. Research is needed to characterize how radiologists interpret and use feedback to modify their interpretive practice.
Acknowledgments
We appreciate the dedication of the participating radiologists and project
support staff.