Original Research
1 Department of Radiology, University of California, San Francisco Medical
Center, Box 1667, San Francisco, CA 94143.
2 Department of Radiology, Breast Health Center, California Pacific Medical
Center, San Francisco, CA 94118.
3 Western Radiology Associates, Seattle, WA 98133.
Received September 6, 2005;
accepted after revision December 7, 2005.
Address correspondence to J. W. T. Leung
(Jessica.Leung{at}ucsfmedctr.org).
Abstract
MATERIALS AND METHODS. A retrospective observational study was conducted with data prospectively collected over a 5-year period in a community hospital-based practice in which 106,405 screening and 52,149 diagnostic mammograms were performed. The performance of three radiologists specializing in breast imaging was compared with that of six general radiologists. The following data were extracted and analyzed: recall rate, biopsy recommendation rate, and cancer detection rate. Statistical analysis was performed with a chi-square test and two-tailed calculation of p values.
RESULTS. The recall rates of the specialists and generalists were nearly the same at 6.5% and 6.7%, respectively. The biopsy recommendation rate at recall from screening examinations was nearly the same for generalists and specialists (1.2% and 1.1%, respectively; p = 0.4504). This rate also was similar for diagnostic examinations (8.5% for generalists; 8.4% for specialists; p = 0.4086). The cancer detection rate in the screening setting was slightly higher for specialists than for generalists: 2.5 and 2.0 cancers per 1,000 cases, respectively (p = 0.0614). The cancer detection rate in the diagnostic setting was 24.2% higher among specialists (20.0 cancers per 1,000 cases) compared with generalists (16.1 cancers per 1,000 cases) (p = 0.0177).
CONCLUSION. The only statistically significant difference between generalists and specialists was in cancer detection rate among patients undergoing diagnostic mammography. No statistically significant difference was identified between the two groups in terms of recall rate, biopsy recommendation rate, or percentage of favorable-prognosis cases of cancer detected. There was a trend toward greater cancer detection by specialists in the screening setting.
Keywords: biopsy, breast, breast cancer, mammography, screening
Radiologists vary in mammographic interpretations and subsequent management recommendations [6-8]. Sickles et al. [9] in 2002 presented the performance parameters for screening and diagnostic mammography in their academic breast imaging practice. In particular, they examined the performance differences between the radiologists in their practice who specialized in breast imaging (i.e., specialists) versus those who were general radiologists interpreting mammograms (i.e., generalists). They found striking differences between the two groups. Compared with the generalists, the specialists had lower recall rates, recommended more biopsies, and detected more cases of cancer and more cases of early-stage cancer. It is unknown whether these results hold true in other practices in other settings. Therefore, we sought to examine data from a community practice in the same city for comparison with published results for an academic practice in a comparable time period.
In the 5-year study period, 106,405 screening and 52,149 diagnostic mammograms were obtained. All mammograms were screen-film images, and all mammography units and technologists and the facility were accredited in accordance with the Mammography Quality Standards Act. Neither computer-assisted detection nor second review was used. A screening examination was defined as craniocaudal and mediolateral oblique mammograms of each breast of a woman who did not have symptoms. These examinations were performed in the absence of a radiologist and were batch-interpreted at a later time.
Indications for diagnostic mammography included additional imaging after recall for a finding identified on screening mammography; short-term follow-up for a probably benign mammographic finding (BI-RADS [13] assessment category 3); evaluation of breast problems, such as a palpable abnormality, nipple discharge, and focal breast pain; follow-up after benign findings at image-guided breast biopsy; and evaluation of some patients without symptoms, such as those with multiple cysts who had undergone several screening recalls for fluctuating masses or asymmetric findings. For such patients, diagnostic rather than screening studies were recommended to avoid the anxiety, inconvenience, and expense of multiple future screening recall visits. Patients with cancer treated with breast conservation were also routinely referred for diagnostic mammography. At diagnostic mammography, craniocaudal and mediolateral oblique views were reviewed by a radiologist. Additional mammographic views and sonograms were obtained as needed for complete evaluation.
The practice was composed of nine radiologists. All performed mammography and breast sonography. None of the radiologists had received fellowship training in breast imaging. All nine radiologists were board-certified and satisfied Mammography Quality Standards Act requirements for number of mammograms interpreted and number of continuing medical education (CME) hours obtained.
Three of the nine radiologists interpreted most of the diagnostic mammograms and breast sonograms and performed all of the image-guided core biopsies (Table 1). In the 5-year study period, 1,589 image-guided core biopsies were performed: 573 with stereotactic guidance and 1,016 with sonographic guidance. The radiologist performing the core biopsy ensured concordance between imaging and pathologic findings in each case. These three radiologists also spent more of their practice time performing mammography, had more CME in mammography, and (on average) had more years of experience in mammography (Table 1). These three radiologists were considered specialists for this study, and the other six radiologists were designated generalists.
Outcome data were collected prospectively and entered into a computer database for practice audit purposes and to satisfy Mammography Quality Standards Act requirements. Management recommendations were based on assessment categories defined by BI-RADS [13]. We extracted the following data for this study: number of screening and diagnostic mammograms interpreted by each radiologist during the 5-year study period, number of screening examinations in which recall imaging was recommended (BI-RADS assessment category 0 [incomplete, need additional imaging assessment]), number of diagnostic examinations in which biopsy was recommended (BI-RADS assessment categories 4 [suspicious] or 5 [highly suggestive of malignancy]), and number of cancers diagnosed on the basis of findings at either core biopsy or surgical excision. Cancer was defined as invasive carcinoma or ductal carcinoma in situ. The following indicators of favorable prognosis were extracted for malignant results: number of in situ cancers, number of minimal cancers, and number of cancers with negative axillary node findings. Minimal cancer was defined as ductal carcinoma in situ or invasive carcinoma 10 mm or less in diameter [14].
BI-RADS category 0 assessments were made only on screening mammography. Recall rate for each radiologist was defined as the number of BI-RADS category 0 assessments divided by the total number of screening examinations interpreted. Hence recall rate for each radiologist was calculated for screening mammography only. Biopsy recommendations were recorded at the screening recall visit and linked both to the radiologist generating the screening callback and to the radiologist interpreting the diagnostic examination at the recall visit. A database containing information on all breast tissue removed at surgical excision provided statistics on in situ and invasive malignant tumors, lesion size, and axillary nodal status. In determination of tumor size, reexcisions were not included. This information was linked to both the radiologist reviewing the initial screening mammogram and to the radiologist interpreting the mammogram at the diagnostic visit.
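The rate definitions above can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the function names are hypothetical, and the counts in the example are invented round numbers, not the study's data.

```python
def recall_rate(category0_count: int, screening_count: int) -> float:
    """Fraction of screening examinations assessed as BI-RADS category 0."""
    if screening_count <= 0:
        raise ValueError("screening_count must be positive")
    return category0_count / screening_count


def per_thousand(event_count: int, exam_count: int) -> float:
    """Events (for example, cancers detected) per 1,000 examinations."""
    if exam_count <= 0:
        raise ValueError("exam_count must be positive")
    return 1000.0 * event_count / exam_count


# Hypothetical counts for one radiologist (illustrative only).
screenings = 10_000
recalls = 650          # BI-RADS category 0 assessments
cancers = 25           # cancers confirmed at core biopsy or excision

print(f"recall rate: {recall_rate(recalls, screenings):.1%}")
print(f"cancers per 1,000: {per_thousand(cancers, screenings):.1f}")
```

The same per-1,000 helper applies to diagnostic examinations by passing the diagnostic counts instead.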
All data were entered into and analyzed with an Excel spreadsheet (Microsoft). Statistical analysis was performed by chi-square analysis and two-tailed calculation of p values with the GraphPad program (GraphPad Software). Differences were considered statistically significant at p < 0.05.
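For readers who want to reproduce this kind of comparison, a minimal sketch of a chi-square test on a 2 x 2 table follows. It assumes the Pearson statistic without continuity correction (the text does not say which variant was used) and relies on the identity that, for 1 degree of freedom, the two-tailed p value equals erfc(sqrt(chi2 / 2)). The function name and the example counts are hypothetical and are not taken from the study's tables.

```python
import math


def chi_square_2x2(a: int, b: int, c: int, d: int) -> tuple:
    """Pearson chi-square test (no continuity correction) for the table
    [[a, b], [c, d]]. Returns (statistic, two-tailed p value)."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if min(row1, row2, col1, col2) == 0:
        raise ValueError("every row and column total must be positive")
    statistic = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Hypothetical counts: cancers found vs. not found, for two reader groups.
group_a = (200, 9_800)   # 200 cancers in 10,000 examinations
group_b = (161, 9_839)   # 161 cancers in 10,000 examinations

stat, p = chi_square_2x2(*group_a, *group_b)
print(f"chi-square = {stat:.3f}, p = {p:.4f}")
print("significant at p < 0.05" if p < 0.05 else "not significant at p < 0.05")
```

A statistics package would normally be preferred in practice; the closed-form version is shown only because it needs nothing beyond the standard library.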
All 1,589 image-guided core biopsies were performed by radiologist A, B, or C, who reviewed the pathologic results to ensure adequate lesion sampling, imaging-pathology concordance, and appropriate management recommendations. For these reasons, radiologists A-C were considered breast imaging specialists and radiologists D-I, generalists.
Recall Rate
The recall rates of the specialists and generalists were nearly the same at 6.5% and 6.7%, respectively (Table 2).
Biopsy Recommendation Rate
The biopsy recommendation rate for screening examinations (defined as number of BI-RADS category 4 or 5 interpretations at the recall visit divided by the number of screening examinations interpreted) was nearly the same for generalists and specialists (1.2% and 1.1%, respectively; p = 0.4504) (Table 2). This finding was also true for diagnostic examinations, defined as number of BI-RADS category 4 or 5 interpretations divided by the number of diagnostic examinations interpreted (8.5% for generalists; 8.4% for specialists; p = 0.4086) (Table 2). The biopsy recommendation rate for screening patients was substantially lower than that for diagnostic patients (p < 0.0001), as expected given the differences in these patient populations [15, 16].
Cancer Yield and Cancer Prognostic Features
The cancer detection rate at screening for specialists was slightly higher than that for generalists: 2.5 and 2.0 cancers per 1,000 cases, respectively (p = 0.0614) (Table 3). This trend reached statistical significance in the diagnostic setting, in which the rate of cancer detection among the specialists (20.0 cancers per 1,000 cases) was 24.2% higher than that for the generalists (16.1 cancers per 1,000 cases) (p = 0.0177). As expected, the cancer detection rate was higher in the diagnostic population than in the screening population [15, 16].
There was no substantial difference between the specialists and the generalists in percentage of favorable-prognosis cancers detected, determined on the basis of the number of in situ, minimal, and node-negative cancers (Table 4). Overall, both groups identified more early-stage, favorable-prognosis cancers in the screening population than in the diagnostic population.
The factors influencing performance of breast imagers are multiple and complex. A key determinant is the training and experience of the interpreting radiologist, including years of mammographic experience, number of CME hours, and number of examinations interpreted [1-5]. The association between mammogram volume and recall rate was studied by Smith-Bindman et al. [4], who found that radiologists in the United States, compared with radiologists in the United Kingdom, interpreted fewer mammograms annually and had a higher recall rate. Specific training requirements have been established for interpretation of mammograms in the United States. The Mammography Quality Standards Reauthorization Act of 1998 requires radiologists to interpret 960 mammograms per 2-year period and to accumulate 15 mammographic CME hours every 3 years.
In a single-institution study, Sickles et al. [9] analyzed the performance parameters for mammographic interpretation among radiologists who specialized in breast imaging (specialists) and those who interpreted mammograms but did not specialize in breast imaging (generalists). In their academic practice, the specialists had substantially more initial training in mammography and interpreted 10 times more mammograms than the generalists. The specialists detected more cancers and more early-stage cancers. They also had a higher biopsy recommendation rate and a lower screening recall rate than did the general radiologists.
The data of Sickles et al. [9] were reported from a tertiary care referral academic institution. Furthermore, as noted by Guenin [17], one of the three specialists in that study was an internationally renowned breast imager with decades of experience in interpreting mammograms and teaching mammography and in conducting mammographic research. The other two specialists were fellowship trained by the expert. Therefore it remains to be established whether conclusions drawn from such a highly specialized setting can be generalized to other practices. Each mammographic examination in the academic practice described by Sickles et al. was reviewed by at least two and as many as four radiologists [9, 18]. This depth of review is not possible in most practices. We undertook our study to determine whether a significant performance difference between specialists and generalists would be found in a setting more reflective of breast imaging practice in the United States. Because there are no standardized performance guidelines for either screening or diagnostic mammography, we compared our community hospital-based private practice performance with the Mammography Quality Standards Act goals derived from other academic practices [10-12].
We specifically wanted to compare our results with those of Sickles et al. [9], because we practiced in the same city during the period in which their data were collected. We realized from the outset that an exact comparison between the two practices would be impossible. The proportion of practice time devoted to mammography by generalists in the practice of Sickles et al. was not stated, and there were additional immeasurable differences between the generalists and the specialists in the study by Sickles et al. and those in our study. We also recognized that academic and private practices differ in both patient populations and practice characteristics. Nevertheless, we believed that analysis of comparable data generated by radiologists at each institution would be of value in assessing mammographic practice patterns.
Our data were prospectively collected from a single breast imaging practice over a 5-year period. The large numbers of both screening and diagnostic mammograms interpreted allowed our results to achieve statistical significance. Although none was fellowship trained in breast imaging, each radiologist had satisfied or exceeded the requirements of the Mammography Quality Standards Act, and all had interpreted mammograms for more than 15 years. Periodic practice audit reviews provided individual performance data for each radiologist as a quality assurance exchange.
The specialists, as differentiated from the generalists, devoted a greater percentage of practice time to mammography and had more CME hours in breast imaging. More important, the specialists interpreted most (84.2%) of the diagnostic mammographic examinations and performed all of the imaging-guided core biopsies. This experience provided the specialists, but not the generalists, with the opportunity to correlate imaging and pathologic findings. This feedback process would be expected to improve diagnostic interpretation, as suggested by Guenin [17]. Furthermore, radiologist A, who performed mammography exclusively, conducted periodic reviews of breast interventions, discordant results, and pertinent articles from the breast imaging literature with radiologists B and C but not with radiologists D-I.
In our study, the specialists detected more cancers than the generalists, but the two groups were similar in other respects. Compared with generalists, the specialists detected 25.0% more cancers in the screening setting (p = 0.0614) and 24.2% more cancers at diagnostic visits (p = 0.0177). This difference is substantially smaller than that reported by Sickles et al. [9], in whose practice specialists had 76.5% and 61.2% higher cancer detection rates than the generalists on screening and diagnostic mammography, respectively.
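The percentage differences quoted above follow directly from the reported detection rates. The short check below recomputes them from the per-1,000 rates given in the text; the function name is hypothetical and is introduced only for illustration.

```python
def relative_difference_pct(rate: float, reference: float) -> float:
    """Percentage by which `rate` exceeds (or falls below) `reference`."""
    if reference == 0:
        raise ValueError("reference rate must be nonzero")
    return (rate - reference) / reference * 100.0


# Cancer detection rates per 1,000 cases as reported in the text:
# specialists vs. generalists.
screening = relative_difference_pct(2.5, 2.0)
diagnostic = relative_difference_pct(20.0, 16.1)

print(f"screening:  {screening:.1f}% higher for specialists")
print(f"diagnostic: {diagnostic:.1f}% higher for specialists")
```

Because the generalist rate is the reference in both cases, the result reads as "specialists detected this many percent more cancers per 1,000 cases than generalists."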
Our generalists and specialists had nearly identical recall rates (6.7% and 6.5%, respectively) at screening mammography. A high recall rate adds to the costs of a screening mammography program [19]. Financial costs are incurred when additional tests are performed. Patient anxiety increases when unnecessary imaging or interventions are recommended. In this study, both generalists and specialists were sufficiently confident of their interpretative skills to avoid excessive recalls. In the study by Sickles et al. [9], the recall rate of the specialists was 4.9%, 30% lower than the 7.1% recall rate of the generalists. In our practice, these rates were nearly equal. The recall rates among both generalists and specialists in both studies were within the desirable goal of a recall rate less than 10% recommended by BI-RADS [13].
The definition of specialist versus generalist varies in individual practices. Experience in interpreting breast imaging examinations may be the most important factor [1-5]. We and Sickles et al. [9] used this criterion in defining a breast imaging specialist. Number of examinations interpreted and number of CME hours acquired were comparable for the specialists in our community practice and the specialists in the academic practice of Sickles et al. The specialists in our private practice had no fellowship training in breast imaging. This difference was the main one between radiologists defined as specialists by Sickles et al. and the specialists in our practice. It is uncertain whether this difference contributed to the variation in cancer detection rates between our study and that of Sickles et al.
The cancer detection rates reported by Sickles et al. [9] were higher than those in our study for both screening and diagnostic examinations and for interpretations by both generalists and specialists. We believe that different patient populations account for most of this difference. In contrast to our community hospital-based private practice, the practice of Sickles et al. was an academic referral center that may have been preselected for patients at higher risk of breast cancer or with more complex medical histories. In other words, there may have been selection bias in the population of patients at an academic center so that there was a higher pretest probability of malignancy. This difference would result in a higher cancer detection rate at tertiary academic centers for both screening and diagnostic mammography. Furthermore, each mammographic examination in the practice of Sickles et al. was reviewed by at least two and as many as four radiologists, a practice shown to increase the cancer detection rate [20].
Several characteristics of our practice would be expected to influence our rate of cancer detection. More than 87% of women in our practice had undergone screening mammography and had previous mammograms for comparison. In the report by Sickles et al. [9], the number was approximately 55%. Therefore our detection rates are more reflective of the incidence rather than the prevalence of breast cancer. In addition, 44.1% of our patients were younger than 50 years, and 33.4% were of Asian descent. These populations are known to have lower rates of breast cancer [21].
Both the generalists and the specialists in our practice achieved the following desirable goals for screening mammography [10-12] recognized by BI-RADS [13]: recall rate less than 10%, detection of 2-10 cancers per 1,000 screenings, more than 30% minimal cancers among cancers found, and fewer than 25% cancers with positive nodes. We believe that many years of mammographic experience and quality assurance feedback contributed to the performance of generalists. Although they did not perform the biopsies, generalists participated in periodic case reviews and analysis of radiologist-specific audit data.
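As an illustration of how audit figures might be checked against these four goals, a small sketch follows. The class and function names are hypothetical. The recall rate and detection rate in the example are the specialists' screening figures quoted in the text; the minimal-cancer and node-positive fractions are invented placeholders, because the corresponding table values are not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class ScreeningAudit:
    recall_rate: float              # fraction of screenings recalled
    cancers_per_1000: float         # cancers detected per 1,000 screenings
    minimal_cancer_fraction: float  # minimal cancers / all cancers found
    node_positive_fraction: float   # node-positive cancers / all cancers found


def check_screening_goals(audit: ScreeningAudit) -> dict:
    """Compare one audit record with the four goals listed in the text."""
    return {
        "recall rate < 10%": audit.recall_rate < 0.10,
        "2-10 cancers per 1,000": 2.0 <= audit.cancers_per_1000 <= 10.0,
        "> 30% minimal cancers": audit.minimal_cancer_fraction > 0.30,
        "< 25% node-positive": audit.node_positive_fraction < 0.25,
    }


example = ScreeningAudit(
    recall_rate=0.065,             # from the text (specialists)
    cancers_per_1000=2.5,          # from the text (specialists)
    minimal_cancer_fraction=0.40,  # placeholder, not from the study
    node_positive_fraction=0.20,   # placeholder, not from the study
)

for goal, met in check_screening_goals(example).items():
    print(f"{goal}: {'met' if met else 'not met'}")
```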
Among the generalists, radiologist I interpreted substantially fewer mammograms and had the fewest CME hours in mammography. This radiologist also had the highest recall rate (10.9%) (Table 2) and the lowest cancer detection rate on diagnostic mammography (10.2 cancers per 1,000 cases) (Table 3). These rates were 62.7% higher and 36.6% lower, respectively, than the mean recall rate and the mean diagnostic mammographic cancer detection rate among the generalists. This performance was outside the range of the other generalists and was likely related to the smaller number of examinations interpreted. The other five generalists had recall rates within a relatively narrow range of 4.8% to 7.2%.
The more complex cases were equally distributed among the generalists and specialists in our practice. The clinical schedule in our practice required the radiologist on service to interpret any examination performed, regardless of level of complexity. Hence we believe that this potential variable did not exist in our practice and did not affect our data.
One limitation of this study was that for each radiologist, we did not analyze patient age, family history, or availability of previous mammograms for comparison. We therefore could not ensure that the patients whose mammograms were interpreted by each radiologist were comparable in those respects. It was reasonable, however, to assume that these factors were randomly distributed among the nearly 160,000 examinations performed over 5 years. Another limitation was that we collected data only on biopsies recommended rather than biopsies performed. Nevertheless, there is a nearly 1:1 ratio for this variable in our practice, as we found in an evaluation of our biopsy experience [22]. Ongoing audits have established that more than 90% of our recommended biopsies were performed at our institution, with results available for correlation with imaging findings.
Our data from one high-volume community practice were carefully collected for many years. We expect that the data reflect those of most of the similar community-based practices in the United States. The results reported from highly specialized tertiary care referral academic institutions are not comparable with our results, and it may not be reasonable or realistic to expect them to be comparable with those of other practices such as ours.
We believe that general radiologists in community practice with sufficient training, experience, and quality improvement programs should achieve results comparable with those we report. Reports from community practices are published infrequently. It is important to report data from a variety of practice settings in attempts to establish practice guidelines and desirable goals. We hope that other community practices will report their outcome data in an effort to enlarge published experience in this important area of diagnostic imaging.
Acknowledgments
The authors thank Lori Greene and Bob Timlin for their assistance in data
collection and Yelena Borodina and Estella Liu for their help in manuscript
preparation.