1 Department of Radiology, University of Washington, Seattle Cancer Care
Alliance, 825 Eastlake Ave., G4-830, Seattle, WA 98109-1023.
2 Department of Orthopedics, University of Washington, Harborview Medical
Center, 325 Ninth Ave., Box 359798, Seattle, WA 98104-2499.
3 Fred Hutchinson Cancer Research Center, 1100 Fairview Ave. N., Seattle, WA
98109-1024.
Received November 27, 2000;
accepted after revision January 9, 2002.
Supported by grant 1 R01 CA63146-01 from the National Cancer Institute,
National Institutes of Health.
Abstract
MATERIALS AND METHODS. The study comprised the interpretations of 82,620 consecutive screening mammograms by 18 radiologists between January 1, 1995, and December 31, 1998. For all mammograms, assessment categories and recommendations were compared to determine whether they were in accordance with BI-RADS guidelines. Overall patterns of discordance were analyzed, and comparisons of discordant patterns by assessment category, patient age, breast density, and year of examination were made.
RESULTS. The overall discordance between BI-RADS assessments and recommendations was low (3%). The assessment with the highest discordance was "probably benign finding" (category 3), at 53.5%. Mammograms obtained in 1998 were about half as likely to have assessment-recommendation discordance as those obtained in 1995 (2.4% vs 4.5%, respectively; odds ratio = 0.52; p < 0.001). Mammograms of women with dense breast tissue were 30% more likely to have lesions assigned discordant assessments and recommendations compared with those of women with fatty tissue (3.4% vs 2.7%, respectively; odds ratio = 1.3; p < 0.001). No differences in the patterns of discordance were found between mammograms of women younger than 50 years and those of women 50 years old and older (p = 0.10).
CONCLUSION. There has been improvement in the accurate application of BI-RADS since its introduction. However, variation in the pairing of BI-RADS assessments and recommendations persists. Continued efforts to educate radiologists about the use of BI-RADS and to clarify BI-RADS terms would promote maximum consistency in the use of this reporting method.
The BI-RADS lexicon provides a dictionary of terms to use when describing lesions seen on mammography. In addition to these definitions, BI-RADS provides specific final assessment categories and the associated recommendations to be assigned to each mammogram. The standardized assessment categories used to describe findings on mammography include "need additional imaging evaluation" (category 0), "negative" (category 1), "benign finding" (category 2), "probably benign finding" (category 3), "suspicious abnormality" (category 4), and "highly suggestive of a malignancy" (category 5). Recommendations that are appropriate for each of the six assessment categories are provided by BI-RADS.
Early studies indicated the BI-RADS lexicon not only helped standardize the language used in interpreting mammograms, but also could improve the sensitivity and specificity of mammographic interpretations [3, 4]. In a study of five radiologists using BI-RADS lesion descriptors for masses and calcifications, moderate inter- and intraobserver agreements were found in the interpretations of 60 mammograms [5]. Three subsequent independent reports indicated the BI-RADS lesion descriptors and assessment categories may improve the positive predictive value of mammographic interpretations [6,7,8]. Since its inception, BI-RADS has become widely used in both academic and private practice settings, and certain elements of BI-RADS are required for certification through the Mammography Quality Standards Act [9].
To date, a report about the use of the BI-RADS assessment categories and the associated recommendations in community practice has not been published. As databases from various community practices are pooled, final assessment categories 1 through 5 are used to calculate overall positive predictive values, sensitivity, and specificity. However, whether these assessment categories are used in a consistent manner in community practices is unclear. It is possible that the same assessment categories are associated with different recommendations for follow-up. The purpose of this study was to evaluate the use of the BI-RADS lexicon by community radiologists over time by determining the concordance of assessment categories with recommendations assigned to screening mammograms.
The data in this study were limited to MTR facilities that use the BI-RADS coding scheme and that reported both assessments and recommendations. Mammographic interpretations came from 18 radiologists in three facilities. So that the data would reflect the range of practice patterns in the MTR, the data were randomly sampled so that no facility contributed more than 50% and no radiologist contributed more than 10% of the total examinations to the data set. Facilities included in the analyses contributed data between January 1, 1995, and December 31, 1998. Examinations were designated as diagnostic if the facility recorded the examination type as diagnostic or if a lump or bloody nipple discharge was reported. Examinations were considered screening if the examination type was reported as screening and if no specific symptoms or clinical findings were reported. Only screening mammograms were included in this study.
For each examination, the assessment category and recommendation were compared to determine whether the association was in accordance with the published guidelines of the American College of Radiology BI-RADS manual [1]. Not all the recommendations listed in the BI-RADS manual are specifically listed in the MTR database. For example, although BI-RADS specifically lists magnification views, spot compression views, spot magnification views, and additional projections as possible recommendations associated with category 0, these specific variables are not included individually in the MTR database. Instead, the generic variable "additional views" is used in the MTR database. Table 1 lists concordant assessment-recommendation pairings for BI-RADS and the corresponding recommendations used in the MTR database. In summary, for a lesion described as BI-RADS category 0, the recommendation of needs additional imaging evaluation or comparison with prior mammograms was considered concordant. For findings characterized as BI-RADS category 1 or 2, either a recommendation of normal-interval follow-up or no specific recommendation was accepted as concordant. For BI-RADS category 3 lesions, short-term follow-up must have been recommended to be considered concordant. For BI-RADS categories 4 and 5, a recommendation for tissue sampling or surgical consultation must have been made.
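The pairing rules summarized above can be expressed as a simple lookup. The following is only a minimal sketch: the recommendation labels (`additional_views`, `tissue_sampling`, and so on) and the function name `is_concordant` are hypothetical stand-ins, because the actual MTR variable names are not given in the text.

```python
# Sketch of the concordance rules summarized in the text.
# ASSUMPTION: recommendation labels are hypothetical; the MTR database's
# real variable names are not stated in the article.
CONCORDANT = {
    0: {"additional_views", "comparison_with_prior"},
    1: {"normal_interval_followup", "none"},
    2: {"normal_interval_followup", "none"},
    3: {"short_interval_followup"},
    4: {"tissue_sampling", "surgical_consultation"},
    5: {"tissue_sampling", "surgical_consultation"},
}


def is_concordant(category: int, recommendation: str) -> bool:
    """Return True if the recommendation is an accepted pairing for the category."""
    if category not in CONCORDANT:
        raise ValueError(f"unknown BI-RADS assessment category: {category}")
    return recommendation in CONCORDANT[category]


if __name__ == "__main__":
    # A category 3 assessment paired with normal-interval follow-up is discordant.
    print(is_concordant(3, "short_interval_followup"))
    print(is_concordant(3, "normal_interval_followup"))
```

A lookup table keeps the rules in one place, so a change to the accepted pairings would not require touching the comparison logic.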
Of the 82,620 mammograms used for this analysis, 588 (0.7%) were assigned two or more recommendations. Most of the examinations with multiple recommendations (79.3%) had an assessment of category 0 in addition to another assessment. For mammograms with two or more recommendations, if one of the recommendations was concordant with the assessment, then that recommendation was used in our analysis and the other recommendation was dropped. If none of the recommendations was concordant, we prioritized the recommendations on the basis of severity and selected the most severe recommendation for our analysis.
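The rule for examinations with two or more recommendations can be sketched as follows. The text does not state the severity ordering that was used, so the `SEVERITY` list below is an assumption made purely for illustration, as are the recommendation labels and the function name.

```python
# Sketch of the rule for examinations with multiple recommendations.
# ASSUMPTION: labels are hypothetical, and the severity ordering below
# (most severe first) is invented; the article does not specify it.
CONCORDANT = {
    0: {"additional_views", "comparison_with_prior"},
    1: {"normal_interval_followup", "none"},
    2: {"normal_interval_followup", "none"},
    3: {"short_interval_followup"},
    4: {"tissue_sampling", "surgical_consultation"},
    5: {"tissue_sampling", "surgical_consultation"},
}

SEVERITY = [
    "tissue_sampling",
    "surgical_consultation",
    "additional_views",
    "comparison_with_prior",
    "short_interval_followup",
    "normal_interval_followup",
    "none",
]


def select_recommendation(category: int, recommendations: list[str]) -> str:
    """Pick the single recommendation to analyze for one examination."""
    if not recommendations:
        raise ValueError("at least one recommendation is required")
    # Prefer a recommendation that is concordant with the assessment.
    for rec in recommendations:
        if rec in CONCORDANT[category]:
            return rec
    # Otherwise fall back to the most severe recommendation (assumed ordering).
    return min(recommendations, key=SEVERITY.index)
```

For example, under these assumptions `select_recommendation(3, ["additional_views", "short_interval_followup"])` keeps the concordant short-interval follow-up and drops the other recommendation.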
We divided the analysis set into groups to examine discordant patterns. Patterns of discordance were compared by assessment category, patient age, breast density, and year of examination. Examination results for women younger than 50 years old and 50 years old and older, women with dense breast tissue versus those with fatty breast tissue, and those examined in 1995-1996 versus those examined in 1997-1998 were compared. Examinations that reported heterogeneously or extremely dense breast tissue were classified as dense, whereas examinations that reported breast tissue as entirely fat or scattered fibroglandular densities were classified as fatty.
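The subgroup definitions in this paragraph amount to three small classification rules. A minimal sketch follows, assuming the density is recorded as one of the four descriptive phrases quoted in the text; the function names and the returned group labels are hypothetical.

```python
# Sketch of the subgroup definitions described in the text.
# ASSUMPTION: density is stored as one of the four phrases below; the
# function names and group labels are illustrative only.
DENSE = {"heterogeneously dense", "extremely dense"}
FATTY = {"entirely fat", "scattered fibroglandular densities"}


def density_group(description: str) -> str:
    """Collapse the four density descriptions into 'dense' or 'fatty'."""
    key = description.strip().lower()
    if key in DENSE:
        return "dense"
    if key in FATTY:
        return "fatty"
    raise ValueError(f"unrecognized density description: {description!r}")


def age_group(age_years: int) -> str:
    """Split patients into younger than 50 years vs 50 years and older."""
    return "<50" if age_years < 50 else ">=50"


def study_period(year: int) -> str:
    """Assign an examination year to the first or second half of the study."""
    if year in (1995, 1996):
        return "1995-1996"
    if year in (1997, 1998):
        return "1997-1998"
    raise ValueError(f"year outside the study period: {year}")
```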
Results of the recommendations by BI-RADS assessment category were put into tabular form for comparison. Discordance values for each assessment and recommendation were compared as a whole group and as several subgroups. To test whether discordance was significantly different among subgroups, we compared the proportions of overall discordance for patient age (<50 years vs ≥50 years), year of examination (1995-1996 vs 1997-1998), and breast density (fatty vs dense) using Pearson's correlation coefficient and the chi-square test. An odds ratio was produced to further illustrate the magnitude of the difference. In our study, the crude odds ratio estimates the relative odds that an assessment and recommendation for a given breast density, patient age, or year of examination would be discordant. Confidence intervals were based on the standard error of the coefficient and normal approximation. To analyze the trends of discordance over time, we analyzed the year of examination as a continuous variable and used logistic regression.
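As an illustration of the statistics named in this paragraph, the sketch below computes a crude odds ratio, a normal-approximation 95% confidence interval on the log odds ratio, and a Pearson chi-square test for a 2 x 2 table. The counts in the usage example are hypothetical, chosen only to mirror the reported 3.4% vs 2.7% discordance by breast density; they are not the study's data, and the function name is an assumption.

```python
import math


def two_by_two_summary(a: int, b: int, c: int, d: int) -> dict:
    """Summarize a 2 x 2 table.

    a, b: discordant and concordant counts in group 1
    c, d: discordant and concordant counts in group 2
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR), then a normal-approximation 95% interval.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    log_or = math.log(odds_ratio)
    ci_low = math.exp(log_or - 1.96 * se_log_or)
    ci_high = math.exp(log_or + 1.96 * se_log_or)
    # Pearson chi-square statistic for a 2 x 2 table (no continuity correction).
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper-tail probability of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return {
        "odds_ratio": odds_ratio,
        "ci_95": (ci_low, ci_high),
        "chi2": chi2,
        "p_value": p_value,
    }


if __name__ == "__main__":
    # HYPOTHETICAL counts: 10,000 dense and 10,000 fatty examinations with
    # 3.4% and 2.7% discordance, respectively.
    summary = two_by_two_summary(340, 9660, 270, 9730)
    print(summary)
```

Because the interval is built on the log scale, it is asymmetric around the odds ratio itself, which is the usual behavior of this approximation.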
Of the cases with an assessment of suspicious abnormality (category 4), 84.5% had a concordant recommendation of either a biopsy or an appropriate action. The remaining 15.5% of these cases had a discordant recommendation including 7.9% with a recommendation for additional imaging; 2.6%, for sonography; 2.6%, for normal-interval follow-up; and 1.3%, for short-interval follow-up. Of the cases assessed as highly suggestive of malignancy (category 5), 90.6% had a concordant recommendation. However, 7.6% of the mammograms assessed as highly suggestive of malignancy were associated with a recommendation for normal-interval follow-up. A total of 18 cases had an assessment of BI-RADS category 4 or 5, but these cases had been assigned the recommendation of normal-interval follow-up.
Tables 3 and 4 show the assessments and recommendations given for mammograms during the first half (1995-1996) and second half (1997-1998) of the study. The overall discordance of mammographic assessments and recommendations decreased over time, from 3.6% to 2.4% (p < 0.001). Mammograms obtained in 1998 were about half as likely to have assessment-recommendation discordance as those obtained in 1995 (2.4% vs 4.5%, respectively; odds ratio = 0.52; p < 0.001). The decrease in discordance between 1995-1996 and 1997-1998 primarily resulted from the more accurate association of assessments with the appropriate recommendations for negative, benign finding, and probably benign finding (categories 1-3). Nevertheless, in 1997-1998, almost 50% of the lesions described as probably benign finding were associated with a discordant recommendation: normal-interval follow-up was recommended for 30.8%; additional imaging, for 9.8%; sonography, for 4.4%; and biopsy, for 3.3%. We found no improvement over time in concordance patterns for suspicious abnormality or highly suggestive of malignancy, categories 4 and 5, respectively.
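The relation between the two discordance percentages and the odds ratio for 1998 vs 1995 can be checked with a few lines of arithmetic. This is only a sketch using the rounded percentages quoted above, so it approximates rather than reproduces the published estimate; the function name is hypothetical.

```python
def odds_ratio_from_proportions(p_group: float, p_reference: float) -> float:
    """Odds ratio comparing two proportions, each converted to odds p / (1 - p)."""
    odds_group = p_group / (1 - p_group)
    odds_reference = p_reference / (1 - p_reference)
    return odds_group / odds_reference


if __name__ == "__main__":
    # Rounded discordance proportions quoted in the text: 1998 vs 1995.
    print(round(odds_ratio_from_proportions(0.024, 0.045), 2))
```

With proportions this small, the odds ratio is close to the simple ratio of the two percentages, which is why an odds ratio near 0.5 is described as "about half as likely".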
Table 5 shows the association of patient age, breast density, and year of examination with patterns of discordance. The patterns of discordance for mammograms of women younger than 50 years versus those of women 50 years old and older did not differ (p = 0.10). However, women with dense breast tissue had a 30% increased likelihood of having discordant assessments and recommendations compared with women with fatty breast tissue (3.4% vs 2.7%, respectively; odds ratio = 1.3; p < 0.001). Examinations during the latter half of the study (1997-1998) were significantly less likely to have discordant assessment-recommendation pairings compared with examinations during the first half (1995-1996) of the study (2.4% vs 3.6%, respectively; odds ratio = 0.62; p < 0.001). In addition, logistic regression analysis showed a constant linear improvement over the 4 years of the study (p < 0.001).
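A trend analysis of this kind, with year of examination entered as a continuous variable in a logistic regression, can be sketched in plain Python using Newton-Raphson on grouped yearly counts. The yearly counts in the usage example are hypothetical: only the 1995 and 1998 discordance rates (4.5% and 2.4%) come from the text, while the sample sizes and the intermediate years are invented for illustration. The fitting routine is a generic one, not the software the authors used.

```python
import math


def fit_logistic_trend(years, n_exams, n_discordant, max_iter=50):
    """Fit logit(p) = b0 + b1 * (year - first year) to grouped counts.

    Returns (b0, b1, se_b1), where se_b1 is the standard error of the slope.
    """
    x = [y - years[0] for y in years]
    overall = sum(n_discordant) / sum(n_exams)
    b0, b1 = math.log(overall / (1 - overall)), 0.0
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, n, d in zip(x, n_exams, n_discordant):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            resid = d - n * p          # score contribution
            w = n * p * (1 - p)        # information weight
            g0 += resid
            g1 += resid * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2 x 2 information matrix against the score.
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0 += step0
        b1 += step1
        if abs(step0) + abs(step1) < 1e-10:
            break
    # Slope variance from the inverse information matrix of the last iteration.
    se_b1 = math.sqrt(h00 / det)
    return b0, b1, se_b1


if __name__ == "__main__":
    # HYPOTHETICAL yearly counts (20,000 examinations per year).
    years = [1995, 1996, 1997, 1998]
    n_exams = [20000, 20000, 20000, 20000]
    n_discordant = [900, 760, 600, 480]
    b0, b1, se_b1 = fit_logistic_trend(years, n_exams, n_discordant)
    print(f"slope per year = {b1:.3f}, Wald z = {b1 / se_b1:.1f}")
```

A negative slope corresponds to declining odds of discordance per year, and exponentiating the slope gives the per-year odds ratio under the assumed linear trend on the logit scale.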
Taplin et al. (unpublished data) investigated the use in 1997 of BI-RADS by seven registries within the Breast Cancer Surveillance Consortium. Our data are consistent with their findings of high concordance with recommendations given to negative and benign assessments, but moderate concordance (41%) with recommendations given to probably benign assessments. From our results, improvement would be predicted to continue over time, particularly with continued educational efforts and possible refinements to the lexicon.
There are two issues of possible concern when a discordant assessment and recommendation are given. The primary concern is the disposition of the patient. For BI-RADS to achieve maximum benefit, the specific recommendation given for the appropriate management of the patient needs to be clear. For some of the discordant patterns, the final disposition of the patient may have been correct. For example, when additional views are recommended for an assessment of probably benign finding, the patient may have the additional views at another site or another time. Based on those additional views, the assessment-recommendation pairing of probably benign finding and short-interval follow-up or the pairing of negative and normal-interval follow-up may have then been given. The end result is the appropriate management of the patient, although an original erroneous assessment of category 3 (probably benign finding) rather than category 0 (needs additional imaging evaluation) was given.
Similarly, when a probably benign finding is paired with normal-interval follow-up (33%), it is possible that the patient management was correct in that the patient was in the second or third year of follow-up for stability of a probably benign lesion. At that time, a 1-year rather than 6-month follow-up interval was deemed appropriate. We wondered if this scenario could have been the case in our sample. We investigated further the 701 mammograms with an assessment of probably benign finding and normal-interval follow-up or no recommendation. We found that 283 (40%) of the 701 mammograms were of patients who had a mammogram obtained within the previous 2 years. However, of these 283 patients, only 12 (4%) had a mammogram with a recommendation of short-interval follow-up within the 2 years before the mammogram with the assessment-recommendation pairing of probably benign finding and normal-interval follow-up was obtained. Finally, mammograms of lesions being followed up every 6 months should not have been in our study sample because these images should have been reported as diagnostic mammograms rather than as screening mammograms. Only the first screening mammogram with a lesion described as a probably benign finding should have been in our study sample, and all of those initial examinations should have been given a final assessment of probably benign finding and the recommendation of short-interval follow-up.
Clarifying the criteria for the probably benign finding category would be a helpful addition to the BI-RADS lexicon. A specific case to illustrate potential confusion is the reporting of an initial screening mammogram showing a mass. The initial screening mammogram would be reported as showing a category 0 lesion, need additional imaging evaluation. Subsequent spot compression views and sonograms would then be obtained, and only after those evaluations could the mass be assessed as a probably benign finding with the recommendation of short-interval follow-up. The mass is found to have remained stable at the first 6-month follow-up, and the lesion is again given an assessment and recommendation of probably benign finding and short-interval follow-up. At the second evaluation (1 year from the initial screening mammogram), the mass is said to have remained stable. At this point, a 1-year follow-up rather than a 6-month follow-up is deemed appropriate. On the basis of the mammographic findings, should the mass be assigned the assessment and recommendation of benign finding and normal-interval follow-up before 2-3 years of stability has been established, or should the mass be assigned the assessment and recommendation of probably benign finding and normal-interval follow-up? This issue could be clarified in future publications of the BI-RADS lexicon.
For some discordant patterns that we observed, the disposition of the patient is more concerning. For example, why normal-interval follow-up was recommended for 18 patients who were said to have a category 4 or 5 lesion on screening mammography is not clear. This discordant recommendation of normal-interval follow-up represents 4% of the patients with a lesion designated as a suspicious abnormality or as highly suggestive of malignancy. In these 18 cases, we examined previously obtained mammograms and reviewed each patient's cancer history at the time of and after mammography; however, these efforts failed to yield an explanation for the discordance in any of these cases. For example, these cases did not represent patients for whom a cancer diagnosis was already known and the normal-interval follow-up recommendation was for the contralateral breast. We also did not find any cases for which these aberrant category 4 or 5 assessments were followed up within 6 months by a cancer diagnosis, suggesting that the assessment and not the normal-interval follow-up recommendation guided the events. For discordant recommendations like these to be avoided, mammography sites in the course of their site audit could flag discordant assessment-recommendation pairings to identify potential miscommunications to patients or to referring clinicians regarding follow-up care.
The second issue in regard to concordance patterns is whether the appropriate data are used for audit and research purposes. Most practices use assessment categories alone in their auditing practices. However, with discordant recommendations assigned to assessments, two different practices could yield different audit results even though the same recommendations were made to the patient. Similarly, in multicenter trials, pooling data from multiple sites involves decisions to pool either the assessment or the recommendation variable or both. Our findings indicate that one cannot always assume a given assessment leads to a specific recommendation.
Whether discordant assessments and recommendations primarily affect the outcome of the patient or the accuracy of audit data is unclear. This study suggests that there is significant variation in the pairing of probably benign finding with different recommendations. Fewer than half of the mammograms assessed as probably benign finding were recommended for short-interval follow-up.
A particular strength of this study is that it was conducted in community-based mammography practices using a large database from radiologists who used the BI-RADS system over a defined period of time. A limitation of our study is the absence of specific follow-up data. In a study of this size for which data are collected from multiple community practices, it is not possible to assess details of individual discordant examinations or details of specific patient histories or outcomes. However, providing information back to individual sites can facilitate further detailed evaluation by each specific practice.
Overall, we conclude that in community practice the concordance between BI-RADS assessments and recommendations is high and that the degree of concordance has improved over time. We did find significant variation in recommendations associated with the categories probably benign finding and suspicious abnormality, categories 3 and 4, respectively. Attention to these variations is important for auditing and research purposes, for possible further refinements of the BI-RADS lexicon, and for training programs about its use.