1 Department of Radiology, University of Maryland School of Medicine, 22 S. Greene St., Baltimore, MD 21201.
2 The Greenebaum Cancer Center, University of Maryland School of Medicine, Baltimore, MD 21201.
3 Department of Epidemiology and Preventive Medicine, University of Maryland School of Medicine, Baltimore, MD 21201.
Received September 22, 1999;
accepted after revision November 5, 1999.
Presented in part at the 1997 and 1998 Radiological Society of North
America annual meetings, Chicago, November 1997 and November 1998.
Abstract
MATERIALS AND METHODS. Five experienced mammographers, not specifically trained in BI-RADS, used the lexicon to describe and assess 103 screening mammograms, including 30 (29%) showing cancer, and a subset of 86 mammograms with diagnostic evaluation, including 23 (27%) showing cancer. A subset of 13 screening mammograms (two with malignant findings, 11 with diagnostic evaluation) were rereviewed by each observer 2 months later. Kappa statistics were calculated as measures of agreement beyond chance.
RESULTS. After diagnostic evaluation, the interobserver kappa values for describing features were as follows: breast density, 0.43; lesion type, 0.75; mass borders, 0.40; special cases, 0.56; mass density, 0.40; mass shape, 0.28; microcalcification morphology, 0.36; and microcalcification distribution, 0.47. Lesion management was highly variable, with a kappa value for final assessment of 0.37. When we grouped assessments recommending immediate additional evaluation and biopsy (BI-RADS categories 0, 4, and 5 combined) versus follow-up (categories 1, 2, and 3 combined), five observers agreed on management for only 47 (55%) of 86 lesions. Intraobserver agreement on management (additional evaluation or biopsy versus follow-up) was seen in 47 (85%) of 55 interpretations, with a kappa value of 0.35-1.0 (mean, 0.60) for final assessment.
CONCLUSION. Inter- and intraobserver variability in mammographic interpretation is substantial for both feature analysis and management. Continued development of methods to improve standardization in mammographic interpretation is needed.
Our study had the following three goals: to assess variability in feature analysis (description of lesion) and lesion management (threshold for biopsy); to identify which lesion descriptors are consistently used and to determine the positive predictive value of each major descriptor in the BI-RADS lexicon in our series of cases; and to provide guidance for possible areas of improvement in either terminology or training of interpreting physicians.
The case population was selected to maintain the expected rate of malignancy among lesions referred for biopsy. We had 30 cases of cancer in our series, representing 29% of lesions. Sixty-nine benign lesions and four high-risk lesions (three cases of atypical ductal hyperplasia and one of lobular carcinoma in situ) were included. At the same time, we sought to include at least three examples of each of the BI-RADS major descriptors for mass borders, asymmetric densities, microcalcification morphology, and distribution, except that all typically benign calcifications were condensed into the term "coarse," with only three such examples included. Ninety lesions were biopsy-proven (44 by 14-gauge core needle biopsy with 2 years of follow-up showing stability or regression, and 46 by surgical excision). Thirteen lesions were shown to be stable on 4-year follow-up mammography.
Good-quality copy mammograms were used, with the lesion marked on both craniocaudal and mediolateral oblique views. Observers were asked to review the screening films initially without comparison films and to complete a form detailing the findings using the BI-RADS lexicon and assessment and recommendation categories. Observers were asked to choose the single most worrisome applicable descriptor from each category. Assessment was finalized in BI-RADS categories together with corresponding recommendations: 1, negative, routine screening; 2, benign findings, routine screening; 3, probably benign, short interval follow-up (6 months); 4, suspicious, biopsy; 5, highly suggestive of malignancy, biopsy; and 0, needs additional evaluation.
For a subset of 86 lesions, additional diagnostic evaluation was immediately available under separate cover, and the observers were asked to again describe features and make a final assessment. Additional evaluation consisted of magnification images for 50 lesions, spot compression images for 14 lesions and sonograms for seven of these 14, true lateral images for five, one laterally exaggerated craniocaudal image, and one mammogram after aspiration. For 15 lesions, including four cases of microcalcifications, the only additional evaluation was comparison films from at least 4 years earlier. Comparison films were also supplied for 13 lesions with other additional evaluation. This subset of 86 cases included 23 cases of cancer (27% of lesions), the four high-risk lesions, and 59 benign lesions. Observers were again permitted to use category 0, needs additional evaluation, because personal preferences varied as to the extent of additional evaluation deemed necessary to make a final assessment.
To assess intraobserver variability, all observers rereviewed 13 randomly selected cases after a minimum of 2 months had elapsed since the first interpretation, longer than the 6 weeks advocated by Metz [8]. Two cancerous lesions were included, as were one high-risk lesion and 10 benign lesions; 11 cases had diagnostic evaluation.
Kappa statistics were calculated using Stata software (Stata Press, College Station, TX) to assess the proportion of inter- and intraobserver agreement beyond that expected by chance [9]. The method for estimating an overall kappa value in the case of multiple observers and multiple categories is based on the work of Landis and Koch [10, 11] as follows: for each category j, a kappa statistic (for multiple raters) is calculated comparing category j with the other categories pooled. A weighted average is used to combine these kappa values, where the weight for a given kappa value is the product of p_j, the proportion of ratings in category j, and (1 - p_j), the proportion of ratings not in category j. BI-RADS final assessment categories 1 and 2 were combined for this analysis. A value of κ = 1.0 corresponds to complete agreement, 0 to agreement no better than chance, and less than 0 to agreement worse than chance. Svanholm et al. [12] have suggested that a kappa value of equal to or less than 0.50 be taken as poor and equal to or greater than 0.75 as excellent reproducibility. Landis and Koch [10] have suggested that a kappa value of equal to or less than 0.20 indicates slight agreement; 0.21-0.40, fair agreement; 0.41-0.60, moderate agreement; 0.61-0.80, substantial agreement; and 0.81-1.00, almost perfect agreement.
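The weighted-average calculation described above can be sketched in a few lines of Python. This is an illustrative reconstruction, not the Stata routine used in the study: it assumes every case is rated by the same number of observers, uses the standard multiple-rater (Fleiss-type) per-category kappa, and the function names `multirater_kappa` and `landis_koch_label` are hypothetical.

```python
from collections import Counter


def multirater_kappa(ratings):
    """Return (overall kappa, per-category kappas) for several raters.

    ratings: list of cases; each case is a list holding one category
    label per observer. Assumes the same number of observers per case.
    """
    n_cases = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(case) != n_raters for case in ratings):
        raise ValueError("every case needs the same number (>= 2) of ratings")

    counts = [Counter(case) for case in ratings]
    categories = sorted({label for case in ratings for label in case})
    total_ratings = n_cases * n_raters

    per_category = {}
    weighted_sum = 0.0
    weight_total = 0.0
    for cat in categories:
        # p_j: proportion of all ratings that fall in category j
        p_j = sum(c[cat] for c in counts) / total_ratings
        weight = p_j * (1.0 - p_j)
        if weight == 0.0:
            continue  # category used for every rating: kappa undefined
        # Category j versus all other categories pooled
        discordant = sum(c[cat] * (n_raters - c[cat]) for c in counts)
        kappa_j = 1.0 - discordant / (n_cases * n_raters * (n_raters - 1) * weight)
        per_category[cat] = kappa_j
        weighted_sum += weight * kappa_j
        weight_total += weight

    overall = weighted_sum / weight_total if weight_total else float("nan")
    return overall, per_category


def landis_koch_label(kappa):
    """Map a kappa value to the Landis and Koch descriptive scale."""
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"
```

Each case would be passed as the list of final assessments from the five observers, with categories 1 and 2 merged beforehand to match the analysis described in the text.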
Interobserver Variability
Breast density. Moderate agreement was seen across our
observers in describing breast density, with an overall kappa value of 0.43.
Moderate agreement was seen in the use of the terms "fatty"
(κ = 0.76), "minimal scattered fibroglandular elements"
(κ = 0.43), and "extremely dense" (κ = 0.45). Poor
agreement was seen in use of the term "heterogeneously dense"
(κ = 0.17).
Lesion type. The five observers agreed on lesion type (mass, mass with calcifications, microcalcifications, special case, or calcifications with associated density) for 75 of 103 lesions, with a kappa value of 0.75. As has previously been observed [7], distinction of a "focal asymmetric density" from an "indistinct" mass was problematic (Fig. 1A,1B,1C). For eight (29%) of 28 lesions, the presence or absence of microcalcifications in a mass or focal density was the source of disagreement. Similarly, another eight lesions (29%) represented disagreement on the presence or absence of a mass or focal density associated with definite microcalcifications. In detailing results for individual terms describing feature analysis (Tables 1 and 2), we will consider only the observers' interpretations characterizing the lesion as a "pure" mass, density, or microcalcifications; we will exclude mixed lesions.
Mass borders. On the basis of the screening evaluation,
complete agreement in feature analysis of mass borders or special cases was
seen across all five observers in only seven (16%) of 45 lesions. After
diagnostic evaluation, complete agreement of all five observers was seen for
12 (38%) of 32 masses or special cases (κ = 0.40).
The greatest agreement was in the use of the term "circumscribed
round" (κ = 0.64) after diagnostic evaluation. Only two
interpretations describing a lesion as a circumscribed mass assessed the
lesion as suspicious, biopsy (BI-RADS category 4). Indeed, only one (4%) of 28
interpretations describing a circumscribed round mass was a malignant lesion
(Fig. 2,
Table 1), consistent with the
work of Sickles [13], which
showed a 1.4% risk of malignancy in such lesions, and with the recommendation
that such lesions be considered probably benign (BI-RADS category 3)
[1,
4]. Sonograms had been provided
for two masses the authors considered circumscribed, which accounted for none
of the observers' interpretations describing the lesion as circumscribed. Not
surprisingly, sonography was recommended for additional evaluation of nine
(32%) of 28 masses considered circumscribed by the observers before they would
render a final assessment.
Masses termed "microlobulated" by one observer were more likely
to be considered indistinct or spiculated by another observer (κ = 0.32).
All were considered suspicious (BI-RADS category 4) or in need of
additional evaluation (BI-RADS category 0), and one (10%) of 10 such lesions
was malignant (Fig. 2). Very
few lesions were termed "obscured"; essentially no agreement was
seen in use of this term (κ = 0.10). A mass considered obscured by one
observer was more than twice as likely to be considered indistinct or
circumscribed by another.
Only fair agreement was seen on use of the term "indistinct"
(κ = 0.28). Surprisingly, assessments of indistinct masses ranged from
outright benign (BI-RADS category 2) through highly suggestive of malignancy
(BI-RADS category 5) (Table 1).
None of those assessed as category 2 (three interpretations) or category 3
(two interpretations) proved to be malignant, whereas six (50%) of 12 of those
assessed as category 4 and one (50%) of two of those assessed as category 5
proved malignant.
Moderate agreement was seen for the term "spiculated" (Fig.
3A,3B,3C)
(κ = 0.58). All lesions so described were assessed as suspicious
(BI-RADS category 4) or highly suggestive of malignancy (BI-RADS category 5)
by all observers, and indeed 12 (75%) of 16 interpretations describing a
spiculated mass were of malignant lesions.
Special cases. Complete agreement was seen for the one
dilated duct. The four lymph nodes included were as likely to be considered
benign circumscribed masses as lymph nodes (κ = 0.50). Lesions termed
"asymmetric breast tissue" (κ = 0.38) or "focal
asymmetric densities" (κ = 0.60) by one observer were both more
likely to be considered indistinct masses by another (Fig.
1A,1B,1C),
as seen in the study of Baker et al.
[7]. If the description was
asymmetric breast tissue, the lesion was always considered normal or benign
and proved to be. Focal asymmetric densities were more problematic, with
assessments ranging from negative (BI-RADS category 1) to highly suggestive of
malignancy (BI-RADS category 5). Overall, five (25%) of 20 lesions interpreted
as a pure focal asymmetric density without calcifications proved to be
malignancies (Table 1).
Mass shape. In describing mass shape, a random distribution
of use of descriptors was seen, with all terms (round, oval, lobulated, and
irregular) equally likely to be used (κ = 0.28).
Mass density. The overall kappa value of 0.40 that we found
for mass density is comparable to the relatively low interobserver agreement
previously shown by Jackson et al.
[14]. In particular, no
agreement was seen for density termed "lower than that of surrounding
parenchyma" (κ = -0.01).
Microcalcification morphology. Overall agreement was fair
(κ = 0.36) for microcalcification morphology and moderate (κ = 0.47)
for description of microcalcification distribution
(Table 2). For lesions
considered pure calcifications (without mass or associated density), only fair
agreement was seen in use of the term "coarse, benign" (κ = 0.29).
When calcifications were described as coarse and benign by one
observer, they were more likely to be considered punctate or pleomorphic by
another (Fig. 4). One unusual
recurrent malignancy proved particularly problematic
(Fig. 5), being considered
benign or probably benign by all observers. This one lesion accounted for four
of five of the interpretations of malignant calcifications considered coarse
or benign.
Although good agreement was seen on use of the term "milk of
calcium" (κ = 0.71), such calcifications were more likely to be
described as punctate or pleomorphic by another observer (Fig.
6A,6B).
All lesions described as milk of calcium were considered benign (BI-RADS
category 2) or probably benign (BI-RADS category 3) and proved to be
benign.
Calcifications described as coarse and benign, milk of calcium, or
amorphous were more likely to be considered punctate by another observer
(Figs.
3A,3B,3C,
5, and
6A,6B).
Agreement on "punctate" morphology was only fair (κ = 0.36).
Although it has been suggested that this term implies uniform round
calcifications that can be considered probably benign (BI-RADS category 3)
[15], eight (13%) of 61
interpretations describing punctate microcalcifications considered them
suspicious (BI-RADS category 4) and recommended biopsy. Those eight
interpretations included one cancerous lesion
(Fig. 4) and one lobular
carcinoma in situ. Another seven interpretations described cases of atypical
ductal hyperplasia as punctate and probably benign. Thus, the positive
predictive value of a description of punctate calcifications was one (1.6%) of
61 considering only malignancies, and nine (15%) of 61 when including
high-risk lesions as well. Although distribution of calcifications clearly
also influences assessment, the analysis is not straightforward: one lesion
considered punctate calcifications in a segmental distribution by two
observers was assessed as suspicious by one observer and benign by the other.
The distribution of the other punctate calcifications assessed as suspicious
was described as clustered (n = 4), linear (n = 1), or
multiple clusters (n = 2).
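The two positive predictive values quoted for the punctate descriptor follow directly from the interpretation counts in the paragraph above. A minimal sketch, assuming each interpretation is stored as a pathology label (the label strings and the helper name `positive_predictive_value` are hypothetical):

```python
def positive_predictive_value(outcomes, positive_labels):
    """Fraction of interpretations whose outcome counts as positive."""
    if not outcomes:
        raise ValueError("no interpretations supplied")
    hits = sum(1 for outcome in outcomes if outcome in positive_labels)
    return hits / len(outcomes)


# Counts taken from the text: 61 punctate interpretations, of which
# 1 was a malignancy and 8 were high-risk lesions (1 lobular carcinoma
# in situ + 7 atypical ductal hyperplasia); the remaining 52 were benign.
punctate = ["malignant"] * 1 + ["high-risk"] * 8 + ["benign"] * 52

ppv_malignant = positive_predictive_value(punctate, {"malignant"})
ppv_with_high_risk = positive_predictive_value(punctate, {"malignant", "high-risk"})

print(f"malignancy only: {ppv_malignant:.1%}")            # 1 of 61
print(f"including high-risk: {ppv_with_high_risk:.1%}")   # 9 of 61
```

The second value is 9/61, about 14.8%, which the text reports rounded to 15%.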
"Amorphous" calcifications were problematic, with only fair
agreement on use of the term (κ = 0.25). Again, calcifications described
as amorphous by one observer were more likely to be described as pleomorphic
or punctate by another. Assessments ranged from benign (BI-RADS category 2) to
suspicious, biopsy (BI-RADS category 4), and one (5%) of 20 was malignant.
Only fair agreement was seen in use of the term "pleomorphic"
(κ = 0.37). Calcifications described as pleomorphic were given
assessments ranging from benign (BI-RADS category 2) to highly suggestive of
malignancy (BI-RADS category 5). Of 55 interpretations describing the lesion
as pure pleomorphic calcifications, 22 (40%) were malignant
(Table 2).
All "branching or fine linear" calcifications were
appropriately considered suspicious (BI-RADS category 4) or highly suggestive
of malignancy (BI-RADS category 5), and six (43%) of 14 pure branching
calcifications without associated mass were malignant. A lesion described by
one observer as branching calcifications was more likely described as
pleomorphic by another observer (κ = 0.37).
Microcalcification distribution. The distribution of most
lesions manifesting as microcalcifications was termed "clustered"
(κ = 0.58). This description was used nearly in proportion to the
overall case mix, with 18 (16%) of 113 clustered microcalcifications actually
being malignant and final management assessments spanning the spectrum of
BI-RADS categories (Table
2).
Very few cases were termed "linear" in distribution, and little
agreement existed on use of the term (κ = 0.19). Such calcifications
were more likely to be considered clustered or segmental by another observer.
Three (30%) of 10 calcifications described as linear were malignant, and only
one interpretation assessed (punctate) calcifications in a linear distribution
as probably benign (BI-RADS category 3), with the rest considered suspicious
(BI-RADS category 4).
Moderate agreement was observed in the use of the descriptor
"segmental" (κ = 0.46). Only one observer assessed
(punctate) segmental calcifications as benign, as mentioned; one case
described as segmental in distribution was considered suspicious, and 12 (86%)
of 14 were considered highly suggestive of malignancy (BI-RADS category 5).
Ten (71%) of 14 segmental calcifications were actually malignant. This figure
compares favorably with the results of Liberman et al.
[5], in which 74% of lesions
described as segmental and sent for biopsy proved to be malignant.
"Regional" and "diffuse" distributions have been
associated with a high likelihood of benignity
[4]. Very few lesions were so
characterized, and little agreement was seen in the use of either term
(κ = 0.29 for regional and κ = 0.08 for diffuse). If
calcifications were described as regional in distribution by one observer,
they were more likely considered segmental or clustered in distribution by
another. Of 10 interpretations describing calcifications as regional, one
(10%) was of a malignant lesion. None of those described as diffuse was
malignant, and they were more likely to be described as clustered by another
observer.
Final assessment. Table 3 presents interobserver agreement on lesion assessment and management based on screening images and after diagnostic evaluation. Overall kappa value was 0.21 after screening and 0.37 after diagnostic evaluation for BI-RADS final assessment categories, with assessment categories 1 and 2 considered equivalent. Disagreement on final assessment was greatest for lesions placed in BI-RADS categories 3, probably benign (recommend short-interval follow-up), and 4, suspicious (consider biopsy). This disagreement likely reflects a variation in the intervention threshold of individual observers, with some observers biopsying lesions having a low probability of malignancy.
Considering together those interpretations with a benign (BI-RADS category 2) or probably benign (BI-RADS category 3) assessment as one group and those recommending either biopsy or immediate additional evaluation as another group, agreement was seen for 70 (68%) of 103 lesions from screening films. Of 68 benign lesions, observers concurred for 39 (57%), recommending additional evaluation in 37 (54%) or classifying them as normal or benign in two (3%) lesions. Agreement was also seen in the recommendation for additional evaluation or biopsy for all four (100%) high-risk lesions and for 27 (90%) of the 30 cancerous lesions. The third observer initially rated one cancerous lesion as normal (Fig. 3A,3B,3C), one as benign, and one as probably benign; and the fifth observer rated one of the same lesions as benign (Fig. 4). As noted, a third cancerous lesion (Fig. 5) was considered benign or probably benign by all observers.
Agreement diminished after additional evaluation compared with screening assessments, with agreement seen in 47 (55%) of 86 cases. Observers concurred for 25 (42%) of 59 benign lesions, one (25%) of four high-risk lesions, and 21 (91%) of 23 cancerous lesions. As mentioned, of 23 cancerous lesions, 21 (91%) were recommended for additional evaluation or biopsy by three observers and 22 (96%) by two observers.
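The dichotomized management comparison used in the preceding paragraphs (immediate additional evaluation or biopsy versus follow-up) can be expressed as a short sketch. The grouping of BI-RADS categories follows the text; the function names and the toy assessments are hypothetical and are not study data.

```python
# Grouping of BI-RADS final assessment categories as described in the text.
IMMEDIATE_ACTION = {0, 4, 5}  # additional evaluation or biopsy
FOLLOW_UP = {1, 2, 3}         # routine screening or short-interval follow-up


def management_group(category):
    """Collapse a BI-RADS category into one of the two management groups."""
    if category in IMMEDIATE_ACTION:
        return "immediate"
    if category in FOLLOW_UP:
        return "follow-up"
    raise ValueError(f"unknown BI-RADS category: {category}")


def unanimous_management(assessments_by_case):
    """Count cases in which every observer chose the same management group.

    assessments_by_case: list of cases, each a list of BI-RADS categories
    (one per observer). Returns (n_unanimous, n_cases).
    """
    unanimous = sum(
        1
        for case in assessments_by_case
        if len({management_group(cat) for cat in case}) == 1
    )
    return unanimous, len(assessments_by_case)


# Toy example with five observers per case (made-up values).
toy_cases = [
    [4, 5, 4, 0, 4],  # all immediate action: agreement
    [2, 3, 2, 1, 2],  # all follow-up: agreement
    [3, 4, 3, 3, 0],  # mixed: disagreement
    [2, 2, 2, 2, 2],  # all follow-up: agreement
]
agreed, total = unanimous_management(toy_cases)
print(f"{agreed} of {total} cases with unanimous management")
```

Note that observers can differ in the exact category (for example, 4 versus 0) and still count as agreeing on management, which is why this measure is less strict than the category-level kappa.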
Intraobserver Variability
When cases were reassessed by the same observer, substantial disagreement
persisted, although to a lesser degree than among different observers
(Fig. 7). After diagnostic
evaluation, inconsistency in description of mass borders and focal asymmetries
ranged from 29% to 57% (mean, 40%) and of microcalcification morphology from
14% to 71% (mean, 49%). Major disagreement in final assessment or patient
management, defined as immediate additional evaluation or biopsy (BI-RADS
categories 0, 4, or 5) versus follow-up (BI-RADS categories 2 or
3), occurred in 0-27% of cases. Overall, observers' final assessment was
unchanged in 36 (65%) of 55 second interpretations, whereas major disagreement
between first and second diagnostic impression was seen in eight (15%) of
repeated interpretations.
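For a single observer reading the same cases twice, agreement beyond chance is commonly summarized with a two-reading (Cohen-type) kappa. The text does not state the exact formula applied to the intraobserver data, so the following is a sketch under that assumption; `cohen_kappa` is a hypothetical helper.

```python
from collections import Counter


def cohen_kappa(first_reading, second_reading):
    """Cohen's kappa for two readings of the same cases by one observer."""
    if len(first_reading) != len(second_reading) or not first_reading:
        raise ValueError("readings must be non-empty and of equal length")
    n = len(first_reading)

    # Observed proportion of cases with identical category on both readings
    observed = sum(a == b for a, b in zip(first_reading, second_reading)) / n

    # Chance agreement from the marginal category frequencies of each reading
    first_counts = Counter(first_reading)
    second_counts = Counter(second_reading)
    labels = set(first_counts) | set(second_counts)
    expected = sum(first_counts[k] * second_counts[k] for k in labels) / n ** 2

    if expected == 1.0:
        return float("nan")  # a single category throughout: kappa undefined
    return (observed - expected) / (1.0 - expected)
```

With only 11 to 13 repeated cases per observer, a single changed assessment shifts kappa noticeably, which is consistent with the wide per-observer range reported in the abstract.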
Kerlikowske et al. [21]
reported 78% agreement (κ = 0.58) in the final assessments of two
observers and 86% intraobserver agreement (κ = 0.73) in evaluation of
2616 mammograms. Those investigators did not use category 3, probably benign,
because theirs was a screening assessment. From 73% to 78% of cases with
cancer were considered abnormal by those observers, and fair to moderate
agreement was described for feature analysis
[21]. High breast density was
shown to increase disagreement twofold in both lesion detection and final
assessment [21]. Far less
agreement was seen in the earlier work of Boyd et al.
[22,
23], in which nine
radiologists reviewed 100 xeromammograms. Between pairs of radiologists, kappa
values ranged from 0.17 to 0.55 for diagnostic assessment, from 0.16 to 0.70
for the presence or absence of a mass, and from 0.34 to 0.73 for the presence
or absence of calcifications
[23].
We sought to simplify the analysis of variability in mammographic interpretation by excluding detection as a variable and evaluating only feature analysis and lesion management for marked lesions, using BI-RADS standardized terminology. As with many clinicians in practice, our investigators were not trained specifically in the terminology but do use the final assessment categories in their routine practice. Further, they were given ordered lists of the terminology by increasing likelihood of malignancy according to the criteria set forth by D'Orsi and Kopans [4] and were asked to use the single most worrisome applicable descriptor from each category.
We found disagreements between observers in clinically significant management (biopsy versus follow-up) in 32% (33/103) of screening interpretations and in 45% (39/86) after diagnostic evaluation. Observers disagreed with themselves in management 15% (8/55) of the time. The larger source of variation appeared to be lesion description, with disagreement on description of 38 (84%) of 45 mass borders on screening and 20 (63%) of 32 after diagnostic evaluation. Similarly, we found disagreement on 33 (70%) of 47 descriptions of microcalcification morphology on screening and 33 (75%) of 44 after diagnostic evaluation. Even when the same lesions were described using the same terminology, clinically significant variations in management occurred. In particular, one ductal carcinoma in situ was described as clustered pleomorphic microcalcifications by all five observers and was interpreted as probably benign by one, whereas the others all recommended biopsy. Despite the variability, the overall performance of our observers was outstanding, with recommendations for additional evaluation or biopsy (BI-RADS categories 4, 5, or 0) of 90-97% of cancerous lesions on screening and 91-96% after diagnostic evaluation.
Our results are similar in magnitude to variability in chest radiograph interpretation as studied in the 1950s. Yerushalmy et al. [24] found a 1-in-3 chance that two observers disagreed as to progression, regression, or stability of inflammatory disease and a 1-in-5 chance of the same observer disagreeing with him- or herself. Although such variation is inherent in observational studies, training in mammographic lesion description at least has the potential to reduce variability of feature analysis. If correlation of feature analysis and management can be further substantiated, we may be able to reduce the variability in management as well.
Our study likely underestimates true variations in practice for two major reasons. We did not allow variation in detection, because the lesions were marked on the films. Further, overlap in professional activities of our observers would be expected to diminish variability in lesion management.
From a medicolegal perspective, the more specifically descriptors are correlated with a specific risk of malignancy, the more a standard of management will apply when specific terminology is used. Thus it would be beneficial for the mammography community to establish which terminology is useful in predicting management (probably largely at the extremes of benign and highly suspicious) and to expect consistency in those cases. Obviously, variability will exist in the management of indeterminate lesions as a function of interventional thresholds. Again, clarification of the appearances of such lesions, understanding of this variability, and definition of reasonable practice standards for those lesions are in the best interest of all radiologists.
We found little consistency for the use of descriptors of mass shape or density and would propose that management be based largely on the appearance of the border of mass lesions. Indeed, we found that despite variability in describing individual cases, aggregate use of particular terms correlated with the expected risk of malignancy. "Circumscribed" masses have been shown to have a risk of malignancy of less than 2% [13, 15] and were consistently described and managed as benign or probably benign in our study, with variability largely due to the use of immediate sonography or short-term follow-up. Only one (4%) of 28 interpretations describing a mass as circumscribed, round, or oval was of a malignancy in our series. Shape may play a role with circumscribed masses, in that circumscribed gently lobulated masses were more likely malignant (4/33 [12%]) in our series. Similarly, "spiculated" masses were uniformly considered suspicious or highly suggestive of malignancy, and 12 (75%) of 16 such lesions were malignant. From the work of D'Orsi and Kopans [4] and others, it seems reasonable that a mass that is solid but not circumscribed on a baseline mammogram generally warrants biopsy. Thus, we were surprised to see several lesions described as "indistinct" yet assessed as benign. The inappropriate classification of such lesions as benign (BI-RADS category 2) would appear to be an area for greater training, and obviously sonography will play a substantial role in the evaluation of mammographically and clinically indeterminate lesions.
Distinguishing normal variant asymmetric densities from indistinct masses would appear to be another area for greater training. Sickles [25] has recently shown success in short interval follow-up of densities seen on only one view, although relatively little has been published to provide criteria for such a screening assessment, and spot compression or sonography may be needed to adequately evaluate many such densities. As in the study of Baker et al. [7], we found that even expert mammographers have difficulty distinguishing normal variant focal asymmetric densities from indistinct masses. Management is predicated on such perceptual differences. In one study [26], most cases of missed breast cancer that were visible in retrospect but not on blinded review were asymmetric densities. Comparison with old films may be helpful for such lesions.
For calcifications, training in identifying suspicious morphology or distribution is needed. When either suspicious morphology or suspicious distribution is seen, biopsy is appropriate [26]. We expect to continue to see variability in the use of the term "punctate" as has been seen in other work [7], but we may see improved performance if appropriate management is clarified. For example, a cluster of round punctate calcifications may be probably benign (BI-RADS category 3) [15], but similar calcifications in a segmental or linear distribution would be at least suspicious (BI-RADS category 4) [27]. Diffuse punctate calcifications may reasonably be considered benign [1]. Further data on this topic are needed. Increased familiarity with the illustrated BI-RADS lexicon [2] and training of interpreting physicians in the use of BI-RADS are expected to enhance consistency in description of lesions and will allow data analysis across multiple sites, ensuring congruency between description and subsequent management.
We noted great variation in the management of clustered amorphous calcifications, which likely reflects variation in the interventional threshold for these low-suspicion lesions. Further data establishing the likelihood of malignancy of indeterminate calcifications with description of distribution are needed, as is consensus about the appropriate threshold for intervention (e.g., >2% risk of malignancy). Because it is the decision to monitor or to biopsy that most influences patient care, greater efforts are needed to develop standardized cases emphasizing probably benign and suspicious lesions to minimize variation in interventional thresholds among radiologists.
In our study, inconsistency was greatest for probably benign and suspicious assessments. As stated, the results of Sickles [13, 15, 25] and Varas et al. [28] suggest reliable criteria exist for probably benign lesions, and Orel et al. [29] recently confirmed that only three (2%) of 141 lesions prospectively classified as probably benign (BI-RADS category 3) before biopsy proved to be malignant. Broader verification of the use of category 3 and the availability of examples of appropriate lesions for this assessment are needed. Such an assessment is not to be made generally on screening images alone, but only after a complete diagnostic workup. Furthermore, practitioners are encouraged to audit their outcomes [30] for lesions placed in category 3 and may reconsider their individual practice if the rate of malignancy exceeds 2% or if delays in diagnosis are affecting patient outcomes (i.e., lymph node status and stage of disease at diagnosis).
In summary, the BI-RADS lexicon is an important step forward in standardizing reporting. Variability is inherent in the practice of radiology and is not necessarily problematic. Indeed, agreement was seen in the recommendation for additional evaluation or biopsy for all four high-risk lesions and for 27 (90%) of 30 malignant lesions despite substantial variability in feature analysis and case-by-case management. It is our collective responsibility to identify areas in existing practices in which even experts have difficulty and to improve on those areas with broad data collection, training, and education. Ultimately, such efforts will help guide us all to improve our accuracy in reporting, to diagnose cancer at its earliest stage, and to avoid unnecessary biopsy of benign lesions.
Acknowledgments
We thank the observers who participated in this study: Cecilia Brennecke,
Judy Destouet, Nagi Khouri, Barbara Savader, and Rosy Singh. Without their
support, this study could not have been performed. We also thank the Susan G.
Komen Breast Cancer Foundation for their continuing support.