1 Department of Radiology, Duke University Medical Center, Box 2623, Durham, NC 27710.
2 Department of Biomedical Engineering, Duke University Medical Center, Durham, NC 27710.
Received January 22, 1999;
accepted after revision May 3, 2000.
The opinions and assertions contained herein are the private views of the
authors and are not to be construed as official or as reflecting the views of
the Department of the Army or the National Institutes of Health.
Abstract
SUBJECTS AND METHODS. The case-based reasoning system is designed to support the decision to perform biopsy in those patients who have suspicious findings on diagnostic mammography. Currently, between 66% and 90% of biopsies are performed on benign lesions. Our system is designed to help decrease the number of benign biopsies without missing malignancies. Clinicians interpret the mammograms using a standard reporting lexicon. The case-based reasoning system compares these findings with a database of cases with known outcomes (from biopsy) and returns the fraction of similar cases that were malignant. This malignancy fraction is an intuitive response that the clinician can then consider when making the decision regarding biopsy.
RESULTS. The system was evaluated using a round-robin sampling scheme and performed with an area under the receiver operating characteristic curve of 0.83, comparable with the performance of a neural network model. If only the cases returning a malignancy fraction of greater than a threshold of 0.10 are sent to biopsy, no malignancies would be missed, and the number of benign biopsies would be decreased by 25%. At a threshold of 0.21, 98% of the malignancies would be biopsied, and the number of benign biopsies would be decreased by 41%.
CONCLUSION. This preliminary investigation indicates that the case-based reasoning approach to computer-aided diagnosis has the potential to improve the accuracy of breast cancer diagnosis on mammography.
Introduction
Mammography is a sensitive procedure for detecting breast cancer, but the positive predictive value is low. Only 10-34% of women who undergo biopsy for mammographically suspicious impalpable lesions are actually found to have malignancy [1]. Between 0.5% and 2.0% of all mammographic examinations result in biopsy; several hundreds of thousands of biopsies are performed on benign lesions each year. The women undergoing biopsy for a benign finding are unnecessarily subjected to the discomfort, expense, potential complications, change in cosmetic appearance, and anxiety that can accompany breast biopsy [1,2,3,4]. In addition, the financial burden of these procedures ($3000-$5000 per biopsy) is significant in the present political and economic effort to reduce expenditures. The proposed system may significantly improve this performance through a simplified case-based reasoning approach that uses a large database of cases with known outcomes. In clinical practice, this system can be easily integrated into the mammographers' work flow through a computerized reporting system. The clinician interprets a mammogram and records the findings into a computer using a standard reporting lexicon (Breast Imaging Reporting And Data System [BI-RADS]) [5]. The database is searched for similar cases, and the fraction of similar cases that were malignant is returned. This fraction is referred to as the "malignancy fraction" and is an intuitive response that the woman's health care team can then include in the medical decision for biopsy.
Materials and Methods
The critical components of a case-based reasoning system are as follows: a translatable quantitative encoding for cases, a database of cases, and a rule to define when the test case matches a case in the database.
The input to the system is a subset of the BI-RADS [5] mammographic findings and items from the patients' medical histories. The output is formed from the known outcome of biopsy. The database consists of a set of radiologists' findings, a medical history, and a set of biopsy results for cases that were judged suspicious for breast cancer and for which excisional biopsies were obtained as the gold standard. For the work reported here, an existing set of 500 cases was used. The heart of the system is the set of rules defining a match between the test case and the known cases in the database. The trivial match criterion would be that all findings match exactly. Although easy to justify, this matching rule is impractical: considering 10 of the categoric BI-RADS findings, there are more than 10^10 combinations of features, so exact matches are rare. Instead, cases are matched if they are "close" in some sense. For this work, a mathematic form of this "closeness" was developed using an ad hoc deterministic matching rule described in a subsequent section.
Cases
The cases for this project were described for a previous investigation to
develop an artificial neural network for the decision to biopsy
[6].
Of the women undergoing needle localization for impalpable breast lesions between January 1991 and December 1995, 500 lesions were randomly selected for open excisional biopsy and pathologic diagnosis. These lesions include 206 that were retrospectively interpreted in a previous study [7] and 294 new cases that were prospectively acquired.
Each set of mammograms was acquired using film-screen technique on dedicated mammography equipment. No case was included in the study if either of the reviewing radiologists had prior knowledge of the biopsy results or if the suspicious area was not definitely identified. Of the 500 lesions evaluated there were 232 masses alone, 192 microcalcifications alone, and 29 combinations of masses and associated microcalcifications. The remaining 47 lesions included various combinations of architectural distortion, regions of asymmetric breast density, areas of focal asymmetric density, and areas of asymmetric breast tissue. Patients were 24-86 years old (average age, 55 years). At biopsy, 326 (65%) of the lesions were found to be benign, and 174 (35%) were malignant. This positive predictive value of 35% is greater than those reported in prior studies [1, 4, 8, 9] but is consistent with our previous data.
All mammograms were interpreted by radiologists whose primary clinical responsibilities are the interpretation of mammograms and the evaluation of breast lesions and who routinely report case findings using the BI-RADS [5] descriptors. Both radiologists were asked to describe each lesion using the BI-RADS lexicon by completing a checklist that included all possible BI-RADS descriptors. Both radiologists were permitted to select only a single descriptor from each category. The findings were recorded during the routine patient workup before biopsy results were known. The reviewing mammographer was provided with the patient's history and any prior films.
The cases were randomly numbered with no identifying marks that could be traced to the original patients to ensure that patient confidentiality was maintained.
Input Findings
The input features were selected from 10 of the features from the BI-RADS
[5] lexicon and one finding
from the medical history. The 10 features initially considered from the
BI-RADS lexicon were chosen on the basis of our previous work with these data
[6] and included mass size,
mass margin, mass density, mass shape, calcification description,
calcification number, calcification distribution, and special cases or
associated findings. The patient's age was included among the history
findings. We found that performance strongly depended on which features were
included in the matching criteria. No sophisticated feature-selection
algorithm was used. To reduce the initial number of features, a forward
stepwise linear discriminant analysis was performed with these 11 potential
input features, and six were found to contribute at a significance level of
0.05 or greater. These selected features were age, mass margin, mass density,
calcification description, calcification distribution, and associated findings
(including the architectural distortion descriptor). The case-based reasoning
system can be considered a very restricted linear model; therefore, feature
exclusion using linear discriminant analysis should retain any features useful
for case-based reasoning. In addition, the mammographer rated the likelihood
of malignancy on a 5-point scale for each case. This rating was independent of
the BI-RADS assessment.
Case-Based Reasoning Algorithm
Given a test case, a database, and a matching rule, the case-based
reasoning algorithm is straightforward. From all cases in the database, those
that match the test case are selected, and then the malignancy fraction is
computed as the number of selected cases that have malignant outcomes divided
by the total number of selected cases.
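The computation described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the dictionary layout, the `malignant` key, and the `same_margin` matching rule are hypothetical stand-ins, and returning `None` when no case matches is an assumption.

```python
from typing import Callable, Optional, Sequence


def malignancy_fraction(
    test_case: dict,
    database: Sequence[dict],
    matches: Callable[[dict, dict], bool],
) -> Optional[float]:
    """Fraction of matching database cases whose biopsy outcome was malignant."""
    selected = [case for case in database if matches(test_case, case)]
    if not selected:
        return None  # assumption: the fraction is undefined when nothing matches
    malignant = sum(1 for case in selected if case["malignant"])
    return malignant / len(selected)


# Toy database with a single hypothetical finding per case.
database = [
    {"margin": "spiculated", "malignant": True},
    {"margin": "spiculated", "malignant": False},
    {"margin": "circumscribed", "malignant": False},
]


def same_margin(a: dict, b: dict) -> bool:
    """Hypothetical matching rule: the margin descriptors are identical."""
    return a["margin"] == b["margin"]


fraction = malignancy_fraction({"margin": "spiculated"}, database, same_margin)
print(fraction)  # 0.5: one of the two matching cases was malignant
```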
Matching Rules
The most important component of this system is the rule to decide whether a
new case is similar to a case in the database. The simplest matching rule is
to accept all cases. The next logical matching rule is to require that the
type of lesion be matched: mass or calcification. Next, we require a match for
the primary finding: mass margin for masses or calcification description for
calcifications. Then we consider all findings. By trying combinations of
findings one can obtain an optimized matching rule. We define a distance
function as the total number of features that do not match between the test
case and the example from the database. This rule introduces a parameter: the
distance cutoff. If an example in the database has a distance from the test
case that is no greater than the distance cutoff, then it matches.
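The distance function can be sketched as a count of mismatched findings. The feature keys below are hypothetical, and the sketch assumes that a case matches when its distance does not exceed the cutoff, which is how the "Distance = 0" (exact match) and "Distance = 1" (one mismatch allowed) rules in the results read.

```python
def distance(test_case: dict, db_case: dict, features) -> int:
    """Number of listed features whose values differ between the two cases."""
    return sum(1 for f in features if test_case.get(f) != db_case.get(f))


def is_match(test_case: dict, db_case: dict, features, cutoff: int) -> bool:
    """Assumption: a match means at most `cutoff` mismatched features."""
    return distance(test_case, db_case, features) <= cutoff


# Hypothetical encoding of three categoric findings.
FEATURES = ("mass_margin", "calc_description", "associated_findings")
case_a = {"mass_margin": "obscured", "calc_description": None, "associated_findings": None}
case_b = {"mass_margin": "circumscribed", "calc_description": None, "associated_findings": None}

print(distance(case_a, case_b, FEATURES))            # 1: only the margin differs
print(is_match(case_a, case_b, FEATURES, cutoff=0))  # False: exact match required
print(is_match(case_a, case_b, FEATURES, cutoff=1))  # True: one mismatch allowed
```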
Case Indexing
In formal case-based reasoning, case indexing refers to techniques that
allow rapid identification and retrieval of similar cases. For this problem,
the small number of features per case, combined with the inherent categoric
structure of the features, allows efficient implementation while considering
all features of every case. Only the patient age and the mass size are not
categoric. A sliding window was used to match patient age and mass size. If
the difference in feature value between two cases is less than the window
width, then the features match.
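A sliding-window comparison of this kind might look as follows. The function name and the `window` parameter are illustrative only. The strict inequality follows the wording above, so two ages exactly one window width apart are treated as a mismatch.

```python
def feature_matches(value_a, value_b, window=None) -> bool:
    """Categoric features need equality; numeric ones need |a - b| < window."""
    if window is None:
        return value_a == value_b
    return abs(value_a - value_b) < window


print(feature_matches("obscured", "obscured"))  # True: same categoric descriptor
print(feature_matches(45, 47, window=3))        # True: ages differ by 2 years
print(feature_matches(45, 48, window=3))        # False: difference equals the window width
```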
Sampling for Testing
Currently, much has been written in the statistics literature concerning
sampling from a finite number of examples to evaluate the performance of
modeling systems. Following the work of Tourassi et al.
[10] with neural network
system training and testing, we adopted a round-robin technique. In this
technique, a testing set is formed by removing only one of the examples. The
system is built from the remaining examples and then tested on the one that
was removed. The testing example is replaced in the set and another is
removed. The system is again built and tested. This procedure is repeated
until all examples have been used as testing cases. Performance of the
case-based reasoning system is evaluated by setting a threshold on the
malignancy fraction for the set of all tested examples. By adjusting this
threshold on the malignancy fraction over the range from zero to one and
computing the sensitivity and specificity of the system at each threshold, the
receiver operating characteristic (ROC) curve is generated for analysis.
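The round-robin procedure and the threshold sweep can be sketched as below. The case layout and matching rule are hypothetical, and the sketch makes two assumptions that the text does not state: a held-out case with no matches receives a score of 1.0 (so it is never withheld from biopsy), and a case is sent to biopsy when its malignancy fraction is strictly greater than the threshold.

```python
def round_robin_fractions(cases, matches):
    """Leave each case out in turn and score it against the remaining cases."""
    scores = []
    for i, test_case in enumerate(cases):
        others = cases[:i] + cases[i + 1:]
        selected = [c for c in others if matches(test_case, c)]
        if selected:
            score = sum(1 for c in selected if c["malignant"]) / len(selected)
        else:
            score = 1.0  # assumption: unmatched cases are always referred for biopsy
        scores.append(score)
    return scores


def roc_points(scores, labels, thresholds):
    """(sensitivity, specificity) pairs when cases with score > threshold are biopsied."""
    n_pos = sum(1 for y in labels if y)
    n_neg = len(labels) - n_pos
    points = []
    for t in thresholds:
        tp = sum(1 for s, y in zip(scores, labels) if y and s > t)
        tn = sum(1 for s, y in zip(scores, labels) if not y and s <= t)
        points.append((tp / n_pos, tn / n_neg))
    return points


# Toy data: five cases described by one hypothetical finding.
cases = [
    {"margin": "spiculated", "malignant": True},
    {"margin": "spiculated", "malignant": True},
    {"margin": "spiculated", "malignant": False},
    {"margin": "circumscribed", "malignant": False},
    {"margin": "circumscribed", "malignant": False},
]
labels = [c["malignant"] for c in cases]
scores = round_robin_fractions(cases, lambda a, b: a["margin"] == b["margin"])
points = roc_points(scores, labels, thresholds=[0.0, 0.5])
print(scores)
print(points)
```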
Analysis
The performance of the system was quantified using the area under the ROC
curve, the area under the partial ROC curve, and specificity at a fixed
sensitivity. It is general practice to use a fitting algorithm to estimate the
area under the ROC curve. The most recognized of these programs are those from
Charles Metz [11].
Unfortunately, the program (ROCKIT; Charles Metz, University of Chicago,
Chicago, IL) produced a fit that did not well represent the actual data in the
region of greatest interest: the high-sensitivity region. For this reason, the
areas under the curves reported here were computed directly from the data
using Newton's method of integration. As an unfortunate consequence of using
the data directly, we are currently unable to report the statistical
significance of differences between two ROC curves. Although the area under
the ROC curve is a customary measure of performance, the weights assigned to
sensitivity and specificity are equal. For breast cancer prediction, the cost
of missing a malignancy is greater than the cost of performing a benign
biopsy; therefore, a more appropriate measure is the performance at high
sensitivity. For this reason, we report three other measures: the area under
the partial ROC curve reported for a sensitivity of 0.9 or greater, the
specificity at a sensitivity of 100% (representing the fraction of benign
biopsies that could be saved while missing no malignancies), and the
specificity at a sensitivity of 98% (representing the fraction of benign
biopsies that could be saved while missing 2% of the malignancies). Note that
perfect performance has an area under the full ROC curve of 1.0 and that
chance behavior has an area of 0.5. For the partial ROC curve, perfect
performance has an area of 0.1, and chance behavior has an area of 0.005.
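The reference values quoted above for perfect and chance performance can be checked numerically. The sketch below assumes a simple trapezoidal rule over the empirical ROC points (the exact quadrature used in the study is not specified here) and defines the partial area as the region between the ROC curve and the horizontal line at a sensitivity of 0.9. The function names are illustrative.

```python
def trapezoid_area(xs, ys):
    """Trapezoidal-rule integral of y over x for points sorted by x."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for x0, x1, y0, y1 in zip(xs, xs[1:], ys, ys[1:]))


def partial_area(fpr, tpr, min_sensitivity=0.9):
    """Area between the ROC curve and the line tpr = min_sensitivity."""
    xs, ys = [], []
    for i, (x, y) in enumerate(zip(fpr, tpr)):
        if i > 0:
            x0, y0 = fpr[i - 1], tpr[i - 1]
            # Insert the crossing point when a segment passes through the line.
            if (y0 - min_sensitivity) * (y - min_sensitivity) < 0:
                xs.append(x0 + (min_sensitivity - y0) * (x - x0) / (y - y0))
                ys.append(0.0)
        xs.append(x)
        ys.append(max(y - min_sensitivity, 0.0))
    return trapezoid_area(xs, ys)


# ROC curves given as (1 - specificity, sensitivity) points.
chance_fpr, chance_tpr = [0.0, 1.0], [0.0, 1.0]
perfect_fpr, perfect_tpr = [0.0, 0.0, 1.0], [0.0, 1.0, 1.0]

full_chance = trapezoid_area(chance_fpr, chance_tpr)      # 0.5
full_perfect = trapezoid_area(perfect_fpr, perfect_tpr)   # 1.0
partial_chance = partial_area(chance_fpr, chance_tpr)     # approximately 0.005
partial_perfect = partial_area(perfect_fpr, perfect_tpr)  # approximately 0.1
print(full_chance, full_perfect, partial_chance, partial_perfect)
```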
Results
As a next refinement, we considered matching exactly those six findings found to be significant by linear discriminant analysis (shown as rule "6 Findings," "Distance = 0" in Table 1). This gave an area of 0.7 but with poor specificity (<1%) at a high sensitivity. Next, one mismatched finding was allowed (shown as rule "6 Findings," "Distance = 1" in Table 1) resulting in an area of 0.79. Although this area under the ROC curve is encouraging, the performance at a high sensitivity was disappointing, with an area under the partial ROC curve of only 0.02 and a specificity of less than 1% at both 100% and 98% sensitivities.
Various combinations of the six features were then investigated. Although not every possible combination of features was investigated, a human-directed search resulted in optimal performance using only three features: mass margin, calcification description, and age with a matching window of 3 years difference. A case was allowed to match if two or three of the three features matched. Shown as rule "3 Findings," "Distance = 1" in Table 1, this system performed with an area of 0.83, a partial area of 0.045, a specificity of 25% at 100% sensitivity, and a specificity of 41% at 98% sensitivity. This is comparable with the area of 0.86 that has been reported for artificial neural networks and for mammographers on a subset of this database [12].
The performance of the case-based reasoning system can be compared with that of an artificial neural network model developed from these same data [6] (shown as "ANN" in Table 1). For a round-robin evaluation, the artificial neural network had an area under the full ROC curve of 0.86 compared with 0.83 for the case-based reasoning system. The partial area for the artificial neural network was 0.048 compared with 0.046 for the case-based reasoning system. The specificity at 100% sensitivity was 0.30 compared with 0.25 for the case-based reasoning system. The specificity at 98% sensitivity was 0.42 compared with 0.41 for the case-based reasoning system. These results are similar, with no statistically significant differences.
All the following results are for the case-based reasoning system with three inputs and with a distance cutoff of one. The histogram for this best performance shows the separation of the malignant and benign cases when malignancy fraction is used as a decision variable (Fig. 1). The cases are binned by malignancy fraction from the round-robin test. The bin labels correspond to the left edge of each bin. Thus, the bin labeled 0 includes cases with malignancy fractions of 0 up to but not including 0.1 and has 80 benign and no malignant cases, and the bin labeled 0.1 has 44 benign cases and three malignant cases.
Examination of the histogram reveals that the distribution of benign cases is bimodal, whereas that of the malignant cases is not; the data thus violate the assumptions of the maximum likelihood fitting algorithms typically used to analyze diagnostic systems. The expanded histogram (Fig. 2) shows that if a threshold were set at a value of 0.1 for the output of the case-based reasoning system, then all malignancies could be identified to the right of the threshold (justifiably biopsied), and 81 of the 326 benign cases could be identified to the left (avoiding unnecessary biopsy).
An encouraging aspect of the performance of the case-based reasoning system is shown by the performance at high sensitivity. As can be seen in Table 2, 100% sensitivity can be maintained with a specificity of 25%. This represents a positive predictive value improvement from 35% to 42%. Of the 326 benign biopsies, 81 patients would have been spared surgery. With a sensitivity of 98%, specificity is 41%. This represents a positive predictive value improvement from 35% to 46%. Of the 326 benign biopsies, 134 patients would have been spared surgery and 10 of the 174 malignancies would have been missed.
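The positive predictive value at 100% sensitivity follows directly from the counts given above (174 malignant lesions, 326 benign lesions, 81 benign biopsies avoided). A short check, with illustrative function and variable names:

```python
def positive_predictive_value(true_positives: int, false_positives: int) -> float:
    """Fraction of biopsied lesions that prove malignant."""
    return true_positives / (true_positives + false_positives)


malignant, benign, benign_spared = 174, 326, 81

# Every lesion biopsied: 174 of 500 are malignant.
baseline_ppv = positive_predictive_value(malignant, benign)

# Threshold of 0.1: all 174 malignancies and 326 - 81 = 245 benign lesions are biopsied.
ppv_at_full_sensitivity = positive_predictive_value(malignant, benign - benign_spared)

print(round(baseline_ppv, 2))             # 0.35
print(round(ppv_at_full_sensitivity, 2))  # 0.42
```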
As Table 2 shows, to use the
system at 100% sensitivity, any case returning a malignancy fraction of
greater than 0.1 would be biopsied. To use the system at 98% sensitivity, any
case returning a malignancy fraction of greater than 0.21 would be biopsied.
Another way to interpret the output is to enter the case findings and examine
the malignancy fraction. If the malignancy fraction for similar cases is
greater than or equal to 0.1 (indicating that 10% of the cases that were
similar to the new case were malignant), then a biopsy should be performed and
the sensitivity of this decision should be 100% with a specificity of 25%.
Examples
In one case (Fig. 3) a
73-year-old woman's mammogram shows a region with a cluster of more than 10
fine-branching microcalcifications. The mammographer indicated that this case
was very likely malignant. Sixty cases were found to match, and the malignancy
ratio was 0.53. With a threshold of 0.10, this ratio of 0.53 would indicate
biopsy. The histologic diagnosis for this case was malignant.
In another case (Fig. 4), a 43-year-old woman's mammogram shows an isodense lobulated 18-mm mass with a well-circumscribed margin. The mammographer indicated that this case was very likely benign. The findings from this case matched 121 cases in the database, resulting in a malignancy ratio of 0.07. With a threshold of 0.10, no biopsy would be indicated. The histologic diagnosis for this case was benign.
In a third case (Fig. 5), a 45-year-old woman's mammogram shows an isodense irregular 25-mm mass with an obscured margin. Of the 88 cases that matched this third case, nine were malignant, producing a ratio of 0.10. The histologic diagnosis for this case was malignant. This example represents the malignant case with the lowest malignancy ratio (0.10) returned by the case-based reasoning system. If the threshold were raised to 0.15 to save 40 more benign biopsies, the system would not recommend biopsy, and this malignancy would have been missed if the decision to biopsy were made solely by the computer. However, the mammographer indicated that this case was very likely malignant. In such a situation, the opinion of the mammographer would be accepted, and the patient would be correctly referred for biopsy.
Computational Time
With a database of 500 cases, a new case can be compared with the entire
database in less than 0.04 sec running in a nonoptimized database language on
a Pentium III 600-MHz personal computer (Intel, Santa Clara, CA).
Discussion
Although the artificial neural network techniques show excellent performance, interpretation of an individual prediction is challenging because of the nonlinear multidimensional representation of the decision space. Comparison of case-based reasoning performance with that of an artificial neural network model that was optimized with the same findings and cases shows that the ROC area of the artificial neural network is slightly larger (0.86 versus 0.83) and that the specificity of the two systems is similar at high sensitivity, where such a decision aid is likely to be used. The Bayesian network approach is theoretically attractive, although the underlying assumptions may be difficult to justify given the finite size of available databases. This case-based reasoning approach is easily implemented directly in the relational databases on which current mammography reporting systems are built.
A major attraction of this technique is its simplicity and intuitive clarity. The case-based reasoning system estimates the answer to the question, "Of all cases that are similar to this one, how many were malignant at biopsy?" The mechanism is easy to understand: Find all of the previous cases that are similar, and then report the fraction of those cases that were malignant.
One disadvantage of the case-based reasoning technique is the possibility that a new case will be presented that has no match in the database. Although this did not occur in the current database with the coarse matching criteria described previously, it is more likely to occur as more restrictive matching criteria are applied. This can be addressed by adaptive matching criteria that broaden the requirements for a case match if too few matches are found under stricter criteria. Expansion of the database will also decrease the probability of such an occurrence.
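One way such adaptive matching might work is to relax the distance cutoff step by step until enough similar cases are retrieved. The sketch below is purely illustrative: the `min_matches` parameter, the feature keys, and the stepwise relaxation strategy are assumptions, not part of the system evaluated in this study.

```python
def adaptive_matches(test_case, database, features, min_matches=10):
    """Relax the distance cutoff until at least min_matches cases are retrieved."""
    selected = []
    for cutoff in range(len(features) + 1):
        selected = [
            case for case in database
            if sum(1 for f in features if test_case.get(f) != case.get(f)) <= cutoff
        ]
        if len(selected) >= min_matches:
            return cutoff, selected
    # Even the loosest rule (every case accepted) found too few cases.
    return len(features), selected


# Toy database with two hypothetical categoric findings.
features = ("margin", "density")
database = [
    {"margin": "obscured", "density": "iso", "malignant": False},
    {"margin": "obscured", "density": "high", "malignant": True},
    {"margin": "circumscribed", "density": "low", "malignant": False},
]
test_case = {"margin": "obscured", "density": "iso"}

cutoff, selected = adaptive_matches(test_case, database, features, min_matches=2)
print(cutoff, len(selected))  # 1 2: one mismatch had to be allowed to reach two matches
```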
Another potential difficulty is that two mammographers sometimes will report the same lesion using different descriptors. This difficulty may be overcome by matching rules that allow two cases to be matched if they are similar but do not require that they be identical. An example of this is the criterion used for the age feature, in which a match was formed if two ages were within 3 years. Two interpretations of the same mammogram might not be identical, but they are likely to be similar.
The proposed system has the potential to add much practical value to a mammography reporting system. One use of the system is to serve as reference material for the mammographer who is considering a case for biopsy. A mammographer in a busy medical center may see 600 biopsied cases in a year. Given a 35-year career, this hypothetic mammographer would see about 21,000 cases referred for biopsy. A more typical mammographer would see fewer than 200 cases per year with a career total of 7000. It is reasonable to expect that the system described here could easily contain more cases than a typical or even a busy mammographer could see during a lifetime of work. Evaluating a new case against such a database of 25,000 cases could be performed in less than 2 sec using a 600-MHz personal computer. The prediction system can be integrated in a seamless manner because most mammography reporting systems are database applications. Such a system could report the malignancy fraction to the mammographer less than 4 sec after the findings for the case were entered into the reporting system.
For the threshold of 0.1, we reported 81 benign biopsies that could have been avoided. These included 60 masses with well-circumscribed margins, one mass with a microlobulated margin, 18 masses with obscured margins, one mass with an ill-defined margin and with associated calcifications described as coarse, and one mass with a well-circumscribed margin and with associated calcifications described as indistinct.
The study described in this article considered only cases in which biopsies were performed. The actual sensitivity of the mammographers in this database was thus 100%, and their specificity was 0% because every case was biopsied. In a future study, we will include cases in which patients were considered for biopsy but were followed up instead. Some fraction of these patients will have developed cancer that would have been diagnosed had a biopsy been performed earlier. This future evaluation will examine whether the case-based reasoning system would have found any of these malignancies and thus improve the sensitivity of mammography.