April 2009, Volume 192, Number 4


Women's Imaging

Original Research

A Logistic Regression Model Based on the National Mammography Database Format to Aid Breast Cancer Diagnosis

Affiliations:
1Department of Radiology, University of Wisconsin School of Medicine and Public Health, E3/311 Clinical Science Center, 600 Highland Ave., Madison, WI 53792-3252.

2Industrial & Systems Engineering, University of Wisconsin, Madison, Madison, WI.

3Present address: Health Economic Statistics, Merck Research Laboratories, North Wales, PA.

4Department of Biostatistics and Medical Informatics, University of Wisconsin, Madison, WI.

5Department of Radiology, Medical College of Wisconsin, Milwaukee, WI.

Citation: American Journal of Roentgenology. 2009;192: 1117-1127. 10.2214/AJR.07.3345

ABSTRACT

OBJECTIVE. The purpose of our study was to create a breast cancer risk estimation model based on the descriptors of the National Mammography Database using logistic regression that can aid in decision making for the early detection of breast cancer.

MATERIALS AND METHODS. We created two logistic regression models based on the mammography features and demographic data for 62,219 consecutive mammography records from 48,744 studies in 18,270 patients reported using the Breast Imaging Reporting and Data System (BI-RADS) lexicon and the National Mammography Database format between April 5, 1999 and February 9, 2004. State cancer registry outcomes matched with our data served as the reference standard. The probability of cancer was the outcome in both models. Model 2 was built using all variables in Model 1 plus radiologists' BI-RADS assessment categories. We used 10-fold cross-validation to train and test the model and to calculate the area under the receiver operating characteristic curves (Az) to measure the performance. Both models were compared with the radiologists' BI-RADS assessments.

RESULTS. Radiologists achieved an Az value of 0.939 ± 0.011. The Az was 0.927 ± 0.015 for Model 1 and 0.963 ± 0.009 for Model 2. At 90% specificity, the sensitivity of Model 2 (90%) was significantly better (p < 0.001) than that of radiologists (82%) and Model 1 (83%). At 85% sensitivity, the specificity of Model 2 (96%) was significantly better (p < 0.001) than that of radiologists (88%) and Model 1 (87%).

CONCLUSION. Our logistic regression model can effectively discriminate between benign and malignant breast disease and can identify the most important features associated with breast cancer.

Keywords: logistic regression, mammography, National Mammography Database, risk prediction

Introduction

Mammography, accepted as the most effective screening method in the detection of early breast cancer, still has limited accuracy and significant interpretation variability that decreases its effectiveness [1–6]. The use of computer models can help by detecting abnormalities on mammograms [7–10]; estimating the risk of breast cancer for improved sensitivity and specificity of diagnosis [11–16]; and identifying high-risk populations for screening, genetic testing, or participation in clinical trials [17–22]. This study focuses on the second goal: the use of a computer-aided diagnosis (CADx) model for risk estimation to aid radiologists in breast cancer diagnosis.

CADx models can quantify the risk of cancer using demographic factors and mammography features already identified by a radiologist or a computer-aided detection model. CADx models estimate the probability (or risk) of disease that can be used for improved decision making by physicians and patients [23–25]. Previous studies on CADx tools use either small subsets of data, suspicious mammograms, or mammograms recommended for biopsy [11–15]. Although most of these studies show that CADx tools are efficient in predicting the outcome as benign or malignant disease, none shows the effectiveness of CADx models when applied to mammography data collected during daily clinical practice. In addition, previous studies used biopsy results as the reference standard, whereas we use a match with our state cancer registry. To our knowledge, our study is the first one to develop and test a logistic regression–based CADx model based on consecutive mammograms from a breast imaging practice incorporating BI-RADS descriptors.

As the variables that help predict breast cancer increase in number, physicians must rely on subjective impressions based on their experience to make decisions. Using a quantitative modeling technique such as logistic regression to predict the risk of breast cancer may help radiologists manage the large amount of information available, make better decisions, detect more cancers at early stages, and reduce unnecessary biopsies. The purpose of this study was to create a breast cancer risk estimation model based on demographic risk factors and BI-RADS descriptors available in the National Mammography Database using logistic regression that can aid in decision making for the improved early detection of breast cancer.

Materials and Methods

The institutional review board determined that this retrospective HIPAA-compliant study was exempt from requiring informed consent. We used variables collected in the National Mammography Database [26] to develop a CADx model. The National Mammography Database is a recommended format for collecting practice-level mammography audit data to monitor and standardize performance nationally. The National Mammography Database includes Breast Imaging Reporting and Data System (BI-RADS) descriptors [27, 28].

Subjects

We collected data from all screening and diagnostic mammography examinations that were performed at the Medical College of Wisconsin, Milwaukee, an academic, tertiary care medical center, between April 5, 1999 and February 9, 2004. Our database included 48,744 mammography examinations (477 malignant and 48,267 benign) performed on 18,270 patients (Table 1) with a mean age of 56.8 years (range, 18–99 years). Our data set consisted of 65,892 records; each record represents a mammography lesion (benign or malignant) observed on the mammogram or a single record of demographic factors only, if nothing is observed on the mammogram. The data were entered using the PenRad mammography reporting and tracking data system (structured reporting software, PenRad) by technologists and radiologists. There were a total of eight radiologists: four were general radiologists with some mammography background, two were fellowship-trained, and two had lengthy experience in breast imaging. The experience of the eight radiologists ranged between 1 and 35 years, and the number of mammograms interpreted by them ranged from 49 to 22,219. All mammography observations were made by radiologists; all demographic factors were recorded by technologists. This facility used a combination of digital and film mammography (approximately 75% film mammography). No computer-aided detection tool was used for lesion detection. Mean glandular dose was not available at the time of our study.

TABLE 1: Distribution of Study Population

The clinical practice we studied routinely converts screening examinations to diagnostic mammography examinations when an abnormality is identified; therefore, practice performance parameters were calculated in aggregate because these examinations could not be accurately separated. Specifically, we measured recommended performance parameters (cancer detection rate, early-stage cancer detection rate, and abnormal interpretation rate) for all mammograms in our data set.

In contrast to our practice performance audit, which was based on mammograms, the analysis of the classification accuracy of the logistic regression model and radiologists was conducted at the record level. Because breast cancer classification actually occurs at the record level (i.e., each finding on mammography will require a decision to recall or to biopsy), we target this level of detail to help improve radiologists' performance. We clearly indicate when analyses in this article are based on mammograms rather than on records.

We used cancer registry matching as the reference standard in this study. All newly diagnosed cancer cases are reported to the Wisconsin Cancer Reporting System. This registry collaborates with several other state agencies to collect a range of data, including demographic information, tumor characteristics, treatment, and mortality. Data exchange agreements with 17 other state cancer registries yield data for Wisconsin residents receiving care in other states. We sent 65,892 records in the database to the cancer registry and received back 65,904 records after their matching protocol. An additional 12 records were returned to us because of duplication of records for patients diagnosed with more than one cancer. We developed an automated process that confirmed whether the cancer matched the assigned abnormality. This process ensured that the record indicated the same side and the same quadrant and that the diagnosis was made no longer than 12 months after the mammography date. If more than one record indicated the same side and quadrant, the matching was done manually. We used a 12-month follow-up period as the reference standard because it has been recommended as an interval sufficient to identify false-negatives in mammography practice audits [27, 28]. We removed 299 records belonging to 188 mammograms from 124 women because they could not be matched due to missing laterality or quadrant information from either the cancer registry (117 records) or the mammography structured report (182 records) (Table 2). Of the unmatched 299 records, 183 records represented a second record identifying a finding in women who already had a cancer matched to the registry. The remaining 116 records consisted of 38 BI-RADS category 1, 24 category 2, 22 category 3, 21 category 0, four category 4, and seven category 5. We then removed 101 duplicates. Finally, we removed 3,285 records that had BI-RADS assessment categories 0, 3, 4, and 5 (indicating a finding) that did not have descriptors recorded in the record. The final sample consisted of 62,219 (510 malignant, 61,709 benign) records.
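The record accounting in this paragraph can be checked with a few lines of arithmetic. The sketch below is illustrative only: the variable names are ours, and it assumes the three removals are applied to the 65,904 records returned by the registry, which is the reading under which the stated totals agree.

```python
# Counts quoted in the text; variable names are illustrative, not from the study.
sent_to_registry = 65_892
returned_by_registry = 65_904   # 12 extra records for patients with more than one cancer
unmatched = 117 + 182           # missing laterality/quadrant: registry + structured report
duplicates = 101
missing_descriptors = 3_285     # BI-RADS 0, 3, 4, or 5 with no descriptors recorded

# Assumption: removals are applied in sequence to the returned records.
final_records = returned_by_registry - unmatched - duplicates - missing_descriptors
malignant, benign = 510, 61_709

print("extra records from registry:", returned_by_registry - sent_to_registry)
print("unmatched records:", unmatched)
print("final sample:", final_records, "=", malignant, "+", benign)
```

The breakdown of the 299 unmatched records (183 second records plus 116 others) can be verified the same way.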

TABLE 2: Data Processing

Fig. 1 Descriptors of National Mammography Database [26] entered to build logistic regression model for breast cancer prediction.

(a) Binary variable with categories “Present” or “Not Present.”

(b) Class 1, predominantly fatty; class 2, scattered fibroglandular; class 3, heterogeneously dense; and class 4, extremely dense tissue.

Statistical Analysis

Model construction—Logistic regression, a statistical approach to predicting the presence of a disease based on available variables (symptoms, imaging data, patient history, and so forth), has been successfully used for prediction and diagnosis in medicine [29, 30]. To build a breast cancer risk estimation model, we mapped the variables collected by physicians in their daily clinical practice (based on BI-RADS descriptors in the National Mammography Database) to 36 discrete variables. Figure 1 shows the schema of these variables used to build the model. We constructed two risk estimation models. Both models used the presence or absence of breast cancer as the dependent variable, and these 36 discrete variables were used as independent variables to build the model. Model 2 included these same variables plus the BI-RADS assessment categories assigned by the radiologists. More than 600 two-way interaction effects are possible in each model. We did not include any interaction term in our models.

Before model construction, we grouped BI-RADS categories 1 and 2 as “BI-RADS 1 or 2” because these cases had a low frequency of malignancy. The logistic regression model was built using R statistical software (The R Foundation for Statistical Computing) [31]. We used forward selection based on the chi-square test of the change in residual deviance. We used a cutoff of p < 0.001 for adding new terms. This stringent criterion was used to avoid including terms that, although statistically significant because of the large sample size, are not clinically important. The p values listed in Tables 3 and 4 are from chi-square tests of the significance of each term entered last. The importance of each term in predicting breast cancer can be assessed using the odds ratios provided in the tables. The details of logistic regression (including the interpretation of odds ratios) are discussed in Appendix 1.
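To make the entry criterion concrete, the following is a minimal sketch of a deviance-based test for adding a single term. It is not the authors' R code. It assumes a term with one degree of freedom (many of the categorical descriptors would contribute more), for which the chi-square upper tail has the closed form erfc(sqrt(x/2)). The function names and the example deviances are hypothetical.

```python
import math

def deviance_change_p_value_df1(deviance_without, deviance_with):
    """P value for adding one 1-df term, from the drop in residual deviance.

    Assumption: the added term has exactly 1 degree of freedom, so the
    chi-square upper-tail probability equals erfc(sqrt(drop / 2)).
    """
    drop = deviance_without - deviance_with
    if drop < 0:
        raise ValueError("a nested model should not have lower deviance")
    return math.erfc(math.sqrt(drop / 2.0))

def passes_entry_criterion(p_value, cutoff=0.001):
    """Apply the p < 0.001 cutoff described in the text."""
    return p_value < cutoff

# Hypothetical deviances for two candidate terms.
p_strong = deviance_change_p_value_df1(5200.0, 5188.0)  # drop of 12
p_weak = deviance_change_p_value_df1(5200.0, 5195.0)    # drop of 5
print(passes_entry_criterion(p_strong), passes_entry_criterion(p_weak))
```

A drop of 5 would be significant at the conventional 0.05 level but fails the stricter cutoff, which is the behavior the stringent criterion is meant to produce in a large sample.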

TABLE 3: Model 1, Multivariable Model with BI-RADS Categories Excluded

TABLE 4: Model 2, Multivariable Model with BI-RADS Categories Included

A number of sources of correlation are possible in these data. Findings from a particular radiologist may be more similar than findings from different radiologists, findings within a patient may be more similar than those from different patients, and findings during the same mammography visit of a patient may be more similar than those during other mammography visits of the same patient. We investigated models in which the radiologist is included as a random effect and compared them with our models in which the radiologist is excluded. We found no substantial differences in the coefficients for the other terms in the model due to including the radiologist as a random effect. Thus, we chose the simpler model without the radiologist. We were unable to test random effects for patient or for mammogram within patient because the expected number of cancers for each patient is very small. Random effects models tend to be biased in these circumstances [32]. Instead, we relied on our stringent criterion of p < 0.001 for inclusion in the model to avoid the overly optimistic p value that occurs when the variance of the parameters is reduced by positive correlation induced by clustered data. The parameter estimates themselves are unbiased regardless of the form of the variance.

To show that BI-RADS descriptors substantively contribute to prediction accuracy in Model 2, we also constructed a secondary model (Model 3). Model 3 omits these descriptors and includes only patient demographic factors (age, history of breast cancer, family history of breast cancer, history of surgery, breast density, and hormone therapy) and BI-RADS assessment categories as independent variables to test whether performance declines. The details of Model 3 are provided in Appendix 2.

Model evaluation—We used a 10-fold cross-validation technique to evaluate the predictive performance of the two models. This methodology avoids the problem of validating the model on the same data used to estimate the parameters by using separate estimation and evaluation subsets of the data. Specifically, we divided the data set into 10 subsets (with approximately one tenth of benign abnormalities and one tenth of malignant abnormalities in each subset or “fold”) so that all abnormalities associated with a single patient were assigned to the same fold. This ensured that all folds are independent of each other. We started with the first nine folds (omitting the 10th fold) to estimate the coefficients of the independent variables (training) and predicted the probability of cancer on the 10th fold (testing). Then we omitted the 9th fold (used as the testing set) and trained the model using the other nine folds. Similarly, we tested on each fold. Finally, we combined all test sets to obtain a full-test set and evaluated the overall performance of the model using the full-test set. Note that for inclusion of variables in the final model, we used the whole data set (62,219 records), which gave us the best possible estimates of the variables from the available data.
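The fold construction described above can be sketched as follows. This is an illustration under stated assumptions, not the study's procedure: the function names, the toy records, and the choice to balance folds by dealing out patients with and without a malignant record separately are ours. The one property taken directly from the text is that all records of a patient land in the same fold.

```python
import random
from collections import defaultdict

def assign_patient_folds(records, n_folds=10, seed=0):
    """Map each patient ID to a fold so that a patient's records stay together.

    records: iterable of (patient_id, is_malignant) pairs, one per finding.
    Patients with at least one malignant record are dealt out separately
    from the rest so each fold receives a similar share of both groups.
    """
    has_cancer = defaultdict(bool)
    for patient_id, is_malignant in records:
        has_cancer[patient_id] = has_cancer[patient_id] or bool(is_malignant)

    rng = random.Random(seed)
    fold_of = {}
    for group_flag in (True, False):
        patients = sorted(p for p, flag in has_cancer.items() if flag == group_flag)
        rng.shuffle(patients)
        for i, patient_id in enumerate(patients):
            fold_of[patient_id] = i % n_folds
    return fold_of

def train_test_split_for_fold(records, fold_of, test_fold):
    """Return (training records, test records) for one held-out fold."""
    train = [r for r in records if fold_of[r[0]] != test_fold]
    test = [r for r in records if fold_of[r[0]] == test_fold]
    return train, test

# Toy data: 200 hypothetical patients with 1 to 3 records each.
records = [(pid, pid % 25 == 0) for pid in range(200) for _ in range(1 + pid % 3)]
fold_of = assign_patient_folds(records)
train, test = train_test_split_for_fold(records, fold_of, test_fold=9)
print(len(train), len(test))
```

Pooling the ten test sets reproduces the full data set exactly once, which is what allows a single full-test ROC curve to be computed.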

Performance measures—We measured the performance of the two models using the outcome (i.e., the probability of cancer) of the full-test set obtained by 10-fold cross-validation. We plotted and measured area under the receiver operating characteristic (ROC) curve of Model 1 and Model 2 using the probability of cancer. We measured the performance of radiologists using BI-RADS assessment categories assigned to each mammography record. We first ordered BI-RADS assessment categories by likelihood of breast cancer (1, 2, 3, 0, 4, and 5), generated an ROC curve, and measured its area (Az) using a nonparametric method [33]. We compared the performance of the two models with that of radiologists using the nonparametric method of DeLong et al. [34] for comparing two or more areas under ROC curves obtained from the same data set.
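As an illustration of how an ROC area can be obtained from ordinal assessment categories, the sketch below ranks the BI-RADS categories in the order given above and computes the Mann-Whitney estimate of the area, counting ties as one half. We assume this is equivalent to the nonparametric (trapezoidal) method cited; the text does not spell out the computation, and the toy category lists are hypothetical.

```python
# Ordering taken from the text: categories ranked by likelihood of cancer.
BIRADS_RANK = {1: 0, 2: 1, 3: 2, 0: 3, 4: 4, 5: 5}

def nonparametric_auc(scores_malignant, scores_benign):
    """Mann-Whitney estimate of the ROC area; tied pairs count as one half."""
    wins = 0.0
    for m in scores_malignant:
        for b in scores_benign:
            if m > b:
                wins += 1.0
            elif m == b:
                wins += 0.5
    return wins / (len(scores_malignant) * len(scores_benign))

# Hypothetical BI-RADS categories for a handful of records.
malignant_categories = [5, 4, 0, 3]
benign_categories = [1, 2, 3, 0]

auc = nonparametric_auc(
    [BIRADS_RANK[c] for c in malignant_categories],
    [BIRADS_RANK[c] for c in benign_categories],
)
print(auc)
```

The same function applies unchanged to the models' predicted probabilities, since it only needs scores that can be compared.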

For the purpose of assessing the sensitivity and specificity of radiologists, we classified BI-RADS categories 1, 2, and 3 as negative; and BI-RADS categories 0, 4, and 5 as positive [28]. We compared the sensitivity of the two models with the radiologists' sensitivity at 90% specificity, and the specificity of the two models with the radiologists' specificity at 85% sensitivity, with the corresponding CIs estimated using the efficient score method corrected for continuity [35]. Note that the points “sensitivity at 90% specificity” and “specificity at 85% sensitivity” on the radiologists' ROC curve were not observed in practice; they were obtained from the linear interpolation of the two neighboring discrete points. We used these levels of sensitivity and specificity because they represent the minimal performance thresholds for screening mammography [36]. We also estimated the number of true-positive and false-negative records at 90% specificity by multiplying the sensitivity (of radiologists, Model 1 and Model 2) by the total number of malignant records. Similarly, we estimated the number of false-positive and true-negative records at 85% sensitivity by multiplying the specificity (of radiologists, Model 1 and Model 2) by the total number of benign records. Finally, we identified the most important predictors of breast cancer using the odds ratio given in the

Results
Practice Performance

We found the following distribution of breast tissue density: predominantly fatty tissue, 15%; scattered fibroglandular tissue, 41%; heterogeneously dense tissue, 35%; and extremely dense tissue, 9% (Table 1). At the mammogram level, the cancer detection rate was 9.8 cancers per 1,000 mammograms (477 cancers for 48,744 mammograms). The abnormal interpretation rate was 18.5% (9,037 of 48,744 mammograms). Of all cancers detected, 71.9% were early-stage (0 or 1) and only 25.9% had lymph node metastasis. Radiologists showed a sensitivity of 90.5% and a specificity of 82.2% as estimated from BI-RADS assessment categories on the mammogram level.
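The two mammogram-level audit rates quoted above follow directly from the counts in the text. A short arithmetic check, with variable names of our own choosing:

```python
# Mammogram-level counts quoted in the text.
mammograms = 48_744
cancers = 477
abnormal_interpretations = 9_037

cancer_detection_rate_per_1000 = 1000 * cancers / mammograms
abnormal_interpretation_rate_pct = 100 * abnormal_interpretations / mammograms

print(round(cancer_detection_rate_per_1000, 1))    # cancers per 1,000 mammograms
print(round(abnormal_interpretation_rate_pct, 1))  # percent of mammograms
```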

Logistic Regression Model

In Model 1, 10 independent variables (mammographic features and demographic factors) were found to be significant in predicting breast cancer (Table 3). The most important predictors associated with breast cancer as identified by this model were spiculated mass margins, high mass density, segmental calcification distribution, pleomorphic calcification morphology, and history of invasive carcinoma. Age was not found to be a significant predictor, but it was included in the model because of its clinical relevance. In Model 2, which included BI-RADS assessment categories, nine independent variables were significant in predicting the risk of breast cancer (Table 4). The most important predictors associated with breast cancer identified by this model were BI-RADS assessment categories 0, 4, and 5; segmental calcification distribution; and history of invasive carcinoma. Note that the inclusion of BI-RADS assessment categories in Model 2 removed some of the significant predictors found in Model 1 and added others. We tested for the significance of variables in both models (as shown in Tables 3 and 4) using the whole data set. Among demographic factors, neither model found family history of breast cancer or use of hormones to be a significant predictor of breast cancer. Among imaging descriptors, neither model found breast density, architectural distortion, or amorphous calcification morphology to be a significant predictor of breast cancer.

Radiologists achieved an Az of 0.939 ± 0.011 as measured by the BI-RADS assessment category assigned to each record. Model 1 achieved an Az of 0.927 ± 0.015, which was not significantly different (p = 0.104) from the radiologists' Az. Model 2, with an Az of 0.963 ± 0.009, performed significantly better (p < 0.001) than radiologists and Model 1 (Fig. 2). At the abnormality level, we found that at 90% specificity, the sensitivity of Model 2 was 90.2% (95% CI, 87.2–92.6%) and was significantly better (p < 0.001) than that of the radiologists at 82.2% (78.5–85.3%) and Model 1 at 80.7% (77.0–84.1%). Table 5 illustrates that Model 2 identified 41 more cancers than the radiologists at this level of specificity. At a fixed sensitivity of 85%, the specificity of Model 2 at 95.6% (95.4–95.8%) was also significantly better (p < 0.001) than the radiologists at 88.2% (87.9–88.5%) and Model 1 at 87.0% (86.7–87.3%). Table 5 illustrates that Model 2 decreased the number of false-positives by 4,567 when compared with radiologists' performance.
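The count differences quoted here can be reproduced from the reported rates and the record totals (510 malignant, 61,709 benign). The sketch below assumes each estimated count is rounded to a whole record before differencing, which is the reading that yields the stated figures. It also illustrates the linear interpolation between two neighboring ROC points described in Materials and Methods. The operating points used for the interpolation are hypothetical, and the function names are ours.

```python
MALIGNANT_RECORDS = 510
BENIGN_RECORDS = 61_709

def estimated_count(rate, total):
    """Estimated number of records: rate times total, rounded to a whole record."""
    return round(rate * total)

# Sensitivities at 90% specificity and specificities at 85% sensitivity (from the text).
extra_cancers = (estimated_count(0.902, MALIGNANT_RECORDS)
                 - estimated_count(0.822, MALIGNANT_RECORDS))
fewer_false_positives = (estimated_count(0.956, BENIGN_RECORDS)
                         - estimated_count(0.882, BENIGN_RECORDS))
print(extra_cancers, fewer_false_positives)

def interpolate_sensitivity(point_a, point_b, target_specificity):
    """Linearly interpolate sensitivity between two (specificity, sensitivity) points."""
    (spec_a, sens_a), (spec_b, sens_b) = point_a, point_b
    if spec_a == spec_b:
        raise ValueError("the two operating points need different specificities")
    weight = (target_specificity - spec_a) / (spec_b - spec_a)
    return sens_a + weight * (sens_b - sens_a)

# Hypothetical neighboring points on a discrete ROC curve.
print(interpolate_sensitivity((0.80, 0.90), (1.00, 0.70), 0.90))
```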

TABLE 5: Performance Measures

We now illustrate the use of the logistic regression models to estimate the probability of cancer using three cases.

Case 1—A 45-year-old woman presented with a circumscribed oval mass of equal density on her baseline mammogram. She was assigned BI-RADS category 4 by the radiologist for this abnormality. Model 1 and Model 2 estimated her probability of cancer to be equal to 0.05% (95% CI, 0.01–0.23%) and 1.79% (0.27–11.11%), respectively. Biopsy of this case was benign. This is a classic example of a probably benign finding with an estimate of breast cancer of less than 2%.

Case 2—A 52-year-old woman with a history of breast cancer had a mammogram that showed an ill-defined oval mass (< 3 cm) that was increasing in size and had density equal to the surrounding glandular tissue. The radiologist assigned BI-RADS category 3. The probability of malignancy for this finding using Model 1 was 30.6% (8.2–68.6%) and for Model 2 was 3.6% (0.7–17.4%). Biopsy revealed malignancy. This case illustrates the superior predictive ability for Model 1 because the BI-RADS category was not correct and misled Model 2.

Case 3—A 60-year-old woman with a family history of breast cancer had a mammogram that showed a mass with a spiculated margin and irregular shape. Model 1 estimated her probability of cancer to be 51.2% (24.4–78.3%). This abnormality was assigned BI-RADS category 5. Model 2 estimated her probability of cancer to be 69.7% (33.5–91.2%). The biopsy outcome of this case was malignant. This case is a straightforward case of malignancy in which a correct BI-RADS category increases the probability of malignancy using Model 2.

Fig. 2 Graph shows receiver operating characteristic curves constructed from output probabilities of Model 1 and Model 2, and radiologists' BI-RADS assessment categories. AUC = area under curve.

Discussion

We constructed two breast cancer risk estimation models based on the National Mammography Database descriptors to aid radiologists in breast cancer diagnosis. Our results show that the combination of a logistic regression model and radiologists' assessment performs better than either alone in discriminating between benign and malignant lesions. The ROC curve of Model 1, which includes only demographic factors and mammography observations, overlaps and intersects with the radiologists at certain points in the curve, showing that one is not always better than the other. On the other hand, Model 2, which also includes radiologists' impressions, clearly dominates the other two ROC curves, indicating better sensitivity and specificity at all threshold levels. Adding radiologists' overall impressions (BI-RADS category) in Model 2, we could identify more malignant lesions and avoid false-positive cases as compared with the performance of Model 1 and radiologists alone.

Our computer model is different in various ways when compared with the existing mammography computer models in the literature. The existing models can be categorized in the following ways: for detecting abnormalities present on the mammograms, for estimating the risk of breast cancer based on the mammographic observations and patient demographic information, and for predicting the risk of breast cancer to identify high-risk individuals. The first category of models is used to identify abnormalities on the mammograms, whereas our model provides the interpretation of mammography observations after they are identified. The models in the second category, in which we classify our model, have used suspicious findings recommended for biopsy for training and evaluation or biopsy results as the reference standard. For example, one study constructed a Bayesian network using 38 BI-RADS descriptors; by training the model on 111 biopsies performed on suspicious calcifications, they found an Az of 0.919 [37]. Another study developed linear discriminant analysis and artificial neural network models using a combination of mammographic and sonographic features; they found an Az of 0.92 [16]. In contrast, our computer model was trained and evaluated on consecutive mammography examinations and used registry match as the reference standard. The third category of models (risk prediction models) has been built using consecutive cases, but they included only demographic factors and breast density in their model [19, 21, 22] and cannot be directly compared with our model.

In addition, our model differs from these risk prediction models by estimating the risk of cancer at a single time point (i.e., at the time of mammography) instead of over an interval in the future (e.g., over the next 5 years). In contrast to their findings, our model did not find breast density to be a significant predictor of breast cancer. This could be because the risk of breast cancer is explained by more informative mammographic descriptors in our logistic regression model. Our model reinforces previously known mammography predictors of breast cancer—irregular mass shape; ill-defined and spiculated mass margins; fine linear calcifications; and clustered, linear, and segmental calcification distributions [38]. In addition, we found increasing mass size and high mass density to be significant predictors, which has not been shown in the literature to our knowledge. Note that our results reflect a single practice and must be viewed with some caution with respect to their generalizability because significant variability has been observed in the interpretive performance of screening and diagnostic mammography [5, 6].

We developed two risk estimation models by excluding (in Model 1) and including (in Model 2) BI-RADS assessment categories. Although Model 2 performed significantly better than Model 1 in discriminating between benign and malignant lesions, Model 2 may have weaknesses as a stand-alone risk estimation tool if the assessed BI-RADS category is incorrect. If the BI-RADS assessment category does not agree with the findings, Model 1 and Model 2 used jointly will show a high level of disagreement in the prediction of breast cancer (as in example case 2) and will potentially indicate this error. When the radiologist's BI-RADS category is correct (i.e., when there is an agreement between the predictions of Model 1 and Model 2), Model 2 will be a better model for breast cancer prediction. In future work, we plan to estimate the level of disagreement between the two models and investigate the possible use of these models as complementary tools.

Our secondary model (Model 3) showed that the exclusion of the BI-RADS descriptors significantly impairs the performance of the logistic regression model, underscoring the need for the collection of these variables in a clinical practice.

It is common for clinical data sets to contain a substantial amount of missing data. Although complete data are preferable, that situation is rarely encountered in the real world. There is no perfect way to handle missing data, but there are two possibilities: to impute the missing descriptor depending on the fraction of various possible values of the descriptor or to assume that the missing descriptor was not observed by radiologists and mark it as “not present.” When building the model, we made the decision to label all of the missing data as not present; therefore, when testing and applying the model on a new case, the missing descriptors should be treated as not present. Our approach to handling missing data is appropriate for mammography data, where radiologists often leave the descriptors blank if nothing is observed on the mammogram.
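The "not present" convention is easy to apply consistently at both training and prediction time. The following is a minimal sketch; the function name, the field names, and the representation of missing values as `None` or an empty string are assumptions made for illustration and are not taken from the National Mammography Database format.

```python
def encode_descriptor(value):
    """Return 1 when a binary descriptor is recorded as present, else 0.

    A missing entry (None or an empty string) is treated as "not present",
    mirroring the convention described in the text.
    """
    if value is None:
        return 0
    return 1 if str(value).strip().lower() == "present" else 0

# Hypothetical structured-report fragment with two blank descriptors.
raw_record = {
    "architectural_distortion": "Present",
    "skin_retraction": None,
    "nipple_retraction": "",
}
encoded = {name: encode_descriptor(value) for name, value in raw_record.items()}
print(encoded)
```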

To our knowledge, no prior studies discuss a logistic regression–based CADx model incorporating mammography descriptors from consecutive mammograms from a breast imaging practice. The use of a logistic regression model has some attractive features when compared with artificial intelligence prediction tools (e.g., artificial neural networks, Bayesian networks, support vector machines). Logistic regression can identify important predictors of breast cancer using odds ratios and can generate confidence intervals that provide additional information for decision making.

Our models' performance depends on the ability of radiologists to accurately identify findings on mammograms. Therefore, based on the literature, performance may be higher in facilities where most mammograms are read by mammography subspecialists as compared with general radiologists [39]. However, with appropriate training [40], general radiologists in combination with the model may approach the accuracy of subspecialty-trained mammographers. Decreasing variability in mammography interpretation, one of the underlying motivations of this research, can only be realized with further development of tools such as our model and with research to validate accuracy, effectiveness, and generalizability. We consider this work to be only a first step toward this goal.

We could not compare practice parameters directly with the literature because screening and diagnostic examinations could not be separated for this database. Our prediction Model 2 shows a significant improvement over radiologists' assessment in classifying abnormalities when built on a mix of screening and diagnostic data. The model's performance may differ when built separately on screening and diagnostic mammograms. For screening mammograms, the incidence is low and descriptors are less exact because of general imaging protocols and so may result in less accurate model parameters. In contrast, for diagnostic mammograms, the model parameters may be more accurate because more descriptors can be observed as a result of additional specialized views. In addition, the performance of our existing model may differ when tested on screening and diagnostic mammograms separately. The model may perform better when tested on the diagnostic examinations but worse when tested on the screening examinations.

Our risk estimation models are designed to aid radiologists, not to act as a substitute. The improvement in the model's performance by adding BI-RADS assessments indeed suggests that the radiologist's integration of the imaging findings summarized by the BI-RADS assessment categories does augment predictions based on the observed mammographic features. However, the logistic regression model contributes an additional measure of accuracy over and above that provided by the BI-RADS assessment categories, as evidenced by the improved performance compared with that of the radiologists alone.

The objective of our model is to aid decision making by generating a risk prediction for a single point in time (at mammography). As we were designing the study, we did not want to influence the probability of breast cancer based on future events but only on variables identified at the time of mammography. For this reason, we excluded unmatched BI-RADS 1 cases from our analyses, which represented either undetected cancer (present on the mammogram but not seen) or an interval cancer (not detectable on the mammogram). The inclusion of these cases may have erroneously increased the probability of malignancy by considering future risks rather than making a prediction at a single time based on mammography features alone. However, the exclusion of these cases may have erroneously decreased the estimated probability of malignancy, given that at least some of the false-negative cancers were likely present at the time of the mammogram, especially those in women with dense breasts, which is a limitation of our model.

Our models provide the probability of cancer as the outcome that can be used by radiologists for making appropriate patient management decisions. The use of such models has the potential to reduce the interpretive variability of mammography across practices and radiologists. Our models also facilitate shared decision making by providing the probability of cancer, which can be better understood by patients than BI-RADS categories. In the future, we will test our models' performance on other mammography practices to evaluate their generalizability. We will also include potentially important interaction effects that deserve particular attention. Including interaction effects may further improve the performance of our models.

In conclusion, we found that our logistic regression models (Model 1 and Model 2) can effectively discriminate between benign and malignant lesions. Furthermore, we found that both the radiologist alone and the logistic regression model incorporating only mammographic and demographic features (Model 1) are inferior to Model 2, which incorporates the model, the features, and the radiologist's impression as captured by the BI-RADS assessment categories. Our study indicates that further research is needed to define how radiologists and computational models can collaborate, each adding valuable predictive features, experience, and training to improve overall performance.

APPENDIX 1: Logistic Regression

Binomial (or binary) logistic regression is a form of regression that is used when the dependent variable is dichotomous (e.g., present or absent) and the independent variables are of any type (discrete or continuous). The independent (observed) variables, X1, X2,... Xn, are related to the dependent (outcome) variable, Y, by the following equation:

$$\mathrm{Logit}(p) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_n X_n$$

where β0 is the intercept, βi is the regression coefficient of Xi, p = probability {Y = 1}, and

$$\mathrm{Logit}(p) = \ln\left(\frac{p}{1 - p}\right).$$

The value of p can be calculated by taking the inverse of the Logit (p) as shown in the following equation:

$$p = \frac{e^{\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_n X_n}}{1 + e^{\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_n X_n}}$$

where p is the probability of the presence of disease (e.g., probability of cancer) when the findings X1, X2,... Xn (e.g., calcification types, breast density, and age) are identified. Each βi is estimated using the available data (training set). Only significant variables (p values ≤ α) are included in the model. Variables can be added by stepwise, forward, or backward selection methods. The odds ratio, which is estimated by exp (βi), is commonly used to interpret the effect of an independent variable on the dependent variable. For example, if β1 is the coefficient of variable “prior history of breast surgery,” then exp (β1) is the odds ratio corresponding to the history of surgery; that is, the odds that the patient has a malignant lesion increase by the factor of exp (β1) if the patient has ever had breast surgery and all other independent variables remain fixed. More details of logistic regression and its application to the medical field can be found in other sources [29, 41, 42].
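The relationship between the logit, the predicted probability, and the odds ratio can be illustrated with a minimal Python sketch. The intercept, the coefficients, and the two variables used here (age and prior breast surgery) are hypothetical values chosen only for illustration; they are not the coefficients estimated in this study.

```python
import math


def logit(p):
    """Log-odds of a probability p, with 0 < p < 1."""
    return math.log(p / (1.0 - p))


def predict_probability(intercept, coefficients, findings):
    """Inverse logit: probability of disease given the observed findings."""
    z = intercept + sum(b * x for b, x in zip(coefficients, findings))
    return 1.0 / (1.0 + math.exp(-z))


def odds_ratio(beta):
    """Odds ratio associated with a one-unit increase in a variable."""
    return math.exp(beta)


# Hypothetical model (assumed values, not taken from the study):
# X1 = age in years, X2 = prior breast surgery (0 = no, 1 = yes).
intercept = -4.0
coefficients = [0.03, 0.9]

p_without = predict_probability(intercept, coefficients, [50, 0])
p_with = predict_probability(intercept, coefficients, [50, 1])

# The ratio of the two odds equals exp(beta) for the surgery variable,
# because every other variable is held fixed.
observed_ratio = (p_with / (1 - p_with)) / (p_without / (1 - p_without))

print(f"p without surgery: {p_without:.4f}")
print(f"p with surgery:    {p_with:.4f}")
print(f"odds ratio:        {observed_ratio:.4f}")
```

Although the odds are multiplied by a constant factor, the change in probability itself depends on the values of the other variables.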

APPENDIX 2: Model 3

In order to assess the contribution of mammography descriptors in estimating the risk of breast cancer, we constructed Model 3, which included patient demographic factors (age, history of breast cancer, family history of breast cancer, history of surgery, breast density, and hormone therapy) and BI-RADS assessment categories, and excluded mammography descriptors. Only three variables were found significant in predicting the risk of cancer in Model 3 (Table 6); BI-RADS assessment categories were the most important predictor.

TABLE 6: Model 3, Multivariate Model with Patient Demographic Factors and BI-RADS Categories Only

We measured the performance of our model using receiver operating characteristic (ROC) curves and precision–recall curves (Figs. 3 and 4). We used precision–recall curves in addition to ROC curves to gain more insights into the performance of our model because precision–recall curves have higher discriminative power than ROC curves in cases of skewed data [43, 44]. “Precision” measures the positive predictive value and “recall” measures the sensitivity of a test. We plotted and measured the area under the precision–recall curve (APR) of the three models (Model 1, Model 2, and Model 3) and radiologists using the probability of cancer and BI-RADS assessment categories, respectively [43].
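To illustrate why the two curves can give different impressions on skewed data, the following Python sketch computes both areas for a small, made-up set of cases. It is only a sketch under stated assumptions: the labels and scores are invented, the area under the ROC curve is computed from pairwise comparisons of positive and negative cases, and the area under the precision–recall curve is approximated by average precision (a step-wise sum), which is not necessarily the procedure used in this study.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve as the fraction of (positive, negative)
    pairs in which the positive case has the higher score (ties = 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Step-wise approximation of the area under the precision-recall
    curve: precision weighted by the gain in recall at each threshold."""
    total_pos = sum(labels)
    area = 0.0
    prev_recall = 0.0
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= t and y == 0)
        precision = tp / (tp + fp)  # positive predictive value
        recall = tp / total_pos     # sensitivity
        area += (recall - prev_recall) * precision
        prev_recall = recall
    return area


# Invented, skewed example: 2 malignant (1) and 6 benign (0) cases.
labels = [1, 0, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.2, 0.2, 0.1, 0.05]

az = roc_auc(labels, scores)
apr = average_precision(labels, scores)
print(f"Az  = {az:.3f}")
print(f"APR = {apr:.3f}")
```

In this toy example a single high-scoring benign case lowers the precision–recall area more than the ROC area, because precision is sensitive to false-positive findings when malignant cases are rare.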

Model 3 achieved an Az (area under the ROC curve) and APR that were significantly higher than those of Model 1 and the radiologists (all p < 0.001). More important, Model 3, which excluded descriptors, performed significantly worse (p < 0.001) than Model 2, which included descriptors, in terms of Az and APR (Table 7). Thus, the inclusion of mammographic descriptors significantly contributes to the superior performance of Model 2.

TABLE 7: Comparison of Area Under Receiver Operating Characteristic (Az) and Precision–Recall (APR) Curves

Fig. 3 Graph shows receiver operating characteristic curves constructed from output probabilities of Model 1, Model 2, and Model 3, and radiologist's BI-RADS assessment categories. AUC = area under curve.

Fig. 4 Graph shows precision–recall curves constructed from output probabilities of Model 1, Model 2, and Model 3, and radiologist's BI-RADS assessment categories. AUC = area under curve, PPV = positive predictive value.

Address correspondence to E. S. Burnside.

References
1. Kopans DB. The positive predictive value of mammography. AJR 1992; 158:521-526
2. Barlow WE, Chi C, Carney PA, et al. Accuracy of screening mammography interpretation by characteristics of radiologists. J Natl Cancer Inst 2004; 96:1840-1850
3. Kerlikowske K, Grady D, Barclay J, et al. Variability and accuracy in mammographic interpretation using the American College of Radiology Breast Imaging Reporting and Data Systems. J Natl Cancer Inst 1998; 90:1801-1809
4. Elmore JG, Miglioretti DL, Reisch LM, et al. Screening mammograms by community radiologists: variability in false-positive rates. J Natl Cancer Inst 2002; 94:1373-1380
5. Miglioretti DL, Smith-Bindman R, Abraham L, et al. Radiologist characteristics associated with interpretive performance of diagnostic mammography. J Natl Cancer Inst 2007; 99:1854-1863
6. Taplin S, Abraham L, Barlow WE, et al. Mammography facility characteristics associated with interpretive accuracy of screening mammography. J Natl Cancer Inst 2008; 100:876-887
7. Freer TW, Ulissey MJ. Screening mammography with computer-aided detection: prospective study of 12,860 patients in a community breast center. Radiology 2001; 220:781-786
8. Dean JC, Ilvento CC. Improved cancer detection using computer-aided detection with diagnostic and screening mammography: prospective study of 104 cancers. AJR 2006; 187:20-28
9. Cupples TE, Cunningham JE, Reynolds JC. Impact of computer-aided detection in a regional screening mammography program. AJR 2005; 185:944-950
10. Birdwell RL, Bandodkar P, Ikeda DM. Computer-aided detection with screening mammography in a university hospital setting. Radiology 2005; 236:451-457
11. Baker JA, Kornguth PJ, Lo JY, Williford ME, Floyd CE Jr. Breast cancer: prediction with artificial neural network based on BI-RADS standardized lexicon. Radiology 1995; 196:817-822
12. Bilska-Wolak AO, Floyd CE Jr. Development and evaluation of a case-based reasoning classifier for prediction of breast biopsy outcome with BI-RADS lexicon. Med Phys 2002; 29:2090-2100
13. Burnside ES, Rubin DL, Shachter RD. Using a Bayesian network to predict the probability and type of breast cancer represented by microcalcifications on mammography. Stud Health Technol Inform 2004; 107(Pt 1):13-17
14. Fischer EA, Lo JY, Markey MK. Bayesian networks of BI-RADS descriptors for breast lesion classification. Conf Proc IEEE Eng Med Biol Soc 2004; 4:3031-3034
15. Markey MK, Lo JY, Floyd CE. Differences between computer-aided diagnosis of breast masses and that of calcifications. Radiology 2002; 223:489-493
16. Jesneck JL, Lo JY, Baker JA. Breast mass lesions: computer-aided diagnosis models with mammographic and sonographic descriptors. Radiology 2007; 244:390-398
17. Claus EB, Risch N, Thompson WD. Autosomal dominant inheritance of early-onset breast cancer: implications for risk prediction. Cancer 1994; 73:643-651
18. Colditz GA, Rosner B. Cumulative risk of breast cancer to age 70 years according to risk factor status: data from the Nurses' Health Study. Am J Epidemiol 2000; 152:950-964
19. Gail MH, Brinton LA, Byar DP, et al. Projecting individualized probabilities of developing breast cancer for white females who are being examined annually. J Natl Cancer Inst 1989; 81:1879-1886
20. Taplin SH, Thompson RS, Schnitzer F, Anderman C, Immanuel V. Revisions in the risk-based Breast Cancer Screening Program at Group Health Cooperative. Cancer 1990; 66:812-818
21. Barlow WE, White E, Ballard-Barbash R, et al. Prospective breast cancer risk prediction model for women undergoing screening mammography. J Natl Cancer Inst 2006; 98:1204-1214
22. Tice JA, Cummings SR, Smith-Bindman R, Ichikawa L, Barlow WE, Kerlikowske K. Using clinical factors and mammographic breast density to estimate breast cancer risk: development and validation of a new predictive model. Ann Intern Med 2008; 148:337-347
23. Vyborny CJ, Giger ML, Nishikawa RM. Computer-aided detection and diagnosis of breast cancer. Radiol Clin North Am 2000; 38:725-740
24. Doi K, Macmahon H, Katsuragawa S, Nishikawa RM, Jiang Y. Computer-aided diagnosis in radiology: potential and pitfalls. Eur J Radiol 1999; 31:97-109
25. Freedman AN, Seminara D, Gail MH, et al. Cancer risk prediction models: a workshop on development, evaluation, and application. J Natl Cancer Inst 2005; 97:715-723
26. Osuch JR, Anthony M, Bassett LW, et al. A proposal for a national mammography database: content, purpose, and value. AJR 1995; 164:1329-1334
27. American College of Radiology. Breast Imaging Reporting and Data System (BI-RADS), 3rd ed. Reston, VA: American College of Radiology, 1998
28. American College of Radiology. Breast Imaging Reporting and Data System (BI-RADS), 4th ed. Reston, VA: American College of Radiology, 2004
29. Bagley SC, White H, Golomb BA. Logistic regression in the medical literature: standards for use and reporting, with particular attention to one medical domain. J Clin Epidemiol 2001; 54:979-985
30. Gareen IF, Gatsonis C. Primer on multiple regression models for diagnostic imaging research. Radiology 2003; 229:305-310
31. R Development Core Team. R: a language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing, 2005
32. Moineddin R, Matheson FI, Glazier RH. A simulation study of sample size for multilevel logistic regression models. BMC Med Res Methodol 2007; 7:34
33. Hanley JA, McNeil BJ. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology 1982; 143:29-36
34. DeLong ER, DeLong D, Clarke-Pearson D. Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics 1988; 44:837-845
35. Newcombe RG. Two-sided confidence intervals for the single proportion: comparison of seven methods. Stat Med 1998; 17:857-872
36. Bassett LW, Hendrick RE, Bassford TL. Quality determinants of mammography. Clinical practice guideline. No. 13. Rockville, MD: Agency for Health Care Policy and Research. Public Health Service, U.S. Department of Health and Human Services, 1994
37. Burnside ES, Rubin DL, Fine JP, Shachter RD, Sisney GA, Leung WK. Bayesian network to predict breast cancer risk of mammographic microcalcifications and reduce number of benign biopsy results: initial experience. Radiology 2006; 240:666-673
38. Liberman L, Abramson AF, Squires FB, Glassman JR, Morris EA, Dershaw DD. The Breast Imaging Reporting and Data System: positive predictive value of mammographic features and final assessment categories. AJR 1998; 171:35-40
39. Sickles EA, Wolverton DE, Dee KE. Performance parameters for screening and diagnostic mammography: specialist and general radiologists. Radiology 2002; 224:861-869
40. Berg WA, D'Orsi CJ, Jackson VP, et al. Does training in the Breast Imaging Reporting and Data System (BI-RADS) improve biopsy recommendations or feature analysis agreement with experienced breast imagers at mammography? Radiology 2002; 224:871-880
41. Kleinbaum DG. Logistic regression: a self-learning text. New York, NY: Springer-Verlag, 1994
42. Hosmer D, Lemeshow S. Applied logistic regression. New York, NY: Wiley, 1989
43. Davis J, Goadrich M. The relationship between precision-recall and ROC curves. Proceedings of the 23rd International Conference on Machine Learning. Pittsburgh, PA: ICML, 2006:233-240
44. Chhatwal J, Burnside ES, Alagoz O. Receiver operating characteristic (ROC) curves versus precision-recall (PR) curves in models evaluated with unbalanced data. Proceedings of the 29th annual meeting of the Society for Medical Decision Making. Pittsburgh, PA: SMDM, 2007
