Neoadjuvant chemotherapy (NACT) in women with locally advanced breast cancer increases the number of operable tumors and women eligible for breast conservation therapy without decreasing either overall survival or disease-free survival [
1,
2]. However, not all tumors respond favorably or completely to NACT; therefore, an accurate preoperative measurement of residual tumor size after NACT would be helpful in determining the appropriate surgical approach to minimize both morbidity and reexcision rates. Furthermore, the ability to preoperatively detect pathologic complete response (CR) could provide important prognostic information to aid in personalized treatment planning [
2–
6]. Although NACT alone is not currently the standard of care, clinical trials have been proposed or are currently under way to evaluate the potential for women with good response to NACT to avoid surgery or radiation altogether [
7,
8], in which case the ability to accurately identify those women preoperatively is critical. Currently, the accuracy of different preoperative measurement techniques for reflecting size of residual disease after NACT and detecting pathologic CR is not well established.
Common methods for measuring the size of residual tumor include clinical examination, ultrasound, MRI, and mammography. Challenges in differentiating residual tumor from chemotherapy-induced fibrosis, biopsy-site changes, and tumor necrosis can result in inaccurate estimates of tumor size by each of these modalities [
9,
10]. Previous studies comparing the accuracy of clinical and imaging measurements of residual tumors after NACT showed that MRI measurements correlated best with pathology size but also revealed that both overestimation and underestimation of residual disease occurred [
11–
14]. Although the American College of Radiology (ACR) Appropriateness Criteria supports the use of MRI before and after NACT for monitoring tumor response [
15], data showing its ability to assess the amount of residual disease after NACT, including the ability to detect pathologic CR, would help guide treatment planning. Thus, further work determining the accuracy of MRI in comparison with other methods for assessing pathologic CR and extent of residual disease after NACT is necessary.
Recent studies from the ACR Imaging Network (ACRIN) 6657 Trial showed that changes in MRI size measures after initiating NACT are predictive of pathologic CR and 3-year survival [
16,
17]. Additional studies have shown that the MRI lesion type [
18], the presence of ductal carcinoma in situ (DCIS) [
19,
20], and histologic subtype [
18,
21,
22] can affect the accuracy of MRI measurements for assessing residual disease after NACT. As the third and final primary aim of the ACRIN 6657 Trial, this current study builds on the previous studies to evaluate the accuracy of post-NACT lesion size measurements by clinical examination, mammography, and MRI for detecting pathologic CR and assessing extent of residual disease.
Subjects and Methods
ACRIN 6657 was a multicenter prospective clinical trial that was conducted at nine academic and private institutions and was funded by the National Cancer Institute. ACRIN 6657 was performed as the imaging component of a larger treatment study (Cancer and Leukemia Group B [CALGB] 150007: Investigation of Serial Studies to Predict Your Therapeutic Response with Imaging and moLecular Analysis [I-SPY TRIAL]), with the collective goal of identifying both imaging and tissue-based biomarkers that can predict response to standard NACT. The primary aims of the study were to evaluate the ability of MRI and tumor biomarkers to predict treatment response and 3-year disease-free survival after NACT [
16,
17,
23]. The full study protocol is available at
www.acrin.org/6657_protocol.aspx.
Patient Eligibility and Enrollment
The inclusion criteria for ACRIN 6657 have been previously described [
17]. Patients meeting eligibility criteria for the study were enrolled from May 2002 to March 2006 after institutional review board approval was obtained from the ACR and individual participating institutions and appropriate patient consent was obtained. The eligibility criteria included enrollment in CALGB 150007 with tumors measuring at least 3 cm in largest dimension by clinical or imaging examination and receiving NACT. Patients with metastatic disease were excluded. Standard NACT included four cycles of anthracycline-cyclophosphamide and possibly four cycles of taxane. Exclusion criteria included pregnancy and presence of a ferromagnetic prosthesis. Consecutive patients were first screened and consented for CALGB 150007 and were then registered for ACRIN 6657, as described previously [
16]. Tumor response to chemotherapy was determined according to the Response Evaluation Criteria in Solid Tumors (RECIST). For this analysis, all patients who underwent MRI, clinical examination, and mammography after NACT were included. Some of the data from these patients have been reported previously in studies investigating the ability of MRI to predict pathologic CR and recurrence-free survival [
17,
18,
24].
MRI Protocol
The MRI examinations used for measurements in this analysis were those performed after completion of NACT (MRI examination 4 in the trial). Imaging procedures have been published previously [
17]. Briefly, MRI examinations were performed on a 1.5-T magnet with a dedicated breast coil. Patients were imaged in the prone position with an IV catheter inserted in the antecubital vein or hand. Gadopentetate dimeglumine was administered with the start of data acquisition at a dose of 0.1 mmol/kg of body weight over 15 seconds, followed by a 10-mL saline flush over 15 seconds. The use of a power injector for gadolinium injection was not specified in the protocol. The MRI protocol included a sagittal T2-weighted fat-suppressed sequence followed by sequential high-resolution (≤ 1 mm in-plane spatial resolution) 3D T1-weighted fat-suppressed unenhanced and two or more contrast-enhanced gradient-echo sequences (TR ≤ 20 ms, TE = 4.5 ms, flip angle ≤ 45°, and section thickness ≤ 2.5 mm). T1-weighted imaging times were between 4.5 and 5 minutes per phase with the contrast-enhanced phases centered at approximately 2 and 7.5 minutes after contrast injection.
MRI Size and Volumetric Assessments
All MR images were evaluated by an interpreting radiologist at each participating site who measured maximum diameter on MRI and by researchers at a central site who measured functional tumor volume on MRI. Either a breast imager (seven sites) or an MRI radiologist (two sites) with a minimum of 3 years of experience performed image interpretation at each site according to the ACR BI-RADS MRI guidelines [
25]. The radiologists' interpretation of each index lesion included size and extent, MRI lesion type (mass vs nonmass enhancement [NME]), number of lesions, enhancement kinetics, and T2 appearance. The longest diameter by MRI was assessed as the longest dimension of suspicious enhancement including intervening nonenhancing tissue on the lateral-medial or cranial-caudal maximum-intensity-projection (MIP) images created from the first contrast-enhanced dataset. The orientation of the measurement used on the baseline MRI examination was kept constant on all subsequent MRI examinations, including MRI examination 4 used for this analysis, to maximize sensitivity and accuracy for detecting changes in tumor size. Each longest diameter by MRI was measured prospectively once by the interpreting radiologist [
26].
Quantitative imaging of all lesions was performed at the Breast Imaging Laboratory at the University of California San Francisco using previously described methods for tumor volume measurement based on contrast kinetics [
17]. The early percentage enhancement (PE), which was defined as the percentage change in enhancement from the unenhanced acquisition to the first contrast-enhanced acquisition, and signal enhancement ratio (SER), which compared early to late contrast enhancement levels, were calculated for each voxel. Tumor volume on MRI was calculated as the sum of voxels meeting nominal enhancement thresholds of PE > 70% with SER > 0.9 (i.e., voxels reflecting plateau and washout enhancement characteristics). Site-specific adjustments to the PE threshold were made as necessary to adjust for variability in MRI systems and imaging parameters [
27]. Tumors were excluded if the longest dimension or volume could not be measured (i.e., considered nonmeasurable) because the MRI examinations were not performed or not received from the study site, the image quality was poor, the contrast enhancement time points were incomplete, or image misregistration occurred.
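The voxel-wise thresholding described above can be illustrated with a minimal sketch. It is not the Breast Imaging Laboratory's implementation. The formula used for SER (early enhancement divided by late enhancement, each relative to the unenhanced signal) is an assumption, because the text states only that SER compares early to late enhancement. The function name, argument names, and the treatment of voxels that have washed out to or below baseline are likewise hypothetical.

```python
from typing import Sequence


def functional_tumor_volume(
    s0: Sequence[float],
    s1: Sequence[float],
    s2: Sequence[float],
    voxel_volume_cm3: float,
    pe_threshold: float = 70.0,
    ser_threshold: float = 0.9,
) -> float:
    """Return the summed volume of voxels with PE > pe_threshold and SER > ser_threshold.

    s0, s1, s2 are flattened per-voxel signal intensities from the unenhanced,
    first (early) contrast-enhanced, and late contrast-enhanced acquisitions.
    """
    if not (len(s0) == len(s1) == len(s2)):
        raise ValueError("all three acquisitions must have the same number of voxels")

    n_tumor_voxels = 0
    for pre, early, late in zip(s0, s1, s2):
        if pre <= 0:
            continue  # no usable baseline signal, so PE is undefined
        early_gain = early - pre
        late_gain = late - pre

        # Early percentage enhancement: change from unenhanced to first enhanced phase.
        pe = 100.0 * early_gain / pre
        if pe <= pe_threshold:
            continue

        # Assumed SER = early_gain / late_gain. Comparing in this multiplied form
        # avoids division by zero and (an assumption of this sketch) counts voxels
        # whose late signal has fallen to or below baseline as washout.
        if early_gain > ser_threshold * late_gain:
            n_tumor_voxels += 1

    return n_tumor_voxels * voxel_volume_cm3
```

With the nominal thresholds, a voxel that rises from 100 to 200 and then falls to 190 is counted (PE = 100%, SER of about 1.11), whereas one that rises from 100 to 180 and continues to 200 is not (PE = 80%, SER = 0.8, a persistent curve). The site-specific PE adjustment mentioned above would correspond to passing a different `pe_threshold`.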
Mammographic Size Assessment
Mammography was performed before and after NACT. A board-certified radiologist at each participating site interpreted each mammography examination according to the ACR BI-RADS [
25] and recorded the lesion's longest dimension, which we refer to here as the longest diameter by mammography. The longest dimensions of the tumor after NACT were assessed on both the craniocaudal and mediolateral oblique views, and the longest dimension on the two images determined the longest diameter by mammography. The measurements included spiculations, calcifications, and distortion (when present). The orientation used to determine the longest dimension was assessed independent of the pre-NACT longest diameter by mammography, longest diameter by MRI, and longest diameter by clinical examination and therefore may have differed.
Clinical Size Assessment
The patient's clinician measured lesion size by palpation and recorded the longest dimension, which we refer to here as the longest diameter by clinical examination, before surgery. The clinician had access to the medical records and previous imaging reports while making the clinical measurement. Ultrasound measurements of tumor size were not prospectively collected for the ACRIN 6657 study and therefore are not included in this analysis.
Histopathologic Analysis
Each institution performed histopathologic analysis, including assessments of tumor receptor status for estrogen and progesterone hormones (hormone receptor [HR]) and human epidermal growth factor receptor 2 (HER2) on all initial biopsies in accordance with the I-SPY Trial protocol [
28]. For all surgical specimens, final histopathologic analyses—including assessment of pathologic CR, size of residual invasive disease, and presence of DCIS—were reinterpreted by a centralized group of trained breast pathologists to standardize measurements. Pathologic CR was defined as no residual invasive disease in the breast or axillary lymph nodes after surgery. Reinterpretation by the centralized group of pathologists resulted in a change in residual disease measurements in a subset of cases, as previously reported [
17]. Centralized pathology assessment after NACT was used as the reference standard for final pathology size and pathologic CR for our analysis.
Statistical Methods
Simple and multiple logistic regression models were used to investigate the relationships between the preoperative measurements (longest diameter by mammography, MRI, and clinical examination and functional tumor volume on MRI) and a binary response outcome of pathologic CR. The odds ratios for the preoperative measurements and their 95% CIs were estimated. Furthermore, we conducted ROC curve analysis to examine the performance of each preoperative measurement. Specifically, the nonparametric areas under the ROC curves (AUCs) were calculated for preoperative measurements, both individually and combined, for assessing pathologic CR in all lesion types, single masses, multiple masses, and single NMEs [
29]. For comparison purposes, ROC analysis was conducted using only cases with all four preoperative measurements.
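For readers less familiar with the nonparametric AUC, the following sketch shows one standard way to compute it, using its equivalence with the Mann-Whitney statistic. It is illustrative only; the study's analyses were run in SAS, and the function name and example values are hypothetical. The sketch assumes that a smaller residual measurement indicates a higher likelihood of pathologic CR.

```python
from typing import Sequence


def auc_smaller_predicts_pcr(sizes: Sequence[float], pcr: Sequence[bool]) -> float:
    """Nonparametric AUC for predicting pathologic CR from a preoperative size.

    Equals the proportion of (pCR, non-pCR) patient pairs in which the pCR
    patient has the smaller measurement; tied pairs count as one half.
    """
    if len(sizes) != len(pcr):
        raise ValueError("sizes and pcr must have the same length")

    pcr_sizes = [s for s, y in zip(sizes, pcr) if y]
    non_pcr_sizes = [s for s, y in zip(sizes, pcr) if not y]
    if not pcr_sizes or not non_pcr_sizes:
        raise ValueError("both outcome groups must be represented")

    concordant = 0.0
    for p in pcr_sizes:
        for n in non_pcr_sizes:
            if p < n:
                concordant += 1.0
            elif p == n:
                concordant += 0.5

    return concordant / (len(pcr_sizes) * len(non_pcr_sizes))
```

An AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation of patients with and without pathologic CR. Because many post-NACT measurements are exactly zero, the handling of ties affects the result noticeably in this setting.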
Spearman rank correlations and linear regression models were used to evaluate the associations between each preoperative measurement and final pathology size in patients with residual invasive disease. Differences between preoperative longest diameter measurements (by mammography, MRI, and clinical examination) and final pathology size were calculated to assess overestimation and underestimation of residual invasive disease. For linear regression models, a natural log (ln) transformation was used to make the response variable, pathology size, more normally distributed. These statistical analyses were performed for all lesion types combined and separately for single masses, single NMEs, multiple masses, and lesions without DCIS.
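The three quantities described in this paragraph can be sketched as follows, assuming paired lists of preoperative measurements and final pathology sizes in millimeters for patients with residual invasive disease. This is a minimal illustration and not the SAS code used in the study; all function names are hypothetical, and the regression shown is a simple one-predictor least-squares fit.

```python
import math
from statistics import mean
from typing import List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..n, assigning tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return sxy / math.sqrt(sxx * syy)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def mean_signed_difference(measured: Sequence[float], pathology: Sequence[float]) -> float:
    """Mean of (measurement - pathology size); positive values mean overestimation."""
    return mean(m - p for m, p in zip(measured, pathology))


def fit_ln_pathology(measured: Sequence[float], pathology: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of ln(pathology size) = intercept + slope * measurement."""
    ln_path = [math.log(p) for p in pathology]  # requires pathology sizes > 0
    mx, my = mean(measured), mean(ln_path)
    sxx = sum((m - mx) ** 2 for m in measured)
    if sxx == 0:
        raise ValueError("slope is undefined when the measurement is constant")
    slope = sum((m - mx) * (l - my) for m, l in zip(measured, ln_path)) / sxx
    return my - slope * mx, slope
```

Under the sign convention used here (measurement minus pathology size), a negative mean difference indicates that the preoperative method underestimated residual disease on average. The log transform requires strictly positive pathology sizes, which holds for patients with residual invasive disease.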
We also investigated whether the relationships between each preoperative measurement and pathologic CR or final pathology size were significantly different by histologic subtype (HR-negative–HER2-negative, HR-positive–HER2-negative, or HER2-positive) or by mammographic breast density (mostly fat, scattered fibroglandular densities, heterogeneously dense, extremely dense). Specifically, simple logistic and linear regression models were used to predict pathologic CR and final pathology size. In each model, one of the four preoperative measurements and tumor subtype (or mammographic density) were included as predictors. The interaction between the preoperative measurement and tumor subtype (or mammographic breast density) was also included in the model and tested for significance by Wald tests.
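Written out, the pathologic CR interaction model described above takes roughly the following form. The notation is introduced here for illustration and does not appear in the study protocol: x_i is one preoperative measurement for patient i, s_ik is an indicator for subtype (or density) category k relative to a reference category, and K is the number of categories. Whether the interaction was tested jointly across categories or coefficient by coefficient is not specified in the text; the joint null hypothesis shown is an assumption.

```latex
\operatorname{logit} \Pr(\mathrm{pCR}_i = 1)
  = \beta_0 + \beta_1 x_i
  + \sum_{k=1}^{K-1} \gamma_k \, s_{ik}
  + \sum_{k=1}^{K-1} \delta_k \, x_i \, s_{ik},
\qquad
H_0 : \delta_1 = \cdots = \delta_{K-1} = 0
```

The linear model for final pathology size has the same right-hand side, with the response replaced by the pathology size (ln-transformed, as described for the linear regression models). A nonsignificant Wald test of the interaction coefficients indicates no detectable evidence that the association between the measurement and the outcome differs across categories.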
For this study, p ≤ 0.05 was considered significant. All statistical data analyses were performed with SAS software (version 9.3, SAS Institute).
Results
Patient and Tumor Characteristics
Of the 230 eligible ACRIN 6657 Trial study patients [
17], 52 were excluded from this study because preoperative functional tumor volume on MRI was not measurable at the post-NACT time point and four were excluded because final pathologic CR data were not available, resulting in a final analysis group of 174 patients with imaging and pathologic data. Seven participating institutions submitted between three and 61 cases each. All four preoperative measurements were available for 138 of 174 (79%) tumors at the post-NACT time point (
Fig. 1).
Table 1 includes a summary of the characteristics of the full ACRIN 6657 cohort (
n = 230) and the subgroup comprising this study dataset (
n = 174). The full ACRIN 6657 cohort and the subgroup included in this study appeared similar demographically (i.e., age, race, ethnicity, menopausal status, and mammographic breast density). The mean age for the analysis set was 47.6 ± 9.1 (SD) years. Predominant self-reported racial backgrounds included white (133/174, 76.4%), black (30/174, 17.2%), and Asian (8/174, 4.6%). Most women had mammographic breast density of either scattered fibroglandular densities (49/174, 28.2%) or heterogeneously dense (88/174, 50.6%).
Of the 174 lesions, 103 (59.2%) were classified as single lesions (66 masses, 37 NMEs) and 71 (40.8%) were classified as multiple lesions (63 masses, eight NMEs) on the initial MRI. The predominant histologic type was invasive ductal carcinoma in 141 of 174 (81.0%) lesions, with DCIS also present in 80 of 174 (46.0%) tumors. Histologic subtype distribution in the analysis set was 22.4% (39/174) HR-negative–HER2-negative, 43.7% (76/174) HR-positive–HER2-negative, and 32.2% (56/174) HER2-positive. Mean final pathology size was 22.5 mm (SD, 30.5 mm). A pathologic CR was achieved in 51 of 174 (29.3%) patients.
Association Between Preoperative Measurements and Pathologic Complete Response
Of the 51 patients with pathologic CR on final pathology, zero residual disease was accurately detected using longest diameter by clinical examination in 37 patients, longest diameter by mammography in 18, longest diameter by MRI in 27, and functional tumor volume on MRI in 19. In patients with pathologic CR on final pathology and nonzero residual disease detected on preoperative measurements, median longest diameter by MRI was 20.0 mm (range, 3.0–86.0 mm), median longest diameter by clinical examination was 21.0 mm (range, 4.0–40.0 mm), median functional tumor volume on MRI was 0.08 cm³ (range, 0.005–2.13 cm³), and median longest diameter by mammography was 31.0 mm (range, 5.0–100.0 mm).
Table 2 shows the association between preoperative measurements and pathologic CR. In simple logistic regression analysis, each of the preoperative measurements showed significant association with pathologic CR for all lesions (
n = 174,
p < 0.05).
Figure 2 shows examples of MRI findings in cases with and without pathologic CR.
In ROC analysis of cases with all four preoperative measures (
n = 138), longest diameter by MRI showed the highest accuracy for detecting pathologic CR for all lesions (AUC = 0.76) (
Fig. 3), multiple masses (
n = 50; AUC = 0.78), and NME (
n = 27; AUC = 0.84). All four preoperative measurements showed similar accuracy for detecting pathologic CR for single masses (
n = 56; AUC = 0.69–0.72). There were insufficient cases with multiple NMEs for a separate subanalysis. We further excluded tumors with DCIS to assess whether the presence of DCIS, which is not considered when determining pathologic CR by pathologic assessment, reduced the accuracy for predicting pathologic CR by preoperative measurements. After excluding tumors with DCIS, longest diameter by MRI maintained the highest AUC (
n = 76; AUC = 0.74). Combining all preoperative measures could increase performance for detecting pathologic CR for all lesions (AUC = 0.79) (
Fig. 3), single masses (AUC = 0.84), single NMEs (AUC = 0.84), multiple masses (AUC = 0.78), and tumors without DCIS (AUC = 0.78).
Testing of the interaction between histologic subtype (HR-negative–HER2-negative, HR-positive–HER2-negative, and HER2-positive) and each preoperative measurement did not show significant differences for detecting pathologic CR by histologic subtype (p = 0.09–0.78). Similarly, there was no significant difference for detecting pathologic CR by mammographic breast density (p = 0.12–0.73).
Association Between Preoperative Measurements and Final Pathology Size
Of the 123 patients with residual invasive disease on final pathology (non–pathologic CR), no disease was detected on preoperative measurements of longest diameter by clinical examination in 45 patients, longest diameter by mammography in 13 patients, longest diameter by MRI in 12 patients, and functional tumor volume on MRI in 29 patients.
Table 3 shows the association between preoperative measurements and final pathology size. The longest diameter by MRI, functional tumor volume on MRI, and longest diameter by clinical examination each showed a significant association with final pathology size for all lesions (
p < 0.05). The longest diameter by MRI showed the strongest association with final pathology size for all lesions (
r = 0.33) and was the only preoperative measure to show a significant association for single masses (
r = 0.47). The longest diameter by MRI (
r = 0.58), functional tumor volume on MRI (
r = 0.65), and longest diameter by clinical examination (
r = 0.53) showed a significant association with final pathology size for multiple masses. No preoperative measurement showed significant association with final pathology size for a single NME alone. Exclusion of tumors with DCIS showed longest diameter by MRI, functional tumor volume on MRI, and longest diameter by clinical examination measurements to have similar associations with final pathology size (
r = 0.23–0.27), with both functional tumor volume on MRI and longest diameter by clinical examination significant (
p = 0.05) and longest diameter by MRI borderline significant (
p = 0.05) predictors of final pathology size by linear regression modeling.
On the basis of calculated differences of longest diameter measures minus final pathology size, longest diameter by MRI most accurately reflected the size of residual invasive disease across all lesion categories, with a mean size difference of 2.4 mm for all lesions, compared with 8.2 mm for mammography and −11.2 mm for clinical examination (
Table 3). Results indicate longest diameter by MRI tended to slightly underestimate size of residual disease in single lesions (mean size differences, −3.3 mm for masses and −2.9 mm for nonmasses) and to overestimate in cases of multiple lesions (mean size difference = 11.0 mm for multiple masses).
Testing of the interaction between histologic subtype (HR-negative–HER2-negative, HR-positive–HER2-negative, and HER2-positive) and each preoperative measurement did not show significant differences in the association with final pathology size by histologic subtype (p = 0.14–0.83). Similarly, there was no significant difference in the association with final pathology size by mammographic breast density (p = 0.11–0.19).
Discussion
In an era moving toward individualizing cancer therapies, accurate posttreatment assessment of residual disease is important to select appropriate subsequent therapeutic and surgical management strategies. Our analysis showed that, overall, longest diameter by MRI had the greatest accuracy for detecting pathologic CR and measuring extent of residual invasive disease after NACT. Additionally, we found that associations between preoperative measures and final pathology size and pathologic CR were influenced by MRI lesion type (single mass, multiple masses, single NME), whereas the presence of DCIS, histologic subtype, and mammographic density had little effect. Although longest diameter by MRI showed the highest correlation with pathology in the assessment of extent of residual invasive disease, the correlations of preoperative measures (in particular, of longest diameter by mammography) with final pathology size were overall low, suggesting that they may have limited utility in guiding surgery in patients without pathologic CR after NACT.
Although several studies have compared associations between different preoperative measurements and final pathology size after NACT, there is no consensus about which method is more accurate. Studies comparing the accuracy of MRI and mammography for detecting pathologic CR have consistently shown MRI to be superior [
10,
30–
33], which was further confirmed in our study. However, the reported correlations of MRI with final pathology size have been variable across studies [
34]. Some of these differences may reflect the heterogeneous nature of breast cancer, variability in pathology size assessments, or differences in study design. In our study, the correlation between longest diameter by MRI and final pathology size was surprisingly low (
r = 0.33 for all lesions), although it was within the wide range of those previously reported (range, 0.21–0.89) [
13,
22,
33,
35–
39]. Despite the low correlation, longest diameter measurements of residual disease by MRI more closely matched final pathology size than did those by mammography, which tended to overestimate, or by clinical examination, which tended to underestimate disease. Further, although all preoperative measures similarly predict the presence of pathologic CR in single masses, longest diameter by MRI appeared to more accurately reflect size of residual disease.
Our findings indicate that some of the variability in correlation with final pathology size across studies may result from differences in index lesion types in the study cohorts (mass vs NME). Subanalysis showed longest diameter by MRI was more closely correlated with pathology size in masses, particularly multiple masses (r = 0.58), but did not correlate at all in NME (r = −0.06), suggesting that preoperative MRI may be useful for guiding post-NACT treatment of larger areas of disease present before NACT.
In contrast to a previous study by Choi et al. [
20], we did not observe an improvement in the associations between preoperative MRI measurements and final pathology size by excluding lesions with DCIS. Choi et al. evaluated patients with pathologic CR (no residual invasive cancer) with DCIS and without DCIS after NACT and found a strong correlation between lesion size on MRI and histologic assessment in patients with DCIS and a lower rate of false-positive MRI findings in patients without DCIS. However, because their study did not include patients with residual invasive cancer (non–pathologic CR), they did not assess DCIS as an independent risk factor in the correlation between MRI and final pathology size for invasive cancers or accuracy for detecting pathologic CR. Given the importance of accurate preoperative tumor measurements including DCIS extent for surgical planning and determining a patient's candidacy for breast-conserving therapy, further studies to determine the accuracy of MRI and other preoperative measurements for reflecting DCIS extent and detecting pathologic CR are warranted.
In an exploratory analysis, our study did not identify any influence of histologic subtype on the associations between preoperative measurements and pathology size or the accuracy for detecting pathologic CR. However, the lack of an association may reflect the limited sample size. In contrast, multiple previous studies have reported differences in performance for preoperative assessment of residual disease across histologic subtypes [
18,
20,
21,
24,
36,
40]. In particular, one large retrospective study of 746 women found that the performance of MRI for detecting pathologic CR varied by subtype, with highest accuracy in detecting HR-positive–HER2-negative cancers [
24]. However, study designs and reported results across prior studies have varied widely, with reduced accuracy of MRI observed in estrogen receptor–positive [
40], HER2-positive [
36], HER2-negative [
18,
22], and HR-positive–HER2-negative [
21] tumors, depending on the study. It is clear that more investigation is needed to better understand the influence of histologic subtype on accuracy to preoperatively assess residual disease and detect pathologic CR.
There are several strengths to this study. First, we used a prospective study design to directly compare preoperative size measurements by mammography, MRI, and clinical examination for a large number of tumors from multiple institutions. Most previous studies were retrospective and evaluated a small number of tumors from a single institution [
12–
14,
22]. Additionally, the ACRIN 6657 Trial standardized data collection across time points—before, during, and after NACT—and used a central pathology review as the reference standard for all pathology sizes. Also, our study examined two separate MRI size measurements (longest diameter by MRI and functional tumor volume on MRI). Functional tumor volume on MRI was calculated using a potentially more consistent centralized computer assessment and incorporated both 3D size and functional microvascular properties of the tumor, which have previously been shown to be a sensitive early predictor of therapeutic response [
17].
The study also has limitations. One limitation may relate to the approach for determining preoperative sizes. The ACRIN 6657 protocol required that the orientation of the longest diameter measurement by MRI be held constant across time points (from that originally defined on pre-NACT images) to maximize sensitivity and accuracy for detecting changes in tumor size. However, depending on how the tumor recedes with treatment, this strategy could result in discordance between the orientations of the final pathology size and MRI longest dimension measures, which may have lowered the longest diameter by MRI correlations in this study but would not affect assessment of pathologic CR or correlations between other preoperative measures and tumor size. Further, our finding that longest diameter by clinical examination showed associations with pathologic outcomes (both final pathology size and pathologic CR) comparable to MRI for some index lesion types was surprising and may reflect bias from the availability of additional information, including imaging measurements, at the time of clinical size assessments (longest diameter by clinical examination). The ACRIN 6657 Trial did not restrict clinicians' access to this information because doing so could have affected patient care and treatment. Despite this limitation, longest diameter by MRI outperformed longest diameter by clinical examination at detecting pathologic CR for all lesions, multiple masses, and single NME and showed comparable performance for single masses. Further, although this study assessed the influence of lesion characteristics on the associations between preoperative measures and final pathology size, other institutional-level factors (e.g., interpretive performance, academic vs nonacademic) and patient-level factors (e.g., race, background parenchymal enhancement) could also affect the associations, warranting further investigation to assess the generalizability of our findings.
Also, although the MRI protocols and imaging technology were standard of care at the time of this study, imaging technology and protocols have evolved. It is possible that faster dynamic contrast-enhanced MRI protocols and new MRI sequences (e.g., DWI) could reduce underestimation and overestimation of residual disease and increase the performance of MRI at detecting pathologic CR and measuring residual disease. Finally, although ultrasound is another imaging modality used to assess changes in tumor size, it was not included in the ACRIN 6657 study because it was not used consistently across all participating sites.
In summary, we report on the comparative performance of measurements by MRI, mammography, and clinical examination in assessing pathologic outcomes to treatment using data from a prospective multiinstitutional study. Our findings showed MRI measurements of longest tumor diameter to be superior to other preoperative measurements for detecting pathologic CR and assessing the extent of residual invasive disease after NACT, particularly for multiple lesions, and showed that mammography was least accurate. Thus, MRI performed preoperatively after NACT may facilitate new alternative personalized therapeutic approaches by more accurately detecting pathologic CR.