1 Center for Health Studies, Group Health Cooperative, 1730 Minor Ave., Ste. 1600, Seattle, WA 98101.
2 Department of Family Medicine, University of Washington, 1935 N.E. Pacific, Seattle, WA 98195.
3 Division of Mammography, Quality, and Radiation Programs, United States Food and Drug Administration, 1350 Picard Dr., 2nd Floor-HFZ240, Rockville, MD 20850.
4 Cancer Prevention Research Program, Fred Hutchinson Cancer Research Center, MP 702, 1124 Columbia St., Seattle, WA 98104.
5 Office of Drug Evaluation III, Center for Drug Evaluation and Research, United States Food and Drug Administration, 5600 Fishers Ln., Rockville, MD 20857.
Received April 9, 2001;
accepted after revision August 21, 2001.
All opinions and findings are the sole responsibility of the authors; the
views expressed do not necessarily represent those of the United States Food
and Drug Administration or the United States government.
Abstract
MATERIALS AND METHODS. We identified women with breast cancer who were 40 years old or older and were screened from January 1, 1988, through December 31, 1993. We retrospectively assigned Breast Imaging Reporting and Data System (BI-RADS) assessments to their screening mammograms. We classified cancers (invasive or ductal in situ) as "screen-detected" when found after positive assessments (BI-RADS codes 3, 4, and 5) and "interval-detected" when found after negative assessments (BI-RADS codes 1 and 2). One reviewer evaluated mediolateral oblique and craniocaudal views for all cancer cases using a 3-point scale (failure, borderline, pass) for each measure of clinical image quality (positioning, breast compression, contrast, exposure, noise, sharpness, artifacts, overall quality). We used separate logistic regression models and evaluated the odds of interval invasive cancer or invasive plus in situ cancer as a function of each measure of quality using "pass" as the referent group.
RESULTS. We found 492 screen-detected and 164 interval-detected cancers that met study criteria. Cancer detection (sensitivity) was highest (84%) among patients with proper breast positioning, but when images failed this measure (33.4%), sensitivity fell to 66.3%. After adjustment for age, film date, and breast density, interval-detected invasive cancers were more likely after images failing positioning (odds ratio, 2.57; 95% confidence interval, 1.28-5.52). Failures in overall quality were also associated with interval cancers when cases of ductal carcinoma in situ were included (p = 0.037).
CONCLUSION. Invasive breast cancer detection by mammography may be improved through attention to correct positioning.
The United States Congress passed the Mammography Quality Standards Act of 1992 to establish national standards [3]. The Mammography Quality Standards Act established a mammography certification program that includes evaluation of facility personnel, procedures, and technical image quality through annual on-site inspections and clinical quality review at least every 3 years through an accreditation body [2, 3]. This certification program is designed to ensure that all facilities in the United States achieve or exceed minimum quality standards. Results of analyses in Minnesota and Colorado suggest that the legislation has led to technical image quality improvements [4, 5].
Despite the intuitive appeal of high-quality clinical images, we are not aware of work that directly assesses the link between all of its dimensions and cancer detection at the time of screening (sensitivity). Some randomized trials of mammographic efficacy assured a minimum standard for mammography, and one trial had a systematic evaluation of clinical image quality [6]. Another study showed that increased optical density is associated with a higher rate of detecting tumors less than 10 mm in diameter [7]. Although authors have described the parameters of clinical quality, little is known about which ones affect cancer detection [1]. In addition to quality, factors that may reduce the sensitivity of mammography include technical and interpretive errors [8,9,10,11], rapid tumor growth [11,12,13,14,15,16], age [17], and breast density [18,19,20,21]. Our study used a cohort design to evaluate the association between mammographic clinical image quality and the detection of cancer at the time of a screening visit.
All female members of Group Health Cooperative who are 40 years old or older are invited to enroll in a breast cancer screening program [22, 23]. Program enrollment begins by completing a risk factor questionnaire. Once enrolled, women receive regular reminders that they are due for screening. Screening occurs through breast cancer screening program centers in which women receive a mammogram and clinical breast examination [23, 24]. Eighty-five percent of women members complete the questionnaire and enroll in the program, so they are eligible for regular reminders [23]. Physicians may also order mammograms in the course of usual care for screening or to evaluate a symptomatic woman. All examinations occur through accredited and certified radiology facilities. Results of all screening examinations that occur in the breast cancer screening program are recorded in an electronic file on a mainframe database.
Eligibility and Inclusion
We identified women enrolled in the breast cancer screening program who
were screened between January 1, 1988, and December 31, 1993. We designated
the "index examination" as a woman's last screening mammogram.
Among screened women, we identified those who met the following criteria: did
not have a breast cancer history before the index examination; remained
enrolled in Group Health Cooperative, or died of any cause, during the
24 months after the examination; and were diagnosed with ductal
carcinoma in situ or invasive breast cancer (codes 174.0 through 174.9,
International Classification of Diseases for Oncology [25]) within 24 months of
their index examination and before their next screening mammogram. All cancers
were identified and validated through our local Surveillance, Epidemiology,
and End Results cancer registry.
Mammographic Assessments
The interpretations for all mammograms occurred before implementation of
current American College of Radiology Breast Imaging Reporting and Data System
(BI-RADS) terminology, so we retrospectively assigned BI-RADS codes
[26]. To make the assignment
using a common standard, we gathered information from the women's paper
medical records, their recorded mammogram assessments (negative, benign,
indeterminate, positive), and the radiologists' recommendations (follow-up in
6 months, follow-up in 1 or more years, surgical evaluation, or biopsy). We used this
information to systematically classify the interpretations in accordance with
American College of Radiology BI-RADS terminology [18, 27]. Any negative or benign
screening assessment associated with a recommendation for a follow-up
mammogram after 1 or more years was classified respectively as BI-RADS 1
(negative) or BI-RADS 2 (benign finding). Any "indeterminate"
assessment associated with a recommendation for a follow-up mammogram after 1
or more years was classified as BI-RADS 2 (benign finding). Any
"indeterminate" associated with a 6-month follow-up recommendation
was classified as a BI-RADS 3 (probably benign). Any
"indeterminate" or "positive" assessment associated
with a recommendation for a surgical evaluation, biopsy, or both within 90
days was classified as BI-RADS 4 or 5 (suspicious or highly suggestive). The
original mammography assessments were recorded after the evaluation of
additional images if any were required.
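The assignment rules above can be written as a small lookup. The following is a minimal sketch, not the authors' software: the string labels for assessments and recommendations and the function name `assign_birads` are hypothetical, and combinations that the text does not describe return `None` rather than a guessed code.

```python
from typing import Optional

# Hypothetical labels: the article names the categories but does not say
# how they were encoded in the screening database.
LONG_FOLLOW_UP = "follow_up_1yr_or_more"
SHORT_FOLLOW_UP = "follow_up_6_months"
WORKUP = {"surgical_evaluation", "biopsy"}


def assign_birads(assessment: str, recommendation: str) -> Optional[str]:
    """Map a recorded (assessment, recommendation) pair to a BI-RADS label.

    Returns "1", "2", "3", or "4/5". Returns None for any combination the
    text does not cover, so that such records can be reviewed by hand.
    """
    if recommendation == LONG_FOLLOW_UP:
        if assessment == "negative":
            return "1"  # negative
        if assessment in ("benign", "indeterminate"):
            return "2"  # benign finding
    if assessment == "indeterminate" and recommendation == SHORT_FOLLOW_UP:
        return "3"  # probably benign
    if assessment in ("indeterminate", "positive") and recommendation in WORKUP:
        return "4/5"  # suspicious or highly suggestive
    return None


print(assign_birads("indeterminate", SHORT_FOLLOW_UP))
```

Codes 4 and 5 are kept as a single label because the text does not separate them.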
Classification of Cases
We classified screen-detected cancers as those diagnosed among women within
24 months of a positive assessment of their screening mammogram. A positive
assessment included American College of Radiology BI-RADS codes 3, 4, and
5.
We classified interval-detected cancers as those that occurred among women within 24 months of a negative assessment of their screening mammogram. A negative assessment included American College of Radiology BI-RADS codes 1 or 2. We also counted any assessment as negative if abnormalities noted by the radiologist were in the opposite breast from that in which the cancer was detected.
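The case definitions above reduce to a short decision rule. The sketch below is illustrative only: the function name, the string BI-RADS labels, and the `finding_in_cancer_breast` flag are assumptions made for the example, and the set of negative labels is exposed as a parameter so that the definition can be varied.

```python
from typing import FrozenSet, Optional


def detection_mode(
    birads_label: str,
    months_to_diagnosis: float,
    finding_in_cancer_breast: bool = True,
    negative_labels: FrozenSet[str] = frozenset({"1", "2"}),
    window_months: float = 24.0,
) -> Optional[str]:
    """Classify a cancer as "screen" or "interval" detected.

    Returns None when the diagnosis falls outside the follow-up window,
    because such cases are outside the definitions given in the text.
    """
    if months_to_diagnosis < 0 or months_to_diagnosis > window_months:
        return None
    if birads_label in negative_labels:
        return "interval"
    # An abnormality noted only in the opposite breast counts as negative.
    if not finding_in_cancer_breast:
        return "interval"
    return "screen"


print(detection_mode("3", months_to_diagnosis=5))
print(detection_mode("2", months_to_diagnosis=20))
```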
We audited the charts of all interval-detected cancers, all screen-detected cancers diagnosed more than 3 months after their index mammogram, and 34 of the screen-detected cancers diagnosed within 90 days of their index mammogram to ensure that our classification of screen-detected and interval-detected cancer occurrence was accurate.
Sensitivity
We included all breast cancer cases (all screen-detected and
interval-detected cases) in our study and calculated the sensitivity of
screening mammography as the ratio of screen-detected cancers to the sum of
all screen-detected and interval-detected cases. We calculated sensitivity for
pass, borderline, and failed cases for each measure of quality.
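As a worked illustration of this ratio, the sketch below applies it to the case counts reported in this article (492 screen-detected and 164 interval-detected cancers overall; 396 and 152 for invasive cancers only). The function name is ours, and the resulting percentages are simple arithmetic on those counts rather than figures quoted by the study.

```python
def sensitivity(screen_detected: int, interval_detected: int) -> float:
    """Screen-detected cancers divided by all cancers (screen plus interval)."""
    total = screen_detected + interval_detected
    if total == 0:
        raise ValueError("at least one cancer case is required")
    return screen_detected / total


# Counts reported in the article.
all_cancers = sensitivity(492, 164)    # invasive plus ductal carcinoma in situ
invasive_only = sensitivity(396, 152)  # invasive cancers only

print(f"all cancers: {all_cancers:.1%}")      # all cancers: 75.0%
print(f"invasive only: {invasive_only:.1%}")  # invasive only: 72.3%
```

The same function applies within each quality level (pass, borderline, fail) once the cases are tabulated by rating.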
Mammography Quality Review
A single radiologist reviewed all mammographic examinations to evaluate
breast-tissue density and image quality. The quality reviewer was unaware of
the age, detection status (screen vs interval cancer), or year of the
mammogram for any woman. The reviewer is the associate director for policy and
clinical affairs of the Division of Mammography Quality and Radiation Programs
of the Food and Drug Administration. This division administers the Mammography
Quality Standards Act. He is a board-certified Mammography Quality Standards
Act-qualified interpreting physician.
The reviewer rated the images for density and quality. He evaluated density on the basis of the four standard BI-RADS categories (almost entirely fat, scattered fibroglandular densities, heterogeneously dense, extremely dense). He evaluated the seven image quality attributes described in the Food and Drug Administration's regulations [3] (positioning, compression, exposure, noise, sharpness, contrast, and artifacts) using a grading scale developed specifically for this study based on previously published articles addressing image quality [1, 28, 29].
Each clinical quality attribute (positioning, compression, exposure, sharpness, noise, artifacts, and contrast), as well as an overall assessment, was rated on a 5-point ordinal scale based on the following: positioning, graded on the basis of the amount of pectoral muscle visualized, presence of retroglandular fat, no lateral glandular tissue cutoff film, craniocaudal nipple line within 1 cm of mediolateral oblique nipple line, and nipple in midline and profile on craniocaudal view; compression, graded on the basis of the separation of anatomic structures, uniform tissue exposure, no motion unsharpness, and good penetration of thicker areas without overpenetration of thin areas; exposure, graded on the basis of visualization of detail in dense glandular and fatty tissues, underlying tissue seen through pectoral muscle on mediolateral oblique view, and skin line visualized; sharpness, graded on the basis of the amount of blurring of edges of linear structures, tissue borders, and calcifications: no blurring, minimal blurring identified with a magnifying glass, moderate blurring identified with a magnifying glass, or blurring identified capable of obscuring or creating lesions; noise, graded on the basis of the amount seen: no noise, minimal noise identified with a magnifying glass, moderate noise identified with a magnifying glass, noise identified without a magnifying glass, or noise causing artifacts capable of obscuring or creating lesions; artifacts, graded on the number seen: none, 1-5 seen but none simulate a lesion or significantly obscure anatomy, 1-5 artifacts seen and some simulate a lesion or significantly obscure anatomy, 5-10 artifacts seen and some simulate a lesion or significantly obscure anatomy, or greater than 10 artifacts seen and some simulate a lesion or significantly obscure anatomy; contrast, graded on good differentiation of fat, glandular, dense glandular, and calcific densities.
After rating each of the clinical quality attributes of the mammograms, the radiologist provided a single subjective overall evaluation of their quality. The specific grading scales have not been used by accreditation bodies, although the quality attributes are part of routine review. Because few films were scored at the extreme upper and lower ends of the scale, we collapsed the rating to three points: pass (clinical quality score = 1, 2), borderline (clinical quality score = 3), and fail (clinical quality score = 4, 5). Quality was rated separately for two mediolateral oblique and two craniocaudal views, and the worst quality rating for either view was used in the analyses.
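The collapsing step and the worst-view rule can be sketched as follows. This is a minimal illustration under two assumptions: that a higher score on the 5-point scale means worse quality (consistent with scores 1 and 2 counting as a pass), and that the per-view scores are held in a dictionary keyed by hypothetical view names.

```python
from typing import Mapping


def collapse(score: int) -> str:
    """Collapse a 5-point quality score to pass, borderline, or fail."""
    if score in (1, 2):
        return "pass"
    if score == 3:
        return "borderline"
    if score in (4, 5):
        return "fail"
    raise ValueError(f"score must be 1-5, got {score}")


def worst_rating(view_scores: Mapping[str, int]) -> str:
    """Return the collapsed rating of the worst-scoring view.

    Assumes higher scores mean worse quality, so the worst view is the
    one with the maximum score.
    """
    if not view_scores:
        raise ValueError("at least one view is required")
    return collapse(max(view_scores.values()))


# Hypothetical scores for one examination, for one quality attribute.
example = {"left_mlo": 2, "right_mlo": 2, "left_cc": 4, "right_cc": 1}
print(worst_rating(example))  # fail
```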
Analysis
We compared the clinical quality of mammograms among women with
interval-detected cancers with that of screen-detected cancers using logistic
regression. Quality was included as an independent categoric variable in the
logistic regression models with passing quality as the referent category for
each measure and interval detection versus screen detection as the dependent
variable. Models were estimated separately for each of the seven attributes
and for one overall quality measure. For each model, we present estimated odds
ratios and 95% confidence intervals of interval detection for borderline
versus passing quality and failing versus passing quality. We also include
p values for the overall effect of the quality measure, based on the
change in log-likelihood. Tables present both unadjusted and
covariate-adjusted odds ratios. We estimated adjusted odds ratios from
logistic regression models that simultaneously included the main effects of
age, the year of the index examination, and the parenchymal density of the
contralateral breast. Adjustments for age and density were made because they
are known to have an effect on mammographic accuracy [18, 27], so we expected
that they might modify the effect of quality on interval detection. We made an
adjustment for the year of the mammogram to account for changes in technology
over time. We did not adjust for body mass index because of its close
correlation with density and because density is a stronger predictor of
mammographic accuracy [18]. We
used Spearman's rank correlation to explore bivariate relationships among the
seven individual and one overall mammographic quality measures.
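For a single categoric predictor, the unadjusted odds ratio from a logistic regression equals the cross-product ratio of the corresponding 2 x 2 table, so the unadjusted comparison of failing versus passing quality can be illustrated without fitting a model. The sketch below uses a Wald interval on the log scale; the counts are hypothetical and are not taken from this study, and the covariate-adjusted estimates described above would require fitting the full regression model.

```python
import math
from typing import Tuple


def odds_ratio_ci(
    interval_fail: int,
    screen_fail: int,
    interval_pass: int,
    screen_pass: int,
    z: float = 1.96,
) -> Tuple[float, float, float]:
    """Odds ratio of interval detection (fail vs pass) with a Wald 95% CI."""
    cells = (interval_fail, screen_fail, interval_pass, screen_pass)
    if min(cells) <= 0:
        raise ValueError("all four cells must be positive")
    odds_ratio = (interval_fail * screen_pass) / (screen_fail * interval_pass)
    # Standard error of log(OR) is the root of the summed reciprocal counts.
    se_log = math.sqrt(sum(1.0 / c for c in cells))
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


# Hypothetical counts, for illustration only (not the study data).
or_, lo, hi = odds_ratio_ci(interval_fail=30, screen_fail=60,
                            interval_pass=50, screen_pass=270)
print(f"OR = {or_:.2f}, 95% CI {lo:.2f}-{hi:.2f}")
```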
Our primary analyses focus on the probability of interval detection among women with invasive cancer using BI-RADS 1 or 2 as negative assessments and a follow-up period of 2 years. We explored the robustness of these results via three additional sets of analyses. These subanalyses used different subgroups of women and different definitions of screen and interval detection. The first set of subanalyses expanded the sample used for primary analyses to include women with ductal carcinoma in situ. The second set of subanalyses was restricted to invasive cases diagnosed within 13 months of their index mammogram. The third set of subanalyses used the same sample as the primary analyses but changed the definition of negative assessments to include BI-RADS 1, 2, or 3. This change meant that women given a BI-RADS 3 assessment were classified as interval-detected, whereas they were counted as screen-detected in the primary analysis.
Table 1 displays the characteristics of the women with screen-detected (n = 396) and interval-detected (n = 152) invasive cancers. As described elsewhere, the interval cancers were more likely to be among younger women, and by the time they were detected, the cancers were somewhat larger than those found at screening [27]. Half the interval cancers occurred in 1992 and 1993, reflecting the growth in total screening volume during that time period.
Table 2 shows the distribution of clinical quality measures for mammograms among women with screen-detected versus interval-detected invasive cancers. Three parameters show distributions that differ significantly between the screen-detected and interval-detected cancers: positioning, noise, and overall quality. Among the cases that failed to have good positioning, 35% failed only the mediolateral oblique view, 48% failed only the craniocaudal view, and 17% failed for both views. Among the cases that failed because of noise, 17% failed only the mediolateral oblique view, 4% failed only the craniocaudal view, and 78% failed both views. Among the cases that failed overall, 30% failed only the mediolateral oblique view, 14% failed only the craniocaudal view, and 56% failed both views.
Failures were rare (i.e., <3%) for compression, exposure, contrast, and artifacts, and these failures appeared to occur at similar rates for screen-detected and interval-detected cancers. Failures for sharpness were somewhat more common, and although interval-detected cases failed the subjective evaluation of image sharpness more often than screen-detected cases (10.5% vs 6.3%), this difference was not statistically significant (p = 0.167).
Overall quality was most strongly correlated with positioning (0.59), and only moderately correlated with other components of quality (ranging from 0.23 to 0.36).
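Correlations of this kind between ordinal ratings can be computed as Spearman's rank correlation, that is, the Pearson correlation of the ranks with tied values given their average rank. The sketch below is a plain-Python illustration with hypothetical ratings; it is not the software used in the study.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rank correlation: Pearson correlation of average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical 5-point ratings for six examinations.
positioning = [1, 2, 2, 3, 4, 5]
overall = [1, 1, 2, 3, 3, 5]
print(round(spearman(positioning, overall), 2))
```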
Table 3 shows the proportion of cancers that were screen-detected (sensitivity) for each level of quality (pass, borderline, fail) within each of the measures. The proportion of interval-detected cancers is shown as 1 - sensitivity. The model estimates the odds ratio and adjusted odds ratio of interval cancer for failing and borderline quality relative to passing quality. The model also estimates the relationship of the measure as a whole to interval cancer occurrence; p values for the measure reflect the evaluation of this association. Failing or borderline positioning was associated with the occurrence of interval invasive cancers. Sensitivity dropped from 84.4% among cases with passing positioning to 66.3% among cases with failed positioning. Borderline exposure was also associated with interval cancer occurrence but not in the expected direction: it appears to be associated with increased sensitivity in this analysis. Because the number of cases was small and no interval cancers were detected among those cases that failed exposure, confirmation of the finding regarding exposure must occur in other settings. Noise and overall quality were also associated with interval cancer occurrence.
Table 3 shows that adjusting for patient age, film date, and breast density had little effect on associations between film quality and interval cancer occurrence within 2 years. After adjustment, failing or borderline positioning remained significantly associated with the occurrence of interval cancers by 24 months. Borderline exposure also remained associated with interval cancer occurrence. Covariate adjustment reduced the effect of noise and overall quality so that the associations were no longer statistically significant, but the point estimates continued to suggest an effect that needs further evaluation in a larger study.
Not shown in Table 3 are results based on the three subanalyses using alternate samples and definitions of interval detection. The results for these subanalyses using the fully adjusted model differed slightly from the analyses presented in Table 3, although the direction of effects was the same as that in the primary analysis. In all three subanalyses, positioning remained associated with interval cancer occurrence. When women with ductal carcinoma in situ were included, overall quality was also associated with interval cancer occurrence (p = 0.04). When the sample was restricted to interval cancer within 13 months of the screening mammogram, exposure was no longer associated with interval cancer occurrence. When a negative assessment included BI-RADS codes 1, 2, and 3, exposure was no longer associated with interval cancer, whereas both noise and overall quality were significantly associated with interval cancer (p = 0.04 and 0.02, respectively).
Failures in overall quality appeared to be associated with interval cancers when we included ductal carcinoma in situ cancers and invasive cancers in the dependent variable. This association persisted after adjustment for age, film date, and breast density. Overall quality is highly correlated with positioning quality, but consideration of the other elements such as sharpness and noise may also be important because failures in these measures also were more common among interval cancers (Table 2). Whether failures in overall quality are important depends on the clinical significance of ductal carcinoma in situ. Despite the controversies surrounding the natural history of ductal carcinoma in situ, belief is widespread that its detection and treatment are important [30, 31]. Attention to noise, sharpness, and positioning as important aspects of overall clinical image quality may, therefore, also improve screening detection.
Mammography that fails in sharpness, noise, or positioning is amenable to improvements. Sharpness could improve by reducing patient motion, reducing mammography unit vibration, using a smaller focal point, and using high-detail film cassettes [32, 33]. Optimizing background optical density, increasing radiation dose, or using high-detail film could decrease failures in noise [34]. Failures in positioning are particularly operator-dependent and can be improved by training technologists [35, 36]. Optimal positioning maximizes the amount of breast tissue seen on the image, uses a fully rotational C-arm, and takes into account the woman's habitus. Consistent with work by Bassett et al. [36], the measure of quality used in our study emphasized visualization of the pectoral muscle, the presence of retroglandular fat, and the nipple profile. However, compression is noted as a condition in Bassett's work, and we assessed that separately in our study.
Despite the robustness of our findings, some limitations in this study might be addressed in future studies of clinical image quality. Our series included interval cancers subsequent to films taken in 1988-1993 and required retrospective assignment of BI-RADS codes. Those codes were systematically assigned before the cancers were classified as screen-detected or interval-detected cancers, so any bias is expected to be minimal. However, a stronger design would use prospective interpretations. Mammographic quality has improved since 1993, and failures in sharpness and noise may be less common now. Therefore, those failures may no longer be associated with interval cancers. Increased attention has been paid to the importance of positioning [35]. Our study reinforces the importance of positioning, and although the frequency of failure may have changed, it seems unlikely that failures in positioning would have lost their association with interval cancer occurrence. Finally, we drafted measures of clinical quality on the basis of experience and knowledge of the literature, but they do not represent a gold standard. They are a reasonable standard, and improvements in these measures might be considered in a future work. Furthermore, accreditation bodies assess these same quality attributes when evaluating films submitted for accreditation purposes, but they use different film selection criteria. Films submitted for accreditation are expected to represent the highest quality and are evaluated at that level. Therefore, the specific grading scale described in this article is not the one used by accreditation bodies.
Another limitation of the study deserves separate attention. We used a single rater to evaluate breast density and quality. That person was also the person who developed the quality scale. Although our method provided a consistent standard, it also may have introduced some bias because the application of the standard could have been influenced by the rater's commitment to its use and whether the rater could see the cancer on the films. However, at the time of the rating, the rater did not know whether the cancer was screen-detected or occurred in the subsequent interval. We know that about two thirds of cancers found in the interval subsequent to the screening could actually be seen at the time of the screening [27]. Therefore, we do not think that the rater's ability to see the cancer would introduce a large bias in the evaluation of the images when he did not know if they would be classified as screen-detected or interval-detected cancers. However, some persistent bias may exist if the rater was stricter in the application of the quality scale when a subtle cancer was identified. An alternative design would be to rate density and quality independently and use multiple reviewers with an average of their evaluations rather than the single rating. When others implement our quality measure and when multiple reviewers independently evaluate density and quality, the replication of our findings will be tested. Such a test is even more important given the association we found.
Although the associations between positioning and occurrence of invasive interval cancer appear robust, the lack of association with other measures does not mean those measures are not important. Failures of other quality measures, including noise and overall quality, were associated with interval cancer occurrence under some conditions. Failures in compression, contrast, and artifact were relatively rare. The lack of association between failures in these measures and interval cancer occurrence may simply reflect this rarity. Consideration of more sensitive scales for these measures may also be important. Young et al. [7] showed that achieving optimal optical density as measured by the automatic exposure control on the mammography unit is associated with higher observer-rated image quality and smaller tumor detection. In our study, however, we had a single observer, whereas Young et al. had four, and we did not include a measure of optical density in our analysis. We also included all interval cancers with a full range of sizes. Further study of the influence of mammographic quality on interval cancer occurrence might consider the development of more sensitive measurement scales, the use of multiple reviewers, and the evaluation of the effects of quality on small-tumor detection.
In summary, we found that failures in positioning were associated with subsequent interval cancer occurrence under all conditions considered. Failures in the measures of noise and overall quality were associated with interval cancer occurrence depending on the cancers included; follow-up periods; and adjustments for age, year of study, and breast density. Our results emphasize the importance of attention to positioning and indicate that accepting even borderline positioning that reduces the visualization of the pectoralis muscle or the nipple may increase the likelihood of missing an invasive breast cancer and reduce the sensitivity of mammography.
Acknowledgments
We thank Cynthia Sisk and Stephanie Hauge for their attention to detail and
conscientious implementation of this study and Nancy Snell for her help in
preparation of this manuscript.