AJR 2005; 184:1505-1507
© American Roentgen Ray Society


Perspective

Is Maximum Positive Predictive Value a Good Indicator of an Optimal Screening Mammography Practice?

Lara A. Hardesty, Amy H. Klym, Betty E. Shindel, Denise M. Chough, Jules H. Sumkin and David Gur

All authors: Department of Radiology, University of Pittsburgh Medical Center, Magee Women's Hospital, 300 Halket St., Ste. 4200, Pittsburgh, PA 15213.

Received August 26, 2004; accepted after revision November 29, 2004.

Address correspondence to L. A. Hardesty (lhardesty@mail.magee.edu).

Abstract

OBJECTIVE. Positive predictive value (PPV1) has been used as one important indicator of the quality of screening mammography programs. We show how the relationship between sensitivity and recall rate may affect the operating point at which optimal (maximum) PPV1 occurs.

CONCLUSION. Optimal (maximum) PPV1 can occur at any sensitivity level and should not be used as the sole indicator for practice optimization because it does not take into account the number of cancers that would be missed at that sensitivity.

Recent publications have used positive predictive value (PPV1) as one of the primary indicators of the quality of screening mammography programs [1-3]. Defined as the proportion of positive screening mammograms (BI-RADS categories 0, 4, or 5) for which breast cancer is diagnosed within 12 months of the screening mammogram, PPV1 is calculated as true-positives / (true-positives + false-positives) [4].
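As a minimal sketch of the definition above, the ratio can be computed directly from audit counts. The function name and the counts are hypothetical and chosen only for illustration; they are not data from this article.

```python
def ppv1(true_positives: int, false_positives: int) -> float:
    """PPV1 = true-positives / (true-positives + false-positives)."""
    positive_screens = true_positives + false_positives
    if positive_screens == 0:
        raise ValueError("PPV1 is undefined when no screening examinations are positive")
    return true_positives / positive_screens


# Hypothetical audit: 1,000 positive screening mammograms, of which 45 were
# followed by a breast cancer diagnosis within 12 months.
example_ppv1 = ppv1(true_positives=45, false_positives=955)
print(f"PPV1 = {example_ppv1:.3f}")
```

Note that the missed cancers (false-negatives) do not appear anywhere in this ratio.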

The rationale behind the use of PPV1 to describe the quality of a screening mammography program is as follows: A maximum PPV1 indicates the highest rate of breast cancer detection per recalled examination (that is, the most efficient detection of cancers). PPV1 has, in the past, been generally thought to represent the "necessary balance" between sensitivity and specificity.

However, we wish to caution against the use of PPV1 alone as the primary indicator of the quality of screening mammography programs. A program that operates at the highest (maximum) PPV1 may still miss a large number of potentially detectable cancers. If only the lesions with a high likelihood of being malignant are recalled, the high specificity could elevate PPV1 despite the low sensitivity. We believe that the ultimate goal of mammographic screening should be the detection of as many breast cancers as early as possible—that is, when they are more likely to be treatable. Therefore, the sensitivity of a screening mammography program should be weighted more heavily than specificity when determining the overall quality of a screening program.

Critics of this point of view state that high specificity is necessary to minimize the costs (monetary and emotional) associated with the additional imaging examinations of women who are recalled but ultimately shown not to have breast cancer. Otherwise, the costs may be prohibitive, and, as a society, we may not be able to afford screening mammography. Although the costs associated with the additional evaluation of women who are recalled from screening mammography are high, the cost of missing even a fraction of detectable cancers may be higher.

We believe that detecting as many cancers as is practically reasonable should be the ultimate goal of mammographic screening, limited only by the highest number of recalls per detected cancer that is acceptable in a specific clinical environment. Specificity should be a secondary concern relative to sensitivity in the evaluation of the quality of screening mammography [5]. As a result, the use of PPV1 alone to evaluate the success of screening mammography programs, or the inference that practices can be optimized by maximizing PPV1 without considering sensitivity related to "residuals" (namely, the cancers that could be detected but are not), may be flawed. In addition, PPV1 is related to the prevalence of breast cancer in the population being screened, so a comparison using PPV1 as the sole indicator of the quality of two screening mammography practices that evaluate different patient populations has little value.
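The dependence of PPV1 on prevalence can be illustrated with the standard relationship between predictive value, sensitivity, specificity, and prevalence. The sketch below assumes two hypothetical practices that read identically (same sensitivity and specificity) but screen populations with different cancer prevalence; all numbers are illustrative assumptions.

```python
def ppv_from_rates(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Positive predictive value from sensitivity, specificity, and prevalence."""
    true_positive_fraction = sensitivity * prevalence
    false_positive_fraction = (1 - specificity) * (1 - prevalence)
    return true_positive_fraction / (true_positive_fraction + false_positive_fraction)


# Identical reading performance (assumed values), different populations.
SENSITIVITY = 0.80
SPECIFICITY = 0.92

ppv_low_prevalence = ppv_from_rates(SENSITIVITY, SPECIFICITY, prevalence=0.003)
ppv_high_prevalence = ppv_from_rates(SENSITIVITY, SPECIFICITY, prevalence=0.008)

print(f"3 cancers per 1,000 screened: PPV1 = {ppv_low_prevalence:.3f}")
print(f"8 cancers per 1,000 screened: PPV1 = {ppv_high_prevalence:.3f}")
```

Under these assumptions the second practice shows a PPV1 more than twice that of the first even though neither practice detects a larger share of its cancers.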

We show our main point in this perspective using three possible relationships between detection sensitivity, which is assumed to be related to cancer detection rates, and recall rates (Fig. 1). If one assumes that the relationship between detection sensitivity and recall rate in screening practices is linear over a certain range of recall rates, the resulting PPV1 in this (linear) range is a constant (Table 1). If, on the other hand, the relationship is convex, as would be expected in certain ranges of recall rates (namely, the ratio between cancers being detected and recall rates decreases with increasing recall rates), PPV1 will continually decrease in this range (Table 1). Therefore, wherever the curve changes from linear to convex, which can possibly occur at any recall rate, PPV1 will begin to decrease.
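The following sketch illustrates the argument numerically. It rests on several assumptions that are not taken from the article: the prevalence value, a linear curve that passes through the origin, and an exponential saturating curve standing in for what the text calls the convex (diminishing-returns) relationship. PPV1 is computed as prevalence × sensitivity / recall rate, that is, detected cancers divided by recalled examinations.

```python
import math

PREVALENCE = 0.005  # assumed: 5 detectable cancers per 1,000 women screened
CEILING = 0.90      # upper limit of sensitivity assumed in Fig. 1


def linear_sensitivity(recall_rate: float, slope: float = 9.0) -> float:
    """Sensitivity rises in proportion to recall rate until it reaches the ceiling."""
    return min(CEILING, slope * recall_rate)


def diminishing_sensitivity(recall_rate: float, scale: float = 0.04) -> float:
    """Each additional recall yields fewer additional cancers (hypothetical curve)."""
    return CEILING * (1 - math.exp(-recall_rate / scale))


def ppv1(sensitivity: float, recall_rate: float, prevalence: float = PREVALENCE) -> float:
    """Detected cancers per recalled examination."""
    return prevalence * sensitivity / recall_rate


recall_rates = [0.02, 0.04, 0.06, 0.08, 0.10]
linear_ppv = [ppv1(linear_sensitivity(r), r) for r in recall_rates]
diminishing_ppv = [ppv1(diminishing_sensitivity(r), r) for r in recall_rates]

for r, lin, dim in zip(recall_rates, linear_ppv, diminishing_ppv):
    print(f"recall {r:.0%}: linear PPV1 = {lin:.4f}, diminishing-returns PPV1 = {dim:.4f}")
```

On the linear curve every recall rate gives the same PPV1, whereas on the diminishing-returns curve PPV1 falls steadily as the recall rate rises, even though sensitivity keeps increasing.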



Fig. 1. Graph shows three possible models of the relationship between sensitivity and recall rates given an assumed technology-based upper limit of sensitivity for mammography at 90%: linear (◆), convex (■), and sigmoidal (▲).

TABLE 1. Positive Predictive Value (PPV1) and Percentage of Cancers That Were Theoretically Detectable on Mammography but Were Actually Missed, as a Function of Recall Rate and the Relationship Between Sensitivity and Recall Rate

 

If PPV1 is the sole index of success, its maximum value can happen at any level of sensitivity. Just because one operates at the point of maximum PPV1 does not mean that one operates optimally. A large fraction of cancers may actually be missed at this operating point. For example, if this break point (from linear to convex) occurs at 50% sensitivity, 50% of cancers will be missed, despite the fact that the cancers actually being detected will be detected with optimal efficiency.

If the relationship between sensitivity and recall rates is sigmoidal (S shape), this maximum PPV1 point will occur when the second derivative becomes negative—that is, where the linear region between concave and convex begins to become convex (Table 1). Again, this point can happen at any level of recall rate and, hence, at any sensitivity; therefore, PPV1 should not be used in this case as the primary measure for assessing the quality (operational success) of screening mammography practices.
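To illustrate that the maximum PPV1 point on an S-shaped curve can fall at very different sensitivities, the sketch below searches for it numerically. The Hill-type curve, its parameters, and the prevalence are assumptions made for illustration only; the curve plotted in Fig. 1 is not reproduced here.

```python
def hill_sensitivity(recall_pct: float, half_point: float, steepness: float,
                     ceiling: float = 0.90) -> float:
    """Hypothetical S-shaped curve: zero at 0% recall, approaching the ceiling."""
    powered = recall_pct ** steepness
    return ceiling * powered / (powered + half_point ** steepness)


def max_ppv1_point(half_point: float, steepness: float, prevalence: float = 0.005):
    """Grid search over recall rates from 0.10% to 20.00% for the highest PPV1."""
    grid = [i / 100 for i in range(10, 2001)]

    def ppv1(recall_pct: float) -> float:
        sensitivity = hill_sensitivity(recall_pct, half_point, steepness)
        return prevalence * sensitivity / (recall_pct / 100)

    best_recall = max(grid, key=ppv1)
    best_sensitivity = hill_sensitivity(best_recall, half_point, steepness)
    return best_recall, best_sensitivity, ppv1(best_recall)


# Three assumed curve shapes: (recall rate at half of ceiling, steepness).
for half_point, steepness in [(5.0, 2.0), (5.0, 4.0), (8.0, 3.0)]:
    recall, sensitivity, ppv = max_ppv1_point(half_point, steepness)
    print(f"half-point {half_point}%, steepness {steepness}: max PPV1 = {ppv:.4f} "
          f"at recall {recall:.2f}% with sensitivity {sensitivity:.1%} "
          f"({1 - sensitivity:.1%} of cancers not detected)")
```

For these assumed curves the point of maximum PPV1 shifts with the shape of the curve, so a practice operating there may be leaving a third or more of cancers undetected.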

Figure 1 shows three possible models for the relationship between sensitivity and recall rates (linear, convex, and sigmoidal) with the additional assumption that the technology itself (e.g., mammography) has an upper limit on sensitivity (e.g., 90%) that cannot be exceeded. The value we show in this example (90%) may well be an overestimation of the upper limit of the actual sensitivity of mammography resulting from the technology itself, but the actual number is not known.

Optimizing practices based on maximum PPV1 may be valid only when one operates at a sufficiently high sensitivity so that missed cancers constitute a small fraction of the total number that could have been detected if the practice operated with higher recall rates (hence, lower PPV1). Until we have a way to assign specific and acceptable utility functions (values) to detected and missed cancers, we have to be cognizant of the fact that women's expectations are high in this regard: "If I have a cancer, I expect it to be found, and found early. That is why I participate in periodic screening."

We must recognize that all technologies are limited in their capabilities and ultimately all imaging techniques (e.g., mammography, sonography, MRI) have an upper bound on the fraction of cancers that can actually be detected earlier when these techniques are used (Fig. 1). However, until we reach this upper bound (whatever it may be for each specific technology), we must consider the relationship between sensitivity and false-positive rates when assessing the quality of our practices.

Assessing practices quantitatively generally requires more than one variable. However, if one sets a limit for the cost function, one could use one variable for this purpose. For example, one can decide to operate at the highest level of sensitivity that would not exceed 40 recommendations for recall for every cancer case that is actually detected. This measure, which by definition constitutes an operating point with a specific PPV1 (1/41 or 0.024 in this example), will in most instances result in a different operating point than that of the maximum PPV1.
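The cost-limited rule can be sketched as follows. The value 1/41 follows if the 40 recalls are counted in addition to the recall of the detected case; that reading is assumed here. The saturating sensitivity curve, its parameters, and the prevalence are hypothetical.

```python
import math


def ppv1_floor(max_extra_recalls_per_cancer: int) -> float:
    """Lowest acceptable PPV1: one detected cancer among (1 + extra) recalls."""
    return 1 / (1 + max_extra_recalls_per_cancer)


def saturating_sensitivity(recall_rate: float, ceiling: float = 0.90,
                           scale: float = 0.04) -> float:
    """Hypothetical diminishing-returns curve with a 90% ceiling."""
    return ceiling * (1 - math.exp(-recall_rate / scale))


def cost_limited_point(max_extra_recalls_per_cancer: int, prevalence: float = 0.005):
    """Highest-sensitivity operating point whose PPV1 stays at or above the floor."""
    floor = ppv1_floor(max_extra_recalls_per_cancer)
    best = None
    for i in range(1, 301):  # recall rates from 0.1% to 30.0%
        recall_rate = i / 1000
        sensitivity = saturating_sensitivity(recall_rate)
        if prevalence * sensitivity / recall_rate >= floor:
            # Sensitivity rises with recall rate, so the last qualifying
            # point is the one with the highest sensitivity.
            best = (recall_rate, sensitivity)
    return floor, best


floor, operating_point = cost_limited_point(40)
print(f"PPV1 floor = {floor:.3f}")
print(f"recall rate = {operating_point[0]:.1%}, sensitivity = {operating_point[1]:.1%}")
```

On this assumed curve PPV1 is highest at the smallest recall rate, where sensitivity is only a few percent, so the cost-limited operating point and the maximum-PPV1 operating point are far apart.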

We discuss here only the use of PPV1 and not the use of PPV2 or PPV3 because different considerations apply to the diagnostic and biopsy aspects of breast imaging once a woman is recalled from screening mammography for additional procedures that may ultimately result in the diagnosis of breast cancer.

This general concept of evaluating "optimal" practice using simple indexes of performance also applies to studies that evaluate potential changes in the practice of screening mammography, such as those comparing single interpretations with double interpretations or high-volume mammographers with low-volume mammographers. These issues also may be directly relevant and applicable to other types of screening programs being considered (e.g., CT for early detection of lung cancer).

References

  1. Blanks RG, Moss SM, Wallis MG. Monitoring and evaluating the UK National Health Service Breast Screening Programme: evaluating the variation in radiological performance between individual programmes using PPV-referral diagrams. J Med Screen 2001; 8:24-28
  2. Elmore JG, Nakano CY, Koepsell TD, Desnick LM, D'Orsi CJ, Ransohoff DF. International variation in screening mammography interpretations in community-based programs. J Natl Cancer Inst 2003; 95:1384-1393
  3. Yankaskas BC, Cleveland RJ, Schell MJ, Kozar R. Association of recall rates with sensitivity and positive predictive values of screening mammography. AJR 2001; 177:543-549
  4. Quality Determinants of Mammography Guideline Panel. Quality determinants of mammography. Rockville, MD: United States Department of Health and Human Services, Public Health Service, Agency for Health Care Policy and Research, 1994:78-86
  5. Kopans DB, Sumkin JH, Gur D. Older is better. AJR 2003; 181:593-594

