DOI:10.2214/AJR.06.5007
AJR 2006; 186:579-580
© American Roentgen Ray Society

Receiver Operating Characteristic Analysis: A Proper Measurement for Performance in Breast Cancer Screening?

Per Skaane

Ullevaal University Hospital Oslo, Norway

Loren Niklason

Hologic, Inc. Hillsborough, NC

We read with great interest the article by Dr. Obuchowski [1] in the February 2005 issue of the AJR. The article gives an excellent overview of the use and practice of receiver operating characteristic (ROC) methodology in radiology practice. ROC analysis is a key tool for evaluating diagnostic systems, and ROC curves have been considered imperative when comparing new imaging technologies in breast cancer screening [2]. Obuchowski included in the article an example of ROC analysis based on the BI-RADS scoring system and an example of the conjectured accuracy needed to determine the expected difference in accuracy between two diagnostic tests. A clear statement is made in the article: "When comparing two or more diagnostic tests, ROC curves are often the only valid method of comparison."

There has, to date, been confusion in the breast imaging community about whether ROC analysis is a proper test for measuring diagnostic performance in breast cancer screening programs. There are two main reasons for this uncertainty: first, the low rate of true-positive cases (cancers) in mammography screening and, second, the nonlinearity of the BI-RADS scoring system.

The true-positive rate (cancer detection) in a screening program varies from approximately 0.4% (incident screening round) to about 0.8% (prevalent screening round)—that is, about 99.4% of women will have a normal finding (negative). Several readers will participate in prospective screening trials in daily practice, and the ideal situation with multiple readers interpreting the same cases for multiple-reader, multiple-case ROC will not occur. How will the large imbalance between positive and negative cases impact the utility of multiple-reader, multiple-case ROC analysis in the true screening situation?

Second, there will always be a binary outcome in a screening interpretation session. The BI-RADS scores in mammography interpretation sessions cannot be considered continuous variables. The difference between BI-RADS 2 and 3 or 3 and 4, depending on the decision threshold for a positive test result, is considerably greater than the difference between scores 1 and 2 or 4 and 5, which in daily practice is often of only academic interest because these latter differences have minimal influence on decision making. Consequently, some authors [3] have chosen not to use the widely familiar BI-RADS lexicon for experimental ROC analysis. Other authors [4] analyze the ROC curves at only one point corresponding to the true-positive fraction and false-positive fraction of the decision threshold. However, with a group of readers operating at different decision thresholds, how does one decide on the correct point for testing differences with a multireader, multicase ROC analysis?

We would very much appreciate it if the author [1] would make some supplementary comments about the use of ROC analysis as a measure of performance in mammography screening programs.


References

  1. Obuchowski NA. ROC analysis. AJR 2005; 184:364-372
  2. Gur D. Technology and practice assessment: in search of a "desirable" statement. Radiology 2005; 234:659-660
  3. Cole EB, Pisano ED, Kistner EO, et al. Diagnostic accuracy of digital mammography in patients with dense breasts who underwent problem-solving mammography: effects of image processing and lesion type. Radiology 2003; 226:153-160
  4. Jiang Y, Metz CE, Nishikawa RM. A receiver operating characteristic partial area index for highly sensitive diagnostic tests. Radiology 1996; 201:745-750

Reply

Nancy A. Obuchowski

The Cleveland Clinic Foundation Cleveland, OH

Drs. Skaane and Niklason make an important point about the relevance of receiver operating characteristic (ROC) curves and their associated metrics (e.g., area under the ROC curve) for the evaluation of mature screening and diagnostic tests. I agree with their general tenet that ROC curves are not ideal for characterizing the accuracy of a mature screening or diagnostic test in a specific clinical population. The issue, however, is not whether the BI-RADS lexicon lends itself to ROC analysis or whether the prevalence of breast cancer is sufficient. Rather, the critical issue is the phase of the technical assessment of mammography.

Zhou et al. [1] described three phases of the assessment of diagnostic and screening tests. The first phase consists of exploratory studies, designed to determine whether the new test has any value in clinical populations. The second phase is the challenge phase where the test is applied to patients or subjects that are in some manner difficult to diagnose (e.g., because their disease is subtle or there are comorbidities that may interfere with the test's accuracy). ROC curves are commonly used in these two phases of the assessment of diagnostic and screening tests. Mammography has been assessed at these two levels.

In the third, or advanced, phase, the test is applied to a prospective sample of subjects who are representative of the target population. The goal here is not to determine whether the test has diagnostic value or whether it fails for any specific subpopulations; these issues have already been addressed in the phase I and II studies, respectively. Rather, the goal of a phase III study is to characterize the accuracy of the test for the clinical population of interest. In a phase III study, a decision threshold may already be well established, so the estimation of the test's sensitivity and specificity and the positive predictive value and negative predictive value of the test in the clinical population are most relevant.
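For reference, the four quantities named above can be written in terms of the counts at a single, fixed decision threshold. The symbols TP, FP, FN, and TN (true-positive, false-positive, false-negative, and true-negative counts) are notation introduced here for illustration and do not appear in the reply.

```latex
\begin{align*}
\text{sensitivity} &= \frac{TP}{TP + FN}, & \text{specificity} &= \frac{TN}{TN + FP},\\[4pt]
\text{PPV} &= \frac{TP}{TP + FP}, & \text{NPV} &= \frac{TN}{TN + FN}.
\end{align*}
```

Sensitivity and specificity condition on the true disease status, whereas the positive and negative predictive values condition on the test result and therefore depend on how common the disease is in the population studied.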

Skaane and Niklason cite two reasons why ROC curves may not be a proper tool for assessing the accuracy of mammography. First, they cite the very low prevalence (and incidence) rate of breast cancer in a screening population. Although this low prevalence rate certainly makes designing the study challenging, a low prevalence rate alone does not affect the validity of ROC curves. Sensitivity and specificity are measures of accuracy that are conditional on the true disease status of the patient; thus, they are not biased by samples with low prevalence rates.
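The point that sensitivity and specificity are conditional on true disease status can be illustrated with a small sketch. The sensitivity (0.80) and specificity (0.95) below are hypothetical values chosen only for illustration; the prevalences 0.004 and 0.008 borrow the detection rates quoted in the letter as stand-ins for prevalence, and 0.10 is an assumed enriched sample. The function names are likewise invented for this example.

```python
def expected_counts(n, prevalence, sensitivity, specificity):
    """Expected 2x2 table for n screened women (fractional counts allowed)."""
    diseased = n * prevalence
    healthy = n - diseased
    tp = diseased * sensitivity      # cancers called positive
    fn = diseased - tp               # cancers called negative
    tn = healthy * specificity       # normals called negative
    fp = healthy - tn                # normals called positive
    return tp, fp, fn, tn


def accuracy_measures(tp, fp, fn, tn):
    """Accuracy measures recomputed from the 2x2 counts."""
    return {
        "sensitivity": tp / (tp + fn),   # conditional on disease present
        "specificity": tn / (tn + fp),   # conditional on disease absent
        "ppv": tp / (tp + fp),           # conditional on a positive test
        "npv": tn / (tn + fn),           # conditional on a negative test
    }


# Hypothetical operating point, held fixed while prevalence changes.
SENSITIVITY, SPECIFICITY = 0.80, 0.95

for prevalence in (0.004, 0.008, 0.10):
    counts = expected_counts(100_000, prevalence, SENSITIVITY, SPECIFICITY)
    measures = accuracy_measures(*counts)
    summary = "  ".join(f"{name}={value:.3f}" for name, value in measures.items())
    print(f"prevalence={prevalence:.3f}  {summary}")
```

In this sketch the recomputed sensitivity and specificity stay at their fixed values at every prevalence, while the predictive values shift. What low prevalence does change is the number of cancers available to estimate sensitivity, which is one way to read the remark that it makes designing the study challenging.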

Second, Skaane and Niklason point out that the BI-RADS lexicon is not linear. The BI-RADS lexicon, however, does not need to be linear to construct an ROC curve. ROC curves can be constructed from ordinal- or continuous-scale test results. We need only that the test results can be ordered from least to most suspicious (i.e., ordinal scale). It is my understanding that the BI-RADS lexicon can be so ordered.
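A minimal sketch of how an ROC curve can be built from ordinal ratings follows. It uses the empirical (nonparametric) curve with a trapezoidal area, which is one common approach and not necessarily the method used in the articles cited here. The ratings are invented for illustration, and the function names are hypothetical.

```python
def empirical_roc(ratings_diseased, ratings_healthy):
    """Empirical ROC points (FPF, TPF), treating 'rating >= cutoff' as positive."""
    cutoffs = sorted(set(ratings_diseased) | set(ratings_healthy), reverse=True)
    points = [(0.0, 0.0)]  # stricter than any observed rating: nothing is positive
    for cutoff in cutoffs:
        tpf = sum(r >= cutoff for r in ratings_diseased) / len(ratings_diseased)
        fpf = sum(r >= cutoff for r in ratings_healthy) / len(ratings_healthy)
        points.append((fpf, tpf))
    return points  # the most lenient cutoff yields (1.0, 1.0)


def trapezoidal_auc(points):
    """Area under the piecewise-linear curve through the ROC points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical five-category ratings (1 = least suspicious, 5 = most suspicious).
cancer_ratings = [5, 5, 4, 4, 4, 3, 3, 2, 4, 5]
normal_ratings = [1, 1, 1, 2, 2, 1, 2, 3, 1, 2, 1, 3, 4, 1, 2]

roc_points = empirical_roc(cancer_ratings, normal_ratings)
auc = trapezoidal_auc(roc_points)

# Relabel the categories with unevenly spaced values that keep the same order.
relabel = {1: 1, 2: 2, 3: 10, 4: 50, 5: 51}
relabeled_points = empirical_roc(
    [relabel[r] for r in cancer_ratings],
    [relabel[r] for r in normal_ratings],
)

print(roc_points)
print(auc, trapezoidal_auc(relabeled_points))
```

Because only comparisons between ratings enter the calculation, relabeling the categories with any order-preserving values leaves the points and the area unchanged, which is why unequal spacing between categories does not prevent the curve from being constructed.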

Finally, regarding multireader multicase studies, I think it is important again to consider the phase of assessment of the diagnostic or screening test and the specific goals of the study. Multireader multicase studies are important for estimating interreader variability and for assessing the average accuracy of readers. If the goal of the study, however, is to assess the performance of a screening test in the field, then a traditional multireader, multicase design, where all readers interpret all cases, is not appropriate.


References

  1. Zhou XH, Obuchowski NA, McClish DK. Statistical methods in diagnostic medicine. New York, NY: Wiley & Sons, 2002

This article has been cited by other articles:


P. Skaane, S. Hofvind, and A. Skjennald
Randomized Trial of Screen-Film versus Full-Field Digital Mammography with Soft-Copy Reading in Population-based Screening Program: Follow-up and Final Results of Oslo II Study
Radiology, September 1, 2007; 244(3): 708 - 717.