Noninterpretive Skills for Radiology Residents
1 Department of Radiology, Box 170, The University of Virginia Health System, Charlottesville, VA 22908.
Received November 8, 1999;
accepted after revision November 17, 1999.
This is the third in a series on noninterpretive skills for residents in
diagnostic radiology from the American College of Radiology and the
Association of Program Directors in Radiology. Editor: Jannette Collins.
Introduction
In radiology, the situation is especially problematic. Over the past 30 years, perhaps more than any other specialty, radiology has experienced the introduction of many revolutionary technologies that have dramatically changed imaging practice. The diffusion of these technologies and the introduction of new applications for their use have far outpaced clinical researchers' determination of their appropriate roles in clinical practice. Instead, use has been guided by methodologically poor reports that tend to exaggerate the performance of new techniques [1,2,3], further spurring diffusion and use. The end result is the suspicion that expensive technologies such as CT and MR imaging are being used more frequently than is necessary for cost-effective patient care. This suspicion is widely held among policymakers and the payers of health care and is primarily responsible for the targeting of medical imaging as an opportunity for cost reduction.
At the root of the problem is the naivete of most radiologists as they read radiology literature and listen to meeting or postgraduate course presentations. Few radiology training programs emphasize research in their curriculum, and few programs address and encourage critical thinking skills. More critical radiologists would better dissect the methods supporting the results and conclusions of studies, adopting into their practices only valid information and discarding studies with biased results.
Comprehensive instruction in study design, biases, and hypothesis testing is beyond the scope of this presentation. Rather, I present a framework that can guide interested readers to continue their study of this subject to become better radiologists who can more appropriately care for patients. Additionally, I hope to influence radiology training programs and institutions such as the Radiology Residency Review Committee and the American Board of Radiology to consider the incorporation of critical thinking skills in the education of residents as a means of improving radiology.
Characteristics of Believable Publications and Presentations
Validity refers to the methods used in a study. Were the methods relatively free of biases (almost every study has some biases)? Were the measurement methods generally accepted and susceptible to comparisons with other studies? Could the results be tested by accepted statistical techniques? There are numerous types of validity. The first to consider is "face validity": on the surface, does the study seem appropriate? Other types of validity, such as criterion and construct validity in survey research, and how to test for them, pertain more directly to the type of study and methodology. Assuring validity is frequently an arcane science, requiring experts from other disciplines as part of the research team. Readers should feel more assured about the validity of the study if appropriate methodologists and statisticians have been involved in the study design.
Reliability means that if the study were conducted a second time in the same setting with the same population of interpreters, a similar result would likely be obtained. Reliability is frequently an issue in interpreter studies of imaging efficacy (defined later in this article). One method of determining reliability is to have the observers of a study reinterpret a sample of cases that they interpreted at an earlier time. If the observers are able to produce similar interpretations (intraobserver agreement), the study is reliable.
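One common way to quantify intraobserver agreement is Cohen's kappa, which corrects raw agreement for the agreement expected by chance. The article does not name a specific statistic, so the choice of kappa, the function name `cohen_kappa`, and the ten readings below are illustrative assumptions, not data from any study.

```python
from collections import Counter


def cohen_kappa(first_reads, second_reads):
    """Cohen's kappa for two sets of categorical readings of the same cases."""
    if len(first_reads) != len(second_reads) or not first_reads:
        raise ValueError("need two non-empty reading lists of equal length")
    n = len(first_reads)
    # Observed agreement: fraction of cases read identically both times.
    observed = sum(a == b for a, b in zip(first_reads, second_reads)) / n
    # Chance agreement: product of the marginal frequencies, summed over categories.
    first_counts = Counter(first_reads)
    second_counts = Counter(second_reads)
    expected = sum(first_counts[c] * second_counts[c] for c in first_counts) / n**2
    if expected == 1.0:
        return 1.0  # both sessions used one identical category for every case
    return (observed - expected) / (1 - expected)


# Hypothetical readings of the same 10 cases by one observer at two sessions.
first = ["pos", "pos", "neg", "neg", "pos", "neg", "neg", "pos", "neg", "neg"]
second = ["pos", "neg", "neg", "neg", "pos", "neg", "pos", "pos", "neg", "neg"]
kappa = cohen_kappa(first, second)
print(f"raw agreement = 0.80, kappa = {kappa:.2f}")
```

In this made-up sample the observer agrees with himself or herself on 8 of 10 cases, yet kappa is noticeably lower than 0.80 because some of that agreement would be expected by chance alone.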
Generalizability is perhaps the most important consideration for a radiologist trying to determine whether a result would be obtained in his or her own practice. To what extent do the setting, radiologists, and patient population reflect his or her practice? Did the technology reflect clinical practice or did it reflect a research setting? If the latter, how much of a stretch is it to imagine using the methodology in practice? Health policy experts generally discuss the differences between studies of efficacy and studies of effectiveness. Studies of efficacy examine a technology as it is used by subspecialized experts, often those involved in developing and testing the technology in idealized settings. Studies of effectiveness examine how a technology performs in general practice. For obvious reasons, studies of effectiveness are of much greater interest to payers and policymakers than studies of efficacy. Generalizability is enhanced when studies are multiinstitutional, large, and effectively include a broad spectrum of possible patients and practitioners in general practice. Unfortunately, such studies are rare in the radiology literature.
Purpose of the Study and Study Design
I equate scientific clinical research that involves patients with a term that has become trendy in medical literature: technology assessment. A simple nomenclature of technology assessment (in ascending order of relevance to patient care) [5] categorizing studies according to their end point includes imaging efficacy, diagnostic efficacy, therapeutic efficacy, patient health outcomes (including quality of life outcomes), and cost-effectiveness.
Imaging efficacy details the performance of a technology-radiologist combination in determining whether disease is present or absent. It uses familiar measurements such as sensitivity, specificity, accuracy, positive and negative predictive value, and receiver operating characteristic curve analysis. Diagnostic and therapeutic efficacy reflect how the information from imaging impacts the referring physician's diagnosis and treatment plan, respectively. Both compare before-test with after-test considerations and estimate the difference between the two [6]. Patient health outcomes include both traditional measures of health (physiologic health), such as morbidity and mortality, and quality-of-life health outcomes that consider how health care affects constructs such as physical, mental, social, and role functioning; sense of well-being; and health-seeking behaviors [5]. Intuitively, medical imaging might play an important role in improving these aspects of health, but little research sustains this belief. Cost-effectiveness research requires the measurement of both cost and benefit, whereby the result is generally expressed in terms of incremental dollars spent per incremental improvement in health. For a number of reasons, largely related to important misapprehensions about the term, the frequency of the use of the term, and how to apply this research, I treat this as a special case to be discussed in a separate section of this presentation.
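For readers who want the imaging efficacy measures named above in compact form, the standard definitions can be written from a two-by-two table of test result against reference standard. The symbols TP, FP, TN, and FN (true-positive, false-positive, true-negative, and false-negative counts) are notation introduced here for illustration; they do not appear in the article.

```latex
\begin{align*}
\text{Sensitivity} &= \frac{TP}{TP + FN}, &
\text{Specificity} &= \frac{TN}{TN + FP}, \\[4pt]
\text{Positive predictive value} &= \frac{TP}{TP + FP}, &
\text{Negative predictive value} &= \frac{TN}{TN + FN}, \\[4pt]
\text{Accuracy} &= \frac{TP + TN}{TP + FP + TN + FN}.
\end{align*}
```

Sensitivity and specificity are each computed within the diseased or the disease-free group alone, whereas accuracy and the predictive values mix the two groups and therefore depend on how many diseased patients the sample contains.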
For the remainder of this section, I focus on the design of studies of imaging efficacy, but the general observations hold true for other genres of technology assessment. I have chosen to focus on imaging efficacy because these studies are, by far, the most frequently published scientific clinical research in the radiology literature.
Imaging efficacy research is susceptible to a prospective randomized controlled trial design (the generally accepted standard of clinical research). Randomized controlled trials (trials that ensure that patient populations in experimental and control groups are similar) greatly reduce the likelihood that biases will influence results. However, few randomized controlled trials have been performed in radiology because randomized controlled trials are difficult to design, time-consuming to implement and complete, impractical because of the number of patients required, and expensive. For a fast-moving technology such as diagnostic imaging, there are reasonable concerns that by the time randomized trials are completed the technologies tested will be obsolete (sometimes referred to as the "moving target" problem). Much more frequently, diagnostic imaging studies use a paired design whereby each patient is examined with two or more technologies. This design allows smaller sample sizes, less time, and less expense because it enables each patient to act as his or her own control and enables pairing of the analysis.
Both paired and unpaired designs are subject to a number of important biases that can severely affect the result (usually in a positive direction) and incorrectly influence the practice of the unwary reader. Such biases include verification (workup) bias, test interpretation (test review or diagnostic review) bias, case-mix (selection or spectrum) bias, bias in reporting uninterpretable results, lack of a definitive reference standard, investigator bias, misapprehensions that result from inappropriately reporting certain measures (e.g., accuracy and positive predictive value), and inappropriate conclusions based on sample size [7,8]. A recent publication evaluating 11 metaanalyses of diagnostic technologies, including three imaging technologies, revealed that the literature is rife with some of these biases [9]. The study reported that biases systematically resulted in the exaggeration of the performance of imaging technologies. The most frequently encountered biases, and the ones that led to the most severe skewing of the results, were case-mix and verification biases.
Common Biases and Sources of Misapprehension
Test interpretation (test review or diagnostic review) bias results when observers participating in an imaging efficacy study have foreknowledge that biases their interpretations. Sources of such foreknowledge include remembering retrospectively accumulated cases, knowing the results of other tests, or having heard of the reference test result. The best solution for this problem is to have observers interpret cases "off-line" (not in the clinical setting) and to blind them to all such information. When two or more technologies are involved in the study, the order in which they are presented to each observer should be randomized to equalize any recall effect that might be present. Controversy continues as to whether observers should be provided clinical information in such designs. Clinical information influences performance [10, 11], so the choice should depend on whether the effort is intended to replicate the clinical setting [7].
Case-mix (selection or spectrum) bias occurs when cases are selected that inaccurately reflect the range of cases that occur in the general population. This is especially problematic for the evaluation of new technologies. For new technologies, we know in which cases we find disease, but we know less well which cases are missed. Thus, either retrospective or prospective case selection will generally favor selecting cases with the most advanced disease. In essence, the study becomes an evaluation of the wellest of the well and the sickest of the sick. An interpreter study using these easier cases is more likely to show a favorable result than when the technology is used in general practice.
A common practice in the radiology literature is to not report cases that show an uninterpretable result. Dealing with uninterpretable results can be problematic; however, fair and honest reporting requires that all cases be included in a publication or presentation. Failure to do so will give an unreasonably optimistic picture of the technology studied. If an uninterpretable result can be shown, using the reference standard, to be associated with either the absence or presence of disease (non-random pattern), it may be reasonable to include the test results in either of these categories with explanation. However, if the pattern is random, it is better to include noninterpretable results as a separate category [7].
Lack of a definitive reference standard is the sole bias that tends to make the performance of a technology appear more negative than it really is. Incorrectly classifying some of the results as correct or incorrect based on a reference standard that is sometimes wrong can lead to errors in categorization that understate accuracy. This is especially problematic when the technology being tested in reality performs better than the fallible reference standard. An example of such a situation is the use of biopsy as the reference standard for sporadically occurring diseases, such as autoimmune conditions, in which the more global view of an imaging technology might better reflect the patient's condition. When no adequate reference standard is available, assiduous clinical and imaging follow-up of patients to improve on classification (similar to the recommendations for avoiding verification bias) is the best approach.
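The understating effect of a fallible reference standard can be illustrated with a little arithmetic. The sketch below assumes that the errors of the test and of the reference standard are independent given the patient's true status, and all of the numbers (prevalence, sensitivities, specificities) are hypothetical values chosen for illustration; the function name `apparent_sensitivity` is likewise an invention for this example.

```python
def apparent_sensitivity(test_sens, test_spec, ref_sens, ref_spec, prevalence):
    """Sensitivity a study would measure when judged against an imperfect reference.

    Assumes test and reference errors are independent given true disease status.
    """
    # Reference-positive patients are a mix of truly diseased patients the
    # reference detects and healthy patients the reference mislabels.
    diseased_ref_pos = prevalence * ref_sens
    healthy_ref_pos = (1 - prevalence) * (1 - ref_spec)
    # Among them, the test is positive in true disease at its sensitivity and
    # in mislabeled healthy patients only at its false-positive rate.
    test_pos = diseased_ref_pos * test_sens + healthy_ref_pos * (1 - test_spec)
    return test_pos / (diseased_ref_pos + healthy_ref_pos)


# A hypothetically perfect test, judged against a reference standard that is
# itself only 90% sensitive and 90% specific, in a sample with 30% disease.
measured = apparent_sensitivity(1.0, 1.0, ref_sens=0.90, ref_spec=0.90, prevalence=0.30)
ideal = apparent_sensitivity(1.0, 1.0, ref_sens=1.0, ref_spec=1.0, prevalence=0.30)
print(f"measured sensitivity = {measured:.2f}, true sensitivity = {ideal:.2f}")
```

Under these assumptions a test that never misses disease would be reported with a sensitivity of roughly 0.79, because healthy patients wrongly labeled as diseased by the reference standard are counted as misses.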
The developers and initial evaluators of a new technology have investigator bias. Their experiences with their technology have been salutary and the career benefits they have derived from working with it may engender an exaggerated view of its capabilities. In addition to unconsciously allowing other biases detailed in this section to creep into their designs, affected investigators may also unconsciously fall prey to interpreting results in ways that confirm their preconceived views.
Accuracy is frequently used to report the performance of imaging tests. Accuracy is usually defined as all patients correctly diagnosed divided by the total number of patients in a study. Now imagine an extreme example in which a test is terrible at identifying patients with the disease (0% sensitivity) but very good at being negative when no disease is present. And imagine that the population sample used to evaluate this test has 99 healthy people and one person with disease. Accuracy will be 99%, but in reality the test is functionally useless for its purpose. Examples of the misuse of accuracy, perhaps not quite so extreme but still misleading, abound in the radiology literature. Similarly, although the use of positive predictive value is generally recommended by most methodologists, its use can also be problematic for interpreting imaging efficacy studies. The case samples in imaging efficacy studies are typically heavily weighted toward cases with positive findings so that the number of cases required to achieve a statistically significant result can be decreased. However, positive predictive value is heavily influenced by the a priori probability of disease being present, or the representation of positive cases in a sample. That the disease prevalence in the sample exceeds the prevalence of disease in the general population causes the study results to be an exaggeration of positive predictive value beyond that which could be expected in practice [8].
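Both points can be checked with a few lines of arithmetic. The first part of the sketch reproduces the extreme example from the text (99 healthy people, one person with disease, 0% sensitivity). The second part uses Bayes' theorem to show how positive predictive value shifts with prevalence; the 90% sensitivity, 90% specificity, and the two prevalence figures are hypothetical values chosen only for illustration.

```python
def accuracy(tp, fp, fn, tn):
    """All patients correctly diagnosed divided by all patients studied."""
    return (tp + tn) / (tp + fp + fn + tn)


def sensitivity(tp, fn):
    """Fraction of diseased patients the test calls positive."""
    return tp / (tp + fn)


def ppv(sens, spec, prevalence):
    """Positive predictive value from Bayes' theorem."""
    true_pos = sens * prevalence
    false_pos = (1 - spec) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


# Extreme example from the text: the one diseased patient is missed and all
# 99 healthy people are correctly called negative.
useless_accuracy = accuracy(tp=0, fp=0, fn=1, tn=99)
useless_sensitivity = sensitivity(tp=0, fn=1)
print(f"accuracy = {useless_accuracy:.0%}, sensitivity = {useless_sensitivity:.0%}")

# Hypothetical test (90% sensitive, 90% specific) evaluated in a study sample
# enriched to 50% disease versus a practice population with 5% disease.
ppv_enriched = ppv(0.90, 0.90, prevalence=0.50)
ppv_practice = ppv(0.90, 0.90, prevalence=0.05)
print(f"PPV in enriched sample = {ppv_enriched:.0%}, in practice = {ppv_practice:.0%}")
```

With these illustrative numbers the same test yields a positive predictive value of 90% in the enriched study sample but only about 32% at the lower prevalence, which is the exaggeration the paragraph describes.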
Studies in the radiology literature, even well-designed ones, tend to be small in the number of patients and observers they report. The reasons for this are expense, available case material, and the influence of career incentives to complete research quickly and publish frequently. When a positive difference between two technologies is established, little harm is done, so long as the other dictums of pertinence, validity, reliability, and generalizability are accommodated. However, a negative result (one showing no difference) is suspect. It may be that a real difference exists, even a clinically important one, but that the sample size is too small to show it with statistical significance. In designing a study, I believe it is important to first consider what would be considered a practically significant difference. The study should be sized to have a high probability (statistical power) of revealing such a difference. Smaller studies are inadequate and larger studies are wasteful.
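As a rough sketch of what sizing a study can look like, the function below applies the usual normal-approximation formula for comparing two independent proportions (for example, the sensitivities of two technologies in separate patient groups). The article does not prescribe a formula; the unpaired design, the two-sided 5% significance level, the 80% power, and the example sensitivities are all assumptions made for illustration, and a paired design would generally need fewer cases.

```python
import math
from statistics import NormalDist


def cases_per_group(p1, p2, alpha=0.05, power=0.80):
    """Approximate cases needed in each group to detect p1 versus p2.

    Normal approximation for two independent proportions, two-sided test.
    """
    if p1 == p2:
        raise ValueError("the two proportions must differ")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for significance
    z_beta = NormalDist().inv_cdf(power)           # quantile for the desired power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)


# Hypothetical question: is a new technique 90% sensitive versus 80% for the old one?
n_large_difference = cases_per_group(0.80, 0.90)
# Halving the difference judged practically significant greatly increases the need.
n_small_difference = cases_per_group(0.80, 0.85)
print(n_large_difference, n_small_difference)
```

Note that these counts refer to diseased cases in each group when sensitivity is the end point, so the total number of patients enrolled would be larger still.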
The frequency with which improper claims are made for cost-effectiveness in the radiology literature is disturbing. That this is the case reflects an imperfect understanding of the meaning of the term. As pointed out by Doubilet et al. [12], the term "cost-effectiveness" is meaningless in an isolated context. Rather, a technology can be considered cost-effective only in comparison with another technology, and only when it is both less costly and of greater benefit than the technology to which it is compared (technologic dominance) [12]. This is a relatively rare circumstance. More often, a technology is either more costly and of greater benefit (e.g., nonionic contrast material) or less costly and of lesser benefit (e.g., stereotactic core breast biopsy) than the established technology it seeks to replace. In such cases, it is a societal decision (usually carried out through innumerable local political processes) whether the trade-off in cost and benefit is worthwhile.
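The comparison described above can be sketched as a small classification: a new technology either dominates the established one, is dominated by it, or presents a trade-off that can be summarized as incremental cost per incremental unit of benefit. The function name, the dollar amounts, and the use of quality-adjusted life years as the benefit unit are hypothetical choices for illustration, not figures from the article or its references.

```python
def compare_technologies(cost_new, benefit_new, cost_old, benefit_old):
    """Classify a new technology against an established one.

    Returns (category, incremental_ratio); the ratio is None unless there is
    a genuine trade-off between cost and benefit.
    """
    delta_cost = cost_new - cost_old
    delta_benefit = benefit_new - benefit_old
    if delta_cost < 0 and delta_benefit > 0:
        return ("dominant", None)    # less costly and of greater benefit
    if delta_cost > 0 and delta_benefit < 0:
        return ("dominated", None)   # more costly and of lesser benefit
    if delta_benefit == 0:
        return ("no difference in benefit", None)
    # Trade-off: incremental dollars per incremental unit of benefit. When the
    # new technology is cheaper and less beneficial, the ratio is the money
    # saved per unit of benefit given up.
    return ("trade-off", delta_cost / delta_benefit)


# Hypothetical per-patient figures; benefit in quality-adjusted life years.
more_costly_better = compare_technologies(1500, 10.5, 1000, 10.0)
cheaper_and_better = compare_technologies(800, 10.5, 1000, 10.0)
print(more_costly_better, cheaper_and_better)
```

Only the second hypothetical case meets the strict definition of technologic dominance; the first leaves the reader with a ratio whose acceptability is the societal judgment the paragraph refers to.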
In 1997, Blackmore and Magid [3] detailed their review of the 1989-1995 radiology literature to evaluate the quality of studies claiming to be cost-effectiveness research. These researchers assessed 44 articles for six major and four minor generally accepted criteria of methodologic rigor. The six major criteria included an explicit statement about which technologies were being compared; a perspective of analysis (e.g., payer, patient, or societal), which is critical because the costs and benefits that need to be studied will vary according to perspective; the cost data; the health outcome data; the use of a summary measure that used cost and benefit; and sensitivity analyses to test the effect of uncertainties. The four minor criteria included the source of cost data, the consideration of long-term (downstream) costs, discounting for opportunity costs, and the use of incremental computation (e.g., looking at marginal, rather than average, costs and benefits). All 10 criteria were present in only three of the 44 studies; the median numbers of major and minor criteria per study were three and one, respectively.
Critically Reading a Research Article
The introduction should have a succinct statement of what was studied and why it is an important subject. The subject or problem should be well defined. Ideally, an implicit or explicit hypothesis should be stated.
The methods section should provide sufficient detail so that the work could be replicated. It should indicate the specific technologies studied; the patient and radiologist populations, including sample size, inclusion and exclusion criteria, and other important descriptions; the outcomes or end points that were studied and the justification for their being chosen; the reference standard (if an imaging efficacy study); and the details of the analysis and statistical hypothesis testing.
The results should report what was promised in the introduction and methods, nothing more and nothing less.
The discussion should provide a restatement of the one or two most important findings, a statement about how the findings should be placed in the context of other related studies, an explanation of possible reasons why the authors' results differ from those of previous work, a detailing of possible biases in the work and why they could not be avoided, and a conclusion that indicates the authors' view of how the research should influence further research or clinical practice.