AJR 2002; 179:825-831
© American Roentgen Ray Society


Fundamentals of Clinical Research for Radiologists

Screening for Preclinical Disease: Test and Disease Characteristics

Cheryl R. Herman1, Harmindar K. Gill, John Eng and Laurie L. Fajardo

1 All authors: The Russell Morgan Department of Radiology and Radiological Sciences, JHOC Rm. 4155, P. O. Box 0814, Johns Hopkins Medical Institutions, 601 N. Caroline St., Baltimore, MD 21287.

Received March 20, 2002; accepted after revision April 22, 2002.

 
This is the seventh in a series designed by the American College of Radiology (ACR), the Canadian Association of Radiologists, and the American Journal of Roentgenology. The series, which will ultimately comprise 22 articles, is designed to progressively educate radiologists in the methodologies of rigorous clinical research, from the most basic principles to a level of considerable sophistication. The articles are intended to complement interactive software that permits the user to work with what he or she has learned, which is available on the ACR Web site (www.acr.org).

Address correspondence to L. L. Fajardo.


Introduction
Screening is the application of a test to detect a potential disease or condition in an individual who has no known signs or symptoms of that disease or condition [1]. In general, screening has two major objectives. One is the early detection of disease at a point when treatment is more effective, less expensive, or both. Here, the implicit assumption underlying the concept of screening is that early detection—before the development of symptoms—will lead to a more favorable prognosis because intervention initiated before the disease is clinically manifested will be more effective than treatment provided at a later stage of the disease [2, 3]. The second objective in screening is to identify risk factors that render an individual at a higher than average risk for developing a disease, with the goal of modifying the risk factors to prevent or minimize the disease [4,5,6]. The application of imaging examinations for disease screening is most often based on the first objective.

Although medical imaging is used in the diagnosis of most human ailments, mammography is the only diagnostic imaging examination currently in widespread use as a screening tool [7]. Multidetector CT is being evaluated as a means of detecting early-stage lung carcinoma [8, 9] and colorectal adenomatous polyps [10, 11], but it is not yet an accepted routine screening examination. Indeed, the concept of disease screening, including its appropriateness and evaluation, is not as straightforward as it may first appear. Even the basic assumption that early treatment will improve prognosis may not be true in all circumstances. Moreover, even if this assumption is justifiable for a particular condition, the risks or costs that are associated with any screening test (and any consequent "induced" procedures) must be weighed against the benefits. Thus, any new application of an imaging procedure to screen for disease should be considered an unproven method of disease control until its risks, benefits, and costs have been rigorously evaluated. Ideally, such evaluations should be completed before widespread use of the procedure for disease screening is undertaken or recommended [12].

Making and evaluating recommendations on the use of imaging studies for disease screening is one of the more difficult problems in medical imaging and clinical medicine. This article will discuss the use of screening tests for detecting early disease or for detecting risk factors for developing disease. Consideration will be given to the appropriateness criteria for two major elements of health screening programs: the condition or disease for which screening is being performed and the screening test itself. Within the context of these two elements, potential biases in the evaluation of screening programs and other critical issues in the evaluation of screening programs will be presented.


Appropriateness Criteria: The Disease or Condition Being Screened
To be appropriate for screening, a disease should be serious, and the preclinical phase of the disease (Appendix 1) should have a high prevalence among the population targeted for screening. Furthermore, screening initiated before a critical point in the natural history of the disease should result in treatment being initiated before the onset of symptoms (Fig. 1). This treatment should be more beneficial in reducing morbidity or mortality than treatment given after symptoms develop. Finally, the screening for the disease should not result in a significant incidence of pseudodisease.



 
Fig. 1. Diagram shows natural history of disease. Progression from biologic onset of disease to death is divided into preclinical and clinical phases. Detectable preclinical phase of disease is period during which screening tests are applied to detect a condition early in its natural history, before onset of symptoms.

 

Substantial Morbidity or Mortality If Untreated
The criterion of seriousness relates primarily to issues of both cost-effectiveness and ethics. The elimination or amelioration of adverse health consequences must justify resource expenditures on radiologic imaging for disease screening. Likewise, the consequences of failing to detect and treat the disease early must be sufficiently grave to ethically warrant exposing individuals to the risks (e.g., radiation exposure or false-positive diagnosis) and discomforts of the screening procedure itself. Life-threatening conditions, such as heart disease and cancer, and those known to have serious and irreversible consequences, such as congenital hypothyroidism and phenylketonuria, clearly meet the criterion of seriousness. On the other hand, medical imaging tests should be thoroughly evaluated for risks and benefits before being used to screen for certain asymptomatic conditions, such as gallstones. Although asymptomatic gallstones are fairly prevalent, rarely are they life-threatening and, in fact, the condition may never become symptomatic.

High Preclinical Prevalence
For a screening test to be effective, it must reveal a sufficient number of preclinical disease cases to justify the testing costs. Thus, the prevalence of preclinical disease must be high in the population for which screening is recommended. Targeting high-risk populations can increase the prevalence of the detectable preclinical phase of the disease and thus the number of cases detected on screening. This strategy will likely be applied to the emerging approaches to lung cancer screening using multidetector CT. Exceptions to the criterion concerning high prevalence of the detectable preclinical disease should be made if screening for rare conditions can be accomplished using tests that are accurate, inexpensive, and noninvasive. Although phenylketonuria occurs in only one of 15,000 neonates, widespread screening is justified by the effectiveness and low cost of the test and by the serious public health consequences of not detecting the disease in its preclinical phase.

Existence of a Critical Point and Appropriate Therapy
Screening tests are only effective if the condition or disease has a critical point (point CP in Fig. 1) so that treatment instituted before the critical point is more efficacious than treatment provided later. In the case of screening for preclinical neoplastic conditions, the critical point coincides with the onset of metastasis [12]. Thus, the critical point must occur during the detectable preclinical phase of the disease because screening is ineffective (and, indeed, unnecessary) after the onset of symptoms (i.e., during the clinical phase of the disease). If the critical point occurs soon after the onset of the detectable preclinical phase, screening may be too late to be useful. Conversely, screening may also be less effective early in the onset of the detectable preclinical phase if lesions are extremely small and are just at the threshold of detectability.

For screening to improve patient outcomes, an effective treatment for the disease must be available. A critical question in evaluating the importance of screening for a condition is whether treatment of the preclinical disease detected on screening is more effective than intervention initiated after the disease becomes symptomatic. Here, the natural history of the disease should be carefully considered. Figure 1 illustrates that the natural history of disease can be divided into preclinical and clinical phases. The preclinical phase is the period from the biologic onset of disease to the onset of clinical manifestations of the disease. During this phase, the condition is asymptomatic but detectable on a screening test. The detectable preclinical phase of disease is defined as the interval between the point at which the disease can be detected on screening (point B in Fig. 1) and the point at which symptoms develop [13] (point S in Fig. 1).

For screening to be beneficial, treatment initiated during the detectable preclinical phase must result in a better prognosis than therapy given after symptoms develop. For example, some subtypes of breast cancer develop for 3-8 years before becoming palpable at routine clinical breast examinations. During this stage, nonpalpable breast carcinomas may be detected on mammography. Many of these carcinomas are confined to the breast and are not associated with lymph node metastasis. Diagnosing and treating breast cancer during the preclinical phase result in a higher percentage of the cases remaining noninvasive (i.e., ductal carcinoma in situ), a lower percentage of cases of axillary lymph node metastasis, and a better 5-year patient survival rate than when breast cancer is diagnosed during the clinical phase [14].

Conversely, if early treatment engenders no difference in the patient's prognosis or health outcome, then the application of a screening test is neither necessary nor effective. For example, screening for lung carcinoma with chest radiography has historically been discouraged because the disease has a poor prognosis regardless of the phase during which treatment is initiated. Similarly, little justification exists in screening for conditions that are completely curable during the clinical phase of their natural history.

Low Incidence of Pseudodisease
A pseudodisease is a disease that does not require treatment because it does not affect patients' length or quality of life in a significant way. Screening for a disease will be ineffective if the screening test reveals substantial pseudodisease. Two sources of pseudodisease have been described [12, 15]. A type I pseudodisease is a condition that is diagnosed via a screening test and does not progress to symptomatic disease; it may even regress over time. This is a recognized phenomenon in screening for breast carcinoma; not all cases of ductal carcinoma in situ progress to invasive or metastatic disease [16, 17]. A type II pseudodisease is an indolent, slowly progressive disease found in conditions with long detectable preclinical phases or among patients with short life expectancies who may die from other causes [12]. This latter type of pseudodisease has been described in prostate carcinoma. Although the prevalence of clinically apparent prostate carcinoma in men aged 60-70 years is only about 1% [18], more than 40% of men in their 60s who have normal findings at rectal examinations have histologic evidence of disease [19] when prostate tissue is removed during cystectomy performed for bladder cancer. Because patients with pseudodisease do not die from the disease for which screening is performed, the survival of these patients is erroneously attributed to early treatment. If adjustments are not made for the detection of pseudodisease in a screening program, an overdiagnosis bias occurs [12]. For both types of pseudodisease, a screening test with positive results may cause the patients to undergo unnecessary tests and therapy. For these reasons, screening for conditions with a high frequency of pseudodisease is not cost-effective.


Appropriateness Criteria: The Screening Test
A successful disease-screening program requires not only that the disease have characteristics appropriate for screening but also that a valid screening test be available. Ideally, the test should be widely accessible, simple to administer, inexpensive, and associated with minimal discomfort and morbidity to the population screened. Moreover, the screening test results must be valid and reproducible. Finally, as discussed earlier, the test should be able to reveal the detectable preclinical phase of the disease accurately before the critical point of the disease.

Test Accuracy
A screening test is 100% accurate if it can be used to correctly classify individuals having preclinical disease as test-positive and those without preclinical disease as test-negative. In its simplest form, the assessment of the accuracy of a diagnostic technology involves two dichotomies: disease that is present (+) or absent (-) and test results that are positive (+) or negative (-). A 2 x 2 matrix (Fig. 2) is frequently used to illustrate the four outcome combinations in which n, the total number of test results examined, is expressed by the equation n = a + b + c + d. Two of the counts, a and d, correspond to correct test results (true-positive and true-negative, respectively), whereas b is the number of false-positive results and c is the number of false-negative results.



 
Fig. 2. Diagram of 2 x 2 matrix illustrates test outcomes and test accuracy for individuals with and without disease. Disease + = disease present, disease - = disease absent, a = number of true-positive results, b = number of false-positive results, c = number of false-negative results, and d = number of true-negative results.

 

Because the counts for the four outcomes are highly dependent on the sample size, it is customary to express them as rates. For example, a / (a + c) is equal to the proportion of individuals who have the disease and who have positive test results, or the rate of true-positives, also known as the sensitivity of the test; d / (b + d) is equal to the proportion of individuals who do not have the disease and who have negative test results, or the rate of true-negatives, also known as the specificity of the test; c / (a + c) is equal to the proportion of individuals who have the disease but have falsely negative test results, or the rate of false-negatives; and b / (b + d) is equal to the proportion of individuals who do not have the disease but who have falsely positive test results, or the rate of false-positives. Thus, sensitivity is the probability of an individual having positive test results when the disease is truly present, and specificity is the probability of an individual having negative test results when the disease is truly absent.
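
The four rates defined above can be computed directly from the counts in the 2 x 2 matrix. The following is a minimal sketch; the function name and the example counts are hypothetical and were chosen only for illustration.

```python
def accuracy_rates(a, b, c, d):
    """Rates from the 2 x 2 matrix.

    a = true-positives, b = false-positives,
    c = false-negatives, d = true-negatives.
    """
    return {
        "sensitivity": a / (a + c),          # true-positive rate
        "specificity": d / (b + d),          # true-negative rate
        "false_negative_rate": c / (a + c),
        "false_positive_rate": b / (b + d),
    }


# Hypothetical counts (not taken from any study).
rates = accuracy_rates(a=90, b=50, c=10, d=850)
for name, value in rates.items():
    print(f"{name}: {value:.3f}")
```

Note that sensitivity and the false-negative rate sum to 1, as do specificity and the false-positive rate, because each pair shares a denominator.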

The usefulness of a screening test is evaluated by its positive and negative predictive values. The predictive value of a negative test (d / [c + d]) is the probability that a patient with a negative result on the diagnostic test truly does not have the disease for which the screening was conducted. Conversely, the predictive value of a positive test (a / [a + b]) is the probability that a patient with a positive result on the screening test truly has the disease for which the screening was conducted. The positive and negative predictive values of a test are dependent on the prevalence of the disease.
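
The dependence of the predictive values on prevalence can be illustrated with a short calculation. The sketch below rewrites a / (a + b) and d / (c + d) in terms of sensitivity, specificity, and prevalence (Bayes' theorem); the function name and the numeric values are assumptions made for illustration only.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (positive predictive value, negative predictive value)."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Same hypothetical test applied to populations with different prevalence.
for prevalence in (0.01, 0.10):
    ppv, npv = predictive_values(0.90, 0.95, prevalence)
    print(f"prevalence={prevalence:.2f}  PPV={ppv:.3f}  NPV={npv:.3f}")
```

With these assumed values, the positive predictive value rises from roughly 0.15 to roughly 0.67 as prevalence increases from 1% to 10%, which is one reason screening is often targeted at high-risk populations.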

As the sensitivity of a screening test increases, the number of individuals with preclinical disease not diagnosed by the test decreases. A highly specific test has a low percentage of healthy individuals who are misclassified as having positive test results. Decisions regarding specific criteria for acceptable levels of sensitivity and specificity for a given preclinical disease involve weighing the consequences of leaving cases undetected (false-negatives) against erroneously classifying healthy persons as having the disease (false-positives). In general, sensitivity should be increased at the expense of specificity if the consequences of missing preclinical disease are great, such as when the disease is serious, detectable during its preclinical phase, and curable. Conversely, high specificity is desirable when the costs or risks associated with further diagnostic tests (e.g., surgical biopsy) are substantial. In this circumstance, ethics require that the screened population be informed that a negative result on the screening test does not absolutely guarantee that the disease is not present, only that the likelihood of having the disease is low.

One way to address the problem of the trade-off between the sensitivity and specificity is by administering several screening tests in parallel or sequentially. The former involves performing all the screening tests at the same time and considering individuals with positive results on any of the tests to be true-positive cases. This approach gives greater sensitivity than that achievable by performing each test alone because the condition is less likely to be missed; however, the approach lowers specificity because false-positive diagnoses are also more likely. When screening tests are administered sequentially, an initial screening test is performed, and only those individuals with positive test results undergo an additional screening procedure. Generally, sequential testing results in higher specificity than that achievable with a single test because positive results on a series of tests are more likely to represent a true-positive finding. This method, however, also lowers sensitivity.
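
The trade-off between parallel and sequential testing can be sketched numerically. The example below assumes two tests whose errors are conditionally independent given disease status; that independence is a simplifying assumption introduced here, not a claim made in the text, and real tests are often correlated. The function names and input values are hypothetical.

```python
def parallel_testing(sens1, spec1, sens2, spec2):
    """Positive if EITHER test is positive (independence assumed)."""
    sensitivity = 1 - (1 - sens1) * (1 - sens2)  # missed only if both miss
    specificity = spec1 * spec2                  # negative only if both negative
    return sensitivity, specificity


def sequential_testing(sens1, spec1, sens2, spec2):
    """Positive only if BOTH tests are positive (independence assumed)."""
    sensitivity = sens1 * sens2                  # must be caught by both
    specificity = 1 - (1 - spec1) * (1 - spec2)  # false-positive only if both err
    return sensitivity, specificity


# Hypothetical test characteristics.
s1, sp1, s2, sp2 = 0.80, 0.90, 0.85, 0.95
print("parallel:  ", parallel_testing(s1, sp1, s2, sp2))
print("sequential:", sequential_testing(s1, sp1, s2, sp2))
```

Under these assumptions, parallel testing yields a sensitivity higher than either test alone and a lower specificity, whereas sequential testing does the reverse.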

Test Reproducibility
Any test being considered for use in a screening program must have reproducible results. For imaging tests, four important sources of variability can affect the reproducibility of results. The first relates to a biologic variation that might affect the performance of the test (e.g., patient size or cardiac motion). The second relates to the reproducibility of the test itself (e.g., patient positioning or film processing in the acquisition and production of mammograms). Third, intraobserver variability refers to differences in the way the same radiologist interprets a specific screening test at different times. Finally, interobserver variability refers to inconsistencies attributable to differences in the way different radiologists interpret the same screening examination. Interobserver variability is minimized if the interpretation criteria and end points are defined and quantifiable and is greater if the criteria are vague and subjective. Both intra- and interobserver variabilities have been reported [20,21,22] in the interpretation of screening mammograms, description of specific lesions, and recommendations for follow-up examinations, using the American College of Radiology Breast Imaging Reporting and Data System (BI-RADS) [23].

A common but flawed approach to measuring the accuracy of a potential screening test is to extrapolate data on tests performed in populations with symptomatic disease to screening populations [13]. However, measuring accuracy directly in an asymptomatic population requires testing many subjects to identify a group with disease and following up those subjects to ascertain their true disease status. Both positive and negative test results in the subjects should be verified by acceptable methods such as histopathology and clinical or imaging follow-up. With respect to the latter, a follow-up period of sufficient length is critical. If the follow-up period is too short, false-negative cases may be missed; if it is too long, new cases of disease (e.g., "interval cancer") may be inaccurately classified as false-negatives.

Test Safety, Availability, and Cost-Effectiveness
Because screening tests are performed on asymptomatic individuals—most of whom are healthy and do not have preclinical disease—the tests must not be associated with significant morbidity or mortality. Even a minor side effect or adverse consequence to the screened population will likely offset the benefits of screening [12]. Radiation dose and the likelihood that the screening test itself may induce malignancies are frequently considered adverse consequences of screening tests involving imaging [24,25,26]. Other sources of morbidity that affect an individual's decision to undergo or forego screening include the discomfort associated with the test (e.g., compression with screening mammography or bowel preparation for screening barium enema examinations).

The screening test should be accessible to the population for whom it is indicated. Screening cannot be effective if the screening test is available only at large medical centers. Likewise, if the examination is costly, insurers may choose not to provide screening coverage, and patients may be unwilling or unable to pay for the test out of pocket.


Evaluating the Effectiveness of Screening
Evaluations of effectiveness of a screening program should be based on outcomes and measures reflecting the impact of the program on the course of the disease. Here, the critical outcomes of interest are the assurance that the screened and unscreened populations are comparable, the estimates of lead-time and length-time biases, a comparison of cause-specific mortality rates between the screened and unscreened groups, and the measurement of relative and absolute risks.

Comparability of the Screened and Unscreened Groups
In determining the efficacy of a screening test, the screened and unscreened groups must be comparable with regard to all factors affecting the end point under evaluation, with the exception of the screening experience. In this regard, patient recruitment and self-selection bias (volunteer bias) should be taken into account. People who choose to participate in a screening program are likely to differ from those who do not volunteer in several ways that may affect survival [27, 28]. Volunteers tend to have better health and lower mortality rates than the general population and are more likely to adhere to prescribed medical regimens. Consequently, an observational study design comparing mortality rates of screened and unscreened groups is likely to show that those who volunteer to undergo screening have lower mortality rates, regardless of any effect of screening. On the other hand, those who volunteer for screening programs may represent the "worried well," or asymptomatic individuals who are at higher risk of developing disease because of medical or family history or lifestyle factors. Such individuals might have an increased risk of mortality regardless of the efficacy of the screening program. Thus, the direction of potential patient selection bias may be difficult to predict and the magnitude of such events even more difficult to quantify. Randomization schemes are used to overcome self-selection bias in studies evaluating potential screening tests by assigning individuals to screened and unscreened study groups after they agree to participate in the study.

Lead-Time and Length-Time Biases
Showing the benefit of treatment initiated during the preclinical phase of a disease is surprisingly difficult. Two widely recognized problems that arise when the benefits of screening are evaluated by comparing screened to unscreened populations are lead-time bias and length-time bias.

Lead time is the interval between the diagnosis of a disease at screening and the time at which it would have been detected via the onset of clinical symptoms [29]. Lead time, therefore, is the amount of time that the diagnosis was advanced as a result of screening (Figs. 1 and 3). Because screening is applied to asymptomatic individuals, every case of disease detected at screening has had its time of diagnosis advanced. Whether that lead time is a matter of days, months, or years varies by disease, individual, and screening procedure. For a disease that progresses rapidly from the preclinical to the clinical phase, less lead time will be gained from screening than for a disease that develops slowly and has a longer preclinical phase.



 
Fig. 3. Diagram depicts how lead-time bias can result in apparent increase in survival attributable to screening. Shown are hypothetical case histories of two women with breast cancer. Screening appears to be beneficial when, in fact, it only pushed time of diagnosis forward.

 

Lead time also varies with how soon the screening test is performed after the preclinical disease becomes detectable. For screened patients, cause-specific survival is measured as the length of time from disease detection on the screening test to death from the disease. For patients not screened, cause-specific survival is measured as the length of time from clinical diagnosis to death from the disease. For example, Figure 3 illustrates the hypothetical histories of two women with breast cancer. We assumed that the age of both women at the biologic onset of disease was 35 years and that the disease was detectable on screening when the women were 44 years old. One woman (A) was screened at age 47, and her breast cancer was detected at that time. The other woman (B) did not undergo screening mammography; her breast cancer was diagnosed when she was 50 after she discovered a lump in her breast. Both women died at the age of 53. Because woman A survived 3 years longer after detection of breast cancer than woman B, screening appears to be beneficial when in fact it only pushed the time of diagnosis forward. This phenomenon is commonly referred to as lead-time bias [30,31,32,33,34,35,36]. If an estimate of lead time is not taken into account when comparing mortality between screened and unscreened populations, survival will be erroneously overestimated for the screening-detected cases simply because the diagnosis was made earlier in the natural history of the disease. A second way to account for the effect of lead time on the efficacy of a screening program is to compare the age-specific death rates in the screened and unscreened groups rather than the length of survival from diagnosis to death.
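
The arithmetic behind the two hypothetical case histories in Figure 3 can be written out explicitly. The ages below are those given in the text; the function and variable names are illustrative only.

```python
def survival_after_diagnosis(age_at_diagnosis, age_at_death):
    """Years survived, measured from the time of diagnosis."""
    return age_at_death - age_at_diagnosis


# Woman A: detected at screening at age 47. Woman B: diagnosed clinically at 50.
# Both died at age 53.
survival_a = survival_after_diagnosis(age_at_diagnosis=47, age_at_death=53)
survival_b = survival_after_diagnosis(age_at_diagnosis=50, age_at_death=53)

lead_time = 50 - 47                       # years by which diagnosis was advanced
apparent_gain = survival_a - survival_b   # difference in measured survival

print(f"survival after diagnosis: A={survival_a} y, B={survival_b} y")
print(f"lead time={lead_time} y, apparent survival gain={apparent_gain} y")
```

The apparent survival gain equals the lead time exactly, and the age at death is the same for both women, which is why comparing age-specific death rates rather than survival from diagnosis avoids this bias.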

Length-time bias refers to the overrepresentation among screening-detected cases of those diseases with long preclinical phases and thus more favorable prognoses. Diseases with a long preclinical phase are more readily detected on screening tests than are the more rapidly progressing diseases with shorter preclinical phases. An assumption underlying the concept of length-time bias is that diseases with long preclinical phases are more indolent and would have more favorable prognoses, regardless of any effect of the screening program itself. Thus, length-time bias could lead to an erroneous conclusion that screening is beneficial when, in fact, observed differences in mortality rates resulted merely from detection of cases of less rapidly fatal diseases, whereas cases of diseases that are more rapidly fatal were diagnosed after symptoms developed. Length-time bias is difficult to quantify. Its effect is greatest for cases detected at the initial screening; thus, one method of controlling for length-time bias is to compare cases detected at a subsequent screening (i.e., after the initial screening) to those detected clinically (when the patient develops symptoms).
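
Length-time bias can be illustrated with a small simulation. The sketch below is a toy model under several assumptions that are not from the text: detectable preclinical durations are exponentially distributed, a single screening occurs at one fixed time, and the test is perfectly sensitive. All names and parameter values are hypothetical.

```python
import random
from statistics import mean


def simulate_length_time_bias(n_cases=100_000, mean_preclinical_years=4.0,
                              window_years=50.0, seed=0):
    """Compare preclinical durations of all cases vs. screening-detected cases."""
    rng = random.Random(seed)
    screen_time = window_years / 2  # one screening round, mid-window
    all_durations, detected_durations = [], []
    for _ in range(n_cases):
        # Time at which the disease becomes detectable on screening.
        start = rng.uniform(0.0, window_years)
        # Length of the detectable preclinical phase (assumed exponential).
        duration = rng.expovariate(1.0 / mean_preclinical_years)
        all_durations.append(duration)
        # Detected only if screening falls within the preclinical phase.
        if start <= screen_time < start + duration:
            detected_durations.append(duration)
    return mean(all_durations), mean(detected_durations)


mean_all, mean_detected = simulate_length_time_bias()
print(f"mean preclinical phase, all cases:        {mean_all:.1f} y")
print(f"mean preclinical phase, screen-detected:  {mean_detected:.1f} y")
```

In this toy model the screening-detected cases have a markedly longer average preclinical phase than the full set of cases, even though screening had no effect on any individual's course.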

Comparison of Cause-Specific and All-Cause Mortality Rates
The most definitive measure of the efficacy of the screening program is a comparison of the cause-specific mortality rates of those whose disease was diagnosed on screening and those whose diagnosis was made after the development of symptoms. Because the target disease causes only a small proportion of deaths in a screening-eligible population, a statistically precise estimate of differences in mortality rates or a statistically significant effect of screening on all-cause mortality rates can rarely be shown. However, evaluating the all-cause mortality rates may help to ensure that a major harm or benefit is not being missed. First, an all-cause mortality rate is all-inclusive and provides data relevant to the question of whether other risks are somehow changed along the continuum of the application of the screening test, the diagnosis of a disease, and the treatment. Second, an all-cause mortality rate provides an important perspective on the magnitude of benefit from screening. It puts cause-specific mortality reduction in the context of other competing risks and thus permits an estimate of the overall benefit to be reasonably expected by a particular individual who undergoes a screening evaluation [35].

Absolute Risk Versus Relative Risk
The effectiveness of screening can be expressed in terms of the relative risk, which is the ratio of the cause-specific mortality rate in the study group to that in the control group, or in terms of the relative risk reduction, which is 1 minus this ratio. Although calculations of relative risk are valid, they can be misleading because they convey no information about an individual's baseline risk. The absolute risk reduction is increasingly recognized as a more appropriate measure of effectiveness of screening interventions [37]. Absolute risk reduction is expressed as the product of the baseline risk and the relative risk reduction. For example, suppose a screening-eligible individual has a 2% probability of dying of a particular disease over the next 20 years. If the relative risk reduction from screening is 50%, the absolute risk reduction is 1%. Reporting absolute risk reduction is especially appropriate for screening because the overall risk to be averted is usually small. The absolute risk reduction puts the potential benefit in proper perspective so that an individual or his or her health care provider can weigh it against the potential side effects and costs. The reciprocal of the absolute risk reduction is the number of individuals who must be screened to prevent one death or adverse event. In our example, this number is 100, or 1/0.01. The perception of the absolute risk reduction from screening may be significantly affected by the detection of a pseudodisease that, as discussed previously, falsely increases the perceived risk of developing the disease and the perceived effectiveness of earlier treatment.
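
The worked example above can be reproduced in a few lines. This is a minimal sketch using the 2% baseline risk and 50% relative risk reduction from the text; the function names are illustrative.

```python
def absolute_risk_reduction(baseline_risk, relative_risk_reduction):
    """Absolute risk reduction = baseline risk x relative risk reduction."""
    return baseline_risk * relative_risk_reduction


def number_needed_to_screen(arr):
    """Reciprocal of the absolute risk reduction."""
    return 1.0 / arr


baseline_risk = 0.02             # 2% risk of dying of the disease over 20 years
relative_risk_reduction = 0.50   # screening halves cause-specific mortality

arr = absolute_risk_reduction(baseline_risk, relative_risk_reduction)
nns = number_needed_to_screen(arr)
print(f"absolute risk reduction: {arr:.2%}")
print(f"number needed to screen: {nns:.0f}")
```

The same 50% relative risk reduction applied to a baseline risk of 0.2% would give an absolute risk reduction of only 0.1%, or 1,000 individuals screened per death prevented, which shows how the relative figure alone can mislead.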


Study Designs for Evaluation of Screening Tests
Many epidemiologic design strategies are used to evaluate the efficacy of screening tests, including correlational studies, observational studies, and randomized trials. Correlational studies are used to examine trends in disease rates relative to screening frequencies in a population or to compare the relationship between frequencies of screening and disease rates for different populations. Such descriptive studies are useful in suggesting a relationship between screening and a decline in the morbidity or mortality rate. However, correlational studies have inherent limitations. First, because information from such studies concerns populations rather than individuals, it is not possible to establish that those experiencing the decreased mortality rate are in fact the same persons who received the screening tests. Moreover, such studies do not allow control of potential confounding factors, such as socioeconomic status. Finally, the measure of screening frequency used is usually an average value for the population, so identifying the optimal screening strategy for an individual is impossible. Thus, correlational studies can suggest the possibility of a benefit from a screening test, but they cannot test that hypothesis.

Observational analytic studies, both case-control and cohort, are also used to evaluate the efficacy of screening programs. In the case-control design, individuals with and without the disease are compared with respect to their prior exposure to the screening test. As with any case-control study, the definition and selection of the cases and controls are of critical importance to the validity of the findings [38, 39]. In a cohort study, the case-fatality rate of those who chose to be screened is compared with the case-fatality rate among those whose diagnoses were made because of the onset of symptoms. Interpretation of the results of cohort studies requires consideration of the potential effects of the self-selection of participants as well as lead-time and length-time biases [40].
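
The case-control comparison described above is commonly summarized as an odds ratio of prior screening among cases relative to controls. The following is a minimal sketch with hypothetical counts; the function name and all numbers are assumptions made for illustration and are not drawn from the studies cited here.

```python
def odds_ratio(cases_screened, cases_unscreened,
               controls_screened, controls_unscreened):
    """Odds of prior screening among cases divided by the odds among controls."""
    if min(cases_unscreened, controls_screened) <= 0:
        raise ValueError("cells in the denominator must be positive")
    return (cases_screened * controls_unscreened) / (
        cases_unscreened * controls_screened
    )


# Hypothetical 2 x 2 table (counts invented for illustration)
estimate = odds_ratio(
    cases_screened=30, cases_unscreened=70,
    controls_screened=50, controls_unscreened=50,
)
print(f"Odds ratio for prior screening: {estimate:.2f}")
```

An odds ratio below 1 in such a table would be consistent with a protective association, but, as noted above, its validity depends on how cases and controls were defined and selected.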

Because the chief threat to validity is that screened and unscreened cases are not comparable, the optimal assessment of the efficacy of a screening program derives from randomized trials. If the sample size is sufficiently large, the process of randomization controls for any potential confounding variables. Patient self-selection or volunteer bias, a problem when comparing screened and unscreened groups in observational studies, does not influence the validity of randomized trials: after a group of volunteers agrees to participate in the study, individuals who are to undergo screening are chosen at random from the group by the investigators. Adjusting for the average lead time can eliminate lead-time bias in comparisons of survival rates of patients whose disease was detected via screening versus those whose disease was detected clinically or, preferably, in comparisons of the age-specific mortality rates for the screened and the unscreened groups. Trials can also address the potential for length-time bias by comparing the mortality experience of the groups after repeated screenings.
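
The effect of lead time on survival measured from diagnosis can be shown with a deliberately simple sketch. The ages below describe a single hypothetical patient for whom earlier detection does not change the age at death; they are assumptions made for illustration, and a real adjustment would use an estimated average lead time for the screened group rather than a known individual value.

```python
def survival_from_diagnosis(age_at_diagnosis, age_at_death):
    """Years survived after diagnosis."""
    return age_at_death - age_at_diagnosis


# Hypothetical patient whose age at death is unaffected by screening
age_at_death = 65.0
clinical_dx_age = 60.0   # diagnosis after symptoms develop
screen_dx_age = 57.0     # diagnosis by screening, 3 years earlier

clinical_survival = survival_from_diagnosis(clinical_dx_age, age_at_death)
screened_survival = survival_from_diagnosis(screen_dx_age, age_at_death)

# Lead time is the interval by which screening advances the diagnosis
lead_time = clinical_dx_age - screen_dx_age
adjusted_survival = screened_survival - lead_time

print(f"Survival after clinical diagnosis:  {clinical_survival:.0f} years")
print(f"Survival after screen diagnosis:    {screened_survival:.0f} years")
print(f"Screen survival less lead time:     {adjusted_survival:.0f} years")
```

In this sketch the apparent gain in survival is entirely lead time, and it disappears once the lead time is subtracted. Mortality rates computed for the whole screened and unscreened groups do not depend on the date of diagnosis, which is one reason they are the preferred comparison.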

In the United States, few randomized trials have evaluated programs that use imaging to screen for preclinical disease. The Health Insurance Plan Breast Cancer Screening Project [41] was a randomized trial conducted to evaluate whether periodic breast cancer screening with mammography and physical examination would reduce breast cancer mortality rates among women 40 to 64 years old. After 9 years of follow-up, an overall statistically significant reduction in breast cancer mortality was found among women who were offered screening compared with women who were assigned to usual medical care.

Although randomized trials provide the best and most valid data on the efficacy of screening programs, a fair amount of evidence on screening programs has come from nonexperimental study designs. Cost, feasibility, and ethical concerns can make randomized trials controversial. As radiologic screening for disease becomes more common, considerations of new evaluation methodologies to determine costs and benefits may be needed. The challenge for the future is to better identify which screening tests are appropriate for which populations. Emerging quantitative techniques of eliciting patient preferences [42] and of analyzing benefits, harms, and costs over time [43, 44] may help radiology meet this challenge.


APPENDIX 1. Screening for Preclinical Disease: Glossary of Terms




References

  1. Eddy DM. How to think about screening. In: Eddy DM, ed. Common screening tests, 1st ed. Philadelphia: American College of Physicians, 1991:1-21
  2. Eddy DM. Screening for cervical cancer. In: Eddy DM, ed. Common screening tests, 1st ed. Philadelphia: American College of Physicians, 1991:255-285
  3. Eddy DM. Screening for breast cancer. In: Eddy DM, ed. Common screening tests, 1st ed. Philadelphia: American College of Physicians, 1991:229-254
  4. Garber AM, Sox HC Jr, Littenberg B. Screening asymptomatic adults for cardiac risk factors: the serum cholesterol level. Ann Intern Med 1989;110:622-639
  5. Sox HC Jr, Garber AM, Littenberg B. The resting electrocardiogram as a screening test: a clinical analysis. Ann Intern Med 1989;111:489-502
  6. Sox HC Jr, Littenberg B, Garber AM. The role of exercise testing in screening for coronary artery disease. Ann Intern Med 1989;110:456-469
  7. Smith RA, Mettlin CJ, Johnston DK, Eyre H. American Cancer Society guidelines for the early detection of cancer. CA Cancer J Clin 2000;50:34-49
  8. Henschke CI, McCauley DI, Yankelevitz DF, et al. Early lung cancer action project: overall design and findings from baseline screening. Lancet 1999;354:99-105
  9. Kaneko M, Eguchi K, Ohmatsu H, et al. Peripheral lung cancer: screening and detection with low-dose spiral CT versus radiography. Radiology 1996;201:798-802
  10. Winawer SJ, Fletcher RH, Miller L, et al. Colorectal cancer screening: clinical guidelines and rationale. Gastroenterology 1997;112:594-642
  11. Frazier AL, Colditz GA, Fuchs CS, Kuntz KM. Cost-effectiveness of screening for colorectal cancer in the general population. JAMA 2000;284:1954-1961
  12. Black WC, Welch HG. Screening for disease. AJR 1997;168:3-11
  13. Cole P, Morrison AS. Basic issues in population screening for cancer. J Natl Cancer Inst 1980;64:1263-1272
  14. Bassett LW, Lui TH, Giuliano AE, Gold RH. Prevalence of carcinoma in palpable vs. impalpable, mammographically detected lesions. AJR 1991;151:21-24
  15. Morrison AS. Screening in chronic disease. New York: Oxford Univ. Press, 1992:125-127
  16. Page DL, Dupont WD, Rogers LW, Landenberger M. Intraductal carcinoma of the breast: follow-up after biopsy only. Cancer 1982;49:751-758
  17. Rosen PP, Braun DW Jr, Kinne DE. The clinical significance of pre-invasive breast carcinoma. Cancer 1980;46:919-925
  18. Feldman AR, Kessler L, Myers MH, Naughton MD. The prevalence of cancer: estimates based on the Connecticut Tumor Registry. N Engl J Med 1986;315:1394-1397
  19. Montie JE, Wood DP Jr, Pontes E, Boyett JM, Levin HS. Adenocarcinoma of the prostate in cystoprostatectomy specimens removed for bladder cancer. Cancer 1989;63:381-385
  20. Kerlikowske K, Grady D, Barclay J, Frankel SD, Ominsky SH, et al. Variability and accuracy in mammographic interpretation using the American College of Radiology Breast Imaging Reporting and Data System. J Natl Cancer Inst 1998;90:1801-1809
  21. Elmore JG, Wells CK, Lee CH, Howard DH, Feinstein AR. Variability in radiologists' interpretations of mammograms. N Engl J Med 1994;331:1493-1499
  22. Beam CA, Layde PM, Sullivan DC. Variability in the interpretation of screening mammograms by U.S. radiologists: findings from a national sample. Arch Intern Med 1996;156:209-213
  23. American College of Radiology. Breast imaging reporting and data system (BI-RADS), 3rd ed. Reston, VA: American College of Radiology, 1998
  24. Dixon AK, Dendy P. Spiral CT: how much does radiation dose matter? Lancet 1998;352:1082-1083
  25. Faulkner K, Moores BM. Radiation dose and somatic risk from computed tomography. Acta Radiol 1987;28:483-488
  26. Mossman KL. Analysis of risk in computerized tomography and other diagnostic radiology procedures. Comput Radiol 1982;6:251-256
  27. Greenlick MR, Bailey JW, Wild J, Grover J. Characteristics of men most likely to respond to an invitation to be screened. Am J Public Health 1979;69:1011-1016
  28. Wilhelmsen L, Ljungberg S, Wedel H, Werko L. A comparison between participants and non-participants in a primary preventive trial. J Chronic Dis 1976;29:331-339
  29. Sackett DL, Haynes RB, Tugwell P. Clinical epidemiology: a basic science for clinical medicine. Boston: Little Brown, 1985:172-176
  30. Hutchison GB, Shapiro S. Lead time gained by diagnostic screening for breast cancer. J Natl Cancer Inst 1968;41:665-681
  31. Morrison AS. The effects of early treatment, lead time, and length bias on the mortality experienced by cases detected by screening. Int J Epidemiol 1982;11:261-267
  32. Shapiro S, Goldberg JD, Hutchison GB. Lead time in breast cancer detection and implications for periodicity of screening. Am J Epidemiol 1974;100:357-366
  33. Prorok PC. The theory of periodic screening. I. Lead time and proportion detected. Adv Appl Prob 1976;8:127-143
  34. Prorok PC. The theory of periodic screening. II. Doubly bounded recurrence times and mean lead time and detection probability estimation. Adv Appl Prob 1976;8:460-476
  35. Black WC, Welch HG. Advances in diagnostic imaging and overestimation of disease prevalence and the benefits of therapy. N Engl J Med 1993;328:1237-1243
  36. Shwartz M. Estimates of lead time and length time bias in a breast cancer screening program. Cancer 1980;46:844-851
  37. Laupacis A, Sackett DL, Roberts RS. An assessment of clinically useful measures of the consequences of treatment. N Engl J Med 1988;318:1728-1733
  38. Morrison AS. Case definition in case-control studies of the efficacy of screening. Am J Epidemiol 1982;115:6-8
  39. Weiss NS. Control definition in case-control studies of the efficacy of screening and diagnostic testing. Am J Epidemiol 1983;116:457-460
  40. Morrison AS. The effects of early treatment, lead time, and length bias on the mortality experienced by cases detected by screening. Int J Epidemiol 1982;11:261-267
  41. Shapiro S. Evidence on screening for breast cancer from a randomized trial. Cancer 1977;39[suppl 6]:2772-2782
  42. Nease RF, Tsai R, Hynes LH, Littenberg B. Automated utility assessment of global health. Qual Life Res 1996;5:175-182
  43. De Koning HJ, Ineveld BM, van Oortmarssen GJ, et al. Breast cancer screening and cost effectiveness: policy alternatives, quality of life considerations, and the possible impact of uncertain factors. Int J Cancer 1991;49:531-537
  44. Black WC, Welch HG. A Markov model of early diagnosis. Acad Radiol 1996;3[suppl 1]:S10-S12

