Fundamentals of Clinical Research
1 Department of Radiology, 2910 Taubman Center, University of Michigan Medical Center, 1500 E. Medical Center Dr., Ann Arbor, MI 48109-0326.
Received April 6, 2001;
accepted after revision April 24, 2001.
Series editors: Craig A. Beam, C. Craig Blackmore, Stephen J. Karlik, and
Caroline Reinhold.
Introduction
Substantial research questions deal with matters of vital relevance to important groups, or populations of individuals. However, important populations are generally large and, because of numerous practicalities (economy, time, and ethics), researchers often find they cannot afford to study all members of interesting populations. The time-honored scientific solution to this problem is to draw a representative subset, or sample, from the population and to base conclusions about the population on conclusions drawn from the sample. Statistical science is then used to assess and manage the uncertainties inherent in this process of scientific inference.
The goal of this article is to review the distinction made by modern scientific thought between population and sample, and to review considerations applicable to the identification and selection of population and sample in clinical radiology research.
Conventional science distinguishes three groups of individuals (Fig. 1). The goal of the series that includes this article is to bring clinical research in radiology more in line with mainstream medical research. Researchers in radiology should therefore adhere to the modern concepts of target population, study population, and sample when designing and writing about their research. Introductory statistical texts serve to codify current concepts in mainstream scientific thinking. The following excerpt, representative of many, is taken from one such widely used text [1].
We must also carefully distinguish between the TARGET POPULATION and the STUDY POPULATION. The target population is the whole group of [individuals] to which we are interested in applying our conclusions. The study population, on the other hand, is the group of [individuals] to which we can legitimately apply our conclusions. Unfortunately the target population is not always readily accessible and we can only study that part of it that is available. If, for example, we are conducting a telephone interview...we do not have access to those individuals without a telephone.
Further on in the same text, the authors identify "sample" [1]:
There are many ways to collect information about the study population. One way is to conduct a complete CENSUS, by collecting data for every [individual] in it.... A more practical approach is to study some fraction, or SAMPLE, of the population.
Before selecting a sample, the investigator first must determine whether a need really exists for the information that will come from the investigation. The question being asked is intimately related to the selection of a sample that can provide the answer, and to the size of the sample needed to answer the question. The sample composition impacts the generalizability of the results to the study population; the composition of the study population impacts further generalization to the target population. The biases that might be introduced in the selection of the sample impact the confidence in the conclusions that can be drawn from a research study. In discussing the sample necessary to answer different questions, examples have been taken from this author's subspecialty of thoracic radiology, particularly the use of CT pulmonary angiography for the diagnosis of acute pulmonary embolism and lung cancer.
If a group of patients in clinical practice meets the same inclusion and exclusion criteria as the sample, then we can apply the conclusions drawn from the sample to these patients from the study population with confidence. The more a patient differs from the sample, the more likely it is that the results from the sample do not apply to this patient.
Can a Disease Be Detected on an Imaging Test, and What Does It Look Like?
For example, in the early to mid 1980s, several groups of researchers reported on CT and pulmonary embolism [8,9,10,11,12,13]. Those articles were case reports and small case series that for the first time documented that pulmonary embolism could be seen on IV contrast-enhanced CT. Although this simple concept might appear obvious to someone looking at the CT technology of today, it was not apparent before that time. The purpose of these reports by several investigators was to confirm the observation and to generate a database of knowledge that could lead to the generation of more complex scientific hypotheses. The early observations did not show the technical limitations of the technique or reveal the parameters necessary to optimize the technique. They did not show the accuracy of CT compared with a known reference standard such as conventional pulmonary angiography, and they did not show the accuracy of CT compared with other diagnostic tests, such as ventilation-perfusion scintigraphy alone or in combination with lower extremity sonography. They did not show whether observers of varying expertise could agree on the diagnosis reproducibly, nor did they evaluate patient preference for one diagnostic test or another. These observations were simply the first step in a series of steps that need to occur before it can be determined whether a new technology has a role in medical practice and, if so, what that role is.
Selection Bias and How to Select an Unbiased Population
Sampling Bias
The best sample is one that has the same characteristics as the study
population to which the investigator wishes the results to be applied. The
choice of a control group might introduce bias. A control group made up of
normal volunteers recruited from a newspaper advertisement or a notice on a
bulletin board is likely to be healthier than disease-free patients being seen
in a medical clinic, which will make a diagnostic test appear more specific
[14]. For example, if the
intent is to investigate the diagnostic accuracy of a test, such as positron
emission tomography, to distinguish between lung cancer and no lung cancer,
the appropriate group to study is all patients with suspected lung cancer, not
patients with lung cancer and healthy volunteers. In actual clinical practice,
the diagnostic test would not be applied to normal healthy volunteers but
instead to patients with, for example, a solitary nodule detected on a chest
radiograph, some of whom will have lung cancer and some of whom will not.
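The effect of the control group on apparent specificity can be sketched with a small calculation. The counts below are hypothetical and chosen only for illustration; they do not come from any study cited here. The sketch assumes that disease-free clinic patients have more benign findings that mimic cancer than healthy volunteers do, and therefore more false-positive results.

```python
def specificity(true_negatives, false_positives):
    """Fraction of disease-free subjects correctly called negative."""
    return true_negatives / (true_negatives + false_positives)

# Hypothetical counts, for illustration only.
# Healthy volunteers: few benign findings that mimic lung cancer.
volunteer_spec = specificity(true_negatives=95, false_positives=5)
# Disease-free clinic patients with a nodule: more benign mimics.
clinic_spec = specificity(true_negatives=75, false_positives=25)

print(f"Specificity with healthy volunteers as controls: {volunteer_spec:.0%}")
print(f"Specificity with clinic patients as controls:    {clinic_spec:.0%}")
```

Under these assumed counts, the same test looks more specific when healthy volunteers serve as controls, although nothing about the test itself has changed.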
No matter what population is studied, it is important to thoroughly describe it. It is equally important to describe the sample. Although age and sex are usually specified, other factors, such as racial mix, inner city versus rural setting, or type of medical center in which the investigation was performed, often are not. Diseases might look different in populations of different ethnic backgrounds, and therefore diagnostic tests might perform differently. Patients referred to a tertiary academic medical center might have more severe disease than patients treated for the same disease in a community hospital. This factor might make a diagnostic test appear to be more sensitive than it is in actual community practice, because more severe disease is generally easier to detect [14]. It is also important to report comorbidities. For example, the accuracy of CT pulmonary angiography for pulmonary embolism might be different in outpatients, who in general are less sick and more likely to be able to hold their breath for a CT examination, than in hospitalized patients, particularly intensive care unit patients, who are more likely to have lung disease. In this example, reporting the frequency of pleural effusions, lung abnormalities, pulmonary function test results, and the percentage of patients who are ventilator-dependent might be crucial to understanding the population studied and how the results could be applied in clinical practice.
Exclusions and Omission of Uninterpretable Results
As important as it is to describe who was studied, it is also important to
describe patients who were excluded from the study or who declined to
participate, because they might be different from the patients actually
studied [15]. Some exclusions
are random: for example, an optical disk on which a CT scan of a patient was
stored is corrupted and the hard-copy images for that case are lost, or a
patient died an unrelated death as a result of an airplane crash. Other
exclusions are not random, and might introduce bias. For example, if patients
with early stage lung cancer manifesting predominantly as a solitary pulmonary
nodule declined to participate in a CT study designed to evaluate lung cancer
staging, the sensitivity of CT staging might be artificially high and the
population studied might be biased to patients with relatively obvious
metastatic disease. On the other hand, if patients with advanced metastatic
lung cancer declined to participate in the study because they felt too sick,
then the sensitivity of CT staging might be artificially low because the
patients with the most obvious disease were not included. For these reasons,
it is important to describe the patients studied as well as the patients who
were not studied, and to compare them to determine whether inherent
differences exist.
Consider the Radiologic Diagnostic Oncology Group lung cancer staging study [2] in which 80 of the 250 eligible patients were excluded from the analysis. The report states that 43 of these patients did not undergo a surgical staging procedure, and "20 of these were considered to have extensive disease on the basis of imaging studies (six of these had T3 or T4 lesions)." Therefore, six (7.5%) of 80 patients excluded had T3 or T4 lesions, compared with 48 of the 170 studied, or 28% [2]. In general, the higher the T level, the more likely that metastatic lymph nodes are present and that these lymph nodes are larger in size and greater in number than for lower level T lesions, and therefore easier to identify. If the sample is skewed toward patients with more severe disease, then the sensitivity might be overestimated. On the other hand, for the other 14 of 20 excluded for extensive disease, it is not stated in the published report what the extensive disease was. It is logical to think it might have been metastatic disease or M1 disease because patients with all levels of nodal or N disease were reported. If this is correct, then 14 (17.5%) of 80 excluded patients had metastatic disease. Because patients with metastatic disease are more likely to have larger lymph nodes than patients without metastatic disease, selecting out more obvious cases of lymph node metastases might artificially reduce the reported sensitivity for lymph node staging compared with a group of all patients with known or suspected lung cancer selected to undergo imaging. So within the same study there are reasons to think that the reported sensitivity of CT and MR imaging for staging the lymph nodes is overestimated and reasons to think that it is underestimated. The more thoroughly the sample and the excluded patients are defined, the easier it is to know whether they are similar or dissimilar and how that might impact these reported measures of test performance.
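The percentages in this comparison follow directly from the counts quoted above. The short Python sketch below reproduces the arithmetic; the variable names are illustrative, and the reading of the 14 unspecified patients as having metastatic disease is the assumption made in the text, not a finding of the published report.

```python
# Counts quoted in the text from the staging study [2].
eligible = 250
excluded = 80
studied = eligible - excluded            # patients analyzed

t3_t4_excluded = 6                       # excluded patients with T3 or T4 lesions
t3_t4_studied = 48                       # studied patients with T3 or T4 lesions
extensive_disease_excluded = 20          # excluded for extensive disease on imaging
# Extensive disease not further specified in the report; the text
# ASSUMES this might have been metastatic (M1) disease.
unspecified_extensive = extensive_disease_excluded - t3_t4_excluded

pct_t3_t4_excluded = 100 * t3_t4_excluded / excluded
pct_t3_t4_studied = 100 * t3_t4_studied / studied
pct_unspecified = 100 * unspecified_extensive / excluded

print(f"Studied patients: {studied}")
print(f"T3/T4 among excluded: {pct_t3_t4_excluded:.1f}%")
print(f"T3/T4 among studied:  {pct_t3_t4_studied:.1f}%")
print(f"Unspecified extensive disease among excluded: {pct_unspecified:.1f}%")
```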
Omitting the results of studies that are technically inadequate and therefore uninterpretable, or including in a study only patients who can cooperate sufficiently to produce a technically optimal diagnostic test can lead to an overestimate of the test's sensitivity. For example, one cause of suboptimal-quality CT pulmonary angiography for acute pulmonary embolism is respiratory motion, because many patients with suspected pulmonary embolism are short of breath. If the sample is selected using clinical and demographic characteristics, and then the examinations of suboptimal quality are excluded from the final analysis, the reported sensitivity will be higher than if these patients were included in the analysis as cases in which no pulmonary embolism was detected on these studies (i.e., as negatives).
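A minimal sketch of this effect, using hypothetical counts that are not taken from any cited study: among 100 patients with pulmonary embolism, 20 scans are assumed to be uninterpretable because of respiratory motion, and emboli are detected on 72 of the remaining 80.

```python
def sensitivity(true_positives, false_negatives):
    """Fraction of patients with disease who have a positive test."""
    return true_positives / (true_positives + false_negatives)

# Hypothetical counts, for illustration only.
patients_with_pe = 100
uninterpretable = 20      # technically inadequate scans
detected = 72             # emboli seen on interpretable scans
missed_on_interpretable = patients_with_pe - uninterpretable - detected

# Uninterpretable scans dropped from the analysis.
sens_excluding = sensitivity(detected, missed_on_interpretable)
# Uninterpretable scans counted as negatives (no embolism detected).
sens_including = sensitivity(detected, missed_on_interpretable + uninterpretable)

print(f"Sensitivity, uninterpretable scans excluded:            {sens_excluding:.0%}")
print(f"Sensitivity, uninterpretable scans counted as negative: {sens_including:.0%}")
```

With these assumed numbers, dropping the inadequate scans raises the reported sensitivity from 72% to 90%.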
Using another CT pulmonary angiography example, Remy-Jardin et al. [16] compared the findings in 20 patients who underwent CT pulmonary angiography studies using 3-mm collimation, pitch of 1.7, and 1.0 sec per rotation with findings in 20 patients who underwent CT pulmonary angiography studies using 2-mm collimation, pitch of 2, and 0.75 sec per rotation. Remy-Jardin et al. stated the purpose of their study was to "analyze the influence of collimation on identification of segmental and subsegmental pulmonary arteries." The frequency of arteries that were sufficiently well seen to be analyzable for emboli was reported for both groups, with statistically significantly more segmental and subsegmental arteries seen with the thinner collimation protocol. When the sample is scrutinized, the scans included in the study had to be "technically acceptable," with strict inspiratory apnea and good or excellent arterial contrast opacification. Patients with prior lung surgery, lung distortion, or parenchymal infiltration on CT were excluded. Thirty-five patients were evaluated for suspected pulmonary embolism, all of whom had negative findings for pulmonary embolism on CT pulmonary angiography; the other five patients (12.5%) were scanned for reasons other than suspected pulmonary embolism. In other words, the CT scans were much closer to ideal than they would be in a consecutive group of patients being scanned for pulmonary embolism, who are commonly short of breath and might have lung parenchymal or pleural abnormalities, or alterations in cardiac function that might reduce the technical adequacy of the study. Although this study of collimation showed that with thinner collimation more small vessels were well seen, it is unclear whether this finding would translate to a more realistic clinical population.
Retrospective Versus Prospective Selection
When looking at pulmonary embolism, the sensitivity of CT pulmonary angiography for small emboli has been questioned, leading investigators to look at the frequency with which isolated subsegmental or smaller pulmonary embolisms occur. Reported percentages have ranged broadly from 4% to 36% [17,18,19,20]. In one study, consecutive patients undergoing conventional angiography were studied, and 30% were found to have emboli in only subsegmental or smaller pulmonary arteries [20]. As the methods stated, these were consecutive patients undergoing pulmonary angiography, not consecutive patients with suspected pulmonary embolism. In fact, Oser et al. [20] stated in the discussion of their publication that
... the vast majority of our patients had intermediate-probability lung scans; thus, the patients with a larger embolic burden, namely, those with high-probability scans, were potentially excluded. This selection bias is difficult to avoid in a retrospective series, as it reflects the hospital referral pattern.
With regard to CT and pulmonary embolism, in order to know the sensitivity of CT pulmonary angiography for pulmonary embolism in the general population of patients presenting with suspected pulmonary embolism, a prospective investigation of all patients with suspected pulmonary embolism is necessary, using a reference standard such as conventional pulmonary angiography. The goal should be to prospectively recruit all patients with suspected pulmonary embolism and have all patients undergo both the test under evaluation, CT pulmonary angiography, and the reference test, conventional angiography. Consider the impact of retrospective selection of the sample on diagnostic accuracy in the following scenarios. If all patients undergoing both CT pulmonary angiography and conventional pulmonary angiography over the previous 2-year period formed the sample, the reasons that patients underwent both tests, and not just CT pulmonary angiography, impact sensitivity. If a large proportion of the conventional angiograms were obtained because of inconclusive findings or a technically poor CT pulmonary angiogram, then the sensitivity of CT pulmonary angiography will appear artificially low compared with sensitivity in the general population. If a normal CT pulmonary angiogram is the predominant reason for obtaining conventional angiograms, the sensitivity of CT pulmonary angiography will again be low. In this case, the frequency of subsegmental emboli found at angiography will also be higher than would be found in the general population of patients with pulmonary embolism because patients with larger and more obvious emboli will not have undergone conventional angiography.
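The second scenario can be sketched numerically. All counts below are hypothetical and serve only to show the direction of the bias: CT is assumed to detect 90 of 100 patients with pulmonary embolism, conventional angiography is assumed to be obtained in every patient with a normal CT but in only 20 of the 90 with a positive CT, and the retrospective sample consists of patients who had both tests.

```python
# Hypothetical population of patients who truly have pulmonary embolism.
pe_patients = 100
ct_positive = 90                          # emboli detected on CT
ct_negative = pe_patients - ct_positive   # emboli missed on CT
true_sensitivity = ct_positive / pe_patients

# Assumed referral pattern: a normal CT is the main reason to go on
# to conventional angiography.
angio_after_positive_ct = 20   # few CT-positive patients get angiography
angio_after_negative_ct = 10   # every CT-negative patient gets angiography

# Retrospective sample: only patients with both tests are analyzed.
retrospective_sensitivity = angio_after_positive_ct / (
    angio_after_positive_ct + angio_after_negative_ct
)

print(f"Sensitivity in all patients with embolism: {true_sensitivity:.0%}")
print(f"Sensitivity in the retrospective sample:   {retrospective_sensitivity:.0%}")
```

Under these assumptions the measured sensitivity falls from 90% to about 67%, solely because of how patients were selected into the sample.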
Which physicians accept and begin to use a new imaging test might also bias the results. For example, if physicians in the emergency department began using CT pulmonary angiography before most of the physicians taking care of inpatients, then the sensitivity of CT pulmonary angiography might be high, but would be biased by the type of patients that are seen in the emergency department, who in general might be healthier, younger, able to hold their breath better, or have less lung disease than hospitalized patients. On the other hand, if critical care medicine physicians accept CT pulmonary angiography earlier for intensive care unit patients, the sensitivity of CT pulmonary angiography might appear low because of the extensive parenchymal consolidation and pleural effusions that are often present in this population of patients who are often ventilator-dependent. In this way, the spectrum of disease or the case mix in the sample impacts the measured accuracy of the diagnostic test in question. This point reinforces the need to thoroughly describe the patient population studied.
Retrospective studies also suffer from recall bias. Suppose an investigator wants to determine the severity of dyspnea in patients with suspected pulmonary embolism, hypothesizing that patients with more severe dyspnea have a higher frequency of pulmonary embolism than patients with lesser degrees of dyspnea or no dyspnea at all. The investigator might be approaching this as a way to evaluate the likelihood of a patient's having pulmonary embolism and thus to triage patients to a diagnostic test within 1 hr versus within 4-6 hr, given the available imaging facilities. If an investigator questions all patients evaluated over the past year for suspected pulmonary embolism about their dyspnea, it is likely that the patients who were diagnosed with pulmonary embolism and hospitalized for treatment will remember their dyspnea more vividly and rate it as more severe than patients not diagnosed with pulmonary embolism who were sent home. This would exaggerate the difference in reported dyspnea in the two groups, compared with what would be seen if all of the patients were asked about dyspnea before undergoing any diagnostic test for pulmonary embolism and would thereby increase the likelihood that the investigator's hypothesis would be proven correct on analysis.
Consecutive Versus Nonconsecutive Selection
Reference Standard
The choice of a reference standard impacts measurements of test accuracy.
In contrast to the ideal scenario for evaluating the accuracy of CT pulmonary
angiography described in the previous section, a methods section might read:
"All patients with pulmonary embolism confirmed at autopsy who had
undergone CT pulmonary angiography formed the sample." In this case, the
sensitivity of CT pulmonary angiography might be higher than in the general
population because patients dying from pulmonary embolism might have larger
emboli than patients not dying from pulmonary embolism.
Another problem is commonly referred to as "workup bias" [21]. Whenever the reference test is selectively applied only to patients with a positive result on the test in question (for example, only patients with a positive CT pulmonary angiogram), the reported sensitivity of CT pulmonary angiography will be artificially high at 100%, whereas the specificity will be artificially low.
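Workup bias can be made concrete with a 2 x 2 table. The counts below are hypothetical; the sketch compares a cohort in which every patient undergoes the reference test with the same cohort when only patients with a positive CT are verified.

```python
def sensitivity(tp, fn):
    """Fraction of diseased patients with a positive test."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """Fraction of disease-free patients with a negative test."""
    return tn / (tn + fp)

# Hypothetical cohort in which every patient gets the reference test.
full = {"tp": 80, "fn": 20, "fp": 30, "tn": 170}

# Workup bias: only CT-positive patients are verified, so no CT-negative
# patient (false negative or true negative) enters the 2 x 2 table.
verified = {"tp": full["tp"], "fn": 0, "fp": full["fp"], "tn": 0}

full_sens = sensitivity(full["tp"], full["fn"])
full_spec = specificity(full["tn"], full["fp"])
biased_sens = sensitivity(verified["tp"], verified["fn"])
biased_spec = specificity(verified["tn"], verified["fp"])

print(f"All patients verified:     sensitivity {full_sens:.0%}, specificity {full_spec:.0%}")
print(f"Only CT-positive verified: sensitivity {biased_sens:.0%}, specificity {biased_spec:.0%}")
```

In the extreme case shown, where no patient with a negative CT is verified, sensitivity is forced to 100% and specificity to 0%, whatever the true performance of the test.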
When a new technology is compared with accepted reference tests or gold standards, the accuracy of the reference test is often called into question [22,23,24,25,26,27]. In the example of CT pulmonary angiography, the validity of conventional pulmonary angiography has been questioned. Several studies have reported poor interobserver agreement as to the presence or absence of emboli in subsegmental pulmonary arteries on conventional angiography. The Prospective Investigation of Pulmonary Embolism Diagnosis (PIOPED) investigators [28] found only 66% agreement among observers for isolated subsegmental emboli, compared with 98% at the lobar level and 90% at the segmental artery level. Similarly, Diffin et al. [17] reported interobserver agreement of only 45% for isolated subsegmental emboli at conventional angiography. If observers cannot agree on the gold standard, how can the new test, CT pulmonary angiography, be compared with it? This problem might lead investigators to look for a new sample population and apply a new gold standard. To do so might require an animal study with autopsy confirmation as the reference standard. For CT pulmonary angiography, Baile et al. [27] did just that. To compare the accuracy of CT pulmonary angiography and conventional angiography, these investigators instilled colored methacrylate beads into the pulmonary artery circulation of pigs, with a methacrylate cast of the pulmonary arteries used as the reference standard. These researchers found no statistically significant difference between CT pulmonary angiography and conventional angiography for the detection of emboli. However, if conventional angiography were used as the reference standard to which 1-mm CT pulmonary angiography was compared, conventional angiography would, by definition as the reference test, be 100% sensitive with a 100% positive predictive value, whereas CT pulmonary angiography would be considered only 76% sensitive with a positive predictive value of only 86%.
If the sensitivity of a test is in question, surrogate measurements might be used to support the value of a negative test, such as patient outcome. For CT pulmonary angiography, most investigators have looked at series of patients gathered retrospectively with negative findings for pulmonary embolism on CT pulmonary angiography, and looked at the incidence of pulmonary embolism over the next 3-12 months. These studies have shown that pulmonary embolism occurs with the same frequency after negative findings on CT pulmonary angiography as after negative findings on conventional angiography [29, 30].
Imaging-Based Selection
It is often convenient to select patients who have undergone an imaging
test, or patients who are going to be sent for imaging, to form a sample. This
is referred to as imaging-based selection. However, patients who undergo
imaging might not be representative of all patients with a specific diagnosis
or symptom. Consider describing the appearance of lung cancer on MR imaging.
Investigators could generate a list of all patients at their facility who
underwent thoracic MR imaging in the past or will be undergoing MR imaging
over the next year, who have a diagnosis of lung cancer. A fairly high
proportion of these patients will likely have masses that abut or invade the
mediastinum. This does not mean that this proportion of all patients
presenting with lung cancer have mediastinal invasion, because the patients
undergoing MR imaging for lung cancer are usually preselected because of a
suspicion of mediastinal invasion on CT, and therefore the high incidence
should not be surprising. To know what the appearance of lung cancer is on MR
imaging or to determine the accuracy with which MR imaging can detect lung
cancer requires that all consecutive patients with a diagnosis of lung cancer
over a specified period of time undergo MR imaging. Although this example
might seem fairly obvious, the literature is full of examples in which this
type of selection bias impacts study results, although the impact on the
results might be less obvious than in the example and not initially
apparent.
Generalizability
Who was studied impacts to whom the results can be applied. If all patients
presenting with suspected pulmonary embolism undergo a diagnostic test, the
results will be different than if only patients with acute right heart failure
and suspected massive pulmonary embolism are studied, or if patients who have
an inconclusive result from another diagnostic test, such as a
ventilation-perfusion scan, are studied. Similarly, how the test
performs on inpatients or intensive care unit patients might be different from
how it performs in outpatients or patients presenting to an emergency
department, who are less likely to have coexisting lung disease or abnormal
chest radiographic findings. In selecting a population to study for an
investigation, it is important to consider to whom the information derived
from that investigation can be applied.
For example, recently the prevalence of isolated subsegmental pulmonary embolism has been debated as part of the question of how accurate CT pulmonary angiography needs to be for the detection of subsegmental pulmonary embolism. If isolated subsegmental pulmonary embolism rarely occurs, then the technology might not need to be accurate for vessels of this size. However, if isolated subsegmental emboli are commonly seen, then the technology might need to be accurate. In one study, isolated subsegmental pulmonary embolism was reported to occur in 36% of patients diagnosed with pulmonary embolism [19]. In another study, isolated subsegmental pulmonary embolism was reported to occur in only 6% of patients diagnosed with pulmonary embolism [18, 31]. Which more realistically represents a population of all patients with suspected pulmonary embolism? The former study was performed to prospectively compare helical CT with pulmonary angiography for the detection of pulmonary embolism in patients with an unresolved clinical and ventilation-perfusion scan diagnosis of pulmonary embolism. Patients with either a normal perfusion scan or a high-probability scan, the two groups for whom no pulmonary embolism and definite pulmonary embolism were diagnosed, and perhaps the easiest patients for CT to evaluate, were not studied with CT. Therefore, it is likely that 36% is an overestimate of the frequency with which isolated subsegmental pulmonary embolism occurs. The latter study was the PIOPED study [18, 28], in which patients with suspected pulmonary embolism were prospectively enrolled at multiple medical centers, and all patients underwent ventilation-perfusion scanning and conventional pulmonary angiography.
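The way in which restricting the sample can inflate the apparent frequency of isolated subsegmental embolism can be sketched with hypothetical counts. The numbers below are invented for illustration and are not the data of either study; they assume that patients with high-probability scans mostly have larger emboli, so that isolated subsegmental emboli are concentrated in the group with an unresolved diagnosis.

```python
# Hypothetical patients with pulmonary embolism, grouped by
# ventilation-perfusion scan result. Counts are for illustration only.
groups = {
    "high_probability": {"pe_patients": 80, "isolated_subsegmental": 2},
    "unresolved": {"pe_patients": 20, "isolated_subsegmental": 6},
}

def subsegmental_fraction(selected):
    """Fraction of PE patients in the selected groups with isolated subsegmental emboli."""
    total = sum(groups[name]["pe_patients"] for name in selected)
    isolated = sum(groups[name]["isolated_subsegmental"] for name in selected)
    return isolated / total

all_patients = subsegmental_fraction(list(groups))
unresolved_only = subsegmental_fraction(["unresolved"])

print(f"All patients with embolism:       {all_patients:.0%}")
print(f"Unresolved-diagnosis sample only: {unresolved_only:.0%}")
```

With these assumed counts, the frequency is 8% among all patients with embolism but 30% when only the unresolved group is sampled.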
The results described by Goodman et al. [19] can be generalized only to patients with an unresolved clinical suspicion for pulmonary embolism after ventilation-perfusion scanning who underwent CT, as the title of that investigation states clearly. The results can also be generalized only to patients undergoing CT with the technique that was reported (5-mm collimation, pitch of 1:1, covering 12 cm of the thorax, and viewed on hard-copy film). Imaging technology rapidly evolves. Several researchers after Goodman et al. have reported on CT pulmonary angiography at 3-mm collimation [32,33,34]. It is now possible to perform multidetector CT pulmonary angiography of the entire thorax using 1.25-mm collimation, and interpretation on workstations has been shown to improve detection of pulmonary embolism compared with film-based interpretation [35]. However, the published literature lags behind what the technology of today is capable of. As investigators plan to study a new technology, they should consider ways to recruit a larger number of patients more quickly to answer the question they propose before the technology is outdated [36].
Several studies have reported the findings of pulmonary embolism detected incidentally on CT scans obtained for other reasons [13, 35, 37,38,39]. It would be incorrect to draw a conclusion that the anatomic distribution of pulmonary emboli in these patients is the same as in a population of patients presenting with clinical signs or symptoms of pulmonary embolism. In one series of nine patients, no incidentally detected emboli were seen beyond the segmental arteries [39]. This result does not mean that subsegmental pulmonary embolism does not occur as an incidental finding. The CT scans in this study might have been done with protocols used for general thoracic CT, rather than using a thin-section, rapid IV contrast injection protocol CT, or the researchers may not have used a workstation for interpretation, both factors that improve the accuracy of CT pulmonary angiography for pulmonary embolism, particularly for small arteries.