Fundamentals of clinical research for radiologists
1 Diagnostic Radiology and Nuclear Medicine, Rm. 2MR21, University of Western Ontario, London Health Sciences Center-University Campus, 339 Windermere Rd., London, Ontario, Canada N6A 5A5.
Received June 13, 2002;
accepted after revision July 9, 2002.
Address correspondence to S. J. Karlik.
Introduction
Discrete Data
A discrete variable is characterized by having only certain values (usually
integers). For example, the number of breast tumors in a patient can only be a
whole number. There are never cases of "2.7 tumors detected on a mammogram"
(although a group of patients might have a mean of 2.7 tumors). Another example
might be, "The study used eight radiographs for archiving the images." Here it
seems obvious that we use only whole radiographs, not 7.5 or 8.5. The
distinction may not always be so apparent: consider the white blood cell (WBC)
count. Because one counts the number of cells per cubic millimeter, the data
(e.g., 33 cells/mm³) look like ratio scale data (ratio scales are discussed in
the next section of this article). Because there are never partial cells, the
data are defined as discrete.
Comparing Ratio and Interval Scales
Ratio scales of measurement have a constant interval size and a true zero
point. If one patient has a 6-cm kidney tumor and a second has a 3-cm tumor,
then we can state that the second tumor is half as large as the first. Ratio
scales also include capacities (mL), volumes (cm³), rates (mL/min),
weights (kg), and lengths of time (min).
Interval scale data are derived from a measurement scale that has a uniform interval but no true zero; an example is the centigrade temperature scale (degrees Celsius). Although the difference between 20°C and 25°C is the same as that between 5°C and 10°C, 50°C cannot be considered twice as hot as 25°C because the zero point is arbitrary. The Kelvin temperature scale, in contrast, is a ratio scale because its zero point is real (absolute zero).
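The temperature argument can be checked with a few lines of arithmetic. The following is a minimal sketch in Python; the helper name celsius_to_kelvin is our own and is used only for illustration.

```python
def celsius_to_kelvin(t_celsius):
    """Convert degrees Celsius to kelvins (0 K is absolute zero)."""
    return t_celsius + 273.15

# Differences are meaningful on the interval (Celsius) scale.
diff_high = 25 - 20
diff_low = 10 - 5

# Ratios are meaningful only on a ratio scale such as Kelvin.
naive_ratio = 50 / 25
true_ratio = celsius_to_kelvin(50) / celsius_to_kelvin(25)

print(diff_high == diff_low)   # the two 5-degree steps are equal
print(naive_ratio)             # 2.0, misleading because 0 degrees C is arbitrary
print(round(true_ratio, 3))    # ratio of the absolute temperatures
```

On the Kelvin scale, 50°C is only about 8% hotter than 25°C, not twice as hot.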
Ordinal Data
This type of data deals with comparisons that are relative rather than
quantitative. Thus, the data consist of an ordering or a ranking of
measurements. When one orders the findings, the scale becomes ordinal,
even if the steps in the order are of different sizes. An example is the Kurtzke
[1] expanded disability scale
(0-10) for the neurologic assessment of patients with multiple sclerosis. In
this widely used scale, a worsening in patient status of one unit from 1
(minor signs) to 2 (elevated thresholds) is dramatically different from a
worsening from 6 (walks with assistance) to 7 (wheelchair bound). A common form
used in radiology is to classify image interpretability as poor, moderate, or
excellent and perhaps to grade these as 1, 2, and 3.
It is also possible to have exactly the same original data portrayed as several different data types. Using an example of examination marks, we can have raw marks of 97, 75, 68, and 51 (discrete data) that can be expressed as the grades A, B, C, and D (ordinal data) or as pass, pass, pass, and fail (nominal data). Although this example appears trivial, this exact type of data reduction is common in radiology, in which a complex data set is reduced to presence or absence to facilitate the common 2 × 2 chi-square analysis of diagnostic accuracy. The problem with data reduction is that it can result in a loss of information.
Continuous data have no discrete divisions between elements apart from those imposed by our measuring technique. Some examples are time, the size of a tumor, and blood pressure. Table 1 lists the time taken for a bolus injection of radiographic contrast material to reach a maximum in the kidney, with a range of 8-28 sec for 30 patients. These data are raw in the sense that they are unadulterated, unmodified, and untransformed. Time is a continuous measurement, which can take any value whatsoever, but the precision of its measurement depends on our measurement tool (a wall clock vs a stopwatch accurate to a millisecond). By the established rules of science, a reported time of 21 sec actually stands for all times from 20.5 sec up to, but not including, 21.5 sec. The next section illustrates a variety of ways of exploring these data.
A preliminary and easy way to look at continuous (raw) data is to use the "stem-and-leaf" plot. Although likely unfamiliar to the radiologist, it is easy to construct without computerized graphing packages and shows the distribution of the data in a rudimentary way. The common "stem" is along the left for each decade (0 for units = 0-9, 1 for teens = 10-19, and 2 for twenties = 20-29), and the individual values are sorted in increasing order in the second column (Table 2). Most values are in the decade from 10-19, and there are no values exceeding 28 sec. This plot style would clearly expose a highly unusual value among a large number of points; for example, a value of 84 sec would appear as an isolated 8 in the stem with a 4 in the leaf. Although a stem-and-leaf plot allows an easy appreciation of a data set, the details of the distribution are missing.
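For readers who do want a computer to build the plot, a stem-and-leaf display can be sketched in a few lines of Python. The times below are hypothetical values chosen for illustration (they are not the data of Table 1), and the function name stem_and_leaf is our own. As a simplification, stems with no leaves are omitted rather than printed as empty rows.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return {stem: leaves}: stem = tens digit, leaves = sorted units digits."""
    rows = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(int(v), 10)
        rows[stem].append(leaf)
    return {stem: "".join(str(leaf) for leaf in leaves)
            for stem, leaves in sorted(rows.items())}

# Hypothetical enhancement times in seconds (NOT the article's Table 1),
# including one deliberately unusual value of 84 sec.
times = [8, 9, 12, 14, 16, 16, 17, 19, 21, 24, 28, 84]

plot = stem_and_leaf(times)
for stem, leaves in plot.items():
    print(f"{stem} | {leaves}")
```

The unusual value stands out as a lone row with stem 8 and leaf 4, well separated from the other stems.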
To obtain a more detailed examination of our example of enhancement time, we created a dot plot that shows the frequency of occurrence of each individual data value (Fig. 1). In a way analogous to the stem-and-leaf plot, the dot plot uses a stem value for each unique time value in the whole set. Then, a single dot is plotted for each occurrence of that value in the data set; in our example, one dot each for 8 and 9 sec and four dots for 16 sec. Although possibly also unfamiliar, this method is another way to picture the raw data and is analogous to a histogram for each time point. In this type of plot, each data point simultaneously shows the actual value, occupies space, and represents one counting unit. Compared with the stem-and-leaf plot, the dot plot permits a more detailed appreciation of the variability in the data and is close to a histogram (albeit one that has been stood on its end).
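A text version of a dot plot can be sketched by counting the occurrences of each unique value. This is a minimal sketch in Python using hypothetical times (not the data of Table 1); the letter "o" stands in for each dot.

```python
from collections import Counter

# Hypothetical enhancement times in seconds (NOT the article's Table 1).
times = [8, 9, 12, 14, 16, 16, 16, 16, 17, 17, 19, 21]

# One counting unit per occurrence of each unique value.
counts = Counter(times)

# One row per unique value, one "o" per occurrence.
rows = [f"{value:>2} | " + "o" * counts[value] for value in sorted(counts)]
print("\n".join(rows))
```

Each row is the stem for one unique time, and the length of the row of dots is the frequency that a histogram bar would show.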
The data set that is organized as a conventional histogram shows the frequency of each data value as a bar (Fig. 2). When the data are scattered or the data intervals are too numerous, it is customary to reduce the number of intervals, remembering that there should be enough intervals or bins to show any relevant pattern. Because the data in Table 1 consist of 30 values, one published rule is to use approximately √n intervals, where n is the total number of values [2]. With an n value of 30 and a √n value of 5.5, five or six intervals are appropriate. If we choose six intervals, then the resulting histogram shows a maximum in the 15- to 17-sec interval (Fig. 3). Although this reduction of the data by decreasing the number of intervals loses some of the details of the exact measurements seen in Figure 2, the essential character of the data is illustrated, in that the maximal enhancement time values are identified in the 12- to 21-sec area in intervals containing 12-14, 15-17, and 18-21 sec. If the transit time variability is important, then you might prefer to choose Figure 2. Conversely, if showing the typical time were your goal (say to choose an optimal imaging time), then the expression of data in Figure 3 would be appropriate. Neither choice is artificial; each emphasizes a different aspect of the data.
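The square-root rule for choosing the number of intervals is simple arithmetic. A minimal sketch in Python follows; the function name sqrt_rule is our own and is used only for illustration.

```python
import math

def sqrt_rule(n_values):
    """Return the two whole numbers of intervals that bracket sqrt(n)."""
    root = math.sqrt(n_values)
    return math.floor(root), math.ceil(root)

n = 30
root = round(math.sqrt(n), 1)   # square root of 30, to one decimal place
low, high = sqrt_rule(n)        # candidate numbers of intervals
print(root, low, high)
```

For n = 30 the rule suggests five or six intervals; the final choice between them remains a matter of judgment about which pattern in the data should be emphasized.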
Having plotted our data and appreciated its distribution, we must determine three primary attributes: the center, the dispersion, and the symmetry of the data distribution.
Measurements of Central Tendency
The most widely used measure of central tendency is the familiar mean, which is calculated by adding all values in the data set and dividing the sum by the number of values. This procedure yields a mean value of 17.2 sec for our time data. The mean is applicable only to ratio or interval scale data.
Another way to look at these data is to make a cumulative frequency diagram. We first convert the frequency histogram (Fig. 3) to a cumulative frequency table (Table 3) and then plot Table 3 as the final cumulative frequency diagram (Fig. 4). The conversion is started by listing the number of occurrences for each interval under the interval values in Table 3. Then we calculate the cumulative frequency for each interval as the frequency of that interval plus the frequencies of all lower intervals. For example, the cumulative frequency for the interval 18-21 sec is the actual frequency (seven occurrences) plus the total frequency in all smaller intervals (n = 18) to yield 25. It is also possible to convert the raw frequency histogram to the cumulative frequency diagram in an entirely analogous way using the individual data values rather than the intervals.
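The running-total calculation can be sketched in Python as follows. Only three numbers here come from the text: the count of 7 in the 18- to 21-sec interval, the total of 18 in the lower intervals, and n = 30. The remaining counts and the boundaries of the first and last two intervals are assumptions made for illustration and are not the values of Table 3.

```python
from itertools import accumulate

# Interval labels: 12-14, 15-17 and 18-21 appear in the text; the others are assumed.
intervals = ["8-11", "12-14", "15-17", "18-21", "22-24", "25-28"]

# Hypothetical frequencies, chosen so that the lower three intervals total 18,
# the 18-21 interval holds 7, and all six intervals total 30.
frequencies = [4, 6, 8, 7, 1, 4]

# Cumulative frequency = frequency of the interval plus all lower intervals.
cumulative = list(accumulate(frequencies))

for label, freq, cum in zip(intervals, frequencies, cumulative):
    print(f"{label:>5} sec  frequency {freq:>2}  cumulative {cum:>2}")
```

The last cumulative value must always equal the total number of observations, which is a convenient check on the table.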
The cumulative frequency diagram provides the investigator with an opportunity to visualize three important measures of the data: the first quartile (q), the median value (M, or the second quartile), and the third quartile (Q). The median is the middle value from the data set. Because there is an even number of observations (n = 30), we take the 15th and 16th values from a list of the data in increasing order (16 and 17 sec here) and take the average, which is 16.5 sec. The median divides the data into two equal parts (by the number of observations); the quartiles divide each of these halves in two, giving four parts in total. The values of q, M, and Q can help to show whether the data are symmetric in the interquartile range, which happens if the M − q and Q − M ranges are approximately equal. This determination of interquartile ranges is our first introduction to measures that characterize the dispersion or spread of the observed data.
Imagine that the histogram illustrated in Figure 3 could be physically weighed instead of occupying some space in a plot. The mean can be thought of conceptually as the point at which the histogram would balance, whereas the median is simply the middle measurement in the data set. The median also expresses less information than the mean because the median is based on the rank of the individual data values (not the actual values). When the data set has many values that are low or high compared with the average, the median is less sensitive to these values and may be a preferable way to describe the central tendency. Thus, the median is insensitive to the data extremes. In our example, this insensitivity can be shown by exchanging the highest value in our set (28 sec) for a larger data point (100 sec). Although the median value would remain the same (16.5 sec), the new mean is 19.6 sec. Thus, the median retains its ability to identify a value more consistent with the spirit of the data compared with the mean, which has been increased by the extreme value.
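The effect of one extreme value on the mean and the median can be sketched in Python. The 30 times below are hypothetical: they are not the data of Table 1, but were chosen so that they reproduce the summary values quoted in the text (range 8-28 sec, mean 17.2 sec, median 16.5 sec).

```python
from statistics import mean, median

# Hypothetical enhancement times in seconds (NOT the article's Table 1),
# constructed to match the quoted mean (17.2) and median (16.5).
times = [8, 9, 10, 11, 12, 12, 13, 13, 14, 14,
         15, 16, 16, 16, 16, 17, 17, 17, 18, 19,
         19, 20, 20, 21, 21, 24, 26, 27, 27, 28]

# Exchange the highest value (28 sec) for an extreme one (100 sec).
with_outlier = sorted(times)
with_outlier[-1] = 100

print(mean(times), median(times))                 # original data
print(mean(with_outlier), median(with_outlier))   # after the exchange
```

The median depends only on the 15th and 16th ranked values, so it does not move; the mean shifts because every value, including the extreme one, contributes to the sum.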
In addition to the quartile divisions mentioned previously, the distribution can also be divided into other parts, such as percentiles (or 100 parts). A representative example of this division is the use of lethal dose 50 (LD50) from pharmacologic studies. The LD50 is actually the dose at which 50% of the experimental animals died, or the 50th percentile of lethal doses, or the median lethal dose. Similarly, q (first quartile) is the 25th percentile and Q (third quartile) is the 75th percentile.
A useful way to depict this type of data is the box-and-whiskers plot (Fig. 5), which is effective in summarizing the properties of a data set. The bottom and top of the box are the 25th and 75th percentiles (which are q and Q in Fig. 4), the line in the box is the median value (M), and the "whiskers" (looking like error bars) extend to the 10th and 90th percentiles.
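The five values needed to draw such a plot can be computed with the Python standard library. This is a sketch under stated assumptions: the 30 times are hypothetical (not the data of Table 1), and statistics.quantiles uses one particular interpolation convention, so other software may report slightly different quartiles and percentiles for the same data.

```python
from statistics import quantiles

# Hypothetical enhancement times in seconds (NOT the article's Table 1).
times = [8, 9, 10, 11, 12, 12, 13, 13, 14, 14,
         15, 16, 16, 16, 16, 17, 17, 17, 18, 19,
         19, 20, 20, 21, 21, 24, 26, 27, 27, 28]

# Quartiles: the box edges (q, Q) and the line inside the box (M).
q1, med, q3 = quantiles(times, n=4)

# Deciles: the first and last cut points are the 10th and 90th percentiles,
# which the article uses for the whisker ends.
deciles = quantiles(times, n=10)
p10, p90 = deciles[0], deciles[-1]

# Symmetry check within the interquartile range: compare M - q with Q - M.
lower_half_width = med - q1
upper_half_width = q3 - med

print(p10, q1, med, q3, p90)
print(lower_half_width, upper_half_width)
```

If the two half-widths are roughly equal, the central half of the data is approximately symmetric about the median.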
The mode is another term used to describe the central tendency of a data set. The mode is defined as the most frequently occurring measurement, which is 16 sec for our enhancement data. It is possible that the data set has more than one mode. Hence, it is possible to see the descriptor "bimodal" for a distribution of data having two modes or two peaks on a plot of the data.
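Finding the mode, or several modes, can be sketched with statistics.multimode from the Python standard library (available in Python 3.8 and later). Both data sets below are hypothetical and serve only as illustrations.

```python
from statistics import multimode

# Hypothetical unimodal data: 16 sec occurs most often.
unimodal = [12, 14, 16, 16, 16, 16, 17, 17, 19, 21]

# Hypothetical bimodal data: 12 and 20 sec are tied for most frequent.
bimodal = [10, 12, 12, 12, 15, 20, 20, 20, 25]

modes_unimodal = multimode(unimodal)   # list with a single mode
modes_bimodal = multimode(bimodal)     # list with two modes

print(modes_unimodal, modes_bimodal)
```

Note that this strict definition counts only exact ties; a histogram with two peaks of unequal height is also commonly described as bimodal.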
If we can assume that the data we collect are normally distributed, the standard deviation (SD) has some useful interpretations. For example, approximately 68% of all observations will lie within ± 1 SD of the mean value, 95% within ± 2 SDs, and 99.7% within ± 3 SDs. Hence, the SD is approximately one sixth of the total data range for a normal distribution.
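These percentages follow from the normal distribution itself and can be verified with the error function in the Python standard library. The function name coverage_within is our own; the sketch assumes an exactly normal distribution.

```python
import math

def coverage_within(k):
    """Fraction of a normal distribution lying within +/- k SDs of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} SD: {100 * coverage_within(k):.1f}%")
```

Because 99.7% of values fall within a span of six SDs, dividing the observed range of a large, normally distributed sample by six gives a rough estimate of the SD.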
The mean and SD of a normally distributed data set tell us about its internal structure or internal proportions. Another term that is often seen is the standard error. The SD of the means of many samples from the same population is called the standard error. The standard error depends on the sample SD, the number of samples, and the proportion of the population in the sample. These three statistical measures (mean, SD, and standard error) are used to determine whether two experimentally determined samples are from different populations. When we compare samples, we are applying a test of significance. "Statistically significant" may not equate to "interesting" or "important."
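The dependence of the standard error on these three quantities can be sketched as follows. The formula is an assumption on our part, because the article does not state one: we use the common estimate SD/√n, multiplied by the finite population correction √(1 − n/N) when the population size N is known. The function name and the numbers are hypothetical.

```python
import math

def standard_error(sd, n, population_size=None):
    """Standard error of the mean from the sample SD and sample size n.

    If population_size (N) is given, apply the finite population
    correction sqrt(1 - n/N); this form is an assumption of the sketch.
    """
    se = sd / math.sqrt(n)
    if population_size is not None:
        se *= math.sqrt(1 - n / population_size)
    return se

# Hypothetical numbers: SD of 4 sec measured in 16 patients.
se_large_population = standard_error(4.0, 16)
se_half_sampled = standard_error(4.0, 16, population_size=32)

print(se_large_population, round(se_half_sampled, 3))
```

The standard error shrinks as the sample grows and shrinks further when the sample makes up a large proportion of the whole population.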
Relationship Between Two Variables
A scatterplot is the first step in examining the relationship between two sets of measures. The correlation coefficient (r) measures how close the relationship between two measurements is to linearity. The extreme values of r are 1 and -1, and the two variables can be positively or negatively correlated. If the two variables show a nonlinear relationship (e.g., parabolic), then r can be close to zero even though a strong relationship exists. The two common calculations of the correlation coefficient are Pearson's product moment correlation for normally distributed data and Spearman's rank correlation for ordinal data.
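The behavior of r for linear and parabolic relationships can be demonstrated with a short Python sketch of Pearson's product moment formula. The function name pearson_r and the data are our own, chosen for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson's product moment correlation coefficient for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

xs = list(range(-5, 6))
linear = [2 * x + 1 for x in xs]     # perfectly linear, rising
inverse = [-3 * x + 4 for x in xs]   # perfectly linear, falling
parabola = [x ** 2 for x in xs]      # perfectly related, but not linearly

r_linear = pearson_r(xs, linear)
r_inverse = pearson_r(xs, inverse)
r_parabola = pearson_r(xs, parabola)

print(r_linear, r_inverse, r_parabola)
```

In the parabola, y is determined exactly by x, yet r is zero because the rising and falling halves of the curve cancel; this is why the scatterplot must be inspected before r is calculated.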
When a correlation coefficient is used, three steps should be adhered to: first, plot the raw data in a scatterplot; second, observe whether a relationship exists between the variables; and third, if the data suggest a linear rather than a curvilinear relationship, then calculate r. The problem with correlation calculations is that correlation can be confused with causality, and caution should be used about such an interpretation. The possibility of an indirect relationship, via a third and unmeasured variable, should be ruled out. It is up to the scientist to prove that such third variables have no effect on the observed correlation. Another caution is that Pearson's correlation coefficient is dependable only when the two compared variables are normally distributed, because an outlier point can dominate the correlation.
In interpreting the strength of a correlation coefficient, we found no common consensus on the scale descriptors. A useful published example of descriptors might be: 0.0-0.2, very weak or negligible; 0.2-0.4, weak or low; 0.4-0.7, moderate; 0.7-0.9, strong, high, or marked; 0.9-1.0, very strong or very high [3].
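The descriptor scale quoted above can be written as a small lookup function in Python. Because the published ranges share their end points (e.g., 0.2 ends one range and starts the next), this sketch assumes that a boundary value belongs to the higher category; that choice, and the function name, are our own.

```python
def describe_correlation(r):
    """Map a correlation coefficient to the descriptor scale quoted in the text.

    Assumption: a value exactly on a boundary (0.2, 0.4, 0.7, 0.9)
    is assigned to the higher category.
    """
    strength = abs(r)  # the sign gives direction, not strength
    if strength > 1:
        raise ValueError("r must lie between -1 and 1")
    if strength < 0.2:
        return "very weak or negligible"
    if strength < 0.4:
        return "weak or low"
    if strength < 0.7:
        return "moderate"
    if strength < 0.9:
        return "strong, high, or marked"
    return "very strong or very high"

for r in (0.864, -0.99, -0.549, 0.078, 0.247):
    print(r, describe_correlation(r))
```

The absolute value is used because a negative r describes an inverse relationship of the same strength.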
Plotting data sets in scatterplots (Fig. 7A-7F) permits us to visually evaluate the data, and we can predict the outcome of an analysis of the correlation coefficients. The data in Figure 7A would have a good correlation, which is supported by a Pearson's test yielding an r value of 0.864 (strong correlation). Figures 7B and 7C are obviously linear and have an r value of 0.99 and an r value of -0.99 (very strong correlation). Figure 7D is somewhat ambiguous. However, r is equal to -0.549 and thus a moderate correlation exists. The data in Figure 7E are clearly related, but because the relationship is nonlinear, r is equal to 0.078. Even Figure 7F has a higher correlation coefficient, an r value of 0.247, than Figure 7E. A look at the correlation values alone for these data sets would suggest that the data in Figure 7E had no relationship, whereas the data have an interesting one that is immediately visible in the scatterplot.
When the scatterplot of the data for two variables suggests that a linear relationship exists, it is tempting to describe the relationship as linear and to quantify it using linear regression. This approach describes a dependent variable (y) in relation to an independent variable (x), which yields the familiar y = mx + b, where m is the slope of the line and b is the y-intercept (the value of y when x = 0). Our hypothetical example shows the plot of raw data, regression line, and 95% confidence limits (Fig. 8). The difference between correlation and regression is that in a correlation, neither variable can be fixed, whereas in regression, one measurement is a variable (y) and depends on the other (x). Often, the value of x is assumed to be fixed, is capable of observation without error, and is normally distributed. Should there be no logical argument to define one variable as dependent and the other as independent, then the solution is to use a calculation of correlation and avoid the concept of dependence altogether. The importance of confidence limits should not be underestimated, either here with regression [4], or elsewhere with statements of sensitivity and specificity [5] or proportions and rates. For example, if we claim no side effects from contrast injections in 20 patients (rate = 0%), the upper 95% confidence limit of the rate of occurrence is actually 19%.
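The slope and intercept of the regression line can be estimated by ordinary least squares, sketched here in Python. The function name fit_line and the data points are hypothetical, and the sketch does not compute the confidence limits shown in Figure 8.

```python
def fit_line(xs, ys):
    """Ordinary least-squares estimates of slope m and intercept b in y = mx + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    m = sxy / sxx              # slope
    b = mean_y - m * mean_x    # the fitted line passes through (mean_x, mean_y)
    return m, b

# Hypothetical data lying exactly on y = 2x + 1.
xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]

m, b = fit_line(xs, ys)
print(m, b)
```

Unlike the correlation coefficient, the fitted line is not symmetric in the two variables: exchanging x and y generally gives a different line, which is why the choice of dependent variable must be justified.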