AJR 2003; 180:47-54
© American Roentgen Ray Society
Fundamentals of clinical research for radiologists
Exploring and Summarizing Radiologic Data
Stephen J. Karlik1
1 Diagnostic Radiology and Nuclear Medicine, Rm. 2MR21, University of Western
Ontario, London Health Sciences Center-University Campus, 339 Windermere Rd.,
London, Ontario, Canada N6A 5A5.
Received June 13, 2002;
accepted after revision July 9, 2002.
Address correspondence to S. J. Karlik.
Series editors: Craig A. Beam, C. Craig Blackmore, Stephen J. Karlik, and
Caroline Reinhold.
This is the eighth in the series designed by the American College of
Radiology (ACR), the Canadian Association of Radiologists, and the
American Journal of Roentgenology. The series, which will ultimately
comprise 22 articles, is designed to progressively educate radiologists in the
methodologies of rigorous clinical research, from the most basic principles to
a level of considerable sophistication. The articles are intended to
complement interactive software that permits the user to work with what he or
she has learned, which is available on the ACR Web site
(www.acr.org).
Project Coordinator: Bruce J. Hillman, Chair, ACR Commission on Research
and Technology Assessment.
Introduction
In this series, we have been learning about the use of statistics to plan,
execute, and analyze our research. This module is designed to help define and
categorize data into conventional measures for display and analysis. Display,
or visualization, of the data is an important concept and one that is at the
root of our understanding of various types of data. Before addressing which
types of graphs, presentations, or analyses are useful and appropriate, we
need to define exactly what type of data to analyze. In our studies, we
collect data on different variables, which can be divided into two primary
types: quantity and category. The quantity types are continuous
(measuring) and discrete (counting), and the category types are nominal
(named) and ordinal (ordered). The following section defines and gives
examples of each.
Quantitative Variables
Continuous Data
Continuous data are probably the least frequently reported in the radiology
literature because our work has traditionally been one of dichotomous
interpretation: either an imaging study successfully distinguishes an abnormal
from a normal finding or it does not. Continuous data arise when the quantity
of interest can take any conceivable value within a quantifiable range. The
degree of precision is based on the
technology used for its measurement. Some examples are blood pressure (mm Hg),
size of a tumor (cm), serum cholesterol (µg/mL), length of an MR imaging
sequence (sec), and amount of contrast material (mL). Each of these variables
can have a wide range of values whose precision of measurement can vary
significantly. Another way to think about continuous data is that a possible
value between two other values always exists. An example would be a patient
with a systolic blood pressure of 111.5 mm Hg, which lies between the
pressures of two other patients, 111.2 and 111.9 mm Hg.
Discrete Data
A discrete variable is characterized by having only certain values (usually
integers). For example, a patient can have only a whole integer representing
the number of breast tumors. There are never cases of "2.7 tumors
detected on a mammogram" (although a group of patients might have a mean
of 2.7 tumors). Another example might be, "The study used eight
radiographs for archiving the images for a study." In the previous
example, it seems obvious that we use only a whole radiograph, not 7.5 or 8.5.
The distinction may not always be so apparent: consider the WBC count. Because
one counts the number of cells per cubic millimeter, the data (e.g., 33
cells/mm³) look like a ratio scale (which is discussed in the next
section of this article). Because there are never partial cells, however, the
data are defined as discrete.
Comparing Ratio and Interval Scales
Ratio scales of measurement have a constant interval size and a true zero
point. If one patient has a 6-cm kidney tumor and a second has a 3-cm tumor,
then we can state that the second tumor is half as large as the first. Ratio
scales also include capacities (mL), volumes (cm³), rates (mL/min),
weights (kg), and lengths of time (min).
Interval scale data are derived from a measurement scale that possesses a
uniform interval but no true zero; an example is the centigrade temperature
scale (degrees Celsius). Although the difference between 20°C and 25°C is the
same as that between 5°C and 10°C, 50°C cannot be considered twice as hot as
25°C because the zero point is arbitrary. By contrast, the kelvin temperature
scale is a ratio scale because its zero value, absolute zero, is real.
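The difference between the two scales can be checked numerically. The following is a minimal sketch; the function name `celsius_to_kelvin` is chosen here for illustration and does not come from the article. It shows that dividing two Celsius readings gives a misleading "ratio," whereas the same temperatures in kelvin differ by only about 8%.

```python
# Illustrative sketch: ratios are meaningful on a ratio scale (kelvin)
# but not on an interval scale (degrees Celsius).

KELVIN_OFFSET = 273.15  # 0 degrees Celsius expressed in kelvin


def celsius_to_kelvin(temp_c: float) -> float:
    """Convert a temperature from degrees Celsius to kelvin."""
    return temp_c + KELVIN_OFFSET


# Naive ratio of the Celsius numbers (not physically meaningful).
naive_ratio = 50.0 / 25.0

# Ratio of the same two temperatures on the kelvin scale.
true_ratio = celsius_to_kelvin(50.0) / celsius_to_kelvin(25.0)

print(f"Ratio of Celsius readings: {naive_ratio:.2f}")
print(f"Ratio of kelvin readings:  {true_ratio:.3f}")
```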
Categoric Variables
Nominal Data
Nominal variables often describe characteristics, such as male and female,
and are commonly used in radiologic studies. Nominal scales name the values of
the nominal variable. For example, a breast tumor type could be classified as
benign, malignant, or containing calcifications.
Ordinal Data
This type of data deals with comparisons that are relative, rather than
quantitative. Thus, the data consist of an ordering or a ranking of
measurements. When one orders the findings, the scale becomes ordinal,
even if the steps in the order are unequal in size. An example is the Kurtzke
[1] expanded disability scale
(0-10) for the neurologic assessment of patients with multiple sclerosis. In
this widely used scale, a worsening in the patient status of one unit from 1
(minor signs) to 2 (elevated thresholds) is dramatically different from 6
(walks with assistance) to 7 (wheelchair bound). A common form used in
radiology is to classify image interpretability as poor, moderate, or
excellent and perhaps grade as 1, 2, and 3.
It is also possible to have exactly the same original data portrayed in
several different data types. Using an example of examination marks, we can
have raw marks of 97, 75, 68, and 51 (discrete data) that can be expressed as
the grades A, B, C, and D (ordinal data) or pass, pass, pass, and fail
(nominal data). Although this latter example appears trivial, this exact type
of data reduction is common in radiology, in which a complex data set is
reduced to presence or absence to facilitate the common 2 × 2 chi-square
analysis of diagnostic accuracy. The problem with data reduction is that it
can result in a loss of information.
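The examination-mark example can be written out as a short sketch of data reduction. The grade boundaries and the pass mark below are assumptions made only for illustration (the article does not state them); they are chosen so that the four marks map to the grades and outcomes given in the text.

```python
# Illustrative sketch of data reduction: discrete -> ordinal -> nominal.
# The grade boundaries (80/70/60) and the pass mark (60) are hypothetical.


def to_grade(mark: int) -> str:
    """Reduce a raw mark (discrete) to a letter grade (ordinal)."""
    if mark >= 80:
        return "A"
    if mark >= 70:
        return "B"
    if mark >= 60:
        return "C"
    return "D"


def to_outcome(mark: int, pass_mark: int = 60) -> str:
    """Reduce a raw mark to pass or fail (nominal)."""
    return "pass" if mark >= pass_mark else "fail"


marks = [97, 75, 68, 51]
grades = [to_grade(m) for m in marks]
outcomes = [to_outcome(m) for m in marks]

print(grades)    # letter grades, in the same order as the marks
print(outcomes)  # pass or fail, in the same order as the marks
```

Each step discards information: the grades no longer distinguish 97 from 81, and the outcomes no longer distinguish an A from a C.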
Plotting Methods
Let us take the different types of measurement in turn and examine how to
explore, summarize, and present each type.
Continuous data have no discrete divisions between elements apart from
those imposed by our measuring technique. Some examples are time, the size of
a tumor, and blood pressure. Table
1 lists the time taken for a bolus injection of radiographic
contrast material to reach a maximum in the kidney with a range of 8-28 sec
for 30 patients. These data are raw in the sense that they are unadulterated,
unmodified, and untransformed. Time is a continuous measurement, which can
take any value whatsoever, but the precision of its measurement is dependent
on our measurement tool (wall clock vs stop watch, accurate to a millisecond).
By the established rules of science, a reported time of 21 sec represents all
times from 20.5 sec up to, but not including, 21.5 sec. The next section illustrates
a variety of ways of exploring these data.
A preliminary and easy way to look at continuous (raw) data is to use the
"stem-and-leaf" plot. Although likely unfamiliar to the
radiologist, it is easy to construct without computerized graphing packages
and shows the distribution of the data in a rudimentary way. The common
"stem" is along the left for each decade (0 for units = 0-9, 1 for
teens = 10-19, and 2 for twenties = 20-29), and the different values are
sorted by increasing values in the second column
(Table 2). Most values are in
the decade from 10-19, and there are no values exceeding 28 sec. This plot
style would clearly identify a highly unusual value among a large number of
points; for example, a value of 84 sec would appear as an 8 in the stem and a
4 in the leaf. Although a stem-and-leaf plot allows an easy appreciation
of a data set, the details of the distribution are missing.
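A stem-and-leaf plot is simple enough to build in a few lines. The sketch below is one possible implementation, assuming whole-number values; the `times` list is hypothetical and is not the data of Table 1. An outlying 84 sec is included to show how it would stand apart.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Group whole-number values by tens digit (stem); units are the leaves."""
    plot = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        plot[stem].append(leaf)
    return dict(plot)


# Hypothetical enhancement times in seconds (NOT the article's Table 1).
times = [8, 9, 12, 14, 16, 16, 17, 19, 21, 28, 84]

for stem, leaves in sorted(stem_and_leaf(times).items()):
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
```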
To obtain a more detailed examination of our example of enhancement time,
we created a dot plot that shows the frequency of occurrence of any individual
data values (Fig. 1). In a way
analogous to the stem-and-leaf plot, the dot plot uses a stem value for each
unique time value in the whole set. Then, a single dot is plotted for each
occurrence of that value in the data set; in our example, one dot each for 8
and 9 sec and four dots for 16 sec. Although possibly also unfamiliar, this method
is another way to picture the raw data and is analogous to a histogram for
each time point. In this type of plot, each data point simultaneously shows
the actual value, occupies space, and represents one counting unit. Compared
with the stem-and-leaf visual, the dot plot permits a more detailed
appreciation of the variability in the data and is close to a histogram
(albeit one that has been stood on its end).
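The dot plot described above can be sketched as text output, with one row per unique value and one dot per occurrence. The function name `dot_plot` and the sample values are illustrative assumptions, not the article's data.

```python
from collections import Counter


def dot_plot(values):
    """Return one text row per unique value, with one dot per occurrence."""
    counts = Counter(values)
    return [f"{value:>3} {'.' * counts[value]}" for value in sorted(counts)]


# Hypothetical values: one observation each at 8 and 9 sec, four at 16 sec.
for row in dot_plot([16, 8, 16, 9, 16, 16]):
    print(row)
```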
The data set that is organized as a conventional histogram shows the
frequency of each data value as a bar (Fig.
2). When the data are scattered or the data intervals are too
numerous, it is customary to reduce the number of intervals, remembering that
there should be enough intervals or bins to show any relevant pattern. Because
the data in Table 1 consist of
30 values, one published rule is to use approximately $\sqrt{n}$ intervals,
where n is the total number of values [2]. With an n value of 30 and a
$\sqrt{n}$ value of 5.5, five or six intervals are
appropriate. If we choose six intervals, then the resulting histogram shows a
maximum in the 15- to 17-sec interval (Fig.
3). Although this reduction of the data by decreasing the number
of intervals loses some of the details of the exact measurements seen in
Figure 2, the essential
character of the data is illustrated, in that the maximal enhancement time
values are identified in the 12- to 21-sec area in intervals containing 12-14,
15-17, and 18-21 sec. If the transit time variability is important, then you
might prefer to choose Figure
2. Conversely, if showing the typical time were your goal (say to
choose an optimal imaging time), then the expression of data in
Figure 3 would be appropriate.
Neither choice is artificial; each emphasizes a different aspect of the
data.
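The square-root rule and the binning step can be sketched as follows. This is an assumption-laden illustration: `bin_counts` uses equal-width intervals across the observed range, which is a simplification and not necessarily how the intervals in Figure 3 were drawn, and the sample values in the usage lines are hypothetical.

```python
import math


def suggested_interval_counts(n):
    """Square-root rule: return the whole numbers on either side of sqrt(n)."""
    root = math.sqrt(n)
    return math.floor(root), math.ceil(root)


def bin_counts(values, n_bins):
    """Count values in n_bins equal-width intervals spanning min..max."""
    low, high = min(values), max(values)
    width = (high - low) / n_bins
    counts = [0] * n_bins
    if width == 0:  # all values identical: everything falls in the first bin
        counts[0] = len(values)
        return counts
    for value in values:
        # The maximum value is placed in the last interval.
        index = min(int((value - low) / width), n_bins - 1)
        counts[index] += 1
    return counts


print(suggested_interval_counts(30))
print(bin_counts([8, 10, 12, 16, 20, 28], 2))  # hypothetical values
```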
Having plotted our data and appreciated its distribution, we must determine
three primary attributes: the center, the dispersion, and the symmetry of the
data distribution.
Measurements of Central Tendency
The central tendency is the tendency of the observations to accumulate at a
particular value or in a particular category. The three ways of describing
this phenomenon are mean, median, and mode.
The most widely used measure of central tendency is the familiar mean, which
is calculated simply by adding all values in the data set and dividing the
sum by the number of values. This procedure yields a mean
value of 17.2 sec for our time data. The mean is only applicable to ratio or
interval scale data.
Another way to look at these data is to make a cumulative frequency diagram.
We first convert the frequency histogram
(Fig. 3) to a cumulative
frequency table (Table 3) and
then plot Table 3 as the final
cumulative frequency diagram (Fig.
4). The conversion is started by listing the number of occurrences
for each interval under the interval values in
Table 3. Then we calculate the
cumulative frequency for each interval as the total frequency of that
interval, plus the frequency of all lower intervals. For example, the
cumulative frequency for the interval 18-21 sec is the actual frequency (seven
occurrences) plus the total frequency in all smaller intervals (n =
18) to yield 25. It is also possible to convert the raw frequency histogram to
the cumulative frequency diagram in an entirely analogous way using the
individual data values rather than the intervals.
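The running-total step can be sketched with `itertools.accumulate`. Only two numbers below come from the text: the 18-21 sec interval has seven occurrences, and the lower intervals together hold 18. The remaining frequencies and the outer interval labels are hypothetical fill-ins chosen to total 30, not the contents of Table 3.

```python
from itertools import accumulate

# Interval labels: 12-14, 15-17, and 18-21 appear in the text; the others
# are assumed for illustration.
intervals = ["8-11", "12-14", "15-17", "18-21", "22-24", "25-28"]

# Hypothetical frequencies, except that the first three sum to 18 and the
# fourth is 7, as stated in the text.
frequencies = [3, 6, 9, 7, 3, 2]

# Each cumulative value is the interval's own frequency plus all lower ones.
cumulative = list(accumulate(frequencies))

for label, freq, cum in zip(intervals, frequencies, cumulative):
    print(f"{label:>6} sec  frequency={freq:2d}  cumulative={cum:2d}")
```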
The cumulative frequency diagram provides the investigator with an
opportunity to visualize three important measures of the data: the first
quartile (q), the median value (M, or the second quartile), and the third
quartile (Q). The median is the middle value from the data set. Because there
is an even number of observations (n = 30), we take the 15th and 16th
values from a list of the data with increasing values (16 and 17 sec here) and
take the average, which is 16.5 sec. The median divides the data into two
equal parts (by the number of observations); the quartiles divide each of
these halves in two, giving four parts in total. The values of q, M, and Q can
help to show whether the data are symmetric in the interquartile range, which
happens if the M - q and Q - M ranges are approximately equal. This
determination of interquartile ranges is our first introduction to measures
that characterize the dispersion or spread of the observed data.
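The quartiles and the symmetry check can be sketched with Python's standard `statistics` module. The nine values below are hypothetical, and the `method="inclusive"` convention is an assumption; statistical packages differ in how they interpolate quartiles, so results may not match other software exactly.

```python
import statistics


def quartile_summary(values):
    """Return q, M, Q and the two half-ranges used to judge symmetry."""
    q, m, big_q = statistics.quantiles(values, n=4, method="inclusive")
    return {"q": q, "M": m, "Q": big_q, "M-q": m - q, "Q-M": big_q - m}


# Hypothetical times in seconds (NOT the article's Table 1).
times = [8, 12, 14, 15, 16, 17, 19, 22, 28]
summary = quartile_summary(times)
print(summary)
```

If the `M-q` and `Q-M` entries are roughly equal, the middle half of the data is approximately symmetric about the median.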
Imagine that the histogram illustrated in
Figure 3 could be physically
weighed instead of occupying some space in a plot. The mean can be
thought of as the point at which the weighed histogram would balance (its
center of gravity), whereas the median is simply the middle measurement in
the data set.
The median also expresses less information than the mean because the median is
based on the rank of the individual data values (not the actual values). When
the data set has values that are very low or very high compared with the
average, the median is less sensitive to these values and may be a preferable
way to describe the central tendency. Thus, the median is insensitive to the
data extremes. In our example, this insensitivity could be seen if we replaced
the highest value in our set (28 sec) with a larger data point (100 sec). Although
the median value would remain the same (16.5 sec), the new mean is 19.6 sec.
Thus, the median retains its ability to identify a value more consistent with
the spirit of the data compared with the mean, which has been increased by the
extreme value.
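The change in the mean quoted above follows directly from the definition of the mean: replacing 28 sec with 100 sec adds 72 sec to the sum, which is then spread over the n = 30 observations. A short worked check, using only figures given in the text (the symbol $\bar{x}$ for the mean is notation introduced here):

```latex
\bar{x}_{\text{new}}
  = \bar{x}_{\text{old}} + \frac{100 - 28}{n}
  = 17.2 + \frac{72}{30}
  = 17.2 + 2.4
  = 19.6\ \text{sec}
```

The median is unchanged because the 15th and 16th ordered values (16 and 17 sec) are not affected by the size of the largest value.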
In addition to the quartile divisions mentioned previously, the
distribution can also be divided into other parts, such as percentiles (or 100
parts). A representative example of this division is the use of lethal dose 50
(LD50) from pharmacologic studies. The LD50 is actually the dose at which 50%
of the experimental animals died, or the 50th percentile of lethal doses, or
the median lethal dose. Similarly, q (first quartile) is the 25th percentile
and Q (third quartile) is the 75th percentile.
A useful way to depict this type of data is the box-and-whiskers plot
(Fig. 5), which is effective in
summarizing the properties of a data set. The bottom and top of the box are
the 25th and 75th percentiles (which are q and Q in
Fig. 4), the line in the box is
the median value (M), and the "whiskers" (looking like error bars)
extend to the 10th and 90th percentiles.
The mode is another term used to describe the central tendency of a data
set. The mode is defined as the most frequently occurring measurement, which
is 16 sec for our enhancement data. It is possible that the data set has more
than one mode. Hence, it is possible to see the descriptor
"bimodal" for a distribution of data having two modes or two peaks
on a plot of the data.
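Finding the mode, and detecting more than one, can be sketched with `statistics.multimode`. Note the assumption built into this sketch: it labels a data set "bimodal" only when two values tie exactly for the highest frequency, which is stricter than the looser sense of "two peaks on a plot." The sample values are hypothetical.

```python
from statistics import multimode


def describe_modes(values):
    """Return the most frequent value(s) and a simple descriptive label."""
    modes = sorted(multimode(values))
    labels = {1: "unimodal", 2: "bimodal"}
    return modes, labels.get(len(modes), "multimodal")


print(describe_modes([12, 16, 16, 16, 21]))  # a single most frequent value
print(describe_modes([12, 12, 16, 21, 21]))  # two values tie for most frequent
```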
Measurements of Dispersion
As seen in Figure 2, our
enhancement maxima do not all occur at the same time and are spread over a
substantial range (8-28 sec). We can exactly express this dispersion or
nonuniformity in the data. The most commonly used measure of dispersion for a
single sample of continuous data is the SD, and, like the mean, the SD takes
all the data into account. The SD is a statistical measure that expresses the
average amount by which all data values in the set deviate from the mean
value: the smaller the differences, the smaller the deviations, and the
smaller the SD (and vice versa). For our data set, the mean is 17.2 sec with
an SD of 4.7 sec.
If we can assume that the data we collect are normally distributed, the SD
has some useful interpretations. For example, 68% of all observations will lie
within ± 1 SD of the mean value. Ninety-five percent of the data lies
within ± 2 SDs, and 99.7% lies within ± 3 SDs of the mean.
Hence, the SD is approximately one sixth of the total data range for a normal
distribution.
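The percentages quoted for the normal distribution can be reproduced from the standard normal cumulative distribution function. This minimal sketch uses `statistics.NormalDist` from the Python standard library; the function name `coverage_within` is chosen here for illustration.

```python
from statistics import NormalDist


def coverage_within(k_sd):
    """Fraction of a normal distribution lying within +/- k_sd SDs of the mean."""
    standard_normal = NormalDist(mu=0.0, sigma=1.0)
    return standard_normal.cdf(k_sd) - standard_normal.cdf(-k_sd)


for k in (1, 2, 3):
    print(f"within +/- {k} SD: {coverage_within(k):.1%}")
```

These fractions hold only under the normality assumption stated in the text; for skewed data they can be noticeably off.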
The mean and SD of a normally distributed data set tell us about the
internal structure or internal proportions. Another term that is often seen is
the standard error. The SD of the means of many samples from the same
population is called the standard error. The standard error depends on the
sample SD, the number of samples, and the proportion of the population in the
sample. These three statistical measures (mean, SD, and standard error) are
used to determine whether two experimentally determined
samples are from different populations. When we compare samples, we are
applying a test of significance. "Statistically significant" may
not equate to "interesting" or "important."
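The dependence of the standard error on the SD, the sample size, and the sampled fraction of the population can be sketched as below. The formula SD divided by the square root of n is the usual estimate; the optional finite population correction and the population size of 300 in the usage lines are assumptions added for illustration, not values from the article.

```python
import math


def standard_error(sd, n, population_size=None):
    """Standard error of the mean: SD / sqrt(n).

    If population_size is given, apply the finite population correction,
    which shrinks the standard error as the sample covers more of the
    population.
    """
    se = sd / math.sqrt(n)
    if population_size is not None:
        se *= math.sqrt((population_size - n) / (population_size - 1))
    return se


# SD and n as reported for the enhancement-time data in the text.
print(f"{standard_error(4.7, 30):.3f} sec")

# Same sample drawn from a hypothetical population of 300 patients.
print(f"{standard_error(4.7, 30, population_size=300):.3f} sec")
```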
Ordinal Data
Tables are effective for the presentation of ordinal data.
Table 4 illustrates an example
of the reporting of vessel conspicuity for different visualization techniques:
digital subtraction angiography, contrast-enhanced time-of-flight MR
angiography, three-dimensional time-of-flight MR angiography, and dynamic MR
angiography. The ordinal scale is partial visibility to excellent visibility
in four steps represented in the table by "+" to "+++"
in an intuitively obvious way.
Proportions and Rates
Proportions and rates are descriptive parameters for a population that can
be estimated from a sample. Rate is the occurrence of a particular event in a
sample and is given as a percentage. Table
5 shows an example in which the number of events (and rate as a
percentage) is listed for four possible categories of neurologic outcome
resulting from carotid artery stenting. "Proportion" is a
descriptor that is applicable to categoric data. A stacked bar chart permits
visualization of the proportions of three measures in three different patient
groups (Fig. 6). In radiology,
we frequently use a common statistical test (chi-square) to determine whether
the rate or proportion of observations is different in two or more
populations.
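The chi-square statistic mentioned above compares observed counts with the counts expected if the proportions were the same in every group. The sketch below computes only the statistic and its degrees of freedom, without a continuity correction; the 2 × 2 counts are hypothetical, and converting the statistic to a P value requires a chi-square table or a statistics package.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for a count table.

    `table` is a list of rows of observed counts. No continuity correction
    is applied.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected

    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


# Hypothetical counts: rows are two patient groups, columns are two outcomes.
stat, dof = chi_square_statistic([[10, 20], [20, 10]])
print(f"chi-square = {stat:.2f} with {dof} degree(s) of freedom")
```

For one degree of freedom, a statistic above 3.84 corresponds to P < 0.05.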

Fig. 6. Bar chart shows proportion of patients in three treatment
groups who were found with no change in size of prostate (black bar),
enlargement (white bar), or decrease in size of prostate (gray
bar). Note proportion of patients in each classification in each of three
differently sized groups.
Relationship Between Two Variables
At times, we take two simultaneous measurements of our study population for
the purpose of determining whether a relationship exists. In some instances,
the measurements are taken to establish a pattern in the data (e.g., body
weight and X-ray attenuation) or to search for an easy-to-measure surrogate
marker for a hard-to-measure value (e.g., to measure the amount of iodinated
contrast agent in a solution using its optical absorbance).
A scatterplot is the first step in examining the relationship between two
sets of measures. The correlation coefficient (r) measures how close
the relationship between two measurements is to linearity. The maximal values
for r are 1 or -1, and the two variables can be positively or
negatively correlated. If the two variables show a nonlinear relationship
(e.g., parabolic), then r can be close to zero, even though a strong
relationship exists. The two common calculations of the correlation
coefficient are Pearson's product moment correlation for normally distributed
data and Spearman's rank correlation for ordinal data.
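The point that r measures linearity rather than the existence of a relationship can be demonstrated with a short sketch. The function below is a plain implementation of Pearson's product moment formula; the two small data sets are hypothetical and were chosen so that one is exactly linear and the other is a parabola that is symmetric about zero.

```python
import math


def pearson_r(x, y):
    """Pearson's product moment correlation coefficient for paired values."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


x = [-2, -1, 0, 1, 2]
line = [2 * value + 1 for value in x]   # perfectly linear in x
parabola = [value ** 2 for value in x]  # fully determined by x, but not linear

print(f"linear:    r = {pearson_r(x, line):.3f}")
print(f"parabolic: r = {pearson_r(x, parabola):.3f}")
```

In the parabolic case y is completely determined by x, yet the coefficient gives no hint of it, which is why the scatterplot should be inspected first.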
When a correlation coefficient is used, three steps should be adhered to:
first, plot the raw data in a scatterplot; second, observe whether a
relationship exists between the variables; and third, if the data suggest a
linear, but not a curvilinear relationship, then calculate r. The
problem with correlation calculations is that correlation can be confused with
causality, and caution should be used about such an interpretation. The
possibility of an indirect relationship, via a third and unmeasured variable,
should be eliminated. It is up to the scientist to prove that these third
variables have no effect on the observed correlation. Another caution is that
Pearson's correlation coefficient is only dependable when the two compared
variables are normally distributed because an outlier point can dominate the
correlation.
In interpreting the strength of a correlation coefficient, we found no
common consensus on the scale descriptors. A useful published example of
descriptors might be: 0.0-0.2, very weak or negligible; 0.2-0.4, weak or low;
0.4-0.7, moderate; 0.7-0.9, strong, high, or marked; 0.9-1.0, very strong or
very high [3].
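The descriptor scale quoted above can be expressed as a small lookup. How the shared boundaries (0.2, 0.4, 0.7, 0.9) are assigned is an assumption of this sketch, because the published ranges overlap at their end points; here each boundary value goes to the stronger category.

```python
def describe_correlation(r):
    """Map a correlation coefficient to the descriptors quoted from [3]."""
    magnitude = abs(r)  # the sign gives direction, not strength
    if magnitude > 1:
        raise ValueError("a correlation coefficient must lie between -1 and 1")
    if magnitude < 0.2:
        return "very weak or negligible"
    if magnitude < 0.4:
        return "weak or low"
    if magnitude < 0.7:
        return "moderate"
    if magnitude < 0.9:
        return "strong, high, or marked"
    return "very strong or very high"


for r in (0.864, 0.99, -0.549, 0.078, 0.247):
    print(f"r = {r:+.3f}: {describe_correlation(r)}")
```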
Plotting data sets in scatterplots (Fig. 7A-7F)
permits us to visually evaluate the data, and we can predict the outcome of an
analysis of the correlation coefficients. The data in
Figure 7A would have a good
correlation, which is supported by a Pearson's test yielding an r
value of 0.864 (strong correlation). Figures
7B and
7C are obviously linear and
have an r value of 0.99 and an r value of -0.99 (very strong
correlation). Figure 7D is
somewhat ambiguous. However, r is equal to -0.549 and thus a moderate
correlation exists. The data in Figure
7E are clearly related, but because the relationship is nonlinear,
r is equal to 0.078. Even Figure
7F has a higher correlation coefficient, an r value of
0.247, than Figure 7E. A look
at the correlation values alone for these data sets would suggest that the
data in Figure 7E had no
relationship, whereas the data have an interesting one that is immediately
visible in the scatterplot.
When the scatterplot of the data for two variables looks like a linear
relationship exists, then it is tempting to try and describe the relationship
as linear and calculate the relationship between them using linear regression.
This approach compares a dependent variable (y) in relation to an
independent variable (x), which yields the familiar y =
mx + b, where m is the slope of the line and b
is the y-intercept (the value of y when x = 0). Our hypothetic example
shows the plot of raw data, regression line, and 95% confidence limits
(Fig. 8). The difference
between correlation and regression is that in a correlation, neither variable
can be fixed, whereas in regression, one measurement is a variable
(y) and depends on the other (x). Often, the value of
x is assumed to be fixed, is capable of observation without error,
and is normally distributed. Should there be no logical argument to define one
variable as dependent and the other as independent, then the solution is to
use a calculation of correlation and avoid the concept of dependence
altogether. The importance of confidence limits should not be underestimated,
either here with regression
[4], or elsewhere with
statements of sensitivity and specificity
[5] or proportions and rates.
For example, if we claim no side effects from contrast injections in 20
patients (rate = 0%), the upper 95% confidence limit of the rate of occurrence
is actually 19%.
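The regression line y = mx + b discussed above is usually obtained by ordinary least squares. The sketch below shows that calculation only; it does not compute the confidence limits, and the data points are hypothetical rather than those plotted in Figure 8.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = m*x + b; returns (m, b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data lying exactly on the line y = 2x + 1.
slope, intercept = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(f"y = {slope:.2f}x + {intercept:.2f}")
```

Note that the fit treats x as the independent variable; swapping the roles of x and y generally gives a different line, which is one reason to prefer correlation when neither variable is logically dependent on the other.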

Fig. 8. Graph shows hypothetic data set with linear
regression (solid line) and 95% confidence intervals (dashed
lines) plotted. Note that confidence intervals permit appreciation of
strength of regression. r² = 0.927, slope (m) = 1.28, and
x-intercept = -0.286.
Sensitivity and Specificity
Sensitivity and specificity are ratios fundamental to the radiology
discipline. They relate the ability of an imaging technique to reveal disease
when present (sensitivity) and to rule out disease when absent (specificity).
The numbers are generated using the familiar 2 × 2 table, which we have
seen previously in this series
[6], for proportions used to
compare diagnostic determination (presence or absence of disease) with a
standard of reference. The better the latter (e.g., surgical confirmation),
the more valuable and accurate the diagnostic measurement will be. Although
the analysis of a 2 × 2 contingency table has been shown previously in
this series of articles, we will use the example in
Table 6 to calculate these
values. Sensitivity is a / (a + c), which is equal to
653/730 or 89%; specificity is d / (b + d), which
is equal to 1400/1537 or 91%. Missing from most reports in the radiology
literature is the confidence interval based on the binomial theorem
[7]. There are a few key
questions to consider when evaluating sensitivity and specificity values: Was
there an independent and blind comparison with the standard of reference? Was
the diagnostic test evaluated in a group of patients appropriate to the target
population? Was the standard of reference applied regardless of the diagnostic
test [7]? Both negative and
positive predictive values can also be calculated from the 2 × 2 table,
as well as prevalence, pre- and posttest odds, likelihood ratios, and posttest
probability. Usually, these statistical measurements are portrayed in simple
tables or in the text of an article. It is useful to show all 2 × 2
contingency tables because it is then possible for the reader to calculate all
these values. Even when the 2 × 2 table is expanded into a receiver operating
characteristic analysis (to be described later in the series), the relevant
measure (usually area under the curve) can be expressed in table format with
the appropriate confidence intervals.
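The calculations from the 2 × 2 table can be sketched as follows. The cell counts a and d and the column totals 730 and 1537 are taken from the text; b and c are derived here by subtraction. The confidence interval uses the simple normal approximation to the binomial, which is an assumption of this sketch; exact binomial limits may be preferable, particularly for small samples or proportions near 0% or 100%.

```python
import math


def proportion_with_ci(successes, total, z=1.96):
    """Proportion with an approximate 95% CI (normal approximation)."""
    p = successes / total
    half_width = z * math.sqrt(p * (1 - p) / total)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


# Cell counts: a and d are quoted in the text; c and b are obtained by
# subtraction from the quoted totals (730 diseased, 1537 nondiseased).
a, c = 653, 730 - 653     # true-positives, false-negatives
d, b = 1400, 1537 - 1400  # true-negatives, false-positives

sensitivity = proportion_with_ci(a, a + c)
specificity = proportion_with_ci(d, b + d)
ppv = a / (a + b)  # positive predictive value
npv = d / (c + d)  # negative predictive value

for name, (p, low, high) in (("Sensitivity", sensitivity),
                             ("Specificity", specificity)):
    print(f"{name}: {p:.1%} (95% CI, {low:.1%} to {high:.1%})")
print(f"PPV: {ppv:.1%}, NPV: {npv:.1%}")
```

Predictive values computed this way apply only to populations with the same disease prevalence as the study sample.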
Summary
The purpose of this article was to define the different variables that
radiologists routinely use to describe their data. Categoric and continuous
data types were identified, and suitable graphs and tables were shown to
depict the findings in an informative and succinct manner. Continuous data and
measures of central tendency and dispersion were shown. The relationship
between two variables was determined by the correlation coefficient with a
consideration of the caveat that correlation should not be confused with
causality. The familiar 2 × 2 contingency table and derived values were
explored. Identifying variable types and choosing their appropriate displays
should be a more straightforward task after studying these examples.
References
- Kurtzke JF. On the evaluation of disability in multiple sclerosis. Neurology 1998;50:1961-1970
- Clarke GM. Statistics and experimental design. London: Edward Arnold, 1994:7
- Rowntree D. Statistics without tears. London: Penguin, 1991:170
- Glantz SA. Primer of biostatistics. New York: McGraw-Hill, 1992:211
- Harper R, Reeves B. Reporting of precision of estimates for diagnostic accuracy: a review. BMJ 1999;318:1322-1323
- Jarvik JG. The research framework. AJR 2001;176:873-878
- Sackett DL, Richardson WS, Rosenberg W, Haynes RB. Evidence-based medicine. New York: Churchill Livingstone, 1997:118-128
