AJR 2005; 185:3-18
© American Roentgen Ray Society


Fundamentals of Clinical Research for Radiologists

Correlation and Regression

Nandini Dendukuri1,2 and Caroline Reinhold3,4

1 Technology Assessment Unit, Royal Victoria Hospital, Montreal, QC H3A 1A1, Canada.
2 Department of Epidemiology and Biostatistics, McGill University, 1020 Pine Ave. W, Montreal QC H3A 1A2, Canada.
3 Department of Diagnostic Radiology, Montreal General Hospital, McGill University Health Centre, 1650 Cedar Ave., Montreal QC H3G 1A4, Canada.
4 Department of Oncology, Synarc, 575 Market St., San Francisco CA, 94105.

Received November 17, 2004; accepted after revision November 23, 2004.

 
Series editors: Nancy Obuchowski, C. Craig Blackmore, Steven Karlik, and Caroline Reinhold.

This is the 18th in the series designed by the American College of Radiology (ACR), the Canadian Association of Radiologists, and the American Journal of Roentgenology. The series, which will ultimately comprise 22 articles, is designed to progressively educate radiologists in the methodologies of rigorous clinical research, from the most basic principles to a level of considerable sophistication. The articles are intended to complement interactive software that permits the user to work with what he or she has learned, which is available on the ACR Web site (www.acr.org).

Project coordinator: G. Scott Gazelle, Chair, ACR Commission on Research and Technology Assessment.

Staff coordinator: Jonathan H. Sunshine, Senior Director for Research, ACR.

Address correspondence to N. Dendukuri (nandini.dendukuri{at}mcgill.ca).


Introduction
This module covers common statistical methods used in radiologic applications for measuring relations between variables. Under the topic of correlation we describe Pearson's and Spearman's correlation coefficients and partial correlation, all of which are suitable for evaluating the association between two continuous variables. In the section on regression we cover linear and logistic regression models. Regression models are used to study the association between an outcome variable and one or more predictor variables that may be continuous or dichotomous. For linear regression models the outcome variable is continuous, whereas for logistic regression models it is dichotomous. We also briefly describe methods for model selection and sample size determination.

In a hypothetical study evaluating the use of MRI for the assessment of myocardial viability, researchers were interested in characterizing the nature of the relation between myocardial infarct volume and ejection fraction. Their objective was to answer questions such as: Is there any relation between infarct volume and ejection fraction? What is the strength of this relation? Does ejection fraction increase or decrease with increasing myocardial infarct volume? By how much would we expect the ejection fraction to change when the myocardial infarct volume increases by 1 mL? Can we predict a patient's ejection fraction when given his or her myocardial infarct volume? How accurate is this prediction?

Questions such as these arise in situations in which more than one variable has been measured on each patient (or observational unit) in a sample, and the relationship between the different variables is of interest. This module covers some of the most commonly used statistical tools to answer such questions: correlation coefficients and regression models. We will cover methods for studying the relation between two variables that may be both continuous, both dichotomous (i.e., having only two values), or a mix (one dichotomous and the other continuous). We will also cover situations in which we wish to study the relation between more than two variables.

To illustrate the methods in this tutorial we have used hypothetical examples that are all inspired from studies appearing in radiology research journals. Some of the concepts covered in this tutorial assume knowledge of earlier articles in this series, to which the reader is encouraged to refer [1-4].


Correlation
In Figures 1A and 1B we have two scatterplots of ejection fraction versus myocardial infarct volume. At first glance, it appears that the relation between the two variables is stronger in Figure 1A than in Figure 1B. In fact, the two figures are based on the same data from a hypothetical study of 30 patients. Altering the scale of the ejection fraction axis makes the relation observed in Figure 1B appear less strong than in Figure 1A. The purpose of this figure is to illustrate that a scatterplot alone is not sufficient to draw conclusions about the strength of the relationship between two variables. The plot needs to be accompanied by an objective measure.



Fig. 1 (A and B) Scatterplots show relation between myocardial infarct volume and ejection fraction and illustrate effect of changing scale of ejection fraction axis. Relation between the two variables may appear stronger in A than in B, but both figures are based on same data. Altering scale of ejection fraction axis makes relation in B appear less strong than in A.

 
Pearson's Correlation Coefficient
Pearson's correlation coefficient is one such objective measure of the linear relation between two variables. Pearson's correlation coefficient (which we denote by rP) between two variables X (e.g., infarct volume) and Y (e.g., ejection fraction) is given by:

$$r_P = \frac{\sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{N} (x_i - \bar{x})^2 \, \sum_{i=1}^{N} (y_i - \bar{y})^2}}$$

where xi and yi are the values of variables X and Y observed on each individual in the sample, x̄ and ȳ are the sample means of X and Y, and N is the number of individuals in the sample. The denominator of this expression is the square root of a positive quantity and is always taken to be positive. The numerator, on the other hand, can be positive or negative depending on the nature of the relation between X and Y. If X tends to increase when Y increases, then it is likely that when an individual xi exceeds the sample mean x̄, the corresponding yi also exceeds its mean ȳ. This would cause the numerator, and thus rP itself, to be positive. If, on the other hand, X decreases as Y increases, it is likely that xi is less than x̄ when yi is greater than ȳ. This would result in a negative value of the numerator and of rP. In the example in Figures 1A and 1B, we find that ejection fraction tends to decrease with increasing myocardial infarct volume. Thus, patients whose ejection fraction exceeds the mean ejection fraction of the sample are more likely to have myocardial infarct volumes that are smaller than the mean myocardial infarct volume of the sample, resulting in a negative value of rP.

Pearson's correlation coefficient can range from a minimum value of -1 to a maximum value of 1. Figures 2A, 2B, 2C, 2D, 2E, and 2F illustrate the value of rP in various prototypical situations. A value of rP = 1 is obtained when an increase in X is always associated with an increase in Y and the points in the scatterplot between X and Y can be joined to form a perfect straight line (Fig. 2A). A value of rP = -1 is indicative of a perfect negative linear relation between X and Y (Fig. 2B). As the strength of the linear relation between X and Y diminishes, the value of rP approaches 0 (Figs. 2C and 2D). A correlation coefficient of 0 indicates that there is no linear relation between the two variables. For the hypothetical data in Figures 1A and 1B we find that rP is -0.91, suggesting a fairly strong negative relation between myocardial infarct volume and ejection fraction. The interested reader is referred to the table at the end of the appendix for a more detailed explanation of how to calculate the correlation coefficient.
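The calculation of rP can be sketched in a few lines of Python. This is only an illustration: the function name `pearson_r` and the five data pairs are hypothetical and are neither the study data nor the worked table in the appendix.

```python
import math

def pearson_r(x, y):
    """Pearson's correlation coefficient computed directly from its definition."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Numerator: sum of cross-products of deviations from the sample means.
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    # Denominator: square root of the product of the sums of squared deviations.
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    return s_xy / math.sqrt(s_xx * s_yy)

# Hypothetical values: infarct volume (mL) and ejection fraction (%).
miv = [2.0, 4.0, 5.0, 7.0, 9.0]
ef = [64.0, 57.0, 52.0, 46.0, 37.0]
r = pearson_r(miv, ef)
print(f"rP = {r:.3f}")
```

Because ejection fraction falls as infarct volume rises in these made-up data, the numerator is negative and so is rP.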



Fig. 2 (A-F) Examples of different values of Pearson's (rP) and Spearman's (rS) correlation coefficients. A, Value of rP = 1 is obtained when increase in X is always associated with increase in Y and points in scatterplot form a straight line. B, Value of rP = -1 is indicative of negative linear relation between X and Y. C and D, As strength of linear relation between X and Y diminishes, value of rP approaches 0. E and F, Plots show nonlinear relation between X and Y.

 

TABLE: Calculating Pearson's Correlation Coefficient and the Simple Regression Equation Between Myocardial Infarct Volume and Ejection Fraction

 

Figures 2E and 2F illustrate two situations in which there is a perfect, though nonlinear, relation between X and Y. In Figure 2E, an increase in X is always accompanied by an increase in Y. Here, rP is quite high (0.92), although not equal to 1. In Figure 2F we have a U-shaped relation between the variables, with both low and high values of X being associated with high values of Y. Here rP is close to 0, suggesting only a weak relation between X and Y. These plots serve to illustrate that a value of rP close to 0 does not rule out the possibility of a strong nonlinear relationship between the variables.

Interpreting Pearson's correlation coefficient—A few things need to be kept in mind when interpreting a correlation coefficient:

  1. Correlation is independent of the units in which the two variables are measured. If our interest is in measuring the strength of the relation between ejection fraction and myocardial infarct volume, it does not matter whether the latter was measured in milliliters (mL) or liters (L).
  2. High correlation may indicate a strong association but not causation. Note that in the expression for rP, X and Y may be interchanged with no difference to the result. This means that the variables X and Y are not distinguished as "predictor" and "outcome" and it does not matter whether X causes Y or vice versa. It would be incorrect to assume that a high correlation between myocardial infarct volume and ejection fraction means that one of them is the cause of the other. Rather, we can only say that there is a strong association between them.
  3. The observed correlation (or lack of it) may be due to a confounding variable. In some situations the observed association (or lack of it) may be spurious and, in fact, reflect the effect of a third variable, referred to by epidemiologists as a "confounding variable" [5]. Such a variable is associated with both X and Y. Figure 3A is a scatterplot of the relation between endometrial thickness (measured at transvaginal sonography) and peak systolic velocity (measured at Doppler imaging) in postmenopausal women presenting with abnormal vaginal bleeding. The value of rP for the entire sample is only moderate (rP = 0.36). The sample was then divided into women with endometrial atrophy, those with endometrial hyperplasia, and those with endometrial carcinoma, and rP was calculated separately within each group. We find that the true strong relation between endometrial thickness and peak systolic velocity is obscured because both variables have an association with the histologic subgroups.
  4. Correlation between aggregate values is typically stronger than at the individual level. In Figure 3B, the open circles form a scatterplot of endometrial thickness versus peak systolic velocity in postmenopausal women presenting at three health centers (university-based, community hospital, and walk-in clinic). The filled circles plot the relation between the average endometrial thickness and average peak systolic velocity for each of the three health centers. The correlation between the average values is almost 1, despite a weaker correlation at the patient level.
  5. Correlation is influenced by the range of the X and Y variables. The greater the range of the X and Y variables in the sample, the greater the correlation between them tends to be. Thus, a single outlying observation that extends the range might give us a falsely elevated correlation coefficient.
  6. High correlation does not mean measurement equivalence. When comparing two imperfect measurements of the same underlying quantity, a high correlation is often taken as proof of strong agreement, but that is not correct. For example, we might be interested in determining whether measurements of the length of liver lesions using MRI and sonography are equivalent. A high positive correlation suggests only that increasing values of one measure are associated with increasing values of the second; it does not necessarily mean that they are measuring the same thing. A better approach to evaluating equivalence would be to examine the difference in magnitude of the observations on each patient. A large mean difference would suggest that the two measures are in fact not equivalent [6].
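The last point can be illustrated with a small Python sketch. The lesion lengths below are invented for illustration, and the constant 5-mm offset is an assumption chosen to make the contrast obvious: the two sets of measurements are perfectly correlated, yet they disagree on every patient.

```python
import math

def pearson_r(x, y):
    """Pearson's correlation coefficient from sums of deviations."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    return s_xy / math.sqrt(s_xx * s_yy)

# Hypothetical lesion lengths (mm) measured with MRI on five patients.
mri_mm = [12.0, 18.0, 25.0, 31.0, 40.0]
# Assumed sonography readings: always 5 mm longer than MRI.
us_mm = [m + 5.0 for m in mri_mm]

r = pearson_r(mri_mm, us_mm)
# Per-patient differences, the quantity suggested for judging equivalence.
diffs = [u - m for u, m in zip(us_mm, mri_mm)]
mean_diff = sum(diffs) / len(diffs)

print(f"rP = {r:.2f}, mean difference = {mean_diff:.1f} mm")
```

The correlation alone would suggest excellent agreement; the mean difference shows the systematic disagreement that the correlation cannot detect.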



Fig. 3 (A and B) Pearson's correlation coefficients. Graphs show correlation coefficients in the presence of confounding (A) and from aggregate data (B).

 

Assumptions used in calculating Pearson's correlation coefficient: Some important things need to be kept in mind before calculating rP. First, it is based on the assumption that both X and Y are measured on an interval scale. When we say myocardial infarct volume has been measured on an interval scale, we mean that a myocardial infarct volume of 4 mL is twice as large as a myocardial infarct volume of 2 mL. This would not have been true if it were measured by an ordinal variable having values 1 (small), 2 (medium), and 3 (large) because we cannot say that a patient rated as "medium" has twice the myocardial infarct volume of a patient rated as "small." Second, both X and Y are assumed to follow a normal probability distribution [2]. This assumption allows us to perform hypothesis tests and construct confidence intervals for rP, as we will see.

Inference for Pearson's correlation coefficient: The sample correlation coefficient, rP, is a statistic whose value changes depending on the sample collected. It is only an estimate of the population correlation coefficient, ρP, that we would have obtained if it were possible to observe the entire population of patients (or study units) from which the sample was collected. When reporting the sample correlation coefficient, we also need to report some measure of our uncertainty in the knowledge of the population correlation coefficient. This uncertainty may be expressed in terms of a p value or a confidence interval [3]. Confidence intervals are preferred to p values because they provide more information regarding the parameter estimated. An earlier article in this series explains in detail the distinction between confidence intervals and p values [3]. However, p values are still frequently reported in the medical literature, so we cover methods for their calculation and interpretation here.

p value: A p value measures the strength of the evidence against a null hypothesis of the form H0: ρP = ρ0, where ρ0 is a predetermined value of the correlation coefficient of interest. In our example on myocardial infarct volume and ejection fraction, we can set ρ0 = 0 to test the hypothesis of "no association between the two variables." When the p value is very low (typically < 0.05 or 0.01) we reject the null hypothesis. Details on how to calculate the p value are provided for the interested reader in Appendix 1. We find that the p value for our example is very small (<< 0.001). In other words, the probability that we would have observed a correlation at least as strong as rP = -0.91, when in fact the true correlation between myocardial infarct volume and ejection fraction was ρP = 0, is much less than 0.001. Therefore, we reject the null hypothesis of H0: ρP = 0 and conclude that there is an association between myocardial infarct volume and ejection fraction.

Confidence interval: The hypothesis testing approach limits us to a single hypothesis, which is often artificially set up. Rather than simply concluding that the population correlation coefficient is not 0, we might want to say a little more about the strength of the correlation. A confidence interval is more informative in that it gives us the range of possible values of ρP that are compatible with the observed value of the correlation coefficient. Details of the calculation of the confidence interval are given in Appendix 1. The 95% confidence interval for the correlation coefficient between myocardial infarct volume and ejection fraction is (-0.96 to -0.81). If our hypothetical study were repeated several times and a confidence interval calculated each time, then 95% of the confidence intervals would capture the true value of ρP. However, we cannot say if the interval obtained from our sample is one of the 95% that capture the true value of ρP (see [3] for more details on how to interpret a confidence interval). The 95% confidence interval may also be interpreted as the range of values of the null hypothesis (ρ0) that cannot be rejected at the 1 - 0.95 = 0.05 level of significance.

The fact that our 95% confidence interval does not include 0 means that the null hypothesis of ρ0 = 0 would be rejected, which is the same conclusion we reached earlier using the p value. A better approach would be to compare the confidence interval with a predetermined range of values indicative of no relation between the variables. For example, let us say that a correlation coefficient in the range from -0.1 to 0.1 is in practice indicative of no relation between myocardial infarct volume and ejection fraction. Then the fact that our confidence interval clearly lies outside this region leads us to conclude there is a strong, negative relation between myocardial infarct volume and ejection fraction.
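For readers who want to experiment, the following Python sketch shows one common way of obtaining an approximate p value and confidence interval for a correlation coefficient. It assumes the Fisher z transformation with a normal approximation; Appendix 1 is not reproduced here, so this may differ in detail from the method used there. The function name `fisher_inference` is hypothetical.

```python
import math
from statistics import NormalDist

def fisher_inference(r, n, rho0=0.0, level=0.95):
    """Approximate p value and confidence interval for a correlation.

    Assumes the Fisher z transformation: atanh(r) is treated as roughly
    normal with standard error 1 / sqrt(n - 3).
    """
    if n <= 3 or not -1.0 < r < 1.0:
        raise ValueError("need n > 3 and -1 < r < 1")
    se = 1.0 / math.sqrt(n - 3)
    z_r = math.atanh(r)
    # Two-sided p value for H0: rho = rho0 under the normal approximation.
    z_stat = (z_r - math.atanh(rho0)) / se
    p_value = math.erfc(abs(z_stat) / math.sqrt(2.0))
    # Confidence limits on the z scale, back-transformed with tanh.
    z_crit = NormalDist().inv_cdf(0.5 + level / 2.0)
    lower = math.tanh(z_r - z_crit * se)
    upper = math.tanh(z_r + z_crit * se)
    return p_value, (lower, upper)

# Values quoted in the text: rP = -0.91 from a sample of 30 patients.
p_value, (lower, upper) = fisher_inference(r=-0.91, n=30)
print(f"p = {p_value:.2e}, 95% CI = ({lower:.2f} to {upper:.2f})")
```

With the rounded value rP = -0.91 and N = 30, this approximation gives a p value far below 0.001 and an interval close to the one reported above; small differences in the second decimal place are to be expected from rounding and from the choice of method.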

Partial Correlation
It is possible that the observed correlation between two variables (X and Y) may be due in part to a third variable (Z) that is related to both of these variables. When this third confounding variable is also observed, we may be interested in estimating the correlation between X and Y after eliminating the effect of their correlation with Z. For example, in a study of liver lesion characterization using three diagnostic tests (sonography, CT, and MRI), the Pearson's correlation coefficients between the accuracies of the different diagnostic tests were as shown in the following equations:



Clearly, all three methods are correlated with each other. What is the correlation between the diagnostic performance of sonography and MRI alone, after eliminating the effect of the correlation that both have with CT? To estimate this, we can calculate a partial correlation coefficient. The partial correlation between X and Y after having eliminated the effect of a third variable Z is given by:

$$r_{XY.Z} = \frac{r_P(X,Y) - r_P(X,Z)\, r_P(Y,Z)}{\sqrt{\left[1 - r_P(X,Z)^2\right]\left[1 - r_P(Y,Z)^2\right]}}$$

If Z is not a confounding variable, one or both of rP (X,Z) and rP (Y,Z) would be 0 or very small. In such a situation, the partial correlation between X and Y (rXY.Z) would be similar to the Pearson's correlation coefficient between them (rP [X,Y]).

The partial correlation coefficient between performance in sonography and MRI in our example is shown in these equations (where US = sonography):

Thus, after eliminating the contribution of CT, we find that the strong relation between sonography and MRI vanishes. Moreover, it appears that the direction of the relation changes as well, suggesting that after removing the contribution of CT, lesions that are accurately diagnosed with sonography in fact are poorly diagnosed with MRI and vice versa.
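A short Python sketch shows how such a reversal can arise. It assumes the standard first-order partial correlation formula, and the three pairwise correlations are hypothetical values chosen for illustration; they are not the values from the liver lesion study.

```python
import math

def partial_corr(r_xy, r_xz, r_yz):
    """Partial correlation between X and Y after removing the effect of Z."""
    if abs(r_xz) >= 1.0 or abs(r_yz) >= 1.0:
        raise ValueError("correlations with Z must lie strictly between -1 and 1")
    numerator = r_xy - r_xz * r_yz
    denominator = math.sqrt((1.0 - r_xz ** 2) * (1.0 - r_yz ** 2))
    return numerator / denominator

# Hypothetical pairwise Pearson correlations (US = sonography).
r_us_mri = 0.60
r_us_ct = 0.80
r_mri_ct = 0.80

r_partial = partial_corr(r_us_mri, r_us_ct, r_mri_ct)
print(f"partial correlation (US, MRI | CT) = {r_partial:.2f}")

# If Z is unrelated to both variables, nothing is removed.
print(partial_corr(0.60, 0.0, 0.0))
```

In this made-up case the positive correlation of 0.60 becomes slightly negative once the shared correlation with CT is removed, because the product of the two correlations with CT (0.64) exceeds the original correlation.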

This concept can be extended to calculate the partial correlation between two variables after adjusting for the effect of two or more variables. Multiple regression, which is discussed later in this article, can be used for the same purpose and is more straightforward to perform using commonly available statistical software packages.

Spearman's Rank Correlation
Spearman's rank correlation, which we denote by rS, is another statistic used for measuring the correlation between a pair of variables. It is called a nonparametric measure and is preferred when assumptions required for calculating Pearson's correlation coefficient are violated, that is, when X and/or Y are not measured on an interval scale, or when X and/or Y do not follow a normal probability distribution. To calculate Spearman's correlation coefficient, we need to assign a rank to the individual values of X and Y; that is, sort each of X and Y in increasing order and assign them ranks so that the smallest observation has a rank of 1 and the largest observation has a rank of N. The expression for Spearman's correlation coefficient is similar to Pearson's correlation coefficient, except that xi and yi are replaced by rank(xi) and rank(yi), and the sample means by the mean ranks R̄x and R̄y, as follows:

$$r_S = \frac{\sum_{i=1}^{N} \left[\mathrm{rank}(x_i) - \bar{R}_x\right]\left[\mathrm{rank}(y_i) - \bar{R}_y\right]}{\sqrt{\sum_{i=1}^{N} \left[\mathrm{rank}(x_i) - \bar{R}_x\right]^2 \, \sum_{i=1}^{N} \left[\mathrm{rank}(y_i) - \bar{R}_y\right]^2}}$$

Spearman's correlation coefficient ranges between -1 and 1, with these extreme values indicating a perfect negative or positive relationship, respectively, between X and Y. It takes the value 0 when there is no relation between the variables (Figs. 2A, 2B, 2C, and 2D). An advantage of Spearman's correlation coefficient over Pearson's correlation coefficient is that it can be used to evaluate a nonlinear relation between variables when the direction of the relationship does not change. In Figure 2E, where Y continuously increases with X, we see that the perfect nonlinear relationship between the variables is captured by Spearman's correlation coefficient, although not by Pearson's correlation coefficient. However, like rP, rS is inappropriate for measuring the strength of a nonlinear relationship that both increases and decreases, such as the U-shaped relation in Figure 2F.
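The contrast between the two coefficients can be sketched in Python by ranking the data and applying Pearson's formula to the ranks. The data are hypothetical stand-ins for the prototype situations (a steadily increasing curve and a U shape), not the data behind the figures, and the use of average ranks for tied values is an assumption, since ties are not discussed in the text.

```python
import math

def average_ranks(values):
    """Ranks from 1 to N; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson_r(x, y):
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    return s_xy / math.sqrt(s_xx * s_yy)

def spearman_r(x, y):
    """Spearman's coefficient: Pearson's formula applied to the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))

# Monotone but nonlinear: Y always increases with X.
x_mono = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
y_mono = [v ** 3 for v in x_mono]
# U-shaped: low and high X both go with high Y.
x_u = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
y_u = [v ** 2 for v in x_u]

print(f"monotone: rP = {pearson_r(x_mono, y_mono):.2f}, rS = {spearman_r(x_mono, y_mono):.2f}")
print(f"U-shaped: rP = {pearson_r(x_u, y_u):.2f}, rS = {spearman_r(x_u, y_u):.2f}")
```

For the monotone data rS equals 1 while rP falls short of 1; for the U-shaped data both coefficients are 0 even though Y is completely determined by X.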


Regression
The correlation coefficients described thus far can be used to measure the strength and the direction of an association. Regression models go a step further and can be used to predict the value of one variable given the other. This quality makes them suitable for the study of relationships when the two variables can be distinguished as "predictor" and "outcome." Note, however, that fitting a regression equation between two variables does not imply a causal relation between them. Regression models also provide a more straightforward approach to adjusting for the effect of confounding variables. They can be used to deal with a variety of types of outcome variables (continuous, dichotomous, ordinal, count data, and so forth). Here, we focus on two of the most commonly used models for radiologic applications: linear regression models, in which the outcomes are continuous, and logistic regression models, in which the outcomes are dichotomous.

Regression is a broad area to which this article provides but a brief introduction. Greater detail on estimation and inference for linear and logistic regression is covered in introductory biostatistics textbooks [7-9]. More complex topics, such as regression model diagnostics, variable selection, and logistic regression for ordinal variables, are covered in greater depth in advanced textbooks [10-13].

Simple Linear Regression
Like Pearson's correlation coefficient, simple linear regression is also used to characterize linear relationships between variables. It is distinguished from multiple variable linear regression (discussed later) in that it involves only two variables, the outcome or dependent variable and the predictor or independent variable. The standard form of the simple linear regression equation is as follows:

$$Y = \alpha + \beta X + \varepsilon$$

where X and Y are the observed values of the predictor and the outcome variables, respectively. The parameters α and β are called the intercept and the slope, respectively. For a given value of X, the predicted value of Y is α + βX. The term ε, the residual (or error), is the difference between the observed value of Y and the predicted value of Y. The intercept and slope parameters are estimated with the aim of reducing this difference. The estimated values of the intercept and slope are denoted by a and b, respectively. An important assumption of the linear regression model is that the residuals follow a normal distribution with mean 0 and a variance σ², which remains constant for all values of X. These assumptions imply that for a given value of X, the error in predicting the outcome is 0 on the average. Moreover, the magnitude of the error is not associated with X.



 
Fig. 4 Graph shows simple linear regression line between ejection fraction and myocardial infarct volume (MIV).

 
For our hypothetical example of the relation between myocardial infarct volume and ejection fraction, the estimated simple linear regression equation is as follows:

$$\text{Ejection fraction (\%)} = 70 - 3.6 \times \text{MIV (mL)}$$

(see the solid line in Fig. 4).

The intercept of the regression model is equal to the predicted value of the outcome when the predictor variable is 0. This parameter is of interest only in those situations in which 0 lies within the plausible range of X values. Figure 4 shows that when the myocardial infarct volume is 0 mL, the ejection fraction is predicted to be equal to the intercept, or 70%. The slope of the regression model is the change in the outcome corresponding to a unit change in the predictor variable. A slope of 0 indicates that no linear relation exists between the predictor and outcome variables. From Figure 4, we see that when the myocardial infarct volume increases by 1 mL, the predicted value of the ejection fraction changes by an amount equal to the slope, -3.6%; that is, it decreases by 3.6 percentage points.
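Using the intercept (70%) and slope (-3.6% per mL) quoted above, prediction is a one-line calculation. The Python sketch below is illustrative only; the function name `predict_ef` and the example infarct volumes are hypothetical.

```python
INTERCEPT = 70.0  # predicted ejection fraction (%) at an infarct volume of 0 mL
SLOPE = -3.6      # change in ejection fraction (%) per 1-mL increase in volume

def predict_ef(miv_ml):
    """Predicted ejection fraction (%) for a given infarct volume (mL)."""
    return INTERCEPT + SLOPE * miv_ml

for miv_ml in (0.0, 2.5, 5.0):
    print(f"MIV = {miv_ml:.1f} mL -> predicted EF = {predict_ef(miv_ml):.1f}%")
```

Predictions are only meaningful within the range of infarct volumes observed in the sample; extrapolating the line far beyond that range is not supported by the data.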

Selecting the "best-fitting" line: We need an objective criterion to help us estimate α and β so that we have a best-fitting straight line. As explained earlier, we would like to use the regression equation to predict the outcome variable using the predictor variable. Clearly, we would like to do so in a way that minimizes the error in prediction (i.e., results in the lowest possible residual), εi, for each patient. We use a criterion that minimizes the sum of the squared residual terms:

$$\sum_{i=1}^{N} \varepsilon_i^2 = \sum_{i=1}^{N} (y_i - \alpha - \beta x_i)^2$$

This is known as the method of least squares. The expressions for the estimated values of the intercept and the slope obtained using the method of least squares are given by

$$b = \frac{S_{xy}}{S_{xx}}, \qquad a = \bar{y} - b\,\bar{x}$$

where

$$S_{xy} = \sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y}), \qquad S_{xx} = \sum_{i=1}^{N} (x_i - \bar{x})^2$$

(See the table in Appendix 2 for an illustrative example of how to calculate a and b for a smaller sample of five patients. Notice that much of the calculation involves the terms already used in the calculation of Pearson's correlation coefficient.) In addition to a and b, we also obtain an estimate for the SE (i.e., square root of the variance) of the residuals, which we denote by s:

$$s = \sqrt{\frac{\sum_{i=1}^{N} (y_i - a - b\,x_i)^2}{N - 2}}$$

For our example, the SE of the residuals is given by s = 3.53. This tells us that the average error in predicting the ejection fraction by the myocardial infarct volume is about 3.53%. This error is quite small when compared with the range of ejection fraction values—roughly 40-70%—suggesting that our regression equation has a good predictive ability on average.
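The least-squares calculation can be sketched in Python as follows. The sketch assumes the usual closed-form solution (slope equal to the sum of cross-products divided by the sum of squared deviations of X, intercept obtained from the two means) and a residual SE with N - 2 in the denominator. The function name `fit_line` and the five data pairs are hypothetical; they are not the patients in the Appendix 2 table.

```python
import math

def fit_line(x, y):
    """Least-squares intercept a, slope b, and residual SE s."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least 3 paired observations")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    if s_xx == 0:
        raise ValueError("x must take at least two distinct values")
    b = s_xy / s_xx          # slope
    a = y_bar - b * x_bar    # intercept
    # Sum of squared residuals, divided by N - 2 (two parameters estimated).
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))
    return a, b, s

# Hypothetical values: infarct volume (mL) and ejection fraction (%).
miv = [2.0, 4.0, 5.0, 7.0, 9.0]
ef = [64.0, 57.0, 52.0, 46.0, 37.0]
a, b, s = fit_line(miv, ef)
print(f"a = {a:.1f}, b = {b:.2f}, s = {s:.2f}")
```

The sums of cross-products and squared deviations computed here are the same quantities that appear in Pearson's correlation coefficient, which is why the two calculations share most of their intermediate steps.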

The residual SE, s, can be used to obtain estimates of the SEs of a and b and of the predicted value of the outcome variable using the formulae given in Appendix 2. These SEs can be used to perform inferences for these parameters via hypothesis tests or confidence intervals. In our example we find that the confidence interval for the slope of the regression line is (-4.3% to -2.9%). Because this interval does not include 0, we can conclude that there is an association between myocardial infarct volume and ejection fraction.

Model diagnostics—After having obtained the intercept and slope of a regression model, we need to verify whether the basic assumptions on which the model was built were satisfied. We need to evaluate whether the residuals follow a normal probability distribution, whether the variance of the residuals is constant for all values of X, and whether the relation between Y and X is linear. All of these assumptions can be verified using the following simple plots of the residuals.

Normal probability plot—A normal probability plot is used to verify whether the residuals follow a normal probability distribution. Most standard statistical software packages can be used to produce this plot. Figure 5A illustrates the ideal situation, in which the residuals do indeed follow a normal distribution and we observe a straight line along the diagonal of the plot. Any departure of the residuals from a normal distribution will show up as a deviation from this straight line. Figure 5B illustrates a case in which the residuals are skewed to the right and we observe a curved line below the diagonal. A possible corrective measure for this problem is to model the natural logarithm of the outcome instead of the outcome itself.



 
Fig. 5A Prototype normal probability plots. Graphs show plots with normally distributed residuals (A) and with residuals skewed to the right (B).

 


 
Fig. 5B Prototype normal probability plots. Graphs show plots with normally distributed residuals (A) and with residuals skewed to the right (B).

 

Scatterplot of residuals versus X—Figures 6A, 6B, and 6C are prototype scatterplots of the residuals versus the predictor variable, X. In Figure 6A, we have the ideal situation, in which the model is appropriate. The residuals are randomly scattered about the value of 0 for the entire range of X. Furthermore, the residuals fall in a horizontal band of equal width for the entire range of X, meaning that they have a constant variance. In Figure 6B, we have a situation in which the residuals indicate that the relation between outcome and predictor is nonlinear. We find that values of X that are close to its minimum or maximum are associated with positive residuals, whereas values of X in the middle of its range are associated with negative residuals. The parabolic relation between the residuals and X in this plot suggests that Y is in fact a quadratic function of X—that is, Y is a function of both X and X². In Figure 6C, we see an increase in the magnitude of the residuals with increasing X. This tells us that our assumption of a constant variance has been violated. As a result, the prediction of the outcome is better for lower values of X than for higher values.
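The nonlinear pattern described for Figure 6B can be reproduced with a small Python sketch. The data below are hypothetical and are constructed so that Y is exactly quadratic in X; the helper name fit_line is illustrative only.

```python
def fit_line(x, y):
    """Least-squares intercept a and slope b for the line y = a + b*x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    return mean_y - b * mean_x, b

# Hypothetical data in which Y is truly a quadratic function of X.
x = [1, 2, 3, 4, 5, 6, 7]
y = [xi ** 2 for xi in x]

a, b = fit_line(x, y)
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
print([round(r, 1) for r in residuals])
```

The residuals are positive near the minimum and maximum of X and negative in the middle of its range, which is the parabolic pattern that signals a missing quadratic term.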



 
Fig. 6A Graphs show prototype plots for linear regression diagnostics using residuals. In ideal situation (A), model is appropriate; in B, residuals indicate that relation between outcome and predictor is nonlinear; in C, prediction of outcome is better for lower values of X than for higher values.

 


 
Fig. 6B Graphs show prototype plots for linear regression diagnostics using residuals. In ideal situation (A), model is appropriate; in B, residuals indicate that relation between outcome and predictor is nonlinear; in C, prediction of outcome is better for lower values of X than for higher values.

 


 
Fig. 6C Graphs show prototype plots for linear regression diagnostics using residuals. In ideal situation (A), model is appropriate; in B, residuals indicate that relation between outcome and predictor is nonlinear; in C, prediction of outcome is better for lower values of X than for higher values.

 
Model fit—The usefulness of the regression model is determined by how well it predicts the outcome—that is, how well it fits the data. In the absence of information on myocardial infarct volume, our best guess at predicting the ejection fraction for patients in our sample would have been the sample mean ejection fraction ȳ—that is, the predicted value of the ejection fraction would be identical for all patients and equal to ȳ = 54.2%. This would be equivalent to assuming a = ȳ and b = 0 (the horizontal dotted line in Fig. 4) and would result in the maximum possible value for the sum of the squared residuals. A commonly used method to estimate the usefulness of a linear regression line is to compare the decrease in the sum of the squared residuals with this maximum value. This is done using the R2 statistic, which is an estimate of the proportion of the total variation in Y that is explained by X. The R2 statistic ranges from a minimum of 0% when X is not related to Y to 100% when there is a perfect relation between the two variables. In our example, we found that R2 = 82.5%, meaning that myocardial infarct volume explains 82.5% of the observed variation in the ejection fraction.
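As an illustration of this comparison, the Python sketch below computes R2 as 1 minus the ratio of the residual sum of squares to the sum of squares about the sample mean. The five data points are hypothetical and are not the study data, so the printed value will not equal 82.5%.

```python
def r_squared(x, y):
    """Proportion of the variation in y explained by a least-squares line in x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    b = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
         / sum((xi - mean_x) ** 2 for xi in x))
    a = mean_y - b * mean_x
    # Sum of squared residuals about the fitted line.
    ss_residual = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    # Sum of squared residuals about the mean (the a = mean, b = 0 model).
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_residual / ss_total

# Hypothetical infarct volumes and ejection fractions (%).
infarct_volume = [2.0, 4.0, 6.0, 8.0, 10.0]
ejection_fraction = [66.0, 61.0, 53.0, 49.0, 41.0]
print(f"R2 = {100 * r_squared(infarct_volume, ejection_fraction):.1f}%")
```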


Multiple Variable Linear Regression
 
Simple linear regression can be extended to accommodate more than one predictor variable. For example, a patient's glomerular filtration rate (GFR) can be predicted by a linear combination of the patient's age, weight, sex, and the inverse of his or her serum creatinine value by using an equation of the form:

$$\text{GFR} = \alpha + \beta_1(\text{age}) + \beta_2(\text{weight}) + \beta_3(\text{sex}) + \beta_4\left(\frac{1}{\text{serum creatinine}}\right)$$

As in the case of the simple linear regression model, the unknown parameters {alpha}, ß1, ß2, ß3, and ß4 are estimated with the objective of minimizing the sum of the squared residuals (i.e., the sum of the squared differences between the observed GFR values for each patient and the predicted values according to the regression model). We do not present the expressions for calculating the different coefficients and their confidence intervals because these are cumbersome, requiring knowledge of matrix theory. Moreover, most widely available statistical software programs can calculate these quantities. We focus instead on the interpretation of the model.

Table 1 presents the results from a hypothetical study relating the GFR to the predictor variables mentioned here among 100 patients with ages ranging from 40 to 60 years, weight ranging from 40 to 100 kg, and serum creatinine levels between 180 and 200 µmol/L. The intercept is the predicted value of the outcome in the event that all predictor variables are equal to 0. This quantity is of interest only when it is possible for all predictor variables in the model to be simultaneously equal to 0. In the example in Table 1, the intercept is not of interest because the values age = 0, weight = 0, and 1 / serum creatinine = 0 are not possible. The regression coefficients (estimates of the ß parameters) corresponding to continuous predictors are interpreted as the change in the outcome variable for a unit change in the predictor variable while the remaining predictor variables are held constant. For example, among a group of patients with a common weight, sex, and serum creatinine, an increase of 1 year in a patient's age is associated with a decrease in the GFR of 0.06 mL/min.



 
TABLE 1 : Multiple Variable Linear Regression Model for Predicting Glomerular Filtration Rate

 

Ordinal and nominal predictor variables— When including nominal predictors (e.g., variables such as sex or country of origin that have no natural ordering) or ordinal predictors (e.g., age measured in 5-year categories) in a regression model, we need to create what are called "dummy variables" or "indicator variables." To do this, we identify one of the categories of the predictor as a reference category. In the case of ordinal variables, the reference category is typically the lowest category. For example, if age is a three-category ordinal variable having values 61-65 years, 66-70 years, and 71-75 years, the 61-65 year category could be selected as the reference. In the case of nominal variables, where there is no clear ordering of the categories, any category may be arbitrarily selected as the reference. Once the reference category has been determined, we create one indicator variable for each of the remaining categories of the predictor. Each indicator variable takes the value 1 if a patient is in the category to which it corresponds and 0 otherwise. Because three categories were defined for the variable age, we need to create two indicator variables: one takes the value 1 for patients in the 66-70 year category, and the second takes the value 1 for patients in the 71-75 year category. Both indicator variables are added to the regression model as predictors.
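The coding of the three-category age variable can be sketched in Python as follows. The function name make_indicators and the "is_" prefix of the indicator names are illustrative choices, not part of any statistical package.

```python
def make_indicators(value, categories, reference):
    """Return one 0/1 indicator for every category except the reference."""
    return {f"is_{c}": int(value == c) for c in categories if c != reference}

age_categories = ["61-65", "66-70", "71-75"]

for age_group in age_categories:
    indicators = make_indicators(age_group, age_categories, reference="61-65")
    print(age_group, indicators)
```

A patient in the reference category has both indicators equal to 0, so the coefficients of the two indicators compare each older category with the 61-65 year category.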

In the example for GFR, the only noncontinuous predictor is sex. The category "male" was regarded as the reference category. Thus, the variable "sex" is an indicator for the female sex. It takes the value 1 if the patient is female and 0 if the patient is male. The regression coefficient corresponding to sex tells us that after adjusting for the effect of other predictor variables, female patients have a GFR that is 2.60 mL/min lower than that of male patients.

Inference for regression coefficients— Along with regression coefficients, we can report confidence intervals that give an idea of the uncertainty in estimating them. If the confidence interval corresponding to a predictor variable does not include 0, we conclude that the predictor is statistically significant. Alternatively, we could perform a hypothesis test based on the t distribution and report a p value, which is the probability of observing a regression coefficient at least as extreme as our estimate if its true value is 0. If the p value is smaller than a predetermined level of significance (typically 0.05 or 0.01), we reject the null hypothesis that the regression coefficient is equal to 0. The p value is obtained from the tables of the t distribution with N - k degrees of freedom (df), where N is the sample size and k is the number of parameters estimated in the regression model (the intercept plus one coefficient per predictor). In our example, we can deduce from the 95% confidence intervals that the regression coefficients corresponding to both 1 / serum creatinine and sex are significantly different from 0, and those corresponding to age and weight are not. A similar conclusion is obtained on the basis of the p values.

Model fit—The R2 statistic introduced earlier can also be used to evaluate model fit for multiple variable linear regression models. The R2 statistic is defined as the proportion of the variance in the outcome variable explained by the regression model. It ranges between 0% and 100%, with values closer to 100% indicating a better model fit. In our example for predicting GFR from age, weight, sex, and serum creatinine level, the R2 statistic was quite low, meaning that the information obtained explained only 21% of the observed variation in GFR. A low value of R2 is not unusual in real-life applications.

Model selection—When we have several candidate predictor variables, we are often faced with the challenge of choosing between different models that are based on different predictors. Besides assessing the fit of a model, the R2 statistic may also be used to compare two different models for the same outcome. Table 2 lists R2 values for different candidate multiple regression models with GFR as the outcome. The model with the highest value of R2—that is, the model that best explains the observed variation in GFR—is the model with all four predictor variables included simultaneously. In interpreting these results, it must be noted that the R2 statistic is influenced by the number of predictor variables in the model. Notice that in Table 2 the R2 statistic increases with every additional predictor added to the model. Thus, when comparing two models, the R2 statistic may simply favor the model with the greater number of predictors.



 
TABLE 2 : Comparing Different Candidate Models for Predicting Glomerular Filtration Rate

 

Besides the R2 statistic, several other criteria have been proposed for model selection. One such criterion is the Bayesian information criterion (BIC). This criterion assesses model fit while simultaneously applying a penalty for every additional predictor added. Our interest is not in the actual value of the BIC for a given model, but rather the difference in the BIC between two models. The lower the BIC, the better the fit of the model. From Table 2, we see that according to the BIC criterion, adding age to the model worsens the model fit. Although criteria such as the R2 and BIC may be used to assess model fit, the choice of which predictor variables go into a model depends also on their clinical relevance, their impact on the magnitude of regression coefficients associated with the remaining predictors, and their statistical significance.

Model validation—An important way to evaluate a model is to use it to predict the outcome in a data set that is independent of the one used to fit the regression model. This step is referred to as "model validation." Repeating the study to collect new data may not always be a feasible option because of the cost and time involved. Instead, if we have a sufficiently large sample, we may choose to split the data set into two parts—a model-building or training data set that is used to estimate the regression coefficients, and a validation data set. This is known as cross-validation [11]. The model-building data set needs to be sufficiently large to obtain the required precision in estimating the regression coefficients. If this is not possible with half the data, the model-building data set may be larger than the validation data set.
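A minimal Python sketch of this split-sample approach for a simple linear regression is shown below. The data are simulated (hypothetical), and the function names, the 50% training fraction, and the use of the root mean squared prediction error as the summary are assumptions made for illustration.

```python
import random

def fit_line(x, y):
    """Least-squares intercept a and slope b for the line y = a + b*x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    return mean_y - b * mean_x, b

def split_sample_validation(x, y, train_fraction=0.5, seed=0):
    """Fit on a random training subset; report prediction error on the rest."""
    indices = list(range(len(x)))
    random.Random(seed).shuffle(indices)
    cut = int(len(indices) * train_fraction)
    train, valid = indices[:cut], indices[cut:]
    a, b = fit_line([x[i] for i in train], [y[i] for i in train])
    squared_errors = [(y[i] - (a + b * x[i])) ** 2 for i in valid]
    rmse = (sum(squared_errors) / len(squared_errors)) ** 0.5
    return a, b, rmse

# Simulated (hypothetical) data: a linear trend plus random noise.
rng = random.Random(1)
x = [v / 4 for v in range(1, 41)]
y = [70 - 3 * xi + rng.gauss(0, 3.5) for xi in x]

a, b, rmse = split_sample_validation(x, y)
print(f"intercept={a:.2f} slope={b:.2f} validation RMSE={rmse:.2f}")
```

Raising train_fraction above 0.5 corresponds to the case in which the model-building data set is made larger than the validation data set.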

Confounding and effect modification—A multiple linear regression model allows us to study the relation between a primary predictor, X (e.g., the experimental treatment), and the outcome, Y, while adjusting for the effect of one or more secondary predictor variables (e.g., the patient's demographic characteristics). For illustration, we will consider only one secondary predictor, Z, but the concepts discussed here can be extended to the case of more than one secondary predictor. A variable Z is said to be a confounder if it is associated with both X and Y. The true relation between Y and X is not determined by Z. However, not including Z in the regression model results in an incorrect estimate of magnitude or direction of the regression coefficient of X. A variable Z is said to be an effect modifier if it affects the magnitude of the association between Y and X. To determine if Z is an effect modifier, we must add both Z and the product XZ to the regression model between Y and X. It is possible for a variable to be both a confounder and an effect modifier.



 
Fig. 7A Confounding and effect modification. Graphs illustrate confounding (A and B) and effect modification (C and D).

 



 
Fig. 7B Confounding and effect modification. Graphs illustrate confounding (A and B) and effect modification (C and D).

 



 
Fig. 7C Confounding and effect modification. Graphs illustrate confounding (A and B) and effect modification (C and D).

 



 
Fig. 7D Confounding and effect modification. Graphs illustrate confounding (A and B) and effect modification (C and D).

 
The difference between a confounder and an effect modifier is illustrated graphically in Figures 7A, 7B, 7C, and 7D. In this example, we are interested in studying the relation between the primary predictor variable, weight (kg), and the outcome variable, bone density (mass/volume units). In our hypothetical sample, the patient's sex is a variable that is associated with both the outcome (bone density) and the predictor (weight)—women tend to have a lower bone density and a lower weight than men. Figure 7A illustrates the case when sex is a confounding variable but not an effect modifier. Fitting a single regression line for both men and women that includes only weight as a predictor, we obtain a regression coefficient of 0.4 mass/volume units corresponding to weight. Fitting two separate regression lines—one among men and the other among women—we find that the slope of the two lines is the same and is equal to 0.2 mass/volume units (Fig. 7B). This is the correct value of the slope, which can also be obtained by fitting a single multiple variable regression model in the entire sample that includes both weight and sex as predictors, as follows:

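$$\text{bone density} = \alpha + 0.2\,(\text{weight}) + \beta_2\,(\text{sex})$$

where the predictor "sex" is an indicator variable for female sex, and {alpha} and ß2 denote the estimated intercept and the estimated coefficient of sex.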

Figure 7C illustrates the case when sex is an effect modifier of the relation between weight and bone density—that is, the strength of the association between weight and bone density is modified by the variable sex. This means the regression lines between bone density and weight among men and women have different slopes (see Fig. 7D). In our hypothetical example, bone density increases more rapidly with weight among men than among women. We can evaluate whether sex is an effect modifier using a single multiple variable regression model that includes weight, sex, and their product as predictors, as follows:

$$\text{bone density} = \alpha + 0.2\,(\text{weight}) + \beta_2\,(\text{sex}) - 0.15\,(\text{weight} \times \text{sex})$$

where {alpha} and ß2 denote the estimated intercept and the estimated coefficient of sex.

From this single equation we can determine the different associations between bone density and weight among men and women. By setting sex = 0 in this equation, we find that the regression coefficient associated with weight is 0.2, the same as was obtained by fitting a separate linear regression model among men. Similarly, when setting sex = 1 in the equation, we find that the regression coefficient associated with weight is 0.2 - 0.15 = 0.05 mass/volume units, which is the same as the regression coefficient obtained when fitting the model among women alone. If the regression coefficient corresponding to the product term is significantly different from 0, we conclude that there is an interaction between weight and sex.
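The way a product term produces sex-specific slopes can be sketched in Python. The weight coefficient (0.2) and product coefficient (-0.15) are taken from the text; the intercept and the coefficient of sex are hypothetical placeholders, because their values are not given here.

```python
def predicted_bone_density(weight, sex,
                           alpha=1.0,        # hypothetical intercept
                           b_sex=-0.5,       # hypothetical coefficient of sex
                           b_weight=0.2,     # from the text
                           b_product=-0.15): # from the text
    """Prediction from a model with weight, sex (1 = female), and their product."""
    return alpha + b_weight * weight + b_sex * sex + b_product * weight * sex

def weight_slope(sex):
    """Change in predicted bone density for a 1-kg increase in weight."""
    return predicted_bone_density(61, sex) - predicted_bone_density(60, sex)

print(f"slope among men:   {weight_slope(0):.2f}")
print(f"slope among women: {weight_slope(1):.2f}")
```

The slopes do not depend on the placeholder values of the intercept or the coefficient of sex, which cancel when two predictions for the same sex are subtracted.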

Logistic Regression
Logistic regression, like linear regression, can be used to relate a single outcome variable to one or more predictor variables. However, the outcome variable is dichotomous, having only two values (e.g., success or failure of an experimental treatment, survival or death at the end of a 10-year follow-up). One value of the dichotomous outcome variable must be designated as the outcome of interest—for example, success when the outcome has the values success or failure, or death if the outcome has the values death or survival. The odds of the outcome of interest are given by the ratio of the probability of observing the outcome of interest, to the probability of not observing it: probability of success / probability of failure, or probability of death / probability of survival. The logistic regression equation relates the logarithm of the odds of the outcome to the predictor variables.
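The relation between a probability, its odds, and the logarithm of the odds can be sketched in a few lines of Python. The function names and the example probability of 0.8 are illustrative.

```python
import math

def odds(probability):
    """Odds of the outcome of interest: P(outcome) / P(no outcome)."""
    return probability / (1 - probability)

def log_odds(probability):
    """Natural logarithm of the odds, the quantity modeled by logistic regression."""
    return math.log(odds(probability))

def probability_from_log_odds(value):
    """Invert the log odds to recover the probability."""
    return 1 / (1 + math.exp(-value))

p_success = 0.8  # hypothetical probability of success
print(f"odds = {odds(p_success):.2f}, log odds = {log_odds(p_success):.3f}")
print(f"recovered probability = {probability_from_log_odds(log_odds(p_success)):.2f}")
```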

In a hypothetical study, logistic regression was used to predict extremely high breast density on mammography using information on a woman's parity (i.e., number of children), body mass index (BMI), and age. Extremely high breast density was defined as a dichotomous variable taking the value 1 when a woman's breast density was greater than or equal to 75%, and taking the value 0 when a woman's breast density was less than 75%. The resulting multiple logistic regression equation had the following form:

$$\ln\left(\frac{P(\text{EHBD})}{1 - P(\text{EHBD})}\right) = \alpha + \beta_1(\text{nulliparous}) + \beta_2(\text{BMI}) + \beta_3(\text{age})$$


where ln is the logarithm to the natural base e and EHBD is extremely high breast density.

The predictor variables in a logistic regression equation may be continuous, nominal, or ordinal. As in the case of multiple linear regression, nominal and ordinal predictor variables are entered into the equation as indicator variables. In the logistic regression equation for extremely high breast density, BMI and age are both continuous variables, and nulliparous is an indicator that the woman is nulliparous.

The best estimates for the unknown parameters {alpha}, ß1, ß2, and ß3 may be obtained by a statistical method known as maximum likelihood. This method helps us identify the most likely value of the true parameters given the observed data and under the assumption that the number of patients with the outcome of interest follows a binomial distribution [2].

The relation between each predictor variable and the outcome in a logistic regression model is expressed in terms of an odds ratio (for more about odds ratios see the article by Blackmore and Cummings [4] in this series). When the predictor variable is ordinal or nominal, the odds ratio is a comparison between each indicator variable and the reference category. An odds ratio of 1 indicates there is no difference in the odds of the outcome of interest between the category associated with the indicator variable and the reference category. An odds ratio greater (less) than 1 indicates the outcome of interest is more (less) likely in the category associated with the indicator variable than in the reference category. Results for the extremely high breast density example are given in Table 3. The odds ratio of 5.53 corresponding to nulliparous tells us that the odds of extremely high breast density are (5.53 - 1) x 100 = 453% greater among women who are nulliparous compared with those who are not. For a continuous predictor variable, the odds ratio gives the relative increase (or decrease) in the odds of the outcome for a change of one unit of the predictor variable. For example, in Table 3, the odds ratio of 0.85 corresponding to BMI means that for a unit increase in the BMI, a woman's odds of extremely high breast density decrease by (1 - 0.85) x 100 = 15%. The odds ratios for all predictor variables are obtained by exponentiating the corresponding regression coefficients.
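The conversion between regression coefficients, odds ratios, and percentage changes in the odds can be sketched in Python. The odds ratios 5.53 and 0.85 are those quoted in the text; the coefficients are back-calculated from them for illustration, and the function names are assumptions.

```python
import math

def odds_ratio(coefficient):
    """Odds ratio for a one-unit change in a predictor: exp(coefficient)."""
    return math.exp(coefficient)

def percent_change_in_odds(ratio):
    """Percentage increase (positive) or decrease (negative) in the odds."""
    return (ratio - 1) * 100

# Coefficients back-calculated from the odds ratios quoted in the text.
coefficients = {"nulliparous": math.log(5.53), "BMI": math.log(0.85)}

for name, coefficient in coefficients.items():
    ratio = odds_ratio(coefficient)
    print(f"{name}: OR = {ratio:.2f}, change in odds = {percent_change_in_odds(ratio):+.0f}%")
```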



 
TABLE 3 : Logistic Regression Model for Predicting Extremely High Breast Density

 

We can test whether each regression coefficient is different from 0 using a chi-square test with 1 df. By comparing the chi-square p values in Table 3 with the traditional level of significance, {alpha} = 0.05, we conclude that the predictors nulliparous and BMI are statistically significantly associated with an extremely high breast density. Alternatively, we can report a confidence interval for the odds ratio. If the confidence interval does not include 1, then the predictor is considered statistically significant. If the confidence interval includes 1, as in the case of the predictor age in Table 3, we conclude that it is not significantly associated with the outcome.

As in linear regression, a logistic regression model can also be used to determine whether a particular predictor variable is a confounder or effect modifier. The fit of a logistic regression model may be assessed using the BIC or a statistic similar to the R2 statistic.


Sample Size Determination
 
Any well-designed research study must begin with an idea of the sample size required. An insufficient sample size might leave us with important questions unanswered. On the other hand, too large a sample size might mean an unnecessarily expensive study. The sample size required for a study is calculated so that it provides sufficient evidence to make inferences about the primary parameter(s) of interest in the study. As mentioned throughout this series, there is an increasing emphasis in scientific journals on reporting of confidence intervals rather than p values. Thus, for this article we will limit ourselves to sample size formulae that are suitable for studies having the objective of reporting a confidence interval for the primary parameters of interest. Furthermore, we focus on sample size calculations for Pearson's correlation coefficient and simple linear regression. Sample size formulae for multiple variable linear regression and logistic regression are available but involve complex methods and are typically implemented by software programs [14]. These programs also provide calculations for studies in which the primary objective is to test a null hypothesis and report a p value.

General Concepts for Sample Size Calculation
Whatever the parameter of interest, certain concepts remain common to the exercise of sample size calculation.

First, the sample size calculation requires a guess value for the parameter of interest (e.g., correlation coefficient or the slope of a regression model) and parameters of its probability distribution (e.g., SE of the slope). This is rather paradoxical because the goal of the study is to find out more about this parameter. However, some reasonable range of guess values for the parameter can usually be found from the literature.

Second, identify a clinically meaningful range of values for this parameter.

Sample Size for Pearson's Correlation Coefficient
Assume we want to perform a study whose goal is to measure the correlation between the ratings of two experienced radiologists on a series of mammograms. Based on an earlier pilot study, our guess value for the correlation coefficient is {rho}P = 0.85. A sufficiently high correlation is deemed to be in the order of 0.8-0.9. Any value less than this is considered poor correlation. Ideally, we would like our research study to unequivocally determine whether the true correlation between the reviewers is sufficiently high. This means we would like our sample size to be large enough to ensure that the confidence interval lies entirely within or below the range 0.8-0.9—that is, the half-width of the confidence interval (or precision of our estimate) should be a maximum of 0.85 - 0.8 = 0.9 - 0.85 = 0.05. The calculation of the confidence interval requires the transformation of the correlation coefficient, {rho}P, into

$$Z_P = \frac{1}{2}\ln\left(\frac{1+\rho_P}{1-\rho_P}\right)$$

(see Appendix 1). Therefore, we need to determine the maximum permissible value of the confidence interval half-width on the transformed scale. To do this, we transform both the guess value of the correlation coefficient and the lower end of the confidence interval and calculate their difference. The maximum permissible half-width of the transformed confidence interval, w, is given by

$$w = \frac{1}{2}\ln\left(\frac{1+0.85}{1-0.85}\right) - \frac{1}{2}\ln\left(\frac{1+0.80}{1-0.80}\right) = 1.2562 - 1.0986 = 0.1576$$

Because the SD of the transformed coefficient is 1/√(n - 3) (see Appendix 1), the sample size required to obtain a 100(1-{alpha})% confidence interval is then calculated as

$$n = \left(\frac{Z_{1-\alpha/2}}{w}\right)^2 + 3$$

where Z1-{alpha}/2 is the (1-{alpha}/2) quantile of the standard normal distribution. Thus, to obtain a 95% confidence interval for our study, we would need a sample size of approximately

$$n = \left(\frac{1.96}{0.1576}\right)^2 + 3 \approx 158$$
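The sample size calculation for the radiologist example can be sketched in Python. The sketch assumes the transformation Z = 0.5 ln[(1 + r)/(1 - r)] and an SD of 1/sqrt(n - 3) on the transformed scale, which agrees with the value 0.19 reported in Appendix 1 for n = 30; the function names are illustrative.

```python
import math
from statistics import NormalDist

def fisher_z(r):
    """Transform a correlation coefficient: 0.5 * ln((1 + r) / (1 - r))."""
    return 0.5 * math.log((1 + r) / (1 - r))

def sample_size_for_correlation(guess, limit, confidence=0.95):
    """Smallest n whose transformed CI half-width is at most fisher_z(guess) - fisher_z(limit)."""
    half_width = abs(fisher_z(guess) - fisher_z(limit))
    z_quantile = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Solve z_quantile / sqrt(n - 3) <= half_width for n, then round up.
    return math.ceil((z_quantile / half_width) ** 2 + 3)

# Guess value 0.85; the interval should not extend below 0.80.
print(sample_size_for_correlation(0.85, 0.80))
```

Under these assumptions the calculation gives a sample size of 158 mammograms; a wider permissible half-width would reduce this number substantially.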

Sample Size for the Slope of a Simple Linear Regression Model
Sample size calculation for the simple linear regression model typically focuses on determining whether the slope is different from 0. The required sample size can be obtained using the same approach as that given in this article for the correlation coefficient, by exploiting the fact that a slope of 0 in a simple linear regression equation is equivalent to a correlation of 0 between the predictor and outcome variables. Suppose we plan to study the relation between renal length as measured by sonography (predictor) and GFR (outcome) via simple linear regression. Suppose also that a smaller pilot study of the relation between these variables had reported a correlation coefficient of 0.3 (-0.2 to 0.8). To conclusively show a relation between the two variables, we would like the confidence interval to lie within 0.1-0.5 (i.e., to eliminate 0). The required sample size can be calculated using the methods described earlier for Pearson's correlation coefficient.


Conclusion
 
This article describes some of the most common statistical methods used by radiologists to evaluate the relation between variables. The article stresses the interpretation of these statistics and describes formulae to implement some of the simpler methods. Although it is unlikely that readers will actually perform these calculations by hand because they are all available in standard statistical packages, our aim in discussing them is to give the interested reader a better understanding of the motivation behind the statistical methods. Because of limited space we can only scratch the surface of many of the topics under regression models. More details on the topics discussed here may be found in introductory [7-9] and advanced [10-13] textbooks.


APPENDIX 1. Inference for Pearson's Correlation Coefficient (rP)
 
p value
To calculate the p value, we need to transform rP as follows:

$$Z_P = \frac{1}{2}\ln\left(\frac{1+r_P}{1-r_P}\right)$$

where ln is the natural logarithm. This transformation is required because even though X and Y may follow a normal distribution, rP does not. However, ZP is known to follow a normal distribution with a standard deviation

$$\sigma_Z = \frac{1}{\sqrt{n-3}}$$

making the calculation of the p value and confidence intervals easier. The remaining steps involved in calculating a p value are explained in the box below.


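Compute the test statistic

$$z = \frac{Z_P - Z_0}{\sigma_Z}$$

where Z0 is the value obtained by applying the same transformation to {rho}0, the correlation under the null hypothesis.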

The rule for estimating the p value depends on the alternative hypothesis HA as follows (see [3] for more on hypothesis testing):

When HA: {rho}P > {rho}0, the p value is given by the probability P(Z≥z).

When HA: {rho}P < {rho}0, the p value is given by the probability P(Z≤z).

When HA: {rho}P != {rho}0, the p value is given by the probability P(|Z|≥|z|).

The p value is calculated by comparing the test statistic with the tables of the normal distribution. Typically, if the p value is less than a predetermined level of significance, such as 0.05 or 0.01, the null hypothesis is rejected in favor of the alternative.

 

Recall that in our example of myocardial infarct volume and ejection fraction, the correlation coefficient for the entire sample of n = 30 patients was rP = -0.91. To test the hypothesis "there is no relation between myocardial infarct volume and ejection fraction"—that is, H0: {rho}P = 0—we begin by calculating the test statistic. First transform rP into

$$Z_P = \frac{1}{2}\ln\left(\frac{1-0.91}{1+0.91}\right) = -1.53$$

Then transform {rho}0 into

$$Z_0 = \frac{1}{2}\ln\left(\frac{1+0}{1-0}\right) = 0$$

Finally, calculate the SD of ZP as

$$\sigma_Z = \frac{1}{\sqrt{30-3}} = 0.19$$

Using these three quantities, the test statistic can now be calculated as z = (ZP - Z0)/{sigma}Z = (-1.53 - 0)/0.19 = -8.05. The p value for testing the null hypothesis against the alternative hypothesis "there is a relation between myocardial infarct volume and ejection fraction" (that is, HA: {rho}P != 0) is equal to P(|Z|≥|-8.05|). This is the probability that a variable following a standard normal distribution is less than -8.05 or greater than 8.05. From the normal distribution tables, we find that this probability is less than 0.0001. See module 10 in this series [2] for an explanation of how to use the tables of the normal distribution.

Confidence interval
As in the case of the p value, to construct a confidence interval for {rho}P we first need to transform rP into ZP. The lower (lZ) and upper (uZ) limits of the 100(1-{alpha})% confidence interval on the transformed scale are given by (lZ = ZP - Z1-{alpha}/2 {sigma}Z, uZ = ZP + Z1-{alpha}/2 {sigma}Z), where {sigma}Z is the previously defined SD of ZP, and Z1-{alpha}/2 is the (1-{alpha}/2) quantile of the standard normal distribution. The latter is the point below which the area under the normal distribution curve is equal to 1-{alpha}/2. We then retransform these limits to obtain the 100(1-{alpha})% confidence interval for {rho}P as (l = [exp(2lZ) - 1]/[exp(2lZ) + 1], u = [exp(2uZ) - 1]/[exp(2uZ) + 1]). In our example of myocardial infarct volume and ejection fraction, we can use the previously calculated values of ZP and {sigma}Z to obtain a 95% confidence interval on the transformed scale as (lZ = -1.53 - 1.96[0.19], uZ = -1.53 + 1.96[0.19]) = (-1.90 to -1.16). The value Z1-{alpha}/2 = 1.96 is obtained from the normal distribution table. On retransformation, we obtain the limits of the 95% confidence interval for {rho}P as

$$l = \frac{\exp(2 \times -1.90) - 1}{\exp(2 \times -1.90) + 1} = -0.96, \qquad u = \frac{\exp(2 \times -1.16) - 1}{\exp(2 \times -1.16) + 1} = -0.82$$
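The p value and confidence interval calculations of this appendix can be sketched together in Python. The sketch assumes the transformation Z = 0.5 ln[(1 + r)/(1 - r)], which is the inverse hyperbolic tangent and is the inverse of the retransformation given above, and an SD of 1/sqrt(n - 3). Because the code does not round intermediate values, its test statistic (about -7.9) differs slightly from the hand calculation of -8.05. The function name is illustrative.

```python
import math
from statistics import NormalDist

def correlation_inference(r, n, rho0=0.0, confidence=0.95):
    """Two-sided test of rho = rho0 and a confidence interval for rho."""
    z_r = math.atanh(r)        # equals 0.5 * ln((1 + r) / (1 - r))
    z_0 = math.atanh(rho0)
    sd = 1 / math.sqrt(n - 3)  # SD on the transformed scale
    z = (z_r - z_0) / sd
    p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
    quantile = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # tanh(v) equals (exp(2v) - 1) / (exp(2v) + 1), the retransformation.
    lower = math.tanh(z_r - quantile * sd)
    upper = math.tanh(z_r + quantile * sd)
    return z, p_two_sided, (lower, upper)

z, p_value, (lower, upper) = correlation_inference(r=-0.91, n=30)
print(f"z = {z:.2f}, p < 0.0001: {p_value < 0.0001}")
print(f"95% CI: {lower:.2f} to {upper:.2f}")
```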



APPENDIX 2. Inference for the Simple Linear Regression Model
 
Standard errors (SEs) for the intercept and slope of the simple linear regression model, and expressions for calculating the p value and confidence interval for these parameters, are given in the box below:
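
For reference, the standard textbook forms of these quantities are sketched here. The symbols a and b for the estimated intercept and slope follow the text; the residual SD symbol s_{y|x}, the intercept SE symbol s_a, and the layout are our assumptions and may differ from the notation used in the box.

```latex
\begin{aligned}
s_{y|x} &= \sqrt{\frac{\sum_{i=1}^{N} (y_i - a - b x_i)^2}{N-2}} \\[4pt]
s_a &= s_{y|x} \sqrt{\frac{1}{N} + \frac{\bar{x}^2}{\sum_{i=1}^{N} (x_i - \bar{x})^2}},
\qquad
s_b = \frac{s_{y|x}}{\sqrt{\sum_{i=1}^{N} (x_i - \bar{x})^2}} \\[4pt]
t_a &= \frac{a - 0}{s_a}, \qquad t_b = \frac{b - 0}{s_b}
\qquad \text{(each compared with the } t \text{ distribution with } N-2 \text{ df)} \\[4pt]
&a \pm t_{1-\alpha/2,\,N-2}\; s_a, \qquad b \pm t_{1-\alpha/2,\,N-2}\; s_b
\end{aligned}
```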




Typically, we are more interested in the slope than in the intercept. A natural null hypothesis of interest is H0: β = 0. The SE of the slope in our example is given by sb = 0.32. See the table in this appendix for an illustration of how to calculate sb in a smaller sample of five patients. Note that the results there are slightly different from those in this section because they are based on a different sample. Using the formula in the box, the test statistic can be calculated as tb = (b - 0)/sb = -11.38 (the rounded values b = -3.6 and sb = 0.32 give -11.25; the value -11.38 used below presumably reflects unrounded estimates).

As in the case of the correlation coefficient, the p value that we report depends on the direction of the alternative hypothesis. If the alternative hypothesis is HA: β ≠ 0, then the p value is given by P(|tN-2| ≥ |tb|), that is, the probability that the standard t distribution with N - 2 = 28 degrees of freedom (df) takes values less than or equal to -|tb| = -11.38 or greater than or equal to |tb| = 11.38. (Recall that N is our sample size of 30. See [3] for more details on the t distribution.) Looking up the t distribution tables corresponding to N - 2 = 30 - 2 = 28 df, we find that this probability is less than 0.001. Because this probability is much less than the traditional significance levels of 0.05 or 0.01, we reject the null hypothesis and conclude that there is a relation between ejection fraction and myocardial infarct volume.
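
The table lookup can be approximated numerically. The sketch below uses only the Python standard library and integrates the t density with Simpson's rule; it is an illustration rather than the method used in the article, and the function names t_pdf and two_sided_t_p are ours. A statistical package would normally be used instead.

```python
import math

def t_pdf(t: float, df: int) -> float:
    """Density of the standard t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(t * t / df))

def two_sided_t_p(t_stat: float, df: int, steps: int = 20000) -> float:
    """Approximate P(|T| >= |t_stat|) as 1 minus the area on [-|t|, |t|]."""
    limit = abs(t_stat)
    h = 2 * limit / steps  # steps must be even for Simpson's rule
    total = t_pdf(-limit, df) + t_pdf(limit, df)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2
        total += weight * t_pdf(-limit + i * h, df)
    central_area = total * h / 3
    # Guard against tiny negative values caused by rounding error.
    return max(0.0, 1.0 - central_area)

# Test statistic and degrees of freedom from the worked example.
p_slope = two_sided_t_p(-11.38, 28)
print(f"two-sided p value below 0.001: {p_slope < 0.001}")
```

For a p value this small the subtraction from 1 loses most of its precision, so the sketch is only suitable for confirming that the probability lies below a threshold, not for reporting its exact size.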

Alternatively, we could construct a 95% confidence interval for the slope. As mentioned previously, this is more informative than simply reporting whether we did or did not reject a single null hypothesis. The term "t1-α/2, N-2" in the formula in the box denotes the 1-α/2 quantile of the t distribution with 28 df (i.e., the point on the standard t distribution below which there is a 1-α/2 probability). For a 95% confidence interval, we have α = 1 - 0.95 = 0.05. The value of t1-α/2, N-2 = t0.975,28 = 2.05. For our example, we have already calculated b = -3.6% and sb = 0.32. Thus, the 95% confidence interval is given by b ± t0.975,28 × sb = -3.6 ± 2.05(0.32) = (-4.26% to -2.94%).

This interval gives us an idea of the range of values of the slope that is compatible with the data and cannot be rejected by a hypothesis test. Because the interval does not include 0, we can conclude that there is a negative relation between ejection fraction and myocardial infarct volume.
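
The calculation of the slope, its SE, and the confidence interval can be sketched as follows. The five data points are hypothetical values chosen for illustration and are not the five-patient sample from the article's table; the function names are also ours. The confidence interval at the end uses the rounded summary values quoted in the text.

```python
import math

def fit_simple_regression(x, y):
    """Least-squares intercept a, slope b, and SE of the slope s_b."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = y_bar - b * x_bar
    # Residual SD uses N - 2 df because two parameters are estimated.
    sse = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    s_res = math.sqrt(sse / (n - 2))
    s_b = s_res / math.sqrt(sxx)
    return a, b, s_b

def slope_ci(b, s_b, t_crit):
    """Confidence interval b +/- t_crit * s_b for the slope."""
    return b - t_crit * s_b, b + t_crit * s_b

# Hypothetical data: infarct volume (mL) and ejection fraction (%).
volume_ml = [1, 2, 3, 4, 5]
ejection_pct = [66, 63, 59, 56, 52]
a, b, s_b = fit_simple_regression(volume_ml, ejection_pct)
print(f"hypothetical sample: a = {a:.1f}, b = {b:.1f}, s_b = {s_b:.2f}")

# Rounded summary values from the article: b = -3.6, s_b = 0.32, t = 2.05.
ci_low, ci_high = slope_ci(-3.6, 0.32, 2.05)
print(f"95% CI for the slope: ({ci_low:.2f} to {ci_high:.2f})")
```

Passing the critical value in as an argument keeps the sketch free of any dependence on a statistics library; the value 2.05 is the t0.975,28 quantile taken from the text.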

For a given value of myocardial infarct volume, our simple linear regression model may also be used to predict the ejection fraction for an average patient or to predict the ejection fraction for an individual patient. The SEs for the predicted mean ejection fraction and for an individual's ejection fraction are as follows:

SE for predicted mean outcome at x

SE for predicted individual outcome at x
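
The standard textbook forms of these two SEs are sketched below. The symbols s_{M,x} and s_{I,x} follow the notation used later in this appendix; the residual SD symbol s_{y|x} is our assumption.

```latex
\begin{aligned}
s_{M,x} &= s_{y|x} \sqrt{\frac{1}{N} + \frac{(x - \bar{x})^2}{\sum_{i=1}^{N} (x_i - \bar{x})^2}} \\[4pt]
s_{I,x} &= s_{y|x} \sqrt{1 + \frac{1}{N} + \frac{(x - \bar{x})^2}{\sum_{i=1}^{N} (x_i - \bar{x})^2}}
\end{aligned}
```

Written this way, the two quantities satisfy s_{I,x}^2 = s_{y|x}^2 + s_{M,x}^2, so the SE for an individual combines the residual variation with the uncertainty in the estimated mean.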

Notice that these two SEs are very similar, except that an additional 1 appears in the term under the square root for the SE of the predicted outcome for an individual. This causes the SE of the predicted outcome for a single individual to always be greater than the SE of the predicted outcome for an average individual, because individual outcomes vary around the average outcome. In our example, sM,2 = 1.14 and sI,2 = 3.71. The predicted value of the outcome when the predictor is equal to x is denoted by ŷx. The predicted average ejection fraction corresponding to a myocardial infarct volume of 2 mL (denoted by ŷ2) can be calculated using the regression equation as 70 - 3.6(2) = 62.8%. The expression for a 100(1-α)% confidence interval for the average ejection fraction is ŷx ± t1-α/2, N-2 × sM,x.

Recall that we had determined from the tables of the t distribution that t0.975,28 is 2.05. Thus, the 95% confidence interval for the predicted mean ejection fraction when myocardial infarct volume = 2 mL is given by 62.8 ± t0.975,28 × sM,2 = 62.8 ± 2.05(1.14) = (60.5% to 65.1%).



The confidence interval for an individual's ejection fraction when myocardial infarct volume is 2 mL is obtained by replacing the SE in this expression by sI,x, that is, by 62.8 ± t0.975,28 × sI,2 = 62.8 ± 2.05(3.71) = (55.2% to 70.4%).
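
The two intervals at a myocardial infarct volume of 2 mL can be checked with a short sketch. The function name interval is hypothetical, and all inputs are the rounded values quoted in this appendix (intercept 70, slope -3.6, SEs 1.14 and 3.71, critical value 2.05).

```python
def interval(center: float, se: float, t_crit: float):
    """Symmetric interval center +/- t_crit * se."""
    half_width = t_crit * se
    return center - half_width, center + half_width

# Predicted ejection fraction (%) at an infarct volume of 2 mL.
predicted = 70 - 3.6 * 2

# t_{0.975, 28} = 2.05, s_{M,2} = 1.14, s_{I,2} = 3.71 (values from the text).
mean_low, mean_high = interval(predicted, 1.14, 2.05)
indiv_low, indiv_high = interval(predicted, 3.71, 2.05)

print(f"predicted: {predicted:.1f}%")
print(f"mean:       ({mean_low:.1f}% to {mean_high:.1f}%)")
print(f"individual: ({indiv_low:.1f}% to {indiv_high:.1f}%)")
```

The interval for an individual is roughly three times as wide as the interval for the mean, reflecting the larger SE.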




References

  1. Karlik SJ. Exploring and summarizing radiologic data. AJR 2003; 180:47-54
  2. Joseph L, Reinhold C. Introduction to probability theory and sampling distributions. AJR 2003; 180:917-923
  3. Joseph L, Reinhold C. Statistical inference for continuous variables. AJR 2005; 184:1047-1056
  4. Blackmore C, Cummings P. Observational studies in radiology. AJR 2004; 183:1203-1208
  5. Hennekens CH, Buring E. Epidemiology in medicine. Boston, MA: Lippincott, Williams & Wilkins, 1987
  6. Bland JM, Altman DG. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet 1986; 1:307-310
  7. Moore DS, McCabe GP. Introduction to the practice of statistics, 3rd ed. New York, NY: Freeman, 1998
  8. Glantz SA. Primer of biostatistics, 5th ed. New York, NY: McGraw-Hill, 2001
  9. Dawson B, Trapp RG. Basic and clinical biostatistics, 3rd ed. New York, NY: McGraw-Hill Lange Medical Series, 2001
  10. Harrell F. Regression modeling strategies: with applications to linear models, logistic regression, and survival analysis, 1st ed. New York, NY: Springer-Verlag, 2001
  11. Kleinbaum DG, Kupper LL, Muller KE, Nizam A. Applied regression analysis and multivariable methods, 3rd ed. Pacific Grove, CA: Duxbury Press, 1998
  12. Hosmer D, Lemeshow S. Applied logistic regression, 2nd ed. New York, NY: Wiley, 2000
  13. Kleinbaum DG. Logistic regression: a self-learning text, 2nd ed. New York, NY: Springer-Verlag, 2002
  14. Hintze JL. PASS [power analysis and sample size] user's guide. Kaysville, UT: NCSS [Number Cruncher Statistical System], 1996
