Original Research
1 Department of Medicine, Section of Cardiology (M/C 715), University of
Illinois at Chicago College of Medicine, Chicago, IL 60612.
2 Department of Health Sciences, University of York, Heslington, York YO10 5DD,
United Kingdom.
Received October 11, 2004;
accepted after revision December 14, 2004.
Address correspondence to A. B. Sevrukov
(sevrukova{at}mir.wustl.edu).
Abstract
MATERIALS AND METHODS. We assembled a convenience sample of 2,217 pairs of repeated electron beam CT coronary calcium scans acquired in quick succession. Each scan consisted of forty 100-msec, 3-mm sections obtained at 60% of the ECG R-R interval. A single observer quantified calcium in each scan independent of knowledge of calcium quantity in the repeated scan. We then modeled a relationship between the variation of the differences between repeated measurements of calcium and the magnitude of the calcium score and formulated 95% repeatability coefficient equations for the Agatston and volumetric CAC score. The equations allow determining the smallest statistically significant interval change in the calcium score between two serial measurements in a given subject.
RESULTS. In a subject with measurable CAC at baseline, the smallest statistically significant interval change is ± (4.930 x square root of baseline Agatston CAC score) or ± (3.445 x square root of baseline volumetric CAC score). In a subject with no measurable CAC at baseline, a follow-up CAC score exceeding 11.6 Agatston units or 9.5 mm3 qualifies for statistically significant progression. The results were similar in men and women.
CONCLUSION. By examining repeatability of quantitative electron beam CT measurements of coronary calcium as a function of the magnitude of the calcium score, we developed a model to determine the smallest statistically significant change between serial measurements in a given subject.
Introduction
To answer these questions, we must first estimate with an appropriate statistical model the agreement between repeated quantitative electron beam CT measurements of CAC, also called repeatability [3]. The International Organization for Standardization defines repeatability as "precision under repeatability conditions" where independent repeated measurements are obtained by the same observer with the same method in the same laboratory using the same equipment within short intervals of time [4]. The need to consider agreement arises because repeated CAC measurements do not, in general, yield identical results because of unavoidable random errors inherent in every measurement procedure [4]. In quantitative electron beam CT of CAC, the scanner and heart rate variability are two major causes of the measurement error [5, 6].
Much of the existing literature on the measurement error of quantitative electron beam CT of CAC has focused on estimating the average agreement between repeated measurements across the entire range of CAC scores or in arbitrary subgroups within the range [7-16]. However, in day-to-day clinical practice, these average estimates of repeatability do not help us to interpret an individual change in the CAC score between serial measurements. This is because the size of the measurement error increases with the magnitude of CAC [17-19]. In the practical interpretation of CAC score changes between serial measurements in a given subject, this relationship has to be taken into account [4].
A study of agreement between repeated quantitative electron beam CT CAC measurements by Bielak et al. [17] was the first to account for the relationship between the differences in repeated CAC measurements and the magnitude of CAC. They applied the regression method of Bland and Altman [3] but chose a linear regression model that fitted the data poorly, thus overestimating agreement between repeated CAC measurements for small CAC scores and underestimating it for large CAC scores. In a similar study, Hokanson et al. [18] accounted for the relationship between the differences and the magnitude of CAC by square root transformation of the CAC score. Although square root transformation could in principle be used in this context, it has little practical value because estimates of agreement derived from square root transformed data cannot be back-transformed to give estimates for the actual measurements; only the logarithmic transformation allows the results to be interpreted in relation to the original data [20, 21]. In fact, the way the researchers presented their results, giving units of cubic millimeters (mm3) for the square root of the CAC volumetric score (which would have units of mm^(3/2)), may be misleading and easily overlooked by clinical scientists.
Use of logarithmic (log) transformation of quantitative electron beam CT measurements of CAC is complicated because we cannot log zero CAC scores. Because of the complexities in removing the relationship between the differences in repeated CAC measurements and the magnitude of CAC, we approached this problem by modeling this relationship with a more general statistical method that uses natural CAC score units.
In this article, we are concerned with determining the smallest statistically significant difference between repeated measurements at a given magnitude of the Agatston [22] and volumetric [9, 23] CAC scores. If the observed change in the subject's CAC score between two serial measurements is greater than the smallest statistically significant difference between repeated measurements for that magnitude of CAC, then one can be 95% certain that the change is too large to be explained by the measurement error alone. In other words, the observed interval change in the CAC score can be judged statistically significant. We estimate repeatability of quantitative electron beam CT of CAC by use of 95% repeatability limits and a regression approach for nonuniform differences that models the relationship between the differences and the magnitude of CAC in natural CAC score units.
Materials and Methods
Both repeated scans in each pair were displayed side-by-side on an image-analysis workstation monitor and scrutinized for visible ECG trigger errors, respiratory motion artifacts, and incomplete coverage, which served as exclusion criteria. Because coronary artery motion artifacts (particularly in the right coronary artery) are responsible for a considerable portion of the measurement error, which was the focus of our study, we intentionally did not use these artifacts as a basis for exclusion. After this selection procedure, our final study sample consisted of 2,217 pairs of electron beam CT CAC scans, of which 1,598 were obtained in men with a mean age of 51 years (SD, 9.5) and 619 in women with a mean age of 55 years (SD, 9.5).
Our sampling procedure was likely to yield a sample that was not representative of the population of interest in regard to demographics and coronary heart disease risk factors, which are both related to the subject's magnitude of CAC [25]. CAC score repeatability, however, is primarily and directly determined by the magnitude of CAC; to ensure the validity of our study results, it was important to make certain that our sample had a substantial range of CAC scores. Among the 2,217 subjects, there were 1,153 subjects (52%) with no detectable CAC and 1,064 subjects (48%) with some CAC on the first scan. The number of subjects with no detectable CAC was equal between two repeated scans. The median positive CAC score at the first scan was 63 Agatston units (interquartile range, 17.8-191) and 49.5 mm3 (interquartile range, 15.3-146.8). The median positive CAC score at the second scan was 62.2 Agatston units (interquartile range, 17.4-191.6) and 48.2 mm3 (interquartile range, 15.1-150.2).
Electron Beam CT Protocol and Quantitative Measurement of CAC
All electron beam CT CAC studies were performed using the same scanner
(C-150XL Imatron, GE Healthcare) and the same scan protocol. Two sets of
100-msec images were acquired using ECG triggering at 60% of the ECG R-R
interval with a 1- to 2- min interval, each in a single breath-hold. Each set
consisted of forty 3-mm tomographic sections covering the entire coronary
tree. Images were reconstructed to a 512-pixel matrix using a sharp kernel and
a 26-cm display field of view (0.5-mm pixel size). They were transferred to an
image-analysis workstation (NetraMD, ScImage, Inc.) for quantitative
measurement of CAC. Patient-identifying information was removed from all 4,434
individual scans (2,217 x 2), and their list was randomized before
measuring CAC. This ensured that repeated scans did not appear in pairs. In
accordance with the ISO 5725-1 standard established by the International
Organization for Standardization
[4], a single observer
quantified CAC in each scan independent of knowledge of CAC quantity in the
repeated scan. At least 3 adjacent pixels (total area of 0.75 mm2)
with an attenuation value of 130 H or greater were required to define a CAC
plaque. CAC was measured by use of the Agatston
[22] and volumetric
[9,
23] methods and expressed in
Agatston score units and cubic millimeters (mm3), respectively. The
total CAC score for each scan in a pair was the sum of all the lesions scored
within the left main, left anterior descending, left circumflex, and right
coronary arteries.
Repeatability Analysis
We estimated repeatability in our sample by evaluating the agreement
between repeated quantitative electron beam CT measurements of CAC. When
examining repeatability, we expect the mean difference (bias or systematic
difference) between repeated measurements to be zero because the same method
was used. Repeatability depends only on the distribution of random measurement
errors. The SD of repeated measurements, also called repeatability SD, enables
us to measure the size of the measurement error
[4].
...the value less than or equal to which the absolute difference between two test results obtained under repeatability conditions may be expected to be with a probability of 95%.
In this article, we use the term "repeatability coefficient" to describe the absolute difference, removing the sign, between two repeated measurements. We use the term "repeatability limit" (as in upper repeatability limit or lower repeatability limit) to describe the upper and lower boundaries of the difference between two repeated measurements.
The difference between two repeated measurements is expected to be less than 1.96 SD for 95% of paired observations [20]. This is the 95% repeatability coefficient. Hence, if the difference between two serial CAC scores in a given subject is greater than the estimated 95% repeatability coefficient for this subject's magnitude of CAC, then we can be 95% certain that the change is too large to be explained by the measurement error alone. Therefore, the 95% repeatability coefficient represents the smallest statistically significant change between serial measurements of CAC in a given subject. In practice, we would use the observed value of the baseline CAC score as an estimate of the magnitude of CAC to give 95% limits within which a follow-up CAC score would be expected to lie.
Agreement between repeated measurements is a question of estimation, not hypothesis testing. Because estimates are usually made with some sampling error, we calculated 95% confidence intervals for the repeatability limit estimates to give an indication of the precision of these estimates.
In quantitative electron beam CT of CAC, the size of the measurement error
tends to increase as the magnitude of the CAC score increases
[17,
18]. Thus, repeatability of
this method can be defined as a function of the size of the measurement. We
use a regression approach for nonuniform differences proposed by Bland and
Altman [3,
20,
26,
27]. This approach is based on
simple calculations and graphical techniques; it does not involve the creation
of arbitrary CAC groups and makes no additional assumptions. The method
involves modeling the variation of the differences (D) between repeated
measurements directly as a function of the magnitude of the measurement, which
is estimated by the average (A) of the two CAC measurements
[20,
26]. We take the absolute
values of D, removing the sign (|D|), and regress these absolute
differences on A, our best estimate of the magnitude of CAC for that subject.
Because repeatability depends only on the distribution of random measurement
errors, D follows a normal distribution with mean zero at all levels of A, and
|D| will have a half-normal distribution, the right-hand half of
the normal distribution. This has a mean equal to the SD of D
multiplied by $\sqrt{2/\pi}$. Hence, if we take the
mean value of |D| and multiply by $\sqrt{\pi/2}$, we get the SD of the differences
[26]. We then estimate the 95%
repeatability coefficient for any given magnitude of CAC. The half-normal
method provides a powerful and simple method for estimating measurement error
that is neither constant nor proportional to magnitude.
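The half-normal conversion described above can be checked numerically. The following is a minimal sketch using simulated, zero-mean normal differences; the true SD of 10, the sample size, and the function name `sd_from_mean_abs_diff` are illustrative assumptions, not values or code from the study.

```python
import math
import random


def sd_from_mean_abs_diff(abs_diffs):
    """Estimate the SD of zero-mean normal differences D from the values |D|.

    The mean of a half-normal variable is SD * sqrt(2/pi), so multiplying the
    mean absolute difference by sqrt(pi/2) recovers the SD.
    """
    mean_abs = sum(abs_diffs) / len(abs_diffs)
    return mean_abs * math.sqrt(math.pi / 2)


# Simulated differences with an assumed true SD (illustrative, not study data).
random.seed(0)
true_sd = 10.0
abs_diffs = [abs(random.gauss(0.0, true_sd)) for _ in range(100_000)]

estimated_sd = sd_from_mean_abs_diff(abs_diffs)
repeatability_coefficient = 1.96 * estimated_sd  # 95% repeatability coefficient
print(round(estimated_sd, 2), round(repeatability_coefficient, 2))
```

In the study itself the SD is not constant, so the mean of |D| is modeled by regression on the magnitude of CAC rather than computed as a single average.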
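Results
Measurement Error for the Agatston Score
[Equation 1: fractional polynomial regression of |D| on A (fitted equation missing from the source)]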
Inspection of the equation suggested that the main predicting variable was
a square root transform of A, and we could omit the final term and fit a
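simple function of $\sqrt{A}$. Regression of
|D| on $\sqrt{A}$ gave
[Equation 2: linear regression of |D| on $\sqrt{A}$ with a constant term (fitted coefficients missing from the source)]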
The residual mean square was 186.96 and compared favorably with 183.44 for
fractional polynomial regression of |D| on A. Because the
residual variance is almost the same, we can retain the simple formulation. To
avoid negative SD at small magnitudes of A, we tried forcing the constant term
in the regression equation to be zero. The resulting residual variance was
187.53, that is, essentially the same as previously obtained values;
therefore, we decided to use regression of |D| on $\sqrt{A}$
with the constant set to zero, giving
$$\text{mean } |D| = b\sqrt{A}, \quad b \approx 2.007 \qquad (3)$$
The SD of the differences is found by multiplying the mean value of
|D| by $\sqrt{\pi/2}$, giving $\mathrm{SD} \approx 2.515\sqrt{A}$.
The difference between two repeated measurements on the same subject is
expected to be within 1.96 SD for 95% of observations. The resulting 95%
repeatability coefficient (r) for the Agatston score can be expressed
with the following equation:
$$r = 1.96 \times \mathrm{SD} = 4.930\sqrt{A} \qquad (4)$$
Thus for any subject, we expect two repeated measurements to be less than
$4.930\sqrt{A}$ apart with a probability of 95%
[19]. To use the equation, we
simply put in the observed CAC score. For example, if we measure CAC to be 100
Agatston units, we get the repeatability coefficient of
$4.930 \times \sqrt{100} = 49.3$. Hence, a repeated measurement of CAC
in the same subject would be between 51 and 149 Agatston units with a 95%
probability. For a subject with a baseline CAC score of 100 Agatston units who
undergoes a follow-up study months or years later, a change in the CAC score
greater than 49 units would be statistically significant: A follow-up CAC
score greater than 149 units would indicate statistically significant CAC
progression, and a follow-up CAC score less than 51 units would indicate
statistically significant CAC regression.
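The worked example for a baseline score of 100 Agatston units can be reproduced with a few lines of code. This is a sketch only: the function name `repeatability_limits` is hypothetical, and the coefficient 4.930 is taken from equation 4.

```python
import math

AGATSTON_COEFFICIENT = 4.930  # coefficient from equation 4


def repeatability_limits(baseline_score, coefficient=AGATSTON_COEFFICIENT):
    """Return (r, lower limit, upper limit) for a baseline CAC score.

    r is the 95% repeatability coefficient, coefficient * sqrt(score).
    """
    r = coefficient * math.sqrt(baseline_score)
    return r, baseline_score - r, baseline_score + r


r, lower, upper = repeatability_limits(100)
print(f"r = {r:.1f}, limits = ({lower:.1f}, {upper:.1f})")
# prints: r = 49.3, limits = (50.7, 149.3)
```

Rounded to whole units, these limits correspond to the 51 and 149 Agatston units quoted in the text.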
The confidence interval for the coefficient in equation 4 (i.e., 4.930) is found from the 95% confidence interval in the regression analysis. For the regression slope, this was 1.943-2.070, giving 4.77-5.08 for the repeatability limit coefficient given earlier, a very narrow confidence interval. Figure 2 shows 95% repeatability limits for the Agatston score.
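The conversion from a regression slope to a repeatability coefficient can be sketched as follows. The helper names are hypothetical, and the least-squares formula for a line through the origin is a standard textbook expression assumed here rather than taken from the article; the slope confidence limits are the ones reported above.

```python
import math

HALF_NORMAL_FACTOR = math.sqrt(math.pi / 2)  # converts mean |D| to SD of D
Z_95 = 1.96


def slope_through_origin(x, y):
    """Least-squares slope b for the model y = b * x with no constant term."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)


def coefficient_from_slope(slope):
    """Convert the slope of |D| on sqrt(A) into a 95% repeatability coefficient."""
    return Z_95 * HALF_NORMAL_FACTOR * slope


# Toy data in which |D| is exactly 2 * sqrt(A); illustrative only.
sqrt_a = [1.0, 2.0, 3.0]
abs_d = [2.0, 4.0, 6.0]
print(slope_through_origin(sqrt_a, abs_d))  # prints: 2.0

# Reported 95% confidence limits for the Agatston regression slope.
low, high = (coefficient_from_slope(b) for b in (1.943, 2.070))
print(f"{low:.2f} to {high:.2f}")  # prints: 4.77 to 5.08
```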
We can check what proportion of differences lies between the 95% limits. This is 98.0%. If we restrict our attention to subjects whose scores are nonzero on both the first and the repeated measurements, the limits include 96.1% of differences.
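A coverage check of this kind can be sketched as below. The score pairs are hypothetical values chosen for illustration, not study data, and the function name is an assumption.

```python
import math


def proportion_within_limits(pairs, coefficient):
    """Fraction of pairs whose absolute difference is within
    coefficient * sqrt(average of the two scores)."""
    inside = 0
    for first, second in pairs:
        average = (first + second) / 2
        if abs(first - second) <= coefficient * math.sqrt(average):
            inside += 1
    return inside / len(pairs)


# Hypothetical repeated-scan score pairs (Agatston units), for illustration only.
example_pairs = [(100, 120), (10, 30), (0, 0), (400, 380), (50, 5)]
print(proportion_within_limits(example_pairs, 4.930))  # prints: 0.8
```

With a well-calibrated model, the proportion computed on real repeated scans should be close to 95%.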
What about when one of the CAC scores in a pair is zero? With this convenience sample, we cannot estimate the probability that another observation would be zero. We can estimate what it might be when the other observation is not zero. A subsample was assembled of observations with one zero and one nonzero observation (n = 113). The 95th centile of the distribution of nonzero second observations if the first is zero is 11.6 with a 95% confidence interval of 9.1-14.7. The 95th centile is directly estimated, and the confidence interval is found by the binomial method [29] and lies between the 103rd and 112th observations. Thus, we can say that if the first observation is zero, a second is unlikely to exceed 11.6 Agatston units.
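The ranks that bound the confidence interval for a centile can be approximated as follows. This sketch assumes the normal approximation to the binomial distribution; whether reference [29] uses exactly this variant is an assumption, and the function name is hypothetical.

```python
import math


def centile_rank_interval(n, p=0.95, z=1.96):
    """Ranks (1-based) of the ordered observations that bound an approximate
    95% confidence interval for the p-th centile of n observations."""
    center = n * p
    half_width = z * math.sqrt(n * p * (1 - p))
    lower_rank = max(1, round(center - half_width))
    upper_rank = min(n, round(center + half_width))
    return lower_rank, upper_rank


print(centile_rank_interval(113))  # prints: (103, 112)
```

For n = 113 this reproduces the 103rd and 112th observations mentioned in the text; the confidence limits are then the scores found at those ranks in the sorted subsample.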
Sex Effect for the Agatston Score
The effect of sex on the differences between repeated CAC measurements was
not significant (p = 0.8), so we can ignore it.
Measurement Error for the Volumetric Score
First, for our 2,217 pairs of observations, we plotted the difference (D)
versus the average (A) of repeated measurements of CAC to check assumptions
(Fig. 3). The plot shows that
as with the Agatston score, the difference between repeated measurements tends
to increase as the magnitude of CAC (estimated by the average of two
measurements) increases.
The regression was performed as described. Again, the fractional polynomial
suggested the main predicting variable was a square root transform of A. The
residual variance was 79.38. Using $\sqrt{A}$ as a
predictor of |D|, we calculated that the residual variance was
80.94, and if we set the constant term to zero, it was 81.09. This looks
almost as good and avoids negative SD at small magnitudes, so we decided to
use regression of |D| on $\sqrt{A}$ with the
constant set to zero, giving
$$\text{mean } |D| = b\sqrt{A}, \quad b \approx 1.402 \qquad (5)$$
The SD of the differences is found by multiplying the mean value of
|D| by $\sqrt{\pi/2}$, giving $\mathrm{SD} \approx 1.758\sqrt{A}$.
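The resulting 95% repeatability coefficient (r) for the volumetric score can be expressed with the following equation:
$$r = 1.96 \times \mathrm{SD} = 3.445\sqrt{A} \qquad (6)$$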
For example, if we measure CAC to be 100 mm3, we get the
repeatability coefficient of $3.445 \times \sqrt{100} = 34.45$, or approximately 34.5.
Hence, a repeated measurement of CAC in the same subject would be between 65.5
and 134.5 mm3 with a 95% probability. For a subject with a baseline
CAC score of 100 mm3 who undergoes a follow-up study months or
years later, an upward or downward change in the CAC score greater than 34.5
mm3 would be statistically significant.
The confidence interval for the coefficient in equation 6 (i.e., 3.445 in this case) is found from the 95% confidence interval in the regression analysis, as before. For the regression slope this was 1.356-1.450, giving 3.33-3.56 for the repeatability limit coefficient given earlier, another narrow confidence interval. Figure 4 shows 95% repeatability limits for the volumetric score. They are visibly narrower than those for the Agatston score.
The proportion of differences that lies between these limits was 97.1%. If we consider only nonzero measurements, the proportion between the limits was 95.5%.
What about when the observation is zero? As before, we cannot estimate the probability that another observation would be zero. We can estimate what it might be when the other observation is not zero. The 95th centile of the distribution of nonzero second observations if the first is zero is 9.5, with 95% confidence interval of 8.4-14.0. Thus, we can say that if the first observation is zero, a second is unlikely to exceed 9.5 mm3.
Sex Effect for the Volumetric Score
Sex was not a significant predictor of the differences between repeated CAC
measurements (p = 0.5).
Discussion
Table 1 shows estimated 95% repeatability coefficients and corresponding 95% repeatability limits at increasing magnitudes of the Agatston and volumetric CAC scores. To determine whether a given subject had a statistically significant change between two serial CAC measurements, one finds the 95% repeatability coefficient that corresponds to the subject's baseline CAC score. If the observed change in the CAC score between baseline and follow-up is greater than the corresponding 95% repeatability coefficient, then one is 95% certain that the change is too large to be explained by the measurement error alone; that is, it is statistically significant.
Let's return to our subject A with a baseline CAC score of 10 units and subject B with a baseline CAC score of 100 units who both had a 20-unit interval increase in the CAC score. Table 1 shows that subject A in fact had significant CAC progression because his follow-up CAC score (30 units) exceeded the upper 95% repeatability limit for a score of 10 units, for both the Agatston and volumetric CAC scores. Subject B's follow-up CAC score (120 units), however, fell within the 95% repeatability limit for a score of 100 units, for both types of the CAC score. We therefore can conclude with probability of 95% that an apparent interval change in subject B's CAC score was attributable to the method's measurement error and not to actual CAC progression.
The 95% repeatability coefficient (or limits) has been widely cited and widely used in clinical and radiographic measurement studies to estimate the agreement between two methods of measurement or the agreement between two measurements made by the same method [30-33]. In both cases, the resulting estimates of agreement facilitate the interpretation of the individual measurement, being in the same units [20].
The use of the 95% repeatability coefficient does not imply that other approaches to repeatability, such as intraclass correlation, are not appropriate. Because repeated observations are replicates of the same measurement there is no bias, and correlation can be used in the analysis of such data provided there is a population from which the sample can be regarded as a representative sample. This is often not the case in the study of clinical or radiographic measurements, where samples are often chosen to include more subjects with extremely high or low values than would a representative sample. However, even when it is appropriate, the correlation coefficient does not help us to interpret an individual clinical measurement. To do this, we need to consider the agreement between repeated measurements on the same subject, which can be done by use of the 95% repeatability coefficient [27].
The 95% repeatability coefficient does not tell us whether the method of CAC detection is sufficiently repeatable to be used in tracking interval changes in the CAC score. This is a clinical, not a statistical, decision.
The 95% repeatability coefficient method is limited at the very low end of CAC scores. Figure 5 shows a plot of the differences versus the average of repeated measurements of CAC for very low Agatston CAC scores. We can see that the 95% repeatability limits are too wide at these low values. No differences are outside the limits until the average CAC score exceeds 5 Agatston units. This results in the negative lower 95% repeatability limit at these very low scores. This is because there are many zero second observations at such low CAC scores, and the normal approximation for differences breaks down. This is unlikely to pose a problem because clinical significance of such low CAC scores is uncertain. To avoid the problem, set any negative lower 95% repeatability limit to zero.
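The decision rules stated in this article (the square root equations, the thresholds for a zero baseline score, and the advice to set a negative lower limit to zero) can be combined into one small routine. This is a sketch under stated assumptions: the function name, the dictionary layout, and the returned labels are hypothetical, while the numeric constants come from the text.

```python
import math

# Coefficients and zero-baseline thresholds as reported in the text.
PARAMS = {
    "agatston": {"coefficient": 4.930, "zero_baseline_threshold": 11.6},
    "volumetric": {"coefficient": 3.445, "zero_baseline_threshold": 9.5},
}


def classify_change(baseline, follow_up, method="agatston"):
    """Classify a follow-up CAC score against the 95% repeatability limits."""
    params = PARAMS[method]
    if baseline == 0:
        # No measurable CAC at baseline: use the 95th centile threshold.
        if follow_up > params["zero_baseline_threshold"]:
            return "progression"
        return "within measurement error"
    r = params["coefficient"] * math.sqrt(baseline)
    lower = max(0.0, baseline - r)  # a negative lower limit is set to zero
    upper = baseline + r
    if follow_up > upper:
        return "progression"
    if follow_up < lower:
        return "regression"
    return "within measurement error"


# Subjects A and B from the discussion: both had a 20-unit increase.
print(classify_change(10, 30))    # prints: progression
print(classify_change(100, 120))  # prints: within measurement error
```

One consequence of setting the lower limit to zero is that, at low baseline scores where the unclamped limit would be negative, this rule can never flag regression.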
In this article, we use the 95% repeatability limits and a regression approach for nonuniform differences to present a working example of estimating repeatability of a quantitative electron beam CT protocol on one scanner. Other electron beam CT or mechanical CT protocols and scanners are likely to have different measurement errors, and therefore their repeatability should be estimated in separate analyses. The general approach used in this study, however, can be easily applied to other CT protocols and scanners provided replicate quantitative CT measurements obtained under repeatability conditions are available.
By accounting for a positive relationship between the method's measurement error and the magnitude of CAC, the statistical model presented in this article allowed us to determine the smallest statistically significant change in the CAC score between serial measurements in a given subject.