Original Research
Neuroradiology/Head and Neck Imaging
December 2009

Bidimensional Measurements in Brain Tumors: Assessment of Interobserver Variability

Abstract

OBJECTIVE. Bidimensional tumor measurements indicating a greater than 25% increase in tumor size are generally accepted as indicating tumor progression. We hypothesized that the use of digital images and a homogeneous reader population would yield lower interobserver variability than that found in previous studies.
SUBJECTS AND METHODS. Eight board-certified radiologists measured tumor diameters in three planes in two consecutive MRI examinations of 22 patients with contrast-enhancing high-grade brain tumors. Products of tumor measurements were calculated, and determinations were made about tumor progression (> 25% increase in area). A variance components model was run on diameter products and the ratios of consecutive maximal diameter products. The variance components included patient examination effect, reader effect, and residual effect.
RESULTS. Complete agreement was found among readers in 10 cases (45%), all indicating stable disease. In the other 12 cases, at least one reader considered progressive disease present. The variance components model showed variance due to readers was small, indicating only modest bias among readers. The residual variance component was large (0.038), indicating that repeated measurements on the same image likely are variable even for the same reader. This variability in measurement implies that repeated measurements by the typical reader have an inherent 14% false-positive rate in the diagnosis of progression of tumors that are stable.
CONCLUSION. Our hypothesis was disproved. We found substantial interreader disagreement and indications that the very nature of the measurement method produces a high rate of false-positive readings of stable tumors. These findings should be considered in interpretation of images with this widely accepted criterion for brain tumor progression.

Introduction

Cross-sectional imaging studies play an important role in evaluation of patients with tumors because they offer a method of measuring tumor size, which is one of the major methods of assessing tumor progression and response to therapy. In 1990, rigorous criteria for categorization of brain tumor response were suggested that were based on change in the product of the largest perpendicular diameters of the enhancing portion of the tumor on serial CT or MR images [1]. These criteria are now generally accepted for brain tumors and are in widespread use. These criteria (hereafter referred to as the Macdonald criteria) figure importantly in decisions about whether a brain tumor has responded to therapy and, thus, whether therapy is to be continued or altered. It therefore seems reasonable to evaluate the performance characteristics of a test that carries such weight in decisions about changes in therapy. To our knowledge, such an assessment has seldom been performed. In one such evaluation [2], consensus agreement among five readers was found for only 35% of CT studies and 45% of MRI studies of brain tumors. However, that study was limited by a mixed reader sample consisting of radiologists and nonradiologists and by performance of measurements on hard-copy film, which is prone to inaccuracy because measuring devices external to the image must be placed on the image.
With the popularization of digitization in medical imaging, image interpreters at medical institutions typically review electronic images and measure tumors using software programs that allow placement of electronic calipers on lesions. Measurements on electronic images would be expected to be both more accurate and more reproducible than those on hard-copy film, although to our knowledge, such a comparison has not been made.
We set out to study interobserver variability in the measurement of tumors on digital images by a homogeneous sample consisting solely of radiologists at the same level of training. We hypothesized that the use of digital images and a homogeneous reader group would yield higher consensus agreement than the 45% found for MRI studies of high-grade brain tumors in a previous study [2], in which the readers came from different disciplines and used hard-copy film.

Subjects and Methods

We enrolled into the study 22 patients who were treated at our brain tumor center. The study was approved by the institutional review board at our hospital, and informed consent was obtained from each patient. Patients were randomly selected in the following manner. The study research assistant reviewed the list of patients referred from our brain tumor center who were to undergo MRI in our radiology department. To have been considered for the study, patients had to have undergone a previous MRI examination that showed contrast-enhancing tumor and had to be competent to provide informed consent. Patients provided consent for analysis of the images from the initial and subsequent MRI examinations. Thus the data set consisted of two sets of MR images of each patient. Identifying information was removed from the MR images with the workstation software (eFilm, Merge Healthcare). In all cases, the initial and subsequent MRI examinations were performed within 12 weeks of each other.
All tumors were high-grade gliomas (World Health Organization grade 3 or grade 4), and the patients had undergone biopsy alone or biopsy in combination with surgery. Five tumors had central nonenhancing regions consistent with necrosis, and 11 tumors had central nonenhancing regions representing resection cavities. Seven tumors had ill-defined borders consistent with infiltrative tumor. Two tumors had multiple enhancing foci interspersed with nonenhancing tissue. In both cases, only the largest tumor was measured.
Tumor measurements for determining treatment response according to the Macdonald criteria are based on the product of orthogonal diameters on the image with the largest contrast-enhancing tumor area. In the event that multiple lesions are present, the sum of products of individual measurable lesions is calculated. The Macdonald criteria for assessment of tumor response are as follows. A complete response is determined by resolution of all contrast-enhancing tumor. A partial response is defined by an at least 50% decrease in the product of two orthogonal diameters. Progressive disease is defined by at least a 25% increase in the product of orthogonal diameters. Confirmation of all three of these determinations with a repeated imaging study 4 weeks after the first is suggested. All other interval changes are defined as stable disease.
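The size thresholds above amount to a simple decision rule on the ratio of the follow-up product to the baseline product. The following Python sketch is illustrative only: the function name `macdonald_category` is hypothetical, it assumes the products (or sums of products for multiple lesions) have already been computed, and it models only the size thresholds listed here, not the suggested confirmation study 4 weeks later.

```python
def macdonald_category(baseline_product: float, followup_product: float) -> str:
    """Classify interval change using only the size thresholds described above.

    Both arguments are products of orthogonal diameters (or sums of products
    for multiple measurable lesions) in the same units, e.g. mm^2.
    """
    if baseline_product <= 0 or followup_product < 0:
        raise ValueError("baseline must be positive and follow-up non-negative")
    if followup_product == 0:
        return "complete response"    # resolution of all contrast-enhancing tumor
    ratio = followup_product / baseline_product
    if ratio <= 0.50:
        return "partial response"     # at least a 50% decrease
    if ratio >= 1.25:
        return "progressive disease"  # at least a 25% increase
    return "stable disease"           # all other interval changes


# Hypothetical example: 20 x 15 mm at baseline, 22 x 18 mm at follow-up.
print(macdonald_category(20 * 15, 22 * 18))  # progressive disease (396 / 300 = 1.32)
```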
Image acquisition parameters were as follows: unenhanced axial T1-weighted images, contrast-enhanced T1-weighted images (TR/TE, 480/14; 5-mm slices with 2.5-mm interslice gap), T2-weighted axial MR images (2,200/80–30, 5-mm slices with 2.5-mm interslice gap), FLAIR axial images (8,000/120; inversion time, 2,000 milliseconds; 5-mm images with 2.5-mm interslice gap), and contrast-enhanced coronal T1-weighted spoiled gradient-recalled echo images (12/5, 1.2-mm slices). Images were reconstructed in the sagittal plane with 1.6-mm contiguous slices from the coronal spoiled gradient-recalled echo data set.

Image Display and Measurement Techniques

Eight readers (all board-certified radiologists participating in a neuroradiology fellowship program at our institution) were recruited to perform tumor measurements for all patients. In two sessions within a 2-week span, readers individually viewed images with a desktop computer running workstation software (eFilm). Readers were aware that they were measuring tumors on images from pairs of examinations and that the images from the first examination would be presented before those from the second examination. This sequence of presentation was chosen because it most closely simulated the actual clinical practice of obtaining measurements on images from serial studies.
Readers placed electronic calipers on the images in each plane and measured tumor diameters in two orthogonal directions on various images. They chose the image showing the greatest bidimensional product and recorded the diameters and slice chosen on a paper form. The information was entered into an electronic database by a research assistant. Readers were not asked to calculate products of tumor measurements or to assess tumor stability or progression. Products of tumor measurements were calculated by the research assistant, and determinations of tumor stability or progression were made by the research statistician involved in the study.

Statistical Analysis

The data consisted of two sets of measurements of 22 patients. Each set of measurements was a set of six diameters (two diameters in each of three planes) recorded by each of eight readers. These values were used to determine disease progression as defined by the criterion of a 25% increase in the maximum product of diameters from any of the three slice directions. The statistical methods used in the data analysis were as follows.
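The progression rule used in the analysis can be sketched as follows, assuming (hypothetically) that each examination's six diameters are stored as a mapping from imaging plane to a diameter pair. The names `max_product` and `indicates_progression` are illustrative and are not part of any software used in the study; the diameters in the example are invented.

```python
from typing import Dict, Tuple

# Hypothetical layout: plane name -> (diameter 1, diameter 2), e.g. in millimeters.
Diameters = Dict[str, Tuple[float, float]]


def max_product(diameters: Diameters) -> float:
    """Largest bidimensional product over the measured planes."""
    return max(d1 * d2 for d1, d2 in diameters.values())


def indicates_progression(first: Diameters, second: Diameters,
                          threshold: float = 0.25) -> bool:
    """True if the maximum product grew by at least `threshold` (25% by default).

    The maximum may come from a different plane in each examination.
    """
    return max_product(second) >= (1.0 + threshold) * max_product(first)


first_exam = {"axial": (30, 20), "coronal": (28, 22), "sagittal": (25, 20)}
second_exam = {"axial": (34, 24), "coronal": (30, 22), "sagittal": (26, 21)}
print(max_product(first_exam), max_product(second_exam))  # 616 816
print(indicates_progression(first_exam, second_exam))     # True
```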
Assessment of interreader variability—Simple (Cohen) kappa values for the determination of disease progression were computed for each of the 28 possible pairs of readers. The average kappa value over the 28 pairs was also computed, with its standard error estimated with the statistical jackknife technique. This method allowed us to assess the degree of agreement between readers.
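A minimal sketch of the pairwise kappa computation with a jackknife standard error, assuming binary calls (1 = progression) arranged as a patients × readers array. The text does not say which units the jackknife deleted; this sketch assumes delete-one-reader replicates of the pairwise average, and the 6 × 3 array is hypothetical toy data. With eight readers, the same code averages over the 28 pairs.

```python
from itertools import combinations

import numpy as np


def cohen_kappa(a, b) -> float:
    """Cohen's kappa for two readers' binary calls (1 = progression)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    p_observed = np.mean(a == b)
    pa, pb = a.mean(), b.mean()
    p_expected = pa * pb + (1 - pa) * (1 - pb)  # agreement expected by chance
    if p_expected == 1.0:
        return float("nan")  # undefined when both readers make one constant call
    return float((p_observed - p_expected) / (1 - p_expected))


def mean_pairwise_kappa(calls: np.ndarray) -> float:
    """Average kappa over all reader pairs; calls is patients x readers."""
    n_readers = calls.shape[1]
    kappas = [cohen_kappa(calls[:, i], calls[:, j])
              for i, j in combinations(range(n_readers), 2)]
    return float(np.mean(kappas))


def jackknife_se(calls: np.ndarray) -> float:
    """Delete-one-reader jackknife standard error of the mean pairwise kappa."""
    n = calls.shape[1]
    replicates = np.array([mean_pairwise_kappa(np.delete(calls, r, axis=1))
                           for r in range(n)])
    return float(np.sqrt((n - 1) / n * np.sum((replicates - replicates.mean()) ** 2)))


# Hypothetical toy data: 6 patients x 3 readers.
calls = np.array([[1, 1, 0], [0, 0, 0], [1, 0, 1], [0, 0, 0], [0, 1, 0], [1, 1, 1]])
print(round(mean_pairwise_kappa(calls), 3), round(jackknife_se(calls), 3))  # 0.333 0.385
```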
Effect of tumor size on tendency to indicate tumor progression—We assessed whether measurements indicative of tumor progression were more likely for large tumors. To answer this question, Kendall's correlation was used to correlate the reader average progression determination with initial tumor size.
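Because the reader average progression determination is a fraction of eight readers, many of its values are tied, so a tie-corrected Kendall coefficient (tau-b) is a natural choice; the text does not state which variant was used, so tau-b here is an assumption. The sketch below computes it directly from pair signs; `scipy.stats.kendalltau` computes the same statistic by default and also returns a p-value. The example values are hypothetical.

```python
import math
from typing import Sequence


def kendall_tau_b(x: Sequence[float], y: Sequence[float]) -> float:
    """Kendall's tau-b, which adjusts the denominator for ties in x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    numerator = 0  # concordant minus discordant pairs
    untied_x = 0   # pairs not tied in x
    untied_y = 0   # pairs not tied in y
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            sx = (x[i] > x[j]) - (x[i] < x[j])  # sign of the x difference
            sy = (y[i] > y[j]) - (y[i] < y[j])  # sign of the y difference
            numerator += sx * sy
            untied_x += sx != 0
            untied_y += sy != 0
    if untied_x == 0 or untied_y == 0:
        return float("nan")  # undefined if either variable is constant
    return numerator / math.sqrt(untied_x * untied_y)


# Hypothetical values: fraction of 8 readers calling progression, mean initial product (mm^2).
progression_fraction = [5 / 8, 0, 2 / 8, 0, 3 / 8, 1 / 8]
initial_size = [1450, 600, 900, 750, 1200, 980]
print(kendall_tau_b(progression_fraction, initial_size))
```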
Reproducibility of tumor measurements in each imaging plane—For each of the six diameter measurements, Pearson's correlation coefficients were calculated between each pair of readers. The SD and the absolute deviation from the mean were computed over the eight readers as measures of dispersion, and the signed rank test was used to compare these dispersion measures among the three measurement planes.
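The per-plane reproducibility analysis can be sketched with NumPy, assuming each diameter is arranged as a patients × readers array. The helper names are hypothetical, "absolute deviation from the mean" is interpreted here as the mean absolute deviation across readers (an assumption), and the data are invented for illustration. Per-patient dispersion values from two planes could then be compared with a paired signed rank test such as `scipy.stats.wilcoxon`.

```python
import numpy as np


def pairwise_pearson(measurements: np.ndarray) -> np.ndarray:
    """Pearson r for every reader pair; measurements is patients x readers."""
    r = np.corrcoef(measurements, rowvar=False)         # readers x readers matrix
    i, j = np.triu_indices(measurements.shape[1], k=1)  # upper triangle, no diagonal
    return r[i, j]


def reader_dispersion(measurements: np.ndarray):
    """Per-patient SD and mean absolute deviation from the mean across readers."""
    sd = measurements.std(axis=1, ddof=1)
    centered = measurements - measurements.mean(axis=1, keepdims=True)
    mad = np.abs(centered).mean(axis=1)
    return sd, mad


# Hypothetical axial diameters (mm): 4 patients x 3 readers.
axial = np.array([[31.0, 29.5, 33.0],
                  [18.0, 20.5, 19.0],
                  [42.0, 38.5, 44.0],
                  [25.0, 27.0, 23.5]])
r = pairwise_pearson(axial)
print(r.mean(), r.min(), r.max())
print(reader_dispersion(axial))
```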
Estimation of variance due to reader contribution and residual contribution—To obtain a set of essentially stable tumors from which to estimate measurement variability alone, we removed from the data set the cases of the two patients most consistent with tumor progression. Variance components models, with components for tumor, reader, and residual, were then fit to each of the six diameter measures, to the maximum diameter product, and to the ratio of consecutive maximum diameter products.
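The text does not state how the variance components were estimated. As a stand-in, the sketch below uses the classical ANOVA (method-of-moments) estimator for a balanced two-way random-effects layout with one measurement per patient-reader cell; this is a swapped-in technique, not necessarily the authors' method. Applied to the ratio of consecutive maximum products, its residual component would correspond to the quantity reported in Results. The ratios in the usage example are hypothetical.

```python
import numpy as np


def variance_components(y: np.ndarray) -> dict:
    """ANOVA (method-of-moments) variance components for a balanced
    patients x readers table with one observation per cell.

    Model: y[i, j] = mu + patient_i + reader_j + residual_ij.
    Negative estimates are truncated at zero.
    """
    p, r = y.shape
    grand = y.mean()
    patient_means = y.mean(axis=1, keepdims=True)  # p x 1
    reader_means = y.mean(axis=0, keepdims=True)   # 1 x r

    ms_patient = r * np.sum((patient_means - grand) ** 2) / (p - 1)
    ms_reader = p * np.sum((reader_means - grand) ** 2) / (r - 1)
    resid = y - patient_means - reader_means + grand
    ms_resid = np.sum(resid ** 2) / ((p - 1) * (r - 1))

    return {
        "patient": float(max((ms_patient - ms_resid) / r, 0.0)),
        "reader": float(max((ms_reader - ms_resid) / p, 0.0)),
        "residual": float(ms_resid),
    }


# Hypothetical ratios of consecutive maximum products: 4 patients x 3 readers.
ratios = np.array([[1.05, 0.92, 1.21],
                   [0.88, 1.10, 0.97],
                   [1.15, 1.02, 1.30],
                   [0.95, 0.85, 1.08]])
print(variance_components(ratios))
```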

Results

Based on the Macdonald criteria, complete agreement was found among readers in 10 cases, all indicating stable disease; 12 patients were judged by at least one reader to have tumor measurements indicating progressive disease (Table 1). A total of 176 readings were made of image sets from two serial examinations (eight readers, 22 patients). Tumor measurements consistent with tumor progression were made in a total of 24 instances (14%).
TABLE 1: Rate of Agreement Among Readers Regarding Measurements Indicating Progressive Disease or Stable Disease
No. of Readers Making Finding
Stable Disease   Progressive Disease   No. of Cases
8                0                     10
7                1                     6
6                2                     2
5                3                     3
3                5                     1
All readers recorded tumor measurements consistent with progressive disease in at least one patient. For any one reader, the number of patients in whom progressive disease was considered present ranged from one patient (4.5% of patients assessed) to five patients (22.7%) (Table 2). Two readers considered tumor progression present in five patients; three readers, in three patients; two readers, in two patients; and one reader, in one patient. The postoperative changes did not appear to influence subsequent tumor measurements. The Kendall correlation of 0.44 (p = 0.01) between the reader average progression call and the reader average initial tumor size indicated a statistically significant positive relation.
TABLE 2: Numbers of Measurements Indicating Tumor Progression Classified by Patient and Reader
Patient No.   Reader No.                No. of Readings Indicating Progressive Disease in Specific Tumor
              1  2  3  4  5  6  7  8
1             1  1  0  1  1  0  1  0    5
2             0  0  0  0  0  0  1  1    2
3             1  0  0  0  1  0  0  0    2
4             0  0  0  0  0  0  0  0    0
5             0  0  1  0  0  1  1  0    3
6             0  0  0  0  0  0  0  0    0
7             0  0  0  0  0  0  0  0    0
8             0  0  0  0  0  0  0  0    0
9             0  0  0  0  0  0  1  0    1
10            0  0  0  0  0  0  0  0    0
11            0  0  0  1  0  0  1  1    3
12            0  0  0  0  0  0  0  0    0
13            0  0  0  0  0  0  0  0    0
14            1  0  1  1  0  0  0  0    3
15            0  0  0  0  0  0  0  0    0
16            0  0  0  0  1  0  0  0    1
17            0  1  0  0  0  0  0  0    1
18            0  0  0  0  1  0  0  0    1
19            0  0  0  0  1  0  0  0    1
20            0  0  0  0  0  0  0  0    0
21            0  0  1  0  0  0  0  0    1
22            0  0  0  0  0  0  0  0    0
No. of readings indicating progressive disease in any tumor
              3  2  3  3  5  1  5  2    24
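As a consistency check, the per-reader calls in Table 2 can be tallied to rebuild Table 1 and the reader totals. The Python sketch below transcribes Table 2 (1 = measurements indicated progressive disease) and counts, for each patient, how many of the eight readers called progression.

```python
from collections import Counter

# Table 2: rows = patients 1-22, columns = readers 1-8; 1 = progression indicated.
TABLE2 = [
    [1, 1, 0, 1, 1, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 1, 1],
    [1, 0, 0, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 0, 0, 1, 1],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [1, 0, 1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 0, 0, 0],
    [0, 1, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
]

n_readers = len(TABLE2[0])
reader_totals = [sum(row[j] for row in TABLE2) for j in range(n_readers)]
# Key = number of readers calling progression for a patient; value = number of patients.
progression_votes = Counter(sum(row) for row in TABLE2)

print("Reader totals:", reader_totals, "overall:", sum(reader_totals))
for pd_votes in sorted(progression_votes):
    stable_votes = n_readers - pd_votes
    print(f"{stable_votes} stable, {pd_votes} progressive: {progression_votes[pd_votes]} case(s)")
```

Running it reproduces the column totals (3, 2, 3, 3, 5, 1, 5, 2; 24 overall) and the Table 1 distribution of 10, 6, 2, 3, and 1 cases.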

Reader Agreement

Complete agreement among readers was found in 10 cases (45%), all readers considering stable disease present (Tables 1 and 2) (Fig. 1A, 1B, 1C, 1D, 1E, 1F, 1G, 1H, 1I). In six cases, seven of eight readers agreed; in all of these cases, the majority provided measurements indicating stable disease. In two cases, six readers agreed and two readers disagreed on disease status. In the remaining four cases, five of eight readers agreed and three disagreed on disease status (Fig. 2A, 2B, 2C, 2D, 2E, 2F, 2G, 2H).

Interreader Variability

Cohen kappa statistics for the overall binary determination of progression were computed for all 28 reader pairs. The average pairwise kappa value was 0.13 (SE, 0.07; p > 0.05), with a range of –0.21 to 0.61, indicating weak agreement among readers.

Reproducibility of Tumor Measurements in Each Imaging Plane

For each of the six diameter measurements, all 28 reader pairwise Pearson's correlation coefficients were computed. The averages of these pairwise correlations were 0.77 for the axial plane, 0.74 for the coronal plane, and 0.82 for the sagittal plane. The corresponding ranges for the correlations were 0.53–0.96, 0.46–0.93, and 0.64–0.96. These correlations indicated only moderate reproducibility between readers in the measurements of the diameters. The two reader dispersion measures of SD and absolute deviation around the mean were not significantly different between the image planes. The signed rank test yielded p > 0.2 for all three comparisons.

Effect of Tumor Size on Tendency to Indicate Tumor Progression

To obtain a set of essentially stable tumors from which to estimate measurement variability alone, we removed the two cases most consistent with tumor progression from the data set and fit a variance components model to the ratio of consecutive maximum planar products. Measurement variability tended to be greater for larger tumors; using the ratio as the response variable also served to stabilize this variance.

Estimation of Variance Due to Reader Contribution and Residual Contribution

A variance components model fit to the ratio of consecutive maximum planar products showed that variance due to reader was small, indicating only modest bias among readers: individual readers did not tend to consistently measure tumors as either larger or smaller than the other readers did. However, the residual variance component was large (0.038), indicating that repeated measurements in the same case are likely to vary even with the same reader. The square root of this residual variance component, 0.19 (19%), estimates the uncontrolled relative variation in the maximum planar product from one measurement to the next. This result suggests a limit on the precision of the current technique.
The measurement variability implies an inherent 14% false-positive rate for the diagnosis of progression of stable tumors when repeated measurements are made by the typical reader. Based on the fitted model, and assuming a normal distribution for measurement variability, redefinition of progressive disease as a 30% increase in the maximum bidimensional product would give a 6% false-positive rate, and redefinition as a 50% increase would be expected to give a false progression rate of 2%.
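The link between measurement variability and the false-positive rate can be sketched under the normal assumption stated above: if, for a stable tumor, the ratio of consecutive maximum products is normally distributed with mean 1 and SD sigma, the false-positive rate for a threshold t is the upper-tail probability P(Z ≥ t/sigma). The text does not report the sigma behind the 14%, 6%, and 2% figures, so the value in the example is a hypothetical placeholder, and the output should not be read as reproducing those rates.

```python
import math


def false_positive_rate(threshold: float, sigma: float) -> float:
    """P(ratio >= 1 + threshold) for a stable tumor, assuming the ratio of
    consecutive maximum products is Normal(mean=1, sd=sigma)."""
    z = threshold / sigma
    return 0.5 * math.erfc(z / math.sqrt(2))  # upper-tail standard normal probability


sigma = 0.20  # hypothetical SD of the ratio for a stable tumor
for threshold in (0.25, 0.30, 0.50):
    print(f"{threshold:.0%} criterion: {false_positive_rate(threshold, sigma):.1%}")
```

Because the rate depends only on threshold/sigma, raising the threshold or reducing measurement variability lowers the false-positive rate in the same way.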
Fig. 1A MRI findings in 45-year-old man with biopsy-proven glioblastoma multiforme show an example of tumor in which no reader found changes in product of bidimensional diameters that indicated presence of progressive disease (i.e., size did not increase by > 25%). F–I were obtained 10 weeks after A–E. Unenhanced T1-weighted axial image shows region of abnormal signal intensity in right hemisphere consistent with tumor.
Fig. 1B Contrast-enhanced T1-weighted axial image shows enhancing tumor in posterior aspect of right frontal lobe. Four of eight readers chose this image as having greatest bidimensional product in axial dimension.
Fig. 1C Contrast-enhanced T1-weighted axial image at slightly more cephalic level than B also shows enhancing tumor in posterior aspect of right frontal lobe. Four readers who did not choose B chose this image as having greatest bidimensional product.
Fig. 1D Representative image from coronal 3D contrast-enhanced T1-weighted data set shows inhomogeneously enhancing mass. Four of eight readers chose this image as having greatest bidimensional product in coronal dimension. Other four readers chose total of three images as having greatest bidimensional product.
Fig. 1E Contrast-enhanced T1-weighted sagittal image shows tumor in posterior aspect of frontal lobe. Image was chosen by all eight readers as image in sagittal imaging sequence in which greatest bidimensional product was measured.
Fig. 1F Unenhanced axial MR image shows no change in right frontal lobe mass compared with A–E.

Discussion

The major findings of this study are as follows. First, the rate of consensus agreement among all readers with regard to tumor progression was relatively low (45%), identical to that found for MRI in a prior study [2] that used hard-copy film and a smaller, heterogeneous set of readers. Second, kappa analysis of interreader variability with regard to tumor progression showed weak agreement among readers. Third, the analysis of reproducibility of tumor measurements showed only moderate agreement among readers in each of the three imaging planes, and the variability of measurements did not differ substantially between planes. Variability in measurements tended to be greater for larger tumors than for smaller tumors. Thus our hypothesis that the use of electronic calipers placed on digital images by a homogeneous group of readers would reduce interreader variability was disproved.
Fig. 1G Contrast-enhanced T1-weighted axial image shows enhancing tumor in posterior aspect of right frontal lobe. All eight readers chose this image as having greatest bidimensional product in axial dimension. No obvious change from initial MR images (A–E) is evident.
Fig. 1H Image from coronal contrast-enhanced T1-weighted data set shows tumor. Three readers chose this image as having greatest bidimensional product in coronal dimension. Other five readers chose total of four images as having greatest bidimensional product.
Fig. 1I Contrast-enhanced T1-weighted sagittal image shows no obvious change from first MR images. Among the eight readers, change in maximal bidimensional product between examinations ranged from –4% to 14%; thus no reader's measurements indicated tumor progression.
Fig. 2A MRI findings in 53-year-old woman show example of tumor in which measurements of five readers indicated changes in product of bidimensional diameters consistent with progressive disease (i.e., ≥ 25% increase in size) and measurements of three readers indicated stable disease (i.e., < 25% increase in size). E–H were obtained 8 weeks after A–D. Unenhanced T1-weighted axial image shows region of abnormal signal intensity in left occipital, temporal, and parietal lobes consistent with tumor. Small region of high signal intensity consistent with hemorrhage or calcification is evident.
Fig. 2B Contrast-enhanced T1-weighted axial image shows enhancing tumor in posterior aspect of left parietal lobe. All readers chose this image as having greatest bidimensional product in axial dimension.
Fig. 2C Representative image from coronal 3D contrast-enhanced T1-weighted data set shows enhancing mass to have ill-defined medial border. Two of eight readers chose this image as having greatest bidimensional product in coronal dimension. Other six readers chose total of three images as having greatest bidimensional product.
Measurement of tumor size on images figures importantly in clinical care and therapeutic trials because tumors in internal organs (e.g., brain tumors) are not palpable or otherwise amenable to physical examination, and change in size cannot be estimated from functional changes (e.g., findings at neurologic examinations or assessments of function, such as Karnofsky score). The importance of initial tumor size measurements of brain tumors (i.e., before therapy) has been emphasized. Results of one study [3] indicated that the size of recurrent glioblastoma multiforme managed with brachytherapy is a significant predictor of survival. Another study showed that the response of malignant glioma to chemotherapy is related to tumor size [4].
Fig. 2D Contrast-enhanced T1-weighted sagittal image shows contrast-enhancing tumor. This image was chosen by four readers as image in sagittal imaging sequence in which greatest bidimensional product was measured. Other four readers chose total of three images.
Fig. 2E Unenhanced axial MR image shows no change in left temporal, occipital, and parietal lobe mass compared with A–D.
Fig. 2F Contrast-enhanced T1-weighted axial image shows ill-defined enhancing tumor. Five readers chose this image as having greatest bidimensional product in axial dimension. Other three readers chose total of two images.
Fig. 2G Image from coronal contrast-enhanced T1-weighted data set shows tumor to have ill-defined medial border.
Fig. 2H Contrast-enhanced T1-weighted sagittal image shows contrast-enhancing mass that is not well circumscribed. Three readers chose this image as having greatest bidimensional product in sagittal dimension. Other five readers chose total of three images. Among eight readers, change in maximal bidimensional product between MRI examinations ranged from –2% to 64%. Five readers considered tumor progression present.
Despite published findings indicating the importance of initial tumor size to prognosis, the importance of serial changes in tumor size is a matter of debate. Furthermore, interobserver variability confounds the issue for brain tumors and non-CNS tumors alike. The medical literature on measurement of lung tumors is informative on these issues. For instance, one recent study [5] showed substantial interobserver variability in manual measurements of lung tumor size on CT scans. Another recent study of lung tumors [6] showed no correlation between early changes in tumor size (measured with Response Evaluation Criteria in Solid Tumors) and survival. The results of that study suggested that response criteria other than size would be more appropriate for measuring response. Although those studies involved a different tumor type and a different response criterion than those used in our project, the findings raise concerns about the utility of tumor size measurements in making decisions about therapy.
The Macdonald criteria are one of the more common methods of measuring brain tumor size [1]. These criteria have been used in a large number of clinical trials since their introduction in 1990. Although some investigators have noted that, in the future, assessment of tumors would optimally combine anatomic measurements with assessments of physiologic change by techniques such as perfusion imaging and molecular imaging, at present at most large brain tumor centers these techniques are not routinely used to evaluate therapeutic response [7]. Conventional contrast-enhanced MRI remains the standard imaging technique for determination of tumor response. It therefore is appropriate to consider the question of test performance of measurements made on electronic images.
In one study of interobserver variation in brain tumor measurements on electronic images (similar to our study), the investigators found substantial interobserver variation [8]. Unlike in our study, in that investigation tumors were measured at one time point. Furthermore, tumor dimensions were not measured directly in three planes, but the measurement in the third dimension was inferred from the number of slices on which contrast-enhancing tumors were seen. In that study, readers differed in 2D measurements of contrast-enhancing tumors 29% of the time. As those authors noted, this finding is likely to affect determinations of tumor progression in many instances if different readers measure tumor dimensions on different serial images.
Authors of a previous study [2] noted the limitations of brain tumor measurements on CT and MR images, including the irregular shape of lesions and variations in the experience and skill of observers. In that study, interobserver variability regarding CT and MRI tumor measurements in the assessment of progression according to the Macdonald criteria was measured for a diverse group of five clinicians that included two neuroradiologists, a neurosurgeon, a neurologist, and a radiation oncologist. Consensus agreement about tumor response was reached for only 35% of CT studies and 45% of MRI studies. We attempted to limit variation in degree of experience among readers by having only neuroradiology fellows, all with the same level of training, perform measurements. We attempted to take irregularity of shape of enhancing tumors into account by using images obtained in three orthogonal planes. In the previous study [2], readers measured tumors on printed film by referring to the centimeter scale on the hard-copy images. Such a technique would be expected to involve some imprecision, depending on the method by which diameters are compared against the centimeter scale. To eliminate this confounding factor, we used electronic images and electronic calipers that generated measurements when the calipers were placed, rather than relying on a printed scale. We assessed only interobserver reproducibility; assessment of repeated measurements by the same observers might have shown better performance characteristics than measurements by different observers.
On the basis of the study design, we hypothesized that a higher rate of consensus agreement would be reached in our study than in a previous study [2]. Instead, the consensus among all readers in our study as to tumor stability or progression (45%) was the same as in the previous study. Full consensus was reached only in cases in which the measurements indicated stable disease, never in cases in which they indicated disease progression. Furthermore, in approximately 20% of the patients, the readers were almost evenly divided, three readers disagreeing with the other five. This lack of improvement in consensus agreement might have been due to factors in study design but may also represent inherent difficulties in measuring tumors according to the Macdonald criteria. With that in mind, we examined the residual variance in addition to the variance due to reader.
The results of our variance components model indicated that variance due to reader was small but that the residual variance component was large. In a previous study of brain tumor measurements using the Macdonald criteria [2], the investigators reported an intraclass correlation coefficient of 0.64, compared with 0.80 in our study. Although we found a higher intraclass correlation coefficient, the large residual variance nonetheless indicates that repeated measurements in the same case are variable, even when the same reader measures the tumor dimensions. In essence, these findings indicate a limit to the precision of the Macdonald criteria for measuring tumor size and for indicating tumor progression. The measurement variability in our study suggests a 14% false-positive rate for indicating progression when brain tumors are, in fact, stable.
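The relation between variance components and the intraclass correlation coefficient can be illustrated with a simplified two-way random-effects layout (cases × readers, one measurement per cell), estimated by the classical ANOVA method of moments. This is a sketch under stated assumptions: the authors' model also included a patient examination effect and may have used a transformed scale, so this code will not reproduce the values reported here. The function names are hypothetical, and the ICC shown is the absolute-agreement form, σ²_case / (σ²_case + σ²_reader + σ²_residual).

```python
import numpy as np


def variance_components(x: np.ndarray) -> dict[str, float]:
    """Method-of-moments variance components for a cases x readers table.

    x[i, j] is reader j's measurement of case i (one measurement per cell,
    no missing values). Negative estimates are truncated to zero.
    """
    n, k = x.shape
    grand = x.mean()
    case_means = x.mean(axis=1)
    reader_means = x.mean(axis=0)

    ss_case = k * np.sum((case_means - grand) ** 2)
    ss_reader = n * np.sum((reader_means - grand) ** 2)
    ss_total = np.sum((x - grand) ** 2)
    ss_resid = ss_total - ss_case - ss_reader

    ms_case = ss_case / (n - 1)
    ms_reader = ss_reader / (k - 1)
    # Guard against a tiny negative value caused by floating-point rounding.
    ms_resid = max(ss_resid, 0.0) / ((n - 1) * (k - 1))

    return {
        "case": max((ms_case - ms_resid) / k, 0.0),
        "reader": max((ms_reader - ms_resid) / n, 0.0),
        "residual": ms_resid,
    }


def icc_agreement(components: dict[str, float]) -> float:
    """Share of total variance attributable to true between-case differences."""
    total = components["case"] + components["reader"] + components["residual"]
    return components["case"] / total
```

With a perfectly consistent table the ICC is 1. Adding a constant 1-unit offset for one reader (rows [1, 2], [2, 3], [3, 4]) leaves the residual at zero but gives a reader component of 0.5 and an ICC of about 0.67, showing how systematic reader bias and residual scatter both lower agreement.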
Because of the relatively high false-positive rate that we found to be inherent in the measurement technique and in the varied morphologic features of the tumors assessed, we estimated the false progression rates associated with two alternative criteria for tumor progression. On the basis of the fitted model used in our analysis, changing the criterion for progression from a 25% increase to a 30% increase in bidimensional product would result in a much lower false-positive rate of 6%. Redefining progressive disease as a 50% increase in maximum pair product would be expected to result in a false progression rate of 2%. Careful analysis of the effect on sensitivity and specificity, with a valid reference standard, is clearly needed before any such change can be considered.
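Why a stricter threshold lowers the false progression rate can be seen with a deliberately simple noise model, which is an assumption of this sketch and not the fitted model used in the study: for a stable tumor, each examination's product is measured with independent, normally distributed error on the log scale, so the log of the follow-up/baseline ratio is normal with mean 0 and variance 2σ². The per-examination SD of 0.15 below is a hypothetical value, and the printed rates will not in general match the 14%, 6%, and 2% estimates given in the text.

```python
import math


def false_progression_rate(threshold_ratio: float, log_sd_per_exam: float) -> float:
    """Probability that a stable tumor's follow-up/baseline product ratio exceeds threshold_ratio.

    Assumes independent N(0, log_sd_per_exam**2) measurement error on the log
    scale at each examination, so the log ratio is N(0, 2 * log_sd_per_exam**2).
    """
    sd_log_ratio = math.sqrt(2.0) * log_sd_per_exam
    z = math.log(threshold_ratio) / sd_log_ratio
    # Upper-tail standard normal probability: 1 - Phi(z) = erfc(z / sqrt(2)) / 2
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# Hypothetical per-examination log-scale SD of 0.15.
for pct in (25, 30, 50):
    rate = false_progression_rate(1 + pct / 100, log_sd_per_exam=0.15)
    print(f"threshold >{pct}%: false progression rate {rate:.1%}")
```

Raising the threshold moves the cutoff further into the tail of the same noise distribution, so the false progression rate falls; the cost is a possible loss of sensitivity, which would need a reference standard to quantify.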
In our study, we found only moderate reproducibility between readers in the measurements of diameters and weak agreement on determination of lesion progression even though the readers were at a similar level of relatively advanced training. The variance analysis of our study did not indicate tendencies of individual readers to consistently measure tumors as either larger or smaller than the others did. Thus the moderate degree of reproducibility in our study likely cannot be ascribed primarily to attributes of the specific readers.
That many tumors in our series were irregularly contoured and some had infiltrative margins likely added to the difficulty of measuring tumor size reproducibly. It was difficult to determine the degree to which postoperative changes, which also contributed to irregular tumor margins, influenced tumor measurements. The diameter method of tumor measurement assumes that a solid ellipsoid accurately characterizes tumor shape in most cases. However, the complex shape of many tumors and the presence of necrotic regions make them poorly suited to representation as a solid ellipsoid; such tumors are not easily represented by a single set of diameter measurements. For instance, in one study of diameter measurements of tumors [9], 85% of tumors were best characterized by more than one set of diameter measurements. The second set of cross-sectional measurements was most commonly needed because of a necrotic portion whose size had to be subtracted from the larger set of cross-sectional diameters to represent tumor volume accurately. In that study, however, even for solid tumors, the assumption of an ellipsoid shape appeared to increase the variance in tumor size measurements.
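The ellipsoid assumption corresponds to the standard volume formula V = (π/6)·d1·d2·d3 for three orthogonal diameters. The sketch below, with hypothetical function names, also subtracts a second ellipsoid for a necrotic core, in the spirit of the two-set approach described for [9]; it assumes that the core lies wholly inside the outer ellipsoid and is not a reproduction of that study's method.

```python
import math
from typing import Optional

Diameters = tuple[float, float, float]  # three orthogonal diameters, same unit


def ellipsoid_volume(d: Diameters) -> float:
    """Ellipsoid volume from three orthogonal diameters: (pi / 6) * d1 * d2 * d3."""
    d1, d2, d3 = d
    return math.pi / 6.0 * d1 * d2 * d3


def enhancing_volume(outer: Diameters, necrotic: Optional[Diameters] = None) -> float:
    """Outer ellipsoid volume minus an optional necrotic-core ellipsoid.

    Assumes the core lies entirely inside the outer ellipsoid.
    """
    volume = ellipsoid_volume(outer)
    if necrotic is not None:
        # Rough sanity check, assuming both triples list the axes in the same order.
        if any(c > o for c, o in zip(necrotic, outer)):
            raise ValueError("necrotic diameters exceed outer diameters")
        volume -= ellipsoid_volume(necrotic)
    return volume


# Invented example in millimetres: about 19635 mm^3 of enhancing tissue.
print(round(enhancing_volume((40, 35, 30), (20, 15, 15))))
```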
Our findings indicate that in our patient sample, the very nature of the tumor measurement method produced a high rate of false-positive readings on stable tumors, indicating that other techniques of measurement would be preferable. A number of methods of tumor measurement other than the Macdonald criteria have been proposed. For instance, measurement of tumor volume by tracing tumor margins with manual or semiautomated methods is an alternative to the cross-sectional diameter technique [10]. In one study [9], investigators found that use of a computer-assisted planimetric technique of measuring tumor perimeters decreased intrareader and interreader variability compared with computation of a volume based on three orthogonal diameter measurements by readers. Because we did not have ready access to a volumetric software program, we did not compare manual linear measurements with semiautomated volumetric measurements. It is possible that computational methods will be valuable in improving tumor measurements. To be truly valuable, however, those techniques will have to be inexpensive, easy to use, and widely available to clinical radiologists and oncologists.
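A planimetric volume estimate of the kind discussed above can be sketched as the sum of traced cross-sectional areas multiplied by slice thickness, with each area computed from the traced contour by the shoelace formula. This minimal sketch assumes contiguous slices with no interslice gap and contour vertices given in millimetres; the function names are hypothetical, and it is not the software evaluated in [9] or [10].

```python
from collections.abc import Sequence

Point = tuple[float, float]  # in-plane contour vertex (x, y) in millimetres


def polygon_area(contour: Sequence[Point]) -> float:
    """Area enclosed by a traced contour (shoelace formula), in mm^2."""
    n = len(contour)
    if n < 3:
        return 0.0
    twice_area = 0.0
    for i in range(n):
        x0, y0 = contour[i]
        x1, y1 = contour[(i + 1) % n]  # wrap around to close the contour
        twice_area += x0 * y1 - x1 * y0
    return abs(twice_area) / 2.0


def planimetric_volume(contours: Sequence[Sequence[Point]], slice_thickness_mm: float) -> float:
    """Sum of traced slice areas times slice thickness (contiguous slices, no gap), in mm^3."""
    return sum(polygon_area(c) for c in contours) * slice_thickness_mm
```

A necrotic cavity could be handled by tracing its margin separately and subtracting its area on each slice.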

Acknowledgments

We thank the following individuals for contributions to this study: David Reardon, Robert Case, Christopher Cepeda, William Delfyett, Geentali Kapoor, Ian Kurth, Ronaldo Lessa, Felix Lin, and Timothy Waltner.

Footnotes

Address correspondence to J. M. Provenzale.
This is a Web exclusive article.

References

1. Macdonald DR, Cascino TL, Schold SC Jr, Cairncross JG. Response criteria for phase II studies of supratentorial malignant glioma. J Clin Oncol 1990; 8:1277–1280
2. Vos MJ, Uitdehaag BMJ, Barkhof F, et al. Interobserver variability in the radiological assessment of response to chemotherapy in glioma. Neurology 2003; 60:826–830
3. Simon JM, Cornu P, Boisserie G, et al. Brachytherapy of glioblastoma recurring in previously irradiated territory: predictive value of tumor volume. Int J Radiat Oncol Biol Phys 2002; 53:67–74
4. Grant R, Walker M, Hadley D, et al. Imaging response to chemotherapy with RMP-7 and carboplatin in malignant glioma: size matters but speed does not. J Neurooncol 2002; 57:241–245
5. Macpherson RE, Higgins GS, Murchison JT, et al. Non-small-cell lung cancer dimensions: CT–pathological correlation and interobserver variation. Br J Radiol 2009; 82:421–425
6. Birchard KR, Hoang JK, Herndon JE Jr, Patz EF Jr. Early changes in tumor size in patients treated for advanced stage nonsmall cell lung cancer do not correlate with survival. Cancer 2009; 115:581–586
7. Perry JR, Cairncross JG. Glioma therapies: how to tell which work? J Clin Oncol 2003; 21:3547–3549
8. Hayward RM, Patronas N, Baker EH, Vézina G, Albert PS, Warren KE. Inter-observer variability in the measurement of diffuse intrinsic pontine gliomas. J Neurooncol 2008; 90:57–61
9. Sorensen AG, Patel S, Harmath C, et al. Comparison of diameter and perimeter methods for tumor volume calculation. J Clin Oncol 2001; 19:551–557
10. Joe BN, Fukui MB, Meltzer CC, et al. Brain tumor volume measurement: comparison of manual and semiautomated methods. Radiology 1999; 212:811–816

Published In

American Journal of Roentgenology
Pages: W515–W522
PubMed: 19933626

History

Submitted: February 21, 2009
Accepted: July 17, 2009

Keywords

  1. brain
  2. interobserver variability
  3. measurements
  4. tumor

Authors

Affiliations

James M. Provenzale
Department of Radiology, Duke University Medical Center, Durham, NC 27710.
Departments of Radiology, Oncology, and Biomedical Engineering, Emory University School of Medicine, Atlanta, GA 30322.
Claro Ison
Department of Radiology, Duke University Medical Center, Durham, NC 27710.
David DeLong
Department of Radiology, Duke University Medical Center, Durham, NC 27710.
