1 Division of Neuroradiology, The Johns Hopkins Medical Institutions, 600 N.
Wolfe St., Baltimore, MD 21287-7619.
2 MR Perception Laboratory and the F. M. Kirby Center for Functional Magnetic
Resonance Imaging of The Kennedy Krieger Institute, Baltimore, MD
21287-7619.
3 Present address: Department of Radiology, Hospital of the University of
Pennsylvania, 3400 Spruce St., Philadelphia, PA 19104.
4 Department of Biostatistics, The Johns Hopkins School of Public Health,
Baltimore, MD 21287-7619.
Received April 9, 2002;
accepted after revision July 11, 2002.
Address correspondence to E. R. Melhem.
Abstract
MATERIALS AND METHODS. We generated T2-weighted brain MR images (TR/TE, 4000/80) with simulated hyperintense lesions derived from a real multiple sclerosis plaque. The size of the original multiple sclerosis lesion was varied by scaling up or down the lesion using a bicubic interpolation method. Three hundred seventy-eight composite images, in which two T2-weighted images containing lesions were paired, were presented to three equally trained neuroradiologists to define thresholds below which changes in original lesion size could not be detected. Stepwise logistic regression was used to evaluate the dependency of size thresholds on the original size of the lesion.
RESULTS. Thresholds ranged from a 5% to 15% increase in the original lesion diameter. For increases greater than 15%, all three reviewers detected the change in lesion size irrespective of the diameter of the original lesion. There was a dependency of the threshold on the diameter of the original lesion (p = 0.02).
CONCLUSION. Using an MR simulator, we can define thresholds below which changes in original lesion size cannot be reliably detected. These results may guide the design of clinical trials that rely on trained reviewers to assess change in lesion burden.
Introduction
Assessment of change in brain lesion load can be achieved using qualitative (visual inspection by trained reviewers) [16, 18] or quantitative [7, 8, 19, 20] methods. Quantitative methods can be further divided into fully automated methods and semiautomated methods that rely on manual outlining of lesions by trained reviewers followed by computer-based quantification [7]. However, the fully automated methods continue to require validation against the current standard of manual outlining by trained reviewers [7].
Using simulated T2-hyperintense brain lesions placed in computer-generated T2-weighted brain MR images, we sought to define thresholds below which trained reviewers cannot detect changes in lesion size. We hypothesized that size thresholds do exist, that size thresholds are dependent on the original size of the lesion, and that size thresholds vary with the reviewer.
Materials and Methods
A mixed multiecho, spin-echo, and inversion recovery MR sequence was used to obtain images of the brain of a 40-year-old consenting male volunteer at the level of the lateral ventricles. The mixed sequence simultaneously provided two image data sets: eight spin-echo images (TR, 1500 msec; 1 signal acquired) with different TEs (20, 40, 60, 80, 100, 120, 140, and 160 msec) and eight inversion recovery images (TR, 2000 msec; inversion time, 400 msec; 1 signal acquired) at the same TEs. Image slice thickness was 5 mm, in-plane resolution was 0.80 x 0.86 mm (rectangular field of view, 165 x 220 mm; scan matrix, 205 x 256), and acquisition time was 9 min 30 sec.
Brain-Image Simulation
The generation of normal brain images involved two steps. First, we
generated pixel-by-pixel T1-relaxation, T2-relaxation, and proton density
(ρ) brain maps (256 x 256 matrix) at the level of the lateral
ventricles online (software release 6.2, Philips Medical Systems) using the
image data sets from the mixed MR sequence. Second, we generated images
simulating T2-weighted MR sequences offline (Enterprise 5500; Sun
Microsystems, Mountain View, CA) using T1-relaxation, T2-relaxation, and
proton density pixel values from the corresponding maps.
We developed image-simulation software using Interactive Data Language
(Research Systems, Boulder, CO). Each pixel value S(x, y) of
the generated image was calculated using the following equation:
[Equation image not available in the source text.]
We derived T1(x, y), T2(x, y), and ρ(x, y)
from the corresponding pixel values of the T1-relaxation, T2-relaxation, and
proton density maps, respectively. We selected parameters for the T2-weighted
MR sequence on the basis of values typically used in clinical brain imaging:
TR/TE, 4000/80.
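
The article's own signal equation is not reproduced in this text. As a minimal sketch only, the pixel-by-pixel calculation could look like the following, assuming the commonly used approximate spin-echo expression S = ρ · (1 − e^(−TR/T1)) · e^(−TE/T2). That expression, and the function names `spin_echo_signal` and `simulate_image`, are assumptions for illustration and are not taken from the authors' Interactive Data Language software.

```python
import math


def spin_echo_signal(rho, t1_ms, t2_ms, tr_ms=4000.0, te_ms=80.0):
    """Approximate spin-echo signal for one pixel.

    ASSUMPTION: uses the standard approximate spin-echo form
    S = rho * (1 - exp(-TR/T1)) * exp(-TE/T2); the article's exact
    equation is not available in this text.
    """
    return rho * (1.0 - math.exp(-tr_ms / t1_ms)) * math.exp(-te_ms / t2_ms)


def simulate_image(rho_map, t1_map, t2_map, tr_ms=4000.0, te_ms=80.0):
    """Apply the signal expression to every pixel of three same-sized maps."""
    return [
        [
            spin_echo_signal(rho, t1, t2, tr_ms, te_ms)
            for rho, t1, t2 in zip(rho_row, t1_row, t2_row)
        ]
        for rho_row, t1_row, t2_row in zip(rho_map, t1_map, t2_map)
    ]
```

The defaults follow the TR/TE of 4000/80 stated above; the maps are plain nested lists so the sketch needs no third-party packages.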
Lesion Simulation
Using the same mixed multiecho, spin-echo, and inversion recovery MR
sequence described previously, we obtained images of the brain of a
32-year-old consenting woman with a known multiple sclerosis lesion in the
left centrum semiovale. We derived T1(x, y), T2(x, y), and
ρ(x, y) for all the pixels in the multiple sclerosis lesion from
the corresponding pixel values of the T1-relaxation, T2-relaxation, and proton
density maps, respectively (Fig. 1). Only pixels with T1, T2, and
ρ values two standard
deviations or more above the average pixel values in the corresponding
normal-appearing white matter (right centrum semiovale) were included in the
multiple sclerosis lesion.
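
The inclusion rule above can be sketched as follows. This is an illustration under stated assumptions: the cutoff uses the sample standard deviation of the reference white matter, and a pixel is kept only if all three of its values (T1, T2, and proton density) meet their cutoffs. The text does not specify either detail, and the function names are hypothetical.

```python
from statistics import mean, stdev


def above_cutoff(values, reference, n_sd=2.0):
    """Flag values at least n_sd standard deviations above the reference mean.

    ASSUMPTION: the sample standard deviation is used; the article does not
    say which estimator was applied.
    """
    cutoff = mean(reference) + n_sd * stdev(reference)
    return [value >= cutoff for value in values]


def lesion_pixels(t1, t2, rho, ref_t1, ref_t2, ref_rho):
    """Keep a pixel only if T1, T2, and proton density all exceed their cutoffs.

    ASSUMPTION: the three criteria are combined with a logical AND.
    """
    flags = zip(
        above_cutoff(t1, ref_t1),
        above_cutoff(t2, ref_t2),
        above_cutoff(rho, ref_rho),
    )
    return [all(pixel_flags) for pixel_flags in flags]
```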
Nine simulated original lesions with diameters ranging from 4 to 12 mm in increments of 1 mm were generated on the basis of the pixel values of the multiple sclerosis lesion. We enlarged each original lesion 20 times in increments of 5%, resulting in a group of 21 corresponding lesions (including the original) whose diameters were equal to X + X · I · 0.05, where X is the diameter of the original lesion and I is an integer ranging from 0 to 20 (Fig. 2A-2F).
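
As a small worked example of the diameter formula X + X · I · 0.05, the sketch below lists the 21 diameters for each of the nine original lesions. The function and variable names are illustrative only.

```python
def lesion_diameters(original_mm, steps=20, increment=0.05):
    """Return the diameters X + X * I * increment for I = 0..steps."""
    return [original_mm + original_mm * i * increment for i in range(steps + 1)]


# One group of 21 diameters for each original lesion of 4 to 12 mm.
groups = {x: lesion_diameters(x) for x in range(4, 13)}
```

For a 10-mm original lesion, the series runs from 10 mm (I = 0) to 20 mm (I = 20), that is, up to a doubling of the original diameter.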
We varied the size of the original multiple sclerosis lesion by scaling the lesion up or down using a bicubic interpolation method, which allowed subpixel scaling in steps of 0.20 mm. The interpolated contour of the lesion was determined by embedding the lesion in the brain using a 5% threshold. Each of the simulated lesions was placed in the same location (white matter of the right parietooccipital region) on a separate T2-weighted MR image.
For the sake of comparison, we generated composite images in which T2-weighted MR images containing lesions of one group were paired side by side with the image containing the original lesion of that group. The image containing the original lesion was placed on the left of the composite image 20 times and on the right 20 times. In addition, two extra composite images, in which two identical images containing the original lesion were placed side by side, were generated. This process resulted in a total of 42 composite images for each of the nine groups.
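
The counts in this paragraph can be checked with a short enumeration. The sketch assumes that each of the 20 enlarged lesions is shown once with the original on the left and once with the original on the right, which is one reading consistent with the stated totals. The names are hypothetical.

```python
def composites_for_group(original_mm, steps=20, increment=0.05):
    """List (left_diameter, right_diameter) pairs for one original lesion.

    ASSUMPTION: every enlarged lesion is paired with the original twice,
    once per side, plus two identical original-original pairs.
    """
    pairs = []
    for i in range(1, steps + 1):
        enlarged_mm = original_mm * (1 + i * increment)
        pairs.append((original_mm, enlarged_mm))  # original on the left
        pairs.append((enlarged_mm, original_mm))  # original on the right
    pairs.extend([(original_mm, original_mm)] * 2)  # two identical pairs
    return pairs


all_composites = [
    pair for original_mm in range(4, 13) for pair in composites_for_group(original_mm)
]
```

Each group yields 40 + 2 = 42 pairs, and the nine groups together yield 378 composite images, matching the number presented to the reviewers.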
Image Presentation
Three neuroradiologists of equivalent training (instructor level) evaluated
each of the 378 composite images in the same randomized order. None of the
reviewers were involved in any other aspect of the study. The reviewers
received instructions that included the aim of the study, presentation of
sample images with simulated lesions, and a 5-min practice session immediately
preceding the reviewing session. They were also informed that a total of 378
composite images would be shown to them in one session, that images would be
presented in random order, and that the reviewing time for one image should be
minimized and must not exceed 20 sec.
Reviewers examined one composite image at a time on a 21-inch (53 cm)
monitor (viewable composite image size, 16 x 12 inches [40.64 x
30.48 cm], 1600 dots [horizontal] x 1200 lines [vertical]). The window
width and level were adjusted across images, using an Interactive Data
Language routine based on the equation:
[Equation image not available in the source text.]
Reviewers scored each of the composite images on the basis of a 3-point scoring system: 0, simulated lesion in image on left is equal in size to lesion in image on right; 1, simulated lesion in image on left is larger than lesion in image on the right; and 2, simulated lesion in image on left is smaller than lesion in image on right.
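
The 3-point scale maps directly onto a comparison of the two diameters in a composite image. A minimal sketch of the correct score for a given pair follows; the function names and the tolerance argument are illustrative assumptions.

```python
def expected_score(left_mm, right_mm, tol=1e-9):
    """Return the correct score: 0 equal, 1 left larger, 2 left smaller."""
    if abs(left_mm - right_mm) <= tol:
        return 0
    return 1 if left_mm > right_mm else 2


def is_correct(reviewer_score, left_mm, right_mm):
    """Check a reviewer's score against the correct score for the pair."""
    return reviewer_score == expected_score(left_mm, right_mm)
```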
One reviewer reevaluated each of the 378 composite images for assessment of intrarater reliability after a 7-week interval, as previously described.
Data Analysis
Thresholds below which at least one reviewer assigned the incorrect score
(false-negative and false-positive reviews) were identified.
Stepwise logistic regression was used to evaluate the dependency of size thresholds on the original size of the lesion, on the side (whether, in the composite image, the image containing the original lesion was placed on the left or on the right), and on the reviewer. The variables used for the model were the reviewers' correct or incorrect responses (binary dependent variable), the diameter of the original lesion (continuous independent variable), the side (binary independent variable), and the reviewer (categorical independent variable). A p value of less than 0.05 was considered statistically significant.
The kappa statistic was used for pairwise evaluation of inter- and intrarater reliability.
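
For readers unfamiliar with the statistic, a pairwise kappa can be computed as sketched below. This assumes the unweighted Cohen's kappa; the text says only "kappa statistic", so the exact variant used in the study is not confirmed here.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's kappa for two equal-length lists of scores.

    ASSUMPTION: unweighted form; the article does not name the variant.
    """
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(ratings_a)
    # Observed proportion of images on which the two raters agree.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Agreement expected by chance from each rater's marginal frequencies.
    counts_a, counts_b = Counter(ratings_a), Counter(ratings_b)
    categories = set(counts_a) | set(counts_b)
    expected = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical category
    return (observed - expected) / (1.0 - expected)
```

The same function serves for intrarater reliability by passing one reviewer's first and second sets of scores.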
Results
Thresholds ranged from 5% to 15% increase in the original lesion diameter. For increases greater than 15%, all three reviewers assigned the correct score irrespective of the diameter of the original lesion.
Stepwise logistic regression showed a dependency of the threshold on the diameter of the original lesion (p = 0.02) with lower thresholds found at a larger original lesion diameter (Fig. 3). A weak but significant difference in the threshold was also found among the three reviewers (p = 0.03). No difference was shown based on the side (p = 0.85).
Because of complete agreement among the three reviewers for increases
beyond 15%, the kappa statistic for evaluation of inter- and intrarater
reliability was calculated only for lesions with diameters that were 0-15%
larger than the original lesions. For interrater reliability, there was
excellent agreement between reviewers 1 and 2 (κ = 0.77) and reviewers 2
and 3 (κ = 0.73) and good agreement between reviewers 1 and 3 (κ = 0.56).
For intrarater reliability, there was excellent review-rereview
agreement (κ = 0.92).
Discussion
Qualitative assessment (visual inspection by trained reviewers) and manual outlining of brain lesions remain integral steps for lesion load assessment in clinical trials [7, 8]. Our results show that trained reviewers can reliably detect a 15% or more increase in diameter for all original lesion sizes and a 10% or more increase in diameter for original lesions greater than 6 mm. For increases below these thresholds, there is good to excellent agreement among reviewers. This information may guide the design of clinical trials that rely on trained reviewers to assess changes in lesion burden on MR images. For example, investigators should be aware that the effectiveness of a specific drug designed to stop the progression of brain disease can only be questioned when the increase in lesion diameter exceeds the defined threshold. Our results also show that despite good to excellent agreement among reviewers (kappa statistic), the stepwise logistic regression showed a weak but significant difference in the threshold among the three reviewers (p = 0.03). This finding should caution investigators not to rely simply on the level of training when choosing reviewers for their trials but to test the interrater reliability of reviewers before recruitment using tools such as an MR simulator.
An interesting observation made from this study is the ability of the trained reviewers to detect increases in lesion diameter below the spatial resolution of the image. For example, a 10% increase in the diameter of a 6- or 7-mm lesion is below the in-plane spatial resolution of the image (0.80 x 0.86 mm). Although not addressed by the study design, this observation is probably due to the ability of trained reviewers to interpret increases in edge-pixel signal intensity as an increase in lesion size [28]. The bicubic interpolation method used in this study assigns greater signal intensity in the edge pixels for subpixel increases in diameter and defines a lower limit of 0.20 mm for detectable change. This new lower limit imposed by the interpolation method precludes us from studying original lesions with diameters of less than 4 mm.
Unlike previous studies that rely on phantoms [29], our study aimed to introduce an MR simulator containing more realistic-appearing simulated lesions based on a real multiple sclerosis lesion with the flexibility of assigning individual lesion size. Using this simulator, we could evaluate the effect of lesion number and location and the type of MR pulse sequence (T2-weighted vs fluid-attenuated inversion recovery vs proton density-weighted) on reviewers' sensitivity, specificity, and reliability [30]. In the future, this simulator may help assess the reliability of a reviewer's outlining of lesions before and during a clinical trial that uses brain imaging as a surrogate marker. We also envision this simulator being used to determine accuracy and reliability as well as to establish minimal performance standards for fully automated computer-based methods designed to quantify lesion load.
Our method for calculating T1, T2, and proton density maps from image data generated by a multiecho spin-echo sequence interleaved with a multiecho inversion recovery MR sequence is based on the ratios and least squares algorithm [31]. This method has been validated with phantoms and human volunteers, and the calculated T1, T2, and proton density values are in agreement with values obtained using spectrometry [31].
Potential limitations of the simulator include the use of an approximate solution to the Bloch equation to generate the weighted images. A more exact solution includes a denominator (1 + e^(-TR/T1) e^(-TR/T2)) [32, 33]. The approximate solution is used in our simulations because the effect of the denominator on the signal intensity from brain parenchyma (gray and white matter) and simulated lesions is less than 1% on the T2-weighted MR images. Furthermore, using the approximate solution simplifies the calculation of the images and simulated lesions.
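
The size of the denominator's effect can be illustrated numerically. The sketch below reads the denominator as 1 plus the product of the two exponentials, as the expression appears in this text, and uses round-number relaxation times that are assumptions chosen for illustration rather than values measured in the study.

```python
import math


def exact_denominator(t1_ms, t2_ms, tr_ms=4000.0):
    """Denominator 1 + exp(-TR/T1) * exp(-TR/T2), as read from the text."""
    return 1.0 + math.exp(-tr_ms / t1_ms) * math.exp(-tr_ms / t2_ms)


def percent_signal_change(t1_ms, t2_ms, tr_ms=4000.0):
    """Percentage by which dividing by the denominator lowers the signal."""
    return (1.0 - 1.0 / exact_denominator(t1_ms, t2_ms, tr_ms)) * 100.0


# ASSUMPTION: illustrative (T1, T2) values in msec, not taken from the article.
ASSUMED_TISSUES = {
    "white matter": (600.0, 80.0),
    "gray matter": (900.0, 100.0),
    "lesion": (1200.0, 150.0),
}

effects = {
    name: percent_signal_change(t1_ms, t2_ms)
    for name, (t1_ms, t2_ms) in ASSUMED_TISSUES.items()
}
```

With a TR of 4000 msec and T2 values on the order of 100 msec, the second exponential is vanishingly small, so under these assumed values the correction stays far below the 1% bound quoted above.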
Because our simulation was based on conventional spin-echo sequences, we were unable to study the effects of fast imaging techniques, such as rapid acquisition with relaxation enhancement or echo-planar readouts, on the point-spread function of the simulated lesions and hence on the ability of trained reviewers to detect changes in lesion size [24, 34-37].
In conclusion, using an MR simulator containing simulated lesions based on a real multiple sclerosis lesion, we could define thresholds below which trained reviewers could not detect changes in lesion size. We also showed that these size thresholds are dependent on the original size of the lesion and vary among equally trained reviewers. Our results may guide the design of clinical trials that rely on trained reviewers to assess change in lesion burden on MR images.