Original Research
1 Department of Radiology, University Hospital of Rouen, 1 rue de Germont, Rouen
Cedex F-76031, France.
2 LITIS Laboratory, School of Medicine and Pharmacy, University of Rouen, Rouen,
France.
Received August 24, 2007;
accepted after revision March 28, 2008.
Address correspondence to J. N. Dacher
(jean-nicolas.dacher{at}univ-rouen.fr).
Abstract
MATERIALS AND METHODS. The true volumes of 20 animal kidneys of various sizes were obtained by fluid displacement. Each kidney was examined using two different MR units. Three-dimensional proton density–weighted acquisitions with an incremental slice thickness were performed. The MR volume was then measured with a segmentation algorithm based on the belief functions theory. Two independent observers performed all segmentations twice. Accuracy, intraobserver variability, and interobserver variability were evaluated by the Bland-Altman method. The number and type of manual corrections were recorded as well as the entire processing time.
RESULTS. The mean renal volume estimated by fluid displacement was 114 mL (range, 38–224 mL). With regard to the renal volumes obtained from assessments of adjacent axial MR images, the maximal SDs of the difference were 2.2 mL (accuracy), 0.6 mL (intraobserver variability), and 1.8 mL (interobserver variability). Segmentation of axial slices provided better accuracy and reproducibility than coronal slices. Overlapped coronal slices yielded poor results because of the partial volume effect. The mean processing time including optional manual modifications was less than 75 seconds.
CONCLUSION. The belief functions theory can be considered an accurate and reproducible mathematical method to assess renal volume from adjacent MR images.
Keywords: genitourinary imaging, kidney disease, MR technique, MR urography, renal function assessment, renal volumetry
Furthermore, previous studies have shown that MR urography could determine other functional parameters such as renal blood flow and single-kidney glomerular filtration rate, both of which require normalization to renal volume [10–12].
In previously reported studies, renal segmentation was manually performed after a threshold [13, 14] or nonthreshold [15–19] stage. Although new segmentation techniques have been applied to the kidney, few studies have evaluated their accuracy and repeatability [13–16].
The 3D segmentation algorithm evaluated in the present study is based on the belief functions theory, which permits managing imprecise and uncertain information such as partial volume effect and noise [20–23]. This theory represents a connection between fuzzy reasoning and probability. An imaged organ is generally composed of connected voxels sharing similar characteristics, such as the gray level which was analyzed in this study. The aim of our research was to assess the accuracy and repeatability of this algorithm in calculating renal volumes from ex vivo MR images.
Animal Kidneys and Standard Volume Measurements
The study was performed in nine lamb and 11 pig kidneys obtained from an
abattoir. The hilar structures were extensively removed. Intrasinusal cavities
were flat. The kidneys were first soaked in a basin filled with 0.9% saline
for 2 hours to expand to their fullest volume capacity. Volumes were then
measured by a technician not involved in the MR measurements. The kidney
volume was considered equal to the volume of the displaced fluid
[15]. The final result, used
as the reference kidney volume, was the average of four successive
measurements.
The kidneys were prepared for MRI. They were placed in suspension in fat in a plastic container to simulate normal perirenal tissue. Fat was composed of a mixture of sunflower oil and hydrogenated refined copra oil so that the mixture was liquid when heated and solid at room temperature. A first layer of heated fat was placed in the container and was then refrigerated. Once the first layer of fat was solid, four kidneys were placed on the fat layer. The container was then filled with liquid fat and refrigerated again.
MR Acquisitions
MR acquisitions were performed at room temperature within 6 hours after
reference measurements. Fat-suppressed gradient-echo proton
density–weighted acquisitions were performed. Three-dimensional
acquisitions of each kidney were performed using an incremental section
thickness on the two MR units available at our institution: a 1-T unit
(Gyroscan NT, Philips Healthcare) and a 1.5-T unit (Symphony, Siemens Medical
Solutions). All 20 kidneys were scanned in both the 1- and 1.5-T magnets.
The MR acquisition parameters are summarized in Tables 1 and 2. Coronal and axial acquisitions were performed and the signal was obtained via an abdominal phased-array coil. Sensitivity encoding (1.6 factor) was used for all acquisitions performed using the 1-T unit. A 4-mm overlapped sequence with an actual 8-mm slice thickness, available only on the 1-T unit, was also tested. Neither parallel imaging nor overlapped sequences were available on the 1.5-T equipment at the time of the experiment.
Segmentation Method
To label a given voxel, its gray level was evaluated as well as those of
the adjacent voxels in the same slice and in the two adjacent slices above and
below (3D segmentation). Data fusion permitted the algorithm to decide
whether the assessed voxel was within or beyond the limits of the organ. The
segmentation process is fully explained in Appendix 1.
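As an illustration of the data fusion step, the sketch below applies Dempster's rule of combination, the standard fusion operator of belief functions theory, to a two-class problem. It is a generic example and not the algorithm described in Appendix 1: the class labels, the function names `combine` and `label`, and the way mass values would be derived from gray levels are all assumptions made for the example.

```python
from itertools import product

# Frame of discernment for a two-class problem (labels are illustrative).
KIDNEY = frozenset({"kidney"})
BACKGROUND = frozenset({"background"})
EITHER = KIDNEY | BACKGROUND  # mass left uncommitted (ignorance)


def combine(m1, m2):
    """Combine two mass functions with Dempster's rule of combination."""
    combined = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            # Agreeing evidence: the product of masses supports the intersection.
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            # Contradictory evidence is accumulated, then normalized away.
            conflict += wa * wb
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict")
    return {s: w / (1.0 - conflict) for s, w in combined.items()}


def label(masses):
    """Fuse the evidence from several neighbours and pick the better-supported class."""
    fused = masses[0]
    for m in masses[1:]:
        fused = combine(fused, m)
    if fused.get(KIDNEY, 0.0) >= fused.get(BACKGROUND, 0.0):
        return "kidney"
    return "background"
```

Each mass function here would stand for the evidence contributed by one neighbouring voxel. Mass assigned to `EITHER` expresses uncertainty, for example from partial volume effect or noise, without forcing a premature choice between the two classes.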
After a stack of images was opened, the first step was to limit the volume of interest. The observer indicated the first and last images showing a visible part of the kidney; the stack to be processed comprised the images between them. Then, a rectangle was drawn on the image from the middle of the stack that showed the kidney in its largest dimensions, so as to encompass the kidney on all the selected images. This rectangle was automatically copied to all images in the stack for automatic segmentation.
After segmentation, a visual quality control check was performed. Manual correction of the segmented region of interest (ROI) on each image was possible at this stage. Last, the renal volume was automatically calculated by a voxel-count method.
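A voxel-count volume can be sketched as follows. The function name `volume_ml`, the nested-list mask layout, and the spacing values are assumptions for illustration. The sketch also assumes adjacent (non-overlapping) slices, so that the spacing between slices equals the slice thickness.

```python
def volume_ml(mask, spacing_mm):
    """Estimate a volume in mL from a segmented stack by counting voxels.

    mask       -- nested lists [slice][row][column] of truthy/falsy labels
    spacing_mm -- (slice spacing, row spacing, column spacing) in millimetres
    """
    dz, dy, dx = spacing_mm
    voxel_mm3 = dz * dy * dx
    # Count every voxel that the segmentation labelled as belonging to the organ.
    n_voxels = sum(1 for sl in mask for row in sl for v in row if v)
    # 1 mL = 1000 mm^3
    return n_voxels * voxel_mm3 / 1000.0
```

With this approach, any manual correction of the ROI changes the result only through the number of voxels labelled as kidney.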
The MR kidney volumes were measured independently by two observers: a radiologist with 3 years of experience in MR urography and a computer analyst who participated in the research. Each stack of images was evaluated on four separate occasions by each observer (Fig. 2). Manual corrections of segmented images were allowed for only two of the four measurements. The obtained kidney volume, the number of manual corrections performed when allowed, and the processing time, including the manual selection of the image volume fed into the software, were routinely recorded in the computer database. The whole processing time was automatically recorded by means of an additional plug-in. The time count started when the image stack was opened. The observers were blinded to the reference volumes, their own results, and the results obtained by the other observer.
Quantitative and Statistical Analyses
Water displacement measurements of renal volume—The
reproducibility of measurements was assessed by the Bland-Altman method
[26]. For each kidney, the
mean of the four results was calculated. One of these four results was
randomly chosen and was subtracted from the mean. Intraobserver variability
was calculated as follows: For each kidney, two of the four measurements were
randomly chosen, and then one was subtracted from the other. The mean
difference and the SD of the difference (SDD) were calculated. Results were
plotted on graphs showing the 95% limits of agreement (mean difference
± 2 SDDs).
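The Bland-Altman quantities described above (mean difference, SDD, and limits of agreement at the mean difference ± 2 SDDs) can be computed as in the following sketch. The function name `bland_altman` is illustrative, and the use of the sample standard deviation (n − 1 denominator) is an assumption, since the text does not state which form was used.

```python
from statistics import mean, stdev


def bland_altman(first, second):
    """Return (mean difference, SDD, (lower limit, upper limit)).

    first, second -- paired measurements of the same kidneys, in mL
    """
    diffs = [a - b for a, b in zip(first, second)]
    mean_diff = mean(diffs)
    sdd = stdev(diffs)  # sample SD of the differences (assumption: n - 1)
    limits = (mean_diff - 2 * sdd, mean_diff + 2 * sdd)
    return mean_diff, sdd, limits
```

The same function serves for accuracy (reference volume versus MR volume), intraobserver variability (first versus second measurement), and interobserver variability (observer 1 versus observer 2); only the pairing of the inputs changes.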
MR measurements of renal volume—The accuracy, intraobserver variability, and interobserver variability of the MR measurements were also assessed by the Bland-Altman method. The accuracy was obtained from the errors in measurements due to the algorithm. For each kidney, the error was defined as the difference between the reference volume and the first MR measurement of each observer. This calculation was performed for each MR unit, for all types of acquisition (coupling scan plane and thickness), and for manually modified versus nonmodified segmentation. For all kidneys (n = 20), the mean difference and SDD were calculated as previously described.
To assess intraobserver variability, the difference between the first and the second measurements of each observer was calculated for each kidney. To assess interobserver variability, the difference between the first measurements of both observers was calculated for each kidney. Bland-Altman calculations were performed using Microsoft Office Excel 2003.
To compare the results of the present study with those of previously reported series [13, 14], the volumes obtained from 3-mm-thick MR coronal slices (same thickness in comparators) were also plotted against reference volumes using a linear regression to obtain the coefficient of determination (R²).
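For reference, the coefficient of determination of a simple least-squares regression can be computed as in the sketch below. The function name `r_squared` is illustrative, and nothing here reproduces the spreadsheet calculation actually used in the study.

```python
def r_squared(x, y):
    """Coefficient of determination of the least-squares line of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual and total sums of squares around the fitted line and the mean.
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot
```

A value close to 1 shows that the MR volumes follow the reference volumes linearly. It does not by itself show agreement, which is why the Bland-Altman analysis remains the primary comparison.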
MR-Calculated Volumes
An experiment was used to explain how the limits of agreement were
determined by the Bland-Altman technique
(Fig. 3). No influence of
kidney volumes on the SDD values could be found in any of the experiments.
Accuracy, intraobserver variability, and interobserver variability graphs are shown in Figures 4, 5, and 6, respectively. The widest intervals were reported for each section thickness. For example, the accuracy-related SDDs obtained from 4-mm-thick adjacent axial images without manual modifications were 1.07 mL for observer 1 (Fig. 3) and 1.05 mL for observer 2 on the 1-T unit and 1.75 mL for observer 1 and 2.1 mL for observer 2 on the 1.5-T unit. Therefore, the 2.1-mL SDD defined the limits of agreement (4 × SDD = 8.4 mL; interval corresponding to –3 to 5.4 mL [Fig. 4]). All 95% limits of agreement spanned the axis of zero.
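The arithmetic behind this example can be written out as follows. The mean difference of 1.2 mL is not stated in the text; it is inferred here as the midpoint of the reported interval.

```latex
\begin{align*}
\text{interval width} &= 4 \times \mathrm{SDD} = 4 \times 2.1\ \text{mL} = 8.4\ \text{mL},\\
\bar{d} &= \tfrac{1}{2}\,(-3.0 + 5.4)\ \text{mL} = 1.2\ \text{mL} \quad \text{(inferred)},\\
\bar{d} \pm 2\,\mathrm{SDD} &= 1.2 \pm 4.2\ \text{mL} = [-3.0,\ 5.4]\ \text{mL}.
\end{align*}
```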
The mean number of manually corrected images per stack is shown in Figure 7. Image stacks were composed of 17–75 images depending on the scanning plane and section thickness. The processing time was always less than 75 seconds including manual corrections. Axial images required fewer modifications than coronal images (p < 0.05) for each section thickness. For axial acquisitions, approximately one image per stack was modified. In the coronal plane, the mean number of modified images per stack did not differ significantly regardless of the 2-, 3-, or 4-mm section thickness that was chosen. Conversely, the number of modifications was significantly higher for 5-mm-thick slices and 4-mm-thick overlapped slices (p < 0.02) than for the other section thicknesses.
R² was equal to 0.99 for both observers and for each trial when assessing the accuracy of the measurements in 3-mm-thick coronal MR images.
Fresh animal kidneys were used for our study rather than physical objects. This choice was made because the specific shape and surface-to-volume ratio [28] of kidneys are prone to induce partial volume effect, which impacts the quality of segmentation. Lamb and pig kidneys were selected for their size to simulate the kidneys of children and adults, respectively [16, 29]. Water displacement was chosen as the reference method despite a 5-mL range (4 × 1.25 mL) of uncertainty.
Segmentation Algorithm
The segmentation algorithm was tested on two different MR units. Geometric
parameters (matrix, field of view) were defined on the basis of clinical
practice. Basically, axial and coronal scans were assessed with an incremental
slice thickness. The results obtained on one unit could not be compared with
those obtained on the other because of different reconstruction algorithms and
voxel sizes (Table 1). A 350-mm
asymmetric field of view commonly used to examine adults in routine practice
was chosen for all experiments. Because smaller voxels induce less partial
volume effect, results could be improved in cases of a reduced field of view
that is adapted to image pediatric patients rather than adults.
The overall evaluation of the segmentation method appeared satisfactory when processing adjacent slices. The quality of the results (accuracy and reproducibility) appeared to decrease with the thickness of slices in the coronal plane (Figs. 4, 5, 6). In contrast, slice thickness did not seem to influence the quality of segmentation in the axial plane.
Results obtained from axial images were better than those from coronal images. This finding is in agreement with previously published data [28]. The axial plane produces less partial volume effect because it is roughly perpendicular to the long axis of the kidneys. In addition, more slices were available for measurement (Table 2). Furthermore, in this ex vivo study, coronal slices could have been more affected by partial volume effect than in vivo because of the flattening of the kidneys. Whatever the slice thickness, it is noteworthy that the intraobserver variability of segmentation of axial MR slices remained lower than that of water displacement.
Interobserver SDD of adjacent segmented axial scans remained equal to or below 1.8 mL in the absence of manual modification. The impact of manual modifications was positive on interobserver variability when segmenting adjacent axial slices (interobserver SDD < 0.5 mL). In contrast, this impact was negative when segmenting adjacent coronal slices; then, interobserver SDD reached 3.3 mL with 4-mm-thick slices and as high as 5.4 mL with 5-mm-thick slices. Again, this apparent contradiction can be explained by the marked partial volume effect observed in the coronal plane. Results appear to indicate that the semiautomatic segmentation algorithm was more effective than human operators in cases of significant partial volume effect. Comparable impact of manual modifications was obtained in terms of intraobserver variability.
The overall results obtained from segmentation of coronal overlapped 4-mm slices were less satisfying. The SDD reached 10.5 mL for accuracy, 8.1 mL for intraobserver variability, and 5.7 mL for interobserver variability. In our experience, this type of acquisition should be avoided when assessing renal volume. In this situation, the algorithm, in fact, proved more accurate when used with no manual modification.
Several scientific congress communications have dealt with kidney segmentation in the past decade. Some have focused on the time–signal intensity curves of specific regions such as the renal cortex or medulla. In those studies, the accuracy and reproducibility of the segmentation procedures were not assessed on multiple models. Furthermore, the kidney volume was not evaluated.
In contrast, the authors of some clinical studies have focused their attention on the assessment of kidney volume. Manual segmentation with [13, 14] or without [15–19] a preliminary threshold stage has been used. Manual segmentation has been shown to be time-consuming and to yield relatively poor reproducibility. Semiautomatic segmentation (i.e., median filter, user-defined intensity thresholds, morphologic erosions and dilatations, and region growing steps) has also been used to determine renal volumes [7–9].
Few authors have evaluated the reproducibility of their segmentation technique. Among them, Bakker et al. performed two experiments, one in pigs [15] and another in humans [16]. As in our study, intraobserver variability and interobserver variability were obtained using the Bland-Altman method. Five-millimeter-thick images were acquired in the coronal plane. The SDD for intraobserver variability was 8.2 mL in pigs [15] and 7.3 mL in humans [16], as compared with 3.3 mL with manual modifications (0.6 mL without manual modification) in the part of our study testing the same slice thickness with an inferior spatial resolution. The SDD for interobserver variability was 6.4 mL in pigs [15] and 9.9 mL in humans [16], as compared with 5.4 mL with manual modifications (0.6 mL without manual modification) in our study.
The processing time was reported in the in vitro study by Bakker et al. [15]. It ranged from 5 to 8 minutes as compared with less than 75 seconds in the present series.
The main advantage of the belief functions theory is its ability to merge 3D information originating from adjacent voxels to delineate the kidney boundary. The algorithm is only semiautomatic because the first part of the process (i.e., definition of the volume of interest) is manual. Belief functions theory has also been used for other purposes such as segmentation of brain MR images [30] and thoracic and abdominal CT images [21]. The algorithm used in that research has been previously used to segment intrathoracic organs from CT images in the context of conformational external radiation therapy.
Despite the overall optimal results obtained with this segmentation method, manual corrections may be necessary (Fig. 7) and the software was designed to account for these adjustments. Different situations occurred in our study. For example, a common situation requiring manual modification was segmentation of intrasinusal fat (Fig. 8A, 8B). The segmentation was basically appropriate in the experiment, but it had to be modified because of the mode of comparison (water displacement), which included the remaining intrasinus fat. This slight overestimation of the renal parenchyma volume is a limitation of the reference method. To assess the method accuracy, MR measurement error was the MR volume subtracted from the water displacement volume. Because of sinus fat, the mean measurement error was always slightly positive for adjacent slices processed without manual modification (Fig. 4).
Our study has other limitations and cannot be immediately extrapolated to MR urography in humans. First, the study did not test the influence of respiratory motion. Moreover, MR urography is mainly used in children, in whom motion artifacts can degrade image quality. Second, the investigated kidneys were not contrast-enhanced and their positioning was artificial. Third, a two-class algorithm (see Appendix 1) adapted to ex vivo kidneys was used, whereas in humans a three-class algorithm is required to segment the kidney from perirenal fat, the urinary excretory tract, and adjacent organs. Finally, more than two observers would have been preferable to strengthen the results of the study.
Clinical Applications
For clinical use, a three-class algorithm was made available as a plug-in in the public domain software ImageJ [24, 25, 31, 32]. This tool can be freely accessed at the National Institutes of Health Website (http://rsb.info.nih.gov/ij/). The program runs as a downloadable application on any computer with a 1.1 or later version of Java Virtual Machine (Sun Microsystems). The algorithm enables functional renal parenchyma volume assessment from adjacent contrast-enhanced T1-weighted DICOM images covering the entire kidneys. We suggest performing the volume assessment sequence at the end of the dynamic phase of MR urography. At that stage, the renal excretory system is usually contrast-enhanced. We reinject IV the same low dose of gadolinium chelate as that used for the dynamic study (0.025–0.05 mmol/kg) [33–35] (Fig. 9). The volume assessment sequence is then acquired at the tubular phase 60 seconds after this second injection. Renal parenchyma signal is then homogeneous and is superior to that of renal vessels and adjacent organs and is inferior to that of the enhanced urinary tract.
Acknowledgments
We thank R. Medeiros, Rouen University Hospital Medical Editor, for his
valuable advice in editing the manuscript; P. Vannoorenberghe, who initiated
the belief functions theory in the laboratory; and J. F. Menard, who
supervised the statistical analysis.