Original Research
1 Seoul National University College of Medicine, Institute of Radiation
Medicine, Seoul National University Medical Research Center, Seoul, Korea; and
Department of Radiology, Seoul National University Bundang Hospital, 300
Gumi-dong, Bundang-gu, Seongnam-si, Gyeonggido, 463-707, Korea.
2 Max-Planck-Institut für Informatik, Department 4, Computer Graphics Bldg.
46.1, Rm. 227, Stuhlsatzenhausweg 85, 66123 Saarbrücken, Germany.
Received November 10, 2007;
accepted after revision December 24, 2007.
Address correspondence to K. H. Lee
(kholee{at}snubhrad.snu.ac.kr).
Abstract
MATERIALS AND METHODS. At reversible, 4:1, 6:1, 8:1, 10:1, and 15:1 Joint Photographic Experts Group (JPEG) 2000 compressions, we compared the artifacts in 20 matching compressed thin sections (0.67 mm), compressed thick sections (5 mm), and AIP images (5 mm) reformatted from the compressed thin sections. The artifacts were quantitatively measured with peak signal-to-noise ratio (PSNR) and a perceptual quality metric (High Dynamic Range Visual Difference Predictor [HDR-VDP]). By comparing the compressed and original images, three radiologists independently graded the artifacts as 0 (none, indistinguishable), 1 (barely perceptible), 2 (subtle), or 3 (significant). Friedman tests and exact tests for paired proportions were used.
RESULTS. At irreversible compressions, the artifacts tended to increase in the order of AIP, thick-section, and thin-section images in terms of PSNR (p < 0.0001), HDR-VDP (p < 0.0001), and the readers' grading (p < 0.01 at 6:1 or higher compressions). At 6:1 and 8:1, the percentage of distinguishable pairs (grades 1-3) tended to increase in the order of AIP, thick-section, and thin-section images. The visually lossless threshold for compression varied between images but decreased in the order of AIP, thick-section, and thin-section images (p < 0.0001).
CONCLUSION. Compression artifacts in thin sections are significantly attenuated in AIP images. On the premise that thin sections are typically reviewed using an AIP technique, it is justifiable to compress them to a compression level currently accepted for thick sections.
Keywords: abdominal imaging, artifacts, average intensity projection, CT, data compression, human visual system, image quality metric, JPEG 2000
5 mm) for archiving, filming, and interpreting purposes, although thin sections are still needed for 3D techniques such as multiplanar reformation or volume rendering [1, 4-6]. To obtain thick sections, some researchers average a series of contiguous thin sections (average intensity projection [AIP], or raysum) [2-6, 9, 10] instead of reconstructing thick sections directly from the raw projection data. The sliding-slab AIP technique [10] is also increasingly used to review thin sections quickly and efficiently [5, 10, 11] and has been adopted as the default viewing mode in several commercial CT workstations. This technique renders overlapping AIP slabs of a desired thickness in real time as users slide the slab along a given viewing direction, creating the illusion of image-to-image continuity. Averaging improves image quality without increasing radiation dose because noise cancels across the source thin sections being averaged [4, 5], while the spatial resolution inherent in the thin sections is preserved. Because these advantages enable radiologists to capitalize on the improved imaging capabilities of isotropic voxel scanners, the sliding-slab AIP technique could become a primary interpretation mode for thin-section CT data sets [2, 5, 10]. However, adopting this visualization technique in routine clinical practice increases the cost and time of storing and transmitting the thin-section image data sets.
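The sliding-slab rendering described above amounts to a moving average along the z-axis. The following is a minimal Python sketch, not the algorithm of any commercial workstation; it assumes equally spaced thin sections already loaded as flattened lists of pixel values, and the function name `sliding_slab_aip` is hypothetical. Prefix sums make each slab cost the same regardless of slab thickness, which is one plausible way such rendering can remain interactive.

```python
from typing import List, Sequence


def sliding_slab_aip(sections: Sequence[Sequence[float]], n: int) -> List[List[float]]:
    """Average every run of n contiguous thin sections, advancing one section at a time.

    `sections` are thin sections ordered along z, each flattened to pixel values.
    Consecutive slabs overlap by n - 1 sections, mimicking a slab slid through the volume.
    """
    if not 1 <= n <= len(sections):
        raise ValueError("n must be between 1 and the number of sections")
    n_pixels = len(sections[0])
    # prefix[k][p] holds the sum of pixel p over sections 0..k-1.
    prefix = [[0.0] * n_pixels]
    for section in sections:
        prefix.append([acc + value for acc, value in zip(prefix[-1], section)])
    # Each slab is the difference of two prefix sums divided by n,
    # so its cost does not grow with slab thickness.
    return [
        [(upper - lower) / n for upper, lower in zip(prefix[start + n], prefix[start])]
        for start in range(len(sections) - n + 1)
    ]


# Four 2-pixel "sections"; a 2-section slab slid along z gives three overlapping slabs.
print(sliding_slab_aip([[0, 10], [2, 10], [4, 10], [6, 10]], 2))
# [[1.0, 10.0], [3.0, 10.0], [5.0, 10.0]]
```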
The purpose of this study was to assess the effects of compressing source thin-section abdominal CT images on the final transverse AIP images.
CT Scanning
A 64-MDCT scanner (Brilliance, Philips Medical Systems) was used with the following parameters: detector collimation, 0.625 × 64 mm; gantry rotation time, 0.42 second; tube potential, 120 kVp; and pitch, 1.078-1.173. The area scanned ranged from the diaphragm to the symphysis pubis. We used automatic tube current modulation; the effective mAs [23] for the 20 images forming our study sample ranged from 126 to 175 effective mAs (mean ± SD, 149.6 ± 10.6 effective mAs). The estimated effective dose in the 20 sample scans was 7.9-9.9 mSv (mean ± SD, 8.8 ± 0.6 mSv).
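Effective mAs normalizes the tube current-time product for pitch. Assuming the commonly used vendor definition (tube current × rotation time ÷ pitch, which we assume corresponds to reference [23]), the short sketch below converts in both directions; the function names are hypothetical. Because automatic tube current modulation was used, any back-calculated tube current is only an average.

```python
def effective_mas(tube_current: float, rotation_time_s: float, pitch: float) -> float:
    """Effective mAs, assuming mAs_eff = tube current (mA) x rotation time (s) / pitch."""
    if pitch <= 0 or rotation_time_s <= 0:
        raise ValueError("pitch and rotation time must be positive")
    return tube_current * rotation_time_s / pitch


def average_tube_current(mas_eff: float, rotation_time_s: float, pitch: float) -> float:
    """Invert the same assumed relation to get the (average) tube current in mA."""
    if pitch <= 0 or rotation_time_s <= 0:
        raise ValueError("pitch and rotation time must be positive")
    return mas_eff * pitch / rotation_time_s


# Illustration only: the mean of 149.6 effective mAs at 0.42 s and pitch 1.078
# would correspond to an average tube current of roughly 384 mA.
print(round(average_tube_current(149.6, 0.42, 1.078)))  # 384
```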
From each set of raw projection data, two CT image data sets, one thick-section and one thin-section, were reconstructed. The thick sections had a section thickness of 5 mm and a reconstruction interval of 4 mm, whereas the thin sections had a section thickness of 0.67 mm and a reconstruction interval of 0.33 mm. All other reconstruction parameters (image position along the x- and y-axes, field of view [262-358 mm], and reconstruction filter type [filter B]) were kept constant for the two image data sets.
Image Selection
By retrospectively reviewing the thick sections of 100 examinations performed in early January 2007, an abdominal radiologist with 7 years of clinical experience compiled 20 images showing common abnormalities (Appendix 1). One image was selected per patient. The selected patients were 12 men and eight women ranging in age from 22 to 88 years (mean, 50 years). The 20 selected images included 14 images above the umbilicus and six images below the umbilicus. For each selected thick section, the matching thin section was the one whose z-axis position was nearest. The theoretical difference in image position between matching thin and thick sections did not exceed 0.17 mm (half the 0.33-mm reconstruction interval of the thin sections) and was regarded as unimportant in subsequent analyses.
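The matching step can be written as a nearest-neighbor lookup along z. The hypothetical helper below (not taken from the study's software) assumes the thin-section z positions are known and sorted; the usage lines check numerically that, for targets inside the scanned range, the offset never exceeds half the reconstruction interval.

```python
import bisect
from typing import Sequence, Tuple


def nearest_section(z_positions: Sequence[float], z_target: float) -> Tuple[int, float]:
    """Return (index, absolute z offset in mm) of the thin section closest to z_target.

    z_positions must be sorted in ascending order.
    """
    if not z_positions:
        raise ValueError("no thin sections given")
    i = bisect.bisect_left(z_positions, z_target)
    # The nearest section is either just below or just above the insertion point.
    candidates = [j for j in (i - 1, i) if 0 <= j < len(z_positions)]
    best = min(candidates, key=lambda j: abs(z_positions[j] - z_target))
    return best, abs(z_positions[best] - z_target)


# Thin sections every 0.33 mm; targets inside the covered range lie within 0.165 mm.
thin_z = [k * 0.33 for k in range(100)]
worst = max(nearest_section(thin_z, t * 0.05)[1] for t in range(600))
print(worst <= 0.33 / 2 + 1e-9)  # True
```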
Image Compression
Each image had a bit depth of 12 bits per pixel, stored on a 2-byte boundary. Compression level was defined as the ratio of the original pixel size (16 bits per pixel) to the compressed size in bits per pixel [24]. Using a Joint Photographic Experts Group (JPEG) 2000 algorithm (PICTools, version 2.00.543, Pegasus Imaging), the thick and thin sections were compressed to six different levels: a reversible level (as the negative control) and irreversible levels of 4:1, 6:1, 8:1, 10:1, and 15:1. The JPEG 2000 encoder was used with its default settings: reversible 5/3 wavelet filter and irreversible 9/7 wavelet filter; single tile; 6 levels of wavelet decomposition; code-block size, 64 × 64; precinct size, 32,768 × 32,768; and a single layer.
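Because the ratio is defined against 16 stored bits rather than the 12 acquired bits, a given nominal ratio corresponds to a lower ratio when computed against 12 bits (for example, 8:1 here equals 2 bits per pixel, or 6:1 relative to 12 bits). The sketch below makes this arithmetic explicit; the 512 × 512 matrix size is an assumption, the function names are hypothetical, and no JPEG 2000 codec is called.

```python
STORED_BITS_PER_PIXEL = 16  # 12-bit CT data stored on a 2-byte boundary
ACQUIRED_BITS_PER_PIXEL = 12


def target_bits_per_pixel(ratio: float) -> float:
    """Compressed bit rate implied by a ratio defined against 16 stored bits per pixel."""
    return STORED_BITS_PER_PIXEL / ratio


def compressed_bytes(ratio: float, rows: int = 512, cols: int = 512) -> float:
    """Approximate compressed size in bytes; the 512 x 512 matrix is an assumption."""
    return rows * cols * target_bits_per_pixel(ratio) / 8


def ratio_vs_12_bits(ratio: float) -> float:
    """The same bit rate expressed as a ratio against the 12 acquired bits per pixel."""
    return ACQUIRED_BITS_PER_PIXEL / target_bits_per_pixel(ratio)


for r in (4, 6, 8, 10, 15):
    print(f"{r}:1 -> {target_bits_per_pixel(r):.2f} bpp, "
          f"{compressed_bytes(r) / 1024:.0f} KiB, {ratio_vs_12_bits(r):.2f}:1 vs 12 bits")
```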
AIP Images
For each original and compressed thick section, a matching 5-mm-thick AIP image was generated. Each AIP image was calculated by averaging 15 contiguous original or compressed thin sections within a 5-mm span centered at the z-axis position of the matching thin section. Therefore, an AIP image and its matching thin section had the same image position. The 12-bit depth of the images was preserved in this averaging procedure.
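A single centered AIP image can be sketched as follows. The study states only that 12-bit depth was preserved; the rounding (half up) and clipping to 0..4095 below are assumptions, as is the representation of sections as flattened lists of unsigned stored values. The name `aip_at` is hypothetical.

```python
import math
from typing import List, Sequence


def aip_at(sections: Sequence[Sequence[int]], center: int, n: int = 15) -> List[int]:
    """Average n contiguous thin sections centered on sections[center].

    With a 0.33-mm interval, n = 15 spans about 15 x 0.33 = 4.95 mm, i.e. about 5 mm.
    Results are rounded half up and clipped to 0..4095 to stay within 12 bits
    (assumed handling of unsigned stored values; the study's rounding is not reported).
    """
    if n % 2 == 0:
        raise ValueError("n must be odd so the slab is centered on one section")
    half = n // 2
    lo, hi = center - half, center + half + 1
    if lo < 0 or hi > len(sections):
        raise IndexError("slab extends beyond the available thin sections")
    averaged = []
    for pixel_values in zip(*sections[lo:hi]):  # iterate pixel by pixel across the slab
        mean = sum(pixel_values) / n
        averaged.append(min(4095, max(0, math.floor(mean + 0.5))))
    return averaged


# 20 two-pixel sections whose first pixel equals the section index.
demo = [[k, 1000] for k in range(20)]
print(aip_at(demo, center=10))  # [10, 1000]
```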
Final Study Sample
The compressed-version images included, first, 120 compressed thick sections (20 images × six compression levels); second, 120 matching compressed thin sections; and, third, 120 matching AIP images generated from the compressed thin-section data sets. These images were paired with their original versions, yielding 360 image comparisons in subsequent analyses.
PSNR
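After converting the images to 8-bit images by applying the default abdominal window setting in our clinical practice (level, 20 H; width, 400 H), we measured PSNR in decibels (dB) as follows:

$$\mathrm{PSNR} = 20\log_{10}\left(\frac{255}{\mathrm{RMSE}}\right), \qquad \mathrm{RMSE} = \sqrt{\frac{1}{MN}\sum_{x=1}^{M}\sum_{y=1}^{N}\left[f(x, y) - g(x, y)\right]^{2}}$$

where RMSE is the root-mean-square error, M and N are the image dimensions in pixels, and f(x, y) and g(x, y) are the pixel values in the original and compressed images, respectively.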
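The windowing and PSNR steps can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the study's measurement code: input pixels are assumed to be already rescaled to Hounsfield units, and the linear window mapping is a simplification that may differ slightly from a clinical viewer's lookup table.

```python
import math
from typing import List, Sequence


def window_to_8bit(hu_values: Sequence[float], level: float = 20.0,
                   width: float = 400.0) -> List[int]:
    """Map HU values linearly onto 0..255 for the given window, clipping outside it.

    Simplified mapping (assumption); viewers may use e.g. the DICOM VOI LUT formula.
    """
    lower = level - width / 2.0
    out = []
    for v in hu_values:
        scaled = (v - lower) / width * 255.0
        out.append(int(min(255.0, max(0.0, round(scaled)))))
    return out


def psnr(original: Sequence[int], compressed: Sequence[int], peak: float = 255.0) -> float:
    """PSNR in dB: 20 log10(peak / RMSE) over all pixels of two 8-bit images."""
    if len(original) != len(compressed) or not original:
        raise ValueError("images must be non-empty and have the same number of pixels")
    mse = sum((f - g) ** 2 for f, g in zip(original, compressed)) / len(original)
    if mse == 0:
        return math.inf  # identical images, e.g., after reversible compression
    return 20.0 * math.log10(peak / math.sqrt(mse))


original_8bit = window_to_8bit([-180, 20, 220, 60])
compressed_8bit = window_to_8bit([-178, 22, 220, 58])
print(psnr(original_8bit, compressed_8bit))
```

Returning infinity for identical images reflects that the reversible (negative-control) level produces no pixel differences at all.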
Radiologists' Visual Analysis
Three body radiologists with 6, 7, and 8 years of clinical experience participated as readers in this study. The 360 image pairs of original and compressed versions were randomly assigned to 18 reading sessions, avoiding repetition of any patient within a session. The order of the reading sessions differed for each reader. Sessions were separated by a minimum of 2 weeks.
Each image pair was displayed alternately on a single monitor, and the order of the original and its compressed version was randomized. The reader could toggle between the two images at will, returning to the first image as desired. Each reader, blinded to the tested compression levels, independently determined whether the two images were indistinguishable (grade 0) or distinguishable. Any perceived difference between the two images was regarded as a perceptual artifact. If an image pair was rated as distinguishable, the reader was asked to grade the image difference (or compression artifacts) as follows: grade 1, barely perceptible difference; grade 2, identifiable difference, but the subtle artifacts would not affect clinical interpretation; or grade 3, significant difference, and the artifacts would potentially affect clinical interpretation. Although the test images were selected to depict various abnormalities, the readers were asked not to confine their analysis to the pathologic findings. Instead, they were asked to examine the entire image for any difference, paying attention to structural details, particularly small vessels and organ edges, and to the texture of solid organs and soft tissues.
Images were displayed in a one-by-one format (1,483 × 1,483 pixels) using viewing software (Piview Star, SmartPACS), a calibrated [29] monochrome monitor (ME315, Totoku) with a diagonal display size of 52.8 cm, and matching video hardware (LV32P1, Totoku). The maximum and minimum luminance values were 398.3 and 0.42 cd/m², respectively, and the ambient room light was subdued. Window level and width were fixed at the aforementioned window settings.
Image review was conducted at each reader's convenience without time constraints. Reading distance was constrained to a range of 44-71 cm by aiming a laser beam in front of each reader's forehead onto a ruler perpendicular to the monitor screen. The readers' habitual viewing distance had been measured during 30 minutes of their clinical work. Limiting the reading distance was intended to reproduce our clinical practice, because reading from too close or too far would artificially enhance or degrade the readers' sensitivity to the compression artifacts [30].
Visually Lossless Threshold
A visually lossless threshold (VLT) was defined as the highest compression
level that can be applied before a compressed image appears distorted
[31]. From the radiologists'
responses, the VLT range was estimated for each of the 60 original images (20
thin sections, 20 thick sections, and 20 AIP images). The three radiologists'
responses were pooled: If at least two readers rated an image pair as
indistinguishable (grade 0) at a given compression level, the VLT was regarded
as above that level; otherwise, it was regarded as below the level. Therefore,
the VLT of each image could lie in one of the following ranges: < 4:1,
4:1-6:1, 6:1-8:1, 8:1-10:1, 10:1-15:1, or > 15:1.
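The pooling rule above can be expressed as a short function. This is a hypothetical sketch: it assumes the three readers' grades are available per irreversible compression level, and it stops scanning at the first level that fails the majority rule, which reproduces the stated definition only when pooled responses are monotonic (as they were in this study).

```python
from typing import Dict, Sequence

LEVELS = (4, 6, 8, 10, 15)  # irreversible compression ratios tested (x:1)


def vlt_range(grades_by_level: Dict[int, Sequence[int]]) -> str:
    """Pooled VLT range for one image from three readers' grades (0 = indistinguishable).

    A level is passed when at least two readers gave grade 0.
    """
    passed = None
    for level in LEVELS:
        votes = sum(1 for g in grades_by_level[level] if g == 0)
        if votes >= 2:
            passed = level
        else:
            break  # assumes no higher level would be passed again
    if passed is None:
        return "< 4:1"
    if passed == LEVELS[-1]:
        return "> 15:1"
    next_level = LEVELS[LEVELS.index(passed) + 1]
    return f"{passed}:1-{next_level}:1"


example = {4: [0, 0, 0], 6: [0, 0, 1], 8: [0, 1, 1], 10: [1, 2, 1], 15: [2, 2, 3]}
print(vlt_range(example))  # 6:1-8:1
```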
Artifact Pattern
A radiologist and a computer scientist with 4 years of research experience
in CT image compression together reviewed the original thick-section,
compressed thick-section, original-version AIP, and compressed-version AIP
images and focused on the differences in artifact patterns between the
thick-section and AIP images. They recorded any artifact patterns that were
regarded as specific to AIP images and different from the previously known
artifacts of wavelet compressions—blur and ringing
[32]. During the review,
annotations were toggled on to reveal the compression level and image type.
Close inspection and magnification were allowed. The remaining viewing
conditions were the same as those for the aforementioned radiologists' visual
analyses.
Statistical Analysis
A biostatistician participated in the study design and statistical analysis
using software (StatsDirect, version 2.5.6, StatsDirect). At each compression
level, the thin sections, thick sections, and AIP images were compared for
their artifacts measured with PSNR, HDR-VDP, and the radiologists' grading.
Friedman tests with post hoc tests were used with the p value
threshold of 0.05. For the radiologists' responses, we additionally compared
the percentage of distinguishable pairs (grades 1-3) using exact tests for
paired proportions [33], with
the p value threshold adjusted to 0.017 using Bonferroni correction.
The determined VLT ranges were compared using the Friedman test with a post
hoc test. Interobserver agreements were measured using weighted kappa
statistics for multiple readers
[34].
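For readers who want to reproduce the core tests, the sketch below implements a tie-corrected Friedman statistic, its chi-square p value for three groups (AIP, thick-section, and thin-section images), an exact test for paired proportions, and the Bonferroni threshold. It is a minimal stdlib-only illustration, not StatsDirect's implementation: we assume the exact test in [33] is the exact (binomial) McNemar test on discordant pairs, the chi-square p value is an asymptotic approximation, and post hoc tests are not included. All function names are hypothetical.

```python
import math
from typing import List, Sequence


def average_ranks(row: Sequence[float]) -> List[float]:
    """1-based ranks within one block (image); tied values share their mean rank."""
    order = sorted(range(len(row)), key=lambda i: row[i])
    ranks = [0.0] * len(row)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and row[order[j + 1]] == row[order[i]]:
            j += 1
        for t in range(i, j + 1):
            ranks[order[t]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def friedman_q(blocks: Sequence[Sequence[float]]) -> float:
    """Tie-corrected Friedman statistic; each block holds one image's k measurements."""
    n, k = len(blocks), len(blocks[0])
    ranks = [average_ranks(b) for b in blocks]
    r_bar = (k + 1) / 2
    mean_ranks = [sum(r[j] for r in ranks) / n for j in range(k)]
    ss_treat = n * sum((m - r_bar) ** 2 for m in mean_ranks)
    ss_error = sum((x - r_bar) ** 2 for r in ranks for x in r) / (n * (k - 1))
    return 0.0 if ss_error == 0 else ss_treat / ss_error  # 0.0 if every block is fully tied


def friedman_p_three_groups(q: float) -> float:
    """Chi-square approximation with k - 1 = 2 df, whose survival function is exp(-q/2)."""
    return math.exp(-q / 2)


def exact_paired_proportions_p(b: int, c: int) -> float:
    """Two-sided exact McNemar p value from the discordant pair counts b and c."""
    m = b + c
    if m == 0:
        return 1.0
    tail = sum(math.comb(m, i) for i in range(min(b, c) + 1)) / 2 ** m
    return min(1.0, 2 * tail)


BONFERRONI_ALPHA = 0.05 / 3  # three pairwise comparisons, about 0.017
```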
Regarding the radiologists' grading results, the kappa statistics for the AIP, thin-section, and thick-section images were 0.55, 0.64, and 0.51, respectively. As the compression level increased, the radiologists assigned higher grades to the artifacts. At each irreversible compression, the artifact grade tended to increase in the order of the AIP, thick-section, and thin-section images (Figs. 3, 4, and 5A, 5B, 5C). Statistically significant differences were observed at 6:1 or higher compressions (Table 1).
With regard to the radiologists' binary responses (distinguishable or indistinguishable), kappa statistics for the AIP, thin-section, and thick-section images were 0.75, 0.87, and 0.74, respectively. At reversible and 4:1 compressions, the percentage of distinguishable pairs (grades 1-3) did not differ because few image pairs were rated as distinguishable regardless of whether they were AIP, thin-section, or thick-section images. At 6:1 and 8:1 compressions, the percentage of distinguishable pairs tended to increase in the order of AIP, thick-section, and thin-section images. The statistical significance in each comparison was tabulated and is shown in Table 2. At a 10:1 compression, the statistical significance disappeared because most image pairs were rated as distinguishable regardless of whether they were AIP, thin-section, or thick-section images.
The VLT ranges determined by the pooled radiologists' binary responses are shown in Table 3. In the pooled responses, no image pair was distinguishable at one compression level and indistinguishable at a higher level, although such cases occurred sporadically in individual radiologists' responses (six [3.3%] of 60 × 3 image-radiologist combinations). Although the VLT range varied significantly between images within each image type (AIP, thick-section, or thin-section images), it decreased in the order of AIP, thick-section, and thin-section images (p < 0.0001).
The radiologist and the computer scientist who compared artifact patterns between the thick sections and AIP images observed a difference in artifact magnitude, similar to that found in the three radiologists' visual analyses. Otherwise, no notable difference in artifact pattern was identified between the AIP and thick-section images. The artifact pattern in an AIP image was generally regarded as the same as that in its matching thick section at the same or the nearest lower compression level. As the compression level increased, blurring artifacts became more evident in the AIP images.
The dilemma between image noise and radiation dose can be resolved using a sliding-slab AIP technique, which improves image quality without increasing radiation dose by canceling out noise across the source thin sections being averaged [4, 5]. In addition, the through-plane spatial resolution of the source thin sections is largely preserved when users slide a slab along a given viewing direction. Because overlapping AIP slabs are rendered in real time, a large thin-section data set can be reviewed quickly and efficiently [5, 10] once it has been transmitted to a reviewing workstation.
To the best of our knowledge, there is no legal requirement that radiologists archive, distribute, and interpret the thin-section data set of every examination; policies vary among radiologists and institutions. Nevertheless, it might be ideal to acquire and interpret most abdominal CT scans with sections as thin as reasonably achievable, thereby capitalizing on the improved imaging capabilities of isotropic voxel scanners. Given that the sliding-slab AIP technique mitigates the conflict between image noise and radiation dose, the remaining data-handling issue in realizing this ideal practice is how to archive and transmit the large thin-section data sets. This data overload is not negligible. According to a report [1] from an institution where 2-mm-thick images from 16-MDCT scanners were routinely archived, the thin-section images made up one third of the total image archive. Image compression is a promising solution to this data explosion; however, few studies have investigated compression guidelines for these source thin-section images.
In our results, compression artifacts in the source thin sections were significantly attenuated in the AIP images, a finding consistent across the mathematical (PSNR), simulated perceptual (HDR-VDP), and real perceptual (radiologists) analyses. This finding suggests that thin sections can be compressed to a higher level if image fidelity matters in the final AIP images rather than in the source thin sections. These results might be attributable to the fact that averaging the compressed thin sections can cancel out the compression artifacts of the individual thin sections. This is analogous to the reduction of image noise in thin sections by averaging [5, 10], which yields better image quality in AIP images.
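One idealized way to see why averaging can attenuate compression artifacts is a simulation in which each thin section carries an independent, zero-mean error; under that assumption, the RMSE of the averaged error falls by about 1/√N. Real compression errors in adjacent, overlapping thin sections (0.67-mm sections reconstructed every 0.33 mm) are correlated, so the actual attenuation would be smaller. The sketch illustrates the proposed mechanism only and does not model the observed results; all names and parameter values are hypothetical.

```python
import math
import random
from typing import Sequence, Tuple


def rmse(errors: Sequence[float]) -> float:
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def averaging_attenuation(n_sections: int = 15, n_pixels: int = 20000,
                          sigma: float = 10.0, seed: int = 0) -> Tuple[float, float]:
    """Return (RMSE of one section's error, RMSE of the error after averaging n sections).

    Idealization: each section's compression error is independent, zero-mean Gaussian.
    """
    rng = random.Random(seed)
    errors = [[rng.gauss(0.0, sigma) for _ in range(n_pixels)] for _ in range(n_sections)]
    single_rmse = rmse(errors[0])
    averaged_rmse = rmse([sum(column) / n_sections for column in zip(*errors)])
    return single_rmse, averaged_rmse


single, averaged = averaging_attenuation()
print(f"{averaged / single:.3f} vs 1/sqrt(15) = {1 / math.sqrt(15):.3f}")
```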
Our results also showed that, although no difference in artifact pattern was noted, the artifact magnitude was smaller in the AIP images than in the thick sections despite their identical nominal section thickness. The reason for this difference is not clear: the effects of compressing thin sections on the final AIP images are complicated by both the averaging process and the compression itself. Our observations regarding the difference between the thick sections and the AIP images may be limited to our experimental settings and may not generalize to other scanners or image reconstruction kernels. Therefore, it seems prudent to interpret our results conservatively: an AIP image reformatted from compressed thin sections exhibits artifacts similar to or slightly less than those of the corresponding compressed thick section.
The acceptable compression threshold for CT images can be affected by several independent factors, including compression algorithm, diagnostic task, and image content [12, 17, 30]. Several researchers have recently shown that section thickness, which is adjustable in multidetector-row scanners, is another important factor determining compression artifacts [15, 18, 36]. Our results further complicate the determination of the compression threshold: how a thin-section data set will finally be visualized in clinical interpretation should also be considered. The same issue arises for postprocessing techniques other than AIP, including maximum intensity projection, volume rendering, and fly-through, all of which need further investigation.
In this study, we do not propose a specific acceptable compression ratio for thin sections that are to be interpreted using the AIP technique, for several reasons. Determining a compression guideline is very difficult and has traditionally been based on receiver operating characteristic curve analysis. However, it is likely unrealistic for such a study to cover a broad range of abdominal abnormalities and imaging parameters. In addition, as shown by our VLT measurements, even with the same imaging and compression techniques, images exhibited different magnitudes of compression artifacts depending on image content. Therefore, a fixed compression ratio would not be meaningful as a universal compression guideline. Adaptive compression would be preferable [16, 17, 37, 38], although the relevant techniques require further validation. Finally, the need for, cost savings of, and legal risk of irreversible compression would vary from community to community. Although these considerations were outside the scope of this study, our principal interest was whether the reported low-level compressions (e.g., less than 6:1) [15] are truly needed for noisy thin-section abdominal CT images that are rarely reviewed as they are. Based on our results, we conclude that thin sections can be compressed at least to the de facto compression levels used for thick sections (e.g., 10:1-16:1) [14, 15, 39-43] on the premise that they are typically reviewed using an AIP technique.
We used PSNR, HDR-VDP, radiologists' grading, and binary responses to measure compression artifacts. All of these analyses showed consistent results, providing a basis for using higher compression for thin sections that will be reviewed using the AIP technique. However, none of these individual analyses was a perfect measure of the artifacts with regard to their clinical significance.
PSNR is probably the most widely used metric for pixel-wise distortion because of its computational simplicity; however, it is known to correlate poorly with human perception of artifacts [28]. To overcome this, we used HDR-VDP to simulate the radiologists' perception. Of the many proposed perceptual metrics [25, 28], we regarded HDR-VDP as the most suitable for our study because it can cover the high luminance [22] of the medical displays we used. HDR-VDP has been reported to accurately predict radiologists' perceptions of compression artifacts in CT images [16, 38]. However, it is still not a foolproof measure of perceptible artifacts [16, 28] because it partly relies on unverified assumptions in modeling the human visual system [25].
Therefore, we additionally measured the human radiologists' responses, which are arguably the most accurate measure of the artifacts. However, although we used an artifact grading scheme similar to those adopted in previous studies [18, 19, 42, 44], a limitation was that we relied on the radiologists' subjective decisions. Accordingly, the radiologists occasionally assigned different grades to the same image. Whether artifacts rated as grade 1 or 2 are acceptable in clinical interpretation might be debatable. These minute artifacts probably correspond mainly to a de-noising effect [12, 45, 46] that is unimportant in diagnosis [47], especially for noisier thin sections. However, it should be noted that the de-noising effect is inevitably accompanied by some degree of blurring at the same compression level, altering the inherent organ textures.
To reduce the subjectivity and to be more conservative, we finally analyzed the presence of any perceivable artifacts (distinguishable or indistinguishable). If a compressed image is indistinguishable from its original by a radiologist, there is no basis for arguing that this compression hinders diagnostic accuracy [48]. This "visually lossless" criterion has been rapidly gaining support as a conservative and practical guideline for medical image compression [14-18, 30, 40, 48]. However, the magnitude of the artifacts could not be measured in this analysis, and individual differences still existed in the radiologists' sensitivities to the artifacts. Our results show that although there is a significant variation in VLT according to image contents, a significantly greater compression level can be used as the VLT for thin sections if the image fidelity matters in the final AIP images rather than in the source thin sections themselves.
Several considerations should be noted about the generalizability of our results. As mentioned, our experimental settings were homogeneous regarding scanner type, imaging parameters, compression algorithm, and viewing conditions, including the single window setting. Furthermore, we tested only 20 arbitrarily selected images, which would not ideally represent all potential abnormalities in abdominal CT scans. Nevertheless, we believe our results on the artifact difference between AIP images and thin sections would be reproducible under different conditions. Because it is likely unrealistic to repeat the same analyses for the many combinations of these parameters, we tried to choose parameters carefully to reflect current abdominal CT practice. For instance, we chose a slab thickness of 5 mm, which is our default thickness for abdominal applications. Using a thinner section does not necessarily increase lesion detection [49], and radiation dose must be substantially increased to maintain high contrast resolution in thin sections [50]. Although we limited this study to the transverse plane, we believe our results can also be extrapolated to multiplanar reformations in any viewing plane because the principle of averaging pixels is essentially the same [2].
This study has other limitations. First, the readers were not completely blinded to the section thickness because they could frequently guess which image was a thin section from its graininess. Therefore, the measured difference in perceptible artifacts according to the section thickness might be exaggerated. Second, to avoid a possible clustering effect in a statistical sense, we tested only a single image per patient, which is unlike a real clinical situation in which radiologists scroll through a series of images. A more elaborate study is necessary to approximate real clinical situations. Third, we focused on the magnitude of compression artifacts and resulting VLTs rather than on the diagnostic accuracy at different compression levels. However, the VLTs can serve as a more conservative guideline for the compression.
In conclusion, compression artifacts in source thin-section abdominal CT images are significantly attenuated in AIP images. On the premise that thin-section abdominal CT images are typically reviewed using the AIP technique, it is justifiable to compress them to a compression level currently accepted for thick-section images.