Original Research
All authors: Department of Diagnostic Imaging, Assaf Harofeh Medical Center, affiliated with the Sackler Faculty of Medicine, Tel Aviv University, Zerifin 70300, Israel.
Received February 25, 2007; accepted after revision June 12, 2007.
Address correspondence to S. Strauss (drstraus{at}netvision.net.il).
Abstract
MATERIALS AND METHODS. We retrospectively evaluated the static images of 168 adult patients who had undergone abdominal sonography. Three experienced radiologists independently graded the hepatic images as normal, mild steatosis, moderate steatosis, or severe steatosis. Assessment of liver steatosis was repeated on the same set of images 1 month later under the same conditions and blinded to the initial reading. Weighted kappa statistics were used to analyze interobserver and intraobserver agreement, and the agreement percentages were calculated.
RESULTS. The mean interobserver and intraobserver agreement rates for the presence of fatty liver were 72% (κ = 0.43) and 76% (κ = 0.54). For severity of fatty liver, the initial reading for pairs of observers had 47-59% (κ = 0.40-0.51) interobserver agreement. The interobserver agreement for the second reading was 59-64% (κ = 0.43-0.54). The mean agreement rates for pairs of observers were 53% (κ = 0.47) and 62% (κ = 0.50) on the first and second readings. Intraobserver agreement for severity of fatty liver ranged from 55% to 68% (κ = 0.51-0.63).
CONCLUSION. Subjective visual assessment of fatty liver on sonography has substantial observer variability. There is a need for a more objective quantitative method of grading fatty liver on sonography that would be easily available and applicable in routine clinical practice.
Keywords: fatty liver, sonography, steatosis
Sonography is operator-dependent, and the sonographic evaluation of fatty liver is based mainly on the subjective impression of hepatic echogenicity and posterior attenuation of the ultrasound beam [6]. To be reliable in evaluation of the presence and severity of steatosis, sonography must be reproducible among observers and by the same observer on separate occasions. To the best of our knowledge, no report in the literature has described assessment of level of agreement and reproducibility in an unselected cohort of patients undergoing sonography for a range of abdominal disorders not necessarily related to the liver. This study was designed to evaluate interobserver and intraobserver variability in the sonographic assessment of fatty liver in routine clinical practice.
All sonograms were obtained with one of two units, one (Acuson 128 XP, Siemens Medical Solutions) with a 2- to 4-MHz vector transducer and the other (ATL HDI 5000, Philips Medical Systems) with a 2- to 5-MHz convex transducer. The examinations were performed by one of four sonographers with 10-26 years of experience. The technical parameters, including gain adjustment, placement of focal zone, use of tissue harmonics, and use of color Doppler technique, were optimized on a case-by-case basis.
Three radiologists with 8-27 years of experience in abdominal sonography independently reviewed the images of the 168 patients. Using a predetermined protocol, the observers graded each case as normal, mild fatty liver, moderate fatty liver, or severe fatty liver. Mild steatosis was seen as a slight increase in liver echogenicity. In moderate steatosis, visualization of intrahepatic vessels and the diaphragm was slightly impaired, and increased liver echogenicity was present. Severe steatosis was recognized as a marked increase in hepatic echogenicity, poor penetration of the posterior segment of the right lobe of the liver, and poor or no visualization of the hepatic vessels and diaphragm [7]. The liver was assessed to be normal if the texture was homogeneous, exhibited fine-level echoes, or was minimally hyperechoic or isoechoic compared with normal renal cortex and if there was no posterior attenuation of the ultrasound beam.
The images were reviewed by the observers on the same monitor and under the same ambient lighting conditions. The observers were blinded to the clinical and laboratory data and were unaware of the written interpretation previously reported and of the other observers' assessments. After an interval of 4 weeks, in which the cases were randomly rearranged in a new sequence, the observers were again requested to assess the presence and severity of steatosis in the same 168 cases under the same conditions and blinded to the initial reading.
The interobserver and intraobserver agreement percentages were calculated by dividing the number of occasions of complete agreement by the total number of occasions. Weighted kappa statistics were used to determine the degree of agreement after correction for the agreement expected by chance. The kappa statistic was interpreted as follows: less than 0.00, poor agreement; 0.00-0.20, slight agreement; 0.21-0.40, fair agreement; 0.41-0.60, moderate agreement; 0.61-0.80, substantial agreement; and 0.81-1.00, almost perfect agreement [8]. The level of statistically significant difference was p < 0.01. Statistical analyses were performed with SAS software version 9.1 (SAS Institute).
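The two agreement measures described above can be sketched in a few lines of code. The following is a minimal illustration, not the SAS procedure used in the study: it assumes the four grades are coded 0 to 3 and that linear disagreement weights are used (the weighting scheme is not stated in the text), and the function names are hypothetical.

```python
# Illustrative sketch only. Assumptions: grades coded 0..3 and LINEAR weights;
# the study does not state which weighting scheme was applied in SAS.
GRADES = ("normal", "mild", "moderate", "severe")


def percent_agreement(first, second):
    """Percentage of cases on which two gradings agree exactly."""
    matches = sum(1 for x, y in zip(first, second) if x == y)
    return 100.0 * matches / len(first)


def weighted_kappa(first, second, n_cat=len(GRADES)):
    """Weighted kappa: 1 - (observed disagreement / chance-expected disagreement)."""
    n = len(first)
    # Joint proportions of (first grade, second grade).
    observed = [[0.0] * n_cat for _ in range(n_cat)]
    for x, y in zip(first, second):
        observed[x][y] += 1.0 / n
    row = [sum(observed[i]) for i in range(n_cat)]
    col = [sum(observed[i][j] for i in range(n_cat)) for j in range(n_cat)]

    def weight(i, j):
        # Linear disagreement weight: 0 on the diagonal, 1 for the extremes.
        return abs(i - j) / (n_cat - 1)

    cells = [(i, j) for i in range(n_cat) for j in range(n_cat)]
    d_obs = sum(weight(i, j) * observed[i][j] for i, j in cells)
    d_exp = sum(weight(i, j) * row[i] * col[j] for i, j in cells)
    if d_exp == 0.0:
        return float("nan")  # undefined when both gradings use a single category
    return 1.0 - d_obs / d_exp


def interpret(kappa):
    """Verbal label for kappa, following the scale quoted in the text [8]."""
    if kappa < 0.0:
        return "poor"
    for upper, label in ((0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")):
        if kappa <= upper:
            return label
    return "almost perfect"


# Hypothetical gradings of six cases by two observers (not study data).
reader_1 = [0, 1, 2, 3, 1, 0]
reader_2 = [0, 2, 1, 3, 0, 0]
kappa = weighted_kappa(reader_1, reader_2)
print(percent_agreement(reader_1, reader_2), round(kappa, 2), interpret(kappa))
```

Weighting gives partial credit when two gradings differ by only one step (for example, mild versus moderate), which is why weighted kappa is commonly preferred for ordinal scales such as this one.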
Intraobserver Agreement
Intraobserver agreement for grading the severity of fatty liver is summarized in Table 1. The mean kappa value for the three observers reached a moderate level (κ = 0.58; range, 0.51-0.63), and the mean percentage agreement between the first and second readings was 62.1% (range, 54.7-67.9%). The first observer (A) agreed with himself in 114 (67.9%) of the 168 cases (κ = 0.63), the second observer (B) agreed with himself in 92 (54.8%) of the 168 cases (κ = 0.51), and the third observer (C) agreed with himself in 107 (63.7%) of the 168 cases (κ = 0.59).
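The percentages and means quoted in this paragraph follow directly from the reported counts. The short sketch below reproduces that arithmetic; the counts and kappa values are those given in the text, and the variable names are illustrative.

```python
# Counts and kappa values as reported in the text; names are illustrative.
N_CASES = 168
agreed_cases = {"A": 114, "B": 92, "C": 107}   # cases graded identically on both readings
kappas = {"A": 0.63, "B": 0.51, "C": 0.59}

# Percentage agreement = occasions of complete agreement / total occasions.
percent = {obs: 100.0 * k / N_CASES for obs, k in agreed_cases.items()}
mean_percent = sum(percent.values()) / len(percent)
mean_kappa = sum(kappas.values()) / len(kappas)

for obs in sorted(percent):
    print(obs, round(percent[obs], 1))
print("mean agreement (%):", round(mean_percent, 1))
print("mean kappa:", round(mean_kappa, 2))
```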
Intraobserver agreement was better if the results were analyzed to indicate whether the liver was assessed as fatty without regard to severity of infiltration. The mean intraobserver percentage of agreement for assessment of presence or absence of fatty liver was 76.4% (range, 70.2-82.1%), but the kappa value remained at the moderate level (κ = 0.54; range, 0.45-0.64). Only one observer reached a substantial (κ = 0.64) level of agreement.
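The effect of disregarding severity can be illustrated by collapsing the four grades into presence or absence of fatty liver before comparing readings. The sketch below uses hypothetical gradings (not study data), assumes grades coded 0 (normal) to 3 (severe), and computes an unweighted Cohen kappa for the binary case; the function names are illustrative.

```python
# Illustrative sketch with hypothetical data. Assumption: grades coded
# 0 = normal, 1 = mild, 2 = moderate, 3 = severe.
def percent_agreement(first, second):
    """Percentage of cases on which two readings agree exactly."""
    return 100.0 * sum(1 for x, y in zip(first, second) if x == y) / len(first)


def to_presence(grades):
    """Collapse severity grades to fatty (True) versus normal (False)."""
    return [g > 0 for g in grades]


def binary_kappa(first, second):
    """Unweighted Cohen kappa for two binary readings."""
    n = len(first)
    p_obs = sum(1 for x, y in zip(first, second) if x == y) / n
    p1, p2 = sum(first) / n, sum(second) / n
    p_exp = p1 * p2 + (1 - p1) * (1 - p2)   # agreement expected by chance
    return (p_obs - p_exp) / (1 - p_exp)


# Hypothetical first and second readings of six cases by one observer.
reading_1 = [0, 1, 2, 3, 1, 0]
reading_2 = [0, 2, 1, 3, 0, 0]

severity_pct = percent_agreement(reading_1, reading_2)
presence_pct = percent_agreement(to_presence(reading_1), to_presence(reading_2))
presence_kappa = binary_kappa(to_presence(reading_1), to_presence(reading_2))
print(severity_pct, round(presence_pct, 1), round(presence_kappa, 2))
```

In this toy example, disagreements between adjacent nonzero grades disappear once the grades are collapsed, so the raw percentage rises. Kappa need not rise by the same amount, because chance-expected agreement is also higher with only two categories.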
Interobserver Agreement
Interobserver agreement between each pair of observers (A and B, A and C, B and C) for severity of fatty liver is summarized in Table 2. The kappa values for pairs of observers ranged from 0.40 to 0.51 at the first reading and from 0.43 to 0.54 at the second reading, corresponding to a moderate level of agreement. The mean interobserver percentage of agreement for assessment of presence or absence of fatty liver was 70.3% (κ = 0.46) at the first reading and 73.4% (κ = 0.40) at the second reading. All kappa values were at the moderate level.
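Pairwise interobserver agreement of the kind summarized here can be computed by comparing every pair of observers and averaging the results. The following sketch uses hypothetical gradings of four cases (not study data), with grades assumed to be coded 0 to 3; the function names are illustrative.

```python
from itertools import combinations


def percent_agreement(first, second):
    """Percentage of cases on which two observers assign the same grade."""
    return 100.0 * sum(1 for x, y in zip(first, second) if x == y) / len(first)


def pairwise_agreement(ratings):
    """Agreement for every pair of observers, keyed by (observer, observer)."""
    return {
        (a, b): percent_agreement(ratings[a], ratings[b])
        for a, b in combinations(sorted(ratings), 2)
    }


def mean_pairwise_agreement(ratings):
    """Mean of the pairwise agreement percentages."""
    pairs = pairwise_agreement(ratings)
    return sum(pairs.values()) / len(pairs)


# Hypothetical gradings (0 = normal ... 3 = severe), not study data.
example = {
    "A": [0, 1, 2, 3],
    "B": [0, 1, 1, 3],
    "C": [0, 2, 2, 3],
}
for pair, pct in pairwise_agreement(example).items():
    print(pair, pct)
print("mean:", round(mean_pairwise_agreement(example), 1))
```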
The most definitive means for assessing the presence and severity of steatosis is liver biopsy, which remains the reference standard. However, noninvasive techniques such as sonography, CT, and MRI have been used to detect fatty liver. The reported sensitivities and specificities are 60-100% and 77-95% for sonography, 43-95% and 90% for unenhanced CT, and 81% and 100% for chemical shift gradient MRI [1]. On unenhanced CT, fatty liver is diagnosed if the liver attenuation minus the spleen attenuation is -10 H or less. The diagnosis of fatty liver on contrast-enhanced helical CT may also be accurate but is protocol-specific [10]. The most widely used MRI technique for assessment of fatty liver is chemical shift gradient-echo imaging with in-phase and opposed-phase acquisitions. In fatty liver there is a loss of signal intensity on opposed-phase images in comparison with in-phase images [1].
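The unenhanced CT criterion mentioned above is a simple difference of attenuation values. A minimal sketch of that rule follows; the function names and the example attenuation values are hypothetical and serve only to show the arithmetic.

```python
# Sketch of the unenhanced CT rule quoted in the text: fatty liver is diagnosed
# when liver attenuation minus spleen attenuation is -10 H or less.
# Function names and example values are hypothetical.
FATTY_LIVER_THRESHOLD_H = -10.0


def liver_minus_spleen(liver_h, spleen_h):
    """Attenuation difference in Hounsfield units (liver minus spleen)."""
    return liver_h - spleen_h


def meets_ct_criterion(liver_h, spleen_h, threshold=FATTY_LIVER_THRESHOLD_H):
    """True when the difference is at or below the threshold."""
    return liver_minus_spleen(liver_h, spleen_h) <= threshold


print(meets_ct_criterion(40.0, 55.0))   # difference of -15 H
print(meets_ct_criterion(55.0, 50.0))   # difference of +5 H
```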
Sonography is highly operator-dependent, and the diagnosis of fatty liver is based mainly on the subjective assessment of liver echogenicity. Although quantitative methods for measuring tissue echogenicity have been reported, these methods are not generally available and are not used in clinical sonography departments [6]. With the widespread use of sonography for evaluating abdominal disorders in general and hepatic disease in particular, the liver may be seen to have an echogenic appearance and is therefore interpreted as being fatty. Liver echogenicity normally equals or slightly exceeds renal cortical echogenicity, but this factor relies on the visual perception of the observer. Furthermore, the echogenicity of the renal cortex can be altered by disease processes, making the comparison less reliable. In this study, we used a well-recognized classification of fatty liver into mild, moderate, and severe degrees of steatosis based on an increase in hepatic echogenicity, impaired visualization of hepatic vessels and diaphragm, and poor penetration of the posterior aspects of the liver [7].
One of the determinants of diagnostic accuracy is reproducibility of the examination, which should yield the same or similar results when repeated. The findings in this study showed that observer variability in sonographic assessment of fatty liver is considerable. The percentage of agreement between pairs of observers was only 47-59% at the first reading and 59-64% at the second reading. As expected, agreement was higher for assessment of presence or absence of fatty liver; the mean percentage of agreement between pairs of observers was 70% at the first reading and 73% at the second reading. Not surprisingly, intraobserver agreement was superior to interobserver agreement, given that there is a greater chance that several observers will differ in interpretation than when the same observer evaluates the images. In more than one third of the cases, however, the observers evaluated the severity of steatosis differently on the second reading, and the mean intraobserver agreement was moderate (κ = 0.58). The percentage of agreement was slightly better when analyzed to show whether the liver was fatty but still revealed that an observer disagreed with himself in approximately one fourth of the cases assessed. There were no instances of almost perfect interobserver or intraobserver agreement, and low substantial agreement was achieved by only one observer for both the presence (κ = 0.64) and the severity (κ = 0.63) of fatty liver.
The variability of radiologic interpretations in imaging of patients with nonalcoholic fatty liver disease has been reported by Saadeh et al. [11]. Their study included evaluation of interobserver and intraobserver agreement for pattern and severity of disease assessed with sonography, CT, and MRI. The intraobserver agreement for severity of steatosis assessed with sonography was found to be substantial (κ = 0.63), but the interobserver agreement for severity was fair (κ = 0.40). Although the results are apparently similar, our study differed from that by Saadeh et al. in several aspects. They studied a preselected cohort of 25 patients with the clinicopathologic diagnosis of nonalcoholic fatty liver disease, whereas our study population of 168 patients included any patient presenting to the department for abdominal sonography. Furthermore, Saadeh et al. found that the severity of steatosis was accurately determined only when more than 33% fat was found at liver biopsy. It is reasonable to expect that agreement levels would have been higher in our study if it had been restricted to patients with similarly high levels of fatty infiltration. Given that lipid accounts for approximately 5% of the total wet weight of normal liver [12], many patients with mild or moderate steatosis would have more than 5% but less than 33% hepatic fat.
Echogenicity is the single most important criterion in the sonographic assessment of fatty liver, but it is subjective and variable. In a study of fetal liver echogenicity, Smith-Levitin et al. [13] found that in comparison with findings at electrooptical densitometry, the observers' evaluations of differences in echogenicity were extremely inaccurate and that the human eye is a poor assessor of image density. Similarly, a study of visual assessment of renal echogenicity compared with densitometric measurements of echogenicity in infants revealed poor correlation [14]. Several sonographic systems generate a histogram of the gray level in terms of image density or echo intensity [15, 16]. In a study of liver echogenicity conducted with a sonographic instrument with this capability, Vehmas et al. [15] found that radiologists' visual grades were more accurate than computerized measurements of early pathologic changes in the liver. Those investigators asserted that computerized measurements were impeded by being restricted to small areas to avoid blood vessels, bile ducts, and acoustic shadows, whereas an experienced radiologist pays attention to vascular architecture and the effect of liver size, diaphragmatic visibility, and artifacts in addition to the general echogenicity of the liver.
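The gray-level histogram mentioned above can be pictured as a count of pixel intensities within a region of interest. The sketch below is a generic illustration and does not reproduce the method of any cited study or commercial system: it assumes an 8-bit image stored as a list of rows and a rectangular region chosen to avoid vessels and shadows, and the function names are hypothetical.

```python
# Generic illustration only; not the algorithm of any cited study or scanner.
# Assumptions: 8-bit gray levels (0..255), image stored as a list of rows,
# rectangular region of interest (ROI).
def roi_histogram(image, top, left, height, width, levels=256):
    """Count how many ROI pixels fall at each gray level."""
    hist = [0] * levels
    for r in range(top, top + height):
        for c in range(left, left + width):
            hist[image[r][c]] += 1
    return hist


def mean_gray_level(hist):
    """Mean gray level of the pixels summarized by the histogram."""
    total = sum(hist)
    return sum(level * count for level, count in enumerate(hist)) / total


# Tiny hypothetical image; the 2 x 2 ROI at the top left avoids the bright column.
image = [
    [10, 10, 200],
    [10, 10, 200],
    [50, 50, 50],
]
hist = roi_histogram(image, top=0, left=0, height=2, width=2)
print(mean_gray_level(hist))
```

Restricting the measurement to a small region, as in this example, reflects the limitation noted by Vehmas et al.: the number describes only the sampled area, whereas a visual reading takes in the whole image.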
An important limitation of this retrospective study was that the images reviewed were static; the observers were not present during the examinations. In real-time imaging, liver echogenicity may be affected by the setting of the sonographic unit. In routine practice, the impression of echogenicity is often conveyed to the radiologist by the sonographer performing the examination. All of the examinations, however, were conducted by sonographers with at least 10 years of experience in abdominal sonography. Therefore, one can assume that the technical parameters affecting image acquisition were optimized and that the static images accurately reflected the impression gained by the sonographer during the examination. A potential source of bias in this study was that the three observers had spent the last 8 years working closely together in the same sonography unit. This factor may have produced better agreement levels than if the study had included observers from different institutions. This issue is relevant when patients are sent for follow-up examinations for assessment of changes in the degree of fatty liver and may undergo the examinations in different sonography departments.
To be useful in the assessment of fatty liver, sonography must have interobserver reliability and intraobserver reproducibility. This study, however, showed that radiologists can differ, sometimes substantially, in their evaluation of steatosis. At present, quantitative methods for measuring echogenicity are not widely available and are considered time-consuming, laborious, and not always accurate. Improved technology, however, has the potential to overcome these difficulties, and sonograms in the future may enable objective and reliable assessment of fatty liver. Until then, radiologists should acknowledge and clinicians should be aware that interobserver and intraobserver rates of agreement are only moderate in assessment of the severity of fatty liver and that agreement levels are only slightly better if the radiologist is required to decide whether steatosis is present.