DOI:10.2214/AJR.07.2123
AJR 2007; 189:W320-W323
© American Roentgen Ray Society


Original Research

Interobserver and Intraobserver Variability in the Sonographic Assessment of Fatty Liver

Simon Strauss1, Ella Gavish, Paul Gottlieb and Ludmila Katsnelson

1 All authors: Department of Diagnostic Imaging, Assaf Harofeh Medical Center, affiliated with the Sackler Faculty of Medicine, Tel Aviv University, Zerifin 70300, Israel.

Received February 25, 2007; accepted after revision June 12, 2007.

 
Address correspondence to S. Strauss (drstraus@netvision.net.il).

This is a Web exclusive article.


Abstract
OBJECTIVE. The purpose of this study was to evaluate interobserver and intraobserver variability in the sonographic assessment of the presence and severity of fatty liver.

MATERIALS AND METHODS. We retrospectively evaluated the static images of 168 adult patients who had undergone abdominal sonography. Three experienced radiologists independently graded the hepatic images as normal, mild steatosis, moderate steatosis, or severe steatosis. Assessment of liver steatosis was repeated on the same set of images 1 month later under the same conditions and blinded to the initial reading. Weighted kappa statistics were used to analyze interobserver and intraobserver agreement, and the agreement percentages were calculated.

RESULTS. The mean interobserver and intraobserver agreement rates for the presence of fatty liver were 72% (κ = 0.43) and 76% (κ = 0.54). For severity of fatty liver, the initial reading for pairs of observers had 47-59% (κ = 0.40-0.51) interobserver agreement. The interobserver agreement for the second reading was 59-64% (κ = 0.43-0.54). The mean agreement rates for pairs of observers were 53% (κ = 0.47) and 62% (κ = 0.50) on the first and second readings. Intraobserver agreement for severity of fatty liver ranged from 55% to 68% (κ = 0.51-0.63).

CONCLUSION. Subjective visual assessment of fatty liver on sonography has substantial observer variability. There is a need for a more objective quantitative method of grading fatty liver on sonography that would be easily available and applicable in routine clinical practice.

Keywords: fatty liver • sonography • steatosis


Introduction
Fatty liver or hepatic steatosis is the term used to describe a spectrum of conditions in which triglyceride accumulates within hepatocytes. The two most common conditions associated with fatty liver are alcoholic liver disease and nonalcoholic fatty liver disease [1]. In addition to alcohol abuse, a variety of etiologic factors are associated with fatty liver, including obesity, diabetes, hepatitis, and drug toxicity [2]. The prevalence of nonalcoholic fatty liver disease in the general population is estimated to be 13-23%, but it increases to 74% among obese persons [3, 4]. Fatty liver is a common abnormality in patients undergoing abdominal sonography, especially persons with suspected liver disease. Identification of patients with steatosis is important because the liver injury can progress to steatohepatitis, fibrosis, and cirrhosis. In addition, identification of fatty liver should prompt the clinician to search for associations with diabetes mellitus, hypertension, hypertriglyceridemia, and low levels of high-density lipoprotein cholesterol [5].

Sonography is operator-dependent, and the sonographic evaluation of fatty liver is based mainly on the subjective impression of hepatic echogenicity and posterior attenuation of the ultrasound beam [6]. To be reliable in evaluation of the presence and severity of steatosis, sonography must be reproducible among observers and by the same observer on separate occasions. To the best of our knowledge, no report in the literature has described assessment of level of agreement and reproducibility in an unselected cohort of patients undergoing sonography for a range of abdominal disorders not necessarily related to the liver. This study was designed to evaluate interobserver and intraobserver variability in the sonographic assessment of fatty liver in routine clinical practice.


Materials and Methods
A retrospective review of abdominal sonograms obtained during 2004 and stored in the PACS at our institution was preliminarily conducted by one investigator. One hundred sixty-eight adult patients who met the following criteria were randomly selected as the cohort for this study: the examination was technically adequate and included both transverse and longitudinal views of the liver (10-20 images), no focal hepatic lesion was seen, and both the right kidney and the liver were seen in at least one of the images. In all selected cases, the liver had a diffusely homogeneous texture, except for those in which the ultrasound beam was attenuated in the far field. Patients with heterogeneous liver texture or ascites were excluded from the study. The patients had undergone abdominal sonography for a range of indications not necessarily related to the liver. The study was approved by the institutional review board, which waived informed consent.

All sonograms were obtained with one of two units, one (Acuson 128 XP, Siemens Medical Solutions) with a 2- to 4-MHz vector transducer and the other (ATL HDI 5000, Philips Medical Systems) with a 2- to 5-MHz convex transducer. The examinations were performed by one of four sonographers with 10-26 years of experience. The technical parameters, including gain adjustment, placement of focal zone, use of tissue harmonics, and use of color Doppler technique, were optimized on a case-by-case basis.

Three radiologists with 8-27 years of experience in abdominal sonography independently reviewed the images of the 168 patients. Using a predetermined protocol, the observers graded each case as normal, mild fatty liver, moderate fatty liver, or severe fatty liver. Mild steatosis was seen as a slight increase in liver echogenicity. In moderate steatosis, visualization of intrahepatic vessels and the diaphragm was slightly impaired, and increased liver echogenicity was present. Severe steatosis was recognized as a marked increase in hepatic echogenicity, poor penetration of the posterior segment of the right lobe of the liver, and poor or no visualization of the hepatic vessels and diaphragm [7]. The liver was assessed to be normal if the texture was homogeneous, exhibited fine-level echoes, or was minimally hyperechoic or isoechoic compared with normal renal cortex and if there was no posterior attenuation of the ultrasound beam.

The images were reviewed by the observers on the same monitor and under the same ambient lighting conditions. The observers were blinded to the clinical and laboratory data and were unaware of the written interpretation previously reported and of the other observers' assessments. After an interval of 4 weeks, in which the cases were randomly rearranged in a new sequence, the observers were again requested to assess the presence and severity of steatosis in the same 168 cases under the same conditions and blinded to the initial reading.

The interobserver and intraobserver agreement percentages were calculated by dividing the number of occasions of complete agreement by the total number of occasions. Weighted kappa statistics were used to determine the degree of agreement after correction for the agreement expected by chance. The kappa statistic was interpreted as follows: less than 0.00, poor agreement; 0.00-0.20, slight agreement; 0.21-0.40, fair agreement; 0.41-0.60, moderate agreement; 0.61-0.80, substantial agreement; and 0.81-1.00, almost perfect agreement [8]. The level of statistically significant difference was p < 0.01. Statistical analyses were performed with SAS software version 9.1 (SAS Institute).
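The two agreement measures described above can be sketched in a few lines of Python. This is only an illustration, not the SAS procedure used in the study: the function names are hypothetical, and because the text does not state which weighting scheme was applied, linear disagreement weights are assumed here. The interpretation bands follow the scale quoted from reference 8.

```python
GRADES = ["normal", "mild", "moderate", "severe"]  # ordinal grades used by the observers


def percent_agreement(first, second):
    """Fraction of cases in which two readings give exactly the same grade."""
    if len(first) != len(second) or not first:
        raise ValueError("readings must be non-empty and of equal length")
    return sum(a == b for a, b in zip(first, second)) / len(first)


def weighted_kappa(first, second, categories=GRADES):
    """Cohen's weighted kappa; linear weights are an assumption, not the paper's stated choice."""
    if len(first) != len(second) or not first:
        raise ValueError("readings must be non-empty and of equal length")
    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}
    n = len(first)
    # Cross-tabulate the two readings.
    counts = [[0] * k for _ in range(k)]
    for a, b in zip(first, second):
        counts[index[a]][index[b]] += 1
    row_totals = [sum(row) for row in counts]
    col_totals = [sum(counts[i][j] for i in range(k)) for j in range(k)]
    observed = 0.0  # weighted disagreement actually observed
    expected = 0.0  # weighted disagreement expected by chance
    for i in range(k):
        for j in range(k):
            weight = abs(i - j) / (k - 1)  # 0 on the diagonal, 1 for normal vs severe
            observed += weight * counts[i][j] / n
            expected += weight * row_totals[i] * col_totals[j] / (n * n)
    if expected == 0.0:
        raise ValueError("kappa is undefined when both readings use one identical grade")
    return 1.0 - observed / expected


def landis_koch(kappa):
    """Map a kappa value to the verbal scale quoted in the text."""
    value = round(kappa, 2)  # the bands are given to two decimal places
    if value < 0.0:
        return "poor"
    if value <= 0.20:
        return "slight"
    if value <= 0.40:
        return "fair"
    if value <= 0.60:
        return "moderate"
    if value <= 0.80:
        return "substantial"
    return "almost perfect"
```

Weighted kappa penalizes a normal-versus-severe disagreement more heavily than a disagreement between adjacent grades, which is why it suits an ordinal scale such as this one.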


Results
Figure 1 illustrates the variability between and within observers.


Fig. 1. 73-year-old man with fatty liver. Three representative sonograms of liver show interobserver and intraobserver variability. At first reading, observer A graded liver as moderately fatty, observer B graded liver as mildly fatty, and observer C graded liver as normal. At second reading, observers A and C were consistent in grading liver moderately fatty and normal, respectively, whereas observer B, unlike in first evaluation, assessed liver as normal.

 

Intraobserver Agreement
Intraobserver agreement for grading the severity of fatty liver is summarized in Table 1. The mean kappa value for the three observers reached a moderate level (κ = 0.58; range, 0.51-0.63), and the mean percentage agreement between the first and second readings was 62.1% (range, 54.8-67.9%). The first observer (A) agreed with himself in 114 (67.9%) of the 168 cases (κ = 0.63), the second observer (B) agreed with himself in 92 (54.8%) of the 168 cases (κ = 0.51), and the third observer (C) agreed with himself in 107 (63.7%) of the 168 cases (κ = 0.59).
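As a quick arithmetic check, the per-observer percentages and the two means follow directly from the counts and kappa values reported in the paragraph above. The short Python sketch below only recomputes those figures; the variable names are illustrative.

```python
TOTAL_CASES = 168

# Cases graded identically at both readings, as reported in the text.
same_grade_both_readings = {"A": 114, "B": 92, "C": 107}
# Kappa values as reported in the text (not recomputed here).
reported_kappa = {"A": 0.63, "B": 0.51, "C": 0.59}

percent = {obs: 100 * n / TOTAL_CASES for obs, n in same_grade_both_readings.items()}
mean_percent = sum(percent.values()) / len(percent)
mean_kappa = sum(reported_kappa.values()) / len(reported_kappa)

for obs, value in percent.items():
    print(f"Observer {obs}: {value:.1f}%")        # 67.9%, 54.8%, 63.7%
print(f"Mean agreement: {mean_percent:.1f}%")     # 62.1%
print(f"Mean kappa: {mean_kappa:.2f}")            # 0.58
```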


TABLE 1: Intraobserver Agreement for Severity of Fatty Liver

 

Intraobserver agreement was better if the results were analyzed to indicate whether the liver was assessed as fatty without regard to severity of infiltration. The mean intraobserver percentage of agreement for assessment of presence or absence of fatty liver was 76.4% (range, 70.2-82.1%), but the kappa value remained at the moderate level (κ = 0.54; range, 0.45-0.64). Only one observer reached a substantial (κ = 0.64) level of agreement.
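The presence-or-absence analysis amounts to collapsing the four grades into two before comparing readings. A minimal Python sketch of that step follows; the readings shown are made-up illustrative values, not study data, and the function names are hypothetical.

```python
FATTY_GRADES = {"mild", "moderate", "severe"}  # any grade other than "normal"


def is_fatty(grade):
    """Collapse the four-level grade into presence (True) or absence (False)."""
    return grade in FATTY_GRADES


def severity_agreement(first, second):
    """Fraction of cases given exactly the same grade at both readings."""
    return sum(a == b for a, b in zip(first, second)) / len(first)


def presence_agreement(first, second):
    """Fraction of cases given the same fatty / not fatty call at both readings."""
    return sum(is_fatty(a) == is_fatty(b) for a, b in zip(first, second)) / len(first)


# Hypothetical readings for five cases (illustration only).
first_reading = ["normal", "mild", "moderate", "severe", "mild"]
second_reading = ["normal", "moderate", "mild", "severe", "normal"]

print(severity_agreement(first_reading, second_reading))  # 0.4
print(presence_agreement(first_reading, second_reading))  # 0.8
```

Because every exact match on grade is also a match on presence, the percentage of agreement on presence can never be lower than the percentage of agreement on severity. Kappa does not carry the same guarantee, since the chance-expected agreement also changes when categories are merged.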

Interobserver Agreement
Interobserver agreement between each pair of observers (A and B, A and C, B and C) for severity of fatty liver is summarized in Table 2. The kappa values for pairs of observers ranged from 0.40 to 0.51 at the first reading and from 0.43 to 0.54 at the second reading, corresponding to a moderate level of agreement. The mean interobserver percentage of agreement for assessment of presence or absence of fatty liver was 70.3% (κ = 0.46) at the first reading and 73.4% (κ = 0.40) at the second reading. All kappa values were at the moderate level.
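The pairwise comparison can be sketched as follows in Python. The grades below are hypothetical, and taking the mean as a simple average over the three pairs is an assumption about how the mean interobserver figure was formed.

```python
from itertools import combinations


def pairwise_agreement(readings):
    """Exact-agreement fraction for every pair of observers.

    `readings` maps an observer label to that observer's list of grades,
    with all lists in the same case order.
    """
    result = {}
    for a, b in combinations(sorted(readings), 2):
        matches = sum(x == y for x, y in zip(readings[a], readings[b]))
        result[(a, b)] = matches / len(readings[a])
    return result


def mean_pairwise_agreement(readings):
    """Simple average of the pairwise fractions (assumed definition of the mean)."""
    values = pairwise_agreement(readings).values()
    return sum(values) / len(values)


# Hypothetical grades for four cases (illustration only, not study data).
example = {
    "A": ["normal", "mild", "moderate", "severe"],
    "B": ["normal", "mild", "mild", "severe"],
    "C": ["mild", "mild", "moderate", "moderate"],
}

print(pairwise_agreement(example))
print(mean_pairwise_agreement(example))
```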


TABLE 2: Interobserver Agreement for Severity of Fatty Liver

 


Discussion
Fatty liver is increasingly recognized as a clinical problem and is now considered the most common hepatic disorder in Western countries [9]. Initially accepted as an effect of alcohol abuse, nonalcoholic fatty liver is now known to be part of a larger metabolic syndrome that has the potential to progress to cirrhosis and liver failure [4]. Many patients with fatty liver have no symptoms or signs of liver disease at the time of diagnosis, and the first indication of steatosis is often found on cross-sectional imaging. In other patients, fatty liver can be the cause of hepatomegaly and elevated liver enzyme levels, prompting a sonographic study directed specifically at the liver.

The most definitive means for assessing the presence and severity of steatosis is liver biopsy, which remains the reference standard. However, noninvasive techniques such as sonography, CT, and MRI have been used to detect fatty liver. The reported sensitivities and specificities are 60-100% and 77-95% for sonography, 43-95% and 90% for unenhanced CT, and 81% and 100% for chemical shift gradient MRI [1]. On unenhanced CT, fatty liver is diagnosed if the liver attenuation minus the spleen attenuation is -10 H or less. The diagnosis of fatty liver on contrast-enhanced helical CT may also be accurate but is protocol-specific [10]. The most widely used MRI technique for assessment of fatty liver is chemical shift gradient-echo imaging with in-phase and opposed-phase acquisitions. In fatty liver there is a loss of signal intensity on opposed-phase images in comparison with in-phase images [1].
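The unenhanced CT criterion quoted above is a simple difference threshold. The Python sketch below only illustrates that arithmetic; the function name and example attenuation values are hypothetical, attenuation is taken to be in Hounsfield units (written "H" in the text), and the snippet is not intended as a diagnostic tool.

```python
def meets_unenhanced_ct_criterion(liver_attenuation, spleen_attenuation, threshold=-10.0):
    """True if liver minus spleen attenuation is at or below the threshold (default -10 H)."""
    return (liver_attenuation - spleen_attenuation) <= threshold


# Hypothetical attenuation values (illustration only).
print(meets_unenhanced_ct_criterion(40.0, 55.0))  # difference -15: criterion met
print(meets_unenhanced_ct_criterion(45.0, 55.0))  # difference -10: criterion met ("-10 H or less")
print(meets_unenhanced_ct_criterion(55.0, 50.0))  # difference +5: criterion not met
```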

Sonography is highly operator-dependent, and the diagnosis of fatty liver is based mainly on the subjective assessment of liver echogenicity. Although quantitative methods for measuring tissue echogenicity have been reported, these methods are not generally available and are not used in clinical sonography departments [6]. With the widespread use of sonography for evaluating abdominal disorders in general and hepatic disease in particular, the liver may be seen to have an echogenic appearance and is therefore interpreted as being fatty. Liver echogenicity normally equals or slightly exceeds renal cortical echogenicity, but this factor relies on the visual perception of the observer. Furthermore, the echogenicity of the renal cortex can be altered by disease processes, making the comparison less reliable. In this study, we used a well-recognized classification of fatty liver into mild, moderate, and severe degrees of steatosis based on an increase in hepatic echogenicity, impaired visualization of hepatic vessels and diaphragm, and poor penetration of the posterior aspects of the liver [7].

One of the determinants of diagnostic accuracy is reproducibility of the examination, which should yield the same or similar results when repeated. The findings in this study showed that observer variability in sonographic assessment of fatty liver is considerable. The percentage of agreement between pairs of observers was only 47-59% at the first reading and 59-64% at the second reading. As expected, agreement was higher for assessment of presence or absence of fatty liver; the mean percentage of agreement between pairs of observers was 70% at the first reading and 73% at the second reading. Not surprisingly, intraobserver agreement was superior to interobserver agreement, given that there is a greater chance that several observers will differ in interpretation than when the same observer evaluates the images. In more than one third of the cases, however, the observers evaluated the severity of steatosis differently on the second reading, and the mean intraobserver agreement was moderate (κ = 0.58). The percentage of agreement was slightly better when analyzed to show whether the liver was fatty but still revealed that an observer disagreed with himself in approximately one fourth of the cases assessed. There were no instances of almost perfect interobserver or intraobserver agreement, and low substantial agreement was achieved by only one observer for both the presence (κ = 0.64) and the severity (κ = 0.63) of fatty liver.

The variability of radiologic interpretations in imaging of patients with nonalcoholic fatty liver disease has been reported by Saadeh et al. [11]. Their study included evaluation of interobserver and intraobserver agreement for pattern and severity of disease assessed with sonography, CT, and MRI. The intraobserver agreement for severity of steatosis assessed with sonography was found to be substantial (κ = 0.63), but the interobserver agreement for severity was fair (κ = 0.40). Although the results are apparently similar, our study differed from that by Saadeh et al. in several aspects. They studied a preselected cohort of 25 patients with the clinicopathologic diagnosis of nonalcoholic fatty liver disease, whereas our study population of 168 patients included any patient presenting to the department for abdominal sonography. Furthermore, Saadeh et al. found that the severity of steatosis was accurately determined only when more than 33% fat was found at liver biopsy. It is reasonable to expect that agreement levels would have been higher in our study if it had been restricted to patients with similarly high levels of fatty infiltration. Given that lipid accounts for approximately 5% of the total wet weight of normal liver [12], many patients with mild or moderate steatosis would have more than 5% but less than 33% hepatic fat.

Echogenicity is the single most important criterion in the sonographic assessment of fatty liver, but it is subjective and variable. In a study of fetal liver echogenicity, Smith-Levitin et al. [13] found that in comparison with findings at electrooptical densitometry, the observers' evaluations of differences in echogenicity were extremely inaccurate and that the human eye is a poor assessor of image density. Similarly, a study of visual assessment of renal echogenicity compared with densitometric measurements of echogenicity in infants revealed poor correlation [14]. Several sonographic systems generate a histogram of the gray level in terms of image density or echo intensity [15, 16]. In a study of liver echogenicity conducted with a sonographic instrument with this capability, Vehmas et al. [15] found that radiologists' visual grades were more accurate than computerized measurements of early pathologic changes in the liver. Those investigators asserted that computerized measurements were impeded by being restricted to small areas to avoid blood vessels, bile ducts, and acoustic shadows, whereas an experienced radiologist pays attention to vascular architecture and the effect of liver size, diaphragmatic visibility, and artifacts in addition to the general echogenicity of the liver.

An important limitation of this retrospective study was that the images reviewed were static; the observers were not present during the examinations. In real-time imaging, liver echogenicity may be affected by the setting of the sonographic unit. In routine practice, the impression of echogenicity is often conveyed to the radiologist by the sonographer performing the examination. All of the examinations, however, were conducted by sonographers with at least 10 years of experience in abdominal sonography. Therefore, one can assume that the technical parameters affecting image acquisition were optimized and that the static images accurately reflected the impression gained by the sonographer during the examination. A potential source of bias in this study was that the three observers had spent the last 8 years working closely together in the same sonography unit. This factor may have produced better agreement levels than if the study had included observers from different institutions. This issue is relevant when patients are sent for follow-up examinations for assessment of changes in the degree of fatty liver and may undergo the examinations in different sonography departments.

To be useful in the assessment of fatty liver, sonography must have interobserver reliability and intraobserver reproducibility. This study, however, showed that radiologists can differ, sometimes substantially, in their evaluation of steatosis. At present, quantitative methods for measuring echogenicity are not widely available and are considered time-consuming, laborious, and not always accurate. Improved technology, however, has the potential to overcome these difficulties, and sonograms in the future may enable objective and reliable assessment of fatty liver. Until then, radiologists should acknowledge and clinicians should be aware that interobserver and intraobserver rates of agreement are only moderate in assessment of the severity of fatty liver and that agreement levels are only slightly better if the radiologist is required to decide whether steatosis is present.


References

  1. Hamer OW, Aguirre DA, Casola G, Lavine JE, Woenckhaus M, Sirlin CB. Fatty liver: imaging patterns and pitfalls. RadioGraphics 2006; 26:1637-1653
  2. Fishbein M, Castro F, Cheruku S, et al. Hepatic MRI for fat quantitation: its relationship to fat morphology, diagnosis, and ultrasound. J Clin Gastroenterol 2005; 39:619-625
  3. Joy D, Thava VR, Scott BB. Diagnosis of fatty liver disease: is biopsy necessary? Eur J Gastroenterol Hepatol 2003; 15:539-543
  4. Angulo P. Nonalcoholic fatty liver disease. N Engl J Med 2002; 346:1221-1231
  5. Hui JM, Farrell GC. Clear messages from sonographic shadows? Links between metabolic disorders and liver disease, and what to do about them. J Gastroenterol Hepatol 2003; 18:1115-1117
  6. Zwiebel WJ. Sonographic diagnosis of diffuse liver disease. Semin Ultrasound CT MR 1995; 16:8-15
  7. Rumack CM, Wilson SR, Charboneau JW. Diagnostic ultrasound, 2nd ed. St. Louis, MO: Mosby, 1998:110-112
  8. Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics 1977; 33:159-174
  9. Palmentieri B, de Sio I, La Mura V, et al. The role of bright liver echo pattern on ultrasound B-mode examination in the diagnosis of liver steatosis. Dig Liver Dis 2006; 38:485-489
  10. Jacobs JE, Birnbaum BA, Shapiro MA, et al. Diagnostic criteria for fatty infiltration of the liver on contrast-enhanced helical CT. AJR 1998; 171:659-664
  11. Saadeh S, Younossi ZM, Remer EM, et al. The utility of radiological imaging in nonalcoholic fatty liver disease. Gastroenterology 2002; 123:745-750
  12. Fishbein MH, Miner M, Mogren C, Chalekson J. The spectrum of fatty liver in obese children and the relationship of serum aminotransferases to severity of steatosis. J Pediatr Gastroenterol Nutr 2003; 36:54-61
  13. Smith-Levitin M, Blickstein I, Albrecht-Shach AA, et al. Quantitative assessment of gray-level perception: observers' accuracy is dependent on density differences. Ultrasound Obstet Gynecol 1997; 10:346-349
  14. Eggert P, Debus F, Kreller-Laugwitz G, Oppermann HC. Densitometric measurement of renal echogenicity in infants and naked eye evaluation: a comparison. Pediatr Radiol 1991; 21:111-113
  15. Vehmas T, Kaukiainen A, Luoma K, Lohman M, Nurminen M, Taskinen H. Liver echogenicity: measurement or visual grading? Comput Med Imaging Graph 2004; 28:289-293
  16. Osawa H, Mori Y. Sonographic diagnosis of fatty liver using a histogram technique that compares liver and renal cortical echo amplitudes. J Clin Ultrasound 1996; 24:25-29
