AJR 2001; 177:525-528
© American Roentgen Ray Society


Reliability of Soft-Copy Versus Hard-Copy Interpretation of Emergency Department Radiographs

A Prototype Study

Harold L. Kundel¹, Marcia Polansky², Murray K. Dalinka¹, Robert H. Choplin¹, Warren B. Gefter¹, J. Bruce Kneeland¹, Wallace T. Miller, Sr.¹, and Wallace T. Miller, Jr.³

¹ Department of Radiology, University of Pennsylvania, 3400 Spruce St., Philadelphia, PA 19104.
² MCP Hahnemann University, School of Public Health, Broad and Vine Sts., Philadelphia, PA 19102.
³ Department of Radiology, Thomas Jefferson University, 11th and Walnut Sts., Philadelphia, PA 19107.

Received November 17, 2000; accepted after revision March 9, 2001.

 
Supported by grant P01-CA53141 from the National Cancer Institute, U.S. Public Health Service, Department of Health and Human Services.

Address correspondence to H. L. Kundel.


Abstract
OBJECTIVE. The purpose of this study was to compare the diagnostic reliability of hard-copy and soft-copy interpretation of emergency department radiographs, using a methodology for evaluating imaging systems when independent proof of the diagnosis is not available.

MATERIALS AND METHODS. We collected radiographs from a stratified sample of 100 patients seen in the emergency department. The images were obtained using computed radiography, and the digital images were both printed on film and stored for display on a workstation. A group of seven experienced radiologists interpreted the cases using both film and the workstation display. The results were analyzed using mixture distribution analysis (MDA).

RESULTS. The reliability, expressed as the percentage of agreement of a typical observer relative to the majority, was computed from the MDA. The result was 90% for both hard copy and soft copy, with 95% bootstrap confidence intervals of 86-94%.

CONCLUSION. We conclude that, in the emergency department, soft-copy interpretation is as reliable as hard-copy interpretation. The strength of this conclusion depends on the validity of the MDA approach as well as the extent to which the observer sample and case sample are representative of the emergency department.


Introduction
The development and deployment of picture archiving and communication systems (PACS) have made image interpretation from soft copy feasible and, in many situations, desirable. A number of investigators have compared the accuracy of soft copy with that of hard copy, either in laboratory investigations that use specially selected samples [1,2,3] or in field investigations that use a representative sample of clinical patients [4]. Laboratory studies are like stress tests in which the cases are chosen as exemplars of certain universal image properties. For example, contrast rendition is studied using lung nodules, and spatial resolution is studied using either fractures or pneumothoraces. Patients for field studies, by contrast, are selected because they are representative of the clinical population. An unbiased sample is difficult to acquire, however, because receiver operating characteristic (ROC) analysis, which has become the standard statistical tool for comparing observer accuracy, requires that the case sample be divided into diseased and disease-free categories on the basis of independent truth; cases that lack independent proof must be dropped, and the remaining sample is no longer representative. In many of the patients seen in an emergency department, independent truth cannot be ascertained because the images themselves are the main basis for the diagnosis. Investigators try to circumvent this difficulty by having a panel of experts establish the diagnosis [1,2,3,4], which introduces another source of variability not taken into account by the ROC analysis. The criteria used by the panel can alter the outcome of the ROC analysis by changing the distribution of diseased and disease-free cases [5]. In addition, the panelists add to the cost of performing the study and, by virtue of their familiarity with the cases, are ineligible to serve as observers, thus depriving the interpretation study of two or three experts.

We propose a measure of reliability based on observer agreement that uses the mathematical technique of mixture distribution analysis (MDA) to compute the relative percentage of agreement [6,7]. Reliability is defined as the likelihood that one typical observer agrees with the interpretation of the majority. This can be contrasted with accuracy, which is the likelihood of agreement with an external standard. The relative percentage of agreement has a range of 0-100: a value of 0 means that there is no agreement with the majority, 50 means that agreement occurs 50% of the time, and 100 means that agreement occurs all the time. The MDA methodology requires a minimum of five observers, but it does not require proven cases in the test sample. It does not estimate accuracy because there is no gold standard; instead, the relative percentage of agreement estimates the consistency of performance over cases and over observers. We have found a good correlation between accuracy as measured by ROC analysis and reliability as measured by the MDA [8].

The study reported here uses MDA to compare the reliability of soft-copy and hard-copy interpretations on a sample of 100 film radiographs drawn from the emergency department and interpreted by seven experienced radiologists who regularly interpret emergency department cases. The sample was stratified by the type of examination.


Materials and Methods
The cases were obtained while the emergency department radiology operation was in transition from hard copy to soft copy; both modalities were available and used for clinical interpretation in the emergency department radiology suite. All imaging was performed in the emergency department's radiography room using computed radiography plates (STIII; Fuji Medical Systems, Stamford, CT). The images were printed on a DryView 8700 printer (Imation, Oakdale, MN) and checked for quality by the emergency department radiology technologist.

The Case Sample
A data coordinator, who was also a radiology technologist and was thoroughly familiar with radiography in the emergency department, assembled the sample of 100 cases. The emergency department is divided into a major trauma unit, an emergency unit (hereafter called the emergency department), and a unit called Penn Rapid Care that takes care of less critically ill patients. Only those images from the emergency department obtained during the day shift were included in the study. (The case mix is similar during the evening and night shifts.) The images were categorized as depicting the chest, abdomen, spine and pelvis, and extremities. The cases were acquired in the order of patient arrival at the imaging facility. Most emergency department patients undergo imaging of a single body region. When noncontiguous regions were imaged (for example, ankle and chest, or ankle and wrist), each region was considered separately. All the standard views (e.g., posteroanterior and lateral chest) were included with the case. The general mix of cases in the emergency department and the number of cases selected in each category are shown in Table 1.


TABLE 1 Case Distribution

 

Of 313 successive patients, 55 were excluded in advance: 38 trauma cases, five nasal and facial bone examinations, two postreduction fracture examinations, and 10 cases interpreted by potential observers in the study. The data coordinator selected a case, printed a duplicate copy, and assigned it a sequential number. That number was written on a label and pasted over the patient-identifying information in the corner of the film, as required by the University of Pennsylvania's institutional review board. The images in each case were also retrieved from the local archive and stored in a research database before they were sent to the PACS archive, where data compression is applied. The data coordinator verified the reason for requesting the examination (the clinical history) by retrieving the emergency department request form and comparing the information on the form with the entry in the radiology information system. A total of 115 cases were excluded after selection because the clinical history either could not be retrieved or was not verifiable (n = 53) or because uncompressed soft-copy images could not be obtained (n = 62). The remaining 143 cases were used to fill the categories in the test set.
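The exclusion counts in the preceding paragraph can be checked with a few lines of Python. The counts are taken from the text; the dictionary labels and variable names are our own.

```python
# Case flow reported in the text: 313 successive patients screened.
screened = 313

# Exclusions made before selection (counts from the text).
excluded_in_advance = {
    "trauma": 38,
    "nasal and facial bones": 5,
    "postreduction of fractures": 2,
    "interpreted by potential observers": 10,
}

# Exclusions made after selection (counts from the text).
excluded_after_selection = {
    "clinical history not retrievable or verifiable": 53,
    "uncompressed soft-copy images unavailable": 62,
}

n_advance = sum(excluded_in_advance.values())
n_after = sum(excluded_after_selection.values())
eligible = screened - n_advance - n_after

print(n_advance, n_after, eligible)  # 55 115 143
```

The 143 eligible cases were then used to fill the category quotas of the 100-case test set.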

The Interpretation Test
Seven observers participated in the study. All were staff radiologists who either interpreted emergency department cases on the day shift or regularly rotated through the emergency department reading room on the evening shift (5:00 P.M.-11:00 P.M.). None had given the primary interpretation or had previously reviewed any of the cases. The hard-copy cases were divided into 10 groups of 10 each and given to the radiologists to interpret using a standard form (Appendix) with check boxes and a space for comments. The reason for requesting the examination was stated for each case, and the observer was asked to decide, with respect to the clinical history, whether findings were positive, equivocal, or negative. For an equivocal rating, the observer was asked to state whether the reason was technical or diagnostic. The observer was also asked to report any findings not related to the clinical history and to indicate whether patient care would be affected by them.

After 6 months, the soft-copy cases were transferred to removable disks (Jaz 2GB; Iomega, Roy, UT) that were given to each of the observers. Two Advantage display workstations (General Electric Medical Systems, Milwaukee, WI) in the radiology interpretation rooms were equipped with the disk drives, and the cases were interpreted again. Both workstations had 2 x 2.5K monitors and were located in dimly lit rooms. The luminance range of the workstation display was 0.2-200 cd/m². The luminance range and linearity were checked weekly during the study with a J17 illuminometer (Tektronix, Beaverton, OR). The viewing distance was not controlled. Observers were free to use any image-processing functions on the workstation.

Data Reduction and Analysis
The method for performing the MDA using the expectation-maximization algorithm has been described in the literature [6,7]. The objective of the analysis is to estimate the parameters of three binomial distributions representing three groups of cases: easy normal cases, hard cases (normal and abnormal), and easy abnormal cases. For each group, two parameters are computed: the probability that a case in the group will be interpreted as positive (m) and the proportion of cases in the group (p). Hard cases are defined as those on which observers do not agree. Because observers cannot reliably distinguish normal from abnormal among the hard cases, hard normal and hard abnormal cases are grouped together. A finer distinction into hard normals and hard abnormals can be made, but doing so requires estimating the parameters of four binomial distributions and requires more cases and more observers than were used in our study.
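The fitting step described above can be sketched as expectation-maximization for a three-component binomial mixture. This is a minimal illustration under stated assumptions, not the published implementation [6,7]: it assumes every case was read by the same number of observers (`n_obs`) and that observers' calls on a case are independent given the case's group. The function names, starting values, and fixed iteration count are hypothetical choices made for this sketch.

```python
from math import comb

def binom_pmf(k, n, m):
    """P(k of n observers call a case positive | per-observer positive rate m)."""
    return comb(n, k) * m**k * (1.0 - m) ** (n - k)

def fit_binomial_mixture(counts, n_obs, m_init=(0.05, 0.5, 0.95),
                         p_init=(1 / 3, 1 / 3, 1 / 3), n_iter=500, eps=1e-6):
    """Fit a three-component binomial mixture by expectation-maximization.

    counts: positive calls per case (each between 0 and n_obs).
    Returns (m, p) ordered by increasing m: easy normals, hard cases,
    easy abnormals. Starting values and iteration count are hypothetical.
    """
    m, p = list(m_init), list(p_init)
    for _ in range(n_iter):
        # E-step: posterior probability that each case belongs to each group.
        resp = []
        for k in counts:
            w = [p_j * binom_pmf(k, n_obs, m_j) for p_j, m_j in zip(p, m)]
            total = sum(w)
            resp.append([w_j / total for w_j in w])
        # M-step: re-estimate group proportions (p) and positive rates (m).
        for j in range(len(m)):
            r_j = sum(r[j] for r in resp)
            p[j] = r_j / len(counts)
            m_new = sum(r[j] * k for r, k in zip(resp, counts)) / (n_obs * r_j)
            m[j] = min(max(m_new, eps), 1.0 - eps)  # keep m strictly inside (0, 1)
    order = sorted(range(len(m)), key=lambda j: m[j])
    return [m[j] for j in order], [p[j] for j in order]

# Hypothetical data: number of positive calls by 7 observers on 100 cases.
counts = [0] * 60 + [1] * 10 + [3] * 10 + [4] * 10 + [6] * 5 + [7] * 5
m, p = fit_binomial_mixture(counts, n_obs=7)
print([round(x, 2) for x in m], [round(x, 2) for x in p])
```

Sorting the fitted components by m is one simple way to label them as easy normal, hard, and easy abnormal after fitting; EM itself does not fix the order of the components.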

The input to the expectation-maximization algorithm is a set of interpretations coded as positive or negative. The observers in our study preferred to use three categories of findings: positive, equivocal, and negative. An analysis of the distribution of equivocal reports showed that they were equally likely to be associated with a positive or a negative interpretation. Because our study had more negative than positive interpretations, the equivocal interpretations were combined with the negative interpretations. The standard deviation of the mixture distribution parameters was computed using 1000 bootstrap samples drawn with replacement [9].
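The coding and resampling steps in the preceding paragraph can be sketched as follows. This is a hedged illustration: the function names are hypothetical, the reports are invented, and the statistic being bootstrapped is a stand-in (the overall proportion of positive calls). In the analysis described here, the statistic would instead be one parameter (m or p) of the mixture fit, refitted on each resample of cases.

```python
import random
import statistics

def collapse_to_binary(reports):
    """Code one case's reports: 'positive' -> 1; 'equivocal' and 'negative' -> 0."""
    return [1 if r == "positive" else 0 for r in reports]

def bootstrap_sd(cases, statistic, n_boot=1000, seed=0):
    """Standard deviation of `statistic` over bootstrap resamples of the cases.

    Each resample draws len(cases) cases with replacement.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        resample = [rng.choice(cases) for _ in cases]
        estimates.append(statistic(resample))
    return statistics.stdev(estimates)

# Hypothetical reports from 7 observers on 3 cases.
reports = [
    ["positive"] * 7,
    ["negative"] * 6 + ["equivocal"],
    ["positive", "positive", "equivocal", "negative", "positive", "negative", "positive"],
]
counts = [sum(collapse_to_binary(r)) for r in reports]  # [7, 0, 4]

def positive_rate(sample):
    """Stand-in statistic: overall proportion of positive calls (7 observers)."""
    return sum(sample) / (7 * len(sample))

sd = bootstrap_sd(counts, positive_rate, n_boot=1000, seed=1)
```

Resampling whole cases, rather than individual reports, keeps each case's set of seven readings together, which matches the case-level structure of the mixture model.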

The relative percentage of agreement (RPA) was calculated as

$$\mathrm{RPA} = 100\left[p_1(1 - m_1) + p_2 m_2 + p_3 m_3\right]$$

when $m_2$ was equal to or greater than 0.5 and as

$$\mathrm{RPA} = 100\left[p_1(1 - m_1) + p_2(1 - m_2) + p_3 m_3\right]$$

when $m_2$ was less than 0.5, where subscripts 1, 2, and 3 denote the easy normal, hard, and easy abnormal groups. The value of m is the proportion of observers who would give a positive interpretation. The relative percentage of agreement is defined as agreement with the majority interpretation, not as agreement on positive interpretations. A value of m less than 0.5 implies that the majority interpretation is negative. Therefore, when m is less than 0.5, the agreement with the majority is calculated as (1 - m).
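A minimal Python sketch of the relative percentage of agreement, following the rule stated in the text: agreement with the majority is m for a group whose majority call is positive and 1 - m for a group whose majority call is negative. It assumes, as the group labels imply, that the easy normal group has a negative majority and the easy abnormal group a positive majority. The function name and the parameter values in the example are hypothetical, not the study's estimates.

```python
def relative_percent_agreement(m, p):
    """Relative percentage of agreement from a three-group binomial mixture.

    m = (m1, m2, m3): probability of a positive call in the easy-normal,
        hard, and easy-abnormal groups.
    p = (p1, p2, p3): proportion of cases in each group.
    """
    m1, m2, m3 = m
    p1, p2, p3 = p
    # The hard group's majority can go either way, so agreement with the
    # majority is m2 if positive calls dominate and 1 - m2 otherwise.
    hard = m2 if m2 >= 0.5 else 1.0 - m2
    # Easy normals: majority negative (1 - m1); easy abnormals: majority positive (m3).
    return 100.0 * (p1 * (1.0 - m1) + p2 * hard + p3 * m3)

# Hypothetical parameters, for illustration only.
rpa = relative_percent_agreement((0.05, 0.3, 0.95), (0.6, 0.2, 0.2))  # about 90
```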

The kappa statistic was calculated for pairs of observers using the three response categories [10]. The mean and standard deviation of the kappa statistic were computed for the 21 possible pairings of seven observers for hard copy and soft copy.
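The pairwise kappa computation can be sketched as below, assuming that the statistic referenced in [10] is unweighted Cohen's kappa computed for each pair of observers over the three categories. The names and the observer data are hypothetical. With seven observers, `itertools.combinations` yields the 21 pairs mentioned above.

```python
from itertools import combinations
from statistics import mean, stdev

CATEGORIES = ("positive", "equivocal", "negative")

def cohen_kappa(a, b, categories=CATEGORIES):
    """Unweighted Cohen's kappa for two observers' reports on the same cases.

    Undefined (division by zero) if both observers use one identical
    category for every case.
    """
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement expected from each observer's marginal category rates.
    expected = sum((a.count(c) / n) * (b.count(c) / n) for c in categories)
    return (observed - expected) / (1.0 - expected)

def pairwise_kappa(reports_by_observer):
    """Mean, SD, and number of pairs of kappa over all observer pairs."""
    kappas = [cohen_kappa(a, b) for a, b in combinations(reports_by_observer, 2)]
    return mean(kappas), stdev(kappas), len(kappas)

# Hypothetical: 7 observers, each reporting on the same 6 cases.
base = ["positive", "negative", "equivocal", "negative", "positive", "negative"]
observers = [base[i:] + base[:i] for i in range(7)]
kappa_mean, kappa_sd, n_pairs = pairwise_kappa(observers)  # n_pairs == 21
```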


Results
Distribution of Equivocal Reports
Of the 33 soft-copy cases reported as equivocal, 58% (19/33) received one equivocal report and 27% (9/33) received two equivocal reports. Similarly, of the 44 hard-copy cases reported as equivocal, 59% (26/44) received one equivocal report and 25% (11/44) received two. The number of cases with equivocal reports is shown in Table 2. Only five chest cases were reported as equivocal by four or more observers in either the hard-copy or the soft-copy mode. The clinical histories given with these cases are listed in Table 3.


TABLE 2 Equivocal Interpretations by Seven Observers in 100 Cases

 

TABLE 3 Clinical History in Cases Reported as Equivocal by Most of Seven Observers

 

Mixture Distribution Analysis
The results of the MDA are shown in Table 4 and plotted in Figure 1 as the means and the 95% confidence intervals (CIs) calculated from the bootstrap standard deviations. The relative percentage of agreement was 90.1 (86-94, 95% CI) for hard copy and 89.8 (86-94, 95% CI) for soft copy.
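One common way to obtain a 95% confidence interval from a bootstrap standard deviation is the normal approximation, estimate ± 1.96 × SD. We assume that is what "calculated from the bootstrap standard deviations" refers to. The function name and the numbers in the example are hypothetical and are not the study's bootstrap results.

```python
def normal_ci(estimate, sd, z=1.96):
    """Approximate 95% CI from a bootstrap standard deviation (normal approximation)."""
    return estimate - z * sd, estimate + z * sd

# Hypothetical values, not the study's bootstrap SD.
low, high = normal_ci(90.0, 2.0)  # approximately (86.08, 93.92)
```

A percentile interval taken directly from the bootstrap distribution [9] is an alternative that does not assume normality.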


View this table:
[in this window]
[in a new window]

 
TABLE 4 Relative Percentage of Agreement and Parameters of Binomial Distributions

 


Fig. 1. Graph shows parameters m and p plotted for hard copy (□) and soft copy (■) as means and 95% confidence intervals.

 

Kappa
The mean kappa statistic for 21 pairs of observers was 0.48 (SD, 0.07) for the hard copy and 0.49 (SD, 0.08) for the soft copy.


Discussion
The approach to comparing two imaging modalities using reliability is based on the assumption that, in the real clinical world, when two experts agree, they are very likely to be right. When they disagree, it is usually impossible to be sure who is right. Fortunately for our patients, most imaging examinations performed in the emergency department are easy cases for the experts, who agree approximately 90% of the time. We think that agreement expressed as the relative percentage of agreement is more intuitive and easier to interpret than the kappa statistic. When the actual accuracy of a test is high, agreement is high, because observers who agree with the truth necessarily agree with each other. The converse is not necessarily true: observers can agree and be wrong. We have shown that, for some imaging studies, the relative percentage of agreement mirrors accuracy expressed as the area under the ROC curve [6,7]. Henkelman et al. [11] have shown that accuracy as determined by ROC analysis can also be estimated without knowledge of the truth by combining the results from multiple imaging modalities obtained on the same cases. We are currently studying how the magnitude of actual test accuracy affects the correlation between the area under the ROC curve and the relative percentage of agreement.

The value of the kappa statistic is also difficult to interpret because it varies with the ratio of abnormal to normal cases, even when the basic level of agreement does not change [12]. Kappa subtracts the agreement expected by chance from the observers' marginal rates, and when either normal or abnormal cases predominate, that expected chance agreement is high, so most of the observed agreement is discounted as chance. In imaging studies, however, observers genuinely agree on the predominant type of case. When two radiologists report a case as showing normal findings in a population in which most studies show normal findings, their interpretations should not be classified as chance agreement.
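The prevalence effect described in [12] can be illustrated with two hypothetical 2 × 2 agreement tables for a pair of observers. Both tables have 90% observed agreement on 100 cases; they differ only in how many cases are positive. The counts and the function name are invented for illustration.

```python
def kappa_2x2(both_pos, a_pos_b_neg, a_neg_b_pos, both_neg):
    """Observed agreement and Cohen's kappa for two observers from a 2 x 2 table."""
    n = both_pos + a_pos_b_neg + a_neg_b_pos + both_neg
    observed = (both_pos + both_neg) / n
    a_pos = (both_pos + a_pos_b_neg) / n  # observer A's positive rate
    b_pos = (both_pos + a_neg_b_pos) / n  # observer B's positive rate
    # Chance agreement implied by the two observers' marginal rates.
    expected = a_pos * b_pos + (1 - a_pos) * (1 - b_pos)
    return observed, (observed - expected) / (1 - expected)

# Hypothetical tables, each with 90% observed agreement on 100 cases.
balanced = kappa_2x2(45, 5, 5, 45)  # half the cases positive: kappa ~0.80
skewed = kappa_2x2(5, 5, 5, 85)     # most cases negative: kappa ~0.44
print(balanced, skewed)
```

The same 90% agreement yields a kappa of about 0.80 when positive and negative cases are balanced but only about 0.44 when negative cases predominate, because the chance-agreement term rises from 0.50 to 0.82.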

We conclude that, in the emergency department, soft-copy interpretation is as reliable as hard-copy interpretation. The strength of this conclusion depends on the validity of the MDA approach and the extent to which the observer sample and case sample represent the cases seen in the emergency department. Table 4 and Figure 1 show that when cases were interpreted using hard copy, the proportion of cases called negative was slightly lower and the proportion of cases called positive was slightly higher. The differences were well within the 95% confidence limits, but even if they were real, it would take a very large study to determine whether the observed difference reflected a real difference between hard copy and soft copy or only chance variation. Gur [13] has pointed out that very small differences in the accuracy of image interpretation may have a "significant effect on the actual added value of expert image interpretation." Using the MDA in a field study eliminates the need for a verification panel and frees up expert time for case interpretation. A larger study can be done using the added observers without an increase in total effort. The MDA methodology is useful for comparing imaging systems in situations in which independent proof is not available.


APPENDIX: Example of Question on Response Form


References

  1. Eng J, Mysko WK, Weller GER, et al. Interpretation of emergency department radiographs: a comparison of emergency medicine physicians with radiologists, residents with faculty, and film with digital display. AJR 2000;175:1233-1238
  2. Slasky BS, Gur D, Good WF, et al. Receiver operating characteristic analysis of chest image interpretation with conventional, laser printed, and high-resolution workstation images. Radiology 1990;174:775-780
  3. Thaete FL, Fuhrman CR, Oliver JH, et al. Digital radiography and conventional imaging of the chest: a comparison of observer performance. AJR 1994;162:575-581
  4. Kundel HL, Gefter W, Aronchick J, et al. Relative accuracy of screen-film and computed radiography using hard and soft copy readings: a receiver operating characteristic analysis using bedside chest radiographs in a medical intensive care unit. Radiology 1997;205:859-863
  5. Revesz G, Kundel HL, Bonitatibus M. The effect of verification on the assessment of imaging techniques. Invest Radiol 1983;18:194-198
  6. Kundel HL, Polansky M. Mixture distribution and receiver operating characteristic analysis of bedside chest imaging using screen-film and computed radiography. Acad Radiol 1997;4:1-7
  7. Polansky M. Mixture distribution analysis. In: Beutel J, VanMeter R, Kundel H, eds. Handbook of imaging physics and perception. vol. 1. Bellingham, WA: Society of Photo-Optical Instrumentation Engineers, 2000:797-835
  8. Kundel HL, Polansky M. Comparing observer performance with mixture distribution analysis when there is no external gold standard. In: Kundel HL, ed. Medical imaging 1998: image perception: proceedings, vol. 3340. Bellingham, WA: Society of Photo-Optical Instrumentation Engineers, 1998:78-84
  9. Efron B. Better bootstrap confidence intervals. JASA 1987;82:171-200
  10. Fleiss JL. Statistical methods for rates and proportions. Wiley series in probability and mathematical statistics. New York: Wiley, 1981:212-232
  11. Henkelman RM, Kay I, Bronskill MJ. Receiver operating characteristic (ROC) analysis without truth. Med Decis Making 1990;10:24-29
  12. Cicchetti D, Feinstein A. High agreement but low kappa. I. The problem of two paradoxes. J Clin Epidemiol 1990;43:543-549
  13. Gur D. Operating at the diagnostic margins: image quality considerations. AJR 1993;160:1341-1342
