AJR 2000; 174:1067-1074
© American Roentgen Ray Society


1K Versus 2K Monitor

A Clinical Alternative Free-Response Receiver Operating Characteristic Study of Observer Performance Using Pulmonary Nodules

B. Graf1,2, U. Simon3, F. Eickmeyer1 and V. Fiedler1

1 Klinikum Krefeld, Institute of Diagnostic Radiology, Lutherplatz 40, 47805 Krefeld, Germany.
2 Present address: Siemens AG, Medical Engineering, Völklinger Str. 2, 40219 Düsseldorf, Germany
3 German National Research Center for Information Technology (GMD), Institute for Applied Information Technology, Schloss Birlinghoven, 53754 Sankt Augustin, Germany.

Received July 24, 1998; accepted after revision September 21, 1999.

 
B. Graf is a consultant to Siemens.

Address correspondence to B. Graf


Abstract
 
OBJECTIVE. The aim of this study was to investigate whether and how observer performance in detecting pulmonary nodules is influenced by the use of 1K and 2K monitors with and without voluntary postprocessing.

MATERIALS AND METHODS. The study was conducted with clinical digital chest radiographs of 48 patients. CT images of the same patient group served as the gold standard. Data on four different monitor conditions (1K overview, 2K overview, 1K with postprocessing, and 2K with postprocessing) were collected using a 6-point confidence-rating scale and interpreted with an alternative free-response receiver operating characteristic.

RESULTS. When magnification and window settings were applied on the 1K monitor at the expense of an increased interpretation time, observer performance with the 1K monitor was not significantly different from that with the 2K monitor. A significant difference occurred only between the 1K monitor postprocessing condition and the 1K monitor overview condition.

CONCLUSION. Considering diagnostic accuracy, the 1K monitor is sufficient for the detection of pulmonary nodules, provided that postprocessing options—especially magnification—are applied. Further comparative monitor studies on the detectability of other abnormalities (e.g., fine interstitial structures) need to be performed.


Introduction
 
The purpose of our study was to investigate whether and how the diagnostic quality in the detection of pulmonary nodules is influenced by the use of a 2K monitor (2048 x 2560 pixels) compared with a 1K monitor (1280 x 1024 pixels), both of which are high-contrast grayscale Simomed monitors (Siemens, Karlsruhe, Germany). These cathode-ray-tube monitors are increasingly used for viewing radiologic images. The development of monitors with higher spatial resolution is a result of the widespread application of digital radiography and the growing number of hospitals using picture archiving and communication systems (PACS). Especially in filmless hospitals with exclusively digital imaging techniques [1,2,3,4], soft-copy reporting has become essential for the operational efficiency of a networked radiology department. In our hospital, the question arose as to whether and how the use of the newer 2K monitor instead of the 1K monitor would increase diagnostic reliability and validity.

Because the spatial resolution of the monitor is a main perceptual factor of the diagnostic quality in soft-copy interpretation [5], our first hypothesis was that the higher resolution of the 2K monitor compared with that of the 1K monitor would change the diagnostic quality. Our second hypothesis was that the use of voluntary image postprocessing would further change the diagnostic decision quality when the perceptual factors are the main influence on the diagnostic decision. We made no assumption about the direction of change because diagnostic decisions tend to be influenced not only by perceptual factors, but also by individual cognitive reaction styles, the stress of a diagnostic situation, and the differences in user performance. Our hypotheses were tested in an alternative free-response receiver operating characteristic (ROC) design.


Materials and Methods
 
Patient Sample (Stimulus Material)
The patients were recruited for the study with a case selection feature in our radiology information system (Simedos; Siemens, Erlangen, Germany). In the first run, the database was searched for all patients who underwent a CT examination of the thorax from May 1996, when the digital luminescence radiography unit was established, to May 1997, the time of selection. From this patient sample (n = 162), only those patients who underwent a digital luminescence radiography chest examination (all with the same digital luminescence radiography unit) within a maximum of 2 weeks before or after the CT examination and who had pulmonary nodules were selected. The records of 130 patients were searched for patients with and without pulmonary nodules by three experienced radiologists who did not participate in the study. A sample of 59 patients (12 patients without pulmonary nodules and 47 patients with pulmonary nodules) was selected. According to the CT images and the digital radiographs, all pulmonary nodules that were masked by other pulmonary parenchymal abnormalities such as ground-glass opacities, consolidation, and atelectasis were excluded by the same three radiologists, resulting in a sample size of 36 patients with 68 pulmonary nodules. Calcified scars and hilar and pleural nodules were excluded. The CT images and digital radiographs of the patients without pulmonary nodules were reviewed twice. Only four of 12 patients without nodules had no abnormalities at all.

In the final patient sample group, the chest radiographs of 36 patients (17 women, 19 men) revealed at least one pulmonary nodule, and the chest radiographs of 12 patients (eight women, four men) did not indicate any pulmonary nodules. For all patients, CT scans of the chest served as the gold standard. The patients with nodules were 36-77 years old (mean, 61 ± 10 years), and the patients without nodules were 34-80 years old (mean, 60 ± 14 years). The period between CT and digital luminescence radiography examination was 0-14 days (mean, 4 ± 4 days) for the patients with nodules and 0-13 days (mean, 4 ± 4 days) for the patients without nodules. The indications for the CT examinations for all patients are shown in Figure 1. Figure 2 shows a histogram of nodule diameter for the 68 pulmonary nodules seen on the CT scans. Pulmonary nodules were caused by bronchial carcinoma (n = 8), metastasis (n = 36), inflammatory infiltration (n = 6), lymphoma (n = 6), scar (n = 11), and histiocytoma (n = 1). The period between CT and digital luminescence radiography examination of inflammatory infiltrations was 3 days; no substantial evolution of the inflammatory process occurred during this time. The localization of the nodules was registered schematically by drawing them in standardized lung diagrams for every patient record.



 
Fig. 1. —Bar chart shows prevalence of indications for CT examinations of patients with and without pulmonary nodules. Note that because the difference between sample size of both patient groups was approximately 75%, statistical testing of similarity between two prevalence distributions for samples was not conducted. Light shading represents number of patients with nodules. Dark shading represents number of patients without nodules.

 


 
Fig. 2. —Histogram shows diameter of pulmonary nodules revealed by CT. Note that in total 68 pulmonary nodules were evident on CT scans with diameter of 3-35 mm, mean diameter of 12 mm, and median value of 10 mm. SD = 7.5 mm.

 

For all patients, the set of digital radiographs consisted of a posteroanterior and a lateral image. The digital luminescence radiographs were obtained with 125 kVp, 200-cm focus-detector distance, and a 12:1 grid (40 lines per centimeter, focused to 180 cm, 2.5 mm Al filter) on a Digiscan 2H digital imaging plate system (Siemens, Erlangen, Germany) with ST-V screens and a sensitivity class of 200. According to the body size of each patient, a screen format of 35 x 35 cm with 1760 x 1760 x 10 bits per pixel for smaller patients or 35 x 43 cm with 1760 x 2144 x 10 bits per pixel for larger patients was applied. The radiographs had a pixel size of 0.2 mm and spatial resolution of 2.5 line pairs/mm.
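
The quoted pixel size and spatial resolution follow from the plate format and the image matrix. The short Python sketch below reproduces that arithmetic, assuming the resolution limit is taken as the Nyquist frequency of the sampling grid (one line pair needs two pixels). The function names are illustrative and do not come from the study.

```python
def pixel_pitch_mm(plate_width_cm: float, matrix_width_px: int) -> float:
    """Pixel size in mm for a plate of the given width sampled with the given matrix."""
    return plate_width_cm * 10.0 / matrix_width_px


def nyquist_lp_per_mm(pixel_mm: float) -> float:
    """Highest representable spatial frequency: one line pair spans two pixels."""
    return 1.0 / (2.0 * pixel_mm)


# 35-cm plate width sampled with a 1760-pixel matrix (values from the text).
pitch = pixel_pitch_mm(35.0, 1760)
# Use the nominal 0.2-mm pixel size quoted in the text for the resolution limit.
limit = nyquist_lp_per_mm(round(pitch, 1))

print(f"pixel size ~ {pitch:.3f} mm, resolution limit ~ {limit:.1f} line pairs/mm")
```

With the nominal 0.2-mm pixel, the limit is 2.5 line pairs/mm, which agrees with the value stated for the radiographs.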

Compared Monitors
In our radiology department, 1K monitors (high-contrast grayscale Simomed monitors, type: SMM 2183 L; Siemens, Karlsruhe, Germany) are used for soft-copy reporting. In this study, the conventional 1K monitor was compared with a newly developed high-resolution 2K monitor (high-contrast gray-scale Simomed monitor, type: SMM 21190 P; Siemens, Karlsruhe, Germany). The 1K monitor was connected to a SUN Sparc 5 computer (TRITEC Electronic, Mainz, Germany) and the 2K monitor to a SUN Ultra Sparc 1 computer (TRITEC Electronic). On the 2K monitor, a later software version was implemented with a user interface slightly different from that of the 1K monitor. Table 1 summarizes the relevant monitor parameters. The different software versions on the 1K and 2K monitors did not affect the study because the observers applied only a few identical software functions. The button design was composed of text on the 1K monitor and icons on the 2K monitor.


 
TABLE 1 Monitor Specifications

 

Brightness of both monitors was adjusted to 0.06 foot-lamberts for the dark control field and 73 foot-lamberts for the bright control field. A monitor test pattern (standard RP-133) of the Society of Motion Picture and Television Engineers (New York, NY) was used for the adjustment of brightness. The brightness values were regularly controlled using a Mavo-Monitor digital instrument (Gossen, Erlangen, Germany). The ambient room lighting was dimmed during image viewing.

On both monitors, digital luminescence radiographs were presented in two different ways: overview images (a) without postprocessing and (b) with voluntary postprocessing (double magnification and window-level adjustment). In the overview condition, the digital luminescence radiographs were displayed in full resolution on the 2K monitor and at half the acquisition size on the 1K monitor. The maximum size of the overview image segment was 1024 x 1024 pixels on the 1K monitor and 2048 x 2048 pixels on the 2K monitor (area on both monitors, 300 x 300 mm). The rest of the display was used for the application menu. The two digital luminescence radiography image formats (quadratic, 1760 x 1760 pixels; rectangular, 1760 x 2144 pixels) were displayed as follows under the overview condition. The quadratic radiographs were assigned areas of 880 x 880 pixels (half acquisition size, factor 0.5) on the 1K monitor and 1760 x 1760 pixels (full resolution, factor 1) on the 2K monitor. The rectangular images were visualized with a slightly different display scale factor on both monitors: they were fit into the quadratic segment and visualized at an area of 841 x 1024 pixels (factor 0.48) on the 1K monitor and 1682 x 2048 pixels (factor 0.96) on the 2K monitor.
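
The display scale factors above can be checked with a few lines of Python. This is only a sketch of the arithmetic: it assumes the rectangular images were scaled uniformly until their longer side filled the square segment, and the helper names are hypothetical.

```python
def fit_factor(width_px: int, height_px: int, segment_px: int) -> float:
    """Largest uniform scale factor at which the image fits a square segment."""
    return min(segment_px / width_px, segment_px / height_px)


def displayed_size(width_px: int, height_px: int, factor: float) -> tuple:
    """Displayed image size in pixels after uniform scaling."""
    return round(width_px * factor), round(height_px * factor)


# Quadratic radiographs: fixed factors reported in the text.
quad_1k = displayed_size(1760, 1760, 0.5)   # half acquisition size
quad_2k = displayed_size(1760, 1760, 1.0)   # full resolution

# Rectangular radiographs: fit into the 1024- or 2048-pixel square segment.
rect_factor_1k = fit_factor(1760, 2144, 1024)
rect_factor_2k = fit_factor(1760, 2144, 2048)
rect_1k = displayed_size(1760, 2144, rect_factor_1k)

print(quad_1k, quad_2k)
print(round(rect_factor_1k, 2), rect_1k)
print(round(rect_factor_2k, 2))
```

Under this assumption the 1K values come out as a factor of about 0.48 and an area of 841 x 1024 pixels, and the 2K factor rounds to 0.96, in line with the figures quoted in the text.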

Study Design
Six certified radiologists with a minimum of 9 years' experience in interpreting chest radiographs participated in the study. Eight viewing sessions per observer were separated by a relatively short interval of 0-8 days (mean, 2 ± 2 days) because the study had to be performed in a period of only 7 weeks. Image repetition occurred at an interval of 1-12 days (mean, 5 ± 3 days). The examination period was limited because the 2K monitor was on loan from Siemens.

The sessions took place in our regular interpretation room. Before the sessions, each observer was given written instructions. Each of the radiologists had at least 2 years of experience with soft-copy interpretation.

All radiographs of the 48 patients (68 nodules) were presented under four different monitor conditions: 1K and 2K monitors each with and without voluntary postprocessing (magnification and window-level adjustment). The monitor types could not be anonymous because of the different display formats (1K monitor, landscape; 2K monitor, portrait). Furthermore, the 2K monitor was easy to identify with the naked eye.

All digital luminescence radiographs were made anonymous, and the image name was hidden during the interpretation sessions. For every observer, the cases were randomly divided into four subsets of 12 images each. In every session, each observer viewed one subset on the 1K monitor and one subset on the 2K monitor, both either with or without voluntary postprocessing. A specific image was reviewed only once per session. The sessions were interchanged in such a manner that three observers started with the 1K monitor and the other three observers started with the 2K monitor. Image presentation order was designed to offset any practice effects.
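
The per-observer division of the 48 cases into four subsets of 12 can be illustrated as follows. This is a hypothetical sketch only: the study does not describe its randomization procedure, and the case numbering, seeds, and function name are assumptions made for the example.

```python
import random


def split_cases(case_ids, n_subsets=4, seed=None):
    """Randomly partition the cases into equally sized subsets."""
    rng = random.Random(seed)
    shuffled = list(case_ids)
    rng.shuffle(shuffled)
    size = len(shuffled) // n_subsets
    return [shuffled[i * size:(i + 1) * size] for i in range(n_subsets)]


# One independent partition per observer (six observers, cases numbered 1-48).
# Using the observer number as the seed is an assumption for reproducibility.
subsets_by_observer = {
    observer: split_cases(range(1, 49), n_subsets=4, seed=observer)
    for observer in range(1, 7)
}

for observer, subsets in subsets_by_observer.items():
    print(observer, [len(s) for s in subsets])
```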

The observers were asked to detect every pulmonary nodule and subjectively estimate the nodule's suspected presence on a six-level confidence-rating scale: 1 = no, 2 = weak, 3 = possible, 4 = probable, 5 = strong, and 6 = unequivocal suspected presence of a pulmonary nodule. Furthermore, observers specified the location and estimated the diameter of the nodules. All specifications were assessed by the chairperson of the session in a standardized lung diagram with the help of anatomic landmarks such as the chest and bone structures. In all sessions, the observer could page forward and backward between posteroanterior and lateral radiographs. When image interpretation was finished for one image and interpretation time was stopped, the nodule diameter was measured by the observer. The observers were given unlimited interpretation time. Interpretation time was recorded with the digital clock displayed on the monitor. Timing began as soon as the image appeared on the screen and ended when the observer had completed the diagnostic evaluation.

In the viewing sessions with postprocessing, window settings and magnification could be used by the observers. On both monitors, the observers chose between two different magnifier functions: "magic glass" and "magnify." With the magic glass function, the enlarged part of the image is precisely one quarter of the image segment. Within this image detail, the original image is displayed magnified by two. Using the magnify function, the whole image is displayed with an enlarged matrix (double magnification). Thus, only a part of the image is displayed in the image segment. No other option (e.g., unsharp masking) was permitted. Generally, in our department, window levels (width and center) of chest radiographs are adjusted to width = 820, center = 613 for the posteroanterior radiographs and width = 604, center = 651 for the lateral radiographs. For some radiographs, these values were changed and then archived by the radiologist during initial reporting. For all images, the window values were not reset before our retrospective study. The archived window values served as base values. In four of 48 patients, the lateral image was not optimally adjusted because window-level changes were not saved by the radiologists during initial interpretation. The settings for the monitors and the interpretation environment were standardized for all sessions.

Statistical Analysis
Pretest (alternative free-response ROC).—As previous studies on the threshold detectability of pulmonary nodules have shown, the visibility limit of soft-tissue nodules in the lungs on conventional radiographs is a diameter of 3 mm [6]. With a nodule size of 8-10 mm, observers could separate true cancer from noise that mimics the appearance of cancer on radiography. The results of hard- and soft-copy interpretation did not differ [7]. In addition, statistical and psychologic item test theory requires evaluating whether a set of stimulus items (here, pulmonary nodules) adequately represents the abilities of the test subjects [8]; this calls for a broad spectrum of items in the middle degrees of difficulty. Therefore, we conducted an alternative free-response ROC testing for the complete sample of 68 lung nodules, including nodules with a diameter of 3-6 mm, as a statistical pretest to see whether our set of stimuli (pulmonary nodules) is adequate to represent our observers' abilities and is in concordance with the experimental results.

ROC analysis and alternative free-response ROC analysis.—ROC and alternative free-response ROC analyses are methods originally developed in psychophysics. The main advantage of ROC and alternative free-response ROC analysis compared with other psychophysic scaling methods is the separation between the sensory threshold function and the cognitive decision bias with the four classes of possible answers (true-positive, true-negative and false-positive, false-negative) and with the assumption of two different probability distributions for positive and negative reactions [9]. The value of a simple one-dimensional ROC testing for evaluating diagnostic competence has been proven in several clinical studies about medical diagnostic quality control and management [10,11,12], but it neither requires correct localization of nodules nor allows for multiple abnormalities to be present in the same image.

Correct localization of nodules and multiple abnormalities present in the same image would be possible with a multidimensional ROC analysis, but this would raise mathematic problems of curve estimation, consistency, and separation. The free-response ROC analysis [13] and, as a specific form of it, the alternative free-response ROC analysis [14, 15] have been developed as approximate solutions. Both analyses allow an arbitrary number of nodules per image. Observers indicate the confidence levels and the locations of all perceived nodules. Alternative free-response ROC, as a variation of free-response ROC, differs only in how images and detected stimuli are weighted in scoring. Instead of false-positive detections, false-positive images are counted. Therefore, only the highest confidence false-positive decision per image is included regardless of how many lower confidence false-positive decisions are made on images with and without nodules.
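
The scoring rule just described can be sketched in Python. The data layout and names below are hypothetical, and two points are assumptions rather than details taken from the study: a nodule that was not marked is given the lowest rating (1 = no), and every image contributes exactly one negative case, which is its highest-rated false-positive mark or rating 1 if it has none.

```python
NO_FINDING = 1  # lowest level of the six-point scale ("no" suspected presence)


def afroc_cases(images):
    """Convert per-image marks into positive and negative rating lists.

    Each image is a dict with:
      'nodule_ratings'         - one rating per true nodule (1 if it was missed)
      'false_positive_ratings' - ratings of marks that hit no true nodule
    """
    positives, negatives = [], []
    for image in images:
        # Every true nodule is one positive case.
        positives.extend(image["nodule_ratings"])
        # Only the highest-confidence false positive per image is kept.
        fps = image["false_positive_ratings"]
        negatives.append(max(fps) if fps else NO_FINDING)
    return positives, negatives


example_images = [
    {"nodule_ratings": [5, 1], "false_positive_ratings": [3, 2]},
    {"nodule_ratings": [], "false_positive_ratings": []},
    {"nodule_ratings": [4], "false_positive_ratings": []},
]
positives, negatives = afroc_cases(example_images)
print(positives, negatives)
```

The two rating lists have the same form as ordinary rating-scale ROC data, which is what allows a conventional curve-fitting program to be applied to them.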

In this study, the total number of positive cases was equal to the number of pulmonary nodules, and the total number of negative cases was determined by counting false-positive images and true-negative images. The standard computer program ROCFIT (Metz CE, Chicago, IL) was applied to calculate the area under the alternative free-response ROC curve (Az), which represents an index of observer performance. This software is adapted and improved from the original RSCORE II computer program (Dorfman DD, Iowa City, IA) developed by Dorfman and Alf [16]. The application of ROCFIT to alternative free-response ROC data was suggested by Chakraborty and Winter [15] and has already been used by other researchers [14,15,16,17,18,19].

The alternative free-response ROC parameters a and b for the binormal ROC distribution were averaged across the observers to produce an average alternative free-response ROC. The standard deviations (SD) were computed by the formula [20]:

$$\mathrm{SD} = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N}\left(A_{z,i} - A_{z,\mathrm{av}}\right)^{2}}$$

where Az,av is the average area under the curve of a monitor condition, Az,i is the area under the curve for one observer, and N is the number of observers. The alternative free-response ROC was calculated, and statistically significant differences both between monitor conditions and between observers were determined by using a two-tailed Student's t test for paired observations (α = 0.05).
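
As a worked illustration of these statistics, the sketch below applies the sample standard deviation (N - 1 in the denominator, an assumption about the form used) and a paired t statistic to the per-observer Az values printed in the captions of Figures 4 and 6. It uses only the Python standard library; the critical value 2.571 is the tabulated two-tailed Student's t value for α = 0.05 and 5 degrees of freedom. It is not the authors' own computation.

```python
import math
import statistics

# Per-observer areas under the curve (observers 1-6) from the figure captions.
az_1k_overview = [0.6050, 0.5839, 0.7211, 0.6768, 0.7159, 0.8190]
az_1k_postproc = [0.6816, 0.5971, 0.7963, 0.7281, 0.7278, 0.8190]


def paired_t(x, y):
    """Paired Student's t statistic and degrees of freedom for y - x."""
    diffs = [b - a for a, b in zip(x, y)]
    n = len(diffs)
    standard_error = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / standard_error, n - 1


# Sample SD across observers for one monitor condition.
sd_1k_overview = statistics.stdev(az_1k_overview)

t_stat, dof = paired_t(az_1k_overview, az_1k_postproc)
T_CRIT_DF5 = 2.571  # two-tailed critical value, alpha = 0.05, 5 degrees of freedom

print(f"SD = {sd_1k_overview:.3f}, t = {t_stat:.2f}, df = {dof}")
print("significant at alpha = 0.05:", abs(t_stat) > T_CRIT_DF5)
```

With these inputs the SD is about 0.086 and the t statistic exceeds the critical value, which is consistent with the significant difference reported between the 1K overview and 1K postprocessing conditions.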

Further analyses.—To test for further differences in observer performance, we evaluated interpretation times with two-tailed Student's t tests for paired observations (α = 0.05) between the four monitor conditions (six pairs) and between the six observers (15 pairs). We also calculated sensitivity and specificity with equally weighted positive responses [21] for the two sample sizes (n = 45 nodules, n = 68 nodules). For the sample of nodules with a diameter of 7-35 mm (n = 45), we tested differences in sensitivity and specificity between monitor conditions and between observers for statistical significance with a two-tailed Student's t test for paired observations (α = 0.05). Finally, the application of the postprocessing options was investigated.


Results
 
Pretest (Alternative Free-Response ROC)
The mean area under the curve value of all observers was 0.55 (SD = 0.090) for the 1K monitor without postprocessing and 0.60 (SD = 0.095) for the 1K monitor with postprocessing. The mean area under the curve value of all observers was 0.60 (SD = 0.114) for the 2K monitor without postprocessing and 0.60 (SD = 0.119) with postprocessing. In the results for each observer, small (two 3-mm, seven 4-mm, 13 5-mm, and one 6-mm) nodules were detected in only 10% of the images under every monitor condition; therefore, sufficient differentiation between the efficacy of different monitor conditions cannot be determined. The mean across all observers for the small nodules (3-6 mm diameter) was 1K overview, 1.2 ± 0.79; 2K overview, 1.2 ± 0.74; 1K with postprocessing, 1.3 ± 0.93; and 2K with postprocessing, 1.3 ± 0.96.

Alternative Free-Response ROC Analysis
After the statistical pretest for n = 68 nodules with an alternative free-response ROC analysis, 23 nodules with a diameter of 3-6 mm were excluded from the alternative free-response ROC analysis because of their poor detectability. According to Oestmann and Galanski [20] and the statistical test item theory, the area under the curve values with a range of 0.65-0.90 are desirable.

An average alternative free-response ROC curve was determined for each monitor condition (Fig. 3). For the four monitor conditions and the six observers, alternative free-response ROC curves were calculated (Figs. 4,5,6,7).



 
Fig. 3. —Graph shows alternative free-response receiver operating characteristic (ROC) curves of every monitor condition averaged for all six observers. Az = area under alternative free-response ROC curve. a and b represent vertical intercept and slope, respectively, of alternative free-response ROC curve when it is plotted on normal-deviate axes. Note that with regard to area under average alternative free-response ROC curve, 1K monitor with postprocessing (0.73 ± 0.081) and 2K monitor with (0.75 ± 0.101) and without (0.75 ± 0.109) postprocessing have equivalent results in detecting pulmonary nodules in digital chest radiographs in clinical interpretation environment. 1K monitor without postprocessing performed poorly (0.69 ± 0.086). 1K overview, a = 0.65, b = 0.81, Az = 0.6931. 2K overview, a = 0.86, b = 0.76, Az = 0.7524. 1K with postprocessing, a = 0.81, b = 0.84, Az = 0.7330. 2K with postprocessing, a = 0.85, b = 0.75, Az = 0.7504.
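
For a binormal ROC curve with intercept a and slope b on normal-deviate axes, the area under the curve is given by the standard relation Az = Φ(a / √(1 + b²)), where Φ is the standard normal cumulative distribution function. The Python sketch below applies this relation to the averaged parameters listed in the Figure 3 caption. The function names are illustrative, and small deviations from the printed Az values are to be expected because a and b are rounded to two decimals in the caption.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def binormal_az(a: float, b: float) -> float:
    """Area under a binormal ROC curve with intercept a and slope b."""
    return normal_cdf(a / math.sqrt(1.0 + b * b))


# (a, b, reported Az) for each monitor condition, from the Figure 3 caption.
reported = {
    "1K overview": (0.65, 0.81, 0.6931),
    "2K overview": (0.86, 0.76, 0.7524),
    "1K with postprocessing": (0.81, 0.84, 0.7330),
    "2K with postprocessing": (0.85, 0.75, 0.7504),
}

for condition, (a, b, az) in reported.items():
    print(f"{condition}: computed {binormal_az(a, b):.4f}, reported {az:.4f}")
```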

 


 
Fig. 4. —Graph shows alternative free-response receiver operating characteristic curves for all six observers on 1K monitor overview without postprocessing. Note high interobserver variability but consistency of accuracy in performance of each observer under different viewing conditions. Observer 2 (minimum area under curve, 0.58) and observer 6 (maximum area under curve, 0.82) stood out among six observers. Observer 1, a = 0.38, b = 1.00, Az = 0.6050. Observer 2, a = 0.24, b = 0.52, Az = 0.5839. Observer 3, a = 0.76, b = 0.81, Az = 0.7211. Observer 4, a = 0.67, b = 1.06, Az = 0.6768. Observer 5, a = 0.70, b = 0.70, Az = 0.7159. Observer 6, a = 1.17, b = 0.80, Az = 0.8190.

 


 
Fig. 5. —Graph shows alternative free-response receiver operating characteristic curves for all six observers on 2K monitor overview without postprocessing. We determined ranking for observers according to calculated area under curve values for every monitor condition for which 2K monitor overview is representative example. On average, observer 6 performed best followed by observer 4, observer 5, observer 3, observer 1, and observer 2. Observer 1, a = 0.57, b = 0.96, Az = 0.6597. Observer 2, a = 0.27, b = 0.28, Az = 0.6024. Observer 3, a = 0.56, b = 0.56, Az = 0.6884. Observer 4, a = 1.15, b = 1.03, Az = 0.7882. Observer 5, a = 0.76, b = 0.55, Az = 0.7457. Observer 6, a = 1.83, b = 1.18, Az = 0.8821.

 


 
Fig. 6. —Graph shows alternative free-response receiver operating characteristic curves for all six observers on 1K monitor with voluntary postprocessing. Note that again observer 6 performed best with area under curve of 0.82 and observer 2 performed the worst with area under curve of 0.60. Observer 1, a = 0.71, b = 1.13, Az = 0.6816. Observer 2, a = 0.29, b = 0.60, Az = 0.5971. Observer 3, a = 1.07, b = 0.82, Az = 0.7963. Observer 4, a = 0.83, b = 0.93, Az = 0.7281. Observer 5, a = 0.69, b = 0.54, Az = 0.7278. Observer 6, a = 1.27, b = 0.96, Az = 0.8190.

 


 
Fig. 7. —Graph shows alternative free-response ROC curves for all six observers on 2K monitor with voluntary postprocessing. Note that on 2K monitor with voluntary postprocessing, observer 6 achieved highest area under curve value (0.89) in this alternative free-response ROC analysis. Again observer 2 showed inferior performance compared with other observers (area under curve, 0.62). Observer 1, a = 0.97, b = 1.21, Az = 0.7311. Observer 2, a = 0.30, b = 0.30, Az = 0.6130. Observer 3, a = 0.30, b = 0.21, Az = 0.6150. Observer 4, a = 1.12, b = 1.20, Az = 0.7645. Observer 5, a = 0.56, b = 0.48, Az = 0.6923. Observer 6, a = 1.83, b = 1.12, Az = 0.8882.

 

Considering the six pairs of monitor conditions, only one pair showed a significant difference. On the 1K monitor with postprocessing, the average observer performance was significantly better than on the 1K monitor overview (p < 0.05) (Table 2). With regard to observer performance, two observers stood out. The results of observer 2 were significantly worse than those of observers 4, 5, and 6, and observer 6 performed significantly better than all other observers. The performances of observers 1, 3, 4, and 5 did not significantly differ (p < 0.05) (Table 2).


 
TABLE 2 Comparison of the Average Areas Under Alternative Free-Response Receiver Operating Characteristic Curve for All Monitor Condition Pairs and All Observer Pairs (Two-Tailed Paired Student's t test, α = 0.05)

 

Analysis of Interpretation Times
The average interpretation times for the observers and for the monitor conditions are shown in Table 3. Observer 2 had the longest average reaction times, ranging from 96 to 157 sec under all monitor conditions, and observer 6 had the shortest average reaction times, ranging from 31 to 64 sec. Observer 6 had significantly shorter interpretation times over all monitor conditions compared with observers 2, 3, 4, and 5, and observer 2 had significantly longer interpretation times over all monitor conditions compared with observers 1, 4, and 6. Observers 3, 4, and 5 were not significantly different. All calculations were conducted for p < 0.05.


 
TABLE 3 Average Interpretation Time (sec) per Image for All Four Monitor Conditions

 

Averaged over all observers, interpretation times differed significantly for three pairs of monitor conditions: the 1K monitor without postprocessing versus the 1K monitor with postprocessing, the 1K monitor without postprocessing versus the 2K monitor with postprocessing, and the 2K monitor without postprocessing versus the 2K monitor with postprocessing. For these pairs, the application of postprocessing functions led to a significant increase in the observers' reaction times.

In comparison with the 2K monitor without postprocessing, interpretation time increased by a mean factor of 1.5 ± 0.57 when both postprocessing options were applied on the 1K monitor. The interpretation time factors for the monitor conditions are presented in Table 4.


 
TABLE 4 Interpretation Time Factors

 

Sensitivity and Specificity
If all nodules (n = 68) are included in the analysis, an average sensitivity of 44% (SD = 3.5%) is calculated. If small nodules with a diameter smaller than 7 mm are excluded (n = 45), average sensitivity increases to 61% (SD = 4.6%). Average specificity for both sample sizes amounts to 68% (SD = 16.9%).

The following values were determined for the sample of n = 45 nodules with a diameter larger than 6 mm. The differences in sensitivity and specificity between the four monitor conditions were not significant. Observers 2 (p = 0.04) and 4 (p = 0.01) had a significantly higher sensitivity than observer 6. The specificity values averaged over the four monitor conditions reflect the alternative free-response ROC results, indicating similar differences between the observers. Observer 2 had the worst average specificity (38%) while observer 6 achieved the best average specificity (91%). The other observers achieved an average specificity of 63% (observer 1), 70% (observer 3), 69% (observer 4), and 75% (observer 5). Observer 2 achieved a significantly lower specificity than all other observers and observer 6 reached a significantly higher specificity than all other observers (p < 0.05). The specificity of observer 1 was significantly lower than the specificity of observer 4 (p = 0.014) and observer 5 (p = 0.043).

Application of Postprocessing Options
The application of the postprocessing options (window-level adjustment or magnification or both) was averaged for all observers and cases, and the results are presented in Figure 8. Three observers (observers 1, 3, and 5) applied the magic glass on both monitor types. The other three observers preferred the magnify option but used it only on the 1K monitor because it is too slow on the 2K monitor, which has a larger number of pixels. These observers applied the magic glass on the 2K monitor. When the lateral image was not optimally adjusted (4/48), all observers applied window-level adjustment on the 1K monitor for all views and on the 2K monitor for 83% of the cases (average across the observers). Differences were detected between the observers. Observer 3 used both postprocessing options for all images. Observer 6 used neither magnification nor window settings in considerably more cases than the other observers.



Fig. 8. Bar chart shows application of postprocessing options averaged for all observers and patients. Note that in just over half of cases observers applied both options, and only in few cases (1K: 8%, 2K: 13%) did observers not apply postprocessing. For all 48 patients, application frequency of window setting and magnification was investigated for all observers. Student's t test did not indicate statistically significant differences in use of window setting (p = 0.14) and magnification (p = 0.24) between 1K and 2K monitors. Light shading = 1K monitor with postprocessing. Dark shading = 2K monitor with postprocessing.

 


Discussion
Discovering which monitor type is associated with the best observer performance is of general interest now that monitor reporting is an accepted alternative to radiograph reporting. With regard to the different monitor conditions under investigation, the 1K monitor without postprocessing performed poorly in this alternative free-response ROC study. Only one significant difference was detected, between the 1K monitor without postprocessing and the 1K monitor with postprocessing (p = 0.04). A difference in observer performance that approached but did not reach significance (p = 0.09) was found between the 1K monitor overview and the 2K monitor overview. No other significant differences were found. The results of this study indicate that the same observer performance might be achieved on the 2K monitor without postprocessing as on the 1K monitor with postprocessing.

Observer performance is influenced by the spatial resolution of a digital system. With double magnification on the 1K monitor, the same spatial resolution is achieved as on the 2K monitor without magnification; therefore, these two monitor conditions were expected to show similar results. The 2K monitor with postprocessing does not yield higher observer performance because double magnification on the 2K monitor provides no real gain in information: the digital radiographs have a spatial resolution of 2.5 line pairs/mm (pixel matrix: 1760 x 1760 or 1760 x 2144), which is already displayed in full resolution on the 2K monitor without postprocessing.

These findings are also supported by MacMahon et al. [5], who showed that the improvement in observer performance gained by increasing spatial resolution is limited below a certain pixel size. Theirs was one of the first studies of the effect of pixel size on observer performance in the interpretation of chest radiographs, which had been digitized with a pixel size of 1.0, 0.5, 0.2, and 0.1 mm and printed on film. Compared with a pixel size of 1.0 mm, a significant improvement was obtained with a pixel size of 0.5 mm, and even further improvement was noted with a pixel size of 0.2 mm. The advantage of a pixel size of 0.1 mm was less apparent. Based on a field of view of 40 x 40 cm for chest radiographs, a 2048 x 2048 matrix would yield an effective pixel size of 0.2 mm. We suggest that this may prove to be a satisfactory compromise for digital chest radiography. The use of a 1024 x 1024 or coarser matrix seems likely to result in lower diagnostic accuracy.
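The pixel-size arithmetic in this passage can be illustrated with a short sketch. It divides the 40 cm field of view by the matrix size to obtain the effective pixel size, and then converts that to a limiting spatial frequency using the standard sampling (Nyquist) relation of one line pair per two pixels. The Nyquist conversion is an assumption added here for illustration; the text does not state how the 2.5 line pairs/mm figure was derived. The function names are hypothetical.

```python
def pixel_size_mm(field_of_view_mm, matrix_size):
    """Effective pixel size when a square field of view is sampled on a square matrix."""
    return field_of_view_mm / matrix_size


def nyquist_lp_per_mm(pixel_mm):
    """Limiting spatial frequency in line pairs/mm (one line pair spans two pixels).

    Assumption for illustration: the display/sampling grid is the only limit.
    """
    return 1.0 / (2.0 * pixel_mm)


FIELD_OF_VIEW_MM = 400.0  # 40 x 40 cm chest radiograph, as stated in the text

for matrix in (1024, 2048):
    size = pixel_size_mm(FIELD_OF_VIEW_MM, matrix)
    limit = nyquist_lp_per_mm(size)
    print(f"{matrix} x {matrix}: pixel size {size:.3f} mm, "
          f"about {limit:.2f} line pairs/mm")
```

Under these assumptions the 2048 matrix gives a pixel size just under 0.2 mm and a limit close to the 2.5 line pairs/mm quoted for the digital radiographs, whereas the 1024 matrix gives roughly half that resolution, which is in line with the argument that double magnification on the 1K monitor is needed to match the 2K overview.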

Among our six observers, high interobserver variability was detected under all conditions, but each observer's accuracy was consistent across the different viewing conditions. Observers 2 and 6 stood out among the six observers. Under all four monitor conditions, the maximum area under the curve was calculated for observer 6 and the minimum for observer 2 (Figs. 4-7). These results are also reflected in the specificity calculation, in which observer 2 had a high false-positive rate of 62% and observer 6 had a low false-positive rate of 9%. On the other hand, observer 2 achieved a significantly higher sensitivity than observer 6 (p = 0.04). Overall sensitivity was poor (61%), even after exclusion of nodules with a diameter smaller than 7 mm.

Analysis of the interpretation (reaction) times points to several influencing factors. The first is the monitor condition in conjunction with the use of image processing functions. As expected, interpretation times were significantly longer when image processing functions were used on the 1K and 2K monitors than when the same monitors were used without them. Average interpretation time increased by a factor of 1.5 in the sessions with postprocessing compared with image interpretation without postprocessing on both monitor types, and observer 6, the observer with the shortest interpretation time, used neither magnification nor window settings in considerably more cases than the other observers.

Second, reaction times differed between observers, independent of the use of image processing functions or the type of monitor used. Although observer 3 always used postprocessing functions, his reaction times did not differ significantly from those of observers 1, 4, and 5, indicating that the use of postprocessing functions is not the main determinant of interpretation time. Also, the range of mean interpretation times for each observer under the different monitor conditions was relatively stable. This assumption is supported by the observation that the two observers with the most conspicuous interpretation times, observers 2 and 6, also had the diagnostic decision styles that differed most from those of the other observers, according to the results of both the area under the curve analysis and the calculation of specificity. Reaction times are indicative of individual differences in processing information and intellectual ability; our results may indicate some cognitive bias in the diagnostic decisions of our observers. Further investigation is needed here.

There are five limitations to our study that should be discussed. First, the chest radiographs were selected during a 1-year period from one digital luminescence radiography unit in our radiology department. Only a limited number of patients were available who had also undergone a CT examination (gold standard) within 0-14 days of the digital luminescence radiography examination. To obtain the maximum number of patients for the study, we included nodules whose location made them hard to detect because they lay in the vicinity of anatomic structures: vascular pattern, heart shadow, and diaphragm in the lower quadrants, and bony structures in the upper quadrants. If such nodules had been excluded from the analysis, better ROC results might have been achieved by our observers.

Second, because magnification is the most relevant factor in comparing 1K and 2K monitors, window-level adjustment should have been allowed for both the overview image and the postprocessing image, so that the comparison was simply between voluntary magnification and nonmagnification.

Third, the study had to be performed in only 7 weeks because the 2K monitor was only available for that time period. If the study period had been longer, we could have defined longer intervals between the ROC sessions and, thus, may have been able to completely exclude practice effects.

Fourth, the overall applicability of the results of this study is somewhat limited because only one abnormality was investigated. The results are not generalizable to the routine interpretation of chest radiographs until other abnormalities (e.g., interstitial line structures, diffuse lung structures, pneumothoraces) are also evaluated.

Fifth, our lung nodule sample belonged to the clinical sample in our radiology department, so it is possible that the statistical sample distribution characteristics of the prevalences are not identical with the population distributions. Also, the distributions of gray values for the chosen lung nodules may not be characteristic for the diagnosed syndromes.

In conclusion, the results of this study indicate that no statistically significant differences in observer performance exist between 1K and 2K monitors for one specific abnormality (pulmonary nodules) provided that magnification and window settings are applied on the 1K monitor at the expense of an increased interpretation time. Considerable differences were detected among the six observers, but not among the four monitor conditions. Five of six observers subjectively preferred the 2K monitor because of its high resolution. Significant differences were not detected among the four monitor conditions with regard to sensitivity and specificity. Our results indicate that the use of 2K monitors without postprocessing could possibly expedite soft-copy interpretation in daily clinical routine.

Our initial hypothesis that different types of monitors in combination with different types of image enhancement functions will lead to statistically significant differences in diagnostic decisions could not be confirmed. Further comparative monitor studies on the detectability of other abnormalities (e.g., interstitial line structures, diffuse lung structures, pneumothoraces) still need to be performed. Cognitive and perceptual bias and their influence on diagnostic decision making need to be further investigated. Studies using 4K digital luminescence radiography screens should be performed as soon as 4K screens are widely available to test for a significant increase of diagnostic accuracy when using the 2K monitor with magnification.


Acknowledgments
 
We thank C. M. Schaefer-Prokop, M. Koschut, C. Paselk, K. D. Neubauer, H. W. Goergens, and H. J. Persicke for their participation in this study. We also thank B. Holzki, U. Bick, and E. A. Krupinski for their valuable assistance.


References

  1. Fiedler V. Do HIS, RIS and PACS increase the efficiency of interdisciplinary team work? In: Lemke HU, Vannier MW, Inamura K, eds. Computer assisted radiology and surgery. Amsterdam: Elsevier Science B.V., 1997:504-510
  2. Morin RL, Berquist TH, Rueger W. The Mayo Clinic Jacksonville electronic radiology practice. In: Kilcoyne RF, Lear JL, Rowberg AH, eds. Computer applications to assist radiology. Carlsbad, CA: Symposia Foundation, 1996:146-151
  3. Mosser HM, Paertan G, Hruby W. Clinical routine operation of a filmless radiology department: three years experience. In: Jost R, Dwyer SJ, eds. Proceedings of the Society of Photo-Optical Instrumentation Engineers: medical imaging 1995: PACS design and evaluation: engineering and clinical issues. Bellingham, WA: SPIE, 1995:321-327
  4. Tidman-Fuchs A, Christiansen F. The function of a digitized radiology department after one year in operation. In: Lemke HU, Vannier MW, Inamura K, Farman AG, eds. Computer assisted radiology. New York: Elsevier Science B.V., 1996:1035
  5. MacMahon H, Vyborny CJ, Metz CE, Doi K, Sabeti V, Solomon SL. Digital radiography of subtle pulmonary abnormalities: an ROC study of the effect of pixel size on observer performance. Radiology 1986;158:21-26
  6. Kundel HL. Predictive value and threshold detectability of lung tumors. Radiology 1981;139:25-29
  7. Hofmann-Preiss K, Reichler B, Friedel N, Seyferth W. Detectability of pulmonary coin lesions: comparative assessment of the image quality of a storage phosphor system and a conventional film screen system. Aktuelle Radiol 1993;3:152-155
  8. Lienert GA. Testaufbau und Testanalyse. Weinheim: Verlag Julius Beltz, 1961
  9. Bortz J. Lehrbuch der empirischen Forschung. Heidelberg: Springer Verlag, 1984
  10. Paertan G, Mosser H, Tekusch A, Urban M, Augustin I, Hruby W. Befundung von digitalen Intensiv- bzw. Bettlungenaufnahmen am Monitor vs. Hardcopy: eine klinische ROC-Studie. Fortschr Roentgenstr 1994;161(4):354-360
  11. Razavi M, Sayre JW, Taira RK, et al. Receiver-operating-characteristic study of chest radiographs in children: digital hard-copy film vs 2K x 2K soft-copy images. AJR 1992;158:443-448
  12. Slasky BS, Gur D, Good WF, et al. Receiver operating characteristic analysis of chest image interpretation with conventional, laser-printed, and high-resolution workstation images. Radiology 1990;174:775-780
  13. Bunch PC, Hamilton JF, Sanderson GK, Simmons AH. A free-response approach to the measurement and characterization of radiographic-observer performance. J Appl Photogr Eng 1978;4:166-171
  14. Chakraborty DP. Maximum likelihood analysis of free-response receiver operating characteristic (FROC) data. Med Phys 1989;16:561-568
  15. Chakraborty DP, Winter LHL. Free-response methodology: alternate analysis and a new observer performance experiment. Radiology 1990;174:873-881
  16. Dorfman DD, Alf E Jr. Maximum-likelihood estimation of parameters of signal-detection theory and determination of confidence intervals: rating-method data. J Math Psychol 1969;6:487-496
  17. Kundel HL, Nodine CF, Krupinski EA. Computer-displayed eye position as a visual aid to pulmonary nodule interpretation. Invest Radiol 1990;25:890-896
  18. van Gils APG, van den Berg R, Falke THM, et al. MR diagnosis of paraganglioma of the head and neck: value of contrast enhancement. AJR 1994;162:147-153
  19. Tanimoto A, Satoh Y, Yuasa Y, Jinzaki M, Hiramatsu K. Performance of Gd-EOB-DTPA and superparamagnetic iron oxide particles in the detection of primary liver cancer: a comparative study by alternative free-response receiver operating characteristic analysis. J Magn Reson Imaging 1997;7:120-124
  20. Hayrapetian A, Aberle DR, Huang HK, et al. Comparison of 2048-line digital display formats and conventional radiographs: an ROC study. AJR 1989;152:1113-1118
  21. Lienert GA. Verteilungsfreie methoden in der biostatistik. Meisenheim/Glan: Hain-Verlag, 1962
  22. Oestmann JW, Galanski M. ROC: a method to compare the diagnostic performance of imaging systems. Fortschr Roentgenstr 1989;151:89-92


This article has been cited by other articles:


C. Balassy, M. Prokop, M. Weber, J. Sailer, C. J. Herold, and C. Schaefer-Prokop
Flat-Panel Display (LCD) Versus High-Resolution Gray-Scale Display (CRT) for Chest Radiography: An Observer Preference Study
Am. J. Roentgenol., March 1, 2005; 184(3): 752 - 756.

