1 Department of Radiology, University of Pennsylvania Health Care System, 3600
Market St., Ste. 370, Philadelphia, PA 19104-2644.
2 Department of Radiology, Imaging Division, University of Pittsburgh, 300
Halket St., Ste. 4200, Pittsburgh, PA 15213-3180.
Received February 1, 2002;
accepted after revision April 3, 2002.
C. F. Nodine was partially supported by grant DAMD17-97-1-7130.
Abstract
MATERIALS AND METHODS. Six radiology trainees (experience, 302-976 cases) and three mammographers (experience, 3000-5000 cases per year) reviewed 40 test cases. Each test case was represented by two mammograms that showed different views of the same breast. Twenty breasts contained suspicious lesions, and 20 were lesion-free. An interactive computer display system with an eye-head tracker measured the timing of decisions, where visual attention was directed, and how much time was spent fixating on a region of interest for each decision. Eye position was monitored during an initial-decision phase, and decision times were measured throughout a final-decision phase during which suspicious lesions recognized initially were interpreted and localized. Performance was analyzed using localization receiver operating characteristic curves.
RESULTS. The time course of interpreting mammograms is similar to that for interpreting chest radiographs. Mammographers detected 71% of the true lesions within 25 sec, and trainees detected 46% within 40 sec. Both a fixation dwell time of 1000 msec and a high level of confidence in the decision were associated with the detection of true lesions for the mammographers but not for the trainees.
CONCLUSION. Mammographers detected most breast lesions by global recognition within 25 sec, but trainees took more time. Prolonging one's search beyond the global recognition phase yielded few new lesions and increased the risk of error.
Introduction
Ten years after the study by Christensen et al. [1], Berbaum et al. [2] reported on the time course of satisfaction of search. This study included the satisfaction of search condition for interpreting abnormalities on chest radiographs of lungs with simulated nodules and lungs without simulated nodules. Berbaum et al. used the choice-reaction time method to measure inspection time from the onset of the display until each decision for a case. When a subject signaled that a decision had been made, the display was terminated, and the clock stopped while the subject gave a decision and level of confidence. Recording the inspection time then resumed for the next decision or began for the next case. This procedure differs from that used by Christensen et al., who measured decision time from signals on a tape recording of the review session. These two methods of measuring the time course of decision making led to slightly different results because Christensen et al. measured the time searching, deciding, and responding, whereas Berbaum et al. separated searching and deciding from responding for their time measurements.
For our study, we used another method of measuring decision time that is based on a model of visual search that pre-dates both of the studies cited earlier. In 1975, Kundel and Nodine [3] studied the instantaneous (200 msec) interpretation of chest radiographs in the absence of an opportunity for visual search. Receiver operating characteristic (ROC) performance, measured as the area under the ROC curve, was found to be 0.70 for one 200-msec flash. That study was designed specifically to test the hypothesis that a visual search begins with a global response that establishes context and flags deviations from the subject's expectations of normal. The global response initiates ensuing focal searches of the regions of interest that were flagged initially.
In the present study, subjects reported abnormal findings by moving a
cursor and clicking a button on the computer mouse. This study design made
analysis of the localization receiver operating characteristic (LROC) curve
possible so that we were able to distinguish between the localization of true
lesions and false lesions. Eye-position data were recorded during the study to
help corroborate reported lesions. Localization information could also be
coordinated with eye-position data to determine how visual attention was
allocated between correct and incorrect decisions. In the past, eye-position
data have helped identify the causes of error in radiographic interpretations.
These causes of error are failure to focus on an abnormality (search error);
failure to recognize an abnormality that is fixated on for less than 1000 msec
(recognition error); and failure to apply decision criteria to characterize an
abnormality that receives prolonged (1000 msec or longer) attention
(decision-making error) [4]. Furthermore, eye-position data have been shown
to be important in relating where a reported finding was localized to whether
that region had been scrutinized, for how long, and whether a finding was
reported for that region [5, 6, 7, 8].
This information was of paramount importance in plotting the time course of
case decision outcomes in the final-decision phase of our study.
Materials and Methods
A case consisted of two mammograms of a single breast in different views. All breasts with abnormal findings except one contained a single lesion that was visible in both views. One breast had four lesions that were visible in both views. The test cases consisted of 13 breasts with masses; seven breasts with microcalcifications, one of which also had an architectural distortion; and 20 lesion-free breasts. An experienced mammographer selected the cases. The cases of malignant lesions were selected from cases with subtle mammographic findings and were biopsy-proven. The locations of the lesions were determined by the experienced mammographer after reviewing mammography assessment and biopsy reports. The cases with normal findings were selected from mammograms of patients who had negative findings at a 2-year follow-up. Informed consent was obtained, and the study was approved by the institutional review board.
Procedure
Each test case was reviewed during a trial that consisted of an
initial-decision phase and a final-decision phase. Eye position was calibrated
before each trial, and eye position was recorded throughout the
initial-decision phase using an eye-head tracker (ASL 4000 SU; Applied
Science Labs, Bedford, MA). During the initial-decision phase, subjects were
asked to evaluate two digital mammograms obtained in the craniocaudal and
mediolateral oblique views and decide first whether the images showed normal
or abnormal findings. When subjects indicated they were finished evaluating
each case, they said, "Done," and the experimenter terminated the
eye-position recording. Stopping the recording automatically opened a menu
window over the mammograms.
Subjects moved a cursor on the menu to indicate their initial decision of either normal or abnormal and to indicate their level of confidence in that decision as high, medium, or low. The decision "abnormal" was used to indicate that the breast probably contained a suspicious lesion (i.e., a category 4 or 5 lesion according to the Breast Imaging Reporting and Data System [BI-RADS] [9]).
The final-decision phase immediately followed the initial-decision phase. During the final-decision phase, subjects localized either the lesion seen initially or a newly discovered lesion or chose "Next Image" from the menu if no lesion was detected. If the subject localized a lesion, a menu automatically opened from which the subject selected lesion type (mass, microcalcifications, or architectural distortion) and a confidence-of-malignancy rating of high, medium, or low for the localized lesion. Eye position was not recorded during the final-decision phase.
Although subjects were allowed to change viewing distance, the average viewing distance in this study was 38 cm. At this viewing distance, 1 cm equals 1.5° of visual angle. The average size of the breast masses was 1 cm. The display size for a single breast image was 18.4 x 14.5 cm (i.e., a digitally cropped 8 x 10 inch image). An image took up half of the display screen. A decision was positively scored as a "hit" if it fell within 2.5° of a true lesion as marked by the experienced mammographer.
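The relation between screen distance and visual angle used above can be checked with a short calculation. The following is a minimal sketch in Python, assuming a flat screen viewed head-on at the reported average distance of 38 cm; the function names are illustrative and are not taken from the study software.

```python
import math

# Average viewing distance reported in the text; subjects could vary it.
VIEWING_DISTANCE_CM = 38.0


def cm_to_degrees(size_cm, distance_cm=VIEWING_DISTANCE_CM):
    """Visual angle in degrees subtended by an on-screen extent of size_cm."""
    return math.degrees(2 * math.atan(size_cm / (2 * distance_cm)))


def degrees_to_cm(angle_deg, distance_cm=VIEWING_DISTANCE_CM):
    """On-screen extent in cm that subtends angle_deg at the given distance."""
    return 2 * distance_cm * math.tan(math.radians(angle_deg) / 2)


def is_hit(click_xy_cm, lesion_xy_cm, tolerance_deg=2.5,
           distance_cm=VIEWING_DISTANCE_CM):
    """True if the click lies within tolerance_deg of the marked lesion."""
    separation_cm = math.dist(click_xy_cm, lesion_xy_cm)
    return cm_to_degrees(separation_cm, distance_cm) <= tolerance_deg
```

Under these assumptions, 1 cm subtends roughly 1.5° and a 2.5° tolerance corresponds to roughly 1.66 cm on the screen, which is close to the 1.65 cm quoted in the scoring rules.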
Viewing Conditions
The mammograms for the test cases were displayed on a high-resolution
21-inch (53 cm) 2560 x 2048 digital workstation (Clinton Electronics DS
5000L; Rockford, IL) in two views, craniocaudal on the left half of the screen
and mediolateral oblique on the right half of the screen. Each image was
digitized to a 50-µm pixel size using a digitizer (Lumiscan 100; Lumisys,
Sunnyvale, CA).
Instructions
Subjects were told that they could change their initial decision during the
final-decision phase, and the experimenter stressed that a decision of
abnormal should be selected only if the breast contained a suspicious lesion
(i.e., a BI-RADS category 4 or 5 lesion). During the final-decision phase, the
subject was asked to localize a suspicious lesion on the craniocaudal and
mediolateral oblique views for the cases initially called abnormal. However,
the experimenter emphasized that this procedure did not preclude localization
of newly discovered lesions during the final-decision phase on images for
cases called normal initially. Conversely, subjects could decide during the
final-decision phase that an initial decision of abnormal was an error and
that the breast was free of suspicious lesions (i.e., normal findings).
Measuring Decision Time
The computer clock timed the period from the onset of image display to the
opening of the menu for the initial decision. The clock started again at the
onset of the final-decision phase and timed each decision event (onset of
cursor click localizing a lesion) until the subject stopped reviewing the
images for a case by saying, "Next case." The time taken for the
initial-decision phase was added to the time of each individual decision in
the final-decision phase to obtain the decision time for a given decision
outcome. The times were combined because
according to our model of visual search, the initial-decision phase included
overview, flagging of regions of interest, and searching these regions before
determining the initial response.
The final-decision phase included re-searching and visually scrutinizing the regions of interest, making a decision, and responding. However, the eye-position data from the initial-decision phase and, in particular, the visual dwell time were considered relevant to lesion detection and localization data collected during the final-decision phase because we assume from the model we used [3] that initial impression, flagging of regions of interest, searching and detecting initially inspected lesions guided localization and decision making. Thus, decision time included all the steps in visual processing up to the mouse click to localize each lesion. If more than one lesion was localized, the response time for successive localizations included the response times of the preceding decisions, which introduced some uncertainty. However, we reported the decision times for each case rather than for each individual decision. Case decisions were almost always based on the first decision during the decision phase, thus minimizing timing error that results from multiple responses.
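The way the two clock readings combine can be illustrated with a short sketch. This assumes that final-phase event times are recorded as seconds elapsed since the onset of the final-decision phase, which matches the description above; the function name and the example values are hypothetical.

```python
def decision_times(initial_phase_sec, final_event_secs):
    """Return the decision time for each localization in a case.

    initial_phase_sec: seconds from image onset to the opening of the
        initial-decision menu.
    final_event_secs: seconds from the onset of the final-decision phase
        to each localizing mouse click, in the order the clicks occurred.
    """
    return [initial_phase_sec + t for t in final_event_secs]


# Hypothetical case: 12 s of initial viewing, then clicks at 3.5 s and 9 s.
example = decision_times(12.0, [3.5, 9.0])
```

Because each later click carries the time of the clicks before it, only the first entry is free of the uncertainty described above, which is one reason case-level decision times are reported.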
Analyzing Decision Outcome
The data for the initial-decision phase and final-decision phase were
analyzed separately. For analysis of the data from the initial-decision phase,
subject confidences (high, medium, or low) for overall decisions of normal or
abnormal were used to construct a 2 x 6 truth table. For analysis of the
data from the final-decision phase, LROC curves were analyzed. This analysis
required a 3 x 6 truth table to determine how many localized lesions
matched or did not match the location of a true lesion
[10]. If a localized lesion
fell within 2.5° (1.65 cm) of the location of a true lesion and was given
the highest confidence for the case, the decision was scored as true-positive.
If a localized lesion was more than 2.5° from a true lesion and was given the
highest confidence for the case, the decision was scored as false-positive for
a case with abnormal findings, sometimes referred to as a wrong lesion. In tie
cases for which confidence was equal between false-positive responses on
abnormal cases and true-positive responses, the true-positive response won for
that case. If no lesion was localized on images of a breast that contained a
lesion or lesions, the decision was scored as false-negative and was given the
highest confidence level for the case.
Lesion-free cases were scored as true-negative responses if no lesion was localized, and the decision was given the highest confidence level for the case. If lesions were localized, the lesion with the highest confidence rating for the case was scored as a false-positive response for a case with normal findings, which is what is most commonly meant by a false-positive response.
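The case-level scoring rules described in this section can be summarized in code. The sketch below is one reading of those rules, not the study's software; the `Localization` record, the outcome labels, and the assumption that each localization already carries its angular distance from the nearest true lesion are all illustrative.

```python
from dataclasses import dataclass

CONFIDENCE_RANK = {"low": 1, "medium": 2, "high": 3}


@dataclass
class Localization:
    offset_deg: float  # distance from nearest true lesion; unused if lesion-free
    confidence: str    # "low", "medium", or "high"


def score_case(has_lesion, localizations, tolerance_deg=2.5):
    """Score one case from its highest-confidence localization."""
    if not localizations:
        # Nothing marked: a miss on an abnormal case, correct on a normal one.
        return "FN" if has_lesion else "TN"
    if not has_lesion:
        return "FP-normal"
    top = max(CONFIDENCE_RANK[loc.confidence] for loc in localizations)
    best = [loc for loc in localizations
            if CONFIDENCE_RANK[loc.confidence] == top]
    # On a confidence tie, a correct localization wins for the case.
    if any(loc.offset_deg <= tolerance_deg for loc in best):
        return "TP"
    return "FP-abnormal"
```

For example, a case with a lesion and two medium-confidence marks, one at 1° and one at 6° from the lesion, is scored as a true-positive under the tie rule.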
Visual Dwell Time and Visual Attention
A 1000-msec dwell time was considered to be a significant allocation of
visual attention that typically occurs when a subject detects an object of
interest [11]. A dwell time of
this duration means that the cumulative fixation time of a group of fixations
clustering on a circumscribed area of the image reached 1000 msec.
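The cumulative dwell criterion can be sketched as follows. The text does not state the size of the circumscribed area, so the 2.5° radius used here is an assumption borrowed from the localization tolerance, and the fixation format (x and y in degrees, duration in msec) is likewise hypothetical.

```python
import math


def cumulative_dwell_ms(fixations, center, radius_deg=2.5):
    """Sum the durations of fixations that fall within radius_deg of center.

    fixations: iterable of (x_deg, y_deg, duration_ms) tuples.
    center: (x_deg, y_deg) of the region of interest.
    radius_deg: assumed cluster radius; not specified in the text.
    """
    cx, cy = center
    return sum(duration for x, y, duration in fixations
               if math.hypot(x - cx, y - cy) <= radius_deg)


def received_significant_attention(fixations, center,
                                   threshold_ms=1000, radius_deg=2.5):
    """True if cumulative dwell on the region reaches the 1000-msec criterion."""
    return cumulative_dwell_ms(fixations, center, radius_deg) >= threshold_ms
```

Note that the criterion is cumulative: three short fixations of 400, 350, and 300 msec on the same region satisfy it, whereas a single 900-msec fixation elsewhere does not.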
Statistical Analyses
Analyses of variance, regression analyses, and chi-square analyses were
completed using statistics software (Statview 5.0; SAS Institute, Cary,
NC).
Results
For the final-decision phase, we found that the mean cumulative number of case decision outcomes increased as a function of decision time for mammographers (Fig. 1). The mean number of case decision outcomes also increased as a function of decision time for trainees, but at a slower rate (Fig. 2).
A rapid rise occurred in true-positive responses for both mammographers and trainees for up to a 40-sec decision time, followed by a much slower rise (Figs. 1 and 2). The mean overall true-positive performance of trainees (mean = 9.25) was significantly below that of mammographers (mean = 14.33) when tested by analysis of variance (F(1,7) = 8.4; p < 0.05). For mammographers, false-positive responses for cases with normal findings and false-positive responses for cases with abnormal findings increased at considerably slower rates, and the false-positive rate for abnormal cases tended to level off after 40 sec. This pattern of false-positive decisions differed markedly for the trainees. At 40 sec, false-positive responses for normal cases overtook false-positive responses for abnormal cases, which began to level off. Trainees made approximately the same mean number of false-positive responses for normal cases as mammographers but only slightly more than half as many true-positive responses.
Another way of comparing the performance of mammographers and trainees is to measure positive predictive value as a function of decision time. Positive predictive value is calculated as [true-positive / (true-positive + false-positive)], where a false-positive is a false-positive response for a case with normal findings.
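As an illustration of the definition above, positive predictive value can be accumulated over decision time from a list of scored case decisions. This is a sketch under assumed inputs: the `(time, outcome)` event format and the outcome labels are hypothetical, and false-positive responses on abnormal cases are deliberately excluded, as in the definition.

```python
def positive_predictive_value(true_pos, false_pos_normal):
    """TP / (TP + FP), counting only false-positives on normal cases."""
    total = true_pos + false_pos_normal
    if total == 0:
        return None  # undefined until at least one positive decision exists
    return true_pos / total


def ppv_at(events, t_sec):
    """Cumulative PPV over all case decisions made by time t_sec.

    events: iterable of (decision_time_sec, outcome) pairs, where outcome
        is a label such as "TP", "FP-normal", or "FP-abnormal".
    """
    tp = sum(1 for time, outcome in events
             if time <= t_sec and outcome == "TP")
    fp = sum(1 for time, outcome in events
             if time <= t_sec and outcome == "FP-normal")
    return positive_predictive_value(tp, fp)
```

With made-up events such as two true-positives by 10 sec and one false-positive on a normal case at 20 sec, the cumulative value falls from 1.0 at 10 sec to 2/3 at 25 sec, which is the kind of decline with time that a positive predictive value curve captures.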
We found that positive predictive value curves for both mammographers and trainees started high and gradually leveled off (Fig. 3). Mammographer performance was always higher than trainee performance. We fit the positive predictive value data for both mammographers and trainees by least-squares linear regression in two components: an initial rapid phase of decreasing positive predictive value from 0 to 25 sec and a slower phase of decreasing positive predictive value from 30 to 100 sec. The coefficients of determination (r²) for the fits for mammographers and trainees were 0.94 and 0.75, respectively, for the rapid phases and 0.93 and 0.70, respectively, for the slow phases. The breakpoint at which the two fitted lines cross falls at a 25-sec decision time for the mammographers and at 40 sec for the trainees. Figure 3 shows the initial rapid phase (negative slope) and subsequent slow phase (relatively flat slope) of perceptual processing, with positive predictive value being significantly higher overall for experienced mammographers (mean = 0.73) than for trainees (mean = 0.57) when tested by analysis of variance (F(1,7) = 5.63; p < 0.05).
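The two-component fit and its breakpoint can be sketched with ordinary least squares. The code below is an illustration only: the data points are invented so that the lines cross at 25 sec and are not the study's measurements, and the helper names are hypothetical.

```python
def fit_line(points):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(points, slope, intercept):
    """Coefficient of determination of the fitted line."""
    mean_y = sum(y for _, y in points) / len(points)
    ss_tot = sum((y - mean_y) ** 2 for _, y in points)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in points)
    return 1 - ss_res / ss_tot


def crossover_time(rapid_fit, slow_fit):
    """Decision time at which the two fitted lines intersect."""
    (m1, b1), (m2, b2) = rapid_fit, slow_fit
    return (b2 - b1) / (m1 - m2)


# Invented PPV samples: a steep segment (0-25 sec) and a shallow one (30-100 sec).
rapid_points = [(t, 1.0 - 0.01 * t) for t in range(0, 30, 5)]
slow_points = [(t, 0.8 - 0.002 * t) for t in range(30, 110, 10)]

rapid_fit = fit_line(rapid_points)
slow_fit = fit_line(slow_points)
breakpoint_sec = crossover_time(rapid_fit, slow_fit)
```

Fitting the two time ranges separately and solving for the intersection is one simple way to locate a breakpoint; a joint piecewise regression would be an alternative, and the text does not say which procedure the statistics package applied.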
For mammographers, 85% of the high-confidence positive decisions occurred within a 25-sec decision time, and these high-confidence decisions accounted for 77% of all true-positive decisions in the 25-sec time frame. For trainees, when the rapid phase was extended to 40 sec, as suggested by the crossover point on the trainee curves, 71% of all high-confidence positive decisions occurred within that time, and these high-confidence decisions accounted for approximately 50% of all true-positive decisions. The difference between 50% for trainees and 77% for mammographers was significant (chi-square = 7.35, df = 1; p < 0.01).
Overall, eye-position data indicated that mammographers fixated for 1000 msec on 62% of the true lesions compared with trainees, who fixated for 1000 msec on only 35% of the true lesions (chi-square = 34.5, df = 1; p < 0.001). Mammographers failed to fixate on 15% of the reported true lesions compared with trainees, who failed to fixate on 36% of the reported true lesions (chi-square = 21.45, df = 1; p < 0.001). In addition, trainees failed to fixate on 42% of the false-negative decisions (misses) compared with 20% for mammographers (chi-square = 11.2, df = 1; p < 0.01).
Table 1 shows the median visual dwell time for decision outcomes for both mammographers and trainees. Mammographers looked longer than trainees at every decision-outcome category. For mammographers, false-positive responses for abnormal cases were looked at longest, whereas for trainees, false-positive responses for normal cases were looked at longest. Perhaps more interestingly, trainees spent about the same median dwell time on true-positive responses as mammographers spent on false-negative responses. The overall pattern indicates that trainees do not spend enough time on correct lesions and spend too much time on incorrect lesions. For a sample two-view test case (Fig. 4A), we show the eye-fixation pattern of a trainee (Fig. 4B) compared with that of an experienced mammographer (Fig. 4C).
Finally, overall performance accuracy as measured by an LROC curve indicated that the mean area under the LROC curve for mammographers in the final-decision phase was 0.66 compared with that for trainees, which was 0.47. The areas under the LROC curves for mammographers were significantly higher than those of the trainees when tested by analysis of variance (F(1,7) = 5.12; p = 0.05) (Fig. 5).
Discussion
The initial 25-sec rapid phase for mammographers and 40-sec rapid phase for trainees that we identified could be considered equivalent to the rapid phase of the two-component model of image perception discussed by Christensen et al. [1]. As expected, the rapid phase was longer for the trainees because they have significantly less interpretation experience than the mammographers. We hypothesize that during this rapid phase the global impression initiates image perception by a Gestalt overview: conspicuous breast abnormalities are flagged for the ensuing focal search to scrutinize and evaluate each flagged region for potential abnormalities.
Mammographers are able to take greater advantage of the global impression than are trainees, perhaps by applying a more highly tuned lesion filter as the result of extensive experience [12, 13]. The mammographers in our study recognized most of the true-positives during the rapid phase, whereas the trainees had proportionally less success but still managed to detect more true-positives than false-positives. The confidence ratings suggest that the true-positives recognized during the rapid phase were probably the more conspicuous lesions: 85% of the mammographers' true-positives and 71% of the trainees' true-positives during this period were associated with high-confidence ratings. Mammographers rated confidence of true-positives higher and confidence of false-positives on abnormal cases lower than trainees did. These results were indicated by a significant decision type by group interaction when tested by analysis of variance (F(2,312) = 4.4; p < 0.05).
Performance declined during the slow phase that followed, as both we and Christensen et al. [1] have noted. For mammographers, continuing their search yielded few new true-positive responses, but these findings still outpaced false-positive responses for normal cases and false-positive responses for abnormal cases until the search was terminated. Not so, however, for trainees: The combination of false-positive responses for normal cases plus false-positive responses for abnormal cases overtook true-positive responses beyond 25 sec, which caused overall performance to decline until search was terminated. Christensen et al. also noted that the performance of trainees reviewing chest radiographs declined about halfway through the search so that false-positive responses were as likely as true-positive responses.
The pattern of high performance observed within the first 25-40 sec in our study and in the rapid phase in the study of Christensen et al. [1] indicates the important role played by the global impression in initiating image perception and directing focal search. Once the most conspicuous lesions have been identified, a focal search-and-identify phase takes over. Because hard lesions are, by definition, ambiguous, more false-positives and more declines in performance will naturally occur. This consequence suggests that subjects should terminate their search when they no longer believe that they can make a high-confidence decision. Prolonging their search at this point has a low probability of yielding a true-positive finding. Perhaps less experienced subjects would benefit less from this strategy, but mentor-guided feedback with instructions to trust only the more confident and early decisions and to quit searching when not sure might improve decision-making skills.