AJR 2005; 185:804-812
© American Roentgen Ray Society


Original Research

Free Text Versus Structured Format: Information Transfer Efficiency of Radiology Reports

Chris L. Sistrom and Janice Honeyman-Buck

Department of Radiology, University of Florida Health Center, PO Box 100374, Gainesville, FL 32610.

Received November 1, 2004; accepted after revision December 8, 2004.

 
C. L. Sistrom was funded by the General Electric/Association of University Radiologists Research Fellowship from July 2000 through June 2003.

Address correspondence to C. L. Sistrom (sistrc{at}radiology.ufl.edu).


Abstract
OBJECTIVE. We evaluated the effect of radiology report format on the accuracy and speed with which reviewers can extract case-specific information.

MATERIALS AND METHODS. A Web-based testing mechanism was used to present radiology reports to each of 16 senior medical students and record their answers to 10 multiple-choice questions about specific medical content for each of 12 cases. For each subject, the reports were randomly assigned to be viewed in either free text or structured format. In addition to the number of correct answers for each case, we recorded the time taken for each case and an efficiency score (correctly answered questions per minute). These three outcomes were tested for differences by report format using multifactorial analysis of variance. A postexperimental questionnaire and a mediated focus group elicited subject preference as to radiology report format.

RESULTS. There were no significant differences in the three outcomes (score, time, and efficiency) between the free text and structured format conditions. The power of the experiment was sufficient to detect small differences in these outcomes by format. Subjects strongly and consistently expressed a preference for the structured version.

CONCLUSION. We assert that free text and itemized (structured) forms of radiology reports are equally efficient and accurate for transmitting case-specific interpretative content to reviewers of the document.


Background and Significance
During his inaugural address to the 2004 ARRS Meeting, the incoming president, Christopher R. B. Merritt, spoke extensively about radiology reporting [1]. He emphasized the need for using new technologies to achieve standardization and structure to facilitate clinical care, research, and compliance. In a 2005 review on the subject, Sistrom and Langlotz [2] described a framework for improved radiology reporting that articulated three attributes as targets for improvement. These were standard language, structured format, and consistent content [2]. In considering different options for report format, we believe that the choices made will affect the process of care in two distinct ways. First, during report creation, a predefined format may fundamentally alter the way that the interpreting physician thinks about the case as he or she produces the document. We call this reporting into structure. Second, during report review, the choice of different report formats may affect the way in which the reader comprehends the information. We call this reading structure.

The distinction between reporting into structure and reading structure is by no means purely academic. Current technology allows a clinical document to be produced in one format and displayed in an entirely different way. For example, one vendor of a structured computerized reporting product (eDictation) uses a sophisticated interface to allow radiologists to choose relevant findings from a large and highly structured menu of possibilities. At the same time, the software produces phrases and sentences that look just like a typical narrative report, and it is this document that is made available for referring physicians to review. The assumption is that clinicians will be more comfortable with radiology reports that look like what they are used to reading, that is, narrative free text. Our research was designed to obtain empiric data about this question. We sought to examine the clinical utility of radiology reports in terms of information transfer to the reader. Specifically, what are the effects of report format (independent of content) on the efficiency with which medical personnel can read them and obtain information needed for patient care? Our working hypothesis was that consistently formatted (structured) reports would be easier to read and comprehend, resulting in greater efficiency for the task of answering content-specific questions.



Fig. 1A Screen capture images from the Web-based testing program used to perform the experiment. Students first were presented with a clinical scenario page (A). Pressing the button labeled GO TO THE REPORT caused the second page (B) to be displayed with the report text. Pressing the button labeled GO TO THE QUESTIONS caused a third page (C) to appear. This contained the 10 questions pertaining to the report and had buttons labeled GO BACK TO THE REPORT that caused redisplay of the report.

 



Fig. 1B Report page (B) of the Web-based testing program (see Fig. 1A caption).

 

Materials and Methods
Following approval for the research by our local institutional review board (IRB), radiology reports were selected from our departmental archive of routine clinical examinations performed between January 2001 and December 2002. Four reports were selected from each of three types: sonography of the abdomen, CT of the abdomen, and head CT without contrast. All examinations had been performed on patients referred from the emergency department or outpatient clinics. We did not select the cases at random but rather looked for reports that described some relatively common abnormalities. We excluded reports that described findings that might be considered unusual, emergent, or unexpected. All 12 reports were blinded by removing patient identifying information and any reference to proper names in the text.

For each report, we generated a series of 10 multiple-choice items, each having a question stem and three to five options. These were designed to have a single option that was unambiguously correct based only on the content of the report to which they referred. The reports and candidate questions were administered to three senior medical students using a Web-based testing system (described below) that allowed them to refer back to the report text as needed. These students did not participate as subjects in the subsequent experiment. We also gave printed versions of the questions and associated reports to two faculty radiologists and two senior radiology residents and asked them to evaluate the items for clarity and consistency. Using feedback from all of these evaluators, we corrected the wording of one of the question stems and three of the options (all distracter items). We also eliminated one option (also a distracter).

The original reports were in narrative format with variable use of headings for indication, comparison, examination details, and findings. All of the original reports had a labeled impression section. These 12 reports, along with the 10 questions pertinent to each one, formed the free text condition of the experiment. A report structure shell was designed for each of the three types of studies. These consisted of the components typically found in narrative radiology reports with additional headings in the findings section. For abdominal CT and sonography, these headings were primarily anatomic. For head CT, we combined anatomic and functional headings. The templates for all three examination types are listed in Appendix 1.


APPENDIX 1 : Headings Used for Structured Format Reports

 

For each case, we parsed the free text report into the appropriate structured template. This was done so that all the original content was exactly and completely replicated in the structured version. We were careful to leave basic sentence structure and word choice intact. We duplicated or eliminated words or phrases as needed to maintain proper syntax in the structured version. For example, the free text might contain "The pancreas, spleen, and kidneys are unremarkable." The word "unremarkable" would then be placed after each of the pancreas, spleen, and kidneys headings in the structured version. If a heading in the structured template did not have any relevant content from the free text version, it was left in the report with no text after it. Appendix 2 is an example of a free text abdominal CT report, and Appendix 3 contains the structured version. With 12 unique cases, each in two versions (free text and structured format), there were 24 case versions in total.
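The parsing step can be illustrated with a small sketch. Assuming that a structured template is simply an ordered list of headings and that the text belonging to each heading has already been extracted by hand (as it was in the study), rendering the structured version amounts to walking the template in order and leaving unused headings blank. The heading names and the `render_structured` function are hypothetical illustrations; the actual templates are those listed in Appendix 1.

```python
from typing import Dict, List

# Illustrative headings only (hypothetical); the study's templates are in Appendix 1.
ABDOMINAL_CT_TEMPLATE: List[str] = [
    "Liver", "Gallbladder", "Pancreas", "Spleen", "Kidneys",
]


def render_structured(template: List[str], content: Dict[str, str]) -> str:
    """Return one 'Heading: text' line per template heading, in template order.

    Headings with no assigned content are kept with nothing after the colon,
    mirroring the study's rule of leaving empty headings in the report.
    Content assigned to a heading that is not in the template raises an error,
    so no original text can be silently dropped.
    """
    unknown = set(content) - set(template)
    if unknown:
        raise ValueError(f"content for headings not in template: {sorted(unknown)}")
    lines = []
    for heading in template:
        text = content.get(heading, "")
        lines.append(f"{heading}: {text}".rstrip())  # "Liver: " becomes "Liver:"
    return "\n".join(lines)
```

For the sentence quoted above, the hand-assigned content would be `{"Pancreas": "Unremarkable.", "Spleen": "Unremarkable.", "Kidneys": "Unremarkable."}`, and the rendered report would keep "Liver:" and "Gallbladder:" as empty headings.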


APPENDIX 2 : Free Format Report of an Abdominal CT Scan as Obtained from the Medical Record

 

APPENDIX 3 : Report from Appendix 2 Parsed into the Structured Template

 

Experimental Details
Our College of Medicine uses a locally developed Web-based testing system for all examinations administered to students. It has been in use for more than 7 years and by the time our medical students have reached their fourth year, they have taken at least 20 tests using it. Since we used the same system to administer our experimental tests, all of the subjects were quite familiar with its function and appearance. This eliminated any need to train subjects before their participation and reduced any potential variability in performance based on differential learning of the testing procedure itself.

The experimental testing software was designed to present each case as a set of three Web pages joined into a common frame set. The first page gave a brief description of a clinical presentation derived from the clinical history and the stated reason for the examination in question. A button (labeled GO TO THE REPORT) on the first page caused a second page containing the report text to be displayed. Buttons above and below the report text (labeled GO TO QUESTIONS) served to advance to the third page containing all 10 questions relating to the report. The question page also had buttons (labeled GO BACK TO THE REPORT) at the top and bottom that caused redisplay of the report text while keeping the question page active in the background, thus preserving its state. Subjects could switch back and forth between the report and questions as often as needed, and their previous answers would remain intact. Figures 1A, 1B, and 1C depict the three frames making up a single test case.



Fig. 1C Question page (C) of the Web-based testing program (see Fig. 1A caption).

 

The Web pages all had JavaScript code (Sun Microsystems) embedded in them to record a time stamp (at 1/10-sec precision) for every button click and item selection as the subject navigated through the case and answered questions. A submit button on the question page served to record the answers and the time-stamped navigation data. Submission was not accepted until all 10 questions had been answered. It is important to note that the design of the testing mechanism caused the entire set of pages comprising a single case to be transferred from the Web server to the local computer when the subject was ready to start. All navigation and recording of time stamps were accomplished locally and required no further traffic between the testing computer and the server. Thus, there was no possibility that timing during a single case might be confounded by variations in network speed. All testing was performed on a single workstation located in a quiet room. This computer had a single 18-inch flat panel color monitor. Relevant software included Windows 2000 Professional and Microsoft Internet Explorer. It was connected to our hospital's internal network.

All of the subjects of the experiment were senior medical students at our institution. They were recruited by means of fliers posted in the medical school teaching complex. The research (and the flier) had prior approval by our local IRB. Subjects were required to have taken and passed their medicine and surgery clinical clerkships before participating in the research. As an incentive, a $100 account was opened for each participant at the College of Medicine bookstore. Each subject was assigned all 12 cases to complete during a single experimental session. Six of these cases were presented with the free text version of the report and the remaining six with the structured version. The assignment of which of the 12 cases was presented to each subject as free text versus structured format was randomized, with one restriction: we balanced the number of times each case was presented in free text versus structured format across the entire experiment. The order in which subjects took their 12 cases was also randomized, with one restriction: we balanced the assignment so that each case would be presented an equal number of times in each quartile of the case order (1-3, 4-6, 7-9, 10-12). Our initial power calculations called for eight subjects, which allowed a symmetric design in both the format and case order factors. We performed a second repetition of the experiment with an additional eight subjects, resulting in a total of 192 sets of case/subject responses for analysis. During the second repetition, the same randomization scheme was used, the only difference being that the assignment of the free text or structured format version of the report was reversed. Testing of all 16 subjects (5 women and 11 men) was completed in the 2002-2003 academic year. There were no dropouts or technical failures during the experiment; each subject completed his or her 12 cases in the assigned sequence during a single session, and none reported any problems with the testing system.
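The balancing constraints above can be satisfied in several ways, and the exact randomization algorithm is not described. The following Python sketch shows one possible construction for a single replicate of eight subjects, under stated assumptions: cases are randomly relabeled, each subject's case positions are shifted by one quartile relative to the previous subject, and order is shuffled within each quartile. The function `assign_replicate` and its parameters are hypothetical and are not the authors' procedure.

```python
import random
from typing import List, Tuple

N_SUBJECTS = 8  # subjects per replicate
N_CASES = 12    # cases per subject
BLOCK = 3       # cases per order quartile (1-3, 4-6, 7-9, 10-12)


def assign_replicate(seed: int, reverse_format: bool = False) -> List[List[Tuple[int, str]]]:
    """Return, for each subject, an ordered list of (case_id, format) pairs.

    Hypothetical construction meeting the stated constraints: each subject
    sees 6 free text and 6 structured cases, each case is shown in each format
    4 times, and each case appears twice in every order quartile.
    """
    rng = random.Random(seed)
    labels = list(range(N_CASES))
    rng.shuffle(labels)  # random relabeling of case identities
    cyclic_index = {case: i for i, case in enumerate(labels)}

    schedule = []
    for s in range(N_SUBJECTS):
        # Shifting by BLOCK positions per subject moves every case one
        # quartile later for each successive subject (modulo 4 quartiles).
        slots = [labels[(k - BLOCK * s) % N_CASES] for k in range(N_CASES)]
        # Shuffling within each quartile randomizes the order
        # without disturbing the quartile balance.
        for q in range(0, N_CASES, BLOCK):
            block = slots[q:q + BLOCK]
            rng.shuffle(block)
            slots[q:q + BLOCK] = block
        row = []
        for case in slots:
            # Alternating parity gives 6 free text cases per subject and
            # 4 free text presentations per case across the 8 subjects.
            free_text = (cyclic_index[case] + s) % 2 == 0
            if reverse_format:
                free_text = not free_text
            row.append((case, "free text" if free_text else "structured"))
        schedule.append(row)
    return schedule
```

Calling `assign_replicate` with the same seed and `reverse_format=True` reproduces the same case order with every format swapped, which mirrors the reversal described for the second repetition; across both replicates each case is then shown eight times in each format.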

Following completion of the testing, 15 of the 16 subjects met with the principal investigator in a "debriefing" focus group. Before this meeting, subjects had been told only that the purpose of the experiment was to assess their ability to extract information from reports. At the beginning of the meeting, subjects were given a brief questionnaire to fill out. This asked several general questions about radiology report format and content. Next, a brief paragraph defined free text and structured format, followed by an example of each format (similar to Appendices 2 and 3). Next, a set of eight Likert-scaled items asked for their preference concerning various functional aspects of reading radiology reports (accuracy, speed, certainty, items not mentioned, positive findings, negative findings, and general preference) based on their experience with the test cases. The preference scores were anchored as follows: 1 = prefer free text, 5 = no preference, 10 = prefer structure. A mediated discussion was then conducted to elicit opinions about radiology report structure and content. Note that subjects were not given any feedback, either after they participated or during the debriefing meeting, concerning individual or aggregate performance in answering questions for the cases.


TABLE 1 : Results of Analysis of Variance for the Number of Correctly Answered Questions Per Case

 

TABLE 2 : Results of Analysis of Variance for the Number of Seconds Taken to Complete Each Case

 

Statistical Analysis
Each subject's participation generated a set of 12 experimental results. These consisted of answers to the 10 questions for a case and the time-stamped navigation data. We scored the subject's answers against a key to obtain the number correct. The start time was subtracted from the final submission time to obtain the number of seconds taken to complete each case. An efficiency score was then calculated for each case by dividing the number of questions answered correctly by the number of seconds taken to finish the entire case. This result was multiplied by 60 to give the number of correctly answered questions per minute. The time-stamped navigation activity records were processed to obtain two further outcomes for each case. The number of times the subject moved back to view the report from the question page was tabulated. This outcome could take any positive integer and will be called report views. The number of answer selections made during each case was counted as well. This outcome was at least 10 and was higher when subjects changed their minds about one or more answers. Thus, there were five outcomes analyzed: number of questions correct, time in seconds taken to complete the case, efficiency score, report views, and answer selections.
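The derivation of the five outcomes from one case's navigation record can be sketched as follows. The `Event` record, its event kinds, and the rule that only GO BACK TO THE REPORT clicks count as report views are assumptions made for illustration; the actual layout of the JavaScript log is not described.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Event:
    t: float          # time stamp in seconds (recorded at 0.1-sec precision)
    kind: str         # assumed kinds: "start", "view_report", "select", "submit"
    item: int = 0     # question number (1-10) for "select" events
    choice: str = ""  # option chosen for "select" events


def case_outcomes(events: List[Event], key: Dict[int, str]) -> Dict[str, float]:
    """Compute score, time, efficiency, report views, and answer selections."""
    start = next(e.t for e in events if e.kind == "start")
    submit = next(e.t for e in events if e.kind == "submit")
    final: Dict[int, str] = {}
    selections = 0
    views = 0
    for e in events:
        if e.kind == "select":
            selections += 1
            final[e.item] = e.choice  # a later selection replaces an earlier one
        elif e.kind == "view_report":
            views += 1  # assumed: only returns from the question page to the report
    missing = set(key) - set(final)
    if missing:
        raise ValueError(f"submission requires every question answered: {sorted(missing)}")
    score = sum(final[q] == answer for q, answer in key.items())
    seconds = submit - start
    return {
        "score": score,
        "seconds": seconds,
        "efficiency": score / seconds * 60,  # correctly answered questions per minute
        "report_views": views,
        "answer_selections": selections,
    }
```

In this sketch a subject who changes one answer before submitting registers 11 answer selections, consistent with the floor of 10 described above.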

We used the Statistical Analysis System (SAS Version 9 for Windows, SAS Institute) for all data manipulation and statistical calculation. For all tests of significance, we set p = 0.05 as the cutoff and used two-tailed alternative hypotheses. The analysis was based on a balanced incomplete block design. The factors included examination type (3 levels), report format (2 levels), and individual cases (4 levels per type). Thus, there were 24 factor level combinations (treatments), a block size of 12 (cases per subject), and eight blocks (subjects). The experiment was performed as two replicates of eight subjects each, for a total of 16 subjects and 192 experimental units. Summary statistics for the five outcomes were generated, including mean, median, mode, SD, frequency distribution plot, and normal probability plot.

We performed analysis of variance with a general linear model procedure (SAS PROC GLM) to test for differences in each of the five outcomes (number of questions correct, time taken, efficiency, report views, and answer selections) jointly related to our independent variables. We used the same linear model for each outcome. Fixed effects included report format, case type, and the order in which the case was presented to the subject. Random effects included case identity nested within case type, subject identity, and report format crossed with subject identity. The model was specified as follows:

$$Y = \beta_0 + \beta_1\,\mathrm{Format} + \beta_2\,\mathrm{CaseType} + \beta_3\,\mathrm{Order} + \beta_4\,\mathrm{Case(CaseType)} + \beta_5\,\mathrm{Subject} + \beta_6\,(\mathrm{Format}\times\mathrm{Subject}) + \varepsilon$$

where Y is the outcome for a given subject and case.


Standard F statistics using the type 3 sums of squares and appropriate error terms were used to test the coefficients (β1-β6) against the null hypothesis of no effect (βi = 0). The Duncan procedure was used to perform multiple comparisons of mean number of report views by case type [3]. Because none of the fixed effects was significant in any of the other models, no additional multiple comparisons were performed. The two outcomes relating to subjects' test-taking habits (report views and answer selections) were correlated within subjects, formats, case types, and overall to obtain Pearson correlation coefficients [4].
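A minimal sketch of the correlation step, computing Pearson's r overall and within strata, is given below. It is written in plain Python rather than SAS purely for illustration; `pearson` and `stratified_pearson` are hypothetical helpers, and significance testing of r is omitted.

```python
from collections import defaultdict
from math import sqrt
from typing import Dict, Hashable, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation of two equal-length samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    # Undefined (division by zero) if either variable is constant within the sample.
    return sxy / sqrt(sxx * syy)


def stratified_pearson(rows: List[Tuple[Hashable, float, float]]) -> Dict[Hashable, float]:
    """rows: (stratum, report_views, answer_selections); returns r per stratum."""
    groups: Dict[Hashable, Tuple[List[float], List[float]]] = defaultdict(lambda: ([], []))
    for stratum, views, selections in rows:
        groups[stratum][0].append(views)
        groups[stratum][1].append(selections)
    return {s: pearson(xs, ys) for s, (xs, ys) in groups.items()}
```

The same helper applies to each stratification used in the study (subject, report format, and case type), and calling `pearson` on all rows together gives the overall coefficient.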

Because we found no significant effect of the main variable of interest (report format), we performed a post hoc power analysis. The sample size was initially set at eight subjects each looking at 12 cases, for a total of 96 experimental units. We were able to double the planned sample size because many students responded to the request for participation and running the experiments was quite easy owing to all subjects' familiarity with the testing system. Our analysis was based on paired testing between free text and structured format with α = 0.05 and β = 0.10 (90% power). We used the root mean square error from the analysis of variance output as the estimate of sigma (σ) for each outcome.
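The minimum detectable difference can be approximated with a normal-approximation formula, sketched below as an assumption because the exact calculation used is not specified. With residual standard deviation σ and n matched free text/structured pairs, the mean paired difference has standard error σ√(2/n), so the smallest difference detectable at two-tailed α with power 1 − β is (z₁₋α/₂ + z₁₋β)·σ·√(2/n). The function name and the choice n = 96 pairs (16 subjects × 6 cases per format) are hypothetical.

```python
from statistics import NormalDist


def min_detectable_difference(sigma: float, n_pairs: int,
                              alpha: float = 0.05, power: float = 0.90) -> float:
    """Smallest true difference in format means detectable with the given power.

    Normal approximation for a paired comparison: the standard error of the
    mean paired difference is sigma * sqrt(2 / n_pairs) when sigma is the
    residual SD (root mean square error).
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-tailed critical value
    z_beta = z.inv_cdf(power)           # power = 1 - beta
    return (z_alpha + z_beta) * sigma * (2 / n_pairs) ** 0.5


# Root mean square errors reported in Results; n_pairs = 96 is an assumption.
for name, sigma in [("score", 1.32), ("time (sec)", 65), ("efficiency", 0.43)]:
    print(name, round(min_detectable_difference(sigma, 96), 2))
```

Under these assumptions the sketch gives roughly 30 sec for time and 0.2 questions per minute for efficiency, in line with the values reported in Results; for score it gives about 0.6 correct answers, somewhat above the reported value of about 0.5, so the authors' exact calculation may have differed.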

For the postexperimental survey, Likert-scaled items from the debriefing questionnaires were summarized by calculating the median value, the 10th percentile, and the 90th percentile. The general preference items were enumerated and percentages calculated. The qualitative content of the debriefing focus group was summarized from a transcription of a tape recording made during the session.
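The descriptive summaries for the questionnaire are straightforward to compute; a sketch using Python's standard library follows. Percentile definitions differ between statistical packages, so the `method="inclusive"` choice is an assumption and may not match SAS output exactly. The function names are hypothetical.

```python
from collections import Counter
from statistics import median, quantiles
from typing import Dict, List, Sequence


def summarize_likert(scores: Sequence[int]) -> Dict[str, float]:
    """Median, 10th, and 90th percentile of one Likert-scaled item."""
    deciles = quantiles(scores, n=10, method="inclusive")  # 9 cut points
    return {"median": median(scores), "p10": deciles[0], "p90": deciles[-1]}


def tally(choices: List[str]) -> Dict[str, str]:
    """Count each response and express it as 'k/n = p%' (rounded percent)."""
    n = len(choices)
    return {c: f"{k}/{n} = {round(100 * k / n)}%" for c, k in Counter(choices).items()}
```

For example, `tally(["laboratory report"] * 11 + ["newspaper story"] + ["current format"] * 3)` yields 11/15 = 73%, 1/15 = 7%, and 3/15 = 20%, matching the percentages given in Results for the report-organization question.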


Results
Experimental Results
Examination of the response data from the testing system revealed that all 16 subjects submitted valid responses to the 10 questions for each of their 12 cases and that they performed them in the assigned order. The timing data were complete and consistent with the answer responses with no gaps, extra entries, or ambiguities. Thus, there were 192 complete response sets available for analysis. These consisted of timing data and answers to all 10 questions for each of 12 cases completed by our 16 subjects. Each case was shown eight times in its free text form and eight times with the structured format. What follows are univariate statistics on each outcome for the entire data set (n = 192) along with the multifactorial analysis of variance results.

The number of correct responses (score) ranged from two to 10 with a mean of 8.35 and SD of 1.52. The distribution was negatively skewed with median and mode both being nine. None of the fixed effects (format p = 0.35, order p = 0.27, and case type p = 0.92) were significant and there was no interaction between format and subject effects on the score. The majority of the variance (60%) was partitioned between subjects and between cases within case type and R squared for the model was 0.58. The analysis of variance results are reproduced in Table 1.

The time taken to complete the cases ranged from 30 to 707 sec, with a mean of 351 and SD of 108. The distribution was not skewed (median, 341 sec), but there was some kurtosis. The single observation at 30 sec was a distinct outlier, with the next lowest value being 102 sec (first percentile). None of the fixed effects were significant (format p = 0.41, order p = 0.99, and case type p = 0.22), and there was no interaction between format and subject effects on the time. The majority of the variance (62%) was partitioned between subjects and between cases within case type, and R squared for the model was 0.71. The analysis of variance results are reproduced in Table 2.

The efficiency (number of correctly answered questions per minute) ranged from 0.52 to 4.7, with a mean of 1.56 and SD of 0.56. The distribution was nearly normal except that the positive tail was longer. This was due to the same outlier (30 sec to complete the case) mentioned above. None of the fixed effects were significant (format p = 0.92, order p = 0.60, and case type p = 0.48), and there was no interaction between format and subject effects on the efficiency. Just under half of the variance (46%) was partitioned between subjects and between cases within case type, and R squared for the model was 0.57. The analysis of variance results are reproduced in Table 3.


TABLE 3 : Results of Analysis of Variance for Efficiency of Answering Questions (Score / Time) × 60

 

The number of times that any answer was selected for each case (answer selections) ranged from 10 (obligate floor value) to 19, with a mean of 11.5, SD of 1.68, median of 11, and mode of 10. As expected, the distribution was not normal and more closely resembled a Poisson distribution. We elected to proceed with analysis of variance despite the violation of normality because this outcome was considered to be secondary and the results were relatively uninteresting. The only significant effect was between subjects; report format and case type had the same mean number across all levels.

The number of times subjects went back to look at the report text (report views) while answering questions ranged from two to 32, with a mean of 12, median of 11, and SD of 6.8. The distribution was near normal, allowing for the discrete nature of the outcome. The analysis of variance showed no difference in the mean number by report format (structured = 13.4, free text = 12.3, p = 0.93). Again, there was considerable variance between subjects (p < 0.0001), with subject means ranging from 4.7 to 25 report views per case. Interestingly, there was a significant difference between case types (p = 0.0016) even though there was no significant difference between the individual cases (p = 0.11). The mean number of times the report was consulted for abdominal CT (13.1) was essentially the same as for abdominal sonography (12.9). However, for head CT, subjects went back to look at the report an average of 11 times. We tested for interaction between case type and subject effects and found none. Therefore, the tendency to look back at head CT reports less frequently than at sonography and abdominal CT reports was shared by all subjects.

Across the entire sample of 192 cases, the correlation between report views and answer selections was weakly positive, with a Pearson coefficient of 0.22 (p = 0.002). When we examined the relationship between report views and answer selections for each subject, only three of the 16 had significant correlations. These were all positive (0.58, 0.64, 0.75) and probably accounted for the aggregate correlation. When stratified by case type and report format, the correlations between report views and answer selections were weakly positive (0.18 to 0.28), with the exception of the head CT cases, for which there was no correlation between numbers of report views and answer selections.

The head CT cases seemed to elicit a somewhat different pattern of report viewing unrelated to the number of times answers were selected. Considering that the head CT structured format was functionally organized rather than strictly by anatomy, we wanted to be sure there was no interaction between the type of case and our main independent variable of interest, the report format. When we added this interaction term to the analysis of variance models for score, time, and efficiency, we were reassured to find that it was not significant for any of the outcomes. Thus, the finding that report format had no effect on the outcomes is conclusive and holds across case type.

For the post hoc power analysis, the root mean square errors (σ) were as follows: score = 1.32, time (seconds) = 65, and efficiency = 0.43. As described above, we set α = 0.05 and power to 90%. For score, we could have detected a difference of about 0.5 in the mean of correctly answered questions. The observed mean scores were 8.28 for free text and 8.43 for structured format, a difference of 0.15. For time to complete each case, we could have detected a difference of about 30 sec. The observed mean times were 355 sec for free text and 347 sec for structured format, a difference of 8.4 sec. Finally, for efficiency of answering questions, we could have detected a difference of about 0.2 questions per minute. The observed efficiency was 1.57 questions per minute for free text and 1.55 questions per minute for structured format, a difference of 0.02 questions per minute. For all three of the main outcomes of interest, the observed effect of report format was far smaller (roughly one-third to one-tenth) than the difference our experiment was powered to detect.

Qualitative Results
The debriefing meeting was attended by 15 of 16 subjects; the absent subject, a male student, was serving in a clinical clerkship at another institution. One of the general questions asked for preference about the report organization. One option to this question was "like a laboratory report," by which we meant standardized headings in the body with results organized under these headings. The choices and numbers of responses were as follows: like a laboratory report (11/15 = 73%), like a newspaper story (1/15 = 7%), and in the current (unstructured) format (3/15 = 20%). We also asked how they would prefer to have uncertainty expressed. The choices and responses were in words (8/15 = 53%), as a semiquantitative scale (3/15 = 20%), and in quantitative terms (2/15 = 13%). The final general question asked how subjects would respond to an explicitly worded recommendation in a radiology report. The responses were as follows: the subject would be compelled to follow the recommendation (2/15 = 13%), might be compelled to follow the recommendation (10/15 = 67%), or would not feel so compelled (3/15 = 20%).

The responses to the Likert-scaled questions about preference between free text (1) and structured format (10) are summarized in Table 4 with median, mode, and range for each one. Clearly, the subjects strongly tended to prefer the structured format for the seven separately articulated domains as well as overall. During the mediated discussion, this perception was reinforced. At least five subjects clearly expressed the opinion that they would like to see all radiology reports formatted in a manner similar to our structured condition (as in Appendix 3). One participant mentioned that the headings should be consistent across all instances of a report type. He said "it might be confusing if some people include biliary system under liver and some people included it under gallbladder, for example." Others felt that the order of headings should be altered dynamically so that the abnormal findings would be at the top of the report. One potential drawback to the structured format was expressed as follows: "The overall gestalt of the examination might be lost in structure whereas with free text a sense of severity and more acuity can be conveyed." A corollary opinion that was expressed quite forcefully and frequently was that reports should still have a clearly labeled impression section. The concept of organizing a report like a newspaper story was linked to the need for an impression. Like a news story, subjects wanted to see reports have an equivalent of the lead paragraph in which findings are synthesized and condensed to allow rapid review and to highlight the diagnostic impression.



 
Table 4: Summary of Responses to Format Preference Questions

 


Discussion
The working hypothesis of the experiment was that subjects would take significantly less time to read a structured report and answer questions about its content than they would with the free text version. We were not certain that subjects would attain higher, or even equal, scores with the structured version. To allow for this, we introduced a measure of efficiency (correctly answered questions per minute). The pattern of outcomes we expected with the structured format was less time, equal or slightly lower scores (accuracy), and increased efficiency. To our surprise, there were no significant differences on any of the three measures. The study achieved, and indeed exceeded, its planned power to detect differences between free text and structure. Furthermore, all hypotheses were tested with two-tailed methods, allowing for a difference in either direction from the null hypothesis of no difference.
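The efficiency measure combines speed and accuracy into a single number, which is why it was expected to favor structure even if accuracy dipped slightly. A minimal Python sketch follows, assuming time is measured in minutes for a single case; the function name and all numbers are hypothetical.

```python
def efficiency(correct_answers: int, minutes: float) -> float:
    """Information transfer efficiency: correctly answered questions per minute."""
    if minutes <= 0:
        raise ValueError("reading/answering time must be positive")
    if correct_answers < 0:
        raise ValueError("correct_answers cannot be negative")
    return correct_answers / minutes


# Hypothetical illustration of the expected pattern: structure is faster but
# slightly less accurate, and the two effects can offset each other.
free_text = efficiency(10, 5.0)   # 10 correct in 5 min -> 2.0 per minute
structured = efficiency(9, 4.5)   # 9 correct in 4.5 min -> 2.0 per minute
print(free_text, structured)
```

Because efficiency is a ratio, a gain in speed can be cancelled by a loss in accuracy, so the three outcomes need to be examined together rather than efficiency alone.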

We assert that there is no effect of report format on speed, accuracy, or efficiency for our subject population reading the types of reports we presented to them. By extension, we suggest that the same may hold for practicing physicians. If so, designers of reporting systems may not need to invest as much effort in generating traditional narrative documents from structured elements. At the same time, free text reports appear not to be as difficult to read for content as some believe. The choice of report format and structure may therefore be based on referring physician preference, together with consideration of the effect on radiologists of reading into a structured format. However, we advise those seeking to fundamentally change the way radiology reports are created and displayed to proceed with caution, for the following reasons. In a recent article, Ash et al. [5] discussed unintended consequences of overemphasizing structured information entry in health care informatics. They cite evidence from studies in cognitive psychology and sociology that, in a shared context, concise, unconstrained free text communication is the most effective way to coordinate work around a complex task [5, 6]. Overly structured data can cause clinicians to lose cognitive focus, both during input and during review. In particular, clinicians may lose their overview of the case at hand when they have to attend to data contained in many different fields, sometimes on different screens within an interface [7, 8]. Furthermore, the act of writing or dictating in narrative form may be integral to the cognitive processing of the case [9]. The dissonance we found between subjects' stated preferences about report format and their actual performance reading reports for comprehension underscores that the cognitive issues are complex. In our opinion, tried-and-true methods of authoring and displaying radiology reports should not be abandoned without considering the consequences.

Our follow-up session and the questionnaires completed by the subjects shed light on reader preference for report format. Subjects strongly and consistently preferred the structured version to the free text. This preference was consistent across all seven domains that we asked about, with the modal value on the 10-point Likert scale being 10 (prefer structure) for every domain. Likewise, on the related question about general report organization, 73% of subjects preferred a "laboratory report" format over the alternatives. The opinions of our subjects are consistent with other investigators' findings regarding physician preferences about radiology reports. A large body of published research details the opinions of referring physicians regarding the content and format of radiology reports [10-15]. Although the terminology differs somewhat, attributes consistently endorsed by consumers (readers) of radiology reports include completeness, itemization, and structure. Referring clinicians also commonly prefer that the report contain a complete list of pertinent negative findings. In aggregate, these opinions seem to favor a report organization and format like the "laboratory report" option preferred by our subjects.

Selecting senior medical students as subjects proved quite successful. Interest and enthusiasm were such that we were able to double the sample size with little effort, and even after the study had closed, numerous students asked to participate. The experimental paradigm was well accepted by subjects and easy to administer. These factors should make it straightforward to extend the experiments, using additional cohorts of senior medical students, to address the limitations described below.

Perhaps the most important limitation of our study concerns generalizing our results from senior medical students answering content-specific questions to practicing physicians using radiology reports for clinical decision making. We deliberately narrowed the focus of the research to the readability of documents containing radiology interpretations as a function of format alone. Medical students' subsequent experiences during training and practice certainly lead to differences in many skills and habits. However, we argue that the basic ability to read a passage of text and comprehend its content is already well established by the senior year of medical school.

A difficult design consideration was whether to test subjects with both the free text and the structured version of each case. This would have provided even greater power to detect an effect of format on outcomes, by virtue of a directly paired comparison. We believe that our choice of a balanced block design, incomplete in the format factor, was justified for two reasons. First, the planned and achieved power of the chosen design allowed us to detect differences between free text and structure far smaller than those we considered practically relevant. Second, having subjects see each case twice would have introduced memory effects that are methodologically difficult to control.
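The key property of a design that is incomplete in the format factor is that no subject sees the same case twice, yet format is balanced within each subject and, across subjects, within each case. The Python sketch below builds one such assignment, assuming 12 cases (consistent with the 120 questions at 10 questions per case described below) and 16 subjects; it is a hypothetical illustration of the balancing idea, not the randomization scheme the study actually used.

```python
def assign_formats(n_subjects: int, n_cases: int):
    """Hypothetical counterbalanced assignment: every subject reads every case
    exactly once, in only one format ("F" = free text, "S" = structured).

    Within a subject the formats alternate across cases, so each subject sees
    half the cases in each format. Odd-numbered subjects get the mirror image
    of even-numbered ones, so across each pair of subjects every case appears
    once in each format.
    """
    if n_cases % 2 or n_subjects % 2:
        raise ValueError("this sketch assumes even numbers of cases and subjects")
    return [
        ["F" if (case + subject) % 2 == 0 else "S" for case in range(n_cases)]
        for subject in range(n_subjects)
    ]


plan = assign_formats(16, 12)
print("".join(plan[0]))  # FSFSFSFSFSFS
print("".join(plan[1]))  # SFSFSFSFSFSF
```

With 16 subjects and 12 cases, each case is read 8 times in each format. A real scheme would also randomize case order and which subjects receive which pattern; this sketch shows only the balancing.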

Another limitation is that our subjects had no time constraints or other pressures placed on them during testing. We plan to add features to the experimental paradigm that will stress a subject's short-term memory of the material. In addition, the structured versions of the reports we used had phrasing and syntax identical to those of the original narrative versions. In practice, the language and construction of interpretive statements would likely be rather different in structured reports. Readers may be more (or less) able to rapidly comprehend medical content presented in a structured format using "telegraphic" constructions such as "LIVER: Negative."

To address the limitations described above and extend the scope of our inferences, we plan at least three extensions of the experiment, using the same cases, questions, and randomization scheme. First, we will remove the button on the question page that allows subjects to go back and review the report. Subjects will be told this in advance and will know that they must answer all 10 questions after a single reading. There will be no time limit for either reading the report or answering the questions. Second, we will place a time limit on how long the report is visible before the display switches to the question page. Again, subjects will know that they cannot go back to review the report while answering questions. Third, we will allow subjects to view either the structured or the free text format, at their discretion, while they work through a case. The tracking code will record which version(s) they look at and for how long. This will allow us to determine whether subjects develop, and actually act on, a preference for one format or the other as they move through the cases.
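The planned free-choice condition depends on logging which version is on screen and for how long. A minimal Python sketch of such logging follows; the class, its method names, and the version labels are hypothetical, and a real web-based test would take timestamps from the browser or server clock rather than having them passed in by hand.

```python
from collections import defaultdict


class ViewTracker:
    """Hypothetical dwell-time logger for the planned free-choice condition.

    Records which report version ("free_text" or "structured") is on screen
    and accumulates viewing time per version. Timestamps (in seconds) are
    passed in explicitly so the logic can be tested without a real clock.
    """

    def __init__(self):
        self.totals = defaultdict(float)  # version -> seconds viewed
        self.switches = []                # (timestamp, version) log
        self._current = None
        self._since = None

    def show(self, version, t):
        """The subject switches the display to `version` at time t."""
        self._close(t)
        self._current, self._since = version, t
        self.switches.append((t, version))

    def hide(self, t):
        """The report is hidden (e.g., the subject moves to the question page)."""
        self._close(t)
        self._current = self._since = None

    def _close(self, t):
        # Credit the elapsed time to whichever version was on screen.
        if self._current is None:
            return
        if t < self._since:
            raise ValueError("timestamps must not decrease")
        self.totals[self._current] += t - self._since


tracker = ViewTracker()
tracker.show("structured", 0.0)
tracker.show("free_text", 40.0)
tracker.show("structured", 55.0)
tracker.hide(90.0)
print(dict(tracker.totals))  # {'structured': 75.0, 'free_text': 15.0}
```

Keeping both the per-version totals and the ordered switch log would allow analysis of how much time subjects spend in each format and of whether their choices change as they move through the cases.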

Further research will involve psychometric evaluation of the questions themselves. Once the three additional experiments described above have been completed, we will have 64 answers to each of 120 different questions about radiology report content. This should allow us to use standard techniques to assess item difficulty, reliability, various correlations, and discriminatory power. These results will be interesting in their own right, by revealing what kinds of questions are challenging for readers to answer. This might guide radiologists to include explicit phrasing in their reports that addresses these difficulties. Types of questions that show high variance in the answers given, or that are poorly correlated with subjects' overall scores, will also be of interest. Knowing which types of questions are most reliable and discriminating, we can redesign the cases and questions to optimize power to detect subtle differences in reader performance. Such information about question content may also guide other researchers in their own experiments on the readability of medical documents.
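Item difficulty, discrimination, and reliability can all be computed from a subjects-by-questions matrix of right/wrong answers. The Python sketch below uses three common classical test theory statistics: difficulty as the proportion correct, discrimination as the corrected item-total correlation (each item against the total of the remaining items), and reliability as the Kuder-Richardson 20 (KR-20) coefficient. These particular choices, the function names, and the toy data are assumptions for illustration; the authors do not specify which statistics they will use.

```python
from statistics import mean, pvariance


def pearson(x, y):
    """Pearson correlation; returns nan when either variable has zero variance."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / (sxx * syy) ** 0.5


def item_analysis(scores):
    """scores[s][i] is 1 if subject s answered item i correctly, else 0.

    Returns per-item difficulty (proportion correct), per-item discrimination
    (correlation of the item with the total score on the remaining items),
    and KR-20 reliability for the whole set of items.
    """
    n_items = len(scores[0])
    if n_items < 2:
        raise ValueError("KR-20 needs at least two items")
    totals = [sum(row) for row in scores]
    difficulty, discrimination = [], []
    for i in range(n_items):
        item = [row[i] for row in scores]
        rest = [t - x for t, x in zip(totals, item)]  # total excluding this item
        difficulty.append(mean(item))
        discrimination.append(pearson(item, rest))
    var_total = pvariance(totals)
    pq = sum(p * (1 - p) for p in difficulty)
    kr20 = (n_items / (n_items - 1)) * (1 - pq / var_total) if var_total else float("nan")
    return difficulty, discrimination, kr20


# Toy data (hypothetical): 4 subjects x 3 items, 1 = correct.
toy = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
difficulty, discrimination, kr20 = item_analysis(toy)
print(difficulty, [round(r, 2) for r in discrimination], round(kr20, 2))
```

In the planned analysis, rows would be subjects and columns the 120 questions; whether to pool answers across formats and experimental conditions is a design decision this sketch does not address.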

To our knowledge, this work is the first experimental evaluation of radiology reports whose primary outcomes are quantitative measures of information transfer to readers of the documents. Based on the results described above, we assert that there is no difference in information transfer efficiency between free text (narrative style) reports and structured (itemized) reports having the same content. Although they performed no better with the structured versions, our subjects clearly preferred them to the free text format.


References

  1. Merritt CRB. New president says emphasize signal, delete noise from radiology reports. ARRS Memo: Newsletter of the American Roentgen Ray Society 2004;15(3):1-8
  2. Sistrom CL, Langlotz CP. A framework for improved radiology reporting. J Am Coll Radiol 2005;2:159-167
  3. Duncan DB. t tests and intervals for comparisons suggested by the data. Biometrics 1975;31:339-359
  4. Pearson K. Mathematical contributions to the theory of evolution. III. Regression, heredity and panmixia. Phil Trans Royal Soc Ser A 1896;187:253
  5. Ash JS, Berg M, Coiera E. Some unintended consequences of information technology in health care: the nature of patient care information system-related errors. J Am Med Inform Assoc 2004;11:104-112
  6. Garrod S. How groups co-ordinate their concepts and terminology: implications for medical informatics. Methods Inf Med 1998;37:471-476
  7. Patel VL, Kaufman DR. Medical informatics and the science of cognition. J Am Med Inform Assoc 1998;5:493-502
  8. Patel VL, Kushniruk AW. Understanding, navigating and communicating knowledge: issues and challenges. Methods Inf Med 1998;37:460-470
  9. Berg M. Practices of reading and writing: the constitutive role of the patient record in medical work. Sociol Health Illness 1996;18:499-524
  10. Clinger NJ, Hunter TB, Hillman BJ. Radiology reporting: attitudes of referring physicians. Radiology 1988;169:825-826
  11. Gunderman RB, Ambrosius WT, Cohen M. Radiology reporting in an academic children's hospital: what referring physicians think. Pediatr Radiol 2000;30:307-314
  12. Johnson AJ, Ying J, Swan JS, Williams LS, Applegate KE, Littenberg B. Improving the quality of radiology reporting: a physician survey to define the target. J Am Coll Radiol 2004;1:497-505
  13. Lafortune M, Breton G, Baudouin JL. The radiological report: what is useful for the referring physician? Can Assoc Radiol J 1988;39:140-143
  14. McLoughlin RF, So CB, Gray RR, Brandt R. Radiology reports: how much descriptive detail is enough? AJR 1995;165:803-806
  15. Naik SS, Hanbidge A, Wilson SR. Radiology reports: examining radiologist and clinician preferences regarding style and content. AJR 2001;176:591-598




