The American Board of Radiology (ABR) administers certification examinations in diagnostic radiology and in radiologic sub-specialties. The ABR regularly offers Maintenance of Certification (MOC) examinations in both its own testing centers (Tucson, AZ, and Chicago, IL) and in Pearson VUE testing centers across the United States. The ABR commenced the move from oral to computer-based examinations for initial board certification in 2013 and will deliver its first computer-based Initial Certification (IC) examinations later this year.
Running imaging-based examinations in distributed locations with large volumes of candidates raises many challenges. Displays must be of comparable quality for all candidates to ensure equitable examinations; however, this can be difficult to achieve across multiple testing centers, and transporting sufficient volumes of standardized displays is impracticable. Furthermore, the examination material must be secure. With this in mind, the ABR has expressed an interest in using the iPad (Apple) as a display device for examinations because these tablet computers are sufficiently portable to be transported to multiple locations, thus allowing all displays to be monitored and examination images to be stored only on ABR-controlled devices.
Many articles have recently been published describing various uses of the iPad in radiology, including administration and data collection [1], teaching and education [2–4], and even navigation in interventional and surgical procedures [5–8]. Studies of diagnostic usefulness have also been performed with various generations of the iPad [9–15], with most investigators optimistic about its potential, at least for certain tasks or situations. However, for the iPad to be adopted as a display for imaging-based examinations, all relevant image features must be adequately visualized. This study aimed to prospectively determine whether the iPad 3 (hereafter referred to as “the iPad”) was suitable for image display in ABR examinations and how it compared with standard monitors of the type currently used.
Subjects and Methods
Overview
The method used in this study was based on that used by Krupinski et al. [16], who investigated whether monitors of different resolutions commonly encountered in Pearson VUE testing centers were of adequate quality for use in ABR MOC examinations; however, there were some modifications.
Radiologists examining at the ABR Diagnostic Radiology IC oral examinations in Louisville, KY, in June 2013 reviewed a selection of cases from past ABR IC examinations in their specialist areas on both an iPad and a 2-MP monitor (ViewSonic) typical of those found in the ABR's Chicago and Tucson testing centers. The visibility of the pertinent image features for each case was rated, and the results were compared between the devices. Participants were also asked to give their opinions about the use of the iPad for ABR examinations. The study was declared exempt from full ethical review at the home institution. The data were controlled and analyzed by three of the authors after collection.
Cases
Cases from nine of the 10 radiologic specialties, all of which had been previously used at the November 2011 IC examinations, were supplied by the ABR. The ultrasound specialty was not investigated individually (although ultrasound images did appear in some cases in other specialties). Twenty cases from each specialty, considered representative of the different pathologies and image types in the examinations, were selected by two of the authors, who are lecturers in diagnostic imaging, with guidance from another author, who is a consultant radiologist. Each case included between one and seven screens with single or multiple images (Fig. 1), stacks (gastrointestinal only), or video (nuclear medicine and cardiopulmonary only). These were inserted along with case information (e.g., history, findings, and diagnosis as provided to examiners) into specially modified software (Ziltron) and loaded onto the iPads and computers.
Displays and Software
The Ziltron software was used to display the cases and record ratings on both the iPads (all of the same model) and the monitors. Five iPads and four LED 1080p full-HD monitors (ViewSonic) were used so that multiple participants could complete the study simultaneously. The Ziltron software allowed zooming, panning, and window and level adjustments on both the iPads and the monitors, through touch and mouse commands, respectively. Images were initially presented full screen, regardless of display device. A fly-in tab could be activated to display the case information. A 5-point rating scale was presented at the bottom of the screen for each case.
Figure 2 shows the iPad and monitor versions of the software.
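The window and level adjustment described above is, in conventional viewers, a linear mapping of stored pixel values onto display gray levels. The Ziltron implementation is not documented here; the following is only a minimal sketch of such a transform, and the function name and example CT values are invented for illustration.

```python
import numpy as np

def apply_window_level(pixels: np.ndarray, center: float, width: float) -> np.ndarray:
    """Linearly map stored pixel values to 8-bit display values;
    values outside the chosen window are clipped to black or white."""
    low = center - width / 2.0
    high = center + width / 2.0
    scaled = (pixels.astype(np.float64) - low) / (high - low)
    return (np.clip(scaled, 0.0, 1.0) * 255).astype(np.uint8)

# Example: a soft-tissue window (center 40 HU, width 400 HU) applied to CT numbers.
ct_values = np.array([[-1000, -200, 40], [80, 240, 1200]])
print(apply_window_level(ct_values, center=40, width=400))
```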
The iPads could not be calibrated to the DICOM gray-scale standard display function (GSDF) [17]; however, measurements of maximum and minimum luminance at the center of the display were made using a Luxi photometer (Unfors). Substantial differences existed between the maximum brightness of individual iPads when brightness was set to maximum; therefore, the brightness settings of the brightest were reduced to bring all maximum luminances within 10% of the mean.
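This matching step amounts to checking each unit's maximum luminance against ±10% of the group mean and turning down any display that falls outside that band. A minimal sketch of the check follows; the luminance values are hypothetical, not the study's measurements.

```python
import numpy as np

# Hypothetical maximum-luminance readings (cd/m^2) for five iPads,
# measured at the center of each display with a photometer.
l_max = np.array([410.0, 380.0, 470.0, 430.0, 400.0])

mean_l = l_max.mean()
tolerance = 0.10 * mean_l

# Flag displays whose maximum luminance lies outside +/-10% of the mean;
# those units would have their brightness setting reduced and be re-measured.
outside = np.abs(l_max - mean_l) > tolerance
for i, (lum, flag) in enumerate(zip(l_max, outside), start=1):
    status = "reduce brightness and re-measure" if flag else "within tolerance"
    print(f"iPad {i}: {lum:.0f} cd/m^2 -> {status}")
```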
Table 1 shows all characteristics of both the monitors and iPads. A TG18-QC test pattern (American Association of Physicists in Medicine) [18] was viewed on each iPad, and the 5% and 95% luminance patches were visible on all displays. Ambient lighting was maintained at a dim level (< 50 lux). All monitors were calibrated to the DICOM GSDF using VeriLUM calibration software and pod (ImageSmiths).
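For context, the GSDF to which the monitors (but not the iPads) were calibrated defines a target luminance for each just-noticeable-difference (JND) index, against which calibration software compares measured luminances. The sketch below evaluates that curve using the polynomial coefficients published in DICOM PS 3.14; it is illustrative only and was not part of the study procedure.

```python
import numpy as np

def gsdf_luminance(j: float) -> float:
    """DICOM gray-scale standard display function: target luminance (cd/m^2)
    for JND index j (valid for 1 <= j <= 1023), per DICOM PS 3.14."""
    a, b, c = -1.3011877, -2.5840191e-2, 8.0242636e-2
    d, e, f = -1.0320229e-1, 1.3646699e-1, 2.8745620e-2
    g, h = -2.5468404e-2, -3.1978977e-3
    k, m = 1.2992634e-4, 1.3635334e-3
    lj = np.log(j)
    num = a + c * lj + e * lj**2 + g * lj**3 + m * lj**4
    den = 1 + b * lj + d * lj**2 + f * lj**3 + h * lj**4 + k * lj**5
    return float(10 ** (num / den))

# Target luminances at a few points along the curve, from near-black to peak white.
for j in (1, 100, 500, 1023):
    print(f"JND index {j:4d}: {gsdf_luminance(j):9.3f} cd/m^2")
```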
Readings
One hundred nineteen examining radiologists voluntarily completed the study, of whom 114 reviewed cases from a single specialty and five reviewed cases from two or more specialties, yielding a total of 124 matched readings on the two devices. Demographic data, including years since ABR certification, years of ABR examining, and use of an iPad or other tablet computer, were recorded. Ten participants did not complete some or any demographic data. Further details are shown in Table 2. Sixty-nine percent (n = 82) of participants reported regular iPad use (any generation), 6% (n = 7) regularly used a tablet computer other than an iPad, and 24% (n = 29) did not regularly use a tablet. One participant's response was deemed invalid.
Participants were asked to review the cases and assign each case a single rating of the visibility of its relevant image features. Ratings were made on the following scale of 1–5: 1, unacceptably poor visualization; 2, barely acceptable visualization; 3, fair visualization; 4, good visualization; and 5, excellent visualization.
The Ziltron software was demonstrated to participants before they began the readings, and support was available if required. A counterbalanced methodology was used, whereby approximately half of the participants read on the iPad first and half on the monitor first, to avoid potential bias introduced by reading order. No time limit was imposed, although the participants were bound by examination schedules. Review typically took no more than 10 minutes per device. Participants could pause and resume the study at a later time if desired.
Statistical Analysis
A matched-pairs Wilcoxon signed rank test was used to compare the ratings of visibility of relevant image features on the iPad with those on the monitor for all specialties pooled and for individual specialties. The effects of reading order, participant experience, and regular tablet computer use were investigated using Wilcoxon signed rank tests and sum logistic and stepwise regressions.
Because each specialty case-set typically contained images from different modalities (e.g., the mammography cases included mammograms, ultrasound images, and MR images), Wilcoxon signed rank tests were also used to investigate whether the ratings for the iPad and monitor were significantly different for each modality regardless of examination specialty. The categorization of images for this analysis was basic: stacks and videos were analyzed as separate categories, and screens of still images were categorized by modality, regardless of the number of individual images on the screen at one time. For instance, Figure 1 would have been categorized as a single projection radiograph screen. Screens with images from more than one modality were classed as “other.” Wilcoxon signed rank tests were used to determine whether the number of screens from each modality in each case was associated with monitor or iPad ratings. Results of all analyses were considered significant at p ≤ 0.05.
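As an illustration of the core analysis, the matched-pairs comparison can be reproduced with standard statistical software. The sketch below uses SciPy's wilcoxon function on invented paired ratings; it is not the study data or necessarily the software used by the authors.

```python
import numpy as np
from scipy.stats import wilcoxon

# Hypothetical paired visibility ratings (1-5) for the same cases viewed
# on the iPad and on the monitor by the same reader.
ipad_ratings    = np.array([4, 5, 3, 4, 4, 5, 3, 4, 5, 2, 4, 5, 3, 4])
monitor_ratings = np.array([4, 4, 3, 3, 4, 4, 4, 4, 4, 2, 3, 4, 3, 3])

# Matched-pairs Wilcoxon signed rank test; with the default zero_method,
# pairs with identical ratings (zero difference) are discarded.
stat, p_value = wilcoxon(ipad_ratings, monitor_ratings)
print(f"W = {stat:.1f}, p = {p_value:.4f}")
```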
Participants were also asked to provide a yes or no answer to the question: “Having completed the study, do you think that the iPad 3 could be considered adequate for the display of images for ABR examinations (such as Initial Certification, Maintenance of Certification, etc.)?” and allowed to explain “Why or why not?” A simple thematic analysis was also performed on the answers provided and on any other written comments.
Results
Differences in Image Quality Ratings
Ratings of image-feature visibility were significantly higher for the iPad than for the monitor across all cases and participants as a whole (p = 0.0154). The iPad was also rated significantly higher in four of the nine specialties, with the monitor rated significantly higher in one and no significant difference in the remainder (Table 3).
One hundred twenty-four readings of 20 cases (minus one case for one participant, for which a device failed to record a reading) yielded 2479 matched-case ratings. The same rating was returned on both devices in 1523 of these; the rating was higher on the iPad in 525 and higher on the monitor in 431. Ratings of “unacceptably poor visualization” were recorded eight times for both the iPad and monitor, eight times for the iPad only, and eight times for the monitor only.
For the readers who did not regularly use a tablet computer, no significant difference in ratings between the monitor and the iPad was noted (p = 0.4857), whereas tablet users rated the iPad significantly higher (p = 0.0026). Years since ABR certification and years of examining had no significant relationship with image quality ratings.
Wilcoxon signed rank tests showed that ratings were higher for the second reading device than the first—that is, those who read first on the iPad rated the monitor more highly (p = 0.0036) and vice versa (p < 0.0001). Because the study counterbalancing was not precisely matched, it was considered important to rule out reading order bias as the cause of the overall preference for the iPad. Sum logistic and stepwise regressions determined that the participants’ response to the question “Having completed the study, do you think that the iPad 3 could be considered adequate for the display of images for ABR examinations (such as Initial Certification, Maintenance of Certification, etc.)?” was the factor most predictive of difference in iPad and monitor ratings and that bias associated with reader order should not have significantly influenced the results.
Image Type and Ratings
Cases with a higher frequency of still CT images, fluoroscopic images, and mixed “other” images were associated with significantly higher monitor ratings than iPad ratings. Conversely, the iPad was rated significantly higher for cases with a higher frequency of CT stacks, MR images, and nuclear medicine images. It is possible that the mechanism of scrolling through CT stacks on the iPad (by dragging a finger along the screen) was preferred by participants to that on the monitor (clicking and dragging with the mouse), which may have encouraged the higher score; however, because participants were not asked to report their preferences, we cannot say for certain whether this influenced the results. A preference for the iPad may also exist for cases with a higher frequency of ultrasound images, although the difference was not significant (Table 4). No differences were detected for projection radiographs or videos (although few videos were included in the study).
Participant Opinion and Thematic Analysis
From 124 readings, 117 valid (yes or no) responses to the question: “Having completed the study, do you think that the iPad 3 could be considered adequate for the display of images for ABR examinations (such as Initial Certification, Maintenance of Certification, etc.)?” were recorded (with seven ambiguous or missing responses). One hundred five (89.74%) of the respondents answered “yes” and 12 (10.26%) “no.” The most common concerns expressed related to the resolution and screen size of the iPad; the specific areas most often commented on favorably were the image manipulation, interface, and resolution.
Discussion
When considering the results, it is worth noting that the differences in ratings for monitors and iPads, despite being significant in many cases, were small enough that their practical significance is difficult to quantify and apply. For example, a mean rating of 0.2 higher on the scale between “fair” and “good” visualization may not have any real consequence in the examination setting.
Four specialties (breast, gastrointestinal, genitourinary, and nuclear medicine) showed a significant preference for the iPad in terms of mean rating of the visibility of relevant image features. Other studies that have investigated the diagnostic accuracy of the iPad (of various generations) generally found it to be equivalent to the primary- or secondary-class monitors to which it was compared [9–14] (although one did find significantly better performance with a monitor [15]). Perhaps, therefore, this result is not surprising.
Because no cases were rated “unacceptably poor visualization” on the iPad only and there was either a preference for the iPad or no significant difference, it appears reasonable to conclude that the iPad is suitable for use in examinations in the gastrointestinal, genitourinary, neuroradiology, and pediatric specialties.
Although a significant difference in favor of the monitor was noted in the vascular and interventional specialty, no images were rated “unacceptably poor visualization” on the iPad, and therefore its use may still be appropriate.
“Unacceptably poor visualization” was assigned to one or more cases on the iPad but not the monitor in the cardiopulmonary, musculoskeletal, nuclear medicine, and breast specialties. However, three of the six radiologists who assigned those ratings still stated that they considered the iPad appropriate for use in ABR examinations. Although the decision of what displays to use rests with the ABR, the vast majority of participants appeared to accept, or indeed favor, the iPad. Therefore, its benefits may outweigh doubts surrounding its performance.
Although the categorization of images into modality types for analysis was basic, it is interesting to note that several image types appear to be associated with higher ratings on one or the other display device (Table 4). Potential reasons for some of these are easy to posit; for instance, many of the still CT images included several slices, meaning that individual slices were very small when displayed on the smaller iPad screen (Fig. 3). This might be easily overcome by presenting the images in a different format designed for the iPad, which may also address some of the participants’ concerns over the screen being too small. Preferences for one display over the other for some other modalities are more difficult to explain.
Participants’ Opinions
Although many positive opinions concerning the iPad's display were expressed, the most common concerns were that the iPad was of lower resolution than the monitor and that images appeared more pixelated on the iPad. This raises interesting issues regarding the relationship between physical and perceived image quality, because the resolution of the images as loaded into the software was identical for the two devices and the spatial resolution of the iPad was higher than that of the LCD monitors. One might posit that participants zoomed images to a higher degree on the iPad (to the point where pixelation was obvious), either because of the smaller display size or because of a behavioral preference related to the “pinch” zooming action used on the iPad. Both the iPad and desktop software used the same linear interpolation scaling algorithm for the zoom function. Alternatively, it is possible that a preconceived notion that tablet computers with smaller screens have low spatial resolution influenced some participants’ perceptions. If the iPad is adopted for examination or other purposes, this potential confirmation bias should be noted.
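The point about zoom behavior can be made concrete: with the same linear interpolation, a larger zoom factor spreads each source pixel over more display pixels, so the coarse source structure becomes visible. The following is a minimal sketch using SciPy on a synthetic patch, not study images.

```python
import numpy as np
from scipy.ndimage import zoom

# A tiny synthetic image patch standing in for a small region of interest.
patch = np.array([[ 10.0,  40.0,  90.0],
                  [ 40.0, 200.0, 120.0],
                  [ 90.0, 120.0,  30.0]])

# order=1 selects linear interpolation in both cases; only the zoom factor differs.
zoomed_2x = zoom(patch, 2.0, order=1)    # modest zoom: interpolation barely noticeable
zoomed_12x = zoom(patch, 12.0, order=1)  # aggressive zoom: source pixels become apparent

print(patch.shape, zoomed_2x.shape, zoomed_12x.shape)  # (3, 3) (6, 6) (36, 36)
```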
Regarding the other most common concern (the small screen size of the iPad), studies of the effect of image size have been limited [19, 20]; however, the results of the current study appear to show that image quality on the whole was adequate. Ergonomic issues should also be further investigated if the iPad is to be used for extended examinations.
Scope of Results and Limitations
This study did not aim to determine whether the iPad was suitable for diagnosis in a clinical setting, and its results should not be interpreted as such; the examination cases were not weighted toward difficult-to-detect abnormalities, and the methodology did not test abnormality detection but rather sought the opinion of experienced examiners about whether diagnoses could reasonably be made on typical examination cases. Furthermore, the study did not aim to evaluate the suitability of the Ziltron software for ABR examinations but only the suitability of the iPad 3 display itself. The results are likely also applicable to the iPad 4 because the display characteristics do not appear to have changed between these models. A limitation of the study is the lack of DICOM GSDF calibration of the iPads.
Other potential limitations include the possibility that participants rated cases before fully reviewing all images or case information, or that they recognized cases from real examinations; however, it is postulated that viewer behavior did not change significantly between devices and that any such effects should have affected both devices equally.
It is possible that the study population was biased because those who felt more strongly for or against use of the iPad in ABR examinations might have been more likely to volunteer; however, the study participants represented approximately 39% of the total population of examiners in the categories tested, and many examiners expressed an interest in research in general, not only on the iPad.
Finally, hindsight bias has been shown in radiology [21], and it is possible that the participating examiners’ perceptions of how visible image features appeared may have been influenced by having access to the case information and findings provided. Examination candidates, of course, would not have this advantage.
This study was based partially on the comprehensive investigation by Krupinski et al. [16]; however, Krupinski et al. analyzed ratings assigned to individual images rather than to cases. We acknowledge that their approach would have yielded substantially more samples and therefore potentially greater statistical power; however, assigning ratings to whole cases was thought to avoid multiple ratings of broadly similar images (e.g., multiple images from a single intervention or series) unduly influencing the results, to provide a more realistic assessment of the ABR examination system, and to streamline the study for participants who had limited time during examination break periods.
Practical Applications
The results of the study indicate a strong acceptance of the iPad 3 as a potential display device for use in ABR examinations, which may, with appropriate precautions, facilitate easier dissemination and greater standardization of display for candidates. Changes in the format in which some images are presented (e.g., avoiding multiple slices in a single screen) may improve acceptance further. Further investigation of other factors that may influence examination performance (e.g., ergonomic factors) would be beneficial.