AJR InPractice
AJR 2005; 184:697-699
© American Roentgen Ray Society

General Radiologists' Diagnostic Accuracy: Incomplete Presentation of Data Casts Doubt on Study's Conclusions

Marilyn T. Zivian1 and Raziel Gershater2

1 York University Toronto, Canada
2 North York General Hospital Toronto, Canada

Recently the AJR published a set of standards meant to serve as guidelines for improving the quality of articles reporting diagnostic accuracy (the Standards for Reporting of Diagnostic Accuracy [STARD] initiative [1]). The STARD guidelines stress complete and accurate reporting of a study's methodology and results, including the statistical examination of its data. Complete and accurate reporting allows readers to detect any bias in the study's methodology and/or data analysis (i.e., the study's internal validity) and judge how widely the results may be generalized and applied (i.e., the study's external validity).

Earlier that same year, AJR published a study that attempted to assess general radiologists' accuracy when interpreting emergency CT scans of the head received via teleradiology. The Erly et al. [2] study does not follow the STARD guidelines. To be fair to the authors, their work was done before the publication of the STARD guidelines; nevertheless, we will attempt to show below how incomplete reporting of the study's data undermines its internal and external validity.

Briefly, 15 board-certified general radiologists practicing in the community interpreted 716 consecutive emergency CT scans received via teleradiology. Five neuroradiologists interpreted hard copies of each CT scan the day after the general radiologists' initial interpretations. Comparisons of the neuroradiologists' final interpretations and the general radiologists' preliminary interpretations were categorized as showing agreement, "insignificant" disagreement (no active disease), or "significant" disagreement.

The authors reported 95% agreement between the general radiologists' initial interpretations and the neuroradiologists' final interpretations. Active disease was depicted in 47 of the CT scans. In 11 of the 47, the neuroradiologists' final reports differed from the general radiologists' preliminary interpretations. Significant disagreements occurred in 16 of the 716 cases: Five of the 16 significant errors were false-positives (false alarms), and 11 were false-negatives (misses). Insignificant disagreements occurred in 23 of the 716 cases. What the article does not report is how many of these insignificant disagreements were false alarms and how many were misses. This breakdown turns out to be important: only two of the 23 were false alarms, whereas 21 were misses (Dr. Erly, personal communication).

Given that the study's data are well suited for signal detection analysis, it is striking that none was done. Nor were sensitivity or specificity rates given for the total data set or for the data set from which the insignificant disagreements have been deleted. Table 1 displays the hits, false alarms, misses, and correct rejections for the Erly et al. [2] data subset from which the insignificant disagreements have been deleted, the complete Erly et al. data set, and the data that would result if someone unfailingly reported only negative findings. Agreement, sensitivity, and specificity rates are given immediately below each of the three data sets.


TABLE 1 Signal Detection Presentations of the Erly et al. [2] Data Set from Which "Insignificant" Disagreements Have Been Deleted, the Complete Data Set, and the Data Set That Would Result from Consistently Reporting Negative Findings

 

In addition to the inconsistency concerning the total number of negative findings noted in Table 1, the data reveal additional inconsistencies and confusions. First, when insignificant disagreements are deleted, the total number of cases should be reduced accordingly when computing the general radiologists' accuracy rate. Apparently this was not done, leading the authors to report an accuracy rate of 95% (based on a total of 716), rather than 98% (based on 693, the total corrected for the deleted insignificant disagreements). Second, comparison of the analyses in Table 1, columns 1 and 2, reveals that when all disagreements, significant and insignificant, are included, there is no change in the specificity rate (99%), but the sensitivity rate of the general radiologists falls from 77% to just 53%, barely above chance.
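The rates quoted above can be rechecked from the counts given in the text. The following Python sketch is an illustration only. Its cell counts are assumptions derived from the figures quoted in this letter (716 scans, 47 with active disease, 11 significant misses, 5 significant false alarms, 21 insignificant misses, 2 insignificant false alarms), not values copied from Table 1, and the function and variable names are hypothetical.

```python
def rates(hits, misses, false_alarms, correct_rejections):
    """Return (agreement, sensitivity, specificity) as fractions."""
    total = hits + misses + false_alarms + correct_rejections
    agreement = (hits + correct_rejections) / total
    sensitivity = hits / (hits + misses)
    specificity = correct_rejections / (correct_rejections + false_alarms)
    return agreement, sensitivity, specificity

TOTAL = 716
# Assumed counts, derived from the text rather than taken from Table 1.
hits = 47 - 11                       # active disease correctly reported
sig_misses, sig_false_alarms = 11, 5
insig_misses, insig_false_alarms = 21, 2
# Correct rejections are whatever remains of the 716 scans.
correct_rejections = (TOTAL - hits
                      - (sig_misses + insig_misses)
                      - (sig_false_alarms + insig_false_alarms))

# Column 1: insignificant disagreements deleted (693 cases).
subset = rates(hits, sig_misses, sig_false_alarms, correct_rejections)
# Column 2: all disagreements included (716 cases).
complete = rates(hits, sig_misses + insig_misses,
                 sig_false_alarms + insig_false_alarms, correct_rejections)

for label, (agr, sens, spec) in (("insignificant deleted", subset),
                                 ("complete data set", complete)):
    print(f"{label}: agreement {agr:.0%}, "
          f"sensitivity {sens:.0%}, specificity {spec:.0%}")
```

Under these assumed counts the rounded rates agree with those quoted in the text: 98%, 77%, and 99% for the reduced data set, and 95%, 53%, and 99% for the complete one.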

To understand what the combined data reveal, we believe that it helps to consider an extreme case: the diagnostic accuracy of someone like the first author of this letter, who knows nothing about CT scans of the head except that in over 90% of the cases the finding will be negative. Armed with just that one piece of information, she would optimize her performance by always reporting no abnormality. Her data (assuming the same proportion of individuals who presented with abnormal CT scans as in the Erly et al. [2] study) may be found in Table 1, column 3. It can be seen that her agreement and specificity rates are almost equivalent to those of the general radiologists: 91% versus 95% for agreement and 100% versus 99% for rate of specificity. This is not surprising given that so few people (9.3%) present in emergency departments with abnormal CT scans of their heads. Thus, all the data taken together reveal that in this context, accuracy and specificity rates are misleading statistics and convey an overly positive picture of the general radiologists' diagnostic sensitivity.
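The always-negative reader can be worked through in the same way. In this rough sketch the number of abnormal scans (68) is an assumption derived from the counts in the text (36 hits plus 11 significant and 21 insignificant misses). It gives a proportion slightly different from the 9.3% quoted above, so the result may not match Table 1, column 3, exactly.

```python
TOTAL = 716
abnormal = 36 + 11 + 21   # assumed from the text, not taken from Table 1
normal = TOTAL - abnormal

# A reader who always reports "no abnormality" never scores a hit
# and never raises a false alarm.
hits, misses = 0, abnormal
false_alarms, correct_rejections = 0, normal

agreement = (hits + correct_rejections) / TOTAL
sensitivity = hits / (hits + misses)
specificity = correct_rejections / (correct_rejections + false_alarms)

print(f"agreement {agreement:.0%}, "
      f"sensitivity {sensitivity:.0%}, specificity {specificity:.0%}")
```

The sketch shows why agreement and specificity alone say little here: both stay high although no abnormal scan is ever detected.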

Erly et al. [2] also reported that the general radiologists' accuracy rates are remarkably similar to those found in previous studies. But if the previous authors did not categorize and delete insignificant disagreements in exactly the same way, Erly et al. are asking readers to compare apples to oranges.

The authors of this letter would also like to argue that deleting insignificant disagreements may be a dubious practice clinically, as well as experimentally. It seems to us that a radiologist who makes insignificant errors would be more likely to also make significant errors. This must be empirically verified, but if it is true, then insignificant disagreements are as important as significant disagreements, and readers would need to know as much about one as the other when assessing diagnostic accuracy.

Finally, it seems to us that the very small proportion of false alarms (1%) in the general radiologists' responses, taken together with the data discussed previously, points to the hypothesis that the general radiologists' performance reveals a bias to report negative findings, a bias that may account for their missing close to one out of two positive findings (or, if you still prefer, one out of four significant positive findings). This then leads to interesting questions of what lies behind the bias. Poorer training? The quality of CT scans received via teleradiology? The knowledge that the findings in nine of 10 emergency CT scans of the head will be normal? Fatigue? A combination of these? Something else?
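One way to put a number on such a bias is a signal detection calculation. The sketch below is an assumption-laden illustration, not an analysis performed in this letter or in the original study. It assumes the common equal-variance Gaussian model, and it takes the hit rate (36 of 68) and false alarm rate (7 of 648) from counts derived from the text rather than from Table 1. The function name is hypothetical.

```python
from statistics import NormalDist

def dprime_and_criterion(hit_rate, false_alarm_rate):
    """Equal-variance Gaussian signal detection indices.

    d' = z(H) - z(F) measures discriminability;
    c = -(z(H) + z(F)) / 2 measures response bias, with c > 0
    indicating a tendency to respond "negative".
    """
    z = NormalDist().inv_cdf   # inverse of the standard normal CDF
    z_hit, z_fa = z(hit_rate), z(false_alarm_rate)
    return z_hit - z_fa, -(z_hit + z_fa) / 2

# Assumed rates for the complete data set, derived from the text.
hit_rate = 36 / 68            # hits / (hits + misses)
false_alarm_rate = 7 / 648    # false alarms / (false alarms + correct rejections)

d_prime, criterion = dprime_and_criterion(hit_rate, false_alarm_rate)
print(f"d' = {d_prime:.2f}, criterion c = {criterion:.2f}")
```

Under these assumptions the criterion comes out positive, which is consistent with the hypothesized tendency to report negative findings. A different model or different cell counts would change the values.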


References

  1. Bossuyt PM, Reitsma JB, Bruns DE, et al. Towards complete and accurate reporting of studies of diagnostic accuracy: the STARD initiative. AJR 2003;181:51–56
  2. Erly WK, Ashdown BC, Lucio RW II, Carmody RF, Sugar JF, Alcala JN. Evaluation of emergency CT scans of the head: is there a community standard? AJR 2003;180:1727–1730

Reply

William K. Erly1, Boyd C. Ashdown2 and Richard W. Lucio3

1 The University of Arizona Health Sciences Center Tucson, AZ 85724-5067
2 Radiology Ltd. Tucson, AZ 85716
3 St. Mary's Hospital Tucson, AZ 85745

We would like to thank Drs. Zivian and Gershater for their careful reading and interpretation of the data reported in our article [1]. They are concerned that the published data do not conform to the STARD guidelines. However, as they note, the guidelines had not been published when our article was submitted. Of course, had the guidelines been available at the time of publication, we would have adhered to them. Nevertheless, we stand by our methods, results, and conclusions.

Zivian and Gershater are troubled by exclusion from the data analysis of the 23 insignificant disagreements between the preliminary report and the final neuroradiologist's interpretation. These data were not included in the article because we believe that they reflect time and space limitations inherent in generating preliminary reports, rather than a failure of perception or analysis on the part of the nighttime radiologists. Therefore, we do not believe that the insignificant disagreement data are an accurate indication of the radiologists' accuracy. Considering the clinical milieu in which the reports were generated, we do not believe that an insignificant disagreement represents an error. Unfortunately, Dr. Zivian obtained these data in a phone conversation in which she asked the number of false-negative and false-positive cases in that subgroup. We were told that the data were for research but were not informed that the data were to be used for publication. Furthermore, she did not ask for the clinical context of these data nor the reasoning that led to their exclusion.

Zivian and Gershater's opinions appear due, at least in part, to a lack of understanding of the purpose and nature of these preliminary reports. This misunderstanding may reflect our failure to define the difference between the preliminary and final report. The preliminary reports were meant to inform the emergency room physician of active or significant disease processes that demand immediate intervention, not to provide a comprehensive analysis of the CT images. Consequently, a preliminary report of "negative" in a trauma patient with chronic ischemic change may not be the result of "missing" the ischemic disease. The nighttime radiologists may have decided that because a finding was of no acute clinical consequence, it did not need to be included in the preliminary report. Thus, although the reports may not be identical, the difference is not a radiologic miss, and what Zivian and Gershater interpret as misses may simply be a failure of reporting. This is supported by a review of the insignificant discordant reports, in which 21 of the 23 were due to the absence of an insignificant finding on the preliminary report. In our article, we wished to focus on significant disagreements because these may have affected patient treatment or outcome, as well as being the disagreements that we believed were the result of a failure of perception or analysis, rather than a failure of reporting.

Zivian and Gershater are correct in stating that comparing these data with those of previous reports of radiologist accuracy may be comparing apples to oranges. We believe that regardless of how our data are treated, when a subjective assessment of radiologic and clinical significance is used as the measurement, comparison between studies should be made with caution. An insignificant disagreement to one radiologist may be either an agreement or a significant disagreement to another radiologist. Nevertheless, we believe that most clinical radiologists will concur with our analysis of discordant reports.

Regarding other methodological issues, we included the insignificant cases in the 716 case total because we were interested only in the rate of significant disagreement. In any event, discarding the 23 cases from the case total as Zivian and Gershater suggest leaves the rate of significant disagreement unchanged at 2%.
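The arithmetic behind this statement can be checked directly. In this minimal sketch the counts (16 significant disagreements, 716 cases, 23 insignificant disagreements) come from the text, and the variable names are hypothetical.

```python
significant = 16
all_cases = 716
reduced_cases = all_cases - 23   # insignificant disagreements removed

rate_all = significant / all_cases
rate_reduced = significant / reduced_cases

print(f"{rate_all:.1%} of {all_cases} cases, "
      f"{rate_reduced:.1%} of {reduced_cases} cases")
```

Both rates round to 2%, so the choice of denominator changes the significant disagreement rate only in the first decimal place.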

Zivian and Gershater acknowledge that there is no evidence to support their argument that a radiologist who makes insignificant errors is more likely to make significant errors. On the contrary, it may be that the ability to disregard insignificant findings while interpreting an image allows one to search out and identify significant findings more readily. It is even more dubious for Zivian and Gershater to conclude that radiologists who do not report an insignificant finding are more likely to make a significant error. In other words, as Zivian and Gershater admit, there are no data to support their statement that "insignificant disagreements are as important as significant disagreements." Having made both insignificant and significant errors in our clinical work, we definitely prefer the former to the latter.

Finally, we agree that there may be a bias in the reporting of the general radiologist towards the negative. This is a bias with which many of us struggle. One part of this may be, as Zivian and Gershater suggest, the high incidence of negative studies originating from the emergency department.

In summary, we believe Zivian and Gershater's objections are based on their misunderstanding of the clinical context of our insignificant disagreement data. They do not understand that an insignificant disagreement between a preliminary and final report is not the same thing as a miss. We believe it is more likely a failure of reporting and should not have been included in our assessment of nighttime radiologists' accuracy. Consequently, we remain confident in the value and validity of our study, data analysis, and conclusion.


References 

  1. Erly WK, Ashdown BC, Lucio RW II, Carmody RF, Sugar JF, Alcala JN. Evaluation of emergency CT scans of the head: is there a community standard? AJR 2003;180:1727–1730
