All authors: Department of Radiology, University of Pittsburgh and Magee-Womens Hospital, 300 Halket St., Pittsburgh, PA 15213.
Received April 10, 2002; accepted after revision June 24, 2002.
Address correspondence to D. Gur, Department of Radiology, University of Pittsburgh, 200 Lothrop St., Pittsburgh, PA 15213-2582.
Abstract
SUBJECTS AND METHODS. In a prospective study, 33 technologists at a central facility and five satellite breast imaging facilities recorded whether mammograms obtained during 3019 examinations showed negative findings or findings that indicated that additional procedures were required. The technologists were not specifically trained for the experiment. The technologists' interpretations were compared with radiologists' interpretations.
RESULTS. Technologists and radiologists agreed in 82% of the cases (77% negative findings and 5% requiring follow-up). Of the 175 cases recommended for follow-up by only the radiologists, 17 were ultimately biopsied and two were found to be malignant.
CONCLUSION. Even without undergoing additional training, technologists can perform at reasonable levels of accuracy in classifying screening mammograms. The possibility of using technologists to group cases after the technologists have undergone training is an interesting concept that should be explored further.
One study, performed by Hillman et al. [6], found that specifically trained physician assistants could interpret mammograms with a high degree of accuracy. Most other studies in this area have trained X-ray technologists to evaluate mammograms. Accuracy levels in these studies vary, but the findings, which were obtained in restricted laboratory settings, are encouraging [7,8,9].
Although perhaps somewhat dated, the concept of using nonphysicians to assist in the diagnostic process remains an interesting one because of the rapid changes in the clinical practice of screening mammography. Mammography technologists have frequent and direct access to the acquired images, and many have extensive experience reviewing mammograms with both negative and abnormal findings for quality assurance purposes. With additional training, therefore, technologists might be able to accurately sort screening mammograms into two basic categories: cases that require further evaluation, and cases with negative or clearly benign findings, for which the patient need only return at the next scheduled screening.
The purpose of this study was to assess in a prospective manner the baseline ability (i.e., before training) of mammography technologists to classify screening mammograms into these two groups.
Subjects and Methods
All mammography technologists at the hospital and the satellite facilities were invited to participate in this institutional review board-approved study of their ability to classify mammograms as either requiring or not requiring follow-up. The 33 technologists who participated received no formal training for the purpose of this study. At the time of the study, the technologists had worked specifically in mammography for 2 to 26 years (median, 13 years), and approximately 60% were certified by the American Cancer Society as instructors of breast self-examination.
The technologists were asked to spend a minute or two evaluating each screening mammogram they acquired while performing their routine quality assurance review. Because reviewing available previously obtained mammograms is part of their routine, the technologists had the opportunity to view prior mammograms, when available, during the assessment. Viewing conditions (e.g., room lighting and background masking) were not controlled during the study. After reviewing a screening mammogram, the technologist completed a short form recording the patient's and technologist's identification numbers, the date and site of the examination, and the category into which the technologist classified the mammogram. The two possible categories were as follows: the patient requires additional workup, such as additional views, sonography, or biopsy; or the patient may return in a year without further workup. Technologists were not asked to state the specific reason for recommending additional workup or, when applicable, the location of a suspected abnormality. Nor were they instructed to err on the false-positive side to increase sensitivity. Data were collected over approximately 6 weeks, yielding 3019 examination ratings.
All forms were collected by a department supervisor and submitted to the research staff. The data were entered into a database in which each technologist was identified only by a number, listed with the examination date, a patient identifier, and a binary classification code. Data on all cases were entered into a second database, together with the interpretation of the reading radiologist (one of nine), abstracted from the clinical report and coded as 0, recommending further patient evaluation; 1, negative findings; or 2, benign findings. All cases recalled by the radiologists were characterized by reason for recall and followed up to determine the final outcome (disposition) of the case.
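A minimal sketch of how the two databases might be linked for analysis. It assumes that examinations can be matched on patient identifier and examination date, and that the technologists' binary code is stored as a boolean with True meaning "additional workup"; the article does not specify either, and all class and function names are hypothetical. Radiologist code 0 is treated as a recall and codes 1 and 2 as routine return to screening, following the coding described above.

```python
from dataclasses import dataclass
from datetime import date

# Radiologist report codes, as defined in the text.
RAD_FOLLOW_UP = 0  # recommending further patient evaluation
RAD_NEGATIVE = 1   # negative findings
RAD_BENIGN = 2     # benign findings


@dataclass(frozen=True)
class TechRating:
    """One technologist form (hypothetical field names)."""
    technologist_id: int
    patient_id: str
    exam_date: date
    tech_follow_up: bool  # assumed encoding: True = additional workup recommended


@dataclass(frozen=True)
class MergedCase:
    technologist_id: int
    radiologist_code: int
    tech_follow_up: bool

    @property
    def rad_follow_up(self) -> bool:
        # Only code 0 is a recall; codes 1 (negative) and 2 (benign) both mean
        # routine return at the next screening.
        return self.radiologist_code == RAD_FOLLOW_UP


def merge(ratings: list[TechRating],
          rad_codes: dict[tuple[str, date], int]) -> list[MergedCase]:
    """Attach the radiologist's code to each technologist rating.

    rad_codes maps (patient_id, exam_date) to 0, 1, or 2. Ratings without a
    matching radiologist report are skipped.
    """
    merged = []
    for r in ratings:
        code = rad_codes.get((r.patient_id, r.exam_date))
        if code is None:
            continue  # no radiologist report found for this examination
        if code not in (RAD_FOLLOW_UP, RAD_NEGATIVE, RAD_BENIGN):
            raise ValueError(f"unexpected radiologist code: {code}")
        merged.append(MergedCase(r.technologist_id, code, r.tech_follow_up))
    return merged
```

Collapsing the three radiologist codes to a single boolean keeps the comparison with the technologists' two-category form symmetric, while the original code is retained for subset analyses such as the benign (code 2) cases.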
The radiologists who interpreted these mammograms as part of their clinical duties are all faculty members of the department and specialize in women's imaging in general and breast imaging in particular. The experience of these nine radiologists in breast imaging ranges from 3 to 20 years, all are certified according to the tenets of the Mammography Quality Standards Act [10], and they all actively participate in continuing medical education in related areas.
All nine radiologists interpreted the mammograms used in our study as part of their routine work. These mammograms were among the more than 42,000 screening mammograms interpreted each year at our facilities, with individual radiologists interpreting from 2000 to more than 5000 cases per year. All radiologists participate in monthly reviews of false-negative cases and of recall rates, and quality assurance review (including double interpretation) is performed routinely. The verified false-negative rate for this group of radiologists in 1999 and 2000 was approximately 0.75 per 10,000 interpretations.
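The verified false-negative rate can be put in perspective with a short calculation. The sketch below assumes, purely for illustration, that the rate of 0.75 per 10,000 interpretations applies uniformly both to the 3019 study examinations and to the annual screening volume; the function name is hypothetical.

```python
def expected_false_negatives(n_interpretations: int, rate_per_10k: float = 0.75) -> float:
    """Expected false-negative interpretations if the rate applies uniformly (an assumption)."""
    return n_interpretations * rate_per_10k / 10_000


study = expected_false_negatives(3019)     # examinations rated in this study
annual = expected_false_negatives(42_000)  # approximate annual screening volume
print(f"study: {study:.2f}, annual: {annual:.2f}")
```

Under that assumption, fewer than one false-negative interpretation (about 0.23) would be expected among the study examinations, compared with roughly three per year across 42,000 screening interpretations.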
Analysis
To assess the ability of the technologists to accurately classify screening mammograms, we computed the proportion of cases in which the technologist and the radiologist agreed, either that further evaluation was required or that the patient required no further workup. Similarly, we calculated the disagreement rate for cases classified by the technologist as requiring further action and by the radiologist as not requiring follow-up. Conversely, we also reviewed the cases in which the radiologist recommended follow-up and the technologist did not.
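The agreement and disagreement proportions described above come from a 2 × 2 cross-tabulation of the two binary decisions. A minimal sketch, assuming each case has already been reduced to a (technologist follow-up, radiologist follow-up) pair of booleans; the function names and cell labels are hypothetical.

```python
from collections import Counter


def agreement_table(pairs):
    """Cross-tabulate (tech_follow_up, rad_follow_up) boolean pairs into four cells."""
    cells = Counter()
    for tech, rad in pairs:
        if tech and rad:
            cells["both_follow_up"] += 1
        elif not tech and not rad:
            cells["both_no_follow_up"] += 1
        elif tech:
            cells["tech_only"] += 1  # technologist recommends follow-up, radiologist does not
        else:
            cells["rad_only"] += 1   # radiologist recommends follow-up, technologist does not
    return cells


def agreement_proportions(cells):
    """Express overall agreement and each disagreement cell as a proportion of all cases."""
    n = sum(cells.values())
    if n == 0:
        raise ValueError("no cases")
    return {
        "agreement": (cells["both_follow_up"] + cells["both_no_follow_up"]) / n,
        "tech_only": cells["tech_only"] / n,
        "rad_only": cells["rad_only"] / n,
    }
```

With the rounded figures given in the abstract, the two concordant cells (77% negative and 5% follow-up) sum to the reported 82% agreement.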
In evaluating the technologists' performance, we conservatively assumed that any cases missed by the radiologists would also be missed by the technologists; hence, the radiologists' interpretations served as the gold standard. We also examined the reasons for recall provided by the radiologists in their reports so that we could determine which features were interpreted as requiring follow-up by the radiologists but not by the technologists.
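Treating the radiologists' decisions as the reference standard, as described above, allows the technologists' performance to be summarized as sensitivity and specificity relative to the radiologists. The article does not give these formulas; the sketch below uses the standard definitions with hypothetical names, and returns NaN when a denominator is empty.

```python
import math


def performance_vs_reference(pairs):
    """Technologist sensitivity and specificity, with the radiologist's decision as reference.

    pairs: iterable of (tech_follow_up, rad_follow_up) booleans.
    """
    tp = fn = fp = tn = 0
    for tech, rad in pairs:
        if rad:               # radiologist recommended follow-up
            if tech:
                tp += 1
            else:
                fn += 1       # a "miss" relative to the radiologist
        else:                 # radiologist did not recommend follow-up
            if tech:
                fp += 1
            else:
                tn += 1
    sensitivity = tp / (tp + fn) if tp + fn else math.nan
    specificity = tn / (tn + fp) if tn + fp else math.nan
    return sensitivity, specificity
```

Because any case missed by the radiologist is assumed to be missed by the technologist as well, these figures describe agreement with the radiologist rather than accuracy against pathologic outcome.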
Cases recalled by the radiologists were initially divided into five categories by reason for recall: suspicious microcalcifications, suspicious nodule or mass, asymmetry, palpable mass not visualized on the screening mammogram, and miscellaneous findings. The miscellaneous category included cases with technical problems in the mammograms or the acquisition protocol, suspected implant rupture, regions of scar tissue, and cases with multiple reasons for recall. All breast examinations at our facilities (screening and diagnostic) include a clinical examination, and sonography, additional views, or both are often obtained for a palpable abnormality, regardless of whether it is depicted on the images. We therefore listed these cases in the initial distribution as reported by the radiologists but later excluded them from the analysis, because follow-up was recommended on the basis of clinical rather than imaging findings. Technical recalls, as determined by the radiologists, were also excluded from the analysis.
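The exclusion rules above can be expressed as a small filter. This is a sketch only: the enum values paraphrase the five categories in the text, the names are hypothetical, and the separate `technical_recall` flag is an assumption reflecting that technical recalls were excluded even though technical problems are grouped under miscellaneous findings.

```python
from enum import Enum
from typing import Optional


class RecallReason(Enum):
    MICROCALCIFICATIONS = "suspicious microcalcifications"
    NODULE_OR_MASS = "suspicious nodule or mass"
    ASYMMETRY = "asymmetry"
    PALPABLE_NOT_VISUALIZED = "palpable mass not visualized on the screening mammogram"
    MISCELLANEOUS = "miscellaneous findings"


def include_in_analysis(reason: Optional[RecallReason], technical_recall: bool = False) -> bool:
    """Return False for cases excluded from the analysis; reason is None if not recalled."""
    if reason is RecallReason.PALPABLE_NOT_VISUALIZED:
        return False  # recall driven by the clinical examination, not by imaging
    if technical_recall:
        return False  # technical recall, as determined by the radiologist
    return True
```

Keeping the excluded categories in the enum, rather than dropping them at data entry, preserves the initial distribution of recall reasons as reported by the radiologists.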
To investigate whether technologists, particularly before receiving special training, are more likely to classify a mammogram as showing suspicious findings when a benign abnormality is present, we separately analyzed the subset of cases that the radiologists had classified as Breast Imaging Reporting and Data System (BI-RADS) category 2 (benign finding) [11].
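The subset comparison described above amounts to computing the technologists' follow-up rate separately for BI-RADS category 2 cases and for all other cases. A minimal sketch, assuming each case is reduced to a pair (BI-RADS 2 in at least one breast, technologist recommended follow-up); the function name is hypothetical.

```python
import math


def tech_follow_up_rates(cases):
    """Return (follow-up rate among BI-RADS 2 cases, follow-up rate among all other cases).

    cases: iterable of (birads2_any_breast, tech_follow_up) booleans.
    """
    counts = {True: [0, 0], False: [0, 0]}  # subset -> [follow-ups, total]
    for birads2, tech in cases:
        key = bool(birads2)
        counts[key][1] += 1
        if tech:
            counts[key][0] += 1

    def rate(key):
        follow_ups, total = counts[key]
        return follow_ups / total if total else math.nan

    return rate(True), rate(False)
```

With the counts reported for the benign subset (183 of 832 cases), the first rate rounds to 22%.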
Results
Table 2 shows, as an example, the distribution of cases with agreement and disagreement between radiologists and technologists for the 11 technologists who each made 100 or more assessments. Of these 11 technologists, three (technologists 1, 7, and 9) had a disagreement rate below 5% for cases recommended for follow-up only by the radiologists. If the radiologists' interpretations are taken as the gold standard, the highest miss rate in this group was 10% (technologist 10), and the average was 6%. Similar results were observed for the group as a whole. However, because participation was completely voluntary, this group of technologists was partly self-selected and clearly had an active interest in participating. We report the level of agreement between the technologists' interpretations and the radiologists' interpretations.
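A sketch of a per-technologist miss-rate calculation restricted to technologists with at least 100 assessments. The text does not state the denominator of the miss rate; this version assumes it is all cases assessed by that technologist, with a "miss" being a case recalled by the radiologist but not by the technologist. Names are hypothetical.

```python
from collections import defaultdict


def miss_rates(cases, min_assessments=100):
    """Per-technologist fraction of assessed cases recalled only by the radiologist.

    cases: iterable of (technologist_id, tech_follow_up, rad_follow_up).
    Only technologists with at least min_assessments cases are returned.
    """
    totals = defaultdict(int)
    misses = defaultdict(int)
    for tech_id, tech, rad in cases:
        totals[tech_id] += 1
        if rad and not tech:
            misses[tech_id] += 1  # radiologist taken as the gold standard
    return {t: misses[t] / n for t, n in totals.items() if n >= min_assessments}
```

Filtering on volume before comparing rates avoids ranking technologists on a handful of cases, which is presumably why Table 2 is limited to those with 100 or more assessments.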
Table 3 shows the distribution of cases with agreement and disagreement between radiologists and technologists for the radiologists who each made more than 100 assessments in this group of cases. Of note are the average recall rate of 13% (for this group of radiologists and cases) and the wide variability in recall rates among radiologists (range, 9–19%). The largest difference between the radiologists' and the technologists' recall recommendations occurred in the subgroup of cases for which the radiologist's stated reason was asymmetry or architectural distortion.
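A sketch of the per-radiologist summary behind an average recall rate and its range, with hypothetical names. The text does not say whether the 13% average is pooled over cases or averaged across radiologists; this version pools recalls over all cases read by radiologists with more than 100 assessments.

```python
from collections import Counter


def recall_summary(cases, more_than=100):
    """Per-radiologist recall rates, pooled recall rate, and (min, max) range.

    cases: iterable of (radiologist_id, rad_follow_up).
    Only radiologists with more than `more_than` assessments are included.
    """
    totals = Counter()
    recalls = Counter()
    for rad_id, recalled in cases:
        totals[rad_id] += 1
        if recalled:
            recalls[rad_id] += 1
    kept = {r: n for r, n in totals.items() if n > more_than}
    if not kept:
        raise ValueError("no radiologist exceeds the assessment threshold")
    rates = {r: recalls[r] / n for r, n in kept.items()}
    pooled = sum(recalls[r] for r in kept) / sum(kept.values())
    return rates, pooled, (min(rates.values()), max(rates.values()))
```

A pooled rate weights each radiologist by reading volume, so high-volume readers dominate it; an unweighted mean of `rates` would be the alternative.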
Of the 72 cases recalled by the radiologists for asymmetry or architectural distortion, only 21 were recommended for recall by the technologists. Possible reasons for the higher-than-expected recall rate of the radiologists in this study include, but are not limited to, the following. One of the radiologists joined our group only a few weeks before data collection began; although experienced in interpreting diagnostic mammograms, this radiologist was not yet familiar with the type of cases we routinely see in our screening program. In addition, most of our patients have been enrolled in our screening program for several years, and positive findings in our population have been shown to be relatively more subtle, on average, than would generally be expected. Finally, because of the large volume of cases in our practice and the variability in technologists' ability to make independent decisions, all decisions about acquiring additional images are made by physicians; technologists acquire only four views per examination unless the study has a clear technical fault. Hence, our radiologists are generally more concerned about false-negative findings in subtle cases.
Of the 832 cases that the radiologists rated as BI-RADS category 2 (i.e., a benign finding) in at least one breast, technologists recommended follow-up for 183 (22%), compared with only 17% of all other cases. Of the 175 patients recommended for follow-up only by the radiologists, additional views were obtained in 96 (bilateral views in one patient), nine of whom underwent biopsy. Sonography was recommended for 43 patients, one of whom underwent biopsy. Both additional views and sonography were recommended for 36 patients, seven of whom ultimately underwent biopsy. In all, 17 biopsies were performed at our institution in this group of patients: two lesions were malignant and 15 were benign. One malignant lesion was depicted on mammography as a subtle cluster of microcalcifications; the other was suspected because the radiologist detected architectural distortion on mammography.
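The workup counts reported above can be checked for internal consistency, and the two malignancies expressed as a fraction of biopsies and of radiologist-only recalls. These derived percentages are not reported in the article; the sketch simply re-adds the published counts.

```python
# Workup pathways for the cases recommended for follow-up only by the
# radiologists, as reported in the text: pathway -> (patients, biopsies).
PATHWAYS = {
    "additional views": (96, 9),
    "sonography": (43, 1),
    "additional views and sonography": (36, 7),
}
MALIGNANT = 2

patients = sum(p for p, _ in PATHWAYS.values())
biopsies = sum(b for _, b in PATHWAYS.values())
assert patients == 175  # matches the reported number of radiologist-only recalls
assert biopsies == 17   # matches the reported number of biopsies

biopsy_ppv = MALIGNANT / biopsies    # fraction of biopsies that were malignant
cancer_yield = MALIGNANT / patients  # cancers per radiologist-only recall
print(f"{patients} patients, {biopsies} biopsies, "
      f"{biopsy_ppv:.1%} of biopsies malignant, {cancer_yield:.1%} cancer yield")
```

The three pathways account exactly for the 175 patients and 17 biopsies; about 12% of biopsies and about 1% of radiologist-only recalls yielded cancer.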
Discussion
Despite these encouraging results, using technologists as physician extenders in the diagnostic process would clearly require that they be specifically trained for this purpose; both their detection sensitivity and their specificity must improve beyond the baseline levels reported here. The wide variation in the number of cases reported per technologist stems from several factors, including work schedules and individual assignments to different areas (e.g., diagnostic, screening, biopsy). Because technologists were not required to report every screening case, and approximately 5000 screening procedures were performed during the study period, intentional case selection may have occurred; that is, some technologists may have reported only cases they perceived to be "easy" or "clear." However, given the overall level of compliance (approximately 60% of all cases were reported), we suspect that most participants did not intentionally select cases. Our preliminary results indicate that accurate identification of benign findings and recognition of potentially important asymmetry are among the areas in which training could improve the technologists' performance. Even if technologists can be trained to detect suspected abnormalities, characterizing those abnormalities as benign or malignant with a high degree of accuracy requires a substantial amount of training. This project addressed only the former step, detection.
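The compliance figure follows from the number of ratings collected and the approximate number of screening procedures performed during the study period; because that denominator is itself approximate, the result is only an estimate. A minimal sketch, with a hypothetical function name:

```python
def reporting_compliance(reported: int, performed: int) -> tuple[float, int]:
    """Return the fraction of performed screening examinations that were rated,
    and the number that were not rated."""
    if performed <= 0 or not 0 <= reported <= performed:
        raise ValueError("need performed > 0 and 0 <= reported <= performed")
    return reported / performed, performed - reported


fraction, unreported = reporting_compliance(3019, 5000)
print(f"{fraction:.1%} reported, {unreported} examinations not rated")
```

With 3019 ratings out of roughly 5000 examinations, about 60% of cases were reported and on the order of 2000 were not, which is the pool within which any intentional selection could have operated.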
We recognize that the approach investigated in this preliminary study has little impact on current practice. Our study was designed as a prospective, real-time experiment that did not substantially affect the technologists' workflow and did not affect the radiologists' workflow. To our knowledge, only a few studies of this type have been reported in the literature. By its nature, this design has limitations that must be recognized. Most important, cases were not specifically selected; hence, the number of cases with positive findings was small, and the number of errors, as shown by the estimated false-negative rate, was large. Finally, any attempt to formally include technologists in the diagnostic process would need to address a number of difficult issues, including, but not limited to, the litigious nature of mammography.