1 All authors: Department of Radiology, Division of Nuclear Medicine, Massachusetts General Hospital and Harvard Medical School, Fruit St., Boston, MA 02114.
Received September 13, 1999;
accepted after revision January 24, 2000.
Address correspondence to J. A. Scott.
Abstract
MATERIALS AND METHODS. Digital data were obtained from 100 patients with normal findings on chest radiographs who were undergoing both radionuclide ventilation-perfusion scanning and pulmonary angiography. Interpretations of differently trained neural networks were compared with those of three experienced nuclear medicine practitioners unaware of the clinical diagnosis.
RESULTS. Machines running neural networks performed similarly to experienced scan interpreters in the detection of pulmonary embolism. Both the human observers and the networks performed best in cases with large emboli. Neural network performance was best in the right lung, when the networks were trained using only cases with large emboli and when networks were trained independently in the right and left lungs. The best predictions resulted from a collaborative interpretation incorporating both the human and computer predictions.
CONCLUSION. Computers running artificial neural networks using scan data obtained directly from the anterior and posterior ventilation and perfusion images, without human involvement, perform comparably with experienced observers in patients with normal findings on chest radiographs. Human observers can improve their interpretations by incorporating computer output when formulating a diagnostic prediction. The method of training the networks is critical to optimizing performance.
Introduction
This study was performed to answer four questions. First, can interpretations performed by a neural network compare favorably with those performed by experienced human observers? Second, does the size of the embolus on which the network is trained influence its accuracy? In other words, will a network trained using only large segmental emboli perform better than one trained on cases with both large and small (i.e., isolated subsegmental) emboli? Third, does the ability of the network to detect emboli differ between the right and left lungs? Finally, does the incorporation of the results from two different networks (independently predicting the likelihood of embolism in the right and left lung) perform similarly to the more standard training pattern in which a single network is trained on the presence of emboli in either lung?
Materials and Methods
Chest Radiographs
Half the chest radiographs obtained at the time of the radionuclide study
were posterior and lateral examinations with the other half performed with the
patient in the anterior semiupright position. Normal interpretations of these
examinations specified the absence of pleural effusion, parenchymal
consolidation, lung resection, pneumothorax, and metallic artifacts such as
those caused by pacemakers. Neither subsegmental nor platelike atelectasis was
considered to be abnormal for purposes of the study because neither ordinarily
interferes with clinical scan interpretation.
Ventilation-Perfusion Imaging Procedure
All examinations were performed with a gamma camera (Orbiter; Siemens
Medical Systems, Hoffman Estates, IL). Ventilation-perfusion scanning
was performed with the patient in the supine position using 370 to 740
MBq of xenon-133 breathed through a closed delivery system. Washin,
equilibrium, and washout images were obtained in the posterior projection. The
study consisted of an initial washin image obtained for 100 K-counts or until
the limit of breath retention, followed by two 90-sec equilibrium images, three
washout images of 45 sec each, and three trapping images of 60 sec each. Only
the first equilibrium image was quantified, this being performed between 30
and 120 sec of xenon inhalation. All images were obtained with a large
field-of-view gamma camera in a 128 × 128 digital matrix. Perfusion
images were obtained after the IV administration of approximately 150 MBq of
99mTc-macroaggregated albumin with the patient in the supine
position, with acquisitions totaling 1000 K-counts each. Only the anterior and
posterior perfusion images were quantified.
Pulmonary Angiography
Pulmonary angiography was performed in accordance with the Prospective
Investigation of Pulmonary Embolism Diagnosis (PIOPED) guidelines
[6]. The angiograms were
interpreted by experienced angiographers at this institution. Separate
injections were administered for each lung with magnified oblique views of the
lung bases. The results were documented indicating the presence or absence of
acute pulmonary embolism, the size and location of the embolus, and the
presence or absence of signs suggesting chronic embolism.
Study Population
One hundred studies were obtained from patients who had normal findings on
chest radiographs followed by both angiography and radionuclide
ventilation-perfusion imaging. Data were available for all patients
regarding the size, chronicity, and location of emboli as shown on
angiography. The mean age of these patients was 57 years with age ranging from
26 to 88 years. Chart review indicated that 25% of the studies (n =
25) had been officially interpreted as showing low probability, 65%
(n = 65) as showing intermediate probability, and 10% (n =
10) as showing high probability. Pulmonary emboli were present in 32 of the
100 cases. Twenty-eight of these emboli were acute and four were chronic on
the basis of their appearance on angiography. Six of the cases showed isolated
subsegmental emboli. Twenty-six cases of emboli were in the right lung (11 in
the upper lobe, 10 in the middle lobe, and 22 in the lower lobe). Twenty-two
cases of emboli were in the left lung (seven in the upper lobe and 18 in the
lower lobe). One case was excluded from consideration with respect to the
right lung because of uncertainty in the angiographic interpretation.
Image Parameters
The perfusion and ventilation images were evaluated with a user-independent
whole-lung region-of-interest method. First, an edge-detection program was
used to separate the right and left lungs in the posterior and anterior
perfusion images and in the (posterior) equilibrium ventilation image. The
lower gray-scale cutoff was adjusted on each image to the lowest setting
permitting separation of the two lungs as independent regions of interest.
This setting was used to define the right- and left-lung regions of interest
in subsequent data analysis.
Several indexes were derived from the two regions of interest taken from each image, each region of interest representing the entire right or left lung. These indexes included the total area of the region of interest, the vertical centroid (a y-axis area-weighted center-of-mass measurement for each lung) on both the ventilation and perfusion images, the longest axis in each lung region of interest, the mean pixel count (range, 0-255) in the region of interest, and the standard deviation of counts per pixel in the region of interest. These indexes were obtained from the posterior perfusion, anterior perfusion, and equilibrium ventilation images using image analysis software (Optimas 6; Optimas, Bothell, WA).
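The whole-lung indexes listed above can be illustrated with a minimal sketch. It assumes each lung region of interest is available as a binary mask over a gray-scale image; the function name `roi_indexes`, the use of pixel-center distances for the longest axis, and the population form of the standard deviation are assumptions made for illustration and are not taken from the Optimas software used in the study.

```python
import math

def roi_indexes(image, mask):
    """Compute whole-lung indexes for one region of interest (ROI).

    image: list of rows of pixel counts (0-255).
    mask:  same shape as image; 1 marks a pixel inside the ROI.
    """
    # Collect (x, y, count) for every pixel inside the ROI.
    pixels = [(x, y, image[y][x])
              for y, row in enumerate(mask)
              for x, inside in enumerate(row) if inside]
    area = len(pixels)
    # Vertical centroid: area-weighted mean of the row (y) coordinate.
    y_centroid = sum(y for _, y, _ in pixels) / area
    mean_count = sum(c for _, _, c in pixels) / area
    # Population standard deviation of counts per pixel (an assumption).
    sd_count = math.sqrt(sum((c - mean_count) ** 2 for _, _, c in pixels) / area)
    # Longest axis: largest distance between any two ROI pixel centers.
    longest_axis = max(math.hypot(x1 - x2, y1 - y2)
                       for x1, y1, _ in pixels
                       for x2, y2, _ in pixels)
    return {"area": area, "y_centroid": y_centroid, "longest_axis": longest_axis,
            "mean_count": mean_count, "sd_count": sd_count}

# Toy 4 x 4 image with a 2 x 3 pixel "lung" (illustrative values only).
image = [[0, 0, 0, 0],
         [0, 10, 20, 0],
         [0, 30, 40, 0],
         [0, 50, 60, 0]]
mask = [[0, 0, 0, 0],
        [0, 1, 1, 0],
        [0, 1, 1, 0],
        [0, 1, 1, 0]]
idx = roi_indexes(image, mask)
print(idx)
```

The brute-force longest-axis search is quadratic in the number of ROI pixels, which is acceptable for a sketch but would be slow on a full 128 × 128 lung region.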
The indexes were then combined to form a series of six parameters, providing a total of 12 independent image statistics with six of the independent statistics derived from each side of the chest. These parameters included the ratio of ventilation area to perfusion area, the ratio of ventilation longest axis to perfusion longest axis, the mean count density, the ratio of the standard deviation of the count density to the mean count density in the posterior perfusion image, the vertical (y-axis) centroid on the perfusion image, and the ratio of the ventilation y-axis centroids to perfusion y-axis centroids.
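As a hedged illustration of how the six parameters per lung might be assembled from the indexes, the sketch below takes one dictionary of indexes for the ventilation image and one for the perfusion image. The text does not state which perfusion view supplies each ratio or the mean count density, so using a single perfusion ROI for all of them is an assumption, as are the function name `lung_parameters` and the dictionary keys.

```python
def lung_parameters(vent, perf):
    """Combine ROI indexes from one lung into six network inputs.

    vent, perf: dicts with keys "area", "longest_axis", "mean_count",
    "sd_count", and "y_centroid" (hypothetical names).
    """
    return [
        vent["area"] / perf["area"],                  # ventilation/perfusion area ratio
        vent["longest_axis"] / perf["longest_axis"],  # ventilation/perfusion longest-axis ratio
        perf["mean_count"],                           # mean count density (perfusion assumed)
        perf["sd_count"] / perf["mean_count"],        # SD/mean of perfusion count density
        perf["y_centroid"],                           # perfusion vertical centroid
        vent["y_centroid"] / perf["y_centroid"],      # ventilation/perfusion centroid ratio
    ]

# Illustrative index values only; not data from the study.
vent = {"area": 1200, "longest_axis": 60.0, "mean_count": 90.0,
        "sd_count": 20.0, "y_centroid": 66.0}
perf = {"area": 1000, "longest_axis": 50.0, "mean_count": 120.0,
        "sd_count": 30.0, "y_centroid": 60.0}
right = lung_parameters(vent, perf)
left = lung_parameters(vent, perf)   # a real case would use left-lung indexes
both_lungs = right + left            # 12 inputs for a network covering either lung
print(right, len(both_lungs))
```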
Presentation of Scans to Clinical Observers
Three nuclear medicine staff physicians, each with at least 15 years'
clinical experience, interpreted the scans while unaware of the clinical and
angiographic findings. The digital scans were presented on CD-ROM. Each
observer was asked to estimate the likelihood of embolism in the left lung, in
the right lung, and the overall likelihood of embolism, ranking this from 0%
to 100%. This ranking was to represent the observer's best initial idea or
gestalt estimate for each case. The results were analyzed separately for each
observer.
Artificial Neural Networks
All artificial neural networks were constructed using software
(Neuroshell2; Ward Systems Group, Frederick, MD). Three sets of neural
networks were trained. Two different networks were used to independently
evaluate the likelihood of embolism in the left lung and the likelihood of
embolism in the right lung. Each of these two networks used the six input
parameters and was trained for the presence or absence of embolism in the lung
of interest. A third network was constructed from all 12 inputs obtained from
both lungs and trained on the likelihood of embolism in either lung. Then,
within these three groups, networks were trained either using cases with acute
segmental or large emboli or using cases with acute emboli of any size or
number (including isolated subsegmental emboli). Patients with chronic emboli
were excluded from the training process because ventilation-perfusion
scanning is generally considered to be most sensitive to acute emboli.
However, an artificial neural network trained on the patients with acute
emboli was subsequently tested on the chronic cases.
For prediction of the likelihood of embolism in an individual lung, a three-layer backpropagation network architecture was used with six inputs, a single hidden layer of two nodes, and one output node, expressing the likelihood of embolism. For prediction of the likelihood of embolism in either lung, the same network architecture was used except with 12 inputs, a single hidden layer of four hidden nodes, and a single output. A logistic activation function was applied to the hidden layer. Each network was trained for 1000 iterations. The output of the artificial neural network was the scaled likelihood of pulmonary embolism expressed as a continuous variable from 0 to 100. The artificial neural networks were applied using the "jackknife" method [7], in which all cases except one were used to train the artificial neural network, which was then applied to the single excluded case. This procedure was repeated 100 times on the 100 cases so that each case was left out only once (i.e., used as the test case for an artificial neural network trained on the remaining cases).
In this manner, data obtained from the left lung (six inputs) were used to predict the likelihood of embolism in the left lung. The networks trained on left-lung data using only cases without emboli or with acute segmental emboli were termed "ANNLSEG." The networks trained on left-lung data using cases with either no emboli or acute emboli of any size were termed "ANNLALL." The right-lung counterparts, "ANNRSEG" and "ANNRALL," were trained using six inputs derived from the right lung, reflecting the presence or absence of emboli in the right lung. Neural networks designed to predict the overall presence or absence of embolism in either lung incorporated all 12 inputs. These were termed "ANNSEG" when only acute segmental emboli were allowed in the training set and "ANNALL" when acute emboli of any size were allowed in the training set. Another parameter called "ANNMAX" was created by taking the highest likelihood of embolism in the left or right lung produced by networks trained separately on the two lungs (i.e., the higher of ANNRSEG and ANNLSEG values). A set of parameters termed "observer 1CONSULT," "observer 2CONSULT," and "observer 3CONSULT" represented the arithmetic mean of the human observer's interpretation and the network prediction for that case. This figure represented an attempt to define the utility of the network as it might be used in actual clinical practice.
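The two derived figures described above reduce to simple arithmetic on the 0 to 100 likelihood scale. The following sketch uses hypothetical function names (`ann_max`, `consult`) and made-up predictions. Which network output enters the consult figure is not stated at this point in the text, so pairing the observer with ANNMAX is an assumption.

```python
def ann_max(right_pred, left_pred):
    """ANNMAX: the higher of the right-lung and left-lung network likelihoods."""
    return max(right_pred, left_pred)

def consult(observer_pred, network_pred):
    """CONSULT: arithmetic mean of the observer and network likelihoods."""
    return (observer_pred + network_pred) / 2

# Illustrative values only (percent likelihood of embolism).
annrseg, annlseg = 35.0, 62.0
network = ann_max(annrseg, annlseg)          # 62.0
observer1_consult = consult(80.0, network)   # (80 + 62) / 2 = 71.0
print(network, observer1_consult)
```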
Although the jackknife procedure was used for all training cases, the cases excluded from training differed depending on the experiment. These cases were dealt with in the following manner. Chronic emboli (as defined by their angiographic appearance) were not included in the training sets, although networks derived from the trained cases were subsequently applied to the interpretation of scans in patients shown to have chronic emboli on angiography. Networks trained on only acute segmental or larger emboli (i.e., ANNSEG, ANNLSEG, ANNRSEG) were subsequently applied to the ventilation-perfusion scans in patients with isolated acute subsegmental emboli on angiography.
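The three-layer architecture and the jackknife procedure can be sketched in plain Python as follows. This is a minimal illustration and not the Neuroshell2 implementation: the learning rate, weight initialization, squared-error loss, online weight updates, and logistic output node are all assumptions, and the data are synthetic. The `trainable` flag mimics the handling described above, in which some cases (for example, chronic emboli) receive a prediction but are never used for training.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(W1, W2, x):
    # The last element of each weight list is the bias term.
    h = [sigmoid(sum(w * v for w, v in zip(ws, x)) + ws[-1]) for ws in W1]
    out = sigmoid(sum(w * v for w, v in zip(W2, h)) + W2[-1])
    return h, out

def train_network(X, y, n_hidden=2, iterations=1000, lr=0.5, seed=0):
    """Train an inputs -> n_hidden -> 1 network by online backpropagation."""
    rng = random.Random(seed)
    n_in = len(X[0])
    W1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    W2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
    for _ in range(iterations):
        for x, target in zip(X, y):
            h, out = forward(W1, W2, x)
            d_out = (out - target) * out * (1 - out)   # squared-error gradient
            d_hid = [d_out * W2[j] * h[j] * (1 - h[j]) for j in range(n_hidden)]
            for j in range(n_hidden):
                W2[j] -= lr * d_out * h[j]
                for i in range(n_in):
                    W1[j][i] -= lr * d_hid[j] * x[i]
                W1[j][-1] -= lr * d_hid[j]
            W2[-1] -= lr * d_out
    return W1, W2

def predict(model, x):
    """Return the likelihood of embolism scaled from 0 to 100."""
    return 100.0 * forward(model[0], model[1], x)[1]

def jackknife(X, y, trainable, **kwargs):
    """Leave each case out once; train on the remaining trainable cases."""
    preds = []
    for i in range(len(X)):
        idx = [j for j in range(len(X)) if j != i and trainable[j]]
        model = train_network([X[j] for j in idx], [y[j] for j in idx], **kwargs)
        preds.append(predict(model, X[i]))
    return preds

# Synthetic data: 10 cases with six inputs each (illustrative only).
rng = random.Random(1)
X, y = [], []
for k in range(10):
    label = k % 2
    X.append([label + rng.uniform(-0.1, 0.1)] + [rng.uniform(0, 1) for _ in range(5)])
    y.append(label)
trainable = [True] * 9 + [False]   # last case is tested but never trained on
preds = jackknife(X, y, trainable, n_hidden=2, iterations=200)
print([round(p, 1) for p in preds])
```

For the 12-input network covering either lung, the same sketch would be called with `n_hidden=4`.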
It is important to document the ability of the three human observers to interpret the studies because these findings are used to reference performance of the neural networks. The interpretive accuracy of the three observers can be inferred from their performance over the past 14 years of clinical scan interpretation. During this time, the three observers interpreted 4931 scans, of which 6.7% (332) were interpreted as showing findings that were normal, 57.1% (2816) as low probability, 29.0% (1428) as indeterminate or as intermediate probability, and 7.2% (355) as high probability. Of the selected cases that went to angiography, pulmonary embolism was present in 11% (17/149) of the normal or low-probability cases, 38% (167/435) of the intermediate-indeterminate group, and 93% (42/45) of the high-probability group. The individual reading accuracy of the three observers did not significantly differ (observer 1, 91%; observer 2, 89%; observer 3, 90%). Here, accuracy is defined as the number of patients with normal or low-probability interpretations with no emboli on angiography plus the number of patients with high-probability interpretations who had emboli on angiography divided by the total number of patients with normal, low-, and high-probability interpretations. Patients at this institution who underwent angiography after ventilation-perfusion scanning typically represented cases in which the clinical evaluation was discordant with the ventilation-perfusion scan interpretation. The figures are thus likely to overstate the overall error rate of ventilation-perfusion scanning. These results are consistent with other published data [6, 8].
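As a worked check of the accuracy definition, the sketch below applies it to the pooled angiography counts quoted above (17/149 and 42/45). The variable names are illustrative. The pooled result falls within the range of the three individual accuracies reported, but per-observer counts are not given in the text, so the individual figures cannot be reproduced here.

```python
# Pooled counts for cases that went to angiography (from the text).
normal_low_total, normal_low_with_pe = 149, 17
high_total, high_with_pe = 45, 42

# Correct calls: normal/low-probability without emboli plus high-probability with emboli.
correct = (normal_low_total - normal_low_with_pe) + high_with_pe   # 132 + 42
accuracy = correct / (normal_low_total + high_total)               # 174 / 194

print(correct, round(100 * accuracy, 1))   # 174 89.7
```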
Statistical Analysis
Standard receiver operating characteristic (ROC) curve analysis (SPSS Base
9.0; Statistical Package for the Social Sciences, Chicago, IL) was performed
using the z statistic with a two-tailed p value to determine
whether the performance of the network was significantly better than chance.
The index of performance for the artificial neural networks was the area under
the ROC curves (Az) expressed as Az
± SEM (standard error of the mean). Comparison of two ROC curves was
performed using software (Accuroc 2.0; Accumetric, Quebec, Canada). Plots were
constructed using Rockit software (beta version 0.9.1; Metz CE, Chicago, IL)
on Windows 95 (Microsoft, Redmond, WA). Ranked data were analyzed using the
Wilcoxon signed rank test. All data are shown as mean ± SEM.
Differences between means were established using a two-tailed t test.
A 350-MHz processor-based computer (Pentium II; Intel, Santa Clara, CA)
constructed by one of the authors was used to perform all calculations.
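To make the ROC comparison with chance concrete, the following sketch computes a nonparametric area under the ROC curve (the Mann-Whitney estimate), a standard error using the Hanley-McNeil approximation, and a two-tailed p value from the z statistic. This is a swapped-in textbook method shown for illustration. It is not the procedure implemented in the SPSS, Accuroc, or Rockit software used in the study, and the scores are invented.

```python
import math

def roc_area(scores_pos, scores_neg):
    """Nonparametric ROC area: the probability that a case with embolism
    scores higher than a case without, counting ties as one half."""
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in scores_pos for n in scores_neg)
    return wins / (len(scores_pos) * len(scores_neg))

def hanley_mcneil_se(a, n_pos, n_neg):
    """Standard error of the ROC area (Hanley-McNeil approximation)."""
    q1 = a / (2 - a)
    q2 = 2 * a * a / (1 + a)
    var = (a * (1 - a) + (n_pos - 1) * (q1 - a * a)
           + (n_neg - 1) * (q2 - a * a)) / (n_pos * n_neg)
    return math.sqrt(var)

def two_tailed_p(z):
    """Two-tailed p value for a standard normal z statistic."""
    return math.erfc(abs(z) / math.sqrt(2))

# Invented likelihood scores (0-100) for cases with and without embolism.
pos = [90, 80, 70, 55, 30]
neg = [60, 40, 35, 20, 10, 5]

area = roc_area(pos, neg)                  # 26 of 30 pairs ranked correctly
se = hanley_mcneil_se(area, len(pos), len(neg))
z = (area - 0.5) / se                      # test against chance (area 0.5)
p = two_tailed_p(z)
print(round(area, 3), round(se, 3), round(z, 2), round(p, 4))
```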
Results
All networks trained on the presence or absence of embolism in either lung performed better than chance expectation. Two of these three networks were ANNALL and ANNSEG, which were trained for the presence or absence of any acute embolus and for the presence of segmental or larger acute emboli, respectively. A third figure, ANNMAX, represented the higher of the individual predictions of ANNLSEG and ANNRSEG on a given scan. Although all whole-lung networks made predictions significantly different from chance, the magnitude of the Az values was consistently in the order of ANNMAX > ANNSEG > ANNALL.
Figure 1 shows the positive and negative predictive values for the three human observers as well as for the network ANNMAX in predicting the presence or absence of emboli in either lung. Figure 2 shows similar data for predictions in the right lung alone, and Figure 3 shows predictions for the left lung alone. In all cases, the networks performed comparably with the human observers.
Figure 4 shows the ROC curves for the three human observers and for ANNMAX in detecting acute segmental emboli in either lung. The corresponding Az values are shown in Table 2. Figure 5 shows the ROC curves for the three human observers and ANNRSEG in the right lung. The corresponding Az values were the following: ANNRSEG, 0.811 ± 0.065; observer 1, 0.819 ± 0.058; observer 2, 0.818 ± 0.060; and observer 3, 0.825 ± 0.060. Figure 6 shows the ROC curves for the three human observers and ANNLSEG in the left lung. The Az values were the following: ANNLSEG, 0.711 ± 0.066; observer 1, 0.725 ± 0.081; observer 2, 0.819 ± 0.059; and observer 3, 0.705 ± 0.090.
Table 3 categorizes the likelihood of embolism in either lung according to the ranges suggested in the revised PIOPED criteria [6, 9]. In accordance with published practice, this defines a low probability as indicating a less than 20% likelihood of embolism and high probability as a greater than 80% likelihood. The network and human predictions were translated into probability categories according to this system, as might reflect the case in actual clinical practice. A definitive interpretation thus implies an estimated likelihood of embolism of more than 80% or less than 20%. An error, for our purposes, was defined as estimating the likelihood of embolism as low when embolism was present or vice versa. As might be expected, there was a higher error rate associated with the detection of subsegmental emboli and with a higher proportion of definitive interpretations. Observer 1 interpreted 73% of the cases (n = 73) definitively while making 16 errors. The network classified 71% (n = 71) definitively, while making 10 errors. Observers 2 and 3 interpreted fewer cases definitively (40% and 51%, respectively) while making correspondingly fewer errors (five errors each).
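The categorization and error definition used here can be sketched as follows. The thresholds (below 20% low, above 80% high) and the definition of an error come from the text. The function names and the example predictions are hypothetical.

```python
def categorize(likelihood):
    """Map a 0-100 likelihood of embolism to a probability category."""
    if likelihood < 20:
        return "low"
    if likelihood > 80:
        return "high"
    return "intermediate"

def summarize(predictions, embolism_present):
    """Count definitive interpretations (low or high) and errors among them."""
    definitive = errors = 0
    for likelihood, has_pe in zip(predictions, embolism_present):
        category = categorize(likelihood)
        if category == "intermediate":
            continue
        definitive += 1
        # Error: low probability with embolism, or high probability without.
        if (category == "low" and has_pe) or (category == "high" and not has_pe):
            errors += 1
    return definitive, errors

# Invented predictions and angiographic results for eight cases.
predictions = [5, 15, 50, 85, 95, 10, 90, 45]
embolism_present = [False, True, False, True, True, False, False, True]
definitive, errors = summarize(predictions, embolism_present)
print(definitive, errors)   # 6 2
```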
A "consult" figure was defined to represent the arithmetic mean of the human observer's and the network's predictions in given cases. This mean reflects a hypothetical clinical scenario in which the network's prediction was weighted equally with the clinician's estimate. Table 2 shows the Az values for the individual human observers compared with these consult scores. In all cases, the predictions were significantly better than chance (p < 0.001). In all cases, the consult figure showed a higher Az value than did the observer or network alone.
Discussion
Recent work has examined interobserver variability among angiographers in identifying isolated subsegmental emboli [13]. Such variability in the gold standard would impede neural network training. Subsegmental emboli may also be small enough to fall into the spectrum of "normal variability" in scan appearance. These cases would hinder training because the network would associate similar scan data with both the presence and absence of embolism on angiography. These concerns prompted us to create two sets of networks: the first one to train on only acute segmental or larger emboli and the second one to train on acute emboli of any size. Networks trained on only large emboli produced the highest Az values during testing. More clear-cut training data appear to permit better convergence of the network.
Empirically, most scan interpreters believe that the left lung is more difficult to evaluate than is the right, owing to variability imposed by the cardiac silhouette. We examined this by creating networks using statistical parameters derived from only one lung and training these on the presence or absence of emboli in only that lung. Consistent with expectation, performance of the networks in the right lung showed higher Az values than did similar networks in the left lung. Among the human observers as well, the positive predictive value in the left lung was lower than that in the right lung.
If there is considerable variability in the appearance of the left lung, could left-lung inputs obscure more meaningful right-lung data in a network designed to predict the overall likelihood of embolism (trained on inputs from both lungs)? To answer this question, we compared two different methods of predicting the likelihood of embolism. The first method used six left-lung and six right-lung inputs to predict the presence or absence of embolism in either lung (ANNALL). The second method took the higher likelihood of embolism from two six-input networks separately trained on the left and right lungs (ANNMAX). Across all subgroups of patients, training individual networks in each lung proved superior to a single network trained on both lungs together. The crossover of the ROC curves in Figure 6 reflects a skewing of distribution of ANNLSEG predictions (relative to the human observers) toward underdiagnosis of embolism in the left lung.
Although we have previously shown that similar "statistical" networks predicted better than the official reports in the patient's chart [5], this comparison may not adequately reflect the gestalt of an experienced scan interpreter. Although an experienced observer may estimate a 75% likelihood of embolism, this estimate is obscured in a categoric indeterminate or intermediate interpretation. Comparing the network with gestalt interpretations using percentage likelihoods provides a more rigorous evaluation of the method than does a retrospective comparison with chart-based data.
As is evident in Figures 1–6, the neural network performed essentially indistinguishably from the three clinical interpreters' gestalt predictions. Table 3 shows that a higher error rate is generally associated with a higher proportion of definitively interpreted cases. Despite this, the network made only 10 errors (low probability in the presence of embolism or high probability in its absence) compared with 16 errors made by observer 1 with a similar definitive interpretation ratio (71% versus 73%). These observations suggest that truncating scan interpretations to arbitrary cutoff points of 20% or 80% may sacrifice much of the experiential gestalt of a seasoned scan interpreter. The superiority of experienced gestalt interpretations compared with the algorithm-based scan categorizations has been noted previously [14].
We modeled the clinical application of the networks by creating a consult figure, representing the likelihood of embolism as the arithmetic mean of the human and network predictions. Table 2 shows uniformly higher Az values for the consult than for either the human or network prediction. Even the best human predictions (observer 2) were improved by averaging this observer's prediction with that obtained from the network. Viewing the scan from different perspectives, the human and computer interpreters appear to make different types of errors. Thus, an error made by one will often be corrected by including the prediction of the other. This synergy is observed when two successfully predictive but imperfectly correlated systems are used together. An optimal clinical strategy may thus involve a "partnership" between man and machine.
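One simplified way to see why averaging two imperfectly correlated predictors can help is sketched below. It is an idealized illustration and not an analysis performed in this study. It assumes that the human and network prediction errors, $e_h$ and $e_n$, are unbiased, share a common variance $\sigma^2$, and have correlation $\rho$.

```latex
\operatorname{Var}\!\left(\frac{e_h + e_n}{2}\right)
  = \frac{\sigma^2 + \sigma^2 + 2\rho\,\sigma^2}{4}
  = \frac{\sigma^2\,(1 + \rho)}{2}
  \le \sigma^2
```

Under these assumptions, equality holds only when $\rho = 1$, so any imperfect correlation between the two sources of error reduces the error variance of the averaged prediction.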
In conclusion, these data confirm that the adjunctive use of artificial neural networks can be helpful in interpreting ventilation-perfusion scans, either as a "second opinion" in the absence of a clinical colleague or for training purposes. To paraphrase Bertolt Brecht, artificial intelligence may not open the door to wisdom, but it may at least set limits to error [15].