1 Department of Radiology, University of Occupational and Environmental Health
School of Medicine, Iseigaoka 1-1, Yahatanishi-ku, Kitakyushu-shi, Japan
807-8555.
2 Nippon Bunri University General Research Center, Nippon Bunri University,
Ichiki 1727, Oita-shi, Japan 870-0397.
3 Kurt Rossmann Laboratories for Radiologic Image Research, Department of
Radiology, The University of Chicago, 5841 S. Maryland Ave., Chicago, IL
60637.
Received January 25, 2001;
accepted after revision August 30, 2001.
Presented at the annual meeting of the American Roentgen Ray Society,
Washington, DC, May 2000.
Abstract
MATERIALS AND METHODS. We selected 155 cases with pulmonary nodules less than 3 cm (99 malignant nodules and 56 benign nodules). An artificial neural network was used to distinguish benign from malignant nodules on the basis of seven clinical parameters and 16 radiologic findings that were extracted by attending radiologists using subjective rating scales. In the observer test, 12 radiologists (four attending radiologists, four radiology fellows, and four radiology residents) were presented with high-resolution CT images, first without and then with the artificial neural network output. Observer performance was evaluated by means of receiver operating characteristic analysis using a continuous rating scale.
RESULTS. The artificial neural network showed a high performance in differentiating benign from malignant pulmonary nodules (Az = 0.951). The average Az value for all radiologists increased significantly, from 0.831 to 0.959, with the use of the artificial neural network output.
CONCLUSION. Our computerized scheme using the artificial neural network can improve the diagnostic accuracy of radiologists who are differentiating benign from malignant pulmonary nodules on high-resolution CT.
Introduction
Artificial neural networks have been studied intensively in the field of computer science in recent years and have been shown to be a powerful tool for a variety of data-classification and pattern-recognition tasks. However, the usefulness of artificial neural networks in diagnostic radiology has been reported only for chest radiography and mammography, in the diagnosis of pulmonary nodules, interstitial lung disease, pediatric lung lesions, and breast nodules [7,8,9,10,11,12,13]. The purpose of this study was to apply an artificial neural network to the differentiation of benign from malignant pulmonary nodules on high-resolution CT images and to evaluate the effect of artificial neural network output on the performance of radiologists using receiver operating characteristic (ROC) analysis.
Materials and Methods
CT was performed with a TCT-900S Helix (Toshiba, Tokyo, Japan) or Somatom Plus 4 (Siemens Medical Solutions, Erlangen, Germany). Routine scanning of the whole lung (120 kVp, 150 mA or 140 kVp, 170 mA) was first performed using the helical mode with a table speed of 10 mm/sec and a 10- or 5-mm collimation. Images were printed at fixed settings (lung window center, -650 to -700 H; lung window width, 1500-1600 H; mediastinum window center, 35-50 H; mediastinum window width, 300-360 H). Additional high-resolution CT with 2.0-mm collimation (120 kVp, 250 mA or 140 kVp, 145 mA and 1.0- or 0.75-sec scanning time) covering the tumor was performed in all patients. High-resolution CT images were reconstructed with a high-spatial-frequency algorithm and printed at fixed settings (lung window center, -650 or -700 H; lung window width, 1500 H; mediastinum window center, 35 or 50 H; mediastinum window width, 300 H). All scans were obtained with the patients in the supine position and at end inspiration.
Artificial Neural Network Scheme
We used a three-layer, feed-forward, artificial neural network with a
back-propagation algorithm that was developed at The University of Chicago. We
designed the artificial neural network with 23 input units for seven clinical
parameters and 16 radiologic findings and one output unit corresponding to the
likelihood of malignancy. The clinical parameters included the patient's age,
sex, history of smoking, underlying malignancy, history of familial
malignancy, weight loss, and severity of symptoms. Sixteen radiologic findings
were classified into three categories: features related to nodules (size,
shape, concavity, border definition, irregular undulation, spiculation, air
space, air bronchogram, ground-glass opacity, and calcification); secondary
abnormalities (pleural indentation, satellite lesions, arterial involvement,
and venous involvement); and additional abnormalities (emphysematous changes
and lymphadenopathy). High-resolution CT images were used for all radiologic
findings, except for lymphadenopathy for which we used conventional CT
images.
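The network layout described above can be sketched as a forward pass through a three-layer, feed-forward network with 23 input units and one output unit. This is only an illustration, not the software developed at The University of Chicago: the hidden-layer size (`N_HIDDEN`), the sigmoid activation, the random initial weights, and all function names are assumptions that do not appear in the text, and back-propagation training is not shown.

```python
import math
import random

N_INPUTS = 23   # 7 clinical parameters + 16 radiologic findings (from the text)
N_HIDDEN = 10   # assumption: the hidden-layer size is not stated in this section


def sigmoid(x):
    """Logistic activation (assumed); maps any real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def init_weights(n_inputs, n_hidden, seed=0):
    """Small random weights; the last entry of every weight row is the bias."""
    rng = random.Random(seed)
    w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs + 1)]
                for _ in range(n_hidden)]
    w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
    return w_hidden, w_out


def forward(inputs, w_hidden, w_out):
    """Return one output in (0, 1), read as the likelihood of malignancy."""
    if len(inputs) != len(w_hidden[0]) - 1:
        raise ValueError("unexpected number of input values")
    # zip stops at the last input, so the trailing bias weight is added separately.
    hidden = [sigmoid(w[-1] + sum(wi * xi for wi, xi in zip(w, inputs)))
              for w in w_hidden]
    return sigmoid(w_out[-1] + sum(wi * hi for wi, hi in zip(w_out, hidden)))


w_hidden, w_out = init_weights(N_INPUTS, N_HIDDEN)
example_case = [0.5] * N_INPUTS  # hypothetical case with every input at mid-scale
print(forward(example_case, w_hidden, w_out))
```

With untrained weights the output carries no diagnostic meaning; the sketch only shows how 23 normalized inputs are reduced to a single value between 0 and 1.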
Subjective ratings for the 16 radiologic findings were provided independently by three attending radiologists with more than 10 years' experience in chest radiology who were unaware of the final diagnosis. These radiologists used a score sheet with a scale from 0 to 10. The observers used a ruler to measure the maximum size of the nodule. The shape ranged from strand (score, 0) to round (score, 10). Concavity was defined as a concave or straight segment of the nodule border measuring more than 4 mm in length. Concavity ranged from no such line (score, 10) to more than three lines (score, 0) with one line (score, 5) in the middle. Border definition ranged from well defined (score, 0) to poorly defined (score, 10). Irregular undulation was defined as the unevenness of the margin, which ranged from smooth (score, 0) to irregular in the whole margin (score, 10). Spiculation ranged from nonspiculated (score, 0) to spiculated in the whole margin (score, 10). We defined the focal air attenuation of the nodule as air space. Air space was assessed for its ratio within the nodule, ranging from 0% (score, 0) to greater than 75% (score, 10). Cavitation was included in air space, and no attempt was made to separate it from air space. Air bronchogram was defined as a tubelike or branched air structure within the nodule, ranging from obviously having no structure (score, 0) to having more than two such structures (score, 10). Ground-glass opacity was evaluated for its ratio within the nodule, ranging from 0% (score, 0) to 100% (score, 10). Calcification was defined as an area of high attenuation observed on the mediastinal window setting and was assessed for its ratio within the nodule, ranging from 0% (score, 0) to greater than 50% (score, 10). The "edge-enhancement" artifact was ignored. Pleural indentation ranged from having no pleural indentation (score, 0) to obviously having more than two pleural indentations (score, 10).
Satellite lesion was defined as a separate small discrete nodule observed within 5 mm of the dominant nodule. It ranged from having no lesion (score, 0) to obviously having more than two satellite lesions (score, 10). Pulmonary vessels coursing along the bronchi were considered arteries; the others were considered veins. Arterial (or venous) involvement was defined as the vessel running into the nodule; it ranged from no involvement (score, 0) to involvement of more than two vessels (score, 10). Emphysematous change was defined as enlargement of air space without obvious fibrosis surrounding the nodule, ranging from no emphysematous change (score, 0) to marked emphysematous change with complete loss of normal lung parenchyma (score, 10). Conventional CT images were used for the evaluation of lymphadenopathy, ranging from no lymphadenopathy (score, 0) to lymphadenopathies consisting of more than two lymph nodes enlarged more than 1 cm in diameter (score, 10). Reference samples with scores of 0, 5, and 10 were shown to each attending radiologist for all these radiologic findings except size.
Table 1 shows one example of the subjective ratings of one attending radiologist for lung cancer (Fig. 1) and organizing pneumonia (Fig. 2). Input data obtained from the clinical parameters and from the attending radiologists' subjective ratings for radiologic findings were normalized to the range from 0 to 1.
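The normalization step can be illustrated with simple min-max scaling. The text states only that the inputs were normalized to the range from 0 to 1, so the linear formula, the function name, and the age bounds in the second call are assumptions made for this sketch.

```python
def normalize(value, lowest, highest):
    """Linearly rescale a value from [lowest, highest] onto [0, 1] (assumed scaling)."""
    if highest <= lowest:
        raise ValueError("highest must be greater than lowest")
    if not lowest <= value <= highest:
        raise ValueError("value lies outside the expected range")
    return (value - lowest) / (highest - lowest)


# A subjective rating on the 0-10 score sheet described above.
print(normalize(5, 0, 10))

# A clinical parameter such as age; the bounds 20 and 90 are hypothetical.
print(normalize(55, 20, 90))
```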
A round-robin method (leave-one-out method) was applied for training and testing to evaluate the performance of the artificial neural network [12]. In this method, all but one case in a database are used to train the artificial neural network. The single case that is left out is then used to test the artificial neural network. This procedure is repeated so that each case in the database is used once as a test case. Output values ranging from 0 to 1 indicated the likelihood of malignancy in each case (Fig. 3).
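The round-robin procedure can be sketched as a loop that withholds one case at a time. To keep the example short and self-contained, a simple distance-to-class-mean scorer stands in for the artificial neural network; that stand-in, the toy cases, and all names are assumptions and are not part of the study.

```python
import math


def leave_one_out(cases, train, predict):
    """Train on all cases but one, test on the held-out case, repeat for every case."""
    outputs = []
    for i, (features, _label) in enumerate(cases):
        training_set = cases[:i] + cases[i + 1:]   # the held-out case is excluded
        model = train(training_set)
        outputs.append(predict(model, features))
    return outputs


def train_class_means(cases):
    """Stand-in model (not an ANN): mean feature vector of each class."""
    means = {}
    for label in (0, 1):                            # 0 = benign, 1 = malignant
        rows = [f for f, y in cases if y == label]  # needs >= 1 case per class
        means[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return means


def distance(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def predict_score(means, features):
    """Score in [0, 1]; higher means closer to the malignant class mean."""
    d_benign = distance(features, means[0])
    d_malignant = distance(features, means[1])
    total = d_benign + d_malignant
    return 0.5 if total == 0 else d_benign / total


# Hypothetical normalized cases: (features, label).
toy_cases = [
    ((0.1, 0.2), 0),
    ((0.2, 0.1), 0),
    ((0.9, 0.8), 1),
    ((0.8, 0.9), 1),
]
outputs = leave_one_out(toy_cases, train_class_means, predict_score)
print(outputs)
```

Because each output is produced by a model that never saw the corresponding case, the full set of outputs can be passed to ROC analysis as test results.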
Observer Test
In the observer test, we selected 50 cases (25 malignant, 25 benign) for
which the performance of the artificial neural network (Az = 0.951) was
practically equal to that obtained with all cases in the database. Twelve radiologists,
including four attending radiologists with 10 years' or more experience, four
radiology fellows with 4-6 years' experience, and four radiology residents
with less than 3 years' experience participated in the observer test. We used
a sequential test for ROC studies to evaluate the performance of radiologists
[14]. First, observers were
shown high-resolution CT and conventional CT images with seven clinical
parameters for the initial rating of the confidence level of malignancy
(without artificial neural network output). Subsequently, artificial neural
network output was presented to the same observer, who rated the confidence
level a second time (with artificial neural network output). The observer
could either change the initial ratings or leave them unchanged.
The observer's confidence level about the likelihood of malignancy was represented using an analog continuous-rating scale with a line-checking method. For the initial ratings, the observers used a black ballpoint pen to mark their confidence levels along an 8-cm line. Ratings of "definitely benign" and "definitely malignant" were marked above the left and the right ends of the line, respectively. If the second ratings were different from the initial ones, the observers used a red ballpoint pen to mark their confidence levels along the same line. For data analysis, the confidence level was scored with a maximum of 100 units by measuring the distance from the left end of the line to the marked point. A total of 10 cases that were not used for the test were selected to train all observers.
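The scoring of a mark on the rating line can be written as a one-line conversion. The 8-cm line length and the 100-unit maximum come from the text; the strictly linear mapping and the function name are assumptions for this sketch.

```python
LINE_LENGTH_CM = 8.0   # length of the rating line (from the text)
MAX_UNITS = 100.0      # maximum confidence score (from the text)


def confidence_units(distance_from_left_cm):
    """Convert the measured distance from the left ("definitely benign") end to units."""
    if not 0.0 <= distance_from_left_cm <= LINE_LENGTH_CM:
        raise ValueError("the mark must lie on the line")
    return distance_from_left_cm / LINE_LENGTH_CM * MAX_UNITS


print(confidence_units(6.0))  # a hypothetical mark three quarters of the way along
```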
Data Analysis
The diagnostic performance of radiologists with and without the artificial
neural network output was evaluated using ROC analysis. Binormal ROC curves
for distinguishing benign from malignant nodules were estimated using the LABROC 5
algorithm (available through the Internet from Metz, CE, The University of
Chicago, Chicago, IL). LABROC 5 was used to obtain maximum-likelihood
estimates of binormal ROC curves from the continuous ordinal-scale rating
data. Az values representing the area under the ROC curve
were calculated. The statistical significance of differences between ROC
curves for each reading condition was determined by applying a two-tailed
t test for paired data to the reader-specific Az
values. We also compared the diagnostic performance of attending radiologists,
radiology fellows, and residents using a two-tailed t test. To represent the
overall performance for each group of observers, average ROC curves were
generated for the four attending radiologists, the four radiology fellows, the
four radiology residents, and all radiologists by averaging the two binormal
parameters of their individual ROC curves
[15,16,17].
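The quantities used in this analysis can be sketched as follows. For a binormal ROC curve with the conventional parameters a and b, the area under the curve is Az = Φ(a / √(1 + b²)), where Φ is the standard normal cumulative distribution function. The code does not reproduce the maximum-likelihood fitting performed by LABROC 5; it assumes the two parameters are already available for each reader, and every numeric value shown is hypothetical rather than taken from the study.

```python
import math
from statistics import mean, stdev


def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def binormal_az(a, b):
    """Area under a binormal ROC curve with parameters a and b."""
    return phi(a / math.sqrt(1.0 + b * b))


def average_curve(reader_params):
    """Average the two binormal parameters across readers."""
    a_mean = mean(a for a, _ in reader_params)
    b_mean = mean(b for _, b in reader_params)
    return a_mean, b_mean


def paired_t(first, second):
    """t statistic and degrees of freedom for paired data (second minus first)."""
    diffs = [s - f for f, s in zip(first, second)]
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / math.sqrt(n)), n - 1


# Hypothetical binormal parameters (a, b) for four readers.
readers = [(1.2, 0.9), (1.5, 1.0), (1.1, 0.8), (1.4, 1.1)]
a_avg, b_avg = average_curve(readers)
print(binormal_az(a_avg, b_avg))

# Hypothetical reader-specific Az values without and with the network output.
az_without = [0.70, 0.75, 0.80, 0.72]
az_with = [0.78, 0.80, 0.88, 0.79]
t_stat, df = paired_t(az_without, az_with)
print(t_stat, df)
```

The two-tailed p value would then be read from the t distribution with `df` degrees of freedom, which the Python standard library does not provide.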
Another indication of performance was the number of correctly diagnosed cases
for which the observer's confidence level changed because of the artificial
neural network output. We assumed that a clinically relevant change in the
confidence rating occurred only when the difference between the ratings
without and with the artificial neural network output was greater than 30
units on the confidence rating scale
[14]. The difference between
the average number of cases affected beneficially and those affected
detrimentally using the artificial neural network output was analyzed using a
two-tailed t test.
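The 30-unit criterion can be expressed as a small classification rule. The threshold and the orientation of the scale (0 for definitely benign, 100 for definitely malignant) follow the text; treating a shift toward the true diagnosis as beneficial and a shift away from it as detrimental is an assumption of this sketch, as are the function name and the example ratings.

```python
THRESHOLD_UNITS = 30.0  # clinically relevant change (from the text)


def classify_change(without_ann, with_ann, is_malignant, threshold=THRESHOLD_UNITS):
    """Label one rating pair as 'beneficial', 'detrimental', or 'unchanged'."""
    change = with_ann - without_ann
    if abs(change) <= threshold:          # only changes greater than 30 units count
        return "unchanged"
    moved_toward_malignant = change > 0
    # Assumed rule: moving toward the true diagnosis is beneficial.
    return "beneficial" if moved_toward_malignant == is_malignant else "detrimental"


# Hypothetical rating pairs: (without output, with output, nodule is malignant).
ratings = [(40, 80, True), (40, 80, False), (70, 30, False), (40, 60, True)]
labels = [classify_change(w, v, m) for w, v, m in ratings]
print(labels)
```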
Results
The numbers of cases affected either beneficially or detrimentally by the artificial neural network output for benign and malignant nodules are shown in Figures 9 and 10. The artificial neural network output affected the observers' confidence in 59 cases. The confidence level was affected beneficially for 25 malignant and 29 benign nodules. Only four malignant nodules and one benign nodule were detrimentally affected. The number of cases affected beneficially was significantly higher than the number of cases affected detrimentally for both benign and malignant nodules (p < 0.001).
Discussion
Nakamura et al. [18] used an artificial neural network with 10 input units for two clinical parameters and eight radiologic findings using chest radiography. The Az value (Az = 0.951) of our current computer-aided diagnostic scheme using high-resolution CT is higher than the Az value (Az = 0.854) of their scheme using chest radiography. A statistical multiple object detection and location system (S-MODELS) neural network technique to differentiate benign from malignant pulmonary nodules on CT findings has also been reported [7]. This S-MODELS showed the potential to reduce the number of biopsies without missing malignant nodules. However, unlike our current method with the artificial neural network, the weights of the S-MODELS are fixed, and there is no iterative learning process.
Gurney and Swensen [6] used five radiographic (chest radiographs and CT) and two clinical characteristics and compared the performances between the artificial neural network and the Bayesian method. They reported that the Az value of the Bayesian system (Az = 0.894) was significantly higher than that of the neural network (Az = 0.871) and that the neural networks offered no advantage over the Bayesian system in the prediction of probability of malignancy in solitary pulmonary nodules. Our scheme using the neural network achieved a higher performance (Az = 0.951). At present, we do not know the reasons for the difference between our results and the previous report because different cases were used in each study.
It is difficult to examine the cause of this difference. The artificial neural network has a unique ability to learn specific patterns between input and output data if it is repeatedly trained with examples. However, this ability strongly depends on the quality of the input data. The quality of input data using the subjective ratings for the artificial neural network depends on the ability of radiologists. If the input data are randomly selected and have no correlation with the output data, the artificial neural network cannot learn any specific patterns. It should be noted that important input data in the present study were given by the subjective ratings for 16 radiologic findings on all nodules by three attending radiologists who had over 10 years' experience. To further develop this diagnostic system for practical use, we need to construct a computer-aided diagnostic scheme with ratings by radiology residents who are inexperienced and then to evaluate its diagnostic performance.
We used 16 radiologic and seven clinical findings, which are, at present, considered useful to differentiate benign from malignant solitary pulmonary nodules [2, 5]. If other useful clinical data such as tumor markers were available as input data, a better performance with the artificial neural network could be expected. However, in this study, we used a smaller number of essential features in our attempt to develop an artificial neural network scheme for use in practical clinical settings. Although the present 155 cases were limited in number and the combination of all input data used in the present study may not necessarily be generally applicable, we attempted to construct a computer-aided diagnostic scheme consisting of 23 input items that have proved useful. In the future, it may be necessary to find the combination of input data that provides the best performance, using a much larger number of cases.
In the observer tests, our computer-aided diagnostic scheme using an artificial neural network improved the diagnostic accuracy of radiologists in terms of differentiating benign from malignant pulmonary nodules on high-resolution CT. The diagnostic performance by all the radiologists was significantly (p < 0.001) improved with the artificial neural network output.
The artificial neural network output affected the observer confidence in 59 cases in terms of both beneficial and detrimental effects. The number of cases affected beneficially was significantly larger than the number affected detrimentally for both benign and malignant nodules (p < 0.001). In our study, the artificial neural network output caused little detrimental effect (0.17%) on observers' confidence levels, especially in the diagnosis of benign nodules. To investigate why the detrimental effect was so small, we analyzed cases in which the artificial neural network produced a "wrong output." Most radiologists, even the inexperienced radiology residents, were not confused by the artificial neural network output in such cases. This result may arise from the difference between the radiologists and the artificial neural network in the process of differentiating benign from malignant nodules, although input data for the artificial neural network were subjectively extracted by radiologists. This finding also supports the concept of using artificial neural networks as a second opinion to complement the performance of radiologists.
In conclusion, our computer-aided diagnostic scheme using an artificial neural network showed a high performance and improved the diagnostic accuracy of radiologists in differentiating benign from malignant pulmonary nodules on high-resolution CT.