1 Department of Radiology, Kurt Rossmann Laboratories for Radiologic Image
Research, MC-2026, The University of Chicago, 5841 S Maryland Ave., Chicago,
IL 60637.
2 Department of Intelligent Systems, Faculty of Information Sciences, Hiroshima
City University, Hiroshima 731-3194, Japan.
3 Azumi General Hospital, Ikeda, Nagano 399-8695, Japan.
Received November 10, 2003;
accepted after revision January 30, 2004.
Supported in part by USPHS grant CA62625.
Abstract
MATERIALS AND METHODS. We developed an automated computerized scheme for determining the likelihood of malignancy of lung nodules on multiple HRCT slices; the likelihood estimate was obtained from various objective features of the nodules using linear discriminant analysis. The data set used in this observer study consisted of 28 primary lung cancers (6-20 mm) and 28 benign nodules. Cancer cases included nodules with pure ground-glass opacity, mixed ground-glass opacity, and solid opacity. Benign nodules were selected by matching their size and pattern to the malignant nodules. Consecutive region-of-interest images for each nodule on HRCT were displayed for interpretation in stacked mode on a cathode ray tube monitor. The images were presented to 16 radiologists, first without and then with the computer output, who were asked to indicate their confidence level regarding the malignancy of a nodule. Performance was evaluated by receiver operating characteristic (ROC) analysis.
RESULTS. The area under the ROC curve (Az value) of the CAD scheme alone was 0.831 for distinguishing benign from malignant nodules. With the aid of the CAD scheme, the average Az value for radiologists improved from 0.785 to 0.853, a statistically significant difference (p = 0.016). The radiologists' diagnostic performance with the CAD scheme was more accurate than that of the CAD scheme alone (p < 0.05) and also that of radiologists alone.
CONCLUSION. CAD has the potential to improve radiologists' diagnostic accuracy in distinguishing small benign nodules from malignant ones on HRCT.
Introduction
We developed an automated computerized scheme [7] for determining the likelihood of malignancy by using various objective features of the nodules in our database of thick-section low-dose CT; one or two slices were used for image analysis of each nodule. The low-dose CT database consisted of 489 nodules obtained from a mass screening for lung cancer in Nagano, Japan [2]. All of these nodules were considered suspicious or indeterminate lesions when detected by radiologists on low-dose CT screening. With the use of receiver operating characteristic (ROC) analysis, our computerized scheme achieved an area under the ROC curve (Az value) of 0.846 for distinction between 76 malignant and 413 benign lung nodules.
Recently, we further developed another computerized scheme for distinction between malignant and benign lesions derived from multiple slices of HRCT (1-mm collimation) based on 2D and 3D volume data. The HRCT database consisted of 244 small noncalcified (3-20 mm) nodules obtained as part of follow-up diagnostic work for suspicious or indeterminate lesions detected on low-dose CT in the same screening program.
In the present study, we assessed observer performance using ROC analysis to evaluate the effectiveness of our computer-aided diagnosis (CAD) scheme to assist radiologists in distinguishing small benign from malignant lung nodules in various patterns at HRCT. The malignant lung cancers included nodules with pure ground-glass opacity, mixed ground-glass opacity, and solid opacity; the benign nodules were selected by matching their size and pattern to the cancers on HRCT in this observer study.
Materials and Methods
Database
The diagnostic HRCT database used in this study consisted of 59 patients (27 men, 32 women; mean age, 64.6 years) with 61 malignant nodules and 169 patients (99 men, 70 women; mean age, 61.6 years) with 183 benign nodules. The database was obtained as part of an annual 3-year low-dose CT screening for lung cancer from 17,892 examinations on 7,847 individuals in Nagano, Japan [2]. HRCT scans were obtained on a helical scanner (HiSpeed Advantage, GE Healthcare) with a standard tube current (200 mA) to cover the entire nodule, 1-mm collimation, and a bone reconstruction algorithm with a 0.5-mm interval.
Two features concerning the size and pattern type of the pulmonary nodules on HRCT were subjectively determined by radiologists for the purpose of grouping nodules in our database. The mean size (average of length and width) was recorded by one radiologist. The three types of patterns of these nodules (pure ground-glass opacity, mixed ground-glass opacity, and solid opacity) were viewed independently and grouped by three radiologists without knowledge of the final diagnosis, and a consensus was reached through discussion. Nodules with benign-pattern calcifications (diffuse, central, popcorn, and laminar or concentric calcification) were excluded. The range of nodule sizes for the 61 malignant and 183 benign nodules was 6-19 mm (mean, 12 mm) and 3-20 mm (mean, 7 mm), respectively. Among the 61 malignant nodules, there were 18 nodules with pure ground-glass opacity, 28 with mixed ground-glass opacity, and 15 with solid opacity, whereas the 183 benign nodules included 12 with pure ground-glass opacity, 30 with mixed ground-glass opacity, and 141 with solid opacity.
All malignant nodules were primary lung cancers confirmed by surgery, including 49 well-differentiated adenocarcinomas, eight other adenocarcinomas, two squamous cell carcinomas, and two localized small cell carcinomas. Among the 183 benign nodules, nine (four cases of nodular fibrosis; and one case each of inflammatory granuloma, cryptococcoma, focal organizing pneumonia, inflammatory pseudotumor, and sclerosing hemangioma) were confirmed by surgery, 51 had resolved at follow-up examination, and 123 showed no change for 2 or more years.
CAD
With our CAD scheme, the nodules were segmented automatically using a
dynamic programming technique [7]. Forty-one and 15 image
features based on 2D and 3D volume data, respectively, were determined from
quantitative analysis of the nodule outline and pixel values. Linear
discriminant analysis was used to distinguish benign from malignant nodules.
The performance of this CAD scheme was evaluated on the basis of a
leave-one-out testing method by use of 61 malignant and 183 benign lung
nodules. For the input of the linear discriminant analysis, we selected many
combinations from 56 features and two clinical parameters (patient age and
sex). The following features were used in this study: effective diameter,
contrast of the segmented nodule on the HRCT image, overlap measures of two
gray-level histograms for the inside and outside regions of the segmented
nodule on the HRCT image, overlap measures of two gray-level histograms for
the inside and outside regions of the segmented nodule on the edge-gradient
image, radial gradient index for the inside region of the segmented nodule on
the HRCT image, peak value of the histogram for the inside region of the
segmented nodule on the edge-gradient image, pixel value at the peak of the
histogram for the inside region of the segmented nodule on the edge-gradient
image, and pixel value at the peak of the histogram for the inside region of
the segmented nodule on the HRCT image.
Our computerized classification method outputs a percentage, from 1% to 99%, that indicates the likelihood of malignancy. The performance of the classification scheme yielded an Az value of 0.937 (0.919 for nodules with pure ground-glass opacity, 0.852 for nodules with mixed ground-glass opacity, and 0.957 for solid nodules) for distinction between 61 malignant and 183 benign lung nodules.
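The classification step described above (linear discriminant analysis evaluated by leave-one-out testing) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses only two features, assumes equal class priors, and maps the discriminant score to a likelihood with a logistic function, which the text does not specify. The function names (`fit_lda`, `likelihood`, `leave_one_out`) and the toy feature values are hypothetical.

```python
from math import exp


def fit_lda(rows, labels):
    """Fit a two-class linear discriminant on two-feature rows.

    Returns weights w and bias b so that w.x + b > 0 favors class 1.
    """
    groups = {0: [], 1: []}
    for row, label in zip(rows, labels):
        groups[label].append(row)
    means = {k: [sum(col) / len(g) for col in zip(*g)] for k, g in groups.items()}

    # Pooled within-class covariance matrix (2 x 2).
    s = [[0.0, 0.0], [0.0, 0.0]]
    for k, g in groups.items():
        for row in g:
            d = [row[0] - means[k][0], row[1] - means[k][1]]
            for i in range(2):
                for j in range(2):
                    s[i][j] += d[i] * d[j]
    dof = len(rows) - 2
    s = [[v / dof for v in r] for r in s]

    # Closed-form inverse of the 2 x 2 covariance matrix.
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    inv = [[s[1][1] / det, -s[0][1] / det], [-s[1][0] / det, s[0][0] / det]]

    diff = [means[1][i] - means[0][i] for i in range(2)]
    w = [inv[i][0] * diff[0] + inv[i][1] * diff[1] for i in range(2)]
    mid = [(means[0][i] + means[1][i]) / 2 for i in range(2)]
    b = -(w[0] * mid[0] + w[1] * mid[1])
    return w, b


def likelihood(w, b, row):
    """Map the discriminant score to the range 0 to 1 (logistic mapping is an assumption)."""
    z = w[0] * row[0] + w[1] * row[1] + b
    if z >= 0:
        return 1.0 / (1.0 + exp(-z))
    e = exp(z)
    return e / (1.0 + e)


def leave_one_out(rows, labels):
    """Score each case with a model trained on all the other cases."""
    scores = []
    for i in range(len(rows)):
        train_rows = rows[:i] + rows[i + 1:]
        train_labels = labels[:i] + labels[i + 1:]
        w, b = fit_lda(train_rows, train_labels)
        scores.append(likelihood(w, b, rows[i]))
    return scores


# Hypothetical toy data: (effective diameter in mm, contrast); 1 = malignant.
rows = [(6, 0.20), (7, 0.30), (8, 0.25), (9, 0.35), (7, 0.15),
        (12, 0.60), (14, 0.70), (13, 0.55), (15, 0.65), (11, 0.75)]
labels = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
scores = leave_one_out(rows, labels)
```

Because each case is scored by a model that never saw it during fitting, the resulting scores are less optimistic than scores obtained by testing on the training cases themselves. The scheme described in the text combines more features plus patient age and sex; the two-feature closed form here is only for brevity.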
Observer Study
The data used in this observer study consisted of 28 malignant nodules that
were randomly selected from the 61 primary lung cancers and 28 benign nodules
that were selected from the 183 benign nodules by matching in size and pattern
to the cancers. For both malignant and benign lesions, nine nodules ranged
from 6 to 10 mm and 19 nodules ranged from 11 to 20 mm. The patterns involved
were eight nodules with pure ground-glass opacity, 12 with mixed ground-glass
opacity, and eight with solid opacity. Examples of cases used for this
observer study are shown in Figures 1A-1F.
Sixteen radiologists participated in this observer study. The 16 radiologists, seven chest radiologists and nine other radiologists, had a mean of 14 years of experience (range, 7-26 years). Consecutive region-of-interest HRCT images for each nodule were displayed for interpretation using the cine mode on a cathode ray tube monitor (1,280 x 1,024 resolution). The window settings were initially at a width of 1,500 H and a level of -550 H, but the settings could be adjusted by the observer. In addition, zooming capability was provided. Two clinical parameters (patient age and sex) were displayed to the observer on the monitor.
The observers were told that the purpose of this observer study was to assist radiologists in distinguishing benign from malignant lesions on HRCT by using a CAD scheme. The instructions explained the role of the CAD output as a second opinion. The observers were told that 28 malignant lesions (6-10 mm, nine cases; 11-20 mm, 19 cases; pure ground-glass opacity, eight cases; mixed ground-glass opacity, 12 cases; and solid opacity, eight cases) and 28 benign lesions (matched in size and pattern to the malignant lesions) were included in this study and that the sensitivity and specificity of our CAD scheme, using a threshold of 0.50 (50%) likelihood of malignancy, were 80% and 75%, respectively.
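The sensitivity and specificity quoted to the observers follow from applying the 0.50 threshold to the computer output. A minimal sketch of that calculation is given below; the function name and the example values are hypothetical, and treating an output of exactly 0.50 as malignant is an assumption that the text does not settle.

```python
def sensitivity_specificity(likelihoods, is_malignant, threshold=0.50):
    """Return (sensitivity, specificity) when likelihood >= threshold is called malignant."""
    tp = fn = tn = fp = 0
    for p, truth in zip(likelihoods, is_malignant):
        called_malignant = p >= threshold  # tie handling at 0.50 is an assumption
        if truth and called_malignant:
            tp += 1
        elif truth:
            fn += 1
        elif called_malignant:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical computer outputs for three malignant and three benign nodules.
sens, spec = sensitivity_specificity(
    [0.90, 0.70, 0.40, 0.20, 0.60, 0.10],
    [True, True, True, False, False, False],
)
```

In this toy example two of the three malignant nodules and two of the three benign nodules are classified correctly, so both values equal 2/3.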
The observers were instructed to click on a bar (left, benignancy; right, malignancy) on the screen using a mouse to indicate their confidence level regarding the malignancy (or benignancy) of a lesion, first without and then with the computer output. After indicating their confidence (without and with CAD), they were asked to click on one of the following four clinical actions: return to annual screening; follow-up in 6 months; follow-up in 3 months; or biopsy or surgery.
For a training session before the test, we provided five cases so that the observers could learn how to operate the cine mode interface and how to take into account the computer output in their decision. The review time was not limited. The average review time was 46 min (range, 28-100 min).
Data Analysis
The confidence level ratings from each observer were analyzed using
receiver operating characteristic (ROC) methodology, and a binormal ROC
curve was fitted to the radiologists' confidence ratings by quasi-maximum
likelihood estimation [8]. The statistical significance of the difference
in Az values between observer interpretations without and with the CAD
scheme was tested using the Dorfman-Berbaum-Metz method [9]; this method
accounts for both observer variation and case sample variation by means of
an analysis-of-variance approach. The statistical significance of the
difference in Az values between the computer outputs and observer
interpretations (without and with the CAD scheme) was tested by means of a
confidence interval method that takes into account observer variation alone
[10]. The effect of the computer output on the rating scores and the change
in scores that was due to the use of the CAD scheme were analyzed. The
distributions of the radiologists' ratings and of the computer outputs were
compared for the malignant and benign nodules.
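For intuition about what an area under the ROC curve measures, the sketch below computes the nonparametric (Mann-Whitney) area from two sets of ratings. This is a different estimator from the fitted binormal Az value used in this study and is shown only as an illustration; the function name and the ratings are hypothetical.

```python
def empirical_auc(malignant_scores, benign_scores):
    """Fraction of malignant-benign pairs ranked correctly; ties count as one half."""
    wins = 0.0
    for m in malignant_scores:
        for b in benign_scores:
            if m > b:
                wins += 1.0
            elif m == b:
                wins += 0.5
    return wins / (len(malignant_scores) * len(benign_scores))


# Hypothetical confidence ratings for three malignant and three benign nodules.
auc = empirical_auc([0.9, 0.8, 0.4], [0.5, 0.3, 0.2])
```

Here eight of the nine malignant-benign pairs are ordered correctly, giving an area of 8/9. A fitted binormal curve generally yields a somewhat different value from this empirical area on the same ratings.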
The statistical significance of the difference between the numbers of beneficial and detrimental changes in clinical action attributable to the CAD scheme was estimated separately for malignant and benign nodules using Student's paired t test across the 16 radiologists.
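The paired comparison can be sketched as below, with one pair of counts per radiologist (nodules changed toward a beneficial action and nodules changed toward a detrimental action). The function name and the counts are hypothetical, and only the t statistic and degrees of freedom are computed; converting them to a p value requires the t distribution, which is omitted here.

```python
from math import sqrt


def paired_t(first, second):
    """Return the paired t statistic and its degrees of freedom."""
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    mean = sum(diffs) / n
    # Sample variance of the paired differences (n - 1 in the denominator).
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)
    return mean / sqrt(var / n), n - 1


# Hypothetical counts for four readers: beneficial vs detrimental changes.
t, dof = paired_t([3, 5, 4, 6], [1, 2, 2, 3])
```

With 16 radiologists, as in this study, the test would have 15 degrees of freedom.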
Results
Figures 3A and 3B show the correlation between the computer outputs and the average radiologists' ratings without (Fig. 3A) and with (Fig. 3B) the CAD scheme for indicating the malignancy and benignancy of lung nodules. The radiologists' interpretations with the computer aid were, in general, more accurate than those of the radiologists alone for most of the malignant and benign nodules (Figs. 1A-1F). Note, however, that there were some cases for which the radiologists' ratings without the CAD scheme were correct and the likelihood of malignancy in the computer output was incorrect. In those cases, the radiologists still gave the correct ratings with the CAD scheme, as illustrated by three cancer cases (black circles) in the upper left quadrant and three benign cases (white circles) in the lower right quadrant in Figure 3B. Sample cases are shown in Figures 4A and 4B.
The effect of the computer output on the average change in rating score due to the CAD scheme is illustrated in Figure 5. The likelihood of malignancy and the average change in confidence level for each nodule (the average change in the 16 radiologists' ratings from without to with CAD) were highly correlated (r = 0.927). For most of the malignant and benign nodules, the radiologists increased their confidence level toward malignancy when the likelihood of malignancy was greater than 0.50 and decreased it toward benignancy when the likelihood was less than 0.50.
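The correlation coefficient reported above is the Pearson product-moment coefficient between the per-nodule computer output and the per-nodule average change in rating. A minimal sketch of that calculation follows; the function name and the five data points are hypothetical and are not taken from Figure 5.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical likelihoods of malignancy and average rating changes per nodule.
likelihoods = [0.1, 0.3, 0.5, 0.7, 0.9]
rating_changes = [-0.2, -0.1, 0.0, 0.1, 0.2]
r = pearson_r(likelihoods, rating_changes)
```

The toy points lie on a straight line, so the coefficient is 1 up to rounding; real reader data scatter around the trend and give a smaller value.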
For the four clinical actions (return to annual screening, follow-up in 6 months, follow-up in 3 months, or biopsy or surgery), we attempted to quantify the changes in clinical action that were due to the CAD scheme. For malignant nodules, the average number of nodules for which clinical actions were changed by the 16 radiologists toward a beneficial effect (step up) (mean, 4.1 nodules) was greater than that toward a detrimental effect (step down) (mean, 1.2 nodules) (p = 0.003). For benign nodules, the number of nodules affected by the CAD scheme toward a beneficial effect (step down) and detrimental effect (step up) was 3.1 and 2.1, respectively (p = 0.15). Table 2 shows only the cases for which the clinical action was changed to or from the two extreme situations, that is, from biopsy or surgery to screening and from screening to biopsy or surgery. For malignant nodules, the difference was statistically significant between the change to (1.9 cases) and the change from (0.8 cases) biopsy or surgery (p = 0.007) and between the change from (0.7 cases) and the change to (0.1 cases) screening (p = 0.02). For benign nodules, neither difference was statistically significant.
Discussion
Previous studies indicated several methods for determining the probability of malignancy in masses on mammography [17, 18] and solitary pulmonary nodules on chest radiography [19-22] and chest CT [7, 23-25]. Automated feature-extraction techniques have been applied in CAD schemes for classification of malignant and benign masses on breast and lung images [7, 17, 18, 22]. Several observer studies indicated that the likelihood-of-malignancy measures can improve radiologists' diagnostic accuracy in distinguishing benign from malignant lesions on radiographs [17, 18, 23, 26] and low-dose CT scans [27]. A recent study indicated that the use of an artificial neural network (ANN) as a computer aid based on attending radiologists' subjective rating scores improved radiologists' performance in terms of Az value from 0.831 to 0.959 in differentiating benign from malignant pulmonary nodules on HRCT [25]. The performance of our automated feature-extraction scheme for all nodules in our database (Az = 0.937) was comparable to that of the ANN by use of subjective ratings (Az = 0.951) [25]. Our observer study indicates the usefulness of our automated computerized scheme in the classification of pulmonary nodules on HRCT images. In the future, therefore, an automated computerized scheme used as a second opinion may be acceptable to radiologists in clinical situations.
Our automated computerized scheme is based on various objective features (size, contrast, shape, margin, internal opacity, and internal features) of the nodules. The performance of the CAD scheme was evaluated on the basis of a leave-one-out testing method using 61 malignant and 183 benign nodules. Misclassifications by the CAD scheme occurred in large benign solid nodules (Fig. 4B) and in nodules with mixed ground-glass opacity, including benign (Fig. 1D) and malignant lesions (Fig. 4A). These misclassifications probably occurred for two reasons: our database was obtained from a CT screening program in which all 15 solid malignant lesions were larger than 10 mm, whereas 94% (133/141) of solid benign nodules were 10 mm or smaller; and nodules with mixed ground-glass opacity were more difficult for the CAD scheme to differentiate as benign or malignant. A limitation of this observer study is that the 56 nodules were also included in developing the CAD scheme. The number of nodules in our database, especially malignant nodules, was too small to divide into training and test groups in this study, and we plan to use an independent database from other CT screening programs to test the usefulness of our CAD scheme in the future.
Our results showed that the radiologists' performance with the CAD scheme (Az = 0.853) was greater than that of either radiologists alone (Az = 0.785) or the computer output alone (Az = 0.831), with statistically significant differences in Az values. The radiologists generally increased or decreased their confidence level when the likelihood of malignancy was above or below 0.50, respectively, and the changes based on the CAD output for most nodules were toward a beneficial effect. Important findings are that the radiologists' initial ratings without CAD were clearly correct for some nodules and that, even when the computer output was incorrect for those nodules, the CAD output had no serious detrimental effect on the radiologists' ratings. Thus, radiologists were able to maintain their correct judgments when nodules appeared obviously benign or malignant despite an incorrect CAD output. In addition, the correct computer output was able to assist radiologists in improving their decisions on many subtle cases. Therefore, this study indicated that a synergistic improvement in observers' interpretation by use of a CAD scheme as a second opinion was possible, because the radiologists were able to maintain their own correct opinions on some obvious cases, whereas the computer output assisted in improving their decisions on the majority of subtle cases.
In this study, we quantified the changes due to the CAD scheme in two extreme situations, that is, changes to or from biopsy or screening, which are important decisions in cancer screening. The results indicate the benefit of the computer aid to radiologists in making correct recommendations for malignant lesions. However, no significant benefit of the computer aid to radiologists was observed for benign nodules. A possible reason is that, because this study was based on lung cancer CT screening, radiologists were highly alert to avoid underinterpreting subtle pulmonary nodules regardless of the result of the CAD scheme.
Acknowledgments
We thank Takeshi Kobayashi, Kazuto Ashizawa, Naohiro Matsuyama, Hajime
Abiru, Tetsuji Yamaguchi, Chaotong Zhang, Peter MacEneaney, Ulrich Bick,
Christopher Straus, Edward Michale, Gregory Scott Stacy, Akiko Egawa, Tomoaki
Okimoto, Kazunori Minami, and Shuji Sakai, for participating as observers;
Shigehiko Katsuragawa, for helpful suggestions; and Elisabeth Lanzl for
improving the manuscript.
References
1 cm) detected at population-based CT screening for lung cancer:
reliable high-resolution CT features of benign lesions.
AJR 2003;180:955-964