Original Research
Neuroradiology/Head and Neck Imaging
June 24, 2016

Classifier Model Based on Machine Learning Algorithms: Application to Differential Diagnosis of Suspicious Thyroid Nodules via Sonography


OBJECTIVE. The purpose of this article is to construct classifier models using machine learning algorithms and to evaluate their diagnostic performances for differentiating malignant from benign thyroid nodules.
MATERIALS AND METHODS. This study included 970 histopathologically proven thyroid nodules in 970 patients. Two radiologists retrospectively reviewed ultrasound images, and nodules were graded according to a five-tier sonographic scoring system. Statistically significant variables based on an experienced radiologist's observations were obtained with attribute optimization using fivefold cross-validation and applied as the input nodes to build models for predicting malignancy of nodules. The performances of the machine learning algorithms and radiologists were compared using ROC curve analysis.
RESULTS. Diagnosis by the experienced radiologist achieved the highest predictive accuracy of 88.66% with a specificity of 85.33%, whereas the radial basis function (RBF)–neural network (NN) achieved the highest sensitivity of 92.31%. The AUC value for diagnosis by the experienced radiologist (AUC = 0.9135) was greater than those for diagnosis by the less experienced radiologist, the naïve Bayes classifier, the support vector machine, and the RBF-NN (AUC = 0.8492, 0.8811, 0.9033, and 0.9103, respectively; p < 0.05).
CONCLUSION. The machine learning algorithms underperformed with respect to the experienced radiologist's readings used to construct them, and the RBF-NN outperformed the other machine learning algorithm models.
Thyroid nodules are very common, with an incidence of 10–67% among the general population as identified by high-resolution ultrasound [1, 2]. Although most thyroid nodules are benign, according to the results of ultrasound-guided fine-needle aspiration, 9–15% of nodules are malignant [2–4]. After evaluation of a thyroid nodule, it is important to determine the most appropriate strategies to properly manage malignant nodules while avoiding unnecessary procedures and surgery in patients with benign nodules.
Ultrasound is an ideal imaging modality for examining thyroid nodules because it is noninvasive and cost-effective. Many studies have investigated the use of ultrasound features for predicting the risk of nodule malignancy [2, 3, 5], and several ultrasound features have been proposed as possible markers of malignancy, including the presence of hypoechogenicity, microcalcifications, a taller-than-wide shape, ill-defined margins, internal vascularity, extracapsular invasion, and a suspicious lymph node [6]. However, no single ultrasound feature is adequately sensitive or specific to identify all malignant nodules. Meanwhile, ultrasound is a rather subjective and operator-dependent diagnostic tool. High inter- and intraobserver agreement is achieved only among experienced radiologists for interpretation of ultrasound findings related to thyroid nodules [7].
In the past few decades, a number of machine learning algorithms have been developed to create classifier models for preoperative diagnosis, including binary logistic regression, the naïve Bayes classifier, the support vector machine (SVM), and the radial basis function (RBF)–neural network (NN) [8–11], and these differ in their characteristics. For instance, SVMs are widely accepted linear classifiers and are considered very effective for pattern recognition and machine learning. The key feature of the SVM is the maximization of the margin (functional gap) between two classes so as to minimize the generalization error. The naïve Bayes classifier also has been widely used in disease diagnosis. An advantage of the naïve Bayes classifier is that it requires only a small amount of training data to estimate the parameters necessary for classification. In the current study, we designed classifier models using different machine learning algorithms to differentiate malignant from benign thyroid nodules on ultrasound and compared the diagnostic performances of the machine learning algorithms with those of two radiologists using ROC curve analysis.

Materials and Methods

Study Population

The study cohort included 1073 patients who underwent partial or total thyroidectomy between January 2012 and June 2014 at Jiangsu Institute of Nuclear Medicine. The decision to undergo surgery was based on any one of the following criteria: first, abnormal results of ultrasound-guided fine-needle aspiration, including malignancy, suspicion of malignancy, and follicular lesion of undetermined significance; second, suspicious malignant ultrasound findings, including hypoechogenicity, microcalcifications, a taller-than-wide shape and associated cervical lymphadenopathy with round shape, intranodal cystic components, or microcalcifications; and third, pressure symptoms [12]. Patients' medical records, including age, sex, ultrasound features of the dominant nodule, and histopathologic results, were collected retrospectively. The dominant nodule was defined as the nodule most likely to be malignant among all the nodules observed on ultrasound. The largest nodule was designated as the dominant nodule when none of the nodules showed suspicious ultrasound features. Nodule identification was achieved via ultrasound imaging and gross pathologic examination records. We excluded 103 nodules in 103 patients because the ultrasound data or images for these nodules were incomplete, or there were coalescent thyroid lesions not clearly distinguishable and matched with the histopathologic results. This retrospective study was approved by the institutional review board of Jiangsu Institute of Nuclear Medicine, and the need for informed patient consent for inclusion in this study was waived.

Ultrasound and Image Interpretation

Thyroid ultrasound examination was performed by a radiologist using an ultrasound system (iU22, Philips Healthcare) with a 5–12-MHz transducer. Static images were archived as JPEG image files for subsequent evaluation. The ultrasound images of the dominant nodule were presented in a random fashion by the study coordinator (an author with 6 years of ultrasound experience). During interpretation, two radiologists with differing experience in ultrasound examination of thyroid nodules (17 and 3 years, respectively) were blinded to any subsequent cytologic or histologic diagnosis as well as the assessments of the other radiologist. In the measurement of nodules, lengths corresponded to the long-axis measurements on the longitudinal scan, and widths and thicknesses corresponded to the short-axis measurements on the transverse scan. The following ultrasound features were documented for each nodule: location (left, right, or isthmus), position (upper pole, middle, or lower pole), shape (ovoid-to-round, taller-than-wide, or irregular), margin (ill-defined or well-defined), internal contents (solid, mixture, or cystic), echogenicity (hyperechoic, isoechoic, hypoechoic, or marked hypoechoic), calcification (microcalcification, macrocalcification, or rim calcification), echogenic foci in solid portion (absent or present), halo sign (usual or unusual), infiltration and extracapsular invasion (absent or present), multifocal (absent or present), increased intranodular vascularity (absent or present), and abnormal lymphadenopathy (absent or present). The definitions and categorizations of shape, echogenicity, and calcification were consistent with those used in the literature [6]. Partly interrupted rim calcification corresponded to incomplete peripheral calcification surrounding the lesion. Ill-defined margin included spiculated and microlobulated margin.
The internal content was categorized in terms of the ratio of the cystic portion to the solid portion as solid (≤ 10% of the cystic portion), mixture (> 10% of the cystic portion and ≤ 90% of the cystic portion), and cystic (> 90% of the cystic portion) [6]. Echogenic foci were bright spots with comet tails caused by reverberation artifacts. When a lesion was surrounded by an irregular or thick anechoic halo, it was considered an unusual halo. Infiltration was defined as an interruption of the hyperechogenicity of the thyroid capsule. Extracapsular invasion was considered if the tumoral tissue extended beyond the contours of the thyroid gland and invaded into adjacent structures [13]. A multifocal lesion was defined by the presence of two or more isolated or noncontiguous lesions with similar suspicious malignant ultrasound features in one or both thyroid lobes [14]. Increased intranodular vascularity was defined by an increased predominance within the nodule in comparison with the surrounding parenchyma. Lymph nodes were considered abnormal when a suspicious sonographic finding (calcifications and cystic change) was present, whereas round shape, hyperechogenicity, and abnormal vascularity of lymph nodes were excluded for low specificity or positive predictive value [15, 16].

Sonographic Scoring System

In accordance with our experience, combined with the data available in the literature [16–18], we adopted a five-tier sonographic scoring system for stratifying the risk of nodule malignancy in routine clinical practice. Ultrasound features, including marked hypoechogenicity, taller-than-wide shape, microcalcifications, multifocal, infiltration or extracapsular invasion, and abnormal lymphadenopathy, were regarded as indications for malignancy. Borderline ultrasound features included hypoechogenicity, an irregular shape, an ill-defined margin, an unusual halo, increased intranodular vascularity, macrocalcifications, and partly interrupted rim calcifications. The ultrasound features of a benign nodule included isoechogenicity, hyperechogenicity, an ovoid-to-round shape, a well-defined margin, echogenic foci in the cystic portion, a usual halo (regular and thin), and a spongiform and pure cystic nodule. The criteria for ultrasound-based diagnosis of a thyroid nodule were as follows: If a nodule had three or more ultrasound features of malignancy, regardless of borderline or benign ultrasound features, the nodule was considered malignant. If a thyroid nodule had one or two ultrasound features of malignancy, regardless of borderline or benign ultrasound features, it was considered suspicious for malignancy. If a thyroid nodule had one or more borderline ultrasound features without malignant features, regardless of benign ultrasound features, it was considered borderline. If a thyroid nodule had two or more benign ultrasound features and no malignant or borderline ultrasound features, it was considered likely benign. Finally, if a thyroid nodule was a spongiform or pure cystic nodule with no malignant or borderline ultrasound features, it was considered benign [18].
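The five-tier rules above can be sketched as a small decision function. This is only an illustration under stated assumptions: the feature strings, the function name `grade_nodule`, and the `"unclassified"` fallback are hypothetical, and checking the spongiform or pure cystic rule before the "likely benign" rule is an assumed precedence that the text does not spell out.

```python
from typing import Iterable

# Feature groups as listed in the scoring system; the exact strings are illustrative.
MALIGNANT = {
    "marked hypoechogenicity", "taller-than-wide shape", "microcalcifications",
    "multifocal", "infiltration or extracapsular invasion", "abnormal lymphadenopathy",
}
BORDERLINE = {
    "hypoechogenicity", "irregular shape", "ill-defined margin", "unusual halo",
    "increased intranodular vascularity", "macrocalcifications",
    "partly interrupted rim calcifications",
}
BENIGN = {
    "isoechogenicity", "hyperechogenicity", "ovoid-to-round shape",
    "well-defined margin", "echogenic foci in cystic portion", "usual halo",
    "spongiform", "pure cystic",
}


def grade_nodule(features: Iterable[str]) -> str:
    """Return the tier for one nodule given its documented ultrasound features."""
    found = set(features)
    unknown = found - MALIGNANT - BORDERLINE - BENIGN
    if unknown:
        raise ValueError(f"unrecognized features: {sorted(unknown)}")

    n_malignant = len(found & MALIGNANT)
    if n_malignant >= 3:
        return "malignant"
    if n_malignant >= 1:
        return "suspicious for malignancy"
    if found & BORDERLINE:
        return "borderline"
    # Assumed precedence: the more specific spongiform / pure cystic rule wins.
    if found & {"spongiform", "pure cystic"}:
        return "benign"
    if len(found & BENIGN) >= 2:
        return "likely benign"
    # A single benign feature is not covered by the stated rules (hypothetical fallback).
    return "unclassified"
```

For example, `grade_nodule(["microcalcifications", "hypoechogenicity"])` falls in the "suspicious for malignancy" tier because one malignant feature is present regardless of the borderline one.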

Classifier Model Construction

We built the models using MATLAB (version 8.1, MathWorks). The observations of radiologist 1 (i.e., the experienced radiologist) were evaluated using the machine learning algorithms to obtain the model diagnoses for the same cases. The standard data-mining process was adopted in the classifier development and consisted of the following steps: data cleaning and integration, data selection and transformation, and data mining and pattern evaluation. All variables describing clinical information and ultrasound features were selected as input variables for the classifier models. Subsequent attribute optimization was used to optimize variable selection and input node confirmation. A fivefold cross-validation using a train-and-test method was performed to guarantee the validity of the results. Unrelated and confounding variables were then pruned from the final models. A random number was assigned to each nodule. Nodules with an odd number (n = 485, 50%) were assigned to a training cohort, and those with an even number were assigned to a validation cohort (n = 485, 50%).
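The odd/even cohort assignment can be illustrated with a short sketch. It assumes that the random numbers were distinct integers from 1 to N (a random permutation), which is one way to obtain an exact 485/485 split; the source does not state how the numbers were generated, and the function name and seed are hypothetical.

```python
import random


def split_by_random_parity(nodule_ids, seed=0):
    """Assign each nodule a distinct random number and split by odd/even."""
    rng = random.Random(seed)
    # Assumption: the random numbers form a permutation of 1..N.
    numbers = list(range(1, len(nodule_ids) + 1))
    rng.shuffle(numbers)

    training, validation = [], []
    for nodule_id, number in zip(nodule_ids, numbers):
        if number % 2 == 1:
            training.append(nodule_id)    # odd number -> training cohort
        else:
            validation.append(nodule_id)  # even number -> validation cohort
    return training, validation


# 970 nodules, identified here by placeholder indices 0..969.
training, validation = split_by_random_parity(list(range(970)))
```

With 970 nodules this yields two disjoint cohorts of 485 each, matching the cohort sizes reported above.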
In the first training phase, we applied three classification algorithms to the significant attributes obtained. For the naïve Bayes classifier, we first set the evidence on the input nodes, and the output nodes could then be queried using Bayesian network inference. The links in the naïve Bayes classifier were directed from output to input, which simplified the classifier because there were no interactions between the inputs. The SVM classifier was trained with a learning algorithm from optimization theory. The training instances closest to the maximum margin hyperplane were called support vectors, which were then used to build an optimal linear separating hyperplane. The RBF-NN contained three layers of nodes but with only a single hidden layer. The RBF-NN classifier used a clustering method to determine the centers of the RBFs, and the least-mean-square method was used to determine the connection weights between the hidden layer and output layer. Training was terminated when the error between the actual output and the desired output was minimized or the given maximum number of training epochs was reached. Cross-validation was used to reduce the risk of overtraining, which leads to good performance in a training cohort but poor performance in a validation cohort. The data were split via stratified sampling to ensure the same class distribution in each subset and to generate comparison results with lower bias.
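For orientation, the textbook form of a Gaussian RBF network with a least-mean-square weight update is given below. The source gives no equations, so every symbol here is an assumption: centers $\mathbf{c}_j$ (from clustering), widths $\sigma_j$, $M$ hidden units, output weights $w_j$, learning rate $\eta$, and target $y$.

```latex
\phi_j(\mathbf{x}) = \exp\!\left(-\frac{\lVert \mathbf{x}-\mathbf{c}_j\rVert^{2}}{2\sigma_j^{2}}\right),
\qquad
\hat{y}(\mathbf{x}) = w_0 + \sum_{j=1}^{M} w_j\,\phi_j(\mathbf{x}),
\qquad
w_j \leftarrow w_j + \eta\,\bigl(y-\hat{y}(\mathbf{x})\bigr)\,\phi_j(\mathbf{x})
```

Only the output weights are updated in this scheme; the hidden-layer centers stay fixed once clustering has placed them.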
Subsequently, the classifiers were used for classification in the second validation phase. The performance was analyzed with ROC curves. Ultimately, once evidence is added to a trained and cross-validated classifier given prior knowledge, it generates a case-specific prediction of malignancy by estimating the probability of each class.
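As an illustration of how a trained classifier turns evidence into a case-specific probability, the following is a minimal categorical naïve Bayes sketch. It is not the study's MATLAB implementation: the add-one (Laplace) smoothing, the function names, and the four toy cases are all assumptions made for the example.

```python
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels):
    """Count class frequencies and per-class feature-value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (label, feature) -> Counter of values
    feature_values = defaultdict(set)    # feature -> values seen in training
    for row, label in zip(rows, labels):
        for feature, value in row.items():
            value_counts[(label, feature)][value] += 1
            feature_values[feature].add(value)
    return class_counts, value_counts, feature_values


def posterior(model, row):
    """Posterior class probabilities for one case, with add-one smoothing (assumed)."""
    class_counts, value_counts, feature_values = model
    total = sum(class_counts.values())
    scores = {}
    for label, n_label in class_counts.items():
        score = n_label / total  # class prior
        for feature, value in row.items():
            k = len(feature_values[feature])
            score *= (value_counts[(label, feature)][value] + 1) / (n_label + k)
        scores[label] = score
    norm = sum(scores.values())
    return {label: score / norm for label, score in scores.items()}


# Toy data invented for illustration only; not taken from the study.
rows = [
    {"microcalcification": "present", "shape": "taller-than-wide"},
    {"microcalcification": "present", "shape": "ovoid-to-round"},
    {"microcalcification": "absent", "shape": "ovoid-to-round"},
    {"microcalcification": "absent", "shape": "ovoid-to-round"},
]
labels = ["malignant", "malignant", "benign", "benign"]
model = train_naive_bayes(rows, labels)
probs = posterior(model, {"microcalcification": "present", "shape": "taller-than-wide"})
```

Because the inputs are treated as conditionally independent given the class, the posterior is just the normalized product of the prior and the per-feature likelihoods.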

Statistical Analysis

Continuous variables are presented as mean ± SD and were compared with the two-sample t test. Categoric variables are presented as percentages, and chi-square or Fisher exact tests were used to determine the statistical significance of the differences between the two groups. For evaluation of the performance of machine learning algorithms and the two radiologists in the task of predicting the probability of malignancy, the AUC values were calculated and compared.
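The AUC can be read as the probability that a randomly chosen malignant nodule receives a higher score than a randomly chosen benign one, with ties counted as one half. A minimal sketch of that pairwise calculation follows; the source does not say how its AUC values were computed, and the function name and the toy tier scores are illustrative assumptions.

```python
def auc(scores_malignant, scores_benign):
    """Pairwise (Mann-Whitney) estimate of the area under the ROC curve."""
    wins = 0.0
    for m in scores_malignant:
        for b in scores_benign:
            if m > b:
                wins += 1.0   # malignant case ranked above benign case
            elif m == b:
                wins += 0.5   # ties count as half
    return wins / (len(scores_malignant) * len(scores_benign))


# Toy example: five-tier scores (1 = benign ... 5 = malignant), invented for illustration.
example_auc = auc([5, 4, 4, 3], [1, 2, 3, 4])
```

The same function works for continuous outputs such as a classifier's predicted probability of malignancy.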


Results

Clinical Characteristics

The mean age of the 970 patients was 46.71 ± 12.14 (SD) years (range, 13–83 years); 756 (77.94%) were female and 214 (22.06%) were male. There were 507 cases of malignant nodules (52.27%) and 463 cases of benign nodules (47.73%). Diagnoses of malignancy included papillary thyroid carcinoma (n = 487), follicular thyroid carcinoma (n = 12), medullary thyroid carcinoma (n = 4), well-differentiated carcinoma (n = 3), and clear cell carcinoma (n = 1).
The sex distribution was not statistically significantly different between the patients with malignant and benign nodules (male-to-female ratio, 103:404 vs 111:352; p = 0.1702), but a statistically significant difference in age was observed between the patients with malignant and benign nodules (44.76 ± 12.15 years vs 48.92 ± 12.14 years; p < 0.01). Benign nodules were statistically significantly larger than the malignant nodules (p < 0.01), and a taller-than-wide shape was found more frequently in malignant nodules (39.05%) than in benign nodules (10.37%). More malignant nodules had an ill-defined margin compared with the benign nodules (91.72% vs 57.24%). Solid nodules were more frequently detected among the malignant nodules (96.45% vs 67.82%). Hypoechogenicity was an ultrasound feature found in most malignant nodules (94.87%), and the frequency of microcalcifications in malignant nodules was statistically significantly higher than that in benign nodules (44.38% vs 8.64%). When echogenic foci presented in the solid portion, the prevalence of malignancy was 24.4% (10/41) (Fig. 1). The presence of unusual halo sign, infiltration and extracapsular invasion, and multifocal and abnormal lymphadenopathy were statistically significantly different between malignant and benign groups (p < 0.01). No statistically significant differences were found with respect to patient sex, left and right lobe location, cystic contents, hyperechogenicity and marked hypoechogenicity, macrocalcification, partly interrupted rim calcification, and increased intranodular vascularity between benign and malignant nodules (p > 0.05). Thus, these features were regarded as suspicious invalid redundant features for malignancy.
Fig. 1. Two patients with papillary thyroid carcinoma. In both cases, a classic comet-tail artifact (an inverted echogenic triangle) was clearly documented posterior to the focus in the solid portion.
A, 51-year-old woman with papillary thyroid carcinoma. Longitudinal ultrasound image shows 4.6-mm hypoechogenic solid nodule (area within arrows) with internal echogenic foci in lower pole of right lobe of thyroid gland. Both radiologists classified nodule as borderline. Nodule had probability of malignancy of 58.4% for naïve Bayes classifier, 68.9% for support vector machine (SVM), and 67.1% for radial basis function (RBF)–neural network (NN).
B, 44-year-old man with papillary thyroid carcinoma. Ultrasound image shows 13.5-mm mixture nodule containing multiple punctate echogenic foci in both cystic and solid portions. Radiologist 1 classified nodule as borderline, and radiologist 2 classified it as likely benign. Nodule had probability of malignancy of 55.2% for naïve Bayes classifier, 61.0% for SVM, and 63.6% for RBF-NN.
We assessed the predictive accuracy of machine learning algorithms, both including (Table 1) and excluding (Table 2) the suspicious invalid redundant features. After excluding the suspicious invalid redundant features, the accuracies of the naïve Bayes classifier and SVM for predicting malignancy of thyroid nodules (83.30% and 83.89%, respectively) were better than those achieved with inclusion of the suspicious invalid redundant features (82.99% and 83.71%, respectively; p < 0.05). After excluding the suspicious invalid redundant features, the accuracy of the RBF-NN was slightly less than that with inclusion of the suspicious invalid redundant features (83.61% vs 83.92%; p = 0.6302), whereas the AUC was higher with suspicious invalid redundant features excluded versus included (0.8951 vs 0.8937; p < 0.05). Therefore, we considered that the suspicious invalid redundant features were unrelated to the construction of the classifier and pruned them from the final models.
TABLE 1: The Performance of Proposed Predictive Model With Suspicious Invalid Redundant Features
Predictive Model | Sensitivity (%) | Specificity (%) | Accuracy (%) | AUC
Naïve Bayes classifier | 90.78 (2.82) | 74.37 (6.13) | 82.99 (3.94) | 0.8775 (0.0239)
Support vector machine | 90.49 (3.46) | 76.37 (6.63) | 83.71 (3.97) | 0.8881 (0.0247)
Radial basis function-neural network | 90.89 (2.61) | 76.33 (6.29) | 83.92 (3.54) | 0.8937 (0.0232)

Note—Data are mean (SD).

TABLE 2: The Performance of Proposed Predictive Model Without Suspicious Invalid Redundant Features
Predictive Model | Sensitivity (%) | Specificity (%) | Accuracy (%) | AUC
Naïve Bayes classifier | 91.21 (3.33) | 74.60 (6.38) | 83.30 (4.46) | 0.8777 (0.0254)
Support vector machine | 88.79 (3.46) | 76.44 (5.99) | 83.89 (3.29) | 0.8730 (0.0284)
Radial basis function-neural network | 90.48 (2.67) | 76.06 (6.27) | 83.61 (3.56) | 0.8951 (0.0210)

Note—Data are mean (SD).

Comparison of Models

The performances of the proposed models were evaluated using the validation cohort (Table 3). The most favorable sensitivity, specificity, positive predictive value, negative predictive value, and accuracy were achieved when indeterminate nodules were classified as benign. In the final evaluation of the validation cohort, radiologist 1 (the experienced radiologist) achieved the highest prediction accuracy of 88.66% with a sensitivity of 91.54% and a specificity of 85.33%. Radiologist 2 (the less experienced radiologist) achieved a prediction accuracy of 81.03% with a sensitivity of 85.38% and a specificity of 76.00%. For comparison, the corresponding accuracy, sensitivity, and specificity values were 83.30%, 89.62%, and 76.00% for the naïve Bayes classifier; 83.09%, 89.23%, and 75.11% for the SVM; and 84.74%, 92.31%, and 76.00% for the RBF-NN. The AUC value for radiologist 1 was higher (AUC = 0.9135) than those for radiologist 2, naïve Bayes classifier, SVM, and RBF-NN (AUC = 0.8492, 0.8811, 0.9033, and 0.9103, respectively; p < 0.05). In addition, the AUC for the RBF-NN was statistically significantly higher than those for the other machine learning algorithms and radiologist 2 (all p < 0.05).
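The relationship between sensitivity, specificity, and accuracy can be checked with a short calculation. The counts below are not reported in the source: they are back-calculated on the assumption that the validation cohort of 485 nodules contained 260 malignant and 225 benign nodules, a split that reproduces the percentages quoted for radiologist 1 and the RBF-NN. The function name is hypothetical.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Standard diagnostic metrics from a 2 x 2 confusion table."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


# Assumed counts (inferred from the reported percentages, not given in the source).
radiologist_1 = diagnostic_metrics(tp=238, fn=22, tn=192, fp=33)
rbf_nn = diagnostic_metrics(tp=240, fn=20, tn=171, fp=54)

for name, metrics in (("Radiologist 1", radiologist_1), ("RBF-NN", rbf_nn)):
    print(name, {key: round(100 * value, 2) for key, value in metrics.items()})
```

Under these assumed counts, the RBF-NN detects two more malignant nodules than radiologist 1 but calls 21 more benign nodules malignant, which is why its sensitivity is higher while its specificity and accuracy are lower.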
TABLE 3: The Performance of Proposed Predictive Models and Two Radiologists in Validation Cohort
Predictive Model | Sensitivity (%) | Specificity (%) | Accuracy (%) | AUC
Radiologist 1 | 91.54 (4.26) | 85.33 (5.04) | 88.66 (3.40) | 0.9135 (0.0190)
Radiologist 2 | 85.38 (4.46) | 76.00 (6.13) | 81.03 (3.55) | 0.8492 (0.0234)
Naïve Bayes classifier | 89.62 (2.75) | 76.00 (5.59) | 83.30 (4.01) | 0.8811 (0.0231)
Support vector machine | 89.23 (3.55) | 75.11 (6.64) | 83.09 (4.48) | 0.9033 (0.0280)
Radial basis function-neural network | 92.31 (4.33) | 76.00 (5.38) | 84.74 (4.10) | 0.9103 (0.0231)

Note—Data are mean (SD).

Discussion


The primary aim of this study was to design classifier models using different machine learning algorithms for the differential diagnosis of suspicious thyroid nodules on ultrasound. The use of more highly discriminatory attributes as inputs into the machine learning algorithms enhanced the accuracy of the evaluations. There was no interobserver bias for ultrasound procedures, which were performed by a single experienced radiologist. The bias in model construction was acceptable because relying on the observations of a single radiologist reduced misperception of the features. The optimum classifier model, the RBF-NN, could effectively predict malignancy (AUC = 0.9103) in thyroid nodules but underperformed compared with the experienced radiologist (AUC = 0.9135).
The goal of thyroid nodule evaluation is to determine whether a nodule is benign or malignant to choose the most appropriate management. As a cost-effective tool, ultrasound has a high sensitivity for detecting thyroid nodules. Several published studies have proven that some ultrasound features are associated with malignancy, including hypoechogenicity, microcalcifications, a taller-than-wide shape, spiculated margins, and intranodular vascularity [2, 3, 5, 19–21]. Unfortunately, thyroid carcinoma can have different presentations depending on the histopathologic subtype. Therefore, the predictive values of these features are extremely variable between studies. In addition, variations in radiologists' perceptions and the lack of standard definitions of the features observed in the images also contribute to variability in thyroid nodule diagnosis.
In recent decades, a few clinical research groups have used data mining to develop quantitative and reproducible decision models for determining the diagnosis or treatment strategies in thyroid diseases [22–24]. Their results showed that machine learning algorithms could be helpful for improving the accuracy of diagnoses by reducing the diagnostic error rate. One key contributing factor to the efficiency of machine learning algorithms is the capability of incremental learning from qualitative data. Using a training dataset, the machine learning algorithms are capable of determining the probability of certain classes of thyroid disease in some cases, given the values of the predictor variables. Different machine learning algorithms were used in the current study in an attempt to select the best one that yields satisfactory results.
As a discriminative model, the SVM seeks the optimal hyperplane between different categories, reflecting the differences between heterogeneous data. The SVM is widely used for classification because of its simplicity and robustness. However, the discriminative model is inherently supervised and, thus, cannot easily be extended to unsupervised learning [23, 25]. The naïve Bayes classifier, a generative model, analyzes the statistical distribution of the data and thereby reflects the similarity of homogeneous data. The naïve Bayes classifier is highly scalable and requires only a small amount of training data to estimate the parameters necessary for classification [26]. Both discriminative models and generative models have limitations, and neither model type provides a perfect solution for practical application. The RBF-NN, a hybrid classifier model merging the discriminative model (multilayer perceptron) with the generative model (Gaussian mixture model), is able to approximate any arbitrary nonlinear function in a complex multidimensional space with reduced computational complexity [27]. Hence, with the properties of experience-based learning and ability for generalization, the RBF-NN is regarded as an ideal solution for solving complex problems [28, 29]. On the basis of the results of the current study, compared with the other machine learning algorithms, the RBF-NN showed excellent performance in all aspects, and even though the accuracy of the RBF-NN was slightly reduced after exclusion of the suspicious invalid redundant features, the RBF-NN achieved the highest AUC value of 0.8951. These results also suggest that the RBF-NN handled noisy data well. Overall, the proper selection of a classifier on the basis of its inherent properties was the key point for model construction.
Accurate observation and interpretation of thyroid features is important for model construction; therefore, in the current study, we applied only the observations provided by the most experienced radiologist in developing the models. When we compared the diagnostic performances between the machine learning algorithms and radiologists, the RBF-NN significantly outperformed even the experienced radiologist in terms of sensitivity. This finding indicates that the RBF-NN has a better chance of correctly identifying positive cases (i.e., malignant thyroid nodules). However, none of the machine learning algorithms matched the accuracy, specificity, positive predictive value, and negative predictive value achieved by the experienced radiologist. First, this outcome may be due to both the complexity of the machine learning algorithms and the huge processing power required for so many data points. Actually, it is impossible to construct a model that both captures the regularities in the training cohort accurately and generalizes well to the validation cohort. When the training cohort data contain a large amount of noise, the prediction will be less accurate. Therefore, technically, an ideal model that assumes an infinitely large dataset cannot be achieved by simply expanding the dataset. Second, some quantitative and reproducible ultrasound-based classification systems, including the thyroid imaging reporting and data system, have been developed to improve the consistency and quality of radiologic reporting [30, 31]. With consideration of the findings of a previous study [10], in a sense, the outperformance of an experienced radiologist may also be attributed to the application of a sonographic scoring system. In addition, machine learning algorithms have more difficulty in accurately predicting malignancy in nodules with two concomitant pathologic diagnoses. The poor learning ability of the models for some histopathologic subtypes is attributed to the limited number of cases.
The significant difference in the diagnostic accuracy of the two radiologists may be explained by variations in observer perception and interpretation on the basis of the ultrasound features. Inexperienced or nonspecialist radiologists may overestimate their knowledge and experience, making decisions on the basis of a limited number of conspicuous features. In addition, they may fail to consider all of the features systematically. Machine learning algorithms have the capability to consistently and comprehensively merge all of the variables and readily adjust in the context of noisy data. Therefore, a classifier model based on a machine learning algorithm could potentially facilitate decision making and case education for inexperienced radiologists, which is consistent with the conclusion of a previous study [10].
There were some limitations to the current study. First, the results were derived from a selected preoperative population; consequently, the malignancy rate of thyroid nodules was relatively high (52.3%) compared with those in other studies (45–51%) [10, 11]. In addition, the validated model must be fit for utilization in situations of clinical decision making with respect to operative indications, and most cases of malignancy involved papillary carcinoma, with fewer cases of other subtypes of thyroid carcinoma included. Second, because of the limited numbers of cases included, some ultrasound features, such as cystic and partly interrupted rim calcification, that showed no significant difference between malignant and benign nodules were regarded as suspicious invalid redundant features and then were pruned from model construction. The significance of including these ultrasound features should be validated in future comprehensive studies. Third, we retrospectively reviewed still images specifically selected by an investigator, rather than ultrasound images collected in real time. Further studies are needed to evaluate the validity and interobserver consistency for the classifier models based on machine learning algorithms.
We have already started some preliminary work on creating a visual interface builder. By inputting the ultrasound features given by an experienced radiologist, a malignancy risk estimation system for thyroid nodule based on the developed classifier model will provide a real-time calculation of the probability for malignancy, which will play a valuable role for management decision in clinical practice.

Conclusion


In the current study, we have developed different machine learning algorithms for use in the differential diagnosis of suspicious thyroid nodules. Our results showed that the machine learning algorithms, using input from an experienced radiologist, performed better than an inexperienced radiologist but not as well as the experienced radiologist whose data were used to create the algorithms. In addition, a comparative study of the different models showed that the RBF-NN outperformed the other machine learning algorithms.


Cooper DS, Doherty GM, Haugen BR, et al. Revised American Thyroid Association management guidelines for patients with thyroid nodules and differentiated thyroid cancer: the American Thyroid Association (ATA) guidelines taskforce on thyroid nodules and differentiated thyroid cancer. Thyroid 2009; 19:1167–1214
Frates MC, Benson CB, Charboneau JW, et al. Management of thyroid nodules detected at US: Society of Radiologists in Ultrasound Consensus Conference Statement 1. Radiology 2005; 237:794–800
Papini E, Guglielmi R, Bianchini A, et al. Risk of malignancy in nonpalpable thyroid nodules: predictive value of ultrasound and color-Doppler features. J Clin Endocrinol Metab 2002; 87:1941–1946
Nam-Goong IS, Kim HY, Gong G, et al. Ultrasonography-guided fine-needle aspiration of thyroid incidentaloma: correlation with pathological findings. Clin Endocrinol (Oxf) 2004; 60:21–28
Kim EK, Park CS, Chung WY, et al. New sonographic criteria for recommending fine-needle aspiration biopsy of nonpalpable solid nodules of the thyroid. AJR 2002; 178:687–691
Moon WJ, Baek JH, Jung SL, et al.; Korean Society of Thyroid Radiology; Korean Society of Radiology. Ultrasonography and the ultrasound-based management of thyroid nodules: consensus statement and recommendations. Korean J Radiol 2011; 12:1–14
Seo JY, Kim EK, Baek JH, Shin JH, Han KH, Kwak JY. Can ultrasound be as a surrogate marker for diagnosing a papillary thyroid cancer? Comparison with BRAF mutation analysis. Yonsei Med J 2014; 55:871–878
Lim KJ, Choi CS, Yoon DY, et al. Computer-aided diagnosis for the differentiation of malignant from benign thyroid nodules on ultrasonography. Acad Radiol 2008; 15:853–858
Chen SJ, Chang CY, Chang KY, et al. Classification of the thyroid nodules based on characteristic sonographic textural feature and correlated histopathology using hierarchical support vector machines. Ultrasound Med Biol 2010; 36:2018–2026
Liu YI, Kamaya A, Desser TS, Rubin DL. A Bayesian network for differentiating benign from malignant thyroid nodules using sonographic and demographic features. AJR 2011; 196:[web]W598–W605
Stojadinovic A, Peoples GE, Libutti SK, et al. Development of a clinical decision model for thyroid nodules. BMC Surg 2009; 9:12
Chinese Society of Endocrinology. Guidelines taskforce on thyroid nodules and differentiated thyroid cancer. Chin J Endocrinol Metab 2012; 28:779–797
Park JP, Roh JL, Lee JH, et al. Risk factors for central neck lymph node metastasis of clinically noninvasive, node-negative papillary thyroid microcarcinoma. Am J Surg 2014; 208:412–418
Nam SY, Shin JH, Han BK, et al. Preoperative ultrasonographic features of papillary thyroid carcinoma predict biological behavior. J Clin Endocrinol Metab 2013; 98:1476–1482
Sohn YM, Kwak JY, Kim EK, Moon HJ, Kim SJ, Kim MJ. Diagnostic approach for evaluation of lymph node metastasis from thyroid cancer using ultrasound and fine-needle aspiration biopsy. AJR 2010; 194:38–43
Lee YH, Kim DW, In HS, et al. Differentiation between benign and malignant solid thyroid nodules using an US classification system. Korean J Radiol 2011; 12:559–567
Horvath E, Majlis S, Rossi R, et al. An ultrasonogram reporting system for thyroid nodules stratifying cancer risk for clinical management. J Clin Endocrinol Metab 2009; 94:1748–1751
Andrioli M, Carzaniga C, Persani L. Standardized ultrasound report for thyroid nodules: the endocrinologist's viewpoint. Eur Thyroid J 2013; 2:37–48
Chan BK, Desser TS, McDougall IR, Weigel RJ, Jeffrey RB Jr. Common and uncommon sonographic features of papillary thyroid carcinoma. J Ultrasound Med 2003; 22:1083–1090
Cappelli C, Castellano M, Pirola I, et al. The predictive value of ultrasound findings in the management of thyroid nodules. QJM 2007; 100:29–35
Moon WJ, Jung SL, Lee JH, et al. Benign and malignant thyroid nodules: US differentiation—multicenter retrospective study. Radiology 2008; 247:762–770
Li LN, Ouyang JH, Chen HL, Liu DY. A computer aided diagnosis system for thyroid disease using extreme learning machine. J Med Syst 2012; 36:3327–3337
Gopinath B, Shanthi N. Computer-aided diagnosis system for classifying benign and malignant thyroid nodules in multi-stained FNAB cytological images. Australas Phys Eng Sci Med 2013; 36:219–230
Jajroudi M, Baniasadi T, Kamkar L, Arbabi F, Sanei M, Ahmadzade M. Prediction of survival in thyroid cancer using data mining technique. Technol Cancer Res Treat 2014; 13:353–359
Tsantis S, Cavouras D, Kalatzis I, Piliouras N, Dimitropoulos N, Nikiforidis G. Development of a support vector machine-based image analysis system for assessing the thyroid nodule malignancy risk on ultrasound. Ultrasound Med Biol 2005; 31:1451–1459
Domingos P, Pazzani M. On the optimality of the simple Bayesian classifier under zero-one loss. Mach Learn 1997; 29:103–130
Hu YH, Hwang JN. Handbook of neural networks signal processing. New York, NY: CRC Press, 2001:81
Unkelbach J, Sun Y, Schmidhuber J. An EM based training algorithm for recurrent neural networks. In: Alippi C, Polycarpou M, Panayiotou C, Ellinas G, eds. ICANN 2009: 19th International Conference on Artificial Neural Networks. Limassol, Cyprus: European Neural Network Society, 2009: 964–976
Tsihrintzis GA, Jain LC, eds. Multimedia services in intelligent environments: advanced tools and methodologies. Berlin, Germany: Springer Science & Business Media, 2008:62
Park JY, Lee HJ, Jang HW, et al. A proposal for a thyroid imaging reporting and data system for ultrasound features of thyroid carcinoma. Thyroid 2009; 19:1257–1264
Kwak JY, Han KH, Yoon JH, et al. Thyroid imaging reporting and data system for US features of nodules: a step in establishing better stratification of cancer risk. Radiology 2011; 260:892–899

Information & Authors


Published In

American Journal of Roentgenology
Pages: 859 - 864
PubMed: 27340876


Submitted: November 9, 2015
Accepted: April 1, 2016
Version of record online: June 24, 2016


Keywords

classifier, nodule, thyroid, ultrasound



Hongxun Wu
Department of Ultrasound, Jiangyuan Hospital Affiliated to Jiangsu Institute of Nuclear Medicine (Key Laboratory of Nuclear Medicine, Ministry of Health/Jiangsu Key Laboratory of Molecular Nuclear Medicine), 20 Qianrong Rd, Wuxi, Jiangsu 214063, China.
Zhaohong Deng
School of Digital Media, Jiangnan University, Wuxi, Jiangsu, China.
Bingjie Zhang
Department of Ultrasound, Jiangyuan Hospital Affiliated to Jiangsu Institute of Nuclear Medicine (Key Laboratory of Nuclear Medicine, Ministry of Health/Jiangsu Key Laboratory of Molecular Nuclear Medicine), 20 Qianrong Rd, Wuxi, Jiangsu 214063, China.
Qianyun Liu
Department of Ultrasound, Jiangyuan Hospital Affiliated to Jiangsu Institute of Nuclear Medicine (Key Laboratory of Nuclear Medicine, Ministry of Health/Jiangsu Key Laboratory of Molecular Nuclear Medicine), 20 Qianrong Rd, Wuxi, Jiangsu 214063, China.
Junyong Chen
School of Digital Media, Jiangnan University, Wuxi, Jiangsu, China.


Address correspondence to H. Wu ([email protected]).

Funding Information

Supported by grant H201227 from Planned Project of Health Department of Jiangsu Province.
