OBJECTIVE. The objectives of our study were to evaluate the in vivo reproducibility of automated volume calculations of small lung nodules with both low-dose and standard-dose CT and to assess whether repeatability within each technique varies according to the diameter, site, or morphology of the nodule or to percentage of emphysema.
SUBJECTS AND METHODS. Sixty-six subjects with 83 solid pulmonary nodules between 5 and 10 mm in diameter were enrolled in this prospective study. Four consecutive MDCT data sets, two low dose and two standard dose, were obtained for each nodule on separate breath-holds during the same session. The volume of each nodule was calculated by automated software. Repeatability was evaluated by Bland-Altman's approach and the coefficient of repeatability. Associations of the percentage of volume variation between two measurements with nodule diameter, emphysema percentage, nodule site, and nodule morphology were assessed by Spearman's correlation coefficient and the Kruskal-Wallis test. A p value of < 0.05 was considered statistically significant.
RESULTS. The range of variation of the volumes of pulmonary nodules between two subsequent measurements was –38% to 60% for low-dose CT and –27% to 40% for standard-dose CT. No statistically significant association was found between variation in volume measurements and nodule site, nodule diameter, nodule morphology, or emphysema percentage as determined by semiautomated calculation of lung density.
CONCLUSION. Automated volume calculations of small pulmonary nodules can significantly differ between two subsequent breath-holds with both low-dose and standard-dose CT techniques; in clinical practice we recommend that a volume variation of greater than 30% for nodules between 5 and 10 mm should be confirmed by follow-up CT to be sure that a nodule is actually growing.
CT is unanimously recognized as the gold standard in the detection and characterization of pulmonary nodules. Continuous technologic development has considerably improved its diagnostic ability, particularly in the detection of small nodules, thanks to the possibility of obtaining submillimeter reconstructions.
On the other hand, reasonable doubts arise about the accuracy of manual diameter measurements because of intraobserver and interobserver variability [2, 3]. This concern is the rationale for introducing into clinical practice new software that can automatically identify and isolate pulmonary nodules and calculate their volume and doubling time. The main fields of application are prevention, particularly with the appearance of new lung cancer screening programs using low-dose CT, and oncology, with requests for monitoring therapeutic response.
The size of newly appeared nodules and the growth of known nodules on follow-up are the most important parameters in the management of pulmonary nodules. In both settings—planning the most appropriate workup or assessing any volume change—it is mandatory to know nodule diameter and volume with the best possible approximation. In a screening project setting, although solid noncalcified nodules > 10 mm should be examined using PET or biopsy and nodules < 5 mm can be monitored at a 1-year interval without major risk [4, 5], the management of 5- to 10-mm nodules represents an unsolved problem and relies mostly on follow-up. Therefore, it is essential to introduce in clinical practice an accurate and reproducible method for the reliable evaluation of pulmonary nodules; several authors have suggested that automated volume analysis is an accurate tool [6, 7].
The aims of this prospective study were thus to evaluate the reproducibility of in vivo automated volume calculations of pulmonary nodules with a diameter of between 5 and 10 mm with low-dose and standard-dose techniques during the same session, without the patient needing to move from the CT table, and to assess whether the repeatability of each technique varies according to nodule diameter, site, or morphology or to the presence and amount of emphysema.
Subjects and Methods
This prospective study was approved by the institutional ethics committee, and all participants were informed about the purpose and methods of the study and provided written consent. Between September 2005 and February 2007, 84 subjects (49 men and 35 women; mean age, 60.5 years) with 101 undetermined pulmonary nodules were enrolled in this study. Participants with at least one known solid pulmonary nodule between 5 and 10 mm detected by previous chest CT who were scheduled for follow-up CT were recruited.
For the purpose of this study, we selected only solid noncalcified pulmonary nodules (i.e., completely obscuring the parenchyma) with a maximum axial diameter of between 5 and 10 mm with no visible pleural or vessel attachment. Two expert radiologists, both with 8 years' experience in CT, used electronic calipers to measure the nodules on images displayed with lung window settings (window width, 1,260 HU; window level, –480 HU) on a workstation screen.
To guarantee a completely operator-independent procedure, nodules marked by radiologists that could not automatically be segmented by dedicated software were excluded. The site (lobe) and shape (regular, irregular, or spiculated) of the nodule were reported by both radiologists for each nodule that satisfied these criteria. A third radiologist resolved differences in categorization and discrepancies in measurements between the two radiologists.
Because of a visible vessel or pleural attachment, 10 patients with a single nodule were excluded from the study. Eight patients with a single nodule were excluded because automated segmentation by the software was incomplete.
After these exclusions, the final study population consisted of 66 subjects (41 men and 25 women; mean age, 59.5 years; range, 34–78 years) with 83 undetermined pulmonary nodules.
CT was performed on a 16-MDCT system (LightSpeed 16, GE Healthcare). Four CT data sets were obtained for each nodule within a few minutes, without the patient needing to move from the CT table, on different breath-holds without contrast material injection: two low-dose and two standard-dose data sets.
Before the first CT acquisition, patients were trained to repeat a similar breath-hold to reduce significant respiratory excursions. During a full breath-hold we acquired the whole lung volume with the following low-dose CT protocol (low-dose protocol 1): 140 kVp, 40 mA, 0.8-second tube rotation, pitch of 1.75, 2.5-mm slice thickness, and lung reconstruction kernel. The acquisition field of view ranged from 320 to 380 mm depending on the patient's body habitus.
Once a nodule with the described characteristics was identified, a second low-dose CT examination was performed (low-dose protocol 2) with the same protocol and field of view as low-dose protocol 1 but with a limited-range excursion through the nodule for 50 mm along the z-axis to reduce patient exposure. Immediately after that, two standard-dose CT data sets (standard-dose protocols 1 and 2) were obtained during a repeated breath-hold using the following parameters: 120 kVp, automatic tube current modulation range of 100–440 mA, pitch of 0.938, 0.625-mm slice thickness, and lung reconstruction kernel.
All CT image sets were transferred to a workstation (Advantage 4.2, GE Healthcare) for further analysis. In standard-dose protocol 1, the maximum transverse axial nodule diameter was measured with electronic calipers by two radiologists using the same lung window settings (window width, 1,260 HU; window level, –480 HU).
Commercially available software (ALA Single, GE Healthcare) was used by a single observer with 8 years' experience to calculate the volume of the pulmonary nodules for each CT data set. We decided to use a single observer to optimize the methods and give the best possible results by decreasing potential interobserver differences.
After the nodule was identified and manually marked by the operator with a mouse click, the software performed an automated segmentation of nodules and elaborated a 3D template model to provide a nodular volume estimate in cubic millimeters.
Lung volume, emphysema volume, and emphysema percentage were calculated for each patient on the whole-chest CT protocol (i.e., low-dose protocol 1) using a semiautomated calculation of lung density (Volume Viewer 2, GE Healthcare). The lung volume was obtained by selecting a threshold of between –1,023 and –200 HU on the volumetric reconstruction and manually removing structures other than lung parenchyma, such as the trachea or part of the colon, when they were included in the reconstructed volume. From the 3D volume data set, the software displayed a graph showing the frequency distribution of the attenuation values of the voxels included in the volume. A threshold cutoff for voxels having attenuation values of less than –950 HU was selected, and the volume of these low-attenuation voxels and its percentage of the total lung volume were calculated [9, 10].
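The threshold-based calculation described above can be illustrated with a minimal sketch. It is not the vendor software's implementation: the function name, the flat list of voxel attenuation values, and the `voxel_volume_mm3` parameter are assumptions introduced for illustration, and the manual removal of the trachea or colon is not modeled.

```python
def emphysema_percentage(hu_values, voxel_volume_mm3,
                         lung_range=(-1023, -200), emphysema_cutoff=-950):
    """Return (lung_volume_cm3, emphysema_volume_cm3, emphysema_percent).

    hu_values is a hypothetical flat list of voxel attenuations in HU;
    the thresholds default to the values quoted in the text.
    """
    lo, hi = lung_range
    # Keep only voxels whose attenuation falls inside the lung threshold.
    lung = [hu for hu in hu_values if lo <= hu <= hi]
    # Low-attenuation voxels: attenuation below the emphysema cutoff.
    low = [hu for hu in lung if hu < emphysema_cutoff]
    mm3_per_cm3 = 1000.0
    lung_cm3 = len(lung) * voxel_volume_mm3 / mm3_per_cm3
    emph_cm3 = len(low) * voxel_volume_mm3 / mm3_per_cm3
    # Percentage of lung voxels that are low-attenuation voxels.
    percent = 100.0 * len(low) / len(lung) if lung else 0.0
    return lung_cm3, emph_cm3, percent
```

Because every voxel has the same volume, the percentage depends only on the voxel counts; the voxel volume matters only for the absolute volumes.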
The repeatability of low-dose and standard-dose CT measurements was evaluated using Bland and Altman's approach [11, 12]. Because the absolute differences between the repeated measurements increased with an increase in nodule volume, we applied a logarithmic transformation of the measurements, as suggested by Bland and Altman [11, 12]. The log transformation is the only one that allows results to be interpreted in relation to the original data: The mean difference of the logarithms, with its 95% limits of agreement, can be back-transformed (antilog) and then corresponds to the geometric mean of the ratios of the original values, with its 95% limits of agreement. Thus, to interpret results in relation to the original data, we presented a plot of the ratio between the original measurements against their geometric mean, with the corresponding 95% limits of agreement. The precision of the limits of agreement was assessed by 95% CIs. We also calculated the coefficient of repeatability (CR) proposed by the British Standards Institution, which is defined as 1.96 × SD of the sample differences between two measurements. The CR is the value below which the absolute difference between two repeated measurements may be expected to lie with a probability of 95%; therefore, the smaller the coefficient is, the better the repeatability.
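As an illustration of the log-transformed Bland and Altman analysis, the following sketch computes the geometric mean of the ratios and the back-transformed 95% limits of agreement for paired measurements. The function name and dictionary keys are hypothetical, and reporting the CR on the ratio scale as exp(1.96 × SD of the log differences) is an assumption about how a ratio-scale CR could be obtained, not a description of the SAS analysis actually performed.

```python
import math
import statistics


def log_bland_altman(first, second):
    """Ratio-scale Bland-Altman summary for paired repeated measurements."""
    # A difference of natural logs equals the log of the ratio second/first.
    log_diffs = [math.log(b) - math.log(a) for a, b in zip(first, second)]
    mean_d = statistics.mean(log_diffs)
    sd_d = statistics.stdev(log_diffs)  # sample SD (n - 1 denominator)
    half_width = 1.96 * sd_d
    return {
        # Antilog of the mean log difference: geometric mean of the ratios.
        "geometric_mean_ratio": math.exp(mean_d),
        # Back-transformed 95% limits of agreement, expressed as ratios.
        "lower_limit": math.exp(mean_d - half_width),
        "upper_limit": math.exp(mean_d + half_width),
        # Assumed ratio-scale coefficient of repeatability.
        "cr_ratio": math.exp(half_width),
    }
```

On the ratio scale the limits are symmetric around the geometric mean in a multiplicative sense: the upper limit is the geometric mean multiplied by `cr_ratio`, and the lower limit is the geometric mean divided by it.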
After evaluating repeatability within each method, we assessed whether repeatability depended on nodule or lung characteristics. For this purpose, we calculated the percentage of variation between the two original measurements for each method and correlated it with nodule diameter and emphysema percentage by Spearman's correlation coefficient. The Kruskal-Wallis test was performed to test whether the percentage of variation between measurements differed significantly by nodule site or morphology. A p value of < 0.05 was considered statistically significant. The analyses were performed using SAS software (version 8.2, SAS Institute).
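The association analysis can be sketched in plain Python as follows. The exact definition of the percentage of variation is not spelled out in the text, so the formula below (absolute difference relative to the first measurement) is an assumption, as are all function names. The sketch returns only Spearman's rho and the Kruskal-Wallis H statistic (without tie correction); it does not compute p values, which in the study came from SAS.

```python
import statistics


def percent_variation(v1, v2):
    # Assumed definition: absolute difference relative to the first measurement.
    return 100.0 * abs(v2 - v1) / v1


def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = statistics.mean(rx), statistics.mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic (no tie correction, no p value)."""
    pooled = [v for g in groups for v in g]
    ranks = average_ranks(pooled)
    n = len(pooled)
    total, start = 0.0, 0
    for g in groups:
        # Sum of the pooled ranks belonging to this group.
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    return 12.0 / (n * (n + 1)) * total - 3.0 * (n + 1)
```

In this sketch, `spearman_rho` would take the per-nodule percentage of variation together with nodule diameter or emphysema percentage, and `kruskal_wallis_h` would take the percentages grouped by lobe or by morphology class.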
Nodules were almost evenly distributed throughout all lobes: 23% (19/83) were in the right upper lobe; 13% (11/83), right middle lobe; 19% (16/83), right lower lobe; 21% (17/83), left upper lobe; and 24% (20/83), left lower lobe. The mean diameter ± SD of all the nodules measured by the two radiologists was the same, 7.2 ± 2.0 mm. Twenty-five nodules were between 5 and 6 mm; 24, between 6.1 and 7 mm; 15, between 7.1 and 8 mm; six, between 8.1 and 9 mm; and 13, between 9.1 and 10 mm. Forty-one percent had regular shape, 52% were irregular, and 7% were spiculated. The 332 volumes calculated by the program ranged from 33 to 1,276 mm3 for low-dose CT and from 40 to 1,306 mm3 for standard-dose CT.
The mean nodule volume (± SD) was 220 ± 241 mm3 in low-dose protocol 1 versus 220 ± 243 mm3 in low-dose protocol 2. For standard-dose protocol 1 and standard-dose protocol 2, the mean volume (± SD) was 216 ± 238 mm3 and 221 ± 242 mm3, respectively.
The CR was 1.63 (77.14 mm3) for low-dose and 1.40 (68.30 mm3) for standard-dose CT measurements. The geometric means of the ratios, with 95% limits of agreement, for low-dose and standard-dose CT were 0.99 (0.62–1.60) and 1.01 (0.73–1.40), respectively, with the highest percentage of variation observed for small nodules compared with larger ones (Fig. 1A, 1B). The 95% CIs for the lower and upper limits of agreement were, respectively, 0.56–0.68 and 1.46–1.75 for low-dose CT and 0.68–0.77 and 1.31–1.49 for standard-dose CT. Using the original measurements without logarithmic transformation, the arithmetic means of the difference between the two measurements with 95% limits of agreement for low-dose and standard-dose CT were –0.40 mm3 (–77.54 to 76.74 mm3) and 4.04 mm3 (–64.27 to 72.34 mm3), respectively.
The highest repeatability for low-dose CT was observed for the five nodules larger than 750 mm3 (geometric mean, 1.01; 95% CI, 0.94–1.09) and for standard-dose CT, for the seven nodules larger than 500 mm3 (geometric mean, 1.02; 95% CI, 0.91–1.14).
The percentage of variation in the volume measurements of the 83 lung nodules was 17.0% ± 18.7% (mean ± SD) for low-dose CT and 12.8% ± 11.5% for standard-dose CT.
The mean (± SD) lung volume, emphysema volume, and emphysema percentage were 5,526 ± 1,445 cm3, 115 ± 249 cm3, and 1.94% ± 4.04%, respectively. We did not find any significant statistical association between percentage of variation in volume measurements and the site or morphology of the nodules (Table 1). Moreover, no statistically significant correlation was observed between percentage of variation in volume measurements and nodule diameter or emphysema percentage.
TABLE 1: Average Percentage of Variation in Pulmonary Nodule Volume Measurements Based on CT According to Nodule Site and Morphology
Kruskal-Wallis test, p = 0.49 for low-dose CT, 0.14 for standard-dose CT.
Kruskal-Wallis test, p = 0.15 for low-dose CT, 0.36 for standard-dose CT.
Automated volume measurements of lung nodules have been proven to be accurate by in vitro studies [14, 15], but in vivo calculations can theoretically be affected by different and frequently unpredictable physiologic and pathologic conditions. These conditions include the different inspiratory levels that significantly affect pulmonary nodule measurements, as reported by Petkovska et al. and Gietema et al.; moreover, using cardiac gating, Boll et al. found that nodules located near the heart show as much as a 34% volume change during the cardiac cycle. Among the pathologic causes are pre-existing obstructive diseases such as pulmonary emphysema and severe asthma.
Because variation in nodule volume is used as the principal tool in the diagnosis of the nature of pulmonary nodules < 10 mm detected by screening programs and because important clinical decisions are made (i.e., to perform surgery or not) according to these data, the accuracy of volume measurements is a critical factor.
In this study, we evaluated the reproducibility of automated volume measurements within two low-dose CT examinations and within two standard-dose CT examinations. As expected, we obtained a better CR for standard-dose CT than for low-dose CT (CR = 1.40 vs 1.63, respectively). Moreover, if we analyze the ratios of the two low-dose and the two standard-dose volume measurements versus the mean of the two measurements, we find that 95% of the volumetric changes between two subsequent measurements are expected to range between –38% and 60% for low-dose CT (95% limits of agreement, 0.62–1.60) and between –27% and 40% for standard-dose CT (95% limits of agreement, 0.73–1.40). An example of the variations is shown in Figures 2A, 2B and 3A, 3B.
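The percentage ranges quoted above follow directly from the ratio-scale limits of agreement. The sketch below shows the conversion and a simple check of whether an observed ratio between two volume measurements falls outside a given pair of limits. Both function names are hypothetical, and applying the limits to any particular pair of scans is an illustration rather than a validated decision rule.

```python
def ratio_to_percent_change(ratio):
    """Express a ratio of two volume measurements as a signed percentage change."""
    return (ratio - 1.0) * 100.0


def outside_limits(v_first, v_second, lower_ratio, upper_ratio):
    """True when the ratio v_second / v_first lies outside the limits of agreement."""
    ratio = v_second / v_first
    return ratio < lower_ratio or ratio > upper_ratio


# Limits of agreement reported in the text, as (lower, upper) ratios.
LOW_DOSE_LIMITS = (0.62, 1.60)
STANDARD_DOSE_LIMITS = (0.73, 1.40)

for name, (lower, upper) in [("low-dose", LOW_DOSE_LIMITS),
                             ("standard-dose", STANDARD_DOSE_LIMITS)]:
    print(name, round(ratio_to_percent_change(lower)),
          round(ratio_to_percent_change(upper)))
```

With the standard-dose limits, for example, a measured change from 100 to 130 mm3 (ratio 1.30) stays inside the range attributable to measurement variability, whereas a change from 100 to 150 mm3 (ratio 1.50) does not.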
These results suggest that automated volume calculations of some small pulmonary nodules may have significant variations that could drastically influence the diagnosis and further therapeutic choices. The variation in the volume calculations shown by our results is even more important if we consider that these results were obtained with an interval of only a few minutes between examinations and that, therefore, the variation could be greater between CT examinations performed at a 1- or 3-month interval. If we consider a volumetric variation threshold of 20% as a stability finding, we find that 33% (n = 27) of the nodules for low-dose CT and 24% (n = 20) for standard-dose CT are beyond the stability range and therefore are potentially subject to erroneous diagnostic decisions [21, 22], especially nodules smaller than 300 mm3 for which the highest percentage of variation was observed (Fig. 1A, 1B). Assuming the same geometric means and SDs observed in our study, a sample size of 41 nodules was enough to find that at least 5% of nodules would be beyond the stability range.
Increasing the volumetric variation threshold to 30%, we found that 14% (n = 12) of nodules on low-dose CT and 10% (n = 8) on standard-dose CT are beyond the stability range. A sample size of 47 nodules would be enough to find that at least 5% of nodules are beyond this further stability range.
The volumetric variations beyond the stability range can probably be explained by a combination of concomitant factors such as different inspiratory level, cardiac cycle, and inaccurate segmentation for volume calculation. Those conditions could also explain the substantial differences between our variability results and those reported by investigators whose in vitro study yielded excellent results with reproducibility errors of less than 3%.
Because we used different techniques and, in particular, a different slice thickness, we did not compare low-dose CT and standard-dose CT results. However, the better CR for standard-dose CT could in part be explained by the different slice thickness adopted because the accuracy of volume calculations is higher at thinner section thicknesses, particularly for small nodules. As reported by Goo et al., the mean absolute percentage error of volume calculation in a phantom study increased consistently with a decrease in nodule size at each considered section thickness ranging from 0.75 to 5 mm. In our series, the volume of the smaller nodules could have been partially affected by the 2.5-mm section thickness, and a thinner section might provide a better CR for low-dose CT. However, we did not evaluate absolute nodule volume but rather volume variation, and we can assume that the influence of slice thickness on automated calculation was the same in the two subsequent CT data sets; the same consideration applies to the lung reconstruction algorithm adopted.
The volumes calculated by the program ranged from 33 to 1,306 mm3. When we transformed these volumes into diameters, we obtained 4.0 and 13.6 mm, respectively, values that fall outside the inclusion criteria. This discrepancy was probably caused by a combination of the inadequacy of manual measurements of small lung nodules [2, 3] and the assumption, made when transforming volume into diameter, that each nodule was a perfect sphere.
According to our results and contrary to those reported by Gietema et al., nodule shape (regular, irregular, or spiculated) does not significantly influence repeatability. Gietema et al. reported a high repeatability for round nodules but a reduced repeatability for nonspherical nodules with irregular shapes.
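The volume-to-diameter conversion discussed above can be reproduced with the formula for a sphere, V = πd³/6. This is a minimal sketch under the perfect-sphere assumption named in the text; the function names are introduced here for illustration only.

```python
import math


def sphere_equivalent_diameter(volume_mm3):
    """Diameter (mm) of a sphere with the given volume: d = (6V / pi)^(1/3)."""
    return (6.0 * volume_mm3 / math.pi) ** (1.0 / 3.0)


def sphere_volume(diameter_mm):
    """Volume (mm3) of a sphere with the given diameter: V = pi * d^3 / 6."""
    return math.pi * diameter_mm ** 3 / 6.0


# Smallest and largest volumes reported by the software.
print(round(sphere_equivalent_diameter(33), 1))    # 4.0
print(round(sphere_equivalent_diameter(1306), 1))  # 13.6
```

Under the same assumption, the 5- to 10-mm inclusion range corresponds to volumes of roughly 65 to 524 mm3, so measured volumes below or above that interval imply nodules that are not spherical, diameters that were measured imprecisely, or both.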
Although our results show that volumetric variations are not influenced by the amount of emphysema, this finding could be due to the low percentage of emphysema calculated in our series and to the low-dose technique used for calculation. The accuracy of the low-dose technique in quantifying emphysema is, in fact, not definitely established [9, 25].
This study has some limitations. The first is inherent to the in vivo design: We do not know the real volume of each nodule, but only the volume automatically calculated by the software. However, the real volume is not essential here because we considered only volume variations, not absolute values. The second limitation is due to the type of nodule chosen as the target of the study, that is, solid nodules with no visible pleural or vascular contact. We decided to use this target to allow evaluation of repeatability in an optimal condition. In fact, repeatability could have been further affected had juxtapleural or juxtavascular nodules been included. Finally, the results reported in our study are strictly tied to the performance of the software and CT scanner used and therefore cannot be generalized.
In conclusion, despite several studies in the literature that showed the high accuracy of automated volumetric calculations of lung nodules [3, 22], our results suggest that automated volume calculations for pulmonary nodules between 5 and 10 mm can vary significantly between two subsequent breath-holds.
The Response Evaluation Criteria in Solid Tumors (RECIST) and the World Health Organization criteria are the current standards for assessing therapy response in solid tumors; although automated volumetric evaluation is a promising method, the results of the current study suggest that a volumetric approach also has pitfalls. In clinical practice we recommend that a volume variation of greater than 30% for nodules between 5 and 10 mm should be confirmed by follow-up CT to be sure that a nodule is actually growing. Further studies are necessary before this tool can be used to drive clinical decisions.
Madani A, De Maertelaer V, Zanen J, Gevenois PA. Pulmonary emphysema: radiation dose and section thickness at multidetector CT quantification—comparison with macroscopic and microscopic morphometry. Radiology 2007; 243:250–257
Gietema HA, Schaefer-Prokop CM, Mali WP, Groenewegen G, Prokop M. Pulmonary nodules: interscan variability of semiautomated volume measurements with multisection CT influence of inspiration level, nodule size, and segmentation performance. Radiology 2007; 245:888–894
Henschke CI. International early lung cancer action program (I-ELCAP): enrollment and screening protocol. www.ielcap.org/professionals/docs/ielcap-old.pdf. Published January 1, 2008. Accessed March 20, 2008
Pauls S, Kurschner C, Dharaiya E, et al. Comparison of manual and automated size measurements of lung metastases on MDCT images: potential influence on therapeutic decisions. Eur J Radiol 2008; 66:19–26