Original Research
Genitourinary Imaging
April 14, 2021

Comparison of CT Texture Analysis Software Platforms in Renal Cell Carcinoma: Reproducibility of Numerical Values and Association With Histologic Subtype Across Platforms

Abstract

OBJECTIVE. The purpose of this article is to evaluate interobserver, intraobserver, and interplatform variability and compare the previously established association between texture metrics and tumor histologic subtype using three commercially available CT texture analysis (CTTA) software platforms on the same dataset of large (> 7 cm) renal cell carcinomas (RCCs).
MATERIALS AND METHODS. CT-based texture analysis was performed on contrast-enhanced MDCT images of large (> 7 cm) untreated RCCs in 124 patients (median age, 62 years; 82 men and 42 women) using three different software platforms. Using this previously studied cohort, texture features were compared across platforms. Features were correlated with histologic subtype, and strength of association was compared between platforms. Single-slice and volumetric measures from one platform were compared. Values for interobserver and intraobserver variability on a tumor subset (n = 30) were assessed across platforms.
RESULTS. Metrics including mean gray-level intensity, SD, and volume correlated fairly well across platforms (concordance correlation coefficient [CCC], 0.66–0.99; mean relative difference [MRD], 0.17–5.97%). Entropy showed high variability (CCC, 0.04; MRD, 44.5%). Mean, SD, mean of positive pixels (MPP), and entropy were associated with clear cell histologic subtype on almost all platforms (p < .05). Mean, SD, entropy, and MPP were highly reproducible on most platforms on both interobserver and intraobserver analysis.
CONCLUSION. Select texture metrics were reproducible across platforms and readers, but other metrics were widely variable. If clinical models are developed that use CTTA for medical decision making, these differences in reproducibility of some features across platforms need to be considered, and standardization is critical for more widespread adoption and implementation.
Spatial heterogeneity is a common feature of renal cell carcinoma (RCC), with multiple studies showing variability within tumors with respect to pathologic features, genomics, RNA expression, mutation analysis, and protein expression [1–3]. This heterogeneity gives rise to a spectrum of biologic and clinical behavior that can impact prognosis and management but that can be challenging to identify with current imaging techniques and percutaneous biopsy [4–7]. CT texture analysis (CTTA) is one of a slate of quantitative, noninvasive radiomics tools that have been shown to provide useful diagnostic and prognostic information using data obtained from routine CT scans [8]. CTTA uses the distribution and frequency of CT scan pixel attenuation to provide an objective measurement of tumor heterogeneity, which is associated with adverse biology [9–11]. CTTA has been shown to be predictive of histologic and clinical outcomes for a variety of primary tumors including renal cell carcinoma [12–17]. Despite this demonstrated utility, the role of CTTA in patient care has yet to be fully realized.
One of the primary obstacles preventing widespread adoption of CTTA is a lack of generalizability of studies. Multiple software platforms exist to measure the same CT texture metrics, and little comparison data are available between these platforms [11, 18]. Furthermore, information about the most robust and reproducible texture measures and the factors other than biologic heterogeneity that affect texture measures is still emerging. A growing body of literature describes technical factors that may alter texture measures and affect their reproducibility [19–25].
With this in mind, we sought to evaluate interobserver variability, intraobserver variability, and interplatform variability for CTTA using three commercially available software platforms on the same dataset of large RCCs and investigate the ability of these platforms to replicate a previously established association between texture features and histologic subtype.

Materials and Methods

The current study was approved by the institutional review board at the University of Wisconsin School of Medicine and Public Health and was HIPAA compliant. The need for informed consent was waived.

CT Images

The CT images obtained between 2000 and 2013 of 124 patients (42 women and 82 men; median age, 62 years and IQR, 53–68 years) with large (> 7 cm) RCCs were retrieved from the surgical database of the department of urology of the University of Wisconsin School of Medicine and Public Health and were retrospectively reviewed (Table S1, which can be viewed online at www.ajronline.org). All patients in the cohort had a CT scan performed before undergoing surgery or receiving any other treatment. Subsequent removal of the primary tumor and pathologic analysis that included histologic subtyping and nuclear grading were performed for all patients. CT texture analysis data from these patients were previously analyzed in the 2016 study by Lubner et al. [26].
CT scans with a portal venous phase were included because texture analysis of the portal venous phase provided the most robust results in the previous study [26]. A total of 74 of 124 (59.7%) CT examinations were performed at institutions other than the study institution. All scans were performed using MDCT scanners and the imaging parameters of a tube potential of 100–140 kV (with 110 of 124 [88.7%] scans using a tube potential of 120 kV) and a matrix of 512 × 512 × 16. Most CT scans were performed using automated or variable tube current, and the slice thickness used for 122 of 124 scans was 2–5 mm.

Texture Analysis Platforms

Three commercially available CTTA platforms were compared: TexRAD (Feedback Medical), Healthmyne Radiomic Precision Metrics (Healthmyne), and Mint Lesion (Mint Medical) (Table 1). TexRAD is a web-based platform that performs texture analysis on a single slice of cross-sectional CT images. TexRAD uses an optional initial filtration step and a Laplacian of gaussian spatial bandpass filter that has been described previously [26–28]. In brief, image filtration allows texture features of different sizes to be extracted and allows the determination of spatial scaling factor (SSF) from fine (SSF, 1) to coarse (SSF, 6). Unfiltered data (SSF, 0) are also provided. TexRAD was the only platform in this study that allowed an optional image filtration step, so to enable comparison of similar data across platforms, filtered data were not included in the analysis. Unfiltered TexRAD data were used for the comparison. TexRAD extracts first-order texture features including mean gray-level intensity, entropy, SD, and mean of positive pixels (MPP). Healthmyne is a server-based platform that performs volumetric CTTA. Healthmyne does not perform a filtration step and analyzes unfiltered (SSF, 0) data. This software extracts more than 300 radiomics features, including first-order texture features (mean gray-level intensity, entropy, SD) and second-order texture features derived using a gray-level cooccurrence matrix. Second-order metrics allow quantification of the spatial relationship between pixels [11]. Healthmyne was the only platform that provided second-order metrics, and for this reason second-order metrics were not included in this analysis. Mint Lesion is a locally housed platform that performs both single-slice and volumetric texture analysis of unfiltered (SSF, 0) data. Mint Lesion extracts first-order texture features, including mean gray-level intensity, entropy, SD, and kurtosis.
As noted above, there is heterogeneity between texture platforms regarding which metrics are provided. In this analysis, only texture features shared between platforms were included. This includes only first-order texture features, specifically, mean gray-level intensity, SD, entropy, skewness, kurtosis, uniformity, and MPP. These texture metrics have been described previously [11]. Although tumor volume is not a texture measure, this was included as an anatomic measurement for comparison in variability to texture features across platforms.
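The first-order features listed above are all summaries of the ROI pixel histogram. The following is a minimal sketch of how such features can be computed, not the implementation used by any of the three platforms, whose exact formulas are not given here. The function name `first_order_features`, the `bin_width` parameter, the base-2 logarithm, and the non-excess kurtosis convention are all assumptions made for illustration, and the pixel values are hypothetical.

```python
import math
from collections import Counter


def first_order_features(pixels, bin_width=1.0):
    """First-order histogram features for a flat list of ROI pixel values (HU).

    Assumptions (not taken from any vendor documentation): population
    moments, non-excess kurtosis, base-2 entropy, fixed-width binning.
    """
    n = len(pixels)
    mean = sum(pixels) / n
    # Central moments of the pixel distribution
    m2 = sum((p - mean) ** 2 for p in pixels) / n
    m3 = sum((p - mean) ** 3 for p in pixels) / n
    m4 = sum((p - mean) ** 4 for p in pixels) / n
    sd = math.sqrt(m2)
    positives = [p for p in pixels if p > 0]
    # Discretize into bins; entropy and uniformity depend on this choice
    counts = Counter(math.floor(p / bin_width) for p in pixels)
    probs = [c / n for c in counts.values()]
    return {
        "mean": mean,
        "sd": sd,
        "mpp": sum(positives) / len(positives) if positives else float("nan"),
        "skewness": m3 / sd ** 3 if sd > 0 else 0.0,
        "kurtosis": m4 / m2 ** 2 if sd > 0 else 0.0,
        "entropy": -sum(p * math.log2(p) for p in probs),
        "uniformity": sum(p * p for p in probs),
    }


# Hypothetical ROI values in HU, including one negative and one dense pixel
roi = [40, 42, 42, 45, 47, 50, 50, 50, 55, 60, -5, 120]
fine = first_order_features(roi, bin_width=1.0)
coarse = first_order_features(roi, bin_width=25.0)
print("mean:", fine["mean"], coarse["mean"])
print("entropy:", fine["entropy"], coarse["entropy"])
print("uniformity:", fine["uniformity"], coarse["uniformity"])
```

In this sketch, mean, SD, and MPP do not depend on the binning, whereas entropy and uniformity change with `bin_width`. Whether differences of this kind contribute to the interplatform variability reported in this study is not established by the data presented.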
TABLE 1: Summary of Texture Features of Texture Analysis Platforms
| Feature | Healthmyne | Mint Lesion | TexRAD |
| --- | --- | --- | --- |
| Segmentation technique | Volumetric (3D) | Volumetric (3D) and single slice (2D) | Single slice (2D) |
| Metrics calculated | First and second order | First order | First order |
| Filtration step | No | No | Optional |

Note—Platforms used were Healthmyne Radiomic Precision Metrics (Healthmyne), Mint Lesion (Mint Medical), and TexRAD (Feedback Medical).

ROI Selection

For single-slice texture platforms (TexRAD and Mint Lesion 2D), ROI selection has been described previously, in which a single slice at the largest cross-sectional diameter of the tumor is analyzed [26]. The process of ROI selection for 3D platforms (Mint Lesion 3D and Healthmyne) starts when the CT scan of interest is opened in the texture analysis platform. The index slice at the level of the largest overall transverse tumor diameter is identified. The tumor is traced at this level with care to maintain the outer margins of the tracing just within the boundaries of the tumor. Once the single-slice ROI has been traced, automatic segmentation is performed. During automatic segmentation, the entire volume of the tumor as seen on cross-sectional CT is automatically segmented by the CTTA platform. After automatic segmentation, the user must manually refine the tumor boundaries to ensure nontumor tissues are excluded from analysis. Once correct tumor margins have been verified, the texture metrics are calculated. The software output includes information on a variety of histogram characteristics including mean gray-level intensity and SD of the pixel histogram, MPP, entropy, uniformity, kurtosis, and skewness. For the interplatform analysis, all segmentations were created by a trained medical student under the direct supervision of a fellowship-trained abdominal radiologist with 11 years of experience. Interobserver (different users) and intraobserver (same user) variability were measured using texture data from a subset of CT images from 30 randomly selected patients. For interobserver variability, segmentations were drawn by two medical students with experience in tumor segmentation under direct supervision by the fellowship-trained abdominal radiologist with 11 years of experience. Both students received training from the same abdominal radiologist and were blinded to each other's slice selection and segmentations.
For intraobserver variability, two segmentations were drawn by one medical student at least 2 weeks apart.

Statistical Analysis

Continuous variables were summarized using descriptive statistics. Frequencies and corresponding percentages were determined for categoric variables. All comparisons between platforms were made using unfiltered (SSF, 0) texture data. Agreement between platforms and observers was measured using mean relative difference (MRD) with 95% CI (mean relative difference ± 1.96 SD of the difference) and concordance correlation coefficient (CCC), a method of comparison that has been used in previous analyses [29]. MRD between two values is defined as (M1 – M2) / M1 × 100, where M1 is measurement 1 and M2 is measurement 2, with an MRD of 10 denoting a 10% difference between the observed values of each platform. For this analysis, an MRD of less than 10% was considered a small difference. CCC is a measurement of how far the plotted points of two continuous variables lie from the line of perfect concordance (a 45° line on a square scatterplot), with a CCC of 1 denoting perfect concordance and a CCC of −1 denoting perfect inverse concordance [30, 31]. In this analysis, a CCC of 0.90 or greater was considered very strong, a CCC from 0.60 to less than 0.90 was considered moderate, and a CCC of less than 0.60 was considered poor. Bland-Altman plots were constructed to visualize concordance and MRD [32]. To compare the ability of each platform to predict histologic subtype, simple logistic regression was used to analyze association between histologic subtype (clear cell vs nonclear cell) and texture features. It should be noted again that this analysis has been previously performed in Lubner et al. [26], which found that certain texture metrics were significantly associated with histologic subtype and nuclear grade. Therefore, the purpose of this study was not to establish an association between texture metrics and histologic features, but rather to investigate the consistency of association between CTTA platforms even if there were differences in the numeric magnitude of the variables.
Three separate analyses were performed to compare platforms: single-slice to single-slice analysis using different platforms (TexRAD and Mint Lesion 2D), volumetric to volumetric analysis using different platforms (Healthmyne and Mint Lesion 3D), and single-slice to volumetric analysis using the same platform (Mint Lesion 2D and Mint Lesion 3D). Interobserver and intraobserver variability were measured using texture data from a subset of CT images from 30 patients. Intraobserver and interobserver variability were measured with MRD with 95% CI, Bland-Altman plots, and CCC.
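The agreement measures defined above can be sketched in a few lines. This is an illustrative sketch under stated assumptions rather than the Stata code used for the study: the function names are hypothetical, the sample SD (n − 1 denominator) is assumed for the interval, Lin's CCC is computed with population moments, and the paired values are invented.

```python
import math


def relative_differences(m1, m2):
    """Per-case relative difference (M1 - M2) / M1 x 100, with M1 as reference.

    Undefined when a reference value is exactly zero; no guard is added here.
    """
    return [(a - b) / a * 100.0 for a, b in zip(m1, m2)]


def mrd_with_limits(m1, m2):
    """Mean relative difference and mean +/- 1.96 SD of the differences."""
    d = relative_differences(m1, m2)
    n = len(d)
    mean_d = sum(d) / n
    # Sample SD (n - 1 denominator) is an assumption of this sketch
    sd_d = math.sqrt(sum((x - mean_d) ** 2 for x in d) / (n - 1))
    return mean_d, (mean_d - 1.96 * sd_d, mean_d + 1.96 * sd_d)


def concordance_cc(x, y):
    """Lin's concordance correlation coefficient using population moments."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x) / n
    syy = sum((b - my) ** 2 for b in y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return 2.0 * sxy / (sxx + syy + (mx - my) ** 2)


def ccc_category(ccc):
    """Thresholds stated in the text: >= 0.90, 0.60 to < 0.90, < 0.60."""
    if ccc >= 0.90:
        return "very strong"
    if ccc >= 0.60:
        return "moderate"
    return "poor"


# Hypothetical mean gray-level intensity (HU) for five tumors on two platforms
platform_a = [60.0, 72.5, 55.0, 80.0, 65.0]
platform_b = [59.0, 73.0, 56.5, 78.0, 66.0]
mrd, limits = mrd_with_limits(platform_a, platform_b)
ccc = concordance_cc(platform_a, platform_b)
print(f"MRD = {mrd:.2f}% (limits {limits[0]:.2f} to {limits[1]:.2f})")
print(f"CCC = {ccc:.3f} ({ccc_category(ccc)})")
```

Because each difference is divided by the reference value, the relative difference becomes unstable when that value is close to zero. This may be relevant for features such as skewness, which can take values near zero.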
Stata version 15.0 (StataCorp) was used for all analyses.
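As an illustration of the simple logistic regression used to relate a single texture feature to histologic subtype, the sketch below fits a one-predictor model by Newton-Raphson. It is not the Stata procedure used for the study and omits the significance test. The function names and the SD values are hypothetical, chosen only so that the clear cell group tends to have lower values.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_simple_logistic(x, y, iterations=25):
    """Fit logit P(y = 1) = b0 + b1 * x by Newton-Raphson; returns (b0, b1)."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        # Score vector (g) and information matrix (h) of the log-likelihood
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = sigmoid(b0 + b1 * xi)
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        if det <= 0.0:
            break
        # Newton step: solve the 2 x 2 system h * step = g in closed form
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Hypothetical SD values; 1 = clear cell, 0 = other histologic subtype
sd_values = [18, 20, 22, 24, 26, 30, 25, 28, 31, 33, 35, 21]
clear_cell = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
b0, b1 = fit_simple_logistic(sd_values, clear_cell)
odds_ratio = math.exp(b1)
print(f"intercept = {b0:.3f}, slope = {b1:.3f}, odds ratio per unit = {odds_ratio:.3f}")
```

The sign of the slope indicates the direction of the association. Comparing that direction and its significance across platforms is the consistency check described above, and it does not require the feature values themselves to agree between platforms.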

Results

Platform Comparison

Variability of texture metrics between platforms was measured in 124 patients (Table S1). For single-slice platforms (Mint Lesion 2D and TexRAD), with TexRAD as the reference platform, variability was lowest for mean gray-level intensity (CCC, 0.94; MRD, 0.30%), SD (CCC, 0.68; MRD, −2.13%), and MPP (CCC, 0.93; MRD, 0.03%) (Fig. 1A). Variability was high for entropy, skewness, and kurtosis, with CCC of 0.04, −0.01, and 0.08 and MRD of −44.47%, −306.67%, and −76.63%, respectively (Fig. 1B and Table 2).
Fig. 1. Bland-Altman plots of mean gray-level intensity and entropy. Gray lines indicate 95% CI.
A and B, Plots for interplatform comparison between TexRAD (Feedback Medical) and Mint Lesion 2D (Mint Medical) for single-slice platforms for mean gray-level intensity (A) and entropy (B).
C and D, Plots for interplatform comparison between Healthmyne Radiomic Precision Metrics (Healthmyne) and Mint Lesion 3D for volumetric platforms for mean gray-level intensity (C) and entropy (D).
E and F, Plots for intraplatform comparison between Mint Lesion 2D and Mint Lesion 3D for mean gray-level intensity (E) and entropy (F).
TABLE 2: Concordance Correlation Coefficient (CCC) and Mean Relative Difference (MRD) Intraplatform Comparison
| Platform, Feature | MRD (%) | CCC | 95% CI |
| --- | --- | --- | --- |
| Volumetric^a | | | |
| Mean | 0.91 | 0.98 | −10.87 to 12.73 |
| SD | −1.24 | 0.66 | −53.06 to 50.53 |
| Entropy | 6.20 | 0.07 | −40.65 to 164.74 |
| Skewness | −392.43 | 0.48 | −10,043 to 9259 |
| Kurtosis | 37.79 | 0.18 | −579.26 to 654.71 |
| Uniformity | 99.98 | 0.00 | 22.82 to 177.15 |
| Volume | 5.97 | 0.99 | −17.39 to 29.33 |
| Single-slice^b | | | |
| Mean | 0.30 | 0.94 | −22.68 to 23.43 |
| SD | −2.13 | 0.68 | −53.61 to 49.25 |
| Entropy | −44.47 | 0.04 | −52.94 to −36.13 |
| MPP | 0.03 | 0.93 | −53.61 to 49.25 |
| Skewness | −306.67 | −0.01 | −9231 to 8610 |
| Kurtosis | −76.63 | 0.08 | −1075.00 to 922.12 |
| Intraplatform^c | | | |
| Mean | 3.32 | 0.96 | −13.40 to 20.02 |
| SD | −0.17 | 0.74 | −49.48 to 49.11 |
| Entropy | 0.29 | 0.96 | −2.78 to 3.35 |
| MPP | 2.86 | 0.96 | −12.05 to 17.77 |
| Skewness | −570.33 | −0.01 | −15,215 to 14,069 |
| Kurtosis | 40.55 | 0.16 | −299.25 to 380.09 |
| Uniformity | −11.11 | 0.09 | 3453.00–324.79 |

Note: Mean = mean gray-level intensity, MPP = mean of positive pixels.
^a Values show comparison of Healthmyne Radiomic Precision Metrics (Healthmyne) to Mint Lesion (Mint Medical).
^b Values show comparison of TexRAD (Feedback Medical) to Mint Lesion.
^c Values show comparison of Mint Lesion 3D to Mint Lesion 2D.
For volumetric platforms (Mint Lesion 3D and Healthmyne), variability was lowest for volume, which served as an anatomic reference standard (CCC, 0.99; MRD, 5.97%). As with 2D measurements, low variability was seen for mean gray-level intensity (CCC, 0.98; MRD, 0.91%) and SD (CCC, 0.66; MRD, −1.24%) (Fig. 1C). Entropy showed low MRD (6.20%), though concordance was also low (CCC, 0.07) (Fig. 1D). Kurtosis, skewness, and uniformity all showed high interplatform variability with MRD of 37.79%, −392.43%, and 99.98% and CCC of 0.18, 0.48, and 0.00, respectively (Table 2).
When comparisons were made using the same platform with different sampling techniques (single-slice vs volumetric using Mint Lesion), variability was lowest for mean gray-level intensity (CCC, 0.96; MRD, 3.32%), SD (CCC, 0.74; MRD, −0.17%), entropy (CCC, 0.96; MRD, 0.29%), and MPP (CCC, 0.96; MRD, 2.86%). Variability was high when comparing single-slice to volumetric measurements from the same platform for skewness (CCC, −0.01; MRD, −570.33%), kurtosis (CCC, 0.16; MRD, 40.55%), and uniformity (CCC, 0.09; MRD, −11.11%) (Table 2).
Interplatform comparisons were also made for strength of association between individual texture measurements and histologic subtype (clear cell RCC vs nonclear cell RCC) as a quality check to see whether the relationship was maintained even if the numeric magnitude of the measured feature changed (Table 3). Mean gray-level intensity, SD, entropy, and MPP were significantly associated with histologic subtype in three platforms (TexRAD and Mint Lesion 2D and 3D). Skewness was significantly associated with histologic subtype in Mint Lesion 3D and Healthmyne, but not TexRAD or Mint Lesion 2D. Uniformity was significantly associated with histologic subtype for Mint Lesion 2D and 3D only. Kurtosis was not associated with histologic subtype in any platform (Table 3).
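A small MRD together with a low CCC, as seen here for volumetric entropy, is possible because the two statistics measure different things: MRD averages signed differences, which can cancel, whereas CCC requires the paired values to track each other case by case. The sketch below uses invented entropy values and hypothetical function names to show one such pattern. It does not reproduce the study data.

```python
def mean_relative_difference(m1, m2):
    """Mean of (M1 - M2) / M1 x 100 across cases, with M1 as the reference."""
    d = [(a - b) / a * 100.0 for a, b in zip(m1, m2)]
    return sum(d) / len(d)


def concordance_cc(x, y):
    """Lin's concordance correlation coefficient using population moments."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x) / n
    syy = sum((b - my) ** 2 for b in y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return 2.0 * sxy / (sxx + syy + (mx - my) ** 2)


# Invented entropy values for five tumors: same overall mean on both
# platforms, but the case-by-case ordering does not agree
entropy_a = [6.0, 6.5, 7.0, 7.5, 8.0]
entropy_b = [7.0, 7.5, 6.0, 7.5, 7.0]

mrd = mean_relative_difference(entropy_a, entropy_b)
ccc = concordance_cc(entropy_a, entropy_b)
print(f"MRD = {mrd:.2f}%, CCC = {ccc:.2f}")
```

In this toy example the positive and negative differences largely cancel, so the MRD falls under the 10% threshold even though the values on one platform carry no information about those on the other.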
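TABLE 3: Comparison of Association Between Texture Metrics and Histologic Subtype
| Feature | Mint Lesion 3D: Other | Mint Lesion 3D: Clear Cell | Mint Lesion 3D: p | Healthmyne 3D: Other | Healthmyne 3D: Clear Cell | Healthmyne 3D: p | Mint Lesion 2D: Other | Mint Lesion 2D: Clear Cell | Mint Lesion 2D: p | TexRAD 2D: Other | TexRAD 2D: Clear Cell | TexRAD 2D: p |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Mean | 71.09 | 58.47 | .02 | 71.70 | 59.29 | .02 | 68.45 | 57.94 | .04 | 68.70 | 58.16 | .05 |
| SD | 31.89 | 21.84 | .001 | 31.39 | 22.06 | .001 | 31.92 | 22.01 | .001 | 31.11 | 22.16 | < .001 |
| Entropy | 6.89 | 6.44 | .001 | 7.37 | 6.77 | .16 | 6.87 | 6.43 | .001 | 4.75 | 4.46 | < .001 |
| MPP | 70.26 | 59.17 | .02 | – | – | – | 70.26 | 59.17 | .02 | 70.25 | 59.35 | .04 |
| Kurtosis | 5.13 | 7.01 | .43 | 9.21 | 6.55 | .70 | 2.19 | 3.51 | .65 | 1.49 | 3.54 | .36 |
| Skewness | 0.73 | 0.02 | .66 | 0.02 | 0.19 | .42 | 0.73 | 0.02 | .66 | 0.12 | 0.26 | .53 |
| Uniformity | 0.01 | 0.03 | .001 | 51.83 | 55.19 | .50 | 0.01 | 0.03 | .001 | – | – | – |

Note: Comparison was of clear cell renal cell carcinoma (clear cell) versus other histologic subtype using one-way logistic regression for Mint Lesion (Mint Medical), Healthmyne Radiomic Precision Metrics (Healthmyne), and TexRAD (Feedback Medical) platforms. Dash indicates metric was not available for the platform. Mean = mean gray-level intensity, MPP = mean of positive pixels.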

TABLE 4: Interobserver and Intraobserver Comparison of Concordance Correlation Coefficient (CCC) and Mean Relative Difference (MRD) for Texture Metrics
| Platform, Feature | Interobserver CCC | Interobserver MRD (%) | Interobserver 95% CI (%) | Intraobserver CCC | Intraobserver MRD (%) | Intraobserver 95% CI (%) |
| --- | --- | --- | --- | --- | --- | --- |
| Single slice | | | | | | |
| TexRAD | | | | | | |
| Mean | 1.00 | −0.41 | −5.26 to 2.96 | 1.00 | −0.52 | −4.48 to 3.43 |
| SD | 1.00 | −0.72 | −4.99 to 3.07 | 1.00 | −0.86 | −5.99 to 4.27 |
| Entropy | 0.99 | −0.21 | −1.31 to 0.73 | 0.99 | 0 | −1.33 to 0.97 |
| MPP | 1.00 | −0.47 | −3.76 to 2.80 | 1.00 | −0.54 | −4.41 to 3.33 |
| Skewness | 0.98 | 14.52 | 238.71 to −206.45 | 0.98 | −7.93 | −14.6 to 16.5 |
| Kurtosis | 0.98 | 6.98 | 356.60 to −345.28 | 0.98 | 22.77 | −288.42 to 242.88 |
| Uniformity^a | | | | | | |
| Area^a | | | | | | |
| Mint Lesion | | | | | | |
| Mean | 0.99 | −0.70 | −6.66 to 8.07 | 0.99 | 0.29 | −9.64 to 10.23 |
| SD | 0.93 | −0.49 | −10.16 to 9.21 | 0.98 | 2.14 | −9.60 to 13.83 |
| Entropy | 0.98 | 0.30 | −2.23 to 1.75 | 0.20 | 2.52 | −20.0 to 25.1 |
| MPP | 0.99 | −0.61 | −6.36 to 7.58 | 0.98 | 0.36 | −8.99 to 9.60 |
| Skewness | 0.00 | 98.78 | −1191.00 to 993.55 | 0.00 | 102.33 | −987.1 to 1191.67 |
| Kurtosis | 0.32 | −10.68 | −99.35 to 120.48 | 0.44 | 0.32 | −57.18 to 57.63 |
| Uniformity | 0.09 | 25.00 | −343.75 to 287.5 | 0.09 | 27.60 | −284.5 to 340.14 |
| Area | 0.93 | −2.74 | −31.58 to 37.06 | 0.93 | 2.50 | −36.7 to 41.7 |
| Volumetric | | | | | | |
| Mint Lesion | | | | | | |
| Mean | 1.00 | −0.54 | −3.03 to 4.13 | 0.99 | 0.45 | −6.02 to 6.97 |
| SD | 1.00 | −1.27 | −2.65 to 5.19 | 1.00 | 0.46 | −2.71 to 3.64 |
| Entropy | 1.00 | −0.15 | −0.58 to 1.09 | 1.00 | 0.15 | −0.50 to 0.61 |
| MPP | 1.00 | 0.00 | −2.93 to 4.10 | 0.99 | 0.35 | −5.94 to 6.68 |
| Skewness | 0.78 | −75.44 | −1436.00 to 1600.00 | 0.73 | 152.63 | −1000.00 to 1303.51 |
| Kurtosis | 0.73 | −21.17 | −301.63 to 344.2 | 0.32 | 28.83 | −220.68 to 278.35 |
| Uniformity | 1.00 | 0.00 | −9.09 to 0.00 | 1.00 | 0.0 | 0.0 |
| Volume | 0.99 | −3.05 | −12.13 to 18.21 | 0.99 | 3.91 | −11.74 to 19.83 |
| Healthmyne | | | | | | |
| Mean | 0.81 | 3.93 | −34.08 to 41.95 | 0.98 | −1.46 | −13.96 to 11.02 |
| SD | 0.62 | 4.62 | −43.22 to 52.42 | 0.96 | −1.47 | −17.73 to 14.55 |
| Entropy | −0.007 | 7.67 | −44.96 to 60.24 | 0.09 | 6.44 | −43.25 to 56.08 |
| MPP^a | | | | | | |
| Skewness | 0.92 | 43.55 | −796.77 to 882.25 | 0.39 | 237.10 | −1758.00 to 2232.3 |
| Kurtosis | 0.99 | 2.89 | −45.92 to 51.83 | 0.05 | 44.52 | −354.22 to 443.27 |
| Uniformity | 0.92 | 3.10 | −28.55 to 34.77 | 0.99 | −0.21 | −7.32 to 6.91 |
| Volume | 0.97 | −15.73 | −56.47 to 24.00 | 0.98 | −3.50 | −30.65 to 23.65 |

Note: Interobserver metrics were obtained independently by two users and intraobserver metrics were obtained by single user at two time periods 2 weeks apart. Mean = mean gray-level intensity, MPP = mean of positive pixels.
^a Metric not available for platform.

Interobserver Comparison

Interobserver variability was measured in 30 patients (Table S1). For the single-slice platforms (Mint Lesion 2D and TexRAD), mean gray-level intensity, MPP, SD, and entropy showed low interobserver variability with a CCC greater than 0.90 and an MRD of less than 5.0% (Table 4 and Fig. 2A). ROI area, although not reported for TexRAD, showed low variability and high concordance between users on Mint Lesion 2D. Skewness and kurtosis, which describe the symmetry and peakedness of a histogram, respectively, showed high variability (MRD > 5%) and high concordance (CCC > 0.90) on TexRAD and high variability (MRD > 5%) and low concordance (CCC < 0.90) on Mint Lesion 2D (Table 4).
Fig. 2. Bland-Altman plots of entropy for interobserver and intraobserver comparison. Gray lines indicate 95% CI.
A and B, Plots compare interobserver measurements of single-slice entropy for Mint Lesion 2D (Mint Medical) (A) and volumetric entropy for Healthmyne Radiomic Precision Metrics (Healthmyne) (B).
C and D, Plots compare intraobserver measurements of single-slice entropy for Mint Lesion 2D (C) and volumetric entropy for Healthmyne (D). Measurements were drawn by one medical student at least 2 weeks apart.
For volumetric platforms (Mint Lesion 3D and Healthmyne), interobserver variability was measured in 29 patients (one patient was excluded from this component of analysis because of corrupted data from one platform). On both volumetric platforms, mean gray-level intensity showed low variability (CCC ≥ 0.81; MRD < 5.0%) between users. On Mint Lesion 3D, volume, SD, entropy, and MPP also showed low variability (CCC > 0.90; MRD < 5.0%). On Healthmyne, entropy showed low concordance (CCC, −0.01) and an MRD of 7.67% (Fig. 2B). Volume showed high concordance (CCC, 0.97) and an MRD of −15.73%, and SD showed moderate concordance (CCC, 0.62) and an MRD of 4.62%. Skewness and kurtosis on both platforms showed high variability between users, with low concordance and high MRD (Table 4).

Intraobserver Comparison

Intraobserver variability was measured in 30 patients (Table 4). For single-slice platforms (Mint Lesion 2D and TexRAD), mean gray-level intensity, MPP, and SD showed low intraobserver variability with a CCC greater than 0.90 and an MRD of less than 5.0%. Entropy showed low variability (CCC, 0.99; MRD < 1.0%) in TexRAD. Concordance for entropy measured in Mint Lesion 2D was low (CCC, 0.20) despite a low MRD (2.52%) (Fig. 2C). For TexRAD, skewness and kurtosis both showed high concordance (CCC, 0.99) but high MRD (−7.93% and 22.77%, respectively). For Mint Lesion 2D, skewness and kurtosis both showed low concordance. MRD for kurtosis was low (0.32%) and for skewness was high (102.3%). The 2D area, although not reported in TexRAD, had low intraobserver variability with a CCC of 0.91 and an MRD of 2.50% (Table 4).
For volumetric platforms (Healthmyne and Mint Lesion 3D), mean gray-level intensity, SD, uniformity, and volume showed low variability on both platforms, with a CCC greater than 0.90 and an MRD of less than 5.0%. For Mint Lesion 3D, entropy and MPP showed low intraobserver variability (CCC, 0.99; MRD, < 1%). For Healthmyne, entropy showed high intraobserver variability (CCC, 0.09; MRD, 6.44%) and MPP was not reported. For both platforms, intraobserver variability was high for kurtosis and skewness, with CCC of 0.05–0.73 and MRD of 28.83–237.10% (Table 4).

Discussion

CTTA is an emerging technology with many potential diagnostic and prognostic applications. Previous studies have shown associations between tumor textural features and tumor histologic features, aggressiveness, and survival outcomes. Traditionally, tumor behavior is measured either by observing changes in tumor size over time or by obtaining a tissue biopsy and microscopic analysis.
In theory, CTTA is one radiomics tool that can provide a more global and quantitative measurement of tumor behavior and has the potential to obviate invasive tests or longitudinal observation, which can be time consuming. However, the generalizability and reproducibility of these findings have been an area of recent intense study. Numerous CTTA platforms (both commercially available and developed in-house) are used to obtain texture metrics, and little is known about the variability between these platforms and between users who take measurements with these platforms. There has been no standardization of platforms or metrics to date. The objective of this study was to measure variability of similar metrics obtained by three commercially available CTTA platforms, reproducibility of their previously established associations with tumor histologic subtype, and variability between different users and within the same user over time on the same dataset.
The results of our interobserver and intraobserver analyses provide insight into the reliability of CTTA measurements between users. This analysis shows that certain important CTTA metrics can reliably be collected by different users and by the same users over time. Mean gray-level intensity, SD, MPP, and entropy, which are variables commonly seen with important clinical associations in other studies, all showed low interobserver and intraobserver variability. Importantly, low variability was observed for both volumetric and single-slice platforms. Kurtosis and skewness, however, were less consistent between and within users. Because of the nature of these measures (symmetry and peakedness of the pixel histogram), it makes sense that these metrics may be more susceptible to extremes in tissue attenuation caused by calcification or adipose tissue. Slight differences in how the segmentation is drawn or use of a single-slice instead of a volumetric ROI may affect measures like this more strongly as a result of inclusion of more extreme pixel attenuation values in some cases. In sum, these findings suggest that certain key texture metrics (but not all) may be reliably obtained by multiple users. Those that may be more heavily impacted by extreme pixel attenuation values may be less reproducible.
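The sensitivity of skewness and kurtosis to a small number of extreme pixels can be illustrated with a toy histogram. The sketch below is an illustration under stated assumptions, not an analysis of the study data: the ROI values are synthetic, the 400 HU value standing in for calcification is arbitrary, and `histogram_moments` is a hypothetical helper that uses population moments and non-excess kurtosis.

```python
import math


def histogram_moments(pixels):
    """Return (mean, SD, skewness, kurtosis) using population moments."""
    n = len(pixels)
    mean = sum(pixels) / n
    m2 = sum((p - mean) ** 2 for p in pixels) / n
    m3 = sum((p - mean) ** 3 for p in pixels) / n
    m4 = sum((p - mean) ** 4 for p in pixels) / n
    sd = math.sqrt(m2)
    return mean, sd, m3 / sd ** 3, m4 / m2 ** 2


# Synthetic ROI: 504 pixels spread evenly over 50 to 70 HU (symmetric about 60)
base_roi = [60 + ((i % 21) - 10) for i in range(504)]
# Same ROI after a slightly different contour includes two dense pixels
roi_with_calcification = base_roi + [400, 400]

mean_base, sd_base, skew_base, kurt_base = histogram_moments(base_roi)
mean_out, sd_out, skew_out, kurt_out = histogram_moments(roi_with_calcification)

print(f"mean:     {mean_base:.2f} -> {mean_out:.2f}")
print(f"skewness: {skew_base:.2f} -> {skew_out:.2f}")
print(f"kurtosis: {kurt_base:.2f} -> {kurt_out:.2f}")
```

Two pixels out of roughly 500 shift the mean by only a few percent in this example, while the third and fourth moments change by orders of magnitude because they weight distance from the mean by its cube and fourth power.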
Similarly, comparison of different platforms revealed that certain metrics are more consistent across platforms than others. For single-slice platforms (Mint Lesion 2D and TexRAD), mean gray-level intensity, SD, and MPP showed low MRD and moderate-to-strong correlation. Entropy, skewness, kurtosis, and uniformity showed poor correlation and high MRD. Similar findings were seen when comparing volumetric platforms (Mint Lesion 3D and Healthmyne) (Table 2). Interestingly, entropy measurements were similar when collected using different sample techniques on the same platform (Mint Lesion 2D and Mint Lesion 3D) (Table 2). Mean gray-level intensity, SD, and MPP also showed low variability within the same platform, whereas kurtosis, skewness, and uniformity showed high variability.
Features such as mean gray-level intensity and SD, which have appeared as meaningful features in a variety of studies, show reproducibility across platforms in this analysis. In previous studies, entropy has also repeatedly shown promise as a clinically useful texture metric [26]. Although the magnitude of entropy was more variable across platforms, in this analysis it remained significantly associated with histologic subtype on three of four platforms, suggesting that entropy may be an important descriptor of tumor heterogeneity even when the magnitude of the number is not the same. However, for entropy to be a valuable clinical tool, some standardization of how it is measured or calculated is needed. Interestingly, these differences in entropy values were not identified during intraplatform comparison using Mint Lesion, in which the metrics derived from single-slice segmentations were compared with volumetric segmentations using the same platform. These findings provide some reassurance that certain texture features likely reflect true biologic processes that can be objectively measured using CTTA. However, consistent differences in the metric values between platforms may limit the utility of CTTA in clinical decision making. Without a standardized reference to which the various CTTA platforms can be calibrated, clinicians will be unable to apply the findings from new studies to patient findings. This represents a real challenge for the widespread adoption of texture analysis and must be addressed in future studies.
There is ongoing debate regarding the potential benefits of volumetric ROI analysis compared with single-slice analysis. It has been suggested that single-slice segmentation is adequate for identifying clinically relevant differences within patients, but volumetric sampling offers the theoretic benefit of a larger and more robust pixel sampling [33, 34]. In addition, sampling with a single-slice technique offers the clear benefit of time efficiency; segmentation of one single slice takes considerably less time than segmentation of an entire tumor volume. In this analysis, there was no difference in predictive ability for a volumetric platform (Mint Lesion 3D) compared with a single-slice platform (Mint Lesion 2D). Furthermore, a greater number of outliers can be seen in measurements taken using a volumetric sampling technique than single slice (Figs. 1D, 2B, and 2D). This discrepancy may be explained by higher sampling precision using the single-slice versus volumetric technique, which allows selective omission of tumor slices containing extremely high-attenuation (calcifications) or low-attenuation (macroscopic fat) pixels (unless detection of these pixels is desirable for the clinical task [e.g., detecting fat in lipid-poor angiomyolipomas]).
There are some limitations to this study. This analysis investigated interplatform, interobserver, and intraobserver variability using three commercially available platforms that were in use at our home institution. Novel proprietary and commercial CTTA platforms continue to emerge, making a truly comprehensive assessment of available platforms challenging, and it is possible that inclusion of other platforms could have influenced the results. Furthermore, there were differences in the number and type of metrics produced by each platform. In this study, only metrics common across two or more platforms were included in the analysis, although there are many different types of texture metrics beyond those included in this analysis [18]. Because not all platforms included a data filtration step, only unfiltered data were evaluated and no data normalization step was included. This is a highly select population of large, biopsy-proven RCC; correlation of texture metrics across platforms may differ in other populations. The observed associations between texture metrics and tumor histologic subtype have been previously described and may not be generalizable to all renal tumors [26]. Although not the primary focus of this study, these findings suggest reliable associations between texture metrics and histologic subtype across CTTA platforms. Only portal venous CT images were included in this analysis, because this phase of contrast has been shown to provide the most robust results in prior studies. Further analysis of other phases of contrast, and of unenhanced images, is needed. In addition, there was some heterogeneity in the CT techniques used for the studies included in this analysis. This variability was addressed by the use of identical patient cohorts when comparing metrics between platforms and users.
Although inter- and intraobserver variability was assessed, this was a controlled experiment in which both medical students were coached in detail by the same radiologist. Further assessment, particularly of interobserver variability, in less tightly controlled circumstances is warranted.
This study was not intended to provide comprehensive guidance on the use of all CTTA platforms and metrics; rather, it investigated whether a difference exists between platforms. Future research is needed to identify which platform features may provide the highest clinical utility. Platforms such as Healthmyne, which offers second-order metrics, provide a theoretic benefit of greater richness of information. However, platforms such as Mint Lesion and TexRAD, which limit the metric output to fewer variables, may be more conducive to a streamlined clinical process. In this analysis, robust correlation for most texture metrics was observed between single-slice and volumetric sampling techniques on the same platform, and the increases in efficiency and sampling precision for the single-slice technique may justify its use over volumetric sampling techniques for certain clinical indications. Additional study is needed to compare volumetric to single-slice sampling techniques in their ability to predict clinical outcomes.
In summary, this study reveals that, despite reliable associations between tumor subtype and texture measurements and consistency between platforms for several important texture metrics, variability exists between texture metrics obtained from different platforms. Careful attention should be paid to the CTTA platform being used to measure tumor heterogeneity. Meaningful interpretation and comparison of texture metrics used in clinical practice requires a standardized system with which texture platforms can be calibrated and reliable, reproducible values obtained.

Supplemental Content

File (06_20_22823_suppdata_s01.pdf)

References

1.
Gerlinger M, Rowan AJ, Horswell S, et al. Intratumor heterogeneity and branched evolution revealed by multiregion sequencing. N Engl J Med 2012; 366:883–892
2.
Ball MW, Bezerra SM, Gorin MA, et al. Grade heterogeneity in small renal masses: potential implications for renal mass biopsy. J Urol 2015; 193:36–40
3.
Halverson SJ, Kunju LP, Bhalla R, et al. Accuracy of determining small renal mass management with risk stratified biopsies: confirmation by final pathology. J Urol 2013; 189:441–446
4.
Kapur P, Peña-Llopis S, Christie A, et al. Effects on survival of BAP1 and PBRM1 mutations in sporadic clear-cell renal-cell carcinoma: a retrospective analysis with independent validation. Lancet Oncol 2013; 14:159–167
5.
Shuch B, Bratslavsky G, Linehan WM, Srinivasan R. Sarcomatoid renal cell carcinoma: a comprehensive review of the biology and current treatment strategies. Oncologist 2012; 17:46–54
6.
Shuch B, Bratslavsky G, Shih J, et al. Impact of pathological tumour characteristics in patients with sarcomatoid renal cell carcinoma. BJU Int 2012; 109:1600–1606
7.
Abel EJ, Carrasco A, Culp SH, et al. Limitations of preoperative biopsy in patients with metastatic renal cell carcinoma: comparison to surgical pathology in 405 cases. BJU Int 2012; 110:1742–1746
8.
Davnall F, Yip CS, Ljungqvist G, et al. Assessment of tumor heterogeneity: an emerging imaging tool for clinical practice? Insights Imaging 2012; 3:573–589
9.
Weiss GJ, Ganeshan B, Miles KA, et al. Noninvasive image texture analysis differentiates K-ras mutation from pan-wildtype NSCLC and is prognostic. PLoS One 2014; 9:e100244
10.
Bashir U, Siddique MM, Mclean E, Goh V, Cook GJ. Imaging heterogeneity in lung cancer: techniques, applications, and challenges. AJR 2016; 207:534–543
11.
Lubner MG, Smith AD, Sandrasegaran K, Sahani DV, Pickhardt PJ. CT texture analysis: definitions, applications, biologic correlates, and challenges. RadioGraphics 2017; 37:1483–1503
12.
Miles KA, Ganeshan B, Griffiths MR, Young RC, Chatwin CR. Colorectal cancer: texture analysis of portal phase hepatic CT images as a potential marker of survival. Radiology 2009; 250:444–452
13.
Ganeshan B, Panayiotou E, Burnand K, Dizdarevic S, Miles K. Tumour heterogeneity in non-small cell lung carcinoma assessed by CT texture analysis: a potential marker of survival. Eur Radiol 2012; 22:796–802
14.
Miles KA, Ganeshan B, Rodriguez-Justo M, et al. Multifunctional imaging signature for V-KI-RAS2 Kirsten rat sarcoma viral oncogene homolog (KRAS) mutations in colorectal cancer. J Nucl Med 2014; 55:386–391
15.
Ng F, Ganeshan B, Kozarski R, Miles KA, Goh V. Assessment of primary colorectal cancer heterogeneity by using whole-tumor texture analysis: contrast-enhanced CT texture as a biomarker of 5-year survival. Radiology 2013; 266:177–184
16.
Yip C, Landau D, Kozarski R, et al. Primary esophageal cancer: heterogeneity as potential prognostic biomarker in patients treated with definitive chemotherapy and radiation therapy. Radiology 2014; 270:141–148
17.
Zhang H, Graham CM, Elci O, et al. Locally advanced squamous cell carcinoma of the head and neck: CT texture and histogram analysis allow independent prediction of overall survival in patients treated with induction chemotherapy. Radiology 2013; 269:801–809
18.
Summers RM. Texture analysis in radiology: does the emperor have no clothes? Abdom Radiol (NY) 2017; 42:342–345
19.
Meyer M, Ronald J, Vernuccio F, et al. Reproducibility of CT radiomic features within the same patient: influence of radiation dose and CT reconstruction settings. Radiology 2019; 293:583–591
20.
Berenguer R, Pastor-Juan MDR, Canales-Vázquez J, et al. Radiomics of CT features may be nonreproducible and redundant: influence of CT acquisition parameters. Radiology 2018; 288:407–415
21.
Kim H, Park CM, Lee M, et al. Impact of reconstruction algorithms on CT radiomic features of pulmonary tumors: analysis of intra- and inter-reader variability and inter-reconstruction algorithm variability. PLoS One 2016; 11:e0164924
22.
Lu L, Ehmke RC, Schwartz LH, Zhao B. Assessing agreement between radiomic features computed for multiple CT imaging settings. PLoS One 2016; 11:e0166550
23.
Orlhac F, Frouin F, Nioche C, Ayache N, Buvat I. Validation of a method to compensate multicenter effects affecting CT radiomics. Radiology 2019; 291:53–59
24.
Mackin D, Fave X, Zhang L, et al. Measuring computed tomography scanner variability of radiomics features. Invest Radiol 2015; 50:757–765
25.
Midya A, Chakraborty J, Gönen M, Do RKG, Simpson AL. Influence of CT acquisition and reconstruction parameters on radiomic feature reproducibility. J Med Imaging (Bellingham) 2018; 5:011020
26.
Lubner MG, Stabo N, Abel EJ, Del Rio AM, Pickhardt PJ. CT textural analysis of large primary renal cell carcinomas: pretreatment tumor heterogeneity correlates with histologic findings and clinical outcomes. AJR 2016; 207:96–105
27.
Ganeshan B, Abaleke S, Young RC, Chatwin CR, Miles KA. Texture analysis of non-small cell lung cancer on unenhanced computed tomography: initial evidence for a relationship with tumour glucose metabolism and stage. Cancer Imaging 2010; 10:137–143
28.
Goh V, Ganeshan B, Nathan P, Juttla JK, Vinayan A, Miles KA. Assessment of response to tyrosine kinase inhibitors in metastatic renal cell cancer: CT texture as a predictive biomarker. Radiology 2011; 261:165–171
29.
Krajewski KM, Nishino M, Franchetti Y, Ramaiya NH, Van den Abbeele AD, Choueiri TK. Intraobserver and interobserver variability in computed tomography size and attenuation measurements in patients with renal cell carcinoma receiving antiangiogenic therapy: implications for alternative response criteria. Cancer 2014; 120:711–721
30.
Lin LI. A concordance correlation coefficient to evaluate reproducibility. Biometrics 1989; 45:255–268
31.
McBride RB. A proposal for strength-of-agreement criteria for Lin's concordance correlation coefficient. Hamilton, New Zealand: NIWA Client Report for Ministry of Health, 2005; www.medcalc.org/download/pdf/McBride2005.pdf. Accessed January 2020
32.
Bland JM, Altman DG. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet 1986; 1:307–310
33.
Lubner MG, Stabo N, Lubner SJ, et al. CT textural analysis of hepatic metastatic colorectal cancer: pre-treatment tumor heterogeneity correlates with pathology and clinical outcomes. Abdom Imaging 2015; 40:2331–2337
34.
Ng F, Kozarski R, Ganeshan B, Goh V. Assessment of tumor heterogeneity by CT texture analysis: can the largest cross-sectional area be used as an alternative to whole tumor analysis? Eur J Radiol 2013; 82:342–348

Information & Authors

Information

Published In

American Journal of Roentgenology
Pages: 1549 - 1557
PubMed: 33852332

History

Submitted: January 13, 2020
Revision requested: February 4, 2020
Revision received: April 24, 2020
Accepted: June 3, 2020
Version of record online: April 14, 2021

Keywords

  1. CT
  2. renal cell carcinoma
  3. reproducibility
  4. texture analysis

Authors

Affiliations

Leo D. Dreyfuss, MD
Department of Urology, University of Wisconsin School of Medicine and Public Health, Madison, WI
Department of Radiology, University of Wisconsin School of Medicine and Public Health, E3/311 Clinical Sciences Center, 600 Highland Ave, Madison WI 53792
E. Jason Abel, MD
Department of Urology, University of Wisconsin School of Medicine and Public Health, Madison, WI
Jered Nystrom, MD
Department of Radiology, University of Wisconsin School of Medicine and Public Health, E3/311 Clinical Sciences Center, 600 Highland Ave, Madison WI 53792
Nicholas J. Stabo, MD
Department of Radiology, University of Wisconsin School of Medicine and Public Health, E3/311 Clinical Sciences Center, 600 Highland Ave, Madison WI 53792
Perry J. Pickhardt, MD
Department of Radiology, University of Wisconsin School of Medicine and Public Health, E3/311 Clinical Sciences Center, 600 Highland Ave, Madison WI 53792
Meghan G. Lubner, MD
Department of Radiology, University of Wisconsin School of Medicine and Public Health, E3/311 Clinical Sciences Center, 600 Highland Ave, Madison WI 53792

Notes

Address correspondence to M. G. Lubner ([email protected]).
M. G. Lubner has received grant funding from Philips Healthcare and Ethicon. P. J. Pickhardt is an advisor to Bracco and shareholder in Elucent Medical, SHINE Medical Technologies, and Cellectar Biosciences. The remaining authors declare that they have no disclosures relevant to the subject matter of this article.
