Fluorine-18 FDG PET metabolic activity has been increasingly used as a quantitative biomarker in evaluating treatment response [1–3]. The FDG PET maximum standardized uptake value (SUVmax) is a commonly used FDG imaging parameter in oncology [4–9] because of its simplicity and excellent reader reliability. However, SUVmax, which represents the metabolic information of only the single maximum-value pixel of the tumor, may not accurately represent tumor biology. Volumetric FDG imaging parameters, including metabolic tumor volume and total lesion glycolysis, are emerging as parameters for predicting outcome and therapy response in patients with solid tumors [10–14].
For metabolic tumor volume and total lesion glycolysis to be useful in clinical practice as reliable prognostic and predictive parameters, the reader agreement and variability of these parameters must be established. Semiautomatic segmentation methods have been shown to have less variability than manual methods in measuring tumor parameters [15]. Semiautomatic isocontour region-of-interest segmentation based on a fixed percentage of SUVmax is widely used because it is available across imaging workstations and is easy to use [14]. We previously studied the intrareader agreement of FDG tumor volumetric parameters in human solid tumors [14]. However, the interreader reliability of metabolic tumor volume and total lesion glycolysis has not been well studied in a large patient population and needs to be established for clinical translation and reliability across readers.
The objective of this study is to establish the interreader reliability and variability of segmenting metabolic tumor volume and total lesion glycolysis of primary tumors using two fixed thresholds, 40% and 50% of lesion SUVmax, in patients with head and neck, lung, and breast cancers, and to investigate the impact of lesion FDG avidity and lesion location on interreader reliability and variability.
Materials and Methods
Patients and Study Design
This study is a retrospective evaluation of PET/CT images. Our institutional review board approved the study and waived the requirement for informed consent. Patients with lung, head and neck, and breast cancers who underwent baseline PET/CT at our institution in 2009 were included. Patients who had undergone prior local or systemic therapy were excluded. In our previous study establishing intrareader agreement [16], which used only one reader, we analyzed 121 patients. The present study used the same cohort, except for 10 patients whose images could not be transferred from our archival system to the dedicated workstation because of technical failure. Hence, the study population comprised 111 patients with head and neck, breast, and lung cancers (mean [± SD] age, 61.9 ± 12.5 years).
PET/CT
All PET/CT studies were performed on a PET/CT scanner (Discovery STE 16, GE Healthcare) according to the institutional standard clinical protocol. A dedicated head and neck protocol was used for all patients with head and neck cancer: a scan from the skull base to the aortic arch with the arms down, followed by a second scan from the clavicles to the mid thighs with the arms up, so that imaging covered the skull base to the mid thighs. For patients with breast and lung cancers, scans were acquired from the skull base to the mid thighs with the arms up. The mean patient blood glucose level was 101.5 ± 24.8 mg/dL. Patients were injected with a mean dose of 13.1 ± 2.9 mCi (484.7 ± 107.3 MBq) of FDG, and the mean uptake time was 91.7 ± 25.2 minutes.
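The injected activity is reported in both millicuries and megabecquerels. The conversion relies on the definition 1 Ci = 3.7 × 10^10 Bq, so 1 mCi = 37 MBq exactly. The short sketch below only checks the reported figures; the function name is illustrative.

```python
MBQ_PER_MCI = 37.0  # 1 Ci = 3.7e10 Bq by definition, so 1 mCi = 37 MBq


def mci_to_mbq(activity_mci: float) -> float:
    """Convert an activity in millicuries to megabecquerels."""
    return activity_mci * MBQ_PER_MCI


# Mean and SD of the injected FDG activity reported above
print(round(mci_to_mbq(13.1), 1))  # 484.7
print(round(mci_to_mbq(2.9), 1))   # 107.3
```

Because the conversion is linear, the same factor applies to both the mean and the SD.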
Image Analysis
All PET/CT studies were retrieved from the electronic archival system and reviewed on a workstation (Advantage, software version 4.2, GE Healthcare). To establish interreader reliability, two readers, a junior and a senior faculty member, performed all reads independently. The junior faculty member (reader 1) is board certified in radiology, completed a nuclear radiology fellowship, and had 4 years of experience reading PET/CT. The senior faculty member (reader 2) is dual board certified in radiology and nuclear medicine and had 10 years of experience reading PET/CT. Reader 1 was the original reader in the prior intrareader study. Bias in reader 1's measurements was minimized because this study used different analytic software, all measurements were made independently of the intrareader study, PET/CT scans were reviewed in random order, and there was a 6-month interval between the two studies.
For patients with more than one lesion, the index lesion was identified a priori so that both readers would segment the same lesion for metabolic tumor volume and total lesion glycolysis; thus, only one lesion was segmented in each patient. PET, CT, and fused PET/CT images were reviewed in the axial, coronal, and sagittal planes. The imaging parameters measured were the primary tumor SUVmax, metabolic tumor volume, and total lesion glycolysis segmented from PET. Metabolic tumor volume was defined as the tumor volume with FDG uptake segmented by two fixed thresholds, 40% and 50% of SUVmax. Total lesion glycolysis was defined as metabolic tumor volume multiplied by the mean SUV within that volume. We did not segment FDG tumor volumetric parameters with a gradient segmentation method because this method was not available on the workstation we used.
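The definitions of metabolic tumor volume (MTV) and total lesion glycolysis (TLG) can be made concrete with a minimal sketch. It assumes the segmented voxels' SUVs and the PET voxel volume in mL are already available; both inputs, the function name, and the lesion values are hypothetical and do not reflect how the workstation software exposes its results.

```python
from statistics import mean


def mtv_and_tlg(segmented_suv, voxel_volume_ml):
    """Return (MTV in mL, SUVmean, TLG) for one segmented lesion.

    segmented_suv: SUV of every voxel inside the segmentation (assumed input).
    voxel_volume_ml: volume of one PET voxel in mL (assumed known from the scan).
    """
    mtv = len(segmented_suv) * voxel_volume_ml  # metabolic tumor volume
    suv_mean = mean(segmented_suv)              # mean SUV within that volume
    tlg = mtv * suv_mean                        # total lesion glycolysis
    return mtv, suv_mean, tlg


# Hypothetical lesion: 7 voxels of 0.5 mL each.
mtv, suv_mean, tlg = mtv_and_tlg([10.0, 4.5, 4.5, 4.5, 4.5, 4.5, 4.5], 0.5)
print(mtv, round(suv_mean, 2), round(tlg, 2))  # 3.5 5.29 18.5
```

Because MTV is a voxel count times the voxel volume, TLG is equivalently the sum of the segmented voxels' SUVs multiplied by the voxel volume (here 37 × 0.5 = 18.5).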
The fixed SUVmax threshold contouring method includes all voxels whose values exceed a defined percentage of the maximal voxel value within an operator-defined sphere. In this study, we chose two fixed thresholds, 40% and 50% of lesion SUVmax, because thresholds of 40–50% have been commonly used in previous studies [17–19]. Cross-sectional circles are displayed in all three projections (axial, sagittal, and coronal) to ensure 3D coverage of the primary tumor (Figs. 1A and 1B). The software then semiautomatically calculates and outlines the edges of the primary tumor. Because the readers define the cross-sectional circles, reader variability may be introduced if the readers fail to include all parts of the FDG-avid lesion in three dimensions or if they include adjacent FDG-avid structures. Either error would affect the calculated semiautomatic metabolic tumor volume and total lesion glycolysis. The threshold segmentation methods for volume measurement available in the Advantage Workstation software have been used previously [14].
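The fixed-threshold rule described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the Advantage Workstation algorithm: the volume is a hypothetical dict of voxel indices to SUV, the operator-defined region is a sphere in voxel units, and no connectivity constraint is applied (the vendor software may apply additional rules).

```python
from math import dist


def fixed_threshold_segment(suv, center, radius, fraction):
    """Segment a lesion with a fixed percentage-of-SUVmax threshold.

    suv: dict mapping (x, y, z) voxel indices to SUV (hypothetical input format).
    center, radius: operator-defined sphere, in voxel units (assumption).
    fraction: 0.40 or 0.50 for the 40% and 50% SUVmax thresholds.
    Returns (voxels above the threshold, SUVmax inside the sphere).
    """
    # Only voxels inside the operator-defined sphere are considered.
    inside = {v: s for v, s in suv.items() if dist(v, center) <= radius}
    suv_max = max(inside.values())
    cutoff = fraction * suv_max
    # Keep every voxel in the sphere whose SUV exceeds the cutoff.
    segmented = [v for v, s in inside.items() if s > cutoff]
    return segmented, suv_max


# Toy 5 x 5 x 5 volume: background SUV 1.0, lesion centered at (2, 2, 2).
volume = {(x, y, z): 1.0 for x in range(5) for y in range(5) for z in range(5)}
volume[(2, 2, 2)] = 10.0
for dx, dy, dz in [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]:
    volume[(2 + dx, 2 + dy, 2 + dz)] = 4.5
volume[(0, 0, 0)] = 20.0  # adjacent FDG-avid structure outside the sphere

seg40, suv_max = fixed_threshold_segment(volume, (2, 2, 2), 1.5, 0.40)
seg50, _ = fixed_threshold_segment(volume, (2, 2, 2), 1.5, 0.50)
print(suv_max, len(seg40), len(seg50))  # 10.0 7 1
```

The toy example shows why the 40% threshold yields a larger volume than the 50% threshold, and why sphere placement matters: enlarging the radius enough to include the SUV 20.0 voxel would let the adjacent structure set SUVmax and distort the segmentation, which is the reader-dependent error described above.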
Statistical Methods
We present summary statistics as the mean ± SD for continuous variables or as frequency and percentage for categoric variables. The reliability of metabolic tumor volume and total lesion glycolysis was measured using the intraclass correlation coefficient (ICC), generated with a two-way random-effects model and an absolute agreement definition, and is reported as a point estimate with 95% CI. The independent variables are the patients, and the dependent variables are the readers' measurements. Readers and patients are considered randomly selected from large pools of readers and patients. The absolute agreement definition treats systematic differences between the two readers as relevant in estimating the ICC. ICC values generally range from 0 to 1.0, with values closer to 1.0 representing better reproducibility [20]. The precision of each ICC, defined as the half-width of its 95% CI × 100%, was also determined. Variability of measurements between the readers was assessed with Bland-Altman analysis, reporting bias, SD, and 95% CI. The analysis was performed on the absolute differences in metabolic tumor volume and total lesion glycolysis between the two readers.
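The statistics above can be reproduced outside SPSS with a short from-scratch sketch. It assumes the single-measure form of the two-way random-effects, absolute-agreement ICC (the text does not state whether single or average measures were reported), computes only the point estimate (the F-distribution-based 95% CI is omitted), and uses the conventional Bland-Altman limits of agreement, bias ± 1.96 SD, which may differ from the "95% CI" reported here. Function names and reader values are hypothetical.

```python
from statistics import mean, stdev


def icc_a1(ratings):
    """Single-measure ICC, two-way random effects, absolute agreement.

    ratings: one row per patient, one column per reader.
    Point estimate only; the 95% CI is not computed here.
    """
    n, k = len(ratings), len(ratings[0])
    grand = mean(x for row in ratings for x in row)
    row_means = [mean(row) for row in ratings]
    col_means = [mean(row[j] for row in ratings) for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)  # between patients
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)  # between readers
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols                   # residual
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


def precision_percent(ci_low, ci_high):
    """Precision as defined above: half-width of the 95% CI x 100%."""
    return (ci_high - ci_low) / 2 * 100


def bland_altman(reader1, reader2):
    """Bias, SD of differences, and 95% limits of agreement (bias +/- 1.96 SD)."""
    diffs = [a - b for a, b in zip(reader1, reader2)]
    bias, sd = mean(diffs), stdev(diffs)
    return bias, sd, (bias - 1.96 * sd, bias + 1.96 * sd)


# Hypothetical metabolic tumor volumes (mL) from two readers for five lesions.
reader1 = [3.5, 12.0, 25.0, 48.0, 7.2]
reader2 = [3.7, 11.5, 25.8, 47.0, 7.0]
print(icc_a1([list(pair) for pair in zip(reader1, reader2)]))
print(bland_altman(reader1, reader2))
```

The absolute-agreement form keeps the between-reader mean square (`msc`) in the denominator, so a systematic offset between readers lowers the ICC; a consistency-type ICC would ignore that offset.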
To investigate the impact of primary tumor FDG avidity on the interreader reliability and variability of volumetric segmentation, we performed ICC and Bland-Altman subgroup analyses of the 30 lesions (patients) with the lowest SUVmax and the 30 lesions (patients) with the highest SUVmax in the study population. We also investigated the impact of tumor location (head and neck, lung, or breast) on the interreader reliability and variability of volumetric segmentation. We used the Prism 5 (GraphPad Software) and SPSS 20 (SPSS) statistical packages for all analyses, and all hypothesis tests were two-sided with a significance level of 0.05.
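The subgroup definitions amount to ranking lesions by SUVmax and grouping them by tumor site. A minimal sketch follows, assuming each patient is a record with hypothetical `suv_max` and `site` fields; the cohort is synthetic, and the text does not specify how ties in SUVmax were handled.

```python
from collections import defaultdict


def extreme_suvmax_subgroups(patients, n=30):
    """Return (n lowest-SUVmax lesions, n highest-SUVmax lesions)."""
    ranked = sorted(patients, key=lambda p: p["suv_max"])
    return ranked[:n], ranked[-n:]


def by_location(patients):
    """Group patients by primary tumor site."""
    groups = defaultdict(list)
    for p in patients:
        groups[p["site"]].append(p)
    return dict(groups)


# Hypothetical cohort of 111 records with made-up SUVmax values.
sites = ["head and neck", "lung", "breast"]
cohort = [{"id": i, "suv_max": 1.0 + 0.2 * i, "site": sites[i % 3]} for i in range(111)]
low, high = extreme_suvmax_subgroups(cohort)
print(len(low), len(high), low[-1]["id"], high[0]["id"])  # 30 30 29 81
print({site: len(group) for site, group in by_location(cohort).items()})
# {'head and neck': 37, 'lung': 37, 'breast': 37}
```

Each subgroup would then be passed through the same ICC and Bland-Altman computations as the full cohort.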
Discussion
The goal of our study was to establish the interreader reliability and variability of metabolic tumor volume and total lesion glycolysis measured with two fixed-threshold isocontouring methods in various human solid tumors for quantitative imaging. We also investigated the impact of index lesion SUVmax on the interreader reliability and variability of these parameters. Our results show that interreader reliability is excellent for both metabolic tumor volume and total lesion glycolysis segmented using 40% and 50% SUVmax thresholds. The ICC precision width is narrower for 50% SUVmax segmentation than for 40% SUVmax segmentation for the entire study population and for the breast, head and neck, and lung cancer subgroups. For low-FDG-avid lesions, the ICC precision width of both metabolic tumor volume and total lesion glycolysis was also narrower for 50% SUVmax segmentation than for 40% SUVmax segmentation. For high-FDG-avid lesions, the ICC precision was similar for both parameters with both threshold methods. The Bland-Altman plots show that the variability between readers (95% CI width of bias) is narrower for 50% SUVmax segmentation for both metabolic tumor volume and total lesion glycolysis, for the entire study population, for the breast, head and neck, and lung cancer subgroups, and for low- and high-FDG-avid lesions. Variability was also narrower for high-FDG-avid lesions than for low-FDG-avid lesions for both parameters.
The implications of these findings are significant for quantitative imaging with FDG PET volumetric parameters. The FDG avidity and location of tumors affect the precision and interreader variability of FDG volumetric parameter measurements. Segmentation at 50% SUVmax may be preferable because it showed narrower precision widths and lower variability in all subgroups. Because lesion FDG avidity influences precision and variability, it may be necessary to establish a cutoff level of FDG avidity above which volumetric parameters can be applied for therapy assessment in oncology. The lower interreader variability of 50% SUVmax segmentation may be explained by the fact that it includes a more central, more metabolically active volume than 40% SUVmax segmentation does. This is consistent with prior phantom repeatability studies showing less variability with 50% SUVmax segmentation than with 42% SUVmax segmentation [21]. In our study, the ICCs for 50% SUVmax segmentation approached 1.0, similar to the ICCs observed for SUVmax in prior studies. Hence, 50% SUVmax segmentation of metabolic tumor volume and total lesion glycolysis can be used for therapy response assessment and as prognostic FDG parameters, with high interreader reliability and minimal variability.
Threshold-based segmentation has both advantages and disadvantages. Threshold segmentation based on a fixed percentage of SUVmax is very easy to implement and use [22]. The repeatability of fixed threshold segmentation is better than that of adaptive threshold segmentation, because the latter depends on a manually drawn region of interest, which leads to variable delineation [21]. However, the accuracy of threshold-based approaches can be limited for lesions with nonhomogeneous FDG uptake [23].
The results of this study provide preliminary data for more challenging future investigations of pre- and post-therapy metabolic tumor volume and total lesion glycolysis assessment, including the interreader reliability and variability of these measurements in human solid tumors. In the post-therapy setting, additional variables such as radiation-induced inflammation would affect the interreader reliability and variability of these measurements. Establishing interreader agreement and variability is important so that treatment-related changes can be estimated more accurately for therapy assessment. FDG volumetric tumor parameters appear to be superior to SUVmax and peak SUV in predicting the outcome of patients with various solid tumors [24–27] and are very likely to be used in routine clinical practice in the future. Before deployment in clinical practice, careful investigation is required to establish the biologic and reader variability of these volumetric parameters. For metabolic tumor volume in the entire study population, 50% SUVmax segmentation had a precision width of 0.2% and a Bland-Altman bias 95% CI of 8.1 mL. Hence, the measurements may be more reliable for tumors with more than 8 mL of metabolic tumor volume.
Our study has several limitations. We included only two fixed-threshold isocontour segmentation methods. Further studies are needed to assess other segmentation methods, such as gradient and adaptive segmentation, and to establish the interreader agreement of metabolic tumor volume and total lesion glycolysis in human solid tumors with those methods compared with the 50% SUVmax threshold. We used a single vendor's software to segment the tumors; for clinical translation, the interreader agreement of volumetric FDG parameters must also be established across viewing platforms and software programs from different commercial vendors. In addition, the study was limited to two readers from the same institution, albeit with different levels of experience in reading PET scans. Ideally, the study would be performed with four or five readers from different institutions with varied professional experience to simulate clinical practice. We included only untreated tumors of three types. Measuring metabolic tumor volume and total lesion glycolysis after treatment may be more challenging because treatment-related inflammation may affect adjacent normal tissue. Likewise, other tumor types, especially those with low FDG avidity similar to that of background, such as a subgroup of pancreatic cancers, may be difficult to segment, making measurement of metabolic tumor volume and total lesion glycolysis challenging. We also identified the index lesion a priori when more than one lesion was present. Hence, the interreader agreement and variability reported in this study likely represent best-case estimates. Future studies should include readers from different institutions with varied experience, as well as both treated and untreated lesions that are not identified in advance.
In conclusion, there is excellent interreader agreement, with ICCs approaching 1.0, for measurement of metabolic tumor volume and total lesion glycolysis with 40% and 50% SUVmax segmentations in various human solid tumors. The precision width is narrower for 50% SUVmax than for 40% SUVmax segmentation. Tumor FDG avidity and location influence the interreader reliability, precision, and variability of FDG volumetric parameter measurements. For metabolic tumor volume in the entire study population, 50% SUVmax segmentation has a precision width of 0.2% and a Bland-Altman bias 95% CI of 8.1 mL. Hence, the measurements may be more reliable for tumors with more than 8 mL of metabolic tumor volume.