OBJECTIVE. The purpose of this study was to facilitate interpretation of 99mTc-mercaptoacetyltriglycine (MAG3) diuretic scans by identifying key interpretative variables and developing a predictive model for computer-assisted diagnosis.
MATERIALS AND METHODS. Ninety-seven studies were randomly selected from an archived database of MAG3 baseline and furosemide acquisitions and scan interpretations (obstruction, equivocal finding, or no obstruction) derived from a consensus of three experts. Sixty-one studies (120 kidneys) were randomly chosen to build a predictive model for diagnosing or excluding obstruction. The other 36 studies (71 kidneys) composed the validation group. The probability of normal drainage (no obstruction) at the baseline acquisition and the probability of no obstruction, equivocal finding, or obstruction after furosemide administration were determined by logistic regression analysis and proportional odds modeling of MAG3 renographic data.
RESULTS. The single most important baseline variable for excluding obstruction was the ratio of postvoid counts to maximum counts. Renal counts in the last minute of furosemide acquisition divided by the maximum baseline acquisition renal counts and time to half-maximum counts after furosemide administration in a pelvic region of interest were the critical variables for determining obstruction. The area under the receiver operating characteristic curve (AUC) for predicting normal drainage in the validation sample was 0.93 (standard error, 0.02); sensitivity, 85%; specificity, 93%. The AUC for the diagnosis of obstruction after furosemide administration was 0.84 (standard error, 0.06); sensitivity, 82%; specificity, 83%.
CONCLUSION. A predictive system has been developed that provides a promising computer-assisted diagnosis approach to the interpretation of MAG3 diuretic renal scans; this system has also identified the key variables required for scan interpretation.
Innovative approaches are required to address the escalating costs of medical care. In radiology and nuclear medicine, one such approach has been the use of computer-assisted diagnosis to increase efficiency, improve accuracy, and ultimately reduce costs by helping radiologists interpret studies faster and with a higher degree of accuracy. One particularly successful example of computer-assisted diagnosis in nuclear medicine is bull's-eye analysis in SPECT myocardial perfusion imaging. Other areas that can especially benefit from this approach are low-volume studies such as diuretic renography, in which many general radiologists might have had only limited training and experience [2–4]. Limited experience in diuretic renography is a widespread problem. A large percentage of the estimated 590,000 renal scans obtained annually in the United States are interpreted at sites that perform fewer than three studies per week. Even full-time nuclear medicine physicians disagree as much as 20% of the time about whether a kidney is obstructed, indeterminate with respect to obstruction, or not obstructed.
To evaluate suspected renal obstruction, an international consensus panel has recommended baseline radionuclide imaging with 99mTc-mercaptoacetyltriglycine (MAG3) followed by furosemide administration and an additional 20 minutes of imaging. From the baseline and post–furosemide administration acquisitions, quantitative variables, such as time to half-maximum counts after furosemide administration, are derived to help evaluate possible kidney obstruction. Physicians with limited experience or insufficient training may try to compensate by relying too heavily on a single parameter, such as time to half-maximum counts, but such an approach can lead to erroneous scan interpretation because it fails to consider variables that can be equally or even more important.
To address the problem of limited experience in diuretic renography, we used a statistical approach to develop a predictive model for evaluating suspected obstruction in patients referred for diuresis renography. We had two goals. The first was to develop a statistical decision support system for evaluation of the presence or absence of obstruction with data derived from the MAG3 renogram. The second goal was to determine the most important variables for making or excluding the diagnosis of obstruction so that general radiologists and nuclear medicine physicians could immediately apply this information in their practices to reach an informed interpretation.
Materials and Methods
General Protocol and Patient Selection
Data collection and database use were compliant with the terms of HIPAA and followed institutional review board approval with waiver of the requirement for informed consent. Our diuretic renography protocol is based on the protocol recommended by an international consensus panel. A baseline scan is obtained for 24 minutes after MAG3 administration. In approximately one third of our patients, obstruction can be excluded with the baseline scan, and the study is complete. The other two thirds of patients receive an IV injection of furosemide, and an additional 20-minute acquisition is performed.
The raw and processed data for all MAG3 scans obtained for possible obstruction are archived in a database. The database also includes a subgroup of patients whose scans experts have interpreted as showing obstruction, equivocal findings, or no obstruction. Readers were defined as experts if they had more than 20 years of experience in full-time academic nuclear medicine, had multiple publications on renal nuclear medicine, and had been invited to present renal nuclear medicine educational sessions at national radiology and nuclear medicine meetings.
Using information available only from the scan, each expert independently scored each kidney on the baseline scan as not obstructed or needing furosemide and scored the baseline and furosemide scans for the presence of obstruction on a 5-point scale: 1, not obstructed; 2, probably not obstructed; 3, equivocal; 4, probably obstructed; and 5, obstructed. For analysis, the 5-point scale was condensed to a 3-point scale on which not obstructed and probably not obstructed were considered not obstructed and probably obstructed and obstructed were considered obstructed. The consensus interpretation was determined by majority vote unless there was substantial disagreement. In those cases, a conference of the three readers was used to achieve a consensus interpretation. In general, there was very good agreement among the three readers. On only 8 of 191 kidneys (4.2%) did two of the three expert readers disagree regarding the presence or absence of obstruction. This disagreement was resolved at a consensus conference of the three readers.
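The scoring and consensus rule described above can be sketched in a few lines. This is an illustrative sketch, not the procedure actually coded by the readers: the function name `consensus` is hypothetical, and the rule for "substantial disagreement" (no majority, or readers split between obstructed and not obstructed) is an assumption based on the description of the 8 discordant kidneys.

```python
from collections import Counter

# Condensed 3-point labels for the 5-point expert scores described above.
CONDENSE = {
    1: "not obstructed", 2: "not obstructed",
    3: "equivocal",
    4: "obstructed", 5: "obstructed",
}

def consensus(scores):
    """Majority label for one kidney, or None if a reader conference is needed.

    Assumption: "substantial disagreement" means either no majority label or
    readers split between "obstructed" and "not obstructed".
    """
    labels = [CONDENSE[s] for s in scores]
    if "obstructed" in labels and "not obstructed" in labels:
        return None  # readers disagree on presence vs absence of obstruction
    label, count = Counter(labels).most_common(1)[0]
    return label if count >= 2 else None

# Hypothetical scores from three readers for single kidneys.
print(consensus([1, 2, 3]))  # two readers fall in the "not obstructed" group
print(consensus([1, 2, 5]))  # split on presence vs absence: conference needed
```

A return value of `None` stands in for the cases that were resolved at a conference of the three readers.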
Studies of 97 patients (54 men [56%], 43 women [44%]; mean age, 54 years; SD, 17 years; range, 20–87 years) were randomly collected from the archived database containing both MAG3 scans and the expert consensus reading; three patients had only one kidney. Patients with renal grafts, ileal conduits, and neobladders were excluded from the database because there were too few patients in these categories to build and test a predictive model. Eight patients underwent two studies, and one patient underwent three studies. The mean time between studies was 12.8 months (range, 1–41 months).
The studies were coded and dated. Expert readers might have recognized that there were two studies of the same patient (same code, different date), but the entry criteria were the same for both studies, and each study was interpreted independently. At the time of the baseline acquisition, 32 of the 97 studies were interpreted as having normal drainage from both kidneys. Because normal drainage excludes obstruction, furosemide acquisition was not performed. The experts classified 121 of the total 191 kidneys (63%) as not obstructed, 30 (16%) as having equivocal findings, and 40 (21%) as obstructed. Because a larger number of patients are generally needed to develop a robust model than to test the model once it is developed, we built the predictive model by randomly allocating 61 patient studies (120 kidneys) to the training set and allocating the other 36 patient studies (71 kidneys) to determining the predictive accuracy of the model.
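A random allocation of this kind can be sketched as follows. The sketch assumes that allocation was done at the study level, so that both kidneys of a study fall in the same set, which is what the study and kidney counts above imply. The function name, the integer study identifiers, and the seed are hypothetical; the source does not describe the randomization mechanism.

```python
import random

def split_studies(study_ids, n_train=61, seed=0):
    """Randomly allocate whole studies to training and validation sets.

    Splitting by study (not by kidney) keeps both kidneys of one study
    in the same set. The seed is an arbitrary, hypothetical choice.
    """
    ids = list(study_ids)
    rng = random.Random(seed)
    rng.shuffle(ids)
    return ids[:n_train], ids[n_train:]

# 97 studies labeled 0..96 for illustration only.
train, valid = split_studies(range(97))
print(len(train), len(valid))
```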
Acquisition Protocol and Data Processing
After injection of approximately 10 mCi of 99mTc-MAG3, a baseline acquisition was performed with the patient supine. All studies were processed with an updated in-house version of the QuantEM program (Emory University) developed specifically for use with 99mTc-MAG3. The program has been validated in a multicenter trial and generates specific quantitative parameters recommended for scan interpretation and calculating camera-based MAG3 clearance [8–11]. Background-subtracted whole-kidney and cortical curves and multiple quantitative curve parameters were generated (Table 1). Age and sex also were recorded.
A standard 40-mg dose of furosemide was administered at the beginning of a separate 20-minute acquisition. The dose of furosemide was occasionally increased to 60 or 80 mg if the MAG3 clearance was reduced or if the patient was known to have an elevated creatinine concentration. Technologists assigned a region of interest over the whole kidney and a region of interest limited to retained activity in the renal collecting system (pelvic region of interest) (Figs. 1A, 1B, 1C, and 1D). The time to half-maximum counts was based on a linear fit. Quantitative parameters (Figs. 1A, 1B, 1C, and 1D and Table 1) were automatically extracted.
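One way to read "based on a linear fit" is sketched below: a least-squares line is fitted to the postfurosemide counts, and the time at which that line falls to half of the maximum count is reported. This is an assumption made for illustration. The text does not state which samples are fitted or which maximum is used, and the sketch is not the QuantEM implementation. The function name and the sample curve are hypothetical.

```python
def half_time_linear(times_min, counts):
    """Time (min) at which a least-squares line through the curve falls to
    half of the maximum count; None if the fitted line is not decreasing."""
    n = len(times_min)
    mean_t = sum(times_min) / n
    mean_c = sum(counts) / n
    sxx = sum((t - mean_t) ** 2 for t in times_min)
    sxy = sum((t - mean_t) * (c - mean_c) for t, c in zip(times_min, counts))
    slope = sxy / sxx
    if slope >= 0:
        return None  # no washout: half-time is undefined for this fit
    intercept = mean_c - slope * mean_t
    return (max(counts) / 2 - intercept) / slope

# Hypothetical pelvic curve falling by 10 counts per minute from 100.
t_half = half_time_linear([0, 1, 2, 3, 4], [100, 90, 80, 70, 60])
print(t_half)
```

Because the line is extrapolated, the estimated half-time can lie beyond the end of the acquisition, as it does for this hypothetical curve.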
Statistical Analysis and Development of the Predictive Model
There were two outcomes of interest: whether experts diagnosed normal kidney drainage on the baseline scan (Y1 = 1 for normal drainage, which excludes obstruction, and Y1 = 0 otherwise) and whether experts diagnosed obstruction after furosemide administration (Y2 = 0 for no obstruction, Y2 = 1 for equivocal finding, and Y2 = 2 for obstruction). The expert consensus served as the reference standard. To predict normal drainage (Y1) on the baseline scan, we used a logistic regression model with the quantitative baseline scan variables collected in a vector (X1). That is,
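In standard notation, a logistic regression of Y1 on X1 takes the form below. The intercept symbol \alpha_1 is an assumption introduced here, because only the coefficient vector \beta_1 is named in the text; the block is a sketch of equation 1 under that assumption.

```latex
\operatorname{logit} P(Y_1 = 1 \mid X_1)
  = \log \frac{P(Y_1 = 1 \mid X_1)}{1 - P(Y_1 = 1 \mid X_1)}
  = \alpha_1 + \beta_1^{T} X_1
  \tag{1}
```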
The regression coefficient vector β1T indicates the strength of association between the quantitative scan variables and the outcome Y1. Because the diagnosis of obstruction was rated at three levels after furosemide administration, we used a proportional odds model for the baseline quantitative variables (X1) and furosemide acquisition scan variables (X2) as the independent variables. However, two important statistical issues had to be addressed. First, crucial second-stage furosemide predictor variables were not available when a patient did not receive furosemide, and the absence of these variables had to be considered in modeling of the probability of kidney obstruction. Second, whenever they diagnosed normal kidney drainage (Y1 = 1), the experts simultaneously inferred that the kidney was not obstructed (Y2 = 0). Because there was no variability in the outcome Y2 when Y1 = 1, we next modeled the outcome of Y2 given Y1 using the proportional odds regression model, as follows:
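A generic proportional odds form consistent with this description is sketched below. The symbols \alpha_{2j}, \beta_2, and \gamma_2 and the direction of the cumulative logit (Y2 ≥ j) are assumptions chosen to match the probabilities P(Y2 ≥ 1) and P(Y2 = 2) used for evaluation; the original notation may differ.

```latex
% Kidneys with normal baseline drainage are not obstructed by definition:
P(Y_2 = 0 \mid Y_1 = 1) = 1
% For kidneys that went on to furosemide (Y_1 = 0), with j = 1, 2:
\operatorname{logit} P(Y_2 \ge j \mid Y_1 = 0, X_1, X_2)
  = \alpha_{2j} + \beta_2^{T} X_1 + \gamma_2^{T} X_2,
  \qquad \alpha_{21} \ge \alpha_{22}
  \tag{2}
```

Under this form the covariate effects are shared by both cumulative logits, and only the intercepts differ between the two thresholds.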
When X2 was missing, we used a multiple imputation approach [14, 15], imputing missing values based on the conditional distribution of X2 given X1. The overall likelihood is described in Appendix 1. We maximized this likelihood to obtain parameter estimates. For predicting the outcome Y1, we used equation 1 to obtain the predicted value P(Y1 = 1). We used equation 2 to obtain the predicted probabilities for each category (no obstruction, equivocal finding, and obstruction) and, consequently, the predicted probabilities of any combined categories. To evaluate the predictive power with receiver operating characteristic (ROC) curves, we focused on two prediction categories: P(Y2 ≥ 1) and P(Y2 = 2), which gave the predicted probability of the combined equivocal–obstruction interpretation and the predicted probability of obstruction, respectively.
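The step from cumulative probabilities to per-category probabilities can be illustrated with a short sketch. It assumes a proportional odds model written on the scale P(Y2 ≥ j), with one linear predictor shared by both thresholds. The function names and the numeric intercepts and linear predictor are hypothetical and are not the fitted values from Table 4.

```python
import math

def expit(z):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-z))

def category_probs(eta, alpha1, alpha2):
    """Per-category probabilities from two cumulative logits.

    eta    : linear predictor shared by both thresholds (proportional odds)
    alpha1 : intercept for P(Y2 >= 1); alpha2 : intercept for P(Y2 = 2)
    Requires alpha1 >= alpha2 so that P(Y2 >= 1) >= P(Y2 = 2).
    """
    p_ge1 = expit(alpha1 + eta)
    p_eq2 = expit(alpha2 + eta)
    return {
        "no obstruction": 1.0 - p_ge1,
        "equivocal": p_ge1 - p_eq2,
        "obstruction": p_eq2,
    }

# Hypothetical values, for illustration only.
probs = category_probs(eta=0.5, alpha1=1.0, alpha2=-1.0)
print(probs)
```

The probability of any combined category, such as equivocal or obstruction, is the sum of the corresponding entries.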
For the logistic regression model (equation 1) and the proportional odds model (equation 2), we first used a backward regression procedure to determine important predictors needed for the modeling process. To ensure we did not eliminate any important variables, we fitted a series of models to determine whether the model could be improved by adding other variables. We used the Akaike information criterion to rank several competing multivariate models based on the selected important predictors. A smaller value of the Akaike information criterion suggests a more parsimonious model. In addition, Hosmer-Lemeshow tests were performed to assess the goodness of fit and calibration of the model.
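Ranking by the Akaike information criterion can be sketched as follows, using the standard definition AIC = 2k − 2 ln L, where k is the number of estimated parameters and L is the maximized likelihood. The model names, log-likelihoods, and parameter counts are hypothetical and are not values from this study.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln L (smaller is preferred)."""
    return 2 * n_params - 2 * log_likelihood

# Hypothetical competing models: (maximized log-likelihood, parameter count).
candidates = {
    "model A": (-40.0, 3),
    "model B": (-39.5, 5),
}

# Rank from smallest to largest AIC.
ranked = sorted(candidates, key=lambda name: aic(*candidates[name]))
print(ranked)
```

In this hypothetical case the small gain in log-likelihood of model B does not offset its two extra parameters, so model A ranks first.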
Results
To develop predictive models for the two outcomes (diagnosis of normal drainage [no obstruction] on the baseline study and diagnosis of obstruction after furosemide administration), we first carried out a variable selection procedure on the training dataset of 61 patient studies (120 kidneys) using the variables listed in Table 1. Demographic variables such as age and sex were initially considered but were not found significant. In the univariate model, when each variable was considered individually, all variables were found significantly predictive of normal drainage (values greater than 2.71) (Table 2). Higher values suggest greater importance of a particular variable, but because many of the variables are highly correlated, the relative importance can best be evaluated in a multivariate model. In the multivariate model, only three variables were identified as useful for predicting the probability of normal drainage on the baseline study: the ratio of the counts in the kidney at 19–20 minutes of the baseline acquisition to the maximum counts in the kidney (20 min/max count ratio), the ratio of the counts in the kidney after voiding to the maximum counts in the kidney (postvoid/max count ratio), and the time in minutes to reach maximum counts in the cortical region of interest (cortical Tmax). When these three variables were included in the multivariate model, adding other variables did not improve the goodness of fit. Stated differently, all of the remaining variables were associated with p > 0.05, indicating that their inclusion in the model did not improve the results (Table 2). Table 2 does not show p for 20 min/max count ratio, cortical Tmax, or postvoid/max count ratio because these variables had already been chosen in the multivariate model.
The probability of obstruction was based on results of analysis of the variables in Table 3. In the univariate model, when each variable was considered in isolation, approximately two thirds of the variables were predictive of the presence or absence of obstruction. Again, when the multivariate model was considered, only two variables—ratio of kidney counts in the last minute of furosemide acquisition to maximum kidney counts in the baseline acquisition (postfurosemide 20 min/max baseline count ratio) and time in minutes for counts in the renal pelvis to decrease 50% after furosemide administration (pelvic T1/2)— were singled out as significant for predicting the probability of diagnosing obstruction. The other variables in Table 3 correlated with the two selected variables, and their inclusion in the model did not improve the goodness of fit (all p > 0.05) (Table 3). In Table 3, p is not provided for postfurosemide 20 min/max baseline count ratio or pelvic T1/2 because these variables had already been chosen in the multivariate model. In summary, only a few variables were identified as significant for predicting both no obstruction and obstruction.
Table 4 shows the regression coefficients associated with two probabilistic models. The results correspond to the probability of finding normal drainage on the baseline scan (model 1) and the probability of the diagnosis of obstruction after furosemide administration (model 2). The variable postvoid/max ratio was the most significant variable (p < 0.0001) in determining normal kidney drainage and, consequently, the most significant variable in excluding the diagnosis of obstruction. The mean postvoid/max count ratio for the kidneys interpreted as having normal drainage was 0.18 (SD, 0.16), compared with 0.78 (SD, 0.28) for kidneys that had abnormal baseline drainage, might have been obstructed, and required a second acquisition after furosemide administration. With our models, it was possible to calculate the effect of critical variables. For example, if we assume that three kidneys had the same average value of 20 min/max count ratio and cortical Tmax but had different postvoid/max ratios of 0.18, 0.78, and 1.38, the results obtained with our predictive model suggest that the probabilities of normal drainage (no obstruction) in these three kidneys are 0.89, 0.04, and 0.001.
The postfurosemide 20 min/max baseline count ratio was the most significant variable (p < 0.0001) in determining the diagnosis of obstruction (Table 4, model 2). The mean values of this variable for the no obstruction and obstruction groups were 0.11 (SD, 0.12) and 0.92 (SD, 0.36), respectively. The effect of specific values of these variables also can be calculated with this model. For example, if we assume three kidneys have the same average pelvic T1/2 value but different postfurosemide 20 min/max baseline count ratios of 0.11, 0.92, and 1.90, the predicted probabilities for the diagnosis of obstruction would be 0.002, 0.268, and 0.994. Moreover, the probabilities of diagnosis of either obstruction or equivocal evidence of obstruction of a kidney would be 0.012, 0.665, and 0.999. The predictive equations are shown in the footnote to Table 4.
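The reported probabilities can be used for a quick arithmetic check of the proportional odds structure. Under a proportional odds model, the difference between the logit of P(obstruction or equivocal) and the logit of P(obstruction) should be the same for every kidney, apart from rounding of the published values. The sketch below uses only the six probabilities quoted above; the variable names are illustrative.

```python
import math

def logit(p):
    return math.log(p / (1.0 - p))

# Reported predicted probabilities for the three example kidneys.
p_obstruction = [0.002, 0.268, 0.994]    # P(Y2 = 2)
p_equiv_or_obs = [0.012, 0.665, 0.999]   # P(Y2 >= 1)

# Under proportional odds, this gap is constant across kidneys.
gaps = [logit(a) - logit(b) for a, b in zip(p_equiv_or_obs, p_obstruction)]

# The equivocal category is the difference of the two cumulative values.
p_equivocal = [round(a - b, 3) for a, b in zip(p_equiv_or_obs, p_obstruction)]

print([round(g, 2) for g in gaps])
print(p_equivocal)
```

The three gaps fall between roughly 1.7 and 1.8, which is consistent with a single shared offset given that the inputs are rounded to three decimals.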
ROC curves were constructed to determine the predictive accuracy of the regression equations. The area under the ROC curve (AUC) for normal kidney drainage on the baseline scan was very high for both the training set (AUC, 0.97; 95% CI, 0.94–0.99) and the validation set (AUC, 0.93; 95% CI, 0.89–0.98) (Figs. 3A and 3B).
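For readers unfamiliar with the AUC, the sketch below computes it as the proportion of positive-negative pairs in which the positive case receives the higher predicted probability, with ties counted as one half. This is the standard rank-based (Mann-Whitney) interpretation, not necessarily the software routine used in the study, and the predicted probabilities shown are hypothetical.

```python
def auc(scores_pos, scores_neg):
    """Rank-based AUC: fraction of (positive, negative) pairs ranked correctly."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half
    return wins / (len(scores_pos) * len(scores_neg))

# Hypothetical predicted probabilities of normal drainage.
normal = [0.9, 0.8, 0.4]          # kidneys the experts called normal
abnormal = [0.7, 0.3, 0.2, 0.1]   # kidneys the experts called abnormal
value = auc(normal, abnormal)
print(round(value, 3))
```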
The predictive equations result in probability as opposed to a binary (yes or no) clinical decision or a classification rule. To make the results useful to clinicians, we chose cutoff points in the training set for obtaining probabilities with high sensitivity and specificity in the range 0.83–0.95 (Table 5). For predicting normal kidney drainage on the baseline scan, the cutoff point was 0.5; for predicting the probability of renal obstruction versus no obstruction or equivocal finding, the cutoff point was 0.4, and for predicting the probability of combined obstruction and equivocal finding versus no renal obstruction, the cutoff point was 0.6. The sensitivity and specificity of the classification rules were independently evaluated for the validation data set with the same cutoff points, and the results are shown in Table 5. The validation set had high sensitivity and specificity for normal drainage (no obstruction) on the baseline scan (sensitivity, 0.85; specificity, 0.93). As expected, predicting obstruction was more difficult than predicting normal drainage, but the model had reasonably high sensitivity (case 1, 0.82; case 2, 0.78) and specificity (case 1, 0.83; case 2, 0.95) for the validation group.
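Turning a predicted probability into a classification rule and scoring it against the expert consensus can be sketched as follows. The cutoff of 0.5 is the one reported above for normal drainage on the baseline scan. The predicted probabilities and labels are hypothetical, and treating a probability equal to the cutoff as positive is an assumption.

```python
def sens_spec(probs, truth, cutoff):
    """Sensitivity and specificity of the rule 'positive if prob >= cutoff'.

    truth holds 1 for a positive reference reading and 0 otherwise.
    """
    tp = fn = tn = fp = 0
    for p, y in zip(probs, truth):
        predicted = 1 if p >= cutoff else 0
        if y == 1 and predicted == 1:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted == 0:
            tn += 1
        else:
            fp += 1
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical predicted probabilities of normal drainage and expert labels.
probs = [0.9, 0.7, 0.45, 0.6, 0.2, 0.1]
truth = [1, 1, 1, 0, 0, 0]
sensitivity, specificity = sens_spec(probs, truth, cutoff=0.5)
print(sensitivity, specificity)
```

Lowering the cutoff raises sensitivity at the expense of specificity, which is why the cutoffs were fixed in the training set before being applied to the validation set.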
Discussion
Our statistical model can be used to predict normal drainage on baseline scans and the presence or absence of obstruction on scans obtained after furosemide administration and to determine the critical variables for excluding or making the diagnosis of obstruction. The analysis would have been more straightforward if we had evaluated only scans obtained after furosemide administration, but this design would have eliminated a large segment of patients, would have resulted in a sample not representative of our patient population, might have biased our results, and would have limited the applicability of the predictive equation to future patients.
A contribution of our paper is that our predictive model was designed to apply to the general problem that arises when an experimental design itself introduces missing data. We encountered two nontrivial statistical challenges in analyzing the data. First, the experimental design itself introduced missing data. When it was determined that a patient had normal drainage (no obstruction), a second acquisition with furosemide was not performed, and consequently, the furosemide acquisition data were missing. Our proposed imputation approach of handling missing data led to reasonable results, was further validated with an independent data set, and could be applied to clinical studies in which the experimental design includes decision nodes that introduce missing data. Second, the kidney outcomes of the same individual could be correlated, and hierarchic modeling might have been more appropriate. However, we first fitted a hierarchic model (not reported here) and found that the odds ratio of the outcomes was almost 1. We therefore focused on a simpler approach, modeling the marginal distribution of kidney-specific outcomes while not accounting for the correlation between outcomes. This method still allowed consistent estimates of regression coefficients. Most important, it resulted in a model that can be easily interpreted in clinical practice.
Based on the univariate model (Table 3), the ratios of furosemide acquisition 20-minute counts to baseline 1- to 2-minute counts, furosemide acquisition 1- to 2-minute counts to maximum baseline counts, and furosemide acquisition 20-minute counts to maximum baseline counts all have comparable likelihood ratios. These variables are all highly correlated with each other, and use of a model containing any one of the variables and the pelvic time to half-maximum count would be reasonable. We chose the model containing the furosemide acquisition 20-minute to baseline maximum count ratio because it yielded the lowest value for the Akaike information criterion, suggesting that it is the most parsimonious model fitting the data. Based on the importance of the postvoid to maximum count ratio on the baseline scan and the importance of the furosemide acquisition 20-minute to baseline maximum counts ratio, the ratio of postvoid kidney counts after furosemide administration to maximum counts at baseline is also likely to be a robust predictor of obstruction or absence of obstruction and should be evaluated in future studies.
The time to half-maximum counts after furosemide administration is commonly used to help make or exclude the diagnosis of obstruction, but this time can be affected by dehydration, size and compliance of the renal pelvis, bladder distention, patient position, dose of furosemide, technical factors such as region of interest assignment (whole kidney, cortical parenchyma, pelvis), and the algorithm used to calculate the time to half-maximum counts. In particular, owing to both reduced clearance and prolonged transit time, time to half-maximum counts is prolonged in kidneys with impaired function. As renal function deteriorates, more radiotracer remains in circulation and tracer continuing to enter the kidney compromises measurement of tracer exiting the kidney. This problem can be ameliorated by delaying the administration of furosemide until the entire dilated collecting system is filled with tracer or by assigning a region of interest over the dilated collecting system [21, 22]. In practice, the times to half-maximum counts for the whole kidney and pelvic regions of interest are quite similar because of the dominance of counts in the collecting system, but there can be major differences in selected cases (unpublished data). In summary, the pelvic time to half-maximum counts is a better index of emptying from the collecting system because there is no confounding parenchymal component.
Another set of variables relates the postvoid and prevoid kidney counts to the 1- to 2-minute or 2- to 3-minute baseline counts [23, 24]. The ratios of postvoid to 1- to 2-minute counts and of furosemide acquisition counts at 20 minutes to baseline counts at 1–2 minutes were clearly important variables based on the results of univariate analysis. They did not, however, have quite the same predictive value as ratios based on maximum counts.
We evaluated a diuresis renography protocol based on the approach recommended by an international consensus panel; our results cannot be extrapolated to other protocols. Our studies also were confined to adults and patients with native kidneys, although we expect that the interpretative criteria would also apply to renal transplant patients with suspected obstruction. Patients with renal grafts, ileal conduits, and neobladders were not included in the database because their numbers were too few to build and test a predictive model. We assigned 61 studies (120 kidneys) to the training set and 36 studies (71 kidneys) to the validation set because, in general, more studies are required to develop a robust model than to validate it. Our results from the validation data set showed reasonably high estimates for sensitivity and specificity. The variability and CI of these estimates would have been smaller with a larger validation set.
Ideally, ROC analysis should have a reference standard independent of the method under evaluation. It could be argued that a Whitaker test or clinical outcome would be a better standard, but Whitaker tests are rarely performed, and clinical outcome is a compromised reference standard because the clinical decision to intervene or not to intervene is often contingent on the interpretation of the diuresis renographic study. For our purposes, the absence of an independent reference standard is not an issue because our goal was to develop a predictive statistical model that leads to study interpretations matching those of experts. This goal was based on the widely accepted assumption that radiologists who specialize in specific areas have more expertise in those areas than do general radiologists, who have a much broader and diverse practice.
Conclusion
The statistical predictive system performs well in assessing normal kidney drainage on baseline scans and in identifying the presence or absence of obstruction of a kidney after administration of furosemide. Multivariate analysis showed that the ratio of 20-minute to maximum counts, the cortical time to maximum counts (cortical Tmax), and the ratio of postvoid to maximum counts were critical variables for excluding obstruction on baseline scans. For the diagnosis of obstruction, the ratio of furosemide acquisition counts at 20 minutes to baseline maximum counts and pelvic time to half-maximum counts were the critical variables. Both of these variables are easy to measure and can be useful in interpretation of diuretic MAG3 renal scans. In addition to identifying the most important variables for interpreting MAG3 scans of patients with suspected obstruction, predictive model systems have potential for educating trainees and for assisting radiologists in clinical practice in interpretation of diuretic MAG3 scans.
Supported by National Institutes of Health grant R01-EB008838, the National Institute of Biomedical Imaging and Bioengineering, the National Institute of Diabetes and Digestive and Kidney Diseases, and a URC grant from Emory University.
Cerqueira MD, Weissman NJ, Dilsizian V, et al. Standardized myocardial segmentation and nomenclature for tomographic imaging of the heart. AHA Writing Group on Myocardial Segmentation and Registration for Cardiac Imaging. Circulation 2002; 105:539–542
Hunsche A. Value of quantitative data in the interpretation of diuresis renography for suspected urinary tract obstruction (thesis). Porto Alegre, Rio Grande do Sul, Brazil: Federal University of Rio Grande do Sul, 2006