Review
1 Department of Radiology, Medical University of South Carolina, 169 Ashley
Ave., Rm. 297, PO Box 250322, Charleston, SC 29425.
2 Department of Internal Medicine, Division of Pulmonary and Critical Care
Medicine, Medical University of South Carolina, Charleston, SC.
Received July 19, 2007;
accepted after revision September 25, 2007.
CME
Abstract
CONCLUSION. Although there are hopeful developments in lung cancer screening, a number of unresolved issues must be answered before adopting screening on a large scale. Currently no data exist to suggest that lung cancer screening with CT will result in a decrease in lung cancer mortality.
Keywords: chest imaging, CT, lung cancer, screening
The idea of screening for lung cancer goes back to the 1950s and the Philadelphia Pulmonary Neoplasm Research Project, which performed periodic photofluorogram screening on more than 6,000 male volunteers [7]. Although survival was slightly better in the screening-detected cancers than in the symptom-detected cancers, no difference was seen in the outcome [7]. In the 1970s and 1980s, several randomized controlled trials of chest radiography were performed with similar results: More early-stage cancers were found in the screened group, more surgical procedures were performed, but lung cancer–specific mortality was similar, leading to the conclusion that screening was not helpful in reducing lung cancer mortality [8, 9]. Interest in screening was rekindled in the mid to late 1990s with the reports of high rates of early-stage cancers found on CT screening [10–14]. Ultimately, to show the effectiveness of screening, not only do more early-stage cancers need to be found in the screened group, but there must also be fewer late-stage cancers (i.e., a stage shift) [15]. Because these trials did not contain a control group, they were also insufficient to determine whether CT screening could decrease the lung cancer mortality rate.
These promising results led to the funding of the National Lung Screening Trial (NLST) by the National Cancer Institute [16]. The NLST is a randomized controlled trial comparing CT screening with chest radiography, with lung cancer mortality as the end point. More than 50,000 participants were enrolled across the United States. Screening ended in 2006, and subjects are currently in follow-up, with final results expected around 2010. A second randomized trial of CT screening, the NELSON trial, has no screening as the comparison arm and has accrued 20,000 participants in Belgium and Denmark [17].
Although the definitive trial methodology for determining the benefit of screening is ongoing, the controversy over screening remains in the public domain. This culminated in the publication of two major scientific papers that reached vastly different conclusions [18, 19] and two policy briefs taking a more philosophic view of screening in the context of social responsibility [20, 21]. In this article, we summarize the cases for and against lung cancer screening in the context of the 10 criteria for an effective screening test [22] (Table 1). We end with two questions: If lung cancer screening is effective in reducing the lung cancer mortality rate, will it be cost-effective? And will the target population actually participate in a lung cancer screening regimen?
Biases Inherent in Screening Studies
Lead-time bias results from the earlier detection of a disease, which leads to an earlier diagnosis and an apparent survival advantage but does not truly affect the date of death. In other words, survival increases without an effect on the disease-specific mortality. Length-time bias relates to the relative aggressiveness of tumors. In a screened group, more indolent tumors are more likely to be detected, whereas aggressive tumors are more likely to be detected by symptoms. This disproportionally assigns more indolent disease to the intervention group and results in the appearance of a survival benefit. Overdiagnosis bias, the most extreme form of length-time bias, occurs when the disease is detected and thought to be "cured." However, if the disease had not been detected at all, it would never have caused symptoms. These tumors provide the illusion of a cure where none was needed. The extent to which overdiagnosis exists in any screening test can profoundly influence the apparent success of screening.
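Lead-time bias can be illustrated with a small worked example. The sketch below uses a single hypothetical patient whose ages at diagnosis and death are illustrative assumptions, not data from any trial; it shows only that moving the date of diagnosis earlier lengthens measured survival while leaving the date of death unchanged.

```python
def survival_from_diagnosis(age_at_diagnosis: float, age_at_death: float) -> float:
    """Years lived after diagnosis, which is what a survival statistic measures."""
    return age_at_death - age_at_diagnosis


# Hypothetical patient: all three ages are illustrative assumptions.
AGE_AT_DEATH = 70.0           # assumed unchanged by screening in this scenario
AGE_SYMPTOM_DIAGNOSIS = 68.0  # diagnosis once symptoms appear
AGE_SCREEN_DIAGNOSIS = 64.0   # earlier diagnosis by screening

symptom_survival = survival_from_diagnosis(AGE_SYMPTOM_DIAGNOSIS, AGE_AT_DEATH)
screen_survival = survival_from_diagnosis(AGE_SCREEN_DIAGNOSIS, AGE_AT_DEATH)
lead_time = AGE_SYMPTOM_DIAGNOSIS - AGE_SCREEN_DIAGNOSIS

# The screened patient counts as a 5-year survivor; the unscreened one does not,
# even though both die at the same age.
screen_is_5yr_survivor = screen_survival >= 5
symptom_is_5yr_survivor = symptom_survival >= 5

print(f"Survival after symptom-detected diagnosis: {symptom_survival:.0f} years")
print(f"Survival after screening-detected diagnosis: {screen_survival:.0f} years")
print(f"Lead time: {lead_time:.0f} years; age at death in both cases: {AGE_AT_DEATH:.0f}")
```

In this scenario the whole of the apparent survival gain equals the lead time, which is why disease-specific mortality rather than survival is used as the end point.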
This is not to say, however, that inherent biases do not occur in randomized controlled trials (RCTs). In a prospective RCT, the intervention arm (in the case of lung cancer, CT) is compared with a control arm that consists of usual care. In each case, participants are followed up for a number of years after the intervention, with reduction in disease-specific mortality as the end point of the follow-up period. Although randomization should lead to both arms of the trial containing similar populations, crossover of patients from the control arm to the intervention arm (e.g., in the NLST, a subject in the radiography arm undergoes CT outside the trial) can confound the results, potentially reducing the disease-specific mortality of the control arm. The feasibility trial for the NLST showed a 0.9% crossover rate after the prevalence screening and 1.3% after year 1 [23]. The NLST is powered to account for contamination of the chest radiography (control) arm.
Two other biases are inherent in RCTs that may influence study outcome: sticky diagnosis and slippery linkage [24]. "Sticky diagnosis" refers to the increased likelihood of disease detection in the screened population. This means that the target disease has a higher likelihood of being listed as the cause of death even if it is not truly related. Thus, the apparent disease-specific mortality of the screened disease will be artificially increased. "Slippery linkage" refers to the possibility that the downstream results of screening may lead to mortality without being attributed to the target disease itself. For example, a screening-detected nodule undergoes wedge resection and ultimately proves to be benign. If the subject died from complications related to the procedure, the death would not be attributed to lung cancer although the screening process directly contributed to death. Because death is not assigned to the target disease, the value of screening may be overestimated. For this reason, a corollary end point to disease-specific mortality is all-cause mortality [24].
Screening for Lung Cancer: The Case for Optimism
The experience in Japan comes predominantly from three trials: Anti-Lung Cancer Association (ALCA) [13], Hitachi Employee's Health Insurance Group (Hitachi) [12], and Matsumoto Research Center (Matsumoto) [14, 27]. All of these studies used 10-mm collimation for the CT scans, and two studies, ALCA and Matsumoto, also included sputum cytology in the screening regimen. In the ALCA study, screening was performed at 6-month intervals. Of 15,050 total participants, 72 lung cancers were detected during the prevalence screening (0.4%), 57 of which were stage IA (79.1%). Among the three trials, a total of 21,762 incidence (annual) screenings have been reported, with 60 (0.2%) new cancers detected. Fifty (83.3%) of these were stage IA. Note that the criteria for enrollment were less stringent in Japan, with screening available at a younger age, usually 40 years, and a history of smoking was not a requirement for participation. The percentage of nonsmokers in these trials was 14–53%. Thus, although the numbers are encouraging, it is unclear whether the success of the Japanese trials can be generalized to usual lung cancer screening cohorts.
The ELCAP trial was the first major lung cancer CT screening trial in the United States. It enrolled 1,000 symptom-free individuals 60 years and older who had at least a 10-pack-year history of smoking [10]. The prevalence screening revealed 233 noncalcified nodules and 27 lung cancers, of which 23 (85.1%) were stage I. During incidence screening, seven additional lung cancers were identified by screening: five stage I and two by symptoms, both advanced stage cancers [11]. The results of ELCAP combined with data from Japan led to the performance of I-ELCAP, a large multicenter, multinational nonrandomized trial [28].
The I-ELCAP is an international consortium of institutions offering lung cancer screening under a similar technical protocol with standardized definitions and central pathology review [28]. However, the criteria for screening are derived at each local institution. Although the study specifies that subjects should undergo prevalence screening and at least one annual screening, interim cancers should be documented, and diagnosed lung cancers should be followed up for 10 years, the study has not specifically tracked subjects not diagnosed with lung cancer.
Recently, the group reported on more than 31,000 prevalence screening examinations and more than 27,000 annual screening examinations resulting in the diagnosis of 405 prevalence cancers, five interim cancers, and 74 cancers at annual screening. From these subjects, the group determined that 85% had clinical stage I lung cancer, resulting in a 10-year survival rate of 88% for the stage I group. The percentages of stage I cancers and survival were much higher than traditionally reported for lung cancer. From this, the authors concluded that 80% of lung cancer deaths could be prevented by screening [19]. A similarly high rate of stage I cancers was found in the New York group of I-ELCAP [29].
A corollary study published by the same group also provides optimism based on tumor size [30]. Reviewing 436 cases of non–small cell lung cancer (NSCLC) diagnosed during the study, the percentage of N0M0 cases was 85% overall and 91%, 83%, 68%, and 55% for tumors < 15, 16–25, 26–35, and > 36 mm, respectively. Overall, statistical significance was seen in the relationship of tumor size to tumor stage at the smaller sizes. The trend for a size–stage relationship was most pronounced for solid nodules, whereas most part-solid and nonsolid nodules were N0M0 (97%) regardless of size.
Screening for Lung Cancer: The Case for Skepticism
From Tables 2 and 3, it can be seen that the average yield for lung cancer at the prevalence screening ranges from 0.3% to 2.3% and depends on the characteristics of the screened population. In general, the yield drops off 2–3 times with incidence screening. The rate of detection of stage I cancers ranges from 50% to 80% at prevalence screening. During follow-up screening, anywhere from 25% to 85% of detected cancers are stage I. Pooling U.S. data, only 60% of cancers detected after the prevalence screening were stage I. The discrepancy becomes clearer when viewed from the standpoint of the U.S. ELCAP [11, 28] data (23/27, 85% stage I) versus Mayo and NLST feasibility data (19/43, 44% stage I) [23, 35]. The source of and reasons for the discrepancy in the stage of prevalence cancers in the United States remain unexplored.
Furthermore, the issues of overdiagnosis and accuracy have not been adequately addressed in the context of CT screening (criterion 3). In the study by Bach et al. [18], lung cancer diagnoses in three nonrandomized trials were compared with a previously validated model of lung cancer risk. Overall, there were 144 cases diagnosed, compared with 45 expected, and 109 resections with 11 expected without a decline in either advanced lung cancers or lung cancer deaths. This certainly raises the question of whether the added diagnoses will affect lung cancer mortality, and if not, the added 100 cases may be seen as overdiagnosis.
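The scale of the excess reported by Bach et al. is easier to see as observed-to-expected ratios. The short calculation below is a sketch that uses only the four counts quoted above; the function and variable names are illustrative and do not come from the original analysis.

```python
def observed_to_expected(observed: int, expected: int) -> float:
    """Ratio of observed events to the number predicted by the risk model."""
    return observed / expected


# Counts quoted in the text for the three nonrandomized trials [18].
DIAGNOSES_OBSERVED, DIAGNOSES_EXPECTED = 144, 45
RESECTIONS_OBSERVED, RESECTIONS_EXPECTED = 109, 11

diagnosis_ratio = observed_to_expected(DIAGNOSES_OBSERVED, DIAGNOSES_EXPECTED)
resection_ratio = observed_to_expected(RESECTIONS_OBSERVED, RESECTIONS_EXPECTED)
excess_diagnoses = DIAGNOSES_OBSERVED - DIAGNOSES_EXPECTED

print(f"Diagnoses: {diagnosis_ratio:.1f} times expected")
print(f"Resections: {resection_ratio:.1f} times expected")
print(f"Excess diagnoses: {excess_diagnoses}")
```

Roughly three times as many diagnoses and ten times as many resections as predicted were recorded, and the difference of 99 diagnoses corresponds to the approximately 100 added cases referred to above.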
The argument against overdiagnosis is based on the prevailing belief that untreated lung cancer is a uniformly fatal disease. However, this belief arises from the knowledge of deaths related to lung cancer without an adequate accounting of deaths in individuals with, but unrelated to, undiagnosed lung cancer and from post-hoc analysis of tumor growth in chest radiography screening trials [36]. Other studies of screening-detected cancers on chest radiography attempting to refute the concept of overdiagnosis are hampered by inadequate staging and the definition of lung cancer deaths [37, 38]. Thus, it is not clear whether these tumors were truly early stage or whether they were in fact the primary cause of death. The analysis by Flehinger et al. [37] included 21 T1 and 24 T2 tumors; of the 45 cancers, 28 were medically inoperable, and the criteria for lung cancer death were not defined. In the study by Sobue et al. [38], only 10% of subjects underwent CT, 70% did not have evidence of disease progression, and yet 80% of deaths were attributed to lung cancer.
Further evidence of overdiagnosis bias comes from the growth rate of screening-detected cancers. The volume doubling time for lung cancer has generally been considered to be between 30 and 365 days, usually less than 120 days, with longer doubling time or lack of growth at 2 years assumed to be relative signs of benignity. With lung cancer screening, two new types of nodules, nonsolid and part-solid, have been described. One series of 19 nonsolid nodules from Japan that were followed up for more than 2 years resulted in 10 operative procedures for five benign nodules, four broncho-alveolar cell carcinomas, and one adenocarcinoma [39]. One malignancy did not show growth over a 2-year interval. Lung cancers diagnosed during the Mayo CT screening trial consisted of 50% adenocarcinomas with median and mean doubling times of 343 and 746 days, respectively [40]. Nonadenocarcinoma histology had mean doubling times of 34–100 days. Overall, 27% of cases in that series had a doubling time of more than 400 days and made up 25% of all stage IA cancers. A similar result was obtained in Japan, where 27 of 61 (44%) screening-detected cancers had doubling times of more than 450 days. Ultimately, if the rate of early-stage cancer diminishes on incidence screening, it will provide further evidence of overdiagnosis at prevalence screening.
The accuracy of CT for the detection of lung cancer is a complex issue (criterion 4). Although clearly its sensitivity exceeds that of chest radiography for both nodule detection and cancer detection, the specificity is quite poor given that noncalcified nodules should be viewed as potential cancer. Thus, CT accuracy is relatively poor as a result of the high number of false-positive examinations. In most series, the rate of nodule detection is 20–50%, with more than 90% of the detected noncalcified nodules ultimately shown to be benign.
Many investigators are exploring which noncalcified nodules are clearly benign and therefore do not require additional follow-up. For some, these advancements are viewed as methods to improve the efficacy of lung cancer screening. For example, Libby et al. [41] showed that in a small subset of 41 nodules, 29% had decreased in size or resolved at the 2-month follow-up CT. Markowitz et al. [42] divided nodules into indeterminate and suspicious categories and found only three cancers in 727 nodules deemed indeterminate over 18 months of follow-up, whereas 30 of 102 suspicious nodules were proven to be malignant. Although this confirms the concept that many small nodules can be safely followed up, perhaps without interval imaging until the next annual screening, it means that periodic follow-ups will be required and nodule detection may well have a quality-of-life impact on individual subjects.
Others have found that growth analysis may be deceiving. Jennings et al. [43] reviewed 149 stage I lung cancers that had periodic follow-up. Thirteen of these proven cancers, ranging in size from 10 to 20 mm, did not show interval growth and, surprisingly, four decreased in volume by at least 25% [43]. Therefore, even a decrease in the size of a nodule must be viewed with caution. A consensus statement by the Fleischner Society recommends that nodules > 4 mm in high-risk individuals should be followed up at 6-month intervals for at least 2 years [44].
A reasonable corollary question is "Is CT-based detection early enough?" (criterion 5). Because it is clear that tumor biology, as evidenced by volume doubling time, is highly variable, it is reasonable to think that other factors are highly variable, including the ability to metastasize, the ability of immune surveillance to ward off metastatic disease, and the vulnerable sites for metastatic disease. For instance, circulating tumor cells can be detected in peripheral blood by reverse transcriptase-polymerase chain reaction in up to 60% of resectable lung cancers and 38% of pathologic stage I lung cancers [45]. If host factors are most important, then small size per se will not be as critical for altering lung cancer mortality. If one looks at a tumor from the initial carcinogenic mutation until the time it reaches 1 mm, it will have undergone 20 volume doublings. An additional 10 volume doublings will result in a tumor size of 10 mm, and an additional four volume doublings will result in a size of 25 mm. Assuming a constant growth rate for a doubling time of 60 days, a tumor will reach the size of 1, 10, and 25 mm in 1,200, 1,800, and 2,040 days from the initial mutation. For a doubling time of 365 days, those times increase to 20, 30, and 34 years. If CT detects cancer at an average size of 10 mm and chest radiography at 25 mm, the critical period in the disease is between volume doublings 30 and 34, and screening will truly benefit only those whose tumors would have metastasized between these two time points. From our example, the window of detection benefit would be from 240 days for rapid-growing tumors to 4 years for relatively slow-growing tumors.
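The arithmetic behind this example can be checked with a short calculation. The sketch below assumes, as the text does, a spherical tumor growing at a constant volume doubling time, so that each tenfold increase in diameter corresponds to about 10 volume doublings; the function names are illustrative.

```python
import math


def doublings_between(d_start_mm: float, d_end_mm: float) -> float:
    """Volume doublings needed for a sphere to grow from one diameter to another."""
    # Volume scales with diameter cubed, so the count is 3 * log2(diameter ratio).
    return 3 * math.log2(d_end_mm / d_start_mm)


def days_to_reach(total_doublings: int, doubling_time_days: float) -> float:
    """Elapsed days since the initial mutation at a constant doubling time."""
    return total_doublings * doubling_time_days


# Cumulative doublings quoted in the text for tumors of 1, 10, and 25 mm.
CUMULATIVE_DOUBLINGS = {1: 20, 10: 30, 25: 34}

timelines = {}
windows = {}
for doubling_time in (60, 365):
    timelines[doubling_time] = {
        size: days_to_reach(n, doubling_time)
        for size, n in CUMULATIVE_DOUBLINGS.items()
    }
    # Window between CT detection (10 mm) and radiographic detection (25 mm).
    windows[doubling_time] = timelines[doubling_time][25] - timelines[doubling_time][10]
    print(doubling_time, timelines[doubling_time], windows[doubling_time])
```

Under these assumptions, growth from 10 to 25 mm takes about four doublings whatever the growth rate, so the detection window is simply four doubling times: 240 days at a 60-day doubling time and 1,460 days (4 years) at a 365-day doubling time.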
In general, the radiation dose risk–benefit ratio of CT favors performing CT in symptomatic individuals; however, some are concerned that this will not be the case for lung cancer screening (criterion 6). Unlike the breast, the lung remains a radiosensitive organ well into the sixth and seventh decades of life, and thus has the potential for developing a radiation-induced cancer. Brenner [46] has suggested that annual CT screening resulting in a lung organ dose of 5 mGy from age 50 to age 75 would increase the number of expected lung cancers by 5%, so the mortality benefit would need to exceed 5%. These risks could be mitigated by starting screening later or by increasing time between screenings. A risk–benefit analysis performed on the Italung screening trial [47] concluded that there was benefit based on an expected mortality benefit of screening of 20–30% and that excess mortality from screening would be closer to 1%. The discrepancy may in part be due to their calculations being based on total body dose rather than organ-specific dose. Although this calculation shows that the risk of radiation should not be a significant issue if the mortality benefit from screening is statistically significant, it also suggests that, in the absence of a mortality benefit, screening may not be neutral, but harmful.
The therapy for early-stage lung cancer, although good, is not benign (criterion 10). In the American College of Surgeons Oncology Group Z0030 trial [48], the mortality rate for lobectomy by experienced thoracic surgeons was 1% and complications occurred in 37%. However, in the community at large, perioperative mortality (within 30 days) was 4.5–7.6% for lobectomy, depending on surgeon expertise [49, 50], and 4.9% for a wedge resection [50], the operation performed if a nodule was shown to be benign at surgery. Although it has been suggested that with careful CT follow-up, PET, and transthoracic needle biopsy, benign nodules should not ever be resected, evidence shows that even in experienced hands, 20% of surgeries performed on screening-detected nodules are for benign disease [51].
Marshall et al. [52] found similar estimates, with a baseline finding of $5,940 per life year gained. After the model was run over scenarios that varied the prevalence of cancer and the estimate of lead-time bias, the cost-effectiveness estimate increased to a high of $58,183 per life year gained [52]. In a follow-up study, Marshall et al. [54] explored the costs of annual low-dose CT screening examined by modeling quality-adjusted life years (QALY). Her group determined that the incremental cost-effectiveness per QALY was $19,533, with a low of $10,800 and a high of $62,000 if excessive lead-time or overdiagnosis bias was present. Although public policy in the United States has never used these estimates to decide which health care interventions should be offered to the population, most cost-effectiveness ratios of accepted tests and therapies in medicine cluster in the range of $10,000–100,000 per QALY [55].
The previous findings regarding cost-effectiveness contrast markedly with those of Mahadevia et al. [56], who stratified individuals by smoking status (continuing, quitting at the time of the first screening, and former). The incremental cost-effectiveness per QALY gained was $116,300 for current smokers, $558,600 for quitting smokers, and $2,322,700 for former smokers. On the basis of these data, it was suggested that periodic screening of about half of the 50 million smokers in the United States could cost approximately $116 billion per year. Note that half the cancers detected in the Mayo screening trial were in former smokers [35]. The model was sensitive to the degree of stage shift (i.e., an increase in the number of patients diagnosed with early-stage, presumably curable lung cancers in the screened group and a decrease in advanced cancers), adherence to screening, the degree of length-time or overdiagnosis bias, the cost of CT, and anxiety about indeterminate nodules.
How can one have cost-efficacy estimates that vary from a low of $2,500 to a high of $2,322,700? The differences appear to be in how models are constructed and what variables are counted. The study with the highest estimates examined costs over a longer time horizon and considered numerous variables in its baseline model that the other cost-effectiveness studies elected to omit [56]. It can be argued that this effort is premature and that one should prove efficacy before attempting to assess cost-effectiveness.
However, unlike screening programs for the other common cancers, lung cancer screening could be the first screening program of its size that targets a population with a specific poor health habit, namely cigarette smoking. One aspect of the screening debate that lacks information is whether the target group for screening (smokers) has different attitudes about the value of screening. In a recent nationwide telephone survey, attitudes about screening for lung cancer were ascertained among current smokers, former smokers, and nonsmokers [58]. This study of 2,001 persons found that smokers were significantly more likely than nonsmokers to be male, nonwhite, and less educated; more likely to report poor health status or having had cancer; and less likely to be able to identify a usual source of health care. Compared with nonsmokers, smokers were less likely to believe that early detection would result in a good chance of survival and were less likely to be willing to consider CT screening for lung cancer. Surprisingly, only half of the current smokers surveyed would opt for surgical resection of a screening-diagnosed cancer.
That study has several important implications. First, the demographic characteristics of current smokers are different from those of former smokers or nonsmokers, and the current smokers are less likely to have access to health care. Second, smokers have markedly different beliefs about their risk of cancer, their understanding of screening test characteristics, and the benefits of cancer treatment when the cancer is detected earlier. Third, smokers are less willing to pay for a screening test or undergo appropriate treatment (in this case, surgery) for a screening-detected cancer. Finally, smokers appear significantly less likely than their nonsmoking counterparts to consider CT screening for lung cancer. Combined findings in that study suggest that there may be substantial obstacles to successful implementation of a mass screening program for lung cancer directed toward cigarette smokers.
If screening for lung cancer is found to be efficacious, innovative programs will need to be developed to ensure that those at the highest risk for developing lung cancer (i.e., smokers) participate in the screening program.