Modelling and Inference in Screening: Exemplification with the Faecal Occult Blood Test

The projections for future growth in the number of new patients with colorectal cancer in most parts of the world remain unfavorable. When we consider the substantial morbidity and mortality that accompanies the disease, the acute need for improvements and better solutions in patient care becomes evident. This volume, organized in five sections, represents a synopsis of the significant efforts from scientists, clinicians and investigators towards finding improvements in different patient care aspects including nutrition, diagnostic approaches, treatment strategies with the addition of some novel therapeutic approaches, and prevention. For scientists involved in investigations that explore fundamental cellular events in colorectal cancer, this volume provides a framework for translational integration of cell biological and clinical information. Clinicians as well as other healthcare professionals involved in patient management for colorectal cancer will find this volume useful.

screening saves lives (Colon Cancer Prevention Project, 2011). Prevention efforts in the population require reliable estimates of the sensitivity of the test, the sojourn time of the disease, the transition probabilities from the disease-free state to the preclinical state, the lead time of the disease, and the indirect effects of screening per se on the estimated rates of the disease. The aim of this chapter is to introduce the concept of probability modelling in colorectal cancer (CRC) screening, and the statistical methods developed by the authors in this area (Wu et al., 2005, 2007, 2009a, 2009b). We will estimate these essential components from a population-based perspective.

In section 2, we provide the definition, model, methods and application of the essential parameters needed to estimate indicators of cancer screening. In section 3, we provide the methods and application for estimating the distribution of the lead time in cancer screening. In section 4, we provide the definition, methods and application for evaluating the long-term screening outcomes in CRC. Finally, conclusions and recommendations for future research are provided in section 5.

We will focus on one particular test, the faecal occult blood test (FOBT). A positive FOBT has been used as a sign of colon cancer, given that tumours tend to bleed and blood in the stool can be detected using this test. We will apply our methods to the Minnesota Colorectal Cancer Control Study (MCCCS) (Mandel et al., 1999), to inform readers about the benefits of probability modelling in colorectal cancer screening using FOBT, as well as the resulting recommendations for the test. The MCCCS was carried out between 1976 and 1982 in Minnesota, U.S.A. (Mandel et al., 1999). Approximately 46,000 subjects were randomized to receive one of the following: five annual FOBT screenings, three biennial FOBT screenings, or no screening (usual care at the time of the study).
Each screening cycle consisted of six hemoccult slides (Hemoccult®, Beckman Coulter, Palo Alto, California); about 83% of slides were re-hydrated. If any of the slides was positive, then the screen was considered positive and a definitive follow-up exam was done, including colonoscopy (Mandel et al., 1999). Due to a lower than expected death rate among the usual care group, the investigators resumed screening between 1986 and 1992. We restricted this analysis to the annual group and to the original five screenings.

Sensitivity, sojourn time and transition probability in colon cancer screening
We assume that the disease develops by progressing through three states, denoted by $S_0 \to S_p \to S_c$. $S_0$ represents the disease-free state. $S_p$ represents the preclinical disease state, in which an asymptomatic individual unknowingly has disease that the screening exam can detect. Similarly, $S_c$ represents the clinical state, when the disease manifests itself in clinical symptoms. Sensitivity is the probability that the screening exam is positive given that the individual is in the preclinical state $S_p$. The sensitivity cannot be easily estimated from data collected during screening, but can be estimated using probability modelling (Wu et al., 2005, 2009b). We exemplify the rationale for this issue using Table 1. Let us assume the data in Table 1 …

Sojourn time refers to the time from when the disease first develops until the manifestation of clinical symptoms, that is, the length of time spent in the preclinical state. Individuals diagnosed with cancer by screening exams are treated immediately; hence the onset of the clinical state $S_c$ is not observable for them. For individuals diagnosed with cancer between screenings (the interval cases), the onset of the clinical state is available, but the onset of the preclinical state is still unknown. Therefore, estimation of the sojourn time distribution is difficult from data collected in screening studies. However, the sojourn time distribution can be estimated under model assumptions; the preclinical phase of colorectal cancer may last more than 5 years (American Cancer Society, 2011; Prevost et al., 1998). The transition probability into the preclinical state is the probability density function (PDF) of making a transition from the disease-free state to the preclinical state. For CRC it changes continuously with age (Wu et al., 2009a), and it is difficult to estimate without proper modelling.
These three parameters are the key parameters for the estimation of other important indicators in cancer screening, and they cannot be easily estimated from data. We will briefly review the age-dependent likelihood method that we used to estimate these three parameters, and provide the key results using the MCCCS data (Wu et al., 2005, 2009b).

Model and method
Consider a cohort of initially asymptomatic individuals who enrol in a screening program. The sensitivity is $\beta(t)$, where $t$ is the individual's age at the screening exam. The probability density function (PDF) of making a transition from $S_0$ to $S_p$ at age $t$ is $w(t)$. Let $q(x)$ denote the PDF of the sojourn time in $S_p$, with $Q(x)$ the corresponding survivor function; the likelihood function is given in equation 1. To facilitate the understanding of this likelihood function, we describe it in terms of the MCCCS age groups. In the MCCCS, the initial age of participants varied from 28 to 90 years among men and from 36 to 93 years among women, so this is the range of $t_0$. Because the MCCCS required five annual FOBT screenings, $K = 5$. $D_{k,t_0}$ is the probability that an individual will be diagnosed at the $k$-th scheduled exam given that she/he is in $S_p$ (see equations 2 and 3), and $I_{k,t_0}$ is the probability of being an incident case in the $k$-th screening interval (see equation 4).
… in the above formulae. We modelled the age effect $t$ and the time duration $x$ in the preclinical state using the parametric model stated in equation 5. Wu et al. (2005, 2009b) give detailed justifications for how these age-effect functions were chosen. The models in equations 1-5 were fitted using programs written in C/C++ (Silicon Graphics, 2003; Stroustrup, 2011), and we applied the likelihood separately to men and women in the MCCCS. Markov chain Monte Carlo (MCMC) was used to generate random samples from the joint posterior distribution of the parameters in the likelihood for Bayesian inference (Wu et al., 2005, 2009b). The posterior distribution within the MCMC was partitioned into four sub-chains, i.e. sampling the posterior distribution for $(b_0, b_1)$, $\mu$, $\sigma^2$ and $(\kappa, \rho)$ separately. Non-informative priors were used for all parameters (Wu et al., 2009b). Each MCMC chain was run for 20,000 steps; after a burn-in of 15,000 iterations, posterior samples were collected every 20 steps, which provided 250 samples from each chain (Wu et al., 2009b). Because four overdispersed chains were simulated, a pool of 1000 posterior samples was used for the analysis presented below. These Bayesian posterior samples are denoted by $\theta_j^*$.
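The burn-in, thinning and pooling arithmetic described above can be checked with a short Python sketch. The chains below are hypothetical stand-ins (random numbers, not output from the authors' C/C++ programs), and the names `thin_chain`, `chains` and `pooled` are introduced here for illustration only.

```python
import numpy as np


def thin_chain(chain, burn_in, thin):
    """Drop the first `burn_in` draws, then keep every `thin`-th draw."""
    return chain[burn_in::thin]


# Hypothetical stand-in: four chains of 20,000 draws of a single parameter.
rng = np.random.default_rng(0)
chains = [rng.normal(size=20_000) for _ in range(4)]

# Burn-in of 15,000 iterations, then one draw kept every 20 steps.
kept = [thin_chain(c, burn_in=15_000, thin=20) for c in chains]
pooled = np.concatenate(kept)

samples_per_chain = len(kept[0])   # (20,000 - 15,000) / 20 = 250
pooled_size = len(pooled)          # 4 chains x 250 = 1000
print(samples_per_chain, pooled_size)
```

Under these settings each chain contributes 250 draws, so the four chains together give the pool of 1000 posterior samples.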
Bayesian highest posterior density (HPD) intervals were also computed; these are credible intervals in the Bayesian framework and play a role similar to that of confidence intervals in the frequentist framework.
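One common way to approximate an HPD interval from posterior draws is to take the narrowest window that contains the required share of the sorted samples. The sketch below assumes a unimodal posterior; the function name `hpd_interval` and the gamma-distributed stand-in samples are assumptions made for illustration and are not the authors' implementation.

```python
import numpy as np


def hpd_interval(samples, mass=0.95):
    """Narrowest interval containing a share `mass` of the sorted draws."""
    x = np.sort(np.asarray(samples, dtype=float))
    n = x.size
    k = int(np.ceil(mass * n))            # draws the interval must contain
    widths = x[k - 1:] - x[: n - k + 1]   # width of every window of k draws
    i = int(np.argmin(widths))            # index of the narrowest window
    return x[i], x[i + k - 1]


# Hypothetical stand-in for 1000 pooled posterior draws of one parameter.
rng = np.random.default_rng(0)
draws = rng.gamma(shape=2.0, scale=1.0, size=1000)
lower, upper = hpd_interval(draws, mass=0.95)
print(lower, upper)
```

For a skewed posterior such as this one, the HPD interval is shorter than the equal-tailed interval with the same coverage, which is the usual reason for reporting it.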

Results
The original FOBT screening data from the MCCCS for each age group, for males and females, are published in Tables 1 and 2 of Wu et al. (2009b).

Distribution of the lead time in colorectal cancer screening
The goal of screening is to catch the disease before clinical symptoms appear. This means that the detection and removal of any precancerous growth is as important as the diagnosis of cancer at an early stage. To understand this, several time events are essential to prevention efforts, and they are described briefly here. If a person enters the preclinical state ($S_p$) at age $t_1$, and his/her clinical symptoms present later at age $t_2$, then $(t_2 - t_1)$ is the sojourn time in the preclinical state. If a person is offered a screening exam at some time point $t$ within the interval $(t_1, t_2)$, and cancer is diagnosed, then the length of time $(t_2 - t)$ is the lead time. We consider the lead time to be the time gained by screening for that particular person.
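These definitions can be made concrete with a small numerical sketch. The ages used below are hypothetical, and the function `lead_time` assumes, for illustration only, that every screening exam falling inside the preclinical window detects the disease (a sensitivity of 1), which the models in this chapter do not assume.

```python
def sojourn_time(t1, t2):
    """Time spent in the preclinical state, from onset t1 to symptoms t2."""
    return t2 - t1


def lead_time(t1, t2, screening_ages):
    """Lead time t2 - t for the first screen at age t inside (t1, t2).

    Illustrative assumption: the first screen in the preclinical window
    always detects the disease. With no screen in the window the case
    surfaces clinically and the lead time is zero.
    """
    for t in sorted(screening_ages):
        if t1 < t < t2:
            return t2 - t
    return 0.0


# Hypothetical person: preclinical onset at 58.5, symptoms at 62.0,
# annual screening exams from age 50 to age 80.
annual = [50 + i for i in range(31)]
print(sojourn_time(58.5, 62.0))       # 3.5 years in the preclinical state
print(lead_time(58.5, 62.0, annual))  # detected at age 59, lead time 3.0
print(lead_time(60.2, 60.9, annual))  # no exam in the window, lead time 0.0
```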

Methods
We will briefly review the probability distribution of the lead time derived under a progressive disease model (Wu et al., 2007, 2009a). Assume there are $K$ ordered screenings that, for a specific individual, occur at ages $t_0 < t_1 < \dots < t_{K-1}$ … The lead time distribution is a conditional distribution given that someone will develop clinical disease before death. We let $D$ represent a Bernoulli random variable, with $D = 1$ indicating the development of clinical disease and $D = 0$ indicating the absence of clinical disease before death. We use $L$ to denote the lead time. We consider the lead time to be zero for individuals whose disease is not detected by a regular screening exam but who develop clinical symptoms between exams. The distribution of the lead time is a mixture of the conditional probability $P(L = 0 \mid D = 1)$ and the conditional probability density function $f_L(z \mid D = 1)$, for any $0 < z < T - t_0$. Here, $T$ represents the span of human life, which is a fixed upper bound, and $t_0$ is the individual's age at his/her initial screening exam. We define $\beta(t)$, $w(t)$, $q(x)$ and $Q(x)$ as in Section 2.1. The distribution of the lead time was derived and presented in equation 6 (Wu et al., 2007, 2009a) … and it was derived as: … for all $j = 1, 2, \dots, K$, where $\beta_i = \beta(t_i)$ is the sensitivity at age $t_i$. The joint PDF $f_L(z, D = 1)$ in equation 6 was derived and presented as: … We used the posterior samples $\theta_j^*$ …

Results
We applied our method to make predictive inference about the lead time using FOBT for males and females. For this simulation we assumed an initial age of 50 and an ending age of 80. The lead time distribution is a function of the sensitivity, the sojourn time distribution, the transition density, the screening frequency, and the initial and ending ages. Estimates of the sensitivity, the sojourn time distribution and the transition density were obtained from the MCCCS in section 2. We plugged these estimates into the simulation equation 10 in Section 3.1 to estimate the lead time distribution under different screening scenarios. In other words, we estimated what the results would be if people were screened at different screening intervals. The results are summarized in Table 2 of Wu et al. (2009a). The time interval between screens was 6, 9, 12, 18 and 24 months, between ages 50 ($t_0$) and 80 years ($T$). The density curves of the lead time for the different screening intervals are shown in Figures 2 and 3 of Wu et al. (2009a) for both males and females. From those results, if a man begins annual screening (i.e. $\Delta = 12$ months) when he is 50 years old and continues until he reaches 80, then there is an 18.87% chance that he will not benefit from early detection by the screening program if he develops colorectal cancer during those thirty years. His chance of no-early-detection decreases to 6.45% for screening exams conducted 6 months apart. For females, the chance of no-early-detection is 9.48% for annual screenings and 2.39% for screening every 6 months. Table 2 of Wu et al. (2009a) also showed that the mean lead time increases as the screening interval decreases, for both males and females.
In other words, more frequent screening exams contribute to a longer lead time, which would translate into treatment of the disease at an earlier stage and, potentially, an improved prognosis. The increase in the mean lead time is partly due to the smaller point mass at zero lead time when screening exams are closer together. The standard error of the lead time decreases as the time between screening exams increases. Table 2 of Wu et al. (2009a) also revealed that the standard deviation of the lead time was larger than the mean lead time. In the same table, the mode of the lead time, which is the value most likely taken by the lead time when it is positive, was 0.68 years (about 8 months) for males and 0.96 years (about 11.5 months) for females with screening exams every 6 months (Wu et al., 2009a). With annual exams, the mode of the lead time was 0.60 years (about 7.2 months) for males and 0.78 years (about 9.4 months) for females.
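The role of the point mass at zero can be made explicit. Writing $p_0 = P(L = 0 \mid D = 1)$, the mean lead time satisfies $E(L \mid D = 1) = (1 - p_0)\,E(L \mid L > 0, D = 1)$. The Python sketch below plugs in the two point masses quoted above for males and, purely as an assumption for illustration, holds the mean of the positive part fixed at one year; that value is hypothetical and is not taken from Wu et al. (2009a), where the positive part also changes with the screening interval.

```python
def mean_lead_time(p_zero, mean_given_positive):
    """Mean of a mixture with mass p_zero at zero and a positive part."""
    return (1.0 - p_zero) * mean_given_positive


# Point masses quoted in the text for males; the positive-part mean of
# 1.0 year is a hypothetical placeholder, held fixed for both schedules.
HYPOTHETICAL_POSITIVE_MEAN = 1.0
annual = mean_lead_time(0.1887, HYPOTHETICAL_POSITIVE_MEAN)
six_monthly = mean_lead_time(0.0645, HYPOTHETICAL_POSITIVE_MEAN)
print(annual, six_monthly)
```

Even with the positive part held fixed, the smaller point mass under six-monthly screening alone raises the mean lead time, which is the effect described in the text.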

Evaluating long term screening outcomes in colorectal cancer
Recently there have been heated arguments on the topic of over diagnosis, the diagnosis of "disease" that will not cause symptoms or death during a patient's lifetime (Lichtenfeld, 2010). Some profound questions should be asked with regard to over diagnosis. How do we evaluate the long-term outcomes of continuous regular screening? Will regular screening exams contribute to a greater chance of over diagnosis? What are the percentages of over diagnosis and true-early-detection among screen-detected cancer patients? How should the probability of no-early-detection and the probability of disease-free-life be estimated? Some research has been done in the area of over diagnosis. However, the majority of research in this area has been based on observational studies, mainly in breast cancer (Day, 2005; Duffy et al., 2008; Welch & Black, 2010; Zackrisson et al., 2006); there is little reference to this problem in colorectal cancer. The flaws of using observational studies are clear. (a) The result based on one study cannot be extended to other scenarios. For one particular study, with one particular screening interval, the result may be correct; however, one cannot use this result to make inference for studies with different screening intervals or different cohorts without probability modelling. Yet it is of great value to policy makers to know how the proportion of over diagnosis changes with screening frequency, the sensitivity of the screening modality, and other risk factors. (b) Observational studies usually need a long follow-up period to collect cancer incidence data from both the screening group and the control group, because most observational studies compare the incidence rates in the two groups to estimate over diagnosis. This is not cost-effective, and the inference may be biased.
To overcome these flaws, we use probability modelling and, instead of dealing with over diagnosis alone, address the long-term outcomes for the whole cohort, with over diagnosis as one outcome. All initially superficially healthy participants are classified into four mutually exclusive categories: true-early-detection, no-early-detection, over diagnosis and symptom-free-life (Wu & Rosner, 2010).
- Case 1 (Symptom-free-life or SympF): A person who took part in screening exams that never detected colorectal cancer, and who ultimately died of other causes.
- Case 2 (No-early-detection or NoED): A person who took part in screening exams, but whose disease manifested itself clinically and was not detected by screening.
- Case 3 (True-early-detection or TrueED): A person whose colorectal cancer was diagnosed at a scheduled screening exam and whose clinical symptoms would have appeared before death.
- Case 4 (Over diagnosis or OverD): A person who was diagnosed with colorectal cancer at a scheduled screening exam but whose clinical symptoms would NOT have appeared before death.

Every participant who takes part in the screening will eventually fall into one of these four outcomes. It is hoped that this will provide a systematic approach and a framework for the evaluation of long-term outcomes in cancer screening.
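The four cases follow from two yes/no questions: was the cancer diagnosed at a scheduled screening exam, and would clinical symptoms appear (or have appeared) before death? A minimal Python sketch of this classification is given below; the names `Outcome` and `classify` are introduced here for illustration and do not come from Wu & Rosner (2010).

```python
from enum import Enum


class Outcome(Enum):
    SYMPTOM_FREE_LIFE = "SympF"
    NO_EARLY_DETECTION = "NoED"
    TRUE_EARLY_DETECTION = "TrueED"
    OVER_DIAGNOSIS = "OverD"


def classify(screen_detected: bool, symptoms_before_death: bool) -> Outcome:
    """Map the two yes/no questions onto the four exclusive cases.

    For a screen-detected person, `symptoms_before_death` is counterfactual:
    it states whether symptoms would have appeared had there been no screen.
    """
    if screen_detected:
        if symptoms_before_death:
            return Outcome.TRUE_EARLY_DETECTION
        return Outcome.OVER_DIAGNOSIS
    if symptoms_before_death:
        return Outcome.NO_EARLY_DETECTION
    return Outcome.SYMPTOM_FREE_LIFE


for detected in (False, True):
    for symptoms in (False, True):
        print(detected, symptoms, classify(detected, symptoms).value)
```

Because the two questions have four possible answer pairs and each pair maps to exactly one case, the categories are mutually exclusive and exhaustive.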

Methods
For an initially asymptomatic individual taking $K$ screenings at ages $t_0 < t_1 < \dots < t_{K-1}$ … For an individual currently at age $t_0$, his/her lifetime is random, and it would not be practical to fix the number of screening exams at any fixed number $K$. If, however, he/she follows a pre-planned screening schedule, or, more simply, if he/she plans to be screened every 12, 18, or 24 months, then the probability of each outcome when his/her lifetime $T$ is longer than $t_0$ can be obtained using equation 16.
where the lifetime probability density function … The probability of each of the four cases is a function of the sensitivity $\beta(t)$, the transition probability density $w(t)$, the sojourn time distribution $q(x)$, a person's age at the first screening $t_0$ and his/her future screening interval $\Delta$. The age-dependent sensitivity $\beta(t)$, the age-dependent transition probability $w(t)$ and the sojourn time distribution $q(x)$ were estimated from the MCCCS data (Wu et al., 2009a) and were given in Section 2.2. Given the MCCCS data, the posterior predictive probability of each case can be estimated as:

$$P(\text{Case } i \mid T > t_0, \text{MCCCS}) \approx \frac{1}{n} \sum_{j=1}^{n} P(\text{Case } i \mid T > t_0, \theta_j^*) \qquad (19)$$
where $\theta_j^*$ is the MCMC random sample drawn from the posterior distribution and $n = 1000$ is the posterior sample size. Furthermore, we define a diagnosed case as either an interval clinical incident case or a screen-detected case occurring in a study. Researchers may be interested in the proportions of "no-early-detection", "true-early-detection" and "over diagnosis" given that it is a diagnosed case. For example, among females, what is the estimated probability of "no-early-detection", given that a woman has been diagnosed with colorectal cancer, whether through a scheduled screening exam or not? Last but not least, researchers are most interested in the probabilities of "true-early-detection" and "over diagnosis" given that it is a screen-detected case. All of these conditional probabilities were also estimated from equations 12-19, using the definition of conditional probability.
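The two steps just described, averaging over posterior draws and then conditioning on a diagnosed or screen-detected case, can be sketched in Python as follows. All numbers are hypothetical placeholders rather than MCCCS results, and the names `posterior_predictive` and `conditional_shares` are introduced here for illustration; in the actual analysis the probability for each draw would come from equations 12-16 rather than from the toy function used below.

```python
import numpy as np


def posterior_predictive(prob_given_theta, theta_samples):
    """Average P(Case i | theta_j) over the posterior draws theta_j."""
    return float(np.mean([prob_given_theta(th) for th in theta_samples]))


def conditional_shares(p_no_ed, p_true_ed, p_over_d):
    """Condition on a diagnosed case and on a screen-detected case."""
    diagnosed = p_no_ed + p_true_ed + p_over_d   # interval or screen-detected
    screen_detected = p_true_ed + p_over_d
    return {
        "NoED | diagnosed": p_no_ed / diagnosed,
        "TrueED | diagnosed": p_true_ed / diagnosed,
        "OverD | diagnosed": p_over_d / diagnosed,
        "TrueED | screen-detected": p_true_ed / screen_detected,
        "OverD | screen-detected": p_over_d / screen_detected,
    }


# Hypothetical stand-in: each draw is treated as if it already were the
# probability of true-early-detection implied by that posterior sample.
rng = np.random.default_rng(1)
theta_draws = rng.uniform(0.02, 0.04, size=1000)
p_true_ed = posterior_predictive(lambda theta: theta, theta_draws)

# Hypothetical probabilities for the other two diagnosed outcomes.
shares = conditional_shares(p_no_ed=0.010, p_true_ed=p_true_ed, p_over_d=0.002)
for name, value in shares.items():
    print(name, round(value, 4))
```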

Results
In section 2.2 we obtained $\theta_j^*$ as MCMC random samples drawn from the posterior distribution. A total of 1000 posterior samples were substituted into equation 19 to conduct the Bayesian inference. This inference assumed a program of periodic screening exams for three hypothetical cohorts of asymptomatic individuals, with initial ages of 40, 50 and 60 at the first screening exam, for males and for females. For each group, we examined various screening frequencies, with screening interval $\Delta$ = 12, 18, and 24 months. For the lifetime distribution, we used the actuarial life and transition probability obtained from the MCCCS data for males and females. Overall, the proportion of over diagnosis was very small, less than 0.3% for any age and gender. The probability of "true-early-detection" for males was between 1.91% (initial age 60, screening interval of 24 months) and 3.28% (initial age 40, screening interval of 12 months). Correspondingly, the probability of "true-early-detection" for females was between 2.75% (initial age 60, screening interval of 24 months) and 3.76% (initial age 40, screening interval of 12 months). Regardless of age, the probability of "true-early-detection" decreased slightly as the screening interval increased, and it was consistently lower for males than for females. The probability of "no-early-detection" for males was between 0.53% (initial age 60, screening interval of 12 months) and 1.95% (initial age 40, screening interval of 24 months). The probability of "no-early-detection" for females was between 0.29% (initial age 60, screening interval of 12 months) and 1.34% (initial age 40, screening interval of 24 months). In general, the probability of "no-early-detection" increased slightly as the screening interval increased, and decreased slightly as the age at initial screening increased.
The probability of "symptom-free-life" was very large (over 95%). Regardless of age or gender, the probability of "symptom-free-life" was almost constant across screening intervals. For example, among 50-year-old males, the probability of "symptom-free-life" was 95.84% with a screening interval of 12 months, 95.87% with 18 months, and 95.90% with 24 months. The four probabilities should sum to 1; in the simulation the total was above 0.998, and the small shortfall, which is due to simulation accuracy, is clinically insignificant. The box plot of the probability of each case when $t_0 = 60$ is given in Figure 1. Within each box plot, the three left-hand-side boxes represent females and the three right-hand-side boxes represent males, and the probabilities are presented at the different screening intervals. We present the box plots for an initial screening age of 60, but similar box plots were observed for $t_0 = 40$ and 50. Again, we see in Figure 1 that the probability of "symptom-free-life" and the probability of "over diagnosis" are stable across screening intervals, regardless of gender. The probability of "no-early-detection" increases monotonically with the screening interval, while the probability of "true-early-detection" decreases monotonically with the screening interval. The conditional probabilities of "no-early-detection", "true-early-detection" and "over diagnosis", given that it is a diagnosed case, were estimated for females and males. Among women with an initial age of 40, the percentage of over diagnosis among diagnosed cancer cases was 5.04% with screening every 24 months and 6.50% with screening every 12 months.
Similarly, the conditional probabilities of "true-early-detection" and "over diagnosis", given that it is a screen-detected case, were estimated for females and males. Among women with an initial age of 40, the percentage of over diagnosis among the screen-detected cases was 6.75% (95% HPD: 2.56%-19.27%) with screening every 24 months, and 7.12% (95% HPD: 2.76%-19.91%) with screening every 12 months. Figure 2 shows the probability of "true-early-detection" and the probability of "over diagnosis" among those whose cancer would be diagnosed by a regular screening exam, for the initial-age-60 group of both genders. In Figure 2, results for males and females are presented for screening intervals of 12, 18 and 24 months. The estimated mean percentages of "true-early-detection" and "over diagnosis", given that it is a screen-detected case, were similar for males and females. However, the 95% intervals for males were much wider than those for females, which indicates that there is more uncertainty about these probabilities for males.

Discussion
We presented some results in probability modelling and statistical inference in colorectal cancer screening, using FOBT as an example. As we have shown in section 2, the three key parameters are the sensitivity of the screening modality, the transition probability of the disease, and the sojourn time distribution. All other parameters of interest can be expressed as functions of these three key parameters; hence accurate estimation of them is very important. Because these three parameters are the building blocks of the cancer screening model, many researchers are striving to improve the modelling and obtain more accurate estimates of them. Although other researchers have also estimated the sensitivity and the mean sojourn time in fecal Hemoccult testing, using data from Calvados, France between 1991 and 1994 (Prevost et al., 1998), their models are different from the progressive model that we used here. Prevost et al. (1998) modelled the incidence of cancer as a Poisson random variable, with different parameter values for the mean of the Poisson distribution. Another difference is that their sojourn time was assumed to follow an exponential distribution. They reported that the mean sojourn time increases with age: approximately 2 years among 45-54 year-olds, 3 years among 55-64 year-olds, and 6 years among 65-74 year-olds (Prevost et al., 1998). Their estimated sensitivity decreases with age: approximately 75% among 45-54 year-olds, 50% among 55-64 year-olds, and 40% among 65-74 year-olds (Prevost et al., 1998). Church et al. (1997) used the same Minnesota study (MCCCS) to estimate the sensitivity. However, their estimate of program sensitivity is about 90%, regardless of age (Church et al., 1997). Our estimates are more accurate for different age groups, as reported in section 2. Other data sets have also been used to estimate the FOBT screening sensitivity and mean sojourn time.
For example, using French data, Launoy et al. (1997) estimated the FOBT mass-screening sensitivity to be about 50%. Their estimated mean sojourn time was longer than ours, between 4.5 and 5 years for all cancer cases combined. This research also showed that the estimates of the sensitivity and the sojourn time may be negatively correlated (Launoy et al., 1997). Better modelling strategies are needed to handle this situation, and we plan to explore solutions that account for the negative correlation between the sensitivity and the sojourn time. There is little research on the topic of lead time bias or the lead time distribution apart from Wu et al. (2009a). Since there is convincing evidence that FOBT and/or other colorectal screening modalities can significantly reduce mortality (Mandel et al., 1999), the U.S. Preventive Services Task Force has recommended screening people aged 50 to 75 since 2008 (U.S. Preventive Services Task Force, 2008). Unfortunately, compliance with colorectal cancer screening is low in the U.S. and the world (Sarfaty & Wender, 2007). We hope the lead time results from our simulations and models (Section 3) can provide some helpful information to general audiences about the benefit of taking screening exams. To our knowledge, there is almost no research on the topic of over diagnosis or long-term outcomes in colorectal cancer. As the first of the baby boomer generation turns age 65 this year, evaluating the long-term outcomes will provide useful information and insights to policy makers. We hope our method will provide a framework and a systematic approach for evaluation purposes. To explore this topic further, we will need to obtain more recent screening data. We are exploring whether data from the Prostate, Lung, Colorectal and Ovarian (PLCO) cancer screening trial can be released to us (National Cancer Institute Division of Cancer Prevention, 2011).
Our future research covers three areas: (1) exploring the relationship between the sensitivity and the sojourn time distribution, and building a better modelling strategy for the three key parameters; (2) exploring the optimal screening interval based on an individual's screening history; and (3) exploring the survival benefit from screening after removing the lead time bias, so that we can better understand what is gained from screening. We hope the research will benefit the health of the general population.