Alzheimer’s disease and other neurodegenerative diseases are generally incurable and often difficult to diagnose accurately. Yet early and accurate diagnosis of a neurodegenerative disease can potentially contribute to more effective treatment. Hence research efforts are moving towards early identification of high-risk subjects and prevention of disease progression with biomarkers. Unfortunately, dementia and biomarker studies are hampered by variables such as dropouts, challenges in comparing data sets, discordant biomarker results, limited availability of histopathological confirmation at death, the validity of cognitive testing, and nonlinear fluctuations in cognitive domains as the disease progresses in vivo. This chapter assesses the challenges in the early diagnosis of dementia and presents the issues faced in conducting dementia and biomarker studies.
- Alzheimer’s disease
- mild cognitive impairment
- early diagnosis
Although dementia is a global research priority, dementia studies are very complicated to design [1, 2]. Patents have a time limit that might expire before a trial is completed, which complicates contracts with pharmaceutical companies to use their drugs. Drug studies may raise issues related to the use of biomarkers that have not been validated for such use, such as whether to disclose biomarker results to participants. The treatment target for the best outcome has not yet been established, and there is no guarantee that any treatment will work. In addition, the odds of success are poor given the string of crushing defeats so far [3, 4]. Pharmaceutical companies withdraw from trials because of the cost and the risk of failure. Because dementia is slowly progressive, there is a long time-lag between the start of a trial and obtaining results. Dementia spans a multitude of specialities, involving neurologists, geriatricians, nuclear medicine physicians, radiologists, psychogeriatricians, pathologists, and psychologists. Collaboration with colleagues from different sub-specialities and with regulatory agencies is needed to conduct studies successfully.
Any diagnostic entity is more heterogeneous the earlier it is addressed, so mild cognitive impairment (MCI) is a challenging population to study because of its heterogeneous phenotypes, aetiologies and prognoses, both cross-sectionally and longitudinally. Furthermore, similar symptoms can often be attributed to multiple different causes, each contributing to varying degrees. Although MCI studies are reasonably consistent with one another, the greater heterogeneity of early disease states does result in differences in outcome between MCI studies. The new research criteria for MCI due to Alzheimer’s disease (AD) are an attempt to move beyond highlighting MCI as a major risk factor for AD and, eventually, to operationalise the prognostication of cognitive impairment in clinical settings.
This chapter considers the methodological issues, challenges and assumptions that need to be taken into account when evaluating dementia and biomarker studies.
2. Challenges in data acquisition and analysis
2.1. Challenges in recruiting participants for dementia studies
Longitudinal studies are better than cross-sectional studies at establishing the direction of causality. However, it is not easy to recruit MCI participants, especially for a longitudinal dementia study. Factors affecting enrolment include lack of awareness of the trial, lack of benefit to the participant, stringent enrolment criteria that may exclude many people, the older age of study volunteers, co-morbidities, disability, lack of mobility, the need for the cooperation of a partner or carer, transportation, administration of medication, too many tests, and intensive monitoring of the individual’s condition and progress. Largely because of slow enrolment, dementia trials usually take at least 5–6 years to establish whether a drug works [6, 7]. The consequences include slower development of potential new treatments, higher clinical trial costs, and reduced reliability of trial results because of changes over the course of the trial in scanners, investigators, personnel, and economic cycles.
To improve internal validity, studies may make recruitment criteria more stringent so as to reduce the heterogeneity typically seen in a memory clinic. Yet to be relevant to clinicians, studies also need to be anchored clinically, which means recruitment criteria cannot be so restrictive that few participants can be enrolled. One way to increase the number of volunteers is to simplify enrolment criteria and screening processes. Less stringent criteria make more people eligible for enrolment, which also encourages referrals from clinicians.
2.2. Leveraging data sets
Support for small studies with less statistical and mathematical rigour that detect or demonstrate a response may be just as important as support for large randomised controlled trials that validate a response. Justifying the resources needed to design and run a study requires not just a good idea but also supporting data from smaller studies, as well as an available time-frame and interest. While large studies are often desirable for improving validity, relatively small longitudinal studies may be no less significant in exposing a scientific law, if the data are collected and analysed appropriately. We should remember that the modern science of genetics was founded on cross-breeding yellow and green peas and their offspring, at a time when many competing theories were making headway.
Research efforts are moving towards early identification of high-risk subjects and prevention of progression. In the preclinical space, longitudinal biomarker data are still scarce. Longitudinal data provide important knowledge about the role of biomarkers in predicting and monitoring cognitive and functional decline. Making the most of the limited data requires both familiar and more sophisticated statistical techniques. There is a need for equations and models that can embrace heterogeneity without being too complex.
Cox regression survival analysis is one statistical approach that can distil the heterogeneity of MCI aetiologies to determine independent risk factors for MCI conversion to AD. Cox regression enables the simultaneous comparison of, and adjustment for, the effects of several risk factors (i.e. the predictor variables or covariates) on the occurrence of an unwanted event. It can accommodate covariates that are dichotomous or continuous and, in its extended form, covariates whose values change over time. The required inputs are the time to the event of interest (or to censoring), whether the event occurred, and the predictor variables. The result is expressed as hazard ratios, which compare the instantaneous rate of the event of interest between groups at any moment in time. Under the proportional hazards assumption of the Cox model, the hazard for one individual is a fixed proportion of the hazard for any other individual. By entering all known risk factors in a study cohort into the Cox model, we can adjust for all of them simultaneously.
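To make the inputs and the hazard ratio concrete, the sketch below fits a Cox model with a single binary covariate by maximising the partial likelihood with Newton-Raphson. It is a minimal, pure-Python illustration under stated assumptions: the data are hypothetical (a made-up carrier/non-carrier indicator), there are no tied event times, and only one covariate is modelled. A real analysis with several covariates would normally use an established implementation such as `CoxPHFitter` in the Python lifelines package or `coxph` in the R survival package.

```python
import math


def cox_fit_single(times, events, x, max_iter=50, tol=1e-10):
    """Fit a one-covariate Cox proportional hazards model.

    Maximises the Cox partial log-likelihood by Newton-Raphson and returns
    the log hazard ratio (beta). Assumes no tied event times.
    """
    beta = 0.0
    n = len(times)
    for _ in range(max_iter):
        score, info = 0.0, 0.0
        for i in range(n):
            if not events[i]:
                continue  # censored subjects contribute only via risk sets
            # Risk set: everyone still under observation at this event time.
            risk = [j for j in range(n) if times[j] >= times[i]]
            w = [math.exp(beta * x[j]) for j in risk]
            s0 = sum(w)
            s1 = sum(wj * x[j] for wj, j in zip(w, risk))
            s2 = sum(wj * x[j] ** 2 for wj, j in zip(w, risk))
            mean_x = s1 / s0
            score += x[i] - mean_x         # first derivative of log-likelihood
            info += s2 / s0 - mean_x ** 2  # minus the second derivative
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta


# Hypothetical cohort: x = 1 for risk-factor carriers, 0 for non-carriers.
times = [1, 2, 3, 5, 1.5, 4, 6, 7]   # years to conversion or censoring
events = [1, 1, 1, 0, 1, 0, 1, 0]    # 1 = converted to AD, 0 = censored
x = [1, 1, 1, 1, 0, 0, 0, 0]

beta = cox_fit_single(times, events, x)
print(f"log hazard ratio = {beta:.2f}, hazard ratio = {math.exp(beta):.2f}")
```

For these made-up data the hazard ratio comes out at roughly 3.6, meaning carriers convert at about 3.6 times the instantaneous rate of non-carriers; under the proportional hazards assumption that ratio is taken to hold at every time point.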
2.3. Source of subjects, where and when the study was conducted
The source of subjects significantly affects rates of conversion to AD. People seeking specialist care for memory loss are a more highly selected group than people in the community who happen to have some memory problems. Different studies have different aims and designs, and different methods of operationalising criteria. In some studies cognitive complaints are only reported spontaneously, whereas in others they are routinely elicited; likewise, clinical assessments are standardised in some studies but rely on more subjective clinical judgement in others.
Recruitment sites are an important consideration in designing studies. Cohorts at different sites differ demographically in some respects, so academic sites may perform differently from commercial sites. Some cohorts, like the Australian Imaging Biomarkers and Lifestyle healthy control cohort, are enriched for Apolipoprotein E ε4 (E4) carriers. The Alzheimer’s Disease Neuroimaging Initiative (ADNI) MCI cohort consisted of 398 subjects, who were mostly white and highly educated, had cognitive measures and cerebrospinal fluid (CSF) biomarker levels intermediate between those of the ADNI control and AD groups, and included a high proportion of E4 carriers.
MCI cohorts recruited today may not be entirely relevant to tomorrow’s world. Secular changes influence the predictive value of cognitive performance in dementia. For example, the Flynn effect describes the massive gains in IQ observed in Americans between 1932 and 1978. Populations seem to acquire skills that make IQ test norms outdated. Technological developments such as software apps may further support daily function and so delay entry into residential care.
2.4. Challenges in comparing data sets
Applying criteria and statistical models developed in one cohort to another cohort with different demographic characteristics will produce varying outcomes, and differences between samples in the combinations of measurements, cut-offs, numbers of subjects, and length of follow-up further compound the variability of results [13, 14, 15, 16, 17].
Validity is gained when results are repeatable. Power is gained when shared data are combined. Sometimes data sets are easily comparable. For example, 3.0-Tesla (T) and 1.5-T scanners have similar ability to track longitudinal atrophy in AD and MCI patients using tensor-based morphometry, and both are powerful enough to detect atrophy longitudinally, so it may not matter much that one cohort had magnetic resonance imaging (MRI) on a 1.5-T scanner and another on a 3.0-T scanner. In dementia studies in general, however, combining data sets is not trivial. Comparing results from studies that used different methodologies is rather difficult. Combining data from different scanners introduces noise, because different positron emission tomography (PET) and MRI scanners use different hardware and software combinations. Inter-scanner variability is eliminated if all cross-sectional and longitudinal scans are performed on the same scanner, but this is not practical.
Lack of standardisation threatens to hamper the comparison and replication of results, increase analytical variability, and complicate the evaluation of methods. Different methods of biomarker analysis give varying degrees of precision. Dropouts and missing data are handled differently. The time lag between a clinical diagnosis of subjective cognitive impairment (SCI) or MCI and enrolment differs between studies. A long lag between diagnosis and recruitment may leave an SCI or MCI cohort with more stable subjects, who are less likely to progress to a dementia subtype. Different population norms are used for neuropsychological tests, as are different batteries of neuropsychological tests.
Given that the stability of cognition can be affected by many factors in the short term, it is important to consider which variables were corrected for when reading published studies. As mentioned above, robustly designed studies are generally informative because they control for many factors, but a downside is that they may not reflect routine clinical practice well.
2.5. Dropouts and their risk factors
Dropouts from research studies due to relocation or loss of interest should be classified as random. However, dropouts from MCI studies are not entirely random. Traditional survival analysis assumes that censored observations are non-informative and ignorable. Yet death alters the probability of observing dementia.
Risk factors for cognitive and functional impairment in MCI can also be risk factors for dropping out of MCI studies early, potentially biasing the sample. For example, E4 is a risk factor both for progression from a clinical dementia rating (CDR) of 0.5 to a CDR of 1 or above and for cardiovascular mortality. Heart failure is a risk factor for progression from mild to severe cognitive impairment, and for functional decline. Stroke is a risk factor for non-amnestic cognitive and functional decline.
A joint modelling approach, which models the cognitive trajectory and the dropout process together, can potentially reduce the bias that attenuates the estimated effect of neuropathology on cognitive decline. This bias occurs if non-random dropouts are excluded from analyses, or if missing values are imputed with the last observation carried forward (LOCF) method.
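A small deterministic illustration, using made-up scores, of how excluding dropouts or applying LOCF attenuates the estimated decline when the fastest decliner is the one who drops out. Joint models themselves need dedicated software and are not shown; the `locf` helper and the participant data are hypothetical.

```python
from statistics import mean


def locf(series):
    """Fill missing visits (None) with the last observed value."""
    filled, last = [], None
    for value in series:
        if value is not None:
            last = value
        filled.append(last)
    return filled


# Hypothetical cognitive scores at visits 0-3 (higher = better).
true_scores = {
    "A": [30, 29, 28, 27],
    "B": [30, 29, 28, 27],
    "C": [30, 26, 22, 18],  # fast decliner
}
# Participant C drops out after visit 1: the dropout is driven by decline,
# so it is not random.
observed = {
    "A": [30, 29, 28, 27],
    "B": [30, 29, 28, 27],
    "C": [30, 26, None, None],
}

true_change = mean(s[-1] - s[0] for s in true_scores.values())
completers_change = mean(
    s[-1] - s[0] for s in observed.values() if s[-1] is not None
)
locf_change = mean(locf(s)[-1] - locf(s)[0] for s in observed.values())

print(true_change, completers_change, locf_change)
```

The true mean change is -6 points, but a completers-only analysis gives -3 and LOCF gives about -3.3: both understate decline because the participant who declined fastest is either dropped or frozen at an earlier, better score.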
3. Diagnostic challenges
3.1. Accuracy of diagnosis
The dementia field is filled with contradictory ideas and controversies, and the accuracy of dementia diagnosis remains an unresolved challenge. For example, in the Religious Orders Study of over 1000 older Catholic nuns, priests and brothers, the majority of cases, particularly those over 85, had AD pathology together with several other pathologies. Of the cases whose phenotype resembled clinically probable AD, some had Lewy bodies or other predominant neurodegenerative disorders at autopsy.
3.2. Volatility of clinical outcomes
Diagnosis during the pre-dementia stages is complicated by fluctuations in cognitive ability over long periods of time. In short-term MCI studies, outcomes are rather volatile: a person may revert to normal, remain MCI with improvement or deterioration in cognitive abilities, convert to dementia, improve after deteriorating further, or deteriorate again after improving. For example, in the Rochester, Minnesota longitudinal study, as many as 35% of participants with MCI reverted to normal when followed long enough. However, two-thirds of these ultimately progressed again to MCI or dementia. In the Pittsburgh longitudinal health study, after more than a decade of follow-up, a small percentage returned to normal after being diagnosed with MCI.
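Taking the two Rochester figures at face value, and assuming both refer to the same MCI cohort, the arithmetic is:

$$0.35 \times \tfrac{2}{3} \approx 0.23, \qquad 0.35 - 0.23 = 0.12$$

So roughly 23% of the MCI cohort reverted to normal and later progressed again, while only about 12% reverted and had not progressed again by the end of follow-up. A single reversion is therefore a weak signal of lasting recovery.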
One explanation for the observed volatility is the rigid way in which disease states are categorised. When a disease continuum is divided by arbitrary boundaries, patients are likely to move back and forth across them. Another cause of volatility is random fluctuation in cognitive test scores of up to half a standard deviation. A vulnerable person near the cut-off may score within the normal range on a good day and within the MCI range on a bad day. This day-to-day variability in performance is not a trivial matter, because it predicts future decline over and above the level of cognitive performance. Serial clinical information should be weighted more heavily, as it may overturn initial diagnoses.
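The effect of score fluctuation near a cut-off can be sketched by treating an observed score as the person's stable level plus normally distributed day-to-day noise. The parameters are illustrative assumptions, not taken from the cited studies: scores are expressed in standard deviation (z) units, the MCI cut-off is set at 1.5 SD below the norm (a commonly used convention), and the noise SD is 0.5, matching the half-SD fluctuation described above.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def prob_scored_below_cutoff(true_z, cutoff_z=-1.5, noise_sd=0.5):
    """P(observed score < cut-off) if observed = true_z + N(0, noise_sd**2).

    cutoff_z and noise_sd are illustrative assumptions, not fixed criteria.
    """
    return normal_cdf((cutoff_z - true_z) / noise_sd)


for true_z in (-0.5, -1.3, -1.5, -1.7):
    p = prob_scored_below_cutoff(true_z)
    print(f"true level {true_z:+.1f} SD -> P(classified in MCI range) = {p:.2f}")
```

Under these assumptions, someone whose true level sits 0.2 SD above the cut-off would be classified in the MCI range on roughly a third of testing days, and someone exactly at the cut-off on half of them, whereas someone 1 SD above it would rarely be misclassified.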
The entire trajectory of cognitive decline in a person at risk of AD is not necessarily due solely to AD. To date, only up to half of cognitive decline can be accounted for by the neuropathologies seen at autopsy, e.g. AD, micro- and macro-infarcts, Lewy bodies, TDP-43, pre-synaptic proteins, and neuronal density and locus. A pathology may trigger downstream events or the formation of other pathologies, so people’s brains differ in how they respond to the predominant neurodegenerative pathology. For example, people with mixed AD and Lewy body pathology show more variability in cognition because of impaired attention.
3.3. The paradox of Alzheimer’s disease biomarker validation studies
High-quality studies validating the diagnostic utility of biomarkers blind clinicians to the biomarker results when making a clinical diagnosis, and blind assessors of the biomarkers to the clinical diagnoses. However, the diagnosis of clinically probable AD using standard criteria has an error rate of at least 20%, and a definite diagnosis requires pathological confirmation. Hence no biomarker study can exceed the quality of the clinical diagnosis against which it is validated, even if double blinding, the gold standard, is applied. Unblinding a clinician to an amyloid PET scan result introduces circularity into the validation of amyloid PET. However, doing so has value, as it may improve the certainty of an AD diagnosis or correct a wrong diagnosis of AD.
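Why the clinical reference caps what a validation study can show can be sketched numerically. The function below computes the apparent sensitivity and specificity of a biomarker when it is scored against an imperfect clinical diagnosis rather than true pathology. The assumptions are illustrative only: biomarker and clinical errors are independent given true disease status, prevalence is 50%, and the 20% clinical error rate is applied symmetrically as 80% sensitivity and 80% specificity.

```python
def apparent_accuracy(prevalence, ref_sens, ref_spec, bio_sens=1.0, bio_spec=1.0):
    """Apparent sensitivity and specificity of a biomarker judged against an
    imperfect reference standard (e.g. clinical diagnosis).

    Assumes biomarker and reference errors are conditionally independent
    given true disease status.
    """
    # Joint probabilities of true status (D+/D-) and reference result (R+/R-).
    d_pos_r_pos = prevalence * ref_sens
    d_neg_r_pos = (1 - prevalence) * (1 - ref_spec)
    d_pos_r_neg = prevalence * (1 - ref_sens)
    d_neg_r_neg = (1 - prevalence) * ref_spec

    # P(biomarker positive | reference positive)
    app_sens = (d_pos_r_pos * bio_sens + d_neg_r_pos * (1 - bio_spec)) / (
        d_pos_r_pos + d_neg_r_pos
    )
    # P(biomarker negative | reference negative)
    app_spec = (d_neg_r_neg * bio_spec + d_pos_r_neg * (1 - bio_sens)) / (
        d_pos_r_neg + d_neg_r_neg
    )
    return app_sens, app_spec


# A biomarker that is perfect against true pathology (defaults 1.0, 1.0),
# validated against a clinical diagnosis that is right 80% of the time.
sens, spec = apparent_accuracy(prevalence=0.5, ref_sens=0.8, ref_spec=0.8)
print(f"apparent sensitivity {sens:.2f}, apparent specificity {spec:.2f}")
```

Under these assumptions a flawless biomarker appears only 80% sensitive and 80% specific, because every clinical misdiagnosis is counted against the biomarker.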
3.4. Qualitative versus quantitative approach to diagnosis
The ability to accurately determine the clinical group to which a subject belongs is a crucial first step for appropriate management and for clinical trial design. Categorising participants into MCI subtypes relies heavily on cross-sectional performance on neuropsychological tests compared with a matched normal cohort. However, clinical assessment rather than quantitative variables takes precedence in assigning individuals to a dementia subtype. The problem with basing MCI criteria on objective scores is that arbitrarily defined score cut-offs are required to support subjective complaints about symptoms that fluctuate. This system of categorising MCI helps to define MCI subgroups for research studies, but adds confusion when applied to the assessment of individuals. It has been observed in the ADNI cohorts that study variables overlap substantially between clinical groups, and that the groups differ more qualitatively than quantitatively.
3.5. Conundrums in dementia studies
Even with histopathological confirmation of a definite AD diagnosis at death, it can be argued that there is always a degree of circularity in testing the predictive utility of any individual biomarker or clinical marker for conversion to AD in high-risk subjects, unless the factors involved are independent of one another. For example, if subjects are recruited from different sites and then regrouped by biomarker profile, those recruited from tertiary memory clinics are likely both to progress to AD faster and to have positive biomarker or clinical marker profiles, whatever biomarker or clinical marker is used. Therefore, when testing predictive utility for conversion to AD, comparing at least two biomarkers or clinical markers may enhance study quality.
All dementia neuropathological studies are designed around the neuropathologies we currently know how to identify. Neuropathologies that cannot be identified because of limitations in current histopathological staining techniques go unstudied. Should they in fact be clinically relevant, we have no way of knowing.
To test the concept that early intervention before disruption of neuronal integrity is key to successful therapy, subjects have to be recruited at a stage when neuronal integrity is minimally disrupted. However, if subjects are recruited at too early a stage of disease, the very earliness that qualifies them may mean they do not decline during the study, so results may be negative, and they are considered to have a syndrome rather than a disease. Having to recruit subjects with a syndrome rather than a disease classification makes it harder to obtain research funding. If subjects are recruited after downstream processes have begun, the treatment may not work, even though neuronal integrity is minimally disrupted at enrolment. Yet it is easier to raise money when subjects are considered to have a disease.
3.6. Discordant biomarker results
Phenotypes can range from atypical to unambiguous. Clinical labels lose credibility when challenged by biomarker evidence, which is itself imperfect. It is possible for an amyloid PET scan to be positive while the CSF Aβ level is high (i.e. normal), and vice versa. Tracer uptake may be concentrated in only one brain region on one side. Tracer uptake may increase rapidly between serial scans within a relatively short space of time, or may decrease between serial scans. False negatives, albeit rare, have been reported with Pittsburgh compound B (PiB) scans. Even pathological confirmation, which is the gold standard, is not an exact science. Conflicting biomarkers add complexity to diagnosis and prognostication. It is important to apply Bayesian logic (i.e. the post-test probability depends on the pre-test probability and the robustness of the test) when considering differential diagnoses.
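The Bayesian reasoning can be made explicit with the odds form of Bayes' theorem: post-test odds equal pre-test odds multiplied by the likelihood ratio of the test result. The sketch assumes a hypothetical test with 90% sensitivity and 90% specificity; the pre-test probabilities (0.2 and 0.8) stand in for an atypical and a typical phenotype and are illustrative only.

```python
def post_test_probability(pre_test_prob, sensitivity, specificity, positive=True):
    """Post-test probability of disease via the odds form of Bayes' theorem.

    Requires 0 < pre_test_prob < 1 and sensitivity, specificity strictly
    between 0 and 1 so that the likelihood ratios are finite.
    """
    pre_odds = pre_test_prob / (1 - pre_test_prob)
    if positive:
        likelihood_ratio = sensitivity / (1 - specificity)  # LR+
    else:
        likelihood_ratio = (1 - sensitivity) / specificity  # LR-
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)


# Hypothetical test: 90% sensitivity, 90% specificity.
atypical_pos = post_test_probability(0.2, 0.9, 0.9, positive=True)
typical_neg = post_test_probability(0.8, 0.9, 0.9, positive=False)
print(f"atypical phenotype, positive result: {atypical_pos:.2f}")
print(f"typical phenotype, negative result:  {typical_neg:.2f}")
```

With these inputs, a positive result in an atypical case raises the probability from 0.20 to about 0.69, while a negative result in a typical case still leaves about 0.31. A single discordant biomarker therefore shifts, but rarely settles, the differential.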
3.7. Clinical diagnosis versus clinical deterioration
A clinical diagnosis does not necessarily predict deterioration over time. It is reasonable to expect that a positive amyloid scan will be followed by the development of an AD pattern of deficits, but this does not exclude significant co-morbid conditions from becoming the predominant contributor to cognitive or functional decline. Older persons may live long enough to accumulate other threats to their health. Thus neurodegenerative pathologies may be more relevant to pre-terminal decline than to terminal decline. Death is a competing risk for observing the clinical syndrome develop, even though the pathology is present.
4. Principles and challenges in cognitive testing
Cognitive tests demonstrate cognitive performance. They should be considered an adjunct tool in the assessment and management of an underlying neurodegenerative condition. All tests are based on paradigms of how we learn information. To detect deficits, tests are designed to push people until they make errors. A low score does not diagnose dementia. A high score does not exclude dementia. A single score cannot be considered in isolation.
Confidence that cognitive tests accurately reflect a subject’s cognition is important. Tests require a wide response distribution and an even scale to enable sensitive detection of clinical change and assessment of the degree of deficit. Sensitivity to cognitive disease and to change over time enables tracking of disease progression and evaluation of treatment effectiveness, and maintains focus on the symptoms and disease of interest. Measures should be able to capture deficits, have low noise, and relate to biological markers. Characterisation of early presenters based on neuropsychological test performance should be detailed enough to make sense, but not overly precise, as excessive precision can paradoxically complicate assessment and follow-up.
Data are currently lacking on how well tests track amyloid. Longitudinal examination of different trajectories of cognitive decline can validate specific biomarker profiles, help to elucidate underlying disease mechanisms, and predict clinical outcome. The challenge in observational studies is to be selective yet inclusive of tests that can be administered to all participants and are sensitive enough to track change. Regulatory agencies require that measures be well established and well understood. Technology can make it easier to tailor cognitive and functional assessment protocols to the needs of particular populations or settings, and can extend the possibility of administering assessments and delivering interventions remotely.
Cognitive tests cannot isolate specific unimodal factors; they all draw on broad-based processes. No neuropsychological test is orthogonal, because performance is affected by many processes, such as allocation of attentional resources, language and executive function. All tests should be empirically derived from actual patients, then refined to improve sensitivity, reduce variability, and simplify use. When developing a test, some overlap between measures is worthwhile to ensure concurrent validity, but the measures should not be too highly correlated either. Some tests are more predictive than others. For example, the semantic interference test was more predictive of decline from MCI to dementia over an average 30-month period than standard memory tests such as memory for passage and visual reproduction.
4.2. The importance of pattern recognition
Cognitive testing is not specific for a neuropathology. Test results reflect a combination of neuropathology and cognitive reserve. Patterns of deficits across sub-scores are important for assessing the underlying pathology, so better testing approaches should distinguish between memory and non-memory cognitive domains. The possibility of a neurodegenerative disease is raised when there is a typical cerebral pattern of spread [38, 39, 40, 41]. This possibility is reduced when deficit patterns on sub-scores do not overlap with those of neurodegenerative subtypes. For example, since living things are the most impaired semantic category in AD, relatively poorer scores in this category than in others raise the odds of AD. The pattern of scores should be interpreted in the context of the patient’s situation, e.g. limited education, culturally and linguistically diverse background, co-morbidities, the testing environment, hearing aids, glasses, the tester, etc.
4.3. Difficulties with cognitive testing
Cognitive measures may fail to detect subtle changes or effects of underlying neuropathology because of cognitive reserve, ceiling effects, or floor effects. Cognitive measures should be sufficiently sensitive and specific to detect the effects being tested for, while also being clinically meaningful. Delayed logical memory and face-name tests are examples of tests that are sensitive to amyloid deposition in the brain [42, 43].
Cognition is a heterogeneous construct, so while more sensitive and precise measures may emerge, there will be limits to applying them across different cohorts. Reference norms differ between patient groups. For example, IQ-adjusted norms are used to predict progressive cognitive decline in highly intelligent older individuals. People who have individualised learning strategies (that is, those with high cognitive reserve) will generally perform much better, so neuropsychological testing can be quite noisy. Non-memory tests are generally less predictive of dementia in those with more education. Neuropsychological screening tools like the mini-mental state examination are culturally and linguistically biased, even with the use of an interpreter. Their usefulness can be limited by ceiling effects and by variability in subject performance over time. Cognitive testing may be more subjective than biomarker measurement, as results can be influenced by the behaviour of the person conducting or taking the test, patient fatigue, and the time of day. Cognitive testing is susceptible to attention deficits, so delirium, depression, and distress can result in scores in the dementia range.
4.4. Non-linear decline trajectory
Cognitive decline in ageing and dementia follows a non-linear trajectory. However, over short intervals of only 2–3 years, changes may appear linear. Acceleration over time (i.e. the non-linearity) is usually clearly seen only with data points spanning 7 years and beyond. Cognitive scales may be sensitive to early changes but perform poorly later, or sensitive to later-stage changes but perform poorly earlier. While considerable work is needed to establish which tasks are sensitive at particular stages of the preclinical period, the rule of thumb is that the earlier the stage at which a test is applied, the less precise it is. Still, there is increasing interest in developing tools to detect the earliest manifestations of cognitive decline in order to prescribe remediation strategies or measure the effectiveness of treatment approaches. The more sensitive the measure, the fewer participants are needed in a trial.
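The link between measure sensitivity and trial size can be sketched with the standard normal-approximation formula for comparing mean change between two equal arms, n per arm = 2(z₁₋α/₂ + z₁₋β)² σ² / Δ², where Δ is the treatment effect on the outcome and σ is the standard deviation of change. Here a more sensitive measure is represented, as an assumption, by a smaller σ for the same Δ; all numbers are hypothetical.

```python
import math
from statistics import NormalDist


def n_per_arm(delta, sigma, alpha=0.05, power=0.80):
    """Participants per arm to detect a difference `delta` in mean change
    between two equal-sized arms (two-sided test, normal approximation)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_beta) ** 2 * sigma ** 2 / delta ** 2)


# Same hypothetical treatment effect, measured with a noisier instrument
# versus a more sensitive (less noisy) one.
print(n_per_arm(delta=1.0, sigma=4.0))  # noisier measure
print(n_per_arm(delta=1.0, sigma=2.0))  # noise halved
```

Because σ enters squared, halving the noise of the outcome measure cuts the required number per arm by roughly a factor of four (from 252 to 63 with these inputs).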
4.5. Composite scoring
Composite scoring combines individual test scores into an overall score, smoothing out the noise in any single test. Deriving composite scores by combining different tests is a simple approach that puts the tests on a more equal footing, reduces noise, and allows statistically simpler analysis of the relationships between cognitive domains, such as memory, and imaging data. This simplifies studies that compare groups.
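A minimal sketch of one simple way to build a composite: standardise each test against reference norms (z-scores) and average them, so tests on different raw scales contribute equally. The test names, norms and scores are hypothetical, and a plain z-score average is an illustrative choice; published composites such as ADNI-Mem were derived with more sophisticated psychometric methods.

```python
from statistics import mean


def composite_z(raw_scores, norms, lower_is_better=()):
    """Average of per-test z-scores against reference norms.

    raw_scores: {test: raw score}; norms: {test: (reference mean, reference SD)}.
    Tests listed in lower_is_better (e.g. completion times) are sign-flipped so
    that a higher z always means better performance.
    """
    zs = []
    for test, (ref_mean, ref_sd) in norms.items():
        z = (raw_scores[test] - ref_mean) / ref_sd
        zs.append(-z if test in lower_is_better else z)
    return mean(zs)


# Hypothetical memory tests with made-up reference norms (mean, SD).
norms = {"word_list_recall": (20, 4), "story_recall": (12, 3), "figure_recall": (8, 2)}
subject = {"word_list_recall": 16, "story_recall": 9, "figure_recall": 7}
print(round(composite_z(subject, norms), 2))
```

Here the three per-test z-scores are -1.0, -1.0 and -0.5, giving a composite of about -0.83, a single number that can be compared across groups or correlated with imaging measures.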
The best neuropsychological test batteries are not necessarily the longest or the most comprehensive. A certain degree of precision is required, but there may be no need to be overly precise. People dread having their neuropsychological deficits pointed out, and it can be emotionally difficult for them to sit through a battery of tests. The size of a battery matters less than its precision in detecting degrees of cognitive deficit.
One way to validate such neuropsychological composite scores is to see whether similar results can be obtained from different cohorts. Memory composite scores like ADNI-Mem have been found to be comparable with other memory measures in predicting cognitive change over time, and could also differentiate changes over time. Such composite scores were associated with neuroimaging parameters.
4.6. Serial scoring and practice effects
Serial assessments enable better cognitive evaluation than cross-sectional assessment. For example, the trajectory of serial scores helps to differentiate between dementia and delirium. While serial assessments are better than cross-sectional assessments, they are subject to practice effects. Practice (or re-test) effects occur in non-demented adults. They involve episodic memory for test content, procedural (non-declarative) learning through familiarisation with task procedures, and anxiety reduction through desensitisation. Practice effects are not necessarily a nuisance, as they themselves constitute a test. For example, one study showed that the loss of short-term practice effects portends a worse prognosis after 1 year in patients with MCI. When the Cogstate battery was repeated four times in one day, an attenuated practice effect in non-demented participants detected MCI [50, 51].
5. Principles and challenges in biomarker use
A biomarker is any identifiable biological characteristic, for example in blood, CSF, or on imaging, that can be objectively measured; that accurately represents underlying pathology associated with a disease; and that changes with the risk or expression of disease. Biomarkers in dementia measure directly the neuropathology that is primarily responsible, like the amount of β-amyloid (Aβ) plaque in the Alzheimer’s disease brain (e.g. CSF Aβ42 and amyloid PET), and indirectly its downstream effects, like the amount of neuronal damage (e.g. CSF tau and volumetric MRI) or synaptic dysfunction (e.g. FDG PET). Biomarkers should not be confused with genetic risk factors, e.g. the Apolipoprotein E ε4 polymorphism.
The diagnostic goals of biomarkers in dementia are to establish whether significant neuropathology is present in people at risk of developing dementia, so as to increase confidence in making a dementia subtype diagnosis such as AD or non-AD in atypical cases, to reduce subject numbers in clinical studies, and to reduce heterogeneity in a study cohort. The prognostic goals of biomarkers are to assess the risk and proximity of future decline, to serve as surrogate outcome measures demonstrating effects on downstream targets of neurodysfunction and neurodegeneration, to help define the disease stage, and to reduce trial duration. The theragnostic goals of biomarkers are to serve as end-point measures that prove engagement of disease-modifying treatment with Aβ plaques, and to guide selection of the drug of choice.
Because of the added value that biomarkers bring, they enable dementia studies to be designed and their hypotheses tested much more rigorously. For example, the development of disease-modifying anti-amyloid therapies is now assisted by in vivo cerebral Aβ imaging, which reduces sample size through better selection of eligible trial volunteers and helps to evaluate treatment efficacy. Biomarkers can help in planning which drugs are safe for AD drug trials by revealing unexpected effects in the brain. This could improve safety and minimise costs, which would in turn enable more drugs to be trialled while unsafe ones are avoided. Nonetheless, at present, biomarkers are not used routinely in dementia management in most clinical settings. On top of limited access and limited support from current clinical guidelines, no neurodegenerative disease-modifying drugs are currently licensed for routine use. However, should disease-modifying therapy become available, expanding infrastructure to meet the demand for biomarkers will become a subject of further debate. The potential usefulness of biomarkers is fully dependent on whether a cure for AD or non-AD dementias can be found.
The fundamental consideration with any assessment approach in dementia, whether clinical bedside tests or biomarkers, is how accurately a measure detects what it is meant to detect. To be used as surrogates for clinical measures, biomarkers need to be validated as reflecting clinical and/or pathological disease processes, taking into account the phase of disease in which they have high specificity and sensitivity [52, 53]. Standardising procedures will reduce measurement error in clinical trials. Biomarkers and their procedures should perform similarly in everyone, regardless of race, language or culture. Ideally, biomarkers and clinical markers should be strongly associated yet measured independently of each other, so that they can be used both as recruitment criteria and as outcome measures without circularity. However, validating the relationship between biomarker change and cognitive outcome is an imperfect science, and considerable challenges remain in establishing the relationship between biological and cognitive measures throughout the chronology of the preclinical phase of AD.
A biomarker must be measurable and practical to use clinically, have significant clinical implications if the result is positive, and have clinical utility in terms of improving confidence in diagnosis, prognosis or treatment choice. Unlike cognitive assessments, biomarkers offer more objective results and are considered complementary to memory testing. They are highly valued for their ability to detect underlying structural change or neuropathology in vivo. However, the evaluation of biomarkers is an expensive endeavour that cannot be carried out without collaboration between pharmaceutical companies and public institutions.
The reproducibility of biomarker results can be affected by many factors. For example, biomarkers and cognitive tests can give discrepant results because biomarkers may plateau before cognitive change occurs. Individual biomarkers (amyloid PET, MRI, FDG PET, and CSF) in the ADNI cohort vary in their rate of change during disease progression, such that they are better fitted by sigmoidal models than by linear models. An ideal biomarker should have a sensitivity, specificity, and positive and negative predictive values above 80% for whatever it is supposed to detect [55, 56]. Because biomarkers are expensive, their risks, benefits and costs have to be discussed with the patient.
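To make the 80% criterion concrete, the sketch below computes sensitivity, specificity, positive predictive value (PPV) and negative predictive value (NPV) from a 2 × 2 table of biomarker results against a reference standard such as histopathological confirmation. The class and function names and all counts are hypothetical illustrations, not data from any cited study. The second case keeps sensitivity and specificity unchanged but lowers disease prevalence, showing that predictive values depend on the population in which a biomarker is used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Biomarker result cross-tabulated against a reference standard (hypothetical)."""
    tp: int  # biomarker positive, disease present
    fp: int  # biomarker positive, disease absent
    fn: int  # biomarker negative, disease present
    tn: int  # biomarker negative, disease absent


def accuracy_metrics(c: ConfusionCounts) -> dict:
    """Return sensitivity, specificity, PPV and NPV as proportions (0-1)."""
    return {
        "sensitivity": c.tp / (c.tp + c.fn),
        "specificity": c.tn / (c.tn + c.fp),
        "ppv": c.tp / (c.tp + c.fp),
        "npv": c.tn / (c.tn + c.fn),
    }


def meets_ideal_criterion(c: ConfusionCounts, threshold: float = 0.80) -> bool:
    """True only if all four metrics exceed the threshold (80% by default)."""
    return all(value > threshold for value in accuracy_metrics(c).values())


# Hypothetical validation cohort: 100 subjects with disease, 100 without.
balanced = ConfusionCounts(tp=85, fp=10, fn=15, tn=90)
print(meets_ideal_criterion(balanced))  # True

# Same sensitivity (0.85) and specificity (0.90), but 10% prevalence.
low_prevalence = ConfusionCounts(tp=85, fp=90, fn=15, tn=810)
print(meets_ideal_criterion(low_prevalence))  # False: PPV is about 0.49
```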
5.2. Operationalisation challenges
The challenges in operationalising biomarkers for clinical practice are: standardising techniques; harmonising practices between settings; and developing the infrastructure needed for community access. In applying biomarkers in the clinical setting, we need to consider whether noise and variability will present a critical issue in cross-sectional or longitudinal evaluation. Different biomarkers provide different levels of certainty, and are sensitive and specific at different disease stages and in different disease subtypes. Single time-point cross-sectional measures are less predictive of progression and outcome than repeated measurements in longitudinal data, but longitudinal designs are in turn limited by the need for ongoing participation. For most biomarkers, progression is more strongly associated with cognitive decline than baseline values are. This suggests that clinical trials that need to recruit at-risk subjects could be improved by enriching the study sample on biomarker progression rather than on baseline values. Further studies are warranted to estimate the incremental gain in clinical trial statistical power from using biomarker progression criteria.
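A minimal sketch of enrichment by progression rather than by baseline value, assuming each subject's progression is summarised as an ordinary least-squares slope of the biomarker against time. The function names, subject IDs, SUVR values and the 0.02 SUVR/year threshold are all hypothetical; they were chosen only to show that a subject with a higher but stable baseline is excluded while a subject with a lower but rising value is included.

```python
def annual_rate_of_change(times_years, values):
    """Ordinary least-squares slope of a biomarker against time (units per year)."""
    n = len(times_years)
    if n < 2 or n != len(values):
        raise ValueError("need at least two paired time points")
    mean_t = sum(times_years) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times_years)
    if sxx == 0:
        raise ValueError("time points must not all be identical")
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times_years, values))
    return sxy / sxx


def enrich_by_progression(cohort, min_rate):
    """IDs of subjects whose estimated annual change is at least min_rate."""
    return [
        subject_id
        for subject_id, (times, values) in cohort.items()
        if annual_rate_of_change(times, values) >= min_rate
    ]


# Hypothetical amyloid PET SUVRs measured at years 0, 1 and 2.
cohort = {
    "A": ([0, 1, 2], [1.30, 1.35, 1.40]),  # lower baseline, rising
    "B": ([0, 1, 2], [1.45, 1.45, 1.46]),  # higher baseline, nearly flat
}
print(enrich_by_progression(cohort, min_rate=0.02))  # ['A']
```

A baseline-only rule with, say, a 1.40 threshold would have selected subject B instead, which is the contrast the paragraph above draws.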
Biomarkers should be used only to provide additional information that cannot be obtained through routine history-taking, physical examination, and investigations. Their use is more appropriate when there is some uncertainty in the clinical picture. All test results must be carefully interpreted in the context of a patient’s clinical presentation. All tests have inherent limitations, so over-reliance on any test without first considering the relevant clinical information is likely to lead to over- or under-diagnosis, with potentially negative consequences. Hence clinical judgement is needed to consider how additional information improves the probability of a dementia subtype diagnosis or helps guide treatment. Over-emphasising biomarkers at the expense of the context of an individual case may end up prioritising less important aspects of that case.
Until an effect on a particular biomarker is reasonably likely, by widespread evidence-based agreement, to predict clinical benefit, that biomarker should not be used routinely as a surrogate outcome measure in AD. The specific potential benefits of biomarkers as individuals transition from normal cognition to SCI, from SCI to MCI, or from MCI to dementia need to be identified and measured. Although currently available biomarkers still require further validation, advancement in the biomarker field is approaching a plateau, as there has still been no biomarker breakthrough that captures processes upstream of Aβ accumulation.
Finally, it is wrongly assumed that biomarkers are equally sensitive and specific for detecting neuropathology across the age range and across disease stages. For example, the standardised uptake value ratio (SUVR) is calculated using cerebellar grey matter as the reference region; in late to advanced stages, amyloid may also build up in the cerebellum, raising the reference value and so reducing the SUVR. This has implications for longitudinal studies: the apparent reduction in amyloid load after the plateau with ageing may falsely suggest that treatments are working.
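The arithmetic behind this pitfall is simple: SUVR is a ratio, so any rise in the denominator lowers it even when cortical uptake is unchanged. The sketch below assumes SUVR is the mean target-region uptake divided by the mean cerebellar grey matter uptake; the function name and the regional values (arbitrary units) are hypothetical, and real pipelines add steps such as image registration and partial-volume handling that are not shown.

```python
from statistics import mean


def suvr(target_uptake, reference_uptake):
    """Standardised uptake value ratio: mean target-region uptake divided by
    mean reference-region uptake (here, cerebellar grey matter)."""
    reference = mean(reference_uptake)
    if reference <= 0:
        raise ValueError("reference-region uptake must be positive")
    return mean(target_uptake) / reference


# Hypothetical regional uptake values (arbitrary units).
cortex = [1.9, 1.7, 1.8]          # cortical uptake held constant
cerebellum_early = [1.2, 1.2]
cerebellum_late = [1.4, 1.4]      # assumes amyloid has reached the reference region

print(round(suvr(cortex, cerebellum_early), 3))  # 1.5
print(round(suvr(cortex, cerebellum_late), 3))   # 1.286
```

The cortical signal is identical in both calls, yet the SUVR falls; in a longitudinal treatment study this drop could be misread as a treatment effect.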
5.3. Cerebrospinal fluid biomarkers
CSF tau levels increase as tau leaks from injured neurons, and CSF Aβ levels decrease, possibly because Aβ is aggregating into plaques in the cortex. The potential benefits of using cerebrospinal fluid biomarkers in AD research studies and prevention trials are the ability to: identify the presence of AD pathology in the absence of cognitive symptoms; evaluate therapeutic target engagement; stage disease pathology; track progression of disease pathology; evaluate potential therapy-related disease modification; assess multiple analytes cost-effectively in a single sample; and allow better trial design with fewer subjects, shorter duration, and assessment of effects on the underlying disease pathology.
CSF biomarkers are currently not routinely recommended for individual use in clinical practice. The disadvantage of CSF is that it requires a lumbar puncture: not everyone is willing to undergo one, and anticoagulation treatment is increasingly used in the elderly. Hence CSF sampling is not suitable for population studies. Other challenges in the use of CSF include the lack of protocol and assay standardisation, sub-optimal assay reproducibility, difficulty in defining normal vs. abnormal cut-off values, misperceptions regarding the safety, tolerability and utility of CSF collection and analysis, and the need to develop and validate assays in the presence of a therapeutic agent, especially with antibody-based therapies. Agreement between CSF Aβ and florbetapir PET in ADNI subjects is reasonable but not strong (κ = 0.72), both cross-sectionally and longitudinally.
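Cohen's κ, the statistic quoted above, corrects the raw agreement between two binary classifications for the agreement expected by chance: κ = (p_o - p_e) / (1 - p_e), where p_o is the observed proportion of agreement and p_e is the agreement expected from the marginal positivity rates. The sketch below is a minimal implementation of that standard formula; the function name and the 100-subject table are hypothetical and are not the ADNI data.

```python
def cohens_kappa(both_pos, csf_pos_only, pet_pos_only, both_neg):
    """Cohen's kappa for two binary ratings (CSF vs. amyloid PET) from a 2x2 table."""
    n = both_pos + csf_pos_only + pet_pos_only + both_neg
    p_observed = (both_pos + both_neg) / n
    p_csf_pos = (both_pos + csf_pos_only) / n  # marginal rate of CSF positivity
    p_pet_pos = (both_pos + pet_pos_only) / n  # marginal rate of PET positivity
    # Chance agreement: both positive by chance plus both negative by chance.
    p_expected = p_csf_pos * p_pet_pos + (1 - p_csf_pos) * (1 - p_pet_pos)
    if p_expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical table: 85% raw agreement, 50% expected by chance.
print(round(cohens_kappa(both_pos=40, csf_pos_only=10, pet_pos_only=5, both_neg=45), 2))  # 0.7
```

The example shows why κ sits well below raw agreement: 85 of 100 subjects agree, but half of that agreement would be expected by chance alone.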
An analysis of within-site and inter-site assay reliability across seven centres, using aliquots of CSF from normal control subjects and AD patients, showed within-centre coefficients of variation of 5.3% for Aβ, 6.7% for t-tau, and 10.8% for p-tau, and between-centre coefficients of variation of 17.9%, 13.1% and 14.6%, respectively. The reason for this inter-laboratory imprecision is not well understood.
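The coefficient of variation (CV) is the standard deviation expressed as a percentage of the mean. The sketch below contrasts a within-centre CV (spread of replicate measurements inside each laboratory, averaged over laboratories) with a between-centre CV (spread of the laboratory means). This is a simplified calculation under that assumption, not necessarily the method used in the multicentre study cited above, and the function names and Aβ42 values are hypothetical.

```python
from statistics import mean, stdev


def cv_percent(values):
    """Coefficient of variation (%): sample standard deviation / mean x 100."""
    if len(values) < 2:
        raise ValueError("need at least two measurements")
    return stdev(values) / mean(values) * 100


def within_centre_cv(centres):
    """Average of each centre's CV across its own replicate measurements."""
    return mean([cv_percent(replicates) for replicates in centres.values()])


def between_centre_cv(centres):
    """CV of the centre means, i.e. the spread between laboratories."""
    return cv_percent([mean(replicates) for replicates in centres.values()])


# Hypothetical Aβ42 replicate measurements (pg/mL) of one shared CSF aliquot.
centres = {
    "centre_1": [100, 110, 90],
    "centre_2": [120, 130, 110],
    "centre_3": [80, 90, 70],
}
print(round(within_centre_cv(centres), 1))   # 10.3
print(round(between_centre_cv(centres), 1))  # 20.0
```

As in the study described above, each laboratory can be reasonably consistent with itself while systematic offsets between laboratories produce a much larger between-centre CV.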
Determining the threshold for a positive or negative biomarker result is to some extent arbitrary and can be problematic, yet the threshold chosen may significantly influence how subjects are categorised and what outcomes are observed. Similarly, the essential difference between people with MCI and those considered to have normal cognition is evidence of objective impairment on cognitive test scores, even though the cut-off scores are arbitrarily defined.
Different approaches to determining cut-offs yield different proportions of positive results, and produce a band of intermediate results close to the cut-offs. A case can be made for age-adjusted cut-offs rather than a single fixed number, but this would increase the complexity of the analyses. Examples of cut-off approaches include clustering analysis, the 95th percentile, an iterative outlier approach, an absolute cut-off (e.g. SUVR over 1.50 for PiB scans), and the control mean plus two standard deviations.
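Three of the approaches listed above (control mean plus two standard deviations, the 95th percentile of controls, and a fixed absolute SUVR cut-off) can be sketched in a few lines; clustering analysis and the iterative outlier approach are not shown. The function names, the control SUVRs and the ±0.05 "intermediate" band are hypothetical choices made for illustration only.

```python
from statistics import mean, quantiles, stdev

ABSOLUTE_PIB_SUVR_CUTOFF = 1.50  # absolute cut-off example quoted in the text


def cutoff_mean_plus_2sd(control_values):
    """Control mean plus two sample standard deviations."""
    return mean(control_values) + 2 * stdev(control_values)


def cutoff_95th_percentile(control_values):
    """95th percentile of control values (n=20 gives the 5th, 10th, ..., 95th)."""
    return quantiles(control_values, n=20)[-1]


def classify(value, cutoff, band=0.05):
    """Label a result, treating values within +/- band of the cut-off as intermediate."""
    if abs(value - cutoff) <= band:
        return "intermediate"
    return "positive" if value > cutoff else "negative"


# Hypothetical SUVRs from amyloid-negative controls.
controls = [1.0, 1.1, 1.2, 1.1, 1.0, 1.2, 1.1, 1.0, 1.2, 1.1]
for name, cutoff in [
    ("mean + 2 SD", cutoff_mean_plus_2sd(controls)),
    ("95th percentile", cutoff_95th_percentile(controls)),
    ("absolute (PiB)", ABSOLUTE_PIB_SUVR_CUTOFF),
]:
    print(f"{name}: cut-off {cutoff:.2f}, SUVR 1.30 -> {classify(1.30, cutoff)}")
```

With these made-up values, the same SUVR of 1.30 is labelled intermediate, positive and negative under the three methods respectively, which illustrates how the choice of cut-off approach alone can change a subject's category.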
CSF may become abnormal before PET, and the discordance between low CSF Aβ42 levels and PiB PET depends on the cut-offs used for both. Discordant cases are usually those in which one or both biomarker results lie close to the cut-off.
Cut-offs have implications for the design of AD trials. Lower cut-offs for amyloid positivity ensure that sampled subjects are more likely to have AD, while higher cut-offs might avoid exposing individuals to the risks of treatment with little chance of benefit.
6. Ethical challenges in the disclosure of biomarker results
By and large, the medical community tends to blur the distinction between what is reserved strictly for research and what is applied in routine clinical practice. At present, the boundaries between research guidelines and clinical practice in dementia are not distinct. Research criteria have a strong potential to influence clinical practice, and terminology used in research settings is easily adopted into routine clinical practice.
Biomarkers in dementia give risk information only, and results can be inconclusive. Until a cure is developed, the gap between advances in diagnosis and advances in treatment continues to grow. A positive result is not a diagnosis: not everyone with a positive biomarker result will develop AD. Potential harms of study participation include confusion over inconclusive results, wrong diagnoses, stigmatisation, exploitation, discrimination, negative affective reactions, escalating insurance premiums, loss of the right to drive, additional work conditions, and legal over-protection that can disadvantage employers.
6.2. Disclosure of biomarker results
Disclosure of AD biomarker results is an important consideration in dementia trials. Study designs that reveal increased risk may increase willingness to participate. People may give their time and effort to studies because knowing their risk may allow them to lower it. Similarly, investigators are more in favour of disclosing scan results to people with MCI than to healthy controls. Communicating AD risk information has wide-ranging ethical, psychological, behavioural, and social implications, and people hold different views about whether they actually want to learn their results. Periodic assessment of mood and well-being, access to appropriate care if problems arise, and a designated support partner are important considerations for participation in studies.
The practice in ADNI has been not to disclose biomarker results to participants. Yet being in the Anti-Amyloid Treatment in Asymptomatic Alzheimer’s (A4) study means that a participant is, in effect, declaring a positive amyloid PET scan. No disclosure would be needed in the A4 study if it had been designed as a three-arm randomised controlled trial including normal controls. However, this would require larger sample sizes, escalating costs and complicating the informed consent process.
Although biomarker use has been limited to research, clinicians in tertiary care are often involved in biomarker research and have an interest in using the results to guide the management of their patients. Before biomarkers were officially approved for routine clinical use, specialist clinicians were already applying biomarker results informally in clinical practice, with the informed consent of their patients. It was this openness to accumulating such experience that drove thinking and enabled the planning of biomarker validation studies. Clinicians are motivated to refer their patients to biomarker research studies, and patients are motivated to participate, when they can benefit from obtaining a copy of the results, even if the biomarkers are not validated.
The more opportunities there are to use biomarkers in the clinical setting, the more we are going to find cases of amyloid PET scans showing intermediate levels of amyloid in the brain, particularly as cases requiring biomarkers to improve the diagnostic work-up tend to present with some degree of diagnostic dilemma. While these cases are the hardest to diagnose, they are also potential opportunities to further our understanding.
Both positive and negative biomarker results can benefit patients and families. A negative result brings relief and avoids unnecessary further clinical testing. A positive result, when handled well, enables early decision-making while participants still have capacity, allows resources to be channelled efficiently, and encourages healthy lifestyle changes.
6.3. Evidence-based disclosure practice
The problem with AD is not merely whether one has plaques in the brain, or whether people want to know if they have the disease, but also how long they have before they need to move into residential care and, if they do have the disease, whether they are eligible for costly drug treatment. Another consideration is what people will do once they have that information. Disease-modifying treatment, currently available only through participation in drug trials, may offer a glimmer of hope, but it has side effects and is not guaranteed to work. Clinicians need to be sensitive to the negative impact that breaking bad news can have on patients, and be ready to provide support such as disease counselling. Regardless of whether patients want to know, the disease will progress, and a confident diagnosis of AD will help them and their relatives make firm plans.
The need to mitigate potential harm must be balanced against the patient’s right to know their result. Cognitive biases in affective forecasting may lead people to over- or under-estimate their reactions to negative events. Empirically validated methods of disclosing risk information can inform practice and policy, and avoid speculation about how long and how intensely negative reactions will last after disclosure. The full long-term downstream effects of finding out, and of how individuals and families interpret and handle the information, are not known, so these people should be followed up to observe the effects of disclosure.
One study followed 148 cognitively normal people participating in a randomised clinical trial of genetic testing for Alzheimer’s disease for 1 year after risk assessment and E4 disclosure; those who tested positive were 5.76 times more likely to have altered their long-term care insurance than those who did not receive E4 genotype disclosure. Nonetheless, the broader literature suggests that receiving a diagnosis of MCI or AD did not increase depression or anxiety in patients or their carers in the short term, and anxiety often decreased. One study that assessed the impact of genetic risk assessment on adult children of people with AD showed slightly higher impact-of-event scores in E4 carriers than in non-carriers at 6 weeks, but the effect had washed out by 6 months. These findings suggest that E4 status can be disclosed safely to patients without a risk of long-term depression or anxiety.
7. Final word
Other than finding a cure, promoting healthy brain ageing is also important. This can be done by identifying and promoting the factors that support longevity and healthy brain ageing: staying mentally and physically active, staying socially engaged, controlling cardiovascular risk factors such as weight, blood pressure, cholesterol, and blood sugar, quitting smoking, and eating a balanced diet.
The need to be persistent, to innovate and to move forward is urgent despite numerous challenges. Whether we choose to address the conundrums or ignore them because of technical difficulties, the tsunami of the dementia epidemic will hit us in a few short years. Fortunately the dementia field has been very motivated. In spite of the numerous challenges in developing new models of understanding, diagnostic criteria, clinical markers, biomarkers, treatment, and improving diagnostic accuracy, the field is marching towards addressing, and intervening in, AD in its early stages.
Finally, attention to nuances and caveats, and small adjustments to study design, can improve efficiency and study quality, reduce risk, and yield new insights.
Mrs. Judith de Grauw assisted with baseline copyediting of this chapter.
Conflict of interest
No conflict of interest to declare.