Table 1. Number of persons classified with ‘significant CAD’ and ‘No CAD’ by a CAD Finder Test among a population of male Olympic sprinters (low prevalence population) (93% true positives (465); 20% false positives (1900)).
The number of testing modalities available for the diagnosis of significant coronary artery disease has grown over the last few decades. Inappropriate utilization of these tests often leads to: (i) further investigation, (ii) physician and patient uncertainty, (iii) harm and poor outcomes, and (iv) increase in health care costs. An informed approach to the evaluation of patients with stable ischemic chest pain can lead to efficient use of resources and better outcomes. Throughout this chapter, we will explain how age-old statistical principles remain relevant in this modern era of technological advancement.
- Bayes’ theorem
- coronary artery disease
- ischemic heart disease
- appropriate use criteria
Cardiovascular diseases (CVDs) remain a leading cause of death across the world. Ischemic heart disease (IHD) is one of the largest contributors to these deaths both globally and in the United States of America and contributes to years of productivity loss due to complications from disease sequelae. These include non-fatal myocardial infarction, stable angina pectoris and symptomatic ischemic cardiomyopathy. Although the number of deaths resulting from fatal MI has been decreasing, the number of quality years lost from IHD complications has been increasing. The decrease in mortality is largely due to interventions for the management of acute coronary syndromes (ACS) and early percutaneous coronary intervention (PCI) for ST-segment elevation MI [4, 5].
There are a variety of scoring systems/tools which have been used to predict (with varying degrees of success) which patients are likely to have obstructive coronary artery disease as a cause of their chest pain.
The clinical decision-making tool most frequently used to estimate the likelihood of CAD on the basis of patient characteristics is the Diamond-Forrester classification. It is derived from the application of Bayesian principles and has formed the backbone of many of the guideline statements for the management of patients with suspected stable ischemic heart disease [9, 10].
Due to the limitations of the history and physical examination in determining the likelihood of disease, clinicians have utilized various testing modalities to further increase certainty. Evaluation of chest pain has been no different. The number of available testing strategies has increased over the last few decades, and the technologies underlying these tests are constantly being refined. Despite the growing number of options, many clinicians remain unsure how to utilize these modalities [5, 11]. The increasing utilization of these tests often leads to: (i) further investigation, (ii) physician and patient uncertainty/anxiety, (iii) harm and (iv) increase in health care costs.
We aim to cover the following in our chapter review:
Briefly simplify the principles of Bayes’ theorem. Give a brief overview of the concepts of how the post-test probability of disease varies based on pre-test likelihood and test characteristics.
Review key stratification methods for the likelihood of significant CAD.
Outline the test characteristics of the main functional and anatomic imaging modalities used in chest pain evaluation based on available evidence.
Use practical examples to show how the use of low sensitivity/specificity testing in varying patient groups can lead to post-test uncertainty and the need for further testing.
Explain how many of the currently available appropriate use criteria guiding testing conflict with Bayesian principles.
Bayes’ theorem has been previously applied in many clinical scenarios, including the evaluation of chest pain [13, 14, 15, 16]. This chapter will neither be burdened with complex statistical formulas nor difficult to follow calculations. Rather, it will provide a practical approach to decision making and dealing with diagnostic uncertainty in patients with stable chest pain. Though many of the concepts expressed here are not original to the authors, we hope that this review will provide a comprehensive approach to testing—considering patient outcomes and resource utilization. The almost three century old principles of the Bayesian approach to decision making are just as relevant today with the growing technological advancements in the new era of precision medicine.
2. Bayesian approaches in clinical decision making
2.1 Clinicians and statistics
Health-care professionals at all levels of training and expertise often struggle with conceptualizing many statistical and probability ideas [17, 18]. Even for the most experienced mathematicians, the complex calculations involved in large decision making scenarios using Bayesian approaches are hard to comprehend, let alone compute.
Following Bayesian approaches in clinical decision making requires an understanding of some key statistical concepts. We hope to present these in an easy-to-follow format that is based in evidence.
The widely referenced Harvard Medical School cognitive experiment (see Appendix A) (and the research which followed) has provided much useful insight into how we approach probability testing. Two of these insights are:
Medical professionals often fall victim to what is referred to as the “base-rate neglect” fallacy. Herein they place unwarranted reliance on the outcome of a test, positive or negative, ignoring the relevance of prevalence in the population.
Stated differently: A positive test result for a rare disease is more likely to be a false positive, regardless of how well the test can detect the presence of disease (sensitivity). The converse is also true, that a negative test result in a population in which there is a very high prevalence, is more likely to be a false negative.
When results are presented as frequencies rather than probabilities, they are easier to follow. Take this example from Fenton and colleagues:
Out of 1 million people 1000 are likely to die from treatment A, but only 10 are likely to die from treatment B.
The probability of dying from treatment A is 0.001, but the probability of dying from treatment B is 0.00001.
Before applying Bayes’ theorem to the evaluation of chest pain, we will review some of the key statistical and probability concepts necessary to gain an understanding of Bayesian approaches.
2.2 The characteristics of diagnostic tests
There are a few characteristics of diagnostic tests which are paramount to the understanding and use of Bayesian arguments. These include sensitivity, specificity, positive predictive value, negative predictive value and likelihood ratios. There are many factors which influence the reliability of these values.
Sensitivity and specificity are often explained in complex statistical terminology, however, they can be defined very simply:
Sensitivity (Sens.): The ability to pick up disease when disease is present
Specificity (Spec.): The ability to rule out disease when there is no disease
Let us use the example of a hypothetical test designed to detect patients with CAD called ‘CAD Finder’. We have two Groups of patients, Groups A and B (Figure 1). The 100 patients in Group A have proven CAD and the 100 in Group B are proven to be without CAD. To measure the sensitivity of the ‘CAD Finder’ we use it on patients in Group A and see how many have a positive result (93%). This is the sensitivity of the ‘CAD Finder’ for picking up CAD. To measure specificity, we perform the ‘CAD Finder’ on Group B and see how many have a negative result (80%). This would be the specificity for the ‘CAD Finder’ for ruling out CAD. It should also be noticed that, 7 out of 100 patients with CAD will falsely test negative and 20 out of 100 without CAD will get a false positive result.
The sensitivity and specificity of any test or maneuver are usually compared to a “gold-standard” test. In the case of suspected coronary artery disease, that test is invasive coronary angiography.
In clinical practice, it is often more helpful to gauge the performance of a test based on the prevalence of the disease of interest. This introduces the concepts of positive and negative predictive value.
Positive predictive value (PPV): The probability that disease is truly present when the test is positive.
Negative predictive value (NPV): The probability that disease is truly absent when the test is negative.
PPV and NPV both depend on the prevalence of a disease in a population: as prevalence rises, PPV increases and NPV decreases. The relevance of this becomes apparent when tests which have been “studied” in one subgroup are applied in another population with different characteristics and disease prevalence. This brings us to our final concept worth defining:
Likelihood ratios: “the likelihood that a given test result would be expected in a patient with the target disorder compared to the likelihood that that same result would be expected in a patient without the target disorder.” 
Using the formula:
LR+ = sensitivity/(1-specificity).
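As a worked illustration, the likelihood ratios of the hypothetical ‘CAD Finder’ (sensitivity 93%, specificity 80%) can be computed as in the minimal sketch below. The function name is our own, and the negative likelihood ratio, LR− = (1 − sensitivity)/specificity, is the standard companion formula rather than one stated in the text above.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) for a test, with inputs given as fractions (0-1)."""
    # LR+ : how much more often a positive result occurs with disease than without.
    lr_pos = sensitivity / (1.0 - specificity)
    # LR- : how much more often a negative result occurs with disease than without
    # (standard formula, assumed here; it is not given in the chapter text).
    lr_neg = (1.0 - sensitivity) / specificity
    return lr_pos, lr_neg


# Hypothetical 'CAD Finder' values taken from the example above.
lr_pos, lr_neg = likelihood_ratios(0.93, 0.80)
print(f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.4f}")
```

For the ‘CAD Finder’ this gives an LR+ of about 4.65: a positive result is roughly four to five times more likely in a patient with CAD than in one without.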
2.3 The importance of disease prevalence, population characteristics and cut-off values
The reliability of a test is dependent on the prevalence of the disease of interest in the population in which it is studied. Bayesian methods allow us to apply known test characteristics to any population, once the population characteristics and prevalence of disease are known.
We will illustrate the outcomes when the same hypothetical test ‘CAD Finder’ is used in two different populations: (i) male Olympic sprinters (Table 1) and (ii) male elderly veterans (Table 2). Continuing with our hypothetical exercise, it is noted that our ‘CAD Finder’ is best at differentiating between ‘disease’ and ‘no disease’.
Let us say that in a population of male Olympic sprinters, 5% have significant CAD and 95% were without. At the cut-off point for a positive result—93% of the ‘Significant CAD’ disease group would test positive (true positive) but 20% of the ‘No CAD’ group would also test positive (false positive). If the CAD Finder is used on a population of 10,000 similar Olympic sprinters, the outcomes would be as shown in Table 1.
2.3.1 Performance of the ‘CAD finder’ in detecting significant CAD
When our ‘CAD Finder’ is used on our population of 10,000 Olympic sprinters, 2365 would test positive. However, only 465 out of those 2365 (20%) truly have significant CAD. This means that 1900 athletes will be false positives.
Rule 1. If a disease is uncommon in a population (i.e.) low prevalence—in this case 5% prevalence, any positives are more likely to be false positives.
In the population of male elderly veterans, 95% have significant CAD and 5% are without disease. The outcomes of the ‘CAD Finder’ on a population of 10,000 similar male veterans are shown in Table 2.
Out of the 1065 veterans who test negative for CAD, 665 (62%) actually have significant CAD. This is a large proportion of false negative results. Compare that to the population of Olympic sprinters where 35 out of the 7635 negative test results (<1%) are false negatives.
Rule 2. If a disease is highly prevalent in a population, in this case 95%, any negatives are more likely to be false negatives.
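The counts behind Rules 1 and 2 can be reproduced with a short calculation. The sketch below is only an illustration: the function name and dictionary keys are our own, and it assumes that sensitivity and specificity stay fixed across populations.

```python
def confusion_counts(population, prevalence, sensitivity, specificity):
    """Expected true/false positives and negatives for a screened population."""
    diseased = population * prevalence
    healthy = population - diseased
    tp = sensitivity * diseased   # diseased who test positive
    fn = diseased - tp            # diseased who test negative
    tn = specificity * healthy    # healthy who test negative
    fp = healthy - tn             # healthy who test positive
    return {"TP": round(tp), "FP": round(fp), "FN": round(fn), "TN": round(tn)}


# 'CAD Finder' (sensitivity 93%, specificity 80%) in the two populations.
sprinters = confusion_counts(10_000, 0.05, 0.93, 0.80)
veterans = confusion_counts(10_000, 0.95, 0.93, 0.80)

# Share of positive results that are true positives (sprinters).
ppv_sprinters = sprinters["TP"] / (sprinters["TP"] + sprinters["FP"])
# Share of negative results that are false negatives (veterans).
fn_share_veterans = veterans["FN"] / (veterans["FN"] + veterans["TN"])

print(sprinters, f"{ppv_sprinters:.0%}")
print(veterans, f"{fn_share_veterans:.0%}")
```

For the sprinters this yields 465 true positives among 2365 positive results (about 20%), and for the veterans 665 false negatives among 1065 negative results (about 62%), matching the figures quoted above.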
2.3.2 Performance of the ‘CAD finder’ in detecting persons without CAD
Since 95% of the population of athletes (9500) are truly without significant CAD, in the absence of any testing at all, if you told a patient in this population, they were without disease you would be correct most of the time. However, if we relied on our ‘CAD Finder’, our ability to detect athletes without CAD decreases from 95 to 76.3% (7635 of our Olympic sprinters test negative).
This illustrates that in this population of Olympic sprinters, our CAD Finder test performs very unreliably despite its good characteristics (Sensitivity 93%, Specificity 80%).
Rule 3. A test is unreliable in picking up disease when the prevalence of disease in the population is less than the value of the ‘false positive rate/true positive rate’.
In our above example with the ‘CAD Finder’, the false positive rate/true positive rate = 20/93 ≈ 21.5%.
2.4 The Bayesian method
2.4.1 What is Bayes’ theorem
Bayes’ theorem (or more accurately Bayes’ Rule) was first described by the 18th century Presbyterian minister Thomas Bayes in an essay published in the Philosophical Transactions of the Royal Society of London in 1764, in which he describes solving a complex problem of chances involving billiard balls. Since Bayes’ theorem was first theorized, it has been expressed in a variety of ways. We will use three formats that are relevant to our discussion.
In mathematical terms, Bayes’ theorem is expressed as follows:
Pr(A|X) = Pr(X|A) × Pr(A) / [Pr(X|A) × Pr(A) + Pr(X|~A) × Pr(~A)]
In this formula Pr(A|X) is the chance of having disease (A) given a positive test (X); Pr(X|A) is the chance of a positive test (X) given the presence of disease (A); Pr(A) is the pretest probability of the disease; Pr(~A) is the pretest probability of not having disease and Pr(X|~A) is the chance of a positive test (X) even if there is no disease (~A).
In plain English, Bayes’ theorem states: “The probability of having a disease based on a selected test (after the test is done), is related to the pre-test probability that the patient has the disease (or its prevalence) and the test’s sensitivity and specificity.”
In diagrammatic form (Figure 2), the posttest probability is proportional to the pretest probability times the likelihood of the disease in the population. This generates the simplest representation of Bayesian statistics:
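The displayed relationship is not reproduced in this text. A common way of writing it, which we assume here, is the odds form: post-test odds = pre-test odds × likelihood ratio. The minimal sketch below converts between probability and odds to apply it; the function name is our own.

```python
def post_test_probability(pre_test_probability, likelihood_ratio):
    """Apply the odds form of Bayes' rule (assumed form, see lead-in).

    Valid for pre-test probabilities from 0 up to, but not including, 1.
    """
    pre_test_odds = pre_test_probability / (1.0 - pre_test_probability)
    post_test_odds = pre_test_odds * likelihood_ratio
    return post_test_odds / (1.0 + post_test_odds)


# Positive 'CAD Finder' result (LR+ = 0.93 / 0.20) in a sprinter with 5% pre-test probability.
p = post_test_probability(0.05, 0.93 / 0.20)
print(f"Post-test probability after a positive result: {p:.1%}")
```

The result, roughly 20%, agrees with the 465 true positives out of 2365 positive tests found for the Olympic sprinters in Section 2.3.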
3. Practical use of Bayesian principles in cardiovascular testing
Given that accurate application of Bayes’ statistics relies on the updating of probabilities (based on the acquisition of new evidence), any recommendations stated hereafter are only as current as present medical knowledge and are influenced by the writers. Our aim therefore is not to provide guidelines on the evaluation of patients with stable chest pain, but to convey a sense of comfort with using these tools, to allow the reader to generate their own informed approach to patient care.
There are generally two aims of performing cardiac testing in patients with stable chest pain (i) to determine which patients are likely to have obstructive CAD and (ii) to predict outcome or prognosis. In our discussion that follows, we will review how current testing modalities achieve either or both targets.
3.1 Determining pre-test probability in patients with stable chest pain
There are many factors which must be accounted for by the clinician when determining the pre-test probability of a patient (without known CAD) having coronary artery disease as a cause of their chest pain. These include history, patient characteristics, physical examination findings, physician experience, bias/heuristics among others. Approaches to rule-out other cardiac and non-cardiac causes will not be covered here.
We will focus on only one approach (the Diamond-Forrester classification) to the generation of pre-test probability data (see Tables 3 and 4). This risk prediction model was generated through Bayesian statistics. Other scoring methods include the Goldman Reilly criteria (Goldstein), Thrombolysis in Myocardial Infarction (TIMI) risk score and the Morise Score. Although it was developed over three decades ago, the Diamond-Forrester classification has been validated in modern populations. The Diamond-Forrester classification stratifies patient pretest probability on the basis of three clinical variables: age, gender and chest pain characteristics. It allows clinicians to stratify patients along a spectrum of pretest likelihoods from 2 to 94%. For simplicity, many guideline groups have chosen to categorize patients in groups of very low, low, intermediate and high pretest probability (see Table 4). This classification of pretest probability (very low, low, intermediate and high) will form our basis of using Bayesian methods to select testing in patients.
3.2 Testing modalities for the evaluation of stable chest pain
The tests available for the evaluation of patients with stable chest pain can be divided into functional or anatomic. This is based on the type of information provided. The list of functional tests includes exercise ECG, stress echocardiogram, myocardial perfusion imaging (single-photon emission computed tomography (SPECT) and positron emission tomography (PET)) and stress MRI. The list of anatomic tests includes coronary CT angiography, coronary artery calcium scoring and the gold standard test, cardiac catheterization.
In Table 5, we have included the characteristics of the testing modalities we will reference throughout this chapter. Please note that for each testing modality in Table 5, two values are reported for each test characteristic (sensitivity, specificity, etc.). These values are based on the test’s performance when used in the overall population vs. in patients with suspected CAD. We already know from Bayesian principles that this difference is based on varying prevalence of disease in these two groups.
One important limitation of the available data is that there are no head-to-head trials comparing the test performance of many of the pharmacological or exercise testing modalities using cardiac catheterization as a reference. Current data are limited to small sample sizes and the use of old techniques/technologies.
3.3 Noninvasive functional tests
It is important to make the distinction between exercise testing and ‘stress testing’, which are often used synonymously. ‘Stress testing’ refers to any pharmacological or exercise testing modality which imposes a stress on the cardiovascular system. Many of these alternative forms of ‘stress testing’ modalities will be covered in the following subsections.
3.3.1 Exercise ECG
Exercise ECG testing is often the first used testing modality in the workup of stable chest pain. It has been around for several decades and has been studied in many clinical scenarios. Unfortunately, it has an intermediate sensitivity and specificity for the detection of CAD. Exercise testing is based on the premise of monitoring the cardiovascular system’s response to physiological stress. This is to determine clinical signs and symptoms which would not be present at rest.
Exercise testing in the lab uses dynamic testing principles because (i) it can be graduated and (ii) it places a volume stress on the heart.
There are various exercise protocols available for laboratory exercise testing. The Bruce Protocol is most commonly used. Regardless of the protocol, exercise capacity is graduated and measured at various “stages”. This is based on the physiological principle that at a given intensity of activity, the parameters of heart rate, blood pressure, and cardiac output are relatively constant. The quantity of oxygen transported and utilized in cellular metabolism is measured as VO2max, which will be discussed in more detail later. A second measurement parameter of oxygen utilization is myocardial oxygen uptake or Mo2. In clinical scenarios Mo2 is estimated by the product of heart rate and systolic blood pressure (rate-pressure product). In healthy myocardium, there is a linear relationship between Mo2 and coronary vessel blood flow.
However, if there is either an obstruction to flow or reduction in myocardial cell uptake of oxygen (as in ischemia), then the Mo2 in the region will be reduced. The implications of this are that if there is a coronary obstruction, a point will be reached where there is supply/demand mismatch and signs and symptoms of ischemia will occur. Regardless of the form of physical activity, for any given coronary obstruction, angina usually occurs at the same rate-pressure product.
The rate of oxygen uptake by healthy tissue will increase with increasing demand until it reaches a maximal level of oxygen extraction. This is termed the VO2max, and varies with age (and to a lesser degree with gender). It has been observed that increasing VO2 during exercise has a linear relationship with heart rate until it reaches the VO2max plateau. At this point, the heart rate may continue to increase as myocardial energy generation reverts to anaerobic metabolism. This may produce signs and symptoms of ischemia and, in turn, a ‘false positive’ finding for ischemia.
Most labs use the formula 220 − Age (for either gender) to calculate a Maximal Predicted Heart Rate (MPHR) as a surrogate for VO2max. Target heart rates for exercise testing are then usually set at 85–100% of MPHR. In the absence of symptoms, ECG changes (or any other reasons to stop the test early), the test is routinely stopped when the MPHR is achieved. It is therefore important to recognize that an exercise stress test can be falsely negative if the level of exercise does not reach the target heart rate, and may be falsely positive if it exceeds the maximum.
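The heart rate targets described above amount to simple arithmetic, sketched below. The function names and the 60-year-old example patient are our own illustrative assumptions.

```python
def max_predicted_heart_rate(age_years):
    """Maximal Predicted Heart Rate (MPHR) from the 220 - age formula."""
    return 220 - age_years


def target_heart_rate(age_years, fraction=0.85):
    """Target heart rate as a fraction of MPHR (the text quotes 85-100%)."""
    return fraction * max_predicted_heart_rate(age_years)


# Hypothetical 60-year-old patient.
age = 60
print(f"MPHR: {max_predicted_heart_rate(age)} bpm")
print(f"85% target: {target_heart_rate(age):.0f} bpm")
```

For this patient the MPHR is 160 bpm and the lower target (85%) is 136 bpm. A test stopped well below that target would, by the reasoning above, carry a higher risk of a false negative.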
In general, the two common exercise methods used are either the treadmill or cycle ergometer. End points for exercise include: achievement of target heart rate, fatigue, symptoms (such as chest pain or dyspnea), significant ECG changes (such as ST segment depressions or elevations), significant dysrhythmias, drop of systolic blood pressure (usually >10 mmHg), patient request or inability to continue.
The diagnosis of ischemia is usually made from chest pains and/or development of horizontal or down-sloping ST segment deviations of ≥1 mm during exercise or the recovery period. Other ECG criteria, such as the development of a bundle branch block (especially LBBB) may suggest ischemia (but are less sensitive).
In addition to evaluation for ischemia, the exertional capacity and hemodynamic responses to stress testing have additional prognostic value. One of the more popular ways to evaluate this is with the Duke Treadmill Score (DTS), which incorporates exertional capacity, ECG changes and symptom onset: DTS = exercise minutes (Bruce) − (5 × ST deviation in mm) − (4 × angina index).
There are several limitations to standard exercise stress testing. Firstly, it is limited by a patient’s ability to exercise and achieve the target heart rate. Additionally, any baseline ECG abnormalities, such as left bundle branch block, left ventricular hypertrophy with repolarization abnormalities, ST segment depression and ventricular pre-excitation, further reduce the test’s specificity and even sensitivity.
The use of imaging as an adjunct to exercise testing improves both sensitivity and specificity. Imaging with the use of pharmacologic agents can also be useful in patients who cannot ambulate or have baseline ECG abnormalities limiting interpretation. Currently, the most commonly used stress imaging modalities are echocardiography and nuclear imaging.
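The Duke Treadmill Score can be computed directly from its three inputs, as in the sketch below. The function name is our own. The coding of the angina index (0 = no angina, 1 = non-limiting angina, 2 = exercise-limiting angina) is the conventional one and is an assumption here, since the text does not define it.

```python
def duke_treadmill_score(exercise_minutes, st_deviation_mm, angina_index):
    """Duke Treadmill Score on the Bruce protocol.

    angina_index is assumed to be coded 0 (none), 1 (non-limiting)
    or 2 (exercise-limiting); this coding is not given in the chapter text.
    """
    return exercise_minutes - 5 * st_deviation_mm - 4 * angina_index


# Hypothetical patient: 9 minutes of exercise, 1 mm ST deviation, non-limiting angina.
print(duke_treadmill_score(9, 1, 1))
```

For this hypothetical patient the score is 9 − 5 − 4 = 0. Longer exercise time raises the score, while ST deviation and angina lower it.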
3.3.2 Stress echocardiography
Stress echocardiography (SE) is reliant on the identification of wall motion and wall thickening abnormalities. Generally, patients undergo echo acquisitions prior to exercise and immediately after exercise (or, in patients who exercise on supine bicycles, at peak exercise). The images are compared between rest and stress for changes in wall motion, wall thickening and left ventricular size and function. The sensitivity of SE is about 88%, with a specificity of 89%.
The major advantages of SE are availability, absence of radiation exposure and cost. However, it has been reported to have lower sensitivity than radionuclide imaging, especially in single vessel disease detection. It is also operator dependent and image quality is limited by patient characteristics (e.g. COPD and obesity). When used with exercise testing, it is limited by the patient’s ability to get into position for scanning quickly after peak exercise (typically, the patient must get off the treadmill, get in the correct position on the bed and the sonographer must acquire good images in four different views, all within 1 min).
For patients who cannot walk, dobutamine stress echocardiography (DSE) has been shown to offer similar sensitivity and specificity to SE for the detection of obstructive CAD. However, because it does not involve exercise, the additional prognostic data offered by exercise protocols (e.g. using the Duke Treadmill Score) are not available with DSE.
3.3.3 Nuclear imaging testing
There are two forms of radionuclide imaging modalities currently available for chest pain evaluation: single-photon emission computed tomography (SPECT) and positron emission tomography (PET). Both rely on the use of radiotracer isotopes to detect areas of ischemia or infarct. There are two types of SPECT radiotracers commonly used in clinical settings: technetium (Tc-99m)-labeled tracers and thallium (Tl-201). PET imaging uses the higher-energy rubidium (Rb)-82 radiotracer. The physical and physiological principles behind how radiotracer elements are used to evaluate for CAD are beyond the scope of this chapter. We will focus on the performance characteristics of both nuclear tests and a few pertinent advantages and disadvantages.
SPECT imaging can be used in conjunction with exercise or pharmacologic agents to assess for CAD. When SPECT is used with exercise protocols, it provides similar functional capacity and ECG data as exercise ECG, with the added sensitivity and specificity provided by imaging. It is the more commonly used and more readily available of the radionuclide modalities.
Pharmacologic stress images are most commonly obtained with vasodilating agents (adenosine, dipyridamole and regadenoson). The vasodilating agents either directly or indirectly stimulate the A2A (adenosine) receptors, leading to coronary arteriolar vasodilation. Vasodilating agents are suitable for use in patients who cannot exercise, where significant increases in heart rate are not desired (e.g. patients with permanent pacemaker devices or left bundle branch block) or patients who have had recent acute coronary syndromes. Sensitivity and specificity for these are 83–84% and 79–85% respectively.
SPECT is a great alternative to stress echocardiography when the patient’s characteristics prevent good imaging windows [30, 31]. It has a number of important limitations. Since it is dependent on radiotracer delivery to tissue, in patients with equally compromised flow in all coronary vessels (triple vessel disease) there is a risk of a false negative study, due to so-called ‘balanced ischemia’. Soft tissue attenuation and uptake by other organs (e.g. the gallbladder) can also limit image quality and often cause false positive findings.
PET imaging uses the higher-energy rubidium (Rb)-82 radiotracer. This allows for less attenuation by soft tissue and potentially fewer false positives. PET has the additional advantage of being combined with CT imaging. This allows for soft-tissue attenuation correction (to reduce false positives) and assessment of coronary artery calcium (which will be discussed later).
The major disadvantages of PET are the shorter half-life of its radiotracer and its cost. As a result, its availability is limited.
3.4 Noninvasive anatomic tests
3.4.1 Coronary CT angiography (CCTA)
Coronary CT angiography allows for non-invasive assessment of coronary artery disease. Intravenous contrast agents within the lumen of coronary vessels facilitate visual calculation of obstruction/stenosis. Newer CT techniques use 64-slice (or better) acquisition hardware to obtain images. Available post-processing software packages further refine images.
The definition of ‘significant coronary artery disease’ (CAD) with CT coronary angiography is ≥70% diameter stenosis of at least one major epicardial artery segment or ≥50% diameter stenosis in the left main coronary artery. The presence of collateral coronary supply as well as low at-risk myocardium can render high degree stenotic lesions asymptomatic. For this reason, the degree of CAD obstruction correlates poorly with symptoms and limits its usefulness for determining the cause of chest pain.
The presence of coronary calcium causes artifacts during imaging that may obscure the coronary lumen. For this reason, CCTA is often combined with assessment of coronary calcium to make predictions about outcomes or prognosis. Novel techniques such as fractional flow reserve derived from CT (FFRCT) are being used to determine whether stenotic lesions are physiologically significant. FFRCT is beyond the scope of discussion here.
Although CCTA allows for rapid imaging, it has limited utility in patients with rapid heart rates, arrhythmias or renal impairment. It also carries radiation exposure risk.
3.4.2 Coronary artery calcium scoring (CACS)
CACS is obtained using non-contrast CT scanning. Post-processing algorithms are used to quantify the degree of coronary vessel calcification. The three most common scoring methods are Agatston, volume and mass. CAC scoring has been in formal use since 1990. Although traditionally it has been recommended for use in asymptomatic patients, it has been shown to provide similar predictive reliability as functional testing, especially when used with other CT modalities.
Evidence suggests that a positive CAC is more sensitive than functional testing at predicting MACE. Alternatively, an abnormal functional test result is more specific. Increasing the cut-off point of a ‘positive’ CAC improves specificity at the expense of sensitivity. This is a consequence of the Bayesian principles we discussed. However, it has little usefulness in determining the cause of chest pain.
3.5 Bayesian approach to test selection
Using the Diamond-Forrester classification, there are three broad categories of patients with stable chest pain that a clinician will likely encounter. These are grouped as ‘low pretest probability—5%’, ‘intermediate pre-test probability—approx. 50%’ and ‘high pre-test probability—95%’. In the following three subsections we will explore strategies of using Bayesian approaches to testing in each of these groups.
There are many factors which will influence real life decisions on which tests to order and when. Some of these are patient characteristics (ability to exercise), resource availability, institutional culture, and the degree of uncertainty a clinician/patient is comfortable with accepting.
Bayesian approaches allow us to make informed decisions on sequential test selection. Tests assessing CAD by the same markers should not be duplicated. To avoid this, one potential way of classifying testing modalities would be: (i) tests that identify ischemia via surface changes (e.g. stress ECG), (ii) tests that identify ischemia via nuclear tracer, and (iii) tests that identify ischemia via wall motion abnormalities (e.g. stress echocardiography).
3.5.1 Test selection in patients with low and high pre-test probability
We know from ‘Rule 1’ that in a population with low prevalence, any positive result is likely to be a false positive. This means that in a patient with low pretest probability, if one opts not to perform any cardiac testing because one suspects the patient does not have CAD, one would be correct most of the time. Conversely, ‘Rule 2’ stated that “If a disease is highly prevalent in a population, any negatives are more likely to be false negatives”. We also learnt in Section 2 that when testing is performed in the high pretest probability group, the likelihood of a true positive result does not improve much on no testing at all. If one opted to assume that every patient in this group had significant CAD and went ahead with a definitive test/treatment (i.e. ICA), one would be correct most of the time, and may avoid an unnecessary interim test.
3.5.2 Test selection in patients with intermediate pre-test probability
The intermediate pre-test probability group is where Bayesian approaches yield the greatest benefit. This is also where informed sequential testing can be very informative and efficient if used correctly. It is impossible for us to illustrate the relative outcomes of all possible combinations of testing here. We will use two examples to illustrate how tests with varying sensitivities and specificities can give different post-test probabilities when used in high (Patient A), low (Patient B) or intermediate pre-test probability patients (Patient C).
Patient A has very high pre-test probability of disease (95%).
Patient B has very low pre-test probability of disease (5%).
Patient C has an intermediate probability of disease (50%).
Example 1. A test has a sensitivity of 90% and a specificity of 90% (e.g. nuclear imaging testing).
Patient A (with 95% pre-test probability):
Chance a + test will mean + disease = >99%
Chance a − test will mean + disease = 68%
Patient B (with 5% pre-test probability):
Chance a + test will mean + disease = 32%
Chance a − test will mean + disease = <1%
Patient C (with 50% pre-test probability):
Chance a + test will mean + disease = 90%
Chance a − test will mean + disease = 10%
Example 2. A test has a sensitivity of 65% and a specificity of 65% (e.g. exercise ECG testing).
Patient A (95% pre-test probability):
Chance a + test will mean + disease = 97%
Chance a − test will mean + disease = 91%
Patient B (5% pre-test probability):
Chance a + test will mean + disease = 9%
Chance a − test will mean + disease = 3%
Patient C (50% pre-test probability):
Chance a + test will mean + disease = 65%
Chance a − test will mean + disease = 35%
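The figures in Examples 1 and 2 follow directly from Bayes' theorem and can be reproduced with a few lines of code. The sketch below is illustrative only; the function and variable names are our own assumptions, and the sensitivities, specificities and pre-test probabilities are those stated in the examples.

```python
def post_test_probabilities(pre_test, sensitivity, specificity):
    """Return (P(disease | positive test), P(disease | negative test))."""
    true_pos = sensitivity * pre_test
    false_pos = (1 - specificity) * (1 - pre_test)
    false_neg = (1 - sensitivity) * pre_test
    true_neg = specificity * (1 - pre_test)
    after_positive = true_pos / (true_pos + false_pos)
    after_negative = false_neg / (false_neg + true_neg)
    return after_positive, after_negative


# Pre-test probabilities for Patients A, B and C.
patients = {"A": 0.95, "B": 0.05, "C": 0.50}
# (sensitivity, specificity) for the two example tests.
tests = {"Example 1 (90%/90%)": (0.90, 0.90), "Example 2 (65%/65%)": (0.65, 0.65)}

for test_name, (sens, spec) in tests.items():
    for patient, pre in patients.items():
        pos, neg = post_test_probabilities(pre, sens, spec)
        print(f"{test_name}, Patient {patient}: + test {pos:.1%}, - test {neg:.1%}")
```

After rounding to whole percentages, the output agrees with the values listed above. Note how little the 65%/65% test moves Patient A (97% after a positive result, 91% after a negative result) compared with the 90%/90% test.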
The decision on when to stop testing or proceed with further evaluation based on post-test certainty is highly individualized. It is based on the post-test value at which one thinks a disease has been either sufficiently ‘ruled-in’ or ‘ruled-out’. Generally, a pre-test or post-test probability of ≤15% or ≥85% would be considered reason to stop further testing: one assumes no disease when the probability is ≤15% and assumes disease when the probability is ≥85%.
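The stopping rule can be written as a simple threshold check. The sketch below uses the ≤15% and ≥85% cut-offs as defaults; the function name and the returned labels are hypothetical, and the thresholds are parameters precisely because the decision is individualized.

```python
def next_step(probability, rule_out=0.15, rule_in=0.85):
    """Classify a pre-test or post-test probability against stop-testing thresholds."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if probability <= rule_out:
        return "stop: assume no disease"
    if probability >= rule_in:
        return "stop: assume disease"
    return "continue: further testing may be informative"


# Post-test probabilities taken from the examples in this section.
for label, probability in [("5%", 0.05), ("35%", 0.35), ("65%", 0.65), ("90%", 0.90)]:
    print(label, "->", next_step(probability))
```

Applied to Patient C, a positive result on the 90%/90% test (90%) crosses the rule-in threshold, whereas either result on the 65%/65% test (65% or 35%) leaves the patient in the range where further testing may still be informative.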
4. Appropriate use criteria and the delivery of high-value care
Technological advancement has resulted in a growing number of available testing modalities that offer increased sensitivity and specificity. This rapid growth has led to an increase in healthcare expenditure. Payers were the first to respond, implementing strict criteria for reimbursement and prior authorization requirements to regulate expenditure.
In response, clinician-led groups developed Appropriate Use Criteria (AUC) to improve the efficient utilization of these testing modalities. The American College of Cardiology (ACC), along with other organizations, has developed AUC documents to aid decision making for cardiac testing. The first statement of this kind, published in 2005, addressed the efficient utilization of myocardial perfusion imaging. Subsequently, AUC documents have classified indications for the use of CCTA, echocardiography (including stress echocardiography) and cardiac magnetic resonance, among others.
AUC documents are distinct from clinical guideline/recommendation statements. Appropriate use is classified based on the ratio of possible benefit versus the potential harm of a procedure with an eye to ‘cost-effectiveness’. An appropriate diagnostic or therapeutic procedure is defined as “one in which the expected clinical benefit exceeds the risks of the procedure by a sufficiently wide margin, such that the procedure is generally considered acceptable or reasonable care”.
AUC are scored from 1 to 9 and classified into three broad categories: ‘rarely appropriate care’ (score 1–3), ‘may be appropriate care’ (score 4–6) and ‘appropriate care’ (score 7–9). The Bayesian criticism of these AUC statements is that they are often based not on evidence but on the aim of reducing healthcare expenditure where less costly, safe alternatives are available.
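The three score bands can be expressed as a small lookup. This is only an illustration of the 1 to 9 scale described above; the function name is hypothetical and is not part of any AUC document.

```python
def auc_category(score):
    """Map an AUC score (integer from 1 to 9) to its broad category."""
    if not isinstance(score, int) or not 1 <= score <= 9:
        raise ValueError("AUC scores are integers from 1 to 9")
    if score <= 3:
        return "rarely appropriate care"
    if score <= 6:
        return "may be appropriate care"
    return "appropriate care"


for score in (2, 5, 8):
    print(score, "->", auc_category(score))
```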
5. Review of the current recommendations for the evaluation of stable ischemic heart disease (SIHD)
Functional testing (both exercise and pharmacological) has long been the preferred modality of ischemia evaluation. The main reasons are that: (i) applying an increasing workload to the heart more closely mimics the pathophysiology of supply/demand mismatch, (ii) exercise capacity provides incremental diagnostic and prognostic information, and (iii) it is cost-effective.
Since the advent of exercise stress testing, many alternative diagnostic modalities have become available. As was previously mentioned, most of these testing modalities have not been compared to each other in head-to-head trials. The Bayesian criticism of much of the available evidence is that limited data are reported on pretest clinical stratification or on how posttest classification contributed to decision making.
The biggest questions of the last decade have been (i) whether bypassing testing is a viable and safe option in select low-risk patient groups, and (ii) whether anatomic testing is as good as, or better than, traditional functional testing [37, 43]. It would be counterproductive to review all the guideline recommendations and AUC here, since they are frequently revised and there are many competing agencies that produce them. Instead, we will highlight a few which either follow or contradict the Bayesian method, to give the reader a better understanding of some of their limitations.
In patients with low pretest probability and at low risk, it is reasonable to exclude the diagnosis of stable angina on clinical assessment alone and defer further diagnostic testing. This is supported by Bayesian argument. The decision to defer testing is highly individual and will be guided by physician experience and shared decision making with patients.
The AUCs recommend exercise ECG as the initial testing modality in most patient populations [9, 10, 11]. We have previously shown the potential pitfalls of low sensitivity and specificity testing in either low or high probability patients. For this reason, we do not recommend that exercise ECG testing be used alone to either exclude or diagnose significant CAD as a cause of stable chest pain, no matter the pretest probability. However, exercise testing is still recommended because of the valuable prognostic information it provides (with the use of the Duke Treadmill Score, etc.). This leads to another contradictory recommendation, concerning stress echocardiography in low pretest probability groups, where a patient’s ability to exercise downgrades stress echocardiography testing from ‘appropriate’ to ‘inappropriate’. Presumably, the AUC authors consider a regular exercise ECG test preferable in these patients, contrary to Bayesian arguments.
When the pretest probability of disease is very high, any negative test result is likely to be a false negative. Bayesian methods therefore argue for proceeding directly with definitive testing/intervention. However, even in this patient population, invasive coronary angiography is usually a second- or third-line option after other non-invasive tests are inconclusive [9, 33].
The anatomic testing modalities have high sensitivity. Nevertheless, they are regarded as rarely appropriate across all patient groups by current AUC . On the other hand, the current NICE guidelines recommend offering 64-slice CCTA as a first testing modality in patients with typical angina. The recommendations for Heartflow FFRCT are less clear .
The final question is what should be regarded as ‘confirmatory’. This varies with clinician comfort and experience. Some guidelines suggest that a posttest probability of >85% is sufficient to confirm the diagnosis of significant CAD.
6. Limitations of the Bayesian approach
One of the major limitations of the Bayesian method is its reliance on pre-test probability (some advocates of Bayesian approaches consider this to be a strength). Many critics have stated that this introduces ‘subjectivity’ into the estimates on which inferences about post-test outcomes are based. We have already described some of the many factors which contribute to variation in pre-test estimates at the individual clinician level. This point reiterates the importance of scrutinizing the quality of evidence on which conclusions are based. Proponents of the Bayesian approach have therefore stressed the need to base pre-test probability on sound reasoning and evidence and to combine available data to reach consensus.
Bayesian analysis is very important in the evaluation of patients with stable ischemic chest pain. Even with the use of a simplified assessment tool of pretest probabilities, one can maximize efficiency of test selection, lower costs and improve patient outcomes.
Conflict of interest
The authors have no conflicts of interest to declare.
Appendices and Nomenclature
“One in thousand people have a prevalence for a particular heart disease. There is a test to detect this disease. The test is 100% accurate for people who have the disease, and is 95% accurate for those who do not (this means that 5% of people who do not have the disease will be wrongly diagnosed as having it). If a randomly selected person tests positive what is the probability that the person actually has the disease?”
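The problem can be worked through with Bayes' theorem. The derivation below assumes the question intends a prevalence of 1 in 1000, a sensitivity of 100% and a specificity of 95% (a false positive rate of 5%); the symbols D (disease present), \bar{D} (disease absent) and + (positive test) are our own notation.

```latex
\[
P(D \mid +)
  = \frac{P(+ \mid D)\,P(D)}{P(+ \mid D)\,P(D) + P(+ \mid \bar{D})\,P(\bar{D})}
  = \frac{1.00 \times 0.001}{1.00 \times 0.001 + 0.05 \times 0.999}
  = \frac{0.001}{0.05095}
  \approx 0.0196
\]
```

Under these assumptions, a randomly selected person who tests positive has only about a 2% chance of having the disease. In a group of 1000 people, the single true positive is outnumbered by roughly 50 false positives.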