Open access peer-reviewed chapter - ONLINE FIRST

# On the Value of Conducting and Communicating Counterfactual Exercise: Lessons from Epidemiology and Climate Science

By Gary Yohe

Submitted: June 10th 2020; Reviewed: August 18th 2020; Published: September 22nd 2020

DOI: 10.5772/intechopen.93639

## Abstract

Modeling is a critical part of crafting adaptive and mitigative responses to existential threats like the COVID-19 coronavirus and climate change. The United Nations, in its efforts to promote 17 Sustainable Development Goals, has recognized both sources of risk as cross-cutting themes, in part because both expose the long list of social and economic challenges facing the globe. Here, evidence is presented to encourage the research communities of both topics to work together within and across the boundaries of their international infrastructures, because their modeling approaches, their social objectives, and their desire to bring rigorous science effectively to opinion writers and decision-makers are so similar. Casting decision analysis in terms of tolerable risk, conducting policy-relevant counterfactual experiments, participating in organized model comparison exercises, and other research strategies are all part of their common scientific toolsets. These communities also share a responsibility to continue honing their communication skills so that their insights are more easily understood by the public at large. The same skills are essential to protect their science from attack by groups and individuals who purposefully espouse misguided or deliberately misstated perspectives and, sometimes, corrupted personal agendas.

### Keywords

• risk
• coronavirus
• COVID-19
• climate change
• integrated modeling
• model projections
• counterfactual experiments
• model comparisons
• tolerable risk
• value of information
• sustainable development goals

## 1. Introduction

There are currently 17 United Nations Sustainable Development Goals, the so-called SDGs, whose content is updated on the United Nations website [1]. In its preface, the then-current (July 1, 2020) collection of SDGs was presented as

“the blueprint to achieve a better and more sustainable future for all. They address the global challenges we face, including those related to poverty, inequality, climate change, environmental degradation, peace and justice.”

They are “a call for action by all countries—poor, rich, and middle-income—to promote prosperity while protecting the planet.” The SDGs thereby recognize issues that revolve around developing sustainably in a holistic global sense, but they do not miss the more microscale and pervasive issues where the devil lies in the details. That list is long and growing: ending poverty, building economic growth, and confronting social needs like access to quality education, quality health care, social protection against ordinary and extreme risks, quality opportunities and security in employment, personal security everywhere, food security, promoting equity and justice, and much more. In other words, the agenda amounts to promoting the public welfare, however it is measured and monitored.

All of this must happen in the context of growing direct and indirect risks from ordinary environmental pollution, extraordinary and sometimes existential risks from climate change, as well as sudden and unrelenting risks from pandemics like COVID-19. While the United Nations asserts accurately that SDGs can “provide a critical framework for COVID-19 recovery,” it is also true that pandemics and climate change can expose the extent to which the progress toward achieving any SDGs has not been as significant as one might have hoped or even expected [2]. Both are color-blind, and neither is impressed by social or economic status. In one way or another, both can strike anyone or everyone living anywhere or everywhere.

Coronavirus pandemics and climate change are therefore a cross-cutting theme of enormous concern across the full range of sustainability issues. The very organization of the SDGs shows that this truth has been recognized by the United Nations. “COVID-19 Response” boxes are highlighted close to the tops of the presentations of all 17 of the goals. In addition, climate change has been a sustainable development goal for some time: SDG-13, to be specific. Labeled “Take urgent action to combat climate change and its impacts”, this goal identifies three critical “targets” for action by decision-makers of all stripes: “strengthen resilience and adaptive capacity to climate-related hazards and natural disasters in all countries, integrate climate change measures into national policies, strategies and planning, and improve education, awareness-raising and human and institutional capacity on climate change mitigation, adaptation, impact reduction and early warning.” It is also important to note that the COVID-19 Response Box for SDG-13 calls for: investments that accelerate decarbonization and promote sustainable solutions to energy market distortions, recognition of all climate risks, the creation of green jobs and vibrant employment markets within sustainable and inclusive growth, persistent transitions to more resilient societies that are fair to all, and reliance on international cooperation to most effectively respond to the challenges of climate change, pandemics, and all of the SDGs.

Integrated models are one of the primary tools through which rigorous science can be inserted into deliberations of actions whose goals (social values and costs calibrated in whatever metric is appropriate) extend well into the future, from a year or more for viral pandemics to a century or two for climate change [3, 4, 5]. They are, therefore, a means by which decision-makers who are charged with promoting sustainable futures can apply rational and rigorous risk management procedures to their challenges. They are, as well, the means by which decision-makers can organize their thoughts around what emerging data and new science are revealing about the relative likelihoods and fundamental characteristics of possible futures.

This is particularly important because climate change impacts and global pandemics have both shown a tendency, in the words of Flyvbjerg [6], toward “regressing to the tail” over time. In other words, neither has ever offered up its worst possible outcome; things can always get worse, and there is no reason to believe, ceteris paribus, that this will not always be the case. Technically, this is possible when the tails of distributions are so fat that the mean and variance, among other moments, do not exist; generalized Pareto distributions with sufficiently large shape parameters display this characteristic. Table 1 in [6] lists the top 10 phenomena (by “tail thickness”) and their calibration metrics: earthquakes (Richter Scale max), cybercrime (financial loss), wars (per capita death rate), pandemics (deaths), IT procurement (percentage cost overrun), floods (water volume), bankruptcies (percentage of firms per industry), forest fires (area burned), Olympic Games (percentage cost overrun), and blackouts (number of customers affected). Three of the top 10 (pandemics, floods, and forest fires) fall squarely within the focus of this discussion.
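To make the point about nonexistent moments concrete, the sketch below draws from a generalized Pareto distribution (location 0, scale σ) by inverse-transform sampling and tracks the running sample mean. It assumes the standard parameterization in which the mean is finite only for shape ξ < 1 and the variance only for ξ < 1/2; the function names and parameter values are illustrative choices, not taken from [6].

```python
import math
import random


def gpd_moments(xi: float, sigma: float = 1.0) -> tuple[float, float]:
    """Mean and variance of a generalized Pareto distribution with location 0.

    Returns math.inf for any moment that does not exist (diverges).
    """
    mean = sigma / (1 - xi) if xi < 1 else math.inf
    var = sigma**2 / ((1 - xi) ** 2 * (1 - 2 * xi)) if xi < 0.5 else math.inf
    return mean, var


def gpd_sample(xi: float, sigma: float, rng: random.Random) -> float:
    """One draw via the inverse CDF of the generalized Pareto distribution."""
    u = rng.random()  # uniform on [0, 1)
    if xi == 0:
        return -sigma * math.log(1 - u)  # exponential limit of the GPD
    return sigma / xi * ((1 - u) ** (-xi) - 1)


def running_means(xi: float, n: int, sigma: float = 1.0, seed: int = 0) -> list[float]:
    """Running sample mean after each of n draws."""
    rng = random.Random(seed)
    total, means = 0.0, []
    for k in range(1, n + 1):
        total += gpd_sample(xi, sigma, rng)
        means.append(total / k)
    return means


if __name__ == "__main__":
    for xi in (0.0, 0.25, 1.2):  # light tail, fat tail, tail with no finite mean
        means = running_means(xi, 100_000, seed=42)
        checkpoints = [means[k - 1] for k in (1_000, 10_000, 100_000)]
        print(f"xi={xi}: theoretical (mean, var)={gpd_moments(xi)}, "
              f"running means={[round(m, 2) for m in checkpoints]}")
```

For ξ = 0 and ξ = 0.25 the running means settle near their theoretical values; for ξ = 1.2 they typically keep jumping as new extreme draws arrive, which is the statistical signature of a tail that keeps getting worse.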

| Period | Number of billion-dollar disasters (average per year) | Associated costs (average per year) | Associated fatalities (average per year) |
|---|---|---|---|
| 1980s (1980–1989) | 28 (2.8) | $127.7B ($12.8B) | 2,808 (281) |
| 1990s (1990–1999) | 52 (5.2) | $269.6B ($27.0B) | 2,173 (217) |
| 2000s (2000–2009) | 59 (5.9) | $510.3B ($51.0B) | 3,051 (305) |
| 2010s (2010–2019) | 119 (11.9) | $802.0B ($80.2B) | 5,212 (521) |
| Last 5 years (2015–2019) | 69 (13.8) | $531.7B ($106.3B) | 3,862 (772) |
| Last 3 years (2017–2019) | 44 (14.7) | $456.7B ($152.2B) | 3,569 (1,190) |
| Overall (1980–2019) | 258 (6.5) | $1,754.6B ($43.9B) | 13,249 (331) |

### Table 1.

Billion dollar disasters from climate and weather across the US.

Source: [7].

More specifically and less technically, the US National Oceanic and Atmospheric Administration has observed dramatically increasing trends in the number of billion-dollar national catastrophes and in the fraction of each year’s list that can be attributed to anthropogenic climate change [7]. Incredible episodes of enormous and increasing amounts of rain falling in one place over consecutive days have, for example, begun to occur because climate change has altered the steering wind patterns that normally move storms along; hurricanes like Harvey in 2017 and Florence in 2018 stalled because they suddenly had nowhere to go. Rapid successions of storms that do not diminish in intensity are now more common around the world because subsurface waters in the spawning oceans are historically warm. Damage records are meant to be broken, but in 2017 Maria broke the bank as the third storm on the same track in less than one month. Fires from north to south across all of 2019 brought California more burned area and property loss than at any time in its history. Table 1 shows that these and other climate and weather disasters averaged $80.2 billion in costs and 521 lives lost per year over the last decade; over 2015–2019, the annual averages were $106.3 billion and 772 lives lost. In addition, COVID-19 indirectly caused economic damages in the US early in its course that were larger than those of the Great Depression, at least in terms of the rates of unemployment and economic loss [8].
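The per-year averages in Table 1 are period totals divided by the number of years in each period. The small sketch below recomputes them from the totals as a consistency check; the data structure and function name are illustrative, and the numbers are copied from Table 1.

```python
# Totals from Table 1 (source: [7]): (period, years, events, cost in $B, fatalities)
TABLE_1 = [
    ("1980s", 10, 28, 127.7, 2808),
    ("1990s", 10, 52, 269.6, 2173),
    ("2000s", 10, 59, 510.3, 3051),
    ("2010s", 10, 119, 802.0, 5212),
    ("2015-2019", 5, 69, 531.7, 3862),
    ("2017-2019", 3, 44, 456.7, 3569),
    ("1980-2019", 40, 258, 1754.6, 13249),
]


def annual_averages(rows):
    """Per-year averages (events, cost in $B, fatalities) for each period."""
    return {
        period: (round(events / years, 1), round(cost / years, 1), round(deaths / years))
        for period, years, events, cost, deaths in rows
    }


if __name__ == "__main__":
    averages = annual_averages(TABLE_1)
    for period, (events, cost, deaths) in averages.items():
        print(f"{period:>10}: {events:5.1f} events/yr, ${cost:6.1f}B/yr, {deaths:5d} deaths/yr")
    # How much larger the average annual cost was in the 2010s than in the 1980s.
    print(f"2010s vs 1980s annual cost ratio: {(802.0 / 10) / (127.7 / 10):.1f}x")
```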

In the face of these kinds of threats, what are the response options that need modeling support? Mitigation is one—slow the pace of the risk so that the spread of the consequences (symptoms) does not overwhelm social capacities to respond and adapt. That is, “flatten the curve” by social distancing, wearing masks, testing, tracking, and quarantining, sheltering at home, locking down nonessential economic activity (that cannot be done remotely), etc. Or, invest in reducing the emissions of greenhouse gases and decarbonizing the macroeconomy and thereby reduce the likelihoods of significant harm. Shrinking the tails of the most extreme consequences is another—invest in new adaptations and response actions (therapeutics and vaccines) that can eradicate the explosive nature of potential outbreaks. That is, invest in the development and distribution of new ways to minimize the ravages of the virus or prevent it from invading human beings. Or, invest in forward-looking or responsive adaptations that reduce the consequences of climate change.
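The logic of “flattening the curve” can be illustrated with the textbook SIR (susceptible, infected, recovered) model, a deliberate simplification and not one of the models discussed in this chapter. The sketch below integrates it with a simple Euler step and compares an unmitigated transmission rate with one halved by distancing-type measures; all parameter values (β, γ, the initial infected share) are hypothetical.

```python
def simulate_sir(beta: float, gamma: float, i0: float = 1e-4,
                 days: float = 300.0, dt: float = 0.1):
    """Euler integration of the SIR model in population fractions.

    beta  -- transmission rate per day (contacts x infection probability)
    gamma -- recovery rate per day (1 / mean infectious period)
    Returns (peak infected share, day of peak, final (s, i, r)).
    """
    s, i, r = 1.0 - i0, i0, 0.0
    peak_i, peak_day = i, 0.0
    for step in range(round(days / dt)):
        new_infections = beta * s * i * dt
        new_recoveries = gamma * i * dt
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        if i > peak_i:
            peak_i, peak_day = i, (step + 1) * dt
    return peak_i, peak_day, (s, i, r)


if __name__ == "__main__":
    gamma = 0.1  # hypothetical 10-day mean infectious period
    for label, beta in [("unmitigated", 0.4), ("mitigated", 0.2)]:
        peak, day, _ = simulate_sir(beta, gamma)
        print(f"{label:>11}: R0={beta / gamma:.1f}, "
              f"peak infected share={peak:.3f} on day {day:.0f}")
```

Halving β lowers the share of the population infected at one time and pushes the peak later, which is what keeps demand below hospital capacity. For this simple model the peak share is approximately 1 − (1 + ln R0)/R0, about 0.40 for R0 = 4 and 0.15 for R0 = 2.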

These are abstract issues, of course, but confronting them is critical for efforts to manage the controversies that surround action decisions: controversies that can be born of misinterpretations of modeling results and applications, deliberate distortions designed by unscrupulous agents to promulgate false perceptions, exaggerated foci that obscure social, economic, and political complexities, and unfounded assertions that attack the integrity of sound scientific practices [9]. These controversies make it clear that modelers need to work continually to improve the models that they employ to answer comparative policy-relevant questions and to communicate their results effectively. They therefore lead to the conclusion that efforts to manage climate and health risks need to include novel and traditional methods for improving modeling practices, the understanding of modeling structures, and the communication of modeling results. These efforts are just as important as carefully taking account of more widely expressed modeling concerns: assumptions, bias, framing, and immodesty.

Here, similarities and synergies between epidemiologic models of pandemics like COVID-19 and integrated models of longer-term risks from climate change provide a context for productive suggestions about how to structure these efforts. Strategies like policy-relevant counterfactual exercises, structural model comparison experiments, value of information calculations, out-of-scale reality checks, and model updating are all highlighted. The goal is to offer some thoughts about how these research activities can support sound communication for sustainable development. This is especially important because systemic social and economic inadequacies have been laid bare by the COVID-19 pandemic and will be exacerbated by the growing global climate crisis [10].

Section 2 provides some context by reviewing briefly the early history of modeling the COVID-19 coronavirus with reference to the needs and challenges of that enterprise—representing the virus, the consequences of exposure, the implications of responding or not, the need for intervening in the workings of the economy, and so on. Section 3 frames the issue of improving the production and communication of modeling results in a skeptical, frightened, and uncertain world. Tools like methods to identify thresholds of tolerable risk, counterfactual modeling exercises, structured model comparisons, and value of information calculations are introduced and discussed briefly with regard to practicality, context, and experience. Concluding and synthetic remarks occupy the last section.

## 2. The early history of modeling COVID-19 in support of decision-makers

Even as the COVID-19 pandemic evolved through the beginning of its course in early 2020, discussions were underway around the world about preparing for the longer term. In the US, they were based on painful lessons learned from a response often characterized by delays, inefficiencies, a lack of federal coordination, and a pervasive skepticism about the science. Elsewhere, lessons were sometimes more timely and less painful, but the number of cases and deaths continued to climb daily nearly everywhere. Some of these lessons were, of course, obvious. Containment and mitigation can have a positive effect. Creating effective diagnostic tests is difficult. It is even harder to produce and distribute high-quality tests and personal protective equipment in the quantities required. Fast-tracking new therapeutics might become productive, but the real hope probably lies in creating a new and effective vaccine as quickly as possible amidst uncertainties about the character of immunity from the virus and the distribution of the vaccine itself. Other lessons were more obscure, but one seemed to touch nearly every point where action decisions were required or anticipated: informative modeling results are difficult to communicate, and they are easy to criticize because coping with apparently incomprehensible uncertainty is not a widely distributed skill.

In the US and many other countries, virus impact projection models played a prominent role in political and public discussions about what it would mean to “flatten the curve” of new COVID-19 infections. Such models were essential to provide insight into the enormous scope of the problem. They became critical tools for planning the timing of efforts to return societies and economies to pre-COVID-19 activities without doing more damage [11]. Many, however, did not provide necessary information on projected uncertainties in the course and severity of the virus, the key determinants of these uncertainties, the information required to reduce them, and/or best practices in conveying all of this to decision-makers, their constituents, and their bosses [12].

As a result, it was challenging for the primary “clients” of modelers’ products (decision-makers across governments of all scales, businesses large and small, religious organizations, public and private foundations, individuals, etc.) to be comfortable with the idea of assigning likelihoods or even degrees of confidence to their various outputs—that is, to the varieties of possible futures born of processing results from multiple modeling efforts and/or accommodating deliberately created probability distributions from a single model.

For example, when faced in February with five model results and a consulting firm that produced a “composite” estimate of questionable value, Governor Cuomo of New York ultimately picked the model that produced the projections of hospital demand that matched the maximum number of hospital beds and intensive care beds that his state could make available. The state had determined that it could essentially double its total capacity across its 12 geographic regions (53,000 beds including 26,000 intensive care beds) by manipulating equipment on hand, converting non-treatment rooms into patient rooms, and organizing hundreds of hospitals into a single administrative entity—just barely adequate against the middle scenario that had estimated a maximum need of roughly 120,000 beds including 60,000 ICU beds [13]. Why did he ignore the extreme possibilities? Not because they were totally implausible. “Why waste time”, he had thought, “worrying about the two extreme scenarios that would surely overwhelm the entire state hospital system regardless of what we did?”

Governor Cuomo’s predicament was a reflection of at least three phenomena that define the communication context of modelers’ best efforts. First of all, modeling is an essential tool for understanding the likely outcomes of different strategies for responding to a fast-moving global pandemic like COVID-19 [11] or, as consistently noted by the Intergovernmental Panel on Climate Change (IPCC), a slow but accelerating stressor like climate change [14, 15, 16, 17, 18]. Indeed, any phenomenon that produces large, growing, and widespread risk over time can threaten the planet’s ability to develop sustainably. However, developing and refining models for any of these threats is very difficult. In most cases, the most useful modeling necessarily involves multidisciplinary collaboration among epidemiologists (or climate scientists, natural scientists, etc.), public health experts, mathematicians, statisticians, and economists: the sort of collaboration that cannot be built in a few days and is not possible at all without personal buy-in by willing participants.

Secondly, appropriately displaying uncertainty bands around “best-practice” projections increases the public communication challenges in engaging decision-makers. Such relative likelihood information must be communicated in a responsible, accurate, and understandable way, but also in one that minimizes the risk that those who are uncomfortable with probabilistic information will simply throw up their hands and conclude that “Scientists do not know what they are talking about.” Care needs to be taken, as well, in communicating the value of looking at the tails of the distributions of results. Discussing low-likelihood but very bad outcomes cannot responsibly be dismissed as “fear mongering” if those outcomes carry very large consequences. Risk is, after all, the product of likelihood and consequence; it can be comparatively large, and therefore worthy of careful consideration, if either factor is large.
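A minimal numerical illustration of why the tail matters when risk is measured as likelihood times consequence follows. The scenario list is invented for illustration; it shows that a 1% outcome can dominate total risk when its consequence is large enough.

```python
# Hypothetical discrete outcome distribution:
# (likelihood, consequence in arbitrary loss units)
SCENARIOS = [
    (0.60, 1.0),       # common, mild
    (0.30, 10.0),      # less common, moderate
    (0.09, 100.0),     # unusual, severe
    (0.01, 10_000.0),  # rare, catastrophic tail
]


def risk_contributions(scenarios):
    """Risk of each scenario as likelihood x consequence, plus the total (expected loss)."""
    contributions = [p * c for p, c in scenarios]
    return contributions, sum(contributions)


if __name__ == "__main__":
    contributions, total = risk_contributions(SCENARIOS)
    for (p, c), risk in zip(SCENARIOS, contributions):
        print(f"likelihood={p:.2f} consequence={c:>8.0f} "
              f"risk={risk:7.2f} share={risk / total:.0%}")
```

With these made-up numbers the 1% scenario accounts for roughly 89% of the expected loss, which is why ignoring it would understate the risk rather than merely trim an unlikely outlier.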

Finally, model results and their underlying science are vulnerable to attack by skeptics and partisans who are generally suspicious or, more problematically, possess political agendas [9]. This is particularly concerning when projections honestly change markedly from week to week as new information from around the world becomes available and when results from individual models diverge markedly. It is frequently difficult to explain to decision-makers why they should accept projections of any single, well-described policy scenario when its projected outcomes can differ so widely from model to model. These differences do not mean that any given model or ensemble of models is completely untrustworthy; they mean that the modelers are trying to describe the full range of possible futures as well as they can from different perspectives on natural and/or human processes. Of course, it was the former impression that undermined trust in published models of COVID-19 course projections, particularly after the “no policy” projections of the Imperial College London model [19] received such widespread public and political attention as a baseline description of the reality and seriousness of the health risks.

Some of the multiple efforts to understand the intricacies of the behavior of this virus that blossomed well into the summer are covered briefly in [20]. Pei et al. [21] is notable in this collection as perhaps the first rigorous counterfactual exercise; it was designed at Columbia University to answer the important question of the time: What would have happened if non-therapeutic interventions in the US had started earlier than March 15? According to their calculations, starting only a week earlier, on March 8, would have saved approximately 35,000 U.S. lives [a 55% reduction (95% CI: 46–62%)] and avoided more than 700,000 COVID-19 cases [a 62% reduction (95% CI: 55–68%)] through May 3. Starting interventions another week earlier could have reduced deaths by more than 50,000 (around 83%), with cases falling proportionately. There were no do-overs, of course. The US was well on its chosen pathway by May, but there would be chances to change course if (not really when) the virus came back. It follows that these published answers to an important “What if we had done X?” question should have become strong reasons to express urgency for renewed action if conditions began to deteriorate sometime downstream. Conditions did deteriorate, with little prompt response, but that is another story.
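The core of a “what if we had started earlier?” exercise is to rerun the same model with only the intervention date changed. The sketch below does this with a deliberately crude two-phase model (exponential growth until the intervention, exponential decline afterwards). It is not the metapopulation model used by Pei et al. [21], and every parameter (seed cases, doubling and halving times, the day-60 start) is hypothetical.

```python
def cumulative_infections(intervention_day: int, horizon: int = 100,
                          seed_cases: float = 10.0, doubling_days: float = 3.0,
                          halving_days: float = 10.0) -> float:
    """Cumulative new infections over `horizon` days in a toy two-phase model.

    Daily infections double every `doubling_days` until the intervention
    starts, then halve every `halving_days` afterwards.
    """
    grow = 2 ** (1 / doubling_days)
    shrink = 0.5 ** (1 / halving_days)
    daily, total = seed_cases, 0.0
    for day in range(horizon):
        total += daily
        daily *= grow if day < intervention_day else shrink
    return total


if __name__ == "__main__":
    factual = cumulative_infections(intervention_day=60)
    for weeks_earlier in (1, 2):
        counterfactual = cumulative_infections(intervention_day=60 - 7 * weeks_earlier)
        print(f"{weeks_earlier} week(s) earlier: "
              f"{1 - counterfactual / factual:.0%} fewer infections")
```

Under these made-up rates a one-week head start removes roughly four-fifths of the infections in the window, because each week of delay lets daily infections grow by a factor of 2^(7/3), about 5. The qualitative pattern resembles the published counterfactual, but the numbers carry no empirical weight.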

Before then, on June 8th, Nature published two different counterfactual studies that considered the opposite question while including other countries. Hsiang et al. [22] focused on six countries (China, France, Iran, Italy, the UK, and the US) where travel restrictions, social distancing, canceled events, and lockdown orders had been imposed. Their calculations, supported by an estimate that COVID-19 cases had doubled roughly every 2 days starting in mid-January, suggested that as many as 62 million confirmed cases (385,000 in the US) had been prevented or delayed through the first week in April by the actions that had been implemented. Meanwhile, Flaxman et al. [23] focused on the same question for 11 European countries. They worked with estimated viral reproduction numbers between 3 and 5; that is, every infected person was expected to infect between 3 and 5 other people, with successive generations of infection separated by the so-called “serial interval” (estimated for COVID-19 in Du et al. [24] to be roughly 4 days). They estimated that a total of 3.1 million deaths (plus or minus 350,000) were avoided through the end of April, and they found that only lockdowns produced statistically significant effects on the number of estimated cases.
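Reproduction numbers and serial intervals translate directly into growth speed. Assuming, for simplicity, that every generation of infection takes exactly one serial interval (real analyses use a full generation-time distribution), the daily growth rate is ln(R)/T and the doubling time is T·ln 2/ln R. A short sketch with hypothetical function names:

```python
import math


def growth_rate(r_number: float, serial_interval: float) -> float:
    """Exponential growth rate per day, assuming every generation takes exactly
    `serial_interval` days (a simplification of a generation-time distribution)."""
    return math.log(r_number) / serial_interval


def doubling_time(r_number: float, serial_interval: float) -> float:
    """Days for case counts to double under the same assumption."""
    return math.log(2) / growth_rate(r_number, serial_interval)


if __name__ == "__main__":
    for r_number in (3.0, 4.0, 5.0):
        print(f"R={r_number:.0f}, serial interval=4 days -> doubling time "
              f"{doubling_time(r_number, 4.0):.2f} days")
```

With R between 3 and 5 and a 4-day serial interval, doubling times fall between about 1.7 and 2.5 days, broadly in line with the roughly 2-day doubling cited above.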

Were these high numbers really physically plausible? Yes, but they must be interpreted in their complete and proper contexts. The reported scenarios of all of the virus studies only described trajectories for cases and deaths that could be attributed to COVID-19 given alternative assumptions about the form and timing of any policy or behavioral response. As a result, each imagined path also involved a course of policy intervention that had other economic and social effects that were not captured in the analysis [25]. Ultimately, it is up to decision-makers to ponder the implicit trade-offs between these intertwined impacts and to ferret out joint levels of tolerable risk, a judgment that cannot be made honestly without acknowledging what the science says. Unfortunately, the president of the US called [21] a “political hit job” [26]. Even more troubling, conservatives more generally greeted coronavirus models with the same “detest” that they have voiced about climate models [27].

It is important to note that modeling of the COVID-19 coronavirus was not the first time in recent history that widespread modeling played a significant role in framing global and national responses and communicating their social value. Shortly after Ebola virus disease (EVD) was identified in West Africa in 2014, modelers around the world began to work to inform decision-makers about the regional and global risks. Chretien et al. [28] chronicle 125 models from 66 publications of trends in EVD transmission (in 41 publications), effectiveness of various responses (in 29), forecasts (projections in 29), spreading patterns across regions and countries (15), the phylogenetics of the disease (9), and the feasibility of vaccine trials (2).

Their takeaway messages include some points that are salient here. Taken in order, they began by highlighting the need to understand the influence of increasing awareness of severe infections across various levels of community, to improve the ability to sustain that awareness, and to include its manifestations in the models. They also argued strongly for model coordination and systematic comparison of modeling results to better understand the major sources of uncertainty and how models accommodate their inclusion. Indeed, they encouraged the adoption of ensemble approaches with transparent architectures for easier communication. Finally, drawing on Yozwiak et al. [29], they stressed the importance of making data and results available more quickly and effectively to all interested parties. These efforts were part of an enormously successful global response organized by the World Health Organization (WHO) and the US Centers for Disease Control and Prevention (CDC), among others. When EVD subsided in November of 2015, 28,000 cases and 11,000 deaths had been reported in Guinea, Liberia, and Sierra Leone. In the US, the final tally was 4 cases diagnosed among 11 cases recorded and 2 deaths [30].

## 3. Dealing more effectively with the challenge of communicating new information

The three phenomena noted above are daunting, but the experiences of the virus modelers whose work was criticized unjustly are evidence of the importance of skilled communication that anticipates the dangers of inserting quality science into a political arena. Of course, improved communication also depends in large measure on better modeling, taken one model at a time or together as an informative ensemble.

These challenges bring to mind several strategies that can be productive in improving the workings of the models and supporting more confidence in their results. Before they are discussed, however, it helps to organize thoughts around more practical issues that can be considered when framing a complete research plan from creation to dissemination:

1. Models should be designed to produce results that are calibrated in terms of the welfare metrics that decision-makers and/or the public are using to compare possible futures against society’s implicit levels of tolerable risk.

2. Modelers should expend some significant efforts using their models to answer “What if?” questions that are actually being asked by decision-makers and members of the public (presumably in reference to tolerable risk). What would happen if we did nothing? Or if we did that? Or something else? What variables are most important in determining trends or variability in the answers to these questions? These are challenges that call for organized counterfactual explorations to consolidate insights from studies like [21, 22, 23].

3. Even more specifically, modelers can find profit in organizing themselves to examine systematically why different models can produce different results. Are the reasons structural, a matter of different assumptions, reflections of different sensitivities to exogenous drivers, and so on? Organizing and participating in carefully designed model comparison experiments, conducted as part of routine model development using representations of uncertainty against tolerable risk levels, can build capacity to communicate with some transparency and intuition why the results of a model or an ensemble of models are true and why they should be taken seriously.

4. Time scale matters in these questions, and so modelers should expect to be asked “When?” as well as “How?” and “What?”. Do calibrations of risk manifest themselves over the short or the long term? Immediately, or with a lag? When should decision-makers plan to act, and what metrics should be monitored to best inform evaluations of the efficacy of their decisions? It follows that answers to counterfactual and model comparison questions can be very time sensitive.

5. Reporting on value of information (VOI) calculations can often support conclusions about which variables are most important in driving the results into the future. This can be important information when it comes to framing plans for the next iteration of the modeling.
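One standard value of information measure behind item 5 is the expected value of perfect information (EVPI): how much lower the expected loss would be, on average, if the uncertain state were known before acting. A minimal sketch for a discrete decision problem follows; the two-state example and its numbers are hypothetical, and EVPI is only one of several VOI measures (the value of partial or sample information are others).

```python
def expected_loss(losses, probs):
    """Expected loss of each action; losses[action][state], probs[state]."""
    return {a: sum(p * loss for p, loss in zip(probs, row)) for a, row in losses.items()}


def evpi(losses, probs):
    """Expected value of perfect information for a loss-minimizing decision maker."""
    # Best achievable when an action must be chosen before the state is known.
    best_without_info = min(expected_loss(losses, probs).values())
    # If the state were revealed first, the best action would be chosen in each state.
    best_with_info = sum(
        p * min(row[s] for row in losses.values()) for s, p in enumerate(probs)
    )
    return best_without_info - best_with_info


if __name__ == "__main__":
    # Hypothetical two-state problem: a severe outcome occurs with probability 0.3;
    # acting costs 20 whatever happens, waiting costs 100 only in the severe state.
    probs = [0.3, 0.7]  # [severe, mild]
    losses = {"act": [20.0, 20.0], "wait": [100.0, 0.0]}
    print(expected_loss(losses, probs))  # acting is preferred without information
    print(f"EVPI = {evpi(losses, probs):.1f}")
```

Here perfect information is worth 14 loss units: acting costs 20 in expectation, while knowing the state first would cut the expected loss to 6. Variables whose resolution yields large EVPI are natural priorities for the next iteration of modeling.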

These thoughts are clearly interwoven, but the following subsections provide some annotated descriptions of the key concepts and how they support the connections.

### 3.1 Thresholds of tolerable risk

Limits of tolerable risk reflect the level of “risk deemed acceptable by society in order that some particular benefit or functionality can be obtained, but in the knowledge that the risk has been evaluated and is being managed” (https://www.encyclopedia.com). Starting with its first report, the New York City Panel on Climate Change (NPCC) [31] employed this notion to frame both its evaluation and management of climate change risks to public and private infrastructure. NPCC communicated the concept to planners and decision-makers by pointing out, for example, that building codes imposed across the City did not try to guarantee that a building would never fall down. Instead, they were designed to produce an environment in which the likelihood of a building’s falling down was below some X% threshold; that is, risk above X% was not “tolerable.” As climate change or a pandemic or any other outside stressor pushes a particular risk profile closer and closer to similarly defined thresholds of social tolerability, it is reasonable to expect that investment in risk-reducing adaptations can quickly become a critical part of an iterative response strategy over time.
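The building-code logic can be sketched as a simple threshold test applied to a risk that grows over time. The sketch assumes, hypothetically, that an annual failure probability rises geometrically as an outside stressor intensifies and reports the first year it exceeds the tolerable threshold X; neither the growth model nor the numbers come from the NPCC.

```python
def first_intolerable_year(p0: float, annual_growth: float, threshold: float,
                           horizon: int = 100):
    """First year in which an annual failure probability exceeds the tolerable threshold.

    Assumes the probability grows geometrically, p_t = p0 * (1 + annual_growth) ** t,
    as an outside stressor intensifies. Returns None if the threshold is not
    crossed within the horizon.
    """
    for year in range(horizon + 1):
        if p0 * (1 + annual_growth) ** year > threshold:
            return year
    return None


if __name__ == "__main__":
    # Hypothetical: 0.5% annual failure probability today, rising 5% per year,
    # against a tolerable threshold of X = 1%.
    print(first_intolerable_year(p0=0.005, annual_growth=0.05, threshold=0.01))  # 15
```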

Figure 1 portrays one way by which current and future risk can be evaluated. A smaller version was created to support adaptation considerations in the face of climate change for public and private investment in New York City infrastructure. The idea was to locate infrastructure on the matrix under the current climate—the beginning of the arrow indicates that location. Planners could then envision how the location on the matrix would move as future trajectories of change evolved—upward curving lines, perhaps, that generally move up and to the right at an increasing rate, but drawn as straight lines in Figure 1 for illustrative simplicity. Green boxes identify low-risk combinations of likelihood and consequence; they are benign and need not be of much worry to the people who manage the facility and the people who benefit from the services that it provides. Yellow and orange boxes identify moderate and significant risk combinations, respectively; they both lie below society’s perception of the limit of tolerable risk. Yellow boxes suggest moderate concern, but the orange boxes capture combinations that fall just short of the threshold of tolerability—the boundary between the orange and red boxes.
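Because Figure 1 itself is not reproduced here, the sketch below assumes a hypothetical 5 × 5 likelihood-consequence matrix in which each cell is scored as likelihood level times consequence level and the score is binned into the four colors. The cut points are illustrative, not the NPCC’s.

```python
def risk_color(likelihood: int, consequence: int) -> str:
    """Color of a cell in a hypothetical 5 x 5 likelihood-consequence matrix.

    Both inputs are ordinal levels from 1 (lowest) to 5 (highest). The score is
    their product; the cut points below are assumptions for illustration.
    """
    if not (1 <= likelihood <= 5 and 1 <= consequence <= 5):
        raise ValueError("levels must be between 1 and 5")
    score = likelihood * consequence
    if score <= 4:
        return "green"   # low risk: little cause for concern
    if score <= 9:
        return "yellow"  # moderate risk: on the radar screen
    if score <= 15:
        return "orange"  # significant but still tolerable: plan and prepare
    return "red"         # beyond the threshold of tolerable risk: act


if __name__ == "__main__":
    # A hypothetical trajectory moving up and to the right over time.
    path = [(1, 1), (2, 2), (2, 3), (3, 3), (3, 4), (4, 4)]
    print([risk_color(lik, con) for lik, con in path])
```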

The arrow in Figure 1 shows how analysts could, by anticipating a dynamic scenario of climate change, alert decision-makers (as they moved along the arrow into the orange region) to the shrinking distance to intolerable red combinations at which some reactive or preventative actions would be required. Assume for comparison that it takes 4 units of time to reach the tip of the arrow and that time is linear with the box dimensions. In the iterative response program, passing from the green region to the yellow takes one unit of time and puts the risk on somebody’s radar screen. Passing from yellow to orange in another 1.25 units of time triggers earnest planning and preparation for adaptive response. Finally, passing into the red region during the final 1.75 units of time identifies the anticipated time for action, which would certainly include the implementation of outcome monitoring initiatives.
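The timing story along the median arrow is a cumulative sum of the leg durations given above. The sketch below assumes, as the text does, that time is linear with the box dimensions and that the final leg ends at the arrow tip after 4 units; the milestone labels paraphrase the text.

```python
from itertools import accumulate

# Time spent on each leg of the median arrow in Figure 1 (units of time, from the text).
LEGS = [
    ("enter yellow: risk goes on the radar screen", 1.0),
    ("enter orange: earnest planning and preparation", 1.25),
    ("final leg into red (arrow tip): anticipated time for action", 1.75),
]


def milestone_times(legs):
    """Cumulative time at which each milestone along the arrow is reached."""
    return list(zip((name for name, _ in legs), accumulate(d for _, d in legs)))


if __name__ == "__main__":
    for name, t in milestone_times(LEGS):
        print(f"t = {t:.2f}: {name}")
```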

Figure 1 also suggests how this conceptual device can incorporate uncertainty about the future into the depiction and the iterative story. The upper dotted line represents a hypothetical 95th percentile scenario that portends larger consequences with growing likelihood. It starts at the same location as the arrow, but it reaches the red region in just 2.4 units of time and spends the remaining 1.6 units plunging farther into the red area. The lower dotted line represents the 5th percentile trajectory. It is shorter because it tracks below the median; it depicts cases where consequences increase more slowly along climate change scenarios that also proceed at a more leisurely pace. It does not even reach the orange level of risk within 4 units of time. Together, these two pathways bound 90% of possible futures drawn from Monte Carlo simulations of a single model, or of an ensemble of parallel modeling efforts, all anchored at current conditions. As a hedge against a high-consequence but lower-likelihood risk tail, decision-makers would expect to accelerate preparation and implementation at the point where the upper boundary of the inner 90% projection region (or of any other likelihood range, higher or lower, determined by social norms) crosses the orange-red boundary. If this analysis were completed, the reported results could include a distribution of projected response-action trigger times rather than a single-valued best guess.
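
A distribution of trigger times of the kind just described can be produced with a few lines of Monte Carlo code. The Python sketch below is illustrative only: it assumes a hypothetical risk index that rises linearly at a lognormally distributed rate, normalizes the orange-red boundary to 1, and calibrates the median and the upper tail to the 4 and 2.4 time units mentioned in the text. It is not fitted to the lower dotted line or to any model discussed in the chapter.

```python
import math
import random
import statistics

# Hypothetical setup (not from the chapter): a normalized risk index
# R(t) = rate * t rises linearly, and R = 1 marks the orange-red
# (tolerability) boundary. A median rate of 0.25 per time unit puts the
# median crossing at t = 4.
MEDIAN_RATE = 0.25
# Spread chosen so the 95th-percentile path crosses at t = 2.4
# (1.6449 is the standard normal 95th percentile); purely illustrative.
SIGMA = math.log((1 / 2.4) / MEDIAN_RATE) / 1.6449

def simulate_crossing_times(n, seed=0):
    """Return the time at which each simulated trajectory reaches R = 1."""
    rng = random.Random(seed)
    rates = (MEDIAN_RATE * math.exp(rng.gauss(0.0, SIGMA)) for _ in range(n))
    return [1.0 / rate for rate in rates]

times = simulate_crossing_times(20_000)
cuts = statistics.quantiles(times, n=20)  # 5th, 10th, ..., 95th percentiles
early, median, late = cuts[0], cuts[9], cuts[18]

print(f"trigger (95th-percentile path): t = {early:.2f}")
print(f"median crossing:                t = {median:.2f}")
print(f"90% range of crossing times: {early:.2f} to {late:.2f}")
```

Because every simulated path rises monotonically, the 95th-percentile risk path crosses the boundary at the 5th percentile of crossing times, so the hedged trigger time can be read directly off the crossing-time distribution.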

Achieving broad acceptance of any tolerable risk threshold is, of course, a huge task for many reasons. For one, risk tolerance (the location of an institutional or personal risk threshold) varies widely across societies and individuals. For another, the real challenge for governors confronting a pandemic or extreme climate change might be navigating between different, and perhaps strongly contradictory or competing, risk management plans. It is possible, though. New York State, for example, relied on science to frame its economic strategies in terms of avoiding futures that would overwhelm its hospital system during a second wave of the virus after what had been a successful first response. It supplemented the White House [33] “gating criteria” with two forward-looking thresholds: (1) hold the transmission rate of the virus below 1.0 and (2) keep vacancies of hospital beds and ICU beds across the state above 30% of total bed capacity [13]. These are two tolerable risk thresholds to which results from integrated epidemiological-economic models can certainly speak, if the models are properly designed.
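
Checking an ensemble of model projections against thresholds like these is straightforward. The Python sketch below is a hypothetical illustration, not New York’s actual procedure: the `Projection` fields, the reading that hospital and ICU vacancies must each exceed 30%, and the sample numbers are all assumptions.

```python
from dataclasses import dataclass

@dataclass
class Projection:
    """One hypothetical model projection for the state on a given day."""
    transmission_rate: float   # projected transmission rate (dimensionless)
    free_hospital_beds: float  # vacant share of total hospital beds, 0 to 1
    free_icu_beds: float       # vacant share of total ICU beds, 0 to 1

def within_tolerable_risk(p, max_transmission=1.0, min_vacancy=0.30):
    """True if a projection satisfies both forward-looking thresholds."""
    return (
        p.transmission_rate < max_transmission
        and p.free_hospital_beds > min_vacancy
        and p.free_icu_beds > min_vacancy
    )

def share_within(projections):
    """Fraction of an ensemble of projections satisfying both thresholds."""
    return sum(within_tolerable_risk(p) for p in projections) / len(projections)

# Hypothetical four-member ensemble.
ensemble = [
    Projection(0.90, 0.35, 0.40),  # passes both thresholds
    Projection(1.10, 0.40, 0.45),  # fails: transmission rate too high
    Projection(0.95, 0.32, 0.25),  # fails: too few vacant ICU beds
    Projection(0.80, 0.50, 0.50),  # passes both thresholds
]
print(f"share of projections within tolerable risk: {share_within(ensemble):.0%}")
```

A decision-maker could then require that some agreed share of the ensemble (say, 95%) stay within both thresholds before easing restrictions; that required share is itself a social choice, not something the models supply.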

### 3.2 Counterfactual exercises

What can be learned when public health and climate change researchers confront the ubiquitous “What if?” questions of science? Recall that Section 2 reported on three COVID-19 counterfactual studies that were of extreme interest to decision-makers and the public at large, addressing the questions “What if we had started sooner?” and “What if we had not shut down the economy?” [21, 22, 23]. The results were striking but plausible. More importantly, all three studies were direct applications of one of the most fundamental research strategies in all of science. Counterfactual explorations represent an approach to rigorous scientific inquiry that defines a research question, a trial group to test an answer, and a control group to provide a basis for comparison; in other words, the scientific method applied to scenarios with and without policy interventions.
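
A “What if we had started sooner?” experiment can be sketched with a textbook SIR (susceptible-infected-recovered) model in which the trial and control runs differ only in the day an intervention begins. Every parameter value below is hypothetical, and this is not the model used in the studies cited; it only shows the trial-versus-control structure.

```python
def simulate_sir(intervention_day, days=180, population=1_000_000,
                 initial_infected=10, beta=0.3, beta_after=0.08, gamma=0.1):
    """Daily-step SIR model; returns cumulative infections after `days`.

    All parameter values are hypothetical. The intervention lowers the
    contact rate from `beta` to `beta_after` from `intervention_day` onward;
    `gamma` is the daily recovery rate.
    """
    s = float(population - initial_infected)
    i = float(initial_infected)
    cumulative = float(initial_infected)
    for day in range(days):
        b = beta if day < intervention_day else beta_after
        new_infections = b * i * s / population
        recoveries = gamma * i
        s -= new_infections
        i += new_infections - recoveries
        cumulative += new_infections
    return cumulative

# Control and trial differ only in the single policy variable of interest.
control = simulate_sir(intervention_day=30)  # hypothetical actual start date
trial = simulate_sir(intervention_day=23)    # "What if we had started a week sooner?"
print(f"control: {control:,.0f}  trial: {trial:,.0f}  avoided: {control - trial:,.0f}")
```

Holding everything else fixed is the design choice that matters here: any difference between the two runs can be attributed to the earlier start date alone, which is exactly what a control group provides.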

Similar examples abound across climate science as well. The Summary for Policymakers of IPCC [17], for example, contains an iconic result from a comparison of two extreme assumptions. Figure SPM.4, replicated here as Figure 2, depicts a result that changed the way the entire world thought about global warming and about our confidence in the proposition that it was primarily the product of human activity. The various panels of the figure compare the actual historical global mean surface temperature record (starting in 1910) with distributions of estimated global mean trajectories produced by an ensemble of climate models that included (trial group) and did not include (control group) historically observed carbon emissions and associated forcings. The actual temperature pathway tracks inside only the distributions that include carbon forcings. Moreover, the inner 90-percentile regions of the two distributions around the mean estimates bifurcate around 1980 (earlier for some continents and later for Australia); that is, beyond those bifurcation dates, the likelihood that both distributions are the products of a static climate is virtually nil. Actual temperature tracking therefore combines with the bifurcations to confirm, with very high confidence in 2007, that carbon emissions are a primary cause of observed long-term warming globally and across 6 of 7 continents.
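
The bifurcation idea can be sketched in Python by tracking, year by year, whether the 5th to 95th percentile ranges of a trial ensemble and a control ensemble still overlap, and reporting the first year after which they never overlap again. The synthetic ensembles below are hypothetical stand-ins for model output, and this is one simple way to operationalize the idea, not the IPCC’s detection-and-attribution method.

```python
import random
import statistics

def inner_90(values):
    """Return the (5th, 95th) percentile bounds of an ensemble."""
    cuts = statistics.quantiles(values, n=20)
    return cuts[0], cuts[18]

def bifurcation_year(years, trial, control):
    """First year from which the trial and control 5-95% ranges never overlap.

    `trial` and `control` map each year to a list of ensemble-member values.
    Returns None if the ranges still overlap in the final year.
    """
    first = None
    for year in years:
        t_lo, t_hi = inner_90(trial[year])
        c_lo, c_hi = inner_90(control[year])
        separated = t_lo > c_hi or c_lo > t_hi
        if separated and first is None:
            first = year
        elif not separated:
            first = None  # ranges overlap again, so reset the candidate year
    return first

# Synthetic ensembles (hypothetical): the control group has no trend, while
# the trial group warms by 0.015 degrees per year; both have noise of 0.1.
rng = random.Random(42)
years = range(1910, 2006)
control = {y: [rng.gauss(0.0, 0.1) for _ in range(50)] for y in years}
trial = {y: [rng.gauss(0.015 * (y - 1910), 0.1) for _ in range(50)] for y in years}
print("ranges bifurcate from", bifurcation_year(years, trial, control))
```

Requiring the separation to persist, rather than taking the first non-overlapping year, guards against declaring a bifurcation on the strength of one noisy year.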

Figure 3 shows results from a more recent counterfactual approach that confronts a “try this versus try that” comparison from [20]. The three panels show the results of a modeling exercise designed to produce distributions of economic cost (or benefit) from climate change in four 20-year climate eras running from 2020 to 2100, for 4 different mitigation (temperature target) futures and 7 geographical regions that cover the contiguous 48 states of the US [34]; distributions of transient regional temperature changes were drawn from [35]. Panel A shows estimates of labor costs (in terms of lost annual wages per capita) for two different emissions scenarios: one is a “business as usual” (BAU) scenario, and the other keeps global mean surface temperature (GMST) increases below 2°C through 2100 along the median trajectory. Bifurcations of the inner 90% ranges occur by mid-century, and losses along BAU are uniformly much higher. Panel B replicates Panel A for another two emissions scenarios: one limits the median GMST increase to 1.5°C, and the other to 3°C. Again, statistically significant bifurcations occur in mid-century, and losses are higher with warmer temperatures [34].

Loss differences for labor and 15 other sectors were a critical topic of concern when the IPCC received an invitation from the members of the United Nations Framework Convention on Climate Change to provide a report “on the impacts of global warming of 1.5°C above pre-industrial levels.” The IPCC accepted the invitation in April of 2016 when it decided to prepare a “Special Report on the impacts of global warming of 1.5°C above pre-industrial levels and related global greenhouse gas emission pathways, in the context of strengthening the global response to the threat of climate change, sustainable development, and efforts to eradicate poverty” [36]. The headline messages included: “Climate-related risks for natural and human systems are higher for global warming of 1.5°C than at present, but lower than at 2°C (high confidence). These risks depend on the magnitude and rate of warming, geographic location, levels of development and vulnerability, and on the choices and implementation of adaptation and mitigation options (high confidence). The avoided climate change impacts on sustainable development, eradication of poverty and reducing inequalities would be greater if global warming were limited to 1.5°C rather than 2°C, if mitigation and adaptation synergies are maximized while trade-offs are minimized (high confidence).”

When it came to economic damages, though, Yohe [35] suggests that the value of hitting a warming target of 1.5°C instead of 2°C might not be as impressive as it is for natural systems and for other social systems that are already stressed by confounding factors. Panel C of Figure 3 makes this point for 16 sectors that were subjected to the same regional analysis described above. All of the 2°C distributions overlap the 1.5°C distributions in 2090, so no bifurcations can be observed. Any conclusion of higher economic cost for the 2°C target must therefore be offered with at most medium confidence, and it would, in any case, rest on a single study.

These examples show that decision-makers and the public should welcome having their decisions and perceptions informed by counterfactual experiments designed to identify when differences in the risk profiles of alternative responses become statistically significant in terms of their net social benefit. By plotting the foundational distributions over time for alternative response options, these experiments can quickly estimate when the risk portraits of various policy options can be expected to become statistically different with, say, very high confidence, because the 5th to 95th percentile distribution cones bifurcate. That is valuable information, no doubt, for designing and implementing an iterative risk management response for a particular decision-making structure (such as one built around avoiding intolerable risk).

### 3.3 Model comparisons

In the climate arena, large groups of willing modelers sometimes agree to run their models with the same distributions of the same sets of driving variables, either to explore their models’ respective sensitivities or to compare the performance of response policies across a spectrum of projected futures [37, 38, 39]. Sometimes the participants also run contrasting, idiosyncratic “modelers’ choice” scenarios, and some even run full Monte Carlo analyses across relevant sources of uncertainty. When that happens, scientists can learn something about themselves as well as about their topics of interest. The early EMF-12 experiment, for example, displayed a curious result that has persisted over time: the variances of the output distributions for the modelers’ choice runs were significantly smaller than the output variances for the “common inputs” runs. It would seem that integrated assessment teams tended to be uncomfortable if their results were outliers in comparison with competing teams. That bias warrants caution in decision contexts where ensemble distributions may be too narrow, because thick tails could be catastrophic [37].
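
One way to check whether one set of model outputs is significantly less dispersed than another is a permutation test on the ratio of sample variances. The Python sketch below uses made-up numbers for “modelers’ choice” and “common inputs” runs, and it is not the statistical procedure used in EMF-12. Each sample is centered on its own mean first, so the test compares spreads rather than levels.

```python
import random
from statistics import mean, variance

def less_dispersed_pvalue(a, b, n_perm=5_000, seed=0):
    """One-sided permutation test: is sample `a` less dispersed than `b`?

    Returns (observed variance ratio a/b, permutation p-value). Each sample
    is centered on its own mean so only the spreads are compared.
    """
    mean_a, mean_b = mean(a), mean(b)
    a_c = [x - mean_a for x in a]
    b_c = [x - mean_b for x in b]
    observed = variance(a_c) / variance(b_c)
    pooled = a_c + b_c
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        ratio = variance(pooled[:len(a_c)]) / variance(pooled[len(a_c):])
        if ratio <= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)

# Hypothetical outputs (e.g., a projected quantity in 2100) from eight teams.
modelers_choice = [10.0, 10.5, 9.8, 10.2, 10.1, 9.9, 10.3, 9.7]
common_inputs = [6.0, 14.0, 8.0, 12.5, 9.0, 15.0, 5.5, 11.0]
ratio, p = less_dispersed_pvalue(modelers_choice, common_inputs)
print(f"variance ratio = {ratio:.3f}, one-sided p = {p:.4f}")
```

A permutation test avoids assuming that the outputs are normally distributed, which matters when the tails are precisely what is in question.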

The Coupled Model Intercomparison Project (CMIP) was established by the Program for Climate Model Diagnosis and Intercomparison (PCMDI) at Lawrence Livermore National Laboratory. PCMDI’s mission since 1989 has been to develop methods to rigorously diagnose and evaluate climate models from around the world, because the causes and character of divergent modeling results should be uncovered before those models are trusted by decision-makers. Over time, CMIP has “inspired a fundamental cultural shift in the climate research community: there is now an expectation that everyone should have timely and unimpeded access to output from standardized climate model simulations. This has enabled widespread scientific analysis and scrutiny of the models and, judging by the large number of resulting scientific publications, has accelerated our understanding of climate and climate change” [38].

CMIP itself began in 1995 with the support and encouragement of the World Climate Research Program (WCRP). Its first set of common experiments compared model responses to an “idealized” forcing: a 1% per year increase in atmospheric carbon dioxide concentration. Subsequent experiments extended the idealized-forcing work to include parallel investigations of historical forcings and comparisons with the observed records of climate variables such as global mean surface temperature [39]. CMIP5 and CMIP6, for example, have explored why ensemble results do not track observations perfectly. The reason is uncertainty. Model results reflect uncertainties, of course, but temperature observations are also imprecise. They are not records from a global set of thermometers; they are, instead, the products of model interpretations of remotely sensed data. Understanding why and how the differences occur is especially important because those differences are ammunition for attacks by science skeptics and politicians with an anti-climate change perspective [40, 41].
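
As a worked example of what a forcing of this kind implies, a quantity C that compounds at 1% per year from an initial level C_0 doubles and quadruples on the timescales below. This is simple compound-growth arithmetic, not a CMIP result:

$$
C(t) = C_0\,(1.01)^t, \qquad
t_{2\times} = \frac{\ln 2}{\ln 1.01} \approx 69.7\ \text{years}, \qquad
t_{4\times} = \frac{\ln 4}{\ln 1.01} \approx 139.3\ \text{years}.
$$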

CMIP also sponsors coordinated experiments such as the “water hosing” experiments, designed to explore the sensitivity of the strength of the North Atlantic overturning (thermohaline) circulation to changes in upper-ocean salinity. There, climate is held constant except for a simulated influx of fresh glacial meltwater from Greenland. Since its inception, CMIP has also focused a great deal of attention on making model intercomparison data available to a wider scientific community than the modelers themselves. Here, a global coordination effort for scientific collaboration is acknowledging that communication to other research communities, decision-makers at all levels, and private citizens is an important part of its job description; some authors (e.g., [40]) have even included second abstracts, written in plain language, in their published versions.

Contrasting that approach with public health model comparisons, which operate on dramatically different time scales and therefore serve dramatically different client needs, adds diversity to the sources of new knowledge about the models and their relative skills. Shea et al. [42], motivated by the aggressive responses of many modeling groups seeking to “forecast disease trajectory, assess interventions, and improve understanding of the pathogen,” expressed concern that their disparate projections might “hinder intervention planning and response by policy-makers.” These authors recognized that models differ widely for a variety of good and not-so-good reasons. They also noted that relying on one model for authority might cause valuable “insights and information from other models” to be overlooked, thereby “limiting the opportunity for decision-makers to account for risk and uncertainty and resulting in more lives lost.” As a result, they advocated a more systematic approach that would use expert elicitation methods to inform a CMIP-style model comparison architecture, within which decision-theoretic frameworks would provide rigorous access to calibration techniques. Although this proposal adds to a long, if sometimes sporadic, tradition of model comparison in public health, NSF [43] notes that it would be the first time modelers would be allowed, within the structure itself, to see why their models disagree.

### 3.4 Time scale matters

Taken together, an ensemble of models designed to inform COVID-19 response decisions could produce estimates simultaneously and independently at many time scales and geographic resolutions: daily, monthly, and a few years into the future, for a city or town, a state, a region, or the country as a whole. So, too, can ensembles of climate models. Informed by new data and/or new understanding of the processes that drive component parts of their models, some modelers in either context can publish new sets of estimates for new combinations and permutations of scale and location at the same time. Done often enough over the course of a month or two for pandemics, or 5 or 10 years for climate, such updates would let modelers synthesize collections of time series of short-term estimates for any number of important output variables.

Daily recalculations may be excessive for most pandemic models, but surely decision-makers, analysts, and media types are eager to compare, on a regular basis, the ensemble distributions of these results against the actual historical data. These plots produce insight into the relative near-term skills of the models across different geographic scales. Modelers would surely be interested as well, as shown by the large participation in the CMIP exercises, because they will continue to try to improve their work and make it more valuable and accessible to the decision-makers who use it and the correspondents who interpret it. The point, as described in the mission of the PCMDI, is to generate early confidence in modelers’ abilities to project what will likely happen given what just happened, and to communicate what that means clearly to the populations that care.
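
A basic near-term skill comparison of this kind can be sketched in Python: for each model, compute the mean absolute error of its median forecasts and the share of observations that fall inside its 90% interval (well-calibrated intervals should capture roughly 90% of observations over a long record). The model names and numbers below are hypothetical.

```python
def skill_summary(observed, forecasts):
    """Score each model's near-term forecasts against observations.

    observed:  observed values, e.g., daily hospital admissions
    forecasts: {model_name: [(median, lower_5th, upper_95th), ...]},
               aligned element by element with `observed`
    Returns {model_name: (mean absolute error, 90%-interval coverage)}.
    """
    summary = {}
    for name, rows in forecasts.items():
        pairs = list(zip(rows, observed))
        abs_errors = [abs(median - obs) for (median, _, _), obs in pairs]
        inside = [lower <= obs <= upper for (_, lower, upper), obs in pairs]
        summary[name] = (sum(abs_errors) / len(pairs), sum(inside) / len(pairs))
    return summary

# Hypothetical four days of observations and two models' forecasts.
observed = [100, 110, 125, 130]
forecasts = {
    "model_a": [(98, 90, 110), (112, 100, 125), (120, 105, 135), (135, 115, 150)],
    "model_b": [(90, 85, 95), (95, 88, 102), (100, 92, 108), (105, 96, 114)],
}
for name, (mae, coverage) in skill_summary(observed, forecasts).items():
    print(f"{name}: MAE = {mae:.2f}, coverage = {coverage:.0%}")
```

Here model_a has a mean absolute error of 3.5 with every observation inside its intervals, while model_b’s narrow intervals miss every observation; tracking both numbers over time separates a merely inaccurate model from an overconfident one.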

In light of this responsibility, climate scientists and epidemiologists have found it useful to conduct short-run skill tests of their models, because they anticipate the need to understand and portray future changes that may happen very quickly. For example, it may become imperative at some point to cope with sudden downstream impacts along an otherwise gradual scenario of change, such as an impact caused by crossing some unexpected critical threshold at some unknown date. In other cases, testing near-term skill may be important to reassure clients of the quality of model results, so that the implications of new and significantly different information can be processed quickly by decision-makers and the public.

### 3.5 The value of information

In an era of increasingly tight budgets, it is imperative that funders in both the public and private sectors understand the value of investments in different types of information distributed across and within the relevant research areas. Climate change research is a case in point; billions of dollars are being spent to improve the knowledge base for future decision-making. A study by the National Research Council [44] called for decision tools to assist in estimating “the value of new information which can help decision makers plan research programs and determine which trends to monitor to best implement a risk management strategy.” Identifying the relevant decision-makers’ measurable priorities is critical, though, because their net-benefit valuation protocols produce the value-of-information (VOI) metrics with which these calculations can be conducted. The Academy study emphasized that decision theory could provide explicit descriptions of how to evaluate rigorously the value of perfect and imperfect information, much as was done 30 years ago in [45, 46, 47].

On the health side of the comparison, VOI estimates need not be calibrated in monetary currency; human lives or other metrics can be employed, especially if they support aggregation across locations. This is one of the major lessons from the “risk to unique and threatened systems” Reason for Concern described in [48], even for the two areas where risks are measured in aggregates. In fact, risks of extreme (weather) events and risks to unique and threatened systems (including human communities) are areas where alternatives to financial currency are preferred.

Assuming access to decision-makers’ lists of operative and quantifiable valuation metrics (including confidence in the ability to keep experienced risk below a tolerable maximum), temporal distributions of the value of enacting a re-opening strategy for a locked-down economy relative to a “stay the course” strategy could be made available, for example. The sign and magnitude of the value differences, measured in lives, jobs, gross domestic product, income inequality, or even the likelihood of slowing progress toward an SDG, could all be of interest; so, too, would estimates of the distributions of the value of improved (or degraded) information about driving variables and/or epidemiological-socio-economic specifications.
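
The expected value of perfect information (EVPI) is a standard decision-theoretic starting point for such calculations: the expected payoff when the action can be chosen after the uncertainty is resolved, minus the expected payoff of the best action chosen now. The Python sketch below computes it for a hypothetical re-open versus “stay the course” choice; the two states, their probabilities, and the payoffs (in an arbitrary loss metric, such as thousands of jobs) are invented for illustration. In this framework, EVPI is also an upper bound on what any imperfect information could be worth.

```python
def evpi(prior, payoffs):
    """Expected value of perfect information for a discrete decision problem.

    prior:   {state: probability}
    payoffs: {action: {state: net benefit}} in the decision-maker's own metric
    """
    # Best expected payoff if the action must be chosen under current uncertainty.
    value_now = max(
        sum(p * row[state] for state, p in prior.items()) for row in payoffs.values()
    )
    # Expected payoff if the true state were revealed before choosing.
    value_informed = sum(
        p * max(row[state] for row in payoffs.values()) for state, p in prior.items()
    )
    return value_informed - value_now

# Hypothetical example: net benefits (negative = losses) of two strategies.
prior = {"mild second wave": 0.7, "severe second wave": 0.3}
payoffs = {
    "stay the course": {"mild second wave": -10, "severe second wave": -20},
    "re-open": {"mild second wave": 0, "severe second wave": -60},
}
print(f"EVPI = {evpi(prior, payoffs):.2f}")
```

Here staying the course is the better choice under uncertainty (expected payoff of -13 versus -18), but knowing the state in advance would allow re-opening in the mild case, raising the expected payoff to -6; the difference of 7 units is the most a decision-maker should pay for research that resolves the uncertainty.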

## 4. Concluding remarks

The parallel and analogous roles of modeling in supporting response action in the face of two different sources of global existential risk, global viral pandemics and global human-induced climate change, motivated this discussion of the importance of continuing to improve multiplicative modeling efforts on both fronts. The rationales for choosing those sources of risk were many. Both are sources of enormous risk, with impact distributions whose tails are very thick, so “regressing to the tail” is the appropriate frame. Both have adopted similar risk-based approaches to decision-making at micro and macro scales [49]. Both have explored similar modeling techniques and have pursued common methods for improvement. Both have faced issues with the communication of difficult subjects and concepts, and both have been subjected to misguided and manipulative attack.

The critical need to continue concerted efforts to improve the science is matched in importance by two other essential components. One is the need to improve communication to decision-makers and the public at large, not only to advance knowledge and understanding of the results at the appropriate decision-making hub and in the associated population, but also to defend the results and their communication from misguided and sometimes dishonest attack. Both motives have recently been highlighted by the Working Group on Readying Populations for COVID-19 Vaccine of the Johns Hopkins Center for Health Security [50], in preparation for achieving wide acceptance of a vaccine that can, once it is created and declared safe and effective, be distributed quickly and globally. The other component is the recognition that global risks require global responses, and therefore collaborative work across research groups, decision-makers, and populations scattered around the world.

The World Climate Research Program has been devoted to just that for decades, trying to improve not only the production of collaborative new scientific results of real social value but also their communication to positions of power. Climate efforts on a global scale have been impressive so far, but the work is far from done, and progress on real action vis-a-vis SDG-13 has been slow. The WCRP mission has not, however, been lost on scholars in the international health community. Chretien et al. [27] offered this assessment of the then-current state of affairs: “New norms for data-sharing during public health emergencies would remove the most obvious hurdle for model comparison. The current situation where groups either negotiate bilaterally with individual countries or work exclusively with global health and development agencies is understandable, but highly ineffective. The EVD outbreak highlights again, after the 2003 Severe Acute Respiratory Syndrome epidemic and the 2009 influenza A (H1N1) pandemic, that an independent, well-resourced global data observatory could greatly facilitate the public health response in many ways, not least of which would be the enablement of rapid, high quality, and easily comparable disease-dynamic studies.”

The widely varying COVID-19 experiences across the world brought these points to the fore, just as they exposed a plethora of social, ethical, and economic realities. In a world moving toward nationalism, with persistent racism, growing inequities, and threats from the wealthiest nation on the planet to withdraw from the WHO, global welfare as calibrated by the metrics underlying the 17 SDGs is certainly in peril from these two cross-cutting themes. Recognition of common goals, common approaches, and common dedication to the general welfare of the planet among climate and health science researchers may not be enough. Bringing those communities together through the matching international institutions designed to confront global crises with science and communication may be one of our best and last chances to avoid relying on universal herd immunity to protect us from everything.

## Acknowledgments

I gratefully acknowledge the many contributions of Henry Jacoby, Richard Richels, and Benjamin Santer to this work. I benefit from and enjoy our weekly phone conversations about climate-related topics of the day. We work together on opinion pieces, which we have placed in many venues. What appears here was drawn, to some degree, from the cutting room floor of our discussions, so any errors are certainly mine.

## Conflict of interest

The author declares no conflict of interest.

## How to cite and reference

### Cite this chapter

Gary Yohe (September 22nd 2020). On the Value of Conducting and Communicating Counterfactual Exercise: Lessons from Epidemiology and Climate Science [Online First], IntechOpen, DOI: 10.5772/intechopen.93639. Available from: