Methodology of Estimating Socioeconomic Burden of Disease Using National Health Insurance (NHI) Data

Cost-of-illness (COI) studies convert the burdens associated with illnesses into monetary values in order to measure the socioeconomic costs that those illnesses impose on a given society. The costs estimated by COI studies provide an important basis for estimating the amounts of public health resources spent and productivity losses incurred and thereby make it possible to quantify the socioeconomic burdens that illnesses impose on society in general. In this chapter, we review the diverse methodologies and techniques for estimating the socioeconomic burden of disease that are widely used in the established literature worldwide and compare the pros and cons of each. This chapter introduces the existing COI studies in terms of their research designs, data selection and value assessment processes, applied perspectives, and chosen components of costs. Furthermore, this chapter introduces a real-world example of estimating the national economic burden of disease by using the National Health Insurance (NHI) data. We hope that this chapter will help readers better understand and use COI studies.


Introduction
Cost-of-illness (COI) studies convert the burdens associated with illnesses into monetary values in order to measure the socioeconomic costs that those illnesses impose on a given society. The costs estimated by COI studies provide an important basis for estimating the amounts of public health resources spent and productivity losses incurred and thereby make it possible to quantify the socioeconomic burdens that illnesses impose on society in general. In other words, COI studies provide important information for determining the socioeconomic costs of illnesses, which, in turn, makes it possible to better prioritize public health policy tasks and issues. In particular, the findings of such studies tend to be straightforward and intuitive and aid policymakers in making related decisions.
In this chapter, we review the diverse methodologies and techniques for estimating the socioeconomic burden of disease that are widely used in the established literature worldwide and compare the pros and cons of each. Furthermore, this chapter introduces a real-world example of estimating the national economic burden of disease by using the National Health Insurance (NHI) data.

Review of methods for estimating the costs of illness
This section introduces the existing COI studies in terms of their research designs, data selection and value assessment processes, applied perspectives, and chosen components of costs.

Study designs
COI studies can be roughly divided into two groups, depending on the approaches they adopted to estimate the socioeconomic costs of illnesses. These two approaches are the incidence-based approach and prevalence-based approach [1][2][3].

Incidence-based approach
The incidence-based approach involves estimating the socioeconomic cost of a given illness throughout the entire lifespan of the illness, from its initial stage to the patientʼs complete recovery or death. This involves estimating not only the economic burden currently imposed by the illness but also the cost of future health-related losses, including those caused by sequelae. This approach allows the researcher to identify economic losses over time, from the present into the future, but makes it impossible to take into account patients who were already suffering from the disease before the period under study. In other words, the incidence-based approach may not be well suited to estimating the economic burdens of certain types of illnesses (i.e., those that currently have low incidence rates but high prevalence rates) at certain moments in time.

Prevalence-based approach
In contrast to the incidence-based approach, the prevalence-based approach considers the economic burden accruing during a fixed period of time from all patients suffering from a given illness in that period, both existing patients and those newly diagnosed. This approach is well suited to estimating the economic costs of an illness at certain points in time but may not allow the researcher to estimate the cost accrued throughout the lifespan of the illness, from its initial stage to the patientʼs complete recovery (or death). Furthermore, this approach may not be well suited to estimating the costs of frequent yet short-lived illnesses that do not last long enough for the researcher to find and identify suitable patients within a given period of time.
The prevalence-based approach is by far the more popular method used in previous studies. This is because it is important to take into account both new and existing patients suffering from the given illnesses in order to estimate the socioeconomic costs of those illnesses during certain periods.
The characteristics and pros and cons of these two approaches are summarized below (Table 1).
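The contrast between the two designs can be sketched with a toy cohort. This is a minimal illustration, not a method taken from the cited studies: the `Patient` fields, the function names, and the 3% discount rate are all assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    onset_year: int     # year the illness began
    end_year: int       # last year with illness-related costs (recovery or death)
    annual_cost: float  # illness-related cost per year, in constant prices


def prevalence_based_cost(patients, year):
    """Cost incurred in `year` by every patient ill during that year, new or existing."""
    return sum(p.annual_cost for p in patients if p.onset_year <= year <= p.end_year)


def incidence_based_cost(patients, year, discount_rate=0.03):
    """Discounted cost over the whole illness course for patients whose illness began in `year`.

    The 3% default discount rate is an illustrative assumption.
    """
    total = 0.0
    for p in patients:
        if p.onset_year != year:
            continue  # cases that began in other years are outside the incident cohort
        for t in range(p.end_year - p.onset_year + 1):
            total += p.annual_cost / (1 + discount_rate) ** t
    return total


cohort = [
    Patient(onset_year=2010, end_year=2016, annual_cost=100.0),  # existing case in 2014
    Patient(onset_year=2014, end_year=2015, annual_cost=200.0),  # new case in 2014
]
print(prevalence_based_cost(cohort, 2014))  # both patients contribute one year of cost
print(incidence_based_cost(cohort, 2014))   # only the 2014 case, over its whole course
```

In this sketch the long-standing 2010 case is invisible to the incidence-based estimate for 2014, which mirrors the limitation described above for illnesses with low incidence but high prevalence.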

Data selection
COI studies can also be divided into top-down and bottom-up studies, depending on how the data used were obtained [4,5].

Top-down studies
Top-down COI studies make use of data concerning the entire given population, including the entire range of diseases affecting that population on the national level, and then separate the diseases one by one to estimate their individual costs. In Korea, the most favored source of data for such studies is the billing information kept by the National Health Insurance Service (NHIS). Billing data provide a convenient glimpse into the total socioeconomic costs of illnesses in the given society. However, the vast scope of these data can easily lead researchers to include in their estimates expenses and costs that are not directly related to the given illness (e.g., costs of prescriptions or medical tests due to secondary conditions).

Bottom-up studies
Bottom-up COI studies review all relevant individual illnesses and then estimate the total socioeconomic cost of these illnesses for the given nation. These studies use the medical records of individual patients to estimate the costs for individual patients and then expand those estimates to arrive at the total cost for the entire group of patients affected. While this method affords relatively greater accuracy in estimation than the top-down method, it is, realistically, quite difficult to estimate the national socioeconomic cost due to the sheer volume and complexity of the data on individual patients. There may also be regional disparities in the availability and use of healthcare services, meaning that the resulting data may fail to represent the entire given society.
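The difference between the two data strategies can be sketched as two small functions. This is a hedged illustration under simplifying assumptions: `disease_share` (the fraction of total expenditure attributed to the illness) and the simple mean-cost scaling are hypothetical choices made here, not procedures prescribed by the cited literature.

```python
def top_down_cost(total_expenditure, disease_share):
    """Top-down: attribute a share of aggregate expenditure to one illness."""
    if not 0.0 <= disease_share <= 1.0:
        raise ValueError("disease_share must lie between 0 and 1")
    return total_expenditure * disease_share


def bottom_up_cost(sampled_patient_costs, national_patient_count):
    """Bottom-up: scale the mean cost seen in individual records up to all patients."""
    if not sampled_patient_costs:
        raise ValueError("at least one sampled patient is required")
    mean_cost = sum(sampled_patient_costs) / len(sampled_patient_costs)
    return mean_cost * national_patient_count
```

The bottom-up estimate is only as representative as the sampled records, which is where the regional disparities noted above would enter.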

Description
• Incidence-based approach: Estimates the economic cost of an illness throughout its lifespan, ranging from the initial stage to the patientʼs complete recovery (or death)
• Prevalence-based approach: Estimates the economic cost of an illness during a certain period of time by taking into account the costs generated by both new and existing patients

Pros
• Incidence-based approach: Allows the researcher to consider not only the current cost but also the future cost of an illness and the sequelae it causes and thereby estimate the economic losses incurred both in the present and the future
• Prevalence-based approach: Better suited to estimating the current cost of an illness
• Prevalence-based approach: Allows the researcher to consider both new and existing patients at given point(s) in time

Cons
• Incidence-based approach: Makes it difficult for the researcher to consider existing patients that have already been afflicted with the given illness
• Incidence-based approach: Not applicable to illnesses that, at present, have high prevalence and low incidence rates
• Prevalence-based approach: Makes it difficult for the researcher to estimate the total economic cost of an illness throughout its entire lifespan
• Prevalence-based approach: The researcher may not find patients suffering from the given illness if the illness lasts for relatively short spans of time, despite its high incidence rate

Table 1. Comparison of approaches to estimating the socioeconomic costs of illnesses.

Value assessment
In general, there are two ways to evaluate and estimate the indirect socioeconomic costs of illnesses (namely, losses of labor and productivity). These are the human capital approach and the willingness-to-pay (WTP) approach [6,7].

Human capital approach
The human capital approach is the most commonly used method for estimating the value of human life and the costs of illnesses. Viewing humans as productive actors, this approach estimates the current value of a human life as the discounted future expected income.
In estimating the socioeconomic costs of illnesses, this approach posits patients as productive actors and applies specific discount rates to the income they would have earned through their labor in order to estimate their losses of working hours and resulting losses in productivity. This method equates the costs of death and illnesses to the losses of future total income that patients could have earned had they remained healthy. This approachʼs focus on the losses of labor productivity caused by individualsʼ illnesses reveals the opportunity costs of illnesses and death.
This approach is favored because the data it requires for estimating costs are relatively readily available and the outcomes of the analysis are relatively less influenced by the researcherʼs bias or subjective interpretation. Moreover, this approach translates the direct costs (e.g., costs of healthcare service) and indirect costs (e.g., losses of productivity) incurred by illnesses into losses of future income, estimated on the basis of the patientsʼ current income level. However, this approach may be discriminatory, in effect, against certain underproductive groups, such as students, housewives, and seniors. Some also criticize the approach for its implied ethic, i.e., that the value of human life can be measured on the basis of a personʼs ability to earn income. Finally, the approach also runs the risk of underestimating the intangible costs of illnesses, such as declines in quality of life and psychological suffering.
Most studies adopt the human capital approach. Compared to the WTP approach, the human capital approach is less time-consuming, more cost-effective, and better suited to ensuring the objectivity of analysis results, as it excludes the researcherʼs bias. Most importantly, it clearly quantifies losses of productivity due to illnesses based on patientsʼ income levels.
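The core calculation of the human capital approach, the discounted stream of expected future income, can be sketched as follows. This is a minimal sketch under stated assumptions: the function name, the callable wage and employment-rate inputs, the 3% discount rate, and the default working-age bounds are illustrative choices, not values prescribed by the approach itself.

```python
def present_value_of_lost_income(age_at_death, annual_wage_by_age, employment_rate_by_age,
                                 discount_rate=0.03, min_age=15, max_age=69):
    """Discounted income a person would have been expected to earn up to `max_age`.

    `annual_wage_by_age` and `employment_rate_by_age` are callables taking an age;
    all default values are illustrative assumptions.
    """
    total = 0.0
    start = max(age_at_death, min_age)  # no earnings are counted before working age
    for age in range(start, max_age + 1):
        years_ahead = age - age_at_death
        expected_income = annual_wage_by_age(age) * employment_rate_by_age(age)
        total += expected_income / (1 + discount_rate) ** years_ahead
    return total


# Example: flat annual wage of 30,000 and an 80% employment rate, death at age 60.
loss = present_value_of_lost_income(60, lambda age: 30_000.0, lambda age: 0.8)
print(round(loss))
```

Because the result depends directly on wage and employment inputs, groups with little or no market income receive low valuations, which is the equity concern raised above.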

Willingness-to-pay approach
Also known as the contingent valuation method, the WTP approach estimates the economic value of something that is not easily converted into a monetary sum by surveying how much people would be willing to pay for it. COI studies adopting this approach ask survey participants how much they would be willing to pay to maintain or improve their health. This approach rests on the commonsensical assumption that peopleʼs preferences for things that are not easily monetized can be used to estimate the economic values of those things. However, as this approach requires people to estimate the economic values of things that they are not used to monetizing, the answers given by survey participants may not be a reliable measure of the true value of those things.
The questions used to survey peopleʼs willingness to pay can be either open-ended or closed-ended. Open-ended questions ask participants to state the maximum amounts of money they would be willing to pay, while closed-ended questions provide a few options from which participants may choose. Closed-ended questions can be further divided into questions that apply bidding games and questions that use the dichotomous choice method. Questions using bidding games identify the maximum amounts people would be willing to pay by presenting them with a series of specific amounts of money and asking them whether they would be willing to pay such amounts. Questions that use the dichotomous choice method, on the other hand, present participants with two monetary amounts at a time and proceed to the next pair of amounts depending on which of the preceding amounts the participants chose. The dichotomous choice method imposes relatively less cognitive burden on participants in deciding the economic values of certain things and allows them to arrive at a decision even in the absence of in-depth knowledge of the market situation. However, the answers that participants choose through this method may be merely the amounts of money they view as acceptable to pay, and not the maximum amounts of money they would be willing to pay. In applying this method, it is also difficult for the researcher to decide the proper intervals between the figures to be presented, meaning that it may take quite a long time for the researcher to identify the final amount of money that participants would actually be willing to pay.
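The bidding-game format described above can be sketched as a simple loop over ascending amounts. This is a hedged illustration only: the function name `bidding_game`, the `accepts` callback standing in for a respondentʼs yes/no answer, and the stop-at-first-refusal rule are assumptions made for this sketch rather than a standard survey protocol.

```python
def bidding_game(accepts, bids):
    """Offer bids in ascending order and return the highest one accepted.

    `accepts` is a callable returning True if the respondent would pay the bid.
    The game stops at the first refusal; None means even the lowest bid was refused.
    """
    highest_accepted = None
    for bid in sorted(bids):
        if not accepts(bid):
            break
        highest_accepted = bid
    return highest_accepted


# A respondent whose true maximum is 35 is recorded at 30 with this bid ladder.
print(bidding_game(lambda bid: bid <= 35, [10, 20, 30, 40, 50]))
```

The recorded value can only be one of the amounts offered, so the spacing of the bid ladder bounds how closely the elicited figure can approach the respondentʼs true maximum.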
The table below provides a summary of the differences between the human capital approach and the WTP approach (Table 2).

Perspectives of analysis
The conclusions of analyses on the costs of illnesses may be dramatically different depending on which perspective the researchers chose to adopt. Since first attempted and defined, COI studies have been a popular topic of research and debate among many researchers worldwide. Analyses in the established literature today are largely guided by three perspectives: namely, the payer perspective, the patient perspective, and the societal perspective [8][9][10][11].

Payer perspective
The payer perspective focuses on the costs of illnesses that are paid by insurers and not patients. In South Korea, these costs are the costs covered by the NHI or the healthcare costs confirmed by the Health Insurance Review and Assessment Service (HIRA). These covered costs can be either narrowly construed as only the amounts paid by the insurer or more broadly construed as including the amounts of copayments made by patients as well. Taking the broad meaning would thus require the estimation of the healthcare costs confirmed by the HIRA, which encompass both the costs paid by the insurer and the copayments made by patients.

Patient perspective
The patient perspective requires the researcher to analyze and estimate the costs paid by patients due to given illnesses. These costs include the direct healthcare and non-healthcare costs and indirect costs. The direct healthcare costs include the copayments made by patients, the non-covered costs, and the costs of informal medical services, while the direct non-healthcare costs include the expenses patients have to pay in order to receive medical services, such as transportation expenses. Finally, the indirect costs include the costs incurred by patients in terms of time and the costs of caregiving.

Societal perspective
The societal perspective covers the costs estimated from both the payer and patient perspectives as well as the losses of societal productivity caused by the given diseases. In other words, the costs estimated based on this perspective include the costs of lost labor and productivity due to patients taking leaves of absence or dying prematurely. These costs may also encompass the costs of declines in quality of life and the psychological suffering of patients.
The table below summarizes the differences among these perspectives (Table 3).
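How the choice of perspective changes the total can be sketched with one set of cost items. This is a simplified illustration: the `CostItems` fields and function names are hypothetical, informal medical services and intangible costs are omitted, and the items are assumed not to overlap (in particular, patient time and productivity loss are treated as separate quantities).

```python
from dataclasses import dataclass


@dataclass
class CostItems:
    insurer_paid: float       # covered costs paid by the insurer
    copayment: float          # patient copayments for covered services
    non_covered: float        # non-covered services paid by the patient
    transportation: float     # direct non-healthcare cost
    caregiving: float         # caregiving cost
    patient_time: float       # patient time spent obtaining care
    productivity_loss: float  # lost output from absence or premature death


def payer_cost(c, broad=False):
    """Narrow definition: insurer payments only; broad definition adds copayments."""
    return c.insurer_paid + (c.copayment if broad else 0.0)


def patient_cost(c):
    """Everything borne by the patient, direct and indirect."""
    return c.copayment + c.non_covered + c.transportation + c.caregiving + c.patient_time


def societal_cost(c):
    """Insurer payments plus patient-borne costs plus productivity loss.

    The narrow payer cost is used so that copayments are counted only once.
    """
    return payer_cost(c) + patient_cost(c) + c.productivity_loss


items = CostItems(100.0, 20.0, 10.0, 5.0, 15.0, 8.0, 50.0)
print(payer_cost(items), payer_cost(items, broad=True), patient_cost(items), societal_cost(items))
```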

Components of costs
Existing studies that embrace the societal perspective generally posit several specific components of the costs subject to analysis, including direct costs, indirect costs, and intangible costs [8,11,12,13].

Direct costs
Direct costs refer to the amounts of money spent directly on treating or managing a given illness or, more specifically, the amounts of money spent at medical institutions for the treatment and management of that illness. These costs can be further broken down into direct healthcare costs and direct non-healthcare costs. The direct healthcare costs include the costs of outpatient and/or hospitalization services and of purchasing medications (including prescribed medications) to treat the given illness. For outpatients and hospitalized patients alike, they encompass the covered costs paid by the insurer, copayments paid by patients, non-covered costs, and prescription costs.
The direct non-healthcare costs refer to the expenses paid by patients to visit and use the services of medical institutions, such as the costs of transportation and caregiving.

Indirect costs
Indirect costs refer to the losses of labor and productivity that are incurred in addition to the tangible (financial) costs of an illness. Examples include the amounts of time taken off (paid) work to go to medical institutions and the loss of future expected income, not only of patients but also of their family members or other loved ones who are compelled to care for them. The latter example may also be expressed as the opportunity costs of being ill, including the losses of working hours and leisure time.
The indirect costs, or losses of productivity, are estimated by treating the number of hospitalization days as the number of working days lost and the time spent on outpatient visits as lost working time. The losses of future income are those caused by the premature deaths of patients.

Intangible costs
The intangible costs represent the decline in the quality of life and psychological suffering of patients and loved ones. However, it is notoriously difficult to define and quantify these costs with precision (Drummond et al., 2005). Due to the scarcity of related data and the difficulty of quantification, researchers often forgo estimating these costs (Table 4).
Table 3. Costs estimated based on different perspectives.

Example of COI research using NHI data
In this section, we present an example of a cost-of-illness study that uses NHI data. This example, titled "Socioeconomic Cost of Allergies," estimates the socioeconomic costs associated with allergic diseases using NHI data in South Korea [14]. In South Korea, all citizens are compulsory subscribers to the NHI scheme, which is a type of social insurance, and all medical institutions or health professionals are required to submit claim data to the NHI in order to bill for medical services. In other words, the NHI has medical information on around 50 million South Koreans. We hope this example will be useful for readers conducting cost-of-illness studies.

Study design and cost components
The present example adopts the prevalence-based approach, because it is important to take into account both new and existing patients suffering from allergic diseases during certain periods. In addition, this example employs the human capital approach as the value assessment method, because it clearly quantifies losses of productivity due to illness based on patientsʼ income levels. Moreover, it is better suited to ensuring the objectivity of analysis results, as it excludes the researcherʼs bias. As for perspective, this study adopts the societal perspective and estimates both the direct costs paid by the insurer and patients and society-wide losses of productivity.
Table 4. Cost components and definitions. Intangible costs: cost associated with declining quality of life and psychological suffering.
As the purpose of this study is to estimate the entire scope of the socioeconomic costs generated by allergic diseases in South Korea, this study estimates both the direct and indirect costs. The direct costs are divided into healthcare and non-healthcare costs, as in previous studies that adopted the societal perspective. The indirect costs involve losses of productivity. More specifically, the direct healthcare costs include the costs incurred by outpatients and hospitalized patients, encompassing covered costs paid by the insurer, copayments made by patients, non-covered costs, and prescription costs. The direct non-healthcare costs involve all expenses associated with visiting medical institutions, whether as outpatients or hospitalized patients, and receiving services for the treatment and management of allergic diseases, including the costs of transportation and caregiving. The indirect costs, or losses of productivity, are estimated by treating the number of hospitalization days as the number of working days lost and the time spent on outpatient visits as lost working time. The losses of future income due to premature deaths are estimated for patients aged 15-69 (patients outside of this age bracket are excluded, as, in accordance with the law, they constitute the non-working-age population). Due to the absence of objective data, however, intangible costs are not estimated in this example. The components of costs estimated in this study are summarized in Table 5.

Data source and case definition
To analyze the socioeconomic costs of allergic diseases in South Korea, this study used the 2014 National Patient Sample (NPS), which is derived from NHI data collected by the HIRA. In South Korea, almost all citizens (98% or higher) are compulsory subscribers to the NHI scheme, which is a type of social insurance, and all medical institutions are required to submit claim data to the HIRA in order to bill for the medical services they provide. Consequently, the HIRA has medical information on around 50 million South Koreans. The NPS is a sample drawn from the large volume of claim data held by the HIRA; it is an abridged version of the claim data that contains 1 year of information on the medical treatments and prescriptions of the sampled patients. The data cover about 1.4 million patients, a 3% sample of all patients.
NHI data are administrative data, and the prevalence rate estimated from them depends on the case definition of the disease. In other words, prevalence rates can vary dramatically depending on how cases are defined. Most previous studies that made use of administrative data, including NHI data, used primary diagnoses to estimate prevalence rates. This approach, however, carries the risk of either underestimating or overestimating the prevalence rates. This study therefore applied more rigorous criteria in defining prevalence. First, it identified and extracted patients whose primary and secondary diagnoses were indicated using the ICD-10 codes for allergic diseases. Of these patients, this study identified those who had been hospitalized or made at least two outpatient visits for allergic diseases and had been prescribed drugs commonly used to treat allergic diseases (as indicated in their insurance billing records), such as nedocromil sodium, oral steroids, and Ventolin. Only patients meeting these criteria were included in this study.
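The case definition above can be expressed as a filter over per-patient claim summaries. This is a sketch under loudly stated assumptions: the record layout (`PatientClaims`) is hypothetical and does not reflect the actual NPS file structure, the ICD-10 prefixes (J45 asthma, J30 vasomotor and allergic rhinitis, L20 atopic dermatitis) are illustrative stand-ins rather than the studyʼs code list, and the drug set contains only the examples named in the text.

```python
from dataclasses import dataclass

# Illustrative assumptions, not the study's actual lists.
ALLERGY_ICD10_PREFIXES = ("J45", "J30", "L20")
ALLERGY_DRUGS = {"nedocromil sodium", "oral steroid", "ventolin"}


@dataclass
class PatientClaims:
    diagnosis_codes: list           # primary and secondary ICD-10 codes over the year
    allergy_admissions: int         # hospitalizations for allergic disease
    allergy_outpatient_visits: int  # outpatient visits for allergic disease
    prescribed_drugs: set           # drug names found in the billing records


def is_allergy_case(p):
    """Apply the three criteria: diagnosis code, utilization threshold, and drug use."""
    has_diagnosis = any(code.startswith(ALLERGY_ICD10_PREFIXES) for code in p.diagnosis_codes)
    enough_utilization = p.allergy_admissions >= 1 or p.allergy_outpatient_visits >= 2
    has_drug = bool(ALLERGY_DRUGS & {drug.lower() for drug in p.prescribed_drugs})
    return has_diagnosis and enough_utilization and has_drug


patients = [
    PatientClaims(["J45.9"], 0, 3, {"Ventolin"}),  # meets all three criteria
    PatientClaims(["J30.1"], 0, 1, {"Ventolin"}),  # only one outpatient visit
    PatientClaims(["L20.9"], 1, 0, set()),         # no qualifying prescription
]
cases = [p for p in patients if is_allergy_case(p)]
print(len(cases))
```

Requiring all three conditions together is what makes the definition stricter than one based on diagnosis codes alone.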

Estimation methods
The sources of data for each component of the costs estimated in this study, including direct and indirect costs, are as follows.

Direct costs
Direct costs refer to the amounts of spending directly related to illness and are divided into healthcare and non-healthcare costs.
Direct healthcare costs are the costs of preventing, treating, or managing illnesses by using medical institutions and include the costs of outpatient services, hospitalization, and medications (prescriptions). The majority of existing studies that estimate direct healthcare costs rely on administrative and official statistics for their estimations. As there is little controversy over the use of administrative and official statistics in estimating the direct healthcare costs of illness, this study also uses the NHI data to estimate the direct healthcare costs of allergic diseases. Depending on who pays them, direct healthcare costs can be further broken down into covered costs paid by the insurer, copayments made by patients, and non-covered costs also paid by patients. The formula used to estimate the direct healthcare costs is provided below.
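As a minimal sketch of the aggregation described in this paragraph, the three payer components can be summed within sex and age strata. The claim field names used here (`sex`, `age_group`, `insurer_paid`, `copayment`, `non_covered`) are hypothetical, and this simple sum is an illustration rather than the studyʼs own formula.

```python
from collections import defaultdict


def direct_healthcare_costs(claims):
    """Sum insurer payments, copayments, and non-covered costs per (sex, age_group)."""
    totals = defaultdict(float)
    for claim in claims:
        stratum = (claim["sex"], claim["age_group"])
        totals[stratum] += claim["insurer_paid"] + claim["copayment"] + claim["non_covered"]
    return dict(totals)


claims = [
    {"sex": "F", "age_group": "30-39", "insurer_paid": 70.0, "copayment": 30.0, "non_covered": 10.0},
    {"sex": "F", "age_group": "30-39", "insurer_paid": 35.0, "copayment": 15.0, "non_covered": 0.0},
    {"sex": "M", "age_group": "60-69", "insurer_paid": 140.0, "copayment": 60.0, "non_covered": 25.0},
]
by_stratum = direct_healthcare_costs(claims)
print(by_stratum)
```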
The direct non-healthcare costs are the costs of transportation and caregiving incurred by patients in seeking and receiving the services of medical institutions. This study draws upon the national official statistics data on transportation costs. These data include information on the costs of one-way transportation paid by outpatients and hospitalized patients; the final costs of transportation were estimated on the assumption that some patients would be accompanied by their caregivers. The costs of one-way trips were multiplied by the price-adjusted index and used to estimate the total costs of round-trip transportation. As for the cost of caregiving, this study used the average daily cost of hiring a caregiver, as suggested by the caregivers association. Defining the cost of caregiving as the opportunity cost of caregiversʼ time during patientsʼ hospitalization, this study applied the average daily wage for caregivers as the unit cost of caregiving. This unit cost was then multiplied by the number of hospitalization days. The cost of caregiving was also estimated for outpatient visits based on the assumption that each outpatient visit takes up one-third of the caregiverʼs daily working hours. The formula used to estimate the direct non-healthcare costs is provided below.
where NHC = Direct non-healthcare costs, s = Sex, y = Age, N = Number of visits, i = Inpatients, o = Outpatients, Ct = Cost of transportation, L = Length of stay, Cc = Cost of caregiving.
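The calculation described above, using the quantities just defined (N, L, Ct, Cc, summed over sex and age strata), might be sketched as follows. This is one plausible reading of the text, not the studyʼs exact formula: the parameter names are hypothetical, and `companion_share` (the fraction of trips on which a caregiver also travels) is an assumption, since the text does not state how accompaniment was quantified.

```python
def direct_non_healthcare_cost(strata, one_way_cost_inpatient, one_way_cost_outpatient,
                               daily_caregiving_cost, price_index=1.0, companion_share=0.0):
    """Transportation plus caregiving cost, summed over sex-age strata.

    Each stratum is a dict with `inpatient_visits` (N_i), `outpatient_visits` (N_o),
    and `length_of_stay` (L, total hospitalization days).
    """
    # A round trip is twice the price-adjusted one-way cost; companions add extra fares.
    trip_factor = 2 * price_index * (1 + companion_share)
    total = 0.0
    for s in strata:
        transport = trip_factor * (s["inpatient_visits"] * one_way_cost_inpatient
                                   + s["outpatient_visits"] * one_way_cost_outpatient)
        # One caregiver day per hospital day, one-third of a day per outpatient visit.
        caregiving = daily_caregiving_cost * (s["length_of_stay"] + s["outpatient_visits"] / 3)
        total += transport + caregiving
    return total


strata = [{"inpatient_visits": 2, "outpatient_visits": 6, "length_of_stay": 10}]
print(direct_non_healthcare_cost(strata, 5.0, 3.0, 90.0))
```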

Indirect costs
Indirect costs do not represent actual financial costs paid but the losses of labor and productivity due to illnesses. The indirect costs represent the amounts of working time lost in order to visit and use the services of medical institutions, the loss of future income due to the premature death of patients, and the opportunity cost of caregiving. The opportunity costs so incurred include not only the amount of working time lost but also the amount of leisure time lost. This study draws upon the official employment and labor statistics provided by the government in order to estimate the indirect costs. These statistics provide information on average daily and monthly wages, total working hours, and employment rates by sex and age.
The losses of labor (productivity) due to the need for treatment and recovery and losses of future income due to premature death were estimated in the following manner. First, loss of labor can be understood as the opportunity cost of labor incurred by spending time hospitalized or making outpatient visits to medical institutions instead of working. These opportunity costs were thus estimated for the working-age population (ages 15-69). In the case of hospitalized patients, loss of productivity was found by multiplying the daily average wage for each age group by the number of hospitalization days. For outpatients, the daily average wage for each age group was multiplied by the number of outpatient visits made, and the result was divided by 3 (based on the assumption that outpatient visits took up one-third of each patientʼs daily working hours). The formula used to estimate the losses of productivity is provided below.
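The productivity-loss rule described in this paragraph can be sketched as follows. The stratum layout and the wage lookup keyed by sex and age are hypothetical conveniences introduced here; the one-third-day rule for outpatient visits and the 15-69 working-age restriction follow the text.

```python
def productivity_loss(strata, daily_wage, min_age=15, max_age=69):
    """Lost output from hospital days and outpatient visits for working-age patients.

    Each stratum is a dict with `sex`, `age`, `inpatient_days`, and `outpatient_visits`;
    `daily_wage` maps (sex, age) to the average daily wage.
    """
    total = 0.0
    for s in strata:
        if not min_age <= s["age"] <= max_age:
            continue  # outside the working-age population
        wage = daily_wage[(s["sex"], s["age"])]
        total += wage * s["inpatient_days"]         # each hospital day is one lost working day
        total += wage * s["outpatient_visits"] / 3  # each visit costs one-third of a working day
    return total


strata = [
    {"sex": "F", "age": 40, "inpatient_days": 3, "outpatient_visits": 6},
    {"sex": "M", "age": 75, "inpatient_days": 10, "outpatient_visits": 3},  # excluded by age
]
wages = {("F", 40): 90.0, ("M", 75): 50.0}
print(productivity_loss(strata, wages))
```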
The loss of future income due to the premature death of patients represents the decrease in expected income that individuals could have earned had they lived to their full life expectancy. To estimate this loss, this study relied upon the raw data for the official statistics on causes of death provided by the government to identify the number of deaths by sex and age and then applied the death rate to the average monthly wage and number of working days for each age group. The loss of future income was again estimated for the working-age population (ages 15-69) only, applying the employment rate of each age group. In order to convert the estimated