Computer-Aided Pharmacoepidemiology in Drug Use and Safety: Examining the Intersection between Data Science and Medicines Research

Pharmacoepidemiology is a relatively new area of study that focuses on research aimed at producing data about drugs’ usage and safety in well-defined populations. Its significant impact on patient safety has translated into improved health care systems worldwide, where it has been widely adopted. The field has developed to such an extent that policy makers and guideline writers have started using its evidence alongside that produced by randomised controlled clinical trials. Although this improvement has been partly attributed to the adoption of statistics and computer-aided models in the design and conduct of pharmacoepidemiology studies, certain gaps still exist. This chapter reports some of the significant developments made, along with the gaps observed so far, in the adoption of statistics and computing in pharmacoepidemiology research. The goal is to highlight efforts that have led to the new pharmacoepidemiology developments, while examining the intersection between data science and pharmacology through narrative reviews of research on computer-aided pharmacology. The chapter shows that a significant number of initiatives have been adopted to improve pharmacoepidemiology research. Nonetheless, further developments in integrating pharmacoepidemiology with computers and statistics are needed in order to enhance the research agenda.


Introduction
Pharmacoepidemiology is a research field that applies epidemiological concepts to clinical pharmacology. It is important in providing an evidence base for pharmacotherapy, because digital data are now abundant whereas the available evidence is mostly scanty [1,2]. Pharmacoepidemiology studies aim to quantify patterns of drug use, as well as adverse drug events, and cover prescribing, appropriateness of use, adherence to treatment regimens and persistence patterns, along with factors that help to predict medication use. In addition, pharmacoepidemiology includes drug safety studies in large populations that focus on common and uncommon, as well as predictable and unpredictable, adverse drug reactions (ADRs) [3]. All of these studies rely on meta-data sources, which include primary data, comprising national data sources and surveys or registries, and secondary data, comprising administrative databases, claims databases, and primary care electronic health and medical records. Figure 1 presents a general description of pharmacoepidemiology [4] as a multidisciplinary research field at the intersection of mathematical disciplines and pharmacology.
Recently, it has been established that clinical trials alone are mostly insufficient to provide conclusive data about a drug's safety and the occurrence of adverse effects in larger populations, especially idiosyncratic adverse events and other rare events. This is attributed to both the smaller populations and the shorter time periods in which the medicines are tested. Additionally, the effectiveness of the medicines is not fully determined by the time they are launched onto the market. Post-marketing surveillance, with the help of either statistical or computing models on longitudinal data, becomes a critical tool for solving these challenges. Furthermore, it is important to highlight that adverse drug events and a drug's efficacy can vary between clinical trial protocols and health care delivery systems [5][6][7]. Therefore, pharmacoepidemiology research data has found its way into many aspects of health care systems, such as policy making, drug utilisation and safety decision making, clinical trial design or validation, as well as guidance for the improvement of medical prescription by physicians. It is also essential for research and project implementation, methodology development, vaccine and medical device safety assessment, as well as for minimisation of medication errors and drug-induced toxicities [8].

Challenges and opportunities linked to pharmacoepidemiology
Pharmacoepidemiology research provides very important data for patient safety and care, and the data generated are more informative and reliable when the study is well designed. Pharmacoepidemiology research offers many advantages, including the use of large patient samples and inclusion of subpopulations that are under research in uncontrolled conditions [1]. It also describes and estimates the risks and other drug safety or efficacy phenomena in practice [9]. Pharmacoepidemiology approaches make the studies cheaper and faster, when compared to the randomised controlled trials initially performed prior to marketing or after marketing, thus enabling researchers to assess generic medications, as well as medications after a long period of use. The methods used in pharmacoepidemiology research can also be adapted for use in pharmacovigilance to assist in unearthing unknown side effects or ADRs, together with the discovery of new drug usages [10].
DOI: http://dx.doi.org/10.5772/intechopen.98730
However, pharmacoepidemiology research also has its own drawbacks, such as contamination of the data with confounding factors and many sources of bias (information bias, selection bias) that arise from the non-randomised nature of treatment selection and make it harder to draw conclusions [1,11]. In addition, although statistical models have already been incorporated into pharmacoepidemiology, little is known about integrating pharmacology with community behaviour models, such as social networks. Nonetheless, different scholars have suggested several ways of improving pharmacoepidemiology research, including the use of active comparison groups and within-individual designs, as well as propensity scoring [12]. Additionally, pharmacoepidemiology studies have also been improved by triangulation of multiple analytical and data collection approaches, aiming to enhance the confidence in inferred causal relationships [13]. The developments made in the use of databases, computer and statistical models, and big data have led to enormous improvements in the robustness of pharmacoepidemiology studies and the production of reliable data that is being considered as good evidence for inclusion in guidelines, alongside data generated from randomised controlled trials [14].
Having shown that pharmacoepidemiology research is now producing data that is important for health care guidelines and policy development, it is essential that researchers can collaborate with guideline writers to ensure that they frame their questions to get useful answers. On the other hand, pharmacoepidemiology researchers should design their studies in such a way that guideline writers are provided with concrete answers, thus reducing the uncertainty in the evidence base. Additionally, since pharmacoepidemiology depends on statistical and data sciences, there is a need for further development of techniques in these fields to improve the application of pharmacoepidemiology. It is also important to enhance public engagement and capacity building (data resources and researcher base) to take full advantage of future opportunities [1].

Computational and statistical models in pharmacoepidemiology
The advent and development of computers has led to the development of databases that have become essential in pharmacoepidemiology. Several Electronic Health Record (EHR) systems have been developed to keep longitudinal digital records of patient health information generated over a series of visits in a hospital setting [15]. EHRs contain patient data related to diseases, medicines and laboratory results, if any, and enable health care providers to deliver patient-centred treatment [16,17]. When these databases are linked or nationalised, patients no longer need to describe their medical histories repeatedly when their treatment is transferred. In addition, such data can be accessed by policy makers or researchers [18]. The use of computerised databases has led to a significant reduction in adverse events and prescription errors [19,20], shorter hospital stays and lower mortality [21], along with better patient tracking, information exchange, efficient handling of information, and real-time data provision [16,22]. Large pharmacoepidemiology databases facilitate research, but they require well-trained personnel to produce and handle big data [17,23]. The use of electronic data has led to a significant reduction in the manual effort of data collection, easy incorporation of regional data into a study, minimal need for recalls, and removal of interviewer bias [24].

Usage of computational and statistical models
So far, a very close link between pharmacology and computational and statistical models has been established (Figure 1). Bentley [25] provides a well-organised chapter describing the key statistical models used in the field of pharmacoepidemiology, at both the descriptive and inferential levels of analysis. Description uses measures of central tendency (e.g. mean), dispersion (e.g. variance) and range (e.g. maximum and minimum), expressed in tables (e.g. cross-tabulations) and charts, whereas inference may use regression models (e.g. linear, logistic, and Cox). These statistical techniques and descriptions aid in understanding data on usage and effects of drug administration at community level, although it is also important to have a good knowledge of the potential errors involved in the design and analysis of pharmacoepidemiology studies [26].
Statistics play a major role in managing the quantifiable errors present in pharmacoepidemiology data analysis and interpretation [27]. Despite a growing interest in applying epidemiological statistical methods in pharmaceutical studies, proper usage of the statistical techniques in research studies is often still lacking. For example, Suissa [26] states that observational pharmacoepidemiology studies are hugely affected by information bias (when selecting variables of interest for the study), selection bias (during inclusion and exclusion of subjects), and confounding bias (due to imbalances in covariates). To circumvent these problems, randomised controlled trials, as well as the cohort and case-control studies also used in epidemiological studies [28], have been recommended by several researchers in pharmacoepidemiology [29].
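To illustrate how an imbalance in a covariate can produce confounding bias, the sketch below compares a crude odds ratio with a stratum-adjusted one. The counts are invented, and the Mantel-Haenszel pooled odds ratio is used here only as one standard adjustment technique; the source does not prescribe it.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio of a 2x2 table.

    a, b: exposed cases and non-cases; c, d: unexposed cases and non-cases.
    """
    return (a * d) / (b * c)


def mantel_haenszel_or(tables):
    """Mantel-Haenszel odds ratio pooled over stratum-specific 2x2 tables."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return numerator / denominator


# Hypothetical counts. Older patients are more often exposed AND more often
# have the outcome, although the drug has no effect within either age group.
strata = {
    "younger": (10, 90, 40, 360),
    "older": (120, 280, 30, 70),
}

# Collapsing the strata gives the crude (unadjusted) table.
crude_table = tuple(sum(cells) for cells in zip(*strata.values()))
crude_or = odds_ratio(*crude_table)
adjusted_or = mantel_haenszel_or(list(strata.values()))

print(round(crude_or, 2), round(adjusted_or, 2))
```

In this invented data set the crude estimate suggests roughly a doubling of the odds, while the age-adjusted estimate shows no association, which is the pattern expected when age acts as a confounder.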
Accordingly, in order to appraise the significance of epidemiological data and the design of studies on drug risk and safety, we reviewed a couple of research studies that have been conducted in developing countries, including Malawi. We focused on the key statistical and computational methods used in these studies. To achieve this, we used an approach similar to the one described by Sequi et al. [30], who reviewed studies to underscore the processes of analysing and reporting data related to paediatric drug utilisation. Of the 22 studies in their review, the majority (91%) reported at least one descriptive measure, with the mean being the most common (82%, 18/22), followed by the standard deviation (23%, 5/22). The chi-square test was used in 12 studies, while graphical analysis was reported in 14 papers. However, only 16 papers reported the number of drug prescriptions and/or packages, while 10 reported the prevalence of drug prescription. Consequently, the authors observed that only a few of the studies reviewed applied statistical methods and reported data in a satisfactory manner [27].
In a review paper that takes stock of the current usage of statistical models in pharmacoepidemiology, Rosli and others [31] systematically reviewed published studies on drug utilisation in hospitalised neonates in Europe, the United States, India, Brazil, and Iran. The findings were not far from those reported by Sequi et al. [30], in that a majority (70%) used descriptive statistics to analyse pharmacoepidemiology data. Nonetheless, some quite remarkable variations were observed regarding the study design and methodology, sources of data, and sampling process among the selected studies. Of the included studies, 45% were based on cross-sectional or retrospective designs, 40% were prospective, and the remainder (15%) were point prevalence surveys.
Likewise, a 2020 review of 84 drug utilisation studies among neonates by Al-Turkait et al. [32] has shown that the median, range and mean are the statistical parameters most frequently reported for describing pharmacoepidemiology data, and that the style of reporting is mostly descriptive. However, in general public health, Hayat et al. [33] identified a variety of statistical methods in the 216 papers they reviewed: 81.9% used an observational study design, 93.1% included substantive analysis, 95% used descriptive statistics (tabular or graphical), and 76% used statistical inference (t-test, chi-square, correlation with confidence intervals and p-values). Logistic regression models were frequently used (38.4%), followed by linear regression models (19.4%).
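The chi-square test mentioned in these reviews can be sketched for the simplest case of a 2x2 table. The counts below are invented, the function name is hypothetical, and the test is computed without a continuity correction; with one degree of freedom the p-value can be obtained from the complementary error function, so only the standard library is needed.

```python
from math import erfc, sqrt


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic and p-value for a 2x2 table (1 df).

    Rows are the two groups; columns are outcome present / outcome absent.
    No continuity correction is applied.
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: drug A (30 with ADR, 70 without),
# drug B (15 with ADR, 85 without).
chi2, p_value = chi_square_2x2(30, 70, 15, 85)
print(round(chi2, 2), round(p_value, 3))
```

For tables with small expected counts an exact test would normally be preferred; this sketch only shows the arithmetic behind the commonly reported statistic.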
Sequi et al. [30] recommended that the methodology of drug utilisation studies needs to be improved and we have also observed that drug use in the community is affected by drug availability, pricing, and affordability [34]. Therefore, the logistical and socio-economic aspects of pharmacoepidemiology studies should not be ignored. These two observations were the two key benchmarks for scoring the papers we have found and reviewed. For each study, we extracted information on the study design/type, data sources, period, assessment of variables used and corresponding statistical estimates (incidence, prevalence, pharmacy sales, prescription data), and diagnostic assessment. Table 1 provides the overall summary details of the included papers.
By analysing Table 1, we have noticed that pharmacoepidemiology research in some developing countries, like Malawi, is still in its infancy, compared to other developing countries that have adopted advanced inferential analyses in their pharmacoepidemiology research. Our findings do not differ from those reported by Sequi et al. [30], in which the majority of the papers focused on the use of descriptive statistics. In addition, few studies clearly demonstrated the use of social/human behaviour network models in pharmacoepidemiology research [44,45]. The inclusion of social/human behaviour network models in pharmacoepidemiology research is fundamental to understanding community structure and behaviour, for instance before mass drug administration during an outbreak such as COVID-19 [46,47].

Big data in pharmacoepidemiology
Big data is another translational and frontier scientific discipline at the interface of computer science and statistics [48]. This field has found its way into pharmacoepidemiology research by simplifying the data interpretation and trend analysis of the volumes of data produced from many sources in health records [49]. With big data, pharmacoepidemiology research experts and data scientists detect ADRs, and collaborate in signal detection, verification and validation of medication or vaccine safety signals, as well as in the expansion of analytic methodologies for analysing the large volumes of heterogeneous data [14]. For example, the Exploring and Understanding Adverse Drug Reactions (EU-ADR) European project has incorporated innovative research methods in their pharmacovigilance research through the use of a web platform, aiming to provide advanced medication data exploration and assessment features. This enables data scientists and pharmacoepidemiology experts to mine EHRs for drug-events of their interest [4,50].
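Signal detection in large collections of reports is often approached through disproportionality measures. The sketch below computes a proportional reporting ratio (PRR) with an approximate 95% confidence interval from invented counts. The choice of the PRR, the counts and the rule of flagging a signal when the lower bound exceeds 1 are illustrative assumptions and do not describe the method used by EU-ADR.

```python
from math import exp, log, sqrt


def proportional_reporting_ratio(a, b, c, d):
    """PRR with an approximate 95% confidence interval.

    a: reports with the drug of interest and the event of interest
    b: reports with the drug of interest and any other event
    c: reports with all other drugs and the event of interest
    d: reports with all other drugs and any other event
    """
    prr = (a / (a + b)) / (c / (c + d))
    # Standard error of log(PRR), as for a ratio of two proportions.
    se_log = sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    lower = exp(log(prr) - 1.96 * se_log)
    upper = exp(log(prr) + 1.96 * se_log)
    return prr, lower, upper


# Hypothetical report counts for one drug-event pair.
prr, lower, upper = proportional_reporting_ratio(20, 980, 100, 19900)
is_signal = lower > 1  # illustrative threshold, not a regulatory rule

print(round(prr, 2), round(lower, 2), round(upper, 2), is_signal)
```

A flagged pair is only a hypothesis: as the text notes, signals still need verification and validation before any conclusion about causality is drawn.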

Table 1 (fragment). Summary of the included papers.
SPSS and Excel
Drug utilization [42] | Retrospective | Pharmacokinetic data of children ≥ 2 years and adults | 2018 | Both descriptive and inferential models (mean absolute error from non-linear statistical models)

Importance of databases
Apart from the statistical innovations that have been incorporated into pharmacoepidemiology research, computer databases, networks and software are also playing a critical role in enhancing the field of pharmacoepidemiology, and notable developments have been reported in North America, Europe, and the Asia-Pacific region [51]. The rapid development of computer-aided technology has led to the improvement of electronic health records, which have further led to the advancement of many databases that may be used locally or internationally. Consequently, this has allowed for the possibility of conducting pharmacoepidemiology studies using multiple databases in one or more countries [5]. Several mechanisms have been developed to ensure maximum benefit from the multinational databases and collaborations, such as the creation of research networks [5].
The use of multinational databases enables researchers and policy makers to compare how medications and medical devices are utilised and prescribed, as well as to compare their safety profiles in different settings [51]. It also allows the identification of the underlying factors for the differences or similarities observed, which may include different patient selection, delivery systems and genetic differences [51]. Moreover, it relates drug effects (beneficial or adverse) with differences in ethnic groups (receptor and cytochrome polymorphism effect) and lifestyle (such as dietary habits), among others [52].
Furthermore, the use of multiple databases has overcome sample size problems for rare exposures, outcomes of medications, or rare diseases [5]. While it is challenging to get sufficient power when studying one area, data from multiple databases increase the sample size, thus providing the required statistical power. Additionally, the general use of meta-data may help to solve problems experienced by some countries or areas that do not have their own policies, medications, or medical devices [53]. Therefore, multiple databases provide reference points for such cases. Multiple databases also provide a platform for collaboration and communication amongst researchers in different and distant nations, which has led to the advancement of research in pharmacoepidemiology [5].
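The gain in statistical power from pooling can be made concrete with a rough calculation. The sketch below uses a normal approximation for a two-sided comparison of two proportions; the event risks, the group sizes and the assumption of five equally sized databases are invented for illustration.

```python
from math import sqrt
from statistics import NormalDist


def power_two_proportions(p1, p2, n_per_group, alpha=0.05):
    """Approximate power of a two-sided z-test comparing two proportions.

    Uses the unpooled normal approximation and ignores the negligible
    probability of rejecting in the wrong direction.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    se = sqrt(p1 * (1 - p1) / n_per_group + p2 * (1 - p2) / n_per_group)
    return NormalDist().cdf(abs(p1 - p2) / se - z_alpha)


# Hypothetical rare outcome: risk of 1 per 1,000 in the unexposed
# and 2 per 1,000 in the exposed.
single_db = power_two_proportions(0.001, 0.002, 5_000)
pooled = power_two_proportions(0.001, 0.002, 5 * 5_000)

print(round(single_db, 2), round(pooled, 2))
```

Under these assumptions a single database has only about a one-in-four chance of detecting the doubling of risk, whereas five pooled databases of the same size exceed the conventional 80% power.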

Multi-database networks
According to Sturkenboom and Schink [51], electronic healthcare databases have allowed analyses of drug and vaccine utilisation, including investigations of comparative effectiveness and safety. Consequently, both local and international databases have been developed worldwide for use in pharmacoepidemiology. In North America, administrative databases, such as the Health Services Databases in Saskatchewan [54] and the Ontario Health Insurance Plan [55] in Canada, have been set up to manage health care delivery costs, with the fundamental purpose of allowing fiscal tracking and accounting for the delivery of health care from a payer perspective. In the USA, claims databases managed by government payers, for instance Medicaid and Medicare, are also used in research [56].
Since some of the databases do not cover the entire population, research networks have been set up to facilitate multi-database studies that can cover the whole nation. These include the Canadian Drug Safety and Effectiveness Network (CDSEN), set up in 2007 by the Canadian government, which connects multiple researchers across Canada with expertise in pharmacoepidemiology research [57,58]. In the USA, the Food and Drug Administration (FDA) established the Sentinel Initiative in 2008 with the purpose of refining safety signals that would enable the development of a scalable and transparent organisational structure to study the safety of medical products [59], mainly through the organisation of multiple databases managed via one research governance structure [5,60].
Similar initiatives have also been adopted in Europe. The EU-ADR [61] was initiated by the European Commission to develop a drug safety surveillance system reliant on connections amongst databases in European countries. This initiative benefits from reliable clinical data obtained from the electronic healthcare records of over 30 million patients across the participating countries, thus ensuring an efficient analysis of drug safety issues. Another initiative adopted along the same lines is the Pharmacoepidemiological Research on Outcomes of Therapeutics by a European ConsorTium (PROTECT), which involves 19 collaborative international working groups, networks and research projects in Europe [62]. Nordic countries have established the Nordic Pharmaco-Epidemiological Network (NorPEN), aiming to promote research collaboration and initiate cross-country population-based comparative research in pharmacoepidemiology, for further promotion of safer medication use [63].
The Asian Pharmacoepidemiology Network (AsPEN) was formed in 2008 by four countries, namely Korea, Japan, Australia, and Taiwan, and has since expanded to Singapore, China, India, Hong Kong, and Thailand [64]. The AsPEN [65] was created to provide mechanisms for supporting pharmacoepidemiology research in Asia, as well as to facilitate the identification and validation of emerging safety issues among the Asian countries. The diversity of the countries provides multi-cultural and multi-ethnic sources of safety data [63,64]. Nevertheless, this is still an ongoing process, as some countries are still developing their own databases and infrastructures. Special attention should be given to the challenges of handling such multi-complex meta-data, which may require collaboration among mathematicians, statisticians, epidemiologists and computer scientists (Figure 1).
Research networks specialised in certain subpopulations have also been initiated with the goal of studying populations under-represented in clinical trials, such as children, older people, and pregnant women. The most notable networks established for this purpose comprise the Task-force in Europe for Drug Development for the Young (TEDDY) [66]; the European network of population-based registries for the surveillance of congenital anomalies (EUROCAT) [67], which provides early warnings of new teratogenic exposures linked to congenital anomalies; the Innovative Medicines Initiative (IMI) [68], which fosters collaboration between different stakeholders (the European Union and the European pharmaceutical industry) in order to address growing challenges in bringing new medicines to market and the rapidly evolving healthcare landscape; the VACCINE.GRID [69], a global network of leading public health organisations concerned with vaccine benefits and risk assessment; and the International Society for Pharmacoepidemiology (ISPE), an international professional organisation dedicated to the open exchange of scientific information for the benefit of people, drug safety in pregnancy, vaccine safety and/or biologics safety [70].
Last but not least, we have also noticed that computational infrastructures have been developed in which data participants can transform their data locally, execute standardised analytical programs, and combine the results [45]. Data science has also been exploited in pharmacoepidemiology research, where it is used in the evaluation of various analytical methods in the context of a network of databases [45,47]. Common data models that are capable of accommodating heterogeneous databases and executing large-scale statistical analyses [71][72][73], whose resources can sometimes be downloaded from a website [74], have also been developed. ... as those comprising data that may potentially be used to improve pharmacoepidemiology research. Although this is not an exhaustive list, these databases may serve as a supplement to those already reported [51]. Although the majority of pharmacoepidemiology research is found in developed countries, most of these databases are open for re-use of data, thus providing an opportunity for enhanced pharmacoepidemiology research, for instance in Asia and Africa [103].
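The distributed approach described above, in which each site transforms its data locally, runs a standardised program and shares only results, can be sketched as follows. The field names, the two source formats and the structure of the common record are hypothetical and do not reproduce any published common data model.

```python
# Hypothetical extracts from two databases with different local formats.
site_a_rows = [
    {"pid": 1, "drug": "DRUG_X", "adr": "yes"},
    {"pid": 2, "drug": "DRUG_X", "adr": "no"},
    {"pid": 3, "drug": "DRUG_Y", "adr": "no"},
]
site_b_rows = [
    {"patient": "b-7", "drug_code": "DRUG_X", "event_flag": 1},
    {"patient": "b-8", "drug_code": "DRUG_X", "event_flag": 0},
    {"patient": "b-9", "drug_code": "DRUG_X", "event_flag": 0},
]


def site_a_to_common(row):
    """Map a site A record to the shared (hypothetical) common format."""
    return {
        "person_id": f"A{row['pid']}",
        "drug": row["drug"],
        "event": row["adr"] == "yes",
    }


def site_b_to_common(row):
    """Map a site B record to the shared (hypothetical) common format."""
    return {
        "person_id": f"B{row['patient']}",
        "drug": row["drug_code"],
        "event": bool(row["event_flag"]),
    }


def local_analysis(common_rows, drug):
    """Standardised program run at each site; only counts leave the site."""
    exposed = [r for r in common_rows if r["drug"] == drug]
    return {"exposed": len(exposed), "events": sum(r["event"] for r in exposed)}


def combine(site_results):
    """Combine the aggregate results returned by the sites."""
    exposed = sum(r["exposed"] for r in site_results)
    events = sum(r["events"] for r in site_results)
    return {"exposed": exposed, "events": events, "risk": events / exposed}


result_a = local_analysis([site_a_to_common(r) for r in site_a_rows], "DRUG_X")
result_b = local_analysis([site_b_to_common(r) for r in site_b_rows], "DRUG_X")
combined = combine([result_a, result_b])
print(combined)
```

The design choice illustrated here is that the same analysis code runs unchanged at every site once the local mapping is done, and that patient-level records never need to be exchanged.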

Challenges with use of databases
Databases have limitations that affect their use in pharmacoepidemiology. Bias is one of the challenges and may be categorised into confounding, selection bias, measurement bias and time-related bias [98]. Confounding is further subclassified into confounding by indication, unmeasured or residual confounding, time-dependent confounding, and the healthy user or adherer effect. Selection bias associated with database use falls into the subcategories of protopathic bias, losses to follow-up, prevalent user bias, and missing data. Measurement bias, another widely reported type, comes in the form of misclassification bias, misclassification of exposure, and misclassification of outcomes. Time-related bias is classified into immortal time bias, immeasurable time bias, time-window bias and time-lag bias [98].
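One of the time-related biases listed above, immortal (time) bias, arises when follow-up time before the first prescription is counted as exposed even though, by definition, the patient could not have had the outcome as an exposed person during that period. The sketch below contrasts the biased and the time-varying classification of person-time on six invented patients; all values and names are hypothetical.

```python
# Each hypothetical patient: (day of first prescription after cohort entry,
# or None if never treated; day follow-up ended; died during follow-up?)
patients = [
    (None, 100, True),
    (None, 200, True),
    (None, 365, False),
    (90, 365, False),
    (180, 300, True),
    (30, 365, False),
]


def rates(cohort, time_varying):
    """Return (exposed, unexposed) death rates per 1,000 person-days."""
    person_time = {"exposed": 0, "unexposed": 0}
    events = {"exposed": 0, "unexposed": 0}
    for start, end, died in cohort:
        if start is None:
            person_time["unexposed"] += end
            events["unexposed"] += died
            continue
        if time_varying:
            # Time before the first prescription is counted as unexposed.
            person_time["unexposed"] += start
            person_time["exposed"] += end - start
        else:
            # Biased: the immortal period is counted as exposed time.
            person_time["exposed"] += end
        events["exposed"] += died
    return tuple(
        1000 * events[group] / person_time[group]
        for group in ("exposed", "unexposed")
    )


naive = rates(patients, time_varying=False)
corrected = rates(patients, time_varying=True)
naive_ratio = naive[0] / naive[1]
corrected_ratio = corrected[0] / corrected[1]
print(round(naive_ratio, 2), round(corrected_ratio, 2))
```

In this invented cohort the naive rate ratio makes the drug look considerably more protective than the time-varying analysis does, which is the direction of distortion typically attributed to this bias.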

Conclusions
Through a cross-examination of the intersection between data science principles and pharmacoepidemiology, this chapter has demonstrated that pharmacoepidemiology has greatly evolved over the years, from being a mere research field to one that is playing a significant role in the enhancement of patient safety, as well as in the development of health care guidelines and policies. Our examination of the intersection between data science techniques and pharmacoepidemiology was limited to the policy and research narratives of computer-aided pharmacoepidemiology studies across the globe. The level of evidence generated from several studies indicates that the field is now as important as randomised clinical trials have been, which can be attributed to the adoption of statistical and computational principles and practices. However, it is important to highlight that, although there has been a significant number of initiatives reported to improve pharmacoepidemiology research, the identified gaps and challenges presented in this chapter show that this field still has some potential to grow, for instance by properly integrating the existing data science techniques with appropriate principles and practices. The inclusion of both logistical and social/human behaviour network models into pharmacoepidemiology is strongly recommended.

... argument of the study, reviewed research papers on statistical and computing models, and participated in the manuscript writing process. All authors have read and approved the final manuscript.

Conflict of interest
The authors declare no conflict of interest.