Open access peer-reviewed chapter

Computer-Aided Pharmacoepidemiology in Drug Use and Safety: Examining the Intersection between Data Science and Medicines Research

By Ibrahim Chikowe and Elias Peter Mwakilama

Submitted: September 10th 2020; Reviewed: June 4th 2021; Published: October 7th 2021

DOI: 10.5772/intechopen.98730


Pharmacoepidemiology is a relatively new area of study that focuses on research aimed at producing data about drugs’ usage and safety in well-defined populations. Its significant impact on patient safety has translated into improved health care systems worldwide, where it has been widely adopted. The field has developed to the extent that policy makers and guideline developers have started using its evidence alongside that produced from randomised controlled clinical trials. Although this significant improvement has been partly attributed to the adoption of statistics and computer-aided models into the way pharmacoepidemiology studies are designed and conducted, certain gaps still exist. This chapter reports some of the significant developments made, along with the gaps observed so far, in the adoption of statistics and computing into pharmacoepidemiology research. The goal is to highlight efforts that have led to the new pharmacoepidemiology developments, while examining the intersection between data science and pharmacology through narrative reviews of research on computer-aided pharmacology. The chapter shows that a significant number of initiatives have been adopted to improve pharmacoepidemiology research. Nonetheless, further developments in integrating pharmacoepidemiology with computers and statistics are needed in order to enhance the research agenda.


  • Database
  • data science
  • computer-aided
  • pharmacovigilance
  • safety
  • adverse drug reaction

1. Introduction

Pharmacoepidemiology is a research field that applies epidemiological concepts to clinical pharmacology. It is important in providing an evidence base for pharmacotherapy, because the digital data available, although abundant in volume, are often scanty in content [1, 2]. Pharmacoepidemiology studies aim to quantify patterns of drug use, as well as adverse drug events, and cover prescribing, appropriateness of use, adherence to treatment regimens and persistence patterns, along with factors that assist in predicting medication use. In addition, pharmacoepidemiology includes drug safety studies in large populations that focus on common and uncommon, as well as predictable and unpredictable, adverse drug reactions (ADRs) [3]. All of these studies rely on meta-data sources, which include primary data, comprising national data sources, surveys and registries; and secondary data, comprising administrative databases, claims databases, and primary care electronic health and medical records. Figure 1 presents a general description of pharmacoepidemiology [4] as a multidisciplinary research field at the intersection of the mathematical disciplines and pharmacology.

Figure 1.

Main contributors of Pharmacoepidemiology.

Recently, it has been established that clinical trials alone are mostly insufficient to provide conclusive data about a drug’s safety and the occurrence of adverse effects in larger populations, especially idiosyncratic adverse events and other rare events. This is attributed to both the smaller populations and the shorter time periods in which the medicines are tested. Additionally, the effectiveness of the medicines is not fully determined by the time they are launched into the market. Post-marketing surveillance, with the help of either statistical or computing models on longitudinal data, becomes a critical tool for solving these challenges. Furthermore, it is important to highlight that adverse drug events and drug efficacy can vary between clinical trial protocols and health care delivery systems [5, 6, 7]. Therefore, pharmacoepidemiology research data has found its way into many aspects of health care systems, such as policy making, drug utilisation and safety decision making, clinical trial design or validation, as well as guidance for the improvement of medical prescription by physicians. It is also essential for research and project implementation, methodology development, vaccine and medical device safety assessment, as well as for minimisation of medication errors and drug-induced toxicities [8].


2. Challenges and opportunities linked to pharmacoepidemiology

Pharmacoepidemiology research provides very important data for patient safety and care, and the data generated are more informative and reliable when the study is well designed. Pharmacoepidemiology research offers many advantages, including the use of large patient samples and inclusion of subpopulations that are under research in uncontrolled conditions [1]. It also describes and estimates the risks and other drug safety or efficacy phenomena in practice [9]. Pharmacoepidemiology approaches make studies cheaper and faster than the randomised controlled trials performed before or after marketing, thus enabling researchers to assess generic medications, as well as medications after a long period of use. The methods used in pharmacoepidemiology research can also be adapted for use in pharmacovigilance to assist in unearthing unknown side effects or ADRs, together with the discovery of new drug usages [10].

However, pharmacoepidemiology research also has its own drawbacks. Because treatment selection is not randomised, the data are prone to confounding and to several sources of bias (such as information bias and selection bias), which makes it harder to draw conclusions [1, 11]. In addition, although statistical models have already been incorporated into pharmacoepidemiology, little is known about integrating pharmacology with community behaviour models, such as social networks. Nonetheless, different scholars have suggested several ways of improving pharmacoepidemiology research, including the use of active comparison groups and within-individual designs, as well as propensity scoring [12]. Pharmacoepidemiology studies have also been improved by triangulation of multiple analytical and data collection approaches, aiming to enhance the confidence in inferred causal relationships [13]. The developments made in the use of databases, computer and statistical models, and big data have led to enormous improvements in the robustness of pharmacoepidemiology studies and the production of reliable data that is being considered as good evidence for inclusion in guidelines, alongside data generated from randomised controlled trials [14].

Having shown that pharmacoepidemiology research is now producing data that is important for health care guidelines and policy development, it is essential that researchers can collaborate with guideline writers to ensure that they frame their questions to get useful answers. On the other hand, pharmacoepidemiology researchers should design their studies in such a way that guideline writers are provided with concrete answers, thus reducing the uncertainty in the evidence base. Additionally, since pharmacoepidemiology depends on statistical and data sciences, there is a need for further development of techniques in these fields to improve the application of pharmacoepidemiology. It is also important to enhance public engagement and capacity building (data resources and researcher base) to take full advantage of future opportunities [1].


3. Computational and statistical models in pharmacoepidemiology

The advent and development of computers has led to the development of databases that have become essential in pharmacoepidemiology. Several Electronic Health Record (EHR) systems have been developed to keep longitudinal digital records of patient health information generated over a series of visits in a hospital setting [15]. EHRs contain patient data related to diseases, medicines and laboratory results, if any, and enable health care providers to deliver patient-centred treatment [16, 17]. When these databases are linked or nationalised, patients no longer need to describe their medical histories repeatedly when their treatment is transferred. In addition, such data can be accessed by policy makers or researchers [18]. The use of computerised databases has led to a significant reduction in adverse events and prescription errors [19, 20], shorter hospital stays and lower mortality [21], along with better patient tracking, information exchange, efficient handling of information, and real-time data provision [16, 22]. Large pharmacoepidemiology databases facilitate research, but they require well-trained personnel to produce and handle big data [17, 23]. The use of electronic data has led to a significant reduction in the manual effort of data collection, easy incorporation of regional data into a study, minimal need for recalls, and removal of interviewer bias [24].

3.1 Progress and limitations

3.1.1 Usage of computational and statistical models

So far, a very close link between pharmacology and computational and statistical models has been established (Figure 1). Bentley [25] provides a well-organised chapter describing the key statistical models used in the field of pharmacoepidemiology, at both the descriptive and inferential levels of analysis. Description uses measures of central tendency (e.g. mean), dispersion (e.g. variance) and range (e.g. maximum and minimum), expressed in tables (e.g. cross-tabulations) and charts, whereas inference may use regression models (e.g. linear, logistic, and Cox). These statistical techniques and descriptions aid in understanding data on the usage and effects of drug administration at community level, although it is also important to have a good knowledge of the potential errors involved in the design and analysis of pharmacoepidemiology studies [26].
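As a minimal sketch of the descriptive level mentioned above, the measures can be computed with the Python standard library alone. The data, the `describe` helper and the variable names are hypothetical and serve only as an illustration; they are not taken from any of the cited studies.

```python
import statistics
from collections import Counter

# Hypothetical data: number of medicines prescribed at each of nine encounters.
medicines_per_encounter = [1, 2, 2, 3, 3, 3, 4, 4, 5]


def describe(values):
    """Return central tendency, dispersion and range for one numeric variable."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "variance": statistics.variance(values),  # sample variance (n - 1 denominator)
        "sd": statistics.stdev(values),           # sample standard deviation
        "min": min(values),
        "max": max(values),
        "range": max(values) - min(values),
    }


# Hypothetical categorical data: (age group, antibiotic prescribed?) per encounter.
encounters = [
    ("child", "yes"), ("child", "yes"), ("child", "no"),
    ("adult", "yes"), ("adult", "no"), ("adult", "no"),
]

# A cross-tabulation is a count of each (row category, column category) pair.
crosstab = Counter(encounters)

summary = describe(medicines_per_encounter)

if __name__ == "__main__":
    for name, value in summary.items():
        print(f"{name}: {value}")
    for (age_group, prescribed), count in sorted(crosstab.items()):
        print(f"{age_group:5s} antibiotic={prescribed:3s} count={count}")
```

Inferential models such as logistic or Cox regression would normally be fitted with a dedicated statistical package rather than by hand.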

Statistics play a major role in managing the quantifiable errors present in pharmacoepidemiology data analysis and interpretation [27]. Despite a growing interest in applying epidemiological statistical methods in pharmaceutical studies, proper use of statistical techniques in research studies is often still lacking. For example, Suissa [26] states that observational pharmacoepidemiology studies are hugely affected by information bias (when selecting variables of interest for the study), selection bias (during inclusion and exclusion of subjects), and confounding bias (due to imbalances in covariates). To circumvent these problems, randomised controlled trials as well as cohort and case–control studies, which are also used in epidemiological studies [28], have been recommended by several researchers in pharmacoepidemiology [29].
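To illustrate how an imbalance in a covariate can distort an estimate, the sketch below compares a crude odds ratio with a Mantel-Haenszel odds ratio stratified by age group. The Mantel-Haenszel estimator is used here only as one familiar adjustment technique; the chapter does not prescribe it, and all counts are hypothetical. In these invented data the drug has no effect within either age group, but older patients are both more often exposed and more often cases.

```python
# Each stratum is a 2x2 table given as
# (exposed cases, exposed non-cases, unexposed cases, unexposed non-cases).
# Hypothetical counts, stratified by the confounder (age group).
strata = {
    "younger": (10, 90, 40, 360),
    "older": (120, 280, 30, 70),
}


def odds_ratio(a, b, c, d):
    """Odds ratio of a single 2x2 table."""
    return (a * d) / (b * c)


def crude_odds_ratio(tables):
    """Collapse all strata into one table, ignoring the confounder."""
    a, b, c, d = (sum(cells) for cells in zip(*tables))
    return odds_ratio(a, b, c, d)


def mantel_haenszel_odds_ratio(tables):
    """Weighted combination of the stratum-specific odds ratios."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return numerator / denominator


tables = list(strata.values())
crude = crude_odds_ratio(tables)
adjusted = mantel_haenszel_odds_ratio(tables)

if __name__ == "__main__":
    for name, table in strata.items():
        print(f"{name}: stratum odds ratio = {odds_ratio(*table):.2f}")
    print(f"crude odds ratio    = {crude:.2f}")
    print(f"adjusted odds ratio = {adjusted:.2f}")
```

The crude estimate suggests an association that disappears once the analysis is stratified, which is the pattern that confounding by an unevenly distributed covariate produces.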

Accordingly, in order to appraise the significance of epidemiological data and the design of studies on drug risk and safety, we reviewed several research studies that have been conducted in developing countries, including Malawi, focusing on the key statistical and computational methods used. To achieve this, we used an approach similar to the one described by Sequi et al. [30], who reviewed studies to underscore the processes of analysing and reporting data related to paediatric drug utilisation. Out of the 22 studies in that review, the majority (91%) reported at least one descriptive measure, with the mean being the most common (82%, 18/22), followed by the standard deviation (23%, 5/22). The chi-square test was used in 12 studies, while graphical analysis was reported in 14 papers. However, only 16 papers reported the number of drug prescriptions and/or packages, while 10 reported the prevalence of drug prescription. Consequently, the authors observed that only a few of the studies reviewed applied statistical methods and reported data in a satisfactory manner [27].

In a review of the current use of statistical models in pharmacoepidemiology, Rosli et al. [31] systematically examined published studies on drug utilisation in hospitalised neonates in Europe, the United States, India, Brazil, and Iran. The findings were not far from those reported by Sequi et al. [30], in that a majority (70%) used descriptive statistics to analyse pharmacoepidemiology data. Nonetheless, some remarkable variations were observed among the selected studies regarding study design and methodology, sources of data, and sampling process. Of the included studies, 45% were based on cross-sectional or retrospective designs, 40% were prospective, and the remainder (15%) were point prevalence surveys.

Likewise, a 2020 review of 84 drug utilisation studies among neonates by Al-Turkait et al. [32] showed that the median, range and mean are the most frequently reported statistical parameters for describing pharmacoepidemiology data, and that the style of reporting is mostly descriptive. In general public health, however, Hayat et al. [33] identified a variety of statistical methods in the 216 papers they reviewed: 81.9% used an observational study design, 93.1% reported substantive analysis, 95% used descriptive statistics (tabular or graphical), and 76% used statistical inference (t-test, chi-square, correlation, with confidence intervals and p-values). Logistic regression models were the most frequently used (38.4%), followed by linear regression models (19.4%).
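Since the chi-square test recurs across the reviewed studies, a minimal sketch of Pearson's chi-square test for a 2x2 table may be helpful. It uses no continuity correction, the counts are hypothetical, and the function name `chi_square_2x2` is ours. For one degree of freedom the p-value has the closed form erfc(sqrt(x/2)), so no statistical library is needed.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square statistic and p-value for a 2x2 table of counts."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    # With 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Hypothetical counts: rows are two wards, columns are
# (antibiotic prescribed, antibiotic not prescribed).
observed = [[30, 70], [15, 85]]
statistic, p_value = chi_square_2x2(observed)

if __name__ == "__main__":
    print(f"chi-square = {statistic:.3f}, p = {p_value:.4f}")
```

With small expected counts, an exact test or a continuity correction would usually be preferred over this uncorrected form.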

Sequi et al. [30] recommended that the methodology of drug utilisation studies needs to be improved and we have also observed that drug use in the community is affected by drug availability, pricing, and affordability [34]. Therefore, the logistical and socio-economic aspects of pharmacoepidemiology studies should not be ignored. These two observations were the two key benchmarks for scoring the papers we have found and reviewed. For each study, we extracted information on the study design/type, data sources, period, assessment of variables used and corresponding statistical estimates (incidence, prevalence, pharmacy sales, prescription data), and diagnostic assessment. Table 1 provides the overall summary details of the included papers.

| Study type/design | Data source(s) | Year | Statistical methods | Variable(s) of interest | Reference |
|---|---|---|---|---|---|
| Cross-sectional | Survey questionnaire data | 2018 | Descriptive (percentages, frequencies, charts, median, ratios) | Drug availability, drug pricing, affordability | [34] |
| Controlled trial | Articles | 2017 | - | Vaccination times, dosage amounts | [35] |
| Cross-sectional | Prospective population census, passive surveillance, serological studies and healthcare utilisation surveys | 2017 | Descriptive (charts, percentages) | Pathogen transmission, exposure and susceptibility | [36] |
| Randomisation | Basic survey | 2019 | Descriptive (percentages). Software: Excel & SPSS | Drug abuse, prevalence | [37] |
| Cohort | Anonymised patient record database | 2013–2016 | Descriptive (percentages), inferential (negative binomial regression, confidence intervals) | Incidence and mortality ratios | [38] |
| Randomized clinical trial | Clinical data | 2012 | Descriptive (proportions), inferential (chi-square test, Kruskal-Wallis test, confidence intervals, incidence rate ratio, p-values, risk ratios). Software: Stata | Antiretroviral (ARV) usage, initiation | [39] |
| Key Informant Interviews (KII) and Focus Groups (FGs) | Recorded and transcribed qualitative data | 2019 | Thematic analysis | Vaccination trials | [40] |
| Matched case–control study | Case–control study data | 1993 | Descriptive (tables, frequencies, percentages) and inferential (conditional logistic regression, relative risks, odds ratio, likelihood ratios, and confidence intervals). Software: not mentioned | BCG vaccine, efficacy, leprosy | [41] |
| Cross-sectional | Drug prescription data from hospital electronic database | 2020 | Descriptive (frequencies and percentages for categorical variables; means, medians, standard deviations (SD), and interquartile ranges (IQR) for continuous variables). Mean and SD were used for normal distributions, and median and IQR for skewed distributions. Software: SPSS and Excel | Drug utilization | [42] |
| Retrospective | Pharmacokinetic data of children ≥ 2 years and adults | 2018 | Both descriptive and inferential models (mean absolute error from non-linear statistical models) | Drug dosing and clearance | [43] |

Table 1.

A review of computer aided research studies and usage of statistical models in Pharmacoepidemiology.

From Table 1, we have noticed that pharmacoepidemiology research in some developing countries, like Malawi, is still in its infancy compared to other developing countries that have adopted advanced inferential analyses into their pharmacoepidemiology research. Our findings do not differ from those reported by Sequi et al. [30], in which the majority of the papers focused on the use of descriptive statistics. In addition, few studies clearly demonstrated the use of social/human behaviour network models in pharmacoepidemiology research [44, 45]. The inclusion of social/human behaviour network models in pharmacoepidemiology research is fundamental to understanding community structure and behaviour, for instance before mass drug administration during an outbreak such as COVID-19 [46, 47].

3.1.2 Big data in pharmacoepidemiology

Big data is another translational and frontier scientific discipline at the interface of computer science and statistics [48]. This field has found its way into pharmacoepidemiology research by simplifying the data interpretation and trend analysis of the volumes of data produced from many sources in health records [49]. With big data, pharmacoepidemiology research experts and data scientists detect ADRs, and collaborate in signal detection, verification and validation of medication or vaccine safety signals, as well as in the expansion of analytic methodologies for analysing the large volumes of heterogeneous data [14]. For example, the Exploring and Understanding Adverse Drug Reactions (EU-ADR) European project has incorporated innovative research methods in their pharmacovigilance research through the use of a web platform, aiming to provide advanced medication data exploration and assessment features. This enables data scientists and pharmacoepidemiology experts to mine EHRs for drug-events of their interest [4, 50].
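Signal detection of this kind is often illustrated with disproportionality measures computed from spontaneous report counts. The sketch below computes the proportional reporting ratio (PRR) and reporting odds ratio (ROR) for one drug-event pair. This is an illustrative assumption on our part: the text above does not state which measures EU-ADR or any other platform uses, and the counts are hypothetical.

```python
import math


def disproportionality(a, b, c, d):
    """PRR and ROR for one drug-event pair.

    a: reports with the drug of interest and the event of interest
    b: reports with the drug of interest and any other event
    c: reports with any other drug and the event of interest
    d: reports with any other drug and any other event
    """
    prr = (a / (a + b)) / (c / (c + d))
    ror = (a * d) / (b * c)
    # Standard error of ln(PRR), used for an approximate 95% interval.
    se_log_prr = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    lower = math.exp(math.log(prr) - 1.96 * se_log_prr)
    upper = math.exp(math.log(prr) + 1.96 * se_log_prr)
    return {"prr": prr, "ror": ror, "prr_ci95": (lower, upper)}


# Hypothetical report counts for a single drug-event pair.
result = disproportionality(a=20, b=980, c=100, d=98900)

if __name__ == "__main__":
    low, high = result["prr_ci95"]
    print(f"PRR = {result['prr']:.1f} (95% CI {low:.1f} to {high:.1f})")
    print(f"ROR = {result['ror']:.1f}")
```

A ratio well above 1 with an interval that excludes 1 would flag the pair for verification and validation; it does not by itself establish causality.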

3.2 Databases

3.2.1 Importance of databases

Apart from the statistical innovations that have been incorporated into pharmacoepidemiology research, computer databases, networks and software are also playing a critical role in enhancing the field of pharmacoepidemiology, and notable developments have been reported in North America, Europe, and the Asia-Pacific region [51]. The rapid development of computer-aided technology has led to the improvement of electronic health records, which have further led to the advancement of many databases that may be used locally or internationally. Consequently, this has allowed for the possibility of conducting pharmacoepidemiology studies using multiple databases in one or more countries [5]. Several mechanisms have been developed to ensure maximum benefit from the multinational databases and collaborations, such as the creation of research networks [5].

The use of multinational databases enables researchers and policy makers to compare how medications and medical devices are utilised and prescribed, as well as to compare their safety profiles in different settings [51]. It also allows the identification of the underlying factors for the differences or similarities observed, which may include different patient selection, delivery systems and genetic differences [51]. Moreover, it relates drug effects (beneficial or adverse) with differences in ethnic groups (receptor and cytochrome polymorphism effect) and lifestyle (such as dietary habits), among others [52].

Furthermore, the use of multiple databases has overcome sample size problems for rare exposures, outcomes of medications, or rare diseases [5]. While it is challenging to get sufficient power when studying one area, data from multiple databases increase the sample size, thus providing the required statistical power. Additionally, the general use of meta-data may help to solve problems experienced by some countries or areas that do not have their own policies, medications, or medical devices [53]. Therefore, multiple databases provide reference points for such cases. Multiple databases also provide a platform for collaboration and communication amongst researchers in different and distant nations, which has led to the advancement of research in pharmacoepidemiology [5].
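The gain in statistical power from pooling can be made concrete with a rough calculation. The sketch below uses a normal approximation for comparing two proportions, which is crude for very rare events, and all risks and sample sizes are hypothetical. It compares one database with five pooled databases of equal size.

```python
from statistics import NormalDist


def approx_power(p_exposed, p_unexposed, n_per_group, alpha=0.05):
    """Approximate power of a two-sided two-proportion z-test."""
    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1 - alpha / 2)
    variance = p_exposed * (1 - p_exposed) + p_unexposed * (1 - p_unexposed)
    standard_error = (variance / n_per_group) ** 0.5
    return standard_normal.cdf(abs(p_exposed - p_unexposed) / standard_error - z_alpha)


# Hypothetical rare adverse reaction: 2 per 10,000 among exposed patients
# versus 1 per 10,000 among unexposed patients.
RISK_EXPOSED = 0.0002
RISK_UNEXPOSED = 0.0001

power_single = approx_power(RISK_EXPOSED, RISK_UNEXPOSED, n_per_group=50_000)
power_pooled = approx_power(RISK_EXPOSED, RISK_UNEXPOSED, n_per_group=250_000)

if __name__ == "__main__":
    print(f"one database (50,000 per group):    power = {power_single:.2f}")
    print(f"five databases (250,000 per group): power = {power_pooled:.2f}")
```

Under these assumptions a single database would detect the doubling of risk in roughly a quarter of studies, whereas the pooled data would detect it in roughly four out of five.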

3.2.2 Multi-database networks

According to Sturkenboom and Schink [51], electronic healthcare databases have allowed analyses of drug and vaccine utilisation, including investigations of comparative effectiveness and safety. Consequently, both local and international databases have been developed worldwide for use in pharmacoepidemiology. In North America, administrative databases, such as the Health Services Databases in Saskatchewan [54] and the Ontario Health Insurance Plan [55] in Canada, have been set up to manage health care delivery costs, with the fundamental purpose of allowing fiscal tracking and accounting for the delivery of health care from a payer perspective. In the USA, claims databases managed by government payers, for instance Medicaid and Medicare, are also used in research [56].

Since some of the databases do not cover the entire population, research networks have been set up to facilitate multi-database studies that can cover the whole nation. These include the Canadian Drug Safety and Effectiveness Network (CDSEN), set up in 2007 by the Canadian government, which connects researchers across Canada with expertise in pharmacoepidemiology research [57, 58]. Similarly, the USA Food and Drug Administration (FDA) established the Sentinel Initiative in 2008 with the purpose of refining safety signals, which would enable the development of a scalable and transparent organisational structure to study the safety of medical products [59], mainly through the organisation of multiple databases managed via one research governance structure [5, 60].

Similar initiatives have also been adopted in Europe. The EU-ADR [61] was initiated by the European Commission to develop a drug safety surveillance system reliant on connections amongst databases in European countries. This initiative benefits from reliable clinical data obtained from the electronic healthcare records of over 30 million patients across the participating countries, thus ensuring an efficient analysis of drug safety issues. Another initiative adopted along the same lines is the Pharmacoepidemiology Research on Outcomes of Therapeutics by a European ConsorTium (PROTECT), which involves 19 collaborative international working groups, networks and research projects in Europe [62]. Nordic countries have established the Nordic Pharmaco-Epidemiological Network (NorPEN), aiming to promote research collaboration and initiate cross-country population-based comparative research in pharmacoepidemiology, for further promotion of safer medication use [63].

The Asian Pharmacoepidemiology Network (AsPEN) was formed in 2008 by four countries, namely Korea, Japan, Australia, and Taiwan, and has since expanded to include Singapore, China, India, Hong Kong, and Thailand [64]. The AsPEN [65] was created to provide mechanisms for supporting pharmacoepidemiology research in Asia, as well as to facilitate the identification and validation of emerging safety issues among the Asian countries. The diversity of the countries provides multi-cultural and multi-ethnic sources of safety data [63, 64]. Nevertheless, this is still an ongoing process, as some countries are still developing their own databases and infrastructures. Special attention should be given to the challenges of handling such complex meta-data, which may require collaboration among mathematicians, statisticians, epidemiologists and computer scientists (Figure 1).

Research networks specialised in certain subpopulations have also been initiated with the goal of studying populations under-represented in clinical trials, such as children, older people, and pregnant women. The most notable networks established for this purpose include the Task-force in Europe for Drug Development for the Young (TEDDY) [66]; the European network of population-based registries for the surveillance of congenital anomalies (EUROCAT) [67], which provides early warnings of new teratogenic exposures associated with congenital anomalies; the Innovative Medicines Initiative (IMI) [68], which fosters collaboration between different stakeholders (the European Union and the European pharmaceutical industry) in order to address growing challenges in bringing new medicines to market and the rapidly evolving healthcare landscape; the VACCINE.GRID [69], a global network of leading public health organisations concerned with vaccine benefits and risk assessment; and the International Society for Pharmacoepidemiology (ISPE), an international professional organisation dedicated to the open exchange of scientific information for the benefit of people, drug safety in pregnancy, vaccine safety and/or biologics safety [70].

Last but not least, computational infrastructures have been developed that allow participating data partners to transform their data locally, execute standardised analytical programs, and combine the results [45]. Data science has also been exploited in pharmacoepidemiology research, where it is used to evaluate various analytical methods in the context of a network of databases [45, 47]. Common data models that can accommodate heterogeneous databases and support large-scale statistical analyses [71, 72, 73], with resources that can sometimes be downloaded from a website [74], have also been developed. Table 2 lists a few databases that are currently being used, as well as those containing data that could potentially be used to improve pharmacoepidemiology research. Although this is not an exhaustive list, these databases may serve as a supplement to those already reported [51].
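The workflow described at the start of the preceding paragraph, in which each site transforms its data locally, runs a standardised program and shares only results for combination, can be sketched as follows. The record layout, site formats and function names are all hypothetical and do not correspond to any particular common data model. Pooling the aggregate counts directly, as done here, ignores between-database heterogeneity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CommonRecord:
    """Hypothetical common data model: one row per patient."""
    exposed: bool
    person_years: float
    events: int


def from_site_a(row):
    """Site A (hypothetical) stores dicts with follow-up measured in days."""
    return CommonRecord(row["on_drug"] == "Y", row["followup_days"] / 365.25, row["adr_count"])


def from_site_b(row):
    """Site B (hypothetical) stores tuples: (exposure flag, years, events)."""
    flag, years, events = row
    return CommonRecord(bool(flag), float(years), int(events))


def local_summary(records):
    """Standardised program run at each site; only aggregates leave the site."""
    summary = {True: [0, 0.0], False: [0, 0.0]}  # exposed -> [events, person-years]
    for record in records:
        summary[record.exposed][0] += record.events
        summary[record.exposed][1] += record.person_years
    return summary


def pooled_rate_ratio(summaries):
    """Coordinating centre: add up site aggregates and compare incidence rates."""
    total = {True: [0, 0.0], False: [0, 0.0]}
    for summary in summaries:
        for exposed in (True, False):
            total[exposed][0] += summary[exposed][0]
            total[exposed][1] += summary[exposed][1]
    rate_exposed = total[True][0] / total[True][1]
    rate_unexposed = total[False][0] / total[False][1]
    return rate_exposed / rate_unexposed


site_a_rows = [
    {"on_drug": "Y", "followup_days": 730.5, "adr_count": 1},
    {"on_drug": "N", "followup_days": 1461.0, "adr_count": 1},
]
site_b_rows = [(1, 2.0, 1), (0, 4.0, 1)]

summaries = [
    local_summary(from_site_a(row) for row in site_a_rows),
    local_summary(from_site_b(row) for row in site_b_rows),
]
rate_ratio = pooled_rate_ratio(summaries)

if __name__ == "__main__":
    print(f"pooled incidence rate ratio = {rate_ratio:.2f}")
```

The design choice worth noting is that patient-level rows never leave a site; only the small summary dictionaries are shared, which is what makes such networks workable across institutions and countries.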

| Database name | Host(s) | Design | Data | Location | Target population | Data coverage | Reference(s) |
|---|---|---|---|---|---|---|---|
| Electronic Patient Registration System | Queen Elizabeth Central Hospital | Multiple | Multiple | Malawi | Various | Vital signs data, treatment, demographic data, diagnostic information | [38] |
| IADB.nl | | Multiple | Multiple | Netherlands | Over 500,000 people | Live and stillbirth pregnancy identification, medicine use data, prescriptions from 54 community pharmacies | [52] |
| DEFF Research Database | Ministry of Science, Technology and Innovation; Ministry of Culture; Ministry of Education | Multiple | Multiple | Denmark | Countrywide | Dispensed drugs, with potential for linkage to outcomes | [75] |
| Odense University Pharmacoepidemiological Database (OPED) | University of Southern Denmark | Multiple | Multiple | County of Funen in Denmark | Countrywide | Reimbursed prescriptions | [76] |
| Disease Analyser Patient Database | | Multiple | Multiple | Germany | German, UK, French, and Austrian population | Diagnoses, prescriptions, risk factors (such as smoking and obesity), and laboratory values for approximately 10 million patients | [77] |
| German Longitudinal Prescription Database (LRx) | | Multiple | Multiple | Germany | Countrywide | Diseases, drug utilisation, treatment costs, 60% of prescriptions reimbursed by statutory health insurance funds in Germany | [78] |
| Database on Veterinary Clinical Research in Homeopathy | | Multiple | Multiple | Germany | Many | 200 entries of randomised clinical trials, non-randomised clinical trials, observational studies, drug proving, case reports and case series | [79] |
| UK General Practice Research Database (GPRD) | UK Department of Health | Longitudinal | Case reports | UK | 5 million patients; countrywide | Collated information from over 500 general physicians’ practices | [80, 81] |
| Clinical Research Database | Memorial Sloan-Kettering Cancer Center (MSKCC) | Multiple | Multiple | | Patients on IRB approved studies who have passed through bone marrow transplant (BMT) | Diseases, pathology, infusion, treatment, among others | [82] |
| Cancer Research DataBase (CRDB) | Cancer informatics project | Multiple | Mediation and data warehousing | | Small molecule | Small molecule data, computational docking results, functional assays, and protein structure data | [83] |
| Danish Database for Biological Therapies in Rheumatology (DANBIO) | DANBIO | Multiple | Multiple | Denmark | Patients taking biological treatments | Patients with rheumatoid arthritis (RA), psoriatic arthritis (PsA) and axial spondyloarthritis (Ax SpA), who are followed longitudinally | [84, 85] |
| The FoodCast Research Image Database (FRIDa) | | Multiple | Multiple | Sweden | Wide range of foodstuff and related materials | 877 images from eight different categories: natural-food, natural-non-food items. Artificial food-related objects | [86] |
| Pharmacy Dispensing Database | | Multiple | Multiple | Netherlands, Denmark, Norway, Wales, France and Tuscany-Italy | Countrywide | Medicine use data | [87] |
| Danish National Patient Registry, Norway Medical Birth Registry | | Multiple | Multiple | Norway and Denmark | Countrywide | Pregnancy loss identification | [87] |
| Influenza Research Database (IRD) | Bioinformatics Resource Center | Multiple | Multiple | US | All species of influenza virus sequence data | Influenza virus data, analytical and visualisation tools for influenza virus, personal workbenches for storing data | [88, 89] |
| Beth Israel Deaconess Medical Centre | Washington heart Centre, Beth Israel hospital, Boston | Multiple | Multiple | USA | Countrywide | Patient problems, medication, lab results | [90] |
| USDA’s National Nutrient Database for Standard Reference, the Dietary Supplement Ingredient Database, the Food and Nutrient Database for Dietary Studies, and the USDA’s Food Patterns Equivalents Database | US Department of Agriculture (USDA) | Multiple | Multiple | USA | Foodstuffs | Food and nutrients | [91] |
| Camden and Islington NHS Foundation Trust (C&I) Research Database | South London and Maudsley NHS Foundation Trust (SLaM) | Multiple | Multiple | UK | Countrywide | 108,168 mental health patients; 23,538 were receiving active care | [92] |
| Population and Housing Census (PHC), Health and Welfare Survey (HWS), Socio-Economic Survey (SES), Reproductive Health Survey (RHS), National Disability Survey (NDS), Multiple Indicator Cluster Survey (MICS) | National Statistics Office (NSO) | Interviews, face to face, self-enumeration, internet | Cross sectional | Thailand | Various | General population, health insurance, illness, health services, payment, equity, injury, co-morbidity, income, expenditure, debt, household distribution, family planning, maternal and child health. AIDS, cancer, infertility, sex education, adolescent health | [93] |
| Cancer Registry | National Cancer Institute (NCI) | Longitudinal | Case reports | Thailand | All patients | Cancerous diseases, medicines | [93] |
| Thai Vigibase | Health Product Vigilance Centre (HPVC) | | Case reports | Thailand | All patients | Adverse events | [93] |
| Adverse Events Database | Pharmaceutical and Medical Devices Agency (PMDA) | | Case reports | Japan | Countrywide | Adverse events | [93] |
| National Community Pharmacy Group | | Multiple | Multiple | South Africa | Countrywide | Drug utilisation | [94] |
| South African Medicine Claims Data | Pharmaceutical Benefit Management Company (PBM) | Multiple | Multiple | South Africa | Countrywide | Medicines claims | [95] |
| Strategic Typhoid Alliance Across Africa (STRATAA) | | | | Malawi, Nepal, and Bangladesh | Countrywide | Demographic data, typhoid disease data | [96] |
| VigiBase | Uppsala Monitoring Centre (UMC) | Multiple | Multiple | Sweden | Worldwide | Adverse drug events | [97] |
| District Health Information System 2 (DHIS-2) | Kenya Medical Research Institute (KEMRI), Kamuzu Central Hospital | Multiple | Multiple | Kenya, Malawi, Uganda, Zambia [98] | Various | General health records and drug supply | [99, 100] |
| Mitishamba Database of Natural Products | University of Nairobi | Anti-malaria drugs | Natural products | Kenya | Sub-Saharan Africa | Medicinal plants | [101] |
| International Databases to Evaluate AIDS (IeDEA-EA) | KEMRI, Mbarara Univ. & Tanzania | HIV-AIDS care | Drugs and Personal Protective Equipment (PPEs) | Kenya, Tanzania, Uganda | East African population | HIV care treatment | [102] |

Table 2. Computer databases currently used in pharmacoepidemiology research.

Although the majority of pharmacoepidemiology research is conducted in developed countries, most of these databases are open for data re-use, which provides an opportunity to enhance pharmacoepidemiology research in other regions, for instance Asia and Africa [103].

3.2.3 Challenges with use of databases

Databases have limitations that affect their use in pharmacoepidemiology. Bias is one of the main challenges and may be categorised into confounding, selection bias, measurement bias and time-related bias [98]. Confounding is further subclassified into confounding by indication, unmeasured or residual confounding, time-dependent confounding, and the healthy user or healthy adherer effect. Selection bias associated with database use falls into the subcategories of protopathic bias, losses to follow-up, prevalent user bias, and missing data. Measurement bias, which is also widely reported, takes the form of misclassification bias, covering misclassification of exposure as well as misclassification of outcomes. Time-related bias is classified into immortal time bias, immeasurable time bias, time-window bias and time-lag bias [98].
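Immortal time bias, one of the time-related biases listed above, arises when follow-up time before a patient's first prescription is wrongly counted as exposed time, even though the patient must have remained event-free during that period in order to be classified as treated at all. The following is a minimal sketch of the effect, not a method taken from the cited studies: the `Patient` record, the `rate_ratio` function and the four-patient cohort are hypothetical and were invented purely for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Patient:
    followup_days: int           # days from cohort entry to event or censoring
    first_rx_day: Optional[int]  # day of first prescription; None if never treated
    event: bool                  # True if the outcome occurred at end of follow-up


def rate_ratio(patients, time_varying):
    """Incidence rate ratio (exposed / unexposed), in events per person-day."""
    person_days = {"exposed": 0, "unexposed": 0}
    events = {"exposed": 0, "unexposed": 0}
    for p in patients:
        if p.first_rx_day is None:
            # Never-treated patients contribute only unexposed person-time.
            person_days["unexposed"] += p.followup_days
            events["unexposed"] += int(p.event)
        else:
            if time_varying:
                # Time before the first prescription is unexposed person-time.
                person_days["unexposed"] += p.first_rx_day
                person_days["exposed"] += p.followup_days - p.first_rx_day
            else:
                # Naive analysis: ever-treated patients count as exposed from day 0,
                # so the "immortal" pre-prescription time inflates the denominator.
                person_days["exposed"] += p.followup_days
            # Events in treated patients occur after the first prescription.
            events["exposed"] += int(p.event)
    exposed_rate = events["exposed"] / person_days["exposed"]
    unexposed_rate = events["unexposed"] / person_days["unexposed"]
    return exposed_rate / unexposed_rate


# Hypothetical cohort: two treated patients (first prescription on day 100)
# and two never-treated patients; one event in each group.
cohort = [
    Patient(followup_days=300, first_rx_day=100, event=True),
    Patient(followup_days=300, first_rx_day=100, event=False),
    Patient(followup_days=100, first_rx_day=None, event=True),
    Patient(followup_days=100, first_rx_day=None, event=False),
]

naive = rate_ratio(cohort, time_varying=False)
corrected = rate_ratio(cohort, time_varying=True)
print(f"naive rate ratio:     {naive:.2f}")
print(f"corrected rate ratio: {corrected:.2f}")
```

In this toy cohort the naive analysis gives a rate ratio of about 0.33, which would suggest a protective drug effect, whereas assigning the pre-prescription time to the unexposed group gives a rate ratio of 1.00. Real database studies would additionally need to handle censoring, confounding and repeated prescriptions, none of which is modelled here.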


4. Conclusions

Through an examination of the intersection between data science principles and pharmacoepidemiology, this chapter has shown that pharmacoepidemiology has evolved considerably over the years, from being simply a field of research to one that plays a significant role in enhancing patient safety and in developing health care guidelines and policies. Our examination was limited to the policy and research narratives of computer-aided pharmacoepidemiology studies across the globe. The level of evidence generated from several studies indicates that the field has become as important as randomised clinical trials, which can be attributed to the adoption of statistical and computational principles and practices. However, although a significant number of initiatives to improve pharmacoepidemiology research have been reported, the gaps and challenges identified in this chapter show that the field still has room to grow, for instance by properly integrating existing data science techniques with appropriate principles and practices. The inclusion of both logistical and social/human behaviour network models in pharmacoepidemiology is strongly recommended.



Acknowledgements

This publication was made possible with funding from the Agency for Scientific Research and Training (ASRT) in Malawi. Sincere thanks are due to Dr. David Scott for technical, language editing and proofreading support on the manuscript.


Conflict of interest

The authors declare no conflict of interest.


Author contributions

IC conceived the study, performed the review of pharmacoepidemiology databases and participated in the manuscript writing process. EM reshaped the argument of the study, reviewed research papers on statistical and computing models, and participated in the manuscript writing process. All authors have read and approved the final manuscript.


Appendices and nomenclature


Adverse Drug Reactions


Acquired Immunodeficiency Syndrome


Antiretroviral drugs


Asian Pharmacoepidemiology Network


BCG-Bacille Calmette-Guérin


Bone Marrow Transplant


Canadian Drug Safety and Effectiveness Network


Coronavirus Disease 2019


Danish Database for Biological Therapies in Rheumatology


District Health Information System (version 2)


Electronic Health Records


Exploring and Understanding Adverse Drug Reactions


European Network of Population-based Registries for the Surveillance of Congenital Anomalies


Food and Drug Administration


Focus Group Discussion


The FoodCast Research Image Database


UK General Practice Research Database


Health Product Vigilance Centre


Health and Welfare Survey

InterAction Database


East African International Databases to Evaluate AIDS


Innovative Medicines Initiative


Interquartile Range


Influenza Research Database


International Society for Pharmacoepidemiology


Key Informant Interviews


Multiple Indicator Cluster Survey


Memorial Sloan-Kettering Cancer Centre


National Cancer Institute


National Disability Survey


Nordic PharmacoEpidemiological Network


National Statistical Office


Odense University Pharmacoepidemiological Database


Pharmaceutical Benefit Management Company


Population and Housing Census


Pharmaceutical and Medical Devices Agency


Pharmacoepidemiological Research on Outcomes of Therapeutics by a European ConsorTium


Reproductive Health Survey


Standard Deviation


Socio-Economic Survey


Statistical Package for the Social Sciences


Strategic Typhoid Alliance across Africa


Task-force in Europe for Drug Development for the Young


U.S. Department of Agriculture

© 2021 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution 3.0 License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

How to cite and reference

Ibrahim Chikowe and Elias Peter Mwakilama (October 7th 2021). Computer-Aided Pharmacoepidemiology in Drug Use and Safety: Examining the Intersection between Data Science and Medicines Research, New Insights into the Future of Pharmacoepidemiology and Drug Safety, Maria Teresa Herdeiro, Fátima Roque, Adolfo Figueiras and Tânia Magalhães Silva, IntechOpen, DOI: 10.5772/intechopen.98730. Available from:
