A Systematic Review of Knowledge Visualization Approaches Using Big Data Methodology for Clinical Decision Support

This chapter reports on results from a systematic review of peer-reviewed studies related to big data knowledge visualization for clinical decision support (CDS). The aims were to identify and synthesize sources of big data in knowledge visualization, identify visualization interactivity approaches for CDS, and summarize outcomes. Searches were conducted via PubMed, Embase, Ebscohost, CINAHL, Medline, Web of Science, and IEEE Xplore in April 2019, using search terms representing concepts of: big data, knowledge visualization, and clinical decision support. A Google Scholar gray literature search was also conducted. All references were screened for eligibility. Our review returned 3252 references, with 17 studies remaining after screening. Data were extracted and coded from these studies and analyzed using a PICOS framework. The most common audience intended for the studies was healthcare providers (n = 16); the most common source of big data was electronic health records (EHRs) (n = 12), followed by microbiology/pathology laboratory data (n = 8). The most common intervention type was some form of analysis platform/tool (n = 7). We identified and classified studies by visualization type, user intent, big data platforms and tools used, big data analytics methods, and outcomes from big data knowledge visualization of CDS applications.


Introduction
A clinical decision support system (CDSS) involves the use of digital information and communication technologies to bring relevant knowledge to bear on the healthcare and well-being of the patient ([1], p. 6). A CDSS has several components ([1], p. 11), including: Purpose, the task or process of clinical care that uses the CDSS; and Application Environment, the interaction of the CDSS with a host application (e.g. an electronic health record system).
The growing use of medical/healthcare big data and data analytics is having a profound impact on decision models and knowledge bases in the development and application of CDSS today. There is a particular need to address user requirements when attempting to recognize patterns in the massive volumes of data being presented to decision makers (as in the Result Specification component of a CDSS structure [1]). In psychology and cognitive neuroscience, pattern recognition in humans is the result of a cognitive process whereby the brain attempts to match information from a stimulus that is received and entered into short-term memory with certain content retrieved from long-term memory. While the stimulus may arise from any of our senses, in this chapter we will focus on visual sensations of information that can inform decision making.
Why is knowledge visualization important to clinical decision support? Knowledge visualization is essential when there is a need to augment human capabilities rather than to attempt to automate decision-making computationally. Data mining is widely used for discovering latent knowledge from databases. This can contribute to clinical decision making. Electronic health record (EHR) and other healthcare systems contain large volumes of patient data, often with different formats and structures, so these data are typically accessible only in a form that is not conducive to rapid synthesis and interpretation for potential therapeutic outcomes.
Medical data inherently contain information that provides support for patient-centric and personalized healthcare. Machine learning approaches, combined with high-performance distributed computing technologies such as Hadoop, can assist in the exploitation of big healthcare databases. In a high-pressure clinical environment, well-designed knowledge-based visualization can play a key role for clinicians and managers who must deal with complex issues and decisions. These may arise from algorithms derived through computational results from healthcare databases containing data from many patients, with each record having hundreds of attributes. As other chapters in this book discuss in detail, there are machine learning/deep learning and other approaches that can convert these massive databases into decision support resources. These resources are virtually never deterministic and are difficult to validate to a high degree of accuracy, but they can help to support decision makers needing to make individualized healthcare decisions.
The value in knowledge visualization lies in presenting knowledge to the user in a way that gives the user factual information that is useful for current decisions. Since there are multiple dimensions to every specific decision, the visualization designer must provide flexibility in potential displays, while taking into account limitations on computer resources, human capabilities, and the display environment. Visualization system usage can be analyzed in terms of why the user needs it, what data is shown, and how the display idiom is designed ([2], p. 1).
DOI: http://dx.doi.org/10.5772/intechopen.90266
The digitization of patient health records has profoundly impacted medicine and healthcare. The compilation of medical history provides a holistic account of patient condition, family history, social determinants, procedures, and medications. In addition to regular treatment benefits, the availability of this information in digital form has created opportunities for population-level monitoring and studies that can help to guide initiatives for improving quality of care. Each hospital client population base varies in geography, size, and organizational structure, and extracting meaning from the data to support a hospital's strategic mission requires a combination of clinical, statistical, and technical literacy. Effective visualization of data resources is essential to the efficient use of these resources by the healthcare facility [3].
The objectives of this study are:
• To explore the sources of big data in knowledge visualization for clinical decision support;
• To identify the visualization and interactivity approaches that have been used in the light of recent CDSS advances in the use of big data;
• To identify big data analytical techniques and technologies used in CDSS with knowledge visualization;
• To determine the purposes of CDSS, and to illustrate the potential benefits of big data and knowledge visualization for these purposes;
• To recognize what appears to be the future of visualization in the support of big data and CDSS.
By systematically investigating these objectives in detail, this systematic review will make a significant contribution towards understanding how big data knowledge visualization can be used in clinical decision support. This will benefit the development of future applications in this field.

Design
This systematic review was carried out according to the recommendations and reporting specified in the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement [4,5].

Inclusion and exclusion criteria
Any peer-reviewed studies published from January 2008 to April 2019 that explored the applications of big data knowledge visualization in a clinical decision support setting were retained for full-text review. Exclusions comprised abstracts, surveys, focus groups, feasibility studies, monitoring devices, "what-if" analysis, big data collection techniques, commentaries, letters, editorials or short reports, studies that merely used big data technologies or knowledge visualization, and studies not published in English. Publications describing big data and visualization applications that were not developed for clinical decision support were also excluded.

Search strategy and screening process
We developed the search strategy according to the Cochrane Handbook of Systematic Reviews of Interventions [6]. Search keywords (terms) were grouped into three categories: big data, knowledge visualization, and clinical decision support (see Table 1). Searches were conducted in electronic databases: EBSCOHOST, EMBASE, MEDLINE VIA OVID, PUBMED, IEEE XPLORE, WEB of SCIENCE, and GOOGLE SCHOLAR (the latter for a gray literature search) on April 5-6, 2019. We also used the related article function in PubMed, Science Direct, and Springer Link databases on initially included studies to identify additional studies.
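The grouping of keywords described above (terms combined with OR inside each concept, concepts combined with AND) can be sketched as a small query builder. This is only an illustration: the function name `build_query` and the example terms are placeholders chosen here, not the actual keywords of Table 1, and each database has its own query syntax that such a string would need to be adapted to.

```python
from typing import Dict, List


def build_query(groups: Dict[str, List[str]]) -> str:
    """Join terms with OR inside each concept group, then AND the groups together."""
    clauses = []
    for terms in groups.values():
        # Quote multi-word terms so they are treated as phrases.
        quoted = [f'"{t}"' if " " in t else t for t in terms]
        clauses.append("(" + " OR ".join(quoted) + ")")
    return " AND ".join(clauses)


# Placeholder terms for illustration only (assumption, not the review's keyword list).
example_groups = {
    "big data": ["big data", "data analytics"],
    "knowledge visualization": ["visualization", "dashboard"],
    "clinical decision support": ["clinical decision support", "CDSS"],
}

query = build_query(example_groups)
print(query)
```

Keeping the groups in a dictionary makes it easy to add or remove a synonym within one concept without touching the overall AND structure.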
Studies were selected in two steps. First, the title and abstract for each study were screened for inclusion and exclusion criteria and classified by two reviewers. If the title and abstract did not provide enough information to assess its relevance or if a final decision could not be made, we assessed the full article. Second, the full texts of the studies without enough information and/or deemed to be potentially relevant were reviewed randomly by two reviewers. Any disagreements between the two reviewers were resolved via discussion or by consultation with a third reviewer. Studies cited in eligible articles were also reviewed following a similar screening process. The studies identified for systematic review were examined by two reviewers qualitatively, as described below.

Data: collection, management, analysis and classification
All reviewers used a shared spreadsheet template to classify and summarize eligible studies and keep track of review progress. Counts and percentages were based on the databases from which each study was identified in the following order: WEB OF SCIENCE, IEEE XPLORE, PUBMED, EMBASE, EBSCOHOST and GOOGLE SCHOLAR. For example, if a study was identified in both WEB OF SCIENCE and EMBASE, it counted as being found within the WEB OF SCIENCE.
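The attribution rule above can be expressed as a short sketch: each study is credited to the first database in the stated priority order in which it was found. The function names and the three sample records are hypothetical and serve only to illustrate the rule.

```python
from collections import Counter
from typing import Iterable, Set

# Priority order as stated in the text.
PRIORITY = [
    "WEB OF SCIENCE",
    "IEEE XPLORE",
    "PUBMED",
    "EMBASE",
    "EBSCOHOST",
    "GOOGLE SCHOLAR",
]


def attribute(found_in: Set[str]) -> str:
    """Return the highest-priority database among those where a study was found."""
    for database in PRIORITY:
        if database in found_in:
            return database
    raise ValueError("study was not found in any listed database")


def tally(studies: Iterable[Set[str]]) -> Counter:
    """Count each study once, under its highest-priority database."""
    return Counter(attribute(found_in) for found_in in studies)


# Hypothetical records, for illustration only.
sample = [
    {"WEB OF SCIENCE", "EMBASE"},
    {"PUBMED", "GOOGLE SCHOLAR"},
    {"EMBASE"},
]
counts = tally(sample)
print(counts)
```

Counting each study once avoids inflating totals when the same paper is indexed in several databases.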
All eligible studies were classified using the Population/Problem-Intervention-Comparison-Outcomes-Study design (PICOS) framework [6,7], which provides a common approach for detailed specification of the review questions and criteria during study design. In this systematic review, the same PICOS categories are used to synthesize the data extracted from eligible studies. All PICOS subcategories were identified and classified through the review process. As in [7], where applicable, pre-existing taxonomies such as the interaction taxonomy [8,9], major data types of big data in health care [10] and target audience [7] were integrated with the PICOS framework. The synthesis of the findings contains results for each PICOS category (Population/Problem-Intervention-Comparison-Outcomes-Study Design), which covers all related subcategories.
Table 1. Search keywords: (P) AND (I) AND (O); within each group, the keywords are combined using the "OR" operator.

Results
A total of 2338 references were retrieved from our first search of electronic databases. A second search of the gray literature via GOOGLE SCHOLAR (n = 780) and a third manual search (n = 134) yielded a total of 3252 studies. We then removed 866 (26.6%) duplicate studies. We screened the titles and abstracts of 2386 (73.4%) papers and excluded 1843 studies because the visualizations discussed were not for healthcare, the studies addressed only big data without visualization or were not for clinical decision support (CDS), the papers used genomics/DNA or image data, or only an abstract was presented. The full text of each of the remaining 543 (16.7%) studies was then read. A search of citations yielded an additional 34 eligible studies. A total of 577 studies were classified and 560 of these studies were excluded. The PRISMA flow diagram (Figure 1) summarizes the screening process. The 17 studies remaining were included in the qualitative synthesis [11-27] that follows.
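The screening counts reported above can be checked with simple arithmetic. The sketch below uses only the numbers given in the text; the variable names and the helper `pct` are introduced here for illustration.

```python
# Counts as reported in the text.
database_hits = 2338
gray_literature_hits = 780
manual_hits = 134

total = database_hits + gray_literature_hits + manual_hits
duplicates = 866
screened = total - duplicates
excluded_on_title_abstract = 1843
full_text = screened - excluded_on_title_abstract
from_citations = 34
classified = full_text + from_citations
excluded_after_classification = 560
included = classified - excluded_after_classification


def pct(part: int, whole: int) -> float:
    """Percentage of `whole`, rounded to one decimal place."""
    return round(100 * part / whole, 1)


print(total, screened, full_text, classified, included)
print(pct(duplicates, total), pct(screened, total), pct(full_text, total))
```

All percentages in the paragraph are relative to the 3252 records initially retrieved.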

Characteristics of included studies
The majority of the 17 included publications were indexed in the Web of Science (n = 12, 70.6%). The series Lecture Notes in Computer Science (LNCS) and the Journal of the American Medical Informatics Association (JAMIA) were the other sources of the studies (n = 3 (17.6%) and n = 2 (11.8%), respectively). More than half of the studies (n = 9, 52.9%) were journal publications and nearly 94% of the studies were published in the past 4.5 years (2015-2019). Studies originated from 10 countries: the USA tops the list with 9 publications (nearly 53%), followed by Canada with 3 (17.6%), France and UAE with 2 each (11.8%), and Greece, Italy, South Korea, Portugal, Taiwan and Spain with 1 each (5.9%). Five studies (29.4%) were conducted in two countries and more than half of the studies were conducted at a university. More than 82% of the studies discussed one specific type of visualization tool or system.
A majority of the studies were conducted in non-intensive hospital settings (n = 9, 53%); 2 were in intensive surgical and trauma settings, 2 were for outpatients, and 6 did not specify hospital settings. Studies were designed to investigate different diseases or health problems, such as acute kidney injury [20], appendicitis [16], cardio, respiratory, and adverse events [19,26], chronic diseases and diabetes [11,13], and hospital infections and sepsis [20,25]. Two studies were developed to help predict various hospital complications after treatment [17,20], and seven studies (41%) did not provide any information about specific patient diseases or problems.

Big data major types and sources
Big data in health care can be classified into four major types based on data sources: (1) big data in medicine (or medical/clinical big data); (2) big data in public health and behavior; (3) big data in medical experiments; and (4) big data in medical literature [10]. We identified three major types of big data in the reviewed studies (Figure 2).
Big data in medicine and clinics includes big data generated in hospitals, such as electronic health records (EHRs)/medical records (EMRs), personal health records (PHRs), and medical images (visual information of the internal human body). This group was the most frequent source of data in the 17 studies. Sixteen studies (94%) used big data from medicine and clinics. EHRs, EMRs (n = 12) and PHRs (n = 7) were the main sources of big data. These records consisted of some combination of clinical notes, laboratory and image reporting results, medical histories, hospital stay information, medication and allergies, patient demographics, diagnoses, sensor data, etc., all of which are the basis for personalized medicine.
Big data sourced from public health and patient activities is the second major source of big data in our studies (n = 13, 76%). These data focus on the physiological data of users that are often collected by portable equipment [10], such as electrical and electromagnetic signals from body vitals collected by wearable devices, daily health records, and data from sports and personal diets.
The main sources of big data in public health and behavior include measures and records of electrical and electromagnetic signals from the body (n = 5, 29%), which comprise live data feeds from patient monitoring systems, electrocardiograph (ECG) and electroencephalography (EEG) signals, and sensor data from the Internet of Things (IOT). These approaches connect humans "to the Internet via sensors, and microprocessor chips that record and transmit data" such as brain activity, heart rate, sound waves, and body temperature [22,23].
Structured knowledge, such as big data from the medical literature under Medical Subject Headings (MESH) codes, the International Classification of Diseases 10th revision (ICD-10), laboratory test codes, etc., and social media data are also a significant source of big data in a number of the 17 studies we selected for review (n = 3, 18%) (Figure 2).

Big data size
The studies reviewed included more than 5.4 million patients, mostly admitted to hospital. The sample size of patients in the studies ranged from 1757 to 2.9 million. Authors reported the number of records, tests and patient stay days that ranged from 15,700 to 17.8 million. Three studies [11,18,21] used structured and/or social media big data ranging from 7000 to 300 million records. About 65% of the studies (n = 11) used real time data such as ECG streaming, producing 500,000 messages per minute and 80 different types of vitals totaling 100,000 messages [26].

Big data types
Big data visualization studies varied significantly among the data types they could handle [8].
The data information types could be categorical (nominal, such as diagnosis or treatment, or ordinal, such as a "high" or "low" blood pressure level), numerical (for example, cholesterol measure, temperature), texts, maps or networks. All 17 studies used categorical and numerical data information types, while more than 75% (n = 13) used text data types (clinician notes, laboratory results, notes, etc.). Applications in 13 studies used time series data, including real time signals in three studies (18%) [21,22,26]. Three studies [13,15,16] used maps in their applications to provide clinical decision support. All applications reviewed could deal with several types of data information, with the most common being combinations of categorical, numerical, text and time series.
We further categorized data by medical information type. Most of the studies dealt with physical examination of patients, patient outcomes and diseases, symptoms, and treatment problems which require support for clinical decision making (see Table 3).
A variety of data collection methods were used, with more than half of the studies using data from EHR/EMR systems. The other most common method involved receiving data from continuous monitoring via EEG, ECG, bedside monitoring, or IOT [11,14,19,22,23,26]. Two studies used big databases [11,20], one study used data from a provincial pathology laboratory [13] and one study used user input data in combination with IOT data [14].

Intervention type and big data visualization techniques
We identified 11 subcategories of the PICOS intervention classification to help classify all interventions in the studies such as intervention type, big data visualization techniques, visualization types and others.
For big data visualization techniques, 47% (n = 8) of all studies developed web applications and 41% (n = 7) of all studies used dashboards for supporting clinical decisions.
The most common visualization category for our studies (n = 14, 82%) was tabular (table) visualization, which was used in seven studies [11,14,16,17,20,26,27]. Three studies used tabular visualization with color (9 different indicators) and tabular visualization with color indicating risks [12,22,23]. One study used tabular visualization with color coding indicating change from the previous state [25], and another used tabular visualization with color coding indicating change in value and out-of-normal ranges or trend of changes [21]. One study used a very innovative tabular visualization with color comets [19], where the comet head represents the risk at the current time and the tail is 3 h long.
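To make the comet idea concrete, the sketch below splits a series of timestamped risk scores into a head (the most recent reading) and a tail (the earlier readings within the preceding 3 h). It is not the implementation of the cited system [19]; the names `Comet` and `build_comet` and the sample readings are assumptions made here purely for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Tuple

Reading = Tuple[datetime, float]  # (timestamp, risk score); hypothetical format

TAIL_LENGTH = timedelta(hours=3)  # tail duration stated in the text


@dataclass
class Comet:
    head: Reading        # most recent reading inside the window
    tail: List[Reading]  # earlier readings inside the window, oldest first


def build_comet(readings: List[Reading], now: datetime) -> Comet:
    """Keep readings from the last 3 h and split them into head and tail."""
    window = sorted(r for r in readings if now - TAIL_LENGTH <= r[0] <= now)
    if not window:
        raise ValueError("no readings in the last 3 hours")
    return Comet(head=window[-1], tail=window[:-1])


# Hypothetical readings, for illustration only.
now = datetime(2019, 4, 5, 12, 0)
readings = [
    (datetime(2019, 4, 5, 8, 0), 0.2),    # older than 3 h, dropped
    (datetime(2019, 4, 5, 10, 0), 0.4),
    (datetime(2019, 4, 5, 11, 30), 0.7),
    (datetime(2019, 4, 5, 12, 0), 0.9),
]
comet = build_comet(readings, now)
print(comet.head, len(comet.tail))
```

A display layer could then draw the head as a colored marker and the tail as a fading trace, so that the direction of change is visible at a glance.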

Support for user intent
Information visualization can be compared and classified by interaction features. There is a great variety of interaction visualization techniques [29], but we use an intent model proposed by [9] and extended by [8]. The model focuses on "what a user wants to achieve", described as "user intent", and is an effective way to classify low-level interaction techniques into seven descriptive high-level categories [9]. We also added a category that covers printing, submitting feedback, and saving features (Table 3).
Most of the studies support user interactivity (n = 15, 88%). The Select: mark something as interesting interaction techniques, such as keeping track or managing a group, are available in 15 and 13 studies, respectively (88% and 76%). The Filter: show me something conditionally interaction technique, which enables users to change the set of data items shown under specific conditions, was very useful for multiple-patient applications/systems. This second intent played a leading part in the reviewed applications and was used in 14 studies (82%).
The Explore: show me something else intent (which includes items such as repositioning, sorting, editing, and adjusting axes) and the Reconfigure: show me a different arrangement intent (switch representation technique, vary visual encoding) were available in 13 studies (76%), while the Abstract/Elaborate: show me more or less detail intent, which gives users the ability to adjust the level of abstraction of a data representation (for example via time or time constraints), is available in 12 studies (71%).
In more than half of the studies (n = 10, 59%) the Connect: show me related items intent is supported, where the most common interaction technique was to show the patient/group relationship (n = 9, 53%). The Print, Submit Feedback, Record/Download: show me something to save intent (printing, submitting feedback and recommendations, recording or downloading) is enabled in 8 studies (47%), with the most common technique being printing. The Encoding: show me a different representation intent (temporal data binning, altering the fundamental representation of visual appearance, e.g. color, size, and shape) is supported in 7 studies (41%), with the most common being the switch representation technique (n = 5, 29%).
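The intent categories used above can be captured in a small data structure, which is convenient when coding studies in a spreadsheet or script. The sketch below is illustrative only: the class name `UserIntent` and the member name `SAVE` are labels chosen here, and the counts are limited to the intents for which the text gives a single unambiguous number of studies.

```python
from enum import Enum

TOTAL_STUDIES = 17


class UserIntent(Enum):
    """Seven intents of the model [9], plus the category added in this review."""
    SELECT = "mark something as interesting"
    FILTER = "show me something conditionally"
    EXPLORE = "show me something else"
    RECONFIGURE = "show me a different arrangement"
    ABSTRACT_ELABORATE = "show me more or less detail"
    CONNECT = "show me related items"
    ENCODE = "show me a different representation"
    SAVE = "show me something to save"  # print, submit feedback, record/download


# Number of studies supporting each intent, as reported in the text.
reported_counts = {
    UserIntent.FILTER: 14,
    UserIntent.ABSTRACT_ELABORATE: 12,
    UserIntent.CONNECT: 10,
    UserIntent.SAVE: 8,
    UserIntent.ENCODE: 7,
}


def share(count: int, total: int = TOTAL_STUDIES) -> int:
    """Percentage of included studies, rounded to the nearest integer."""
    return round(100 * count / total)


for intent, count in reported_counts.items():
    print(f"{intent.name}: {intent.value} ({count} studies, {share(count)}%)")
```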

Cognitive presentation and units visualized
In most of the 17 studies, color and similarity (grouping) are the most supported cognitive presentations. Eight studies (47%) support all cognitive presentation types, such as color, size and similarity (grouping), seven studies (41%) used color and similarity (grouping) and only one study [20] supported color and size. In just one study [15], only color presentation is supported.
More than 76% of the studies (n = 13) supported visualization for single patients, and nine studies (53%) enabled visualization for multiple patients. Nearly 30% of the studies (n = 5) visualized aggregated data, such as groups of patients. In one study, the application or visualization tools could be used by single users [23]. In addition, only one study supported data for multi-products [25].

Big data analytics methods and problems
Three main groups of analytical methods were identified in the 17 studies. The most commonly used type was Descriptive (n = 16, 94%) which summarizes data to provide useful information and sometimes prepares it for further analysis. Predictive analytics (conditional logistic regression, deep learning, machine learning, data mining, AI) was the second most popular analytical method and was used in 11 studies (65%). Only two studies (12%) supported natural language processing (text mining, sentiment analysis).
These analytical methods were used to solve a variety of analytical problems such as analyzing all note types from different providers [27], classification [23] and pattern recognition [11], clustering [13,23], electronic risk visualization of respiratory and cardiovascular events [19], identification of sequences of inappropriate drug administration cases [21], calculation of aggregated score and progression of the patient [26], perioperative risk prediction and visualization [20], and risk estimation [17,18,26]. One study [24] supported only summary statistics for contours (as in box-whisker, i.e. mean, median, percentiles 25 and 75, and outliers). One other study [25] used statistical measures such as prevalence and incidence of microorganisms, antibiotics, and microbiologicals.
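The box-whisker summary statistics mentioned above (mean, median, 25th and 75th percentiles, and outliers) can be computed with the Python standard library, as in the sketch below. The outlier rule used here (values more than 1.5 interquartile ranges beyond the quartiles) is a common convention and an assumption on our part; the cited study [24] may define outliers differently. The sample values are hypothetical.

```python
import statistics
from typing import Dict, List, Union


def box_whisker_summary(values: List[float]) -> Dict[str, Union[float, List[float]]]:
    """Summary statistics typically shown in a box-whisker plot."""
    data = sorted(values)
    # Quartiles with the "inclusive" method (requires Python 3.8+).
    p25, median, p75 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = p75 - p25
    # Assumed outlier rule: beyond 1.5 * IQR from the quartiles.
    low, high = p25 - 1.5 * iqr, p75 + 1.5 * iqr
    return {
        "mean": statistics.mean(data),
        "median": median,
        "p25": p25,
        "p75": p75,
        "outliers": [v for v in data if v < low or v > high],
    }


# Hypothetical measurements, for illustration only.
summary = box_whisker_summary([1, 2, 3, 4, 5, 6, 7, 8, 50])
print(summary)
```

Reporting the median and quartiles alongside the mean makes a summary less sensitive to extreme values such as the one in the sample.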

Big data platforms and tools
The use of big data requires the support of new analytical and other types of tools for handling such massive and different types of data as well as technological innovations for data management, integration and interoperability, storage, distributed processing and analysis, and visualization. The studies we reviewed used a great variety of big data platforms and tools. We classified them into 13 categories which are presented in Table 3.
Cloud computing services (such as Amazon Web Services, Azure and Google clouds or other clouds not specified) were used in most of the studies (n = 11, 65%).
We also identified studies that supported a microservices architecture for big data applications [20], big data cluster resource management [13], big data integration and interoperability and data maintenance/storage technologies (Rest for Traditional Storage (RAID)) [26], messaging cloud-based services such as Google Cloud Messaging (GCM) [13] and Java Message Service (JMS) [25], and a mobile application development framework [22].

PICOS classification: part 3: comparison
Other studies provided comparisons with concurrent controls with no visualization [19], different users [13], and other web portals [11]. Seven studies (41%) did not provide or mention any comparison or comparators.

PICOS classification: part 4: outcomes
We identified six subcategories of the PICOS classification of outcomes (clinical decision support), which helped us to classify all outcomes in our studies: system/tool type; clinical decision support purpose; clinical area support; measured outcome; clinical or patient outcomes, divided into clinical and usability; and outcome effects or potential effects.

System/tool type
A majority of studies (n = 16, 94%) used decision support systems, six studies (35%) were designed for monitoring intelligent systems (sensors, devices), two (12%) used expert systems together with decision support systems [12,25], one used a system for optimizing operations (lab resources depending on order volumes and treatments) and a decision support system [13], while only one study was developed as a visualization system [17].

Clinical decision support purpose
Each of the systems was designed to address a different purpose, and some were used for several purposes. We classified clinical decision support purposes into three categories of use: (1) early detection of diseases; (2) improvement of decision making; and (3) patient-centric care.
Improvement of decision making. This was the most common purpose group (n = 14, 82%). The intent of these systems included utilizing diverse data to provide automated and augmented insight, discovery, and evidence-based health and wellness decision support (n = 10, 59%) [11,12,14,16,18,20,22-25], followed by monitoring and assisting individuals with intelligent systems (sensors, devices and robotics) to maintain function and independence (n = 2, 12%) [14,15], improving accuracy [15,24], supporting clinical workflow [27] and monitoring clinical pathways [24]. Other purposes included recalling all similar patients from an institution's electronic medical record (EMR) repository, exploring "what-if" scenarios, and collecting these evidence-based cohorts for future statistical validation and pattern mining [24]. Also included were detecting inadequate treatments [21], tracking treatments [13], supporting adherence to guidelines (standards of care, expert standards, best practices), and formulating guidelines for interpreting and adjusting severity thresholds regarding the measurement of test results [11,13].
Early detection of diseases. This was the second most common purpose group (n = 12, 71%). The major purpose was disease monitoring, such as supervising (the patients), monitoring (the clinical condition) and notifying (healthcare personnel) [12,17,25,26], reporting on continuous updates on seizures [23], visualizing signal data on physician and clinician smartphone devices to analyze, inspect, and recommend the appropriate medical decisions [22,23], and assisting users in the prescription of appropriate empiric or targeted antibiotic treatments [25]. Other purposes in this group included adopting and tracking healthier behaviors [11,21], monitoring patient safety for adverse events following medical procedures [17], identifying VAE (ventilator adverse events) risk for more rigorous identification of adverse events at an earlier stage [26], surveillance and management of antibiotics [24], health tracking [25,27], and surgery risk assessment [20].
Patient-centric health care was supported in two studies (12%). This included planning post-discharge care coordination (which could include the follow-up visit with the primary care physician or with a subspecialist, or care on subsequent days at home by a visiting nurse) [16] and cyber-based empowering of patients and healthy individuals to play a substantial role in their own health and treatment [11].

Were outcomes measured?
In most studies, outcomes were not measured (n = 10, 59%) or not reported (n = 2, 12%). Two studies reported more than one type of measured patient outcome (algorithm evaluation, time spent, etc.) [21,27]. Other studies reported specific tests such as oxygenation of the blood, heartbeats, daily steps, etc. [19], or wound healing and/or length of stay, and time spent [12,14].
Usability outcomes. This refers to whether the system that was developed is easy to use, cost efficient and satisfactory for the users. Only two studies did not report usability outcomes [16,26]. Ten studies (59%) reported the effectiveness of their systems, such as gaining more knowledge [12,18,21-25] and decreasing errors [13], effective use in estimating changes over time or in comparing the estimated risks from one hospital to another [17], and optimizing and planning resources [13]. Six studies (35%) reported efficiency outcomes, such as low costs [17,18] and time needed on clinically relevant tasks [11,19,20,26]. Almost 30% of the studies (n = 5) included satisfaction outcomes, such as satisfaction with system usage [23,27], visual validity (i.e. real time and interactive visualization on physician smartphone applications) [14,22,23], and improvement and user friendliness [16].
One study [12] reported that their system increased predictive power for patient admissions (ICU), with 30% success in predicting an event 7 days before it occurs. This study also reported a decrease in negative patient outcomes (death, disease, adverse reactions, infections) when the system was used: 50% of patient deaths were predicted within 7 days before the event occurred. Ninety-five percent of all decisions (an increase from 30%) were based on information coming from the system [12]. The authors concluded that monitoring antibiotic use in real-time, either from an institutional or individual perspective, immediately generated targeted interventions that led to more adequate antibiotic use.
Another study [14] reported that both doctors and normal users become more familiar with the system as the "disease number" (comorbidity) increases. A person with four diseases understands visualization outputs three times more quickly than a person with only one disease (30 s vs. 90 s). User satisfaction increases as well, since users can understand the diagrams of disease diagnosis results faster (from 6 for a patient with 1 disease to nearly 10 for a patient with 4 diseases, on a scale of 0-10).
In another study, the rate of septic shock in the Surgical and Trauma Intensive Care Unit (STICU) decreased by more than half after the display of the system monitor was made available to the STICU (p < 0.05). These results remained statistically significant even after adjusting for other control variables. The rates of respiratory failure, hemorrhage and mortality did not change significantly in either unit when comparing the periods before and after monitor display [19].
Time to detection of an inappropriate treatment decreased when using an algorithm (sorted and ordered sequences) developed for one system [21]. The study also reported that the algorithm is a very good classifier when compared with a pharmacovigilance expert (gold standard review). Another study [27] reported a non-statistically significant difference in time on clinically relevant tasks, but many others reported improved potential outcomes without any quantitative measures.

PICOS classification: part 5: study design
The subcategories of the PICOS classification of the study design are presented in Table 5. Three subcategories were identified that could classify the study design: (1) analytics and descriptive study design, (2) study design score and (3) whether the system was a prototype or actually used in practice. Thirteen studies (76%) were identified as descriptive (qualitative) studies, although 10 (59%) of them were identified only as qualitative designs [11,13,16,18,20,22-26], three were identified as mixed qualitative and quantitative descriptive [12,14,19], one was identified as descriptive qualitative with survey [15] and one was descriptive with user-centered, iterative design [27]. Five studies (29%) used analytics designs [12,14,17,19,21]. Most of the studies reviewed (n = 13, 76%) were prototypes but with results based on real data. Only four of the studies (24%) reported systems actually used in practice.

Conclusion
To provide effective patient-centered healthcare, it is essential to manage and analyze huge amounts of data. In the past decade, the variety and volume of health data sources have both increased dramatically, making traditional data management and analysis tools insufficient. Big data has emerged as a response to the growing need for health organizations to have new tools capable of processing massive amounts and varieties of healthcare data [30]. A major advantage of big data techniques is the use of advanced analysis techniques such as predictive analytics to improve clinical care, quality of care and patient outcomes.
This systematic review identified 17 studies with different visualization types and user intents, and with a wide variety of data collection methods, big data platforms and tools, and clinical decision support purposes, in order to understand and synthesize existing approaches to big data knowledge visualization for clinical decision support. The results of this review emphasize the use of common types of big data knowledge visualization, patterns of big data analytics methods and classification of outcomes for clinical decision support.
The study demonstrated substantial differences in terms of visualization techniques used, user intents, big data tool implementations and outcome effects. Most studies reported only potential effectiveness from using knowledge visualization in clinical settings. This included gaining more knowledge for clinical decision support and improved clinical decision making based on better and more timely information, decreasing negative patient outcomes (i.e. death, disease, adverse reactions, infections) and improving patient outcomes (i.e. blood sugar, decreased length of stay and time saving), increasing predictive power of adverse events, and decreasing cost of care. Much more research is needed on implementing different techniques for big data knowledge visualization and evaluating the resulting outcomes from clinical decision support. Additional study is also needed to provide solid evidence that clinical outcomes can be improved through clinical decision support based on big data knowledge visualization.

Limitations
Our study has three limitations. First, we only searched publications in English, and thus did not capture studies published in other languages. Second, we did not include commercial applications/systems that are used for clinical decision support and that may be using advanced techniques for knowledge visualization support. Third, papers on big data knowledge visualization that were not implemented on big data platforms/tools were excluded from our review. A review of such systems could provide additional understanding of big data knowledge visualization for clinical decision support.