Introduction to Descriptive Statistics

Olubunmi Alabi; Tosin Bukola

doi:10.5772/intechopen.1002475

Abstract

This chapter offers a comprehensive exploration of descriptive statistics, tracing its historical development from Condorcet’s “average” concept to Galton and Pearson’s contributions. Emphasizing its pivotal role in academia, descriptive statistics serve as a fundamental tool for summarizing and analyzing data across disciplines. The chapter underscores how descriptive statistics drive research inspiration and guide analysis, and provide a foundation for advanced statistical techniques. It delves into their historical context, highlighting their organizational and presentational significance. Furthermore, the chapter accentuates the advantages of descriptive statistics in academia, including their ability to succinctly represent complex data, aid decision-making, and enhance research communication. It highlights the potency of visualization in discerning data patterns and explores emerging trends like large dataset analysis, Bayesian statistics, and nonparametric methods. Sources of variance intrinsic to descriptive statistics, such as sampling fluctuations, measurement errors, and outliers, are discussed, stressing the importance of considering these factors in data interpretation.

Keywords

academic research
data analysis
data visualization
decision-making
research methodology
data summarization

Author Information

Show +

Olubunmi Alabi*
- African University of Science and Technology, Abuja, Nigeria
Tosin Bukola
- University of Greenwich, London, United Kingdom

*Address all correspondence to: oalabi@aust.edu.ng

1. Introduction

The French mathematician and philosopher Condorcet established the idea of the “average” as a means to summarize data, which is when descriptive statistics got their start. Yet, the widespread use of descriptive statistics in academic study did not start until the 19th century. Francis Galton, who was concerned in the examination of human features and attributes, was one of the major forerunners of descriptive statistics. Galton created various statistical methods that are still frequently applied in academic research today, such as the correlation and regression analysis concepts. The American statistician and mathematician in the early 20th century Karl Pearson created the “normal distribution,” which is a bell-shaped curve that characterizes the distribution of many natural occurrences. Moreover, Pearson created a number of correlational measures and popularized the chi-square test, which evaluates the significance of variations between observed and predicted frequencies. With the advent of new methods like multivariate analysis and factor analysis in the middle of the 20th century, the development of electronic computers sparked a revolution in statistical analysis. Descriptive statistics is the analysis and summarization of data to gain insights into its characteristics and distribution [1].

Descriptive statistics help researchers generate study ideas and guide further analysis by allowing them to explore data patterns and trends [2]. Descriptive statistics were used more often in academic research because they helped researchers better comprehend their datasets and served as a basis for more sophisticated statistical techniques. Similarly, Descriptive statistics are used to summarize and analyze data in a variety of academic areas, including psychology, sociology, economics, education, and epidemiology [3]. Descriptive statistics continue to be a crucial research tool in academia today, giving researchers a method to compile and analyze data from many fields. It is now simpler than ever to analyze and understand data, enabling researchers to make better informed judgments based on their results. This is due to the development of new statistical techniques and computer tools. Descriptive statistics can benefit researchers in hypothesis creation and exploratory analysis by identifying trends, patterns, and correlations between variables in huge datasets [4]. Descriptive statistics are important in data-driven decision-making processes because they allow stakeholders to make educated decisions based on reliable data [5].

2. Background

The history of descriptive statistics may be traced back to the 17th century, when early pioneers like John Graunt and William Petty laid the groundwork for statistical analysis [6]. Descriptive statistics is a fundamental concept in academia that is widely used across many disciplines, including social sciences, economics, medicine, engineering, and business. Descriptive statistics provides a comprehensive background for understanding data by organizing, summarizing, and presenting information effectively [7]. In academia, descriptive statistics is used to summarize and analyze data, providing insights into the patterns, trends, and characteristics of a dataset. Similarly, in academic research, descriptive statistics are often used as a preliminary analysis technique to gain a better understanding of the dataset before applying more complex statistical methods. Descriptive statistics lay the groundwork for inferential statistics by assisting researchers in drawing inferences about a population based on observed sample data [8]. Descriptive statistics aid in the identification and analysis of outliers, which can give useful insights into unusual observations or data collecting problems [9].

Descriptive statistics enable researchers to synthesize both quantitative and qualitative data, allowing for a thorough examination of factors [10]. Descriptive statistics can provide valuable information about the central tendency, variability, and distribution of the data, allowing researchers to make informed decisions about the appropriate statistical techniques to use. Descriptive statistics are an essential component of survey research technique, allowing researchers to efficiently summarize and display survey results [11]. Descriptive statistics may be used to summarize data as well as spot outliers, or observations that dramatically depart from the trend of the data as a whole. Finding outliers can help researchers spot any issues or abnormalities in the data so they can make the necessary modifications or repairs. In academic research, descriptive statistics are frequently employed to address research issues and evaluate hypotheses. Descriptive statistics, for instance, can be used to compare the average scores of two groups to see if there is a significant difference between them. In order to create new hypotheses or validate preexisting ideas, descriptive statistics may also be used to find patterns and correlations in the data.

There are several sources of variation that can affect the descriptive statistics of a data set, some of which include: Sampling Variation, descriptive statistics are often calculated using a sample of data rather than the entire population. Therefore, the descriptive statistics can vary depending on the particular sample that is selected. This is known as sampling variation. Measurement Variation, different measurement methods can produce different results, leading to variation in descriptive statistics. For example, if a scale is used to measure the weight of objects, slight differences in how the scale is used can produce slightly different measurements.

Data entry errors are mistakes made during the data entry process which can lead to variation in descriptive statistics. Even small errors, such as transposing two digits, can significantly impact the results. Outliers, Outliers are extreme values that fall outside of the expected range of values. These values can skew the descriptive statistics, making them appear more or less extreme than they actually are. Natural Variation, Natural variation refers to the inherent variability in the data itself. For example, if a data set contains measurements of the heights of trees, there will naturally be variation in the heights of the trees. It is important to understand these sources of variation when interpreting and using descriptive statistics in academia. Properly accounting for these sources of variation can help ensure that the descriptive statistics accurately reflect the underlying data.

Some emerging patterns in descriptive statistics in academia include: Big data analysis, with the increasing availability of large data sets, researchers are using descriptive statistics to identify patterns and trends in the data. The use of big data analysis techniques, such as machine learning and data mining, is becoming more common in academic research. Visualization techniques, advances in data visualization techniques are enabling researchers to more easily identify patterns in data sets. For example, heat maps and scatter plots can be used to visualize the relationship between different variables. Bayesian statistics is an emerging area of research in academia, which involves using probability theory to make inferences about data. Bayesian statistics can provide more accurate estimates of descriptive statistics, particularly when dealing with complex data sets.

Non-parametric statistics are becoming increasingly popular in academia, particularly when dealing with data sets that do not meet the assumptions of traditional parametric statistical tests. Non-parametric tests do not require the data to be normally distributed, and can be more robust to outliers. Open science practices, such as pre-registration and data sharing, are becoming more common in academia. This is enabling researchers to more easily replicate and verify the results of descriptive statistical analyses, which can improve the quality and reliability of research findings. Overall, the emerging patterns in descriptive statistics in academia reflect the increasing availability of data, the need for more accurate and robust statistical techniques, and a growing emphasis on transparency and openness in research practices.

3. Benefits of descriptive statistics

The advantages of descriptive statistics extend beyond research and academia, with applications in commercial decision-making, public policy, and strategic planning [12]. The benefits of descriptive statistics include providing a clear and concise summary of data, aiding in decision-making processes, and facilitating effective communication of findings [13]. Descriptive statistics provide numerous benefits to academia, some of which include: Summarization of Data: descriptive statistics allow researchers to quickly and efficiently summarize large data sets, providing a snapshot of the key characteristics of the data. This can help researchers identify patterns and trends in the data, and can also help to simplify complex data sets. Better decision making: descriptive statistics can help researchers make data-driven decisions. For example, if a researcher is comparing the effectiveness of two different treatments, descriptive statistics can be used to identify which treatment is more effective based on the data. Visualization of data: descriptive statistics can be used to create visualizations of data, which can make it easier to communicate research findings to others.

Histograms, bar charts, and scatterplots are examples of data visualization techniques that may be used to graphically depict data in order to detect trends, outliers, and correlations [14]. Visualizations can also help to identify patterns and trends in the data that might not be immediately apparent from raw data. Hypothesis Testing: descriptive statistics are often used in hypothesis testing, which allows researchers to determine whether a particular hypothesis about a data set is supported by the data. This can help to validate research findings and increase confidence in the conclusions drawn from the data. Improved data quality: Descriptive statistics can help to identify errors or inconsistencies in the data, which can help researchers improve the quality of the data. This can lead to more accurate research findings and a better understanding of the underlying phenomena. Overall, the benefits of descriptive statistics in academia are many and varied. They help researchers summarize large data sets, make data-driven decisions, visualize data, validate research findings, and improve the quality of the data. By using descriptive statistics, researchers can gain valuable insights into complex data sets and make more informed decisions based on the data.

4. Practical applications of descriptive statistics

Descriptive statistics has practical applications in disciplines such as business, social sciences, healthcare, finance, and market research [15]. Descriptive statistics have a wide range of practical applications in academia, some of which include: Data Summarization: Descriptive statistics can be used to summarize large data sets, making it easier for researchers to understand the key characteristics of the data. This is particularly useful when dealing with complex data sets that contain many variables. Hypothesis Testing: Descriptive statistics can be used to test hypotheses about a data set. For example, researchers can use descriptive statistics to test whether the mean value of a particular variable is significantly different from a hypothesized value. Data visualization: descriptive statistics can be used to create visualizations of data, which can make it easier to identify patterns and trends in the data. For example, a histogram or boxplot can be used to visualize the distribution of a variable. Comparing Groups: Descriptive statistics can be used to compare different groups within a data set. For example, researchers may compare the mean values of a particular variable between different demographic groups, such as age or gender. Predictive modeling: Descriptive statistics can be used to build predictive models, which can be used to forecast future trends or outcomes. For example, a researcher might use descriptive statistics to identify the key variables that predict student performance in a particular course. The practical applications of descriptive statistics in academia are wide-ranging and varied. They can be used in many different fields, including psychology, economics, sociology, and biology, among others, to provide insights into complex data sets and help researchers make data-driven decisions (Figure 1).

Figure 1.
Types of descriptive statistics. Ref: https://www.analyticssteps.com/blogs/types-descriptive-analysis-examples-steps.

Descriptive statistics is a useful tool for researchers in a variety of sectors since it allows them express the major characteristics of a dataset, such as its frequency, central tendency, variability, and distribution.

4.1 Central tendency measurements

Central tendency metrics, such as mean, median, and mode, are essential descriptive statistics that offer information about the average or typical value in a collection [16]. One of the primary purposes of descriptive statistics is to summarize data in a succinct and useful manner. Measures of central tendency, such as the median, are resistant to outliers and offer a more representative assessment of the average value in a skewed distribution [17]. The mean, median, and mode are measures of central tendency that are used to characterize the usual or center value of a dataset. The mean of a dataset is the arithmetic average, but the median is the midway number when the data is ordered in order of magnitude. The mode is the most often occurring value in the collection. Central tendency measurements are one of the most important aspects of descriptive statistics, as they provide a summary of the “typical” value of a data set.

The three most commonly used measures of central tendency are: Mean: the mean is calculated by adding up all the values in a data set and dividing by the total number of values. The mean is sensitive to outliers, as even one extreme value can greatly affect the mean. Median: the median is the middle value in a data set when the values are ordered from smallest to largest. If the data set has an odd number of values, the median is the middle value. If the data set has an even number of values, the median is the average of the two middle values. The median is more robust to outliers than the mean. Mode: the mode is the most common value in a data set. In some cases, there may be multiple modes (i.e. bimodal or multimodal distributions). The mode is useful for identifying the most frequently occurring value in a data set. Each of these measures of central tendency provides a different perspective on the “typical” value of a data set, and which measure is most appropriate to use depends on the nature of the data and the research question being addressed. For example, if the data set contains extreme outliers, the median may be a better measure of central tendency than the mean. Conversely, if the data set is symmetrical and normally distributed, the mean may provide the best measure of central tendency.

4.2 Variability indices

It is another key part of descriptive statistics is determining data variability. The spread or dispersion of data points about the central tendency readings is quantified by variability indices such as range, variance, and standard deviation [18]. Variability measures, such as range, variance, and standard deviation, reveal information about the spread or dispersion of the data. Variability indices, such as the coefficient of variation, allow you to compare variability across various datasets with different scales or units of measurement [19]. The range is the distance between the dataset’s greatest and lowest values, and the variance and standard deviation are measures of how much the data values depart from the mean. Variability indices are measures used in descriptive statistics to provide information about how much the data varies or how spread out it is. Variability indices, such as the interquartile range, give insights into data distribution while being less impacted by extreme values than the standard deviation [20]. Some commonly used variability indices include:

Range: The range is the difference between the largest and smallest values in a data set. It provides a simple measure of the spread of the data, but is sensitive to outliers. Interquartile Range (IQR): The IQR is the range of the middle 50% of the data. It is calculated by subtracting the 25th percentile (lower quartile) from the 75th percentile (upper quartile). The IQR is more robust to outliers than the range. Variance: The variance is a measure of how spread out the data is around the mean. It is calculated by taking the average of the squared differences between each data point and the mean. The variance is sensitive to outliers. Standard Deviation: The standard deviation is the square root of the variance. It provides a measure of how much the data varies from the mean, and is more commonly used than the variance because it has the same units as the original data.

Coefficient of Variation (CV): The CV is a measure of relative variability, expressed as a percentage. It is calculated by dividing the standard deviation by the mean and multiplying by 100. The CV is useful for comparing variability across different data sets that have different units or scales. These variability indices provide important information about the spread and variability of the data, which can help researchers better understand the characteristics of the data and draw meaningful conclusions from it.

4.3 Data visualization

Data may be visually represented using graphical approaches in addition to numerical metrics. Graphs and charts, such as histograms, box plots, and scatterplots, allow researchers investigate data patterns and correlations. Box plots and violin plots are efficient data visualization approaches for showing data distribution and spotting potential outliers [21]. They may also be used to detect outliers, or data points that deviate dramatically from the rest of the data. Data visualization is an important aspect of descriptive statistics, as it allows researchers to communicate complex data in a visual and easily understandable format. Some common types of data visualization used in descriptive statistics include: Histograms: Histograms are used to display the distribution of a continuous variable. The data is divided into intervals (or “bins”), and the number of observations falling into each bin is displayed on the vertical axis. Histograms provide a visual representation of the shape of the distribution, and can help to identify outliers or skewness. Box plots: Box plots provide a graphical representation of the distribution of a continuous variable. The application of graphical approaches, such as scatterplots and heat maps, improves comprehension of correlations and patterns in large datasets [22].

The box represents the middle 50% of the data, with the median displayed as a horizontal line inside the box. The whiskers extend to the minimum and maximum values in the data set, and any outliers are displayed as points outside the whiskers. Box plots are useful for comparing distributions across different groups or for identifying outliers. Scatter plots: Scatter plots are used to display the relationship between two continuous variables. Each data point is represented as a point on the graph, with one variable displayed on the horizontal axis and the other variable displayed on the vertical axis. Scatter plots can help to identify patterns or relationships in the data, such as a positive or negative correlation. Bar charts: Bar charts are used to display the distribution of a categorical variable.

The categories are displayed on the horizontal axis, and the frequency or percentage of observations falling into each category is displayed on the vertical axis. Bar charts can help to compare the frequency of different categories or to display the results of a survey or questionnaire. Heat maps: Heat maps are used to display the relationship between two categorical variables. The categories are displayed on both the horizontal and vertical axes, and the frequency or percentage of observations falling into each combination of categories is displayed using a color scale. Heat maps can help to identify patterns or relationships in the data, such as a higher frequency of observations in certain combinations of categories. These types of data visualizations can help researchers to communicate complex data in a clear and understandable format, and can also provide insights into the characteristics of the data that may not be immediately apparent from the raw data.

4.4 Data cleaning and preprocessing

Data cleaning and preprocessing procedures, such as imputation methods for missing data, aid in the preservation of data integrity and the reduction of bias in descriptive analysis [23]. Before beginning any statistical analysis, be certain that the data is clean and well arranged. The process of discovering and fixing flaws or inconsistencies in data, such as missing numbers or outliers, is known as data cleaning. Data preparation is the process of putting data into an appropriate format for analysis, such as scaling or normalizing the data. Data cleaning and preprocessing are essential steps in descriptive analysis, as they help to ensure that the data is accurate, complete, and ready for analysis. Some common data cleaning and preprocessing steps include: Handling missing data: Missing data can be a common problem in datasets and can impact the accuracy of the analysis. Depending on the amount of missing data, researchers may choose to remove incomplete cases or impute missing values using techniques such as mean imputation, regression imputation, or multiple imputation. Handling outliers: Outliers are extreme values that are different from the majority of the data points and can distort the analysis. Outlier identification and removal procedures, for example, assist increase the accuracy and reliability of descriptive statistics [24].

To assure the correctness and dependability of descriptive statistics, data cleaning and preprocessing require finding and dealing with missing values, outliers, and data inconsistencies [25]. Researchers may choose to remove or transform outliers to better reflect the characteristics of the data. Data transformation: Data transformation is used to normalize the data or to make it easier to analyze. Common transformations include logarithmic, square root, or Box-Cox transformations. Handling categorical data: Categorical data, such as nominal or ordinal data, may need to be recoded into numerical data before analysis. Researchers may also need to handle missing or inconsistent categories within the data. Standardizing data: Standardizing data involves scaling the data to have a mean of zero and a standard deviation of one. This can be useful for comparing variables with different units or scales. Data integration: Data integration involves merging or linking multiple datasets to create a single, comprehensive dataset for analysis. This may involve matching or merging datasets based on common variables or identifiers. By performing these data cleaning and preprocessing steps, researchers can ensure that the data is accurate and ready for analysis, which can lead to more reliable and meaningful insights from the data.

5. Descriptive statistics in academic methodology

Descriptive statistics are important in academic technique because they enable researchers to synthesize and describe data collected for research objectives [26]. Descriptive statistics is often used in combination with other statistical techniques, such as inferential statistics, to draw conclusions and make predictions from the data. In academic research, descriptive statistics is used in a variety of ways, such as describing sample characteristics. Descriptive statistics is used to describe the characteristics of a sample, such as the mean, median, and standard deviation of a variable. This information can be used to identify patterns, trends, or differences within the sample. Identifying data outliers: Descriptive statistics can help researchers identify potential outliers or anomalies in the data, which can affect the validity of the results. For example, identifying extreme values in a dataset can help researchers to investigate whether these values are due to measurement error or a true characteristic of the population.

Communicating research findings: Descriptive statistics is used to summarize and communicate research findings in a clear and concise manner. Graphs, charts, and tables can be used to display descriptive statistics in a way that is easy to understand and interpret. Testing assumptions: Descriptive statistics can be used to test assumptions about the data, such as normality or homogeneity of variance, which are important for selecting appropriate statistical tests and interpreting the results. Overall, descriptive statistics is a critical methodology in academic research that helps researchers to describe and understand the characteristics of their data. By using descriptive statistics, researchers can draw meaningful insights and conclusions from their data, and communicate these findings to others in a clear and concise manner.

6. Pitfalls of descriptive statistics

The possibility for misunderstanding, reliance on summary measures alone, and susceptibility to high values or outliers are all disadvantages of descriptive statistics [27]. While descriptive statistics is an essential tool in academic statistics, there are several potential pitfalls that researchers should be aware of: Limited scope: Descriptive statistics can provide a useful summary of the characteristics of a dataset, but it is limited in its ability to provide insights into the underlying causes or mechanisms that drive the data. Descriptive statistics alone cannot establish causal relationships or test hypotheses. Misleading interpretations: Descriptive statistics can be misleading if not interpreted correctly. For example, a small sample size may not accurately represent the population, and summary statistics such as the mean may not be meaningful if the data is not normally distributed.

Incomplete analysis: Descriptive statistics can only provide a limited view of the data, and researchers may need to use additional statistical techniques to fully analyze the data. For example, hypothesis testing and regression analysis may be needed to establish relationships between variables and make predictions. Biased data: Descriptive statistics can be biased if the data is not representative of the population of interest. Sampling bias, measurement bias, or non-response bias can all impact the validity of descriptive statistics. Over-reliance on summary statistics: Descriptive statistics can be over-reliant on summary statistics such as the mean or median, which may not provide a complete picture of the data. Visualizations and other descriptive statistics, such as measures of variability, can provide additional insight into the data. To avoid these pitfalls, researchers should carefully consider the scope and limitations of descriptive statistics and use additional statistical techniques as needed. They should also ensure that their data is representative of the population of interest and interpret their descriptive statistics in a thoughtful and nuanced manner.

7. Conclusion

Researchers can test the normalcy assumptions of their data by using relevant descriptive statistics techniques such as measures of skewness and kurtosis [28]. Descriptive statistics has become a fundamental methodology in academic research that is used to summarize and describe the characteristics of a dataset, such as the central tendency, variability, and distribution of the data. It is used in a wide range of disciplines, including social sciences, natural sciences, engineering, and business. Descriptive statistics can be used to describe sample characteristics, identify data outliers, communicate research findings, and test assumptions. The kind of data, research topic, and particular aims of the study all influence the right choice and implementation of descriptive statistical approaches [29].

However, there are several potential pitfalls of descriptive statistics, including limited scope, misleading interpretations, incomplete analysis, biased data, and over-reliance on summary statistics. The use of descriptive statistics in data presentation can improve the interpretability of study findings, making complicated material more accessible to a larger audience [30]. To use descriptive statistics effectively in academic research, researchers should carefully consider the limitations and scope of the methodology, use additional statistical techniques as needed, ensure that their data is representative of the population of interest, and interpret their descriptive statistics in a thoughtful and nuanced manner.

Conflict of interest

The authors declare no conflict of interest.

References

1. Agresti A, Franklin C. Statistics: The Art and Science of Learning from Data. Upper Saddle River, NJ: Pearson; 2009
2. Norman GR, Streiner DL. Biostatistics: The Bare Essentials. 4th ed. Shelton (CT): PMPH-USA; 2014
3. Cohen J, Cohen P, West SG, Aiken LS. Applied Multiple Regression/Correlation Analysis for the Behavioral Sciences. New York: Routledge; 2013
4. Osborne J. Best practices in exploratory factor analysis: Four recommendations for getting the most from your analysis. Practical Assessment. 2019;10(7):1-9
5. Field A, Hole G. How To Design and Report Experiments Sage. The Tyranny of Evaluation Human Factors in Computing Systems CHI Fringe; 2003
6. Anders H. A History of Mathematical Statistics from 1750 to 1930. New York: Wiley; 1998. p. xvii+795. ISBN 0-471-17912-4
7. Rebecca M. Warner’s Applied Statistics: From Bivariate Through Multivariate Techniques. Second Edition. Thousand Oaks, California: SAGE Publications; 2012
8. Sullivan LM, Artino AR Jr. Analyzing and interpreting continuous data using ordinal regression. Journal of Graduate Medical Education. 2013;5(4):542-543
9. Hoaglin DC, Mosteller F, Tukey JW. Understanding Robust and Exploratory Data Analysis. John Wiley & Sons; 2011
10. Maxwell SE, Delaney HD, Kelley K. Designing Experiments and Analyzing Data: A Model Comparison Perspective. Routledge; 2017
11. De Leeuw ED, Hox JJ. International Handbook of Survey Methodology. Routledge; 2008
12. Chatfield C. The Analysis of Time Series: An Introduction. CRC Press; 2016
13. Tabachnick BG, Fidell LS. Using Multivariate Statistics. Pearson; 2013
14. Wickham H. ggplot2: Elegant Graphics for Data Analysis. Springer; 2016
15. Field A, Miles J, Field Z. Discovering Statistics Using R. Sage; 2012
16. Howell DC. Statistical Methods for Psychology. Cengage Learning; 2013
17. Wilcox RR. Modern Statistics for the Social and Behavioral Sciences: A Practical Introduction. CRC Press; 2017
18. Hair JF, Black WC, Babin BJ, Anderson RE. Multivariate Data Analysis. Pearson; 2019
19. Beasley TM, Schumacker RE. Multiple regression approach to analyzing contingency tables: Post hoc and planned comparison procedures. Journal of Experimental Education. 2013;81(3):310-312
20. Dodge Y. The Concise Encyclopedia of Statistics. Springer Science & Business Media; 2008
21. Krzywinski M, Altman N. Points of significance: Visualizing samples with box plots. Nature Methods. 2014;11(2):119-120
22. Cleveland WS. Visualizing data. Hobart Press; 1993
23. Little RJ, Rubin DB. Statistical Analysis with Missing Data. John Wiley & Sons; 2019
24. Filzmoser P, Maronna R, Werner M. Outlier identification in high dimensions. Computational Statistics & Data Analysis. 2008;52(3):1694-1711
25. Shmueli G, Bruce PC, Yahav I, Patel NR, Lichtendahl KC Jr, Desarbo WS. Data Mining for Business Analytics: Concepts, Techniques, and Applications in R. John Wiley & Sons; 2017
26. Aguinis H, Gottfredson RK. Statistical power analysis in HRM research. Organizational Research Methods. 2013;16(2):289-324
27. Stevens JP. Applied Multivariate Statistics for the Social Sciences. Routledge; 2012
28. Byrne BM. Structural Equation Modeling with AMOS: Basic Concepts, Applications, and Programming. Routledge; 2016
29. Everitt BS, Hothorn T. An Introduction to Applied Multivariate Analysis with R. Springer; 2011
30. Kosslyn SM. Graph Design for the Eye and Mind. Oxford University Press; 2006

[1] 1. Agresti A, Franklin C. Statistics: The Art and Science of Learning from Data. Upper Saddle River, NJ: Pearson; 2009

[2] 2. Norman GR, Streiner DL. Biostatistics: The Bare Essentials. 4th ed. Shelton (CT): PMPH-USA; 2014

[3] 3. Cohen J, Cohen P, West SG, Aiken LS. Applied Multiple Regression/Correlation Analysis for the Behavioral Sciences. New York: Routledge; 2013

[4] 4. Osborne J. Best practices in exploratory factor analysis: Four recommendations for getting the most from your analysis. Practical Assessment. 2019;10(7):1-9

[5] 5. Field A, Hole G. How To Design and Report Experiments Sage. The Tyranny of Evaluation Human Factors in Computing Systems CHI Fringe; 2003

[6] 6. Anders H. A History of Mathematical Statistics from 1750 to 1930. New York: Wiley; 1998. p. xvii+795. ISBN 0-471-17912-4

[7] 7. Rebecca M. Warner’s Applied Statistics: From Bivariate Through Multivariate Techniques. Second Edition. Thousand Oaks, California: SAGE Publications; 2012

[8] 8. Sullivan LM, Artino AR Jr. Analyzing and interpreting continuous data using ordinal regression. Journal of Graduate Medical Education. 2013;5(4):542-543

[9] 9. Hoaglin DC, Mosteller F, Tukey JW. Understanding Robust and Exploratory Data Analysis. John Wiley & Sons; 2011

[10] 10. Maxwell SE, Delaney HD, Kelley K. Designing Experiments and Analyzing Data: A Model Comparison Perspective. Routledge; 2017

[11] 11. De Leeuw ED, Hox JJ. International Handbook of Survey Methodology. Routledge; 2008

[12] 12. Chatfield C. The Analysis of Time Series: An Introduction. CRC Press; 2016

[13] 13. Tabachnick BG, Fidell LS. Using Multivariate Statistics. Pearson; 2013

[14] 14. Wickham H. ggplot2: Elegant Graphics for Data Analysis. Springer; 2016

[15] 15. Field A, Miles J, Field Z. Discovering Statistics Using R. Sage; 2012

[16] 16. Howell DC. Statistical Methods for Psychology. Cengage Learning; 2013

[17] 17. Wilcox RR. Modern Statistics for the Social and Behavioral Sciences: A Practical Introduction. CRC Press; 2017

[18] 18. Hair JF, Black WC, Babin BJ, Anderson RE. Multivariate Data Analysis. Pearson; 2019

[19] 19. Beasley TM, Schumacker RE. Multiple regression approach to analyzing contingency tables: Post hoc and planned comparison procedures. Journal of Experimental Education. 2013;81(3):310-312

[20] 20. Dodge Y. The Concise Encyclopedia of Statistics. Springer Science & Business Media; 2008

[21] 21. Krzywinski M, Altman N. Points of significance: Visualizing samples with box plots. Nature Methods. 2014;11(2):119-120

[22] 22. Cleveland WS. Visualizing data. Hobart Press; 1993

[23] 23. Little RJ, Rubin DB. Statistical Analysis with Missing Data. John Wiley & Sons; 2019

[24] 24. Filzmoser P, Maronna R, Werner M. Outlier identification in high dimensions. Computational Statistics & Data Analysis. 2008;52(3):1694-1711

[25] 25. Shmueli G, Bruce PC, Yahav I, Patel NR, Lichtendahl KC Jr, Desarbo WS. Data Mining for Business Analytics: Concepts, Techniques, and Applications in R. John Wiley & Sons; 2017

[26] 26. Aguinis H, Gottfredson RK. Statistical power analysis in HRM research. Organizational Research Methods. 2013;16(2):289-324

[27] 27. Stevens JP. Applied Multivariate Statistics for the Social Sciences. Routledge; 2012

[28] 28. Byrne BM. Structural Equation Modeling with AMOS: Basic Concepts, Applications, and Programming. Routledge; 2016

[29] 29. Everitt BS, Hothorn T. An Introduction to Applied Multivariate Analysis with R. Springer; 2011

[30] 30. Kosslyn SM. Graph Design for the Eye and Mind. Oxford University Press; 2006