A Quantitative Analysis of Big Data Analytics Capabilities and Supply Chain Management

Janine Zitianellis

doi:10.5772/intechopen.111473

Abstract

With the emergence of Big Data Technologies (BDT) and the growing application of Big Data Analytics (BDA), Supply Chain Management (SCM) researchers increasingly utilize BDA due to the opportunities that BDT and BDA present. Supply Chain (SC) data is inherently complex and results in an environment with high uncertainty, which presents a real challenge for SC decision-makers. This research study aimed to investigate and illustrate the application of BDA within the existing decision-making process. BDT allowed for the extraction and processing of SC data. BDA aided further understanding of SC inefficiencies and delivered valuable, actionable insights by validating the existence of the SC bullwhip phenomenon and its contributing factors. Furthermore, BDA enabled the pragmatic evaluation of linear and nonlinear regression SC relationships by applying machine learning techniques such as Principal Component Analysis (PCA) and multivariable regression analysis. Moreover, applying more sophisticated BDA time series and forecasting techniques such as SARIMAX, TBATS, and neural networks improved forecasting accuracy. Ultimately, the improved demand planning and forecast accuracy will reduce SC uncertainty and the effects of the observed SC bullwhip phenomenon, thus creating a competitive advantage for all the members within the SC value chain.

Keywords

big data analytics
supply chain management
bullwhip phenomenon
principal component analysis
regression analysis
demand planning and forecasting

Author Information

Janine Zitianellis*
- Monarch Business School Switzerland, Cape Town, South Africa

*Address all correspondence to: janine.zitianellis@umonarch-email.ch

1. Introduction

BDA presents an opportunity for precise and transparent information flow between crucial SC components such as procurement, inventory management, and demand planning and forecasting and encourages SC integration and collaboration to promote overall SC efficiency [1, 2].

SCM ultimately attempts to match varying supply and demand rates in the most cost-efficient manner. As one can imagine, the flow of materials and information through several organizations within the SC network is inherently and increasingly complex due to various SC channels and data nodes driven by factors such as market globalization and SC digitalization initiatives [3]. SC inefficiencies will manifest and contribute to the well-examined bullwhip phenomenon [4].

This research critically evaluated the relationship between BDA capabilities and SC performance within a single case study. Therefore, the primary focus of this research was to investigate and illustrate the application of BDA within the existing decision-making process to overcome technology constraints and challenges within the SC information flow and obtain valuable insights into the current market environment. This research aimed to better understand and measure the interaction between in-store sales and in-store stockholding as a dynamic market indicator aiding optimal demand planning and forecasting. The following research question addresses specific SC issues, such as reducing operational costs, risks, and the financial impacts associated with demand forecast inefficiencies and missed market opportunities:

“What BDA methods improve SC demand planning and forecasting in SMEs?”

Even though growing research into the application of BDA within SMEs is evident, there are limited case studies that illustrate the application of BDA reflecting measurable business value, such as a reduction in operational cost, risk, and the financial impact associated with demand forecast inefficiencies and missed market opportunities. Thus, this research project aimed to advance practical knowledge within the area of interest, applying various BDA techniques, increasing SC collaboration and communication, and ultimately improving SC efficiency by employing a single organization case study within the SME industry.

2. Big data, analytics, and supply chain management within the SME industry

2.1 Big data and analytics

The term “information explosion” appeared in 1941 and was followed by several publications between 1944 and 2000 reflecting on the magnitude and expected growth rate of data and information. The true origin of the term “big data” remains unclear. While many state that the term was popularized by computer scientist John R. Mashey, it is believed that the term was only officially coined in 2005 by Roger Mougalas and the O’Reilly Media group [5], describing big data as large datasets that cannot be processed or consumed through traditional business processes and analytical tools [6]. Gartner, a leading technology research and consulting firm, extends a more recent definition by incorporating critical characteristics of big data and describing big data as: “…high-volume, high-velocity and/or high-variety information assets that demand cost-effective, innovative forms of information processing that enable enhanced insight, decision making, and process automation” [7]. Several studies report on the characteristics of big data, often described as the five Vs of big data or big data dimensions, namely volume, variety, velocity, value, and veracity [8, 9, 10]. Volume and variety describe the data’s size, magnitude, and format, while velocity, value, and veracity summarize how closely the data is processed in real time, the business value the information generates, and the trust in the data quality. There have been several extensions to the big data Vs in recent years, including the addition of variability. Variability considers the change in data structures and, more importantly, the pace, frequency, and extent to which those data structures change. Understanding the variability of one’s data, supported by a high level of data veracity, allows for efficient planning of available resources and processes to ensure minimal disruption to the organization’s decision-making process.

Big data (BD) coupled with analytics, defined as BDA, presents scalable and cost-effective opportunities to process and integrate different data structures, enabling the extraction of valuable insights from high-volume and varied structured and unstructured data. Kitchin [11] examined the impact of BDA on established methodologies and how big data acts as a disruptive innovation while permitting new and more efficient analytical methods to emerge. Technical skills are vital in extracting information and insights from that data effectively. Those analytical efforts also rely on clearly articulated business objectives and the evaluation of results by subject matter experts.

The insights drawn allow organizations to respond to rapidly developing environments and attain a competitive advantage on tactical and strategic levels [1, 9]. An empirical study by [9] argues that organizations require agility to reconfigure operations to respond to BDA insights delivered and generate actionable insights to obtain optimal business value from BDA initiatives. The author observes that even though organizations within Norway are progressive in adopting information and technology advances, only some organizations observe the complete business benefit of their BDA investment. Surprisingly, the study reports a low 10.8% of companies with more than 4 years of BDA experience, considering a notable increase in BDA trends over the past decade [9]. Findings in a study by [1] suggest organizations still lack an understanding of the required BDA capabilities to enable BDA as a strategic driver. The study emphasizes that the success of BDA initiatives within an organization is dependent on a composite factor of the allocation of time and financial resources, a supportive BDA infrastructure, and leadership commitment to driving a data-driven culture.

2.2 Big data analytics within supply chain management

BDA presents an opportunity for precise and transparent information flow between crucial SC components such as procurement and inventory management, encouraging SC integration and collaboration [1, 2]. Furthermore, the computational efficiency of BDA accommodates the increasing complexity of SC data, driven by factors such as market globalization, increased market competition, and SC digitalization initiatives [3]. Thus, BDA allows for more accurate demand planning and forecasting underpinning SC inventory management decisions to promote overall SC efficiency.

SCM was defined by [12] as the alignment and horizontal integration of organization, supplier, and customer processes that tie customer demand with capital, materials, services, and information. The definition is consistent with the description provided by [13], who described SCM as the interconnection between organizations through processes to satisfy consumer demand. While excess stock appears to be the conventional approach to managing supply and demand rate fluctuations, holding excess stock has several disadvantages, including the increased risk of stock losses through theft or damages. Fisher [14] expresses the importance of implementing SC policies to ensure downstream stock levels are kept at a minimum, driving increased throughput rate and reducing the working capital otherwise engaged in stock.

The early works of Professor Marshall Fisher stressed the critical importance of prompt information sharing, enabled by innovative technologies and methods, allowing SC members the opportunity to adjust lead times and efficiently respond to changes in market demand. Still, demand forecasting necessitates accurate SC data and the sharing of appropriate SC information by all the members within the SC [2, 15].

To fully appreciate the complexity of SC demand planning and forecasting, one must understand the focal point of demand and supply: inventory management. The core commodity within the SC is inventory, alternatively termed stock. Slack et al. [13] describe stock “as the accumulation of material resources within a transformation system.” Conjointly, inventory planning aims to ensure an optimal inventory level, decreasing the risk and cost of both stock-outs and excessive stock, and to respond efficiently to changes in customer demand [13].

SC decision-makers support these objectives by implementing the appropriate inventory management policies and executing replenishment strategies informed by forecasted demand.

The accuracy of the forecasted demand is crucial in support of the following critical decisions:

The volume decision: What and how many stock items to order?
The timing decision: When to place the stock order?
The inventory analysis and control decision: What policies and procedures support the decision-making process?

The efficiency of the inventory decision-making process significantly influences the bullwhip effect. The bullwhip effect describes the consequences of minor interruptions within the demand side of the SC, escalating into significant disruptions, such as amplified demand creating variability in replenishment orders moving up the SC [16, 17].

A rigorous evaluation by [4] examines the bullwhip effect as a clear indication of SC inefficiency. Several studies have postulated a convergence between demand signal processing, lead times, order batching, shortage gaming and rationing, price fluctuations, and behavioral causes as factors that cause SC disruptions and contribute significantly to the bullwhip effect. These factors are directly associated with the three major SC inventory decisions [4]. Authors [13, 14] suggest that an abnormal rise in SC costs is evidence of deterioration in several supply chains due to self-serving SC relationships and poor and unnecessary price promotion practices causing SC disruptions, increasing SC uncertainty, and leading to inadequate SC performance.

BDA can facilitate SC management by delivering valuable insights and improving demand forecasting. Studies [1, 3] discussed a notable increase in the literature between 2015 and 2019, particularly research that applied more sophisticated BDA techniques in demand forecasting to improve accuracy. The authors illustrate the deployment of more sophisticated machine learning methods, including but not limited to neural networks, regression, ARIMA, Support Vector Machine (SVM), and decision trees. These methods address the drawbacks of conventional time series techniques, such as the high reliance on domain knowledge and the inability to incorporate external factors and compute complex non-linear customer demand behavior and relationships. They are believed to outperform conventional methods [3].

The computational efficiency of BDA accommodates the increasing complexity of SC data, allowing for more accurate forecasting and predictions underpinning SC inventory decisions. Thus, improved forecast accuracy can lessen the effects of the observed SC bullwhip phenomenon, reflecting measurable business value, including but not limited to a reduction in operational cost, risk, and the financial impact associated with demand forecast inefficiencies and missed market opportunities.

2.3 Big data analytics within the SME industry

BDA presents an opportunity for SMEs that do not have the resources to invest in costly systems and data analytics infrastructures to leverage the same technologies and capabilities as their larger counterparts [8]. For example, SMEs can utilize BDA techniques to aid and improve essential SC functions such as procurement, inventory management, and demand planning and forecasting.

However, despite the SME industry being universally acknowledged as a crucial sector, much of the literature focuses on the benefits to larger organizations, demonstrating a gap in understanding BDA within the SME industry. A study by [18] revealed that 32.8% of South African SMEs considered the rapidly changing technological environment a critical challenge, with 28.24% concerned with the high cost associated with information technology. These challenges faced by SMEs influenced the implementation of strategic initiatives to reduce operational costs, increase profits, and create a competitive advantage. Similarly, [19] expressed concerns about the slow adoption and integration of technology and innovation within the SME sector and drew attention to little progress in promoting awareness of BDA opportunities. Thus, much uncertainty still exists about the adoption and relevance of BDA within the SME industry.

3. Research methodology

3.1 Design

The main research objective is to advance practical knowledge of an identified research problem, namely reducing SC uncertainty and improving demand planning and forecasting, mainly through applying BDA, a developing field within the SME industry. Due to the complexity surrounding SC processes and data, this research adopted a pragmatic position, and the research premise is built on existing theory that is accepted as valid. Subsequently, the research is underpinned by a deductive research approach, which involves testing causal relationships between two or more concepts or variables set out in a series of hypotheses within the boundaries and conditions of the theory. Author [20] further describes the deductive research approach as a well-known approach within social research, deducing and subjecting the hypothesis to empirical examination. In simpler terms, deductive research flows from theory to data. The use of a quantitative design inherently increases the possibility of research generalizability.

Deploying a single case study research strategy guided the necessary research actions in a structured and linear fashion and promoted coherence throughout the research project. The research’s descriptive and explanatory nature allowed for deeper insights and knowledge into the role of BDA in improving demand planning and forecasting, which required a longitudinal design to explore the relationship between consumer demand and supply within brick-and-mortar stores.

3.2 Techniques and procedures

The BDA component of this research utilized the CRISP-DM framework [21]. CRISP-DM is a cross-industry standard process for data mining, developed initially in 1996 to structure data mining projects. Today, the methodology is still relevant and successfully applied within the BDA and data science field [22]. With a strong emphasis on business understanding and the constant alignment with business objectives throughout the process, several activities transformed the data into actionable insights to better understand the research problem and support the research objectives. The chosen analysis types, namely descriptive, diagnostic, and predictive analysis, are driven by the nature of the research and are closely aligned with the research question. The descriptive analysis was conducted at a bivariate level to explore any relationship between the measured variables and determine the magnitude or impact of a change in one variable on the other, most often measured by the change in the respective mean values. Understanding the relationship is achievable through correlation analysis and contingency tables. However, it is crucial to note that bivariate analysis measures the relationship and draws attention to the effect of a change in one variable on the other, but by no means implies causality [20].

Throughout this research analysis, the widely used Pearson correlation method was adopted to understand the strength and significance of the linear relationship relative to the unit of analysis [23]. A drawback of the Pearson correlation is the requirement for normally distributed data, as it is a parametric method. Thus, to further support initial correlation findings, non-parametric methods such as Kendall’s tau_b and Spearman’s rank (rho) were deployed to achieve correlation synthesis. Furthermore, adjustments to the interpretation of the correlation scores were necessary, considering the nature of slow-moving non-food products with sparse data [24]. Thus, any variable reflecting a Pearson and Spearman’s (rho) correlation score greater than 0.2 was considered a fair association.
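The three correlation measures named above can be sketched in a few lines. The study itself worked in RStudio, so the Python below is only an illustration: the function names, the sample figures, and the use of the absolute value when applying the 0.2 threshold are assumptions, not part of the original analysis.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation (parametric, linear association)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def kendall_tau_b(x, y):
    """Kendall's tau_b, which corrects for ties in either variable."""
    concordant = discordant = ties_x = ties_y = 0
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # tied in both variables: excluded from all counts
            if dx == 0:
                ties_x += 1
            elif dy == 0:
                ties_y += 1
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    pairs = concordant + discordant
    return (concordant - discordant) / math.sqrt((pairs + ties_x) * (pairs + ties_y))


FAIR_ASSOCIATION = 0.2  # threshold stated in the text

# Hypothetical weekly figures for one store and SKU (illustrative only).
stock = [2, 4, 4, 6, 8, 10, 12]
sales = [1, 1, 2, 2, 3, 3, 5]

for name, score in (("pearson", pearson(stock, sales)),
                    ("spearman", spearman(stock, sales)),
                    ("kendall_tau_b", kendall_tau_b(stock, sales))):
    # Using the absolute value here is an assumption about how the threshold is applied.
    print(name, round(score, 3), "fair" if abs(score) > FAIR_ASSOCIATION else "weak")
```

Comparing the parametric score with the two rank-based scores is what allows the findings to be cross-checked when the normality assumption is in doubt.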

A series of machine learning techniques, namely multivariable linear regression and regression trees, and a variable reduction technique, namely principal component analysis (PCA), were explored to pursue a further in-depth analysis and address this research’s diagnostic nature. As Jim [23] states, regression analysis applies various statistical processes to understand the nature of the relationship between dependent and independent variables. Therefore, regression analysis is used to further analyze the relationship between supplier-retailer and retailer-consumer demand variance and the bullwhip phenomenon. The regression equation estimates the relationship and identifies factors of importance that influence the bullwhip measure. In addition, multivariable regression allows for more than one independent variable in the model, whereby each variable has an additive contribution toward the change in the dependent variable [25]. However, a known limitation of regression methods is the sensitivity to highly correlated independent variables, known as multicollinearity. Consequently, multicollinearity introduces data redundancy, impacts the statistical significance of variables, and reduces the precision of estimated model coefficients [26]. A solution that exploits multicollinearity was therefore sought using techniques such as PCA. According to James et al. [25], PCA compares common variation between variables and generates a series of uncorrelated, linear combinations or indexes known as components that collectively explain the most significant proportion of variance within the dataset. Furthermore, it can reduce known regression limitations, such as overfitting, if the PCA assumptions available in Appendix E hold true [27]. Finally, the predictive nature of the research deployed various time series and forecasting techniques performed within RStudio statistical software. Analysis results were visualized within the Tableau tool stack.
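For reference, the additive structure of a multivariable linear regression described above is conventionally written as follows. The symbols are generic textbook notation introduced here for illustration and do not appear in the original study.

```latex
y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p + \varepsilon
```

Here $y$ is the dependent variable (in this study, the bullwhip measure), $x_1, \dots, x_p$ are the independent variables, each coefficient $\beta_j$ is the expected change in $y$ for a one-unit change in $x_j$ with the other variables held fixed, and $\varepsilon$ is the residual error. When two or more $x_j$ are highly correlated, the individual $\beta_j$ become imprecise, which is the multicollinearity problem noted above.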

3.3 Ethical considerations

Data protection and ethical considerations within a B2B context mainly focus on compliance with the retailer’s data storage and usage policies and procedures. It was of utmost importance that the B2B portal data and in-store observations were processed securely, and usage was access controlled, treating any sensitive retailer data as “personal” data. Furthermore, the same principles and consent were considered for the B2B data in compliance with the General Data Protection Regulation (GDPR) framework [28]. The extracted data were stored in a CSV format on a secure designated Google Drive for collection and analysis. The necessary procedures and activities were carried out to anonymize the data.

4. Case study: South African SME supplier of slow-moving consumer goods

The South African SME, Zeus Africa, competes as a manufacturer of polymer (HDPE) ride-on toy bikes within the South African toy industry. Industry expert [29] describes the sector as a challenging yet promising market. Thus, harnessing innovative technological capabilities to improve demand planning and forecasting accuracy and support effective decision-making is critical for the organization’s survival and competitive advantage in current market conditions. A key consideration is the information flow and capability of the chosen organization to integrate their B2B data and observational store data into the existing decision-making process to derive valuable consumer insights and reduce supply chain uncertainty.

4.1 Data description

The case study involves identifying the problem and its relevant objectives and hypotheses. The critical sources employed to generate the required information are the following:

4.1.1 Primary internal quantitative financial data

The data extracted from the organization’s primary financial system consist of sales orders spanning over 3 years, from August 2018 to October 2021. The data include transaction-level daily sales orders of seven unique SKUs delivered to a single retailer.

4.1.2 Internal secondary quantitative B2B data

The data extracted from the organization’s B2B portal consist of weekly aggregated sales and stock holding data for brick-and-mortar stores, spanning from August 2018 to October 2021.

4.1.3 Integrated data containing supplier sales orders and store sales and inventory data

The analysis was supported by an additional integrated dataset, namely “B2B store replenishment,” containing supplier sales orders and sales and inventory data at the store level. This dataset was aggregated on a store and SKU ID level.

The data was needed to understand the relationship between supplier-retailer and retailer-consumer demand and store stock levels. Sales levels are assumed to be sensitive to in-store stockholding, and the unit of analysis for this study was the relationship between these components within the retail brick-and-mortar stores, considering the following demand scenarios:

Supplier-retail sales orders to adequately meet retailer-consumer demand.
Retailer-consumer demand when stock is available.
Retailer-consumer demand when there is insufficient stock or no stock is available.

These units contribute to an understanding of historical patterns and trends of consumer demand relative to demand planning over time. Moreover, they support an understanding of SC uncertainty by measuring the influence of stock supply on consumer demand and the overall impact on upstream SC demand. In addition, exploring the variance between supplier-retailer and retailer-consumer demand substantiates the existence of the bullwhip phenomenon.

4.2 Descriptive analysis and discussion

The descriptive analysis aims to identify critical SKUs through the application of bivariate techniques and establish if sufficient evidence within supplier-retailer and retailer-consumer demand variance suggests the bullwhip effect.

4.2.1 Evidence of the bullwhip phenomenon

The analysis considered the standard deviation (SD) values of the supplier sales order quantity, the store stock replenishment, and sales quantity demand as the unit of analysis to validate the existence of the bullwhip effect. A higher SD is observed for the sales order quantity than for store stock replenishment and sales quantity demand, thus demonstrating a higher variance in supplier-retailer demand than in retailer-consumer demand. Subsequently, the bullwhip ratio was derived by adapting the metric used by [4]. The bullwhip at the supplier-retailer level was determined by measuring the variance ratio between supplier sales order quantity and the store stock replenishment; see Equation (1).

Zeus Africa supplier bullwhip ratio:

$$\text{Supplier bullwhip ratio}_{sku} = \frac{\mathrm{std.dev}(\text{SalesOrderQuantity})}{\mathrm{std.dev}(\text{ReplenishmentStockIn})} \tag{1}$$

The bullwhip at the retailer-consumer level was determined by measuring the variance ratio between store stock replenishment and the sales quantity demand; see Equation (2).

Zeus Africa retailer bullwhip ratio:

$$\text{Retailer bullwhip ratio}_{sku} = \frac{\mathrm{std.dev}(\text{ReplenishmentStockIn})}{\mathrm{std.dev}(\text{SalesQtyDemand})} \tag{2}$$

Interpretation of the bullwhip ratio is relatively simple. A bullwhip ratio of one indicates an equal variance between demand and supply. Therefore, no upstream SC demand amplification is evident. Conversely, a bullwhip ratio of less than one indicates that supply is less variable than demand. Finally, a bullwhip ratio greater than one indicates amplified demand variability. Figure 1 represents the bullwhip ratio obtained for supplier-retailer demand and the retailer-consumer demand for each SKU.

Figure 1.
Zeus Africa supplier and retailer bullwhip ratio.

Inspection of supplier-retailer demand in Figure 1 revealed sufficient evidence of demand amplification for all SKUs except SKU ID BZ030, indicating that the consumer demand variance translated into amplified sales order quantity variance. The exception is arguably due to that SKU’s short time in the market. The descriptive analysis yielded sufficient evidence to validate the bullwhip effect, leading to a further in-depth analysis of the factors driving the bullwhip phenomenon, discussed in the upcoming Section 4.3.
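As a minimal sketch, the supplier and retailer bullwhip ratios and their interpretation can be computed as shown here. The weekly quantities are hypothetical, the function names are illustrative, and the use of the sample (rather than population) standard deviation is an assumption, since the text does not state which was used.

```python
from statistics import stdev


def bullwhip_ratio(upstream, downstream):
    """Ratio of the standard deviation of the upstream series to the downstream series.

    Assumption: sample standard deviation. The ratio is undefined when the
    downstream series has zero variability.
    """
    return stdev(upstream) / stdev(downstream)


def interpret(ratio, tolerance=1e-9):
    """Map a bullwhip ratio to the interpretation given in the text."""
    if abs(ratio - 1.0) <= tolerance:
        return "equal variance, no amplification"
    if ratio > 1.0:
        return "amplified demand variability"
    return "supply less variable than demand"


# Hypothetical weekly quantities for a single SKU (illustrative only).
sales_order_quantity = [0, 120, 0, 0, 150, 0, 0, 90]    # supplier to retailer
replenishment_stock_in = [30, 50, 40, 45, 55, 40, 35, 45]  # retailer to stores
sales_qty_demand = [38, 42, 40, 44, 41, 39, 43, 40]      # stores to consumers

supplier_ratio = bullwhip_ratio(sales_order_quantity, replenishment_stock_in)
retailer_ratio = bullwhip_ratio(replenishment_stock_in, sales_qty_demand)

print("supplier:", round(supplier_ratio, 2), interpret(supplier_ratio))
print("retailer:", round(retailer_ratio, 2), interpret(retailer_ratio))
```

In practice the same calculation would be repeated per SKU, or per store and SKU, by grouping the data before calling `bullwhip_ratio`.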

4.3 Diagnostic analysis and discussion

The objective of the diagnostic analysis within the research context was to validate the driving factors contributing to the observed bullwhip phenomenon attributed to the supplier-retailer and retailer-consumer demand variance through regression analysis techniques. In addition, the diagnostic and predictive analysis techniques employed a stratified sampling method to support the research study’s validity and reliability, and the sample outcome is available in Appendix A.

4.3.1 Regression analysis

The preceding Section 2.2 highlights demand signal processing, lead times, order batching, behavioral causes, shortage gaming and rationing, and price fluctuations as major contributing factors to the bullwhip phenomenon. The analysis considered the calculated store bullwhip ratio presented in Equation (3) as the dependent variable and variables associated with store sales quantity demand, stock levels, and stock replenishment variance as the independent variables.

Zeus Africa retailer internal (store) bullwhip ratio:

$$\text{Store bullwhip ratio}_{store,\,sku} = \frac{\mathrm{std.dev}(\text{ReplenishmentStockIn})}{\mathrm{std.dev}(\text{SalesQtyDemand})} \tag{3}$$

The regression analysis further analyzes the relationship between supplier-retailer and retailer-consumer demand variance and the bullwhip phenomenon. The regression equation estimates the relationship and identifies factors of importance that influence the bullwhip measure. The discussion of regression analysis in this research project focuses on the business benefit, highlighting the impact and the degree of change in the bullwhip measure due to demand variability and SC uncertainty.

An exploratory analysis of the stores’ bullwhip ratio indicates that more than 50% of the store and SKU combinations recorded notably high bullwhip ratios, implying significant variances between the store stock replenishment and sales quantity demand. Furthermore, Figure 2 revealed a much higher mean store bullwhip ratio than the national bullwhip ratio. Thus, a national bullwhip ratio potentially conceals stores recording excessive bullwhip ratios and subsequently overlooks SC inefficiencies.

Figure 2.
Zeus Africa retailer store bullwhip ratio.

The strength and significance of the linear relationship between the bullwhip ratio and the variables mapped to the contributing factors were determined by applying the correlation methods set out in Section 3.2. However, to satisfy the Pearson correlation assumption of normality, a logarithmic transformation (log10) of the bullwhip ratio was necessary to reduce skewness and achieve a near-normal distribution [30], represented in Figure 3.
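The effect of the log10 transformation on skewness can be illustrated with a short sketch. The bullwhip ratios below are hypothetical, and the moment-based skewness formula is one common choice, assumed here because the text does not state how skewness was measured.

```python
import math


def skewness(values):
    """Moment-based skewness: third central moment divided by the cubed
    population standard deviation. Zero for a symmetric distribution."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical right-skewed store bullwhip ratios (illustrative only).
# log10 is defined only for strictly positive ratios.
ratios = [0.8, 1.1, 1.3, 1.6, 2.0, 2.4, 3.5, 5.0, 9.0, 20.0]
log_ratios = [math.log10(r) for r in ratios]

print("skewness before:", round(skewness(ratios), 2))
print("skewness after: ", round(skewness(log_ratios), 2))
```

A useful side effect of the transformation is that a ratio of one maps to zero, so positive log values indicate amplification and negative values indicate dampening.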

Figure 3.
Zeus Africa logarithmic transformation (log10) of the bullwhip ratio.

The bullwhip ratio correlation coefficients and covariance interpretation are limited to the variables satisfying the condition available in Appendix B. Specifically, the analysis reveals that high stock levels over long periods are associated with an increase in the store log-transformed bullwhip ratio versus higher stock turn ratios. Reducing the overall time that SKUs are stocked in a store is associated with a decrease in the store’s log-transformed bullwhip ratio.

Furthermore, the correlation analysis highlighted the existence of multicollinearity within the independent variables, available in Appendix C. Multicollinearity introduces data redundancy, impacts the statistical significance of variables, and reduces the precision of estimated model coefficients [26]. However, principal component analysis (PCA) exploits multicollinearity, compares common variations between variables, and generates a series of linear combinations or indexes known as components [27].

4.3.2 PCA analysis

According to [25], PCA produces uncorrelated components composed of optimal linear combinations of variables that collectively explain the most significant proportion of variance within the dataset. Furthermore, it can reduce known regression limitations such as overfitting, if the PCA assumptions hold true. Each principal component (PC) represents a proportion of variance explained (PVE). The first PC often carries the most significant PVE value, and the objective is to include as few PCs as possible in the regression model while explaining the most cumulative variance. Author [25] highlights that while there is no single approach to determine the optimal number of PCs, an intuitive inspection of the scree plots detailing each PC’s PVE and the cumulative PVE would highlight the optimal number of PCs.

A review of the scree plots available in Appendix E showed that four PCs cumulatively explain 79% of the variance within the dataset. In addition, the model accuracy was assessed by evaluating the root mean square of residuals (RMSR). The PCA model yielded an RMSR value of 0.056, which is on the cusp of the acceptable threshold value of 0.05 [31].

An overview of each component generated and the associated variable loadings is presented in Table 1. The loading represents the variable coefficient for each generated component, indicating the strength of the association with the component. Strong positive variable loadings provide substantial confirmation that the underlying variables and encoded PCs can be explicitly associated with the contributing factors to the bullwhip phenomenon.

Variable loadings	PCA (PC1) - Price, Promotion fluctuations	PCA (PC2) - Order batching	PCA (PC3) - Demand signal processing	PCA (PC4) - Lead time
Total demand (sales quantity)	0.88
Total stock replenishment	0.90
Total promo indicator	0.78
Rate of sale (ros)	0.91
Mean supply (stock quantity balance)	0.79
Total store count excess stock ind		0.77
Total month end indicator		0.97
Weeks total trading		0.97
Total store count lows ind			0.77
Store count lows ind ratio			0.76
Store stock turn ratio			0.70
Std. store replenishment				0.82
Mean weeks of stock				0.60
Total lead time				0.87
Distance from dc			0.48
SS loadings	4.16	3	2.58	2.08
Proportion variance	0.28	0.20	0.17	0.14
Cumulative variance	0.28	0.48	0.65	0.79

Table 1.

Zeus Africa PCA variable loadings.
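The proportion and cumulative variance rows of Table 1 can be reproduced from the SS loadings. The sketch below assumes that 15 standardized variables entered the PCA (the number of variable rows in Table 1), so that each component's PVE is its SS loading divided by 15. The function names are illustrative and not taken from the study.

```python
# SS loadings per component, taken from Table 1.
ss_loadings = [4.16, 3.00, 2.58, 2.08]

# Assumption: 15 standardized variables entered the PCA (the rows of Table 1),
# so the total variance to be explained equals 15.
n_variables = 15


def proportion_variance(ss, n_vars):
    """Return the PVE of each component: SS loading / number of variables."""
    return [value / n_vars for value in ss]


def cumulative(values):
    """Return the running total of a list of proportions."""
    totals, running = [], 0.0
    for value in values:
        running += value
        totals.append(running)
    return totals


pve = proportion_variance(ss_loadings, n_variables)
cum_pve = cumulative(pve)

print([round(p, 2) for p in pve])      # proportion variance per PC
print([round(c, 2) for c in cum_pve])  # cumulative variance
```

Under this assumption the rounded values match the "Proportion variance" and "Cumulative variance" rows of Table 1, with the four PCs together reaching 0.79.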

5. Components explain 79% of the variance

5.1 Regression analysis training and testing sets

This allows for further in-depth analysis of the store’s bullwhip ratio through the application of regression techniques, which begins with creating training and testing datasets. The “B2B store replenishment” dataset was stratified on SKU ID and divided into training and testing datasets according to a 70/30 split, detailed in Appendix D. The training data train the various algorithms, while the testing data support an unbiased evaluation of the model’s accuracy and predictive power [25]. The fair representation of SKU ID between the training and testing sets was verified to ensure consistency.
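A stratified 70/30 split of this kind can be sketched as follows. This is a minimal illustration using only the Python standard library; the record layout, the function name `stratified_split`, and the sample rows are hypothetical and do not reproduce the study's own tooling.

```python
import random
from collections import defaultdict


def stratified_split(rows, key, train_fraction=0.7, seed=42):
    """Split rows into train/test so each stratum keeps roughly the same share."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)

    train, test = [], []
    for _, members in sorted(groups.items()):
        members = members[:]  # copy so the caller's data is not shuffled
        rng.shuffle(members)
        cut = round(len(members) * train_fraction)
        train.extend(members[:cut])
        test.extend(members[cut:])
    return train, test


# Hypothetical records: 100 store rows for each of three SKU IDs.
rows = [
    {"sku_id": sku, "store": i}
    for sku in ("BZ001", "BZ004", "BZ009")
    for i in range(100)
]
train, test = stratified_split(rows, key="sku_id")
print(len(train), len(test))
```

Because the split is made within each SKU ID, the share of each SKU is the same in both sets, which is the property verified in Appendix D.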

5.1.1 Multivariable regression analysis

Multivariable linear regression is a parametric assessment method that makes certain assumptions about underlying data for analysis [32]. Table 2 represents a summary of the model assumptions and their outcomes.

Assumption	Condition satisfied
Assumption 1: The regression model is linear in parameters.	True
Assumption 2: The mean of residuals is zero or close to zero.	True
Assumption 3: Homoscedasticity of residuals or equal variance.	True
Assumption 4: No autocorrelation of residuals.	False
Assumption 5: The number of observations must be greater than the number of Xs.	True
Assumption 6: No perfect multicollinearity.	True
Assumption 7: Normality of residuals.	True

Table 2.

Bullwhip regression model assumptions.

While all other conditions were met, the autocorrelation in the residuals (Assumption 4) was a source of uncertainty. The non-randomness within the error terms indicates an underlying pattern within the store bullwhip ratio whose driving factors are missing from the current data. Alternatively, the model requires adjustment. An adjustment to a non-linear, third-degree polynomial relationship was therefore tested. Polynomial relationships describe a curvilinear relationship in which the dependent variable increases with each increase in the independent variable until it reaches a point where subsequent increases result in a decrease. With an adjusted R square value of 0.32, RMSE value of 0.72, and SI value of 0.50, the polynomial model did not improve on the linear model and was subsequently discarded.
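One common way to quantify residual autocorrelation is the Durbin-Watson statistic. The source does not state which test was applied, so the sketch below is an illustration only; the function name and the sample residuals are hypothetical. Values near 2 suggest no first-order autocorrelation, while values toward 0 or 4 suggest positive or negative autocorrelation, respectively.

```python
def durbin_watson(residuals):
    """Return the Durbin-Watson statistic for a sequence of regression residuals."""
    squared_diffs = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    squared_resid = sum(e ** 2 for e in residuals)
    return squared_diffs / squared_resid


# Hypothetical residuals that drift slowly, i.e. are positively autocorrelated.
drifting = [0.5, 0.6, 0.4, 0.5, -0.4, -0.5, -0.6, -0.5]
print(round(durbin_watson(drifting), 2))
```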

5.1.2 Regression model coefficients and interpretations

The model coefficient describes the expected impact on the dependent variable and the relationship between the dependent and independent variables. Figure 4 illustrates the relationship between the log-transformed bullwhip measure and each PC at a store and SKU level, underpinned by the regression coefficient output and the relevant business interpretation detailed in Appendix E. Inspection of Figure 4 revealed that SKU ID BZ001 is more prone to outliers across all PCs. Furthermore, SKU ID BZ030 is visibly isolated when reviewing the relationship between the log-transformed bullwhip ratio and PCA (PC2) order batching. Keeping in mind the variables associated with PCA (PC2), namely “total trading weeks” and “total store count excess stock,” a possible explanation for the results is the short period that SKU ID BZ030 has been in the market, which aligns with the correlation analysis finding that the bullwhip ratio tends to decrease over time.

Figure 4.
Zeus Africa log-transformed bullwhip ratio and PCs relationship.

5.1.3 Regression model evaluation and accuracy

The final model was assessed at a significance level of α = 0.05. Appendix F details each assessment measure, the applicable hypothesis test, and the business-relevant interpretation. The model was applied to the test data and to the complete dataset to determine how well it generalizes to new data, using the computed RMSE and SI as performance indicators for each dataset. The test data reported a lower RMSE of 0.07, indicating no overfitting, and a lower SI of 0.05, which is well within the accepted threshold value of 1. Applying the model to the complete dataset yielded a slightly increased RMSE value of 0.18. Notwithstanding, the observed SI value of 0.54 remains within the threshold. The accuracy of the predictions was therefore accepted, and it was concluded that the model generalizes well to new data [25]. While not all model assumptions were satisfied, the adjusted R squared of 0.31, albeit low, is acceptable within the context of this research project considering the complexity of these interlinked bullwhip attributes. The model prediction performance measures, namely SE, RMSE, and SI, are considered satisfactory, and the overall model is significant at a p-value of <0.05.

5.2 Predictive analysis and discussion

The objective of predictive analysis is to predict the likelihood of future events. The primary focus was on improving demand planning and forecast accuracy. Within the context of this research project, this meant applying BDA that incorporates dynamic market demand signals sourced from the retailer B2B data into the Zeus Africa decision-making process.

5.2.1 Time series analysis for demand planning and forecasting

Time series analysis involves analyzing historical data before forecasting. Time series methodologies are classed as the mining of complex data types, yet a time series can be described simply as a sequence of ordered events expressed numerically, such as customer demand recorded at equal time intervals [33]. Forecasting customer demand involves transforming the time component into an independent variable and estimating future demand based on observed historical demand and current demand signals.

In a recent review of predictive BDA for supply chain forecasting, authors [3] highlight a growing trend and increase in the application of BDA techniques such as but not limited to Neural Networks, regression, Arima, Support vector machine (SVM), and decision trees within the area of SC demand forecasting. The application of these methods addresses the drawbacks of conventional time series techniques, such as the high reliance on domain knowledge and the inability to incorporate external factors and compute complex non-linear customer demand behavior and relationships.

Seeking evidence of improved SC demand forecasting through BDA techniques, the research employed and grouped five well-established time series methods [34]. The time series techniques were grouped into two categories: (a) conventional time series techniques, namely Holt-Winters and Arima, and (b) BDT-enabled time series techniques, namely Sarimax, Tbats, and Neural Networks. These techniques are briefly described in Appendix B. The forecast accuracy of each of the minimum viable time series models employed was benchmarked using several measures described in Appendix C, followed by a business-relevant interpretation.

5.2.2 Time series models and forecasts

The key variables identified to analyze and forecast supplier-retailer demand were the dependent variable, namely sales order quantity, and the independent variable, namely the partition period representing the “time” element of sales orders delivered to the retailer DC during the analysis period. Periods of non-delivery were interpreted as missing values in the analysis. Missing values can potentially introduce model bias and decrease model performance. Some time series models, namely Arima and Neural Networks, can accommodate missing values. However, missing values are problematic for time series regression and Tbats techniques. Consequently, missing values were imputed by interpolation [35]. A cubic spline interpolation was employed to accommodate the non-linear relationship between supplier-retailer demand, retailer-consumer demand, and stock supply [36].
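Imputing missing periods with a cubic spline can be sketched as below. The study does not state which boundary conditions or software routine were used, so this example assumes a natural cubic spline (zero second derivative at both ends) implemented from scratch; the function names and the weekly order quantities are hypothetical.

```python
from bisect import bisect_right


def natural_cubic_spline(xs, ys):
    """Return a function evaluating the natural cubic spline through (xs, ys)."""
    n = len(xs) - 1
    h = [xs[i + 1] - xs[i] for i in range(n)]
    m = [0.0] * (n + 1)  # second derivatives at the knots; the end values stay 0

    if n >= 2:
        # Tridiagonal system for the interior second derivatives.
        diag = [2.0 * (h[k] + h[k + 1]) for k in range(n - 1)]
        rhs = [
            6.0 * ((ys[k + 2] - ys[k + 1]) / h[k + 1] - (ys[k + 1] - ys[k]) / h[k])
            for k in range(n - 1)
        ]
        # Forward elimination (Thomas algorithm).
        for k in range(1, n - 1):
            w = h[k] / diag[k - 1]
            diag[k] -= w * h[k]
            rhs[k] -= w * rhs[k - 1]
        # Back substitution.
        interior = [0.0] * (n - 1)
        interior[-1] = rhs[-1] / diag[-1]
        for k in range(n - 3, -1, -1):
            interior[k] = (rhs[k] - h[k + 1] * interior[k + 1]) / diag[k]
        m[1:n] = interior

    def evaluate(x):
        # Locate the interval containing x (clamped to the first or last one).
        i = min(max(bisect_right(xs, x) - 1, 0), n - 1)
        a, b = xs[i + 1] - x, x - xs[i]
        return (
            (m[i] * a ** 3 + m[i + 1] * b ** 3) / (6.0 * h[i])
            + (ys[i] / h[i] - m[i] * h[i] / 6.0) * a
            + (ys[i + 1] / h[i] - m[i + 1] * h[i] / 6.0) * b
        )

    return evaluate


def fill_missing(series):
    """Replace None entries with spline estimates based on the observed points."""
    known_x = [i for i, v in enumerate(series) if v is not None]
    known_y = [series[i] for i in known_x]
    spline = natural_cubic_spline(known_x, known_y)
    return [v if v is not None else spline(i) for i, v in enumerate(series)]


# Hypothetical weekly sales order quantities; None marks a week without delivery.
weekly_orders = [120.0, None, 150.0, 170.0, None, None, 160.0, 140.0]
filled = fill_missing(weekly_orders)
print([round(v, 1) for v in filled])
```

Observed values are left unchanged and only the gaps are estimated. Gaps at the very start or end of a series would be extrapolated by this sketch, which is usually less reliable than interpolation between observed points.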

Additional independent variables were identified and derived from the retailer B2B data, namely (a) the total mean estimated lead time from placing a stock order and receiving thereof at the retailer brick and mortar store, (b) promotional indicator, (c) national stock supply at the point in time, (d) national rate of sale (ROS), and (e) the absolute number (count) of stores reflecting no stock, low stock or excess stock available at the point in time. These variables were incorporated to improve demand forecast accuracy.

5.2.3 Time series model and forecasting evaluation

The following measures assessed the forecast accuracy and benchmarked each of the minimal viable time series models employed, followed by the business-relevant interpretation. Table 3 defines the selected model evaluation and assessment measure.

Measure	Definition
Mean absolute error (MAE)	Represents the average magnitude of errors and is expressed in units of the dependent variable. The absolute fit of the model to the data can be measured by how closely the predicted values align with the actual values.
Mean absolute percentage error (MAPE)	Represents the forecast errors as a % of the actual observed value.
Root mean squared error (RMSE)	Represents the absolute fit of the model to the data, measuring how close the actual values are to the predicted values, with larger errors weighted more heavily because errors are squared before averaging. The RMSE value is expressed in units of the dependent variable and translates to the standard deviation of the unexplained variance.
Scatter index (SI)	A measure of determining if the RMSE is acceptable, a value of <1 is deemed acceptable.

Table 3.

Time series model evaluation criteria.
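The measures in Table 3 can be computed as in the sketch below. MAE, MAPE, and RMSE follow their standard definitions. The scatter index is assumed here to be the RMSE divided by the mean of the observed values, which is a common definition but is not spelled out in the source. The demand figures are hypothetical.

```python
import math


def mae(actual, predicted):
    """Mean absolute error, in units of the dependent variable."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percentage error; requires non-zero actual values."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error, in units of the dependent variable."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def scatter_index(actual, predicted):
    """Assumed definition: RMSE normalized by the mean of the observed values."""
    return rmse(actual, predicted) / (sum(actual) / len(actual))


# Hypothetical observed and forecast demand for three periods.
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 180.0, 400.0]

print(mae(actual, predicted))
print(round(mape(actual, predicted), 2))
print(round(rmse(actual, predicted), 2))
print(round(scatter_index(actual, predicted), 3))
```

Because the SI is scale-free, it allows models for SKUs with very different sales volumes to be compared against the same threshold of 1.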

5.2.4 Time series training and testing sets

The standard approach for validating time series model performance is selecting observations into the relevant training and testing data sets, and commonly the testing, also referred to as the validation set, will consist of the most recent data observations. Within the context of this research, the test data set contained observations for the most recent 13 weeks from 11 July 2021 to 3 October 2021, and the training data set included all available observations before 11 July 2021 for each SKU ID.
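The chronological split described above can be sketched as follows. The cut-off date of 11 July 2021 comes from the text; the weekly series itself and the function name are hypothetical.

```python
from datetime import date, timedelta

TEST_START = date(2021, 7, 11)  # first week of the hold-out period stated in the text


def split_by_date(observations, test_start=TEST_START):
    """Split (date, value) pairs into training (before) and testing (on or after)."""
    train = [(d, v) for d, v in observations if d < test_start]
    test = [(d, v) for d, v in observations if d >= test_start]
    return train, test


# Hypothetical weekly series of 52 observations ending on 3 October 2021.
last_week = date(2021, 10, 3)
weeks = [last_week - timedelta(weeks=k) for k in range(51, -1, -1)]
observations = [(d, 100.0 + i) for i, d in enumerate(weeks)]

train, test = split_by_date(observations)
print(len(train), len(test))
```

Unlike the random stratified split used for the regression analysis, a time series split must preserve order so that the model is never evaluated on periods earlier than those it was trained on.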

5.2.5 Time series model and forecasting for SKU ID BZ001

Each time series method recorded SI values of <1, within the acceptable threshold. However, the BDA techniques recorded a significantly lower mean MAPE (error rate) of 33.5 than the conventional techniques. In addition, the BDA techniques reflect higher forecast accuracy, as evident from lower mean MAE (1172) and mean RMSE (1663) values compared to the mean MAE (1148) and mean RMSE (1882) of the conventional methods. The overall results indicate that the Sarimax and Tbats models are the best-performing models, with low MAPE (error rate) values of 27.8 and 23.0, respectively, accompanied by low MAE values of 975 and 955 and RMSE values of 1315 and 1466.

5.2.6 Time series model and forecasting for SKU ID BZ004

Each time series method recorded SI values of <1, within the acceptable threshold. However, the BDA techniques recorded a significantly lower mean MAPE (error rate) of 37.5 than the mean MAPE (error rate) of 47.2 for the conventional techniques. In addition, the BDA techniques reflect higher forecast accuracy, as evident from lower mean MAE (648) and mean RMSE (719) values compared to the mean MAE (928) and mean RMSE (1120) of the conventional methods. The overall results indicate that Sarimax is the best-performing model, with the lowest MAPE (error rate) value of 23.7, the lowest MAE value of 300, and an RMSE value of 370.

5.2.7 Time series model and forecasting for SKU ID BZ009

Each time series method recorded SI values of <1, within the acceptable threshold. The BDA techniques significantly improved the MAPE (error rate) from 30.1 for the conventional techniques to 22.0. In addition, the BDA techniques reflect higher forecast accuracy, as evident from lower mean MAE (163) and mean RMSE (196) values compared to the mean MAE (236) and mean RMSE (308) of the conventional methods. The results indicate that the Sarimax model is the best-performing model, with a low MAPE (error rate) value of 15.3, accompanied by a low MAE value of 103 and an RMSE value of 119.

5.2.8 Time series model and forecasting for SKU ID BZ012

Each time series method recorded SI values of <1, within the acceptable threshold. The BDA techniques overall delivered counterintuitive results due to the poor performance of the Sarimax model, which recorded a high MAPE (error rate) value of 72.3, a high MAE value of 933, and an RMSE value of 1247. However, the overall results indicate similar performance of the Arima, Tbats, and Neural Networks models, which reported closely aligned MAPE, MAE, and RMSE values.

5.2.9 Time series model and forecasting for SKU ID BZ013

Each time series method recorded SI values of <1, within the acceptable threshold. Even though the Neural Networks model was unsuitable, the remaining BDA techniques significantly improved the MAPE (error rate) from 54.9 for the conventional techniques to 19.0. In addition, the BDA techniques reflect higher forecast accuracy, as evidenced by lower mean MAE (553) and mean RMSE (685) values compared to the mean MAE (1604) and mean RMSE (1770) of the conventional methods. The results indicate that the Sarimax model is the best-performing model, with a low MAPE (error rate) value of 12.8, accompanied by a low MAE value of 400 and an RMSE value of 540.

5.2.10 Time series model and forecasting for SKU ID BZ030

Each time series method recorded SI values of <1 and within the acceptable threshold. The limited trend or seasonality components rendered the Holt-Winters model unsuitable. However, the conventional Arima model outperformed the BDA techniques, as evident from the low MAPE (error rate) value of 3.5 compared to the MAPE (error rate) value of 14.2 for BDA techniques. In addition, the Arima model reflected a higher forecast accuracy, as evident from lower mean MAE (258) and RMSE (322) values compared to the mean MAE (1023) and RMSE (1340) of the BDA techniques. The overall results indicate that the Arima and Neural Networks models are considered the best-performing models, with closely aligned MAPE, MAE, and RMSE values.

5.2.11 Time series model and forecasting summary

Analysis results show that BDA techniques outperform conventional methods when there is sufficient data available that reflects high seasonal trends and fluctuations. In contrast, conventional techniques, mainly the Arima model, were better suited for SKUs with limited historical or low seasonal data. Table 4 presents each SKU’s optimal time series and forecasting techniques.

SKU ID/Optimal method	Conventional Time series and forecasting techniques	BDA Time series and forecasting techniques
BZ001		X
BZ004		X
BZ009		X
BZ012	X
BZ013		X
BZ030	X

Table 4.

Time series model and forecasting summary.

6. Conclusions and managerial implications

The research premise considers that the efficient management of an organization across the various operational sub-areas will lead to a sustainable competitive advantage within the area of interest of this research, namely SCM [37].

The computational efficiency of BDA accommodates the increasing complexity of SC data and assists in managing market challenges. Zeus Africa and similar SMEs would gain from investing in open-source BD technologies and relevant BDA skills and techniques, harnessing the capabilities of these innovative technologies, and allowing for more accurate forecasting and predictions underpinning SC inventory decisions. A comprehensive BDA framework that builds an understanding of SKU importance and value by integrating descriptive analysis techniques, namely the ABC inventory and bivariate analysis, will inform and enable the adjustment of resources and processes in line with actual consumer demand at the lowest possible cost without compromising consumer satisfaction levels.

Moreover, incorporating external data, such as their B2B data, and integrating key SC performance measures, such as the bullwhip ratio, into their decision-making process will highlight operational inefficiencies and challenges, enabling an informed and data-driven approach and improving sales order quantity forecasts. However, the author recognizes as a limitation the need for a more appropriate bullwhip ratio that accommodates slow-moving products; this requires further refinement and presents an avenue for future research.

The financial impact of compounded demand planning and forecasting inefficiency caused by unaddressed distorted consumer demand and store stock supply was not established and presents an avenue for future research. Nevertheless, improved forecast accuracy can lessen the effects of the observed SC bullwhip phenomenon, delivering measurable business value, including but not limited to a reduction in operational cost, risk, and the financial impact associated with demand forecast inefficiencies and missed market opportunities. Therefore, applying various BDA techniques results in improved SC efficiencies and thus supports the research premise.

Conflict of interest

The authors declare no conflict of interest.

Thanks

I extend my gratitude to Zeus Africa for granting me the opportunity to enter into the supply chain world and for the resources and support to undertake this research study.

A. Sampling

Recognizing factors influencing the required sample size, such as (a) the number of independent variables, (b) missing values, and (c) high variance observed in the dependent variables [38], a minimum representative sample size for each SKU at supplier-retailer and retailer-consumer demand level was determined, and a stratified sampling method was employed to support the research study’s validity and reliability.

The sample size provided for each SKU ID was inadequate for this analysis at a 0.05 error rate for supplier-retailer demand. Notwithstanding, it was acceptable at a 0.1 error rate.

SKU ID/Measure	Deliveries			Sample size provided (n)	The sample size required (n) at (ε)
SKU ID/Measure	variance	std. dev	std. error	observation	0.05 error	0.1 error
BZ001	0.1	0.4	0.4	70	191	48
BZ004	0.2	0.4	0.4	121	239	60
BZ009	0.1	0.3	0.3	81	169	43
BZ012	0.2	0.5	0.5	73	362	91
BZ013	0.0	0.0	0.0	32	0	0
BZ014	0.2	0.4	0.4	11	252	63
BZ030	0.0	0.0	0.0	11	0	0

The sample size provided for each SKU ID was adequate at a 0.05 error rate for retailer-consumer demand.

SKU ID/Measure	Deliveries			Sample size provided (n)	The sample size required (n) at (ε)
SKU ID/Measure	variance	std. dev	std. error	observation	0.05 error	0.1 error
BZ001	5	2.2	1.9	127,340	7719	1930
BZ004	5.2	2.3	2.1	129,689	7790	1993
BZ009	1.1	1	0.9	88,809	1686	422
BZ012	1.7	1.3	1.1	99,859	2607	652
BZ013	2	1.4	1.2	55,158	3127	782
BZ014	4.3	2.1	1.7	18,167	6581	1646
BZ030	1.4	1.2	0.3	30,466	2152	538
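The required sample sizes in the two tables above appear consistent with the standard formula n = z²σ²/ε² evaluated at a 95% confidence level (z = 1.96). That reading is an inference from the tabulated values and is not stated in the source, so the sketch below should be treated as an assumption. The function name is illustrative.

```python
import math

Z_95 = 1.96  # assumption: two-sided 95% confidence level


def required_sample_size(variance, margin_of_error, z=Z_95):
    """Minimum n for estimating a mean within the given margin of error."""
    return math.ceil(z ** 2 * variance / margin_of_error ** 2)


# Illustrative unit variance at the two error rates used in the tables.
print(required_sample_size(1.0, 0.05))
print(required_sample_size(1.0, 0.1))
```

Under this formula, halving the error rate from 0.1 to 0.05 roughly quadruples the required sample size, which matches the pattern between the two rightmost columns. The tabulated variances are rounded, so applying the formula to them will not reproduce the required sample sizes exactly.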

B. Interpretation of the log-transformed bullwhip ratio correlation coefficients and covariance

Variable 1	Variable 2	Pearson correlation coefficient	Kendall’s tau_b	Spearman’s rho	Interpretation
Store Count Excess Stock Ratio	Log-transformed bullwhip ratio	0.25	0.22	0.32	Positive relationship: An increase in the time increments relative to the total period of the store holding excess stock is associated with an increase in the log-transformed bullwhip ratio.
Est. Weeks of Stock (Supply)	Log-transformed bullwhip ratio	0.2	0.23	0.33	Positive relationship: An increase in the number of weeks of stock available is associated with an increase in the log-transformed bullwhip ratio.
Mean Supply (Stock Quantity Balance)	Log-transformed bullwhip ratio	0.21	0.24	0.34	Positive relationship: An increase in the mean stock on hand is associated with an increase in the log-transformed bullwhip ratio.
Weeks	Log-transformed bullwhip ratio	−0.4	−0.26	−0.36	Negative relationship: An increase in the number of weeks the SKUs are ranged in-store is associated with a decrease in the log-transformed bullwhip ratio.
Store stock turn ratio	Log-transformed bullwhip ratio	−0.47	−0.44	−0.6	Negative relationship: An increase in the store stock turn ratio is associated with a decrease in the log-transformed bullwhip ratio.

C. Bullwhip correlation analysis and interpretations

D. Zeus Africa multivariable regression training and testing data

SKU ID	BZ001	BZ004	BZ009	BZ012	BZ013	BZ030	Total
Training Freq and %	444	559	342	488	199	641	2673
Training Freq and %	16.6%	20.9%	12.8%	18.3%	7.4%	24.0%	100.0%
Testing Freq and %	191	239	146	209	85	275	1145
Testing Freq and %	16.7%	20.9%	12.8%	18.3%	7.4%	24.0%	100.0%

E. PCA model assumptions

Assumption 1: Sphericity or existence of collinearity between the variables.

Bartlett’s Test of Sphericity is used to determine whether the intercorrelation matrix comes from a population in which the variables are non-collinear, that is, whether it is an identity matrix. If it does, PCA may not be appropriate, as it relies on constructing linear combinations of correlated variables. The hypothesis being tested by Bartlett’s Test is stated as follows:

Ho: No collinearity between the variables exists.

Ha: Collinearity between the variables exists.

Application of Bartlett’s Test at a significance level of α = 0.05 yielded a p-value of <0.038. The null hypothesis was therefore rejected in favor of the alternative hypothesis that collinearity between the variables exists.
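For reference, the test statistic in its standard textbook form is given below, where n is the number of observations, p the number of variables, and R the sample correlation matrix. These symbols are introduced here for illustration and do not appear in the source.

```latex
\chi^2 = -\left( n - 1 - \frac{2p + 5}{6} \right) \ln \lvert R \rvert ,
\qquad
\mathrm{df} = \frac{p\,(p - 1)}{2}
```

When the variables are uncorrelated, the determinant of R is close to 1, its logarithm is close to 0, and the statistic is small. Strong correlations push the determinant toward 0 and the statistic upward, leading to rejection of the null hypothesis.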

Assumption 2: Sample adequacy.

The KMO (Kaiser-Meyer-Olkin) test is applied to determine if the data is suitable for dimension reduction techniques such as factor analysis or PCA. The following thresholds guide the interpretation of the KMO statistic (Stephanie, 2016):

KMO values between:

0.00 to 0.49 are considered unacceptable.
0.50 to 0.59 are considered miserable.
0.60 to 0.69 are considered mediocre.

In summary, KMO values greater than 0.79 indicate an adequate data sample. KMO values less than 0.5 suggest an inadequate data sample and require remedial action. Applying the KMO (Kaiser-Meyer-Olkin) test yielded an overall KMO score of 0.79, concluding that the data sample is adequate. Furthermore, variables recording a KMO score of less than 0.50 were excluded from the PCA.

Assumption 3: Positive determinant of the correlation or variance-covariance matrices.

The determinant value must be positive, implying a positive definite covariance matrix. The assumption was tested by applying the R “det” function, which yielded a value of 2.75601e-09. The positive determinant value satisfies the assumption.

Assumption 4: PCA scree plot - PVE and cumulative PVE.

The scree plot highlights a drop in the PVE after the fourth PC. These four PCs explain 79% of the variance within the dataset. In addition, the model accuracy on four PCs yielded a root mean square of residuals (RMSR) value of 0.056, which is on the cusp of the acceptable 0.05 threshold (Rajput, 2018). The SS loadings represent the eigenvalue of the respective principal component. All components reflect acceptable SS loading values of greater than 1.

F. Zeus Africa multivariable regression model coefficients and interpretations

Coefficient	Estimate	P-value	Interpretation
Intercept	0.3174756	< 2e-16	The intercept can be interpreted as the average log-transformed bullwhip ratio if all independent variables are set to a value of 0.
SKU ID BZ001	0.3174756	< 2e-16	The estimated value of SKU ID BZ001 is the base value and is equivalent to the intercept. Translating to a mean log-transformed bullwhip ratio of 0.3174756.
SKU ID BZ004	0.0651518	4.58e-08	Stores stocking and selling SKU ID BZ004 will increase the mean log-transformed bullwhip measure by 0.0651518 compared to the base SKU ID BZ001.
SKU ID BZ009	−0.0430543	0.002237	Stores stocking and selling SKU ID BZ009 will decrease the mean log-transformed bullwhip measure by 0.0430543 compared to the base SKU ID BZ001.
SKU ID BZ012	0.0226517	0.062349 ** Significant at α = 0.1	Stores stocking and selling SKU ID BZ012 will increase the mean log-transformed bullwhip measure by 0.0226517 compared to the base SKU ID BZ001. While SKU ID BZ012 is not significant at α = 0.05, it is considered significant at an accepted error rate of α = 0.1.
SKU ID BZ013	0.0705915	4.50e-06	Stores stocking and selling SKU ID BZ013 will increase the mean log-transformed bullwhip measure by 0.0705915 compared to the base SKU ID BZ001.
SKU ID BZ030	0.0704256	0.000277	Stores stocking and selling SKU ID BZ030 will increase the mean log-transformed bullwhip measure by 0.0704256 compared to the base SKU ID BZ001.
Median weeks store replenishment	−0.0028166	0.003893	For every unit increase in the median weeks between store replenishment, the mean log-transformed bullwhip measure will decrease by 0.0028166.
PCA (PC1) - Price, Promotion fluctuations	0.0131129	0.000460	For every unit increase in the PCA price promotion component, the mean log-transformed bullwhip measure will increase by 0.0131129.
PCA (PC2) - Order batching	−0.0726859	< 2e-16	For every unit increase in the PCA order batching component, the mean log-transformed bullwhip measure will decrease by 0.0726859.
PCA (PC3) - Demand signal processing	−0.0549302	< 2e-16	For every unit increase in the PCA demand signal processing component, the mean log-transformed bullwhip measure will decrease by 0.0549302.
PCA (PC4) - Lead time	0.0377625	5.55e-14	For every unit increase in the PCA lead time component, the mean log-transformed bullwhip measure will increase by 0.0377625.
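The coefficients above combine additively into a prediction of the mean log-transformed bullwhip ratio. The sketch below uses the estimates from the table, with SKU ID BZ001 as the base level (effect of 0). The function name and the feature keys are hypothetical labels chosen for illustration.

```python
INTERCEPT = 0.3174756

# SKU effects relative to the base SKU ID BZ001.
SKU_EFFECT = {
    "BZ001": 0.0,
    "BZ004": 0.0651518,
    "BZ009": -0.0430543,
    "BZ012": 0.0226517,
    "BZ013": 0.0705915,
    "BZ030": 0.0704256,
}

# Slopes for the continuous predictors (keys are illustrative names).
SLOPES = {
    "median_weeks_replenishment": -0.0028166,
    "pc1_price_promotion": 0.0131129,
    "pc2_order_batching": -0.0726859,
    "pc3_demand_signal": -0.0549302,
    "pc4_lead_time": 0.0377625,
}


def predict_log_bullwhip(sku_id, **features):
    """Return the predicted mean log-transformed bullwhip ratio."""
    value = INTERCEPT + SKU_EFFECT[sku_id]
    for name, slope in SLOPES.items():
        value += slope * features.get(name, 0.0)  # omitted features default to 0
    return value


# Base SKU with all predictors at 0 returns the intercept.
print(predict_log_bullwhip("BZ001"))
# A one-unit increase in the order batching component lowers the prediction.
print(predict_log_bullwhip("BZ004", pc2_order_batching=1.0))
```

Because the dependent variable is log-transformed, the predictions are on the log scale and would need to be back-transformed to be read as a bullwhip ratio.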

G. Zeus Africa regression model evaluation and interpretations

Measure	Hypothesis/Definition	Result	Interpretation
F critical value and overall p-value	H_o: Intercept only model fits data H_a: Model fits data better than intercept only model	F Stat: 125.1 P-value: < 2.2e-16 With the p-value <0.05 we reject H_o and accept H_a	We can conclude that the overall model is acceptable, and there is a statistically significant relationship between the store bullwhip ratio and the contributing bullwhip principal components and store demand planning variables identified.
Adjusted R Squared	The proportion of variance in the dependent variable explained by the independent variables, adjusted for the number of terms in the model.	Adj. R Square: 0.31	The contributing bullwhip principal components and store demand planning variables identified can explain 31% of the variance in the store bullwhip ratio.
P-value	H_o: Independent variable does not correlate with the dependent variable. H_a: Independent variable is correlated with the dependent variable.	P-value: < 2.2e-16	Any contributing bullwhip principal components and store demand planning variables with p-value >0.05 were excluded from the model. It is accepted that a statistically significant relationship exists between the remaining bullwhip principal components and store demand planning variables and the store bullwhip ratio.
Standard Error (S)	A measure of goodness of fit is expressed in absolute terms. The following rule of thumb applies to measure the typical distance of the data points from the regression line. The standard error should be smaller than one standard deviation.	Log-transformed S: 0.17 S: 1.5 Log-transformed Std.Dev: 0.21 Std.Dev: 1.6	The standard error of 0.17 is less than one standard deviation of 0.21 and concludes that the model has the required level of precision.
Root mean squared error (RMSE)	Represents the absolute fit of the model to the data, measuring how close the actual values are to the predicted values. The RMSE value is expressed in units of the dependent variable and translates to the standard deviation of the unexplained variance.	Log-transformed RMSE: 0.17 RMSE: 1.5	The log-transformed bullwhip ratio ranges from −0.82 to 1.44 units with a standard deviation of 0.21. In this context, the RMSE value of 0.17 is acceptable, and it is concluded that the model is a good fit to the data.
Scatter index (SI)	A measure of determining if RMSE is acceptable, a value of <1 is deemed acceptable.	SI: 0.39	The scatter index of 0.39 is below the threshold, thus accepting the integrity of the predictions.

H. Time series technique class

Technique	Description	Class
Holt-Winters	An exponential smoothing forecast method. Transforming demand (response variable) into a weighted average of past observation values, whereby current observation values will carry more weight or importance than older observations.	Conventional
Arima (AutoRegressive Integrated Moving Average)	AutoRegressive - Output is regressed on its own lagged observation values. Integrated - The number of times differencing needs to be applied to achieve stationarity. Moving average - using past forecast errors as opposed to past observation values.	Conventional
Sarimax (Seasonal ARIMA + Exogenous variables)	Incorporates all ARIMA components and accommodates exogenous variables – external regressors to improve forecast accuracy.	BDA
Tbats (Trigonometric seasonality Box-Cox transformation ARMA errors Trend Seasonality + Exogenous variables)	An exponential smoothing forecast method. Accommodating for multiple seasonal patterns by the use of a trigonometric function and allowing for exogenous variables - external regressors to improve forecast accuracy.	BDA
Neural Networks	Accommodates non-linear relationships between demand (response variable) and exogenous variables - external regressors.	BDA
SMA Simple Moving Average	Transforming demand (response variable) into an arithmetic average by the number of periods within a given range.	Conventional
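The two simplest smoothing ideas in the table can be sketched as follows. The exponential smoothing function shown here is the level-only form; the Holt-Winters method extends it with trend and seasonal terms, which are omitted to keep the example short. The function names and the demand series are hypothetical.

```python
def simple_moving_average(values, window):
    """Arithmetic mean of each run of `window` consecutive observations."""
    return [
        sum(values[i - window + 1:i + 1]) / window
        for i in range(window - 1, len(values))
    ]


def exponential_smoothing(values, alpha):
    """Level-only exponential smoothing; higher alpha weights recent data more."""
    smoothed = [values[0]]
    for value in values[1:]:
        smoothed.append(alpha * value + (1 - alpha) * smoothed[-1])
    return smoothed


# Hypothetical weekly demand.
demand = [100.0, 120.0, 90.0, 130.0, 110.0]
print(simple_moving_average(demand, window=3))
print(exponential_smoothing(demand, alpha=0.5))
```

The moving average weights every observation in the window equally, whereas exponential smoothing gives geometrically decreasing weight to older observations, as described for Holt-Winters in the table.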

I. Time series model measure

Measure	Definition
Mean absolute error (MAE)	Represents the average magnitude of errors and is expressed in units of the dependent variable. The absolute fit of the model to the data can be measured by how closely the predicted values align with the actual values.
Mean absolute percentage error (MAPE)	Represents the forecast errors as a % of the actual observed value.
Root mean squared error (RMSE)	Represents the absolute fit of the model to the data, measuring how close the actual values are to the predicted values, with larger errors weighted more heavily because errors are squared before averaging. The RMSE value is expressed in units of the dependent variable and translates to the standard deviation of the unexplained variance.
Scatter index (SI)	A measure of determining if RMSE is acceptable, a value of <1 is deemed acceptable.

References

1. Cetindamar D, Shdifat B, Erfani S. Assessing big data analytics capability and sustainability in supply chains [Internet]. 2020. [cited 2022 Aug 17]. Available from: http://hdl.handle.net/10125/63765
2. Mafini C, Muposhi A. Predictive analytics for supply chain collaboration, risk management and financial performance in small to medium enterprises. Southern African Business Review. 2017;21(1):311-338
3. Seyedan M, Mafakheri F. Predictive big data analytics for supply chain demand forecasting: methods, applications, and research opportunities. Journal of Big Data. 2020 Jul 25;7(1):53
4. Disney SM, Lambrecht MR. On replenishment rules, forecasting, and the bullwhip effect in supply chains [Internet]. Now Publishers; 2008. [cited 2022 Jul 4]. Available from: https://ofppt.scholarvox.com/catalog/book/10232240?_locale=en
5. Firican G. The history of big data [Internet]. LightsOnData. 2022. [cited 2022 Aug 8]. Available from: https://www.lightsondata.com/the-history-of-big-data/
6. Dontha R. Who came up with the name big data? - DataScienceCentral.com. Data Science Central. [Internet]. 2017. [cited 2022 Aug 8]. Available from: https://www.datasciencecentral.com/who-came-up-with-the-name-big-data/
7. Definition of Big Data - Gartner Information Technology Glossary [Internet]. Gartner. [cited 2022 Aug 8]. Available from: https://www.gartner.com/en/information-technology/glossary/big-data
8. Iqbal M, Kazmi SHA, Manzoor A, Soomrani AR, Butt SH, Shaikh KA. A study of big data for business growth in SMEs: Opportunities & challenges. 2018
9. Mikalef P, Krogstie J, Pappas IO, Pavlou P. Exploring the relationship between big data analytics capability and competitive performance: The mediating roles of dynamic and operational capabilities. Information and Management. 2020 Mar 1;57(2):103169
10. Oncioiu I, Bunget OC, Türkeș MC, Căpușneanu S, Topor DI, Tamaș AS, et al. The impact of big data analytics on company performance in supply chain management. Sustainability. 2019;11(18):4864
11. Kitchin R. Big data, new epistemologies and paradigm shifts. Big Data & Society. 2014;1(1):1-12
12. Krajewski LJ, Malhotra MK, Ritzman LP. Operations Management. Processes and Supply Chains. 11th ed. England: Pearson; 2016
13. Slack N, Chambers S, Johnston R. Operations Management. 5th ed. United Kingdom: Pitman Publishing; 2007
14. Fisher ML. What is the right supply chain for your product? [Internet]. 1997. [cited 2021 Jan 1]. Available from: https://www.academia.edu/31156494/What_Is_the_Right_Supply_Chain_for_Your_Product
15. Mathu KM. The information technology role in supplier-customer information-sharing in the supply chain management of South African small and medium-sized enterprises. South African Journal of Economic and Management Sciences (SAJEMS). 2019;22(1):8
16. Sousa AL, Ribeiro T, Relvas S, Barbosa-Póvoa A. Using machine learning for enhancing the understanding of bullwhip effect in the oil and gas industry. Machine Learning and Knowledge Extraction. 2019;1:994-1012. DOI: 10.3390/make1030057
17. Tamim WA, Nawaz RR. Supply Chain Management: Reducing the Bullwhip Effect in SME’s. Market Forces. 2017;12(1):55
18. Mafini C, Omoruyi O. Logistics benefits and challenges: The case of SMEs in a South African local municipality. The Southern African Journal of Entrepreneurship and Small Business. 2013;6(1):145
19. Soroka A, Liu Y, Han L, Haleem MS. Big data driven customer insights for SMEs in redistributed manufacturing. Procedia CIRP. 2017;63:692-697
20. Bryman A. Social Research Methods. 4th ed. Oxford, UK: Oxford University Press; 2012
21. Smart Vision Europe. What is the CRISP-DM methodology? [Internet]. Smart Vision Europe. 2017. Available from: https://www.sv-europe.com/crisp-dm-methodology/
22. Rodrigues I. CRISP-DM methodology leader in data mining and big data [Internet]. Medium. 2020. [cited 2021 Jan 1]. Available from: https://towardsdatascience.com/crisp-dm-methodology-leader-in-data-mining-and-big-data-467efd3d3781
23. Frost J. Regression analysis: An intuitive guide [ebook] [Internet]. Statistics By Jim. 2019. [cited 2020 Jan 1]. Available from: https://statisticsbyjim.selz.com/item/regression-analysis-an-intuitive-guide
24. Akoglu H. User’s guide to correlation coefficients. Turkish Journal of Emergency Medicine. 2018;18(3):91-93
25. James G, Witten D, Hastie T, Tibshirani R. An Introduction to Statistical Learning. New York, NY: Springer; 2013
26. Frost J. Multicollinearity in regression analysis: Problems, detection, and solutions [Internet]. Statistics By Jim. 2017. [cited 2021 Jan 1]. Available from: https://statisticsbyjim.com/regression/multicollinearity-in-regression-analysis/
27. Sobolewska E. RPubs - principal component regression [Internet]. rpubs.com. 2019. [cited 2022 Jan 1]. Available from: https://rpubs.com/esobolewska/pcr-step-by-step
28. Gracey M. When B2B data is personal data and what that means with the GDPR [Internet]. Medium. 2017. [cited 2020 Jan 1]. Available from: https://medium.com/@digital_compliance/when-b2b-data-is-personal-data-and-what-that-means-with-the-gdpr-d4223ea74e09
29. Muller L. The South African toy market - a country divided, yet incredibly promising [Internet]. Seeking Alpha. 2017. Available from: https://seekingalpha.com/article/4085341-south-african-toy-market-country-divided-yet-incredibly-promising
30. Bellégo C, Benatia D, Pape LD. Dealing with logs and zeros in regression models [Internet]. papers.ssrn.com. 2021. [cited 2022 Jan 1]. Available from: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3444996
31. Rajput P. Exploratory factor analysis [Internet]. rstudio-pubs-static.s3.amazonaws.com. 2018. [cited 2022 Jan 1]. Available from: https://rstudio-pubs-static.s3.amazonaws.com/376139_e9adaefdf4594a79a54a3f87ff4852d6.html#:~:text=Factor%20Analysis%20Model%20Adequacy
32. Prabhakaran S. 10 Assumptions of Linear Regression - Full List with Examples and Code [Internet]. r-statistics.co. 2016. [cited 2020 Jan 1]. Available from: http://r-statistics.co/Assumptions-of-Linear-Regression.html
33. Han J, Kamber M. Data Mining: Concepts and Techniques. 3rd ed. Waltham, MA, USA: Elsevier; 2012
34. Gautam A, Singh V. Parametric versus non-parametric time series forecasting methods: A review. Journal of Engineering Science and Technology Review. 2020;13(3):165-171
35. Hyndman RJ, Athanasopoulos G. Forecasting: Principles and Practice [Internet]. 2nd ed. Melbourne, Australia: OTexts; 2018. [cited 2021 Jan 1]. Available from: https://otexts.com/fpp2/
36. DataCamp. Splinefun Function - RDocumentation Stats (version 3.6.2) [Internet]. www.rdocumentation.org. [cited 2022 Jan 1]. Available from: https://www.rdocumentation.org/packages/stats/versions/3.6.2/topics/splinefun
37. Moufaddal M, Benghabrit A, Bouhaddou I. Big Data Analytics for Supply Chain Management. Cham: Springer; 2018. [cited 2021 Jan 1]. pp. 976-986. DOI: 10.1007/978-3-319-74500-8_87
38. Statistics Solutions. Sample Size Calculation [Internet]. statisticssolutions.com. [cited 2021 Jan 1]. Available from: https://www.statisticssolutions.com/sample-size-calculation-2/

[1] 1. Cetindamar D, Shdifat B, Erfani S. Assessing big data analytics capability and sustainability in supply chains [Internet]. 2020. [cited 2022 Aug 17]. Available from: http://hdl.handle.net/10125/63765

[2] 2. Mafini C, Muposhi A. Predictive analytics for supply chain collaboration, risk management and financial performance in small to medium enterprises. Southern African Business Review. 2017;21(1):311-338

[3] 3. Seyedan M, Mafakheri F. Predictive big data analytics for supply chain demand forecasting: methods, applications, and research opportunities. Journal of Big Data. 2020 Jul 25;7(1):53

[4] 4. Disney SM, Lambrecht MR. On replenishment rules, forecasting, and the bullwhip effect in supply chains. now Publishers. 2008. [cited 2022 Jul 4]. [Internet] Available from: https://ofppt.scholarvox.com/catalog/book/10232240?_locale=en

[5] 5. Firican G. The history of big data [Internet]. LightsOnData. 2022. [cited 2022 Aug 8]. Available from: https://www.lightsondata.com/the-history-of-big-data/

[6] 6. Dontha R. Who came up with the name big data? - DataScienceCentral.com. Data Science Central. [Internet]. 2017. [cited 2022 Aug 8]. Available from: https://www.datasciencecentral.com/who-came-up-with-the-name-big-data/

[7] 7. Definition of Big Data - Gartner Information Technology Glossary [Internet]. Gartner. [cited 2022 Aug 8]. Available from: https://www.gartner.com/en/information-technology/glossary/big-data

[8] 8. Iqbal M, Kazmi SHA, Manzoor A, Soomrani AR, Butt SH, Shaikh KA. A study of big data for business growth in SMEs: Opportunities & challenges. 2018

[9] 9. Mikalef P, Krogstie J, Pappas IO, Pavlou P. Exploring the relationship between big data analytics capability and competitive performance: The mediating roles of dynamic and operational capabilities. Information and Management. 2020 Mar 1;57(2):103169

[10] 10. Oncioiu I, Bunget OC, Türkeș MC, Căpușneanu S, Topor DI, Tamaș AS, et al. The impact of big data analytics on company performance in supply chain management. Sustainability. 2019;11(18):4864

[11] 11. Kitchin R. Big data, new epistemologies and paradigm shifts. Big Data & Society. 2014;1(1):1-12

[12] 12. Krajewski LJ, Malhotra MK, Ritzman LP. Operations Management. Processes and Supply Chains. 11th ed. England: Pearson; 2016

[13] 13. Slack N, Chambers S, Johnston R. Operations Management. 5th ed. United Kingdom: Pitman Publishing; 2007

[14] 14. Fisher ML. What is the right supply chain for your product? [Internet]. 1997. [cited 2021 Jan 1]. Available from: https://www.academia.edu/31156494/What_Is_the_Right_Supply_Chain_for_Your_Product

[15] 15. Mathu KM. The information technology role in supplier-customer information-sharing in the supply chain management of south African small and medium-sized enterprises. South African Journal of Economic and Management Sciences (SAJEMS). 2019;22(1):8

[16] 16. Sousa AL, Ribeiro T, Relvas S, Barbosa-Póvoa A. Using machine learning for enhancing the understanding of bullwhip effect in the oil and gas industry. Machine Learning and Knowledge Extraction. 2019;1:994-1012. DOI: 10.3390/make1030057

[17] 17. Tamim WA, Nawaz RR. Supply Chain Management: Reducing the Bullwhip Effect in SME’s. Market Forces. 2017;12(1):55

[18] 18. Mafini C, Omoruyi O. Logistics benefits and challenges: The case of SMEs in a South African local municipality. The Southern African Journal of Entrepreneurship and Small Business. 2013;6(1):145

[19] 19. Soroka A, Liu Y, Han L, Haleem MS. Big data driven customer insights for SMEs in redistributed manufacturing. Procedia CIRP. 2017;63:692-697

[20] 20. Bryman A. Social Research Methods. 4th ed. Oxford: Uk Oxford University Press; 2012

[21] 21. Smart Vision Europe. What Is the CRISP-DM methodology. Smart Vision - Europe. 2017. [Internet] Available from: https://www.sv-europe.com/crisp-dm-methodology/

[22] 22. Rodrigues I. CRISP-DM methodology leader in data mining and big data [Internet]. Medium. 2020. [cited 2021 Jan 1]. Available from: https://towardsdatascience.com/crisp-dm-methodology-leader-in-data-mining-and-big-data-467efd3d3781

[23] 23. Jim F. Regression analysis: An intuitive guide. Statistics By Jim. 2019. [ebook] [Internet] [cited 2020 Jan 1]. Available from: https://statisticsbyjim.selz.com/item/regression-analysis-an-intuitive-guide

[24] 24. Akoglu H. User’s guide to correlation coefficients. Turkish Journal of Emergency Medicine. 2018;18(3):91-93

[25] 25. James G, Witten D, Hastie T, Tibshirani R. An Introduction to Statistical Learning. New York, Ny: Springer; 2013

[26] 26. Frost J. Multicollinearity in regression analysis: Problems, detection, and solutions - statistics by Jim. Statistics by Jim. 2017. [cited 2021 Jan 1]. [Internet] Available from: https://statisticsbyjim.com/regression/multicollinearity-in-regression-analysis/

[27] 27. Sobolewska E. RPubs - principal component regression [Internet]. rpubs.com. 2019. [cited 2022 Jan 1]. Available from: https://rpubs.com/esobolewska/pcr-step-by-step

[28] 28. Gracey M. When B2B data is personal data and what that means with the GDPR. Medium. 2017. [Internet] [cited 2020 Jan 1]. Available from: https://medium.com/@digital_compliance/when-b2b-data-is-personal-data-and-what-that-means-with-the-gdpr-d4223ea74e09

[29] 29. Muller L. The South African toy market - a country divided, yet incredibly promising. Seekeing Alpha. 2017. [Internet] Available from: https://seekingalpha.com/article/4085341-south-african-toy-market-country-divided-yet-incredibly-promising

[30] 30. Bellégo C, Benatia D, Pape LD. Dealing with logs and zeros in regression models [Internet]. papers.ssrn.com. 2021. [cited 2022 Jan 1]. Available from: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3444996

[31] 31. Rajput P. Exploratory factor analysis [Internet]. rstudio-pubs-static.s3.amazonaws.com. 2018. [cited 2022 Jan 1]. Available from: https://rstudio-pubs-static.s3.amazonaws.com/376139_e9adaefdf4594a79a54a3f87ff4852d6.html#:∼:text=Factor%20Analysis%20Model%20Adequacy

[32] 32. Prabhakaran S. 10 Assumptions of Linear Regression - Full List with Examples and Code [Internet]. r-statistics.co. 2016. [cited 2020 Jan 1]. Available from: http://r-statistics.co/Assumptions-of-Linear-Regression.html

[33] 33. Han J, Kamber M. Data Mining: Concepts and Techniques. 3rd ed. Waltham, MA, USA: Elsevier; 2012

[34] 34. Gautam A, Singh V. Parametric versus non-parametric time series forecasting methods: A review. Journal of Engineering Science and Technology Review. 2020;13(3):165-171

[35] 35. Hyndman RJ, Athanasopoulos G. Forecasting: Principles and Practice. 2nd ed OTexts2018. [cited 2021 Jan 1]. Melbourne, Australia: OTexts; 2023. [Internet] Available from: https://otexts.com/fpp2/

[36] 36. DataCamp. Splinefun Function - RDocumentation Stats (version 3.6.2) [Internet]. www.rdocumentation.org. [cited 2022 Jan 1]. Available from: https://www.rdocumentation.org/packages/stats/versions/3.6.2/topics/splinefun

[37] 37. Moufaddal M, Benghabrit A, Bouhaddou I. Big Data Analytics for Supply Chain Management. Cham: Springer; 2018. [cited 2021 Jan 1]. pp. 976-986. Available from:. DOI: 10.1007/978-3-319-74500-8_87

[38] 38. Statistic Solutions. Sample Size Calculation [Internet]. Statisticsolutions.com. [cited 2021 Jan 1]. Available from: https://www.statisticssolutions.com/sample-size-calculation-2/