Open access peer-reviewed chapter

Seeking Accuracy in Forecasting Demand and Selling Prices: Comparison of Various Methods

By Zineb Aman, Latifa Ezzine, Yassine Erraoui, Younes Fakhradine El Bahi and Haj El Moussami

Submitted: May 12th 2020; Reviewed: June 12th 2020; Published: January 27th 2021

DOI: 10.5772/intechopen.93171

Abstract

A good forecast estimate is essential for managing flows in a supply chain. Forecasts must therefore be produced and integrated into flow control models, particularly where demand is highly variable. However, forecasts are never perfectly reliable, so their quality must be measured by quantifying the uncertainty attached to each estimate. Many forecasting models have been developed in the past, particularly in the statistical field. In the first section of this chapter, we present forecasting models together with their validation and monitoring. We then turn to applications on real industrial cases: a prospective study of demand forecasting and a comparative study of selling price forecasts.

Keywords

  • forecasts
  • accuracy
  • quality of forecasts
  • demand forecasting
  • selling price forecasting

1. Introduction

For most companies, forecasting is a prerequisite for effective supply chain management. As explained by Lai et al. [1], forecasting is the basis of all production management systems. The entire supply chain is based on the data from forecast models.

In Ref. [2], the authors show the usefulness of forecasting and planning as a decision-making tool for organizing the supply chain across all time horizons and at all levels.

Forecasting occupies an important place in the academic field. Given its central role, it is not surprising that many models have been developed since the beginning of the twentieth century; research expanded mainly from the 1950s onward with the use of mathematical models. In a literature review, Stadtler [3] highlights the value of forecasting for the global supply chain: it helps integrate the different organizations and coordinate their flows in order to satisfy the end consumer.

The data sources for these forecasts are located throughout the supply chain, including the commercial side of the business. Analyzing these sources provides the basis for future forecasts, so the sources used to build forecasts are multiple.

2. Application to prospective approach: modeling and forecasting demand using the ARIMA models

In the manufacturing sector, forecasting demand is one of the most crucial problems in inventory management [4]; it can be used in various operational planning activities during the production process: capacity planning and management of used product acquisitions [5].

For both push and pull supply chain processes, demand forecasting forms the basis of all supply chain (SC) planning. The “pull” processes in the SC are performed in response to the client’s request, while all the “push” processes are performed in anticipation of it [6]. A business needs to know many factors related to demand forecasting, including the following:

  • past requests;

  • product delivery time;

  • planned advertising or marketing efforts;

  • state of the economy;

  • planned price reductions; and

  • actions undertaken by competitors.

Businesses need to understand these factors before they can choose an appropriate forecasting method as it can be difficult to decide which method is the most suitable for forecasting. Forecasting methods are classified into the following types: time series, causal, qualitative, and simulation [6].

A time series is a set of observations recorded in chronological order [7]. Time series forecasting models predict demand from historical data; these mathematical models assume that the future is an extension of the past [8].

Numerous studies on demand forecasting by time series analysis have been carried out in several fields. They include demand forecasts for food sales [9], tourism [10], spare parts [4, 11], electricity [12, 13], automobiles [14], and some other goods and services [15, 16, 17].

In this section, we forecast the demand for a product of a food manufacturing company using real data, and we assess the precision and characteristics of these forecasts.

Our study follows the three stages of the Box-Jenkins approach: identification, estimation, and verification. The product demand from January 2010 to December 2015 is shown in Figure 1.

Figure 1.

Evolution of the final product’s sales.

2.1 Identification of model

This stage consists of the initial preprocessing of the data to make it stationary and the choice of the p and q values, which can be adjusted during model fitting.

The ACF and PACF of the series are presented in Figures 2 and 3, respectively. The series oscillates around a mean value, and its autocorrelation function decays rapidly toward zero, which suggests that the series is stationary.

Figure 2.

ACF correlograms of the demand series.

Figure 3.

PACF correlograms of the demand series.
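Correlograms like those of Figures 2 and 3 can also be computed outside SPSS. The sketch below is a minimal pure-Python version, assuming the demand series is available as a list of floats; the names `acf`, `pacf` and `significance_band` are ours. It uses the usual biased ACF estimator and obtains the PACF through the Durbin-Levinson recursion, so small differences from SPSS output are possible.

```python
import math


def acf(x, max_lag):
    """Sample autocorrelations r_1..r_max_lag (biased estimator, as in most packages)."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(d * d for d in dev) / n
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / n / c0
            for k in range(1, max_lag + 1)]


def pacf(x, max_lag):
    """Partial autocorrelations from the Durbin-Levinson recursion on the sample ACF."""
    r = acf(x, max_lag)          # r[k - 1] is the autocorrelation at lag k
    phi_prev = []                # coefficients phi_{k-1, 1..k-1} of the previous order
    out = []
    for k in range(1, max_lag + 1):
        if k == 1:
            phi_kk = r[0]
            phi_curr = [phi_kk]
        else:
            num = r[k - 1] - sum(phi_prev[j] * r[k - 2 - j] for j in range(k - 1))
            den = 1.0 - sum(phi_prev[j] * r[j] for j in range(k - 1))
            phi_kk = num / den
            phi_curr = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j]
                        for j in range(k - 1)] + [phi_kk]
        out.append(phi_kk)
        phi_prev = phi_curr
    return out


def significance_band(n):
    """Approximate 95% band +/- 1.96 / sqrt(n) drawn on correlograms."""
    return 1.96 / math.sqrt(n)
```

A lag whose ACF or PACF value falls outside ±`significance_band(len(x))` is conventionally read as significant at about the 5% level.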

In addition, to assess whether the data come from a stationary process, we performed the Dickey-Fuller unit root test in XLSTAT; the results are summarized in Table 1.

  1. H0: The series has a unit root.

  2. H1: The series does not have a unit root. The series is stationary.

| Statistic | Value |
|---|---|
| Tau (observed value) | −1.350 |
| Tau (critical value) | −0.717 |
| p (one-tailed) | 0.844 |
| α | 0.05 |

Table 1.

Test results.

The null hypothesis H0 cannot be rejected, since the calculated p value is greater than the significance level α = 0.05. The risk of rejecting H0 when it is true is 84.38%.
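The tau statistic in Table 1 comes from a regression of the first difference of the series on its lagged level. The sketch below computes it for the simplest, non-augmented Dickey-Fuller regression with a constant; the chapter does not state whether XLSTAT adds lagged differences (the augmented test), so its values may differ. Under H0, tau does not follow a Student distribution: critical values come from Dickey-Fuller (MacKinnon) tables, about −2.86 at 5% for large samples with a constant, and are not computed here. The function name is hypothetical.

```python
import math
import random


def dickey_fuller_tau(y):
    """Tau statistic of the non-augmented Dickey-Fuller regression with a constant,
    dy_t = a + gamma * y_{t-1} + e_t, testing H0: gamma = 0 (unit root)."""
    x = y[:-1]                                        # lagged levels y_{t-1}
    z = [y[t] - y[t - 1] for t in range(1, len(y))]   # first differences dy_t
    m = len(z)
    x_bar, z_bar = sum(x) / m, sum(z) / m
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxz = sum((xi - x_bar) * (zi - z_bar) for xi, zi in zip(x, z))
    gamma = sxz / sxx                                 # OLS slope on y_{t-1}
    a = z_bar - gamma * x_bar                         # OLS intercept
    rss = sum((zi - a - gamma * xi) ** 2 for xi, zi in zip(x, z))
    se_gamma = math.sqrt(rss / (m - 2) / sxx)         # standard error of gamma
    return gamma / se_gamma


if __name__ == "__main__":
    rng = random.Random(0)
    noise = [rng.gauss(0.0, 1.0) for _ in range(200)]   # stationary series
    walk = [0.0]
    for e in noise:
        walk.append(walk[-1] + e)                        # random walk (unit root)
    print("white noise:", round(dickey_fuller_tau(noise), 2))
    print("random walk:", round(dickey_fuller_tau(walk), 2))
```

A strongly negative tau points toward stationarity, while a random walk typically gives a value close to zero.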

In our study, we checked the stationarity of the series, and the ACF and PACF correlograms indicate that the model cannot be a pure AR or a pure MA process. We therefore tested several models to identify the one best suited to the series.

2.2 Estimation of model coefficients

Using the ARIMA procedure of the SPSS time series module [18], we estimate the coefficients of the model by supplying the orders p, d, and q [19, 20, 21, 22].

The best model is as simple as possible and minimizes the selection criteria, namely the AIC (Akaike information criterion), the SBC (Schwarz Bayesian criterion), and the residual variance, while maximizing the likelihood [23, 24, 25]. The chosen model is ARIMA (1, 0, 1) with a constant. For the other models, either some Student T-RATIO values fall within ±1.96 (coefficients not significant), or at least one of the minimization criteria is higher than that of the ARIMA (1, 0, 1) model with a constant.

Table 2 presents the values of the different models. From this table, we choose the appropriate model on which we will base ourselves to make our forecasts.

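| Characteristic | Statistic | ARIMA (1,0,2) | ARIMA (2,0,2) | ARIMA (1,0,1) | ARIMA (1,0,0) | ARIMA (0,0,1) | ARIMA (1,0,1) without constant |
|---|---|---|---|---|---|---|---|
| AR (1) | α1 | 0.92913 | 0.71371 | 0.90792 | 0.49434 | | 0.99755623 |
| | SEB | 0.104616 | 0.761758 | 0.094852 | 0.1074471 | | 0.00444769 |
| | T value | 8.8813204 | 0.9369292 | 9.571955 | 4.600820 | | 224.28617 |
| | p value | 0.00000000 | 0.35216008 | 0.0000000 | 0.00001823 | | 0.0000000 |
| MA (1) | θ1 | 0.52269 | 0.31779 | 0.63880 | | −0.41704 | 0.71392452 |
| | SEB | 0.167073 | 0.741595 | 0.161531 | | 0.1119989 | 0.08579173 |
| | T value | 3.1284995 | 0.4285186 | 3.954655 | | −3.723567 | 8.32160 |
| | p value | 0.00258711 | 0.66964815 | 0.00018319 | | 0.00039384 | 0.0000000 |
| AR (2) | α2 | | 0.19759 | | | | |
| | SEB | | 0.659442 | | | | |
| | T value | | 0.2996279 | | | | |
| | p value | | 0.76538859 | | | | |
| MA (2) | θ2 | 0.17062 | 0.30202 | | | | |
| | SEB | 0.142258 | 0.409353 | | | | |
| | T value | 1.1993429 | 0.7377864 | | | | |
| | p value | 0.23455708 | 0.46322050 | | | | |
| Constant | Cte | 124.42969 | 124.52640 | 125.53260 | 128.53887 | 129.22650 | |
| | SEB | 12.608189 | 12.601296 | 11.785537 | 6.5715235 | 4.8971088 | |
| | T value | 9.8689581 | 9.8820312 | 10.651411 | 19.559981 | 26.388326 | |
| | p value | 0.00000000 | 0.00000000 | 0.00000000 | 0.00000000 | 0.00000000 | |
| AIC | | 688.86593 | 690.82312 | 688.77347 | 689.37103 | 693.59055 | 692.04831 |
| SBC | | 697.9726 | 702.20645 | 695.60347 | 693.92437 | 698.14388 | 696.60164 |
| Log likelihood | | −340.43297 | −340.41156 | −341.38674 | −342.68552 | −344.79527 | −344.02415 |
| Error | | 28.034898 | 28.221721 | 28.214812 | 28.576249 | 29.443759 | 28.461048 |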

Table 2.

Coefficients of different models.
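The criteria in Table 2 can be checked by hand. Assuming AIC = −2 ln L + 2k and SBC = −2 ln L + k ln n, where k counts the estimated coefficients (AR, MA and constant terms) and n = 72 monthly observations (January 2010 to December 2015), the sketch below reproduces the AIC and SBC rows of Table 2 to within rounding. These formulas and the values of k and n are our reading of the table; the chapter does not state them.

```python
import math

# Log-likelihoods from Table 2 and the number k of estimated coefficients
# (AR, MA and constant terms), read off each model's orders.
MODELS = {
    "ARIMA (1,0,2)": (-340.43297, 4),
    "ARIMA (2,0,2)": (-340.41156, 5),
    "ARIMA (1,0,1)": (-341.38674, 3),
    "ARIMA (1,0,0)": (-342.68552, 2),
    "ARIMA (0,0,1)": (-344.79527, 2),
    "ARIMA (1,0,1) without constant": (-344.02415, 2),
}
N_OBS = 72  # assumed: monthly observations, January 2010 to December 2015


def aic(log_lik, k):
    """Akaike information criterion."""
    return -2.0 * log_lik + 2.0 * k


def sbc(log_lik, k, n):
    """Schwarz Bayesian criterion."""
    return -2.0 * log_lik + k * math.log(n)


for name, (ll, k) in MODELS.items():
    print(f"{name:32s} AIC={aic(ll, k):.5f} SBC={sbc(ll, k, N_OBS):.5f}")
```

Under these formulas, ARIMA (1,0,1) has the lowest AIC, while ARIMA (1,0,0) has the lowest SBC (693.92 against 695.60), so the final choice also relies on the significance of the coefficients.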

Table 2 shows that the ARIMA (1,0,1) model is selected because all its coefficients are significantly different from 0 according to the Student test (|T-RATIO| ≥ 1.96), with an acceptable goodness of fit.

The model residuals are stationary and behave as a white noise process, remaining within ±40. The histogram of the residuals shows whether their distribution approximates a normal distribution. In our case, the residuals are approximately normally distributed around zero, with relatively low dispersion, at the 5% risk level.
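The residuals are assessed visually here. A common numerical complement, not used in the chapter, is the Ljung-Box portmanteau statistic Q = n(n + 2) Σ r_k²/(n − k) over the first h residual autocorrelations r_k. Under the white-noise hypothesis, Q is approximately chi-square distributed with h − p − q degrees of freedom for an ARMA(p, q) fit. The sketch assumes the residuals are available as a list; the function names are hypothetical.

```python
def ljung_box_q(residuals, max_lag):
    """Ljung-Box statistic Q = n (n + 2) * sum_{k=1..h} r_k^2 / (n - k)."""
    n = len(residuals)
    mean = sum(residuals) / n
    dev = [e - mean for e in residuals]
    c0 = sum(d * d for d in dev)
    q = 0.0
    for k in range(1, max_lag + 1):
        r_k = sum(dev[t] * dev[t + k] for t in range(n - k)) / c0
        q += r_k * r_k / (n - k)
    return n * (n + 2) * q


# 95th percentile of the chi-square distribution with 8 degrees of freedom,
# i.e. h - p - q = 10 - 1 - 1 for an ARMA(1,1) fit checked over h = 10 lags.
CHI2_95_8DF = 15.507


def looks_like_white_noise(residuals, max_lag=10, critical=CHI2_95_8DF):
    """True if Q does not exceed the critical value (whiteness not rejected at 5%)."""
    return ljung_box_q(residuals, max_lag) <= critical
```

A strongly patterned residual series, such as one that alternates in sign, gives a Q far above the critical value.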

The chosen model parameters are presented in Table 3.

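| Parameter | Statistic | Value |
|---|---|---|
| AR (1) | α1 | 0.90792 |
| | SEB | 0.094852 |
| | T value | 9.571955 |
| | p value | 0.00000000 |
| MA (1) | θ1 | 0.63880 |
| | SEB | 0.161531 |
| | T value | 3.954655 |
| | p value | 0.00018319 |
| Constant | δ | 125.53260 |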

Table 3.

ARIMA model parameters.

The developed model is given by Eq. 1.

$$y_t = \delta + \alpha_1 y_{t-1} - \theta_1 \varepsilon_{t-1} + \varepsilon_t \qquad (1)$$

With:

  • $y_t$, $y_{t-1}$: sales in periods t and t − 1, respectively.

  • $\varepsilon_t$, $\varepsilon_{t-1}$: residuals of periods t and t − 1, which constitute a white noise process.

  • $\alpha_1$, $\theta_1$: coefficients of the autoregressive and moving average terms, respectively.

The coefficients of the autoregressive and moving average terms can be read directly from Table 3 and substituted into Eq. (1), which becomes:

$$y_t = 125.5326 + 0.90792\, y_{t-1} - 0.6388\, \varepsilon_{t-1} + \varepsilon_t \qquad (2)$$

2.3 Accuracy of ARIMA (1, 0, 1) model

To assess the accuracy of the developed model, we compare the observed and simulated sales over the same period. This comparison, drawn up in Table 4, indicates that the selected model is precise and able to reproduce the dynamic behavior of sales. The model can therefore be used to analyze and model demand at this food manufacturer.

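| Model (period) | 73 | 74 | 75 | 76 | 77 | 78 | 79 | 80 | 81 | 82 |
|---|---|---|---|---|---|---|---|---|---|---|
| Sales-Model_1 forecast | 95.12 | 97.92 | 100.46 | 102.77 | 104.86 | 106.77 | 108.49 | 110.06 | 111.49 | 112.78 |
| UCL | 151.41 | 156.21 | 160.35 | 163.95 | 167.08 | 169.83 | 172.25 | 174.38 | 176.26 | 177.93 |
| LCL | 38.83 | 39.63 | 40.57 | 41.59 | 42.64 | 43.70 | 44.74 | 45.75 | 46.71 | 47.63 |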

Table 4.

Forecast sales from January 2016 to October 2016.
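The forecasts in Table 4 can be reproduced to within 0.01 by a simple recursion, provided the constant of Table 3 is read as the mean μ of the series, that is, y_t − μ = α1(y_{t−1} − μ) − θ1 ε_{t−1} + ε_t. This reading is our inference from the numbers, not a statement of the chapter: taking δ = 125.5326 literally in Eq. (1) would imply a long-run level δ/(1 − α1) of about 1363, far above the forecasts. Beyond the first step, expected future shocks are zero, so the moving average term drops out and only the autoregressive recursion remains. The sketch starts from the first published forecast (95.12), because the last residual needed for the first step is not reported; the function name is hypothetical.

```python
MU = 125.53260      # constant of Table 3, read here as the series mean (assumption)
ALPHA1 = 0.90792    # AR(1) coefficient from Table 3


def forecast_from(first_forecast, steps):
    """h-step forecasts of an ARMA(1,1) written in deviations from the mean.
    For h >= 2 the MA term vanishes (future shocks have zero expectation),
    so each forecast is MU + ALPHA1 * (previous forecast - MU)."""
    path = [first_forecast]
    for _ in range(steps - 1):
        path.append(MU + ALPHA1 * (path[-1] - MU))
    return path


table4 = [95.12, 97.92, 100.46, 102.77, 104.86, 106.77, 108.49, 110.06, 111.49, 112.78]
for period, (ours, published) in enumerate(zip(forecast_from(95.12, 10), table4), start=73):
    print(f"period {period}: recursion {ours:8.3f}   Table 4 {published:8.2f}")
```

The recursion also makes visible why the forecasts rise steadily toward μ: each step closes about 9% (1 − α1) of the remaining gap.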

Figure 4 supports the validity of the model: the observed demand fluctuates around the fitted values, and the forecast demand remains between the upper and lower limits.

Figure 4.

Sales, fit, LCL, and UCL.

The error varies but remains within the tolerance range. To reduce it further, we plan to explore other approaches in future work.

2.4 Forecast

Once the appropriate model has been defined and validated, we generate the forecasts using IBM SPSS Forecasting. Table 4 and Figure 5 present the sales forecasts obtained by applying the ARIMA (1, 0, 1) model over the next 10 months, from January 2016 to October 2016.

Figure 5.

Sales, fit, LCL, UCL, and forecasting.

The chosen model can therefore be used to model and forecast future demand at this food manufacturer. However, the historical data must be updated with new observations each time, so that the model can be re-estimated and the forecasts improved.

The accurate forecasts presented here facilitated production decisions in this business. Once demand has been forecast, production can be planned much more clearly, which helps avoid heavy cost losses.

3. Application to comparative approach: comparison of the quality of forecasts obtained in the context of forecasting selling prices

Our second industrial application is devoted to a modeling study and comparative forecast of sales prices using ARIMA models, artificial neural networks, and support vector machines.

In this section, we model actual fuel price data, named “SSP”, in order to forecast future selling prices. Figure 6 shows the selling price of “SSP” fuel at a petroleum company from January 2012 to December 2016.

Figure 6.

Selling price of “SSP.”

3.1 Forecasting using ARIMA models

3.1.1 Determination of the differencing parameter

Using SPSS, we plotted the autocorrelation function (ACF) and the partial autocorrelation function (PACF) of the series; the results are presented in Figures 7 and 8.

Figure 7.

ACF correlogram for the sales price series.

Figure 8.

PACF correlogram for the sales price series.

The autocorrelation function remains positive over a large number of lags, so the series must be differenced.

The next step is to difference the series: enough to make it stationary, but without over-differencing, which causes a loss of information and therefore unstable models. In our case, d = 1 was sufficient because the trend is linear.
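Differencing applies the operator (1 − B)^d to the series, where B is the backshift operator. The minimal sketch below (the function name is ours) shows why d = 1 suffices for a linear trend: one difference turns it into a constant, whereas a second difference is only needed for a quadratic trend. Over-differencing white noise, by contrast, introduces a lag-1 autocorrelation of −0.5, one symptom of the instability mentioned above.

```python
def difference(series, d=1):
    """Apply the differencing operator (1 - B)^d to a series (B = backshift)."""
    out = list(series)
    for _ in range(d):
        out = [out[t] - out[t - 1] for t in range(1, len(out))]
    return out


# Illustrative series with a linear trend (not the SSP data): one difference
# leaves a constant, so d = 1 is enough; a second difference removes nothing useful.
trend = [1000 + 2 * t for t in range(10)]
print(difference(trend))        # constant increments of 2
print(difference(trend, d=2))   # all zeros
```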

In addition, to decide whether the data come from a stationary process, we performed the Dickey-Fuller unit root test in XLSTAT; the results are summarized in Table 5.

  1. H0: The series has a unit root.

  2. H1: The series does not have a unit root. The series is stationary.

| Statistic | Value |
|---|---|
| Tau (observed value) | −4.0325 |
| Tau (critical value) | −0.7648 |
| p (one-tailed) | 0.0092 |
| α | 0.05 |

Table 5.

Test results.

The null hypothesis H0 must be rejected and the alternative hypothesis H1 accepted, since the calculated p value is less than the significance level α = 0.05. The risk of rejecting H0 when it is true is less than 0.92%.

We conclude that the model will have an order of differencing d = 1. We also note that the T-RATIO for the model constant μ is less than 2 in absolute value, so the constant must be removed from the model before determining the parameters p and q.

3.1.2 Determination of the autoregressive parameter

Figures 9 and 10 show the residual plot and the ACF and PACF of the residuals of the ARIMA (0, 1, 0) model, respectively.

Figure 9.

Residue curve.

Figure 10.

ACF and PACF diagrams of the residues of the ARIMA model (0,1,0).

Figures 9 and 10 show that the partial autocorrelation has a significant peak at lag 2, from which we deduce that the differenced series has an autoregressive signature. The parameter p is therefore set to 1.

However, the T-RATIO of the autoregressive parameter φ1 of the resulting ARIMA (1, 1, 0) model is lower than 2 in absolute value, so this model cannot be retained. Similarly, the autoregressive parameters of the ARIMA (2, 1, 0) model have T-RATIOs below 2 in absolute value.

3.1.3 Determination of the moving average parameter

For the ARIMA (0, 1, 1) model, the T-RATIO of the moving average parameter θ1 is also lower than 2 in absolute value, so this model cannot be retained either. Similarly, the moving average parameters of the ARIMA (0, 1, 2) model have T-RATIOs below 2 in absolute value.

3.1.4 Mixed ARIMA model

After several iterations and tests, we found that the ARIMA (1,1,1) model is the only one whose T-RATIOs all exceed 2 in absolute value. This is the model we use to make forecasts.

With the estimated coefficients, the retained model can be written as follows:

$$y_t = y_{t-1} - 0.928\,(y_{t-1} - y_{t-2}) + 0.873\,\varepsilon_{t-1} + \varepsilon_t \qquad (3)$$
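Eq. (3) can be iterated to produce multi-step forecasts. In the sketch below, future shocks ε are set to their expected value of zero, so the moving average term only enters the first step; the signs follow Eq. (3). The starting values are illustrative, not data from the chapter, and the sketch does not attempt to reproduce Table 6, whose forecasts were produced in SPSS. The names `PHI`, `THETA_TERM` and `arima111_forecast` are hypothetical.

```python
PHI = -0.928        # coefficient of (y_{t-1} - y_{t-2}) in Eq. (3)
THETA_TERM = 0.873  # coefficient of eps_{t-1} in Eq. (3)


def arima111_forecast(y_last, y_prev, eps_last, steps):
    """Iterate Eq. (3): y_t = y_{t-1} + PHI * (y_{t-1} - y_{t-2}) + THETA_TERM * eps_{t-1} + eps_t.
    Future shocks are replaced by their expected value, zero, so the
    moving average term only contributes to the first step."""
    forecasts = []
    for h in range(steps):
        eps = eps_last if h == 0 else 0.0
        y_next = y_last + PHI * (y_last - y_prev) + THETA_TERM * eps
        forecasts.append(y_next)
        y_prev, y_last = y_last, y_next
    return forecasts


# Illustrative starting values only (not taken from the chapter)
print([round(v, 3) for v in arima111_forecast(1010.0, 1000.0, 0.0, 4)])
```

Because the coefficient on the last change is negative, successive forecast increments alternate in sign and shrink in size.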

Table 6 lists the forecasts obtained for the first quarter of 2017.

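| Fortnight | Real price | Model | % error |
|---|---|---|---|
| January, 1st fortnight | 1072 | 1042.49 | −2.752798507 |
| January, 2nd fortnight | 1074 | 1043.05 | −2.881750466 |
| February, 1st fortnight | 1072 | 1043.59 | −2.650186567 |
| February, 2nd fortnight | 1082 | 1044.21 | −3.492606285 |
| March, 1st fortnight | 1084 | 1044.81 | −3.615313653 |
| March, 2nd fortnight | 1064 | 1045.48 | −1.740601504 |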

Table 6.

Forecast results for the ARIMA model (1,1,1) [26].
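The “% error” column of Table 6 is (model − real)/real × 100, and the mean of its absolute values is the mean absolute percentage error (MAPE). The short computation below, using only the values of Table 6, gives about 2.856%, consistent with the average error of 2.855% quoted for the ARIMA (1,1,1) model in the synthesis. The function names are ours.

```python
# Real prices and ARIMA (1,1,1) forecasts for the first quarter of 2017 (Table 6)
real = [1072, 1074, 1072, 1082, 1084, 1064]
model = [1042.49, 1043.05, 1043.59, 1044.21, 1044.81, 1045.48]


def percent_errors(actual, predicted):
    """Signed forecast error relative to the actual value, in percent."""
    return [100.0 * (p - a) / a for a, p in zip(actual, predicted)]


def mape(actual, predicted):
    """Mean absolute percentage error."""
    errs = percent_errors(actual, predicted)
    return sum(abs(e) for e in errs) / len(errs)


print([round(e, 4) for e in percent_errors(real, model)])
print(f"MAPE = {mape(real, model):.3f} %")
```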

Figure 11 shows that the forecasts of the ARIMA (1,1,1) model are very close to the real prices, which supports the adequacy of the model.

Figure 11.

Results of the ARIMA model (1,1,1) [26].

Table 6 therefore indicates that the chosen model can be used to model and forecast future selling prices for this petroleum company.

3.1.5 Forecasting using artificial neural networks

The goal here is to establish, from experimental data collected from reliable sources, a relationship that estimates fuel selling prices. Because the relationship between the input parameter and the output parameter is complex, we apply radial basis function (RBF) neural networks, a machine learning approach. In this section, we present the modeling approach using this technique so that it can be compared precisely with the ARIMA model of the previous section.

3.1.6 Model development

The radial basis ANN model (comprising two layers) is trained with the back-propagation algorithm to minimize the mean squared error, with one input parameter (time) and the desired output (fuel selling price). As the visualization of the network in Figure 12 shows, the first layer uses radial basis transfer functions with a maximum of 80 neurons, and the second layer has a linear transfer function, in order to build a consistent model that provides accurate forecasts [27].

Figure 12.

Visualization of the RBF network.

Feature selection is one of the core concepts in machine learning and strongly affects model performance: irrelevant or partially relevant features can degrade it. Feature selection and data cleaning should therefore be the first and most important steps of model design. In our case, however, this step can be omitted, since our point cloud (dataset) is sufficiently large. The dataset was then randomly divided into two disjoint subsets: a training set (60% of the total dataset), used to develop the network, and a testing set (40%), used to validate the model found. After the training phase, the reliability and accuracy of the network were assessed on the test data. We implemented the radial basis network with the MATLAB toolbox function “newrb”, using the Gaussian function as the kernel with a width parameter of 1 [27].
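To make the two-layer structure concrete, a minimal pure-Python sketch of an RBF network follows: a hidden layer of Gaussian units and a linear output unit with a bias. It is not MATLAB's `newrb`, which adds hidden neurons one at a time until the mean squared error goal is reached or the maximum number of neurons is hit; here the centres are fixed in advance and the output weights are fitted in one step by least squares, with a tiny ridge term for numerical safety. The Gaussian scaling, chosen so that a unit responds 0.5 at a distance equal to the spread, is our assumption about the convention, and all function names are hypothetical.

```python
import math


def gaussian(r, spread):
    """Gaussian unit whose response falls to 0.5 at distance `spread` (assumed convention)."""
    b = math.sqrt(math.log(2.0)) / spread
    return math.exp(-(b * r) ** 2)


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]   # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_rbf(xs, ys, centers, spread, ridge=1e-8):
    """Gaussian hidden units at fixed centres, then a linear output unit
    (weights + bias) fitted by ridge-regularised least squares."""
    def features(x):
        return [gaussian(abs(x - c), spread) for c in centers] + [1.0]  # last = bias

    phi = [features(x) for x in xs]
    k = len(centers) + 1
    # Normal equations (Phi^T Phi + ridge * I) w = Phi^T y
    ata = [[sum(row[i] * row[j] for row in phi) + (ridge if i == j else 0.0)
            for j in range(k)] for i in range(k)]
    aty = [sum(row[i] * y for row, y in zip(phi, ys)) for i in range(k)]
    w = solve(ata, aty)
    return lambda x: sum(wi * fi for wi, fi in zip(w, features(x)))


# Illustrative data only: five (time, value) points, one hidden unit per point
xs, ys = [0.0, 1.0, 2.0, 3.0, 4.0], [1.0, 3.0, 2.0, 5.0, 4.0]
net = fit_rbf(xs, ys, centers=xs, spread=1.0)
print([round(net(x), 4) for x in xs])
```

With one hidden unit per training point, the network essentially interpolates the training data, which is why a separate test set is needed to judge its forecasting ability.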

After the learning phase, we obtain Figures 13 and 14, which show the training of the network on our dataset; Figure 15 shows the error during the training phase. In the test phase, we supplied values of the input variable and observed the outputs in order to simulate the model.

Figure 13.

Training of the RBF network.

Figure 14.

Training graph of the RBF network.

Figure 15.

The graph of error.

3.1.7 Error optimization

Optimizing the error requires a compromise between the network parameters, namely the spread, the goal (error objective), the maximum number of neurons (MN), and the number of neurons to be added to the hidden layer (DF). This compromise is found by testing different combinations, some of which are presented in Table 7.

Parameters
GoalSpreadMNDF
0 .011.52525
0.0112530
0.0123030
0.010.81230
0.011.571030

Table 7.

Part of different combinations made.

After trying different combinations, we find that the error remains large for all of them: no model fits the time series well, especially in the long term. This is due not only to the large fluctuations in the fuel selling price but also to the share of the dataset used for training (60%), which is not sufficient to predict the remaining 40%. We therefore increased the training share, using 80% of the total dataset for training and 20% for testing. Table 8 summarizes the different combinations [27].

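| Goal | Spread | MN | DF | Relative error (%) |
|---|---|---|---|---|
| 0.01 | 1 | 10 | 30 | 9.37 |
| 0.01 | 0.25 | 25 | 25 | 7.61 |
| 0.01 | 0.5 | 30 | 20 | 5.29 |
| 0.01 | 0.8 | 30 | 20 | 3.21 |
| 0.01 | 1 | 20 | 30 | 1.95 |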

Table 8.

Error comparison for several combinations of parameters.

The combination that minimizes the error is therefore:

  • goal = 0.01;

  • spread = 1;

  • MN = 20; and

  • DF = 30.

We conclude that training on 80% of the dataset gives better results than training on 60%, since the error is lower. The output is calculated and presented in Table 9.

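| Input (time) | Real value of output | Predicted value of output | % error |
|---|---|---|---|
| 83 | 1038 | 1027.2 | 1.04046243 |
| 84 | 1043 | 1044.8 | −0.1725791 |
| 85 | 1035 | 1033.4 | 0.15458937 |
| 86 | 1040 | 1034.3 | 0.54807692 |
| 87 | 1016 | 1034.7 | −1.84055118 |
| 88 | 1015 | 1034.8 | −1.95073892 |
| 89 | 1010 | 1034.9 | −2.46534653 |
| 90 | 1001 | 1034.9 | −3.38661339 |
| 91 | 1031 | 1034.9 | −0.37827352 |
| 92 | 1033 | 1034.9 | −0.1839303 |
| 93 | 1036 | 1034.9 | 0.10617761 |
| 94 | 1030 | 1034.9 | −0.47572816 |
| 95 | 1000 | 1034.9 | −3.49 |
| 96 | 1042 | 1034.9 | 0.68138196 |
| 97 | 1072 | 1034.9 | 3.4608209 |
| 98 | 1074 | 1034.9 | 3.6405959 |
| 99 | 1072 | 1034.9 | 3.4608209 |
| 100 | 1082 | 1034.9 | 4.35304991 |
| 101 | 1084 | 1034.9 | 4.5295203 |
| 102 | 1064 | 1034.9 | 2.73496241 |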

Table 9.

Predicted value of output after using the RBF model.
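As a consistency check, the mean absolute value of the “% error” column of Table 9, computed as (real − predicted)/real × 100, can be obtained directly from the table. The sketch below gives about 1.95%, which agrees with the relative error reported in Table 8 for the retained combination; that the two figures measure the same quantity is our inference.

```python
# Table 9: input (time index), real price and RBF prediction on the test set
table9 = [
    (83, 1038, 1027.2), (84, 1043, 1044.8), (85, 1035, 1033.4), (86, 1040, 1034.3),
    (87, 1016, 1034.7), (88, 1015, 1034.8), (89, 1010, 1034.9), (90, 1001, 1034.9),
    (91, 1031, 1034.9), (92, 1033, 1034.9), (93, 1036, 1034.9), (94, 1030, 1034.9),
    (95, 1000, 1034.9), (96, 1042, 1034.9), (97, 1072, 1034.9), (98, 1074, 1034.9),
    (99, 1072, 1034.9), (100, 1082, 1034.9), (101, 1084, 1034.9), (102, 1064, 1034.9),
]

# Table 9 reports (real - predicted) / real, in percent
errors = [100.0 * (real - pred) / real for _, real, pred in table9]
mape = sum(abs(e) for e in errors) / len(errors)
print(f"MAPE over {len(errors)} test points: {mape:.2f} %")
```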

Table 9 shows that the selected model can be used to model and forecast future selling prices for this petroleum company. Finally, we apply support vector machines to see what results this method gives.

3.1.8 Forecasting using support vector machines (SVMs)

The aim here is again to establish, from experimental data collected from reliable sources, a relationship that estimates the selling price of fuel. We apply support vector machines, a machine learning approach, because of the complex relationship between the input parameter and the output.

We prepared our database and then wrote the program in Python, which we ran in the Spyder environment.

We imported the dataset (the actual prices of the fuel studied) and indexed its values. We then standardized the data for the learning process carried out with the SVR function and divided the database into a learning part and a test part. We tried two splits: (1) 60% of the database for learning and 40% for testing and (2) 80% for learning and 20% for testing. Based on the results obtained after running the program, we kept the second split. We then trained the model on “Train X” and “Train Y”, ran the test, computed the average error, and obtained the values predicted in the test phase, which are shown in Figure 16.

Figure 16.

Results of the SVR function.
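The data preparation steps described above can be sketched in plain Python. The helper names are hypothetical, the split shown is chronological (the chapter does not say whether the SVR split was random, as it was for the RBF network), and the scaling statistics are estimated on the training part only, so that no information from the test period leaks into training. The SVR fit itself would then be run on the scaled data with a library such as scikit-learn (`sklearn.svm.SVR`), which is not reproduced here.

```python
def chronological_split(values, train_fraction=0.8):
    """Keep the first `train_fraction` of the series for training, the rest for testing."""
    cut = int(round(len(values) * train_fraction))
    return values[:cut], values[cut:]


class Standardizer:
    """z-score scaling whose mean and standard deviation come from the training part only."""

    def fit(self, values):
        n = len(values)
        self.mean = sum(values) / n
        self.std = (sum((v - self.mean) ** 2 for v in values) / n) ** 0.5
        return self

    def transform(self, values):
        return [(v - self.mean) / self.std for v in values]

    def inverse(self, values):
        """Map scaled predictions back to price units."""
        return [v * self.std + self.mean for v in values]


# A few real prices from Table 9, used here for illustration only
prices = [1038.0, 1043.0, 1035.0, 1040.0, 1016.0, 1015.0, 1010.0, 1001.0, 1031.0, 1033.0]
train, test = chronological_split(prices, 0.8)
scaler = Standardizer().fit(train)
train_z, test_z = scaler.transform(train), scaler.transform(test)
```

Predictions made on the scaled test inputs would be converted back with `scaler.inverse` before computing the error in price units.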

The average error is equal to 26.882361, which represents 2.53%. The error graph is shown in Figure 17.

Figure 17.

Error graph.

The chosen model can therefore be used to model and forecast future selling prices for this petroleum company, since the observed error (2.53%) is within the 3% margin of error allowed by the company. In addition, the SVR function provides good precision, with a lower error than the ARIMA model.

3.2 Synthesis

In the first industrial application of this chapter, we modeled demand using ARIMA models. The resulting model allows the company to forecast demand accurately.

In the second application, we studied the selling prices of the SSP via three methodologies: ARIMA, RBF, and SVMs.

First, we developed an ARIMA model based on historical data. This study led to the ARIMA (1,1,1) model, which gives gasoline price forecasts within the target margin for the first quarter of 2017, with an average error of 2.855%. Second, we used the RBF technique to improve the modeling and forecasting of the fuel selling price. This technique proved stronger, reducing the error further: 1.95% instead of 2.855% for the ARIMA model. Finally, we used the SVM function. Its forecasts are satisfactory because they respect the margin tolerated by the company; the error of the SVM function is around 2.53%.

In summary, the SVM function performed better than ARIMA, reducing the error to 2.53% instead of 2.855%, but its error remains higher than that obtained with the RBF technique.

4. Conclusion

For most companies, forecasting is a prerequisite for effective supply chain management. Forecasting is the basis of all production management systems. The entire supply chain is based on data from forecast models.

In this chapter, we have presented a study of demand and selling price forecasting in industrial companies. We also carried out a comparative study aimed at minimizing the forecast error in order to improve forecast accuracy.

In the first part, we modeled the future demand of a food company using ARIMA models based on the Box-Jenkins methodology. The resulting model allows the company to forecast demand accurately and can be used to model and forecast future demand for this agribusiness, provided the historical data are updated with new observations each time.

Second, we compared the quality of the forecasts obtained in the context of forecasting selling prices, applying three different methodologies to forecast prices in a company operating in the petroleum sector.

We developed an ARIMA model based on historical data, determining the optimal autoregressive, moving average, and differencing parameters in order to make predictions. The ARIMA (1,1,1) model gives gasoline price forecasts for the first quarter of 2017 with an average error of 2.855%, within the margin of error tolerated by the company (±3%). In addition, the hypothesis that the residuals are Gaussian white noise was verified in every case.

We then forecast selling prices with the RBF technique in order to improve on the previous modeling and forecasting, developing an RBF network based on historical data to compare forecast performance. This technique proved effective and reduced the error to 1.95%, versus 2.855% for the ARIMA model.

Finally, we studied the SSP selling prices with the SVM function. We prepared our database and then wrote the program in Python, which we ran in Spyder. The forecasts are satisfactory with regard to the constraint imposed by the company (±3% margin of error): the error of the SVM function is around 2.53%. The SVM function therefore reduced the error compared with the ARIMA model (2.53% instead of 2.855%), but its error remains higher than the 1.95% obtained with the RBF neural network.

© 2020 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution 3.0 License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

How to cite and reference

Zineb Aman, Latifa Ezzine, Yassine Erraoui, Younes Fakhradine El Bahi and Haj El Moussami (January 27th 2021). Seeking Accuracy in Forecasting Demand and Selling Prices: Comparison of Various Methods, Forecasting in Mathematics - Recent Advances, New Perspectives and Applications, Abdo Abou Jaoude, IntechOpen, DOI: 10.5772/intechopen.93171. Available from:
