## 1. Introduction

When distributors and wholesalers seek help with issues relating to inventory management, they are usually concerned about an increasing level of out-of-stocks or over stocking. Out of stocks are leading to sales loss and customer service complaints. Over-stocks are resulting in slow inventory turnover and a buildup of dead inventory. In fact, out-of-stocks and over-stocks are actually a flip side of the same inventory management coin. Any effective initiative to resolve these issues must address core structural causes of these inventory management problems. Superior inventory management begins with timely, accurate, detailed demand forecasts.

Over last decade demand forecasting has played a prominent role in the corporations worldwide. Corporate executives have spent millions of dollars and invested thousands of man-hours trying to improve methods used & complicate it more. In each case little attention was paid to the integration between drivers, inputs and demand forecast (Harrison & Qizhong, 1993). In the face of all these advancements in hardware and software forecast error still remain high.

The inaccuracy in the forecast is due to previous researchers focused on statistical methods and their improvements only. There was no effort on the modeling of the problem and how to build an expert system to interact properly with the dynamic changes of the supply chain (Ajoy & Dobrivoje, 2005). The forecasting model is not treated as enterprise system has its specifications and constraints which are modeled and simulated.

In this research we propose a design of expert demand forecast system which is designed after deep understanding of demand cycle within dynamic supply chain and interaction between different parameters within the supply chain. It is utilizing Bayesian vector auto regression, restricted vector auto regression, and kernel fisher discriminant analysis (Scholkopf & Smola, 1998), (Scholkopf et al., 1999) with improved genetic algorithm to filter, analyze inputs and factors affecting demand along with demand history and then generate baseline and operational forecasts. This model proposes new mathematical and expert modeling methodology to generate forecasts. We used a practical case study from international FMCG (Fast Moving Consumer Goods) industry using over 1000 product types and results show that a significant forecast accuracy and other supply chain key performance indicators improvements over one year months rolling.

The proposed model is composed of the integration between statistical and intelligent methods with expert input to generate more accurate demand forecasts. The inputs to the model are history demand time series, demand factors history series and the expert inputs. The process starts with calculating the effects of demand factors on the demand series, which is followed by eliminating the effects of these factors. The next step is to perform the forecasting of raw or baseline demand. Finally, perform the estimation of the best collection of factors and activities in future based on history and expert inputs or using the judgmental input to adjust the activities collection. The outcome of this process is final operational forecast.

## 2. The detailed framework

The framework of the proposed demand planning model consists of three sub models; the first sub model is called “Factors Classifying Model”. Its inputs are history demand series and demand factors. The model outputs are cleaned history demand and Regular Factors Matrix (RFM) of possible factors on the demand series. The first model is responsible for the following consecutive steps: (1) evaluating real demand by eliminating the effects of the unavailability; (2) calculating the effects of demand factors on the cleaned sales using multiple regression models; (3) establishing the knowledge base which is updated after each run of the model; and (4) classifying input factors based on effect on demand.

The second sub model is called “Intelligent Time Series Model”. Its inputs are the cleaned history demand series, RFM. It is responsible for calculating time series components (trend, seasonality, and cycles) of the real demand and calculating raw or baseline forecast which represents demand without the effects of demand factors. Baseline is calculated by combining selected statistical methods (Bovas & Johannes, 1983): simple exponential smoothing, winter's additive and winter's multiplicative methods, other techniques can be found in (Allan, 2005). The best combination is obtained by calculating the best set of weights for each method using genetic algorithm which minimizes MSE.

Finally, the last sub model is the “Intelligent Combining Model”. Its inputs are the generated baseline forecast, RFM. And its output is the final forecast including forecast factors. The model compares factors using genetic algorithm, which minimizes the cost and increases the profit of forecast.

The final outcome of the model is the final demand forecast (Operational Forecast) and activities which maximize profit. Operational forecast is the summation of baseline forecast and activities forecast. The model can be further tuned using opinions of experienced people, which can change any of the activities followed by recalculation of the forecast model based on the new suggested parameters by experts. The proposed model is shown in Fig.1.

The following sections explain the construction of domain knowledge for demand factors which is used throughout the model. This is followed by the explanations of the first sub model (i.e. factors classifying model) in details. Then the second sub model (i.e. intelligent times series model) is explained along with the different statistical methods. And then the proposed genetic algorithm is explained using selected case study.

## 3. The domain knowledge & data structure

As explained in the introduction, the model inputs are divided into three categories: demand series, demand factors, and setup parameters. Demand series is the history demand time series which is considered as numbers exist in a time frame. Demand factors such as

activities or temperature. Set up parameters such as forecast horizon and history time buckets.

To ensure clear structuring of demand factors, it is needed to construct knowledge base, which is based on consumer products industry. It can be further tuned to fit another industry.

The following are the advantages behind the proposed knowledge base:

Explicitly define suitable structure of possible demand factors;

Enable the reuse of domain knowledge;

Clearly explain the different assumptions and intentions of model parameters;

Facilitate the analysis of domain knowledge.

### 3.2. “Demand Factors” classes

The classes of the demand factors domain knowledge are divided based on how they will be handled and analyzed in the model. Detailed explanations about the different demand factors with examples can be found in (Flores & Olson, 1992). Demand factors are divided into controllable and uncontrollable as shown in Fig.2.

### 3.2. Controllable factors

Controllable factors are those that expert can control their timing and amount. During the forecast process, the proposed model calculates their effect percentages on demand series using multiple regression methods then forecast their time and amount which maximizes the profit.

Controllable factors are divided into multi value and one value. Multiple values mean its values are ranged from multiple numbers like percentage of discount, e.g. 2%, 4%, or any other value. One value means it occurs or not, it can be explained by ON / OFF, where ON is mapped to 1 and OFF is mapped to 0.

Multi value division is divided into:

Commission (CM): which indicates percentage of price discount given for certain period, and it can be maximum 100% and minimum 0%

Availability (AV): which indicated percentage of product availability, its maximum limit is 100% and its minimum limit is 0%.

One value division is divided into:

Consumer Promotion (CP): which are the promotions done to the consumers, like “chance to win” so its value is done or not done.

Advertising (AD): Like TV or radio advertising, or street shows, also its value is done or not done.

### 3.3. Uncontrollable factors

Uncontrollable factors means we can’t control their time nor amount. In order to predict them in future, the proposed model forecasts them using time series analysis method like linear regression.

Uncontrollable factors are divided into:

Temperature (T): This represents the temperature degree for each period, so it can be varied under long range. The temperature is represented qualitatively based on ranges of the degrees as shown in Table 1.

No_Of_Working_Days (WD): Number of working days per period. It gives an indication of the strength of each period in terms of number of days they can produce and sell the product. WD is represented into two values small, and Normal as shown in Table 2.

## 4. Factors classifying model

Factors Classifying model is the first sub model in the solution framework. The inputs to the sub model are the history demand series which is coming from sales history, the demand factors, products prices, and the setup parameters. The outcomes of that model are RFM and cleaned history demand series (Raw History Demand).

First module is using domain knowledge rule. Inputs are categorized and inserted into the predefined factors parameters which are used throughout the model. This division is useful where it facilitates the utilization of rules and constraints. It is easy to add additional factors in future where system can adapt itself automatically, without changing the model structure.

The second module, real demand calculation, is used to eliminate the effects of availability factor from regular series and promotion series to generate real history regular demand and real history promotion series.

Final module is activity analysis & cleaning. Its inputs are real history demand series and real history factors series. The outcome of the factors classifying model is the cleaned regular and promotion series, and the regular factors matrix and the promotion factors matrix. The term cleaned means that it represents the real demand without any effect of the demand factors. Multiple regression method is used to calculate the effects of demand factors on demand series. The framework of the cause & effect model is shown in Fig.3.

### 4.1. Build knowledge base

Objectives:

This module is used to prepare and maintain the required data sets to perform the following tasks:

Calculate the data Series;

Calculate the activities costs;

Use the rules identified.

Inputs:

DS: represents the time series of the volumes of the demand and the time is the period buckets such as weeks or months.

DF: the history demand factors and activities and it consist of the following series:

P: series of the consumer promotions types and cost per month

CM: represents series of the price reduction percentage per month

AD: represents the series of the advertising types and cost per month

T: the temperature degrees per month series

WD: represents the series of number of working days per month

AV: availability % parameter

PP: represents the series of the product price history per month, this series is used to calculate the cost of the activities

Outputs:

RS: the time series of the regular demand (actual sales without the consumer promotions)

PS: the times series of the promotional demand.

AD: time series of one value either ON or OFF whether there was ad or not.

CM: the percentage of the price reduction.

CS: the series of the cost for all activities

T: temperature series is a divided into ranges in order to enable analyzing them and forecasting them.

WD: number of working days, which affects the number of visits to the consumer, and hence the sales.

Constraints:

RS ∩ PS = Ф

AD = 1 or 0

### 4.2. Real demand calculation

Objective:

Obtain the real demand by removing the effect of the availability parameter on the sales:

Loss = 0, if availability = 100%

= [RS * (100 - availability)] / availability, if availability < 100%

Removing the effect of promotion volumes on regular sales (For all RS = 0, RS = average (3 previous PS)

Constraints:

Availability <= 100%

### 4.3. Activity analysis & classifying

Objective:

Calculate the effect of the CP & AD & CM & CPS on the regular series using multiple regression statistical method for each uncontrollable pair (T, WD) to construct RFM;

Clean the regular series by removing the effect of the AD, CM and recalculate RS at PS > 0;

Classify the input factors based on effect using KFDA and improved genetic algorithm.

T | WD | CP (%) | CM (%) | AD (%) |

VL | S | 20 | 2 | 12 |

VL | N | 23 | 3 | 14 |

L | S | 27 | 5 | 18 |

L | N | 35 | 7 | 19 |

M | S | 40 | 8 | 25 |

M | N | 46 | 10 | 27 |

H | S | 52 | 11 | 29 |

H | N | 60 | 12 | 31 |

VH | S | 84 | 15 | 34 |

VH | N | 95 | 17 | 36 |

In the previous example, for raw one it indicates that when temperature is very low and #of working days is small then the effect of consumer promotion on the regular sales is 20%, the effect of 1% commission is 2%, and the effect of the advertising is 12%.

## 5. Intelligent time series model

After cleaned regular sales are calculated, using previous module, the history sales will be analyzed using statistical and computational methods. At this model three types of methods are used: SES, additive winter's method and multiplicative winter's method (Armstrong, 1998). Then the genetic algorithm is used to calculate the best weights between the three methods to give the least error and generate baseline forecast using optimum weights.

### 5.1. Simple exponential smoothing

SES is used to forecast non seasonal Time series. The assumption is that the means moves slowly over time. Heuristically, in such a case it would be reasonable to give more weight to the most recent observations and less to the observations in the distant past.

A = CRS

Y = Forecast

K = number of history periods

α is the smoothing constant and it is usually chosen between 0.05 and 0.30

Initial value of

Through repeated equation of the SES, it can be shown that:

Thus the influence of Y_{0} on Y_{n} is negligible, provided that n is moderately large and (1 - α) is smaller than 1. We take the simple average of the available history data (A_{1}, A_{2}, A_{3, …}, A_{n}) as the initial estimate of Y.

Choosing the smoothing constant α

The best value is between 0.05 and 0.30 so by simulating the result by calculating the MSE for each α. and estimating α which gives least MSE is the optimum one.

### 5.2. Winters’ methods

Winters (1960) considers linear trend model with seasonal indicators. The seasonal and the trend components can be either additive or multiplicative. More details about the methods equations are introduced in chapter 2.

To make initialization, it is needed one complete cycle of data, i.e. s values. Then set

To initialize trend, it is used s + k time periods

As long as the series is long enough, so it is used here k=s so that two completed cycles are used.

Initial seasonal indices can be taken as

The parameters γ, β, and α must lie in the interval [0, 1[, and they are selected to optimize the MSE.

### 5.3. Proposed genetic algorithm

Objective:

The objective of GA is to calculate the optimum weights W = (W1, W2, W3) which minimize the MSE in the training set to get the best forecast.

W1 is the weight of the SES method

W2 is the weight of the winter's additive method

W3 is the weight of the winter's multiplicative method

GA elements:

Fitness function = min (MSE)

Constraints:

Representation:

Our variables are the weights of the different statistical methods.

First, we need to encode decision variables into binary strings. The length of the string depends on the required precision. In this case the domain of the variable w_{j} is [a_{j}, b_{j}[and the required precision is two places after the decimal point.

The required bits, denoted by m_{j}, for the variable is calculated as follows:

The mapping from a binary string to a real number for variable w is completed as follows:

Where the decimal (substring) represents the decimal value of substring_{j} for decimal variable w_{j.}

To calculate the number of digits needed to represent the weight:

(11) |

Then number of digits needed for each weight is 7 digits. So the total length of the chromosome is 3 * 7 = 21 digits

Example of calculating the Weights:

W: 0011010 its decimal number = 26

W= 0 + 26 * 1/(2^{7} – 1) = 0.20

GA Procedure:

Initialization of the population

The initial population is selected randomly. The population is consisting of 10 solutions as follows:

V1 = [001001101101010110111]

V2 = [110010100011000001100]

V3 = [000010010011010101110]

V4 = [110010000100100001000]

V5 = [000000111110100000100]

V6 = [011111100110010100110]

V7 = [010001010011100001111]

V8 = [000000011101000001010]

V9 = [001010101010011000001]

V10= [111100000000000000111]

The corresponding decimals:

V1 = [w1, w2, w3] = [0.15, 0.42, 0.43]

V2 = [w1, w2, w3] = [0.80, 0.10, 0.10]

V3 = [w1, w2, w3] = [0.03, 0.61, 0.36]

V4 = [w1, w2, w3] = [0.79, 0.14, 0.07]

V5 = [w1, w2, w3] = [0.01, 0.96, 0.03]

V6 = [w1, w2, w3] = [0.50, 0.20, 0.30]

V7 = [w1, w2, w3] = [0.27, 0.61, 0.12]

V8 = [w1, w2, w3] = [0.00, 0.91, 0.09]

V9 = [w1, w2, w3] = [0.17, 0.32, 0.51]

V10= [w1, w2, w3] = [0.94, 0.00, 0.06]

Evaluation:

The process of evaluation the fitness of a chromosome consists of the following three steps

Evaluation Procedure:

Convert the chromosome’s genotype to its phenotype. This means converting binary string into relative real values, which is happened above.

Evaluate the objective function f (w^{k}).

Convert the value of objective function into fitness. For the minimization problem, the fitness is simple equal to the value of objective function eval (v_{k}) = f (w^{k}), k=1, 2…, pop_size

It is clear that chromosome V_{7} is the best result and that chromosome V_{10} is the weakest result.

Selection:

Its target to choose which solutions will remain in the new population and which solutions will be changed. By doing the following steps:

Get the least MSE from the original methods which is MSE

Arrange the solutions ascending based on their evaluation function

Compare the eval function for each solution by the MSE_{best}. The first one which is less than MSE_{best} will be taken to the new population. And the rest of the solutions will be changed.

Implementing on our example:

Eval (V1) = 52

Eval (V2) = 136

Eval (V3) = 59

Eval (V4) = 131

Eval (V5) = 69

Eval (V6) = 73

Eval (V7) = 49

Eval (V8) = 68

Eval (V9) = 54

Eval (V10) = 186

Arranging the solutions based on the evaluation function:

The MSE_{best} = MSE(V7) = 72

Then V7 will stay in the new population and the rest of the solutions will be changed.

Crossover:

Crossover used here is one-cut-point method, which randomly selects one cut-point and exchanges the right parts of two parents to generate offspring.

The probability of crossover is set as p_{c} = 0.25, so we expect that, on average, 25% of chromosomes undergo crossover. Crossover is performed in the following way:

Assume that the sequence of random numbers is:

0.266 0.288 0.295 0.163 0.567 0.0859 0.392 0.770 0.548 0.337

This means that the chromosome v4 and v6 were selected for crossover. Then we generate random number pos from the range [1, 20] (because 21 is the total length of a chromosome) as cutting point or in other words, the position of the crossover point. Assume that the generated number pos equals 3, the two chromosomes are cut after the first bit, and offspring are generated by exchange the right parts of them as follow:

V4 = [000000111110100000100]

V6 = [010001010011100001111]

||VV4’ = [010000111110100000100]

V6’ = [000001010011100001111]

Mutation:

Mutation alters one or more genes with a probability equal to the mutation rate. The probability of mutation is set as p_{m} = 0.01, so we expect that, on average, 1% of total bit of population would undergo mutation. There are m * pop_size = 21 * 10 = 210 bits in the whole population; we expect 2.1 mutations per generation. Every bit has an equal chance to be mutated. Thus we need to generate a sequence of random numbers r_{k} (k=1,2,…,210) from the range [0,1]. Assume that the following genes will go through mutation:

After Mutation:

V_{2} = [110010100011000001100]

V^{’}
_{2} = [110010110011000001110]

V_{5} = [000000111110100000100]

V^{’}
_{5} = [000010111110100000100]

After the mutation & the crossover, the new generation is:

V^{’}
_{1} = [001001101101010110111]

V^{’}
_{2} = [110010110011000001110]

V^{’}
_{3} = [000010010011010101110]

V^{’}
_{4} = [010000111110100000100]

V^{’}
_{5} = [000010111110100000100]

V^{’}
_{6} = [000001010011100001111]

V^{’}
_{7} = [010001010011100001111]

V^{’}
_{8} = [000000011101000001010]

V^{’}
_{9} = [001010101010011000001]

V^{’}
_{10} = [111100000000000000111]

We continue the iterations until the termination condition happened. The termination condition is that when doing 10 consecutive iterations without generating any new solution which is giving better evaluation function than the best previous one. Then we stop and the best solution with the optimum weights will be the last better evaluation function solution we obtained.

## 6. Intelligent combining model

The objective from this module is to calculate the optimum choices of the activities in the future periods which maximize the profit. So using this genetic algorithm module we can choose the best time we can do the activity and the best timing.

The first step is to forecast the uncontrollable factors for the required periods. T and WD are forecasted using the intelligent time series model at Fig.10. Then to calculate what is the best combination of the controllable factors (AD, CP, and CM) timings using the genetic algorithm and the values are calculated using RFM from the values of forecasted T & WD corresponding to the time point.

GA Elements:

Fitness function = Max (Profit)

Profit = Sales – Cost

Constraints:

Cost <= Limited_Cost

Assumptions:

# Periods = 12 periods

Representation:

There are three variables:

The timing of the CP for the coming 12 periods

The timing of the CM for the coming 12 periods

The timing of the AD for the coming 12 periods

To encode the above variables into binary digits so each variable will be consisting of 12 genes or bits representing the 12 periods. And if the bit is 0 that means there is no activity at this period, if the bit is 1 that means there is an activity at this period.

Total number of genes at each chromosome will equal 12 * 3 = 36.

GA Procedure:

Initialization of the population:

The initial population is selected randomly. The population is consisting of 10 solutions.

Example of initial population:

V1 = [101101011101010010010101001010101010]

V2 = [001010101010101010001111110010101000]

V3 = [111010001100101001011111000100001010]

V4 = [010101010000111010101000011100101010]

V5 = [101010101010000001010101110010101011]

V6 = [010101010110101010010000111001010111]

V7 = [110101011011001010001110101000101011]

V8 = [111010010101001010010101001000010111]

V9 = [110110010100010101010100011100010110]

V10 = [010000101001010101010100010001100011]

After calculating their real values:

V1 CP will happen in future in periods: 1, 3, 4, 6, 8, 9, 10, and 12

CM will happen in future in periods: 2, 5, 8, 10, and 12

AD will happen in future in periods: 3, 5, 7, 9, and 11

And so on

Evaluation:

Objective function = Max (Sales – cost)

CP_{ i} If at point i there is CP then it is 1, else it is 0

AD_{ i} If at point i there is CP then it is 1, else it is 0

CM_{ i} If at point i there is CP then it is 1, else it is 0

CP_Volume_{ i} = CP_{ i} * Y_{ i} * GET_CP_RFM (T_{ i}, WD_{ i})

AD_Volume_{ i} = AD_{ i} * Y_{ i} * GET_AD_RFM (T_{ i}, WD_{ i})

CM_Volume_{ i} = CM_{ i} * Y_{ i} * GET_CM_RFM (T_{ i}, WD_{ i})

Sales =∑_{i=1to12} PP * [Y_{ i} + (CP_Volume_{ i} + AD_Volume_{ i} + CM_Volume_{ i})]

CP_Cost_{ i} = Average (P_PC_{ i}) * CP_Volume_{ I}

AD_Cost_{ i} = AD_{ i} * Average (CS_AD_{ i})

CM_Cost_{ i} = Average (PP_{ i}) * CM_Volume_{ I}

Cost = ∑_{i=1to12} (CP_Cost_{ i} + AD_Cost_{ i} + CM_Cost_{ i})

For maximization problem, the fitness is simply equal to the value of the objective function

eval(v_{k}) = f(x^{k}),k=1, 2,…pop_size.

In this case we have three x’s: x_{1} represents the CP timing through the future 12 periods. x_{2} is represents the CM timing through the future 12 periods and x_{3} represents the AD timing through the future 12 periods.

CP_Effect, AD_Effect and CM_Effect are calculated through getting the value of the % from the RFM table.

Selection:

In most practices, a roulette wheel approach is adopted as the selection procedure; it belongs to the fitness-propotional selection and can select a new population with respect to the probability distribution based on fitness values. We constructed the roulette wheel as follows:

Calculate the fitness value eval(v_{k}) for each chromosome v:

Calculate the total fitness for the population:

Calculate selection probability p for each chromosome v_{k} :

Calculate cumulative probability for each chromosome v_{k}:

This selection process begins by spinning the roulette wheel pop_size times; each time, a single chromosome is selected for a new population in the following way:

Selection Procedure:

Generate a random number r from the range [0,1].

If r<=q_{1}, then select the first chromosome v_{k}; otherwise, select the kth chromosome v_{k} (2 <= k <= pop_size) such that q_{k-1}< r <=q_{k}.

Crossover & mutation will be implemented as explained in GA model. A detailed case study will be explained in the next chapter.

## 7. Results

In this session, there is a comparison between traditional statistical methods and the proposed model results. The comparison is based on the forecasting accuracy measure which is MSE. By running it on different types of products time series patterns and showing the difference between the two scenarios, it is shown that for all different types of times series the proposed model is giving better results than any traditional statistical method. Other examples are shown also in (Fred & Scott, 1992).

The model is run for 6 consequent months. And after each month the results are compared between the traditional statistical methods and the proposed model. And the following graphs are showing how improvements have done. A comparison between the models MSE is also shown. That proofs that the proposed model is improving forecasting accuracy for different types of times series. We will find too that each statistical method is giving better forecasting accuracy for some types of time series. That shows that complicating or using a standalone statistical method is not the best case to improve the forecasting accuracy.

## 8. Conclusions & future work

Forecasting using single statistical method showed limitations in providing accurate forecasting. The use of a combined intelligent model is needed for providing accurate forecasting specially for complex environments which have different demand factors. As described in this chapter, the use of a new genetic algorithm is proposed to combine statistical methods and for combining activities with the baseline forecast, which suggests forecasts that are more accurate. First, the genetic algorithm searches for the best combination of weights between the methods, which minimizes MSE error. Then, genetic algorithm searches between the activities timing to choose the best timing, which increases the profit. Forecast methods are chosen so that they can cover different types of times series. Comparison of the obtained results shows better accuracy than that obtained using traditional methods. Other combinations of forecast methods can be included in the proposed solution for better forecast accuracy. Further improvement to the forecast model is obtainable by changing the crossover, mutation in the genetic algorithm, or by changing the initial population. As future work, it is recommended to try to add some neural techniques to the proposed model as it showed improving the forecasting capabilities.