Open access peer-reviewed chapter - ONLINE FIRST

# Stock Market Trend Prediction Using Hidden Markov Model

By Deneshkumar Venugopal, Senthamarai Kannan Kaliyaperumal and Sonai Muthu Niraikulathan

Submitted: May 20th 2020. Reviewed: September 11th 2020. Published: November 12th 2020.

DOI: 10.5772/intechopen.93988

## Abstract

In recent years, many forecasting methods have been proposed and implemented for stock market trend prediction. In this chapter, trend analysis of the stock market is presented using a Hidden Markov Model with the one-day difference in close value over a particular period. The probability values π give the trend percentage of the stock prices, calculated for all the observed sequences and hidden sequences. This chapter helps decision makers to make decisions under uncertainty on the basis of the percentage of probability values obtained from the steady-state probability distribution.

### Keywords

• stock market
• HMM
• TPM
• EPM
• trend prediction

## 1. Introduction

The fundamental idea behind a hidden Markov model is that there is a Markov process we cannot observe that determines the probability distribution for what we do observe. Thus a hidden Markov model is specified by the transition density of the Markov chain and the probability laws that govern what we observe given the state of the Markov chain. Given such a model, we want to estimate any parameters that occur in the model and to determine the most likely sequence for the hidden process. Finally, we may want the probability distribution for the hidden states at every location.

Let $y_t$ represent the observed value of the process at location $t$ for $t = 1, \dots, T$, let $\theta_t$ be the value of the hidden process at location $t$, and let $\phi$ represent the parameters necessary to determine the probability distribution for $y_t$ given $\theta_t$ and for $\theta_t$ given $\theta_{t-1}$. In our applications, $y_t$ will be either an increase or a decrease, and the hidden process will determine the probability distribution of observing the different symbols.

Our model is then described by the sets of probability distributions $p(y_t \mid \theta_t, \phi)$ and $p(\theta_t \mid \theta_{t-1}, \phi)$. A crucial component of this model is that the $y_t$ are independent given the set of $\theta_t$, and $\theta_t$ only depends directly on its neighbors $\theta_{t-1}$ and $\theta_{t+1}$. The various distributions in which we are interested are $p(\phi \mid y_1, \dots, y_T)$, $p(\theta_t \mid y_1, \dots, y_T)$ for all $t$, and $p(\theta_1, \dots, \theta_T \mid y_1, \dots, y_T)$. We will adopt a Bayesian perspective, so that we treat $\theta_t$ as a random variable [1, 2].
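As a worked form of the two independence assumptions just stated, the joint distribution of observations and hidden states factorises as shown below. The initial-state term $p(\theta_1 \mid \phi)$ is notation introduced here for the sketch; it is not written out in the text above.

$$
p(y_1, \dots, y_T, \theta_1, \dots, \theta_T \mid \phi) = p(\theta_1 \mid \phi) \prod_{t=2}^{T} p(\theta_t \mid \theta_{t-1}, \phi) \prod_{t=1}^{T} p(y_t \mid \theta_t, \phi)
$$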

The criterion for the best path is to find the path that has the maximum probability in the HMM, given the sequence $X$. Recall that the model gives the joint probabilities $\Pr(H, X)$ for all sequences; it also gives the posterior probability $\Pr(H \mid X) = \Pr(H, X) / \Pr(X)$ for every possible state path $H$ through the model, conditioned on the sequence $X$, and we seek the path with maximum posterior probability [3, 4]. Given that the denominator $\Pr(X)$ is constant in the conditional probability formula for a given sequence $X$, maximizing the posterior probability is equivalent to finding the state path $H^{*}$ that maximizes the joint probability $\Pr(H, X)$.

Nguyen has determined the optimal number of states for the HMM by using the AIC, BIC and HQ information criteria and has also discussed the applications of HMM in stock trading. Hassan and Nath have applied HMM to the airlines stock forecast. HMMs have been used for pattern recognition and classification problems, and they are suitable for modeling dynamic systems.
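The equivalence between maximizing the posterior and maximizing the joint probability of a state path, as described in this section, can be written in one line, since $\Pr(X)$ does not depend on $H$:

$$
H^{*} = \arg\max_{H} \Pr(H \mid X) = \arg\max_{H} \frac{\Pr(H, X)}{\Pr(X)} = \arg\max_{H} \Pr(H, X)
$$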

## 2. Hidden Markov model

A hidden Markov model (HMM) is a stochastic model whose states are not directly observable; it describes observable events that depend on internal factors. The observable events are represented as symbols, while the invisible factor underlying an observation is represented as a state. The system is assumed to be a Markov process with hidden states, and it gives better accuracy than the other models. Using the given input values, the parameters of the HMM ($\lambda$), denoted by $A$, $B$ and $\pi$, are found. An HMM is defined as $\lambda = (S, O, A, B, \pi)$, where $S = \{s_1, s_2, \dots, s_N\}$ is a set of $N$ possible states, $O = \{o_1, o_2, \dots, o_M\}$ is a set of $M$ possible observation symbols, $A$ is an $N \times N$ state transition probability matrix (TPM), $B$ is an $N \times M$ observation or emission probability matrix (EPM) and $\pi$ is an $N$-dimensional initial state probability distribution vector. $A$, $B$ and $\pi$ should satisfy the following conditions (Figure 1):

$$\sum_{j=1}^{N} a_{ij} = 1, \quad 1 \le i \le N;$$

$$\sum_{j=1}^{M} b_{ij} = 1, \quad 1 \le i \le N;$$

$$\sum_{i=1}^{N} \pi_i = 1, \quad \pi_i \ge 0.$$
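The three conditions can be checked mechanically before a model is used. The following is a minimal sketch in plain Python; the function names, the tolerance and the two-state example model are illustrative assumptions and do not come from the chapter.

```python
def rows_sum_to_one(matrix, tol=1e-3):
    """True if every row is non-negative and sums to 1 within tol."""
    return all(
        all(p >= 0 for p in row) and abs(sum(row) - 1.0) <= tol
        for row in matrix
    )


def is_valid_hmm(A, B, pi, tol=1e-3):
    """Check the stochastic constraints on lambda = (A, B, pi)."""
    return (
        rows_sum_to_one(A, tol)          # each row of the TPM sums to 1
        and rows_sum_to_one(B, tol)      # each row of the EPM sums to 1
        and rows_sum_to_one([pi], tol)   # pi is a probability vector
    )


# Hypothetical two-state, two-symbol model (not taken from the chapter).
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.9, 0.1], [0.2, 0.8]]
pi = [0.5, 0.5]
print(is_valid_hmm(A, B, pi))
```

A tolerance is used because tabulated probabilities are usually rounded to three or four decimals.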

### 2.1 Evaluation problem

Given the HMM $\lambda = \{A, B, \pi\}$ and the observation sequence $O = o_1, o_2, \dots, o_M$, the probability that model $\lambda$ has generated sequence $O$ is calculated. This problem is often solved by the forward-backward algorithm [7, 8].
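The forward half of that algorithm is enough to evaluate the probability of an observation sequence. Below is a minimal sketch, assuming symbols are encoded as integer indices; the model values are hypothetical and the recursion is unscaled, so it is only suitable for short sequences.

```python
def forward_probability(A, B, pi, obs):
    """Probability of the observation sequence under the model.

    A[i][j]: transition probability from state i to state j
    B[i][k]: probability that state i emits symbol k
    pi[i]:   initial probability of state i
    obs:     list of symbol indices
    """
    n = len(pi)
    # Initialisation: alpha_1(i) = pi_i * b_i(o_1)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    # Induction: alpha_{t+1}(j) = (sum_i alpha_t(i) * a_ij) * b_j(o_{t+1})
    for symbol in obs[1:]:
        alpha = [
            sum(alpha[i] * A[i][j] for i in range(n)) * B[j][symbol]
            for j in range(n)
        ]
    # Termination: sum over the final states
    return sum(alpha)


# Hypothetical model; symbol 0 stands for "I" and symbol 1 for "D".
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.9, 0.1], [0.2, 0.8]]
pi = [0.5, 0.5]
print(forward_probability(A, B, pi, [0, 1, 0]))
```

A quick sanity check is that the probabilities of all possible sequences of a fixed length add up to 1.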

### 2.2 Decoding problem

Given the HMM $\lambda = \{A, B, \pi\}$ and the observation sequence $O = o_1, o_2, \dots, o_M$, calculate the most likely sequence of hidden states that produced this observation sequence $O$. This problem is usually handled by the Viterbi algorithm [7, 8].
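A compact sketch of the Viterbi recursion follows. It assumes integer-coded symbols and zero-based state indices, and the model values are hypothetical; it is meant to illustrate the decoding step, not to reproduce the chapter's computation.

```python
def viterbi(A, B, pi, obs):
    """Return (most likely state path, joint probability of path and obs)."""
    n = len(pi)
    # delta[i]: probability of the best path so far that ends in state i
    delta = [pi[i] * B[i][obs[0]] for i in range(n)]
    backpointers = []
    for symbol in obs[1:]:
        step, new_delta = [], []
        for j in range(n):
            # Best predecessor of state j at this time step
            best_i = max(range(n), key=lambda i: delta[i] * A[i][j])
            step.append(best_i)
            new_delta.append(delta[best_i] * A[best_i][j] * B[j][symbol])
        backpointers.append(step)
        delta = new_delta
    # Backtrack from the most probable final state
    last = max(range(n), key=lambda i: delta[i])
    path = [last]
    for step in reversed(backpointers):
        path.append(step[path[-1]])
    path.reverse()
    return path, delta[last]


# Hypothetical model; symbol 0 stands for "I" and symbol 1 for "D".
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.9, 0.1], [0.2, 0.8]]
pi = [0.5, 0.5]
path, prob = viterbi(A, B, pi, [0, 1, 1])
print(path, prob)
```

Working with probabilities directly keeps the sketch short; for long sequences, log-probabilities are the usual way to avoid underflow.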

### 2.3 Learning problem

Given some training observation sequences $O = o_1, o_2, \dots, o_M$ and the general structure of the HMM (numbers of hidden and visible states), determine the HMM parameters $\lambda = \{A, B, \pi\}$ that best fit the training data. The most common solution to this problem is the Baum-Welch algorithm [9, 10], which is considered the traditional method for training an HMM.
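To make the re-estimation step concrete, here is a minimal single-sequence Baum-Welch sketch in plain Python. The two-state starting model is a hypothetical choice, the observation list is the one-day I/D column of this chapter's difference table coded as I = 0 and D = 1, and no scaling is applied, so the code is only suitable for short sequences.

```python
def forward(A, B, pi, obs):
    """alpha[t][i]: probability of o_1..o_t with state i at time t."""
    n = len(pi)
    alpha = [[pi[i] * B[i][obs[0]] for i in range(n)]]
    for o in obs[1:]:
        prev = alpha[-1]
        alpha.append([sum(prev[i] * A[i][j] for i in range(n)) * B[j][o]
                      for j in range(n)])
    return alpha


def backward(A, B, obs):
    """beta[t][i]: probability of o_{t+1}..o_T given state i at time t."""
    n = len(A)
    beta = [[1.0] * n]
    for o in reversed(obs[1:]):
        nxt = beta[0]
        beta.insert(0, [sum(A[i][j] * B[j][o] * nxt[j] for j in range(n))
                        for i in range(n)])
    return beta


def baum_welch_step(A, B, pi, obs):
    """One EM update; also returns the likelihood under the input model."""
    n, m, T = len(pi), len(B[0]), len(obs)
    alpha, beta = forward(A, B, pi, obs), backward(A, B, obs)
    like = sum(alpha[-1])
    # gamma[t][i]: posterior probability of state i at time t
    gamma = [[alpha[t][i] * beta[t][i] / like for i in range(n)]
             for t in range(T)]
    # xi[t][i][j]: posterior probability of the transition i -> j at time t
    xi = [[[alpha[t][i] * A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] / like
            for j in range(n)] for i in range(n)] for t in range(T - 1)]
    new_pi = gamma[0][:]
    new_A = [[sum(xi[t][i][j] for t in range(T - 1))
              / sum(gamma[t][i] for t in range(T - 1))
              for j in range(n)] for i in range(n)]
    new_B = [[sum(gamma[t][i] for t in range(T) if obs[t] == k)
              / sum(gamma[t][i] for t in range(T))
              for k in range(m)] for i in range(n)]
    return new_A, new_B, new_pi, like


# One-day symbols from the difference table (I = 0, D = 1).
obs = [0, 0, 1, 0, 1, 1, 1, 1, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0]
# Hypothetical starting parameters for a two-state model.
A_hat = [[0.6, 0.4], [0.3, 0.7]]
B_hat = [[0.7, 0.3], [0.4, 0.6]]
pi_hat = [0.5, 0.5]
likelihoods = []
for _ in range(10):
    A_hat, B_hat, pi_hat, like = baum_welch_step(A_hat, B_hat, pi_hat, obs)
    likelihoods.append(like)
print(likelihoods[0], likelihoods[-1])
```

Each update should leave the likelihood of the training sequence unchanged or higher, which is a convenient check that the step is implemented correctly.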

## 3. Results and discussions

In this chapter, the data were taken from Yahoofinance.com, and the NSE daily close values for January 2020 are considered for the analysis.

Two observing symbols are used: “I” for increasing states and “D” for decreasing states. If the difference in close value is greater than 0, the observing symbol is “I”; if the difference in close value is less than 0, the observing symbol is “D”. Six hidden states are assumed, denoted by S1, S2, S3, S4, S5 and S6, indicating very low, low, moderate low, moderate high, high and very high, respectively. The states are not directly observable.

The situations of the stock market are considered hidden. Given a sequence of observations, we can find the hidden state sequence that produced those observations. Table 1 shows the daily close value of the stock market.

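| S. no | Date | Close |
|---|---|---|
| 1 | 01/02/2020 | 41,626.64 |
| 2 | 01/03/2020 | 41,464.61 |
| 3 | 01/06/2020 | 40,676.63 |
| 4 | 01/07/2020 | 40,869.47 |
| 5 | 01/08/2020 | 40,817.74 |
| 6 | 01/09/2020 | 41,452.35 |
| 7 | 01/10/2020 | 41,599.72 |
| 8 | 01/13/2020 | 41,859.69 |
| 9 | 01/14/2020 | 41,952.63 |
| 10 | 01/15/2020 | 41,872.73 |
| 11 | 01/16/2020 | 41,932.56 |
| 12 | 01/17/2020 | 41,945.37 |
| 13 | 01/20/2020 | 41,528.91 |
| 14 | 01/21/2020 | 41,323.81 |
| 15 | 01/22/2020 | 41,115.38 |
| 16 | 01/23/2020 | 41,386.4 |
| 17 | 01/24/2020 | 41,613.19 |
| 18 | 01/27/2020 | 41,155.12 |
| 19 | 01/28/2020 | 40,966.86 |
| 20 | 01/29/2020 | 41,198.66 |
| 21 | 01/30/2020 | 40,913.82 |
| 22 | 01/31/2020 | 40,723.49 |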

### Table 1.

Daily close value of NSE.
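The one-day differences and their observing symbols can be derived from the close values as sketched below. The sign convention (previous close minus current close) is an assumption read off the first tabulated difference, 41,626.64 − 41,464.61 = 162.03; the function names are illustrative, and treating a zero difference as “D” is also an assumption because the text does not cover that case.

```python
# First six close values from Table 1.
closes = [41626.64, 41464.61, 40676.63, 40869.47, 40817.74, 41452.35]


def one_day_differences(values):
    """Previous close minus current close, rounded to two decimals."""
    return [round(prev - cur, 2) for prev, cur in zip(values, values[1:])]


def to_symbol(diff):
    """'I' for a positive difference, otherwise 'D' (zero is an assumption)."""
    return "I" if diff > 0 else "D"


diffs = one_day_differences(closes)
symbols = [to_symbol(d) for d in diffs]
print(diffs)
print(symbols)
```

Applying `one_day_differences` again to its own output gives the next column of differences in the same way.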

Interval values:

S1 = −9500 to −551.

S2 = −550 to −251.

S3 = −250 to 249.

S4 = 250 to 8500.
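A difference can be mapped to one of these four intervals with a small lookup. This is a sketch under one explicit assumption: because the listed end points are integers, each difference is rounded to the nearest integer before the lookup. The names `STATE_INTERVALS` and `to_state` are illustrative.

```python
# Interval end points as listed above (inclusive on both sides).
STATE_INTERVALS = [
    ("S1", -9500, -551),
    ("S2", -550, -251),
    ("S3", -250, 249),
    ("S4", 250, 8500),
]


def to_state(diff):
    """Return the state label for a difference, or None if out of range."""
    value = round(diff)  # assumption: integer end points imply rounding
    for label, low, high in STATE_INTERVALS:
        if low <= value <= high:
            return label
    return None


for d in (-634.61, -259.97, 162.03, 787.98):
    print(d, to_state(d))
```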

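| S. no | c.v | D in 1 day CV | o.s | D in 2 days CV | o.s | D in 3 days CV | o.s | D in 4 days CV | o.s | D in 5 days CV | o.s | D in 6 days CV | o.s |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 41,626.64 | | | | | | | | | | | | |
| 2 | 41,464.61 | 162.03 | I | | | | | | | | | | |
| 3 | 40,676.63 | 787.98 | I | −625.95 | D | | | | | | | | |
| 4 | 40,869.47 | −192.84 | D | 980.82 | I | −1606.77 | D | | | | | | |
| 5 | 40,817.74 | 51.73 | I | −244.57 | D | 1225.39 | I | −2882.16 | D | | | | |
| 6 | 41,452.35 | −634.61 | D | 686.84 | I | −930.91 | D | 2156.3 | I | −4988.46 | D | | |
| 7 | 41,599.72 | −147.37 | D | −487.24 | D | 1173.58 | I | 2104.49 | I | 4260.79 | I | −9249.25 | D |
| 8 | 41,859.69 | −259.97 | D | 112.6 | I | −599.84 | D | 1773.42 | I | −3877.91 | D | 8138.7 | I |
| 9 | 41,952.63 | −92.94 | D | −167.03 | D | 279.63 | I | −879.47 | D | 2652.89 | I | −6530.8 | D |
| 10 | 41,872.73 | 79.9 | I | −172.84 | D | 5.81 | I | 273.82 | I | −1153.28 | D | 3806.18 | I |
| 11 | 41,932.56 | −59.83 | D | 139.73 | I | −312.57 | D | 318.38 | I | −44.56 | D | −1108.73 | D |
| 12 | 41,945.37 | −12.81 | D | −47.02 | D | −92.71 | D | 405.28 | I | −86.9 | D | 42.34 | I |
| 13 | 41,528.91 | 416.46 | I | 403.65 | I | −450.67 | D | 357.96 | I | 47.32 | I | −134.22 | D |
| 14 | 41,323.81 | 205.1 | I | 211.36 | I | 192.22 | I | −642.96 | D | 1000.92 | I | −953.6 | D |
| 15 | 41,115.38 | 208.43 | I | −3.33 | D | 214.69 | I | −22.4 | D | −620.56 | D | 1621.48 | I |
| 16 | 41,386.4 | −271.02 | D | 479.45 | I | −482.78 | D | 697.47 | I | −719.87 | D | 99.31 | I |
| 17 | 41,613.19 | −226.79 | D | −44.23 | D | 523.68 | I | −1006.46 | D | 1703.93 | I | −2423.8 | D |
| 18 | 41,155.12 | 458.01 | I | −684.86 | D | 640.63 | I | −116.95 | D | −889.51 | D | 2593.44 | I |
| 19 | 40,966.86 | 188.26 | I | 269.81 | I | −415.05 | D | 1055.68 | I | 938.73 | I | −1828.24 | D |
| 20 | 41,198.66 | −231.8 | D | 420.06 | I | −150.25 | D | −264.8 | D | 1320.48 | I | −381.75 | D |
| 21 | 40,913.82 | 284.84 | I | −516.64 | I | 936.7 | I | −1086.95 | D | 822.15 | I | 498.33 | I |
| 22 | 40,723.49 | 190.33 | I | 94.51 | I | −611.15 | D | 1547.85 | I | −2634.8 | D | 3456.95 | I |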

### Table 2.

Daily close values and the differences in one-day, two-day, three-day, four-day, five-day and six-day close value.

The probability values of the TPM, EPM and π for the differences in one-day, two-day, three-day, four-day, five-day and six-day close value are calculated as given below (Table 2).

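|  | S1: I | S1: D | S2: I | S2: D | S3: I | S3: D | S4: I | S4: D |
|---|---|---|---|---|---|---|---|---|
| S1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| S2 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| S3 | 0.071 | 0 | 0.071 | 0 | 0.1429 | 0.2857 | 0 | 0.4286 |
| S4 | 0 | 0 | 0 | 0.8 | 0.2 | 0 | 0 | 0 |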

### Table 3.

Transitions with probability values for one day close value.
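For row S3, the TPM and EPM entries reported for the one-day difference match the marginals of this joint table: adding the I and D entries for each destination state gives the TPM row (0.071, 0.071, 0.4286, 0.4286), and adding all I entries and all D entries gives the EPM row (0.2849, 0.7143). The sketch below shows that arithmetic; it is an inference from the numbers, not a procedure stated by the authors, and the function name is illustrative.

```python
def split_joint_row(row):
    """Split a joint row of (I, D) pairs into a TPM row and an EPM row.

    row lists the I and D probabilities for destination states S1..Sn,
    in the order S1-I, S1-D, S2-I, S2-D, ...
    """
    pairs = [(row[k], row[k + 1]) for k in range(0, len(row), 2)]
    # Transition probability: marginalise over the emitted symbol
    tpm_row = [round(i + d, 4) for i, d in pairs]
    # Emission probability: marginalise over the destination state
    epm_row = [
        round(sum(i for i, _ in pairs), 4),
        round(sum(d for _, d in pairs), 4),
    ]
    return tpm_row, epm_row


# Row S3 of the one-day transition table.
s3 = [0.071, 0, 0.071, 0, 0.1429, 0.2857, 0, 0.4286]
tpm_s3, epm_s3 = split_joint_row(s3)
print(tpm_s3, epm_s3)
```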

Probability values of TPM, EPM, and π for difference in one day close value (Figure 2 and Table 3):

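$$
\text{TPM} =
\begin{array}{c|cccc}
 & S_1 & S_2 & S_3 & S_4 \\ \hline
S_1 & 0 & 0 & 1 & 0 \\
S_2 & 0 & 0 & 1 & 0 \\
S_3 & 0.071 & 0.071 & 0.4286 & 0.4286 \\
S_4 & 0 & 0 & 1 & 0
\end{array}
\qquad
\text{EPM} =
\begin{array}{c|cc}
 & I & D \\ \hline
S_1 & 0 & 1 \\
S_2 & 0 & 1 \\
S_3 & 0.2849 & 0.7143 \\
S_4 & 0.5 & 0.5
\end{array}
$$

|  | S1: I | S1: D | S2: I | S2: D | S3: I | S3: D | S4: I | S4: D |
|---|---|---|---|---|---|---|---|---|
| S1 | 0 | 0 | 0 | 0 | 0 | 0 | 0.5 | 0.5 |
| S2 | 0 | 0 | 0 | 0 | 0.5 | 0.5 | 0 | 0 |
| S3 | 0 | 0.111 | 0 | 0 | 0.3333 | 0.2222 | 0.1111 | 0.2222 |
| S4 | 0 | 0 | 0.3333 | 0 | 0.5 | 0 | 0.1667 | 0 |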

### Table 4.

Transition table with probability values for difference in two day close value.

Probability values of TPM, EPM, and π for difference in two day close value (Figure 3 and Table 4).

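$$
\text{TPM} =
\begin{array}{c|cccc}
 & S_1 & S_2 & S_3 & S_4 \\ \hline
S_1 & 0 & 0 & 1 & 0 \\
S_2 & 0 & 0 & 1 & 0 \\
S_3 & 0.0111 & 0 & 0.5555 & 0.3333 \\
S_4 & 0 & 0.3333 & 0.5 & 0.1667
\end{array}
\qquad
\text{EPM} =
\begin{array}{c|cc}
 & I & D \\ \hline
S_1 & 0.5 & 0.5 \\
S_2 & 0.5 & 0.5 \\
S_3 & 0.4444 & 0.5556 \\
S_4 & 1 & 0
\end{array}
$$

|  | S1: I | S1: D | S2: I | S2: D | S3: I | S3: D | S4: I | S4: D |
|---|---|---|---|---|---|---|---|---|
| S1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| S2 | 0 | 0 | 0 | 0 | 0 | 0.75 | 0 | 0.25 |
| S3 | 0 | 0 | 0.4 | 0.2 | 0.2 | 0 | 0 | 0.2 |
| S4 | 0.5 | 0 | 0.2 | 0 | 0.2 | 0 | 0.2 | 0 |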

### Table 5.

Transition table with probability values for difference in three day close value.

Probability values of TPM, EPM, and π for difference in three day close value (Figure 4 and Table 5):

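$$
\text{TPM} =
\begin{array}{c|cccc}
 & S_1 & S_2 & S_3 & S_4 \\ \hline
S_1 & 0 & 0 & 0 & 1 \\
S_2 & 0 & 0 & 0.75 & 0.25 \\
S_3 & 0 & 0.6 & 0.2 & 0.2 \\
S_4 & 0.5 & 0.2 & 0.2 & 0.2
\end{array}
\qquad
\text{EPM} =
\begin{array}{c|cc}
 & I & D \\ \hline
S_1 & 0 & 1 \\
S_2 & 1 & 1 \\
S_3 & 0.6 & 0.4 \\
S_4 & 1 & 0
\end{array}
$$

|  | S1: I | S1: D | S2: I | S2: D | S3: I | S3: D | S4: I | S4: D |
|---|---|---|---|---|---|---|---|---|
| S1 | 0.1429 | 0.2429 | 0 | 0.1429 | 0 | 0.1429 | 0 | 0.4286 |
| S2 | 0.5 | 0 | 0 | 0 | 0 | 0.5 | 0 | 0 |
| S3 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| S4 | 0.4286 | 0 | 0.1429 | 0 | 0 | 0 | 0.4286 | 0 |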

### Table 6.

Transition table with probability values for difference in four day close value.

Probability values of TPM, EPM and π for difference in four days close value (Figure 5 and Table 6):

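$$
\text{TPM} =
\begin{array}{c|cccc}
 & S_1 & S_2 & S_3 & S_4 \\ \hline
S_1 & 0.3858 & 0.1429 & 0.1429 & 0.4286 \\
S_2 & 0.5 & 0 & 0.5 & 0 \\
S_3 & 0 & 0 & 1 & 0 \\
S_4 & 0.4286 & 0.1429 & 0 & 0.4286
\end{array}
\qquad
\text{EPM} =
\begin{array}{c|cc}
 & I & D \\ \hline
S_1 & 0.1429 & 0.9573 \\
S_2 & 0.5 & 0.5 \\
S_3 & 0 & 1 \\
S_4 & 1 & 0
\end{array}
$$

|  | S1: I | S1: D | S2: I | S2: D | S3: I | S3: D | S4: I | S4: D |
|---|---|---|---|---|---|---|---|---|
| S1 | 0 | 0.1667 | 0 | 0 | 0 | 0.1667 | 0 | 0.6667 |
| S2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| S3 | 0 | 0 | 0 | 0 | 0 | 0.6667 | 0.3333 | 0 |
| S4 | 0.7143 | 0 | 0 | 0 | 0 | 0 | 0.2857 | 0 |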

### Table 7.

Transition table with probability values for difference in five day close value.

Probability values of TPM, EPM and π for difference in five days close value (Figure 6 and Table 7):

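$$
\text{TPM} =
\begin{array}{c|cccc}
 & S_1 & S_2 & S_3 & S_4 \\ \hline
S_1 & 0.1667 & 0 & 0.1667 & 0.6667 \\
S_2 & 0 & 0 & 0 & 0 \\
S_3 & 0 & 0 & 0.6667 & 0.3333 \\
S_4 & 0.7143 & 0 & 0 & 0.6667
\end{array}
\qquad
\text{EPM} =
\begin{array}{c|cc}
 & I & D \\ \hline
S_1 & 0 & 1 \\
S_2 & 0 & 1 \\
S_3 & 0.3333 & 0.6667 \\
S_4 & 1 & 0
\end{array}
$$

|  | S1: I | S1: D | S2: I | S2: D | S3: I | S3: D | S4: I | S4: D |
|---|---|---|---|---|---|---|---|---|
| S1 | 0 | 0 | 0 | 0.2 | 0 | 0.2 | 0 | 0.6 |
| S2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| S3 | 0.3333 | 0.3333 | 0 | 0 | 0.3333 | 0 | 0 | 0 |
| S4 | 0.5 | 0 | 0 | 0 | 0.25 | 0 | 0.25 | 0 |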

### Table 8.

Transition table with probability values for difference in six day close value.

Probability values of TPM, EPM and π for difference in six days close value (Figure 7 and Table 8):

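$$
\text{TPM} =
\begin{array}{c|cccc}
 & S_1 & S_2 & S_3 & S_4 \\ \hline
S_1 & 0 & 0.2 & 0.2 & 0.6 \\
S_2 & 0 & 0 & 0 & 1 \\
S_3 & 0.6667 & 0 & 0.3333 & 0 \\
S_4 & 0.5 & 0 & 0.25 & 0.25
\end{array}
\qquad
\text{EPM} =
\begin{array}{c|cc}
 & I & D \\ \hline
S_1 & 0 & 1 \\
S_2 & 0 & 1 \\
S_3 & 0.667 & 0.3333 \\
S_4 & 1 & 0
\end{array}
$$

The transition probability values for the differences in one-day to six-day close values are displayed in Figures 2 to 7, respectively.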

Optimum Sequence of States:

A random sequence of emission symbols and states is generated using the function “hmmgenerate”. The HMM MATLAB toolbox syntax is: [Sequence,States] = hmmgenerate(L,TPM,EPM). The length of both the sequence and the state path to be generated is denoted by L. The fitness function used for finding the fitted value of a sequence of states is defined by

$$
\text{Fitness} = \frac{1}{\sum \text{compare}(i, j)} \tag{1}
$$

Using the iterative procedure, we obtain an optimum sequence of states for each TPM and EPM framed.

The length of the sequence is taken as L = 4, and the optimum sequences of states obtained from all six differences with their TPM and EPM are given below; here ‘ε’ is the start symbol.
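For readers without MATLAB, the sampling step can be approximated in Python as sketched below. The function `hmm_generate` is a hypothetical stand-in, not part of any toolbox. It follows the documented hmmgenerate convention that the chain sits in the first state before the first transition, and it uses the one-day TPM and EPM values reported in this chapter.

```python
import random

# One-day TPM and EPM as reported in the chapter (rows are S1..S4).
TPM_1DAY = [
    [0, 0, 1, 0],
    [0, 0, 1, 0],
    [0.071, 0.071, 0.4286, 0.4286],
    [0, 0, 1, 0],
]
EPM_1DAY = [[0, 1], [0, 1], [0.2849, 0.7143], [0.5, 0.5]]


def hmm_generate(length, tpm, epm, symbols=("I", "D"), seed=None):
    """Sample an emission sequence and the state path that produced it.

    The chain is assumed to be in the first state before the first step,
    so the first returned state is drawn from the first row of tpm.
    States are returned as 1-based labels (1 means S1).
    """
    rng = random.Random(seed)
    state = 0
    seq, states = [], []
    for _ in range(length):
        # Move to the next state, then emit a symbol from that state
        state = rng.choices(range(len(tpm)), weights=tpm[state])[0]
        seq.append(rng.choices(symbols, weights=epm[state])[0])
        states.append(state + 1)
    return seq, states


seq, states = hmm_generate(4, TPM_1DAY, EPM_1DAY, seed=1)
print(seq, states)
```

Because the rows are passed as weights, they do not have to sum exactly to 1, which is convenient for rounded table values.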

1. ε → IS4 → DS4 → IS3 → DS4
2. ε → DS1 → IS4 → IS4 → DS3
3. ε → IS4 → IS2 → DS3 → DS1
4. ε → DS1 → DS4 → IS3 → DS4
5. ε → IS3 → IS3 → DS2 → DS4
6. ε → IS4 → DS1 → IS3 → DS4

Here, the TPM and EPM of the one-day difference give the shortest path, so the best optimum sequence is found from the one-day difference in close value. Using the fitness function, we compute the fitness value for each of the optimum sequences of states obtained (Table 9).

| S. no. | Comparison of six optimum sequence of states | Calculated value | Fitness = 1 / Σ comparison(i, j) |
|---|---|---|---|
| 1 | (1,2) + (1,3) + (1,4) | 1 | 1 |
| 2 | (2,1) + (2,3) + (2,4) | 1.7 | 0.588 |
| 3 | (3,1) + (3,2) + (3,4) | 2.425 | 0.412 |
| 4 | (4,1) + (4,2) + (4,3) | 3.15 | 0.32 |

### Table 9.

Comparison of six optimum state sequences.

In column four, the highest value is the best fitness value: the higher the fitness, the better the performance of the particular sequence.
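The fitness column is the reciprocal of the calculated value, which the short sketch below reproduces from the four calculated values in Table 9. The function name is illustrative. The reciprocal of 3.15 is about 0.317, which the table reports as 0.32.

```python
def fitness(calculated_value):
    """Fitness as the reciprocal of the summed comparison value."""
    return 1.0 / calculated_value


# Calculated values from Table 9, in row order.
calculated = [1, 1.7, 2.425, 3.15]
fitness_values = [fitness(v) for v in calculated]

# 1-based row number of the sequence with the highest fitness
best_row = max(range(len(calculated)), key=lambda k: fitness_values[k]) + 1

for row, value in enumerate(fitness_values, start=1):
    print(row, round(value, 3))
print("best row:", best_row)
```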

## 4. Conclusion

Stock prediction is challenging because of its randomness. A Hidden Markov Model can be used for stock prediction by finding hidden patterns. Here the Hidden Markov Model readily recognized four states of the stock market, and it was also used to predict future values. The highest fitness value among the optimum state sequences indicates the best-performing sequence. Hidden states and sequences have been generated to identify whether the next day's value is increasing or decreasing, and also whether an increase is moderate high, high or very high and whether a decrease is moderate low, low or very low. This model will be useful for short-term as well as long-term investors.

© 2020 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution 3.0 License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

## How to cite and reference

### Cite this chapter

Deneshkumar Venugopal, Senthamarai Kannan Kaliyaperumal and Sonai Muthu Niraikulathan (November 12th 2020). Stock Market Trend Prediction Using Hidden Markov Model [Online First], IntechOpen, DOI: 10.5772/intechopen.93988. Available from: