Open access peer-reviewed chapter

Stock Market Trend Prediction Using Hidden Markov Model

Written By

Deneshkumar Venugopal, Senthamarai Kannan Kaliyaperumal and Sonai Muthu Niraikulathan

Submitted: 20 May 2020 Reviewed: 11 September 2020 Published: 12 November 2020

DOI: 10.5772/intechopen.93988

From the Edited Volume

Forecasting in Mathematics - Recent Advances, New Perspectives and Applications

Edited by Abdo Abou Jaoude


Abstract

In recent years, many forecasting methods have been proposed and implemented for stock market trend prediction. In this chapter, a trend analysis for stock market prediction is presented using a hidden Markov model (HMM) with the one-day difference in close value over a particular period. The probability vector π gives the trend percentage of the stock prices, calculated over all the observed sequences and hidden sequences. The chapter helps decision makers to make decisions under uncertainty on the basis of the probability percentages obtained from the steady-state probability distribution.

Keywords

  • stock market
  • HMM
  • TPM
  • EPM
  • trend prediction

1. Introduction

The fundamental idea behind a hidden Markov model is that there is a Markov process we cannot observe which determines the probability distribution of what we do observe. A hidden Markov model is therefore specified by the transition density of the Markov chain and by the probability laws that govern what we observe given the state of the chain. Given such a model, we want to estimate any parameters that occur in it, determine the most likely sequence of the hidden process, and, finally, obtain the probability distribution of the hidden states at every location.

Let $y_t$ represent the observed value of the process at location $t$ for $t = 1, \ldots, T$, let $\theta_t$ be the value of the hidden process at location $t$, and let $\phi$ represent the parameters necessary to determine the probability distribution of $y_t$ given $\theta_t$ and of $\theta_t$ given $\theta_{t-1}$. In our application, $y_t$ is either an increase or a decrease, and the hidden process determines the probability distribution of observing the different symbols.

Our model is then described by the two sets of probability distributions $p(y_t \mid \theta_t, \phi)$ and $p(\theta_t \mid \theta_{t-1}, \phi)$. A crucial component of this model is that the $y_t$ are independent given the set of $\theta_t$, and that each $\theta_t$ depends directly only on its neighbors $\theta_{t-1}$ and $\theta_{t+1}$. The distributions in which we are interested are $p(\phi \mid y_1, \ldots, y_T)$, $p(\theta_t \mid y_1, \ldots, y_T)$ for all $t$, and $p(\theta_1, \ldots, \theta_T \mid y_1, \ldots, y_T)$. We adopt a Bayesian perspective, so that $\theta_t$ is treated as a random variable [1, 2].
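Under these conditional independence assumptions the joint distribution factorizes into transition and emission terms; writing $p(\theta_1)$ for the initial state distribution, a standard statement of this (implicit in the setup above) is

$$p(y_1, \ldots, y_T, \theta_1, \ldots, \theta_T \mid \phi) = p(\theta_1)\, p(y_1 \mid \theta_1, \phi) \prod_{t=2}^{T} p(\theta_t \mid \theta_{t-1}, \phi)\, p(y_t \mid \theta_t, \phi).$$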

The measure of the best path is the one with maximum probability in the HMM, given the sequence $X$. Recall that the model gives the joint probability $\Pr(H, X)$ for every state path $H$ and sequence $X$; it therefore also gives the posterior probability $\Pr(H \mid X) = \Pr(H, X)/\Pr(X)$ of every possible state path $H$ through the model, conditioned on the sequence $X$ [3, 4]. Since the denominator $\Pr(X)$ is constant for a given sequence $X$, maximizing the posterior probability is equivalent to finding the state path $H^{*}$ that maximizes the joint probability $\Pr(H, X)$. Nguyen [5] determined the optimal number of states for an HMM using the AIC, BIC and HQ information criteria and also discussed applications of HMMs in stock trading. Hassan and Nath [6] applied an HMM to airline stock forecasting. HMMs have been used for pattern recognition and classification problems and are well suited to modeling dynamic systems.


2. Hidden Markov model

A hidden Markov model (HMM) is a stochastic model in which the underlying process is not directly observable; it describes observable events that depend on internal factors. The observable events are represented as symbols, while the invisible factor behind an observation is represented as a state. The system is assumed to be a Markov process with hidden states, which often gives better accuracy than comparable models. From the given input values, the parameters of the HMM (λ), denoted A, B and π, are estimated. An HMM is defined as λ = (S, O, A, B, π), where S = {s1, s2, …, sN} is a set of N possible states, O = {o1, o2, …, oM} is a set of M possible observation symbols, A is an N × N state transition probability matrix (TPM), B is an N × M observation or emission probability matrix (EPM), and π is an N-dimensional initial state probability distribution vector. A, B and π must satisfy the following conditions (Figure 1):

Figure 1.

Diagram of HMM.

$$\sum_{j=1}^{N} a_{ij} = 1, \qquad 1 \le i \le N;$$
$$\sum_{j=1}^{M} b_{ij} = 1, \qquad 1 \le i \le N;$$
$$\sum_{i=1}^{N} \pi_i = 1, \qquad \pi_i \ge 0.$$
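As a minimal sketch of this parameterization (illustrative NumPy code; the numerical values are placeholders, not the chapter's estimates):

```python
import numpy as np

# lambda = (A, B, pi): placeholder values for a 4-state, 2-symbol HMM
N, M = 4, 2                          # N hidden states, M observation symbols

A = np.array([[0.7, 0.1, 0.1, 0.1],  # N x N transition probability matrix (TPM)
              [0.1, 0.7, 0.1, 0.1],
              [0.1, 0.1, 0.7, 0.1],
              [0.1, 0.1, 0.1, 0.7]])
B = np.array([[0.9, 0.1],            # N x M emission probability matrix (EPM),
              [0.6, 0.4],            # columns ordered (I, D)
              [0.4, 0.6],
              [0.1, 0.9]])
pi = np.full(N, 0.25)                # initial state probability distribution

# The three constraints above: rows of A and B, and pi itself, sum to 1
assert np.allclose(A.sum(axis=1), 1) and np.allclose(B.sum(axis=1), 1)
assert np.isclose(pi.sum(), 1) and (pi >= 0).all()
```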

2.1 Evaluation problem

Given the HMM λ = {A, B, π} and the observation sequence O = o1, o2, …, oM, compute the probability that the model λ has generated the sequence O. This problem is usually solved with the forward-backward algorithm [7, 8].
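A compact sketch of the forward pass (the α-recursion) for this evaluation problem, assuming the A, B, pi arrays defined above and observations encoded as symbol indices:

```python
import numpy as np

def forward_probability(obs, A, B, pi):
    """Return P(O | lambda) via the forward recursion.

    obs: list of observation symbol indices (e.g. 0 = 'I', 1 = 'D').
    Unscaled for clarity; long sequences need scaling or log space.
    """
    alpha = pi * B[:, obs[0]]             # alpha_1(i) = pi_i * b_i(o_1)
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]     # alpha_t(j) = (sum_i alpha_{t-1}(i) a_ij) b_j(o_t)
    return float(alpha.sum())             # P(O | lambda) = sum_i alpha_T(i)

# e.g. probability of observing I, D, I, D under the placeholder model above:
# print(forward_probability([0, 1, 0, 1], A, B, pi))
```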

2.2 Decoding problem

Given the HMM λ = {A, B, π} and the observation sequence O = o1, o2, …, oM, find the most likely sequence of hidden states that produced the observation sequence O. This problem is usually handled with the Viterbi algorithm [7, 8].
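A sketch of Viterbi decoding under the same encoding (log space avoids underflow; zero probabilities become -inf, which max/argmax handle correctly):

```python
import numpy as np

def viterbi(obs, A, B, pi):
    """Most likely hidden state path for obs (state index 0 -> S1, etc.)."""
    with np.errstate(divide="ignore"):       # log(0) -> -inf is intended here
        logA, logB, logpi = np.log(A), np.log(B), np.log(pi)
    T, N = len(obs), A.shape[0]
    delta = logpi + logB[:, obs[0]]          # best log-prob of a path ending in each state
    psi = np.zeros((T, N), dtype=int)        # back-pointers
    for t in range(1, T):
        scores = delta[:, None] + logA       # scores[i, j]: best path into i, then i -> j
        psi[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + logB[:, obs[t]]
    path = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):            # backtrack through the pointers
        path.append(int(psi[t][path[-1]]))
    return path[::-1]

# e.g. viterbi([0, 1, 0, 1], A, B, pi) with the parameters sketched earlier
```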

2.3 Learning problem

Given one or more training observation sequences O = o1, o2, …, oM and the general structure of the HMM (the numbers of hidden and visible states), determine the HMM parameters λ = {A, B, π} that best fit the training data. The most common solution to this problem is the Baum-Welch algorithm [9, 10], which is considered the traditional method for training an HMM.
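Since the chapter cites Baum-Welch as the standard training method, a single-sequence sketch may help; this is a generic EM implementation (our code, not the chapter's), using scaled forward-backward passes:

```python
import numpy as np

def baum_welch(obs, A, B, pi, n_iter=50):
    """Single-sequence Baum-Welch (EM) sketch with scaled forward-backward.

    obs: sequence of symbol indices. Returns re-estimated (A, B, pi).
    Illustrative only: no convergence test, no smoothing of empty rows.
    """
    obs = np.asarray(obs)
    T, N = len(obs), A.shape[0]
    for _ in range(n_iter):
        # E-step: scaled forward pass (c[t] are the per-step scale factors)
        alpha = np.zeros((T, N)); c = np.zeros(T)
        alpha[0] = pi * B[:, obs[0]]; c[0] = alpha[0].sum(); alpha[0] /= c[0]
        for t in range(1, T):
            alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
            c[t] = alpha[t].sum(); alpha[t] /= c[t]
        # Scaled backward pass
        beta = np.ones((T, N))
        for t in range(T - 2, -1, -1):
            beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1]) / c[t + 1]
        gamma = alpha * beta                  # state posteriors P(theta_t = i | O)
        gamma /= gamma.sum(axis=1, keepdims=True)
        xi = np.zeros((N, N))                 # expected transition counts
        for t in range(T - 1):
            xi += alpha[t][:, None] * A * (B[:, obs[t + 1]] * beta[t + 1])[None, :] / c[t + 1]
        # M-step: re-normalize expected counts into new parameters
        pi = gamma[0]
        A = xi / xi.sum(axis=1, keepdims=True)
        B = np.vstack([gamma[obs == k].sum(axis=0) for k in range(B.shape[1])]).T
        B = B / B.sum(axis=1, keepdims=True)
    return A, B, pi
```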


3. Results and discussions

In this chapter, the data have been taken from Yahoofinance.com; the NSE daily close values for the month of January 2020 are used for the analysis.

Two observation symbols are used: "I" for increasing states and "D" for decreasing states. If the difference in close value is greater than 0, the symbol "I" is observed; if the difference is less than 0, the symbol "D" is observed. Six hidden states were assumed, denoted S1, S2, S3, S4, S5 and S6 and indicating very low, low, moderately low, moderately high, high and very high, respectively; in the analysis below only the four states S1–S4, defined by the interval values that follow, occur. The states are not directly observable.

The situations of the stock market are considered hidden. Given a sequence of observations, we can find the hidden state sequence that produced those observations. Table 1 shows the daily close values of the stock market.

S. no   Date         Close
1       01/02/2020   41,626.64
2       01/03/2020   41,464.61
3       01/06/2020   40,676.63
4       01/07/2020   40,869.47
5       01/08/2020   40,817.74
6       01/09/2020   41,452.35
7       01/10/2020   41,599.72
8       01/13/2020   41,859.69
9       01/14/2020   41,952.63
10      01/15/2020   41,872.73
11      01/16/2020   41,932.56
12      01/17/2020   41,945.37
13      01/20/2020   41,528.91
14      01/21/2020   41,323.81
15      01/22/2020   41,115.38
16      01/23/2020   41,386.40
17      01/24/2020   41,613.19
18      01/27/2020   41,155.12
19      01/28/2020   40,966.86
20      01/29/2020   41,198.66
21      01/30/2020   40,913.82
22      01/31/2020   40,723.49

Table 1.

Daily close value of NSE.

Interval values:

S1 = −9500 to −551.

S2 = −550 to −251.

S3 = −250 to 249.

S4 = 250 to 8500.
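A minimal sketch of this encoding step (assuming, as the signs in Table 2 suggest, that the difference is taken as the previous close minus the current close; variable and function names are ours):

```python
import numpy as np

# Daily close values from Table 1 (first few shown; the chapter uses the
# full January 2020 NSE series)
close = np.array([41626.64, 41464.61, 40676.63, 40869.47, 40817.74])

diff = close[:-1] - close[1:]             # one-day difference in close value

# Observation symbol: 'I' if the difference is > 0, otherwise 'D'
symbols = np.where(diff > 0, "I", "D")

def state(d):
    """Map a difference to one of the interval states S1-S4 listed above."""
    if d <= -551:
        return "S1"
    if d <= -251:
        return "S2"
    if d <= 249:
        return "S3"
    return "S4"

states = [state(d) for d in diff]
print(list(zip(np.round(diff, 2), symbols, states)))
# -> [(162.03, 'I', 'S3'), (787.98, 'I', 'S4'), (-192.84, 'D', 'S3'), (51.73, 'I', 'S3')]
```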

S. no   Close       1-day diff.    2-day diff.    3-day diff.    4-day diff.    5-day diff.    6-day diff.
1       41,626.64
2       41,464.61   162.03 I
3       40,676.63   787.98 I       −625.95 D
4       40,869.47   −192.84 D      980.82 I       −1606.77 D
5       40,817.74   51.73 I        −244.57 D      1225.39 I      −2882.16 D
6       41,452.35   −634.61 D      686.84 I       −930.91 D      2156.3 I       −4988.46 D
7       41,599.72   −147.37 D      −487.24 D      1173.58 I      2104.49 I      4260.79 I      −9249.25 D
8       41,859.69   −259.97 D      112.6 I        −599.84 D      1773.42 I      −3877.91 D     8138.7 I
9       41,952.63   −92.94 D       −167.03 D      279.63 I       −879.47 D      2652.89 I      −6530.8 D
10      41,872.73   79.9 I         −172.84 D      5.81 I         273.82 I       −1153.28 D     3806.18 I
11      41,932.56   −59.83 D       139.73 I       −312.57 D      318.38 I       −44.56 D       −1108.73 D
12      41,945.37   −12.81 D       −47.02 D       −92.71 D       405.28 I       −86.9 D        42.34 I
13      41,528.91   416.46 I       403.65 I       −450.67 D      357.96 I       47.32 I        −134.22 D
14      41,323.81   205.1 I        211.36 I       192.22 I       −642.96 D      1000.92 I      −953.6 D
15      41,115.38   208.43 I       −3.33 D        214.69 I       −22.4 D        −620.56 D      1621.48 I
16      41,386.40   −271.02 D      479.45 I       −482.78 D      697.47 I       −719.87 D      99.31 I
17      41,613.19   −226.79 D      −44.23 D       523.68 I       −1006.46 D     1703.93 I      −2423.8 D
18      41,155.12   458.01 I       −684.86 D      640.63 I       −116.95 D      −889.51 D      2593.44 I
19      40,966.86   188.26 I       269.81 I       −415.05 D      1055.68 I      938.73 I       −1828.24 D
20      41,198.66   −231.8 D       420.06 I       −150.25 D      −264.8 D       1320.48 I      −381.75 D
21      40,913.82   284.84 I       −516.64 I      936.7 I        −1086.95 D     822.15 I       498.33 I
22      40,723.49   190.33 I       94.51 I        −611.15 D      1547.85 I      −2634.8 D      3456.95 I

Table 2.

Differences in the close value over one to six days; each difference is followed by its observed symbol (I or D).

From the differences and observed symbols in Table 2, the probability values of the TPM and EPM for the one-day to six-day close value differences are calculated as given below.
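The chapter does not state the estimation procedure explicitly; a common approach, sketched below under that assumption, is to count transitions and emissions along the labeled run from Table 2 and normalize each row (reusing the states and symbols arrays from the encoding sketch above; estimate_tpm_epm is our name):

```python
import numpy as np

STATE = {"S1": 0, "S2": 1, "S3": 2, "S4": 3}
SYM = {"I": 0, "D": 1}

def estimate_tpm_epm(states, symbols):
    """Count transitions/emissions along one labeled run, then row-normalize."""
    A = np.zeros((4, 4))                  # TPM counts
    B = np.zeros((4, 2))                  # EPM counts
    for s, o in zip(states, symbols):
        B[STATE[s], SYM[o]] += 1
    for prev, nxt in zip(states[:-1], states[1:]):
        A[STATE[prev], STATE[nxt]] += 1
    for M_ in (A, B):
        row = M_.sum(axis=1, keepdims=True)
        np.divide(M_, row, out=M_, where=row > 0)  # leave empty rows as zeros
    return A, B
```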

From/To  S1              S2              S3                  S4
         I       D       I       D       I        D          I       D
S1       0       0       0       0       1        0          0       0
S2       0       0       0       0       1        0          0       0
S3       0.0710  0       0.0710  0       0.1429   0.2857     0       0.4286
S4       0       0       0       0       0.8      0.2        0       0

Table 3.

Transitions with probability values for the one-day close value differences.

Probability values of the TPM and EPM for the one-day close value differences (Figure 2 and Table 3), with rows and columns of $A$ ordered S1–S4 and columns of $B$ ordered (I, D); the same ordering is used for all six matrix pairs below:

$$A = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 1 & 0 \\ 0.0710 & 0.0710 & 0.4286 & 0.4286 \\ 0 & 0 & 1 & 0 \end{pmatrix}, \qquad B = \begin{pmatrix} 0 & 1 \\ 0 & 1 \\ 0.2849 & 0.7143 \\ 0.5 & 0.5 \end{pmatrix}$$

Figure 2.

Diagram of TPM day 1.

From/To  S1              S2              S3                  S4
         I       D       I       D       I        D          I       D
S1       0       0       0       0       0        0          0.5     0.5
S2       0       0       0       0       0.5      0.5        0       0
S3       0       0.1110  0       0       0.3333   0.2222     0.1111  0.2222
S4       0       0       0.3333  0       0.5      0          0.1667  0

Table 4.

Transition table with probability values for the two-day close value differences.

Probability values of the TPM and EPM for the two-day close value differences (Figure 3 and Table 4):

$$A = \begin{pmatrix} 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \\ 0.1111 & 0 & 0.5555 & 0.3333 \\ 0 & 0.3333 & 0.5 & 0.1667 \end{pmatrix}, \qquad B = \begin{pmatrix} 0.5 & 0.5 \\ 0.5 & 0.5 \\ 0.4444 & 0.5556 \\ 1 & 0 \end{pmatrix}$$

Figure 3.

Diagram of TPM day 2.

From/To  S1              S2              S3                  S4
         I       D       I       D       I        D          I       D
S1       0       0       0       0       0        0          0       1
S2       0       0       0       0       0        0.75       0       0.25
S3       0       0       0.4     0.2     0.2      0          0       0.2
S4       0.5     0       0.2     0       0.2      0          0.2     0

Table 5.

Transition table with probability values for the three-day close value differences.

Probability values of the TPM and EPM for the three-day close value differences (Figure 4 and Table 5):

$$A = \begin{pmatrix} 0 & 0 & 0 & 1 \\ 0 & 0 & 0.75 & 0.25 \\ 0 & 0.6 & 0.2 & 0.2 \\ 0.5 & 0.2 & 0.2 & 0.2 \end{pmatrix}, \qquad B = \begin{pmatrix} 0 & 1 \\ 0 & 1 \\ 0.6 & 0.4 \\ 1 & 0 \end{pmatrix}$$

Figure 4.

Diagram of TPM day 3.

From/To  S1              S2              S3                  S4
         I       D       I       D       I        D          I       D
S1       0.1429  0.2429  0       0.1429  0        0.1429     0       0.4286
S2       0.5     0       0       0       0        0.5        0       0
S3       0       0       0       0       0        1          0       0
S4       0.4286  0       0.1429  0       0        0          0.4286  0

Table 6.

Transition table with probability values for the four-day close value differences.

Probability values of the TPM and EPM for the four-day close value differences (Figure 5 and Table 6):

$$A = \begin{pmatrix} 0.3858 & 0.1429 & 0.1429 & 0.4286 \\ 0.5 & 0 & 0.5 & 0 \\ 0 & 0 & 1 & 0 \\ 0.4286 & 0.1429 & 0 & 0.4286 \end{pmatrix}, \qquad B = \begin{pmatrix} 0.1429 & 0.9573 \\ 0.5 & 0.5 \\ 0 & 1 \\ 1 & 0 \end{pmatrix}$$

Figure 5.

Diagram of TPM day 4.

From/To  S1              S2              S3                  S4
         I       D       I       D       I        D          I       D
S1       0       0.1667  0       0       0        0.1667     0       0.6667
S2       0       0       0       0       0        0          0       0
S3       0       0       0       0       0        0.6667     0.3333  0
S4       0.7143  0       0       0       0        0          0.2857  0

Table 7.

Transition table with probability values for the five-day close value differences.

Probability values of the TPM and EPM for the five-day close value differences (Figure 6 and Table 7):

$$A = \begin{pmatrix} 0.1667 & 0 & 0.1667 & 0.6667 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0.6667 & 0.3333 \\ 0.7143 & 0 & 0 & 0.2857 \end{pmatrix}, \qquad B = \begin{pmatrix} 0 & 1 \\ 0 & 1 \\ 0.3333 & 0.6667 \\ 1 & 0 \end{pmatrix}$$

Figure 6.

Diagram of TPM day 5.

From/To  S1              S2              S3                  S4
         I       D       I       D       I        D          I       D
S1       0       0       0       0.2     0        0.2        0       0.6
S2       0       0       0       0       0        0          0       1
S3       0.3333  0.3333  0       0       0.3333   0          0       0
S4       0.5     0       0       0       0.25     0          0.25    0

Table 8.

Transition table with probability values for the six-day close value differences.

Probability values of the TPM and EPM for the six-day close value differences (Figure 7 and Table 8):

$$A = \begin{pmatrix} 0 & 0.2 & 0.2 & 0.6 \\ 0 & 0 & 0 & 1 \\ 0.6667 & 0 & 0.3333 & 0 \\ 0.5 & 0 & 0.25 & 0.25 \end{pmatrix}, \qquad B = \begin{pmatrix} 0 & 1 \\ 0 & 1 \\ 0.6667 & 0.3333 \\ 1 & 0 \end{pmatrix}$$

Figure 7.

Diagram of TPM day 6.

The transition probability values for the one-day to six-day close value differences are displayed in Figures 2–7, respectively.
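The abstract refers to probabilities obtained from the steady-state probability distribution. One standard way to obtain such a distribution from a TPM is power iteration on $\pi A = \pi$; a sketch using the one-day TPM as printed above (our code, not the chapter's procedure):

```python
import numpy as np

# One-day TPM from the chapter (rows/columns ordered S1..S4)
A = np.array([[0.0,    0.0,    1.0,    0.0   ],
              [0.0,    0.0,    1.0,    0.0   ],
              [0.0710, 0.0710, 0.4286, 0.4286],
              [0.0,    0.0,    1.0,    0.0   ]])

# Power iteration: repeatedly apply pi <- pi A until it stops changing
pi = np.full(4, 0.25)
for _ in range(1000):
    nxt = pi @ A
    nxt /= nxt.sum()          # renormalize; the published rows sum only to ~1
    if np.allclose(nxt, pi, atol=1e-12):
        break
    pi = nxt
print(np.round(pi, 4))        # steady-state occupancy of S1..S4
```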

Optimum Sequence of States:

A random sequence of emission symbols and states is generated using the function hmmgenerate. The MATLAB HMM toolbox syntax is [Sequence,States] = hmmgenerate(L,TPM,EPM), where L denotes the length of the sequence and state path to be generated [11]. The fitness function used for finding the fitted value of a sequence of states is defined by

$$\text{Fitness} = \frac{1}{\sum_{j \ne i} \text{compare}(i,j)} \tag{1}$$

where compare(i, j) is the comparison value between optimum sequences i and j (see Table 9).

Using this iterative procedure, an optimum sequence of states is generated for each TPM and EPM framed.
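A Python analogue of the hmmgenerate call may clarify this step (a sketch only; the chapter itself uses the MATLAB toolbox function, which per its documentation starts from state 1 and transitions before each emission):

```python
import numpy as np

def hmm_generate(L, TPM, EPM, seed=0):
    """Sketch of an hmmgenerate-style sampler: emission sequence + state path."""
    rng = np.random.default_rng(seed)
    TPM, EPM = np.asarray(TPM, float), np.asarray(EPM, float)
    seq, states = [], []
    s = 0                                               # start state, as in MATLAB
    for _ in range(L):
        p = TPM[s] / TPM[s].sum()                       # renormalize: published rows sum to ~1
        s = int(rng.choice(len(TPM), p=p))              # next hidden state
        q = EPM[s] / EPM[s].sum()
        seq.append(int(rng.choice(EPM.shape[1], p=q)))  # emit a symbol (0 = 'I', 1 = 'D')
        states.append(s)
    return seq, states
```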

The length of the sequence is taken as L = 4, and the optimum sequences of states obtained from all six days' differences with their TPM and EPM are given below, where 'ε' is the start symbol and each observed symbol is shown with the hidden state that emitted it.

1. ε → I (S4) → D (S4) → I (S3) → D (S4)
2. ε → D (S1) → I (S4) → I (S4) → D (S3)
3. ε → I (S4) → I (S2) → D (S3) → D (S1)
4. ε → D (S1) → D (S4) → I (S3) → D (S4)
5. ε → I (S3) → I (S3) → D (S2) → D (S4)
6. ε → I (S4) → D (S1) → I (S3) → D (S4)

Here, the one-day difference TPM and EPM give the shortest path, so the best optimum sequence is found from the one-day difference in close value. Using the fitness function, we compute the fitness value for each of the optimum sequences of states obtained (Table 9).

S. no.  Comparison of optimum sequences of states   Calculated value   Fitness = 1/Σ compare(i,j)
1       (1,2) + (1,3) + (1,4)                       1                  1
2       (2,1) + (2,3) + (2,4)                       1.7                0.588
3       (3,1) + (3,2) + (3,4)                       2.425              0.412
4       (4,1) + (4,2) + (4,3)                       3.15               0.32

Table 9.

Comparison of the six optimum state sequences.

In the fourth column, the higher the fitness value, the better the performance of the particular sequence.


4. Conclusion

Stock prediction is challenging due to its randomness. A hidden Markov model can be used for stock prediction by finding hidden patterns. Here the hidden Markov model readily recognized four states of the stock market and was used to predict future values. The highest fitness value among the optimum state sequences indicates the best-performing sequence. Hidden states and sequences have been generated to identify easily the level of the sequence, i.e., whether the next day's value is increasing, and whether an increasing level is moderately high, high or very high, or a decreasing level is moderately low, low or very low. This model will be very useful for short-term as well as long-term investors.

References

  1. Medhi J. Stochastic Processes. New Age International; 1994
  2. Reilly C. Statistics in Human Genetics and Molecular Biology. CRC Press; 2009
  3. Brejová B, Brown DG, Vinař T. Advances in hidden Markov models for sequence annotation. In: Bioinformatics Algorithms: Techniques and Applications. 2008. p. 55
  4. Gupta A, Dhingra B. Stock market prediction using hidden Markov models. In: 2012 Students Conference on Engineering and Systems. IEEE; 2012. pp. 1-4
  5. Nguyen N. Hidden Markov model for stock trading. International Journal of Financial Studies. 2018;6(2):36
  6. Hassan MR, Nath B. Stock market forecasting using hidden Markov model: A new approach. In: 5th International Conference on Intelligent Systems Design and Applications (ISDA'05). IEEE; 2005. pp. 192-196
  7. Rabiner L. Theory and implementation of hidden Markov models. In: Fundamentals of Speech Recognition. 1993
  8. Rabiner LR. A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE. 1989;77(2):257-286
  9. Welch LR. Hidden Markov models and the Baum-Welch algorithm. IEEE Information Theory Society Newsletter. 2003;53(4)
  10. Mandoiu I, Zelikovsky A, editors. Bioinformatics Algorithms: Techniques and Applications. John Wiley & Sons; 2008
  11. Murphy K. HMM Toolbox for MATLAB. 1998. Available from: http://www.cs.ubc.ca/murphyk/Software/HMM/hmm.html [Accessed: October 29, 2011]
