Open access peer-reviewed chapter

Design and Analysis of Robust Deep Learning Models for Stock Price Prediction

Written By

Jaydip Sen and Sidra Mehtab

Submitted: 28 May 2021 Reviewed: 18 August 2021 Published: 21 September 2021

DOI: 10.5772/intechopen.99982

From the Edited Volume

Machine Learning - Algorithms, Models and Applications

Edited by Jaydip Sen


Abstract

Building predictive models for robust and accurate prediction of stock prices and stock price movement is a challenging research problem. The well-known efficient market hypothesis asserts that accurate prediction of future stock prices is impossible in an efficient stock market, as the stock prices are assumed to be purely stochastic. However, numerous works proposed by researchers have demonstrated that it is possible to predict future stock prices with a high level of precision using sophisticated algorithms, model architectures, and the selection of appropriate variables in the models. This chapter proposes a collection of predictive regression models built on deep learning architectures for robust and precise prediction of the future prices of a stock listed in the diversified sectors of the National Stock Exchange (NSE) of India. The Metastock tool is used to download the historical stock prices over a period of two years (2013–2014) at 5-minute intervals. While the records for the first year are used to train the models, the testing is carried out using the remaining records. The design approaches of all the models and their performance results are presented in detail. The models are also compared based on their execution time and accuracy of prediction.

Keywords

  • Stock Price Forecasting
  • Deep Learning
  • Univariate Analysis
  • Multivariate Analysis
  • Time Series Regression
  • Root Mean Square Error (RMSE)
  • Long-and-Short-Term Memory (LSTM) Network
  • Convolutional Neural Network (CNN)

1. Introduction

Building predictive models for robust and accurate prediction of stock prices and stock price movement is a very challenging research problem. The well-known efficient market hypothesis precludes any possibility of accurate prediction of future stock prices since it assumes stock prices to be purely stochastic in nature. However, numerous works in the finance literature have shown that robust and precise prediction of future stock prices is possible using sophisticated machine learning and deep learning algorithms, model architectures, and the selection of appropriate variables in the models.

Technical analysis of stocks has been a very interesting area of work for the researchers engaged in security and portfolio analysis. Numerous approaches to technical analysis have been proposed in the literature. Most of the algorithms here work on searching and finding some pre-identified patterns and sequences in the time series of stock prices. Prior detection of such patterns can be useful for the investors in the stock market in formulating their investment strategies in the market to maximize their profit. A rich set of such patterns has been identified in the finance literature for studying the behavior of stock price time series.

In this chapter, we propose a collection of forecasting models for predicting the prices of an important stock of the diversified sector of India. The predictive framework consists of four CNN regression models and six models of regression built on the long-and-short-term memory (LSTM) architecture. Each model has a different architecture, a different shape of the input data, and different hyperparameter values.

The current work has the following three contributions. First, unlike the currently existing works in the literature, which mostly deal with time-series data of daily or weekly stock prices, the models in this work are built and tested on stock price data at a small interval of 5 minutes. Second, our propositions exploit the power of deep learning, and hence, they achieve a very high degree of precision and robustness in their performance. Among all models proposed in this work, the lowest ratio of the root mean square error (RMSE) to the average of the target variable is 0.006967. Finally, the speed of execution of the models is very fast. The fastest model requires 174.78 seconds for the execution of one round on the target hardware platform. It is worth mentioning here that the dataset used for training has 19500 records, while models are tested on 20500 records.

The chapter is organized as follows. Section 2 briefly discusses some related works in the literature. In Section 3, we discuss the method of data acquisition, the methodology followed, and the design details of the ten predictive models proposed by us. Section 4 exhibits the detailed experimental results and their analysis. A comparative study of the performance of the models is also made. In Section 5, we conclude the chapter and identify a few new directions of research.


2. Related work

The literature on systems and methods of stock price forecasting is quite rich. Numerous proposals exist on the mechanisms, approaches, and frameworks for predicting future stock prices and stock price movement patterns. At a broad level, these propositions can be classified into four categories. The proposals of the first category are based on different variants of univariate and multivariate regression models. Some of the notable approaches under this category are ordinary least square (OLS) regression, multivariate adaptive regression spline (MARS), penalty-based regression, and polynomial regression [1, 2, 3, 4]. These approaches are not, in general, capable of handling the high degree of volatility in the stock price data. Hence, quite often, these models do not yield an acceptable level of accuracy in prediction. Autoregressive integrated moving average (ARIMA) and other approaches of econometrics, such as cointegration, vector autoregression (VAR), causality tests, and quantile regression (QR), are some of the methods which fall under the second category of propositions [5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16]. The methods of this category are superior to the simple regression-based methods. However, if the stock price data are too volatile and exhibit strong randomness, the econometric methods also are found to be inadequate, yielding inaccurate forecasting results. The learning-based approach is the salient characteristic of the propositions of the third category. These proposals are based on various algorithms and architectures of machine learning, deep learning, and reinforcement learning [17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41]. Since the frameworks under this category use complex predictive models working on sophisticated algorithms and architectures, these models are found to yield quite accurate predictions in real-world applications.
The propositions of the fourth category are broadly based on hybrid models built of machine learning and deep learning algorithms and architectures and also on the relevant inputs of sentiment and news items extracted from the social web [42, 43, 44, 45, 46, 47]. These models are found to yield the most accurate prediction of future stock prices and stock price movement patterns. The information-theoretic approach and the wavelet analysis have also been proposed in stock price prediction [48, 49]. Several portfolio optimization methods have also been presented in some works using forecasted stock returns and risks [50, 51, 52, 53, 54, 55].

In the following, we briefly discuss the salient features of some of the works under each category. We start with the regression-based proposals.

Enke et al. propose a multi-step approach to stock price prediction using a multiple regression model [2]. The proposition is based on a differential-evolution-based fuzzy clustering model and a fuzzy neural network. Ivanovski et al. present a linear regression and correlation study on some important stock prices listed in the Macedonian Stock Exchange [3]. The results of the work indicate a strong relationship between the stock prices and the index values of the stock exchange. Sen and Datta Chaudhuri analyze the trend and the seasonal characteristics of the capital goods sector and the small-cap sector of India using a time series decomposition approach and a linear regression model [4].

Among the econometric approaches, Du proposes an integrated model combining an ARIMA and a backpropagation neural network for predicting the future index values of the Shanghai Stock Exchange [6]. Jarrett and Kyper present an ARIMA-based model for predicting future stock prices [7]. The study conducted by the authors reveals two significant findings: (i) higher accuracy is achieved by models involving fewer parameters, and (ii) the daily return values exhibit a strong autoregressive property. Sen and Datta Chaudhuri analyze different sectors of the Indian stock market using a time series decomposition approach and predict the future stock prices using different types of ARIMA and regression models [9, 10, 11, 12, 13, 14, 33]. Zhong and Enke present a gamut of econometric and statistical models, including ARIMA, generalized autoregressive conditional heteroscedasticity (GARCH), smooth transition autoregressive (STAR), and linear and quadratic discriminant analysis [16].

Machine learning and deep learning models have found widespread applications in designing predictive frameworks for stock prices. Baek and Kim propose a framework called ModAugNet, which is built on an LSTM deep learning model [17]. Chou and Nguyen present a sliding window metaheuristic optimization method for stock price prediction [19]. Gocken et al. propose a hybrid artificial neural network using harmony search and genetic algorithms to analyze the relationship between various technical indicators of stocks and the index of the Turkish stock market [21]. Mehtab and Sen propose a gamut of models designed using machine learning and deep learning algorithms and architectures for accurate prediction of future stock prices and movement patterns [22, 23, 24, 25, 26, 27, 28, 34, 35]. The authors present several models built on variants of convolutional neural networks (CNNs) and long-and-short-term memory networks (LSTMs) that yield a very high level of prediction accuracy. Zhang et al. present a multi-layer perceptron for financial data mining that is capable of recommending buy or sell strategies based on forecasted prices of stocks [40].

The hybrid models use relevant information in the social web and exploit the power of machine learning and deep learning architectures and algorithms for making predictions with a high level of accuracy. Among some well-known hybrid models, Bollen et al. present a scheme for computing the mood states of the public from the Twitter feeds and use the mood states information as an input to a nonlinear regression model built on a self-organizing fuzzy neural network [43]. The model is found to have yielded a prediction accuracy of 86%. Mehtab and Sen propose an LSTM-based predictive model with a sentiment analysis module that analyzes the public sentiment on Twitter and produces a highly accurate forecast of future stock prices [45]. Chen et al. present a scheme that collects relevant news articles from the web, converts the text corpus into a word feature set, and feeds the feature set of words into an LSTM regression model to achieve a highly accurate prediction of the future stock prices [44].

The most formidable challenge in designing a robust predictive model with a high level of precision for stock price forecasting is handling the randomness and the volatility exhibited by the time series. The current work utilizes the power of deep learning models in feature extraction and learning while exploiting their architectural diversity in achieving robustness and accuracy in stock price prediction on very granular time series data.


3. Methodology

We propose a gamut of predictive models built on deep learning architectures. We train, validate, and then test the models based on the historical stock price records of a well-known stock listed in the NSE, viz., Century Textiles. The historical prices of the Century Textiles stock from 31st Dec 2012, a Monday, to 9th Jan 2015, a Friday, are collected at 5-minute intervals using the Metastock tool [56]. We carry out the training and validation of the models using the stock price data from 31st Dec 2012 to 30th Dec 2013. The models are tested based on the records for the remaining period, i.e., from 31st Dec 2013 to 9th Jan 2015. For maintaining uniformity in the sequence, we organize the entire dataset as a sequence of daily records arranged on a weekly basis from Monday to Friday. After the dataset is organized suitably, we split it into two parts – the training set and the test set. While the training dataset consists of 19500 records, there are 20500 tuples in the test data. Every record has five attributes – open, high, low, close, and volume. We have not considered any adjusted attribute (i.e., adjusted close, adjusted volume, etc.) in our analysis.

In this chapter, we present ten regression models for stock price forecasting using a deep learning approach. For the univariate models, the objective is to forecast the future values of the variable open based on its past values. For the multivariate models, the objective is to predict the future values of open using the historical values of all five attributes in the stock data. The models are tested following an approach known as multi-step prediction using walk-forward validation [22]. In this method, we use the training data for constructing the models. The models are then used for predicting the daily open values of the stock prices for the coming week. As a week completes, we include the actual stock price records of that week in the training dataset. With this extended training dataset, the open values are forecasted with a forecast horizon of 5 days, so that the forecasts for the days in the next week are available. This process continues till all the records in the test dataset are processed.
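The walk-forward procedure described above can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' code; the function name, signature, and the `forecaster` callback (which stands in for any of the ten models) are ours.

```python
def walk_forward(series, n_train, horizon, forecaster):
    """Multi-step prediction with walk-forward validation.

    Each round forecasts `horizon` values (the daily open prices of the
    coming week); the actual values of that week are then folded back
    into the training history before the next round is forecasted.
    `forecaster(history, horizon)` stands in for any of the ten models.
    """
    history = list(series[:n_train])
    forecasts = []
    i = n_train
    while i < len(series):
        step = min(horizon, len(series) - i)
        preds = forecaster(history, horizon)[:step]
        forecasts.extend(preds)
        history.extend(series[i:i + step])  # actuals extend the training data
        i += step
    return forecasts
```

For example, with a naive forecaster that simply repeats the last observed value, `walk_forward(list(range(20)), 10, 5, lambda h, n: [h[-1]] * n)` forecasts five 9s for the first test week and five 14s for the second, since the actual values 10–14 are folded into the history before the second round.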

The suitability of CNNs in building predictive models for predicting future stock prices has been demonstrated in our previous work [22]. In the current work, we present a gamut of deep learning models built on CNN and LSTM architectures and illustrate their efficacy and effectiveness in solving the same problem.

CNNs perform two critical functions for extracting rich feature sets from input data. These functions are: (1) convolution and (2) pooling or sub-sampling [57]. A rich set of features is extracted by the convolution operation from the input, while the sub-sampling summarizes the salient features in a given locality in the feature space. The result of the final sub-sampling in a CNN is passed on to possibly multiple dense layers. The fully connected layers learn from the extracted features. The fully connected layers provide the network with the power of prediction.

LSTM is an adapted form of a recurrent neural network (RNN) and can interpret and then forecast sequential data like text and numerical time series data [57]. The network has the ability to memorize information on its past states in designated memory cells. The flow of information into and out of these memory cells is controlled by structures called gates. The information on the past states stored in the memory cells is aggregated suitably at the forget gates by removing the irrelevant information. The input gates, on the other hand, receive information available to the network at the current timestamp. Using the information available at the input gates and the forget gates, the network computes the predicted values of the target variable. The predicted value at each timestamp is made available through the output gate of the network [57].

The deep learning-based models we present in this paper differ in their design, structure, and dataflows. Our proposition includes four models based on the CNN architecture and six models built on the LSTM network architecture. The proposed models are as follows. The models have been named following a convention. The first part of the model’s name indicates the model type (CNN or LSTM), the second part of the name indicates the nature of the input data (univariate or multivariate). Finally, the third part is an integer indicating the size of the input data to the model (5 or 10). The ten models are as follows:

(i) CNN_UNIV_5 – a CNN model with an input of univariate open values of stock price records of the last week, (ii) CNN_UNIV_10 – a CNN model with an input of univariate open values of stock price records of the last couple of weeks, (iii) CNN_MULTV_10 – a CNN model with an input of multivariate stock price records consisting of five attributes of the last couple of weeks, where each variable is passed through a separate channel in a CNN, (iv) CNN_MULTH_10 – a CNN model with the last couple of weeks’ multivariate input data where each variable is used in a dedicated CNN and then combined in a multi-headed CNN architecture, (v) LSTM_UNIV_5 – an LSTM with univariate open values of the last week as the input, (vi) LSTM_UNIV_10 – an LSTM model with the last couple of weeks’ univariate open values as the input, (vii) LSTM_UNIV_ED_10 – an LSTM having an encoding and decoding ability with univariate open values of the last couple of weeks as the input, (viii) LSTM_MULTV_ED_10 – an LSTM based on encoding and decoding of the multivariate stock price data of five attributes of the last couple of weeks as the input, (ix) LSTM_UNIV_CNN_10 – a model with an encoding CNN and a decoding LSTM with univariate open values of the last couple of weeks as the input, and (x) LSTM_UNIV_CONV_10 – a model having a convolutional block for encoding and an LSTM block for decoding and with univariate open values of the last couple of weeks as the input.

We present a brief discussion on the model design. All the hyperparameters (i.e., the number of nodes in a layer, the size of a convolutional, LSTM or pooling layer, etc.) used in all the models are optimized using grid-search. However, we have not discussed the parameter optimization issues in this work.

3.1 The CNN_UNIV_5 model

This CNN model is based on a univariate input of open values of the last week's stock price records. The model forecasts the following five values in the sequence as the predicted daily open index for the coming week. The model input has a shape of (5, 1), as the five values of the last week's daily open index are used as the input. Since the input to the model is small, a solitary convolutional block and a subsequent max-pooling block are deployed. The convolutional block has a feature space dimension of 16 and a filter (i.e., kernel) size of 3. The convolutional layer slides its kernel over the five input values, yielding three window positions and extracting 16 features at each position. Hence, the output data shape of the convolutional block is (3, 16). The max-pooling layer halves the length of the feature maps. Thus, the max-pooling operation transforms the data shape to (1, 16). The result of the max-pooling layer is transformed into a one-dimensional array by a flattening operation. This one-dimensional vector is then passed through a dense layer block and fed into the final output layer of the model. The output layer yields the five forecasted open values in sequence for the coming week. A batch size of 4 and 20 epochs are used for training the model. The rectified linear unit (ReLU) activation function and the Adam optimizer for the gradient descent algorithm are used in all layers except the final output layer, where the sigmoid activation function is used. The same activation functions and optimizer are used in all the models. The schematic architecture of the model is depicted in Figure 1.

Figure 1.

The schematic architecture of the model CNN_UNIV_5.
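The architecture can be sketched in Keras. This is a reconstruction from the description and Table 1, not the authors' original code; the 10-node width of the intermediate dense layer is taken from Table 1.

```python
from tensorflow.keras import Sequential, layers

# A sketch of CNN_UNIV_5, reconstructed from the description and Table 1.
model = Sequential([
    layers.Input(shape=(5, 1)),               # last week's five daily open values
    layers.Conv1D(16, 3, activation="relu"),  # kernel size 3, 16 features -> (3, 16)
    layers.MaxPooling1D(pool_size=2),         # halves the length -> (1, 16)
    layers.Flatten(),                         # -> vector of 16 values
    layers.Dense(10, activation="relu"),      # dense block (10 nodes, per Table 1)
    layers.Dense(5, activation="sigmoid"),    # five forecasted open values
])
model.compile(optimizer="adam", loss="mse")
```

Training would then follow the hyperparameters stated above, e.g. `model.fit(X_train, y_train, epochs=20, batch_size=4)`; the model's parameter count matches the 289 computed in Table 1.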

We compute the number of trainable parameters in the CNN_UNIV_5 model. As the role of the input layer is to provide the input data to the network, there is no learning involved in the input layer. There is no learning in the pooling layers as all these layers do is calculate the local aggregate features. The flatten layers do not involve any learning as well. Hence, in a CNN model, the trainable parameters are involved only in the convolutional layers and the dense layers. The number of trainable parameters (n1) in a one-dimensional convolutional layer is given by (1), where k is the kernel size, and d and f are the sizes of the feature space in the previous layer and the current layer, respectively. Since each element in the feature space has a bias, the term 1 is added in (1)

n1 = (k · d + 1) · f        (1)

The number of parameters (n2) in a dense layer of a CNN is given by (2), in which pcurrent and pprevious refer to the node count in the current layer and the previous layer, respectively. The second term on the right-hand side of (2) refers to the bias terms for the nodes in the current layer.

n2 = pprev · pcurr + pcurr        (2)

The computation of the number of parameters in the CNN_UNIV_5 model is presented in Table 1. It is observed that the model involves 289 trainable parameters. The number of parameters in the convolutional layer is 64, while the two dense layers involve 170 and 55 parameters, respectively.

Layer              k  d  f   p_prev  p_curr  n1  n2   #params
Conv1D (conv1d)    3  1  16  -       -       64  -    64
Dense (dense)      -  -  -   16      10      -   170  170
Dense (dense_1)    -  -  -   10      5       -   55   55
Total #parameters                                     289

Table 1.

Computation of the number of params in the model CNN_UNIV_5.
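Equations (1) and (2) can be checked against Table 1 with a few lines of Python (the helper names are ours, for illustration):

```python
def conv1d_params(k, d, f):
    """Eq. (1): k-wide kernels over a feature space of size d in the
    previous layer, plus one bias for each of the f filters."""
    return (k * d + 1) * f

def dense_params(p_prev, p_curr):
    """Eq. (2): one weight per (previous node, current node) pair,
    plus one bias per node in the current layer."""
    return p_prev * p_curr + p_curr

total = (conv1d_params(3, 1, 16)   # Conv1D (conv1d): 64
         + dense_params(16, 10)    # Dense (dense): 170
         + dense_params(10, 5))    # Dense (dense_1): 55
```

The sum reproduces the 289 trainable parameters reported in Table 1.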

3.2 The CNN_UNIV_10 model

This model is based on a univariate input of the open values of the last couple of weeks' stock price data. The model computes the five forecasted daily open values in sequence for the coming week. The structure and the data flow for this model are identical to those of the CNN_UNIV_5 model; however, the input of the model has a shape of (10, 1). We use 70 epochs and a batch size of 16 for training the model. Figure 2 shows the architecture of the model CNN_UNIV_10. The computation of the number of parameters in the model CNN_UNIV_10 is exhibited in Table 2.

Figure 2.

The architecture of the model CNN_UNIV_10.

Layer              k  d  f   p_prev  p_curr  n1  n2   #params
Conv1D (conv1d)    3  1  16  -       -       64  -    64
Dense (dense)      -  -  -   64      10      -   650  650
Dense (dense_1)    -  -  -   10      5       -   55   55
Total #parameters                                     769

Table 2.

The number of parameters in the model CNN_UNIV_10.

It is evident from Table 2 that the CNN_UNIV_10 model involves 769 trainable parameters. The parameter counts for the convolutional layer and the two dense layers are 64, 650, and 55, respectively.

3.3 The CNN_MULTV_10 model

This CNN model is built on the input of the last two weeks' multivariate stock price records. The five variables of the stock price time series are fed to the CNN through five separate channels. The model uses a couple of convolutional layers, each with a feature map size of 32 and a filter size of 3, indicating that each layer extracts 32 features from the input using a kernel of size 3. The input to the model has a shape of (10, 5), indicating ten records, each having five features of the stock price data. After the first convolutional operation, the shape of the data is transformed to (8, 32). The value 32 corresponds to the number of features extracted, while the value 8 is obtained from the formula f = (k - n) + 1, where k = 10 and n = 3; hence, f = 8. Similarly, the output data shape of the second convolutional layer is (6, 32). A max-pooling layer reduces the feature space size by a factor of 1/2, producing an output data shape of (3, 32). The max-pooling block's output is then passed on to a third convolutional layer with a feature map size of 16 and a kernel size of 3. The data shape of the output from the third convolutional layer becomes (1, 16), following the same computation rule. Finally, another max-pooling block receives the results of the final convolutional layer. This block does not reduce the feature space since its input data shape is already (1, 16); hence, the output of the final max-pooling layer remains unchanged at (1, 16). A flatten operation follows, converting the 16 arrays containing one value each into a single array containing 16 values. The output of the flatten operation is passed on to a fully connected block having 100 nodes. Finally, the output block with five nodes computes the predicted daily open index of the coming week. The epoch count and the batch size used in training the model are 70 and 16, respectively. Figure 3 depicts the CNN_MULTV_10 model.
Table 3 shows the computation of the number of trainable parameters involved in the model.

Figure 3.

The schematic architecture of the model CNN_MULTV_10.

Layer              k    d   f   p_prev  p_curr  n1    n2    #params
Conv1D (conv1d_4)  3*5  1   32  -       -       512   -     512
Conv1D (conv1d_5)  3    32  32  -       -       3104  -     3104
Conv1D (conv1d_6)  3    32  16  -       -       1552  -     1552
Dense (dense_3)    -    -   -   16      100     -     1700  1700
Dense (dense_4)    -    -   -   100     5       -     505   505
Total #parameters                                           7373

Table 3.

The number of parameters in the model CNN_MULTV_10.

From Table 3, it is observed that the total number of trainable parameters in the model CNN_MULTV_10 is 7373. The three convolutional layers conv1d_4, conv1d_5, and conv1d_6 involve 512, 3104, and 1552 parameters, respectively. It is to be noted that the value of k for the first convolutional layer, conv1d_4, is multiplied by a factor of five since there are five attributes in the input data for this layer. The two dense layers, dense_3 and dense_4, include 1700 and 505 parameters, respectively.
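A Keras sketch consistent with the layer shapes and the 7373-parameter count of Table 3 follows. This is a reconstruction from the description, not the authors' code; since a pooling layer with pool size 2 cannot be applied to a length-1 input in Keras, the final (no-op) max-pooling block is modeled with `pool_size=1`, which leaves the shape and the parameter count unchanged.

```python
from tensorflow.keras import Sequential, layers

# A sketch of CNN_MULTV_10, reconstructed from the description and Table 3.
model = Sequential([
    layers.Input(shape=(10, 5)),              # two weeks of five-attribute records
    layers.Conv1D(32, 3, activation="relu"),  # -> (8, 32), 512 parameters
    layers.Conv1D(32, 3, activation="relu"),  # -> (6, 32), 3104 parameters
    layers.MaxPooling1D(pool_size=2),         # -> (3, 32)
    layers.Conv1D(16, 3, activation="relu"),  # -> (1, 16), 1552 parameters
    layers.MaxPooling1D(pool_size=1),         # shape already (1, 16); unchanged
    layers.Flatten(),                         # -> vector of 16 values
    layers.Dense(100, activation="relu"),     # 1700 parameters
    layers.Dense(5, activation="sigmoid"),    # 505 parameters
])
model.compile(optimizer="adam", loss="mse")
```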

3.4 The CNN_MULTH_10 model

This CNN model uses a dedicated CNN block for each of the five input attributes in the stock price data. In other words, for each input variable, a separate CNN is used for feature extraction. We call this a multivariate and multi-headed CNN model. For each sub-CNN model, a couple of convolutional layers are used. The convolutional layers have a feature space dimension of 32 and a filter size (i.e., kernel size) of 3. The convolutional layers are followed by a max-pooling layer, which reduces the size of the feature space by a factor of 1/2. Following the computation rule discussed under the CNN_MULTV_10 model, the data shape of the output from the max-pooling layer for each sub-CNN model is (3, 32). A flatten operation follows, converting the data into a one-dimensional array of size 96 for each input variable. A concatenation operation then concatenates the five arrays, each containing 96 values, into a single one-dimensional array of size 96 * 5 = 480. The output of the concatenation operation is passed successively through two dense layers containing 200 nodes and 100 nodes, respectively. In the end, the output layer having five nodes yields the forecasted five values as the daily open stock prices for the coming week. The epoch count and the batch size used in training the model are 70 and 16, respectively. Figure 4 shows the structure and data flow of the CNN_MULTH_10 model.

Figure 4.

The schematic architecture of the model CNN_MULTH_10.

Table 4 presents the necessary calculations for finding the number of parameters in the CNN_MULTH_10 model. Each of the five convolutional layers, conv1d_1, conv1d_3, conv1d_5, conv1d_7, and conv1d_9, involves 128 parameters. For each of these layers, k = 3, d = 1, and f = 32, and hence the number of trainable parameters is (3 * 1 + 1) * 32 = 128. Hence, for the five convolutional layers, the total number of parameters is 128 * 5 = 640. Next, each of the five convolutional layers conv1d_2, conv1d_4, conv1d_6, conv1d_8, and conv1d_10 involves 3104 parameters. Each layer of this group has k = 3, d = 32, and f = 32; hence, the number of trainable parameters per layer is (3 * 32 + 1) * 32 = 3104. Therefore, for these five convolutional layers, the total number of parameters is 3104 * 5 = 15,520. The dense layers dense_1, dense_2, and dense_3 involve 96200, 20100, and 505 parameters, respectively, using (2). Hence, the model includes 132,965 parameters.

Layer                                                       k  d   f   p_prev  p_curr  n1     n2     #params
Conv1D (conv1d_1, conv1d_3, conv1d_5, conv1d_7, conv1d_9)   3  1   32  -       -       640    -      640
Conv1D (conv1d_2, conv1d_4, conv1d_6, conv1d_8, conv1d_10)  3  32  32  -       -       15520  -      15520
Dense (dense_1)                                             -  -   -   480     200     -      96200  96200
Dense (dense_2)                                             -  -   -   200     100     -      20100  20100
Dense (dense_3)                                             -  -   -   100     5       -      505    505
Total #parameters                                                                                    132965

Table 4.

The number of parameters in the model CNN_MULTH_10.
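The multi-headed architecture can be sketched with the Keras functional API, which allows several input branches to be merged. This is a reconstruction from the description and Table 4, not the authors' code.

```python
from tensorflow.keras import Model, layers

# A sketch of CNN_MULTH_10: one CNN head per input attribute,
# reconstructed from the description and Table 4.
inputs, heads = [], []
for _ in range(5):                                     # open, high, low, close, volume
    inp = layers.Input(shape=(10, 1))
    x = layers.Conv1D(32, 3, activation="relu")(inp)   # 128 parameters each
    x = layers.Conv1D(32, 3, activation="relu")(x)     # 3104 parameters each
    x = layers.MaxPooling1D(pool_size=2)(x)            # -> (3, 32)
    x = layers.Flatten()(x)                            # -> vector of 96 values
    inputs.append(inp)
    heads.append(x)

merged = layers.concatenate(heads)                 # 5 * 96 = 480 values
x = layers.Dense(200, activation="relu")(merged)   # 96200 parameters
x = layers.Dense(100, activation="relu")(x)        # 20100 parameters
output = layers.Dense(5, activation="sigmoid")(x)  # 505 parameters
model = Model(inputs, output)
model.compile(optimizer="adam", loss="mse")
```

The sketch reproduces the 132,965 trainable parameters computed in Table 4.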

3.5 The LSTM_UNIV_5 model

This model is based on an input of the univariate information of the open values of the last week's stock price records. The model predicts the future five values in sequence as the daily open index for the coming week. The input has a shape of (5, 1), indicating that the previous week's daily open index values are passed as the input. An LSTM block having 200 nodes receives the data from the input layer. The number of nodes at the LSTM layer is determined using grid-search. The results of the LSTM block are passed on to a fully connected layer (also known as a dense layer) of 100 nodes. Finally, the output layer containing five nodes receives the output of the dense layer and produces the five future values of open for the coming week. In training the model, 20 epochs and a batch size of 16 are used. Figure 5 presents the structure and data flow of the model.

Figure 5.

The schematic architecture of the model LSTM_UNIV_5.

As we did in the case of the CNN models, we now compute the number of parameters involved in the LSTM models. The input layers do not have any parameters, as the role of these layers is just to receive and forward the data. An LSTM network has four gates, each with the same number of parameters: (i) the forget gate, (ii) the input gate, (iii) the input modulation gate, and (iv) the output gate. The number of parameters (n1) in each of the gates in an LSTM network is computed using (3), where x denotes the number of LSTM units, and y is the input dimension (i.e., the number of features in the input data).

n1 = (x + y) · x + x        (3)

Hence, the total number of parameters in an LSTM layer will be given by 4 * n1. The number of parameters (n2) in a dense layer of an LSTM network is computed using (4), where pprev and pcurr are the number of nodes in the previous layer and the current layer, respectively. The bias parameter of each node in the current layer is represented by the last term on the right-hand side of (4).

n2 = pprev · pcurr + pcurr        (4)

The computation of the number of parameters associated with the model LSTM_UNIV_5 is depicted in Table 5. In Table 5, the number of parameters in the LSTM layer is computed as follows: 4 * [(200 + 1) * 200 + 200] = 161,600. The number of parameters in the dense layer dense_4 is computed as (200 * 100 + 100) = 20,100. The parameters in the dense layers dense_5 and dense_6 are computed similarly. The total number of parameters in the LSTM_UNIV_5 model is found to be 182,235.
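Equations (3) and (4) and the figures above can be verified with a few lines of Python (the helper names are ours, for illustration):

```python
def lstm_layer_params(x, y):
    """Four gates, each with Eq. (3) parameters: (x + y) * x weights
    plus x biases, where x is the number of LSTM units and y is the
    input dimension."""
    return 4 * ((x + y) * x + x)

def dense_params(p_prev, p_curr):
    """Eq. (4): p_prev * p_curr weights plus p_curr biases."""
    return p_prev * p_curr + p_curr

total = (lstm_layer_params(200, 1)   # LSTM (lstm_2): 161,600
         + dense_params(200, 100)    # Dense (dense_4): 20,100
         + dense_params(100, 5)      # Dense (dense_5): 505
         + dense_params(5, 5))       # Dense (dense_6): 30
```

The sum reproduces the 182,235 trainable parameters of the LSTM_UNIV_5 model.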

Layer              x    y  p_prev  p_curr  n1      n2     #params
LSTM (lstm_2)      200  1  -       -       40,400  -      161600
Dense (dense_4)    -    -  200     100     -       20100  20100
Dense (dense_5)    -    -  100     5       -       505    505
Dense (dense_6)    -    -  5       5       -       30     30
Total #parameters                                         182235

Table 5.

The number of parameters in the model LSTM_UNIV_5.
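A Keras sketch consistent with Table 5 follows. Note that the table lists an additional five-node dense layer (dense_6) beyond the 100-node dense layer and the output layer mentioned in the text; the sketch includes it so that the parameter count matches the 182,235 of Table 5. This is a reconstruction, not the authors' code.

```python
from tensorflow.keras import Sequential, layers

# A sketch of LSTM_UNIV_5, reconstructed from the description and Table 5.
model = Sequential([
    layers.Input(shape=(5, 1)),             # last week's five daily open values
    layers.LSTM(200),                       # 161,600 parameters
    layers.Dense(100, activation="relu"),   # 20,100 parameters (dense_4)
    layers.Dense(5, activation="relu"),     # 505 parameters (dense_5)
    layers.Dense(5, activation="sigmoid"),  # 30 parameters (dense_6, per Table 5)
])
model.compile(optimizer="adam", loss="mse")
```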

3.6 The LSTM_UNIV_10 model

This univariate model uses the last couple of weeks' open index values as the input and yields the daily forecasted open values for the coming week. The same parameter and hyperparameter values as in the model LSTM_UNIV_5 are used here; only the input data shape is different, being (10, 1). Figure 6 presents the architecture of this model.

Figure 6.

The schematic architecture of the model LSTM_UNIV_10.

Table 6 presents the computation of the number of parameters involved in the model LSTM_UNIV_10. Since the number of parameters in the LSTM layer depends only on the number of features in the input data and the node count in the LSTM layer, and not on the number of input records in one epoch, the model LSTM_UNIV_10 has an identical number of parameters in its LSTM layer to that of the model LSTM_UNIV_5. Since both models have the same number of dense layers with the same architecture, the total number of parameters for both models is the same.

Layer | x | y | pprev | pcurr | n1 | n2 | #params
LSTM (lstm_2) | 200 | 1 | - | - | 40,400 | - | 161,600
Dense (dense_4) | - | - | 200 | 100 | - | 20,100 | 20,100
Dense (dense_5) | - | - | 100 | 5 | - | 505 | 505
Dense (dense_6) | - | - | 5 | 5 | - | 30 | 30
Total #parameters | | | | | | | 182,235

Table 6.

The number of parameters in the model LSTM_UNIV_10.
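To make the (10, 1) input shape concrete, the sketch below shows one way to turn a univariate open-price series into overlapping two-week input windows with one-week targets. The function make_windows is our illustration, not the chapter's code:

```python
import numpy as np

def make_windows(series, n_in=10, n_out=5):
    """Slide a window of n_in past values over the series; the next
    n_out values form the multi-step target (one forecasted week)."""
    X, y = [], []
    for i in range(len(series) - n_in - n_out + 1):
        X.append(series[i:i + n_in])
        y.append(series[i + n_in:i + n_in + n_out])
    # reshape inputs to (samples, timesteps, features) for an LSTM
    return np.array(X).reshape(-1, n_in, 1), np.array(y)

prices = np.arange(100.0, 130.0)   # 30 dummy daily open values
X, y = make_windows(prices)
print(X.shape, y.shape)            # (16, 10, 1) (16, 5)
```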

3.7 The LSTM_UNIV_ED_10 model

This LSTM model has an encoder-decoder architecture and takes as input the open values of the stock price records of the previous two weeks. The model consists of two LSTM blocks: one performs the encoding operation, while the other does the decoding. The encoder LSTM block consists of 200 nodes (determined using the grid-search procedure). The input data shape to the encoder LSTM is (10, 1). The encoding layer yields a one-dimensional vector of size 200, each value corresponding to the feature extracted by a node in the LSTM layer from the ten input values received from the input layer. The input data features are extracted once for each timestamp of the output sequence (there are five timestamps, one for each of the five forecasted open values). Hence, the data shape at the repeat vector layer's output is (5, 200). It signifies that, in total, 200 features are extracted from the input for each of the five timestamps of the model's output (i.e., forecasted) sequence. The second LSTM block decodes the encoded features using 200 nodes.

The decoded result is passed on to a dense layer. The dense layer learns from the decoded values and predicts the next five values of the target variable (i.e., open) for the coming week through five nodes in the output layer. However, the forecasted values are not produced in a single step. The forecasts for the five days are made in five rounds. The round-wise forecasting is done using Keras's TimeDistributed wrapper, which synchronizes the decoder LSTM block, the fully connected block, and the output layer in every round. The number of epochs and the batch size used in training the model are 70 and 16, respectively. Figure 7 presents the structure and the data flow of the LSTM_UNIV_ED_10 model.
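The data flow through the repeat vector and the time-distributed dense layers can be mimicked with plain NumPy. This is a shape walk-through with random weights under the chapter's settings, not the actual trained Keras model:

```python
import numpy as np

rng = np.random.default_rng(0)
n_steps_out, n_units = 5, 200

encoded = rng.standard_normal(n_units)          # encoder output: (200,)
repeated = np.tile(encoded, (n_steps_out, 1))   # RepeatVector: (5, 200)

# A TimeDistributed dense layer applies the same weights at every timestep:
W1, b1 = rng.standard_normal((n_units, 100)), np.zeros(100)
W2, b2 = rng.standard_normal((100, 1)), np.zeros(1)
hidden = repeated @ W1 + b1                     # (5, 100)
output = hidden @ W2 + b2                       # (5, 1): one open value per day
print(output.shape)
```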

Figure 7.

The schematic architecture of the model LSTM_UNIV_ED_10.

The computation of the number of parameters in the LSTM_UNIV_ED_10 model is shown in Table 7. The input layer and the repeat vector layer do not involve any learning, and hence these layers have no parameters. On the other hand, the two LSTM layers, lstm_3 and lstm_4, and the two dense layers, time_distributed_3 and time_distributed_4, involve learning. The number of parameters in the lstm_3 layer is computed as: 4 * [(200 + 1) * 200 + 200] = 161,600. The computation of the number of parameters in the lstm_4 layer is as follows: 4 * [(200 + 200) * 200 + 200] = 320,800. The computations of the dense layers' parameters are identical to those in the models discussed earlier. The total number of parameters in this model turns out to be 502,601.

Layer | x | y | pprev | pcurr | n1 | n2 | #params
LSTM (lstm_3) | 200 | 1 | - | - | 40,400 | - | 161,600
LSTM (lstm_4) | 200 | 200 | - | - | 80,200 | - | 320,800
Dense (time_dist_dense_3) | - | - | 200 | 100 | - | 20,100 | 20,100
Dense (time_dist_dense_4) | - | - | 100 | 1 | - | 101 | 101
Total #parameters | | | | | | | 502,601

Table 7.

The number of parameters in the model LSTM_UNIV_ED_10.

3.8 The LSTM_MULTV_ED_10 model

This model is a multivariate version of LSTM_UNIV_ED_10. It uses the last couple of weeks’ stock price records and includes all the five attributes, i.e., open, high, low, close, and volume. Hence, the input data shape for the model is (10, 5). We use a batch size of 16 while training the model over 20 epochs. Figure 8 depicts the architecture of the multivariate encoder-decoder LSTM model.

Figure 8.

The schematic architecture of the model LSTM_MULTV_ED_10.

Table 8 shows the number of parameters in the LSTM_MULTV_ED_10 model. The computation of the parameters for this model is exactly the same as that for the model LSTM_UNIV_ED_10 except for the first LSTM layer. The number of parameters in the first LSTM (i.e., the encoder) layer differs because the parameter count depends on the number of features in the input data. The parameter count of the encoder LSTM layer, lstm_1, is computed as follows: 4 * [(200 + 5) * 200 + 200] = 164,800. The total number of parameters for the model is found to be 505,801.

Layer | x | y | pprev | pcurr | n1 | n2 | #params
LSTM (lstm_1) | 200 | 5 | - | - | 41,200 | - | 164,800
LSTM (lstm_2) | 200 | 200 | - | - | 80,200 | - | 320,800
Dense (time_dist_dense_1) | - | - | 200 | 100 | - | 20,100 | 20,100
Dense (time_dist_dense_2) | - | - | 100 | 1 | - | 101 | 101
Total #parameters | | | | | | | 505,801

Table 8.

The number of parameters in the model LSTM_MULTV_ED_10.
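Using the same gate-counting rule as before, the encoder-decoder totals in Tables 7 and 8 can be reproduced. This is an illustrative check of ours, assuming standard Keras LSTM and Dense layers:

```python
def lstm_params(n_features, n_units):
    # 4 gates x (input weights + recurrent weights + biases)
    return 4 * ((n_features + n_units) * n_units + n_units)

def dense_params(p_prev, p_curr):
    return p_prev * p_curr + p_curr

dense_total = dense_params(200, 100) + dense_params(100, 1)  # 20,100 + 101

# Univariate encoder-decoder (Table 7): the encoder sees 1 input feature,
# the decoder sees the 200 repeated encoder features.
univ = lstm_params(1, 200) + lstm_params(200, 200) + dense_total
print(univ)    # 502601

# Multivariate version (Table 8): only the encoder changes (5 input features).
multv = lstm_params(5, 200) + lstm_params(200, 200) + dense_total
print(multv)   # 505801
```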

3.9 The LSTM_UNIV_CNN_10 model

This model is a modified version of the LSTM_UNIV_ED_10 model in which a dedicated CNN block carries out the encoding operation. CNNs are poor at learning from sequential data; however, we exploit the power of a one-dimensional CNN in extracting important features from time-series data. After the feature extraction is done, the extracted features are provided as the input to an LSTM block. The LSTM block decodes the features and makes a robust forecast of the future values in the sequence. The CNN block consists of two convolutional layers, each of which has a feature map size of 64 and a kernel size of 3. The input data shape is (10, 1), as the model uses univariate data of the target variable over the previous two weeks. The output shape of the first convolutional layer is (8, 64). The value of 8 is arrived at using the computation (10 - 3 + 1), while 64 refers to the feature space dimension.

Similarly, the shape of the output of the next convolutional block is (6, 64). A max-pooling layer follows, which halves the temporal dimension. Hence, the output data shape of the max-pooling layer is (3, 64). The max-pooling layer's output is flattened into a one-dimensional array of size 3 * 64 = 192. The flattened vector is fed into the decoder LSTM block consisting of 200 nodes. The decoder architecture is identical to the decoder block of the LSTM_UNIV_ED_10 model. We train the model over 20 epochs with a batch size of 16 records. The structure and the data flow of the model are shown in Figure 9.
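The shape arithmetic above follows the usual formulas for unpadded convolution and pooling; a quick sketch (our illustration, not the chapter's code):

```python
def conv1d_out_len(n, kernel):
    # 'valid' (unpadded) convolution with stride 1
    return n - kernel + 1

n = 10                       # two weeks of univariate open values
n = conv1d_out_len(n, 3)     # first conv layer: 10 - 3 + 1 = 8
n = conv1d_out_len(n, 3)     # second conv layer: 8 - 3 + 1 = 6
n = n // 2                   # max-pooling with pool size 2: 3
flat = n * 64                # flatten 3 timesteps x 64 feature maps
print(flat)                  # 192
```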

Figure 9.

The schematic architecture of the model LSTM_UNIV_CNN_10.

Table 9 presents the computation of the number of parameters in the model LSTM_UNIV_CNN_10. The input layer, the max-pooling layer, the flatten operation, and the repeat vector layer do not involve any learning, and hence they have no parameters. The number of parameters in the first convolutional layer is computed as follows: (3 + 1) * 64 = 256. For the second convolutional layer, the number of parameters is computed as: (3 * 64 + 1) * 64 = 12,352. The number of parameters for the LSTM layer is computed as follows: 4 * [(200 + 192) * 200 + 200] = 314,400. In the case of the first dense layer, the number of parameters is computed as follows: (200 * 100 + 100) = 20,100. Finally, the number of parameters in the second dense layer is computed as (100 * 1 + 1) = 101. The total number of parameters in the model is found to be 347,209.

Layer | k | d | f | x | y | pprev | pcurr | n1 | n2 | #params
Conv1D (conv1d_4) | 3 | 1 | 64 | - | - | - | - | - | 256 | 256
Conv1D (conv1d_5) | 3 | 64 | 64 | - | - | - | - | - | 12,352 | 12,352
LSTM (lstm_2) | - | - | - | 200 | 192 | - | - | 78,600 | - | 314,400
Dense (time_dist_4) | - | - | - | - | - | 200 | 100 | - | 20,100 | 20,100
Dense (time_dist_5) | - | - | - | - | - | 100 | 1 | - | 101 | 101
Total #parameters | | | | | | | | | | 347,209

Table 9.

The number of parameters in the model LSTM_UNIV_CNN_10.
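The layer-wise counts in Table 9 can be verified in the same style as before. The helpers below are our own sketch, assuming standard Keras Conv1D and LSTM layers:

```python
def conv1d_params(kernel, in_channels, filters):
    # one kernel per filter spanning all input channels, plus a bias per filter
    return (kernel * in_channels + 1) * filters

def lstm_params(n_features, n_units):
    return 4 * ((n_features + n_units) * n_units + n_units)

total = (conv1d_params(3, 1, 64)      # 256
         + conv1d_params(3, 64, 64)   # 12,352
         + lstm_params(192, 200)      # 314,400
         + (200 * 100 + 100)          # 20,100
         + (100 * 1 + 1))             # 101
print(total)  # 347209
```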

3.10 The LSTM_UNIV_CONV_10 model

This model is a modification of the LSTM_UNIV_CNN_10 model. The encoder CNN's convolution operations and the decoding operations of the LSTM sub-module are integrated for every round of the output sequence. This encoder-decoder model is also known as the Convolutional-LSTM model [58]. The integrated model reads sequential input data, performs convolution operations on the data without any explicit CNN block, and decodes the extracted features using a dedicated LSTM block. The Keras framework contains a class, ConvLSTM2D, which is capable of performing two-dimensional convolution operations [58]. The two-dimensional ConvLSTM2D class is tweaked to enable it to process univariate data of one dimension. The architecture of the model LSTM_UNIV_CONV_10 is presented in Figure 10.

Figure 10.

The schematic architecture of the model LSTM_UNIV_CONV_10.

The computation of the number of parameters for the LSTM_UNIV_CONV_10 model is shown in Table 10. While the input layer, the flatten operation, and the repeat vector layer do not involve any learning, the other layers include trainable parameters. The number of parameters in the convolutional LSTM layer (i.e., conv_1st_m2d) is computed as follows: 4 * f * [k * (d + f) + 1] = 4 * 64 * [3 * (1 + 64) + 1] = 50,176. The number of parameters in the LSTM layer is computed as follows: 4 * [(200 + 192) * 200 + 200] = 314,400. The number of parameters in the first time-distributed dense layer is computed as (200 * 100 + 100) = 20,100. The computation for the final dense layer is as follows: (100 * 1 + 1) = 101. The total number of parameters involved in the model LSTM_UNIV_CONV_10 is 384,777.

Layer | k | d | f | x | y | pprev | pcurr | n1 | n2 | #params
ConvLSTM2D (conv_1st_m2d) | 3 | 1 | 64 | 64 | 1 | - | - | 12,544 | - | 50,176
LSTM (lstm) | - | - | - | 200 | 192 | - | - | 78,600 | - | 314,400
Dense (time_dist) | - | - | - | - | - | 200 | 100 | - | 20,100 | 20,100
Dense (time_dist_1) | - | - | - | - | - | 100 | 1 | - | 101 | 101
Total #parameters | | | | | | | | | | 384,777

Table 10.

Computation of the number of parameters in the model LSTM_UNIV_CONV_10.
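The ConvLSTM2D count can be reproduced with the same gate-counting style. This is an illustrative check of ours, assuming Keras's standard ConvLSTM2D parameterization (four gates, each a convolution over the concatenated input and state channels):

```python
def convlstm_params(kernel, in_channels, filters):
    # 4 gates, each convolving over (in_channels + filters) channels,
    # with one bias per filter per gate
    return 4 * filters * (kernel * (in_channels + filters) + 1)

conv_lstm = convlstm_params(3, 1, 64)     # 50,176
total = conv_lstm + 314400 + 20100 + 101  # remaining layers from Table 10
print(conv_lstm, total)                   # 50176 384777
```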


4. Performance results

We present the results on the performance of the ten deep learning models on the dataset we prepared, and compare the performances of the models. For designing a robust evaluation framework, we execute every model over ten rounds. The average performance over the ten rounds is considered the overall performance of the model. We use four metrics for evaluation: (i) the average RMSE, (ii) the RMSE for the different days (i.e., Monday to Friday) of a week, (iii) the time needed to execute one round, and (iv) the ratio of the RMSE to the mean of the response variable (i.e., the open value). The models are trained on 19,500 historical stock records and then tested on 20,250 records. The mean value of the response variable, open, of the test dataset is 475.70. All experiments are carried out on a system with an Intel i7 CPU with a clock frequency in the range of 2.56 GHz - 2.60 GHz and 16 GB RAM. The time needed to complete one round of execution of each model is recorded in seconds. The models are built using the Python programming language version 3.7.4 and the frameworks TensorFlow 2.3.0 and Keras 2.4.5.
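The two headline metrics, the RMSE and its ratio to the mean open value, can be sketched as follows. The numbers here are dummy values for illustration, not the chapter's data:

```python
import math

def rmse(actual, predicted):
    """Root mean square error over a forecast horizon."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted))
                     / len(actual))

actual = [470.0, 472.0, 475.0, 478.0, 480.0]     # dummy open values
predicted = [471.0, 471.0, 476.0, 477.0, 482.0]
err = rmse(actual, predicted)
ratio = err / (sum(actual) / len(actual))        # RMSE / mean(open)
print(round(err, 3), round(ratio, 5))
```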

Table 11 shows the results of the performance of the CNN_UNIV_5 model. The model takes, on average, 174.78 seconds to finish one cycle of execution. For this model, the ratio of the RMSE to the mean open value is 0.007288. The ratios of the RMSE to the average of the actual open values for day1 through day5 are 0.0062, 0.0066, 0.0073, 0.0078, and 0.0083, respectively. Here, day1 refers to Monday, and day5 to Friday. The same notation is used in all subsequent tables. The day-wise RMSE values of the model CNN_UNIV_5 are depicted in Figure 11, as per record no. 2 in Table 11.

No. | Agg RMSE | Day1 | Day2 | Day3 | Day4 | Day5 | Time (sec)
1 | 4.058 | 4.00 | 3.40 | 3.90 | 4.40 | 4.50 | 173.95
2 | 3.782 | 3.10 | 3.30 | 3.80 | 4.10 | 4.40 | 176.92
3 | 3.378 | 2.80 | 3.00 | 3.40 | 3.60 | 3.90 | 172.21
4 | 3.296 | 2.60 | 3.00 | 3.30 | 3.60 | 3.90 | 173.11
5 | 3.227 | 2.60 | 3.00 | 3.30 | 3.50 | 3.70 | 174.72
6 | 3.253 | 2.60 | 3.00 | 3.30 | 3.50 | 3.70 | 183.77
7 | 3.801 | 3.60 | 3.60 | 3.80 | 3.80 | 4.10 | 172.29
8 | 3.225 | 2.60 | 2.90 | 3.30 | 3.50 | 3.70 | 171.92
9 | 3.306 | 2.80 | 3.00 | 3.30 | 3.50 | 3.70 | 174.92
10 | 3.344 | 2.70 | 3.10 | 3.40 | 3.60 | 3.80 | 174.01
Mean | 3.467 | 2.94 | 3.13 | 3.48 | 3.71 | 3.94 | 174.78
RMSE/Mean | 0.007288 | 0.0062 | 0.0066 | 0.0073 | 0.0078 | 0.0083 | -

Table 11.

The RMSE and the execution time of the CNN_UNIV_5 model.

Figure 11.

RMSE vs. day plot of CNN_UNIV_5 (depicted by tuple#2 in Table 11).

Table 12 depicts the performance results of the model CNN_UNIV_10. The model needs 185.01 seconds on average for one round. The ratio of the RMSE to the average of the open values for the model is 0.006967. The ratios of the RMSE to the average open values for day1 through day5 for the model are 0.0056, 0.0067, 0.0070, 0.0075, and 0.0080, respectively. Figure 12 presents the RMSE values for the results of round 7 in Table 12.

No. | Agg RMSE | Day1 | Day2 | Day3 | Day4 | Day5 | Time (sec)
1 | 3.165 | 2.50 | 3.20 | 3.10 | 3.50 | 3.50 | 177.86
2 | 3.813 | 3.30 | 3.90 | 3.30 | 3.60 | 4.80 | 202.25
3 | 3.230 | 2.60 | 2.90 | 3.30 | 3.50 | 3.80 | 183.45
4 | 3.209 | 2.50 | 3.10 | 3.40 | 3.40 | 3.60 | 188.35
5 | 3.176 | 2.80 | 3.00 | 3.10 | 3.40 | 3.60 | 180.30
6 | 3.233 | 2.60 | 3.00 | 3.30 | 3.50 | 3.70 | 181.20
7 | 3.312 | 2.70 | 3.20 | 3.20 | 3.50 | 3.80 | 188.81
8 | 3.082 | 2.20 | 2.80 | 3.30 | 3.30 | 3.50 | 180.89
9 | 3.772 | 2.80 | 3.70 | 3.90 | 4.30 | 4.10 | 186.23
10 | 3.150 | 2.40 | 2.90 | 3.20 | 3.50 | 3.60 | 180.78
Mean | 3.3142 | 2.64 | 3.17 | 3.31 | 3.55 | 3.80 | 185.01
RMSE/Mean | 0.006967 | 0.0056 | 0.0067 | 0.0070 | 0.0075 | 0.0080 | -

Table 12.

The RMSE and the execution time of the CNN_UNIV_10 model.

Figure 12.

RMSE vs. day plot of CNN_UNIV_10 (depicted by tuple#7 in Table 12).

Table 13 depicts the performance results of the model CNN_MULTV_10. One round of execution of the model requires 202.78 seconds. The model yields a value of 0.009420 for the ratio of the RMSE to the mean open value. The ratios of the RMSE to the mean open value for day1 through day5 are 0.0085, 0.0089, 0.0095, 0.0100, and 0.0101, respectively. The day-wise RMSE values of the model CNN_MULTV_10 are depicted in Figure 13, based on record no. 6 of Table 13.

No. | Agg RMSE | Day1 | Day2 | Day3 | Day4 | Day5 | Time (sec)
1 | 4.525 | 4.00 | 4.30 | 4.50 | 4.70 | 5.00 | 206.92
2 | 3.606 | 3.10 | 3.30 | 3.70 | 3.80 | 4.00 | 202.61
3 | 4.830 | 4.60 | 4.70 | 4.70 | 5.10 | 5.00 | 202.87
4 | 4.938 | 4.40 | 4.80 | 4.70 | 5.30 | 5.40 | 201.49
5 | 4.193 | 3.50 | 4.00 | 4.10 | 4.60 | 4.60 | 214.66
6 | 5.101 | 4.70 | 4.90 | 5.20 | 5.30 | 5.30 | 190.73
7 | 4.751 | 4.40 | 4.50 | 4.80 | 5.00 | 5.00 | 201.73
8 | 3.927 | 3.20 | 3.70 | 4.00 | 4.30 | 4.40 | 200.04
9 | 4.267 | 3.90 | 3.80 | 4.50 | 4.60 | 4.40 | 199.09
10 | 4.661 | 4.40 | 4.50 | 4.60 | 4.90 | 4.90 | 207.62
Mean | 4.4799 | 4.02 | 4.25 | 4.53 | 4.76 | 4.80 | 202.78
RMSE/Mean | 0.009420 | 0.0085 | 0.0089 | 0.0095 | 0.0100 | 0.0101 | -

Table 13.

The RMSE and the execution time of the CNN_MULTV_10 model.

Figure 13.

RMSE vs. day plot of CNN_MULTV_10 (based on tuple#6 in Table 13).

Table 14 depicts the results of the model CNN_MULTH_10. The model needs, on average, 215.07 seconds to execute one round. The ratio of the RMSE to the mean open value is 0.008100. The ratios of the RMSE to the average open value for day1 to day5 are 0.0076, 0.0075, 0.0082, 0.0084, and 0.0088, respectively. The pattern of variation exhibited by the model's daily RMSE is shown in Figure 14, as per record no. 4 in Table 14.

No. | Agg RMSE | Day1 | Day2 | Day3 | Day4 | Day5 | Time (sec)
1 | 3.338 | 2.70 | 2.80 | 3.30 | 3.70 | 4.00 | 224.63
2 | 3.264 | 2.80 | 3.10 | 3.30 | 3.50 | 3.70 | 216.44
3 | 3.015 | 2.30 | 2.70 | 3.10 | 3.30 | 3.50 | 218.14
4 | 3.692 | 3.20 | 3.40 | 4.00 | 3.80 | 4.00 | 220.01
5 | 3.444 | 2.80 | 3.20 | 3.40 | 3.80 | 3.90 | 212.54
6 | 4.019 | 4.50 | 3.70 | 3.70 | 4.20 | 3.90 | 210.95
7 | 6.988 | 6.40 | 7.40 | 7.20 | 6.80 | 7.10 | 210.24
8 | 3.133 | 2.50 | 2.80 | 3.20 | 3.40 | 3.60 | 214.48
9 | 3.278 | 2.40 | 3.10 | 3.70 | 3.40 | 3.60 | 211.53
10 | 4.469 | 5.90 | 3.60 | 4.00 | 4.10 | 4.40 | 211.78
Mean | 3.864 | 3.55 | 3.58 | 3.89 | 4.00 | 4.17 | 215.07
RMSE/Mean | 0.008100 | 0.0076 | 0.0075 | 0.0082 | 0.0084 | 0.0088 | -

Table 14.

The RMSE and the execution time of the CNN_MULTH_10 model.

Figure 14.

RMSE vs. day plot of CNN_MULTH_10 (based on tuple#4 in Table 14).

The results of the LSTM_UNIV_5 model are depicted in Table 15. The average time needed to complete one round of the model is 371.62 seconds. The ratio of the RMSE to the mean of the target variable is 0.007770. The ratios of the RMSE to the mean open value for day1 to day5 are 0.0067, 0.0071, 0.0074, 0.0081, and 0.0086, respectively. The pattern of variation of the daily RMSE, as per record no. 9 in Table 15, is depicted in Figure 15.

No. | Agg RMSE | Day1 | Day2 | Day3 | Day4 | Day5 | Time (sec)
1 | 3.125 | 2.40 | 2.90 | 3.00 | 3.50 | 3.70 | 372.28
2 | 3.376 | 3.00 | 2.90 | 3.40 | 3.90 | 3.70 | 371.73
3 | 2.979 | 2.10 | 2.60 | 3.00 | 3.30 | 3.70 | 368.72
4 | 3.390 | 3.20 | 3.40 | 3.30 | 3.60 | 3.50 | 368.58
5 | 4.387 | 4.20 | 4.60 | 4.10 | 4.40 | 4.60 | 379.10
6 | 5.173 | 4.40 | 5.10 | 4.60 | 5.20 | 6.30 | 373.84
7 | 3.434 | 4.30 | 2.60 | 2.90 | 3.70 | 3.50 | 368.91
8 | 3.979 | 3.70 | 3.10 | 4.60 | 4.30 | 4.10 | 371.02
9 | 2.892 | 1.90 | 2.50 | 2.90 | 3.30 | 3.50 | 371.95
10 | 3.683 | 2.70 | 4.00 | 3.30 | 3.50 | 4.60 | 370.07
Mean | 3.6418 | 3.19 | 3.37 | 3.51 | 3.87 | 4.12 | 371.62
RMSE/Mean | 0.007770 | 0.0067 | 0.0071 | 0.0074 | 0.0081 | 0.0086 | -

Table 15.

The RMSE and the execution time of the LSTM_UNIV_5 model.

Figure 15.

RMSE vs. day plot of LSTM_UNIV_5 (depicted by tuple#9 in Table 15).

Table 16 exhibits the results of the model LSTM_UNIV_10. The model yields a value of 0.007380 for the ratio of its RMSE to the mean open value, while one round of its execution needs 554.47 seconds. The ratios of the RMSE to the mean open value for day1 to day5 are 0.0061, 0.0070, 0.0074, 0.0079, and 0.0083, respectively. The RMSE pattern of the model, as per record no. 10 in Table 16, is exhibited in Figure 16.

No. | Agg RMSE | Day1 | Day2 | Day3 | Day4 | Day5 | Time (sec)
1 | 3.005 | 2.40 | 2.40 | 2.80 | 3.70 | 3.50 | 547.22
2 | 3.859 | 3.50 | 3.30 | 3.80 | 3.90 | 4.70 | 554.03
3 | 4.601 | 4.50 | 4.50 | 4.60 | 4.80 | 4.60 | 550.24
4 | 3.342 | 2.70 | 4.00 | 3.10 | 3.40 | 3.50 | 555.50
5 | 4.714 | 4.80 | 4.40 | 4.70 | 4.60 | 5.10 | 563.44
6 | 3.336 | 2.50 | 3.20 | 3.30 | 3.60 | 3.90 | 553.83
7 | 3.711 | 3.10 | 4.00 | 4.00 | 3.60 | 3.90 | 559.31
8 | 2.795 | 1.90 | 2.40 | 2.80 | 3.20 | 3.40 | 552.50
9 | 3.012 | 1.80 | 2.80 | 2.90 | 3.60 | 3.50 | 551.20
10 | 2.751 | 1.70 | 2.30 | 3.00 | 3.00 | 3.30 | 557.39
Mean | 3.5126 | 2.89 | 3.33 | 3.50 | 3.74 | 3.94 | 554.47
RMSE/Mean | 0.007380 | 0.0061 | 0.0070 | 0.0074 | 0.0079 | 0.0083 | -

Table 16.

The RMSE and the execution time of the LSTM_UNIV_10 model.

Figure 16.

RMSE vs. day plot of LSTM_UNIV_10 (depicted by tuple#10 in Table 16).

Table 17 shows that the model LSTM_UNIV_ED_10 needs, on average, 307.27 seconds to execute one round. The average ratio of the RMSE to the mean of the target variable (i.e., the open value) for the model is 0.008350. The daily ratio values for day1 to day5 are 0.0067, 0.0078, 0.0085, 0.0090, and 0.0095, respectively. Figure 17 exhibits the pattern of variation of the daily RMSE as per record no. 9 in Table 17.

No. | Agg RMSE | Day1 | Day2 | Day3 | Day4 | Day5 | Time (sec)
1 | 2.975 | 2.00 | 2.70 | 3.00 | 3.40 | 3.60 | 310.28
2 | 4.856 | 4.10 | 4.60 | 5.00 | 5.20 | 5.30 | 306.22
3 | 5.500 | 4.30 | 5.20 | 5.50 | 6.00 | 6.40 | 306.08
4 | 3.656 | 3.20 | 3.40 | 3.70 | 3.90 | 4.10 | 305.64
5 | 2.859 | 1.90 | 2.60 | 2.90 | 3.20 | 3.40 | 306.03
6 | 3.887 | 3.30 | 3.60 | 3.90 | 4.20 | 4.40 | 305.34
7 | 4.007 | 3.60 | 3.70 | 4.00 | 4.10 | 4.50 | 304.69
8 | 3.489 | 2.70 | 3.20 | 3.60 | 3.80 | 3.90 | 305.26
9 | 2.944 | 2.10 | 2.80 | 3.00 | 3.20 | 3.50 | 314.37
10 | 5.497 | 4.70 | 5.10 | 5.60 | 6.00 | 5.90 | 308.78
Mean | 3.971 | 3.19 | 3.69 | 4.02 | 4.30 | 4.50 | 307.27
RMSE/Mean | 0.008350 | 0.0067 | 0.0078 | 0.0085 | 0.0090 | 0.0095 | -

Table 17.

The RMSE and the execution time of the LSTM_UNIV_ED_10 model.

Figure 17.

RMSE vs. day plot of LSTM_UNIV_ED_10 (as per tuple#5 in Table 17).

Table 18 shows that the model LSTM_MULTV_ED_10, on average, requires 634.34 seconds to complete one round of execution. For this model, the ratio of the RMSE to the mean of the target variable (i.e., the open value) is 0.010294. The ratios of the daily RMSE to the mean open value for day1 to day5 are 0.0094, 0.0099, 0.0102, 0.0107, and 0.0111, respectively. Figure 18 shows the pattern of the daily RMSE values of the model as per record no. 10 in Table 18.

No. | Agg RMSE | Day1 | Day2 | Day3 | Day4 | Day5 | Time (sec)
1 | 5.858 | 5.50 | 5.70 | 5.90 | 6.00 | 6.20 | 631.53
2 | 4.062 | 3.60 | 3.90 | 4.00 | 4.20 | 4.50 | 617.62
3 | 6.623 | 6.20 | 6.50 | 6.60 | 6.80 | 6.90 | 640.09
4 | 3.661 | 3.20 | 3.30 | 3.60 | 3.90 | 4.10 | 624.22
5 | 5.879 | 5.80 | 5.90 | 5.70 | 6.00 | 6.10 | 632.34
6 | 4.808 | 4.20 | 4.60 | 4.80 | 5.10 | 5.20 | 644.48
7 | 4.657 | 4.10 | 4.50 | 4.70 | 4.90 | 5.10 | 631.72
8 | 3.866 | 3.30 | 3.60 | 3.90 | 4.10 | 4.30 | 633.28
9 | 3.910 | 3.30 | 3.70 | 3.90 | 4.20 | 4.40 | 647.29
10 | 5.644 | 5.30 | 5.50 | 5.60 | 5.90 | 6.00 | 640.86
Mean | 4.897 | 4.50 | 4.72 | 4.87 | 5.11 | 5.28 | 634.34
RMSE/Mean | 0.010294 | 0.0094 | 0.0099 | 0.0102 | 0.0107 | 0.0111 | -

Table 18.

The RMSE and the execution time of the LSTM_MULTV_ED_10 model.

Figure 18.

RMSE vs. day plot of LSTM_MULTV_ED_10 (as per tuple#10 in Table 18).

Table 19 depicts that the model LSTM_UNIV_CNN_10 requires, on average, 222.48 seconds to finish one round. For this model, the ratio of the RMSE to the mean of the target variable (i.e., the open value) is found to be 0.007916. The daily ratios of the RMSE to the mean open value for day1 to day5 are 0.0065, 0.0074, 0.0080, 0.0085, and 0.0089, respectively. Figure 19 depicts the pattern of variation of the daily RMSE values for this model as per record no. 3 in Table 19.

No. | Agg RMSE | Day1 | Day2 | Day3 | Day4 | Day5 | Time (sec)
1 | 3.832 | 3.30 | 3.50 | 3.90 | 4.10 | 4.30 | 221.18
2 | 3.256 | 2.50 | 3.00 | 3.30 | 3.60 | 3.80 | 219.74
3 | 4.308 | 3.80 | 4.00 | 4.40 | 4.60 | 4.60 | 222.59
4 | 4.081 | 3.30 | 4.00 | 4.10 | 4.30 | 4.50 | 227.95
5 | 3.325 | 2.60 | 3.00 | 3.30 | 3.60 | 3.90 | 224.46
6 | 3.870 | 3.20 | 3.70 | 3.90 | 4.10 | 4.10 | 223.40
7 | 3.688 | 3.10 | 3.40 | 3.80 | 4.00 | 4.10 | 222.89
8 | 3.851 | 3.20 | 3.60 | 3.80 | 4.20 | 4.40 | 221.87
9 | 3.710 | 2.60 | 3.40 | 4.00 | 4.00 | 4.40 | 219.74
10 | 3.736 | 3.30 | 3.70 | 3.70 | 3.90 | 4.10 | 220.96
Mean | 3.766 | 3.09 | 3.53 | 3.82 | 4.04 | 4.22 | 222.48
RMSE/Mean | 0.007916 | 0.0065 | 0.0074 | 0.0080 | 0.0085 | 0.0089 | -

Table 19.

The RMSE and the execution time of the LSTM_UNIV_CNN_10 model.

Figure 19.

RMSE vs. day plot of LSTM_UNIV_CNN_10 (as per tuple#3 in Table 19).

The results of the model LSTM_UNIV_CONV_10 are presented in Table 20. The model completes one round, on average, in 265.97 seconds. The ratio of the RMSE to the mean open value is 0.007490. The daily ratios of the RMSE to the mean open value for day1 to day5 are 0.0056, 0.0068, 0.0077, 0.0082, and 0.0088, respectively. Figure 20 shows the pattern of daily RMSE values for this model as per record no. 8 in Table 20.

No. | Agg RMSE | Day1 | Day2 | Day3 | Day4 | Day5 | Time (sec)
1 | 3.971 | 3.00 | 3.60 | 4.00 | 4.40 | 4.60 | 263.84
2 | 3.103 | 2.40 | 2.80 | 3.20 | 3.40 | 3.60 | 262.06
3 | 3.236 | 2.30 | 2.90 | 3.30 | 3.60 | 3.80 | 266.47
4 | 4.347 | 3.10 | 4.00 | 4.60 | 4.70 | 5.00 | 257.43
5 | 2.860 | 2.20 | 2.50 | 2.80 | 3.20 | 3.40 | 260.05
6 | 3.525 | 2.50 | 3.60 | 3.50 | 3.80 | 4.00 | 282.27
7 | 3.163 | 2.30 | 2.80 | 3.20 | 3.50 | 3.80 | 265.26
8 | 2.870 | 2.00 | 2.60 | 2.90 | 3.20 | 3.50 | 272.18
9 | 3.504 | 2.20 | 3.10 | 3.70 | 3.70 | 4.40 | 265.46
10 | 5.053 | 4.70 | 4.40 | 5.20 | 5.30 | 5.60 | 264.66
Mean | 3.563 | 2.67 | 3.23 | 3.64 | 3.88 | 4.17 | 265.97
RMSE/Mean | 0.007490 | 0.0056 | 0.0068 | 0.0077 | 0.0082 | 0.0088 | -

Table 20.

The RMSE and the execution time of the LSTM_UNIV_CONV_10 model.

Figure 20.

RMSE vs. day plot of LSTM_UNIV_CONV_10 (as per tuple#8 in Table 20).

Table 21 summarizes the performance of the ten models proposed in this chapter. We evaluate the models on two metrics and then rank them on each metric. The two metrics used are: (i) an accuracy metric, computed as the ratio of the RMSE to the mean value of the target variable (i.e., the open value), and (ii) a speed metric, measured by the time (in seconds) required to execute one round of the model. The number of parameters in each model is also presented. It is noted that the CNN_UNIV_5 model is ranked 1 on execution speed, while it occupies rank 2 on accuracy. The CNN_UNIV_10 model, on the other hand, is ranked 2 in terms of execution speed, while it is the most accurate model. It is also interesting to note that all the CNN models are faster than their LSTM counterparts. However, there is no appreciable difference in their accuracies, except for the multivariate encoder-decoder LSTM model, LSTM_MULTV_ED_10.

No. | Model | #param | RMSE/Mean | Rank | Exec. Time (s) | Rank
1 | CNN_UNIV_5 | 289 | 0.007288 | 2 | 174.78 | 1
2 | CNN_UNIV_10 | 769 | 0.006967 | 1 | 185.01 | 2
3 | CNN_MULTV_10 | 7,373 | 0.009420 | 9 | 202.78 | 3
4 | CNN_MULTH_10 | 132,965 | 0.008100 | 7 | 215.07 | 4
5 | LSTM_UNIV_5 | 182,235 | 0.007770 | 5 | 371.62 | 8
6 | LSTM_UNIV_10 | 182,235 | 0.007380 | 3 | 554.47 | 9
7 | LSTM_UNIV_ED_10 | 502,601 | 0.008350 | 8 | 307.27 | 7
8 | LSTM_MULTV_ED_10 | 505,801 | 0.010294 | 10 | 634.34 | 10
9 | LSTM_UNIV_CNN_10 | 347,209 | 0.007916 | 6 | 222.48 | 5
10 | LSTM_UNIV_CONV_10 | 384,777 | 0.007490 | 4 | 265.97 | 6

Table 21.

Comparative analysis of the accuracy and execution speed of the models.
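The two rank columns follow directly from sorting the models on each metric; a minimal sketch using the per-model averages reported in the preceding sections:

```python
# (model, RMSE/mean, mean execution time in seconds), from the sections above
results = [
    ("CNN_UNIV_5",        0.007288, 174.78),
    ("CNN_UNIV_10",       0.006967, 185.01),
    ("CNN_MULTV_10",      0.009420, 202.78),
    ("CNN_MULTH_10",      0.008100, 215.07),
    ("LSTM_UNIV_5",       0.007770, 371.62),
    ("LSTM_UNIV_10",      0.007380, 554.47),
    ("LSTM_UNIV_ED_10",   0.008350, 307.27),
    ("LSTM_MULTV_ED_10",  0.010294, 634.34),
    ("LSTM_UNIV_CNN_10",  0.007916, 222.48),
    ("LSTM_UNIV_CONV_10", 0.007490, 265.97),
]

def ranks(key):
    """Map each model to its 1-based rank under the metric (lower is better)."""
    ordered = sorted(results, key=key)
    return {name: i + 1 for i, (name, *_) in enumerate(ordered)}

acc_rank = ranks(lambda r: r[1])    # accuracy: RMSE/mean
speed_rank = ranks(lambda r: r[2])  # speed: execution time
print(acc_rank["CNN_UNIV_10"], speed_rank["CNN_UNIV_5"])  # 1 1
```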

Another interesting observation is that the multivariate models are found to be inferior to the corresponding univariate models on the basis of the accuracy metric. The multivariate models, CNN_MULTV_10 and LSTM_MULTV_ED_10, are ranked 9 and 10, respectively, under the accuracy metric.

Finally, it is observed that the number of parameters in a model has an effect on its execution time, barring some notable exceptions. For the four CNN models, the execution time increases monotonically with the number of parameters. Among the LSTM models, even though LSTM_UNIV_CNN_10, LSTM_UNIV_CONV_10, and LSTM_UNIV_ED_10 have a higher number of parameters than the vanilla LSTM models (i.e., LSTM_UNIV_5 and LSTM_UNIV_10), they are faster in execution. Evidently, the univariate encoder-decoder LSTM models are faster even when they involve a higher number of parameters than the vanilla LSTM models.


5. Conclusion

Prediction of future stock prices and price movement patterns is a challenging task when the stock price time series has a large amount of volatility. In this chapter, we presented ten deep learning-based regression models for robust and precise prediction of stock prices. Among the ten models, four are built on variants of CNN architectures, while the remaining six are constructed using different LSTM architectures. The historical stock price records are collected using the Metastock tool over a span of two years at five-minute intervals. The models are trained using the records of the first year and then tested on the remaining records. The testing is carried out using an approach known as walk-forward validation, in which, based on the last one or two weeks' historical stock prices, the stock prices for the five days of the next week are predicted. The overall RMSE and the RMSE for each day of a week are computed to evaluate the prediction accuracy of the models. The time needed to complete one round of execution of each model is also noted in order to measure the execution speed of the models. The results revealed some very interesting observations. First, it is found that while the CNN models are faster in general, the accuracies of both the CNN and LSTM models are comparable. Second, the univariate models are faster and more accurate than their multivariate counterparts. And finally, the number of parameters in a model has a significant effect on its speed of execution, except for the univariate encoder-decoder LSTM models. As a future scope of work, we will design optimized models based on generative adversarial networks (GANs) to explore the possibility of further improving the performance of the models.

References

  1. 1. Asghar, M. Z., Rahman, F., Kundi, F. M., Ahmed, S. Development of stock market trend prediction system using multiple regression. Computational and Mathematical Organization Theory, Vol 25, p. 271-301, 2019. DOI: 10.1007/s10588-019-09292-7
  2. 2. Enke, D., Grauer, M., Mehdiyev, N. Stock market prediction with multiple regression, fuzzy type-2 clustering, and neural network. Procedia Computer Science, Vol 6, p. 201-206, 2011. DOI: 10.1016/j.procs.2011.08.038
  3. 3. Ivanovski, Z., Ivanovska, N., Narasanov, Z. The regression analysis of stock returns at MSE. Journal of Modern Accounting and Auditing, Vol 12, No 4, p. 217-224, 2016. DOI: 10.17265/1548-6583/2016.04.003
  4. 4. Sen, J., Datta Chaudhuri, T. An alternative framework for time series decomposition and forecasting and its relevance for portfolio choice – A comparative study of the Indian consumer durable and small cap sector. Journal of Economics Library, Vol 3, No 2, p. 303 – 326, 2016. DOI: 10.1453/jel.v3i2.787
  5. 5. Adebiyi, A. A., Adewumi, A. O., Ayo, C. K. Comparison of ARIMA and artificial neural networks models for stock price prediction. Journal of Applied Mathematics, Vol 2014, Art ID: 614342, 2014. DOI: 10.1155/2014/614342
  6. 6. Du, Y. Application and analysis of forecasting stock price index based on combination of ARIMA model and BP neural network. In: Proceedings of the IEEE Chinese Control and Decision Conference (CCDC' 18), June 9-10, 2018, Shenyang, China, p. 2854-2857. DOI: 10.1109/CCDC.2018.8407611
  7. 7. Jarrett, J. E., Kyper, E. ARIMA modeling with intervention to forecast and analyze Chinese stock prices. International Journal of Engineering Business Management, Vol 3, No 3, p. 53-58, 2011. DOI: 10.5772/50938
  8. 8. Ning, Y., Wah, L. C., Erdan, L. Stock price prediction based on error correction model and Granger causality test. Cluster Computing, Vol 22, p. 4849-4958, 2019. DOI:10.1007/s10586-018-2406-6
  9. 9. Sen, J., Datta Chaudhuri, T. An investigation of the structural characteristics of the Indian IT sector and the capital goods sector – An application of the R programming language in time series decomposition and forecasting. Journal of Insurance and Financial Management, Vol 1, No 4, p. 68-112, 2016
  10. 10. Sen, J. A robust analysis and forecasting framework for the Indian mid cap sector using time series decomposition. Journal of Insurance and Financial Management, Vol 3, No 4, p. 1-32, 2017. DOI: 10.36227/techrxiv.15128901.v1
  11. 11. Sen, J., Datta Chaudhuri, T. Decomposition of time series data of stock markets and its implications for prediction – An application for the Indian auto sector. In: Proceedings of the 2nd National Conference on Advances in Business Research and Management Practices (ABRMP'16), January 6 – 7, 2016, Kolkata, p. 15-28. DOI: 10.13140/RG.2.1.3232.0241
  12. 12. Sen, J., Datta Chaudhuri, T. A time series analysis-based forecasting framework for the Indian healthcare sector. Journal of Insurance and Financial Management, Vol 3, No 1, p. 66-94, 2017
  13. 13. Sen, J., Datta Chaudhuri, T. A predictive analysis of the Indian FMCG sector using time series decomposition-based approach. Journal of Economics Library, Vol 4, No 2, p. 206-226, 2017. DOI: 10.1453/jel.v4i2.1282
  14. 14. Sen, J., Datta Chaudhuri, T. Understanding the sectors of Indian economy for portfolio choice. International Journal of Business Forecasting and Marketing Intelligence, Vol 4, No 2, p. 178-222, 2018. DOI: 10.1504/IJBFMI.2018.090914
  15. 15. Wang, L., Ma, F., Liu, J., Yang, L. Forecasting stock price volatility: New evidence from the GARCH-MIDAS model. International Journal of Forecasting, Vol 36, N0 2, p. 684-694, 2020. DOI: 10.1016/j.ijforecast.2019.08.005
  16. 16. Zhong, X., Enke, D. Forecasting daily stock market return using dimensionality reduction. Expert Systems with Applications, Vol 67, p. 126-139, 2017. DOI: 10.1016/j.eswa.2016.09.027
  17. 17. Baek, Y., Kim, H. Y. ModAugNet: A new forecasting framework for stock market index value with an overfitting prevention LSTM module and a prediction LSRM module. Expert Systems with Applications, Vol 113, p. 457-480, 2015. DOI: 10.1016/j.eswa.2018.07.019
  18. 18. Bao, W., Yue, J., Rao, Y. A deep learning framework for financial time series using stacked autoencoders and long-and-short-term memory. PLOS ONE, Vol 12, No 7, 2017. DOI: 10.1371/journal.pone.0180944
  19. 19. Chou, J., Nguyen, T. Forward forecast of stock price using sliding-window metaheuristic-optimized machine-learning regression. IEEE Transactions on Industrial Informatics, Vol 14, No 7, p. 3132-3142, 2018, DOI: 10.1109/TII.2018.2794389
  20. Ding, G., Qin, L. Study on prediction of stock price based on the associated network model of LSTM. International Journal of Machine Learning and Cybernetics, Vol 11, pp. 1307-1317, 2020. DOI: 10.1007/s13042-019-01041-1
  21. Gocken, M., Ozcalici, M., Boru, A., Dosdogru, A. T. Integrating metaheuristics and artificial neural networks for improved stock price prediction. Expert Systems with Applications, Vol 44, pp. 320-331, 2016. DOI: 10.1016/j.eswa.2015.09.029
  22. Mehtab, S., Sen, J. Stock price prediction using convolutional neural networks on a multivariate time series. In: Proceedings of the 3rd National Conference on Machine Learning and Artificial Intelligence (NCMLAI'20), February 1-2, 2020, New Delhi, India. DOI: 10.36227/techrxiv.15088734.v1
  23. Mehtab, S., Sen, J. Time Series Analysis-Based Stock Price Prediction Framework Using Machine Learning and Deep Learning Models. Technical Report No: NSHM_KOL_2020_SCA_DS_1, 2020. DOI: 10.13140/RG.2.2.14022.22085/2
  24. Mehtab, S., Sen, J. Stock price prediction using CNN and LSTM-based deep learning models. In: Proceedings of the IEEE International Conference on Decision Aid Sciences and Applications (DASA'20), November 8-9, 2020, Sakheer, Bahrain, pp. 447-453. DOI: 10.1109/DASA51403.2020.9317207
  25. Mehtab, S., Sen, J., Dasgupta, S. Robust analysis of stock price time series using CNN and LSTM-based deep learning models. In: Proceedings of the 4th IEEE International Conference on Electronics, Communication and Aerospace Technology (ICECA'20), November 5-7, 2020, Coimbatore, India, pp. 1481-1486. DOI: 10.1109/ICECA49313.2020.9297652
  26. Mehtab, S., Sen, J., Dutta, A. Stock price prediction using machine learning and LSTM-based deep learning models. In: Thampi, S. M., Piramuthu, S., Li, K. C., Beretti, S., Wozniak, M., Singh, D. (eds), Machine Learning and Metaheuristics Algorithms and Applications, SoMMA 2020. Communications in Computer and Information Science, Vol 1366, pp. 88-106, Springer, Singapore. DOI: 10.1007/978-981-16-0419-5_8
  27. Mehtab, S., Sen, J. Analysis and forecasting of financial time series using CNN and LSTM-based deep learning models. In: Proceedings of the 2nd International Conference on Advances in Distributed Computing and Machine Learning (ICADCML'21), January 15-16, 2021, Bhubaneswar, India. (Accepted for publication)
  28. Mehtab, S., Sen, J. A time series analysis-based stock price prediction using machine learning and deep learning models. International Journal of Business Forecasting and Marketing Intelligence (IJBFMI), Vol 6, No 4, pp. 272-335, 2020. DOI: 10.1504/IJBFMI.2020.115691
  29. Ning, B., Wu, J., Peng, H., Zhao, J. Using chaotic neural network to forecast stock index. In: Yu, W., He, H., Zhang, N. (eds.), Advances in Neural Networks. Lecture Notes in Computer Science, Vol 5551, pp. 870-876, 2009. DOI: 10.1007/978-3-642-01507-6_98
  30. Patel, J., Shah, S., Thakkar, P., Kotecha, K. Predicting stock and stock price index movement using trend deterministic data preparation and machine learning techniques. Expert Systems with Applications, Vol 42, No 1, pp. 259-268, 2015. DOI: 10.1016/j.eswa.2014.07.040
  31. Qiao, J., Wang, H. A self-organizing fuzzy neural network and its application to function approximation and forecast modeling. Neurocomputing, Vol 71, Nos 4-6, pp. 564-569, 2008. DOI: 10.1016/j.neucom.2007.07.026
  32. Sen, J. Stock price prediction using machine learning and deep learning frameworks. In: Proceedings of the 6th International Conference on Business Analytics and Intelligence (ICBAI'18), December 20-22, 2018, Bangalore, India
  33. Sen, J., Datta Chaudhuri, T. A robust predictive model for stock price forecasting. In: Proceedings of the 5th International Conference on Business Analytics and Intelligence (BAICONF'17), December 11-13, 2017, Bangalore, India
  34. Sen, J., Dutta, A., Mehtab, S. Profitability analysis in stock investment using an LSTM-based deep learning model. In: Proceedings of the 2nd IEEE International Conference on Emerging Technologies (INCET'21), May 21-23, 2021, Belgaum, India, pp. 1-9. DOI: 10.1109/INCET51464.2021.9456385
  35. Sen, J., Mehtab, S. Accurate stock price forecasting using robust and optimized deep learning models. In: Proceedings of the IEEE International Conference on Intelligent Computing (CONIT'21), June 25-27, 2021, Hubli, India. DOI: 10.1109/CONIT51480.2021.9498565
  36. Senol, D., Ozturan, M. Stock price direction prediction using artificial neural network approach: The case of Turkey. Journal of Artificial Intelligence, Vol 1, No 2, pp. 70-77, 2008. DOI: 10.3923/jai.2008.70.77
  37. Shen, J., Fan, H., Chang, S. Stock index prediction based on adaptive training and pruning algorithm. In: Liu, D., Fei, S., Hou, Z., Zhang, H., Sun, C. (eds.), Advances in Neural Networks. Lecture Notes in Computer Science, Springer-Verlag, Vol 4492, pp. 457-464, 2007. DOI: 10.1007/978-3-540-72393-6_55
  38. Tseng, K-C., Kwon, O., Tjung, L. C. Time series and neural network forecast of daily stock prices. Investment Management and Financial Innovations, Vol 9, No 1, pp. 32-54, 2012
  39. Wu, Q., Chen, Y., Liu, Z. Ensemble model of intelligent paradigms for stock market forecasting. In: Proceedings of the 1st International Workshop on Knowledge Discovery and Data Mining, Washington DC, USA, pp. 205-208, 2008. DOI: 10.1109/WKDD.2008.54
  40. Zhang, D., Jiang, Q., Li, X. Application of neural networks in financial data mining. International Journal of Computer, Electrical, Automation, and Information Engineering, Vol 1, No 1, pp. 225-228, 2007. DOI: 10.5281/zenodo.1333234
  41. Zhu, X., Wang, H., Xu, L., Li, H. Predicting stock index increments by neural networks: The role of trading volume under different horizons. Expert Systems with Applications, Vol 34, No 4, pp. 3043-3054, 2008. DOI: 10.1016/j.eswa.2007.06.023
  42. Ballings, M., den Poel, D. V., Hespeels, N., Gryp, R. Evaluating multiple classifiers for stock price direction prediction. Expert Systems with Applications, Vol 42, No 20, pp. 7046-7056, 2015. DOI: 10.1016/j.eswa.2015.05.013
  43. Bollen, J., Mao, H., Zeng, X. Twitter mood predicts the stock market. Journal of Computational Science, Vol 2, No 1, pp. 1-8, 2011. DOI: 10.1016/j.jocs.2010.12.007
  44. Chen, M-Y., Liao, C-H., Hsieh, R-P. Modeling public mood and emotion: Stock market trend prediction with anticipatory computing approach. Computers in Human Behavior, Vol 101, pp. 402-408, 2019. DOI: 10.1016/j.chb.2019.03.021
  45. Mehtab, S., Sen, J. A robust predictive model for stock price prediction using deep learning and natural language processing. In: Proceedings of the 7th International Conference on Business Analytics and Intelligence (BAICONF'19), December 5-7, 2019, Bangalore, India. DOI: 10.2139/ssrn.3502624
  46. Nam, K., Seong, N. Financial news-based stock movement prediction using causality analysis of influence in the Korean stock market. Decision Support Systems, Vol 117, pp. 100-112, 2019. DOI: 10.1016/j.dss.2018.11.004
  47. Vargas, M. R., de Lima, B. S. L. P., Evsukoff, A. G. Deep learning for stock market prediction from financial news articles. In: Proceedings of the IEEE International Conference on Computational Intelligence and Virtual Environments for Measurement Systems and Applications (CIVEMSA'17), June 26-28, 2017, Annecy, France, pp. 60-65. DOI: 10.1109/CIVEMSA.2017.7995302
  48. Kim, M., Sayama, H. Predicting stock market movements using network science: An information theoretic approach. Applied Network Science, Vol 2, Article No: 35, 2017. DOI: 10.1007/s41109-017-0055-y
  49. Lin, F-L., Yang, S.-Y., March, T., Chen, Y.-F. Stock and bond return relations and stock market uncertainty: Evidence from wavelet analysis. International Review of Economics & Finance, Vol 55, pp. 285-294, 2018. DOI: 10.1016/j.iref.2017.07.013
  50. Akcay, Y., Yalcin, A. Optimal portfolio selection with a shortfall probability constraint: Evidence from alternative distribution functions. Journal of Financial Research, Vol 33, No 1, pp. 77-102, 2010. DOI: 10.1111/j.1475-6803.2009.01263.x
  51. Caldeira, J. F., Moura, G. V., Santos, A. A. Yield curve forecast combinations based on bond portfolio performance. Journal of Forecasting, Special Issue Article, 2017. DOI: 10.1002/for.2476
  52. Li, T., Zhang, W., Xu, W. A fuzzy portfolio selection model with background risk. Applied Mathematics and Computation, Vol 256, pp. 505-513, 2015. DOI: 10.1016/j.amc.2015.01.007
  53. Liu, Y. J., Zhang, W. G. A multi-period fuzzy portfolio optimization model with minimum transaction lots. European Journal of Operational Research, Vol 242, No 3, pp. 933-941, 2015. DOI: 10.1016/j.ejor.2014.10.061
  54. Mehlawat, M. K., Gupta, P. Fuzzy chance-constrained multiobjective portfolio selection model. IEEE Transactions on Fuzzy Systems, Vol 22, No 3, pp. 653-671, 2014. DOI: 10.1109/TFUZZ.2013.2272479
  55. Sen, J., Mehtab, S. A comparative study of optimum risk portfolio and eigen portfolio on the Indian stock market. International Journal of Business Forecasting and Marketing Intelligence (IJBFMI), Paper ID: IJBFMI-90288, Inderscience Publishers. (Accepted for publication)
  56. Metastock Tool: http://metastock.com
  57. Geron, A. Hands-On Machine Learning with Scikit-Learn, Keras & TensorFlow. O'Reilly Publications, USA, 2019
  58. Shi, X., Chen, Z., Wang, H., Yeung, D-Y., Wong, W-K., Woo, W-C. Convolutional LSTM network: A machine learning approach for precipitation nowcasting. In: Proceedings of the 28th International Conference on Neural Information Processing Systems (NIPS'15), December 7-12, 2015, Vol 1, pp. 802-810