Design and Analysis of Robust Deep Learning Models for Stock Price Prediction

Building predictive models for robust and accurate prediction of stock prices and stock price movement is a challenging research problem. The well-known efficient market hypothesis rules out the possibility of accurately predicting future stock prices in an efficient market, since stock prices are assumed to be purely stochastic. However, numerous works by researchers have demonstrated that it is possible to predict future stock prices with a high level of precision using sophisticated algorithms, model architectures, and an appropriate selection of model variables. This chapter proposes a collection of predictive regression models built on deep learning architectures for robust and precise prediction of the future prices of a stock listed in the diversified sectors of the National Stock Exchange (NSE) of India. The Metastock tool is used to download the historical stock prices over a period of two years (2013-2014) at 5-minute intervals. The records of the first year are used to train the models, and the testing is carried out on the remaining records. The design approach of each model and its performance results are presented in detail. The models are also compared on their execution time and prediction accuracy.


Introduction
Building predictive models for robust and accurate prediction of stock prices and stock price movement is a very challenging research problem. The well-known efficient market hypothesis precludes any possibility of accurate prediction of future stock prices, since it assumes stock prices to be purely stochastic in nature. However, numerous works in the finance literature have shown that robust and precise prediction of future stock prices is possible using sophisticated machine learning and deep learning algorithms, model architectures, and an appropriate selection of model variables.
Technical analysis of stocks has been a very interesting area of work for researchers engaged in security and portfolio analysis. Numerous approaches to technical analysis have been proposed in the literature. Most of the algorithms in this area work by searching for pre-identified patterns and sequences in the time series of stock prices. Prior detection of such patterns can be useful to investors in formulating their investment strategies in the market to maximize profit. A rich set of such patterns has been identified in the finance literature for studying the behavior of stock price time series.
In this chapter, we propose a collection of forecasting models for predicting the prices of a critical stock of the automobile sector of India. The predictive framework consists of four CNN regression models and six regression models built on the long- and short-term memory (LSTM) architecture. Each model has a different architecture, a different shape of the input data, and different hyperparameter values.
The current work makes the following three contributions. First, unlike most existing works in the literature, which mostly deal with time-series data of daily or weekly stock prices, the models in this work are built and tested on stock price data at a small interval of 5 minutes. Second, our propositions exploit the power of deep learning, and hence, they achieve a very high degree of precision and robustness in their performance. Among all the models proposed in this work, the lowest ratio of the root mean square error (RMSE) to the average of the target variable is 0.006967. Finally, the speed of execution of the models is very fast. The fastest model requires 174.78 seconds for the execution of one round on the target hardware platform. It is worth mentioning here that the dataset used for training has 19500 records, while the models are tested on 20500 records.
The chapter is organized as follows. Section 2 briefly discusses some related works in the literature. In Section 3, we discuss the method of data acquisition, the methodology followed, and the design details of the ten predictive models proposed by us. Section 4 exhibits the detailed experimental results and their analysis. A comparative study of the performance of the models is also made. In Section 5, we conclude the chapter and identify a few new directions of research.

Related Work
The literature on systems and methods of stock price forecasting is quite rich. Numerous proposals exist on the mechanisms, approaches, and frameworks for predicting future stock prices and stock price movement patterns. At a broad level, these propositions can be classified into four categories. The proposals of the first category are based on different variants of univariate and multivariate regression models. Some of the notable approaches under this category are ordinary least square (OLS) regression, multivariate adaptive regression spline (MARS), penalty-based regression, polynomial regression, etc. [2, 13, 16, 37]. These approaches are not, in general, capable of handling the high degree of volatility in stock price data. Hence, quite often, these models do not yield an acceptable level of accuracy in prediction. Autoregressive integrated moving average (ARIMA) and other approaches of econometrics, such as cointegration, vector autoregression (VAR), causality tests, and quantile regression (QR), are some of the methods which fall under the second category of propositions [1, 12, 17, 33, 38, 40-43, 45, 52, 55]. The methods of this category are superior to the simple regression-based methods. However, if the stock price data are too volatile and exhibit strong randomness, the econometric methods also are found to be inadequate, yielding inaccurate forecasting results. The learning-based approach is the salient characteristic of the propositions of the third category. These proposals are based on various algorithms and architectures of machine learning, deep learning, and reinforcement learning [4, 6, 10, 11, 15, 24-30, 34-36, 39, 44, 46-50, 53, 54, 56]. Since the frameworks under this category use complex predictive models working on sophisticated algorithms and architectures, their predictions are found to be quite accurate in real-world applications.
The propositions of the fourth category are broadly based on hybrid models built of machine learning and deep learning algorithms and architectures and also on the relevant inputs of sentiment and news items extracted from the social web [5,7,9,23,32,51]. These models are found to yield the most accurate prediction of future stock prices and stock price movement patterns. The information-theoretic approach and the wavelet analysis have also been proposed in stock price prediction [18,20]. Several portfolio optimization methods have also been presented in some works using forecasted stock returns and risks [3,8,19,21,22].
In the following, we briefly discuss the salient features of some of the works under each category. We start with the regression-based proposals.
Enke et al. propose a multi-step approach to stock price prediction using a multiple regression model [13]. The proposition is based on a differential-evolution-based fuzzy clustering model and a fuzzy neural network. Ivanovski et al. present a linear regression and correlation study on some important stock prices listed in the Macedonian Stock Exchange [16]. The results of the work indicate a strong relationship between the stock prices and the index values of the stock exchange. Sen and Datta Chaudhuri analyze the trend and the seasonal characteristics of the capital goods sector and the small-cap sector of India using a time series decomposition approach and a linear regression model [37].
Among the econometric approaches, Du proposes an integrated model combining an ARIMA and a backpropagation neural network for predicting the future index values of the Shanghai Stock Exchange [12]. Jarrett and Kyper present an ARIMA-based model for predicting future stock prices [17]. The study conducted by the authors reveals two significant findings: (i) higher accuracy is achieved by models involving fewer parameters, and (ii) the daily return values exhibit a strong autoregressive property. Sen and Datta Chaudhuri analyze different sectors of the Indian stock market using a time series decomposition approach and predict the future stock prices using different types of ARIMA and regression models [38, 40-45]. Zhong and Enke present a gamut of econometric and statistical models, including ARIMA, generalized autoregressive conditional heteroscedasticity (GARCH), smoothing transition autoregressive (STAR), and linear and quadratic discriminant analysis [55].
Machine learning and deep learning models have found widespread applications in designing predictive frameworks for stock prices. Baek and Kim propose a framework called ModAugNet, which is built on an LSTM deep learning model [4]. Chou and Nguyen present a sliding window metaheuristic optimization method for stock price prediction [10]. Gocken et al. propose a hybrid artificial neural network using harmony search and genetic algorithms to analyze the relationship between various technical indicators of stocks and the index of the Turkish stock market [15]. Mehtab and Sen propose a gamut of models designed using machine learning and deep learning algorithms and architectures for accurate prediction of future stock prices and movement patterns [24-30, 46, 47]. The authors present several models built on variants of convolutional neural networks (CNNs) and long- and short-term memory networks (LSTMs) that yield a very high level of prediction accuracy. Zhang et al. present a multi-layer perceptron for financial data mining that is capable of recommending buy or sell strategies based on forecasted prices of stocks [55].
The hybrid models use relevant information in the social web and exploit the power of machine learning and deep learning architectures and algorithms for making predictions with a high level of accuracy. Among some well-known hybrid models, Bollen et al. present a scheme for computing the mood states of the public from the Twitter feeds and use the mood states information as an input to a nonlinear regression model built on a self-organizing fuzzy neural network [7]. The model is found to have yielded a prediction accuracy of 86%. Mehtab and Sen propose an LSTM-based predictive model with a sentiment analysis module that analyzes the public sentiment on Twitter and produces a highly accurate forecast of future stock prices [23]. Chen et al. present a scheme that collects relevant news articles from the web, converts the text corpus into a word feature set, and feeds the feature set of words into an LSTM regression model to achieve a highly accurate prediction of the future stock prices [9].
The most formidable challenge in designing a robust predictive model with a high level of precision for stock price forecasting is handling the randomness and the volatility exhibited by the time series. The current work utilizes the power of deep learning models in feature extraction and learning while exploiting their architectural diversity in achieving robustness and accuracy in stock price prediction on very granular time series data.

Methodology
We propose a gamut of predictive models built on deep learning architectures. We train, validate, and then test the models based on the historical stock price records of a well-known stock listed in the NSE, viz. Century Textiles. The historical prices of the Century Textiles stock from Monday, 31st Dec 2012, to Friday, 9th Jan 2015, are collected at 5-minute intervals using the Metastock tool [31]. We carry out the training and validation of the models using the stock price data from 31st Dec 2012 to 30th Dec 2013. The models are tested on the records for the remaining period, i.e., from 31st Dec 2013 to 9th Jan 2015. For maintaining uniformity in the sequence, we organize the entire dataset as a sequence of daily records arranged on a weekly basis from Monday to Friday. After the dataset is organized suitably, we split it into two parts: the training set and the test set. While the training dataset consists of 19500 records, there are 20500 tuples in the test data. Every record has five attributes: open, high, low, close, and volume. We have not considered any adjusted attribute (i.e., adjusted close, adjusted volume, etc.) in our analysis.
We design ten regression models for stock price forecasting using a deep learning approach. For the univariate models, the objective is to forecast the future values of the variable open based on its past values. On the other hand, for the multivariate models, the job is to predict the future values of open using the historical values of all the five attributes in the stock data. The models are tested following an approach known as multi-step prediction using a walk-forward validation [24]. In this method, we use the training data for constructing the models. The models are then used for predicting the daily open values of the stock prices for the coming week. As a week completes, we include the actual stock price records of the week in the training dataset. With this extended training dataset, the open values are forecasted with a forecast horizon of 5 days so that the forecast for the days in the next week is available. This process continues till all the records in the test dataset are processed.
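The multi-step prediction with walk-forward validation described above can be sketched as follows. The naive "persistence" forecaster used here is a hypothetical stand-in for the trained deep learning models, included only to keep the sketch self-contained and runnable; in the actual framework, a model would be re-fitted (or re-used) on the growing training set at each step.

```python
# A minimal sketch of multi-step prediction with walk-forward validation.
# The persistence forecaster below is a placeholder for the trained
# CNN/LSTM models; it simply repeats the last observed open value.

def persistence_forecast(history, horizon):
    """Forecast the next `horizon` values by repeating the last observation."""
    return [history[-1]] * horizon

def walk_forward_validation(train, test, horizon=5):
    """Predict `horizon` daily open values per week; after each week,
    fold the actual values of that week back into the training data."""
    history = list(train)          # training data grows week by week
    predictions = []
    for start in range(0, len(test), horizon):
        week_actuals = test[start:start + horizon]
        # forecast the coming week from all data seen so far
        predictions.extend(persistence_forecast(history, len(week_actuals)))
        # the completed week's actual records extend the training set
        history.extend(week_actuals)
    return predictions

train = [100.0, 101.5, 102.0, 101.0, 103.0]         # toy open values
test = [104.0, 103.5, 105.0, 106.0, 105.5, 107.0, 108.0]
preds = walk_forward_validation(train, test, horizon=5)
print(preds)  # second week's forecasts already use week 1's actuals
```

The key point of the procedure is visible in the output: the forecasts for the second week are based on a training set that already includes the actual records of the first week.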
The suitability of CNNs in building predictive models for predicting future stock prices has been demonstrated in our previous work [24]. In the current work, we present a gamut of deep learning models built on CNN and LSTM architectures and illustrate their efficacy and effectiveness in solving the same problem.
CNNs perform two critical functions for extracting rich feature sets from input data: (1) convolution and (2) pooling or sub-sampling [14]. A rich set of features is extracted by the convolution operation from the input, while the sub-sampling summarizes the salient features in a given locality of the feature space. The result of the final sub-sampling in a CNN is passed on to one or more dense (i.e., fully connected) layers. The fully connected layers learn from the extracted features and provide the network with its power of prediction.
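The two operations can be illustrated with a plain-Python sketch of a one-dimensional "valid" convolution and a max-pooling step; the kernel values are arbitrary and purely illustrative, not learned. With an input of length 5 and a kernel of size 3, the convolution yields 3 values, and pooling with a window of 2 reduces them to 1, matching the shape arithmetic used in the model descriptions below.

```python
# Plain-Python sketch of 1-D convolution ("valid" padding) and max pooling.
# The kernel weights are arbitrary; in a CNN they would be learned.

def conv1d(x, kernel):
    """Slide the kernel over x; output length = len(x) - len(kernel) + 1."""
    k = len(kernel)
    return [sum(x[i + j] * kernel[j] for j in range(k))
            for i in range(len(x) - k + 1)]

def max_pool(x, pool=2):
    """Keep the maximum of each non-overlapping window of size `pool`."""
    return [max(x[i:i + pool]) for i in range(0, len(x) - pool + 1, pool)]

series = [1.0, 2.0, 4.0, 3.0, 5.0]            # one week of open values (toy data)
features = conv1d(series, [0.5, 0.25, 0.25])  # length 5 -> 3
pooled = max_pool(features)                   # length 3 -> 1
print(len(features), len(pooled))             # 3 1
```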
LSTM is an adapted form of a recurrent neural network (RNN) that can interpret and then forecast sequential data, such as text and numerical time series [14]. The network can memorize information on its past states in designated memory cells, and the flow of information into and out of these cells is regulated by gates. The information on the past states stored in the memory cells is aggregated suitably at the forget gate, which removes the irrelevant information. The input gate, on the other hand, receives the information available to the network at the current timestamp. Using the information available at the input gate and the forget gate, the network computes the predicted value of the target variable. The predicted value at each timestamp is made available through the output gate of the network [14].
The deep learning-based models we present in this chapter differ in their design, structure, and dataflows. Our proposition includes four models based on the CNN architecture and six models built on the LSTM network architecture. The models are named following a convention: the first part of a model's name indicates the model type (CNN or LSTM), the second part indicates the nature of the input data (univariate or multivariate), and the third part is an integer indicating the size of the input to the model (5 or 10 days). The ten models are: CNN_UNIV_5, CNN_UNIV_10, CNN_MULTV_10, CNN_MULTH_10, LSTM_UNIV_5, LSTM_UNIV_10, LSTM_UNIV_ED_10, LSTM_MULTV_ED_10, LSTM_UNIV_CNN_10, and LSTM_UNIV_CONV_10. We now present a brief discussion of the model designs. All the hyperparameters (i.e., the number of nodes in a layer, the size of a convolutional, LSTM, or pooling layer, etc.) used in the models are optimized using grid search. However, the parameter optimization process is not discussed in this work.

The CNN_UNIV_5 model
This CNN model is based on a univariate input of the open values of the last week's stock price records. The model forecasts the following five values in the sequence as the predicted daily open index for the coming week. The model input has a shape of (5, 1), as the five values of the last week's daily open index are used as the input. Since the input data size for the model is small, a solitary convolutional block and a subsequent max-subsampling block are deployed. The convolutional block has a feature space dimension of 16 and a filter (i.e., kernel) size of 3. The convolutional block enables the model to read each input three times, and for each reading, it extracts 16 features from the input. Hence, the output data shape of the convolutional block is (3, 16). The max-pooling layer reduces the dimension of the data by a factor of 1/2. Thus, the max-pooling operation transforms the data shape to (1, 16). The result of the max-pooling layer is transformed into a one-dimensional array by a flattening operation. This one-dimensional vector is then passed through a dense layer and fed into the final output layer of the model.
We now compute the number of trainable parameters in the CNN_UNIV_5 model. As the role of the input layer is to provide the input data to the network, no learning is involved in the input layer. There is no learning in the pooling layers either, as all these layers do is compute local aggregates of the features. The flatten layers do not involve any learning as well. Hence, in a CNN model, trainable parameters are involved only in the convolutional layers and the dense layers. The number of trainable parameters (n1) in a one-dimensional convolutional layer is given by (1), where k is the kernel size, and d and f are the sizes of the feature space in the previous layer and the current layer, respectively:

n1 = (k * d + 1) * f    (1)
Since each element in the feature space has a bias, the term 1 is added in (1). The number of parameters (n2) in a dense layer of a CNN is given by (2), in which c and p refer to the node counts in the current layer and the previous layer, respectively. The second term on the right-hand side of (2) accounts for the bias terms of the nodes in the current layer:

n2 = p * c + c    (2)
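Formulas (1) and (2) can be checked directly against the parameter counts reported for the two univariate CNN models. The helper functions below are a plain-Python restatement of the two formulas; the flattened size of 64 for CNN_UNIV_10 follows from its (4, 16) pooled shape, and the 64-parameter count of its convolutional layer follows from formula (1) and is consistent with the 769-parameter total.

```python
# Plain-Python restatement of formulas (1) and (2).

def conv1d_params(k, d, f):
    """(1): kernel size k, previous feature-space size d, current size f."""
    return (k * d + 1) * f

def dense_params(p, c):
    """(2): p nodes in the previous layer, c nodes in the current layer."""
    return (p + 1) * c

# CNN_UNIV_5: input (5,1) -> conv (3,16) -> pool (1,16) -> dense(10) -> dense(5)
total_5 = conv1d_params(3, 1, 16) + dense_params(16, 10) + dense_params(10, 5)
print(total_5)   # 289, as in Table 1

# CNN_UNIV_10: input (10,1) -> conv (8,16) -> pool (4,16) -> dense(10) -> dense(5)
total_10 = conv1d_params(3, 1, 16) + dense_params(64, 10) + dense_params(10, 5)
print(total_10)  # 769, as in Table 2
```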
The computation of the number of parameters in the CNN_UNIV_5 model is presented in Table 1. It is observed that the model involves 289 trainable parameters. The number of parameters in the convolutional layer is 64, while the two dense layers involve 170 and 55 parameters, respectively.

The CNN_UNIV_10 model

The structure and the data flow of this model are identical to those of the CNN_UNIV_5 model. However, the input of the model has a shape of (10, 1). We use 70 epochs and a batch size of 16 for training the model. Figure 2 shows the architecture of the model CNN_UNIV_10. The computation of the number of parameters in the model CNN_UNIV_10 is exhibited in Table 2. It is evident from Table 2 that CNN_UNIV_10 involves 769 trainable parameters. The parameter counts for the convolutional layer and the two dense layers are 64, 650, and 55, respectively.
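As a cross-check of the 289-parameter count, the CNN_UNIV_5 architecture can be sketched in Keras with the layer sizes stated above. The ReLU activations are assumptions on our part; the original training configuration (optimizer, loss, etc.) is not reproduced here.

```python
# A Keras sketch of the CNN_UNIV_5 architecture as described in the text:
# Conv1D with 16 filters of size 3, max pooling, a 10-node dense layer,
# and a 5-node output layer. Activations are assumed, not from the source.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv1D, MaxPooling1D, Flatten, Dense

model = Sequential([
    Conv1D(16, 3, activation="relu", input_shape=(5, 1)),  # (5,1) -> (3,16), 64 params
    MaxPooling1D(pool_size=2),                             # (3,16) -> (1,16)
    Flatten(),                                             # -> 16 values
    Dense(10, activation="relu"),                          # 170 params
    Dense(5),                                              # 55 params
])
print(model.count_params())  # 289, matching Table 1
```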

The CNN_MULTV_10 model
This CNN model is built on the input of the last two weeks' multivariate stock price records. The five variables of the stock price time series are used in the CNN in five separate channels. The model uses a couple of convolutional layers, each of size (32, 3). The parameter values of the convolutional blocks indicate that each convolutional layer extracts 32 features from the input data using a feature map size of 32 and a filter size of 3. The input to the model has a shape of (10, 5), indicating ten records, each having five features of the stock price data. After the first convolutional operation, the shape of the data is transformed to (8, 32). The value 32 corresponds to the number of features extracted, while the value 8 is obtained by the formula f = (k - n) + 1, where k is the input length and n is the kernel size; here, k = 10 and n = 3, hence f = 8. Similarly, the output data shape of the second convolutional layer is (6, 32). A max-pooling layer reduces the feature space size by a factor of 1/2, producing an output data shape of (3, 32). The max-pooling block's output is then passed on to a third convolutional layer with a feature map of 16 and a kernel size of 3. The data shape of the output from the third convolutional layer becomes (1, 16) following the same computation rule. Finally, another max-pooling block receives the results of the final convolutional layer. This block does not reduce the feature space since the input data shape to it is already (1, 16); hence, the output of the final max-pooling layer remains (1, 16). A flatten operation follows, converting the 16 arrays containing one value each into a single array containing 16 values. The output of the flatten operation is passed on to a fully connected block having 100 nodes. Finally, the output block with five nodes computes the predicted daily open index of the coming week. The number of epochs and the batch size used in training the model are 70 and 16, respectively. Figure 3 depicts the CNN_MULTV_10 model.
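The layer-by-layer shape and parameter arithmetic of CNN_MULTV_10 can be traced as follows, using the output-length rule f = k - n + 1 together with formulas (1) and (2).

```python
# Tracing the shapes and parameter counts of CNN_MULTV_10.

def out_len(n_in, kernel):
    """Length of a 'valid' 1-D convolution output: n_in - kernel + 1."""
    return n_in - kernel + 1

def conv1d_params(k, d, f):
    return (k * d + 1) * f        # formula (1)

def dense_params(p, c):
    return (p + 1) * c            # formula (2)

length = 10                                                   # ten records, five attributes
p1 = conv1d_params(3, 5, 32);  length = out_len(length, 3)    # (8, 32), 512 params
p2 = conv1d_params(3, 32, 32); length = out_len(length, 3)    # (6, 32), 3104 params
length //= 2                                                  # max pooling -> (3, 32)
p3 = conv1d_params(3, 32, 16); length = out_len(length, 3)    # (1, 16), 1552 params
p4 = dense_params(16, 100)                                    # 1700 params
p5 = dense_params(100, 5)                                     # 505 params
print(p1, p2, p3, p4, p5, p1 + p2 + p3 + p4 + p5)
```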
Table 3 shows the computation of the number of trainable parameters involved in the model. From Table 3, it is observed that the total number of trainable parameters in the model CNN_MULTV_10 is 7373. The three convolutional layers, conv1d_4, conv1d_5, and conv1d_6, involve 512, 3104, and 1552 parameters, respectively. It is to be noted that the value of k for the first convolutional layer, conv1d_4, is multiplied by a factor of five since there are five attributes in the input data for this layer. The two dense layers, dense_3 and dense_4, involve 1700 and 505 parameters, respectively.

The CNN_MULTH_10 model
This CNN model uses a dedicated CNN block for each of the five input attributes in the stock price data. In other words, for each input variable, a separate CNN is used for feature extraction. We call this a multivariate and multi-headed CNN model. Each sub-CNN model uses a couple of convolutional layers. The convolutional layers have a feature space dimension of 32 and a filter size (i.e., kernel size) of 3. The convolutional layers are followed by a max-pooling layer, which reduces the size of the feature space by a factor of 1/2. Following the computation rule discussed under the CNN_MULTV_10 model, the data shape of the output from the max-pooling layer of each sub-CNN model is (3, 32). A flatten operation follows, converting the data into a one-dimensional array of size 96 for each input variable. The five flattened vectors are then concatenated into a single vector, which is processed by the dense layers.
Table 4 presents the necessary calculations for finding the number of parameters in the CNN_MULTH_10 model. Each of the five convolutional layers, conv1d_1, conv1d_3, conv1d_5, conv1d_7, and conv1d_9, involves 128 parameters. For each of these layers, k = 3, d = 1, and f = 32, and hence the number of trainable parameters is (3 * 1 + 1) * 32 = 128. Hence, for these five convolutional layers, the total number of parameters is 128 * 5 = 640. Next, each of the five convolutional layers conv1d_2, conv1d_4, conv1d_6, conv1d_8, and conv1d_10 involves 3104 parameters. Each layer of this group has k = 3, d = 32, and f = 32; hence the number of trainable parameters per layer is (3 * 32 + 1) * 32 = 3104. Therefore, for these five convolutional layers, the total number of parameters is 3104 * 5 = 15,520. The dense layers, dense_1, dense_2, and dense_3, involve 96,200, 20,100, and 505 parameters, respectively, using (2). Hence, the model includes 132,965 parameters in total.

The LSTM_UNIV_5 model

As we did in the case of the CNN models, we now compute the number of parameters involved in the LSTM models.
The input layers do not have any parameters, as the role of these layers is just to receive and forward the data. An LSTM network has four gates that all have the same number of parameters. These four gates are known as (i) the forget gate, (ii) the input gate, (iii) the input modulation gate, and (iv) the output gate. The number of parameters (n1) in each of the gates of an LSTM network is computed using (3), where x denotes the number of LSTM units, and y is the input dimension (i.e., the number of features in the input data):

n1 = x * (x + y) + x    (3)

Hence, the total number of parameters in an LSTM layer is given by 4 * n1. The number of parameters (n2) in a dense layer of an LSTM network is computed using (4), where p and c are the number of nodes in the previous layer and the current layer, respectively. The bias parameter of each node in the current layer is represented by the last term on the right-hand side of (4):

n2 = p * c + c    (4)
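Formula (3) can likewise be checked against the LSTM parameter counts reported later in the chapter, namely the 164800-parameter encoder of the multivariate encoder-decoder model and the 314400-parameter decoder LSTM of the CNN-encoder model.

```python
# Plain-Python restatement of formula (3): parameters in one LSTM layer.

def lstm_params(x, y):
    """x LSTM units, y input features; four gates, each with x*(x+y)+x params."""
    return 4 * (x * (x + y) + x)

# Encoder LSTM of LSTM_MULTV_ED_10: 200 units, 5 input features
print(lstm_params(200, 5))    # 164800

# Decoder LSTM of LSTM_UNIV_CNN_10: 200 units fed a 192-value vector
print(lstm_params(200, 192))  # 314400
```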
The computation of the number of parameters associated with the model LSTM_UNIV_5 is depicted in Table 5.

The LSTM_UNIV_10 model

The structure and the data flow of this model are identical to those of the LSTM_UNIV_5 model; only the input data shape is different. The input data shape of this model is (10, 1). Figure 6 presents the architecture of this model. Table 6 presents the computation of the number of parameters involved in the model LSTM_UNIV_10. Since the number of parameters in the LSTM layers depends only on the number of features in the input data and the node count in the LSTM layer, and not on the number of input records in one epoch, the model LSTM_UNIV_10 has an identical number of parameters in its LSTM layer as the model LSTM_UNIV_5. Since both models also have the same number of dense layers with the same architecture, the total number of parameters is the same for both models.

The LSTM_UNIV_ED_10 model

This model follows an encoder-decoder LSTM architecture on a univariate input. The encoder LSTM block consists of 200 nodes (determined using the grid-search procedure). The input data shape to the encoder LSTM is (10, 1). The encoding layer yields a one-dimensional vector of size 200, each value corresponding to the feature extracted by a node in the LSTM layer from the ten input values received from the input layer. Corresponding to each timestamp of the output sequence (there are five timestamps, one for each of the five forecasted open values), the input data features are extracted once. Hence, the data shape of the repeat vector layer's output is (5, 200). It signifies that 200 features are extracted from the input for each of the five timestamps corresponding to the model's output (i.e., forecasted) sequence. The second LSTM block decodes the encoded features using 200 nodes. The decoded result is passed on to a dense layer. The dense layer learns from the decoded values and predicts the future five values of the target variable (i.e., open) for the coming week through five nodes in the output layer. However, the forecasted values are not produced in a single timestamp; the forecasts for the five days are made in five rounds.
The round-wise forecasting is done using the TimeDistributed wrapper, which synchronizes the decoder LSTM block, the fully connected block, and the output layer in every round. The number of epochs and the batch size used in training the model are 70 and 16, respectively. Figure 7 presents the structure and the data flow of the LSTM_UNIV_ED_10 model.
The computation of the number of parameters in the LSTM_UNIV_ED_10 model is shown in Table 7. The input layer and the repeat vector layer do not involve any learning, and hence these layers have no parameters. On the other hand, the two LSTM layers, lstm_3 and lstm_4, and the two dense layers, time_distributed_3 and time_distributed_4, involve trainable parameters.

The LSTM_MULTV_ED_10 model
This model is a multivariate version of LSTM_UNIV_ED_10. It uses the last couple of weeks' stock price records and includes all five attributes, i.e., open, high, low, close, and volume. Hence, the input data shape for the model is (10, 5). We use a batch size of 16 while training the model over 20 epochs. Figure 8 depicts the architecture of the multivariate encoder-decoder LSTM model. Table 8 shows the number of parameters in the LSTM_MULTV_ED_10 model. The computation of the parameters for this model is exactly the same as that for the model LSTM_UNIV_ED_10, except for the first LSTM layer. The number of parameters in the first LSTM (i.e., the encoder) layer of this model is different, since the parameter count depends on the number of features in the input data. The parameter count of the encoder LSTM layer, lstm_1, of the model is computed as follows: 4 * [(200 + 5) * 200 + 200] = 164800. The total number of parameters for the model is found to be 505801.
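Using formulas (3) and (4), the 505801-parameter total of LSTM_MULTV_ED_10 can be reconstructed layer by layer; the 100-node intermediate dense layer and the per-timestep output node mirror the decoder block of the univariate encoder-decoder model.

```python
# Reconstructing the parameter count of LSTM_MULTV_ED_10.

def lstm_params(x, y):
    return 4 * (x * (x + y) + x)      # formula (3), summed over the four gates

def dense_params(p, c):
    return (p + 1) * c                # formula (4)

encoder = lstm_params(200, 5)         # 200 units, 5 input features -> 164800
decoder = lstm_params(200, 200)       # 200 units fed 200 encoded features -> 320800
dense   = dense_params(200, 100)      # 20100
output  = dense_params(100, 1)        # 101 (per-timestep output node)
print(encoder + decoder + dense + output)  # 505801
```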

The LSTM_UNIV_CNN_10 model
This model is a modified version of the LSTM_UNIV_ED_10 model, in which a dedicated CNN block carries out the encoding operation. CNNs are poor in their ability to learn from sequential data. However, we exploit the power of a one-dimensional CNN in extracting important features from time-series data. After the feature extraction is done, the extracted features are provided as the input to an LSTM block. The LSTM block decodes the features and makes a robust forecast of the future values in the sequence. The CNN block consists of a couple of convolutional layers, each of which has a feature map size of 64 and a kernel size of 3. The input data shape is (10, 1), as the model uses univariate data of the target variable for the past couple of weeks. The output shape of the initial convolutional layer is (8, 64). The value 8 is arrived at using the computation (10 - 3 + 1), while 64 refers to the feature space dimension.
Similarly, the shape of the output of the next convolutional block is (6, 64). A max-subsampling block follows, which contracts the feature space dimension by a factor of 1/2. Hence, the output data shape of the max-pooling layer is (3, 64). The max-pooling layer's output is flattened into a one-dimensional array of size 3 * 64 = 192. The flattened vector is fed into the decoder LSTM block consisting of 200 nodes. The decoder architecture remains identical to the decoder block of the LSTM_UNIV_ED_10 model. We train the model over 20 epochs with a batch size of 16. The structure and the data flow of the model are shown in Figure 9. Table 9 presents the computation of the number of parameters in the model LSTM_UNIV_CNN_10. The input layer, the max-pooling layer, the flatten operation, and the repeat vector layer do not involve any learning, and hence they have no parameters. The number of parameters in the first convolutional layer is computed as (3 * 1 + 1) * 64 = 256. For the second convolutional layer, the number of parameters is (3 * 64 + 1) * 64 = 12352. The number of parameters in the LSTM layer is 4 * [(200 + 192) * 200 + 200] = 314400. For the first dense layer, the number of parameters is (200 * 100 + 100) = 20100. Finally, the number of parameters in the second dense layer is (100 * 1 + 1) = 101. The total number of parameters in the model is found to be 347209.
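The same arithmetic, combining formulas (1), (3), and (4), reproduces the 347209-parameter total of LSTM_UNIV_CNN_10:

```python
# Layer-by-layer parameter count of LSTM_UNIV_CNN_10.

def conv1d_params(k, d, f):
    return (k * d + 1) * f            # formula (1)

def lstm_params(x, y):
    return 4 * (x * (x + y) + x)      # formula (3)

def dense_params(p, c):
    return (p + 1) * c                # formulas (2)/(4)

total = (conv1d_params(3, 1, 64)      # first convolutional layer: 256
         + conv1d_params(3, 64, 64)   # second convolutional layer: 12352
         + lstm_params(200, 192)      # decoder LSTM on the 192-value vector: 314400
         + dense_params(200, 100)     # first dense layer: 20100
         + dense_params(100, 1))      # per-timestep output node: 101
print(total)  # 347209
```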

The LSTM_UNIV_CONV_10 model
This model is a modification of the LSTM_UNIV_CNN_10 model. The encoder CNN's convolution operations and the decoding operations of the LSTM sub-module are integrated for every round of the output sequence. This encoder-decoder model is also known as the Convolutional-LSTM model [13]. The integrated model reads sequential input data, performs convolution operations on the data without any explicit CNN block, and decodes the extracted features using a dedicated LSTM block. The Keras framework contains a class, ConvLSTM2D, capable of performing two-dimensional convolution operations [13]. The two-dimensional ConvLSTM class is tweaked to enable it to process one-dimensional univariate data. The architecture of the model LSTM_UNIV_CONV_10 is represented in Figure 10. The computation of the number of parameters for the LSTM_UNIV_CONV_10 model is shown in Table 10. While the input layer, the flatten operation, and the repeat vector layer do not involve any learning, the other layers include trainable parameters, as shown in Table 10.

Performance Results
We present the performance results of the ten deep learning models on the dataset we prepared and compare their performances. For a robust evaluation, we execute every model over ten rounds and take the average performance over the ten rounds as the overall performance of the model.

Table 11 shows the performance results of the CNN_UNIV_5 model. The model takes, on average, 174.78 seconds to finish one cycle of execution. For this model, the ratio of the RMSE to the mean of the actual open values is 0.007288. The ratios of the RMSE to the average of the actual open values for day1 through day5 are 0.0062, 0.0066, 0.0073, 0.0078, and 0.0083, respectively. Here, day1 refers to Monday, and day5 to Friday; all subsequent tables use the same notation. The daily RMSE values of the CNN_UNIV_5 model are depicted in Figure 11, as per record no 2 in Table 11.

Figure 11: RMSE vs. day plot of CNN_UNIV_5 (as per tuple#2 in Table 11)

Figure 12 presents the RMSE values for the results of round 7 in Table 12.

Figure 12: RMSE vs. day plot of CNN_UNIV_10 (as per tuple#7 in Table 12)

Figure 13 depicts the daily RMSE values as per record no 6 of Table 13, and Figure 14 those as per record no 4 in Table 14.

The results of the LSTM_UNIV_5 model are presented in Table 15. The average time needed to complete one round of the model is 371.62 seconds. The ratio of the RMSE to the average value of the target variable is 0.007770. The RMSE values for day1 to day5 are 0.0067, 0.0071, 0.0074, 0.0081, and 0.0086, respectively. The pattern of variation of the daily RMSE values, as per record no 9 in Table 15, is depicted in Figure 15.

Figure 15: RMSE vs. day plot of LSTM_UNIV_5 (as per tuple#9 in Table 15)

The daily RMSE pattern for Table 16 is exhibited in Figure 16.

Figure 16: RMSE vs. day plot of LSTM_UNIV_10 (as per tuple#10 in Table 16)

Figure 18 shows the pattern of the daily RMSE values of the LSTM_MULTV_ED_10 model as per record no 10 in Table 18.

Figure 18: RMSE vs. day plot of LSTM_MULTV_ED_10 (as per tuple#10 in Table 18)

Figure 19 depicts the pattern of variation of the daily RMSE values of the LSTM_UNIV_CNN_10 model as per record no 3 in Table 19.

Figure 19: RMSE vs. day plot of LSTM_UNIV_CNN_10 (as per tuple#3 in Table 19)

The results of the LSTM_UNIV_CONV_10 model are presented in Table 20. Figure 20 shows the pattern of daily RMSE values for this model as per record no 8 in Table 20.

Table 21 summarizes the performance of the ten models proposed in this chapter. We evaluate the models on two metrics and then rank them on each metric. The two metrics used for the model evaluation are: (i) an accuracy metric computed as the ratio of the RMSE to the mean value of the target variable (i.e., the open values), and (ii) a speed metric measured by the time (in seconds) required by a model to execute one round. The number of parameters in each model is also presented. It is noted that the CNN_UNIV_5 model is ranked 1 on execution speed, while it occupies rank 2 on the accuracy metric. The CNN_UNIV_10 model, on the other hand, is ranked 2 in execution speed, while it is the most accurate model. It is also interesting to note that all the CNN models are faster than their LSTM counterparts. However, there is no appreciable difference in their accuracies except for the multivariate encoder-decoder LSTM model, LSTM_MULTV_ED_10.
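The accuracy metric used throughout, the ratio of the RMSE to the mean of the actual open values, is straightforward to compute. A minimal sketch follows; the price arrays are illustrative placeholders, not the chapter's data:

```python
import math

def rmse_ratio(actual, predicted):
    # RMSE divided by the mean of the actual values: the accuracy
    # metric used to compare the models in Table 21.
    rmse = math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted))
                     / len(actual))
    return rmse / (sum(actual) / len(actual))

# Illustrative open prices and predictions for one week (day1..day5)
actual    = [100.0, 101.2, 100.8, 102.0, 101.5]
predicted = [100.5, 100.9, 101.3, 101.6, 102.0]
print(round(rmse_ratio(actual, predicted), 6))
```

Because the ratio is dimensionless, it allows accuracy comparisons across stocks and time periods with different price levels.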
Another interesting observation is that the multivariate models are found to be inferior to their corresponding univariate models on the basis of the accuracy metric. The multivariate models, CNN_MULTV_10 and LSTM_MULTV_ED_10, are ranked 9 and 10, respectively, under the accuracy metric.
Finally, it is observed that the number of parameters in a model affects its execution time, barring some notable exceptions. For the four CNN models, the execution time increases monotonically with the number of parameters. For the LSTM models, even though LSTM_UNIV_CNN_10, LSTM_UNIV_CONV_10, and LSTM_UNIV_ED_10 have a higher number of parameters than the vanilla LSTM models (i.e., LSTM_UNIV_5 and LSTM_UNIV_10), they are faster in execution. Evidently, the univariate encoder-decoder LSTM models are faster even when they involve a higher number of parameters than the vanilla LSTM models.

Conclusion
Prediction of future stock prices and price movement patterns is a challenging task when the stock price time series exhibits a high level of volatility. In this chapter, we presented ten deep learning-based regression models for robust and precise prediction of stock prices. Among the ten models, four are built on variants of CNN architectures, while the remaining six are constructed using different LSTM architectures. The historical stock price records were collected using the Metastock tool over a span of two years at five-minute intervals. The models are trained using the records of the first year and then tested on the remaining records. The testing is carried out using an approach known as walk-forward validation, in which the stock prices for the five days of the next week are predicted based on the last one or two weeks of historical stock prices. The overall RMSE and the RMSE for each day of the week are computed to evaluate the prediction accuracy of the models. The time needed to complete one round of execution of each model is also recorded to measure its speed of execution. The results revealed some very interesting observations. First, while the CNN models are generally faster, the accuracies of the CNN and LSTM models are comparable. Second, the univariate models are faster and more accurate than their multivariate counterparts. Finally, the number of parameters in a model has a significant effect on its speed of execution, except for the univariate encoder-decoder LSTM models. As a future scope of work, we will design optimized models based on generative adversarial networks (GANs) to explore the possibility of further improving the performance of the models.
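The walk-forward validation scheme summarized above can be sketched as a simple loop: forecast the next week from the preceding history window, then slide the window forward by one week. The sketch below uses a naive repeat-last-value predictor as a stand-in for the chapter's models, and the window sizes are illustrative assumptions:

```python
# Minimal sketch of walk-forward validation: predict the 5 open prices
# of the next week from the previous two weeks (10 values) of history.
# The default predictor is a naive baseline, not one of the chapter's models.

def walk_forward(series, history_len=10, horizon=5, predict=None):
    # predict: callable mapping a history window to `horizon` forecasts
    if predict is None:
        predict = lambda hist: [hist[-1]] * horizon  # repeat last value
    forecasts, actuals = [], []
    for start in range(0, len(series) - history_len - horizon + 1, horizon):
        hist = series[start:start + history_len]
        true = series[start + history_len:start + history_len + horizon]
        forecasts.append(predict(hist))
        actuals.append(true)
    return forecasts, actuals

series = [float(100 + i) for i in range(25)]  # toy price series
f, a = walk_forward(series)
print(len(f), f[0], a[0])
```

Pairing each forecast list with its actual list gives exactly the per-day and overall RMSE figures reported in the performance tables, once a trained model is substituted for the baseline predictor.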