Open access peer-reviewed chapter

Time Series Forecasting on COVID-19 Data and Its Relevance to International Health Security

Written By

Steven Kraamwinkel

Submitted: 30 January 2022 Reviewed: 13 April 2022 Published: 15 June 2022

DOI: 10.5772/intechopen.104920

From the Edited Volume

Contemporary Developments and Perspectives in International Health Security - Volume 3

Edited by Stanislaw P. Stawicki, Ricardo Izurieta, Michael S. Firstenberg and Sagar C. Galwankar



The coronavirus pandemic is the most tragic virus outbreak in more than a century. COVID-19 has already taken the lives of four million people worldwide, across all continents. The virus can become catastrophic if a significant part of the world population lacks any form of immunity against it. The aim of this project is to forecast the number of daily infections in the Netherlands. Seven different models were implemented to forecast the number of infected people over a three-month period. The sequential CNN model substantially outperformed all other models. The capabilities of CNN models in time series forecasting encourage further research on time series data with convolutional neural networks.


  • time series forecasting
  • data wrangling
  • convolutional neural network
  • machine learning
  • time series analysis
  • time series modeling
  • computational intelligence

1. Introduction

COVID-19 is currently a pandemic with very serious impacts on today's society. It affects not only the health of individuals, but also has economic, social, cultural, and political consequences. It is therefore of great importance to make COVID-19 data as visible as possible, to train models that can forecast the number of new COVID-19 infections, and to learn from them before new COVID-19 mutations or newer diseases occur.

The objective of this research was to make COVID-19 data more visible and interpretable, and to have at least two statistical models that can forecast COVID-19 cases using machine learning or deep learning techniques. For future pandemics, it can be beneficial to have models that can forecast the possible impact of a new virus, not only on the health care sector but on society as a whole.

Different artificial intelligence (AI) techniques can aid scientists in medical and biological fields who do research in virology and pandemic control. For society, more predictive analysis and data visualization that the average person can understand means more awareness of the past, current, and possibly future development of a virus pandemic.

1.1 The Corona virus in detail

The coronavirus disease, named COVID-19, is caused by the novel severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). An infected patient shows well-known symptoms such as fever, cough, and fatigue. Since the virus can spread via human-to-human contact from any infected person, the outbreak turned into a catastrophic pandemic [1, 2].

The corona virus spread across the globe in less than six months from the first cases originating from the city of Wuhan in China. The World Health Organization has officially declared the corona virus as a global pandemic, and specific countries have regulated measures to reduce and overcome the severe effects of the virus.

1.2 Problem identification

The aim of this research is to implement multiple machine learning (ML) and computational intelligence (CI) models that can predict the number of corona infections over a specified time interval. In basic ML, there are two main types of learning: supervised learning and unsupervised learning. The task here is to predict the number of corona infections in the interval February 2021–April 2021 after observing the number of infections in the interval April 2020–January 2021, so the problem can be framed as a supervised learning problem. In supervised learning, an algorithm learns associations, also called mappings, between inputs, which in many cases originate from a dataset, and outputs. Each sample in a dataset has an input component (x) with a corresponding output component (y). Supervised learning aims at approximating the real underlying mapping between those inputs and outputs [3, 4].

Since the goal is to predict future values of the number of coronavirus infections, the problem can also be identified as a regression problem. This is the case because numerical values are predicted: given a specific input, an output is produced by a function $f: \mathbb{R}^n \to \mathbb{R}$. Regression problems can be both linear and nonlinear.

1.3 Introduction to time series data and time series forecasting

Data that registers the spread and development of the corona pandemic, always takes a specific time point into account [5]. Also, all databases recording corona infections, hospitalization, and deaths always have counts of individuals taken at specific points in time. Therefore the underlying data can be considered as time series data.

Time series data can be defined as a one-dimensional, time-ordered sequence of values of a variable that has an attached time-dependent component. Such data are measurements of any type that are observed sequentially over time or at regular time intervals [4, 6, 7, 8, 9].

In many cases, a time series can be written as a vector of the form $\{x_1, x_2, \ldots, x_n\}$, where each element $x_t \in \mathbb{R}^m$ is an array of m values [5]. Time series data is always temporal data. This means that data is organized over time, with a time attribute being an index of the observations in the dataset [9]. Time series can be modeled in various domains, ranging from financial and stock market data, to weather and earthquake forecasting, as well as pandemics modeling and medicine intake [10].

Time series forecasting is an area of research aimed at analyzing past observations of a random variable to develop a model that best captures the underlying relationships and patterns. It also covers predicting future values of that random variable [1] as accurately as possible from data that has a time component attached, using all available information, including historical data and knowledge of any future events that might impact the forecasts.

A typical time series forecasting model can be formulated as $x_{t+1} = f(x_t, x_{t-1}, \ldots, x_{t-n+1})$, where $x_t$ is the time series data. In time series data, every point $x_t$ can be formulated as $x_t = f_t + s_t + c_t + e_t$, which is the sum of the trend, seasonal, cyclical, and irregular components [11].

Time series forecasting can be divided into one-step forecasting, and multi-step forecasting. In one-step forecasting, the next time step is computed using the historical inputs. In multi-step forecasting, the forecast of the previous time-step is used as an input, and combined with the historical data produces the output of the next multiple time steps [4].
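The two forecasting modes can be illustrated with a small sketch. It assumes a window of `n_lags` past values as model input, and the helper names `make_windows` and `recursive_forecast` are hypothetical, as is the toy "model" that simply averages its window. It is not the implementation used in this research.

```python
import numpy as np

def make_windows(series, n_lags):
    """Turn a 1-D series into (X, y) pairs: n_lags past values -> next value."""
    series = np.asarray(series, dtype=float)
    X = np.array([series[i:i + n_lags] for i in range(len(series) - n_lags)])
    y = series[n_lags:]
    return X, y

def recursive_forecast(one_step_model, history, n_lags, n_steps):
    """Multi-step forecast: feed each one-step forecast back in as an input."""
    window = list(np.asarray(history, dtype=float)[-n_lags:])
    forecasts = []
    for _ in range(n_steps):
        y_hat = float(one_step_model(np.array(window)))
        forecasts.append(y_hat)
        window = window[1:] + [y_hat]  # drop the oldest value, append the forecast
    return np.array(forecasts)

# Toy example (assumption): the one-step "model" averages the window.
series = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
X, y = make_windows(series, n_lags=3)
forecasts = recursive_forecast(lambda w: w.mean(), series, n_lags=3, n_steps=2)
```

Any fitted one-step model can be passed in place of the averaging function, as long as it maps a window of `n_lags` values to a single number.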

Forecasting is different than prediction, since forecasting considers a temporal dimension, which is always contained in any time series data. In such temporal dimension, future forecasts are always dependent on the current situation. This makes forecasting and modeling forecasts more difficult than predictive analysis [12].

Many people wrongfully assume that forecasting future values is not possible. In fact, there are computational intelligence models that can capture data patterns, and are able to make better forecasts than random guessing, and also show better performance than simple models that make average or naive forecasts. In such models, not the exact future is predicted, but it can be estimated from available real world data [13].

Time series forecasts can aid many professionals in their area of work by guiding their future actions and decision-making processes. For example, they can advise medical practitioners in determining the course of treatment for a patient.

In academia, time series forecasting is considered one of the most profitable data mining methods and a core skill in the data analytics field, but also a relatively difficult one and a relatively unknown field of research [14, 15, 16, 17, 18].

The key challenge in time series datasets is the presence of time-dependent confounding variables. It is still a tremendous challenge and even an obstacle to adjust a time series model for these time-dependent confounding variables, and many forecasts made today still contain certain biases in their results [19].


2. Models for time series forecasting

Despite its relative neglect compared to other research areas in artificial intelligence, numerous and varied models have been developed for time series forecasting. Two groups of models that have been refined for time series data, both recently and in the past, are the classical machine learning models and deep learning neural networks.

2.1 Baseline models

Over the twentieth century, simple and sometimes effective baseline models have been developed that make forecasts somewhat better than random guessing. Two models that can be implemented efficiently and effectively as a baseline in any time series modeling problem are:

  • The naive model, also called a persistence model. A model that uses the last seen observation as the forecast [20], and assumes that all future values are equal to the last observed point [21].

  • The simple average model, which uses as the forecasting value, the average value over all previously observed input values [4].
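Both baselines fit in a few lines. The sketch below is a minimal illustration with hypothetical function names and made-up input values, not the code used in this research.

```python
import numpy as np

def naive_forecast(history, n_steps):
    """Persistence model: repeat the last observed value."""
    return np.full(n_steps, float(history[-1]))

def average_forecast(history, n_steps):
    """Simple average model: repeat the mean of all observed values."""
    return np.full(n_steps, float(np.mean(history)))

history = [10, 12, 11, 15]              # made-up observations
naive = naive_forecast(history, 3)      # [15., 15., 15.]
average = average_forecast(history, 3)  # [12., 12., 12.]
```

Because both forecasts are flat lines, they serve mainly as a reference that any trained model should beat.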

2.2 Classical machine learning models

Classical machine learning models, also named statistical models, were originally developed from the 1960s onward for predictive analysis. These models use the variable's historical past to predict future observations [9], and are linear. Some of these methods, like ARIMA and Exponential Smoothing, are still widely used. This is mainly because of their high accuracy, robustness, efficiency, and the fact that they can be used by non-experts in machine learning [22]. Most of these methods use a concept called lagged prediction: a prediction for time t relies on the values at t-1, t-2, and so on, back to t-n. In other words, it relies on data points from the previous period of time [23].

2.2.1 Auto regressive model

A statistical model in which the value of interest is forecast using a linear combination of past values of the variable is called an autoregressive model [24]. The term autoregression indicates that the current value of the variable is regressed against one or more of its prior values. The autoregressive model AR(p) is a stochastic model that assumes some form of randomness in the data. This means that future forecasts can be made with high accuracy, but never with complete certainty [9].

These models build on the idea of regressing on previous values, also called lag terms [19]. An autoregressive model AR(p) can be formulated as:

$$x_t = \phi_1 x_{t-1} + \phi_2 x_{t-2} + \cdots + \phi_p x_{t-p} + \epsilon_t \tag{1}$$

where $\epsilon_t$ is a white noise error term, and $\phi_1, \ldots, \phi_p$ are parameters [6, 7].
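As an illustration of regressing on lag terms, the sketch below estimates the coefficients of an AR(p) model by ordinary least squares. It assumes the standard AR(p) form without an intercept. The function names `fit_ar` and `predict_next` are hypothetical, and the data is simulated rather than taken from the COVID-19 series.

```python
import numpy as np

def fit_ar(series, p):
    """Estimate AR(p) coefficients phi_1..phi_p by ordinary least squares."""
    x = np.asarray(series, dtype=float)
    # The row for time t holds [x_{t-1}, x_{t-2}, ..., x_{t-p}].
    X = np.column_stack([x[p - i:len(x) - i] for i in range(1, p + 1)])
    y = x[p:]
    phi, *_ = np.linalg.lstsq(X, y, rcond=None)
    return phi

def predict_next(series, phi):
    """One-step forecast: linear combination of the last p values."""
    p = len(phi)
    last = np.asarray(series, dtype=float)[-1:-p - 1:-1]  # x_t, x_{t-1}, ...
    return float(last @ np.asarray(phi, dtype=float))

# Simulate an AR(2) process with known coefficients and try to recover them.
rng = np.random.default_rng(0)
true_phi = np.array([0.6, -0.3])
x = np.zeros(5000)
for t in range(2, len(x)):
    x[t] = true_phi[0] * x[t - 1] + true_phi[1] * x[t - 2] + rng.normal()
phi_hat = fit_ar(x, p=2)
next_value = predict_next(x, phi_hat)
```

With enough data, the estimated coefficients should land close to the simulated ones. Statistical packages typically use more refined estimators than plain least squares.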

2.2.2 Moving average model

A moving average model is a model based on error lag regression. It is a stochastic process whose output values are linearly dependent on the weighted sum of a white noise error and the error term from previous time values [19, 20, 24].

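A moving average model MA(q) builds a function of past error terms [11], and is basically the weighted sum of the current and past random errors [9]. A first-order model, MA(1), can be expressed as:

$$x_t = \mu + \epsilon_t + \theta_1 \epsilon_{t-1} \tag{2}$$

A higher-order MA(q) model can be expressed by the following formula:

$$x_t = \mu + \epsilon_t + \theta_1 \epsilon_{t-1} + \theta_2 \epsilon_{t-2} + \cdots + \theta_q \epsilon_{t-q} \tag{3}$$

where $\mu$ is the mean of the series, $\epsilon_t$ is white noise, and $\theta_1, \ldots, \theta_q$ are the moving average coefficients.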

2.2.3 ARIMA

A very popular, and in many situations effective, model in time series forecasting is the ARIMA model, an acronym for AutoRegressive Integrated Moving Average. ARIMA models combine the AR and MA models with an integrated part (the I in ARIMA), which can make non-stationary data stationary by means of differencing [11, 24]. The model was originally developed by the statisticians Box and Jenkins in 1968.

The purpose of an ARIMA model is to describe autocorrelations in time series data [6, 21].

An ARIMA model is a typical linear model, that assumes a linear correlation between the time-series values. It makes use of these linear dependencies to extract local patterns, and removes high-frequency noise from the underlying data [1, 16, 25].

ARIMA models have proven to be very accurate forecasting models for short-term forecasting, when there is a scarcity of trainable data [26]. It is arguably one of the most popular and widely used linear models in time series forecasting, due to its great flexibility and performance [16].

Any ARIMA model has three main hyperparameters: p, d, and q. The p parameter is the number of lag observations, the d parameter is the degree of differencing, and the q parameter is the number of previous error terms used to predict the future value [8, 9, 20, 26, 27].

The values of p, d, and q can be determined after plotting the ACF and PACF. ARIMA models are relatively simple to construct, and often show better performance than more complex, structural models [28].

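Eq. (4) below shows the mathematical formulation of the ARIMA model:

$$x'_t = \alpha + \beta_1 x'_{t-1} + \cdots + \beta_p x'_{t-p} + \epsilon_t + \theta_1 \epsilon_{t-1} + \cdots + \theta_q \epsilon_{t-q} \tag{4}$$

where $x'_t$ is the series after differencing d times, $\alpha$ is the intercept term, $\beta_1, \ldots, \beta_p$ are the lag coefficients, $\theta_1, \ldots, \theta_q$ are the moving average coefficients, and $\epsilon_{t-1}, \ldots, \epsilon_{t-q}$ are errors.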
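The integrated part of ARIMA, controlled by d, can be illustrated separately from the AR and MA terms. The sketch below shows differencing and how forecasts of first differences are turned back into levels. The function names and the numbers are made up for illustration.

```python
import numpy as np

def difference(series, d=1):
    """Apply first-order differencing d times."""
    return np.diff(np.asarray(series, dtype=float), n=d)

def undo_first_difference(last_value, diffs):
    """Turn forecasts of first differences back into levels."""
    return last_value + np.cumsum(diffs)

series = np.array([3.0, 5.0, 8.0, 12.0, 17.0])  # made-up, upward-curving series
d1 = difference(series, d=1)   # [2., 3., 4., 5.]   still trending
d2 = difference(series, d=2)   # [1., 1., 1.]       constant after two differences
levels = undo_first_difference(series[-1], np.array([6.0, 7.0]))  # [23., 30.]
```

In this toy series two rounds of differencing remove the trend, which would correspond to d = 2.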

2.2.4 Exponential smoothing

Exponential smoothing (ES) models are based on a description of trend and seasonality in the data, and the prediction is a weighted linear sum of recent past observations or lags [24]. In single exponential smoothing, there is a parameter α, the smoothing factor, which controls the exponentially decreasing weights given to past observations [1, 4, 6, 7]. Exponential smoothing can be formulated and computed as follows:

$$s_0 = x_0, \qquad s_t = \alpha x_t + (1 - \alpha)\, s_{t-1} \tag{5}$$

where $s_t$ is the smoothed value at time t, t > 0, and α is the smoothing factor, which can be set at any number between 0 and 1 [23].

This means that, for predictive purposes, more recent observations carry more weight in the computed predictions than observations further in the past [29], especially when the smoothing factor α is close to 1. Smoothing methods often yield sufficient performance on univariate data that contains little trend or seasonality [24]. They also require little computation power [23].
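The effect of the smoothing factor can be seen in a short sketch. It assumes the common recursive form in which the first smoothed value equals the first observation and each later value is α times the new observation plus (1 − α) times the previous smoothed value. The function name and data are hypothetical.

```python
import numpy as np

def simple_exponential_smoothing(series, alpha):
    """Return the smoothed series s_t = alpha * x_t + (1 - alpha) * s_{t-1}."""
    x = np.asarray(series, dtype=float)
    s = np.empty_like(x)
    s[0] = x[0]  # assumption: initialize with the first observation
    for t in range(1, len(x)):
        s[t] = alpha * x[t] + (1 - alpha) * s[t - 1]
    return s

smoothed = simple_exponential_smoothing([10, 20, 30], alpha=0.5)  # [10., 15., 22.5]
one_step_forecast = smoothed[-1]  # the last smoothed value is the next forecast
```

With α = 1 the method reduces to the naive model, and with α close to 0 the forecast barely reacts to new observations.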

2.2.5 Holt-Winters exponential smoothing

Holt-Winters exponential smoothing is also called triple exponential smoothing. It is a smoothing method that is similar to exponential smoothing models, where the next time step is an exponentially weighted linear function of observations at prior time steps. It is a more advanced smoothing method, since it also takes trend and seasonality into account when making forecasts. Therefore, HWES is suitable for univariate time series with trend and also seasonal components [24], and often performs well.

2.3 Time series forecasting with neural networks

A neural network can be thought of as a network of neurons organized in layers, with weights as the network's parameters and activation functions applied to the neurons' outputs. Training drives the network to converge towards minimizing or maximizing an objective [30].

An artificial neural network (ANN) takes a data-driven approach, in which training depends on the available data. Furthermore, ANN models do not make any assumptions about the statistical distribution of the underlying time series, and are able to perform nonlinear modeling consistently [20].

The goal of any ANN is to optimize an algorithm towards an objective function. This optimization is the process of finding optimal values for parameters or function arguments that minimizes or maximizes that function [3].

ANNs are flexible, non-parametric methods that can learn nonlinear mappings from data. Like other machine learning methods, they can generalize over data. Generalization is the ability of a machine learning algorithm to perform well on new, previously unseen inputs. The generalization error, also called the test error, is the expected value of the error on a new input. It can be estimated by measuring the model's performance, by means of performance metrics, on a test set of examples that were collected separately from the training set. In this research the two performance metrics are the root mean squared error (RMSE) and the mean absolute error (MAE) [3].
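Both metrics follow directly from their definitions. The sketch below is a minimal NumPy version with made-up observed and forecast values, not results from this research.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error: square root of the mean squared difference."""
    err = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(err ** 2)))

def mae(y_true, y_pred):
    """Mean absolute error: mean of the absolute differences."""
    err = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(err)))

y_true = [100, 200, 300]   # made-up observed values
y_pred = [110, 190, 330]   # made-up forecasts
test_rmse = rmse(y_true, y_pred)
test_mae = mae(y_true, y_pred)
```

Because the errors are squared before averaging, RMSE penalizes a single large miss more heavily than MAE does, so the RMSE is never smaller than the MAE on the same forecasts.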

Neural networks are stochastic by nature. This means that, given the same model configuration and the same training set, each training run results in a different internal set of weights and therefore in a different performance [4].

Today, deep learning is centered around artificial neural networks, which can be defined as a nonlinear function from a set of input variables x to a set of output variables y, controlled by a vector w of adjustable parameters. These networks allow nonlinear relationships between the response variable and its predictors, and are able to overcome the challenges faced by linear statistical models [6, 20].

A typical neural network always contains an activation function, an optimization procedure, and a set of hyperparameters. Many different functions can serve as an activation function. With a suitable nonlinear activation, a neural network is able to approximate any continuous function that maps input values to output values [30]. The most commonly used activation functions are the Sigmoid, ReLU, Leaky ReLU, Tanh, and Softmax functions. During the optimization procedure, an optimization algorithm makes the network converge towards the best solution it can find, which minimizes or maximizes the objective. This can be considered as finding appropriate values of the parameters $\theta_1, \ldots, \theta_n$ [6, 31].

In any ANN, the parameters are the weights for each variable or feature in the model. In many cases they are determined by the backpropagation algorithm and the iterations of the optimizing function [32]. Hyperparameters are settings whose values are determined and manually modified from outside the algorithm itself, and that control the capacity of a model [3, 32].

2.3.1 ANN learning process

The learning process of an ANN consists of modeling past observations with the objective of estimating the underlying temporal relationships [25]. Any artificial neural network learns by a procedure called backpropagation. The backpropagation procedure makes the network learn and update its parameters after each training epoch. In detail, it evaluates the gradient of an error function by propagating the errors backwards through the network. The resulting derivatives are then used to compute new values for the neural network's weights. These adjustments aim to minimize the error function, and can lead to significant improvements in the optimization of the objective [5, 30, 33].

In deep learning solutions, when a model converges to a local minimum, that result is accepted, since the loss function is approximately minimized [3]. This characteristic makes any artificial neural network an approximator of its objective.

Most deep learning optimization algorithms are based on the stochastic gradient descent algorithm. SGD is an optimization algorithm that aims to maximize or minimize an objective function, in this research an error function, also called a loss function [3].

When the algorithm operates on a training set of examples, it usually follows the estimated gradient downhill towards a local or global minimum, that optimizes any objective function [3, 30].
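A minimal sketch of stochastic gradient descent is given below. For brevity it fits a linear model with a squared-error loss rather than a neural network, so no backpropagation through layers is involved. The learning rate, epoch count, and toy data are assumptions chosen for illustration.

```python
import numpy as np

def sgd_linear(X, y, lr=0.05, epochs=200, seed=0):
    """Fit y ~ X @ w + b by following the gradient of 0.5 * error**2 per sample."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):                  # one epoch = one pass over the data
        for i in rng.permutation(len(y)):    # visit samples in random order
            err = X[i] @ w + b - y[i]        # prediction error for this sample
            w -= lr * err * X[i]             # step against the gradient w.r.t. w
            b -= lr * err                    # step against the gradient w.r.t. b
    return w, b

# Toy data (assumption): a noise-free line y = 3x + 1.
X = np.linspace(-1, 1, 50).reshape(-1, 1)
y = 3.0 * X[:, 0] + 1.0
w, b = sgd_linear(X, y)
```

On this convex toy problem the procedure reaches the global minimum. For a neural network the same update rule is applied to every weight, with the gradients supplied by backpropagation, and only a local minimum is generally reached.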

In the predictions produced by a neural network there is always an element of randomness. Therefore the network is trained multiple times where each training cycle is called an epoch. An epoch can be defined as a pass through the training set, where a pass includes both a forward and a backward pass through the neural network. The number of epochs denotes how many passes, forward and backward, were required for the best training of the model. During one epoch the neural network with all the training data is trained for one cycle [22, 34]. After a fixed number of epochs, training of the network stops, and the average or best result of all epochs becomes the resulting output of the neural network [6].

In the time series forecasting domain different deep learning models can be applied. The most commonly used deep learning models in time series forecasting are: the multi-layer perceptron (MLP), the recurrent neural network (RNN), and the convolutional neural network (CNN) [31]. For this regression type problem, RNN and CNN networks are promising solutions [10].

2.3.2 Multilayer perceptron

A multilayer perceptron is a relatively simple artificial neural network that is used to approximate a mapping function from input variables to output variables. The network is more commonly known as a “feedforward neural network”. It can be applied as a deep learning model to time series forecasting problems, since the network is robust to noise in the input data. It does not make strong assumptions about the mapping function, and is capable of learning complex and high-dimensional mappings, and both linear and nonlinear relationships [35]. MLPs are memory-less and unidirectional, with neurons grouped in two or more layers [36], and use the feedforward neural network architecture with backpropagation [20]. The neural network aims to generalize over data samples, so that it can produce outputs for new samples beyond those seen by the model during training. It can therefore make accurate and valuable forecasts.

One key limitation is that the temporal dependence of an MLP has to be specified upfront during the design of the model [4].

2.3.3 Recurrent neural networks

Recurrent neural networks are a type of ANN with the following characteristics:

  • RNNs have typically been used in the modeling of sequences. They were developed for modeling data with a time dimension [17, 19], and are capable of modeling seasonal patterns [22].

  • RNNs can automatically learn the temporal dependence and correlations from data [2]. Considering this temporal dependence on past observations has proven to be a successful methodology for time series forecasting [26].

  • One observation at a time can be shown from a sequence, and the model can learn what observations it has seen previously that are relevant and how they are relevant to the forecasting [4].

In the RNN model architecture, both a mapping from inputs to outputs, and the context from the input sequence useful for the mapping are learned [4]. Each RNN cell contains an internal memory state that serves as a summary of past information, and it is repeatedly updated with new observations for every time step [19].

Despite their overall better performance, flexibility, and improved memory capabilities compared to the MLP, RNNs are computationally more expensive, and training takes significant computation time [10, 17]. Also, standard RNNs have difficulty learning long-term dependencies [26], and can make poor forecasts because of the vanishing gradient problem on longer sequences. This means that standard RNNs are not capable of carrying long-term dependencies [5, 22].

There are two specific variants of the recurrent neural network, namely the long short term memory (LSTM), and the gated recurrent unit (GRU) networks [22].

An LSTM network has special LSTM units that are composed of cells, each of which has an input gate, an output gate, and a forget gate [31]. The input gate and forget gate determine how much of the past information is retained in the current cell state of each LSTM cell, and also how much of the current information is propagated forward [22].

The model learns a function that maps a sequence of past observations as input to an output observation. It reads one time step of the sequence at a time and builds up an internal state representation that can be used as a learned context for making predictions [4]. It is a special RNN variant, since the model is able to learn long-term dependencies [8, 22] by replacing the hidden layers of an RNN with memory cells. Each cell in the LSTM network remembers the desired values over arbitrary time intervals [31]. Furthermore, it is able to overcome the most common limitation of standard RNNs, the vanishing gradient problem [2, 10, 19, 26].

Another RNN variant is the gated recurrent unit (GRU), which was introduced in 2014. It is an artificial neural network that uses two gates, an update gate and a reset gate. Each gate is a vector that decides what information should be passed to the output. The update gate decides how much of the last memory to keep. The reset gate defines how to combine the new input with the previous memory [8]. On average the GRU is not the most successful forecasting model, but it is less complicated to build and its computations are faster than those of the LSTM. It also often shows competitive performance compared to ARIMA and RNN models.

2.3.4 Convolutional neural network

A convolutional neural network (CNN) is a specialized kind of neural network for processing data that has a known grid-like topology. It differs from other neural networks in that it uses convolutions instead of general matrix multiplications in at least one layer. The input of a convolutional layer can be a matrix or a sequence. Typical CNNs also do not need medium or large datasets to perform excellently, require only a small set of parameters, and are able to make connections when data is sparse. Time series data can be considered a type of 1-D grid of samples taken at regular time intervals [3].

A convolutional neural network combines three architectural ideas: local receptive fields, shared weights, and spatial or temporal subsampling [35]. In a CNN, a sequence of observations can be treated like a one-dimensional (1-D) image, which the CNN can read and distill into its most pertinent elements. In a 1-D CNN, the network uses inputs within its local receptive field to make forecasts [19]. CNNs support both univariate and multivariate input data, and support efficient feature learning [4].

Layers in a typical CNN model are the convolutional layer, the hidden layer, a pooling layer, a flatten layer, and a dense layer [4]. The convolutional layer has the ability to extract useful knowledge. The pooling layer distills and subsamples the output of the convolutional layer to the most salient elements [10]. It thereby reduces the size of the convolved feature, in this research the input sequence. A flatten layer is placed between the convolutional and dense layers to reduce the feature maps to a single one-dimensional vector. The dense layer is a fully connected layer, similar to an MLP, which in the final stage of the CNN interprets the features extracted by the convolutional part of the model [3, 37]. Figure 1 illustrates the one-dimensional sequential CNN architecture, and how input data is transformed by the convolutional operations into output.

Figure 1.

1-D sequential CNN model architecture.

A convolutional neural network has a kernel, which can be considered a tiny window. The kernel slides over the input sequence or matrix and applies the convolution operation to each subregion, called a patch, that it meets across the input data. It functions as a filter that extracts features from any 1-D sequence or higher-dimensional image. This results in a convolved matrix, which is often more useful than the original features of the input data and often improves modeling performance [10].
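The sliding-window behavior of a kernel can be sketched for the 1-D case. The example assumes a stride of one and no padding, and it takes the dot product of the kernel with each patch without flipping the kernel, which is how deep learning libraries usually implement "convolution". The function name and the kernel values are made up for illustration.

```python
import numpy as np

def conv1d_valid(x, kernel):
    """Slide the kernel over x and take the dot product with every patch."""
    x = np.asarray(x, dtype=float)
    kernel = np.asarray(kernel, dtype=float)
    k = len(kernel)
    return np.array([x[i:i + k] @ kernel for i in range(len(x) - k + 1)])

x = [1, 2, 3, 4, 5]              # made-up input sequence
kernel = [1, 0, -1]              # made-up kernel of width 3
features = conv1d_valid(x, kernel)   # [-2., -2., -2.]
```

In a trained CNN the kernel values are learned parameters, and the same kernel is reused at every position, which is what the term shared weights refers to.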

In training a convolutional network, a forward pass runs through the entire network, from the initial layer to the final dense layer. The loss is then calculated, and the backward pass follows in the backpropagation procedure. This procedure computes local gradients for each CNN gate, ∂output/∂input for each input/output combination, also with the use of convolutions. As in the forward pass, one matrix slides over the other, which results in the computation of a local gradient. These local gradients are then combined using the chain rule, which propagates all the gradients back through the convolutional network [38]. Afterwards, the network's parameters in each layer are updated [37].

A CNN can be very effective at automatically extracting and learning features from one-dimensional sequence data, such as univariate time series data, and can directly output a multi-step vector [4]. Also, pooling operations in the neural network can significantly reduce the number of required network parameters and make the model more robust [10]. This can result in faster training and less overfitting on the training data [37].
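How pooling shrinks a feature map can be shown with a short sketch of 1-D max pooling. It assumes non-overlapping windows and drops a trailing incomplete window, which is one common convention. The function name and input values are hypothetical.

```python
import numpy as np

def max_pool1d(feature_map, pool_size=2):
    """Keep the largest value in each non-overlapping window of pool_size."""
    f = np.asarray(feature_map, dtype=float)
    n = len(f) // pool_size * pool_size   # drop a trailing incomplete window
    return f[:n].reshape(-1, pool_size).max(axis=1)

pooled = max_pool1d([1, 5, 2, 8, 3, 4, 7], pool_size=2)   # [5., 8., 4.]
```

The pooled output is roughly half as long as its input, so the dense layer that follows needs correspondingly fewer weights.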

2.4 Automatic machine learning

Automatic machine learning (AutoML) is a research area in the AI field that focuses on the automatic optimization of ML and CI hyperparameters, stages and pipelines [39]. This results in the further development of function methods that allows complex data preparation, feature extraction, and CI modeling in fewer lines of code without the need of building whole machine learning and data science frameworks from scratch [32].

It therefore becomes easier for novices in machine learning to build competitive models, and for machine learning experts to build complex models faster [29]. This is because AutoML bypasses two main barriers, structured programming and higher mathematics. Examples of AutoML applied in this research are hyperparameter tuning in the ARIMA model, and feature engineering in the pre-modeling phase.


3. Methodology

The objective of time series forecasting is the development of one or more mathematical or advanced deep learning models, that can explain the observed behavior of a time series, and possibly forecast future states of the series.

The actual time series research was subdivided into the following tasks:

  • Exploratory data analysis.

  • Data wrangling.

  • Analysis of the time series.

  • Model construction and implementation.

3.1 Exploratory data analysis

Exploratory data analysis can be considered as the set of techniques that try to maximize insight into the data, uncover the underlying structure of the data, and extract important variables and features [7]. For time series analysis, plotting the data is very useful for understanding the underlying data and its characteristics. When plotting time series data, there are always two variables: the time scale on the x-axis and the numerical variable on the y-axis. The most commonly used plots in time series data analysis are run sequence plots, lag plots, autocorrelation plots, partial autocorrelation plots, histograms, and box plots. These plots can help determine which models would be appropriate for the time series.

3.1.1 Run sequence plot

The run sequence plot is, in time series analysis, another name for the line plot. It shows the development of the coronavirus infections over time in line graph format. In this particular plot, the 7-day moving average is also included. Figure 2 shows the development of the number of COVID-19 cases over the period April 2020 to April 2021.

Figure 2.

Run sequence plot of the number of corona infections per day.
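A 7-day moving average like the one in the run sequence plot can be computed as sketched below. The sketch assumes a trailing window, so each day is averaged with the six days before it. Whether the plotted average is trailing or centered is not stated here, and the daily counts in the example are made up.

```python
import numpy as np

def trailing_moving_average(values, window=7):
    """Average each value with the (window - 1) values before it."""
    v = np.asarray(values, dtype=float)
    out = np.full(len(v), np.nan)            # undefined until a full window exists
    csum = np.cumsum(np.insert(v, 0, 0.0))   # running totals with a leading zero
    out[window - 1:] = (csum[window:] - csum[:-window]) / window
    return out

daily_cases = np.arange(1, 11)               # made-up counts 1, 2, ..., 10
ma7 = trailing_moving_average(daily_cases)
```

Averaging over exactly seven days smooths out the weekly reporting pattern, because every window contains each weekday once.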

3.1.2 Lag plot

A lag plot can check for randomness in data. If data is random, then the lag plot should not show any identifiable structure, such as linearity [7]. A lag plot is very efficient, since it quickly shows whether time series data is random or not. If the data is not random, then a random walk model for forecasting would not be appropriate.

3.1.3 Auto correlation and partial auto correlation plot

For any collection of time series samples, the autocorrelation is, besides trend and seasonality, the most important internal structure to analyze.

The auto correlation function (ACF) shows how similar the previous term and the current term are. In fact, autocorrelation shows the correlation coefficients between current and previous values [9].

Autocorrelation can be defined as the second-order moment $E(x_t x_{t+h}) = g(h)$, which is a function of only the time lag $h$ and is independent of the actual time index $t$.

It measures the degree of linear dependency between the time series at index t and the time series at indices t-h or t + h. A positive autocorrelation indicates that the present and future values of the time series move in the same direction. A negative autocorrelation indicates that present and future values move in opposite directions [40]. If the ACF is close to 1, the series is persistent, and an increase in the time series is often followed by another increase. When the ACF is negative and close to −1, an increase will probably be followed by a decrease, and vice versa [9].

The autocorrelation function is used to determine whether the time series data is stationary or non-stationary. The function can be plotted in a correlogram, called the ACF plot, which shows the autocorrelation of a time series by lag. It is used in the model identification stage for various Box-Jenkins (ARIMA) models. If the data is truly stationary, the ACF plot will drop to zero within a few lags, while the ACF plot of non-stationary data will converge to zero very slowly.
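The difference between a quickly and a slowly decaying ACF can be illustrated with a small sketch. The function `sample_acf` below is a hypothetical NumPy implementation of the usual sample autocorrelation, and the two series are synthetic stand-ins, not the COVID-19 data.

```python
import numpy as np


def sample_acf(series, max_lag):
    """Sample autocorrelation for lags 0..max_lag (hypothetical helper)."""
    y = np.asarray(series, dtype=float)
    y = y - y.mean()
    denom = np.sum(y ** 2)
    return np.array(
        [np.sum(y[: len(y) - h] * y[h:]) / denom for h in range(max_lag + 1)]
    )


rng = np.random.default_rng(1)
noise = rng.normal(size=500)            # stationary white noise
drifting = np.cumsum(0.5 + noise)       # random walk with drift, non-stationary

acf_noise = sample_acf(noise, 20)
acf_drift = sample_acf(drifting, 20)
print("white noise, lags 1-3:      ", np.round(acf_noise[1:4], 2))
print("drifting series, lags 1-3:  ", np.round(acf_drift[1:4], 2))
```

For the white noise series the coefficients stay close to zero from lag 1 onwards, whereas for the drifting series they remain high over many lags, which is the pattern that points to non-stationarity.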

A partial autocorrelation function (PACF) in time series analysis defines the correlation between $x_t$ and $x_{t+h}$ that is not accounted for by the intermediate lags $t+1$ through $t+h-1$. It measures the correlation between the time series and a lagged version of itself, after eliminating the variations already explained by the intervening comparisons [18].

The ACF and PACF plots can tell whether an autoregressive (AR) model, a moving average (MA) model, or both can be appropriate. If the PACF plot shows a few significant spikes at the first lags but not at later lags, while the ACF decays gradually, then it is recommended to model the time series data with an AR model, and use the AR component (p > 0) in the ARIMA model. If the ACF plot shows only a few significant spikes at the first lags, while the PACF plot shows significant spikes at both early and later lags, then it is recommended to use the MA model, and make use of the MA component (q > 0) in the ARIMA model. In cases where both charts show many significant spikes, it is recommended to model the data with an autoregressive moving average (ARMA) or autoregressive integrated moving average (ARIMA) model [40].

ACF and PACF plots are also useful for hyperparameter tuning of ARIMA models, since both plots can indicate upper and lower bounds for the p and q parameters in the grid search [40].

3.1.4 Other plots

Other plots that can be used to understand the time series data are common statistical plots such as the QQ-plot, histogram, and boxplot.

A QQ-plot helps to determine whether or not data points are normally distributed [9, 11]. A histogram represents the distribution of numerical data and shows the shape of the variable's distribution. The boxplot displays how data is distributed, based on the minimum value, first quartile, median, third quartile, and maximum value. The smaller the boxplot, the less variability in the data values [41]. These plots can easily be produced with Python's matplotlib and seaborn libraries.

3.2 Data wrangling

The data wrangling process applied in this research contains data pre-processing techniques such as data preparation, feature selection, feature engineering, and data aggregation.

Data pre-processing is an important but also time-consuming process in the field of data science, which has gained importance over the past decade. This is because most CI algorithms were not designed for time series data [42].

Also, most CI models that operate in real-world environments rely on high-quality data to improve modeling performance [43]. Thus, to run a model effectively and yield results, pre-processing of real-time and time series data is necessary.

In any large dataset, only the most relevant features are selected, and irrelevant information that has no influence on the desired output is removed, in a pre-processing step called feature subset selection. This leads to dimensionality reduction of the data and to a learning algorithm that operates faster and more effectively on simpler input data [41].

Feature engineering and data aggregation are also applied: the cumulative values from the original dataset are transformed into values per individual date by means of differencing, groupby mechanisms, and resampling to daily sums. This creates new features on a daily scale, which makes the COVID-19 data more suitable for analytical and modeling purposes [32, 39, 44].

These data transformation techniques can also reduce noise in the time series data and smooth the original time series [11].

For example, the run sequence plots show a 7-day moving average of the daily sums alongside the daily sums themselves, which has a smoothing effect on the data. This statistic is computed by sliding a window, seven days in this research, over the daily series and aggregating the data within each window [11].
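The transformation from cumulative counts to daily values and the 7-day moving average could look roughly as follows in pandas. This is a sketch on made-up numbers; the series name `cumulative_cases` and the values are assumptions, and the actual dataset layout is not shown in this chapter.

```python
import numpy as np
import pandas as pd

# Made-up daily counts, used only to build an illustrative cumulative series.
dates = pd.date_range("2020-04-01", periods=14, freq="D")
daily_true = np.array([10, 12, 9, 15, 20, 18, 25, 30, 28, 35, 40, 38, 45, 50])
cumulative = pd.Series(daily_true.cumsum(), index=dates, name="cumulative_cases")

# Differencing turns cumulative totals back into per-day values.
daily = cumulative.diff()
daily.iloc[0] = cumulative.iloc[0]   # the first day has no predecessor
daily = daily.astype(int)

# 7-day moving average: a window slides over the daily series and is averaged.
moving_avg = daily.rolling(window=7).mean()

print(pd.DataFrame({"daily": daily, "7-day average": moving_avg.round(2)}))
```

The first six entries of the moving average are undefined, because a full seven-day window is only available from the seventh day onwards.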

In the later modeling stage, the dataset is reduced to a smaller subset by a process called instance reduction [43]. For modeling simplicity, only months that contain data for every day of the month are considered. Since the dataset starts recording corona virus infections on 13 March 2020, March 2020 is not included. The input data for every model contains the number of corona virus infections from April 2020 up to and including April 2021.

3.3 Time series analysis

Time series data can be considered as a series of measurements, or a sequence of observations that are indexed in time order [45]. An important aspect of time series analysis is fitting models to historical data and using these models to predict future observations.

Not every chunk of data can be considered and treated in similar ways. Data is only considered as time series data when it has a datetime measurement, also called a time component. This makes the time series data different from other types of data, but also more difficult to interpret. Time series data adds specifically a time dimension, which means an explicit order dependence between observations. In these instances, each data point in a time series depends on previous data points from that same time series [13].

Time series analysis is about the use of statistical methods and machine learning algorithms to extract certain information and characteristics from data, in order to predict future values based on stored past time series data [23]. Time series analysis differs from other kinds of analysis in supervised machine learning. A time series cannot be treated as a standard linear regression problem, since the assumption that observations are independent does not hold [6]. In time series data, each value depends on previous values, which is called lag dependence. Therefore, time series analysis cannot be solved with simple linear regression.

Time series analysis is the phase that follows directly after the exploratory data analysis. In most time series plots, the x-axis holds the time variable, and the y-axis shows the value of a numerical variable at specific points in time or time intervals [46].

Time series data can be univariate or multivariate. Univariate time series data are datasets containing a single series of observations with a temporal ordering and a model is required to learn a function from the series of past observations to predict one or more new output values. In multivariate time series data there is more than one variable observed at each time step [4, 15].

3.3.1 Stationarity

One very important requirement of many classical ML models is that the time series data is stationary before any forecasts are made. Time series data that are “stationary” have values that fluctuate around a constant mean with a constant variance that does not change over time [11, 28]. This means that a shift in time, for example taking the time series of a later year or month, does not change the shape of the distribution when the data is stationary [9]. Also, stationary time series have no predictable pattern in the long term [23].

Non-stationarity is in many cases caused by fluctuations in trend or seasonality [6, 7]. When the data is non-stationary, it can be made stationary with differencing, or with a method called time series decomposition [11, 15].

3.3.2 Differencing

A non-stationary time series can often be made stationary with first-order or higher-order differencing [20]. Differencing subtracts the previous observation from the current observation, and thereby removes trend and seasonality structure from the time series data [18, 23].

The differencing approach is used as one of the main parameters in the Box-Jenkins ARIMA model [45]. First-order differencing ($d = 1$) is mathematically formulated as $x'_t = x_t - x_{t-1}$, and second-order differencing ($d = 2$) as $x''_t = x_t - 2x_{t-1} + x_{t-2}$ [11].

In this research, first-order differencing is applied, which can easily be computed with the built-in .diff() function in Python.
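A short sketch of first-order and second-order differencing with the pandas `.diff()` method is given below. The toy series is an assumption chosen so that the results are easy to check by hand; second-order differencing is simply the difference of the first-order differences, which equals x(t) - 2x(t-1) + x(t-2).

```python
import pandas as pd

x = pd.Series([3.0, 5.0, 9.0, 15.0, 23.0, 33.0])   # toy series, not real data

first_diff = x.diff()                # x_t - x_{t-1}
second_diff = x.diff().diff()        # difference of the first differences

# The same second-order result written out term by term.
manual_second = x - 2 * x.shift(1) + x.shift(2)

print(first_diff.dropna().tolist())
print(second_diff.dropna().tolist())
```

Each order of differencing drops one observation at the start of the series, which is why the leading values are undefined.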

3.3.3 Time series decomposition

Data can often consist of multiple patterns, and can show linear or cyclic behavior. Therefore data splitting into multiple components can be very beneficial to improve understanding, and to discover any irregularities or white noise. The process of splitting or dividing time series data into multiple components is called time series decomposition [18].

Any time series can be decomposed into trend, seasonal, and irregular components. The trend component is the pattern and behavior of the data in the long term. It describes the growth of the variable over a certain period of time. The seasonal component is a particular pattern in the time series data that is repeated at specific time periods. Irregular components can be considered as data that is far off-trend. They contain abnormal values, sometimes called residuals or outliers, and are also referred to as “white noise”. Time series data can also contain cyclical components. These can be considered as movements observed after every few units of time, but they occur less frequently than seasonal fluctuations [11].

The objective of time series decomposition is to model the long-term trend and seasonality, and to estimate the overall time series as a combination of them [11]. A time series decomposition model can be additive or multiplicative. When the time series data appears to have any sort of changing seasonality pattern, the multiplicative decomposition model is recommended, in other cases the additive model is endorsed [7].
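An additive decomposition can be sketched by hand with pandas, as shown below. This is an illustration under stated assumptions: the series is synthetic (a linear trend plus a weekly pattern), the seasonal period of 7 days is assumed, and the classical moving-average approach used here is not necessarily the procedure used in this research.

```python
import numpy as np
import pandas as pd

period = 7                                   # assumed weekly seasonality
n = 70
t = np.arange(n)
weekly = np.array([5.0, 3.0, 0.0, -2.0, -4.0, -3.0, 1.0])   # sums to zero
series = pd.Series(
    100 + 2.0 * t + weekly[t % period],
    index=pd.date_range("2020-04-01", periods=n, freq="D"),
)

# Trend: centered moving average over one full seasonal period.
trend = series.rolling(window=period, center=True).mean()

# Seasonal component: average detrended value per position in the week.
detrended = series - trend
seasonal_means = detrended.groupby(np.arange(n) % period).mean()
seasonal_means = seasonal_means - seasonal_means.mean()
seasonal = pd.Series(seasonal_means.values[t % period], index=series.index)

# Irregular component: what remains after removing trend and seasonality.
residual = series - trend - seasonal
print(seasonal_means.round(2).tolist())
```

Because the synthetic series contains no noise, the recovered seasonal pattern matches the weekly pattern that was put in, and the residual is zero wherever the trend is defined.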

3.3.4 Augmented Dickey-Fuller test

The augmented Dickey-Fuller (ADF) test is a statistical unit root test that is used to determine the stationarity of a time series and the magnitude of the trend component [8, 11]. It is a hypothesis test whose null hypothesis H0 states that the series has a unit root, i.e. is non-stationary. A time series can be considered stationary (H0 is rejected) with 95% confidence when the ADF p-value is less than 0.05 [18].

The ADF unit root test should first be applied to the original, undifferenced time series to check whether the data is already stationary. If not, then the data should be made stationary with first-order or higher-order differencing [11, 15, 20].

3.3.5 Smoothing with moving averaging

Smoothing can be defined as the removal of noise from data, and can be applied in both regression and clustering problems [44]. In time series data, smoothing can be applied as a rolling moving average over a number of time steps [18]. In this case, a one-week (7-day) or one-month (30-day) moving average can be computed. The moving average changes for each day, since the method uses a sliding window that averages over a different set of days each time [21].

Smoothing is also applied as a modeling technique that assigns weights to observations, where the most recent observations receive more weight than observations further in the past. Examples of these techniques are single exponential smoothing and triple (Holt-Winters) exponential smoothing.
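The weighting idea behind single exponential smoothing can be made concrete with a few lines of NumPy. The function below is a hypothetical, hand-written version of the standard recursion s(t) = α·y(t) + (1 − α)·s(t−1); it is a sketch for illustration and not the library implementation used in this research.

```python
import numpy as np


def simple_exponential_smoothing(series, alpha):
    """Single exponential smoothing with smoothing factor alpha in (0, 1]."""
    y = np.asarray(series, dtype=float)
    smoothed = np.empty_like(y)
    smoothed[0] = y[0]                      # initialize with the first observation
    for t in range(1, len(y)):
        # Recent observations get weight alpha, the smoothed past gets 1 - alpha.
        smoothed[t] = alpha * y[t] + (1 - alpha) * smoothed[t - 1]
    return smoothed


y = [10.0, 20.0, 30.0]                      # toy values
slow = simple_exponential_smoothing(y, alpha=0.1)   # reacts slowly to new data
fast = simple_exponential_smoothing(y, alpha=0.9)   # follows the data closely
half = simple_exponential_smoothing(y, alpha=0.5)
print(slow, fast, half)
```

A smoothing factor close to 1 makes the smoothed series track the latest observations, while a factor close to 0 gives more weight to the past.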

3.4 Model construction and implementation

The final objective of time series analysis is the development of one or more mathematical or advanced deep learning models, that can explain the observed behavior of a time series, and possibly forecast future states of the series.

The construction and implementation of multiple time series forecasting models can be divided into the following parts:

  • Data preprocessing.

  • Data plotting.

  • Model building.

  • Model evaluation.

  • Model improvement.

One of the first steps in performing time series research is to determine what data will be used for training a computational intelligence model, and what data will be used to test the performance of that model. The input sequence is divided into a training set and a test set. The sequence contains thirteen months of data, measured and aggregated to a total of 395 observations, where each observation is one day. Of those 395 days, 306 days are taken as the training set, the remaining 89 days are the test set. For the total time series data, around 77.5% is used as training data, and approximately 22.5% as testing data.
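The split sizes can be reproduced with a short pandas sketch. The values are placeholders, and the exact split date is an inference: with daily data from 1 April 2020 to 30 April 2021 and a chronological split after 306 days, the test set starts on 1 February 2021.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2020-04-01", "2021-04-30", freq="D")
series = pd.Series(np.zeros(len(dates)), index=dates)   # placeholder values

n_train = 306
train = series.iloc[:n_train]      # first 306 days, in time order
test = series.iloc[n_train:]       # remaining days, never shuffled

train_share = round(len(train) / len(series) * 100, 1)
test_share = round(len(test) / len(series) * 100, 1)
print(len(series), len(train), len(test), train_share, test_share)
print(train.index[-1].date(), test.index[0].date())
```

The split is chronological rather than random, so that the model is always evaluated on observations that lie after the training period.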

3.4.1 Data pre-processing and data plotting

Before data is used as input for a CI model, it is pre-processed first. For the dataset, only the most relevant features are selected in a process called feature selection. Secondly, all relevant data is transformed into a shape that is useful for modeling by means of data aggregation, which includes rescaling and resampling the time series data.

Before data is fed to a CI model, the data is plotted to recognize its underlying structure and to determine which models are suitable for the time series data. In the classical time series models, the run sequence plot is plotted with a seven-day moving average applied as a method for smoothing the data. The ARIMA model uses multiple data plots in its pre-modeling phase, including the run sequence plot, lag plot, QQ plot, ACF plot, and PACF plot. Before constructing the CNN, it is very beneficial to have sufficient understanding of the underlying data patterns. Therefore, before building the CNN model, the data was plotted and interpreted with run sequence, ACF, and first-order difference plots.

3.4.2 Model building

Since the input time series data is univariate, and contains one column, it is best practice to extract the whole time series and store it in a variable [9].

The baseline model that is constructed in this research, simply calculates the average of all the training examples, and takes that average as its forecast.

Any AR model can be defined by an ARIMA(x, 0, 0), where x is the autoregressive parameter, a positive integer ranging between 1 and 5. The MA model is defined by an ARIMA(0, 0, x), where x is the moving average parameter, which is also a positive integer that in many cases ranges between 1 and 5. In the ARIMA model, all three parameters of ARIMA(p, d, q) need to be defined, including the differencing parameter d, because the data is non-stationary. In both exponential smoothing models, the smoothing factor needs to be set manually. A simple search strategy is to run a smoothing model twice, with smoothing factors of 0.1 and 0.9, and compare the performance metrics. It can then be determined whether the smoothing factor should have a high value close to 1 or a low value close to 0.

In the CNN model, the training and testing data can be split with the split_sequence function. After reshaping the input data, all CNN operations transform the sequence into a 1-D output vector. The model is fitted, and the best predictions are made after two thousand training epochs.
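The chapter does not list the split_sequence function itself. The sketch below is a hypothetical version of such a helper for a univariate multi-step problem: it slides over the series and pairs each window of `n_steps_in` inputs with the following `n_steps_out` outputs. The parameter names, window sizes, and toy sequence are assumptions.

```python
import numpy as np


def split_sequence(sequence, n_steps_in, n_steps_out):
    """Turn a 1-D sequence into (input window, output vector) samples."""
    X, y = [], []
    for start in range(len(sequence) - n_steps_in - n_steps_out + 1):
        end_in = start + n_steps_in
        end_out = end_in + n_steps_out
        X.append(sequence[start:end_in])     # past values fed to the model
        y.append(sequence[end_in:end_out])   # future values to be forecast
    return np.array(X), np.array(y)


sequence = np.arange(10, 100, 10)            # toy series: 10, 20, ..., 90
X, y = split_sequence(sequence, n_steps_in=3, n_steps_out=2)

# A 1-D convolutional layer expects input shaped (samples, time steps, features).
X = X.reshape((X.shape[0], X.shape[1], 1))
print(X.shape, y.shape)
print(X[0].ravel(), y[0])
```

With three input steps and two output steps, the first sample pairs the inputs 10, 20, 30 with the outputs 40, 50, and the vector output of the network has one entry per forecast step.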

When running and testing any model, it is run against the testing set to predict data it has not seen before.

3.4.3 Model evaluation

To evaluate a model's performance, some intuition about its characteristics is required first, before making any judgments. This can be gained by summarizing or describing its outcomes.

The baseline model is summarized with the .describe() function. It is a function that gives the count, mean, standard deviation, and all the boxplot values [11].

All other models are summarized with the .summary() function.

The essential aspect of model evaluation is determining how good its forecasting results are. Performance measures such as the (root) mean squared error are a way of measuring the performance of a model [1, 3]. The following two performance measures determine the effectiveness of each model:

  • Root mean squared error (RMSE). Measures the square root of the average of the squared forecast errors [9]. As a result, large forecasting errors significantly increase the RMSE.

  • Mean absolute error (MAE). The average of the absolute forecast errors, which forces all error values to be positive [47].
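Both measures can be written out in a few lines of NumPy. The sketch below applies them to a simple average forecast on toy numbers; the function names `rmse` and `mae` and the values are assumptions for illustration, not the evaluation code of this research.

```python
import numpy as np


def rmse(actual, forecast):
    """Root mean squared error: square root of the mean squared error."""
    actual, forecast = np.asarray(actual, float), np.asarray(forecast, float)
    return float(np.sqrt(np.mean((actual - forecast) ** 2)))


def mae(actual, forecast):
    """Mean absolute error: mean of the absolute errors."""
    actual, forecast = np.asarray(actual, float), np.asarray(forecast, float)
    return float(np.mean(np.abs(actual - forecast)))


train = np.array([100.0, 120.0, 140.0, 160.0])   # toy training values
test = np.array([150.0, 170.0, 110.0])           # toy test values

# Simple average baseline: every forecast equals the training mean.
baseline_forecast = np.full_like(test, train.mean())

baseline_rmse = rmse(test, baseline_forecast)
baseline_mae = mae(test, baseline_forecast)
print(f"RMSE = {baseline_rmse:.2f}, MAE = {baseline_mae:.2f}")
```

The RMSE is never smaller than the MAE on the same errors, and the gap between the two grows when a few forecasts are far off, which is why both are reported.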

3.4.4 Model improvement

When a model performs less accurately than expected, it can be improved. The model improvement step is, however, not mandatory, and is only needed when a model performs more poorly than originally intended.

In any AR, MA, ARMA, and ARIMA model, performance can be improved with hyperparameter tuning. This is considered important in CI, since it evaluates the model on different configurations in order to find the set of hyperparameters that yields the best predictive performance [39].

Grid search is a search method for hyperparameter tuning. It is a brute-force, semi-automatic search method that explores all possible model configurations within a user-specified parameter range, and is considered an exhaustive search that often has a relatively long runtime [22, 39].

One important metric that is used in tuning hyperparameters, is the Akaike Information Criterion (AIC). It measures the relative quality of the model being considered for the description of the phenomenon, and is proven to be fast and efficient. Its value shows an estimate of the information lost, when a specific order of the model is being considered. The smaller the value of the AIC, the less information is lost, and the more accurate the model is considered to be [9].
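For reference, the standard definition of the AIC, which is not spelled out in the text, can be written as follows. The symbols $k$ and $\hat{L}$ are introduced here for illustration: $k$ is the number of estimated model parameters and $\hat{L}$ is the maximized value of the likelihood function of the model.

```latex
\mathrm{AIC} = 2k - 2\ln(\hat{L})
```

For an ARIMA(p, d, q) model, $k$ grows with the orders p and q, so the criterion rewards a good fit through the likelihood term while penalizing larger orders through the $2k$ term.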


4. Used models

4.1 Baseline model

The research started with the setup of a simple and consistent baseline model. A baseline model is kept simple on purpose and sets the reference against which all other models are compared. It also helps to understand the data better, and can indicate whether more data preparation or feature engineering is necessary [48]. If a more complex model performs worse than the baseline model, it can be considered a poor model for forecasting the specific dataset.

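The simple average model discussed in the previous section serves as the baseline model in this research. A simple average can be implemented with the formula [21]:

$$\hat{y}_{t+h} = \frac{1}{n}\sum_{i=1}^{n} y_i$$

where $y_1, \dots, y_n$ are the training observations and $\hat{y}_{t+h}$ is the forecast for any future time step.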

Most baseline models serve as a benchmark in forecasting research for comparing new methods to this simple method [6].

Since pandemic infections have the tendency to suddenly increase, but also to decrease very quickly in some cases, a naive or persistence model, that predicts the last seen value, would not be a model to consider as a baseline model.

4.2 Classical machine learning models

The classical machine learning, or statistical models that are implemented assume that observations are continuous, time is discrete and equally spaced, and that there are no missing observations [28].

The classical machine learning methods that are implemented are the moving average and autoregressive models, the simple exponential smoothing model, and the triple (Holt-Winters) exponential smoothing model. Special focus in the research is spent on the ARIMA model, since ARIMA is a very popular and quite accurate model in time series forecasting.

4.3 CNN model

In this research, one promising neural network is implemented and tested on the COVID-19 data: the convolutional neural network (CNN). The CNN that is implemented treats the input data as a sequence over which convolutional read operations can be performed, in a similar fashion to one-dimensional images [49].

The CNN model is a univariate multi-step one-dimensional vector output forecasting model. Univariate means that one feature is forecast, and multi-step means that the output is a sequence of multiple values.

The CNN architecture includes an activation function and an optimization algorithm. The activation function used in the convolutional neural network is the ReLU activation function. It helps to mitigate the problem of exploding and vanishing gradients, which occurs in typical ANNs like the MLP and RNN [4]. It can be defined as ReLU = R(z) = max(0, z) [31, 32]. Negative and zero inputs produce zero output, and positive inputs are passed through unchanged. The ReLU activation function therefore consistently filters out negative numbers [50].
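The definition R(z) = max(0, z) translates directly into code. The following NumPy sketch is for illustration only; in a deep learning library the activation is normally selected by name rather than written by hand.

```python
import numpy as np


def relu(z):
    """ReLU activation: element-wise max(0, z)."""
    return np.maximum(0, z)


z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])   # toy pre-activation values
activated = relu(z)
print(activated)
```

Negative and zero inputs are mapped to zero, while positive inputs pass through unchanged.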

The optimization function used in the CNN model is the Adam algorithm, a modified version of the SGD algorithm with an adaptive learning rate. It uses running averages of both the gradients and the second moments of the gradients [31]. It has a built-in TensorFlow implementation and requires a learning rate parameter to operate [22]. The learning rate, in many cases denoted as α, indicates the pace at which the weights in the neural network are updated [32]. Adam is currently the most commonly used optimization algorithm in artificial neural networks [3, 51].
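How the running averages enter the weight update can be seen in a minimal single-parameter sketch of the Adam update rule. The function name `adam_minimize`, the toy objective f(x) = x², and the hyperparameter values (learning rate 0.1, β1 = 0.9, β2 = 0.999, ε = 1e-8) are assumptions for illustration; the settings used for the CNN in this research are not stated here.

```python
import numpy as np


def adam_minimize(grad_fn, x0, learning_rate=0.1, beta1=0.9, beta2=0.999,
                  eps=1e-8, n_steps=500):
    """Minimize a 1-D function with the Adam update rule (illustrative sketch)."""
    x, m, v = float(x0), 0.0, 0.0
    for t in range(1, n_steps + 1):
        g = grad_fn(x)
        m = beta1 * m + (1 - beta1) * g          # running average of gradients
        v = beta2 * v + (1 - beta2) * g ** 2     # running average of squared gradients
        m_hat = m / (1 - beta1 ** t)             # bias correction for early steps
        v_hat = v / (1 - beta2 ** t)
        x -= learning_rate * m_hat / (np.sqrt(v_hat) + eps)
    return x


# Toy objective f(x) = x^2 with gradient 2x and its minimum at x = 0.
x_min = adam_minimize(lambda x: 2 * x, x0=5.0)
print(f"x after optimization: {x_min:.6f}")
```

The step size is scaled per parameter by the square root of the second-moment estimate, which is what makes the effective learning rate adaptive.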

4.4 Python packages

The classical statistical models in this research rely on a few well-known and frequently applied Python packages, such as Pandas, NumPy, and Matplotlib. The Seaborn library is used for some data visualizations. For time series analysis specifically, the Python library Statsmodels is often used. It is a Python package that includes basic tools and models for time series analysis and modeling, and is specifically built for time series data [43]. It also provides all functionality required to build ARIMA and exponential smoothing models [26].

Another package that is applied is pmdarima, which is used in the hyperparameter tuning of the ARIMA model [14, 52].

Scikit-learn (sklearn), a well-known Python library for machine learning, is used to compute the performance metrics of all trained CI models [43].

In the modeling phase of the 1-D sequential CNN model, the Keras library performs all CNN operations. Keras is a Python library that is used extensively in deep learning.


5. Research results

In the time series analysis results, the lag plot displayed in Figure 3 indicates that the daily COVID-19 case data is non-random. Since the plot clearly shows a linear structure between y(t) and its lagged value y(t + 1), the data is suitable for time series forecasting.

Figure 3.

Lag plot of the number of corona infections per day.

The first-order differences displayed in Figure 4 show the daily changes, and clearly indicate upward and downward trends in the data from October 2020 up to and including April 2021. Figure 5 shows the autocorrelation function of the data for the first 400 lags. The ACF graph moves slowly towards zero, indicating that the data is non-stationary. Therefore, differencing and applying an ARIMA model with the differencing parameter set to a value of at least 1 is strongly advised.

Figure 4.

First-order differences of the number of Covid-19 infections.

Figure 5.

ACF plot of the number of Covid-19 infections.

In the modeling phase, all models were constructed and implemented, and produced their forecasts, in Python 3 code in the Jupyter notebook environment. Each forecast made by each of the seven models was measured against the test set, resulting in a root mean squared error (RMSE) and a mean absolute error (MAE) score. Table 1 below shows the RMSE and MAE performance of all implemented models.

Model name                                          RMSE score   MAE score
Simple Average (SA) – Baseline model                3222.91      2733.16
Autoregressive (AR)                                 2913.90      2369.61
Moving average (MA)                                 3224.08      2736.03
Autoregressive integrated moving average (ARIMA)    2172.88      1721.42
Exponential smoothing (ES)                          1817.11      1567.40
Holt-Winters ES (HWES)                              1611.34      1379.34
1-D sequential CNN                                  409.86       315.99

Table 1.

Performance of all seven implemented models.

The results displayed in Table 1 show that five out of six models were able to make better forecasts than the baseline model. The AR, ES, HWES, ARIMA, and CNN models made forecasts that were all slightly or significantly better than the forecast made by the simple average model. Only the moving average (MA) model resulted in RMSE and MAE scores that were almost identical to those of the baseline model.

The research results show that the 1-D sequential CNN with vector output is the best performing model on the test data. Figure 6 displays the forecasts made by the CNN model, which clearly show significant alignment with the test data. The model's relatively good results could indicate that convolutional operations perform well on 1-D sequence data and are able to make better forecasts than traditional machine learning models. The forecasts of the 1-D sequential CNN were approximately five times more accurate than those of the ARIMA model. The CNN model has an RMSE of 409.86 and an MAE of 315.99, compared to an RMSE of 2172.88 and an MAE of 1721.42 for the ARIMA model. The initial goal of having a CNN model that outperforms the prominent and accurate ARIMA model has clearly been achieved.

Figure 6.

CNN forecasting performance (green) and actual infections (blue and orange) on the number of corona infections per day, in time interval February – April 2021.

Another favorable model is the Holt-Winters exponential smoothing (HWES) model, with an RMSE of 1611.34 and an MAE of 1379.34. It performed approximately 25 percent better on the RMSE metric, and around 20 percent better on the MAE metric, than the ARIMA model. The single exponential smoothing (ES) model also outperformed the ARIMA model, by about 16 percent on RMSE and 9 percent on MAE.
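The relative comparisons against ARIMA can be recomputed directly from the reported scores. The short sketch below uses only the RMSE and MAE values stated in this section; the dictionary layout and the helper name `improvement` are assumptions for illustration.

```python
# Reported RMSE and MAE scores for the models compared against ARIMA.
scores = {
    "ARIMA": {"rmse": 2172.88, "mae": 1721.42},
    "ES": {"rmse": 1817.11, "mae": 1567.40},
    "HWES": {"rmse": 1611.34, "mae": 1379.34},
    "CNN": {"rmse": 409.86, "mae": 315.99},
}


def improvement(model, metric):
    """Relative error reduction of a model compared to ARIMA (0.25 = 25 percent)."""
    return 1 - scores[model][metric] / scores["ARIMA"][metric]


# How many times smaller the CNN error is than the ARIMA error.
cnn_ratio = scores["ARIMA"]["rmse"] / scores["CNN"]["rmse"]
print(f"ARIMA RMSE / CNN RMSE = {cnn_ratio:.1f}")

for model in ("HWES", "ES"):
    for metric in ("rmse", "mae"):
        print(f"{model} {metric}: {improvement(model, metric) * 100:.0f}% lower than ARIMA")
```

Expressing the comparison as an error ratio for the CNN and as a percentage reduction for the smoothing models keeps both statements tied to the same ARIMA reference scores.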


6. Conclusion

Many sources claim that classical statistical methods, like ARIMA and exponential smoothing, achieve better performance than standard deep learning models, like MLPs and RNNs, on smaller datasets.

However, the one-dimensional CNN model with vector output made forecasts that were five times more accurate than the forecasts made by the ARIMA model, on a smaller dataset containing fewer than one thousand observations. These findings support the view that neural networks are resistant to errors and some outliers in the underlying dataset, which makes them useful in the analysis and prediction of larger and sometimes even smaller time series datasets. Also, classical machine learning methods like ARIMA and exponential smoothing can fail to identify and capture nonlinear and complex behavior of time series. Thus, in the case of pandemics, where the data does not show a clear trend and data patterns are relatively hard to extract, neural networks can be a solution for this complexity.

Nevertheless, predicting the future with accurate forecasts is still a very difficult, or even impossible, task. This is because of the presence of confounding variables, for example human decision-making processes, that cannot be modeled upfront in any of the models. However, the CNN model has proven its potential in forecasting new corona virus infections. When dealing with viruses that behave similarly to COVID-19, artificial neural networks can, in some cases, simulate future values surprisingly well and close to the actual values.



ACF      Auto Correlation function
ADF      Augmented Dickey Fuller
AI       artificial intelligence
AIC      Akaike Information Criterion
ANN      artificial neural network
ARIMA    AutoRegressive Integrated Moving Average
ARMA     AutoRegressive Moving Average
AutoML   Automatic Machine Learning
CI       computational intelligence
CNN      Convolutional Neural Network
DBN      deep belief network
DRL      deep reinforcement learning
EDA      exploratory data analysis
ES       Exponential smoothing
GRU      Gated Recurrent Unit
HWES     Holt-Winters exponential smoothing
LSTM     Long Short Term Memory
MA       Moving Average
MAE      Mean absolute error
ML       machine learning
MLP      Multi-Layer Perceptron
PACF     Partial Autocorrelation function
RL       Reinforcement learning
RMSE     Root mean squared error
RNN      Recurrent Neural Network
SGA      Stochastic Gradient Ascent
SGD      Stochastic Gradient Descent


  1. 1. Papastefanopoulos V, Linardatos P, Kotsiantis S. Covid-19: A Comparison of Time Series Methods to Forecast Percentage of Active Cases per Population. Department of Mathematics, University of Patras, Greece. Multidisciplinary Digital Publishing Institute; 2020. DOI: 10.3390/app10113880
  2. 2. Shastri S, Singh K, Kumar S, Kour P, Mansortra V. Time Series Forecasting of Covid-19 using Deep Learning Models: India-USA Comparitive Case Study. Department of Computer Science & IT, University of Jammu, Jammu & Kashmir, India: Elsevier; 2020. DOI: 10.1016/j.chaos.2020.110227
  3. 3. Goodfellow I, Bengio Y, Courville A. Deep Learning. Cambridge, Massachussetts, USA: MIT Press; 2016. ISBN: 9780262035613. url:
  4. 4. Brownlee J. Deep Learning for Time Series Forecasting: Predict the Future with MLPs, CNNs and LSTMs in Python. Calle de San Francisco, San Juan, Puerto Rico: Machine Learning Mastery; 2018
  5. 5. Gamboa J. Deep Learning for Time Series Analysis. Germany; University of Kaiserslautern; 2017. arXiv:1701.01887
  6. 6. Hyndman R, Athanasopoulos G. Forecasting: Principles and Practice. OTexts, Melbourne, Australia: Monash University; 2018. ISBN: 978-0-9875071-1-2
  7. 7. NIST/SEMATECH: Engineering statistics handbook. National Institute of Standards and Technology. U.S. Department of Commerce; 2013. DOI: 10.18434/M32189
  8. 8. Yamak P, Yujian L, Gadosey P. A Comparison between Arima, Lstm, and Gru for Time Series Forecasting. In: Proceedings of 2019 2nd International Conference of Algorithms; 2019. DOI: 10.1145/3377713.3377722
  9. 9. Mather B. Time Series with Python: How to Implement Time Series Analysis and Forecasting Using Python, Massachussetts, USA: Kindle Store, Amazon; 2020. ISBN: 978-0-6487830-7-7
  10. 10. Livieris I, Pintelas E, Pintelas P. A CNN--LSTM Model for Gold Price Time-Series Forecasting. Department of Mathematics, University of Patras, Greece: Springer; 2020. DOI: 10.1007/s00521-020-04867
  11. 11. Pal A, Prakash P. Practical Time Series Analysis: Master Time Series Data Processing, Visualization, and Modeling Using Python. Livery Place, Birmingham, UK: Packt Publishing Ltd,; 2017. ISBN: 978-1-78829-022-7
  12. 12. Döring M. Data science blog. 2020. Available from:\newlinepost/machine-learning/forecasting\_vs\_prediction/. [Accessed: April 8, 2021]
  13. Brownlee J: What Is Time Series Forecasting? Machine Learning Mastery Pty Ltd, Calle de San Francisco, San Juan, Puerto Rico; 2016. Available from: [Accessed: March 17, 2021]
  14. Faloutsos C, Gasthaus J, Januschowski T, Wang Y. Forecasting Big Time Series: Old and New. VLDB Endowment. Seattle, Washington, USA: Amazon AI Labs; 2018. DOI: 10.14778/3229863.3229878
  15. Pulagam S: Time Series Forecasting Using Auto ARIMA in Python. San Francisco HQ, San Francisco, California, USA: Towards Data Science. Medium Corporation; 2020. Available from: -arima-in-python-bb83e49210cd. [Accessed: May 8, 2021]
  16. Shi Q, Yin J, Cichocki A, Yokota T, Chen L, Yuan M, Zeng J. Block Hankel Tensor ARIMA for Multiple Short Time Series Forecasting. New York: 34th AAAI Conference on Artificial Intelligence; 2020. DOI: 10.1609/aaai.v34i04.6032
  17. Petnehazi G. Recurrent Neural Networks for Time Series Forecasting. Doctoral School of Mathematical and Computational Sciences, Hungary: University of Debrecen; 2019. arXiv:1901.00069v1
  18. Jain A: A Comprehensive Beginner's Guide to Create a Time Series Forecast. Udyog Vihar, Gurugram, India: Analytics Vidhya; 2016. Available from: [Accessed: April 8, 2021]
  19. Lim B, Zohren S. Time Series Forecasting with Deep Learning: A Survey. Oxford, UK: Philosophical Transactions of the Royal Society A; 2020. arXiv:2004.13408
  20. Kaushik S, Choudhury A, Sheron P, Dasgupta N, Natarajan S, Pickett L, et al. AI in Healthcare: Time-Series Forecasting Using Statistical, Neural, and Ensemble Architectures. Lausanne, Switzerland: Frontiers in Big Data; 2020. DOI: 10.3389/fdata.2020.00004
  21. Singh A: 7 Methods to Perform Time Series Forecasting. Udyog Vihar, Gurugram, India: Analytics Vidhya; 2018. Available from: [Accessed: March 5, 2021]
  22. Hewamalage H, Bergmeir C, Bandara K. Recurrent Neural Networks for Time Series Forecasting: Current Status and Future Directions. Monash University, Australia: Elsevier; 2021. DOI: 10.1016/j.ijforecast.2020.06.008
  23. Prem: Top 5 Common Time Series Forecasting Algorithms. Iunera GmbH & Co KG; 2021. Available from: -algorithms. [Accessed: May 27, 2021]
  24. Brownlee J: 11 Classical Time Series Forecasting Methods in Python (Cheat Sheet). Calle de San Francisco, San Juan, Puerto Rico: Machine Learning Mastery Pty Ltd; 2018. Available from: python-cheat-sheet. [Accessed: May 8, 2021]
  25. Domingos S, de Oliveira J, de Mattos NF, Paulo S. An Intelligent Hybridization of ARIMA with Machine Learning Models for Time Series Forecasting. Universidade de Pernambuco, Brazil: Elsevier; 2019. DOI: 10.1016/j.knosys.2019.03.011
  26. Masum S, Liu Y, Chiverton J. Multi-step Time Series Forecasting of Electric Load Using Machine Learning Models. University of Portsmouth, UK: Springer; 2018. DOI: 10.1007/978-3-319-91253-0_15
  27. Brownlee J. How to Create an ARIMA Model for Time Series Forecasting in Python. Calle de San Francisco, San Juan, Puerto Rico: Machine Learning Mastery Pty Ltd; 2017. Available from:
  28. McKinney W, Perktold J, Seabold S. Time Series Analysis in Python with Statsmodels. Duke University, Durham, North Carolina, USA: Jarrodmillman Com; 2011
  29. Moews B, Herrmann J, Ibikunle G. Lagged Correlation-Based Deep Learning for Directional Trend Change Prediction in Financial Time Series. University of Edinburgh, UK: Elsevier; 2019. DOI: 10.1016/j.eswa.2018.11.027
  30. Bishop C. Pattern Recognition and Machine Learning. Cambridge, UK: Springer, Microsoft Research Ltd; 2006. ISBN-10: 0-387-31073-8
  31. Sezer O, Gudelek M, Ozbayoglu A. Financial Time Series Forecasting with Deep Learning: A Systematic Literature Review: 2005–2019. TOBB University of Economics and Technology, Ankara, Turkey: Elsevier; 2020. DOI: 10.1016/j.asoc.2020.106181
  32. Heller M: Automated Machine Learning or AutoML Explained. Needham, Massachusetts, USA: IDG Communications Inc; 2019. Available from: automl-explained.html. [Accessed: May 27, 2021]
  33. Amidi S. Deep Learning Cheatsheet. Stanford, California, USA: Stanford University; 2018. Available from: ~shervine/teaching/cs-229/cheatsheet-deep-learning
  34. Baeldung Corporation: Epoch in Neural Networks. Bucharest, Romania: Baeldung; 2021. Available from:
  35. LeCun Y, Bengio Y. Convolutional Networks for Images, Speech, and Time Series. The Handbook of Brain Theory and Neural Networks. Holmdel, New Jersey, USA: AT&T Bell Laboratories; 1995. Available from: ~lisa/pointeurs/handbook-convo.pdf
  36. Pozorska J, Scherer M. Company Bankruptcy Prediction with Neural Networks. Springer, Czestochowa, Poland: Czestochowa University of Technology; 2018. DOI: 10.1007/978-3-319-91253-0_18
  37. Smith J, Wilamowski B. Discrete Cosine Transform Spectral Pooling Layers for Convolutional Neural Networks. Alabama, USA: Springer. Auburn University; 2018. DOI: 10.1007/978-3-319-91253-0_23
  38. Solia P: Convolutions and Backpropagations. San Francisco HQ, San Francisco, California, USA: Medium Corporation; 2018. Available from: -46026a8f5d2c. [Accessed: May 19, 2021]
  39. Raschka S, Patterson J, Nolet C. Machine Learning in Python: Main Developments and Technology Trends in Data Science, Machine Learning, and Artificial Intelligence. Multidisciplinary Digital Publishing Institute, Madison, Wisconsin, USA: University of Wisconsin-Madison; 2020. DOI: 10.3390/info11040193
  40. Data Science Show: How to Use ACF and PACF to Identify Time Series Analysis Models. 2020. Available from: [Accessed: May 27, 2021]
  41. Andrade F. A Simple Guide to Beautiful Visualizations in Python. San Francisco HQ, San Francisco, California, USA: Medium Corporation. Available from: visualizations-in-python-f564e6b9d392. [Accessed: May 8, 2021]
  42. Smyl S. A Hybrid Method of Exponential Smoothing and Recurrent Neural Networks for Time Series Forecasting. San Francisco, California, USA: Elsevier. Uber Technologies; 2020. DOI: 10.1016/j.ijforecast.2019.03.017
  43. Garcia S, Ramirez-Gallego S, Luengo J, Benitez J, Herrera F. Big Data Preprocessing: Methods and Prospects. BioMed Central, Spain: University of Granada; 2016. DOI: 10.1186/s41044-016-0014-0
  44. Brownlee J: How to Prepare Data for Machine Learning. Calle de San Francisco, San Juan, Puerto Rico: Machine Learning Mastery Pty Ltd; 2020. Available from: -learning/. [Accessed: April 25, 2021]
  45. Watson A: Prediction Vs Forecasting. San Francisco HQ, San Francisco, California, USA: Medium Corporation; 2021. Available from: 67223ff08e34. [Accessed: February 22, 2021]
  46. InfluxData Inc: Time Series Forecasting Methods. San Francisco, California, USA: InfluxData Inc; 2021. Available from: [Accessed: March 17, 2021]
  47. Brownlee J: Time Series Forecasting Performance Measures with Python. Calle de San Francisco, San Juan, Puerto Rico: Machine Learning Mastery Pty Ltd; 2020. Available from: -measures-with-python/. [Accessed: March 17, 2021]
  48. Ameisen E: Always Start with a Stupid Model, No Exceptions. San Francisco HQ, San Francisco, California, USA: Medium Corporation; 2018. Available from: -exceptions-3a22314b9aaa. [Accessed: May 8, 2021]
  49. Brownlee J: How to Develop Convolutional Neural Network Models for Time Series Forecasting. Calle de San Francisco, San Juan, Puerto Rico: Machine Learning Mastery Pty Ltd; 2018. Available from: -network-models-for-time-series-forecasting/. [Accessed: February 28, 2021]
  50. Praveen_1998: What Is a Convolutional Neural Network. Bommanahalli, Bengaluru, India: Intellipaat; 2020. Available from: [Accessed: May 11, 2021]
  51. Binkowski M, Gautier M, Donnat P. Autoregressive Convolutional Neural Networks for Asynchronous Time Series. International Conference on Machine Learning. 2018. PMLR;80:580-589
  52. Pulagam S: Time Series Forecasting Using Auto ARIMA in Python. San Francisco HQ, San Francisco, California, USA: Medium Corporation; 2020. Available from: -arima-in-python-bb83e49210cd. [Accessed: May 8, 2021]
