## 1. Introduction

The limit order book (LOB) trading mechanism has become the dominant way to trade assets on financial markets. Because the limit order book represents the liquidity supply of an asset on a market, it essentially reflects both the demand for and the supply of the asset above the equilibrium price-volume point. Its variation affects the liquidity and price dynamics of an asset; the goal of this study is therefore to conduct a comprehensive multivariate analysis of limit order book (variation) data.

Here we model the covariance structures of the order book data of several assets by employing key multivariate methods. Theodore W. Anderson synthesized various subareas of the subject and has influenced the direction of recent and current research in theoretical multivariate analysis [1]. Principal component analysis, factor analysis and discriminant analysis remain popular dimension-reduction and classification techniques that are applied in many research fields.

Multivariate techniques have, for example, recently been used in the financial econometrics of limit order book markets. Principal component analysis is performed in studies of commonalities in liquidity (measures), see, for example, [2, 3], and in the analysis of price impact data [4]. The dynamics of liquidity supply curves are captured by the so-called dynamic semiparametric factor model in [5], whereas [6] characterizes traders’ behaviour using discriminant analysis.

Our focus lies on understanding the variability of the posted quantities of an asset that may be sold or bought at the market. The volume (variation) at every order book level is analysed as a random variable, and thus we do not condense the order book information through, for example, liquidity measures or reward functions. In this chapter, we consider the (full) structure of the covariance matrices. Potential applications include improving order execution strategies, understanding price formation and liquidity commonalities, and designing trading algorithms.

This study is organised as follows: after the limit order book data have been described in Section 2, the statistical methods are presented in Section 3. Empirical results are provided in Section 4, and Section 5 concludes.

## 2. Limit order book data

The limit order book of an asset lists the volume of pending buy and sell orders at given prices for the asset under consideration; here we analyse its variance-covariance structure. At a fixed time point, the order book essentially represents a snapshot of the asset’s demand and supply curves above the market equilibrium quantity level. The volume to be potentially bought forms the asset’s demand (bid) side, whereas the volume to be potentially sold forms the asset’s supply (ask) side. More precisely, the order book bid and ask curves represent liquidity supply, that is, quantities above the equilibrium volume level, since orders below the equilibrium (would) have been traded at the market.

### 2.1. NASDAQ market data and descriptive statistics

At the NASDAQ stock market, one of the world’s largest securities exchanges, orders are posted nearly instantaneously and limit orders are executed in the order in which they are received. To visualize a limit order book, consider the data of Intel Corp. (INTC) on 30 June 2016, obtained from the data provider LOBSTER (lobsterdata.com). The numbers of shares to be potentially bought or sold at different prices at 10:00 and 11:00 are depicted in **Figure 1**. For example, at 10:00, 16,834 and 2927 shares are demanded at prices 32.14 (fifth best bid price) and 32.18 (best bid price), respectively. At the same time, the number of offered shares at prices 32.19 (best ask price) and 32.23 (fifth best ask price) equals 1700 and 15,355, respectively. At 11:00, one furthermore observes that the order book has shifted towards higher prices. We attribute this movement to the (observed) increased demand pressure.
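To make the snapshot concrete, the following minimal sketch stores the four INTC price-volume pairs quoted above and derives the bid-ask spread and the mid-quote from the best levels. The dictionary layout and the variable names are illustrative assumptions, not the LOBSTER file format; levels 2 to 4 are not stated in the text and are deliberately left out.

```python
from decimal import Decimal

# (price, shares) pairs quoted in the text for INTC at 10:00 on 30 June 2016.
# Only the best and fifth-best levels are stated; levels 2 to 4 are not given
# in the text and are therefore left out rather than invented.
bid_levels = {1: (Decimal("32.18"), 2927), 5: (Decimal("32.14"), 16834)}
ask_levels = {1: (Decimal("32.19"), 1700), 5: (Decimal("32.23"), 15355)}

best_bid_price = bid_levels[1][0]
best_ask_price = ask_levels[1][0]

spread = best_ask_price - best_bid_price            # bid-ask spread
mid_quote = (best_ask_price + best_bid_price) / 2   # mid-quote price
price_range = ask_levels[5][0] - bid_levels[5][0]   # span of the ten levels

print(f"spread={spread}, mid-quote={mid_quote}, range={price_range}")
```

Decimal arithmetic is used here only to avoid binary floating-point noise in prices quoted to the cent.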

At the NASDAQ order book driven securities exchange, several event types influence the bid and ask curves, namely submissions of new limit orders, cancellations, deletions and executions (lobsterdata.com). Our data set thus allows us to reconstruct all order book activities of a particular company over the course of a trading day. For a description of trading that is common to most limit order book markets, see, for example, [7].

The order book volume at given price level represents here a *p*-dimensional random variable. Denoted by _{,} the associated volume vector. The limit order book of an asset is given by the pairs

The expected volume vector is denoted by and the object of our interest, the limit order book volume variance-covariance matrix by

here

Limit order book data of the 20 largest stocks traded at the NASDAQ stock market have been collected for the purpose of our analysis. In modelling the high-dimensional covariance structures of this object, we set $p = 10$, that is, the volumes pending at the five best bid levels and the five best ask levels.

The number of daily order book changes varies considerably across the investigated stocks, namely between 59,628 and 1,805,688, see **Table 1**. Directly after the referendum results (27 June 2016), there were many more order book changes than on 30 June 2016: for almost all stocks, the number of changes had by then decreased quite substantially.

Interestingly, on 30 June 2016 the majority of the companies had (on average) more shares listed at the given price levels of the order book than on 27 June 2016, see **Figure 2**. For convenience, denote the observed

with a

### 2.2. Covariance structure estimation

The results above indicate that both the order book change count and the estimated average volume vector changed (substantially) on 30 June 2016 compared with the market situation on 27 June 2016. Having estimated the mean vector, we are ready to focus on the (potential) changes in the variance-covariance matrices, that is, the covariance structures of the order book data. The covariance matrix of the order book volume is estimated from the observed volume vectors.

The estimated covariance matrices are displayed in **Figures 3** and **4** for the mega-cap and large-cap stocks, respectively. Since the analysed order book volume vector is a 10-dimensional object, each estimate is a $(10 \times 10)$ matrix. The matrix values define the vertex colours by scaling the values to map to the full range of the ‘colourmap’, see the MATLAB documentation for more details. Note that a darker (blue) colour shows a larger value of the estimated covariance between the random variables and vice versa.
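As an illustration of the estimation step, the sketch below computes a mean vector and a $(10 \times 10)$ sample covariance matrix from a matrix of volume observations. The data are synthetic (Poisson draws, not LOBSTER data), the column layout (five bid levels followed by five ask levels) is an assumption, and the $1/(n-1)$ normalisation is one common choice that may differ from the estimator used in the study.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: n snapshots of a p = 10 volume vector
# (columns 0-4: bid levels, columns 5-9: ask levels). Not real order book data.
n, p = 1000, 10
volumes = rng.poisson(lam=2000, size=(n, p)).astype(float)

mean_vec = volumes.mean(axis=0)              # estimated mean volume vector
centred = volumes - mean_vec
cov_hat = centred.T @ centred / (n - 1)      # sample covariance matrix (p x p)

# Cross-check against NumPy's estimator (rowvar=False: columns are variables).
assert np.allclose(cov_hat, np.cov(volumes, rowvar=False))

# Off-diagonal block: covariances between bid-side and ask-side volumes.
bid_ask_block = cov_hat[:5, 5:]
print(cov_hat.shape, bid_ask_block.shape)
```

The off-diagonal block `bid_ask_block` is the part of the matrix that describes how the two market sides move together.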

Our empirical results indicate several interesting findings. Across all stocks, the variances of the individual volume variables are relatively larger than the covariances between them. We aim to identify the linear combination that is responsible for the largest proportion of the data variation. Furthermore, the covariance levels between the bid and ask sides are relatively larger on 30 June 2016 than on 27 June 2016, indicating a stronger impact of one market side on order book variation immediately after the referendum results. Our analysis aims in particular to select the most important factor associated with this variation.

## 3. Statistical modelling

### 3.1. Modelling framework

Recall, we model the limit order book volume as a *p*-dimensional random vector

Among the multivariate techniques that deal with dimension reduction of high-dimensional random vectors, we focus in volume covariance structure modelling on principal component analysis, factor analysis and discriminant analysis. Multivariate techniques deal with simultaneous relationships among variables and differ from univariate and bivariate analysis in that they direct attention away from the analysis of the mean and variance of a single variable, or from the pairwise relationship between two variables, to the analysis of the covariances and correlations among three or more variables [9].

### 3.2. Principal components analysis

Principal component analysis focuses on standardised principal components of a high-dimensional random variable. It was first introduced by Karl Pearson for nonstochastic variables and by Harold Hotelling for random vectors [10]. The low-dimensional representation enables us to study the correlation between the principal components and the original data; here our goal is to find the standardized linear combination of the order book volume vector

The standardized linear combination of a *p*-dimensional variable

In modelling order book data, we estimate the principal components by

with _{ } matrix
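A minimal sketch of the principal component step, under stated assumptions: the components are obtained from the eigendecomposition of the sample covariance matrix, with unit-norm eigenvectors acting as the weights of the standardized linear combinations. The function names and the synthetic data are hypothetical and only illustrate how a proportion of explained variance for two components can be computed.

```python
import numpy as np

def explained_proportion(cov, k=2):
    """Share of total variance captured by the k leading principal components."""
    eigvals = np.linalg.eigvalsh(cov)[::-1]   # eigenvalues, largest first
    return eigvals[:k].sum() / eigvals.sum()

def principal_components(x, k=2):
    """Project centred observations onto the k leading unit-norm eigenvectors."""
    centred = x - x.mean(axis=0)
    cov = np.cov(centred, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)    # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:k]     # indices of the k largest
    return centred @ eigvecs[:, order]

# Hypothetical data: 10 volume variables driven by two latent sources plus noise.
rng = np.random.default_rng(1)
common = rng.normal(size=(500, 2))
weights = rng.normal(size=(2, 10))
x = common @ weights * 100 + rng.normal(size=(500, 10)) * 10

cov = np.cov(x, rowvar=False)
prop = explained_proportion(cov, k=2)
scores = principal_components(x, k=2)
print(round(float(prop), 3), scores.shape)
```

Working on the covariance rather than the correlation matrix keeps the original volume scale; whether the study standardises the volumes first is not recoverable from the text.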

### 3.3. Factor analysis

In factor analysis the random vector *k*-factor model

where

The associated factor loadings represent the combinations which reflect the common variance part and the remaining variation is quantified through the covariance matrix of the specific factors. In practice, we are consequently interested in estimating the matrix of common factor loadings

where
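The following sketch illustrates one simple way to estimate a one-factor model, namely the principal component method applied to the correlation matrix, in which the loadings are the leading eigenvector scaled by the square root of its eigenvalue. This is an assumed estimation method and may differ from the one used in the study; the rule in `dominant_side` for labelling a factor as demand or supply is likewise hypothetical, and the data are synthetic.

```python
import numpy as np

def one_factor_loadings(x):
    """Principal component estimate of a one-factor model on standardised data."""
    corr = np.corrcoef(x, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)          # ascending eigenvalues
    loadings = np.sqrt(eigvals[-1]) * eigvecs[:, -1]
    if loadings.sum() < 0:                           # fix the arbitrary sign
        loadings = -loadings
    specific_var = 1.0 - loadings**2                 # specific variances
    return loadings, specific_var

def dominant_side(loadings, n_bid=5):
    """Hypothetical rule: pick the side with the larger sum of squared loadings."""
    bid = np.sum(loadings[:n_bid] ** 2)
    ask = np.sum(loadings[n_bid:] ** 2)
    return "Demand" if bid > ask else "Supply"

# Hypothetical data: one common factor drives the five bid-side volumes,
# while the five ask-side volumes are pure noise.
rng = np.random.default_rng(2)
n = 2000
factor = rng.normal(size=(n, 1))
bid = 0.9 * factor + np.sqrt(1 - 0.81) * rng.normal(size=(n, 5))
ask = rng.normal(size=(n, 5))
x = np.hstack([bid, ask])

loadings, specific_var = one_factor_loadings(x)
side = dominant_side(loadings)
print(np.round(loadings, 2), side)
```

In this construction the common variance of each variable is its squared loading, and the remainder is attributed to the specific factor.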

### 3.4. Discriminant analysis

In discriminant analysis, multivariate data observations are classified into two or more known groups. A modern treatment and a brief history of discriminant analysis are included in [10]. In the analysis of group differences, [13], for example, state two questions: (i) does there exist a significant difference between the groups (variation) and (ii) which variables are responsible for it? In practice, a discriminant rule is used to classify existing and new observations, and the number of correctly classified observations reflects the quality of the approach. Here we are interested in the classification accuracy: to what extent a price change can be expected (or not) at each order book entry based exclusively on observed volume data.

The linear Fisher’s discriminant rule is based on a linear combination of data, say _{ }and _{ }and

where the
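A minimal two-group sketch of Fisher's linear discriminant rule: the direction is obtained from the within-group sum of squares matrix and the difference of the group means, and an observation is assigned according to the sign of its projection relative to the midpoint of the means. The group labels (entries with and without a mid-quote change), the function names and the synthetic data are illustrative assumptions.

```python
import numpy as np

def fisher_rule(x1, x2):
    """Return the direction a and the midpoint of Fisher's linear rule."""
    m1, m2 = x1.mean(axis=0), x2.mean(axis=0)
    # Within-group sum of squares matrix W.
    w = (x1 - m1).T @ (x1 - m1) + (x2 - m2).T @ (x2 - m2)
    a = np.linalg.solve(w, m1 - m2)      # a is proportional to W^{-1}(m1 - m2)
    return a, 0.5 * (m1 + m2)

def classify(x, a, midpoint):
    """Assign group 1 where a^T (x - midpoint) > 0, otherwise group 2."""
    return np.where((x - midpoint) @ a > 0, 1, 2)

# Hypothetical data: 10-dimensional volume vectors for entries with a price
# change (group 1) and without a price change (group 2).
rng = np.random.default_rng(3)
changed = rng.normal(loc=0.5, size=(400, 10))
unchanged = rng.normal(loc=-0.5, size=(400, 10))

a, midpoint = fisher_rule(changed, unchanged)
pred = np.concatenate([classify(changed, a, midpoint),
                       classify(unchanged, a, midpoint)])
truth = np.concatenate([np.ones(400, dtype=int), np.full(400, 2)])
accuracy = np.mean(pred == truth)        # proportion correctly classified
print(round(float(accuracy), 2))
```

The accuracy here is computed in-sample on well-separated synthetic groups, so it says nothing about the accuracy attainable on real order book data.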

## 4. Empirical results

An analysis of principal components often reveals relationships that were not previously suspected and thereby allows interpretations that would not ordinarily result [15]. Consider, for example, the proportion of order book variance explained by two principal components in **Table 2**. Two principal components are sufficient to describe the order book variation, since the explained proportions range between 0.81 and 0.96 (27 June 2016) and between 0.78 and 0.97 (30 June 2016).

For most companies, the limit order book variation is explained to a clearly greater extent on 30 June 2016 than on 27 June 2016. The largest increase in the explained proportion is evident for smaller stocks, especially for SBUX, CELG, QCOM, COST and PCLN. Looking only at the descriptive results reported in **Table 1**, one would conclude that the number of changes is apparently similar across all stocks. Now it is evident that the demand and supply curves of smaller stocks change relatively more strongly during turbulent times (here during a downward price movement). We attribute this to the relatively lower liquidity of large-cap stocks as compared to the highly liquid mega-cap stocks.

Factor analysis can be considered as an extension of principal component analysis. Both techniques can be viewed as attempts to approximate the covariance matrix; however, the approximation based on the factor analysis model is more elaborate [15]. In the sequel, we choose a one-factor model. The most important factor for each stock is reported in **Tables 3** and **4** for the mega-cap and large-cap companies, respectively, based on the estimated values of the factor loadings.

Stock | 2016-06-27 | 2016-06-30 | Stock | 2016-06-27 | 2016-06-30
---|---|---|---|---|---
AAPL | Demand | Demand | FB | Supply | Demand
GOOGL | Demand | Demand | CMCSA | Demand | Demand
GOOG | Demand | Supply | INTC | Supply | Demand
MSFT | Supply | Supply | CSCO | Demand | Both
AMZN | Demand | Demand | AMGN | Supply | Supply

Stock | 2016-06-27 | 2016-06-30 | Stock | 2016-06-27 | 2016-06-30
---|---|---|---|---|---
GILD | Supply | Demand | QCOM | Supply | Supply
KHC | Demand | Demand | COST | Supply | Supply
WBA | Supply | Supply | MDLZ | Supply | Demand
SBUX | Supply | Demand | PCLN | Demand | Demand
CELG | Demand | Supply | TXN | Supply | Supply

For the majority of stocks (11 of 20, plus CSCO, for which both sides are selected), demand is identified as the most important factor on 30 June 2016. The share prices of the companies indeed reacted positively during this day. Interestingly, for most of the relatively illiquid large-cap stocks (6 of 10), the same factor has been identified on both days. Its magnitude changed, as is evident from the principal component analysis.

Discriminant analysis cannot usually provide an error-free method of assignment of data, because there may not be a clear distinction between the measured characteristics of the populations; that is, the groups may overlap [15]. We report the proportions of correctly classified price changes, based only on volume data, in **Tables 5** and **6** for the mega-cap and largest large-cap stocks and for the remaining large-cap stocks, respectively.

Stock | 2016-06-27 | 2016-06-30 | Stock | 2016-06-27 | 2016-06-30
---|---|---|---|---|---
AAPL | 0.41 | 0.55 | FB | 0.42 | 0.41
GOOGL | 0.50 | 0.59 | CMCSA | 0.58 | 0.51
GOOG | 0.51 | 0.56 | INTC | 0.64 | 0.65
MSFT | 0.61 | 0.64 | CSCO | 0.67 | 0.68
AMZN | 0.54 | 0.56 | AMGN | 0.49 | 0.50

Stock | 2016-06-27 | 2016-06-30 | Stock | 2016-06-27 | 2016-06-30
---|---|---|---|---|---
GILD | 0.49 | 0.52 | QCOM | 0.50 | 0.64
KHC | 0.47 | 0.50 | COST | 0.50 | 0.50
WBA | 0.47 | 0.50 | MDLZ | 0.54 | 0.53
SBUX | 0.45 | 0.55 | PCLN | 0.52 | 0.57
CELG | 0.53 | 0.51 | TXN | 0.55 | 0.60

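The empirical findings suggest that limit order book volume data carry information for classifying price changes, especially on 30 June 2016, a day with a relatively low number of order book entries. The proportions of correctly classified entries range from 0.41 to 0.67 on 27 June 2016 and from 0.41 to 0.68 on 30 June 2016, and the proportion is higher on 30 June for 15 of the 20 stocks. Here the first group contains entries with mid-quote price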

## 5. Conclusions

Limit order book data of 20 highly traded stocks at the NASDAQ market in June 2016 have been analysed. We select two days after the ‘Brexit’ referendum, namely, 27 June (lowest S&P 500 level) and 30 June (recovery day). The variable of interest is the 10-dimensional order book volume data vector, that is, the quantities pending at the five best levels of the demand side and at the five best levels of the supply side.

Two principal components account for approximately 85–95% of the order book data variation. The results of a one-factor model identify the demand (variation) as the most important factor explaining the order book covariance structure. The limit order book volume data variation is quite informative in predicting the price evolution (change or no change in the mid-quote) across all stocks and during the analysed trading activities. The mega-cap and the smallest investigated large-cap companies share almost the same classification performance. Finally, multivariate statistical techniques are successfully employed in covariance modelling of order book data.