Stochastic Leader-Follower Differential Game with Asymmetric Information

Jingtao Shi

doi:10.5772/intechopen.75413

Abstract

In this chapter, we discuss a leader-follower (also called Stackelberg) stochastic differential game with asymmetric information. Here the word “asymmetric” means that the information available to the follower is a sub-$\sigma$-algebra of that available to the leader, whereas in the classical literature the asymmetry lies only in the different roles played by the two players. The Stackelberg equilibrium is characterized by stochastic versions of Pontryagin’s maximum principle and the verification theorem with partial information. As an application, a linear-quadratic (LQ) leader-follower stochastic differential game with asymmetric information is studied. If a certain system of Riccati equations is solvable, the Stackelberg equilibrium admits a state feedback representation.

Keywords

backward stochastic differential equation (BSDE)
leader-follower stochastic differential game
asymmetric information
stochastic filtering
linear-quadratic control
Stackelberg equilibrium

Author Information

Jingtao Shi*
- School of Mathematics, Shandong University, Jinan, P.R. China

*Address all correspondence to: shijingtao@sdu.edu.cn

1. Introduction

Throughout this chapter, we denote by $\mathbb{R}^n$ the Euclidean space of $n$-dimensional vectors, by $\mathbb{R}^{n \times d}$ the space of $n \times d$ matrices, and by $\mathcal{S}^n$ the space of $n \times n$ symmetric matrices. $\langle \cdot, \cdot \rangle$ and $|\cdot|$ denote the scalar product and norm in the Euclidean space, respectively. The superscript $\top$ denotes the transpose of a matrix. For a differentiable function $f$, $f_x$ and $f_{xx}$ denote its first and second partial derivatives with respect to $x$.

1.1. Motivation

In practice, there are many problems which motivate us to study the leader-follower stochastic differential games with asymmetric information. Here we present two examples.

Example 1.1: (Continuous time principal-agent problem) The principal contracts with the agent to manage a production process, whose cumulative proceeds (or output) $Y(t)$ evolve on $[0,T]$ as follows:

$$dY(t) = Be(t)\,dt + \sigma\,dW(t) + \tilde{\sigma}\,d\tilde{W}(t), \quad Y(0) = Y_0 \in \mathbb{R}, \tag{1}$$

where $e(t) \in \mathbb{R}$ is the agent’s effort choice, $B$ represents the productivity of effort, and there are two additive shocks (due to the two independent Brownian motions $W$, $\tilde{W}$) to the output. The proceeds of the production add to the principal’s asset $y(t)$, which earns a risk-free return $r$, and out of which he pays the agent $s(t) \in \mathbb{R}$ and withdraws his own consumption $d(t) \in \mathbb{R}$. Thus the principal’s asset evolves as

$$dy(t) = \big[ ry(t) + Be(t) - s(t) - d(t) \big]\,dt + \sigma\,dW(t) + \tilde{\sigma}\,d\tilde{W}(t), \quad y(0) = y_0 \in \mathbb{R}, \tag{2}$$

where $y_0$ is the initial asset. The agent has his own wealth $m(t)$, out of which he consumes $c(t)$, thus

$$dm(t) = \big[ rm(t) + s(t) - c(t) \big]\,dt + \bar{\sigma}\,dW(t) + \tilde{\bar{\sigma}}\,d\tilde{W}(t), \quad m(0) = m_0 \in \mathbb{R}. \tag{3}$$

Thus, the agent earns the same rate of return $r$ on his savings, gets income flows due to his payment $s(t)$, and draws down wealth to consume. In the above, $\sigma$, $\tilde{\sigma}$, $\bar{\sigma}$, $\tilde{\bar{\sigma}}$ are all constants. At the terminal time $T$, the principal makes a final payment $s(T)$ and the agent chooses consumption based on this payment and his terminal wealth $m(T)$. In the above, we restrict $y(t)$, $s(t)$, $d(t)$ to be nonnegative.

We consider an optimal implementable contract problem in the so-called “hidden savings” information structure (Williams [1], also in Williams [2]). In this problem, the principal can observe his asset $y(t)$ and the agent’s initial wealth $m_0$ but cannot monitor the agent’s effort $e(t)$, consumption $c(t)$, and wealth $m(t)$ for $t > 0$. The principal must provide incentives for the agent to put forth the desired amount of effort. For any $(s(t), d(t))$, the agent first chooses his effort $e^*(t)$ and consumption $c^*(t)$ such that his exponential preference

$$J_1(e, c, s, d) = E\left[ -\int_0^T e^{-\rho t} \exp\left\{ -\lambda \left( c(t) - \tfrac{1}{2} e(t)^2 \right) \right\} dt + e^{-\rho T} \big( s(T) + m(T) \big) \right] \tag{4}$$

is maximized. Here $\rho > 0$ is the discount rate and $\lambda > 0$ denotes the risk aversion parameter. The pair $(e^*(t), c^*(t))$ is called an implementable contract if it meets the actions recommended by the principal, which are based on the principal’s observable wealth $y(t)$. Then, the principal selects his payment $s^*(t)$ and consumption $d^*(t)$ to maximize his exponential preference

$$J_2(e^*, c^*, s, d) = E\left[ -\int_0^T e^{-\rho t} \exp\big\{ -\lambda d(t) \big\}\, dt + e^{-\rho T} \big( y(T) - s(T) \big) \right]. \tag{5}$$

Let $\mathcal{F}_t$ denote the $\sigma$-algebra generated by the Brownian motions $W(s)$, $\tilde{W}(s)$, $0 \le s \le t$. Intuitively, $\mathcal{F}_t$ contains all the information up to time $t$. Let $\mathcal{G}_{1,t}$ contain the information available to the agent and $\mathcal{G}_{2,t}$ the information available to the principal, up to time $t$, respectively, with $\mathcal{G}_{1,t} \subseteq \mathcal{G}_{2,t}$. In the game problem, the agent first solves the following optimization problem:

$$J_1(e^*, c^*, s, d) = \max_{e, c} J_1(e, c, s, d), \tag{6}$$

where $(e^*, c^*)$ is a $\mathcal{G}_{1,t}$-adapted process pair. Then the principal solves the following optimization problem:

$$J_2(e^*, c^*, s^*, d^*) = \max_{s, d} J_2(e^*, c^*, s, d), \tag{7}$$

where $(s^*, d^*)$ is a $\mathcal{G}_{2,t}$-adapted process pair. This formulates a stochastic Stackelberg differential game with asymmetric information. In this setting, the agent is the follower and the principal is the leader. Any process quadruple $(e^*, c^*, s^*, d^*)$ satisfying the above two equalities is called a Stackelberg equilibrium. In Williams [1], a solvable continuous time principal-agent model is considered under three information structures (full information, hidden actions, and hidden savings), and the corresponding optimal contract problems are solved explicitly. However, that model does not cover ours.

Example 1.2: (Continuous time manufacturer-newsvendor problem) Let $D(\cdot)$ be the demand rate for a product in the market, which satisfies

$$dD(t) = a\big( \mu - D(t) \big)\,dt + \sigma\,dW(t) + \tilde{\sigma}\,d\tilde{W}(t), \quad D(0) = d_0 \in \mathbb{R}, \tag{8}$$

where $a$, $\mu$, $\sigma$, $\tilde{\sigma}$ are constants. Suppose that the market consists of a manufacturer selling the product to end users through a retailer. At time $t$, the retailer chooses an order rate $q(t)$ for the product and decides its retail price $R(t)$, and is offered a wholesale price $w(t)$ by the manufacturer. We assume that items can be salvaged at unit price $S \ge 0$, and that items cannot be stored, that is, they must be sold instantly or salvaged. The retailer will obtain an expected profit

$$J_1\big( q(\cdot), R(\cdot), w(\cdot) \big) = E \int_0^T \Big[ \big( R(t) - S \big) \min\big( D(t), q(t) \big) - \big( w(t) - S \big) q(t) \Big]\, dt. \tag{9}$$

When the manufacturer has a fixed production cost per unit $M \ge 0$, he will get an expected profit

$$J_2\big( q(\cdot), R(\cdot), w(\cdot) \big) = E \int_0^T \Big[ \big( w(t) - M \big) q(t) - S \max\big( q(t) - D(t), 0 \big) \Big]\, dt. \tag{10}$$

In the above, we assume that $S < M \le w(t) \le R(t)$.
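To make the demand dynamics and the two profit functionals concrete, the following is a minimal numerical sketch, not part of the original model analysis. It assumes constant (hence trivially adapted) decisions $q$, $R$, $w$, an Euler-Maruyama discretization of Eq. (8), and a left Riemann sum for the time integrals in Eqs. (9) and (10). The function names and parameter values are hypothetical and chosen only for illustration.

```python
import numpy as np

def simulate_demand(a, mu, sigma, sigma_tilde, d0, T, n_steps, n_paths, seed=0):
    """Euler-Maruyama paths of dD = a(mu - D)dt + sigma dW + sigma_tilde dW~ (Eq. (8))."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    D = np.empty((n_paths, n_steps + 1))
    D[:, 0] = d0
    for k in range(n_steps):
        # Increments of the two independent Brownian motions W and W~.
        dW = rng.normal(0.0, np.sqrt(dt), n_paths)
        dW_tilde = rng.normal(0.0, np.sqrt(dt), n_paths)
        D[:, k + 1] = (D[:, k] + a * (mu - D[:, k]) * dt
                       + sigma * dW + sigma_tilde * dW_tilde)
    return D

def retailer_profit(D, q, R, w, S, T):
    """Monte Carlo estimate of J_1 in Eq. (9) for constant q, R, w (assumption)."""
    dt = T / (D.shape[1] - 1)
    running = (R - S) * np.minimum(D[:, :-1], q) - (w - S) * q
    return float(np.mean(np.sum(running, axis=1) * dt))

def manufacturer_profit(D, q, w, M, S, T):
    """Monte Carlo estimate of J_2 in Eq. (10) for constant q, w (assumption)."""
    dt = T / (D.shape[1] - 1)
    running = (w - M) * q - S * np.maximum(q - D[:, :-1], 0.0)
    return float(np.mean(np.sum(running, axis=1) * dt))

if __name__ == "__main__":
    # Illustrative parameters satisfying S < M <= w <= R.
    D = simulate_demand(a=1.0, mu=10.0, sigma=0.5, sigma_tilde=0.5,
                        d0=10.0, T=1.0, n_steps=200, n_paths=5000)
    print(retailer_profit(D, q=8.0, R=5.0, w=3.0, S=1.0, T=1.0))
    print(manufacturer_profit(D, q=8.0, w=3.0, M=2.0, S=1.0, T=1.0))
```

With constant decisions the information structure plays no role; the sketch only illustrates how the payoffs respond to the demand path, not how the equilibrium is computed.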

Let $\mathcal{F}_t$ denote the $\sigma$-algebra generated by $W(s)$, $\tilde{W}(s)$, $0 \le s \le t$, which contains all the information up to time $t$. At time $t$, let the information $\mathcal{G}_{1,t}$, $\mathcal{G}_{2,t}$ available to the retailer and the manufacturer, respectively, both be sub-$\sigma$-algebras of $\mathcal{F}_t$, with $\mathcal{G}_{1,t} \subseteq \mathcal{G}_{2,t}$. This is natural from a practical point of view. Specifically, the manufacturer chooses a wholesale price $w(t)$ at time $t$, which is a $\mathcal{G}_{2,t}$-adapted stochastic process, and the retailer chooses an order rate $q(t)$ and a retail price $R(t)$ at time $t$, which are $\mathcal{G}_{1,t}$-adapted stochastic processes. For any $w(\cdot)$, selecting a $\mathcal{G}_{1,t}$-adapted process pair $(q^*(\cdot), R^*(\cdot))$ for the retailer such that

$$J_1\big( q^*(\cdot), R^*(\cdot), w(\cdot) \big) \equiv J_1\big( q^*(\cdot; w(\cdot)), R^*(\cdot; w(\cdot)), w(\cdot) \big) = \max_{q(\cdot), R(\cdot)} J_1\big( q(\cdot), R(\cdot), w(\cdot) \big), \tag{11}$$

and then selecting a $\mathcal{G}_{2,t}$-adapted process $w^*(\cdot)$ for the manufacturer such that

$$J_2\big( q^*(\cdot), R^*(\cdot), w^*(\cdot) \big) \equiv J_2\big( q^*(\cdot; w^*(\cdot)), R^*(\cdot; w^*(\cdot)), w^*(\cdot) \big) = \max_{w(\cdot)} J_2\big( q^*(\cdot; w(\cdot)), R^*(\cdot; w(\cdot)), w(\cdot) \big), \tag{12}$$

formulates a leader-follower stochastic differential game with asymmetric information. In this setting, the manufacturer is the leader and the retailer is the follower. Any process triple $(q^*(\cdot), R^*(\cdot), w^*(\cdot))$ satisfying the above is called a Stackelberg equilibrium. In Øksendal et al. [3], a time-dependent newsvendor problem with time-delayed information is solved, based on a stochastic differential game (with jump-diffusion) approach. However, that model does not cover ours.

1.2. Problem formulation

Motivated by the examples above, in this chapter we study leader-follower stochastic differential games with asymmetric information. Let $(\Omega, \mathcal{F}, \mathbb{P})$ be a complete probability space, let $(W(\cdot), \tilde{W}(\cdot))$ be a standard $\mathbb{R}^2$-valued Brownian motion, and let $\{\mathcal{F}_t\}_{0 \le t \le T}$ be its natural augmented filtration with $\mathcal{F}_T = \mathcal{F}$, where $T > 0$ is a finite time duration. Let the state satisfy the stochastic differential equation (SDE)

$$\left\{ \begin{aligned} dx^{u_1,u_2}(t) &= b\big( t, x^{u_1,u_2}(t), u_1(t), u_2(t) \big)\,dt + \sigma\big( t, x^{u_1,u_2}(t), u_1(t), u_2(t) \big)\,dW(t) \\ &\quad + \tilde{\sigma}\big( t, x^{u_1,u_2}(t), u_1(t), u_2(t) \big)\,d\tilde{W}(t), \\ x^{u_1,u_2}(0) &= x_0, \end{aligned} \right. \tag{13}$$

where $u_1(\cdot)$ and $u_2(\cdot)$ are control processes taken by the two players in the game, labeled 1 (the follower) and 2 (the leader), with values in nonempty convex sets $U_1 \subseteq \mathbb{R}$, $U_2 \subseteq \mathbb{R}$, respectively. $x^{u_1,u_2}(\cdot)$, the solution to SDE Eq. (13) with values in $\mathbb{R}$, is the state process with initial state $x_0 \in \mathbb{R}$. Here $b(t, x, u_1, u_2): \Omega \times [0,T] \times \mathbb{R} \times U_1 \times U_2 \to \mathbb{R}$, $\sigma(t, x, u_1, u_2): \Omega \times [0,T] \times \mathbb{R} \times U_1 \times U_2 \to \mathbb{R}$, and $\tilde{\sigma}(t, x, u_1, u_2): \Omega \times [0,T] \times \mathbb{R} \times U_1 \times U_2 \to \mathbb{R}$ are given $\mathcal{F}_t$-adapted processes, for each $(x, u_1, u_2)$.

Let us now explain the asymmetric information feature between the follower (player 1) and the leader (player 2) in this chapter. The information available to the follower at time $t$ is a sub-$\sigma$-algebra $\mathcal{G}_{1,t}$ of $\mathcal{G}_{2,t}$, the information available to the leader. We assume in this and the next section that $\mathcal{G}_{1,t} \subseteq \mathcal{G}_{2,t} \subseteq \mathcal{F}_t$. We define the admissible control sets of the follower and the leader, respectively, as follows.

$$\mathcal{U}_k := \Big\{ u_k \,\Big|\, u_k: \Omega \times [0,T] \to U_k \text{ is } \mathcal{G}_{k,t}\text{-adapted and } \sup_{0 \le t \le T} E|u_k(t)|^i < \infty,\ i = 1, 2, \cdots \Big\}, \quad k = 1, 2. \tag{14}$$

The game begins with the announcement of the leader’s control $u_2(\cdot) \in \mathcal{U}_2$. Knowing this, the follower would like to choose a $\mathcal{G}_{1,t}$-adapted control $u_1^*(\cdot) = u_1^*(\cdot; u_2(\cdot))$ to minimize his cost functional

$$J_1\big( u_1(\cdot), u_2(\cdot) \big) = E\left[ \int_0^T g_1\big( t, x^{u_1,u_2}(t), u_1(t), u_2(t) \big)\,dt + G_1\big( x^{u_1,u_2}(T) \big) \right]. \tag{15}$$

Here $g_1(t, x, u_1, u_2): \Omega \times [0,T] \times \mathbb{R} \times U_1 \times U_2 \to \mathbb{R}$ is an $\mathcal{F}_t$-adapted process, and $G_1(x): \Omega \times \mathbb{R} \to \mathbb{R}$ is an $\mathcal{F}_T$-measurable random variable, for each $(x, u_1, u_2)$. Now the follower encounters a stochastic optimal control problem with partial information.

SOCPF. For any $u_2(\cdot) \in \mathcal{U}_2$ chosen by the leader, choose a $\mathcal{G}_{1,t}$-adapted control $u_1^*(\cdot) = u_1^*(\cdot; u_2(\cdot)) \in \mathcal{U}_1$, such that

$$J_1\big( u_1^*(\cdot), u_2(\cdot) \big) \equiv J_1\big( u_1^*(\cdot; u_2(\cdot)), u_2(\cdot) \big) = \inf_{u_1 \in \mathcal{U}_1} J_1\big( u_1(\cdot), u_2(\cdot) \big), \tag{16}$$

subject to Eqs. (13) and (15). Such a $u_1^*(\cdot) = u_1^*(\cdot; u_2(\cdot))$ is called an optimal control, and the corresponding solution $x^{u_1^*,u_2}(\cdot)$ to Eq. (13) is called an optimal state.

In the next step, knowing that the follower will take such an optimal control $u_1^*(\cdot) = u_1^*(\cdot; u_2(\cdot))$, the leader would like to choose a $\mathcal{G}_{2,t}$-adapted control $u_2^*(\cdot)$ to minimize his cost functional

$$J_2\big( u_1^*(\cdot), u_2(\cdot) \big) = E\left[ \int_0^T g_2\big( t, x^{u_1^*,u_2}(t), u_1^*(t; u_2(t)), u_2(t) \big)\,dt + G_2\big( x^{u_1^*,u_2}(T) \big) \right]. \tag{17}$$

Here $g_2(t, x, u_1, u_2): \Omega \times [0,T] \times \mathbb{R} \times U_1 \times U_2 \to \mathbb{R}$ is a given $\mathcal{F}_t$-adapted process, and $G_2(x): \Omega \times \mathbb{R} \to \mathbb{R}$ is a given $\mathcal{F}_T$-measurable random variable, for each $(x, u_1, u_2)$. Now the leader encounters a stochastic optimal control problem with partial information.

SOCPL. Find a $\mathcal{G}_{2,t}$-adapted control $u_2^*(\cdot) \in \mathcal{U}_2$, such that

$$J_2\big( u_1^*(\cdot), u_2^*(\cdot) \big) = J_2\big( u_1^*(\cdot; u_2^*(\cdot)), u_2^*(\cdot) \big) = \inf_{u_2 \in \mathcal{U}_2} J_2\big( u_1^*(\cdot; u_2(\cdot)), u_2(\cdot) \big), \tag{18}$$

subject to Eqs. (13) and (17). Such a $u_2^*(\cdot)$ is called an optimal control, and the corresponding solution $x^*(\cdot) \equiv x^{u_1^*,u_2^*}(\cdot)$ to Eq. (13) is called an optimal state. We will rewrite the problem for the leader in more detail in the next section. We refer to the problem above as a leader-follower stochastic differential game with asymmetric information. If there exists a control process pair $(u_1^*(\cdot), u_2^*(\cdot)) = (u_1^*(\cdot; u_2^*(\cdot)), u_2^*(\cdot))$ satisfying Eqs. (16) and (18), we refer to it as a Stackelberg equilibrium.

In this chapter, we impose the following assumptions.

(A1.1) For each $\omega \in \Omega$, the functions $b$, $\sigma$, $\tilde{\sigma}$, $g_1$ are twice continuously differentiable in $(x, u_1, u_2)$. For each $\omega \in \Omega$, the functions $g_2$ and $G_1$, $G_2$ are continuously differentiable in $(x, u_1, u_2)$ and $x$, respectively. Moreover, for each $\omega \in \Omega$ and any $(t, x, u_1, u_2) \in [0,T] \times \mathbb{R} \times U_1 \times U_2$, there exists $C > 0$ such that

$$\begin{aligned} &\big( 1 + |x| + |u_1| + |u_2| \big)^{-1} |\phi(t, x, u_1, u_2)| + |\phi_x(t, x, u_1, u_2)| + |\phi_{u_1}(t, x, u_1, u_2)| + |\phi_{u_2}(t, x, u_1, u_2)| \\ &\quad + |\phi_{xx}(t, x, u_1, u_2)| + |\phi_{u_1 u_1}(t, x, u_1, u_2)| + |\phi_{u_2 u_2}(t, x, u_1, u_2)| \le C, \end{aligned} \tag{19}$$

for $\phi = b, \sigma, \tilde{\sigma}$, and

$$\begin{aligned} &\big( 1 + |x|^2 \big)^{-1} |G_1(x)| + \big( 1 + |x| \big)^{-1} |G_{1x}(x)| + \big( 1 + |x|^2 \big)^{-1} |G_2(x)| + \big( 1 + |x| \big)^{-1} |G_{2x}(x)| \le C, \\ &\big( 1 + |x|^2 + |u_1|^2 + |u_2|^2 \big)^{-1} |g_1(t, x, u_1, u_2)| \\ &\quad + \big( 1 + |x| + |u_1| + |u_2| \big)^{-1} \big( |g_{1x}(t, x, u_1, u_2)| + |g_{1u_1}(t, x, u_1, u_2)| + |g_{1u_2}(t, x, u_1, u_2)| \big) \\ &\quad + |g_{1xx}(t, x, u_1, u_2)| + |g_{1u_1 u_1}(t, x, u_1, u_2)| + |g_{1u_2 u_2}(t, x, u_1, u_2)| \le C, \\ &\big( 1 + |x|^2 + |u_1|^2 + |u_2|^2 \big)^{-1} |g_2(t, x, u_1, u_2)| \\ &\quad + \big( 1 + |x| + |u_1| + |u_2| \big)^{-1} \big( |g_{2x}(t, x, u_1, u_2)| + |g_{2u_1}(t, x, u_1, u_2)| + |g_{2u_2}(t, x, u_1, u_2)| \big) \le C. \end{aligned} \tag{20}$$

1.3. Literature review and contributions of this chapter

Differential games were initiated by Isaacs [4] and are powerful tools for modeling dynamic systems in which more than one decision-maker is involved. Differential games have been studied by many scholars and have been applied in biology, economics, and finance. Stochastic differential games are differential games for stochastic systems involving noise terms. See Basar and Olsder [5] for more information about differential games. Recent developments for stochastic differential games can be found in Hamadène [6], Wu [7], An and Øksendal [8], Wang and Yu [9, 10], and the references therein.

The leader-follower stochastic differential game is the stochastic and dynamic formulation of the Stackelberg game, which was introduced by Stackelberg [11] in 1934, when the concept of a hierarchical solution was defined for markets in which some firms have power of domination over others. This solution concept is now known as the Stackelberg equilibrium, which in the context of two-person nonzero-sum games involves players with asymmetric roles, one leader and one follower. A pioneering study of stochastic Stackelberg differential games can be found in Basar [12]. Specifically, a leader-follower stochastic differential game begins with the follower aiming to minimize his cost functional in response to the leader’s decision over the whole duration of the game. Anticipating the follower’s optimal decision depending on his entire strategy, the leader selects an optimal strategy in advance to minimize his cost functional, based on the stochastic Hamiltonian system satisfied by the follower’s optimal decision. The pair of the leader’s optimal strategy and the follower’s optimal response is known as the Stackelberg equilibrium.

A linear-quadratic (LQ) leader-follower stochastic differential game was studied by Yong [13] in 2002. There, the coefficients of the cost functionals and the system are random, the diffusion term of the state equation contains the controls, and the weight matrices for the controls in the cost functionals are not necessarily positive definite. The related Riccati equations are derived to give a state feedback representation of the Stackelberg equilibrium in a nonanticipating way. Bensoussan et al. [14] obtained the global maximum principles for both open-loop and closed-loop stochastic Stackelberg differential games, where the diffusion term does not contain the controls.

In this chapter, we study a leader-follower stochastic differential game with asymmetric information. Our work differs from those mentioned above in the following aspects. (1) In our framework, the information available to the follower is a sub-$\sigma$-algebra of that available to the leader. Moreover, the information filtrations available to the leader and the follower may both be sub-filtrations of the complete information filtration naturally generated by the random noise source. This gives a new interpretation of the asymmetric information feature between the follower and the leader, and gives our problem formulation more practical meaning. (2) Our work is established in the context of partial information, which is different from that of partial observation (see e.g., Wang et al. [15]) but related to An and Øksendal [8], Huang et al. [16], Wang and Yu [10]. (3) An important class of LQ leader-follower stochastic differential games with asymmetric information is proposed and then completely solved, which is a natural generalization of that in Yong [13]. It consists of a stochastic optimal control problem of an SDE with partial information for the follower, followed by a stochastic optimal control problem of a forward-backward stochastic differential equation (FBSDE) with complete information for the leader. This problem is new in differential game theory and is of interest for both theoretical analysis and practical applications, although it has intrinsic mathematical difficulties. (4) The Stackelberg equilibrium of this LQ problem is characterized in terms of forward-backward stochastic differential filtering equations (FBSDFEs), which arise naturally in our setup. These FBSDFEs are new and different from those in [10, 16]. (5) The Stackelberg equilibrium of this LQ problem is explicitly given, with the help of some new Riccati equations.

The rest of this chapter is organized as follows. In Section 2, we solve our problem to find the Stackelberg equilibrium. In Section 3, we apply our theoretical results to an LQ problem. Finally, Section 4 gives some concluding remarks.

2. Stackelberg equilibrium

2.1. The Follower’s problem

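In this subsection, we first solve SOCPF. For any chosen $u_2(\cdot) \in \mathcal{U}_2$, let $u_1^*(\cdot)$ be an optimal control for the follower and let the corresponding optimal state be $x^{u_1^*,u_2}(\cdot)$. Define the Hamiltonian function $H_1: \Omega \times [0,T] \times \mathbb{R} \times U_1 \times U_2 \times \mathbb{R} \times \mathbb{R} \times \mathbb{R} \to \mathbb{R}$ as

$$H_1(t, x, u_1, u_2, q, k, \tilde{k}) = q\,b(t, x, u_1, u_2) + k\,\sigma(t, x, u_1, u_2) + \tilde{k}\,\tilde{\sigma}(t, x, u_1, u_2) - g_1(t, x, u_1, u_2). \tag{21}$$

Let an $\mathcal{F}_t$-adapted process triple $(q(\cdot), k(\cdot), \tilde{k}(\cdot)) \in \mathbb{R} \times \mathbb{R} \times \mathbb{R}$ satisfy the adjoint BSDE

$$\left\{ \begin{aligned} -dq(t) &= \Big[ b_x\big( t, x^{u_1^*,u_2}(t), u_1^*(t), u_2(t) \big) q(t) + \sigma_x\big( t, x^{u_1^*,u_2}(t), u_1^*(t), u_2(t) \big) k(t) \\ &\qquad + \tilde{\sigma}_x\big( t, x^{u_1^*,u_2}(t), u_1^*(t), u_2(t) \big) \tilde{k}(t) - g_{1x}\big( t, x^{u_1^*,u_2}(t), u_1^*(t), u_2(t) \big) \Big]\,dt \\ &\quad - k(t)\,dW(t) - \tilde{k}(t)\,d\tilde{W}(t), \\ q(T) &= -G_{1x}\big( x^{u_1^*,u_2}(T) \big). \end{aligned} \right. \tag{22}$$

Proposition 2.1 Let (A1.1) hold. For any given $u_2(\cdot) \in \mathcal{U}_2$, let $u_1^*(\cdot)$ be the optimal control for SOCPF, and $x^{u_1^*,u_2}(\cdot)$ be the corresponding optimal state. Let $(q(\cdot), k(\cdot), \tilde{k}(\cdot))$ be the adjoint process triple. Then

$$E\Big[ \big\langle H_{1u_1}\big( t, x^{u_1^*,u_2}(t), u_1^*(t), u_2(t), q(t), k(t), \tilde{k}(t) \big),\ u_1 - u_1^*(t) \big\rangle \,\Big|\, \mathcal{G}_{1,t} \Big] \ge 0, \quad \text{a.e. } t \in [0,T], \text{ a.s.}, \tag{23}$$

holds for any $u_1 \in U_1$.

Proof Similar to the proof of Theorem 2.1 of [10], we can get the result.

Proposition 2.2 Let (A1.1) hold. For any given $u_2(\cdot)$, let $u_1^*(\cdot) \in \mathcal{U}_1$ and $x^{u_1^*,u_2}(\cdot)$ be the corresponding state. Let $(q(\cdot), k(\cdot), \tilde{k}(\cdot))$ be the adjoint process triple. Suppose that, for each $(t, \omega) \in [0,T] \times \Omega$, $H_1(t, \cdot, \cdot, u_2(t), q(t), k(t), \tilde{k}(t))$ is concave, $G_1(\cdot)$ is convex, and

$$E\Big[ H_1\big( t, x^{u_1^*,u_2}(t), u_1^*(t), u_2(t), q(t), k(t), \tilde{k}(t) \big) \,\Big|\, \mathcal{G}_{1,t} \Big] = \max_{u_1 \in U_1} E\Big[ H_1\big( t, x^{u_1^*,u_2}(t), u_1, u_2(t), q(t), k(t), \tilde{k}(t) \big) \,\Big|\, \mathcal{G}_{1,t} \Big] \tag{24}$$

holds for a.e. $t \in [0,T]$, a.s. Then $u_1^*(\cdot)$ is an optimal control for SOCPF.

Proof Similar to the proof of Theorem 2.3 of [10], we can obtain the result.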

2.2. The Leader’s problem

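In this subsection, we first state the SOCPL. Then, we give the maximum principle and verification theorem. For any $u_2(\cdot) \in \mathcal{U}_2$, by Eq. (23), we assume that a functional $u_1^*(t) = u_1^*\big( t; \hat{x}^{u_1^*,\hat{u}_2}(t), \hat{u}_2(t), \hat{q}(t), \hat{k}(t), \hat{\tilde{k}}(t) \big)$ is uniquely defined, where

$$\begin{aligned} \hat{x}^{u_1^*,\hat{u}_2}(t) &:= E\big[ x^{u_1^*,u_2}(t) \,\big|\, \mathcal{G}_{1,t} \big], \quad \hat{u}_2(t) := E\big[ u_2(t) \,\big|\, \mathcal{G}_{1,t} \big], \quad \hat{q}(t) := E\big[ q(t) \,\big|\, \mathcal{G}_{1,t} \big], \\ \hat{k}(t) &:= E\big[ k(t) \,\big|\, \mathcal{G}_{1,t} \big], \quad \hat{\tilde{k}}(t) := E\big[ \tilde{k}(t) \,\big|\, \mathcal{G}_{1,t} \big]. \end{aligned} \tag{25}$$

To simplify notation, we denote $x^{u_2}(\cdot) \equiv x^{u_1^*,u_2}(\cdot)$ and define $\phi^L$ on $\Omega \times [0,T] \times \mathbb{R} \times U_2$ as $\phi^L\big( t, x^{u_2}(t), u_2(t) \big) := \phi\big( t, x^{u_1^*,u_2}(t), u_1^*\big( t; \hat{x}^{u_1^*,\hat{u}_2}(t), \hat{u}_2(t), \hat{q}(t), \hat{k}(t), \hat{\tilde{k}}(t) \big), u_2(t) \big)$, for $\phi = b, \sigma, \tilde{\sigma}, g_1$, respectively. Then, after substituting the above control process $u_1^*(\cdot)$ into Eqs. (13) and (22), the leader encounters the controlled FBSDE system

$$\left\{ \begin{aligned} dx^{u_2}(t) &= b^L\big( t, x^{u_2}(t), u_2(t) \big)\,dt + \sigma^L\big( t, x^{u_2}(t), u_2(t) \big)\,dW(t) + \tilde{\sigma}^L\big( t, x^{u_2}(t), u_2(t) \big)\,d\tilde{W}(t), \\ -dq(t) &= \Big[ b_x^L\big( t, x^{u_2}(t), u_2(t) \big) q(t) + \sigma_x^L\big( t, x^{u_2}(t), u_2(t) \big) k(t) + \tilde{\sigma}_x^L\big( t, x^{u_2}(t), u_2(t) \big) \tilde{k}(t) \\ &\qquad - g_{1x}^L\big( t, x^{u_2}(t), u_2(t) \big) \Big]\,dt - k(t)\,dW(t) - \tilde{k}(t)\,d\tilde{W}(t), \\ x^{u_2}(0) &= x_0, \quad q(T) = -G_{1x}\big( x^{u_2}(T) \big). \end{aligned} \right. \tag{26}$$

Note that Eq. (26) is a controlled conditional mean-field FBSDE, which is now regarded as the “state” equation of the leader. That is to say, the state for the leader is the quadruple $(x^{u_2}(\cdot), q(\cdot), k(\cdot), \tilde{k}(\cdot))$.

Remark 2.1 The equality $u_1^*(t) = u_1^*\big( t; \hat{x}^{u_1^*,\hat{u}_2}(t), \hat{u}_2(t), \hat{q}(t), \hat{k}(t), \hat{\tilde{k}}(t) \big)$ does not hold in general. However, in the LQ case it is satisfied, and we will make this point clear in the next section.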

Define

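$$\begin{aligned} J_2^L\big( u_2(\cdot) \big) &:= J_2\big( u_1^*(\cdot), u_2(\cdot) \big) = E\left[ \int_0^T g_2\big( t, x^{u_1^*,u_2}(t), u_1^*(t), u_2(t) \big)\,dt + G_2\big( x^{u_1^*,u_2}(T) \big) \right] \\ &\equiv E\left[ \int_0^T g_2\Big( t, x^{u_1^*,u_2}(t), u_1^*\big( t; \hat{x}^{u_1^*,\hat{u}_2}(t), \hat{u}_2(t), \hat{q}(t), \hat{k}(t), \hat{\tilde{k}}(t) \big), u_2(t) \Big)\,dt + G_2\big( x^{u_1^*,u_2}(T) \big) \right] \\ &:= E\left[ \int_0^T g_2^L\big( t, x^{u_2}(t), u_2(t) \big)\,dt + G_2\big( x^{u_2}(T) \big) \right], \end{aligned} \tag{27}$$

where $g_2^L: \Omega \times [0,T] \times \mathbb{R} \times U_2 \to \mathbb{R}$. Note that the cost functional of the leader is also of conditional mean-field type. We state the stochastic optimal control problem with partial information of the leader as follows.

SOCPL. Find a $\mathcal{G}_{2,t}$-adapted control $u_2^*(\cdot) \in \mathcal{U}_2$, such that

$$J_2^L\big( u_2^*(\cdot) \big) = \inf_{u_2 \in \mathcal{U}_2} J_2^L\big( u_2(\cdot) \big), \tag{28}$$

subject to Eqs. (26) and (27). Such a $u_2^*(\cdot)$ is called an optimal control, and the corresponding solution $x^*(\cdot) \equiv x^{u_2^*}(\cdot)$ to Eq. (26) is called an optimal state process for the leader.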

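Let $u_2^*(\cdot)$ be an optimal control for the leader, and let the corresponding state $(x^*(\cdot), q^*(\cdot), k^*(\cdot), \tilde{k}^*(\cdot))$ be the solution to Eq. (26). Define the Hamiltonian function of the leader $H_2: \Omega \times [0,T] \times \mathbb{R} \times U_2 \times \mathbb{R} \times \mathbb{R} \times \mathbb{R} \times \mathbb{R} \times \mathbb{R} \times \mathbb{R} \times \mathbb{R} \times \mathbb{R} \to \mathbb{R}$ as

$$\begin{aligned} H_2\big( t, x^{u_2}, u_2, q, k, \tilde{k}, y, z, \tilde{z}, p \big) &= y\,b^L(t, x^{u_2}, u_2) + z\,\sigma^L(t, x^{u_2}, u_2) + \tilde{z}\,\tilde{\sigma}^L(t, x^{u_2}, u_2) + g_2^L(t, x^{u_2}, u_2) \\ &\quad - p \Big[ b_x^L(t, x^{u_2}, u_2)\,q + \sigma_x^L(t, x^{u_2}, u_2)\,k + \tilde{\sigma}_x^L(t, x^{u_2}, u_2)\,\tilde{k} - g_{1x}^L(t, x^{u_2}, u_2) \Big]. \end{aligned} \tag{29}$$

Let $\phi^{L*}(t) \equiv \phi^L\big( t, x^*(t), \hat{x}^*(t), u_2^*(t), \hat{u}_2^*(t) \big)$ for $\phi = b, \sigma, \tilde{\sigma}, g_1, g_2$ and all their derivatives. Suppose that $(y(\cdot), z(\cdot), \tilde{z}(\cdot), p(\cdot)) \in \mathbb{R} \times \mathbb{R} \times \mathbb{R} \times \mathbb{R}$ is the unique $\mathcal{F}_t$-adapted solution to the adjoint conditional mean-field FBSDE of the leader

$$\left\{ \begin{aligned} dp(t) &= \Big\{ b_x^{L*}(t) p(t) + E\big[ b_{\hat{x}}^{L*}(t) p(t) \,\big|\, \mathcal{G}_{1,t} \big] \Big\}\,dt + \Big\{ \sigma_x^{L*}(t) p(t) + E\big[ \sigma_{\hat{x}}^{L*}(t) p(t) \,\big|\, \mathcal{G}_{1,t} \big] \Big\}\,dW(t) \\ &\quad + \Big\{ \tilde{\sigma}_x^{L*}(t) p(t) + E\big[ \tilde{\sigma}_{\hat{x}}^{L*}(t) p(t) \,\big|\, \mathcal{G}_{1,t} \big] \Big\}\,d\tilde{W}(t), \quad p(0) = 0, \\ -dy(t) &= \Big\{ b_x^{L*}(t) y(t) + E\big[ b_{\hat{x}}^{L*}(t) y(t) \,\big|\, \mathcal{G}_{1,t} \big] + \sigma_x^{L*}(t) z(t) + E\big[ \sigma_{\hat{x}}^{L*}(t) z(t) \,\big|\, \mathcal{G}_{1,t} \big] \\ &\qquad + \tilde{\sigma}_x^{L*}(t) \tilde{z}(t) + E\big[ \tilde{\sigma}_{\hat{x}}^{L*}(t) \tilde{z}(t) \,\big|\, \mathcal{G}_{1,t} \big] - b_{xx}^{L*}(t) q^*(t) p(t) - E\big[ b_{x\hat{x}}^{L*}(t) q^*(t) p(t) \,\big|\, \mathcal{G}_{1,t} \big] \\ &\qquad - \sigma_{xx}^{L*}(t) k(t) p(t) - E\big[ \sigma_{x\hat{x}}^{L*}(t) k(t) p(t) \,\big|\, \mathcal{G}_{1,t} \big] - \tilde{\sigma}_{xx}^{L*}(t) \tilde{k}(t) p(t) - E\big[ \tilde{\sigma}_{x\hat{x}}^{L*}(t) \tilde{k}(t) p(t) \,\big|\, \mathcal{G}_{1,t} \big] \\ &\qquad + g_{1xx}^{L*}(t) p(t) + E\big[ g_{1x\hat{x}}^{L*}(t) p(t) \,\big|\, \mathcal{G}_{1,t} \big] + g_{2x}^{L*}(t) + E\big[ g_{2\hat{x}}^{L*}(t) \,\big|\, \mathcal{G}_{1,t} \big] \Big\}\,dt \\ &\quad - z(t)\,dW(t) - \tilde{z}(t)\,d\tilde{W}(t), \\ y(T) &= G_{1xx}\big( x^*(T) \big) p(T) + G_{2x}\big( x^*(T) \big). \end{aligned} \right. \tag{30}$$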

Now, we have the following two results.

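Proposition 2.3 Let (A1.1) hold. Let $u_2^*(\cdot) \in \mathcal{U}_2$ be an optimal control for SOCPL and $(x^*(\cdot), q^*(\cdot), k^*(\cdot), \tilde{k}^*(\cdot))$ be the optimal state. Let $(y(\cdot), z(\cdot), \tilde{z}(\cdot), p(\cdot))$ be the adjoint quadruple. Then

$$\begin{aligned} E\Big[ &H_{2u_2}\big( t, x^*(t), u_2^*(t), q^*(t), k^*(t), \tilde{k}^*(t), y(t), z(t), \tilde{z}(t), p(t) \big) \big( u_2 - u_2^*(t) \big) \\ &+ E\big[ H_{2\hat{u}_2}\big( t, x^*(t), u_2^*(t), q^*(t), k^*(t), \tilde{k}^*(t), y(t), z(t), \tilde{z}(t), p(t) \big) \,\big|\, \mathcal{G}_{1,t} \big] \big( \hat{u}_2 - \hat{u}_2^*(t) \big) \,\Big|\, \mathcal{G}_{2,t} \Big] \ge 0, \\ &\text{a.e. } t \in [0,T], \text{ a.s.}, \text{ for any } u_2 \in U_2. \end{aligned} \tag{31}$$

Proof The maximum condition Eq. (31) can be derived by convex variation and the adjoint technique, as in Andersson and Djehiche [17]. We omit the details to save space. See also Li [18], Yong [19] and the references therein for mean-field stochastic optimal control problems. □

Proposition 2.4 Let (A1.1) hold. Let $u_2^*(\cdot) \in \mathcal{U}_2$ and $(x^*(\cdot), q^*(\cdot), k^*(\cdot), \tilde{k}^*(\cdot))$ be the corresponding state, with $G_{1xx}(x) \equiv G_1 \in \mathcal{S}^n$. Let $(y(\cdot), z(\cdot), \tilde{z}(\cdot), p(\cdot))$ be the adjoint quadruple. For each $(t, \omega) \in [0,T] \times \Omega$, suppose that $H_2(t, \cdot, \cdot, \cdot, \cdot, \cdot, y(t), z(t), \tilde{z}(t), p(t))$ and $G_2(\cdot)$ are convex, and

$$\begin{aligned} E\Big[ &H_2\big( t, x^*(t), u_2^*(t), q^*(t), k^*(t), \tilde{k}^*(t), y(t), z(t), \tilde{z}(t), p(t) \big) \\ &+ E\big[ H_2\big( t, x^*(t), u_2^*(t), q^*(t), k^*(t), \tilde{k}^*(t), y(t), z(t), \tilde{z}(t), p(t) \big) \,\big|\, \mathcal{G}_{1,t} \big] \,\Big|\, \mathcal{G}_{2,t} \Big] \\ = \max_{u_2 \in U_2} E\Big[ &H_2\big( t, x^*(t), u_2, q^*(t), k^*(t), \tilde{k}^*(t), y(t), z(t), \tilde{z}(t), p(t) \big) \\ &+ E\big[ H_2\big( t, x^*(t), u_2, q^*(t), k^*(t), \tilde{k}^*(t), y(t), z(t), \tilde{z}(t), p(t) \big) \,\big|\, \mathcal{G}_{1,t} \big] \,\Big|\, \mathcal{G}_{2,t} \Big], \quad \text{a.e. } t \in [0,T], \text{ a.s.} \end{aligned} \tag{32}$$

Then $u_2^*(\cdot)$ is an optimal control for SOCPL.

Proof The proof is similar to that in Shi [20]. We omit the details for simplicity. □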

3. Applications to LQ case

In order to illustrate the theoretical results in Section 2, we study an LQ leader-follower stochastic differential game with asymmetric information. In this section, we let $\mathcal{G}_{1,t} := \sigma\{ \tilde{W}(s);\ 0 \le s \le t \}$ and $\mathcal{G}_{2,t} = \mathcal{F}_t$. This game is a special case of the one in Section 2, but the resulting derivation is technically demanding. We split this section into two subsections, to deal with the problems of the follower and the leader, respectively.

3.1. Problem of the follower

Suppose that the state $x^{u_1,u_2} \in \mathbb{R}$ satisfies a linear SDE

$$\left\{ \begin{aligned} dx^{u_1,u_2}(t) &= \big[ Ax^{u_1,u_2}(t) + B_1 u_1(t) + B_2 u_2(t) \big]\,dt + \big[ Cx^{u_1,u_2}(t) + D_1 u_1(t) + D_2 u_2(t) \big]\,dW(t) \\ &\quad + \big[ \tilde{C}x^{u_1,u_2}(t) + \tilde{D}_1 u_1(t) + \tilde{D}_2 u_2(t) \big]\,d\tilde{W}(t), \\ x^{u_1,u_2}(0) &= x_0. \end{aligned} \right. \tag{33}$$

Here, $u_1$ is the follower’s control process and $u_2$ is the leader’s control process, both taking values in $\mathbb{R}$; $A$, $C$, $\tilde{C}$, $B_1$, $D_1$, $\tilde{D}_1$, $B_2$, $D_2$, $\tilde{D}_2$ are constants. In the first step, for announced $u_2$, the follower would like to choose a $\mathcal{G}_{1,t}$-adapted, square-integrable control $u_1^*$ to minimize the cost functional

$$J_1(u_1, u_2) = \frac{1}{2} E\left[ \int_0^T \Big( Q_1 \big( x^{u_1,u_2}(t) \big)^2 + N_1 u_1(t)^2 \Big)\,dt + G_1 \big( x^{u_1,u_2}(T) \big)^2 \right]. \tag{34}$$

In the second step, knowing that the follower would take $u_1^*$, the leader wishes to choose an $\mathcal{F}_t$-adapted, square-integrable control $u_2^*$ to minimize

$$J_2(u_1^*, u_2) = \frac{1}{2} E\left[ \int_0^T \Big( Q_2 \big( x^{u_1^*,u_2}(t) \big)^2 + N_2 u_2(t)^2 \Big)\,dt + G_2 \big( x^{u_1^*,u_2}(T) \big)^2 \right], \tag{35}$$

where $Q_1, Q_2, G_1, G_2 \ge 0$, $N_1 \ge 0$, $N_2 > 0$ are constants. This is an LQ leader-follower stochastic differential game with asymmetric information. We wish to find its Stackelberg equilibrium $(u_1^*, u_2^*)$.
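As a rough numerical illustration of the LQ setup, the sketch below simulates the linear state equation, Eq. (33), by the Euler-Maruyama scheme and evaluates a quadratic cost of the form in Eq. (34) or (35) by Monte Carlo. It is not the solution method of this chapter: it assumes deterministic, piecewise-constant controls (which are trivially adapted to any filtration), and the function names and the coefficient ordering in `coef` are hypothetical.

```python
import numpy as np

def simulate_state(coef, x0, u1, u2, T, n_paths, seed=0):
    """Euler-Maruyama paths of Eq. (33).

    coef = (A, B1, B2, C, D1, D2, Ct, D1t, D2t), where the suffix "t"
    stands for the tilde coefficients multiplying dW~ (naming assumption).
    u1, u2 are 1-D arrays of length n_steps holding deterministic controls.
    """
    A, B1, B2, C, D1, D2, Ct, D1t, D2t = coef
    n_steps = len(u1)
    dt = T / n_steps
    rng = np.random.default_rng(seed)
    x = np.empty((n_paths, n_steps + 1))
    x[:, 0] = x0
    for k in range(n_steps):
        dW = rng.normal(0.0, np.sqrt(dt), n_paths)
        dW_tilde = rng.normal(0.0, np.sqrt(dt), n_paths)
        xk = x[:, k]
        drift = A * xk + B1 * u1[k] + B2 * u2[k]
        diff = C * xk + D1 * u1[k] + D2 * u2[k]
        diff_tilde = Ct * xk + D1t * u1[k] + D2t * u2[k]
        x[:, k + 1] = xk + drift * dt + diff * dW + diff_tilde * dW_tilde
    return x

def quadratic_cost(x, u, Q, N, G, T):
    """0.5 * E[ int_0^T (Q x^2 + N u^2) dt + G x(T)^2 ], left Riemann sum."""
    dt = T / (x.shape[1] - 1)
    running = Q * x[:, :-1] ** 2 + N * np.asarray(u) ** 2
    return float(0.5 * np.mean(np.sum(running, axis=1) * dt + G * x[:, -1] ** 2))

if __name__ == "__main__":
    coef = (-0.5, 1.0, 1.0, 0.2, 0.1, 0.1, 0.2, 0.1, 0.1)  # illustrative values
    n = 200
    u1, u2 = np.zeros(n), 0.3 * np.ones(n)
    x = simulate_state(coef, x0=1.0, u1=u1, u2=u2, T=1.0, n_paths=5000)
    print(quadratic_cost(x, u1, Q=1.0, N=1.0, G=1.0, T=1.0))  # follower's cost
    print(quadratic_cost(x, u2, Q=1.0, N=1.0, G=1.0, T=1.0))  # leader's cost
```

Such a simulation can serve as a sanity check when comparing candidate controls, although it does not capture the restriction of $u_1$ to $\mathcal{G}_{1,t}$-adapted processes for genuinely random controls.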

Define the Hamiltonian function of the follower as

$$\begin{aligned} H_1(t, x, u_1, u_2, q, k, \tilde{k}) &= q \big( Ax + B_1 u_1 + B_2 u_2 \big) + k \big( Cx + D_1 u_1 + D_2 u_2 \big) \\ &\quad + \tilde{k} \big( \tilde{C}x + \tilde{D}_1 u_1 + \tilde{D}_2 u_2 \big) - \frac{1}{2} Q_1 x^2 - \frac{1}{2} N_1 u_1^2. \end{aligned} \tag{36}$$

For given control $u_2$, suppose that there exists a $\mathcal{G}_{1,t}$-adapted optimal control $u_1^*$ of the follower, and the corresponding optimal state is $x^{u_1^*,u_2}$. By Proposition 2.1, Eq. (36) yields that

$$0 = N_1 u_1^*(t) - B_1 \hat{q}(t) - D_1 \hat{k}(t) - \tilde{D}_1 \hat{\tilde{k}}(t), \tag{37}$$

where the $\mathcal{F}_t$-adapted process triple $(q, k, \tilde{k}) \in \mathbb{R} \times \mathbb{R} \times \mathbb{R}$ satisfies the BSDE

$$\left\{ \begin{aligned} -dq(t) &= \big[ Aq(t) + Ck(t) + \tilde{C}\tilde{k}(t) - Q_1 x^{u_1^*,u_2}(t) \big]\,dt - k(t)\,dW(t) - \tilde{k}(t)\,d\tilde{W}(t), \\ q(T) &= -G_1 x^{u_1^*,u_2}(T). \end{aligned} \right. \tag{38}$$

We wish to obtain the state feedback form of $u_1^*$. Noting the terminal condition of Eq. (38) and the appearance of $u_2$, we set

$$q(t) = -P(t) x^{u_1^*,u_2}(t) - \varphi(t), \quad t \in [0,T], \tag{39}$$

for some deterministic and differentiable $\mathbb{R}$-valued function $P(t)$ and an $\mathbb{R}$-valued, $\mathcal{F}_t$-adapted process $\varphi$ which satisfies the BSDE

$$d\varphi(t) = \alpha(t)\,dt + \beta(t)\,d\tilde{W}(t), \quad \varphi(T) = 0. \tag{40}$$

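In the above equation, $\alpha \in \mathbb{R}$, $\beta \in \mathbb{R}$ are $\mathcal{F}_t$-adapted processes, which are to be determined later. Now, applying Itô’s formula to Eq. (39), we have

$$\begin{aligned} -dq(t) &= \Big[ \dot{P}(t) x^{u_1^*,u_2}(t) + P(t) A x^{u_1^*,u_2}(t) + \alpha(t) + P(t) B_1 u_1^*(t) + P(t) B_2 u_2(t) \Big]\,dt \\ &\quad + P(t) \big[ Cx^{u_1^*,u_2}(t) + D_1 u_1^*(t) + D_2 u_2(t) \big]\,dW(t) \\ &\quad + \Big\{ P(t) \big[ \tilde{C}x^{u_1^*,u_2}(t) + \tilde{D}_1 u_1^*(t) + \tilde{D}_2 u_2(t) \big] + \beta(t) \Big\}\,d\tilde{W}(t). \end{aligned} \tag{41}$$

Comparing Eq. (41) with Eq. (38), we arrive at

$$\begin{aligned} k(t) &= -P(t) \big[ Cx^{u_1^*,u_2}(t) + D_1 u_1^*(t) + D_2 u_2(t) \big], \\ \tilde{k}(t) &= -P(t) \big[ \tilde{C}x^{u_1^*,u_2}(t) + \tilde{D}_1 u_1^*(t) + \tilde{D}_2 u_2(t) \big] - \beta(t), \end{aligned} \tag{42}$$

and

$$\alpha(t) = -\big[ \dot{P}(t) + 2AP(t) + Q_1 \big] x^{u_1^*,u_2}(t) - A\varphi(t) - P(t) B_1 u_1^*(t) - P(t) B_2 u_2(t) + Ck(t) + \tilde{C}\tilde{k}(t), \tag{43}$$

respectively. Taking $E[\,\cdot \mid \mathcal{G}_{1,t}]$ on both sides of Eqs. (39) and (42), and using that $u_1^*$ is $\mathcal{G}_{1,t}$-adapted, we get

$$\hat{q}(t) = -P(t) \hat{x}^{u_1^*,\hat{u}_2}(t) - \hat{\varphi}(t), \tag{44}$$

and

$$\begin{aligned} \hat{k}(t) &= -P(t) \big[ C\hat{x}^{u_1^*,\hat{u}_2}(t) + D_1 u_1^*(t) + D_2 \hat{u}_2(t) \big], \\ \hat{\tilde{k}}(t) &= -P(t) \big[ \tilde{C}\hat{x}^{u_1^*,\hat{u}_2}(t) + \tilde{D}_1 u_1^*(t) + \tilde{D}_2 \hat{u}_2(t) \big] - \hat{\beta}(t), \end{aligned} \tag{45}$$

respectively. Applying Lemma 5.4 in [21] to Eqs. (33) and (38) corresponding to $u_1^*$, we derive the optimal filtering equation

$$\left\{ \begin{aligned} d\hat{x}^{u_1^*,\hat{u}_2}(t) &= \big[ A\hat{x}^{u_1^*,\hat{u}_2}(t) + B_1 u_1^*(t) + B_2 \hat{u}_2(t) \big]\,dt + \big[ \tilde{C}\hat{x}^{u_1^*,\hat{u}_2}(t) + \tilde{D}_1 u_1^*(t) + \tilde{D}_2 \hat{u}_2(t) \big]\,d\tilde{W}(t), \\ -d\hat{q}(t) &= \big[ A\hat{q}(t) + C\hat{k}(t) + \tilde{C}\hat{\tilde{k}}(t) - Q_1 \hat{x}^{u_1^*,\hat{u}_2}(t) \big]\,dt - \hat{\tilde{k}}(t)\,d\tilde{W}(t), \\ \hat{x}^{u_1^*,\hat{u}_2}(0) &= x_0, \quad \hat{q}(T) = -G_1 \hat{x}^{u_1^*,\hat{u}_2}(T). \end{aligned} \right. \tag{46}$$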

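Note that Eq. (46) is not a classical FBSDFE, since the generator of the BSDE depends on an additional process $\hat{k}$. For given $u_2$, it is important to know whether Eq. (46) admits a unique $\mathcal{G}_{1,t}$-adapted solution $(\hat{x}^{u_1^*,\hat{u}_2}, \hat{q}, \hat{k}, \hat{\tilde{k}})$. We will make this clear soon. To this end, first, by Eq. (37) and supposing that

(A2.1) $\tilde{N}_1(t) := N_1 + D_1^2 P(t) + \tilde{D}_1^2 P(t) > 0$, $\forall t \in [0,T]$,

we immediately arrive at

$$u_1^*(t) = -\tilde{N}_1^{-1}(t) \Big[ \tilde{S}_1(t) \hat{x}^{u_1^*,\hat{u}_2}(t) + \tilde{S}(t) \hat{u}_2(t) + B_1 \hat{\varphi}(t) + \tilde{D}_1 \hat{\beta}(t) \Big], \tag{47}$$

where $\tilde{S}_1(t) := (B_1 + CD_1 + \tilde{C}\tilde{D}_1) P(t)$ and $\tilde{S}(t) := (D_1 D_2 + \tilde{D}_1 \tilde{D}_2) P(t)$. Substituting Eqs. (42) and (47) into Eq. (43), we obtain that if

$$\left\{ \begin{aligned} &\dot{P}(t) + \big( 2A + C^2 + \tilde{C}^2 \big) P(t) - \big( B_1 + CD_1 + \tilde{C}\tilde{D}_1 \big)^2 \big( N_1 + D_1^2 P(t) + \tilde{D}_1^2 P(t) \big)^{-1} P(t)^2 + Q_1 = 0, \\ &P(T) = G_1, \end{aligned} \right. \tag{48}$$

admits a unique differentiable solution $P(t)$, then

$$\begin{aligned} \alpha(t) &= -\tilde{S}_1^2(t) \tilde{N}_1^{-1}(t) x^{u_1^*,u_2}(t) + \tilde{S}_1^2(t) \tilde{N}_1^{-1}(t) \hat{x}^{u_1^*,\hat{u}_2}(t) - A\varphi(t) + \tilde{S}_1(t) \tilde{N}_1^{-1}(t) B_1 \hat{\varphi}(t) \\ &\quad - \tilde{S}_2(t) u_2(t) + \tilde{S}_1(t) \tilde{N}_1^{-1}(t) \tilde{S}(t) \hat{u}_2(t) - \tilde{C}\beta(t) + \tilde{S}_1(t) \tilde{N}_1^{-1}(t) \tilde{D}_1 \hat{\beta}(t), \end{aligned} \tag{49}$$

where $\tilde{S}_2(t) := (B_2 + CD_2 + \tilde{C}\tilde{D}_2) P(t)$. By (A2.1), we know that Eq. (48) admits a unique solution $P(t) > 0$ from standard Riccati equation theory [22]. In particular, if $\tilde{C} = \tilde{D}_1 = 0$, Eq. (48) reduces to

$$\left\{ \begin{aligned} &\dot{P}(t) + \big( 2A + C^2 \big) P(t) - \big( B_1 + CD_1 \big)^2 \big( N_1 + D_1^2 P(t) \big)^{-1} P(t)^2 + Q_1 = 0, \\ &P(T) = G_1, \quad N_1 + D_1^2 P(t) > 0, \end{aligned} \right. \tag{50}$$

which recovers the standard one in [22]. With Eq. (49), the BSDE Eq. (40) takes the form

$$\left\{ \begin{aligned} -d\varphi(t) &= \Big[ \tilde{S}_1^2(t) \tilde{N}_1^{-1}(t) x^{u_1^*,u_2}(t) - \tilde{S}_1^2(t) \tilde{N}_1^{-1}(t) \hat{x}^{u_1^*,\hat{u}_2}(t) + A\varphi(t) - \tilde{S}_1(t) \tilde{N}_1^{-1}(t) B_1 \hat{\varphi}(t) \\ &\qquad + \tilde{C}\beta(t) - \tilde{S}_1(t) \tilde{N}_1^{-1}(t) \tilde{D}_1 \hat{\beta}(t) + \tilde{S}_2(t) u_2(t) - \tilde{S}_1(t) \tilde{N}_1^{-1}(t) \tilde{S}(t) \hat{u}_2(t) \Big]\,dt - \beta(t)\,d\tilde{W}(t), \\ \varphi(T) &= 0. \end{aligned} \right. \tag{51}$$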
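Since Eq. (48) is a scalar ordinary differential equation with a terminal condition, it can be integrated numerically backward in time. The following is a minimal sketch under stated assumptions: it uses a classical fourth-order Runge-Kutta scheme on a uniform grid, the function name `solve_riccati` and the argument names (with the suffix `t` standing for tilde coefficients) are hypothetical, and assumption (A2.1) is only checked on the grid points after integration.

```python
import numpy as np

def solve_riccati(A, B1, C, D1, Ct, D1t, Q1, N1, G1, T, n_steps=2000):
    """Integrate the Riccati equation (48) backward from P(T) = G1 with RK4.

    Returns the uniform time grid t and the values P[k] ~ P(t[k]).
    """
    a = 2.0 * A + C**2 + Ct**2          # linear coefficient 2A + C^2 + C~^2
    s = (B1 + C * D1 + Ct * D1t) ** 2   # (B1 + C D1 + C~ D1~)^2
    d = D1**2 + D1t**2                  # N1~(t) = N1 + d * P(t)

    def rhs(P):
        # In reversed time tau = T - t, Eq. (48) reads
        # dP/dtau = a P - s P^2 / (N1 + d P) + Q1.
        return a * P - s * P**2 / (N1 + d * P) + Q1

    h = T / n_steps
    P = np.empty(n_steps + 1)
    P[-1] = G1
    for k in range(n_steps, 0, -1):
        p = P[k]
        k1 = rhs(p)
        k2 = rhs(p + 0.5 * h * k1)
        k3 = rhs(p + 0.5 * h * k2)
        k4 = rhs(p + h * k3)
        P[k - 1] = p + h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    if np.any(N1 + d * P <= 0.0):
        raise ValueError("Assumption (A2.1) fails at some grid point")
    t = np.linspace(0.0, T, n_steps + 1)
    return t, P

if __name__ == "__main__":
    # Illustrative coefficients only.
    t, P = solve_riccati(A=-0.5, B1=1.0, C=0.2, D1=0.1, Ct=0.2, D1t=0.1,
                         Q1=1.0, N1=1.0, G1=1.0, T=1.0)
    print(P[0], P[-1])
```

When $B_1 + CD_1 + \tilde{C}\tilde{D}_1 = 0$ the quadratic term vanishes and Eq. (48) becomes linear, with solution $P(t) = (G_1 + Q_1/a) e^{a(T-t)} - Q_1/a$ for $a = 2A + C^2 + \tilde{C}^2 \ne 0$, which gives a simple check of the integrator.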

Moreover, for given u 2 , plugging Eq. (47) into the forward equation of Eq. (46), and letting

A ˜ t ≔ A − B 1 N ˜ 1 − 1 t S ˜ 1 t , C ˜ ˜ t ≔ C ˜ − D ˜ 1 N ˜ 1 − 1 t S ˜ 1 t , B ˜ 2 t ≔ B 2 − B 1 N ˜ 1 − 1 t S ˜ 1 t , F ˜ 1 t ≔ − B 1 N ˜ 1 − 1 t B 1 , B ˜ 1 t ≔ − B 1 N ˜ 1 − 1 t D ˜ 1 , F ˜ 3 t ≔ − D ˜ 1 N ˜ 1 − 1 t D ˜ 1 , D ˜ ˜ 2 t ≔ D ˜ 2 − D ˜ 1 N ˜ 1 − 1 t S ˜ t , E52

we have

d x ̂ u 1 ∗ , u ̂ 2 t = A ˜ t x ̂ u 1 ∗ , u ̂ 2 t + F ˜ 1 t φ ̂ t + B ˜ 1 t β ̂ t + B ˜ 2 t u ̂ 2 t dt + C ˜ ˜ t x ̂ u 1 ∗ , u ̂ 2 t + B ˜ 1 t φ ̂ t + F ˜ 3 t β ̂ t + D ˜ ˜ 2 t u ̂ 2 t d W ˜ t , x ̂ u 1 ∗ , u ̂ 2 0 = x 0 , E53

which admits a unique $\mathcal G_{1,t}$-adapted solution $\hat x^{u_1^*,\hat u_2}$ for given $(\hat\varphi,\hat\beta)$. Applying Lemma 5.4 in [21] to Eq. (51) again, we have

$$
-d\hat\varphi(t) = \big[\tilde A(t)\hat\varphi(t) + \tilde{\tilde C}(t)\hat\beta(t) + \tilde F_4(t)\hat u_2(t)\big]dt - \hat\beta(t)\,d\tilde W(t),\qquad \hat\varphi(T) = 0,
\tag{54}
$$

where $\tilde F_4(t) := \tilde S_2(t) - \tilde S_1(t)\tilde N_1^{-1}(t)\tilde S(t)$. For given $\hat u_2$, Eq. (54) admits a unique solution $(\hat\varphi,\hat\beta)$ by standard BSDE theory. Putting Eqs. (53) and (54) together, we get

$$
\begin{aligned}
d\hat x^{u_1^*,\hat u_2}(t) ={}& \big[\tilde A(t)\hat x^{u_1^*,\hat u_2}(t) + \tilde F_1(t)\hat\varphi(t) + \tilde B_1(t)\hat\beta(t) + \tilde B_2(t)\hat u_2(t)\big]dt\\
&+ \big[\tilde{\tilde C}(t)\hat x^{u_1^*,\hat u_2}(t) + \tilde B_1(t)\hat\varphi(t) + \tilde F_3(t)\hat\beta(t) + \tilde{\tilde D}_2(t)\hat u_2(t)\big]d\tilde W(t),\\
-d\hat\varphi(t) ={}& \big[\tilde A(t)\hat\varphi(t) + \tilde{\tilde C}(t)\hat\beta(t) + \tilde F_4(t)\hat u_2(t)\big]dt - \hat\beta(t)\,d\tilde W(t),\\
\hat x^{u_1^*,\hat u_2}(0) ={}& x_0,\qquad \hat\varphi(T) = 0,
\end{aligned}
\tag{55}
$$

which admits a unique $\mathcal G_{1,t}$-adapted solution $(\hat x^{u_1^*,\hat u_2},\hat\varphi,\hat\beta)$. By Eqs. (55), (44), (45), and (47), we obtain the unique solvability of Eq. (46). Moreover, one can check that the convexity/concavity conditions in Proposition 2.2 hold, so $u_1^*$ given by Eq. (47) is indeed optimal. We summarize the above procedure in the following theorem.

Theorem 3.1 Let (A2.1) hold and let $P(t)$ satisfy Eq. (48). For a chosen $u_2$ of the leader, $u_1^*$ given by Eq. (47) is the optimal control of the follower, where $(\hat x^{u_1^*,\hat u_2},\hat\varphi,\hat\beta)$ is the unique $\mathcal G_{1,t}$-adapted solution to Eq. (55).
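
The passage from Eq. (51) to Eq. (54) can be spelled out term by term. The following is only a sketch, under assumptions that are not restated in this section: the hat denotes the conditional expectation $\hat\xi(t) = E[\xi(t)\,|\,\mathcal G_{1,t}]$, the coefficients are deterministic and scalar (so they commute and pass through the conditional expectation), and Lemma 5.4 in [21] justifies filtering the equation termwise. Time arguments are suppressed.

```latex
\begin{aligned}
E\big[x^{u_1^*,u_2} - \hat x^{u_1^*,\hat u_2}\,\big|\,\mathcal G_{1,t}\big] &= 0
  && \text{(the two } \tilde S_1^2\tilde N_1^{-1} \text{ terms cancel)},\\
A\hat\varphi - \tilde S_1\tilde N_1^{-1}B_1\hat\varphi
  &= \big(A - B_1\tilde N_1^{-1}\tilde S_1\big)\hat\varphi = \tilde A\,\hat\varphi,\\
\big(\tilde C - \tilde S_1\tilde N_1^{-1}\tilde D_1\big)\hat\beta
  &= \big(\tilde C - \tilde D_1\tilde N_1^{-1}\tilde S_1\big)\hat\beta = \tilde{\tilde C}\,\hat\beta,\\
\tilde S_2\hat u_2 - \tilde S_1\tilde N_1^{-1}\tilde S\,\hat u_2
  &= \big(\tilde S_2 - \tilde S_1\tilde N_1^{-1}\tilde S\big)\hat u_2 = \tilde F_4\,\hat u_2 .
\end{aligned}
```

Summing the four lines gives exactly the drift of Eq. (54); the terminal condition $\hat\varphi(T)=0$ is inherited from $\varphi(T)=0$.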

3.2. Problem of the leader

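Since the leader knows that the follower will take $u_1^*$ given by Eq. (47), the state equation of the leader reads

$$
\begin{aligned}
dx^{u_2}(t) ={}& \big[Ax^{u_2}(t) + (\tilde A(t)-A)\hat x^{\hat u_2}(t) + \tilde F_1(t)\hat\varphi(t) + \tilde B_1(t)\hat\beta(t) + B_2u_2(t) + (\tilde B_2(t)-B_2)\hat u_2(t)\big]dt\\
&+ \big[Cx^{u_2}(t) + \tilde F_5(t)\hat x^{\hat u_2}(t) + \tilde{\tilde B}_1(t)\hat\varphi(t) + \tilde{\tilde D}_1(t)\hat\beta(t) + D_2u_2(t) + \tilde F_2(t)\hat u_2(t)\big]dW(t)\\
&+ \big[\tilde Cx^{u_2}(t) + (\tilde{\tilde C}(t)-\tilde C)\hat x^{\hat u_2}(t) + \tilde B_1(t)\hat\varphi(t) + \tilde F_3(t)\hat\beta(t) + \tilde D_2u_2(t) + (\tilde{\tilde D}_2(t)-\tilde D_2)\hat u_2(t)\big]d\tilde W(t),\\
-d\hat\varphi(t) ={}& \big[\tilde A(t)\hat\varphi(t) + \tilde{\tilde C}(t)\hat\beta(t) + \tilde F_4(t)\hat u_2(t)\big]dt - \hat\beta(t)\,d\tilde W(t),\\
x^{u_2}(0) ={}& x_0,\qquad \hat\varphi(T) = 0,
\end{aligned}
\tag{56}
$$

where $x^{u_2}\equiv x^{u_1^*,u_2}$, $\hat x^{\hat u_2}\equiv\hat x^{u_1^*,\hat u_2}$, and $\tilde{\tilde B}_1(t) := -B_1\tilde N_1^{-1}(t)D_1$, $\tilde{\tilde D}_1(t) := -D_1\tilde N_1^{-1}(t)\tilde D_1$, $\tilde F_5(t) := -D_1\tilde N_1^{-1}(t)\tilde S_1(t)$, $\tilde F_2(t) := -D_1\tilde N_1^{-1}(t)\tilde S(t)$. Since Eq. (56) is a decoupled conditional mean-field FBSDE, its solvability for an $\mathcal F_t$-adapted solution $(x^{u_2},\hat\varphi,\hat\beta)$ is easily guaranteed.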

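The problem of the leader is to choose an $\mathcal F_t$-adapted optimal control $u_2^*$ such that the cost functional

$$
J_2(u_2) = \frac12 E\Big[\int_0^T\Big(Q_2\big(x^{u_2}(t)\big)^2 + N_2\big(u_2(t)\big)^2\Big)dt + G_2\big(x^{u_2}(T)\big)^2\Big]
\tag{57}
$$

is minimized. Define the Hamiltonian function of the leader as

$$
\begin{aligned}
H_2\big(t,x^{u_2},u_2,\hat\varphi,\hat\beta,y,z,\tilde z,p\big) ={}& \frac12\Big[Q_2\big(x^{u_2}\big)^2 + N_2u_2^2\Big]\\
&+ y\big[Ax^{u_2} + (\tilde A(t)-A)\hat x^{\hat u_2} + \tilde F_1(t)\hat\varphi + \tilde B_1(t)\hat\beta + B_2u_2 + (\tilde B_2(t)-B_2)\hat u_2\big]\\
&+ p\big[\tilde A(t)\hat\varphi + \tilde{\tilde C}(t)\hat\beta + \tilde F_4(t)\hat u_2\big]\\
&+ z\big[Cx^{u_2} + \tilde F_5(t)\hat x^{\hat u_2} + \tilde{\tilde B}_1(t)\hat\varphi + \tilde{\tilde D}_1(t)\hat\beta + D_2u_2 + \tilde F_2(t)\hat u_2\big]\\
&+ \tilde z\big[\tilde Cx^{u_2} + (\tilde{\tilde C}(t)-\tilde C)\hat x^{\hat u_2} + \tilde B_1(t)\hat\varphi + \tilde F_3(t)\hat\beta + \tilde D_2u_2 + (\tilde{\tilde D}_2(t)-\tilde D_2)\hat u_2\big].
\end{aligned}
\tag{58}
$$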

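Suppose that there exists an $\mathcal F_t$-adapted optimal control $u_2^*$ of the leader, and that the corresponding optimal state is $(x^*,\hat\varphi^*,\hat\beta^*)\equiv(x^{u_2^*},\hat\varphi^*,\hat\beta^*)$. Then, by Propositions 2.3 and 2.4, Eq. (58) yields

$$
0 = N_2u_2^*(t) + \tilde F_4(t)\hat p(t) + B_2y(t) + (\tilde B_2(t)-B_2)\hat y(t) + D_2z(t) + \tilde F_2(t)\hat z(t) + \tilde D_2\tilde z(t) + (\tilde{\tilde D}_2(t)-\tilde D_2)\hat{\tilde z}(t),
\tag{59}
$$

where the $\mathcal F_t$-adapted process $(p,y,z,\tilde z)$ satisfies

$$
\begin{aligned}
dp(t) ={}& \big[\tilde A(t)p(t) + \tilde F_1(t)y(t) + \tilde{\tilde B}_1(t)z(t) + \tilde B_1(t)\tilde z(t)\big]dt\\
&+ \big[\tilde{\tilde C}(t)p(t) + \tilde B_1(t)y(t) + \tilde{\tilde D}_1(t)z(t) + \tilde F_3(t)\tilde z(t)\big]d\tilde W(t),\\
-dy(t) ={}& \big[Ay(t) + (\tilde A(t)-A)\hat y(t) + Cz(t) + \tilde F_5(t)\hat z(t) + \tilde C\tilde z(t) + (\tilde{\tilde C}(t)-\tilde C)\hat{\tilde z}(t) + Q_2x^*(t)\big]dt\\
&- z(t)\,dW(t) - \tilde z(t)\,d\tilde W(t),\\
p(0) ={}& 0,\qquad y(T) = G_2x^*(T).
\end{aligned}
\tag{60}
$$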

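In fact, the problem of the leader can also be solved by directly computing the derivative of the cost functional. Without loss of generality, let $x_0\equiv 0$, and consider the perturbed control $u_2^*+\epsilon u_2$ for $\epsilon>0$ sufficiently small, with $u_2\in\mathbb R$. Then, by the linearity of Eqs. (56) and (60), the solution to Eq. (56) is $x^*+\epsilon x^{u_2}$. We first have

$$
\begin{aligned}
\tilde J(\epsilon) := J_2(u_2^*+\epsilon u_2) ={}& \frac12 E\int_0^T\Big[\big\langle Q_2\big(x^*(t)+\epsilon x^{u_2}(t)\big),\,x^*(t)+\epsilon x^{u_2}(t)\big\rangle\\
&\qquad + \big\langle N_2\big(u_2^*(t)+\epsilon u_2(t)\big),\,u_2^*(t)+\epsilon u_2(t)\big\rangle\Big]dt\\
&+ \frac12 E\big\langle G_2\big(x^*(T)+\epsilon x^{u_2}(T)\big),\,x^*(T)+\epsilon x^{u_2}(T)\big\rangle.
\end{aligned}
\tag{61}
$$

Hence

$$
0 = \frac{\partial\tilde J(\epsilon)}{\partial\epsilon}\bigg|_{\epsilon=0} = E\int_0^T\Big[\big\langle Q_2x^*(t),x^{u_2}(t)\big\rangle + \big\langle N_2u_2^*(t),u_2(t)\big\rangle\Big]dt + E\big\langle G_2x^*(T),x^{u_2}(T)\big\rangle.
\tag{62}
$$

Let the $\mathcal F_t$-adapted process quadruple $(p,y,z,\tilde z)$ satisfy Eq. (60). Then we have

$$
0 = E\int_0^T\Big[\big\langle Q_2x^*(t),x^{u_2}(t)\big\rangle + \big\langle N_2u_2^*(t),u_2(t)\big\rangle\Big]dt + E\big\langle y(T),x^{u_2}(T)\big\rangle.
\tag{63}
$$

Applying Itô’s formula to $x^{u_2}(t)y(t) - p(t)\hat\varphi(t)$ and noting Eqs. (56) and (60), we derive

$$
\begin{aligned}
0 ={}& E\int_0^T\big[Q_2x^*(t) + Ay(t) + Cz(t) + \tilde C\tilde z(t)\big]x^{u_2}(t)\,dt\\
&+ E\int_0^T\big[(\tilde A(t)-A)y(t) + \tilde F_5(t)z(t) + (\tilde{\tilde C}(t)-\tilde C)\tilde z(t)\big]\hat x^{\hat u_2}(t)\,dt\\
&+ E\int_0^T\big[N_2u_2^*(t) + B_2y(t) + D_2z(t) + \tilde D_2\tilde z(t)\big]u_2(t)\,dt\\
&+ E\int_0^T\big[(\tilde B_2(t)-B_2)y(t) + \tilde F_2(t)z(t) + (\tilde{\tilde D}_2(t)-\tilde D_2)\tilde z(t)\big]\hat u_2(t)\,dt\\
&- E\int_0^T\big[Q_2x^*(t) + Ay(t) + (\tilde A(t)-A)\hat y(t) + Cz(t) + \tilde C\tilde z(t) + \tilde F_5(t)\hat z(t) + (\tilde{\tilde C}(t)-\tilde C)\hat{\tilde z}(t)\big]x^{u_2}(t)\,dt\\
&+ E\int_0^T\big[\tilde F_1(t)y(t) + \tilde{\tilde B}_1(t)z(t) + \tilde B_1(t)\tilde z(t)\big]\hat\varphi(t)\,dt
 + E\int_0^T\big[\tilde B_1(t)y(t) + \tilde{\tilde D}_1(t)z(t) + \tilde F_3(t)\tilde z(t)\big]\hat\beta(t)\,dt\\
&+ E\int_0^T p(t)\big[\tilde A(t)\hat\varphi(t) + \tilde{\tilde C}(t)\hat\beta(t)\big]dt
 - E\int_0^T\hat\varphi(t)\big[\tilde A(t)p(t) + \tilde F_1(t)y(t) + \tilde{\tilde B}_1(t)z(t) + \tilde B_1(t)\tilde z(t)\big]dt\\
&- E\int_0^T\hat\beta(t)\big[\tilde{\tilde C}(t)p(t) + \tilde B_1(t)y(t) + \tilde{\tilde D}_1(t)z(t) + \tilde F_3(t)\tilde z(t)\big]dt
 + E\int_0^T p(t)\tilde F_4(t)\hat u_2(t)\,dt\\
={}& E\int_0^T\big[N_2u_2^*(t) + B_2y(t) + D_2z(t) + \tilde D_2\tilde z(t)\big]u_2(t)\,dt\\
&+ E\int_0^T\big[(\tilde B_2(t)-B_2)y(t) + \tilde F_2(t)z(t) + (\tilde{\tilde D}_2(t)-\tilde D_2)\tilde z(t)\big]\hat u_2(t)\,dt
 + E\int_0^T p(t)\tilde F_4(t)\hat u_2(t)\,dt\\
={}& E\int_0^T\Big\langle N_2u_2^*(t) + \tilde F_4(t)\hat p(t) + B_2y(t) + (\tilde B_2(t)-B_2)\hat y(t) + D_2z(t) + \tilde F_2(t)\hat z(t)\\
&\qquad\quad + \tilde D_2\tilde z(t) + (\tilde{\tilde D}_2(t)-\tilde D_2)\hat{\tilde z}(t),\;u_2(t)\Big\rangle dt.
\end{aligned}
\tag{64}
$$

Since $u_2$ is arbitrary, this implies Eq. (59).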
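
The last equality in Eq. (64), and the cancellation of the $x^{u_2}$ and $\hat x^{\hat u_2}$ terms before it, rest on one elementary property of conditional expectations. The following is a sketch under the assumptions that $\hat\xi(t) = E[\xi(t)\,|\,\mathcal G_{1,t}]$ with $\mathcal G_{1,t}\subseteq\mathcal F_t$, that the coefficients are deterministic, and that $\zeta$ is a generic integrable $\mathcal F_t$-adapted process (the symbol $\zeta$ is introduced here only for illustration).

```latex
\begin{aligned}
E\big[\zeta(t)\,\hat u_2(t)\big]
 &= E\Big[E\big[\zeta(t)\,\hat u_2(t)\,\big|\,\mathcal G_{1,t}\big]\Big]
  = E\big[\hat\zeta(t)\,\hat u_2(t)\big] \\
 &= E\Big[E\big[\hat\zeta(t)\,u_2(t)\,\big|\,\mathcal G_{1,t}\big]\Big]
  = E\big[\hat\zeta(t)\,u_2(t)\big].
\end{aligned}
```

Taking $\zeta = (\tilde B_2-B_2)y + \tilde F_2 z + (\tilde{\tilde D}_2-\tilde D_2)\tilde z + \tilde F_4 p$ moves the hat from $\hat u_2$ onto the adjoint processes, which produces the terms $\hat y$, $\hat z$, $\hat{\tilde z}$ and $\hat p$ in the last line of Eq. (64).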

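In the following, we wish to obtain a “nonanticipating” representation for the optimal controls $u_2^*$ and $u_1^*$. To this end, we regard $(x^*,p)^\top$ as the optimal state and put

$$
X = \begin{pmatrix}x^*\\ p\end{pmatrix},\quad
Y = \begin{pmatrix}y\\ \hat\varphi^*\end{pmatrix},\quad
Z = \begin{pmatrix}z\\ 0\end{pmatrix},\quad
\tilde Z = \begin{pmatrix}\tilde z\\ \hat\beta^*\end{pmatrix},
\tag{65}
$$

and (suppressing some $t$ below, and overloading symbols so that each left-hand side denotes the new vector- or matrix-valued coefficient built from the scalar coefficients on the right)

$$
\begin{aligned}
&A_1 := \begin{pmatrix}A & 0\\ 0 & \tilde A(t)\end{pmatrix},\quad
A_2 := \begin{pmatrix}\tilde A(t)-A & 0\\ 0 & 0\end{pmatrix},\quad
\tilde B_1 := \begin{pmatrix}0 & \tilde B_1(t)\\ \tilde B_1(t) & 0\end{pmatrix},\quad
\tilde{\tilde B}_1 := \begin{pmatrix}0 & 0\\ \tilde{\tilde B}_1(t) & 0\end{pmatrix},\\
&B_2 := \begin{pmatrix}B_2\\ 0\end{pmatrix},\quad
\tilde B_2 := \begin{pmatrix}\tilde B_2(t)-B_2\\ 0\end{pmatrix},\quad
C_1 := \begin{pmatrix}C & 0\\ 0 & 0\end{pmatrix},\quad
\tilde C_1 := \begin{pmatrix}\tilde C & 0\\ 0 & \tilde{\tilde C}(t)\end{pmatrix},\quad
\tilde C_2 := \begin{pmatrix}\tilde{\tilde C}(t)-\tilde C & 0\\ 0 & 0\end{pmatrix},\\
&\tilde{\tilde D}_1 := \begin{pmatrix}0 & \tilde{\tilde D}_1(t)\\ 0 & 0\end{pmatrix},\quad
\tilde D_2 := \begin{pmatrix}\tilde D_2\\ 0\end{pmatrix},\quad
\tilde{\tilde D}_2 := \begin{pmatrix}\tilde{\tilde D}_2(t)-\tilde D_2\\ 0\end{pmatrix},\quad
D_2 := \begin{pmatrix}D_2\\ 0\end{pmatrix},\quad
G_2 := \begin{pmatrix}G_2 & 0\\ 0 & 0\end{pmatrix},\\
&\tilde F_1 := \begin{pmatrix}0 & \tilde F_1(t)\\ \tilde F_1(t) & 0\end{pmatrix},\quad
\tilde F_2 := \begin{pmatrix}\tilde F_2(t)\\ 0\end{pmatrix},\quad
X_0 := \begin{pmatrix}x_0\\ 0\end{pmatrix},\quad
\tilde F_3 := \begin{pmatrix}0 & \tilde F_3(t)\\ \tilde F_3(t) & 0\end{pmatrix},\\
&\tilde F_4 := \begin{pmatrix}0\\ \tilde F_4(t)\end{pmatrix},\quad
\tilde F_5 := \begin{pmatrix}\tilde F_5(t) & 0\\ 0 & 0\end{pmatrix},\quad
Q_2 := \begin{pmatrix}Q_2 & 0\\ 0 & 0\end{pmatrix}.
\end{aligned}
\tag{66}
$$

With this notation, Eq. (56) together with Eq. (60) can be rewritten as

$$
\begin{aligned}
dX(t) ={}& \big[A_1X(t) + A_2\hat X(t) + \tilde F_1Y(t) + \tilde{\tilde B}_1Z(t) + \tilde B_1\tilde Z(t) + B_2u_2^*(t) + \tilde B_2\hat u_2^*(t)\big]dt\\
&+ \big[C_1X(t) + \tilde F_5\hat X(t) + \tilde{\tilde B}_1^\top Y(t) + \tilde{\tilde D}_1\tilde Z(t) + D_2u_2^*(t) + \tilde F_2\hat u_2^*(t)\big]dW(t)\\
&+ \big[\tilde C_1X(t) + \tilde C_2\hat X(t) + \tilde B_1^\top Y(t) + \tilde{\tilde D}_1^\top Z(t) + \tilde F_3\tilde Z(t) + \tilde D_2u_2^*(t) + \tilde{\tilde D}_2\hat u_2^*(t)\big]d\tilde W(t),\\
-dY(t) ={}& \big[Q_2X(t) + A_1^\top Y(t) + A_2^\top\hat Y(t) + C_1^\top Z(t) + \tilde F_5^\top\hat Z(t) + \tilde C_1^\top\tilde Z(t) + \tilde C_2^\top\hat{\tilde Z}(t) + \tilde F_4\hat u_2^*(t)\big]dt\\
&- Z(t)\,dW(t) - \tilde Z(t)\,d\tilde W(t),\\
X(0) ={}& X_0,\qquad Y(T) = G_2X(T).
\end{aligned}
\tag{67}
$$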
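
The stacking in Eqs. (65) to (67) can be checked numerically. The sketch below is only an illustration: it freezes the scalar coefficients at one time $t$, fills them with random numbers, and confirms that the $dt$-coefficient of $dX$ in Eq. (67) reproduces the drift of $x$ in Eq. (56) and of $p$ in Eq. (60). The dictionary keys (`A_t` for $\tilde A(t)$, `B1_tt` for $\tilde{\tilde B}_1(t)$, and so on) and all function names are our own hypothetical spellings, not notation from the chapter.

```python
import numpy as np

def build_blocks(c):
    """Assemble the drift blocks of Eq. (66) from scalars frozen at one time t.

    Keys ending in `_t` / `_tt` are hypothetical spellings of single- /
    double-tilde symbols, e.g. c["A_t"] stands for A-tilde(t).
    """
    return {
        "A1": np.array([[c["A"], 0.0], [0.0, c["A_t"]]]),
        "A2": np.array([[c["A_t"] - c["A"], 0.0], [0.0, 0.0]]),
        "F1": np.array([[0.0, c["F1_t"]], [c["F1_t"], 0.0]]),
        "B1_tt": np.array([[0.0, 0.0], [c["B1_tt"], 0.0]]),
        "B1_t": np.array([[0.0, c["B1_t"]], [c["B1_t"], 0.0]]),
        "B2": np.array([c["B2"], 0.0]),
        "B2_t": np.array([c["B2_t"] - c["B2"], 0.0]),
    }

def stacked_drift(blk, X, X_hat, Y, Z, Z_til, u2, u2_hat):
    """dt-coefficient of dX in the stacked form, Eq. (67)."""
    return (blk["A1"] @ X + blk["A2"] @ X_hat + blk["F1"] @ Y
            + blk["B1_tt"] @ Z + blk["B1_t"] @ Z_til
            + blk["B2"] * u2 + blk["B2_t"] * u2_hat)

def scalar_drift(c, s):
    """dt-coefficients of dx in Eq. (56) and of dp in Eq. (60), written by hand."""
    dx = (c["A"] * s["x"] + (c["A_t"] - c["A"]) * s["x_hat"]
          + c["F1_t"] * s["phi_hat"] + c["B1_t"] * s["beta_hat"]
          + c["B2"] * s["u2"] + (c["B2_t"] - c["B2"]) * s["u2_hat"])
    dp = (c["A_t"] * s["p"] + c["F1_t"] * s["y"]
          + c["B1_tt"] * s["z"] + c["B1_t"] * s["z_til"])
    return np.array([dx, dp])

def check_stacking(seed=0):
    """Compare both drifts for random coefficients and random process values."""
    rng = np.random.default_rng(seed)
    c = {k: rng.normal() for k in
         ["A", "A_t", "F1_t", "B1_t", "B1_tt", "B2", "B2_t"]}
    s = {k: rng.normal() for k in
         ["x", "x_hat", "p", "p_hat", "y", "z", "z_til",
          "phi_hat", "beta_hat", "u2", "u2_hat"]}
    X = np.array([s["x"], s["p"]])
    X_hat = np.array([s["x_hat"], s["p_hat"]])
    Y = np.array([s["y"], s["phi_hat"]])
    Z = np.array([s["z"], 0.0])
    Z_til = np.array([s["z_til"], s["beta_hat"]])
    lhs = stacked_drift(build_blocks(c), X, X_hat, Y, Z, Z_til,
                        s["u2"], s["u2_hat"])
    return bool(np.allclose(lhs, scalar_drift(c, s)))
```

Calling `check_stacking()` with a few different seeds is a quick way to catch a misplaced entry when the block matrices are typed into code; the diffusion blocks can be checked in the same manner.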

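Noting Eq. (59), we have

$$
\begin{aligned}
u_2^*(t) ={}& -N_2^{-1}\big[\tilde F_4^\top\hat X(t) + B_2^\top Y(t) + \tilde B_2^\top\hat Y(t) + D_2^\top Z(t) + \tilde F_2^\top\hat Z(t) + \tilde D_2^\top\tilde Z(t) + \tilde{\tilde D}_2^\top\hat{\tilde Z}(t)\big],\\
\hat u_2^*(t) ={}& -N_2^{-1}\big[\tilde F_4^\top\hat X(t) + (B_2+\tilde B_2)^\top\hat Y(t) + (D_2+\tilde F_2)^\top\hat Z(t) + (\tilde D_2+\tilde{\tilde D}_2)^\top\hat{\tilde Z}(t)\big].
\end{aligned}
\tag{68}
$$

Inserting Eq. (68) into Eq. (67), we get

$$
\begin{aligned}
dX(t) ={}& \big[A_1X(t) + \bar A_2\hat X(t) + \bar{\tilde F}_1Y(t) + \bar B_2\hat Y(t) + B_3Z(t) + \bar{\bar B}_2\hat Z(t) + \bar{\tilde B}_1\tilde Z(t) + \bar{\tilde B}_2\hat{\tilde Z}(t)\big]dt\\
&+ \big[C_1X(t) + \bar{\tilde F}_5\hat X(t) + B_3^\top Y(t) + \bar D_2\hat Y(t) + \tilde{\tilde D}_2Z(t) + \bar{\bar D}_2\hat Z(t) + D_3\tilde Z(t) + \bar{\tilde D}_2\hat{\tilde Z}(t)\big]dW(t)\\
&+ \big[\tilde C_1X(t) + \bar{\tilde C}_2\hat X(t) + \bar{\tilde B}_1^\top Y(t) + \bar D_3\hat Y(t) + D_3^\top Z(t) + \bar{\bar D}_3^\top\hat Z(t) + \bar{\tilde F}_3\tilde Z(t) + \bar{\tilde D}_3\hat{\tilde Z}(t)\big]d\tilde W(t),\\
-dY(t) ={}& \big[Q_2X(t) + \bar{\tilde F}_4\hat X(t) + A_1^\top Y(t) + \bar A_2^\top\hat Y(t) + \tilde C_1^\top\tilde Z(t) + C_1^\top Z(t) + \bar{\tilde F}_5^\top\hat Z(t) + \bar{\tilde C}_2^\top\hat{\tilde Z}(t)\big]dt\\
&- Z(t)\,dW(t) - \tilde Z(t)\,d\tilde W(t),\\
X(0) ={}& X_0,\qquad Y(T) = G_2X(T),
\end{aligned}
\tag{69}
$$

where

$$
\begin{aligned}
\bar A_2 &:= A_2 - (B_2+\tilde B_2)N_2^{-1}\tilde F_4^\top, &
\bar{\tilde B}_1 &:= \tilde B_1 - B_2N_2^{-1}\tilde D_2^\top,\\
\bar B_2 &:= -B_2N_2^{-1}\tilde B_2^\top - \tilde B_2N_2^{-1}(B_2+\tilde B_2)^\top, &
\bar{\bar B}_2 &:= -B_2N_2^{-1}\tilde F_2^\top - \tilde B_2N_2^{-1}(D_2+\tilde F_2)^\top,\\
\bar{\tilde B}_2 &:= -B_2N_2^{-1}\tilde{\tilde D}_2^\top - \tilde B_2N_2^{-1}(\tilde D_2+\tilde{\tilde D}_2)^\top, &
B_3 &:= \tilde{\tilde B}_1 - B_2N_2^{-1}D_2^\top,\\
\bar{\tilde C}_2 &:= \tilde C_2 - (\tilde D_2+\tilde{\tilde D}_2)N_2^{-1}\tilde F_4^\top, &
\bar{\tilde D}_2 &:= -D_2N_2^{-1}\tilde{\tilde D}_2^\top - \tilde F_2N_2^{-1}(\tilde D_2+\tilde{\tilde D}_2)^\top,\\
\tilde{\tilde D}_2 &:= -D_2N_2^{-1}D_2^\top, &
\bar D_2 &:= -D_2N_2^{-1}\tilde B_2^\top - \tilde F_2N_2^{-1}(B_2+\tilde B_2)^\top,\\
\bar{\bar D}_2 &:= -D_2N_2^{-1}\tilde F_2^\top - \tilde F_2N_2^{-1}(D_2+\tilde F_2)^\top, &
D_3 &:= \tilde{\tilde D}_1 - D_2N_2^{-1}\tilde D_2^\top,\\
\bar D_3 &:= -\tilde D_2N_2^{-1}\tilde B_2^\top - \tilde{\tilde D}_2N_2^{-1}(B_2+\tilde B_2)^\top, &
\bar{\bar D}_3 &:= -\tilde D_2N_2^{-1}\tilde F_2^\top - \tilde{\tilde D}_2N_2^{-1}(D_2+\tilde F_2)^\top,\\
\bar{\tilde D}_3 &:= -\tilde D_2N_2^{-1}\tilde{\tilde D}_2^\top - \tilde{\tilde D}_2N_2^{-1}(\tilde D_2+\tilde{\tilde D}_2)^\top, &
\bar{\tilde F}_1 &:= \tilde F_1 - B_2N_2^{-1}B_2^\top,\\
\bar{\tilde F}_3 &:= \tilde F_3 - \tilde D_2N_2^{-1}\tilde D_2^\top, &
\bar{\tilde F}_4 &:= -\tilde F_4N_2^{-1}\tilde F_4^\top,\\
\bar{\tilde F}_5 &:= \tilde F_5 - (D_2+\tilde F_2)N_2^{-1}\tilde F_4^\top.
\end{aligned}
\tag{70}
$$

Here the symbol $\tilde{\tilde D}_2$ is used in two senses: on the left-hand side of its definition in Eq. (70), and wherever it multiplies $Z$ or $P_1$ below, it denotes the $2\times 2$ coefficient of $Z(t)$ in the $dW$ term of Eq. (69); on the right-hand sides of Eq. (70) it denotes the vector defined in Eq. (66).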
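
The barred drift coefficients in Eq. (70) come from purely algebraic bookkeeping, so they can be sanity-checked numerically. The sketch below is an illustration under our own assumptions: the blocks of Eq. (66) are replaced by random arrays of the right shape, $N_2$ is a positive scalar, and the processes are replaced by random 2-vectors (the identity being tested is algebraic, so no conditional-expectation structure is needed). All dictionary keys (`B2t` for $\tilde B_2$, `Zth` for $\hat{\tilde Z}$, and so on) and function names are hypothetical spellings.

```python
import numpy as np

def outer_n(a, b, N2):
    """Return a N2^{-1} b^T for 2-vectors a, b and a scalar N2."""
    return np.outer(a, b) / N2

def direct_drift(m, v, N2):
    """dt-coefficient of dX in Eq. (67) with u2*, u2_hat* taken from Eq. (68)."""
    u2 = -(m["F4"] @ v["Xh"] + m["B2"] @ v["Y"] + m["B2t"] @ v["Yh"]
           + m["D2"] @ v["Z"] + m["F2"] @ v["Zh"]
           + m["D2t"] @ v["Zt"] + m["D2tt"] @ v["Zth"]) / N2
    u2h = -(m["F4"] @ v["Xh"] + (m["B2"] + m["B2t"]) @ v["Yh"]
            + (m["D2"] + m["F2"]) @ v["Zh"]
            + (m["D2t"] + m["D2tt"]) @ v["Zth"]) / N2
    return (m["A1"] @ v["X"] + m["A2"] @ v["Xh"] + m["F1"] @ v["Y"]
            + m["B1tt"] @ v["Z"] + m["B1t"] @ v["Zt"]
            + m["B2"] * u2 + m["B2t"] * u2h)

def barred_drift(m, v, N2):
    """dt-coefficient of dX in Eq. (69), using the definitions of Eq. (70)."""
    B2, B2t = m["B2"], m["B2t"]
    A2_bar = m["A2"] - outer_n(B2 + B2t, m["F4"], N2)
    F1_bar = m["F1"] - outer_n(B2, B2, N2)
    B2_bar = -outer_n(B2, B2t, N2) - outer_n(B2t, B2 + B2t, N2)
    B3 = m["B1tt"] - outer_n(B2, m["D2"], N2)
    B2_barbar = (-outer_n(B2, m["F2"], N2)
                 - outer_n(B2t, m["D2"] + m["F2"], N2))
    B1_bar = m["B1t"] - outer_n(B2, m["D2t"], N2)
    B2_tbar = (-outer_n(B2, m["D2tt"], N2)
               - outer_n(B2t, m["D2t"] + m["D2tt"], N2))
    return (m["A1"] @ v["X"] + A2_bar @ v["Xh"] + F1_bar @ v["Y"]
            + B2_bar @ v["Yh"] + B3 @ v["Z"] + B2_barbar @ v["Zh"]
            + B1_bar @ v["Zt"] + B2_tbar @ v["Zth"])

def check_substitution(seed=0, N2=2.0):
    """Random test that substituting Eq. (68) into Eq. (67) gives Eq. (69)."""
    rng = np.random.default_rng(seed)
    m = {k: rng.normal(size=(2, 2)) for k in ["A1", "A2", "F1", "B1tt", "B1t"]}
    m.update({k: rng.normal(size=2)
              for k in ["B2", "B2t", "D2", "F2", "D2t", "D2tt", "F4"]})
    v = {k: rng.normal(size=2)
         for k in ["X", "Xh", "Y", "Yh", "Z", "Zh", "Zt", "Zth"]}
    return bool(np.allclose(direct_drift(m, v, N2), barred_drift(m, v, N2)))
```

Only the drift of the forward equation is covered here; the $dW$, $d\tilde W$ and backward coefficients of Eq. (70) can be tested with the same pattern.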

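We need to decouple Eq. (69). Similar to Eq. (39), put

$$
Y(t) = P_1(t)X(t) + P_2(t)\hat X(t),\qquad t\in[0,T],
\tag{71}
$$

where $P_1(t)$, $P_2(t)$ are differentiable, deterministic $2\times 2$ matrix-valued functions with $P_1(T) = G_2$, $P_2(T) = 0$. Applying Lemma 5.4 in [21] to the forward equation in Eq. (69), we obtain

$$
\begin{aligned}
d\hat X(t) ={}& \big[(A_1+\bar A_2)\hat X(t) + (\bar{\tilde F}_1+\bar B_2)\hat Y(t) + (B_3+\bar{\bar B}_2)\hat Z(t) + (\bar{\tilde B}_1+\bar{\tilde B}_2)\hat{\tilde Z}(t)\big]dt\\
&+ \big[(\tilde C_1+\bar{\tilde C}_2)\hat X(t) + (\bar{\tilde B}_1^\top+\bar D_3)\hat Y(t) + (D_3^\top+\bar{\bar D}_3)\hat Z(t) + (\bar{\tilde F}_3+\bar{\tilde D}_3)\hat{\tilde Z}(t)\big]d\tilde W(t),\\
\hat X(0) ={}& X_0.
\end{aligned}
\tag{72}
$$

Applying Itô’s formula to Eq. (71), we get

$$
\begin{aligned}
dY(t) ={}& \Big\{\big[\dot P_1 + P_1A_1 + P_1\bar{\tilde F}_1P_1\big]X(t) + \big[\dot P_2 + P_1\bar A_2 + P_1\bar B_2P_1 + P_2(A_1+\bar A_2) + P_2(\bar{\tilde F}_1+\bar B_2)P_1\\
&\quad + P_1(\bar{\tilde F}_1+\bar B_2)P_2 + P_2(\bar{\tilde F}_1+\bar B_2)P_2\big]\hat X(t) + P_1B_3Z(t) + P_1\bar{\tilde B}_1\tilde Z(t)\\
&\quad + \big[P_1\bar{\bar B}_2 + P_2(B_3+\bar{\bar B}_2)\big]\hat Z(t) + \big[P_1\bar{\tilde B}_2 + P_2(\bar{\tilde B}_1+\bar{\tilde B}_2)\big]\hat{\tilde Z}(t)\Big\}dt\\
&+ \Big\{\big[P_1C_1 + P_1B_3^\top P_1\big]X(t) + \big[P_1\bar{\tilde F}_5 + P_1B_3^\top P_2 + P_1\bar D_2(P_1+P_2)\big]\hat X(t)\\
&\quad + P_1\tilde{\tilde D}_2Z(t) + P_1\bar{\bar D}_2\hat Z(t) + P_1D_3\tilde Z(t) + P_1\bar{\tilde D}_2\hat{\tilde Z}(t)\Big\}dW(t)\\
&+ \Big\{\big[P_1\tilde C_1 + P_1\bar{\tilde B}_1^\top P_1\big]X(t) + \big[P_1\bar{\tilde C}_2 + P_2(\bar{\tilde B}_1^\top+\bar D_3)P_1 + P_1(\bar{\tilde B}_1^\top+\bar D_3)P_2\\
&\quad + P_2(\bar{\tilde B}_1^\top+\bar D_3)P_2 + P_2(\tilde C_1+\bar{\tilde C}_2) + P_1\bar D_3P_1\big]\hat X(t) + P_1D_3^\top Z(t) + P_1\bar{\tilde F}_3\tilde Z(t)\\
&\quad + \big[P_1\bar{\bar D}_3^\top + P_2(D_3^\top+\bar{\bar D}_3)\big]\hat Z(t) + \big[P_1\bar{\tilde D}_3^\top + P_2(\bar{\tilde F}_3+\bar{\tilde D}_3)\big]\hat{\tilde Z}(t)\Big\}d\tilde W(t)\\
={}& -\Big\{\big[Q_2 + A_1^\top P_1\big]X(t) + \big[\bar{\tilde F}_4 + \bar A_2^\top P_1 + A_1^\top P_2 + \bar A_2^\top P_2\big]\hat X(t) + C_1^\top Z(t) + \bar{\tilde F}_5^\top\hat Z(t)\\
&\quad + \tilde C_1^\top\tilde Z(t) + \bar{\tilde C}_2^\top\hat{\tilde Z}(t)\Big\}dt + Z(t)\,dW(t) + \tilde Z(t)\,d\tilde W(t).
\end{aligned}
\tag{73}
$$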

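Comparing the coefficients of $dW(t)$ and $d\tilde W(t)$ on both sides of Eq. (73), we have

$$
\begin{aligned}
Z(t) ={}& \big[P_1C_1 + P_1B_3^\top P_1\big]X(t) + \big[P_1\bar{\tilde F}_5 + P_1B_3^\top P_2 + P_1\bar D_2(P_1+P_2)\big]\hat X(t)\\
&+ P_1\tilde{\tilde D}_2Z(t) + P_1\bar{\bar D}_2\hat Z(t) + P_1D_3\tilde Z(t) + P_1\bar{\tilde D}_2\hat{\tilde Z}(t),\\
\tilde Z(t) ={}& \big[P_1\tilde C_1 + P_1\bar{\tilde B}_1^\top P_1\big]X(t) + \big[P_2(\tilde C_1+\bar{\tilde C}_2) + P_2(\bar{\tilde B}_1^\top+\bar D_3)P_1 + P_1(\bar{\tilde B}_1^\top+\bar D_3)P_2\\
&+ P_2(\bar{\tilde B}_1^\top+\bar D_3)P_2 + P_1\bar{\tilde C}_2 + P_1\bar D_3P_1\big]\hat X(t) + P_1D_3^\top Z(t) + \big[P_1\bar{\bar D}_3^\top + P_2(D_3^\top+\bar{\bar D}_3)\big]\hat Z(t)\\
&+ P_1\bar{\tilde F}_3\tilde Z(t) + \big[P_1\bar{\tilde D}_3^\top + P_2(\bar{\tilde F}_3+\bar{\tilde D}_3)\big]\hat{\tilde Z}(t).
\end{aligned}
\tag{74}
$$

Taking the conditional expectation $E[\,\cdot\,|\,\mathcal G_{1,t}]$, we derive

$$
\begin{aligned}
\hat Z(t) ={}& \big[P_1(C_1+\bar{\tilde F}_5) + P_1(B_3^\top+\bar D_2)P_1 + P_1(B_3^\top+\bar D_2)P_2\big]\hat X(t) + P_1(\tilde{\tilde D}_2+\bar{\bar D}_2)\hat Z(t) + P_1(D_3+\bar{\tilde D}_2)\hat{\tilde Z}(t),\\
\hat{\tilde Z}(t) ={}& \big[P_1(\tilde C_1+\bar{\tilde C}_2) + P_1(\bar{\tilde B}_1^\top+\bar D_3)P_1 + P_2(\tilde C_1+\bar{\tilde C}_2) + P_2(\bar{\tilde B}_1^\top+\bar D_3)P_1\\
&+ P_1(\bar{\tilde B}_1^\top+\bar D_3)P_2 + P_2(\bar{\tilde B}_1^\top+\bar D_3)P_2\big]\hat X(t) + (P_1+P_2)(D_3^\top+\bar{\bar D}_3)\hat Z(t) + (P_1+P_2)(\bar{\tilde F}_3+\bar{\tilde D}_3)\hat{\tilde Z}(t).
\end{aligned}
\tag{75}
$$

Supposing that ($I_2$ denotes the $2\times 2$ unit matrix)

$$
\begin{aligned}
\text{(A2.2)}\quad \tilde N_2^{-1} :={}& \big[I_2 - P_1(\tilde{\tilde D}_2+\bar{\bar D}_2)\big]^{-1}\quad\text{and}\\
\tilde{\tilde N}_2^{-1} :={}& \big[I_2 - (P_1+P_2)(D_3^\top+\bar{\bar D}_3)\tilde N_2^{-1}P_1(D_3+\bar{\tilde D}_2) - (P_1+P_2)(\bar{\tilde F}_3+\bar{\tilde D}_3)\big]^{-1}\quad\text{exist},
\end{aligned}
\tag{76}
$$

we get

$$
\hat Z(t) = \Sigma_0(P_1,P_2)\hat X(t),\qquad \hat{\tilde Z}(t) = \tilde\Sigma_0(P_1,P_2)\hat X(t),
\tag{77}
$$

where

$$
\begin{aligned}
\Sigma_0(P_1,P_2) :={}& \tilde N_2^{-1}\Big\{P_1(D_3+\bar{\tilde D}_2)\tilde{\tilde N}_2^{-1}(P_1+P_2)\Big[\tilde C_1+\bar{\tilde C}_2 + (\bar{\tilde B}_1^\top+\bar D_3)(P_1+P_2)\\
&\quad + (D_3^\top+\bar{\bar D}_3)\tilde N_2^{-1}P_1\big(C_1+\bar{\tilde F}_5 + (B_3^\top+\bar D_2)(P_1+P_2)\big)\Big]\\
&\quad + P_1\big[C_1+\bar{\tilde F}_5 + (B_3^\top+\bar D_2)(P_1+P_2)\big]\Big\},\\
\tilde\Sigma_0(P_1,P_2) :={}& \tilde{\tilde N}_2^{-1}(P_1+P_2)\Big[\tilde C_1+\bar{\tilde C}_2 + (\bar{\tilde B}_1^\top+\bar D_3)(P_1+P_2)\\
&\quad + (D_3^\top+\bar{\bar D}_3)\tilde N_2^{-1}P_1\big(C_1+\bar{\tilde F}_5 + (B_3^\top+\bar D_2)(P_1+P_2)\big)\Big].
\end{aligned}
\tag{78}
$$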
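
The elimination leading from Eq. (75) to Eqs. (77) and (78) is a block (Schur-complement) solve of a linear system in $(\hat Z,\hat{\tilde Z})$. The sketch below illustrates this under our own shorthand, which is not part of the chapter: `K1`, `K2` stand for the matrices multiplying $\hat X$ in the two lines of Eq. (75), `M1 = P1(D̃̃2 + D̄̄2)`, `M2 = P1(D3 + D̃̄2)`, `M3 = (P1+P2)(D3ᵀ + D̄̄3)` and `M4 = (P1+P2)(F̃̄3 + D̃̄3)`. Random small matrices replace the actual coefficients, so that the inverses in (A2.2) exist.

```python
import numpy as np

def sigma0_pair(K1, K2, M1, M2, M3, M4):
    """Closed-form gains of Eq. (78), written with the aggregated blocks.

    Z_hat  = K1 X_hat + M1 Z_hat + M2 Zt_hat
    Zt_hat = K2 X_hat + M3 Z_hat + M4 Zt_hat
    """
    I = np.eye(K1.shape[0])
    N_t = np.linalg.inv(I - M1)                    # N-tilde_2^{-1} in (A2.2)
    N_tt = np.linalg.inv(I - M3 @ N_t @ M2 - M4)   # N-double-tilde_2^{-1}
    Sigma0_t = N_tt @ (K2 + M3 @ N_t @ K1)         # gain of Zt_hat
    Sigma0 = N_t @ (M2 @ Sigma0_t + K1)            # gain of Z_hat
    return Sigma0, Sigma0_t

def solve_directly(K1, K2, M1, M2, M3, M4, X_hat):
    """Solve the same 2n x 2n linear system in one shot, for comparison."""
    n = K1.shape[0]
    I = np.eye(n)
    lhs = np.block([[I - M1, -M2], [-M3, I - M4]])
    rhs = np.concatenate([K1 @ X_hat, K2 @ X_hat])
    sol = np.linalg.solve(lhs, rhs)
    return sol[:n], sol[n:]

def check_elimination(seed=0, n=2, scale=0.1):
    """Compare the closed form with the direct solve on random small blocks."""
    rng = np.random.default_rng(seed)
    K1, K2, M1, M2, M3, M4 = (scale * rng.normal(size=(n, n)) for _ in range(6))
    X_hat = rng.normal(size=n)
    Sigma0, Sigma0_t = sigma0_pair(K1, K2, M1, M2, M3, M4)
    Z_hat, Zt_hat = solve_directly(K1, K2, M1, M2, M3, M4, X_hat)
    return bool(np.allclose(Sigma0 @ X_hat, Z_hat)
                and np.allclose(Sigma0_t @ X_hat, Zt_hat))
```

In practice the direct solve is often numerically preferable to forming the two inverses, but the closed form is what enters the Riccati system below.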

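Inserting Eq. (77) into Eq. (74), we have (the arguments $(P_1,P_2)$ of $\Sigma_0$ and $\tilde\Sigma_0$ are suppressed)

$$
\begin{aligned}
Z(t) ={}& \big[P_1C_1 + P_1B_3^\top P_1\big]X(t) + P_1\big[\bar{\tilde F}_5 + B_3^\top P_2 + \bar D_2(P_1+P_2) + \bar{\bar D}_2\Sigma_0 + \bar{\tilde D}_2\tilde\Sigma_0\big]\hat X(t)\\
&+ P_1\tilde{\tilde D}_2Z(t) + P_1D_3\tilde Z(t),\\
\tilde Z(t) ={}& \big[P_1\tilde C_1 + P_1\bar{\tilde B}_1^\top P_1\big]X(t) + \Big\{P_2(\tilde C_1+\bar{\tilde C}_2) + P_1\bar{\tilde C}_2 + P_2(\bar{\tilde B}_1^\top+\bar D_3)P_1 + P_1(\bar{\tilde B}_1^\top+\bar D_3)P_2\\
&+ P_1\bar D_3P_1 + P_2(\bar{\tilde B}_1^\top+\bar D_3)P_2 + \big[P_1\bar{\bar D}_3^\top + P_2(D_3^\top+\bar{\bar D}_3)\big]P_1\bar{\bar D}_2\Sigma_0\\
&+ \big[P_1\bar{\tilde D}_3^\top + P_2(\bar{\tilde F}_3+\bar{\tilde D}_3)\big]P_1\bar{\tilde D}_2\tilde\Sigma_0\Big\}\hat X(t) + P_1D_3^\top Z(t) + P_1\bar{\tilde F}_3\tilde Z(t).
\end{aligned}
\tag{79}
$$

Supposing that

$$
\begin{aligned}
\text{(A2.3)}\quad \bar N_2^{-1} :={}& \big[I_2 - P_1\tilde{\tilde D}_2\big]^{-1} = \big[I_2 + P_1D_2N_2^{-1}D_2^\top\big]^{-1}\quad\text{and}\\
\bar{\bar N}_2^{-1} :={}& \big[I_2 - P_1D_3^\top\bar N_2^{-1}P_1D_3 - P_1\bar{\tilde F}_3\big]^{-1}\\
={}& \Big[I_2 - P_1\big(\tilde{\tilde D}_1 - D_2N_2^{-1}\tilde D_2^\top\big)^\top\big(I_2 + P_1D_2N_2^{-1}D_2^\top\big)^{-1}P_1\big(\tilde{\tilde D}_1 - D_2N_2^{-1}\tilde D_2^\top\big)\\
&\quad - P_1\big(\tilde F_3 - \tilde D_2N_2^{-1}\tilde D_2^\top\big)\Big]^{-1}\quad\text{exist},
\end{aligned}
\tag{80}
$$

we get

$$
Z(t) = \Sigma_1(P_1,P_2)X(t) + \Sigma_2(P_1,P_2)\hat X(t),\qquad
\tilde Z(t) = \tilde\Sigma_1(P_1,P_2)X(t) + \tilde\Sigma_2(P_1,P_2)\hat X(t),
\tag{81}
$$

where

$$
\begin{aligned}
\Sigma_1(P_1,P_2) :={}& \bar N_2^{-1}P_1\Big[C_1 + B_3^\top P_1 + D_3P_1\big(C_1 + \bar{\tilde B}_1^\top P_1\big) + D_3\bar{\bar N}_2^{-1}P_1D_3^\top\bar N_2^{-1}P_1\big(C_1 + B_3^\top P_1\big)\Big],\\
\tilde\Sigma_1(P_1,P_2) :={}& \bar{\bar N}_2^{-1}P_1\Big[C_1 + \bar{\tilde B}_1^\top P_1 + D_3^\top\bar N_2^{-1}P_1\big(C_1 + B_3^\top P_1\big)\Big],\\
\Sigma_2(P_1,P_2) :={}& \bar N_2^{-1}P_1\Big\{\bar{\tilde F}_5 + B_3^\top P_2 + \bar D_2(P_1+P_2) + \bar{\bar D}_2\Sigma_0 + \bar{\tilde D}_2\tilde\Sigma_0\\
&+ D_3\bar{\bar N}_2^{-1}\Big[P_2(\tilde C_1+\bar{\tilde C}_2) + P_2(\bar{\tilde B}_1^\top+\bar D_3)P_1 + P_1(\bar{\tilde B}_1^\top+\bar D_3)P_2 + P_1\bar D_3P_1\\
&\quad + P_2(\bar{\tilde B}_1^\top+\bar D_3)P_2 + P_1\bar{\tilde C}_2 + \big[P_1\bar{\bar D}_3^\top + P_2(D_3^\top+\bar{\bar D}_3)\big]P_1\bar{\bar D}_2\Sigma_0\\
&\quad + \big[P_1\bar{\tilde D}_3^\top + P_2(\bar{\tilde F}_3+\bar{\tilde D}_3)\big]P_1\bar{\tilde D}_2\tilde\Sigma_0\\
&\quad + P_1D_3^\top\bar N_2^{-1}P_1\big[\bar{\tilde F}_5 + B_3^\top P_2 + \bar D_2(P_1+P_2) + \bar{\bar D}_2\Sigma_0 + \bar{\tilde D}_2\tilde\Sigma_0\big]\Big]\Big\},\\
\tilde\Sigma_2(P_1,P_2) :={}& \bar{\bar N}_2^{-1}\Big\{P_2(\tilde C_1+\bar{\tilde C}_2) + P_2(\bar{\tilde B}_1^\top+\bar D_3)P_1 + P_1\bar D_3P_1 + P_1\bar{\tilde C}_2 + P_1(\bar{\tilde B}_1^\top+\bar D_3)P_2\\
&\quad + P_2(\bar{\tilde B}_1^\top+\bar D_3)P_2 + \big[P_1\bar{\bar D}_3^\top + P_2(D_3^\top+\bar{\bar D}_3)\big]P_1\bar{\bar D}_2\Sigma_0\\
&\quad + \big[P_1\bar{\tilde D}_3^\top + P_2(\bar{\tilde F}_3+\bar{\tilde D}_3)\big]P_1\bar{\tilde D}_2\tilde\Sigma_0\\
&\quad + P_1D_3^\top\bar N_2^{-1}P_1\big[\bar{\tilde F}_5 + B_3^\top P_2 + \bar D_2(P_1+P_2) + \bar{\bar D}_2\Sigma_0 + \bar{\tilde D}_2\tilde\Sigma_0\big]\Big\}.
\end{aligned}
\tag{82}
$$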

Comparing the coefficients of $dt$ in Eq. (73) and substituting Eqs. (77) and (81) into them, we get

$$
\begin{aligned}
0 ={}& \dot P_1 + P_1A_1 + A_1^\top P_1 + P_1\bar{\tilde F}_1P_1 + Q_2 + (C_1 + P_1B_3)\Sigma_1(P_1,P_2) + (\tilde C_1^\top + P_1\bar{\tilde B}_1)\tilde\Sigma_1(P_1,P_2),\\
0 ={}& \dot P_2 + P_2(A_1+\bar A_2) + (A_1+\bar A_2)^\top P_2 + P_2(\bar{\tilde F}_1+\bar B_2)P_1 + P_1(\bar{\tilde F}_1+\bar B_2)P_2 + P_2(\bar{\tilde F}_1+\bar B_2)P_2\\
&+ P_1\bar A_2 + \bar A_2^\top P_1 + P_1\bar B_2P_1 + \bar{\tilde F}_4 + (C_1 + P_1B_3)\Sigma_2(P_1,P_2) + (\tilde C_1^\top + P_1\bar{\tilde B}_1)\tilde\Sigma_2(P_1,P_2)\\
&+ \big[\bar{\tilde F}_5^\top + P_1\bar{\bar B}_2 + P_2(B_3+\bar{\bar B}_2)\big]\Sigma_0(P_1,P_2) + \big[\bar{\tilde C}_2^\top + P_1\bar{\tilde B}_2 + P_2(\bar{\tilde B}_1+\bar{\tilde B}_2)\big]\tilde\Sigma_0(P_1,P_2),\\
P_1(T) ={}& G_2,\qquad P_2(T) = 0.
\end{aligned}
\tag{83}
$$

Note that the system of Riccati equations (83) is not standard, and its solvability is open; for technical reasons, we cannot establish it at present. However, in some special cases, $P_1(t)$ and $P_2(t)$ are not coupled. Then we can first solve the equation for $P_1(t)$ and then that for $P_2(t)$ by standard Riccati equation theory. We do not discuss this here for lack of space, and we will consider the general solvability of Eq. (83) in the future.
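
To make the "standard Riccati equation" step concrete, the following sketch integrates a matrix Riccati equation of the form $0 = \dot P + PA + A^\top P + PFP + Q$, $P(T) = G$, backward in time with a classical Runge-Kutta scheme. This is only an illustration under a strong simplifying assumption of ours: the $\Sigma$-terms in the first line of Eq. (83) are taken to vanish (for instance if $C_1$, $\tilde C_1$, $B_3$ and $\bar{\tilde B}_1$ are all zero), so that the equation for $P_1$ has this form with $A = A_1$, $F = \bar{\tilde F}_1$, $Q = Q_2$, $G = G_2$. The function names and the constant coefficients are hypothetical; the chapter does not prescribe a numerical method.

```python
import numpy as np

def riccati_rhs(P, A, F, Q):
    """dP/dt implied by 0 = dP/dt + P A + A^T P + P F P + Q."""
    return -(P @ A + A.T @ P + P @ F @ P + Q)

def solve_riccati_backward(A, F, Q, G, T, steps=2000):
    """Integrate from the terminal value P(T) = G back to t = 0 with RK4.

    Coefficients are assumed constant in time for simplicity.
    Returns the approximation of P(0).
    """
    h = T / steps
    P = np.array(G, dtype=float)
    for _ in range(steps):
        # One RK4 step of size -h (moving from t to t - h).
        k1 = riccati_rhs(P, A, F, Q)
        k2 = riccati_rhs(P - 0.5 * h * k1, A, F, Q)
        k3 = riccati_rhs(P - 0.5 * h * k2, A, F, Q)
        k4 = riccati_rhs(P - h * k3, A, F, Q)
        P = P - (h / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
    return P

def demo():
    """Scalar-like test case with a known solution P(t) = tanh(T - t) * I."""
    I = np.eye(2)
    return solve_riccati_backward(A=np.zeros((2, 2)), F=-I, Q=I,
                                  G=np.zeros((2, 2)), T=1.0)
```

With $A = 0$, $F = -I$, $Q = I$ and $G = 0$ the equation decouples into $\dot P = P^2 - 1$, whose solution with $P(T)=0$ is $\tanh(T-t)$, which gives a convenient correctness check. Once $P_1$ is available on a time grid, the equation for $P_2$ can be integrated in the same way with $P_1$ treated as a known input.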

Substituting Eqs. (77) and (81) into Eq. (68), we obtain

$$
\begin{aligned}
u_2^*(t) ={}& -N_2^{-1}\Big\{\big[B_2^\top P_1 + D_2^\top\Sigma_1(P_1,P_2) + \tilde D_2^\top\tilde\Sigma_1(P_1,P_2)\big]X(t)\\
&+ \big[\tilde F_4^\top + B_2^\top P_2 + \tilde B_2^\top(P_1+P_2) + D_2^\top\Sigma_2(P_1,P_2) + \tilde F_2^\top\Sigma_0(P_1,P_2)\\
&\quad + \tilde D_2^\top\tilde\Sigma_2(P_1,P_2) + \tilde{\tilde D}_2^\top\tilde\Sigma_0(P_1,P_2)\big]\hat X(t)\Big\},
\end{aligned}
\tag{84}
$$

and the optimal “state” $X = (x^*,p)^\top$ of the leader satisfies (the arguments $(P_1,P_2)$ of the $\Sigma$'s are suppressed)

$$
\begin{aligned}
dX(t) ={}& \Big\{\big[A_1 + \bar{\tilde F}_1P_1 + B_3\Sigma_1 + \bar{\tilde B}_1\tilde\Sigma_1\big]X(t)\\
&\quad + \big[\bar A_2 + \bar{\tilde F}_1P_2 + \bar B_2(P_1+P_2) + B_3\Sigma_2 + \bar{\bar B}_2\Sigma_0 + \bar{\tilde B}_1\tilde\Sigma_2 + \bar{\tilde B}_2\tilde\Sigma_0\big]\hat X(t)\Big\}dt\\
&+ \Big\{\big[C_1 + B_3^\top P_1 + \tilde{\tilde D}_2\Sigma_1\big]X(t) + \big[\bar{\tilde F}_5 + B_3^\top P_2 + \bar D_2(P_1+P_2) + \tilde{\tilde D}_2\Sigma_2 + \bar{\bar D}_2\tilde\Sigma_0\big]\hat X(t)\Big\}dW(t)\\
&+ \Big\{\big[\tilde C_1 + \bar{\tilde B}_1^\top P_1 + D_3^\top\Sigma_1 + \bar{\tilde F}_3\tilde\Sigma_1\big]X(t)\\
&\quad + \big[\bar{\tilde C}_2 + \bar{\tilde B}_1^\top P_2 + \bar D_3(P_1+P_2) + D_3^\top\Sigma_2 + \bar{\bar D}_3^\top\Sigma_0 + \bar{\tilde F}_3\tilde\Sigma_2 + \bar{\tilde D}_3\tilde\Sigma_0\big]\hat X(t)\Big\}d\tilde W(t),\\
X(0) ={}& X_0,
\end{aligned}
\tag{85}
$$

where $\hat X$ is governed by

$$
\begin{aligned}
d\hat X(t) ={}& \big[A_1 + \bar A_2 + (\bar{\tilde F}_1+\bar B_2)(P_1+P_2) + B_3\Sigma_1 + \bar{\tilde B}_1\tilde\Sigma_1 + B_3\Sigma_2 + \bar{\bar B}_2\Sigma_0 + \bar{\tilde B}_1\tilde\Sigma_2 + \bar{\tilde B}_2\tilde\Sigma_0\big]\hat X(t)\,dt\\
&+ \big[\tilde C_1 + \bar{\tilde C}_2 + (\bar{\tilde B}_1^\top+\bar D_3)(P_1+P_2) + D_3^\top\Sigma_1 + \bar{\tilde F}_3\tilde\Sigma_1 + D_3^\top\Sigma_2 + \bar{\bar D}_3^\top\Sigma_0 + \bar{\tilde F}_3\tilde\Sigma_2 + \bar{\tilde D}_3\tilde\Sigma_0\big]\hat X(t)\,d\tilde W(t),\\
\hat X(0) ={}& X_0.
\end{aligned}
\tag{86}
$$

We summarize the above analysis in the following theorem.

Theorem 3.2 Let (A2.1)–(A2.3) hold, let $(P_1(t),P_2(t))$ satisfy Eq. (83), let $\hat X$ be the $\mathcal G_{1,t}$-adapted solution to Eq. (86), and let $X$ be the $\mathcal F_t$-adapted solution to Eq. (85). Define $(Y,Z,\tilde Z)$ by Eqs. (71) and (81), respectively. Then Eq. (69) holds, and $u_2^*$ given by Eq. (84) is a feedback optimal control of the leader.
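
Once the Riccati system has been solved, Eq. (86) is a linear SDE for the filter $\hat X$ driven by $\tilde W$ alone, so it can be simulated by the Euler-Maruyama scheme. The sketch below is a minimal illustration, assuming that the two bracketed coefficient matrices in Eq. (86) have already been evaluated and, for simplicity, are frozen as constant matrices `M` (drift) and `K` (diffusion). These names, the constant-coefficient simplification, and the step count are our own choices and not part of the chapter.

```python
import numpy as np

def simulate_filter(M, K, X0, T=1.0, steps=1000, seed=0):
    """Euler-Maruyama path of dX_hat = M X_hat dt + K X_hat dW_tilde.

    M, K : (2, 2) arrays standing in for the drift and diffusion brackets
           of Eq. (86), frozen in time (a simplifying assumption).
    Returns an array of shape (steps + 1, 2) with the simulated path.
    """
    rng = np.random.default_rng(seed)
    h = T / steps
    X = np.array(X0, dtype=float)
    path = [X.copy()]
    for _ in range(steps):
        dW = rng.normal(0.0, np.sqrt(h))       # increment of the scalar W-tilde
        X = X + (M @ X) * h + (K @ X) * dW     # one Euler-Maruyama step
        path.append(X.copy())
    return np.array(path)
```

For time-dependent coefficients one would pass functions of $t$ instead of constant matrices and evaluate them at each grid point. Feeding the simulated $\hat X$ (together with $X$ from Eq. (85)) into Eq. (84) then yields a sample path of the leader's feedback control.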

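Finally, the optimal control $u_1^*$ of the follower can also be represented in a nonanticipating way. In fact, by Eq. (47), noting Eqs. (68), (71), and (77), we have

$$
\begin{aligned}
u_1^*(t) ={}& -\tilde N_1^{-1}(t)\big[\tilde S_1^\top(t)\hat x^*(t) + \tilde S(t)\hat u_2^*(t) + B_1^\top\hat\varphi^*(t) + \tilde D_1^\top\hat\beta^*(t)\big]\\
={}& -\tilde N_1^{-1}(t)\Big[\big(\tilde S_1^\top(t)\;\;0\big)\hat X(t) + \tilde S(t)\hat u_2^*(t) + \big(0\;\;B_1^\top\big)\hat Y(t) + \big(0\;\;\tilde D_1^\top\big)\hat{\tilde Z}(t)\Big]\\
={}& -\tilde N_1^{-1}(t)\Big\{\big(\tilde S_1^\top(t)\;\;0\big) - \tilde S(t)N_2^{-1}\big[\tilde F_4^\top + (B_2+\tilde B_2)^\top(P_1+P_2) + (D_2+\tilde F_2)^\top\Sigma_0(P_1,P_2)\\
&\quad + (\tilde D_2+\tilde{\tilde D}_2)^\top\tilde\Sigma_0(P_1,P_2)\big] + \big(0\;\;B_1^\top\big)(P_1+P_2) + \big(0\;\;\tilde D_1^\top\big)\tilde\Sigma_0(P_1,P_2)\Big\}\hat X(t),
\end{aligned}
\tag{87}
$$

which is observable for the follower.

Remark 3.3 When we consider the complete information case, that is, $\tilde W(\cdot)$ disappears and $\mathcal G_{1,t} = \mathcal F_t$, Theorems 3.1 and 3.2 coincide with Theorems 2.3 and 3.3 in Yong [13].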

4. Concluding remarks

In this chapter, we have studied a leader-follower stochastic differential game with asymmetric information. This kind of game problem possesses several attractive features. First, the game has the Stackelberg feature, which means that the two players play different roles during the game. Thus the usual approach to game problems, such as in [6, 7, 8, 10], where the two players play equivalent roles, does not apply. Second, the information is asymmetric between the two players, which was not considered in [3, 13, 14]. In detail, the information available to the follower is based on some sub-σ-algebra of that available to the leader. A stochastic filtering technique is introduced to compute the optimal filtering estimates for the corresponding adjoint processes, which act as the solution to some FBSDFE. Third, the Stackelberg equilibrium is represented in state feedback form for the LQ problem under some appropriate assumptions. Some new conditional mean-field FBSDEs and a system of Riccati equations are introduced to deal with the leader’s LQ problem.

In principle, Theorems 3.1 and 3.2 provide a useful tool for seeking a Stackelberg equilibrium. As a first step in this direction, we apply our results to the LQ problem to obtain explicit solutions. We hope to return to the more general case in future research. It is worthwhile to study the closed-loop Stackelberg equilibrium for our problem, as well as the solvability of the system of Riccati equations. These challenging topics will be considered in our future work.

Acknowledgments

Jingtao Shi would like to thank the book editor for his/her comments and suggestions. Jingtao Shi also would like to thank Professor Guangchen Wang from Shandong University and Professor Jie Xiong from Southern University of Science and Technology, for their effort and discussion during the writing of this chapter.

Notes

The main content of this chapter is drawn from the following two published papers: (1) Shi, J.T., Wang, G.C., & Xiong, J. (2016). Leader-follower stochastic differential games with asymmetric information and applications. Automatica, Vol. 63, 60–73. (2) Shi, J.T., Wang, G.C., & Xiong, J. (2017). Linear-quadratic stochastic Stackelberg differential game with asymmetric information. Science China Information Sciences, Vol. 60, 092202:1–15.

References

1. Williams N. A solvable continuous time principal agent model. Journal of Economic Theory. 2015;159:989-1015
2. Williams N. On Dynamic Principle-agent Models in Continuous Time. Working Paper. University of Wisconsin-Madison; 2008
3. Øksendal B, Sandal L, Ubøe J. Stochastic Stackelberg equilibria with applications to time dependent newsvendor models. The Journal of Economic Dynamics and Control. 2013;37(7):1284-1299
4. Isaacs R. Differential Games, Parts 1–4. The Rand Corporation, Research Memorandums Nos. RM-1391, RM-1411, RM-1486; 1954-1955
5. Basar T, Olsder GJ. Dynamic Noncooperative Game Theory. London: Academic Press; 1982
6. Hamadène S. Nonzero-sum linear-quadratic stochastic differential games and backward-forward equations. Stochastic Analysis and Applications. 1999;17(1):117-130
7. Wu Z. Forward-backward stochastic differential equations, linear quadratic stochastic optimal control and nonzero sum differential games. Journal of Systems Science and Complexity. 2005;18(2):179-192
8. An TTK, Øksendal B. Maximum principle for stochastic differential games with partial information. Journal of Optimization Theory and Applications. 2008;139(3):463-483
9. Wang G, Yu Z. A Pontryagin’s maximum principle for non-zero sum differential games of BSDEs with applications. IEEE Transactions on Automatic Control. 2010;55(7):1742-1747
10. Wang G, Yu Z. A partial information non-zero sum differential game of backward stochastic differential equations with applications. Automatica. 2012;48(2):342-352
11. von Stackelberg H. Marktform und Gleichgewicht (An English translation appeared in the Theory of the Market Economy, Oxford University Press, 1952). Vienna: Springer; 1934
12. Basar T. Stochastic stagewise Stackelberg strategies for linear quadratic systems. In: Kohlmann M, Vogel W, editors. Stochastic Control Theory and Stochastic Differential Systems. Berlin: Springer; 1979
13. Yong J. A leader-follower stochastic linear quadratic differential game. SIAM Journal on Control and Optimization. 2002;41(4):1015-1041
14. Bensoussan A, Chen S, Sethi SP. The maximum principle for global solutions of stochastic Stackelberg differential games. SIAM Journal on Control and Optimization. 2015;53(4):1956-1981
15. Wang G, Wu Z, Xiong J. Maximum principles for forward-backward stochastic control systems with correlated state and observation noises. SIAM Journal on Control and Optimization. 2013;51(1):491-524
16. Huang J, Wang G, Xiong J. A maximum principle for partial information backward stochastic control problems with applications. SIAM Journal on Control and Optimization. 2009;48(4):2106-2117
17. Andersson D, Djehiche B. A maximum principle for SDEs of mean-field type. Applied Mathematics and Optimization. 2011;63:341-356
18. Li J. Stochastic maximum principle in the mean-field controls. Automatica. 2012;48(2):366-373
19. Yong J. Linear-quadratic optimal control problems for mean-field stochastic differential equations. SIAM Journal on Control and Optimization. 2013;51(4):2809-2838
20. Shi J. Sufficient conditions of optimality for mean-field stochastic control problems. In: Proceedings of 12th ICARCV, Guangzhou, 5–7 December 2012; pp. 747-752
21. Xiong J. An Introduction to Stochastic Filtering Theory. London: Oxford University Press; 2008
22. Yong J, Zhou X. Stochastic Controls: Hamiltonian Systems and HJB Equations. New York: Springer; 1999

[1] 1. Williams N. A solvable continuous time principal agent model. Journal of Economic Theory. 2015;159:989-1015

[2] 2. Williams N. On Dynamic Principle-agent Models in Continuous Time. Working Paper. University of Wisconsin-Madison; 2008

[3] 3. Øksendal B, Sandal L, Ubøe J. Stochastic Stackelberg equilibria with applications to time dependent newsvendor models. The Journal of Economic Dynamics and Control. 2013;37(7):1284-1299

[4] 4. Isaacs R. Differential Games, Parts 1–4. The Rand Corpration, Research Memorandums Nos. RM-1391, RM-1411, RM-1486; 1954-1955

[5] 5. Basar T, Olsder GJ. Dynamic Noncooperative Game Theory. London: Academic Press; 1982

[6] 6. Hamadène S. Nonzero-sum linear-quadratic stochastic differential games and backward-forward equations. Stochastic Analysis and Applications. 1999;17(1):117-130

[7] 7. Wu Z. Forward-backward stochastic differential equations, linear quadratic stochastic optimal control and nonzero sum differential games. Journal of Systems Science and Complexity. 2005;18(2):179-192

[8] 8. An TTK, Øksendal B. Maximum principle for stochastic differential games with partial information. Journal of Optimization Theory and Applications. 2008;139(3):463-483

[9] 9. Wang G, Yu Z. A Pontryagin’s maximum principle for non-zero sum differential games of BSDEs with applications. IEEE Transactions on Automatic Control. 2010;55(7):1742-1747

[10] 10. Wang G, Yu Z. A partial information non-zero sum differential game of backward stochastic differential equations with applications. Automatica. 2012;48(2):342-352

[11] 11. von Stackelberg H. Marktform und Gleichgewicht (An English translation appeared in the Theory of the Market Economy, Oxford University Press, 1952). Vienna: Springer; 1934

[12] 12. Basar T. Stochastic stagewise Stackelberg strategies for linear quadratic systems. In: Kohlmann M, Vogel W, editors. Stochastic Control Theory and Stochastic Differential Systems. Berlin: Springer; 1979

[13] 13. Yong J. A leader-follower stochastic linear quadratic differential games. SIAM Journal on Control and Optimization. 2002;41(4):1015-1041

[14] 14. Bensoussan A, Chen S, Sethi SP. The maximum principle for global solutions of stochastic Stackelberg differential games. SIAM Journal on Control and Optimization. 2015;53(4):1956-1981

[15] 15. Wang G, Wu Z, Xiong J. Maximum principles for forward-backward stochastic control systems with correlated state and observation noises. SIAM Journal on Control and Optimization. 2013;51(1):491-524

[16] 16. Huang J, Wang G, Xiong J. A maximum principle for partial information backward stochastic control problems with applications. SIAM Journal on Control and Optimization. 2009;48(4):2106-2117

[17] 17. Andersson D, Djehiche B. A maximum principle for SDEs of mean-field type. Applied Mathematics and Optimization. 2011;63:341-356

[18] 18. Li J. Stochastic maximum principle in the mean-field controls. Automatica. 2012;48(2):366-373

[19] 19. Yong J. Linear-quadratic optimal control problems for mean-field stochastic differential equations. SIAM Journal on Control and Optimization. 2013;51(4):2809-2838

[20] 20. Shi J. Sufficient conditions of optimality for mean-field stochastic control problems. In: Proceedings of 12th ICARCV, Guangzhou, 5–7 December 2012; pp. 747-752

[21] 21. Xiong J. An Introduction to Stochastic Filtering Theory. London: Oxford University Press; 2008

[22] 22. Yong J, Zhou X. Stochastic Controls: Hamiltonian Systems and HJB Equations. New York: Springer; 1999

Stochastic Leader-Follower Differential Game with Asymmetric Information

Game Theory - Applications in Logistics and Economy

Abstract

Keywords

Author Information

Jingtao Shi*

1. Introduction

1.1. Motivation

1.2. Problem formulation

1.3. Literature review and contributions of this chapter

2. Stackelberg equilibrium

2.1. The Follower’s problem

2.2. The Leader’s problem

3. Applications to LQ case

3.1. Problem of the follower

3.2. Problem of the leader

4. Concluding remarks

Acknowledgments

Notes

References

Dipping Headlights: An Iterated Prisoner’s Dilemma or Assurance Game

Stochastic Leader-Follower Differential Game with Asymmetric Information

Game Theory - Applications in Logistics and Economy

Abstract

Keywords

Author Information

Jingtao Shi*

1. Introduction

1.1. Motivation

1.2. Problem formulation

1.3. Literature review and contributions of this chapter

2. Stackelberg equilibrium

2.1. The Follower’s problem

2.2. The Leader’s problem

3. Applications to LQ case

3.1. Problem of the follower

3.2. Problem of the leader

4. Concluding remarks

Acknowledgments

Notes

References

Continue reading from the same book

Game Theory