Open access peer-reviewed chapter

Sequential Optimization Model for Marine Oil Spill Control

Written By

Kufre Bassey

Submitted: 02 November 2015 Reviewed: 14 March 2016 Published: 06 July 2016

DOI: 10.5772/63050

From the Edited Volume

Robust Control - Theoretical Models and Case Studies

Edited by Moises Rivas López and Wendy Flores-Fuentes


Abstract

This chapter introduces optimal control theory into oil spill modeling and develops an optimization process that will aid effective decision-making in marine oil spill management. The purpose of the optimal control theory is to determine the control policy that will optimize (maximize or minimize) a specific performance criterion, subject to the constraints imposed by the physical nature of the problem. A fundamental theorem of the calculus of variations is applied to problems with unconstrained states and controls, whereas a consideration of the effect of control constraints leads to the application of Markovian decision processes. The optimization objectives are expressed as a value function or reward to be optimized, whereas the optimization models are formulated to adequately describe the marine oil spill control, starting from the transportation process. These models consist of conservation relations needed to specify the dynamic state of the process given by the chemical compositions and movements of crude oil in water.

Keywords

  • decision theory
  • marine oil spill
  • optimal control
  • sequential optimization
  • Markov processes

1. Introduction

The degradation of aquatic ecosystems is generally agreed to be undesirable. Historically, most evaluations of the ecological effects of petroleum contamination have related impacts to effects on the supply of products and services of importance to human cultures. According to Xu and Pang [1], most environmental and pollution control laws were legislated to protect ecological objectives and public health. Here, a substance is considered to be a pollutant if it is perceived to have adverse effects on wildlife or human well-being. In recent years, a number of substances have appeared to pose such threats. Among them is crude oil spillage, which first came to public attention with the Torrey Canyon disaster in 1967.

The risk of crude oil spillage into the sea presents a major threat to the marine ecology compared with other sources of pollution in the oceans. It has long been reported that oil spillage impacts negatively on wildlife and their environments in various ways: it alters the ecological conditions and the physical and chemical composition of the environment, destroys the nutritional capital of the marine biomass, changes the biological equilibrium of the habitat, and threatens human health [2]. The same can be said about Nigeria, where oil spillage is a major environmental problem and whose coastal zone was rated as one of the most polluted spots on the planet in 2006 [3]. For instance, from 1976 to 2007, over 1,896,960 barrels of oil were spilled into the Nigerian coastal waters, resulting in serious pollution of drinkable water and destruction of resort centers, properties, and lives along the coastal zone. This was seen as a major contributor to the regional crisis in the Nigerian Niger-Delta region.

As a case in point, after a spill in the ocean, oil in a water body, regardless of whether it originated as a surface or subsurface spill, forms a thin film called an oil slick as it spreads in water. The oil slick movement is governed by advection and turbulent diffusion resulting from water current and wind action. The slick spreads over the water surface under the equilibrium of gravitational, inertial, viscous, and interfacial tension forces. The oil composition also changes from the early time of the spill: the water-soluble components of the oil dissolve in the water column, the immiscible components emulsify and disperse in the water column as small droplets, and the light (low molecular weight) fractions evaporate (for example, see [4]).

In essence, the frequency of accidental oil spills in aquatic environments has generated a growing global concern and awareness of the risks of oil spills and the damage they do to the environment. However, it is widely known that oil exploration is a necessity in our industrial society and a major sustainer of our lifestyle, as most of the energy used in Canada and the United States, for instance, is for transportation that runs on oil and petroleum products. Thus, inasmuch as the industry uses oil and petroleum derivatives for the manufacturing of vital products, such as plastics, fertilizers, and chemical feedstock, the trends in energy usage are not likely to decrease much in the near future. It follows that the production and consumption of oil and petroleum products might continue to increase worldwide, and the threat of oil pollution is likely to increase accordingly.

Consequently, a fundamental problem in recent environmental research is how to properly assess and control the spatial structure of pollution fields at various scales, and several studies have shown that mathematical models are the only available tools for rapid computation and determination of the fate of spilled oil and for the simulation of various clean-up operations.


2. Methodological model

Now, consider the introduction of optimal control theory into spill modeling to develop an optimization process that will aid effective decision-making in marine oil spill management. The purpose of the optimal control theory is to determine the control policy that will optimize (maximize or minimize) a specific performance criterion subject to the constraints imposed by the physical nature of the problem. A fundamental theorem of the calculus of variations is applied to problems with unconstrained states and controls, whereas a consideration of the effect of control constraints leads to the application of Markovian decision processes.

The optimization objectives are expressed as a performance index (value function or reward) to be optimized, whereas the optimization models are formulated to adequately describe the marine oil spill control starting from the transportation process. These models consist of conservation relations needed to specify the dynamic state of the process given by the chemical compositions and movements of crude oil in water.

2.1. Mathematical preliminaries and definition of terms

In our basic optimal control problem, u ( t ) is used for the control and x ( t ) is used for the state variables. The state variable satisfies a differential equation that depends on the control variable:

x′(t) = g(t, x(t), u(t)) E1

where x′(t) is the state differential defining the dynamics of the system. This implies that, as the control function changes, the solution to the differential equation also changes. In other words, one can view the control-to-state relationship as a map u(t) ↦ x = x(u) [we write x(u) to emphasize the dependence on u]. Our basic optimal control problem therefore consists of finding a piecewise continuous control u(t) and the associated state variable x(t) to optimize a given objective functional. That is to say,

max_u ∫_{t₀}^{t₁} f(t, x(t), u(t)) dt E2

x′(t) = g(t, x(t), u(t)),  x(t₀) = x₀  and  x(t₁) free E3

Such a maximizing control is called an optimal control. By “x(t₁) free”, we mean that the value of x(t₁) is unrestricted. Here, the functions f and g are continuously differentiable in all arguments. Thus, whereas the control(s) is piecewise continuous, the associated states are piecewise differentiable. This implies that, depending on the scale of the spatial resolution (as in the case of an oil spill), the introduction of space variables could alter the basic model from ordinary differential equations (with just time as the underlying variable) to partial differential equations (PDEs). Let us focus our attention on the optimal control of PDEs. Our solution to the control problem will then depend on the existence of an optimal control in the PDE.
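The control-to-state map u(t) ↦ x(u) described above can be made concrete numerically. The sketch below integrates the state equation (E1) by the forward Euler method for two different controls and shows that each control induces its own state trajectory; the linear dynamics x′ = −x + u are a hypothetical illustration, not a model from this chapter.

```python
import numpy as np

def solve_state(g, u, x0, t0=0.0, t1=1.0, n=1000):
    """Forward-Euler integration of x'(t) = g(t, x(t), u(t)), x(t0) = x0.
    Returns the time grid and the state trajectory, making the
    control-to-state map u -> x(u) explicit."""
    t = np.linspace(t0, t1, n + 1)
    dt = t[1] - t[0]
    x = np.empty(n + 1)
    x[0] = x0
    for k in range(n):
        x[k + 1] = x[k] + dt * g(t[k], x[k], u(t[k]))
    return t, x

# Hypothetical linear dynamics x' = -x + u: two different controls
# yield two different state trajectories x(u).
g = lambda t, x, u: -x + u
_, x_a = solve_state(g, lambda t: 0.0, x0=1.0)   # no control: pure decay
_, x_b = solve_state(g, lambda t: 1.0, x0=1.0)   # constant unit control
```

With u ≡ 0 the state decays toward zero, whereas u ≡ 1 holds it at its equilibrium, illustrating that the objective functional (E2) is in effect a function of the chosen control.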

The general idea of the optimal control of PDEs here starts with a PDE with a state solution x and control u. Set ∂ to denote a partial differential operator with appropriate initial and boundary conditions:

∂x = f(x, u)  in Ω × [0, T] E4

This implies that we are considering a problem with space variable x and time t on a spatial domain with boundary, Ω × [0, T]. The objective functional represents the goal of the problem, and we seek an optimal control u* in an appropriate control set such that

J(u*) = min_u J(u) E5

When the control cost is considered, the objective functional takes the form

J(u) = ∫₀ᵀ ∫_Ω g(x, t, x(t), u(x, t)) dx dt E6

To consider the properties of the functional, it is important to note the following fundamentals:

  1. A functional J is “a rule of correspondence that assigns to each function, say x ( t ) , constrained in a certain set of functions, say X, a unique real number. The set of functions is called the domain of the functional, and the set of real numbers associated with the functions in the domain is called the range of the functional” [5].

  2. Let δ(J) be the first variation of the functional; thus, δ(J) is the part of the increment ΔJ that is linear in the variation δ(x), such that

    ΔJ(x, δ(x)) = δ(J)[x, δ(x)] + g(x, δ(x))·‖δ(x)‖ E7

    where δ(J) is linear in δ(x). Suppose that lim_{‖δ(x)‖→0} g(x, δ(x)) = 0; then, J is said to be differentiable on x, whereas δ(J) is the first variation of J evaluated for x(t) [5].

  3. A functional J with domain X has a relative optimum at x* if there is an ε > 0 such that, for all functions x ∈ X that satisfy ‖x − x*‖ < ε, the increment of J has the same sign. In other words, J(x*) is a relative minimum if ΔJ = J(x) − J(x*) ≥ 0 and a relative maximum if ΔJ = J(x) − J(x*) ≤ 0. Hence, J is said to be a functional of the function x(t) if and only if it first satisfies the scalar homogeneity property J(αx) = αJ(x) for all x ∈ X and for all real numbers α such that αx ∈ X.

  4. A rule of correspondence that assigns to each function x(t) ∈ X, defined for t ∈ [t₀, T], a real number is called the norm of the function, written ‖x‖. If x and x + δ(x) are both functions for which the functional J is defined, then the increment of the functional, ΔJ, is defined as

    ΔJ = J(x + δ(x)) − J(x) E8

  5. A differential equation whose solutions are the functions for which a given functional is stationary is known as an Euler-Lagrange equation (Euler’s equation or Lagrange’s equation).

Fundamental theorem of variational calculus [5]: This theorem states that “if x * is optimum, then it is a necessary condition that the first variation of J must vanish on x. That is to say, δ ( J ) [ x * , δ ( x ) ] = 0 for all admissible δ ( x ) ”.

2.2. Model conceptualization

The fundamental principle upon which the pollutant fate and transport models are based is the law of conservation of mass [6]:

∂h/∂t + ∇̄·(h v̄) − ∇̄·(D ∇̄h) = R_h
∂C/∂t + ∇·(C u) − ∇·(E ∇C) = R E9

where

  1. h = oil slick thickness,

  2. C = oil concentration,

  3. v ¯ = oil slick drifting velocity,

  4. D = oil fluid velocity,

  5. E = dispersion-diffusion coefficient,

  6. = computational slick spreading function,

  7. R h and R = physical chemical kinetic terms,

  8. u = grid size,

  9. ∇̄ = gradient operator in Cartesian coordinates, and

  10. t = time.

Eq. (9) can be modified as

Σ_i (∂γ_i/∂i) dx dy dz,  i = x, y, z,

where d x d y d z denotes the differential volume of the state variable assuming a net chemical contaminant flux in each axial direction such that γ i = contaminant movement in each axial direction ( i = x , y , z ) and d x , d y , d z = differential distances in the x, y, and z directions.

The fluidity of oil in water contains the advection due to current and wind as well as the dispersive instability due to weathering processes. Thus, if we set

γ = ωq − d∇q E10

where

  1. γ = movement of contaminant vector,

  2. ω = contaminant discharge vector,

  3. q = contaminant molar concentration,

  4. d = dispersion tensor, and

  5. ∇ = gradient operator.

After minor mathematical manipulation, Eq. (10) becomes

−∇·(ωq − d∇q) = ∂τ/∂t + m E11

where

  1. τ = total concentration of contaminant in the system,

  2. m = decay rate of contaminant, and

  3. t = time.

A two-dimensional differential representation of Eq. (11) is given as

∂τ/∂t = −∂(q·v_x)/∂x + ∂/∂x[v_x ∂q/∂x] − ∂(q·v_y)/∂y + ∂/∂y[v_y ∂q/∂y] − m, E12

where v_x and v_y represent the fluid velocities in the x and y directions. By applying the principle of the conservation of mass, the steady-state equation of spill transportation is given as

2VST dx dy / (lb) = (∂h_x/∂x)(∂p²/∂x) + h_x(∂²p²/∂x²) + (∂h_y/∂y)(∂p²/∂y) + h_y(∂²p²/∂y²) = ∂/∂x[h_x ∂p²/∂x] + ∂/∂y[h_y ∂p²/∂y] E13

where

  1. h = oil penetrability trajectory,

  2. p = oil stress,

  3. V = oil viscidness,

  4. S = source of oil mass fluidity,

  5. T = temperature,

  6. b = molecular weight of oil, and

  7. l = a fixed length of the z direction.

According to Refs. [5, 7], “the transport and fate of the spilled oil is governed by the advection due to current and wind, horizontal spreading of the surface slick due to turbulent diffusion, gravitational force, force of inertia, viscous and surface tension forces, emulsification, mass transfer of heat, and changes in the physiochemical properties of oil due to weathering processes (evaporation, dispersion, dissolution, oxidation, etc.)”. Thus, Eq. (13) can be transformed to

∂q(x, t)/∂t = −h ∂q(x, t)/∂x + D ∂²q(x, t)/∂x² + R + S E14

where q = { q e , q d , q p } denotes the oil spill concentration in emulsified, dissolved, and particulate phases, respectively, at state x and time t; h is the fluid velocity; D is the spreading function, and R and S denote the environmental factors and the spill source term, respectively.
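A minimal numerical sketch can make the transport model of Eq. (14) concrete. The explicit finite-difference scheme below (upwind differencing for advection, central differencing for spreading, first-order decay for the environmental term) advances a one-dimensional concentration profile; all parameter values and the Gaussian initial slick are hypothetical illustrations, as the chapter itself prescribes no discretization.

```python
import numpy as np

def step_concentration(q, h, D, k, dx, dt, S=0.0):
    """One explicit time step of dq/dt = -h dq/dx + D d2q/dx2 - k q + S
    on a 1-D grid with zero-concentration boundaries (slick confined to
    the domain). Upwind differencing for the advection term (h >= 0)."""
    q_new = np.zeros_like(q)
    adv = -h * (q[1:-1] - q[:-2]) / dx                  # upwind advection
    dif = D * (q[2:] - 2 * q[1:-1] + q[:-2]) / dx**2    # spreading
    q_new[1:-1] = q[1:-1] + dt * (adv + dif - k * q[1:-1] + S)
    return q_new

# Hypothetical spill: a Gaussian slick advected downstream while it
# spreads and decays (illustrative parameter values only).
x = np.linspace(0.0, 10.0, 201)
dx = x[1] - x[0]
q = np.exp(-((x - 2.0) ** 2))           # initial concentration profile
for _ in range(400):                    # simulate up to t = 2
    q = step_concentration(q, h=1.0, D=0.05, k=0.1, dx=dx, dt=0.005)
```

Running the loop moves the concentration peak downstream while the total mass diminishes through decay, which is the qualitative behavior the transport equation describes.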

2.3. Optimality problem

When hydrocarbons enter an aquatic environment, their concentrations tend to decrease with time due to the evaporation, oxidation, and other weathering processes. This could be described as a death process and could be modeled as a first-order reaction [7]. Having known this, the optimal control problem can then be formulated by setting R in Eq. (14) to be

R = −k C(x, t) E15

so that k denotes a kinetic constant of the environmental factors that influence the concentration of oil in water. Here, it is assumed that the source term is not known, so that S = 0.

Then, Eq. (14) can be expressed as

∂q(x, t)/∂t = −∇·(V q(x, t)) + ∇·(D ∇q(x, t)) − k q(x, t) E16

which is called “oil spill dynamical (or transport) problem”. To solve this problem, a mechanism for controlling the system in marine environment can be set up as follows:

Let Ω be an open, connected subset of ℝⁿ, where ℝⁿ is the Euclidean n-dimensional space, and let ∂Ω denote the spatial boundary of the problem. The time variable is t, contained in the interval [0, T], where T < ∞. Let x be the space variable associated with Ω, and let ∂ be a partial differential operator with appropriate initial and boundary conditions; then,

∂q(x, t)/∂t − α Δq(x, t) = q(x, t)(1 − q(x, t)) − u(x, t) q(x, t)   in Ω × [0, T]
q(x, 0) = q₀(x) ≥ 0   on Ω, t = 0 (seabed boundary)
q(x, t) = 0   on ∂Ω × [0, T] (seaside boundary) E17

where Ω × [0, T] mathematically defines an operation with a PDE operator ∂ on the spatial domain of the problem Ω within specified lower and upper horizons [0, T].

Eq. (17) is defined as the state equation with a logistic growth q(1 − q) and a constant diffusion coefficient α due to weathering processes. The symbol Δ represents the Laplacian. The state q(x, t) denotes the volume or concentration of the crude oil, and u(x, t) is the control that enters the problem over the volumetric domain. The zero boundary conditions imply that the slick is confined away from the surrounding environment.

The reward or value objective functional can be obtained as

J(u) = ∫₀ᵀ ∫_Ω e^{−θt} (ξ u(x, t) q(x, t) − A u(x, t)²) dx dt E18

Here, ξ denotes the price of spilled oil, so that ξuq represents the reward from the control amount uq. Note that a quadratic cost for the clean-up effort with a weight coefficient A, where A is assumed to be a positive constant, is applied. The term e^{−θt} is introduced to denote a discounted value of the accrued future costs with 0 ≤ θ < 1. By setting ξ = 1 (for convenience), an optimal control u* is needed to optimize a control strategy focusing on the actual detected spill point, such that the application of any control on a no-spill region (look-alike) is minimized [i.e., u*(x, t) = 0] and the value of all future earnings is maximized. In other words, we seek u* such that

J(u*) = max_{u∈U} J(u) E19

where U denotes the set of allowable controls, and the maximization is over all measurable controls with 0 ≤ u(x, t) ≤ m < 1 a.e. Under this set-up, it follows that, within the context of optimal control, the state solution satisfies q(x, t) ≥ 0 on Ω × (0, T) by the maximum principle for parabolic equations.
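Under the stated assumptions (ξ = 1, quadratic effort cost with weight A, discount factor e^{−θt}), the functional (E18) can be evaluated numerically along the state equation (E17). The sketch below uses an explicit finite-difference scheme and compares two constant clean-up intensities; the grid sizes and parameter values are hypothetical choices for illustration only.

```python
import numpy as np

def reward(u, q0, alpha, A, theta, T=1.0, L=1.0, nt=400):
    """Evaluate J(u) = integral of e^{-theta t} (u q - A u^2) over space
    and time (xi = 1) along the state q_t - alpha*Lap(q) = q(1-q) - u q,
    with q = 0 on the boundary (explicit finite differences)."""
    nx = q0.size
    dx, dt = L / (nx - 1), T / nt
    q = q0.copy()
    J = 0.0
    for n in range(nt):
        t = n * dt
        J += dt * dx * np.sum(np.exp(-theta * t) * (u * q - A * u**2))
        lap = np.zeros_like(q)
        lap[1:-1] = (q[2:] - 2 * q[1:-1] + q[:-2]) / dx**2
        q = q + dt * (alpha * lap + q * (1 - q) - u * q)
        q[0] = q[-1] = 0.0                  # seaside boundary condition
    return J

# Hypothetical comparison of two constant clean-up intensities.
x = np.linspace(0.0, 1.0, 51)
q0 = 0.5 * np.sin(np.pi * x)                # initial slick concentration
J_low = reward(np.full_like(x, 0.1), q0, alpha=0.01, A=1.0, theta=0.05)
J_high = reward(np.full_like(x, 0.9), q0, alpha=0.01, A=1.0, theta=0.05)
```

Here the heavy-handed control incurs a quadratic cost that outweighs its reward, so the milder intensity scores higher; a genuine optimal control would balance the two terms pointwise in (x, t).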

Lemma 1 [8]: Let U be a convex set and J be strictly convex on U. Then, there exists at most one u* ∈ U such that J has a minimum at u*. This implies that, by the maximum principle for parabolic equations, the necessary conditions for optimality are satisfied whenever the state solution satisfies q(x, t) ≥ 0 on Ω × (0, T).


3. Necessary optimality conditions

Consider the following conservation relations [8]:

ẋ_t = f(x_t, u_t),  x_{t=0} = x(0) E20

where x_t is the composition and concentration of the pollutant at time t, u_t denotes controls that enter on the boundary of the problem at time t, f is a set of nonlinear functions representing the conservation relation, and x_{t=0} denotes the initial condition of x. Every change in the control function changes the solution to Eq. (20). Thus, for a given objective functional to be maximized, a piecewise continuous control policy u_t and the state variable x_t have to be obtained. The principal technique is to determine the necessary conditions that define an optimal control policy u(t) that would cause the system to follow a path x(t), such that the performance functional

J(u) = ∫₀ᵀ F(x, u, t) dt E21

would be optimized.

Consider also the Lagrangian

L = F(x, u, t) + λ(f(·) − ẋ) E22

where λ denotes the dynamic Lagrange multipliers or costate variables, with derivative λ̇. For further simplification, an augmented functional with the same optimum as (21) can be derived as

J = ∫₀ᵀ L(x, ẋ, u, λ, t) dt, E23

and by introducing the variations δ ( x ) , δ ( x ˙ ) , δ ( u ) , δ ( λ ) , δ ( T ) , the first variation of the functional would be

δJ = ∫₀ᵀ [∂L/∂x − (d/dt)(∂L/∂ẋ)] δ(x) dt + [∂L/∂ẋ(T)] δ(x_T) + [L(T) − (∂L/∂ẋ(T)) ẋ(T)] δ(T) + ∫₀ᵀ (∂L/∂λ) δ(λ) dt + ∫₀ᵀ (∂L/∂u) δ(u) dt E24

Notice that, by the fundamental theorem of variational calculus, for x(t) to be an optimum of the functional J, it is necessary that δJ = 0. Because the controls and states are unbounded, the variations δ(x), δ(λ), and δ(u) are free and unconstrained. Thus, the following are the necessary conditions for optimality:

(i) Existence and uniqueness: Euler-Lagrange equations

Because the variation δ ( x ) was not bounded (i.e., it was free), we have

∂L/∂x − (d/dt)(∂L/∂ẋ) = 0 E25

Using Eq. (22), we obtain

∂L/∂ẋ = −λ. E26

The Euler-Lagrange equations could be transformed as

λ̇ = −∂L/∂x, E27

and by the definition of the Lagrangian, Eq. (27) becomes

λ̇ = −(∂f/∂x)λ − ∂F/∂x E28

Eq. (28) shows that the Euler-Lagrange equations are the equations that specify the dynamic Lagrange multipliers.

(ii) Constraints relations

Because the variation δ ( λ ) is free, we have

∂L/∂λ = 0 E29

which is equivalent to (20). This implies that, along the optimal trajectory, the state differential equations must hold.

(iii) Optimal control

Also, because the variation δ ( u ) is free, it follows that the optimal control policy must be consistent with

∂L/∂u = 0 E30

or

∂F/∂u + (∂f/∂u)λ = 0, and E31

(iv) Transversality boundary conditions

−λ(T) δ(x_T) + [F(T) + λ(T) f(T)] δ(T) = 0 E32

The necessary conditions (i) to (iv) can be simplified further by introducing a Hamiltonian

H = F(x, u, t) + λ f(x, u, t) E33

such that

  1. Euler’s equation:

    λ̇ = −∂H/∂x E34

  2. Constraints relations:

    ẋ = f(·) = ∂H/∂λ E35

  3. Optimal control:

    ∂H/∂u = 0, and E36

  4. Boundary conditions:

    −λ(T) δ(x_T) + H(T) δ(T) = 0 E37

Furthermore, with the assumption that all the necessary conditions for optimality exist and are sufficient for a unique optimal control, a sequential decision process for an optimal response strategy can be developed.
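Conditions 1 to 4 suggest the classical forward-backward sweep: integrate the state forward, the costate backward from the transversality condition, and update the control from ∂H/∂u = 0. The sketch below applies this to a simple scalar problem; the quadratic objective ∫(x² + u²) dt and the stable linear dynamics x′ = −x + u are hypothetical choices for illustration, not a model from this chapter.

```python
import numpy as np

def forward_backward_sweep(x0, T=1.0, n=500, iters=60, relax=0.5):
    """Forward-backward sweep for min ∫ (x^2 + u^2) dt with x' = -x + u.
    With H = x^2 + u^2 + lam*(-x + u), the necessary conditions are:
       state    x'   =  dH/dlam = -x + u
       costate  lam' = -dH/dx   = -2x + lam,  lam(T) = 0
       control  dH/du = 2u + lam = 0  ->  u = -lam/2
    The control update is relaxed for stable convergence."""
    t = np.linspace(0.0, T, n + 1)
    dt = t[1] - t[0]
    u = np.zeros(n + 1)                       # initial control guess
    for _ in range(iters):
        x = np.empty(n + 1); x[0] = x0        # forward sweep: state
        for k in range(n):
            x[k + 1] = x[k] + dt * (-x[k] + u[k])
        lam = np.empty(n + 1); lam[-1] = 0.0  # backward sweep: costate
        for k in range(n, 0, -1):
            lam[k - 1] = lam[k] + dt * (2 * x[k] - lam[k])
        u = (1 - relax) * u + relax * (-lam / 2)   # dH/du = 0 update
    return t, x, u, lam

t, x, u, lam = forward_backward_sweep(x0=1.0)
```

On convergence, the returned trajectories satisfy the stationarity condition 2u + λ ≈ 0 pointwise and the transversality condition λ(T) = 0, and the computed control is negative, pushing the costly state toward zero.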


4. Sequential optimization processes

Sequential decision processes are mathematical abstractions of situations in which decisions must be made in several stages while incurring a certain cost at each stage. The philosophy here is to establish a sequential decision policy to be used as a combating technique strategy in oil spill control.

First, consider x_t at time t ∈ [0, T], where T specifies the time horizon for the situation. For a control u_t defined on [0, T], the state equation given in Eq. (39) governs the instantaneous rate of variation of the system. Thus, x_t ∈ ℝⁿ denotes the state of the oil spill in water, ẋ_t ∈ ℝⁿ represents the vector of first-order time derivatives of x_t, and u_t ∈ U ⊆ ℝᵐ denotes the control vector. With the assumption that the initial value x₀ and the control trajectory over the time interval 0 ≤ t ≤ T are known, the optimization problem over the control trajectory is given as

min_u ∫₀ᵀ f(x(t), u(t), t) dt E38
subject to  ẋ(t) = g(x(t), u(t), t) E39

where g is a given function of u, t, and possibly x. This model establishes a sequential decision path for the optimal policy to be used in the application of the oil spill combating technique.

By introducing a value function V, we have

V(0, x₀) := min_u ∫₀ᵀ f(t, x(t), u(t)) dt  subject to  ẋ(t) = g(t, x(t), u(t)), E40

and by fixing Δ t > 0 , we get

V(0, x₀) = min_u { ∫₀^{Δt} f(t, x(t), u(t)) dt + ∫_{Δt}^{T} f(t, x(t), u(t)) dt }. E41

Also, with the application of the principle of optimality, we have

V(0, x₀) = min_u { ∫₀^{Δt} f(t, x(t), u(t)) dt + V(Δt, x(Δt)) } E42

Discretizing via Taylor series expansion, we get

V(0, x₀) = min_u { f(t₀, x₀, u) Δt + V(t₀, x₀) + V_t(t₀, x₀) Δt + V_x(t₀, x₀) Δx + ⋯ } E43

where Δx = x(t₀ + Δt) − x(t₀). Thus, dividing by Δt and letting Δt → 0, we have

−V_t(t, x) = min_u { f(t, x, u) + V_x(t, x) g(t, x, u) } E44

with boundary condition

V(T, x_T) = 0. E45

Theorem 1 [8]: Let [t₀, t₁] denote the range of time in which a sequence of controls is applied. Then, for any process, t₀ ≤ τ₁ ≤ τ₂ ≤ t₁:

V(τ₁, x(τ₁)) ≤ V(τ₂, x(τ₂)) E46

and for any t, such that t₀ ≤ t ≤ t₁, the set Λ_{t, x(t)} is not empty, as the restriction of the control to the time interval is feasible for x(t).

Proof:

Let u* be any optimal control in Λ_{τ₂, x(τ₂)}, where u* is defined on [τ₁, t₁] and is given by

u*(ξ) = { u(ξ), if τ₁ ≤ ξ ≤ τ₂;  u(ξ), if τ₂ ≤ ξ ≤ t₁ E47

Then, u* ∈ Λ_{τ₁, x(τ₁)}. Hence,

V(τ₁, x(τ₁)) ≤ ϕ₁(τ₁, x*(τ₁)) E48

where ϕ₁(·) is a value function defined on [τ₁, t₁]. Because u* was any optimal control in Λ_{τ₂, x(τ₂)}, taking the infimum over the controls in Λ_{τ₂, x(τ₂)} gives

V(τ₁, x(τ₁)) ≤ V(τ₂, x(τ₂)) E49

This implies that, if u * is any optimal control for the sequential optimization process, the value function V evaluated along the state and control trajectories will be a nondecreasing function of time.

Theorem 1 summarizes the expected future utility at any node of the decision tree on the assumption that an optimal policy will be followed. The implication is that a continuous selection of a sequence of controls at different assessment points will optimize the performance index of the control strategy. This, however, requires a decision rule, and the next section contains further explanation on this.

4.1. Decision rule

A successful sequential decision requires a decision rule that will prescribe a procedure for action selection in each state at a specified decision epoch. This is a known strategy in the field of operations research. Moreover, problems of decision-making under uncertainty are best modeled as Markov decision processes (MDP) [8]. When a rule depends on the previous states and actions of the system only through the current state or action, it is said to be Markovian, and it is deterministic if it chooses an action with certainty [8]. In contrast, a decision rule that depends on the past history of the system is known as “history dependent”. In general, an MDP can be expressed as a process that

  • allowed the decision maker to select an action whenever the system state changes and model the progression in continuous time and

  • allowed the time spent in a certain state to follow a known probability distribution.

It follows a time-homogeneous, finite state, and finite action semi-MDP (SMDP) defined as

  1. P(x_{t+1} | u_t, x_t), t = 0, 1, 2, …, T — transition probability;

  2. P(r_t | u_t, x_t) — reward probability; and

  3. P(u_t | x_t) = π(u_t | x_t) — policy.

This implies that, although the system state may change several times between decision epochs, only the state at a decision epoch is relevant to the decision maker. Consider the stochastic process x₀, x₁, x₂, …, where x_t and x(t) may be used interchangeably. Note that we are considering the optimal control of a discrete-time Markov process with a finite time horizon T, where the Markov process x takes values in some measurable space Ω. In what follows, assume that we have a sequence of controls u₀, u₁, u₂, …, where u_t, the action taken by the decision maker at time t = 0, 1, …, n, takes values in some measurable space U of allowable controls. The decision rule is described by considering a class of randomized history-dependent strategies consisting of a sequence of functions

d = (d₀, d₁, …, d_{T−1}), E50

and also by considering the following sequence of events:

  • an initial state x 0 is obtained;

  • having observed x₀, the response official (the controller) selects a control u₀ ∈ U;

  • a state x1 is attained according to a known probability measure P ( x 1 | x 0 , u 0 ) ; and

  • knowing x 1, the response official selects a control u 1 U .

The basic problem therefore is to find a policy π = (d₀, d₁) consisting of d₀ and d₁ that will minimize the objective functional J(x₀) = Σ_{x₁} f[x₁, d₁(x₁)] P(x₁ | x₀, d₀(x₀)), with the policy given as P(u_t | x_t) = π(u_t | x_t). Hence, we set μ_t^π : H_t^d → ℝ to denote the total expected reward obtained by using Eq. (50) at decision epochs t, t + 1, …, T − 1. With the assumption that the history at decision epoch t is h_t^d ∈ H_t^d, the decision rule follows μ_t^π for t < T such that

μ_t^π(h_t^d) = E_{h_t}^π [ Σ_{k=t}^{T−1} r_k(x_k, u_k) + r_T(x_T) ] E51

In particular, if the SMDP components (1) to (3) are stationary, then, for a given rule π and an initial state x, the future rewards can be estimated. Let V^π(x) be the value function; then, the expected discounted return can be measured as

V^π(x) = E[ Σ_{t=0}^∞ θ^t r_t | x₀ = x; π ] E52
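For a finite state space, the expected discounted return (E52) of a stationary policy can be computed exactly: conditioning on the first transition gives V^π = r + θ P V^π for the policy-induced transition matrix P and expected one-step reward vector r, i.e., the linear system (I − θP)V = r. A sketch with a hypothetical two-state spill model (the states, probabilities, and costs are illustrative, not from the chapter):

```python
import numpy as np

def policy_value(P, r, theta):
    """Expected discounted return V(x) = E[sum_t theta^t r_t | x_0 = x]
    of a stationary policy: solve the linear system (I - theta*P) V = r,
    where P is the policy-induced transition matrix and r the expected
    one-step reward vector."""
    n = P.shape[0]
    return np.linalg.solve(np.eye(n) - theta * P, r)

# Hypothetical two-state model: state 0 = slick present, state 1 = clean.
# The chosen response policy removes the slick with probability 0.6 per
# epoch and incurs a unit clean-up cost while the slick persists.
P = np.array([[0.4, 0.6],
              [0.0, 1.0]])
r = np.array([-1.0, 0.0])
V = policy_value(P, r, theta=0.9)   # V[0] = -1 / (1 - 0.9*0.4) = -1.5625
```

Comparing such values across candidate policies is the basic mechanism by which a decision rule can be ranked before a sequential response strategy is committed to.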

However, the entire cast of players involved in oil spill control (the contingency planners, response officials, government agencies, pipeline operators, tanker owners, etc.) shares a keen interest in being able to anticipate oil spill response costs for planning purposes, according to Arapostathis et al. [9]. This means that the type of decision and/or action chosen at a given point in time is a function of the clean-up cost. In other words, the clean-up/response cost is a key indicator for the optimal control. Thus, to set a pace for rapid response, it is important to introduce cost concepts into the control paradigm, as discussed in the next section.


5. Optimal costs model

Consider the following synthesis: the system starts in state x₀ and the response team takes a permitted action u_t(x₀), resulting in an output (reward) r_t. This decision determines the cost incurred. Now, define a cost function that assigns a cost to each sequence of controls as

C(x₀, u_{0:T−1}) = Σ_{t=0}^{T−1} β(t, x_t, u_t) + ω(x_T) E53

where β(t, x, u) is the cost associated with taking action u at time t in state x and ω(x_T) is the terminal cost attached to the state reached at time T; the optimal control problem is to find the sequence u_{0:T−1} that minimizes Eq. (53). Thus, we introduce the optimal cost functional:

C(t, x_t) = min_{u_{t:T−1}} ( Σ_{k=t}^{T−1} β(k, x_k, u_k) + ω(x_T) ) E54

which solves the optimal problem from an intermediate time t until the fixed end time T, starting at an arbitrary state x t . Here, the minimum of Eq. (53) is denoted by C ( 0 , x 0 ) . Hence, a procedure to compute C ( t , x ) from C ( t + 1 , x ) for all x recursively using dynamic programming is given as follows:

Set

C ( T , x ) = ω ( x )

So that

C(t, x_t) = min_{u_{t:T−1}} { Σ_{k=t}^{T−1} β(k, x_k, u_k) + ω(x_T) }
         = min_{u_t} { β(t, x_t, u_t) + min_{u_{t+1:T−1}} [ Σ_{k=t+1}^{T−1} β(k, x_k, u_k) + ω(x_T) ] }
         = min_{u_t} { β(t, x_t, u_t) + C(t+1, x_{t+1}) }
         = min_{u_t} { β(t, x_t, u_t) + C(t+1, x_t + f(t, x_t, u_t)) } E55

It can be seen that the reduction from the minimization over the whole path u_{0:T−1} to a sequence of minimizations over u_t is due to the Markovian nature of the problem: the future depends on the past only through the present. Thus, in the last line of Eq. (55), the minimization is done for each x_t separately and also depends explicitly on time. The procedure for the dynamic programming is illustrated as follows:

Step 1: Initialization: C ( T , x ) = ω ( x )

Step 2: Backwards: For t = T 1 , , 0 and for all x, compute

u_t*(x) = argmin_u { β(t, x, u) + C(t+1, x + f(t, x, u)) }
C(t, x) = β(t, x, u_t*(x)) + C(t+1, x + f(t, x, u_t*(x)))

Step 3: Forwards: For t = 0 , , T 1 , compute

x*_{t+1} = x*_t + f(t, x*_t, u*_t(x*_t)),  x*₀ = x₀.
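Steps 1 and 2 can be sketched on a small finite problem. The state grid, clean-up action set, and cost functions below are hypothetical illustrations; the code performs the backward pass of Eq. (55), recording the minimizing control at each (t, x).

```python
import numpy as np

def backward_induction(beta, omega, f, states, controls, T):
    """Backward pass of Eq. (55): compute C(t, x) and the minimizing
    control u*_t(x) on a finite grid. Here f(t, i, u) returns the index
    of the successor state, so C(t+1, .) is a simple array lookup."""
    n = len(states)
    C = np.array([omega(x) for x in states], dtype=float)   # C(T, .)
    policy = np.zeros((T, n), dtype=int)
    for t in range(T - 1, -1, -1):                          # backwards
        C_new = np.empty(n)
        for i, x in enumerate(states):
            costs = [beta(t, x, u) + C[f(t, i, u)] for u in controls]
            policy[t, i] = int(np.argmin(costs))
            C_new[i] = min(costs)
        C = C_new
    return C, policy

# Hypothetical slick of up to 3 units: holding cost x per epoch, 2 per
# clean-up action, and a terminal penalty of 10 per unit left at time T.
beta = lambda t, x, u: x + 2 * u
omega = lambda x: 10 * x
f = lambda t, i, u: max(i - u, 0)
C0, policy = backward_induction(beta, omega, f,
                                states=[0, 1, 2, 3], controls=[0, 1], T=3)
```

The forward pass of Step 3 is then a simple rollout: starting from x₀, repeatedly apply the stored policy and the transition f to recover the optimal trajectory. In this toy instance, cleaning at every epoch is optimal from the worst initial state.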

Lemma 2: Let π* = [u₀*, u₁*, …, u*_{T−1}] be an optimal control policy for the control problem and assume that, when using π*, a given state x_i occurs at time i a.e. Suppose that the state is x_i at time i, and we wish to minimize the cost functional from time i to T:

E[ ω(x_T) + Σ_{t=i}^{T−1} β(x_t, u_t(x_t)) ] E56

Then, [ u i * , u i + 1 * , , u T 1 * ] is the optimal path for this problem and u t * is the optimal control.

Proof: Define C*(t, x) as the optimal cost-to-go, which satisfies

min_u [ g(x, u) + ∂_t C*(t, x) + ∂_x C*(t, x) f(x, u) ] = 0 E57

where C*(T, x) = ω(x), so that ∂_x C*(T, x) = ω′(x). If we define λ_t = ∂_x C*(t, x_t*), then, by introducing the Hamiltonian H(x, u, λ) = g(x, u) + λ f(x, u), where λ̇_t = −∂_x H(x_t*, u_t*, λ_t), it follows from the optimality principle that

u_t* = argmin_u [ g(x_t*, u) + λ_t f(x_t*, u) ]  ∀ t ∈ [0, T] E58

Theorem 2: Let

min_u [ g(x, u) + ∂_t V(t, x) + ∂_x V(t, x) f(x, u) ] = 0  ∀ t, x E59

with the condition that V(T, x) = ω(x) ∀ x.

Suppose that u_t* attains the minimum in Eq. (59) for all t and x. Let (x_t* | t ∈ [0, T]) be the oil trajectory obtained from the known quantity of spill at the initial state x₀ when the control trajectory u_t* = u*(t, x_t*) is used, and ẋ_t = f(x_t*, u*(t, x_t*)) ∀ t ∈ [0, T]. Then,

V(t, x) = C*(t, x)  ∀ t, x E60

and {u_t* | t ∈ [0, T]} is the optimal control [7].


6. Conclusion

This chapter presents the mathematical abstraction of an optimal control process in which decisions must be made in several stages along an optimal control path. The aim is to minimize the apparent toxicological effect of the oil spill clean-up technique by determining the control measure that causes the process to satisfy its physical constraints while optimizing some performance criteria for all future earnings from the marine biota. Hence, if the optimal policy is followed in the future, the recursive method for the sequential optimization will converge to the optimal cost control and value function, which optimizes the probable future value at any node of the decision tree.

References

  1. Xu X, Pang S: Briefing of activities relating to the studies on environmental behaviour and economic toxicity of toxic organics. Journal of Environmental Science. 1992; 4(4): 3–9.
  2. Bassey K, Chigbu P: On optimal control theory of marine oil spill management: A Markovian decision approach. European Journal of Operational Research. 2012; 217(2): 470–478.
  3. Brown J: Nigeria Niger Delta bears brunt after 50 years of oil spills. The Independent, UK, October 20, 2006.
  4. Yapa P: State-of-the-art review of modelling transport and fate of oil spills. Journal of Hydraulic Engineering. 1996; 122(11): 594–600.
  5. Bassey K: On optimality of marine oil spill pollution control. International Journal of Operations Research and Optimization. 2011; 2(2): 215–238.
  6. Tkalich P, Huda M, Gin K: A multiphase oil spill model. Journal of Hydraulic Research. 2003; 4(2): 115–125.
  7. Chigbu P, Bassey K: Numerical modelling of spilled oil transport in marine environment. Pacific Journal of Science and Technology. 2010; 10(2): 565–571.
  8. Bassey K: Methodological model for optimal control of marine oil spill [Ph.D. thesis]. University of Nigeria; 2014.
  9. Arapostathis A, Borkar V, Fernandez-Gaucherand E, Ghosh M, Marcus S: Discrete-time controlled Markov processes with average cost criterion: A survey. SIAM Journal on Control and Optimization. 1993; 31: 282–344.

Notes

  • See [9] for detailed discussion on principle of optimality.
