Bilevel optimization is a special kind of optimization in which one problem is embedded within another. The outer optimization task is commonly referred to as the upper-level optimization task, and the inner optimization task as the lower-level optimization task; accordingly, these problems involve two kinds of variables, upper-level and lower-level. Bilevel optimization first arose in game theory, with the German economist von Stackelberg, whose 1934 book described this hierarchical problem. Bilevel optimization problems are now commonly found in a number of real-world settings: transportation, economics, decision science, business, engineering, and so on. In this chapter, we provide a general formulation for the bilevel disjunctive optimization problem on affine manifolds. Such a problem contains two nested optimization tasks: the outer problem is the leader's (upper-level) optimization problem, and the inner problem is the follower's (lower-level) optimization problem; the two levels have their own objectives and constraints. Topics covered: affine convex functions, optimization with auto-parallel restrictions, affine convexity of posynomial functions, the bilevel disjunctive problem and an algorithm for it, models of bilevel disjunctive programming problems, and properties of minimum functions.
- convex programming
- affine manifolds
- optimization along curves
- bilevel disjunctive optimization
- minimum functions
- Mathematics Subject Classification 2010: 90C25
1. Affine convex functions
In optimization problems [16, 17, 19, 23, 24, 25, 26, 27], one can use an affine manifold, that is, a pair $(M, \Gamma)$, where $M$ is a smooth real $n$-dimensional manifold and $\Gamma$ is a symmetric affine connection on $M$. The connection produces auto-parallel curves $x(t)$ via the ODE system
$$\ddot{x}^k(t) + \Gamma^k_{ij}(x(t))\,\dot{x}^i(t)\,\dot{x}^j(t) = 0, \qquad k = 1, \dots, n.$$
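As a minimal numerical sketch of this ODE system, assume (purely for illustration) a one-dimensional manifold with a constant connection coefficient $\gamma$, so the auto-parallel equation reduces to $x'' + \gamma (x')^2 = 0$, which has the closed-form solution $x(t) = x_0 + \frac{1}{\gamma}\ln(1 + \gamma v_0 t)$. The sketch integrates the ODE with classical RK4 and compares against that solution.

```python
import math

# Assumption: 1-D manifold, constant connection coefficient gamma, so the
# auto-parallel ODE is x'' + gamma * (x')**2 = 0.
def autoparallel(gamma, x0, v0, t_end, n=10000):
    """Integrate the auto-parallel ODE with classical RK4."""
    h = t_end / n
    x, v = x0, v0

    def rhs(x_, v_):
        # first-order form: (x', v') = (v, -gamma * v**2)
        return v_, -gamma * v_ * v_

    for _ in range(n):
        k1x, k1v = rhs(x, v)
        k2x, k2v = rhs(x + h/2*k1x, v + h/2*k1v)
        k3x, k3v = rhs(x + h/2*k2x, v + h/2*k2v)
        k4x, k4v = rhs(x + h*k3x, v + h*k3v)
        x += h/6 * (k1x + 2*k2x + 2*k3x + k4x)
        v += h/6 * (k1v + 2*k2v + 2*k3v + k4v)
    return x

gamma, x0, v0, T = 0.5, 1.0, 2.0, 3.0
numeric = autoparallel(gamma, x0, v0, T)
exact = x0 + math.log(1 + gamma * v0 * T) / gamma  # closed-form auto-parallel
print(numeric, exact)
```

The agreement of the two values illustrates that the auto-parallel through $(x_0, v_0)$ is uniquely determined by the connection.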
Definition 1.1 An affine manifold is called auto-parallely complete if any auto-parallel $x(t)$ starting at a given point is defined for all values of the parameter $t \in \mathbb{R}$.
Theorem 1.1 Let be a (Hausdorff, connected, smooth) compact -manifold endowed with an affine connection and let . If the holonomy group (regarded as a subgroup of the group of all the linear automorphisms of the tangent space ) has compact closure, then is auto-parallely complete.
Let $(M, \Gamma)$ be an auto-parallely complete affine manifold. For a function $f: M \to \mathbb{R}$, we define the tensor $\mathrm{Hess}_\Gamma f$ of components
$$f_{;ij} = \frac{\partial^2 f}{\partial x^i \partial x^j} - \Gamma^k_{ij}\,\frac{\partial f}{\partial x^k}.$$
Definition 1.2 A function is called:
(1) linear affine with respect to $\Gamma$ if $f_{;ij} = 0$ throughout;
(2) affine convex (convex with respect to $\Gamma$) if $f_{;ij} \succeq 0$ (positive semidefinite) throughout.
Equivalently, the function $f$ is: (1) linear affine if its restriction to each auto-parallel $x(t)$ satisfies $f(x(t)) = at + b$, for some numbers $a, b$ that may depend on the curve; (2) affine convex if its restriction $f(x(t))$ is convex on each auto-parallel $x(t)$.
Theorem 1.2 If there exists a linear affine nonconstant function $f$ on $(M, \Gamma)$, then the differential $df$ is in the kernel of the curvature tensor field $R$.
Proof. For a given connection $\Gamma$, if we consider $f_{;ij} = 0$
as a PDE system (a particular case of a Frobenius-Mayer system of PDEs) with $\frac{n(n+1)}{2}$ equations and the unknown function $f$, then we need the complete integrability conditions
Corollary 1.1 If there exist $n$ linear affine functions on $(M, \Gamma)$ whose differentials are linearly independent, then $(M, \Gamma)$ is flat, that is, $R = 0$.
Of course this only means the curvature tensor is zero on the topologically trivial region we used to set up our co-vector fields . But we can always cover any manifold by an atlas of topologically trivial regions, so this allows us to deduce that the curvature tensor vanishes throughout the manifold.
Remark 1.1 There is actually no need to extend these functions to the entire manifold. If this could be done, then their differentials would be everywhere-nonzero co-vector fields; but there are topologies, for example that of the sphere, for which we know such fields do not exist. Therefore, there are topological manifolds on which we are forced to work only on topologically trivial regions.
The following theorem is well-known [16, 17, 19, 23]. Due to its importance, now we offer new proofs (based on catastrophe theory, decomposing a tensor into a specific product, and using slackness variables).
Theorem 1.3 Let $f: M \to \mathbb{R}$ be a function.
(1) If $f$ is regular or has only one minimum point, then there exists a connection $\Gamma$ such that $f$ is affine convex.
(2) If $f$ has a maximum point $x_0$, then there is no connection making $f$ affine convex throughout.
Proof. For the Hessian to be positive semidefinite, we need conditions in the form of inequalities and equalities. The unknowns are the components $\Gamma^k_{ij}$ of the connection. The inequalities can be replaced by equalities using slackness variables.
The first central idea of the proof is to use catastrophe theory, since almost all families of real differentiable functions, with parameters, are structurally stable and are equivalent, in the vicinity of any point, to one of the following canonical forms:
We eliminate the cases with a maximum point, that is, the Morse saddle and the saddle point. Around each critical point (in a chart), the canonical form is affine convex with respect to appropriate locally defined linear connections that can be found easily. Using changes of coordinates and a partition of unity, we glue all these connections into a global one, making $f$ affine convex on $M$.
At any critical point $x_0$, we have $df(x_0) = 0$, so the affine Hessian reduces to the Euclidean Hessian. Then the maximum point condition or the saddle condition contradicts the affine convexity condition.
A direct proof based on the decomposition of a tensor: let $(M, \Gamma)$ be an affine manifold and $f: M \to \mathbb{R}$ a function.
Suppose $f$ has no critical points (is regular). If the function is not convex with respect to $\Gamma$, we look for a new connection $\bar\Gamma^k_{ij} = \Gamma^k_{ij} - T^k_{ij}$, with the unknown a tensor field $T^k_{ij}$, such that
$$f_{;ij} + T^k_{ij}\,\frac{\partial f}{\partial x^k} = A_{ij},$$
where $A = (A_{ij})$ is a positive semi-definite tensor. A very particular solution is the decomposition $T^k_{ij} = X^k B_{ij}$, where the vector field $X = X^k \partial_k$ has the property
$$df(X) = X^k\,\frac{\partial f}{\partial x^k} = 1,$$
and the tensor $B$ is
$$B_{ij} = A_{ij} - f_{;ij}.$$
Remark 1.2 The connection is strongly dependent on both the function and the tensor field .
Suppose $f$ has a minimum point $x_0$. In this case, $df(x_0) = 0$, so the condition $df(X) = 1$ cannot hold at $x_0$. Can we carry out the previous reasoning away from $x_0$ and then extend the obtained connection by continuity? The answer is generally negative. Indeed, let us compute
Here we cannot plug in the point $x_0$ because we get $\frac{0}{0}$, an indeterminate form.
To obtain a contradiction, we fix an auto-parallel $x(t)$, starting from the minimum point $x_0$ and tangent to a vector $v$, and we compute (via l'Hôpital's rule)
But this result depends on the direction $v$ (it takes different values along two different auto-parallels).
In some particular cases, we can eliminate the dependence on the vector . For example, the conditions
are sufficient to do this.
A particular condition for independence of $v$ is
Under this particular condition, we can show that connections of the previous type can be built that are valid everywhere.
1.1. Illustrative examples
Let us illustrate our previous statements with the following examples.
Example 1.1 (for the first part of the theorem) Let us consider the function and . Then and has no critical point. Moreover, the Euclidean Hessian of is not positive semi-definite everywhere. Let us carry out the above construction for . Taking , we obtain the connection
that is not unique.
Example 1.2 (for one minimum point) Let us consider the function and . Then and has a unique critical minimum point . However, the Euclidean Hessian of is not positive semi-definite everywhere. We repeat the previous reasoning for . Hence we obtain ,
Observe that . Hence take .
The next example shows what happens when we step outside the conditions of the previous theorem.
Example 1.3 Let us take the function , where the critical point is an inflection point. We take , which is not defined at the critical point , but the relation of convexity is realized by prolongation,
Let us consider the ODE of auto-parallels
are auto-parallels on , where are real solutions of . These curves are extended at by continuity. The manifold is not auto-parallely complete. Since the image is not a “segment”, the function is not globally convex.
Remark 1.3 For , there exist functions which have two minimum points without having another extremum point. For example,
has two (global) minimum points .
This function is a difference of two affine convex functions (see Section 2).
Our chapter is based also on some ideas in:  (convex mappings between Riemannian manifolds),  (geometric modeling in probability and statistics),  (arc length in metric and Finsler manifolds),  (applications of Hahn-Banach principle to moment and optimization problems),  (geodesic connectedness of semi-Riemannian manifolds), and  (tangent and cotangent bundles). For algorithms, we recommend the paper  (sequential and parallel algorithms).
2. Optimization with auto-parallel restrictions
2.1. Direct theory
The auto-parallel curves on the affine manifold $(M, \Gamma)$ are the solutions of the second-order ODE system
$$\ddot{x}^k(t) + \Gamma^k_{ij}(x(t))\,\dot{x}^i(t)\,\dot{x}^j(t) = 0, \qquad k = 1, \dots, n.$$
Obviously, the complete notation is , with
Definition 2.1 Let be open and connected and a function. The point is called minimum (maximum) point of conditioned by the auto-parallel system, together with initial conditions, if for the maximal solution , there exists a neighborhood of such that
Theorem 2.1 If is an extremum point of conditioned by the previous second order system, then .
Definition 2.2 The points which are solutions of the equation are called critical points of conditioned by the previous spray.
Theorem 2.2 If is a conditioned critical point of the function of class constrained by the previous auto-parallel system and if the number
is strictly positive (negative), then is a minimum (maximum) point of constrained by the auto-parallel system.
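A minimal numerical sketch of these conditioned-extremum notions, under illustrative assumptions: take the flat connection on $\mathbb{R}^2$, whose auto-parallels are straight lines $t \mapsto p + t v$, and the saddle function $f(x, y) = x^2 - y^2$ (both are assumptions, not taken from the chapter). Restricted to a suitable line, $f$ has a conditioned critical point with a definite second derivative, as in Theorems 2.1 and 2.2.

```python
# Assumptions: flat connection on R^2 (auto-parallels are straight lines)
# and the saddle f(x, y) = x**2 - y**2.
def f(x, y):
    return x * x - y * y

def phi(t, p=(0.0, 0.0), v=(1.0, 2.0)):
    """f restricted to the auto-parallel t -> p + t*v."""
    return f(p[0] + t * v[0], p[1] + t * v[1])

h = 1e-5
d1 = (phi(h) - phi(-h)) / (2 * h)             # first derivative at t = 0
d2 = (phi(h) - 2 * phi(0) + phi(-h)) / h**2   # second derivative at t = 0
print(d1, d2)  # d1 ~ 0: conditioned critical point; d2 < 0: conditioned maximum
```

Here the restriction is $\varphi(t) = -3t^2$, so the origin, a saddle of $f$ on $\mathbb{R}^2$, is a conditioned maximum along this particular auto-parallel.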
Example 2.1 We compute the Christoffel symbols on the unit sphere $S^2$, using the spherical coordinates $(\theta, \varphi)$ and the Riemannian metric $ds^2 = d\theta^2 + \sin^2\theta\, d\varphi^2$.
When $\theta \in (0, \pi)$, we find
$$\Gamma^\theta_{\varphi\varphi} = -\sin\theta\cos\theta, \qquad \Gamma^\varphi_{\theta\varphi} = \Gamma^\varphi_{\varphi\theta} = \cot\theta,$$
and all the other $\Gamma$s are equal to zero. We can show that the apparent singularity at the poles can be removed by a better choice of coordinates there. Thus, the above affine connection extends to the whole sphere.
The second-order system defining the auto-parallel curves (geodesics) on $S^2$ is
$$\ddot\theta - \sin\theta\cos\theta\,\dot\varphi^2 = 0, \qquad \ddot\varphi + 2\cot\theta\,\dot\theta\,\dot\varphi = 0.$$
The solutions are great circles on the sphere. For example, $\theta = \pi/2$ and $\dot\varphi =$ const give the equator.
We compute the curvature tensor of the unit sphere $S^2$. Since there are only two independent coordinates, all the non-zero components of the curvature tensor are determined by $R^\theta_{\varphi\theta\varphi} = \sin^2\theta$; the other components follow from the curvature symmetries or vanish.
Let be the maximal auto-parallel which satisfies , ; , . We wish to compute with the restriction .
Since , the critical point condition becomes . Consequently, the critical points are either , or , or .
The components of the Hessian of are
At the critical points or , the Hessian of is positive or negative semi-definite. On the other hand, along , we find . Consequently, each such point is a minimum point of the function along each auto-parallel starting from the given point and tangent to the given direction.
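The geodesic system of Example 2.1 can be checked numerically. The sketch below (integrator and initial data are illustrative choices) integrates the system with the Christoffel symbols above, starting tangent to the equator, and confirms that the solution stays on the equator with $\varphi$ advancing linearly.

```python
import math

# Geodesic system on S^2 (Example 2.1):
#   theta'' =  sin(theta)*cos(theta) * phi'**2
#   phi''   = -2*cot(theta) * theta' * phi'
def geodesic(theta, phi, dtheta, dphi, t_end, n=20000):
    """Midpoint (RK2) integration of the geodesic equations."""
    h = t_end / n

    def acc(th, dth, dph):
        return (math.sin(th) * math.cos(th) * dph * dph,
                -2.0 * (math.cos(th) / math.sin(th)) * dth * dph)

    for _ in range(n):
        a1, b1 = acc(theta, dtheta, dphi)
        thm = theta + h/2 * dtheta
        dthm, dphm = dtheta + h/2 * a1, dphi + h/2 * b1
        a2, b2 = acc(thm, dthm, dphm)
        theta += h * dthm
        phi += h * dphm
        dtheta += h * a2
        dphi += h * b2
    return theta, phi

# Start on the equator, tangent to it: theta = pi/2, theta' = 0, phi' = 1.
th, ph = geodesic(math.pi / 2, 0.0, 0.0, 1.0, t_end=1.0)
print(th, ph)  # stays on the great circle theta = pi/2, with phi ~ t
```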
2.2. Theory via the associated spray
This point of view regarding extrema comes from paper .
The second order system of auto-parallels induces a spray (special vector field) on the tangent bundle , that is,
The solutions of class are called field lines of . They depend on the initial condition , and therefore the notation is more suggestive.
Definition 2.3 Let be open and connected and a function. The point is called minimum (maximum) point of conditioned by the previous spray, if for the maximal field line , there exists a neighborhood of such that
Theorem 2.3 If is an extremum point of conditioned by the previous spray, then is a point where is in .
Definition 2.4 The points which are solutions of the equation
are called critical points of conditioned by the previous spray.
Theorem 2.4 If is a conditioned critical point of the function of class constrained by the previous spray and if the number
is strictly positive (negative), then is a minimum (maximum) point of constrained by the spray.
Example 2.2 We consider the Volterra-Hamilton ODE system .
which models production in a Gause-Witt two-species system evolving in : (1) competition if , , , and (2) parasitism if , , , .
Changing the real parameter into an affine parameter , we find the connection with constant coefficients
Let be the maximal field line which satisfies . We wish to compute with the restriction .
We apply the previous theory. Introduce the vector field
We set the critical point condition . Since , we obtain the relation , that is, the critical point set is a conic in .
Since , the sufficiency condition is reduced to , that is,
This last relation is equivalent to one of the two conditions
Each critical point satisfying one of the last two conditions is a maximum point.
3. Affine convexity of posynomial functions
For the general theory regarding geometric programming (based on posynomial, signomial functions, etc.), see .
Theorem 3.1 Each posynomial function is affine convex, with respect to some affine connection.
Proof. A posynomial function has the form
where all the coefficients are positive real numbers, and the exponents are real numbers. Let us consider the auto-parallel curves of the form
joining the points and , which fix, for example, the affine connection
One term in this sum is of the form , and hence
Remark 3.1 Posynomial functions belong to a class of functions for which the statement "the product of two convex functions is convex" holds.
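A quick numerical check of Theorem 3.1, under assumptions: the posynomial below is illustrative, and the exponential curves $x_i(t) = a_i^{1-t} b_i^{t}$ are assumed to be the auto-parallels used in the proof. Along such a curve every monomial $c\,x_1^{p_1} x_2^{p_2}$ becomes $c\,e^{\alpha + \beta t}$, which is convex in $t$, so any posynomial (a sum of such terms) is convex along the curve.

```python
# Assumed example posynomial: positive coefficients, arbitrary real exponents.
def posynomial(x):
    x1, x2 = x
    return 2.0 * x1**0.5 * x2**-1.0 + 3.0 * x1**2 * x2**3

def curve(a, b, t):
    """Exponential curve joining a (t=0) and b (t=1), componentwise."""
    return [ai**(1 - t) * bi**t for ai, bi in zip(a, b)]

a, b = [1.0, 4.0], [9.0, 0.5]
for t in [0.1, 0.3, 0.5, 0.7, 0.9]:
    lhs = posynomial(curve(a, b, t))
    rhs = (1 - t) * posynomial(a) + t * posynomial(b)
    assert lhs <= rhs + 1e-12   # convexity of f along the curve
print("posynomial is convex along the exponential curve joining a and b")
```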
Corollary 3.1 Each signomial function is a difference of two affine convex posynomials, with respect to some affine connection.
Proof. A signomial function has the form
where all the exponents are real numbers and the coefficients are either positive or negative. Without loss of generality, suppose that for we have and for we have . We use the decomposition
and we apply the theorem above and the implication convex.□
Corollary 3.2 (1) The polynomial functions with positive coefficients, restricted to , are affine convex functions.
(2) The polynomial functions with positive and negative terms, restricted to , are differences of two affine convex functions.
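The decomposition behind Corollary 3.1 can be sketched directly: split the terms of a signomial by the sign of their coefficients into two posynomials. The term representation and the concrete signomial below are assumptions for illustration.

```python
# A signomial as a list of (coefficient, (exponent1, exponent2)) terms.
def eval_terms(terms, x):
    return sum(c * x[0]**a1 * x[1]**a2 for c, (a1, a2) in terms)

signomial = [(2.0, (0.5, -1.0)), (-3.0, (2.0, 1.0)), (1.5, (0.0, 2.0))]

# Split by coefficient sign: each part is a posynomial (positive coefficients).
pos = [t for t in signomial if t[0] > 0]
neg = [(-c, e) for c, e in signomial if c < 0]

x = (1.7, 0.9)
print(eval_terms(signomial, x), eval_terms(pos, x) - eval_terms(neg, x))
```

By construction the signomial equals the difference of the two posynomial parts at every point of the positive orthant.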
Proudnikov  gives necessary and sufficient conditions for representing a Lipschitz multivariable function as a difference of two convex functions. An algorithm and a geometric interpretation of this representation are also given. The outcome of this algorithm is a sequence of pairs of convex functions that converges uniformly to a pair of convex functions when the conditions of the formulated theorems are satisfied.
4. Bilevel disjunctive problem
Let , the leader decision affine manifold, and , the follower decision affine manifold, be two connected affine manifolds of dimension and , respectively. Moreover, is supposed to be complete. Let also be the leader objective function, and let be the follower multiobjective function.
The components are (possibly) conflicting objective functions.
A bilevel optimization problem means a decision of leader with regard to a multi-objective optimum of the follower (in fact, a constrained optimization problem whose constraints are obtained from optimization problems). For details, see [5, 10, 12].
Let , be the generic points. In this chapter, the disjunctive solution set of a follower multiobjective optimization problem is defined by
(1) the set-valued function
(2) the set-valued function
We deal with two bilevel problems:
(1) The optimistic bilevel disjunctive problem
In this case, the follower cooperates with the leader; that is, for each , the follower chooses among all its disjunctive solutions (his best responses) one which is the best for the leader (assuming that such a solution exists).
(2) The pessimistic bilevel disjunctive problem
In this case, there is no cooperation between the leader and the follower, and the leader expects the worst scenario; that is, for each , the follower may choose among all its disjunctive solutions (his best responses) one which is unfavorable for the leader.
So, a general optimization problem becomes a pessimistic bilevel problem.
Theorem 4.1 The value
exists if and only if, for an index , the minimum exists and, for each , either exists or . In this case,
coincides with the minimum of the minima that exist.
Proof. Let us consider the multi-functions and . Then . It follows that exists if and only if either exists or , and at least one minimum exists.
Taking the minimum of the minima that exist, we find
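Theorem 4.1 can be illustrated on a discretized follower domain; the two follower objectives below are assumptions standing in for the components of the follower multiobjective function. The optimistic value is the minimum of the minima that exist.

```python
# Assumed follower objectives on a 1-D discretized domain.
def f1(y):
    return (y - 1.0)**2

def f2(y):
    return abs(y + 2.0) + 0.5

grid = [i / 100.0 for i in range(-500, 501)]          # follower decision grid
minima = [min(f(y) for y in grid) for f in (f1, f2)]  # per-objective minima
value = min(minima)                                   # minimum of the minima
print(minima, value)
```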
Theorem 4.2 Suppose is a compact manifold. If for each , at least one partial function is affine convex and has a critical point, then the problem has a solution.
Proof. In our hypothesis, the set is nonvoid for any , and compactness ensures the existence of .□
In the next theorem, we shall use the value function method (also called the utility function method).
Theorem 4.3 If an increasing scalarization partial function
has a minimum, then there exists an index such that . Moreover, if is bounded, then the bilevel problem has a solution.
Proof. Let . Suppose that for each . Then would not be a minimum point of the partial function . Hence, there exists an index such that .□
Boundedness of implies that the bilevel problem has a solution once it is well-posed; the well-posedness is established in the first part of the proof.
4.1. Bilevel disjunctive programming algorithm
An important concept for making wise tradeoffs among competing objectives is bilevel disjunctive programming optimality on affine manifolds, introduced in this chapter.
We now present an exact algorithm for obtaining the bilevel disjunctive solutions of the multi-objective optimization problem.
Step 1: Solve
Let be a subset in representing the mapping of optimal solutions for the follower multi-objective function.
Step 2: Build the mapping .
Step 3: Solve the leader’s following program
From a numerical point of view, we can use the Newton algorithm for optimization on affine manifolds, which is given in .
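The three steps above can be sketched on discretized decision sets. Everything concrete below (the leader objective `F`, the follower objectives `f1`, `f2`, and the grids) is an illustrative assumption; Step 3 uses the optimistic rule, where the follower picks the response best for the leader.

```python
# Assumed leader objective and follower objectives.
def F(x, y):
    return (x - 1.0)**2 + y * y

def f1(x, y):
    return (y - x)**2

def f2(x, y):
    return (y + x)**2

X = [i / 10.0 for i in range(-20, 21)]   # leader decision grid
Y = [j / 10.0 for j in range(-20, 21)]   # follower decision grid

def argmins(g, dom, tol=1e-12):
    m = min(g(v) for v in dom)
    return [v for v in dom if g(v) <= m + tol]

best = None
for x in X:
    # Steps 1-2: the disjunctive solution set is the union of the
    # argmin sets of the follower objectives at the leader choice x.
    responses = set()
    for f in (f1, f2):
        responses.update(argmins(lambda y: f(x, y), Y))
    # Step 3 (optimistic): the follower picks the response best for the leader.
    for y in responses:
        if best is None or F(x, y) < best[0]:
            best = (F(x, y), x, y)
print(best)  # (leader value, leader choice, follower response)
```

Here the follower's best responses are $y = \pm x$, so the leader minimizes $(x-1)^2 + x^2$ over the grid, reaching the value $0.5$ at $x = 0.5$.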
5. Models of bilevel disjunctive programming problems
The manifold is understood from the context. The connection can be realized in each case by imposing convexity conditions.
Example 5.1 Both the lower- and the upper-level optimization tasks have two objectives each. For a fixed value, the feasible region of the lower-level problem is the area inside a circle with center at the origin and radius equal to . The Pareto-optimal set for the lower-level optimization task, for a fixed , is the bottom-left quarter of the circle,
The linear constraint in the upper-level optimization task does not allow the entire quarter circle to be feasible for some . Thus, at most a couple of points from the quarter circle belong to the Pareto-optimal set of the overall problem. Eichfelder  reported the following Pareto-optimal set of solutions
The Pareto-optimal front in space can be written in parametric form
Example 5.2 Consider the bilevel programming problem
where the set-valued function is
Since , we get
on the regions where the functions are defined.
Taking into account and , it follows that is the unique optimistic optimal solution of the problem. Now, if the leader is not exact enough in choosing his solution, then the real outcome of the problem has an objective function value above , which is far away from the optimistic optimal value zero.
Example 5.3 Let and a Pareto disjunctive problem
Then a bilevel disjunctive programming problem appears, of the form
This problem is interesting except in the case . If and are convex functions, then .
To write an example, we use
and we consider a bilevel disjunctive programming problem of the form
The objective and the multi-function produce a multi-function
In this context, we find the lower envelope
Since , the unique optimal solution is .
If we consider only as active, then the unique optimal solution is maintained. If is active, then the optimal solution is .
6. Properties of minimum functions
Let , the leader decision affine manifold, and , the follower decision affine manifold, be two connected affine manifolds of dimension and , respectively. Starting from a function with two vector variables
and taking the infimum over one variable, say , we build a function
which is called a minimum function.
A minimum function is usually specified by a pointwise mapping of the manifold in the subsets of a manifold and by a functional on . In this context, some differential properties of such functions were previously examined in . Now we add new properties related to increase and convexity ideas.
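A minimum function can be illustrated numerically by replacing the infimum with a minimum over a fine grid; the function $f$ below is an assumption chosen so that the exact infimum is known in closed form.

```python
# Assumed function of two variables: f(x, y) = (x - y)**2 + y**2.
# The exact infimum over y is attained at y = x/2, giving g(x) = x**2 / 2.
def f(x, y):
    return (x - y)**2 + y * y

YS = [j / 1000.0 for j in range(-3000, 3001)]  # grid standing in for R

def g(x):
    """Minimum function: grid approximation of inf_y f(x, y)."""
    return min(f(x, y) for y in YS)

for x in [0.0, 0.5, 1.0, 2.0]:
    print(x, g(x), x * x / 2.0)  # grid minimum vs exact infimum x**2 / 2
```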
First we give a new proof of Brian White's theorem (see his Mean Curvature Flow notes, p. 7, online 2017).
Theorem 6.1 Suppose that is compact, and . Let . If, for each with , we have , then is an increasing function.
Proof. We shall prove the statement in three steps.
(1) If is continuous, then is (uniformly) continuous.
Indeed, is continuous on the compact , hence uniformly continuous. So, for every there exists such that if , then , for any , or
On one hand, if we put and , then we have
Hence , so is .
On the other hand,
Hence , so is .
Finally, , for , that is, is (uniformly) continuous.
(2) Let us fix . If and , then there exists such that , for any .
Suppose ; then there exists such that , for each . It follows that , and so is .
If , then we use . For , the above proof holds, and we take .
(3) is an increasing function.
Let and note . The set is not empty. If , then, by step (2), and, by step (1), . If , we can use step (2) for , and it would follow that is not the lower bound of . Hence and .
Remark The third step shows that a function having the properties (1) and (2) is increasing. For this, the continuity is essential; property (2) alone is not enough. For example, the function defined by on and on has only property (2), but it is not increasing on .
Remark Suppose that is a function and , where is an interior point of . Since is a critical point, we have
Consequently, is an increasing function. If has a nonvoid boundary, then the monotonicity extends by continuity (see also the evolution of an extremum problem).
Example 6.1 The single-time perspective of a function is the function , , . The single-time perspective is convex if is convex.
The single-time perspective is an example verifying Theorem 6.1. Indeed, the critical point condition for , in , , gives , where is a critical point of . Consequently, . On the other hand, at the minimum point, we have . Then is increasing if , as in Theorem 6.1.
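The convexity claim in Example 6.1 can be checked numerically via the midpoint inequality; the choice $f(x) = x^2$ below is an illustrative assumption, for which the perspective is the familiar quadratic-over-linear function $x^2 / t$.

```python
import random

# Single-time perspective P(t, x) = t * f(x / t), t > 0,
# with the assumed convex function f(x) = x**2, so P(t, x) = x**2 / t.
def f(x):
    return x * x

def P(t, x):
    return t * f(x / t)

# Midpoint-convexity check on random pairs of points in the half-plane t > 0.
random.seed(0)
for _ in range(1000):
    t1, x1 = random.uniform(0.1, 5.0), random.uniform(-5.0, 5.0)
    t2, x2 = random.uniform(0.1, 5.0), random.uniform(-5.0, 5.0)
    mid = P((t1 + t2) / 2.0, (x1 + x2) / 2.0)
    assert mid <= (P(t1, x1) + P(t2, x2)) / 2.0 + 1e-9
print("midpoint convexity of the perspective holds on all samples")
```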
Theorem 6.2 Suppose that is compact and . Let . If, for each with , we have , then is a partially increasing function.
Proof. Suppose that is a function and , where is an interior point of . Since is a critical point, we have
Consequently, is a partially increasing function. If has a non-void boundary, then the monotonicity extends by continuity.□
Theorem 6.3 Suppose that is compact and . Let . If, for each with , we have , then is an affine concave function.
Proof. Without loss of generality, we work in the Euclidean case. Suppose that is a function and , where is an interior point of . Since is a critical point, we must have
Taking the partial derivative with respect to and then the scalar product with , it follows that
On the other hand
Theorem 6.4 Let be a function and
If the set is affine convex and is affine convex, then is affine convex.
Proof. Suppose is a function. At points , we have