## 1. Introduction

### 1.1. Background

This work deals with modeling rational player behavior in the simulated robotic soccer game. Although we are using the setting for the RoboCup 2D simulated soccer, we want to propose a method that is not based on the particular features of this implementation. This will result in a more general method that could be equally applicable to 3D soccer and also to the soccer video games other than RoboCup.

The ability by the soccer player making rational decisions about where to move on the field without the ball is the critical factor of success both in a real-life and simulated game. With the total of 22 soccer players in two teams, normally only one player on each side is controlling the ball or trying to get the ball possession. The rest 20 players are doing something else; indeed each must be deliberately moving to some place on the field. For an average player, this is happening more than 90 per cent of the time. Thus, unlike other behaviors dealing with rather rare events such as passing the ball, one should be expecting about 90 per cent impact on the whole game from any improvement in player positioning.

Here we propose the algorithm for determining by the artificial player a reasonably good position on the soccer field for himself when the ball is beyond his control. We limit the consideration to the *offensive* positioning, i.e. when the ball is possessed by own team. In the offensive situation, the major purpose of co-coordinated moving to some individual position on the field by each player without the ball is creating opportunities for receiving passes and eventually scoring the opponent goal. Therefore, player positioning is not a standalone task; rather, it is part of a coordination mechanism and is closely related to decision making about passing the ball and the communication between players.

In this study, however, we deliberately isolate positioning from the communication and ball passing assuming that all these features are developed in a concert. In the early publication, one of the co-authors proposed a solution to the ball passing problem with respect to multiple optimality criteria, i.e. a Pareto-optimal solution (Kyrylov, 2008). For the purpose of optimal offensive positioning, here we are using similar multi-criteria approach.

### 1.2. The Two-Layer Player Positioning Model

During the last 10 years, RoboCup scholars have addressed the positioning problem in different ways. Before overviewing them, first we propose a *generic two-level model* that includes all existing methods for player positioning. Besides providing a common base for the comparison of the existing methods, it helps to demonstrate the unique features of our approach.

The first, *upper level*, determines so-called *reference* position for the player where he should be moving to unless this player is involved in intercepting or kicking the ball. Player coordination on this level is addressed only implicitly. The second, *lower level*, is responsible for fine tuning this reference position by transforming it into the *target* one, i.e. the place on the field where the player should be. Good methods thus facilitate coordination with teammates. On both levels, either heuristic decision making rules are applied or some optimality criteria are used. Such rules and criteria are normally reflecting the algorithm designer’s vision of the soccer game. In many cases these rules are acquired from the real soccer played by humans. This is different from the decision making rules derived by learning algorithms; they do not always have direct counterparts in the body of knowledge accumulated by humans about the soccer game.

### 1.3. Previous Work

The first ever comprehensive study of the player positioning problem in the simulated soccer was presumably made in (Stone, Veloso, & Riley, 1999). In this method called *strategic positioning by attraction and repulsion* (SPAR), the higher-level reference position can be regarded as a fixed point in the field assigned to each player with respect to its role in the team formation. For each player the lower control level is dealing with the target position which is the point on the field where attraction and repulsion factors reach some balance. This allowed the player to deviate from its default position within some area in order to move to the target position calculated with respect to current situation. This method proved to be a good start, as after it was implemented in *CMUnited*, this simulated soccer team became the World RoboCup winner in 1998. Yet later on it was criticized for the limited flexibility on the upper control level. More sophisticated method named *Situation-Based Strategic Positioning* and first proposed in 1999 (Reis, Lau, & Oliveira, 2008) has addressed these higher-level shortcomings; unsurprisingly, *FCPortugal* in which it was implemented, has beaten *CMUnited*. This method was based on a set of logical rules that reflected the subjective views of the algorithm designers on player positioning.

The development that followed, did not demonstrate much improvement on the lower control level, though. The next very successful team, *UvA Trilearn* (De Boer & Kok, 2002) that has outplayed *FCPortugal* in several competitions, implemented somewhat simpler player positioning method. In our classification, the early version of this method dated 2001-2002 is completely located on the higher control layer. Indeed, at any time, the reference position was determined as a weighted sum of the player home position in the formation and current location of the ball. So this position is changing over time and maintains relative locations of the players in the formation thus implementing the team strategy. However, this method does not take into account fine details such as the opportunity to receive a pass. Presumably for this reason it was later improved using so-called *coordination graphs* (Kok, Spaan, & Vlassis, 2003). This lower-level model combines decision making about passing the ball and receiving the pass in an elegant way; implemented in *UvA Trilearn*, it helped to become the World RoboCup winner in 2003. However, from the today standpoint, this model could be further improved, as it was using heuristics rather than rigor multi-criteria optimization.

One more group of the improvements to player positioning is a set of methods based on learning algorithms (Andou, 1998; Nakashima, Udo, & Ishibuchi, 2003; Kalyanakrishnan, Liu, & Stone, 2007). Like the coordination graph, some of these methods do not treat positioning as a standalone player skill, which makes theoretical comparisons difficult. More difficulties arise while trying to elicit meaningful decision making rules and especially address the convergence issue of learned rules to optimal decision making algorithms based on explicitly formulated decision criteria. Thus we are leaving methods based on learning beyond the scope of this study.

The *main objective* of this work is to provide an improved solution for low-level decision making about player positioning based on the known tactical principles of the real-life soccer played by humans (Beim, 1977; Johnes & Tranter, 1999). Because the recent studies addressed the lower control level rather superficially, it looks like our closest prototype is still SPAR (Stone, Veloso, & Riley, 1999), the historically first method that was using both control levels. On the lower level, this method maximizes the following utility function:

where

*P* – the desired position for the player in anticipation of a pass;

*w _{Oi}, w_{Ti}, w_{B}, w_{G}
* – some non-negative weights;

*n* – the number of agents on each team;

*O _{i}
*– the current position of each opponent,

*i*= 1,…,

*n*;

*T _{j}
* – the current position of each teammate,

*j*= 1,…, (

*n*-1);

*B* – the current position of the teammate with ball;

*G* – the position of the opponent goal;

*dist(A,C)* – distance between positions *A* and *C*.

One advantage of this utility function is robustness; the two sums suppress the sensor noise. As the authors put it, this method repulses the player without ball from other players and encourages advancing to the opponent goal while seeking its best position. This statement indeed is reflecting the soccer tactics to some extent.

There are also apparent shortcomings. From the tactical standpoint, the aggregated criterion (1) does not encourage players to deliberately get open for receiving a pass; this can only happen by chance. Also the contribution of the remotely located players is exaggerated; increasing distance from the closest opponent by say 5 meters has same effect as increasing distance by 5 meters for the opponent located on the other end of the field. From the mathematical standpoint, the authors clearly indicate that this is a vector optimization problem; indeed, each addendum in (1) could stand for a criterion. However, the reduction to single-criteria optimization is questionable. Aggregating several criteria in a linear combination is indeed theoretically acceptable if all criteria functions and constraints are convex (Miettinen, 1998). However, the first two addendums in (1) are multi-modal functions of *P*; hence they are non-convex. So this single-criterion optimization does not guarantee that all potentially acceptable player positions will be considered. In other words, some options that are making the *Pareto set* might be left out. This issue could be resolved by using the Pareto optimality principle.

*Dynamics* is one more general difficulty with optimizing player position on low level. Reaching an optimal position takes some time, and the player must be persistent in doing so. In our experiments we have found that some random factors in the simulated soccer tend to make the computation of the optimal target very unstable from cycle to cycle. A closely related problem is choosing the right time horizon for predicting the situation. As for now, we have found in the literature neither a solution to this problem nor even a discussion of it with respect to player positioning. Thus we would speculate that the reluctance of RoboCup scholars to systematically improve player positioning on the lower level could be explained by these difficulties.

### 1.4. The Unique Contribution of This Study

The unique contribution of this work is in that we (1) investigate and resolve the time horizon issue; (2) propose a new set of decision making criteria based on real-life soccer tactics; and (3) implement a Pareto-optimal solution to the optimization problem with these criteria.

Section 2 addresses the time horizon issue. Section 3 explains how feasible alternatives for decision making cold be generated. Section 4 introduces five optimality criteria and the algorithm for finding Pareto-optimal solution. Section 5 provides experimental results and presents the conclusions.

## 2. Predicting the Situation for Player Positioning

Because the essence of player positioning is planning his actions for some future time interval, identifying the time constraints for such planning is critically important. We see two such constraints: the prediction time horizon *τ*
_{1} and the planning time horizon τ_{2}.

### 2.1. Prediction Time Horizon

A straightforward approach is just setting some fixed prediction time horizon *τ*
_{1}, say 1 to 3 seconds, and trying to extrapolate the movements of the players and the ball for this time interval. The simplest way is just using linear extrapolations of player positions. However, our experiments have shown that this method does not work well. The major problem is in that, once the ball is kicked by some player in new direction, the rest players revise their previous intentions and change their directions of movement. Neglecting these abrupt changes if the next ball kick occurs before the prediction time expires would result in poor decisions. On the other hand, forecasting new movements by the players when the ball gets under close control by the teammate, although theoretically possible, is impractical because of too high computational costs.

While trying to resolve this issue, it makes sense to see how human players are acting in this situation (Beim, 1977; Johnes & Tranter, 1999).

Because we consider the offensive positioning, the ball must be under the control by some teammate. This teammate may be in two different states: (a) chasing the freely rolling ball while staying the fastest player to it and thus maintaining the ball possession or (b) keeping the ball close to him while being able to pass it at any time.

Figure 1 illustrates case (a). The yellow player #7 (C) is about to receive the ball (B) passed by his teammate #8 (A). The decision maker in question is player #9 (E). He must figure it out where to better move to render himself for receiving the ball passed by player #7 if the latter decides to do so.

In this case, an experienced human player without the ball can easily predict the situation and estimate time remaining until the ball will be intercepted. So nothing abrupt will likely occur while the ball is rolling freely. Thus a human player predicts the situation and plans his actions accordingly by determining the target position and moving to it.

This gives the idea of the time horizon used by the player without ball #9 for predicting the situation on the field. Based on this prediction, he determines the good position F and concentrates on reaching it. No communication is really necessary, and it does not make much sense substantially changing the originally optimized target position while the ball is rolling. Only minor corrections to it may be necessary as the world model is updated.

Thus the prediction time horizon for the positioning problem is the time *τ*
_{1} needed for the ball to cover distance BD.

### 2.2. Planning Time Horizon

The planning time horizon is the time interval *τ*
_{2} available for the player to execute the intended action to go to some *target* position. This execution time is needed to determine the spatial constraints for the aspired player movements. It does not make sense for the player to go to the areas on the field that require time longer than τ_{2} for getting there.

During the execution time *τ*
_{2} the player must concentrate on reaching good position F expecting that the teammate would make a pass once he gets close control of the moving ball. Assuming that the ball is passed without any delay, the estimated minimal available time for reaching this position is the time needed for the ball to roll along the segment BD and then to cover segment DF. This determines the planning time horizon in our model.

### 2.3. Predicting Player Movements

For determining the best location of the future ball interception point F it is critical to know the anticipated positions of the opponent players and the teammates at time *τ*
_{1}. It is reasonable to assume that during this time the behavior of all players is staying persistent. While the ball is rolling freely along segment AD in Figure 1, its movement follows the laws of physics and is predictable. All players (if they are rational) are acting in the ways that are also rather easy to predict if their decision making model is known.

Therefore, for predicting player movements we are using the prediction horizon *τ*
_{1} that is equal to the time remaining until the ball will be intercepted by a teammate. (If the fastest to the ball is the opponent player, the situation changes to defensive; this lies outside the scope of this study.) With this approach, once the ball has been kicked, the player without the ball estimates the interception point and time *τ*
_{1}. This is the minimal time remaining until the ball could be likely kicked in the new direction substantially changing the situation. Because the sensor information is imperfect, the time horizon and the target position are updated as new information arrives. Still our experiments have shown that these updates are only slightly changing the decision thus not affecting the persistent behavior. This can be expected, as the player is watching the vicinity of the ball location more frequently.

The model for predicting the situation comprises three components: the ball, the friendly and the opponent player movements. The ball movement can be predicted with high precision, as the underlying physics is simple; the typical model assumes straight movement with the known speed decrement. The movements by teammates can be also predicted with precision, as their control algorithms and perceived states of the world to some extent are available to the player in question. The fastest teammate to the ball will be intercepting it by moving with the maximal speed; so its position can be predicted even more precisely. The rest teammates will be moving towards the best positions determined with yet another precisely known algorithm which could be used for predicting their positions. However, in our method we do not use such a sophisticated approach; in regards of each teammate without the ball, we assume that it is just moving with a constant velocity. This velocity is estimated by the decision making player using his own world model. Of the opponent players, the fastest one will be presumably also chasing the ball by following the trajectory that can be predicted fairly precisely. For the rest opponents possible options include assuming same positioning algorithm as for the teammates or using the opponent model estimated and communicated by the online coach.

In the case when the ball is kickable by the teammate (i.e. when it is either holding the ball, or is closely dribbling, or prepares to pass), we would suggest that *τ*
_{1} should be set to a small constant value *τ*
_{min} which should be determined experimentally.

## 3. Identifying the Feasible Options

Decision making is always a choice from a set of alternatives. In the discussed problem, the player first generates a set of feasible options (i.e. positions on the soccer field) and evaluates those using different criteria. Then the multi-criteria algorithm is applied to find the single best option by balancing the anticipated risks and rewards.

The feasible options are drawn from a set of alternative positions in the vicinity of the reference position. Because the decision space (xy-plane) is continuous, it contains infinite number of such positions. With highly nonlinear and non-convex decision criteria, searching such space systematically would be hardly possible. Therefore, we use a discrete approximation, with the alternative positions forming on the xy-plane a grid about the reference position. To reduce computations, we would like to keep the number of points in this grid minimal. The grid step determines the total number of the alternatives to be searched. The rule of thumb is setting the step equal to the radius of the player reach for kicking the ball. Increasing it might result in lost opportunities. Using a smaller step makes sense only if we have enough computational time to further fine tune the balance of different optimality criteria (see more about it in the next section).

In Figure 2 the three areas determining the set of feasible options for the yellow player #9 are shown. The square with the reference position as its center, defines the *responsibility area*. By staying in it, this player is maintaining the team formation. The circle around the player approximates the *reachable area*, i.e. the set of points on the field that he can reach in time *τ*
_{2} or less. The *feasible area* is intersection of these two areas, i.e. the set of all feasible positions for this player.

Thus the player is only interested in those positions in his responsibility area that could be reached in time *τ*
_{2}. This allows eliminating part of the grid that is lying beyond the player reach. One more constraint that helps eliminating poor alternatives is the maximal distance from the reference position. The alternatives lying outside the field or creating risk of offside are also eliminated.

Figure 3 shows the example situation on the field with the predicted positions of all objects at the moment when the ball is intercepted. The head of the black arrow is the anticipated interception point. Figure 4 shows the alternative positions for the red player #8. The player area of responsibility is filled with small gray points; the reference position being the center of this area. The bigger blue points show the reachable positions of which this player must choose the best.

## 4. Criteria for Decision Making and the Optimization Algorithm

Each position in the feasible set has its own pros and cons that must be evaluated by the intelligent player and the best option must be selected. Thus we arrive to a multi-criteria optimization problem. To correctly formalize it, we need to specify the optimality criteria and choose the algorithm for finding a balanced solution with respect to these criteria.

### 4.1. Optimality Criteria

The decision criteria for choosing the best option should be reflecting the soccer tactics; in particular they should be measuring anticipated rewards and risks. We propose slightly different criteria sets for attackers, midfielders, and defenders because their tactical roles differ (Beim, 1977; Johnes & Tranter, 1999).

For the attackers the criteria set is, as follows.

All players must maintain the formation thus implementing the team strategy. So the distance between the point in the feasible set and the reference position should be minimized.

All attackers must be open for a direct pass. Thus the angle between the direction to the ball interception point and the direction to the opponent located between the evaluated position and the interception point must be maximized.

All players must maintain open space. This means that the distance from the evaluated point to the closest opponent should be maximized.

The attackers must keep an open path to the opponent goal to create the scoring opportunity. So the distance from the line connecting the evaluated point and the goal center to the closest opponent (except the goalie) should be maximized. This criterion is only used in the vicinity of the opponent goal.

The player must keep as close as possible to the opponent offside line to be able to penetrate the defense. So, the player should minimize the x-coordinate distance between the point in the feasible set and the offside line (yet not crossing this line).

Note that each criterion appears to have equal tactical importance; this observation will be used while discussing the optimization procedure below.

Criteria for midfielders and defenders differ in that they do not contain criteria 4 and 5 that encourage the opponent defense penetration. Instead, these players should be creating opportunities for launching the attack. This is achieved by minimizing the opponent player presence between the evaluated position and the direction to the opponent side of the field.

### 4.2. Optimization Algorithm

All proposed criteria are conflicting, as it is hardly possible to optimize all them simultaneously; a reasonable balance must be sought instead. This situation is well known in the literature on systems analysis and economics; a special paradigm called the *Pareto optimality principle* allows to eliminate wittingly inappropriate so-called dominated alternatives (Miettinen, 1998). These are the points in the feasible set that could be outperformed by at least some other point by at least one criterion. So only the non-dominated alternatives making so-called *Pareto set* should be searched for the ‘best’ balance of all criteria. Balancing requires additional information about the relative importance of these criteria, or their weights. If the criteria functions and the feasible set are all convex, then the optimal point could be found by minimizing the weighed sum of the criteria (assuming that they all must be minimized) (Miettinen, 1998). However, on the xy-plane, which is the soccer field, several local maxima for criteria 2, 3, and 4 exist; they all are around the predicted locations of opponent players. Therefore, in our case there is no hope for such a simple solution as using the weighed sum.

The way out has been proposed in our recent work (Kyrylov, 2008), where a method for searching the balanced optimal point in the finite Pareto set was presented. This method is based on the sequential elimination of the poorest alternatives using just one criterion at a time. With *N* alternatives in the Pareto set, it requires *N*-1 steps. The criterion for the elimination on each step is selected randomly with the probability proportional to the weight of this criterion. Hence more important criteria are being applied more frequently. The sole remaining option after *N*-1 steps is the result of this optimization. This method works for any non-convex and even disconnected Pareto set. Its computational complexity is *O*(*N*
^{2}).

In this application, we have further simplified the decision making procedure by assuming that all criteria have equal importance. Thus instead of randomly selecting the criteria on each step of elimination, our procedure is looping through the criteria in the deterministic order.

If the total number of the alternatives is too small, this would result in only near-optimal decision. Better balancing of the conflicting criteria is possible with increased *N*. So we propose to estimate the available computational time in current simulation cycle and select larger *N* if time permits. This optimization algorithm is scalable indeed. It is also robust, because even with small *N* the decisions returned by it are still fairly good.

If this optimization ends in still rather poor option, the player elects just to move towards the reference position; making decision to pass the ball or not is left up to the teammate, anyway. This teammate may elect to dribble or to pass the ball to some other player whose position on the field is better.

Although we proposed five optimality criteria, for the purpose of illustration we have aggregated them all in just two: the *Risk*, which is combination of criteria 2 and 3, and *Gain* which aggregates criteria 1, 4, and 5. The signs of the individual criteria in these aggregates were chosen so that both *Risk* and *-Gain* must be minimized. As a result, it is easy to visually explore the criteria space because it has only two dimensions.

Figure 5 and 6 illustrate the configuration of the Pareto set in the decision and criteria space, respectively. Of the total of 21 points in the Pareto set, 20 are eliminated one by one in the order shown on the labels near each point in Figure 6; the remaining point is the sought solution. Note that the Pareto frontier is non-convex.

The optimal point is reachable and is located at less than the maximal distance of the reference position. It is lying on the way towards the opponent goal and far away from the predicted positions of the two threatening opponents, yellow #10 and #6. This point is open for receiving the pass by red player #8 from the anticipated interception point where red #7 is about to arrive faster that his opponent yellow #11. This is indeed a well-balanced solution to the positioning problem for red player #8. With non-aggregated five criteria we can only expect even better decisions.

## 5. Experimental Results and Conclusion

We have conducted experiments with the purpose to estimate the sole contribution of the proposed method for the lower-level optimized player positioning compared with only strategic, higher-level positioning.

Measuring the player performance using existing RoboCup teams is difficult because new features always require careful fine tuning with the existing ones. For this reason, we decided to compare two very basic simulated soccer teams. The only difference was in that the experimental team had player positioning on two levels and the control team just on one level. Players in both teams had rather good direct ball passing and goal scoring skills and no dribbling or holding the ball at all. Thus any player, once gaining the control of the ball, was forced to immediately pass it to some teammate. In this setting, the ball was rolling freely more than 95 per cent of the time, thus providing ideal conditions for evaluating the proposed method.

To further isolate the effects of imperfect sensors, we decided to use *Tao of Soccer*, the simplified soccer simulator with complete information about the world; it is available as the open source project (Zhan, 2009). Using the RoboCup simulator would require prohibitively long running time to sort out the effects of improved player positioning among many ambiguous factors.

The higher-level player positioning was implemented similar to used in *UvA Trilearn* (De Boer, & Kok, 2002); this method proved to be reasonably good indeed. Assuming that both goals are lying on x-coordinate axis, the coordinates of the reference position for *i*-th player are calculated as follows:

where *w* is the weight (0<*w*<1), (*xhome _{i}
*,

*yhome*) and (

_{i}*xball*,

*yball*) are the fixed home and the current ball positions respectively,

*Δx*is the fixed individual adjustment of x-coordinate whose sign differs for the offensive and defensive situations and the player role in the team formation. Our improving was in introducing the shift

_{i}*Δx*in this method.

_{i}Because players in the control team were moving to the reference positions without any fine tuning, ball passing opportunities were occurring as a matter of chance. In the experimental team, rather, players were creating these opportunities on purpose.

The team performance was measured by the game score difference. Figure 7 shows the histogram based on 100 games each 10 minutes long.

In this experiment only one game has ended in a tie; in all the rest 99 games the experimental team won. The mean and the standard deviation of the score difference are 5.20 and 2.14, respectively. By approximating with Gaussian distribution, we get 0.9925 probability of not losing the game. The probability to have the score difference greater than 1 is 0.975 and greater than 2 is 0.933. This gives the idea of the potential contribution of the low-level position optimization. With the smaller proportion of the time when the ball is rolling freely, this contribution will decrease. So teams favoring ball passing would likely benefit from our method more than teams that prefer dribbling.

The experimental results demonstrate that, by locally adjusting their positions using the proposed method, players substantially contribute to the simulated soccer team performance by scoring on the average about five extra goals than the opponent team that does not have this feature. This confirms that optimized player positioning in the simulated soccer is the critical factor of success.

Although this method has been developed for simulated soccer, we did not rely much on the specifics of the simulation league. Nor have we built our method upon the specifics of the two dimensions. Therefore, we believe that the main ideas presented in this work could be reused with minor modifications in the 3D simulated soccer and in other RoboCup leagues. These ideas could be also reused by video games developers. Besides soccer, our general approach is applicable to different sports games.