Models of Paradoxical Coincident Cost Degradation in Noncooperative Networks



Introduction
[Networks] Transportation networks are a typical example of large-scale networks: they consist of a great many roads, along which many vehicles travel. A network performs poorly if its utilization is so low that it carries too little traffic, or if the sojourn times of vehicles are too long because of congestion. Related to the notion of transportation networks is the term 'information highway,' which refers to information networks as represented by the Internet. The Internet is an information network consisting of a great many nodes (or routers) and the communication lines interconnecting them, through which packets (in place of vehicles) travel. As with transportation networks, it performs poorly if its utilization is so low that it carries too little traffic, or if the sojourn times of packets are too long because of congestion.
It is desirable that each portion of a network carries a suitable amount of traffic and that vehicles or packets (later we call both of them 'users' or 'players') pass through it within a suitable length of time. We cannot be sure, however, that such good situations are always maintained. To maintain them, we need to control the networks in one way or another. More concretely, we need to select adequately the paths that users take (i.e., routing) or to decide adequately the rates at which users pass through the networks (i.e., flow control).
In distributed computer systems, including the recently highlighted Grids (a method of sharing computer resources over communication networks), load balancing problems arise in achieving efficient use of computing resources (for example, [26,34,53]); these are equivalent to routing problems in networks. In this chapter, we present some seemingly paradoxical results on the routing problems and on the equivalent load balancing problems. Similar seemingly paradoxical results have been found for flow control [19,20], but we do not discuss them here.
[Distributed and independent decision making] On the one hand, the above-mentioned networks are generally large in scale, and overall detailed control of them may be difficult. On the other hand, they are usually shared by users or organizations that make independent decisions. In transportation networks, for example, there are independent vehicle owners and enterprises that run buses and trucks; in the Internet, there are Internet service providers that are private enterprises, as well as universities and research organizations, each of which is considered to make independent decisions. These independent decision makers correspond to what are called 'players' in the framework of game theory.
Many people may believe that, in a network or a system, if every independent decision maker simultaneously pursues the reduction of its own cost, the overall utility of the network or system will increase. As to economic behavior, it appears to be generally believed that, if each decision maker seeks its profit independently, selfishly, and noncooperatively, the overall social system may reach the most economical state under the guidance of the so-called 'Invisible Hand' mentioned by Adam Smith [50]. In situations where overall detailed control seems difficult, as in large-scale networks, the belief in the effectiveness of independent decision making may lead to the expectation that, if each user (or player) makes decisions only for its own objective, the entire problem of obtaining the best system state is divided into a collection of dispersed smaller-scale problems, each of which is more easily solved than the entire problem.
[Paradox] Phenomena that seem quite paradoxical in light of the above-mentioned belief have been found, for example, what is called the 'Braess paradox,' which we discuss later in this chapter. This chapter presents some research results on the possible magnitudes of the harm caused by such paradoxical phenomena. In fact, it appears that no great harm induced by such paradoxical phenomena, such as a large coincident deterioration of user costs, has been reported thus far. We already have situations where the Internet is shared by a great number of independent Internet service providers and where online POS (Point of Sales) systems are shared by mutually independent chains of convenience stores. In spite of that, no big problems such as the above-mentioned severe performance degradation have been revealed.
We expect, however, that each independent organization will pursue its cost reduction much more seriously in the future, and that the scales of the shared networks and the numbers of users sharing them will become much larger. We will thus need to investigate the above-mentioned problems and to gain much deeper insight into them.

Different degrees in the dispersion of decision making
We can think of different degrees in the dispersion of decision making.
(A)[Completely centralized decision making]: All users are regarded to belong to one group that has only one decision maker (Only one player in the game). The decision maker seeks to optimize a single performance measure such as the total cost over all users (for example, the expected sojourn time over all users). In the literature, the corresponding solution concept is referred to as a system optimum, overall optimum, cooperative optimum or social optimum.
In this chapter, we shall refer to it as the overall optimum. This may reflect the situation where the entire system is controlled by a single unified organization (i.e., only one player).
(B)[Completely dispersed decision scheme]: Each of infinitely many infinitesimal users optimizes its own cost (for example, its own expected sojourn time) selfishly and independently of the others. In this optimized situation, no job can expect any further benefit from unilaterally changing its own decision. In this setting, the number of users is so large that the impact of any one user on the costs experienced by all other users is infinitesimally small. It is then assumed that the decision of a single job has a negligible impact on the performance of the entire system.
In these types of noncooperative networks, each infinitesimal, i.e., "non-atomic" (as some game theorists say), user makes its own routing decision so as to minimize its expected delay from its origin to its destination given the routing decisions of other users. In this case, the situation where every infinitesimal user has optimized its decision, given the decisions of other users, and would not unilaterally deviate from that choice, is called an individual equilibrium. The name given to this form of equilibrium is the Wardrop equilibrium, i.e., a Nash equilibrium with infinitesimal players (nonatomic users) ([18,41,54], etc.).
(C)[Intermediately dispersed decision scheme]: Infinitely many jobs (users) are classified into a finite number (N(> 1)) of classes or groups, each of which has its own decision maker and is regarded as one player or user. Each decision maker optimizes noncooperatively its own cost (e.g., the expected sojourn time) over only the jobs of its own class. The decision of the decision maker of a group has a non-negligible impact on the performance of other groups. In this optimized situation, none of the finite number of classes or players can receive any further benefit by unilaterally changing its decision. In the literature, the corresponding solution concept is referred to as a class optimum, Nash equilibrium, or user optimum. In this chapter, we shall refer to it as the group optimum. This may reflect the situation where the system is shared by a finite number of mutually independent organizations, each of which is totally unified. There may be different levels of intermediately dispersed optimization.
In this situation, the users are referred to as "atomic" (as some game-theorists say) in that each user's decision has an impact on the costs experienced by the other users. The situation where, in such a scheme, every user has optimized his decision, given the decisions of other users, and furthermore, would not unilaterally deviate from this decision is called a Nash equilibrium, since it is, in this respect, "stable" [18,24,27].
Note that (C) reduces to (A) when the number of players is 1 (N = 1) and approaches (B) when the number of players becomes infinite (N → ∞) [18]. In cases (B) and (C), there are multiple decision makers, and the situations can be regarded as 'games' (in particular, congestion games (for example, [42,46])). In the terms of economics, (A), (B), and (C) represent monopoly, perfect competition, and oligopoly, respectively.

Pareto inefficiency and paradox
We assume that the cost (or utility) of each user is determined for each state of the system. For example, once the route that each user takes and the amount of traffic on each path are determined, the total cost (or utility) of each user (decision maker, or player) is determined. In engineering fields, a single measure, such as the sum or a weighted mean of the costs (or utilities) of all users (or players), has generally been used so far to evaluate the system state. It is questionable or problematic, however, to determine which of two system states is superior if, in one system state, the utility of one user is better than that of another user and, in the other system state, the utility of the former user is worse than that of the latter.
The exact definition of superiority among system states is given in terms of Pareto notions. The notions of Pareto optimality, superiority, and inefficiency have already been established. In the next section, we first confirm the notions and their definitions. Then, we discuss a definition of the measure of Pareto superiority.

Pareto optimality and superiority
We consider a system consisting of a number of users or players, numbered 1, 2, ..., n (denote by n the set {1, 2, ..., n}). For each state of the system, each user has its own utility. Denote the combination of utilities of all users in a system state S by $U(S) = (U_1(S), U_2(S), \ldots, U_n(S))$. We consider only the cases where $U_i(S) > 0$ for all i.
[Pareto optimality and efficiency]: There may exist a state of the system where we cannot improve the utility of any user without decreasing the utility of some other user. This is called a Pareto optimum or efficient state. In general, there may be infinitely many Pareto optimal states for a system. Consider the space of the combinations of utilities U(S) for all system states S. That is, each point U(S) in the utility space corresponds to the combination of utilities of the users in a system state. Each axis of the utility space then shows the utility of one user given the system state. The set of points corresponding to Pareto optimal states forms the border (Pareto border) separating the set of achievable combinations of utilities from the set of unachievable ones in the utility space.
[Pareto superiority and inferiority]: Consider an arbitrary pair of (achievable) states of the system, $S_a$ and $S_b$, and write $U_i(S_b) = k_i U_i(S_a)$ with $k_i > 0$ for each i. Then, $S_b$ is Pareto superior to $S_a$ if and only if $k_i > 1$ for some i and $k_j \ge 1$ for all other j. $S_b$ is Pareto inferior to $S_a$ if and only if $k_i < 1$ for some i and $k_j \le 1$ for all other j.
We also define strong Pareto superiority and inferiority: $S_b$ is strongly Pareto superior to $S_a$ iff $k_i > 1$ for all i, and $S_b$ is strongly Pareto inferior to $S_a$ iff $k_i < 1$ for all i. A state to which some other state is (strongly) Pareto superior is (strongly) Pareto inefficient.
An overall optimum is evidently Pareto optimal. Individual optima or group optima (Nash equilibria) may be Pareto optimal (for example, [1,7]), but may not be Pareto optimal as in the game called the prisoners' dilemma (for example, [38]). It has been shown that if the utility of each player is continuous, Nash equilibria are generally Pareto inefficient (See [13,49]).
[A measure of Pareto superiority/inferiority]: As we have seen above, the definition of Pareto superiority/inferiority has already been given and is well accepted. It seems, however, that no measure of the degree of Pareto superiority/inferiority has been generally accepted. Such a measure is necessary, e.g., for defining the degree of paradoxical coincident cost degradation.
The Pareto superiority depends on the vector $(k_1, k_2, \ldots, k_n)$. It would, however, be convenient to express the degree of Pareto superiority by a single scalar measure. The primary requirement is that the value of the measure distinguish cases of Pareto inferiority, and thus paradoxes, from other cases simply and clearly. Define $k_{\min} = \min_i k_i$ and $k_{\max} = \max_i k_i$. If $k_{\min} > 1$, the state $S_b$ is (strongly) Pareto superior to $S_a$, and if $k_{\min} < 1$, the state $S_b$ is Pareto indifferent or inferior to $S_a$. If $k_{\max} < 1$, the state $S_b$ is (strongly) Pareto inferior to $S_a$, and if $k_{\max} > 1$, the state $S_b$ is Pareto indifferent or superior to $S_a$. Thus, the measures $k_{\min}$ and $k_{\max}$ may be used as primary measures of the degree of Pareto superiority and inferiority, respectively. We note that if $k_{\min} < 1$ and $k_{\max} > 1$, states $S_a$ and $S_b$ are Pareto indifferent to each other. On the other hand, a measure X based, for example, on a certain average of all the $k_i$ should be rejected as a primary measure, since X > 1 can hold even if $k_i < 1$ for some i, provided that $k_j$ is sufficiently larger than 1 for all other j. Such a measure may be used as a secondary measure. (In many practical situations, the variables have continuous values and truly exact equalities rarely occur. Otherwise, the tie-breaking of the case $k_{\min} = 1$ may depend on such a secondary measure.) We propose that $k_{\min}$ (> 1) and $k_{\max}$ (< 1) be used as primary measures showing the degrees of Pareto superiority and inferiority, respectively. The tie-breaking of the cases $k_{\min} = 1$ and $k_{\max} = 1$ may depend on some other secondary measure.
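The classification by $k_{\min}$ and $k_{\max}$ can be sketched in a few lines of Python. This is only an illustration: it assumes that $k_i$ is the ratio $U_i(S_b)/U_i(S_a)$ of positive utilities, and the function names and the returned labels are hypothetical.

```python
from typing import Sequence, Tuple


def pareto_ratios(u_a: Sequence[float], u_b: Sequence[float]) -> Tuple[float, float]:
    """Return (k_min, k_max), assuming k_i = U_i(S_b) / U_i(S_a)."""
    if len(u_a) != len(u_b) or len(u_a) == 0:
        raise ValueError("utility vectors must be non-empty and of equal length")
    if any(u <= 0 for u in (*u_a, *u_b)):
        raise ValueError("utilities are assumed to be positive")
    ratios = [b / a for a, b in zip(u_a, u_b)]
    return min(ratios), max(ratios)


def compare_states(u_a: Sequence[float], u_b: Sequence[float]) -> str:
    """Classify S_b relative to S_a using k_min and k_max as primary measures."""
    k_min, k_max = pareto_ratios(u_a, u_b)
    if k_min > 1:
        return "strongly superior"      # every user gains
    if k_max < 1:
        return "strongly inferior"      # every user loses
    if k_min < 1 < k_max:
        return "indifferent"            # some users gain, others lose
    return "tie"                        # k_min = 1 or k_max = 1: use a secondary measure
```

For instance, `compare_states([1.0, 1.0, 1.0], [0.9, 2.0, 2.0])` reports the two states as indifferent, although the arithmetic mean of the ratios exceeds 1. This is the reason an average-based measure is treated only as a secondary measure.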
Note, in passing, that a measure similar to the above has been used to discuss the effects of symmetry on globalizing separated monopolies to a Nash-Cournot oligopoly [28].
[Pareto inefficiency and paradox] The Braess paradox presents an example of a case where, for an equilibrium system state, there exists a non-equilibrium system state that is Pareto superior to it, as shown later. This chapter presents research results on the cases where (paradoxical) coincident cost degradation of each user occurs and on the possible sizes of such coincident cost degradation. On the other hand, please note that there exist cases where the coincident cost improvement for each user is unboundedly large [22].
[The price of anarchy] The idea of a measure called the price of anarchy was introduced by Koutsoupias and Papadimitriou [32], and the name 'the price of anarchy' appeared in Papadimitriou [39]. The term 'anarchy' is considered to mean the state of a Nash or Wardrop equilibrium, which is reached when every player behaves selfishly and freely, under no constraints imposed by a central controller, to optimize its own cost or utility. The measure is meant to show how bad the worst-case Nash/Wardrop equilibrium is compared with the best state. The proposers of the measure use, as the best state, the state with the optimal social cost. The price of anarchy is then equal to the ratio of the social cost of the worst-case Nash/Wardrop equilibrium to the minimum social cost. A number of results have been obtained based on this measure, many of which are described by Roughgarden [43,45]. In fact, before the price of anarchy was proposed, anomalous behaviors of Wardrop and Nash equilibria compared with social optima, like those expressed in terms of the price of anarchy, had already been discovered and investigated in the context of load balancing in distributed computer systems, which is identical to routing in networks of particular types [25,26,55].
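The ratio defining the price of anarchy can be illustrated on a very small example. The sketch below assumes a hypothetical two-link network, not taken from this chapter: a unit rate of infinitesimal users chooses between a link whose cost equals the share x of users on it and a link of constant cost 1. The social optimum is located by a simple grid search, which is an illustrative shortcut rather than a general method.

```python
def social_cost(x: float) -> float:
    """Average cost when a share x uses the congestible link (cost x)
    and the share 1 - x uses the constant link (cost 1)."""
    return x * x + (1.0 - x) * 1.0


# Wardrop equilibrium of this assumed example: the congestible link never costs
# more than 1, so no user can gain by moving to the constant link; all use it.
equilibrium_cost = social_cost(1.0)

# Social optimum, approximated by a grid search over the share x in [0, 1].
grid = [i / 1000 for i in range(1001)]
optimal_share = min(grid, key=social_cost)
optimal_cost = social_cost(optimal_share)

# Ratio of the equilibrium social cost to the minimum social cost.
price_of_anarchy = equilibrium_cost / optimal_cost
print(optimal_share, optimal_cost, price_of_anarchy)
```

In this assumed example the ratio is 4/3. It compares the equilibrium with a single state, the one with the minimum social cost, and says nothing by itself about the individual ratios $k_i$.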
On the other hand, a measure of the Pareto inefficiency of a Nash equilibrium has to reflect the comparison with all the Pareto optima, whereas the state with the optimal social cost is only one Pareto optimum. Therefore, the price of anarchy cannot be a good measure of the Pareto inefficiency of a Nash equilibrium [33]. Following the spirit of the price of anarchy, we may think of the ratio of the social cost of state A to that of state B for comparing two states A and B. In accordance with the discussion given above on the measure of Pareto superiority/inferiority and paradoxes, we do not use this measure as the primary measure of Pareto superiority/inferiority and paradoxes, but we may use it as a secondary measure. We note that the above-mentioned anomalous behavior of the Wardrop/Nash equilibrium necessarily occurs when the Braess paradox occurs, but not vice versa.
[Braess network] Braess [5] considered a network consisting of 4 nodes: 1 origin (0), 2 relay nodes (1, 2), and 1 destination (3) (Fig. 1). As shown in Fig. 1 (left), before a link is added, the network has two paths, 0-1-3 (Path 1) and 0-2-3 (Path 2), each of which contains two links: Link 1 (from Node 0 to Node 1) and Link 2 (from Node 1 to Node 3) for the first path, and Link 3 (from Node 0 to Node 2) and Link 4 (from Node 2 to Node 3) for the second. Many users pass through the network from the origin (0) to the destination (3). The passage time through each link is determined by the rate of users that pass through the link. Each user strives to take the path with the minimum sojourn time. In the equilibrium state, that is, the Wardrop equilibrium or individual optimum, no user can decrease its sojourn time by unilaterally changing the path it takes. Since the network has only one origin and one destination, all paths in use have the same cost (the identical sojourn time). As shown in Fig. 1, if the rates of users that pass through links 01, 13, 23, 02, and 12, respectively, are $\eta_1$, $\eta_2$, $\eta_3$, $\eta_4$, and $\eta_5$, the passage times through links 01, 13, 23, 02, and 12 are $a(\eta_1)$, …

Braess paradox
Denote by X the total rate of users that pass through the network. Before a link is added, the two paths 0-1-3 and 0-2-3, respectively, carry the user rates x and y (x + y = X). After a link is added, the three paths 0-1-3, 0-2-3, and 0-1-2-3, respectively, carry the user rates u, v, and w (u + v + w = X). Denote by $C_o$ and $C_c$, respectively, the costs of the paths before and after adding a link. The ratio of the cost after adding a link to that before adding it is denoted by $k = C_c / C_o$. In Braess's example, k is about 1.1; that is, by adding a link, the cost of all users (sojourn time) increases by about 10 percent. Adding a link increases the degree of freedom of decision making. It therefore looks paradoxical that adding a link brings about cost degradation for all users or decision makers, and the phenomenon is regarded as a 'paradox.' In fact, it has been observed that a similar paradox occurs in the real world [29]. We thus see that there exists a state that is Pareto superior to a Wardrop equilibrium.
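The comparison of the costs before and after adding a link can be checked numerically. The following sketch assumes the classical numerical instance usually attributed to Braess, namely total rate X = 6 and the linear link costs written in the code. These numbers, and the assignment of the functions a, b, c, d, and t to links 01, 13, 23, 02, and 12, are assumptions made for illustration and are not given in the text above. The equilibrium is found by a brute-force search over integer flow splits, which suffices for this instance but is not a general solver.

```python
from itertools import product

# Assumed link costs (classical instance, not taken from the text above).
def a(f): return 10 * f    # link 0-1
def b(f): return f + 50    # link 1-3
def c(f): return 10 * f    # link 2-3
def d(f): return f + 50    # link 0-2
def t(f): return f + 10    # added link 1-2

X = 6  # assumed total rate of users


def wardrop(path_costs, n_paths, total):
    """Return (flows, cost) for the first integer split in which every used path
    has the minimum path cost, i.e., a Wardrop equilibrium on the integer grid."""
    for flows in product(range(total + 1), repeat=n_paths):
        if sum(flows) != total:
            continue
        costs = path_costs(flows)
        best = min(costs)
        if all(cost == best for flow, cost in zip(flows, costs) if flow > 0):
            return flows, best
    raise ValueError("no equilibrium found on the integer grid")


def costs_before(flows):
    x, y = flows                      # paths 0-1-3 and 0-2-3
    return (a(x) + b(x), d(y) + c(y))


def costs_after(flows):
    u, v, w = flows                   # paths 0-1-3, 0-2-3, and 0-1-2-3
    return (a(u + w) + b(u), d(v) + c(v + w), a(u + w) + t(w) + c(v + w))


flows_o, C_o = wardrop(costs_before, 2, X)
flows_c, C_c = wardrop(costs_after, 3, X)
k = C_c / C_o                         # degree of the paradox
print(flows_o, C_o, flows_c, C_c, round(k, 3))
```

Under these assumptions the common path cost rises from 83 to 92, so k is about 1.108, in line with the increase of about 10 percent mentioned above. The split (3, 3, 0) remains feasible after the link is added and still costs 83 for every user, which is the non-equilibrium state that is Pareto superior to the equilibrium.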
[Cohen-Kelly paradox] In the Braess network, linear functions are considered as the link costs. The link cost functions considered in networks of queues are nonlinear in general. Cohen and Kelly [10] considered the following network with nonlinear link costs, where λ and φ are system parameters and X = 2λ. Link flows $\eta_1$, $\eta_2$, $\eta_3$, $\eta_4$, and $\eta_5$, respectively, give b(… This is also regarded as a paradox. In this case, the ratio of degradation, k, is less than 1.5.
[Research related to the Braess paradox] Later, the Braess paradox gradually attracted the attention of many scholars, including the economist Samuelson [48], and related studies have accumulated, including the above-mentioned study by Cohen and Kelly [10] (for example, [6,9,11,12,16,17,35,36,40,51,52]). In addition, it has been shown that in mechanical and electrical systems whose topology is similar to that of the Braess network, phenomena similar to the Braess paradox may occur; these results were presented in the scientific journal Nature [8]. A list of references on the Braess paradox is kept on Braess's home page (http://homepage.ruhr-uni-bochum.de/Dietrich.Braess/#paradox).
Almost all of the related results have been obtained for Wardrop equilibria, and most of them concern networks that have the same topology as Braess's or its generalized versions. Furthermore, some have handled only the weak paradox explained later. Moreover, it has been shown that a similar paradox can occur for a group optimum (a Nash equilibrium of a finite number of users) in a network whose topology is similar to the Braess network [30]. Korilis et al. [31] obtained a sufficient condition under which no paradox occurs in a network with one origin and one destination and with plural groups of users of the same kind.
[The bounds of the degrees of the paradox in networks in Wardrop equilibrium] There seem to have been only a few studies that estimate how harmful the paradox can be, i.e., the worst-case degree of coincident cost degradation for all users caused by adding connections to a noncooperative system [21,44,47]. For a generalized Braess network as shown in Fig. 1, it has been shown that, if functions a and c are increasing and functions b, d, and t are nondecreasing, then k < 2, i.e., the degree of paradox cannot exceed 2 [21]. Furthermore, if a generalized Braess network is embedded in a larger network, the degree of paradox with respect to the embedded network cannot exceed 2 [21]. As a more general result, it has been shown that, for networks consisting of one origin, one destination, n nodes, and links with nondecreasing costs, the degree of paradox cannot exceed n/2 [44]. An extreme case is shown in Fig. 2. In fact, it appears that no system in Wardrop equilibrium (with infinitesimal users) has been found for which the degree of coincident cost degradation can increase without bound while the number of nodes in the network is bounded. On the other hand, we have also seen that the benefit brought by adding connections to a noncooperative network can increase without bound [22]. In contrast, it has been shown that, for any given degree of paradox, there exists a system in Nash equilibrium (with a finite number of users) that attains it, as shown later in Section 4.1.1 and in [27].

Coincident cost degradation (Paradox) for all users by adding connections to networks
[The degree of coincident cost degradation (Paradox)] In the above-mentioned networks in Wardrop equilibria that have only one origin and one destination, the costs of all users are identical, and the comparison between the costs before and after adding connections to the networks is rather easy. In other networks, however, the costs of users are not necessarily identical, and we define here the degree of paradox for those networks. It seems that such definitions have not received attention until rather recently.
Following Section 1.2, we consider the concept of strong Pareto superiority. Denote by $S_b$ and $S_a$, respectively, the states before and after adding connections. Assume that the costs of users are positive; in the examples presented in this chapter, this is indeed the case.
Consider, for user i ∈ n, $k_i$ given by $k_i = C_i^a / C_i^b$, where $C_i^a$ and $C_i^b$ denote the costs of user i in states $S_a$ and $S_b$, respectively. If $k_i > 1$ for all i ∈ n, $S_b$ is Pareto superior to $S_a$, which means a paradox. Define $k_{\min}$ such that $k_{\min} = \min_p k_p$. The condition that $k_i > 1$ for all i ∈ n is equivalent to $k_{\min} > 1$. In contrast, if $k_i \le 1$ for some i ∈ n, i.e., $k_{\min} \le 1$, $S_b$ is not Pareto superior to $S_a$, which implies no paradox. Thus, $k_{\min}$ shows whether a paradox occurs. We consider, furthermore, that $k_{\min}$ shows the degree of a paradox, and we thus take $k_{\min}$ as the measure of the degree of a paradox [22]. Note that the networks mentioned in Section 2 have only one origin and one destination, that, in each Wardrop equilibrium, the utility (cost, e.g., sojourn time) of every user is the same, and that $k_{\min}$ degenerates to k. Thus, for Braess-like paradoxes, it is only in these special cases that the price of anarchy can be a good measure of paradoxes.
[Weak paradox] In the paradoxes discussed in this chapter, the state $S_b$ before adding connections is strongly Pareto superior to the state $S_a$ after adding connections. On the other hand, even if $S_b$ is not strongly Pareto superior to $S_a$, it is possible that the social or overall cost of users (for example, the overall average sojourn time or passage time for all users or packets) for $S_b$ is better than that for $S_a$, which looks anomalous [26,55]. In such a case, however, adding connections does not lead to coincident cost degradation for all users. We call such an anomalous case a weak paradox. In the cases presented in Section 2, it holds that $k_i = k$, but this does not necessarily hold in general. In particular, the degree of cost degradation may be different for each user (i ∈ n). Results that depend only on the price of anarchy may show only a weak paradox and not a (strong) paradox, and this chapter does not touch on such results.
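The distinction between a (strong) paradox and a weak paradox can be written as a small classifier. The sketch below is illustrative only: it takes per-user costs before and after adding connections, computes $k_i$ as the ratio of the cost after to the cost before, and uses a weighted average as the overall cost. The function name, the labels, and the `weights` argument (for example, the share of traffic of each user) are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def classify_paradox(cost_before: Sequence[float],
                     cost_after: Sequence[float],
                     weights: Optional[Sequence[float]] = None) -> Tuple[str, float]:
    """Return (label, k_min) with k_i = (cost after) / (cost before) for user i."""
    n = len(cost_before)
    if n == 0 or n != len(cost_after):
        raise ValueError("cost vectors must be non-empty and of equal length")
    if weights is None:
        weights = [1.0 / n] * n       # assumption: equally weighted users
    k = [after / before for before, after in zip(cost_before, cost_after)]
    k_min = min(k)
    overall_before = sum(w * cost for w, cost in zip(weights, cost_before))
    overall_after = sum(w * cost for w, cost in zip(weights, cost_after))
    if k_min > 1:
        label = "paradox"             # every user is worse off
    elif overall_after > overall_before:
        label = "weak paradox"        # only the overall cost is worse
    else:
        label = "no paradox"
    return label, k_min
```

For example, costs changing from (1.0, 1.0) to (1.5, 0.9) give a weak paradox under equal weights, since the average cost rises while one user gains, whereas a change to (1.2, 1.1) gives a paradox with $k_{\min} = 1.1$.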

The systems where the cost of each user may not necessarily be identical ---Paradoxes in the models of distributed computer systems
During the roughly 30 years after the Braess paradox began to attract the attention of many authors, many related papers were published, but in almost all of them the networks discussed either had a topology similar to Braess's or had users that were not distinguished from one another, with only one origin-destination pair. As networks in which the users have distinct costs, we present networks of distributed computer systems. For the group optima in a model of distributed computer systems, an example of a (strong) paradox has been found. This type of paradox may occur for symmetric systems, and in some cases the degree of paradox (coincident cost degradation) can increase without bound.

Analytic results on symmetric distributed computer systems
For the paradoxes in symmetric distributed computer systems, very general and complicated analytical results have been obtained [27]; they are presented in the appendix to this chapter. In the next section, 4.1.1, we use a special case of the above-mentioned systems to present some basic results that give intuition about the nature of the paradox.

A simple example of the paradox for symmetric distributed systems
Consider a model of distributed systems consisting of two nodes, as shown in Figs. 3 and 4. μ, φ, and $T_i$, respectively, denote the processing capacity of each node (computer), the job arrival rate to each node, and the expected sojourn time of a job that arrives at node i. Each node i is associated with one decision maker, who decides the rate $x_{ij}$ ($i \neq j$) of jobs to forward to the other node j in order to minimize selfishly only the cost $T_i$ for the jobs that arrive at node i. The equilibrium state is a group optimum (a Nash equilibrium). When there is no network connection between the two nodes as shown in Fig. 3, each decision maker cannot forward jobs to the other node, and, in the group equilibrium state, After the two nodes are interconnected as shown in Fig. 4, denote by $x_{ij}$ the rate of jobs forwarded from node i to node j. Then, Define $x = (x_{12}, x_{21})$. Denote by C the set of the vectors that satisfy relation (1). Assume that job forwarding takes the time length t irrespective of the value of x. Then, where $\beta_i$ denotes the workload on node i, and is obtained as follows: $T_{ik}(x)$ denotes the expected sojourn time of a job that arrives at node i and is processed by node k. After the network connection is added, denote by $\tilde{x}$ the solution (the Nash equilibrium) of the group optimum (Fig. 5). Then, $\tilde{x}$ is obtained as follows:
(i) The case where $t > \phi/(\mu - \phi)^2$: $\tilde{x} = 0$. The value of $T_i$ is identical to the value before adding the connection, which implies that no paradox occurs.
(ii) The case where $0 < t \le \phi/(\mu - \phi)^2$: E represents the value of $T_i$ minus its value before adding the connection, and E > 0 shows that the paradox occurs. Therefore, $0 < t < \phi/(\mu - \phi)^2$ is the necessary and sufficient condition for the occurrence of the paradox in this model.
[A derivation of the above-mentioned $\tilde{x}$ and $T_i(\tilde{x})$] From (2) and (4), From (5), we see that $\partial T_i / \partial x_i$ is increasing in $x_i$ for x ∈ C. If $\tilde{x}$ satisfies the following: then $\tilde{x}$ is the solution of the group optimum (the Nash equilibrium). From (5) and by (7) If (6) holds, then, from (7), d = 0. Then, from (5), Therefore, From the above, we see that this is a unique solution (case (ii)). From this and (2), we can obtain $T_i(\tilde{x})$.
For the case where $t > \phi/(\mu - \phi)^2$ (case (i)), from (8), we have for $x_i = 0$ (i = 1, 2), Since is the solution of the group optimum (the Nash equilibrium). The uniqueness of the solution in this case is shown as outlined in the following: Assume $\tilde{x}_1 > 0$. From the definition of d, (5), and the condition (ii) with respect to t, we must have d < 0, and thus $\tilde{x}_2$ cannot be zero. If we apply a similar argument to $\tilde{x}_2$, we must have d > 0, which is a contradiction. Thus, we see that $\tilde{x} = 0$ is a unique group optimum. For the existence and uniqueness in more general cases, see [2,23,37] and [27].
For any values of system parameters, the solutions of the overall optimum and the individual optimum are the same as the solution given above for case (i). Therefore, unlike in the Braess networks, no paradox occurs for the individual optimum (the Wardrop equilibrium) in this model.
As in the above, we consider that the degree of the paradox is the ratio of the expected sojourn time after adding the connection to that before doing so. Denote it by k(μ, φ, t) here.
With φ and μ being fixed, the degree of the paradox is largest when .
Therefore, we see that for any system, including asymmetric ones, there exists a symmetric distributed system that has a larger degree of paradox. Thus, symmetric systems bring about the worst-case paradoxes. As to any group of systems that have finite degrees of paradox (the characteristics of the groups can be expressed in natural ways), a symmetric system presents the worst-case paradox within each group, as discussed later in Section 5.
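The two-node example of this section can be explored numerically under additional assumptions. The sketch below assumes that each node behaves as an M/M/1 queue, so that a job processed at a node with workload β has expected sojourn time 1/(μ − β), and that a forwarded job incurs the extra time t. This cost form and the closed-form forwarding rate in `nash_forwarding_rate` are assumptions chosen to be consistent with the threshold φ/(μ − φ)² stated above, not formulas quoted from this chapter. All function names are hypothetical.

```python
def sojourn_time(x_own: float, x_other: float, mu: float, phi: float, t: float) -> float:
    """Expected sojourn time T_i of jobs arriving at node i (assumed M/M/1 nodes).
    x_own is the rate node i forwards; x_other is the rate forwarded to node i."""
    beta_own = phi - x_own + x_other       # workload on the player's own node
    beta_other = phi - x_other + x_own     # workload on the other node
    if beta_own >= mu or beta_other >= mu:
        return float("inf")                # unstable queue
    local = (phi - x_own) / (mu - beta_own)
    remote = x_own * (1.0 / (mu - beta_other) + t)
    return (local + remote) / phi


def nash_forwarding_rate(mu: float, phi: float, t: float) -> float:
    """Symmetric equilibrium rate under the assumed model (zero above the threshold)."""
    return max(0.0, (phi - t * (mu - phi) ** 2) / 2.0)


def best_response(x_other: float, mu: float, phi: float, t: float, iters: int = 200) -> float:
    """Minimize the (convex) own cost over [0, phi] by ternary search."""
    lo, hi = 0.0, phi
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if sojourn_time(m1, x_other, mu, phi, t) <= sojourn_time(m2, x_other, mu, phi, t):
            hi = m2
        else:
            lo = m1
    return (lo + hi) / 2.0


def degree_of_paradox(mu: float, phi: float, t: float) -> float:
    """k = (T_i after adding the connection) / (T_i before), with T_i before = 1/(mu - phi)."""
    x = nash_forwarding_rate(mu, phi, t)
    return sojourn_time(x, x, mu, phi, t) * (mu - phi)
```

With the assumed values μ = 1.0 and φ = 0.5, the threshold φ/(μ − φ)² equals 2. For t = 1.0 the closed form gives a forwarding rate of 0.125, the numerical best response to 0.125 is again 0.125 (a fixed-point check of the equilibrium), and the degree of paradox is 1.125. For t = 3.0, above the threshold, no jobs are forwarded and the degree is 1.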

General results on the paradoxes for symmetric distributed systems
Analytic results have been obtained for the models with the number of nodes, the characteristics of job types, the processing capacities of nodes, and the characteristics of job-transfer capacities being much more general than those of the model of the above section. Details are given in the appendix.

Paradoxes for asymmetric distributed computer systems
Consider an extension of the models of symmetric distributed systems presented in Section 4.1 to asymmetric distributed systems. When the parameter values describing the system are not identical for every user, the costs of the users in a Nash equilibrium are in general distinct. In this section, we consider a system consisting of m nodes, 1, 2, . . . , m [26,53]. Jobs are classified into groups i = 1, 2, . . . , m according to the node at which they arrive, and they arrive at node i according to a Poisson process with arrival rate φ_i. Of these, the rate x_ii of jobs is processed at node i, and the rate x_ij (i ≠ j) of jobs is forwarded through the transfer facility to node j (j ≠ i) and processed there. Thus ∑_j x_ij = φ_i. The decision maker at node i chooses x_i = (x_i1, x_i2, . . . , x_im) (i = 1, 2, . . . , m) unilaterally within the constraints so that the cost at node i may be minimum. As a result, node i has the workload β_i = ∑_q x_qi. This system is equivalent to a network consisting of m origin-destination pairs that have a common destination (see Fig. 6 for the case of m = 3; in the figure, the variable shown next to each arrow gives the rate of jobs passing along the arrow). Denote by D_i(β_i) the expected sojourn time (including the waiting time) of a job at node i, which has the workload β_i. It is assumed that D_i(β_i) is convex and increasing in β_i. Denote by G_ij(x) the expected transfer time of forwarding a job from node i to node j (j ≠ i). Denote by T_i(x) the expected sojourn time of a job from its arrival at node i until it finally leaves the network. Then we have

$$T_i(x) = \frac{1}{\phi_i} \sum_{j} x_{ij} \left[ D_j(\beta_j) + G_{ij}(x) \right], \qquad G_{ii}(x) = 0.$$

The group optimal state (the Nash equilibrium) x̃ satisfies the following (for example, [24]):

$$T_i(\tilde{x}) = \min_{x_i} T_i(\tilde{x}_{-(i)}; x_i) \quad \text{subject to } \sum_{j} x_{ij} = \phi_i,\ x_{ij} \ge 0, \qquad i = 1, 2, \dots, m,$$

where (x̃_−(i); x_i) is the m × m-dimensional vector obtained from x̃ by replacing its elements corresponding to x̃_i by x_i. As to the existence and uniqueness of the group optimum (the Nash equilibrium), see [2,3].
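To make the bookkeeping of workloads and costs concrete, the following is a minimal Python sketch that evaluates β_i and T_i(x) for a given m × m matrix of rates x. The function names, the M/M/1 node delay 1/(μ_i − β_i), and the constant transfer time t used in the example are assumptions introduced here for illustration; the text only requires D_i to be convex and increasing.

```python
from typing import Callable, List, Sequence

Matrix = Sequence[Sequence[float]]


def workloads(x: Matrix) -> List[float]:
    """beta_i = sum over q of x[q][i]: total rate of jobs processed at node i."""
    m = len(x)
    return [sum(x[q][i] for q in range(m)) for i in range(m)]


def sojourn_times(
    x: Matrix,
    node_delay: Callable[[int, float], float],
    transfer_delay: Callable[[int, int, Matrix], float],
) -> List[float]:
    """T_i(x) = (1/phi_i) * sum_j x_ij * (D_j(beta_j) + G_ij(x)), with G_ii = 0.

    Assumes every arrival rate phi_i = sum_j x[i][j] is positive.
    """
    m = len(x)
    beta = workloads(x)
    times = []
    for i in range(m):
        phi_i = sum(x[i])
        total = 0.0
        for j in range(m):
            if x[i][j] == 0.0:
                continue  # unused routes contribute nothing
            g = 0.0 if i == j else transfer_delay(i, j, x)
            total += x[i][j] * (node_delay(j, beta[j]) + g)
        times.append(total / phi_i)
    return times


# Hypothetical example: three nodes, node 0 twice as fast as the others.
mu = [2.0, 1.0, 1.0]
t = 0.1  # assumed constant transfer time
mm1 = lambda j, beta: 1.0 / (mu[j] - beta)   # assumed M/M/1 node delay
const_t = lambda i, j, x: t

no_transfer = [[0.5, 0.0, 0.0], [0.0, 0.5, 0.0], [0.0, 0.0, 0.5]]
node1_offloads = [[0.5, 0.0, 0.0], [0.2, 0.3, 0.0], [0.0, 0.0, 0.5]]
```

Calling `sojourn_times(node1_offloads, mm1, const_t)` shows the externality at work in this model: the forwarding node lowers its own cost while raising the cost of the node that receives the jobs.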
In the next section, we present some results on the asymmetric and symmetric distributed systems wherein the degrees of paradoxes are finite. The results show that, in the group of distributed systems characterized in natural ways, the degrees of paradoxes have the worst-case values for symmetric systems.

The models with multiple nodes with nonlinear costs
Consider the model of distributed computer systems consisting of m (≥ 2) nodes. With the workload β_i, the expected node passage time (the cost) is: otherwise ∞). The transfer cost G_ij(x) has the following two cases, (A) and (B). In case (A), one job-transfer channel is shared by the entire system, and λ = ∑_p ∑_{q (≠ p)} x_pq denotes the rate of jobs being transferred through the interconnection. In case (B), the entire system has m(m − 1) channels.

Numerical experiments
The following algorithm is used to obtain the solution of a group optimum (a Nash equilibrium).
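Given the values of φ, μ, and t, each decision maker i updates its strategy x^n_i to the one that minimizes its own cost given the strategies of the other decision makers (. . . , x^{n−1}_m). Repeat this step until convergence.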
If the above algorithm converges, a group optimum is obtained. We then have the group optimum cost T̃_i(φ, μ, t) of decision maker i for the values of φ, μ, and t. It has been shown that the algorithm converges in the cases where the costs of nodes, etc. are linear [4]. In fact, the algorithm converged in all the cases we examined. For a given combination of the values of μ_i and φ, there exists a value t_∞ such that, in the group optimum, the transfer facility is not used if t ≥ t_∞.
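The iteration can be illustrated with a minimal Python sketch for the simplest setting: two identical nodes, an assumed M/M/1 node delay 1/(μ − β), and an assumed constant transfer time t. The sweep order (each node responds in turn to the other's latest strategy), the bisection on the marginal cost, and all function names are choices made here for illustration, not the procedure used in the experiments. The sketch also assumes 2φ < μ so that no load can reach μ.

```python
def node_delay(beta: float, mu: float) -> float:
    """Assumed M/M/1 sojourn time D(beta) = 1/(mu - beta); requires beta < mu."""
    return 1.0 / (mu - beta)


def node_delay_slope(beta: float, mu: float) -> float:
    """D'(beta) for the assumed M/M/1 delay."""
    return 1.0 / (mu - beta) ** 2


def marginal_cost(x_out: float, x_in: float, phi: float, mu: float, t: float) -> float:
    """d(phi * T_i)/d(x_out): effect on node i's cost of forwarding slightly more."""
    beta_own = phi - x_out + x_in
    beta_other = phi - x_in + x_out
    keep = node_delay(beta_own, mu) + (phi - x_out) * node_delay_slope(beta_own, mu)
    send = node_delay(beta_other, mu) + t + x_out * node_delay_slope(beta_other, mu)
    return send - keep


def best_response(x_in: float, phi: float, mu: float, t: float, iters: int = 60) -> float:
    """Forwarding rate in [0, phi] minimizing node i's cost, given the rate x_in it receives.

    The cost is convex in x_out, so the marginal cost is increasing and can be bisected.
    """
    if marginal_cost(0.0, x_in, phi, mu, t) >= 0.0:
        return 0.0
    if marginal_cost(phi, x_in, phi, mu, t) <= 0.0:
        return phi
    lo, hi = 0.0, phi
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if marginal_cost(mid, x_in, phi, mu, t) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


def group_optimum(phi: float, mu: float, t: float, tol: float = 1e-12, max_sweeps: int = 1000):
    """Alternate best responses of the two nodes until the strategies stop changing."""
    x12 = x21 = 0.0
    for _ in range(max_sweeps):
        new12 = best_response(x21, phi, mu, t)
        new21 = best_response(new12, phi, mu, t)
        change = abs(new12 - x12) + abs(new21 - x21)
        x12, x21 = new12, new21
        if change < tol:
            break
    return x12, x21


def sojourn_time(x_out: float, x_in: float, phi: float, mu: float, t: float) -> float:
    """Expected sojourn time of a job arriving at the node that forwards x_out."""
    beta_own = phi - x_out + x_in
    beta_other = phi - x_in + x_out
    return ((phi - x_out) * node_delay(beta_own, mu)
            + x_out * (node_delay(beta_other, mu) + t)) / phi
```

For example, `group_optimum(0.4, 1.0, 0.5)` should approach the symmetric point x = (φ − t(μ − φ)²)/2 = 0.11 implied by the first-order condition of this assumed model, and the resulting sojourn time exceeds the no-transfer value 1/(μ − φ), which is the paradox in miniature.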
If the following holds, a paradox occurs:

$$\tilde{T}_i(\phi, \mu, t) > \tilde{T}_i(\phi, \mu, t_\infty) \quad \text{for all } i. \qquad (12)$$

Here, T̃_i(φ, μ, t) gives the cost for the decision maker at node i, given the parameter values φ, μ, and t. Therefore, (12) shows that, if the capacity of the job transfer increases such that the transfer parameter decreases from t_∞ to t, the costs of the decision makers at all nodes increase, which looks paradoxical. The maximum value, Γ(μ, φ), of the degree of paradox k_min(φ, μ, t) is given by

$$\Gamma(\mu, \phi) = \max_{t} k_{\min}(\phi, \mu, t).$$

In the next section, the values of Γ for combinations of the values of μ and φ are given.

The results of numerical experiments
We examined the cases of various combinations of the values of μ and φ for some m (≥ 2). The left and right parts of Fig. 7 show the results for two cases. In both cases, as μ_1 increases, the worst-case paradox Γ increases up to some limit. It seems clear that the symmetric systems approach the limit. We can observe that there exists no asymmetric system whose value of the worst-case paradox Γ, depicted by the dotted curve, exceeds the limit. That is, for any asymmetric system within each group, there exists a completely symmetric system whose degree of paradox is equal to or larger than that of the asymmetric system. In experiments other than those presented here, we found the same tendencies [14,15]. These numerical examinations may imply that, for any group that can be expressed in natural (not eccentric) ways and that has a finite value of the worst-case paradox, the worst-case paradox can be achieved by a completely symmetric system within the group.

Concluding remarks
In information or transportation networks or distributed computer systems wherein multiple users make independent decisions to minimize their own costs, it is possible that all the users suffer coincident cost degradation, as in the prisoners' dilemma or the Braess paradox. Motivated by this, in this chapter we have presented an overview of the analytical and numerical results on the worst-case degree of paradoxes in which, in selfish routing in networks or in noncooperative load balancing in distributed systems, the costs of all users degrade coincidently as the degree of freedom in making decisions increases. Furthermore, we have considered the relation between the paradox and Pareto superiority, and have shown a measure that gives the degree of paradox. As to networks in an individual optimum (a Wardrop equilibrium), wherein decision making is completely dispersed, the degree of the worst-case paradox is limited for a finite number of nodes in the networks. On the other hand, as to networks such as distributed computer systems like GRID, no paradox may occur in an individual optimum, whereas in a group optimum (a Nash equilibrium), wherein decision making is intermediately dispersed, paradoxes may occur and the degree of the paradoxes can increase without bound. Furthermore, for a group of systems, the worst-case paradox for the group may be achieved by a symmetric system in the group. For symmetric distributed systems, analytic results have been obtained for general systems. For asymmetric systems, however, it may not be so easy to obtain analytical results, and so we have to rely on numerical investigation in order to understand the problems in question.
Even in this direction of research (such as routing in networks and load balancing in distributed systems), many explicit and implicit problems may remain to be solved.
We have discussed here static or quasi-static controls, but it may be far more difficult to obtain rich results on dynamic controls. Paradoxes in flow controls that we have not discussed here may be pursued later.

Appendix A: The results on the paradoxes for general symmetric distributed computer systems
The following shows the results on the models generalized with respect to the number of nodes, the characteristics of job types, the node processing times, and the job transfer facilities from the models presented in Section 4.1 [27].

A.1. The model and assumptions
The model described in Section 4.1.1 is generalized as in the following. We consider a system with m (≥ 2) nodes (host computers or processors) connected with a job-transfer means.
Jobs that arrive at each node i, i = 1, 2, · · · , m, are classified into n types k, k = 1, 2, · · · , n. Consequently, we have mn different job classes R_ik. Each class R_ik is distinguished by the node i at which its jobs arrive and by the type k of the jobs. We call such a class a local class, or simply a class. We assume that each node has an identical arrival process and identical processing capacity. Jobs of type k arrive at each node with the node-independent rate φ_k. We denote the total arrival rate to a node by φ (= ∑_k φ_k) and, without loss of generality, we assume a time scale such that φ = 1. We also consider what we call the global class J_k, which consists of the collection of the local classes R_ik, i.e., J_k = ∪_i R_ik. J_k thus consists of all jobs of type k. Whereas all the jobs of local class R_ik arrive at the same node i, the arrivals of the jobs of global class J_k are equally distributed over all nodes i.
The average processing (service) time (without queueing delays) of a type-k job at any node is 1/μ_k and is, in particular, node-independent. We denote φ_k/μ_k by ρ_k and let ρ = ∑_k ρ_k. Out of the type-k jobs arriving at node i, the rate x_ijk of jobs is forwarded upon arrival through the job-transfer means to another node j (≠ i) to be processed there. The remaining rate x_iik = φ_k − ∑_{j (≠ i)} x_ijk is processed at the arrival node i. Thus ∑_q x_iqk = φ_k. We have 0 ≤ x_ijk ≤ φ_k for all i, j, k. Within these constraints, a set of values for x_ik (i = 1, 2, · · · , m, k = 1, 2, · · · , n) is chosen to achieve optimization, where x_ik = (x_i1k, · · · , x_imk) is an m-dimensional vector called the 'local-class R_ik strategy'. We define a global-class J_k strategy as the m²-dimensional vector x_k = (x_1k, x_2k, · · · , x_mk). We also denote by the m²n-dimensional vector x the vector of strategies concerning all local classes, x = (x_1, x_2, · · · , x_n). We call x the strategy profile.
For a strategy profile x, the load β_i on node i is

$$\beta_i = \sum_{k} \mu_k^{-1} \sum_{j} x_{jik}.$$

The contribution β_i^(k) to the load of node i by type-k jobs is

$$\beta_i^{(k)} = \mu_k^{-1} \sum_{j} x_{jik},$$

and clearly β_i = ∑_k β_i^(k). We denote the set of x's that satisfy the constraints (i.e., ∑_l x_ilk = φ_k, x_ijk ≥ 0, for all i, j, k) by C. Note that C is a compact set.
We have the following assumptions:
Assumption Π1 The expected processing (including queueing) time of a type-k job that is processed at node i (or the cost function at node i) is a strictly increasing, convex, and continuously differentiable function of β_i, denoted by μ_k^{-1} D(β_i) for all i, k.
Assumption Π2 The expected job-transfer delay (including queueing delay), or the cost for forwarding type-k jobs arriving at node i to node j (i ≠ j), denoted by G_ijk(x), is a positive, nondecreasing, convex, and continuously differentiable function of x. G_iik(x) = 0. Each job is forwarded at most once.
We refer to the length of time between the instant when a job arrives at a node and the instant when it leaves one of the nodes, after all processing and job-transferring, if any, are over, as the sojourn time of the job. The expected sojourn time of a local-class R_ik job that arrives at node i, T_ik(x), is expressed as

$$T_{ik}(x) = \frac{1}{\phi_k} \sum_{j} x_{ijk}\, T_{ijk}(x), \quad \text{where} \quad T_{ijk}(x) = \mu_k^{-1} D(\beta_j) + G_{ijk}(x).$$

Using the fact that all nodes have the same arrival process, the expected sojourn time of a global-class J_k job is

$$T_k(x) = \frac{1}{m} \sum_{i} T_{ik}(x).$$

The overall expected sojourn time of a job that arrives at the system is

$$T(x) = \frac{1}{\phi} \sum_{k} \phi_k T_k(x).$$

(A) [Completely centralized decision scheme: Overall optimization]
The overall optimum is unique and is given as follows: for all i, j (≠ i), k, x_ijk = 0 and x_iik = φ_k (no transfer facility is used). The expected sojourn time is, for all i, k, μ_k^{-1} D(ρ).

(B) [Completely dispersed decision scheme: Individual optimization]
The individual optimum is unique and equal to the overall optimum. Therefore, in contrast to the Braess network, no paradox occurs in the individual optima.

(C) [Intermediately dispersed decision scheme: Group optimization]
Furthermore, we have the following assumption on the job transfer facility.
[Assumption Π3] The function G_ijk(x) takes one of the following forms:
Type G-I: G_ijk(x) = ω_k^{-1} G(ω_k^{-1} x_ijk) (one dedicated line for each combination of a pair of origin and destination nodes and a local class, i.e., m(m − 1)n lines in total).
Type G-II(a): G_ijk(x) = ω_k^{-1} G(∑_{p, q (≠ p)} ω_k^{-1} x_pqk) (one bus line for each global class, i.e., n bus lines in total).
Type G-II(b): G_ijk(x) = ω_k^{-1} G(∑_{p, q (≠ p), r} ω_r^{-1} x_pqr) (one common bus line for the entire system, i.e., 1 bus line).
Here ω_k is a constant, G(0) = 1, and G(x) is a nondecreasing, convex, and differentiable function of x.
Remark A.1 ω_k^{-1} can be regarded as the expected job transfer time (without queueing delays) for forwarding a type-k job from the arrival node to another processing node. □
The group optimum x̃ satisfies, for all i and k, the following:

$$T_{ik}(\tilde{x}) = \min_{x_{ik}} T_{ik}(\tilde{x}_{-(ik)}; x_{ik}) \quad \text{such that } (\tilde{x}_{-(ik)}; x_{ik}) \in C,$$

where (x̃_−(ik); x_ik) is the m²n-dimensional vector obtained from x̃ by replacing the elements corresponding to x̃_ik by x_ik.
Define g_ijk(·) as follows: In the case where Assumption Π3 holds, for all i, j (≠ i), k that satisfy x_ijk = x_k, denote as follows:
Group optimum: Denote Γ_k = ρ_k² σ_k^{-1} and σ_k = φ_k/ω_k. The group optimum x̃ is unique and is given as follows:
The cases of G-I and G-II(a)
(a) As to group R_ik such that Γ_k D′(ρ) ≤ G(0): for all i, j (≠ i), x̃_ijk = 0 and x̃_iik = φ_k. This is identical to the overall optimum. Similarly, the expected sojourn time is, for all i, k, μ_k^{-1} D(ρ).
(b) As to group R_ik such that Γ_k D′(ρ) > G(0): for all i, j (≠ i), k, x̃_ijk = x̃_k, where x̃_k is the unique solution of the following: The expected sojourn time is, for all i, k,
The case of G-II(b)
The group optimum is obtained by the following steps. First, reorder k so that Γ_1 ≥ Γ_2 ≥ · · · ≥ Γ_n. Then we have the following 3 cases as to K: Γ_K D′(ρ) > G(0) ≥ Γ_{K+1} D′(ρ) for some K < n (24), or Γ_n D′(ρ) > G(0) (that is, K = n) (25), or Γ_1 D′(ρ) ≤ G(0) (26).
In the case (26), the unique solution x̃_k = 0, for all k, is obtained. In the case (24) or (25), the unique solution is obtained in the following way. Define F_k(X) as follows: .
Obtain the largest k̃ and X = X_k̃ (> 0) that satisfy F_k̃(X_k̃) = 0 and [Γ_k̃ D′(ρ) − G(X_k̃)] > 0. Then, by using the next equation (27), we obtain x̃_k for k = 1, 2, · · · , k̃, We can thus obtain the unique set of values such that x̃_k > 0, k = 1, 2, · · · , k̃, and x̃_{k̃+1} = x̃_{k̃+2} = · · · = x̃_n = 0. This is the unique solution. The expected sojourn time is, for all i, k,
Therefore, we have the following conclusion: In symmetric distributed systems, the necessary and sufficient condition for the occurrence of paradoxes is that there exists a job type k such that Γ_k D′(ρ) > 1.
Remark A.2 Thus, the possibility of coincident cost degradation depends on the value Γ_k (= ρ_k²/σ_k = φ_k ω_k/μ_k²), and differs for each type of jobs. A paradox is more likely for job types with a larger arrival rate (φ_k), a longer processing time (μ_k^{-1}), and a shorter job-transfer time (ω_k^{-1}). Furthermore, the higher the utilization factor of each node (ρ = ∑_k ρ_k), the more easily paradoxes may occur. Since ∑_k φ_k = φ = 1, each φ_k becomes smaller as the number n of job types grows, and the possibility of a paradox may become smaller. On the other hand, the number of nodes m may not have a big influence on the occurrence of paradoxes. □
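The condition Γ_k D′(ρ) > 1 is easy to check numerically. The following minimal Python sketch lists the job types that satisfy it. The function names and the example delay function D(β) = 1/(1 − β), with derivative 1/(1 − β)², are assumptions introduced here for illustration; Assumption Π1 only requires D to be strictly increasing, convex, and continuously differentiable. Job types are indexed from 0 in the code.

```python
from typing import Callable, List, Sequence


def gamma(phi_k: float, mu_k: float, omega_k: float) -> float:
    """Gamma_k = rho_k^2 / sigma_k = phi_k * omega_k / mu_k^2."""
    return phi_k * omega_k / mu_k ** 2


def paradox_types(
    phi: Sequence[float],
    mu: Sequence[float],
    omega: Sequence[float],
    d_prime: Callable[[float], float],
) -> List[int]:
    """Indices k (0-based) of job types with Gamma_k * D'(rho) > 1, where G(0) = 1."""
    rho = sum(p / m for p, m in zip(phi, mu))  # utilization factor of each node
    slope = d_prime(rho)
    return [
        k
        for k, (p, m, w) in enumerate(zip(phi, mu, omega))
        if gamma(p, m, w) * slope > 1.0
    ]


def example_d_prime(beta: float) -> float:
    """Derivative of the assumed delay function D(beta) = 1/(1 - beta)."""
    return 1.0 / (1.0 - beta) ** 2


# Hypothetical parameters: two job types, arrival rates summing to phi = 1.
example_phi = [0.7, 0.3]
example_mu = [2.0, 1.0]
example_omega = [1.0, 0.2]
```

With these hypothetical values, only the first job type satisfies the condition: it has the larger arrival rate and the shorter transfer time, in line with the remark above.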

Hisao Kameda
University of Tsukuba, Japan