Overall results for some particular cases

## 1. Introduction

Some decades ago the increasing importance of the telephony service pushed most telecommunications companies (TELCOs) to deploy optical fibre networks. In order to guarantee appropriate service availability, these networks were designed in such a way that several independent paths were available between each pair of nodes, and in order to optimize these large capital investments several models and algorithms were developed.

Already the optimal design of a single layer network is a challenging task that has been considered by many research groups, see for instance the references: [1-3]. Throughout this work this optical network is referred to as the *physical layer*.

Some years afterwards, the exponential growth of Internet traffic volume demanded for higher capacity networks. This demand led to the deployment of dense wavelength division multiplexing (DWDM) technology. Today, DWDM has turned out to be the dominant network technology in high-capacity optical backbone networks. Repeaters and amplifiers must be placed at regular intervals for compensating the loss in optical power while the signal travels along the fibre; hence the cost of a lighpath is proportional to its length over the physical layer. DWDM supports a set of standard high-capacity interfaces (e.g. 1, 2.5, 10 or 40 Gbps). The cost of a connection also depends of the capacity but not proportionally. For economies of scale reasons, the higher the bit-rate the lower the per-bandwidth-cost. The client nodes together with these lightpath connections form a so-called *logical layer* on top of the physical one.

The increasing number of per-physical-link connections -intrinsic to DWDM- may cause multiple logical link failures from a single physical link failure (e.g., fibre cut). This issue led to the development of new multi-layer models aware of the stack of network layers. Most of these models share in common the 1+1 protection mechanism, that is: for every demand two independent lightpaths must be routed such that in case of any single physical link -or even node- failure, at least one of them survive. The following references: [4] and [5] are good examples of this kind of models. Those multi-layer models are suitable for certain families of logical layer technologies such as: synchronous optical networking (SONET) or synchronous digital hierarchy (SDH) since both standards have 1+1 protection as their native protection mechanism.

During many years the connections of IP networks were implemented over SONET/SDH -for simplicity we will only mention SDH from now on-. Most recently: multiprotocol label switching (MPLS), traffic engineering extensions for dynamic routing protocols (e.g. OSPF-TE, ISIS-TE), fast-reroute algorithms (FRR) and other new features were added to the traditional IP routers. This new *technology bundle* known as IP/MPLS, opens a competitive alternative against traditional protection mechanisms based on SDH.

Since IP/MPLS allows recovering from a failure in about 50ms, capital savings may come from the elimination of the intermediate SDH layer. Another improvement of this technology is that the number of paths to route demands between nodes is not pre-bounded; so it might exist in fact a feasible different configuration for most failure scenarios. Since IP/MPLS allows the elimination of an intermediate layer, manages Internet traffic natively, and makes possible a much easier and cheaper operation for virtual private network (VPN) services, it is gaining relative importance every day.

Setting aside technical details and for the purpose of the model presented in this work, we remark two important differences between SDH and IP/MPLS. The first one is the need of SDH to keep different demands between the same nodes. In IP/MPLS networks, all the traffic from one node to another follows the same path in the network referred to as *IP/MPLS tunnel*. The second remarkable difference is how these technologies handle the existence of parallel links in the logical layer. In SDH the existence of parallel links is typical but in IP/MPLS parallel links may conflict with some applications so we will avoid them.

In this paper we address the problem of finding the optimal -minimum cost- configuration of a logical topology over a fixed physical layer. The input data set is constituted by: the physical layer topology -DWDM network-, the client nodes of the logical layer -IP/MPLS nodes- and the potential links between them, as well as the traffic demand to satisfy between each pair of nodes and the per-distance-cost in the physical network associated with the bitrates of the lightpaths to deploy over it. The decision variables are: what logical links do we have to implement, which bitrate must be assigned to each of them and what path do these lightpaths have to follow in the physical layer. For being a feasible solution a configuration must be capable of routing every traffic demand over the remaining active links of the logical layer for every single physical link failure scenario.

The problem previously described is NP-hard and due to its complexity we developed a metaheuristic based on GRASP to find good quality solutions for real size scenarios. Actually, we analysed the performance of the proposed metaheuristic using real-world scenarios provided by the Uruguayan national telecommunications company (ANTEL).

The main contributions of this article are: i) a model to represent a common network overlay design problem; ii) the design of a GRASP metaheuristic to find good quality solutions for this model; iii) the experimental evaluation based on real-world network scenarios.

The remaining of this document is organized as follows. A mixed-integer programming model will be presented in Section-2. In Section-3 we will show some exact solutions found with CPLEX for small/simple but illustrative problems; in this section we also analyze the intrinsic complexity of the problem. In Section-4 a GRASP metaheuristic to solve this problem is presented. Finally, in Section-5 we will show the solutions found with the previous metaheuristic for real-world network scenarios.

## 2. Mathematical model

We will now introduce the basic mixed-integer programming model that arises from the detailed interaction of technologies.

### 2.1. Parameters

The physical network is represented by an undirected graph

For every different pair of nodes

These paths are unique at every moment, but in case of link failures they may change to follow an alternate route. For simplicity we assume that the traffic volume is symmetric (i.e.

### 2.2. Variables

This model comprises three classes of variables. The first class is composed of the logical link capacity variables. We will use boolean variables

The second class of variables determines how are going to be routed the logical links over the physical network. If

The third and final class of variables is that that determines how the IP/MPLS tunnels are going to be routed against any particular failure in the physical layer.

### 2.3. Constraints

This problem comprises three groups of constraints. The first group of constraints establishes the rules that the routes of the lightpaths must follow to be feasible.

The meaning of the previous constraints is the following: (1) establishes that the number of capacities assigned to every logical link is at most 1 -it could be 0 if the link is not going to be used-; (2) and (3) guarantee that if any particular link

Before going any further we have to introduce a set of auxiliary variables:

The second group of constraints establishes the rules that the routes of the IP/MPLS tunnels must follow in the logical layer and their meaning is similar to the previous ones except for (1) and (8). The inequalities in (8) were added to guarantee that whatever the failure scenario is (

Variables sets

The group of constrains (15) prevents from using

### 2.4. Objective

The function to minimize is the sum of the cost of every logical link. According on what capacity was assigned to a logical link there is an associated per-distance-cost (

Although straightforward, this approximation is inappropriate because it is not linear. The following sub-problem expresses the objective value with an equivalent linear expression.

We used the real variable

The only exception is when the values of

## 3. Finding exact solutions

We will start by showing particular solutions for some simple example cases. The first example has four nodes

The following example comprises seven nodes and explores again the clique-over-cycle case. The remaining parameters are analogous:

Through these two examples we attempted to show that solutions are not intuitive even for very simple cases. To find optimal solutions we used ILOG CPLEX v12.1. All computations were performed on a Linux machine with an INTEL CORE i3 Processor and 4GB of DDR3 RAM. Table 1 shows information for several test instances analogous to those represented in Figure 1, that is:

|V| | b1 range | #variables | #constrains | Elapsed time(hh:mm:ss) |

5 | 2 – 6 | 1230 | 1640 | 00:00:00 – 000:00:11 |

6 | 3 – 9 | 3390 | 4035 | 00:00:02 – 000:19:31 |

7 | 2 – 12 | 7896 | 8652 | (*) 00:00:05 – 087:19:05 |

8 | 3 – 16 | 16296 | 16772 | (*) 00:00:02 – 100:10:17 |

We proved that: it is always possible to find minimal feasible solutions for these particular topologies and demands when:

We also proved that: the cycle configuration for the logical network -the simplest possible- is feasible for every

Keeping these physical and logical topologies while trying with simpler matrices of demand (e.g.

**Proposition 1:** The problem presented in this section is NP-Hard.

Demonstration lies under reduction of NPP (Number Partitioning Problem) to our particular problem that will be referred to as MORNP (Multi-Overlay Robust Network Problem) within this proof. NPP problem consist in finding two subsets with the same sum for a known multiset of numbers. Formally: given a list of positive integers:

finds its minimum value within the set

Since logical and physical topologies are the same and all distances are equal, the logical layer projected over the physical one for any optimal solution must copy the underlying shape. So, if there exists a solution for such an instance of MORNP, this solution should have a feasible routing scenario when transport -and logical- link

Because there is only one capacity both links must have been assigned with

Since all the transformation are of polynomial complexity it stands that

We proved the complexity of MORNP is intrinsic to the problem, since it is NP-Hard. Because of the previous result and like for most other network design problems, an exhaustive search for the optimal solution of the problem presented in this work is infeasible for real size problems.

## 4. Metaheuristics

We decided to use a metaheuristic algorithm based on GRASP to find good quality solutions for real instances of this problem. A very high level diagram of our algorithm is shown in Figure 3.

### 4.1. GRASP implementation

As for every GRASP implementation this algorithm has a loop with two phases. The *construction phase* builds a *randomized feasible solution*, from which a local minimum is found during the *local search phase*. This procedure is repeated

The *initialization phase* performs computations whose results are invariants among the iterations, like the shortest path and distance over the physical layer between each pair of nodes.

### 4.2. Construction phase

The *randomized feasible solution phase* performs a heuristic low cost balanced routing of the logical layer over the physical one. The exact solution for this sub-problem is also NP-Complete as it can be seen in [9]. The goal is to find a path for every lightpath, such that the number of physical link intersections is minimum. It is also desirable that the total cost be as low as possible but as a secondary priority. The strategy chosen in this heuristic is the following: nodes are taken randomly (.e.g.: uniformly), and for each node their logical links are also taken randomly but with probabilities in inverse ratio to the minimal possible distance of their lightpaths over the physical layer.

Instead of using the real distances of the physical links (

According to these new weights, logical links are routed following the minimal distance without repeating physical links among them. Usually, after routing some lightpaths the set of not-yet-used physical links empties, and it is necessary to start over a new *control window* by filling again the not-yet-used set. Prior to do this, the pseudo-distances are updated using the following rule:

For instance, let us guess our networks are like those sketched in Figure 4 and the links drawn are:

The next two logical links are

The elements of the input data in Algorithm-1 are: the logical graph *initialization phase*-. The output is an application between logical links and the subset of physical links used by their lightpaths.

The algorithm detailed in Algorithm-1 is the one depicted in Figure 4 and Figure 5. The outcome of the randomized feasible solution phase is a candidate configuration for the route of each lightpath over the physical network. We did not make use of capacity and traffic information yet; and before going any further we must state that -as in the exact examples- in our practical applications we limited the capacities set to only one capacity (

The main reason was that the telecommunications company we developed this application for, wanted the maximum possible bitrate for all the interfaces of its core network. The next issue is determining whether the configuration found is feasible or not. The answer to this question is far from being easy, since this sub-problem is NP-Complete. We have based on a heuristic to answer this question. The heuristic is the following: demands are taken in decreasing order of volume (

For instance, Figure 6 shows an example logical topology whose link capacities are 3. Let the demands be:

This constraint based routing algorithm is straightforward and it is based on Dijkstra's algorithm. Nevertheless an efficient implementation is quite complicated because of the following fact: to be sure a solution is feasible this algorithm must be repeated for each single failure scenario. In order to improve the efficiency: routes cache, optimized data structures and several others low-level programming techniques were used. This isFeasible function is used in both: construction and local search phases. The performance of this function is critic since it is used several times within the same iteration in the local search, as it is represented in Figure 3. Up to this point and before entering the local search phase, we have a feasible configuration for the routes of every lightpath; but we are still using all of the initial logical links and this input network is very likely to be over-sized. Moreover, in the construction phase we attempted to distribute the routes of the lightpaths uniformly over the physical layer, but it is still possible that many logical links fail simultaneously because of a single physical link failure. Therefore, it is very likely that many of these “redundant links” may be disposed of, if they are not really adding useful capacity. It is worth mentioning that from this point on and until the next iteration, lightpaths costs are revealed because we have their lengths -from the configuration for their routes- and there is only one possible capacity.

### 4.3. Local search

Through the local search phase we intend to remove the most expensive and unnecessary logical links for the current configuration. The process is the following: logical links are taken in decreasing order of cost for their lightpaths, each one is removed and the feasibility of the solution is tested again. If the solution remains feasible the current logical link is permanently removed, otherwise it is reinserted and the sequence follows for the remaining logical links. Once this processes is finished the result is a minimal solution. After

## 5. Application case context and results

We will focus now in the context of ANTEL -the telecommunication company we applied this metaheuristic to-. Prior to doing so we are giving some basic elements of the overall Internet architecture. Internet is actually a network that could be disaggregated into several separate smaller networks also known as *Autonomous Systems* (AS). Typically every AS is a portion of the global Internet owned/governed by a particular Internet Service Provider (ISP). Internet users access content residing in servers of: companies, universities, government sites or even from other residential customers (e.g. P2P applications). A portion of this content is located within the own network of the ISP this customer lease the service to -into some of the *Points Of Presence* (POP) of the ISP-, but most content is scattered over the Internet. Since traffic interchange is necessary among different ISPs, the Internet architecture needs special POPs known as *Network Access Points* (NAPs). Within these NAPs: Carriers, ISPs and important content providers (e.g.: Google, Akamai) connect to each other in order to interchange traffic.

This company had two different IP/MPLS networks referred to as: *aggregation network* and *public Internet network*. The *aggregation network* is geographically dispersed all over the country and it is responsible of gathering and delivering the traffic of the customers to the *public Internet network*. The *public Internet network* is where the AS of this ISP is implemented; centralizes the international connections with other ISPs as well as those to Datacenters of local content providers. The *public Internet network* is geographically concentrated and only has POPs in the Capital City and in an important NAP of the US territory (see grey clouds in Figure 7). In terms of the model covered in this article we may stand that the physical network has all of its nodes but one -the NAP- within the national boundaries.

There are four independent paths for international connections -leased to Carriers- between the NAP and the national boundaries. The *aggregation* and *public Internet* networks are both logical. The* public Internet network* only has presence in a few POPs of the Capital City and in the NAP; and although the *aggregation network* has full-national presence it does not span to the NAP. A Non-Disclosure Agreement (NDA) signed between the telecommunications company and our research institute protects more accurate information and details. The costs and traffic information shown in the rest of this article are only referential.

Several planning concerns arose from the situation exposed: Is it convenient the current architecture? Or it would be better to merge both IP/MPLS networks? Are profitable the IT infrastructure investments necessary to increase the percentage of local content? Which would be the optimal network to fulfil every demand requirements scenario? We helped to answer these questions by identifying representative scenarios and creating their associated data sets to feed the metaheuristic.

The overall performance of the algorithm described in Section-4 was very good -under the two hours of execution time in every scenario-.

We tried several scenarios based on the following considerations: traffic volume, network architecture and the percentage of locally terminated traffic. We selected eight remarkable scenarios to detail in Table 2.

According on traffic forecasts it is expected that some years from now the total volume of traffic to be placed somewhere between 57 and 100 (reference values). If some IT investments and agreements were made it is expected that the percentage of locally terminated traffic (national traffic) could be greater (High). These new potential sources of traffic would be placed in the Capital City, specifically in the same POPs where the public Internet network is present. Those scenarios where merged networks is set to False inherit the current network architecture.

scenarioindex | aggregatedtraffic demand | %localcontent | mergednetworks | numberof nodes | requiredlightpaths | totalcost |

1 | 100 | Low | False | 56 | 81 | 10,000,000 |

2 | 100 | Low | True | 68 | 118 | 7,662,651 |

3 | 100 | High | False | 56 | 81 | 7,578,234 |

4 | 100 | High | True | 68 | 133 | 5,713,563 |

5 | 57 | Low | False | 56 | 75 | 6,319,470 |

6 | 57 | Low | True | 63 | 94 | 4,872,987 |

7 | 57 | High | False | 56 | 75 | 5,108,587 |

8 | 57 | High | True | 63 | 105 | 4,064,597 |

Another remarkable aspect of this architecture is that whereas the aggregation network is deployed directly over the physical layer, there is an extra SDH layer between the public Internet network and the physical one. Since the public Internet network only has a few nodes and its protection relies on the 1+1 protection mechanism of SDH, its optimal value can be estimated easily. The only portion where we needed computer assistance is that of the aggregation network. The columns number of nodes and required lightpaths refers exclusively to the values for this last network.

On the other hand and in order to compare solutions fairly, the column *total cost* represents the combined cost of both networks -when they are not joined-. It is worth observing those scenarios: 1 and 3, as well as 5 and 7 require the same number of lightpaths. Moreover, their solutions use exactly the same lightpaths. This result should be expected because in both pairs of scenarios share the same traffic and non-merged network architecture; since Datacenters -the only difference- are connected to the *public Internet network*, the *aggregation network* is unaware of the percentage of local content. The only changes are in the *total cost* because of the saving of international capacity.

Less intuitive are those savings arising exclusively from the merging of both networks like: 1 and 2, 3 and 4, and so on. The reason is the following: “the routing search-space of the IP/MPLS technology is much greater than that of the SDH equivalent, so it is much more efficient”. For simplicity let us guess for a while that traffic does not need to be fitted in tunnels and instead can behave as a fluid. Since the length of international connections is measured in thousands of kilometres, this links are the most expensive of the physical network. As it was showed in Figure 7 there are four independent connections to the NAP; hence if we needed to guarantee 60Gbps of international traffic we could reserve 20Gbps in every one of these links, because a single failure could only affect one of them. Therefore the efficiency in the usage of international connections could rise to 75% if the efficiency of IP/MPLS would be available. The protection mechanism of SDH (1+1) cannot exploit this degree of connectivity. To protect 60Gbps of traffic using SDH active/stand-by independent paths, we always need other 60Gbps of reserved capacity, so the efficiency of SDH it is limited to 50%. The improved efficiency of IP/MPLS to exploit the extra connectivity degree between local and international traffic explains by itself most of the savings.

## 6. Conclusion

Perhaps the most remarkable result of this work relies on exposing through a real-world application, how much more cost-efficient could be networks deployed using IP/MPLS, than those based on traditional protection schemes like SDH. This efficiency comes not only from the savings linked to the elimination of an intermediate layer, but also from the extra degrees of freedom available to route the traffic.

We presume that the application this work dealt with is not an exception, and the potential savings might replicate from one ISP to the other. The results for the examples of Section-5 and their later analysis justify the convenience to update network design models this work introduced, in order to follow the new technology trends and exploit their benefits.

Regarding on the metaheuristic, we presume that applying path-relinking could significantly increase the computational efficiency of the proposed GRASP, so this is the line of our immediate future work.