Gaps with respect to CPLEX by instance size.

## 1. Introduction

Districting, geographic demand zone segmentation, and customer clustering subjects, in their different varieties, are rather common and well studied problems within the literature of design and planning of logistic and supply chain networks, mainly focused on distribution/collection supply chain subsystems. Moreover, when a product distribution fleet must be defined and designed, customer-vehicle assignments appear as relevant decisions to be addressed, which can be indirectly considered as a customer clustering/segmentation method. Finally, other kinds of company operations might require some type of demand segmentation, such as territory sales balance as in Zoltners and Sinha (1983), among others.

These problems have been addressed with a wide variety of approaches, such as cluster analysis, that are a collection of statistical methods and graphic/computational tools (such as Geographic Information System, GIS). For instance, in Barreto et al. (2007) a clustering analysis was employed to solving a capacitated location-routing problem. GIS systems are commonly used in districting problems, such as in (Kalcsiscs et al. 2005) where the integration of districting models into Geographic Information Systems is discussed.

Within the field of operations research (focus of the present chapter), several techniques have been considered to deal with this type of problems. The main approaches are: vehicle routing problems (Crainic and Laporte, 1998; Laporte and Osman, 1995), continuous approximation of routing cost (Novaes and Graciolli, 1999, Novaes et al., 2000), and location-allocation modeling structures (Koskosidis and Powell, 1992; Miranda and Garrido, 2004a; Gonzalez-Ramirez et al., 2010-2011). In all these cases, the general framework requires defining a set of decision variables and a mathematical optimization model (e.g. inclusion of customers into demand zones), in which a single or a set of performance indicators should be optimized (e.g. expected routing costs, workload balance, fleet size, demand zone compactness, etc.). These previously mentioned tasks must be addressed observing several practical, logical or mathematical constraints, in order to assure that the optimal solution found is feasible to be implemented.

In addition to the previously mentioned modeling process, the need to solve these models must be addressed by building and implementing computational algorithms for searching optimal or near optimal solutions. Moreover, let us note the high complexity of this type of problems (usually considered NP-Hard), which means that the majority of the employed methods are heuristics and meta-heuristics (i.e. near optimal algorithms). Among the heuristic techniques are those based on evolutionary techniques (Novaes et al., 2000), construction and improvement algorithms (Gonzalez-Ramirez et al., 2010-2011; Koskosidis and Powell, 1992), and those based on mathematical programming (Miranda and Garrido, 2004a).

Based on the different problems and formulations observed, several criteria have been considered as performance indicator for the customer clustering, depending on the stage and level within Supply Chain Planning process addressed (i.e., strategic, tactical, or operational level). For example, when customer grouping is developed in order to reduce the size of the problem, or when the objective is designing preliminary areas for sale purposes or physical distribution, concepts such as compactness and closeness between customers appear as relevant criteria (in the objective function or as restrictions of the optimization model). On the other hand, considering the case of distribution/collection fleet sizing, transportation cost generated by different routes defined for each vehicle is considered as one of the most frequently usedcriterion to be optimized, along with vehicle capacity constraints.

Finally, given the wide variety of criteria observed to address customer clustering and geographic demand zoning problems, multi objective optimization appears as a rather common approach in the related literature, in which a set of Pareto efficient or Pareto optimal solutions must be found. A Pareto efficient solution is one for which it is not possible to reach any other feasible solution presenting a better performance for a single criterion, without observing a worse performance of any other criterion. According to the aforementioned concerns, this chapter presents a review and taxonomy of different criteria and modeling approaches employed to address different customer clustering problems, along with a review of solution approaches observed in the literature.

Section 2 presents a literature discussion focused on districting or customer clustering problems, including analyses of problem context, design criteria, and modeling and solution approaches. Particularly, this section includes a description of different cost modeling structure for vehicle routing based customer clustering problems. Section 3 summarizes two researches addressing districting/customer clustering problems, with different context, design criteria and solution approaches,consideringa similar Mixed Integer Programming (MIP) modeling structure, based on location-allocation decision variables. Section 4 proposes a general framework for addressing the problem of strategic supply chain network design problems, including integrated districting/customer clustering decisions, based on vehicle routing considerations within a more general inventory location modeling structure. Finally, Section 5 presents the main conclusion of this chapter, along with a discussion of further research, perspectives, and final remarks.

## 2. Literature review and discussion

### 2.1. Districting /customer clusteringapplications

Districting or territory design is a geographical problem that involves the partitioning of a region into smaller areas in order to optimize operations for a given criterion (Muyldermans et al., 2003). A district is designed to be used for a long period of time (at least one year) because of the big effort required for adapting the system, physically, operationally and managerially. For this reason, districting is considered a strategic or tactical decision with the aim of yielding good performance of the operations and system robustness to deal with small changes in a variety of environmental factors.

Districting emerges in different contexts such as politics, health care, sales territory planning, covering planning for emergency centers (fireman and police stations), public schools planning, logistics and routing applications. In each case, the districting process serves different purposes and can be economically motivated or have socio-demographic aims (Kalcsics et al., 2005). For instance, in political districting problems, the region under consideration is partitioned into smaller regions to which each candidate belongs. On the other hand, logistics districting is mainly associated with the routing activities of a company, having a strong impact in their performance (Van Oudheusden et al., 1998). The most common contexts of the logistics districting problem are: distribution/collection services, emergency services, medical, fire and police (Moonen, 2004).

Tavares-Pereira et al. (2007a) highlight that districting problems can be classified in terms of two factors: the number of criteria and the solution method (exact and non-exact algorithms). However, since Altman (1997) showed that districting problem belongs to the class of NP-Complete problems (a very complex class of problems which cannot be solved by polynomial algorithms), all the reviewed works propose heuristic methods. According to Grilli di Cortone et al. (1999), there are two main constructive techniques for districting problems: division and agglomeration. In the former, the service region is considered as a whole and divided into pieces. In the latter, a region is already split into small areas, which are aggregated to build the districts. Accordingly, in this chapter heuristic approaches are also presented.

#### 2.1.1. Contributions in logistics districting problems

One of the earliest works in the area of logistics districting is proposed by Keeney (1972). He addresses the problem of partitioning an area such that each district is assigned to an existing facility, considering a single greedy criterion. Hardy (1980) compares the method for vehicle routing proposed by Clarke and Wright (1964) with a methodology based on a districting approach. Wong et al. (1984) consider a problem very similar to districting known as the Vehicle Routing Using Fixed Delivery Areas (VRFDA), where a service area is divided into fixed sub areas in which the daily route may change from day to day. The authors propose a methodology in which total travel distance is minimized. A special case of the VRFDA is the Fixed Routes Problem (FRP) studied by Beasley (1984) in which the service region is divided into sub areas in which the route is fixed from day to day.

Daganzo (1984a) proposes an approximation method for the design of multiple-vehicle delivery zones, seeking tours of minimum total length. The objective of his work is to explore the impact that the zone shape has on the expected length of each route. Daganzo (1984b) presents a methodology in which the region is partitioned into zones of nearly rectangular shape elongated toward the source. In his work, the number of points is large compared to vehicle capacity. Newell and Daganzo (1986a) analyze the districting of a region in which the underlying network of roads is a dense ring-radial network. They propose an approximation method for the design of multiple-vehicle delivery tours in which the aim is to minimize total travel distance. Han and Daganzo (1986) investigate the design of delivery zones for distributing perishable freight without transshipment. Daganzo (1987a, 1987b) investigates and models distribution problems with time windows.

Langevin and Soumis (1989) study the problem of designing multiple vehicle delivery tours satisfying time constraints for the letter and parcel pick-up and delivery problem using a continuous approximation model. The authors propose a methodology that involves partitioning the region into approximately rectangular delivery zones that are arranged into concentric rings around the depot. Rosenfield et al. (1992) study the problem of planning service districts with a time constraint and derive analytical expressions to determine the optimal number of service districts for the U.S. postal system. Robusté et al. (1990) employ continuous approximations for fleet design vehicle routing problems. Novaes and Graciolli (1999) present a methodology to design multi-delivery tours associated with the servicing of an urban region of irregular shape, assuming a discrete grid-cell representation of the served region. In contrast, Novaes et al. (2000) present a methodology for solving the same problem but using continuous approach to represent the region. Both, Novaes and Graciolli (1999) and Novaes et al. (2000), assume a polar coordinate system (known as a ring-radial system), model capacity probabilistically based on chance constraint programming, and employ a continuous approximation to determine routing costs and times.

Muyldermans et al. (2002, 2003) address the problem of districting for salt spreading operations on roads. They present a graph based model, assuming that each district is served by a single facility. The objective is to minimize the deadheading distances, and the number of vehicles required. Miranda and Garrido (2004a), extending the work of Koskosidis and Powell (1992), propose a Location-Allocation modeling structure along with a Lagrangian relaxation based heuristic, in which a Hub and Spokes cost structure is considered as a performance indicator to be optimized. Their research addresses a fleet design problem within a supply chain network, with a set of fixed and known distribution centers. Each distribution center will be assigned to a set of vehicle routing zones, which is a set of problem decision variables. A greedy assignment non-capacitated criterion is considered for assigning fleet zones or clusters to existent distribution centers.

Haugland et al. (2005) consider the districting problem for vehicle routing problems with stochastic demands, along with a Tabu Search (TS) and a multistart heuristic to solve the problem. They address a two-stage stochastic problem with recourse that seeks to minimize the expected travel time for each district. Galvao et al. (2006) extend the model presented by Novaes et al. (2000) and present a special case of a Voronoi diagram. Tavares-Pereira et al. (2007a) consider the districting problem with multiple criteria. They propose a method to approximate the Pareto front based on an evolutionary algorithm with local search. Tavares-Pereira et al. (2009) propose metrics to compare partitions obtained in a districting configuration, specifically for the case of a connected, undirected, with a graph representation. Novaes et al. (2009) develop two continuous location-districting models applied to logistics problems combining a Voronoi diagram with an optimization algorithm.

González-Ramírez et al. (2010, 2011) analyze a logistics districting problem for package pick-up and delivery within a region, motivated by a real-world application. The region is divided into districts, each served by a single vehicle that departs from a central depot. The districting process aims to optimize two criteria: compactness and workload balance, but the problem is formulated as a single objective problem, with the weighted sum of the two criteria under consideration. The authors propose a heuristic solution approach combining elements of Grasp and Tabu Search.

#### 2.1.2. Contributions in other fields

Some applications of districting include politics. In this field we can mention the work of Hess et al. (1965), who present a Location-Allocation heuristic under population equality, compactness, and contiguity considerations. Garfinkel and Nemhauser (1970) present an exact algorithm to solve this problem under contiguity, compactness, and limited population deviation. Hojati (1996) proposes a three-stage approach and Mehrotra et al. (1998) propose a column generation approach based on branch and price. A review on political redistricting is presented by Williams (1995).

Regarding sales territory design, the first reviews are provided by Zoltners, (1979) and Zoltners and Sinha, (1983). Fleishman and Paraschis, (1988) study a sales territory alignment for a German company for consumer goods and develop a procedure based on a location-allocation. Drexl and Haase (1999) study the problem simultaneously with sales force sizing, salesman location, and sales resource allocation. More recently, a commercial territory design was introduced by Rios-Mercado and Fernandez (2009) that differs from traditional sales territory design in that rather than placing salesmen in territories the authors are interested in locating service centers.

Regarding school districting problems, Diamond and Wright (1987) consider the case where a limited number of schools are allowed. Church and Murray (1993) consider the problem of redesigning school districts to achieve racial balance. Elizondo et al. (1997) present a model to minimize the travel distances of the students. For a review of the most relevant work on this problem refer to Caro et al. (2004).

Within health care system applications, Blais et al. (2003) study a districting problem for a local community health clinic optimizing “visit personnel mobility” and “workload equilibrium”, both criteria combined into a single objective function. Baker et al. (1989) study the redistricting problem of primary response areas for county ambulance services. Regarding the police context, Bodily (1978) designs patrol sectors using multi attribute utility theory to include preferences of the interest groups. More recently, D'Amico et al. (2002) present a simulated annealing algorithm for the redistricting police command boundaries. For a more extensive review of the districting problem in the context of emergency sites, refer to Moonen (2004).

### 2.2. Design criteria issues

As can be observed in the previous section, there are several research works and applications addressing decisions of districting/customer clustering, routing and location issues, motivated and based on a wide variety of criteria for solution performance evaluation. According to Muyldermans (2002), objective functions for evaluating the performance of location and routing decisions are very intuitive, and usually strongly related to the optimization of economic criteria. For instance, facility location models typically deal with minimizing total or maximal distance from depots to customers, and vehicle routing problems normally aim to optimize the number of tours or vehicles, the total distance traveled, time spent on service and travel, violation of capacity an time constraints, etc. (some details about these performance indicators and modeling approaches are presented in sections 2.4, for routing problems, and 4.2, for location problems).

However, as stated by Muyldermans (2002), it seems to be much more difficult to specify measurements for achieving a good districting configuration, due partially to the fact that these measurements rely strongly on the districting context and practical concerns. For a districting problem, the most common requirement is the * workloadbalance*among districts(or population equality for some other applications), and geographic

*of the districts. Compactness definition may vary according to the districting context, but in general, it impliesdistricts to be as round or square as possible, avoiding elongated shape districts. Another metric that is commonly found in the literature is*compactness

*, which is related to the possibility of walkingfrom any point to any other location within the same district without leaving it, as if it were a single land parcel (Ricca, 2004). Another criterion is the*contiguity

*which looks for districts composed of customers as homogeneous as possible. Naturally, homogeneity may be defined according to the problem context and features. Others requirements in districting are*inner variance,

*, which means that a point should not partially allocated to different districts, and*integrity

*, which means that if one draws any closedcurve in a district, all points within theinner domain of the curve belongto the same district. This last constraint has been shown to be fully satisfied when the problem is modeled as a connected graph in(Ricca, 2004).Some other common requirements are to satisfy time or capacity constraints or a maximum budget for routing costs.*hole absence

### 2.3. Modeling approaches

We distinguish in the literature two main modeling approaches: Mixed Integer Programming (MIP) and Continuous Modeling (CM). Within MIP, districting/clustering decisions are usually modeled through binary variables, which model the inclusion of customers, demand points, or nodes, to each district. Some works using MIP modeling approaches are Hardy (1980), Haugland et al. (2005), Tavares-Pereira et al. (2007a), González-Ramírez et al. (2010, 2011), Koskosidis and Powel (1992), Miranda and Garrido (2004a), Marianov and Serra (2003) and Caro et al. (2004). On the other hand, CM assumes a continuous segmentation of the region, in which districts are defined by a specific shape, representing decision variables of the problem. For example, in CM category we can mention Newell and Daganzo (1986a), Langevin and Soumis (1989), Novaes et al. (2000), Galvao et al. (2006), Novaes et al. (2009), Novaes and Graciolli (1999). In addition, MIP and CM can be considered as Mathematical Programming (MP) approaches.

Finally, we define a third category refereed to as the “* Algorithmic Approach*(AA)”, which does not explicitly model the problem as a standard optimization problem (mono or multi criterion objective function, mathematical constraints, decision variables, etc.), but pays more attention to performance indicators (mono or multi criterion) and solution methodologies to obtain well performing districting configurations (usually consisting of heuristic approaches). Some examples are Keeney (1912), Deckro (1977), Wong et al. (1984), Muyldermans et al. (2002, 2003).

### 2.4. Vehicle routing modeling structures for districting/customer clustering

Focusing on vehicle routing based clustering, four main approaches can be found in the literature, which model routing costs and decisions explicitly, but based on different objectives, hierarchical levels, problem characteristics and assumptions: standard Vehicle Routing Problems (VRP), Hub and Spokes Structures (H&S), Continuous Approximation (CA), and more recently, the Probabilistic Traveling Salesman Problem (PTSP).

#### 2.4.1. Standard vehicle routing modeling structure

Vehicle Routing Problems (VRP), one of the most studied problems in Operation Research and Mathematical Programming, are based on the well known Traveling Salesman Problem (TSP). In the TSP, an optimal visit sequence for a known set of customers must be found, as indicated in Figure 1, minimizing total cost, distance, time, or any other related metric.In general, for the VRP under deterministic assumptions, a specific visit sequence for each vehicle considered is the standard output or decision modeled. In the VRP with time windows, the specific instant in which every single customer is visited is specified, instant that must observe minimum and maximum values. We refer to Crainic and Laporte (1998), Laporte and Osman (1995), Dantzig, Fulkerson and Johnson (1954), Hoffman and Wolfe (1985), for thorough reviews on the VRP and other related problems.

However, when a strategic perspective is considered, and moreover when assuming stochastic behavior in customer demand and appearance, this modeling structure diminishes in attractiveness and appropriateness, mainly due to:

Specific visit sequences are not required in the long run, and they are usually modified and re-optimized in the short run, according to customer requirements.

Specific visit sequences are not feasible and applicable in all days, due to the set of customer to be visited each day is not the same, and more over, not known in advance.

Routing costs provided by standard deterministic VRP models may not be a good estimate of expected costs in stochastic long-run scenarios.

#### 2.4.2. Hub &spokes cost structure

Recently, Miranda et al. (2009) analyze the inclusion of fleet design decisions and routing costs within a known inventory location model, based on a Hub & Spokes cost structure. These modeling structures for routing costs and decisions are previously developed in Miranda and Garrido (2004a), assuming a fixed and known distribution network configuration. These models propose the inclusion of an approximated cost estimation for fleet design and customer clustering (demand zoning) into previous Inventory-Location Problems, which address supply chain network design problems, neglecting specific visit sequences decisions as in the VRP and TSP.

The Hub & Spokes modeling structure, as shown in Figure 2, models two types of transportation costs: a direct cost from each warehouse to a hub or centroid of each customer cluster; and a second inner cluster cost, from each hub or centroid to each customer included in the respective cluster. Accordingly, hub selection and customer-hub assignments, define entirely the routing system and costs. Hub & Spokes cost structure and models are studied and analyzed by Marianov and Serra (2003), Sasaki et al. (1999), Mayer and Wagner (2002), Koskosidis and Powell (1992), and Campbell (1994), among several other works in the Hubs and Facility Location literature.

#### 2.4.3. Continuous approximation of routing costs

One of the main contributions in the last decades, within Logistics and Supply Chain Management, is the computation of the expected distance or cost of vehicle routing systems (similar to the TSPand its extensions), based on Continuous Approximation (CA). These computations are usually required previous to the design and operation of distributing systems. Daganzo (1984a, 1984b, 2005)propose the use of CA in order to develop simple closed mathematical expressions for TSP, and also for other logistic issues, presenting good empirical results.

Figure 3 indicates that routing cost, RC, is estimated by a continuous approximation * RCCA*(Routing Cost Continuous Approximation), in terms of the number of points or nodes (

*) and the dimension of the area involved (*N

*).*A

For example, in the most basic case, for a given compact and convex area * A*containing

*demand points or customers, the optimal TSP distance can be accurately approximated by*n

#### 2.4.4. Probabilistic traveling salesman problem

The Probabilistic Traveling Salesman Problem (PTSP), introduced by Jaillet (1985,1988) and reviewed later by Powell et al. (1995), is an extension of the TSP. In this problem each known existent customer * i*has a probability

p

_{i}to require a visit in a specific routing period. Accordingly, an optimal

*solution must be found in which all customers must be included (independent of their requirements),for which an*a-priori

*solution is defined for each possible scenario or realization, just skipping each absent customer, as shown in Figure 4. The optimal*a-posteriori

*solution is defined by the minimization of the expected costs considering all possible realizations, in which a weighted summation of distances is considered,where the occurrence probabilities of the realizations are the weights. Thus, the expected cost of any given*a-priori

*tour can be obtained estimating the cost for each possible realizations of the problem. For*a-priori

*nodes this has a complexity of O(*n

n

^{.}2

^{n}), which makes the simple process of computing the costs of a single

*tourprohibitive.*a-priori

However, the tour length can also be computed with a lower complexity. Without loss of generality, it is assumed that in the * a-priori*sequence(

*), the nodes are visited in an ascending numbering order, namely,*τ

*=1, 2, 3,…,*τ

*. Then, the expected value of the associated costs of this*n

*sequence can be computed with the followingexpresion:*a-priori

Where: E[* (*)] is the expected cost of an a-priori tour ; p

_{i}is the node i appearance probability; and q

_{i}is the probability of node i not appearing in the problem’s realization (1-p

_{i}). In the first summation the term multiplying d

_{ij}represents the probability of making a trip between i and j for any realization of the problem, where p

_{i}p

_{j}is the probability of nodes i and jbeing present, and is the probability of all nodes between i and j being absent in the realization. Similarly, in the second summation, the term that multiplies d

_{ij}is associated to the return trip made from the last visited node, i, to the node where the tour started, j. This calculation of E[L()] can be obtained with O(n

^{2}) complexity.

Given the high complexity of PTSP (NP-Complete Probabilistic Combinatorial Optimization Problem), it has been solved mainly by heuristic approaches, which usually are extensions of well known heuristics of the TSP, as presented in Bianchi and Campbell (2007), Bianchi et al. (2005), Lamas et al. (2007), Tang and Miller-Hooks (2005), and Branke and Guntsch (2004). However, Laporte et al. (1994) developed an exact algorithm to solve the PTSP, but it is applicable only for small instances.

## 3. Two cases for logistics districting and customer clustering

In addition to a review of different approaches that can be employed to address clustering problems, we present two different cases based on MIP approaches, assuming location-allocation decision variables, and addressing different practical concerns.

The first one, based on Gonzalez-Ramirez et al. (2010 and 2011), is focused on a general view of the districting problem, in which routing costs are not considered, and wheredistrict shapes and workload balancing issues are the main focusof the research. The number of districts is known and predetermined by a planner, and a known single depot is considered.

In contrast, in Miranda and Garrido (2004a), a similar modeling structure is proposed to design a distribution system, in which approximated routing costs and stochastic vehicle capacity constraints are modeled explicitly as design criteria. In this problem, the number of districts (equivalent to the fleet size) is a decision to be optimized. Finally, in this paper a set of pre-existent warehouses is considered, and a simple greedy criterion is consider to assign the clusters to the existent warehouses.

### 3.1. District design for aparcel and pickup problem

#### 3.1.1. Problem description and mathematical formulation

Consider a connected, undirected graph G(V,E) where V is the vertex set and E the edge set. The graph is generally not complete. We assume that all the edges e_{rs}=( v_{r}, v_{s}) have a positive length and represent a real road between adjacent points v_{r}and v_{s}. Distances between points are edge lengths for those points that are connected in the graph and shortest path distances for other pairs of points. A district is defined as a subset of the points. Each vertex may require either a pickup or a delivery. The aim of the districting procedure is to optimize two criteria: balance of the workload content among the districts and compactness of district shapes. The mathematical model proposed for this problem consists of a single objective model in which the weighted sum of both criteria is minimized.

Compactness is not defined precisely for all the districting problems in the literature and it is generally defined according to the application context. For this problem we define it as the distance between the two furthest apart points in a district and we proposed a minimax objective in which we attempt to obtain compact districts when the maximum compactness metric is minimized.

The workload content of a district is defined as the time required to perform all required pickups and deliveries plus the time needed to drive from the depot to the farthest point in the district. In order to balance the workload content among districts, we propose to minimize the maximum workload allocated to a district. We also attempt to obtain districts with balanced workload content by minimizing the dispersion of the workload assigned to each district, which is represented by the sum of the absolute value of the differences between the workload content of each district with respect to the average workload.

We propose a single objective mathematical formulation in which the weighted sum of the compactness metric and the maximum workload content assigned to a district is minimized, each of them normalized. The following notation is defined:

:Maximum number of pickups for each district.

:Maximum number of deliveries for each district.

J:District set, J={1,…,m}.

wp_{i}, wd_{i}:Number of pickups or deliveries respectively, requested bydemand point i,iV.

Stp, Std:Stopping time per pickup and delivery respectively in each node.

d_{ik}:Distance from point i to point k, i,kV,λ= Scale factor, 0≤λ≤1.

d_{0i}:Distance from the depot to the point i, iV.

Sp:Average speed.

Nz:Normalization parameter for the compactness metric.

Nw:Normalization parameter for the workload metric.

The following decision variables are defined:

X_{ij}:Binary variable. 1 indicates that customer iis assigned to district j.

And the following auxiliary variables are defined:

W:Continuous variable that represent the maximum workload content assigned to a district.

Z:Continuous variable that measure the compactness as the maximumtravel time between the furthest apart points of a district.

D_{j}:Continuous variable that takes the value of the traveling time from the depot to the farthest point of district j.

M_{j}:Continuous variable that takes the value of the traveling time between the two furthest apart points of district j.

The mathematical formulation model is as follows:

Equation GR–(1) is the objective function that minimizes a weighted average of the maximum workload and maximum compactness metrics. The objectives are normalized and the relative weighting is given by λ. Constraints GR–(2) guarantee that each demand point is assigned to only one district. Constraints GR–(3) and GR–(4) guarantee that each district has a maximum of pickups and deliveries, respectively. These constraints help to balance the number of pickups and deliveries allocated to a district so that the capacity of the vehicles is not exceeded. Constraints GR–(5) guarantee that M_{j}takes the value of the maximum travel time between the points assigned to each district in time units. Constraints GR–(6) guarantee that Z takes the maximum value over M_{j}.Constraints GR–(7) guarantee that D_{j}takes the value of the time from the depot to the farthest point of each district j. Constraints GR–(8) guarantee that W takes the maximum amount of workload of a district. Constraints GR–(9) are the binary requirements. Normalization parameters are estimated with respect to the optimal values of the compactness and workload content

#### 3.1.2. Solution approach

A multi-start heuristic algorithm that hybridizes GRASP with Tabu Search is proposed. It consists of two phases as it is typical of a GRASP approach: construction of a feasible initial solution and improvement by local search.

GRASP is a multi-start constructive metaheuristic proposed by Feo and Resende (1989) in which a single iteration consists of two phases: i) construction of an initial solution, and then ii) improvement of the solution by a local search approach. The construction phase includes greedy criteria, and it is randomized by the definition of a list with the best candidates, from which a candidate is selected randomly. Among all the solutions created, the best solution is reported as the final step of the algorithm. For a detailed description of GRASP, see Resende and Ribeiro (2002), in which the authors present details of different solution construction mechanisms, techniques to speed up the search, strategies for the implementation of memory, hybridization with other metaheuristics, and some applications.

Tabu Search (TS),proposed by Glover (1977), is a technique based on an adaptive memory, which enhances the performance of a basic local search procedure, aiding to escape from local optima by accepting even non-improving moves. To prevent cycling to previously visited solutions, last moves are labeled as “tabu-active” during a predetermined number of iterations. However, good quality solutions that are currently tabu active may be visited under some criteria that are referred to as “aspiration criteria”. Comprehensive tutorials on Tabu Search are found in Glover and Laguna (1993) and (1997).

In the combined GRASP-Tabu Search algorithm, a solution is considered to be feasible if all the points are allocated to a district and the capacity limits with respect to both services (pickups and deliveries) are respected for all the districts, as established by constraints RG-(2), RG-(3) and RG-(4). Among all the solutions created and improved, the best one is reported as the final solution for a given instance. In case of ties, the solution that provides the lowest dispersion value for the workload content among districts is selected.

A key concept is the adjacency among points and districts, which is a condition that should be updated when a point is assigned or moved to a district. This requirement is imposed as part of the procedure with the aim of constructing districts of compact shape. A point is considered adjacent to a district if there exists at least an edge connecting the point with one of the points already allocated to the district. Knowledge of the adjacency helps to avoid unnecessary evaluations that may result in long computational times and also enhance compactness of the solution constructed. Each time that a point is assigned to a district, adjacency among districts needs to be updated.

a) Construction phase

We propose two main steps to construct the initial feasible solution: Selection of a set of m seeds and allocation of points to the districts formed by a seed. Throughout the procedure, every time that a point is assigned to a district, the adjacency among points and districts is updated. To enhance compactness, points are attempted to be assigned to the closest seed as long as adjacency conditions are fulfilled. In a number of iterations, points are attempted to be assigned to an adjacent district respecting capacity constraints. Then, if required, the remaining points are assigned to an adjacent district even if capacity constraints are violated and a local search procedure is applied with the aim of achieving feasibility. If no feasible solution is constructed, then the solution is discarded.

b) Local search phase

This procedure implements a TS short term memory with an aspiration criterion that allows a tabu active move only if the resulting solution is better than the current best solution. The search space consists of the solutions yielded after transferring a point between adjacent districts. The best solution found is reported after a number of iterations. In the case of ties, the solution with the less dispersion on the workload content of the districts is selected.

The neighborhood structure is a greedy approach that consists of a quick evaluation of all the feasible moves between adjacent districts. The best solution is selected and the corresponding move of a point is performed during each of the iterations. Given that the best move may result in a worse solution than current solution, during the procedure a list of the three best solutions is maintained. At the end of the procedure a final attempt is made to improve these three best solutions in hopes of finding a better solution with a small amount of additional effort. The overall best solution found is reported as the final solution for the given initial feasible solution.

#### 3.1.3. Resultsand discussion

To test the performance of the proposed solution procedure, a set of instances was randomly generated. All problem instances were solved on a 2.00 GHz Pentium processor with 2 GB of RAM running under Windows XP. Five different instance sizes were defined, which are classified by the number of points and districts: 50_5, 200_10, 450_15, 1000_20 and 1500_30. The instances were solved by the proposed heuristic and CPLEX 11.0.

Points were uniformly generated over a plane, and a set of edges was generated by forming a spanning tree and adding additional edges. Euclidean distances were computed only for the points connected by an edge, and for the rest shortest paths are found using the Floyd-Marshall algorithm (Floyd, 1962). Stopping times were fixed at a realistic value for all the instances generated, considering that the service activities are performed within an urban region and that a pickup usually requires more time than a delivery. Three levels of average speed were considered, assuming that all the vehicles assigned to the districts travel on average at the same speed over the entire service region: 25, 30 and 35 kilometers/hour. Two levels of capacity are defined: tight and less restricted. The relative weighting factor was varied over three values: λ=0.25, 0.5 and 0.75. Three replicates were generated for each of the five instance sizes. Each instance was solved varying the three values of the relative weighting factor, the two levels of capacity limits and the three values of speed resulting in a total of 53323 = 270 instances. A limit time of 3600 seconds was set for the instances, both for CPLEX and the heuristic.

For each instance solved by CPLEX and the heuristic, we compute a gap between the best integer solution reported by CPLEX (that in some cases corresponds to the optimal) and the heuristic. Positive gaps are obtained when CPLEX finds a better solution than the heuristic. A negative gap indicates that the solution found by the heuristic is better than the best integer solution found by CPLEX under the limit time that was set.

Table 1 presents the results of the heuristic with respect to CPLEX solutions by instance size in which we can observe that CPLEX found at least an integer solution only for the instances of size 50_5 and 200_10. The maximum, average, and minimum gap is shown. We can observe that CPLEX did not find the optimal solution for the 200_10 instances, for which the heuristic found a better solution. For the smaller size instances of 50_5, CPLEX found the optimal solution for almost all the instances. We can also observe that on average, the heuristic yielded small gaps, with a maximum gap of less than 8.7%.

For further research, we propose the formulation of a stochastic version of the problem and the analysis of different demand scenarios. A model containing chance constraints could also be formulated. The problem may also be solved as a bi-objective optimization problem to find the efficient frontier instead of a single solution. We also propose to analyze different metrics of the workload content of a district (such as the closest point or a centroid line hauls metric) and their effects on the performance of the heuristics. We could also analyze different metrics to measure the compactness of the districts. Another extension is to propose a decomposition approach in which the sub problems consist of defining each of the districts and the master problem selects a set of districts so that all the points are allocated to a district. We may also try to find a better mathematical formulation for the problem that may allow CPLEX to solve larger instances. We could also try to find a tight lower bound for the procedure.

SIZE | Metric | Computational Time | GAP | |

CPLEX | Heuristic | |||

50_5 | Max | 3603.24 | 0.563 | 0.087 |

Average | 3200.69 | 0.414 | 0.0158 | |

Min | 166.485 | 0.296 | 0 | |

200_10 | Max | 3609.93 | 16.437 | -0.197 |

Average | 4604.938 | 14.005 | -0.402 | |

min | 0.641 | 12.172 | -0.555 | |

450_15 | max | 140.048 | ||

average | 116.517 | |||

Min | 95.877 | |||

1000_20 | Max | 1173.827 | ||

Average | 1032.099 | |||

Min | 924.799 | |||

1500_30 | Max | 3829.187 | ||

Average | 3733.512 | |||

Min | 3608.015 |

### 3.2. Fleet design and customer clustering based on a hub &spokescosts structure approximation

#### 3.2.1. Problem description and mathematical formulation

Miranda and Garrido (2004a) propose an approach for the design of a fleet for delivery or distribution, considering a known set of depots or distribution centers, and including stochastic constraints for vehicle capacity for each zone or district. Two additional distinctive elements compared to the model presented in section 3.1 are that the number of zones or vehicles is a decision variable, and an approximated routing cost function is included based on a hub and spokes modeling structure, as stated in Section 2.5.2.

The model notation is the following:

X_{j}:Binary variable. 1 indicates that customer j is a hub.

W_{jl}:Binary variable. 1 indicates that customer l is assigned to cluster j.

D_{j}:Mean daily demand for the whole cluster j (variable).

V_{j}:Variance of the daily demand for the whole cluster j (variable).

RCap:Vehicles capacity (parameter).

_{j}: Mean of the daily demand for customer j (parameter).

_{j}^{2}:Variance of the daily demand of customer j (parameter).

TC_{ij}:Transportation cost between the customer j and depot i (parameter).

RC_{jl}:Transportation cost between the customer l and the hub j (parameter).

FC:Fixed daily cost due to operate each vehicle (parameter).

Z_{1-}:Standard normal value, accumulating a probability of 1- (parameter).

Potentially, each customer can be chosen as a hub, then, CR_{jl}must be defined for each pair of customers.Furthermore, in this paper we assume each hub is assigned to its closest depot or warehouse, representing a debatable assumption, but it suggests an important opportunity for future research: it is possible to integrate this model into a facility location problem, in which the optimal assignment of hubs to depots will be solved by the model, considering some kind of capacity constraint.

Then the cost of choosing the customer j as a hub is defined as:

Thus, the optimization model can be formulated as follows:

Expression MG-(1) represents the total system cost, considering a cost structure based on a Hub & Spokes approximation. The first term considers the total fixed cost associated to each vehicle and the transportation cost between the depots and the hubs. Note that each cluster is assigned to the nearest depot (with respect to its hub). The second term represents the total transportation cots between hubs and customers, grouped into the respective clusters.

Constraints MG-(2) assure that each customer is assigned to a single cluster. Note that each customer j can also be assigned to itself, with zero transportation cost. Constraints MG-(3) state that if some customer was not chosen as a hub, it is not possible assign customers to him. Constraints MG-(4) represent the stochastic capacity constraints, which assure that the probability of violating the vehicle capacity for each cluster does not exceed . These constraints assume normality for demand of clusters. Finally, constraints MG-(5) assure the integrality of the variables X and Y.

#### 3.2.2. Solution approach

The solution approach proposed comprises two subroutines: A construction-improvement local search heuristic and a Lagrangian relaxation-based algorithm to compute upper bounds to errors for heuristic solutions provided by the first subroutine.

a) Construction-improvement local search heuristics

The construction-improvement heuristic, defined by Steps I to V, iteratively improves (based on Steps III, IV and V) an initial feasible solution (obtained through Step I and II), using a local search algorithm.

Step I: Selecting an initial set of hubs.

Step II : Greedy assignment of customers to initial hubs.

Step III: 2-Opt hubs update within each cluster.

Step IV: 1-Opt customers interchange (between each pair of clusters).

Step V: 2-Opt customers interchange (between each pair of clusters).

b) Lower bounds with lagrangian relaxation

This section describes a Lagrangian relaxation (LR) approach used to obtain a lower bound for the optimal value of the SMDCCP. The LR technique gives the optimal value of the dual problem, which sets a lower bound for the optimal value of the primal SMDCCP. Furthermore, the difference between the optimal value of the dual problem and the primal objective function (found through the heuristic stated in the last section), represents an upper bound for the duality-gap and heuristic solution error.

The LR implemented in this paper relies on the subgradient method to update and optimize dual penalty variables, ^{1}, ^{2}, and^{3} (see Crowder, 1976, Nozick, 2001, and Miranda and Garrido, 2004b, among others).

Next the relaxation method is described, in which the constraints MG-(2), MG-(5) and MG-(6) are relaxed, obtaining M sub-problems, one sub-problem for each customer j. If ^{1}, ^{2}, and^{3} are the vectors of the dual variables associated with each relaxed constraints, the dual-lagrangian function can be written as follows:

Clearly, the problem of minimizing expression (5), in terms of X, W, D, and V, for fixed values of ^{1}, ^{2}, and ^{3}, is equivalent to solve one sub-problem for each cluster j, given by:

This problem can be easily solved, as described in Miranda and Garrido (2004a), by a procedure very similar to basic applications of lagrangian relaxation to standard facility location problems, as shown in Daskin (1995) and Simchi-Levi et al. (2003). This procedure, for a set of known values of ^{1}, ^{2}, and ^{3}, relies on observation of weather benefits (or negative costs) related to expressions _{j}. Aforementioned benefits are previously computed observing constraints (4)-(ii),(4)-(iii),and (4)-(iv), assuming X_{j}=1

#### 3.2.3. Results and discussion

The procedures described in previous sections were applied to a numerical example, considering 20 depots, and 200 customers. The customers were located randomly in a square area with sides 1,000 km long. The depots were uniformly distributed over this area. The daily fixed cost, FC, was set to $16, $19.2, $22.4, $25.6, $29.8 and $32, while the transportation costs (TC_{ij}and RC_{jl}) were estimated based on a unitary cost of 8 cents/km. The customers mean demands were randomly simulated around 14 units, and the variances were generated considering a coefficient of variation close to 1. For the vehicle capacities we considered values of 150, 170, 190, 210, 230 and 250 units, while the level of service for capacity constraints was fixed at 85%, 90%,and 95% (1.036, 1.282 and 1.645). Thus, we consider 108 instances.

Figure 5 and Figure 6 show the evolution of the objective function obtained by the heuristic, and through Lagrangian relaxation (dual bound), in terms of the fixed cost FC, for capacity values of 150 and 250, respectively. Each figure shows these results for level of service values of 85% and 95%. Firstly, we observe that both functions vary in a reasonable way in terms of fixed cost, capacities, and level of service. Second, we observe that the dual bound is always lower than the objective function of the heuristic solutions, with an optimal objective value between the heuristic and dual bound values. It must be noted that the difference between these functions is the sum of the error of heuristic and the duality gap (the difference between primal and dual optimums).Thus a small difference between these functions indicates that the solutions found are nearly optimal.

In terms of heuristic quality, Figure 7shows a histogram of error upper bound for the 108 instances considered, obtaining an average of 2.76%.It is worth noting that in 65.4% of the cases, we obtain anerror upper bound lower than 3%. Finally, only for 6.37 % of cases the error upper bound was greater than 5%, and in all cases lower than 7.5%.

## 4. Integrative approaches for supply chain network design

### 4.1. Hierarchical discussion of supply chain network design

In this final section, we discuss the districtingor customer clustering problem in a wider context, focusing on an integrative approach for addressing strategic network design and planning problems within the scope of Supply Chain Management (SCM) and Logistics. In this context, districting or customer clustering problems are usually conceived based on fleet design and vehicle routing considerations. More specifically, customer clustering decisions consist of assignment of customers into routing zones or vehicle routes, where the number of vehicles, zones, or clusters might be considered as an additional required outcome of the problem. For the aforementioned reasons, MIP based methodologies (modeling and solution techniques) arise as very common and widely studied approaches, mainly based on VRP modeling structure.

In terms of existent problems and state of the art literature and methodologies, Logistics and SCM comprises several problems at different hierarchical levels of decision making. Some problems at the strategic long-run level are production capacity planning and supply chain network design. At a tactical level, the most relevant examples are fleet design problemsand production and inventory planning. Finally, the operational short-run level includes daily routing decisionsand daily ordering and inventory decisions. For a thorough review of hierarchical levels and problems in SCM see Miranda (2004), Miranda and Garrido (2004c), Garrido (2001), Simchi-Levi et al. (2003), Coyle et al. (2003), Ballou (1999), Mourits y Evers (1995) and Bradley and Arntzen (1999).

One of the main problems in Logistics and SCM is Distribution or Supply Chain Network Design (SCND). This problem consists of finding optimal sites to install plants, warehouses, and distribution centers, as well as assigning the customers to be served by these facilities, and finally how theses facilities are connected with each other. One likely objective for these networks is to serve customer demands for a set of products or commodities, minimizing system costs and maximizing, or observing, specific system service levels. Usually, customers are geographically distributed in wide areas, requiring significant efforts for distributing their products from immediate upstream facilities (distribution centers and warehouses), typically based on a complex vehicle routing systems.

The specific problem that must be modeled and solved strongly depends on several features of the real application. Some examples are: customer requirements and characteristics, logistic and technological product requirements, geographic issues, and operational and managerial insights of the involved firms, among others elements.

Although SCND, along with its decisions and costs, has been considered as strategic in Logistics and SCM, it strongly interacts with other tactical and operational problems such as inventory planning, fleet design, vehicle routing, warehouse design and management, etc. However, standard and traditional approaches to tackle SCND might consider only a sequential approach, in which tactical and operational decisions are only attended once strategic decisions have already been solved. For example, inventory planning and control are solved only assuming the pre-existent locations. The same happens with fleet design and routing decisions, which are addressed only for each existent distribution center or warehouse. Several published works focus on specific SCND problems considering only strategic, tactical and operational viewpoint.

This sequential approach is described by Figure 8, considering routing and inventory decisions in addition to SCND problem, and considering the three hierarchical levels: strategic, tactical, and operational. As suggested by the dotted lines, several interactions among the decisions involved are not modeled, in contrast to the continuous lines, whichrepresent standard interactions usually modeled by a sequential approach.

Being consistent with this viewpoint, this section focuses on an integrative approach including tactical routing and inventory decisions into the SCND modeling structure, as suggested in Figure9, where continuous lines represent interactions modeled by the proposed approach. Naturally, it is possible to consider, at least for future research, the inclusion of operational costs and decisions within the framework; however, including these is expected to provide less significant results compared to the present proposal. Therefore, operational modeling is not considered in the proposal.

Mainly assuming the sequential approach stated in Figure 8, SCND is one of the most studied problems in SCM. Related literature includes numerous reports addressing diverse aspects of the general problem, considering a wide range of degrees of interactionbetween strategic and tactical decisions.

At the strategic level, facility location theory is one of the most commonly used approaches. For a comprehensive review of Facility Location Problems (FLP), see Drezner and Hamacher (2002), Drezner (1995), Daskin (1995), and Simchi-Levi et al. (2005). Traditional FLP consider deterministic parameters, demands, constraints, and an objective function within a mixed-integer modeling structure. However, based on the traditional FLP framework, it is hard to model interactions with other tactical and operational issues of SCM, such as inventory controland fleet design problems. These potential interactions are shown in Figure 9, where inventory control and vehicle routing decisions (within tactical and operational levels) interact with the strategic SCND problem.

Accordingly, any integrative approaches for coping with strategic network design problems should incorporate VRP decision and costs into Facility Location based models.

### 4.2. Standard facility location modeling structure

As stated in previous sections, Supply Chain Network Design problems are traditionally tackled within facility location literature, assuming a strategic perspective in its modeling structure and costs. In this framework, main decisions are modeled using binary decisions variables for selecting facilities and assigning customers to these facilities.The objective function expressed in (6) represents a typical cost function to be minimized in a FLP.

In this expression, M is the set of customers to be served, each one having an expected demand d_{j}, in the specific considered planning horizon; N is the set of potential sites to install warehouses; F_{i}is the total fixed cost when installing a warehouse in site i; R_{i}is the transportation unit costs from a single existent plant to each warehouse i, and T_{ij}is thefull-truckload transportation cost from each warehouse i to each assigned customer j; X_{i}is the binary variable that models locating decision on each site i; and finally Y_{ij}is the binary variable that models assignment decisions between each customer j and each potential warehouse on site i. First term of (6) represents warehouse fixed setup cost, and the second expression represents the system transportation costs, including plant-warehouse and warehouse-customer flows. According to these definitions, the network is entirely defined by the binary decisions variables X and Y.

Depending on the specific problem assumptions and features, some standard constraints that can be found in well known FLP, based on the previous notation, as follows:

Equation (7) assures that each customer j is served by exactly one warehouse. Equation (8) assures that customers are assigned to previously installed warehouses (X_{i}= 1). Equation (9) assures that the expected demand assigned to each warehouse does not exceed the corresponding warehouse capacity, WC_{i}. Finally, constraints (10)enforce the integrality (0-1) of the decisions variables. The highly unrealistic nature of this basic modeling structure is evident due to the lack of modeling of other cost elements and more general cost structures, such as those related to warehouses inventory and fixed ordering costs.

### 4.3. Simultaneous modeling of facility location and inventory planning decisions

The need for modeling interactions between locations and inventory decisions, as introduced in previous sections, has generated several works, yielding simultaneous inventory-location models to address SCND. For example, Miranda and Garrido (2004b), Daskin et al. (2002), Shu et al. (2005), Shen et al. (2003), and Erlebacher and Meller (2000), present similar of inventory-location models, along with different solution approaches.

In general terms, in all the above inventory-location models, the two costs terms of the equation (11) are incorporated into the objective function (6): expected safety stock costs, corresponding to the first term in (11), and expected cyclic inventory costs, second term in (11), both per time unit (daily, monthly or yearly).

Expression (11) is based on the following notation:

LT_{i}: Deterministic lead time when ordering from the warehouse i (parameter).

D_{i}:Mean daily demand to be assigned to the warehouse i (dependent variable).

V_{i}: Variance of the daily demand assigned to the warehouse i (dependentvariable).

HC_{i}: Inventory holding cost at warehouse i ($/unit-time).

OC_{i}: Fixed ordering cost at warehouse i ($/order).

Q_{i}: Order size for each warehouse i (items / order).

1 – : Service level associated to warehousesafety stocks, service level thatcorresponds to the probability that demand during lead time does notexceed the reorder point.

This service level is strictly related to the safety stock at each warehouse, given by expression_{1-} is the value of the Standard Normal distribution, which accumulates a probability of 1 – .

Dependent variables D_{i}and V_{i}are strictly related to, and defined by, the mean and variance of the assigned customer demand. Accordingly, expressions (12) and (13) link the mean and variance of warehouse demandsto the mean and variance of demand for each customer j, d_{j}and v_{j}, assuming independency.It is worth noting that these sets of constraints link facility location decisions (X and Y) to inventory decisions and costs as stated in equation(11).

Ozsen (2004) and Ozsen et al. (2008) propose a deterministic 100% service level constraint for inventory capacity, which requires the inventory capacity be observed every ordering period. In this constraint, maximum inventory levels are defined as the reorder point (initial inventory level just prior to an order) in addition to the order quantity. Consequently, this maximum level is required to respect inventory capacity, ICap, as stated in in (14).

The peak inventory level considered in (14), (RP_{i}+ Q_{i}) corresponds to the maximum inventory level when “no demand has arisen during the lead time”. Consequently, this constraint might be considered extremely protective, because a no-demand situation is, in general, quite unlikely. Considering previous observation, and based on Chance Constraint Programming, Miranda (2004) and Miranda and Garrido (2006, 2008) propose an inventory capacity constraint based on a probabilistic service level, in which a minimum probability, 1-, is required for observing inventory capacity. It is assumed that the maximum inventory level is a stochastic variable, as a consequence of stochastic nature of demand during lead-time, SD(LT_{i}), as shown in equation (15).

Miranda, (2004) and Miranda and Garrido, (2008) showed that this constraint can be reformulated as a deterministic nonlinear constraint (which assures that the probabilistic constraint is satisfied) as follows:

Finally, Miranda and Garrido (2009) propose an iterative approach to optimize inventory service level, based on the integrative inventory location models previously described. For a deep review and analysis of Stochastic Programming methodologies, see Birge and Louveaux (1997).

### 4.4. Simultaneous modeling of inventory, location and routing decisions

One of the problems and remaining inconsistencies of the previously described inventory location modeling structure, which as inherited from standard FLP literature, is related to the routing costs and the modeling structure considered in SCND models. As suggested by expression (6), transportation costs from warehouses to customers are modeled as a direct shipment - full truckload strategy or approximation, ignoring routing costs and decisions, particularly in less-than truckload situations.

Quite earlier, Webb (1968), Christofides and Eilon (1969), and Elion et al. (1971) made an explicit discussion about the error of not consideringexplicitlythe routing costs when warehouses serve assigned customers in FLP. More recently, Salhi and Rand (1989) analyze and evaluate the effects of ignoring routing costs and decisions. Their work, based on a sequential approach, shows that an effectively local optimal facility location solution (consistent to the first strategic stage in Figure 8 and Figure 9) does not necessarily represent the optimal solution when exact routing costs are included. In addition, and in agreement with these results, several works have been focused on simultaneous modeling of location and routing decisions and costs, in order to achieve simultaneously optimal solutions. Laporte (1988), Perl and Daskin (1985), and Min et al. (1998), present reviews of several formulation for different Location Routing Problems (LRP), considering deterministic demandsand standard VRP formulations of routing costs between warehouses and customers. For examples of different LRP formulations, along with related heuristic and exact solution approaches, see Laporte, Nobert and Taillefer (1988), Laporte and Nobert (1981), Laporte et al. (1986), Prins et al. (2006), and Prins et al. (2007).

Finally, recent works in Location-Routing literature are:

Albareda-Sambola et al. (2007) and Albareda-Sambola (2004): A stochastic LRP is proposed. The stochastic nature considered is focused on customer appearance, similar to the Probabilistic Traveling salesman Problem, PTSP (discussed later in section II.2), as introduced by Jaillet (1985-1988). The remaining costs, decisions and assumptions, are based on the previously described deterministic LRP, without considering inventory control decisions and costs. Additionally, fleet design and related demand zoning decisions are not considered as outcomes.

Orman (2005): In this research a LRP formulation with inventory decisions is proposedin which customers display stochastic demand, but whose appearance is considered deterministic (in contrast to Albareda-Sambola et al., 2007). Consequently, routing costs are modeled based on a standard VRP modeling structure. Fleet design decisions are considered assuming deterministic warehouse and vehicle capacity constraints.

Shen and Qi (2007): The authors proposed a model that deals simultaneously with inventory control and routing costs within a FLP model, considering nonlinear routing and inventory costs and stochastic demands. The proposed model is based on a continuous approximation of routing costs for a multi-vehicle distribution system. The model does not consider the number of vehicles or routing zones as decision variables. Warehouse and vehicle capacity constraints are not considered in the formulation.

Miranda et al. (2009): The authors propose an integrative model for addressing Inventory, Location and Customer Clustering, based on a Hub & Spokes cost structure, as in Koskosidis and Powell (1992) and Miranda and Garrido (2004a). The customer clustering decisions are aimed at defining a preliminary solution for fleet design within SCNDP.

### 4.5. A General integrative approach for customer clustering within SCNDP

Based on the discussion about different routing modeling structures presented in Section 2.4, and similar to the model presented in Miranda et al. (2009), the present section proposes a general modeling framework to include routing costs and clustering decisions within existent models in the facility location and inventory location literature.

Considering the aforementioned inventory location modeling structure of Section 4.3, a preliminary objective function proposed for a simultaneous inventory location model with routing costs and decisions might be as follows:

In expression (17), W is a matrix of binary decision variables, where element W_{jl}indicates if customer l is assigned to cluster j, which also defines the customer clusters of the distribution network; M is a variable set of customer clusters to be assigned to installed warehouses; d_{j}is the mean demand for each customer cluster, which is actually a dependent variable defined by matrix W; and finally SRC(W) represents System Routing Costs as a function of the decision variable matrix W. All other elements (parameters and variables) are as previously defined in Sections 4.2 and 4.3.

One advantage of this formulation is that it allows considering a general modeling structure for routing costs and decisions. For instance, when modeling routing costs viaa continuous approximation, we should include the following constraint:

where C_{j}is the variable set of customers included on each cluster j, A_{j}is the area defined by assigned customers, and RC_{j}is the respective approximation routing cost of each cluster j, as a function of C_{j}and A_{j}. For example, following Shen and Qi (2007), an acceptable expression for routing costs within a cluster with multiple vehicles, RC_{j}, might be as follows:

In (19), q is the capacity of the vehicles employed; n is the number of visits in a year; _{l}is the expected yearly demand for each customer l; d_{jl}is the distance between the cluster hub j and the customer l, and is a scale factor depending on the considered metric. For example, when a Euclidean metric is used, =0.75.

Furthermore, in case of a Hub & Spokes cost structure, the following constraints should be included, assuming an additional variable H_{l}, which indicates if customer l is selected as a hub of a cluster:

Equation (20) computes total routing costs within all clusters, in accordancewith the Hub & Spokes structure, and constraints (21) assure that customers are assigned to cluster hubs that are effectively selected.Notice that the consideration of a variable set of clusters should be consistent with facility location decisions, X and Y. For example, it is possible to include a constraint toensure that each cluster hub is assigned to a single warehouse, as stated in (22).

## 5. Conclusions

In this chapter we present an updated literature review for works related to districting or customer clustering problems, and we classify the different contributions according to the modeling and solution approaches. Additionally, we illustrate a wide variety of applications, models, solution approaches and design criteria, focused on logistic operations and supply chain network planning. One of the main difficulties found in the literature are the definitions and measurements of objectives and criteria, for achieving good districting configurations, which in general are much more difficult to specify, compared to other problems within Supply Chain Management and Supply Chain Network Design problems.

For instance, compactness is a common metric in districting,which implies to designdistricts assquare or circular as possible. As an estimate, some examples of compactness metrics observed in the literature are the length of the minimum spanning tree formed within each district, or the maximum distance between two points that belong to the district, or based on a Hub & Spokes structure (line-haul and inner transportation costs). None of these metrics is exact and only approximates the desire shape of the district, in contrast to other logistic problems (e.g. vehicle routing or facility location problems), in which a usual metric consists of a sum of costs. Hence there is room for improvements for estimating the performance metrics of a districting configuration.

We conclude that the representation of an instance based on a graph topology is also a difficult task, because there are no clear and obvious criteria for setting the node adjacency or the set of edges. Finally, we observed that only few works present a solution approach based on mathematical programming techniques, and most of the proposed methodologies are based on heuristic approaches.

Two application examples of Mixed Integer Programmingmodels are presented, both of them based on a location-allocation modeling structure.

The first example consists of a pickup and delivery parcel logistics districting problem for which a hybrid metaheuristic is proposed. The aim of the procedure is to optimize two criteria: balance the workload content among the districts and obtain districts of compact shape. Workload content is defined as the sum of the required time to perform the service (either pickup or delivery a package), the required time to travel from the depot to the district (line-haul cost), and an estimated time of inner transportation costs within the district. Compactness metric is defined as the travel time between the furthest apart points within the district. The mathematical formulation presented is a weighted sum of both criterion, with the aim of minimize the maximum workload and compactness metrics of a district. The solution approach is based on a GRASP and Tabu Search procedure with two phases: construction and local search. Numerical results showed a good performance of the algorithm with low computational times.

It is suggested as an extension of this work to analyze different modeling structures, especially for the compactness metric, which is approximately estimated by the maximum distance between a pair of points within the district. Also, provided the stochastic nature of the problem it is desired to extend the work and consider different demand scenarios instead of using a single and representative day and provide a more robust configuration. More realistic issues of the geographical region should be also considered, such as geographical barriers and the street configuration of the city.

A second example is presented, in which customer clustering is addressed focused on fleet design. In this problem, vehicle capacity constraints are explicitly modeled based on chance constraint programming with stochastic demands. Solution approach is based on lagrangian relaxation with valid inequalities, along with a two-step local search heuristic. This work shows that vehicle routing issues can be effectively considered as criteria to address districting problems, in which vehicle capacity and routing cost can be considered as relevant performance measurements to design districts or customer clusters. Additionally, it illustrates the usefulness of decomposition methods based on mathematical programming and dual relaxation, to address districting/customer clustering models. In both cases, modeling and solution approaches, we highlight the consideration of stochastic demands. In this work, a known set of existent warehouses is considered, assuming a greedy cluster-warehouse assignment criterion, yielding a promising modeling approach to be integrated in a more general framework to address strategic network design problems.

For districting and customer clustering problems, we propose as a further research to explore solution methods based on mathematical programming such as relaxation and decomposition approaches based on Column Generation, Lagrangian Relaxation, Branch and Cut, among others. This is mainly motivated from that few contributions of this type have been observed in the literature. We also propose to explore different modeling structures and define clear metrics to compare different districting configurations over a defined operation horizon. For these matters, simulation techniques might be employed. We also propose to explore stochastic programming models and approaches, such as chance constraint programming, scenarios analysis, and two stages stochastic program with resource.

As a further challenge we propose a basic and general framework to analyze the inclusion of logistics districting and customer clustering within strategic Supply Chain Network Design problems, integrating this type of decisions into some other strategic problems such as facility location. This framework should be based on vehicle routing consideration, along with the modeling of inventory planning and control considerations. This research might be useful to analyze the impacts of customer clustering and districting decision into strategic Supply Chain Network Design problems.