Area optimization by MSA for 2D packing

## 1. Introduction

The optimization techniques for integrated circuit (IC) layout design are important.Generally speaking, the basic process of modern hardware engineering includes designing, manufacturing and testing. IC layoutis an inevitable stage of designing before manufacturing. There are many applications which are directly related with layout optimization in practice, such asfloor planfor very-large-scale integration (VLSI) design, placementfor printed circuit board (PCB) design,packing for logistics management, and so on. In this research, we mainly focus on the optimization for three layout problems, which are 2D packing, 3D packing and 2D placement. The 2D/3D packing is to position different modules into a fixed shape, normally rectangular one, with area or volume minimization. The placement can be regarded as the packing problem with interconnect optimization. Since a general placement problem is NP-hard, there are no practical exact algorithms so far to be sure to find optimal solutions. As an alternative to get the optima, heuristics [1-6] are typically used to find near optimal solutionswithin a given runtime.

As product size keeps shrinking, product lifecycle keeps shortening and product complexity goes up, more electronic components will be integrated into a smaller IC chip or PCB with higher density and shorter time to market. At the same time, multi-objective optimization is common for IC/PCB layout in real product design, so another difficulty is the trade-off between conflicting objectives, such as low power and high performance. Pareto improvement for multiple objectives is one of the biggest challenges we have to face nowadays. The layout problem becomes much harder to find near-optimal or even acceptable solutions with high requirements. In orderto improve the best cases and mitigate the worst cases of IC/PCBlayout, it becomes increasingly criticaland urgent to improve the quality of solution and reduce runtime.

Simulated annealing based algorithm with a good representationfor2D/3D packing is one of the most popular ways to improve the quality of solution.On the one hand, many researches explored different representations [7-12], such as bounded-slice-line grid, sequence pair, FAST sequence pair, Q-sequence, selectedsequence pair, etc. In order to code and decode 3D-packing problem, sequence pair for 2D packing is extended to sequence triple and sequence quintuple, and it has been proved that sequence triple could represent the topology of the tractable 3D packing and there are at least one sequence quintuple which can be decoded to a topology as an optimal packing for volume minimization. But the effectiveness to improve solution quality and reduce runtime is quite limiteddue to huge solution space and complex solution distribution, even if a very good representation is used. The experimental results within a short runtime are still far from near-optimal solutions in real implementation to solve the packing problem.So it is the right time to explore new algorithms in order to solve 2D/3D problem more effectively.

There are many significant shortcomings of traditional heuristics for IC layout optimization. Let us take simulated annealing (SA) and genetic algorithm (GA) as an example. For SA, firstly, some slight modifications of solution are repeated to get a good convergence. Therefore, the global search is inefficient in general. It is disadvantageous to solve the problem with huge solution space, such as VLSI design. Secondly, SA does not use the past experience, including past good solutions and past moves, and it is a big informational waste. To speed up SA, some researchers [4] proposed two-stage SA for VLSI design. But the search speed is still quite slow, and it is not seriously considered to avoid or reduce informational waste. For GA, it evaluates too many candidates in order to get next generations. The evaluation takes too much runtime. Besides GA selects the next generation according to a ranking function, which is not always necessary but takes much time. So it is possible to improve the solution quality or reduce runtime if we can overcome the mentioned shortcomings.

In this research, a simulated annealing based approach [13-14], named mixed simulated annealing (MSA), is proposed to improve solution quality and reduce runtimeby overcomingthe shortcomings of inefficient global search and informational waste. Inmixed simulated annealing, a special crossover operator is designed to use a part of information from past good solutions and get higher improving efficiency, and the solutions gotten by the crossover are much better than random solutions.To evaluate the effectiveness and the reliability of the proposedmixed simulated annealing, we apply it to three mentioned optimization problems, i.e. 2D packing, 2D placement and 3D packing, and get considerable improvement for all three problems. The experimental results show the runtime, the packing ratio of area for 2D packing, the packing ratio of volume for 3D packing and two more objectives (low power and short maximal delay) for 2D placement are improved considerably by using MCNC, ami49_X and ami98_3D benchmarks. For example, the runtime of mixed simulated annealing with sequence quintuple representation is up to 4 times faster than that of 2-stage SA with the same representation, and the packing ratio of volume is improved by up to 12% within 100s runtime.

## 2. Simulated annealing for integrated circuit layout

Based onthe theory of statistical mechanics and the analogy between solid annealing and optimization problem, S. Kirkpatrick, C. D. Gelatt and M. P. Vecchi [1] proposed simulated annealing algorithm in 1983. The annealing is to heat up a solid with a very high temperature and then to cool it down slowly until it reaches or approaches its minimum energy state. Each state of solid represents a feasible solution of problem. The energy of the state is the value of cost function to evaluate the solution. The state with the lowest energy corresponds to the optimal solution with the best value of cost function. SA is a stochastic algorithm with iterative improvement. Each iterative step consists of changing current solution to a new solution, named a move to neighbourhood. The acceptance probability of new solutions depends on the current temperature, which is scheduled from the highest temperature to the lowest temperature. An important point we have to mention here is that, if the physical process is to cool the solid down very quickly, it is known as quenching, instead of annealing. The difference between normal simulated annealing and simulated quenching is the parameter setting of temperature scheduling.

In detail, let *S* be the solution space with neighbourhood structure. For any solution S belongs to *S*, we define the cost function *C*(S), i.e. the total cost function (*C*_{t}) for multi-objective placement problem. A non-optimal solution S is defined by local optimum, if it can not reach better solution by moving to any neighbouring solution S′. That is to day, for any neighbour solution (S′) of local optimal solution (S), the inequality *C*(S) < *C*(S′) is always satisfied. The depth *D*(S) of local optimal solution is defined by the maximum value such that *D*(S) + *C*(S) > *C*(S′). The maximum depth of local optimal solution in *S* is denoted by d(*S*). Let *X*(*T*_{i}) be a variable of the cost function C(S) at each temperature *T*_{i}, where *i* is 0, 1, 2, …. Let *C*_{opt}be the minimum cost function. According to [2], the equality lim_{i→∞}*X*(*T*_{i}) = *C*_{opt} is satisfied with the following conditions: (1) The solution space *S* is finite and irreducible; (2) There exists an equilibrium distribution for the transition probability matrix; (3) *T*_{i} ≥ *T*_{i+1}and *T*_{i}> 0 for all *i*; (4) lim_{i→∞}*T*_{i} = 0; (5) *∑*_{i:∈(0, ∞)} [exp(-d(*S*)/*T*_{i})] = ∞.

In real implementation with a given finite runtime, we are using a fast geometric simulated quenching scheduling (*T*_{k+1} *= qT*_{k}*,0<q<1*) with repeated inside loop (*p*times)to enhance the efficiency of standard SA [3] as the following equation.

where *i* is the iterative step, *T*_{i} is the variable temperature at the *i*^{th} step, *T*_{0} is the initial temperature when *i* = 0, *p* is the inside loop number and *q* the temperature coefficient near but less than 1.

As shown in Figure 1, it is a typical flow chart of SA, which is used for layout optimization in this research. The initial solution is randomly produced or simply follows past layout designs. The temperature scheduling is used to change the current temperature (*T*). The parameters of the temperature scheduling include the starting temperature *T*_{0}, the ending temperature *T*_{e}, a temperature coefficient and an inside loop number. One of moving methods is selected with given probabilities, for example, the same probability for each moving method in real experiment (near 33% in the case of three moving methods). A new solution is tried by using the current selected moving method. The new solution is evaluated by a cost function (*C*) and compared with the old one. The new solution is accepted with a calculated acceptance probability *P* = exp[*-∆C/T*], which depends on the difference of cost function(*∆C*) and the current temperature (*T*). The probability *P* is between 0 and 1. The temperature coefficient between 0 and 1 is set to control the speed of temperature reduction. The inside loop number is set to control the repeated moves for each *T*. If the new solution is improved (*∆C < 0*), then *P* = 1, and the best recorder will be implemented: If the new solution is better than the current best, the best record will be replaced. If rejected, the current solution will go back to the old one and continues the next temperature scheduling until reaching the lowest temperature Te. The output is the latest best record.The real implementation of SA algorithm depends on four basic definitions: (1) solution representation, (2) moving methods, (3) cost function, (4) temperature scheduling.

For the solution representation, 2D/3D topology for IC layout is defined by the orthogonal coordinate system, as shown in Figure 2, and is representedby sequence pair for 2D general cases, sequence triple for 3D simple cases or sequence quintuple for 3D general cases. Each layout in 3D general cases is regarded as a set of the relations of relative location between boxes, i.e. “Top-Bottom”, “North-South” and “West-East” (TB-, NS- and WE-) relations. The coding and the decoding are based on TB-, NS- and WE- relation corresponding to the order of modules. In Figure 2, box *m*_{2} is on the west of box *m*_{3}, i.e. WE-relation, since the x-coordinate of any part of box *m*_{2} is always smaller than or equal to that of any part of box*m*_{3}. Similarly box *m*_{1} is on the north of box*m*_{3}, i.e. NS-relation,and box *m*_{2} is on the top of box*m*_{1}, i.e. TB-relation. The 3D packing can be represented by *Γ*^{1}(*m*_{2}*,m*_{1}*,m*_{3}), *Γ*^{2}(*m*_{1}*,m*_{3}*,m*_{2}),and*Γ*^{3}(*m*_{1}*,m*_{2}*,m*_{3})according to the coding rule. The detail of representation will be introduced in section 5.The solution space of all mentioned representation is finite, instead of infinite solution space of original layout problem. All solutions decoding by the mentioned representations are feasible.

For the moving methods, let us take 2D placement as example. Three basic moving methods with small changes are designed to change the current solution by using the sequence-pair representation. The “rotation” changes the orientation of a module. The “exchange” exchanges the order of two modules in all sequences. The “move” changes the order of a module in one of sequences.The detail of each moving method will be discussed in section 6 and section 7.

For the cost function in the case of 2D placement, the total value (*C*_{t}) includes the dynamic power function (*C*_{p}), the maximal delay function (*C*_{d}) and the bounding area function (*C*_{a}). The estimation for each cost function will be discussed in section 8.

For the temperature scheduling, the starting temperature *T*_{0}, the ending temperature *T*_{e}, a temperature coefficient and an inside loop number are set according to the size of module number and the requirement of solution quality. As a reference, a set of parameters in our experiment is set as follows: *T*_{0} = 100000, *T*_{e} = 10, Inside loop number *p*= 500, Temperature coefficient *q*= 0.98.

## 3. How to improve traditional simulated annealing

There are at least two shortcomings which impact the search speed of traditional SA. (1) Inefficient global search: In order to assure a final convergence effectively, the moving methods with relatively small changes should be used, so the global search within a short runtime is quite limited. Even using higher temperature, it is still slow to explore the huge search space of layout problem. (2) Informational waste: It does not use the information of past experience, including past solutions and past moves. It is quite possible to get a very good configuration of past solutions at the beginning but to lose it at last.

First of all, to overcome the shortcoming of inefficient global search, a two-stage algorithm is considered as follows. The first stage is named rough search, and the second stage is named focusing search. The rough search tends to big changes, such as crossover from genetic algorithm, to improve global search ability, while the focusing search tends to small changes, such as exchange, move and rotation, to get final convergence and reach better near-optimal solution.

Secondly, to overcome the shortcoming ofinformational waste, a special crossover operator from genetic algorithm, which reuses the information of past solutions, is considered. Comparing with random operator, the crossover operator has a search direction, which is based on the configuration from past good solutions, by use a part of configuration of the current best to reduce the informational waste. Besides, a guide with the probabilities to select running method adaptively according to the short-term improving speed is also considered.

In real implementation of 2-stage SA, the temperature scheduling of the second stage is same withthat of the first stageusing a geometric scheduling (*T*_{k+1} *= qT*_{k}*,0<q<1*) with repeated inside loop (*p*times) as the equation (1). The only differenceis the parameter setting. In the second stage, *T'*_{i} (instead of *T*_{i} in the first stage) is the variable temperature at the *i*^{th} step in the second stage, *T'*_{0} (instead of *T*_{0}) is the initial temperature of the second stage, *p'* (instead of *p*) is the inside loop number of the second stage and *q'* (instead of *q*) the temperature coefficient of the second stage. The detailed temperature scheduling setting depends on the requirement of runtime. As a reference, two sets of the parameters are set separately as follows: *T*_{0}= 100000, *T*_{e}= 100, *p*= 1000, *q*= 0.98, *T'*_{0}= 1000, *T'*_{e} = 1, *p'*= 1000, *q'*= 0.98. For different benchmarks, the inside loop number can be increased from 1000 to 2000, 5000, etc. Also the temperature coefficient can be closer to 1, such as 0.99, 0.995, etc. The parameter setting with a given runtime is adjusted by selecting the initial temperature, the final temperature, the temperature coefficient and the inside loop number.

## 4. Mixed simulated annealing

By overcoming the mentioned shortcoming, we proposed a mixed simulated annealing (MSA) to speed up traditional simulated annealing (SA) and 2-stage SA. The basic idea is to improve the global search ability and to speed up the search process by a special crossover operator, which uses the information of past solutions. Just like SA and 2-stage SA, MSA is an iterative improvement method and a stochastic algorithm. The main difference between 2-stage SA and MSA is the special crossover operator to use a part of configuration of the current best and to reduce the informational waste. Although we can get a rough solution by producing solutions randomly,it is with low improving efficiency. The proposed crossover operator has a search direction, which is based on the configuration from past good solutions, to get high improving efficiency.

The intuitive comparison shows several intuitive advantage of MSA comparing with traditional SA and 2-stage SA. For global search ability, 2-stage SA is better than traditional SA due to big changes in the first stage, and MSA is even better than 2-stage SA due to the crossover operator and even bigger changes in the first stage. Traditional SA and 2-stage SA do not use past experience, while MSA is using past good solutions by using the crossover operator.

## 5. Application to integrated circuit layout optimization

The detailed IC design process includes system specification, architectural design, functional design, logic design, circuit design, layout design and verification. The layout is near the last stage of IC design, and it is a critical stage of electronic product development. As one of the key steps of IC layout, the placement has big impact on the overall quality of IC chip.

### 5.1. Problem definition

Let us start with the formulation of 3D placement. Let *M =* {*m*_{1}, *m*_{2},*..**.*, *m*_{n}} denote the modules or blocks to be placed, where *n* is the number of modules. Each *m*_{i}, where *1*≤*i*≤*n*, has height *h*_{i}, length *l*_{i} and width *w*_{i}. The packing volume is defined by the minimum bounding rectangular parallelepiped including all modules. For the placement, we need to optimize the interconnect networks. Let *N =* {*n*_{1}*, n*_{2}*, …, n*_{l}} be the set of interconnect nets between modules, where *l* is total net number. Let *len*_{i} denote the estimated wire length of each net *n*_{i}, *1*≤*i*≤*l*. Let *P*_{i}denote the estimated dynamic power, i.e. the interconnect power of net *n*_{i}. Let (*x*_{i}*, y*_{i}*, z*_{i}*, r*_{x-i}*, r*_{y-i}*, r*_{z-i})be the location and rotation on 3D orthogonal coordinate system for each module *m*_{i}, *1*≤*i*≤*k*, where (*x*_{i}*, y*_{i}*, z*_{i}) means the coordinates of the below-rear-left corner of module *m*_{i}, and (*r*_{x-i}*, r*_{y-i}*, r*_{z-i}) denotes the rotation (0, 1) of *m*_{i} on yz-, zx- and xy- plane. *r*_{z-i} =1 is the normal state of modules, while*r*_{z-i} =0 is rotated by 90 degree.

In short, The input is a set of modules *M =* {*m*_{1}*, m*_{2}*,...*} with height, length and width {(*h*_{1}*, l*_{1,}*w*_{1})*,* (*h*_{2}*, l*_{2,} *w*_{2})*,...*} and a net list *N =* {*n*_{1}*, n*_{2}*,.*..}. The constraint is no overlap between *m*_{i}and *m*_{j}, where *i*≠*j*. The output is a set of location and rotation for each module{(*x*_{1}*, y*_{1}*, z*_{1}*, r*_{x-1}*, r*_{y-1}*, r*_{z-1})*,* (*x*_{2}*, y*_{2}*, z*_{2}*, r*_{x-2}*, r*_{y-2}*, r*_{z-2})*,...*} such that:(1) Minimize the power consumption; (2) Minimize the maximal delay; (3) Minimize the volume of bounding box.

The 3D-packing problem is a special case of 3D placement with no consideration of power and delay. The 2D placement problem is regarded as a special case of 3D placement with z=0, as shown in Figure 3. The 2D-packing problem is a special case of 3D packing with z=0. All of the mentioned three problems are formulated so far.

### 5.2. Problem representation

The original packing or placement is with infinite solution space. The coding and decoding method is generally needed to connect the problem and its representation. The solution space of a good representation should be finite. And the solutions after a good representation should be feasible and be better to include at least one optimal solution. Sequence quintuple can be used to represent a general 3D-packing topology, but the solution space of sequence quintuple is quite large. Furthermore, sequence quintuple representation is simplified to sequence triple representation, which can be decoded to a relatively simple 3D topology. In the case of 2D-packing topology, sequence pair, which can be simplified fromsequence triple, is used to represent a general 2D packing in this research.As an example, the coding from Figure 3 to Figure 6 can be gotten by using the coding-decodingtransition methodin Figure 4 and Figure 5, which are based onNorth-South and West-East relation corresponding to the order of modules in sequence pair as follows.

In order to get a positive sequence Γ+, each West-South corner of module connects to the West-South corner of the whole layout, and each East-North corner of module connects to the East-North corner of the whole layout without any intersection as shown in Figure 4. We can get a sequence (*m*_{3},*m*_{2},*m*_{4},*m*_{1},*m*_{5}) corresponding to (0,1,2,3,4)from left side (i.e. two red points in Figure 4) to right side (i.e. two blue points in Figure 4). The positive sequence Γ+ is changed from the West-North corner to the East-South corner, i.e. from 0 to *n*-1 in Γ+, where *n* is the total number of modules. By using this coding method, we can get the whole sequence Γ+ as shown in Figure 5, which isΓ+ (*m*_{3},*m*_{2},*m*_{4},*m*_{1},*m*_{5}). Similarly, in order to get Γ-, each West-North corner of module connects to the West-North corner of the whole layout, and eachEast-Southcorner of module connects to the East-South corner of the whole layout without any intersection.As shown in Figure 5, we can get (*m*_{1},*m*_{2}, *m*_{5},*m*_{3},*m*_{4}) as Γ-using the same method to get Γ+. The negative sequence Γ- is changed from the West-South corner to the East-North corner, i.e. from 0 to *n*-1 in Γ-. So far, we get the coding of a general 2D-packing topology based on North-South and West-East relation corresponding to the order of modules in sequence pair, as shown in Figure 6.

Generally, a 3D-packing topology is defined by the orthogonal coordinate system (*x, y, z*) in sequence triple and sequence quintuple representations. It is regarded as a set of the relations of relative location between boxes, i.e. “Top-Bottom”, “North-South” and “West-East” (TB-, NS- and WE-) relations. Let (*m*_{i}T *m*_{j}) denote that *m*_{i} is on the top of *m*_{j}. Similarly, (*m*_{i}N *m*_{j}) and (*m*_{i}W *m*_{j}) denote NS- and WE-relations. The rule of symmetry to be followed is that (*m*_{i}T *m*_{j}) is the same relation as (*m*_{j} B *m*_{i}). That is to say, the topology should be reversely decoded if the order of labeling is reversed.

We define the notation ofsequence pair, sequence triple and sequence quintuple as follows. Let (*Γ*^{ i}[0], *Γ*^{i} [1],..., *Γ*^{i} [*n*-1]) be the components of *Γ*^{ i}. Let *F*^{i}(*m*_{j})be the order of *m*_{j} in sequences *Γ*^{ i}. For example, if *Γ*^{ i}[*l*] is *m*_{j}, then *F*^{i}(*m*_{j}) *= l*. So the order of *m*_{j} can be represented by (*F*^{1}(*m*_{j}), *F*^{2}(*m*_{j}),...). In general, let *A+B* be the sequence which is the concatenation of *A* and *B*, and *A-B* be the sequence obtained from *A* by removing all the elements in *B*, where *A* and *B* are sequences. Let us denote *A*[*i*, *j*], where *i*<*j*, as the sequence (*A*[*i*], *A*[*i*+1],..., *A*[*j*]), where *A* = (*A*[0], *A*[1],..., *A*[*n*-1]).

In case ofsequence pair, the two sequences generate a finite solution space which includes at least one optimal solution of 2D packing for area optimization by decoding. Sequence pair defines (*m*_{i}W *m*_{j}) when

*F*^{1}(*m*_{i}) <*F*^{1}(*m*_{j}) and *F*^{2}(*m*_{i}) <*F*^{2}(*m*_{j})

It defines (*m*_{i}N *m*_{j}) when

*F*^{1}(*m*_{i}) <*F*^{1}(*m*_{j}) and *F*^{2}(*m*_{i}) >*F*^{2}(*m*_{j})

For a given packing with *n* modules, the solution space is (*n!*)^{2}. If the rotation of the module is not fixed, then the solution space will increase to (*n!*)^{2}*2*^{n}.

In case ofsequence triple, it consists of three independent sequences *Γ*^{ i}*,* where 1≤*i*≤3. The coding and the decoding are based on TB-, NS- and WE- relation corresponding to the order of modules. sequence triple defines (*m*_{i}W *m*_{j}) when

*F*^{2}(*m*_{i}) >*F*^{2}(*m*_{j}) and *F*^{3}(*m*_{i}) <*F*^{3}(*m*_{j}).

It defines (*m*_{i}N *m*_{j}) when

*F*^{1}(*m*_{i}) <*F*^{1}(*m*_{j}), *F*^{2}(*m*_{i}) <*F*^{2}(*m*_{j}) and *F*^{3}(*m*_{i}) <*F*^{3}(*m*_{j})

It defines (*m*_{i}T *m*_{j}) when

*F*^{1}(*m*_{i}) <*F*^{1}(*m*_{j}), *F*^{2}(*m*_{i}) >*F*^{2}(*m*_{j}) and *F*^{3}(*m*_{i}) >*F*^{3}(*m*_{j})

For a given packing with *n* modules, the solution space is (*n!*)^{3}. If the rotation of the module is not fixed, then the solution space will increase to (*n!*)^{3}*2*^{3n}.

However, sequence triple does not cover all kinds of topology of 3D packing. As shown in Figure 7, (*m*_{4}N *m*_{1}) and (*m*_{1}N *m*_{6}) lead to (*m*_{4}N *m*_{6}). At the same time, (*m*_{6}W *m*_{2}) and (*m*_{2}W *m*_{4}) lead to (*m*_{4}W *m*_{6}). The pair (*m*_{4}, *m*_{6})is conflicting with the rule of uniqueness, i.e. each pair of modules should be assigned with a unique topology. That meansthe packing can not be represented bysequence triple. As a result, the topology with the minimum volume might not be covered bysequence triple.

In case of sequence quintuple, it consists of five sequences *Γ*^{ i}*,* where 1≤*i*≤5. Sequence quintuple generates a finite solution space which includes at least one optimal solution of 3D packing for volume optimization by decoding. sequence quintuple defines (*m*_{i}W *m*_{j}) when

*F*^{1}(*m*_{i}) <*F*^{1}(*m*_{j}) and *F*^{2}(*m*_{i}) <*F*^{2}(*m*_{j})

It defines (*m*_{i}N *m*_{j}) when

*F*^{3}(*m*_{i}) <*F*^{3}(*m*_{j}) and *F*^{4}(*m*_{i}) <*F*^{4}(*m*_{j})

It defines (*m*_{i}T *m*_{j}) when

F^{5}(m_{i}) < F^{5}(m_{j})

where *m*_{i} and *m*_{j} is overlapping in the projected xy-plane after WE- and NS- decoding. For a given packing with *n* modules, the solution space is (*n!*)^{5}. If the rotation of the module is not fixed, then the solution space will increase to (*n!*)^{5}*2*^{3n}.

## 6. Moving methods for SA, 2-stage SA and MSA

According to section 5.2, we design the following moving methods.First of all, three basic moving methods,which are named by rotation, exchange and move, are used in the focusing search. Based on the basic methods, the group rotation and the group exchange are also used as two of moving methods in the rough search. They are repeatedly used the rotation and exchange operator with a given number. The groups are randomly selected modules for rotation or pairs of modules for exchange.

In detail, the rotation changes the orientation of a module. When a rotation is applied to module *m*_{i}, *r*_{i} is changed to 1 *- r*_{i}. As an example shown in Figure 8, if a rotation is applied to module*m*_{4}, *r*_{4} is changed to 1 –*r*_{4}.With respect to 3D packing, *r*_{i} is randomly selected from *r*_{x-i}*, r*_{y-i}*,* and *r*_{z-i}.

The exchange moving method exchanges the order of two modules in *Γ*^{i}, where *Γ*^{i} corresponds to all sequences, i.e. the sequence pair (*Γ*^{+}*, Γ*^{-}), the sequence triple (*Γ*^{1}*, Γ*^{2}*,Γ*^{3}) or the sequence quintuple (*Γ*^{1}*, Γ*^{2}*, Γ*^{3}*, Γ*^{4}, *Γ*^{5}). When an exchange is applied to module *m*_{i} and *m*_{j} with sequence triple representation, *F*^{1}(*m*_{i}), *F*^{2}(*m*_{i}), *F*^{3}(*m*_{i}), *F*^{1}(*m*_{j}), *F*^{2}(*m*_{j}) and *F*^{3}(*m*_{j}) are changed to *F*^{1}(*m*_{j}), *F*^{2}(*m*_{j}), *F*^{3}(*m*_{j}), *F*^{1}(*m*_{i}), *F*^{2}(*m*_{i})and *F*^{3}(*m*_{i}), respectively. In the case ofsequence pair, *F*_{+}(*m*_{i}), *F-*(*m*_{i}), *F*_{+}(*m*_{j}), and *F-*(*m*_{j}) are changed to *F*_{+}(*m*_{j}), *F-*(*m*_{j}), *F*_{+}(*m*_{i}), and *F-*(*m*_{i}), respectively. For example, if the modules *m*_{3} and the module *m*_{5} are operated by the exchange in Figure 9, then(*F*_{+}(*m*_{5}), *F*_{-}(*m*_{5}))is changed from (4, 2) to (0, 3), and (*F*_{+}(*m*_{3}), *F*_{-}(*m*_{3})) is changed from (0, 3) to (4, 2).

The move changes the order of a module in*Γ*^{i}. When a move is applied to module *m*_{i} in *Γ*^{i}, *F*^{i}(*m*_{i}) is changed to another value, say *j*, and the orders of modules whose order is between *F*^{i}(*m*_{i}) and *j* are shifted accordingly. For example,, if the operation is to move *m*_{5} to *F*_{-}(*m*_{5}) *=* 0 in *Γ*^{ -}, the move will lead to *F*_{-}(*m*_{1}) *= F*_{-}(*m*_{1}) *+* 1 and *F*_{-}(*m*_{2}) *= F*_{-}(*m*_{2}) *+* 1, i.e.*Γ*^{-}(*m*_{1,}*m*_{2,}*m*_{5,}*m*_{3,}*m*_{4})is changed to *Γ*^{-}(*m*_{5,}*m*_{1,}*m*_{2,}*m*_{3,}*m*_{4}) as shown in Figure 10.

## 7. Crossover operator for MSA

Besides, a special crossover operator is designed to generate a new solution from the current solution and the best solution so far in the rough search based on the representation in section 5.2. The margin and centre of the new solution (child) inherit the margin of the current solution (father) and the reversed centre of the best solution (mother), respectively. The reason to reverse the best solution is to get a different solution even two given solutions are the same solution.

For the detail of crossover, two sequences *Γ*^{+} and *Γ*^{–} are selected randomly from (*Γ*^{1}*, Γ*^{2}*,Γ*^{3}) or (*Γ*^{1}*, Γ*^{2}*, Γ*^{3}*, Γ*^{4}, *Γ*^{5}) for 3D packing. Let us denote the father as (*Γ*_{f}^{+}, *Γ*_{f}^{–}) which is selected from the current solution. The mother (*Γ*_{m}^{+}, *Γ*_{m}^{–}) is from the best solution so far. A number *i* is an integer randomly produced between 1 and *k/2*–1. The child of sequence pair (*Γ*_{c}^{+}, *Γ*_{c}^{–}) is given by *Γ*_{f}^{+}[*0, i*] + *Γ'*_{m}^{+}+ *Γ*_{f}^{+}[(*k*–*i*–1), *k*–1] and *Γ*_{f}^{–}[*0, i*] + *Γ'*_{m}^{–}+ *Γ*_{f} ^{–}[(*k*–*i*–1), *k*–1], where *Γ'*_{m}^{+} and *Γ'*_{m}^{–} are the inverse of *Γ*_{m}^{+} – *Γ*_{f}^{+}[*0,i*] – *Γ*_{f}^{+}[(*k*–*i*–1), *k*–1] and the inverse of *Γ*_{m}^{–} – *Γ*_{f}^{–}[*0,i*] – *Γ*_{f}^{–}[(*k*–*i*–1), *k*–1], respectively.

To make it clearer,let us take an example to explain the crossover operator. As shown in Figure 11, the left layout is represented by *Γ*^{+}(*m*_{3}*,m*_{4}*,m*_{2}*,m*_{1}*,m*_{5}) and *Γ*^{ -}(*m*_{5}*,m*_{1}*,m*_{2}*,m*_{3}*,m*_{4}*)* as the father, which is the capital “F“ in Figure 12. The right one is *Γ*^{+}(*m*_{1}*,m*_{3}*,m*_{4}*,m*_{2}*,m*_{5}) and *Γ*^{ -}(*m*_{3}*,m*_{4}*,m*_{1}*,m*_{5}*,m*_{2}) as the mother, which is the reversed capital “M“ in Figure 12. If we assume the *i* be 1, the child will be the layout *Γ*^{+}(*m*_{3}*,m*_{2}*,m*_{4}*,m*_{1}*,m*_{5}) and *Γ*^{ -}(*m*_{5}*,m*_{2}*,m*_{1}*,m*_{3}*,m*_{4}) as the right layout of Figure 12*,* where *Γ*^{+}(*m*_{3}*,..., m*_{5}) and *Γ*^{ -}(*m*_{5}*,..., m*_{4}) are from the father as the margin of left picture of Figure 12, and *Γ*^{+}(*...,m*_{2}*,m*_{4}*,m*_{1}*,...*)and *Γ*^{ -}(*...,m*_{2}*,m*_{1}*,m*_{3}*,...*) are from mother with an inverse order as the centre of left picture of Figure 12.

## 8. Objectives and cost function

To solve multi-objective problem, we are using the total cost function, which includes area of bounding rectangle for 2D case, volume of bounding box for 3D case, interconnect power and maximal delay. Especially, the interconnect power and themaximal delay are two typical conflicting objectives, which need to experiment carefully to satisfy the requirements in real product design.

For the multi-objective optimization of 2D placement in this research, three different objectives are defined by one formula as follow.

where *α+β+γ=1*, and *C*_{t}is the total cost function, which includes the power function *C*_{p}, the delay function *C*_{d}and the area function *C*_{a}. And α, *β, γ* can be user-defined. As mentioned, *C*_{p} and *C*_{d}are normally conflicting in real implementation. That is to say, good *C*_{p} may lead to bad *C*_{d}, so we have to consider the trade-off between *C*_{p} and *C*_{d} using a lot of random values of α and *β*.

For power estimation, the dynamic power of a net *n*_{i}is proportional to *C(i)*, *V*_{dd}*(i)*^{2}, *f(i)* and *S(i)*, where *C(i)* is the capacitance of a net, *V*_{dd}*(i)* is the voltage of power supply, *f(i)* is the clock frequency, and *S(i)* is switching probability of the net. Normally *C(i)* is proportional to the length of net, so let *Len*_{i}represent its value. In case of no information, let us assume that *V*_{dd}*(i)* and *f(i)* are same for each net and *S(i)* is randomly defined from 0 to 1. So the interconnect power is simplified as the function of *Len*_{i}and *S(i)*.

For performance estimation, the maximal delay among all nets is used. The delay is defined by the wire length of nets. To get the wire length estimation for each net, the half perimeter wire length (*HPWL*) is used for the approximation of wire length. Given any net *n*_{i}, connected with modules{*m*_{1}*, m*_{2}*,..., m*_{s}}, *HPWL* is half perimeter of the minimum bounding box for all centres of module *m*_{i}, where *i* is an integer from *1* to *s*. In case of *r*_{i}=1, *HPWL*[*n*_{i}] is given by *max*[*x*_{i} *+ h*_{i}*/2*] *– min*[*x*_{i} *+ h*_{i}*/2*] *+ max*[*y*_{i} *+ w*_{i}*/2*] *– min*[*y*_{i} *+ w*_{i}*/2*]. So *HPWL*[*n*_{i}] is gotten from(*h*_{i}*, w*_{i}*, n*_{i})*,* (*x*_{i}*, y*_{i}*, r*_{i}). The power and the delay are estimated so far.

For the objective of 2D packing, the area estimation is the minimum bounding rectangle including all modules, which is the total height *H* multiplied by the total width *W*. In practical implementation, we use a relative value as the cost function of area, i.e. the bounding area divided by the area of total modules, because any value with unit would not be scalable to use the experiments by diverse benchmarks.

For the objective of 3D packing, instead of 2D case, the volume estimation is given by the minimum bounding rectangular parallelepiped including all modules, which is the total height *H* multiplied by the total width *W* and multiplied by the total length *L*. In real implementation, the cost function is also using the relative value of volume.

## 9. Experiment and comparison

To evaluate the effectiveness and reliability of the proposed MSA in practice, a set of experiments was implemented, comparing with traditional SA and 2-stage SA. In the case of 2D packing and placement, we are using ami49_X and MCNC benchmarks. The ami49_X is produced by duplicating ami49 circuit X times. In the case of 3D packing, ami98_3D benchmark is produced by inheriting the height and width of 2D ami49_2 benchmark and randomly getting the length between the given minimum and maximum dimensions. The implementation for 3D packing is to compare MSA with the mentioned 2-stage SA. MSA is implemented in Python environment on 2.16GHz PC with 3.00GB memory. For a fair comparison, SA and 2-stage SA is also implemented at the same machine. The maximum runtime is within 14,400s (4 hours) each time.

For area optimization of 2D packing, let γ be 1 and α+β be 0 in the cost function. As shown in Table 1, the best, average and worst cases among 50 trials are gotten. The comparison of solution quality and runtime between SA and MSA is gotten. MSA reduced near 21% runtime with better solution quality. As shown in Table 2, a near log-linear trend of average improvement rates from SA to MSA is gotten. That means MSA should be more suitable for the placement with a larger number of modules.

For interconnect optimization of 2D placement, let γ be 0 and α+β be 1 in the cost function. The experiment is using ami49_X benchmarks. To get the figures, α is randomly produced from 0.1 to 0.9. 240 solutions are tested for comparison. For all tested ami49_X with X from 1 to 12, block number from 49 to 588, and net number from 408 to 4896, the improved results are gotten. Figure 13 shows that MSA obtains at least 13% Pareto improvement with the constraint of less than 108.2% maximal delay. To get the worst cases, we tested more 120 solutions with α given by 0.1, 0.3, 0.5, 0.7, and 0.9. As shown in Figure 14, MSA got near 6% worst-case mitigation on average for the interconnect power with no degradation of maximal delay.

Benchmarks | Best (mm2) | Average (mm2) | Worst (mm2) | Runtime (s) |

apte | 47.08 | 47.36 | 47.67 | 3.2 |

xerox | 19.80 | 20.50 | 21.21 | 1.5 |

hp | 9.03 | 9.17 | 9.34 | 2.3 |

ami33 | 1.18 | 1.23 | 1.29 | 17 |

ami49 | 36.91 | 37.79 | 38.83 | 37 |

ami49_2 | 73.58 | 75.48 | 77.38 | 142 |

ami49_4 | 147.3 | 151.1 | 155.8 | 547 |

Benchmarks | Solution(mm2) | Runtime (s) | Improvement (%) | |||

SA | MSA | SA | MSA | Solution | Runtime | |

apte | 47.38 | 47.36 | 4.1 | 3.2 | 0.04% | 22% |

xerox | 20.51 | 20.50 | 1.9 | 1.5 | 0.05% | 21% |

hp | 9.18 | 9.17 | 2.7 | 2.3 | 0.11% | 15% |

ami33 | 1.24 | 1.23 | 22 | 17 | 0.52% | 23% |

ami49 | 37.96 | 37.79 | 45 | 37 | 0.48% | 18% |

ami49_2 | 75.98 | 75.48 | 194 | 142 | 0.71% | 27% |

ami49_4 | 152.2 | 151.0 | 720 | 547 | 0.88% | 24% |

For volume optimization of 3D packing, we compare the computational performance of volume ratio using 2-stage SA and MSA. The results show considerable improvement from 2-stage SA to MSA. The improvement of packing ratio is between 2% and 7% for sequence triple representation. The improvement for runtime is up to 3 times, as shown in Figure 15. With regard to sequence quintuple representation, the experiment also shows the improvement from 2-stage SA to MSA. The improvement of packing ratio is between 3% and 12%. The improvement for runtime is up to 4 times with the sequence quintuple representation, as shown in Figure 16. The packing ratio of volume is improved by near 7% with less than 100s runtime, if we select MSA with sequence triple representation, instead of 2-stage SA with the same representation. The packing ratio of volume is improved by near 12% with less than 100s runtime, if we select MSA with sequence quintuple representation, instead of 2-stage SA with the same representation. In short, the overall solution quality and the runtime of MSA algorithm are better than these of 2-stage SA algorithm.

## 10. Conclusion and future work

In summary, the optimization techniques for integrated circuit (IC) layout design with large solution space are facing big challenges to get better solution quality with less runtime. In this research, a new simulated annealing based approach, named mixed simulated annealing (MSA), is proposed to solve three typical layout design problems, which are 2D packing, 2D placement and 3D packing, by using sequence pair, sequence triple and sequence quintuple representations. A new crossover operator is designed to reuse the information of past solutions and get high improving efficiency. Based on experiment, MSA improved both the best and the worst cases of 2D placement for interconnect power and maximal delay. For area minimization, MSA reduced computational runtime with the better solution quality, and a near log-linear trend of average improvement rates from SA to MSA is gotten for both solution quality and runtime. The overall quality of packing by MSA is normally better than the published results. For the volume minimization of 3D packing, MSA improved the solution quality (up to 12% better) and the computational time (up to 4 times faster). For the future work, the proposed MSA has potential to be extended to more general problems, such as 2D/3D packing or placement with rectilinear boxes.

## Acknowledgement

The authors would like to thank Prof. Kunihiro Fujiyoshi at Tokyo University of Agriculture and Technology, to thankProf.Shuichi Ueno, Dr. Tayu, Mr. Shinoda, Mr. Inoue and Mr. Zhao at Tokyo Institute of Technology, and to thank Mr. Yamada, Mr. Ando and Mr. Ukon at Osaka University for their discussion, comment and support.