Advanced Statistical Methodologies for Tolerance Analysis in Analog Circuit Design


• to characterize statistically integrated circuits (IC) manufacturing process fluctuations; • to predict reliably circuit performance spreads at the design stage.
Failure in the former can result in a low parametric yield, since ICs do not meet design specifications. On the one hand, a successful statistical characterization promotes a robust manufacturability reflecting in a high fabrication yield (i.e. a high proportion of produced circuits which function properly). On the other hand, it requires managing complex design flows in the design-verification-production life-cycle of ICs. Summing up, random and systematic defects as well as parametric process variations have a big influence on the design/production cycle, causing frequent re-spinning of the whole development and manufacturing chain. This leads to high costs of multiple manufacturing runs and entails extremely high risks of missing a given market window. One way to overcome these drawbacks is to implement the DFM/DFY paradigm (Bühler et al., 2006) where Design for Manufacturability (DFM) mates Design for Yield (DFY) to form a synergistic manufacturing chain to be dealt with in terms of: i) relationships between the statistical circuit parameters matching the production constraints, and ii) performance indicators ensuring correctly functioning dies. This chapter introduces a pair of procedures aimed at identifying these parameters exactly with the goal of maximizing performance indicators defined as a function of the parameters' confidence region. The material is organized as follows. In Section 2 we discuss the statistical aspect of IC design and introduce the lead formalism. In Section 3 we focus on the statistical modeling task with special regard to two advanced solution methods. Hence we introduce benchmarks in Section 4 to both provide a comparison between the performances of the above methods and show their behaviors w.r.t. state-of-the-art procedures introduced by researchers in the last years. Concluding remarks are drawn in the final section.

Statistics in IC design
Electronic devices are replicated many times on a wafer, and many wafers are produced, but no two devices turn out identical in terms of electrical performance. The main factors that make the fabrication result uncertain are imperfections in the masks and tolerances in their positioning, varying effects of ion implant temperature during production, tolerances in size, etc. In general, process fluctuations produce fluctuations in electrical performance. Consequently, an essential tool for electronic circuit design is the statistical model which formally relates the former to the latter. A circuit is classified as acceptable in performance if all specifications on its electrical behavior are met. In the microelectronics industry, the term yield denotes the ratio between the number of acceptable chips and the total number of produced chips:

$$\text{yield} = \frac{\#\,\text{acceptable chips}}{\#\,\text{manufactured chips}} \qquad (1)$$

The acceptability of each chip is decided by checking that the electrical parameters in question individually fall into tolerance intervals. In addition, each wafer contains several sites with special test structures that enable further performance measurements in order to verify the manufacturing process. All the measurements are collected in a database which statistically characterizes the electrical behavior of the devices.
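As a minimal illustration of definition (1), the sketch below counts the chips whose measured parameters all fall inside their tolerance intervals. The parameter names, interval limits and measurements are hypothetical values chosen for the example; they do not come from the benchmarks discussed later.

```python
from typing import Dict, List, Tuple

# Hypothetical tolerance intervals (lower, upper) for each measured parameter.
TOLERANCES: Dict[str, Tuple[float, float]] = {
    "gain_db": (58.0, 62.0),
    "bandwidth_mhz": (9.0, 11.0),
}


def is_acceptable(chip: Dict[str, float],
                  tolerances: Dict[str, Tuple[float, float]] = TOLERANCES) -> bool:
    """A chip is acceptable only if every parameter lies in its interval."""
    return all(lo <= chip[name] <= hi for name, (lo, hi) in tolerances.items())


def chip_yield(chips: List[Dict[str, float]],
               tolerances: Dict[str, Tuple[float, float]] = TOLERANCES) -> float:
    """Definition (1): acceptable chips divided by manufactured chips."""
    if not chips:
        raise ValueError("no chips measured")
    return sum(is_acceptable(c, tolerances) for c in chips) / len(chips)


# Hypothetical measurements on four dies.
chips = [
    {"gain_db": 60.1, "bandwidth_mhz": 10.2},  # acceptable
    {"gain_db": 57.5, "bandwidth_mhz": 10.0},  # gain below its interval
    {"gain_db": 61.0, "bandwidth_mhz": 11.4},  # bandwidth above its interval
    {"gain_db": 59.0, "bandwidth_mhz": 9.5},   # acceptable
]
print(chip_yield(chips))
```

Two of the four hypothetical dies pass both checks, so the computed yield is one half.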
As for the final product we may classify the integrated circuits into: • acceptable chips, which satisfy all performance requests, • functional failures, when malfunctions affect chips, • parametric failures, when chips fail to reach performances.
Coming to their manufacturing, we usually distinguish three categories of failures, which we summarize through the following yield components:
2.1. random yield (sometimes called statistical yield), concerning the random effects occurring during the manufacturing process, such as catastrophic faults in the form of open or short circuits. These faults may be a consequence of small particles in the atmosphere landing on the chip surface, no matter how clean the wafer manufacturing environment is. An example of a random component is the threshold voltage variability due to random dopant fluctuations (Stolk et al., 1988);
2.2. systematic yield (including printability issues), related to systematic manufacturability issues deriving from combinations and interactions of events that can be identified and addressed in a systematic way. An example of these events is the variation in wire thickness with layout density due to Chemical Mechanical Polishing/Planarization (CMP) (Chang et al., 1995). The distinction from the previous yield is important because the impact of systematic variability can be removed by adapting the design appropriately, while random variability will inevitably impact design margins in a negative manner;
2.3. parametric yield (including variability issues), dealing with the performance drifts induced by changes in the parameter setting: for instance, lower drive capabilities, increased leakage current and greater power consumption, increased resistance and capacitance (RC) time constants, and slower chips deriving from corruptions of the transistor channels.
From a complementary perspective, the unacceptable performance causes for a circuit may be split into two categories of disturbances: • local, caused by disruption of the crystalline structure of silicon, which typically determines the malfunctioning of a single chip in a silicon wafer; • global, caused by inaccuracies during the production processes such as misalignment of masks, changes in temperature, changes in doses of implant. Unlike the local disturbance, the global one involves all chips in a wafer at different degrees and in different regions. The effect of this disturbance is usually the failure in the achievement of requested performances, in terms of working frequency decrease, increased power consumption, etc.
Both induce troubles on physical phenomena, such as electromagnetic coupling between elements, dissipation, dispersion, and the like. The obvious goal of the microelectronics factory is to maximize the yield as defined in (1). This translates, from an operational perspective, into a design target of properly sizing the circuit parameters, and a production target of controlling their realization. Actually both targets are very demanding since the involved parameters π are of two kinds: • controllable, when they allow changes in the manufacturing phase, such as the oxidation times, • non controllable, in case they depend on physical parameters which cannot be changed during the design procedure, like the oxide growth coefficient.
Moreover, in any case the relationships between π and the parameters φ characterizing the circuit performances are very complex and difficult to invert. This induces researchers to model both classes of parameters as vectors of random variables, Π and Φ respectively. The corresponding yield maximization problem turns into a functional dependency among the problem variables. Namely, let Φ = (Φ 1 , Φ 2 ,...,Φ t ) be the vector of the performances determined by the parameter vector Π = (Π 1 , Π 2 ,...,Π n ), and denote with D Φ the acceptability region of a given chip. For instance, in the common case where each performance is checked individually against a given range, D Φ is the hyper-rectangle

$$D_\Phi = \{\varphi : a_j \le \varphi_j \le b_j,\ j = 1, \ldots, t\} \qquad (3)$$

where a j and b j denote the lower and upper acceptance limits of the j-th performance. The yield goal is the maximization of the probability P that a manufactured circuit has an acceptable performance, i.e.

$$P = P(\Phi \in D_\Phi) = \int_{D_\Phi} f_\Phi(\varphi)\, d\varphi \qquad (4)$$

where f Φ is the joint probability density of the performance Φ.
To solve this problem we need to know f Φ and manage its dependence on Π. Namely, methodologies for maximizing the yield must incorporate tools that determine the region of acceptability, manipulate joint probabilities, evaluate multidimensional integrals, and solve optimization problems. Those instruments that use explicit information about the joint probability and calculate the yield multidimensional integral (4) during the maximization process are called direct methods. The term indirect is therefore reserved for those methods that do not use this information directly. In the next section we will introduce two of these methods which look very promising when applied to real world benchmarks.
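The yield integral over the acceptability region can be estimated by plain Monte Carlo sampling once a sampler for the performance vector is available. The sketch below assumes, purely for illustration, two independent standard Gaussian performances and a rectangular region of ±2 on each; in practice the sampler would be the circuit simulator fed with random model parameters.

```python
import random


def estimate_yield(sample_performance, bounds, n_samples=20000, seed=0):
    """Monte Carlo estimate of P(performance vector lies in the rectangle)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        phi = sample_performance(rng)
        # The chip counts as acceptable only if every component is in range.
        if all(lo <= v <= hi for v, (lo, hi) in zip(phi, bounds)):
            hits += 1
    return hits / n_samples


def gaussian_performance(rng):
    """Hypothetical sampler: two independent standard Gaussian performances."""
    return (rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))


bounds = [(-2.0, 2.0), (-2.0, 2.0)]  # assumed acceptance ranges
p = estimate_yield(gaussian_performance, bounds)
print(p)
```

Under these assumptions the exact value is the square of the probability that a standard Gaussian falls within ±2, roughly 0.91, so the estimate should land close to that figure.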

Statistical modeling
As mentioned in the introduction, a main route to maximizing yield is to mate Design for Manufacturability with Design for Yield (the DFM/DFY paradigm) along the entire manufacturing chain. Here we focus on model parameters at an intermediate location in this chain, representing a target of the production process and the root of the circuit performance. Identifying them from a sample of performances measured on produced circuits gives the designer a clear picture of how the performances react to the model parameters in the actual production process and, consequently, an estimate of the impact of their variation. Typical model and performance parameters are described in Table 1 in Section 4. In greater detail, the first requirement for planning circuits is the availability of a model relating input/output vectors of the function implemented by the circuit. As mentioned above, its achievement is usually split into two phases directed towards the search of a pair of analytic relations: the former between model parameters and circuit performances, and the latter, tied to the process engineers' experience, linking design and physical circuit parameters as they could be obtained during production. Given a wafer, repeated measurements are carried out on dies in the same circuit family. As usual, the final aim is the model identification, in terms of designating the input (respectively output) parameter values of the aforementioned analytical relation. In some way, their identification aims at synthesizing the overall aspects of the manufacturing process, not only to use them satisfactorily during development but also to improve upcoming planning and design phases, rather than to weigh directly on the production. For this purpose there are three different perspectives: synthesize simulated data, optimize a simulator, and statistically identify its optimal parameters.
All three perspectives share the following common goals: ensure adequate manufacturing yield, reduce the production cost, predict design fails and product defects, and meet zero-defect specifications. We formalize the modeling problem in terms of a mapping g from a random vector X = (X 1 ,...,X n ), describing what is commonly denoted as model parameters, to a random vector Y = (Y 1 ,...,Y t ), representing a meaningful subset of the performances Φ. The statistical features of X, such as mean, variance, correlation, etc., constitute its parameter vector θ X , henceforth considered to be the statistical parameter of the input variable X. Namely, Y = g(X) = (g 1 (X),...,g t (X)), and we look for a vector θ Y that characterizes a performance population where P(Y ∈ D Y ) = α, having denoted with D Y the α-tolerance region, i.e. the domain spanned by the measured performances, and with α a satisfactory probability value. In turn, D Y is the statistic we draw from a sample s y of the performances we actually measured on correctly working dies. Its simplest computation leads to a rectangular shape, as in (3), where we independently fix ranges on the single performances. A more sophisticated instance is represented by the convex hull of the jointly observed performances in the overall Y space (Liu et al., 1999). At a preliminary stage, we often assess the suitability of θ Y by comparing first and second order moments of a performance population generated through the currently identified parameters with those computed on s y . As a first requisite, we need a convenient function relating the Y distribution to θ X . The most common tool for modeling an analog circuit is the Spice simulator (Kundert, 1998). It consists of a program which, taking as input a textual description of the circuit elements (transistors, resistors, capacitors, etc.)
and their connections, translates this description into nonlinear differential equations to be solved using implicit integration methods, Newton's method and sparse matrix techniques. A general drawback of Spice, and of circuit simulators in general, is the complexity of the transfer function it implements to relate physical parameters to performances, which hampers intensive exploration of the performance landscape in search of optimal parameters. The methods we propose in this section are mainly aimed at overcoming the difficulty of inverting this kind of function, hence achieving a feasible solution to the problem: find a θ X corresponding to the wanted θ Y .

Monte Carlo based statistical modeling
The lead idea of the first method we present is that the model parameters are the output of an optimization process aimed at satisfying some performance requirements. The optimization is carried out by wisely exploring the search space through a Monte Carlo (MC) method (Rubinstein & Kroese, 2007). As stated before, the proposed method uses the experimental statistics both as a target to be satisfied and, above all, as a selectivity factor for the device model. In particular, a device model will be accepted only if its parameter values yield, through electrical simulations, performances that fall within the tolerance region. The aim of the proposed flow is the following: on the basis of the information which constitutes the experimental statistics, we want to map the space Y of the performances (such as gain and bandwidth) to the space X of circuit parameters (such as Spice parameters or circuit component values), as outlined in Fig. 1. Variations in the fabrication process cause random fluctuations in Y space, which in turn cause X to fluctuate (Koskinen & Cheung, 1993). In other words, we want to extract a Spice model whose parameters are random variables, each one characterized by a given probability distribution function. For instance, in agreement with the Central Limit Theorem (Rohatgi, 1976), we may work under the usual Gaussianity assumptions. In this case, for the model parameters which have to be statistically described, it is necessary and sufficient to identify the mean values, standard deviations and correlation coefficients. In general, the flow of statistical modeling is based on several MC simulation steps (strictly related to bootstrap analysis (Efron & Tibshirani, 1993)), in order to estimate unknown features for each statistical model parameter.
The method will proceed by executing iteratively the following steps, in the same way as in a multiobjective optimization algorithm, where the targets to be identified are the optimal parameters θ X of the model. In the following procedure, general steps (described in roman font) will be specialized to the specific scenario (in italics) used to perform simulations in Section 4.
Step 1. Assume a typical (nominal) device model m 0 is available, whose model parameters' means are described by the vector ν X (central values). Let D Y be the corresponding typical tolerance region estimated on Y observations s y . Choose an initial guess of the X joint distribution function on the basis of moments estimated on given X observations s x . Let M denote the companion device statistical model, and set k = 0.
In the specific case of hyper-rectangle tolerance regions defined as in (3), let ν Y j ± 3σ Y j , j = 1,...,t denote the two extremes delimiting each admissible performance interval. Moreover, since the model parameters X of M follow a multivariate Gaussian distribution, assume (in the first iteration) a null cross-correlation between {X 1 ,...,X n }; hence, in θ X , the same mean as the nominal model is chosen as initial value, and σ X i is assigned a relatively high value, for instance set equal to twice the mean value.
Step 2. At the generic iteration k, an m-sized sample s M k = {x r }, r = 1,...,m will be generated according to the current X distribution.
In particular, when the X i are no longer independent, the discrete Karhunen-Loeve expansion (Johnson, 1994) is adopted for sampling, starting from the current covariance matrix.
Step 3. For each model parameter x r in s M k , the target performances y r will be calculated through Spice circuit simulations.
Step 4. Only those model parameters in s M k reproducing performances lying within the chosen tolerance region D Y will be accepted. On the basis of this criterion a subsample s ′ M k of s M k having size m ′ ≤ m will be selected.
In particular, by keeping a fraction 1 − δ, say 0.99, of those models having all performance values included in D Y , we guarantee a confidence region of level δ under i.i.d. Gaussianity assumptions.
Step 5. On the basis of the subsample s ′ M k , a new model M ′ k will be computed through standard statistical techniques. For each model parameter X i , i = 1,...,n, the n standard deviations could be computed on the sample s ′ M k through Maximum Likelihood Estimators (MLE) (Mood et al., 1974); the Spearman Rank-Order correlation coefficient (Lehmann, 2006; Press et al., 1993) may be used to estimate cross-correlation, while, according to circuit designers' reports, the n means will be kept equal to the nominal ν X i , i = 1,...,n.
Step 6. If the number m ′ of selected model parameters which have generated M ′ k is sufficiently high (for instance, if they constitute a fraction 1 − δ, say 0.99, of the m instances), then the algorithm stops, returning the statistical model M ′ k . Otherwise, set k = k + 1 and go to Step 2.

The iterative procedure described above is based on the Attractive Fixed Point method (Allgower & Georg, 1990), where the optimal value of those features to be estimated represents the fixed point of the algorithm. When the number of components significantly increases, the convergence of the algorithm may become weak. To manage this issue, a two-step procedure is introduced where the former phase is aimed at computing moments involving single features X i while keeping their cross-correlation constant; the latter is directed toward the estimation of the cross-correlation between them. The overall procedure is analogous to the previous one, with the exception that cross-correlation terms will be kept fixed until Step 5 has been executed. Subsequently, a further optimization process will be performed to determine the cross-correlation coefficients, for instance using the Direct method as described in Jones et al. (1993). The stop criterion in Step 6 is further strengthened, prolonging the run of the procedure until the difference between cross-correlation vectors obtained at two subsequent iterations drops below a given threshold.
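Steps 1 to 6 can be sketched in a few lines once the simulator is replaced by a cheap stand-in. The code below is only an illustration under strong assumptions: `toy_simulator` is a hypothetical two-parameter function used in place of Spice, the parameters are sampled as uncorrelated Gaussians (the first-iteration setting, so no Karhunen-Loeve sampling or correlation estimate is needed), and the nominal values, initial standard deviations and tolerance region are invented for the example.

```python
import math
import random


def toy_simulator(x):
    """Hypothetical stand-in for a Spice run: parameters -> performances."""
    x1, x2 = x
    return (x1 + x2, x1 * x2)


def in_region(y, region):
    """True if every performance lies in its tolerance interval."""
    return all(lo <= v <= hi for v, (lo, hi) in zip(y, region))


def identify_sigmas(nominal, sigma0, region, m=2000, delta=0.01,
                    max_iter=100, seed=0):
    """Iterate Steps 2-6 with means fixed at the nominal values."""
    rng = random.Random(seed)
    sigma = list(sigma0)
    frac = 0.0
    for k in range(max_iter):
        # Step 2: draw an m-sized sample from the current (uncorrelated) model.
        sample = [[rng.gauss(nu, s) for nu, s in zip(nominal, sigma)]
                  for _ in range(m)]
        # Steps 3-4: simulate, keep parameters whose performances are in D_Y.
        accepted = [x for x in sample if in_region(toy_simulator(x), region)]
        if not accepted:
            raise RuntimeError("no accepted sample; widen the region or raise m")
        frac = len(accepted) / m
        # Step 5: maximum likelihood standard deviations about the fixed means.
        sigma = [math.sqrt(sum((x[i] - nominal[i]) ** 2 for x in accepted)
                           / len(accepted))
                 for i in range(len(nominal))]
        # Step 6: stop when a fraction 1 - delta of the sample was accepted.
        if frac >= 1.0 - delta:
            return sigma, frac, k + 1
    return sigma, frac, max_iter


sigma, frac, iterations = identify_sigmas(
    nominal=(1.0, 2.0),                 # assumed nominal model parameters
    sigma0=(0.5, 0.5),                  # assumed wide initial spreads
    region=[(2.7, 3.3), (1.7, 2.3)],    # assumed tolerance region D_Y
)
print(sigma, frac, iterations)
```

Because each pass re-estimates the spreads on the accepted subsample only, the standard deviations shrink from one iteration to the next until nearly all sampled parameters reproduce acceptable performances, which is the fixed-point behavior described above.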

Reverse spice based statistical modeling
A second way we propose to bypass the complexity handicap of Spice functions passes through a principled philosophy of considering the region D X where we expect to set the model parameters as an aggregate of fuzzy sets in various respects (Apolloni et al., 2008). First of all we locally interpolate the Spice function g through a polynomial, hence a mixture of monomials that we associate to the single fuzzy sets. Many studies show this interpolation to be feasible, even in the restricted form of using posynomials, i.e. linear combination of monomials through only positive coefficients (Eeckelaert et al., 2004). The granular construct we formalize is the following.
Given a Spice function g mapping from x to y (the generic component of the performance vector y), we assume the domain D X ⊆ R n over which x ranges to be the support of c fuzzy sets {A 1 ,...,A c }, each pivoting around a monomial m k . We consider this monomial to be a local interpolator that fits g well in a neighborhood of the A k centroid. In summary, we have $g(x) \simeq \sum_{k=1}^{c} \mu_k(x)\, m_k(x)$, where μ k (x) is the membership degree of x to A k , whose value is in turn computed as a function of the quadratic shift $(g(x) - m_k(x))^2$.
On the one hand we have one fuzzy partition of D X for each component of y. On the other hand, we implement the construct with many simplifications, in order to meet specific goals. Namely:
• since we look for a polynomial interpolation of g, we move from point membership functions to sets, to a monomial membership function to g, so that $g(x) \simeq \sum_{k=1}^{c} \mu_k\, m_k(x)$. In turn, μ k is a sui generis membership degree, since it may also assume negative values;
• since for interpolation purposes we do not need μ k (x), we identify the centroids directly with a hard clustering method based on the same quadratic shift.
Denoting $m_k(x) = \beta_k \prod_{j=1}^{n} x_j^{\alpha_{kj}}$, if we work in logarithmic scales, the shifts we consider for the single (say the i-th) component of y are the distances between z r = (log x r , log y r ) and the hyperplane h k (z) = w k · z + b k = 0, with w k = {α k1 ,...,α kn } and b k = log β k , constituting the centroid of A k in an adaptive metric. Indeed, both w k and b k are learnt by the clustering algorithm aimed at minimizing the sum of the distances of the z r s from the hyperplanes associated with the clusters they are assigned to. With the clustering procedure we essentially learn the exponents α kj through which the x components intervene in the various monomials, whereas the β k s remain ancillary parameters. Indeed, to get the polynomial approximation of g(x) we compute the mentioned sui generis memberships through a simple quadratic fitting, i.e. by solving w.r.t. the vector μ = {μ 1 ,...,μ c } the quadratic optimization problem

$$\mu = \arg\min_{\mu} \sum_{r=1}^{m} \left(g(x_r) - \hat{y}_r\right)^2,$$

where x rj denotes the j-th component of the r-th element of the training set s x and ŷ r the approximation of g(x r ), with

$$\hat{y} = \sum_{k=1}^{c} \mu_k \prod_{j=1}^{n} x_j^{\alpha_{kj}} \qquad (5)$$

where the index r has been hidden for notational simplicity, and the μ k s override the β k s.
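Once the exponents are fixed, the quadratic fitting of the memberships is an ordinary linear least-squares problem. The sketch below solves it through the normal equations in plain Python; the exponents in `alphas`, the training grid and the target function `g` are hypothetical, and the clustering step that would learn the exponents is not reproduced here.

```python
def monomial(x, alpha):
    """Product of x_j ** alpha_j; the coefficient is absorbed into mu_k."""
    out = 1.0
    for xj, aj in zip(x, alpha):
        out *= xj ** aj
    return out


def solve_linear(a, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    aug = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x


def fit_mu(xs, ys, alphas):
    """Least-squares mu for y ~ sum_k mu_k * monomial_k(x), normal equations."""
    design = [[monomial(x, a) for a in alphas] for x in xs]
    c = len(alphas)
    ata = [[sum(row[i] * row[j] for row in design) for j in range(c)]
           for i in range(c)]
    aty = [sum(row[i] * y for row, y in zip(design, ys)) for i in range(c)]
    return solve_linear(ata, aty)


# Hypothetical exponents, as if already learnt by the clustering step.
alphas = [(2.0, -1.0), (0.5, 1.0)]


def g(x):
    """Hypothetical target function, exactly a two-monomial polynomial."""
    return 3.0 * monomial(x, alphas[0]) + 0.5 * monomial(x, alphas[1])


xs = [(x1, x2) for x1 in (1.0, 1.5, 2.0, 2.5) for x2 in (1.0, 2.0, 3.0)]
ys = [g(x) for x in xs]
mu = fit_mu(xs, ys, alphas)
print(mu)
```

Since the hypothetical target is itself a combination of the two monomials, the fit recovers the coefficients 3 and 0.5 up to rounding; on real simulator data the residual would measure the quality of the local interpolation.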

A suited interpretation of the moment method
An early solution of the inverse problem ("Which statistical features of X ensure a good coverage, in terms of α-tolerance regions, of the Y domain spanned by the performances measured on a sample of produced dies?") relies on the first and second moments of the target distribution, which are estimated on the basis of a sample s y of Y values alone, collected from the production lines as representatives of properly functioning circuits. Our goal is to identify the statistical parameters θ X of X that produce through (5) a Y population best approximating the above first and second order moments. X is assumed to be a multidimensional Gaussian variable, so that we identify it completely through the mean vector ν X and the covariance matrix Σ X , which we do not constrain in principle to be diagonal (Eshbaugh, 1992). The analogous ν Y and Σ Y are a function of the former through (5). Although they cannot identify the Y distribution in full, we are conventionally satisfied when these functions get numerically close to the estimates of the parameters they compute (directly obtained from the observed performance sample). Denoting with ν X j , σ X j , σ X j,k and ρ X j,k , respectively, the mean and standard deviation of X j and the covariance/correlation between X j and X k , the master equations of our method are the following:
1.

$$\nu_{Y_i} = \sum_{k=1}^{c} \mu_{ik}\, \nu_{M_{ik}} \qquad (6)$$

where M ik on the right is a short notation of m ik (X), and ν M ik denotes its mean.
2. Thanks to the approximations with Ξ = log X, coming from the Taylor expansion of respectively Ξ, with

We numerically solve (6) and (8-9) in ν X and Σ X when the left members coincide with the target values of ν Y and Σ Y , respectively, and ν M ik is approximated with its sample estimate computed on samples artificially generated with the current values of the parameters. Solving the equations means minimizing the differences between left and right members, so that the crucial point is the optimization method employed. The building blocks are the following.

The steepest descent strategy. Using the Taylor series expansion limited to second order (Mood et al., 1974), we obtain the approximate expression (12) of the gradient components. Thus we may easily look for the incremental descent on the quadratic error surface accounting for the difference between computed and observed means. Expression (12) confirms the low sensitivity of the unbiased mean ν X , and of its gradient as well, to the second moments, so that we may expect to obtain an early approximation of the mean vector to be subsequently refined. While analogous to the previous task, the identification of X variances and correlations has one additional benefit and one additional drawback. The former derives from the fact that we may start with a possibly very accurate estimate of the means. The latter derives from the strong interrelations among the target parameters, which render the exploration of the quadratic error landscape troublesome and very lengthy.

Identification of second order moments. An alternative strategy for identifying the second moments of X is represented by evolutionary computation. Given the mentioned computational length of the gradient descent procedures, algorithms of this family become competitive on our target. Namely, we used Differential Evolution (Price et al., 2005), with specific bounds on the correlation values to avoid degenerate solutions.

A brute force numerical variant. We may move to a still more rudimentary strategy to get rid of the loose approximations introduced in (6) to (12). Thus we: i) avoid computing approximate analytical derivatives, by substituting them with direct numerical computations (Duch & Kordos, 2003), and ii) adopt the strategy of exploring one component at a time of the parameter vector in question, rather than a combination of them all, until the error descent stops. Spanning numerically one direction at a time allows us to ask the software to directly identify the minimum along this direction. The further benefit of this strategy is that the function we want to minimize is analytic, so that the search for the minimum along one single direction is a very easy task for typical optimizers, such as the naive Nelder-Mead simplex method (Nelder & Mead, 1965) implemented in Mathematica (Wolfram Research Inc., 2008). We structured the method in a cyclic way, with a stopping criterion based on the amount of parameter variation. Each cycle is composed of: i) an iterative algorithm which circularly visits each component direction, minimizing the error in the identification of the means, until no improvement above a given threshold can be achieved, and ii) a refresh of the fitting polynomial on the basis of a Spice sample in the neighborhood of the current mean vector. We conclude the routine with a last assessment of the parameters, which we pursue by running jointly on all of them a local descent method such as a Quasi-Newton procedure in one of its many variants (Nocedal & Wright, 1999).
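The one-direction-at-a-time search can be illustrated with a few lines of plain Python. This is only a sketch under stated assumptions: a golden-section line search is swapped in for the Nelder-Mead routine mentioned in the text, the `error` function is a hypothetical smooth quadratic standing in for the moment-matching error, and the polynomial refresh step of each cycle is omitted.

```python
import math


def golden_section(phi, a, b, tol=1e-8):
    """Minimize a one-dimensional function on [a, b] by golden-section search."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    while abs(b - a) > tol:
        if phi(c) < phi(d):
            b = d
        else:
            a = c
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
    return (a + b) / 2.0


def coordinate_descent(f, x0, window=1.0, tol=1e-6, max_sweeps=200):
    """Visit each component in turn and minimize f along that direction only."""
    x = list(x0)
    for _ in range(max_sweeps):
        largest_move = 0.0
        for i in range(len(x)):
            def along_axis(t, i=i):
                trial = x[:]
                trial[i] = t
                return f(trial)
            best = golden_section(along_axis, x[i] - window, x[i] + window)
            largest_move = max(largest_move, abs(best - x[i]))
            x[i] = best
        # Stopping criterion based on the amount of parameter variation.
        if largest_move < tol:
            break
    return x


def error(v):
    """Hypothetical error surface with coupled components."""
    return (v[0] - 1.0) ** 2 + (v[1] + 2.0) ** 2 + 0.5 * v[0] * v[1]


x_min = coordinate_descent(error, [0.0, 0.0])
print(x_min)
```

For this hypothetical surface the stationary point can be found by hand at (1.6, -2.4), which the cyclic search approaches within the line-search tolerance. The `window` argument bounds each one-dimensional search around the current value, a design choice of this sketch rather than of the original procedure.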

Fine tuning via reverse mapping
Once a good fitting has been realized in the questioned part of the Spice mapping, we may solve the identification problem in a more direct way by first inverting the polynomial mapping to obtain the X sample at the root of the observed Y sample, and then estimating θ X directly from the sample defined in the D X domain. The inversion is almost immediate if it is univocal, i.e., apart from controllable pathologies, when X and Y have the same number of components. Otherwise the problem is either overconstrained (number n of X components less than t, dimensionality of Y components) or underconstrained (opposite relation between component numbers). The first case is avoided by simply discarding exceeding Y components, possibly retaining the ones that improve the final accuracy and avoid numeric instability. The latter calls for a reduction in the number of questioned X components. Since X follows a multivariate Gaussian distribution law, by assumption, we may substitute some components with their conditional values, given the others.
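The balanced case, where X and Y have the same number of components, can be sketched as follows. To keep the inversion exact and short, the example assumes that each performance is a single monomial of two parameters, so that the mapping is linear in logarithmic scale; the exponents in `ALPHA`, the coefficients in `BETA` and the Gaussian sample are hypothetical. A genuine multi-monomial polynomial would require a numerical root finder instead, and the conditional substitution for the underconstrained case is not shown.

```python
import math
import random
import statistics

ALPHA = [[1.0, 0.5], [-1.0, 2.0]]  # hypothetical exponents, one row per output
BETA = [2.0, 0.5]                  # hypothetical monomial coefficients


def forward(x):
    """y_i = BETA_i * x_1 ** ALPHA_i1 * x_2 ** ALPHA_i2."""
    return [BETA[i] * x[0] ** ALPHA[i][0] * x[1] ** ALPHA[i][1]
            for i in range(2)]


def inverse(y):
    """Solve the 2x2 log-linear system by Cramer's rule, then exponentiate."""
    r = [math.log(y[i] / BETA[i]) for i in range(2)]
    det = ALPHA[0][0] * ALPHA[1][1] - ALPHA[0][1] * ALPHA[1][0]
    log_x1 = (r[0] * ALPHA[1][1] - ALPHA[0][1] * r[1]) / det
    log_x2 = (ALPHA[0][0] * r[1] - ALPHA[1][0] * r[0]) / det
    return [math.exp(log_x1), math.exp(log_x2)]


# Hypothetical "observed" performances generated from a known X population.
rng = random.Random(0)
x_true = [[rng.gauss(1.0, 0.05), rng.gauss(2.0, 0.1)] for _ in range(500)]
y_obs = [forward(x) for x in x_true]

# Reverse mapping: recover the X sample, then estimate its statistics directly.
x_rec = [inverse(y) for y in y_obs]
mean_x = [statistics.mean([x[j] for x in x_rec]) for j in range(2)]
std_x = [statistics.stdev([x[j] for x in x_rec]) for j in range(2)]
print(mean_x, std_x)
```

With the inversion available, the means and standard deviations are simple sample statistics of the recovered points, which is what makes this route more direct than matching moments through the forward mapping.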

Numerical experiments
The procedures we propose derive from a wise implementation of the Monte Carlo methods, as for the former, and a skillful implementation of granular computing ideas (Apolloni et al., 2008), as for the latter, however without theoretical proof of efficiency. While no worse from this perspective than the general literature in the field per se (McConaghy & Gielen, 2005), they need numerical proof of suitability. To this aim we basically work with three real world benchmarks collected by manufacturers to stress the peculiarities of the methods. Namely, the benchmarks refer to:
1. A unipolar pMOS device realized in Hcmos4TZ technology.
2. A unipolar nMOS device differing from the former in the sign (negative here, positive there) of the charge of the majority mobile charge carriers. Spice model and technology are the same, and performance parameters as well. However, the domain spanned by the model parameters is quite different, as will be discussed shortly.
3. A bipolar NPN circuit realized in DIB12 technology. DIB technology achieves the full dielectric isolation of devices using SOI substrates by the integration of the dielectric trench that comes into contact with the buried oxide layer.
The model parameters taken into consideration and the measured performances are reported in Table 1.
We have different kinds of samples for the various benchmarks as for both the sample size, which ranges from 14,000 (pMOS and nMOS) to 300 (NPN-DIB12), and the measures they report: joint measures of 4 performance parameters in the former two cases, partially independent measures of 3 performance parameters in the latter, where only HFE and VA are jointly measured. Taking into account the model parameters, and recalling the meaning of t and n in terms of number of performance and model parameters, respectively, the sensitivity of the former parameters to the latter and the different difficulties of the identification tasks lead us to face in principle one balanced problem with n = t = 4 (nMOS), and two unbalanced ones with n = 6 and t = 4 (pMOS) and n = 4 and t = 3 (NPN-DIB12). In addition, only 4 of the 6 second order moments are observed with the third benchmark.

Reverting the Spice model on the three benchmarks
Table 2. Benchmarks used for testing the proposed procedure and analysis of the identification solution. Rows: benchmarks. Columns: inferred model distribution parameters (indexed by X) and reconstructed performance parameters (indexed by Y), plus comparative levels of the tolerance regions (as a function of δ).

Fig. 2. Scatterplots of the benchmarks in Table 2 when projected on the two principal components of the target. Points: reconstructed population lying within (dark gray) and outside (light gray) the 0.90 tolerance region (black curves) identified by black points. Gray crosses: original target output; black crosses: target output uniformly spread with noise terms.

With reference to Table 2, in column θ X we report the parameters of the input multivariate Gaussian distribution we identify with the aim of reproducing the θ Y of the Y population observed through s y . Of the latter parameter, in the subsequent column we compare the values computed on the basis of θ X (referring to a reconstructed distribution, in italics) with those computed through the maximum likelihood estimate from s y (referring to the original distribution, in bold). As a further accuracy indicator, we consider tolerance regions obtained through convex hull peeling depth (Barnett, 1976) containing a given percentage 1 − δ of the performance population. In the last column of Table 2, headed by (1 − δ)/(1 − δ), we appreciate the difference between the planned tolerance rate (in bold), as a function of the identified Y distribution, and the ratio of sampled measures found in these regions (in italics). We report single values in the table cells since the results are substantially insensitive to the random components affecting the procedure, such as algorithm initialization. Rather, especially with difficult benchmarks, they may depend on the user options during the run of the algorithm. Thus, what we report are the best results we obtained, and we account for the overall trial time in the computational complexity considerations made later in this section. For a graphical counterpart, in Fig. 2 we report the scatterplot of the original Y sample and an analogous one generated through the reconstructed distribution, both projected on the plane identified by the two principal components (Jolliffe, 1986) of the original distribution. We also draw the intersection of this plane with a tolerance region containing 90% of the reconstructed points (hence δ = 0.1). Overall, these data look very satisfactory, registering a relative shift between sample and identified parameters that is always less than 0.17% for the mean values, 45% for the standard deviations and 25% for the correlations. The analogous shift between planned and actual percentages of points inside the tolerance region is always less than 2%. We distinguish between difficult and easy benchmarks, where the pMOS sample falls in the first category.
Indeed, for the remaining benchmarks the relative shifts in mean values, standard deviations and correlations decrease to 0.13%, 10% and 9%, respectively.

Given the high computational cost of the Spice models, their approximation through cheaper functions is the first step in many numerical procedures on microelectronic circuits. Within the vast set of methods proposed by researchers on the matter (Ampazis & Perantonis, 2002a; Daems et al., 2003; Friedman, 1991; Hatami et al., 2004; Hershenson et al., 2001; McConaghy et al., 2009; Taher et al., 2005; Vancorenland et al., 2001), in Table 3 we report a numerical comparison between two well-regarded fitting methods and our proposed Reverse Spice based algorithm (RS for short). The methods are Multivariate Adaptive Regression Splines (MARS) (Friedman, 1991), i.e. piecewise polynomials, and Polynomial Neural Networks (PNN) (Elder IV & Brown, 2000). Namely, we consider the θ X reported in Table 2 as the result of the nMOS circuit identification. On the basis of these parameters and through Spice functions, we draw a sample of 250 pairs (x r , y r ) that we use to feed both competitor algorithms and our own. In detail, we used the VariReg software (Jekabsons, 2010a) to implement both MARS and PNN. To ensure a fair comparison among the different methods, we: i) set equal to 6 both the number of monomials in our algorithm and the maximum number of basis functions in MARS, where we used a cubic interpolation, and ii) employ the default configuration in PNN, setting the degree of the single-neuron polynomials equal to 2. Moreover, in order to understand how the various algorithms scale with the fitting domain, we repeat the procedure with a second set θ ′ X of parameters, where the original standard deviations have been uniformly doubled. In the table we report the mean squared errors measured on a test set of size 1000, whose values are both split over the four components of the performance vector and summarized by their average.
The comparison shows similar accuracies on the most concentrated sample (the actual operational domain of our polynomials) and a small deterioration of our accuracy on the most dispersed sample, a necessary price to pay for the simplicity of our fitting function. As for the whole procedure, we reckon overall running times of around half an hour. Though not easily comparable with the computational costs of analogous tasks, this order of magnitude proves adequate for an intensive use of the procedure in a circuit design framework.
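The scaling experiment above can be mimicked with a small sketch: a least-squares fit with six monomials, trained on 250 points and tested on 1000, first on a concentrated domain and then with doubled standard deviations. Everything specific here is an assumption made for illustration: the function `spice_stand_in` is a hypothetical smooth response replacing a real Spice run, the input is two-dimensional with a single output component, and the six monomials are those of a full quadratic, which need not coincide with the monomials selected by the RS algorithm.

```python
import numpy as np


def spice_stand_in(x):
    """Hypothetical smooth response used in place of a real Spice simulation."""
    return np.exp(0.5 * x[:, 0]) + np.sin(x[:, 1])


def monomials(x):
    """The six monomials of a full quadratic in two variables."""
    x1, x2 = x[:, 0], x[:, 1]
    return np.column_stack([np.ones(len(x)), x1, x2, x1**2, x1 * x2, x2**2])


def fit_and_test(std, rng, n_train=250, n_test=1000):
    """Least-squares fit on n_train points; mean squared error on n_test points."""
    x_train = rng.normal(0.0, std, size=(n_train, 2))
    x_test = rng.normal(0.0, std, size=(n_test, 2))
    coef, *_ = np.linalg.lstsq(monomials(x_train), spice_stand_in(x_train), rcond=None)
    residual = monomials(x_test) @ coef - spice_stand_in(x_test)
    return float(np.mean(residual**2))


rng = np.random.default_rng(1)
std = np.array([0.3, 0.3])                       # assumed spreads of the concentrated sample
mse_concentrated = fit_and_test(std, rng)
mse_dispersed = fit_and_test(2.0 * std, rng)     # standard deviations uniformly doubled
print(f"MSE concentrated: {mse_concentrated:.2e}, MSE dispersed: {mse_dispersed:.2e}")
```

With a fixed, low number of monomials the neglected higher-order terms grow quickly with the width of the domain, which is the mechanism behind the accuracy loss on the dispersed sample.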

Stochastically optimizing the third benchmark model
The same NPN-DIB12 benchmark discussed in Section 4.1 was also used to run the two-step MC procedure described in Section 3.1. In particular, the estimation of the standard deviations σ X i alone in the former phase alternates with that of the cross-correlation coefficients in the latter, while the means ν X i remain fixed to their nominal values. Namely, at each iteration a sample s M = {x r }, r = 1, ..., m, with m = 5000, was generated, and the whole procedure was repeated 7 times, until over 99% of the sample instances were included in the tolerance region. Fig. 3 shows the number of selected instances for each iteration of the algorithm.
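The alternation just described can be sketched as an accept-and-refit loop. This is only a rough illustration under strong assumptions, not the procedure of Section 3.1: the linear map `A` is a hypothetical stand-in for the Spice function, the tolerance region is taken to be an ellipsoid with an assumed shape matrix `S`, the problem is two-dimensional, and the refitting rules (second moments of the accepted instances around the fixed nominal means) are a simple choice made here.

```python
import numpy as np

A = np.array([[1.0, 0.5], [0.2, 1.0]])     # hypothetical linear stand-in for the Spice map
NU = np.array([1.0, 2.0])                  # nominal means, kept fixed throughout
S = np.array([[0.5, 0.2], [0.2, 0.3]])     # assumed shape of the tolerance region in Y
DELTA = 0.1


def performance(x):
    """Map model parameters to performances (assumption: a linear map)."""
    return x @ A.T


def in_tolerance_region(y):
    """Ellipsoidal region centred on the nominal performance."""
    d = y - performance(NU[None, :])
    dist2 = np.einsum("ij,jk,ik->i", d, np.linalg.inv(S), d)
    return dist2 <= -2.0 * np.log(DELTA)   # 0.90 quantile of a chi-square with 2 dof


def two_step_mc(m=5000, coverage=0.99, max_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    std = np.array([1.0, 1.0])             # deliberately wide starting spreads
    corr = 0.0
    selected = []                          # accepted instances per iteration
    for it in range(max_iter):
        cov = np.outer(std, std) * np.array([[1.0, corr], [corr, 1.0]])
        x = rng.multivariate_normal(NU, cov, size=m)
        ok = in_tolerance_region(performance(x))
        selected.append(int(ok.sum()))
        if ok.mean() > coverage:
            break
        d = x[ok] - NU                     # deviations from the fixed nominal means
        if it % 2 == 0:                    # former phase: standard deviations only
            std = np.sqrt(np.mean(d**2, axis=0))
        else:                              # latter phase: cross-correlation only
            corr = float(np.mean(d[:, 0] * d[:, 1])
                         / np.sqrt(np.mean(d[:, 0]**2) * np.mean(d[:, 1]**2)))
    return std, corr, selected


std, corr, selected = two_step_mc()
print("iterations:", len(selected), "accepted per iteration:", selected)
```

The list `selected` corresponds to the kind of per-iteration count plotted in Fig. 3; the number of iterations needed in this toy setting depends on the assumed starting spreads and has no bearing on the 7 iterations reported for the benchmark.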

Comparing the proposed methods
In order to gain insight into the comparative performances of the proposed methods, we list their main features on the common NPN-DIB12 benchmark. Namely, in the first row of Table 4 we report the reference values of the means and standard deviations of both the X and Y distributions. As for the first variable, we rely on the nominal values of the parameters for the means, leaving empty the cell concerning the standard deviations. As for the performances, we just use the moment MLE estimates computed on the sample s y . In the remaining rows we report the analogous values computed from a huge sample of the above variables artificially generated through the statistical models we identify. Both tables show a slight comparative benefit of using the reverse modeling (row RS), in terms of both a greater variance of the model parameters and a closer similarity of the reconstructed performance parameters to the estimated ones, w.r.t. the analogous parameters obtained with the Monte Carlo method (row MC). The former feature translates into less severe constraints on the production process. The latter denotes some improvement in the reconstruction of the performances' distribution law, possibly deriving from both freeing the ν X from their nominal values and a massive use of the analytical forms of the Spice functions.

Table 4. Comparison between both model and performance moments in the reference and reconstructed frameworks.
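The comparison of moments can be illustrated with a few lines: maximum likelihood estimates of mean and standard deviation on a small reference sample, the same moments on a large artificially generated sample, and their relative shift. The data are synthetic and the means, spreads and sample sizes are assumptions chosen for illustration only.

```python
import numpy as np


def mle_moments(sample):
    """Mean and maximum likelihood standard deviation (normalised by n, not n - 1)."""
    return sample.mean(axis=0), sample.std(axis=0, ddof=0)


def relative_shift(reference, reconstructed):
    """Componentwise relative deviation of reconstructed values from reference ones."""
    return np.abs(reconstructed - reference) / np.abs(reference)


rng = np.random.default_rng(2)
true_mean = np.array([1.0, 2.0])             # assumed performance means
true_std = np.array([0.1, 0.2])              # assumed performance spreads

s_y = rng.normal(true_mean, true_std, size=(300, 2))         # stand-in for the observed sample
huge = rng.normal(true_mean, true_std, size=(100_000, 2))    # stand-in for the generated sample

ref_mean, ref_std = mle_moments(s_y)
rec_mean, rec_std = mle_moments(huge)
mean_shift = relative_shift(ref_mean, rec_mean)
std_shift = relative_shift(ref_std, rec_std)
print("relative shift of means:", mean_shift)
print("relative shift of standard deviations:", std_shift)
```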

Conclusions
A major challenge posed by new deep-submicron technologies is to design and verify integrated circuits so as to obtain a high fabrication yield, i.e. a high proportion of produced circuits that function properly. The classical approach implemented in commercial tools for parameter extraction (IC-Cap by Agilent Technology (2010), and UTMOST by Silvaco Engineered (2010)) requires a dedicated electrical characterization of a large number of devices, which in turn demands a very long time for both experimental characterization and parameter extraction. Thus, a relevant goal with these procedures is to reduce the computational time needed to obtain a statistical description of the device model. We meet this goal by using two non-conventional methods, so as to get a speed-up factor greater than 10 w.r.t. standard procedures in the literature. The second method exploits a granular construct. In spite of the methodological broadness that the attribute granular may evoke, we obtain a very accurate solution by strictly exploiting state-of-the-art theoretical results. Starting from the basic idea of considering the Spice function as a mixture of fuzzy sets, we enriched its implementation with a series of sophisticated methodologies for: i) identifying clusters based on proper metrics on functional spaces, ii) descending, direction by direction, along the ravines of the cost functions of the related optimization problems, iii) inverting the (X, Y ) mapping in case of unbalanced problems through the bootstrapping of conditional Gaussian distributions, and iv) computing tolerance regions through convex hull based peeling techniques. In this way we supply a very accurate and fast algorithm to identify the circuit model statistically. Of course, both procedures are open to further improvements deriving from an ever deeper exploitation of statistics. In addition, nobody can guarantee that they will withstand a further reduction of the technology scales. However, the underlying methods we propose could remain at the root of new solution algorithms for the yield maximization problem.