Multicanonical Monte Carlo Method Applied to the Investigation of Polarization Effects in Optical Fiber Communication Systems

Polarization-mode dispersion (PMD) is a major source of impairments in optical fiber com‐ munication systems. PMD causes distortion and broadens the optical pulses carrying infor‐ mation and lead to inter-symbol interference. In long-haul transmission systems it is necessary to limit the penalty caused by polarization effects [1], so that the probability of ex‐ ceeding a maximum specified penalty, such as 1 dB, will be small, typically 10-5 or less. This probability is referred as the outage probability. Since PMD is a random process, Monte Car‐ lo simulations are often used to compute PMD-induced penalties. However, the rare events of interest to system designers, which consists of large penalties, cannot be efficiently com‐ puted using standard (unbiased) Monte Carlo simulations or laboratory experiments. A very large number of samples must be explored using standard unbiased Monte Carlo simu‐ lations in order to obtain an accurate estimate of the probability of large penalties, which is computationally costly. To overcome this hurdle, advanced Monte Carlo methods, such as importance sampling (IS) [2], [3] and multicanonical Monte Carlo (MMC) [4] methods, have been applied to compute PMD-induced penalties [5], [6] using a much smaller number of samples. The analytical connections between MMC and IS are presented in [7], [8], [9], [10]. The MMC method has also been used to estimate the bit-error rate (BER) in optical fiber communication systems due to amplified spontaneous emission noise (ASE) [11], for which no practical IS implementation has been developed, and to estimate BER in spectrum-sliced wavelength-division-multiplexed (SS-WDM) systems with semiconductor optical amplifier (SOA) induced noise [12]. More recently, MMC has been used in WDM systems, where the

performance is affected by the bit patterns on the various channels and, in order to account for this pattern dependence, a large number of simulations must be performed [13].
In optical fiber communication systems without PMD compensators, the penalty is correlated with the differential group delay (DGD) due to PMD. As a consequence, one can apply IS to bias the DGD [2] for the computation of PMD-induced penalties. However, biasing the DGD alone is inadequate to compute penalties in compensated systems. On the other hand, the use of multiple IS in which both first-and second-order PMD are biased [3] allows one to efficiently study important rare events with large first-and second-order PMD. In [5] and [14], we used multiple IS to bias first-and second-order PMD to compute the outage probability due to PMD in uncompensated systems and in compensated systems with a singlesection PMD compensator. The development of IS requires some a priori knowledge of how to bias a given parameter in the simulations. In this particular problem, the parameter of interest is the penalty. However, to date there is no IS method that directly biases the penalty.
Instead of directly biasing the penalty, one has to rely on the correlation of the first-and second-order PMD with the penalty, which may not hold in all compensated systems. In contrast to IS, MMC does not require a priori knowledge of which rare events contribute significantly to the penalty distribution function in the tails, since the bias is done automatically in MMC.
In this chapter, we investigated and applied MMC and IS to accurately and efficiently compute penalties caused by PMD. Using these techniques, we studied the performance of PMD compensators and compared the efficiency of these two advanced Monte Carlo methods to compute the penalty of several types of compensated systems. Since Monte Carlo methods are not deterministic, error estimates are essential to verify the accuracy of the results. MMC is a highly nonlinear iterative method that generates correlated samples, so that standard error estimation techniques cannot be applied. To enable an estimate of the statistical error in the calculations using MMC, we developed a method that we refer to as the MMC transition matrix method [15]. Because the samples are independent in IS simulations, one can successfully apply standard error estimation techniques and first-order error propagation to estimate errors in IS simulations. In this chapter, we also estimate the statistical errors when using MMC and IS. Practical aspects of MMC and IS implementation for optical fiber communication systems are also discussed; in addition, we provide practical guidelines on how MMC can be optimized to accurately and rapidly generate probability distribution functions.

MMC Implementation and Estimation of Errors in MMC simulations
In this section, we show how the MMC method can be implemented to PMD emulators and to compute PMD-induced penalty in systems with and without PMD compensators, and also show how one can efficiently estimate errors in MMC simulations using the MMC Transition Matrix method that we developed [15]. For example, when using a standard, unbiased Monte Carlo simulation to calculate the probability density function (pdf) of a statistical quantity, such as the DGD, each sample drawn is independent from the other sample. Hence, when the histogram is smooth, one can infer that the error is acceptably low. The same is not true in MMC simulations because the MMC algorithm requires a substantial degree of correlation among the samples to effectively estimate the histogram, which induces a correlation between the calculated values of the probabilities of neighboring bins. Therefore, it is essential to be able to estimate errors particularly in MMC simulation to assess the accuracy of the calculation.

Multicanonical Monte Carlo method for PMD-Induced penalty
In this sub-section, we briefly review the multicanonical Monte Carlo (MMC) method proposed by Berg and Neuhaus [16], and we describe how we implemented MMC to compute the probability density function (pdf) of the differential group delay (DGD) for PMD emulators. Then, we present results showing the correlation among the histogram bins of the pdf of the DGD that is generated using the MMC method. Finally, we present results with the application of MMC to compute the PMD-induced penalty in uncompensated and singlesection compensated system. In particular, we use contours plots to show the regions in the | τ | -| τ ω | plane that are the dominant source of penalties in uncompensated and singlesection PMD compensated systems.

The multicanonical Monte Carlo method
In statistical physics applications, a conventional canonical simulation calculates expectation values at a fixed temperature T and can, by re-weighting techniques, only be extrapolated to a vicinity of this temperature [17]. In contrast, a single multicanonical simulation allows one to obtain expectation values over a range of temperatures, which would require many canonical simulations. Hence, the name multicanonical [16], [17]. The multicanonical Monte Carlo method is an iterative method, which in each iteration produces a biased random walk that automatically searches the state space for the important rare events. Within each iteration, the Metropolis algorithm [18] is used to select samples for the random walk based on an estimated pdf of the quantity of interest or control parameter, which is updated from iteration to iteration. Each new sample in the random walk is obtained after a small random perturbation is applied to the previous sample. In each MMC iteration, a histogram of the control parameter is calculated that records how many samples are in each bin. In each iteration, one generates a pre-determined number of samples that can vary from iteration to iteration. Typically, each iteration has several thousand samples. Once the pre-determined number of samples in any iteration has been generated, the histogram of the control parameter is used to update the estimate of the probability of all the bins as in [16], which will be used to bias the following iteration. After some number of iterations, typically 15 -50, the number of samples in each bin of the histogram of the control quantity becomes approximately constant over the range of interest, indicating that the estimated pdf of the control quantity is converging to the true pdf.

MMC implementation to PMD emulators
In the computation of the pdf of the DGD, the state space of the system is determined by the random mode coupling between the birefringent sections in an optical fiber with PMD, and the control parameter E is the DGD, as in [19]. When applying MMC, the goal is to obtain an approximately equal number of samples in each bin of the histogram of the control quantity. We compute probabilities by dividing the range of DGD values into discrete bins and constructing a histogram of the values generated by the different random configurations of the fiber sections. The calculations are based on coarse-step PMD emulators consisting of birefringent fiber sections separated by polarization scramblers [20]. We model the fiber using emulators with N s = 15 and N s = 80 birefringent sections. Prior to each section, we use a polarization scrambler to uniformly scatter the polarization dispersion vector on the Poincaré sphere. When polarization scramblers are present, the evolution of the polarization dispersion vector is equivalent to a three-dimensional random walk, and an exact solution [21] is available for the pdf of the DGD that can be compared with the simulations. In unbiased Monte Carlo simulations, the unit matrix R = R x (ϕ)R y (γ)R x (ψ) rotates the polarization dispersion vector before each section, such that the rotation angles around the x-axis in the i-th section, ϕ i and ψ i , have their pdfs uniformly distributed between − π and π, while the cosine of the rotation angle γ i around y-axis has its pdf uniformly distributed between − 1 and 1.
Within each MMC iteration, we use the Metropolis algorithm to make a transition from a state k to a state l by making random perturbations Δϕ i , Δγ i , and Δψ i of the angles ϕ i , γ i , and ψ i in each section, where Δϕ i , Δγ i , and Δψ i are uniformly distributed in the range − επ,επ . To keep the average acceptance ratio close to 0.5 [22], we choose the coefficient of perturbation ε = 0.09. This perturbation is small, since it does not exceed 10% of the range of the angles. In order to further optimize the MMC simulations and avoid sub-optimal solutions, the random perturbation should also be optimized. We are currently investigating the dependence of the relative error obtained in MMC simulations on the random perturbation and coefficient of perturnation used. The results of this investigation will be published in another publication.
To obtain the correct statistics in γ i , since in the coarse step method the cosine of γ i is uniformly distributed, we accept the perturbation Δγ i with probability equal to When the perturbation is not accepted, we set Δγ i = 0. The random variable with acceptance probability given by min 1,F (γ i + Δγ i )/ F (γ i ) can be implemented by obtaining a random number from a pdf uniformly distributed between 0 and 1, and then accepting the perturbation Δγ i if the random number obtained is smaller than F (γ i + Δγ i )/ F (γ i ). To introduce a bias towards large values of the control parameter E, each transition from state k to the state l in the iteration j + 1 is accepted with probability P accept (k → l) = min 1, P j (E k ) / P j (E l ) , and rejected otherwise, where P j ( E ) is the estimate of the pdf of DGD obtained after the first j iterations. At the end of each iteration we update P j ( E ) using the same recursion algorithm as in [16], so that the number of hits Current Developments in Optical Fiber Technology in each bin of the control parameter histogram becomes approximately equal as the iteration number increases.

Summary of the MMC algorithm
In the first iteration we use M 1 samples and set the pdf of the DGD P 1 ( E ) of a PMD emulator with N s sections as uniform, P 1 (E )=1/ N b (N b = number of bins). Because every step in the Metropolis algorithm will be accepted with this initial distribution, we more effectively exploit the first iteration by choosing the coefficient of perturbation ε=1 To update the pdf of the DGD at the end of this iteration we use the recursive equation as in (1), which is the same equation used in any other iteration. We then carry out an additional N − 1 iterations with M l ( 1 < l ≤ N ) samples in each iteration. We note that in general the number of samples in each iteration does not have to be the same. We now present a pseudo-code summary of the algorithm: Loop over iterations j = 1 to N -1: Loop over fiber realizations (samples) m=1 to M l : (1) start random walk on ϕ, γ, and ψ with small steps Δϕ, Δγ, and Δψ with the angles ϕ + Δϕ, γ + Δγ and ψ + Δψ.
(3) accept provisional step with probability equal to min 1,P j (E m ) / P j (E prov ) Where ĝ k j , the relative statistical significance of the k-th bin in the j-th iteration, is defined as in a given iteration, then the k-th bin has no statistical significance in this iteration. Therefore, we set g k j =0 in that iteration. The statistical significance, 0 ≤ ĝ k j ≤ 1, depends on both previous bins and previous iterations, inducing a significant correlation among P k j .
Finally, the P k where N b is the number of bins. MMC is an extension of the Metropolis algorithm [18], where the acceptance rule accepts all the transitions to states with lower probabilities, but rejects part of the more likely transitions to states with higher probabilities. As the number of iterations increases, the histogram of the number of hits in each bin will asymptotically converge to a uniform distribution (H k +1 j / H k j → 1), and the relative statistical significance will asymptotically converge to zero (ĝ k j → 0). Consequently, P j+1 will asymptotically converge to the true probability of the control parameter.
Equations (1) and (2) were derived by Berg and Neuhaus [16] assuming that the probability distribution is exponentially distributed with a slowly varying exponent that is a function of the control quantity (the temperature in their case and DGD or the penalty due to PMD in ours). This assumption is valid in a large number of problems in optical fiber communications, including the pdf of the DGD in fibers with an arbitrary number of sections [19], [23]. The recursions in (1) and (2) were derived by applying a quasi-linear approximation to the logarithm of the pdf in addition to a method for combining the information in the current histogram with that of previous iterations according to their relative statistical significance [16], [19].

Correlations
The goal of any scheme for biasing Monte Carlo simulations, including MMC, is to reduce the variance of the quantities of interest. MMC uses a set of systematic procedures to reduce the variance, which are highly nonlinear as well as iterative and have the effect of inducing a complex web of correlations from sample to sample in each iteration and between iterations. These, in turn, induce bin-to-bin correlations in the histograms of the pdfs. It is easy to see that the use of (1) and (2) generates correlated estimates for the P k j , although this procedure significantly reduces the variance [16]. In this section, we illustrate this correlation by showing results obtained when we applied MMC to compute the pdf of the DGD for a PMD emulator with 80 sections.
We computed the correlation coefficient between bin i and each bin j ( 1 ≤ j ≤ 80 ) in the histogram of the normalized DGD by doing a statistical analysis on an ensemble of many independent standard MMC simulations. The normalized DGD, | τ | / | τ | , is defined as the Current Developments in Optical Fiber Technology DGD divided by its expected value, which is 30 ps in this case. Suppose that on the l-th MMC simulation, we have P i l as the probability of the i-th bin and suppose that the average over all L MMC simulations is P ī . Then, we define a normalized correlation between bin i and bin j as where σ P i and σ P j are the standard deviation of P i and P j , respectively. The normalized correlation defined in (3) is known as Pearson's correlation coefficient [24].
The values for C ( i, j ) generated by (3) will range from -1 to 1. A value of +1 indicates a perfect correlation between the random variables. While a value of -1 indicates a perfect anticorrelation between the random variables. A value of zero indicates no correlation between the random variables.
In Figs

Estimation of errors in MMC simulations
In this sub-section, we explain why a new error estimation procedure is needed for multicanonical Monte Carlo simulations, and we then present the transition matrix method that we developed to efficiently estimate the error in MMC. Finally, we present the validation and application of this method.

Why a new error estimation procedure ?
Since MMC is a Monte Carlo technique, it is subject to statistical errors, and it is essential to determine their magnitude. In [25], we showed how to compute errors when using importance sampling. In this sub-section, we show how one can efficiently estimate errors in MMC simulations using a transition matrix method that we developed. In practice, users of Monte Carlo methods often avoid making detailed error estimates. For example, when using an standard, unbiased Monte Carlo simulation to calculate the pdf of a quantity such as the DGD, the number of samples in each bin of the pdf's histogram is independent. Hence, when the histogram is smooth, one can infer that the error is acceptably low. This procedure is not reliable with MMC simulations because, as we showed in Section 2.1.4, the MMC algorithm induces a high degree of correlation from bin to bin. While it is always best to estimate error with any Monte Carlo method, it is particularly important in MMC simulations, due to the presence of large sample-to-sample correlations on the tails of the distributions.
The existence of correlations in the samples generated with the MMC method makes calculating the errors in MMC simulations significantly more difficult than in standard Monte Carlo simulations. Also, due to the correlations, one cannot apply to MMC standard error analysis that are traditionally used for simulations with uncorrelated samples. For the same reason, one cannot determine the contribution of the variance from each iteration using standard error propagation methods as in the case with importance sampling simulations [5]. Thus, the MMC variance cannot be estimated by applying a standard error analysis to a single MMC simulation. One can in principle run many independent MMC simulations in order to estimate the error by using the standard sample variance formula [26] on the ensemble of MMC simulations. However, estimating the error of the pdf of the quantity of interest by running many independent MMC simulations is computationally costly and in many cases not feasible. One can overcome this problem with the transition matrix method that we developed.
The transition matrix method is an efficient numerical method to estimate statistical errors in the pdfs computed using MMC. In this method, we use the estimated transition probability matrix to rapidly generate an ensemble of hundreds of pseudo-MMC simulations, which allows one to estimate errors from only one standard MMC simulation. The transition probability matrix, which is computed from a single, standard MMC simulation, contains all the probabilities that a transition occurs from any bin of the histogram of the quantity of interest to any other bin after a step (or perturbation) in the MMC random walk. The pseudo-MMC simulations are then made using the computed transition matrix instead of running full simulations. Each pseudo-MMC simulation must be made with the same number of samples per iteration and the same number of iterations as in the original standard MMC simulation.
Once an ensemble of pseudo-MMC simulations has been calculated, one can use standard procedures to estimate the error. Since the transition matrix that is used in the pseudo-MMC simulations has its own statistical error, it might seem strange at first that it can be used as the basis from which to estimate the error in the MMC simulations. However, bootstrap theory assures us that such is the case [27]. Intuitively, the variation of any statistical quantity among the members of an ensemble of pseudo-MMC simulations is expected to be the same as the variation among members of an ensemble of standard MMC simulations because the simulations are carried out with the same number of samples and the same number of iterations.
To illustrate the transition matrix method, we calculated the pdf of DGD due to PMD and the associated confidence interval for two types of PMD emulators [28]. We validated our method by comparison to the results obtained by using a large ensemble of standard MMC simulations. We tested our method by applying it to PMD emulators because it was the first random phenomenon in optical fiber communication to which MMC was applied [19] and has become essential for testing biasing Monte Carlo methods. Moreover, it is computationally feasible to validate the proposed method with a large ensemble of standard MMC simulations. That is not the case for most other problems, e.g., the error rate due to optical noise [29] and the residual penalty in certain PMD-compensated systems [6].

New error estimation procedure
Here we introduce an efficient numerical procedure that we refer to as the transition matrix method, to compute statistical errors in MMC simulations that properly accounts for the contributions of all MMC iterations. The transition matrix method is a bootstrap resampling method [27], [30] that uses a computed estimate of the probability of a transition from bin i to bin j of the histogram of the DGD. In a bootstrap method, one estimates a complex statistical quantity by extracting samples from an unknown distribution and computing the statistical quantity. In the case of computing the pdf of the DGD in PMD emulators, the complex statistical quantity is the probability of each bin in the histogram of the DGD, the pseudo-samples are the DGD values obtained in the pseudo-MMC simulations, and the unknown distribution is the true transition matrix. One then repeatedly and independently draws an ensemble of pseudo-samples with replacement from each original sample and computes the statistical quantity of interest using the same procedure by which the statistical quantity was first estimated. One can then estimate the variance of the quantity of interest from these pseudo-samples using standard techniques. The bootstrap method is useful when it is computationally far more rapid to resample the original set of samples than to generate new samples, allowing for an efficient estimate of the variance.

Bootstrap method
Efron's bootstrap [27] is a well-known general purpose technique for obtaining statistical estimates without making a priori assumptions about the distribution of the data. A schematic illustration of this method is shown in Fig.4. Suppose one draws a random vector .,x n ) with n samples from an unknown probability distribution F and one wishes to estimate the error in a parameter of interest θ= f ( x ) . Since there is only one sample of θ, one cannot use the sample standard deviation formula to compute the error. However, one can use the random vector x to determine an empirical distribution F from F (unknown distribution). Then, one can generate bootstrap samples from F , x * =( x 1 * ,x 2 * ,...,x n * ), to obtain θ * = f ( x * ) by drawing n samples with replacement from x. The quantity f ( x * ) is the result of applying the same function f (.) to x * as was applied to x. For example, if f ( x ) is the median of x, then f ( x * ) is the median of the bootstrap resampled data set. The star notation indicates that x * is not the actual data set x, but rather a resampled version of x obtained from the esti-mated distribution F . Note that one can rapidly generate as many bootstrap samples x * as one needs, since those simulations do not make use the system model, and then generate independent bootstrap sample estimates of θ, θ 1  The transition matrix method that we describe in this chapter is related to the bootstrap resampling method as follows: 1. F is an estimate of the transition matrix obtained from a single standard MMC simulation; 2. x 1 * ,..., x B * , are the collection of samples that is obtained from the ensemble of pseudo-MMC simulations. We note that x b * should be computed using the exact same number of iterations and the exact same number of samples per iteration as in the original standard MMC simulation;

3.
Each θ b * , where b=1,2,...,B, is a value for the probability p k * of the k-th bin of the histogram of the DGD obtained from each of the pseudo-MMC simulations;

4.
Given that one has B independent p k * , one can obtain an error estimate for each bin in the estimated pdf of the DGD using the traditional sample standard deviation formula [26,27].

The transition matrix method
In this sub-section, we explain the transition matrix method in the context of computing errors in the pdf of the DGD for PMD emulators. The transition matrix method has two parts.
In the first part, one obtains an estimate of the pdf of the DGD and an estimate of the onestep transition probability matrix Π. To do so, one runs a standard MMC simulation, as described in Section 2.1.2. At the same time, one computes an estimate of the transition probability π i, j , which is the probability that a sample in the bin i will move to the bin j after a single step in the MMC algorithm. We stress that a transition attempt must be recorded whether or not it is accepted by the Metropolis algorithm after the fiber undergoes a random perturbation. The transition matrix is a matrix that contains the probability that a transition will take place from one bin to any other bin when applying a random perturbation. It is independent of the procedure for rejecting or accepting samples, which is how the biasing is implemented in the MMC method. An estimate of the transition matrix that is statistically as accurate as the estimate of the pdf using MMC can be obtained by considering all the transitions that were attempted in the MMC ensemble. One uses this information to build a N b × N b one-step transition probability matrix, where N b is the number of bins in the histogram of the pdf. The transition matrix Π consists of elements π i, j , where the sum of the row elements of Π equals 1. The elements π i, j are computed as And π i, j =0, otherwise. In (6), M t is the total number of samples in the MMC simulation and E m is the m-th DGD sample. The indicator function I i (E) is chosen to compute the probability of having a DGD sample inside the bin i of the histogram. Thus, I i (E) is defined as 1 inside the DGD range of the bin i, otherwise I i (E) is defined as 0. In the second part of the procedure, one carries out a new series of MMC simulations (using the transition probability matrix), that we refer to as pseudo-MMC simulations. In each step, if one starts for example in bin i of the histogram, one picks a new provisional bin j using a procedure to sample from the pdf π i , where π i ( j )= π i, j . One then accepts or rejects this provisional transition using the same criteria as in full, standard MMC simulations, and the number of samples in the bins of histogram is updated accordingly. Thus, one is using the transition matrix Π to emulate the random changes in the DGD that result from the perturbations Δϕ i , Δγ i , and Δψ i that were used in the original standard MMC simulation. In all other respects, each pseudo-MMC simulation is like the standard MMC simulation. In particular, the metric for Current Developments in Optical Fiber Technology accepting or rejecting a step, the number of samples per iteration, and the number of iterations must be kept the same. It is possible to carry out hundreds of these pseudo-MMC simulations in a small fraction of the computer time that it takes to carry out a single standard MMC simulation. This procedure requires us to hold the entire transition matrix in memory, which could in principle be memory-intensive, although this issue did not arise in any of the problems that we considered. This procedure will be useful when evaluating a transition using the transition matrix requires far less computational time than calculating a transition using the underlying physics. This is an assumption that was valid for the cases in which we considered, and we expect that it is applicable to most practical problems. An estimate of the pdf of the DGD is obtained in the final iteration of each pseudo-MMC simulation. Since the estimates of the probability in a given bin in the different pseudo-MMC simulations are independent, one may apply the standard formula for computation of the variance σ p i * is the probability of the i-th bin in the histogram of the DGD obtained in the b-th pseudo-MMC simulation and B is the total number of pseudo-MMC simulations. Thus, σ p i * is an estimate of the error in the i-th bin in the histogram of the DGD obtained in a single MMC simulation. We now illustrate the details of how we choose the provisional transition from bin i to bin j with the following pseudo-code: bin DGD of current sample = i use random number to generate x from a uniform pdf between 0 and 1:x ← U 0,1 where π i, j cdf =∑ m=1 j π i,m is the cumulative transition probability. This procedure is used to sample from the pdf π i , where π i ( j )= π i, j , and with π i, j defined as the probability that a sample in the bin i will move to the bin j.

Assessing the error in the MMC error estimation
The estimate of the MMC variance also has an error, which depends on the number of samples in a single standard MMC simulation and on the number of pseudo-MMC simulations (bootstrap samples) [31]. Here, the error due to the bootstrap resampling is minimized by using 1,000 bootstrap pseudo-MMC simulations. Therefore, the residual error is due to the finite number of samples used to estimate both the pdf of the DGD and the transition matrix in the single standard MMC simulation, i.e., in the first part of the transition matrix method. Thus, there is a variability in the estimate of the MMC variance due to the variability of the transition matrix Π as an estimate of the true transition matrix Π. To estimate the error in the estimate of the MMC variance, we apply a procedure known in the literature as bootstrapping the bootstrap or iterated bootstrap [32]. The procedure is based on the principle that if the bootstrap can estimate errors in one statistical parameter using Π , one can also use bootstrap to check the uncertainty in the error estimate using bootstrap resampled transition matrices Π * . The procedure consists of: where, In (9) and (10), is the standard deviation of p ** computed using the n-th pseudo-transition matrix. In Fig. 5, we show the relative variation of p ** and its confidence interval Δ p ** for a PMD emulator with 15 sections. We used 14 MMC iterations with 4,000 samples each (total of 56,000 samples). The confidence interval of the relative variation is defined in (8). We used a total of 80 evenly-spaced bins where we set the maximum value for the normalized DGD as five times the mean DGD. We also use the same number for bins for all the figures shown in this chapter. As expected, we observed that the error in the estimate of the MMC variance is large when the MMC variance is also large. The confidence interval Δ p ** is between (2.73 × 10 −2 , 3.19 × 10 −2 ) and (3.61 × 10 −1 , 4.62 × 10 −1 ) for | τ | / | τ | < 2. It increases to (2.68 × 10 −1 ,4.48 × 10 −1 ) when | τ | / | τ | =3 and to (4.05 × 10 −1 , 9.09 × 10 −1 ) at the largest value of | τ | / | τ | . We concluded that the estimate of the relative variation of the probability of a bin is a good estimate of its own accuracy. This result is similar to what is observed with the standard analysis of standard Monte Carlo simulations [26]. Intuitively, one expects the relative error and the error in the estimated error to be closely related because both are drawn from the same sample space. In Fig. 5, we also observe that the relative variation increases with the DGD for values larger than the mean DGD, especially in the tail of the pdf. This phenomenon occurs because the regions in the configuration space that contribute to the tail of the pdf of the DGD are only explored by the MMC algorithm after several iterations. As the number of iterations increases, the MMC algorithm allows the exploration of less probable regions of the configuration space. Because less probable regions are explored in the last iterations, there will be a significantly smaller number of hits in the regions that contribute to the tail of the pdf of the DGD. As a consequence, the relative variation will increase as the DGD increases.

Application and validation
We estimated the pdf of the normalized DGD (P DGD ) and its associated confidence interval  We monitored the accuracy of our computation by calculating the relative variation of the pdf of the normalized DGD. The relative variation is defined as the ratio between the standard deviation of the pdf of the normalized DGD and the pdf of the normalized DGD ( σ P DGD / P DGD ) . In Fig. 6, we show the relative variation when we used PMD emulators with Current Developments in Optical Fiber Technology 15 and with 80 birefringent sections. The symbols show the relative variation when we applied the procedure that we described in Section 2 with 1,000 pseudo-MMC simulations based on a single standard MMC simulation and the transition matrix method, while the solid and the dot-dashed lines show the relative variation when we used 1,000 standard MMC simulations. The circles and the solid line show the results for a 15-section PMD emulator, while the squares and dot-dashed line show the results when we used an 80-section PMD emulator. As expected, the result from an ensemble of pseudo-MMC simulations shows a systematic deviation from the result from an ensemble of standard MMC simulations for both emulators. The systematic deviation changes depending on which standard MMC simulation is used to generate the pseudo ensemble. In Fig. 6, the two dashed lines show the confidence interval of the relative variation with the 15-section PMD emulator computed using the transition matrix method, i.e., the confidence interval for the results that are shown with the circles. The confidence interval Δ p ** is between (3.04 × 10 −2 , 3.28 × 10 −2 ) and (2.76 × 10 −1 , 3.62 × 10 −1 ) for | τ | / | τ | < 2. It increases to (2.39 × 10 −1 , 4.31 × 10 −1 ) when | τ | / | τ | =3 and to (2.69 × 10 −1 , 9.88 × 10 −1 ) at the largest value of | τ | / | τ | .
While the relative variation that is computed using the transition matrix method from a single MMC simulation will vary from one standard MMC simulation to another, the results obtained from different standard MMC simulations are likely to be inside this confidence interval with a well-defined probability. The confidence interval of the relative variation was obtained using a procedure similar to the one discussed in the Section 2.2, except that we computed the relative variation of the probability of a bin using the transition matrix method for every one of the 1,000 standard MMC simulations. Therefore, we effectively computed the true confidence interval of the error estimated using the transition matrix method. We have verified that the confidence interval calculated using the double bootstrap procedure on a single standard MMC simulation agrees well with the true confidence interval in all the cases that we investigated. We observed an excellent agreement between the results obtained with the transition matrix method based on a single standard MMC simulation and the results obtained with 1,000 standard MMC simulations for both 15 and 80 fiber sections when the relative variation ( σ P DGD / P DGD ) is smaller than 15%. For larger relative variation, the true error is within the confidence interval of the error, which can be estimated using the double bootstrap method described in Section 2.2. The curves for the 80-section PMD emulator have a larger DGD range because a fiber with 80 birefringent sections is able to produce larger DGD values than is possible with a fiber with 15 birefringent fiber sections [28].
In Figs. 7 and 8, we show with symbols the results for the pdf of the normalized DGD and its confidence interval using the numerical procedure that we presented in Section 2.2. The solid line shows the pdf of the normalized DGD obtained analytically using a solution (see [21]) for 15 and 80 concatenated birefringent fiber sections with equal length. For comparison, we also show the Maxwellian pdf for the same mean DGD. In table 1, we present selected data points from the curves shown in Fig. 7. For both 15-and 80-section emulators, we find that the MMC yields estimates of the pdf of the normalized DGD with a small confi-dence interval. In Figs. 7 and 8, we see that the standard deviation ( σ P DGD ) for the DGD pdf is always small compared to the DGD pdf. The values of the relative variation ( σ P DGD / P DGD ) ranges from 0.016 to 0.541. We used only 56,000 MMC samples to compute the pdf of the DGD in a 15-section emulator, but we were able nonetheless to accurately estimate probabilities as small as 10 −8 . Since the relative error in unbiased Monte Carlo simulations is approximately given by N I −1/2 , where N I is the number of hits in a given bin, it would be necessary to use on the order of 10 9 unbiased Monte Carlo samples to obtain a statistical accuracy comparable to the results that I show in the bin with lowest probability in Figs. 7 and 8.    Fig. 6. The columns from left to right show: the normalized DGD value, the analytical probability density function, the estimated probability density function, the standard deviation computed using the transition matrix method, and the relative variation.
We would like to stress that the computational time that is required to estimate the errors using the transition matrix method does not scale with the time needed to carry out a single standard MMC simulation. For instance, it takes approximately 17.5 seconds of computation using a Pentium 4.0 computer with 3 GHz of clock speed to estimate the errors in the pdf of the DGD for the 80-section emulator using 1,000 pseudo-MMC simulations with the transition matrix method, once the transition matrix is available. The computational time that is required to compute the pdf of the DGD using only one standard MMC simulation is 60 seconds. To obtain 1,000 standard MMC simulations would require about 16.6 hours of CPU time in this case.
We also stress that it is difficult to estimate the statistical errors in MMC simulations because the algorithm is iterative and highly nonlinear. We introduced the transition matrix method that allows us to efficiently estimate the statistical errors from a single standard MMC simulation, and we showed that this method is a variant of the bootstrap procedure. We applied this method to calculate the pdf of the DGD and its expected error for 15-section and 80-section PMD emulators. Finally, we validated this method in both cases by comparing the results to estimates of the error from ensembles of 1,000 independent standard MMC simulations. The agreement was excellent. In Section 4, we apply the transition matrix method to estimate errors in the outage probability of PMD uncompensated and compensated systems. We anticipate that the transition matrix method will allow one to estimate errors with any application of MMC including the computation of the pdf of the received voltage in optical communication systems [29] and the computation of rare events in coded communication systems [33].

PMD Compensators
In this chapter, we investigated a single-section and three-section PMD compensators. A single-section PMD compensator [34], which is a variable-DGD compensator that was programmed to eliminate the residual DGD at the central frequency of the channel after compensation, and a three-section PMD compensator proposed in [35], which compensates for first-and second-order PMD. The three-section compensator consists of two fixed-DGD elements that compensate for the second-order PMD and one variable-DGD element that eliminates the residual DGD at the central frequency of the channel after compensation. The three-section compensator that we used has the first-and second-order PMD as feedback parameters. This compensator can also in principle operate in a feedforward configuration.

Single-section compensator
The increased understanding of PMD and its system impairments, together with a quest for higher transmission bandwidths, has motivated considerable effort to mitigate the effects of PMD, based on different compensation schemes [36], [37], [38]. One of the primary objectives has been to enable system upgrades from 2.5 Gbit/s to 10 Gbit/s or from 10 Gbit/s to 40 Gbit/s on old, embedded, high-PMD fibers. PMD compensation techniques must reduce the impact of first-order PMD and should reduce higher-order PMD effects or at least not increase the higher orders of PMD. The techniques should also be able to rapidly track changes in PMD, including changes both in the DGD and the PSPs. Other desired characteristics of PMD mitigation techniques are low cost and small size to minimize the impact on existing system architectures. In addition, mitigation techniques should have a small number of feedback parameters to control [39].
In this section, we describe a PMD compensator with an arbitrarily rotatable polarization controller and a single DGD element, which can be fixed [40] or variable [41]. Figure 9 shows a schematic illustration of a single-section DGD compensator. The adjustable DGD element or birefringent element is used to minimize the impact of the fiber PMD and the polarization controller is used to adjust the direction of the polarization dispersion vector of the compensator. The expression for the polarization dispersion vector after compensation, which is equivalent to the one in [42], is given by Figure 9. Schematic illustration of a single-section compensator with a monitor and a feedback element. In practical systems, the compensator will usually be part of the receiver, so that the monitor and the feedback control are integrated with the detection circuit.

Current Developments in Optical Fiber Technology
where τ c is the polarization dispersion vector of the compensator, τ f ( ω ) is the polarization dispersion vector of the transmission fiber, R pc is the polarization transformation in Stokes space that is produced by the polarization controller of the compensator, and T c (ω) is the polarization transformation in Stokes space that is produced by the DGD element of the compensator. We model the polarization transformation R pc as pc pc pc pc R = R ( )R ( )R ( ).
x y x f y f - (12) We note that the two parameters of the polarization controller's angles in (12) are the only free parameters that a compensator with a fixed DGD element possesses, while the value of the DGD element of a variable DGD compensator is an extra free parameter that must be adjusted during the operation. In (12), the parameter ϕ pc is the angle that determines the axis of polarization rotation in the y-z plane of the Poincaré sphere, while the parameter ψ pc is the angle of rotation around that axis of polarization rotation. An appropriate selection of these two angles will transform an arbitrary input Stokes vector into a given output Stokes vector. While most electronic polarization controllers have two or more parameters to adjust that are different from ϕ pc and ψ pc , it is possible to configure them to operate in accordance to the transformation matrix R pc in (12) [43].
In all the work reported in this chapter, we used the eye opening as the feedback parameter for the optimization algorithm unless otherwise stated. We defined the eye opening as the difference between the lowest mark and the highest space at the decision time in the received electrical noise-free signal. The eye-opening penalty is defined as the ratio between the back-to-back and the PMD-distorted eye opening. The back-to-back eye opening is computed when PMD is not included in the system. Since PMD causes pulse spreading in amplitude-shift keyed modulation formats, the isolated marks and spaces are the ones that suffer the highest penalty [44]. To define the decision time, we recovered the clock using an algorithm based on one described by Trischitta and Varma [45].
We simulated the 16-bit string "0100100101101101." This bit string has isolated marks and spaces, in addition to other combinations of marks and spaces. In most of other simulations in this dissertation we use pseudorandom binary sequence pattern. The receiver model consists of an Gaussian optical filter with full width at half maximum (FWHM) of 60 GHz, a square-law photodetector, and a fifth-order electrical Bessel filter with a 3 dB bandwidth of 8.6 GHz. To determine the decision time after the electronic receiver, we delayed the bit stream by half a bit slot and subtracted it from the original stream, which is then squared. As a result a strong tone is produced at 10 GHz. The decision time is set equal to the time at which the phase of this tone is equal to π /2 . The goal of our study is to determine the performance limit of the compensators. In order to do that, we search for the angles ϕ pc and ψ pc .
of the polarization controller for which the eye opening is largest. In this case, the eye opening is our compensated feedback parameter. We therefore show the global optimum of the compensated feedback parameter for each fiber realization.
To obtain the optimum, we start with 5 evenly spaced initial values for each of the angles ϕ pc and ψ pc in the polarization transformation matrix R pc , which results in 25 different initial values. If the DGD of the compensator is adjustable, we start the optimization with the DGD of the compensator equal to the DGD of the fiber. We then apply the conjugate gradient algorithm [46] to each of these 25 initial polarization transformations. To ensure that this procedure yields the global optimum, we studied the convergence as the number of initial polarization transformations is increased. We examined 10 4 fiber realizations spread throughout our phase space, and we never found more than 12 local optima in the cases that we examined. We missed the global optimum in three of these cases because several optima were closely clustered, but the penalty difference was small. We therefore concluded that 25 initial polarization transformations were sufficient to obtain the global optimum with sufficient accuracy for our purposes. We observed that the use of the eye opening as the objective function for the conjugate gradient algorithm produces multiple optimum values when both the DGD and the length of the frequency derivative of the polarization dispersion vector are very large.
The performance of the compensator depends on how the DGD and the effects of the firstand higher-order frequency derivatives of the polarization dispersion vector of the transmission fiber interact with the DGD element of the compensator to produce a residual polarization dispersion vector and on how the signal couples with the residual principal states of polarization over the spectrum of the channel. Therefore, the operation of singlesection PMD compensators is a compromise between reducing the DGD and setting one principal state of polarization after compensation that is approximately co-polarized with the signal. An expression for the pulse spreading due to PMD as a function of the polarization dispersion vector of the transmission fiber and the polarization state over the spectrum of the signal was given in [47].

Three-section compensator
Second-order PMD has two components: Polarization chromatic dispersion (PCD) and the principal states of polarization rotation rate (PSPRR) [35]. Let τ 1 be the polarization dispersion vector of the transmission line, and let τ 2 and τ 3 be the polarization dispersion vectors of the two fixed-DGD elements of the three-section compensator. Using the concatenation rule [42], the first-and second-order PMD vector of these three concatenated fibers are given by Current Developments in Optical Fiber Technology where R 2 and R 3 are the rotation matrices of the polarization controllers before the first and the second fixed-DGD elements of the compensator, respectively. In (14), τ 1w q 1 and τ 1 q 1w are the transmission line PCD and the PSPRR components, respectively, where we express the polarization dispersion vector of the transmission fiber as τ 1 = τ 1 q 1 . Here, the variable τ is the DGD and q= τ / | τ | is the Stokes vector of one of the two orthogonal principal states of polarization. The three-section PMD compensator has two operating points [35]. For the first operating point, the term τ 3 × R 3 τ 2 in (14) is used to cancel the PSPRR component R 3 R 2 τ 1 q 1w , provided that we choose R 3 and R 2 so that R 3 † τ 3 × τ 2 and R 2 τ 1 q 1w are antiparallel, where R 3 † is the Hermitian conjugate of R 3 . Note that with this configuration one cannot compensate for PCD.
For the second operating point, τ 3 × R 3 τ 2 in (14) is used to compensate for PCD by choosing R 3 † τ 3 × τ 2 and R 2 τ 1w q 1 to be antiparallel. Moreover, we can add an extra rotation to R 2 so that ( R 3 † τ 3 + τ 2 ) × R 2 τ 1 q 1 and R 2 τ 1 q 1w are also antiparallel. In this way, the compensator can also reduce the PSPRR term. In our simulations, we computed the reduction of the PCD and PSPRR components for the two operating points and we selected the one that presented the largest reduction of the second-order PMD. Finally, the third, variable-DGD, section of the compensator cancels the residual DGD τ tot after the first two sections.

Simulation results and discussions
We evaluate the performance of optical fiber communication systems with and without PMD compensators using the statistical methods of importance sampling (IS) and multicanonical Monte Carlo (MMC). Both MMC and IS can be used to bias Monte Carlo simulations to the outage probability due to PMD in optical fiber communication systems with one-section and with three-section PMD compensators. When there exist a IS bias technique available, IS is more effective than MMC because each sample in IS is independent, while the samples in MMC slowly become uncorrelated. However, the effectiveness of MMC can be comparable or even exceed that of IS in the cases in which there isn't a high correlation between the parameters that are biased in IS and the parameter of interest. This is the case of optical communication systems with PMD compensation, in which IS has to exploit a vast region of the probability space that does not contribute to the events of interest.
In Fig.10, we show the pdf of the eye-opening penalty for a system with 30 ps mean DGD and a single-section compensator. We compute the pdf using IS in which only the DGD is biased, and we also compute the pdf using IS in which both the first-and the second-order PMD are biased. We observed that it is not sufficient to only bias the DGD in order to accurately calculate the compensated penalty and its pdf. This approach can only be used in systems where the DGD is the dominant source of penalties, which is the case in uncompensated systems and in systems with limited PMD compensation. In Fig. 11, we show the outage probability as a function of the eye-opening penalty. We apply the MMC algorithm to compute PMD-induced penalties in a 10 Gbit/s NRZ system using 50 MMC iterations with 2,000 samples each. The results obtained using the samples in the final iteration of the MMC simulation (dashed and solid lines) are in excellent agreement with the ones obtained using importance sampling (open circles and squares). Here we used the results computed with importance sampling to validate the results obtained with MMC. The use of importance sampling to compute penalties in PMD single-section compensated systems was already validated with a large number of standard Monte Carlo simulations by Lima Jr. et al. [36], [14]. Therefore, the results computed with importance sampling can be used to validate the results computed with MMC. Our goal here is to show the applicability of MMC to accurately compute PMD-induced penalties in uncompensated and single-section PMD compensated systems.
In Fig. 12, we show contours (dotted lines) of the joint pdf of the magnitude of the uncompensated normalized first-and second-order PMD, | τ | and | τ ω | , computed using importance sampling, as in [48]. We also show contours for the eye-opening penalty (solid lines) of an uncompensated system with a mean DGD, | τ | , of 15 ps. The penalty contours were produced using the same samples we generated using the MMC method in the computation of the outage probability shown in Fig. 11. The MMC method automatically placed its samples in the regions of the | τ | -| τ ω | plane that corresponds to the large DGD values that have the highest probability of occurrence, which is the region that is the dominant source of penalties in uncompensated systems.  In Fig. 13, we show similar results for a system with | τ | =30 ps and a variable-DGD compensator that was programmed to minimize the residual DGD at the central frequency of the channel after compensation. In contrast to Fig.12, the MMC method automatically placed its samples in the regions of the | τ | -| τ ω | plane where | τ ω | is large and the DGD is close to its average, corresponding to the region in the plane that is the dominant source of penalties in this compensated system. These results agree with the fact that the contour plots in the region dominating the penalty are approximately parallel to the DGD axis, indicating that the penalty is nearly independent of DGD. In Figs. 12 and 13, the samples obtained using the MMC method are automatically biased towards the specific region of the | τ | -| τ ω | plane that dominates the penalty, i.e., the region where the corresponding penalty level curve intersects the contour of the joint pdf of | τ | and | τ ω | with the highest probability. We did not compute the confidence interval for the results showed in this section. In the following results, we evaluated the performance of a single-section and a three-section PMD compensator in a 10 Gbit/s nonreturn-to-zero system with a mean DGD of 30 ps. We used perfectly rectangular pulses filtered by a Gaussian shape filter that produces a rise time of 30 ps. We simulated a string with 8 bits generated using a pseudorandom binary sequence pattern. We modeled the fiber using the coarse step method with 80 birefringent fiber sections, which reproduces first-and higher-order PMD distortions within the probability range of interest [14]. The results of our simulations can also be applied to 40 Gbit/s systems by scaling down all time quantities by a factor of four. As in previous results, we used the eye opening for performance evaluation. The three-section compensator has two fixed-DGD elements of 45 ps and one variable-DGD element. The results that we present in this section were obtained using 30 MMC iterations with 8,000 samples each and using importance sampling with a total of 2.4 × 10 5 samples. We estimated the errors in MMC using the transition matrix method that we described in Section 2.2, while we estimated the errors in importance sampling as in [25].
In Fig. 14, we show the outage probability for a 1-dB penalty as function of the DGD element (τ c ) for a system with the three-section compensator that we used. We observed that there is an optimum value for τ c that minimizes the outage probability, which is close to 45 ps. We set the values for the two fixed-DGD elements of the three-section PMD compensator that we used to this optimum value. The reason why the outage probability rises when τ c becomes larger than this optimum is because large values of τ c add unacceptable penalties to fiber realizations with relatively small second-order PMD values that could be adequately compensated at lower values of τ c . We also observed that there is a relatively small dependence of the outage probability on τ c . That is because the third, variable-DGD section of the compensator cancels the residual DGD after the first two sections, which significantly mitigates the penalty regardless of the value of τ c .  In Fig. 15, we plot the outage probability (P out ) as a function of the eye-opening penalty for the compensators that we studied. The histogram of the penalty was divided into 34 evenly spaced bins in the range −0.1 and 2 dB, even though we show results from 0 to 1.5 dB of penalty. The maximum relative error ( σ P out / P out ) for the curves computed with MMC shown in this plot equals 0.13. The relative error for the curves computed with importance sampling is smaller than with MMC, and is not shown in the plot. The maximum relative error for the curves computed with importance sampling equals 0.1. The results obtained using MMC (solid lines) are in agreement with the ones obtained using importance sampling (symbols). The agreement between the MMC and importance sampling results was expected for the case that we used a single-section compensator, since this type of compensator can only compensate for first-order PMD [6], so that the dominant source of penalty after compensation is the second-order PMD of the transmission line. Hence, it is expected that MMC and importance sampling give similar results. We also observed good agreement between the MMC and importance sampling results for the three-section compensator. This level of agreement indicates that three-section compensators that compensate for the first two orders of the Taylor expansion of the transmission line PMD produce residual third and higher orders of PMD that are significantly correlated with the first-and second-order PMD of the transmission line. That is why the use of importance sampling to bias first-and second-order PMD is sufficient to accurately compute the outage probability in systems where the first two orders of PMD of the transmission line are compensated.
Significantly, we observed that the performance improvement with the addition of two sections, from the single-section compensator to the three-section compensator, is not as large as the improvement in the performance when one section is added, from the uncompensated to the single-section compensator. The diminishing returns that we observed for increased compensator complexity is consistent with the existence of correlations between the residual higher orders of PMD after compensation and the first two orders of PMD of the transmission line that are compensated by the three-section compensator.  Figures 16-18 quantify the correlation between the lower and higher orders of PMD. In Fig.  16, we show the conditional expectation of the magnitude of second-order of PMD both before and after the three-section compensator given a value of the DGD of the transmission line. In these figures, the DGD | τ | is normalized by the mean DGD | τ | and | τ ω | is normalized by | τ ω | to obtain results that are independent of the mean DGD and of the mean of the magnitude of second-order PMD. We observed a large correlation between | τ | and | τ ω | before compensation, while after compensation | τ ω | is significantly reduced and is less correlated with the DGD, demonstrating the effectiveness of the three-section compensator in compensating for second-order PMD. In Figs. 17and 18, we show the conditional expectation of the magnitude of the third-order PMD and of the fourth-order PMD, respectively, before and after the three-section compensator, given a value of the DGD of the transmission line. In both cases, we observed a high correlation of the third-and the fourth-order PMD with the DGD before and after compensation. In addition, we observed a significant increase of these higher-order PMD components after compensation, which leads to a residual penalty after compensation that is correlated to the original firstand second-order PMD.
In Fig. 19, we show contour plots of the conditional expectation of the penalty with respect to the first-and second-order PMD for a system with a three-section PMD compensator [35]. These results show that the residual penalty after compensation is significantly correlated with the first-and second-order PMD. The correlation between the higher orders of PMD with the DGD that we show in Figs. 16-18 can be estimated from the concatenation rule [42], which explicitly indicates a dependence of the higher-order PMD components on the lower order components. The increase in these higher-order components after compensation is also due to our choice of the operating point of this compensator, which is set to compensate only for first-and second-order PMD, regardless of the higher-order PMD components. It is possible that this three-section PMD compensator would perform better if all 7 parameters of the compensator are adjusted to achieve the global penalty minimum. However, finding this global optimum is unpractical due to the large number of local optima in such a multidimensional optimization space, as we found in our investigation of single-section PMD compensators [14]. On the other hand, the compensation of first-and second-order PMD using the three-section compensator that we studied here, which was proposed by Zheng, et al. [35], can be implemented in practice. In this Section, we showed that both multiple importance sampling and MMC can be used with all the compensators that we investigated to reduce the computation time for the outage probability due to PMD in optical fiber communication systems. Importance sampling in which both the first-and second-order PMD are biased can be used to efficiently compute the outage probability even with a three-section PMD compensator in which both first-and second-order PMD are compensated, which is consistent with the existence of a large corre-lation between first-and second-order PMD of the transmission line and higher orders of PMD after compensation. We directly verified the existence of these correlations. In contrast to what we presented in Fig.11, where importance sampling was used to validate the results with MMC, in the resulted subsequently presented, we used MMC to validate the results obtained with importance sampling. We used MMC to validate the results obtained with importance sampling because MMC can be used to compute penalties induced by all orders of PMD and not just penalties correlated to first-and second-order PMD as is the case with the importance sampling method. We showed that MMC yields the same results as importance sampling, within the statistical errors of both methods. Finally, we showed that the three-section compensator offers less than twice the advantage (in dB) of single-section compensators. We attribute the diminishing returns with increased complexity to the existence of correlations between the first two orders of PMD prior to compensation and higher orders of PMD after compensation.

Conclusions
In this chapter, we used MMC and IS in which both the first-and second-order PMD are biased to investigate the performance of single-section and three-section PMD compensators. We showed that both methods are effective to compute outage probabilities for the optical fiber communication systems that we studied with and without PMD compensators. The comparison of importance sampling to the MMC method not only allowed us to mutually validate both calculations, but yielded insights that were not obtained from either method alone. The development of IS requires some a priori knowledge of how to bias a given parameter in the simulations. In this particular problem, the parameter of interest is the penalty. However, to date there is no IS method that directly biases the penalty. Instead of directly biasing the penalty, one has to rely on the correlation of the first-and second-order PMD with the penalty, which may not hold in all compensated systems. In contrast to IS, MMC does not require a priori knowledge of which rare events contribute significantly to the penalty distribution function in the tails, since the bias is done automatically in MMC. Because the samples in IS are independent, IS converges more rapidly than MMC when the biased quantity is highly correlated to the parameter of interest. However this is not always the case. The applicability of IS to model a system with a three-section PMD compensator, in which both first-and second-order components of the Taylor's expansion of PMD in the frequency domain are compensated, is consistent with the existence of a large correlation between first-and second-order PMD components of the transmission line and the higher orders of PMD after compensation. Thus, even when the first two orders of PMD are compensated, these quantities prior to compensation still remain highly correlated with the residual penalty.
It is essential to carefully monitor statistical errors when carrying out Monte Carlo simulations in order to verify the accuracy of the results. Effective procedures for calculating the statistical errors in standard Monte Carlo simulations are well known and are easily implemented. Moreover, in this case, each sample is independently drawn, and the errors in each bin of the histogram will also be independent. Hence, the smoothness of the histogram is often a good indication that the errors are acceptably low. While calculating the statistical errors with importance sampling is more complicated, analytical formulae have been successfully implemented. By contrast, calculating statistical errors using MMC is not trivial. MMC generates correlated samples, so that standard error estimation techniques cannot be applied. To enable the estimate of the statistical errors in the calculations using MMC we developed a method that we refer to as the MMC transition matrix method. The method is based on the calculation of a transition matrix with a standard MMC simulations and the use of this transition matrix to draw a large number of independent samples.