Machine Learning Techniques to Mitigate Nonlinear Phase Noise in Moderate Baud Rate Optical Communication Systems

Nonlinear phase noise (NLPN) is the most common impairment that degrades the performance of radio-over-fiber networks. The effect of NLPN in the constellation diagram consists of a shape distortion of symbols that increases the symbol error rate due to symbol overlapping when using a conventional demodulation grid. Symbol shape characterization was obtained experimentally at a moderate baud rate (250 MBd) for constellations impaired by phase noise due to a mismatch between the optical carrier and the transmitted radio frequency signal. Machine learning algorithms have become a powerful tool to perform monitoring and to identify and mitigate distortions introduced in both the electrical and optical domains. Clustering-based demodulation assisted with Voronoi contours enables the definition of non-Gaussian boundaries to provide flexible demodulation of 16-QAM and 4+12 PSK modulation formats. Phase-offset and in-phase and quadrature imbalance may be detected on the received constellation and compensated by applying thresholding boundaries obtained from impairment characterization through statistical analysis. Experimental results show increased tolerance to the optical signal-to-noise ratio (OSNR) obtained from clustering methods based on k-means and fuzzy c-means Gustafson-Kessel algorithms. Improvements of 3.2 dB for 16-QAM, and 1.4 dB for 4+12 PSK in the OSNR scale as a function of the bit error rate are obtained without requiring additional compensation algorithms. A classifier, is introduced and range A proposal of a bit-based SVM as a non-parameter nonlinear mitigation approach in the millimeter-wave Radio-over-Fiber (mm-RoF) system for different modulation formats is demonstrated. Experimental results outperform the k -means algorithm and show improvements of 1.2 dB for 16-QAM, 1.3 dB for 64-QAM, 1.8 dB for 16-APSK, and 1.3 dB for 32-APSK at BER of 1e-3 with the SVM detector, respectively. Convolutional An intelligent constellation diagram analyzer is proposed to implement both modulation format recognition (MFR) and OSNR estimation by using a CNN-based deep learning technique. The experimental results showed that the OSNR estimation errors for all the signals were less than 0.7 dB and the accuracy of MFR was 100%, proving the feasibility of the proposed scheme. Maximization of capacity over deployed links require operation regime estimation based on precise understanding of transmission conditions through linear and nonlinear SNR from the received signal. The extraction of NLPN and second-order statistical moments by a neural network is trained to estimate SNR from extensive realistic fiber transmissions. Measured performance of 0.04 and 0.2 dB of standard error for the linear and nonlinear SNRs, respectively,


Introduction
Novel transceiver architectures for the 5th Generation (5G) communication networks front-and backhaul must make use of the digital signal processing (DSP)assisted transmission capabilities to optimize the trade-off among signal processing complexity, spectral efficiency and reach in the short access radio networks [1]. One key element in hybrid optical-wireless next-generation technologies is the capability to identifying the noise and distortions, their impact on the signal, and how the modulation format is adapted to that channel quality conditions [2]. For this reason, an important disadvantage of analog Radio-over-Fiber (A-RoF) systems using high-order modulation formats is its inherent propensity to transmission impairments risen from the electrical and optical domain. The immediate effect on the signal is a limitation in the dynamic range (DR) [3], in terms of reach as a function of the bit rate. In the radiofrequency (RF) communications context, oscillators are key devices of transmission chain whose main purpose is to deliver stable reference signals to perform frequency translation from baseband to passband and to provide in-phase (I)/quadrature (Q) components synchronization. As inherent in electronic devices, practical RF oscillators exhibit nonideal behavior and characteristics leading to random phase noise (PN) on the received signal. The spectral regrowth at the output of a nonlinear amplifier and IQ imbalance, combined with the effect of PN, are the major RF impairments that degrade the performance of communication systems in the electrical domain [4]. PN arises mainly by inherent passive and active components, which introduce thermal noise inside the circuitry of oscillators; therefore, due to transistor transient state's, aspects, as a rule of thumb, the higher the frequency of an oscillator, the worst is its performance in terms of PN. For coherent systems, the impact of PN over data symbols may be observed in the constellation diagram, where the main effect is a random rotation of symbols, causing their distribution over rings with elongated shapes. Because of this shape distortion, some immediate effects are (i) increased detection of erroneous symbols, (ii) spectral broadening, which can lead to interusers interference, (iii) in-band signal injection of out-of-band emission, and (iv) loss of orthogonality among multi-carrier waveforms. These effects over signal imply tackling synchronization issues leveraging on dedicated blocks of the DSP chain at the receiver [5]. The main task of the synchronization stage is to match the frequency and phase of transmitter and receiver oscillators, in both the electrical and optical domains. This process includes time recovery and tracking of the local oscillator (LO) frequency mismatch, to compensate for impairments caused by PN.
In this scenario, a critical block in the DSP chain of optical communication systems is the carrier phase recovery (CPR) block, which is required before the symbol decision takes place. It has the role of compensating the PN, directly related to the frequency noise of the lasers employed as optical carriers in the system. The noise of lasers consist of short-term random fluctuations of the optical phase and other different output parameters. Besides, the baud rate, the transmission length, and, in general, the DR of optical communication systems are limited by noise issues. The key parameter related to the noise in lasers is the linewidth, defined for a monochromatic source as the width of its optical spectrum, i.e., the power spectral density (PSD) of the emitted electrical field, oscillating at optical frequencies, as a function of frequency, wavenumber, or wavelength. An ideal electronic or optical oscillator would be, strictly speaking, monochromatic, exhibiting a zero linewidth. In other words, all the energy it emits is completely concentrated on its nominal frequency. Real oscillators are not exactly monochromatic and have a finite, nonzero linewidth. Besides, before the first laser was experimentally demonstrated, Schawlow and Townes showed that a laser linewidth could not have values lower than a fundamental quantum limit [6]. Nowadays, the most widely employed lasers in optical communications are semiconductor lasers; Henry demonstrated that this kind of lasers exhibits a linewidth enhancement factor [7], concerning to the Schawlow-Townes quantum limit. The line broadening is caused by the coupling of phase and amplitude noise, due to the variations of carrier density (i.e., of the refractive index) in the laser cavity. Further analysis of these phenomena is out of

Nonlinear phase noise (NLPN)
As demonstrated by Gordon and Mollenhauer [17], the linear phase noise associated with ASE noise from inline amplifiers interacts with the transmitted signals via the Kerr effect, generating a strong distortion over symbols, referred to as NLPN. In an optical transmission system, amplification is an essential operation to compensate for signal propagation losses. Commonly, erbium-doped fiber amplifiers (EDFAs) are employed as optical boosters in the transmitters, as well as in-line optical amplifiers and pre-amplifiers in the receivers. Their benefit in preserving the signal level among the receiver sensitivity comes with the undesired amplified spontaneous emission (ASE) noise. The main effect of NLPN consists of creating a nonlinear distortion over the shape of external symbols, which degrades the symbol error performance. To compensate for this effect, the demodulation process requires non-Gaussian decision boundaries that, in practice are not simple to realize.
In the considered scenario for the validation of the proposed method, NLPN was observed in both the 16-QAM and 4 + 12 PSK constellation diagrams. In our case, the NLPN effects are very strong, due to the low adopted baud rate, which translates to a long symbol duration, that must be compared to the coherence time t c between the local oscillator and the incoming signal from the transmitter (ideally, a baud rate of 250 MBd needs a t c around 4 ns). Usually, this effect on symbols is due to higher linewidths of the laser [18], but, in our case, the linewidth used was 1 kHz. In Table 1, a state-of-the-art synthesis and review of different methods to compensate impairments using the conventional equalization techniques through DSP techniques and the design and optimization of constellations tolerant to noises and distortions is introduced. This table presents columns with a brief explanation of the proposed method showing the metrics used to evaluate the performance, the modulation scheme, and the authors and year of publication.
For multilevel advanced modulation formats, especially for m-ary phase-shift keying (m-PSK) and m-ary quadrature amplitude modulation (m-QAM), the symbols in the constellation diagram suffer from linear amplitude noise and PN. In an ideal oscillator, the phase variance over a given time interval remains small, and the output signal is nearly periodic. However, as already mentioned, in practical oscillators, the phase random variation or phase jitter is a random variable associated with the PN random process, being the instantaneous deviation of phase from the ideal value caused by PN [28]. Either in homodyne or heterodyne detection, the beating of the received signal with a LO with the same frequency of the optical carrier enables the detection and demodulation process; however, since the phase is not stable in time, frequency will not be the same between LO and carrier, causing a lack of synchronization that translates into the degradation of the received signal. The constellation symbol will not be in its ideal place (for example, in QAM the ideal location of a symbol point i.e., 1 + i), and it will be rotated by an angle given by the phase difference between the phasor of the transmitted signal (i.e., symbol) and the phasor of the LO. If this phase difference is large, the symbols fall outside the right decision region, tracing an elongated shape in the constellation, and the symbol error performance is degraded. In long-haul scenarios, the ASE noise from inline high-gain amplifiers becomes the main source of the noise [29]. The NLPN can be compensated electronically by subtracting from the received phase a correction that is proportional to the received intensity. The optimal scaling factor is approximately equal to half of the ratio of the mean NLPN and the mean received intensity [30]. Thus, the phase jitter, i.e. the standard deviation of residual phase noise is halved, and the transmission distance is doubled. In [31], the minimization of the total noise variance is performed by using a power profile along with the link that is linear with distance.

Description
Mod. format

Authors and year
Digital signal processing and equalization algorithms Digital backpropagation (DBP) with coherent detection are proposed to mitigate dispersion effects and nonlinear impairments through noniterative asymmetric split-step Fourier method (SSFM). An optimal decision map and a heuristic for the step size and sampling rate requirements for RZ-QPSK were obtained by numerical simulation.
RZ-QPSK Ezra and Kahn [19] PN tolerance of circular multilevel quadrature amplitude modulation (C-mQAM) constellations employing different carrier phase recovery (CPR) algorithms is studied and evaluated using a differential decoding and bitmapping scheme. Carrier phase recovery achieves similar performance than the blind phase search (BPS) algorithm at lower computational complexity level, obtaining 3.8e-3 and 1e-2 BER levels for C-16QAM and C-64QAM modulation formats, respectively.

16-QAM Pakala and
Schmauss [22] An effective k-nearest neighbor (kNN) detector to further improve the performance of the optical phase conjugate (OPC) system by mitigating non-Gaussian impairments caused by NLPN is numerically evaluated.

QPSK, 16-QAM
Jiang et al. [23] Constellation design and optimization The authors propose an optimal signal constellation design (OSCD) algorithm suitable for the phase noise channels using the cumulative log-likelihood (LLR) function as the optimization criterion.

DPSK
Liu et al. [24] Joint coding and modulation format optimization using bit-interleaved coded modulation (BICM) mutual information shows that circular constellation 8-QAM is suitable for high-coding rates while rectangular 8-QAM provides the best performance for low-coding rates, and both schemes outperform start 8-QAM.

8-QAM Circular and star 8-QAM
Ríos Müller et al. [25] Prototype nonconventional constellations generated using digital pre-distortion and demodulated at the receiver using clustering were validated experimentally.

Machine learning (ML) techniques
The main challenge of ML algorithms consists of mitigating nonlinearities, where several performance parameters and error estimation techniques have been applied to linear and nonlinear impairments (chromatic dispersion, polarization mode dispersion, Kerr effect, etc.) that distort the shape of symbol points in the constellation diagram. Under adverse conditions during propagation, complex highly unpredictable analytical models should be better implemented with flexible techniques that take advantage of feature extraction and exploit the knowledge of historical data to create direct input-output relations between monitored parameters and desired outputs. The immediate effect of these ML methods is to decrease the error rate by enhancing the distance transmission or increasing the optical signal to noise ratio (OSNR) tolerance against these linear and nonlinear effects. An additional key issue is that ML in optical communication systems can work where is not possible to obtain a closed-form expression that model the optical channel [32]. Parameter estimation or system monitoring must be performed before signal demodulation, but is not a trivial task because of linear and nonlinear transmission impairments, especially when the signal crosses through different channels like the hybrid optical-wireless considered here (the Radio-over-Fiber system) has not a defined mathematical model including channel condition variations and transient impairment contributions during operation. Therefore, it is appropriate to use versatile methods that learn from data to predict i.e., the conditional probability or maximum likelihood function or to use the classical compensation method where to have a constrained channel model is possible to apply training sequences to estimate and equalize the channel response and improve the system performance. In this sense, to apply different machine learning approaches is attractive to characterize, to train and learn a mapping function between the signal features in the opticalwireless scenario, to and define non-Gaussian boundaries in constellation diagram before signal demodulation.
Machine learning has been successfully applied to a wide variety of problems in the context of telecommunication networks, mainly in wireless networks, ranging from opportunistic spectrum access, to channel estimation and signal detection in orthogonal-frequency division multiplexing (OFDM) to multiple-input-multipleoutput (MIMO) communications. Many techniques within the previous categories can be addressed at the physical layer of optical transmission for optical performance monitoring (OPM) [33][34][35][36][37][38][39][40], modulation format recognition (MFR) [41], or nonlinearity mitigation. Ideally, OPM should be implemented just with a single photodiode and machine learning algorithms that can learn the mapping between the detected signal and optical fiber channel parameters and finally predict optical fiber channel parameters from energy constellation diagrams or power eye diagrams. Moreover, machine learning algorithms enable the MFR [39] of an incoming signal to perform accurate constellation demodulation, also, to apply signal processing and detection at the receiver.
In Table 2, novel architectures and techniques under the artificial intelligence umbrella are reviewed, mainly from perspective of optical communication systems to mitigate nonlinear effects and perform signal demodulation. Machine learning techniques based on clustering or support vector machine (SVM) may assist the

Method Description Mod. format Authors year
Clustering Burst optical signal detection based on a modified k-means clustering algorithm is used to establish a threshold, and the effects of the unbalanced ratio of bits zero and one to improve detection performance in terms of SNR without preamble field for amplitude recovery. In addition, a data-aided feedforward symbol-timing recovery method based on a polynomial interpolation and maximum-likelihood estimation theory are developed.

16-APSK
Han et al. [11] definition of nonlinear boundaries for demodulation, avoiding additional stages for phase estimation and correction, extraction of features from constellations of real transmission scenarios, demonstrating the potential of artificial intelligence algorithms in next-generation optical-wireless technologies. In [42], a description of the mathematical background of machine learning techniques from a signal processing perspective applied to nonlinear transmission systems, OPM, and cross-layer network optimizations for software-defined networks (SDN) is reviewed as an alternative, unique, and powerful set of tools to conventional compensation techniques in the wireless-optical fiber context. In this chapter, clustering techniques like k-means and fuzzy c-means are used to classify data symbols of the classical 16-QAM constellation and 4 + 12 PSK modulation formats. Here, we report on an experimental RoF testbed to demonstrate the impact of techniques based on statistical analysis and advanced clustering methods for the mitigation of NPLN-induced impairments. The definition of such RoF testbed includes a clustering-based detector that allows identifying the distortive effects over symbols in the constellation diagram of both channels, the electrical and the optical fiber channel that, together, introduce particular distortions including NLPN. The processing that applies Voronoi contours before signal demodulation, defines non-Gaussian boundaries and increases noise tolerance improving the system error performance.

Oscillator phase noise model
The general model of a noisy signal in the time domain is represented by where ε(t) is the amplitude modulation (AM) noise, and ϕ(t) is the phase modulation (PM) noise, also known as phase jitter. The normalized carrier power spectrum without noise is given by The power spectrum of Eq. (1) is shown in Figure 1 with and without noise. The PN causes the instantaneous frequency to "jitter" around the carrier frequency, f c . This leads to the noise sidebands on either side of f c as shown in the picture. An important relation between the coherence time and the PN must be considered here. The coherence time is the time over which a propagating wave may be considered correlated (coherent). In simple words, it is the interval within which the wave phase is, on average, predictable. Therefore, the coherence time can be expressed as where Δv is the spectral width (Hz) of the source or, in other words, the range within which the frequency "jitters." The PN is a random process that is a function of time, but, commonly, its monolateral power spectrum density is considered and represented versus the offset frequency from the carrier, as depicted as an example in Figure 2. The phase jitter Δφ can be calculated from the power of the PN random process S ϕ (f) by the following formula: usually expressed in radians. The lower and upper integral bound f l and f u can be arbitrarily set depending on the way the phase jitter is calculated, as it will be explained in the following. It is worth noticing that the value of the integral corresponds to the PN process variance, and besides, the process to be non-zero only in the range f l to f u .
Another important random variable for the analysis of the stability of an oscillator is the time jitter Δτ, which can be calculated from the phase jitter where f c is the carrier frequency. The timing jitter, usually expressed in seconds (or fractions of seconds), represents the mean variation in time of the wave period, which can change since the wave frequency fluctuates.
For this reason, it can also be expressed in parts per million (ppm) of the period. The timing jitter is divided by the carrier frequency. The integral lower bound of Eq. (4) is f l , i.e., the closest offset frequency from considered to calculate the jitter. Therefore, the inverse of the lower bound frequency 1/f l can be considered as a good approximation, the coherence time t c , because it represents the longest time over which that curve of PN ensures that particular value of jitter. To illustrate the relation between the coherence time t c and the PN, consider Figure 2. As an example, for a baud rate of 250 MBd (equivalent to 1 Gb/s in 16-QAM), the needed coherence time corresponds is t c ≈ 4 ns, while for 10 GBd it is t c ≈ 100 ps because it is the duration of one symbol. In other words, the required signal stability should be kept for at least one symbol duration.  PN manifests as the jitter on a signal zero crossings. The variance of such (random) zero-crossing jitter is related to the variance of the phase noise itself [51]. One way to deal with the PN effect on the BER consists in measuring the parameter f sym /f c . For large values of this ratio, the noise effect on BER becomes smaller as f sym /f c increases [52]. For that range, one symbol time is short compared with the phase fluctuation period caused by the PN. Hence, the phase fluctuation in one symbol time delayed signal is virtually the same as the fluctuation in the signal before the onset of the delay. Conversely, in small f sym /f c range, the PN effect on BER becomes smaller as f sym /f c decreases. This is because the PN integration value over the receiver noise-equivalent bandwidth becomes smaller due to the filter at the receiver.
As already mentioned, a way to introduce PN in a system to measure its effect is by using larger laser linewidth on the light source, despite this is not this case because the experimental setup that will be reported here used a laser LW of 1 kHz. Therefore, from the laser's LW point of view, the non-spectral purity (nonmonochromatic) of the lasers employed for signal modulation/demodulation in coherent optical systems generally translates into PN impairment in the detected signal. The normalized output of a transmitting laser can be modeled as w tx and ϕ PN are the central emitting frequency and PN of the laser. The PN can be modeled as a Wiener process With Δϕ PN being an independent and identically distributed (i.i.d.) random variable with a mean μ = 0 and variance σ 2 ÞÁΔv is the laser LW, and T s is the symbol period. When transmission uses higher m-ary modulation formats, the effect of the laser LW must be as low as possible to avoid PN and other impairments arise from combined effects introduced from devices in the system.
Clock signals are required in almost every integrated circuit and some desired characteristics related to their quality must be considered. Digital signal processing is used to perform conversions between analog and digital signals and to process and transmit data at higher data rates and also higher resolutions. The clock signal quality can be described by timing jitter or PN measurements. Therefore, the typical clock PN measurement explores the power density spectrum (PSD) of a clock signal through the above defined time (or period) jitter as the often used jitter measurement parameter.
Through the Fourier series expansion, it can be shown that a square-wave clock signal has the same jitter behavior as its base harmonic sinusoid signal. For this case, a clock sinusoid signal with PN [51] can be written by And the period jitter being given by The signal in Eq. (9) is a phase-modulated sinewave by the PN, ϕ(t). Since the PN is usually much smaller than π/2, Eq. (8) can be approximated as From an operational point of view, therefore, it is possible to set to set a method measuring the PN process observing that the signal s clk (t) mixed with a cos(2πf c t) and filtered by a low-pass filter in the frequency domain allows to obtain the PN spectrum S ϕ (f) and its logarithmic version L(f), related by the following formula: The PN spectrum L(f) defined as the attenuation in dBc/Hz from the peak value S clk (f) at the clock frequency, f o , to a value of S clk (f) at f m , as shown in Figure 1b. The mathematical expression that represents the PN spectrum is given by Phase noise testing characterizes the spectral purity of a LO by calculating the ratio of the desired energy being delivered by the oscillator at the specified output frequency to the amount of undesired energy being delivered at adjacent frequencies. It is a measure of noise power per unit bandwidth appearing at a given offset from the carrier frequency [53]. From Eq. (12), the mean square (MS) of ϕ(t) is equivalent to the PN variance, since it is a zero-mean process, and it can be calculated by that, finally, shows the relation between the timing jitter Δτ and the PN spectrum, L(f), as The real integral bounds are set to be f l (that is not 0, but with an offset from the carrier) and f u (usually corresponding to the frequency where the PN spectrum enters the noise floor of measurement instrument). An approximated linear piecewise function using a log scale for the L(f) spectrum can be used to estimate the Δτ as presented in [51].

Clustering algorithms
According to state of the art, clustering algorithms are presented in this section to understand how we apply the algorithms to identify the change of shape and distortions and to classify each data symbol affected by PN from statistical analysis. Also, like each symbol is surrounded by a decision region, we use the Voronoi diagram to estimate the contour of each symbol, and hence, the last section describes how to apply Voronoi algorithm to obtain flexible thresholding decision boundaries.

k-means clustering algorithm
k-means is a well-known clustering algorithm applied to perform phase recovery and demodulation processes [54] in a phase-modulated RoF system. It is one unsupervised learning algorithm, whose start point is a predefined and fixed grid on IQ plane. For the modulation formats 16-QAM and 4 + 12 PSK to be evaluated, the partition corresponds to 16 clusters, where each observation is assigned to one of the k clusters defined by centroids according to the nearest mean. The algorithm aims to minimize an objective function known as squared error J and works as follows: for a cluster centroid v i = {v 1 , … ,v c } of the data-set {x 1 , … ,x N }, consisting of N observations, the algorithm [55] first groups randomly selected centroids, which are used as the beginning points for every cluster, and then performs iterative calculations to optimize the positions of the centroids, according to A data-set of N observations assigned according to this algorithm (7), create different clusters with irregular shapes. Figure 3 shows the general way in which the k-means algorithm works when no vector of centroids is predefined. The first step (Figure 3a) is to place centroids v at random locations or referenced locations, i.e., the ideal location of symbols according to the m-ary level of the modulation format used. Then, in the second step (also as in Figure 3a), each point from x k is assigned to the nearest centroid of the cluster i, by applying some distance metric, e.g., Euclidean distance (ED) between sample x k and cluster centroid v i . After this, the third step (Figure 3b) consists of updating each cluster by averaging all the points assigned to cluster i in the previous step. Finally, at the fourth step (also in Figure 3b), all the points are assigned to the closest cluster, and the centroids are updated until convergence.

Fuzzy c-means (FCM)
The FCM is an algorithm for clustering that allows to observations belonging to one, two, or more clusters with some membership degree [55]. FCM is based on the minimization of the following objective function where k is the number of clusters, N is the total number of observations. The squared norm represents the sum of squares of the distances of each data point to its assigned centroid vector c k , for a set of binary indicator variables μ ik ∈{0,1}, known as the degree of membership of the data points x n associated to the cluster centroids c k . The centroid of the cluster prototype is calculated as The goal is to find an assignment value of membership for each data point to the cluster where belongs (it means, the belonging degree per symbol for each cluster in the constellation diagram), as well as, the weighted sum of distances of each data point to its closest vector ||c k À v i || 2 that minimizes J. In (8) and (9), m > 1 is a real number that governs the influence of individual fuzzy sets in fuzzy partition, and besides, when m goes toward infinity, all clusters tend to converge to the centroids of the data-set X.

Fuzzy c-means Gustafson-Kessel (FCM-GK)
The Gustafson-Kessel algorithm [55,56] associates each sample point with its centroid and its covariance. The main feature of this clustering algorithm is the local adaptation of the distance matrix to identify clusters of different geometrical shapes in one data-set. Each cluster has its own norm-inducing matrix A i , which yields the following inner product norm: where v i is the mean for that points over cluster i (v i is referred as the cluster prototype), and the matrices A i are used as optimization variables in the c-means functional, besides, allowing each cluster to adapt the distance norm to the local topological structure of the data. The objective functional J m of GK algorithm is defined as However, the objective function cannot be directly minimized concerning A i , because it is linear in A i , and then to obtain a feasible solution, it is necessary to constrain the determinant of A i . This procedure allows the matrix A i to vary with its determinant fixed corresponding to the optimization of the cluster's shape while its volume remains constant: where ρ i is fixed for each cluster. Using the Lagrange multiplier method, the following expression for A i is obtained: Being F i the fuzzy covariance matrix of the ith cluster defined by: The substitution of Eqs. (13) and (14) in Eq. (10) gives a generalized squared Mahalanobis distance norm between x k and the cluster mean v i , where the covariance is weighted by the membership degrees in the partition matrix U = [μ ik ]. Besides, whereas the original FCM algorithm makes the implicit assumption that the clusters are spherical, the Gustafson-Kessel algorithm is not subject to this constraint and can identify ellipsoidal clusters, due to a size restriction on the covariance matrix whose determinant must be the unit. In Figure 4, a graphical description of the FCM-GK algorithm is shown. The general setting for our clustering-based demodulation (for simplicity, we only use two clusters) is applied as follows: (i) the FCM-GK (in general, also k-means and FCM procedures are similar) algorithm requires as inputs, the data, and, the number of clusters v i (predefined value for 16-QAM modulation format). The data symbols correspond to bidimensional (2-D) vectors x(k) = [x i (k) x q (k)], where x i and x q are the in-phase and quadrature (IQ) component samples projected on complex plane. From this data-set, the weighted exponent m, the tolerance criteria ε ≤ 0.001, and the partition matrix U (randomly) are all initialized; (ii) then centroid prototypes are estimated as indicated in (8) and, as shown in Figure 4a; (iii) after that, membership values are obtained for each data x k with respect to the closest centroid (see Figure 4b), calculating the distance norm (k-means and FCM uses Euclidean norm, and, FCM-GK uses Mahalanobis distance norm, but constrained to fixed determinant of A equal to 1); (iv) the matrix fuzzy partition U is updated (see Figure 4c); and, finally, (v) the criterion for termination is calculated as ||U(l) À U(l À 1)|| < ε, if convergence is achieved, the algorithm stops, if not, returns to step iii. Besides, we obtain Voronoi contours [57] to define the nonlinear decision boundaries without considering the shape or orientation of the symbol within the constellation diagram. This effect is relevant especially for the symbols in the corners, which are the most severely impacted by PN, as it is shown in Figure 4d, and explained as follow.

Voronoi contours for flexible thresholding
The Voronoi diagram by contours has been used to assist the clustering-based demodulation stage when symbol distortion is severe [11]. An approximation to its definition may be introduced as the proximity regions to a set of objects in a bidimensional plane. As an example, Figure 5 shows a Voronoi diagram for a 16-QAM and 4 + 12 PSK constellation defining the boundaries of the data symbols. This technique adds flexibility to the demodulation process because non-Gaussian boundaries caused by PN are not trivial to estimate and require nonlinear equalizers highly time-consuming and computationally expensive. In the 4 + 12 PSK modulation scheme, due to the lower ED between symbols in the external ring, the overlapping on the symbol clusters is evident, and as a direct consequence, the symbol error rate is increased at the demodulation stage. Then, an arbitrary symbol point is assigned to the site (single point-centroid-of a region inside a Voronoi partition that meets certain properties [58]) with the minimal Euclidean distance. The set of all symbol points with the same assigned site builds the Voronoi region (VR). In our case, for both modulation schemes, we have 16 sites leading to more Voronoi edges (VEs), these VEs are defined by the points which hold the same ED for two sites. But, in general, there are more than two sites requiring at least one Voronoi vertices (VV), which represents a symbol point with the same minimum Euclidean distance toward the 16 sites. In this way, the Voronoi diagram resulting as can be seen in Figure 5 on the right, defining the contours properly for all the sites corresponding to the 16 symbol points of the constellation. Figure 6 shows the schematic diagram for the RoF system setup as single channel and single polarization. A pseudorandom binary sequence (PRBS) with length 2 18 at a baud rate of 250 MBd was generated to deliver 1 Gb/s in a single channel and single polarization case. The pulse shaping was implemented using a finite impulse response (FIR) root-raised-cosine (RRC) filter with eight taps and a commonly used roll-off factor α = 0.2. After that, the modulation process was performed encoding 4 bits per symbol for both 16-QAM and 4 + 12 PSK. This signal was digitized through a Fujitsu LEIA board with a sampling frequency of 64 GSa/s with an embedded up-conversion stage that translated the baseband signal to a RF carrier at 6 GHz. The RF signal was fed into the Mach-Zehnder modulator (MZM) RF inputs to modulate a continuous wave (CW) laser. The obtained optical signal was propagated over a spool of fiber with length 78.8 km. The employed optical source was a distributed feedback (DFB) laser (LW of 100 kHz) emitting at 1550 nm, which was amplified by an EDFA before the electro-optical MZM. To test the channel performance against noise, ASE noise was generated over a bandwidth of 3.5 nm centered around the CW wavelength and injected in the channel before the receiver. The received signal was pre-amplified and filtered by an optical bandpass filter (OBPF) with a bandwidth of 3.5 nm and photodetected by a Nortel PP-10G photoreceiver with an electrical bandwidth of 13 GHz. A variable optical attenuator (VOA) was adjusted to set the power at the input of the photodetector at À5 dBm. Then, the electrical signal was entered into a digital oscilloscope with a sampling rate of 25 GSa/s, and the captured data were stored for offline processing. Homodyne detection was performed at the receiver, and the frequency difference at the transmitter between the LO and the optical carrier was not compensated. This mismatch introduces PN distortion on data symbols, mainly on outer symbols like the one shown in Figure 5. After the data acquisition, the down-conversion stage translated the RF signal to baseband; and then, the matched filtering with the same prototype characteristics of the transmitter side was applied. Besides, in the demodulation stage, the conventional rectangular grid demodulations, i.e., kmeans, FCM, and FCM-GK clustering-based methods are evaluated varying the OSNR from 16 to 36 dB, as a function of bit error rate (BER). The FCM-GK demodulation process is aided with the Voronoi diagrams to estimate the non-Gaussian boundary per symbol using different step-size per contour line.

Analysis of the results
Observing the 16-QAM constellation of Figure 5, the shape of symbols mainly in the closest to corners is quasi-ellipsoidal: this non-Gaussian shape is due to the hybrid channel nonlinearities jointly combined with NLPN, conversely, the inner clusters of symbols follows a circular behavior. It is important to highlight that the main distortions introduced by the PN are proportional to the noise variance (power) of each symbol, being this effect consistent with the higher amplitude of the external symbols as shown in Figure 7. In this figure, three rings can be clearly identified. Here, a ring is defined as the circumference along which symbols are distributed, and it is associated with an energy level on a circumference where symbols are distributed. Considering this non-Gaussian shape, and taking advantage of the clustering technique, it was possible to perform an individual symbol characterization through statistical analysis. In Figure 8, we show a 16-QAM constellation, where fragmentation per ring and symbol has been applied. We perform an individual analysis of each cluster by estimating the mean, standard deviation, variance, histogram, and contour. For example, for symbol "1001" (Gray coding), we obtain a mean of À0.9530 and a variance of 0.1986; this inner symbol grows with a circular shape like observed in an additive white Gaussian Noise (AWGN) channel. But for the symbol at the corner "0000," the shape is ellipsoidal, and the normal distribution has a higher variance, i.e. dispersion, and variance of the symbols. With these features, we can choose the appropriate clustering algorithm to mitigate the PN impact on specific symbols, i.e., by applying FCM-GK for detecting clusters with ellipsoidal shape, and k-means or conventional demodulation method based on a rectangular fixed grid in inner symbols, where the distortion is minor. As observed in the histograms of Figure 8, the symbol toward close to the center of the constellation diagram (i.e., the symbol 1111) has a Gaussian shape with lower standard deviation (σ = 0.1986) than the symbol on the corner. In the same way, the symbol 0000 spreads the Gaussian bell on the top and has a higher standard deviation value of 0.3073. This analysis allows to conclude that lower variance (or, equivalently, lower standard deviation) values correspond to more compact clusters that have circular shape, comparable with the cluster that would be obtained when only AWGN is present.
The Voronoi contour shows a more defined Gaussian shape for the inner symbol and an elongated non-Gaussian shape for the external symbol. This statistical analysis is performed for the different OSNR levels measured in the RoF experimental setup, and the combined effect of data symbol clustering with Voronoi contour enhances the demodulation stage, either using a direct detection [26] or coherent receiver. Besides, by observing the constellation characterization, a residual phase offset was detected on external symbols of constellations, then using the clustering fragmentation a the dual-stage phase offset compensation (DS-PS-RF) proposed method in [9], an additional improvement of BER can be obtained when conventional demodulation is applied with rectangular decision boundaries.  In Figure 9, we plot the BER performance vs. OSNR for 16-QAM at the errorfree condition after the forward error correction (FEC) encoding that sets the errorfree threshold at 2 Â 10 À2 . The line with blue color and circle marker corresponds to the conventional demodulation method; the error performance for FCM is the curve in orange color with a diamond marker. The other curve, with a square red marker, corresponds to the k-means algorithm, and, finally, with the asterisk green marker, the FCM-GK method is plotted. k-means and FCM-GK perform a similar gain of 2.1 dB in the OSNR scale comparing with conventional demodulation of 16-QAM; however, beyond 26 dB, the gain margin for FCM-GK over k-means holds around 1 dB for the rest of the curve up to 36 dB. Besides, for FCM-GK assisted with Voronoi contour, the OSNR achieves a gain of 3.2 dB.
The constellation inset shows the data symbol distribution for FCM-GK obtaining an error vector magnitude (EVM) of 20.1% for 16-QAM. On the other hand, in Figure 10, the BER performance results are presented for 4 + 12 PSK modulation format. An improvement in the OSNR scale of 1.4 dB for FCM-GK compared with the conventional demodulation method is observed in Figure 10, besides in the inset, an EVM of 20.9% for the 4 + 12 PSK is obtained at 22 dB of OSNR. The curve for the FCM algorithm shows the worst performance compared with the other methods. These effects are possibly due to distortions on the shape of the clusters, which are ellipsoidal and have higher overlaps due to lower Euclidean distance among clusters. Similar performance curves are shown for k-means and conventional demodulation algorithms.
The decision boundaries obtained from Voronoi contours for each cluster define non-Gaussian regions; then, when the inner symbols grow circularly (following a Gaussian shape), the conventional fixed grid or k-means demodulation techniques must be used but, in contrast, when outer symbols take ellipsoidal shape, the FCM-GK provides higher tolerance to distortions and improves the BER and OSNR. Also, it is evident that to optimize the geometric shaping is a non-trivial task because the Euclidean distance must be maximized by performing a nonconventional distribution of symbols, besides limiting the symbol energy level to avoid power penalties concerning conventional demodulation formats. However, to have an idea of the non-Gaussian behavior due to NLPN, Figure 11a and b and represent the 3D behavior of the received data symbols showing how the distortions impact and change the shape of the symbols. Besides, a residual phase offset of centroids mainly over external clusters shows that ideal centroids are shifted to newer locations over the plane as demonstrated experimentally in [34]. Additionally, it is possible to improve error performance using a second stage to estimate amplitude changes and phase offset to compensate these residual shifts observed in the constellation. The a priori initialization of centroids with a fixed number of clusters avoids introducing a  higher computational time of the evaluated algorithms. Besides, computational complexity in comparison with linear and nonlinear equalizers is an issue to be further explored in a future work. Additional specific experimental studies should be conducted to validate with more data the EVM/SNR estimation, and to define the impact of key aspects such as the shape of the clusters, or the performance for different distance norms, and the evaluation of optimized algorithms derived from conventional k-means and FCM.
Finally, the lower time-computing because of centroid initialization compensates the time spent in the calculation of inverse matrices required during the execution of Gustafson-Kessel algorithm. However, further studies on different optimization techniques must be conducted, e.g. for smaller OSNR values and using nonconventional modulation formats, implementing them in real-time scenarios, for a deeper analysis of the processing time issue.

Conclusions
We proposed and experimentally evaluated a constellation-based tool for demodulation, aided by Voronoi thresholding, clustering fragmentation, and signal demodulation, validated with 16-QAM and 4 + 12 PSK constellations, distorted especially with nonlinear phase noise. Fuzzy c-means Gustafson-Kessel was evaluated as an alternative to performing the demodulation-based clustering, taking advantage of the improved identification of clustering with non-Gaussian shapes enabled by the FCM-GK. The non-Gaussian boundaries for symbol demodulation were estimated using contour Voronoi diagrams, obtaining improvements of 3.2 dB for 16-QAM and 1.4 dB for 4 + 12 PSK using the fuzzy c-means Gustafson-Kessel algorithm. We compared the error performance for different OSNRs as a function of the BER at a threshold level of 2 Â 10 À2 . Our fragmented constellation in combination with Voronoi thresholding method mitigates distortive effects over data symbols, performing localized transmission impairments characterization by applying the individual analysis of each cluster of symbols or subconstellations (quadrant in the constellation diagram), without requiring frequency and phase carrier recovery algorithms. Besides, this method can be extended to the coherent receiver avoiding the time-consuming post-compensation algorithms to equalize the received signals. However, if during the characterization stage the distortions of the received constellation have Gaussian behavior, employing clustering-based demodulation using k-means or the conventional demodulation algorithm with fixed grid provides similar error performance as demonstrated, but the trade-off is focused on the computational effort. But if non-Gaussian shapes are identified over the constellation, then a better approach to improve the system performance is to use the FCM-GK method aided with a Voronoi diagram. Finally, a new proposal for estimating the EVM reduces the number of symbols required to calculate the error metric; this reduces the computational time because only external symbols are required to approach the result.