Aliasing Compromises Staggered-Rotamer Analysis of Polypeptide Sidechain Torsions

Circular undersampling and the ensuing aliasing effect are demonstrated to compromise nuclear magnetic resonance (NMR)-based molecular torsion-angle analysis that refers to experimental J-coupling constants when employing the staggered-rotamer model, also known as the Pachler model. This popular model is flawed insofar as it systematically produces counterintuitive probabilities for the two minor constituents out of the total three rotamers, to the effect that the apparent circular mean direction of the molecular bond conformation is inflected about its main rotamer angle, a situation that apparently went unnoticed for more than 50 years. The principal reason for the systematic errors lies in the model's ill-conceived attempt to resolve the bimodal J-coupling-angle dependency by a mere three discrete points on the circle, thereby conflicting with the Nyquist-Shannon sampling theorem. An anti-aliasing approach is offered that helps improve the results.


Introduction
Atom-atom bonds in a molecule often give rise to rotational degrees of freedom, also known as torsion angles. A torsion angle, also known as dihedral angle, is formed by three consecutive bonds in a molecule and defined by the angle between the two outer bonds projected onto a plane perpendicular to the central bond (Figure 1).
Finding out how the two parts on either side of a rotatable bond relate to each other, that is, assigning a value to the torsion angle, presents one of the challenges in molecular structure determination. A molecule adopting different geometric arrangements without breaking or making bonds is said to exhibit distinguishable conformers. Nuclear magnetic resonance (NMR) spectroscopy [2] is uniquely positioned to help characterize not only static molecular structure but also dynamical processes that involve interconversion between conformers on a short, typically nanosecond, timescale. Studying torsion-angle geometry and dynamics by NMR benefits from the measurement of ³J coupling constants [3]. Typically on the order of a few hertz, these are magnetic interaction parameters between atoms X and Y in the three-bond, four-atom fragment X-A-B-Y constituting dihedral angle θ_XY about bond AB. In essence, the ³J_XY coupling interaction is strongest when bonds XA and BY are oriented parallel (θ_XY = ±180°) and weakest if they are perpendicular (θ_XY ≈ ±90°). A secondary, smaller maximum exists for θ_XY = 0°.

The Karplus curve
Karplus [4] formulated a universal empirical relation as to how a ³J coupling constant depends on the intervening torsion angle:

$$ {}^{3}J(\theta) = C_0 + C_1 \cos(\theta + \Delta\theta) + C_2 \cos 2(\theta + \Delta\theta) \qquad (1) $$

Karplus coefficient C₀ signifies the average coupling strength obtained for a complete torsion-angle revolution. The C₂ term lends the Karplus curve its bimodal shape and determines the undulation depth. The C₁ term adds a unimodal component that affects the difference between primary and secondary maxima. Typically, C₁ is negative in order for the primary coupling-constant maximum to appear in the so-called trans conformation, when the X-A-B-Y angle is at ±180° and the atom-atom interaction is strongest.
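The roles of the three coefficients can be checked numerically. The following is a minimal sketch, assuming the common three-term cosine form of the Karplus relation with an optional phase offset; the function name `karplus` and the coefficient values are hypothetical and serve illustration only.

```python
import math

def karplus(theta_deg, c0, c1, c2, phase_deg=0.0):
    """Assumed three-term form: J = C0 + C1*cos(t) + C2*cos(2t), t = theta + phase."""
    t = math.radians(theta_deg + phase_deg)
    return c0 + c1 * math.cos(t) + c2 * math.cos(2.0 * t)

# Hypothetical coefficients in Hz, chosen for illustration only (C1 negative).
C0, C1, C2 = 7.0, -1.0, 5.0

j_trans = karplus(180.0, C0, C1, C2)  # primary maximum, C0 - C1 + C2
j_cis = karplus(0.0, C0, C1, C2)      # secondary maximum, C0 + C1 + C2
j_perp = karplus(90.0, C0, C1, C2)    # C0 - C2, close to the curve minimum
# Averaging over a full revolution leaves only C0.
j_mean = sum(karplus(a, C0, C1, C2) for a in range(360)) / 360.0

print(j_trans, j_cis, j_perp, round(j_mean, 6))
```

With a negative C₁ the trans value exceeds the cis value by 2|C₁|, which is the difference between primary and secondary maxima mentioned above.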
We focus here on biomolecular structure analysis of the so-called χ₁ torsion in amino-acid residues, the constituent units of polypeptide and protein chains [1], where the value of χ₁ refers to the dihedral angle θ_N′Cγ about the bond between carbons Cα and Cβ (Figure 2). It is imperative to understand that, for each of the up to nine possible atom-pair combinations formed from X = N′, C′, or Hα ("front") and Y = Cγ, Hβ2, or Hβ3 ("rear"), the phase parameter Δθ in the Karplus curve must be either 0° or +120° or −120°. This crucially sets up and determines the invariable three-point sampling of the Karplus curve in the situation of χ₁.

[Figure caption] Staggered-rotamer states for amino-acid sidechain torsion χ₁ viewed down its central Cα-Cβ bond. Most amino acids exhibit two Hβ protons oriented trans or gauche with respect to Hα as indicated by a shorthand [5].

The staggered-rotamer model
Pachler [6] employed ³J_HαHβ2/³J_HαHβ3 coupling data pairs in order to quantify populations of conformational states in ethane-like compounds, such as those encountered in amino-acid sidechains. According to Pachler's model, the torsion χ₁ in the amino-acid sidechain is considered to adopt one of three so-called staggered rotamers, or to hop between two or between all three of them, characterized by nominal χ₁ angle values of −60°, ±180°, or +60° (Figure 2). An alternative set of angles comprising −120°, 0°, and +120° would give rise to eclipsed rotamers, which, however, are unfavorable in view of molecular energy and, thus, are disregarded for reasons of limited or vanishing lifetime. The terms Pachler model and staggered-rotamer model are used synonymously.
The Pachler model allows one to deduce probabilities for each of the three staggered states once a pair of experimental ³J coupling constants is available. Historically, the hydrogen-hydrogen coupling constants J_HαHβ2 and J_HαHβ3 were the first ones accessible to measurement [7]; heteronuclear coupling constants were used as alternatives later [8]. Accordingly:

$$ P = \frac{J_{XY} - J_{gauche}}{J_{trans} - J_{gauche}} \qquad (2) $$

Eq. (2) derives staggered-rotamer probabilities from the measured J_XY coupling constants by interpolating between the specific coupling values J_180° and J_±60°, also designated as J_trans and J_gauche, respectively, that correspond to the specified fixed geometries for the θ_XY angle. Here, P is the probability of the staggered state that places X and Y trans.
The Pachler model sets up the experimental observable as the corresponding probability-weighted, state-averaged value of J (indicated by angle brackets):

$$ \langle J_{XY} \rangle = P_{-60}\,J_{-60} + P_{180}\,J_{180} + P_{+60}\,J_{+60} \qquad (3) $$

Both models, the dependency of ³J couplings on the torsion angle according to Karplus (continuous model) and the dependency of the torsion-angle distribution on relative ³J coupling-constant pairs according to Pachler (discrete model), form an indispensable basis of many a biomolecular NMR structure investigation.
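The interpolation and the state average can be illustrated with a few lines of code. This is a minimal sketch, assuming that both gauche states share one J_gauche value and that the third population follows from normalization; the function names and all numerical values are hypothetical.

```python
def pachler_population(j_obs, j_trans, j_gauche):
    """Population of the rotamer that places the coupled pair trans.

    Linear interpolation: j_obs == j_gauche gives 0, j_obs == j_trans gives 1.
    """
    return (j_obs - j_gauche) / (j_trans - j_gauche)

def state_averaged_j(p_trans, j_trans, j_gauche):
    """Probability-weighted average; both gauche states share j_gauche."""
    return p_trans * j_trans + (1.0 - p_trans) * j_gauche

# Hypothetical limiting couplings and measurements in Hz (illustration only).
J_TRANS, J_GAUCHE = 13.0, 4.0
p_a = pachler_population(10.3, J_TRANS, J_GAUCHE)  # from the first coupling
p_b = pachler_population(3.1, J_TRANS, J_GAUCHE)   # from the second coupling
p_c = 1.0 - p_a - p_b                              # third state by normalization

print(round(p_a, 3), round(p_b, 3), round(p_c, 3))
```

Note that a measured coupling below the assumed J_gauche, as in the second call, yields a negative population, since nothing in the interpolation constrains the result to the range 0 to 1.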

Continuous-discrete model interconversion
Sets of probabilities associated with the three staggered rotamers in the traditional Pachler model of amino-acid sidechain torsion-angle variability transform into equivalent continuous angle distributions and, vice versa, discrete staggered-rotamer probabilities can be computed from any given angular direction in continuous circular space.
Mean direction θ and concentration R of a continuous angle distribution convert into corresponding normalized probabilities for three discrete-state samples evenly distributed on the circle, as described in [9]. Conversely, discrete staggered-rotamer probabilities aggregate into the circular mean direction of a continuous angle distribution and into its circular order parameter (concentration), which runs between 1 (static fixed angle) and 0 (rotational average); in these expressions, P_x and P_y signify any pair out of the three probabilities P_−60, P_180, and P_+60.
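The two directions of the conversion can be sketched as follows. The forward expression used here, P_k = (1 + 2R cos(θ − θ_k))/3, is a standard first-harmonic form and an assumption of this sketch, not necessarily the notation of [9]; the reverse direction takes the resultant vector of the three weighted unit vectors. All function names are hypothetical.

```python
import cmath
import math

STAGGERED_DEG = (-60.0, 180.0, 60.0)

def to_populations(mean_deg, r):
    """Assumed forward form: P_k = (1 + 2 R cos(theta - theta_k)) / 3."""
    return [(1.0 + 2.0 * r * math.cos(math.radians(mean_deg - a))) / 3.0
            for a in STAGGERED_DEG]

def to_mean_and_order(pops):
    """Resultant vector of the weighted staggered directions."""
    z = sum(p * cmath.exp(1j * math.radians(a))
            for p, a in zip(pops, STAGGERED_DEG))
    return math.degrees(cmath.phase(z)), abs(z)

pops = to_populations(40.0, 0.8)
mean_back, r_back = to_mean_and_order(pops)

# Pairwise-product form of the squared order parameter for three states.
pair_sum = pops[0] * pops[1] + pops[0] * pops[2] + pops[1] * pops[2]
r_squared = 1.0 - 3.0 * pair_sum

print([round(p, 4) for p in pops], round(mean_back, 4), round(r_back, 4))
```

Under these assumptions the round trip recovers the input direction and concentration, and a static angle that coincides with a staggered state (R = 1) maps onto a single population of one. For a sharply defined angle away from a staggered state, one of the three populations turns negative.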
Swapping ω and its conjugate yields the inverse transform, including normalization (Eq. (9)). Now consider probabilities at the N = 3 equispaced samples on the circle, that is, P_180 = M_0/3, P_−60 = M_1/3, and P_+60 = M_2/3, inserted in Eq. (9), informing the first three modes p_m (m = 0, 1, 2) of a circular probability density.

Continuous-discrete model interconversion is transparent regarding Fourier transform. Combining circular modes p, the transforms of the P domain, yields the circular mean direction as the first circular moment¹ and the (squared) circular order parameter, related to auto-correlation [9], as the second circular moment.

The zeroth mode quotes the trivial chance of finding the direction at all. First and second modes, respectively, then supposedly represent amplitudes of features occurring once per cycle (m = 1) and twice per cycle (m = 2). However, these are conjugates of the same circular frequency! Thus, mode p_2 does not reflect twice the rate of p_1. This peculiarity is unique to the 3-point DFT and ambiguates the meaning of single and double phase advances through the circle (Figure 3), because if, and only if, N = 3, the identities ω^{+2} = ω^{−1} and ω^{−2} = ω^{+1} hold. Apparently, periodic wrapping equates the double angular speed, ω^{2}, to the conjugate of the single angular speed, ω^{−1}, thereby folding, under sign inversion, samples M_1 and M_2 onto each other, as visualized in Figure 4.

¹ In analogy to linear statistics, circular modes are series of multiple-angle arguments, whereas circular moments are power series in the trigonometric operators. Moments are simple combinations of modes; the inverse does not hold.
All issues regarding sampling-rate doubling, sign change in the real portion of the complex numbers, and data-pair ambiguity hold for both forward and inverse three-point transforms. Both setups produce identical results, making it effectively impossible to reconstruct clockwise and counterclockwise arrangements of the circular samples, or P_+60 and P_−60, for that matter. Entirely independent of input and output data, the effects arise solely from the transformation operator and are, therefore, model-inherent.
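The coincidence of the double angular speed with the conjugate single speed can be verified directly. The sketch below assumes ω = exp(2πi/N) as the primitive N-th root of unity and a 1/N normalization of the transform; both conventions, and the function names, are assumptions of this illustration.

```python
import cmath

def root_of_unity(n):
    """Primitive n-th root of unity, exp(2*pi*i/n)."""
    return cmath.exp(2j * cmath.pi / n)

def double_speed_equals_conjugate(n, tol=1e-12):
    """True if omega**2 coincides with the complex conjugate of omega."""
    w = root_of_unity(n)
    return abs(w ** 2 - w.conjugate()) < tol

# Among non-trivial sample counts, only N = 3 satisfies the identity.
results = {n: double_speed_equals_conjugate(n) for n in range(2, 9)}

def dft3(samples):
    """Three-point DFT with an assumed 1/3 normalization."""
    w = root_of_unity(3)
    return [sum(s * w ** (-m * k) for k, s in enumerate(samples)) / 3.0
            for m in range(3)]

# Hypothetical real-valued samples: modes 1 and 2 come out as conjugates.
modes = dft3([0.89, 0.23, -0.12])
print(results)
print(modes)
```

For real-valued input, the m = 2 mode carries no information beyond the m = 1 mode, which is the ambiguity described above.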
Regarding the unique case of three staggered rotamers, let us also subject the Karplus coefficients C₀ = m_0, C₁ = m_1, and C₂ = m_2 to the three-point DFT according to Eq. (7), which yields precisely those specific J-coupling values needed to compute the state-averaged J value of Eq. (3). Inverse-transforming the values of J_trans and J_gauche, inserted as J_180 = M_0/3, J_+60 = M_1/3, and J_−60 = M_2/3 into Eq. (9), recovers the three coefficients (i.e., modes) of the Karplus Eq. (1).
Mathematically, this looks clean and entirely reversible. In analogy to the modes of the circular probability density in Eq. (14), the DFT obtained for both J_+60 and J_−60 inserted in Eq. (3) would necessarily result in complex-conjugate numbers. And yet, deliberately ignored was the fact that theoretical as well as experimentally observed J-coupling constants are generally considered real data and, therefore, J values for the gauche+ and gauche− states are ordinarily identical. Clockwise and counterclockwise orientations become indistinguishable once again. At any rate, disregarding imaginary components amounts to loss of information, introducing a subtle irreversibility into the staggered-rotamer analysis that cannot be corrected.

[Figure caption] The Karplus curve is shown in black. Aliased unimodal (red) and bimodal (blue) curves, respectively, result from coefficient C₂ folding onto C₁ and, vice versa, C₁ folding onto C₂. Associated with staggered rotamers are those invariant focal points J_trans and J_gauche through which all curves pass. Small panels: sampling properties of rotamer models in the contexts of 3-point and 6-point discrete Fourier transforms of Karplus curves. Each coefficient C_m as the m-th circular mode (bars) connects with an effective circular sampling rate in units of (2π rad)⁻¹, m being the periodicity of that component. Bottom: sampling the bimodal coupling-angle dependency at three 120°-equispaced (e.g., staggered-rotamer) angles corresponds to a rate of 1.5 π⁻¹ rad⁻¹ (dotted line), too low to resolve the higher C₂ mode. Consequently, C₂ inverts and folds into the lower frequency band and stacks onto, or aliases, the negative C₁ mode (red bar). The original C₂ amplitude appears blanked. Reconstructed from such distorted coefficients, the mis-sampled Karplus curve appears purely unimodal. Top: alternatively, C₁ folds under sign inversion onto the C₂ mode (blue bar), generating a purely bimodal curve. Middle: aliasing of the high mode is avoided with a 6-state staggered-eclipsed 60°-equispaced rotamer model that samples the circle at twice the rate, 3 π⁻¹ rad⁻¹ (dash-dotted line).
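The folding of C₂ onto C₁, and vice versa, can be reproduced numerically. The sketch below assumes the three-term cosine Karplus form with zero phase and uses hypothetical coefficients; it relies on the fact that cos 2θ equals −cos θ at the three staggered angles, so that the three curves cannot be told apart there.

```python
import math

def karplus(theta_deg, c0, c1, c2):
    """Assumed three-term form: J = C0 + C1*cos(theta) + C2*cos(2*theta)."""
    t = math.radians(theta_deg)
    return c0 + c1 * math.cos(t) + c2 * math.cos(2.0 * t)

# Hypothetical coefficients in Hz, chosen for illustration only.
C0, C1, C2 = 7.0, -1.0, 5.0

def unimodal_alias(theta_deg):
    """C2 folded, under sign inversion, onto C1."""
    return karplus(theta_deg, C0, C1 - C2, 0.0)

def bimodal_alias(theta_deg):
    """C1 folded, under sign inversion, onto C2."""
    return karplus(theta_deg, C0, 0.0, C2 - C1)

STAGGERED = (-60.0, 180.0, 60.0)
ECLIPSED = (-120.0, 0.0, 120.0)

for angle in STAGGERED + ECLIPSED:
    print(angle,
          round(karplus(angle, C0, C1, C2), 3),
          round(unimodal_alias(angle), 3),
          round(bimodal_alias(angle), 3))
```

All three curves pass through the same J_trans and J_gauche values at the staggered angles and separate only at intermediate angles such as the eclipsed ones, which a six-point sampling would include.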

Worked example: selected sidechain rotamers in a protein
In the course of extensive protein-structure studies by NMR, ³J coupling constants related to amino-acid sidechain torsions were measured in a variety of proteins. Ongoing investigations target sidechain χ₁ torsion structure in ribonuclease T₁ (RNase T₁), an enzyme of 104 amino acids, for which experimental ³J coupling data are deposited with the BMRB [11]. Applying up to nine available coupling constants as experimental constraints, two models, a continuous single-state torsion angle and the discrete staggered-rotamer populations, were least-squares fitted for each amino acid in the enzyme. Exemplifying the present issues, RNase T₁ contains three histidines, each of which adopts a different predominant staggered state with only little dispersion about its respective circular mean direction. Accordingly, the torsions lock χ₁ conformations near ±180° in His27 (as evident from a large trans-³J_N′Cγ coupling constant, Figure 2), near +60° in His40 (large trans-³J_HαCγ), and near −60° in His92 (large trans-³J_C′Cγ).
The predominant χ₁ rotamer in His92 is populated at P_−60 = 92%, with only negligible or minor contributions from P_180 = 0% and P_+60 = 8%, which together make for a circular mean direction of −56° (Figure 5). The single-state model converges at −72°, somewhat "mirrored" or inflected about the ideal −60° staggered state.
The His27 torsion noticeably tilts away from the ideal ±180° staggered angle, the circular mean direction from the discrete model being −160°. However, a χ₁ value of +158° results from fitting a continuous single-state angle and would imply a primary ±180° and a secondary +60° rotamer, broadly in 3:1 proportion [9]. Yet, populations emerging from staggered-rotamer analysis (P_−60 = 23%, P_180 = 89%, P_+60 = −12%) suggest −60° as the more significant secondary constituent. This, together with a decidedly negative probability for the +60° rotamer, results in an "opposite" circular mean direction, again inflected about the constituent ±180° state. Similarly, discrete staggered-state populations in His40 emerge as P_−60 = 0%, P_180 = 22%, and P_+60 = 78%, making for a circular mean direction of approximately +76°, while χ₁ converges at +36° in the continuous model. In the staggered-rotamer model, a mean direction deviating from +60° toward smaller angle values would command a significant contribution from the −60° state to the overall average. Once again, the discrete model suggests the less plausible ±180° conformer.
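The circular mean directions quoted for the three histidines can be recomputed from the listed populations. The sketch below takes the mean direction as the argument of the probability-weighted sum of unit vectors at −60°, 180°, and +60°; the function name and dictionary layout are hypothetical, and the populations are the rounded percentages given in the text, so agreement is expected only to within about a degree.

```python
import cmath
import math

STAGGERED_DEG = {"P-60": -60.0, "P180": 180.0, "P+60": 60.0}

def circular_mean_deg(pops):
    """Argument of the probability-weighted resultant vector, in degrees."""
    z = sum(p * cmath.exp(1j * math.radians(STAGGERED_DEG[key]))
            for key, p in pops.items())
    return math.degrees(cmath.phase(z))

# Populations as quoted in the text (rounded to whole percent).
histidines = {
    "His92": {"P-60": 0.92, "P180": 0.00, "P+60": 0.08},
    "His27": {"P-60": 0.23, "P180": 0.89, "P+60": -0.12},
    "His40": {"P-60": 0.00, "P180": 0.22, "P+60": 0.78},
}

means = {name: circular_mean_deg(p) for name, p in histidines.items()}
for name, value in means.items():
    print(name, round(value, 1))
```

In each case the discrete-model mean falls on the opposite side of the nearest staggered angle from the continuous single-state fit (−72°, +158°, and +36°, respectively).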
Even though both fitted models agree as regards the main conformer in each case, state populations of the minor conformers in the staggered-rotamer model always seem somewhat counterintuitive. Deviating from a single ideal staggered state, the inflection phenomenon in the discrete model consistently manifests as a false apparent mean direction in the rotationally opposite sense, off by approximately as many degrees as the correct mean direction derived from the continuous model deviates from the nearest ideal staggered state. How do we know it is the staggered-rotamer model delivering wrong results and not the continuous model? Because, were the angle tilted to the "opposite/other side," the corresponding set of probabilities (top panel of Figure 6) would generate a value of J typically in disagreement with the experimental value, for example, for ³J_HαHβ3, as read off the Karplus curve for the given angle (bottom panel of Figure 6).
Such discrepancy is not limited to specific amino-acid residue types; rather, it is seen, without exception, in all rotamer studies. Revisiting ³J data collected for a different protein, Desulfovibrio vulgaris flavodoxin [12], confirmed the suspicion that a systematic issue is at work. Invariably, counterintuitive admixtures of rotamers are calculated for all non-ideally staggered sidechain conformations.

A compromise fix: anti-aliasing
Anti-aliasing signifies any procedure that attempts to ameliorate adverse effects from aliasing due to coarse sampling. Anti-aliasing applied in, for example, image processing helps smooth jagged lines and edges that result from digitizing continuum data into discrete samples [13]. Generally, such approaches form weighted averages over a number of adjacent data points and, therefore, constitute irreversible data manipulation.
Anti-aliasing applied to the discrete rotamer model helps restore some bimodal curve feature in the distorted coupling-angle relationship and would also minimize unrealistic negative excursions in the probability parameters. Eq. (2) introduces into the traditional interpretation of staggered rotamers a strict one-to-one correspondence between each coupling and precisely one staggered-state population. This permits the remaining two population parameters to be manipulated at will without interfering with that respective correspondence, offering an opportunity to diminish differential probability and improve the population parameters generally. The approach taken here represents the improved symmetrical, balanced variant of the range-bound probability fitting devised in [9] by manipulating all probability parameters equally and simultaneously:

$$
\begin{aligned}
{}^{aa}P_{-60} &= P_{-60} + z\left(-P_{-60}P_{180}/2 - P_{-60}P_{+60}/2 + P_{180}P_{+60}\right) \\
{}^{aa}P_{180} &= P_{180} + z\left(-P_{-60}P_{180}/2 + P_{-60}P_{+60} - P_{180}P_{+60}/2\right) \\
{}^{aa}P_{+60} &= P_{+60} + z\left(+P_{-60}P_{180} - P_{-60}P_{+60}/2 - P_{180}P_{+60}/2\right)
\end{aligned}
\qquad (17)
$$

Underlying the anti-aliasing principle in Eq. (17) is the observation that the smallest probability derived from the Pachler model is always too small, if not negative, while that of the main rotamer is always too large. Therefore, the small P value is raised (one positive increment) at the expense of both other parameters, from each of which half the amount is taken off (two negative half increments). Applying the correction in turn to each set gives rise to the anti-aliasing matrix in Eq. (17). Normalization is critically ensured as each product term added (or subtracted) is subtracted (or added) elsewhere.

[Figure caption] Conventionally optimized discrete staggered-rotamer probabilities for histidines in RNase T₁, and the effect of anti-aliasing applied to these. Probabilities P_±180 (red), P_−60 (green), and P_+60 (blue) are shown as area-proportional circles, and dials indicate the effective circular mean direction obtained from these. Crosses mark torsion angles that best fit the J-coupling data using a continuous variable-angle single-state model. Anti-aliasing improves the probabilities insofar as apparent mean directions for the modified sets are closer to a staggered state than for the original and also agree better with torsion angles from the single-state fit result. Anti-aliasing helps inflect the apparent mean direction about the nearest staggered angle toward the opposite side. Most noticeably, His27 is an example of a predominant trans rotamer with clockwise deflection of its apparent circular mean direction, chiefly due to the conventionally fitted negative probability of the +60° state (open circle). Anti-aliasing reverts the apparent mean into a counterclockwise direction and also inverts the implausible value of P_+60.
Values between 0.5 and 1.5 for the anti-aliasing parameter z form a reasonable compromise between such diverse effects as differential probability, negative probability, and unimodality of the coupling-angle dependence [9]. By choosing z = 1, all eclipsed-state populations amount to one-third, making these situations indistinguishable from the complete rotational-average limit. In contrast to the conventional staggered-rotamer model, the anti-aliased model typically inverts and reduces differential probability between the two minor rotamers (Figure 7).
Aliasing affects primarily the probabilities and, consequently, also any back-calculated (as opposed to experimental) coupling values. Only the mean conformation-averaged calculated J couplings remain invariant when inserting anti-aliased probabilities aaP into Eq. (3), equaling those obtained with the unmodified P values.
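The correction of Eq. (17) is straightforward to apply. The sketch below is a minimal illustration that assumes the leading term of the trans line is P_180, as normalization requires; the function name `anti_alias` is hypothetical, and the input is the rounded His27 population set quoted earlier.

```python
def anti_alias(p_m60, p_180, p_p60, z=1.0):
    """Apply the pairwise-product correction of Eq. (17) to three populations."""
    a = p_m60 * p_180  # product P(-60) * P(180)
    b = p_m60 * p_p60  # product P(-60) * P(+60)
    c = p_180 * p_p60  # product P(180) * P(+60)
    aa_m60 = p_m60 + z * (-a / 2.0 - b / 2.0 + c)
    aa_180 = p_180 + z * (-a / 2.0 + b - c / 2.0)
    aa_p60 = p_p60 + z * (a - b / 2.0 - c / 2.0)
    return aa_m60, aa_180, aa_p60

# His27 populations as quoted in the text (rounded to whole percent).
original = (0.23, 0.89, -0.12)
corrected = anti_alias(*original, z=1.0)

print([round(p, 4) for p in corrected], round(sum(corrected), 12))
```

For this input and z = 1, the negative +60° population turns positive and exceeds the −60° population, while the three corrected values still sum to one because every product term enters the three lines with weights that cancel.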

[Figure caption, truncated] … (1). Annotated RNase-histidine examples highlight large differential probability between the minor rotamer pairs (top) as well as counterintuitive coupling values (bottom) as seen for all deflections from ideal staggered states (shaded zones).
In an attempt to improve the coupling estimates, anti-aliasing can be implemented in two possible ways:
i. Applied as a post-processing procedure, the fitted probabilities are modified (anti-aliased) according to Eq. (17), and the coupling estimates in Eq. (3) are amended in accordance with the updated values of P_x,y,z. As least-squares parameter optimization already minimizes the discrepancy between observed and calculated J coupling values, any subsequent parameter modification will invariably increase the overall fit error in J. After all, we are interested less in the fit error than we are in plausible and acceptable molecular descriptions.
ii. Applied during data fitting, anti-aliasing and optimization of the probability parameters will interfere and cancel somewhat. Again, optimization strives to minimize the difference between observed and calculated J values; only this time, the instantaneous fit error feeds back on the continual probability-parameter adjustment, counteracting the anti-aliasing due to the model-inherent divergence between calculated and true coupling-angle dependence.
In practice, parameters optimized following option (ii) converged nearer to those obtained in the original, aliased fit, while more significant corrective effects are seen when adopting option (i).
As circular folding tends to equalize the meaning of both first and second mode coefficients of the Karplus curve, one might equally (or alternatively) employ jointly the C 2 coefficient with an additional folded (mirrored and inverted) C 1 coefficient. This would connect with twice the angular speed, which happens to coincide with the single speed of opposite sign (Eq. (14)). This condition is identical to anti-aliasing using z = 2.
Anti-aliasing is irreversible, as the sequence order in two recursions, z = +1/−1 vs. −1/+1, does make a difference. The latter results in two equal J_trans maxima, much like applying z = +2 once, and four J_gauche values for all other 60° intervals. The probabilities are more peaked at the top and flatter at the bottom. The former sequence produces flat tops in both P and J profiles and peaked bottoms, even though J_trans and J_gauche are correctly reproduced, and all eclipsed states show J_mean.
As multiple couplings are considered in real applications, the final outcome is a balanced compromise between all contributing parts.

Discussion and conclusion
In principle, conversion of data into a model encounters one of four possible situations (Figure 8), depending on the ratio between the number of independent experimental observations and the number of model parameters to determine. Focusing here on amino-acid geometry:
i. Normally, if many data are available and accurate, the analysis outcome is most likely reliable, as is the case with the polypeptide mainchain ϕ torsion angle, where six J-coupling constants were collected, associated with phases densely spaced at 60° intervals on the circle, allowing the torsion-angle parameter to be determined with high accuracy [14].
ii. A number of model variables that exceeds the number of experimental observables renders analysis generally impossible, owing to too few data.
iii. At most, interpretation of insufficient data would produce some artificial result. Indications are that this might be the case with polypeptide mainchain ψ torsion analysis, where a mere three J-coupling constants are accessible for sampling rotamer probabilities associated with three phases at 120° intervals, similar to the present work, albeit with one more experimental observable to refer to.
iv. Finally, the present case of polypeptide sidechain χ₁ torsion analysis would not normally suffer any data shortage in determining a single angle parameter, given a principal theoretical parameter over-determination from up to nine experimental observables. Rather, the analysis procedure itself produces the notoriously distorted artificial outcome. This work demonstrated the mathematical reasoning as to why implausible probabilities are obtained regardless of an overwhelming data supply.
Attempting to extract meaningful parameters of mean direction and dispersion by digitizing the bimodal J-coupling angle dependency through a "bottleneck" of only three staggered-rotamer angles must appear error-prone. In many ways, this seems tantamount to the proverbial "square peg in a round hole." It is hoped that the issues and pitfalls connected with circular three-point sampling and transforms, demonstrated here in a biochemical context, would inspire a fresh look at applications in other disciplines in mathematics, physics, or engineering.