Comparison of RSA-based time-to-digital conversion techniques.
Abstract
This chapter reviews the technical evolution of the random sampling-and-averaging (RSA) technique and its associated variance-reduction (VR) methods for time-interval measurements in emerging microelectronic and quantum applications. First, the theoretical analysis of the RSA technique based on stochastic Monte Carlo methods is elaborated for power-efficient and highly accurate signal and photon detection, including both synchronous and asynchronous RSA measurement techniques with superior time-domain detection resolution, scalable dynamic ranges, high linearity, high noise-immunity, and low power/area consumption. Second, to further enhance the conversion-rate of the RSA measurements, the theoretical expectations, variances, and correlation coefficients of two RSA-compatible VR techniques, self-antithetic and control-variate VR, are comprehensively derived and expressed in closed mathematical forms for practical integrated-circuit (IC) hardware realization.
Keywords
- antithetic variate
- control variate
- correlated random variable
- independent and identically distributed
- joint probability density function
- Monte Carlo method
- quantum probability amplitude
- stochastic random sampling
- time-correlated single-photon counting
- time-domain modulo operation
- time-to-digital converter
- variance reduction
1. Introduction
Today, time-correlated single-photon counting (TCSPC) [1, 2] serves as the primary functionality in various emerging microelectronic and quantum technologies. Different TCSPC architectures have their own pros and cons across multiple performance specifications depending on the requirements of time-to-digital conversion-rate or conversion-accuracy, which roughly categorizes TCSPC applications into two major areas as follows: (a) Quantum imaging, quantum sensing, positron emission tomography, time-resolved spectroscopy, fluorescence-lifetime imaging, light detection-and-ranging, and time-of-flight sensing, etc. employ small-area and low-power time-to-digital conversion (TDC) implementations with the disadvantages of low resolution, low accuracy, and high clock-generation power. (b) Quantum cryptography, Q-bit-state probability-amplitude measurements, live-cell and tissue microscopy, and molecular imaging, etc. mainly exploit high-resolution TDC implementations with the downsides of low conversion-rates, high calibration complexity, and high-order digital filtering. In the long run of quantum-technology development, the demand for supporting both high speed and high resolution with low power/area consumption will be the common direction of all TCSPC realization approaches.
Therefore, this chapter reviews the theoretical analysis of the random sampling-and-averaging (RSA) technique based on stochastic Monte Carlo methods, which performs high-accuracy TCSPC functionality with the downside of slow conversion-rates. Then, two RSA-compatible variance-reduction (VR) techniques, self-antithetic and control-variate VR, are elaborated, mainly for enhancing the conversion-rate of asynchronous RSA to realize a unified RSA-based TDC architecture covering both categories of high-speed and high-resolution TCSPC applications. To verify the feasibility of the VR techniques, this chapter comprehensively summarizes the theoretical expectations, variances, and correlation coefficients in closed-form mathematical expressions for practical integrated-circuit (IC) hardware realization.
2. Random sampling-and-averaging techniques
Monte Carlo methods have been broadly used in the fields of applied mathematics and financial engineering [3]. The main focus of this chapter, the RSA measurement technique, originated from the principle of Monte Carlo methods based on the analogy between volume statistics and probability. In the field of stochastic processes, probability density functions (PDF) formalize the concept of probabilities to define the volumes of possible outcomes. Meanwhile, Monte Carlo methods obtain the volumes from experiments and then interpret the volumes as probabilities. The relationship between the theoretical probability and the experimental Monte Carlo method is summarized in Eqs. (1)-(3) by examining the expectation and mean values of a random variable, Y [4].
N is the total number of samples; Yn represents the n-th experimental sample of Y; E[Y] represents the expectation of the random variable Y based on its PDF, f(y); Ȳ denotes the Monte Carlo estimate of E[Y], i.e., the mean of the N samples.
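A sketch of what Eqs. (1)-(3) express, consistent with the symbol definitions above (the sample mean of Y approximating its expectation under the PDF f(y)); the exact grouping in the published equations may differ:

```latex
E[Y] = \int_{-\infty}^{\infty} y \, f(y)\, \mathrm{d}y
\;\approx\;
\bar{Y} = \frac{1}{N}\sum_{n=1}^{N} Y_n ,
\qquad
\bar{Y} \xrightarrow{\;N \to \infty\;} E[Y].
```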
Eqs. (4) and (5) provide two pieces of important information as follows: (a) Since the delta between the Monte Carlo estimate and the ideal expectation, (Ȳ − E[Y]), diminishes as the number of samples N grows, the estimate converges to the true expectation by the weak law of large numbers (WLLN). (b) The variance of the estimate shrinks in proportion to 1/N, so the achievable measurement accuracy is ultimately set by the total sample count.
To implement the RSA-based TDC in a single-photon time-interval measurement, the TCSPC system utilizes a time-to-voltage conversion (TVC) circuit, two identical voltage-controlled delay lines (VCDL), and an edge combiner to convert the one-time captured Δt information, which is the quantity under measurement, into a periodic digital clock signal, CKτ, carrying a scaled version of Δt within each clock cycle, thereby enabling an unlimited number of samples to be processed. The TVC first generates a single pulse, INT, whose pulse width equals the time difference, Δt, between the rising edges of the two pulses under detection. Then, the INT pulse width enables an analog integrator implemented by a tunable constant current source, II, charging the integration and parasitic capacitors, CI and CP, to form a DC voltage, VTVC. The time-interval, Δt, is converted and retained in the voltage domain as a differential DC voltage, ΔV = VDD − VTVC = KTVC·Δt, where KTVC is the conversion-gain of the TVC set by the magnitudes of II and CI. A variable-gain amplifier (VGA) buffers the constant voltage information with its gain, KVGA, to one of the following VCDLs. Because of the control-voltage difference between VDD and VVGA, these two identical VCDLs generate two clock signals, CK1 and CK2, with a common frequency, 1/T, and a constant delay, τ = (KTVC·KVGA·KDL)·Δt, where KDL is the conversion-gain of the VCDLs. After a rising-edge combiner, the CKτ signal merged from CK1 and CK2 is a periodic pulse carrying the scaled time-interval information, τ, as its duty-cycle in every T. During this Δt-to-τ conversion process, the maximum range of the time-interval measurement, ΔtMAX, is mapped onto the period of CKτ (T), so the dynamic ranges for different time-interval (Δt) measurements can be set by the overall circuit conversion-factor, (KTVC·KVGA·KDL), for a given T.
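The Δt-to-τ scaling chain described above can be sketched numerically; the gain values below are illustrative assumptions, not values from the text:

```python
# Hedged sketch of the Delta-t-to-tau conversion chain (tau = K_TVC*K_VGA*K_DL*dt).
# The gain values k_tvc, k_vga, k_dl are illustrative assumptions.
def delta_t_to_tau(delta_t, k_tvc=2.0, k_vga=1.5, k_dl=0.1):
    """Scale a measured time-interval delta_t into the delay tau carried
    as the duty-cycle of CK_tau within each period T."""
    return k_tvc * k_vga * k_dl * delta_t

# Full-scale mapping: Delta_t_MAX maps onto exactly one CK_tau period T.
T = 10.0                          # CK_tau period (arbitrary units)
dt_max = T / (2.0 * 1.5 * 0.1)    # Delta_t_MAX implied by this conversion factor
tau = delta_t_to_tau(dt_max)
assert abs(tau - T) < 1e-9        # the maximum interval fills one full period
```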
At this point, CKτ is sampled by a single digital data flip-flop (DFF), which is basically a 1-bit TDC process per sample. The sampling clock of the DFF, CKDCO, is independently generated by a free-running ring oscillator digitally controlled by a pseudo-random-binary-sequence (PRBS) generator for random frequency modulation; i.e., a digitally-controlled oscillator (DCO). With this random sampling-frequency capability, the assumption of uncorrelation [4, 5] between CKτ and CKDCO can be guaranteed, and the waveform or voltage of CKτ at any time instant within T has an equal chance to be sampled by CKDCO, forming a one-dimensional geometric probability density function [4]. Any sampled voltage from CKτ is converted to either a Logic-1 or Logic-0 at the DFF output, Yn, depending on whether the sampled voltage is larger or smaller than the intrinsic threshold of the DFF. After NDCO sampling cycles, the average of these NDCO outcomes at the DFF output (Y1, Y2, …, YNDCO) forms the Monte Carlo estimate, Ȳ, of each RSA measurement.
The RSA process can be theoretically described from different points of view as follows: (a) When the sampling number NDCO is sufficiently large, the PDF of the DCO sampling instants should be uniformly distributed across the period of CKτ, i.e., fDCO(t) = 1/T, t ∈ [0, T), so in the time-domain the probability of obtaining a Yn as Logic-1, P1, is therefore the ratio of τ to T (= τ/T). (b) In the voltage domain, the probability function of Y, f(y), is a Bernoulli distribution [4] because Yn has only two possible outcomes, Logic-1 or Logic-0, and their corresponding probability values are P1 (= τ/T) and P0 (= 1 − P1), respectively. (c) The Monte Carlo estimate, Ȳ, is therefore the averaged DFF output, whose expectation equals P1 = τ/T, as formulated in Eq. (6).
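A minimal Monte Carlo sketch of this 1-bit sampling process, with an idealized uniform sampler standing in for the DCO and an assumed duty-cycle τ/T = 0.3, shows the averaged DFF output converging to τ/T:

```python
import random

random.seed(0)

T, tau = 1.0, 0.3            # CK_tau period and scaled interval (assumed values)
N_DCO = 200_000              # number of random 1-bit samples

ones = 0
for _ in range(N_DCO):
    t = random.uniform(0.0, T)       # idealized uniform sampling instant in [0, T)
    ones += 1 if t < tau else 0      # DFF outputs Logic-1 inside the positive pulse

estimate = ones / N_DCO              # Monte Carlo estimate of the duty-cycle tau/T
assert abs(estimate - tau / T) < 0.01
```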
In Eq. (7), the variance Var[Ȳ] of each RSA measurement equals P1·P0/NDCO, since the NDCO Bernoulli samples are I.I.D.; the variance therefore shrinks as the sampling number NDCO grows.
As mentioned, the RSA is basically the average of a total of NDCO 1-bit TDC processes, and its achievable accuracy is a function of the sampling number NDCO. Therefore, the performance metrics of an RSA-based TDC can be reflected by the accuracy and conversion-rate of each RSA measurement result, Ȳ, as formulated in Eqs. (8) and (9).
In Eq. (9), the signal power is either P1² or P0² based on the magnitude of P1 compared to 0.5, due to the symmetric and signal-dependent variance property of the Bernoulli distribution. Conceptually, Eq. (9) can be examined by considering NDCO = 1: the RSA variance (Var[Ȳ]) then degenerates to the single-sample Bernoulli noise power, P1·P0, which corresponds to the accuracy of a single 1-bit TDC process.
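Under the I.I.D. Bernoulli model above, the variance and SNR take the following closed forms (a reconstruction consistent with the surrounding discussion; the published Eqs. (7) and (9) may use different groupings):

```latex
\mathrm{Var}[\bar{Y}] = \frac{P_1 \cdot P_0}{N_{DCO}},
\qquad
\mathrm{SNR} = \frac{\max(P_1, P_0)^2}{\mathrm{Var}[\bar{Y}]}
             = \frac{\max(P_1, P_0)^2 \cdot N_{DCO}}{P_1 \cdot P_0}.
```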
3. Synchronous random sampling-and-averaging
Synchronous RSA means the number of samples per T during each RSA sampling process is constantly set by a certain value of oversampling ratio (OSR), though the uniformly distributed PDFs of all CKDCO sampling instants (or edges) are still I.I.D. Note that the OSR in this chapter is defined as the number of samples per CKτ period (T), which is not the same as the OSR usually defined in the sampling theorem for anti-aliasing. Multiple examples of synchronous RSA are listed as follows: (a) OSR = 1/2, one CKDCO edge uniformly samples two periods of CKτ (2·T). (b) OSR = 1, one CKDCO edge uniformly samples one period of CKτ (T). (c) OSR = 2, one CKDCO edge uniformly samples a half period of CKτ (0.5·T). (d) OSR = 4, each CKτ period is sampled by four independent CKDCO edges, and each sampling edge possesses a uniform PDF bounded within one of the T/4 regions. In sum, each CKDCO sampling edge occurs uniformly within a region of T/OSR. Under this assumption for synchronous RSA, although CKτ can be seamlessly and uniformly sampled by CKDCO regardless of the value of OSR, OSR plays the key role in determining the accuracy of synchronous RSA. In reality, generating these well-bounded random sampling edges in an IC implementation is challenging because the step size or resolution of the random sampling edges must be finer than what the target ENOB of each synchronous RSA measurement requires. These realization issues can be resolved by the asynchronous RSA elaborated in the next section. The assumption here is that the resolution of each CKDCO sampling PDF for synchronous RSA is sufficient to maintain an almost “continuous” probability density function [4] within its own distribution region set by T/OSR. Overall, though the realization of synchronous RSA is impractical, its concept can help readers understand the theories behind the asynchronous RSA and VR techniques described in the second half of this chapter.
Based on the definition of the synchronous RSA described above, the theoretical variances with respect to different OSRs can be derived. In the case of OSR = 1, the probability of obtaining a Logic-1 per sample is exactly τ/T = P1, so the expectation and theoretical variance of each RSA measurement are equal to the results shown in Eqs. (6) and (7), respectively. In the cases of OSR < 1 (i.e., subsampling), for example, when OSR = 1/4, the sampling region of each CKDCO edge is extended to 4·T, the probability of obtaining a Logic-1 stays the same as τ/T = (4·τ)/(4·T) = P1 since the positive duty-cycle of CKτ is also extended by 4 times. Overall, whenever OSR ≤ 1, the expectation and theoretical variance of synchronous RSA can be expressed by Eqs. (6) and (7), respectively, on the condition that OSR must be always the reciprocal of a positive integer number to satisfy a constant CKτ duty-cycle within each uniformly distributed sampling region set by T/OSR.
When OSR > 1, each uniformly distributed sampling region is labeled by an integer index k, k ∈ [1, OSR]; there are in total OSR sampling outcomes per T, and only one of these sampling outcomes can possibly be Logic-1 or Logic-0, namely the region containing the CKτ transition from high to low. For the example of OSR = 2 and τ/T < 0.5, the CKτ period is equally split into two sampling regions (k = 1 to 2). Only the first region (k = 1) can possibly obtain a Logic-1 or Logic-0 outcome because τ/T < 0.5, and the Logic-1 probability is τ/(T/2) = 2·P1; note that P1 = τ/T is the probability of obtaining Logic-1 in the cases of OSR ≤ 1. On the other hand, the outcome of the second region (k = 2) is always Logic-0 since CKτ does not transit in the region of k = 2. The equivalent outcome of each synchronous RSA measurement is the average of the k = 1 and k = 2 sampling outcomes. This example shows that the variance of each synchronous RSA measurement, Var[ȲOSR], decreases as OSR increases, because the randomness is confined to the single sub-region containing the CKτ transition.
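The variance benefit of oversampling can be sketched by comparing stratified synchronous sampling (OSR = 4, one uniform edge per T/4 sub-region) against OSR = 1; the parameter values below are illustrative assumptions:

```python
import random

random.seed(1)

T, tau = 1.0, 0.3      # assumed example values (tau/T = 0.3)
N_EXP = 20_000         # repeated RSA experiments per OSR setting

def rsa_outcome(osr):
    """One synchronous-RSA result: the average of `osr` stratified 1-bit
    samples, one uniform sampling edge per T/osr sub-region."""
    total = 0
    for k in range(osr):
        t = random.uniform(k * T / osr, (k + 1) * T / osr)
        total += 1 if t < tau else 0
    return total / osr

def empirical_var(osr):
    vals = [rsa_outcome(osr) for _ in range(N_EXP)]
    m = sum(vals) / N_EXP
    return sum((v - m) ** 2 for v in vals) / N_EXP

# Stratified oversampling (OSR = 4) yields a lower variance than OSR = 1,
# because only the sub-region containing the CK_tau transition is random.
assert empirical_var(4) < empirical_var(1)
```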
In the cases of OSR ≥ 1, the theoretical variance of the synchronous RSA can be acquired by finding the total covariance sum with the joint PDFs of all samples within each T, so the expectation and theoretical variance of the synchronous RSA measurement can be generalized as follows [6]:
Based on Eq. (11), the correlations induced by the oversampling process reduce the overall variances. That is, evenly splitting each T by OSR and taking the average of all sub-region outcomes to reach the overall result of each synchronous RSA measurement are actually producing negative correlations among all the samples. Note that “k” in Eq. (10) simply represents the index of the summation operator to find YOSR,n per T, but “k” in Eq. (11) is a specific integer number within 1 to OSR based upon the transition of CKτ; i.e., for a certain τ/T under the RSA measurement, only a certain k can meet the 1st line of Eq. (11). Also, note that, regardless of the value of OSR, the WLLN is always valid.
4. Asynchronous random sampling-and-averaging
The asynchronous RSA technique can be practically realized by power- and area-efficient ICs and performs I.I.D. random sampling within the period of CKτ without strict sampling-PDF boundaries or frequency relationships between CKτ and CKDCO. Before explaining how I.I.D. random sampling can be achieved in an asynchronous manner for practical RSA realization, some parameters are defined as follows: (a) TDCO,n is the n-th period of the DCO. (b) tSAMP,n is the absolute time of the n-th sampling edge of the DCO. (c) ΔTPRBS,n is the n-th DCO-period extension dynamically produced by the PRBS generator (TDCO,n = TDCO,MIN + ΔTPRBS,n), so TDCO,n ranges between (TDCO,MIN + 0) and (TDCO,MIN + ΔTPRBS,MAX) [7]. Both TDCO,MIN and ΔTPRBS,MAX can be coarsely adjusted by their own static controls in the DCO circuit. (d) For the sake of simplicity, CKτ and CKDCO are assumed to exhibit coincident rising edges at t = 0.
Though the DCO naturally performs phase-noise accumulation in the presence of device Johnson (thermal) noise, flicker noise, and power-supply noise, a strong phase-noise accumulation is required to form a widely distributed random sampling PDF. Therefore, a noise-energy-dominated PRBS noise source is intentionally added to the DCO for the asynchronous RSA technique. Theoretically, the combinational effect of multiple I.I.D. noise sources is equally applicable to the analysis result in this section, but the asynchronous RSA technique relies on the artificial PRBS noise to effectively scramble and avoid the synchronicity between CKτ and CKDCO. Meanwhile, although the circuit/device noise sources exhibit much lower energy than the PRBS noise does, they offer arbitrarily small phase-noise accumulations to fill the gaps among the discrete PRBS noise step sizes. Therefore, the combinational phase-noise accumulations from the circuits/devices and the PRBS generator help CKDCO possess an almost continuous sampling PDF.
Under the parameters and noise definitions described above, the n-th absolute sampling time can be represented by Eq. (12) as follows [6]:
Each tSAMP,n consists of two components: the deterministic term (n·TDCO,MIN) and the stochastic term created by the phase-noise accumulation, which represents the randomness of each CKDCO sampling instant; the occurrence of the n-th CKDCO sampling edge is therefore a random variable having its own PDF, fDCO,n(t), with a density magnitude across its distribution span. Also, as shown in Eq. (12), the stochastic term is the accumulation of “n” I.I.D. random variables (ΔTPRBS,k, k = 1 to n) produced by the PRBS generator over “n” times, so fDCO,n(t) is the convolution result of “n” uniformly distributed PDFs from the PRBS generator. For example, if n = 1, the stochastic term has one random variable from the PRBS generator with the PDF, fDCO,1(t), which is the fundamental uniform distribution with a distribution span over ΔTPRBS,MAX and a constant density magnitude of 1/ΔTPRBS,MAX. If n = 2, the stochastic term is the accumulation of two random variables, which are I.I.D. but sequentially produced by the PRBS generator. Therefore, fDCO,2(t) is the convolution of the two individual uniformly distributed PDFs because of the Convolution Theorem [4], and it exhibits an isosceles triangular distribution with a distribution span across 2·ΔTPRBS,MAX and a 1/ΔTPRBS,MAX peak density magnitude. By increasing the number of samples, the Central Limit Theorem guarantees that the PDF of the stochastic term converges to a Gaussian distribution regardless of the PDF of each random variable produced by the PRBS generator. That is, when n ≫ 1, the mean, standard deviation, span, and peak of fDCO,n(t) converge to the expressions as follows [6]:
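For n i.i.d. uniform steps over [0, ΔT_PRBS,MAX] offset by T_DCO,MIN each, the standard mean/variance results for a sum of uniforms give the following converged parameters (a sketch consistent with the description of Eq. (13); the published expression may be grouped differently):

```latex
\mathrm{Mean}_n = n\left(T_{DCO,MIN} + \frac{\Delta T_{PRBS,MAX}}{2}\right),
\qquad
\sigma_n = \sqrt{\frac{n}{12}}\,\Delta T_{PRBS,MAX},
\qquad
\mathrm{Span}_n = n\,\Delta T_{PRBS,MAX},
\qquad
\mathrm{Peak}_n \approx \frac{1}{\sigma_n \sqrt{2\pi}}.
```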
Note that the time-domain variable “t” represents absolute time values referenced to t = 0. Since the deterministic term (n·TDCO,MIN) of tSAMP,n sets the left distribution bound of fDCO,n(t), the PDFs of adjacent samples quickly exhibit a large amount of distribution overlaps along with the growth of Spann. Note that the overlaps among sampling PDFs in the time-domain do not change the monotonic ascending order of the tSAMP,n occurrences; this just indicates the correlation among the fDCO,n(t) due to the DCO phase-noise accumulation started from the 1st sample to the n-th sample as expressed in Eq. (12); e.g., tSAMP,n-1 always occurs before tSAMP,n although the PDFs of their stochastic term, fDCO,n-1(t) and fDCO,n(t), overlap on the top of each other in the absolute time domain.
The DCO sampling PDFs generated by the approach described so far do not seem uniform and independent because: (a) when n ≫ 1, fDCO,n(t) in Eq. (13) becomes a Gaussian-distribution PDF; (b) all the sampling instants, tSAMP,n, are correlated because of the DCO phase-noise accumulation. These two concerns can actually be resolved by properly setting up the noise energy (ΔTPRBS,MAX) of the PRBS generator together with the modulo-T operation enabled by the periodicity of CKτ, so the Gaussian-distributed and correlated characteristics of the DCO sampling PDF, fDCO,n(t), can be turned into the uniformly distributed and independent characteristics of the modulo-T sampling PDFs, fn(t). The modulo-T operation is automatically achieved when converting Δt into the duty-cycle τ/T of the periodic signal CKτ. The details of the modulo-T operation are described below.
If n ≫ 1, the Gaussian distribution span of fDCO,n(t) covers several CKτ periods, and the CKτ period T equivalently slices the entire Gaussian PDF into several pieces in the absolute time-domain. That is, these sliced PDFs maintain their own magnitude distributions but are all bounded within the same distribution span T, and all sample the identical CKτ waveform yn(t), t ∈ [0, T). In other words, the distributions of all sliced PDFs are strictly confined within a modulo-T time-interval between 0 and T. Thus, the net density-magnitude at any time instant within [0, T) is the linear summation of all sliced PDFs; i.e., the equivalent PDF of the n-th DCO sampling instant fn(t) is the summation of all sliced PDFs from fDCO,n(t) [6]:
where the time-domain variable “t” is confined within [0, T); (2·S + 1) is the number of segments set by the Spann; Meann’ and k·T are used to shift these (2·S + 1) segments to the modulo-T time-interval, [0, T). When n ≫ 1, fn(t) converges to a constant 1/T across the single T span; this fact can be proven by both mathematical calculation of Eq. (14) and statistical simulations. More importantly, by increasing “n”, all sampling PDFs converge to a uniformly distributed PDF with a constant density magnitude 1/T across the [0, T) distribution span, independent from the parameters of TDCO,MIN, ΔTPRBS,MAX, and even “n” when n ≫ 1. In other words, for all “n” ≫ 1, fn(t) becomes an “identically distributed” PDF, which satisfies the “second” criterion of the I.I.D. random variable and can be implemented by low-cost circuitry. This convergence of the uniform distribution also guarantees the convergence of the asynchronous RSA measurement result, which exactly matches to the expectation in Eq. (6), as follows [6]:
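The modulo-T folding toward a uniform PDF can be sketched by accumulating uniform PRBS-like steps and folding the sampling instants into [0, T); the step size, offset, and accumulation depth below are illustrative assumptions:

```python
import random

random.seed(2)

T = 1.0
dT_max = 0.25 * T          # PRBS noise energy, Delta_T_PRBS,MAX (assumed)
t_min = 0.6 * T            # T_DCO,MIN (assumed; irrelevant after folding)
n = 64                     # accumulation depth: Span_n = 16*T >> T
N_EXP = 100_000            # independent experiments for the n-th instant
BINS = 10

hist = [0] * BINS
for _ in range(N_EXP):
    # n-th sampling instant: deterministic offsets plus accumulated uniform noise
    t = sum(t_min + random.uniform(0.0, dT_max) for _ in range(n))
    hist[int((t % T) / T * BINS)] += 1   # modulo-T folding into [0, T)

# After folding, every bin should hold close to N_EXP / BINS samples,
# i.e., f_n(t) is nearly the uniform density 1/T across [0, T).
expected = N_EXP / BINS
assert all(abs(h - expected) < 0.05 * expected for h in hist)
```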
If the value of “n” is not large enough, the modulo-T PDF may behave like a non-uniform distribution; e.g., when ΔTPRBS,MAX = 0.25·T, fn(t) does not perform like a uniformly distributed PDF until n = 64. However, these non-uniform modulo-T PDFs will not affect the overall asynchronous RSA measurement result shown in Eq. (15), since such non-uniform modulo-T PDFs can occur only when all the RSA experiments share the identical absolute time reference at t = 0, as assumed for Eq. (12). In a realistic hardware implementation, the initial conditions of CKτ and CKDCO are inevitably randomized during the sampling process, so fn(t) for all “n” actually forms uniformly distributed PDFs within [0, T) even when “n” is a small number. This effect can be observed by merging a naturally randomized initial condition, tINT, into the unit impulse, δ(t), expressed in Eq. (16).
Based on the Convolution Theorem [4], fn(t) can be expressed in the format similar to fDCO,n(t) in Eq. (13), but note that fn(t) shall follow the Circular Convolution Theorem [8] due to the combination of the modulo-T operation and linear convolution [6]:
In Eq. (16), f1(t), fn-1(t), and fn(t) are in the modulo-T time-domain, t ∈ [0, T); fDCO,1(t) and δ(t) are in the absolute time-domain, t ∈ [0, ∞). It is important to emphasize that f1(t) is not only the PDF of the 1st sampling instant but also the elementary PDF used to derive fn(t) from fn-1(t) based on the Circular Convolution Theorem.
Several important attributes of f1(t) are summarized as follows: The distribution of f1(t) constantly starts at the remainder of TDCO,MIN/T. Whenever the PDF reaches t = T, it circulates back to t = 0 and then continues its distribution toward t = T. In the first case of ΔTPRBS,MAX < T, for example, ΔTPRBS,MAX = 0.25·T, f1(t) intuitively has non-zero values from a certain “t” to “t + 0.25·T” within [0, T). In the second case of ΔTPRBS,MAX = T or Mod[ΔTPRBS,MAX, T] = 0, f1(t) circulates an integer number of cycles uniformly from a certain “t” within [0, T) back to the same “t”. In the third case of ΔTPRBS,MAX > T and Mod[ΔTPRBS,MAX, T] ≠ 0, f1(t) exhibits two non-zero density magnitudes because fDCO,1(t) circulates within [0, T) multiple times with a non-zero remainder, and the delta between these two density magnitudes is 1/ΔTPRBS,MAX. The third case indicates that f1(t) can converge to a uniform distribution by itself when ΔTPRBS,MAX ≫ T. The attributes of f1(t) may seem insignificant since fn(t) converges to a uniform distribution anyway; however, although all fn(t) are identical when n ≫ 1, this elementary PDF, f1(t), plays an important role from the standpoint of the correlations among all sampling PDFs, fn(t), n = 1 to NDCO.
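The two density magnitudes of the third case can be checked numerically; the specific ΔT_PRBS,MAX below is an illustrative assumption:

```python
# Hedged numeric check of the third f1(t) case above:
# Delta_T_PRBS,MAX > T with a non-zero remainder after modulo-T wrapping.
T = 1.0
dT_max = 2.4 * T                 # Mod[dT_max, T] = 0.4*T != 0 (assumed value)
full_wraps = int(dT_max // T)    # complete circulations within [0, T)
remainder = dT_max - full_wraps * T

# f1(t) density magnitudes on the two modulo-T sub-regions:
low = full_wraps / dT_max        # region covered `full_wraps` times
high = (full_wraps + 1) / dT_max # region covered one extra (partial) time

# The delta between the two magnitudes is 1/Delta_T_PRBS,MAX, as stated above.
assert abs((high - low) - 1.0 / dT_max) < 1e-12
# The wrapped density still integrates to 1 over [0, T).
assert abs(high * remainder + low * (T - remainder) - 1.0) < 1e-12
```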
To elaborate the correlations among all sampling PDFs, the covariance between adjacent samples, Yn and Yn+1, can first be examined as follows [6]:
where the 1st line of Eq. (17) is based on the fundamental covariance definition of two random variables, Yn and Yn+1, on the same sample space, R, with their joint PDF, f(yn, yn+1), and PDF variables, yn and yn+1. The 2nd line of Eq. (17) represents the covariance in the format of one-dimensional geometric probability with the modulo-T time-domain PDF variables, yn(t) and yn+1(t), and joint PDF, fn,n+1(t). yn(t) and yn+1(t) are the same because of the CKτ periodicity, and the possible outcomes of Yn and Yn+1 are either Logic-1 or Logic-0 both with the expectation E[Y] = τ/T = P1 verified in Eq. (15). By taking advantage of simple binary values, Cov[Yn, Yn+1] can be expanded into the summation of four conditional covariances based on the total four possible combinations of Yn and Yn+1 with their corresponding conditional joint PDFs as shown in the 3rd to 6th lines of Eq. (17). In general, fn(t) is a uniform distribution for all ΔTPRBS,MAX scenarios when n ≫ 1, so fn(t)|Yn = 1 and fn(t)|Yn = 0 (the conditional PDFs) are set by yn(t) across the modulo-T time interval. To further reach the conditional joint PDFs, fn,n+1(t)|Yn, each fn(t)|Yn has to circularly convolute with the fundamental PDF element, f1(t), which is a function of ΔTPRBS,MAX, so different f1(t) generate their corresponding fn,n+1(t)|Yn = 1 and fn,n+1(t)|Yn = 0 as follows [6]:
Note that fn(t) in Eq. (16) and fn,n+1(t)|Yn in Eq. (18) are different since fn(t) is calculated from fn-1(t) and becomes independent from f1(t), but fn,n+1(t)|Yn is obtained from fn(t)|Yn, and f1(t) determines their correlation. Eventually, the conditional joint PDFs of each scenario are derived from yn+1(t) across t ∈ [0, T) to thoroughly include all possible conditions of Yn and Yn+1.
When Mod[ΔTPRBS,MAX, T] = 0, the four conditional joint PDFs all maintain constant density magnitudes within their own integral time-intervals, [0, τ) and [τ, T), so the covariance of the adjacent samples can be further derived for this scenario easily [6]:
Based on the result of Eq. (19), i.e., a zero covariance between any adjacent samples Yn and Yn+1, and the accumulated relation from f1(t) to fn(t) shown in Eq. (16), Mod[ΔTPRBS,MAX, T] = 0 is the necessary condition for all, not just adjacent, fn(t) to be “independent.” By consolidating the identical-distribution and independence properties of fn(t) for all ΔTPRBS,MAX scenarios, Mod[ΔTPRBS,MAX, T] = 0 is the fundamental criterion to perform an asynchronous RSA-based TDC with I.I.D. random sampling PDFs. This result matches the requirement of a synchronous RSA-based TDC when OSR = 1 and the CKDCO sampling PDF is uniformly distributed across one CKτ cycle, T. The main difference between the two is that the asynchronous RSA-based TDC requires more time per sample due to the deterministic offset, TDCO,MIN. However, this deterministic offset per sample offers a margin for practical circuit implementation in the asynchronous RSA-based TDC.
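The zero-covariance condition can be sketched by simulating adjacent 1-bit samples with Mod[ΔT_PRBS,MAX, T] = 0 (here ΔT_PRBS,MAX = T); the remaining parameters are illustrative assumptions:

```python
import random

random.seed(3)

T, tau = 1.0, 0.3            # assumed duty-cycle tau/T = 0.3
t_min = 1.7 * T              # T_DCO,MIN (assumed; any value works here)
dT_max = T                   # Mod[Delta_T_PRBS,MAX, T] = 0 -> I.I.D. sampling
N_EXP = 50_000

pairs = []
for _ in range(N_EXP):
    t = random.uniform(0.0, T)               # randomized initial phase
    y = []
    for _ in range(2):                       # two adjacent DCO samples
        t += t_min + random.uniform(0.0, dT_max)
        y.append(1 if (t % T) < tau else 0)  # 1-bit DFF outcome
    pairs.append(y)

mean0 = sum(p[0] for p in pairs) / N_EXP
mean1 = sum(p[1] for p in pairs) / N_EXP
cov = sum((p[0] - mean0) * (p[1] - mean1) for p in pairs) / N_EXP
assert abs(cov) < 0.01       # covariance of adjacent samples vanishes
```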
Accurately performing “Mod[ΔTPRBS,MAX, T] = 0” may increase the cost of asynchronous RSA-based TDC implementation. To resolve this issue, the case of ΔTPRBS,MAX > T could be considered, though the result of Eq. (19) indicates that the non-uniform conditional joint PDFs induce non-zero covariances. These non-uniform conditional joint PDFs are mainly caused by the non-uniform f1(t); however, the distribution of f1(t) can be flattened by setting ΔTPRBS,MAX ≫ T so that all the conditional joint PDFs become roughly uniform. In other words, ΔTPRBS,MAX ≫ T can be easily implemented without an exact relationship between ΔTPRBS,MAX and T [7, 9], but the downside is an even slower conversion-rate since TDCO,MIN is inevitably enlarged along with ΔTPRBS,MAX when ΔTPRBS,MAX ≫ T. The compromise between hardware cost and conversion-rate further confirms the necessity of the variance-reduction (VR) techniques discussed in the following sections.
5. Self-antithetic variance reduction
The analysis of self-antithetic variance reduction (SAVR) technique in an asynchronous RSA system can be started from formulating the variance of a Monte Carlo estimate in general [10, 11]:
In Eq. (20), “n” and “k” are the sample indexes; Cov[Yn, Yk] represents the pairwise covariance of any two samples Yn and Yk. If n = k, then Cov[Yn, Yk] = Var[Yn], which is the variance of the n-th sample. If n ≠ k, Cov[Yn, Yk] = Cov[Yk, Yn], which forms a symmetric covariance matrix. When Y1, Y2, …, and YNDCO are I.I.D., all the off-diagonal covariances vanish, and Eq. (20) reduces to the I.I.D. variance result of Eq. (7).
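Eq. (20) can be sketched from the standard variance-of-a-mean expansion, consistent with the pairwise-covariance description above (the published grouping may differ):

```latex
\mathrm{Var}[\bar{Y}]
= \frac{1}{N_{DCO}^{2}} \sum_{n=1}^{N_{DCO}} \sum_{k=1}^{N_{DCO}} \mathrm{Cov}[Y_n, Y_k]
= \frac{\mathrm{Var}[Y]}{N_{DCO}}
+ \frac{2}{N_{DCO}^{2}} \sum_{n=1}^{N_{DCO}-1} \sum_{k=n+1}^{N_{DCO}} \mathrm{Cov}[Y_n, Y_k].
```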
According to the analysis of the covariances of adjacent samples, Cov[Yn, Yn+1], under the three ΔTPRBS,MAX scenarios (i.e., ΔTPRBS,MAX <, =, or > T) in Eq. (17), the conclusion shows that Cov[Yn, Yn+1] can have pronounced non-zero values when Mod[ΔTPRBS,MAX, T] ≠ 0 due to the non-uniform conditional joint PDFs within their own integral time-intervals, [0, τ) and [τ, T). In this section, the case of non-zero covariances between adjacent samples is further extended to the deduction of any pairwise correlation between Yn and Yk due to the DCO phase-noise accumulation property reflected by Eqs. (12) and (16). Thus, if the technique can take advantage of these non-zero covariances, Cov[Yn, Yk], and assure that the pairwise covariance sum is negative, then VR can be successfully performed.
To find Cov[Yn, Yk], all conditional joint PDFs of any [Yn, Yk] pair have to be first formulated, where n = 1 to (NDCO − 1) and k = (n+1) to NDCO are sufficient to cover all covariance elements in the symmetric covariance matrix. Though there are only four possible binary combinational outcomes of a certain [Yn, Yk] pair, its conditional joint PDFs shall also cover all possible binary combinational outcomes from Yn+1 to Yk−1 because the accumulation property and Convolution Theorem described in Eq. (16) form the chain of correlations from each specific Yn through Yn+1, Yn+2, …, Yk−2, and Yk−1 all the way to each specific Yk. Thus, the conditional joint PDFs of a [Yn, Yk] pair under all possible conditions of [Yn, …, Yk−1] shall be generalized as follows [10]:
where q = 0, 1, 2, …, (2^(k−n) − 1); all possible binary combinational conditions of [Yn, …, Yk−1] are represented by the corresponding decimal numbers, “q”, and decimal-to-binary operators, D2B[·], for the sake of simplicity. Since the location of “τ” within the modulo-T time-interval, [0, T), defines the PDF boundaries for the probability of Yk to be Logic-1 or Logic-0, the conditional joint PDF of a [Yn, Yk] pair under all possible conditions of [Yn, …, Yk] can be obtained by forcing one part, i.e., [0, τ) or [τ, T), of the PDF in Eq. (21) to zero to satisfy one of the possible Yk conditions [10]:
In other words, each conditional joint PDF under the conditions of [Yn, …, Yk−1] in Eq. (21) can diversify into two conditional joint PDFs under the conditions of [Yn, …, Yk] as shown in Eqs. (22) and (23), respectively. Note that the distribution profiles of all conditional joint PDFs are convoluted functions of τ and f1(t), and f1(t) itself is a function of Mod[TDCO,MIN, T] and ΔTPRBS,MAX. Therefore, the joint PDFs can be very different if the values of τ, Mod[TDCO,MIN, T], and ΔTPRBS,MAX are set differently; i.e., these parameters dominantly affect the behaviors of the pairwise covariances, Cov[Yn, Yk], which now can be generalized as follows based on the information of all conditional joint PDF of a [Yn, Yk] pair from Eqs. (22) and (23) [10]:
As shown in Eq. (24), Cov[Yn, Yk] can be grouped into four terms based on the binary combinational outcomes of a [Yn, Yk] pair, and each term is the summation of 2^(k−n−1) integrals of the conditional joint PDFs under all possible conditions of [Yn+1, …, Yk−1]. Thus, the total number of summation terms or conditional joint PDFs in Eq. (24) is 2²·2^(k−n−1) = 2^(k−n+1) for each Cov[Yn, Yk]. In addition, each conditional joint PDF, as noted above, is a convoluted function of τ and f1(t).
To simplify the analysis process, the mechanism of SAVR is elaborated through specific examples in an asynchronous RSA system, from which the general cases can be further summarized. The following two examples have common parameter setups: the time-domain quantity under the RSA measurements, τ/T = 0.5 (= P1), NDCO = 256, and Mod[TDCO,MIN, T] ≈ T/2. The only difference is that their ΔTPRBS,MAX are set to T/8 and T/16 individually through the static PRBS energy-level control. The correlation function between Yn and Yk can be observed by plotting Cov[Yn, Yk] of each “n” across all possible “k” (= 1 to NDCO) with NEXP = 2¹³. For reference, note that the correlation functions of the I.I.D. scenarios (i.e., ΔTPRBS,MAX = T) verified in Eq. (19) are basically zero for all n ≠ k and have the non-zero value, 0.25 (= P1·P0), at n = k. From these examples of ΔTPRBS,MAX < T, multiple attributes can be observed: (a) Cov[Yn, Yk] of each “n” is self-symmetrical with respect to “k = n”. (b) Because of the symmetric covariance matrix and Cov[Yn, Yk] = Var[Yn] = P1·P0 when k = n, all correlation functions have the same profile but shift along the k-axis based on the value of “n”, and therefore Cov[Yn, Yk] = Cov[Yn+m, Yk+m] for any valid index shift, “m”.
The effect of this SAVR technique can be more comprehensively demonstrated by the single-dimensional covariance sums and the overall (two-dimensional) covariance sum, i.e., the variance of the averaged RSA outcome in Eq. (20).
The analysis results so far for SAVR offer further insights: (a) A smaller ΔTPRBS,MAX, i.e., a narrower fundamental sampling PDF, f1(t), creates stronger correlations across all sampling points (i.e., longer correlation tails), longer LCMPs, and eventually more variance reduction. (b) Though the sign of each single-dimensional covariance sum can be either positive or negative with VR, the average of all single-dimensional covariance sums is always above zero, which verifies the principles of non-negative variances and finite measurement resolutions. (c) More importantly, the computation efficiency of each theoretical single-dimensional covariance sum and overall variance can be greatly improved by only summing the covariances surrounding the “k = n” within a few LCMPs [10]:
where “r” is the number of LCMPs included in the approximation of Eq. (25). Although the accurate single-dimensional covariance sums and the partial single-dimensional covariance sums formulated by the right-hand side of Eq. (25) with r = 1 and k = (n − LCMP/2) to (n + LCMP/2 − 1) at each “n” differ by certain amounts, the distribution of the partial single-dimensional covariance sums is relatively constant and essentially equal to the average of the accurate single-dimensional covariance sums, except when “n” approaches 1 or NDCO, where the correlation functions have significantly unbalanced correlation-tail lengths. Therefore, the partial single-dimensional covariance sum at any “n” far away from 1 or NDCO is sufficient for approximating the average of the accurate single-dimensional covariance sums formulated by the left-hand side of Eq. (25). For instance, by plugging Eq. (25) into the end of the 1st line of Eq. (20) with “n = NDCO/2”, the overall variance can be approximated as follows [10]:
Note that the Cov[Yn, Yk] in Eqs. (25) and (26) is referring to its theoretical definition in Eq. (24), and the approximation errors in Eqs. (25) and (26) can be improved by extending the number of LCMP, i.e., “r”, included in the summation operators.
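The partial-sum approximation above can be sanity-checked numerically. The sketch reuses the simplified jitter model from before; the wider ΔTPRBS,MAX = T/2 and the half-window of 8 samples are assumed values chosen so that the correlation tails fit inside the window:

```python
import numpy as np

rng = np.random.default_rng(1)
T, tau, T_dco = 1.0, 0.5, 1.5   # assumed illustrative parameters
dt_max = T / 2                  # wider jitter range -> shorter correlation tails
N_DCO, N_EXP = 64, 2**14

jitter = rng.uniform(0.0, dt_max, size=(N_EXP, N_DCO))
Y = ((np.cumsum(T_dco + jitter, axis=1) % T) < tau).astype(float)
C = np.cov(Y, rowvar=False)

# full two-dimensional covariance sum (Eq. (20) analogue)
var_full = C.sum() / N_DCO**2
# partial single-dimensional sum around k = n at the central row (Eq. (26) analogue)
n, w = N_DCO // 2, 8
var_partial = C[n, n - w:n + w].sum() / N_DCO
```

The partial sum touches only one row of the covariance matrix yet lands close to the full double sum, which is the computation-effort saving the text describes.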
As discussed earlier, if the theoretical computation effort is dominated by the modulo-T circular convolutions, the computation efficiency improvement from Eq. (20) to Eq. (26) can be evaluated by the ratio between their numbers of modulo-T circular convolution operations. Since the correlation tails roughly vanish when |k − n| > NDCO/2, the required number of modulo-T circular convolutions in Eq. (20) can be expressed as the numerator of Eq. (27). On the other hand, for Eq. (26), only the covariance elements within one LCMP need to be calculated, and the required number of modulo-T circular convolutions in Eq. (26) can be expressed as the denominator of Eq. (27) [10].
The ratio of modulo-T circular convolution numbers between Eqs. (20) and (26) under this specific example (i.e., Mod[TDCO,MIN, T] ≈ T/2 and ΔTPRBS,MAX ≈ T/8) is about 2.18 × 10^34. This incredible “computation effort reduction” (NOT variance reduction), even just for NDCO = 2^8 and one LCMP included, arises mainly because all pairwise covariances theoretically have to be included in Eq. (20) due to the inevitable correlations (based on the DCO phase-noise accumulation property and the Convolution Theorem) among the majority of the NDCO samples, whereas only the average of all the covariances actually matters and can be approximated by summing the theoretical covariances within a few LCMPs of the central correlation function, as shown in Eq. (26). In sum, the theoretical calculation results and the statistical asynchronous RSA simulation results match well, but the computation efficiency of the theoretical variance with SAVR enabled is quite low if it relies on Eqs. (20)–(24). On the other hand, Eq. (26) brings the computation efficiency to a reasonable level but loses some accuracy. From the analytical point of view, only the Monte Carlo approach based on statistical software/lab experiments can offer efficiency and accuracy simultaneously [10].
6. Control-variate variance reduction
The control-variate variance reduction (CVVR) [3] employs the known errors in the estimates of pre-set reference quantities to reduce the variance in the estimate of a quantity under the asynchronous RSA measurement. Note that both SAVR and CVVR in this chapter reduce the power of quantization error by creating correlations among the samples of each RSA measurement. The difference is that the correlations of SAVR exist among the samples of a single random variable Y, i.e., [Y1, Y2, …, YNDCO], whereas the correlations of CVVR exist between the samples of two parallel random variables, Y and YREF.
The theoretical analysis of CVVR can start with the case where Y and YREF are individual I.I.D. random variables; i.e., no auto-correlation, and only cross-correlation is considered in this section. Eq. (28) shows the process of sampling CKτ and CKτREF by CKDCO simultaneously to generate a cross-correlation between Y and YREF only at the sampling instants of “n = k” because Y and YREF are individual I.I.D. random variables [11]:
In Eq. (28), both “n” and “k” = 1 to NDCO. Furthermore, the cross-covariance between the outcomes of the parallel asynchronous RSA processes, Cov[Yn, YREF,k], can be expressed as follows [11]:
Since Y and YREF are individual I.I.D. random variables, the cross-covariance can be simplified based on Eq. (28) as follows [11]:
As shown in Eq. (30), sampling CKτ and CKτREF by CKDCO simultaneously can create a cross-correlation between Yn and YREF,n, whose correlation coefficient can be expressed as follows [11]:
where the cross-covariance of Eq. (30) is normalized by the standard deviations of Yn and YREF,n.
According to Eq. (32), the correlation coefficients are functions of τ and τREF since P1 = 1 − P0 = τ/T and P1,REF = 1 − P0,REF = τREF/T according to Eq. (6); thus, the degree of the cross-correlation between Y and YREF is determined by the amount of overlap between the waveforms of CKτ and CKτREF.
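A quick numerical check of this overlap argument, under the simplifying assumptions that both clock waveforms are high at the start of each period (so the joint-high probability is min(τ, τREF)/T) and that the sampling instants are ideally uniform:

```python
import numpy as np

rng = np.random.default_rng(2)
T, tau, tau_ref = 1.0, 0.3, 0.5      # assumed duty-cycles of CK_tau and CK_tauREF
N = 2**16

t = rng.uniform(0.0, T, N)           # one shared set of random sampling instants (CKDCO)
Y = (t < tau).astype(float)          # assumes CK_tau is high during [0, tau)
Y_ref = (t < tau_ref).astype(float)  # assumes CK_tauREF is high during [0, tau_ref)

C2 = np.cov(Y, Y_ref)
cov_est = C2[0, 1]
# overlap-based prediction for aligned rising edges: P11 = min(tau, tau_ref)/T
cov_th = min(tau, tau_ref) / T - (tau / T) * (tau_ref / T)
rho = cov_est / np.sqrt(C2[0, 0] * C2[1, 1])   # empirical correlation coefficient
```

With these duty-cycles the predicted cross-covariance is 0.3 − 0.3·0.5 = 0.15, and the estimate converges to it as N grows.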
Once it is confirmed that the cross-correlation between Y and YREF can be implemented by the random sampling process, the Monte Carlo estimate of YREF is obtained by the regular asynchronous RSA process to measure the duty-cycle of CKτREF (τREF/T) with a relatively high accuracy, so that the expectation of YREF can be treated as a pre-measured known value.
With the information of the pre-measured expectation of YREF, the per-sample control-variate estimator can be constructed as follows [3, 11]:
where YCV,n is the variance-reduced version of Yn per sample; the error term, (YREF,n − E[YREF]), is scaled by the coefficient μCV and subtracted from Yn so that the correlated portion of the quantization error is canceled.
There are two major concerns in the IC implementation: the additional circuit power/area consumption and the achievable degree of correlation. Realizing YCV,n in Eq. (33) requires a high-speed/high-resolution digital operation per sample, which can consume a certain amount of power/area for two reasons: (a) the pre-measured expectation of YREF and the coefficient μCV are multi-bit fractional values, so the per-sample scaling and subtraction demand multi-bit arithmetic; and (b) these operations must run at the full sampling rate of the RSA process.
Compared to Eqs. (33)-(35), respectively, Eqs. (36)-(38) represent a power/area-efficient IC implementation of the RSA with CVVR technique: (a) Instead of finding every high-resolution YCV,n in Eq. (33) and then averaging NDCO samples, Eq. (36) shows that the binary sequences of Yn and YREF,n can be accumulated into their averages first, so the multi-bit scaling and subtraction are performed only once per RSA measurement rather than once per sample.
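The averaging-first control variate can be sketched as follows; the duty-cycles, the run counts, and the closed-form μCV below are illustrative assumptions for an I.I.D. sampling model with aligned clock edges, not the exact quantities of Eqs. (36)-(39):

```python
import numpy as np

rng = np.random.default_rng(3)
T, tau, tau_ref = 1.0, 0.3, 0.5        # assumed duty-cycles
N_DCO, N_EXP = 256, 4000               # samples per measurement / repeated runs

t = rng.uniform(0.0, T, (N_EXP, N_DCO))    # shared i.i.d. sampling instants
Y = (t < tau).astype(float)
Y_ref = (t < tau_ref).astype(float)
Y_bar, Y_ref_bar = Y.mean(axis=1), Y_ref.mean(axis=1)

E_ref = tau_ref / T                    # pre-measured expectation of Y_REF
# optimal coefficient mu_CV = Cov[Y, Y_REF] / Var[Y_REF] for this aligned-edge model
mu_cv = (min(tau, tau_ref)/T - (tau/T)*(tau_ref/T)) / ((tau_ref/T)*(1 - tau_ref/T))
# averaging-first control variate: one multi-bit scale/subtract per measurement
Y_cv = Y_bar - mu_cv * (Y_ref_bar - E_ref)

var_plain, var_cv = Y_bar.var(), Y_cv.var()   # var_cv ~ (1 - rho^2) * var_plain
```

The correction term has zero mean, so the estimator stays unbiased while its variance shrinks by the factor (1 − ρ²) of the cross-correlation.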
With Eq. (36) and the pre-measured expectation of YREF, the expectation and variance of the RSA with CVVR technique are given in Eqs. (37) and (38), and the optimal μCV that minimizes the variance in Eq. (38) can be derived as follows [11]:
By plugging Eq. (39) into (38), the minimum variance of the RSA with CVVR technique is shown as follows [11]:
Clearly, the amount of variance reduction (the 2nd term of Eq. (40)) from the variance of an I.I.D. case (the 1st term of Eq. (40)) is mainly determined by the cross-correlation between the outcomes of the parallel RSA processes.
Based on Eqs. (39)-(41), several attributes and implementation methodologies of CVVR are discussed as follows: (a) The gain of CVVR, GCV, is a function of the cross-correlation between the final outcomes of the parallel RSA processes; i.e., a stronger cross-correlation yields a larger GCV and thus more variance reduction.
The attributes mentioned above are all under the assumption of knowing the optimal value of μCV. However, based on Eqs. (30) and (39), computing the optimal μCV requires the ideal expectations of Y and YREF, which are unavailable in a practical measurement; instead, μCV can be estimated directly from the measured samples as follows [11]:
Though Eq. (42) eliminates the necessity of the ideal expectations, it requires hardware to store the entire data sequences of Yn and YREF,n with NDCO samples until the outcomes of the parallel RSA processes become available, which costs a considerable amount of memory area.
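One standard way to avoid storing the full sequences is to keep only running sums, from which a sample-based μCV (a textbook covariance-over-variance estimate, shown here as an assumed stand-in for the exact form of Eq. (42)) is computed once at the end of the measurement:

```python
import numpy as np

rng = np.random.default_rng(4)
T, tau, tau_ref, N_DCO = 1.0, 0.3, 0.5, 2**12   # assumed illustrative parameters
t = rng.uniform(0.0, T, N_DCO)                  # shared i.i.d. sampling instants
Y = (t < tau).astype(float)
Y_ref = (t < tau_ref).astype(float)

# single-pass accumulators: the full Y_n / Y_REF,n sequences need not be stored
s_y = s_r = s_yr = s_rr = 0.0
for y, r in zip(Y, Y_ref):
    s_y += y
    s_r += r
    s_yr += y * r
    s_rr += r * r

# sample-based estimate of mu_CV = Cov[Y, Y_REF] / Var[Y_REF]
mu_hat = (s_yr - s_y * s_r / N_DCO) / (s_rr - s_r * s_r / N_DCO)
```

Only four accumulators are carried through the measurement, which maps naturally onto a small set of hardware counters instead of an NDCO-deep memory.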
7. Conclusion
The evolution of the four RSA techniques has been reviewed in this chapter, including the synchronous RSA, the asynchronous RSA, the asynchronous RSA with SAVR, and the asynchronous RSA with CVVR, whose theoretical expectations, variances, figures of merit, and IC-implementation parameter settings are listed in Table 1. The main goal of this chapter is to summarize the concepts/algorithms of the RSA with VR techniques and to introduce the unified RSA-based TDC architecture for both high-speed and high-resolution TCSPC applications.
Technology | 22-nm CMOS | 22-nm CMOS | 22-nm CMOS |
---|---|---|---|
Technique | Asyn. RSA | Asyn. RSA w/SAVR | Asyn. RSA w/CVVR |
DCO Power (mW) | 3 | 3.1 | 3 |
Sampling Frequency (GS/s) | 4 | 7.8 | 4 |
# of Sampling Phases | 8 | 8 | 8 |
Dynamic Range (ns) | 10 ∼ 1000 | 10 ∼ 1000 | 10 ∼ 1000 |
ENOB | 12 @NDCO = 2^24; 14 @NDCO = 2^28 | 12 @NDCO = 2^19; 14 @NDCO = 2^23 | 12 @NDCO = 2^20.9; 14 @NDCO = 2^24.9 |
Effective Resolution (ps @14 ENOB) | 0.61 ∼ 61 | 0.61 ∼ 61 | 0.61 ∼ 61 |
Conversion-Rate (kHz @12 ENOB) | 2 | 120 | 16 |
TDC Power (mW) | 1.3 | 1.5 | 1.9 |
TDC FoM (pJ/step @12 ENOB) | 159 | 3.1 | 29 |
DCO + TDC Power Ratio | 1 | 1.07 | 1.14 |
Conversion-Rate Ratio | 1 | 60 | 8 |
TDC Area (mm²) | 0.01 | 0.01 | 0.018 |
Digital Filter Power (mW) | 0.45 | 0.90 | 0.91 |
Multi-bit Digital Operations per RSA-based TDC | Eq. (6) | Eq. (6) | Eqs. (36) and (43) |
Theoretical Expectation | Eq. (15) | Eq. (6) | Eq. (37) |
Theoretical Variance | Eq. (7) | Eq. (20) | Eq. (40) |
Circuit Parameters @T = 250 ps, TDCO,MIN = T/2 | ΔTPRBS,MAX = T | ΔTPRBS,MAX = T/32 | ΔTPRBS,MAX = T, τREF LSB = T/16 |
References
- 1. Becker W. Advanced Time-Correlated Single Photon Counting Techniques. Berlin, Germany: Springer; 2005
- 2. Becker W. The bh TCSPC Handbook. 7th ed. Berlin, Germany: Becker & Hickl GmbH; 2017
- 3. Glasserman P. Monte Carlo Methods in Financial Engineering. New York, NY, USA: Springer; 2003. DOI: 10.1007/978-0-387-21617-1
- 4. Ghahramani S. Fundamentals of Probability. Upper Saddle River, NJ, USA: Prentice-Hall; 1996
- 5. Haykin S. Communication Systems. 4th ed. New York, NY, USA: Wiley; 2001
- 6. Wu T, Yang R, Hsueh T-C. Random sampling-and-averaging techniques for single-photon arrival-time detections in quantum applications: Theoretical analysis and realization methodology. IEEE Transactions on Circuits and Systems I: Regular Papers. 2022;69(4):1452-1465. DOI: 10.1109/TCSI.2021.3135833
- 7. Hsueh T-C, O’Mahony F, Mansuri M, Casper B. An on-die all-digital power supply noise analyzer with enhanced spectrum measurements. IEEE Journal of Solid-State Circuits. 2015;50(7):1711-1721. DOI: 10.1109/JSSC.2015.2431071
- 8. Oppenheim AV, Schafer RW, Buck JR. Discrete-Time Signal Processing. 2nd ed. Upper Saddle River, NJ, USA: Prentice-Hall; 1999
- 9. Hsueh T-C, Balamurugan G, Jaussi J, Hyvonen S, Kennedy J, Keskin G, et al. A 25.6Gb/s differential and DDR4/GDDR5 dual-mode transmitter with digital clock calibration in 22nm CMOS. In: IEEE ISSCC Digest of Technical Papers. San Francisco, CA, USA: IEEE; 2014. pp. 444-445. DOI: 10.1109/ISSCC.2014.6757506
- 10. Wu T, Hsueh T-C. A high-resolution single-photon arrival-time measurement with self-antithetic variance reduction in quantum applications: Theoretical analysis and performance estimation. IEEE Transactions on Quantum Engineering. 2022;3:1-15. DOI: 10.1109/TQE.2022.3209211
- 11. Yang R, Wu T, Hsueh T-C. A high-accuracy single-photon time-interval measurement in mega-Hz detection rates with collaborative variance reduction: Theoretical analysis and realization methodology. IEEE Transactions on Circuits and Systems I: Regular Papers. 2023;70(1):176-189. DOI: 10.1109/TCSI.2022.3206406