Time-correlated single-photon counting (TCSPC) applications usually deal with high counting rates, which reduce the system efficiency. The problem is compounded by the random nature of photon arrivals, which makes it harder to avoid counting loss while the system is busy processing previous arrivals. To increase the rate of detected photons and improve the signal quality, many parallelized structures and imaging arrays have been reported, but this trend creates a data bottleneck that requires complex readout circuitry and very high output frequencies. In this paper, we present simple solutions that improve the signal-to-noise ratio (SNR) and mitigate counting loss through a parallelized TCSPC architecture and an embedded memory block. Their impact is demonstrated by means of behavioral and mathematical modeling, which indicates a potential maximum SNR improvement of 20 dB and a system efficiency as high as 90% without the need for extremely high readout frequencies.
1. Introduction
Time-correlated single-photon counting (TCSPC) is a mature and extremely accurate low-light signal measurement technique that uses single quanta of light to provide information on the temporal structure of the light signal. The method was first conceived in nuclear physics and was for a long time primarily used to analyze the light emitted as fluorescence during the relaxation of molecules from an optically excited state to a lower energy state. Today, TCSPC is widely used in applications that require the analysis of fast, weak, periodic light events with a resolution of tens of picoseconds, such as diffuse optical tomography (DOT) [3, 4], fluorescence lifetime imaging (FLIM) and high-throughput screening (HTS). TCSPC is based on detecting single photons of a periodic light signal, measuring the detection times within the light period and reconstructing the light waveform from the individual time measurements once the measurement has been repeated enough times. Traditionally, the TCSPC technique relied on vacuum tube technologies such as photomultiplier tubes (PMTs) and microchannel plates (MCPs). These mature technologies achieve very good performance, but they are expensive, cumbersome and fragile and require extremely high operating voltages, which makes them unsuitable for the fabrication of miniaturized portable TCSPC imaging systems. In recent years, single-photon avalanche diodes (SPADs) have gained wide popularity as a less expensive and more compact alternative to vacuum tube detectors. The integration of planar epitaxial SPADs in standard CMOS technology has significantly improved the level of miniaturization of SPADs and paved the way for SPAD arrays. These devices possess the typical advantages of microelectronic integrated circuits, such as small size, ruggedness, low operating voltages and low cost.
Furthermore, they can be directly implemented with the necessary associated circuits on the same chip to realize an integrated, ultrasensitive, high-speed and low-cost TCSPC imaging system. Many SPAD-based TCSPC systems have been successfully demonstrated lately, and state-of-the-art imaging sensors integrating thousands of single-photon detectors on the same chip have been demonstrated in standard CMOS technology [7, 8]. Most integrated TCSPC systems consist of 1D or 2D arrays of SPADs with their associated electronics in the form of smart pixels, resulting in a trade-off between high photon detection efficiency and advanced electronic functionalities [9, 10, 11]. This approach allows for a better detection efficiency compared to a single commercial SPAD. However, such designs should be conceived so that the detection yield is optimized, i.e. so that they ensure an optimal detection efficiency and a limited counting loss probability. In this chapter, we present these two issues and propose methods to quantify and limit their effects based on mathematical and behavioral modeling.
2. A parallelized macropixel structure for SNR optimization
Single-photon avalanche diodes (SPADs) operate in Geiger mode: the p-n junction is biased beyond its breakdown voltage, so that the electric field in the space-charge region is high enough for a single charge carrier, ideally created by photoelectric interaction, to trigger a self-sustained avalanche. Unlike linear APDs, where stopping the light signal is enough to stop the avalanche, once an avalanche is triggered in an SPAD the current continues to increase until the component is destroyed by overheating. The avalanche must therefore be swiftly quenched by associated circuitry that senses the avalanche, stops it by reducing the reverse bias below the breakdown voltage so that the avalanche cannot sustain itself, and then restores the SPAD to its initial condition. The circuit that accomplishes these tasks is the quenching circuit. Selecting it is not a trivial task, as it directly affects many of the SPAD performance metrics. It is therefore important to choose a quenching circuit suited to the target application, so that it does not limit or degrade the SPAD characteristics.
Each SPAD with its associated electronics forms an independent pixel. The quenching electronics are the main part of the SPAD-associated electronics; however, other smart functionalities can also be included in the pixel. In particular, a gating signal can be used to activate or deactivate the SPAD. This functionality is traditionally used to operate the SPAD in gated mode, where it is enabled only during the gate-on window and disabled during the gate-off interval, so that absorbed photons do not trigger an avalanche. It can also be used to deactivate SPADs showing an abnormal behavior that affects the system yield. A macropixel architecture that makes use of this approach has been implemented: the macropixel (Figure 1) is divided into eight pixels that can be activated or deactivated based on their activity levels. This option was added to ensure that the SNR is not degraded by an undesirable effect that could decrease the detector’s efficacy.
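The signal delivered by a photon counting detector is affected by temporal fluctuations that follow a Poisson distribution. If $N_{ph}$ is the number of detected photons and $N_d$ the number of dark counts, the total count $N_t = N_{ph} + N_d$ has a variance $\sigma_t^2 = N_{ph} + N_d$.
If $N_d$ is considered as a constant equal to its mean value instead of being measured each time, the variance of the $N_d$ term comes to zero, and thus the number of photons and its associated noise are given by
$$N_{ph} = N_t - N_d, \qquad \sigma_{ph} = \sqrt{N_{ph} + N_d}$$
Therefore, the signal-to-noise ratio is
$$SNR = \frac{N_{ph}}{\sqrt{N_{ph} + N_d}}$$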
In the case of a multi-SPAD macropixel, the SNR of the macropixel structure is the sum of the SPAD photon counts divided by the total noise component:
$$SNR = \frac{\sum_i N_{ph,i}}{\sqrt{\sum_i \left(N_{ph,i} + N_{d,i}\right)}}$$
where $N_{ph,i}$ is the number of detected photons and $N_{d,i}$ is the dark count of the ith SPAD (SPADi) in the macropixel. Consequently, the signal-to-noise ratio can be optimized by switching SPADs on or off so that pixels showing undesirable activity levels are deactivated. These undesirable pixels can be ‘hot pixels’, showing an above-average dark count rate, or ‘dark pixels’, showing a below-average light sensitivity.
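As an illustration of this optimization, the following minimal Python sketch evaluates the macropixel SNR as the summed photon counts divided by the square root of the summed photon and dark counts of the enabled SPADs, then greedily switches off any SPAD whose removal raises the SNR. The function names, the greedy search and the example counts are illustrative assumptions, not the procedure or data of the original work.

```python
import math


def macropixel_snr(photon_counts, dark_counts, enabled):
    """SNR of the enabled SPADs: sum of photon counts over the
    square root of the summed photon and dark counts (Poisson noise)."""
    signal = sum(p for p, e in zip(photon_counts, enabled) if e)
    variance = sum(p + d for p, d, e
                   in zip(photon_counts, dark_counts, enabled) if e)
    return signal / math.sqrt(variance) if variance > 0 else 0.0


def greedy_deactivation(photon_counts, dark_counts):
    """Hypothetical selection rule: switch a SPAD off whenever doing so
    increases the macropixel SNR, and repeat until nothing improves."""
    enabled = [True] * len(photon_counts)
    best = macropixel_snr(photon_counts, dark_counts, enabled)
    improved = True
    while improved:
        improved = False
        for i in range(len(enabled)):
            if not enabled[i]:
                continue
            enabled[i] = False                      # try without SPAD i
            trial = macropixel_snr(photon_counts, dark_counts, enabled)
            if trial > best:
                best = trial                        # keep SPAD i off
                improved = True
            else:
                enabled[i] = True                   # restore SPAD i
    return enabled, best


if __name__ == "__main__":
    # Made-up example: eight SPADs, one of them with a very high dark count.
    photons = [1000.0] * 8
    darks = [100.0] * 8
    darks[3] = 50000.0
    mask, snr = greedy_deactivation(photons, darks)
    print("enabled SPADs:", mask)
    print("SNR before: %.2f, after: %.2f"
          % (macropixel_snr(photons, darks, [True] * 8), snr))
```

In this made-up example only the SPAD with the very high dark count is switched off; removing any of the identical, well-behaved SPADs lowers the SNR, so the greedy rule leaves them enabled.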
2.1. Hot pixel elimination
These pixels could be identified through a calibration phase where the individual DCR of each SPAD is measured.
By turning off the noisy SPAD, the SNR becomes
Consequently, disabling the noisy SPAD leads to a signal-to-noise ratio improvement of
where is the mean photon count on the mean DCR ratio.
Figure 2 shows the SNR gain versus the hot pixel DCR multiplication factor
2.2. Dark pixel elimination algorithm
Another scenario that could lead to a lower SNR is the presence of pixels with low light sensitivity due to a manufacturing defect, dust, or non-uniform illumination of the SPADs. To evaluate the SNR gain resulting from eliminating such pixels, we consider the case where the eliminated SPADs are completely blind. This is the worst case of light sensitivity, and the elimination of these dark pixels results in the best SNR improvement (Figure 3). Assuming n dark pixels, the corresponding SNR is
If all blinded SPADs are turned off, the SNR becomes
Consequently, for , the SNR gain is given by
2.3. SNR gain evaluation
A low SNR could be the result of low signal levels or high noise levels; consequently, the SNR could be improved by eliminating pixels that exhibit high noise levels (hot pixel elimination) or pixels that exhibit low light sensitivity (dark pixel elimination). Both schemes require a calibration phase. In the dark pixel elimination scheme, the counting rate of each pixel must be measured under illumination to detect SPADs with low sensitivity levels, and these measurements should be repeated if the test conditions change. The hot pixel elimination scheme, on the other hand, requires a one-time calibration phase to measure the individual DCR of each SPAD and deactivate the SPADs that are too noisy based on their DCR levels. Both approaches resulted in an improved SNR; however, the dark pixel elimination efficiency was relatively low, whereas the hot pixel elimination was found to be useful in most cases.
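The two elimination schemes can be compared numerically with a short Python sketch. It rests on simplifying assumptions that are not stated in the text: all SPADs of the macropixel detect the same mean photon count, all normal SPADs share the same mean dark count, a hot pixel has a dark count `k` times that mean, and a dark pixel is completely blind. The parameter `alpha` (ratio of mean photon count to mean dark count), the function names and the 20·log10 convention for decibels are likewise illustrative choices.

```python
import math


def hot_pixel_gain(n_spads, alpha, k):
    """SNR ratio (after / before) when one hot SPAD is switched off.
    Counts are expressed in units of the mean dark count, so each SPAD
    contributes alpha photons, 1 dark count (k for the hot SPAD)."""
    before = n_spads * alpha / math.sqrt(n_spads * alpha + (n_spads - 1) + k)
    after = (n_spads - 1) * alpha / math.sqrt((n_spads - 1) * (alpha + 1.0))
    return after / before


def dark_pixel_gain(n_spads, alpha, n_dark):
    """SNR ratio (after / before) when n_dark blind SPADs are switched off.
    Blind SPADs add dark counts but no photon counts."""
    if not 0 <= n_dark < n_spads:
        raise ValueError("n_dark must satisfy 0 <= n_dark < n_spads")
    active = n_spads - n_dark
    before = active * alpha / math.sqrt(active * alpha + n_spads)
    after = active * alpha / math.sqrt(active * (alpha + 1.0))
    return after / before


def gain_db(gain):
    """Express an SNR ratio in decibels (assumed 20*log10 convention)."""
    return 20.0 * math.log10(gain)


if __name__ == "__main__":
    for k in (1, 10, 100, 1000):
        g = hot_pixel_gain(8, 1.0, k)
        print("hot pixel, k = %4d: gain = %.3f (%.1f dB)" % (k, g, gain_db(g)))
    for n in (1, 2, 4):
        print("dark pixels, n = %d: gain = %.3f" % (n, dark_pixel_gain(8, 1.0, n)))
```

Under these assumptions, switching off a SPAD that is not actually noisier than the others (`k = 1`) yields a ratio below 1, which is why a calibration step is needed before deciding which pixels to deactivate.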
3. Efficiency improvement of TCSPC systems
3.1. Counting loss in TCSPC systems
A typical TCSPC setup consists of a pulsed laser source, a photon detector such as a silicon photomultiplier (SiPM) or an SPAD, a time measurement block based on a time-to-digital converter (TDC) or a time-to-amplitude converter (TAC) and an external CPU to process the measurement results. When a photodetection occurs, a certain time is required for data processing. This time interval is referred to as ‘dead time’ because the system is incapable of processing any additional photons collected by the SPAD, which results in counting loss and a reduction of the SNR caused by the decreased counting efficiency, which is at best equal to
This issue is further complicated by the random nature of photon arrivals and the fact that TCSPC applications such as FLIM and HTS usually deal with high counting rates. In order to increase the rate of detected photons and improve the SNR, many parallelized imaging structures have been reported [5, 15], but this trend leads to an increased data bottleneck, which requires complex readout circuitry as well as very high output frequencies to ensure a reasonable dead time. Another solution for the high output rate is the use of an embedded FIFO to store the measurement results while they wait to be processed; nevertheless, FIFOs are very demanding in terms of power and silicon area and, to our knowledge, no study has properly determined the FIFO length required to achieve optimum results. It is therefore important to evaluate the counting gain resulting from the use of an embedded FIFO as a function of its depth and the readout rate.
3.2. TCSPC system as a queuing model
TCSPC systems are based on measuring the arrival times of single-photon events. Processing these measurements requires several additional steps, such as quenching the photon detector, shaping the regenerated signal, converting the time to a digital value and sending it to a processing unit or memory. While these operations are being carried out, the system is unavailable to process another measurement for a certain time interval referred to as ‘dead time’. To simplify the study of the TCSPC system, the readout period is considered equal to the system’s dead time. The dead time, together with the random nature of the single-photon detection events, leads to random counting losses whenever the system is busy processing a previous photon arrival, thus limiting the system efficiency. To evaluate the counting loss, the TCSPC system can be modeled using a queuing model with an arrival rate λ.
The FIFO’s output follows a periodic departure process with a departure rate
3.2.1. Steady-state probability evaluation
where ρ is the ratio of the photon rate to the readout rate. The number of occupied cells after the (
And, the transition probability from the state
In particular the one-step transition probability is.
which allows us to define the
The steady-state distributions satisfy the following equations:
Furthermore, the vector
resulting in a system of N equations with N variables (26):
3.2.2. Blocking probability
The main goal of using this queuing model is to evaluate the system efficiency based on the probability that an arrival finds the FIFO full and is therefore lost; this probability is the blocking probability. Two sets of state probabilities are used:
πk: State probabilities at departure instants
πa,k: State probabilities at arrival instants, regardless of whether the arrival joins the queue or not
An important property of the Poisson arrival process is the Poisson Arrivals See Time Averages (PASTA) property, which implies that the distribution of occupied cells seen at arrival instants is the same as the distribution seen by a random observer:
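On the other hand, the probability that an arrival finds $k$ occupied cells, given that it joins the queue, is equal to the state probability at departure instants. With $P_B$ denoting the blocking probability,
$$\pi_k = \frac{\pi_{a,k}}{1 - P_B}, \qquad 0 \le k \le N-1$$
In particular, for $k = 0$,
$$\pi_{a,0} = (1 - P_B)\,\pi_0$$
Furthermore, arrivals entering the system occur at a rate
$$\lambda\,(1 - P_B)$$
Simultaneously, departures out of the system continue to occur with a rate
$$\mu\,(1 - \pi_{a,0})$$
where $\lambda$ is the arrival rate and $\mu$ the departure (readout) rate. Given that in equilibrium the traffic entering the queuing system is equal to the traffic leaving it, we have
$$\lambda\,(1 - P_B) = \mu\,\left(1 - (1 - P_B)\,\pi_0\right)$$
And, with $\rho = \lambda/\mu$, the blocking probability is
$$P_B = 1 - \frac{1}{\pi_0 + \rho}$$
The described method was used to determine the blocking probability and the system efficiency:
$$\eta = 1 - P_B = \frac{1}{\pi_0 + \rho}$$
where $\pi_0$ is defined in (26).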
Figure 7 shows the system efficiency for the use of a buffer and a FIFO with
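The steady-state evaluation and the resulting efficiency can be sketched numerically as follows. This is a minimal Python sketch under stated assumptions: Poisson arrivals, a deterministic service time equal to one readout period, and a FIFO of `depth` cells observed at departure instants (a finite-capacity queue of the M/D/1/N type). It uses the standard relation η = 1/(π0 + ρ), where π0 is the probability that a departure leaves the FIFO empty. The function names and the power-iteration solver are illustrative choices, not the implementation of the original work.

```python
import math


def arrivals_pmf(rho, j):
    """Poisson probability of j arrivals during one readout period."""
    return math.exp(-rho) * rho ** j / math.factorial(j)


def transition_matrix(rho, depth):
    """One-step transition probabilities between the numbers of occupied
    cells left behind by successive departures (states 0 .. depth-1)."""
    a = [arrivals_pmf(rho, j) for j in range(depth)]
    P = [[0.0] * depth for _ in range(depth)]
    for i in range(depth):
        base = max(i - 1, 0)              # cells left if nothing new arrives
        for j in range(depth - 1 - base):  # j arrivals, all of them accepted
            P[i][base + j] = a[j]
        # Remaining probability mass: the FIFO fills up during the service.
        P[i][depth - 1] = 1.0 - sum(P[i][:depth - 1])
    return P


def stationary(P, tol=1e-13, max_iter=100000):
    """Stationary distribution of the chain by power iteration."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(max_iter):
        new = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(x - y) for x, y in zip(new, pi)) < tol:
            return new
        pi = new
    return pi


def efficiency(rho, depth):
    """System efficiency 1 - P_B = 1 / (pi_0 + rho)."""
    pi0 = stationary(transition_matrix(rho, depth))[0]
    return 1.0 / (pi0 + rho)


if __name__ == "__main__":
    for rho in (0.01, 0.1, 1.0):
        print("rho = %.2f: buffer %.3f, 8-cell FIFO %.3f"
              % (rho, efficiency(rho, 1), efficiency(rho, 8)))
```

With a single cell the chain has only one state, so π0 = 1 and the efficiency reduces to 1/(1 + ρ), which gives 0.5 for ρ = 1 and about 0.91 for ρ = 0.1.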
3.3. Case study of a parallelized TCSPC system including an embedded FIFO
The TCSPC system illustrated in Figure 8 was designed for an HTS application that requires counting rates of up to several MHz per channel. With a TDC dead time of 40 ns, the maximum data rate is equal to 25 MS/s. According to Figure 7, the use of a single TCSPC module would lead to an efficiency η of, respectively, 98, 90 and 50% for a photon rate of 0.25, 2.5 and 25 MHz, i.e. a service rate ρ of 0.01, 0.1 and 1. Obviously, for a service rate ρ > 1, the system’s efficiency would tend to 1/ρ regardless of the use of a FIFO. A photon rate of λ = 25 Mega photons/s is therefore not reasonable in the configuration of a single TCSPC module. However, if the arrival rate is divided among the eight TCSPC units (Figure 8), and assuming that the arrival process is equally distributed among them, each unit TCSPCi receives an arrival rate of
$$\lambda_i = \frac{\lambda}{8} = 3.125\ \text{MHz}$$
resulting in a service rate $\rho_i = 0.125$ and an efficiency $\eta_i \approx 0.89$, i.e. an expected departure rate of about 2.78 MHz out of each TCSPC unit, which is similar to a previously reported value. Given the low service rate of each TCSPCi, the output of each TCSPC unit will have a distribution very similar to a Poisson process, and the resulting process is the sum of eight Poisson processes with their respective arrival rates and is therefore also a Poisson process with an arrival rate:
$$\lambda_{out} = \sum_{i=1}^{8} \eta_i \lambda_i \approx 22.2\ \text{MHz}$$
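Assuming an output frequency of only 33.33 MHz and a merged rate of about 22.2 MHz out of the eight TCSPC units, the service rate will be $\rho \approx 22.2/33.33 \approx 0.67$. In the absence of the FIFO, the system can be assimilated to a buffer, resulting in a memory block efficiency $\eta_{mem} \approx 0.6$ and a total efficiency:
$$\eta_{total} = \eta_i\,\eta_{mem} \approx 0.89 \times 0.6 \approx 0.53$$
where $\eta_i$ is the efficiency of each TCSPC unit.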
The efficiency of the system is therefore not improved by the parallelization of the TCSPC, even with the reduction of the pile-up effect. However, using the eight FIFO cells leads to a memory block efficiency close to 100%, so that the overall TCSPC system efficiency is maintained at about 90%. Without the FIFO, such an efficiency level can only be achieved with a 3 GHz output frequency, which demonstrates the considerable impact of including the FIFO in the TCSPC system.
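The arithmetic of this case study can be retraced with a few lines of Python. The sketch assumes that a stage without a FIFO behaves as a single-cell buffer with efficiency 1/(1 + ρ), which is consistent with the 90% and 50% figures quoted above for ρ = 0.1 and ρ = 1; the function and key names are illustrative, and the FIFO case itself is not modeled here.

```python
def buffer_efficiency(rho):
    """Efficiency of a single-cell buffer (no FIFO): 1 / (1 + rho)."""
    return 1.0 / (1.0 + rho)


def case_study(photon_rate_hz=25e6, n_units=8,
               dead_time_s=40e-9, output_freq_hz=33.33e6):
    """Efficiency chain of the parallelized system without a FIFO."""
    unit_rate = photon_rate_hz / n_units           # photons/s per TCSPC unit
    rho_unit = unit_rate * dead_time_s             # load of one TCSPC unit
    eta_unit = buffer_efficiency(rho_unit)         # efficiency of one unit
    merged_rate = n_units * eta_unit * unit_rate   # events/s sent to readout
    rho_out = merged_rate / output_freq_hz         # load of the readout stage
    eta_mem = buffer_efficiency(rho_out)           # readout stage, no FIFO
    return {
        "unit_rate_hz": unit_rate,
        "rho_unit": rho_unit,
        "eta_unit": eta_unit,
        "merged_rate_hz": merged_rate,
        "rho_out": rho_out,
        "eta_mem": eta_mem,
        "eta_total": eta_unit * eta_mem,
    }


if __name__ == "__main__":
    for freq in (33.33e6, 3e9):
        r = case_study(output_freq_hz=freq)
        print("output %.2f MHz: rho = %.4f, eta_mem = %.3f, eta_total = %.3f"
              % (freq / 1e6, r["rho_out"], r["eta_mem"], r["eta_total"]))
```

Running the sketch with a 3 GHz output frequency shows how far the readout rate must be raised for a FIFO-less readout stage to stop dominating the losses.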
4. Conclusion
The random nature of photon arrivals and applications involving high counting rates require a specialized TCSPC system scheme to process the resulting data and improve the SNR. This requires optimizing the photon detection process by reducing the effects of noise and low sensitivity. It also requires optimizing the system’s architecture so that photon events are not lost due to the dead time following a previous photon arrival. In this chapter, we have discussed these two issues and presented solutions, using mathematical models to assess the gain of such schemes. A low SNR can be the result of low signal levels or high noise levels. In the case of an SPAD, a low signal level is the result of low light sensitivity, while a high noise level is the result of a high DCR. Thus, the detector’s SNR can be increased by limiting the negative effect of these two cases. We presented a TCSPC macropixel architecture in which the SNR can be increased by deactivating dark pixels and/or hot pixels. A dark pixel is a pixel with an abnormally low sensitivity level, and a hot pixel is a pixel with a high noise level in comparison to the other pixels. The dark pixel elimination scheme requires a calibration phase to determine the activity level of each pixel and the low-sensitivity pixels that must be deactivated; this calibration phase should be conducted whenever the measurement conditions change and would lead to an SNR up to 1.5 times higher. The hot pixel elimination scheme, on the other hand, requires a one-time calibration phase to determine the DCR of each pixel and, as a result, the pixels that must be deactivated, which allows an SNR improvement of up to 20 dB. The processing of detected photons can be optimized by means of a parallelized TCSPC architecture that makes use of an embedded FIFO to limit the counting loss due to the dead time that follows each photon detection.
Using a queuing model, we demonstrated the impact of this approach and quantified the efficiency improvement as a function of the FIFO length, the counting rate and the readout rate. The proposed TCSPC architecture is capable of achieving a 90% efficiency with a counting rate of 25 MHz at a readout rate of 33 MHz. Without the embedded FIFO, such an efficiency would require a 3 GHz readout frequency.