Computationally Efficient Hybrid Interpolation and Baseline Restoration of the Brain-PET Pulses

The design and component-level architectures of a novel offset-compensated digital baseline restorer (BLR) and an original hybrid interpolator are described. They diminish the effect of the alterations that occur during the readout of Positron Emission Tomography (PET) pulses. Left untreated, such artifacts can reduce the scanner’s performance, notably its sensitivity and resolution. The BLR compensates the offset of the PET pulses. Afterward, the pertinent parts of these pulses are located, and the located portion of the signal is resampled with a hybrid interpolator. This interpolator is constructed by cascading an optimized weighted least-square interpolator (WLSI) and a Simplified Linear Interpolator (SLI). The procedure for tuning the WLSI coefficients and the evaluation of the BLR and interpolator modules are presented. The computational complexity of the proposed hybrid interpolator is compared with that of classic counterparts. These modules are implemented in Very High-Speed Integrated Circuits Hardware Description Language (VHDL) and synthesized on a Field Programmable Gate Array (FPGA). The functionality of the system is validated with an experimental setup. The results reveal a notable computational gain over the computationally complex conventional equivalents, along with an adequate dynamic restoration of bipolar offsets and a useful, accurate improvement of the temporal resolution.


Introduction
The recent technological developments in the field of microelectronics have revolutionized the development and deployment of biomedical implants, mobile healthcare, and biomedical scanners. In this framework, a variety of high-performance PET scanners have been proposed and realized during the last decade [1][2][3]. PET imaging is increasing the overall influence of nuclear medicine [1]. This is because of its superior performance, in terms of resolution, compared to Single Photon Emission Computed Tomography (SPECT). Additionally, it is the rapidly discussed in Section 3. Section 4 presents the experimental results. Section 5 discusses the findings and concludes the chapter.
Figure 1 displays a block diagram of the proposed system. An appropriately controlled quantity of the radioactive tracer "2" is injected into the body of the intended patient "1." The radioactive tracer used in this analysis is 18F-FDG. After a certain time, most of this tracer is distributed through contaminated brain cells [5]. The tracer emits positrons (β+), which annihilate with electrons of the medium. Every annihilation emits two 511-keV γ-rays. These are released simultaneously at a relative angle of about 180° and interact with the crystal scintillators of the detection sensors "4." The scintillators convert the energies of the γ-rays into photons of light. Afterward, photo-detectors pick up these photons and turn them into electrical pulses [8][9][10][11][12][13]. A determined number of sensors are used in a detection ring "3." The front-end electronics, located in the detection sensors "4," further prepare and process the PET pulses emanating from the photo-detectors. For the construction of tomographs, the selected pulses, with extracted parameters such as the addresses of the intended crystal and the involved sensor, are conveyed to the IRM "5."

The detection ring
The detection ring consists of a group of four sensors, arranged axially around a "B" axis in a circle. Most contemporary PET scanners are built from radially arranged scintillators [15,23]. An axial arrangement of the scintillators, with an appropriate treatment of their lateral sides, could increase the scanner's spatial resolution compared to alternatives based on a radial arrangement [11,23].
Each sensor is composed of a scintillator crystal matrix. LYSO scintillators are used in this study [6,7]. The γ-rays interact with these crystals, which turn the γ-ray energies into photons of light. These photons are then sensed by two photo-detector matrices, positioned on both sides of the scintillator matrix. The photo-detectors produce two electrical pulses as a result of each interaction [11,26]. Each face of a crystal scintillator is coupled with a photo-detector, so only 2×P photo-detectors are used for a matrix of P crystal scintillators (cf. Figure 2). Figure 2 shows the interaction of a γ-ray with one scintillator crystal "6." The crystal has faces "6a" and "6b" that are paired with two Multi-Pixel Photon Counters (MPPCs), "7a" and "7b," respectively. The PET pulses produced by the MPPCs are transferred to the front-end electronics module "8." These pulses are sorted and processed, and the selected pulses with the extracted information are conveyed to the IRM "5." MPPCs can attain a higher photon count. Moreover, their behavior is not influenced by magnetic fields. These are the reasons behind the frequent use of MPPCs as photo-detectors in PET scanners. The pixel size of the utilized MPPCs is 50 × 50 μm². Each MPPC contains 3600 pixels. The MPPC-generated pulses are of extremely low amplitude [20]; the highest amplitude is bounded by hundreds of microamperes [12]. Thus, they are noise sensitive and can be completely distorted. Such pulses are amplified to enhance their noise immunity. This is realized by cascading a charge resistance with an amplifier of appropriate bandwidth [29]. The amplified pulses are transferred to the blocks embedded in "8" for signal selection and address encoding. These blocks pick the pulses of the active crystal from the matrix and ensure that only one crystal is enabled at a time. If this requirement is satisfied, the selected pair of pulses is processed further.
An appropriate baseline for the selected pulses should be available for the correct functionality of the suggested BLR [12]. In this context, two lumped delay lines are used for each sensor [30]. They introduce a delay of 50 ns in the selected PET pulses, with a standard deviation of only about 100 ps.
The delayed pulses, after A/D conversion, are conveyed to the IRM. To attain an appropriate accuracy, the sampling frequency, FS, and the quantizer resolution should be chosen carefully [11,12,29]. In this case, the selected and prepared PET pulses are acquired with two 12-bit resolution Analog-to-Digital Converters (ADCs), operating at FS = 200 MHz. Besides the chosen pulses, the addresses of the active crystal and of the sensor in question are also encoded and communicated to the IRM. These data, obtained from the different sensors in the detection ring, are used by the IRM for calculating the depths of interaction (DOIs) and the lines of response (LORs). In this way, the three-dimensional tomographs are produced.

Digital conditioning of the PET pulses
The amplification enhances the resistance of the PET pulses against noise [29]. However, it can also inject random offsets into the processed pulses. These offsets influence the performance of the energy estimators and degrade the DOI accuracy. Likewise, the digitization of the selected PET pulses limits the temporal resolution. A reduced temporal resolution lowers the performance of the subsequent timestamp and LOR estimators. A rise in FS can enhance the temporal precision. It does, however, lead to a more expensive and more power-consuming solution [30].
The concepts of the suggested BLR and interpolator are illustrated in Figure 3. It indicates that the BLRs process the digitized versions of the selected pulses in order to recover their baselines. The outputs of the BLRs are conveyed to the energy estimators and to an adder. The summed pulses are interpolated to improve the temporal resolution. The up-sampled signal is then used to estimate the timestamp [11,23].

The BLR
The BLR concept is shown in Figure 4. Two equivalent modules are introduced, working independently to restore the selected pair of PET pulses, S_ka and S_kb (cf. Figure 3). Figure 4 shows that the BLR first estimates a real-time offset value, O_Calc, from the incoming pulse. It is determined as the mean of the baseline of the incoming pulse. The concept can be expressed in mathematical terms using Eq. (1), where N represents the count of considered samples belonging to the digitized pulse baseline, Sk_n represents the sampled signal, and n indexes the considered samples, so n belongs to the set {1, 2, …, N}. N is selected in accordance with the employed delay line and FS [29]. Here, N = 8 is selected. Since 8 is a power of two, the mean value of the incoming pulse baseline can be estimated through accumulation and a right shift, avoiding the use of a complex conventional divider.
The restored pulse, free of offset, is obtained by employing Eq. (2), where yk_n is the nth restored signal sample and Sk_n is the nth input signal sample.
This real-time estimation of O_Calc makes the suggested BLR self-adjusting. The value of O_Calc adapts as a function of the incoming PET pulse. It allows an effective restoration of incoming pulses with a diverse range of bipolar offsets [29].
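The baseline restoration described above can be sketched in a few lines of Python. This is only an illustrative model, not the VHDL implementation: it assumes that Eq. (1) is the mean of the first N baseline samples and that Eq. (2) subtracts this mean from every sample, as the surrounding definitions suggest. The function names and the sample values are hypothetical.

```python
N_BASELINE = 8  # N = 8 baseline samples, as selected in the text
SHIFT = 3       # log2(8): the division by 8 becomes a 3-bit right shift


def estimate_offset(samples):
    """Assumed form of Eq. (1): mean of the first N baseline samples."""
    acc = sum(samples[:N_BASELINE])  # accumulation
    return acc >> SHIFT              # arithmetic right shift, valid for negative sums too


def restore_baseline(samples):
    """Assumed form of Eq. (2): subtract the estimated offset from each sample."""
    offset = estimate_offset(samples)
    return [s - offset for s in samples]


# Hypothetical ADC codes: a baseline carrying an offset of about +40, then a pulse
pulse = [40, 41, 39, 40, 40, 41, 39, 40, 40, 40, 140, 540, 840, 640]
restored = restore_baseline(pulse)

# The same routine handles a negative (bipolar) offset
negative = restore_baseline([-16] * 8 + [84])
```

Because the offset is recomputed from each incoming pulse, positive and negative offsets are handled by the same code path, which mirrors the self-adjusting behavior described in the text.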

The hybrid interpolator
The concept of the suggested hybrid interpolator is shown in the block diagram of Figure 5. The intended signal xk_n, generated as the sum of the outputs of both BLRs, is first conveyed to the signal leading-edge selector. The focus on the leading edge is due to the form of the subsequent timestamper, which is built by combining multi-threshold leading-edge discriminators [11].

The leading-edge selection
The signal selection mechanism is illustrated in Figure 6, where V_max is the maximum amplitude of the input signal and α is a fraction of V_max, chosen equal to 10%. If the nth input signal sample, xk_n, crosses the α·V_max threshold, then Q + 2 samples are selected. The relation between the nth selected signal sample, xs_n, and the nth input signal sample, xk_n, is expressed by Eq. (3).
This selection process avoids processing the entire signal length and thus markedly reduces the computation and power consumption of the proposed system [11].
The signal selector is composed of a magnitude comparator, a circular buffer, and a logic and control module. The concept is illustrated in Figure 7. Each input signal sample is compared with the predefined threshold α·V_max. Once an input sample exceeds that threshold, the magnitude comparator output goes high. This notifies the logic and control unit, which allows the circular buffer to output xs_n. The logic and control unit is based on a counter and a J-K latch. The comparator drives the latch. Once set, the latch enables the output port of the buffer. The counter provides the addresses of the buffer registers to be read. Finally, the logic and control unit resets the J-K latch after xs_n has been read.
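The selection logic can be modeled behaviorally as follows. This is a sketch under stated assumptions: Eq. (3) is not reproduced here, so the exact position of the Q + 2 window relative to the threshold crossing is unknown. The hypothetical parameter `pre` (samples kept before the crossing, which the circular buffer would make available) is an illustrative choice, as are the function name and the sample values.

```python
def select_leading_edge(x, v_max, alpha=0.10, q=3, pre=1):
    """Return Q + 2 samples around the first sample exceeding alpha * v_max.

    `pre` is an assumption: how many samples before the crossing are kept.
    Returns an empty list if the threshold is never exceeded, and fewer than
    Q + 2 samples if the crossing lies too close to the end of the record.
    """
    threshold = alpha * v_max
    for n, sample in enumerate(x):
        if sample > threshold:             # comparator output goes high
            start = max(n - pre, 0)        # read back from the buffer
            return x[start:start + q + 2]  # Q + 2 selected samples
    return []


# Hypothetical summed pulse (arbitrary units) with V_max = 300
xk = [0, 1, 2, 25, 120, 260, 300, 280, 200]
xs = select_leading_edge(xk, v_max=300)
```

With Q = 3, the call returns five samples, which matches the five samples passed to the first interpolation stage.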

First stage interpolator
The selected portions of the pulses are up-sampled with an interpolation factor, IF = 4, by the first interpolator stage. This means that three equally spaced samples are inserted between each pair of consecutive original samples. It improves the temporal resolution of the first stage output fourfold compared to its input. The Weighted Least Square Interpolator (WLSI) performs the up-sampling at the first stage [11].
The WLSI coefficients are determined using 10,000 summed pulses. The summed pulses provided by the adder "9" are used as the reference (cf. Figure 9). This allows the interpolated values to be compared with the reference ones, and the interpolation coefficients to be adapted so as to reduce the differences from the reference signal. In particular, these coefficients are determined with the Least Square (LS) algorithm, which minimizes the squared discrepancies between the interpolated values and the actual ones. Five samples of xs_n are passed to the first stage interpolator input (cf. Section 3). After up-sampling, it outputs 17 samples. The error functions of the WLSI coefficients for the 12 inserted samples are estimated as:
$$\vdots$$
$$E_{12} = \sum_{Np=1}^{K} \left( xs_1\, w_{1,12} + xs_2\, w_{2,12} + \dots + xs_5\, w_{5,12} - xref_{Np,n} \right)^2 \qquad (6)$$
In Eqs. (4)-(6), K is the total number of summed pulses used in the measurement. In this case, K = 5000 is selected. C is the count of samples per intended pulse. Eq. (7) determines the matrix of coefficients, w, that yields the minimum errors.
where X is the matrix of the initial values used for the interpolation. Xref is the matrix of reference samples at the time instants where approximations are made.
The program for setting the X and Xref matrices and carrying out the matrix computations to calculate the WLSI coefficients is specifically implemented in MATLAB [31]. The architecture utilized by the designed WLSI can be seen in Figure 10.
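A minimal Python sketch of the coefficient fitting is given below. The original program is in MATLAB and is not reproduced here, so everything in this block is an assumption: Eq. (7) is presumed to be the ordinary least-squares solution w = (XᵀX)⁻¹ Xᵀ Xref, solved here through the normal equations for one inserted sample at a time. The function names and the toy data (two input samples instead of five) are hypothetical.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):                    # back substitution
        tail = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_weights(X, xref):
    """Weights minimizing sum((X w - xref)^2) for one inserted sample.

    X: one row of input samples per training pulse.
    xref: the reference value at the interpolation instant, one per pulse.
    """
    k = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(X, xref)) for i in range(k)]
    return solve_linear(xtx, xty)


def interpolate_sample(xs, w):
    """One inserted sample as the weighted sum of the selected samples."""
    return sum(x * wi for x, wi in zip(xs, w))


# Toy data generated with the weights [0.25, 0.75]
X = [[1, 0], [0, 1], [1, 1], [2, 1]]
xref = [0.25, 0.75, 1.0, 1.25]
weights = fit_weights(X, xref)
```

In the setting of the text, each row of X would hold the five selected samples of one pulse, and the fit would be repeated for each of the 12 inserted samples to fill the coefficient matrix w.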

Second stage interpolator
The output of the first stage is further up-sampled by the second stage interpolator. It is presumed that the signal changes linearly between two successive samples. Therefore, in the second stage, the SLI with IF = 4 is deployed. The nth approximated value, xr_n, corresponding to the nth interpolation instant, tr_n, is equal to the mean of its previous and following incoming samples (cf. Eq. (10)).
In this way, the temporal resolution of the second stage interpolator output is 16-fold finer than that of the original incoming signal. The second stage interpolator receives 17 samples from the preceding WLSI module and outputs 65 samples. Figure 8 illustrates the architecture and operation of the SLI. It indicates that 16 identical modules operate concurrently.
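The second stage can be sketched as a cascade of midpoints. This structure is an assumption, since Eqs. (8)-(10) are not reproduced here: the midpoint is the mean of the two incoming samples, and each quarter point is the mean of the midpoint and its nearest incoming sample. It is chosen because it needs three additions and three divisions by two per interval, matching the cost reported for the SLI. The function name is hypothetical.

```python
def sli_upsample(samples):
    """Insert three linearly interpolated samples between consecutive samples."""
    if not samples:
        return []
    out = []
    for a, b in zip(samples, samples[1:]):
        mid = (a + b) / 2      # 1 addition, 1 division by two
        q1 = (a + mid) / 2     # 1 addition, 1 division by two
        q3 = (mid + b) / 2     # 1 addition, 1 division by two
        out += [a, q1, mid, q3]
    out.append(samples[-1])    # keep the last original sample
    return out


# 17 samples from the first stage give 16 intervals, hence 65 output samples
stage2 = sli_upsample(list(range(17)))
```

Each interval is independent of the others, which is consistent with the 16 identical modules operating concurrently in hardware, where the divisions by two reduce to 1-bit right shifts.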

The experimental setup
The applicability of the system is studied with the experimental setup shown in Figure 9. The radioactive source, 22Na, is moved by a robotic arm with respect to the utilized crystal scintillator matrix "6." The step between successive positions is 1 mm. The system includes a sensor consisting of four LYSO crystals of 3 × 3 × 60 mm³, enclosed on both sides by 3 × 3 mm² arrays of Hamamatsu MPPCs, "7a" and "7b." Each MPPC array consists of four MPPCs of 3 × 3 mm² area, to suit the used scintillators.
The front-end electronics modules, "8a" and "8b," handle the pulses coming from the MPPC matrices "7a" and "7b." They perform amplification, selection, and the addition of delay. The signal selection and address encoding block "10" processes the outputs of "8a" and "8b." The selected pulses are then summed by the adder circuit "9." Finally, the selected pulses supplied by "10" and their sum, supplied by "9," are digitized concurrently with an oscilloscope. The LeCroy WavePro 7300A oscilloscope is used in this study. The pulses produced by "10" are digitized at FS = 200 MHz. The sum of pulses produced by "9" is digitized at a sampling rate of 3.2 GHz. Later, when characterizing the hybrid interpolation module, the pulses acquired at 3.2 GHz are used as the reference. The approach is described further in Section 3.2.

VHDL implementation and synthesis on FPGA
The designed BLR and hybrid interpolator modules are implemented in VHDL and synthesized on an FPGA. As a first target, the 28-nm "xc7a200t" FPGA from the "Artix-7" family is considered [33]. The VHDL implementation of the circuit is synthesized on the selected FPGA by using the Xilinx Synthesis Technology (XST) [34,35]. The Register-Transfer-Level (RTL) schematics of the system are shown in Figure 11a and Figure 11b. Figure 11a shows the top-level RTL schematic and Figure 11b shows the component-level RTL schematic.
The post-synthesis summary of the FPGA resources utilized by one BLR and hybrid interpolator chain is presented in Table 1. It shows that more than 10 of the proposed BLR and hybrid interpolator chains can be implemented on a single xc7a200t chip. The count of conditioning chains is mainly limited by the DSP blocks available on a single xc7a200t chip. If needed, this limitation can be relaxed by using alternative logic-cell-based architectures and implementations, which minimize the use of DSP blocks.

Results
The statistical features of the PET pulses, such as rise time, fall time, bandwidth, and magnitude, are extracted by analyzing the PET pulses delivered by "10." Ten thousand pulses are considered in this operation. They are collected at a sampling rate of 200 MHz by utilizing the built-in ADCs of the oscilloscope. A summary is presented in Table 2.

The characterization of BLR
The utilized delay lines introduce a 50-ns delay in the selected pulses, which are acquired at a sampling rate of 200 MHz. Thus, 10 samples are collected on the baseline of each arriving pulse. For estimating O_Calc, N = 8 is selected, which corresponds to the first 40 ns of the baseline. The division of the accumulator output by 8 is performed with a 3-bit right shift. Figure 12 shows a typical performance of the designed BLR. It depicts how well the offset injected by the front-end electronics is removed from the received pulses. The amplification and conditioning chains in modules "8a" and "8b" add a particular offset to the received pulses. Due to the influence of these varying offsets, for a fixed position of the radioactive source relative to the crystal scintillator matrix "6," the incoming pulses show a peak amplitude dispersion of roughly 25%. With the designed BLR, this dispersion is limited to approximately 1%. This shows how the designed BLR will enhance the correctness of the subsequent energy and DOI estimators. It will also improve the precision of the scanner in localizing tumor cells [11,12].

The leading-edge selector
The outputs of the BLRs are summed and then transferred to the signal selection unit. The selected pulses are digitized at a rate of 200 MHz. Table 2 reveals that the mean 10% to 90% rise time of the incoming signal is 11 ns. With a 5-ns sampling period, this means that at least two samples of the digitized signal lie on the leading edge. Hence, Q = 3 is selected, so that Q + 2 = 5 samples are retained. It ensures a proper selection of the signal leading edge, the most critical signal portion needed by the used timestamp estimator [11].

The first and the second stage interpolators
The output of the signal leading-edge selector is transferred to the first interpolation stage, realized with the WLSI. In this study, this stage receives five selected signal samples, initially sampled at 200 MHz, and outputs 17 samples at a sampling rate of 800 MHz.
The SLI performs the second interpolation stage. It receives 17 samples and outputs 65 samples at a sampling rate of 3.2 GHz. The interpolation errors are then computed. For this procedure, 10,000 interactions between the radioactive source and the scintillator matrix "6" are used. These interactions are recorded at predetermined locations.
The module "10" produces a pair of pulses as an outcome of each interaction. These pulses and their sum are digitized with an oscilloscope. The pulses produced by the module "10" are recorded at a sampling frequency of 200 MHz. The sum of pulses produced by the adder circuit "9" is recorded at a sampling rate of 3.2 GHz. The error per interpolated sample, Ie_n, is estimated with Eq. (11), where xref_n is the reference sample value at the interpolation instant tr_n. It is taken from the summed analog signal sampled with the oscilloscope at 3.2 GHz. yr_n is the approximated value computed for the interpolation instant tr_n.
The error of the used interpolator is measured in terms of the Root Mean Square Deviation Error (RMSDE). The RMSDE is determined for each intended pulse by using Eq. (12), where C is the number of samples considered for the incoming pulse. The summed pulses are indexed by Np. Finally, the mean RMSDE is determined as the average of the RMSDE_Np values.
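The error measure can be sketched in Python as follows. Eqs. (11) and (12) are not reproduced here, so their forms are assumed from the definitions: the per-sample error is taken as the difference between the reference and the approximated value, and the RMSDE as the root of the mean squared error over the C samples of one pulse. The sign convention of the error and the function names are assumptions; the sign has no effect on the RMSDE.

```python
import math


def interpolation_errors(xref, yr):
    """Assumed form of Eq. (11): error per interpolated sample."""
    return [r - y for r, y in zip(xref, yr)]


def rmsde(xref, yr):
    """Assumed form of Eq. (12): root mean square deviation for one pulse."""
    errors = interpolation_errors(xref, yr)
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def mean_rmsde(pulses):
    """Average of the per-pulse RMSDE values over all recorded pulses."""
    values = [rmsde(xref, yr) for xref, yr in pulses]
    return sum(values) / len(values)


# Hypothetical (reference, interpolated) pairs for two pulses
pulses = [
    ([0.0, 0.0], [1.0, 1.0]),
    ([0.0, 0.0], [3.0, 3.0]),
]
average = mean_rmsde(pulses)
```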
For the designed hybrid interpolator, the average RMSDE value is 13.6 μV. The interpolated signal is used for the subsequent timestamp estimation. The interaction time between the γ-ray and the crystal scintillator is computed by comparing the amplitude of the arriving pulse against predetermined thresholds. The time is estimated on the basis of the threshold crossing instants. Because the digitized pulses are discrete in time, a threshold crossing is estimated as the intersection between the threshold and the straight line passing through the two successive samples that lie on either side of that threshold. For timestamp estimators based on digital discriminators, the precision of the timestamps is directly related to the temporal resolution and the magnitude accuracy of the arriving pulses [11,23].
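The threshold crossing estimate described above amounts to a linear interpolation between the two samples that straddle the threshold. The following Python sketch illustrates it for a single threshold on a rising edge; the function name, the sample values, and the 5-ns sampling period are illustrative assumptions.

```python
def threshold_crossing_time(samples, threshold, ts_ns):
    """Time (in ns) at which the rising edge crosses `threshold`.

    The crossing is the intersection of the threshold with the straight line
    through the two successive samples lying on either side of it.
    Returns None if the threshold is never crossed on a rising edge.
    """
    for i in range(len(samples) - 1):
        a, b = samples[i], samples[i + 1]
        if a < threshold <= b:                    # samples straddle the threshold
            fraction = (threshold - a) / (b - a)  # position within the interval
            return (i + fraction) * ts_ns
    return None


# Hypothetical leading edge sampled every 5 ns
edge = [0, 10, 30, 60]
t_cross = threshold_crossing_time(edge, threshold=20, ts_ns=5.0)
```

A multi-threshold discriminator would call the same routine once per threshold. A finer sampling period shortens the interval over which the straight-line approximation is made, which is why the interpolation improves the timestamp.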
In this case, the suggested hybrid interpolator enhances the temporal resolution of the selected portion 16-fold, from 5 ns to 0.3125 ns, with an average RMSDE of 13.6 μV. It can therefore significantly improve the accuracy of the subsequent timestamp estimator while the pulses are digitized with economically accessible ADCs [11].
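The sample counts and rates quoted for the two stages follow from simple arithmetic, reproduced in the short Python check below. The helper name `upsampled_length` is hypothetical; the numbers are those stated in the text.

```python
FS_ADC_HZ = 200e6  # ADC sampling frequency
IF_WLSI = 4        # interpolation factor of the first stage
IF_SLI = 4         # interpolation factor of the second stage


def upsampled_length(n_in, factor):
    """Output length when (factor - 1) samples are inserted in each interval."""
    return (n_in - 1) * factor + 1


n_selected = 5                                 # Q + 2 samples with Q = 3
n_wlsi = upsampled_length(n_selected, IF_WLSI)
n_sli = upsampled_length(n_wlsi, IF_SLI)

fs_wlsi_hz = FS_ADC_HZ * IF_WLSI               # rate after the first stage
fs_sli_hz = fs_wlsi_hz * IF_SLI                # rate after the second stage
resolution_in_ns = 1e9 / FS_ADC_HZ             # temporal resolution at the ADC
resolution_out_ns = 1e9 / fs_sli_hz            # temporal resolution at the output
```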

Comparison of the hybrid interpolator with a conventional interpolator
The effectiveness of the suggested hybrid interpolator is also compared with a counterpart mono-algorithm-based approach. Traditionally, interpolation is achieved by using a single algorithm. The computational load is directly related to the order of the interpolator and to the count of input samples.
The leading-edge selection module, in the proposed solution, allows concentrating on the pertinent portion of the signal. In the examined case, five samples per incoming pulse are picked. Without this module, however, the entire pulse duration of around 150 ns has to be considered [11,12]. For a sampling frequency of 200 MHz, the ADC delivers 30 samples per incoming pulse. This increases the computational load of the interpolator by a factor of 6.
The computational complexity of the suggested solution is determined by adding the computational costs of both stages. The computational complexity of the optimized WLSI with IF = 4 is 15 multiplications and 3 additions per input sample (cf. Figure 10). The computational complexity of the SLI is 3 additions and 3 binary-weighted divisions per input sample (cf. Figure 8). Relative to the addition and multiplication operations, the circuit-level complexity of a binary-weighted division is insignificant, since it reduces to a bit shift. By comparison, the complexity of a WLSI with IF = 16 is 45 multiplications and 15 additions for approximating the 15 samples between two consecutive originals [12].
Thanks to the leading-edge selector, in the proposed case the WLSI needs to interpolate only 5 selected samples, and it delivers 17 interpolated samples at its output. The overall computational cost of the WLSI thus becomes 75 multiplications and 15 additions. The cost of the SLI for processing 17 samples is 51 additions. This results in an overall cost of 75 multiplications and 66 additions. In contrast, in the conventional case the system has to process 30 samples covering the whole pulse length of around 150 ns, between the 10% level of the rising edge and the 10% level of the falling edge [1,2]. This results in a computational cost of 1350 multiplications and 450 additions.
These results demonstrate that the suggested solution reaches an 18-fold gain in the count of multiplications and a 6.8-fold gain in the count of additions over the mono WLSI based solution.
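The operation counts behind these gains can be reproduced with the short Python check below. The per-sample costs and sample counts are the ones quoted in the text; the variable names are illustrative.

```python
# Per-input-sample costs quoted in the text
WLSI4_MULT, WLSI4_ADD = 15, 3     # WLSI with IF = 4
SLI_ADD = 3                       # SLI with IF = 4 (divisions are bit shifts)
WLSI16_MULT, WLSI16_ADD = 45, 15  # mono WLSI with IF = 16

N_SELECTED = 5   # samples kept by the leading-edge selector
N_STAGE1 = 17    # samples delivered by the first stage
N_FULL = 30      # samples over the whole 150-ns pulse at 200 MHz

# Proposed hybrid solution
hybrid_mult = N_SELECTED * WLSI4_MULT
hybrid_add = N_SELECTED * WLSI4_ADD + N_STAGE1 * SLI_ADD

# Conventional mono WLSI solution on the whole pulse
mono_mult = N_FULL * WLSI16_MULT
mono_add = N_FULL * WLSI16_ADD

gain_mult = mono_mult / hybrid_mult
gain_add = mono_add / hybrid_add
```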
The interpolation precision of the proposed hybrid interpolator is also compared with that of the mono WLSI-based solution. The characterization uses the same pulses that were recorded and used during the measurement of the hybrid interpolator error. The interpolation error of the up-sampled signal produced by the mono WLSI interpolator is estimated by using Eq. (12). It results in an RMSDE of 11.4 μV, against 13.6 μV for the designed hybrid interpolator. This reveals that the designed solution achieves a substantial computational advantage over the mono WLSI-based approach while achieving a comparable precision.

Discussion and conclusion
The resolution and sensitivity of a brain-PET scanner rely on how accurately it measures the depths and times of the interactions between the scintillators and the γ-rays [1][2][3]. An exact measurement of the DOI requires an accurate estimate of the energies of the arriving PET pulses [12]. A precise measurement of the interaction times requires an exact calculation of timestamps [11], which helps in the accurate reconstruction of the LORs. Accurate DOIs and LORs contribute to the reconstruction of three-dimensional high-resolution tomographs, which allow a precise localization of tumor cells in the patient's brain.
The readout electronics of PET pulses produce unpredictable offsets. These cause an incorrect estimation of the pulse energies, which decreases the measurement accuracy of the DOI and timestamp. As a result, the scanner's ability to localize tumor cells accurately is diminished. An efficient BLR is proposed for minimizing the impact of offsets. It attains a dynamic offset cancelation by employing a real-time O_Calc estimation mechanism. Thanks to the suggested BLR, the peak amplitude dispersion for a given position of the radioactive source with respect to the scintillator matrix "6" is reduced from ≤25% to ≤1%. This indicates that the designed BLR would enhance the precision of the subsequent energy and DOI estimators. This should improve the scanner's accuracy in the localization of tumor cells [1][2][3][11].
After the energies, the second significant parameter to be measured in brain-PET scanners is the timestamp of the annihilation γ-rays [11]. It is used to calculate the LORs, which allow the tumor cells to be located. A PET scanner's sensitivity and tomographic resolution are directly linked to the computational accuracy of the timestamps [11,23]. The timestamp can be measured in either the digital or the analog domain. Analog timestamp calculators require the development of complex application-specific integrated circuits. Digital timestamp calculators can result in a cost-effective solution [23], realized with regular ADCs and Field Programmable Gate Arrays (FPGAs). They measure timestamps by comparing the amplitude of the digitized PET pulses to established thresholds [23,24,32]. The time is stamped by using the instants of the threshold crossings. Hence, the exactitude of the timestamp calculations is directly related to the temporal resolution and the magnitude precision of the incoming pulses. In the examined case, the incoming pulses are digitized at a sampling rate of 200 MHz, which provides 5 ns of temporal resolution. With the suggested hybrid interpolator, the temporal resolution of the selected portion of the pulses is then enhanced 16-fold. This results in an interpolated signal with a temporal resolution of 0.3125 ns and an RMSDE of 13.6 μV. It enables a significant improvement in the precision of the subsequent timestamp calculator while the pulses are acquired with economically available ADCs.
Component-level architectures of the suggested BLR and hybrid interpolator are described. The proposed chain is implemented in VHDL and synthesized on the xc7a200t FPGA. It is shown that more than 10 of the proposed BLR and hybrid interpolator chains can be implemented on a single xc7a200t chip, which costs around US$260. This reveals that the suggested concept can be realized, unlike its traditional predecessors, with cost-effective ADCs and FPGAs [12]. It avoids the production of complex high-performance application-specific integrated circuits and thus results in a cost-effective realization. In addition to cost-effectiveness, it also facilitates device reconfiguration compared to hardwired circuits while allowing a similar precision to be attained [12].
These results demonstrate the potential applicability of the proposed BLR and hybrid interpolator in current brain-PET scanners. Being based on a simple architecture, they can be easily incorporated into contemporary PET scanners and can contribute effectively to improving their tomographic resolution.