Implementation of TOF-PET Systems on Advanced Reconfigurable Logic Devices



Introduction
The ability to resolve the Time-Of-Flight (TOF) of the gamma particles resulting after the positron annihilation until their absorption by the detector material has a strong impact on the performance of the Positron Emission Tomography (PET) systems. This occurs because, by reducing the noise level, it becomes possible to also reduce the total amount of data required to reconstruct the medical image to a given quality degree. This furthermore translates into a reduction of the time required for the image acquisition or into a reduction of the radioactive dose employed. Additionally, the capability to resolve the TOF is critical for image reconstruction in situations where the detectors cannot be completely deployed around the point of interest [1].
Figure 1 shows the improvement in image quality as a function of the TOF resolution and the solid angle covered by the detectors. As can be seen from the figure, the importance of the TOF measurement grows as the solid angle covered by the detectors becomes smaller. Accordingly, TOF capability is essential to any PET system that cannot completely surround the patient, as may be the case for PET systems developed for specific applications, for instance the approach for nuclear cardiology depicted in Figure 2.
Current PET scanners are built around analog subsystems implemented with discrete circuits. Advances in electronics have made it possible to replace the analog circuits with digital equivalents. Among the reasons are that digital circuits offer higher throughput, better self-test and diagnostic capability, higher reliability and better protection of intellectual property. Against these advantages, uncertainties in the time determination appear due to the discretization and rounding effects of digital systems. Moreover, the complexity of the design tools is considerably higher [2]. PET systems contain trigger units responsible for identifying true coincidences. These units are typically based on Complex Programmable Logic Device (CPLD) or Application Specific Integrated Circuit (ASIC) devices combined with Digital Signal Processors (DSPs).
On the one hand, DSPs are designed to support high-performance, repetitive and numerically complex sequential tasks. They are specialized in the execution of repetitive algorithms that involve multiplication and accumulation operations. The ability to execute several operations with one instruction is the feature that accelerates performance in state-of-the-art DSPs [3]. Such performance strongly relies on pipelining, which increases the number of instructions that can be executed per unit of time. However, parallelism in DSPs is not very extensive; a DSP is limited in performance by the clock rate and the number of useful operations that can be performed at each clock cycle. For instance, the TMS320C6202 processor, a well-known DSP, has two multipliers and a 200 MHz clock, so it can achieve at most 400 × 10^6 multiplications per second, which is much less than a programmable logic device counterpart.
On the other hand, CPLDs are very simple reconfigurable logic devices, with a few tens of input channels and quite small logic units for data processing. They have gradually been replaced by more complex devices with a larger amount of resources. ASICs, for instance, offer better optimization of logic size and power management. For many high-volume designs, their cost per gate for a given performance level is lower than that of high-speed CPLDs or DSPs. However, the inherently fixed nature of ASICs limits their flexibility, and the long design cycle may not justify the cost for low-volume or prototype implementations, unless the design is sufficiently general to adapt to many different applications. Moreover, the development of very high performance reconfigurable logic devices, such as Field Programmable Gate Arrays (FPGAs), has allowed their successful application in a wide range of areas.
The first FPGAs lacked the gate capacity to implement demanding DSP algorithms and did not have tools good enough for implementing DSP tasks. They were also perceived as expensive and as having relatively poor power management. These limitations are being overcome with the introduction of new DSP-oriented products from Altera and Xilinx, the two leading FPGA companies. High throughput and design flexibility have positioned FPGAs as a solid silicon solution over traditional DSP devices in high-performance signal processing applications. FPGAs can provide more raw data processing power than traditional DSP processors by using massive parallelism.
Since FPGAs can be reconfigured at the hardware level, they offer complete customization when implementing various DSP applications. All these features are nowadays easy to implement by means of a new generation of specific tools. FPGAs also have features that are critical to DSP applications, such as embedded memory, DSP blocks and embedded processors. Current FPGAs provide more than 96 embedded DSP blocks, delivering at least 384 multipliers operating at 420 MHz. This results in over 160 billion multiplications per second, a performance improvement of over 30 times what is provided by the fastest DSPs. This configuration leaves the programmable logic elements of the FPGA available to implement additional signal processing functions and system logic, including interfaces to high-speed chips and external memory interfaces such as DDR2 controllers. Using high-bandwidth embedded memory, FPGAs can in certain cases eliminate the need for external memory.
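The peak figures quoted in the last two paragraphs follow from multiplying the number of hardware multipliers by the clock frequency. The short Python sketch below reproduces that arithmetic. The function name is illustrative, and the calculation assumes the idealized case in which every multiplier completes one product per clock cycle.

```python
def peak_multiplications_per_second(num_multipliers, clock_hz):
    """Upper bound on throughput: one product per multiplier per clock cycle."""
    return num_multipliers * clock_hz


# Figures quoted in the text.
dsp_peak = peak_multiplications_per_second(2, 200e6)     # TMS320C6202: 2 multipliers, 200 MHz
fpga_peak = peak_multiplications_per_second(384, 420e6)  # 384 multipliers at 420 MHz

print(f"DSP peak:  {dsp_peak:.3e} multiplications/s")
print(f"FPGA peak: {fpga_peak:.3e} multiplications/s")
```

Sustained rates in a real design would be lower, since they also depend on memory bandwidth and on how well the algorithm keeps all multipliers busy.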
In summary, FPGAs offer high-speed data transfer; fast data processing; the ability to handle a huge number of electronic signals simultaneously; and the possibility of being reconfigured to adapt to a very wide range of applications without modifying the hardware design. They additionally include hardware (Xilinx PowerPC) or software (Xilinx MicroBlaze) processor cores, depending on the model; they offer a huge storage capacity with dedicated RAM blocks and look-up table memories; and they provide a large logic capacity with tens of millions of system gates. All these features make FPGAs strong candidates to replace CPLD or ASIC devices in PET trigger units.
Besides the advantages of FPGA-based PET systems, recent advances in digital electronic design allow FPGAs to be used for TOF determination with very high accuracy, better than 100 ps [4,5]. This timing resolution opens the door to the development of trigger units for PET systems with built-in TOF capabilities at a very competitive cost. Moreover, the reconfigurability of these devices makes it easy to modify the PET setup (number of channels, detector coincidence map, etc.) and to adapt it to different environments or physical requirements. In this chapter, the main considerations for the design of TOF-PET systems based on advanced reconfigurable logic devices are presented.
In the first section, the main advantages of TOF-PET systems are highlighted and a historical review of these systems is presented. In the second section, the requirements on the scintillation crystals and detectors suitable for TOF-PET designs are described. Details of the electronic TOF implementation on FPGAs are provided in the third section. In the fourth section, the impact of the TOF information on the reconstruction algorithms is discussed and, finally, the conclusions are drawn in the fifth section.

Historical perspective
In this section, a brief description of the evolution of TOF-PET scanners from their origin to the present day is presented.
The use of TOF information in PET was already suggested by Anger [6] and Brownell [7] in the 1960s. However, it was rejected because the available scintillator crystals, photosensors and electronics were not fast enough. It was considered again when crystals such as CsF and BaF2 appeared in the early 1980s. Several TOF-PET scanners were built at that time by leading groups such as CEA-LETI in Grenoble [8,9], Ter-Pogossian's group at Washington University [10,11] and Wong's group at the University of Texas [12,13]. This first generation of TOF-PET devices achieved time resolutions ranging from 470 to 750 ps [14][15][16]. The decay time of these scintillator materials (CsF and BaF2) was very short (see Table 1 below), but their low density, low photoelectric fraction and low light output resulted in poor spatial resolution and sensitivity.
At the same time, Bismuth Germanate (Bi4Ge3O12 or BGO) also began to be used in PET designs. This scintillator has much better characteristics for PET systems, such as high detection efficiency due to its high effective atomic number (Z). However, its long decay time made it poorly suited to TOF-PET systems. It is also worth noting that in the 1980s two major companies (General Electric and Computer Technology and Imaging) entered the PET industry and gave credence to the clinical application of PET; prior to the late 1980s, most PET applications had been research applications [17,18].
The development of TOF-PET systems stalled until the discovery in the 1990s of new scintillators based on Cerium-doped Lutetium Orthosilicate (Lu2SiO5 or LSO). LSO quickly revolutionized PET imaging systems because it excelled in three fundamental detector material parameters: high density, high effective Z and a relatively high light yield with a short decay time of around 40 ns, allowing very narrow coincidence windows. The short decay time (LSO decays 7.5 times faster than BGO) made it possible to decrease patient scan times, which made patients more comfortable during the procedure and, from a clinical standpoint, increased patient throughput. The increase in patient throughput made the procedure accessible to more patients and subsequently increased the testing revenue for hospitals and PET imaging centers. The short decay time also lowered the level of random noise in these scans [5]. In terms of resolution, systems based on LSO scintillators enabled a new generation of TOF-PET scanners with timing resolutions as small as 300 ps [19]. The 1990s are thus known as the decade in which the use of PET spread and became established in the clinical sector. As more and more members of the medical community became acquainted with the utility of PET and its present and future benefits, PET imaging became increasingly popular and was available in more hospitals, diagnostic clinics, mobile systems, and physician practices.
Recently, the discovery of new materials such as Cerium-doped Lanthanum Bromide (LaBr3), with a shorter decay time (16 ns) and excellent energy resolution, has led to the development of TOF-PET systems reaching time resolutions of 420 ps, and this resolution is expected to improve to 315-330 ps [20]. LaBr3 has the drawback of being hygroscopic and thus requires careful handling and assembly.
Finally, from a commercial point of view, only two TOF-PET scanners have been introduced in the market, by Philips and Siemens. The Gemini TF PET-CT has been marketed by Philips since 2006; it uses LYSO scintillator crystals (similar to LSO but with slightly lower density) and achieves a time resolution of 585 ps [21]. Recently, results have been presented for the Siemens TOF-PET scanner, called mMR, showing a time resolution of 550 ps [22].
Currently, in parallel with advances in scintillator materials, new fast and cost-effective photosensors are being developed. Silicon Photomultipliers (SiPMs) are at the forefront of this development. They are almost unaffected by magnetic fields [23], are very fast and have high gain. SiPMs aim to improve TOF resolution due to their fast timing [24]. Single-photoelectron timing resolutions close to 50 ps root-mean-square have been reported [25]. It is expected that a new generation of TOF-PET scanners based on fast scintillators and SiPMs will be able to achieve unprecedented time resolutions.
For additional information about the historical development of TOF-PET systems, excellent reviews can be found in the literature, for instance in [26][27][28].

Crystals and detectors for TOF-PET scanners
The capability of PET systems to deliver highly accurate TOF performance strongly depends on the read-out electronics but also on the detector block itself. In this section, the main considerations about this block, namely the type of crystal and the photosensor, are presented, paying special attention to their timing properties.

Crystals for TOF-PET systems
Scintillator crystals for PET devices must be as dense as possible, since they have to stop the 511 keV photons produced in the positron-electron annihilation. Such crystals need to generate large amounts of scintillation light to be detected by the photosensors. The crystal light yield is very important since it directly affects the energy resolution of the system, but also the spatial resolution and, ultimately, the timing performance. To increase the photon emission probability in the visible range during the relaxation process, most crystals are doped with small quantities of impurities, which generate intermediate energy states.
In order to obtain fast output signals from the scintillation light, it is also important that the decay time of this light be as short as possible. Moreover, the emission wavelength should match the sensitivity of the photosensor used for electronic conversion. NaI(Tl) was one of the first types of crystal used in PET designs. It generates significant amounts of scintillation light, providing high energy resolution and thus making it possible to distinguish, for instance, photons of similar energies. One drawback of this material is that it is hygroscopic, which requires it to be used in dry environments. In contrast to NaI, and as stated before, BGO has been the most widely used scintillation crystal for PET applications, especially due to its high density, although it lacks a good light yield and, therefore, a good time response.
GSO (Gadolinium orthosilicate) has also been considered for PET designs, although its light yield is also low compared to others. In this ranking, LSO (Lutetium oxyorthosilicate) is well positioned, offering a stopping power similar to that of BGO while also generating a high light yield compared to NaI. Nowadays an LSO variant, commercially named LYSO, is widely used since its performance is very similar to that of LSO but at a lower price.
We now focus on the decay time of the scintillation light, since it is the dominant property for achieving an accurate TOF determination. As briefly introduced above, the scintillation light is described by a fast increase of the intensity followed by an exponential decrease of the emission. Here, the scintillation decay time is defined as the time after which the light pulse intensity has fallen to 1/e of its maximum.
The time resolution is conditioned by the rise time, the decay time and the absolute light output. Since the rise time is negligible compared to the decay time, only the decay time and the light output determine the intrinsic limits of the time resolution. In particular, faster decay times and higher light outputs reduce, i.e. improve, the time resolution. The shorter the decay time, the lower the sensor dead time, allowing more events to be processed. The high initial photon emission rate suggests that LSO should provide excellent timing properties. However, the timing properties of a scintillator depend on both the energy deposited in the crystal and the geometry of the scintillation crystal.
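To make the role of the decay time concrete, the following sketch models the light pulse as a single exponential with negligible rise time, as argued above. The function names are hypothetical, and the photon count is an arbitrary placeholder shared by both crystals so that only the decay time differs; real light yields differ between materials. The decay times are those quoted earlier in this chapter (about 40 ns for LSO, and 7.5 times longer for BGO).

```python
import math


def emission_rate(t_ns, n_photons, decay_ns):
    """Photon emission rate (photons/ns) for I(t) = (N / tau) * exp(-t / tau)."""
    return (n_photons / decay_ns) * math.exp(-t_ns / decay_ns)


def fraction_emitted(t_ns, decay_ns):
    """Fraction of the total light emitted during the first t_ns nanoseconds."""
    return 1.0 - math.exp(-t_ns / decay_ns)


# Placeholder photon count (NOT a measured light yield), identical for both crystals.
N_PHOTONS = 1000.0
# Decay times taken from the text: LSO about 40 ns, BGO 7.5 times slower.
DECAY_NS = {"LSO": 40.0, "BGO": 7.5 * 40.0}

for name, tau in DECAY_NS.items():
    initial = emission_rate(0.0, N_PHOTONS, tau)
    early = fraction_emitted(1.0, tau)
    print(f"{name}: initial rate {initial:.2f} photons/ns, "
          f"{100 * early:.2f}% of the light within the first ns")
```

Under this simplified model the initial rate is N/tau, so for an equal number of photons the faster crystal delivers proportionally more photons in the first nanosecond, which is the portion of the pulse that drives the timing measurement.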

Photosensor, detectors capable of TOF and signal types
The photosensors are the next piece of the puzzle in reaching high time resolution. Two main groups of photosensors are currently in use in PET technology, namely Photomultiplier Tubes (PMTs) and solid-state photodiodes.
PMTs use the external photoelectric effect. The scintillation photon enters the PMT through the window and deposits its energy in the photocathode coating, where it releases an electron. The photoelectrons are accelerated and focused onto the first dynode with the help of an electric field. They are multiplied on impacting the first dynode, and this process is repeated at each subsequent dynode. A typical PMT gain is about 10^6 from the photocathode to the anode. The gain can be increased by raising the supply voltage and the number of dynode stages.
Most scintillators emit in the 400 nm range, allowing the use of borosilicate glass-windowed PMTs. Many of the PMTs used in commercial PET cameras have transit times that vary significantly across the face of the PMT. The transit time is the interval between the light pulse striking the photocathode and the appearance of the pulse signal at the anode. It is inversely proportional to the square root of the supply voltage. However, concerning the time resolution of the PMTs, this is better defined as the mean transit time. TOF measurements with PMT-based PET scanners require a very small transit time variation among the different PMTs used in the design, but also across the different pads (anodes) of each individual device.
The coupling of PMTs and scintillation crystals makes it possible to recover the photon impact position.
In the case of multi-anode PMTs it is somewhat easier to derive this incidence position. The location of the interaction is obtained by measuring the light detected on each anode. This is referred to as Anger logic.
Semiconductor detectors and, in particular, Avalanche Photodiodes (APDs) have proven to be suitable photosensors for PET detectors since the mid-1990s. These compact and reliable silicon-based devices have successfully been used to replace bulky photomultiplier tubes in high-resolution PET systems. Since arrays of small crystals are most commonly used as the scintillation block, these crystal pixels may be individually coupled to single small-area APDs. These sensors are very thin and, because of the high internal electric field and the short transit distances of the charge carriers, they are quite immune to magnetic fields. This characteristic allows them to be placed inside a magnet and to operate quite normally. APDs have been tested in high magnetic fields of up to 7 or 9.4 T without showing any performance degradation [29].
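As a brief aside on the Anger logic mentioned above for multi-anode PMTs, the position estimate can be sketched as a signal-weighted centroid of the anode coordinates. This is a minimal illustration and not the read-out scheme of any particular scanner: the function name, the 2 × 2 anode layout and the signal values are all hypothetical.

```python
def anger_position(signals, xs, ys):
    """Signal-weighted centroid of the anode positions.

    signals: light detected on each anode (arbitrary units)
    xs, ys:  coordinates of the centre of each anode
    Returns (x, y, total), where total is the summed signal,
    which is proportional to the deposited energy.
    """
    total = sum(signals)
    if total <= 0:
        raise ValueError("no light detected")
    x = sum(s * px for s, px in zip(signals, xs)) / total
    y = sum(s * py for s, py in zip(signals, ys)) / total
    return x, y, total


# Hypothetical 2 x 2 multi-anode layout with anode centres at (+/-1, +/-1).
anode_x = [-1.0, 1.0, -1.0, 1.0]
anode_y = [-1.0, -1.0, 1.0, 1.0]

# More light on the two left-hand anodes pulls the estimate towards negative x.
x, y, energy = anger_position([3.0, 1.0, 3.0, 1.0], anode_x, anode_y)
print(f"estimated position: ({x:+.2f}, {y:+.2f}), summed signal: {energy}")
```

The summed signal is returned alongside the position because the same anode readings are normally used for the energy estimate of the event.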
Although APDs are compact and insensitive to magnetic fields, they present limitations for optimal PET performance. In particular, they are hardly usable for TOF measurements due to their slow response time. They also show low gains, on the order of a few hundred, and therefore require sophisticated preamplifiers. These drawbacks seem to be overcome by the so-called Silicon Photomultipliers (SiPMs). Note that these devices are named differently depending on the manufacturer.
A SiPM consists of multiple tiny (currently up to about 20 microns side length) avalanche photodiodes (so-called microcells) connected to a common electrode structure. When a reverse bias higher than the breakdown voltage is applied to the SiPM, each microcell operates in Geiger mode, providing single-photon counting capability. However, a single photoelectron saturates the microcell, which limits the linear response of the device, as a function of the number of photoelectrons, to about half the number of microcells. Like APDs, SiPMs are compact, exhibit good photon detection efficiency (PDE) and do not need a high-voltage power supply. Compared with APDs, they require simpler electronics and provide a high gain (10^5 to 10^6).
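The saturation effect described above can be illustrated with a commonly used first-order model, in which photoelectrons land on the microcells uniformly at random and each cell fires at most once per pulse. The sketch below is an approximation under those assumptions: it ignores crosstalk, afterpulsing and cell recovery, and the function name and the 1000-cell device are hypothetical.

```python
import math


def fired_microcells(n_photoelectrons, n_microcells):
    """Expected number of fired cells: N_cells * (1 - exp(-N_pe / N_cells))."""
    # expm1 keeps the result accurate when N_pe is much smaller than N_cells.
    return -n_microcells * math.expm1(-n_photoelectrons / n_microcells)


N_CELLS = 1000  # hypothetical device size

for n_pe in (10, 100, 500, 1000, 5000):
    fired = fired_microcells(n_pe, N_CELLS)
    print(f"{n_pe:5d} photoelectrons: {fired:7.1f} fired cells "
          f"({100 * fired / n_pe:5.1f}% of a linear response)")
```

In this model the response is close to linear while the number of photoelectrons is small compared with the number of microcells, and it flattens progressively as the two become comparable.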
Due to their excellent timing resolution (hundreds of picoseconds), SiPMs are currently considered as the best choice for future TOF-PET applications. Their insensitivity to magnetic fields makes them ideal for the development of hybrid PET-MRI scanners. Moreover, their costs are expected to diminish rapidly in the near future due to increasing competition (there is no patent for the main invention) and automated massive production.

Design and implementation of a TOF-PET system based on an FPGA
In this section, several electronic techniques for TOF measurement are described, and the concept of FPGAs is introduced. The main features of this technology are identified and their impact on the total system performance is discussed. Once the use of FPGAs has been justified, the multiple implementation techniques, together with their advantages and benefits for TOF measurements, are presented.

Electronic techniques enabling TOF calculation
There exist several techniques for electronically measuring the TOF of the photons. In a first approach, TOF systems were based on analog circuits, using extremely uniform current sources and converting the electrical charge accumulated by a capacitor into a voltage proportional to the charging time, which was later digitized [30], as illustrated in Figure 4. This technique presents several drawbacks, mainly related to scalability, design complexity and static power dissipation. Nowadays, most TOF measurement devices are based on digital circuits using delay lines in different configurations. These devices use the propagation delay across the individual digital blocks to measure TOF [31][32][33], so they are able to measure it with a resolution finer than the system clock period. Figure 5 represents a digital delay line used to measure TOF. Digital TDCs (Time to Digital Converters) overcome the drawbacks of the analog approach and, if properly designed, they can even compensate for the effect of temperature and/or power supply fluctuations. However, most of them are built on ASICs, so they are expensive, have a reduced number of available channels, and their functionality is limited. Here, the development in recent years of very sophisticated reconfigurable logic devices opens up the possibility of integrating digital TDCs in high-performance FPGAs.
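For the analog approach described at the start of this subsection, the relation between the digitized voltage and the measured time follows from charging a capacitor with a constant current, t = C·V/I. The sketch below applies that relation; the function names are illustrative, and the component values (100 pF, 1 mA, a 12-bit converter with a 1 V range) are hypothetical numbers chosen only to show the orders of magnitude involved.

```python
def charge_time_s(voltage_v, capacitance_f, current_a):
    """Time needed for a constant current to charge a capacitor to voltage_v."""
    return capacitance_f * voltage_v / current_a


def time_per_adc_code_s(full_scale_v, adc_bits, capacitance_f, current_a):
    """Time interval represented by one ADC code when the voltage is digitized."""
    lsb_v = full_scale_v / (2 ** adc_bits)
    return charge_time_s(lsb_v, capacitance_f, current_a)


# Hypothetical component values, for illustration only.
C_F = 100e-12   # 100 pF capacitor
I_A = 1e-3      # 1 mA current source

print(f"0.5 V corresponds to {charge_time_s(0.5, C_F, I_A) * 1e9:.1f} ns")
print(f"one 12-bit code over 1 V corresponds to "
      f"{time_per_adc_code_s(1.0, 12, C_F, I_A) * 1e12:.1f} ps")
```

The linear relation only holds as long as the current source is uniform, which is one reason the text stresses the need for extremely uniform current sources in this kind of circuit.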

FPGAs overview
FPGAs are pre-fabricated silicon devices that can be electrically programmed to carry out multiple digital functions. Unlike microprocessors or computers, in which programming means changing the instructions fed to the device, programming an FPGA consists of changing the internal logic of the device.
Historically, their strongest competitors in the market were ASICs, which are designed for a specific application using CAD (Computer-Aided Design) tools. Developing an ASIC takes a long time, but ASICs have a great advantage in terms of recurring costs, as very little material is wasted due to the fixed number of basic elements in the design. With an FPGA, a certain number of basic elements are always wasted, as these packages are standard. This means that the cost of an FPGA is often higher than that of a comparable ASIC. Although the recurring cost of an ASIC is quite low, its non-recurring cost is relatively high, often reaching into the millions. Since it is non-recurring, though, its value per IC (Integrated Circuit) decreases with increased volume. If the cost of production is analyzed in relation to the volume, it is found that, at low production volumes, using an FPGA actually becomes cheaper than using an ASIC [34]. Furthermore, with an ASIC it is hardly possible to correct errors after fabrication.
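The volume argument above can be made explicit with a simple linear cost model, in which the total cost is the non-recurring cost plus the unit cost times the number of devices. The following sketch computes the production volume at which both options cost the same. All monetary figures are hypothetical placeholders chosen for round numbers, not data from [34], and the function names are illustrative.

```python
def total_cost(nre, unit_cost, volume):
    """Total cost of producing `volume` devices under a linear cost model."""
    return nre + unit_cost * volume


def break_even_volume(asic_nre, asic_unit, fpga_nre, fpga_unit):
    """Volume above which the ASIC becomes the cheaper option."""
    if fpga_unit <= asic_unit:
        raise ValueError("model assumes the FPGA has the higher unit cost")
    return (asic_nre - fpga_nre) / (fpga_unit - asic_unit)


# Hypothetical figures, in arbitrary currency units.
ASIC_NRE, ASIC_UNIT = 1_000_000, 10
FPGA_NRE, FPGA_UNIT = 0, 110

n_star = break_even_volume(ASIC_NRE, ASIC_UNIT, FPGA_NRE, FPGA_UNIT)
print(f"break-even volume: {n_star:.0f} units")

for volume in (1_000, 100_000):
    asic = total_cost(ASIC_NRE, ASIC_UNIT, volume)
    fpga = total_cost(FPGA_NRE, FPGA_UNIT, volume)
    cheaper = "FPGA" if fpga < asic else "ASIC"
    print(f"{volume:7d} units: ASIC {asic}, FPGA {fpga}, cheaper: {cheaper}")
```

Below the break-even volume the non-recurring cost dominates and the FPGA is cheaper, which is typically the situation for prototype or low-volume instruments.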
In contrast to ASICs, FPGAs are configured after fabrication, which also allows the user to reconfigure them later. This is done with a hardware description language (HDL), which is compiled to a bitstream and downloaded to the FPGA. The disadvantages of FPGAs are that the same application needs more chip area and runs slower than on its ASIC counterpart. Due to the size reduction of the basic components, FPGAs have become more powerful over the years. Meanwhile, ASIC development has declined and become more expensive. Figure 6 shows the design flows of the two devices.
From Figure 6 it is easy to observe the higher complexity involved in an ASIC design, for instance:
• Design for Testability (DFT) insertion. This technique is used to check whether the manufacturing process has added defects to the chip. DFT insertion means incorporating additional logic to improve the testability of the internal nodes of the design.
• Hand-off to foundry. The process takes several months due to the "personalized" design.
• Equivalency checking. A system design flow requires a comparison between a Transaction Level Model (TLM) and its corresponding Register Transfer Level (RTL) specification.
• Verification of 2nd and 3rd order effects. This stage is not included in the FPGA design flow because it is carried out by the manufacturer.
An FPGA design flow eliminates the complex and time-consuming floorplanning (design and interconnection of the internal blocks), place and route, timing analysis, and other stages of the ASIC design project since the design logic is already synthesized to be placed onto an already verified, characterized FPGA device. However, when needed, manufacturers provide the advanced floorplanning, hierarchical design, and timing tools to allow users to maximize the performance for the most demanding designs. Furthermore, FPGA technologies are considered very competitive due to the wide specification ranges. Each manufacturer provides FPGAs with different capabilities that adapt to the desired application. There are families for high performance applications, for high volume of production and even radiation tolerant families.
CPLDs are, in some cases, a good alternative to FPGAs. They have an internal architecture similar to that of FPGAs, as shown in Figure 7. CPLDs are composed of digital blocks, which implement digital functions analogous to those of the FPGA, IOBs (Input Output Blocks) and interconnection matrices. In general terms, CPLDs have fewer internal resources than FPGAs but are able to achieve better speeds. However, when a considerable number of resources such as memory blocks and multipliers are required, FPGAs are still the best choice. In fact, most current FPGAs incorporate Digital Signal Processing blocks, which contain internal multipliers. FPGAs have become more popular and, thus, CPLDs have experienced a noticeable decrease in production, which gives FPGAs a better guarantee of continuity. Therefore, FPGAs are increasingly applied to high-performance embedded systems.

FPGA internal architecture
In the following, a basic description of the internal blocks of an FPGA is presented. Its basic structure is composed of three main blocks:
• CLBs (Configurable Logic Blocks). Generic blocks, which contain digital logic for implementing specific functions.
• IOBs. They are used to connect the FPGA to the other systems of the whole application.
• Programmable Interconnect. Enables the communication between CLBs and IOBs.
In addition to these basic blocks, FPGAs incorporate:
• Distributed memory blocks that store the user-programmed configuration.
• Clock blocks that generate additional clock signals for use either in internal blocks or for external purposes.
• Other blocks that manage the proper coexistence of all the resources.

FPGA design for TOF measurement
As commented above, there are several alternatives for implementing the TOF determination. Many of them are based on ASICs, which are expensive, hardly reconfigurable, and need to be produced in high volumes to be cost-effective. In contrast, the reconfiguration capabilities of FPGAs and their low cost compared to other solutions have made them ideal candidates for the development of complex electronic equipment such as PET systems [35]. Additionally, it is technically possible to use FPGAs to measure TOF with a very high time resolution [36], much better than that of current commercial PET systems, which is around 600 ps. Thus, to be competitive in the market, the electronic device responsible for the TOF measurement must be able to distinguish events separated by time intervals on the order of a few tens to hundreds of picoseconds. In this subsection, the main considerations for TOF calculation using an FPGA are presented.

Time to digital converter
TDC is a well-known technique traditionally used for TOF determination [37]. The goal of a TDC is to recognize events and to provide a digital representation of the time at which they occurred. There are many possible TDC implementations. Focusing on digital TDCs and leaving analog TDCs aside, the simplest is a high-frequency counter whose value is incremented at each clock cycle. When an event occurs, the accumulated number of clock periods is stored and presented. The drawback of this approach is that the stored count is an integer number of clock cycles and, therefore, the resolution is limited to the clock period. Thus, in order to obtain a finer resolution, a faster clock is required. However, the higher the frequency, the more severe the signal integrity problems, which translates into a complex system design. Moreover, the stability of the system clock becomes critical.
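The resolution limit of the counter-based TDC can be seen in a few lines of code. The sketch below works in integer picoseconds to avoid rounding issues; the function names and the 200 MHz clock (5000 ps period) are hypothetical choices for illustration.

```python
def counter_tdc(event_ps, clock_period_ps):
    """Timestamp an event with a free-running counter.

    Returns (count, timestamp_ps, error_ps): the number of whole clock
    periods elapsed, the timestamp reported by the counter, and the part
    of the event time that the counter cannot resolve.
    """
    count, residual_ps = divmod(event_ps, clock_period_ps)
    return count, count * clock_period_ps, residual_ps


def clock_hz_for_resolution(resolution_ps):
    """Clock frequency a pure counter would need for a given resolution."""
    return 1e12 / resolution_ps


CLOCK_PERIOD_PS = 5000  # hypothetical 200 MHz system clock

count, stamp, error = counter_tdc(12_300, CLOCK_PERIOD_PS)
print(f"count={count}, timestamp={stamp} ps, unresolved={error} ps")
print(f"clock needed for 100 ps bins: {clock_hz_for_resolution(100) / 1e9:.0f} GHz")
```

The unresolved remainder can be anywhere between zero and one clock period, which is why a plain counter would need a multi-gigahertz clock to reach the sub-100 ps range discussed in this chapter.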
Interpolation circuits emerged from the need to measure time intervals shorter than the clock period. These circuits measure the time between a clock edge and the event being measured. One of their problems is the time the TDC requires to perform a measurement, which blocks new measurements for a certain period. One of the most widely implemented interpolating structures is the Vernier delay line.
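The principle of a Vernier delay line can be sketched with an idealized model: the earlier (start) signal travels along a chain of slightly slower taps and the later (stop) signal along a chain of slightly faster taps, so the stop signal gains the difference between the two tap delays at every stage. The stage at which it catches up encodes the interval. The code below is a behavioural illustration with perfectly uniform taps; the function name and the tap delays of 110 ps and 100 ps are hypothetical.

```python
def vernier_measure(interval_ps, slow_tap_ps, fast_tap_ps, n_stages):
    """Idealized Vernier delay line.

    Returns (stage, measured_ps), where stage is the first stage at which
    the stop signal has caught up with the start signal, or None when the
    interval exceeds the range of the line.
    """
    step_ps = slow_tap_ps - fast_tap_ps  # effective resolution per stage
    if step_ps <= 0:
        raise ValueError("the start chain must be slower than the stop chain")
    for stage in range(1, n_stages + 1):
        start_arrival = stage * slow_tap_ps               # start launched at t = 0
        stop_arrival = interval_ps + stage * fast_tap_ps  # stop launched later
        if stop_arrival <= start_arrival:
            return stage, stage * step_ps
    return None


# Hypothetical taps: 110 ps and 100 ps give 10 ps steps, finer than either tap.
print(vernier_measure(95, 110, 100, 64))
print(vernier_measure(1000, 110, 100, 64))
```

The sketch also shows the trade-off mentioned above: a longer interval needs more stages to resolve, so the conversion takes longer and the measurable range is bounded by the number of stages times the step.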
Until recently, TDCs were ASIC implemented either by companies which launched the product to the market or by owners who wanted a specific design. Nowadays, the use of FPGAs aimed at this purpose is getting more popular [36][37][38]. Low cost, fast development cycle and commercial availability are some of the motivations of this fact. Other trade-offs of using FPGAs compared to ASICs have been amply discussed in previous sections. Sometimes a TDC is completely included within an FPGA but, depending on the application, some parts may be outside FPGA. Beyond the delay line, current TDCs contain many other elements. An example of a TDC block diagram is depicted in Figure 8. Figure 8 represents a basic scheme of a modern TDC. The most complex block corresponds to the delay line, which will be deeply discussed below. A simplified description of a TDC follows: • A calibration signal is initially selected for the system calibration. This is a necessary task to determine the individual delay of the elements from which the delay line is composed by. The raw counter (not yet in terms of time) is stored into the histogram memory.
• Each raw value previously stored in the histogram memory is converted into a real time value and stored in a look-up table (LUT).
• The system is then ready to receive an event through the "signal" connection, which is selected by the "select" signal.
• When the time to be measured is longer than the clock period, the number of whole clock cycles is stored by the coarse counter block.
• When an event occurs, the "signal" is fed into the delay line. The encoder counts the number of elements reached by the signal and provides this number to the LUT, which converts it to a time value. This value is then combined with the coarse counter value to generate the final timestamp.
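The calibration and timestamping steps listed above can be sketched in Python as follows. This is a minimal sketch under two stated assumptions that the text does not fix: the calibration hits are uniformly distributed with respect to the clock (so each tap's hit count is proportional to its width), and the delay line measures the time from the event to the following clock edge (so the fine time is subtracted from the coarse time). The function names are hypothetical.

```python
def build_lut(histogram, clock_period_ps):
    """Convert per-tap calibration hit counts into per-tap times (bin centres)."""
    total = sum(histogram)
    lut = []
    elapsed = 0.0
    for hits in histogram:
        # Assumption: bin width is proportional to the share of calibration hits.
        width = clock_period_ps * hits / total
        lut.append(elapsed + width / 2.0)  # centre of this bin
        elapsed += width
    return lut


def timestamp_ps(coarse_count, tap_index, lut, clock_period_ps):
    """Combine the coarse counter with the fine time read from the LUT."""
    # Assumption: the fine time runs from the event to the next clock edge.
    return coarse_count * clock_period_ps - lut[tap_index]
```

For example, a four-tap histogram `[100, 300, 200, 400]` with a 4000 ps clock yields bin centres of 200, 1000, 2000 and 3200 ps.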

System architecture
TDCs may incorporate more than one channel; the block diagram described above refers to a multiple-channel TDC. In this case, the TDC channels share the histogram memory block and the coarse counter block. When the final timestamp is generated, the coarse counter block stores the coarse time associated with each channel number. Analogously, after the initial calibration, the histogram memory block stores the time delay of each tap for each channel.
The importance of multiple channels lies in the possibility of grouping the TOF measurement of a complete PET system in a single device [35]. The outputs of the detectors placed on the PET ring have to be fed into a trigger unit, which is responsible for the data processing. When a signal from one detector is received, the system waits a certain time in order to receive another signal from an opposite detector (or from a defined set of them). The CFD (Constant Fraction Discriminator) block adapts the voltage levels of the detector signals to those required by the FPGA without disturbing the timing information. A TDC measures the time difference between the events coming from the two detectors in order to estimate the TOF. The data are then transferred to the co-processor unit (see below), which forwards them to the acquisition control unit. Figure 9 represents this architecture. The selected FPGA must have enough resources to accommodate the required channels. The key resources to consider are those that will form the delay line. Depending on the total number of channels needed by the application, it is essential to pay attention to the resources in which the delay elements will be placed and to select the FPGA accordingly.
Regarding channel implementation, FPGAs offer more flexibility than other devices when a large number of input channels is required. The channels can be defined dynamically by software, and some of them can be enabled or disabled if required, so that the resources freed in this way can be used for other purposes.
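The coincidence logic of the trigger unit described in this section can be illustrated with a short Python sketch. It assumes that single events arrive as time-sorted `(timestamp_ps, detector_id)` tuples and that a hypothetical `opposite` table lists, for each detector, the set of detectors accepted as its counterpart. A real trigger unit would implement this in hardware and handle multiple hits more carefully.

```python
def find_coincidences(events, window_ps, opposite):
    """Pair consecutive single events that fall inside the coincidence window.

    events   : list of (timestamp_ps, detector_id), sorted by timestamp
    opposite : dict mapping a detector id to the set of accepted partners
    Returns a list of (first_detector, second_detector, time_difference_ps).
    """
    pairs = []
    i = 0
    while i < len(events) - 1:
        t0, d0 = events[i]
        t1, d1 = events[i + 1]
        if t1 - t0 <= window_ps and d1 in opposite[d0]:
            pairs.append((d0, d1, t1 - t0))
            i += 2  # both singles are consumed by this coincidence
        else:
            i += 1  # no valid partner: discard the first single
    return pairs
```

The time difference returned for each pair is the quantity from which the TOF is estimated.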

Delay line
Basically, a delay line is a chain of interconnected elements through which a signal propagates. It is normally used to measure the time between two or more events. Each delay element (also referred to as a tap or bin) has a propagation delay (τ) and a storage block (see Figure 5). At a certain time instant, the incoming signal is stopped and the total number of taps reached is counted.
Since the propagation delay of each element has been measured beforehand, the time interval from the arrival of the input signal at the delay line until the signal is halted can be determined.
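In code, this amounts to summing the delays of the taps the signal has reached. The sketch below also includes a helper that checks whether the chain spans at least one clock period, a condition any interpolating delay line has to satisfy. The names and the example delays are illustrative assumptions.

```python
def interval_ps(tap_delays_ps, taps_reached):
    """Time elapsed in the delay line: sum of the delays of the taps reached."""
    return sum(tap_delays_ps[:taps_reached])


def covers_clock_period(tap_delays_ps, clock_period_ps):
    """True if the whole chain is at least as long as one clock period."""
    return sum(tap_delays_ps) >= clock_period_ps
```

For instance, with tap delays of 50, 60, 55, 45 and 70 ps, a signal that reached three taps has travelled for 165 ps, and the 280 ps chain covers a 250 ps clock period but not a 300 ps one.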
It is very important to note that the total delay of the delay chain must be equal to or greater than the clock period. Additionally, when high accuracy in TOF measurements is required (below 100 ps), any change in the propagation characteristics of the delay elements or of the delay line path (the path that joins the bins) becomes critical. Three major issues threaten them [37,38]:

a. PVT
The propagation characteristics of the delay elements depend on temperature and voltage. This means that the temperature variations inside the device and the variations of the supply voltage have to be controlled. In ASIC-based TDCs it is possible to compensate the delay variation by analog means, more precisely by generating a control voltage with an ad hoc internal circuit. In FPGAs analog calibration is not suitable, and digital compensation is adopted instead [39].
The two most popular approaches proposed so far are:
• Double registration. In this approach the total delay of the delay line is designed to be longer than the system clock period. After a random time, the incoming signal is registered twice so that the average time value can be taken. This solution offers a fast response, but its drawback is that it does not calibrate every bin independently, since the average is taken over bins of different widths.
• Statistical. In this approach the calibration process provides a compensated delay for each bin. In many cases the calibration runs automatically through a dedicated feedback mechanism. For instance, a component that is also affected by PVT (Process, Voltage and Temperature induced variations) is implemented and placed close to the delay line so that it experiences the same temperature and voltage variations. This component might be, for example, a ring oscillator whose oscillation frequency depends on temperature and voltage. Initially, the ring oscillator frequency is measured and stored together with the initial time delay of each tap. Then, once the system has been calibrated, it continuously checks whether the ring oscillator frequency has changed. If it has, the time values of each tap are interpolated according to the change in the ring oscillator frequency.
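The ring-oscillator compensation described in the statistical approach can be sketched as follows. The sketch assumes, as a simplification not stated in the text, that every tap delay scales in the same proportion as the ring oscillator period, that is, inversely with its frequency. The function name is hypothetical.

```python
def compensate_taps(tap_delays_ps, f_ring_cal_hz, f_ring_now_hz):
    """Rescale calibrated tap delays after a drift of the ring oscillator.

    Assumption: gate delays are proportional to the ring oscillator period,
    so a lower frequency than at calibration time means longer tap delays.
    """
    scale = f_ring_cal_hz / f_ring_now_hz
    return [delay * scale for delay in tap_delays_ps]
```

If the oscillator ran at 200 MHz during calibration and now runs at 160 MHz, every tap delay is stretched by a factor of 1.25, so a 50 ps tap is treated as 62.5 ps.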

b. Delay line placement.
Design tools often place the delay elements of a TDC automatically, which sometimes results in imbalanced delays [37]. FPGAs contain repetitive structures commonly known as chain structures, which FPGA manufacturers provide for general-purpose applications. Their benefit is the short connection path between consecutive elements, which makes them appropriate for implementing TDC delay lines. Some of the chain structures that vendors include in many FPGAs are carry chains, sum-of-products chains and cascade chains. Figure 10 shows an internal view of a commercial FPGA with a deployed carry chain structure. The red blocks correspond to the delay line, which in this case is composed of carry logic structures. This is one possible placement of many. Depending on the length of the delay line, the carry chain can be located in multiple areas, as long as the region contains carry elements (they are not present in all FPGA blocks). In this case, the carry chain occupies almost 400 slices. It often happens that there is not enough space to accommodate the whole carry chain in a single column and additional columns are required, which makes the delay line less uniform. Therefore, in some designs placement constraints must be taken into account.

c. Differential non linearity (DNL).
The non-uniformity of the tap delays is the greatest disadvantage of FPGA delay line implementations. It originates from the way in which the delay taps are connected internally, which in some cases is decided by a CAD tool. Moreover, the discrepancies are related to particular features of some FPGAs: specifically, when the input signal crosses Logic Array Block boundaries, extra delays are added that cause ultra-wide bins [38]. An example of this effect is depicted in Figure 11, where the DNL (Differential Non-Linearity of the delay bins) is easy to appreciate. This effect deteriorates the time resolution of the TOF measurement system but, fortunately, there are techniques to reduce it if required [38].
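To make the effect of non-uniform bins concrete, the following Python sketch computes the DNL of each bin relative to the mean bin width, together with the RMS quantization error of the whole line. The RMS expression is a standard result that assumes events are uniformly distributed in time, so that a bin of width w is hit with probability proportional to w and contributes a variance of w²/12. It is not taken from the text, and the function names are illustrative.

```python
import math


def dnl_lsb(bin_widths_ps):
    """DNL of each bin, in units of the mean bin width (LSB)."""
    mean = sum(bin_widths_ps) / len(bin_widths_ps)
    return [w / mean - 1.0 for w in bin_widths_ps]


def rms_quantization_error_ps(bin_widths_ps):
    """RMS error of a delay line with unequal bins, for uniformly timed events."""
    total = sum(bin_widths_ps)
    return math.sqrt(sum(w ** 3 for w in bin_widths_ps) / (12.0 * total))
```

Four uniform 30 ps bins give an RMS error of 30/√12 ≈ 8.7 ps, whereas the same 120 ps span split into bins of 20, 20, 20 and 60 ps gives about 12.9 ps, showing how a single ultra-wide bin degrades the resolution.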

Co-processor
An important part of a TOF measurement system is the co-processor. The goal of this component is to manage the information coming from the TDC and to provide a timestamp to the next part of the system. Traditionally, it was not included in the trigger system but added as an extra module. Trigger systems have since become more complex and now integrate more sub-systems.
With the advent of modern devices, co-processors have been integrated into the main part of the trigger system, namely FPGAs or ASICs. In ASICs, and in the FPGAs of the last decade, the co-processor was integrated in hardware, which meant that certain resources were already committed and there was no possibility of defining a custom architecture. However, new-generation FPGAs provide software-defined co-processors, which can be dimensioned according to the application requirements. This relatively new feature has given FPGAs even more advantages and thus more relevance for TOF calculation systems.

Impact of TOF information on reconstruction algorithms
To conclude this chapter, we describe how the algorithms currently used for image reconstruction are affected by the TOF information.
Conventional PET (or non-TOF PET) reconstruction uses timing information only to determine whether two detected photons fall within the same coincidence time window Δt and therefore belong to the same positron annihilation event. In this case, the annihilation event is registered along the line on which it occurred, but the voxel that originated it cannot be identified, so all the voxels along the path are assigned the same emission probability.
However, in TOF-PET the faster detectors are able to measure the difference in the arrival times of the two gamma rays, providing a better localization of the annihilation event along the line formed by each detector pair. The position is blurred by a time measurement uncertainty named "time resolution". The time resolution of a detector is defined as the minimum time interval between two subsequent photon events required for them to be recorded as separate events, and it depends on several instrumental factors. The smaller the time resolution Δt, the smaller the error Δx on the localization of the source. In fact, the FWHM of the probability function is the localization uncertainty, Δx (FWHM) = cΔt/2. This results in an overall improvement in the signal-to-noise ratio (SNR) of the reconstructed image. In particular, the SNR of an image including TOF information improves with decreasing time resolution Δt (or the corresponding spatial uncertainty Δx). The improvement is also larger for bigger patients, since it is related to the effective diameter D of the patient. The TOF SNR is proportional to the non-TOF SNR through the following relationship:

$$\mathrm{SNR}_{\mathrm{TOF}} \approx \sqrt{\frac{D}{\Delta x}}\;\mathrm{SNR}_{\text{non-TOF}}$$

Nowadays, the image reconstruction problem for fully 3D TOF-PET is challenging because of the large data sizes involved. However, 3D TOF-PET data contain a high degree of redundancy, which can be exploited in multiple ways, for example to reduce data storage and thereby accelerate image reconstruction, or to compensate for missing data and reject inconsistent data. Such unmeasured data samples can be caused either by defective detectors or by incomplete angular coverage of the patient in special PET scanner architectures, as in the case of a dedicated ring PET with an aperture that allows for biopsy procedures.
Mathematically, the redundancy is expressed by consistency conditions, which can be visualized in terms of the 3D Fourier transform and employed to compensate for missing data, using Fourier rebinning of PET data from TOF to non-TOF. Thus, TOF-PET systems require less data to provide higher-quality images, so the dose to the patient can be reduced. Moreover, the redundancy of information can be used to overcome missing data due either to defective detectors or to special scanner architectures.
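As a numerical illustration of how timing resolution translates into localization and SNR, the following Python sketch evaluates Δx = cΔt/2 and the commonly quoted gain estimate √(D/Δx). The gain expression and the 400 mm effective diameter used in the note below are assumptions made for this example, and the function names are hypothetical.

```python
import math

C_MM_PER_PS = 0.299792458  # speed of light in mm per picosecond


def localization_fwhm_mm(dt_ps):
    """Localization uncertainty along the LOR: delta_x = c * delta_t / 2."""
    return C_MM_PER_PS * dt_ps / 2.0


def snr_gain(diameter_mm, dt_ps):
    """Estimated TOF SNR gain over non-TOF, assumed to be sqrt(D / delta_x)."""
    return math.sqrt(diameter_mm / localization_fwhm_mm(dt_ps))
```

A 600 ps timing resolution corresponds to a localization uncertainty of about 90 mm, which for a 400 mm effective diameter gives an estimated SNR gain of roughly 2.1.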
Current TOF-PET timing resolutions of about 550-600 ps do not directly lead to an improvement in the spatial resolution of the reconstructed image. Instead, TOF information reduces noise propagation by localizing events along segments of each Line of Response (LOR) rather than spreading statistical noise across the full length of each LOR. At the ultimate limit, TOF-PET could potentially localize annihilation events to within a single image voxel, effectively measuring the activity distribution directly and eliminating the need for tomographic reconstruction. However, this would require a timing resolution of approximately 10 ps to isolate events to within a 3-mm voxel. With current TOF-PET devices, the inclusion of TOF information provides a degree of improvement similar to that obtained with the Point Spread Function (PSF) model. Moreover, TOF information can lead to an artifact-free image reconstruction when the number of angular samples is reduced. This is important when PET devices with limited angular coverage are considered. Partial-ring PET devices can have advantages over full-ring geometries in future dedicated PET systems designed for imaging specific organs. However, a partial-ring design leads to an incomplete sampling of the polar angles, producing artifacts in the reconstructed image. Nevertheless, the number of angular views necessary for an artifact-free image reconstruction decreases as the TOF-PET timing resolution improves (i.e. the additional TOF information can recover some of the missing information and reduce or eliminate the artifacts). In this sense, TOF information relaxes the angular sampling requirements [40].
TOF-PET approaches pose challenges in the field of image reconstruction algorithms. The first challenge is to make the reconstruction time clinically viable, as TOF-PET implies a non-negligible increase in the computational cost of image reconstruction. A variety of reconstruction methods already exist for TOF-PET data. These image reconstruction procedures can be divided into two groups: analytical and iterative algorithms. This division applies whatever tomography technique is considered (computed tomography, Single Photon Emission Computed Tomography (SPECT) or PET).

Analytical methods
Analytical reconstruction methods (i.e. Filtered Back Projection, FBP) were the only reconstruction methods available at the beginning of TOF-PET development and were originally described in the 1980s for 2D data [41,42]. In an analytical TOF-PET approach, the image is reconstructed by applying a one-dimensional time-of-flight weight along the time-of-flight line [43]. In this reconstruction, the TOF response kernel k(l) is usually taken to be a Gaussian, where l is a scalar variable [44]:

$$k(l) = \exp\!\left(-\frac{l^{2}}{2\sigma^{2}}\right)$$

whose spatial FWHM, Δx = (2σ²(4 ln 2))^{1/2}, is related to the FWHM time resolution Δt as described above. The convolution of the function describing the "unknown" emitter distribution e(r) with the kernel function k(l) is directly related to the TOF projection data d(θ,r) as:

$$d(\theta,\mathbf{r}) = \int_{-\infty}^{\infty} e(\mathbf{r} - l\,\hat{u})\,k(l)\,dl$$

where û is the unit vector in the projection direction at angle θ. It can be demonstrated [44] that the function describing the emitter in the frequency domain, E(ν), can be obtained from:

$$E(\boldsymbol{\nu}) = \frac{\int_{0}^{\pi} W(\theta,\boldsymbol{\nu})\,D(\theta,\boldsymbol{\nu})\,d\theta}{\int_{0}^{\pi} W(\theta,\boldsymbol{\nu})\,K(\boldsymbol{\nu}\cdot\hat{u})\,d\theta}$$

where D(θ,ν) is the Fourier Transform (FT) of the projection data at angle θ, K is the FT of the kernel k(l), and W(θ,ν) is the reconstruction filter weight, which for the confidence-weighted (CW) filter is W(θ,ν) = K(ν·û). The CW reconstruction TOF filter has been shown to be optimal in terms of minimizing image noise variance when working with Poisson data from an infinite uniform source distribution [43], but it may not be optimal in other situations. The above discussion considers the 2D tomography problem in the continuous domain. These expressions can be discretized for practical implementation on real TOF-PET data. The 2D approach has also been extended to 3D data. Axial single-slice and Fourier rebinning approaches followed by 2D reconstruction have been described [45][46][47]. Moreover, techniques based on rebinning the TOF data into non-TOF arrays have also been developed [48].
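The Gaussian TOF kernel can be illustrated with a short Python sketch that converts a timing FWHM into the kernel width σ, estimates the most likely emission point from the two arrival times, and evaluates normalized kernel weights at sample points along a LOR. The function names, the sample spacing and the normalization of the weights to unit sum are choices made for this example only.

```python
import math

C_MM_PER_PS = 0.299792458  # speed of light in mm per picosecond


def tof_sigma_mm(dt_fwhm_ps):
    """Kernel sigma from the timing FWHM: FWHM_x = c*dt/2 = sigma*sqrt(8 ln 2)."""
    fwhm_mm = C_MM_PER_PS * dt_fwhm_ps / 2.0
    return fwhm_mm / math.sqrt(8.0 * math.log(2.0))


def tof_offset_mm(t1_ps, t2_ps):
    """Offset of the emission point from the LOR midpoint, towards detector 1."""
    return C_MM_PER_PS * (t2_ps - t1_ps) / 2.0


def tof_weights(sample_positions_mm, centre_mm, sigma_mm):
    """Gaussian TOF weights at the given positions, normalized to sum to 1."""
    raw = [math.exp(-0.5 * ((x - centre_mm) / sigma_mm) ** 2)
           for x in sample_positions_mm]
    total = sum(raw)
    return [w / total for w in raw]
```

In a TOF-weighted back-projection, each event would be spread along its LOR with these weights instead of uniformly, which is what confines the noise to a segment of the line.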

Iterative Methods
Although analytical reconstruction methods are generally faster than iterative ones, the latter generate higher-quality images in terms of spatial resolution and image noise [49]. Iterative reconstruction methods such as the Ordered Subsets Expectation Maximization (OSEM) algorithm have to be modified in order to take TOF information into account. This is done by including a PSF along the LOR in the projector, with a width directly related to the time resolution of the scanner. Despite the high computational cost of iterative algorithms compared with analytical ones, iterative reconstruction is currently the standard in clinical PET and also appears to be the natural choice for TOF-PET in both present and future clinical TOF scanners [50]. However, TOF-PET adds complexity to the data organization and computation time to the reconstruction algorithm. If the reconstruction is sinogram based, TOF information adds a fourth dimension to the 3D sinogram representation, changing the data storage and dynamic memory requirements. In contrast, if the reconstruction is list-mode based, the data are stored as a list of detected events [51]. 3D list-mode iterative TOF reconstruction allows all the physical effects of the scanner system to be modeled, thus retaining the resolution of the data in the spatial and temporal domains without any binning approximation. In this sense, this approach is much more flexible and powerful than the sinogram approach, at the cost of a larger computational effort: it is slower, since back- and forward-projections are executed independently for each event of the list. In this case, the reconstruction time depends not only on the length of the list, i.e. the number of detected events, but also on the sizes of the spatial and TOF kernels. Fully 3D implementations of the TOF-OSEM algorithm from list-mode data have been described in [52,53].
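The per-event forward- and back-projection of list-mode reconstruction can be illustrated with a deliberately simplified toy: a one-dimensional MLEM along a single LOR with a Gaussian TOF kernel. This is a sketch under strong assumptions (one LOR, no subsets, no attenuation, scatter or randoms, unit sensitivity for every voxel) and is not the TOF-OSEM implementation of the cited references. All names and numerical values are illustrative.

```python
import math
import random


def tof_mlem_1d(measured_mm, voxel_centres_mm, sigma_mm, iterations=20):
    """List-mode MLEM along one LOR with a Gaussian TOF kernel (toy model)."""
    # a[i][j]: likelihood that an emission in voxel j is measured at position i
    a = [[math.exp(-0.5 * ((m - x) / sigma_mm) ** 2) for x in voxel_centres_mm]
         for m in measured_mm]
    image = [1.0] * len(voxel_centres_mm)  # uniform, strictly positive start
    for _ in range(iterations):
        update = [0.0] * len(image)
        for row in a:  # one forward- and back-projection per listed event
            forward = sum(w * v for w, v in zip(row, image))
            for j, w in enumerate(row):
                update[j] += w / forward
        # Toy assumption: sensitivity of every voxel equals 1.
        image = [v * u for v, u in zip(image, update)]
    return image


# Simulated list-mode data: a point source seen with 10 mm TOF blur.
rng = random.Random(0)
voxels = [4.0 * j for j in range(32)]  # 4 mm voxels along the LOR
events = [rng.gauss(voxels[12], 10.0) for _ in range(500)]
image = tof_mlem_1d(events, voxels, sigma_mm=10.0)
```

The cost of each iteration grows with the number of events times the number of voxels touched by the kernel, which mirrors the dependence of list-mode reconstruction time on list length and kernel size noted above.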
In order to achieve image reconstruction times compatible with daily clinical routine, 3D list-mode TOF-OSEM algorithms use multiple (10-20) processors and non-optimized reconstruction parameter choices (e.g., stopping criteria determined by the reconstruction time rather than by convergence, and a truncated TOF kernel to speed up the forward- and back-projection steps) [54]. However, great effort has been put into optimizing the timing requirements of TOF-PET iterative reconstruction algorithms. Reference [55] describes a new formulation for computing line projection operations on graphics processing units (GPUs) using the Compute Unified Device Architecture (CUDA) framework. When applied to 3D list-mode TOF-OSEM image reconstruction, this procedure is >300 times faster than the single-threaded reference CPU implementation [51].
Recently [56], a new TOF-PET list-mode based algorithm (DIRECT, direct image reconstruction for TOF) has been developed to speed up TOF-PET reconstruction. It takes advantage of the reduced angular sampling requirement of TOF data by grouping list-mode data into a small number of azimuthal views and co-polar tilts. In terms of computing time, the total processing and reconstruction time for the DIRECT approach seems to be about 25%-30% of that of list-mode 3D TOF-OSEM for comparable image quality. In addition, the total processing and reconstruction time is roughly constant with DIRECT, regardless of the sizes of the TOF and LOR resolution kernels, while the times for list-mode TOF-OSEM strongly depend on these kernel sizes. The reconstruction time per iteration for DIRECT is also independent of the number of events, while the per-iteration time for list-mode TOF-OSEM is almost linear in the number of counts [57].
Data corrections concerning randoms, attenuation and possibly also normalization for TOF-PET devices seem not to have a TOF structure. Thus, the current approach is to apply conventional non-TOF corrections to the new TOF data. However, scatter correction is clearly identified as the component that definitely has a TOF structure and requires an appropriate TOF computation [58].
Finally, it should be pointed out that TOF reconstruction is much less sensitive to errors and improper approximations. The redundant information present in TOF data naturally corrects the data inconsistencies during the reconstruction. It has been observed that TOF reconstruction reduces artifacts due to incorrect normalization, approximated scatter correction, truncated attenuation map, to name but a few [59].

Conclusion
In this chapter, the main design characteristics of TOF-PET systems based on reconfigurable logic devices have been reviewed. These systems have been presented from a historical perspective, and the main advantages of recovering timing information have been discussed. The suitability of reconfigurable logic devices for TOF-PET systems has been described, as well as digital electronic designs that allow the timing information to be measured accurately. Finally, the impact of timing information on image reconstruction algorithms has also been discussed.
In conclusion, the implementation of the electronic hardware of PET systems on reconfigurable devices, including the TOF measurement capability, seems to offer several advantages over conventional approaches based on ASICs or CPLDs, mainly in terms of cost-effectiveness, time-to-market and re-configurability. Modern programmable logic devices have the features needed to compete with traditionally used devices in terms of TOF calculation, as fabrication technology reaches higher speeds and smaller sizes. For the time being, time resolution in FPGAs is limited by the propagation time of the digital gates that make up the internal blocks of the device. However, due to the fast advances in fabrication processes, it is envisaged that these limitations will be overcome in the near future.