At room temperature, high-responsivity charge-coupled devices (CCDs) comprising arrays of several thousand linear photodiodes are readily available. These sensors can detect wavelengths from the ultraviolet to the near infrared with resolutions of up to 24 dots per millimeter, and their usefulness in novel spectrometry applications has been demonstrated. However, the complexity of their timing, image acquisition, and processing requires sophisticated peripheral circuitry to obtain a usable output. In this chapter, we outline the specifications of a versatile spectrometer whose control and acquisition are automated by a field programmable gate array (FPGA). The sustained readout rate is 1.23 gigabit per second for 8-bit color data. This approach is attractive because the final FPGA design can readily be re-synthesized into a single, branded application-specific integrated circuit (ASIC) that drives a wider range of the linear CCDs on the market, which is advantageous for the rapid development and deployment of the spectrometer instrument.
- linear image sensing
- image acquisition
- high-speed processing
The proliferation of imaging devices in many applications today is due to the significant technological progress made over the last few decades in image sensing, particularly with respect to charge-coupled device (CCD) sensors. Today, CCDs are found everywhere from line-scanned document imaging to high-definition planar image acquisition, covering a wide variety of applications. Interest in using CCDs in serious scientific instruments arose from advances in high-sensitivity, large-area, and low-noise CCDs. These devices began to routinely provide a high quantum efficiency (QE) for each of their millions of pixels, together with low-noise readout over wide spectral and dynamic ranges. One factor that contributed to these advances was the solution of many of the problems associated with defect states in the semiconductor substrates on which the devices are built. Defects are lossy and therefore lower the charge transfer efficiency, which made large-format image sensors difficult to fabricate. Various other advancements reduced the readout noise, improving the photodiode sensitivity even further. Of particular note is the electron multiplication scheme, which provides on-chip gain with a net effect analogous to the operation of a photomultiplier tube (PMT). A typical CCD imager consists of a coordinated collection of individual photodiodes, ranging from a few hundred pixels for position encoding in a computer optical mouse to the large arrays used for high-speed, high-definition imaging. For many applications, CCDs consist of primary-color composite arrays with image resolutions approaching 1200 dots per inch and excellent low-light sensitivity.
In general, devices with high QE and high detection sensitivity over a wide wavelength range are fast approaching the performance of the traditional photomultiplier tube, with the added bonus of built-in control electronics that require only correct timing and serial data acquisition [3, 4, 5, 6, 7]. The vast number of individual photosensors and the serial readout of a typical CCD sensor necessitate a rather complex layout of additional components, peripheral to the CCD, for clock phasing and the actual data acquisition. CCDs have been successfully applied in many scientific fields, such as cryogenically cooled CCDs in astronomy, in spectroscopy, and in education [8, 9, 10, 11, 12, 13, 14]. The operational principle of a CCD is simple: the outputs of the individual photosensors are clocked out synchronously and serially. In practice, this poses formidable implementation challenges because of the complexity of the biasing, timing, and control required on any CCD to obtain a usable, high-bandwidth intensity data stream. According to the Nyquist sampling theorem, acquisition must proceed at a rate of at least twice the image bandwidth. In practice, the analog-to-digital converter (ADC) must operate faster than the Nyquist rate because of the mandatory intermediate steps in the acquisition sequence, from initialization to the transmission of the acquired image to the data terminal equipment (DTE). There are several usable approaches to implementing the image acquisition sequence for a CCD device. The first is the basic microcontroller- or microprocessor-based approach. The second is based on a field programmable gate array (FPGA), and the third is based on custom-designed hardware known as an application-specific integrated circuit (ASIC).
The ASIC approach offers a competing strategy to the FPGA that, despite its generally higher speed and throughput, is likely to better suit larger corporations with the resources to develop and manufacture specific integrated circuit designs. Mass deployment of an ASIC must be justified by factors such as the range and volume of the final units required. Where an ASIC is the desired implementation, an FPGA is likely to be used in the development stage for design and validation before the custom ASIC is produced. The cores of such an FPGA-derived ASIC can include provisions for upgrades, such as externally connected double data rate (DDR) memory and Secure Digital (SD) cards for expanded storage, user-specifiable display options, USB communications, and other peripheral features. In general, it has become straightforward to configure the internal logic blocks into a user-defined architecture with optimized memory and execution speed, in a manner that is less reliant on additional peripherals and firmware.
An alternative solid-state imaging technology to the CCD is the active pixel sensor (APS), which is based on complementary metal oxide semiconductor (CMOS) technology. The allure of the CMOS transistor is its small footprint on the semiconductor substrate compared with bipolar devices. In fact, advances in CMOS fabrication techniques are largely responsible for the microprocessor revolution because CMOS is well suited to large-scale integration (LSI). APS devices therefore have the primary advantage of higher pixel densities, and hence larger pixel arrays and wider photon collection areas. The complexity of the APS sensing mechanism, which comprises a single photodiode and at most three transistors, is vastly reduced in comparison with that of the CCD. A detection element in an APS array is read through a random-access, row-column addressing protocol. The CCD element, on the other hand, relies on the sequential conveyance of charge, which leads to a framing approach to image recovery. The general consensus appears to be that CMOS sensors for scientific instrumentation still have much room for improvement with respect to QE, in spite of their higher image access speeds [1, 16, 17]. The low QE of APS devices stems from what is referred to as the “fill factor,” the ratio of the photosensitive area to the entire area of the APS element. Although slow, CCDs have a high fill factor and a large full-well capacity, which make them suitable for astronomical imaging. The need to improve QE has been the subject of active research and development, which has brought CMOS performance on par with that of CCDs. Furthermore, unlike CMOS arrays, CCD pixels are not amenable to avalanche-gain enhancement at a given detection site or to frequency/phase lock-in, and they do not benefit from local pixel amplifiers that improve the signal-to-noise ratio (SNR).
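The fill-factor argument can be made concrete with a short calculation. The Python sketch below assumes a hypothetical 5 µm pixel with a 3 µm square photodiode and an intrinsic QE of 0.8, and treats the effective QE as the intrinsic QE scaled by the fill factor. This is only a first-order model: it ignores microlenses, which many CMOS sensors use to recover lost area, and optical crosstalk. The function names are illustrative.

```python
def fill_factor(photosensitive_area: float, pixel_area: float) -> float:
    """Ratio of photon-collecting area to total pixel area (same units)."""
    if not 0 < photosensitive_area <= pixel_area:
        raise ValueError("photosensitive area must be positive and no larger than the pixel")
    return photosensitive_area / pixel_area


def effective_qe(intrinsic_qe: float, ff: float) -> float:
    """First-order estimate: light that misses the photodiode is lost.
    Assumes no microlenses and no optical crosstalk."""
    return intrinsic_qe * ff


# Hypothetical APS pixel: 5 um x 5 um pitch with a 3 um x 3 um photodiode.
ff = fill_factor(3.0 * 3.0, 5.0 * 5.0)
qe = effective_qe(0.8, ff)
print(f"fill factor = {ff:.2f}, effective QE = {qe:.3f}")  # fill factor = 0.36, effective QE = 0.288
```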
CMOS also benefits significantly from time-correlated imaging, global shutter synchronization, photon counting, and readout noise levels of 0.1–0.5 e⁻ RMS (sub-electron). Defining color at the pixel level, through color filtering with different p-n junction layers, is also easier in CMOS. Hybrid devices comprising CCD and CMOS that capitalize on the desirable aspects of each technology are now being developed as well [18, 19]. A detailed comparison between CCD and CMOS imagers is beyond our present scope; Holst and Lomheim, Janesick et al. [21, 22], and others provide good reviews of the two alternatives.
We begin by discussing the design methodologies that might be considered when designing a spectrometer for scientific applications, such as wavelength-resolved imaging using a linear CCD. We then lay out the rationale for preferring an FPGA and describe a typical FPGA spectrometer design based on an exemplary CCD. The approach taken in this work is to present a proof of concept of the overall system functionality rather than a final implementation. The FPGA is therefore used merely for design verification, with the intention of re-synthesizing the design onto a purely ASIC system. However, proprietary Xilinx IP cannot be synthesized without a license onto the non-Xilinx ASIC processes of other manufacturers, such as the Taiwan Semiconductor Manufacturing Company (TSMC) or GlobalFoundries. Equivalent code for the FIFOs and other IP would therefore have to be written before an ASIC implementation.
2. FPGA or ASIC?
For high-volume production, the lower unit cost of an ASIC has generally made it more attractive than an FPGA. FPGAs are now widely recognized as needing leading-edge process technologies to match the system performance of an ASIC built in an older technology, and an increasing number of FPGA-based systems are routinely being converted to ASICs. However, the appeal of ASICs goes beyond cost alone. ASIC systems generally have significantly lower power consumption, which is a bonus in battery-operated mobile devices such as cameras and mobile phones. Also, hard-coding the logic in an ASIC leads to more secure and reliable systems because it makes reprogramming the device virtually impossible. This reliability makes ASICs the obvious choice over FPGAs for critical applications. Over the past two decades, there has been a gradual movement toward the development and application of femtosecond time-resolved spectroscopy (TRS) in many areas of measurement [23, 24, 25, 26, 27, 28, 29]. In TRS, fixed-position measurements are correlated at fixed photon transit times, and CCD and CMOS cameras with high readout rates are used quite extensively. Such fast performance places a significant demand on the control system, largely because of the increasing number of pixels and resolution of these cameras [30, 31, 32, 33, 34]. In general, the processing requirements of temporally demanding applications that must maintain high sensitivity within a single package tend to exceed the capability of commercial off-the-shelf components. ASICs have risen to the challenge and are finding increasing use in such applications, particularly because they do away with the need for a dedicated, discrete high-throughput ADC altogether. These developments mean that board-level ADC design is no longer needed.
Characterization of the ADC becomes critical in such applications, but ASICs and FPGAs are natively suited to speed optimization during the design phase. High-input-bandwidth ADCs with sampling rates of over 300 MS/s that digitize events with femtosecond resolution and low crosstalk are now commonly implemented using ASICs [35, 36, 37] as well as FPGAs. Such ASICs are found in critical experiments such as the Large Hadron Collider (LHC) and in space, organic, and biomedical applications [19, 20, 21, 22, 23, 24, 25, 26, 27].
2.1. The FPGA design flow
Figure 1 depicts the steps in a typical FPGA-based design process. In general, FPGAs reduce the design time and the effort of fixing bugs caused by faulty design logic, because the logic design step of a customary ASIC flow is absent. Verifying deep-submicron placement issues, particularly those affecting heat removal and speed, is easily achieved. Prototype designs can be verified virtually instantly, as many times as necessary during development, by simply downloading the design into the development test bed. The disadvantage is that the performance of a given FPGA design relies heavily on the programmer's ability to write efficient FPGA code. As with ASICs, there is a clear need to optimize the hardware instance.
2.2. The ASIC design flow
Figure 2 depicts the steps in a typical ASIC-based design process. Unlike FPGA tools, ASIC tools are generally script driven. FPGA system designers are increasingly reliant on graphical user interfaces (GUIs) during development, although the very high speed integrated circuit (VHSIC) hardware description language (VHDL) is inherently script driven. In ASIC design, post-synthesis timing and functional-equivalence analyses are the responsibility of the system designer before prototyping. The effects of deep-submicron logic element placement need careful appraisal by the designer, unlike in FPGAs, where they are routinely part of the design verification step. Designing with FPGAs is therefore associated with fast turnaround.
2.3. A typical FPGA application in CCD-based spectroscopy
Figure 3 shows the schematic of a typical linear CCD application in which the light focused onto the CCD by a lens array comes from a reflective or transmissive grating after striking a carefully positioned sample under characterization. The general elements of this layout can be identified in other specific implementations. The grating disperses the light angularly according to wavelength, producing alternating zones of high and low intensity with a linear dispersion, that is, a wavelength spread per unit length.
Figure 3 also shows the general schematic of the image-forming optics. A real but inverted image is required to fall onto the photosensitive areas of the CCD for detection. This image is a magnification of the virtual object, which is itself the result of the net interference of the light source as it passes through the grating. It is easy to show that the spread of wavelengths along the total detection length of the CCD array is distance resolved. Assuming that the achromatic reduction lens has an image magnification that is generally less than unity, the lengthwise spread of the image along the CCD's photosensors is the lengthwise spread of the virtual object scaled by that magnification.
Therefore, a bright spot at a given position along the array corresponds to a particular wavelength, and its intensity is measured by the photodiode at that position. Hence, a direct readout of the CCD intensity data is possible. With a standardized light source, the voltage output of the CCD can be calibrated in lux. This is not necessary in a spectroscopic application, where the spectral character and the relative peak intensities matter more than the absolute intensity, so the output count is normally expressed in arbitrary units (au). A plot of the CCD sensor output, which is proportional to the incident intensity, versus wavelength therefore suffices for many applications. In our particular case, the CCD output is measured in volts. The CCD output is read out sequentially in step with synchronization pulses, so the hardware or software that receives it must be designed to interpret the output as a well-defined frame. Figure 4 shows the timing and control signals that are typical of linear CCDs for the detection of three colors. The waveform labeled OS (serial output) shows the intensity variations relative to the sample and hold (SH), clocking phase (Φ1 and Φ2), and reset (RS) signals.
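Because position along the array maps to wavelength, calibration reduces to fitting that mapping. The Python sketch below assumes the dispersion is linear across the array (real gratings and lenses may need a low-order polynomial instead) and fits it from two reference lines. The mercury lines at 435.83 nm and 546.07 nm are real, but the pixel positions 1200 and 2300 are hypothetical values for illustration, and `LinearCalibration` is an illustrative helper rather than part of the instrument's software. The lens magnification and grating dispersion are both absorbed into the fitted slope.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinearCalibration:
    """Maps a photodiode index to wavelength, assuming linear dispersion
    along the array (an assumption; a polynomial fit may be needed)."""
    slope_nm_per_pixel: float
    offset_nm: float

    @classmethod
    def from_two_lines(cls, p1: float, wl1: float, p2: float, wl2: float) -> "LinearCalibration":
        if p1 == p2:
            raise ValueError("reference lines must fall on different pixels")
        slope = (wl2 - wl1) / (p2 - p1)
        return cls(slope, wl1 - slope * p1)

    def wavelength(self, pixel: float) -> float:
        return self.offset_nm + self.slope_nm_per_pixel * pixel

    def pixel(self, wavelength_nm: float) -> float:
        return (wavelength_nm - self.offset_nm) / self.slope_nm_per_pixel


# Hypothetical peak positions for two mercury emission lines.
cal = LinearCalibration.from_two_lines(1200, 435.83, 2300, 546.07)
print(round(cal.slope_nm_per_pixel, 4))  # 0.1002 nm per photodiode
print(round(cal.wavelength(1750), 2))    # 490.95, midway between the lines
```

With more than two reference lines, a least-squares fit would replace the two-point formula.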
Figure 4 shows the complexity of the signals required to drive a typical CCD in even the most cursory application, without any form of data acquisition. These signals are derived from an actual CCD, the TCD2557, which we shall henceforth refer to as the exemplary CCD. The complexity of the additional conditioning circuitry can be appreciated by considering the following scenario. The exemplary CCD has 5415 elements that make up the readout frame, of which 5340 are actual photosensors. The remaining 75 elements are dummy sensors: 64 presensor and 11 postsensor elements. The CCD requires dual-phase clock signals (Φ1 and Φ2) which, if run at 1.25 MHz (a period of 0.8 µs), define the shortest frame time as 5415 × 0.8 µs = 4.332 ms. Additional processing adds timing overheads beyond those needed to construct the frame; USB transfers and other processes are not parallel and consume extra real time. Of the three design strategies mentioned above for realizing a practical spectrometer around the exemplary linear CCD, the microcontroller approach is the least efficient, for the following reason. If the ADC is an 8-bit successive-approximation converter running at 1 MHz (a period of 1 µs), then each photodiode level takes at least 8 × 1 µs = 8 µs to convert. An associated temporary memory of one byte per element (5415 bytes per color line) will likely be required, and it must be written with the ADC result at the end of each conversion. All photodiodes of one line are then digitized in at least 5415 × 8 µs ≈ 43.3 ms, ten times the 4.332-ms minimum frame time. This is clearly far too long for a simple 8-bit acquisition. In addition, the firmware-defined timing signals must be generated and issued to the CCD from the free output lines of the microcontroller, which further complicates the design and demands an extremely fast microcontroller. It is estimated that such a microcontroller would need to operate at well over 40 MHz.
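The timing argument above can be checked numerically. This Python sketch uses the element count and clock rates given in the text, and assumes, as a simplification, that the successive-approximation ADC resolves one bit per clock cycle; it ignores sampling, memory-write, and firmware overheads, so its microcontroller figure is a lower bound.

```python
TOTAL_ELEMENTS = 5415   # 5340 photosensors + 64 presensor + 11 postsensor dummies
CLOCK_HZ = 1.25e6       # phi1/phi2 transfer clock from the text
SAR_BITS = 8            # 8-bit successive-approximation ADC
SAR_CLOCK_HZ = 1.0e6    # 1 MHz SAR clock from the text


def frame_time_s(elements: int, clock_hz: float) -> float:
    """Minimum frame time: one element is clocked out per transfer-clock period."""
    return elements / clock_hz


def sar_line_time_s(elements: int, bits: int, sar_clock_hz: float) -> float:
    """Lower bound for a SAR ADC that resolves one bit per clock cycle."""
    per_conversion = bits / sar_clock_hz
    return elements * per_conversion


frame = frame_time_s(TOTAL_ELEMENTS, CLOCK_HZ)
mcu = sar_line_time_s(TOTAL_ELEMENTS, SAR_BITS, SAR_CLOCK_HZ)
print(f"minimum frame time: {frame * 1e3:.3f} ms")  # 4.332 ms
print(f"SAR digitization:   {mcu * 1e3:.2f} ms")    # 43.32 ms
print(f"ratio: {mcu / frame:.1f}x")                 # 10.0x
```

Even this optimistic bound is ten times the minimum frame time, which is the core of the case against the microcontroller approach.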
Figure 5 shows an FPGA-based spectrometer built around a Xilinx Spartan FPGA. Several specific FPGAs may be used; for reasons of cost and availability, the author settled on the Spartan-6 LX9 FPGA for an actual spectrometer application driving the TCD2557 CCD [40, 41, 42, 43]. The Xilinx software development kit (SDK) allows the configurable logic blocks (CLBs) in some of its FPGAs to be organized into a powerful, firmware-defined 100-MHz microprocessor called MicroBlaze. With phase-locked loop clock synthesis, MicroBlaze is capable of 400-MHz internal clock speeds. This complex, proprietary engine has all the functionality of a microprocessor, and, as with ordinary microcontrollers, it can be controlled by user-defined firmware written in a high-level language such as C/C++. A major advantage of VHDL is that its processes run concurrently by default and can be made either synchronous or asynchronous, so generating and handling the timing and synchronization signals becomes a trivial exercise. Figure 6 shows the organization of the Spartan-6 LX9 used to implement the requisite control signals and the connection to the CCD.
The FPGA hardware strategy features a parallel design paradigm that is generally simpler and highly repeatable. However, the FPGA configuration is volatile: it is lost when power to the FPGA is removed. The bitstream file that represents the developed design and defines the application must therefore be stored in an electrically erasable programmable read-only memory (E2PROM) and reloaded into the FPGA at power-on. The role of the intermediate memory is filled by an 8-kilobyte, 8-bit-wide first-in first-out (FIFO) structure. A FIFO buffer allows the readout device to operate at a slower clock rate alongside other parallel processes, such as acquisition and USB transfers. In the actual application mentioned above, we achieved a CCD readout rate of 1.23 gigabit/s per color.
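The decoupling role of the FIFO can be illustrated with a behavioral model. The Python class below is a hypothetical sketch, not the Xilinx FIFO Generator IP: it collapses the separate write and read clock domains into a single loop, and it interprets the 3-byte threshold of Section 2.4.2 as an almost-full flag that asserts when three or fewer free bytes remain, which is an assumption.

```python
from collections import deque


class ByteFifo:
    """Behavioral model of an 8-bit-wide FIFO with an almost-full flag.
    Hypothetical sketch; real dual-clock FIFOs also handle clock-domain crossing."""

    def __init__(self, depth: int = 8192, threshold: int = 3):
        self.depth = depth
        self.threshold = threshold
        self._buf: deque[int] = deque()
        self.overflows = 0

    def write(self, byte: int) -> None:
        if len(self._buf) >= self.depth:
            self.overflows += 1           # data lost: the writer outran the reader
            return
        self._buf.append(byte & 0xFF)

    @property
    def almost_full(self) -> bool:
        return self.depth - len(self._buf) <= self.threshold

    def drain(self) -> bytes:
        data = bytes(self._buf)
        self._buf.clear()
        return data


fifo = ByteFifo(depth=8192, threshold=3)
received = bytearray()
for i in range(20_000):                   # ADC side writes one sample per iteration
    fifo.write(i)
    if fifo.almost_full:                  # host polls the flag and flushes the FIFO
        received += fifo.drain()
received += fifo.drain()                  # flush whatever remains at end of acquisition
print(len(received), fifo.overflows)      # 20000 0
```

Because the host flushes whenever the flag asserts, the writer never finds the buffer full and no bytes are dropped.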
Figure 7 shows the actual oscilloscope output (a) in response to the diffracted spectrum (b) obtained when a source of illumination was directed at the spectrometer.
2.4. The operational principle
The Spartan-6 MicroBlaze FPGA intellectual property (IP) owned by Xilinx differs from a conventional microcontroller, which has a fixed architecture supported by rigid peripherals, in that it permits high-speed functional blocks to be implemented arbitrarily from the array of CLBs, clocks, and system management tiles. These internal resources can be arranged into several virtual, highly parallel microcontrollers and peripherals. The advances that culminate in MicroBlaze and other IP are made possible by the optimized logic of these “sixth-generation” FPGAs, whose 45-nanometer, copper-interconnect architecture permits devices to be built for speed and low power consumption. FPGA-based applications with good cost/power and cost/performance ratios are thus a reality. FPGAs from vendors other than Xilinx also have well-defined development flows. In the exemplary CCD implementation, we have implemented an efficient system comprising 6-input lookup tables (LUTs) with two registers each, steering logic, 18 kilobytes of random access memory (RAM), and support for the USB 2.0 standard for communication with the DTE. These features make the FPGA a very good alternative to the microcontroller approach. In contrast to the FPGA approach, dedicated CCD management ASICs are available on the market, including some developed by the CCD manufacturers themselves. However, lead time, specificity to a particular CCD, general complexity, and cost must be considered during the design of the application. Many ASICs are “black box” solutions to complex image acquisition. For low-volume designs at least, such ASICs can be notoriously closed, proprietary solutions whose internals and general operation are hard to decipher without knowledge of the intellectual property or some amount of reverse engineering.
Therefore, while an ASIC can deliver blazingly fast and reliable performance in a given application, its development cost and design effort can be beyond the means of a low-volume production. Huang et al. have described a Xilinx FPGA implementation of a visualization spectrometer. In this work, the necessary CCD control signals are derived from a finite state machine (FSM) that operates concurrently with a MicroBlaze core. The FSM also synchronizes the 20 MS/s half-flash ADC with the FIFO. It controls the sequencing of the CCD output into the FIFO structure through the sample-and-hold (SH) signal, allowing the DTE to read the FIFO buffer at rates significantly lower than the ADC conversion rate [37, 44, 45]. Even much slower DTE interfaces, such as a computer sound card sampling at 44.1 kHz, can digitize the intensity stream using third-party software (Figure 8).
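A finite state machine of this kind can be sketched in a few lines. The Python generator below is a hypothetical, simplified model rather than the VHDL used in the instrument: it asserts SH once to start a frame, toggles the complementary Φ1/Φ2 pair once per element, pulses RS, and then asserts an ADC strobe that would write one sample into the FIFO. The state names are illustrative, and real pulse widths, overlaps, and the clamp (CP) signal must be taken from the CCD datasheet.

```python
from enum import Enum
from typing import Iterator, NamedTuple


class State(Enum):
    START_FRAME = 0   # assert SH once to begin a new frame
    RESET = 1         # advance phi1/phi2 and pulse RS for the next element
    SAMPLE = 2        # output settled: strobe the ADC (one FIFO write)


class Signals(NamedTuple):
    sh: int
    phi1: int
    phi2: int
    rs: int
    adc_strobe: int


def ccd_fsm(elements: int) -> Iterator[Signals]:
    """Yield the control-line levels for each FSM tick of one frame.
    Hypothetical, simplified sequence; not TCD2557 datasheet timing."""
    state, phi1, done = State.START_FRAME, 0, 0
    while done < elements:
        if state is State.START_FRAME:
            yield Signals(sh=1, phi1=phi1, phi2=1 - phi1, rs=0, adc_strobe=0)
            state = State.RESET
        elif state is State.RESET:
            phi1 = 1 - phi1                      # phi2 is always the complement
            yield Signals(sh=0, phi1=phi1, phi2=1 - phi1, rs=1, adc_strobe=0)
            state = State.SAMPLE
        else:                                    # State.SAMPLE
            yield Signals(sh=0, phi1=phi1, phi2=1 - phi1, rs=0, adc_strobe=1)
            done += 1
            state = State.RESET


ticks = list(ccd_fsm(5415))                      # 5415 elements per frame, as in the text
print(len(ticks), sum(t.adc_strobe for t in ticks))  # 10831 5415
```

In hardware, each tick would correspond to one FSM clock cycle, and the ADC strobe would double as the FIFO write enable.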
2.4.1. Emulating a microcontroller strategy using an IP core
The concept of reusable intellectual property (IP), pioneered by Xilinx in its sixth-generation FPGAs, speeds up the design process even further. IP cores are verified, visually presented subunits with instantiation templates that save further development time and effort. The user need only declare the desired unit and functionality correctly and then instantiate the IP subunit in context. This approach, which we employed in our practical instrument for a 100-MHz MicroBlaze core and the FIFO IP, enhances repeatability and dramatically shortens lead time. The resulting system is supported by 16 kilobytes of RAM, a 5-bit general-purpose output (GPO) register, 8-bit and 4-bit general-purpose input (GPI) registers, and communication interfaces consisting of a USB 2.0 port and a universal asynchronous receiver transmitter (UART). The UART is configured for 9600 baud, 8 data bits, no parity, and 1 stop bit (9600, 8, N, 1). At present, the system does not process interrupts pertaining to the image acquisition stages.
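The cost of the 9600, 8, N, 1 setting is easy to quantify. This Python sketch builds the line levels of one 8N1 character (a start bit, eight data bits sent least-significant bit first, and a stop bit) and estimates the time needed to move a full 8-kilobyte FIFO at 9600 baud, assuming back-to-back characters with no idle time. The function names are illustrative.

```python
def uart_8n1_frame(byte: int) -> list[int]:
    """Line levels for one 8N1 character: start bit (0), 8 data bits LSB first,
    no parity bit, one stop bit (1)."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("byte out of range")
    data_bits = [(byte >> i) & 1 for i in range(8)]
    return [0] + data_bits + [1]


def uart_transfer_time_s(n_bytes: int, baud: int = 9600) -> float:
    """Time to send n_bytes back-to-back at 10 bit-times per character."""
    return n_bytes * len(uart_8n1_frame(0)) / baud


print(uart_8n1_frame(0x41))                   # [0, 1, 0, 0, 0, 0, 0, 1, 0, 1]
print(f"{uart_transfer_time_s(8192):.2f} s")  # 8.53 s for one full 8-kilobyte FIFO
```

At roughly 8.5 s per 8-kilobyte buffer, the UART is better suited to commands and status than to image data, which suggests why the USB 2.0 port matters.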
2.4.2. A description of the FIFO IP core
The 8-kilobyte, 8-bit FIFO structure was implemented using 18 kilobytes of block RAM (BRAM), with separate clocks for reading and writing. Figure 5 shows how the FSM scheme was configured to generate the various control and clock signals that drive the CCD and the other features of the design. For simplicity and quick realization of a basic spectrometer, the handshaking and flag signals typically associated with FIFO synchronization and flow control by a DTE were not used; the design relies on software flow control instead. Data loss is avoided and performance maximized by defining a 3-byte threshold that signals a full FIFO buffer. Once this state is detected, the DTE flushes the FIFO; data transfers to the DTE are initiated only upon detection of the FIFO-full state. This approach frees the DTE for other processing tasks during image acquisition. In Figure 4, the rising-edge transitions of the shift pulse (SP), clamp pulse (CP), and photodiode cell reset (RS) produce the analog level-shifting action that defines the CCD operation. In this way, the intensity level sensed by a photodiode, held as a proportional voltage, is shifted to the next cell and eventually out of the CCD's analog output, until all photodiode levels in the frame have been shifted out. All three colors, red, green, and blue (RGB), are shifted out in parallel by the same pulse transitions. The DTE controls the FIFO externally through high-level C/C++ code developed in the SDK, which is aware of the hardware description from the outset of the design. Upon a request from the DTE, the FPGA sends its entire 8-kilobyte contents to the DTE through the available communication channel. Once the DTE has received the image data, software on the host reassembles the intensity data using a suitable parsing algorithm.
At this point, all the intensities are matched numerically to their pixels. Interpreting these data then yields information about the light that falls on the CCD, which can be due to secondary scattering by a sample under characterization.
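Host-side reassembly might look like the following Python sketch. The byte layout is an assumption, because the text does not specify one: each frame is taken to arrive as 5415 interleaved R, G, B byte triplets (16,245 bytes, so the host would concatenate several FIFO flushes before parsing), and the 64 presensor and 11 postsensor dummy elements described for the TCD2557 are discarded, leaving 5340 active samples per color. `parse_frame` and `ColorLines` are hypothetical names.

```python
from typing import NamedTuple

PRE_DUMMY, ACTIVE, POST_DUMMY = 64, 5340, 11        # element counts from the text
FRAME_ELEMENTS = PRE_DUMMY + ACTIVE + POST_DUMMY    # 5415


class ColorLines(NamedTuple):
    red: bytes
    green: bytes
    blue: bytes


def parse_frame(raw: bytes) -> ColorLines:
    """Split one frame into R, G and B lines and drop the dummy elements.
    Hypothetical layout: bytes arrive interleaved as R, G, B per element."""
    expected = 3 * FRAME_ELEMENTS
    if len(raw) != expected:
        raise ValueError(f"expected {expected} bytes, got {len(raw)}")
    active = slice(PRE_DUMMY, PRE_DUMMY + ACTIVE)
    return ColorLines(raw[0::3][active], raw[1::3][active], raw[2::3][active])


frame = bytes([10, 20, 30]) * FRAME_ELEMENTS        # synthetic flat-field frame
lines = parse_frame(frame)
print(len(lines.red), lines.red[0], lines.green[0], lines.blue[0])  # 5340 10 20 30
```

Each active index can then be passed through a pixel-to-wavelength calibration to produce the intensity-versus-wavelength output.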
This chapter has presented a detailed discussion of the pros and cons of three approaches to the design of a linear spectrometer instrument that uses a tricolor CCD image sensor. The completed instrument is capable of serious scientific measurements. The approaches discussed address the complex timing and signal conditioning requirements for obtaining a workable output from the CCD. These approaches are (i) microcontroller based, (ii) ASIC based, and (iii) FPGA based. We then described the essential aspects of the design by outlining the CCD and system control signals and the data acquisition and communication needs of the overall instrument. We suggest that the FPGA design approach offers a high degree of reliability, repeatability, and performance for the task, and is well suited to rapid application development. The embedded intellectual property found on later generations of Xilinx FPGAs, most notably the MicroBlaze and FIFO IP, allows rapid application definition, implementation, and testing on a low-cost FPGA. We have discussed an exemplary 20 MS/s image acquisition system for a spectrometer instrument that supports both UART and USB 2.0 communication with an ordinary personal computer. The output data stream, in intensity-versus-wavelength format, can be fed directly into postprocessing programs. An equation was derived to show how the spectrometer may be aligned and calibrated. One aim of this chapter has been to demystify the system by outlining in sufficient detail the physics behind image sensing while presenting the overarching challenges that such sensing poses for the acquisition system. Sensing wavelength-resolved information with a resolution suitable for serious scientific work clearly generates vast amounts of data, from the sensor front end to the final acquisition and storage on the DTE, on a timescale that approaches real time.
This naturally raises questions about the best acquisition strategy. We evaluated the pros and cons of the FPGA approach versus the ASIC: for small-scale development, the FPGA provides a quick route to design completion, whereas the ASIC route may be preferable for larger production volumes. This spectrometer has in fact been deployed for thermoluminescence (TL) and photoluminescence (PL) measurements and for line scans. The relative ease with which the FPGA was reconfigured for different CCDs from different manufacturers aptly demonstrates the versatility of the chosen design approach for one-off or low-volume products such as this spectrometer. Characterization of the readout rate of the TCD2557 CCD using a 400-MHz digital storage oscilloscope (DSO) produced a sustained figure of 1.23 gigabit/s.