## Abstract

The era of cloud computing has fuelled the increasing demand on data centers for high-performance, high-speed data storage and computing. Digital signal processing may find applications in future cloud computing networks containing a large number of data centers. Addition and subtraction are fundamental building blocks of digital signal processing and are ubiquitous in microprocessors for arithmetic operations. However, the processing speed is limited by the electronic bottleneck. It might therefore be valuable to implement high-speed addition and subtraction in the optical domain. In this chapter, recent results of M-ary optical arithmetic operations for high base numbers are presented. By exploiting degenerate and nondegenerate four-wave mixing (FWM) in highly nonlinear fibers (HNLFs), graphene-assisted optical devices, and silicon waveguide devices, various types of two-/three-input high-speed quaternary/octal/decimal/hexadecimal optical computing operations have been demonstrated. Operation speeds of up to 50 Gbaud are examined experimentally. The demonstrated M-ary optical computing using high base numbers may facilitate advanced data management and superior network performance.

### Keywords

- high-base optical signal processing
- multilevel modulation format
- four-wave mixing
- wavelength conversion
- optical computing

## 1. Introduction

The great progress of fiber-optic communication has driven the success in transmitting/receiving very high-speed data signals in optical fiber links [1–5]. Recently, the era of cloud computing has fuelled the increasing demand on data centers for high-performance, high-speed data storage and computing. Optical interconnection is considered to be a promising technology for data interconnection in data centers. In future cloud computing networks containing a large number of data centers, optical technology will play an important part [6–8]. For inter-data center communication, modern optical communication links will be used. Advanced modulation formats and wavelength-division multiplexing (WDM) can be used to enhance the transmission capacity of inter-data center links. For intra-data center links, low-cost short-reach optical interconnection technologies, such as vertical-cavity surface-emitting lasers (VCSELs) and multimode fiber, will be adopted. The rapid development of optical interconnection in data centers has also promoted increasing interest in digital signal processing used in data centers for wavelength management or routing. Among various digital signal processing operations, two important arithmetic modules, i.e., addition and subtraction, are fundamental building blocks that are ubiquitous in microprocessors for arithmetic operations. However, the processing speed is limited by the electronic bottleneck. It might therefore be valuable to implement high-speed addition and subtraction in the optical domain.

Remarkably, nonlinear optics has offered great potential to develop high-speed optical signal processing using optical nonlinearities [9–21]. A wide range of optical signal processing functionalities has been demonstrated. Commonly used functionalities include wavelength (de)multiplexing, wavelength conversion, data exchange, optical addressing, optical switching, optical logic gate and computing, optical format conversion, optical equalization, tunable optical delay, optical regeneration, optical coding/decoding, and more [22–54]. As depicted in Figure 1, the material platforms for nonlinear optical signal processing mainly include highly nonlinear fiber (HNLF) [51, 55–57], semiconductor optical amplifier (SOA) [58–60], periodically poled lithium niobate (PPLN) waveguide [32, 35, 36, 61, 62], chalcogenide (As_{2}S_{3}) waveguide [63], silicon waveguide [64–66], and graphene-assisted device [67]. Optical arithmetic and optical logic operations have previously been reported in these material systems. Most previous research efforts, however, are dedicated to optical computing for binary modulation formats such as on-off keying (OOK) and binary phase-shift keying (BPSK). Despite the favorable operation performance achieved, binary operation suffers from limited bit rate and low spectral efficiency because each symbol of a binary modulation format carries only one bit of information.

The use of M-ary phase-shift keying (m-PSK) and M-ary quadrature amplitude modulation (m-QAM) in coherent systems has become a key technique for efficiently increasing the transmission capacity and spectral efficiency of optical communication systems. For instance, quadrature phase-shift keying (QPSK) with 2-bit information in one symbol has been extensively used in high-speed optical fiber transmission systems [68, 69]. A multilevel modulation format containing multiple constellation points in the constellation diagram can also be used to represent M-ary numbers. Taking QPSK as an example, the four constellation points (i.e., four phase levels) in the constellation diagram of a QPSK signal can denote a quaternary base number (i.e., 0, 1, 2, 3), as shown in Figure 2. Similarly, an 8-PSK (16-PSK) signal, which has 8 (16) points in its constellation plane, can represent an octal (hexadecimal) base number. The optical signal processing functions that correspond to multilevel modulation formats are addition and subtraction of high base numbers. In this scenario, a laudable goal would be to perform addition and subtraction of high base numbers because (i) high capacities might be achievable, (ii) optical spectra might be utilized efficiently, and (iii) processing throughput might be improved.
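The mapping between constellation points and base-M digits can be made concrete with a small sketch. It assumes a plain linear phase mapping (digit k sits at phase 2πk/M plus an optional offset); the function names `digit_to_symbol` and `symbol_to_digit` are hypothetical and are not taken from the chapter.

```python
import cmath
import math


def digit_to_symbol(digit, base=4, offset=0.0):
    """Map a base-M digit to a unit-amplitude PSK symbol (assumed linear phase mapping)."""
    phase = offset + 2 * math.pi * (digit % base) / base
    return cmath.exp(1j * phase)


def symbol_to_digit(symbol, base=4, offset=0.0):
    """Recover the digit by rounding the symbol phase to the nearest of `base` levels."""
    phase = (cmath.phase(symbol) - offset) % (2 * math.pi)
    # The trailing modulo folds a phase just below 2*pi back onto digit 0.
    return round(phase * base / (2 * math.pi)) % base


# Round trip for quaternary (QPSK), octal (8-PSK), and hexadecimal (16-PSK) digits.
for base in (4, 8, 16):
    digits = list(range(base))
    recovered = [symbol_to_digit(digit_to_symbol(d, base), base) for d in digits]
    print(base, recovered == digits)
```

The same two helpers work for any M, which is why the later arithmetic can be read as ordinary modular arithmetic on the phase.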

In this chapter, we provide a comprehensive report of our recent research on M-ary optical computing for multilevel modulation formats by exploiting optical nonlinearities [70–75]. Various material platforms, including HNLFs, graphene-assisted optical devices, and silicon waveguide devices, are used to perform high-speed M-ary addition and subtraction. First, we report the experimental results of optical addition and subtraction using HNLFs. Functionalities of quaternary addition/subtraction are examined. Second, we show the graphene-enhanced optical nonlinearities in graphene-assisted optical devices and their application in optical computing. Finally, we present the latest results of high-speed optical computing using ultracompact on-chip silicon waveguides. Quaternary/hexadecimal hybrid optical computing is successfully demonstrated in a complementary metal oxide semiconductor (CMOS)-compatible platform, which can potentially be integrated with standard CMOS large-scale integrated circuits.

## 2. Binary optical logic

In the last two decades, binary optical computing has been widely studied. Up to now, many schemes have been demonstrated to realize various elementary optical logic operations, including AND, OR, NOT, XOR, XNOR, NAND, and NOR [32, 55, 61, 62, 76–85]. By combining multiple elementary optical logic operations, advanced logic operations such as half-adder, half-subtractor, full-adder, and full-subtractor have also been proposed and demonstrated [36, 86–91]. Figure 3 shows an example of simultaneous half-adder, half-subtractor, and OR logic gate [36].

Despite favorable operation performance of the binary operation, it still suffers from the limited bit rate and low spectral efficiency. Owing to the great success of advanced modulation format and coherent detection in optical communication, the implementation of M-ary optical computing becomes possible. Since the multiple constellation points in the complex plane of multilevel modulation format can be used to represent M-ary numbers, it is easy to extend binary optical computing to M-ary.

## 3. M-ary optical computing using HNLF

We propose and demonstrate M-ary optical computing of advanced multilevel modulation signals based on degenerate/nondegenerate FWM in HNLFs.

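We first demonstrate high-speed two-input high-base optical computing (addition/subtraction/complement/doubling) of quaternary numbers using optical nonlinearities and (differential) quadrature phase-shift keying ((D)QPSK) signals. Figure 4 illustrates the concept and operation principle of the proposed quaternary addition/subtraction/complement/doubling. The four phase levels of a (D)QPSK signal represent quaternary numbers. Three nondegenerate FWMs and three degenerate FWMs in an HNLF are exploited to simultaneously implement multiple arithmetic functions. The input of the HNLF contains two (D)QPSK signals (A, B) and one continuous wave (CW) pump. Six converted idlers (idlers 1–6) are generated by three nondegenerate FWMs (idlers 1–3) and three degenerate FWMs (idlers 4–6). The relationships between the electrical field (E) and optical phase (Φ) under the non-depletion approximation are expressed as

$$E_{i1} \propto E_{A} E_{B} E_{CW}^{*}, \qquad \Phi_{i1} = \Phi_{A} + \Phi_{B} - \Phi_{CW} \tag{1}$$

$$E_{i2} \propto E_{A} E_{B}^{*} E_{CW}, \qquad \Phi_{i2} = \Phi_{A} - \Phi_{B} + \Phi_{CW} \tag{2}$$

$$E_{i3} \propto E_{B} E_{A}^{*} E_{CW}, \qquad \Phi_{i3} = \Phi_{B} - \Phi_{A} + \Phi_{CW} \tag{3}$$

$$E_{i4} \propto E_{CW}^{2} E_{A}^{*}, \qquad \Phi_{i4} = 2\Phi_{CW} - \Phi_{A} \tag{4}$$

$$E_{i5} \propto E_{CW}^{2} E_{B}^{*}, \qquad \Phi_{i5} = 2\Phi_{CW} - \Phi_{B} \tag{5}$$

$$E_{i6} \propto E_{B}^{2} E_{CW}^{*}, \qquad \Phi_{i6} = 2\Phi_{B} - \Phi_{CW} \tag{6}$$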

Owing to the phase wrap characteristic with a periodicity of 2π, it is implied from Eqs. (1) to (6) that idlers 1–6 carry out modulo four operations of quaternary addition (A + B), dual-directional subtraction (A − B, B − A), complement (−A, −B), and doubling (2B), respectively.
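The modulo-4 behaviour can be checked numerically with idealized fields. The sketch below is only an illustration under stated assumptions: unit-amplitude QPSK symbols with phase levels (0, π/2, π, 3π/2), a CW pump with constant phase taken as zero, and idler fields formed as the products of the mixing waves that the listed operations imply. The helper names (`psk`, `to_digit`, `six_idlers`) are hypothetical.

```python
import cmath
import math
from itertools import product


def psk(digit, base=4):
    """Unit-amplitude PSK symbol for a base-M digit (assumed linear phase mapping)."""
    return cmath.exp(2j * math.pi * digit / base)


def to_digit(field, base=4):
    """Read the digit back from the phase of a field, wrapped to [0, 2*pi)."""
    phase = cmath.phase(field) % (2 * math.pi)
    return round(phase * base / (2 * math.pi)) % base


def six_idlers(a, b, cw_phase=0.0):
    """Digits carried by the six idlers for input digits a and b (CW phase assumed constant)."""
    e_a, e_b = psk(a), psk(b)
    e_cw = cmath.exp(1j * cw_phase)
    fields = {
        "A+B": e_a * e_b * e_cw.conjugate(),   # nondegenerate FWM
        "A-B": e_a * e_b.conjugate() * e_cw,   # nondegenerate FWM
        "B-A": e_b * e_a.conjugate() * e_cw,   # nondegenerate FWM
        "-A": e_cw ** 2 * e_a.conjugate(),     # degenerate FWM, CW acts as pump
        "-B": e_cw ** 2 * e_b.conjugate(),     # degenerate FWM, CW acts as pump
        "2B": e_b ** 2 * e_cw.conjugate(),     # degenerate FWM, signal B acts as pump
    }
    return {name: to_digit(field) for name, field in fields.items()}


# Exhaustive check over all 16 input pairs against plain modular arithmetic.
ok = all(
    six_idlers(a, b) == {
        "A+B": (a + b) % 4, "A-B": (a - b) % 4, "B-A": (b - a) % 4,
        "-A": (-a) % 4, "-B": (-b) % 4, "2B": (2 * b) % 4,
    }
    for a, b in product(range(4), repeat=2)
)
print(ok)
```

A nonzero but constant CW phase only rotates each idler constellation by a fixed angle, which a receiver would remove as a common phase reference.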

Shown in Figure 5 are measured spectra. Two 100-Gbit/s 2^{7}-1 RZ-(D)QPSK signals (A, 1546.6 nm; B, 1555.5 nm), and a CW pump (1553.2 nm), are launched into a 460-m HNLF. The low and flat dispersion of HNLF enables multiple FWM processes, and thus six idlers are obtained. The six idlers correspond to addition (A + B), subtraction (A − B, B − A), complement (−A, −B), and doubling (2B) of quaternary numbers (A, B), respectively.

We measured waveforms and balanced eyes of the demodulated in-phase (I) and quadrature (Q) components of two input 100-Gbit/s (D)QPSK signals and six converted idlers. The 100-Gbit/s (D)QPSK signal is demodulated using a delay-line interferometer (DLI) with a 20 ps delay difference between two arms. The obtained results are shown in Figures 6 and 7, which confirm the successful implementation of 50-Gbaud quaternary addition (A+B), dual-directional subtraction (A−B, B−A), complement (−A, −B), and doubling (2B) based on FWM in an HNLF.
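The figures quoted here are mutually consistent: a 100-Gbit/s quaternary signal carries 2 bits per symbol, so the symbol rate is 50 Gbaud and one symbol lasts 20 ps, which matches the DLI delay. A short sketch of that arithmetic follows; the function names are hypothetical.

```python
import math


def symbol_rate_gbaud(bit_rate_gbps, base):
    """Symbol rate of an M-ary signal: bit rate divided by log2(M) bits per symbol."""
    return bit_rate_gbps / math.log2(base)


def symbol_period_ps(rate_gbaud):
    """Symbol period in picoseconds for a symbol rate given in Gbaud."""
    return 1e3 / rate_gbaud


rate = symbol_rate_gbaud(100, 4)      # 100-Gbit/s (D)QPSK
print(rate, symbol_period_ps(rate))   # symbol rate in Gbaud, period in ps
```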

Figure 8 shows the bit error rate (BER) curves. At a BER of 10^{−9}, the power penalty is about 4 dB for addition, 3 dB for subtraction, 2 dB for complement, and 3.1 dB for doubling. The constellations measured using an optical complex spectrum analyzer are shown in Figure 9. One can clearly see that addition (A+B) and subtraction (A−B, B−A) have four phase levels (0, π/2, π, 3π/2), while doubling (2B) has only two phase levels (0, π), because doubling a quaternary digit modulo 4 can only yield 0 or 2.
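Why doubling collapses onto two phase levels follows directly from modular arithmetic, as the following sketch shows. It assumes the linear mapping of digit k to phase kπ/2; the function name `output_levels` is hypothetical.

```python
import math


def output_levels(operation, base=4):
    """Sorted set of digits an operation can produce over all input digits."""
    return sorted({operation(d) % base for d in range(base)})


doubling = output_levels(lambda d: 2 * d)
complement = output_levels(lambda d: -d)

print(doubling)    # digits reachable by 2B mod 4
print(complement)  # digits reachable by -B mod 4
print([d * math.pi / 2 for d in doubling])  # corresponding phase levels in radians
```

Doubling reaches only the digits 0 and 2, i.e., phases 0 and π, whereas complement still reaches all four levels.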

## 4. Graphene-enhanced optical nonlinearity for M-ary optical computing

Graphene as a purely two-dimensional material with only one-carbon-atom thickness has received great interest since it features many interesting and useful electrical, optical, chemical, and mechanical properties [92, 93]. Over the last decade, many remarkable optical properties of graphene have been discovered, such as self-luminosity, tunable optical absorption, strong nonlinearity, saturable absorption, etc. [94–96]. Recently, optical nonlinearities have been observed in graphene in various configurations, e.g., slow-light graphene-silicon photonic crystal waveguide [97], graphene optically deposited onto fiber ferrules [98], and graphene based on microfiber [99]. The large absorption and Pauli blocking effect in graphene, together with the ultrafast carrier dynamics and strong optical nonlinearity with a fast response time, make graphene-based photonic devices suitable for performing efficient nonlinear functions. Very recently, an experimental observation of FWM-based wavelength conversion of a 10-Gb/s non-return-to-zero (NRZ) signal was reported [100]. In this section, we introduce our recent progress in optical M-ary computing functions using a graphene-assisted nonlinear optical device.

Figure 10 illustrates the fabrication process of the graphene-assisted nonlinear optical device. First, a monolayer graphene was grown on a Cu foil by the chemical vapor deposition (CVD) method. Poly(methyl methacrylate) (PMMA) film was next spin coated on the surface of the graphene-deposited Cu foil, and the Cu foil was etched away with 1 M FeCl_{3} solution. The resultant PMMA/graphene film (5 mm × 5 mm) was then washed in deionized water several times and transferred to deionized water solution or Si/SiO_{2} substrate. Then, the floating PMMA/graphene sheet was mechanically transferred onto the fiber pigtail cross section and dried in a cabinet. After drying at room temperature for about 24 hours, the carbon atoms could be self-assembled onto the fiber end facet. The PMMA layer was finally removed by boiling acetone. By connecting this graphene-on-fiber component with another clean and dry fiber connector, the nonlinear optical device was thereby constructed for nonlinear optical signal processing applications.

Figure 11(a) depicts the optical microscope (OM) image of the grown graphene film transferred on a 300-nm SiO_{2}/Si substrate. Figure 11(b) shows a scanning electron microscopy (SEM) image of the graphene sheet transferred on silicon-on-insulator (SOI). One can clearly see the evidence of the uniformity of the graphene. The Raman spectrum of the graphene, as displayed in Figure 11(c), shows a weak D peak and a strong 2D peak. The D to G peak intensity ratio is ~0.08, which indicates that the graphene formed on a SiO_{2}/Si substrate was almost defect-free.

We first examine the wavelength conversion of the graphene-assisted nonlinear optical device. Figure 12(a) shows a typical output FWM spectrum obtained after the CVD single-layer graphene-coated fiber device. In the experiment, the signal wavelength is fixed at 1550.12 nm. A newly converted idler at 1546.88 nm is generated when the pump is set to be 1548.49 nm. We also measure the output spectrum without graphene for reference under the same experimental conditions. As clearly shown in the inset of Figure 12(a), the power of converted idler without graphene is observed to be ~5.5 dB lower than the one with graphene. That is, under the same experimental conditions, the converted idler without graphene is ~71.9% lower than the one with graphene. Hence, the degenerate FWM in graphene contributes more in the wavelength conversion process. The insets of Figure 12(a) also depict measured QPSK constellations of the converted idler and the input signal. We also present a comparison of the FWM conversion efficiency as a function of the pump power with and without graphene. As shown in Figure 12(b), the pump wavelength is fixed at λ_{pump} = 1548.49 nm and the signal is λ_{signal} = 1550.12 nm. One can clearly see that the conversion efficiency increases with the pump power. When the pump power varies from 23 dBm to 33 dBm, the enhanced FWM conversion efficiency by graphene changes from 4.7 dB to 7.5 dB.
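The conversion between the quoted decibel gap and the percentage figure can be reproduced with a few lines. This is a plain unit conversion, not a model of the device; the function names are hypothetical.

```python
def db_to_power_ratio(db):
    """Convert a power difference in dB to a linear power ratio."""
    return 10 ** (db / 10)


def percent_lower(db_gap):
    """How much lower (in percent) a power is when it sits db_gap dB below a reference."""
    return (1 - 1 / db_to_power_ratio(db_gap)) * 100


print(round(percent_lower(5.5), 1))  # idler without graphene vs. with graphene
print(round(percent_lower(4.7), 1), round(percent_lower(7.5), 1))  # enhancement range
```

For a 5.5-dB gap this gives roughly 72%, in line with the approximate figure quoted above; the small difference is within the rounding of the "~5.5 dB" value.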

Figure 13(a) plots the converted idler wavelength as a function of the pump wavelength when the pump power is fixed at 31 dBm. A linear wavelength relationship between the converted idler and pump is observed. The measured FWM conversion efficiency of tunable wavelength conversion with and without graphene is shown in Figure 13(b). The signal wavelength is fixed at 1550.12 nm and the pump wavelength is tuned from 1547 to 1553 nm. When using graphene-coated fiber device, the conversion efficiency varies about 1.7 dB within a ~6 nm wavelength range. By comparing the measured pump wavelength-dependent conversion efficiency with and without graphene, one can clearly see that the FWM conversion efficiency with graphene is enhanced more than 5 dB within the tuning range of pump wavelength.
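The idler position follows from energy conservation in degenerate FWM, f_idler = 2 f_pump − f_signal. The sketch below evaluates this in wavelength terms; it is an idealized calculation (vacuum wavelengths, no measurement offsets), and the function name is hypothetical. Over a few nanometers the result is very close to the linear rule λ_idler ≈ 2λ_pump − λ_signal, consistent with the linear trend described above.

```python
def degenerate_fwm_idler_nm(pump_nm, signal_nm):
    """Idler wavelength from f_idler = 2*f_pump - f_signal, written with 1/wavelength."""
    return 1.0 / (2.0 / pump_nm - 1.0 / signal_nm)


signal_nm = 1550.12
for pump_nm in (1547.0, 1548.49, 1553.0):
    exact = degenerate_fwm_idler_nm(pump_nm, signal_nm)
    linear = 2 * pump_nm - signal_nm  # first-order approximation
    print(f"pump {pump_nm:.2f} nm -> idler {exact:.2f} nm (linear estimate {linear:.2f} nm)")
```

With the pump at 1548.49 nm the calculated idler lies within a few hundredths of a nanometer of the reported 1546.88 nm.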

To characterize the performance of QPSK wavelength conversion, we further measure the BER curve as a function of the received OSNR for the B-to-B signal and the converted idler. Figure 14 plots the measured BER performance for tunable QPSK wavelength conversion with the converted idler generated at 1546.88, 1539.92, and 1557.90 nm, respectively. The measured conversion efficiencies for converted idlers at 1546.88, 1539.92, and 1557.90 nm are −36.2, −48.2, and −39.8 dB, respectively. As shown in Figure 14, the observed OSNR penalty is around 1 dB at a BER of 1×10^{−3} (7% forward error correction (FEC) threshold) for QPSK wavelength conversion with the converted idler at 1546.88 nm. Received OSNR penalties of ∼2.2 dB at a BER of 1×10^{−3} are observed for converted idlers at 1539.92 and 1557.90 nm. The increased OSNR penalty is mainly due to the reduced conversion efficiency for converted idlers at 1539.92 and 1557.90 nm. The right insets of Figure 14 depict the corresponding constellations of the B-to-B signals and converted idlers. The results shown in Figures 11–14 indicate favorable performance for tunable wavelength conversion of a QPSK signal using a fiber pigtail cross section coated with single-layer graphene.

We then show the results of optical computing based on the fabricated graphene-assisted nonlinear optical device. Figure 15 illustrates the concept and principle of two-input hybrid quaternary arithmetic functions. From the constellation in the complex plane (Figure 15(a)), it is clear that one can use the four phase levels (π/4, 3π/4, 5π/4, 7π/4) of (D)QPSK to represent quaternary base numbers (0, 1, 2, 3). To implement two-input hybrid quaternary arithmetic functions, the aforementioned graphene-assisted nonlinear optical device is employed. Two input quaternary numbers (A, B) are coupled into the nonlinear device, and then two converted idlers (idler 1, idler 2) are simultaneously generated by two degenerate FWM processes. Figure 15(b) illustrates the degenerate FWM process. We derive the electrical field (E) and optical phase (Φ) relationships of the two degenerate FWM processes under the pump non-depletion approximation, expressed as

$$E_{i1} \propto E_{A}^{2} E_{B}^{*}, \qquad \Phi_{i1} = 2\Phi_{A} - \Phi_{B} \tag{7}$$

$$E_{i2} \propto E_{B}^{2} E_{A}^{*}, \qquad \Phi_{i2} = 2\Phi_{B} - \Phi_{A} \tag{8}$$

where the subscripts A, B, i1, and i2 denote input signal A, signal B, converted idler 1, and idler 2, respectively. Owing to the phase wrap characteristic with a periodicity of 2π, it is implied from the linear phase relationships in Eqs. (7) and (8) that idler 1 and idler 2 carry out modulo 4 operations of hybrid quaternary arithmetic functions of doubling and subtraction (2A−B, 2B−A).
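A short numerical check illustrates why the π/4-offset constellation still works for the hybrid functions: the offset enters the idler phase as 2(π/4) − π/4 = π/4, so the idlers sit on the same four phase levels as the inputs. The sketch assumes ideal unit-amplitude symbols and forms each idler as the square of one field times the conjugate of the other, matching the degenerate FWM description; all helper names are hypothetical.

```python
import cmath
import math
from itertools import product

OFFSET = math.pi / 4  # phase of digit 0 in the (pi/4, 3pi/4, 5pi/4, 7pi/4) mapping


def qpsk(digit):
    """QPSK symbol for a quaternary digit with the pi/4-offset mapping."""
    return cmath.exp(1j * (OFFSET + digit * math.pi / 2))


def to_digit(field):
    """Read the quaternary digit back from the phase of a field."""
    phase = (cmath.phase(field) - OFFSET) % (2 * math.pi)
    return round(phase / (math.pi / 2)) % 4


def hybrid_idlers(a, b):
    """Digits carried by idler 1 (2A-B) and idler 2 (2B-A)."""
    e_a, e_b = qpsk(a), qpsk(b)
    idler1 = e_a ** 2 * e_b.conjugate()  # phase 2*phi_A - phi_B
    idler2 = e_b ** 2 * e_a.conjugate()  # phase 2*phi_B - phi_A
    return to_digit(idler1), to_digit(idler2)


for a, b in product(range(4), repeat=2):
    assert hybrid_idlers(a, b) == ((2 * a - b) % 4, (2 * b - a) % 4)
print(hybrid_idlers(3, 1))
```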

Figure 16 depicts measured typical spectrum obtained after the CVD single-layer graphene-coated fiber device. Two 10-Gbaud NRZ-(D)QPSK signals at 1550.10 (A) and 1553.60 nm (B) are employed as two inputs. The power of two input signals (A, B) is about 32 dBm. The conversion efficiency is measured to be around −36 dB. One can clearly see that two converted idlers are obtained by two degenerate FWM processes with idler 1 at 1546.60 nm (2A−B) and idler 2 at 1557.20 nm (2B−A). The resolution of the measured spectrum is set to 0.02 nm. The steps in the measured spectrum are actually the modulation sidebands of two NRZ-(D)QPSK carrying signals. In order to verify the hybrid quaternary arithmetic functions, we measure the phase of symbol sequence for two input signals and two converted idlers, as shown in Figure 17. By carefully comparing the quaternary base numbers for two input signals and two converted idlers, one can confirm the successful implementation of two-input hybrid quaternary arithmetic functions of 2A−B and 2B−A.

We further investigate the BER performance for the proposed optical two-input hybrid quaternary arithmetic functions. The OSNR penalties at a BER of 2×10^{−3} for hybrid quaternary arithmetic functions are measured to be about 7.4 dB for 2A−B and 7.0 dB for 2B−A. The insets in Figure 18(a) show constellations of the last point of the BER curves of output Sig. B and 2A−B. The constellation of Sig. B is measured under an OSNR of 12.6 dB, while the constellation of 2A−B is observed under an OSNR of 19.6 dB. To clearly show the differences between these two constellations, we also assess the EVM of these two constellations, i.e., EVM = 27.61% for output Sig. B and EVM = 30.09% for output 2A−B. The significant performance degradations for the two-input hybrid quaternary arithmetic functions (2A−B, 2B−A) might be ascribed to the relatively low conversion efficiency for two converted idlers at 1546.60 nm and 1557.20 nm and accumulated distortions transferred from two input signals (A, B). It is possible to further enhance the conversion efficiency by appropriately increasing the number of graphene layers employed in the experiment. Figure 18(b) depicts the BER performance as a function of the relative time offset between two signals (signal offset) under an OSNR of ~20 dB. It is found that the BER is kept below enhanced forward error correction (EFEC) threshold when the signal offset/symbol time is within 15 ps, which indicates a favorable tolerance to the signal offset.
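For readers unfamiliar with EVM, the sketch below shows one common way to compute it: the rms error vector normalized to the rms reference amplitude. The chapter does not state which normalization was used for the quoted values, so this convention is an assumption, and the sample points are synthetic illustrations rather than measured data.

```python
import cmath
import math


def evm_percent(received, reference):
    """EVM in percent: rms error vector magnitude over rms reference magnitude (assumed convention)."""
    error_power = sum(abs(r - s) ** 2 for r, s in zip(received, reference))
    reference_power = sum(abs(s) ** 2 for s in reference)
    return 100 * math.sqrt(error_power / reference_power)


# Ideal QPSK points at (pi/4, 3pi/4, 5pi/4, 7pi/4).
reference = [cmath.exp(1j * (math.pi / 4 + k * math.pi / 2)) for k in range(4)]

# Synthetic received points: every symbol displaced by the same error vector of length 0.3.
received = [s + 0.3 for s in reference]

print(round(evm_percent(received, reference), 2))
```

With unit-amplitude reference symbols and an error vector of length 0.3 on every symbol, the function returns an EVM of 30%.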

We also propose an approach to performing three-input optical addition and subtraction of quaternary base numbers using multiple nondegenerate FWM processes based on graphene-assisted device.

Figure 19 illustrates the concept and working principle of the proposed graphene-assisted three-input high-base optical computing. Three input (D)QPSK signals (A, B, C) are launched into the nonlinear device, in which three converted idlers (idler 1, idler 2, idler 3) are simultaneously generated by three nondegenerate FWM processes. Quaternary hybrid addition and subtraction of A+B−C, A+C−B, and B+C−A are obtained simultaneously.

In the experiment, the wavelengths of the three input signals A, B, and C are fixed at 1548.52, 1550.12, and 1552.52 nm, respectively. Figure 20 depicts a typical measured optical spectrum obtained after the single-layer graphene-coated fiber device. One can clearly see that three converted idlers are generated by three nondegenerate FWM processes with idler 1 at 1546.13 nm (A+B−C), idler 2 at 1550.92 nm (A+C−B), and idler 3 at 1554.13 nm (B+C−A), respectively. The conversion efficiencies of the three nondegenerate FWM processes are measured to be larger than −34 dB. In order to verify the quaternary optical computing functions, we measure the phase of the symbol sequence for the three input signals and three converted idlers, as shown in Figure 21. By carefully comparing the quaternary base numbers for the three input signals and three converted idlers, one can confirm the successful implementation of graphene-assisted three-input quaternary optical computing (i.e., quaternary hybrid addition and subtraction) functions of A+B−C, A+C−B, and B+C−A.
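The reported idler wavelengths agree with energy conservation in nondegenerate FWM, where the idler frequency is the sum of two input frequencies minus the third. The sketch below is an idealized check using vacuum wavelengths; the function name is hypothetical.

```python
def nondegenerate_fwm_idler_nm(plus1_nm, plus2_nm, minus_nm):
    """Idler wavelength from f_idler = f_plus1 + f_plus2 - f_minus, written with 1/wavelength."""
    return 1.0 / (1.0 / plus1_nm + 1.0 / plus2_nm - 1.0 / minus_nm)


a_nm, b_nm, c_nm = 1548.52, 1550.12, 1552.52  # input signals A, B, C

idlers = {
    "A+B-C": nondegenerate_fwm_idler_nm(a_nm, b_nm, c_nm),
    "A+C-B": nondegenerate_fwm_idler_nm(a_nm, c_nm, b_nm),
    "B+C-A": nondegenerate_fwm_idler_nm(b_nm, c_nm, a_nm),
}

for name, wavelength in idlers.items():
    print(f"{name}: {wavelength:.2f} nm")
```

The calculated values fall within about 0.01 nm of the measured idlers at 1546.13, 1550.92, and 1554.13 nm.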

To characterize the performance of the proposed graphene-assisted three-input high-base optical computing functions, we further measure the BER curves as a function of the received OSNR for B-to-B signals and three converted idlers. Figure 22 depicts the measured BER curves for 10-Gbaud three-input quaternary hybrid addition and subtraction of A+B−C, A+C−B, and B+C−A. As shown in Figure 22, the observed OSNR penalties of three-input quaternary hybrid addition and subtraction are assessed to be less than 7 dB at a BER of 2×10^{−3} (7% EFEC threshold). The increased OSNR penalties might be mainly due to the relatively low conversion efficiency for converted idlers and accumulated distortions transferred from the three input signals (A, B, C). The insets in Figure 22 depict the corresponding constellations of the B-to-B signals and converted idlers. The BER curves and constellations of the three output signals (A, B, C) after graphene are also shown in Figure 22 for reference.

## 5. On-chip M-ary optical computing

Silicon photonics has become one of the most promising photonic integration platforms for its ultrahigh level of integration, low power consumption, and CMOS compatibility. In addition, nonlinear interaction will also be enhanced in silicon waveguides due to its tight light confinement. Thus, SOI is also considered to be a favorable nonlinear optics platform. To minimize the footprint of the computing building block and lower the power consumption, we demonstrate on-chip M-ary optical computing by adopting silicon photonics technology.

We first experimentally demonstrate all-optical two-input (A, B) optical quaternary doubling/subtraction (2A−B, 2B−A) using a silicon waveguide. The silicon waveguide used in the experiment is shown in Figure 23.

Figure 24 shows the measured symbol sequence for two-input optical quaternary hybrid doubling/subtraction. It can be confirmed from Figure 24 that simultaneous quaternary hybrid doubling/subtraction (2A−B, 2B−A) are successfully implemented using QPSK, degenerate FWM, and coherent detection.

We also experimentally demonstrate three-input (A, B, C) optical quaternary addition/subtraction (A+C−B, A+B−C, B+C−A) using such a silicon waveguide. Figure 25 shows the measured symbol sequence for three-input optical quaternary addition/subtraction.

It is relatively difficult to experimentally demonstrate higher-order computing using a pure silicon waveguide due to the large OSNR penalty. Thus, we simulate hexadecimal optical computing using nonlinear interactions in a silicon-organic hybrid slot waveguide [101]. Figure 26(a) shows the structure of a silicon-organic hybrid slot waveguide. It features a sandwich structure with a low-refractive-index PTS [polymer poly (bis para-toluene sulfonate)] layer surrounded by two high-refractive-index silicon layers. The TM mode profile and its power density along the x/y directions are depicted in Figure 26(b)–(d). Tight light confinement is observed in the nanoscale nonlinear organic slot region, which offers high nonlinearity and an instantaneous Kerr response. We assess the effective mode area and nonlinearity to be 7.7 × 10^{−14} m^{2} and 5500 W^{−1}m^{−1}, respectively, which can potentially facilitate efficient optical signal processing (e.g., hexadecimal addition/subtraction).

Figure 27 depicts simulation results for three-input multicasted 40-Gbaud (160-Gbit/s) hexadecimal addition/subtraction. Twenty symbol sequences are plotted in Figure 27, which confirms the successful implementation of three-input hexadecimal addition/subtraction (A + B − C, A + C − B, B + C − A, A + B + C, A − B − C, B − A − C). The constellations are also shown in Figure 28.
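The six hexadecimal outputs listed above are modulo-16 combinations of the three input digits. The sketch below reproduces only that phase arithmetic with ideal 16-PSK symbols (digit k at phase 2πk/16, an assumed mapping); it is not a model of the slot-waveguide simulation, and the helper names are hypothetical. It also confirms the quoted line rate: 40 Gbaud at 4 bits per symbol is 160 Gbit/s.

```python
import cmath
import math

BASE = 16  # hexadecimal digits carried by 16-PSK


def psk(digit):
    """Unit-amplitude 16-PSK symbol for a hexadecimal digit."""
    return cmath.exp(2j * math.pi * digit / BASE)


def to_digit(field):
    """Read the hexadecimal digit back from the phase of a field."""
    phase = cmath.phase(field) % (2 * math.pi)
    return round(phase * BASE / (2 * math.pi)) % BASE


def three_input_outputs(a, b, c):
    """Digits of the six listed combinations, obtained by adding and subtracting phases."""
    e_a, e_b, e_c = psk(a), psk(b), psk(c)
    fields = {
        "A+B-C": e_a * e_b * e_c.conjugate(),
        "A+C-B": e_a * e_c * e_b.conjugate(),
        "B+C-A": e_b * e_c * e_a.conjugate(),
        "A+B+C": e_a * e_b * e_c,
        "A-B-C": e_a * e_b.conjugate() * e_c.conjugate(),
        "B-A-C": e_b * e_a.conjugate() * e_c.conjugate(),
    }
    return {name: to_digit(field) for name, field in fields.items()}


print(three_input_outputs(0xA, 0x7, 0xF))
print(40 * math.log2(BASE), "Gbit/s at 40 Gbaud")
```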

We further investigate the EVM of input signals and output idlers as functions of the OSNR of the input signals. The results are shown in Figure 29(a) and (b). The EVM penalties are less than 4.5 for hexadecimal addition/subtraction under a 28-dB OSNR. The EVM of hexadecimal addition/subtraction as a function of input signal power is shown in Figure 30. The EVM increases only slightly (<0.8 dB) for input signal powers below 50 mW, which implies a large available dynamic range (~27 dB).

## 6. Conclusion

In this chapter, we have reviewed recent research efforts toward M-ary optical computing by adopting multilevel modulation signals and exploiting optical nonlinearities.

M-ary optical computing using HNLF: By adopting 100-Gbit/s two-input (D)QPSK signals (A, B) and exploiting three degenerate FWM processes and three nondegenerate FWM processes in an HNLF, simultaneous 50-Gbaud two-input quaternary addition (A+B), dual-directional subtraction (A−B, B−A), complement (−A, −B), and doubling (2B) have been demonstrated in the experiment.

Graphene-enhanced optical nonlinearity for M-ary optical computing: We experimentally demonstrated hybrid two-/three-input quaternary addition/subtraction optical computing in a graphene-assisted nonlinear device.

On-chip M-ary optical computing: To minimize the footprint of the computing building block and lower the power consumption, we adopted silicon photonics technology. We experimentally demonstrated on-chip quaternary addition/subtraction optical computing in a silicon waveguide. On-chip hexadecimal addition/subtraction was also numerically investigated using a silicon-organic hybrid slot waveguide.

Addition and subtraction are considered to be fundamental building blocks of digital signal processing. Optical signal processing technology opens a new world for ultrahigh-speed arithmetic operations. With future improvements, other different optical nonlinearities on various nonlinear optical device platforms would also be employed to flexibly manipulate the amplitude and phase information of advanced multilevel modulation signals. In addition, more complicated computing functionalities can be introduced, which might open diverse interesting applications in robust optical computing operation.

## Acknowledgments

This work was supported by the National Program for Support of Top-notch Young Professionals, the National Natural Science Foundation of China (NSFC) under grants 61222502, 11574001, and 11274131, the Program for New Century Excellent Talents in University (NCET-11-0182), the National Basic Research Program of China (973 Program) under grant 2014CB340004, the Wuhan Science and Technology Plan Project under grant 2014070404010201, the Fundamental Research Funds for the Central Universities (HUST) under grants 2012YQ008 and 2013ZZGH003, and the seed project of Wuhan National Laboratory for Optoelectronics (WNLO). The authors thank the Center of Micro-Fabrication and Characterization (CMFC) of WNLO for the support in the manufacturing process of silicon waveguides. The authors also thank the facility support of the Center for Nanoscale Characterization and Devices of WNLO.