## Abstract

Deep learning has become a vital approach to solving big-data-driven problems. It has found tremendous applications in computer vision and natural language processing. More recently, deep learning has been widely used to optimise the performance of nanophotonic devices, where conventional computational approaches may require long computation times and significant computational resources. In this chapter, we briefly review the recent progress of deep learning in nanophotonics. We overview the applications of deep learning to the optimisation of various nanophotonic devices, including multilayer structures, plasmonic/dielectric metasurfaces and plasmonic chiral metamaterials. In addition, nanophotonics can directly serve as an ideal platform for implementing optical neural networks based on nonlinear optical media, which in turn help to achieve high-performance photonic chips that may not be realisable with conventional design methods.

### Keywords

- deep learning
- inverse design
- plasmonic metasurface
- dielectric metasurface
- chiral metamaterials
- all-optical neural network

## 1. Introduction

In the past several decades, nanophotonics has been demonstrated as an ideal platform for manipulating light-matter interaction and engineering the wavefront of electromagnetic waves at will. The rapid development of nanophotonics has led to a wide range of applications, including lasing, lidar, biosensors, LEDs, photodetectors, integrated photonic circuits and invisibility cloaks. Nanophotonics covers many exciting topics: photonic crystals, plasmonics, metamaterials, and nanophotonics based on novel materials (e.g., two-dimensional materials, perovskites). Currently, the building blocks for nanophotonics are made from either metallic or dielectric elements with regular shapes, such as rectangular wires, cylinders, cuboids and spheres for plasmonic and dielectric metasurfaces. Such regular structures are described by only a few parameters, so the optimisation can be done in a reasonably short time. For example, a single dielectric cylinder involves only two parameters, its diameter and height. Because of these limited degrees of freedom, the performance of photonic devices based on regular patterns is far from optimal. Inverse design methods have been widely used to tackle this problem because they can explore the full parameter space [1]. Conventional inverse design methods, which include topology optimisation, genetic algorithms, steepest descent and particle swarm optimisation (Figure 1a), however, require vast computational resources and take a long time to find a locally optimal structure. As a branch of machine learning, deep learning has received much attention worldwide because it can efficiently process and analyse very large datasets. It has already found great success in computer vision and speech recognition. 
Recently, researchers have applied it to quantum optics, material design and the optimisation of nanophotonic devices because of its outstanding capability of finding optimal solutions from enormous amounts of data. At the same time, its computational cost is much lower than that of other inverse design methods [2, 3]. Several types of neural network, including deep neural networks, generative neural networks and convolutional neural networks, are frequently used to retrieve the optimal structure parameters of irregular structures from limited sets of data and in a shorter time when many structure parameters are involved in the optimisation. This book chapter is organised as follows. In Section 2, we discuss the inverse design enabled by deep learning for four different topics: multilayer structures, plasmonic metasurfaces, dielectric metasurfaces and chiral metamaterials (see Figure 1b). In Section 3, we review the recent progress on all-optical neural networks. Concluding remarks and an outlook are presented in Section 4.

## 2. Optimisation of nanophotonics design by deep learning

Recently, deep learning using artificial neural networks has emerged as a revolutionary and powerful methodology in the field of nanophotonics. Applying deep learning algorithms to nanophotonic inverse design can introduce remarkable design flexibility, which is very challenging or even impossible to achieve with conventional optimisation approaches [1]. In this section, we provide a brief review of how deep learning is implemented to solve nanophotonic inverse design problems.

### 2.1 Design of multilayer nanostructures by deep learning

Multilayer nanostructures can exhibit unique optical properties, including field enhancements and distributions and special transmission/reflection spectra, based on the interference of the different modes supported by different layers in the nanostructures. Machine learning has emerged as an increasingly promising tool for the inverse design of photonic nanostructures. It enables effective inverse design by considering various inter-linked parameters, such as geometric parameters and material types, simultaneously (unlike the usual approaches, which optimise only one or two parameters at a time).

Recent work by Peurifoy et al. demonstrated the use of a deep neural network (DNN) to relate the geometry of SiO_{2}/TiO_{2} multilayer spherical core-shell nanoparticles to their light-scattering properties (Figure 2a) [4]. The transfer matrix method was used to solve the scattering analytically and to generate 50,000 different combinations of shell thicknesses as the examples for training, validation and testing. The forward model was a fully connected feed-forward network with four hidden layers. The inputs were the thicknesses of the shells of the nanoparticle, and the outputs were the corresponding scattering cross-section spectra. During learning, the output of the network was compared with the target response to provide a loss function against which the weights were trained and updated. After this forward training, the weights were fixed, the inputs were set as trainable variables and the output was fixed to the desired spectrum; the network was then run backwards, iterating the inputs until it provided a geometry that yields the target spectrum. As can be seen from Figure 2a, for an arbitrarily given spectrum (blue curve), the trained DNN can successfully predict the shell thicknesses that generate a scattering spectrum similar to the desired one, with some minor deviations.
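
The backward step described above (weights frozen, inputs trainable) can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the tiny network, its weights `W1`, `B1`, `W2`, `B2`, and the helper names are hypothetical stand-ins for a trained forward model, not the architecture or code of [4].

```python
import math

# Hypothetical frozen weights standing in for a trained forward network
# (2 design parameters -> 4 hidden units -> 3 spectral samples).
W1 = [[0.8, -0.5], [0.3, 0.9], [-0.7, 0.4], [0.5, 0.6]]
B1 = [0.1, -0.2, 0.0, 0.3]
W2 = [[0.6, -0.4, 0.5, 0.2], [-0.3, 0.7, 0.1, -0.5], [0.4, 0.2, -0.6, 0.3]]
B2 = [0.0, 0.1, -0.1]


def forward(x):
    """Return (spectrum, hidden activations) for design vector x."""
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(W1, B1)]
    y = [sum(w * hi for w, hi in zip(row, h)) + b for row, b in zip(W2, B2)]
    return y, h


def loss_and_grad(x, target):
    """Squared error to the target spectrum and its gradient w.r.t. x."""
    y, h = forward(x)
    err = [yi - ti for yi, ti in zip(y, target)]
    loss = sum(e * e for e in err)
    # Backpropagate through the frozen layers down to the inputs.
    dh = [sum(2.0 * err[k] * W2[k][j] for k in range(len(err)))
          for j in range(len(h))]
    dz = [dh[j] * (1.0 - h[j] ** 2) for j in range(len(h))]
    grad = [sum(dz[j] * W1[j][i] for j in range(len(h)))
            for i in range(len(x))]
    return loss, grad


def inverse_design(target, x0, steps=500, lr=0.1):
    """Gradient descent on the inputs only; a step is kept only if it lowers the loss."""
    x = list(x0)
    loss, grad = loss_and_grad(x, target)
    for _ in range(steps):
        trial = [xi - lr * gi for xi, gi in zip(x, grad)]
        trial_loss, trial_grad = loss_and_grad(trial, target)
        if trial_loss < loss:
            x, loss, grad = trial, trial_loss, trial_grad
        else:
            lr *= 0.5  # step too large: shrink it and retry
    return x, loss


target_spectrum, _ = forward([0.7, -0.3])   # spectrum of a known design
initial_loss, _ = loss_and_grad([0.0, 0.0], target_spectrum)
design, final_loss = inverse_design(target_spectrum, [0.0, 0.0])
print(initial_loss, final_loss, design)
```

Note that the recovered design need not equal the one used to generate the target, since several inputs can give similar spectra; the sketch only checks that the spectral error decreases.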

A further improvement of this approach is to take into account different material combinations for the core-shell nanoparticles. In another work, So et al. considered the simultaneous inverse design of materials and structural parameters using a deep learning network (Figure 2b) [5]. Here, the network maps the extinction spectra of the electric dipole (ED) and magnetic dipole (MD) to the core-shell nanoparticle, including the material information and shell thicknesses. The deep learning model consists of two networks: a design network that learns the mapping from optical properties to design parameters, and a spectrum network that learns the mapping from design parameters to optical properties. In order to adapt the network to the different types of data involved (materials and thicknesses), the loss function is defined as the weighted average of the material and structural losses.
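
As a rough illustration of such a weighted loss, the sketch below mixes a classification term for the material with a regression term for the thicknesses. The specific choices here (cross-entropy for the material, mean squared error for the thicknesses, and the mixing factor `weight`) are assumptions made for this example; the exact form used in [5] is not reproduced in this chapter.

```python
import math


def combined_loss(material_probs, material_label, thickness_pred,
                  thickness_true, weight=0.5):
    """Weighted average of a material loss and a structural loss.

    material_probs: predicted probabilities over candidate materials.
    material_label: index of the correct material.
    weight: hypothetical mixing factor in [0, 1] (not a value from the source).
    """
    eps = 1e-12  # guard against log(0)
    material_loss = -math.log(max(material_probs[material_label], eps))
    structural_loss = sum((p - t) ** 2
                          for p, t in zip(thickness_pred, thickness_true))
    structural_loss /= len(thickness_true)
    return weight * material_loss + (1.0 - weight) * structural_loss


perfect = combined_loss([1.0, 0.0, 0.0], 0, [50.0, 30.0], [50.0, 30.0])
worse = combined_loss([0.5, 0.5, 0.0], 0, [55.0, 30.0], [50.0, 30.0])
print(perfect, worse)
```

The weight lets training balance two quantities with different units and scales, which is the practical reason for treating the two losses separately.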

A similar network has also been used to explore the optical transmission spectra of multilayer thin films (Figure 2c,d) [8]. Here, Liu et al. combined forward modelling and inverse design in a tandem architecture to overcome the data inconsistency that originates from the non-uniqueness of inverse scattering problems, i.e., the same optical response can correspond to different designs. This non-uniqueness of the response-to-design mapping causes conflicting examples within the training set and might prevent the neural network from converging. The tandem network (TN) architecture consists of an inverse-design network connected to a forward-model network. The forward network learns the mapping from the structural parameters to the optical responses and is trained separately first. Once trained, it is placed after the inverse-design network, and its weights remain fixed while the inverse-design network is trained. The inverse-design network learns a mapping from the optical responses to the structural parameters. After training, such a DNN can predict the geometry of a device much faster than conventional electromagnetic solvers. As shown in the right diagram of Figure 2d, the learning curve of the tandem network converges quickly during training. The structures designed by the network match the desired transmission spectra with high fidelity.
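
The effect of non-uniqueness, and why the tandem arrangement helps, can be seen in a deliberately small toy problem. Everything in the sketch below is an assumption for illustration: the forward model is the function `x * x` (so designs `x` and `-x` give the same response), and the inverse model has a single trainable parameter `a`. It is not the network of [8].

```python
import math

# Toy forward model with a non-unique inverse: designs x and -x give the same response.
def forward(x):
    return x * x

designs = [-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0]
responses = [forward(x) for x in designs]

# Hypothetical one-parameter inverse model: x_pred = a * sqrt(y).

# (1) Direct inverse regression: least-squares fit of a to (response, design) pairs.
num = sum(x * math.sqrt(y) for x, y in zip(designs, responses))
den = sum(y for y in responses)
a_direct = num / den   # conflicting +x / -x examples cancel

# (2) Tandem training: compare forward(inverse(y)) with y, forward model frozen.
def tandem_loss_grad(a):
    # loss = mean((forward(a*sqrt(y)) - y)^2) = mean((a^2*y - y)^2)
    return sum(2.0 * (a * a * y - y) * 2.0 * a * y for y in responses) / len(responses)

a_tandem = 0.3
for _ in range(500):
    a_tandem -= 0.01 * tandem_loss_grad(a_tandem)

print(a_direct, a_tandem)
```

The direct fit averages the two conflicting branches and predicts a useless design, whereas the tandem loss is measured on the response, so the inverse model is free to settle on one valid branch.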

### 2.2 Design of plasmonic metasurfaces by deep learning

Plasmonic metasurfaces have become building blocks of the meta-optics field. They allow the wavefront of electromagnetic waves to be manipulated at will. In this section, we summarise the current status of applying deep learning to the inverse design of plasmonic metasurfaces.

In recent years, with the burgeoning field of metasurfaces, deep learning has emerged as a powerful tool for the efficient inverse design of different types of plasmonic metasurfaces for applications including spectral control and near-field design [9, 10, 11]. In 2018, Malkiel et al. introduced a novel bidirectional DNN model that can realise both the design and the characterisation of plasmonic metasurfaces [12]. The network consists of two standard DNNs: a geometry-predicting network (GPN) that solves the inverse design task and a spectrum-predicting network (SPN) that solves the spectrum prediction task for plasmonic metasurfaces made of “H”-shaped gold nanostructures. They showed that by combining these two networks and optimising them together, the networks co-adapt to each other, which is more effective than training them separately, as shown in Figure 3a. The training data for the GPN consist of three groups: the desired spectra for an *x*-polarised pump and for a *y*-polarised pump, and the materials’ properties. Each group is fed into its own DNN, so that three DNNs run in parallel before merging into fully connected joint layers. This architecture takes the different nature of the input data into account and thus gives better performance for nanophotonic design. The geometry predicted by the GPN is then fed into the SPN, which returns the predicted transmission spectra as outputs. Backpropagation is then used to optimise both networks. The networks show excellent agreement between measurements, predictions and simulations, as demonstrated by the two examples in Figure 3b, where the network is used for the inverse design of “H”-shaped gold metasurfaces for target spectra.

As the structural complexity grows, generating the training data sets takes an enormous amount of time. Furthermore, the need for more degrees of freedom in metasurface patterns makes the problem increasingly challenging for conventional neural networks. To solve this issue, a generative adversarial network (GAN) has recently been employed for metasurface design [13]. A GAN places two neural networks (a generator and a critic) in competition with each other until they reach an optimum; in this work a pretrained simulator network is used alongside them, as shown in Figure 3c. The simulator was first pretrained using 6500 full-wave finite element method (FEM) simulations of metasurfaces with different shapes. After this training, the simulator was used to approximate the transmission spectrum of any input pattern in place of full-wave FEM simulations, which significantly reduces the amount of simulated data required by the network. The generator produces metasurface patterns in response to a given input spectrum T, and these patterns are fed into the simulator to obtain the approximated spectrum T′. The critic compares the original geometric data corresponding to T with the patterns produced by the generator and guides the generator towards patterns that share common features with the geometric input data. Figure 3d gives one example demonstrating the excellent performance of this network in identifying a structure that produces the target spectrum with only minor deviations.

### 2.3 Design of dielectric metasurface by deep learning

Dielectric metasurfaces have attracted extensive interest in recent years. Analogous to metallic nanostructures supporting plasmonic resonances, high-index dielectric nanostructures support multipolar electric and magnetic resonances (also called Mie resonances), which enable 2π phase coverage with ease. Besides, the intrinsic material loss of high-index semiconductors is much lower than that of noble metals. These two properties make it possible to develop high-performance photonic devices based on dielectric metasurfaces. Although dielectric metasurfaces with regular elements perform much better than plasmonic metasurfaces, they still do not reach the optimal efficiency. In order to further improve the performance of dielectric metasurfaces, inverse design approaches, including adjoint-based topology optimisation and genetic algorithms, have been widely used. These iterative optimisation methods lead to high-efficiency devices with irregular patterns that are usually beyond human intuition. However, they rely on extremely heavy computation, making them hard to apply to sophisticated devices with a very high-dimensional design space. The recently developed deep learning approach, which is based on artificial neural networks, is viewed as a promising solution for handling massive data while reducing the computational cost. It has already found great success in computer vision and natural language processing, and researchers have recently transferred it to the inverse design of nanophotonic devices. To date, the neural networks most frequently used in the design of dielectric metasurfaces are DNNs, GANs and convolutional neural networks (CNNs). In the following, we illustrate them one by one and discuss their strengths and drawbacks.

A DNN with fully connected layers has been demonstrated as a versatile and efficient way of engineering a high-Q resonance with desired characteristics, including linewidth, amplitude and spectral location [14]. The structure considered here is a pair of identical silicon nanobars on a substrate, as shown in Figure 4b. The width and length of the nanobars are denoted as W and L, respectively, while the centre-to-centre distance between the nanobars is denoted as 2x_{0}. To reduce the structural complexity, the period of the unit cell and the thickness of the silicon bars are fixed at p = 900 nm and t = 150 nm, respectively. Previous studies have demonstrated that such an array supports a Fano resonance induced by a quasi-bound state in the continuum. Since there are three parameters to be tuned, it is very challenging to find the desired structure parameters by brute-force searching, one by one, when the spectral response is predefined. A DNN can address this issue in much less time. The 25,000 sets of training data are randomly generated with rigorous coupled-wave analysis (RCWA). It is worth noting that it is straightforward to train a network mapping from structure parameters to the reflection/transmission spectrum, because each set of structure parameters produces exactly one spectrum. The objective, however, is to find the structure parameters for a desired spectral response. It is challenging to train a single feed-forward network directly on this reverse mapping because the non-uniqueness issue arises. In other words, different designs may produce the same far-field electromagnetic response because the optical resonance is mainly governed by the volume of the structure and depends only weakly on its shape. To solve this one-to-many issue, a tandem neural network consisting of an inverse-design network and a forward-model network is proposed, as shown in Figure 4a. 
More specifically, the forward network is trained first to learn the mapping from structure parameters to the optical response. After the forward network is trained, the inverse-design network is trained while the weights and biases of the forward network are fixed. Once the full training process is completed, one can retrieve the structure parameters for a predefined optical spectrum in several milliseconds. To test the validity of the tandem network, Figure 4c–e compares the predefined and predicted spectra of Fano resonances with different wavelengths, linewidths and amplitudes. Excellent agreement is found between the two, indicating the effectiveness of the deep learning approach in the inverse design of nanophotonics. Note that only the amplitude of the transmission spectrum is considered here. In many applications of dielectric metasurfaces (e.g., metalenses), both amplitude and phase should be considered in order to shape the wavefront of the electromagnetic wave. Because an optical resonance is always accompanied by a π phase shift, the phase spectrum changes abruptly near resonance, which can make training on phase spectra difficult, since smoothly varying output parameters are easier to learn. Instead of using phase and amplitude, researchers therefore adopt the real and imaginary parts of the reflection/transmission spectrum as the outputs in the training data.
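
One way to see why real and imaginary parts can be friendlier training targets than phase is to look at two neighbouring spectral samples whose phases lie on either side of π. The snippet below is a minimal numerical illustration with made-up values (`t_before` and `t_after` are hypothetical complex transmission coefficients); it shows the discontinuity of a wrapped phase and is not taken from the cited works.

```python
import cmath
import math

# Two hypothetical complex transmission coefficients whose phases straddle pi.
t_before = cmath.exp(1j * (math.pi - 0.01))
t_after = cmath.exp(1j * (math.pi + 0.01))

# Wrapped phase jumps by almost 2*pi between neighbouring samples ...
phase_jump = abs(cmath.phase(t_after) - cmath.phase(t_before))

# ... while the real and imaginary parts change only slightly.
real_imag_change = abs(t_after - t_before)

print(phase_jump, real_imag_change)
```

A regression loss on the phase would treat these two nearly identical responses as far apart, whereas a loss on the real and imaginary parts does not.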

Moreover, because of the huge mismatch between the dimensions of the input and the output, a revised neural network was applied. The first standard linear layer was replaced with a bilinear tensor layer that can correlate two entity vectors in multiple dimensions. Training results indicated that the modified neural network converges faster than the standard one. This is because the input parameters are interdependent. Taking an array of dielectric nanodisks as an example, the structure is fully described by four parameters: the refractive index of the material, the radius and height of the disk, and the gap between disks. As mentioned previously, the optical resonance is mainly determined by the refractive index and volume of the structure. In other words, the spectral response is governed by the permittivity (ε = n^{2}) and the volume (V = πr^{2}h). Therefore, multiplying two entities through the bilinear tensor describes the nonlinearity better and thus facilitates training. However, it is worth pointing out that deep neural networks have some limitations. First, the design solution retrieved from deep learning must fall within the boundary of the training data set. Second, the approach only works for structures defined by a few simple parameters. When more parameters are involved, tens or hundreds of thousands of training samples are required to guarantee the prediction accuracy. As a consequence, generating such a large amount of data may take a long time and incur a high computational cost. Moreover, it is challenging to train a DNN for dielectric metasurfaces with free-form geometry.
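
A bilinear layer can represent products of its inputs, which a single linear layer cannot. The sketch below is a simplified, assumed form of such a layer (the same vector is used on both sides of the tensor, and no activation is applied); the function name and the weights `W`, `V`, `b` are chosen for this example and are not those of the cited work.

```python
def bilinear_tensor_layer(x, W, V, b):
    """out_k = x^T W[k] x + V[k] . x + b[k]  (no activation, for clarity)."""
    out = []
    for Wk, Vk, bk in zip(W, V, b):
        quad = sum(x[i] * Wk[i][j] * x[j]
                   for i in range(len(x)) for j in range(len(x)))
        lin = sum(v * xi for v, xi in zip(Vk, x))
        out.append(quad + lin + bk)
    return out


# Hypothetical input: x = [permittivity, volume] in arbitrary units.
x = [12.0, 0.5]
# One output slice whose weights pick out the cross term permittivity * volume.
W = [[[0.0, 0.5], [0.5, 0.0]]]
V = [[0.0, 0.0]]
b = [0.0]
result = bilinear_tensor_layer(x, W, V, b)
print(result)  # -> [6.0]
```

With these weights the single output equals the product of the two inputs, the kind of second-order relationship between interdependent parameters that the text refers to.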

GANs have been found to overcome the above limitations effectively. The GAN was originally proposed in computer vision. It is capable of creating artificial images that even computers cannot distinguish from true images [15]. GANs have been successfully applied to the design of subwavelength-scale metallic nanostructures and multifunctional dielectric metasurfaces [13, 16]. The operating principle of a GAN in metasurface design is as follows. The unit cell of the metasurface is divided into an image of N × N pixels (e.g., N = 32 or 64) while the thickness of the structure and the period of the unit cell are fixed. There are two neural networks in a GAN: a generator and a discriminator. The generator tries to create images that cannot be differentiated from real images. In contrast, the discriminator is trained to distinguish the images produced by the generator from the real image set. The competition between these two networks leads to artificial images that cannot be distinguished from real ones. In fact, topology optimisation and deep learning do not always work alone. They can be combined to build a new generative network. Such a generative network has been proposed to optimise the efficiency of metagratings at large angles across a broad wavelength range, taking advantage of both GANs and adjoint-based topology optimisation [17]. Although a GAN requires fewer training sets, the training data may need to be optimised first, which demands more computational resources. More recently, global topology optimisation networks (GLOnets) were proposed by Jiang et al. from Stanford [18, 19]. They incorporate adjoint-based optimisation into generative neural networks. Unlike the DNN and GAN methods, GLOnets do not require training data to be pre-calculated with an electromagnetic solver. 
Instead, they adopt a generator network followed by an adjoint-based topology optimiser, which allows the physical relationship between the geometric parameters of the device and its electromagnetic response to be learned directly, as shown in Figure 4f. Such a global optimiser not only reduces the computation time but also further improves the efficiency of metagratings at large angles compared with the topology optimisation method (see Figure 4g).

### 2.4 Design of chiral metamaterials by deep learning

Another example of the application of deep learning in nanophotonics is the design of plasmonic chiral metamaterials [20, 21]. Chirality is the structural property of an object that cannot be superposed onto its mirror image by any combination of rotation and translation. A chiral object responds differently to left circularly polarised (LCP) and right circularly polarised (RCP) incident light. The concept originates from molecules and ions in chemistry. However, optical chirality in nature is extremely weak because of the small interaction volume at visible wavelengths. The emergence of metamaterials makes it possible to realise a strong chiral optical response. It is well established that a pair of mutually rotated gold split-ring resonators (SRRs) separated by a dielectric spacer can induce strong chirality. The question of how to optimise the chirality at a given frequency still remains unanswered because the many parameters involved make it difficult to find the optimal design [20]. The advent of machine learning has made it possible to process many parameters at once in a reasonably short time. Ma et al. developed a deep learning-based model to design and optimise three-dimensional plasmonic chiral metamaterials at a desired wavelength. The structure they considered is shown in Figure 5a. The period of the unit cell is fixed at 2.5 μm, while the thickness and width of the gold SRRs are set to 200 nm and 50 nm, respectively. Other parameters, namely the lengths of the top and bottom SRRs (*l*_{1} and *l*_{2}), the thicknesses of the top and bottom dielectric spacer layers (*t*_{1} and *t*_{2}) and the twist angle α between the two SRRs, are used as input parameters. For the output parameters, 201 points are sampled in the reflection spectrum from 30 to 80 THz. Four characteristic spectra are investigated as outputs: the reflection spectra R_{LL} (LCP-input: LCP-output), R_{LR} (LCP-input: RCP-output) and R_{RR} (RCP-input: RCP-output), and the chirality spectrum. 
Figure 5b shows the structure of the DNN, which consists of a primary network (PN) and an auxiliary network (AN). Both networks have a forward path and an inverse path. For the forward path of the PN, the huge mismatch in dimension between the input parameters (1 × 5) and the output parameters (3 × 201) makes convergence difficult. This is especially obvious around the resonant frequency. To avoid this issue, a neural tensor network followed by an up-sampling module is used. Instead of a DNN whose fully connected layers are formed by simple linear combinations of the previous neurons, the first hidden layer is replaced by a neural tensor network that models second-order relationships, because the input parameters are not independent of each other. Figure 5c compares the reflection spectra obtained from electromagnetic simulation with the predictions of the PN. Excellent agreement is found for most wavelengths except around the resonant wavelengths. This issue is addressed by introducing the AN, which learns the relationship between the structural parameters and the chirality spectrum. The results are shown in Figure 5d. Once both the PN and the AN are trained, one can design chirality spectra featuring single or double resonances, as well as optimise the chirality at a predefined wavelength. Note that such networks are not the only ones that can design and optimise chiral metamaterials. Li et al. developed a self-consistent framework termed BoNet, which combines Bayesian optimisation (BO) with a CNN [21] and can conduct self-learning on the optical properties of nanostructures (i.e., near field and far field). The unit cell of the structure, shown in Figure 5e, is divided into 40 × 40 pixels, where empty areas are denoted as 0 and gold brick areas are denoted as 1. Other parameters, such as period and thickness, are fixed. The network used here is composed of convolutional layers followed by several fully connected layers (see Figure 5f). 
Once trained, BoNet can help to optimise the chirality at an arbitrary wavelength in the visible range. Figure 5g compares the measured chirality spectra with those predicted by BoNet. The discrepancy can be attributed to fabrication and measurement tolerances.
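
The pixel encoding described above (0 for empty, 1 for gold on a 40 × 40 grid) is easy to reproduce. The sketch below is a minimal illustration; the function name `rasterise` and the rectangle format `(row, col, height, width)` are assumptions made for this example, and the brick positions are arbitrary.

```python
def rasterise(bricks, n=40):
    """Return an n x n grid with 1 for gold and 0 for empty pixels.

    bricks: list of (row, col, height, width) rectangles in pixel units
    (a hypothetical encoding chosen for this example).
    """
    grid = [[0] * n for _ in range(n)]
    for row, col, height, width in bricks:
        for r in range(row, min(row + height, n)):
            for c in range(col, min(col + width, n)):
                grid[r][c] = 1
    return grid


unit_cell = rasterise([(5, 5, 10, 4), (20, 18, 6, 15)])
gold_pixels = sum(map(sum, unit_cell))
gold_fraction = gold_pixels / (40 * 40)
print(gold_pixels, gold_fraction)
```

A binary image of this kind is the natural input for convolutional layers, which is one reason a CNN suits free-form unit cells better than a parameter vector does.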

## 3. All-optical neural networks

As discussed above, neural networks have been successfully used to solve rather complex problems, in nanophotonics in particular. There are two fundamentally different ways of implementing neural networks: software simulation on conventional computers, or a dedicated hardware solution capable of dramatically decreasing the execution time. Software simulation is useful for developing and debugging new algorithms, as well as for benchmarking them with small networks. However, if large networks are to be used, software simulation is not enough. The problem is the time required for the learning process, which can increase exponentially with the size of the network.

At the same time, there are ongoing attempts to implement this architecture in hardware, which should allow substantial gains in scaling and in distributed approaches. Digital circuits are usually implemented in robust CMOS technology, where the summation of the neuron states is realised with common multipliers and adders. The activation function is more complicated to implement because it requires a highly nonlinear response. One of the technical difficulties is related to the implementation of the communication channels. In general, the number of connections scales as the square of the number of inputs. One solution to this problem can be provided by optical networks, where the communication channels do not need to be hard-wired [22, 23]. Also, in free space, light waves can cross each other without affecting the information they carry. Other benefits include the low energy needed to transmit a signal and high switching rates of up to 40 GHz. Thus, analogue optical technology allows artificial neural networks to be implemented directly in hardware, with data encoded in pulses of light and neurons made from optical elements, such as lenses, prisms, beam splitters, waveguides and spatial light modulators (SLMs) (see Figure 6a). In particular, SLMs are used for algebraic operations, including matrix multiplication with a specific phase mask design [24].

Recently, another approach to realising optical neural networks has been based on Mach-Zehnder interferometers (MZIs), which are used to calculate matrix products [25, 26] (see Figure 6b). Carefully controlling the phase shift between a coherent pair of incoming light pulses makes it possible to multiply a two-element vector, encoded in the amplitudes of the pulses, by a two-by-two matrix [27, 28]. An array of such interferometers can then perform arbitrary matrix operations, an approach that is widely used, for example, in boson sampling.
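
The two-by-two operation performed by a single interferometer can be written out explicitly. The sketch below assumes an idealised, lossless MZI made of two 50:50 beam splitters with an internal phase `theta` and an input phase `phi`; this parametrisation and the helper names are assumptions for illustration, and real devices in the cited works may use a different convention.

```python
import cmath
import math


def matmul2(a, b):
    """Product of two 2x2 complex matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def mzi(theta, phi):
    """Transfer matrix of one MZI: phase phi, splitter, phase theta, splitter."""
    s = 1.0 / math.sqrt(2.0)
    splitter = [[s, 1j * s], [1j * s, s]]           # ideal 50:50 beam splitter
    inner = [[cmath.exp(1j * theta), 0.0], [0.0, 1.0]]
    outer = [[cmath.exp(1j * phi), 0.0], [0.0, 1.0]]
    return matmul2(splitter, matmul2(inner, matmul2(splitter, outer)))


def apply_matrix(u, v):
    """Multiply a 2x2 matrix by a two-element vector of pulse amplitudes."""
    return [u[0][0] * v[0] + u[0][1] * v[1], u[1][0] * v[0] + u[1][1] * v[1]]


vec_in = [0.6, 0.8]                 # amplitudes of the two input pulses
vec_out = apply_matrix(mzi(theta=1.1, phi=0.4), vec_in)
power_in = sum(abs(a) ** 2 for a in vec_in)
power_out = sum(abs(a) ** 2 for a in vec_out)
cross = apply_matrix(mzi(theta=0.0, phi=0.0), [1.0, 0.0])
print(power_in, power_out, cross)
```

Because the idealised device is lossless, the matrix is unitary and the total power is conserved; tuning `theta` changes how the light is split between the two output ports.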

One of the main challenges for the successful realisation of optical neural networks is finding a suitable implementation of the activation function. Because the activation function is inherently nonlinear, the light pulses must interact with a nonlinear medium. Various nonlinear effects have been proposed for this functionality. To avoid optical signal loss, mostly dielectric materials have been considered, including photorefractive crystals, liquid crystals and various semiconductors [29]. The most promising nonlinear effects are based on harmonic generation, phase conjugation, optical limiting and bistable response. Recently, researchers from The Hong Kong University of Science and Technology proposed a new approach to implementing the nonlinear activation function, based on cold atoms exhibiting electromagnetically induced transparency [24]. Importantly, it requires very weak laser power and is based on nonlinear quantum interference. It is also possible to produce different activation functions by varying the positions of the counterpropagating beams.

A group from the University of Münster has suggested an alternative approach that exploits wavelength-division multiplexing (WDM) to transport and sum multiple pulses at different wavelengths using single waveguides [30]. Importantly, they use a phase-change material (PCM) for both linear summing and nonlinear firing. In this approach, each neuron is implemented with ring-shaped resonators of varying diameters that tap light signals at the corresponding resonant wavelengths from a common waveguide. When the total power of all those signals exceeds a certain threshold, it switches another piece of PCM, this time embedded in a resonator at the neuron’s output.
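
At the level of abstraction used in this paragraph, such a neuron sums the weighted optical power over all wavelength channels and fires when the sum crosses a threshold. The sketch below captures only that behaviour; the function name, the transmission factors in `weights` and the numerical values are assumptions for illustration and do not model the actual device physics of [30].

```python
def pcm_neuron(input_powers, weights, threshold):
    """Weighted sum of per-wavelength powers followed by a hard threshold.

    weights: hypothetical transmission factors in [0, 1], one per wavelength,
    standing in for the state of each phase-change cell.
    Returns (summed power, True if the output cell would switch).
    """
    total = sum(p * w for p, w in zip(input_powers, weights))
    return total, total > threshold


powers = [1.0, 0.5, 0.8, 0.2]        # arbitrary units, one value per wavelength
weights = [0.9, 0.1, 0.7, 0.3]
total_power, fired = pcm_neuron(powers, weights, threshold=1.0)
print(total_power, fired)
```

Raising the threshold above the summed power leaves the output cell unswitched, which is the nonlinear firing behaviour the text describes.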

In parallel with the recent progress in all-optical implementations of neural networks, various groups have investigated hybrid optoelectronic systems in which neurons convert signals from light into electricity and then back into light. A group from Princeton suggested using electro-absorption modulation for an optimal integrated photonic implementation of neural networks [31]. One of the essential aspects is the integration density. The electro-optically induced nonlinearity is realised using photodiode couplers. Moreover, this approach also allows for spiking signal processing, which enables the direct implementation of neuromorphic computing. It has led to the development of a new and quite promising platform, neuromorphic photonics, which combines the advantages of optics and electronics to build systems with high efficiency, high interconnectivity and high information density.

## 4. Conclusion and outlook

Although deep learning was originally proposed, and found great success, in the context of computer vision and speech/image recognition, it has since become a powerful approach to solving complex problems in biology, physics and chemistry. As a branch of physics, nanophotonics has witnessed huge progress based on deep learning. Deep learning allows nanophotonic devices to be inversely designed with less computational resource and time than conventional computational approaches, such as topology optimisation and genetic algorithms. Research interest and effort in deep learning-enabled nanophotonics are still growing and expanding rapidly, and more research opportunities are likely to emerge in this area.

On the one hand, although deep learning has been successfully applied to retrieve the structure parameters for a given spectrum, it remains an open question whether it can be used to realise narrowband or broadband absorbers at a specified wavelength or wavelength range. On the other hand, by combining deep learning and topology optimisation, high-efficiency beam steering at relatively large deflection angles has been demonstrated at single or dual operating wavelengths. The next step is to use deep learning to further optimise metasurface designs with multiple functionalities. For example, current broadband achromatic metalenses have limited focusing efficiency. We believe that deep learning can overcome this limitation by providing irregular meta-atom designs that cannot be obtained with regular cylindrical meta-atoms. Finally, since nanophotonics offers a powerful and versatile platform for realising optical neural networks, more advanced and faster photonic chips that surpass the computational capability of traditional electronic chips will be developed, paving the way towards the photonic computer.

## Acknowledgments

The authors acknowledge the funding support provided by UNSW Scientia Fellowship and ARC Discovery Project (DP170103778).

## Conflict of interest

The authors declare no conflict of interest.