
Generative Adversarial Networks: Applications, Challenges, and Open Issues

Written By

Dorcas Oladayo Esan, Pius Adewale Owolawi and Chunling Tu

Submitted: 30 July 2023 Reviewed: 04 September 2023 Published: 23 November 2023

DOI: 10.5772/intechopen.113098


From the Edited Volume

Deep Learning - Recent Findings and Research [Working Title]

Ph.D. Manuel Jesus Domínguez-Morales, Dr. Javier Civit-Masot, Mr. Luis Muñoz-Saavedra and Dr. Robertas Damaševičius


Abstract

Generative Adversarial Networks (GANs) represent an emerging class of deep generative models that have been attracting notable interest in recent years. These networks are unique in their capacity to model high-dimensional distributions spanning a range of data types. Conventional GANs encounter problems related to mode collapse, convergence, and instability. These issues can be primarily attributed to suboptimal network architecture design, misuse of objective functions, and inappropriate parameter optimisation methods. Several studies have made efforts to tackle these issues, to varying degrees of success. This research aims to offer an exhaustive review of contemporary techniques utilised in GANs, the persisting problems they face, applications of these techniques, and performance evaluation metrics across various sectors. Comprehensive searches were performed over selected publications from 2014 to 2022; of 260 publications retrieved, 20 publications (7.69%) were deemed eligible. The results obtained using the Comprehensive Meta-Analysis (CMA) tool show a mean effect size of −0.537 with a 95% confidence interval of −1.205 to 0.132 and a p-value >0.05. This analysis will equip researchers with deeper insights into the potential applications of GANs and how they can help address current challenges in various domains.

Keywords

  • deep learning
  • generative adversarial networks
  • computer vision
  • meta-analysis
  • generative models

1. Introduction

In recent years, Generative Adversarial Networks (GANs) have received considerable attention in academia and industry. GANs have demonstrated immense practical capability across multiple disciplines in modelling high-dimensional distributions over image, textual, and video data [1]. A GAN generates synthetic data through an adversarial process inspired by game theory: a Generator and a Discriminator compete to learn deep representations of the distribution of realistic samples without requiring extensively annotated training data. These properties have given GANs outstanding advantages over other generative models, such as the Variational Autoencoder (VAE) and Deep Generative Models (DGMs), among researchers in the fields of image processing and computer vision.

Deep Generative Models (DGMs) are algorithms that can create new data based on patterns they have learned. Examples of these models include Restricted Boltzmann Machines (RBMs), Deep Belief Networks (DBNs), Deep Boltzmann Machines (DBMs), and Generative Probability Networks (GPNs). These models are useful for handling large amounts of data. New samples are synthesised from the underlying distribution using a method called Markov Chain Monte Carlo (MCMC), which estimates the gradient of the log-likelihood during the training process; this gradient tends to vanish, which is one of the major reasons why sampling from Markov chains is slow.

VAEs are another generative model that uses deep learning and statistical inference to represent data in a latent space [2]. They, too, suffer from intractable probabilistic computations. Furthermore, these models are trained by maximising the likelihood of the training data, which requires iterating over many data dimensions, such as photos and videos. Samples drawn from Markov chain models in high-dimensional spaces are ambiguous, imprecise, and computationally slow.

To address these challenges, Goodfellow et al. [3] developed Generative Adversarial Nets (GANs), which serve as an alternative that avoids the issues related to MCMC by training the discriminative and generative models jointly through backpropagation. This feature is an advantage when utilising GANs to generate realistic images, as it avoids the complexity associated with maximum-likelihood learning [4].

Two adversarial models make up the structured probabilistic model known as a GAN: the discriminative model, known as the Discriminator (D), estimates the probability that a sample was drawn from the actual data distribution rather than from the Generator's (G) distribution.

To capture the data distribution, the Generator (G) is trained with a gradient-based optimisation method alongside the Discriminator. Training continues until G produces image samples that resemble the original images entered into the model, at which point D is unable to differentiate between the generated images and the originals. From the updates of G and D, the divergence between the two distributions is computed from the loss. Owing to this capability, GANs are often used for video generation, image super-resolution, and image generation [5]. Despite the positive results of GANs in various applications, unbalanced training of D and G makes GANs highly unstable during training: the discriminator's gradient vanishes if it becomes too easy to tell the difference between real and fake images, and the generator stops updating if the discriminator cannot provide a useful gradient.

Numerous improvements to loss functions have been made to reduce divergence and handle the issue of mode collapse. Furthermore, numerous solutions to stabilise the GAN training procedure have been proposed. Nevertheless, performance evaluation remains one of the challenges that have not been adequately addressed. Hence, this study investigates various GAN variants and highlights the appropriate applications for GAN techniques, performance metrics, and challenging issues affecting current GANs. The contributions of this research are as follows:

  • We provide a comprehensive review of different GANs, with the applications, advantages, and limitations of each GAN technique.

  • The survey highlights the most promising future research directions.

  • We utilise meta-analysis to conduct a systematic evaluation of recently published GAN techniques, covering the existing research gap and contributing to existing knowledge by providing researchers with deeper insights into the potential applications of GANs.

The structure of the paper is as follows. Section 2 presents the theoretical background, Section 3 explains various GAN variants, Section 4 describes quantitative performance evaluation metrics, Section 5 reviews the application of GAN models in various domains, Section 6 discusses the meta-analysis of GANs in different applications, Section 7 lays out the challenges and open issues of GANs, and Section 8 concludes the paper.


2. Theoretical background

2.1 Generative modelling

Generative models in machine learning are a type of unsupervised learning in which labels are unavailable and the main objective is to learn the underlying data structure. Density estimation and sample generation are the strengths of generative models. Deep generative models represent probability distributions over several variables. Some of these models permit explicit evaluation of the probability density function (explicit density), while others do not (implicit density) and only allow indirect access to it, for example by sampling from the distribution. For many tasks, generating image samples from the learned distribution is a fundamental requirement [2].

Generative models are important because they can be used to manipulate high-dimensional probability distributions, applied in reinforcement learning, and used to make predictions with missing data inputs. Finally, generative models and GANs make multi-modal outputs possible in machine learning. Figures 1 and 2 show the taxonomy of generative models and of Generative Adversarial Networks (GANs).

Figure 1.

Taxonomy of generative model [2].

Figure 2.

Taxonomy of generative adversarial networks (GANs).

2.2 Implicit density models

The training of implicit density models takes place without specifying the density function explicitly. The model is trained by sampling from a generative model and interacting with the generative model indirectly. To obtain a sample from the model, some models in this category define a Markov chain transition operator and draw samples from the generative model. The generative stochastic network serves as an illustration of this kind of model. However, as with any model that employs Markov chains, they have high computational costs and struggle to scale in high-dimensional spaces. GANs are exempt from this limitation because they generate model samples in one step, without using Markov chains. The generative moment matching networks are an example of an implicit density model that relies on kernel moment matching. By minimising the maximum mean discrepancy, deep neural networks with kernels are used to learn deterministic mappings from simple, easy-to-sample distributions to a given data distribution.

2.3 Adversarial networks

The first generative adversarial network architecture was proposed by [6], which gave generative models a breakthrough, even though these generative models were an active research area long before the introduction of adversarial networks. The quality of the outcomes generated by GANs was superior to that of other generative networks.

The advance in adversarial networks depends on the fundamental idea behind GANs. The adversarial networks introduced a generative learner (generator) and a discriminator. The generator and discriminator draw on the statistical theories of generative and discriminative models, respectively. The generator is used to produce data resembling the training data, while the discriminator is employed to classify a sample input as either original or synthetic data. The generator and discriminator act as rivals, each learning from the effectiveness of its adversary. In the conventional GAN, these models are pitted against one another in a minimax game, where the discriminator tries to minimise the cross-entropy classification error and the generator tries to maximise it.

2.4 The generative adversarial networks (GANS)

Two neural networks (a generator and a discriminator) are employed in the algorithmic architecture of GANs. GANs are generative because they can learn a distribution (the training image dataset) and produce samples that fall within it. They are adversarial due to their game-like structure, which pits the Generator and the Discriminator against one another. Backpropagation is used to update the Generator's and Discriminator's parameters during training so that the generator can produce realistic image outputs while the discriminator distinguishes the produced synthetic images from actual ones.

2.4.1 The generator

The Generator (G) uses a random noise vector (z) as its input to generate an image denoted G(z). The generated image is later passed into the discriminator. The generator is trained to produce increasingly realistic synthetic data that can deceive the discriminator into thinking it is real. It learns to represent an estimate of the data distribution using a training set of samples from the p_data distribution.

2.4.2 The discriminator

The Discriminator (D) is a binary classifier that categorises its input data: given data produced by the generator or drawn from the training set, it attempts to predict the category (fake or real). The discriminative algorithm returns probabilities that serve as feedback to the generator. The general architecture of GANs is depicted in Figure 3.

Figure 3.

General architecture of a GAN [7].
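To make the adversarial update concrete, the following is a minimal, illustrative PyTorch sketch of one GAN training step. The network sizes, optimiser settings, and the non-saturating form of the generator loss are assumptions for exposition, not the chapter's prescribed setup.

# Minimal GAN training step (sketch; sizes and hyperparameters are assumptions).
import torch
import torch.nn as nn

z_dim = 100
G = nn.Sequential(nn.Linear(z_dim, 256), nn.ReLU(),
                  nn.Linear(256, 784), nn.Tanh())
D = nn.Sequential(nn.Linear(784, 256), nn.LeakyReLU(0.2),
                  nn.Linear(256, 1), nn.Sigmoid())
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCELoss()

def train_step(real):                       # real: (batch, 784) tensor in [-1, 1]
    batch = real.size(0)
    ones, zeros = torch.ones(batch, 1), torch.zeros(batch, 1)
    # Discriminator: maximise log D(x) + log(1 - D(G(z)))
    fake = G(torch.randn(batch, z_dim)).detach()   # detach: no grad into G here
    loss_d = bce(D(real), ones) + bce(D(fake), zeros)
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()
    # Generator: non-saturating loss, maximise log D(G(z))
    fake = G(torch.randn(batch, z_dim))
    loss_g = bce(D(fake), ones)
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
    return loss_d.item(), loss_g.item()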


3. Generative adversarial networks techniques

This section discusses different GAN variants that have been used in the literature in recent years and the advancements made in some of them. Various types of GANs are reviewed together with their applications in image regeneration.

3.1 Deep convolutional generative adversarial networks (DCGANs)

Deep Convolutional GAN (DCGAN) is a class of GANs whose G-network constructs images from d-dimensional vectors using deconvolution (transposed convolution) layers [8]. The D-network has the same structure as a traditional CNN and distinguishes whether its input is a real image or one produced by G [9]. The training of DCGAN is represented by Eqs. (1)-(3).

If $x = x_{data}$, $D(x) \to 1$:

$\max_D V(D, G) = \mathbb{E}_{x \sim p_{data}(x)}[\log D(x)]$  (1)

If $x = G(z)$, $D(x) \to 0$ for $D$ and $D(x) \to 1$ for $G$:

$\min_G V(D, G) = \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]$  (2)

$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{data}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]$  (3)

where $p_{data}$ is the real data distribution, $p_z$ is the random noise distribution, $x$ is the input data, $z$ is the d-dimensional noise vector, $D(x)$ is the probability that the input was drawn from $p_{data}$ rather than from the generator, and $G$ is trained on $\log(1 - D(G(z)))$. As a result, $D$ is optimised to maximise and $G$ to minimise $V(D, G)$, as in Eqs. (4) and (5), respectively.

$D_G^* = \arg\max_D V(G, D)$  (4)

$G^* = \arg\min_G V(G, D_G^*)$  (5)

From Eq. (3), G captures the data distribution and produces samples resembling the actual training data from the noise input z.
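As an illustration of the G-network described above, the sketch below builds a DCGAN-style generator in PyTorch that upsamples a d-dimensional noise vector into a 64 x 64 RGB image. The layer widths follow the commonly used DCGAN recipe and are assumptions, not values fixed by Eqs. (1)-(3).

# DCGAN-style generator sketch: input is noise of shape (batch, z_dim, 1, 1).
import torch.nn as nn

def dcgan_generator(z_dim=100, feat=64):
    return nn.Sequential(
        nn.ConvTranspose2d(z_dim, feat * 8, 4, 1, 0, bias=False),    # 1x1 -> 4x4
        nn.BatchNorm2d(feat * 8), nn.ReLU(True),
        nn.ConvTranspose2d(feat * 8, feat * 4, 4, 2, 1, bias=False), # -> 8x8
        nn.BatchNorm2d(feat * 4), nn.ReLU(True),
        nn.ConvTranspose2d(feat * 4, feat * 2, 4, 2, 1, bias=False), # -> 16x16
        nn.BatchNorm2d(feat * 2), nn.ReLU(True),
        nn.ConvTranspose2d(feat * 2, feat, 4, 2, 1, bias=False),     # -> 32x32
        nn.BatchNorm2d(feat), nn.ReLU(True),
        nn.ConvTranspose2d(feat, 3, 4, 2, 1, bias=False),            # -> 64x64 RGB
        nn.Tanh(),                                                   # outputs in [-1, 1]
    )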

3.2 Conditional generative adversarial nets (cGANs)

Conditional Generative Adversarial Nets (cGANs) use auxiliary information, such as a class label y, by concatenating it to the inputs fed into both models (the generator G and the discriminator D). The minimax objective function is modified as in Eq. (6).

$\min_G \max_D F(D, G) = \mathbb{E}_x[\log D(x \mid y)] + \mathbb{E}_z[\log(1 - D(G(z \mid y)))]$  (6)

where $\mathbb{E}_x$ denotes the expectation over real data samples, $D(x \mid y)$ represents the likelihood that the real sample x is real given y, $G(z \mid y)$ is the generator output for the random noise z introduced to the image samples, $D(G(z \mid y))$ is the discriminator's probability estimate that the fake generated sample is real, and $\mathbb{E}_z$ denotes the expectation over the random input to the generator.
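A minimal PyTorch sketch of the conditioning mechanism of Eq. (6) is given below: the label y is one-hot encoded and concatenated to both the generator's noise input and the discriminator's data input. Layer sizes and the use of one-hot encoding are illustrative assumptions.

# cGAN conditioning sketch: label y is concatenated to both model inputs.
import torch
import torch.nn as nn

n_classes, z_dim, x_dim = 10, 100, 784
G = nn.Sequential(nn.Linear(z_dim + n_classes, 256), nn.ReLU(),
                  nn.Linear(256, x_dim), nn.Tanh())
D = nn.Sequential(nn.Linear(x_dim + n_classes, 256), nn.LeakyReLU(0.2),
                  nn.Linear(256, 1), nn.Sigmoid())

def conditioned(z, x, y):
    y_onehot = nn.functional.one_hot(y, n_classes).float()
    fake = G(torch.cat([z, y_onehot], dim=1))      # G(z | y)
    score = D(torch.cat([x, y_onehot], dim=1))     # D(x | y)
    return fake, score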

3.2.1 Advantage

  • Conditioning on a label allows the model to generate a specified target, and convergence is faster.

3.2.2 Disadvantage

  • Greater requirements on the dataset: the data must carry tags or labels.

3.3 Least squares generative adversarial networks (LSGANs)

LSGAN is utilised in applications that generate samples representing real-world data [10]. LSGAN has two advantages over GAN. First, LSGAN can generate better-quality images than traditional GANs. Second, LSGAN is more stable during the learning process [6]. Standard GAN learning is unstable, which complicates training in practice.

The study conducted in [11] shows that GAN instability during the learning process is often caused by the objective function: minimising the objective leads to vanishing gradients, which makes generator updates more difficult. LSGAN overcomes this hurdle because samples are penalised according to their distance from the decision boundary, which provides additional gradients for updating the generator. By comparison, the instability of GANs during learning stems from the mode-seeking behaviour of the objective function, and LSGANs exhibit less mode-seeking behaviour [12]. The LSGAN cost functions are shown in Eqs. (7) and (8), respectively.

$\min_D V_{LSGAN}(D) = \frac{1}{2}\mathbb{E}_{x \sim p_{data}}[(D(x) - 1)^2] + \frac{1}{2}\mathbb{E}_{z \sim p_z}[(D(G(z)))^2]$  (7)

$\min_G V_{LSGAN}(G) = \frac{1}{2}\mathbb{E}_{z \sim p_z}[(D(G(z)) - 1)^2]$  (8)

The interplay of the discriminator and generator in the LSGAN model allows the regeneration of data similar to the input data [13].
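The squared losses of Eqs. (7) and (8) translate directly into code. The sketch below assumes the discriminator outputs raw (unbounded) scores rather than sigmoid probabilities.

# LSGAN losses (sketch): squared error against target labels 1 (real) and 0 (fake).
import torch

def lsgan_d_loss(d_real, d_fake):
    # 1/2 E[(D(x) - 1)^2] + 1/2 E[D(G(z))^2], Eq. (7)
    return 0.5 * ((d_real - 1) ** 2).mean() + 0.5 * (d_fake ** 2).mean()

def lsgan_g_loss(d_fake):
    # 1/2 E[(D(G(z)) - 1)^2], Eq. (8)
    return 0.5 * ((d_fake - 1) ** 2).mean()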

3.3.1 Advantages

  • LSGAN improves the first-order GAN loss function by replacing the original cross-entropy loss function with the squared loss function.

  • The model improves the quality of the image outputs and speeds up training convergence.

3.3.2 Disadvantage

  • A disadvantage of LSGAN is that it reduces sample diversity due to excessive penalties for outliers.

3.4 Wasserstein generative adversarial networks (WGANs)

WGANs address the issue of instability in GAN training. This problem is believed to be related to undesirably small gradients in the GAN discriminator function [14]. WGANs are often utilised to generate synthetic data, for example to monitor minority defect growth, and the synthetic signal is used to balance the training data set.

To estimate the Wasserstein distance, one needs to find a K-Lipschitz function, which works the same way as the Discriminator (D) except that it omits the sigmoid and produces a scalar output. The general difference between GANs and WGANs is that the discriminator is replaced by a critic, along with a change in the cost function. The WGAN critic and generator cost functions are given in Eqs. (9) and (10), respectively.

$W(P_r, P_g) = \inf_{\gamma \in \Pi(P_r, P_g)} \mathbb{E}_{(x, y) \sim \gamma}[\lVert x - y \rVert]$  (9)

$\max_{w \in \mathcal{W}} \mathbb{E}_{x \sim P_r}[f_w(x)] - \mathbb{E}_{z \sim p_z}[f_w(G(z))]$  (10)

where $\Pi(P_r, P_g)$ denotes the set of all joint distributions $\gamma(x, y)$ whose marginals are respectively $P_r$ and $P_g$, and $\{f_w\}_{w \in \mathcal{W}}$ is a family of K-Lipschitz functions parameterised by $w$. The critic D optimises these parameters to approximate the Wasserstein distance. The overall WGAN objective function is given in Eqs. (11) and (12):

$\min_G \max_{w \in \mathcal{W}} \mathbb{E}_{x \sim P_r}[f_w(x)] - \mathbb{E}_{z \sim p_z}[f_w(G(z))]$  (11)

or, equivalently,

$\min_G \max_D \mathbb{E}_{x \sim P_r}[D(x)] - \mathbb{E}_{z \sim p_z}[D(G(z))]$  (12)
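A brief sketch of the critic objective of Eq. (11) follows, together with the weight clipping used in the original WGAN to keep f_w approximately K-Lipschitz; the clipping constant 0.01 is the commonly quoted default and an assumption here.

# WGAN critic sketch: maximise E[f_w(x)] - E[f_w(G(z))], i.e. minimise the negative.
import torch

def critic_loss(f_real, f_fake):
    return -(f_real.mean() - f_fake.mean())

def clip_weights(critic, c=0.01):
    # Crude Lipschitz constraint from the original WGAN: clamp every parameter.
    for p in critic.parameters():
        p.data.clamp_(-c, c)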

3.4.1 Advantages

  • Due to WGAN's efficient network architecture, the issue of training instability is resolved.

  • The critic loss correlates with the quality of the generated images: the lower the loss, the better the samples.

3.4.2 Disadvantage

  • Training a WGAN requires a longer time.

3.5 Cycle GAN

The CycleGAN framework enables image-to-image translation without paired training data [15]. CycleGAN learns a mapping from an input domain (P) to an output domain (Q) with the assistance of a cycle-consistency loss.

There are two mapping functions in the CycleGAN framework: G: P → Q and F: Q → P. The discriminator D_Q provides feedback on the generator G, which translates P into new synthetic output images. There are two cyclic losses: one in the forward direction, stated as p → G(p) → F(G(p)) ≈ p, and another in the backward direction, expressed as q → F(q) → G(F(q)) ≈ q. The adversarial loss matches the distribution of the synthetic images with that of the target images in the target domain, while the cycle-consistency loss prevents the learned mappings G and F from contradicting each other. The adversarial loss and the cycle-consistency loss are given in Eqs. (13) and (14).

$\mathcal{L}_{GAN}(G, D_Q, P, Q) = \mathbb{E}_{q \sim p_{data}(q)}[\log D_Q(q)] + \mathbb{E}_{p \sim p_{data}(p)}[\log(1 - D_Q(G(p)))]$  (13)

where G(p) represents the generated image and D_Q aims to distinguish synthetic images from real images in domain Q. The cycle-consistency loss is given in Eq. (14).

$\mathcal{L}_{cyc}(G, F) = \mathbb{E}_{p \sim p_{data}(p)}[\lVert F(G(p)) - p \rVert_1] + \mathbb{E}_{q \sim p_{data}(q)}[\lVert G(F(q)) - q \rVert_1]$  (14)

The reconstructed image F(G(p)) is, therefore, very similar to the input image p. Hence, the objective function combining the adversarial and cycle-consistency losses is given by Eqs. (15)-(17).

$\mathcal{L}(G, F, D_P, D_Q) = \mathcal{L}_{GAN}(G, D_Q, P, Q) + \mathcal{L}_{GAN}(F, D_P, Q, P) + \lambda \mathcal{L}_{cyc}(G, F)$  (15)

$G^*, F^* = \arg\min_{G, F} \max_{D_P, D_Q} \mathcal{L}(G, F, D_P, D_Q)$  (16)

$Loss_{compute} = Loss_{adv} + \lambda\, Loss_{cyc}$  (17)

where the adversarial loss is represented as $Loss_{adv}$, the cyclic loss is represented as $Loss_{cyc}$, and $\lambda$ controls the relative weight of the two objectives.

Model generalisation has been demonstrated in a variety of broad-spectrum applications without paired data, with CycleGAN showing excellent results. CycleGAN excels at texture and colour variation, but little progress has been made on geometric variation [15].
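The cycle-consistency term of Eq. (14) can be sketched as follows, where G and F are the two generators and the weighting factor mirrors λ in Eq. (15); its value of 10 is an assumption taken from common practice.

# Cycle-consistency loss sketch: L1 penalty forcing F(G(p)) ~ p and G(F(q)) ~ q.
import torch

def cycle_loss(G, F, p, q, lambda_cyc=10.0):
    forward = torch.nn.functional.l1_loss(F(G(p)), p)    # p -> G(p) -> F(G(p)) ~ p
    backward = torch.nn.functional.l1_loss(G(F(q)), q)   # q -> F(q) -> G(F(q)) ~ q
    return lambda_cyc * (forward + backward)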

3.5.1 Advantages

  • Low dataset requirements, since paired training data are not needed.

  • It enables changing the style between two sets of images.

3.5.2 Disadvantages

  • The resulting target image quality is lower than pix2pix.

  • It is only effective for texture or colour transformations.

3.6 StyleGAN

StyleGAN is a type of generative adversarial network that combines PGGAN with neural style transfer technology [16]. StyleGAN gained attention by creating high-resolution images with multiple levels of control, from fine image details to overall structure [17]. StyleGAN can potentially resolve spatial entanglement issues through Adaptive Instance Normalisation (AdaIN), where $y_{s,i}$ and $y_{b,i}$ are the style scale and bias applied to feature map $x_i$. AdaIN is shown in Eq. (18).

$AdaIN(x_i, y) = y_{s,i}\frac{x_i - \mu(x_i)}{\sigma(x_i)} + y_{b,i}$  (18)

where $x_i$ denotes the feature map, $y_{b,i}$ and $y_{s,i}$ are the style vectors, $\mu(x_i)$ is the mean, and $\sigma(x_i)$ is the standard deviation of the feature map.
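Eq. (18) corresponds to the following sketch, in which each feature map is normalised to zero mean and unit standard deviation and then rescaled and shifted by per-channel style parameters; the tensor shapes are illustrative assumptions.

# AdaIN sketch per Eq. (18).
import torch

def adain(x, y_s, y_b, eps=1e-5):
    # x: (batch, channels, H, W); y_s, y_b: (batch, channels)
    mu = x.mean(dim=(2, 3), keepdim=True)
    sigma = x.std(dim=(2, 3), keepdim=True) + eps   # eps avoids division by zero
    normalised = (x - mu) / sigma
    return y_s[:, :, None, None] * normalised + y_b[:, :, None, None]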

3.6.1 Advantage

  • StyleGAN gains fine control over the generated image through the latent vector w, which changes the image's style at various levels.

3.7 Big GAN (BigGAN)

Another variant of GANs is BigGAN [18], which is known for its large-scale and efficient imaging capabilities, making it one of the best GAN variant models available. This model utilises more parameters to train large networks. This yields very detailed results and significantly improves model performance. BigGAN exhibits several key features that demonstrate the high performance of GAN variants, namely the controllability and Inception Score (IS) of the model output.

3.7.1 Advantage

  • BigGAN attempts to further improve performance by scaling up image synthesis capacity. Its evaluation shows improved Fréchet Inception Distance (FID) and Inception Score compared to baseline models.

3.8 Energy-based GAN (EBGAN)

The EBGAN [19] aims to solve the evaluation metric problem by using energy values as a metric. Energy-based models always create a mapping between a single point in the input space and a scalar value (also called “energy”).

The discriminator is based on an energy function that yields low energy for real data (the desired configuration) and high energy for synthetic data (the undesired configuration). The energy function therefore differs from the probability-based discriminator of the standard GAN. In addition, EBGAN uses a separate loss function for the generator. Considering both model functions, the discriminator tries to assign low energies to real data and high energies to generated data, while the generator tries to produce samples to which the discriminator assigns low energies.

3.9 InformationGAN (InfoGAN)

The InfoGAN is built upon the GAN framework by generating interpretable representations for the latent variables in the model [20]. The key idea is to split latent variables into a set c of interpretable ones and a source of uninterpretable noise z. Interpretability is encouraged by adding an extra term in the original GAN objective function capturing the mutual information between the interpretable variables c and the output from the generator. More precisely, the InfoGAN minimax optimisation is defined as in Eq. (19).

$\min_G \max_D V_I(D, G) = V(D, G) - \lambda I(c; G(z, c))$  (19)

where G is the generator, D is the discriminator, z is the noise, c encodes the salient latent codes, and I denotes the mutual information. The loss function for InfoGAN is defined as follows:

$\min_G \max_D L(D, G) - \lambda I(c; G(z, c))$  (20)

where $\lambda I(c; G(z, c))$ represents the mutual-information term and c denotes the latent codes of the generated samples.


4. Quantitative performance evaluation metrics

Another challenging part of training GANs is the performance evaluation, that is, to determine how the model fits the data distribution. There have been great advances in theory and application, and many GAN variants are now available. However, relatively little effort has been devoted to evaluating GANs, leaving some limitations in the quantitative evaluation methods. In this section, we introduce significant and common evaluation metrics used in evaluating GANs performance.

4.1 Fréchet inception distance (FID)

This is defined as the Wasserstein-2 distance between multivariate Gaussians fitted to the data embedded in a feature space [21], as in Eq. (21).

$FID(r, g) = \lVert \mu_r - \mu_g \rVert_2^2 + \mathrm{Tr}\big(\Sigma_r + \Sigma_g - 2(\Sigma_r \Sigma_g)^{1/2}\big)$  (21)

where $(\mu_r, \Sigma_r)$ and $(\mu_g, \Sigma_g)$ denote the means and covariances of the real and generated data distributions, respectively.
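As a sketch, Eq. (21) can be computed from pre-extracted feature embeddings (e.g., Inception pool features); feeding images through the Inception network is assumed to happen elsewhere.

# FID sketch per Eq. (21); feat_r, feat_g are (n, d) feature arrays.
import numpy as np
from scipy import linalg

def fid(feat_r, feat_g):
    mu_r, mu_g = feat_r.mean(axis=0), feat_g.mean(axis=0)
    cov_r = np.cov(feat_r, rowvar=False)
    cov_g = np.cov(feat_g, rowvar=False)
    covmean = linalg.sqrtm(cov_r @ cov_g).real      # (Sigma_r Sigma_g)^(1/2)
    return float(np.sum((mu_r - mu_g) ** 2)
                 + np.trace(cov_r + cov_g - 2 * covmean))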

4.2 Inception score (IS)

This is used to measure the quality of images generated by GANs. It is done by measuring the average Kullback-Leibler (KL) divergence between the conditional label distribution p(y|x) and the marginal label distribution p(y) obtained over all samples [21], as shown in Eq. (22).

$IS = \exp\big(\mathbb{E}_x[KL(p(y \mid x) \,\|\, p(y))]\big) = \exp\big(H(y) - \mathbb{E}_x[H(y \mid x)]\big)$  (22)

where $p(y \mid x)$ is the conditional label distribution of image x obtained from a trained model, and the marginal distribution is represented as $p(y)$.
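A sketch of Eq. (22) over an array of classifier predictions follows; p is assumed to hold the softmax outputs p(y|x) of a pre-trained Inception classifier for n generated samples.

# Inception Score sketch per Eq. (22); p is an (n, classes) softmax array.
import numpy as np

def inception_score(p, eps=1e-12):
    p_y = p.mean(axis=0, keepdims=True)             # marginal p(y)
    kl = np.sum(p * (np.log(p + eps) - np.log(p_y + eps)), axis=1)
    return float(np.exp(kl.mean()))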

4.3 Maximum mean discrepancy (MMD)

This is a distance on the space of probability measures. It is computed from samples drawn from each of the two distributions, $P_r$ and $P_g$, and can be used for two-sample testing, as shown in Eq. (23).

$M_k(P_r, P_g) = \mathbb{E}_{x, x' \sim P_r}[k(x, x')] - 2\,\mathbb{E}_{x \sim P_r,\, y \sim P_g}[k(x, y)] + \mathbb{E}_{y, y' \sim P_g}[k(y, y')]$  (23)

where $P_r$ and $P_g$ are the two sample distributions, $k$ is a kernel function, and $x$ and $y$ are samples (e.g., input images).
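The estimator of Eq. (23) with a Gaussian RBF kernel can be sketched as follows; the kernel choice and bandwidth are assumptions, and the plug-in estimate shown is the simple biased version.

# Squared MMD sketch per Eq. (23); x: (n, d) and y: (m, d) sample arrays.
import numpy as np

def rbf(a, b, sigma=1.0):
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)   # pairwise squared dists
    return np.exp(-d2 / (2 * sigma ** 2))

def mmd2(x, y, sigma=1.0):
    return float(rbf(x, x, sigma).mean()
                 - 2 * rbf(x, y, sigma).mean()
                 + rbf(y, y, sigma).mean())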

4.4 Kernel inception distance (KID)

The KID measures the dissimilarity between two probability distributions (Pr and Pg) using samples drawn independently from each distribution.

4.5 Peak signal-to-noise ratio (PSNR)

This metric evaluates the ratio between the maximum possible signal power and the power of the noise that corrupts an image, comparing a generated or degraded image K with the corresponding real image I [21]. The PSNR is computed as in Eqs. (24)-(26).

$PSNR = 10 \log_{10}\frac{(L - 1)^2}{MSE} = 20 \log_{10}\frac{L - 1}{RMSE}$  (24)

where MSE is the mean squared error and L is the maximum number of possible intensity levels in an image.

$PSNR(I, K) = 10 \log_{10}\frac{MAX_I^2}{MSE(I, K)} = 20 \log_{10}(MAX_I) - 10 \log_{10}(MSE(I, K))$  (25)

where $MSE(I, K) = \frac{1}{mn}\sum_{i=0}^{m-1}\sum_{j=0}^{n-1}[I(i, j) - K(i, j)]^2$ is the mean squared error and $MAX_I$ is the maximum possible pixel value.

$MSE = \frac{1}{mn}\sum_{i=0}^{m-1}\sum_{j=0}^{n-1}(O_{ij} - D_{ij})^2$  (26)

where O represents the original sample image matrix, D represents the degraded image data matrix, m and n are the numbers of pixel rows and columns, and i and j are the row and column indices.
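Eqs. (24)-(26) reduce to a few lines of code; the sketch below assumes 8-bit images, so the peak intensity is 255.

# PSNR sketch per Eqs. (24)-(26) for 8-bit images.
import numpy as np

def psnr(o, d, peak=255.0):
    mse = np.mean((o.astype(np.float64) - d.astype(np.float64)) ** 2)
    if mse == 0:
        return float('inf')                 # identical images
    return float(10 * np.log10(peak ** 2 / mse))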

4.6 Mode score (MS)

Mode score adequately reflects the variety and visual quality of the generated image. This is shown in Eq. (27)

$MS = \exp\big(\mathbb{E}_x[KL(p(y \mid x) \,\|\, p(y_{train}))] - KL(p(y) \,\|\, p(y_{train}))\big)$  (27)

where p(ytrain) is the empirical distribution of labels computed from training data.

4.7 Structural similarity index measure (SSIM)

The structural similarity index measure (SSIM) is used for predicting the similarity between two images. The SSIM is shown from Eqs. (28)(30).

$SSIM(f, g) = l(f, g)\, c(f, g)\, s(f, g)$  (28)

$l(f, g) = \frac{2\mu_f \mu_g + C_1}{\mu_f^2 + \mu_g^2 + C_1}, \quad c(f, g) = \frac{2\sigma_f \sigma_g + C_2}{\sigma_f^2 + \sigma_g^2 + C_2}$  (29)

$s(f, g) = \frac{\sigma_{fg} + C_3}{\sigma_f \sigma_g + C_3}$  (30)

where l(f,g) is the luminance comparison function that measures the closeness of the two images' means $\mu_f$ and $\mu_g$, c(f,g) is the contrast comparison function that compares the images' contrasts, and s(f,g) measures the correlation (structure) between the two images f and g.
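For illustration, a global-statistics version of Eqs. (28)-(30) is sketched below. It uses the common choice C3 = C2/2, under which the product l(f,g)·c(f,g)·s(f,g) collapses to a single expression; production implementations compute SSIM over local windows rather than whole images.

# Global SSIM sketch per Eqs. (28)-(30), with C3 = C2/2 folding l*c*s together.
import numpy as np

def ssim(f, g, peak=255.0, k1=0.01, k2=0.03):
    c1, c2 = (k1 * peak) ** 2, (k2 * peak) ** 2
    mu_f, mu_g = f.mean(), g.mean()
    var_f, var_g = f.var(), g.var()
    cov_fg = ((f - mu_f) * (g - mu_g)).mean()
    return float(((2 * mu_f * mu_g + c1) * (2 * cov_fg + c2))
                 / ((mu_f**2 + mu_g**2 + c1) * (var_f + var_g + c2)))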


5. Applications of GANs

GANs are currently used in various fields because they can be applied to many scenarios. Applications ranging from image applications to speech applications, art applications, and data generation in areas such as computer vision (CV) and artificial intelligence (AI) are some of the most common scenarios in which GANs are used [22].

Image applications implemented with GANs include image/video synthesis, image/video translation, and high-resolution editing [23]. Speech applications implemented with GANs include speech-to-text and text-to-speech applications and character generation. In art applications, GANs are mainly used for music generation. Object recognition is the primary way GANs are used in computer vision applications. Furthermore, GANs have proven applicable in many scenarios in the medical field as well.

GANs have been successfully applied in many areas of computer vision and image processing, which include video processing, image composition and manipulation, and image super-resolution.

5.1 Computer vision and image processing

5.1.1 Image synthesis and manipulation

The original GAN proposal was essentially aimed at solving the image synthesis problem. Later variants of the model implement the same theory to generate arbitrary data samples from an input distribution, making GANs one of the most versatile models for image synthesis. Variants such as cGAN, WGAN, WGAN-GP, LAPGAN, and InfoGAN all specialise in this. Similar models are also used in image processing.

5.1.2 Image translation

Also known as style mapping, this is used to map the styles of one image to another. This is achieved by using a model to transform the image into another format while preserving the aesthetic appearance of the image. This includes, for example, converting portraits of real people into pencil sketches. The most used variants to achieve this goal are CycleGAN, TripleGAN [24], and DiscoGAN [25].

5.1.3 Image super resolution (SR)

This is one of the difficult applications in computer vision: a means of deriving a high-quality, upscaled image from a low-quality image by adding fine detail in much more sophisticated ways than simple feature enhancement. A popular GAN model used to realise this type of task is the Super-Resolution GAN (SRGAN) [26].

SRGAN can magnify the image by a factor of 4 compared to the input image. Another GAN model improved over SRGAN is the Enhanced SRGAN (ESRGAN) [27]. ESRGAN uses relativistic concepts to create a GAN containing discriminators that can predict the relative realness of images. In addition, SR techniques can also be used for image reconstruction.

5.1.4 Video processing applications

A model implemented for video generation includes a GAN that combines a stationary background with a set of moving objects. However, this process is challenging because predicting the motion of each object in every frame is difficult. Studies on several GAN variants suggest new ideas for generating video images, such as predicting future frames using DNNs [5] and the Disentangled-Representation Net (DrNET) [28].

Over the years, models have been proposed to learn disentangled representations in videos. Another work, MoCoGAN, which synthesises new videos based on input videos, was proposed in [9]. This technique builds on GAN-based Video Generation (VGAN) with two generators, each performing a separate function: one generator synthesises a moving foreground with moving objects, and the other generates a stationary background. Another GAN variant, the Dual Video Discriminator GAN (DVDGAN) [29], generates high-definition video based on the BigGAN model.

5.2 Sequential data

Sequential data such as natural language, music, speech, and time series constitute another application area where GANs have been successful for data manipulation.

5.2.1 Natural language processing (NLP)

The Information Retrieval GAN (IRGAN) was proposed for information retrieval (IR) [30]. Adversarial learning has also been used to generate neural dialogue [31]. GANs are further used for text generation [32] and speech processing [31] to produce high-quality speech samples, and have been applied to knowledge-graph embedding. Adversarial Reward Learning (AREL) was proposed in [33] for visual storytelling, and DSGAN was proposed in [34] for distant-supervision relation extraction. ScratchGAN [35] trains language GANs from scratch, without maximum-likelihood pre-training. MirrorGAN [36] learns text-to-image generation by redescription, and the text-conditioned auxiliary classifier GAN (TAC-GAN) [37] has also been proposed for text-to-image generation and used in image-to-text conversion (captioning).

5.2.2 Music generation

In music generation, the continuous RNN-GAN (C-RNN-GAN) [38], the objective-reinforced GAN (ORGAN) [39], and SeqGAN [39] have been applied.

5.2.3 Speech and audio

GANs have been successfully applied to speech analysis in speech synthesis [40], enhancement [41], and recognition [42].

5.3 Other applications

5.3.1 Medical field

GANs are also applied in health fields for DNA generation [43, 44], drug development, medical imaging in dental restorations [45], and the creation of individualised patient records and multi-label physician referrals [46].

5.3.2 Data science

In data science, GANs are also useful for data generation [47, 48], neural network generation [49], data augmentation [50], spatial representation learning [51], network embedding, and heterogeneous information networks. GANs can be used in other areas, including detection, privacy protection, etc.


6. Meta-analysis of GANs in different applications

6.1 Analysis using comprehensive meta-analysis (CMA)

The cumulative number of papers published on GANs from 2014 to 2020 is shown in Figure 4.

Figure 4.

Cumulative number of GAN-related paper publications per year from 2014 to 2020 [52].

Figure 4 shows the rapid increase in the adoption of GANs in different applications, reflected in the growing number of articles published on GANs from 2014 to 2020. The first two journal papers on GANs were published in 2014. In 2016, 48 papers on GANs were published. Over 200 papers were published in 2017, and by 2020 more than 500 GAN papers had been published. We also reviewed the number of GAN-related papers from 2014 to 2022, as shown in Figure 5.

Figure 5.

Cumulative number of GAN-related papers reviewed.

Figure 5 shows the cumulative total of 260 GAN-related papers reviewed for this study, with 2022 contributing the most papers. From the papers reviewed, we also observed applications of GANs in a wide variety of areas, as shown in Figure 6.

Figure 6.

A cumulative number of GAN applications reviewed.

Many studies applied GANs in computer vision (image synthesis and manipulation, image super-resolution; n = 120), followed by sequential data (natural language processing; n = 60), health (medical; n = 50), and agriculture (n = 30). Computer vision is thus the most frequently applied area in the reviewed papers, followed by sequential data and health, with agriculture the least represented. The effect sizes of several studies on GAN applications are compared in Figure 7. Furthermore, the observed frequency of evaluation metrics is shown in Figure 8.

Figure 7.

The mean effect size for comparison of the various studies.

Figure 8.

Cumulative number of GAN evaluation metrics reviewed.

From Figure 8, one can observe that SSIM is the most widely used metric in the reviewed publications, followed by KAPPA, FID, and PSNR.

6.2 Analysis using comprehensive meta-analysis software

For the analysis, twenty papers were selected from the 260 publications reviewed, using different methods for combining and summarising findings, such as effect size, confidence interval (CI), and a random-effects model for the statistical analysis, as shown in Figure 9.

Figure 9.

Random effect model forest plot for comparison of the various studies.

From the forest plot in Figure 9, one can see the twenty studies with their respective means and confidence intervals (95% CI). The black markers represent the effect size of each study; the larger the box, the greater the study's weight. The red diamond represents the pooled estimate across the twenty studies. One can see that the red diamond lies on the negative side of the line of no effect. This is consistent with the 95% confidence interval and the p-value >0.05. Figure 7 shows that the mean effect size is −0.537 with a 95% confidence interval of −1.205 to 0.132.

From Figure 7, the mean effect size in the universe of comparable studies could fall anywhere in this interval. The Z-value tests the null hypothesis that the mean effect size is zero. The Z-value is −1.573 with p = 0.116, using a criterion alpha of 0.050. Assuming that the true effects are normally distributed (in raw units), we can estimate that the prediction interval is −3.403 to 2.330; the true effect size in 95% of all comparable populations falls in this interval.

Additionally, when a meta-analysis captures all the significant studies, the funnel plot is expected to be symmetric: the studies should be distributed equally on either side of the overall effect. Therefore, if the funnel plot is asymmetric, with a relatively high number of small studies (representing a large effect size) falling toward the right of the mean effect and relatively few falling toward the left, we are concerned that such left-hand studies may exist and are missing from the analysis. The results of the publication-bias assessment are shown in Figure 10.

Figure 10.

Publication bias funnel plot for comparison of various studies.

To analyse these studies, the trim-and-fill method developed by Duval and Tweedie was used, which evaluates the possibility of missing studies, imputes them, and recomputes the effect. The method trims asymmetric studies from the right-hand side to locate the unbiased effect (in an iterative procedure). Under the fixed-effect model, the combined studies' point estimate and 95% confidence interval are −0.64472 (−0.85850, −0.43095). Under the random-effects model, the estimate is −0.53660 (−1.20517, 0.13198). Using trim and fill, these values are unchanged.


7. Open research problems

Although GANs have provided unprecedented opportunities for computer vision, particularly in image classification tasks, and for improving model performance in different applications, there are still open research problems for GANs.

7.1 GANs optimisation

One of the challenges with GANs is the difficulty of optimising them. Although many approaches have been proposed to mitigate this issue, such as new architectures, objective functions, and training strategies, none guarantees optimally converged GANs.

7.2 GANs for discrete data

Samples generated by GANs must be differentiable with respect to the generator's parameters, which makes it difficult for GANs to generate discrete data directly. Finding a solution to this problem is crucial, as it would help unlock the potential of GANs for NLP. Three ways have been suggested to address this issue: using the Gumbel-Softmax [53], the concrete distribution [54], or the REINFORCE algorithm [55] to learn samples that can be transformed into discrete values.

7.3 Estimation data uncertainty

In general, uncertainty estimates decrease with more data. However, specifying the uncertainty of the distribution of generated samples remains difficult in GANs.

7.4 Data selection

Current data selection strategies include random and dense sampling. Random sampling does not consider the adaptability of GANs to data density, while dense sampling reduces task difficulty at the cost of data diversity.

7.5 Loss of training conditions

Another direction for future study is the appropriate utilisation of loss functions. Identifying the appropriate loss function and its applicability to improving GAN performance remains unclear. From an empirical point of view, utilising appropriate loss functions can assist in constructing reliable GAN objective functions for training.


8. Conclusions

GANs have gained considerable attention for the generation of realistic images and for modern real-world applications. This paper presents a comprehensive review of various GANs that have been used in different applications. In this review, generative models, GAN variant methods, applications, performance metrics, and open research issues were investigated in detail. In addition, we conducted a meta-analysis on GANs.

Although GANs are widely used for generating realistic images, many challenges affect GAN performance, such as difficulties in training due to mode collapse, non-convergence and instability, the inability of GANs to produce discrete data directly, data selection, and the loss of training conditions. An effective way to address these issues is to select an appropriate GAN network architecture with suitable objective functions and parameter optimisation techniques. Although many GAN variants have provided solutions to these challenging issues, some open issues remain unresolved.

We presented the current progress of GANs by reviewing different GAN variant methods. Furthermore, the images generated using the GAN methods discussed in this review have the potential to address some computer vision application problems while preserving the underlying distribution. It is worth noting that images generated by GANs might not replace real datasets; to improve the performance of a deep learning model, a combination of real and generated images can be used. Hence, one can see that the future of GANs is promising, and there are many opportunities for further research and applications in many fields. We believe this review will help readers gain a thorough understanding of the GAN research area.


Acknowledgments

The authors acknowledge the contributions and financial support of the Department of Computer Systems Engineering, Tshwane University of Technology, South Africa.

References

  1. Salehi P, Chalechale A, Taghizadeh M. Generative Adversarial Networks (GANs): An Overview of Theoretical Model, Evaluation Metrics, and Recent Developments. IEEE Transactions on Visualization and Computer Graphics. 2018;24(6):216-221
  2. Yinka-Banjo C, Ugot O-A. A review of generative adversarial networks and its application in cybersecurity. Artificial Intelligence Review. 2020;53:1721-1736
  3. Goodfellow I et al. Generative Adversarial Networks. Advances in Neural Information Processing Systems. Vol. 27. 2014. pp. 2672-2680
  4. Krizhevsky A, Sutskever I, Hinton G. ImageNet classification with deep convolutional neural networks. Neural Information Processing Systems. 2012;25:1097-1105. DOI: 10.1145/3065386
  5. Villegas R, Yang J, Hong S, Lin X, Lee H. Decomposing motion and content for natural video sequence prediction. In: Proceedings of 5th International Conference on Learning Representations (ICLR 2017). 2017. pp. 1-22
  6. Sharma N, Sharma R, Jindal N. Comparative analysis of CycleGAN and AttentionGAN on face aging application. Indian Academy of Sciences. 2022;47(33):1-20
  7. Baidoo-anu D, Owusu Ansah L. Education in the era of generative Artificial Intelligence (AI): Understanding the potential benefits of ChatGPT in promoting teaching and learning. Journal of AI. 2023;7(1):52-62
  8. Radford A, Metz L, Chintala S. Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv. 2020
  9. Tulyakov S, Liu M-Y, Yang X, Kautz J. MoCoGAN: Decomposing motion and content for video generation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018. pp. 1526-1535
  10. Chen X, Duan Y, Houthooft R, Schulman J, Sutskever I, Abbeel P. InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets. In: Lee DD, Sugiyama M, von Luxburg U, Guyon I, Garnett R, editors. NIPS. 2016. pp. 2172-2180
  11. Lipton ZC, Tripathi S. Precise recovery of latent vectors from generative adversarial networks. arXiv. 2020
  12. Dumoulin V, Belghazi I, Poole B, Mastropietro O, Lamb A, Arjovsky M, et al. Adversarially learned inference. arXiv. 2020
  13. Donahue J, Krähenbühl P, Darrell T. Adversarial feature learning. arXiv. 2020
  14. Arjovsky M, Chintala S, Bottou L. Wasserstein GAN. arXiv. 2020
  15. Zhu J-Y, Park T, Isola P, Efros A. Unpaired Image-to-Image Translation Using Cycle-Consistent Adversarial Networks. In: Proceedings of the IEEE International Conference on Computer Vision. 2017. pp. 2242-2251. DOI: 10.1109/ICCV.2017.244
  16. Karras T, Laine S, Aila T. A Style-Based Generator Architecture for Generative Adversarial Networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019. pp. 4396-4405. DOI: 10.1109/CVPR.2019.00453
  17. Park S-W, Ko J-S, Huh J-H, Kim J-C. Review on generative adversarial networks: Focusing on computer vision and its applications. Electronics. 2021;10:1216
  18. Brock A, Donahue J, Simonyan K. Large scale GAN training for high fidelity natural image synthesis. arXiv. 2019
  19. Zhao J, Mathieu M, LeCun Y. Energy-based generative adversarial networks. In: 5th International Conference on Learning Representations (ICLR 2017), Toulon, France. 2017. pp. 1-17
  20. Evtimova K, Drozdov A. Understanding Mutual Information and its Use in InfoGAN. 2021
  21. Vaccari I, Orani V, Paglialonga A, Cambiaso E, Mongelli M. A generative adversarial network (GAN) technique for internet of medical things data. Sensors. 2021;21(3726):1-14
  22. Gui J, Sun Z, Wen Y, Tao D, Ye J. A review on generative adversarial networks: Algorithms, Theory, and Applications. IEEE Transactions on Knowledge and Data Engineering. 2023;35:3313-3332
  23. Wu X, Xu K, Hall P. A survey of image synthesis and editing with generative adversarial networks. Tsinghua Science and Technology. 2017;(3)
  24. Fang H, Deng W, Zhong Y, Hu J. Triple-GAN: Progressive Face Aging with Triple Translation Loss. In: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). 2020. pp. 3500-3509. DOI: 10.1109/CVPRW50498.2020.00410
  25. Kim T, Cha M, Kim H, Lee JK, Kim J. Learning to discover cross-domain relations with generative adversarial networks. In: Proceedings of the 34th International Conference on Machine Learning (ICML'17). Vol. 70. 2017. pp. 1857-1865
  26. Ledig C, Theis L, Huszar F, Caballero J, Cunningham A, Acosta A, et al. Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2017. pp. 105-114. DOI: 10.1109/CVPR.2017.19
  27. Jolicoeur-Martineau A. The relativistic discriminator: A key element missing from standard GAN. arXiv. 2018
  28. Denton E, Birodkar V. Unsupervised learning of disentangled representations from video. In: Proceedings of the 31st International Conference on Neural Information Processing Systems. 2017. pp. 4417-4426
  29. Clark A, Donahue J, Simonyan K. Efficient Video Generation on Complex Datasets. arXiv preprint arXiv:1907.06571. 2019. DOI: 10.48550/arXiv.1907.06571
  30. Wang B, Zhang P, Zhang D. IRGAN: A minimax game for unifying generative and discriminative information retrieval models. In: International ACM SIGIR Conference on Research and Development in Information Retrieval. 2017. pp. 515-524
  31. Yu L, Zhang W, Wang J, Yu Y. SeqGAN: Sequence generative adversarial nets with policy gradient. In: AAAI-17: Thirty-first AAAI Conference on Artificial Intelligence, 4-9 February 2017, San Francisco, California, USA. Vol. 31. 2017. pp. 2852-2858
  32. Lin K, Li D, He X, Zhang Z, Sun M-T. Adversarial ranking for language generation. In: Proceedings of the 31st International Conference on Neural Information Processing Systems (NIPS'17). 2017. pp. 3155-3165
  33. Wang X, Chen W, Wang Y-F, Wang WY. No metrics are perfect: Adversarial reward learning for visual storytelling. In: Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics. 2018. DOI: 10.18653/v1/P18-1083
  34. Qin P, Xu W, Wang WY. DSGAN: Generative adversarial training for distant supervision relation extraction. In: Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2018. DOI: 10.18653/v1/P18-1046
  35. de Masson d'Autume C, Rosca M, Rae J, Mohamed S. Training language GANs from scratch. In: 33rd Conference on Neural Information Processing Systems (NeurIPS 2019). 2019
  36. Qiao T, Zhang J, Xu D, Tao D. MirrorGAN: Learning Text-To-Image Generation by Redescription. In: IEEE Conference on Computer Vision and Pattern Recognition. 2019. pp. 1505-1514. DOI: 10.1109/CVPR.2019.00160
  37. Dash A, Gamboa JCB, Ahmed S, Liwicki M, Afzal MZ. TAC-GAN - text conditioned auxiliary classifier generative adversarial network. arXiv. 2017
  38. Mogren O. C-RNN-GAN: Continuous recurrent neural networks with adversarial training. arXiv. 2016
  39. Lee S-g, Hwang U, Min S, Yoon S. A SeqGAN for polyphonic music generation. arXiv. 2017
  40. Saito Y, Takamichi S, Saruwatari H. Statistical parametric speech synthesis incorporating generative adversarial networks. IEEE/ACM Transactions on Audio, Speech, and Language Processing. 2017;26(1):84-96
  41. Pascual S, Bonafonte A, Serrà J. SEGAN: Speech Enhancement Generative Adversarial Network. In: Proceedings of Interspeech 2017. 2017. pp. 3642-3646. DOI: 10.21437/Interspeech.2017-1428
  42. Donahue C, Li B, Prabhavalkar R. Exploring speech enhancement with generative adversarial networks for robust speech recognition. In: ICASSP 2018 - 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). 2018. pp. 5024-5028. DOI: 10.1109/ICASSP.2018.8462581
  43. Killoran N, Lee LJ, Delong A, Duvenaud D, Frey BJ. Generating and designing DNA with deep generative models. arXiv. 2017
  44. Gupta A, Zou J. Feedback GAN (FBGAN) for DNA: A novel feedback-loop architecture for optimizing protein functions. arXiv. 2018
  45. Hwang J-J, Azernikov S, Efros AA, Yu SX. Learning beyond human expertise with generative models for dental restorations. arXiv. 2018
  46. Tian B, Zhang Y, Chen X, Xing C, Li C. DRGAN: A GAN-Based Framework for Doctor Recommendation in Chinese On-Line QA Communities. In: Database Systems for Advanced Applications. 2019. pp. 444-447. DOI: 10.1007/978-3-030-18590-9_63
  47. Xu D, Wu Y, Yuan S, Zhang L, Wu X. Achieving Causal Fairness through Generative Adversarial Networks. In: Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence (IJCAI-19). 2019. pp. 1452-1458. DOI: 10.24963/ijcai.2019/201
  48. Zheng Z, Zheng L, Yang Y. Unlabeled Samples Generated by GAN Improve the Person Re-identification Baseline in vitro. In: 2017 IEEE International Conference on Computer Vision (ICCV). 2017. pp. 3754-3762. DOI: 10.1109/ICCV.2017.405
  49. Ratzlaff N, Li F. HyperGAN: A Generative Model for Diverse, Performant Neural Networks. In: Proceedings of the 36th International Conference on Machine Learning, PMLR. Vol. 97. 2019. pp. 5361-5369
  50. Wang Q, Nguyen QVH, Yin H, Huang Z, Wang H, Cui L. Enhancing collaborative filtering with generative augmentation. In: 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD), Anchorage, Alaska, United States, 4-8 August 2019. New York, NY, United States: Association for Computing Machinery; 2019. pp. 548-556. DOI: 10.1145/3292500.3330873
  51. Zhang Y, Fu Y, Wang P, Li X, Zheng Y. Unifying Inter-region Autocorrelation and Intra-region Structures for Spatial Embedding via Collective Adversarial Learning. In: ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2019. pp. 1700-1708. DOI: 10.1145/3292500.3330972
  52. Hindupur A. The GAN Zoo. 2018 (Web page)
  53. Kusner MJ, Hernandez-Lobato JM. GANs for sequences of discrete elements with the Gumbel-Softmax distribution. arXiv. 2016
  54. Maddison CJ, Mnih A, Teh YW. The concrete distribution: A continuous relaxation of discrete random variables. arXiv. 2016
  55. Williams RJ. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning. 1992;8:229-256
